From: Robin Murphy <robin.murphy@arm.com>
To: Nicolin Chen <nicolinc@nvidia.com>
Cc: joro@8bytes.org, will@kernel.org, nicoleotsuka@gmail.com,
	thierry.reding@gmail.com, vdumpa@nvidia.com,
	nwatterson@nvidia.com, jean-philippe@linaro.org,
	thunder.leizhen@huawei.com, chenxiang66@hisilicon.com,
	Jonathan.Cameron@huawei.com, yuzenghui@huawei.com,
	linux-kernel@vger.kernel.org, iommu@lists.linux-foundation.org,
	linux-arm-kernel@lists.infradead.org,
	linux-tegra@vger.kernel.org, jgg@nvidia.com
Subject: Re: [PATCH v3 4/5] iommu/arm-smmu-v3: Add host support for NVIDIA Grace CMDQ-V
Date: Wed, 22 Dec 2021 11:57:05 +0000	[thread overview]
Message-ID: <2fbc6a98-6569-0a06-c901-9a37e40ccb7c@arm.com> (raw)
In-Reply-To: <20211221220037.GA6292@Asurada-Nvidia>

On 2021-12-21 22:00, Nicolin Chen wrote:
[...]
>>>> The challenge to make ECMDQ useful to Linux is how to make sure that all
>>>> the commands expected to be within scope of a future CMD_SYNC plus that
>>>> sync itself all get issued on the same queue, so I'd be mildly surprised
>>>> if you didn't have the same problem.
>>>
>>> PATCH-3 in this series actually helps keep issued commands and the
>>> SYNC aligned on the same queue when bool sync == true. Yet, if
>>> doing something like issue->issue->issue_with_sync, it could be
>>> trickier.
>>
>> Indeed between the iommu_iotlb_gather mechanism and low-level command
>> batching things are already a lot more concentrated than they could be,
>> but arm_smmu_cmdq_batch_add() and its callers stand out as examples of
>> where we'd still be vulnerable to preemption. What I haven't even tried
>> to reason about yet is assumptions in the higher-level APIs, e.g. if
>> io-pgtable might chuck out a TLBI during an iommu_unmap() which we
>> implicitly expect a later iommu_iotlb_sync() to cover.
> 
> Though I might have oversimplified the situation here, I see
> the arm_smmu_cmdq_batch_add() calls are typically followed by
> arm_smmu_cmdq_batch_submit(). Could we just add a SYNC in the
> _batch_submit() to all the queues that it previously touched
> in the _batch_add()?
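
Concretely, the bookkeeping that would be needed might look something like
the sketch below - the cmdq_map field, the nr_ecmdqs count and the per-queue
sync helper are all hypothetical here, not existing driver code:

struct arm_smmu_cmdq_batch {
	u64 cmds[CMDQ_BATCH_ENTRIES * CMDQ_ENT_DWORDS];
	unsigned long cmdq_map;	/* hypothetical: queues this batch has used */
	int num;
};

static int arm_smmu_cmdq_batch_submit(struct arm_smmu_device *smmu,
				      struct arm_smmu_cmdq_batch *cmds)
{
	int ret, i;

	/*
	 * Flush the pending commands with a CMD_SYNC on the current queue;
	 * _batch_add() would have set a cmdq_map bit for each queue it
	 * actually issued on.
	 */
	ret = arm_smmu_cmdq_issue_cmdlist(smmu, cmds->cmds, cmds->num, true);

	/* Then chase up every other queue the batch touched with its own sync */
	for_each_set_bit(i, &cmds->cmdq_map, smmu->nr_ecmdqs)
		ret |= arm_smmu_ecmdq_issue_sync(smmu, i);	/* hypothetical helper */

	return ret;
}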

Keeping track of which queues a batch has touched is certainly possible, 
but it's yet more overhead to impose across the board when intra-batch 
preemption should (hopefully) be very rare in practice. I was thinking 
more along the lines of disabling preemption/migration for the lifetime 
of a batch, or more pragmatically just hoisting the queue selection all 
the way out to the scope of the batch itself (which also conveniently 
seems about the right shape for potentially forking off a whole other 
dedicated command submission flow from that point later).
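
In other words, something vaguely like this - the cmdq field in the batch,
the batch_init() helper and the arm_smmu_get_cmdq() selector are all
hypothetical, the point being only that the queue is picked once per batch:

struct arm_smmu_cmdq_batch {
	u64 cmds[CMDQ_BATCH_ENTRIES * CMDQ_ENT_DWORDS];
	struct arm_smmu_cmdq *cmdq;	/* hypothetical: chosen once per batch */
	int num;
};

static void arm_smmu_cmdq_batch_init(struct arm_smmu_device *smmu,
				     struct arm_smmu_cmdq_batch *cmds)
{
	cmds->num = 0;
	/* hypothetical selector: pick whichever (E)CMDQ this CPU should use */
	cmds->cmdq = arm_smmu_get_cmdq(smmu);
}

arm_smmu_cmdq_batch_add() and arm_smmu_cmdq_batch_submit() would then hand
cmds->cmdq down to arm_smmu_cmdq_issue_cmdlist(), which patch 3 already
teaches to take an explicit cmdq pointer, so even if the task migrates
mid-batch the commands and the closing CMD_SYNC all land on the same queue.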

We still can't mitigate inter-batch preemption, though, so we'll just 
have to audit everything very carefully to make sure we don't have (or 
inadvertently introduce in future) any places where that could be 
problematic. We really want to avoid over-syncing as that's liable to 
end up being just as bad for performance as the contention that we're 
nominally avoiding.

>> I've been thinking that in many ways per-domain queues make quite a bit
>> of sense and would be easier to manage than per-CPU ones - plus that's
>> pretty much the usage model once we get to VMs anyway - but that fails
>> to help the significant cases like networking and storage where many
>> CPUs are servicing a big monolithic device in a single domain :(
> 
> Yeah, and it's hard to predict which client would use the CMDQ more
> frequently in order to balance or assign more queues to that client,
> which feels like a QoS conundrum.

Indeed, plus once we start assigning queues to VMs we're going to want 
to remove them from the general pool for host usage, so we definitely 
want to plan ahead here.

Cheers,
Robin.

Thread overview:

2021-11-19  7:19 [PATCH v3 0/5] iommu/arm-smmu-v3: Add NVIDIA Grace CMDQ-V Support Nicolin Chen
2021-11-19  7:19 ` [PATCH v3 1/5] iommu/arm-smmu-v3: Add CS_NONE quirk Nicolin Chen
2021-11-19  7:19 ` [PATCH v3 2/5] iommu/arm-smmu-v3: Make arm_smmu_cmdq_init reusable Nicolin Chen
2021-11-19  7:19 ` [PATCH v3 3/5] iommu/arm-smmu-v3: Pass cmdq pointer in arm_smmu_cmdq_issue_cmdlist() Nicolin Chen
2021-11-19  7:19 ` [PATCH v3 4/5] iommu/arm-smmu-v3: Add host support for NVIDIA Grace CMDQ-V Nicolin Chen
2021-12-20 18:42   ` Robin Murphy
2021-12-20 19:27     ` Nicolin Chen
2021-12-21 18:55       ` Robin Murphy
2021-12-21 22:00         ` Nicolin Chen
2021-12-22 11:57           ` Robin Murphy [this message]
2021-11-19  7:19 ` [PATCH v3 5/5] iommu/nvidia-grace-cmdqv: Limit CMDs for guest owned VINTF Nicolin Chen
2021-12-22 12:32   ` Robin Murphy
2021-12-22 22:52     ` Nicolin Chen
2021-12-23 11:14       ` Robin Murphy
2021-12-24  8:02         ` Nicolin Chen
2021-12-24 12:13           ` Robin Murphy
2021-12-28  5:49             ` Nicolin Chen