iommu.lists.linux-foundation.org archive mirror
 help / color / mirror / Atom feed
From: John Garry <john.garry@huawei.com>
To: Will Deacon <will@kernel.org>
Cc: Vijay Kilary <vkilari@codeaurora.org>,
	Jean-Philippe Brucker <jean-philippe.brucker@arm.com>,
	Jon Masters <jcm@redhat.com>, Jan Glauber <jglauber@marvell.com>,
	Alex Williamson <alex.williamson@redhat.com>,
	iommu@lists.linux-foundation.org,
	Jayachandran Chandrasekharan Nair <jnair@marvell.com>,
	Robin Murphy <robin.murphy@arm.com>
Subject: Re: [RFC PATCH v2 18/19] iommu/arm-smmu-v3: Reduce contention during command-queue insertion
Date: Thu, 25 Jul 2019 12:31:05 +0100	[thread overview]
Message-ID: <de49178c-0628-e759-fcc0-aabd7e86337c@huawei.com> (raw)
In-Reply-To: <20190724143355.r2zw6z37igwav2ki@willie-the-truck>

Hi Will,

>>> +		old = cmpxchg_relaxed(&cmdq->q.llq.val, llq.val, head.val);
>>
>> I added some basic debug to the stress test on your branch, and this cmpxchg
>> was failing ~10 times on average on my D06.
>>
>> So we're not using the spinlock now, but this cmpxchg may lack fairness.
>
> It definitely lacks fairness, although that's going to be the case in many
> other places where locking is elided as well. If it shows up as an issue, we
> should try to address it, but queueing means serialisation and that largely
> defeats the point of this patch. I also don't expect it to be a problem
> unless the cmpxchg() is heavily contended, which shouldn't be the case if
> we're batching.
>
>> Since we're batching commands, I wonder if it's better to restore the
>> spinlock and send batched commands + CMD_SYNC under the lock, and then wait
>> for the CMD_SYNC completion outside the lock.
>
> Again, we'd need some numbers, but my concern with that approach is that
> we're serialising CPUs which is what I've been trying hard to avoid.

That makes sense about the serialisation.

I'm just concerned with the scenario when the queue is heavily 
contented; here we may be gathering multiple batches of commands, and 
owners and non-owners a. may delay each other in writing the commands b. 
also pend the final CMD_SYNC consumption, which is for all gathered 
commands, and not just the commands which a particular CPU is interested in.

It
> also doesn't let you straightforwardly remove the cmpxchg() loop, since
> the owner clears the OWNED flag and you probably wouldn't want to hold
> the lock for that.
>
>> I don't know if it improves the queue contetion, but at least the prod
>> pointer would be more closely track the issued commands, such that we're not
>> waiting to kick off many gathered batches of commands, while the SMMU HW may
>> be idle (in terms of command processing).
>
> Again, probably going to need some numbers to change this, although it
> sounds like your other suggestion about having the owner move prod twice
> would largely address your concerns.

Limited initial testing shows that it makes little difference...

  Reintroducing the lock, on the other
> hand, feels like a big step backwards to me, and the whole reason I started
> down the current route was because of vague claims that the locking was a
> problem for large systems.

To me, the biggest improvement is batching of the commands, and it may 
not make much difference either way in how we deal with submitting them 
to the queue.

Cheers,
John

>
> Thanks,
>
> Will
>
> .
>


_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

  reply	other threads:[~2019-07-25 11:31 UTC|newest]

Thread overview: 37+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-07-11 17:19 [RFC PATCH v2 00/19] Try to reduce lock contention on the SMMUv3 command queue Will Deacon
2019-07-11 17:19 ` [RFC PATCH v2 01/19] iommu: Remove empty iommu_tlb_range_add() callback from iommu_ops Will Deacon
2019-07-11 17:19 ` [RFC PATCH v2 02/19] iommu/io-pgtable-arm: Remove redundant call to io_pgtable_tlb_sync() Will Deacon
2019-07-11 17:19 ` [RFC PATCH v2 03/19] iommu/io-pgtable: Rename iommu_gather_ops to iommu_flush_ops Will Deacon
2019-07-11 17:19 ` [RFC PATCH v2 04/19] iommu: Introduce struct iommu_iotlb_gather for batching TLB flushes Will Deacon
2019-07-24  7:19   ` Joerg Roedel
2019-07-24  7:41     ` Will Deacon
2019-07-25  7:58       ` Joerg Roedel
2019-07-11 17:19 ` [RFC PATCH v2 05/19] iommu: Introduce iommu_iotlb_gather_add_page() Will Deacon
2019-07-11 17:19 ` [RFC PATCH v2 06/19] iommu: Pass struct iommu_iotlb_gather to ->unmap() and ->iotlb_sync() Will Deacon
2019-07-11 17:19 ` [RFC PATCH v2 07/19] iommu/io-pgtable: Introduce tlb_flush_walk() and tlb_flush_leaf() Will Deacon
2019-07-11 17:19 ` [RFC PATCH v2 08/19] iommu/io-pgtable: Hook up ->tlb_flush_walk() and ->tlb_flush_leaf() in drivers Will Deacon
2019-07-11 17:19 ` [RFC PATCH v2 09/19] iommu/io-pgtable-arm: Call ->tlb_flush_walk() and ->tlb_flush_leaf() Will Deacon
2019-07-11 17:19 ` [RFC PATCH v2 10/19] iommu/io-pgtable: Replace ->tlb_add_flush() with ->tlb_add_page() Will Deacon
2019-07-11 17:19 ` [RFC PATCH v2 11/19] iommu/io-pgtable: Remove unused ->tlb_sync() callback Will Deacon
2019-07-11 17:19 ` [RFC PATCH v2 12/19] iommu/io-pgtable: Pass struct iommu_iotlb_gather to ->unmap() Will Deacon
2019-07-11 17:19 ` [RFC PATCH v2 13/19] iommu/io-pgtable: Pass struct iommu_iotlb_gather to ->tlb_add_page() Will Deacon
2019-07-11 17:19 ` [RFC PATCH v2 14/19] iommu/arm-smmu-v3: Separate s/w and h/w views of prod and cons indexes Will Deacon
2019-07-11 17:19 ` [RFC PATCH v2 15/19] iommu/arm-smmu-v3: Drop unused 'q' argument from Q_OVF macro Will Deacon
2019-07-11 17:19 ` [RFC PATCH v2 16/19] iommu/arm-smmu-v3: Move low-level queue fields out of arm_smmu_queue Will Deacon
2019-07-11 17:19 ` [RFC PATCH v2 17/19] iommu/arm-smmu-v3: Operate directly on low-level queue where possible Will Deacon
2019-07-11 17:19 ` [RFC PATCH v2 18/19] iommu/arm-smmu-v3: Reduce contention during command-queue insertion Will Deacon
2019-07-19 11:04   ` John Garry
2019-07-24 12:15     ` Will Deacon
2019-07-24 14:03       ` John Garry
2019-07-24 14:07         ` Will Deacon
2019-07-24  8:20   ` John Garry
2019-07-24 14:33     ` Will Deacon
2019-07-25 11:31       ` John Garry [this message]
2019-07-11 17:19 ` [RFC PATCH v2 19/19] iommu/arm-smmu-v3: Defer TLB invalidation until ->iotlb_sync() Will Deacon
2019-07-19  4:25 ` [RFC PATCH v2 00/19] Try to reduce lock contention on the SMMUv3 command queue Ganapatrao Kulkarni
2019-07-24 12:28   ` Will Deacon
2019-07-24  9:58 ` John Garry
2019-07-24 12:20   ` Will Deacon
2019-07-24 14:25     ` John Garry
2019-07-24 14:48       ` Will Deacon
2019-07-25 10:11         ` John Garry

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=de49178c-0628-e759-fcc0-aabd7e86337c@huawei.com \
    --to=john.garry@huawei.com \
    --cc=alex.williamson@redhat.com \
    --cc=iommu@lists.linux-foundation.org \
    --cc=jcm@redhat.com \
    --cc=jean-philippe.brucker@arm.com \
    --cc=jglauber@marvell.com \
    --cc=jnair@marvell.com \
    --cc=robin.murphy@arm.com \
    --cc=vkilari@codeaurora.org \
    --cc=will@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).