All of lore.kernel.org
 help / color / mirror / Atom feed
From: Will Deacon <will@kernel.org>
To: Robin Murphy <robin.murphy@arm.com>
Cc: John Garry <john.garry@huawei.com>,
	joro@8bytes.org, trivial@kernel.org,
	linux-arm-kernel@lists.infradead.org,
	iommu@lists.linux-foundation.org, linux-kernel@vger.kernel.org,
	linuxarm@huawei.com, maz@kernel.org
Subject: Re: [PATCH 0/4] iommu/arm-smmu-v3: Improve cmdq lock efficiency
Date: Thu, 16 Jul 2020 12:32:35 +0100	[thread overview]
Message-ID: <20200716113234.GA7290@willie-the-truck> (raw)
In-Reply-To: <aef1e4a4-d594-f477-9029-8f4295bb9f6c@arm.com>

On Thu, Jul 16, 2020 at 12:22:05PM +0100, Robin Murphy wrote:
> On 2020-07-16 11:56, John Garry wrote:
> > On 16/07/2020 11:28, Will Deacon wrote:
> > > On Thu, Jul 16, 2020 at 11:22:33AM +0100, Will Deacon wrote:
> > > > On Thu, Jul 16, 2020 at 11:19:41AM +0100, Will Deacon wrote:
> > > > > On Tue, Jun 23, 2020 at 01:28:36AM +0800, John Garry wrote:
> > > > > > As mentioned in [0], the CPU may consume many cycles processing
> > > > > > arm_smmu_cmdq_issue_cmdlist(). One issue we find is the
> > > > > > cmpxchg() loop to
> > > > > > get space on the queue takes approx 25% of the cycles
> > > > > > for this function.
> > > > > > 
> > > > > > This series removes that cmpxchg().
> > > > > 
> > > > > How about something much simpler like the diff below? >>
> > > > Ah, scratch that, I don't drop the lock if we fail the cas with it held.
> > > > Let me hack it some more (I have no hardware so I can only
> > > > build-test this).
> > > 
> > > Right, second attempt...
> > 
> > I can try it, but if performance if not as good, then please check mine
> > further (patch 4/4 specifically) - performance is really good, IMHO.
> 
> Perhaps a silly question (I'm too engrossed in PMU world ATM to get properly
> back up to speed on this), but couldn't this be done without cmpxchg anyway?
> Instinctively it feels like instead of maintaining a literal software copy
> of the prod value, we could resolve the "claim my slot in the queue" part
> with atomic_fetch_add on a free-running 32-bit "pseudo-prod" index, then
> whoever updates the hardware deals with the truncation and wrap bit to
> convert it to an actual register value.

Maybe, but it's easier said than done. The hard part is figuring how that
you have space and there's no way I'm touching that logic without a way to
test this.

I'm also not keen to complicate the code because of systems that don't scale
well with contended CAS [1]. If you put down loads of cores, you need an
interconnect/coherence protocol that can handle it.

Will

[1] https://lore.kernel.org/lkml/20190607072652.GA5522@hc/T/#m0d00f107c29223045933292a8b5b90d2ca9b7e5c

WARNING: multiple messages have this Message-ID (diff)
From: Will Deacon <will@kernel.org>
To: Robin Murphy <robin.murphy@arm.com>
Cc: trivial@kernel.org, maz@kernel.org, linux-kernel@vger.kernel.org,
	linuxarm@huawei.com, iommu@lists.linux-foundation.org,
	linux-arm-kernel@lists.infradead.org
Subject: Re: [PATCH 0/4] iommu/arm-smmu-v3: Improve cmdq lock efficiency
Date: Thu, 16 Jul 2020 12:32:35 +0100	[thread overview]
Message-ID: <20200716113234.GA7290@willie-the-truck> (raw)
In-Reply-To: <aef1e4a4-d594-f477-9029-8f4295bb9f6c@arm.com>

On Thu, Jul 16, 2020 at 12:22:05PM +0100, Robin Murphy wrote:
> On 2020-07-16 11:56, John Garry wrote:
> > On 16/07/2020 11:28, Will Deacon wrote:
> > > On Thu, Jul 16, 2020 at 11:22:33AM +0100, Will Deacon wrote:
> > > > On Thu, Jul 16, 2020 at 11:19:41AM +0100, Will Deacon wrote:
> > > > > On Tue, Jun 23, 2020 at 01:28:36AM +0800, John Garry wrote:
> > > > > > As mentioned in [0], the CPU may consume many cycles processing
> > > > > > arm_smmu_cmdq_issue_cmdlist(). One issue we find is the
> > > > > > cmpxchg() loop to
> > > > > > get space on the queue takes approx 25% of the cycles
> > > > > > for this function.
> > > > > > 
> > > > > > This series removes that cmpxchg().
> > > > > 
> > > > > How about something much simpler like the diff below? >>
> > > > Ah, scratch that, I don't drop the lock if we fail the cas with it held.
> > > > Let me hack it some more (I have no hardware so I can only
> > > > build-test this).
> > > 
> > > Right, second attempt...
> > 
> > I can try it, but if performance if not as good, then please check mine
> > further (patch 4/4 specifically) - performance is really good, IMHO.
> 
> Perhaps a silly question (I'm too engrossed in PMU world ATM to get properly
> back up to speed on this), but couldn't this be done without cmpxchg anyway?
> Instinctively it feels like instead of maintaining a literal software copy
> of the prod value, we could resolve the "claim my slot in the queue" part
> with atomic_fetch_add on a free-running 32-bit "pseudo-prod" index, then
> whoever updates the hardware deals with the truncation and wrap bit to
> convert it to an actual register value.

Maybe, but it's easier said than done. The hard part is figuring how that
you have space and there's no way I'm touching that logic without a way to
test this.

I'm also not keen to complicate the code because of systems that don't scale
well with contended CAS [1]. If you put down loads of cores, you need an
interconnect/coherence protocol that can handle it.

Will

[1] https://lore.kernel.org/lkml/20190607072652.GA5522@hc/T/#m0d00f107c29223045933292a8b5b90d2ca9b7e5c
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

WARNING: multiple messages have this Message-ID (diff)
From: Will Deacon <will@kernel.org>
To: Robin Murphy <robin.murphy@arm.com>
Cc: trivial@kernel.org, maz@kernel.org, joro@8bytes.org,
	John Garry <john.garry@huawei.com>,
	linux-kernel@vger.kernel.org, linuxarm@huawei.com,
	iommu@lists.linux-foundation.org,
	linux-arm-kernel@lists.infradead.org
Subject: Re: [PATCH 0/4] iommu/arm-smmu-v3: Improve cmdq lock efficiency
Date: Thu, 16 Jul 2020 12:32:35 +0100	[thread overview]
Message-ID: <20200716113234.GA7290@willie-the-truck> (raw)
In-Reply-To: <aef1e4a4-d594-f477-9029-8f4295bb9f6c@arm.com>

On Thu, Jul 16, 2020 at 12:22:05PM +0100, Robin Murphy wrote:
> On 2020-07-16 11:56, John Garry wrote:
> > On 16/07/2020 11:28, Will Deacon wrote:
> > > On Thu, Jul 16, 2020 at 11:22:33AM +0100, Will Deacon wrote:
> > > > On Thu, Jul 16, 2020 at 11:19:41AM +0100, Will Deacon wrote:
> > > > > On Tue, Jun 23, 2020 at 01:28:36AM +0800, John Garry wrote:
> > > > > > As mentioned in [0], the CPU may consume many cycles processing
> > > > > > arm_smmu_cmdq_issue_cmdlist(). One issue we find is the
> > > > > > cmpxchg() loop to
> > > > > > get space on the queue takes approx 25% of the cycles
> > > > > > for this function.
> > > > > > 
> > > > > > This series removes that cmpxchg().
> > > > > 
> > > > > How about something much simpler like the diff below? >>
> > > > Ah, scratch that, I don't drop the lock if we fail the cas with it held.
> > > > Let me hack it some more (I have no hardware so I can only
> > > > build-test this).
> > > 
> > > Right, second attempt...
> > 
> > I can try it, but if performance if not as good, then please check mine
> > further (patch 4/4 specifically) - performance is really good, IMHO.
> 
> Perhaps a silly question (I'm too engrossed in PMU world ATM to get properly
> back up to speed on this), but couldn't this be done without cmpxchg anyway?
> Instinctively it feels like instead of maintaining a literal software copy
> of the prod value, we could resolve the "claim my slot in the queue" part
> with atomic_fetch_add on a free-running 32-bit "pseudo-prod" index, then
> whoever updates the hardware deals with the truncation and wrap bit to
> convert it to an actual register value.

Maybe, but it's easier said than done. The hard part is figuring how that
you have space and there's no way I'm touching that logic without a way to
test this.

I'm also not keen to complicate the code because of systems that don't scale
well with contended CAS [1]. If you put down loads of cores, you need an
interconnect/coherence protocol that can handle it.

Will

[1] https://lore.kernel.org/lkml/20190607072652.GA5522@hc/T/#m0d00f107c29223045933292a8b5b90d2ca9b7e5c

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

  parent reply	other threads:[~2020-07-16 11:32 UTC|newest]

Thread overview: 69+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-06-22 17:28 [PATCH 0/4] iommu/arm-smmu-v3: Improve cmdq lock efficiency John Garry
2020-06-22 17:28 ` John Garry
2020-06-22 17:28 ` [PATCH 1/4] iommu/arm-smmu-v3: Fix trivial typo John Garry
2020-06-22 17:28   ` John Garry
2020-06-22 17:28 ` [PATCH 2/4] iommu/arm-smmu-v3: Calculate bits for prod and owner John Garry
2020-06-22 17:28   ` John Garry
2020-06-22 17:28 ` [PATCH 3/4] iommu/arm-smmu-v3: Always issue a CMD_SYNC per batch John Garry
2020-06-22 17:28   ` John Garry
2020-06-22 17:28 ` [PATCH 4/4] iommu/arm-smmu-v3: Remove cmpxchg() in arm_smmu_cmdq_issue_cmdlist() John Garry
2020-06-22 17:28   ` John Garry
2020-06-23  1:07   ` kernel test robot
2020-06-23  1:07     ` kernel test robot
2020-06-23  1:07     ` kernel test robot
2020-06-23  9:21     ` John Garry
2020-06-23  9:21       ` John Garry
2020-06-23  9:21       ` John Garry
2020-06-23  9:35       ` Rikard Falkeborn
2020-06-23 10:19         ` John Garry
2020-06-23 10:19           ` John Garry
2020-06-23 10:19           ` John Garry
2020-06-23 13:55           ` Rikard Falkeborn
2020-06-26 10:05             ` John Garry
2020-06-26 10:05               ` John Garry
2020-06-26 10:05               ` John Garry
2020-06-26 10:05               ` John Garry
2020-06-23 16:22       ` Robin Murphy
2020-06-23 16:22         ` Robin Murphy
2020-06-23 16:22         ` Robin Murphy
2020-06-23 16:22         ` Robin Murphy
2020-06-24  8:15         ` John Garry
2020-06-24  8:15           ` John Garry
2020-06-24  8:15           ` John Garry
2020-06-24  8:15           ` John Garry
2020-07-16 10:20   ` Will Deacon
2020-07-16 10:20     ` Will Deacon
2020-07-16 10:20     ` Will Deacon
2020-07-16 10:26     ` John Garry
2020-07-16 10:26       ` John Garry
2020-07-16 10:26       ` John Garry
2020-07-08 13:00 ` [PATCH 0/4] iommu/arm-smmu-v3: Improve cmdq lock efficiency John Garry
2020-07-08 13:00   ` John Garry
2020-07-08 13:00   ` John Garry
2020-07-16 10:19 ` Will Deacon
2020-07-16 10:19   ` Will Deacon
2020-07-16 10:19   ` Will Deacon
2020-07-16 10:22   ` Will Deacon
2020-07-16 10:22     ` Will Deacon
2020-07-16 10:22     ` Will Deacon
2020-07-16 10:28     ` Will Deacon
2020-07-16 10:28       ` Will Deacon
2020-07-16 10:28       ` Will Deacon
2020-07-16 10:56       ` John Garry
2020-07-16 10:56         ` John Garry
2020-07-16 10:56         ` John Garry
2020-07-16 11:22         ` Robin Murphy
2020-07-16 11:22           ` Robin Murphy
2020-07-16 11:22           ` Robin Murphy
2020-07-16 11:30           ` John Garry
2020-07-16 11:30             ` John Garry
2020-07-16 11:30             ` John Garry
2020-07-16 11:32           ` Will Deacon [this message]
2020-07-16 11:32             ` Will Deacon
2020-07-16 11:32             ` Will Deacon
2020-07-16 16:50             ` John Garry
2020-07-16 16:50               ` John Garry
2020-07-16 16:50               ` John Garry
2020-07-16 13:31       ` John Garry
2020-07-16 13:31         ` John Garry
2020-07-16 13:31         ` John Garry

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200716113234.GA7290@willie-the-truck \
    --to=will@kernel.org \
    --cc=iommu@lists.linux-foundation.org \
    --cc=john.garry@huawei.com \
    --cc=joro@8bytes.org \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linuxarm@huawei.com \
    --cc=maz@kernel.org \
    --cc=robin.murphy@arm.com \
    --cc=trivial@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.