All of lore.kernel.org
 help / color / mirror / Atom feed
From: John Garry <john.garry@huawei.com>
To: Will Deacon <will@kernel.org>
Cc: <robin.murphy@arm.com>, <joro@8bytes.org>, <trivial@kernel.org>,
	<linux-arm-kernel@lists.infradead.org>,
	<iommu@lists.linux-foundation.org>,
	<linux-kernel@vger.kernel.org>, <linuxarm@huawei.com>,
	<maz@kernel.org>
Subject: Re: [PATCH 0/4] iommu/arm-smmu-v3: Improve cmdq lock efficiency
Date: Thu, 16 Jul 2020 14:31:17 +0100	[thread overview]
Message-ID: <54b0c8b9-d079-f832-2338-11cf6b09fa00@huawei.com> (raw)
In-Reply-To: <20200716102814.GD7036@willie-the-truck>

On 16/07/2020 11:28, Will Deacon wrote:
> On Thu, Jul 16, 2020 at 11:22:33AM +0100, Will Deacon wrote:
>> On Thu, Jul 16, 2020 at 11:19:41AM +0100, Will Deacon wrote:
>>> On Tue, Jun 23, 2020 at 01:28:36AM +0800, John Garry wrote:
>>>> As mentioned in [0], the CPU may consume many cycles processing
>>>> arm_smmu_cmdq_issue_cmdlist(). One issue we find is the cmpxchg() loop to
>>>> get space on the queue takes approx 25% of the cycles for this function.
>>>>
>>>> This series removes that cmpxchg().
>>>
>>> How about something much simpler like the diff below?
>>
>> Ah, scratch that, I don't drop the lock if we fail the cas with it held.
>> Let me hack it some more (I have no hardware so I can only build-test this).
> 
> Right, second attempt...
> 
> Will

Unfortunately that hangs my machine during boot:

[10.902893] 00:01: ttyS0 at MMIO 0x3f00003f8 (irq = 6, base_baud = 
115200) is a 16550A
[10.912048] SuperH (H)SCI(F) driver initialized
[10.916811] msm_serial: driver initialized
[10.921371] arm-smmu-v3 arm-smmu-v3.0.auto: option mask 0x0
[10.926946] arm-smmu-v3 arm-smmu-v3.0.auto: ias 48-bit, oas 48-bit 
(features 0x00000fef)
[10.935374] arm-smmu-v3 arm-smmu-v3.0.auto: allocated 65536 entries for cmdq
[10.942522] arm-smmu-v3 arm-smmu-v3.0.auto: allocated 32768 entries for evtq


> 
> --->8
> 
> diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
> index f578677a5c41..e6bcddd6ef69 100644
> --- a/drivers/iommu/arm-smmu-v3.c
> +++ b/drivers/iommu/arm-smmu-v3.c
> @@ -560,6 +560,7 @@ struct arm_smmu_cmdq {
>   	atomic_long_t			*valid_map;
>   	atomic_t			owner_prod;
>   	atomic_t			lock;
> +	spinlock_t			slock;
>   };
>   
>   struct arm_smmu_cmdq_batch {
> @@ -1378,7 +1379,7 @@ static int arm_smmu_cmdq_issue_cmdlist(struct arm_smmu_device *smmu,
>   	u64 cmd_sync[CMDQ_ENT_DWORDS];
>   	u32 prod;
>   	unsigned long flags;
> -	bool owner;
> +	bool owner, locked = false;
>   	struct arm_smmu_cmdq *cmdq = &smmu->cmdq;
>   	struct arm_smmu_ll_queue llq = {
>   		.max_n_shift = cmdq->q.llq.max_n_shift,
> @@ -1387,27 +1388,38 @@ static int arm_smmu_cmdq_issue_cmdlist(struct arm_smmu_device *smmu,
>   
>   	/* 1. Allocate some space in the queue */
>   	local_irq_save(flags);
> -	llq.val = READ_ONCE(cmdq->q.llq.val);
>   	do {
>   		u64 old;
> +		llq.val = READ_ONCE(cmdq->q.llq.val);
>   
> -		while (!queue_has_space(&llq, n + sync)) {
> +		if (queue_has_space(&llq, n + sync))
> +			goto try_cas;
> +
> +		if (locked)
> +			spin_unlock(&cmdq->slock);
> +
> +		do {
>   			local_irq_restore(flags);
>   			if (arm_smmu_cmdq_poll_until_not_full(smmu, &llq))
>   				dev_err_ratelimited(smmu->dev, "CMDQ timeout\n");
>   			local_irq_save(flags);
> -		}
> +		} while (!queue_has_space(&llq, n + sync));
>   
> +try_cas:
>   		head.cons = llq.cons;
>   		head.prod = queue_inc_prod_n(&llq, n + sync) |
>   					     CMDQ_PROD_OWNED_FLAG;
>   
>   		old = cmpxchg_relaxed(&cmdq->q.llq.val, llq.val, head.val);
> -		if (old == llq.val)
> +		if (old != llq.val)

Not sure why you changed this. And if I change it back, it seems that we 
could drop out of the loop with cmdq->slock held, so need to drop the 
lock also.

I tried that and it stops my machine hanging. Let me know that was the 
intention, so I can test.

Thanks,
John

>   			break;
>   
> -		llq.val = old;
> +		if (!locked) {
> +			spin_lock(&cmdq->slock);
> +			locked = true;
> +		}
>   	} while (1);
> +
>   	owner = !(llq.prod & CMDQ_PROD_OWNED_FLAG);
>   	head.prod &= ~CMDQ_PROD_OWNED_FLAG;
>   	llq.prod &= ~CMDQ_PROD_OWNED_FLAG;
> @@ -3192,6 +3204,7 @@ static int arm_smmu_cmdq_init(struct arm_smmu_device *smmu)
>   
>   	atomic_set(&cmdq->owner_prod, 0);
>   	atomic_set(&cmdq->lock, 0);
> +	spin_lock_init(&cmdq->slock);
>   
>   	bitmap = (atomic_long_t *)bitmap_zalloc(nents, GFP_KERNEL);
>   	if (!bitmap) {
> .
> 


WARNING: multiple messages have this Message-ID (diff)
From: John Garry <john.garry@huawei.com>
To: Will Deacon <will@kernel.org>
Cc: trivial@kernel.org, maz@kernel.org, linux-kernel@vger.kernel.org,
	linuxarm@huawei.com, iommu@lists.linux-foundation.org,
	robin.murphy@arm.com, linux-arm-kernel@lists.infradead.org
Subject: Re: [PATCH 0/4] iommu/arm-smmu-v3: Improve cmdq lock efficiency
Date: Thu, 16 Jul 2020 14:31:17 +0100	[thread overview]
Message-ID: <54b0c8b9-d079-f832-2338-11cf6b09fa00@huawei.com> (raw)
In-Reply-To: <20200716102814.GD7036@willie-the-truck>

On 16/07/2020 11:28, Will Deacon wrote:
> On Thu, Jul 16, 2020 at 11:22:33AM +0100, Will Deacon wrote:
>> On Thu, Jul 16, 2020 at 11:19:41AM +0100, Will Deacon wrote:
>>> On Tue, Jun 23, 2020 at 01:28:36AM +0800, John Garry wrote:
>>>> As mentioned in [0], the CPU may consume many cycles processing
>>>> arm_smmu_cmdq_issue_cmdlist(). One issue we find is the cmpxchg() loop to
>>>> get space on the queue takes approx 25% of the cycles for this function.
>>>>
>>>> This series removes that cmpxchg().
>>>
>>> How about something much simpler like the diff below?
>>
>> Ah, scratch that, I don't drop the lock if we fail the cas with it held.
>> Let me hack it some more (I have no hardware so I can only build-test this).
> 
> Right, second attempt...
> 
> Will

Unfortunately that hangs my machine during boot:

[10.902893] 00:01: ttyS0 at MMIO 0x3f00003f8 (irq = 6, base_baud = 
115200) is a 16550A
[10.912048] SuperH (H)SCI(F) driver initialized
[10.916811] msm_serial: driver initialized
[10.921371] arm-smmu-v3 arm-smmu-v3.0.auto: option mask 0x0
[10.926946] arm-smmu-v3 arm-smmu-v3.0.auto: ias 48-bit, oas 48-bit 
(features 0x00000fef)
[10.935374] arm-smmu-v3 arm-smmu-v3.0.auto: allocated 65536 entries for cmdq
[10.942522] arm-smmu-v3 arm-smmu-v3.0.auto: allocated 32768 entries for evtq


> 
> --->8
> 
> diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
> index f578677a5c41..e6bcddd6ef69 100644
> --- a/drivers/iommu/arm-smmu-v3.c
> +++ b/drivers/iommu/arm-smmu-v3.c
> @@ -560,6 +560,7 @@ struct arm_smmu_cmdq {
>   	atomic_long_t			*valid_map;
>   	atomic_t			owner_prod;
>   	atomic_t			lock;
> +	spinlock_t			slock;
>   };
>   
>   struct arm_smmu_cmdq_batch {
> @@ -1378,7 +1379,7 @@ static int arm_smmu_cmdq_issue_cmdlist(struct arm_smmu_device *smmu,
>   	u64 cmd_sync[CMDQ_ENT_DWORDS];
>   	u32 prod;
>   	unsigned long flags;
> -	bool owner;
> +	bool owner, locked = false;
>   	struct arm_smmu_cmdq *cmdq = &smmu->cmdq;
>   	struct arm_smmu_ll_queue llq = {
>   		.max_n_shift = cmdq->q.llq.max_n_shift,
> @@ -1387,27 +1388,38 @@ static int arm_smmu_cmdq_issue_cmdlist(struct arm_smmu_device *smmu,
>   
>   	/* 1. Allocate some space in the queue */
>   	local_irq_save(flags);
> -	llq.val = READ_ONCE(cmdq->q.llq.val);
>   	do {
>   		u64 old;
> +		llq.val = READ_ONCE(cmdq->q.llq.val);
>   
> -		while (!queue_has_space(&llq, n + sync)) {
> +		if (queue_has_space(&llq, n + sync))
> +			goto try_cas;
> +
> +		if (locked)
> +			spin_unlock(&cmdq->slock);
> +
> +		do {
>   			local_irq_restore(flags);
>   			if (arm_smmu_cmdq_poll_until_not_full(smmu, &llq))
>   				dev_err_ratelimited(smmu->dev, "CMDQ timeout\n");
>   			local_irq_save(flags);
> -		}
> +		} while (!queue_has_space(&llq, n + sync));
>   
> +try_cas:
>   		head.cons = llq.cons;
>   		head.prod = queue_inc_prod_n(&llq, n + sync) |
>   					     CMDQ_PROD_OWNED_FLAG;
>   
>   		old = cmpxchg_relaxed(&cmdq->q.llq.val, llq.val, head.val);
> -		if (old == llq.val)
> +		if (old != llq.val)

Not sure why you changed this. And if I change it back, it seems that we 
could drop out of the loop with cmdq->slock held, so need to drop the 
lock also.

I tried that and it stops my machine hanging. Let me know that was the 
intention, so I can test.

Thanks,
John

>   			break;
>   
> -		llq.val = old;
> +		if (!locked) {
> +			spin_lock(&cmdq->slock);
> +			locked = true;
> +		}
>   	} while (1);
> +
>   	owner = !(llq.prod & CMDQ_PROD_OWNED_FLAG);
>   	head.prod &= ~CMDQ_PROD_OWNED_FLAG;
>   	llq.prod &= ~CMDQ_PROD_OWNED_FLAG;
> @@ -3192,6 +3204,7 @@ static int arm_smmu_cmdq_init(struct arm_smmu_device *smmu)
>   
>   	atomic_set(&cmdq->owner_prod, 0);
>   	atomic_set(&cmdq->lock, 0);
> +	spin_lock_init(&cmdq->slock);
>   
>   	bitmap = (atomic_long_t *)bitmap_zalloc(nents, GFP_KERNEL);
>   	if (!bitmap) {
> .
> 

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

WARNING: multiple messages have this Message-ID (diff)
From: John Garry <john.garry@huawei.com>
To: Will Deacon <will@kernel.org>
Cc: trivial@kernel.org, maz@kernel.org, joro@8bytes.org,
	linux-kernel@vger.kernel.org, linuxarm@huawei.com,
	iommu@lists.linux-foundation.org, robin.murphy@arm.com,
	linux-arm-kernel@lists.infradead.org
Subject: Re: [PATCH 0/4] iommu/arm-smmu-v3: Improve cmdq lock efficiency
Date: Thu, 16 Jul 2020 14:31:17 +0100	[thread overview]
Message-ID: <54b0c8b9-d079-f832-2338-11cf6b09fa00@huawei.com> (raw)
In-Reply-To: <20200716102814.GD7036@willie-the-truck>

On 16/07/2020 11:28, Will Deacon wrote:
> On Thu, Jul 16, 2020 at 11:22:33AM +0100, Will Deacon wrote:
>> On Thu, Jul 16, 2020 at 11:19:41AM +0100, Will Deacon wrote:
>>> On Tue, Jun 23, 2020 at 01:28:36AM +0800, John Garry wrote:
>>>> As mentioned in [0], the CPU may consume many cycles processing
>>>> arm_smmu_cmdq_issue_cmdlist(). One issue we find is the cmpxchg() loop to
>>>> get space on the queue takes approx 25% of the cycles for this function.
>>>>
>>>> This series removes that cmpxchg().
>>>
>>> How about something much simpler like the diff below?
>>
>> Ah, scratch that, I don't drop the lock if we fail the cas with it held.
>> Let me hack it some more (I have no hardware so I can only build-test this).
> 
> Right, second attempt...
> 
> Will

Unfortunately that hangs my machine during boot:

[10.902893] 00:01: ttyS0 at MMIO 0x3f00003f8 (irq = 6, base_baud = 
115200) is a 16550A
[10.912048] SuperH (H)SCI(F) driver initialized
[10.916811] msm_serial: driver initialized
[10.921371] arm-smmu-v3 arm-smmu-v3.0.auto: option mask 0x0
[10.926946] arm-smmu-v3 arm-smmu-v3.0.auto: ias 48-bit, oas 48-bit 
(features 0x00000fef)
[10.935374] arm-smmu-v3 arm-smmu-v3.0.auto: allocated 65536 entries for cmdq
[10.942522] arm-smmu-v3 arm-smmu-v3.0.auto: allocated 32768 entries for evtq


> 
> --->8
> 
> diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
> index f578677a5c41..e6bcddd6ef69 100644
> --- a/drivers/iommu/arm-smmu-v3.c
> +++ b/drivers/iommu/arm-smmu-v3.c
> @@ -560,6 +560,7 @@ struct arm_smmu_cmdq {
>   	atomic_long_t			*valid_map;
>   	atomic_t			owner_prod;
>   	atomic_t			lock;
> +	spinlock_t			slock;
>   };
>   
>   struct arm_smmu_cmdq_batch {
> @@ -1378,7 +1379,7 @@ static int arm_smmu_cmdq_issue_cmdlist(struct arm_smmu_device *smmu,
>   	u64 cmd_sync[CMDQ_ENT_DWORDS];
>   	u32 prod;
>   	unsigned long flags;
> -	bool owner;
> +	bool owner, locked = false;
>   	struct arm_smmu_cmdq *cmdq = &smmu->cmdq;
>   	struct arm_smmu_ll_queue llq = {
>   		.max_n_shift = cmdq->q.llq.max_n_shift,
> @@ -1387,27 +1388,38 @@ static int arm_smmu_cmdq_issue_cmdlist(struct arm_smmu_device *smmu,
>   
>   	/* 1. Allocate some space in the queue */
>   	local_irq_save(flags);
> -	llq.val = READ_ONCE(cmdq->q.llq.val);
>   	do {
>   		u64 old;
> +		llq.val = READ_ONCE(cmdq->q.llq.val);
>   
> -		while (!queue_has_space(&llq, n + sync)) {
> +		if (queue_has_space(&llq, n + sync))
> +			goto try_cas;
> +
> +		if (locked)
> +			spin_unlock(&cmdq->slock);
> +
> +		do {
>   			local_irq_restore(flags);
>   			if (arm_smmu_cmdq_poll_until_not_full(smmu, &llq))
>   				dev_err_ratelimited(smmu->dev, "CMDQ timeout\n");
>   			local_irq_save(flags);
> -		}
> +		} while (!queue_has_space(&llq, n + sync));
>   
> +try_cas:
>   		head.cons = llq.cons;
>   		head.prod = queue_inc_prod_n(&llq, n + sync) |
>   					     CMDQ_PROD_OWNED_FLAG;
>   
>   		old = cmpxchg_relaxed(&cmdq->q.llq.val, llq.val, head.val);
> -		if (old == llq.val)
> +		if (old != llq.val)

Not sure why you changed this. And if I change it back, it seems that we 
could drop out of the loop with cmdq->slock held, so need to drop the 
lock also.

I tried that and it stops my machine hanging. Let me know that was the 
intention, so I can test.

Thanks,
John

>   			break;
>   
> -		llq.val = old;
> +		if (!locked) {
> +			spin_lock(&cmdq->slock);
> +			locked = true;
> +		}
>   	} while (1);
> +
>   	owner = !(llq.prod & CMDQ_PROD_OWNED_FLAG);
>   	head.prod &= ~CMDQ_PROD_OWNED_FLAG;
>   	llq.prod &= ~CMDQ_PROD_OWNED_FLAG;
> @@ -3192,6 +3204,7 @@ static int arm_smmu_cmdq_init(struct arm_smmu_device *smmu)
>   
>   	atomic_set(&cmdq->owner_prod, 0);
>   	atomic_set(&cmdq->lock, 0);
> +	spin_lock_init(&cmdq->slock);
>   
>   	bitmap = (atomic_long_t *)bitmap_zalloc(nents, GFP_KERNEL);
>   	if (!bitmap) {
> .
> 


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

  parent reply	other threads:[~2020-07-16 13:33 UTC|newest]

Thread overview: 69+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-06-22 17:28 [PATCH 0/4] iommu/arm-smmu-v3: Improve cmdq lock efficiency John Garry
2020-06-22 17:28 ` John Garry
2020-06-22 17:28 ` [PATCH 1/4] iommu/arm-smmu-v3: Fix trivial typo John Garry
2020-06-22 17:28   ` John Garry
2020-06-22 17:28 ` [PATCH 2/4] iommu/arm-smmu-v3: Calculate bits for prod and owner John Garry
2020-06-22 17:28   ` John Garry
2020-06-22 17:28 ` [PATCH 3/4] iommu/arm-smmu-v3: Always issue a CMD_SYNC per batch John Garry
2020-06-22 17:28   ` John Garry
2020-06-22 17:28 ` [PATCH 4/4] iommu/arm-smmu-v3: Remove cmpxchg() in arm_smmu_cmdq_issue_cmdlist() John Garry
2020-06-22 17:28   ` John Garry
2020-06-23  1:07   ` kernel test robot
2020-06-23  1:07     ` kernel test robot
2020-06-23  1:07     ` kernel test robot
2020-06-23  9:21     ` John Garry
2020-06-23  9:21       ` John Garry
2020-06-23  9:21       ` John Garry
2020-06-23  9:35       ` Rikard Falkeborn
2020-06-23 10:19         ` John Garry
2020-06-23 10:19           ` John Garry
2020-06-23 10:19           ` John Garry
2020-06-23 13:55           ` Rikard Falkeborn
2020-06-26 10:05             ` John Garry
2020-06-26 10:05               ` John Garry
2020-06-26 10:05               ` John Garry
2020-06-26 10:05               ` John Garry
2020-06-23 16:22       ` Robin Murphy
2020-06-23 16:22         ` Robin Murphy
2020-06-23 16:22         ` Robin Murphy
2020-06-23 16:22         ` Robin Murphy
2020-06-24  8:15         ` John Garry
2020-06-24  8:15           ` John Garry
2020-06-24  8:15           ` John Garry
2020-06-24  8:15           ` John Garry
2020-07-16 10:20   ` Will Deacon
2020-07-16 10:20     ` Will Deacon
2020-07-16 10:20     ` Will Deacon
2020-07-16 10:26     ` John Garry
2020-07-16 10:26       ` John Garry
2020-07-16 10:26       ` John Garry
2020-07-08 13:00 ` [PATCH 0/4] iommu/arm-smmu-v3: Improve cmdq lock efficiency John Garry
2020-07-08 13:00   ` John Garry
2020-07-08 13:00   ` John Garry
2020-07-16 10:19 ` Will Deacon
2020-07-16 10:19   ` Will Deacon
2020-07-16 10:19   ` Will Deacon
2020-07-16 10:22   ` Will Deacon
2020-07-16 10:22     ` Will Deacon
2020-07-16 10:22     ` Will Deacon
2020-07-16 10:28     ` Will Deacon
2020-07-16 10:28       ` Will Deacon
2020-07-16 10:28       ` Will Deacon
2020-07-16 10:56       ` John Garry
2020-07-16 10:56         ` John Garry
2020-07-16 10:56         ` John Garry
2020-07-16 11:22         ` Robin Murphy
2020-07-16 11:22           ` Robin Murphy
2020-07-16 11:22           ` Robin Murphy
2020-07-16 11:30           ` John Garry
2020-07-16 11:30             ` John Garry
2020-07-16 11:30             ` John Garry
2020-07-16 11:32           ` Will Deacon
2020-07-16 11:32             ` Will Deacon
2020-07-16 11:32             ` Will Deacon
2020-07-16 16:50             ` John Garry
2020-07-16 16:50               ` John Garry
2020-07-16 16:50               ` John Garry
2020-07-16 13:31       ` John Garry [this message]
2020-07-16 13:31         ` John Garry
2020-07-16 13:31         ` John Garry

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=54b0c8b9-d079-f832-2338-11cf6b09fa00@huawei.com \
    --to=john.garry@huawei.com \
    --cc=iommu@lists.linux-foundation.org \
    --cc=joro@8bytes.org \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linuxarm@huawei.com \
    --cc=maz@kernel.org \
    --cc=robin.murphy@arm.com \
    --cc=trivial@kernel.org \
    --cc=will@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.