All of lore.kernel.org
 help / color / mirror / Atom feed
From: Robin Murphy <robin.murphy@arm.com>
To: John Garry <john.garry@huawei.com>, will@kernel.org
Cc: joro@8bytes.org, linux-arm-kernel@lists.infradead.org,
	iommu@lists.linux-foundation.org, linux-kernel@vger.kernel.org,
	linuxarm@huawei.com
Subject: Re: [PATCH] iommu/arm-smmu-v3: Remove some unneeded init in arm_smmu_cmdq_issue_cmdlist()
Date: Thu, 5 Aug 2021 12:24:33 +0100	[thread overview]
Message-ID: <ee1f3ab5-3acc-f442-f2d2-898cf88bc447@arm.com> (raw)
In-Reply-To: <1624293394-202509-1-git-send-email-john.garry@huawei.com>

On 2021-06-21 17:36, John Garry wrote:
> Members of struct "llq" will be zero-inited, apart from member max_n_shift.
> But we write llq.val straight after the init, so it was pointless to zero
> init those other members. As such, separately init member max_n_shift
> only.
> 
> In addition, struct "head" is initialised to "llq" only so that member
> max_n_shift is set. But that member is never referenced for "head", so
> remove any init there.
> 
> Removing these initializations is seen as a small performance optimisation,
> as this code is (very) hot path.

I looked at this and immediately thought "surely the compiler can see 
that all the prod/cons/val fields are written anyway and elide the 
initialisation?", so I dumped the before and after disassembly, and... oh.

You should probably clarify that it's zero-initialising all the 
cacheline padding which is both pointless and painful. With that,

Reviewed-by: Robin Murphy <robin.murphy@arm.com>

However, having looked this closely I'm now tangentially wondering why 
max_n_shift isn't inside the padded union? It's read at the same time as 
both prod and cons by queue_has_space(), and never updated, so there 
doesn't appear to be any benefit to it being in a separate cacheline all 
by itself, and llq is already twice as big as it needs to be. Sorting 
that would also be a good opportunity to store the value of interest in 
its appropriate form so we're not needlessly recalculating 1 << shift 
every flippin' time...

Robin.

> Signed-off-by: John Garry <john.garry@huawei.com>
> 
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> index 54b2f27b81d4..8a8ad49bb7fd 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> @@ -727,11 +727,11 @@ static int arm_smmu_cmdq_issue_cmdlist(struct arm_smmu_device *smmu,
>   	unsigned long flags;
>   	bool owner;
>   	struct arm_smmu_cmdq *cmdq = &smmu->cmdq;
> -	struct arm_smmu_ll_queue llq = {
> -		.max_n_shift = cmdq->q.llq.max_n_shift,
> -	}, head = llq;
> +	struct arm_smmu_ll_queue llq, head;
>   	int ret = 0;
>   
> +	llq.max_n_shift = cmdq->q.llq.max_n_shift;
> +
>   	/* 1. Allocate some space in the queue */
>   	local_irq_save(flags);
>   	llq.val = READ_ONCE(cmdq->q.llq.val);
> 

WARNING: multiple messages have this Message-ID (diff)
From: Robin Murphy <robin.murphy@arm.com>
To: John Garry <john.garry@huawei.com>, will@kernel.org
Cc: linuxarm@huawei.com, iommu@lists.linux-foundation.org,
	linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH] iommu/arm-smmu-v3: Remove some unneeded init in arm_smmu_cmdq_issue_cmdlist()
Date: Thu, 5 Aug 2021 12:24:33 +0100	[thread overview]
Message-ID: <ee1f3ab5-3acc-f442-f2d2-898cf88bc447@arm.com> (raw)
In-Reply-To: <1624293394-202509-1-git-send-email-john.garry@huawei.com>

On 2021-06-21 17:36, John Garry wrote:
> Members of struct "llq" will be zero-inited, apart from member max_n_shift.
> But we write llq.val straight after the init, so it was pointless to zero
> init those other members. As such, separately init member max_n_shift
> only.
> 
> In addition, struct "head" is initialised to "llq" only so that member
> max_n_shift is set. But that member is never referenced for "head", so
> remove any init there.
> 
> Removing these initializations is seen as a small performance optimisation,
> as this code is (very) hot path.

I looked at this and immediately thought "surely the compiler can see 
that all the prod/cons/val fields are written anyway and elide the 
initialisation?", so I dumped the before and after disassembly, and... oh.

You should probably clarify that it's zero-initialising all the 
cacheline padding which is both pointless and painful. With that,

Reviewed-by: Robin Murphy <robin.murphy@arm.com>

However, having looked this closely I'm now tangentially wondering why 
max_n_shift isn't inside the padded union? It's read at the same time as 
both prod and cons by queue_has_space(), and never updated, so there 
doesn't appear to be any benefit to it being in a separate cacheline all 
by itself, and llq is already twice as big as it needs to be. Sorting 
that would also be a good opportunity to store the value of interest in 
its appropriate form so we're not needlessly recalculating 1 << shift 
every flippin' time...

Robin.

> Signed-off-by: John Garry <john.garry@huawei.com>
> 
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> index 54b2f27b81d4..8a8ad49bb7fd 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> @@ -727,11 +727,11 @@ static int arm_smmu_cmdq_issue_cmdlist(struct arm_smmu_device *smmu,
>   	unsigned long flags;
>   	bool owner;
>   	struct arm_smmu_cmdq *cmdq = &smmu->cmdq;
> -	struct arm_smmu_ll_queue llq = {
> -		.max_n_shift = cmdq->q.llq.max_n_shift,
> -	}, head = llq;
> +	struct arm_smmu_ll_queue llq, head;
>   	int ret = 0;
>   
> +	llq.max_n_shift = cmdq->q.llq.max_n_shift;
> +
>   	/* 1. Allocate some space in the queue */
>   	local_irq_save(flags);
>   	llq.val = READ_ONCE(cmdq->q.llq.val);
> 
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

WARNING: multiple messages have this Message-ID (diff)
From: Robin Murphy <robin.murphy@arm.com>
To: John Garry <john.garry@huawei.com>, will@kernel.org
Cc: joro@8bytes.org, linux-arm-kernel@lists.infradead.org,
	iommu@lists.linux-foundation.org, linux-kernel@vger.kernel.org,
	linuxarm@huawei.com
Subject: Re: [PATCH] iommu/arm-smmu-v3: Remove some unneeded init in arm_smmu_cmdq_issue_cmdlist()
Date: Thu, 5 Aug 2021 12:24:33 +0100	[thread overview]
Message-ID: <ee1f3ab5-3acc-f442-f2d2-898cf88bc447@arm.com> (raw)
In-Reply-To: <1624293394-202509-1-git-send-email-john.garry@huawei.com>

On 2021-06-21 17:36, John Garry wrote:
> Members of struct "llq" will be zero-inited, apart from member max_n_shift.
> But we write llq.val straight after the init, so it was pointless to zero
> init those other members. As such, separately init member max_n_shift
> only.
> 
> In addition, struct "head" is initialised to "llq" only so that member
> max_n_shift is set. But that member is never referenced for "head", so
> remove any init there.
> 
> Removing these initializations is seen as a small performance optimisation,
> as this code is (very) hot path.

I looked at this and immediately thought "surely the compiler can see 
that all the prod/cons/val fields are written anyway and elide the 
initialisation?", so I dumped the before and after disassembly, and... oh.

You should probably clarify that it's zero-initialising all the 
cacheline padding which is both pointless and painful. With that,

Reviewed-by: Robin Murphy <robin.murphy@arm.com>

However, having looked this closely I'm now tangentially wondering why 
max_n_shift isn't inside the padded union? It's read at the same time as 
both prod and cons by queue_has_space(), and never updated, so there 
doesn't appear to be any benefit to it being in a separate cacheline all 
by itself, and llq is already twice as big as it needs to be. Sorting 
that would also be a good opportunity to store the value of interest in 
its appropriate form so we're not needlessly recalculating 1 << shift 
every flippin' time...

Robin.

> Signed-off-by: John Garry <john.garry@huawei.com>
> 
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> index 54b2f27b81d4..8a8ad49bb7fd 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> @@ -727,11 +727,11 @@ static int arm_smmu_cmdq_issue_cmdlist(struct arm_smmu_device *smmu,
>   	unsigned long flags;
>   	bool owner;
>   	struct arm_smmu_cmdq *cmdq = &smmu->cmdq;
> -	struct arm_smmu_ll_queue llq = {
> -		.max_n_shift = cmdq->q.llq.max_n_shift,
> -	}, head = llq;
> +	struct arm_smmu_ll_queue llq, head;
>   	int ret = 0;
>   
> +	llq.max_n_shift = cmdq->q.llq.max_n_shift;
> +
>   	/* 1. Allocate some space in the queue */
>   	local_irq_save(flags);
>   	llq.val = READ_ONCE(cmdq->q.llq.val);
> 

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

  parent reply	other threads:[~2021-08-05 11:24 UTC|newest]

Thread overview: 27+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-06-21 16:36 [PATCH] iommu/arm-smmu-v3: Remove some unneeded init in arm_smmu_cmdq_issue_cmdlist() John Garry
2021-06-21 16:36 ` John Garry
2021-06-21 16:36 ` John Garry
2021-08-05 10:22 ` John Garry
2021-08-05 10:22   ` John Garry
2021-08-05 10:22   ` John Garry
2021-08-05 11:21   ` Will Deacon
2021-08-05 11:21     ` Will Deacon
2021-08-05 11:21     ` Will Deacon
2021-08-05 11:24 ` Robin Murphy [this message]
2021-08-05 11:24   ` Robin Murphy
2021-08-05 11:24   ` Robin Murphy
2021-08-05 12:18   ` Robin Murphy
2021-08-05 12:18     ` Robin Murphy
2021-08-05 12:18     ` Robin Murphy
2021-08-05 13:40   ` John Garry
2021-08-05 13:40     ` John Garry
2021-08-05 13:40     ` John Garry
2021-08-05 14:41     ` Robin Murphy
2021-08-05 14:41       ` Robin Murphy
2021-08-05 14:41       ` Robin Murphy
2021-08-05 15:16       ` John Garry
2021-08-05 15:16         ` John Garry
2021-08-05 15:16         ` John Garry
2021-08-05 17:14         ` Robin Murphy
2021-08-05 17:14           ` Robin Murphy
2021-08-05 17:14           ` Robin Murphy

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ee1f3ab5-3acc-f442-f2d2-898cf88bc447@arm.com \
    --to=robin.murphy@arm.com \
    --cc=iommu@lists.linux-foundation.org \
    --cc=john.garry@huawei.com \
    --cc=joro@8bytes.org \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linuxarm@huawei.com \
    --cc=will@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.