All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] Partially revert "perf/arm-cmn: Optimise DTC counter accesses"
@ 2023-01-23 18:30 ` Robin Murphy
  0 siblings, 0 replies; 6+ messages in thread
From: Robin Murphy @ 2023-01-23 18:30 UTC (permalink / raw)
  To: will; +Cc: mark.rutland, linux-arm-kernel, linux-kernel, ilkka

It turns out the optimisation implemented by commit 4f2c3872dde5 is
totally broken, since all the places that consume hw->dtcs_used for
events other than cycle count are still not expecting it to be sparsely
populated, and fail to read all the relevant DTC counters correctly if
so.

If implemented correctly, the optimisation potentially saves up to 3
register reads per event update, which is reasonably significant for
events targeting a single node, but still not worth a massive amount of
additional code complexity overall. Getting it right within the current
design looks a fair bit more involved than it was ever intended to be,
so let's just make a functional revert which restores the old behaviour
while still backporting easily.

Fixes: 4f2c3872dde5 ("perf/arm-cmn: Optimise DTC counter accesses")
Reported-by: Ilkka Koskinen <ilkka@os.amperecomputing.com>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
---
 drivers/perf/arm-cmn.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/perf/arm-cmn.c b/drivers/perf/arm-cmn.c
index b80a9b74662b..1deb61b22bc7 100644
--- a/drivers/perf/arm-cmn.c
+++ b/drivers/perf/arm-cmn.c
@@ -1576,7 +1576,6 @@ static int arm_cmn_event_init(struct perf_event *event)
 			hw->dn++;
 			continue;
 		}
-		hw->dtcs_used |= arm_cmn_node_to_xp(cmn, dn)->dtc;
 		hw->num_dns++;
 		if (bynodeid)
 			break;
@@ -1589,6 +1588,12 @@ static int arm_cmn_event_init(struct perf_event *event)
 			nodeid, nid.x, nid.y, nid.port, nid.dev, type);
 		return -EINVAL;
 	}
+	/*
+	 * Keep assuming non-cycles events count in all DTC domains; turns out
+	 * it's hard to make a worthwhile optimisation around this, short of
+	 * going all-in with domain-local counter allocation as well.
+	 */
+	hw->dtcs_used = (1U << cmn->num_dtcs) - 1;
 
 	return arm_cmn_validate_group(cmn, event);
 }
-- 
2.36.1.dirty


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH] Partially revert "perf/arm-cmn: Optimise DTC counter accesses"
@ 2023-01-23 18:30 ` Robin Murphy
  0 siblings, 0 replies; 6+ messages in thread
From: Robin Murphy @ 2023-01-23 18:30 UTC (permalink / raw)
  To: will; +Cc: mark.rutland, linux-arm-kernel, linux-kernel, ilkka

It turns out the optimisation implemented by commit 4f2c3872dde5 is
totally broken, since all the places that consume hw->dtcs_used for
events other than cycle count are still not expecting it to be sparsely
populated, and fail to read all the relevant DTC counters correctly if
so.

If implemented correctly, the optimisation potentially saves up to 3
register reads per event update, which is reasonably significant for
events targeting a single node, but still not worth a massive amount of
additional code complexity overall. Getting it right within the current
design looks a fair bit more involved than it was ever intended to be,
so let's just make a functional revert which restores the old behaviour
while still backporting easily.

Fixes: 4f2c3872dde5 ("perf/arm-cmn: Optimise DTC counter accesses")
Reported-by: Ilkka Koskinen <ilkka@os.amperecomputing.com>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
---
 drivers/perf/arm-cmn.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/perf/arm-cmn.c b/drivers/perf/arm-cmn.c
index b80a9b74662b..1deb61b22bc7 100644
--- a/drivers/perf/arm-cmn.c
+++ b/drivers/perf/arm-cmn.c
@@ -1576,7 +1576,6 @@ static int arm_cmn_event_init(struct perf_event *event)
 			hw->dn++;
 			continue;
 		}
-		hw->dtcs_used |= arm_cmn_node_to_xp(cmn, dn)->dtc;
 		hw->num_dns++;
 		if (bynodeid)
 			break;
@@ -1589,6 +1588,12 @@ static int arm_cmn_event_init(struct perf_event *event)
 			nodeid, nid.x, nid.y, nid.port, nid.dev, type);
 		return -EINVAL;
 	}
+	/*
+	 * Keep assuming non-cycles events count in all DTC domains; turns out
+	 * it's hard to make a worthwhile optimisation around this, short of
+	 * going all-in with domain-local counter allocation as well.
+	 */
+	hw->dtcs_used = (1U << cmn->num_dtcs) - 1;
 
 	return arm_cmn_validate_group(cmn, event);
 }
-- 
2.36.1.dirty


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH] Partially revert "perf/arm-cmn: Optimise DTC counter accesses"
  2023-01-23 18:30 ` Robin Murphy
@ 2023-01-23 23:11   ` Ilkka Koskinen
  -1 siblings, 0 replies; 6+ messages in thread
From: Ilkka Koskinen @ 2023-01-23 23:11 UTC (permalink / raw)
  To: Robin Murphy; +Cc: will, mark.rutland, linux-arm-kernel, linux-kernel, ilkka

Hi Robin,

On Mon, 23 Jan 2023, Robin Murphy wrote:
> It turns out the optimisation implemented by commit 4f2c3872dde5 is
> totally broken, since all the places that consume hw->dtcs_used for
> events other than cycle count are still not expecting it to be sparsely
> populated, and fail to read all the relevant DTC counters correctly if
> so.
>
> If implemented correctly, the optimisation potentially saves up to 3
> register reads per event update, which is reasonably significant for
> events targeting a single node, but still not worth a massive amount of
> additional code complexity overall. Getting it right within the current
> design looks a fair bit more involved than it was ever intended to be,
> so let's just make a functional revert which restores the old behaviour
> while still backporting easily.
>
> Fixes: 4f2c3872dde5 ("perf/arm-cmn: Optimise DTC counter accesses")
> Reported-by: Ilkka Koskinen <ilkka@os.amperecomputing.com>
> Signed-off-by: Robin Murphy <robin.murphy@arm.com>

Thanks for the patch. It looks good to me and seems to work fine.

Cheers, Ilkka

> ---
> drivers/perf/arm-cmn.c | 7 ++++++-
> 1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/perf/arm-cmn.c b/drivers/perf/arm-cmn.c
> index b80a9b74662b..1deb61b22bc7 100644
> --- a/drivers/perf/arm-cmn.c
> +++ b/drivers/perf/arm-cmn.c
> @@ -1576,7 +1576,6 @@ static int arm_cmn_event_init(struct perf_event *event)
> 			hw->dn++;
> 			continue;
> 		}
> -		hw->dtcs_used |= arm_cmn_node_to_xp(cmn, dn)->dtc;
> 		hw->num_dns++;
> 		if (bynodeid)
> 			break;
> @@ -1589,6 +1588,12 @@ static int arm_cmn_event_init(struct perf_event *event)
> 			nodeid, nid.x, nid.y, nid.port, nid.dev, type);
> 		return -EINVAL;
> 	}
> +	/*
> +	 * Keep assuming non-cycles events count in all DTC domains; turns out
> +	 * it's hard to make a worthwhile optimisation around this, short of
> +	 * going all-in with domain-local counter allocation as well.
> +	 */
> +	hw->dtcs_used = (1U << cmn->num_dtcs) - 1;
>
> 	return arm_cmn_validate_group(cmn, event);
> }
> -- 
> 2.36.1.dirty
>
>

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] Partially revert "perf/arm-cmn: Optimise DTC counter accesses"
@ 2023-01-23 23:11   ` Ilkka Koskinen
  0 siblings, 0 replies; 6+ messages in thread
From: Ilkka Koskinen @ 2023-01-23 23:11 UTC (permalink / raw)
  To: Robin Murphy; +Cc: will, mark.rutland, linux-arm-kernel, linux-kernel, ilkka

Hi Robin,

On Mon, 23 Jan 2023, Robin Murphy wrote:
> It turns out the optimisation implemented by commit 4f2c3872dde5 is
> totally broken, since all the places that consume hw->dtcs_used for
> events other than cycle count are still not expecting it to be sparsely
> populated, and fail to read all the relevant DTC counters correctly if
> so.
>
> If implemented correctly, the optimisation potentially saves up to 3
> register reads per event update, which is reasonably significant for
> events targeting a single node, but still not worth a massive amount of
> additional code complexity overall. Getting it right within the current
> design looks a fair bit more involved than it was ever intended to be,
> so let's just make a functional revert which restores the old behaviour
> while still backporting easily.
>
> Fixes: 4f2c3872dde5 ("perf/arm-cmn: Optimise DTC counter accesses")
> Reported-by: Ilkka Koskinen <ilkka@os.amperecomputing.com>
> Signed-off-by: Robin Murphy <robin.murphy@arm.com>

Thanks for the patch. It looks good to me and seems to work fine.

Cheers, Ilkka

> ---
> drivers/perf/arm-cmn.c | 7 ++++++-
> 1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/perf/arm-cmn.c b/drivers/perf/arm-cmn.c
> index b80a9b74662b..1deb61b22bc7 100644
> --- a/drivers/perf/arm-cmn.c
> +++ b/drivers/perf/arm-cmn.c
> @@ -1576,7 +1576,6 @@ static int arm_cmn_event_init(struct perf_event *event)
> 			hw->dn++;
> 			continue;
> 		}
> -		hw->dtcs_used |= arm_cmn_node_to_xp(cmn, dn)->dtc;
> 		hw->num_dns++;
> 		if (bynodeid)
> 			break;
> @@ -1589,6 +1588,12 @@ static int arm_cmn_event_init(struct perf_event *event)
> 			nodeid, nid.x, nid.y, nid.port, nid.dev, type);
> 		return -EINVAL;
> 	}
> +	/*
> +	 * Keep assuming non-cycles events count in all DTC domains; turns out
> +	 * it's hard to make a worthwhile optimisation around this, short of
> +	 * going all-in with domain-local counter allocation as well.
> +	 */
> +	hw->dtcs_used = (1U << cmn->num_dtcs) - 1;
>
> 	return arm_cmn_validate_group(cmn, event);
> }
> -- 
> 2.36.1.dirty
>
>

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] Partially revert "perf/arm-cmn: Optimise DTC counter accesses"
  2023-01-23 18:30 ` Robin Murphy
@ 2023-01-26 15:42   ` Will Deacon
  -1 siblings, 0 replies; 6+ messages in thread
From: Will Deacon @ 2023-01-26 15:42 UTC (permalink / raw)
  To: Robin Murphy
  Cc: catalin.marinas, kernel-team, Will Deacon, linux-kernel, ilkka,
	linux-arm-kernel, mark.rutland

On Mon, 23 Jan 2023 18:30:38 +0000, Robin Murphy wrote:
> It turns out the optimisation implemented by commit 4f2c3872dde5 is
> totally broken, since all the places that consume hw->dtcs_used for
> events other than cycle count are still not expecting it to be sparsely
> populated, and fail to read all the relevant DTC counters correctly if
> so.
> 
> If implemented correctly, the optimisation potentially saves up to 3
> register reads per event update, which is reasonably significant for
> events targeting a single node, but still not worth a massive amount of
> additional code complexity overall. Getting it right within the current
> design looks a fair bit more involved than it was ever intended to be,
> so let's just make a functional revert which restores the old behaviour
> while still backporting easily.
> 
> [...]

Applied to arm64 (for-next/fixes), thanks!

[1/1] Partially revert "perf/arm-cmn: Optimise DTC counter accesses"
      https://git.kernel.org/arm64/c/a428eb4b99ab

Cheers,
-- 
Will

https://fixes.arm64.dev
https://next.arm64.dev
https://will.arm64.dev

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] Partially revert "perf/arm-cmn: Optimise DTC counter accesses"
@ 2023-01-26 15:42   ` Will Deacon
  0 siblings, 0 replies; 6+ messages in thread
From: Will Deacon @ 2023-01-26 15:42 UTC (permalink / raw)
  To: Robin Murphy
  Cc: catalin.marinas, kernel-team, Will Deacon, linux-kernel, ilkka,
	linux-arm-kernel, mark.rutland

On Mon, 23 Jan 2023 18:30:38 +0000, Robin Murphy wrote:
> It turns out the optimisation implemented by commit 4f2c3872dde5 is
> totally broken, since all the places that consume hw->dtcs_used for
> events other than cycle count are still not expecting it to be sparsely
> populated, and fail to read all the relevant DTC counters correctly if
> so.
> 
> If implemented correctly, the optimisation potentially saves up to 3
> register reads per event update, which is reasonably significant for
> events targeting a single node, but still not worth a massive amount of
> additional code complexity overall. Getting it right within the current
> design looks a fair bit more involved than it was ever intended to be,
> so let's just make a functional revert which restores the old behaviour
> while still backporting easily.
> 
> [...]

Applied to arm64 (for-next/fixes), thanks!

[1/1] Partially revert "perf/arm-cmn: Optimise DTC counter accesses"
      https://git.kernel.org/arm64/c/a428eb4b99ab

Cheers,
-- 
Will

https://fixes.arm64.dev
https://next.arm64.dev
https://will.arm64.dev

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2023-01-26 15:43 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-01-23 18:30 [PATCH] Partially revert "perf/arm-cmn: Optimise DTC counter accesses" Robin Murphy
2023-01-23 18:30 ` Robin Murphy
2023-01-23 23:11 ` Ilkka Koskinen
2023-01-23 23:11   ` Ilkka Koskinen
2023-01-26 15:42 ` Will Deacon
2023-01-26 15:42   ` Will Deacon

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.