All of lore.kernel.org
 help / color / mirror / Atom feed
From: John Garry <john.garry@huawei.com>
To: Ian Rogers <irogers@google.com>
Cc: Will Deacon <will@kernel.org>, James Clark <james.clark@arm.com>,
	"Mike Leach" <mike.leach@linaro.org>,
	Leo Yan <leo.yan@linaro.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@redhat.com>,
	"Arnaldo Carvalho de Melo" <acme@kernel.org>,
	Mark Rutland <mark.rutland@arm.com>,
	"Alexander Shishkin" <alexander.shishkin@linux.intel.com>,
	Jiri Olsa <jolsa@kernel.org>, Namhyung Kim <namhyung@kernel.org>,
	Andi Kleen <ak@linux.intel.com>,
	"Zhengjun Xing" <zhengjun.xing@linux.intel.com>,
	Ravi Bangoria <ravi.bangoria@amd.com>,
	Kan Liang <kan.liang@linux.intel.com>,
	Adrian Hunter <adrian.hunter@intel.com>,
	<linux-kernel@vger.kernel.org>,
	<linux-arm-kernel@lists.infradead.org>,
	<linux-perf-users@vger.kernel.org>,
	Stephane Eranian <eranian@google.com>
Subject: Re: [PATCH v3 00/17] Compress the pmu_event tables
Date: Tue, 2 Aug 2022 10:08:47 +0100	[thread overview]
Message-ID: <3d0c1ec0-42ec-8c51-743b-5d93cabb53fb@huawei.com> (raw)
In-Reply-To: <CAP-5=fV0CMpBGkNOBMRjbdZfdw1mQjrDhLVG38jLtpsdWXtQ_Q@mail.gmail.com>

On 29/07/2022 18:27, Ian Rogers wrote:
>> This implementation would require core pmu.c to be changed, but there is
>> ways that this could be done without needing to change core pmu.c
>>
>> Thanks,
>> John
> Thanks John!
> 
> You are right about broadwell, it is an extreme case of sharing. IIRC
> BDX is the server core/uncore events, BDW is the consumer core/uncore
> and BDW-DE is consumer core with server uncore - so the sharing is
> inherent in this. Metrics become interesting as they may mix core and
> uncore, but I'll ignore that here.
> 
> In the old code every event needs 15 char*s, with 64-bit that is 15*8
> bytes per entry with 741 core and 23 uncore entries for BDW, and 372
> core and 1284 uncore for BDX. I expect the strings themselves will be
> shared by the C compiler, and so I just focus on the pointer sizes.
> With the new code every event is just 1 32-bit int. So for BDW we go
> from 15*8*(741+23)=91,680 to 4*(741+23)=3056, BDX is
> 15*8*(372+1284)=198720 to 4*(372+1284)=6624. This means we've gone
> from 290,400bytes to 9,680bytes for BDW and BDX. BDW-DE goes from
> 243,000bytes to 8,100bytes -


> we can ignore the costs of the strings as
> they should be fully shared, especially for BDW-DE.

Are you sure about this? I did not think that the compiler would have 
the intelligence to try to share strings. That is the basis of the size 
optimisation which I was proposing (that the compiler would not share 
strings).

> 
> If we added some kind of table sharing, so BDW-DE was core from BDW
> and uncore from BDX and the tables shared, then in the old code you
> could save nearly 0.25MB but with the new code the saving is only
> around 8KB. I think we can go after that 8KB but it is less urgent
> after this change which gets 96% of the benefit. We have 72
> architectures for jevents at the moment and so I'd ball park (assuming
> they all saved as much as BDW-DE) the max saving as about 0.5MB, which
> is 12% of what is saved here.
> 
> Longer term I'd like to make the pmu-events.c logic look closer to the
> sysfs API. Perhaps we can unify the uncore events for BDX and BDW-DE
> with some notion of a common PMU, or PMUs with common events and
> tables, and automate deduction of this. It also isn't clear to me the
> advantage of storing the metrics inside the events, separate tables
> feel cleaner. Anyway, there's lots of follow up.

Thanks,
John


WARNING: multiple messages have this Message-ID (diff)
From: John Garry <john.garry@huawei.com>
To: Ian Rogers <irogers@google.com>
Cc: Will Deacon <will@kernel.org>, James Clark <james.clark@arm.com>,
	"Mike Leach" <mike.leach@linaro.org>,
	Leo Yan <leo.yan@linaro.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@redhat.com>,
	"Arnaldo Carvalho de Melo" <acme@kernel.org>,
	Mark Rutland <mark.rutland@arm.com>,
	"Alexander Shishkin" <alexander.shishkin@linux.intel.com>,
	Jiri Olsa <jolsa@kernel.org>, Namhyung Kim <namhyung@kernel.org>,
	Andi Kleen <ak@linux.intel.com>,
	"Zhengjun Xing" <zhengjun.xing@linux.intel.com>,
	Ravi Bangoria <ravi.bangoria@amd.com>,
	Kan Liang <kan.liang@linux.intel.com>,
	Adrian Hunter <adrian.hunter@intel.com>,
	<linux-kernel@vger.kernel.org>,
	<linux-arm-kernel@lists.infradead.org>,
	<linux-perf-users@vger.kernel.org>,
	Stephane Eranian <eranian@google.com>
Subject: Re: [PATCH v3 00/17] Compress the pmu_event tables
Date: Tue, 2 Aug 2022 10:08:47 +0100	[thread overview]
Message-ID: <3d0c1ec0-42ec-8c51-743b-5d93cabb53fb@huawei.com> (raw)
In-Reply-To: <CAP-5=fV0CMpBGkNOBMRjbdZfdw1mQjrDhLVG38jLtpsdWXtQ_Q@mail.gmail.com>

On 29/07/2022 18:27, Ian Rogers wrote:
>> This implementation would require core pmu.c to be changed, but there is
>> ways that this could be done without needing to change core pmu.c
>>
>> Thanks,
>> John
> Thanks John!
> 
> You are right about broadwell, it is an extreme case of sharing. IIRC
> BDX is the server core/uncore events, BDW is the consumer core/uncore
> and BDW-DE is consumer core with server uncore - so the sharing is
> inherent in this. Metrics become interesting as they may mix core and
> uncore, but I'll ignore that here.
> 
> In the old code every event needs 15 char*s, with 64-bit that is 15*8
> bytes per entry with 741 core and 23 uncore entries for BDW, and 372
> core and 1284 uncore for BDX. I expect the strings themselves will be
> shared by the C compiler, and so I just focus on the pointer sizes.
> With the new code every event is just 1 32-bit int. So for BDW we go
> from 15*8*(741+23)=91,680 to 4*(741+23)=3056, BDX is
> 15*8*(372+1284)=198720 to 4*(372+1284)=6624. This means we've gone
> from 290,400bytes to 9,680bytes for BDW and BDX. BDW-DE goes from
> 243,000bytes to 8,100bytes -


> we can ignore the costs of the strings as
> they should be fully shared, especially for BDW-DE.

Are you sure about this? I did not think that the compiler would have 
the intelligence to try to share strings. That is the basis of the size 
optimisation which I was proposing (that the compiler would not share 
strings).

> 
> If we added some kind of table sharing, so BDW-DE was core from BDW
> and uncore from BDX and the tables shared, then in the old code you
> could save nearly 0.25MB but with the new code the saving is only
> around 8KB. I think we can go after that 8KB but it is less urgent
> after this change which gets 96% of the benefit. We have 72
> architectures for jevents at the moment and so I'd ball park (assuming
> they all saved as much as BDW-DE) the max saving as about 0.5MB, which
> is 12% of what is saved here.
> 
> Longer term I'd like to make the pmu-events.c logic look closer to the
> sysfs API. Perhaps we can unify the uncore events for BDX and BDW-DE
> with some notion of a common PMU, or PMUs with common events and
> tables, and automate deduction of this. It also isn't clear to me the
> advantage of storing the metrics inside the events, separate tables
> feel cleaner. Anyway, there's lots of follow up.

Thanks,
John


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

  reply	other threads:[~2022-08-02  9:08 UTC|newest]

Thread overview: 48+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-07-29  7:43 [PATCH v3 00/17] Compress the pmu_event tables Ian Rogers
2022-07-29  7:43 ` Ian Rogers
2022-07-29  7:43 ` [PATCH v3 01/17] perf jevents: Clean up pytype warnings Ian Rogers
2022-07-29  7:43   ` Ian Rogers
2022-07-29  7:43 ` [PATCH v3 02/17] perf jevents: Simplify generation of C-string Ian Rogers
2022-07-29  7:43   ` Ian Rogers
2022-07-29  7:43 ` [PATCH v3 03/17] perf jevents: Add JEVENTS_ARCH make option Ian Rogers
2022-07-29  7:43   ` Ian Rogers
2022-07-29  7:43 ` [PATCH v3 04/17] perf jevent: Add an 'all' architecture argument Ian Rogers
2022-07-29  7:43   ` Ian Rogers
2022-07-29  7:43 ` [PATCH v3 05/17] perf jevents: Remove the type/version variables Ian Rogers
2022-07-29  7:43   ` Ian Rogers
2022-07-29  8:29   ` John Garry
2022-07-29  8:29     ` John Garry
2022-07-29 14:24     ` Ian Rogers
2022-07-29 14:24       ` Ian Rogers
2022-07-29  7:43 ` [PATCH v3 06/17] perf jevents: Provide path to json file on error Ian Rogers
2022-07-29  7:43   ` Ian Rogers
2022-07-29  7:43 ` [PATCH v3 07/17] perf jevents: Sort json files entries Ian Rogers
2022-07-29  7:43   ` Ian Rogers
2022-07-29  7:43 ` [PATCH v3 08/17] perf pmu-events: Hide pmu_sys_event_tables Ian Rogers
2022-07-29  7:43   ` Ian Rogers
2022-07-29  7:43 ` [PATCH v3 09/17] perf pmu-events: Avoid passing pmu_events_map Ian Rogers
2022-07-29  7:43   ` Ian Rogers
2022-07-29  7:43 ` [PATCH v3 10/17] perf pmu-events: Hide pmu_events_map Ian Rogers
2022-07-29  7:43   ` Ian Rogers
2022-07-29  7:43 ` [PATCH v3 11/17] perf test: Use full metric resolution Ian Rogers
2022-07-29  7:43   ` Ian Rogers
2022-07-29  7:43 ` [PATCH v3 12/17] perf pmu-events: Move test events/metrics to json Ian Rogers
2022-07-29  7:43   ` Ian Rogers
2022-07-29  7:43 ` [PATCH v3 13/17] perf pmu-events: Don't assume pmu_event is an array Ian Rogers
2022-07-29  7:43   ` Ian Rogers
2022-07-29  7:43 ` [PATCH v3 14/17] perf pmu-events: Hide the pmu_events Ian Rogers
2022-07-29  7:43   ` Ian Rogers
2022-07-29  7:43 ` [PATCH v3 15/17] perf metrics: Copy entire pmu_event in find metric Ian Rogers
2022-07-29  7:43   ` Ian Rogers
2022-07-29  7:43 ` [PATCH v3 16/17] perf jevents: Compress the pmu_events_table Ian Rogers
2022-07-29  7:43   ` Ian Rogers
2022-07-29  7:43 ` [PATCH v3 17/17] perf jevents: Fold strings optimization Ian Rogers
2022-07-29  7:43   ` Ian Rogers
2022-07-29 15:03 ` [PATCH v3 00/17] Compress the pmu_event tables John Garry
2022-07-29 15:03   ` John Garry
2022-07-29 17:27   ` Ian Rogers
2022-07-29 17:27     ` Ian Rogers
2022-08-02  9:08     ` John Garry [this message]
2022-08-02  9:08       ` John Garry
2022-08-05  8:11       ` John Garry
2022-08-05  8:11         ` John Garry

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=3d0c1ec0-42ec-8c51-743b-5d93cabb53fb@huawei.com \
    --to=john.garry@huawei.com \
    --cc=acme@kernel.org \
    --cc=adrian.hunter@intel.com \
    --cc=ak@linux.intel.com \
    --cc=alexander.shishkin@linux.intel.com \
    --cc=eranian@google.com \
    --cc=irogers@google.com \
    --cc=james.clark@arm.com \
    --cc=jolsa@kernel.org \
    --cc=kan.liang@linux.intel.com \
    --cc=leo.yan@linaro.org \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-perf-users@vger.kernel.org \
    --cc=mark.rutland@arm.com \
    --cc=mike.leach@linaro.org \
    --cc=mingo@redhat.com \
    --cc=namhyung@kernel.org \
    --cc=peterz@infradead.org \
    --cc=ravi.bangoria@amd.com \
    --cc=will@kernel.org \
    --cc=zhengjun.xing@linux.intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.