From: Shijith Thotton <sthotton@marvell.com>
To: Julien Thierry <Julien.Thierry@arm.com>,
Steven Price <Steven.Price@arm.com>,
"linux-arm-kernel@lists.infradead.org"
<linux-arm-kernel@lists.infradead.org>
Cc: Mark Rutland <Mark.Rutland@arm.com>,
"peterz@infradead.org" <peterz@infradead.org>,
Catalin Marinas <Catalin.Marinas@arm.com>,
Will Deacon <Will.Deacon@arm.com>,
"acme@kernel.org" <acme@kernel.org>,
"alexander.shishkin@linux.intel.com"
<alexander.shishkin@linux.intel.com>,
"mingo@redhat.com" <mingo@redhat.com>,
"namhyung@kernel.org" <namhyung@kernel.org>,
"jolsa@redhat.com" <jolsa@redhat.com>,
"liwei391@huawei.com" <liwei391@huawei.com>
Subject: Re: [PATCH v3 1/9] arm64: perf: avoid PMXEV* indirection
Date: Wed, 17 Jul 2019 04:45:47 +0000
Message-ID: <374a9f8f-6d1d-a43c-1e25-ab32fcb63b02@marvell.com>
In-Reply-To: <750864d6-543b-32a4-9b90-4a928c824a4b@arm.com>
Hi Julien,
On 7/16/19 3:54 AM, Julien Thierry wrote:
> On 16/07/2019 11:33, Shijith Thotton wrote:
>> On 7/10/19 4:01 AM, Julien Thierry wrote:
>>> On 10/07/2019 11:57, Steven Price wrote:
>>>> On 08/07/2019 15:32, Julien Thierry wrote:
>>>>> From: Mark Rutland <mark.rutland@arm.com>
>>>>>
>>>>> Currently we access the counter registers and their respective type
>>>>> registers indirectly. This requires us to write to PMSELR, issue an ISB,
>>>>> then access the relevant PMXEV* registers.
>>>>>
>>>>> This is unfortunate, because:
>>>>>
>>>>> * Under virtualization, accessing one register requires two traps to
>>>>> the hypervisor, even though we could access the register directly with
>>>>> a single trap.
>>>>>
>>>>> * We have to issue an ISB, a cost we could otherwise avoid.
>>>>>
>>>>> * When we use NMIs, the NMI handler will have to save/restore the select
>>>>> register in case the code it preempted was attempting to access a
>>>>> counter or its type register.
>>>>>
>>>>> We can avoid these issues by directly accessing the relevant registers.
>>>>> This patch adds helpers to do so.
>>>>>
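To make the first two points above concrete, here is a minimal userspace sketch of the two access paths. It is only a model under stated assumptions: `struct model_pmu` and every `model_*` / `read_counter_*` name are hypothetical and not part of the patch, and the `traps` field simply counts modelled system-register accesses, each of which would be one trap to the hypervisor when virtualized.

```c
#define MODEL_NUM_COUNTERS 4

/* Hypothetical model of the PMU state; not kernel code. */
struct model_pmu {
	unsigned int sel;                        /* models the PMSELR select field */
	unsigned long cntr[MODEL_NUM_COUNTERS];  /* models the per-counter registers */
	unsigned int traps;                      /* modelled register accesses */
};

/* Each helper below models exactly one system-register access. */
static void model_write_pmselr(struct model_pmu *pmu, unsigned int n)
{
	pmu->sel = n;
	pmu->traps++;
}

static unsigned long model_read_pmxevcntr(struct model_pmu *pmu)
{
	pmu->traps++;
	return pmu->cntr[pmu->sel];
}

static unsigned long model_read_pmevcntrn(struct model_pmu *pmu, unsigned int n)
{
	pmu->traps++;
	return pmu->cntr[n];
}

/* Indirect path: write the select register, then access the PMXEV* alias. */
static unsigned long read_counter_indirect(struct model_pmu *pmu, unsigned int n)
{
	model_write_pmselr(pmu, n);
	/* Real code has to issue an ISB here before the second access. */
	return model_read_pmxevcntr(pmu);
}

/* Direct path: a single access to the numbered register, no ISB needed. */
static unsigned long read_counter_direct(struct model_pmu *pmu, unsigned int n)
{
	return model_read_pmevcntrn(pmu, n);
}
```

In this model, reading a counter through `read_counter_indirect()` bumps `traps` by two, while `read_counter_direct()` bumps it by one and returns the same value.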
>>>>> Signed-off-by: Mark Rutland <mark.rutland@arm.com>
>>>>> [Julien T.: Don't inline read/write functions to avoid big code-size
>>>>> increase, remove unused read_pmevtypern function,
>>>>> fix counter index issue.]
>>>>> Signed-off-by: Julien Thierry <julien.thierry@arm.com>
>>>>> Cc: Will Deacon <will.deacon@arm.com>
>>>>> Cc: Peter Zijlstra <peterz@infradead.org>
>>>>> Cc: Ingo Molnar <mingo@redhat.com>
>>>>> Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
>>>>> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
>>>>> Cc: Jiri Olsa <jolsa@redhat.com>
>>>>> Cc: Namhyung Kim <namhyung@kernel.org>
>>>>> Cc: Catalin Marinas <catalin.marinas@arm.com>
>>>>> ---
>>>>> arch/arm64/kernel/perf_event.c | 96 ++++++++++++++++++++++++++++++++++++------
>>>>> 1 file changed, 83 insertions(+), 13 deletions(-)
>>>>>
>>>>> diff --git a/arch/arm64/kernel/perf_event.c b/arch/arm64/kernel/perf_event.c
>>>>> index 96e90e2..7759f8a 100644
>>>>> --- a/arch/arm64/kernel/perf_event.c
>>>>> +++ b/arch/arm64/kernel/perf_event.c
>>>>> @@ -369,6 +369,77 @@ static inline bool armv8pmu_event_is_chained(struct perf_event *event)
>>>>> #define ARMV8_IDX_TO_COUNTER(x) \
>>>>> (((x) - ARMV8_IDX_COUNTER0) & ARMV8_PMU_COUNTER_MASK)
>>>>>
>>>>> +/*
>>>>> + * This code is really good
>>>>> + */
>>>>> +
>>>>> +#define PMEVN_CASE(n, case_macro) \
>>>>> + case n: case_macro(n); break
>>>>> +
>>>>> +#define PMEVN_SWITCH(x, case_macro) \
>>>>> + do { \
>>>>> + switch (x) { \
>>>>> + PMEVN_CASE(0, case_macro); \
>>>>> + PMEVN_CASE(1, case_macro); \
>>>>> + PMEVN_CASE(2, case_macro); \
>>>>> + PMEVN_CASE(3, case_macro); \
>>>>> + PMEVN_CASE(4, case_macro); \
>>>>> + PMEVN_CASE(5, case_macro); \
>>>>> + PMEVN_CASE(6, case_macro); \
>>>>> + PMEVN_CASE(7, case_macro); \
>>>>> + PMEVN_CASE(8, case_macro); \
>>>>> + PMEVN_CASE(9, case_macro); \
>>>>> + PMEVN_CASE(10, case_macro); \
>>>>> + PMEVN_CASE(11, case_macro); \
>>>>> + PMEVN_CASE(12, case_macro); \
>>>>> + PMEVN_CASE(13, case_macro); \
>>>>> + PMEVN_CASE(14, case_macro); \
>>>>> + PMEVN_CASE(15, case_macro); \
>>>>> + PMEVN_CASE(16, case_macro); \
>>>>> + PMEVN_CASE(17, case_macro); \
>>>>> + PMEVN_CASE(18, case_macro); \
>>>>> + PMEVN_CASE(19, case_macro); \
>>>>
>>>> Is 20 missing on purpose?
>>>>
>>>
>>> That would have been fun to debug. Well spotted!
>>>
>>> I'll fix it in the next version.
>>>
>>> Thanks,
>>>
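For anyone wondering why a missing case would be hard to debug: with this macro pattern an unhandled index just falls out of the switch, so a write is silently dropped and a read returns 0. Below is a minimal userspace sketch of that failure mode and of a round-trip check that would flag it. Everything here is an assumption for illustration: the `MODEL_*` macros and `model_*` functions are hypothetical, a plain array stands in for the numbered registers, and case 2 is left out on purpose to play the role of the missing 20.

```c
#define NUM_MODEL_COUNTERS 4

/* Plain array standing in for the numbered counter registers. */
static unsigned long model_regs[NUM_MODEL_COUNTERS];

#define MODEL_CASE(n, case_macro) \
	case n: case_macro(n); break

/* Case 2 is deliberately omitted to model the gap spotted in review. */
#define MODEL_SWITCH(x, case_macro)		\
	do {					\
		switch (x) {			\
		MODEL_CASE(0, case_macro);	\
		MODEL_CASE(1, case_macro);	\
		MODEL_CASE(3, case_macro);	\
		default: break;			\
		}				\
	} while (0)

#define MODEL_WRITE(n) model_regs[n] = val
#define MODEL_READ(n) return model_regs[n]

/* A write to an index with no case is silently dropped. */
static void model_write(int n, unsigned long val)
{
	MODEL_SWITCH(n, MODEL_WRITE);
}

/* A read from an index with no case falls through and returns 0. */
static unsigned long model_read(int n)
{
	MODEL_SWITCH(n, MODEL_READ);
	return 0;
}

/* Returns the first index whose write does not read back, or -1 if all do. */
static int first_broken_index(void)
{
	for (int n = 0; n < NUM_MODEL_COUNTERS; n++) {
		unsigned long probe = 0x1000UL + (unsigned long)n;

		model_write(n, probe);
		if (model_read(n) != probe)
			return n;
	}
	return -1;
}
```

With case 2 missing, `first_broken_index()` reports index 2; once the case is added back it would return -1.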
>>
>> Tried perf top/record with this patch and both are working fine.
>> Output of perf record on "iperf -c 127.0.0.1 -t 30" is below. (single core)
>>
>> With Pseudo-NMI:
>> 20.35% [k] lock_acquire
>> 16.95% [k] lock_release
>> 11.02% [k] __arch_copy_from_user
>> 7.78% [k] lock_is_held_type
>> 2.12% [k] ipt_do_table
>> 1.34% [k] kmem_cache_free
>> 1.25% [k] _raw_spin_unlock_irqrestore
>> 1.21% [k] __nf_conntrack_find_get
>> 1.06% [k] get_page_from_freelist
>> 1.04% [k] ktime_get
>> 1.03% [k] kfree
>> 1.00% [k] nf_conntrack_tcp_packet
>> 0.96% [k] tcp_sendmsg_locked
>> 0.87% [k] __softirqentry_text_start
>> 0.87% [k] process_backlog
>> 0.76% [k] __local_bh_enable_ip
>> 0.75% [k] ip_finish_output2
>> 0.68% [k] __tcp_transmit_skb
>> 0.62% [k] enqueue_to_backlog
>> 0.60% [k] __lock_acquire.isra.17
>> 0.58% [k] __free_pages_ok
>> 0.54% [k] nf_conntrack_in
>>
>> With IRQ:
>> 16.49% [k] __arch_copy_from_user
>> 12.38% [k] _raw_spin_unlock_irqrestore
>> 9.41% [k] lock_acquire
>> 6.92% [k] lock_release
>> 3.71% [k] lock_is_held_type
>> 2.80% [k] ipt_do_table
>> 2.06% [k] get_page_from_freelist
>> 1.82% [k] ktime_get
>> 1.73% [k] process_backlog
>> 1.27% [k] nf_conntrack_tcp_packet
>> 1.21% [k] enqueue_to_backlog
>> 1.17% [k] __tcp_transmit_skb
>> 1.12% [k] ip_finish_output2
>> 1.11% [k] tcp_sendmsg_locked
>> 1.06% [k] __free_pages_ok
>> 1.05% [k] tcp_ack
>> 0.99% [k] __netif_receive_skb_core
>> 0.88% [k] __nf_conntrack_find_get
>> 0.71% [k] nf_conntrack_in
>> 0.61% [k] kmem_cache_free
>> 0.59% [k] kfree
>> 0.57% [k] __alloc_pages_nodemask
>>
>> Thanks Julien and Wei,
>> Tested-by: Shijith Thotton <sthotton@marvell.com>
>>
>
> Thanks for testing this and confirming the improvement.
>
> I'm gonna post a new version soon. Is it alright if I apply this tag for
> the other arm64 patches that enable the use of Pseudo-NMI for the PMU?
> (I'm mostly thinking of patches 8 and 9 since there haven't been
> comments on them and they won't have behavioural changes in the next version).
>
Yes please.
Thanks,
Shijith
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel