From: Robin Murphy <robin.murphy@arm.com>
To: "Song Bao Hua (Barry Song)" <song.bao.hua@hisilicon.com>,
	"liuqi (BA)" <liuqi115@huawei.com>,
	Linuxarm <linuxarm@huawei.com>,
	"catalin.marinas@arm.com" <catalin.marinas@arm.com>,
	"will@kernel.org" <will@kernel.org>,
	"linux-arm-kernel@lists.infradead.org" 
	<linux-arm-kernel@lists.infradead.org>,
	Jean-Philippe Brucker <jean-philippe@linaro.org>
Cc: "Zengtao (B)" <prime.zeng@hisilicon.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: [RFC PATCH] arm64: kprobes: Enable OPTPROBE for arm64
Date: Wed, 30 Jun 2021 11:22:00 +0100
Message-ID: <527265b8-35c3-eeec-5751-cc2920184d4e@arm.com>
In-Reply-To: <d19df1a099704089ad671e1d3625655d@hisilicon.com>

On 2021-06-30 08:05, Song Bao Hua (Barry Song) wrote:
>>
>> On 2021/6/4 18:50, Qi Liu wrote:
>>> This patch introduces optprobe for ARM64. In optprobe, the probed
>>> instruction is replaced by a branch instruction to a detour
>>> buffer. The detour buffer contains trampoline code and a call to
>>> optimized_callback(). optimized_callback() calls opt_pre_handler()
>>> to execute the kprobe handler.
>>>
>>> Limitations:
>>> - We only support the !CONFIG_RANDOMIZE_MODULE_REGION_FULL case, to
>>> guarantee that the offset between the probe point and the kprobe
>>> pre_handler is no larger than 128MiB.
>>>
>>> Performance of optprobe on the Hip08 platform was tested using the
>>> kprobe example module [1] to analyze the latency of a kernel function,
>>> and here is the result:
> 
> + Jean-Philippe Brucker as well.
> 
> I assume both Jean and Robin expressed interest in having
> an optprobe solution on ARM64 in a previous discussion,
> when I tried to add some tracepoints for debugging:
> "[PATCH] iommu/arm-smmu-v3: add tracepoints for cmdq_issue_cmdlist"
> 
> https://lore.kernel.org/linux-arm-kernel/20200828083325.GC3825485@myrica/
> https://lore.kernel.org/linux-arm-kernel/9acf1acf-19fb-26db-e908-eb4d4c666bae@arm.com/

FWIW mine was a more general comment that if the possibility exists, 
making kprobes faster seems more productive than adding tracepoints to 
every bit of code where performance might be of interest to work around 
kprobes being slow. I don't know enough about the details to 
meaningfully review an implementation, sorry.

>>>
>>> [1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/samples/kprobes/kretprobe_example.c
>>>
>>> kprobe before optimization:
>>> [280709.846380] do_empty returned 0 and took 1530 ns to execute
>>> [280709.852057] do_empty returned 0 and took 550 ns to execute
>>> [280709.857631] do_empty returned 0 and took 440 ns to execute
>>> [280709.863215] do_empty returned 0 and took 380 ns to execute
>>> [280709.868787] do_empty returned 0 and took 360 ns to execute
>>> [280709.874362] do_empty returned 0 and took 340 ns to execute
>>> [280709.879936] do_empty returned 0 and took 320 ns to execute
>>> [280709.885505] do_empty returned 0 and took 300 ns to execute
>>> [280709.891075] do_empty returned 0 and took 280 ns to execute
>>> [280709.896646] do_empty returned 0 and took 290 ns to execute
>>> [280709.902220] do_empty returned 0 and took 290 ns to execute
>>> [280709.907807] do_empty returned 0 and took 290 ns to execute
> 
> I used to see the same phenomenon when I used kprobes to debug
> the arm64 SMMU driver. When a kprobe was executed for the first
> time, it was extremely slow. The second time it was much faster,
> though still slow enough to negatively affect performance-related
> debugging.
> Not sure if it was due to a hot cache or something else. I didn't
> dig into it.

From the shape of the data, my hunch would be that retraining of branch 
predictors is probably a factor (but again I don't know enough about the 
existing kprobes implementation to back that up).
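
To make "the shape of the data" concrete, here is a minimal sketch that summarizes the quoted "kprobe before optimized" log. The latencies are copied from that log; the helper names (abs_diff, settle_index) and the 10 ns tolerance are assumptions made for illustration. It only describes the warm-up curve and says nothing about its cause.

```c
#include <stddef.h>

/* Latencies in ns, copied from the quoted "kprobe before optimized" log. */
static const unsigned int latency_ns[] = {
	1530, 550, 440, 380, 360, 340, 320, 300, 280, 290, 290, 290,
};
#define NR_SAMPLES (sizeof(latency_ns) / sizeof(latency_ns[0]))

/* Absolute difference of two unsigned values without wrapping. */
static unsigned int abs_diff(unsigned int a, unsigned int b)
{
	return a > b ? a - b : b - a;
}

/*
 * Hypothetical helper: return the index of the first sample from which
 * every later sample stays within tol_ns of the final sample.
 */
static size_t settle_index(const unsigned int *s, size_t n, unsigned int tol_ns)
{
	size_t i = n;

	if (n == 0)
		return 0;
	/* Walk backwards while samples are still close to the last one. */
	while (i > 0 && abs_diff(s[i - 1], s[n - 1]) <= tol_ns)
		i--;
	return i;
}
```

With a 10 ns tolerance, settle_index(latency_ns, NR_SAMPLES, 10) returns 7, so the latency only levels off around the eighth hit, and the first hit (1530 ns) is roughly five times the final one (290 ns).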

Robin.
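
For reference, the 128MiB figure in the quoted limitation matches the reach of a single AArch64 B instruction, which encodes a signed 26-bit word offset (about +/-128MiB in bytes). Below is a minimal sketch of such a reachability check. The helper name branch_in_range is hypothetical, and the check the patch actually performs is not shown in this message.

```c
#include <stdbool.h>
#include <stdint.h>

/* AArch64 B/BL: signed 26-bit immediate scaled by 4, i.e. +/-128MiB. */
#define B_RANGE_BYTES (INT64_C(128) * 1024 * 1024)

/*
 * Hypothetical helper: can a single B placed at 'from' reach 'to'?
 * Valid byte offsets are [-128MiB, 128MiB - 4], in steps of 4.
 */
static bool branch_in_range(uint64_t from, uint64_t to)
{
	int64_t offset = (int64_t)(to - from);

	/* Instruction addresses must be 4-byte aligned. */
	if ((from | to) & 0x3)
		return false;

	return offset >= -B_RANGE_BYTES && offset < B_RANGE_BYTES;
}
```

A probe point and a detour buffer that fail this test cannot be linked by one direct branch, which is why the patch description restricts where the two may be placed relative to each other.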
