linux-riscv.lists.infradead.org archive mirror
 help / color / mirror / Atom feed
From: Alexandre Ghiti <alex@ghiti.fr>
To: Evgenii Shatokhin <e.shatokhin@yadro.com>,
	Andy Chiu <andy.chiu@sifive.com>
Cc: palmer@dabbelt.com, paul.walmsley@sifive.com,
	aou@eecs.berkeley.edu, rostedt@goodmis.org, mingo@redhat.com,
	peterz@infradead.org, jpoimboe@kernel.org, jbaron@akamai.com,
	ardb@kernel.org, greentime.hu@sifive.com, zong.li@sifive.com,
	guoren@kernel.org, Jessica Clarke <jrtc27@jrtc27.com>,
	kernel@esmil.dk, linux-riscv@lists.infradead.org,
	linux@yadro.com
Subject: Re: [PATCH RFC v2 riscv/for-next 0/5] Enable ftrace with kernel preemption for RISC-V
Date: Wed, 6 Mar 2024 21:57:20 +0100	[thread overview]
Message-ID: <bd7c2263-4e81-4be6-af4d-22a0842ba2e4@ghiti.fr> (raw)
In-Reply-To: <51f21b87-ebed-4411-afbc-c00d3dea2bab@yadro.com>

Hi Evgenii,

On 21/02/2024 17:55, Evgenii Shatokhin wrote:
> On 21.02.2024 08:27, Andy Chiu wrote:
>> «Внимание! Данное письмо от внешнего адресата!»
>>
>> On Wed, Feb 14, 2024 at 3:42 AM Evgenii Shatokhin 
>> <e.shatokhin@yadro.com> wrote:
>>>
>>> Hi,
>>>
>>> On 13.09.2022 12:42, Andy Chiu wrote:
>>>> This patch removes dependency of dynamic ftrace from calling
>>>> stop_machine(), and makes it compatiable with kernel preemption.
>>>> Originally, we ran into stack corruptions, or execution of partially
>>>> updated instructions when starting or stopping ftrace on a fully
>>>> preemptible kernel configuration. The reason is that kernel 
>>>> periodically
>>>> calls rcu_momentary_dyntick_idle() on cores waiting for the 
>>>> code-patching
>>>> core running in ftrace. Though rcu_momentary_dyntick_idle() itself is
>>>> marked as notrace, it would call a bunch of tracable functions if we
>>>> configured the kernel as preemptible. For example, these are some 
>>>> functions
>>>> that happened to have a symbol and have not been marked as notrace 
>>>> on a
>>>> RISC-V preemptible kernel compiled with GCC-11:
>>>>    - __rcu_report_exp_rnp()
>>>>    - rcu_report_exp_cpu_mult()
>>>>    - rcu_preempt_deferred_qs()
>>>>    - rcu_preempt_need_deferred_qs()
>>>>    - rcu_preempt_deferred_qs_irqrestore()
>>>>
>>>> Thus, this make it not ideal for us to rely on stop_machine() and
>>>> handly marked "notrace"s to perform runtime code patching. To remove
>>>> such dependency, we must make updates of code seemed atomic on running
>>>> cores. This might not be obvious for RISC-V since it usaually uses 
>>>> a pair
>>>> of AUIPC + JALR to perform a long jump, which cannot be modified and
>>>> executed concurrently if we consider preemptions. As such, this patch
>>>> proposed a way to make it possible. It embeds a 32-bit rel-address 
>>>> data
>>>> into instructions of each ftrace prologue and jumps indirectly. In 
>>>> this
>>>> way, we could store and load the address atomically so that the code
>>>> patching core could run simutaneously with the rest of running cores.
>>>>
>>>> After applying the patchset, we compiled a preemptible kernel with all
>>>> tracers and ftrace-selftest enabled, and booted it on a 2-core QEMU 
>>>> virt
>>>> machine. The kernel could boot up successfully, passing all ftrace
>>>> testsuits. Besides, we ran a script that randomly pick a tracer on 
>>>> every
>>>> 0~5 seconds. The kernel has sustained over 20K rounds of the test. In
>>>> contrast, a preemptible kernel without our patch would panic in few
>>>> rounds on the same machine.
>>>>
>>>> Though we ran into errors when using hwlat or irqsoff tracers together
>>>> with cpu-online stressor from stress-ng on a preemptible kernel. We
>>>> believe the reason may be that  percpu workers of the tracers are 
>>>> being
>>>> queued into unbounded workqueue when cpu get offlined and patches 
>>>> will go
>>>> through tracing tree.
>>>>
>>>> Additionally, we found patching of tracepoints unsafe since the
>>>> instructions being patched are not naturally aligned. This may 
>>>> result in
>>>> 2 half-word stores, which breaks atomicity, during the code patching.
>>>>
>>>> changes in patch v2:
>>>>    - Enforce alignments on all functions with a compiler workaround.
>>>>    - Support 64bit addressing for ftrace targets if xlen == 64
>>>>    - Initialize ftrace target addresses to avoid calling bad 
>>>> address in a
>>>>      hypothesized case.
>>>>    - Use LGPTR instead of SZPTR since .align is log-scaled for
>>>>      mcount-dyn.S
>>>>    - Require the nop instruction of all jump_labels aligns 
>>>> naturally on
>>>>      4B.
>>>>
>>>> Andy Chiu (5):
>>>>     riscv: align ftrace to 4 Byte boundary and increase ftrace 
>>>> prologue
>>>>       size
>>>>     riscv: export patch_insn_write
>>>>     riscv: ftrace: use indirect jump to work with kernel preemption
>>>>     riscv: ftrace: do not use stop_machine to update code
>>>>     riscv: align arch_static_branch function
>>>>
>>>>    arch/riscv/Makefile                 |   2 +-
>>>>    arch/riscv/include/asm/ftrace.h     |  24 ----
>>>>    arch/riscv/include/asm/jump_label.h |   2 +
>>>>    arch/riscv/include/asm/patch.h      |   1 +
>>>>    arch/riscv/kernel/ftrace.c          | 179 
>>>> ++++++++++++++++++++--------
>>>>    arch/riscv/kernel/mcount-dyn.S      |  69 ++++++++---
>>>>    arch/riscv/kernel/patch.c           |   4 +-
>>>>    7 files changed, 188 insertions(+), 93 deletions(-)
>>>>
>>>
>>> First of all, thank you for working on making dynamic Ftrace robust in
>>> preemptible kernels on RISC-V.
>>> It is an important use case but, for now, dynamic Ftrace and related
>>> tracers cannot be safely used with such kernels.
>>>
>>> Are there any updates on this series?
>>> It needs a rebase, of course, but it looks doable.
>>>
>>> If I understand the discussion correctly, the only blocker was that
>>> using "-falign-functions" was not enough to properly align cold
>>> functions and "-fno-guess-branch-probability" would likely have a
>>> performance cost.
>>>
>>> It seems, GCC developers have recently provided a workaround for that
>>> (https://gcc.gnu.org/git/gitweb.cgi?p=gcc.git;h=0f5a9a00e3ab1fe96142f304cfbcf3f63b15f326, 
>>>
>>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88345#c24).
>>>
>>> "-fmin-function-alignment" should help but, I do not know, which GCC
>>> versions have got that patch already. In the meantime, one could
>>> probably check if "-fmin-function-alignment" is supported by the
>>> compiler and use it, if it is.
>>>
>>> Thoughts?
>>
>> Hi Evgenii,
>>
>> Thanks for the update. Indeed, it is essential to this patch for
>> toolchain to provide forced alignment. We can test this flag in the
>> Makefile to sort out if toolchain supports it or not. Meanwhile, I had
>> figured out a way for this to work on any 2-B align addresses but
>> hadn't implemented it out yet. Basically it would require more
>> patching space for us to do software alignment. I would opt for a
>> special toolchain flag if the toolchain just supports it.
>>
>> Let me take some time to look and get back to you soon.
>
> Thank you! Looking forward to it.
>
> In case it helps, here is what I have checked so far.
>
> 1.
> I added the patch 
> https://gcc.gnu.org/git/?p=gcc.git;a=patch;h=0f5a9a00e3ab1fe96142f304cfbcf3f63b15f326 
> to the current revision of GCC 13.2.0 from RISC-V toolchain.
>
> Rebased your patchset on top of Linux 6.8-rc4 (mostly - context 
> changes, SYM_FUNC_START/SYM_FUNC_END for asm symbols, etc.).
>
> Reverted 8547649981e6 ("riscv: ftrace: Fixup panic by disabling 
> preemption").
>
> Switched from -falign-functions=4 to -fmin-function-alignment=4:
> ------------------
> diff --git a/arch/riscv/Makefile b/arch/riscv/Makefile
> index b33b787c8b07..dcd0adeebaae 100644
> --- a/arch/riscv/Makefile
> +++ b/arch/riscv/Makefile
> @@ -15,9 +15,9 @@ ifeq ($(CONFIG_DYNAMIC_FTRACE),y)
>      LDFLAGS_vmlinux += --no-relax
>      KBUILD_CPPFLAGS += -DCC_USING_PATCHABLE_FUNCTION_ENTRY
>  ifeq ($(CONFIG_RISCV_ISA_C),y)
> -    CC_FLAGS_FTRACE := -fpatchable-function-entry=12 -falign-functions=4
> +    CC_FLAGS_FTRACE := -fpatchable-function-entry=12 
> -fmin-function-alignment=4
>  else
> -    CC_FLAGS_FTRACE := -fpatchable-function-entry=6 -falign-functions=4
> +    CC_FLAGS_FTRACE := -fpatchable-function-entry=6 
> -fmin-function-alignment=4
>  endif
>  endif
>
> ------------------
>
> As far as I can see from objdump, the functions that were not aligned 
> at 4-byte boundary with -falign-functions=4, are now aligned correctly 
> with -fmin-function-alignment=4.
>
> 2.
> I tried the kernel in a QEMU VM with 2 CPUs and "-machine virt".
>
> The boottime tests for Ftrace had passed, except the tests for 
> function_graph. I described the failure and the possible fix here:
> https://lore.kernel.org/all/dcc5976d-635a-4710-92df-94a99653314e@yadro.com/ 
>
>
> 3.
> There were also boottime warnings about "RCU not on for: 
> arch_cpu_idle+0x0/0x2c". These are probably not related to your 
> patchset, but rather to the fact that Ftrace is enabled in a 
> preemptble kernel where RCU does different things.
>
> As a workaround, I disabled tracing of arch_cpu_idle() for now:
> ------------------
> diff --git a/arch/riscv/kernel/process.c b/arch/riscv/kernel/process.c
> index 92922dbd5b5c..6abeecbfc51d 100644
> --- a/arch/riscv/kernel/process.c
> +++ b/arch/riscv/kernel/process.c
> @@ -37,7 +37,7 @@ EXPORT_SYMBOL(__stack_chk_guard);
>
>  extern asmlinkage void ret_from_fork(void);
>
> -void arch_cpu_idle(void)
> +void noinstr arch_cpu_idle(void)
>  {
>      cpu_do_idle();
>  }


I came up with the same fix for this, based on a similar fix for s390. I 
have a patch ready and will send it soon since to me, it is a fix, not a 
workaround.

Thanks,

Alex


>
> ------------------
>
> 4.
> Stress-testing revealed an issue though, which I do not understand yet.
>
> Probably similar to what you did earlier, I ran a script that switched 
> the current tracer to "function", "function_graph", "nop", "blk" each 
> 1-5 seconds. In another shell, "stress-ng --hrtimers 1" was running.
>
> The kernel usually crashed within a few minutes, in seemingly random 
> locations, but often in one of two ways:
>
> (a) Invalid instruction, because the address of ftrace_caller function 
> was somehow written to the body of the traced function rather than 
> just to the Ftrace prologue.
>
> In the following example, the crash happened at 0xffffffff800d3398. 
> "b0 d7" is actually not part of the code here, but rather the lower 
> bytes of 0xffffffff8000d7b0, the address of ftrace_caller() in this 
> kernel.
>
> (gdb) disas /r 0xffffffff800d3382,+0x20
> Dump of assembler code from 0xffffffff800d3382 to 0xffffffff800d33a2:
> ...
>    0xffffffff800d3394 <clockevents_program_event+144>:  ba 87   mv a5,a4
>    0xffffffff800d3396 <clockevents_program_event+146>:  c1 bf   j 
> 0xffffffff800d3366 <clockevents_program_event+98>
>    0xffffffff800d3398 <clockevents_program_event+148>:  b0 d7   sw 
> a2,104(a5) // 0xffffffff8000d7b0, the address of ftrace_caller().
>    0xffffffff800d339a <clockevents_program_event+150>:  00 80   .2byte 
> 0x8000
>    0xffffffff800d339c <clockevents_program_event+152>:  ff ff   .2byte 
> 0xffff
>    0xffffffff800d339e <clockevents_program_event+154>:  ff ff   .2byte 
> 0xffff
>    0xffffffff800d33a0 <clockevents_program_event+156>:  d5 bf   j 
> 0xffffffff800d3394 <clockevents_program_event+144
>
> The backtrace usually contains one or more occurrences of 
> return_to_handler() in this case.
>
> [  260.520394] [<ffffffff800d3398>] clockevents_program_event+0xac/0x100
> [  260.521195] [<ffffffff8000d2bc>] return_to_handler+0x0/0x26
> [  260.521843] [<ffffffff800c50ba>] hrtimer_interrupt+0x122/0x20c
> [  260.522492] [<ffffffff8000d2bc>] return_to_handler+0x0/0x26
> [  260.523132] [<ffffffff8009785e>] handle_percpu_devid_irq+0x9e/0x1ec
> [  260.523788] [<ffffffff8000d2bc>] return_to_handler+0x0/0x26
> [  260.524437] [<ffffffff8000d2bc>] return_to_handler+0x0/0x26
> [  260.525080] [<ffffffff80a8acfa>] handle_riscv_irq+0x4a/0x74
> [  260.525726] [<ffffffff80a97b9a>] call_on_irq_stack+0x32/0x40
> ----------------------
>
> (b) Jump to an invalid location, e.g. to the middle of a valid 4-byte 
> instruction. %ra usually points right after the last instruction, 
> "jalr   a2", in return_to_handler() in such cases, so the jump was 
> likely made from there.
>
> The problem is reproducible, although I have not found what causes it 
> yet.
>
> Any help is appreciated, of course.
>
>>
>>>
>>> Regards,
>>> Evgenii
>>
>> Regards,
>> Andy
>
>
> _______________________________________________
> linux-riscv mailing list
> linux-riscv@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-riscv

_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

  reply	other threads:[~2024-03-06 20:57 UTC|newest]

Thread overview: 43+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-09-13  9:42 [PATCH RFC v2 riscv/for-next 0/5] Enable ftrace with kernel preemption for RISC-V Andy Chiu
2022-09-13  9:42 ` [PATCH RFC v2 riscv/for-next 1/5] riscv: align ftrace to 4 Byte boundary and increase ftrace prologue size Andy Chiu
2022-09-15 13:53   ` Guo Ren
2022-09-17  1:15     ` Andy Chiu
2022-09-13  9:42 ` [PATCH RFC v2 riscv/for-next 2/5] riscv: export patch_insn_write Andy Chiu
2022-09-13  9:42 ` [PATCH RFC v2 riscv/for-next 3/5] riscv: ftrace: use indirect jump to work with kernel preemption Andy Chiu
2022-09-14 13:45   ` Guo Ren
2022-09-15 13:30     ` Guo Ren
2022-09-17  1:04     ` Andy Chiu
2022-09-17 10:56       ` Guo Ren
2024-02-20 14:17   ` Evgenii Shatokhin
2022-09-13  9:42 ` [PATCH RFC v2 riscv/for-next 4/5] riscv: ftrace: do not use stop_machine to update code Andy Chiu
2022-09-13  9:42 ` [PATCH RFC v2 riscv/for-next 5/5] riscv: align arch_static_branch function Andy Chiu
2022-09-14 14:06   ` Guo Ren
2022-09-16 23:54     ` Andy Chiu
2022-09-17  0:22       ` Guo Ren
2022-09-17 18:17         ` [PATCH] riscv: jump_label: Optimize size with RISCV_ISA_C guoren
2022-09-17 18:38         ` [PATCH RFC v2 riscv/for-next 5/5] riscv: align arch_static_branch function guoren
2022-09-17 23:49           ` Guo Ren
2022-09-17 23:59           ` Guo Ren
2022-09-18  0:12           ` Jessica Clarke
2022-09-18  0:46             ` Guo Ren
2022-09-14 14:24   ` Jessica Clarke
2022-09-15  1:47     ` Guo Ren
2022-09-15  2:34       ` Jessica Clarke
2024-02-13 19:42 ` [PATCH RFC v2 riscv/for-next 0/5] Enable ftrace with kernel preemption for RISC-V Evgenii Shatokhin
2024-02-21  5:27   ` Andy Chiu
2024-02-21 16:55     ` Evgenii Shatokhin
2024-03-06 20:57       ` Alexandre Ghiti [this message]
2024-03-07  8:35         ` Evgenii Shatokhin
2024-03-07 12:27         ` Andy Chiu
2024-03-07 13:21           ` Alexandre Ghiti
2024-03-07 15:57             ` Samuel Holland
2024-03-11 14:24               ` Andy Chiu
2024-03-19 14:50                 ` Alexandre Ghiti
2024-03-19 14:58                   ` Conor Dooley
2024-03-20 16:37                   ` Andy Chiu
2024-03-18 15:31       ` Andy Chiu
2024-03-19 15:32         ` Evgenii Shatokhin
2024-03-20 16:38           ` Andy Chiu
2024-03-19 17:37         ` Alexandre Ghiti
2024-03-20 16:36           ` Andy Chiu
2024-03-21 11:02             ` Alexandre Ghiti

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=bd7c2263-4e81-4be6-af4d-22a0842ba2e4@ghiti.fr \
    --to=alex@ghiti.fr \
    --cc=andy.chiu@sifive.com \
    --cc=aou@eecs.berkeley.edu \
    --cc=ardb@kernel.org \
    --cc=e.shatokhin@yadro.com \
    --cc=greentime.hu@sifive.com \
    --cc=guoren@kernel.org \
    --cc=jbaron@akamai.com \
    --cc=jpoimboe@kernel.org \
    --cc=jrtc27@jrtc27.com \
    --cc=kernel@esmil.dk \
    --cc=linux-riscv@lists.infradead.org \
    --cc=linux@yadro.com \
    --cc=mingo@redhat.com \
    --cc=palmer@dabbelt.com \
    --cc=paul.walmsley@sifive.com \
    --cc=peterz@infradead.org \
    --cc=rostedt@goodmis.org \
    --cc=zong.li@sifive.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).