All of lore.kernel.org
 help / color / mirror / Atom feed
From: Andy Chiu <andy.chiu@sifive.com>
To: Alexandre Ghiti <alex@ghiti.fr>
Cc: Evgenii Shatokhin <e.shatokhin@yadro.com>,
	palmer@dabbelt.com, paul.walmsley@sifive.com,
	 aou@eecs.berkeley.edu, rostedt@goodmis.org, mingo@redhat.com,
	 peterz@infradead.org, jpoimboe@kernel.org, jbaron@akamai.com,
	ardb@kernel.org,  greentime.hu@sifive.com, zong.li@sifive.com,
	guoren@kernel.org,  Jessica Clarke <jrtc27@jrtc27.com>,
	kernel@esmil.dk, linux-riscv@lists.infradead.org,
	 linux@yadro.com
Subject: Re: [PATCH RFC v2 riscv/for-next 0/5] Enable ftrace with kernel preemption for RISC-V
Date: Thu, 21 Mar 2024 00:36:00 +0800	[thread overview]
Message-ID: <CABgGipUzBXMC83LNGKfXem4L50FioMd7wG_5xfq0k=XefmBVgw@mail.gmail.com> (raw)
In-Reply-To: <c670594f-2264-4114-bdfa-50dc0e011372@ghiti.fr>

On Wed, Mar 20, 2024 at 1:37 AM Alexandre Ghiti <alex@ghiti.fr> wrote:
>
> Hi Andy,
>
> On 18/03/2024 16:31, Andy Chiu wrote:
> > Hi Evgenii,
> >
> > Thanks for your help!
> >
> > I just rebased upon 6.8-rc1 and passed the stress-ng + ftrace/nop
> > testing. I will add some random tracers to test and some optimization
> > before sending out again. Here are a few things needed:
> >
> > On Thu, Feb 22, 2024 at 12:55 AM Evgenii Shatokhin
> > <e.shatokhin@yadro.com> wrote:
> >> On 21.02.2024 08:27, Andy Chiu wrote:
> >>> «Внимание! Данное письмо от внешнего адресата!»
> >>>
> >>> On Wed, Feb 14, 2024 at 3:42 AM Evgenii Shatokhin <e.shatokhin@yadro.com> wrote:
> >>>> Hi,
> >>>>
> >>>> On 13.09.2022 12:42, Andy Chiu wrote:
> >>>>> This patch removes dependency of dynamic ftrace from calling
> >>>>> stop_machine(), and makes it compatiable with kernel preemption.
> >>>>> Originally, we ran into stack corruptions, or execution of partially
> >>>>> updated instructions when starting or stopping ftrace on a fully
> >>>>> preemptible kernel configuration. The reason is that kernel periodically
> >>>>> calls rcu_momentary_dyntick_idle() on cores waiting for the code-patching
> >>>>> core running in ftrace. Though rcu_momentary_dyntick_idle() itself is
> >>>>> marked as notrace, it would call a bunch of tracable functions if we
> >>>>> configured the kernel as preemptible. For example, these are some functions
> >>>>> that happened to have a symbol and have not been marked as notrace on a
> >>>>> RISC-V preemptible kernel compiled with GCC-11:
> >>>>>     - __rcu_report_exp_rnp()
> >>>>>     - rcu_report_exp_cpu_mult()
> >>>>>     - rcu_preempt_deferred_qs()
> >>>>>     - rcu_preempt_need_deferred_qs()
> >>>>>     - rcu_preempt_deferred_qs_irqrestore()
> >>>>>
> >>>>> Thus, this make it not ideal for us to rely on stop_machine() and
> >>>>> handly marked "notrace"s to perform runtime code patching. To remove
> >>>>> such dependency, we must make updates of code seemed atomic on running
> >>>>> cores. This might not be obvious for RISC-V since it usaually uses a pair
> >>>>> of AUIPC + JALR to perform a long jump, which cannot be modified and
> >>>>> executed concurrently if we consider preemptions. As such, this patch
> >>>>> proposed a way to make it possible. It embeds a 32-bit rel-address data
> >>>>> into instructions of each ftrace prologue and jumps indirectly. In this
> >>>>> way, we could store and load the address atomically so that the code
> >>>>> patching core could run simutaneously with the rest of running cores.
> >>>>>
> >>>>> After applying the patchset, we compiled a preemptible kernel with all
> >>>>> tracers and ftrace-selftest enabled, and booted it on a 2-core QEMU virt
> >>>>> machine. The kernel could boot up successfully, passing all ftrace
> >>>>> testsuits. Besides, we ran a script that randomly pick a tracer on every
> >>>>> 0~5 seconds. The kernel has sustained over 20K rounds of the test. In
> >>>>> contrast, a preemptible kernel without our patch would panic in few
> >>>>> rounds on the same machine.
> >>>>>
> >>>>> Though we ran into errors when using hwlat or irqsoff tracers together
> >>>>> with cpu-online stressor from stress-ng on a preemptible kernel. We
> >>>>> believe the reason may be that  percpu workers of the tracers are being
> >>>>> queued into unbounded workqueue when cpu get offlined and patches will go
> >>>>> through tracing tree.
> >>>>>
> >>>>> Additionally, we found patching of tracepoints unsafe since the
> >>>>> instructions being patched are not naturally aligned. This may result in
> >>>>> 2 half-word stores, which breaks atomicity, during the code patching.
> >>>>>
> >>>>> changes in patch v2:
> >>>>>     - Enforce alignments on all functions with a compiler workaround.
> >>>>>     - Support 64bit addressing for ftrace targets if xlen == 64
> >>>>>     - Initialize ftrace target addresses to avoid calling bad address in a
> >>>>>       hypothesized case.
> >>>>>     - Use LGPTR instead of SZPTR since .align is log-scaled for
> >>>>>       mcount-dyn.S
> >>>>>     - Require the nop instruction of all jump_labels aligns naturally on
> >>>>>       4B.
> >>>>>
> >>>>> Andy Chiu (5):
> >>>>>      riscv: align ftrace to 4 Byte boundary and increase ftrace prologue
> >>>>>        size
> >>>>>      riscv: export patch_insn_write
> >>>>>      riscv: ftrace: use indirect jump to work with kernel preemption
> >>>>>      riscv: ftrace: do not use stop_machine to update code
> >>>>>      riscv: align arch_static_branch function
> >>>>>
> >>>>>     arch/riscv/Makefile                 |   2 +-
> >>>>>     arch/riscv/include/asm/ftrace.h     |  24 ----
> >>>>>     arch/riscv/include/asm/jump_label.h |   2 +
> >>>>>     arch/riscv/include/asm/patch.h      |   1 +
> >>>>>     arch/riscv/kernel/ftrace.c          | 179 ++++++++++++++++++++--------
> >>>>>     arch/riscv/kernel/mcount-dyn.S      |  69 ++++++++---
> >>>>>     arch/riscv/kernel/patch.c           |   4 +-
> >>>>>     7 files changed, 188 insertions(+), 93 deletions(-)
> >>>>>
> >>>> First of all, thank you for working on making dynamic Ftrace robust in
> >>>> preemptible kernels on RISC-V.
> >>>> It is an important use case but, for now, dynamic Ftrace and related
> >>>> tracers cannot be safely used with such kernels.
> >>>>
> >>>> Are there any updates on this series?
> >>>> It needs a rebase, of course, but it looks doable.
> >>>>
> >>>> If I understand the discussion correctly, the only blocker was that
> >>>> using "-falign-functions" was not enough to properly align cold
> >>>> functions and "-fno-guess-branch-probability" would likely have a
> >>>> performance cost.
> >>>>
> >>>> It seems, GCC developers have recently provided a workaround for that
> >>>> (https://gcc.gnu.org/git/gitweb.cgi?p=gcc.git;h=0f5a9a00e3ab1fe96142f304cfbcf3f63b15f326,
> >>>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88345#c24).
> >>>>
> >>>> "-fmin-function-alignment" should help but, I do not know, which GCC
> >>>> versions have got that patch already. In the meantime, one could
> >>>> probably check if "-fmin-function-alignment" is supported by the
> >>>> compiler and use it, if it is.
> >>>>
> >>>> Thoughts?
> >>> Hi Evgenii,
> >>>
> >>> Thanks for the update. Indeed, it is essential to this patch for
> >>> toolchain to provide forced alignment. We can test this flag in the
> >>> Makefile to sort out if toolchain supports it or not. Meanwhile, I had
> >>> figured out a way for this to work on any 2-B align addresses but
> >>> hadn't implemented it out yet. Basically it would require more
> >>> patching space for us to do software alignment. I would opt for a
> >>> special toolchain flag if the toolchain just supports it.
> >>>
> >>> Let me take some time to look and get back to you soon.
> >> Thank you! Looking forward to it.
> >>
> >> In case it helps, here is what I have checked so far.
> >>
> >> 1.
> >> I added the patch
> >> https://gcc.gnu.org/git/?p=gcc.git;a=patch;h=0f5a9a00e3ab1fe96142f304cfbcf3f63b15f326
> >> to the current revision of GCC 13.2.0 from RISC-V toolchain.
> >>
> >> Rebased your patchset on top of Linux 6.8-rc4 (mostly - context changes,
> >> SYM_FUNC_START/SYM_FUNC_END for asm symbols, etc.).
> >>
> >> Reverted 8547649981e6 ("riscv: ftrace: Fixup panic by disabling
> >> preemption").
> >>
> >> Switched from -falign-functions=4 to -fmin-function-alignment=4:
> >> ------------------
> >> diff --git a/arch/riscv/Makefile b/arch/riscv/Makefile
> >> index b33b787c8b07..dcd0adeebaae 100644
> >> --- a/arch/riscv/Makefile
> >> +++ b/arch/riscv/Makefile
> >> @@ -15,9 +15,9 @@ ifeq ($(CONFIG_DYNAMIC_FTRACE),y)
> >>          LDFLAGS_vmlinux += --no-relax
> >>          KBUILD_CPPFLAGS += -DCC_USING_PATCHABLE_FUNCTION_ENTRY
> >>    ifeq ($(CONFIG_RISCV_ISA_C),y)
> >> -       CC_FLAGS_FTRACE := -fpatchable-function-entry=12 -falign-functions=4
> >> +       CC_FLAGS_FTRACE := -fpatchable-function-entry=12
> >> -fmin-function-alignment=4
> >>    else
> >> -       CC_FLAGS_FTRACE := -fpatchable-function-entry=6 -falign-functions=4
> >> +       CC_FLAGS_FTRACE := -fpatchable-function-entry=6 -fmin-function-alignment=4
> >>    endif
> >>    endif
> >>
> >> ------------------
> >>
> >> As far as I can see from objdump, the functions that were not aligned at
> >> 4-byte boundary with -falign-functions=4, are now aligned correctly with
> >> -fmin-function-alignment=4.
> >>
> >> 2.
> >> I tried the kernel in a QEMU VM with 2 CPUs and "-machine virt".
> >>
> >> The boottime tests for Ftrace had passed, except the tests for
> >> function_graph. I described the failure and the possible fix here:
> >> https://lore.kernel.org/all/dcc5976d-635a-4710-92df-94a99653314e@yadro.com/
> > Indeed, this is needed. I am not sure why I got ftrace boot-time tests
> > passed back then. Thank you for solving it!
> >
> >> 3.
> >> There were also boottime warnings about "RCU not on for:
> >> arch_cpu_idle+0x0/0x2c". These are probably not related to your
> >> patchset, but rather to the fact that Ftrace is enabled in a preemptble
> >> kernel where RCU does different things.
> >>
> >> As a workaround, I disabled tracing of arch_cpu_idle() for now:
> >> ------------------
> >> diff --git a/arch/riscv/kernel/process.c b/arch/riscv/kernel/process.c
> >> index 92922dbd5b5c..6abeecbfc51d 100644
> >> --- a/arch/riscv/kernel/process.c
> >> +++ b/arch/riscv/kernel/process.c
> >> @@ -37,7 +37,7 @@ EXPORT_SYMBOL(__stack_chk_guard);
> >>
> >>    extern asmlinkage void ret_from_fork(void);
> >>
> >> -void arch_cpu_idle(void)
> >> +void noinstr arch_cpu_idle(void)
> >>    {
> >>          cpu_do_idle();
> >>    }
> >>
> >> ------------------
> >>
> >> 4.
> >> Stress-testing revealed an issue though, which I do not understand yet.
> >>
> >> Probably similar to what you did earlier, I ran a script that switched
> >> the current tracer to "function", "function_graph", "nop", "blk" each
> >> 1-5 seconds. In another shell, "stress-ng --hrtimers 1" was running.
> >>
> >> The kernel usually crashed within a few minutes, in seemingly random
> >> locations, but often in one of two ways:
> >>
> >> (a) Invalid instruction, because the address of ftrace_caller function
> >> was somehow written to the body of the traced function rather than just
> >> to the Ftrace prologue.
> > The reason for this is probably that any one of your ftrace_*_call is
> > not 8-B aligned.
> >
> >> In the following example, the crash happened at 0xffffffff800d3398. "b0
> >> d7" is actually not part of the code here, but rather the lower bytes of
> >> 0xffffffff8000d7b0, the address of ftrace_caller() in this kernel.
> > It seems like there is a bug in patch_insn_write(). I think we should
> > at least disable migration during patch_map() and patch_unmap(). I'd
> > need some time to dig into patch_map(). But since __set_fixmap() only
> > flush local tlb, I'd assume it is not safe to context switch out and
> > migrate while holding the fix-map mapping. Adding preempt_disable()
> > and preempt_enable() before calling __patch_insn_write() solves the
> > issue.
>
>
> Yes, Andrea already mentioned this, I came up with the same idea of
> preempt_disable() but then I noticed arm64 actually disables IRQ: any
> idea why?
> https://lore.kernel.org/linux-riscv/CAHVXubj7ChgpvN4F_QO0oASaT5WC2VS0Q-bEqhnmF8z8QV=yDQ@mail.gmail.com/

Hi, I took a quick look and it seems that it is a design choice in
software to me. ARM uses a spinlock to protect text and we use a
mutex. If they have a requirement to do patching while irq is off
(maybe in an ipi handler), then the only viable option would be to use
raw_spin_lock_irqsave. I think preempt_disable should be enough for us
if we use text_mutex to protect patching. Or, am I missing something?




>
>
> >> (gdb) disas /r 0xffffffff800d3382,+0x20
> >> Dump of assembler code from 0xffffffff800d3382 to 0xffffffff800d33a2:
> >> ...
> >>      0xffffffff800d3394 <clockevents_program_event+144>:  ba 87   mv
> >> a5,a4
> >>      0xffffffff800d3396 <clockevents_program_event+146>:  c1 bf   j
> >> 0xffffffff800d3366 <clockevents_program_event+98>
> >>      0xffffffff800d3398 <clockevents_program_event+148>:  b0 d7   sw
> >> a2,104(a5) // 0xffffffff8000d7b0, the address of ftrace_caller().
> >>      0xffffffff800d339a <clockevents_program_event+150>:  00 80   .2byte
> >> 0x8000
> >>      0xffffffff800d339c <clockevents_program_event+152>:  ff ff   .2byte
> >> 0xffff
> >>      0xffffffff800d339e <clockevents_program_event+154>:  ff ff   .2byte
> >> 0xffff
> >>      0xffffffff800d33a0 <clockevents_program_event+156>:  d5 bf   j
> >> 0xffffffff800d3394 <clockevents_program_event+144
> >>
> >> The backtrace usually contains one or more occurrences of
> >> return_to_handler() in this case.
> >>
> >> [  260.520394] [<ffffffff800d3398>] clockevents_program_event+0xac/0x100
> >> [  260.521195] [<ffffffff8000d2bc>] return_to_handler+0x0/0x26
> >> [  260.521843] [<ffffffff800c50ba>] hrtimer_interrupt+0x122/0x20c
> >> [  260.522492] [<ffffffff8000d2bc>] return_to_handler+0x0/0x26
> >> [  260.523132] [<ffffffff8009785e>] handle_percpu_devid_irq+0x9e/0x1ec
> >> [  260.523788] [<ffffffff8000d2bc>] return_to_handler+0x0/0x26
> >> [  260.524437] [<ffffffff8000d2bc>] return_to_handler+0x0/0x26
> >> [  260.525080] [<ffffffff80a8acfa>] handle_riscv_irq+0x4a/0x74
> >> [  260.525726] [<ffffffff80a97b9a>] call_on_irq_stack+0x32/0x40
> >> ----------------------
> >>
> >> (b) Jump to an invalid location, e.g. to the middle of a valid 4-byte
> >> instruction. %ra usually points right after the last instruction, "jalr
> >>     a2", in return_to_handler() in such cases, so the jump was likely
> >> made from there.
> > I haven't done fgraph tests yet. I will try out and see.
> >
> >> The problem is reproducible, although I have not found what causes it yet.
> >>
> >> Any help is appreciated, of course.
> >>
> >>>> Regards,
> >>>> Evgenii
> >>> Regards,
> >>> Andy
> > Also, here is another side note,
> >
> > It seems like the ftrace save/restore routine should save more
> > registers as clang's fastcc may use t2 when the number of arguments
> > exceeds what ABI defines for passing arg through registers.
> >
> > Cheers,
> > Andy
> >
> > _______________________________________________
> > linux-riscv mailing list
> > linux-riscv@lists.infradead.org
> > http://lists.infradead.org/mailman/listinfo/linux-riscv

_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

  reply	other threads:[~2024-03-20 16:37 UTC|newest]

Thread overview: 43+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-09-13  9:42 [PATCH RFC v2 riscv/for-next 0/5] Enable ftrace with kernel preemption for RISC-V Andy Chiu
2022-09-13  9:42 ` [PATCH RFC v2 riscv/for-next 1/5] riscv: align ftrace to 4 Byte boundary and increase ftrace prologue size Andy Chiu
2022-09-15 13:53   ` Guo Ren
2022-09-17  1:15     ` Andy Chiu
2022-09-13  9:42 ` [PATCH RFC v2 riscv/for-next 2/5] riscv: export patch_insn_write Andy Chiu
2022-09-13  9:42 ` [PATCH RFC v2 riscv/for-next 3/5] riscv: ftrace: use indirect jump to work with kernel preemption Andy Chiu
2022-09-14 13:45   ` Guo Ren
2022-09-15 13:30     ` Guo Ren
2022-09-17  1:04     ` Andy Chiu
2022-09-17 10:56       ` Guo Ren
2024-02-20 14:17   ` Evgenii Shatokhin
2022-09-13  9:42 ` [PATCH RFC v2 riscv/for-next 4/5] riscv: ftrace: do not use stop_machine to update code Andy Chiu
2022-09-13  9:42 ` [PATCH RFC v2 riscv/for-next 5/5] riscv: align arch_static_branch function Andy Chiu
2022-09-14 14:06   ` Guo Ren
2022-09-16 23:54     ` Andy Chiu
2022-09-17  0:22       ` Guo Ren
2022-09-17 18:17         ` [PATCH] riscv: jump_label: Optimize size with RISCV_ISA_C guoren
2022-09-17 18:38         ` [PATCH RFC v2 riscv/for-next 5/5] riscv: align arch_static_branch function guoren
2022-09-17 23:49           ` Guo Ren
2022-09-17 23:59           ` Guo Ren
2022-09-18  0:12           ` Jessica Clarke
2022-09-18  0:46             ` Guo Ren
2022-09-14 14:24   ` Jessica Clarke
2022-09-15  1:47     ` Guo Ren
2022-09-15  2:34       ` Jessica Clarke
2024-02-13 19:42 ` [PATCH RFC v2 riscv/for-next 0/5] Enable ftrace with kernel preemption for RISC-V Evgenii Shatokhin
2024-02-21  5:27   ` Andy Chiu
2024-02-21 16:55     ` Evgenii Shatokhin
2024-03-06 20:57       ` Alexandre Ghiti
2024-03-07  8:35         ` Evgenii Shatokhin
2024-03-07 12:27         ` Andy Chiu
2024-03-07 13:21           ` Alexandre Ghiti
2024-03-07 15:57             ` Samuel Holland
2024-03-11 14:24               ` Andy Chiu
2024-03-19 14:50                 ` Alexandre Ghiti
2024-03-19 14:58                   ` Conor Dooley
2024-03-20 16:37                   ` Andy Chiu
2024-03-18 15:31       ` Andy Chiu
2024-03-19 15:32         ` Evgenii Shatokhin
2024-03-20 16:38           ` Andy Chiu
2024-03-19 17:37         ` Alexandre Ghiti
2024-03-20 16:36           ` Andy Chiu [this message]
2024-03-21 11:02             ` Alexandre Ghiti

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CABgGipUzBXMC83LNGKfXem4L50FioMd7wG_5xfq0k=XefmBVgw@mail.gmail.com' \
    --to=andy.chiu@sifive.com \
    --cc=alex@ghiti.fr \
    --cc=aou@eecs.berkeley.edu \
    --cc=ardb@kernel.org \
    --cc=e.shatokhin@yadro.com \
    --cc=greentime.hu@sifive.com \
    --cc=guoren@kernel.org \
    --cc=jbaron@akamai.com \
    --cc=jpoimboe@kernel.org \
    --cc=jrtc27@jrtc27.com \
    --cc=kernel@esmil.dk \
    --cc=linux-riscv@lists.infradead.org \
    --cc=linux@yadro.com \
    --cc=mingo@redhat.com \
    --cc=palmer@dabbelt.com \
    --cc=paul.walmsley@sifive.com \
    --cc=peterz@infradead.org \
    --cc=rostedt@goodmis.org \
    --cc=zong.li@sifive.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.