linux-kernel.vger.kernel.org archive mirror
* [PATCH 0/3] kprobes: notrace enhancements
@ 2022-11-22 19:53 Nadav Amit
  2022-11-22 19:53 ` [PATCH 1/3] kprobes: Mark descendants of core_kernel_text as notrace Nadav Amit
                   ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Nadav Amit @ 2022-11-22 19:53 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: linux-arm-kernel, linux-kernel, linux-ia64, linux-um, linux-arch,
	linux-mm, Andy Lutomirski, Ingo Molnar, Borislav Petkov,
	Dave Hansen, x86, Richard Weinberger, Anton Ivanov,
	Johannes Berg, Arnd Bergmann, Andrew Morton, Nadav Amit

From: Nadav Amit <namit@vmware.com>

There are inconsistencies and some issues in how functions are marked as
notrace. On the one hand, all inline functions are implicitly marked as
"notrace", and some libraries cannot be traced at all. On the other hand,
some functions that should not be traced are not marked as notrace.

These patches address issues that I encountered while working on an
automatic tracing tool.
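
As a quick way to see whether a given function ended up traceable, one can
check the ftrace filter list (a rough illustration, assuming tracefs is
mounted at /sys/kernel/tracing; this is not part of the patches themselves):

        # A function only shows up here if it was compiled with ftrace
        # instrumentation, i.e. it is neither notrace nor inlined away.
        grep -w _copy_from_user /sys/kernel/tracing/available_filter_functions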

Nadav Amit (3):
  kprobes: Mark descendants of core_kernel_text as notrace
  lib/usercopy: Allow tracing of usercopy, xarray, iov_iter
  compiler: inline does not imply notrace

 arch/arm/kernel/process.c             | 2 +-
 arch/ia64/mm/init.c                   | 2 +-
 arch/x86/entry/vsyscall/vsyscall_64.c | 2 +-
 arch/x86/um/mem_32.c                  | 2 +-
 include/asm-generic/sections.h        | 6 +++---
 include/linux/compiler_types.h        | 2 +-
 include/linux/kallsyms.h              | 6 +++---
 include/linux/mm.h                    | 2 +-
 lib/Makefile                          | 3 +++
 9 files changed, 15 insertions(+), 12 deletions(-)

-- 
2.25.1



* [PATCH 1/3] kprobes: Mark descendants of core_kernel_text as notrace
  2022-11-22 19:53 [PATCH 0/3] kprobes: notrace enhancements Nadav Amit
@ 2022-11-22 19:53 ` Nadav Amit
  2022-11-22 19:53 ` [PATCH 2/3] lib/usercopy: Allow tracing of usercopy, xarray, iov_iter Nadav Amit
  2022-11-22 19:53 ` [PATCH 3/3] compiler: inline does not imply notrace Nadav Amit
  2 siblings, 0 replies; 12+ messages in thread
From: Nadav Amit @ 2022-11-22 19:53 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: linux-arm-kernel, linux-kernel, linux-ia64, linux-um, linux-arch,
	linux-mm, Andy Lutomirski, Ingo Molnar, Borislav Petkov,
	Dave Hansen, x86, Richard Weinberger, Anton Ivanov,
	Johannes Berg, Arnd Bergmann, Andrew Morton, Nadav Amit,
	Marcin Nowakowski

From: Nadav Amit <namit@vmware.com>

Commit c0d80ddab899 ("kernel/extable.c: mark core_kernel_text notrace")
disabled the tracing of core_kernel_text to avoid recursive calls. For
the same reason, all the functions in the dynamic extent of
core_kernel_text should be marked as notrace.
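
For reference, the dynamic extent in question is roughly the following (as
can be read from the hunks below; a sketch, not an exhaustive call graph):

        core_kernel_text()               /* already notrace */
            is_kernel_text()
                __is_kernel_text()
                in_gate_area_no_mm()     /* per-architecture, patched below */
            is_kernel_inittext()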

Cc: Marcin Nowakowski <marcin.nowakowski@mips.com>
Signed-off-by: Nadav Amit <namit@vmware.com>
---
 arch/arm/kernel/process.c             | 2 +-
 arch/ia64/mm/init.c                   | 2 +-
 arch/x86/entry/vsyscall/vsyscall_64.c | 2 +-
 arch/x86/um/mem_32.c                  | 2 +-
 include/asm-generic/sections.h        | 6 +++---
 include/linux/kallsyms.h              | 6 +++---
 include/linux/mm.h                    | 2 +-
 7 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/arch/arm/kernel/process.c b/arch/arm/kernel/process.c
index a2b31d91a1b6..a32ca8fcab5a 100644
--- a/arch/arm/kernel/process.c
+++ b/arch/arm/kernel/process.c
@@ -331,7 +331,7 @@ int in_gate_area(struct mm_struct *mm, unsigned long addr)
 	return (addr >= gate_vma.vm_start) && (addr < gate_vma.vm_end);
 }
 
-int in_gate_area_no_mm(unsigned long addr)
+notrace int in_gate_area_no_mm(unsigned long addr)
 {
 	return in_gate_area(NULL, addr);
 }
diff --git a/arch/ia64/mm/init.c b/arch/ia64/mm/init.c
index fc4e4217e87f..e3d63d3d3e59 100644
--- a/arch/ia64/mm/init.c
+++ b/arch/ia64/mm/init.c
@@ -284,7 +284,7 @@ struct vm_area_struct *get_gate_vma(struct mm_struct *mm)
 	return &gate_vma;
 }
 
-int in_gate_area_no_mm(unsigned long addr)
+notrace int in_gate_area_no_mm(unsigned long addr)
 {
 	if ((addr >= FIXADDR_USER_START) && (addr < FIXADDR_USER_END))
 		return 1;
diff --git a/arch/x86/entry/vsyscall/vsyscall_64.c b/arch/x86/entry/vsyscall/vsyscall_64.c
index 4af81df133ee..68ebad6abd2b 100644
--- a/arch/x86/entry/vsyscall/vsyscall_64.c
+++ b/arch/x86/entry/vsyscall/vsyscall_64.c
@@ -340,7 +340,7 @@ int in_gate_area(struct mm_struct *mm, unsigned long addr)
  * context. It is less reliable than using a task's mm and may give
  * false positives.
  */
-int in_gate_area_no_mm(unsigned long addr)
+notrace int in_gate_area_no_mm(unsigned long addr)
 {
 	return vsyscall_mode != NONE && (addr & PAGE_MASK) == VSYSCALL_ADDR;
 }
diff --git a/arch/x86/um/mem_32.c b/arch/x86/um/mem_32.c
index cafd01f730da..cfec8b00b136 100644
--- a/arch/x86/um/mem_32.c
+++ b/arch/x86/um/mem_32.c
@@ -28,7 +28,7 @@ struct vm_area_struct *get_gate_vma(struct mm_struct *mm)
 	return FIXADDR_USER_START ? &gate_vma : NULL;
 }
 
-int in_gate_area_no_mm(unsigned long addr)
+notrace int in_gate_area_no_mm(unsigned long addr)
 {
 	if (!FIXADDR_USER_START)
 		return 0;
diff --git a/include/asm-generic/sections.h b/include/asm-generic/sections.h
index db13bb620f52..d519965b67bf 100644
--- a/include/asm-generic/sections.h
+++ b/include/asm-generic/sections.h
@@ -188,7 +188,7 @@ static inline bool is_kernel_rodata(unsigned long addr)
  *
  * Returns: true if the address is located in .init.text, false otherwise.
  */
-static inline bool is_kernel_inittext(unsigned long addr)
+static notrace inline bool is_kernel_inittext(unsigned long addr)
 {
 	return addr >= (unsigned long)_sinittext &&
 	       addr < (unsigned long)_einittext;
@@ -203,7 +203,7 @@ static inline bool is_kernel_inittext(unsigned long addr)
  * Returns: true if the address is located in .text, false otherwise.
  * Note: an internal helper, only check the range of _stext to _etext.
  */
-static inline bool __is_kernel_text(unsigned long addr)
+static notrace inline bool __is_kernel_text(unsigned long addr)
 {
 	return addr >= (unsigned long)_stext &&
 	       addr < (unsigned long)_etext;
@@ -219,7 +219,7 @@ static inline bool __is_kernel_text(unsigned long addr)
  *       and range from __init_begin to __init_end, which can be outside
  *       of the _stext to _end range.
  */
-static inline bool __is_kernel(unsigned long addr)
+static notrace inline bool __is_kernel(unsigned long addr)
 {
 	return ((addr >= (unsigned long)_stext &&
 	         addr < (unsigned long)_end) ||
diff --git a/include/linux/kallsyms.h b/include/linux/kallsyms.h
index 649faac31ddb..7ee6a734b738 100644
--- a/include/linux/kallsyms.h
+++ b/include/linux/kallsyms.h
@@ -24,21 +24,21 @@
 struct cred;
 struct module;
 
-static inline int is_kernel_text(unsigned long addr)
+static notrace inline int is_kernel_text(unsigned long addr)
 {
 	if (__is_kernel_text(addr))
 		return 1;
 	return in_gate_area_no_mm(addr);
 }
 
-static inline int is_kernel(unsigned long addr)
+static notrace inline int is_kernel(unsigned long addr)
 {
 	if (__is_kernel(addr))
 		return 1;
 	return in_gate_area_no_mm(addr);
 }
 
-static inline int is_ksym_addr(unsigned long addr)
+static notrace inline int is_ksym_addr(unsigned long addr)
 {
 	if (IS_ENABLED(CONFIG_KALLSYMS_ALL))
 		return is_kernel(addr);
diff --git a/include/linux/mm.h b/include/linux/mm.h
index bfac5a166cb8..36a938c10ede 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -3186,7 +3186,7 @@ static inline struct vm_area_struct *get_gate_vma(struct mm_struct *mm)
 {
 	return NULL;
 }
-static inline int in_gate_area_no_mm(unsigned long addr) { return 0; }
+static notrace inline int in_gate_area_no_mm(unsigned long addr) { return 0; }
 static inline int in_gate_area(struct mm_struct *mm, unsigned long addr)
 {
 	return 0;
-- 
2.25.1



* [PATCH 2/3] lib/usercopy: Allow tracing of usercopy, xarray, iov_iter
  2022-11-22 19:53 [PATCH 0/3] kprobes: notrace enhancements Nadav Amit
  2022-11-22 19:53 ` [PATCH 1/3] kprobes: Mark descendants of core_kernel_text as notrace Nadav Amit
@ 2022-11-22 19:53 ` Nadav Amit
  2022-11-22 19:53 ` [PATCH 3/3] compiler: inline does not imply notrace Nadav Amit
  2 siblings, 0 replies; 12+ messages in thread
From: Nadav Amit @ 2022-11-22 19:53 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: linux-arm-kernel, linux-kernel, linux-ia64, linux-um, linux-arch,
	linux-mm, Andy Lutomirski, Ingo Molnar, Borislav Petkov,
	Dave Hansen, x86, Richard Weinberger, Anton Ivanov,
	Johannes Berg, Arnd Bergmann, Andrew Morton, Nadav Amit

From: Nadav Amit <namit@vmware.com>

There is no reason not to allow the use of ftrace for usercopy, xarray
and iov_iter. Enable tracing for these compilation units.
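
For context: lib/Makefile already strips the ftrace flags for the whole
directory when CONFIG_FUNCTION_TRACER is set, so tracing is re-enabled
simply by adding the flags back per object. Roughly (the existing rule is
paraphrased; only the per-object additions in the diff below are part of
this patch):

        # Existing rule in lib/Makefile:
        ifdef CONFIG_FUNCTION_TRACER
        ORIG_CFLAGS := $(KBUILD_CFLAGS)
        KBUILD_CFLAGS = $(subst $(CC_FLAGS_FTRACE),,$(ORIG_CFLAGS))
        endif

        # This patch opts individual objects back in, e.g.:
        CFLAGS_usercopy.o += $(CC_FLAGS_FTRACE)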

Signed-off-by: Nadav Amit <namit@vmware.com>
---
 lib/Makefile | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/lib/Makefile b/lib/Makefile
index 59bd7c2f793a..32766aa26670 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -27,6 +27,8 @@ KASAN_SANITIZE_string.o := n
 CFLAGS_string.o += -fno-stack-protector
 endif
 
+CFLAGS_xarray.o += $(CC_FLAGS_FTRACE)
+CFLAGS_iov_iter.o += $(CC_FLAGS_FTRACE)
 lib-y := ctype.o string.o vsprintf.o cmdline.o \
 	 rbtree.o radix-tree.o timerqueue.o xarray.o \
 	 maple_tree.o idr.o extable.o irq_regs.o argv_split.o \
@@ -42,6 +44,7 @@ lib-$(CONFIG_SMP) += cpumask.o
 lib-y	+= kobject.o klist.o
 obj-y	+= lockref.o
 
+CFLAGS_usercopy.o += $(CC_FLAGS_FTRACE)
 obj-y += bcd.o sort.o parser.o debug_locks.o random32.o \
 	 bust_spinlocks.o kasprintf.o bitmap.o scatterlist.o \
 	 list_sort.o uuid.o iov_iter.o clz_ctz.o \
-- 
2.25.1



* [PATCH 3/3] compiler: inline does not imply notrace
  2022-11-22 19:53 [PATCH 0/3] kprobes: notrace enhancements Nadav Amit
  2022-11-22 19:53 ` [PATCH 1/3] kprobes: Mark descendants of core_kernel_text as notrace Nadav Amit
  2022-11-22 19:53 ` [PATCH 2/3] lib/usercopy: Allow tracing of usercopy, xarray, iov_iter Nadav Amit
@ 2022-11-22 19:53 ` Nadav Amit
  2022-11-22 20:09   ` Arnd Bergmann
  2 siblings, 1 reply; 12+ messages in thread
From: Nadav Amit @ 2022-11-22 19:53 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: linux-arm-kernel, linux-kernel, linux-ia64, linux-um, linux-arch,
	linux-mm, Andy Lutomirski, Ingo Molnar, Borislav Petkov,
	Dave Hansen, x86, Richard Weinberger, Anton Ivanov,
	Johannes Berg, Arnd Bergmann, Andrew Morton, Nadav Amit

From: Nadav Amit <namit@vmware.com>

Functions that are marked as "inline" are currently also not traceable.
Apparently, this has been done to prevent differences between different
configs that caused different functions to be traceable on different
platforms.

Anyhow, this consideration is not very strong, and tying "inline" and
"notrace" does not seem very beneficial. The "inline" keyword is just a
hint, and many functions are currently not traceable due to this reason.

Disconnect "inline" from "notrace".

Signed-off-by: Nadav Amit <namit@vmware.com>
---
 include/linux/compiler_types.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/compiler_types.h b/include/linux/compiler_types.h
index eb0466236661..36a99ef03a1a 100644
--- a/include/linux/compiler_types.h
+++ b/include/linux/compiler_types.h
@@ -158,7 +158,7 @@ struct ftrace_likely_data {
  * of extern inline functions at link time.
  * A lot of inline functions can cause havoc with function tracing.
  */
-#define inline inline __gnu_inline __inline_maybe_unused notrace
+#define inline inline __gnu_inline __inline_maybe_unused
 
 /*
  * gcc provides both __inline__ and __inline as alternate spellings of
-- 
2.25.1



* Re: [PATCH 3/3] compiler: inline does not imply notrace
  2022-11-22 19:53 ` [PATCH 3/3] compiler: inline does not imply notrace Nadav Amit
@ 2022-11-22 20:09   ` Arnd Bergmann
  2022-11-22 20:28     ` Steven Rostedt
  2022-11-22 20:51     ` Nadav Amit
  0 siblings, 2 replies; 12+ messages in thread
From: Arnd Bergmann @ 2022-11-22 20:09 UTC (permalink / raw)
  To: Nadav Amit, Thomas Gleixner
  Cc: linux-arm-kernel, linux-kernel, linux-ia64, linux-um, Linux-Arch,
	linux-mm, Andy Lutomirski, Ingo Molnar, Borislav Petkov,
	Dave Hansen, x86, Richard Weinberger, Anton Ivanov,
	Johannes Berg, Andrew Morton, Nadav Amit, Steven Rostedt

On Tue, Nov 22, 2022, at 20:53, Nadav Amit wrote:
> From: Nadav Amit <namit@vmware.com>
>
> > Functions that are marked as "inline" are currently also not traceable.
> Apparently, this has been done to prevent differences between different
> > configs that caused different functions to be traceable on different
> platforms.
>
> Anyhow, this consideration is not very strong, and tying "inline" and
> "notrace" does not seem very beneficial. The "inline" keyword is just a
> > hint, and many functions are currently not traceable due to this reason.

The original reason was listed in 93b3cca1ccd3 ("ftrace: Make all
inline tags also include notrace"), which describes

    Commit 5963e317b1e9d2a ("ftrace/x86: Do not change stacks in DEBUG when
    calling lockdep") prevented lockdep calls from the int3 breakpoint handler
    from reseting the stack if a function that was called was in the process
    of being converted for tracing and had a breakpoint on it. The idea is,
    before calling the lockdep code, do a load_idt() to the special IDT that
    kept the breakpoint stack from reseting. This worked well as a quick fix
    for this kernel release, until a certain config caused a lockup in the
    function tracer start up tests.
    
    Investigating it, I found that the load_idt that was used to prevent
    the int3 from changing stacks was itself being traced!

and this sounds like a much stronger reason than what you describe,
and I would expect your change to cause regressions in similar places.

It's possible that the right answer is that the affected functions
should be marked as __always_inline. 

      Arnd


* Re: [PATCH 3/3] compiler: inline does not imply notrace
  2022-11-22 20:09   ` Arnd Bergmann
@ 2022-11-22 20:28     ` Steven Rostedt
  2022-11-22 20:51     ` Nadav Amit
  1 sibling, 0 replies; 12+ messages in thread
From: Steven Rostedt @ 2022-11-22 20:28 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Nadav Amit, Thomas Gleixner, linux-arm-kernel, linux-kernel,
	linux-ia64, linux-um, Linux-Arch, linux-mm, Andy Lutomirski,
	Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	Richard Weinberger, Anton Ivanov, Johannes Berg, Andrew Morton,
	Nadav Amit, Peter Zijlstra

On Tue, 22 Nov 2022 21:09:08 +0100
"Arnd Bergmann" <arnd@arndb.de> wrote:

> On Tue, Nov 22, 2022, at 20:53, Nadav Amit wrote:
> > From: Nadav Amit <namit@vmware.com>
> >
> > > Functions that are marked as "inline" are currently also not traceable.
> > Apparently, this has been done to prevent differences between different
> > > configs that caused different functions to be traceable on different
> > platforms.
> >
> > Anyhow, this consideration is not very strong, and tying "inline" and
> > "notrace" does not seem very beneficial. The "inline" keyword is just a
> > > hint, and many functions are currently not traceable due to this reason.
> 
> The original reason was listed in 93b3cca1ccd3 ("ftrace: Make all
> inline tags also include notrace"), which describes
> 
>     Commit 5963e317b1e9d2a ("ftrace/x86: Do not change stacks in DEBUG when
>     calling lockdep") prevented lockdep calls from the int3 breakpoint handler
>     from reseting the stack if a function that was called was in the process
>     of being converted for tracing and had a breakpoint on it. The idea is,
>     before calling the lockdep code, do a load_idt() to the special IDT that
>     kept the breakpoint stack from reseting. This worked well as a quick fix
>     for this kernel release, until a certain config caused a lockup in the
>     function tracer start up tests.
>     
>     Investigating it, I found that the load_idt that was used to prevent
>     the int3 from changing stacks was itself being traced!
> 
> and this sounds like a much stronger reason than what you describe,
> and I would expect your change to cause regressions in similar places.
> 
> It's possible that the right answer is that the affected functions
> should be marked as __always_inline. 

Actually, this requirement may not be needed as much today. There's been
a lot of work in the last 10 years (when that commit was added) to make
ftrace much more robust.

We could remove the notrace from inline and then see where it breaks ;-)

But I'm guessing that it's probably not as much of an issue as it was
before. It may cause some splats with noinstr, though I think those will
be caught at compile time.

-- Steve


* Re: [PATCH 3/3] compiler: inline does not imply notrace
  2022-11-22 20:09   ` Arnd Bergmann
  2022-11-22 20:28     ` Steven Rostedt
@ 2022-11-22 20:51     ` Nadav Amit
  2022-11-29  2:36       ` Nadav Amit
  1 sibling, 1 reply; 12+ messages in thread
From: Nadav Amit @ 2022-11-22 20:51 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Thomas Gleixner, linux-arm-kernel, kernel list, linux-ia64,
	linux-um, Linux-Arch, Linux-MM, Andy Lutomirski, Ingo Molnar,
	Borislav Petkov, Dave Hansen, X86 ML, Richard Weinberger,
	Anton Ivanov, Johannes Berg, Andrew Morton, Steven Rostedt

On Nov 22, 2022, at 12:09 PM, Arnd Bergmann <arnd@arndb.de> wrote:

> On Tue, Nov 22, 2022, at 20:53, Nadav Amit wrote:
>> From: Nadav Amit <namit@vmware.com>
>> 
>> Functions that are marked as "inline" are currently also not traceable.
>> Apparently, this has been done to prevent differences between different
>> configs that caused different functions to be traceable on different
>> platforms.
>> 
>> Anyhow, this consideration is not very strong, and tying "inline" and
>> "notrace" does not seem very beneficial. The "inline" keyword is just a
>> hint, and many functions are currently not traceable due to this reason.
> 
> The original reason was listed in 93b3cca1ccd3 ("ftrace: Make all
> inline tags also include notrace"), which describes
> 
>    Commit 5963e317b1e9d2a ("ftrace/x86: Do not change stacks in DEBUG when
>    calling lockdep") prevented lockdep calls from the int3 breakpoint handler
>    from reseting the stack if a function that was called was in the process
>    of being converted for tracing and had a breakpoint on it. The idea is,
>    before calling the lockdep code, do a load_idt() to the special IDT that
>    kept the breakpoint stack from reseting. This worked well as a quick fix
>    for this kernel release, until a certain config caused a lockup in the
>    function tracer start up tests.
> 
>    Investigating it, I found that the load_idt that was used to prevent
>    the int3 from changing stacks was itself being traced!
> 
> and this sounds like a much stronger reason than what you describe,
> and I would expect your change to cause regressions in similar places.

I had no intention of misrepresenting. That was my understanding from my
previous discussion with Steven.

I assume that this patch might cause some regressions. The first patch in
this series was intended to prevent some regressions that I encountered.
That patch also marks a function that was missing “notrace” before, and I
did get kernel hangs due to the missing “notrace”.

Anyhow, I believe that the alternative - of leaving things as they are
(“inline”->”notrace”) - is even worse. Obviously it prevents proper tracing,
as there are even system calls that use inline, for instance
__do_sys_process_madvise() and __do_sys_mremap().

But more importantly, the current “inline”->”notrace” solution just papers
over missing “notrace” annotations. Anyone can remove the “inline” at any
given moment since there is no direct (or indirect) relationship between
“inline” and “notrace”. It seems to me all random and bound to fail at some
point.

> It's possible that the right answer is that the affected functions
> should be marked as __always_inline.

I think that it is probably better to mark them as notrace. People might
remove __always_inline. If we want two versions - one traceable and one not
traceable - we can also do that. But I am not sure how many people are aware
of the relationship between inline/__always_inline and tracing.
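
To make the two options concrete, here is a minimal sketch with a
hypothetical helper (illustrative only, not code from this series):

        #include <linux/types.h>      /* bool */
        #include <linux/compiler.h>   /* notrace, __always_inline */
        #include <asm/sections.h>     /* _stext, _etext */

        /* (a) explicit annotation: stays untraceable even if "inline" goes away */
        static notrace bool my_text_helper(unsigned long addr)
        {
                return addr >= (unsigned long)_stext && addr < (unsigned long)_etext;
        }

        /* (b) forced inlining: no out-of-line copy is left for ftrace to hook,
         * but only for as long as the __always_inline marking survives
         */
        static __always_inline bool my_text_helper2(unsigned long addr)
        {
                return addr >= (unsigned long)_stext && addr < (unsigned long)_etext;
        }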



* Re: [PATCH 3/3] compiler: inline does not imply notrace
  2022-11-22 20:51     ` Nadav Amit
@ 2022-11-29  2:36       ` Nadav Amit
  2022-11-29  4:15         ` Steven Rostedt
  0 siblings, 1 reply; 12+ messages in thread
From: Nadav Amit @ 2022-11-29  2:36 UTC (permalink / raw)
  To: Peter Zijlstra, Steven Rostedt
  Cc: Arnd Bergmann, Thomas Gleixner, linux-arm-kernel, kernel list,
	linux-ia64, linux-um, Linux-Arch, Linux-MM, Andy Lutomirski,
	Ingo Molnar, Borislav Petkov, Dave Hansen, X86 ML,
	Richard Weinberger, Anton Ivanov, Johannes Berg, Andrew Morton,
	Steven Rostedt, Nadav Amit

On Nov 22, 2022, at 12:51 PM, Nadav Amit <namit@vmware.com> wrote:

> But more importantly, the current “inline”->”notrace” solution just papers
> over missing “notrace” annotations. Anyone can remove the “inline” at any
> given moment since there is no direct (or indirect) relationship between
> “inline” and “notrace”. It seems to me all random and bound to fail at some
> point.

Peter, Steven, (and others),

Beyond the issues that are addressed in this patch-set, I encountered one
more, which reiterates the fact that the heuristic of marking “inline”
functions as “notrace” is not good enough.

Before I send a patch, I would like to get your feedback. I include a splat
below. It appears the execution might get stuck since some functions that
can be used for function tracing can be traced themselves.

For example, __kernel_text_address() and unwind_get_return_address() are
traceable. I think that we need to disallow the tracing of all functions
that are called directly and indirectly from function_stack_trace_call()
(i.e., they are in the dynamic extent of function_stack_trace_call).

In the absence of a proper automated static analysis tool for the matter, I
suggest the following solution, but I would like to check that you are ok
with the granularity of the “notrace” as I propose. Again, note that this is
not caused by this “inline” patch, but an issue that existed before.

-- >8 --

Author: Nadav Amit <namit@vmware.com>
Date:   Tue Nov 29 02:25:12 2022 +0000

    trace: Disable tracing of code called from function_stack_trace_call()
    
    Signed-off-by: Nadav Amit <namit@vmware.com>

diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index a20a5ebfacd7..185933222d08 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -21,6 +21,10 @@ CFLAGS_REMOVE_ftrace.o = -pg
 CFLAGS_REMOVE_early_printk.o = -pg
 CFLAGS_REMOVE_head64.o = -pg
 CFLAGS_REMOVE_sev.o = -pg
+CFLAGS_REMOVE_unwind_frame.o = -pg
+CFLAGS_REMOVE_unwind_guess.o = -pg
+CFLAGS_REMOVE_unwind_orc.o = -pg
+CFLAGS_REMOVE_stacktrace.o = -pg
 endif
 
 KASAN_SANITIZE_head$(BITS).o                           := n
diff --git a/kernel/Makefile b/kernel/Makefile
index 318789c728d3..d688eab1e1f8 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -19,6 +19,8 @@ obj-$(CONFIG_MULTIUSER) += groups.o
 ifdef CONFIG_FUNCTION_TRACER
 # Do not trace internal ftrace files
 CFLAGS_REMOVE_irq_work.o = $(CC_FLAGS_FTRACE)
+CFLAGS_REMOVE_stacktrace.o = $(CC_FLAGS_FTRACE)
+CFLAGS_REMOVE_extable.o = $(CC_FLAGS_FTRACE)
 endif
 
 # Prevents flicker of uninteresting __do_softirq()/__local_bh_disable_ip()


---


[531394.447185] sched: RT throttling activated
[531394.874541] NMI watchdog: Watchdog detected hard LOCKUP on cpu 26
[531394.874745] Modules linked in: zram
[531394.875224] CPU: 26 PID: 0 Comm: swapper/26 Not tainted 6.0.0-rc6umem+ #29
[531394.875426] Hardware name: Cisco Systems Inc UCSC-C220-M5SX/UCSC-C220-M5SX, BIOS C220M5.4.0.1i.0.0522190226 05/22/2019
[531394.875623] RIP: 0010:poke_int3_handler (arch/x86/kernel/alternative.c:1435) 
[531394.875903] Code: 45 01 48 8b 0d c8 0a 45 01 49 8d 70 ff 83 f8 01 7f 1c 48 63 39 31 c0 48 81 c7 00 00 00 81 48 39 fe 74 3c f0 ff 0d b3 0a 45 01 <c3> 31 c0 c3 49 89 ca 49 89 c1 49 d1 e9 4c 89 c9 48 c1 e1 04 4c 01
All code
========
   0:	45 01 48 8b          	add    %r9d,-0x75(%r8)
   4:	0d c8 0a 45 01       	or     $0x1450ac8,%eax
   9:	49 8d 70 ff          	lea    -0x1(%r8),%rsi
   d:	83 f8 01             	cmp    $0x1,%eax
  10:	7f 1c                	jg     0x2e
  12:	48 63 39             	movslq (%rcx),%rdi
  15:	31 c0                	xor    %eax,%eax
  17:	48 81 c7 00 00 00 81 	add    $0xffffffff81000000,%rdi
  1e:	48 39 fe             	cmp    %rdi,%rsi
  21:	74 3c                	je     0x5f
  23:	f0 ff 0d b3 0a 45 01 	lock decl 0x1450ab3(%rip)        # 0x1450add
  2a:*	c3                   	ret    		<-- trapping instruction
  2b:	31 c0                	xor    %eax,%eax
  2d:	c3                   	ret    
  2e:	49 89 ca             	mov    %rcx,%r10
  31:	49 89 c1             	mov    %rax,%r9
  34:	49 d1 e9             	shr    %r9
  37:	4c 89 c9             	mov    %r9,%rcx
  3a:	48 c1 e1 04          	shl    $0x4,%rcx
  3e:	4c                   	rex.WR
  3f:	01                   	.byte 0x1

Code starting with the faulting instruction
===========================================
   0:	c3                   	ret    
   1:	31 c0                	xor    %eax,%eax
   3:	c3                   	ret    
   4:	49 89 ca             	mov    %rcx,%r10
   7:	49 89 c1             	mov    %rax,%r9
   a:	49 d1 e9             	shr    %r9
   d:	4c 89 c9             	mov    %r9,%rcx
  10:	48 c1 e1 04          	shl    $0x4,%rcx
  14:	4c                   	rex.WR
  15:	01                   	.byte 0x1
[531394.876031] RSP: 0018:ffffc9000cd387b0 EFLAGS: 00000003
[531394.876320] RAX: 0000000000000001 RBX: 0000000000000000 RCX: ffffc9000cd388a8
[531394.876482] RDX: ffffc9000cd387d8 RSI: ffffffff812ba310 RDI: ffffffffc0409094
[531394.876652] RBP: ffffc9000cd387c8 R08: ffffffffc0409099 R09: 0000000000000000
[531394.876782] R10: 0000000000000000 R11: 0000000000000000 R12: ffffc9000cd387d8
[531394.876927] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[531394.877180] FS:  0000000000000000(0000) GS:ffff88afdf900000(0000) knlGS:0000000000000000
[531394.877371] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[531394.877503] CR2: 00007efd1e47b01c CR3: 0000000006a0a001 CR4: 00000000007706e0
[531394.877641] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[531394.877822] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[531394.877993] PKRU: 55555554
[531394.878164] Call Trace:
[531394.878356]  <IRQ>
[531394.878573] ? exc_int3 (arch/x86/kernel/traps.c:817) 
[531394.879306] asm_exc_int3 (./arch/x86/include/asm/idtentry.h:569) 
[531394.879725] RIP: 0010:function_stack_trace_call (kernel/trace/trace_functions.c:219) 
[531394.880058] Code: 2b 81 48 c7 c7 e0 e2 33 83 e8 1c fa fd ff 48 c7 c7 80 e6 1c 83 e8 40 61 fe 00 5d c3 cc cc cc cc cc cc cc cc cc cc cc cc cc cc <55> 48 89 e5 41 57 41 56 41 55 41 54 53 48 83 ec 08 4c 8b 62 18 41
All code
========
   0:	2b 81 48 c7 c7 e0    	sub    -0x1f3838b8(%rcx),%eax
   6:	e2 33                	loop   0x3b
   8:	83 e8 1c             	sub    $0x1c,%eax
   b:	fa                   	cli    
   c:	fd                   	std    
   d:	ff 48 c7             	decl   -0x39(%rax)
  10:	c7 80 e6 1c 83 e8 40 	movl   $0xfe6140,-0x177ce31a(%rax)
  17:	61 fe 00 
  1a:	5d                   	pop    %rbp
  1b:	c3                   	ret    
  1c:	cc                   	int3   
  1d:	cc                   	int3   
  1e:	cc                   	int3   
  1f:	cc                   	int3   
  20:	cc                   	int3   
  21:	cc                   	int3   
  22:	cc                   	int3   
  23:	cc                   	int3   
  24:	cc                   	int3   
  25:	cc                   	int3   
  26:	cc                   	int3   
  27:	cc                   	int3   
  28:	cc                   	int3   
  29:	cc                   	int3   
  2a:*	55                   	push   %rbp		<-- trapping instruction
  2b:	48 89 e5             	mov    %rsp,%rbp
  2e:	41 57                	push   %r15
  30:	41 56                	push   %r14
  32:	41 55                	push   %r13
  34:	41 54                	push   %r12
  36:	53                   	push   %rbx
  37:	48 83 ec 08          	sub    $0x8,%rsp
  3b:	4c 8b 62 18          	mov    0x18(%rdx),%r12
  3f:	41                   	rex.B

Code starting with the faulting instruction
===========================================
   0:	55                   	push   %rbp
   1:	48 89 e5             	mov    %rsp,%rbp
   4:	41 57                	push   %r15
   6:	41 56                	push   %r14
   8:	41 55                	push   %r13
   a:	41 54                	push   %r12
   c:	53                   	push   %rbx
   d:	48 83 ec 08          	sub    $0x8,%rsp
  11:	4c 8b 62 18          	mov    0x18(%rdx),%r12
  15:	41                   	rex.B
[531394.880225] RSP: 0018:ffffc9000cd388a8 EFLAGS: 00000082
[531394.880538] RAX: 0000000000000002 RBX: ffffc9000cd389a8 RCX: ffffc9000cd388b0
[531394.880702] RDX: ffffffff831cbec0 RSI: ffffffff8111893f RDI: ffffffff811b2c80
[531394.880884] RBP: ffffc9000cd38958 R08: ffffc900082f7f48 R09: 0000000000000001
[531394.881055] R10: 0000000000000e00 R11: 0000000000000001 R12: ffffc9000cd38a50
[531394.881225] R13: 0000000000000000 R14: ffff889846ecc080 R15: 0000000000000004
[531394.883694] ? unwind_get_return_address (arch/x86/kernel/unwind_frame.c:19 arch/x86/kernel/unwind_frame.c:14) 
[531394.883984] ? kernel_text_address (kernel/extable.c:78) 
[531394.885410]  ? 0xffffffffc0409095
[531394.886432]  ? 0xffffffffc0409099
[531394.887278]  ? 0xffffffffc0409099
[531394.889561] ? __trace_stack (kernel/trace/trace.c:3119) 
[531394.890036] ? __kernel_text_address (kernel/extable.c:78) 
[531394.891196] __kernel_text_address (kernel/extable.c:78) 
[531394.891658] unwind_get_return_address (arch/x86/kernel/unwind_frame.c:19 arch/x86/kernel/unwind_frame.c:14) 
[531394.892112] ? __kernel_text_address (kernel/extable.c:78) 
[531394.892382] ? unwind_get_return_address (arch/x86/kernel/unwind_frame.c:19 arch/x86/kernel/unwind_frame.c:14) 
[531394.892789] ? write_profile (kernel/stacktrace.c:83) 
[531394.893260] arch_stack_walk (arch/x86/kernel/stacktrace.c:26) 
[531394.895650] ? __trace_stack (kernel/trace/trace.c:3119) 
[531394.897209] stack_trace_save (kernel/stacktrace.c:123) 
[531394.898344] __ftrace_trace_stack (kernel/trace/trace.c:3061) 
[531394.899500] ? rt_mutex_postunlock (kernel/printk/printk.c:2894) 
[531394.899975] __trace_stack (kernel/trace/trace.c:3119) 
[531394.901025] function_stack_trace_call (./arch/x86/include/asm/atomic.h:108 ./include/linux/atomic/atomic-instrumented.h:258 kernel/trace/trace_functions.c:245) 
[531394.901356] ? fbcon_redraw.constprop.0 (drivers/video/fbdev/core/fbcon.c:1661) 
[531394.902748]  0xffffffffc0409099
[531394.904289] ? fb_get_color_depth (drivers/video/fbdev/core/fbmem.c:92) 
[531394.906173] ? console_conditional_schedule (kernel/printk/printk.c:2895) 
[531394.907244] console_conditional_schedule (kernel/printk/printk.c:2895) 
[531394.907727] fbcon_redraw.constprop.0 (drivers/video/fbdev/core/fbcon.c:1661) 
[531394.908200] ? console_conditional_schedule (kernel/printk/printk.c:2895) 
[531394.908506] ? fbcon_redraw.constprop.0 (drivers/video/fbdev/core/fbcon.c:1661) 
[531394.911230] fbcon_scroll (drivers/video/fbdev/core/fbcon.c:1838) 
[531394.912743] con_scroll (drivers/tty/vt/vt.c:630 (discriminator 1)) 
[531394.914360] lf (drivers/tty/vt/vt.c:1507) 
[531394.914680] ? lf (drivers/tty/vt/vt.c:1502) 
[531394.915779] vt_console_print (drivers/tty/vt/vt.c:3126) 
[531394.918248] console_emit_next_record.constprop.0 (kernel/printk/printk.c:1945 kernel/printk/printk.c:2732) 
[531394.922365] console_unlock (kernel/printk/printk.c:2794 kernel/printk/printk.c:2861) 
[531394.924129] vprintk_emit (kernel/printk/printk.c:2272) 
[531394.925782] vprintk_default (kernel/printk/printk.c:2283) 
[531394.926306] vprintk (kernel/printk/printk_safe.c:50) 
[531394.926943] _printk (kernel/printk/printk.c:2296) 
[531394.929016] perf_duration_warn.cold (kernel/events/core.c:510 kernel/events/core.c:508) 
[531394.929518] irq_work_single (kernel/irq_work.c:211) 
[531394.930449] irq_work_run_list (kernel/irq_work.c:241 (discriminator 3)) 
[531394.931169] irq_work_run (kernel/irq_work.c:253) 
[531394.931816] __sysvec_irq_work (./arch/x86/include/asm/jump_label.h:27 ./include/linux/jump_label.h:207 ./arch/x86/include/asm/trace/irq_vectors.h:64 arch/x86/kernel/irq_work.c:23) 
[531394.932259] sysvec_irq_work (arch/x86/kernel/irq_work.c:17 (discriminator 14)) 
[531394.932660]  </IRQ>
[531394.932854]  <TASK>
[531394.933607] asm_sysvec_irq_work (./arch/x86/include/asm/idtentry.h:675)







* Re: [PATCH 3/3] compiler: inline does not imply notrace
  2022-11-29  2:36       ` Nadav Amit
@ 2022-11-29  4:15         ` Steven Rostedt
  2022-11-29  4:25           ` Nadav Amit
  0 siblings, 1 reply; 12+ messages in thread
From: Steven Rostedt @ 2022-11-29  4:15 UTC (permalink / raw)
  To: Nadav Amit
  Cc: Peter Zijlstra, Arnd Bergmann, Thomas Gleixner, linux-arm-kernel,
	kernel list, linux-ia64, linux-um, Linux-Arch, Linux-MM,
	Andy Lutomirski, Ingo Molnar, Borislav Petkov, Dave Hansen,
	X86 ML, Richard Weinberger, Anton Ivanov, Johannes Berg,
	Andrew Morton, Nadav Amit

On Tue, 29 Nov 2022 02:36:22 +0000
Nadav Amit <namit@vmware.com> wrote:

> On Nov 22, 2022, at 12:51 PM, Nadav Amit <namit@vmware.com> wrote:
> 
> > But more importantly, the current “inline”->”notrace” solution just papers
> > over missing “notrace” annotations. Anyone can remove the “inline” at any
> > given moment since there is no direct (or indirect) relationship between
> > “inline” and “notrace”. It seems to me all random and bound to fail at some
> > point.  
> 
> Peter, Steven, (and others),
> 
> Beyond the issues that are addressed in this patch-set, I encountered one
> > more, which reiterates the fact that the heuristic of marking “inline”
> functions as “notrace” is not good enough.
> 
> Before I send a patch, I would like to get your feedback. I include a splat
> > below. It appears the execution might get stuck since some functions that
> can be used for function tracing can be traced themselves.
> 
> For example, __kernel_text_address() and unwind_get_return_address() are
> traceable. I think that we need to disallow the tracing of all functions
> that are called directly and indirectly from function_stack_trace_call()
> (i.e., they are in the dynamic extent of function_stack_trace_call).

How did this happen? It should be able to handle recursion:

static void
function_stack_trace_call(unsigned long ip, unsigned long parent_ip,
			  struct ftrace_ops *op, struct ftrace_regs *fregs)
{
	struct trace_array *tr = op->private;
	struct trace_array_cpu *data;
	unsigned long flags;
	long disabled;
	int cpu;
	unsigned int trace_ctx;

	if (unlikely(!tr->function_enabled))
		return;

	/*
	 * Need to use raw, since this must be called before the
	 * recursive protection is performed.
	 */
	local_irq_save(flags);
	cpu = raw_smp_processor_id();
	data = per_cpu_ptr(tr->array_buffer.data, cpu);
	disabled = atomic_inc_return(&data->disabled);

	if (likely(disabled == 1)) { <<<---- This stops recursion

		trace_ctx = tracing_gen_ctx_flags(flags);
		trace_function(tr, ip, parent_ip, trace_ctx);
		__trace_stack(tr, trace_ctx, STACK_SKIP);
	}

	atomic_dec(&data->disabled);
	local_irq_restore(flags);
}

Each of the stack trace functions may recurse back into this function, but
it will not recurse further. How did it crash?

-- Steve


> 
> > In the absence of a proper automated static analysis tool for the matter, I
> suggest the following solution, but I would like to check that you are ok
> with the granularity of the “notrace” as I propose. Again, note that this is
> not caused by this “inline” patch, but an issue that existed before.
> 
> -- >8 --  
> 
> Author: Nadav Amit <namit@vmware.com>
> Date:   Tue Nov 29 02:25:12 2022 +0000
> 
>     trace: Disable tracing of code called from function_stack_trace_call()
>     
>     Signed-off-by: Nadav Amit <namit@vmware.com>
> 
> diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
> index a20a5ebfacd7..185933222d08 100644
> --- a/arch/x86/kernel/Makefile
> +++ b/arch/x86/kernel/Makefile
> @@ -21,6 +21,10 @@ CFLAGS_REMOVE_ftrace.o = -pg
>  CFLAGS_REMOVE_early_printk.o = -pg
>  CFLAGS_REMOVE_head64.o = -pg
>  CFLAGS_REMOVE_sev.o = -pg
> +CFLAGS_REMOVE_unwind_frame.o = -pg
> +CFLAGS_REMOVE_unwind_guess.o = -pg
> +CFLAGS_REMOVE_unwind_orc.o = -pg
> +CFLAGS_REMOVE_stacktrace.o = -pg
>  endif
>  
>  KASAN_SANITIZE_head$(BITS).o                           := n
> diff --git a/kernel/Makefile b/kernel/Makefile
> index 318789c728d3..d688eab1e1f8 100644
> --- a/kernel/Makefile
> +++ b/kernel/Makefile
> @@ -19,6 +19,8 @@ obj-$(CONFIG_MULTIUSER) += groups.o
>  ifdef CONFIG_FUNCTION_TRACER
>  # Do not trace internal ftrace files
>  CFLAGS_REMOVE_irq_work.o = $(CC_FLAGS_FTRACE)
> +CFLAGS_REMOVE_stacktrace.o = $(CC_FLAGS_FTRACE)
> +CFLAGS_REMOVE_extable.o = $(CC_FLAGS_FTRACE)
>  endif
>  
>  # Prevents flicker of uninteresting __do_softirq()/__local_bh_disable_ip()
> 
> 
> ---
> 
> 
> [531394.447185] sched: RT throttling activated
> [531394.874541] NMI watchdog: Watchdog detected hard LOCKUP on cpu 26
> [531394.874745] Modules linked in: zram
> [531394.875224] CPU: 26 PID: 0 Comm: swapper/26 Not tainted 6.0.0-rc6umem+ #29
> [531394.875426] Hardware name: Cisco Systems Inc UCSC-C220-M5SX/UCSC-C220-M5SX, BIOS C220M5.4.0.1i.0.0522190226 05/22/2019
> [531394.875623] RIP: 0010:poke_int3_handler (arch/x86/kernel/alternative.c:1435) 
> [531394.875903] Code: 45 01 48 8b 0d c8 0a 45 01 49 8d 70 ff 83 f8 01 7f 1c 48 63 39 31 c0 48 81 c7 00 00 00 81 48 39 fe 74 3c f0 ff 0d b3 0a 45 01 <c3> 31 c0 c3 49 89 ca 49 89 c1 49 d1 e9 4c 89 c9 48 c1 e1 04 4c 01
> All code
> ========
>    0:	45 01 48 8b          	add    %r9d,-0x75(%r8)
>    4:	0d c8 0a 45 01       	or     $0x1450ac8,%eax
>    9:	49 8d 70 ff          	lea    -0x1(%r8),%rsi
>    d:	83 f8 01             	cmp    $0x1,%eax
>   10:	7f 1c                	jg     0x2e
>   12:	48 63 39             	movslq (%rcx),%rdi
>   15:	31 c0                	xor    %eax,%eax
>   17:	48 81 c7 00 00 00 81 	add    $0xffffffff81000000,%rdi
>   1e:	48 39 fe             	cmp    %rdi,%rsi
>   21:	74 3c                	je     0x5f
>   23:	f0 ff 0d b3 0a 45 01 	lock decl 0x1450ab3(%rip)        # 0x1450add
>   2a:*	c3                   	ret    		<-- trapping instruction
>   2b:	31 c0                	xor    %eax,%eax
>   2d:	c3                   	ret    
>   2e:	49 89 ca             	mov    %rcx,%r10
>   31:	49 89 c1             	mov    %rax,%r9
>   34:	49 d1 e9             	shr    %r9
>   37:	4c 89 c9             	mov    %r9,%rcx
>   3a:	48 c1 e1 04          	shl    $0x4,%rcx
>   3e:	4c                   	rex.WR
>   3f:	01                   	.byte 0x1
> 
> Code starting with the faulting instruction
> ===========================================
>    0:	c3                   	ret    
>    1:	31 c0                	xor    %eax,%eax
>    3:	c3                   	ret    
>    4:	49 89 ca             	mov    %rcx,%r10
>    7:	49 89 c1             	mov    %rax,%r9
>    a:	49 d1 e9             	shr    %r9
>    d:	4c 89 c9             	mov    %r9,%rcx
>   10:	48 c1 e1 04          	shl    $0x4,%rcx
>   14:	4c                   	rex.WR
>   15:	01                   	.byte 0x1
> [531394.876031] RSP: 0018:ffffc9000cd387b0 EFLAGS: 00000003
> [531394.876320] RAX: 0000000000000001 RBX: 0000000000000000 RCX: ffffc9000cd388a8
> [531394.876482] RDX: ffffc9000cd387d8 RSI: ffffffff812ba310 RDI: ffffffffc0409094
> [531394.876652] RBP: ffffc9000cd387c8 R08: ffffffffc0409099 R09: 0000000000000000
> [531394.876782] R10: 0000000000000000 R11: 0000000000000000 R12: ffffc9000cd387d8
> [531394.876927] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
> [531394.877180] FS:  0000000000000000(0000) GS:ffff88afdf900000(0000) knlGS:0000000000000000
> [531394.877371] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [531394.877503] CR2: 00007efd1e47b01c CR3: 0000000006a0a001 CR4: 00000000007706e0
> [531394.877641] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [531394.877822] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [531394.877993] PKRU: 55555554
> [531394.878164] Call Trace:
> [531394.878356]  <IRQ>
> [531394.878573] ? exc_int3 (arch/x86/kernel/traps.c:817) 
> [531394.879306] asm_exc_int3 (./arch/x86/include/asm/idtentry.h:569) 
> [531394.879725] RIP: 0010:function_stack_trace_call (kernel/trace/trace_functions.c:219) 
> [531394.880058] Code: 2b 81 48 c7 c7 e0 e2 33 83 e8 1c fa fd ff 48 c7 c7 80 e6 1c 83 e8 40 61 fe 00 5d c3 cc cc cc cc cc cc cc cc cc cc cc cc cc cc <55> 48 89 e5 41 57 41 56 41 55 41 54 53 48 83 ec 08 4c 8b 62 18 41
> All code
> ========
>    0:	2b 81 48 c7 c7 e0    	sub    -0x1f3838b8(%rcx),%eax
>    6:	e2 33                	loop   0x3b
>    8:	83 e8 1c             	sub    $0x1c,%eax
>    b:	fa                   	cli    
>    c:	fd                   	std    
>    d:	ff 48 c7             	decl   -0x39(%rax)
>   10:	c7 80 e6 1c 83 e8 40 	movl   $0xfe6140,-0x177ce31a(%rax)
>   17:	61 fe 00 
>   1a:	5d                   	pop    %rbp
>   1b:	c3                   	ret    
>   1c:	cc                   	int3   
>   1d:	cc                   	int3   
>   1e:	cc                   	int3   
>   1f:	cc                   	int3   
>   20:	cc                   	int3   
>   21:	cc                   	int3   
>   22:	cc                   	int3   
>   23:	cc                   	int3   
>   24:	cc                   	int3   
>   25:	cc                   	int3   
>   26:	cc                   	int3   
>   27:	cc                   	int3   
>   28:	cc                   	int3   
>   29:	cc                   	int3   
>   2a:*	55                   	push   %rbp		<-- trapping instruction
>   2b:	48 89 e5             	mov    %rsp,%rbp
>   2e:	41 57                	push   %r15
>   30:	41 56                	push   %r14
>   32:	41 55                	push   %r13
>   34:	41 54                	push   %r12
>   36:	53                   	push   %rbx
>   37:	48 83 ec 08          	sub    $0x8,%rsp
>   3b:	4c 8b 62 18          	mov    0x18(%rdx),%r12
>   3f:	41                   	rex.B
> 
> Code starting with the faulting instruction
> ===========================================
>    0:	55                   	push   %rbp
>    1:	48 89 e5             	mov    %rsp,%rbp
>    4:	41 57                	push   %r15
>    6:	41 56                	push   %r14
>    8:	41 55                	push   %r13
>    a:	41 54                	push   %r12
>    c:	53                   	push   %rbx
>    d:	48 83 ec 08          	sub    $0x8,%rsp
>   11:	4c 8b 62 18          	mov    0x18(%rdx),%r12
>   15:	41                   	rex.B
> [531394.880225] RSP: 0018:ffffc9000cd388a8 EFLAGS: 00000082
> [531394.880538] RAX: 0000000000000002 RBX: ffffc9000cd389a8 RCX: ffffc9000cd388b0
> [531394.880702] RDX: ffffffff831cbec0 RSI: ffffffff8111893f RDI: ffffffff811b2c80
> [531394.880884] RBP: ffffc9000cd38958 R08: ffffc900082f7f48 R09: 0000000000000001
> [531394.881055] R10: 0000000000000e00 R11: 0000000000000001 R12: ffffc9000cd38a50
> [531394.881225] R13: 0000000000000000 R14: ffff889846ecc080 R15: 0000000000000004
> [531394.883694] ? unwind_get_return_address (arch/x86/kernel/unwind_frame.c:19 arch/x86/kernel/unwind_frame.c:14) 
> [531394.883984] ? kernel_text_address (kernel/extable.c:78) 
> [531394.885410]  ? 0xffffffffc0409095
> [531394.886432]  ? 0xffffffffc0409099
> [531394.887278]  ? 0xffffffffc0409099
> [531394.889561] ? __trace_stack (kernel/trace/trace.c:3119) 
> [531394.890036] ? __kernel_text_address (kernel/extable.c:78) 
> [531394.891196] __kernel_text_address (kernel/extable.c:78) 
> [531394.891658] unwind_get_return_address (arch/x86/kernel/unwind_frame.c:19 arch/x86/kernel/unwind_frame.c:14) 
> [531394.892112] ? __kernel_text_address (kernel/extable.c:78) 
> [531394.892382] ? unwind_get_return_address (arch/x86/kernel/unwind_frame.c:19 arch/x86/kernel/unwind_frame.c:14) 
> [531394.892789] ? write_profile (kernel/stacktrace.c:83) 
> [531394.893260] arch_stack_walk (arch/x86/kernel/stacktrace.c:26) 
> [531394.895650] ? __trace_stack (kernel/trace/trace.c:3119) 
> [531394.897209] stack_trace_save (kernel/stacktrace.c:123) 
> [531394.898344] __ftrace_trace_stack (kernel/trace/trace.c:3061) 
> [531394.899500] ? rt_mutex_postunlock (kernel/printk/printk.c:2894) 
> [531394.899975] __trace_stack (kernel/trace/trace.c:3119) 
> [531394.901025] function_stack_trace_call (./arch/x86/include/asm/atomic.h:108 ./include/linux/atomic/atomic-instrumented.h:258 kernel/trace/trace_functions.c:245) 
> [531394.901356] ? fbcon_redraw.constprop.0 (drivers/video/fbdev/core/fbcon.c:1661) 
> [531394.902748]  0xffffffffc0409099
> [531394.904289] ? fb_get_color_depth (drivers/video/fbdev/core/fbmem.c:92) 
> [531394.906173] ? console_conditional_schedule (kernel/printk/printk.c:2895) 
> [531394.907244] console_conditional_schedule (kernel/printk/printk.c:2895) 
> [531394.907727] fbcon_redraw.constprop.0 (drivers/video/fbdev/core/fbcon.c:1661) 
> [531394.908200] ? console_conditional_schedule (kernel/printk/printk.c:2895) 
> [531394.908506] ? fbcon_redraw.constprop.0 (drivers/video/fbdev/core/fbcon.c:1661) 
> [531394.911230] fbcon_scroll (drivers/video/fbdev/core/fbcon.c:1838) 
> [531394.912743] con_scroll (drivers/tty/vt/vt.c:630 (discriminator 1)) 
> [531394.914360] lf (drivers/tty/vt/vt.c:1507) 
> [531394.914680] ? lf (drivers/tty/vt/vt.c:1502) 
> [531394.915779] vt_console_print (drivers/tty/vt/vt.c:3126) 
> [531394.918248] console_emit_next_record.constprop.0 (kernel/printk/printk.c:1945 kernel/printk/printk.c:2732) 
> [531394.922365] console_unlock (kernel/printk/printk.c:2794 kernel/printk/printk.c:2861) 
> [531394.924129] vprintk_emit (kernel/printk/printk.c:2272) 
> [531394.925782] vprintk_default (kernel/printk/printk.c:2283) 
> [531394.926306] vprintk (kernel/printk/printk_safe.c:50) 
> [531394.926943] _printk (kernel/printk/printk.c:2296) 
> [531394.929016] perf_duration_warn.cold (kernel/events/core.c:510 kernel/events/core.c:508) 
> [531394.929518] irq_work_single (kernel/irq_work.c:211) 
> [531394.930449] irq_work_run_list (kernel/irq_work.c:241 (discriminator 3)) 
> [531394.931169] irq_work_run (kernel/irq_work.c:253) 
> [531394.931816] __sysvec_irq_work (./arch/x86/include/asm/jump_label.h:27 ./include/linux/jump_label.h:207 ./arch/x86/include/asm/trace/irq_vectors.h:64 arch/x86/kernel/irq_work.c:23) 
> [531394.932259] sysvec_irq_work (arch/x86/kernel/irq_work.c:17 (discriminator 14)) 
> [531394.932660]  </IRQ>
> [531394.932854]  <TASK>
> [531394.933607] asm_sysvec_irq_work (./arch/x86/include/asm/idtentry.h:675)
> 
> 
> 
> 
> 



* Re: [PATCH 3/3] compiler: inline does not imply notrace
  2022-11-29  4:15         ` Steven Rostedt
@ 2022-11-29  4:25           ` Nadav Amit
  2022-11-29 15:06             ` Steven Rostedt
  0 siblings, 1 reply; 12+ messages in thread
From: Nadav Amit @ 2022-11-29  4:25 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Peter Zijlstra, Arnd Bergmann, Thomas Gleixner, linux-arm-kernel,
	kernel list, linux-ia64, linux-um, Linux-Arch, Linux-MM,
	Andy Lutomirski, Ingo Molnar, Borislav Petkov, Dave Hansen,
	X86 ML, Richard Weinberger, Anton Ivanov, Johannes Berg,
	Andrew Morton

On Nov 28, 2022, at 8:15 PM, Steven Rostedt <rostedt@goodmis.org> wrote:

> On Tue, 29 Nov 2022 02:36:22 +0000
> Nadav Amit <namit@vmware.com> wrote:
> 
>> On Nov 22, 2022, at 12:51 PM, Nadav Amit <namit@vmware.com> wrote:
>> 
>>> But more importantly, the current “inline”->”notrace” solution just papers
>>> over missing “notrace” annotations. Anyone can remove the “inline” at any
>>> given moment since there is no direct (or indirect) relationship between
>>> “inline” and “notrace”. It seems to me all random and bound to fail at some
>>> point.
>> 
>> Peter, Steven, (and others),
>> 
>> Beyond the issues that are addressed in this patch-set, I encountered one
>> more, which reiterates the fact that the heuristic of marking “inline”
>> functions as “notrace” is not good enough.
>> 
>> Before I send a patch, I would like to get your feedback. I include a splat
>> below. It appears the execution might get stuck since some functions that
>> can be used for function tracing can be traced themselves.
>> 
>> For example, __kernel_text_address() and unwind_get_return_address() are
>> traceable. I think that we need to disallow the tracing of all functions
>> that are called directly and indirectly from function_stack_trace_call()
>> (i.e., they are in the dynamic extent of function_stack_trace_call).
> 
> How did this happen? It should be able to handle recursion:
> 
> static void
> function_stack_trace_call(unsigned long ip, unsigned long parent_ip,
>                          struct ftrace_ops *op, struct ftrace_regs *fregs)
> {
>        struct trace_array *tr = op->private;
>        struct trace_array_cpu *data;
>        unsigned long flags;
>        long disabled;
>        int cpu;
>        unsigned int trace_ctx;
> 
>        if (unlikely(!tr->function_enabled))
>                return;
> 
>        /*
>         * Need to use raw, since this must be called before the
>         * recursive protection is performed.
>         */
>        local_irq_save(flags);
>        cpu = raw_smp_processor_id();
>        data = per_cpu_ptr(tr->array_buffer.data, cpu);
>        disabled = atomic_inc_return(&data->disabled);
> 
>        if (likely(disabled == 1)) { <<<---- This stops recursion
> 
>                trace_ctx = tracing_gen_ctx_flags(flags);
>                trace_function(tr, ip, parent_ip, trace_ctx);
>                __trace_stack(tr, trace_ctx, STACK_SKIP);
>        }
> 
>        atomic_dec(&data->disabled);
>        local_irq_restore(flags);
> }
> 
> Each of the stack trace functions may recurse back into this function, but
> it will not recurse further. How did it crash?

Ugh. Thanks. I didn’t know that - so your input is really helpful.

I will need to further debug it, but this issue does not occur every time.

The kernel didn’t crash exactly - it’s more of a deadlock. I have lockdep
enabled, so it is not a deadlock that lockdep knows. Could it be that
somehow things just slowed down due to IPIs and mostly-disabled IRQs? I have
no idea. I would need to recreate the scenario. 

For the record, I tried to save some details in the previous email. It was
kind of hard to understand what’s going on on the other cores, since the
trace of other cores was interleaved. I extracted the parts that I think
refer to the other CPU (yes, the output is really slow, as seen in the
timestamps):

[531413.923628] Code: 00 00 31 c0 eb f1 0f 1f 80 00 00 00 00 e8 1b 2e 16 3e 55 48 89 e5 c6 07 00 0f 1f 00 f7 c6 00 02 00 00 74 06 fb 0f 1f 44 00 00 <bf> 01 00 00 00 e8 99 da f1 fe 65 8b 05 f2 99 d7 7d 85 c0 74 02 5d

All code
========
0: 00 00 add %al,(%rax)
2: 31 c0 xor %eax,%eax
4: eb f1 jmp 0xfffffffffffffff7
6: 0f 1f 80 00 00 00 00 nopl 0x0(%rax)
d: e8 1b 2e 16 3e call 0x3e162e2d
12: 55 push %rbp
13: 48 89 e5 mov %rsp,%rbp
16: c6 07 00 movb $0x0,(%rdi)
19: 0f 1f 00 nopl (%rax)
1c: f7 c6 00 02 00 00 test $0x200,%esi
22: 74 06 je 0x2a
24: fb sti 
25: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
2a:* bf 01 00 00 00 mov $0x1,%edi <-- trapping instruction
2f: e8 99 da f1 fe call 0xfffffffffef1dacd
34: 65 8b 05 f2 99 d7 7d mov %gs:0x7dd799f2(%rip),%eax # 0x7dd79a2d
3b: 85 c0 test %eax,%eax
3d: 74 02 je 0x41
3f: 5d pop %rbp

Code starting with the faulting instruction
===========================================
0: bf 01 00 00 00 mov $0x1,%edi
5: e8 99 da f1 fe call 0xfffffffffef1daa3
a: 65 8b 05 f2 99 d7 7d mov %gs:0x7dd799f2(%rip),%eax # 0x7dd79a03
11: 85 c0 test %eax,%eax
13: 74 02 je 0x17
15: 5d pop %rbp

[531414.066765] RSP: 0018:ffffc9000c9a77d8 EFLAGS: 00000206
[531414.077943] RIP: 0010:smp_call_function_many_cond (kernel/smp.c:443 kernel/smp.c:988) 
[531416.987351] on_each_cpu_cond_mask (kernel/smp.c:1155) 
[531416.205862] ? text_poke_memset (arch/x86/kernel/alternative.c:1296) 
[531416.681294] ? text_poke_memset (arch/x86/kernel/alternative.c:1296) 
[531417.468443] text_poke_bp_batch (arch/x86/kernel/alternative.c:1553) 
[531418.939923] arch_ftrace_update_trampoline (arch/x86/kernel/ftrace.c:500)
[531419.882055] ? ftrace_no_pid_write (kernel/trace/ftrace.c:7864) 
[531420.510376] ftrace_update_pid_func (kernel/trace/ftrace.c:374 (discriminator 1))
[531420.784703] ftrace_pid_open (kernel/trace/ftrace.c:2918 kernel/trace/ftrace.c:2932 kernel/trace/ftrace.c:7725 kernel/trace/ftrace.c:7835 kernel/trace/ftrace.c:7865) 
[531421.851294] vfs_open (fs/open.c:1017)





* Re: [PATCH 3/3] compiler: inline does not imply notrace
  2022-11-29  4:25           ` Nadav Amit
@ 2022-11-29 15:06             ` Steven Rostedt
  2022-11-29 18:02               ` Nadav Amit
  0 siblings, 1 reply; 12+ messages in thread
From: Steven Rostedt @ 2022-11-29 15:06 UTC (permalink / raw)
  To: Nadav Amit
  Cc: Peter Zijlstra, Arnd Bergmann, Thomas Gleixner, linux-arm-kernel,
	kernel list, linux-ia64, linux-um, Linux-Arch, Linux-MM,
	Andy Lutomirski, Ingo Molnar, Borislav Petkov, Dave Hansen,
	X86 ML, Richard Weinberger, Anton Ivanov, Johannes Berg,
	Andrew Morton

On Tue, 29 Nov 2022 04:25:38 +0000
Nadav Amit <namit@vmware.com> wrote:


> I will need to further debug it, but this issue does not occur every time.
> 
> The kernel didn’t crash exactly - it’s more of a deadlock. I have lockdep
> enabled, so it is not a deadlock that lockdep knows. Could it be that
> somehow things just slowed down due to IPIs and mostly-disabled IRQs? I have
> no idea. I would need to recreate the scenario. 

You have lockdep enabled and you are running function tracing with stack
trace on? So you are doing a stack trace on *every* function that is traced?

I don't think you hit a deadlock, I think you hit a live lock. You could
possibly slow the system down so much that when an interrupt finishes it's
time for it to be triggered again, and you never make forward progress.

> 
> For the record, I tried to save some details in the previous email. It was
> kind of hard to understand what’s going on on the other cores, since the
> trace of other cores was interleaved. I extracted the parts that I think
> refer to the other CPU (yes, the output is really slow, as seen in the
> timestamps):
> 
> [531413.923628] Code: 00 00 31 c0 eb f1 0f 1f 80 00 00 00 00 e8 1b 2e 16 3e 55 48 89 e5 c6 07 00 0f 1f 00 f7 c6 00 02 00 00 74 06 fb 0f 1f 44 00 00 <bf> 01 00 00 00 e8 99 da f1 fe 65 8b 05 f2 99 d7 7d 85 c0 74 02 5d
> 
> All code
> ========
> 0: 00 00 add %al,(%rax)
> 2: 31 c0 xor %eax,%eax
> 4: eb f1 jmp 0xfffffffffffffff7
> 6: 0f 1f 80 00 00 00 00 nopl 0x0(%rax)
> d: e8 1b 2e 16 3e call 0x3e162e2d
> 12: 55 push %rbp
> 13: 48 89 e5 mov %rsp,%rbp
> 16: c6 07 00 movb $0x0,(%rdi)
> 19: 0f 1f 00 nopl (%rax)
> 1c: f7 c6 00 02 00 00 test $0x200,%esi
> 22: 74 06 je 0x2a
> 24: fb sti 
> 25: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
> 2a:* bf 01 00 00 00 mov $0x1,%edi <-- trapping instruction
> 2f: e8 99 da f1 fe call 0xfffffffffef1dacd
> 34: 65 8b 05 f2 99 d7 7d mov %gs:0x7dd799f2(%rip),%eax # 0x7dd79a2d
> 3b: 85 c0 test %eax,%eax
> 3d: 74 02 je 0x41
> 3f: 5d pop %rbp
> 
> Code starting with the faulting instruction
> ===========================================
> 0: bf 01 00 00 00 mov $0x1,%edi
> 5: e8 99 da f1 fe call 0xfffffffffef1daa3
> a: 65 8b 05 f2 99 d7 7d mov %gs:0x7dd799f2(%rip),%eax # 0x7dd79a03
> 11: 85 c0 test %eax,%eax
> 13: 74 02 je 0x17
> 15: 5d pop %rbp
> 
> [531414.066765] RSP: 0018:ffffc9000c9a77d8 EFLAGS: 00000206
> [531414.077943] RIP: 0010:smp_call_function_many_cond (kernel/smp.c:443 kernel/smp.c:988) 
> [531416.987351] on_each_cpu_cond_mask (kernel/smp.c:1155) 
> [531416.205862] ? text_poke_memset (arch/x86/kernel/alternative.c:1296) 
> [531416.681294] ? text_poke_memset (arch/x86/kernel/alternative.c:1296) 
> [531417.468443] text_poke_bp_batch (arch/x86/kernel/alternative.c:1553) 
> [531418.939923] arch_ftrace_update_trampoline (arch/x86/kernel/ftrace.c:500)
> [531419.882055] ? ftrace_no_pid_write (kernel/trace/ftrace.c:7864) 
> [531420.510376] ftrace_update_pid_func (kernel/trace/ftrace.c:374 (discriminator 1))
> [531420.784703] ftrace_pid_open (kernel/trace/ftrace.c:2918 kernel/trace/ftrace.c:2932 kernel/trace/ftrace.c:7725 kernel/trace/ftrace.c:7835 kernel/trace/ftrace.c:7865) 
> [531421.851294] vfs_open (fs/open.c:1017)
> 
> 
> 

Do you have an issue with normal function tracing, and not tracing every
function?

I should also add this, because it detects recursion faster than the
atomic_inc_return() does.

-- Steve

diff --git a/kernel/trace/trace_functions.c b/kernel/trace/trace_functions.c
index 9f1bfbe105e8..93ec756dc24b 100644
--- a/kernel/trace/trace_functions.c
+++ b/kernel/trace/trace_functions.c
@@ -221,12 +221,18 @@ function_stack_trace_call(unsigned long ip, unsigned long parent_ip,
 	struct trace_array_cpu *data;
 	unsigned long flags;
 	long disabled;
+	int bit;
 	int cpu;
 	unsigned int trace_ctx;
 
 	if (unlikely(!tr->function_enabled))
 		return;
 
+	/* Faster than atomic_inc_return() */
+	bit = ftrace_test_recursion_trylock(ip, parent_ip);
+	if (bit < 0)
+		return;
+
 	/*
 	 * Need to use raw, since this must be called before the
 	 * recursive protection is performed.
@@ -244,6 +250,7 @@ function_stack_trace_call(unsigned long ip, unsigned long parent_ip,
 
 	atomic_dec(&data->disabled);
 	local_irq_restore(flags);
+	ftrace_test_recursion_unlock(bit);
 }
 
 static inline bool is_repeat_check(struct trace_array *tr,


* Re: [PATCH 3/3] compiler: inline does not imply notrace
  2022-11-29 15:06             ` Steven Rostedt
@ 2022-11-29 18:02               ` Nadav Amit
  0 siblings, 0 replies; 12+ messages in thread
From: Nadav Amit @ 2022-11-29 18:02 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Peter Zijlstra, Arnd Bergmann, Thomas Gleixner, linux-arm-kernel,
	kernel list, linux-ia64, linux-um, Linux-Arch, Linux-MM,
	Andy Lutomirski, Ingo Molnar, Borislav Petkov, Dave Hansen,
	X86 ML, Richard Weinberger, Anton Ivanov, Johannes Berg,
	Andrew Morton

On Nov 29, 2022, at 7:06 AM, Steven Rostedt <rostedt@goodmis.org> wrote:

> On Tue, 29 Nov 2022 04:25:38 +0000
> Nadav Amit <namit@vmware.com> wrote:
> 
> 
>> I will need to further debug it, but this issue does not occur every time.
>> 
>> The kernel didn’t crash exactly - it’s more of a deadlock. I have lockdep
>> enabled, so it is not a deadlock that lockdep knows. Could it be that
>> somehow things just slowed down due to IPIs and mostly-disabled IRQs? I have
>> no idea. I would need to recreate the scenario. 
> 
> You have lockdep enabled and you are running function tracing with stack
> trace on? So you are doing a stack trace on *every* function that is traced?
> 
> I don't think you hit a deadlock, I think you hit a live lock. You could
> possibly slow the system down so much that when an interrupt finishes it's
> time for it to be triggered again, and you never make forward progress.

That might be the issue. Perhaps I have a bug, because my code was supposed
to either enable stack-tracing for selected functions only, or to trace all
functions but *without* stack-tracing.
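
For reference, the two modes I had in mind map to tracefs roughly like this
(a sketch, assuming tracefs is mounted at /sys/kernel/tracing; my tool
drives this programmatically rather than through the shell):

        cd /sys/kernel/tracing

        # Mode 1: stack traces, but only for selected functions
        # (set the filter before enabling func_stack_trace)
        echo __kernel_text_address > set_ftrace_filter
        echo function > current_tracer
        echo 1 > options/func_stack_trace

        # Mode 2: trace every function, but without stack traces
        echo 0 > options/func_stack_trace
        echo > set_ftrace_filter
        echo function > current_tracer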

Thanks for the pointer and sorry for the noise.

Regards,
Nadav



Thread overview: 12+ messages
2022-11-22 19:53 [PATCH 0/3] kprobes: notrace enhancements Nadav Amit
2022-11-22 19:53 ` [PATCH 1/3] kprobes: Mark descendants of core_kernel_text as notrace Nadav Amit
2022-11-22 19:53 ` [PATCH 2/3] lib/usercopy: Allow tracing of usercopy, xarray, iov_iter Nadav Amit
2022-11-22 19:53 ` [PATCH 3/3] compiler: inline does not imply notrace Nadav Amit
2022-11-22 20:09   ` Arnd Bergmann
2022-11-22 20:28     ` Steven Rostedt
2022-11-22 20:51     ` Nadav Amit
2022-11-29  2:36       ` Nadav Amit
2022-11-29  4:15         ` Steven Rostedt
2022-11-29  4:25           ` Nadav Amit
2022-11-29 15:06             ` Steven Rostedt
2022-11-29 18:02               ` Nadav Amit
