linux-kernel.vger.kernel.org archive mirror
* Re: [PATCH v3] tty: tty_io: remove hung_up_tty_fops
  2024-05-01 21:06 97%                                         ` Linus Torvalds
@ 2024-05-01 21:20 94%                                           ` Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-05-01 21:20 UTC (permalink / raw)
  To: Marco Elver
  Cc: paulmck, Tetsuo Handa, Greg Kroah-Hartman, Dmitry Vyukov, syzbot,
	linux-kernel, syzkaller-bugs, Nathan Chancellor, Arnd Bergmann,
	Al Viro, Jiri Slaby

On Wed, 1 May 2024 at 14:06, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> So it would be something like
>
>         const struct file_operations    * __data_racy f_op;
>
> and only the load of f_op would be volatile - not the pointer itself.

Note that in reality, we'd actually prefer the compiler to treat that
"__data_racy" as volatile in the sense of "don't reload this value",
but at the same time be the opposite of volatile in the sense that
using one read multiple times is actually a good idea.

IOW, the problem is rematerialization ("read the value more than once
when there is just one access in the source"), not strictly a "read
the value separately each time it is accessed".

We've actually had that before: it's not that we want each access to
force a read from memory, we want to avoid a TOCTOU race.

Many of our "READ_ONCE()" uses are of that kind, and using "volatile"
sadly generates horrible code, but is the only way to tell the
compiler to not ever rematerialize the value by loading it _twice_.

I'd love to see an extension where "const volatile" basically means
exactly that: the volatile tells the compiler that it can't
rematerialize by doing the load multiple times, but the "const" would
say that if the compiler sees two or more accesses, it can still CSE
them.

Oh well. Thankfully it's not a hugely common code generation problem.
It comes up every once in a while, and the last time this worry came
up, I think we had gcc people tell us that they don't actually ever
rematerialize loads from memory.

Of course, that was an implementation issue, not a guarantee.
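
To make the rematerialization worry concrete, here is a minimal sketch of
the "read once, then reuse the snapshot" pattern. LOAD_ONCE() is a
hypothetical stand-in for the kernel's READ_ONCE(), built on the
GCC/Clang __typeof__ extension; struct buf and checked_len() are invented
for illustration only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for READ_ONCE(): performs one volatile load of x. */
#define LOAD_ONCE(x) (*(const volatile __typeof__(x) *)&(x))

struct buf {
	size_t len;		/* assumed to be modified concurrently elsewhere */
	char data[64];
};

/*
 * Return how many bytes of b->data may be used, or 0 if len is bogus.
 * b->len is loaded exactly once; the check and the return value both
 * use that one snapshot, so they cannot disagree (no TOCTOU window).
 */
size_t checked_len(struct buf *b)
{
	size_t len;

	assert(b != NULL);
	len = LOAD_ONCE(b->len);	/* the single load */

	if (len > sizeof(b->data))
		return 0;
	return len;			/* reuses the snapshot, no second load */
}
```

Without the volatile access, nothing in the language forbids a compiler
from loading b->len again for the return value, which is the
rematerialization case described above. The cost is that every
LOAD_ONCE() forces its own load, even where merging two of them would
have been harmless.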

                           Linus

^ permalink raw reply	[relevance 94%]

* Re: [PATCH v3] tty: tty_io: remove hung_up_tty_fops
  @ 2024-05-01 21:06 97%                                         ` Linus Torvalds
  2024-05-01 21:20 94%                                           ` Linus Torvalds
  0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-05-01 21:06 UTC (permalink / raw)
  To: Marco Elver
  Cc: paulmck, Tetsuo Handa, Greg Kroah-Hartman, Dmitry Vyukov, syzbot,
	linux-kernel, syzkaller-bugs, Nathan Chancellor, Arnd Bergmann,
	Al Viro, Jiri Slaby

On Wed, 1 May 2024 at 13:15, Marco Elver <elver@google.com> wrote:
>
> This is relatively trivial:
>
> #ifdef __SANITIZE_THREAD__
> #define __data_racy volatile
> #endif

I really wouldn't want to make a code generation difference, but I
guess when the sanitizer is on, the compiler generating crap code
isn't a huge deal.

> In some cases it might cause the compiler to complain if converting a
> volatile pointer to a non-volatile pointer

No. Note that it's not the *pointer* that is volatile, it's the
structure member.

So it would be something like

        const struct file_operations    * __data_racy f_op;

and only the load of f_op would be volatile - not the pointer itself.

Of course, if somebody then does "&file->f_op" to get a pointer to a
pointer, *that* would now be a volatile pointer, but I don't see
people doing that.
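
A small sketch of where the qualifier lands, assuming the macro from
Marco's mail plus an empty fallback for normal builds. The demo_* names
are invented for illustration and are not kernel code.

```c
#include <assert.h>
#include <stddef.h>

/* As proposed above; the #else branch is an assumption of this sketch. */
#ifdef __SANITIZE_THREAD__
#define __data_racy volatile
#else
#define __data_racy
#endif

struct demo_ops {
	int (*op)(void);
};

struct demo_file {
	/* Qualifier after the '*': the member is racy, not the pointee. */
	const struct demo_ops * __data_racy f_op;
};

static int demo_op(void)
{
	return 42;
}

static const struct demo_ops demo_fops = { .op = demo_op };

int demo_call(struct demo_file *file)
{
	/*
	 * Reading the member yields a plain "const struct demo_ops *"
	 * value, so no qualifier is discarded in this assignment.
	 */
	const struct demo_ops *ops = file->f_op;

	assert(ops != NULL);
	return ops->op();
}
```

With the sanitizer enabled, only an expression like &file->f_op changes
type (to "const struct demo_ops * volatile *"), which matches the one
caveat mentioned above.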

So I guess this might be a way forward. Anybody want to verify?

Now, the "hung_up_tty_fops" *do* need to be expanded to have hung up
ops for every op that is non-NULL in the normal tty ops. That was a
real bug. We'd also want to add a big comment to the tty fops so that
anybody who adds a new tty f_op member makes sure to populate the hung
up version too.

                Linus

^ permalink raw reply	[relevance 97%]

* Re: [PATCH v3] tty: tty_io: remove hung_up_tty_fops
  @ 2024-05-01 18:56 99%                                   ` Linus Torvalds
    0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-05-01 18:56 UTC (permalink / raw)
  To: paulmck
  Cc: Marco Elver, Tetsuo Handa, Greg Kroah-Hartman, Dmitry Vyukov,
	syzbot, linux-kernel, syzkaller-bugs, Nathan Chancellor,
	Arnd Bergmann, Al Viro, Jiri Slaby

On Wed, 1 May 2024 at 11:46, Paul E. McKenney <paulmck@kernel.org> wrote:
>
> In short, I for one do greatly value KCSAN's help.  Along with that of
> a great many other tools, none of which are perfect, but all of which
> are helpful.

It's not that I don't value what KCSAN does, but I really think this
is a KCSAN issue.

I absolutely *detest* these crazy "randomly add data race annotations".

Could we instead annotate particular structure fields? I don't want to
mark things actually "volatile", because that then causes the compiler
to generate absolutely horrendous code. But some KCSAN equivalent of
"this field has data races, and we don't care" kind of annotation
would be lovely..

                 Linus

^ permalink raw reply	[relevance 99%]

* [tip: x86/urgent] x86/mm: Remove broken vsyscall emulation code from the page fault code
  2024-04-29  1:33 75%         ` Linus Torvalds
    2024-04-30  6:16 51%           ` [tip: x86/urgent] " tip-bot2 for Linus Torvalds
@ 2024-05-01  7:50 50%           ` tip-bot2 for Linus Torvalds
  2 siblings, 0 replies; 200+ results
From: tip-bot2 for Linus Torvalds @ 2024-05-01  7:50 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: syzbot+83e7f982ca045ab4405c, Linus Torvalds, Ingo Molnar,
	Jiri Olsa, Andy Lutomirski, x86, linux-kernel

The following commit has been merged into the x86/urgent branch of tip:

Commit-ID:     02b670c1f88e78f42a6c5aee155c7b26960ca054
Gitweb:        https://git.kernel.org/tip/02b670c1f88e78f42a6c5aee155c7b26960ca054
Author:        Linus Torvalds <torvalds@linux-foundation.org>
AuthorDate:    Mon, 29 Apr 2024 10:00:51 +02:00
Committer:     Ingo Molnar <mingo@kernel.org>
CommitterDate: Wed, 01 May 2024 09:41:43 +02:00

x86/mm: Remove broken vsyscall emulation code from the page fault code

The syzbot-reported stack trace from hell in this discussion thread
actually has three nested page faults:

  https://lore.kernel.org/r/000000000000d5f4fc0616e816d4@google.com

... and I think that's actually the important thing here:

 - the first page fault is from user space, and triggers the vsyscall
   emulation.

 - the second page fault is from __do_sys_gettimeofday(), and that should
   just have caused the exception that then sets the return value to
   -EFAULT

 - the third nested page fault is due to _raw_spin_unlock_irqrestore() ->
   preempt_schedule() -> trace_sched_switch(), which then causes a BPF
   trace program to run, which does that bpf_probe_read_compat(), which
   causes that page fault under pagefault_disable().

It's quite the nasty backtrace, and there's a lot going on.

The problem is literally the vsyscall emulation, which sets

        current->thread.sig_on_uaccess_err = 1;

and that causes the fixup_exception() code to send the signal *despite* the
exception being caught.

And I think that is in fact completely bogus.  It's completely bogus
exactly because it sends that signal even when it *shouldn't* be sent -
like for the BPF user mode trace gathering.

In other words, I think the whole "sig_on_uaccess_err" thing is entirely
broken, because it makes any nested page-faults do all the wrong things.

Now, arguably, I don't think anybody should enable vsyscall emulation any
more, but this test case clearly does.

I think we should just make the "send SIGSEGV" be something that the
vsyscall emulation does on its own, not this broken per-thread state for
something that isn't actually per thread.
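
The difference between the two schemes can be sketched with a toy model.
Everything below is invented for illustration (the toy_* names, the
signal counter, the "unrelated fault" parameter); it only mimics the
control flow described above and is not the kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_thread {
	bool sig_on_uaccess_err;	/* models the per-thread flag */
	int signals_sent;
};

/* Models a kernel-mode fault that was caught by an exception fixup. */
static void toy_fixed_up_fault(struct toy_thread *t)
{
	if (t->sig_on_uaccess_err)
		t->signals_sent++;	/* signal despite the fault being caught */
}

/* Old scheme: the flag is set for the whole emulated call. */
int toy_emulate_with_flag(struct toy_thread *t, bool bad_user_ptr,
			  int unrelated_faults)
{
	bool prev;

	assert(t != NULL);
	prev = t->sig_on_uaccess_err;
	t->sig_on_uaccess_err = true;

	if (bad_user_ptr)
		toy_fixed_up_fault(t);		/* the intended signal */
	for (int i = 0; i < unrelated_faults; i++)
		toy_fixed_up_fault(t);		/* e.g. a trace probe: unintended */

	t->sig_on_uaccess_err = prev;
	return t->signals_sent;
}

/* New scheme: the emulation itself signals, based on its own result. */
int toy_emulate_explicit(struct toy_thread *t, bool bad_user_ptr,
			 int unrelated_faults)
{
	assert(t != NULL);
	for (int i = 0; i < unrelated_faults; i++)
		toy_fixed_up_fault(t);		/* flag is clear: nothing sent */

	if (bad_user_ptr)
		t->signals_sent++;		/* models "goto sigsegv" */
	return t->signals_sent;
}
```

In the toy model, a nested fault that has nothing to do with the
vsyscall argument still produces a signal under the flag scheme, and
produces none when the emulation code decides on its own.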

The x86 page fault code actually tried to deal with the "incorrect nesting"
by having that:

                if (in_interrupt())
                        return;

which ignores the sig_on_uaccess_err case when it happens in interrupts,
but as shown by this example, these nested page faults do not need to be
about interrupts at all.

IOW, I think the only right thing is to remove that horrendously broken
code.

The attached patch looks like the ObviouslyCorrect(tm) thing to do.

NOTE! This broken code goes back to this commit in 2011:

  4fc3490114bb ("x86-64: Set siginfo and context on vsyscall emulation faults")

... and back then the reason was to get all the siginfo details right.
Honestly, I do not for a moment believe that it's worth getting the siginfo
details right here, but part of the commit says:

    This fixes issues with UML when vsyscall=emulate.

... and so my patch to remove this garbage will probably break UML in this
situation.

I do not believe that anybody should be running with vsyscall=emulate in
2024 in the first place, much less if you are doing things like UML. But
let's see if somebody screams.

Reported-and-tested-by: syzbot+83e7f982ca045ab4405c@syzkaller.appspotmail.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Andy Lutomirski <luto@kernel.org>
Link: https://lore.kernel.org/r/CAHk-=wh9D6f7HUkDgZHKmDCHUQmp+Co89GP+b8+z+G56BKeyNg@mail.gmail.com
---
 arch/x86/entry/vsyscall/vsyscall_64.c | 28 +---------------------
 arch/x86/include/asm/processor.h      |  1 +-
 arch/x86/mm/fault.c                   | 33 +--------------------------
 3 files changed, 3 insertions(+), 59 deletions(-)

diff --git a/arch/x86/entry/vsyscall/vsyscall_64.c b/arch/x86/entry/vsyscall/vsyscall_64.c
index a3c0df1..2fb7d53 100644
--- a/arch/x86/entry/vsyscall/vsyscall_64.c
+++ b/arch/x86/entry/vsyscall/vsyscall_64.c
@@ -98,11 +98,6 @@ static int addr_to_vsyscall_nr(unsigned long addr)
 
 static bool write_ok_or_segv(unsigned long ptr, size_t size)
 {
-	/*
-	 * XXX: if access_ok, get_user, and put_user handled
-	 * sig_on_uaccess_err, this could go away.
-	 */
-
 	if (!access_ok((void __user *)ptr, size)) {
 		struct thread_struct *thread = &current->thread;
 
@@ -120,10 +115,8 @@ static bool write_ok_or_segv(unsigned long ptr, size_t size)
 bool emulate_vsyscall(unsigned long error_code,
 		      struct pt_regs *regs, unsigned long address)
 {
-	struct task_struct *tsk;
 	unsigned long caller;
 	int vsyscall_nr, syscall_nr, tmp;
-	int prev_sig_on_uaccess_err;
 	long ret;
 	unsigned long orig_dx;
 
@@ -172,8 +165,6 @@ bool emulate_vsyscall(unsigned long error_code,
 		goto sigsegv;
 	}
 
-	tsk = current;
-
 	/*
 	 * Check for access_ok violations and find the syscall nr.
 	 *
@@ -234,12 +225,8 @@ bool emulate_vsyscall(unsigned long error_code,
 		goto do_ret;  /* skip requested */
 
 	/*
-	 * With a real vsyscall, page faults cause SIGSEGV.  We want to
-	 * preserve that behavior to make writing exploits harder.
+	 * With a real vsyscall, page faults cause SIGSEGV.
 	 */
-	prev_sig_on_uaccess_err = current->thread.sig_on_uaccess_err;
-	current->thread.sig_on_uaccess_err = 1;
-
 	ret = -EFAULT;
 	switch (vsyscall_nr) {
 	case 0:
@@ -262,23 +249,12 @@ bool emulate_vsyscall(unsigned long error_code,
 		break;
 	}
 
-	current->thread.sig_on_uaccess_err = prev_sig_on_uaccess_err;
-
 check_fault:
 	if (ret == -EFAULT) {
 		/* Bad news -- userspace fed a bad pointer to a vsyscall. */
 		warn_bad_vsyscall(KERN_INFO, regs,
 				  "vsyscall fault (exploit attempt?)");
-
-		/*
-		 * If we failed to generate a signal for any reason,
-		 * generate one here.  (This should be impossible.)
-		 */
-		if (WARN_ON_ONCE(!sigismember(&tsk->pending.signal, SIGBUS) &&
-				 !sigismember(&tsk->pending.signal, SIGSEGV)))
-			goto sigsegv;
-
-		return true;  /* Don't emulate the ret. */
+		goto sigsegv;
 	}
 
 	regs->ax = ret;
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 811548f..78e51b0 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -472,7 +472,6 @@ struct thread_struct {
 	unsigned long		iopl_emul;
 
 	unsigned int		iopl_warn:1;
-	unsigned int		sig_on_uaccess_err:1;
 
 	/*
 	 * Protection Keys Register for Userspace.  Loaded immediately on
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 622d12e..bba4e02 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -723,39 +723,8 @@ kernelmode_fixup_or_oops(struct pt_regs *regs, unsigned long error_code,
 	WARN_ON_ONCE(user_mode(regs));
 
 	/* Are we prepared to handle this kernel fault? */
-	if (fixup_exception(regs, X86_TRAP_PF, error_code, address)) {
-		/*
-		 * Any interrupt that takes a fault gets the fixup. This makes
-		 * the below recursive fault logic only apply to a faults from
-		 * task context.
-		 */
-		if (in_interrupt())
-			return;
-
-		/*
-		 * Per the above we're !in_interrupt(), aka. task context.
-		 *
-		 * In this case we need to make sure we're not recursively
-		 * faulting through the emulate_vsyscall() logic.
-		 */
-		if (current->thread.sig_on_uaccess_err && signal) {
-			sanitize_error_code(address, &error_code);
-
-			set_signal_archinfo(address, error_code);
-
-			if (si_code == SEGV_PKUERR) {
-				force_sig_pkuerr((void __user *)address, pkey);
-			} else {
-				/* XXX: hwpoison faults will set the wrong code. */
-				force_sig_fault(signal, si_code, (void __user *)address);
-			}
-		}
-
-		/*
-		 * Barring that, we can do the fixup and be happy.
-		 */
+	if (fixup_exception(regs, X86_TRAP_PF, error_code, address))
 		return;
-	}
 
 	/*
 	 * AMD erratum #91 manifests as a spurious page fault on a PREFETCH

^ permalink raw reply related	[relevance 50%]

* [tip: x86/urgent] x86/mm: Remove broken vsyscall emulation code from the page fault code
  2024-04-29  1:33 75%         ` Linus Torvalds
  @ 2024-04-30  6:16 51%           ` tip-bot2 for Linus Torvalds
  2024-05-01  7:50 50%           ` tip-bot2 for Linus Torvalds
  2 siblings, 0 replies; 200+ results
From: tip-bot2 for Linus Torvalds @ 2024-04-30  6:16 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: syzbot+83e7f982ca045ab4405c, Linus Torvalds, Ingo Molnar,
	Jiri Olsa, Andy Lutomirski, x86, linux-kernel

The following commit has been merged into the x86/urgent branch of tip:

Commit-ID:     c9e1dc9825319392b44d3c22493dc543075933b9
Gitweb:        https://git.kernel.org/tip/c9e1dc9825319392b44d3c22493dc543075933b9
Author:        Linus Torvalds <torvalds@linux-foundation.org>
AuthorDate:    Mon, 29 Apr 2024 10:00:51 +02:00
Committer:     Ingo Molnar <mingo@kernel.org>
CommitterDate: Tue, 30 Apr 2024 08:08:30 +02:00

x86/mm: Remove broken vsyscall emulation code from the page fault code

The syzbot-reported stack trace from hell in this discussion thread
actually has three nested page faults:

  https://lore.kernel.org/r/000000000000d5f4fc0616e816d4@google.com

... and I think that's actually the important thing here:

 - the first page fault is from user space, and triggers the vsyscall
   emulation.

 - the second page fault is from __do_sys_gettimeofday(), and that should
   just have caused the exception that then sets the return value to
   -EFAULT

 - the third nested page fault is due to _raw_spin_unlock_irqrestore() ->
   preempt_schedule() -> trace_sched_switch(), which then causes a BPF
   trace program to run, which does that bpf_probe_read_compat(), which
   causes that page fault under pagefault_disable().

It's quite the nasty backtrace, and there's a lot going on.

The problem is literally the vsyscall emulation, which sets

        current->thread.sig_on_uaccess_err = 1;

and that causes the fixup_exception() code to send the signal *despite* the
exception being caught.

And I think that is in fact completely bogus.  It's completely bogus
exactly because it sends that signal even when it *shouldn't* be sent -
like for the BPF user mode trace gathering.

In other words, I think the whole "sig_on_uaccess_err" thing is entirely
broken, because it makes any nested page-faults do all the wrong things.

Now, arguably, I don't think anybody should enable vsyscall emulation any
more, but this test case clearly does.

I think we should just make the "send SIGSEGV" be something that the
vsyscall emulation does on its own, not this broken per-thread state for
something that isn't actually per thread.

The x86 page fault code actually tried to deal with the "incorrect nesting"
by having that:

                if (in_interrupt())
                        return;

which ignores the sig_on_uaccess_err case when it happens in interrupts,
but as shown by this example, these nested page faults do not need to be
about interrupts at all.

IOW, I think the only right thing is to remove that horrendously broken
code.

The attached patch looks like the ObviouslyCorrect(tm) thing to do.

NOTE! This broken code goes back to this commit in 2011:

  4fc3490114bb ("x86-64: Set siginfo and context on vsyscall emulation faults")

... and back then the reason was to get all the siginfo details right.
Honestly, I do not for a moment believe that it's worth getting the siginfo
details right here, but part of the commit says:

    This fixes issues with UML when vsyscall=emulate.

... and so my patch to remove this garbage will probably break UML in this
situation.

I do not believe that anybody should be running with vsyscall=emulate in
2024 in the first place, much less if you are doing things like UML. But
let's see if somebody screams.

Reported-and-tested-by: syzbot+83e7f982ca045ab4405c@syzkaller.appspotmail.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Andy Lutomirski <luto@kernel.org>
Link: https://lore.kernel.org/r/CAHk-=wh9D6f7HUkDgZHKmDCHUQmp+Co89GP+b8+z+G56BKeyNg@mail.gmail.com
---
 arch/x86/entry/vsyscall/vsyscall_64.c | 25 +-------------------
 arch/x86/include/asm/processor.h      |  1 +-
 arch/x86/mm/fault.c                   | 33 +--------------------------
 3 files changed, 3 insertions(+), 56 deletions(-)

diff --git a/arch/x86/entry/vsyscall/vsyscall_64.c b/arch/x86/entry/vsyscall/vsyscall_64.c
index a3c0df1..3b0f61b 100644
--- a/arch/x86/entry/vsyscall/vsyscall_64.c
+++ b/arch/x86/entry/vsyscall/vsyscall_64.c
@@ -98,11 +98,6 @@ static int addr_to_vsyscall_nr(unsigned long addr)
 
 static bool write_ok_or_segv(unsigned long ptr, size_t size)
 {
-	/*
-	 * XXX: if access_ok, get_user, and put_user handled
-	 * sig_on_uaccess_err, this could go away.
-	 */
-
 	if (!access_ok((void __user *)ptr, size)) {
 		struct thread_struct *thread = &current->thread;
 
@@ -123,7 +118,6 @@ bool emulate_vsyscall(unsigned long error_code,
 	struct task_struct *tsk;
 	unsigned long caller;
 	int vsyscall_nr, syscall_nr, tmp;
-	int prev_sig_on_uaccess_err;
 	long ret;
 	unsigned long orig_dx;
 
@@ -234,12 +228,8 @@ bool emulate_vsyscall(unsigned long error_code,
 		goto do_ret;  /* skip requested */
 
 	/*
-	 * With a real vsyscall, page faults cause SIGSEGV.  We want to
-	 * preserve that behavior to make writing exploits harder.
+	 * With a real vsyscall, page faults cause SIGSEGV.
 	 */
-	prev_sig_on_uaccess_err = current->thread.sig_on_uaccess_err;
-	current->thread.sig_on_uaccess_err = 1;
-
 	ret = -EFAULT;
 	switch (vsyscall_nr) {
 	case 0:
@@ -262,23 +252,12 @@ bool emulate_vsyscall(unsigned long error_code,
 		break;
 	}
 
-	current->thread.sig_on_uaccess_err = prev_sig_on_uaccess_err;
-
 check_fault:
 	if (ret == -EFAULT) {
 		/* Bad news -- userspace fed a bad pointer to a vsyscall. */
 		warn_bad_vsyscall(KERN_INFO, regs,
 				  "vsyscall fault (exploit attempt?)");
-
-		/*
-		 * If we failed to generate a signal for any reason,
-		 * generate one here.  (This should be impossible.)
-		 */
-		if (WARN_ON_ONCE(!sigismember(&tsk->pending.signal, SIGBUS) &&
-				 !sigismember(&tsk->pending.signal, SIGSEGV)))
-			goto sigsegv;
-
-		return true;  /* Don't emulate the ret. */
+		goto sigsegv;
 	}
 
 	regs->ax = ret;
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 811548f..78e51b0 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -472,7 +472,6 @@ struct thread_struct {
 	unsigned long		iopl_emul;
 
 	unsigned int		iopl_warn:1;
-	unsigned int		sig_on_uaccess_err:1;
 
 	/*
 	 * Protection Keys Register for Userspace.  Loaded immediately on
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 622d12e..bba4e02 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -723,39 +723,8 @@ kernelmode_fixup_or_oops(struct pt_regs *regs, unsigned long error_code,
 	WARN_ON_ONCE(user_mode(regs));
 
 	/* Are we prepared to handle this kernel fault? */
-	if (fixup_exception(regs, X86_TRAP_PF, error_code, address)) {
-		/*
-		 * Any interrupt that takes a fault gets the fixup. This makes
-		 * the below recursive fault logic only apply to a faults from
-		 * task context.
-		 */
-		if (in_interrupt())
-			return;
-
-		/*
-		 * Per the above we're !in_interrupt(), aka. task context.
-		 *
-		 * In this case we need to make sure we're not recursively
-		 * faulting through the emulate_vsyscall() logic.
-		 */
-		if (current->thread.sig_on_uaccess_err && signal) {
-			sanitize_error_code(address, &error_code);
-
-			set_signal_archinfo(address, error_code);
-
-			if (si_code == SEGV_PKUERR) {
-				force_sig_pkuerr((void __user *)address, pkey);
-			} else {
-				/* XXX: hwpoison faults will set the wrong code. */
-				force_sig_fault(signal, si_code, (void __user *)address);
-			}
-		}
-
-		/*
-		 * Barring that, we can do the fixup and be happy.
-		 */
+	if (fixup_exception(regs, X86_TRAP_PF, error_code, address))
 		return;
-	}
 
 	/*
 	 * AMD erratum #91 manifests as a spurious page fault on a PREFETCH

^ permalink raw reply related	[relevance 51%]

* Re: [PATCH] x86/mm: Remove broken vsyscall emulation code from the page fault code
  @ 2024-04-30  0:05 99%                     ` Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-04-30  0:05 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Ingo Molnar, Hillf Danton, Peter Anvin, Adrian Bunk, syzbot,
	Tetsuo Handa, andrii, bpf, linux-kernel, syzkaller-bugs

On Mon, 29 Apr 2024 at 16:30, Andy Lutomirski <luto@amacapital.net> wrote:
>
> What strange page table handling do we do for XONLY?

Ahh, I misread set_vsyscall_pgtable_user_bits(). It's used for EMULATE
not for XONLY.

And the code in pti_setup_vsyscall() is just wrong, and does it for all cases.

> So I think we should remove EMULATE before removing XONLY.

Ok, looking at that again, I don't disagree. I misread that XONLY as
mapping it executable, but it is actually just mapping it readable

Yes, let's remove EMULATE, and keep XONLY.

           Linus

^ permalink raw reply	[relevance 99%]

* Re: [PATCH] x86/mm: Remove broken vsyscall emulation code from the page fault code
  2024-04-29 18:47 95%               ` Linus Torvalds
@ 2024-04-29 19:07 98%                 ` Linus Torvalds
    0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-04-29 19:07 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Hillf Danton, Andy Lutomirski, Peter Anvin, Adrian Bunk, syzbot,
	Tetsuo Handa, andrii, bpf, linux-kernel, syzkaller-bugs

On Mon, 29 Apr 2024 at 11:47, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> In particular, I think the page fault emulation code should be moved
> from do_user_addr_fault() to do_kern_addr_fault(), and the horrible
> hack that is fault_in_kernel_space() should be removed (it is what now
> makes a vsyscall page fault be treated as a user address, and the only
> _reason_ for that is that we do the vsyscall handling in the wrong
> place).

Final note: we should also remove the XONLY option entirely, and
remove all the strange page table handling we currently do for it.

It won't work anyway on future CPUs with LASS, and we *have* to
emulate things (and not in the page fault path, I think LASS will
cause a GP fault).

I think the LASS patches ended up just disabling LASS if people wanted
vsyscall, which is probably the worst case.

Again, this is more of a "I think we have more work to do", and should
all happen after that sig_on_uaccess_err stuff is gone.

I guess that patch to rip out sig_on_uaccess_err needs to go into 6.9
and even be marked for stable, since it most definitely breaks some
stuff currently. Even if that "some stuff" is pretty esoteric (ie
"vsyscall=emulate" together with tracing).

                  Linus

^ permalink raw reply	[relevance 98%]

* Re: [PATCH] x86/mm: Remove broken vsyscall emulation code from the page fault code
  2024-04-29 15:51 99%             ` Linus Torvalds
@ 2024-04-29 18:47 95%               ` Linus Torvalds
  2024-04-29 19:07 98%                 ` Linus Torvalds
  0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-04-29 18:47 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Hillf Danton, Andy Lutomirski, Peter Anvin, Adrian Bunk, syzbot,
	Tetsuo Handa, andrii, bpf, linux-kernel, syzkaller-bugs

On Mon, 29 Apr 2024 at 08:51, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> Well, Hillf had it go through the syzbot testing, and Jiri seems to
> have tested it on his setup too, so it looks like it's all good, and
> you can change the "Not-Yet-Signed-off-by" to be a proper sign-off
> from me.

Side note: having looked more at this, I suspect we have room for
further cleanups in this area.

In particular, I think the page fault emulation code should be moved
from do_user_addr_fault() to do_kern_addr_fault(), and the horrible
hack that is fault_in_kernel_space() should be removed (it is what now
makes a vsyscall page fault be treated as a user address, and the only
_reason_ for that is that we do the vsyscall handling in the wrong
place).

I also think that the vsyscall emulation code should just be cleaned
up - instead of looking up the system call number and then calling the
__x64_xyz() system call stub, I think we should just write out the
code in-place. That would get the SIGSEGV cases right too, and I think
it would actually clean up the code. We already do almost everything
but the (trivial) low-level ops anyway.

But I think my patch to remove the 'sig_on_uaccess_err' should just go
in first, since it fixes a real and present issue. And then if
somebody has the energy - or if it turns out that we actually need to
get the SIGSEGV siginfo details right - we can do the other cleanups.
They are mostly unrelated, but the current sig_on_uaccess_err code
just makes everything more complicated and needs to go.

                     Linus

^ permalink raw reply	[relevance 95%]

* Re: [PATCH] x86/mm: Remove broken vsyscall emulation code from the page fault code
  @ 2024-04-29 15:51 99%             ` Linus Torvalds
  2024-04-29 18:47 95%               ` Linus Torvalds
  0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-04-29 15:51 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Hillf Danton, Andy Lutomirski, Peter Anvin, Adrian Bunk, syzbot,
	Tetsuo Handa, andrii, bpf, linux-kernel, syzkaller-bugs

On Mon, 29 Apr 2024 at 01:00, Ingo Molnar <mingo@kernel.org> wrote:
>
> I did some Simple Testing™, and nothing seemed to break in any way visible
> to me, and the diffstat is lovely:
>
>     3 files changed, 3 insertions(+), 56 deletions(-)
>
> Might stick this into tip:x86/mm and see what happens?

Well, Hillf had it go through the syzbot testing, and Jiri seems to
have tested it on his setup too, so it looks like it's all good, and
you can change the "Not-Yet-Signed-off-by" to be a proper sign-off
from me.

It would be good to have some UML testing done, but at the same time I
do think that anybody running UML on modern kernels should be running
a modern user-mode setup too, so while the exact SIGSEGV details may
have been an issue in 2011, I don't think it's reasonable to think
that it's an issue in 2024.

             Linus

^ permalink raw reply	[relevance 99%]

* Re: [PATCH v3] tty: tty_io: remove hung_up_tty_fops
  @ 2024-04-29 15:38 99%                               ` Linus Torvalds
    0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-04-29 15:38 UTC (permalink / raw)
  To: Marco Elver
  Cc: Tetsuo Handa, Greg Kroah-Hartman, Dmitry Vyukov, syzbot,
	linux-kernel, syzkaller-bugs, Nathan Chancellor, Arnd Bergmann,
	Al Viro, Jiri Slaby, Paul E. McKenney

On Mon, 29 Apr 2024 at 06:56, Marco Elver <elver@google.com> wrote:
>
> A WRITE_ONCE() / READ_ONCE() pair would do it here. What should we use instead?

Why would we annotate an "any other code generation is insane" issue at all?

When we do chained pointer loads in

    file->f_op->op()

and we say "I don't care what value I get for the middle one", I don't
see the value in annotating that at all.

There is no compiler that will sanely and validly do a pointer chain
load by *anything* but a load. And it doesn't matter to us if it then
spills and reloads, it will *STILL* be a load.

We're not talking about "extract different bits in separate
operations". We're talking about following one pointer that can point
to two separate static values.
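
As a single-threaded sketch of that shape (the toy_* names are invented,
and -EIO is used only as a plausible "hung up" result): the f_op member
only ever holds the address of one of two static tables, so whichever
value the load observes, the call that follows lands on a valid
function.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct toy_ops {
	int (*op)(void);
};

static int live_op(void)
{
	return 0;
}

static int hung_up_op(void)
{
	return -EIO;
}

/* The only two values f_op can ever take. */
static const struct toy_ops live_fops = { .op = live_op };
static const struct toy_ops hung_up_fops = { .op = hung_up_op };

struct toy_file {
	const struct toy_ops *f_op;
};

/* Models the hangup path switching the table. */
void toy_hangup(struct toy_file *file)
{
	file->f_op = &hung_up_fops;
}

int toy_call(struct toy_file *file)
{
	assert(file != NULL);
	/* One pointer-sized load of f_op, then a load of op through it. */
	return file->f_op->op();
}
```

This does not exercise an actual race. It only shows the data layout the
argument relies on, and it assumes, as the mail does, that an aligned
pointer load is done as a single load in practice, which the C standard
alone does not promise.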

Reality matters. A *lot* more than some "C standard" that we already
have ignored for decades because it's not strong enough.

                       Linus

^ permalink raw reply	[relevance 99%]

* Re: [PATCH] bounds: Use the right number of bits for power-of-two CONFIG_NR_CPUS
  @ 2024-04-29 15:32 99% ` Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-04-29 15:32 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle)
  Cc: linux-kernel, Михаил Новоселов, Ильфат Гаптрахманов, stable,
	Rik van Riel, Mel Gorman, Peter Zijlstra, Ingo Molnar,
	Andrew Morton

On Mon, 29 Apr 2024 at 07:48, Matthew Wilcox (Oracle)
<willy@infradead.org> wrote:
>
> bits_per() rounds up to the next power of two when passed a power of
> two.  This causes crashes on some machines and configurations.

Bah. Your patch is *still* wrong, because bits_per() thinks you need
one bit for a zero value, so when you do

        bits_per(CONFIG_NR_CPUS - 1)

and some insane person has enabled SMP and managed to set
CONFIG_NR_CPUS to 1, the math is *still* broken.

The right thing to do is

        order_base_2(CONFIG_NR_CPUS)

and 'bits_per()' should be avoided, having completely crazy semantics
(you can tell how almost all users actually do "x-1" as the argument).
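
The arithmetic can be checked with a userspace sketch. The toy_*
functions below are written to follow the semantics described above
(they are not the kernel macros themselves, and the exact kernel
definitions are an assumption of this sketch).

```c
#include <assert.h>

/* floor(log2(n)), for n >= 1 */
static unsigned int toy_ilog2(unsigned long n)
{
	unsigned int r = 0;

	assert(n >= 1);
	while (n >>= 1)
		r++;
	return r;
}

/* Bits needed to hold the value n; note that 0 still "needs" one bit. */
unsigned int toy_bits_per(unsigned long n)
{
	return n <= 1 ? 1 : toy_ilog2(n) + 1;
}

/* ceil(log2(n)): bits needed to number n distinct items, 0 for n <= 1. */
unsigned int toy_order_base_2(unsigned long n)
{
	return n > 1 ? toy_ilog2(n - 1) + 1 : 0;
}
```

For CONFIG_NR_CPUS = 4, toy_bits_per(4) gives 3 (one bit too many),
toy_bits_per(4 - 1) gives 2, and toy_order_base_2(4) gives 2. For
CONFIG_NR_CPUS = 1, toy_bits_per(1 - 1) still gives 1, while
toy_order_base_2(1) gives 0, which is the case the "x-1" trick gets
wrong.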

We should probably get rid of that horrid bits_per() entirely.

I applied your patch with that fixed (which admittedly makes it all
*my* patch, but I'm applying it as yours just to get the changelog).

               Linus

^ permalink raw reply	[relevance 99%]

* Re: [syzbot] [bpf?] [trace?] possible deadlock in force_sig_info_to_task
  2024-04-29  0:50 99%       ` Linus Torvalds
@ 2024-04-29  1:33 75%         ` Linus Torvalds
                               ` (2 more replies)
  0 siblings, 3 replies; 200+ results
From: Linus Torvalds @ 2024-04-29  1:33 UTC (permalink / raw)
  To: Hillf Danton, Andy Lutomirski, Peter Anvin, Ingo Molnar, Adrian Bunk
  Cc: syzbot, Tetsuo Handa, andrii, bpf, linux-kernel, syzkaller-bugs

[-- Attachment #1: Type: text/plain, Size: 3180 bytes --]

On Sun, 28 Apr 2024 at 17:50, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
>    But the immediate problem is
> not the user space access, it's that something goes horribly wrong
> *around* it.

Side note: that stack trace from hell actually has three nested page
faults, and I think that's actually the important thing here:

 - the first page fault is from user space, and triggers the vsyscall emulation.

 - the second page fault is from __do_sys_gettimeofday, and that
should just have caused the exception that then sets the return value
to -EFAULT

 - the third nested page fault is due to _raw_spin_unlock_irqrestore
-> preempt_schedule -> trace_sched_switch, which then causes that bpf
trace program to run, which does that bpf_probe_read_compat, which
causes that page fault under pagefault_disable().

It's quite the nasty backtrace, and there's a lot going on.

And I think I finally see what may be going on. The problem is
literally the vsyscall emulation, which sets

        current->thread.sig_on_uaccess_err = 1;

and that causes the fixup_exception() code to send the signal
*despite* the exception being caught.

And I think that is in fact completely bogus.  It's completely bogus
exactly because it sends that signal even when it *shouldn't* be sent
- like for the bpf user mode trace gathering.

In other words, I think the whole "sig_on_uaccess_err" thing is
entirely broken, because it makes any nested page-faults do all the
wrong things.

Now, arguably, I don't think anybody should enable vsyscall emulation
any more, but this test case clearly does.

I think we should just make the "send SIGSEGV" be something that the
vsyscall emulation does on its own, not this broken per-thread state
for something that isn't actually per thread.
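The argument above can be illustrated with a toy userspace model. Everything here is an assumption made for illustration: the `model_*` names, the struct, and the signal counter are hypothetical and are not kernel code. The sketch only shows why a per-thread flag makes an unrelated nested fault (such as a tracing hook reading user memory) raise a signal, and how having the emulation send the signal itself avoids that.

```c
#include <errno.h>
#include <stdbool.h>

/* Toy stand-in for per-thread state; not the kernel's thread_struct. */
struct model_thread {
	bool sig_on_uaccess_err;	/* models the flag the patch removes */
	int signals_sent;		/* counts simulated SIGSEGVs */
};

/* Old behaviour: a caught fault still raises a signal if the flag is set. */
static long model_fixup_old(struct model_thread *t)
{
	if (t->sig_on_uaccess_err)
		t->signals_sent++;
	return -EFAULT;
}

/* New behaviour: a caught fault only reports -EFAULT to its caller. */
static long model_fixup_new(struct model_thread *t)
{
	(void)t;
	return -EFAULT;
}

/* Old emulation: the flag covers everything that runs inside the call. */
int model_emulate_old(struct model_thread *t, bool user_ptr_bad)
{
	bool prev = t->sig_on_uaccess_err;

	t->sig_on_uaccess_err = true;
	if (user_ptr_bad)
		model_fixup_old(t);	/* the signal that was intended */
	model_fixup_old(t);		/* nested fault from a tracing hook */
	t->sig_on_uaccess_err = prev;
	return t->signals_sent;
}

/* New emulation: signal only when the emulated call itself failed. */
int model_emulate_new(struct model_thread *t, bool user_ptr_bad)
{
	long ret = user_ptr_bad ? model_fixup_new(t) : 0;

	model_fixup_new(t);		/* nested tracer fault, silently fixed up */
	if (ret == -EFAULT)
		t->signals_sent++;	/* models the "goto sigsegv" path */
	return t->signals_sent;
}
```

With a valid user pointer the old model still counts one signal, caused purely by the nested tracer fault, while the new model counts none. The real failure in the report is worse than a spurious signal, since the signal is sent from a context holding the runqueue lock, which this model does not attempt to reproduce.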

The x86 page fault code actually tried to deal with the "incorrect
nesting" by having that

                if (in_interrupt())
                        return;

which ignores the sig_on_uaccess_err case when it happens in
interrupts, but as shown by this example, these nested page faults do
not need to be about interrupts at all.

IOW, I think the only right thing is to remove that horrendously broken code.

The attached patch is ENTIRELY UNTESTED, but looks like the
ObviouslyCorrect(tm) thing to do.

NOTE! This broken code goes back to commit 4fc3490114bb ("x86-64: Set
siginfo and context on vsyscall emulation faults") in 2011, and back
then the reason was to get all the siginfo details right. Honestly, I
do not for a moment believe that it's worth getting the siginfo
details right here, but part of the commit says

    This fixes issues with UML when vsyscall=emulate.

and so my patch to remove this garbage will probably break UML in this
situation.

I cannot find it in myself to care, since I do not believe that
anybody should be running with vsyscall=emulate in 2024 in the first
place, much less if you are doing things like UML. But let's see if
somebody screams.

Also, somebody should obviously test my COMPLETELY UNTESTED patch.

Did I make it clear enough that this is UNTESTED and just does
crapectomy on something that is clearly broken?

           Linus "UNTESTED" Torvalds

[-- Attachment #2: patch.diff --]
[-- Type: text/x-patch, Size: 4002 bytes --]

 arch/x86/entry/vsyscall/vsyscall_64.c | 25 ++-----------------------
 arch/x86/include/asm/processor.h      |  1 -
 arch/x86/mm/fault.c                   | 33 +--------------------------------
 3 files changed, 3 insertions(+), 56 deletions(-)

diff --git a/arch/x86/entry/vsyscall/vsyscall_64.c b/arch/x86/entry/vsyscall/vsyscall_64.c
index a3c0df11d0e6..3b0f61b2ea6d 100644
--- a/arch/x86/entry/vsyscall/vsyscall_64.c
+++ b/arch/x86/entry/vsyscall/vsyscall_64.c
@@ -98,11 +98,6 @@ static int addr_to_vsyscall_nr(unsigned long addr)
 
 static bool write_ok_or_segv(unsigned long ptr, size_t size)
 {
-	/*
-	 * XXX: if access_ok, get_user, and put_user handled
-	 * sig_on_uaccess_err, this could go away.
-	 */
-
 	if (!access_ok((void __user *)ptr, size)) {
 		struct thread_struct *thread = &current->thread;
 
@@ -123,7 +118,6 @@ bool emulate_vsyscall(unsigned long error_code,
 	struct task_struct *tsk;
 	unsigned long caller;
 	int vsyscall_nr, syscall_nr, tmp;
-	int prev_sig_on_uaccess_err;
 	long ret;
 	unsigned long orig_dx;
 
@@ -234,12 +228,8 @@ bool emulate_vsyscall(unsigned long error_code,
 		goto do_ret;  /* skip requested */
 
 	/*
-	 * With a real vsyscall, page faults cause SIGSEGV.  We want to
-	 * preserve that behavior to make writing exploits harder.
+	 * With a real vsyscall, page faults cause SIGSEGV.
 	 */
-	prev_sig_on_uaccess_err = current->thread.sig_on_uaccess_err;
-	current->thread.sig_on_uaccess_err = 1;
-
 	ret = -EFAULT;
 	switch (vsyscall_nr) {
 	case 0:
@@ -262,23 +252,12 @@ bool emulate_vsyscall(unsigned long error_code,
 		break;
 	}
 
-	current->thread.sig_on_uaccess_err = prev_sig_on_uaccess_err;
-
 check_fault:
 	if (ret == -EFAULT) {
 		/* Bad news -- userspace fed a bad pointer to a vsyscall. */
 		warn_bad_vsyscall(KERN_INFO, regs,
 				  "vsyscall fault (exploit attempt?)");
-
-		/*
-		 * If we failed to generate a signal for any reason,
-		 * generate one here.  (This should be impossible.)
-		 */
-		if (WARN_ON_ONCE(!sigismember(&tsk->pending.signal, SIGBUS) &&
-				 !sigismember(&tsk->pending.signal, SIGSEGV)))
-			goto sigsegv;
-
-		return true;  /* Don't emulate the ret. */
+		goto sigsegv;
 	}
 
 	regs->ax = ret;
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 811548f131f4..78e51b0d6433 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -472,7 +472,6 @@ struct thread_struct {
 	unsigned long		iopl_emul;
 
 	unsigned int		iopl_warn:1;
-	unsigned int		sig_on_uaccess_err:1;
 
 	/*
 	 * Protection Keys Register for Userspace.  Loaded immediately on
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 622d12ec7f08..bba4e020dd64 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -723,39 +723,8 @@ kernelmode_fixup_or_oops(struct pt_regs *regs, unsigned long error_code,
 	WARN_ON_ONCE(user_mode(regs));
 
 	/* Are we prepared to handle this kernel fault? */
-	if (fixup_exception(regs, X86_TRAP_PF, error_code, address)) {
-		/*
-		 * Any interrupt that takes a fault gets the fixup. This makes
-		 * the below recursive fault logic only apply to a faults from
-		 * task context.
-		 */
-		if (in_interrupt())
-			return;
-
-		/*
-		 * Per the above we're !in_interrupt(), aka. task context.
-		 *
-		 * In this case we need to make sure we're not recursively
-		 * faulting through the emulate_vsyscall() logic.
-		 */
-		if (current->thread.sig_on_uaccess_err && signal) {
-			sanitize_error_code(address, &error_code);
-
-			set_signal_archinfo(address, error_code);
-
-			if (si_code == SEGV_PKUERR) {
-				force_sig_pkuerr((void __user *)address, pkey);
-			} else {
-				/* XXX: hwpoison faults will set the wrong code. */
-				force_sig_fault(signal, si_code, (void __user *)address);
-			}
-		}
-
-		/*
-		 * Barring that, we can do the fixup and be happy.
-		 */
+	if (fixup_exception(regs, X86_TRAP_PF, error_code, address))
 		return;
-	}
 
 	/*
 	 * AMD erratum #91 manifests as a spurious page fault on a PREFETCH

* Re: [syzbot] [bpf?] [trace?] possible deadlock in force_sig_info_to_task
  @ 2024-04-29  0:50 99%       ` Linus Torvalds
  2024-04-29  1:33 75%         ` Linus Torvalds
  0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-04-29  0:50 UTC (permalink / raw)
  To: Hillf Danton
  Cc: syzbot, Tetsuo Handa, andrii, bpf, linux-kernel, syzkaller-bugs

On Sun, 28 Apr 2024 at 16:23, Hillf Danton <hdanton@sina.com> wrote:
>
> So is game like copying from/putting to user with runqueue locked
> at the first place.

No, that should be perfectly fine. In fact, it's even normal. It would
happen any time you have any kind of tracing thing, where looking up
the user mode frame involves doing user accesses with page faults
disabled.

The runqueue lock is irrelevant. As mentioned, it's only a symptom of
something else going wrong.

Now, judging by the syz reproducer, the trigger for this all is almost
certainly that

   bpf$BPF_RAW_TRACEPOINT_OPEN(0x11, &(0x7f00000000c0)={&(0x7f0000000080)='sched_switch\x00', r0}, 0x10)

and that probably causes the instability. But the immediate problem is
not the user space access, it's that something goes horribly wrong
*around* it.

> Plus as per another syzbot report [1], bpf could make trouble with
> workqueue pool locked.

That seems to be entirely different. There's no unexplained page fault
in that case, that seems to be purely a "take lock in the wrong order"
problem.

                Linus

* Linux 6.9-rc6
@ 2024-04-28 20:58 43% Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-04-28 20:58 UTC (permalink / raw)
  To: Linux Kernel Mailing List

Things continue to look pretty normal, and nothing here really stands
out. The biggest single change that stands out in the diffstat is
literally a documentation update, everything else looks pretty small
and spread out.

We have the usual driver updates (mainly networking and gpu but some
updates elsewhere), some filesystem updates (mainly smb, bcachefs,
nfsd reverts, and some ntfs compat updates), and misc other fixes all
over - wifi fixes, arm dts fixlets, yadda yadda.

Nothing looks particularly big or bad. Shortlog appended for details,
please do keep testing,

                Linus

---

Abdelrahman Morsy (1):
      HID: mcp-2221: cancel delayed_work only when CONFIG_IIO is enabled

Akhil R (1):
      dmaengine: tegra186: Fix residual calculation

Alex Deucher (1):
      drm/amdgpu/sdma5.2: use legacy HDP flush for SDMA2/3

Alex Elder (1):
      mailmap: add entries for Alex Elder

Alexey Brodkin (1):
      ARC: [plat-hsdk]: Remove misplaced interrupt-cells property

Alice Ryhl (1):
      rust: don't select CONSTRUCTORS

Andrei Simion (2):
      ARM: dts: microchip: at91-sama7g5ek: Replace regulator-suspend-voltage with the valid property
      ARM: dts: microchip: at91-sama7g54_curiosity: Replace regulator-suspend-voltage with the valid property

Andrew Jones (1):
      RISC-V: selftests: cbo: Ensure asm operands match constraints, take 2

Andrey Ryabinin (1):
      stackdepot: respect __GFP_NOLOCKDEP allocation flag

Andy Shevchenko (2):
      idma64: Don't try to serve interrupts when device is powered off
      gpio: tangier: Use correct type for the IRQ chip data

Andy Yan (1):
      arm64: dts: rockchip: Fix the i2c address of es8316 on Cool Pi CM5

AngeloGioacchino Del Regno (1):
      soc: mediatek: mtk-svs: Append "-thermal" to thermal zone names

Arkadiusz Kubalewski (1):
      dpll: fix dpll_pin_on_pin_register() for multiple parent pins

Arnd Bergmann (2):
      dmaengine: owl: fix register access functions
      mtd: diskonchip: work around ubsan link failure

Arınç ÜNAL (1):
      arm64: dts: rockchip: set PHY address of MT7531 switch to 0x1f

Aswin Unnikrishnan (1):
      rust: remove `params` from `module` macro example

Avraham Stern (1):
      wifi: iwlwifi: mvm: remove old PASN station when adding a new one

Baoquan He (1):
      LoongArch: Fix Kconfig item and left code related to CRASH_CORE

Bartosz Golaszewski (1):
      Bluetooth: qca: set power_ctrl_enabled on NULL returned by gpiod_get_optional()

Ben Zong-You Xie (1):
      perf riscv: Fix the warning due to the incompatible type

Benjamin Tissoires (1):
      MAINTAINERS: update Benjamin's email address

Benno Lossin (1):
      rust: macros: fix soundness issue in `module!` macro

Bibo Mao (1):
      LoongArch: Lately init pmu after smp is online

Bjorn Helgaas (1):
      ARC: Fix typos

Bo-Wei Chen (1):
      docs: rust: fix improper rendering in Arch Support page

Christian Brauner (3):
      ntfs3: serve as alias for the legacy ntfs driver
      ntfs3: enforce read-only when used as legacy ntfs driver
      ntfs3: add legacy ntfs file operations

Christian Gmeiner (1):
      Revert "drm/etnaviv: Expose a few more chipspecs to userspace"

Christian Marangi (2):
      mtd: rawnand: qcom: Fix broken OP_RESET_DEVICE command in qcom_misc_cmd_type_exec()
      mtd: limit OTP NVMEM cell parse to non-NAND devices

Christoph Müllner (2):
      riscv: thead: Rename T-Head PBMT to MAE
      riscv: T-Head: Test availability bit before enabling MAE errata

Chuck Lever (3):
      Revert "svcrdma: Add Write chunk WRs to the RPC's Send WR chain"
      Revert "NFSD: Reschedule CB operations when backchannel rpc_clnt is shut down"
      Revert "NFSD: Convert the callback workqueue to use delayed_work"

Chun-Yi Lee (1):
      Bluetooth: hci_sync: Using hci_cmd_sync_submit when removing Adv Monitor

Clément Léger (2):
      riscv: hwprobe: fix invalid sign extension for RISCV_HWPROBE_EXT_ZVFHMIN
      selftests: sud_test: return correct emulated syscall value on RISC-V

Conor Dooley (1):
      rust: make mutually exclusive with CFI_CLANG

Cristian Ciocaltea (1):
      phy: phy-rockchip-samsung-hdptx: Select CONFIG_RATIONAL

Dan Carpenter (1):
      net: ti: icssg-prueth: Fix signedness bug in prueth_init_rx_chns()

Dan Williams (1):
      cxl/core: Fix potential payload size confusion in cxl_mem_get_poison()

Daniel Golle (2):
      soc: mediatek: mtk-socinfo: depends on CONFIG_SOC_BUS
      net: phy: mediatek-ge-soc: follow netdev LED trigger semantics

Daniel Okazaki (1):
      eeprom: at24: fix memory corruption race condition

Daniele Palmas (1):
      net: usb: qmi_wwan: add Telit FN920C04 compositions

David Bauer (1):
      vxlan: drop packets from invalid src-address

David Christensen (1):
      MAINTAINERS: eth: mark IBM eHEA as an Orphan

David Hildenbrand (1):
      LoongArch: Fix a build error due to __tlb_remove_tlb_entry()

David Howells (4):
      cifs: Fix reacquisition of volume cookie on still-live connection
      cifs: Add tracing for the cifs_tcon struct refcounting
      netfs: Fix writethrough-mode error handling
      netfs: Fix the pre-flush when appending to a file in writethrough mode

David Kaplan (1):
      x86/cpu: Fix check for RDPKRU in __show_regs()

David Sterba (1):
      btrfs: remove colon from messages with state

Derek Foreman (1):
      drm/etnaviv: fix tx clock gating on some GC7000 variants

Dragan Simic (2):
      arm64: dts: rockchip: Remove unsupported node from the Pinebook Pro dts
      arm64: dts: rockchip: Designate the system power controller on QuartzPro64

Duanqiang Wen (3):
      net: libwx: fix alloc msix vectors failed
      Revert "net: txgbe: fix i2c dev name cannot match clkdev"
      Revert "net: txgbe: fix clk_name exceed MAX_DEV_ID limits"

Duoming Zhou (1):
      ax25: Fix netdev refcount issue

Edward Liaw (1):
      selftests/harness: remove use of LINE_MAX

Eric Dumazet (4):
      icmp: prevent possible NULL dereferences from icmp_build_probe()
      net: fix sk_memory_allocated_{add|sub} vs softirqs
      ipv4: check for NULL idev in ip_route_use_hint()
      net: usb: ax88179_178a: stop lying about skb->truesize

Eric Van Hensbergen (1):
      fs/9p: mitigate inode collisions

Erwan Velu (1):
      i40e: Report MFS in decimal base instead of hex

Felix Fietkau (1):
      wifi: mac80211: split mesh fast tx cache into local/proxied/forwarded

Felix Kuehling (3):
      drm/amdkfd: Fix eviction fence handling
      drm/amdgpu: Update BO eviction priorities
      drm/amdkfd: Fix rescheduling of restore worker

Fenghua Yu (1):
      dmaengine: idxd: Fix oops during rmmod on single-CPU platforms

Gabor Juhos (1):
      phy: qcom: m31: match requested regulator name with dt schema

Geert Uytterhoeven (1):
      net: ravb: Fix registered interrupt names

Guanrui Huang (1):
      irqchip/gic-v3-its: Prevent double free on error

Guenter Roeck (1):
      MAINTAINERS: Drop entry for PCA9541 bus master selector

Gustavo A. R. Silva (1):
      smb: client: Fix struct_group() usage in __packed structs

Günther Noack (1):
      fs: Return ENOTTY directly if FS_IOC_GETUUID or FS_IOC_GETFSSYSFSPATH fail

Hangbin Liu (1):
      bridge/br_netlink.c: no need to return void function

Hans de Goede (1):
      phy: ti: tusb1210: Resolve charger-det crash if charger psy is unregistered

Himal Prasad Ghimiray (2):
      drm/xe: Remove sysfs only once on action add failure
      drm/xe: call free_gsc_pkt only once on action add failure

Huacai Chen (1):
      LoongArch: Fix callchain parse error with kernel tracepoint events

Hyunwoo Kim (3):
      tcp: Fix Use-After-Free in tcp_ao_connect_init
      net: gtp: Fix Use-After-Free in gtp_dellink
      net: openvswitch: Fix Use-After-Free in ovs_ct_exit

Ido Schimmel (12):
      mlxsw: core: Unregister EMAD trap using FORWARD action
      mlxsw: core_env: Fix driver initialization with old firmware
      mlxsw: pci: Fix driver initialization with old firmware
      mlxsw: spectrum_acl_tcam: Fix race in region ID allocation
      mlxsw: spectrum_acl_tcam: Fix race during rehash delayed work
      mlxsw: spectrum_acl_tcam: Fix possible use-after-free during activity update
      mlxsw: spectrum_acl_tcam: Fix possible use-after-free during rehash
      mlxsw: spectrum_acl_tcam: Rate limit error message
      mlxsw: spectrum_acl_tcam: Fix memory leak during rehash
      mlxsw: spectrum_acl_tcam: Fix warning during rehash
      mlxsw: spectrum_acl_tcam: Fix incorrect list API usage
      mlxsw: spectrum_acl_tcam: Fix memory leak when canceling rehash work

Igor Artemiev (1):
      wifi: cfg80211: fix the order of arguments for trace events of the tx_rx_evt class

Ikjoon Jang (1):
      arm64: dts: mediatek: mt8183: Add power-domains properity to mfgcfg

Iskander Amara (2):
      arm64: dts: rockchip: enable internal pull-up for Q7_THRM# on RK3399 Puma
      arm64: dts: rockchip: fix alphabetical ordering RK3399 puma

Ismael Luceno (1):
      ipvs: Fix checksumming on GSO of SCTP packets

Jack Xiao (1):
      drm/amdgpu/mes: fix use-after-free issue

Jacob Keller (1):
      ice: fix LAG and VF lock dependency in ice_reset_vf()

Jakub Kicinski (2):
      tools: ynl: don't ignore errors in NLMSG_DONE messages
      eth: bnxt: fix counting packets discarded due to OOM and netpoll

Jarred White (1):
      ACPI: CPPC: Fix bit_offset shift in MASK_VAL() macro

Jason Reeder (1):
      net: ethernet: ti: am65-cpts: Fix PTPv1 message type on TX packets

Jiantao Shan (1):
      LoongArch: Fix access error when read fault on a write-only VMA

Johan Hovold (5):
      phy: qcom: qmp-combo: fix VCO div offset on v5_5nm and v6
      arm64: dts: qcom: sc8280xp: add missing PCIe minimum OPP
      Bluetooth: qca: fix invalid device address check
      Bluetooth: qca: fix NULL-deref on non-serdev suspend
      Bluetooth: qca: fix NULL-deref on non-serdev setup

Johannes Berg (12):
      wifi: mac80211: check EHT/TTLM action frame length
      wifi: mac80211: don't use rate mask for scanning
      Revert "wifi: iwlwifi: bump FW API to 90 for BZ/SC devices"
      wifi: mac80211: fix idle calculation with multi-link
      wifi: mac80211: mlme: re-parse with correct mode
      wifi: mac80211: mlme: fix memory leak
      wifi: mac80211: mlme: re-parse if AP mode is less than client
      wifi: nl80211: don't free NULL coalescing rule
      wifi: mac80211_hwsim: init peer measurement result
      wifi: mac80211: remove link before AP
      wifi: mac80211: fix unaligned le16 access
      wifi: iwlwifi: mvm: fix link ID management

Johannes Thumshirn (1):
      btrfs: fix information leak in btrfs_ioctl_logical_to_ino()

Johannes Weiner (1):
      mm: zswap: fix shrinker NULL crash with cgroup_disable=memory

Jose Ignacio Tornos Martinez (1):
      arm64: dts: rockchip: regulator for sd needs to be always on for BPI-R2Pro

Joshua Ashton (1):
      drm/amd/display: Set color_mgmt_changed to true on unsuspend

Justin Chen (1):
      net: bcmasp: fix memory leak when bringing down interface

Kalle Valo (1):
      wifi: ath11k: use RCU when accessing struct inet6_dev::ac_list

Kenny Levinsen (1):
      HID: i2c-hid: Revert to await reset ACK before reading report descriptor

Kent Overstreet (14):
      bcachefs: Fix null ptr deref in twf from BCH_IOCTL_FSCK_OFFLINE
      bcachefs: node scan: ignore multiple nodes with same seq if interior
      bcachefs: make sure to release last journal pin in replay
      bcachefs: Fix bch2_dev_btree_bitmap_marked_sectors() shift
      bcachefs: KEY_TYPE_error is allowed for reflink
      bcachefs: fix leak in bch2_gc_write_reflink_key
      bcachefs: Fix bio alloc in check_extent_checksum()
      bcachefs: Check for journal entries overruning end of sb clean section
      bcachefs: Fix missing call to bch2_fs_allocator_background_exit()
      bcachefs: bkey_cached.btree_trans_barrier_seq needs to be a ulong
      bcachefs: Tweak btree key cache shrinker so it actually frees
      bcachefs: Fix deadlock in journal write path
      bcachefs: Fix inode early destruction path
      bcachefs: If we run merges at a lower watermark, they must be nonblocking

Kirill A. Shutemov (1):
      x86/tdx: Preserve shared bit on mprotect()

Krzysztof Kozlowski (4):
      arm64: dts: rockchip: drop panel port unit address in GRU Scarlet
      arm64: dts: rockchip: drop redundant pcie-reset-suspend in Scarlet Dumo
      arm64: dts: rockchip: drop redundant disable-gpios in Lubancat 1
      arm64: dts: rockchip: drop redundant disable-gpios in Lubancat 2

Kuniyuki Iwashima (1):
      af_unix: Suppress false-positive lockdep splat for spin_lock() in __unix_gc().

Laine Taffin Altman (1):
      rust: init: remove impl Zeroable for Infallible

Lang Yu (2):
      drm/amdkfd: make sure VM is ready for updating operations
      drm/amdgpu/umsch: don't execute umsch test when GPU is in reset/suspend

Lijo Lazar (2):
      drm/amdgpu: Assign correct bits for SDMA HDP flush
      drm/amd/pm: Restore config space after reset

Linus Torvalds (1):
      Linux 6.9-rc6

Louis Chauvet (1):
      dmaengine: xilinx: xdma: Fix synchronization issue

Luca Weiss (1):
      arm64: dts: qcom: Fix type of "wdog" IRQs for remoteprocs

Lucas Stach (1):
      drm/atomic-helper: fix parameter order in drm_format_conv_state_copy() call

Luiz Augusto von Dentz (3):
      Bluetooth: hci_sync: Use advertised PHYs on hci_le_ext_create_conn_sync
      Bluetooth: hci_event: Fix sending HCI_OP_READ_ENC_KEY_SIZE
      Bluetooth: MGMT: Fix failing to MGMT_OP_ADD_UUID/MGMT_OP_REMOVE_UUID

Lukas Wunner (1):
      igc: Fix LED-related deadlock on driver unbind

MD Danish Anwar (1):
      net: phy: dp83869: Fix MII mode failure

Ma Jun (1):
      drm/amdgpu/pm: Remove gpu_od if it's an empty directory

Maksim Kiselev (1):
      mmc: sdhci-of-dwcmshc: th1520: Increase tuning loop count to 128

Manivannan Sadhasivam (3):
      arm64: dts: qcom: sm8450: Fix the msi-map entries
      arm64: dts: qcom: sm8550: Fix the msi-map entries
      arm64: dts: qcom: sm8650: Fix the msi-map entries

Mantas Pucka (1):
      mmc: sdhci-msm: pervent access to suspended controller

Marcel Ziswiler (1):
      phy: freescale: imx8m-pcie: fix pcie link-up instability

Marek Vasut (1):
      arm64: dts: imx8mp: Fix assigned-clocks for second CSI2

Marios Makassikis (1):
      ksmbd: clear RENAME_NOREPLACE before calling vfs_rename

Matthew Sakai (1):
      dm vdo murmurhash: remove unneeded semicolon

Matthew Wilcox (Oracle) (3):
      mm: create FOLIO_FLAG_FALSE and FOLIO_TYPE_OPS macros
      mm: support page_mapcount() on page_has_type() pages
      mm: turn folio_test_hugetlb into a PageType

Matthias Schiffer (1):
      net: dsa: mv88e6xx: fix supported_interfaces setup in mv88e6250_phylink_get_caps()

Maximilian Luz (2):
      firmware: qcom: uefisecapp: Fix memory related IO errors and crashes
      arm64: dts: qcom: sc8180x: Fix ss_phy_irq for secondary USB controller

Miaohe Lin (1):
      mm/hugetlb: fix DEBUG_LOCKS_WARN_ON(1) when dissolve_free_hugetlb_folio()

Michael Chan (1):
      bnxt_en: Fix error recovery for 5760X (P7) chips

Michael Heimpold (1):
      ARM: dts: imx6ull-tarragon: fix USB over-current polarity

Michal Tomek (1):
      phy: rockchip-snps-pcie3: fix bifurcation on rk3588

Michal Wajdeczko (1):
      drm/xe/guc: Fix arguments passed to relay G2H handlers

Miguel Ojeda (2):
      kbuild: rust: remove unneeded `@rustc_cfg` to avoid ICE
      kbuild: rust: force `alloc` extern to allow "empty" Rust files

Mikhail Kobuk (2):
      phy: marvell: a3700-comphy: Fix out of bounds read
      phy: marvell: a3700-comphy: Fix hardcoded array size

Ming Lei (1):
      dm: restore synchronous close of device mapper block device

Miquel Raynal (2):
      dmaengine: xilinx: xdma: Fix wrong offsets in the buffers addresses in dma descriptor
      dmaengine: xilinx: xdma: Clarify kdoc in XDMA driver

Miri Korenblit (1):
      wifi: iwlwifi: mvm: return uid from iwl_mvm_build_scan_cmd

Muhammad Usama Anjum (2):
      selftests: mm: fix unused and uninitialized variable warning
      selftests: mm: protection_keys: save/restore nr_hugepages value from launch script

Muhammed Efe Cetin (1):
      arm64: dts: rockchip: mark system power controller and fix typo on orangepi-5-plus

Mukul Joshi (2):
      drm/amdgpu: Fix leak when GPU memory allocation fails
      drm/amdkfd: Add VRAM accounting for SVM migration

Nam Cao (2):
      HID: i2c-hid: remove I2C_HID_READ_PENDING flag to prevent lock-up
      fbdev: fix incorrect address computation in deferred IO

Namjae Jeon (4):
      ksmbd: fix slab-out-of-bounds in smb2_allocate_rsp_buf
      ksmbd: validate request buffer size in smb2_allocate_rsp_buf()
      ksmbd: common: use struct_group_attr instead of struct_group for network_open_info
      ksmbd: add continuous availability share parameter

Naohiro Aota (1):
      btrfs: scrub: run relocation repair when/only needed

Nathan Chancellor (2):
      bcachefs: Fix format specifier in validate_bset_keys()
      Bluetooth: Fix type of len in {l2cap,sco}_sock_getsockopt_old()

Nuno Pereira (1):
      HID: nintendo: Fix N64 controller being identified as mouse

Nícolas F. R. A. Prado (5):
      arm64: dts: mediatek: mt8192: Add missing gce-client-reg to mutex
      arm64: dts: mediatek: mt8195: Add missing gce-client-reg to vpp/vdosys
      arm64: dts: mediatek: mt8195: Add missing gce-client-reg to mutex
      arm64: dts: mediatek: mt8195: Add missing gce-client-reg to mutex1
      arm64: dts: mediatek: cherry: Describe CPU supplies

Oleg Nesterov (2):
      sched/isolation: Prevent boot crash when the boot CPU is nohz_full
      sched/isolation: Fix boot crash when maxcpus < first housekeeping CPU

Pablo Neira Ayuso (1):
      netfilter: nf_tables: honor table dormant flag from netdev release event path

Patrik Jakobsson (1):
      drm/gma500: Remove lid code

Paul Geurts (1):
      NFC: trf7970a: disable all regulators on removal

Paulo Alcantara (1):
      smb: client: fix rename(2) regression against samba

Peter Münster (1):
      net: b44: set pause params only when interface is up

Peter Xu (1):
      mm/hugetlb: fix missing hugetlb_lock for resv uncharge

Peyton Lee (1):
      drm/amdgpu/vpe: fix vpe dpm setup failed

Pin-yen Lin (4):
      arm64: dts: mediatek: mt8192-asurada: Update min voltage constraint for MT6315
      arm64: dts: mediatek: mt8195-cherry: Update min voltage constraint for MT6315
      arm64: dts: mediatek: mt8183-kukui: Use default min voltage for MT6358
      arm64: dts: mediatek: mt8186-corsola: Update min voltage constraint for Vgpu

Prathamesh Shete (1):
      gpio: tegra186: Fix tegra186_gpio_is_accessible() check

Prike Liang (1):
      drm/amdgpu: Fix the ring buffer size for queue VM flush

Qu Wenruo (1):
      btrfs: fix wrong block_start calculation for btrfs_drop_extent_map_range()

Quentin Schulz (3):
      arm64: dts: rockchip: enable internal pull-up on Q7_USB_ID for RK3399 Puma
      arm64: dts: rockchip: enable internal pull-up on PCIE_WAKE# for RK3399 Puma
      arm64: dts: rockchip: add regulators for PCIe on RK3399 Puma Haikou

Rafael J. Wysocki (1):
      ACPI: PM: s2idle: Evaluate all Low-Power S0 Idle _DSM functions

Rafał Miłecki (9):
      arm64: dts: mediatek: mt7622: fix clock controllers
      arm64: dts: mediatek: mt7622: fix IR nodename
      arm64: dts: mediatek: mt7622: fix ethernet controller "compatible"
      arm64: dts: mediatek: mt7622: drop "reset-names" from thermal block
      arm64: dts: mediatek: mt7986: drop invalid properties from ethsys
      arm64: dts: mediatek: mt7986: drop "#reset-cells" from Ethernet controller
      arm64: dts: mediatek: mt7986: drop invalid thermal block clock
      arm64: dts: mediatek: mt7986: prefix BPI-R3 cooling maps with "map-"
      arm64: dts: mediatek: mt2712: fix validation errors

Rahul Rameshbabu (4):
      macsec: Enable devices to advertise whether they update sk_buff md_dst during offloads
      ethernet: Add helper for assigning packet type when dest address does not match device address
      macsec: Detect if Rx skb is macsec-related for offloading devices that update md_dst
      net/mlx5e: Advertise mlx5 ethernet driver updates sk_buff md_dst for MACsec

Rajendra Nayak (1):
      arm64: dts: qcom: x1e80100: Fix the compatible for cluster idle states

Rex Zhang (1):
      dmaengine: idxd: Convert spinlock to mutex to lock evl workqueue

Richard Kinder (1):
      wifi: mac80211: ensure beacon is non-S1G prior to extracting the beacon timestamp field

Rob Herring (3):
      dt-bindings: rockchip: grf: Add missing type to 'pcie-phy' node
      dt-bindings: eeprom: at24: Fix ST M24C64-D compatible schema
      arm64: dts: rockchip: Fix USB interface compatible string on kobol-helios64

Sabrina Dubroca (1):
      tls: fix lockless read of strp->msg_ready in ->poll

Samuel Holland (2):
      riscv: Fix TASK_SIZE on 64-bit NOMMU
      riscv: Fix loading 64-bit NOMMU kernels past the start of RAM

Sean Anderson (1):
      dma: xilinx_dpdma: Fix locking

Sean Christopherson (2):
      cpu: Re-enable CPU mitigations by default for !X86 architectures
      cpu: Ignore "mitigations" kernel parameter if CPU_MITIGATIONS=n

Sean Wang (1):
      Bluetooth: btusb: mediatek: Fix double free of skb in coredump

Sebastian Reichel (2):
      phy: rockchip-snps-pcie3: fix clearing PHP_GRF_PCIESEL_CON bits
      phy: rockchip: naneng-combphy: Fix mux on rk3588

Sergei Antonov (1):
      mmc: moxart: fix handling of sgm->consumed, otherwise WARN_ON triggers

Sindhu Devale (1):
      i40e: Do not use WQ_MEM_RECLAIM flag for workqueue

Stephen Boyd (2):
      phy: qcom: qmp-combo: Fix VCO div offset on v3
      phy: qcom: qmp-combo: Fix register base for QSERDES_DP_PHY_MODE

Steve French (2):
      smb3: missing lock when picking channel
      smb3: fix lock ordering potential deadlock in cifs_sync_mid_result

Su Hui (1):
      octeontx2-af: fix the double free in rvu_npc_freemem()

Sudheer Mogilappagari (1):
      iavf: Fix TC config comparison with existing adapter TC config

Sweet Tea Dorminy (1):
      btrfs: fallback if compressed IO fails for ENOSPC

Takayuki Nagata (1):
      cifs: reinstate original behavior again for forceuid/forcegid

Tetsuo Handa (1):
      profiling: Remove create_prof_cpu_mask().

Thorsten Leemhuis (6):
      docs: verify/bisect: use git switch, tag kernel, and various fixes
      docs: verify/bisect: add and fetch stable branches ahead of time
      docs: verify/bisect: proper headlines and more spacing
      docs: verify/bisect: explain testing reverts, patches and newer code
      docs: verify/bisect: describe how to use a build host
      docs: verify/bisect: stable regressions: first stable, then mainline

Tianchen Ding (2):
      sched/eevdf: Always update V if se->on_rq when reweighting
      sched/eevdf: Fix miscalculation in reweight_entity() when se is not curr

Tom Lendacky (1):
      x86/sev: Check for MWAITX and MONITORX opcodes in the #VC handler

Uwe Kleine-König (1):
      MAINTAINERS: Update Uwe's email address, drop SIOX maintenance

Vanshidhar Konda (1):
      ACPI: CPPC: Fix access width used for PCC registers

Vijendar Mukunda (1):
      soundwire: amd: fix for wake interrupt handling for clockstop mode

Vikas Gupta (2):
      bnxt_en: refactor reset close code
      bnxt_en: Fix the PCI-AER routines

Vineet Gupta (2):
      ARC: Fix -Wmissing-prototypes warnings
      ARC: mm: fix new code about cache aliasing

Vinod Koul (1):
      dmaengine: Revert "dmaengine: pl330: issue_pending waits until WFP state"

Vishal Moola (Oracle) (1):
      hugetlb: check for anon_vma prior to folio allocation

WangYuli (1):
      Bluetooth: btusb: Add Realtek RTL8852BE support ID 0x0bda:0x4853

Wedson Almeida Filho (2):
      rust: phy: implement `Send` for `Registration`
      rust: kernel: require `Send` for `Module` implementations

Wenkuan Wang (1):
      x86/CPU/AMD: Add models 0x10-0x1f to the Zen5 range

William Zhang (1):
      mtd: rawnand: brcmnand: Fix data access violation for STB chip

Wolfram Sang (1):
      i2c: smbus: fix NULL function pointer dereference

Xuewen Yan (1):
      sched/eevdf: Prevent vlag from going out of bounds in reweight_eevdf()

Yaraslau Furman (1):
      HID: logitech-dj: allow mice to use all types of reports

Yick Xie (1):
      udp: preserve the connected status if only UDP cmsg

Yu Kuai (1):
      block: fix module reference leakage from bdev_open_by_dev error path

Zhang Lixu (1):
      HID: intel-ish-hid: ipc: Fix dev_err usage with uninitialized dev->devc

Zhu Lingshan (1):
      vDPA: code clean for vhost_vdpa uapi

Zijun Hu (1):
      Bluetooth: btusb: Fix triggering coredump implementation for QCA


* Re: [syzbot] [bpf?] [trace?] possible deadlock in force_sig_info_to_task
  2024-04-28 20:01 91%   ` Linus Torvalds
@ 2024-04-28 20:22 96%     ` Linus Torvalds
    1 sibling, 0 replies; 200+ results
From: Linus Torvalds @ 2024-04-28 20:22 UTC (permalink / raw)
  To: Hillf Danton; +Cc: syzbot, andrii, bpf, linux-kernel, syzkaller-bugs

On Sun, 28 Apr 2024 at 13:01, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> The *problem* here is that the page fault doesn't actually happen on a
> user access, it happens on the *ret* instruction in
> rep_movs_alternative itself (which doesn't have an exception fixup,
> obviously, because no exception is supposed to happen there!):

Actually, there's another page fault deeper in that call chain:

   asm_exc_page_fault+0x26/0x30 arch/x86/include/asm/idtentry.h:623
  RIP: 0010:__put_user_handle_exception+0x0/0x10 arch/x86/lib/putuser.S:125
  Code: 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 0f 01 cb 48 89 01 31
c9 0f 01 ca c3 cc cc cc cc 66 2e 0f 1f 84 00 00 00 00 00 66 90 <0f> 01
ca b9 f2 ff ff ff c3 cc cc cc cc 0f 1f 00 90 90 90 90 90 90
  RSP: 0000:ffffc90004137d98 EFLAGS: 00050202
  RAX: 00000000662d5943 RBX: 0000000000000000 RCX: 0000000000000019
  RDX: 0000000000000000 RSI: ffffffff8bcaca20 RDI: ffffffff8c1eaba0
  RBP: ffffc90004137e50 R08: ffffffff8fa7cd6f R09: 1ffffffff1f4f9ad
  R10: dffffc0000000000 R11: fffffbfff1f4f9ae R12: ffffc90004137de0
  R13: dffffc0000000000 R14: 1ffff92000826fb8 R15: 0000000000000019
   __do_sys_gettimeofday kernel/time/time.c:147 [inline]
   __se_sys_gettimeofday+0xd9/0x240 kernel/time/time.c:140

which is also nonsensical, since that "<0f> 01 ca" code is just the
"CLAC" instruction (which is the first instruction of
__put_user_handle_exception, which is the exception fixup for the
__put_user() functions).

So that seems to be the *first* problem spot, actually. It too is
incomprehensible to me. I must be missing something. A "clac"
instruction cannot take a page fault (except for the instruction fetch
itself, of course).

So if the page fault on the 'RET' instruction was odd, the page fault
on the CLAC is *really* odd.

That original page fault looks like it's just from one of the
put_user() calls in gettimeofday():

                if (put_user(ts.tv_sec, &tv->tv_sec) ||
                    put_user(ts.tv_nsec / 1000, &tv->tv_usec))

and yes, they can fault, but I'm not seeing how that then points to
the CLAC in the exception handler.

                Linus


* Re: [syzbot] [bpf?] [trace?] possible deadlock in force_sig_info_to_task
  @ 2024-04-28 20:01 91%   ` Linus Torvalds
  2024-04-28 20:22 96%     ` Linus Torvalds
    0 siblings, 2 replies; 200+ results
From: Linus Torvalds @ 2024-04-28 20:01 UTC (permalink / raw)
  To: Hillf Danton; +Cc: syzbot, andrii, bpf, linux-kernel, syzkaller-bugs

On Sat, 27 Apr 2024 at 16:13, Hillf Danton <hdanton@sina.com> wrote:
>
> > -> #0 (&sighand->siglock){....}-{2:2}:
> >        check_prev_add kernel/locking/lockdep.c:3134 [inline]
> >        check_prevs_add kernel/locking/lockdep.c:3253 [inline]
> >        validate_chain+0x18cb/0x58e0 kernel/locking/lockdep.c:3869
> >        __lock_acquire+0x1346/0x1fd0 kernel/locking/lockdep.c:5137
> >        lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5754
> >        __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
> >        _raw_spin_lock_irqsave+0xd5/0x120 kernel/locking/spinlock.c:162
> >        force_sig_info_to_task+0x68/0x580 kernel/signal.c:1334
> >        force_sig_fault_to_task kernel/signal.c:1733 [inline]
> >        force_sig_fault+0x12c/0x1d0 kernel/signal.c:1738
> >        __bad_area_nosemaphore+0x127/0x780 arch/x86/mm/fault.c:814
> >        handle_page_fault arch/x86/mm/fault.c:1505 [inline]
>
> Given page fault with runqueue locked, bpf makes trouble instead of
> helping anything in this case.

That's not the odd thing here.

Look, the callchain is:

> >        exc_page_fault+0x612/0x8e0 arch/x86/mm/fault.c:1563
> >        asm_exc_page_fault+0x26/0x30 arch/x86/include/asm/idtentry.h:623
> >        rep_movs_alternative+0x22/0x70 arch/x86/lib/copy_user_64.S:48
> >        copy_user_generic arch/x86/include/asm/uaccess_64.h:110 [inline]
> >        raw_copy_from_user arch/x86/include/asm/uaccess_64.h:125 [inline]
> >        __copy_from_user_inatomic include/linux/uaccess.h:87 [inline]
> >        copy_from_user_nofault+0xbc/0x150 mm/maccess.c:125

IOW, this is all doing a copy from user with page faults disabled, and
it shouldn't have caused a signal to be sent, so the whole
__bad_area_nosemaphore -> force_sig_fault path is bad.

The *problem* here is that the page fault doesn't actually happen on a
user access, it happens on the *ret* instruction in
rep_movs_alternative itself (which doesn't have an exception fixup,
obviously, because no exception is supposed to happen there!):

  RIP: 0010:rep_movs_alternative+0x22/0x70 arch/x86/lib/copy_user_64.S:50
  Code: 90 90 90 90 90 90 90 90 f3 0f 1e fa 48 83 f9 40 73 40 83 f9 08
73 21 85 c9 74 0f 8a 06 88 07 48 ff c7 48 ff c6 48 ff c9 75 f1 <c3> cc
cc cc cc 66 0f 1f 84 00 00 0$
  RSP: 0000:ffffc90004137468 EFLAGS: 00050002
  RAX: ffffffff8205ce4e RBX: dffffc0000000000 RCX: 0000000000000002
  RDX: 0000000000000000 RSI: 0000000000000900 RDI: ffffc900041374e8
  RBP: ffff88802d039784 R08: 0000000000000005 R09: ffffffff8205ce37
  R10: 0000000000000003 R11: ffff88802d038000 R12: 1ffff11005a072f0
  R13: 0000000000000900 R14: 0000000000000002 R15: ffffc900041374e8

where decoding that "Code:" line gives this:

   0: f3 0f 1e fa          endbr64
   4: 48 83 f9 40          cmp    $0x40,%rcx
   8: 73 40                jae    0x4a
   a: 83 f9 08             cmp    $0x8,%ecx
   d: 73 21                jae    0x30
   f: 85 c9                test   %ecx,%ecx
  11: 74 0f                je     0x22
  13: 8a 06                mov    (%rsi),%al
  15: 88 07                mov    %al,(%rdi)
  17: 48 ff c7             inc    %rdi
  1a: 48 ff c6             inc    %rsi
  1d: 48 ff c9             dec    %rcx
  20: 75 f1                jne    0x13
  22:* c3                  ret    <-- trapping instruction

but I have no idea why the 'ret' instruction would take a page fault.
It really shouldn't.
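
As a side note on reading that "Code:" line: the position of the
trapping byte can be checked mechanically. The sketch below is a
userspace illustration, not kernel code, and the helper names
code_trap_index() and code_func_start() are made up for this example.
It counts hex bytes up to the "<xx>" marker, then subtracts the offset
from the "RIP: ...+0x22/0x70" line to find where the function begins
inside the dump. The kernel tree's scripts/decodecode is the tool that
produces a full disassembly; this sketch only does the byte arithmetic.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/*
 * Return the byte index of the "<xx>" marker in an oops "Code:" dump,
 * or -1 if the dump has no marker.  Each run of hex digits counts as
 * one byte.
 */
static int code_trap_index(const char *code)
{
	int idx = 0;

	assert(code != NULL);
	while (*code) {
		if (*code == '<')
			return idx;
		if (isxdigit((unsigned char)*code)) {
			/* consume one hex byte */
			while (isxdigit((unsigned char)*code))
				code++;
			idx++;
		} else {
			code++;
		}
	}
	return -1;
}

/*
 * Index in the dump at which the function starts, given the marker
 * index and the offset reported after the symbol name in the RIP line.
 * A negative result means the dump does not reach back that far.
 */
static int code_func_start(int trap_idx, int rip_offset)
{
	return trap_idx - rip_offset;
}
```

For the dump above, the marker sits at byte 42 and the RIP offset is
0x22, so the function starts at byte 8 of the dump. That matches the
decode, which begins at the endbr64 right after the eight 0x90 padding
bytes.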

Now, it's not like 'ret' instructions can't take page faults, but it
sure shouldn't happen in the *kernel*. The reasons for page faults on
'ret' instructions are:

 - the instruction itself takes a page fault

 - the stack pointer is bogus

 - possibly because the stack *contents* are bogus (at least some x86
instructions that jump will check the destination in the jump
instruction itself, although I didn't think 'ret' was one of them)

but for the kernel, none of these actually seem to be the case
normally. And even abnormally I don't see this being an issue, since
the exception backtrace is happily shown (ie the stack looks all
good).

So this dump is just *WEIRD*.

End result: the problem is not about any kind of deadlock on circular
locking. That's just the symptom of that odd page fault that shouldn't
have happened, and that I don't quite see how it happened.

               Linus


* Re: [GIT PULL] scheduler fixes
  @ 2024-04-28 19:13 99%   ` Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-04-28 19:13 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Vincent Guittot, linux-kernel, Peter Zijlstra, Thomas Gleixner,
	Juri Lelli, Daniel Bristot de Oliveira, Valentin Schneider

On Sun, 28 Apr 2024 at 01:42, Ingo Molnar <mingo@kernel.org> wrote:
>
> Merge note: in case you are wondering about the timestamps, I ninja-rebased
> these two commits shortly before the pull request to fix an annoying typo
> in a commit title.

Hmm. You also forgot to have a diffstat..

               Linus


* Re: [PATCH v3] tty: tty_io: remove hung_up_tty_fops
  @ 2024-04-28 18:50 99%                           ` Linus Torvalds
    0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-04-28 18:50 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: Greg Kroah-Hartman, Dmitry Vyukov, syzbot, linux-kernel,
	syzkaller-bugs, Nathan Chancellor, Arnd Bergmann, Al Viro,
	Jiri Slaby

On Sun, 28 Apr 2024 at 03:20, Tetsuo Handa
<penguin-kernel@i-love.sakura.ne.jp> wrote:
>
>
> If we keep the current model, WRITE_ONCE() is not sufficient.
>
> My understanding is that KCSAN's report like

I find it obnoxious that these are NOT REAL PROBLEMS.

It's KCSAN that is broken and doesn't allow us to just tell it to
sanely ignore things.

I don't want to add stupid and pointless annotations for broken tooling.

Can you instead just ask the KCSAN people to have some mode where we
can annotate a pointer as a "use one or the other", and just shut that
thing up that way?

Because no, we're not adding some idiotic "f_op()" wrapper just to
shut KCSAN up about a non-issue.

                     Linus


* Re: [PATCH v3] tty: tty_io: remove hung_up_tty_fops
  @ 2024-04-27 19:02 96%                       ` Linus Torvalds
    0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-04-27 19:02 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: Greg Kroah-Hartman, Dmitry Vyukov, syzbot, linux-kernel,
	syzkaller-bugs, Nathan Chancellor, Arnd Bergmann, Al Viro,
	Jiri Slaby

On Fri, 26 Apr 2024 at 23:21, Tetsuo Handa
<penguin-kernel@i-love.sakura.ne.jp> wrote:
>
> syzbot is reporting data race between __tty_hangup() and __fput(), for
> filp->f_op readers are not holding tty->files_lock.

Hmm. I looked round, and we actually have another case of this:
snd_card_disconnect() also does

                        mfile->file->f_op = &snd_shutdown_f_ops;

and I don't think tty->files_lock (or, in the sound case,
&card->files_lock) is at all relevant, since the users of f_ops don't
use it or care.

That said, I really think we'd be better off just keeping the current
model, and have the "you get one or the other". For the two cases that
do this, do that f_op replacement with a WRITE_ONCE(), and just make
the rule be that you have to have all the same ops in both the
original and the shutdown version.
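
A minimal userspace sketch of that "you get one or the other" model
follows. Everything in it is an assumption made for illustration: the
WRITE_ONCE() here is a simplified stand-in for the kernel macro, the
structs are cut down to a single op, and live_fops/hung_fops are
hypothetical tables. The point is only that the writer publishes the
new table with one store, and that both tables implement the same ops
so a reader is fine with whichever pointer it happens to see.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for the kernel's WRITE_ONCE() (assumption). */
#define WRITE_ONCE(x, val) (*(volatile __typeof__(x) *)&(x) = (val))

struct file;

struct file_operations {
	int (*read)(struct file *file);
};

struct file {
	const struct file_operations *f_op;
};

static int live_read(struct file *file) { (void)file; return 1; }
static int hung_read(struct file *file) { (void)file; return 0; }

/* Both tables provide the same set of ops, here just ->read. */
static const struct file_operations live_fops = { .read = live_read };
static const struct file_operations hung_fops = { .read = hung_read };

/* Writer side: a single store swaps the whole table. */
static void hangup(struct file *file)
{
	WRITE_ONCE(file->f_op, &hung_fops);
}

/* Reader side: unannotated, it just uses whichever table it loaded. */
static int do_read(struct file *file)
{
	const struct file_operations *ops = file->f_op;

	assert(ops != NULL && ops->read != NULL);
	return ops->read(file);
}
```

Because neither table leaves ->read as NULL, a reader never has to
care which side of the hangup it landed on.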

I do *not* think it's at all better to replace (in two different
places) the racy f_op thing with another racy 'hungup' flag.

The sound case is actually a bit more involved, since it tries to deal
with module counts. That looks potentially bogus. It does

                        fops_get(mfile->file->f_op);

after it has installed the snd_shutdown_f_ops, but in snd_open() it
has done the proper

        replace_fops(file, new_fops);

which actually drops the module count for the old one. So the sound
case seems to possibly leak a module ref on disconnect. That's a
separate issue, though.

                      Linus


* Re: [GIT PULL] ACPI fixes for v6.9-rc6
  2024-04-25 18:58 95% ` Linus Torvalds
  2024-04-25 19:01 99%   ` Linus Torvalds
@ 2024-04-25 19:18 96%   ` Linus Torvalds
  1 sibling, 0 replies; 200+ results
From: Linus Torvalds @ 2024-04-25 19:18 UTC (permalink / raw)
  To: Rafael J. Wysocki, Jarred White
  Cc: ACPI Devel Maling List, Linux PM, Linux Kernel Mailing List

On Thu, 25 Apr 2024 at 11:58, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> And maybe this time, it's not a buggy mess?

Actually, even with MASK_VAL() fixed, I think it's *STILL* a buggy mess.

Why? Because the *uses* of MASK_VAL() seem entirely bogus.

In particular, we have this in cpc_write():

        if (reg->space_id == ACPI_ADR_SPACE_SYSTEM_MEMORY)
                val = MASK_VAL(reg, val);

        switch (size) {
        case 8:
                writeb_relaxed(val, vaddr);
                break;
        case 16:
                writew_relaxed(val, vaddr);
                break;
        ...

and I strongly suspect that it needs to update the 'vaddr' too. Something like

        if (reg->space_id == ACPI_ADR_SPACE_SYSTEM_MEMORY) {
                val = MASK_VAL(reg, val);
  #ifdef __LITTLE_ENDIAN
                vaddr += reg->bit_offset >> 3;
                if (reg->bit_offset & 7)
                        return -EFAULT;
  #else
                /* Fixme if we ever care */
                if (reg->bit_offset)
                        return -EFAULT;
  #endif
        }

*might* be changing this in the right direction, but it's unclear and
I neither know the CPC rules, nor did I think _that_ much about it.

Anyway, the take-away should be that all this code is entirely broken
and somebody didn't think enough about it.

It's possible that that whole cpc_write() ACPI_ADR_SPACE_SYSTEM_MEMORY
case should be done as a 64-bit "read-mask-write" sequence.

Possibly with "reg->bit_offset == 0" and the 8/16/32/64-bit cases as a
special case for "just do the write".

Or, maybe writes with a non-zero bit offset shouldn't be allowed at
all, and there are CPC rules that aren't checked. I don't know. I only
know that the current code is seriously broken.
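
For what a 64-bit "read-mask-write" could look like, here is a rough
userspace sketch. It is an assumption-laden illustration, not the CPPC
code: field_mask() and field_insert() are hypothetical names, the
values are plain integers rather than MMIO accesses, and whether CPC
permits such writes at all is exactly the open question above.

```c
#include <assert.h>
#include <stdint.h>

/* Mask of the low 'bit_width' bits; full width handled separately. */
static uint64_t field_mask(unsigned int bit_width)
{
	assert(bit_width >= 1 && bit_width <= 64);
	return bit_width == 64 ? ~0ull : (1ull << bit_width) - 1;
}

/*
 * Return 'old' with the field at [bit_offset, bit_offset + bit_width)
 * replaced by the low bits of 'val'.  All other bits are preserved.
 */
static uint64_t field_insert(uint64_t old, uint64_t val,
			     unsigned int bit_offset, unsigned int bit_width)
{
	uint64_t mask;

	assert(bit_offset < 64 && bit_width <= 64 - bit_offset);
	mask = field_mask(bit_width) << bit_offset;
	return (old & ~mask) | ((val << bit_offset) & mask);
}
```

A caller would read the current 64-bit value, pass it through
field_insert(), and write the result back. With bit_offset == 0 and a
width of 8/16/32/64 that degenerates into the "just do the write" case.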

                   Linus


* Re: [GIT PULL] ACPI fixes for v6.9-rc6
  2024-04-25 18:58 95% ` Linus Torvalds
@ 2024-04-25 19:01 99%   ` Linus Torvalds
  2024-04-25 19:18 96%   ` Linus Torvalds
  1 sibling, 0 replies; 200+ results
From: Linus Torvalds @ 2024-04-25 19:01 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: ACPI Devel Maling List, Linux PM, Linux Kernel Mailing List

On Thu, 25 Apr 2024 at 11:58, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> When that macro now has had TWO independent bugs, how about you just
> write it out with explicit types and without any broken "helpers":
>
>    static inline u64 MASK_VAL(const struct cpc_reg *reg, u64 val)
>    {
>         u64 mask = (1ull << reg->bit_width)-1;
>         return (val >> reg->bit_offset) & mask;
>    }
>
> which is a few more lines, but doesn't that make it a whole lot more readable?
>
> And maybe this time, it's not a buggy mess?

Just to clarify: that was written in the MUA, and entirely untested.
Somebody should still verify it, but really, with two bugs already,
that macro needs fixing for good, and the "for good" should be
looking at least _something_ like the above.

And despite needing fixing, I've done the pull, since bug #2 is at
least less bad than bug #1 was.

                   Linus


* Re: [GIT PULL] ACPI fixes for v6.9-rc6
  @ 2024-04-25 18:58 95% ` Linus Torvalds
  2024-04-25 19:01 99%   ` Linus Torvalds
  2024-04-25 19:18 96%   ` Linus Torvalds
  0 siblings, 2 replies; 200+ results
From: Linus Torvalds @ 2024-04-25 18:58 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: ACPI Devel Maling List, Linux PM, Linux Kernel Mailing List

On Thu, 25 Apr 2024 at 10:46, Rafael J. Wysocki <rafael@kernel.org> wrote:
>
>  - Fix bit offset computation in MASK_VAL() macro used for applying
>    a bitmask to a new CPPC register value (Jarred White).

Honestly, that code should never have used GENMASK() in the first place.

When a helper macro is more complicated than just doing the obvious
thing without it, it's not a helper macro any more.

Doing

    GENMASK(((reg)->bit_width) - 1, 0)

is literally more work than just doing the obvious thing

    ((1ul << (reg)->bit_width) - 1)

and using that "helper" macro was actually more error-prone too as
shown by this example, because of the whole "inclusive or not" issue.

BUT!

Even with that simpler model, that's still entirely buggy, since 'val'
is 64-bit, and these GENMASK tricks only work on 'long'.

Which happens to be ok on x86-64, of course, and maybe in practice all
fields are less than 32 bits in width anyway so maybe it even works on
32-bit, but this all smells HORRIBLY WRONG.

And no, the fix is *NOT* to make that GENMASK() mindlessly just be
GENMASK_ULL().  That fixes the immediate bug, but it shows - once again
- how mindlessly using "helper macros" is not the right thing to do.

When that macro now has had TWO independent bugs, how about you just
write it out with explicit types and without any broken "helpers":

   static inline u64 MASK_VAL(const struct cpc_reg *reg, u64 val)
   {
        u64 mask = (1ull << reg->bit_width)-1;
        return (val >> reg->bit_offset) & mask;
   }

which is a few more lines, but doesn't that make it a whole lot more readable?

And maybe this time, it's not a buggy mess?
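
A userspace sketch of that explicit-types version, for anyone who
wants to poke at it outside the kernel. The struct cpc_reg_sketch type
and the mask_val() name are hypothetical stand-ins, and the guard for
bit_width == 64 is an addition of this sketch (shifting a 64-bit value
by 64 is undefined in C), on the assumption that full-width registers
can occur.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the two struct cpc_reg fields used here. */
struct cpc_reg_sketch {
	uint8_t bit_width;
	uint8_t bit_offset;
};

static uint64_t mask_val(const struct cpc_reg_sketch *reg, uint64_t val)
{
	uint64_t mask;

	assert(reg->bit_width >= 1 && reg->bit_width <= 64);
	assert(reg->bit_offset < 64);

	/* 1ull << 64 would be undefined, so treat full width separately. */
	mask = reg->bit_width == 64 ? ~0ull : (1ull << reg->bit_width) - 1;
	return (val >> reg->bit_offset) & mask;
}
```

The mask is computed in a 64-bit type throughout, so a 40-bit field
comes out right even where 'long' is only 32 bits.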

               Linus


* Re: regression fixes sitting in subsystem git trees for a week or longer (was: Re: [PATCH v2] HID: i2c-hid: Revert to await reset ACK before reading report descriptor)
  @ 2024-04-24 18:53 99%       ` Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-04-24 18:53 UTC (permalink / raw)
  To: Thorsten Leemhuis
  Cc: Jiri Kosina, Douglas Anderson, Hans de Goede, linux-input,
	linux-kernel, Kenny Levinsen, Benjamin Tissoires,
	Linux regressions mailing list

On Wed, 24 Apr 2024 at 09:56, Thorsten Leemhuis
<regressions@leemhuis.info> wrote:
>
> out of interest: what's your stance on regression fixes sitting in
> subsystem git trees for a week or longer before being mainlined?

Annoying, but probably depends on circumstances. The fact that it took
a while to even be noticed presumably means it's not common or holding
anything up.

That said, the last HID pull I have is from March 14. If the issue is
just that there's nothing else happening, I think people should just
point me to the patch and say "can you apply this single fix?"

                         Linus


* Re: [PATCH v2] tty: n_gsm: restrict tty devices to attach
  @ 2024-04-23 16:37 99%               ` Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-04-23 16:37 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: Greg Kroah-Hartman, Jiri Slaby, Andrew Morton, Starke, Daniel,
	LKML, linux-security-module

On Tue, 23 Apr 2024 at 08:26, Tetsuo Handa
<penguin-kernel@i-love.sakura.ne.jp> wrote:
>
> On 2024/04/22 1:04, Linus Torvalds wrote:
> >
> > Actually, another option would be to just return an error at 'set_ldisc()' time.
>
> This patch works for me. You can propose a formal patch.

Ok, I wrote a commit message, added your tested-by, and sent it out

    https://lore.kernel.org/all/20240423163339.59780-1-torvalds@linux-foundation.org/

let's see if anybody has better ideas, but that patch at least looks
palatable to me.

                  Linus


* [PATCH] tty: add the option to have a tty reject a new ldisc
@ 2024-04-23 16:33 89% Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-04-23 16:33 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: linux-kernel, Linus Torvalds, Tetsuo Handa, Jiri Slaby,
	Andrew Morton, Daniel Starke, syzbot

... and use it to limit the virtual terminals to just N_TTY.  They are
kind of special, and in particular, the "con_write()" routine violates
the "writes cannot sleep" rule that some ldiscs rely on.

This avoids the

   BUG: sleeping function called from invalid context at kernel/printk/printk.c:2659

when N_GSM has been attached to a virtual console, and gsmld_write()
calls con_write() while holding a spinlock, and con_write() then tries
to get the console lock.

Tested-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Cc: Jiri Slaby <jirislaby@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Daniel Starke <daniel.starke@siemens.com>
Reported-by: syzbot <syzbot+dbac96d8e73b61aa559c@syzkaller.appspotmail.com>
Closes: https://syzkaller.appspot.com/bug?extid=dbac96d8e73b61aa559c
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 drivers/tty/tty_ldisc.c    |  6 ++++++
 drivers/tty/vt/vt.c        | 10 ++++++++++
 include/linux/tty_driver.h |  8 ++++++++
 3 files changed, 24 insertions(+)

diff --git a/drivers/tty/tty_ldisc.c b/drivers/tty/tty_ldisc.c
index 3f68e213df1f..d80e9d4c974b 100644
--- a/drivers/tty/tty_ldisc.c
+++ b/drivers/tty/tty_ldisc.c
@@ -545,6 +545,12 @@ int tty_set_ldisc(struct tty_struct *tty, int disc)
 		goto out;
 	}
 
+	if (tty->ops->ldisc_ok) {
+		retval = tty->ops->ldisc_ok(tty, disc);
+		if (retval)
+			goto out;
+	}
+
 	old_ldisc = tty->ldisc;
 
 	/* Shutdown the old discipline. */
diff --git a/drivers/tty/vt/vt.c b/drivers/tty/vt/vt.c
index 9b5b98dfc8b4..cd87e3d1291e 100644
--- a/drivers/tty/vt/vt.c
+++ b/drivers/tty/vt/vt.c
@@ -3576,6 +3576,15 @@ static void con_cleanup(struct tty_struct *tty)
 	tty_port_put(&vc->port);
 }
 
+/*
+ * We can't deal with anything but the N_TTY ldisc,
+ * because we can sleep in our write() routine.
+ */
+static int con_ldisc_ok(struct tty_struct *tty, int ldisc)
+{
+	return ldisc == N_TTY ? 0 : -EINVAL;
+}
+
 static int default_color           = 7; /* white */
 static int default_italic_color    = 2; // green (ASCII)
 static int default_underline_color = 3; // cyan (ASCII)
@@ -3695,6 +3704,7 @@ static const struct tty_operations con_ops = {
 	.resize = vt_resize,
 	.shutdown = con_shutdown,
 	.cleanup = con_cleanup,
+	.ldisc_ok = con_ldisc_ok,
 };
 
 static struct cdev vc0_cdev;
diff --git a/include/linux/tty_driver.h b/include/linux/tty_driver.h
index 7372124fbf90..dd4b31ce6d5d 100644
--- a/include/linux/tty_driver.h
+++ b/include/linux/tty_driver.h
@@ -154,6 +154,13 @@ struct serial_struct;
  *
  *	Optional. Called under the @tty->termios_rwsem. May sleep.
  *
+ * @ldisc_ok: ``int ()(struct tty_struct *tty, int ldisc)``
+ *
+ *	This routine allows the @tty driver to decide if it can deal
+ *	with a particular @ldisc.
+ *
+ *	Optional. Called under the @tty->ldisc_sem and @tty->termios_rwsem.
+ *
  * @set_ldisc: ``void ()(struct tty_struct *tty)``
  *
  *	This routine allows the @tty driver to be notified when the device's
@@ -372,6 +379,7 @@ struct tty_operations {
 	void (*hangup)(struct tty_struct *tty);
 	int (*break_ctl)(struct tty_struct *tty, int state);
 	void (*flush_buffer)(struct tty_struct *tty);
+	int (*ldisc_ok)(struct tty_struct *tty, int ldisc);
 	void (*set_ldisc)(struct tty_struct *tty);
 	void (*wait_until_sent)(struct tty_struct *tty, int timeout);
 	void (*send_xchar)(struct tty_struct *tty, u8 ch);
-- 
2.44.0.330.g4d18c88175
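
The behaviour of the optional hook in the patch above can be modelled
in a few lines of userspace C. This is a sketch under stated
assumptions, not the kernel code: the structs are reduced to the one
field each that matters, set_ldisc() is a hypothetical stand-in for
tty_set_ldisc() without any locking, and con_ops_sketch and
serial_ops_sketch are made-up tables. The ldisc numbers follow the
uapi header values as far as I know, but treat them as illustrative.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define N_TTY      0
#define N_GSM0710 21

struct tty_struct;

struct tty_operations {
	/* Optional: a NULL hook means the driver accepts any ldisc. */
	int (*ldisc_ok)(struct tty_struct *tty, int ldisc);
};

struct tty_struct {
	const struct tty_operations *ops;
	int ldisc;	/* simplified: just the ldisc number */
};

/* Same logic as con_ldisc_ok() in the patch: N_TTY only. */
static int con_ldisc_ok(struct tty_struct *tty, int ldisc)
{
	(void)tty;
	return ldisc == N_TTY ? 0 : -EINVAL;
}

/* Ask the driver first; only switch if it does not object. */
static int set_ldisc(struct tty_struct *tty, int disc)
{
	assert(tty != NULL && tty->ops != NULL);

	if (tty->ops->ldisc_ok) {
		int retval = tty->ops->ldisc_ok(tty, disc);

		if (retval)
			return retval;
	}
	tty->ldisc = disc;
	return 0;
}

static const struct tty_operations con_ops_sketch = {
	.ldisc_ok = con_ldisc_ok,
};
static const struct tty_operations serial_ops_sketch = {
	.ldisc_ok = NULL,
};
```

In this model a tty using con_ops_sketch refuses N_GSM0710 with
-EINVAL and keeps its old ldisc, while a tty whose driver leaves the
hook unset switches as before.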



* Re: [PATCH 00/30] PREEMPT_AUTO: support lazy rescheduling
  @ 2024-04-23 16:13 97%   ` Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-04-23 16:13 UTC (permalink / raw)
  To: Shrikanth Hegde
  Cc: Ankur Arora, Thomas Gleixner, peterz, paulmck, akpm, luto, bp,
	dave.hansen, hpa, mingo, juri.lelli, vincent.guittot, willy,
	mgorman, jpoimboe, mark.rutland, jgross, andrew.cooper3, bristot,
	mathieu.desnoyers, geert, glaubitz, anton.ivanov, mattst88,
	krypton, rostedt, David.Laight, richard, mjguzik, jon.grimm,
	bharata, raghavendra.kt, boris.ostrovsky, konrad.wilk, LKML,
	Michael Ellerman, Nicholas Piggin

On Tue, 23 Apr 2024 at 08:23, Shrikanth Hegde <sshegde@linux.ibm.com> wrote:
>
> Tried this patch on PowerPC by defining LAZY similar to x86. The change is below.
> Kept it at PREEMPT=none for PREEMPT_AUTO.
>
> Running into soft lockup on large systems (40Cores, SMT8) and seeing close to 100%
> regression on small system ( 12 Cores, SMT8). More details are after the patch.
>
> Are these the only arch bits that need to be defined? am I missing something very
> basic here? will try to debug this further. Any inputs?

I don't think powerpc uses the generic *_exit_to_user_mode() helper
functions, so you'll need to also add that logic to the low-level
powerpc code.

IOW, on x86, with this patch series, patch 06/30 did this:

-               if (ti_work & _TIF_NEED_RESCHED)
+               if (ti_work & (_TIF_NEED_RESCHED | _TIF_NEED_RESCHED_LAZY))
                        schedule();

in kernel/entry/common.c exit_to_user_mode_loop().

But that works on x86 because it uses the irqentry_exit_to_user_mode().

On PowerPC, I think you need to at least fix up

    interrupt_exit_user_prepare_main()

similarly (and any other paths like that - I used to know the powerpc
code, but that was long long LOOONG ago).
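
To make the failure mode concrete, here is a small userspace model of
the two checks. It is only a sketch: the bit values, the _SKETCH flag
names, exit_work_old()/exit_work_new() and the counting callback are
all hypothetical, and the real exit paths do much more than this.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bit positions; the real values are per-architecture. */
#define TIF_NEED_RESCHED_SKETCH       (1ul << 0)
#define TIF_NEED_RESCHED_LAZY_SKETCH  (1ul << 1)

static int schedule_calls;

/* Stand-in for schedule(): just counts how often it was invoked. */
static void count_schedule(void)
{
	schedule_calls++;
}

/* Exit path that was never taught about the lazy bit. */
static void exit_work_old(unsigned long ti_work, void (*schedule)(void))
{
	assert(schedule != NULL);
	if (ti_work & TIF_NEED_RESCHED_SKETCH)
		schedule();
}

/* Exit path after the x86-style change: either bit reschedules. */
static void exit_work_new(unsigned long ti_work, void (*schedule)(void))
{
	assert(schedule != NULL);
	if (ti_work & (TIF_NEED_RESCHED_SKETCH | TIF_NEED_RESCHED_LAZY_SKETCH))
		schedule();
}
```

With only the lazy bit set, exit_work_old() returns to user mode
without ever calling the scheduler, which is the kind of thing that
would be consistent with the soft lockups reported above.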

                Linus


* Linux 6.9-rc5
@ 2024-04-21 19:53 47% Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-04-21 19:53 UTC (permalink / raw)
  To: Linux Kernel Mailing List

Another week, another -rc. Things look fairly normal, although the
diffstat for rc5 looks a bit wonky due to another rash of bcachefs
fixes, and a perf tools header sync with the main kernel headers.

But if you ignore those oddities, it all looks pretty normal and
things appear fairly calm. Which is just as well, since the first part
of the week I was on a quick trip to Seattle, and the second part of
the week I've been doing a passable imitation of the Fontana di Trevi,
except my medium is mucus. Sooo much mucus.

Anyway, moving on..

Apart from the already mentioned bcachefs and header updates, it's
mostly various drivers (gpu, networking, usb, tty, sound..) some
architecture updates (mainly x86 kvm), some small MM patches, some
core networking, a couple of small filesystem updates (fuse, 9p, nfsd)
and just random singleton patches elsewhere.

Shortlog appended for anybody who wants to get a feel for the details,

              Linus

---

Ai Chao (1):
      ALSA: hda/realtek - Enable audio jacks of Haier Boyue G42 with ALC269VC

Alan Stern (1):
      fs: sysfs: Fix reference leak in sysfs_break_active_protection()

Alex Deucher (3):
      Revert "drm/amd/display: fix USB-C flag update after enc10 feature init"
      drm/radeon: make -fstrict-flex-arrays=3 happy
      drm/radeon: silence UBSAN warning (v3)

Alexander Usyskin (1):
      mei: me: disable RPL-S on SPS and IGN firmwares

Amir Goldstein (2):
      fuse: fix wrong ff->iomode state changes from parallel dio write
      fuse: fix parallel dio write on file open in passthrough mode

Andrew Jones (1):
      KVM: selftests: fix supported_flags for riscv

Andy Shevchenko (5):
      gpio: wcove: Use -ENOTSUPP consistently
      gpio: crystalcove: Use -ENOTSUPP consistently
      serial: 8250_pci: Remove redundant PCI IDs
      serial: core: Clearing the circular buffer before NULLifying it
      gpiolib: swnode: Remove wrong header inclusion

AngeloGioacchino Del Regno (2):
      usb: typec: mux: it5205: Fix ChipID value typo
      dt-bindings: pwm: mediatek,pwm-disp: Document power-domains property

Anshuman Khandual (1):
      arm64/hugetlb: Fix page table walk in huge_pte_alloc()

Ard Biesheuvel (2):
      arm64/head: Drop unnecessary pre-disable-MMU workaround
      arm64/head: Disable MMU at EL2 before clearing HCR_EL2.E2H

Arınç ÜNAL (2):
      net: dsa: mt7530: fix mirroring frames received on local port
      net: dsa: mt7530: fix port mirroring for MT7988 SoC switch

Asbjørn Sloth Tønnesen (2):
      net: sparx5: flower: fix fragment flags handling
      octeontx2-pf: fix FLOW_DIS_IS_FRAGMENT implementation

Bart Van Assche (1):
      scsi: core: Fix handling of SCMD_FAIL_IF_RECOVERING

Borislav Petkov (AMD) (1):
      x86/retpolines: Enable the default thunk warning only on relevant configs

Carlos Llamas (1):
      binder: check offset alignment in binder_get_object()

Carolina Jubran (2):
      net/mlx5e: Acquire RTNL lock before RQs/SQs activation/deactivation
      net/mlx5e: Prevent deadlock while disabling aRFS

Chao Yu (1):
      bcachefs: fix error path of __bch2_read_super()

Christian A. Ehrhardt (1):
      usb: typec: ucsi: Fix connector check on init

Christian König (3):
      drm/ttm: stop pooling cached NUMA pages v2
      drm/amdgpu: remove invalid resource->start check v2
      drm/amdgpu: fix visible VRAM handling during faults

Christoph Hellwig (1):
      block: propagate partition scanning errors to the BLKRRPART ioctl

Christophe JAILLET (1):
      KVM: SVM: Remove a useless zeroing of allocated memory

Chuanhong Guo (1):
      USB: serial: option: add support for Fibocom FM650/FG650

Coia Prant (1):
      USB: serial: option: add Lonsung U8300/U9300 product

Dan Carpenter (1):
      serial: 8250_lpc18xx: disable clks on error in probe()

Daniel Golle (1):
      clk: mediatek: mt7988-infracfg: fix clocks for 2nd PCIe port

Daniele Palmas (1):
      USB: serial: option: add Telit FN920C04 rmnet compositions

Danny Lin (1):
      fuse: fix leaked ENOSYS error on first statx call

Dave Airlie (1):
      nouveau: fix instmem race condition around ptr stores

David Hildenbrand (1):
      mm/madvise: make MADV_POPULATE_(READ|WRITE) handle VM_FAULT_RETRY properly

David Matlack (4):
      KVM: x86/mmu: Write-protect L2 SPTEs in TDP MMU when clearing dirty status
      KVM: x86/mmu: Remove function comments above clear_dirty_{gfn_range,pt_masked}()
      KVM: x86/mmu: Fix and clarify comments about clearing D-bit vs. write-protecting
      KVM: selftests: Add coverage of EPT-disabled to vmx_dirty_log_test

Dmitry Baryshkov (2):
      drm/panel: visionox-rm69299: don't unregister DSI device
      drm/panel: novatek-nt36682e: don't unregister DSI device

Dmitry Safonov (4):
      selftests/tcp_ao: Make RST tests less flaky
      selftests/tcp_ao: Zero-init tcp_ao_info_opt
      selftests/tcp_ao: Fix fscanf() call for format-security
      selftests/tcp_ao: Printing fixes to confirm with format-security

Emil Kronborg (1):
      serial: mxs-auart: add spinlock around changing cts state

Eric Biggers (1):
      x86/cpufeatures: Fix dependencies for GFNI, VAES, and VPCLMULQDQ

Eric Dumazet (1):
      net/sched: Fix mirred deadlock on device recursion

Eric Van Hensbergen (2):
      fs/9p: remove erroneous nlink init from legacy stat2inode
      fs/9p: Revert "fs/9p: fix dups even in uncached mode"

Fabio Estevam (1):
      usb: misc: onboard_usb_hub: Disable the USB hub clock on failure

Felix Fietkau (1):
      net: ethernet: mtk_eth_soc: fix WED + wifi reset

Felix Kuehling (1):
      drm/amdkfd: Fix memory leak in create_process failure

Finn Thain (1):
      serial/pmac_zilog: Remove flawed mitigation for rx irq flood

Florian Westphal (1):
      netfilter: nft_set_pipapo: do not free live element

Gerd Bayer (1):
      s390/ism: Properly fix receive message buffer allocation

Gil Fine (2):
      thunderbolt: Fix wake configurations after device unplug
      thunderbolt: Avoid notify PM core about runtime PM resume

Greg Kroah-Hartman (1):
      Revert "usb: cdc-wdm: close race between read and workqueue"

Hans de Goede (1):
      serial: 8250_dw: Revert: Do not reclock if already at correct rate

Hou Wenlong (1):
      x86/fred: Fix incorrect error code printout in fred_bad_type()

Huayu Zhang (1):
      ALSA: hda/realtek: Fix volumn control of ThinkBook 16P Gen4

Jakub Kicinski (2):
      inet: bring NLM_DONE out to a separate recv() again
      selftests: kselftest_harness: fix Clang warning about zero-length format

James Bottomley (1):
      MAINTAINERS: update to working email address

Jason A. Donenfeld (2):
      random: handle creditable entropy from atomic process context
      Revert "vmgenid: emit uevent when VMGENID updates"

Jason Gunthorpe (1):
      iommufd: Add missing IOMMUFD_DRIVER kconfig for the selftest

Jeff Layton (1):
      9p: explicitly deny setlease attempts

Jeongjun Park (1):
      nilfs2: fix OOB in nilfs_set_de_type

Jerry Meng (1):
      USB: serial: option: support Quectel EM060K sub-models

Joakim Sindholt (4):
      fs/9p: only translate RWX permissions for plain 9P2000
      fs/9p: translate O_TRUNC into OTRUNC
      fs/9p: fix the cache always being enabled on files with qid flags
      fs/9p: drop inodes immediately on non-.L too

Jose Ignacio Tornos Martinez (1):
      net: usb: ax88179_178a: avoid writing the mac address before first reading

Josh Poimboeuf (1):
      x86/bugs: Fix BHI retpoline check

Kai-Heng Feng (1):
      usb: Disable USB3 LPM at shutdown

Kees Cook (1):
      ubsan: Add awareness of signed integer overflow traps

Kent Overstreet (23):
      bcachefs: Don't use bch2_btree_node_lock_write_nofail() in btree split path
      bcachefs: Fix UAFs of btree_insert_entry array
      bcachefs: Check for packed bkeys that are too big
      bcachefs: btree node scan: handle encrypted nodes
      bcachefs: fix unsafety in bch2_extent_ptr_to_text()
      bcachefs: fix unsafety in bch2_stripe_to_text()
      bcachefs: fix race in bch2_btree_node_evict()
      bcachefs: don't queue btree nodes for rewrites during scan
      bcachefs: Standardize helpers for printing enum strs with bounds checks
      bcachefs: Go rw if running any explicit recovery passes
      bcachefs: Fix deadlock in journal replay
      bcachefs: Fix missing write refs in fs fio paths
      bcachefs: Run merges at BCH_WATERMARK_btree
      bcachefs: Disable merges from interior update path
      bcachefs: Fix btree node merging on write buffer btrees
      bcachefs: add missing bounds check in __bch2_bkey_val_invalid()
      bcachefs: Interior known are required to have known key types
      bcachefs: add safety checks in bch2_btree_node_fill()
      bcachefs: Fix bch2_btree_node_fill() for !path
      bcachefs: sysfs internal/trigger_journal_flush
      bcachefs: bch_member.btree_allocated_bitmap
      bcachefs: Check for backpointer bucket_offset >= bucket size
      bcachefs: set_btree_iter_dontneed also clears should_be_locked

Konrad Dybcio (1):
      interconnect: qcom: x1e80100: Remove inexistent ACV_PERF BCM

Krzysztof Kozlowski (2):
      usb: phy: MAINTAINERS: mark Freescale USB PHY as orphaned
      gpio: lpc32xx: fix module autoloading

Kuniyuki Iwashima (2):
      af_unix: Call manage_oob() for every skb in unix_stream_read_generic().
      af_unix: Don't peek OOB data without MSG_OOB.

Kyle Tso (1):
      usb: typec: tcpm: Correct the PDO counting in pd_set

Lei Chen (1):
      tun: limit printing rate when illegal packet received by tun dev

Li Nan (1):
      blk-iocost: do not WARN if iocg was already offlined

Linus Torvalds (1):
      Linux 6.9-rc5

Lokesh Gidra (1):
      userfaultfd: change src_folio after ensuring it's unpinned in UFFDIO_MOVE

Lyude Paul (2):
      drm/nouveau/kms/nv50-: Disable AUX bus for disconnected DP ports
      drm/nouveau/dp: Don't probe eDP ports twice harder

Maarten Lankhorst (1):
      drm/xe: Fix bo leak in intel_fb_bo_framebuffer_init

Manivannan Sadhasivam (1):
      scsi: ufs: qcom: Add missing interconnect bandwidth values for Gear 5

Marcin Szycik (1):
      ice: Fix checking for unsupported keys on non-tunnel device

Mario Limonciello (4):
      platform/x86/amd: pmf: Decrease error message to debug
      platform/x86/amd: pmf: Add infrastructure for quirking supported funcs
      platform/x86/amd: pmf: Add quirk for ROG Zephyrus G14
      platform/x86/amd/pmc: Extend Framework 13 quirk to more BIOSes

Mark Zhang (1):
      RDMA/cm: Print the old state when cm_destroy_id gets timeout

Masami Hiramatsu (Google) (1):
      bootconfig: Fix the kerneldoc of _xbc_exit()

Mathias Nyman (1):
      xhci: Fix root hub port null pointer dereference in xhci tracepoints

Mathieu Desnoyers (1):
      sched: Add missing memory barrier in switch_mm_cid

Matthew Auld (1):
      drm/xe/vm: prevent UAF with asid based lookup

Mauro Carvalho Chehab (1):
      ALSA: hda/realtek: Add quirks for Huawei Matebook D14 NBLB-WAX9N

Maxim Levitsky (1):
      KVM: selftests: fix max_guest_memory_test with more that 256 vCPUs

Maíra Canal (1):
      drm/v3d: Don't increment `enabled_ns` twice

Miaohe Lin (2):
      mm/memory-failure: fix deadlock when hugetlb_optimize_vmemmap is enabled
      fork: defer linking file vma until vma is fully initialized

Michael Ellerman (2):
      powerpc/crypto/chacha-p10: Fix failure on non Power10
      Documentation: embargoed-hardware-issues.rst: Add myself for Power

Michael Guralnik (1):
      RDMA/mlx5: Fix port number for counter query in multi-port configuration

Michal Swiatkowski (2):
      ice: tc: check src_vsi in case of traffic from VF
      ice: tc: allow zero flags in parsing tc flower

Mika Westerberg (1):
      thunderbolt: Do not create DisplayPort tunnels on adapters of the same router

Mike Tipton (1):
      interconnect: Don't access req_list while it's being manipulated

Mikhail Kobuk (1):
      drm: nv04: Fix out of bounds access

Minas Harutyunyan (1):
      usb: dwc2: host: Fix dereference issue in DDMA completion flow.

Muhammad Usama Anjum (1):
      iommufd: Add config needed for iommufd_fail_nth

Namhyung Kim (11):
      perf annotate: Make sure to call symbol__annotate2() in TUI
      perf lock contention: Add a missing NULL check
      tools/include: Sync uapi/drm/i915_drm.h with the kernel sources
      tools/include: Sync uapi/linux/fs.h with the kernel sources
      tools/include: Sync uapi/linux/kvm.h and asm/kvm.h with the kernel sources
      tools/include: Sync uapi/sound/asound.h with the kernel sources
      tools/include: Sync x86 CPU feature headers with the kernel sources
      tools/include: Sync x86 asm/irq_vectors.h with the kernel sources
      tools/include: Sync x86 asm/msr-index.h with the kernel sources
      tools/include: Sync asm-generic/bitops/fls.h with the kernel sources
      tools/include: Sync arm64 asm/cputype.h with the kernel sources

Naohiro Aota (2):
      btrfs: zoned: do not flag ZEROOUT on non-dirty extent buffer
      btrfs: zoned: add ASSERT and WARN for EXTENT_BUFFER_ZONED_ZEROOUT handling

Naoya Horiguchi (1):
      MAINTAINERS: update Naoya Horiguchi's email address

Nathan Chancellor (2):
      configs/hardening: Fix disabling UBSAN configurations
      configs/hardening: Disable CONFIG_UBSAN_SIGNED_WRAP

Nathan Lynch (1):
      selftests/powerpc/papr-vpd: Fix missing variable initialization

Nikita Zhandarovich (1):
      comedi: vmk80xx: fix incomplete endpoint checking

Norihiko Hama (1):
      usb: gadget: f_ncm: Fix UAF ncm object at re-bind after usb ep transport error

Oliver Neukum (1):
      usb: xhci: correct return value in case of STS_HCE

Oscar Salvador (6):
      mm,page_owner: update metadata for tail pages
      mm,page_owner: fix refcount imbalance
      mm,page_owner: fix accounting of pages when migrating
      mm,page_owner: fix printing of stack records
      mm,swapops: update check in is_pfn_swap_entry for hwpoison entries
      mm,page_owner: defer enablement of static branch

Pablo Neira Ayuso (7):
      netfilter: br_netfilter: skip conntrack input hook for promisc packets
      netfilter: nft_set_pipapo: walk over current view on netlink dump
      netfilter: flowtable: validate pppoe header
      netfilter: flowtable: incorrect pppoe tuple
      netfilter: nf_tables: missing iterator type in lookup walk
      netfilter: nf_tables: restore set elements when delete set fails
      netfilter: nf_tables: fix memleak in map from abort path

Paul Barker (4):
      net: ravb: Count packets instead of descriptors in R-Car RX path
      net: ravb: Allow RX loop to move past DMA mapping errors
      net: ravb: Fix GbEth jumbo packet RX checksum handling
      net: ravb: Fix RX byte accounting for jumbo packets

Paul Cercueil (2):
      usb: gadget: functionfs: Fix inverted DMA fence direction
      usb: gadget: functionfs: Wait for fences before enqueueing DMABUF

Peter Oberparleiter (3):
      s390/qdio: handle deferred cc1
      s390/cio: fix race condition during online processing
      s390/cio: log fake IRB events

Peter Xu (1):
      mm/userfaultfd: allow hugetlb change protection upon poison entry

Phillip Lougher (1):
      Squashfs: check the inode number is not the invalid value of zero

Pin-yen Lin (1):
      clk: mediatek: Do a runtime PM get on controllers during probe

Qiang Zhang (1):
      bootconfig: use memblock_free_late to free xbc memory to buddy

Qu Wenruo (1):
      btrfs: do not wait for short bulk allocation

Raag Jadav (1):
      pwm: dwc: allow suspend/resume for 16 channels

Rafael J. Wysocki (1):
      thermal/debugfs: Add missing count increment to thermal_debug_tz_trip_up()

Rahul Rameshbabu (1):
      net/mlx5e: Use channel mdev reference instead of global mdev instance for coalescing

Randy Dunlap (1):
      peci: linux/peci.h: fix Excess kernel-doc description warning

Richard Genoud (1):
      MAINTAINERS: mailmap: update Richard Genoud's email address

Rick Edgecombe (1):
      KVM: x86/mmu: x86: Don't overflow lpage_info when checking attributes

Ricky Wu (1):
      misc: rtsx: Fix rts5264 driver status incorrect when card removed

Sakari Ailus (2):
      Revert "mei: vsc: Call wake_up() in the threaded IRQ handler"
      mei: vsc: Unregister interrupt handler for system suspend

Samuel Thibault (1):
      speakup: Avoid crash on very long word

Sandipan Das (1):
      KVM: x86/pmu: Do not mask LVTPC when handling a PMI on AMD platforms

Sean Christopherson (20):
      KVM: Add helpers to consolidate gfn_to_pfn_cache's page split check
      KVM: Check validity of offset+length of gfn_to_pfn_cache prior to activation
      KVM: Explicitly disallow activatating a gfn_to_pfn_cache with INVALID_GPA
      KVM: x86/pmu: Disable support for adaptive PEBS
      KVM: x86/pmu: Set enable bits for GP counters in PERF_GLOBAL_CTRL at "RESET"
      KVM: selftests: Verify post-RESET value of PERF_GLOBAL_CTRL in PMCs test
      KVM: SVM: Create a stack frame in __svm_vcpu_run() for unwinding
      KVM: SVM: Wrap __svm_sev_es_vcpu_run() with #ifdef CONFIG_KVM_AMD_SEV
      KVM: SVM: Drop 32-bit "support" from __svm_sev_es_vcpu_run()
      KVM: SVM: Clobber RAX instead of RBX when discarding spec_ctrl_intercepted
      KVM: SVM: Save/restore non-volatile GPRs in SEV-ES VMRUN via host save area
      KVM: SVM: Save/restore args across SEV-ES VMRUN via host save area
      KVM: SVM: Create a stack frame in __svm_sev_es_vcpu_run()
      KVM: x86: Stop compiling vmenter.S with OBJECT_FILES_NON_STANDARD
      KVM: x86: Snapshot if a vCPU's vendor model is AMD vs. Intel compatible
      KVM: VMX: Snapshot LBR capabilities during module initialization
      perf/x86/intel: Expose existence of callback support to KVM
      KVM: VMX: Disable LBR virtualization if the CPU doesn't support LBR callstacks
      KVM: x86/mmu: Precisely invalidate MMU root_role during CPUID update
      KVM: Drop unused @may_block param from gfn_to_pfn_cache_invalidate_start()

Serge Semin (3):
      net: stmmac: Apply half-duplex-less constraint for DW QoS Eth only
      net: stmmac: Fix max-speed being ignored on queue re-init
      net: stmmac: Fix IP-cores specific MAC capabilities

Shay Drory (2):
      net/mlx5: Lag, restore buckets number to default after hash LAG deactivation
      net/mlx5: Restore mistakenly dropped parts in register devlink flow

Shenghao Ding (2):
      ALSA: hda/tas2781: correct the register for pow calibrated data
      ALSA: hda/tas2781: Add new vendor_id and subsystem_id to support ThinkPad ICE-1

Shengyu Li (1):
      selftests/harness: Prevent infinite loop due to Assert in FIXTURE_TEARDOWN

Shivaprasad G Bhat (1):
      powerpc/iommu: Refactor spapr_tce_platform_iommu_attach_dev()

Siddharth Vadapalli (1):
      net: ethernet: ti: am65-cpsw-nuss: cleanup DMA Channels before using them

Srinivas Pandruvada (2):
      platform/x86: ISST: Add Granite Rapids-D to HPM CPU list
      platform/x86/intel-uncore-freq: Increase minor number support

Stephen Boyd (5):
      clk: Remove prepare_lock hold assertion in __clk_release()
      clk: Don't hold prepare_lock when calling kref_put()
      clk: Initialize struct clk_core kref earlier
      clk: Get runtime PM before walking tree during disable_unused
      clk: Get runtime PM before walking tree for clk_summary

Steven Rostedt (Google) (1):
      SUNRPC: Fix rpcgss_context trace event acceptor field

Sumanth Korikkar (1):
      mm/shmem: inline shmem_is_huge() for disabled transparent hugepages

Sven Schnelle (1):
      s390/mm: Fix NULL pointer dereference

Takashi Iwai (1):
      ALSA: seq: ump: Fix conversion from MIDI2 to MIDI1 UMP messages

Tao Su (1):
      KVM: VMX: Ignore MKTME KeyID bits when intercepting #PF for allow_smaller_maxphyaddr

Tariq Toukan (1):
      net/mlx5: SD, Handle possible devcom ERR_PTR

Thinh Nguyen (1):
      usb: dwc3: ep0: Don't reset resource alloc flag

Tony Lindgren (2):
      serial: core: Fix regression when runtime PM is not enabled
      serial: core: Fix missing shutdown and startup for serial base port

Uwe Kleine-König (5):
      clk: Provide !COMMON_CLK dummy for devm_clk_rate_exclusive_get()
      usb: gadget: fsl: Initialize udc before using it
      MAINTAINERS: Drop Li Yang as their email address stopped working
      serial: stm32: Return IRQ_NONE in the ISR if no handling happend
      serial: stm32: Reset .throttled state in .startup()

Vanillan Wang (2):
      net:usb:qmi_wwan: support Rolling modules
      USB: serial: option: add Rolling RW101-GL and RW135-GL support

Vasily Gorbik (1):
      NFSD: fix endianness issue in nfsd4_encode_fattr4

Vitalii Torshyn (1):
      ALSA: hda/realtek: Fixes for Asus GU605M and GA403U sound

Vitaly Rodionov (1):
      ALSA: hda/realtek: Add quirk for HP SnowWhite laptops

Xin Li (Intel) (1):
      x86/fred: Fix INT80 emulation for FRED

Yang Li (1):
      cuse: add kernel-doc comments to cuse_process_init_reply()

Yanjun.Zhu (1):
      RDMA/rxe: Fix the problem "mutex_destroy missing"

Yaxiong Tian (1):
      arm64: hibernate: Fix level3 translation fault in swsusp_save()

Yuanhe Shu (1):
      selftests/ftrace: Limit length in subsystem-enable tests

Yuntao Wang (1):
      init/main.c: Fix potential static_command_line memory overflow

Yuri Benditovich (1):
      net: change maximum number of UDP segments to 128

Zack Rusin (3):
      drm/vmwgfx: Fix prime import/export
      drm/vmwgfx: Fix crtc's atomic check conditional
      drm/vmwgfx: Sort primary plane formats by order of preference

Ziyang Xuan (2):
      netfilter: nf_tables: Fix potential data-race in __nft_expr_type_get()
      netfilter: nf_tables: Fix potential data-race in __nft_obj_type_get()

bolan wang (1):
      USB: serial: option: add Fibocom FM135-GL variants

xinhui pan (1):
      drm/amdgpu: validate the parameters of bo mapping operations more clearly

^ permalink raw reply	[relevance 47%]

* Re: [PATCH v2] tty: n_gsm: restrict tty devices to attach
  2024-04-21 16:04 95%         ` Linus Torvalds
@ 2024-04-21 17:18 87%           ` Linus Torvalds
    0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-04-21 17:18 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: Greg Kroah-Hartman, Jiri Slaby, Andrew Morton, Starke, Daniel,
	LKML, linux-security-module

[-- Attachment #1: Type: text/plain, Size: 1219 bytes --]

On Sun, 21 Apr 2024 at 09:04, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> The only option is to *mark* the ones that are atomic. Which was my suggestion.

Actually, another option would be to just return an error at 'set_ldisc()' time.

Sadly, the actual "tty->ops->set_ldisc()" function not only returns
'void' (easy enough to change - there aren't that many of them), but
it's called too late after the old ldisc has already been dropped.
It's basically an "inform tty about new ldisc" hook and is not useful for an
"is this ok?" check.

But we could trivially add a "ldisc_ok()" function, and have the vt
driver say "I only accept N_TTY".

Something like this ENTIRELY UNTESTED patch.

Again - this is untested, and maybe there are other tty drivers that
have issues with the stranger line disciplines, but this at least
seems simple and fairly easy to explain why we do what we do.

And if pty's really need the same thing, that would be easy to add.
But I actually think that at least pty slaves should *not* limit
ldiscs, because the whole point of a pty slave is to look like another
tty. If you want to emulate a serial device over a network, the way to
do it would be with a pty.

Hmm?

                Linus

[-- Attachment #2: patch.diff --]
[-- Type: text/x-patch, Size: 2497 bytes --]

 drivers/tty/tty_ldisc.c    |  6 ++++++
 drivers/tty/vt/vt.c        | 10 ++++++++++
 include/linux/tty_driver.h |  8 ++++++++
 3 files changed, 24 insertions(+)

diff --git a/drivers/tty/tty_ldisc.c b/drivers/tty/tty_ldisc.c
index 3f68e213df1f..d80e9d4c974b 100644
--- a/drivers/tty/tty_ldisc.c
+++ b/drivers/tty/tty_ldisc.c
@@ -545,6 +545,12 @@ int tty_set_ldisc(struct tty_struct *tty, int disc)
 		goto out;
 	}
 
+	if (tty->ops->ldisc_ok) {
+		retval = tty->ops->ldisc_ok(tty, disc);
+		if (retval)
+			goto out;
+	}
+
 	old_ldisc = tty->ldisc;
 
 	/* Shutdown the old discipline. */
diff --git a/drivers/tty/vt/vt.c b/drivers/tty/vt/vt.c
index 9b5b98dfc8b4..cd87e3d1291e 100644
--- a/drivers/tty/vt/vt.c
+++ b/drivers/tty/vt/vt.c
@@ -3576,6 +3576,15 @@ static void con_cleanup(struct tty_struct *tty)
 	tty_port_put(&vc->port);
 }
 
+/*
+ * We can't deal with anything but the N_TTY ldisc,
+ * because we can sleep in our write() routine.
+ */
+static int con_ldisc_ok(struct tty_struct *tty, int ldisc)
+{
+	return ldisc == N_TTY ? 0 : -EINVAL;
+}
+
 static int default_color           = 7; /* white */
 static int default_italic_color    = 2; // green (ASCII)
 static int default_underline_color = 3; // cyan (ASCII)
@@ -3695,6 +3704,7 @@ static const struct tty_operations con_ops = {
 	.resize = vt_resize,
 	.shutdown = con_shutdown,
 	.cleanup = con_cleanup,
+	.ldisc_ok = con_ldisc_ok,
 };
 
 static struct cdev vc0_cdev;
diff --git a/include/linux/tty_driver.h b/include/linux/tty_driver.h
index 7372124fbf90..dd4b31ce6d5d 100644
--- a/include/linux/tty_driver.h
+++ b/include/linux/tty_driver.h
@@ -154,6 +154,13 @@ struct serial_struct;
  *
  *	Optional. Called under the @tty->termios_rwsem. May sleep.
  *
+ * @ldisc_ok: ``int ()(struct tty_struct *tty, int ldisc)``
+ *
+ *	This routine allows the @tty driver to decide if it can deal
+ *	with a particular @ldisc.
+ *
+ *	Optional. Called under the @tty->ldisc_sem and @tty->termios_rwsem.
+ *
  * @set_ldisc: ``void ()(struct tty_struct *tty)``
  *
  *	This routine allows the @tty driver to be notified when the device's
@@ -372,6 +379,7 @@ struct tty_operations {
 	void (*hangup)(struct tty_struct *tty);
 	int (*break_ctl)(struct tty_struct *tty, int state);
 	void (*flush_buffer)(struct tty_struct *tty);
+	int (*ldisc_ok)(struct tty_struct *tty, int ldisc);
 	void (*set_ldisc)(struct tty_struct *tty);
 	void (*wait_until_sent)(struct tty_struct *tty, int timeout);
 	void (*send_xchar)(struct tty_struct *tty, u8 ch);

^ permalink raw reply related	[relevance 87%]

* Re: [PATCH v2] tty: n_gsm: restrict tty devices to attach
  @ 2024-04-21 16:04 95%         ` Linus Torvalds
  2024-04-21 17:18 87%           ` Linus Torvalds
  0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-04-21 16:04 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: Greg Kroah-Hartman, Jiri Slaby, Andrew Morton, Starke, Daniel,
	LKML, linux-security-module

On Sun, 21 Apr 2024 at 06:28, Tetsuo Handa
<penguin-kernel@i-love.sakura.ne.jp> wrote:
>
> "struct tty_ldisc_ops" says that ->write() function (e.g. gsmld_write())
> is allowed to sleep and "struct tty_operations" says that ->write() function
> (e.g. con_write()) is not allowed to sleep.

Well, clearly con_write() *is* allowed to sleep. The very first thing
it does is that

        console_lock();

thing, which uses a sleeping semaphore.

But yes, the comment in the header does say "may not sleep".

Clearly that comment doesn't actually reflect reality - and never did.
The console lock sleeping isn't some new thing (ie it doesn't come
from the somewhat recent printk changes).

So the comment is bogus and wrong.

> Thus, I initially proposed
> https://lkml.kernel.org/r/9cd9d3eb-418f-44cc-afcf-7283d51252d6@I-love.SAKURA.ne.jp
> which makes con_write() no-op when called with IRQs disabled.

The thing is, that's not the only thing that makes atomic context.

And some atomic contexts cannot be detected at run-time, they are
purely static (ie being inside a spinlock with a !PREEMPT kernel
build).

So you cannot test for this.

The only option is to *mark* the ones that are atomic. Which was my suggestion.

> My major/minor approach is based on a suggestion from Jiri that we just somehow
> disallow attaching this line discipline to a console

Since we already know that the comment is garbage, why do you think
it's just a con_write() that has this issue?

And if it is only the console that has this issue, why are you testing
for other major/minor numbers?

> Now, your 'struct tty_operations' flag saying 'my ->write() function is OK with
> atomic context' is expected to be set to all drivers.

I'm not convinced. The only thing I know is that the comment in
question is wrong, and has been wrong for over a decade (and honestly,
probably pretty much forever).

So how confident are we that other tty write functions are ok?

Also, since you think that only con_write() has a problem, why the
heck are you then testing for ptys etc? From a quick check, the
pty->ops->write() function is fine.

                  Linus

^ permalink raw reply	[relevance 95%]

* Re: [PATCH v2] tty: n_gsm: restrict tty devices to attach
  2024-04-20 18:02 99%   ` Linus Torvalds
@ 2024-04-20 18:05 99%     ` Linus Torvalds
    0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-04-20 18:05 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: Greg Kroah-Hartman, Jiri Slaby, Andrew Morton, Starke, Daniel,
	LKML, linux-security-module

On Sat, 20 Apr 2024 at 11:02, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> Most other normal tty devices just expect ->write() to be called in
> normal process context, so if we do a line discipline flag, it would
                                        ^^^^^^^^^^^^^^^^^^^^
> have to be something like "I'm ok with being called with interrupts
> disabled", and then the n_gsm ->open function would just check that.

Not line discipline - it would be a 'struct tty_operations' flag
saying 'my ->write() function is ok with atomic context'.
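
A rough userspace model of that flag idea, just to make the shape
concrete. Everything in it (struct model_tty_ops, write_atomic_ok,
model_ldisc_open) is a hypothetical stand-in invented for this sketch
and not an existing kernel interface. It only shows the ops table
carrying a "my ->write() is fine in atomic context" marker and the
line discipline's open path checking it.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-ins; these are NOT the kernel's real definitions. */
struct model_tty;

struct model_tty_ops {
	int (*write)(struct model_tty *tty, const unsigned char *buf, size_t len);
	/* Hypothetical flag: ->write() never sleeps, so atomic callers are ok. */
	bool write_atomic_ok;
};

struct model_tty {
	const struct model_tty_ops *ops;
};

/* Stands in for a write routine that may sleep (think console lock). */
static int sleeping_write(struct model_tty *tty, const unsigned char *buf, size_t len)
{
	(void)tty;
	(void)buf;
	return (int)len;
}

/* Stands in for a write routine that is safe with interrupts disabled. */
static int atomic_write(struct model_tty *tty, const unsigned char *buf, size_t len)
{
	(void)tty;
	(void)buf;
	return (int)len;
}

const struct model_tty_ops console_like_ops = {
	.write = sleeping_write,
	.write_atomic_ok = false,
};

const struct model_tty_ops serial_like_ops = {
	.write = atomic_write,
	.write_atomic_ok = true,
};

/* What a check in an n_gsm-style ->open() could look like. */
int model_ldisc_open(struct model_tty *tty)
{
	assert(tty && tty->ops);

	if (!tty->ops->write)
		return -EINVAL;
	if (!tty->ops->write_atomic_ok)
		return -EINVAL;	/* driver did not opt in, refuse to attach */
	return 0;
}
```

With this model, attaching to a tty backed by console_like_ops fails
with -EINVAL and attaching to one backed by serial_like_ops succeeds.
The point of an opt-in marker is that a driver nobody has audited is
refused by default.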

             Linus

^ permalink raw reply	[relevance 99%]

* Re: [PATCH v2] tty: n_gsm: restrict tty devices to attach
  2024-04-20 17:34 97% ` Linus Torvalds
@ 2024-04-20 18:02 99%   ` Linus Torvalds
  2024-04-20 18:05 99%     ` Linus Torvalds
  0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-04-20 18:02 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: Greg Kroah-Hartman, Jiri Slaby, Andrew Morton, Starke, Daniel,
	LKML, linux-security-module

On Sat, 20 Apr 2024 at 10:34, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> Alternatively, we could go the opposite way, and have some flag in the
> line discipline that says "I can be a console", and just check that in
> tty_set_ldisc() for the console.

Actually, I take that back. It's not /dev/console that is the problem,
that just happened to be the one oops I looked at.

Most other normal tty devices just expect ->write() to be called in
normal process context, so if we do a line discipline flag, it would
have to be something like "I'm ok with being called with interrupts
disabled", and then the n_gsm ->open function would just check that.

So it would end up being just another form of that

  +     if (tty->ops->set_serial == NULL)
  +             return -EINVAL;

check - but maybe more explicit and prettier.

Because a real serial driver might not be ok with it either, if it
uses a semaphore or something.

Whatever. I think the 'set_serial' test would at least be an improvement.

            Linus

^ permalink raw reply	[relevance 99%]

* Re: [PATCH v2] tty: n_gsm: restrict tty devices to attach
  @ 2024-04-20 17:34 97% ` Linus Torvalds
  2024-04-20 18:02 99%   ` Linus Torvalds
  0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-04-20 17:34 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: Greg Kroah-Hartman, Jiri Slaby, Andrew Morton, Starke, Daniel,
	LKML, linux-security-module

On Sat, 20 Apr 2024 at 04:12, Tetsuo Handa
<penguin-kernel@i-love.sakura.ne.jp> wrote:
>
> Since n_gsm is designed to be used for serial port [1], reject attaching to
> virtual consoles and PTY devices, by checking tty's device major/minor
> numbers at gsmld_open().

If we really just want to restrict it to serial devices, then do
something like this:

   drivers/tty/n_gsm.c | 2 ++
   1 file changed, 2 insertions(+)

  diff --git a/drivers/tty/n_gsm.c b/drivers/tty/n_gsm.c
  index 4036566febcb..24425ef35b2b 100644
  --- a/drivers/tty/n_gsm.c
  +++ b/drivers/tty/n_gsm.c
  @@ -3629,6 +3629,8 @@ static int gsmld_open(struct tty_struct *tty)

        if (tty->ops->write == NULL)
                return -EINVAL;
  +     if (tty->ops->set_serial == NULL)
  +             return -EINVAL;

        /* Attach our ldisc data */
        gsm = gsm_alloc_mux();

which at least matches the current (largely useless) pattern of
checking for a write function.

I think all real serial sub-drivers already have that 'set_serial()'
function, and if there are some that don't, we could just add a dummy
for them. No?

Alternatively, we could go the opposite way, and have some flag in the
line discipline that says "I can be a console", and just check that in
tty_set_ldisc() for the console.

That would probably be a good idea regardless, but likely requires more effort.

But this kind of random major number testing seems wrong. It's trying
to deal with the _symptoms_, not some deeper truth.

                  Linus

^ permalink raw reply	[relevance 97%]

* Re: [GIT PULL] Btrfs fixes for 6.9-rc5
  @ 2024-04-18  0:14 99% ` Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-04-18  0:14 UTC (permalink / raw)
  To: David Sterba; +Cc: linux-btrfs, linux-kernel

On Wed, 17 Apr 2024 at 16:53, David Sterba <dsterba@suse.com> wrote:
>
>   git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux.git tags/for-6.9-rc4-tag

No such tag. I see the branch 'for-6.9-rc4' with the right commit,
but not the signed tag. Forgot to push out?

               Linus

^ permalink raw reply	[relevance 99%]

* Re: [PATCH 00/19] Enable -Wshadow=local for kernel/sched
  2024-04-17  0:29 99%   ` Linus Torvalds
@ 2024-04-17  0:50 90%     ` Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-04-17  0:50 UTC (permalink / raw)
  To: Kees Cook
  Cc: Matthew Wilcox (Oracle),
	Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Daniel Bristot de Oliveira, linux-kernel

On Tue, 16 Apr 2024 at 17:29, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> So what is the solution to
>
>     #define MAX(a,b) ({ \

Side note: do not focus on the macro name. I'm not interested in "the
solution is MAX3()" kinds of answers.

And the macro doesn't have to be physically nested like that.

The macro could be a list traversal thing.  Appended is an example
list traversal macro that is type-safe and simple to use, and would
absolutely improve on our current "list_for_each_entry()" in many
ways.

Imagine now traversing a list within an entry that happens while
traversing an outer one. Which is not at all an odd thing, IOW, you'd
have

        traverse(bus_list, bus) {
                traverse(&bus->devices, device) {
                        .. do something with the device ..
                }
        }

this kind of macro use that has internal variables that will
inevitably shadow each other when used in some kind of nested model is
pretty fundamental.

So no. The answer is *NOT* some kind of "don't do that then".
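
A much smaller standalone illustration of the same nesting, for anyone
who wants to try it outside the kernel. The for_each_elem() macro and
count_matches() are made up for this sketch; the only point is that
the hidden loop index is declared by the macro, so the inner use
shadows the outer one, and the code is still correct.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Illustrative array iterator, not a kernel macro. The hidden index
 * __i is declared by the macro itself, so nested uses shadow it.
 */
#define for_each_elem(arr, n, elem) \
	for (size_t __i = 0; __i < (n) && ((elem) = &(arr)[__i], 1); __i++)

/* Count the (x, y) pairs, x from a[] and y from b[], where x == y. */
int count_matches(const int *a, size_t na, const int *b, size_t nb)
{
	const int *x, *y;
	int hits = 0;

	for_each_elem(a, na, x) {
		for_each_elem(b, nb, y) {	/* inner __i shadows outer __i */
			if (*x == *y)
				hits++;
		}
	}
	return hits;
}

void demo_nested_iteration(void)
{
	static const int a[] = { 1, 2, 3 };
	static const int b[] = { 3, 1, 1 };

	/* 1 matches twice, 3 matches once, 2 never matches. */
	assert(count_matches(a, 3, b, 3) == 3);
}
```

Each expansion only ever refers to its own __i, so the shadowing is
harmless here, yet a shadow warning would be expected to fire on the
inner loop.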

             Linus

PS. The list traversal thing below worked at some point. It's an old
snippet of mine, it might not work any more. It depends on the kernel
'list_head' definitions, it's not a standalone example.

---

    #define list_traversal_head(type, name, member) union {     \
        struct list_head name;                          \
        struct type *name##_traversal_type;             \
        struct type##_##name##_##member##_traversal_struct *name##_traversal_info; \
    }

    #define list_traversal_node(name) union {           \
        struct list_head name;                          \
        int name##_traversal_node;                      \
    }

    #define DEFINE_TRAVERSAL(from, name, to, member)            \
    struct to##_##name##_##member##_traversal_struct {          \
        char dummy[offsetof(struct to, member##_traversal_node)]; \
        struct list_head node;                          \
    }

    #define __traverse_type(head, ext) typeof(head##ext)
    #define traverse_type(head, ext) __traverse_type(head, ext)

    #define traverse_offset(head) \
        offsetof(traverse_type(*head,_traversal_info), node)

    #define traverse_is_head(head,  raw) \
        ((void *)(raw) == (void *)(head))

    /*
     * Very annoying. We want 'node' to be of the right type, and __raw to be
     * the underlying "struct list_head". But while we can declare multiple
     * variables in a for-loop in C99, we can't declare multiple _types_.
     *
     * So __raw has the wrong type, making everything pointlessly uglier.
     */
    #define traverse(head, node) \
        for (typeof(*head##_traversal_type) __raw = (void *)(head)->next, node; \
                node = (void *)__raw + traverse_offset(*head), !traverse_is_head(head, __raw); \
                __raw = (void *) ((struct list_head *)__raw)->next)

    struct first_struct {
        int offset[6];
        list_traversal_head(second_struct, head, entry);
    };

    struct second_struct {
        int hash;
        int offset[17];
        list_traversal_node(entry);
    };

    DEFINE_TRAVERSAL(first_struct, head, second_struct, entry);

    struct second_struct *find(struct first_struct *p)
    {
        traverse(&p->head, node) {
                if (node->hash == 1234)
                        return node;
        }
        return NULL;
    }

^ permalink raw reply	[relevance 90%]

* Re: [PATCH 00/19] Enable -Wshadow=local for kernel/sched
  @ 2024-04-17  0:29 99%   ` Linus Torvalds
  2024-04-17  0:50 90%     ` Linus Torvalds
  0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-04-17  0:29 UTC (permalink / raw)
  To: Kees Cook
  Cc: Matthew Wilcox (Oracle),
	Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Daniel Bristot de Oliveira, linux-kernel

On Tue, 16 Apr 2024 at 14:15, Kees Cook <keescook@chromium.org> wrote:
>
> I was looking at -Wshadow=local again, and remembered this series. It
> sounded like things were close, but a tweak was needed. What would be
> next to get this working?

So what is the solution to

    #define MAX(a,b) ({ \
        typeof(a) __a = (a); \
        typeof(b) __b = (b); \
        __a > __b ? __a : __b; \
    })

    int test(int a, int b, int c)
    {
        return MAX(a, MAX(b,c));
    }

where -Wshadow=all causes insane warnings that are bogus garbage?

Honestly, Willy's patch-series is a hack to avoid this kind of very
natural nested macro pattern.

But it's a horrible hack, and it does it by making the code actively worse.

Here's the deal: if we can't handle something like the above without
warning, -Wshadow isn't getting enabled.

Because we don't write worse code because of bad warnings.

IOW, what is the sane way to just say "this variable can shadow the
use site, and it's fine"?

Without that kind of out, I don't think -Wshadow=local is workable.

              Linus
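
The nested pattern above can be tried outside the kernel. What follows is
a minimal sketch, not the kernel's min()/max() implementation. MAX_SHADOW
is the macro from the mail, spelled with __typeof__ so it also builds in
strict ISO modes. MAX_UNIQ is a hypothetical variant that pastes the
GCC/Clang __COUNTER__ extension into the names of the temporaries, so
nested uses never redeclare the same name. Every identifier in the block
(MAX_SHADOW, MAX_UNIQ, MAX_IMPL, CONCAT, max3_*) is made up for
illustration.

```c
#include <assert.h>

/* The macro from the mail: nesting it redeclares __a/__b in an inner
 * scope, which is harmless here but is what -Wshadow complains about. */
#define MAX_SHADOW(a, b) ({ \
        __typeof__(a) __a = (a); \
        __typeof__(b) __b = (b); \
        __a > __b ? __a : __b; \
})

/* Two-level paste so that __COUNTER__ is expanded before ## is applied. */
#define CONCAT_(x, y) x##y
#define CONCAT(x, y) CONCAT_(x, y)

/* Same body, but the names of the temporaries come from the caller. */
#define MAX_IMPL(a, b, ua, ub) ({ \
        __typeof__(a) ua = (a); \
        __typeof__(b) ub = (b); \
        ua > ub ? ua : ub; \
})

/* Every expansion gets fresh temporaries (for example __a_0 and __b_1). */
#define MAX_UNIQ(a, b) \
        MAX_IMPL(a, b, CONCAT(__a_, __COUNTER__), CONCAT(__b_, __COUNTER__))

int max3_shadow(int a, int b, int c)
{
        return MAX_SHADOW(a, MAX_SHADOW(b, c));
}

int max3_uniq(int a, int b, int c)
{
        return MAX_UNIQ(a, MAX_UNIQ(b, c));
}

/* Both spellings must compute the same maximum. */
void max3_selfcheck(void)
{
        assert(max3_shadow(1, 5, 3) == 5);
        assert(max3_uniq(1, 5, 3) == 5);
}
```

Both functions return the same value; the shadowing in max3_shadow() is
benign. Building with -Wshadow would be expected to warn about
max3_shadow() only. The price of the unique-name variant is the extra
layer of macros, which is the "making the code worse" trade-off the mail
objects to.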

^ permalink raw reply	[relevance 99%]

* Re: [PATCH v10 1/5] mseal: Wire up mseal syscall
  @ 2024-04-15 18:21 96%     ` Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-04-15 18:21 UTC (permalink / raw)
  To: Muhammad Usama Anjum
  Cc: jeffxu, akpm, keescook, jannh, sroettger, willy, gregkh, corbet,
	Liam.Howlett, surenb, merimus, rdunlap, jeffxu, jorgelo, groeck,
	linux-kernel, linux-kselftest, linux-mm, pedro.falcato,
	dave.hansen, linux-hardening, deraadt

On Mon, 15 Apr 2024 at 11:11, Muhammad Usama Anjum
<usama.anjum@collabora.com> wrote:
>
> It isn't logical to wire up something which isn't present

Actually, with system calls, the rules end up being almost opposite.

There's no point in adding the code if it's not reachable. So adding
the system call code before adding the wiring makes no sense.

So you have two cases: add the stubs first, or add the code first.
Neither does anything without the other.

So then you go "add both in the same commit" option, which ends up
being horrible from a "review the code" standpoint. The two parts are
entirely different and mixing them up makes the patch very unclear
(and has very different target audiences for reviewing it - the MM
people really shouldn't have to look at the architecture wiring
parts).

End result: there are no "this is the logical ordering" cases.

But the "wire up system calls" part actually has some reasons to be first:

 - it reserves the system call number

 - it adds the "when system call isn't enabled, return -ENOSYS"
   conditional system call logic

so I actually tend to prefer this ordering when it comes to system calls.

                Linus
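
The two reasons listed above can be modelled in user space. This is a
rough sketch under stated assumptions, not the kernel's mechanism (the
kernel has its own conditional-syscall machinery built around
sys_ni_syscall()). The table, the numbers and every demo_* name are
hypothetical. The point is only that a "wiring" step can reserve a number
and make it return -ENOSYS before any implementation exists.

```c
#include <assert.h>
#include <errno.h>

typedef long (*demo_syscall_fn)(unsigned long, unsigned long, unsigned long);

/* Placeholder for numbers that are reserved but not implemented. */
static long demo_ni_syscall(unsigned long a, unsigned long b, unsigned long c)
{
        (void)a; (void)b; (void)c;
        return -ENOSYS;
}

/* Hypothetical syscall numbers, valid for this sketch only. */
enum { DEMO_NR_GETANSWER, DEMO_NR_MSEAL, DEMO_NR_MAX };

static long demo_getanswer(unsigned long a, unsigned long b, unsigned long c)
{
        (void)a; (void)b; (void)c;
        return 42;
}

/* "Wiring" step: the number exists and dispatches, but does nothing. */
static demo_syscall_fn demo_table[DEMO_NR_MAX] = {
        [DEMO_NR_GETANSWER] = demo_getanswer,
        [DEMO_NR_MSEAL]     = demo_ni_syscall,
};

long demo_syscall(unsigned int nr, unsigned long a, unsigned long b,
                  unsigned long c)
{
        if (nr >= DEMO_NR_MAX)
                return -ENOSYS;
        return demo_table[nr](a, b, c);
}

/* "Implementation" step: a stand-in body that only checks its flags. */
static long demo_mseal(unsigned long start, unsigned long len,
                       unsigned long flags)
{
        (void)start; (void)len;
        return flags ? -EINVAL : 0;
}

void demo_enable_mseal(void)
{
        demo_table[DEMO_NR_MSEAL] = demo_mseal;
}

/* Walks through the two steps in order. */
void demo_selfcheck(void)
{
        assert(demo_syscall(DEMO_NR_MSEAL, 0, 4096, 0) == -ENOSYS);
        demo_enable_mseal();
        assert(demo_syscall(DEMO_NR_MSEAL, 0, 4096, 0) == 0);
}
```

Callers see a stable number and a well-defined -ENOSYS from the first
step onward, so the implementation can be reviewed separately from the
per-architecture wiring.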

^ permalink raw reply	[relevance 96%]

* Re: [PATCH v2 1/3] x86/bugs: Only harden syscalls when needed
  @ 2024-04-15 15:47 98%         ` Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-04-15 15:47 UTC (permalink / raw)
  To: Nikolay Borisov
  Cc: Josh Poimboeuf, x86, linux-kernel, Daniel Sneddon, Pawan Gupta,
	Thomas Gleixner, Alexandre Chartre, Konrad Rzeszutek Wilk,
	Peter Zijlstra, Greg Kroah-Hartman, Sean Christopherson,
	Andrew Cooper, Dave Hansen, KP Singh, Waiman Long,
	Borislav Petkov, Ingo Molnar

On Mon, 15 Apr 2024 at 08:27, Nikolay Borisov <nik.borisov@suse.com> wrote:
>
> Same as with every issue - assess the problem and develop fixes.

No. Let's have at least all the infrastructure in place to be a bit proactive.

> Let's be honest, the indirect branches in the syscall handler aren't the
> biggest problem

Oh, they have been.

> it's the stacked LSMs.

Hopefully those will get fixed too.

There's a few other fairly reachable ones (the timer indirection ones
are much too close, and VFS file ops aren't entirely out of reach).

But maybe some day we'll be in a situation where it's actually fairly
hard to reach indirect kernel calls from untrusted user space.

The system call ones are pretty much always the first ones, though.

> And even if those get fixes
> chances are the security people will likely find some other avenue of
> attack, I think even now the attack is somewhat hard to pull off.

No disagreement about that. I think outright sw bugs are still the
99.9% thing. But let's learn from history instead of "assess the
problem" every time anew.

               Linus
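
For readers unfamiliar with what "indirect calls in the syscall handler"
refers to, here is a small user-space sketch of the two dispatch styles.
It is an illustration only and is not taken from the patch under
discussion; all names are hypothetical. The first function calls through
a pointer loaded from a table. The second names each target in a switch,
which gives the compiler the option of emitting direct calls (whether it
does so depends on the compiler and its flags).

```c
#include <assert.h>
#include <errno.h>

enum { DEMO_NR_ADD, DEMO_NR_NEG, DEMO_NR_COUNT };

static long demo_add(long a, long b) { return a + b; }
static long demo_neg(long a, long b) { (void)b; return -a; }

typedef long (*demo_fn)(long, long);

static const demo_fn demo_table[DEMO_NR_COUNT] = {
        [DEMO_NR_ADD] = demo_add,
        [DEMO_NR_NEG] = demo_neg,
};

/* Table dispatch: the call target is a pointer fetched at run time. */
long dispatch_indirect(unsigned int nr, long a, long b)
{
        if (nr >= DEMO_NR_COUNT)
                return -ENOSYS;
        return demo_table[nr](a, b);
}

/* Switch dispatch: every call site names its target explicitly. */
long dispatch_switch(unsigned int nr, long a, long b)
{
        switch (nr) {
        case DEMO_NR_ADD:
                return demo_add(a, b);
        case DEMO_NR_NEG:
                return demo_neg(a, b);
        default:
                return -ENOSYS;
        }
}

/* The two styles must be observably equivalent. */
void dispatch_selfcheck(void)
{
        for (unsigned int nr = 0; nr <= DEMO_NR_COUNT; nr++)
                assert(dispatch_indirect(nr, 3, 4) == dispatch_switch(nr, 3, 4));
}
```

The results are identical; only the shape of the generated branch
differs, and that shape is what the hardening argument is about.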

^ permalink raw reply	[relevance 98%]

* Re: [PATCH v2 1/3] x86/bugs: Only harden syscalls when needed
  @ 2024-04-15 15:16 99%     ` Linus Torvalds
    0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-04-15 15:16 UTC (permalink / raw)
  To: Nikolay Borisov
  Cc: Josh Poimboeuf, x86, linux-kernel, Daniel Sneddon, Pawan Gupta,
	Thomas Gleixner, Alexandre Chartre, Konrad Rzeszutek Wilk,
	Peter Zijlstra, Greg Kroah-Hartman, Sean Christopherson,
	Andrew Cooper, Dave Hansen, KP Singh, Waiman Long,
	Borislav Petkov, Ingo Molnar

On Mon, 15 Apr 2024 at 00:37, Nikolay Borisov <nik.borisov@suse.com> wrote:
>
> To ask again, what do we gain by having this syscall hardening at the
> same time as the always on BHB scrubbing sequence?

What happens the next time some indirect call problem comes up?

If we had had *one* hardware bug in this area, that would be one
thing. But this has been going on for a decade now.

              Linus

^ permalink raw reply	[relevance 99%]

* Linux 6.9-rc4
@ 2024-04-14 20:48 43% Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-04-14 20:48 UTC (permalink / raw)
  To: Linux Kernel Mailing List

Nothing particularly unusual going on this week - some new hw
mitigations may stand out, but after a decade of this I can't really
call it "unusual" any more, can I? We also had a bit more bcachefs
fixes, and a turbostat tool update, but other than that it's the
regular drop of random stuff all over.

Drivers end up being the bulk of the remaining stuff, and we still had
some timer fallout from the big timer updates this merge window.
Nothing else really strikes me, but the full shortlog is appended as
usual - easy enough to just scan through to get kind of a flavor of
what has been going on.

                Linus

---

Aaro Koskinen (6):
      ARM: OMAP2+: fix bogus MMC GPIO labels on Nokia N8x0
      ARM: OMAP2+: fix N810 MMC gpiod table
      mmc: omap: fix broken slot switch lookup
      mmc: omap: fix deferred probe
      mmc: omap: restore original power up/down steps
      ARM: OMAP2+: fix USB regression on Nokia N8x0

Abhinav Kumar (1):
      drm/msm/dp: fix typo in dp_display_handle_port_status_changed()

Adam Dunlap (1):
      x86/apic: Force native_apic_mem_read() to use the MOV instruction

Adrian Hunter (1):
      bug: Fix no-return-statement warning with !CONFIG_BUG

Alex Constantino (1):
      Revert "drm/qxl: simplify qxl_fence_wait"

Alex Deucher (1):
      drm/amdgpu: always force full reset for SOC21

Alex Hung (2):
      drm/amd/display: Skip on writeback when it's not applicable
      drm/amd/display: Return max resolution supported by DWB

Alexander Wetzel (1):
      scsi: sg: Avoid race in error handling & drop bogus warn

Alexey Izbyshev (1):
      io_uring: Fix io_cqring_wait() not restoring sigmask on get_timespec64() failure

Amir Goldstein (1):
      kernfs: annotate different lockdep class for of->mutex of writable files

Anna-Maria Behnsen (1):
      PM: s2idle: Make sure CPUs will wakeup directly on resume

Archie Pusaka (1):
      Bluetooth: l2cap: Don't double set the HCI_CONN_MGMT_CONNECTED bit

Ard Biesheuvel (1):
      gcc-plugins/stackleak: Avoid .head.text section

Arnd Bergmann (8):
      ubsan: fix unused variable warning in test module
      nouveau: fix function cast warning
      lib: checksum: hide unused expected_csum_ipv6_magic[]
      irqflags: Explicitly ignore lockdep_hrtimer_exit() argument
      ipv6: fib: hide unused 'pn' variable
      ipv4/route: avoid unused-but-set-variable warning
      net/mlx5: fix possible stack overflows
      tracing: hide unused ftrace_event_id_fops

Arınç ÜNAL (2):
      net: dsa: mt7530: fix enabling EEE on MT7531 switch on all boards
      net: dsa: mt7530: trap link-local frames regardless of ST Port State

Ashutosh Dixit (1):
      drm/xe: Label RING_CONTEXT_CONTROL as masked

Bagas Sanjaya (2):
      Documentation: filesystems: Add bcachefs toctree
      MAINTAINERS: Add entry for bcachefs documentation

Bernhard Rosenkränzer (1):
      platform/x86: acer-wmi: Add support for Acer PH18-71

Bjorn Helgaas (1):
      Revert "PCI: Mark LSI FW643 to avoid bus reset"

Boris Brezillon (1):
      drm/panfrost: Fix the error path in panfrost_mmu_map_fault_addr()

Boris Burkov (6):
      btrfs: qgroup: correctly model root qgroup rsv in convert
      btrfs: qgroup: fix qgroup prealloc rsv leak in subvolume operations
      btrfs: record delayed inode root in transaction
      btrfs: qgroup: convert PREALLOC to PERTRANS after record_root_in_trans
      btrfs: make btrfs_clear_delalloc_extent() free delalloc reserve
      btrfs: always clear PERTRANS metadata during commit

Breno Leitao (1):
      virtio_net: Do not send RSS key if it is not supported

Brett Creeley (1):
      pds_core: Fix pdsc_check_pci_health function to use work thread

Carolina Jubran (4):
      net/mlx5e: RSS, Block changing channels number when RXFH is configured
      net/mlx5e: Fix mlx5e_priv_init() cleanup flow
      net/mlx5e: HTB, Fix inconsistencies with QoS SQs number
      net/mlx5e: RSS, Block XOR hash with over 128 channels

Chen Yu (1):
      tools/power turbostat: Do not print negative LPI residency

Cosmin Ratiu (2):
      net/mlx5: Properly link new fs rules into the tree
      net/mlx5: Correctly compare pkt reformat ids

Cristian Marussi (1):
      firmware: arm_scmi: Make raw debugfs entries non-seekable

Damien Le Moal (2):
      ata: ahci: Add mask_port_map module parameter
      ata: libata-scsi: Fix ata_scsi_dev_rescan() error path

Dan Carpenter (2):
      bcachefs: fix ! vs ~ typo in __clear_bit_le64()
      scsi: qla2xxx: Fix off by one in qla_edif_app_getstats()

Daniel Machon (1):
      net: sparx5: fix wrong config being used when reconfiguring PCS

Daniel Sneddon (3):
      x86/bhi: Define SPEC_CTRL_BHI_DIS_S
      KVM: x86: Add BHI_NO
      x86/bugs: Fix return type of spectre_bhi_state()

Dave Airlie (2):
      nouveau: fix devinit paths to only handle display on GSP.
      amdkfd: use calloc instead of kzalloc to avoid integer overflow

Dave Jiang (6):
      cxl/core/regs: Fix usage of map->reg_type in cxl_decode_regblock() before assigned
      cxl: Remove checking of iter in cxl_endpoint_get_perf_coordinates()
      cxl: Fix retrieving of access_coordinates in PCIe path
      cxl: Fix incorrect region perf data calculation
      cxl: Consolidate dport access_coordinate ->hb_coord and ->sw_coord into ->coord
      cxl: Add checks to access_coordinate calculation to fail missing data

David Arinzon (4):
      net: ena: Fix potential sign extension issue
      net: ena: Wrong missing IO completions check order
      net: ena: Fix incorrect descriptor free behavior
      net: ena: Set tx_info->xdpf value to NULL

David McFarland (1):
      platform/x86/intel/hid: Don't wake on 5-button releases

Dexuan Cui (1):
      swiotlb: do not set total_used to 0 in swiotlb_create_debugfs_files()

Dillon Varone (1):
      drm/amd/display: Do not recursively call manual trigger programming

Dmitry Antipov (1):
      Bluetooth: Fix memory leak in hci_req_sync_complete()

Dmitry Baryshkov (3):
      drm/msm/dpu: don't allow overriding data from catalog
      drm/msm/dpu: make error messages at dpu_core_irq_register_callback() more sensible
      dt-bindings: display/msm: sm8150-mdss: add DP node

Doug Smythies (1):
      tools/power turbostat: Fix added raw MSR output

Eric Dumazet (6):
      xsk: validate user input for XDP_{UMEM|COMPLETION}_FILL_RING
      geneve: fix header validation in geneve[6]_xmit_skb
      net: add copy_safe_from_sockptr() helper
      mISDN: fix MISDN_TIME_STAMP handling
      nfc: llcp: fix nfc_llcp_setsockopt() unsafe copies
      netfilter: complete validation of user input

Erni Sri Satya Vennela (1):
      x86/hyperv: Cosmetic changes for hv_apic.c

Fabio Estevam (2):
      ARM: dts: imx7-mba7: Use 'no-mmc' property
      ARM: dts: imx7s-warp: Pass OV2680 link-frequencies

Frank Li (8):
      arm64: dts: imx8-ss-conn: fix usdhc wrong lpcg clock order
      arm64: dts: imx8-ss-lsio: fix pwm lpcg indices
      arm64: dts: imx8-ss-conn: fix usb lpcg indices
      arm64: dts: imx8-ss-dma: fix spi lpcg indices
      arm64: dts: imx8-ss-dma: fix pwm lpcg indices
      arm64: dts: imx8-ss-dma: fix adc lpcg indices
      arm64: dts: imx8-ss-dma: fix can lpcg indices
      arm64: dts: imx8qm-ss-dma: fix can lpcg indices

Fudongwang (1):
      drm/amd/display: fix disable otg wa logic in DCN316

Gavin Shan (3):
      vhost: Add smp_rmb() in vhost_vq_avail_empty()
      vhost: Add smp_rmb() in vhost_enable_notify()
      arm64: tlb: Fix TLBI RANGE operand

Geetha sowjanya (1):
      octeontx2-af: Fix NIX SQ mode and BP config

Gerd Bayer (2):
      s390/ism: fix receive message buffer allocation
      Revert "s390/ism: fix receive message buffer allocation"

Gergo Koteles (1):
      platform/x86: lg-laptop: fix %s null argument warning

Gwendal Grignou (2):
      platform/x86: intel-vbtn: Use acpi_has_method to check for switch
      platform/x86: intel-vbtn: Update tablet mode switch at end of probe

Haiyue Wang (1):
      io-uring: correct typo in comment for IOU_F_TWQ_LAZY_WAKE

Hans de Goede (2):
      ACPI: scan: Do not increase dep_unmet for already met dependencies
      platform/x86: toshiba_acpi: Silence logging for some events

Hariprasad Kelam (1):
      octeontx2-pf: Fix transmit scheduler resource leak

Harish Kasiviswanathan (1):
      drm/amdkfd: Reset GPU on queue preemption failure

Harry Wentland (2):
      drm/amd/display: Program VSC SDP colorimetry for all DP sinks >= 1.4
      drm/amd/display: Set VSC SDP Colorimetry same way for MST and SST

Heiner Kallweit (2):
      r8169: fix LED-related deadlock on module removal
      r8169: add missing conditional compiling for call to r8169_remove_leds

Himal Prasad Ghimiray (1):
      drm/xe/xe_migrate: Cast to output precision before multiplying operands

Hongbo Li (1):
      bcachefs: fix the count of nr_freed_pcpu after changing bc->freed_nonpcpu list

Huacai Chen (7):
      mm: Move lowmem_page_address() a little later
      LoongArch: Make {virt, phys, page, pfn} translation work with KFENCE
      LoongArch: Make virt_addr_valid()/__virt_addr_valid() work with KFENCE
      LoongArch: Update dts for Loongson-2K1000 to support ISA/LPC
      LoongArch: Update dts for Loongson-2K2000 to support ISA/LPC
      LoongArch: Update dts for Loongson-2K2000 to support PCI-MSI
      LoongArch: Update dts for Loongson-2K2000 to support GMAC/GNET

Igor Pylypiv (1):
      ata: libata-core: Allow command duration limits detection for ACS-4 drives

Ilya Maximets (1):
      net: openvswitch: fix unwanted error log on timeout policy probing

Ingo Molnar (1):
      x86/bugs: Rename various 'ia32_cap' variables to 'x86_arch_cap_msr'

Irui Wang (1):
      media: mediatek: vcodec: Handle VP9 superframe bitstream with 8 sub-frames

Jacek Lawrynowicz (5):
      accel/ivpu: Remove d3hot_after_power_off WA
      accel/ivpu: Put NPU back to D3hot after failed resume
      accel/ivpu: Return max freq for DRM_IVPU_PARAM_CORE_CLOCK_RATE
      accel/ivpu: Fix missed error message after VPU rename
      accel/ivpu: Fix deadlock in context_xa

Jacob Pan (1):
      iommu/vt-d: Allocate local memory for page request queue

Jammy Huang (1):
      drm/ast: Fix soft lockup

Jeff Layton (1):
      MAINTAINERS: remove myself as a Reviewer for Ceph

Jens Wiklander (1):
      firmware: arm_ffa: Fix the partition ID check in ffa_notification_info_get()

Jiaxun Yang (1):
      MIPS: scall: Save thread_info.syscall unconditionally on entry

Jiri Benc (1):
      ipv6: fix race condition between ipv6_get_ifaddr and ipv6_del_addr

Johan Hovold (2):
      drm/msm/dp: fix runtime PM leak on disconnect
      drm/msm/dp: fix runtime PM leak on connect failure

John Harrison (1):
      drm/i915/guc: Fix the fix for reset lock confusion

John Stultz (3):
      selftests: timers: Fix valid-adjtimex signed left-shift undefined behavior
      selftests: timers: Fix posix_timers ksft_print_msg() warning
      selftests: timers: Fix abs() warning in posix_timers test

Josh Poimboeuf (7):
      x86/bugs: Change commas to semicolons in 'spectre_v2' sysfs file
      x86/bugs: Fix BHI documentation
      x86/bugs: Cache the value of MSR_IA32_ARCH_CAPABILITIES
      x86/bugs: Fix BHI handling of RRSBA
      x86/bugs: Clarify that syscall hardening isn't a BHI mitigation
      x86/bugs: Remove CONFIG_BHI_MITIGATION_AUTO and spectre_bhi=auto
      x86/bugs: Replace CONFIG_SPECTRE_BHI_{ON,OFF} with CONFIG_MITIGATION_SPECTRE_BHI

Justin Ernst (1):
      tools/power/turbostat: Fix uncore frequency file string

Karthik Poosa (1):
      drm/xe/hwmon: Cast result to output precision on left shift of operand

Kees Cook (2):
      randomize_kstack: Improve entropy diffusion
      nouveau/gsp: Avoid addressing beyond end of rpc->entries

Kenneth Feng (1):
      drm/amd/pm: fix the high voltage issue after unload

Kent Overstreet (19):
      bcachefs: Make snapshot_is_ancestor() safe
      bcachefs: Bump limit in btree_trans_too_many_iters()
      bcachefs: Move btree_updates to debugfs
      bcachefs: Further improve btree_update_to_text()
      bcachefs: Print shutdown journal sequence number
      bcachefs: Fix rebalance from durability=0 device
      bcachefs: fix rand_delete unit test
      bcachefs: Fix BCH_IOCTL_FSCK_OFFLINE for encrypted filesystems
      bcachefs: Disable errors=panic for BCH_IOCTL_FSCK_OFFLINE
      bcachefs: JOURNAL_SPACE_LOW
      bcachefs: Fix gap buffer bug in bch2_journal_key_insert_take()
      bcachefs: fix bch2_get_acl() transaction restart handling
      bcachefs: fix eytzinger0_find_gt()
      bcachefs: Fix check_topology() when using node scan
      bcachefs: Don't scan for btree nodes when we can reconstruct
      bcachefs: btree_node_scan: Respect member.data_allowed
      bcachefs: Fix a race in btree_update_nodes_written()
      bcachefs: Kill read lock dropping in bch2_btree_node_lock_write_nofail()
      bcachefs: Fix __bch2_btree_and_journal_iter_init_node_iter()

Krzysztof Kozlowski (3):
      virtio: store owner from modules with register_virtio_driver()
      MAINTAINERS: Change Krzysztof Kozlowski's email address
      iommu: mtk: fix module autoloading

Kuniyuki Iwashima (1):
      af_unix: Clear stale u->oob_skb.

Kuogee Hsieh (1):
      drm/msm/dp: assign correct DP controller ID to x1e80100 interface table

Kwangjin Ko (1):
      cxl/core: Fix initialization of mbox_cmd.size_out in get event

Lang Yu (1):
      drm/amdgpu/umsch: reinitialize write pointer in hw init

Len Brown (4):
      tools/power turbostat: Expand probe_intel_uncore_frequency()
      tools/power turbostat: Fix warning upon failed /dev/cpu_dma_latency read
      tools/power turbostat: enhance -D (debug counter dump) output
      tools/power turbostat: v2024.04.10

Li Ma (1):
      drm/amd/display: add DCN 351 version for microcode load

Li Zhijian (1):
      hv: vmbus: Convert sprintf() family to sysfs_emit() family

Lijo Lazar (3):
      drm/amdgpu: Refine IB schedule error logging
      drm/amdgpu: Reset dGPU if suspend got aborted
      drm/amdgpu: Fix VCN allocation in CPX partition

Linus Torvalds (3):
      x86/syscall: Don't force use of indirect calls for system calls
      Kconfig: add some hidden tabs on purpose
      Linux 6.9-rc4

Lu Baolu (1):
      iommu/vt-d: Fix WARN_ON in iommu probe path

Luca Weiss (1):
      drm/msm/adreno: Set highest_bank_bit for A619

Lucas De Marchi (1):
      drm/xe/display: Fix double mutex initialization

Luiz Augusto von Dentz (7):
      Bluetooth: ISO: Don't reject BT_ISO_QOS if parameters are unset
      Bluetooth: hci_sync: Fix using the same interval and window for Coded PHY
      Bluetooth: SCO: Fix not validating setsockopt user input
      Bluetooth: RFCOMM: Fix not validating setsockopt user input
      Bluetooth: L2CAP: Fix not validating setsockopt user input
      Bluetooth: ISO: Fix not validating setsockopt user input
      Bluetooth: hci_sock: Fix not validating setsockopt user input

Manivannan Sadhasivam (1):
      MAINTAINERS: Drop Gustavo Pimentel as PCI DWC Maintainer

Marek Vasut (2):
      net: ks8851: Inline ks8851_rx_skb()
      net: ks8851: Handle softirqs at the end of IRQ thread to fix hang

Masami Hiramatsu (1):
      fs/proc: Skip bootloader comment if no embedded kernel parameters

Maurizio Lombardi (1):
      scsi: target: Fix SELinux error when systemd-modules loads the target module

Michael Kelley (2):
      swiotlb: fix swiotlb_bounce() to do partial sync's correctly
      Drivers: hv: vmbus: Don't free ring buffers that couldn't be re-encrypted

Michael Liang (1):
      net/mlx5: offset comp irq index in name by one

Michael S. Tsirkin (1):
      vhost-vdpa: change ioctl # for VDPA_GET_VRING_SIZE

Michal Luczaj (1):
      af_unix: Fix garbage collector racing against connect()

Miguel Ojeda (1):
      drm/msm: fix the `CRASHDUMP_READ` target of `a6xx_get_shader_block()`

Minda Chen (2):
      net: stmmac: mmc_core: Add GMAC LPI statistics
      net: stmmac: mmc_core: Add GMAC mmc tx/rx missing statistics

Ming Lei (2):
      block: fix q->blkg_list corruption during disk rebind
      block: allow device to have both virt_boundary_mask and max segment size

Namhyung Kim (1):
      perf/x86: Fix out of range data

Nathan Chancellor (1):
      selftests: kselftest: Mark functions that unconditionally call exit() as __noreturn

NeilBrown (1):
      ceph: redirty page before returning AOP_WRITEPAGE_ACTIVATE

Nianyao Tang (1):
      irqchip/gic-v3-its: Fix VSYNC referencing an unmapped VPE on GIC v4.1

Nicolas Dufresne (1):
      media: mediatek: vcodec: Fix oops when HEVC init fails

Noah Loomans (1):
      platform/chrome: cros_ec_uart: properly fix race condition

Nuno Das Neves (1):
      mshyperv: Introduce hv_numa_node_to_pxm_info()

Oleg Nesterov (2):
      selftests/timers/posix_timers: Reimplement check_timer_distribution()
      selftests: kselftest: Fix build failure with NOLIBC

Patryk Wlazlyn (11):
      tools/power turbostat: Print ucode revision only if valid
      tools/power turbostat: Read base_hz and bclk from CPUID.16H if available
      tools/power turbostat: Add --no-msr option
      tools/power turbostat: Add --no-perf option
      tools/power turbostat: Add reading aperf and mperf via perf API
      tools/power turbostat: detect and disable unavailable BICs at runtime
      tools/power turbostat: add early exits for permission checks
      tools/power turbostat: Clear added counters when in no-msr mode
      tools/power turbostat: Add proper re-initialization for perf file descriptors
      tools/power turbostat: read RAPL counters via perf
      tools/power turbostat: Add selftests

Paulo Alcantara (2):
      smb: client: fix NULL ptr deref in cifs_mark_open_handles_for_deleted_file()
      smb: client: instantiate when creating SFU files

Pavan Chebbi (1):
      bnxt_en: Reset PTP tx_avail after possible firmware reset

Pavel Begunkov (1):
      io_uring/net: restore msg_control on sendzc retry

Pawan Gupta (4):
      x86/bhi: Add support for clearing branch history at syscall entry
      x86/bhi: Enumerate Branch History Injection (BHI) bug
      x86/bhi: Add BHI mitigation knob
      x86/bhi: Mitigate KVM by default

Peng Liu (1):
      tools/power turbostat: Fix Bzy_MHz documentation typo

Petr Tesarik (2):
      swiotlb: extend buffer pre-padding to alloc_align_mask if necessary
      u64_stats: fix u64_stats_init() for lockdep when used repeatedly in one file

Pierre Gondois (1):
      firmware: arm_scmi: Fix wrong fastchannel initialization

Prasad Pandit (1):
      tracing: Fix FTRACE_RECORD_RECURSION_SIZE Kconfig entry

Raag Jadav (1):
      ACPI: bus: allow _UID matching for integer zero

Rahul Rameshbabu (1):
      net/mlx5e: Do not produce metadata freelist entries in Tx port ts WQE xmit

Randy Dunlap (1):
      LoongArch: Include linux/sizes.h in addrspace.h to prevent build errors

Rick Edgecombe (4):
      Drivers: hv: vmbus: Leak pages if set_memory_encrypted() fails
      Drivers: hv: vmbus: Track decrypted status in vmbus_gpadl
      hv_netvsc: Don't free decrypted memory
      uio_hv_generic: Don't free decrypted memory

Rik van Riel (1):
      blk-iocost: avoid out of bounds shift

Samuel Holland (1):
      cache: sifive_ccache: Partially convert to a platform driver

Sean Christopherson (1):
      x86/cpu: Actually turn off mitigations by default for SPECULATION_MITIGATIONS=n

Sebastian Andrzej Siewior (1):
      locking: Make rwsem_assert_held_write_nolockdep() build with PREEMPT_RT=y

Shay Drory (2):
      net/mlx5: E-switch, store eswitch pointer before registering devlink_param
      net/mlx5: Register devlink first under devlink lock

Shradha Gupta (1):
      hv/hv_kvp_daemon: Handle IPv4 and Ipv6 combination for keyfile format

Stephen Boyd (1):
      drm/msm: Add newlines to some debug prints

Steve French (2):
      smb3: fix Open files on server counter going negative
      smb3: fix broken reconnect when password changing on the server by allowing password rotation

Steven Rostedt (Google) (1):
      ring-buffer: Only update pages_touched when a new page is touched

Sumeet Pawnikar (1):
      platform/x86/intel/hid: Add Lunar Lake and Arrow Lake support

Suraj Kandpal (1):
      drm/i915/hdcp: Fix get remote hdcp capability function

Sven Eckelmann (1):
      batman-adv: Avoid infinite loop trying to resize local TT

Tao Zhou (1):
      drm/amdgpu: implement IRQ_STATE_ENABLE for SDMA v4.4.2

Tariq Toukan (1):
      net/mlx5: Disallow SRIOV switchdev mode when in multi-PF netdev

Thierry Reding (1):
      gpu: host1x: Do not setup DMA for virtual devices

Thomas Bertschinger (1):
      bcachefs: create debugfs dir for each btree

Thomas Gleixner (5):
      timekeeping: Use READ/WRITE_ONCE() for tick_do_timer_cpu
      x86/topology: Don't update cpu_possible_map in topo_set_cpuids()
      x86/cpu/amd: Make the CPUID 0x80000008 parser correct
      x86/cpu/amd: Make the NODEID_MSR union actually work
      x86/cpu/amd: Move TOPOEXT enablement into the topology parser

Thorsten Blum (3):
      bcachefs: Rename struct field swap to prevent macro naming collision
      compiler.h: Add missing quote in macro comment
      zonefs: Use str_plural() to fix Coccinelle warning

Tim Harvey (2):
      arm64: dts: freescale: imx8mp-venice-gw72xx-2x: fix USB vbus regulator
      arm64: dts: freescale: imx8mp-venice-gw73xx-2x: fix USB vbus regulator

Tim Huang (2):
      drm/amd/pm: fixes a random hang in S4 for SMU v13.0.4/11
      drm/amdgpu: fix incorrect number of active RBs for gfx11

Uwe Kleine-König (1):
      MAINTAINERS: Drop Li Yang as their email address stopped working

Vasant Hegde (3):
      iommu/amd: Fix possible irq lock inversion dependency issue
      iommu/amd: Do not enable SNP when V2 page table is enabled
      iommu/amd: Change log message severity

Vikas Gupta (2):
      bnxt_en: Fix possible memory leak in bnxt_rdma_aux_device_init()
      bnxt_en: Fix error recovery for RoCE ulp client

Ville Syrjälä (7):
      drm/client: Fully protect modes[] with dev->mode_config.mutex
      drm/i915/cdclk: Fix CDCLK programming order when pipes are active
      drm/i915/cdclk: Fix voltage_level programming edge case
      drm/i915/psr: Disable PSR when bigjoiner is used
      drm/i915: Disable port sync when bigjoiner is used
      drm/i915: Disable live M/N updates when using bigjoiner
      drm/i915/vrr: Disable VRR when using bigjoiner

Wachowski, Karol (3):
      accel/ivpu: Check return code of ipc->lock init
      accel/ivpu: Fix PCI D0 state entry in resume
      accel/ivpu: Improve clarity of MMU error messages

Wei Yang (3):
      memblock tests: fix undefined reference to `early_pfn_to_nid'
      memblock tests: fix undefined reference to `panic'
      memblock tests: fix undefined reference to `BIT'

Wenjing Liu (1):
      drm/amd/display: always reset ODM mode in context when adding first plane

Wyes Karny (1):
      tools/power turbostat: Increase the limit for fd opened

Xiang Chen (2):
      scsi: hisi_sas: Handle the NCQ error returned by D2H frame
      scsi: hisi_sas: Modify the deadline for ata_wait_after_reset()

Xianting Tian (1):
      vhost: correct misleading printing information

Xiubo Li (1):
      ceph: switch to use cap_delay_lock for the unlink delay list

Xuchun Shang (1):
      iommu/vt-d: Fix wrong use of pasid config

Yang Li (1):
      eventfs: Fix kernel-doc comments to functions

Yifan Zhang (2):
      drm/amdgpu: add smu 14.0.1 discovery support
      drm/amdgpu: differentiate external rev id for gfx 11.5.0

Yu Kuai (2):
      raid1: fix use-after-free for original bio in raid1_write_request()
      block: fix that blk_time_get_ns() doesn't update time after schedule

Yunfei Dong (3):
      media: mediatek: vcodec: adding lock to protect decoder context list
      media: mediatek: vcodec: adding lock to protect encoder context list
      media: mediatek: vcodec: support 36 bits physical address

Yuquan Wang (1):
      cxl/mem: Fix for the index of Clear Event Record Handle

Zack Rusin (1):
      drm/vmwgfx: Enable DMA mappings with SEV

Zhang Rui (6):
      tools/power/turbostat: Enable MSR_CORE_C1_RES support for ICX
      tools/power/turbostat: Cache graphics sysfs path
      tools/power/turbostat: Unify graphics sysfs snapshots
      tools/power/turbostat: Introduce BIC_SAM_mc6/BIC_SAMMHz/BIC_SAMACTMHz
      tools/power/turbostat: Add support for new i915 sysfs knobs
      tools/power/turbostat: Add support for Xe sysfs knobs

ZhenGuo Yin (1):
      drm/amdgpu: clear set_q_mode_offs when VM changed

Zheng Yejian (1):
      kprobes: Fix possible use-after-free issue on kprobe registration

Zhenhua Huang (1):
      fs/proc: remove redundant comments from /proc/bootconfig

Zhigang Luo (1):
      amd/amdkfd: sync all devices to wait all processes being evicted

Zhongwei (1):
      drm/amd/display: Adjust dprefclk by down spread percentage.

lima1002 (1):
      drm/amd/swsmu: Update smu v14.0.0 headers to be 14.0.1 compatible

shaoyunl (2):
      drm/amdgpu : Add mes_log_enable to control mes log feature
      drm/amdgpu : Increase the mes log buffer size as per new MES FW version

^ permalink raw reply	[relevance 43%]

* Re: [PATCH] vfs: relax linkat() AT_EMPTY_PATH - aka flink() - requirements
  @ 2024-04-13 17:07 99%         ` Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-04-13 17:07 UTC (permalink / raw)
  To: Christian Brauner
  Cc: linux-fsdevel, linux-kernel, Andrew Lutomirski, Peter Anvin,
	Alexander Viro, Jan Kara

On Sat, 13 Apr 2024 at 08:16, Christian Brauner <brauner@kernel.org> wrote:
>
> I think it should be ok to allow AT_EMPTY_PATH with NULL because
> userspace can detect whether the kernel allows that by passing
> AT_EMPTY_PATH with a NULL path argument and they would get an error back
> that would tell them that this kernel doesn't support NULL paths.

Yeah, it should return -1 / EFAULT on older kernels.

> I'd like to try a patch for this next week. It's a good opportunity to
> get into some of the more gritty details of this area.
>
> From a rough first glance most AT_EMPTY_PATH users should be covered by
> adapting getname_flags() accordingly.
>
> Imho, this could likely be done by introducing a single struct filename
> null_filename.

It's probably better to try to special-case it entirely.

See commit 9013c51c630a ("vfs: mostly undo glibc turning 'fstat()'
into 'fstatat(AT_EMPTY_PATH)'") and the numbers in there in
particular.

That still leaves performance on the table exactly because it has to
do that extra "get_user()" to check for an empty path, but it avoids
not only the pathname allocation, but also the setup for the pathname
walk.

If we had a NULL case there, I'd expect that fstatat() and fstat()
would perform the same (modulo a couple of instructions).

Of course, the performance of get_user() will vary depending on
microarchitecture. If you don't have SMAP, it's cheap. It's the
STAC/CLAC that is most of the cost, and the exact cost of those will
then depend on implementations - they *could* be much faster than they
are.

              Linus

^ permalink raw reply	[relevance 99%]

* Re: [PATCH] vfs: relax linkat() AT_EMPTY_PATH - aka flink() - requirements
  @ 2024-04-12 17:43 95%   ` Linus Torvalds
    0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-04-12 17:43 UTC (permalink / raw)
  To: Christian Brauner
  Cc: linux-fsdevel, linux-kernel, Andrew Lutomirski, Peter Anvin,
	Alexander Viro, Jan Kara

Side note: I'd really like to relax another unrelated AT_EMPTY_PATH
issue: we should just allow a NULL path for that case.

The requirement that you pass an actual empty string is insane. It's
wrong. And it adds a noticeable amount of expense to this path,
because just getting the single byte and looking it up is fairly
expensive.

This was more noticeable because glibc at one point (still?) did

        newfstatat(6, "", buf, AT_EMPTY_PATH)

when it should have just done a simple "fstat()".

So there were (are?) a *LOT* of AT_EMPTY_PATH users, and they all do a
pointless "let's copy a string from user space".

And yes, I know exactly why AT_EMPTY_PATH exists: because POSIX
traditionally says that a path of "" has to return -ENOENT, not the
current working directory. So AT_EMPTY_PATH basically says "allow the
empty path for lookup".

But while it *allows* the empty path, it doesn't *force* it, so it
doesn't mean "avoid the lookup", and we really end up doing a lot of
extra work just for this case. Just the user string copy is a big deal
because of the whole overhead of accessing user space, but it's also
the whole "allocate memory for the path etc".

If we either said "a NULL path with AT_EMPTY_PATH means empty", or
even just added a new AT_NULL_PATH thing that means "path has to be
NULL, and it means the same as AT_EMPTY_PATH with an empty path", we'd
be able to avoid quite a bit of pointless work.
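
As a minimal userspace sketch of the two calls being compared here: the
helper below (its name is made up for illustration) checks that
fstatat() with an empty string and AT_EMPTY_PATH describes the same
inode as a plain fstat() on the same descriptor. It does not pass a NULL
path, since that is only a proposal at this point in the thread, and the
fallback definition of AT_EMPTY_PATH assumes the Linux value.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/stat.h>

#ifndef AT_EMPTY_PATH
#define AT_EMPTY_PATH 0x1000	/* Linux value, used if libc hides it */
#endif

/*
 * Hypothetical helper: returns 1 if both calls report the same
 * device/inode pair, 0 if they differ, -1 if either call fails.
 */
static int empty_path_matches_fstat(int fd)
{
	struct stat direct, via_empty_path;

	/* The cheap form: no path argument at all. */
	if (fstat(fd, &direct) != 0)
		return -1;

	/* The form discussed above: the kernel still has to fetch "". */
	if (fstatat(fd, "", &via_empty_path, AT_EMPTY_PATH) != 0)
		return -1;

	return direct.st_dev == via_empty_path.st_dev &&
	       direct.st_ino == via_empty_path.st_ino;
}
```

Calling it on any open descriptor should return 1; the point is only
that the second call does strictly more work for the same answer.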

                  Linus

^ permalink raw reply	[relevance 95%]

* Re: [GIT PULL] tracing: Fixes for v6.9
  @ 2024-04-12 16:21 99%       ` Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-04-12 16:21 UTC (permalink / raw)
  To: Randy Dunlap
  Cc: Steven Rostedt, LKML, Masami Hiramatsu, Mathieu Desnoyers,
	Andrew Morton, Arnd Bergmann, Prasad Pandit, Yang Li

On Fri, 12 Apr 2024 at 09:20, Randy Dunlap <rdunlap@infradead.org> wrote:
> >>
> >> Argh. What parser is this? We need to fix this craziness.
>
> something that fedora cares about.
> out-of-tree I expect.

Ok, that shit will now be broken immediately by me adding tabs to our
Kconfig file.

Because no, some out-of-tree garbage is not relevant, and if they
don't fix it out of tree, that's *their* problem, not ours.

                Linus

^ permalink raw reply	[relevance 99%]

* Re: [GIT PULL] tracing: Fixes for v6.9
    @ 2024-04-12 16:20 99%     ` Linus Torvalds
  1 sibling, 0 replies; 200+ results
From: Linus Torvalds @ 2024-04-12 16:20 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: LKML, Masami Hiramatsu, Mathieu Desnoyers, Andrew Morton,
	Arnd Bergmann, Prasad Pandit, Yang Li

On Fri, 12 Apr 2024 at 09:15, Steven Rostedt <rostedt@goodmis.org> wrote:
>
> Note, the tab is here:

Yeah, yeah, I checked.

I also checked that the normal "make defconfig" does not care.

In fact, I'm seriously inclined to make sure that our main Kconfig
file has several tabs in several places, just to make damn sure that
any broken sh*t is fixed.

Because no, the fix is *not* to try to fix invisible problems in the
Kconfig files themselves.

I've pulled your thing, but any parsers that think tabs and spaces are
different need to either be fixed, or they need to be shunned.

                     Linus

^ permalink raw reply	[relevance 99%]

* Re: [GIT PULL] tracing: Fixes for v6.9
  @ 2024-04-12 16:07 99% ` Linus Torvalds
    0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-04-12 16:07 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: LKML, Masami Hiramatsu, Mathieu Desnoyers, Andrew Morton,
	Arnd Bergmann, Prasad Pandit, Yang Li

On Fri, 12 Apr 2024 at 07:29, Steven Rostedt <rostedt@goodmis.org> wrote:
>
> - Replace bad tab with space in Kconfig for FTRACE_RECORD_RECURSION_SIZE

Argh. What parser is this? We need to fix this craziness.

Yes, yes, we have "tabs and spaces" issues due to the fundamental
brokenness of make, and we can't get rid of *that* bogosity.

But for our own Kconfig files? Whitespace is whitespace (ignoring
crazy unicode extensions), we need to get away from "tabs and spaces
act differently".

              Linus

^ permalink raw reply	[relevance 99%]

* Re: [PATCH] vfs: relax linkat() AT_EMPTY_PATH - aka flink() - requirements
  @ 2024-04-12 15:36 99%                   ` Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-04-12 15:36 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Charles Mirabile, Alexander Viro, Jan Kara, linux-fsdevel,
	linux-kernel, Andrew Lutomirski, Peter Anvin

On Fri, 12 Apr 2024 at 00:46, Christian Brauner <brauner@kernel.org> wrote:
>
> Hm, I would like to avoid adding an exception for O_PATH.

Ack. It's not the important or really relevant part.

             Linus

^ permalink raw reply	[relevance 99%]

* Re: [PATCH] vfs: relax linkat() AT_EMPTY_PATH - aka flink() - requirements
  @ 2024-04-11 20:22 99%                 ` Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-04-11 20:22 UTC (permalink / raw)
  To: Charles Mirabile
  Cc: Christian Brauner, Alexander Viro, Jan Kara, linux-fsdevel,
	linux-kernel, Andrew Lutomirski, Peter Anvin

On Thu, 11 Apr 2024 at 13:08, Charles Mirabile <cmirabil@redhat.com> wrote:
>
> The problem with this is that another process might be able to access
> the file during via that name during the brief period before it is
> unlinked. If I am not using NFS, I am always going to prefer using
> O_TMPFILE. I would rather be able to do that without restriction even
> if it isn't the most robust solution by your definition.


Oh, absolutely. I think the right pattern is basically some variation of

    fd = open(filename, O_TMPFILE | O_WRONLY, 0600);
    if (fd < 0) {
        char template[...] = ".tmpfileXXXXXX";
        fd = mkstemp(template);
        unlink(template);
    }
    .. now act on fd to initialize it ..
    linkat(fd, "", AT_FDCWD, "finalname", AT_EMPTY_PATH);

which should work reasonably well in various environments.
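
One possible variation of that pattern, written out as a sketch rather
than a recommendation. The helper names open_scratch() and publish() are
invented for illustration. Two assumptions differ from the outline
above: the fallback keeps its temporary name and is published with
rename() (which replaces an existing target, unlike linkat()), and when
linkat() with AT_EMPTY_PATH is refused the code goes through
/proc/self/fd, as described in the open(2) manual page.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#ifndef AT_EMPTY_PATH
#define AT_EMPTY_PATH 0x1000	/* Linux value, used if libc hides it */
#endif

/*
 * Open a scratch file in 'dir'. With O_TMPFILE the file has no name and
 * name_out is set to "". Otherwise fall back to a named temporary file
 * and report its name in name_out.
 */
static int open_scratch(const char *dir, char *name_out, size_t len)
{
	int n;

	if (len == 0) {
		errno = EINVAL;
		return -1;
	}
#ifdef O_TMPFILE
	{
		int fd = open(dir, O_TMPFILE | O_WRONLY, 0600);

		if (fd >= 0) {
			name_out[0] = '\0';
			return fd;
		}
	}
#endif
	/* No ->tmpfile() support (e.g. NFS): use a regular file. */
	n = snprintf(name_out, len, "%s/.tmpfileXXXXXX", dir);
	if (n < 0 || (size_t)n >= len) {
		errno = ENAMETOOLONG;
		return -1;
	}
	return mkstemp(name_out);
}

/* Make the finished file visible under 'final'. */
static int publish(int fd, const char *tmpname, const char *final)
{
	char procpath[64];

	if (tmpname[0] != '\0')
		return rename(tmpname, final);	/* named fallback */

	if (linkat(fd, "", AT_FDCWD, final, AT_EMPTY_PATH) == 0)
		return 0;

	/* AT_EMPTY_PATH refused: link via the /proc symlink instead. */
	snprintf(procpath, sizeof(procpath), "/proc/self/fd/%d", fd);
	return linkat(AT_FDCWD, procpath, AT_FDCWD, final,
		      AT_SYMLINK_FOLLOW);
}
```

A caller would do open_scratch(), write and fchmod() the contents, and
only then call publish(), so the final name never refers to a
half-written file.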

Clearly O_TMPFILE is the superior option when it exists. I'm just
saying that anything that *relies* on it existing is dubious.

              Linus

^ permalink raw reply	[relevance 99%]

* Re: [PATCH] vfs: relax linkat() AT_EMPTY_PATH - aka flink() - requirements
  2024-04-11 18:13 99%             ` Linus Torvalds
@ 2024-04-11 19:34 89%               ` Linus Torvalds
      1 sibling, 1 reply; 200+ results
From: Linus Torvalds @ 2024-04-11 19:34 UTC (permalink / raw)
  To: Charles Mirabile
  Cc: Christian Brauner, Alexander Viro, Jan Kara, linux-fsdevel,
	linux-kernel, Andrew Lutomirski, Peter Anvin

On Thu, 11 Apr 2024 at 11:13, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> So while I understand your motivation, I actually think it's actively
> wrong to special-case __O_TMPFILE, because it encourages a pattern
> that is bad.

Just to clarify: I think the ns_capable() change is a good idea and
makes sense. The whole "limited to global root" makes no sense if the
file was opened within a namespace, and I think it always just came
from the better check not being obvious at the point where
AT_EMPTY_PATH was checked for.

Similarly, while the FMODE_PATH test _looks_ very similar to an
O_TMPFILE check, I think it's fundamentally different in a conceptual
sense: not only is FMODE_PATH filesystem-agnostic, a FMODE_PATH file
is *only* useful as a pathname (ie no read/write semantics).

And so if a FMODE_PATH file descriptor is passed in from the outside,
I feel like the "you cannot use this to create a path" is kind of a
fundamentally nonsensical rule.

IOW, whoever is passing that FMODE_PATH file descriptor around must
have actually thought about it, and must have opened it with O_PATH,
and it isn't useful for anything else than as a starting point for a
path lookup.

So while I don't think the __O_TMPFILE exception would necessarily be
wrong per se, I am afraid that it would result in people writing
convenient code that "appears to work" in testing, but then fails when
run in an environment where the directory is mounted over NFS (or any
other filesystem that doesn't do ->tmpfile()).

I am certainly open to be convinced otherwise, but I really think that
the real pattern to aim for should just be "look, I opened the file
myself, then filled in the detail, and now I'm doing a linkat() to
expose it" and that the real protection issue should be that "my
credentials are the same for open and linkat".

The other rules (ie the capability check or the FMODE_PATH case) would
be literally about situations where you *want* to pass things around
between protection domains.

In that context, the ns_capable() and the FMODE_PATH check make sense to me.

In contrast, the __O_TMPFILE check just feels like a random detail.

Hmm?

Anyway, end result of that is that this is what that part of the patch
looks like for me right now:

+               if (flags & LOOKUP_DFD_MATCH_CREDS) {
+                       const struct cred *cred = f.file->f_cred;
+                       if (!(f.file->f_mode & FMODE_PATH) &&
+                           cred != current_cred() &&
+                           !ns_capable(cred->user_ns, CAP_DAC_READ_SEARCH)) {
+                               fdput(f);
+                               return ERR_PTR(-ENOENT);
+                       }
+               }

and that _seems_ sensible to me.

But yes, this all has been something that we have failed to do right
for at least a quarter of a century so far, so this needs a *lot* of
thought, even if the patch itself is rather small and looks relatively
obvious.

                 Linus

^ permalink raw reply	[relevance 89%]

* Re: [GIT PULL] turbostat 2024.04.10
  @ 2024-04-11 19:14 99%     ` Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-04-11 19:14 UTC (permalink / raw)
  To: Len Brown; +Cc: Linux PM list, Linux Kernel Mailing List

On Thu, 11 Apr 2024 at 11:20, Len Brown <lenb@kernel.org> wrote:
>
> ISTR that once upon a time at the kernel summit you expressed a
> preference that things like utilities (which sometimes depend on merge
> window changes) come in after rc1 is declared to basically stay out of
> the way.

That may have been true at some point, but probably long ago - the
merge windows have been so reliable that it's just not an issue any
more.

So I'd rather see people hold to the normal release cycle, and aim to
have the rc releases for fixes or major problems.

We also used to allow entirely new drivers etc outside the release
cycle as a "this cannot regress" exception to the normal rules, but
that has also been largely abandoned as the release cycle is just
short enough that it makes no sense.

So the "new hardware support" rule has basically been watered down
over the years, and has become a "new hardware IDs are fine" kind of
rule, where just adding basically just a PCI ID or OF matching entry
or similar is still fine, but no more "whole new drivers".

                  Linus

^ permalink raw reply	[relevance 99%]

* Re: [PATCH] vfs: relax linkat() AT_EMPTY_PATH - aka flink() - requirements
  @ 2024-04-11 18:13 99%             ` Linus Torvalds
  2024-04-11 19:34 89%               ` Linus Torvalds
    0 siblings, 2 replies; 200+ results
From: Linus Torvalds @ 2024-04-11 18:13 UTC (permalink / raw)
  To: Charles Mirabile
  Cc: Christian Brauner, Alexander Viro, Jan Kara, linux-fsdevel,
	linux-kernel, Andrew Lutomirski, Peter Anvin

On Thu, 11 Apr 2024 at 10:35, Charles Mirabile <cmirabil@redhat.com> wrote:
>
> And a slightly dubious addition to bypass these checks for tmpfiles
> across the board.

Does this make sense?

I 100% agree that one of the primary reasons why people want flink()
is that "open tmpfile, finalize contents and permissions, then link
the final result into the filesystem".

But I would expect that the "same credentials as open" check is the
one that really matters.

And __O_TMPFILE is just a special case that might not even be used -
it's entirely possible to just do the same with a real file (ie
non-O_TMPFILE) and link it in place and remove the original.

Not to mention that ->tmpfile() isn't necessarily even available, so
the whole concept of "use O_TMPFILE and then linkat" is actually
broken. It *has* to be able to fall back to a regular file to work at
all on NFS.

So while I understand your motivation, I actually think it's actively
wrong to special-case __O_TMPFILE, because it encourages a pattern
that is bad.

                    Linus

^ permalink raw reply	[relevance 99%]

* Re: [tip: locking/core] locking/pvqspinlock: Use try_cmpxchg_acquire() in trylock_clear_pending()
  @ 2024-04-11 16:31 99%   ` Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-04-11 16:31 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-tip-commits, Uros Bizjak, Ingo Molnar, Waiman Long, x86

On Thu, 11 Apr 2024 at 06:33, tip-bot2 for Uros Bizjak
<tip-bot2@linutronix.de> wrote:
>
> Use try_cmpxchg_acquire(*ptr, &old, new) instead of
> cmpxchg_relaxed(*ptr, old, new) == old in trylock_clear_pending().

The above commit message is horribly confusing and wrong.

I was going "that's not right", because it says "use acquire instead
of relaxed" memory ordering, and then goes on to say "No functional
change intended".

But it turns out the *code* was always acquire, and it's only the
commit message that is wrong, presumably due to a bit too much
cut-and-paste.

But please fix the commit message, and use the right memory ordering
in the explanations too.
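
For readers less familiar with the two calling conventions, here is a
userspace model using C11 atomics. These are stand-ins with invented
"_model" names, not the kernel macros: the point is only that the "try"
form returns a success flag and writes the observed value back into
*old on failure, while the classic form returns the observed value and
leaves the comparison to the caller. Both use acquire ordering on
success, which matches what the code (as opposed to the commit message)
did.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Model of the "try" form: true on success, *old updated on failure. */
static bool try_cmpxchg_acquire_model(_Atomic int *ptr, int *old, int newval)
{
	return atomic_compare_exchange_strong_explicit(ptr, old, newval,
						       memory_order_acquire,
						       memory_order_relaxed);
}

/* Model of the classic form: returns whatever value was found in *ptr. */
static int cmpxchg_acquire_model(_Atomic int *ptr, int old, int newval)
{
	atomic_compare_exchange_strong_explicit(ptr, &old, newval,
						memory_order_acquire,
						memory_order_relaxed);
	return old;	/* equals the expected value only on success */
}
```

With these, "cmpxchg_acquire_model(p, old, new) == old" and
"try_cmpxchg_acquire_model(p, &old, new)" answer the same question; the
second form just saves the caller the explicit comparison.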

            Linus

^ permalink raw reply	[relevance 99%]

* Re: [PATCH] vfs: relax linkat() AT_EMPTY_PATH - aka flink() - requirements
  @ 2024-04-11 16:21 99%       ` Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-04-11 16:21 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Alexander Viro, Jan Kara, linux-fsdevel, linux-kernel,
	Andrew Lutomirski, Peter Anvin

On Thu, 11 Apr 2024 at 05:25, Christian Brauner <brauner@kernel.org> wrote:
>
> Btw, I think we should try to avoid putting this into path_init() and
> confine this to linkat() itself imho. The way I tried to do it was by
> presetting a root for filename_lookup(); means we also don't need a
> LOOKUP_* flag for this as this is mostly a linkat thing.

So I had the exact reverse reaction to your patch - I felt that using
that 'root' thing was the hacky case.

The lookup flag may be limited to linkat(), but it makes the code
smaller and clearer, and avoids having multiple places where we check
dfd.

And that 'root' argument really is the special hacky case, and is not
actually used by any normal system call path, and is meant for
internal kernel use rather than any generic case.

           Linus

^ permalink raw reply	[relevance 99%]

* Re: [PATCH] vfs: relax linkat() AT_EMPTY_PATH - aka flink() - requirements
    @ 2024-04-11 16:15 93%     ` Linus Torvalds
    1 sibling, 1 reply; 200+ results
From: Linus Torvalds @ 2024-04-11 16:15 UTC (permalink / raw)
  To: Christian Brauner, Charles Mirabile
  Cc: Alexander Viro, Jan Kara, linux-fsdevel, linux-kernel,
	Andrew Lutomirski, Peter Anvin

On Thu, 11 Apr 2024 at 02:05, Christian Brauner <brauner@kernel.org> wrote:
>
> I had a similar discussion a while back someone requested that we relax
> permissions so linkat can be used in containers.

Hmm.

Ok, that's different - it just wants root to be able to do it, but
"root" being just in the container itself.

I don't think that's all that useful - I think one of the issues with
linkat(AT_EMPTY_PATH) is exactly that "it's only useful for root",
which means that it's effectively useless. Inside a container or out.

Because very few loads run as root-only (and fewer still run with any
capability bits that aren't just "root or nothing").

Before I did all this, I did a Debian code search for linkat with
AT_EMPTY_PATH, and it's almost non-existent. And I think it's exactly
because of this "when it's only useful for root, it's hardly useful at
all" issue.

(Of course, my Debian code search may have been broken).

So I suspect your special case is actually largely useless, and what
the container user actually wanted was what my patch does, but they
didn't think that was possible, so they asked to just extend the
"root" notion.

I've added Charles to the Cc.

But yes, with my patch, it would now be trivial to make that

        capable(CAP_DAC_READ_SEARCH)

test also be

        ns_capable(f.file->f_cred->user_ns, CAP_DAC_READ_SEARCH)

instead. I suspect not very many would care any more, but it does seem
conceptually sensible.

As to your patch - I don't like your nd->root  games in that patch at
all. That looks odd.

Yes, it makes lookup ignore the dfd (so you avoid the TOCTOU issue),
but it also makes lookup ignore "/". Which happens to be ok with an
empty path, but still...

So it feels to me like that patch of yours mis-uses something that is
just meant for vfs_path_lookup().

It may happen to work, but it smells really odd to me.

             Linus

^ permalink raw reply	[relevance 93%]

* Re: [PATCH] vfs: relax linkat() AT_EMPTY_PATH - aka flink() - requirements
  2024-04-11  0:10 76% [PATCH] vfs: relax linkat() AT_EMPTY_PATH - aka flink() - requirements Linus Torvalds
  2024-04-11  0:20 99% ` Linus Torvalds
@ 2024-04-11  2:39 96% ` Linus Torvalds
      2 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-04-11  2:39 UTC (permalink / raw)
  To: Alexander Viro, Christian Brauner, Jan Kara
  Cc: linux-fsdevel, linux-kernel, Andrew Lutomirski, Peter Anvin

On Wed, 10 Apr 2024 at 17:10, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> +               if (flags & LOOKUP_DFD_MATCH_CREDS) {
> +                       if (f.file->f_cred != current_cred() &&
> +                           !capable(CAP_DAC_READ_SEARCH)) {
> +                               fdput(f);
> +                               return ERR_PTR(-ENOENT);
> +                       }
> +               }

Side note: I suspect that this could possibly be relaxed further, by
making the rule be that if something has been explicitly opened to be
used as a path (ie O_PATH was used at open time), we can link to it
even across different credentials.

IOW, the above could perhaps even be

+               if (flags & LOOKUP_DFD_MATCH_CREDS) {
+                       if (!(f.file->f_mode & FMODE_PATH) &&
+                           f.file->f_cred != current_cred() &&
+                           !capable(CAP_DAC_READ_SEARCH)) {
+                               fdput(f);
+                               return ERR_PTR(-ENOENT);
+                       }
+               }

which would _allow_ people to pass in paths as file descriptors if
they actually wanted to.

After all, the only thing you can do with an O_PATH file descriptor is
to use it as a path - there would be no other reason to use O_PATH in
the first place. So if you now pass it to somebody else, clearly you
are intentionally trying to make it available *as* a path.

So you could imagine doing something like this:

        // Open path as root
        int fd = open("filename", O_PATH);

        // drop privileges
        // setresuid(..) or chmod() or enter new namespace or whatever

        linkat(fd, "", AT_FDCWD, "newname", AT_EMPTY_PATH);

and it would open the path with one set of privileges, but then
intentionally go into a more restricted mode and create a link to the
source within that restricted environment.

Sensible? Who knows. I'm just throwing this out as another "this may
be the solution to our historical flink() issues".

           Linus

^ permalink raw reply	[relevance 96%]

* Re: [PATCH] vfs: relax linkat() AT_EMPTY_PATH - aka flink() - requirements
  2024-04-11  0:10 76% [PATCH] vfs: relax linkat() AT_EMPTY_PATH - aka flink() - requirements Linus Torvalds
@ 2024-04-11  0:20 99% ` Linus Torvalds
  2024-04-11  2:39 96% ` Linus Torvalds
    2 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-04-11  0:20 UTC (permalink / raw)
  To: Alexander Viro, Christian Brauner, Jan Kara
  Cc: linux-fsdevel, linux-kernel, Andrew Lutomirski, Peter Anvin

On Wed, 10 Apr 2024 at 17:10, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
>    "The definition of insanity is doing the same thing over and over
>     again and expecting different results”

Note that I'm sending this patch out not because I plan to commit it,
but to see if people can shoot holes in the concept.

There's a reason why people have tried to do this for decades.

There's also a reason why it has not worked out well.

             Linus

^ permalink raw reply	[relevance 99%]

* [PATCH] vfs: relax linkat() AT_EMPTY_PATH - aka flink() - requirements
@ 2024-04-11  0:10 76% Linus Torvalds
  2024-04-11  0:20 99% ` Linus Torvalds
                   ` (2 more replies)
  0 siblings, 3 replies; 200+ results
From: Linus Torvalds @ 2024-04-11  0:10 UTC (permalink / raw)
  To: Alexander Viro, Christian Brauner, Jan Kara
  Cc: linux-fsdevel, linux-kernel, Linus Torvalds, Andrew Lutomirski,
	Peter Anvin

   "The definition of insanity is doing the same thing over and over
    again and expecting different results”

We've tried to do this before, most recently with commit bb2314b47996
("fs: Allow unprivileged linkat(..., AT_EMPTY_PATH) aka flink") about a
decade ago.

But the effort goes back even further than that, eg this thread back
from 1998 that is so old that we don't even have it archived in lore:

    https://lkml.org/lkml/1998/3/10/108

which also points out some of the reasons why it's dangerous.

Or, how about then in 2003:

    https://lkml.org/lkml/2003/4/6/112

where we went through some of the same arguments, just with different
people involved.

In particular, having access to a file descriptor does not necessarily
mean that you have access to the path that was used for lookup, and
there may be very good reasons why you absolutely must not have access
to a path to said file.

For example, if we were passed a file descriptor from the outside into
some limited environment (think chroot, but also user namespaces etc) a
'flink()' system call could now make that file visible inside a context
where it's not supposed to be visible.

In the process the user may also be able to re-open it with permissions
that the original file descriptor did not have (eg a read-only file
descriptor may be associated with an underlying file that is writable).

Another variation on this is if somebody else (typically root) opens a
file in a directory that is not accessible to others, and passes the
file descriptor on as a read-only file.  Again, the access to the file
descriptor does not imply that you should have access to a path to the
file in the filesystem.

So while we have tried this several times in the past, it never works.

The last time we did this, that commit bb2314b47996 quickly got reverted
again in commit f0cc6ffb8ce8 (Revert "fs: Allow unprivileged linkat(...,
AT_EMPTY_PATH) aka flink"), with a note saying "We may re-do this once
the whole discussion about the interface is done".

Well, the discussion is long done, and didn't come to any resolution.
There's no question that 'flink()' would be a useful operation, but it's
a dangerous one.

However, it does turn out that since 2008 (commit d76b0d9b2d87: "CRED:
Use creds in file structs") we have had a fairly straightforward way to
check whether the file descriptor was opened by the same credentials as
the credentials of the flink().

That allows the most common patterns that people want to use, which tend
to be to either open the source carefully (ie using the openat2()
RESOLVE_xyz flags, and/or checking ownership with fstat() before
linking), or to use O_TMPFILE and fill in the file contents before it's
exposed to the world with linkat().

But it also means that if the file descriptor was opened by somebody
else, or we've gone through a credentials change since, the operation no
longer works (unless we have CAP_DAC_READ_SEARCH capabilities, as
before).

Note that the credential equality check is done by using pointer
equality, which means that it's not enough that you have effectively the
same user - they have to be literally identical, since our credentials
are using copy-on-write semantics.

So you can't change your credentials to something else and try to change
it back to the same ones between the open() and the linkat().  This is
not meant to be some kind of generic permission check, this is literally
meant as a "the open and link calls are 'atomic' wrt user credentials"
check.
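
A toy userspace model of why pointer equality is stricter than comparing
ids. The struct and function names here are invented for illustration
and are not the kernel's cred API; the only property modeled is
copy-on-write, where every change produces a new object.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Toy stand-in for a credentials object. */
struct cred_model {
	unsigned int uid;
	unsigned int gid;
};

/* Copy-on-write: a change never edits 'old', it returns a new copy. */
static struct cred_model *change_uid(const struct cred_model *old,
				     unsigned int uid)
{
	struct cred_model *copy = malloc(sizeof(*copy));

	if (!copy)
		return NULL;
	*copy = *old;
	copy->uid = uid;
	return copy;
}

/* The check described above: identity of the object, not its contents. */
static bool same_creds(const struct cred_model *opened_with,
		       const struct cred_model *current)
{
	return opened_with == current;
}
```

Changing the uid away and back yields an object with identical contents
but a different address, so same_creds() fails for it even though a
field-by-field comparison would succeed.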

It also means that you can't just move things between namespaces,
because the credentials aren't just a list of uid's and gid's: they
include the pointer to the user_ns that the capabilities are relative
to.

So let's try this one more time and see if maybe this approach ends up
being workable after all.

Cc: Andrew Lutomirski <luto@kernel.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Peter Anvin <hpa@zytor.com>
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 fs/namei.c            | 17 ++++++++++++-----
 include/linux/namei.h |  1 +
 2 files changed, 13 insertions(+), 5 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index c5b2a25be7d0..3c684014eb40 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -2422,6 +2422,14 @@ static const char *path_init(struct nameidata *nd, unsigned flags)
 		if (!f.file)
 			return ERR_PTR(-EBADF);
 
+		if (flags & LOOKUP_DFD_MATCH_CREDS) {
+			if (f.file->f_cred != current_cred() &&
+			    !capable(CAP_DAC_READ_SEARCH)) {
+				fdput(f);
+				return ERR_PTR(-ENOENT);
+			}
+		}
+
 		dentry = f.file->f_path.dentry;
 
 		if (*s && unlikely(!d_can_lookup(dentry))) {
@@ -4641,14 +4649,13 @@ int do_linkat(int olddfd, struct filename *old, int newdfd,
 		goto out_putnames;
 	}
 	/*
-	 * To use null names we require CAP_DAC_READ_SEARCH
+	 * To use null names we require CAP_DAC_READ_SEARCH or
+	 * that the open-time creds of the dfd matches current.
 	 * This ensures that not everyone will be able to create
 	 * handlink using the passed filedescriptor.
 	 */
-	if (flags & AT_EMPTY_PATH && !capable(CAP_DAC_READ_SEARCH)) {
-		error = -ENOENT;
-		goto out_putnames;
-	}
+	if (flags & AT_EMPTY_PATH)
+		how |= LOOKUP_DFD_MATCH_CREDS;
 
 	if (flags & AT_SYMLINK_FOLLOW)
 		how |= LOOKUP_FOLLOW;
diff --git a/include/linux/namei.h b/include/linux/namei.h
index 74e0cc14ebf8..678ffe4acf99 100644
--- a/include/linux/namei.h
+++ b/include/linux/namei.h
@@ -44,6 +44,7 @@ enum {LAST_NORM, LAST_ROOT, LAST_DOT, LAST_DOTDOT};
 #define LOOKUP_BENEATH		0x080000 /* No escaping from starting point. */
 #define LOOKUP_IN_ROOT		0x100000 /* Treat dirfd as fs root. */
 #define LOOKUP_CACHED		0x200000 /* Only do cached lookup */
+#define LOOKUP_DFD_MATCH_CREDS	0x400000 /* Require that dfd creds match current */
 /* LOOKUP_* flags which do scope-related checks based on the dirfd. */
 #define LOOKUP_IS_SCOPED (LOOKUP_BENEATH | LOOKUP_IN_ROOT)
 
-- 
2.44.0.330.g4d18c88175


^ permalink raw reply related	[relevance 76%]

* Re: [GIT PULL for v6.9-rc4] media fixes
  @ 2024-04-10 20:53 98% ` Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-04-10 20:53 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Greg Kroah-Hartman, Andrew Morton, Linux Media Mailing List,
	Linux Kernel Mailing List

On Wed, 10 Apr 2024 at 09:39, Mauro Carvalho Chehab <mchehab@kernel.org> wrote:
>
>   - some fixes causing oops on mediatec vcodec encoder/decoder.

Well, I certainly hope it's not the fixes that cause oopses. That
would be the opposite of a fix.

However, having fixed that, I also find some of the fixes in here
rather broken: commit d353c3c34af0 ("media: mediatek: vcodec: support
36 bits physical address") has a "fix" for a cast like this:

-       dec->bs_dma = (unsigned long)bs->dma_addr;
+       dec->bs_dma = (uint64_t)bs->dma_addr;

but the underlying problem was in fact that the cast was WRONG TO EVEN EXIST.

Both 'bs_dma' and 'dma_addr' are integers. The cast is pointless and
wrong. It makes the code look like it is doing something else than
what it's doing, and that something else would be wrong anyway (ie if
it is a cast from a pointer, it would be doubly wrong).

IOW, as far as I can tell, the fix *should* have been to just remove
the cast entirely since it was pointless.
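
A small sketch of why the cast can only hurt. The real field types are
not shown in this thread, so the 64-bit types below are assumptions, and
uint32_t stands in for "unsigned long" on a 32-bit build so the effect
is visible on any host. Plain integer assignment already converts to the
destination type; a narrower cast in between just throws bits away
first.

```c
#include <stdint.h>

/* Assumed stand-in for 'unsigned long' on a 32-bit kernel build. */
typedef uint32_t ulong32_model;

/* What the old code effectively did on 32-bit: narrow, then widen. */
static uint64_t store_with_cast(uint64_t dma_addr)
{
	return (ulong32_model)dma_addr;
}

/* No cast: the value is assigned as-is. */
static uint64_t store_plain(uint64_t dma_addr)
{
	return dma_addr;
}
```

For a 36-bit address such as 0xF12345678, store_plain() preserves the
value while store_with_cast() drops the top four bits.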

I've pulled this, but please people - make the pull request
description make sense, and when fixing bugs, please think about the
code a bit more than just do a mindless conversion.

           Linus

^ permalink raw reply	[relevance 98%]

* Re: [GIT PULL] turbostat 2024.04.10
  @ 2024-04-10 20:18 99% ` Linus Torvalds
    0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-04-10 20:18 UTC (permalink / raw)
  To: Len Brown; +Cc: Linux PM list, Linux Kernel Mailing List

On Wed, 10 Apr 2024 at 06:24, Len Brown <lenb@kernel.org> wrote:
>
> Turbostat version 2024.04.10

Tssk. Things like this should still come in during the merge window
and preferably be in linux-next.

I have pulled this, since it's obviously just tooling (and the
maintainer file pattern update), but still...

                Linus

^ permalink raw reply	[relevance 99%]

* Re: [PATCH cmpxchg 08/14] parisc: add u16 support to cmpxchg()
  @ 2024-04-08 20:10 99%   ` Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-04-08 20:10 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-arch, linux-kernel, elver, akpm, tglx, peterz, dianders,
	pmladek, Arnd Bergmann, Al Viro

On Mon, 8 Apr 2024 at 10:50, Paul E. McKenney <paulmck@kernel.org> wrote:
>
> And get rid of manual truncation down to u8, etc. in there - the
> only reason for those is to avoid bogus warnings about constant
> truncation from sparse, and those are easy to avoid by turning
> that switch into conditional expression.

I support the use of the conditional, but why add the 16-bit case when
it turns out we don't want it after all?

                 Linus

^ permalink raw reply	[relevance 99%]

* Re: [WIP 0/3] Memory model and atomic API in Rust
  @ 2024-04-08 20:05 85%                   ` Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-04-08 20:05 UTC (permalink / raw)
  To: Al Viro
  Cc: Matthew Wilcox, Philipp Stanner, Kent Overstreet, Boqun Feng,
	rust-for-linux, linux-kernel, linux-arch, llvm, Miguel Ojeda,
	Alex Gaynor, Wedson Almeida Filho, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Alan Stern, Andrea Parri, Will Deacon, Peter Zijlstra,
	Nicholas Piggin, David Howells, Jade Alglave, Luc Maranget,
	Paul E. McKenney, Akira Yokosawa, Daniel Lustig, Joel Fernandes,
	Nathan Chancellor, Nick Desaulniers, kent.overstreet,
	Greg Kroah-Hartman, elver, Mark Rutland, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
	Catalin Marinas, linux-arm-kernel, linux-fsdevel

On Mon, 8 Apr 2024 at 11:14, Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> FWIW, PA-RISC is no better - the same "fetch and replace with constant"
> kind of primitive as for sparc32, only the constant is (u32)0 instead
> of (u8)~0.  And unlike sparc64, 64bit variant didn't get better.

Heh. The thing about PA-RISC is that it is actually *so* much worse
that it was never useful for an arithmetic type.

IOW, the fact that sparc used just a byte meant that the atomic_t
hackery on sparc still gave us 24 useful bits in a 32-bit atomic_t.

So long ago, we used to have an arithmetic atomic_t that was 32-bit on
all sane architectures, but only had a 24-bit range on sparc.

And I know you know all this, I'm just explaining the horror for the audience.

On PA-RISC you couldn't do that horrendous trick, so parisc just used
the "we use a hashed spinlock for all atomics", and "atomic_t" was a
regular full-sized integer type.
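
[ Illustration for the audience, not the actual parisc code: a minimal
  userspace sketch of the "hashed spinlock for all atomics" idea. The
  table size, the hash and all names below are assumptions made up for
  this example. ]

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_HASH_LOCKS 16	/* hypothetical table size, power of two */

/* One lock per hash bucket; zeroed static storage means "all unlocked". */
static atomic_bool hash_locks[NR_HASH_LOCKS];

/* Hypothetical hash: drop the low address bits, then mask into the table. */
static atomic_bool *lock_for(const volatile void *addr)
{
	return &hash_locks[((uintptr_t)addr >> 4) & (NR_HASH_LOCKS - 1)];
}

/* "Atomic" add on a full-sized int: the update runs under the hashed lock. */
static int hashed_atomic_add_return(int i, volatile int *v)
{
	atomic_bool *lock = lock_for(v);
	int ret;

	while (atomic_exchange_explicit(lock, true, memory_order_acquire))
		;	/* spin until the bucket lock is free */
	ret = *v + i;
	*v = ret;
	atomic_store_explicit(lock, false, memory_order_release);
	return ret;
}
```

The point of the sketch is that the integer keeps its full range: the
lock lives in a separate table, not in some bits of the counter itself.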

Anyway, the sparc 24-bit atomics were actually replaced by the PA-RISC
version back twenty years ago (almost to the day):

   https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git/commit/?id=373f1583c5c5

and while we still had some left-over of that horror in the git tree
up until 2011 (until commit 348738afe530: "sparc32: drop unused
atomic24 support") we probably should have made the
"arch_atomic_xyz()" ops work on generic types rather than "atomic_t"
for a long long time, so that you could use them on other things than
"atomic_t" and friends.

You can see the casting horror here, for example:

   include/asm-generic/bitops/atomic.h

where we do that cast from "volatile unsigned long *p" to
"atomic_long_t *" just to use the raw_atomic_long_xyz() operations.

It would make more sense if the raw atomics took that "native"
volatile unsigned long pointer directly.

(And here that "volatile" is not because it's necessarily used as a
volatile - it is - but simply because it's the most permissive type of
pointer. You can see other places using "const volatile unsigned long"
pointers for the same reason: passing in a non-const or non-volatile
pointer is perfectly fine).

              Linus

^ permalink raw reply	[relevance 85%]

* Re: More annoying code generation by clang
  2024-04-08 18:32 99%   ` Linus Torvalds
@ 2024-04-08 19:42 77%     ` Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-04-08 19:42 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Thomas Gleixner, Peter Anvin,
	the arch/x86 maintainers, Linux Kernel Mailing List

[-- Attachment #1: Type: text/plain, Size: 1904 bytes --]

On Mon, 8 Apr 2024 at 11:32, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> It's been reported long ago, it seems to be hard to fix.
>
> I suspect the issue is that the inline asm format is fairly closely
> related to the gcc machine descriptions (look at the machine
> descriptor files in gcc, and if you can ignore the horrid LISP-style
> syntax you see how close they are).

Actually, one of the github issues pages has more of an explanation
(and yes, it's tied to impedance issues between the inline asm syntax
and how clang works):

      https://github.com/llvm/llvm-project/issues/20571#issuecomment-980933442

so I wrote more of a commit log and did that "ASM_SOURCE_G" thing
(except I decided to call it "input" instead of "source", since that's
the standard inline asm language).

This version also has that output size fixed, and the commit message
talks about it.

This does *not* fix other inline asms to use "ASM_INPUT_G/RM".

I think it's mainly some of the bitop code that people have noticed
before - fls and variable_ffs() and friends.

I suspect clang is more common in the arm64 world than it is for
x86-64 kernel developers, and arm64 inline asm basically never uses
"rm" or "g" since arm64 doesn't have instructions that take either a
register or a memory operand.

Anyway, with gcc this generates

        cmp (%rdx),%ebx; sbb %rax,%rax  # _7->max_fds, fd, __mask

IOW, it uses the memory location for "max_fds". It couldn't do that
before, because it used to think that it always had to do the compare
in 64 bits, and the memory location is only 32-bit.

With clang, this generates

        movl    (%rcx), %eax
        cmpl    %eax, %edi
        sbbq    %rdi, %rdi

which has that extra register use, but is at least much better than
what it used to generate with crazy "load into register, spill to
stack, then compare against stack contents".
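
[ Illustration only, not part of the patch: a value-level model of the
  mask that the cmp/sbb pair computes, and of why a 32-bit 'sbbl'
  result cannot be used to mask a pointer. The function names are made
  up for this example, and plain C like this gives no speculation
  guarantee at all, since the compiler may emit a branch for it. ]

```c
#include <stdint.h>

/*
 * Value-only model of the 64-bit mask: all ones when idx < sz, zero
 * otherwise, i.e. "0 - (index < size)".
 */
static uint64_t index_mask_model(uint64_t idx, uint64_t sz)
{
	return (uint64_t)0 - (uint64_t)(idx < sz);
}

/* What a 32-bit 'sbbl' would leave in a 64-bit register: upper half clear. */
static uint64_t index_mask_model_32(uint32_t idx, uint32_t sz)
{
	return (uint32_t)((uint32_t)0 - (uint32_t)(idx < sz));
}
```

Masking a 32-bit index with either result gives the same answer, but
masking a 64-bit pointer with the 32-bit variant would throw away the
upper half of the pointer even for an in-range index.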

               Linus

[-- Attachment #2: 0001-x86-improve-array_index_mask_nospec-code-generation.patch --]
[-- Type: text/x-patch, Size: 4554 bytes --]

From 7779d285040bab685296da2cd0afe9d2d7b58969 Mon Sep 17 00:00:00 2001
From: Linus Torvalds <torvalds@linux-foundation.org>
Date: Mon, 8 Apr 2024 11:38:30 -0700
Subject: [PATCH] x86: improve array_index_mask_nospec() code generation

Don't force the inputs to be 'unsigned long', when the comparison can
easily be done in 32-bit if that's more appropriate.

Note that while we can look at the inputs to choose an appropriate size
for the compare instruction, the output is fixed at 'unsigned long'.
That's not technically optimal either, since a 32-bit 'sbbl' would often
be sufficient.

But for the outgoing mask we don't know how the mask ends up being used
(ie we have uses that have an incoming 32-bit array index, but end up
using the mask for other things).  That said, it only costs the extra
REX prefix to always generate the 64-bit mask.

[ A 'sbbl' also always technically generates a 64-bit mask, but with the
  upper 32 bits clear: that's fine for when the incoming index that will
  be masked is already 32-bit, but not if you use the mask to mask a
  pointer afterwards, like the file table lookup does ]

Also, work around clang problems with asm constraints that have multiple
possibilities, particularly "g" and "rm".  Clang seems to turn inputs
like that into the most generic form, which is the memory input - but to
make matters worse, clang won't even use a possible original memory
location, but will spill the value to stack, and use the stack for the
asm input.

See

  https://github.com/llvm/llvm-project/issues/20571#issuecomment-980933442

for some explanation of why clang has this strange behavior, but the end
result is that "g" and "rm" really end up generating horrid code.

Link: https://github.com/llvm/llvm-project/issues/20571
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 arch/x86/include/asm/barrier.h | 24 ++++++++++--------------
 include/linux/compiler-clang.h | 12 ++++++++++++
 include/linux/compiler_types.h |  9 +++++++++
 3 files changed, 31 insertions(+), 14 deletions(-)

diff --git a/arch/x86/include/asm/barrier.h b/arch/x86/include/asm/barrier.h
index 66e57c010392..234fd892e39e 100644
--- a/arch/x86/include/asm/barrier.h
+++ b/arch/x86/include/asm/barrier.h
@@ -33,20 +33,16 @@
  * Returns:
  *     0 - (index < size)
  */
-static __always_inline unsigned long array_index_mask_nospec(unsigned long index,
-		unsigned long size)
-{
-	unsigned long mask;
-
-	asm volatile ("cmp %1,%2; sbb %0,%0;"
-			:"=r" (mask)
-			:"g"(size),"r" (index)
-			:"cc");
-	return mask;
-}
-
-/* Override the default implementation from linux/nospec.h. */
-#define array_index_mask_nospec array_index_mask_nospec
+#define array_index_mask_nospec(idx,sz) ({	\
+	typeof((idx)+(sz)) __idx = (idx);	\
+	typeof(__idx) __sz = (sz);		\
+	unsigned long __mask;			\
+	asm volatile ("cmp %1,%2; sbb %0,%0"	\
+			:"=r" (__mask)		\
+			:ASM_INPUT_G (__sz),	\
+			 "r" (__idx)		\
+			:"cc");			\
+	__mask; })
 
 /* Prevent speculative execution past this barrier. */
 #define barrier_nospec() asm volatile("lfence":::"memory")
diff --git a/include/linux/compiler-clang.h b/include/linux/compiler-clang.h
index 49feac0162a5..0dee061fd7a6 100644
--- a/include/linux/compiler-clang.h
+++ b/include/linux/compiler-clang.h
@@ -118,3 +118,15 @@
 
 #define __diag_ignore_all(option, comment) \
 	__diag_clang(13, ignore, option)
+
+/*
+ * clang has horrible behavior with "g" or "rm" constraints for asm
+ * inputs, turning them into something worse than "m". Avoid using
+ * constraints with multiple possible uses (but "ir" seems to be ok):
+ *
+ *	https://github.com/llvm/llvm-project/issues/20571
+ *	https://github.com/llvm/llvm-project/issues/30873
+ *	https://github.com/llvm/llvm-project/issues/34837
+ */
+#define ASM_INPUT_G "ir"
+#define ASM_INPUT_RM "r"
diff --git a/include/linux/compiler_types.h b/include/linux/compiler_types.h
index 2abaa3a825a9..e53acd310545 100644
--- a/include/linux/compiler_types.h
+++ b/include/linux/compiler_types.h
@@ -380,6 +380,15 @@ struct ftrace_likely_data {
 #define asm_goto_output(x...) asm volatile goto(x)
 #endif
 
+/*
+ * Clang has trouble with constraints with multiple
+ * alternative behaviors (mainly "g" and "rm").
+ */
+#ifndef ASM_INPUT_G
+  #define ASM_INPUT_G "g"
+  #define ASM_INPUT_RM "rm"
+#endif
+
 #ifdef CONFIG_CC_HAS_ASM_INLINE
 #define asm_inline asm __inline
 #else
-- 
2.44.0.330.g4d18c88175


^ permalink raw reply related	[relevance 77%]

* Re: More annoying code generation by clang
  @ 2024-04-08 18:32 99%   ` Linus Torvalds
  2024-04-08 19:42 77%     ` Linus Torvalds
  0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-04-08 18:32 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Thomas Gleixner, Peter Anvin,
	the arch/x86 maintainers, Linux Kernel Mailing List

On Mon, 8 Apr 2024 at 01:49, Peter Zijlstra <peterz@infradead.org> wrote:
>
> Should this not carry a comment about the "ir" constraint wanting to be
> "g" except for clang being daft?

Yeah. Except I think I'll do something like

  /* Clang messes up "g" as an asm source */
  #define ASM_SOURCE_G "ir"

in <linux/compiler-clang.h>, and

  #ifndef ASM_SOURCE_G
    #define ASM_SOURCE_G "g"
  #endif

in linux/compiler.h.

> (I really wish clang would go fix this, it keeps coming up time and
> again).

It's been reported long ago, it seems to be hard to fix.

I suspect the issue is that the inline asm format is fairly closely
related to the gcc machine descriptions (look at the machine
descriptor files in gcc, and if you can ignore the horrid LISP-style
syntax you see how close they are).

And clang has a different model and needs to "translate" things, and
that one doesn't translate.

It's not like we don't have workarounds for gcc bugs in this area too
(eg "asm_goto_output()", née "asm_volatile_goto()").

There was another bug in my patch, though: the output mask should
always be "unsigned long", not tied to the input type.

              Linus

^ permalink raw reply	[relevance 99%]

* Re: [WIP 0/3] Memory model and atomic API in Rust
  @ 2024-04-08 17:01 75%               ` Linus Torvalds
    0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-04-08 17:01 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Philipp Stanner, Kent Overstreet, Boqun Feng, rust-for-linux,
	linux-kernel, linux-arch, llvm, Miguel Ojeda, Alex Gaynor,
	Wedson Almeida Filho, Gary Guo, Björn Roy Baron,
	Benno Lossin, Andreas Hindborg, Alice Ryhl, Alan Stern,
	Andrea Parri, Will Deacon, Peter Zijlstra, Nicholas Piggin,
	David Howells, Jade Alglave, Luc Maranget, Paul E. McKenney,
	Akira Yokosawa, Daniel Lustig, Joel Fernandes, Nathan Chancellor,
	Nick Desaulniers, kent.overstreet, Greg Kroah-Hartman, elver,
	Mark Rutland, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, x86, H. Peter Anvin, Catalin Marinas,
	linux-arm-kernel, linux-fsdevel

On Mon, 8 Apr 2024 at 09:02, Matthew Wilcox <willy@infradead.org> wrote:
>
> What annoys me is that 'volatile' accesses have (at least) two distinct
> meanings:
>  - Make this access untorn
>  - Prevent various optimisations (code motion,
>    common-subexpression-elimination, ...)

Oh, I'm not at all trying to say that "volatile" is great.

My argument was that the C (and C++, and Rust) model of attaching
memory ordering to objects is actively bad, and limiting.

Because the whole "the access rules are context-dependent" is really
fundamental. Anybody who designs an atomic model around the object is
simply not doing it right.

Now, the "volatile" rules actually make sense in a historical
"hardware access" context. So I do not think "volatile" is great, but
I also don't think K&R were incompetent. "volatile" makes perfect
sense in the historical setting of "direct hardware access".

It just so happens that there weren't other tools, so then you end up
using "volatile" for cached memory too when you want to get "access
once" semantics, and then it isn't great.

And then you have *too* many tools on the standards bodies, and they
don't understand atomics, and don't understand volatile, and they have
been told that "volatile" isn't great for atomics because it doesn't
have memory ordering semantics, but do not understand the actual
problem space.

So those people - who in some cases spent decades arguing about (and
sometimes against) "volatile" think that despite all the problems, the
solution for atomics is to make the *same* mistake, and tie it to the
data and the type system, not the action.

Which is honestly just plain *stupid*. What made sense for 'volatile'
in a historical setting, absolutely does not make sense for atomics.

> As an example, folio_migrate_flags() (in mm/migrate.c):
>
>         if (folio_test_error(folio))
>                 folio_set_error(newfolio);
>         if (folio_test_referenced(folio))
>                 folio_set_referenced(newfolio);
>         if (folio_test_uptodate(folio))
>                 folio_mark_uptodate(newfolio);
>
> ... which becomes...

[ individual load and store code generation removed ]

> In my ideal world, the compiler would turn this into:
>
>         newfolio->flags |= folio->flags & MIGRATE_MASK;

Well, honestly, we should just write the code that way, and not expect
too much of the compiler.

We don't currently have a "generic atomic or" operation, but we
probably should have one.

For our own historical reasons, while we have a few generic atomic
operations: bit operations, cmpxchg, etc, most of our arithmetic and
logical ops all rely on a special "atomic_t" type (later extended with
"atomic_long_t").

The reason? The garbage that is legacy Sparc atomics.

Sparc historically basically didn't have any atomics outside of the
'test and set byte' one, so if you wanted an atomic counter thing, and
you cared about sparc, you had to play games with "some bits of the
counter are the atomic byte lock".

And we do not care about that Sparc horror any *more*, but we used to.

End result: instead of having "do atomic ops on a normal type" - which
would be a lot more powerful - we have this model of "do atomic ops on
atomic_t".

We could fix that now. Instead of having architectures define

   arch_atomic_or(int i, atomic_t *v)

operations, we could - and should - make the 'arch' atomics be

   arch_atomic_or(int i, unsigned int *v)

and then we'd still keep the "atomic_t" around for type safety
reasons, but code that just wants to act on an "int" (or a "long")
atomically could just do so.
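
[ Illustration only: a minimal userspace sketch of that split, using
  the GCC/Clang __atomic builtins instead of per-architecture code.
  Every name here (sketch_atomic_or, sketch_atomic_t) is hypothetical
  and none of this is an existing kernel interface. ]

```c
/*
 * Hypothetical helper: atomic OR on a plain unsigned int. Relaxed
 * ordering, no return value.
 */
static inline void sketch_atomic_or(int i, unsigned int *v)
{
	(void)__atomic_fetch_or(v, (unsigned int)i, __ATOMIC_RELAXED);
}

/* The wrapper type stays around for type safety, layered on the plain op. */
typedef struct { unsigned int counter; } sketch_atomic_t;

static inline void sketch_atomic_t_or(int i, sketch_atomic_t *v)
{
	sketch_atomic_or(i, &v->counter);
}
```

Code that only has an "unsigned int" calls the plain-type helper
directly, while users of the wrapper type keep their type checking.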

But in your case, I don't think you actually need it:

> Part of that is us being dumb; folio_set_foo() should be __folio_set_foo()
> because this folio is newly allocated and nobody else can be messing
> with its flags word yet.  I failed to spot that at the time I was doing
> the conversion from SetPageFoo to folio_set_foo.

This is part of my "context matters" rant and why I do *not* think
atomics should be tied to the object, but to the operation.

The compiler generally doesn't know the context rules (insert "some
languages do know in some cases" here), which is why we as programmers
should just use different operations when we do.

In this case, since it's a new folio that hasn't been exposed to
anybody, you should just have done exactly that kind of

    newfolio->flags |= folio->flags & MIGRATE_MASK;

which we already do in the page initialization code when we know we
own the flags (set_page_zone, set_page_node, set_page_section).

We've generally avoided doing this, though - even the buddy
allocator seldom does it. The only case of manual "I know I own the
flags" I know of (apart from the initialization itself) is

        ->flags &= ~PAGE_FLAGS_CHECK_AT_FREE;
     ...
        ->flags &= ~PAGE_FLAGS_CHECK_AT_PREP;

kinds of things at free/alloc time.
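
[ Illustration only: a small model of the two styles, assuming a
  made-up flags layout. The SK_* bits, the mask and struct sk_folio
  are hypothetical stand-ins, and the per-bit helper models the atomic
  folio_set_*() calls with an atomic OR builtin. ]

```c
/* Hypothetical bit positions and mask, chosen only for this example. */
#define SK_ERROR	(1UL << 0)
#define SK_REFERENCED	(1UL << 1)
#define SK_UPTODATE	(1UL << 2)
#define SK_LOCKED	(1UL << 3)	/* deliberately not migrated */
#define SK_MIGRATE_MASK	(SK_ERROR | SK_REFERENCED | SK_UPTODATE)

struct sk_folio { unsigned long flags; };

/* Per-flag style: one test plus one atomic read-modify-write per bit. */
static void migrate_flags_per_bit(struct sk_folio *newf,
				  const struct sk_folio *old)
{
	if (old->flags & SK_ERROR)
		(void)__atomic_fetch_or(&newf->flags, SK_ERROR, __ATOMIC_RELAXED);
	if (old->flags & SK_REFERENCED)
		(void)__atomic_fetch_or(&newf->flags, SK_REFERENCED, __ATOMIC_RELAXED);
	if (old->flags & SK_UPTODATE)
		(void)__atomic_fetch_or(&newf->flags, SK_UPTODATE, __ATOMIC_RELAXED);
}

/* "I own the flags" style: a single plain update of the whole word. */
static void migrate_flags_masked(struct sk_folio *newf,
				 const struct sk_folio *old)
{
	newf->flags |= old->flags & SK_MIGRATE_MASK;
}
```

Both leave the same bits set in the new object; the second form is
only valid when nobody else can touch the destination flags word yet.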

> But if the compiler people could give us something a little more
> granular than "scary volatile access disable everything", that would
> be nice.  Also hard, because now you have to figure out what this new
> thing interacts with and when is it safe to do what.

I think it would be lovely to have some kind of "atomic access"
operations that the compiler could still combine when it can see that
"this is invisible at a cache access level".

But as things are now, we do have most of those in the kernel, and
what you ask for can either be done today, or could be done (like that
"arch_atomic_or()") with a bit of re-org.

                       Linus

^ permalink raw reply	[relevance 75%]

* Linux 6.9-rc3
@ 2024-04-07 20:39 41% Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-04-07 20:39 UTC (permalink / raw)
  To: Linux Kernel Mailing List

Ok, so this rc3 looks a bit different than the usual ones, because
there's a large series to bcachefs to do filesystem repair after
corruption. Not normally something we'd see in an rc kernel, but hey,
if you had a corrupted bcachefs filesystem you'd probably want this,
and if you thought bcachefs was stable already, I have a bridge to
sell you. Special deal only for you, real cheap.

The bcachefs part is a bit over a third of the patch, and if you
ignore that part, things look fairly normal, although there's perhaps
a bit more sound SoC noise than is common.

So the rest is mostly drivers (already mentioned sound, but also
networking and gpu), architecture fixes (mainly x86 and s390, some
arm64), some other filesystem noise (mainly smb client), some selftest
updates, and a random smattering elsewhere.

It's not really all that big, although the bcachefs changes do make it
bigger than typical for an rc3.

Shortlog appended, please keep testing,

                 Linus

---

Adam Goldman (1):
      firewire: ohci: mask bus reset interrupts between ISR and bottom half

Aleksandr Loktionov (2):
      i40e: fix i40e_count_filters() to count only active/new filters
      i40e: fix vf may be used uninitialized in this function warning

Aleksandr Mishin (2):
      net: phy: micrel: Fix potential null pointer dereference
      octeontx2-af: Add array index check

Alexandre Ghiti (2):
      riscv: Fix warning by declaring arch_cpu_idle() as noinstr
      riscv: Disable preemption when using patch_map()

Alexey Makhalov (1):
      MAINTAINERS: change vmware.com addresses to broadcom.com

Amadeusz Sławiński (1):
      ASoC: Intel: avs: boards: Add modules description

Andi Shyti (4):
      drm/i915/gt: Limit the reserved VM space to only the platforms that need it
      drm/i915/gt: Disable HW load balancing for CCS
      drm/i915/gt: Do not generate the command streamer for all the CCS
      drm/i915/gt: Enable only one CCS for compute workload

Andreas Schwab (1):
      riscv: use KERN_INFO in do_trap

Andrey Albershteyn (1):
      xfs: allow cross-linking special files without project quota

Andrii Nakryiko (2):
      bpf: put uprobe link's path and task in release callback
      bpf: support deferring bpf_link dealloc to after RCU grace period

André Apitzsch (1):
      regulator: tps65132: Add of_match table

Ankit Nautiyal (1):
      drm/i915/dp: Fix the computation for compressed_bpp for DISPLAY < 13

Anna-Maria Behnsen (1):
      timers/migration: Return early on deactivation

Antoine Tenart (5):
      udp: do not accept non-tunnel GSO skbs landing in a tunnel
      gro: fix ownership transfer
      udp: do not transition UDP GRO fraglist partial checksums to unnecessary
      udp: prevent local UDP tunnel packets from being GROed
      selftests: net: gro fwd: update vxlan GRO test expectations

Anton Protopopov (1):
      bpf: fix possible file descriptor leaks in verifier

Anup Patel (2):
      RISC-V: KVM: Fix APLIC setipnum_le/be write emulation
      RISC-V: KVM: Fix APLIC in_clrip[x] read emulation

Arnd Bergmann (6):
      ata: sata_sx4: fix pdc20621_get_from_dimm() on 64-bit
      scsi: mylex: Fix sysfs buffer lengths
      vdso: Use CONFIG_PAGE_SHIFT in vdso/datapage.h
      i2c: pxa: hide unused icr_bits[] variable
      ata: sata_mv: Fix PCI device ID table declaration compilation warning
      x86/numa/32: Include missing <asm/pgtable_areas.h>

Arun R Murthy (1):
      drm/i915/dp: Remove support for UHBR13.5

Ashish Kalra (1):
      KVM: SVM: Add support for allowing zero SEV ASIDs

Atlas Yu (1):
      r8169: skip DASH fw status checks when DASH is disabled

Bartosz Golaszewski (1):
      gpio: cdev: check for NULL labels when sanitizing them for irqs

Bastien Nocera (1):
      Bluetooth: Fix TOCTOU in HCI debugfs implementation

Björn Töpel (1):
      riscv: Fix vector state restore in rt_sigreturn()

Borislav Petkov (AMD) (6):
      x86/retpoline: Do the necessary fixup to the Zen3/4 srso return thunk for !SRSO
      x86/kvm/Kconfig: Have KVM_AMD_SEV select ARCH_HAS_CC_PLATFORM
      x86/cc: Add cc_platform_set/_clear() helpers
      x86/CPU/AMD: Track SNP host status with cc_platform_*()
      x86/mce: Make sure to grab mce_sysfs_mutex in set_bank()
      x86/retpoline: Add NOENDBR annotation to the SRSO dummy return thunk

Brendan Jackman (1):
      Documentation: dev-tools: Add link to RV docs

Carlos Song (1):
      spi: spi-fsl-lpspi: remove redundant spi_controller_put call

Chaitanya Kumar Borah (1):
      ASoC: SOF: Core: Add remove_late() to sof_init_environment failure path

Charles Keepax (1):
      ASoC: cs42l43: Correct extraction of data pointer in suspend/resume

Chen Ni (1):
      ata: sata_gemini: Check clk_enable() result

Chengming Zhou (1):
      9p: remove SLAB_MEM_SPREAD flag usage

Christian Bendiksen (1):
      ALSA: hda/realtek: Add sound quirks for Lenovo Legion slim 7 16ARHA7 models

Christian Brauner (3):
      block: handle BLK_OPEN_RESTRICT_WRITES correctly
      block: count BLK_OPEN_RESTRICT_WRITES openers
      fs,block: yield devices early

Christian Göttsche (1):
      selinux: avoid dereference of garbage after mount failure

Christian Hewitt (1):
      drm/panfrost: fix power transition timeout warnings

Christoffer Sandberg (1):
      ALSA: hda/realtek - Fix inactive headset mic jack

Christoph Hellwig (3):
      nvme-multipath: don't inherit LBA-related fields for the multipath node
      nvme: split nvme_update_zone_info
      nvme: don't create a multipath node for zero capacity devices

Christophe JAILLET (4):
      ata: ahci_st: Remove an unused field in struct st_ahci_drv_data
      vboxsf: Avoid an spurious warning if load_nls_xxx() fails
      vboxsf: Remove usage of the deprecated ida_simple_xx() API
      net: dsa: sja1105: Fix parameters order in sja1110_pcs_mdio_write_c45()

Chuck Lever (1):
      SUNRPC: Fix a slow server-side memory leak with RPC-over-TCP

Colin Ian King (4):
      KVM: selftests: Fix spelling mistake "trigged" -> "triggered"
      RISC-V: KVM: Remove second semicolon
      drm/nouveau/gr/gf100: Remove second semicolon
      vboxsf: remove redundant variable out_len

Damien Le Moal (1):
      nullblk: Fix cleanup order in null_add_dev() error path

Dan Carpenter (1):
      ice: Fix freeing uninitialized pointers

Daniel Wagner (2):
      nvmet-fc: move RCU read lock to nvmet_fc_assoc_exists
      nvme-fc: rename free_ctrl callback to match name pattern

Dave Airlie (1):
      nouveau/uvmm: fix addr/range calcs for remap operations

David Hildenbrand (2):
      mm/secretmem: fix GUP-fast succeeding on secretmem folios
      x86/mm/pat: fix VM_PAT handling in COW mappings

David Howells (1):
      cifs: Fix caching to try to do open O_WRONLY as rdwr on server

David Thompson (1):
      mlxbf_gige: stop interface during shutdown

Davide Caratti (1):
      mptcp: don't account accept() of non-MPC client as fallback to TCP

Dominique Martinet (1):
      9p: Fix read/write debug statements to report server reply

Donald Hunter (1):
      docs: Fix bitfield handling in kernel-doc

Duanqiang Wen (1):
      net: txgbe: fix i2c dev name cannot match clkdev

Duoming Zhou (1):
      ax25: fix use-after-free bugs caused by ax25_ds_del_timer

Edward Liaw (1):
      selftests/mm: include strings.h for ffsl

Eric Dumazet (5):
      net: do not consume a cacheline for system_page_pool
      erspan: make sure erspan_base_hdr is present in skb->head
      net/sched: fix lockdep splat in qdisc_tree_reduce_backlog()
      net/sched: act_skbmod: prevent kernel-infoleak
      netfilter: validate user input for expected length

Frederic Weisbecker (1):
      timers/migration: Fix ignored event due to missing CPU update

Geliang Tang (1):
      selftests: mptcp: join: fix dev in check_endpoint

Gergo Koteles (1):
      ASoC: tas2781: mark dvc_tlv with __maybe_unused

Guenter Roeck (2):
      mean_and_variance: Drop always failing tests
      nios2: Only use built-in devicetree blob if configured to do so

Haiyang Zhang (1):
      net: mana: Fix Rx DMA datasize and skb_over_panic

Hannes Reinecke (1):
      nvmet: implement unique discovery NQN

Hans de Goede (1):
      gpiolib: Fix triggering "kobject: 'gpiochipX' is not initialized, yet" kobject_get() errors

Hariprasad Kelam (1):
      octeontx2-af: Fix issue with loading coalesced KPU profiles

Heiko Carstens (1):
      s390/mm: fix NULL pointer dereference

Heiner Kallweit (1):
      r8169: fix issue caused by buggy BIOS on certain boards with RTL8168d

Herve Codina (2):
      driver core: Introduce device_link_wait_removal()
      of: dynamic: Synchronize of_changeset_destroy() with the devlink removals

Hongbo Li (1):
      bcachefs: fix trans->mem realloc in __bch2_trans_kmalloc

Horatiu Vultur (1):
      net: phy: micrel: lan8814: Fix when enabling/disabling 1-step timestamping

Huai-Yuan Liu (1):
      spi: mchp-pci1xxx: Fix a possible null pointer dereference in pci1xxx_spi_probe

Hui Wang (1):
      Bluetooth: hci_event: set the conn encrypted before conn establishes

I Gede Agastya Darma Laksana (1):
      ALSA: hda/realtek: Update Panasonic CF-SZ6 quirk to support headset with microphone

Ilya Leoshkevich (2):
      s390/atomic: mark all functions __always_inline
      s390/preempt: mark all functions __always_inline

Imre Deak (1):
      drm/i915/dp: Fix DSC state HW readout for SST connectors

Ivan Vecera (2):
      i40e: Enforce software interrupt during busy-poll exit
      i40e: Fix VF MAC filter removal

Jaewon Kim (1):
      spi: s3c64xx: Use DMA mode from fifo size

Jakub Kicinski (1):
      selftests: reuseaddr_conflict: add missing new line at the end of the output

Jakub Sitnicki (1):
      bpf, sockmap: Prevent lock inversion deadlock in map delete elem

Jason A. Donenfeld (1):
      x86/coco: Require seeding RNG with RDRAND on CoCo systems

Jeff Layton (2):
      vboxsf: explicitly deny setlease attempts
      nfsd: hold a lighter-weight client reference over CB_RECALL_ANY

Jens Axboe (7):
      io_uring/rw: don't allow multishot reads without NOWAIT support
      io_uring: disable io-wq execution of multishot NOWAIT requests
      io_uring: use private workqueue for exit work
      io_uring/kbuf: get rid of lower BGID lists
      io_uring/kbuf: get rid of bl->is_ready
      io_uring/kbuf: protect io_buffer_list teardown with a reference
      io_uring/kbuf: hold io_buffer_list reference over mmap

Jesper Dangaard Brouer (1):
      xen-netfront: Add missing skb_mark_for_recycle

Jisheng Zhang (1):
      riscv: mm: implement pgprot_nx

Joan Bruguera Micó (1):
      x86/bpf: Fix IP for relocating call depth accounting

Johan Hovold (5):
      Revert "Bluetooth: hci_qca: Set BDA quirk bit if fwnode exists in DT"
      dt-bindings: bluetooth: add 'qcom,local-bd-address-broken'
      arm64: dts: qcom: sc7180-trogdor: mark bluetooth address as broken
      Bluetooth: add quirk for broken address properties
      Bluetooth: qca: fix device-address endianness

John Sperbeck (1):
      init: open output files from cpio unpacking with O_LARGEFILE

Jose Ignacio Tornos Martinez (1):
      net: usb: ax88179_178a: avoid the interface always configured as random address

Joshua Hay (1):
      idpf: fix kernel panic on unknown packet types

Jouni Högander (3):
      drm/i915/psr: Calculate PIPE_SRCSZ_ERLY_TPT value
      drm/i915/psr: Move writing early transport pipe src
      drm/i915/psr: Fix intel_psr2_sel_fetch_et_alignment usage

Justin Stitt (1):
      smb: client: replace deprecated strncpy with strscpy

Kan Liang (1):
      perf/x86/intel/ds: Don't clear ->pebs_data_cfg for the last PEBS event

Kent Gibson (1):
      gpio: cdev: fix missed label sanitizing in debounce_setup()

Kent Overstreet (48):
      bcachefs: Fix assert in bch2_backpointer_invalid()
      bcachefs: Fix journal pins in btree write buffer
      bcachefs: fix mount error path
      bcachefs: Add an assertion for trying to evict btree root
      bcachefs: Move snapshot table size to struct snapshot_table
      bcachefs: Add checks for invalid snapshot IDs
      bcachefs: Don't do extent merging before journal replay is finished
      bcachefs: btree_and_journal_iter now respects trans->journal_replay_not_finished
      bcachefs: Be careful about btree node splits during journal replay
      bcachefs: Improved topology repair checks
      bcachefs: Check btree ptr min_key in .invalid
      bcachefs: Fix btree node keys accounting in topology repair path
      bcachefs: Fix use after free in bch2_check_fix_ptrs()
      bcachefs: Fix repair path for missing indirect extents
      bcachefs: Fix use after free in check_root_trans()
      bcachefs: Kill bch2_bkey_ptr_data_type()
      bcachefs: Fix bch2_btree_increase_depth()
      bcachefs: fix backpointer for missing alloc key msg
      bcachefs: Split out recovery_passes.c
      bcachefs: Add error messages to logged ops fns
      bcachefs: Resume logged ops after fsck
      bcachefs: Flush journal immediately after replay if we did early repair
      bcachefs: Ensure bch_sb_field_ext always exists
      bcachefs: bch2_run_explicit_recovery_pass_persistent()
      bcachefs: Improve -o norecovery; opts.recovery_pass_limit
      bcachefs: Logged op errors should be ignored
      bcachefs: Fix remove_dirent()
      bcachefs: Fix overlapping extent repair
      bcachefs: On emergency shutdown, print out current journal sequence number
      bcachefs: Fix btree node reserve
      bcachefs: BCH_WATERMARK_interior_updates
      bcachefs: fix nocow lock deadlock
      bcachefs: Improve bch2_btree_update_to_text()
      bcachefs: Check for bad needs_discard before doing discard
      bcachefs: ratelimit informational fsck errors
      bcachefs: Clear recovery_passes_required as they complete without errors
      bcachefs: bch2_shoot_down_journal_keys()
      bcachefs: Etyzinger cleanups
      bcachefs: bch2_btree_root_alloc() -> bch2_btree_root_alloc_fake()
      bcachefs: Don't skip fake btree roots in fsck
      bcachefs: Repair pass for scanning for btree nodes
      bcachefs: Topology repair now uses nodes found by scanning to fill holes
      bcachefs: Flag btrees with missing data
      bcachefs: Reconstruct missing snapshot nodes
      bcachefs: Check for extents that point to same space
      bcachefs: Subvolume reconstruction
      bcachefs: reconstruct_inode()
      aio: Fix null ptr deref in aio_complete() wakeup

Krzysztof Kozlowski (11):
      docs: dt-bindings: add missing address/size-cells to example
      dt-bindings: ufs: qcom: document SC8180X UFS
      dt-bindings: ufs: qcom: document SC7180 UFS
      dt-bindings: ufs: qcom: document SM6125 UFS
      ptp: MAINTAINERS: drop Jeff Sipek
      ata: pata_macio: drop driver owner assignment
      dt-bindings: clock: keystone: remove unstable remark
      dt-bindings: clock: ti: remove unstable remark
      dt-bindings: remoteproc: ti,davinci: remove unstable remark
      dt-bindings: soc: fsl: narrow regex for unit address to hex numbers
      dt-bindings: timer: narrow regex for unit address to hex numbers

Kuniyuki Iwashima (9):
      tcp: Fix bind() regression for v6-only wildcard and v4-mapped-v6 non-wildcard addresses.
      tcp: Fix bind() regression for v6-only wildcard and v4(-mapped-v6) non-wildcard addresses.
      selftest: tcp: Make bind() selftest flexible.
      selftest: tcp: Define the reverse order bind() tests explicitly.
      selftest: tcp: Add v4-v4 and v6-v6 bind() conflict tests.
      selftest: tcp: Add more bind() calls.
      selftest: tcp: Add bind() tests for IPV6_V6ONLY.
      selftest: tcp: Add bind() tests for SO_REUSEADDR/SO_REUSEPORT.
      ipv6: Fix infinite recursion in fib6_dump_done().

Li Nan (2):
      scsi: sd: Unregister device if device_add_disk() failed in sd_probe()
      block: fix overflow in blk_ioctl_discard()

Linus Torvalds (1):
      Linux 6.9-rc3

Luiz Augusto von Dentz (1):
      Bluetooth: hci_sync: Fix not checking error on hci_cmd_sync_cancel_sync

Lukasz Majewski (1):
      net: hsr: Use full string description when opening HSR network device

Luke D. Jones (1):
      ALSA: hda/realtek: cs35l41: Support ASUS ROG G634JYR

Mahmoud Adam (1):
      net/rds: fix possible cp null dereference

Marc Zyngier (2):
      arm64: Fix early handling of FEAT_E2H0 not being implemented
      KVM: arm64: Rationalise KVM banner output

Marco Pinna (1):
      vsock/virtio: fix packet delivery to tap device

Mark Brown (1):
      arm64/ptrace: Use saved floating point state type to determine SVE layout

Masahiro Yamada (2):
      riscv: compat_vdso: install compat_vdso.so.dbg to /lib/modules/*/vdso/
      riscv: compat_vdso: align VDSOAS build log

Matthew Brost (1):
      drm/xe: Use ordered wq for preempt fence waiting

Michael Krummsdorf (1):
      net: dsa: mv88e6xxx: fix usable ports on 88e6020

Namjae Jeon (3):
      ksmbd: don't send oplock break if rename fails
      ksmbd: validate payload size in ipc response
      ksmbd: do not set SMB2_GLOBAL_CAP_ENCRYPTION for SMB 3.1.1

Natanael Copa (1):
      tools/resolve_btfids: fix build with musl libc

Nikita Kiryushin (1):
      tg3: Remove residual error handling in tg3_suspend

Nikita Travkin (2):
      thermal: gov_power_allocator: Allow binding without cooling devices
      thermal: gov_power_allocator: Allow binding without trip points

Oleksandr Natalenko (1):
      drm/display: fix typo

Oliver Upton (1):
      KVM: arm64: Fix host-programmed guest events in nVHE

Oswald Buddenhagen (1):
      Revert "ALSA: emu10k1: fix synthesizer sample playback position
and caching"

Pablo Neira Ayuso (5):
      netfilter: nf_tables: release batch on table validation from abort path
      netfilter: nf_tables: release mutex after nft_gc_seq_end from abort path
      netfilter: nf_tables: flush pending destroy work before exit_net release
      netfilter: nf_tables: reject new basechain after table flag update
      netfilter: nf_tables: discard table flag update with pending
basechain deletion

Paolo Abeni (2):
      mptcp: prevent BPF accessing lowat from a subflow socket.
      Revert "tg3: Remove residual error handling in tg3_suspend"

Paolo Bonzini (3):
      KVM: SEV: fix compat ABI for KVM_MEMORY_ENCRYPT_OP
      Documentation: kvm/sev: separate description of firmware
      Documentation: kvm/sev: clarify usage of KVM_MEMORY_ENCRYPT_OP

Paul Barker (2):
      net: ravb: Always process TX descriptor ring
      net: ravb: Always update error counters

Paulo Alcantara (14):
      smb: client: fix UAF in smb2_reconnect_server()
      smb: client: guarantee refcounted children from parent session
      smb: client: refresh referral without acquiring refpath_lock
      smb: client: handle DFS tcons in cifs_construct_tcon()
      smb: client: serialise cifs_construct_tcon() with cifs_mount_mutex
      smb: client: fix potential UAF in cifs_debug_files_proc_show()
      smb: client: fix potential UAF in cifs_dump_full_key()
      smb: client: fix potential UAF in cifs_stats_proc_write()
      smb: client: fix potential UAF in cifs_stats_proc_show()
      smb: client: fix potential UAF in smb2_is_valid_lease_break()
      smb: client: fix potential UAF in smb2_is_valid_oplock_break()
      smb: client: fix potential UAF in is_valid_oplock_break()
      smb: client: fix potential UAF in smb2_is_network_name_deleted()
      smb: client: fix potential UAF in cifs_signal_cifsd_for_reconnect()

Peter Collingbourne (1):
      stackdepot: rename pool_index to pool_index_plus_1

Peter Ujfalusi (19):
      ASoC: SOF: Add dsp_max_burst_size_in_ms member to snd_sof_pcm_stream
      ASoC: SOF: ipc4-topology: Save the DMA maximum burst size for PCMs
      ASoC: SOF: Intel: hda-pcm: Use dsp_max_burst_size_in_ms to place
constraint
      ASoC: SOF: Intel: hda: Implement get_stream_position (Linear
Link Position)
      ASoC: SOF: Intel: mtl/lnl: Use the generic get_stream_position callback
      ASoC: SOF: Introduce a new callback pair to be used for PCM
delay reporting
      ASoC: SOF: Intel: Set the dai/host get frame/byte counter callbacks
      ASoC: SOF: ipc4-pcm: Use the snd_sof_pcm_get_dai_frame_counter()
for pcm_delay
      ASoC: SOF: Intel: hda-common-ops: Do not set the
get_stream_position callback
      ASoC: SOF: Remove the get_stream_position callback
      ASoC: SOF: ipc4-pcm: Move struct sof_ipc4_timestamp_info
definition locally
      ASoC: SOF: ipc4-pcm: Combine the SOF_IPC4_PIPE_PAUSED cases in pcm_trigger
      ASoC: SOF: ipc4-pcm: Invalidate the stream_start_offset in PAUSED state
      ASoC: SOF: sof-pcm: Add pointer callback to sof_ipc_pcm_ops
      ASoC: SOF: ipc4-pcm: Correct the delay calculation
      ALSA: hda: Add pplcllpl/u members to hdac_ext_stream
      ASoC: SOF: Intel: hda: Compensate LLP in case it is not reset
      ASoC: SOF: Intel: hda-dsp: Skip IMR boot on ACE platforms in
case of S3 suspend
      ASoC: SOF: Intel: lnl: Disable DMIC/SSP offload on remove

Peter Wang (2):
      scsi: ufs: core: WLUN suspend dev/link state error recovery
      scsi: ufs: core: Fix MCQ mode dev command timeout

Petr Oros (1):
      ice: fix enabling RX VLAN filtering

Phil Elwell (1):
      net: bcmgenet: Reset RBUF on first open

Pierre-Louis Bossart (6):
      ASoC: rt5682-sdw: fix locking sequence
      ASoC: rt711-sdca: fix locking sequence
      ASoC: rt711-sdw: fix locking sequence
      ASoC: rt712-sdca-sdw: fix locking sequence
      ASoC: rt722-sdca-sdw: fix locking sequence
      ASoC: rt-sdw*: add __func__ to all error logs

Piotr Wejman (1):
      net: stmmac: fix rx queue priority assignment

Pu Lehui (1):
      drivers/perf: riscv: Disable PERF_SAMPLE_BRANCH_* while not supported

Rander Wang (1):
      ASoC: SOF: mtrace: rework mtrace timestamp setting

Randy Dunlap (7):
      9p/trans_fd: remove Excess kernel-doc comment
      time/timecounter: Fix inline documentation
      time/timekeeping: Fix kernel-doc warnings and typos
      timers: Fix kernel-doc format and add Return values
      tick/sched: Fix various kernel-doc warnings
      tick/sched: Fix struct tick_sched doc warnings
      timers: Fix text inconsistencies and spelling

Reinette Chatre (1):
      x86/resctrl: Fix uninitialized memory read when last CPU of
domain goes offline

Richard Fitzgerald (3):
      ASoC: wm_adsp: Fix missing mutex_lock in wm_adsp_write_ctl()
      regmap: maple: Fix cache corruption in regcache_maple_drop()
      regmap: maple: Fix uninitialized symbol 'ret' warnings

Ritvik Budhiraja (1):
      smb3: retrying on failed server close

Rob Clark (1):
      drm/prime: Unbreak virtgpu dma-buf export

Rob Herring (1):
      MAINTAINERS: Add TPM DT bindings to TPM maintainers

Roberto Sassu (1):
      security: Place security_path_post_mknod() where the original IMA call was

Sami Tolvanen (1):
      riscv: Mark __se_sys_* functions __used

Samuel Holland (2):
      riscv: mm: Fix prototype to avoid discarding const
      riscv: Fix spurious errors from __get/put_kernel_nofault

Sean Christopherson (5):
      KVM: SVM: Set sev->asid in sev_asid_new() instead of overloading
the return
      KVM: SVM: Use unsigned integers when dealing with ASIDs
      KVM: SVM: Return -EINVAL instead of -EBUSY on attempt to re-init
SEV/SEV-ES
      KVM: selftests: Fix __GUEST_ASSERT() format warnings in ARM's
arch timer test
      x86/cpufeatures: Add CPUID_LNX_5 to track recently added
Linux-defined word

Sergey Shtylyov (1):
      of: module: prevent NULL pointer dereference in vsnprintf()

Simon Trimmer (3):
      ASoC: cs-amp-lib: Check for no firmware controls when writing calibration
      ALSA: hda: cs35l56: Add ACPI device match tables
      ALSA: hda/realtek: Add quirks for ASUS Laptops using CS35L56

Stefan O'Rear (1):
      riscv: process: Fix kernel gp leakage

Stephen Horvath (1):
      ACPI: thermal: Register thermal zones without valid trip points

Stephen Lee (1):
      ASoC: ops: Fix wraparound for mask in snd_soc_get_volsw

Su Hui (1):
      octeontx2-pf: check negative error code in otx2_open()

Sumanth Korikkar (1):
      s390/entry: align system call table on 8 bytes

Takashi Iwai (1):
      ALSA: line6: Zero-initialize message buffers

Tariq Toukan (1):
      MAINTAINERS: mlx5: Add Tariq Toukan

Thomas Bertschinger (1):
      bcachefs: fix misplaced newline in __bch2_inode_unpacked_to_text()

Thomas Hellström (4):
      drm/xe: Use ring ops TLB invalidation for rebinds
      drm/xe: Rework rebinding
      drm/xe: Make TLB invalidation fences unordered
      drm/xe: Move vma rebinding to the drm_exec locking loop

Thomas Richter (1):
      s390/pai: fix sampling event removal for PMU device driver

Uladzislau Rezki (Sony) (2):
      mm: vmalloc: bail out early in find_vmap_area() if vmap is not init
      mm: vmalloc: fix lockdep warning

Uros Bizjak (1):
      x86/bpf: Fix IP after emitting call depth accounting

Uwe Kleine-König (2):
      pwm: Fix setting period with #pwm-cells = <1> and of_pwm_single_xlate()
      OSS: dmasound/paula: Mark driver struct with __refdata to
prevent section mismatch

Victor Isaev (1):
      RISC-V: Update AT_VECTOR_SIZE_ARCH for new AT_MINSIGSTKSZ

Vijendar Mukunda (3):
      ASoC: amd: acp: fix for acp pdm configuration check
      ASoC: amd: acp: fix for acp_init function error handling
      ASoC: SOF: amd: fix for false dsp interrupts

Ville Syrjälä (2):
      drm/i915/mst: Limit MST+DSC to TGL+
      drm/i915/mst: Reject FEC+MST on ICL

Vincent Guittot (1):
      PM: EM: fix wrong utilization estimation in em_cpu_energy()

Vitaly Chikunov (1):
      tracing: Fix documentation on tp_printk cmdline option

Vitaly Kuznetsov (3):
      KVM: x86: Introduce __kvm_get_hypervisor_cpuid() helper
      KVM: x86: Use actual kvm_cpuid.base for clearing KVM_FEATURE_PV_UNHALT
      KVM: selftests: Check that PV_UNHALT is cleared when HLT exiting
is disabled

Vitaly Lifshits (2):
      e1000e: Workaround for sporadic MDI error on Meteor Lake systems
      e1000e: move force SMBUS from enable ulp function to avoid PHY loss issue

Vladimir Isaev (1):
      riscv: hwprobe: do not produce frtace relocation

Wei Fang (1):
      net: fec: Set mac_managed_pm during probe

Weiji Wang (1):
      docs: zswap: fix shell command format

Will Deacon (4):
      KVM: arm64: Don't defer TLB invalidation when zapping table entries
      KVM: arm64: Don't pass a TLBI level hint when zapping table entries
      KVM: arm64: Use TLBI_TTL_UNKNOWN in __kvm_tlb_flush_vmid_range()
      KVM: arm64: Ensure target address is granule-aligned for range TLBI

William Tu (1):
      Documentation: Add documentation for eswitch attribute

Wujie Duan (1):
      KVM: arm64: Fix out-of-IPA space translation fault handling

Xiaoyao Li (2):
      x86/kvm: Use separate percpu variable to track the enabling of asyncpf
      KVM: x86: Improve documentation of MSR_KVM_ASYNC_PF_EN

Yihang Li (1):
      scsi: libsas: Align SMP request allocation to ARCH_DMA_MINALIGN

Zhang Yi (4):
      ASoC: codecs: ES8326: Solve error interruption issue
      ASoC: codecs: ES8326: modify clock table
      ASoC: codecs: ES8326: Solve a headphone detection issue after
suspend and resume
      ASoC: codecs: ES8326: Removing the control of ADC_SCALE

Ziyang Xuan (1):
      netfilter: nf_tables: Fix potential data-race in
__nft_flowtable_type_get()

zhuxiaohui (1):
      bcachefs: add REQ_SYNC and REQ_IDLE in write dio

^ permalink raw reply	[relevance 41%]

* Re: More annoying code generation by clang
  2024-04-06 15:39 99%     ` Linus Torvalds
@ 2024-04-06 16:04 87%       ` Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-04-06 16:04 UTC (permalink / raw)
  To: Uros Bizjak
  Cc: Ingo Molnar, Nick Desaulniers, Nathan Chancellor,
	Thomas Gleixner, Peter Anvin, the arch/x86 maintainers,
	Linux Kernel Mailing List

On Sat, 6 Apr 2024 at 08:39, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> Because this code actually requires a data-depencency and not a
> control dependency as a correctness issue because of Spectre-v1.

Just to clarify: our comments in this code are maybe a bit odd,
because our comments are not about the Spectre-v1 issue (which
predates a rewrite) and more about the odd RCU pattern and conditional
avoidance we use here:

        unsigned long nospec_mask;

        /* Mask is a 0 for invalid fd's, ~0 for valid ones */
        nospec_mask = array_index_mask_nospec(fd, fdt->max_fds);

        /*
         * fdentry points to the 'fd' offset, or fdt->fd[0].
         * Loading from fdt->fd[0] is always safe, because the
         * array always exists.
         */
        fdentry = fdt->fd + (fd & nospec_mask);

        /* Do the load, then mask any invalid result */
        file = rcu_dereference_raw(*fdentry);

where *normally* (if RCU wasn't an issue) we'd just write this as

        file = fdt->fd[array_index_nospec(fd, fdt->max_fds)];

where the key part is that "nospec" array indexing that will not
speculatively access the array past the "max_fds".

IOW, the code naively would want to do just

        if (fd < fdt->max_fds) {
                 file = fdt->fd[fd];
                ...

but we need to make sure that it can't be fooled into using a branch
mispredict and use a user-controlled index ("fd") to speculatively
access the array with an arbitrary index and then leak unrelated data
through some side channel (mostly cache access).

And while the normal pattern doesn't expose the mask generation and
just hides that mask in that simpler "array_index_nospec()" macro,
this code actually ends up using the same mask *twice*, because it
will later end up doing this hack:

        file = (void *)(nospec_mask & (unsigned long)file);
        if (unlikely(!file))
                return NULL;

to have just one single conditional at the end (ie we may have loaded
a non-NULL file pointer from fdt->fd[0] because an invalid index got
masked down to a zero index, and the second masking will mask away
that pointer and make it NULL because we're bad people and we know
that NULL is "bitpattern 0" and we care about the code working, not
about some unreal "NULL could be anything else" thing).

End result: this code, which is just a few lines long, has more
comments than code, and generates only a handful of instructions, is
fairly subtle but also fairly important, both for hardware security
issues and for performance.
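
As an illustration of the "use the mask twice" pattern described above,
here is a minimal user-space sketch. The names toy_fdtable,
toy_index_mask() and toy_lookup() are hypothetical and are not kernel
API. The mask arithmetic follows the generic C fallback in
linux/nospec.h, and the sketch assumes that unsigned long is
pointer-sized and that right-shifting a negative long is an arithmetic
shift (true for gcc and clang). Unlike the x86 inline asm, plain C gives
no guarantee that the compiler keeps this branch-free, and RCU is left
out entirely.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's fd table, not struct fdtable. */
struct toy_fdtable {
	unsigned int max_fds;	/* number of valid slots, must be >= 1 */
	void **fd;		/* fd[0] must always exist */
};

/*
 * Returns ~0UL when index < size and 0 otherwise, without a comparison
 * in the source. Assumes arithmetic right shift of negative values.
 */
static unsigned long toy_index_mask(unsigned long index, unsigned long size)
{
	return ~(long)(index | (size - 1UL - index)) >>
		(sizeof(long) * CHAR_BIT - 1);
}

static void *toy_lookup(const struct toy_fdtable *fdt, unsigned int fd)
{
	/* First use of the mask: clamp an invalid index down to slot 0. */
	unsigned long mask = toy_index_mask(fd, fdt->max_fds);
	void **entry = fdt->fd + (fd & mask);

	/* Loading slot 0 is always safe because the array always exists. */
	void *file = *entry;

	/* Second use: turn whatever slot 0 held into NULL for a bad fd. */
	return (void *)(mask & (unsigned long)file);
}
```

The caller then needs only one test for NULL, which covers both an
out-of-range fd and an empty slot.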

See commit 253ca8678d30 ("Improve __fget_files_rcu() code generation
(and thus __fget_light())") that actually started doing this "use mask
twice", and realize that that commit is what this performance
regression report is talking about:

    https://lore.kernel.org/all/ZWQ+LEcfFFi4YOAU@xsang-OptiPlex-9020/

ie that whole "use masks and avoid doing the obvious thing" may be a
bit subtle, but it's what turned a 2.9% performance regression into a
3.4% improvement.

(Ok, those performance numbers are on just one random microbenchmark
and don't really matter, so take that with a pinch of salt, but if you
care about a _lot_ of random benchmarks, eventually you get good
performance overall).

Anyway, hopefully that explains the dual issue here: we care about
performance, but we also have to use a specific instruction pattern,
and can't just hope for the best.

                    Linus

^ permalink raw reply	[relevance 87%]

* Re: More annoying code generation by clang
  @ 2024-04-06 15:39 99%     ` Linus Torvalds
  2024-04-06 16:04 87%       ` Linus Torvalds
  0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-04-06 15:39 UTC (permalink / raw)
  To: Uros Bizjak
  Cc: Ingo Molnar, Nick Desaulniers, Nathan Chancellor,
	Thomas Gleixner, Peter Anvin, the arch/x86 maintainers,
	Linux Kernel Mailing List

On Sat, 6 Apr 2024 at 05:30, Uros Bizjak <ubizjak@gmail.com> wrote:
>
> FYI, please note that gcc-12 is able to synthesize carry-flag compares
> on its own:

Oh, gcc has been able to do that for much longer than that. It's an
idiomatic i386 pattern, and gcc has generated it for as long as I can
remember.

HOWEVER.

There's a big difference between "able to" and "GUARANTEED to".

Because this code actually requires a data-dependency and not a
control dependency as a correctness issue because of Spectre-v1.

So while I know very well that gcc _can_ do it, I also know very well
that there are absolutely no guarantees that gcc won't use a
conditional branch instead.

So this code needs to generate good code because it's actually
important code that shows up in benchmarks, but this code also needs
to generate a very _particular_ pattern of code, and it's not good
enough that gcc may "happen" to generate that pattern of code.

Thus the inline asm.

               Linus

^ permalink raw reply	[relevance 99%]

* More annoying code generation by clang
@ 2024-04-04 22:53 81% Linus Torvalds
      0 siblings, 2 replies; 200+ results
From: Linus Torvalds @ 2024-04-04 22:53 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner, Peter Anvin
  Cc: the arch/x86 maintainers, Linux Kernel Mailing List

[-- Attachment #1: Type: text/plain, Size: 3725 bytes --]

So this doesn't really matter in any real life situation, but it
really grated on me.

Clang has this nasty habit of taking our nice asm constraints, and
turning them into worst-case garbage. It's been reported a couple of
times where we use "g" to tell the compiler that pretty much any
source to the asm works, and then clang takes that to mean "I will
take that to use 'memory'" even when that makes no sense what-so-ever.

See for example

    https://lore.kernel.org/all/CAHk-=wgobnShg4c2yyMbk2p=U-wmnOmX_0=b3ZY_479Jjey2xw@mail.gmail.com/

where I was ranting about clang just doing pointlessly stupid things.

However, I found a case where yes, clang does pointlessly stupid
things, but it's at least _partly_ our fault, and gcc can't generate
optimal code either.

We have this fairly critical code in __fget_files_rcu() to look up a
'struct file *' from an fd, and it does this:

                /* Mask is a 0 for invalid fd's, ~0 for valid ones */
                nospec_mask = array_index_mask_nospec(fd, fdt->max_fds);

and clang makes a *horrid* mess of it, generating this code:

        movl    %edi, %r14d
        movq    32(%rbx), %rdx
        movl    (%rdx), %eax
        movq    %rax, 8(%rsp)
        cmpq    8(%rsp), %r14
        sbbq    %rcx, %rcx

which is just crazy. Notice how it does that "move rax to stack, then
do the compare against the stack", instead of just using %rax.

In fact, that function shouldn't have a stack frame at all, and the
only reason it is generated is because of this whole oddity.

All clang's fault, right?

Yeah, mostly. But it turns out that what really messes with clang's
little head is that the x86 array_index_mask_nospec() function is
being a bit annoying.

This is what we do:

  static __always_inline unsigned long
  array_index_mask_nospec(unsigned long index, unsigned long size)
  {
        unsigned long mask;

        asm volatile ("cmp %1,%2; sbb %0,%0;"
                        :"=r" (mask)
                        :"g"(size),"r" (index)
                        :"cc");
        return mask;
  }

and look at the use again:

        nospec_mask = array_index_mask_nospec(fd, fdt->max_fds);

here all the values are actually 'unsigned int'. So what happens is
that clang can't just use the fdt->max_fds value *directly* from
memory, because it needs to be expanded from 32-bit to 64-bit because
we've made our array_index_mask_nospec() function only work on 64-bit
'unsigned long' values.

So it turns out that by massaging this a bit, and making it just be a
macro - so that the asm can decide that "I can do this in 32-bit" - I
can get clang to generate much better code.
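
A minimal sketch of why the macro form helps, assuming GNU C (typeof and
statement expressions, as accepted by gcc and clang). MASK_WIDTH_BITS()
is a hypothetical helper used only for illustration. It declares a
temporary the same way the macro version does, with the type of
(idx)+(sz), and reports how wide that temporary is. With two 'unsigned
int' arguments the temporary stays 32-bit, so nothing forces a widening
to 64 bits.

```c
#include <limits.h>
#include <stddef.h>

/*
 * Hypothetical helper: width in bits of the temporary that a
 * typeof((idx)+(sz)) declaration produces for the given arguments.
 */
#define MASK_WIDTH_BITS(idx, sz) ({				\
	__typeof__((idx) + (sz)) tmp_ = (idx);			\
	(unsigned int)(sizeof(tmp_) * CHAR_BIT); })
```

A function taking 'unsigned long' parameters would instead convert every
argument to the width of unsigned long before the asm ever sees it.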

Clang still absolutely hates the "g" constraint, so to get clang to
really get this right I have to use "ir" instead of "g". Which is
wrong. Because gcc does this right, and could use the memory op
directly. But even gcc cannot do that with our *current* function,
because of that "the memory value is 32-bit, we require a 64-bit
value" mismatch.

Anyway, I can get gcc to generate the right code:

        movq    32(%r13), %rdx
        cmp (%rdx),%ebx
        sbb %esi,%esi

which is basically the right code for the six crazy instructions clang
generates. And if I make the "g" be "ir", I can get clang to generate

        movq    32(%rdi), %rcx
        movl    (%rcx), %eax
        cmpl    %eax, %esi
        sbbl    %esi, %esi

which is the same thing, but with that (pointless) load to a register.

And now clang doesn't generate that stack frame at all.

Anyway, this was a long email to explain the odd attached patch.

Comments? Note that this patch is *entirely* untested, I have done
this purely by looking at the code generation in fs/file.c.

                Linus

[-- Attachment #2: patch.diff --]
[-- Type: text/x-patch, Size: 1122 bytes --]

 arch/x86/include/asm/barrier.h | 23 +++++++++--------------
 1 file changed, 9 insertions(+), 14 deletions(-)

diff --git a/arch/x86/include/asm/barrier.h b/arch/x86/include/asm/barrier.h
index 66e57c010392..6159d2cbbfde 100644
--- a/arch/x86/include/asm/barrier.h
+++ b/arch/x86/include/asm/barrier.h
@@ -33,20 +33,15 @@
  * Returns:
  *     0 - (index < size)
  */
-static __always_inline unsigned long array_index_mask_nospec(unsigned long index,
-		unsigned long size)
-{
-	unsigned long mask;
-
-	asm volatile ("cmp %1,%2; sbb %0,%0;"
-			:"=r" (mask)
-			:"g"(size),"r" (index)
-			:"cc");
-	return mask;
-}
-
-/* Override the default implementation from linux/nospec.h. */
-#define array_index_mask_nospec array_index_mask_nospec
+#define array_index_mask_nospec(idx,sz) ({	\
+	typeof((idx)+(sz)) __idx = (idx);	\
+	typeof(__idx) __sz = (sz);		\
+	typeof(__idx) __mask;			\
+	asm volatile ("cmp %1,%2; sbb %0,%0"	\
+			:"=r" (__mask)		\
+			:"ir"(__sz),"r" (__idx)	\
+			:"cc");			\
+	__mask; })
 
 /* Prevent speculative execution past this barrier. */
 #define barrier_nospec() asm volatile("lfence":::"memory")

^ permalink raw reply related	[relevance 81%]

* Re: user-space concurrent pipe buffer scheduler interactions
  @ 2024-04-03 20:57 99%     ` Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-04-03 20:57 UTC (permalink / raw)
  To: Michael Clark; +Cc: Jens Axboe, Ingo Molnar, Peter Zijlstra, linux-kernel

On Wed, 3 Apr 2024 at 13:52, Michael Clark <michael@metaparadigm.com> wrote:
>
> On 4/4/24 05:56, Linus Torvalds wrote:
> > On Tue, 2 Apr 2024 at 13:54, Michael Clark <michael@metaparadigm.com> wrote:
> >>
> >> I am working on a low latency cross-platform concurrent pipe buffer
> >> using C11 threads and atomics.
> >
> > You will never get good performance doing spinlocks in user space
> > unless you actually tell the scheduler about the spinlocks, and have
> > some way to actually sleep on contention.
> >
> > Which I don't see you as having.
>
> We can work on this.

It's been tried.

Nobody ever found a use-case that is sufficiently convincing, but see
the write-up at

   https://lwn.net/Articles/944895/

for a pointer to at least attempts.

               Linus

^ permalink raw reply	[relevance 99%]

* Re: [PATCH] x86/retpoline: Fix a missing return thunk warning (was: Re: [linus:master] [x86/bugs] 4535e1a417: WARNING:at_arch/x86/kernel/alternative.c:#apply_returns)
  @ 2024-04-03 17:13 99%       ` Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-04-03 17:13 UTC (permalink / raw)
  To: Borislav Petkov; +Cc: kernel test robot, oe-lkp, lkp, linux-kernel, Ingo Molnar

On Wed, 3 Apr 2024 at 10:05, Borislav Petkov <bp@alien8.de> wrote:
>
> Can you pls replace it with the below one?

Ok, done.

              Linus

^ permalink raw reply	[relevance 99%]

* Re: [RESEND][PATCH v3] security: Place security_path_post_mknod() where the original IMA call was
  @ 2024-04-03 16:59 99% ` Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-04-03 16:59 UTC (permalink / raw)
  To: Roberto Sassu
  Cc: viro, brauner, jack, paul, jmorris, serge, zohar, linux-fsdevel,
	linux-kernel, linux-security-module, linux-cifs, linux-integrity,
	pc, Roberto Sassu, Steve French

On Wed, 3 Apr 2024 at 02:10, Roberto Sassu
<roberto.sassu@huaweicloud.com> wrote:
>
> Move security_path_post_mknod() where the ima_post_path_mknod() call was,
> which is obviously correct from IMA/EVM perspective. IMA/EVM are the only
> in-kernel users, and only need to inspect regular files.

Thanks, applied,

              Linus

^ permalink raw reply	[relevance 99%]

* Re: user-space concurrent pipe buffer scheduler interactions
  @ 2024-04-03 16:56 99% ` Linus Torvalds
    0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-04-03 16:56 UTC (permalink / raw)
  To: Michael Clark; +Cc: Jens Axboe, Ingo Molnar, Peter Zijlstra, linux-kernel

On Tue, 2 Apr 2024 at 13:54, Michael Clark <michael@metaparadigm.com> wrote:
>
> I am working on a low latency cross-platform concurrent pipe buffer
> using C11 threads and atomics.

You will never get good performance doing spinlocks in user space
unless you actually tell the scheduler about the spinlocks, and have
some way to actually sleep on contention.

Which I don't see you as having.
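
One way to sleep on contention on Linux is the futex(2) system call. The
sketch below is an assumption about what such a lock could look like,
not code from this thread: toy_lock and its helpers are hypothetical
names, and the state machine is the well-known three-state futex mutex
(0 unlocked, 1 locked, 2 locked with possible waiters). It is
Linux-only, so it does not address the cross-platform goal, and it
assumes atomic_int has the same size and representation as int.

```c
#define _GNU_SOURCE
#include <linux/futex.h>
#include <stdatomic.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* 0 = unlocked, 1 = locked, 2 = locked and someone may be sleeping. */
struct toy_lock {
	atomic_int state;
};

/* Thin wrapper; assumes atomic_int can be passed as a plain int word. */
static long toy_futex(atomic_int *addr, int op, int val)
{
	return syscall(SYS_futex, (int *)addr, op, val, NULL, NULL, 0);
}

static void toy_lock_acquire(struct toy_lock *l)
{
	int c = 0;

	/* Fast path: uncontended, no system call at all. */
	if (atomic_compare_exchange_strong(&l->state, &c, 1))
		return;

	/* Contended: mark the lock as having waiters, then sleep. */
	if (c != 2)
		c = atomic_exchange(&l->state, 2);
	while (c != 0) {
		/* The kernel only sleeps if the word still reads 2. */
		toy_futex(&l->state, FUTEX_WAIT_PRIVATE, 2);
		c = atomic_exchange(&l->state, 2);
	}
}

static void toy_lock_release(struct toy_lock *l)
{
	/* Only enter the kernel if a waiter may be sleeping. */
	if (atomic_fetch_sub(&l->state, 1) != 1) {
		atomic_store(&l->state, 0);
		toy_futex(&l->state, FUTEX_WAKE_PRIVATE, 1);
	}
}
```

The point of the design is that a waiter is put to sleep by the kernel
instead of burning its time slice while the lock holder is descheduled.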

               Linus

^ permalink raw reply	[relevance 99%]

* Re: [PATCH] x86/retpoline: Fix a missing return thunk warning (was: Re: [linus:master] [x86/bugs] 4535e1a417: WARNING:at_arch/x86/kernel/alternative.c:#apply_returns)
  @ 2024-04-03 16:45 99%   ` Linus Torvalds
    0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-04-03 16:45 UTC (permalink / raw)
  To: Borislav Petkov; +Cc: kernel test robot, oe-lkp, lkp, linux-kernel, Ingo Molnar

On Wed, 3 Apr 2024 at 05:24, Borislav Petkov <bp@alien8.de> wrote:
>
> Subject: [PATCH] x86/retpoline: Fix a missing return thunk warning

Thanks, applied directly,

                Linus

^ permalink raw reply	[relevance 99%]

* Re: [GIT PULL] security changes for v6.9-rc3
  @ 2024-04-02 21:35 99%       ` Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-04-02 21:35 UTC (permalink / raw)
  To: Al Viro
  Cc: Roberto Sassu, linux-integrity, linux-security-module,
	linux-fsdevel, linux-cifs, linux-kernel, Roberto Sassu

On Tue, 2 Apr 2024 at 14:00, Al Viro <viro@zeniv.linux.org.uk> wrote:
>
>         1) location of that hook is wrong.  It's really "how do we catch
> file creation that does not come through open() - yes, you can use
> mknod(2) for that".  It should've been after the call of vfs_create(),
> not the entire switch.  LSM folks have a disturbing fondness of inserting
> hooks in various places, but IMO this one has no business being where
> they'd placed it.  Bikeshedding regarding the name/arguments/etc. for
> that thing is, IMO, not interesting...

Hmm. I guess that's right - for a non-file node, there's nothing that
the security layer can really check after-the-fact anyway.

It's not like you can attest the contents of a character device or whatever...

>         2) the only ->mknod() instance in the tree that tries to leave
> dentry unhashed negative on success is CIFS (and only one case in it).
> From conversation with CIFS folks it's actually cheaper to instantiate
> in that case as well - leaving instantiation to the next lookup will
> cost several extra roundtrips for no good reason.

Ack.

>         3) documentation (in vfs.rst) is way too vague.  The actual
> rules are
>         * ->create() must instantiate on success
>         * ->mkdir() is allowed to return unhashed negative on success and
> it might be forced to do so in some cases.  If a caller of vfs_mkdir()
> wants the damn thing positive, it should account for such possibility and do
> a lookup.  Normal callers don't care; see e.g. nfsd and overlayfs for example
> of those that do.
>         * ->mknod() is interesting - historically it had been "may leave
> unhashed negative", but e.g. unix_bind() expected that it won't do so;
> the reason it didn't blow up for CIFS is that this case (SFU) of their mknod()
> does not support FIFOs and sockets anyway.  Considering how few instances
> try to make use of that option and how it doesn't actually save them
> anything, I would prefer to declare that ->mknod() should act as ->create().
>         * ->symlink() - not sure; there are instances that make use of that
> option (coda and hostfs).  OTOH, the only callers of vfs_symlink() that
> care either way are nfsd and overlayfs, and neither is usable with coda
> or hostfs...  Could go either way, but we need to say it clearly in the
> docs, whichever way we choose.

Fair enough.

Anyway, it does sound like maybe the minimal fix would be just that
"move it into the
                case 0: case S_IFREG:
path".

Although if somebody already has the cifs patch to just do the
d_instantiate() for mknod, that might be even better.

I will leave this in more competent hands for now.

Let the bike-shedding commence,

               Linus

^ permalink raw reply	[relevance 99%]

* Re: [GIT PULL] security changes for v6.9-rc3
  2024-04-02 19:39 92% ` Linus Torvalds
@ 2024-04-02 19:57 96%   ` Linus Torvalds
    0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-04-02 19:57 UTC (permalink / raw)
  To: Roberto Sassu
  Cc: linux-integrity, linux-security-module, linux-fsdevel,
	linux-cifs, linux-kernel, Roberto Sassu

On Tue, 2 Apr 2024 at 12:39, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>

>    void security_path_post_mknod(struct mnt_idmap *idmap, struct dentry *dentry)
>    {
>   -     if (unlikely(IS_PRIVATE(d_backing_inode(dentry))))
>   +     struct inode *inode = d_backing_inode(dentry);
>   +     if (unlikely(!inode || IS_PRIVATE(inode)))
>                 return;
>         call_void_hook(path_post_mknod, idmap, dentry);

Hmm. We do have other hooks that get called for this case.

For fsnotify_create() we actually have a comment about this:

 * fsnotify_create - 'name' was linked in
 *
 * Caller must make sure that dentry->d_name is stable.
 * Note: some filesystems (e.g. kernfs) leave @dentry negative and instantiate
 * ->d_inode later

and audit_inode_child() ends up having a

        if (inode)
                handle_one(inode);

in it.

So in other cases we do handle the NULL, but it does seem like the
other cases actually do validly want to deal with this (ie the
fsnotify case will say "the directory that mknod was done in was
changed" even if it doesn't know what the change is).

But for the security case, it really doesn't seem to make much sense
to check a mknod() that you don't know the result of.

I do wonder if that "!inode" test might also be more specific with
"d_unhashed(dentry)". But that would only make sense if we moved this
test from security_path_post_mknod() into the caller itself, ie we
could possibly do something like this instead (or in addition to):

  -     if (error)
  -             goto out2;
  -     security_path_post_mknod(idmap, dentry);
  +     if (!error && !d_unhashed(dentry))
  +             security_path_post_mknod(idmap, dentry);

which might also be sensible.
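
The two checks being weighed here can be modelled in user space. This is
a toy model only: toy_inode, toy_dentry, toy_path_post_mknod() and
toy_do_mknod_tail() are hypothetical stand-ins, not the kernel's
structures or functions, and the 'hashed' flag plays the role of
!d_unhashed(dentry).

```c
#include <stdbool.h>
#include <stddef.h>

struct toy_inode {
	bool is_private;
};

struct toy_dentry {
	struct toy_inode *inode;	/* NULL models a negative dentry */
	bool hashed;			/* false models d_unhashed() */
};

/* Counts how often the modelled LSM hook would actually run. */
static int toy_hook_calls;

/* Variant 1: the hook itself tolerates a missing inode. */
static void toy_path_post_mknod(const struct toy_dentry *d)
{
	const struct toy_inode *inode = d->inode;

	if (!inode || inode->is_private)
		return;
	toy_hook_calls++;
}

/* Variant 2: the caller skips the hook for errors and unhashed dentries. */
static void toy_do_mknod_tail(int error, const struct toy_dentry *d)
{
	if (!error && d->hashed)
		toy_path_post_mknod(d);
}
```

In this model either check alone keeps the hook from touching a dentry
that has no inode yet, and the two can be combined.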

Al? Anybody?

                Linus

^ permalink raw reply	[relevance 96%]

* Re: [GIT PULL] security changes for v6.9-rc3
  @ 2024-04-02 19:39 92% ` Linus Torvalds
  2024-04-02 19:57 96%   ` Linus Torvalds
  0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-04-02 19:39 UTC (permalink / raw)
  To: Roberto Sassu
  Cc: linux-integrity, linux-security-module, linux-fsdevel,
	linux-cifs, linux-kernel, Roberto Sassu

On Tue, 2 Apr 2024 at 07:12, Roberto Sassu
<roberto.sassu@huaweicloud.com> wrote:
> A single bug fix to address a kernel panic in the newly introduced function
> security_path_post_mknod.

So I've pulled from you before, but I still don't have a signature
chain for your key (not that I can even find the key itself, much less
a signature chain).

Last time I pulled, it was after having everybody else just verify the
actual commit.

This time, the commit looks like a valid "avoid NULL", but I have to
say that I also think the security layer code in question is ENTIRELY
WRONG.

IOW, as far as I can tell, the mknod() system call may indeed leave
the dentry unhashed, and rely on anybody who then wants to use the new
special file to just do a "lookup()" to actually use it.

HOWEVER.

That also means that the whole notion of "path_post_mknod()" is
complete and utter hogwash. There is not anything that the security
layer can possibly validly do.

End result: instead of checking the 'inode' for NULL, I think the
right fix is to remove that meaningless security hook. It cannot do
anything sane, since one option is always "the inode hasn't been
initialized yet".

Put another way: any security hook that checks inode in
security_path_post_mknod() seems simply buggy.

But if we really want to do this ("if mknod creates a positive dentry,
I won't see it in lookup, so I want to appraise it now"), then we
should just deal with this in the generic layer with some hack like
this:

  --- a/security/security.c
  +++ b/security/security.c
  @@ -1801,7 +1801,8 @@ EXPORT_SYMBOL(security_path_mknod);
    */
   void security_path_post_mknod(struct mnt_idmap *idmap, struct dentry *dentry)
   {
  -     if (unlikely(IS_PRIVATE(d_backing_inode(dentry))))
  +     struct inode *inode = d_backing_inode(dentry);
  +     if (unlikely(!inode || IS_PRIVATE(inode)))
                return;
        call_void_hook(path_post_mknod, idmap, dentry);
   }

and IMA and EVM would have to do any validation at lookup() time for
the cases where the dentry wasn't hashed by ->mknod.

Anyway, all of this is to say that I don't feel like I can pull this without
 (a) more acks by people
and
 (b) explanations for why the simpler fix to just
security_path_post_mknod() isn't the right fix.

                 Linus

^ permalink raw reply	[relevance 92%]

* Linux 6.9-rc2
@ 2024-03-31 22:05 43% Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-03-31 22:05 UTC (permalink / raw)
  To: Linux Kernel Mailing List

Neither snow nor rain nor heat nor gloom of night stays kernel rc releases.

Nor does Easter.

So here we are. Another week has passed, and rc2 is out. Nothing here
looks all that remarkable, and the fixes are fairly evenly spread out
(so mostly drivers, because that's the bulk of the code).

Outside of the driver fixes (see shortlog below for details), we've
got some more selftest work (mostly networking and bpf but also some
random fixes), some architecture fixes (mostly x86), some filesystem
work (xfs and btrfs) and random noise in other parts (mm, core kernel,
networking, Kbuild..).

Nothing stands out to me or looks unusual.

                 Linus

---

Alan Stern (3):
      USB: core: Fix deadlock in usb_deauthorize_interface()
      USB: core: Add hub_get() and hub_put() routines
      USB: core: Fix deadlock in port "disable" sysfs attribute

Alexander Stein (1):
      Revert "usb: phy: generic: Get the vbus supply"

Alexander Wetzel (1):
      scsi: sg: Avoid sg device teardown race

Alexandra Winter (1):
      s390/qeth: handle deferred cc1

Alexei Starovoitov (4):
      bpf: Clarify bpf_arena comments.
      libbpf, selftests/bpf: Adjust libbpf, bpftool, selftests to match LLVM
      selftests/bpf: Remove hard coded PAGE_SIZE macro.
      selftests/bpf: Add arena test case for 4Gbyte corner case

Anand Jain (2):
      btrfs: validate device maj:min during open
      btrfs: return accurate error code on open failure in open_fs_devices()

Andrei Matei (2):
      bpf: Check bloom filter map value size
      bpf: Protect against int overflow for stack access size

Andrew Price (1):
      gfs2: Fix invalid metadata access in punch_hole

Andrii Nakryiko (1):
      libbpf: fix u64-to-pointer cast on 32-bit arches

Andy Shevchenko (1):
      gpiolib: Fix debug messaging in gpiod_find_and_request()

Andy Yan (1):
      drm/rockchip: vop2: Remove AR30 and AB30 format support

Ard Biesheuvel (3):
      x86/efistub: Add missing boot_params for mixed mode compat entry
      efi/libstub: Cast away type warning in use of max()
      x86/efistub: Reinstate soft limit for initrd loading

Arnaldo Carvalho de Melo (1):
      libbpf: Define MFD_CLOEXEC if not available

Arnd Bergmann (6):
      staging: vc04_services: changen strncpy() to strscpy_pad()
      irqchip/armada-370-xp: Suppress unused-function warning
      ACPI: APEI: EINJ: mark remove callback as non-__exit
      ALSA: aoa: avoid false-positive format truncation warning
      dm integrity: fix out-of-range warning
      kbuild: make -Woverride-init warnings more consistent

Artem Savkov (1):
      arm64: bpf: fix 32bit unconditional bswap

Arınç ÜNAL (1):
      net: dsa: mt7530: fix improper frames on all 25MHz and 40MHz XTAL MT7530

Ayala Beker (1):
      wifi: mac80211: correctly set active links upon TTLM

Baoquan He (1):
      crash: use macro to add crashk_res into iomem early for specific arch

Barry Song (1):
      mm: zswap: fix kernel BUG in sg_init_one

Bartosz Golaszewski (1):
      gpio: cdev: sanitize the label before requesting the interrupt

Benjamin Berg (2):
      wifi: iwlwifi: mvm: guard against invalid STA ID on removal
      wifi: iwlwifi: mvm: include link ID when releasing frames

Bhanuprakash Modem (2):
      drm/i915/drrs: Refactor CPU transcoder DRRS check
      drm/i915/display/debugfs: Fix duplicate checks in i915_drrs_status

Bikash Hazarika (1):
      scsi: qla2xxx: Update manufacturer detail

Bjørn Mork (1):
      net: wwan: t7xx: Split 64bit accesses to fix alignment issues

Borislav Petkov (AMD) (3):
      x86/vdso: Fix rethunk patching for vdso-image-x32.o too
      x86/bugs: Fix the SRSO mitigation on Zen3/4
      kbuild: Disable KCSAN for autogenerated *.mod.c intermediaries

Brent Lu (2):
      ALSA: hda: intel-nhlt: add intel_nhlt_ssp_device_type() function
      ASoC: SOF: ipc4-topology: support NHLT device type

Carlos Maiolino (1):
      tmpfs: fix race on handling dquot rbtree

Chris Bainbridge (1):
      drm/dp: Fix divide-by-zero regression on DP MST unplug with nouveau

Chris Park (1):
      drm/amd/display: Prevent crash when disable stream

Chris Wilson (1):
      drm/i915/gt: Reset queue_priority_hint on parking

Christian A. Ehrhardt (5):
      usb: typec: ucsi: Clear EVENT_PENDING under PPM lock
      usb: typec: ucsi: Check for notifications after init
      usb: typec: ucsi: Ack unsupported commands
      usb: typec: ucsi_acpi: Refactor and fix DELL quirk
      usb: typec: ucsi: Clear UCSI_CCI_RESET_COMPLETE before reset

Christian Marangi (1):
      net: phy: qcom: at803x: fix kernel panic with at8031_probe

Christoph Hellwig (1):
      block: don't reject too large max_user_sectors in blk_validate_limits

Chuck Lever (2):
      SUNRPC: Revert 561141dd494382217bace4d1a51d08168420eace
      NFSD: CREATE_SESSION must never cache NFS4ERR_DELAY replies

Claus Hansen Ries (1):
      net: ll_temac: platform_get_resource replaced by wrong function

Colin Ian King (2):
      scsi: target: iscsi: Remove unused variable xfer_len
      fs/9p: remove redundant pointer v9ses

Cong Liu (1):
      tools/Makefile: remove cgroup target

Damien Le Moal (2):
      scsi: sd: Fix TCG OPAL unlock on system resume
      block: Do not force full zone append completion in req_bio_endio()

Dan Carpenter (2):
      nexthop: fix uninitialized variable in nla_put_nh_group_stats()
      staging: vc04_services: fix information leak in create_component()

Daniel Lezcano (1):
      Revert "thermal: core: Don't update trip points inside the hysteresis range"

Dave Airlie (1):
      drm/i915: add bug.h include to i915_memcpy.c

Dave Chinner (2):
      xfs: allow sunit mount option to repair bad primary sb stripe values
      xfs: don't use current->journal_info

David Gow (1):
      kunit: configs: Enable CONFIG_DAMON_DBGFS_DEPRECATED for --alltests

David Howells (1):
      cifs: Fix duplicate fscache cookie warnings

David Thompson (2):
      mlxbf_gige: stop PHY during open() error paths
      mlxbf_gige: call request_irq() after NAPI initialized

Dmitry Baryshkov (1):
      scsi: ufs: qcom: Provide default cycles_in_1us value

Duoming Zhou (2):
      nouveau/dmem: handle kcalloc() allocation failure
      ALSA: sh: aica: reorder cleanup operations to avoid UAF bugs

Edward Liaw (2):
      selftests/mm: sigbus-wp test requires UFFD_FEATURE_WP_HUGETLBFS_SHMEM
      selftests/mm: fix ARM related issue with fork after pthread_create

Emmanuel Grumbach (1):
      wifi: iwlwifi: mvm: pick the version of SESSION_PROTECTION_NOTIF

Eric Biggers (1):
      Revert "crypto: pkcs7 - remove sha1 support"

Eric Dumazet (1):
      tcp: properly terminate timers for kernel sockets

Eric Huang (1):
      drm/amdkfd: fix TLB flush after unmap for GFX9.4.2

Eric Van Hensbergen (1):
      fs/9p: fix uninitialized values during inode evict

Felix Fietkau (1):
      wifi: mac80211: check/clear fast rx for non-4addr sta VLAN changes

Filipe Manana (4):
      btrfs: fix extent map leak in unexpected scenario at unpin_extent_cache()
      btrfs: fix warning messages not printing interval at unpin_extent_range()
      btrfs: fix message not properly printing interval when adding extent map
      btrfs: use btrfs_warn() to log message at btrfs_add_extent_mapping()

Florian Westphal (1):
      inet: inet_defrag: prevent sk release while still in use

Francesco Dolcini (1):
      MAINTAINERS: wifi: mwifiex: add Francesco as reviewer

Gao Xiang (1):
      erofs: drop experimental warning for FSDAX

George Shen (1):
      drm/amd/display: Remove MPC rate control logic from DCN30 and above

Gergo Koteles (4):
      ALSA: hda/tas2781: remove digital gain kcontrol
      ALSA: hda/tas2781: add locks to kcontrols
      ALSA: hda/tas2781: add debug statements to kcontrols
      ALSA: hda/tas2781: remove useless dev_dbg from playback_hook

Guilherme G. Piccoli (1):
      scsi: core: Fix unremoved procfs host directory regression

Hamza Mahfooz (1):
      drm/amd/display: fix IPX enablement

Hangbin Liu (1):
      scripts/bpf_doc: Use silent mode when exec make cmd

Hari Bathini (1):
      bpf: fix warning for crash_kexec

Hariprasad Kelam (1):
      Octeontx2-af: fix pause frame configuration in GMP mode

Harry Wentland (1):
      Revert "drm/amd/display: Fix sending VSC (+ colorimetry) packets for DP/eDP displays without PSR"

Heikki Krogerus (1):
      usb: dwc3: pci: Drop duplicate ID

Herve Codina (1):
      net: wan: framer: Add missing static inline qualifiers

Ido Schimmel (2):
      ipv6: Fix address dump when IPv6 is disabled on an interface
      selftests: vxlan_mdb: Fix failures with old libnet

Igor Artemiev (1):
      wifi: cfg80211: fix rdev_dump_mpp() arguments order

Ilan Peer (1):
      wifi: iwlwifi: mvm: Configure the link mapping for non-MLD FW

Ilya Leoshkevich (1):
      s390/bpf: Fix bpf_plt pointer arithmetic

Ingo Molnar (2):
      Documentation/x86: Fix title underline length
      Revert "x86/mm/ident_map: Use gbpages only where full GB page should be mapped."

Isak Ellmer (1):
      kconfig: Fix typo HEIGTH to HEIGHT

Jakub Kicinski (2):
      tools: ynl: fix setting presence bits in simple nests
      selftests: netdevsim: set test timeout to 10 minutes

Jameson Thies (1):
      usb: typec: ucsi: Check capabilities before cable and identity discovery

Jan Kara (1):
      nfsd: Fix error cleanup path in nfsd_rename()

Janusz Krzysztofik (2):
      drm/i915/hwmon: Fix locking inversion in sysfs getter
      drm/i915/vma: Fix UAF on destroy against retire race

Jason Gunthorpe (2):
      iommu/arm-smmu-v3: Add cpu_to_le64() around STRTAB_STE_0_V
      iommu: Validate the PASID in iommu_attach_device_pasid()

Jeff Johnson (1):
      wifi: mac80211: fix ieee80211_bss_*_flags kernel-doc

Jesse Brandeburg (1):
      ice: fix memory corruption bug with suspend and rebuild

Jian Shen (1):
      net: hns3: mark unexcuted loopback test result as UNEXECUTED

Jie Wang (1):
      net: hns3: fix index limit to support all queue stats

Jocelyn Falempe (1):
      drm/vmwgfx: Create debugfs ttm_resource_manager entry only if needed

Johan Hovold (1):
      wifi: mac80211: fix mlme_link_id_dbg()

Johannes Berg (8):
      wifi: cfg80211: add a flag to disable wireless extensions
      wifi: iwlwifi: mvm: disable MLO for the time being
      wifi: mac80211: fix prep_connection error path
      wifi: iwlwifi: mvm: rfi: fix potential response leaks
      wifi: iwlwifi: fw: don't always use FW dump trig
      wifi: iwlwifi: read txq->read_ptr under lock
      wifi: iwlwifi: mvm: handle debugfs names more carefully
      kunit: fix wireless test dependencies

Johannes Thumshirn (3):
      btrfs: zoned: use zone aware sb location for scrub
      btrfs: zoned: fix use-after-free in do_zone_finish()
      btrfs: zoned: don't skip block groups with 100% zone unusable

Johannes Weiner (4):
      mm: cachestat: fix two shmem bugs
      mm: zswap: fix writeback shinker GFP_NOIO/GFP_NOFS recursion
      mm: zswap: fix data loss on SWP_SYNCHRONOUS_IO devices
      drm/amdgpu: fix deadlock while reading mqd from debugfs

John Garry (1):
      block: Make blk_rq_set_mixed_merge() static

John Ogness (1):
      printk: Update @console_may_schedule in console_trylock_spinning()

John Sperbeck (1):
      init: open /initrd.image with O_LARGEFILE

Jonathan Kim (1):
      drm/amdkfd: range check cp bad op exception interrupts

Jonathon Hall (1):
      drm/i915: Do not match JSL in ehl_combo_pll_div_frac_wa_needed()

Joonas Lahtinen (1):
      drm/i915: Add includes for BUG_ON/BUILD_BUG_ON in i915_memcpy.c

José Roberto de Souza (1):
      drm/i915: Do not print 'pxp init failed with 0' when it succeed

Juha-Pekka Heikkila (1):
      drm/i915/display: Disable AuxCCS framebuffers if built for Xe

Justin Chen (2):
      net: bcmasp: Bring up unimac after PHY link up
      net: bcmasp: Remove phy_{suspend/resume}

Justin Stitt (1):
      binfmt: replace deprecated strncpy

Justin Tee (12):
      scsi: lpfc: Remove unnecessary log message in queuecommand path
      scsi: lpfc: Move NPIV's transport unregistration to after resource clean up
      scsi: lpfc: Remove IRQF_ONESHOT flag from threaded IRQ handling
      scsi: lpfc: Update lpfc_ramp_down_queue_handler() logic
      scsi: lpfc: Replace hbalock with ndlp lock in lpfc_nvme_unregister_port()
      scsi: lpfc: Release hbalock before calling lpfc_worker_wake_up()
      scsi: lpfc: Use a dedicated lock for ras_fwlog state
      scsi: lpfc: Define lpfc_nodelist type for ctx_ndlp ptr
      scsi: lpfc: Define lpfc_dmabuf type for ctx_buf ptr
      scsi: lpfc: Define types in a union for generic void *context3 ptr
      scsi: lpfc: Update lpfc version to 14.4.0.1
      scsi: lpfc: Copyright updates for 14.4.0.1 patches

Kees Cook (2):
      selftests/exec: execveat: Improve debug reporting
      selftests/exec: Convert remaining /bin/sh to /bin/bash

Ken Raeburn (1):
      dm vdo murmurhash3: use kernel byteswapping routines instead of GCC ones

Kevin Loughlin (1):
      x86/sev: Skip ROM range scans and validation for SEV-SNP guests

Krishna Kurapati (1):
      usb: typec: ucsi: Fix race between typec_switch and role_switch

Kuan-Wei Chiu (2):
      MAINTAINERS: remove incorrect M: tag for dm-devel@lists.linux.dev
      MAINTAINERS: Remove incorrect M: tag for dm-devel@lists.linux.dev

Kuniyuki Iwashima (1):
      netfilter: arptables: Select NETFILTER_FAMILY_ARP when building arp_tables.c

Kurt Kanzenbach (1):
      igc: Remove stale comment about Tx timestamping

Kyle Tso (3):
      usb: typec: tcpm: Correct port source pdo array in pd_set callback
      usb: typec: tcpm: Update PD of Type-C port upon pd_set
      usb: typec: Return size of buffer if pd_set operation succeeds

Lang Yu (2):
      drm/amdgpu/umsch: update UMSCH 4.0 FW interface
      drm/amdgpu: enable UMSCH 4.0.6

Leonard Crestez (1):
      mailmap: update entry for Leonard Crestez

Liming Sun (1):
      sdhci-of-dwcmshc: disable PM runtime in dwcmshc_remove()

Linus Torvalds (4):
      Fix memory leak in posix_clock_open()
      Fix build errors due to new UIO_MEM_DMA_COHERENT mess
      mm: clean up populate_vma_page_range() FOLL_* flag handling
      Linux 6.9-rc2

Lizhi Xu (1):
      fs/9p: fix uaf in in v9fs_stat2inode_dotl

Lokesh Gidra (1):
      userfaultfd: fix deadlock warning when locking src and dst VMAs

Luca Weiss (1):
      drm/bridge: Select DRM_KMS_HELPER for DRM_PANEL_BRIDGE

Lucas De Marchi (1):
      drm/xe: Fix END redefinition

Mario Limonciello (1):
      drm/amd: Flush GFXOFF requests in prepare stage

Mark Brown (2):
      gpiolib: Add stubs for GPIO lookup functions
      selftests/seccomp: Try to fit runtime of benchmark into timeout

Mark Rutland (1):
      selftests/ftrace: Fix event filter target_func selection

Masahiro Yamada (6):
      cxl: remove CONFIG_CXL_PMU entry in drivers/cxl/Kconfig
      MIPS: move unselectable FIT_IMAGE_FDT_EPM5 out of the "System type" choice
      kconfig: do not reparent the menu inside a choice block
      export.h: remove include/asm-generic/export.h
      modpost: do not make find_tosym() return NULL
      x86/build: Use obj-y to descend into arch/x86/virt/

Masami Hiramatsu (Google) (1):
      tracing: probes: Fix to zero initialize a local variable

Matt Bobrowski (1):
      bpf: update BPF LSM designated reviewer list

Matthew Auld (5):
      drm/xe/guc_submit: use jiffies for job timeout
      drm/xe/queue: fix engine_class bounds check
      drm/xe/device: fix XE_MAX_GT_PER_TILE check
      drm/xe/device: fix XE_MAX_TILES_PER_DEVICE check
      drm/xe/query: fix gt_id bounds check

Matthew Wilcox (Oracle) (1):
      mm: increase folio batch size

Max Filippov (1):
      exec: Fix NOMMU linux_binprm::exec in transfer_args_to_stack()

Maxim Levitsky (1):
      i2c: i801: Fix a refactoring that broke a touchpad on Lenovo P1

Miguel Ojeda (2):
      drm/qxl: remove unused `count` variable from `qxl_surface_id_alloc()`
      drm/qxl: remove unused variable from `qxl_process_single_command()`

Mikko Rapeli (2):
      mmc: core: Initialize mmc_blk_ioc_data
      mmc: core: Avoid negative index with array access

Mikulas Patocka (1):
      objtool: Fix compile failure when using the x32 compiler

Minas Harutyunyan (5):
      usb: dwc2: host: Fix hibernation flow
      usb: dwc2: host: Fix remote wakeup from hibernation
      usb: dwc2: host: Fix ISOC flow in DDMA mode
      usb: dwc2: gadget: Fix exiting from clock gating
      usb: dwc2: gadget: LPM flow fix

Mostafa Saleh (1):
      iommu/arm-smmu-v3: Fix access for STE.SHCFG

Muhammad Usama Anjum (7):
      scsi: lpfc: Correct size for wqe for memset()
      scsi: lpfc: Correct size for cmdwqe/rspwqe for memset()
      selftests/exec: binfmt_script: Add the overall result line according to TAP
      selftests/exec: load_address: conform test to TAP format output
      selftests/exec: recursion-depth: conform test to TAP format output
      selftests: mm: restore settings from only parent process
      selftests: dmabuf-heap: add config file for the test

Mukul Joshi (1):
      drm/amdkfd: Check cgroup when returning DMABuf info

Natanel Roizenman (1):
      drm/amd/display: Increase Z8 watermark times.

Nathan Chancellor (2):
      hexagon: vmlinux.lds.S: handle attributes section
      Documentation/llvm: Note s390 LLVM=1 support with LLVM 18.1.0 and newer

Neil Armstrong (1):
      Revert "drm/bridge: Select DRM_KMS_HELPER for DRM_PANEL_BRIDGE"

Nikita Kiryushin (1):
      ACPICA: debugger: check status of acpi_evaluate_object() in acpi_db_walk_for_fields()

Nilesh Javali (1):
      scsi: qla2xxx: Update version to 10.02.09.200-k

Nirmoy Das (1):
      drm/xe: Remove unused xe_bo->props struct

Oliver Neukum (1):
      usb: cdc-wdm: close race between read and workqueue

Oscar Salvador (1):
      mm,page_owner: fix recursion

Pablo Neira Ayuso (3):
      netfilter: nf_tables: reject destroy command to remove basechain hooks
      netfilter: nf_tables: reject table flag and netdev basechain updates
      netfilter: nf_tables: skip netdev hook unregistration if table is dormant

Paul E. McKenney (1):
      x86/nmi: Upgrade NMI backtrace stall checks & messages

Pavel Sakharov (1):
      dma-buf: Fix NULL pointer dereference in sanitycheck()

Peter Wang (1):
      scsi: ufs: core: Add config_scsi_dev vops comment

Peter Xu (1):
      mm/memory: fix missing pte marker for !page on pte zaps

Peyton Lee (1):
      drm/amdgpu/vpe: power on vpe when hw_init

Ping-Ke Shih (2):
      wifi: rtw89: coex: fix configuration for shared antenna for 8922A
      MAINTAINERS: wifi: add git tree for Realtek WiFi drivers

Prasad Pandit (1):
      dpll: indent DPLL option type by a tab

Przemek Kitszel (1):
      ixgbe: avoid sleeping allocation in ixgbe_ipsec_vf_add_sa()

Pu Lehui (1):
      riscv, bpf: Fix kfunc parameters incompatibility between bpf and riscv abi

Puranjay Mohan (5):
      bpf: Temporarily disable atomic operations in BPF arena
      bpf, arm64: fix bug in BPF_LDX_MEMSX
      bpf: verifier: fix addr_space_cast from as(1) to as(0)
      selftests/bpf: verifier_arena: fix mmap address for arm64
      bpf: verifier: reject addr_space_cast insn without arena

Quentin Monnet (1):
      MAINTAINERS: Update email address for Quentin Monnet

Quinn Tran (6):
      scsi: qla2xxx: Prevent command send on chip reset
      scsi: qla2xxx: Fix N2N stuck connection
      scsi: qla2xxx: Split FCE|EFT trace control
      scsi: qla2xxx: NVME|FCP prefer flag not being honored
      scsi: qla2xxx: Fix command flush on cable pull
      scsi: qla2xxx: Delay I/O Abort on PCI error

Rafael J. Wysocki (1):
      genirq: Introduce IRQF_COND_ONESHOT and use it in pinctrl-amd

Raju Lakkaraju (1):
      net: lan743x: Add set RFE read fifo threshold for PCI1x1x chips

Ravi Gunasekaran (1):
      net: hsr: hsr_slave: Fix the promiscuous mode in offload mode

Ricardo B. Marliere (5):
      scsi: sg: Make sg_sysfs_class constant
      scsi: pmcraid: Make pmcraid_class constant
      scsi: cxlflash: Make cxlflash_class constant
      scsi: ch: Make ch_sysfs_class constant
      scsi: st: Make st_sysfs_class constant

Rohit Ner (1):
      scsi: ufs: core: Fix MCQ MAC configuration

Romain Naour (1):
      mmc: sdhci-omap: re-tuning is needed after a pm transition to support emmc HS200 mode

Roman Li (1):
      drm/amd/display: Fix bounds check for dcn35 DcfClocks

Ryosuke Yasuoka (1):
      nfc: nci: Fix uninit-value in nci_dev_up and nci_ntf_packet

Sabrina Dubroca (4):
      tls: recv: process_rx_list shouldn't use an offset with kvec
      tls: adjust recv return with async crypto and failed copy to userspace
      selftests: tls: add test with a partially invalid iov
      tls: get psock ref after taking rxlock to avoid leak

Sandeep Dhavale (1):
      MAINTAINERS: erofs: add myself as reviewer

Sandipan Das (4):
      x86/cpufeatures: Add new word for scattered features
      perf/x86/amd/lbr: Use freeze based on availability
      perf/x86/amd/core: Update and fix stalled-cycles-* events for Zen 2 and later
      perf/x86/amd/core: Define a proper ref-cycles event for Zen 4 and later

Saurav Kashyap (4):
      scsi: qla2xxx: Fix double free of the ha->vp_map pointer
      scsi: qla2xxx: Fix double free of fcport
      scsi: qla2xxx: Change debug message during driver unload
      scsi: bnx2fc: Remove spin_lock_bh while releasing resources after upload

Sergey Shtylyov (1):
      MAINTAINERS: split Renesas Ethernet drivers entry

Shaul Triebitz (1):
      wifi: iwlwifi: mvm: consider having one active link

Shin'ichiro Kawasaki (1):
      scsi: mpi3mr: Avoid memcpy field-spanning write WARNING

Simon Trimmer (2):
      ALSA: hda: cs35l56: Raise device name message log level
      ALSA: hda: cs35l56: Set the init_done flag before component_add()

Stanislav Fomichev (1):
      xsk: Don't assume metadata is always requested in TX completion

Steve French (1):
      smb3: add trace event for mknod

Steven Zou (1):
      ice: Refactor FW data type and fix bitmap casting issue

Sung Joon Kim (1):
      drm/amd/display: Update dcn351 to latest dcn35 config

Taimur Hassan (1):
      drm/amd/display: Send DTBCLK disable message on first commit

Tavian Barnes (1):
      btrfs: fix race in read_extent_buffer_pages()

Tejas Upadhyay (1):
      drm/i915/mtl: Update workaround 14018575942

Thinh Nguyen (1):
      usb: dwc3: Properly set system wakeup

Thomas Gleixner (1):
      MAINTAINERS: Add co-maintainers for time[rs]

Thomas Zimmermann (1):
      fbdev: Select I/O-memory framebuffer ops for SBus

Tom Zanussi (1):
      crypto: iaa - Fix nr_cpus < nr_iaa case

Uros Bizjak (1):
      x86/percpu: Disable named address spaces for KCSAN

Ville Syrjälä (6):
      drm/i915: Stop doing double audio enable/disable on SDVO and g4x+ DP
      drm/i915/dsi: Go back to the previous INIT_OTP/DISPLAY_ON order, mostly
      drm/i915/vrr: Generate VRR "safe window" for DSB
      drm/i915/dsb: Fix DSB vblank waits when using VRR
      drm/i915: Pre-populate the cursor physical dma address
      drm/i915/bios: Tolerate devdata==NULL in intel_bios_encoder_supports_dp_dual_mode()

Vitaly Chikunov (1):
      selftests/mm: Fix build with _FORTIFY_SOURCE

Vitaly Prosyak (1):
      drm/sched: fix null-ptr-deref in init entity

Weitao Wang (1):
      USB: UAS: return ENODEV when submit urbs fail with device not attached

Wenjing Liu (1):
      drm/amd/display: fix a dereference of a NULL pointer

Xi Liu (2):
      drm/amd/display: increase bb clock for DCN351
      drm/amd/display: Set DCN351 BB and IP the same as DCN35

Xingui Yang (2):
      scsi: libsas: Add a helper sas_get_sas_addr_and_dev_type()
      scsi: libsas: Fix disk not being scanned in after being removed

Xu Yang (1):
      usb: typec: tcpm: fix double-free issue in tcpm_port_unregister_pd()

Yazen Ghannam (3):
      RAS/AMD/FMPM: Avoid NULL ptr deref in get_saved_records()
      RAS/AMD/FMPM: Safely handle saved records of various sizes
      RAS: Avoid build errors when CONFIG_DEBUG_FS=n

Ye Zhang (1):
      thermal: devfreq_cooling: Fix perf state when calculate dfc res_util

Yonglong Liu (1):
      net: hns3: fix kernel crash when devlink reload during pf initialization

Yongzhi Liu (1):
      usb: misc: ljca: Fix double free in error handling path

Zev Weiss (2):
      prctl: generalize PR_SET_MDWE support check to be per-arch
      ARM: prctl: reject PR_SET_MDWE on pre-ARMv6

Zoltan HERPAI (1):
      pwm: img: fix pwm clock lookup

lima1002 (1):
      drm/amd/swsmu: add smu 14.0.1 vcn and jpeg msg

linke li (1):
      net: mark racy access on sk->sk_rcvbuf

yuan linyu (1):
      usb: udc: remove warning when queue disabled ep

^ permalink raw reply	[relevance 43%]

* Re: [GIT PULL] tpmdd changes for v6.9-rc2
  @ 2024-03-31 17:01 99%     ` Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-03-31 17:01 UTC (permalink / raw)
  To: Jarkko Sakkinen
  Cc: dhowells, Peter Huewe, Jason Gunthorpe, linux-integrity,
	linux-kernel, keyrings

On Sat, 30 Mar 2024 at 22:57, Jarkko Sakkinen <jarkko@kernel.org> wrote:
>
> OK, point taken and it is evolutionary issue really but definitely
> needs to be fixed.
>
> I review and test most of the stuff that goes to keyring but other
> than trusted keys, I usually pick only few patches every now and
> then to my tree.

It's perfectly fine if you send me key updates - you're listed as
maintainer etc, that's not a problem.

But when I get a tag name that says "tpmdd" and a subject that says
"tpmdd", I'm not expecting to then see key updates in the pull.

So that part of my issue was literally just that your subject line and
tag name didn't match the contents, and that just makes me go "there's
something wrong here".

So keys coming through your tree is fine per se, it's just that I want
the subject line etc to actually make sense.

                    Linus

^ permalink raw reply	[relevance 99%]

* Re: [GIT PULL] tpmdd changes for v6.9-rc2
  @ 2024-03-30 22:32 99% ` Linus Torvalds
    0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-03-30 22:32 UTC (permalink / raw)
  To: Jarkko Sakkinen
  Cc: Peter Huewe, Jason Gunthorpe, David Howells, linux-integrity,
	linux-kernel, keyrings

On Tue, 26 Mar 2024 at 07:38, Jarkko Sakkinen <jarkko@kernel.org> wrote:
>
>   git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd.git tags/tpmdd-v6.9-rc2

So I haven't pulled this, because the subject line (and tag name)
talks about tpmdd, but this is clearly about key handling.

Also, the actual contents seem to be very much an "update", not fixes.
And it doesn't seem to be an actual improvement, in how it now does
things from interrupts. That seems to be going backward rather than
forward.

            Linus

^ permalink raw reply	[relevance 99%]

* Re: PROBLEM: Only one CPU active on Ultra 60 since ~4.8 (regression)
  @ 2024-03-28 20:09 95%         ` Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-03-28 20:09 UTC (permalink / raw)
  To: Linux regressions mailing list, Andreas Larsson
  Cc: Nick Bowler, linux-kernel, David S. Miller, sparclinux

On Thu, 28 Mar 2024 at 12:36, Linux regression tracking (Thorsten
Leemhuis) <regressions@leemhuis.info> wrote:
>
> [CCing Linus, in case I say something to his disliking]
>
> On 22.03.24 05:57, Nick Bowler wrote:
> >
> > Just a friendly reminder that this issue still happens on Linux 6.8 and
> > reverting commit 9b2f753ec237 as indicated below is still sufficient to
> > resolve the problem.
>
> FWIW, that commit 9b2f753ec23710 ("sparc64: Fix cpu_possible_mask if
> nr_cpus is set") is from v4.8. Reverting it after all that time might
> easily lead to even bigger trouble.

I'm definitely not reverting a patch from almost a decade ago as a regression.

If it took that long to find, it can't be that critical of a regression.

So yes, let's treat it as a regular bug. And let's bring in Andreas to
the discussion too (although presumably he has seen it on the
sparclinux mailing list).

Andreas, if not, here's the link to lore for the beginning of the thread:

  https://lore.kernel.org/all/CADyTPEwt=ZNams+1bpMB1F9w_vUdPsGCt92DBQxxq_VtaLoTdw@mail.gmail.com/

And from a quick look I do think that commit is buggy, and yes, the
fix probably is just to revert it.

As the original report makes clear, that commit 9b2f753ec23710 is
clearly confused about the difference between "number of CPUs" and
"index of CPU numbers".

When that smp_fill_in_cpu_possible_map() does

        int possible_cpus = num_possible_cpus();

and then uses that to fill in &__cpu_possible_mask, that's completely
nonsensical. Because we literally have

  #define cpu_possible_mask ((const struct cpumask *)&__cpu_possible_mask)
  #define num_possible_cpus()     cpumask_weight(cpu_possible_mask)

so it's reading cpu_possible_mask to figure out how many cpus it might
have, and then using that number to set possibly *different* bits in
the same bitmap that is just used to judge what the max number is.

So I do think a revert is called for, but I'm not going to treat this
as a regression, I'm going to just treat it as "sparc bug" and hope
that the sparc people try to figure out why that crazy code was
written.

And maybe it made more sense back a decade ago than it does now.

Andreas?

                Linus

^ permalink raw reply	[relevance 95%]

* Re: [WIP 0/3] Memory model and atomic API in Rust
  @ 2024-03-27 22:57 94%                                 ` Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-03-27 22:57 UTC (permalink / raw)
  To: Kent Overstreet
  Cc: comex, Dr. David Alan Gilbert, Philipp Stanner, Boqun Feng,
	rust-for-linux, linux-kernel, linux-arch, llvm, Miguel Ojeda,
	Alex Gaynor, Wedson Almeida Filho, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Alan Stern, Andrea Parri, Will Deacon, Peter Zijlstra,
	Nicholas Piggin, David Howells, Jade Alglave, Luc Maranget,
	Paul E. McKenney, Akira Yokosawa, Daniel Lustig, Joel Fernandes,
	Nathan Chancellor, Nick Desaulniers, kent.overstreet,
	Greg Kroah-Hartman, Marco Elver, Mark Rutland, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
	Catalin Marinas, linux-arm-kernel, linux-fsdevel

On Wed, 27 Mar 2024 at 14:41, Kent Overstreet <kent.overstreet@linux.dev> wrote:
>
>
> On the hardware end, the Mill guys were pointing out years ago that
> register renaming is a big power bottleneck in modern processors;

LOL.

The Mill guys took the arguments from the Itanium people, and turned
the crazy up to 11, with "the belt" and seemingly trying to do a
dataflow machine but not worrying over-much about memory accesses etc.

The whole "we'll deal with it in the compiler" is crazy talk.

In other words, I'll believe it when I see it. And I doubt we'll ever see it.

               Linus

^ permalink raw reply	[relevance 94%]

* Re: [WIP 0/3] Memory model and atomic API in Rust
  @ 2024-03-27 20:45 88%                             ` Linus Torvalds
    0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-03-27 20:45 UTC (permalink / raw)
  To: Kent Overstreet
  Cc: comex, Dr. David Alan Gilbert, Philipp Stanner, Boqun Feng,
	rust-for-linux, linux-kernel, linux-arch, llvm, Miguel Ojeda,
	Alex Gaynor, Wedson Almeida Filho, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Alan Stern, Andrea Parri, Will Deacon, Peter Zijlstra,
	Nicholas Piggin, David Howells, Jade Alglave, Luc Maranget,
	Paul E. McKenney, Akira Yokosawa, Daniel Lustig, Joel Fernandes,
	Nathan Chancellor, Nick Desaulniers, kent.overstreet,
	Greg Kroah-Hartman, Marco Elver, Mark Rutland, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
	Catalin Marinas, linux-arm-kernel, linux-fsdevel

On Wed, 27 Mar 2024 at 12:41, Kent Overstreet <kent.overstreet@linux.dev> wrote:
>
> _But_: the lack of any aliasing guarantees means that writing through
> any pointer can invalidate practically anything, and this is a real
> problem.

It's actually much less of a problem than you make it out to be.

A lot of common aliasing information is statically visible (offsets
off the same struct pointer etc).

The big problems tend to be

 (a) old in-order hardware that really wants the compiler to schedule
memory operations

 (b) vectorization and HPC

and honestly, (a) is irrelevant, and (b) is where 'restrict' and
actual real vector extensions come in. In fact, the type-based
aliasing often doesn't help (because you have arrays of the same FP
types), and so you really just need to tell the compiler that your
arrays are disjoint.
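
As a minimal sketch of that last point (scale_add() is a made-up
example function, not anything from the kernel): both arrays have the
same element type, so type-based aliasing rules say nothing useful,
and it is the C99 'restrict' qualifier that tells the compiler the two
arrays do not overlap.

```c
#include <stddef.h>

/*
 * dst and src have the same type, so type-based alias analysis cannot
 * separate them. 'restrict' is the caller's promise that the two
 * arrays are disjoint, which lets the compiler vectorize the loop
 * without reloading src[] after every store to dst[].
 */
static void scale_add(float *restrict dst, const float *restrict src,
		      float k, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		dst[i] += k * src[i];
}
```

Calling this with overlapping arrays breaks the promise and is
undefined behavior, which is part of why describing aliasing by hand
is painful.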

Yes, yes, possible aliasing means that the compiler won't generate
nice-looking code in many situations and will end up reloading values
from memory etc.

AND NONE OF THAT MATTERS IN REALITY.

Performance issues to a close approximation come from cache misses and
branch mispredicts. The aliasing issue just isn't the horrendous issue
people claim it is. It's most *definitely* not worth the absolute
garbage that is C type-based aliasing.

And yes, I do think it might be nice to have a nicer 'restrict' model,
because yes, I look at the generated asm and I see the silly code
generation too. But describing aliasing sanely in general is just hard
(both for humans _and_ for some sane machine interface), and it's very
very seldom worth the pain.

            Linus

^ permalink raw reply	[relevance 88%]

* Re: [GIT PULL] Char/Misc driver changes for 6.9-rc1
  2024-03-27 16:56 97% ` Linus Torvalds
@ 2024-03-27 20:26 99%   ` Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-03-27 20:26 UTC (permalink / raw)
  To: Greg KH, Chris Leech, Nilesh Javali, Christoph Hellwig; +Cc: linux-kernel

On Wed, 27 Mar 2024 at 09:56, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> I also *suspect* that using 'physaddr_t' is in itself pointless,
> because I *think* the physical addresses are always page-aligned
> anyway, and it would be better if the uio_mem thing just contained the
> pfn instead. Which could just be 'unsigned long pfn'.

Oddly, the uio code seems to be written to allow unaligned page buffers,

        actual_pages = ((idev->info->mem[mi].addr & ~PAGE_MASK)
                        + idev->info->mem[mi].size + PAGE_SIZE - 1) >> PAGE_SHIFT;

but none of the mmap routines then actually allow such a mapping, and
they all have alignment checks.

Which sounds wonderful, until you find code like this duplicated in
various uio drivers:

                uiomem->memtype = UIO_MEM_PHYS;
                uiomem->addr = r->start & PAGE_MASK;
                uiomem->offs = r->start & ~PAGE_MASK;
                uiomem->size = (uiomem->offs + resource_size(r)
                                + PAGE_SIZE - 1) & PAGE_MASK;

IOW, it explicitly aligns the resources to pages, so now mmap works
again. Oh the horror.
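
The arithmetic is easier to follow with numbers plugged in. The sketch
below is a user-space model that assumes 4 KiB pages; the TOY_* macros,
struct toy_uio_mem, toy_align() and toy_pages() are stand-ins invented
for the example and only mirror the two snippets quoted above.

```c
#include <stdint.h>

/* Assumption: 4 KiB pages. The kernel's PAGE_* values are per-arch. */
#define TOY_PAGE_SHIFT	12
#define TOY_PAGE_SIZE	((uint64_t)1 << TOY_PAGE_SHIFT)
#define TOY_PAGE_MASK	(~(TOY_PAGE_SIZE - 1))

struct toy_uio_mem {
	uint64_t addr;	/* page-aligned start */
	uint64_t offs;	/* offset of the resource within its first page */
	uint64_t size;	/* size rounded up to whole pages */
};

/* Mirrors the driver-side pattern: align the resource to pages. */
static struct toy_uio_mem toy_align(uint64_t start, uint64_t len)
{
	struct toy_uio_mem m;

	m.addr = start & TOY_PAGE_MASK;
	m.offs = start & ~TOY_PAGE_MASK;
	m.size = (m.offs + len + TOY_PAGE_SIZE - 1) & TOY_PAGE_MASK;
	return m;
}

/* Mirrors the core-side page count, which tolerates an unaligned addr. */
static uint64_t toy_pages(uint64_t addr, uint64_t size)
{
	return ((addr & ~TOY_PAGE_MASK) + size + TOY_PAGE_SIZE - 1)
		>> TOY_PAGE_SHIFT;
}
```

For a resource at 0x10000234 of length 0x100 this gives addr
0x10000000, offs 0x234 and size 0x1000, so the low bits of 'addr' are
always zero by the time mmap sees it. That is the sense in which the
physical part could just as well be a pfn.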

But yes, that physical part of 'addr' should be a pfn. Sadly, all of
this code is such a mess that it's a horrible job to try to fix it all
up.

So we may be stuck with the horrendous confusion that is the current
uio_mem thing.

                 Linus

^ permalink raw reply	[relevance 99%]

* Re: [WIP 0/3] Memory model and atomic API in Rust
  @ 2024-03-27 19:07 89%                         ` Linus Torvalds
    0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-03-27 19:07 UTC (permalink / raw)
  To: Kent Overstreet
  Cc: comex, Dr. David Alan Gilbert, Philipp Stanner, Boqun Feng,
	rust-for-linux, linux-kernel, linux-arch, llvm, Miguel Ojeda,
	Alex Gaynor, Wedson Almeida Filho, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Alan Stern, Andrea Parri, Will Deacon, Peter Zijlstra,
	Nicholas Piggin, David Howells, Jade Alglave, Luc Maranget,
	Paul E. McKenney, Akira Yokosawa, Daniel Lustig, Joel Fernandes,
	Nathan Chancellor, Nick Desaulniers, kent.overstreet,
	Greg Kroah-Hartman, Marco Elver, Mark Rutland, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
	Catalin Marinas, linux-arm-kernel, linux-fsdevel

On Wed, 27 Mar 2024 at 11:51, Kent Overstreet <kent.overstreet@linux.dev> wrote:
>
> On Wed, Mar 27, 2024 at 09:16:09AM -0700, comex wrote:
> > Meanwhile, Rust intentionally lacks strict aliasing.
>
> I wasn't aware of this. Given that unrestricted pointers are a real
> impediment to compiler optimization, I thought that with Rust we were
> finally starting to nail down a concrete enough memory model to tackle
> this safely. But I guess not?

Strict aliasing is a *horrible* mistake.

It's not even *remotely* "tackle this safely". It's the exact
opposite. It's completely broken.

Anybody who thinks strict aliasing is a good idea either

 (a) doesn't understand what it means

 (b) has been brainwashed by incompetent compiler people.

It's a horrendous crock that was introduced by people who thought it
was too complicated to write out "restrict" keywords, and that thought
that "let's break old working programs and make it harder to write new
programs" was a good idea.

Nobody should ever do it. The fact that Rust doesn't do the C strict
aliasing is a good thing. Really.

I suspect you have been fooled by the name. Because "strict aliasing"
sounds like a good thing. It sounds like "I know these strictly can't
alias". But despite that name, it's the complete opposite of that, and
means "I will ignore actual real aliasing even if it exists, because I
will make aliasing decisions on entirely made-up grounds".

Just say no to strict aliasing. Thankfully, there's an actual compiler
flag for that: -fno-strict-aliasing. It should absolutely have been
the default.
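
For readers who have not run into the rule in practice, a minimal
sketch follows. float_bits() is a made-up helper, and the example
assumes a 32-bit IEEE-754 'float'. Under strict aliasing the compiler
may assume that a 'uint32_t *' and a 'float *' never point at the same
memory, so the traditional cast-and-dereference form shown in the
comment is undefined by the standard. The memcpy() form is well-defined
either way, and compilers typically turn it into a plain register move.

```c
#include <stdint.h>
#include <string.h>

/*
 * Return the raw bit pattern of a float.
 *
 * The classic form
 *
 *	return *(uint32_t *)&f;
 *
 * reads a float object through a uint32_t lvalue, which the standard's
 * type-based aliasing rules do not allow. It only has reliable
 * behavior when the compiler is told not to apply those rules, for
 * example with -fno-strict-aliasing (which the kernel build uses).
 */
static uint32_t float_bits(float f)
{
	uint32_t u;

	memcpy(&u, &f, sizeof(u));	/* assumes sizeof(float) == 4 */
	return u;
}
```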

                 Linus

^ permalink raw reply	[relevance 89%]

* Re: [GIT PULL] Char/Misc driver changes for 6.9-rc1
    @ 2024-03-27 16:56 97% ` Linus Torvalds
  2024-03-27 20:26 99%   ` Linus Torvalds
  1 sibling, 1 reply; 200+ results
From: Linus Torvalds @ 2024-03-27 16:56 UTC (permalink / raw)
  To: Greg KH, Chris Leech, Nilesh Javali, Christoph Hellwig; +Cc: linux-kernel

On Thu, 21 Mar 2024 at 06:02, Greg KH <gregkh@linuxfoundation.org> wrote:
>
> Char/Misc and other driver subsystem updates for 6.9-rc1
[...]
> Chris Leech (4):
>       uio: introduce UIO_MEM_DMA_COHERENT type
>       cnic,bnx2,bnx2x: use UIO_MEM_DMA_COHERENT
>       uio_pruss: UIO_MEM_DMA_COHERENT conversion
>       uio_dmem_genirq: UIO_MEM_DMA_COHERENT conversion

So this was all broken, and doesn't even build on 32-bit architectures
with 64-bit physical addresses as reported by at least Guenter.
Notably that includes i386 allmodconfig.

I fixed up the build, but I did it the mindless way. I noted in the
commit message that I think the correct fix is likely to make
'uio_mem.mem' be a union of 'physaddr_t' and 'void *' and just always
use the right member. UIO_MEM_LOGICAL and UIO_MEM_VIRTUAL should
probably use the pointer thing too.
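
A rough sketch of what such a union could look like, purely as an
illustration: every toy_* name below is hypothetical, the real struct
has more fields and more memory types, and this is not the fix that
was committed.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for a 64-bit physical address on a 32-bit kernel. */
typedef uint64_t toy_phys_addr_t;

enum toy_memtype {
	TOY_MEM_PHYS,		/* 'u.phys' is valid */
	TOY_MEM_LOGICAL,	/* 'u.virt' is valid */
};

struct toy_uio_mem {
	enum toy_memtype memtype;
	union {
		toy_phys_addr_t phys;	/* physical address */
		void *virt;		/* kernel pointer */
	} u;
};

static void toy_set_phys(struct toy_uio_mem *m, toy_phys_addr_t pa)
{
	m->memtype = TOY_MEM_PHYS;
	m->u.phys = pa;
}

static void toy_set_virt(struct toy_uio_mem *m, void *p)
{
	m->memtype = TOY_MEM_LOGICAL;
	m->u.virt = p;
}

/* Only hand back the pointer when the pointer member is the live one. */
static void *toy_get_virt(const struct toy_uio_mem *m)
{
	return m->memtype == TOY_MEM_LOGICAL ? m->u.virt : NULL;
}
```

The point of the union is that no cast between a pointer and a wider
integer type is needed anywhere, which is exactly the kind of cast
that breaks on 32-bit builds with 64-bit physical addresses.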

I also *suspect* that using 'physaddr_t' is in itself pointless,
because I *think* the physical addresses are always page-aligned
anyway, and it would be better if the uio_mem thing just contained the
pfn instead. Which could just be 'unsigned long pfn'.

So there are proper cleanups that could be done in that area.

That's not what I did, though. I just fixed up the bad casts.

There may be other fixes pending out there, but I didn't want to delay
the 32-bit build fixes any more.

It turns out that the cnic,bnx2,bnx2x conversion avoided the problems,
almost by accident. That driver had used UIO_MEM_LOGICAL before and
had existing casts. That doesn't make it good, but at least it made it
not fail to build.

See commit 498e47cd1d1f ("Fix build errors due to new
UIO_MEM_DMA_COHERENT mess")

                    Linus

^ permalink raw reply	[relevance 97%]

* Re: [WIP 0/3] Memory model and atomic API in Rust
  @ 2024-03-26  3:49 76%                   ` Linus Torvalds
    0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-03-26  3:49 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: Kent Overstreet, Philipp Stanner, Boqun Feng, rust-for-linux,
	linux-kernel, linux-arch, llvm, Miguel Ojeda, Alex Gaynor,
	Wedson Almeida Filho, Gary Guo, Björn Roy Baron,
	Benno Lossin, Andreas Hindborg, Alice Ryhl, Alan Stern,
	Andrea Parri, Will Deacon, Peter Zijlstra, Nicholas Piggin,
	David Howells, Jade Alglave, Luc Maranget, Paul E. McKenney,
	Akira Yokosawa, Daniel Lustig, Joel Fernandes, Nathan Chancellor,
	Nick Desaulniers, kent.overstreet, Greg Kroah-Hartman, elver,
	Mark Rutland, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, x86, H. Peter Anvin, Catalin Marinas,
	linux-arm-kernel, linux-fsdevel

On Mon, 25 Mar 2024 at 17:05, Dr. David Alan Gilbert <dave@treblig.org> wrote:
>
> Isn't one of the aims of the Rust/C++ idea that you can't forget to access
> a shared piece of data atomically?

If that is an aim, it's a really *bad* one.

Really.

It very much should never have been an aim, and I hope it wasn't. I
think, and hope, that the source of the C++ and Rust bad decisions is
cluelessness, not active malice.

Take Rust - one big point of Rust is the whole "safe" thing, but it's
very much not a straitjacket like Pascal was. There's a "safe" part
to Rust, but equally importantly, there's also the "unsafe" part to
Rust.

The safe part is the one that most programmers are supposed to use.
It's the one that allows you to not have to worry too much about
things. It's the part that makes it much harder to screw up.

But the *unsafe* part is what makes Rust powerful. It's the part that
works behind the curtain. It's the part that may be needed to make the
safe parts *work*.

And yes, an application programmer might never actually need to use
it, and in fact in many projects the rule might be that unsafe Rust is
simply never even an option - but that doesn't mean that the unsafe
parts don't exist.

Because those unsafe parts are needed to make it all work in reality.

And you should never EVER base your whole design around the "safe"
part. Then you get a language that is a straitjacket.

So I'd very strongly argue that the core atomics should be done the
"unsafe" way - allow people to specify exactly when they want exactly
what access. Allow people to mix and match and have overlapping
partial aliases, because if you implement things like locking, you
*need* those partially aliasing accesses, and you need to make
overlapping atomics where sometimes you access only one part of the
field.

And yes, that will be unsafe, and it might even be unportable, but
it's exactly the kind of thing you need in order to avoid having to
use assembly language to do your locking.

And by all means, you should relegate that to the "unsafe corner" of
the language. And maybe don't talk about the unsafe sharp edges in the
first chapter of the book about the language.

But you should _start_ the design of your language memory model around
the unsafe "raw atomic access operations" model.

Then you can use those strictly more powerful operations, and you
create an object model *around* it.

So you create safe objects like just an atomic counter. In *that*
corner of the language, you have the "safe atomics" - they aren't the
fundamental implementation, but they are the safe wrappers *around*
the more powerful (but unsafe) core.

With that "atomic counter" you cannot forget to do atomic accesses,
because that safe corner of the world doesn't _have_ anything but the
safe atomic accesses for every time you use the object.

See? Having the capability to do powerful and maybe unsafe things does
not force people to expose and use all that power. You can - and
should - wrap the powerful model with safer and simpler interfaces.
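
The layering can be sketched even in C, with the caveat that C cannot
enforce the boundary the way a language with a real "safe" subset can.
The sketch assumes GCC or Clang for the __atomic builtins, and the
raw_* and toy_counter names are invented for the example.

```c
/*
 * The "unsafe core": raw, explicitly marked accesses to plain memory.
 * The caller decides which accesses are atomic and when.
 */
static inline int raw_load(int *p)
{
	return __atomic_load_n(p, __ATOMIC_RELAXED);
}

static inline int raw_fetch_add(int *p, int v)
{
	return __atomic_fetch_add(p, v, __ATOMIC_RELAXED);
}

/*
 * The "safe wrapper": an atomic counter built on top of the core.
 * By convention 'v' is only ever touched through the two helpers
 * below, so a user of the counter cannot forget to be atomic.
 */
struct toy_counter {
	int v;
};

static inline void toy_counter_inc(struct toy_counter *c)
{
	raw_fetch_add(&c->v, 1);
}

static inline int toy_counter_read(struct toy_counter *c)
{
	return raw_load(&c->v);
}
```

Code that needs something odd, like a partial or overlapping access
for a lock implementation, drops down to the raw_* layer. Everything
else stays on the counter interface.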

This isn't something specific to atomics. Not even remotely. This is
quite fundamental. You often literally _cannot_ do interesting things
using only safe interfaces. You want safe memory allocations - but to
actually write the allocator itself, you want to have all those unsafe
escape methods - all those raw pointers with arbitrary arithmetic etc.

And if you don't have unsafe escapes, you end up doing what so many
languages did: the libraries are written in something more powerful
like C, because C literally can do things that other languages
*cannot* do.

Don't let people fool you with talk about Turing machines and similar
smoke-and-mirror garbage. It's a bedtime story for first-year CS
students. It's not true.

Not all languages are created equal. Not all languages can do the same
things. If your language doesn't have those unsafe escapes, your
language is inherently weaker, and inherently worse for it.

           Linus

^ permalink raw reply	[relevance 76%]

* Re: [WIP 0/3] Memory model and atomic API in Rust
  @ 2024-03-25 19:44 84%               ` Linus Torvalds
    0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-03-25 19:44 UTC (permalink / raw)
  To: Kent Overstreet
  Cc: Philipp Stanner, Boqun Feng, rust-for-linux, linux-kernel,
	linux-arch, llvm, Miguel Ojeda, Alex Gaynor,
	Wedson Almeida Filho, Gary Guo, Björn Roy Baron,
	Benno Lossin, Andreas Hindborg, Alice Ryhl, Alan Stern,
	Andrea Parri, Will Deacon, Peter Zijlstra, Nicholas Piggin,
	David Howells, Jade Alglave, Luc Maranget, Paul E. McKenney,
	Akira Yokosawa, Daniel Lustig, Joel Fernandes, Nathan Chancellor,
	Nick Desaulniers, kent.overstreet, Greg Kroah-Hartman, elver,
	Mark Rutland, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, x86, H. Peter Anvin, Catalin Marinas,
	linux-arm-kernel, linux-fsdevel

On Mon, 25 Mar 2024 at 11:59, Kent Overstreet <kent.overstreet@linux.dev> wrote:
>
> To be fair, "volatile" dates from an era when we didn't have the haziest
> understanding of what a working memory model for C would look like or
> why we'd even want one.

I don't disagree, but I find it very depressing that now that we *do*
know about memory models etc, the C++ memory model basically doubled
down on the same "object" model.

> The way the kernel uses volatile in e.g. READ_ONCE() is fully in line
> with modern thinking, just done with the tools available at the time. A
> more modern version would be just
>
> __atomic_load_n(ptr, __ATOMIC_RELAXED)

Yes. Again, that's the *right* model in many ways, where you mark the
*access*, not the variable. You make it completely and utterly clear
that this is a very explicit access to memory.
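
A minimal sketch of what "marking the access" looks like in practice,
assuming GCC or Clang for the __atomic builtins (the variable and
function names are made up): the variable itself is a plain 'int', and
each access site says how it wants to touch memory.

```c
/* A plain object: nothing in its type says "atomic" or "volatile". */
static int shared_value;

/* Marked accesses: the atomicity is a property of the access itself. */
static int load_marked(void)
{
	return __atomic_load_n(&shared_value, __ATOMIC_RELAXED);
}

static void store_marked(int v)
{
	__atomic_store_n(&shared_value, v, __ATOMIC_RELAXED);
}

/*
 * Plain access to the very same object. Only valid in a context where
 * the caller knows nothing else can touch it, e.g. with a lock held.
 */
static void store_plain(int v)
{
	shared_value = v;
}
```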

But that's not what C++ actually did. They went down the same old
"volatile object" road, and instead of marking the access, they mark
the object, and the way you do the above is

    std::atomic_int value;

and then you just access 'value' and magic happens.

EXACTLY the same way that

   volatile int value;

works, in other words. With exactly the same downsides.

And yes, I think that model is a nice shorthand. But it should be a
*shorthand*, not the basis of the model.

I do find it annoying, because the C++ people literally started out
with shorthands. The whole "pass by reference" is literally nothing
but a shorthand for pointers (ooh, scary scary pointers), where the
address-of is implied at the call site, and the 'dereference'
operation is implied at use.

So it's not that shorthands are wrong. And it's not that C++ isn't
already very fundamentally used to them. But despite that, the C++
memory model is very much designed around the broken object model, and
as already shown in this thread, it causes actual immediate problems.

And it's not just C++. Rust is supposed to be the really modern thing.
And it made the *SAME* fundamental design mistake.

IOW, the whole access size problem that Boqun described is
*inherently* tied to the fact that the C++ and Rust memory model is
badly designed from the wrong principles.

Instead of designing it as a "this is an atomic object that you can do
these operations on", it should have been "this is an atomic access,
and you can use this simple object model to have the compiler generate
the accesses for you".

This is why I claim that LKMM is fundamentally better. It didn't start
out from a bass-ackwards starting point of marking objects "atomic".

And yes, the LKMM is a bit awkward, because we don't have the
shorthands, so you have to write out "atomic_read()" and friends.

Tough. It's better to be correct than to be simple.

             Linus

^ permalink raw reply	[relevance 84%]

* Re: [WIP 0/3] Memory model and atomic API in Rust
  @ 2024-03-25 17:44 80%           ` Linus Torvalds
      0 siblings, 2 replies; 200+ results
From: Linus Torvalds @ 2024-03-25 17:44 UTC (permalink / raw)
  To: Philipp Stanner
  Cc: Kent Overstreet, Boqun Feng, rust-for-linux, linux-kernel,
	linux-arch, llvm, Miguel Ojeda, Alex Gaynor,
	Wedson Almeida Filho, Gary Guo, Björn Roy Baron,
	Benno Lossin, Andreas Hindborg, Alice Ryhl, Alan Stern,
	Andrea Parri, Will Deacon, Peter Zijlstra, Nicholas Piggin,
	David Howells, Jade Alglave, Luc Maranget, Paul E. McKenney,
	Akira Yokosawa, Daniel Lustig, Joel Fernandes, Nathan Chancellor,
	Nick Desaulniers, kent.overstreet, Greg Kroah-Hartman, elver,
	Mark Rutland, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, x86, H. Peter Anvin, Catalin Marinas,
	linux-arm-kernel, linux-fsdevel

On Mon, 25 Mar 2024 at 06:57, Philipp Stanner <pstanner@redhat.com> wrote:
>
> On Fri, 2024-03-22 at 17:36 -0700, Linus Torvalds wrote:
> >
> > It's kind of like our "volatile" usage. If you read the C (and C++)
> > standards, you'll find that you should use "volatile" on data types.
> > That's almost *never* what the kernel does. The kernel uses
> > "volatile"
> > in _code_ (ie READ_ONCE() etc), and uses it by casting etc.
> >
> > Compiler people don't tend to really like those kinds of things.
>
> Just for my understanding: Why don't they like it?

So I actually think most compiler people are perfectly fine with the
kernel model of mostly doing 'volatile' not on the data structures
themselves, but as accesses through casts.

It's very traditional C, and there's actually nothing particularly odd
about it. Not even from a compiler standpoint.

In fact, I personally will argue that it is fundamentally wrong to
think that the underlying data has to be volatile. A variable may be
entirely stable in some cases (ie locks held), but not in others.

So it's not the *variable* (aka "object") that is 'volatile', it's the
*context* that makes a particular access volatile.

That explains why the kernel has basically zero actual volatile
objects, and 99% of all volatile accesses are done through accessor
functions that use a cast to mark a particular access volatile.
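
A simplified sketch of that pattern, for illustration only: the TOY_*
macros below are not the kernel's actual READ_ONCE()/WRITE_ONCE()
definitions (those do more checking), and they rely on the GNU
__typeof__ extension.

```c
/*
 * The object stays a plain object. The volatile qualifier is applied
 * to one particular access by casting the address at the point of use.
 */
#define TOY_READ_ONCE(x)	(*(const volatile __typeof__(x) *)&(x))
#define TOY_WRITE_ONCE(x, val)	(*(volatile __typeof__(x) *)&(x) = (val))

struct toy_flag {
	int ready;	/* note: not declared volatile */
};

/* Lockless reader: this one access must not be merged or repeated. */
static int toy_poll(struct toy_flag *f)
{
	return TOY_READ_ONCE(f->ready);
}

/* Lockless writer, marked the same way. */
static void toy_signal(struct toy_flag *f)
{
	TOY_WRITE_ONCE(f->ready, 1);
}
```

Code that holds a lock protecting 'ready' can still use a plain
'f->ready' and let the compiler optimize it freely.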

But I've had negative comments from compiler people who read the
standards as language lawyers (which honestly, I despise - it's always
possible to try to argue what the meaning of some wording is), and
particularly C++ people used to be very very antsy about "volatile".

They had some truly _serious_ problems with volatile.

The C++ people spent absolutely insane amounts of time arguing about
"volatile objects" vs "accesses", and how an access through a cast
didn't make the underlying object volatile etc.

There were endless discussions because a lvalue isn't supposed to be
an access (an lvalue is something that is being acted on, and it
shouldn't imply an access because an access will then cause other
things in C++). So a statement expression that was just an lvalue
shouldn't imply an access in C++ originally, but obviously when the
thing was volatile it *had* to do so, and there was gnashing of teeth
over this all.

And all of it was purely semantic nitpicking about random wording. The
C++ people finally tried to save face by claiming that it was always
the C (not C++) rules that were unclear, and introduced the notion of
"glvalue", and it's all good now, but there's literally decades of
language lawyering and pointless nitpicking about the difference
between "objects" and "accesses".

Sane people didn't care, but if you reported a compiler bug about
volatile use, you had better be ready to sit back and be flamed for
how your volatile pointer cast wasn't an "object" and that the
compiler that clearly generated wrong code was technically correct,
and that your mother was a hamster.

It's a bit like the NULL debacle. Another thing that took the C++
people a couple of decades to admit they were wrong all along, and
that NULL isn't actually 'integer zero' in any sane language that
claims to care deeply about types.

[ And again, to save face, at no point did they say "ok, '(void *)0'
is fine" - they introduced a new nullptr thing just so that they
wouldn't have to admit that their decades of arguing was just them
being wrong. You'll find another decade of arguments explaining the
finer details about _that_ difference ]

It turns out that the people who are language-lawyering nitpickers are
then happy to be "proven right" by adding some more pointless
syntactic language-lawyering language.

Which I guess makes sense, but to the rest of us it all looks a bit pointless.

                  Linus

^ permalink raw reply	[relevance 80%]

* Linux 6.9-rc1
@ 2024-03-24 21:56 72% Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-03-24 21:56 UTC (permalink / raw)
  To: Linux Kernel Mailing List

So two weeks have passed, the merge window is over, and v6.9-rc1 is
tagged and pushed out.

This merge window looks to be fairly normal. If you look at the diffs,
you'd think that the bulk of all the changes are AMD GPU header files
again, and you'd not be entirely wrong. About 40% of the whole 6.9-rc1
patch is indeed just the auto-generated AMD GPU definitions. I wish
this was unusual, but it's a pattern.

Anyway, while that is a lot of the actual changes by pure line
numbers, it's all just basically noise and not meaningful in the big
picture.

In contrast, what _is_ meaningful is a couple of very core updates.
The timer subsystem had a fairly big rewrite, to have per-cpu timer
wheels to improve performance of timers, which can be a big deal
particularly for networking. The other fairly notable core update is
to the workqueue subsystem, where one notable addition is for BH
workqueue support. That's notable mainly because it means we finally
have a way away from tasklets. The tasklet interface has basically
been deprecated for a long while, but we've never really had any good
alternatives (with threaded interrupt handlers being one suggested
use-case, but not realistic in many cases).

The core updates should be entirely invisible to users, as they don't
involve any semantic changes, just expanded capabilities. Of course,
being somewhat big changes, they did cause a few issues, but we've
hopefully already caught all the big deals.

Anyway, there's obviously also all the usual updates, and even when
you ignore the recurring AMD header drop more than half of actual
patch is - as usual - various driver updates all over. And all the
other usual suspects: architecture updates, various filesystems (old
ntfs core removal might be worth noting), core networking, VM and
kernel. And tooling and documentation.

Please commence testing,

           Linus

---

Alex Williamson (1):
    VFIO updates

Alexandre Belloni (2):
    i3c updates
    RTC updates

Amir Goldstein (1):
    overlayfs fixes

Andreas Larsson (1):
    sparc updates

Andrew Morton (2):
    MM updates
    non-MM updates

Andy Shevchenko (1):
    auxdisplay updates

Ard Biesheuvel (3):
    EFI updates
    EFI fix
    EFI fixes

Arnd Bergmann (6):
    SoC device tree updates
    ARM SoC driver updates
    ARM SoC code updates
    ARM defconfig updates
    asm-generic updates
    more ARM SoC updates

Bartosz Golaszewski (1):
    gpio updates

Bjorn Andersson (3):
    remoteproc updates
    rpmsg updates
    hwspinlock updates

Bjorn Helgaas (1):
    PCI updates

Boqun Feng (1):
    RCU updates

Borislav Petkov (7):
    RAS fixlet
    x86 cpu update
    x86 MTRR update
    resource control updates
    x86 SEV updates
    misc x86 fixes
    EDAC updates

Casey Schaufler (1):
    smack updates

Catalin Marinas (2):
    arm64 updates
    arm64 fixes

Chandan Babu (2):
    xfs updates
    xfs fixes

Christian Brauner (8):
    misc vfs updates
    ntfs update
    iomap updates
    pidfd updates
    file locking updates
    block handle updates
    vfs uuid updates
    vfs fixes

Christoph Hellwig (2):
    dma-mapping updates
    dma-mapping fixes

Chuck Lever (1):
    nfsd updates

Damien Le Moal (1):
    zonefs update

Dan Williams (1):
    CXL updates

Dave Airlie (2):
    drm updates
    drm fixes

Dave Hansen (4):
    x86 mm updates
    x86 tdx update
    x86 RFDS mitigation
    x86 APIC fixup

Dave Jiang (1):
    libnvdimm updates

David Sterba (3):
    btrfs updates
    affs update
    btrfs fix

David Teigland (1):
    dlm updates

Dmitry Torokhov (1):
    input updates

Dominik Brodowski (1):
    PCMCIA updates

Eric Biggers (2):
    fscrypt updates
    fsverity update

Eric Van Hensbergen (1):
    9p updates

Gao Xiang (1):
    erofs updates

Geert Uytterhoeven (1):
    m68k updates

Greg KH (5):
    USB / Thunderbolt updates
    tty / serial driver updates
    staging driver updates
    char/misc and other driver subsystem updates
    driver core updates

Guenter Roeck (1):
    hwmon updates

Heiko Carstens (2):
    s390 updates
    more s390 updates

Helge Deller (2):
    parisc architecture updates and fixes
    fbdev updates

Herbert Xu (1):
    crypto updates

Huacai Chen (1):
    LoongArch updates

Ilpo Järvinen (1):
    x86 platform driver updates

Ilya Dryomov (1):
    ceph updates

Ingo Molnar (10):
    locking updates
    scheduler updates
    x86 asm updates
    x86 build updates
    x86 cleanups
    core x86 updates
    x86 boot updates
    x86 perf event fixes
    timer fix
    irq fix

Jaegeuk Kim (1):
    f2fs update

Jakub Kicinski (2):
    networking updates
    networking fixes

James Bottomley (2):
    SCSI updates
    more SCSI updates

Jan Kara (2):
    fsnotify updates
    ext2, isofs, udf, and quota updates

Jarkko Sakkinen (1):
    tpm updates

Jason Gunthorpe (1):
    rdma updates

Jassi Brar (1):
    mailbox updates

Jens Axboe (5):
    io_uring updates
    block updates
    block fixes
    more io_uring updates
    more block updates

Jiri Kosina (1):
    HID updates

Joel Granados (1):
    sysctl updates

Joerg Roedel (1):
    iommu updates

John Paul Adrian Glaubitz (1):
    sh updates

Jonathan Corbet (2):
    documentation updates
    more documentation updates

Juergen Gross (1):
    xen updates

Julia Lawall (1):
    coccinelle update

Kees Cook (5):
    pstore updates
    execve updates
    hardening updates
    seccomp updates
    more hardening updates

Kent Overstreet (2):
    bcachefs updates
    bcachefs fixes

Lee Jones (3):
    MFD updates
    backlight updates
    LED updates

Linus Walleij (1):
    pin control updates

Luis Chamberlain (1):
    modules updates

Mark Brown (5):
    regmap updates
    regulator updates
    spi updates
    regulator fix
    spi fixes

Masahiro Yamada (1):
    Kbuild updates

Masami Hiramatsu (1):
    probes updates

Mauro Carvalho Chehab (1):
    media updates

Michael Ellerman (2):
    powerpc updates
    more powerpc updates

Michael Tsirkin (1):
    virtio updates

Mickaël Salaün (1):
    landlock updates

Miguel Ojeda (2):
    compiler attributes update
    Rust updates

Mike Marshall (1):
    orangefs updates

Mike Snitzer (4):
    device mapper updates
    device mapper BH workqueue conversion
    device mapper VDO target
    device mapper fixes

Miklos Szeredi (1):
    fuse updates

Miquel Raynal (1):
    MTD updates

Namhyung Kim (1):
    perf tools updates

Namjae Jeon (1):
    exfat updates

Niklas Cassel (2):
    ata updates
    ata fix

Palmer Dabbelt (1):
    RISC-V updates

Paolo Bonzini (1):
    kvm updates

Paul Moore (4):
    audit updates
    selinux updates
    lsm updates
    lsm fixes

Petr Mladek (1):
    printk updates

Rafael Wysocki (6):
    power management updates
    ACPI updates
    thermal control updates
    more thermal control updates
    more ACPI updates
    more power management updates

Richard Weinberger (1):
    UBI and UBIFS updates

Rob Herring (1):
    devicetree updates

Russell King (1):
    ARM updates

Sebastian Reichel (2):
    HSI updates
    power supply and reset updates

Shuah Khan (2):
    kselftest update
    KUnit updates

Stafford Horne (1):
    OpenRISC updates

Stephen Boyd (1):
    clk updates

Steve French (3):
    smb client updates
    smb server updates
    smb client fixes

Steven Rostedt (4):
    tracing updates
    tracing updates
    ktest updates
    trace tool updates

Takashi Iwai (3):
    sound updates
    sound fixes
    more sound fixes

Takashi Sakamoto (1):
    firewire updates

Ted Ts'o (1):
    ext4 updates

Tejun Heo (3):
    workqueue updates
    workqueue BH conversions
    cgroup updates

Thomas Bogendoerfer (1):
    MIPS updates

Thomas Gleixner (14):
    irq updates
    MSI updates
    cpu core updates
    clocksource updates
    timer updates
    x86 APIC updates
    x86 FRED support
    x86 entry update
    core entry fix
    irq fixes
    more clocksource updates
    timer fixes
    scheduler doc clarification
    x86 fixes

Trond Myklebust (1):
    NFS client updates

Tzung-Bi Shih (1):
    chrome platform firmware updates

Ulf Hansson (2):
    MMC updates
    pmdomain updates

Uwe Kleine-König (2):
    pwm updates
    siox updates

Vinod Koul (3):
    soundwire updates
    dmaengine updates
    phy updates

Vlastimil Babka (1):
    slab updates

Wei Liu (1):
    hyperv updates

Wim Van Sebroeck (1):
    watchdog updates

Wolfram Sang (2):
    i2c updates
    more i2c updates

Yury Norov (1):
    bitmap updates

^ permalink raw reply	[relevance 72%]

* Re: [PATCH RFC 4/4] UNFINISHED mm, fs: use kmem_cache_charge() in path_openat()
  @ 2024-03-24 17:44 94%       ` Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-03-24 17:44 UTC (permalink / raw)
  To: Al Viro
  Cc: Vlastimil Babka, Josh Poimboeuf, Jeff Layton, Chuck Lever,
	Kees Cook, Christoph Lameter, Pekka Enberg, David Rientjes,
	Joonsoo Kim, Andrew Morton, Roman Gushchin, Hyeonggon Yoo,
	Johannes Weiner, Michal Hocko, Shakeel Butt, Muchun Song,
	Christian Brauner, Jan Kara, linux-mm, linux-kernel, cgroups,
	linux-fsdevel

[ Al, I hope your email works now ]

On Sat, 23 Mar 2024 at 19:27, Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> We can have the same file occurring in many slots of many descriptor tables,
> obviously.  So it would have to be a flag (in ->f_mode?) set by it, for
> "someone's already charged for it", or you'll end up with really insane
> crap on each fork(), dup(), etc.

Nope.

That flag already exists in the slab code itself with this patch. The
kmem_cache_charge() thing itself just sets the "I'm charged" bit in
the slab header, and you're done. Any subsequent fd_install (with dup,
or fork or whatever) simply is irrelevant.

In fact, dup and fork and friends won't need to worry about this,
because they only work on files that have already been installed, so
they know the file is already accounted.

So it's only the initial open() case that needs to do the
kmem_cache_charge() as it does the fd_install.
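
[ A minimal user-space sketch of the "charge once" idea above. The
  struct and function names are hypothetical stand-ins, not the slab
  code or the API from the patch; it only models a per-object
  "I'm charged" bit that makes repeated charging a no-op. ]

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a slab object header; not the kernel layout. */
struct obj_model {
	bool charged;			/* the "I'm charged" bit */
};

/* Hypothetical accounting counter, standing in for a memcg. */
struct memcg_model {
	size_t charged_objects;
};

/* Charge the object at most once; later calls see the bit and do nothing. */
static bool charge_once(struct obj_model *obj, struct memcg_model *cg)
{
	assert(obj && cg);
	if (obj->charged)
		return false;		/* already accounted */
	obj->charged = true;
	cg->charged_objects++;
	return true;
}
```

In this model the first open()-style install would call charge_once();
dup() and fork() would not need to call it at all, and calling it again
would change nothing.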

> But there's also MAP_ANON with its setup_shmem_file(), with the resulting
> file not going into descriptor tables at all, and that's not a rare thing.

Just making alloc_file_pseudo() do a SLAB_ACCOUNT should take care of
all the normal cases.

For once, the core allocator is not exposed very much, so we can
literally just look at "who does alloc_file*()" and it turns out it's
all pretty well abstracted out.

So I think it's mainly the three cases of 'alloc_empty_file()' that
would be affected and need to check that they actually do the
fd_install() (or release it).

Everything else should either not account at all (if they know they
are doing temporary kernel things), or always account (eg
alloc_file_pseudo()).

               Linus

^ permalink raw reply	[relevance 94%]

* Re: [PATCH v4 00/16] x86-64: Stack protector and percpu improvements
  2024-03-23 16:16 94%     ` Linus Torvalds
@ 2024-03-23 17:06 96%       ` Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-03-23 17:06 UTC (permalink / raw)
  To: Brian Gerst, Arnd Bergmann
  Cc: Uros Bizjak, linux-kernel, x86, Ingo Molnar, Thomas Gleixner,
	Borislav Petkov, H . Peter Anvin, David.Laight

On Sat, 23 Mar 2024 at 09:16, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> And we might as well also do the semi-yearly compiler version review.
> We raised the minimum to 4.9 almost four years ago, and then the jump
> to 5.1 was first for arm64 due to a serious gcc code generation bug
> and then globally in Sept 2021.

Looking at RHEL, I find a page that claims

  RHEL9 : gcc 11.x in app stream
  RHEL8 : gcc 8.x or gcc 9.x in app stream.
  RHEL7 : gcc 4.8.x

so RHEL7 is already immaterial from a kernel compiler standpoint, and
so it looks like at least as far as RHEL is concerned, we could just
jump to gcc 8.1 as a minimum.

RHEL also has a "Developer Toolset" that allows you to pick a compiler
upgrade, so it's not *quite* as black-and-white as that, but it does
seem like we could at some point just pick gcc-8 as a new minimum with
very little pain on that front.

The SLES situation seems somewhat similar, with SLES12 being 4.8.x and
SLES15 being 7.3. But again with a "Development Tools Module" setup.
So that *might* argue for 7.3.

I can't make sense of Debian releases. There's "stable" (bookworm)
that comes with gcc-12.2, but there's oldstable, oldoldstable, and
various "archived" releases still under LTS. I can't even begin to
guess what may be relevant.

I don't think we care that deeply on the kernel side, other than a
"maybe we should be a bit more proactive about raising gcc version
requirements". I don't think we have any huge issues right now with
old gcc versions.

               Linus

^ permalink raw reply	[relevance 96%]

* Re: [PATCH v4 00/16] x86-64: Stack protector and percpu improvements
  @ 2024-03-23 16:16 94%     ` Linus Torvalds
  2024-03-23 17:06 96%       ` Linus Torvalds
  0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-03-23 16:16 UTC (permalink / raw)
  To: Brian Gerst, Arnd Bergmann
  Cc: Uros Bizjak, linux-kernel, x86, Ingo Molnar, Thomas Gleixner,
	Borislav Petkov, H . Peter Anvin, David.Laight

On Sat, 23 Mar 2024 at 06:23, Brian Gerst <brgerst@gmail.com> wrote:
>
> One small issue is that Kconfig would silently disable stackprotector
> if the compiler doesn't support the new options.  That said, the
> number of people that this would affect is very small, as just about
> any modern distribution ships a compiler newer than 8.1.

Yes, let's make the rule be that you can still compile the kernel with
gcc-5.1+, but you can't get stackprotector support unless you have
gcc-8.1+.

I'd hate to add the objtool support for an old compiler - this is a
hardening feature, not a core feature, and anybody who insists on old
compilers just won't get it.

And we have other cases like this where various debug features depend
on the gcc version, eg

  config CC_HAS_WORKING_NOSANITIZE_ADDRESS
          def_bool !CC_IS_GCC || GCC_VERSION >= 80300

so we could easily do the same for stack protector support.
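
[ A small sketch of how a version code like 80300 is built and compared,
  assuming the usual major*10000 + minor*100 + patchlevel encoding. The
  helper names and the 80100 threshold for stack protector are
  illustrative assumptions taken from the numbers in this thread, not an
  existing Kconfig symbol. ]

```c
#include <assert.h>

/* Encode a compiler version the way the quoted check reads it: 8.3.0 -> 80300. */
static int gcc_version_code(int major, int minor, int patch)
{
	assert(major >= 0);
	assert(minor >= 0 && minor < 100);
	assert(patch >= 0 && patch < 100);
	return major * 10000 + minor * 100 + patch;
}

/*
 * Hypothetical gate in the same shape as "!CC_IS_GCC || GCC_VERSION >= N":
 * non-gcc compilers pass, gcc needs at least 8.1.0.
 */
static int stackprotector_available(int is_gcc, int version_code)
{
	return !is_gcc || version_code >= 80100;
}
```

With this shape an old compiler still builds the kernel; it just does
not get the feature.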

And we might as well also do the semi-yearly compiler version review.
We raised the minimum to 4.9 almost four years ago, and then the jump
to 5.1 was first for arm64 due to a serious gcc code generation bug
and then globally in Sept 2021.

So it's probably time to think about that anyway.

That said, we don't actually have all that many gcc version checks
around any more, so I think the jump to 5.1 got rid of the worst of
the horrors. Most of the GCC_VERSION checks are either in gcc-plugins
(which we should just remove, imnsho - not the version checks, the
plugins entirely), or for various random minor details (warning
enablement and the asm goto workaround).

So there doesn't seem to be a major reason to up the versioning, since
the stack protector thing can just be disabled for older versions.

But maybe even enterprise distros have upgraded anyway, and we should
be proactive.

Cc'ing Arnd, who has historically been one of the people pushing this.
He may no longer care because we haven't had huge issues.

               Linus

^ permalink raw reply	[relevance 94%]

* Re: [WIP 0/3] Memory model and atomic API in Rust
  @ 2024-03-23  0:36 84%       ` Linus Torvalds
    0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-03-23  0:36 UTC (permalink / raw)
  To: Kent Overstreet
  Cc: Boqun Feng, rust-for-linux, linux-kernel, linux-arch, llvm,
	Miguel Ojeda, Alex Gaynor, Wedson Almeida Filho, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Alan Stern, Andrea Parri, Will Deacon, Peter Zijlstra,
	Nicholas Piggin, David Howells, Jade Alglave, Luc Maranget,
	Paul E. McKenney, Akira Yokosawa, Daniel Lustig, Joel Fernandes,
	Nathan Chancellor, Nick Desaulniers, kent.overstreet,
	Greg Kroah-Hartman, elver, Mark Rutland, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
	Catalin Marinas, linux-arm-kernel, linux-fsdevel

On Fri, 22 Mar 2024 at 17:21, Kent Overstreet <kent.overstreet@linux.dev> wrote:
>
> Besides that there's cross arch support to think about - it's hard to
> imagine us ever ditching our own atomics.

Well, that's one of the advantages of using compiler builtins -
projects that do want cross-architecture support, but that aren't
actually maintaining their _own_ architecture support.

So I very much see the lure of compiler support for that kind of
situation - to write portable code without having to know or care
about architecture details.

This is one reason I think the kernel is kind of odd and special -
because in the kernel, we obviously very fundamentally have to care
about the architecture details _anyway_, so then having the
architecture also define things like atomics is just a pretty small
(and relatively straightforward) detail.

The same argument goes for compiler builtins vs inline asm. In the
kernel, we have to have people who are intimately familiar with the
architecture _anyway_, so inline asms and architecture-specific header
files aren't some big pain-point: they'd be needed _anyway_.

But in some random user level program, where all you want is an
efficient way to do "find first bit"? Then using a compiler intrinsic
makes a lot more sense.
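
[ For the user-level case, a sketch of "find first bit" using a compiler
  intrinsic. __builtin_ctz() is the GCC/Clang builtin; the wrapper name
  and the fallback loop are illustrative assumptions. ]

```c
#include <assert.h>

/*
 * Index (0-based) of the lowest set bit.
 * x must be non-zero: the result of __builtin_ctz(0) is undefined.
 */
static int first_set_bit(unsigned int x)
{
	assert(x != 0);
#if defined(__GNUC__)
	return __builtin_ctz(x);	/* usually a single instruction */
#else
	int i = 0;

	while (!(x & 1u)) {		/* portable fallback */
		x >>= 1;
		i++;
	}
	return i;
#endif
}
```

The caller never has to know which instruction the target uses, which
is the appeal for portable user-level code.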

> I was thinking about something more incremental - just an optional mode
> where our atomics were C atomics underneath. It'd probably give the
> compiler people a much more effective way to test their stuff than
> anything they have now.

I suspect it might be painful, and some compiler people would throw
their hands up in horror, because the C++ atomics model is based
fairly solidly on atomic types, and the kernel memory model is much
more fluid.

Boqun already mentioned the "mixing access sizes", which is actually
quite fundamental in the kernel, where we play lots of games with that
(typically around locking, where you find patterns like unlock writing
a zero to a single byte, even though the whole lock data structure is
a word). And sometimes the access size games are very explicit (eg
lib/lockref.c).
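
[ A sketch of the mixed-size pattern described above: the lock is a
  32-bit word, but unlock only stores a zero to one byte of it. The
  layout and names are hypothetical, and it uses the GCC/Clang
  __atomic_store_n() builtin rather than any kernel primitive. ]

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical lock word: one object, viewed at two access sizes. */
union lock_model {
	uint32_t val;			/* whole word, e.g. for a cmpxchg */
	struct {
		uint8_t  locked;	/* owner byte */
		uint8_t  pending;
		uint16_t tail;
	} f;
};

/* Unlock touches only the owner byte and leaves the rest of the word alone. */
static void lock_model_unlock(union lock_model *l)
{
	assert(l);
	__atomic_store_n(&l->f.locked, 0, __ATOMIC_RELEASE);
}
```

A type-based atomics model has no natural way to say "this object is
sometimes accessed as a word and sometimes as a byte".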

But it actually goes deeper than that. While we do have "atomic_t" etc
for arithmetic atomics, and that probably would map fairly well to C++
atomics, in other cases we simply base our atomics not on _types_, but
on code.

IOW, we do things like "cmpxchg()", and the target of that atomic
access is just a regular data structure field.
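
[ A user-space sketch of a code-based atomic: the field is a plain int,
  and only the access is atomic. cmpxchg_int() is a hypothetical
  wrapper around the GCC/Clang __atomic_compare_exchange_n() builtin,
  not the kernel's cmpxchg() implementation. ]

```c
#include <assert.h>
#include <stdbool.h>

/* A regular data structure: 'state' is an ordinary int, not an atomic type. */
struct conn_model {
	int state;
	int other;
};

/*
 * Returns the value that was found in *ptr. The store of new_val
 * happened if and only if the returned value equals 'old'.
 */
static int cmpxchg_int(int *ptr, int old, int new_val)
{
	assert(ptr);
	/* On failure the builtin writes the current value back into 'old'. */
	__atomic_compare_exchange_n(ptr, &old, new_val, false,
				    __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
	return old;
}
```

The same field can still be read and written with plain accesses
elsewhere, which is exactly what a type-based model does not want.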

It's kind of like our "volatile" usage. If you read the C (and C++)
standards, you'll find that you should use "volatile" on data types.
That's almost *never* what the kernel does. The kernel uses "volatile"
in _code_ (ie READ_ONCE() etc), and uses it by casting etc.
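
[ A simplified sketch of "volatile in code, not on the type". The macro
  names are made up for illustration, and the real READ_ONCE() has more
  type checking than this; __typeof__ is a GCC/Clang extension. ]

```c
#include <assert.h>

/* The cast makes this one access volatile; the object itself is not. */
#define READ_ONCE_SKETCH(x)	  (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE_SKETCH(x, val) (*(volatile __typeof__(x) *)&(x) = (val))

struct counter_model {
	int value;		/* note: no volatile on the type */
};

static int snapshot_twice_sum(struct counter_model *c)
{
	int snap;

	assert(c);
	snap = READ_ONCE_SKETCH(c->value);	/* exactly one load */
	return snap + snap;	/* reuses the local, no second load */
}
```

Other code can keep using c->value as a normal field, with normal
optimizations, wherever a stale or repeated read does not matter.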

Compiler people don't tend to really like those kinds of things.

            Linus

^ permalink raw reply	[relevance 84%]

* Re: [WIP 0/3] Memory model and atomic API in Rust
  @ 2024-03-23  0:12 88%   ` Linus Torvalds
    0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-03-23  0:12 UTC (permalink / raw)
  To: Kent Overstreet
  Cc: Boqun Feng, rust-for-linux, linux-kernel, linux-arch, llvm,
	Miguel Ojeda, Alex Gaynor, Wedson Almeida Filho, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Alan Stern, Andrea Parri, Will Deacon, Peter Zijlstra,
	Nicholas Piggin, David Howells, Jade Alglave, Luc Maranget,
	Paul E. McKenney, Akira Yokosawa, Daniel Lustig, Joel Fernandes,
	Nathan Chancellor, Nick Desaulniers, kent.overstreet,
	Greg Kroah-Hartman, elver, Mark Rutland, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
	Catalin Marinas, linux-arm-kernel, linux-fsdevel

On Fri, 22 Mar 2024 at 16:57, Kent Overstreet <kent.overstreet@linux.dev> wrote:
>
> I wonder about that. The disadvantage of only supporting LKMM atomics is
> that we'll be incompatible with third party code, and we don't want to
> be rolling all of our own data structures forever.

Honestly, having seen the shit-show that is language standards bodies
and incomplete compiler support, I do not understand why people think
that we wouldn't want to roll our own.

The C++ memory model may be reliable in another decade. And then a
decade after *that*, we can drop support for the pre-reliable
compilers.

People who think that compilers do things right just because they are
automated simply don't know what they are talking about.

It was just a couple of days ago that I was pointed at

    https://github.com/llvm/llvm-project/issues/64188

which is literally the compiler completely missing a C++ memory barrier.

And when the compiler itself is fundamentally buggy, you're kind of
screwed. When you roll your own, you can work around the bugs in
compilers.

And this is all doubly true when it is something that the kernel does,
and very few other projects do. For example, we're often better off
using inline asm over dubious builtins that have "native" compiler
support for them, but little actual real coverage. It really is often
a "ok, this builtin has actually been used for a decade, so it's
hopefully stable now".

We have years of examples of builtins either being completely broken
(as in "immediate crash" broken), or simply generating crap code that
is actively worse than using the inline asm.

The memory ordering isn't going to be at all different. Moving it into
the compiler doesn't solve problems. It creates them.

                 Linus

^ permalink raw reply	[relevance 88%]

* Re: [GIT PULL] Hyper-V commits for 6.9
  @ 2024-03-22 23:42 99%     ` Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-03-22 23:42 UTC (permalink / raw)
  To: Wei Liu; +Cc: Linux Kernel List, Linux on Hyper-V List, kys, haiyangz, decui

On Fri, 22 Mar 2024 at 16:25, Wei Liu <wei.liu@kernel.org> wrote:
>
> Hmm... I thought I refreshed it right before the expiration date. I
> pushed it to Ubuntu's keyserver.

Ok, I can find it there.

> I will check if something's wrong.
>
> Do you have a keyserver that you prefer?

The problem with keyservers is that there's so many of them, and
everybody uses different keyservers, and the propagation of pgp keys
across keyservers hasn't really worked for over a decade by now.

Maybe keys eventually propagate, but I have my doubts.

My default keyserver appears to be hkps://keys.openpgp.org, but the
pgp key git tree on kernel.org is the one I then look at when some key
isn't there (or is there, but hasn't been updated).

            Linus

^ permalink raw reply	[relevance 99%]

* Re: [GIT PULL] SCSI postmerge updates for the 6.8+ merge window
  @ 2024-03-22 20:34 99%     ` Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-03-22 20:34 UTC (permalink / raw)
  To: James Bottomley; +Cc: Andrew Morton, linux-scsi, linux-kernel

On Fri, 22 Mar 2024 at 13:24, James Bottomley
<James.Bottomley@hansenpartnership.com> wrote:
>
> OK, try this (I've updated the scsi-misc tag with it as well)

Well there we go. I really had no idea what the pull was supposed to do.

And while I end up looking at individual commits for random smaller
subsystems when it's unclear (sometimes just for language barrier
issues), for long-time maintainers of bigger stuff I kind of expect
better.

           Linus

^ permalink raw reply	[relevance 99%]

* Re: [GIT PULL] SCSI postmerge updates for the 6.8+ merge window
  @ 2024-03-22 19:55 99% ` Linus Torvalds
    0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-03-22 19:55 UTC (permalink / raw)
  To: James Bottomley; +Cc: Andrew Morton, linux-scsi, linux-kernel

On Fri, 22 Mar 2024 at 12:12, James Bottomley
<James.Bottomley@hansenpartnership.com> wrote:
>
> Eleven patches that are based on the rw_hint branch of the vfs tree
> which contained the base block and fs changes needed to support this.
> 8 patches are in the debug driver and 3 in the core.

Please people - the number of patches involved is entirely immaterial.

I want my merge messages to say what those patches *do*.

This whole "how many patches" thing is a disease. It's not even
remotely interesting. I see the size of the patch in the diffstat, and
that actually has some meaning in the sense of "how much does this
pull actually change", whether it's in one patch or a hundred.

I have absolutely *zero* idea what the above pull request actually
asks me to pull.

So I won't.

                Linus

^ permalink raw reply	[relevance 99%]

* Re: [GIT PULL] Char/Misc driver changes for 6.9-rc1
  @ 2024-03-21 20:28 99%       ` Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-03-21 20:28 UTC (permalink / raw)
  To: Nathan Chancellor
  Cc: Greg KH, Andrew Morton, Arnd Bergmann, linux-kernel, llvm

On Thu, 21 Mar 2024 at 11:30, Nathan Chancellor <nathan@kernel.org> wrote:
>
> Since GCC does not appear to emit warnings for newer C features that it
> allows even with older 'gnu' standard values by default (I think it does
> with '-pedantic'?), perhaps we should just disable -Wc23-extensions
> altogether? Not sure how big of a hammer this is, I think this type of
> warning is the only thing I have seen come from -Wc23-extensions...

It looks like adding -Wno-c23-extensions would only work with more
recent clang versions, so it wouldn't actually fix the build problems,
just make them even harder for developers to actually notice.

Oh well. It's not like this is all that common a problem, so I think
we'll just have to live with it, and hope that people don't do that
"label at end of statement" very often.

(I think it's case statements too, not just labels, too lazy to look
up the details again)
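
[ A sketch of the form that avoids the problem without any compiler
  flag, assuming the construct in question is a label as the last thing
  in a block: follow the label with a null statement. The function is
  an illustrative example, not code from the pull request. ]

```c
#include <assert.h>

/* Sum the non-negative entries of v[0..n-1]. */
static int sum_non_negative(const int *v, int n)
{
	int total = 0;

	assert(v || n == 0);
	for (int i = 0; i < n; i++) {
		if (v[i] < 0)
			goto next;
		total += v[i];
next:
		;	/* null statement: the label now has a statement after it */
	}
	return total;
}
```

Dropping the lone ';' gives the "label at end of compound statement"
form that gcc accepts silently.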

                 Linus

^ permalink raw reply	[relevance 99%]

* Re: [GIT PULL] Char/Misc driver changes for 6.9-rc1
  2024-03-21 18:10 99%   ` Linus Torvalds
@ 2024-03-21 18:12 99%     ` Linus Torvalds
    1 sibling, 0 replies; 200+ results
From: Linus Torvalds @ 2024-03-21 18:12 UTC (permalink / raw)
  To: Nathan Chancellor
  Cc: Greg KH, Andrew Morton, Arnd Bergmann, linux-kernel, llvm

On Thu, 21 Mar 2024 at 11:10, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> So the "labels without a statement" thing is not only a long-time gcc
> behavior (admittedly due to a parsing bug), afaik it's becoming
> "standard C" in C23.

Actually, let me take that back. I think it's only a proposal (WG14
N2508), I have no idea if it's actually going to be standard.

            Linus

^ permalink raw reply	[relevance 99%]

* Re: [GIT PULL] Char/Misc driver changes for 6.9-rc1
  @ 2024-03-21 18:10 99%   ` Linus Torvalds
  2024-03-21 18:12 99%     ` Linus Torvalds
    0 siblings, 2 replies; 200+ results
From: Linus Torvalds @ 2024-03-21 18:10 UTC (permalink / raw)
  To: Nathan Chancellor
  Cc: Greg KH, Andrew Morton, Arnd Bergmann, linux-kernel, llvm

On Thu, 21 Mar 2024 at 06:48, Nathan Chancellor <nathan@kernel.org> wrote:
>
> That build warning actually happens with clang, not GCC as far as I am
> aware, and it is actually a hard build error with older versions of
> clang

So the "labels without a statement" thing is not only a long-time gcc
behavior (admittedly due to a parsing bug), afaik it's becoming
"standard C" in C23.

Does clang have a flag to allow this?

Considering that gcc doesn't warn for it, and that it will become
official at some point anyway, I think this might be a thing that we
might be better off just accepting, rather than be in the situation
where people write code that compiles fine with gcc and don't notice
that clang will error out.

So yes, clang is being correct, but in this case it only causes problems.

              Linus

^ permalink raw reply	[relevance 99%]

* Re: [GIT PULL] remoteproc updates for v6.9
  @ 2024-03-21 18:05 99%   ` Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-03-21 18:05 UTC (permalink / raw)
  To: Bjorn Andersson
  Cc: linux-remoteproc, linux-kernel, Andrew Davis, Neil Armstrong,
	Arnaud Pouliquen, Krzysztof Kozlowski, Sibi Sankar, Abel Vesa,
	Dmitry Baryshkov, Joakim Zhang, Mathieu Poirier

On Thu, 21 Mar 2024 at 11:03, Bjorn Andersson <andersson@kernel.org> wrote:
>
> I was further notified that this conflicts with your tree, Linus. Below
> is the resolution for this conflict.

Heh. This email came in after the pr-tracker-bot email notifying you
that it's already done..

I think I got it all right, it didn't seem at all controversial, but
maybe you should double-check.

          Linus

^ permalink raw reply	[relevance 99%]

* Re: [GIT PULL] Hyper-V commits for 6.9
  @ 2024-03-21 17:06 99% ` Linus Torvalds
    0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-03-21 17:06 UTC (permalink / raw)
  To: Wei Liu; +Cc: Linux Kernel List, Linux on Hyper-V List, kys, haiyangz, decui

On Wed, 20 Mar 2024 at 21:09, Wei Liu <wei.liu@kernel.org> wrote:
>
>   ssh://git@gitolite.kernel.org/pub/scm/linux/kernel/git/hyperv/linux.git tags/hyperv-next-signed-20240320

Pulled, but...

Your pgp key expired two weeks ago. Please extend the expiration date
(and not something small!) and make sure to refresh the kernel.org
repo and/or other keyservers.

                 Linus

^ permalink raw reply	[relevance 99%]

* Re: [GIT PULL] tracing/tools: Updates for v6.9
  @ 2024-03-20 23:40 99% ` Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-03-20 23:40 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: LKML, Masami Hiramatsu, Mathieu Desnoyers, Daniel Bristot de Oliveira

On Wed, 20 Mar 2024 at 08:19, Steven Rostedt <rostedt@goodmis.org> wrote:
>
> - Update makefiles for latency-collector and RTLA, using tools/build/
>   makefiles like perf does, inheriting its benefits.

Lovely. Now it all worked for me, and gave me the legible

  Auto-detecting system features:
  ...                           libtraceevent: [ on  ]
  ...                              libtracefs: [ OFF ]

  libtracefs is missing. Please install libtracefs-dev/libtracefs-devel
  Makefile.config:29: *** Please, check the errors above..  Stop.

and after installing libtracefs-devel as suggested it finished cleanly.

Let's see if it works for others too.

             Linus

^ permalink raw reply	[relevance 99%]

* Re: [PATCH v2 1/3] mm: kmsan: implement kmsan_memmove()
  @ 2024-03-20 16:04 99% ` Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-03-20 16:04 UTC (permalink / raw)
  To: Alexander Potapenko
  Cc: akpm, linux-kernel, linux-mm, kasan-dev, tglx, x86, Tetsuo Handa,
	Dmitry Vyukov, Marco Elver

On Wed, 20 Mar 2024 at 03:18, Alexander Potapenko <glider@google.com> wrote:
>
> Provide a hook that can be used by custom memcpy implementations to tell
> KMSAN that the metadata needs to be copied. Without that, false positive
> reports are possible in the cases where KMSAN fails to intercept memory
> initialization.

Thanks, the series looks fine to me now with the updated 3/3.

I assume it will go through Andrew's -mm tree?

               Linus

^ permalink raw reply	[relevance 99%]

* Re: [GIT PULL v2] tracing: Updates for v6.9
  @ 2024-03-19 21:22 99%         ` Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-03-19 21:22 UTC (permalink / raw)
  To: Nathan Chancellor
  Cc: Steven Rostedt, LKML, Masami Hiramatsu, Mathieu Desnoyers,
	Alison Schofield, Beau Belgrave, Huang Yiwei, John Garry,
	Randy Dunlap, Thorsten Blum, Vincent Donnefort, linke li, llvm

On Tue, 19 Mar 2024 at 14:03, Nathan Chancellor <nathan@kernel.org> wrote:
>
> For what it's worth, I applied that change and built ARCH=x86_64
> defconfig with LLVM 18.1.1 from [1] but it does not appear to help the
> instances of -Wstring-compare; in fact, it adds some additional warnings
> that I have not seen before. I have attached the full build log.

Hmm. I'm no longer seeing any problems with commit 24f5bb9f24ad
("tracing: Just use strcmp() for testing __string() and __assign_str()
match").

But that's clang 17.0.6.

The patch that Steven sent out (and that I applied) is a bit different
from his "I'll change it to this" email, though. A couple of casts and
parentheses different.

So maybe current -git works for you?

              Linus

^ permalink raw reply	[relevance 99%]

* Re: [GIT PULL] more s390 updates for 6.9 merge window
  @ 2024-03-19 18:54 97% ` Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-03-19 18:54 UTC (permalink / raw)
  To: Heiko Carstens; +Cc: Vasily Gorbik, Alexander Gordeev, linux-s390, linux-kernel

On Tue, 19 Mar 2024 at 07:12, Heiko Carstens <hca@linux.ibm.com> wrote:
>
> - Add new bitwise types and helper functions and use them in s390 specific
>   drivers and code to make it easier to find virtual vs physical address
>   usage bugs.

Hmm. Because you still want to be able to do arithmetic on them, this
is really what "__nocast" should be used for rather than "__bitwise".

__bitwise was intended (as the name implies) for things that can only
be mixed bitwise with similar types. It was _mainly_ for big-endian vs
little-endian marking, where it's actually perfectly fine to do
bitwise operations on two big-endian values without ever translating
them to "cpu endianness", but you can't for example do normal
arithmetic on them.

So __bitwise has those very specific rules that seem odd until you
realize what the reasons for them are.

In contrast, your types actually *would* be fine with arithmetic and
logical operations being done on them, and that is what "__nocast"
really was meant to be.

But we basically never had much use for __nocast in the kernel, and
largely as a result __nocast was never fleshed out to work very well
(and it gets lost *much* too easily), so __bitwise it is.

Oh well.

It looks like it's not a lot of arithmetic you want to allow anyway,
so I guess the fact that __bitwise forces you to do some silly helper
functions for that isn't too much of an issue.
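
[ A sketch of what such helpers tend to look like. MODEL_BITWISE and
  MODEL_FORCE stand in for the kernel's __bitwise and __force, which
  expand to sparse attributes under __CHECKER__ and to nothing
  otherwise; the type and helper names are hypothetical and are not the
  s390 ones. ]

```c
#include <assert.h>

#ifdef __CHECKER__			/* defined when sparse is running */
# define MODEL_BITWISE	__attribute__((bitwise))
# define MODEL_FORCE	__attribute__((force))
#else
# define MODEL_BITWISE
# define MODEL_FORCE
#endif

/* Two distinct address types that sparse would refuse to mix. */
typedef unsigned long MODEL_BITWISE vaddr_model_t;
typedef unsigned long MODEL_BITWISE paddr_model_t;

static vaddr_model_t vaddr_from_ulong(unsigned long raw)
{
	return (MODEL_FORCE vaddr_model_t)raw;
}

static unsigned long vaddr_to_ulong(vaddr_model_t v)
{
	return (MODEL_FORCE unsigned long)v;
}

/* Arithmetic has to go through a helper that casts out and back in. */
static vaddr_model_t vaddr_add(vaddr_model_t v, unsigned long off)
{
	unsigned long raw = vaddr_to_ulong(v);

	assert(raw + off >= raw);	/* no wrap-around */
	return vaddr_from_ulong(raw + off);
}
```

Every arithmetic operation needs a wrapper of this kind, which is the
"silly helper functions" cost mentioned above.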

              Linus

^ permalink raw reply	[relevance 97%]

* Re: [GIT PULL] virtio: features, fixes
  @ 2024-03-19 18:03 98% ` Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-03-19 18:03 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: kvm, virtualization, netdev, linux-kernel, alex.williamson,
	andrew, david, dtatulea, eperezma, feliu, gregkh, jasowang,
	jean-philippe, jonah.palmer, leiyang, lingshan.zhu,
	maxime.coquelin, ricardo, shannon.nelson, stable, steven.sistare,
	suzuki.poulose, xuanzhuo, yishaih

On Tue, 19 Mar 2024 at 00:41, Michael S. Tsirkin <mst@redhat.com> wrote:
>
> virtio: features, fixes
>
> Per vq sizes in vdpa.
> Info query for block devices support in vdpa.
> DMA sync callbacks in vduse.
>
> Fixes, cleanups.

Grr. I thought the merge message was a bit too terse, but I let it slide.

But only after pushing it out do I notice that not only was the pull
request message overly terse, you had also rebased this all just
moments before sending the pull request and didn't even give a hint of
a reason for that.

So I missed that, and the merge is out now, but this was NOT OK.

Yes, rebasing happens. But last-minute rebasing needs to be explained,
not some kind of nasty surprise after-the-fact.

And that pull request explanation was really borderline even *without*
that issue.

                Linus

^ permalink raw reply	[relevance 98%]

* Re: [PATCH v1 3/3] x86: call instrumentation hooks from copy_mc.c
  @ 2024-03-19 17:58 99%   ` Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-03-19 17:58 UTC (permalink / raw)
  To: Alexander Potapenko
  Cc: akpm, linux-kernel, linux-mm, kasan-dev, tglx, x86,
	Dmitry Vyukov, Marco Elver, Tetsuo Handa

On Tue, 19 Mar 2024 at 09:37, Alexander Potapenko <glider@google.com> wrote:
>
>         if (copy_mc_fragile_enabled) {
>                 __uaccess_begin();
> +               instrument_copy_to_user(dst, src, len);
>                 ret = copy_mc_fragile((__force void *)dst, src, len);
>                 __uaccess_end();

I'd actually prefer that instrument_copy_to_user() to be *outside* the
__uaccess_begin.

In fact, I'm a bit surprised that objtool didn't complain about it in that form.

__uaccess_begin() causes the CPU to accept kernel accesses to user
mode, and I don't think instrument_copy_to_user() has any business
actually touching user mode memory.

In fact it might be better to rename the function and change the prototype to

   instrument_src(src, len);

because you really can't sanely instrument the destination of a user
copy, but "instrument_src()" might be useful in other situations than
just user copies.
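
[ A user-space model of the ordering being asked for: instrument the
  source first, then open the user-access window only around the copy
  itself. All functions here are stand-ins with made-up names; nothing
  below is the kernel's uaccess or instrumentation code. ]

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

static int uaccess_open;		/* models the "user access allowed" window */
static int instrumented_inside_window;	/* counts hooks run inside the window */

static void uaccess_begin_model(void) { uaccess_open = 1; }
static void uaccess_end_model(void)   { uaccess_open = 0; }

/* Stand-in for an instrument_src(src, len) style hook. */
static void instrument_src_model(const void *src, size_t len)
{
	(void)src;
	(void)len;
	if (uaccess_open)
		instrumented_inside_window++;
}

/* Returns the number of bytes not copied (always 0 in this model). */
static size_t copy_model(void *dst, const void *src, size_t len)
{
	assert(dst && src);
	instrument_src_model(src, len);	/* before the window opens */
	uaccess_begin_model();
	memcpy(dst, src, len);		/* the only thing inside the window */
	uaccess_end_model();
	return 0;
}
```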

Hmm?

               Linus

^ permalink raw reply	[relevance 99%]

* Re: [PATCH v1 2/3] instrumented.h: add instrument_memcpy_before, instrument_memcpy_after
  @ 2024-03-19 17:52 99%   ` Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-03-19 17:52 UTC (permalink / raw)
  To: Alexander Potapenko
  Cc: akpm, linux-kernel, linux-mm, kasan-dev, tglx, x86,
	Dmitry Vyukov, Marco Elver, Tetsuo Handa

On Tue, 19 Mar 2024 at 09:37, Alexander Potapenko <glider@google.com> wrote:
>
> +/**
> + * instrument_memcpy_after - add instrumentation before non-instrumented memcpy

Spot the cut-and-paste.

             Linus

^ permalink raw reply	[relevance 99%]

* Re: [GIT PULL v2] tracing: Updates for v6.9
  @ 2024-03-19 16:23 96% ` Linus Torvalds
    0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-03-19 16:23 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: LKML, Masami Hiramatsu, Mathieu Desnoyers, Alison Schofield,
	Beau Belgrave, Huang Yiwei, John Garry, Randy Dunlap,
	Thorsten Blum, Vincent Donnefort, linke li

On Mon, 18 Mar 2024 at 08:28, Steven Rostedt <rostedt@goodmis.org> wrote:
>
> - Added checks to make sure that the source of __string() is also the
>   source of __assign_str() so that it can be safely removed in the next
>   merge window.

Aargh.

I didn't notice this initially, because it doesn't happen with gcc (or
maybe not with allmodconfig), but with clang I get

    CC [M]  net/sunrpc/sched.o
  In file included from net/sunrpc/sched.c:31:
  In file included from ./include/trace/events/sunrpc.h:2524:
  In file included from ./include/trace/define_trace.h:102:
  In file included from ./include/trace/trace_events.h:419:
  include/trace/events/sunrpc.h:707:4: error: result of comparison
against a string literal is unspecified (use an explicit string
comparison function instead) [-Werror,-Wstring-compare]

and then about 250 lines of messy "explanations" for how it was
expanded because it happens on line 709 too in the same macro, and it
ends up being three macros deep or something.

So no, this all needs to be re-done. That

                WARN_ON_ONCE(__builtin_constant_p(src) ?                \
                             strcmp((src), __data_offsets.dst##_ptr_) : \
                             (src) != __data_offsets.dst##_ptr_);       \

does *NOT* work.

Also, looking at that __assign_str() macro, it seems literally insane.
On the next line it will do

                memcpy(__str__, __data_offsets.dst##_ptr_ ? :           \
                       EVENT_NULL_STR, __len__);                        \

so now it checks "__data_offsets.dst##_ptr_" for NULL - but that's one
line after it just did that strcmp on it.
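
[ A sketch of a comparison that avoids both problems: it never compares
  a pointer against a string literal for identity, and it checks for
  NULL before calling strcmp(). same_string() is a hypothetical helper,
  not the fix that eventually went into the tracing macros. ]

```c
#include <stdbool.h>
#include <string.h>

/* True if both are NULL, or both point at equal strings. */
static bool same_string(const char *a, const char *b)
{
	if (!a || !b)
		return a == b;		/* strcmp() on NULL is undefined */
	return strcmp(a, b) == 0;	/* compare contents, not addresses */
}
```

Whether two identical string literals share an address is up to the
compiler, which is why the pointer comparison draws -Wstring-compare.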

WTF?

This code is completely bogus.

               Linus

^ permalink raw reply	[relevance 96%]

* Re: [GIT PULL v2] dlm fixes for 6.9
  @ 2024-03-18 22:44 99% ` Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-03-18 22:44 UTC (permalink / raw)
  To: David Teigland; +Cc: linux-kernel, gfs2

On Mon, 18 Mar 2024 at 14:25, David Teigland <teigland@redhat.com> wrote:
>
> I dropped the commit with the bad atomic usage, and replaced it with two
> other commits: the first reverts the unnecessary change that began using
> atomic_t for lkb_wait_count, and the second adds comments to the recovery
> code that forcibly resets the wait_count state.

Ok, the diff certainly looks saner. I didn't look at the code outside
the context of the diff, so that's literally just going by the patches
themselves, but I appreciate the comment ("The wait_count will almost
always be 1, but in case of an overlapping unlock/cancel it could be
2: see ..") and yes, it just makes the old atomic thing sound even
odder.

Thanks,

            Linus

^ permalink raw reply	[relevance 99%]

* Re: [GIT PULL] vfs fixes
  2024-03-18 19:14 92% ` Linus Torvalds
@ 2024-03-18 19:41 99%   ` Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-03-18 19:41 UTC (permalink / raw)
  To: Christian Brauner; +Cc: linux-fsdevel, linux-kernel

On Mon, 18 Mar 2024 at 12:14, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> IOW, isn't the 'get()' always basically paired with the mounting? And
> the 'put()' would probably be best done in kill_block_super()?

.. or alternative handwavy approach:

 The fundamental _reason_ for the ->get/put seems to be to make the
'holder' lifetime be at least as long as the 'struct file' it is
associated with. No?

So how about we just make the 'holder' always *be* a 'struct file *'? That

 (a) gets rid of the typeless 'void *' part

 (b) is already what it is for normal cases (ie O_EXCL file opens).

wouldn't it be lovely if we just made the rule be that 'holder' *is*
the file pointer, and got rid of a lot of typeless WTF code?

Again, this comment (and the previous email) is more based on "this
does not feel right to me" than anything else.

That code just makes my skin itch. I can't say it's _wrong_, but it
just FeelsWrongToMe(tm).

          Linus

^ permalink raw reply	[relevance 99%]

* Re: [GIT PULL] vfs fixes
  @ 2024-03-18 19:14 92% ` Linus Torvalds
  2024-03-18 19:41 99%   ` Linus Torvalds
  0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-03-18 19:14 UTC (permalink / raw)
  To: Christian Brauner; +Cc: linux-fsdevel, linux-kernel

On Mon, 18 Mar 2024 at 05:20, Christian Brauner <brauner@kernel.org> wrote:
>
> * Take a passive reference on the superblock when opening a block device
>   so the holder is available to concurrent callers from the block layer.

So I've pulled this, but I have to admit that I hate it.

The bdev "holder" logic is an abomination. And "struct blk_holder_ops"
is horrendous.

Afaik, we have exactly two cases of "struct blk_holder_ops" in the
whole kernel, and you edited one of them.

And the other one is in bcachefs, and is a completely empty one with
no actual ops, so I think that one shouldn't exist.

In other words, we have only *one* actual set of "holder ops".  That
makes me suspicious in the first place.

Now, let's then look at that new "holder->put_holder" use. It has
_one_ single user too, which is bd_end_claim(), which is called from
one place, which is bdev_release(). Which in turn is called from
exactly one place, which is blkdev_release(). Which is the release
function for def_blk_fops. Which is called from __fput() on the last
release of the file.

Fine, fine, fine. So let's chase down *who* actually uses that single
"blk_holder_ops". And it turns out that it's used in three places:
fs/super.c, fs/ext4/super.c, and fs/xfs/xfs_super.c.

So in those three cases, it would be absolutely *wrong* if the
'holder' was anything but the super-block (because that's what the new
get/put functions require for any of this to work).

This all smells horribly bad to me. The code looks and acts like it is
some generic interface, but in reality it really isn't. Yes, bcachefs
seems to make up some random holder (it's a one-byte kmalloc that
isn't actually used), and a random holder op structure (it's empty, as
mentioned), but none of this makes any sense at all.

I get the feeling that the "get/put" operations should just be done in
the three places that currently use that 'fs_holder_ops'.

IOW, isn't the 'get()' always basically paired with the mounting? And
the 'put()' would probably be best done in kill_block_super()?

I don't know. Maybe I missed something really important, but this
smells like a specific case that simply shouldn't have gotten this
kind of "generic infrastructure" solution.

               Linus

^ permalink raw reply	[relevance 92%]

* Re: [GIT PULL] tracing: Updates for v6.9
  @ 2024-03-16 20:42 88%           ` Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-03-16 20:42 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Steven Rostedt, LKML, Masami Hiramatsu, Mathieu Desnoyers,
	Beau Belgrave, Chengming Zhou, Huang Yiwei, John Garry,
	Randy Dunlap, Thorsten Blum, Vincent Donnefort, linke li,
	Daniel Bristot de Oliveira, x86-ml

On Sat, 16 Mar 2024 at 13:00, Borislav Petkov <bp@alien8.de> wrote:
>
> On Sat, Mar 16, 2024 at 11:42:42AM -0700, Linus Torvalds wrote:
> > Now, I'm not suggesting anything like the multiple topic branches from
> > -tip (from a quick check, there's been a total of 25 tip/tip topic
> > branches merged just this merge window), but for clear new features
> > definitely.
>
> So some of those branches are really tiny (1-2 patches) during some
> cycles so I have often wondered whether I should merge those small
> branches into a single pull...
>
> So as not to have too many tiny pull requests.
>
> Any preference?

Not really any strong preferences.

The really tiny ones are so easy to pull that pulling a few random
ones just isn't an issue.

I've been known to occasionally end up doing an octopus merge if I
decide that I might as well just merge multiple small branches in one
go, but honestly, I stopped doing that because it's just simpler to do
two really trivial merges than to even bother thinking about "should I
just merge these all together".

So I don't mind getting three or more random small pulls if they all
still make sense (ie they are clearly separate things).

Now, if you send me three separate pulls for basically the same
conceptual thing, that might annoy me just because it would be so
pointless.

But if it's a "one pull to fix a single-line issue in resource
control, and another pull to fix a single-line issue in objtool", then
those make perfect sense to keep separate, even if they are both
trivial and small.

And on the other hand, if you have a couple of trivial branches with
no real pattern, and decide to just merge them into one that fixes
"misc x86 problems", and the end result is still completely trivial
and there are no surprises or gotchas, that's not wrong either.

And sometimes, merging and sending me just one pull request is
absolutely the right thing.

For example, the ARM SoC trees tend to just merge "umbrella" updates
into one single pull request, and I prefer that - because I see no
point in getting ten different "this is the drivers for SoC xyz"
thing.

So then it's still a clear topic branch ("ARM SoC drivers"), but they
kept multiple branches for different SoC's and sent me just one pull
request.

End result: there's no one right thing.  Make it make sense. Probably
the only real rule is

 - try to keep conceptually different things separate just for cleanliness

 - definitely keep fundamental new features or anything that _might_
be questionable in a branch of its own

but there aren't some kind of black-and-white rules for "this is so
small that it's not worth sending on its own".

This merge window, I think I currently have something like ~15 merges
that ended up being literally just a couple of lines (maybe spread
over two or three files). I don't mind at all. If that's all that
happened, that's fine.

               Linus

^ permalink raw reply	[relevance 88%]

* Re: [GIT PULL] tracing: Updates for v6.9
  @ 2024-03-16 18:42 93%       ` Linus Torvalds
    0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-03-16 18:42 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: LKML, Masami Hiramatsu, Mathieu Desnoyers, Beau Belgrave,
	Chengming Zhou, Huang Yiwei, John Garry, Randy Dunlap,
	Thorsten Blum, Vincent Donnefort, linke li,
	Daniel Bristot de Oliveira

On Sat, 16 Mar 2024 at 11:20, Steven Rostedt <rostedt@goodmis.org> wrote:
>
> 1) Rebase without them (I know how much you love rebasing)

This.

Except honestly, the pulls are getting to be so complicated for me
because I have to check them, that I'd really like you to start doing
topic branches for individual things.

That's what we ended up doing with the security layers too, because
there were too many cases of "that is broken, I can't pull it", and
then having one single branch for everything meant that it was always
a "all or nothing" thing.

The security layer issues have largely gone away, but I still pull
things individually, and I think it actually ended up working out
well. Yes, I see more pulls, but not only are they clearer for me, the
code history ends up being much clearer too.

So topic branches tend to make for more actual pull requests, but when
the individual pulls are smaller and have clear "this branch does XYZ
and nothing more", it turns out that the actual effort per pull ends
up being less, and it actually clarifies things a lot too.

In fact, the x86 -tip people ended up doing topic branches just to
make things easier to review, rather than any "I can't pull that"
issues, and I think it actually ended up being something that they
preferred to do anyway.

Now, I'm not suggesting anything like the multiple topic branches from
-tip (from a quick check, there's been a total of 25 tip/tip topic
branches merged just this merge window), but for clear new features
definitely.

And no cross-merges between those topic branches, because that defeats
the whole purpose.

Do you have to do it for the current situation where I just can't take
the mmap stuff? No. But please look at it going forward.

           Linus

^ permalink raw reply	[relevance 93%]

* Re: [GIT PULL]: Generic phy updates for v6.9
  @ 2024-03-16 18:23 99%     ` Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-03-16 18:23 UTC (permalink / raw)
  To: Vinod Koul; +Cc: LKML

On Sat, 16 Mar 2024 at 11:05, Vinod Koul <vkoul@kernel.org> wrote:
>
> On 15-03-24, 12:22, Linus Torvalds wrote:
> >
> > That is not a valid signed tag, and I can't find one in that repo.
>
> It was pushed: tags/phy-for-6.9, I erred in generating the request for
> sure

Ahh. I did do a "git ls-remote" to try to find it, but I must have
messed up searching for it.

>   git://git.kernel.org/pub/scm/linux/kernel/git/phy/linux-phy.git tags/phy-for-6.9

Thanks, now pulled.

                Linus

^ permalink raw reply	[relevance 99%]

* Re: [GIT PULL] tracing: Updates for v6.9
  2024-03-16 16:59 93%   ` Linus Torvalds
@ 2024-03-16 18:18 97%     ` Linus Torvalds
    1 sibling, 0 replies; 200+ results
From: Linus Torvalds @ 2024-03-16 18:18 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: LKML, Masami Hiramatsu, Mathieu Desnoyers, Beau Belgrave,
	Chengming Zhou, Huang Yiwei, John Garry, Randy Dunlap,
	Thorsten Blum, Vincent Donnefort, linke li,
	Daniel Bristot de Oliveira

On Sat, 16 Mar 2024 at 09:59, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
>  - I'd suggest marking it all VM_DONTCOPY | VM_IO | VM_DONTEXPAND to
> not let people play games with the mapping.

You already did set VM_DONTCOPY (and VM_DONTDUMP is a good idea too).
And you cleared VM_MAYWRITE. Those are all good.

I'd also suggest requiring the mmap to be MAP_SHARED.

With a read-only mapping, that doesn't really do all that much, but I
don't think you actually need the vm_ops at all once you do everything
at mmap() time, and then it causes a SIGBUS instead of a "insert zero
page".

And _technically_ it could tell the architecture code to try to align
the mapping to the cache aliasing boundaries.

Of course, because of how you insert the meta-page at the beginning of
the mapping, you end up with the actual page table entries not aligned
anyway, so it doesn't actually help the cache coloring, but it's still
conceptually the right thing to do. So even if it ends up mostly just
a "document the fact that these are shared with the kernel" flag, I
think it's a good idea.

               Linus

^ permalink raw reply	[relevance 97%]

* Re: [GIT PULL] tracing: Updates for v6.9
  2024-03-16 16:31 99% ` Linus Torvalds
@ 2024-03-16 16:59 93%   ` Linus Torvalds
  2024-03-16 18:18 97%     ` Linus Torvalds
    0 siblings, 2 replies; 200+ results
From: Linus Torvalds @ 2024-03-16 16:59 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: LKML, Masami Hiramatsu, Mathieu Desnoyers, Beau Belgrave,
	Chengming Zhou, Huang Yiwei, John Garry, Randy Dunlap,
	Thorsten Blum, Vincent Donnefort, linke li,
	Daniel Bristot de Oliveira

On Sat, 16 Mar 2024 at 09:31, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> So instead of merging a new feature that was mis-designed and is
> already having code working around its mis-design, I'm not merging it
> at all.

Here's a clue: when hacking up VFS code, ask for ACK's from the VFS people.

And when hacking up MM code, make damn sure that you have VM people involved.

No more of this "random code that happens to work in my tests"
garbage. Yes, I'm sure that others have done this same disgusting page
counting hack and this was copied-and-pasted from some other
disgusting source, but because of all the history, I'm now looking at
tracing pulls carefully, and I'm simply not allowing any broken hacks.

So in addition to getting actual VM people to help you with mapping
stuff (hard requirement), I would also suggest:

 - your allocation has to be live over the whole mmap (and that's due
to other fundamental issues - you're not even trying to deal with
actual dynamic allocations and thank Cthulhu for that), and the code
is literally designed that way, so then faulting pages in one at a
time and refcounting them one at a time is just pointless and wrong.
Just do it all at mmap time.

 - I'd suggest marking it all VM_DONTCOPY | VM_IO | VM_DONTEXPAND to
not let people play games with the mapping.

 - avoid all the sub-page ref-counts entirely by using VM_PFNMAP, and
use vm_insert_pages()

and a random note:

 - from a TLB pressure standpoint, it might be a good idea to try to
keep the page table entries naturally aligned, so putting that one
status page at the beginning is likely a bad idea. It will typically
mean that hardware that can silently use larger TLB entries for
aligned pages won't be able to do so.

but the effect of that is likely fairly small.

                Linus

^ permalink raw reply	[relevance 93%]

* Re: [GIT PULL] tracing: Updates for v6.9
  @ 2024-03-16 16:31 99% ` Linus Torvalds
  2024-03-16 16:59 93%   ` Linus Torvalds
  0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-03-16 16:31 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: LKML, Masami Hiramatsu, Mathieu Desnoyers, Beau Belgrave,
	Chengming Zhou, Huang Yiwei, John Garry, Randy Dunlap,
	Thorsten Blum, Vincent Donnefort, linke li,
	Daniel Bristot de Oliveira

On Fri, 15 Mar 2024 at 09:27, Steven Rostedt <rostedt@goodmis.org> wrote:
>
> - Add ring_buffer memory mappings

I pulled this, looked at it, and unpulled it again.

I don't want to have years of "fix up the mistakes after the fact".

This is all done entirely incorrectly, and just as an example of that,
subbuf_map_prepare() is another case of "tracing code works around the
fact that it did things wrong in the first place".

So instead of merging a new feature that was mis-designed and is
already having code working around its mis-design, I'm not merging it
at all.

              Linus

^ permalink raw reply	[relevance 99%]

* Re: [GIT PULL] KVM changes for Linux 6.9 merge window
  @ 2024-03-16 16:01 99%         ` Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-03-16 16:01 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Oliver Upton, Marc Zyngier, Catalin Marinas, Mark Rutland,
	Will Deacon, linux-kernel, kvm

On Sat, 16 Mar 2024 at 01:48, Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> Linus, were you compiling with allyesconfig so that you got
> CONFIG_KVM_ARM64_RES_BITS_PARANOIA on?

Regular allmodconfig.

> You can also make CONFIG_KVM_ARM64_RES_BITS_PARANOIA depend on !COMPILE_TEST.

No.

WTF is wrong with you?

You're saying "let's turn off this compile-time sanity check when
we're doing compile testing".

That's insane.

The sanity check was WRONG. People hadn't tested it. Stephen points
out that it was reported to you almost a month ago in

    https://lore.kernel.org/linux-next/20240222220349.1889c728@canb.auug.org.au/

and you're still trying to just *HIDE* this garbage?

Stop it.

                    Linus

^ permalink raw reply	[relevance 99%]

* Re: [patch 5/9] x86: Cure per CPU madness on UP
  @ 2024-03-16  1:23 99%               ` Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-03-16  1:23 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Guenter Roeck, LKML, x86, Uros Bizjak, linux-sparse, lkp,
	oe-kbuild-all, Arnd Bergmann

On Fri, 15 Mar 2024 at 18:11, Thomas Gleixner <tglx@linutronix.de> wrote:
>
> You wish. We still support 486 and some of the still produced 486 clones
> do not have a local APIC.

Ouch. I was _sure_ we had dropped i486 support too due to cmpxchg8b.

But apparently that was just a discussion, and my wishful thinking,
and we never actually followed through.

         Linus

^ permalink raw reply	[relevance 99%]

* Re: [PATCH] Revert "KVM: arm64: Snapshot all non-zero RES0/RES1 sysreg fields for later checking"
  @ 2024-03-16  0:51 99%     ` Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-03-16  0:51 UTC (permalink / raw)
  To: Oliver Upton
  Cc: Paolo Bonzini, Marc Zyngier, Catalin Marinas, Mark Rutland,
	Will Deacon, linux-kernel, kvm, kvmarm, James Morse,
	Suzuki K Poulose, Zenghui Yu

On Fri, 15 Mar 2024 at 17:25, Oliver Upton <oliver.upton@linux.dev> wrote:
>
> This reverts commits 99101dda29e3186b1356b0dc4dbb835c02c71ac9 and
> b80b701d5a67d07f4df4a21e09cb31f6bc1feeca.

Applied.  Thanks,

          Linus

^ permalink raw reply	[relevance 99%]

* Re: [patch 5/9] x86: Cure per CPU madness on UP
  @ 2024-03-15 23:23 93%           ` Linus Torvalds
    0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-03-15 23:23 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Guenter Roeck, LKML, x86, Uros Bizjak, linux-sparse, lkp, oe-kbuild-all

On Fri, 15 Mar 2024 at 15:55, Thomas Gleixner <tglx@linutronix.de> wrote:
>
> Not really. The problem is that a SMP build can run on a UP machine w/o
> APIC or command line disables the APIC and will run into the exactly
> same problem. The only case where we know that it is impossible is when
> APIC support is disabled, which is silly but topic for a different
> discussion.

Oh, I agree - that was why I said that it shouldn't depend on a local
APIC on machines that may not even have one.

That "may not even have one" can still be a static option - we
technically allow 32-bit UP kernel to not enable X86_UP_APIC, although
it might be time to drop that option.

> So the proper thing to do is to check for num_possible_cpus() == 1 in
> that function.

I think that's _one_ proper thing. I still think that the deeper
problem is that it still looks at local apic rules even when those
rules are completely nonsensical.

For example, that MAX_LOCAL_APIC range test may not matter simply
because it's testing a constant value, but it still smells entirely
wrong to even check for that, when the system doesn't necessarily have
one.

So I think your patch may fix the immediate bug, but I think it's
still just a band-aid.

Either we should just make all machines look like they have the proper
local apic mappings, or we shouldn't look at any local apic rules AT
ALL.

So I'd rather see those apic_maps[] just be properly filled in.

> Sure you can argue that we could avoid it for SMP=n builds completely,
> but I think the right thing to do is to aim for removing CONFIG_SMP and
> make the UP build a subset of a generic SMP capable build which has
> CONFIG_NR_CPUS=1, i.e. num_possible_cpus() = 1. Why?

I wouldn't be entirely opposed to just doing that. UP has become
fairly irrelevant.

That said, UP is *not* entirely irrelevant on other architectures, and
if we drop UP support on x86, we'll be effectively dropping a lot of
coverage testing. The number of people who do cross-compilers is
pretty small.

End result: I'd *much* rather get rid of X86_UP_APIC and the "nolapic"
kernel command line, and say "even UP has to have a local APIC".

We already require a Pentium-class CPU, so in practice we already
require that local APIC setup. And yes, machines existed where it
could be turned off, but I don't think that is relevant any more.

Put another way: I think "UP config for wider build testing" is a
_lot_ more relevant than "no LAPIC support".

             Linus

^ permalink raw reply	[relevance 93%]

* Re: [GIT PULL] KVM changes for Linux 6.9 merge window
  @ 2024-03-15 22:28 98% ` Linus Torvalds
      0 siblings, 2 replies; 200+ results
From: Linus Torvalds @ 2024-03-15 22:28 UTC (permalink / raw)
  To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Catalin Marinas,
	Mark Rutland, Will Deacon
  Cc: linux-kernel, kvm

On Fri, 15 Mar 2024 at 10:49, Paolo Bonzini <pbonzini@redhat.com> wrote:
>
>   https://git.kernel.org/pub/scm/virt/kvm/kvm.git tags/for-linus

Argh.

This causes my arm64 build to fail, but since I don't do that between
every pull, I didn't notice until after I had already pushed things
out.

I get a failure on arch/arm64/kvm/check-res-bits.h (line 60):

        BUILD_BUG_ON(ID_AA64DFR1_EL1_RES0       != (GENMASK_ULL(63, 0)));

and at least in my build, the generated sysreg-defs.h file has

 #define ID_AA64DFR1_EL1_RES0 (UL(0))

so yeah, it most definitely doesn't match that GENMASK_ULL(63, 0).
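
As an illustration of why that check cannot pass, here is a userspace
sketch. The GENMASK_ULL() below is re-created for the example (the
kernel's own definition carries extra compile-time checks), and
GENERATED_RES0 / EXPECTED_RES0 / res0_matches() are names invented
here, assuming the two values quoted above.

```c
#include <stdint.h>

/*
 * Userspace re-creation of GENMASK_ULL(h, l): a 64-bit mask with
 * bits l..h (inclusive) set.  Written for this example only.
 */
#define GENMASK_ULL(h, l) \
	((~0ULL << (l)) & (~0ULL >> (63 - (h))))

/* Value the generated sysreg-defs.h had in the failing build. */
#define GENERATED_RES0	0ULL

/* What check-res-bits.h expected the RES0 mask to be. */
#define EXPECTED_RES0	GENMASK_ULL(63, 0)

/*
 * BUILD_BUG_ON(cond) breaks the build when cond is true.  The plain
 * C11 equivalent is a static assertion on the negated condition, so
 *
 *	_Static_assert(GENERATED_RES0 == EXPECTED_RES0, "RES0 mismatch");
 *
 * would refuse to compile with the values above.  A runtime helper
 * shows the same comparison without stopping the build:
 */
static int res0_matches(uint64_t generated, uint64_t expected)
{
	return generated == expected;
}
```

GENMASK_ULL(63, 0) is all 64 bits set, so a generated value of zero can
never compare equal to it, whatever the awk script was meant to do.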

I did *not* go delve into how arch/arm64/tools/gen-sysreg.awk works. I
don't really do awk any more.

The immediate cause of the failure is commit b80b701d5a67 ("KVM:
arm64: Snapshot all non-zero RES0/RES1 sysreg fields for later
checking") but I hope it worked at *some* point. I can't see how.

I would guess / assume that commit cfc680bb04c5 ("arm64: sysreg: Add
layout for ID_AA64MMFR4_EL1") is also involved, but having recoiled in
horror from the awk script, I really can't even begin to guess at what
is going on.

Bringing in other people who hopefully can sort this out.

                   Linus

^ permalink raw reply	[relevance 98%]

* Re: [GIT PULL] Crypto Update for 6.9
  @ 2024-03-15 21:51 99%                     ` Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-03-15 21:51 UTC (permalink / raw)
  To: Herbert Xu
  Cc: David S. Miller, Linux Kernel Mailing List, Linux Crypto Mailing List

On Thu, 14 Mar 2024 at 20:04, Herbert Xu <herbert@gondor.apana.org.au> wrote:
>
> Drivers:
>
> - Add queue stop/query debugfs support in hisilicon/qm.

There's a lot more than that in there. Fairly big Intel qat updates
from what I can see, for example.

           Linus

^ permalink raw reply	[relevance 99%]

* Re: [GIT PULL]: Generic phy updates for v6.9
  @ 2024-03-15 19:22 99% ` Linus Torvalds
    0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-03-15 19:22 UTC (permalink / raw)
  To: Vinod Koul; +Cc: LKML

On Fri, 15 Mar 2024 at 04:03, Vinod Koul <vkoul@kernel.org> wrote:
>
>   git://git.kernel.org/pub/scm/linux/kernel/git/phy/linux-phy.git next

That is not a valid signed tag, and I can't find one in that repo.

I know you know how to do this right, so please send me a proper pull
request with a signed tag like you usually do...

            Linus

^ permalink raw reply	[relevance 99%]

* Re: [GIT PULL] clk changes for the merge window
  @ 2024-03-15 18:54 99% ` Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-03-15 18:54 UTC (permalink / raw)
  To: Stephen Boyd; +Cc: Michael Turquette, linux-clk, linux-kernel

On Thu, 14 Mar 2024 at 12:43, Stephen Boyd <sboyd@kernel.org> wrote:
>
> I'm hoping that we can make that into a genpd that drivers attach
> instead, but this API should help drivers simplify in the meantime.

.. and I'm hoping that name dies in the code too, not just in the
directory structure.

'genpd' really makes absolutely zero sense as a name to anybody
outside of that legacy clique.

               Linus

^ permalink raw reply	[relevance 99%]

* Re: [GIT PULL] fs/9p patches for 6.9 merge window
  @ 2024-03-15 17:17 99% ` Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-03-15 17:17 UTC (permalink / raw)
  To: Eric Van Hensbergen; +Cc: v9fs, linux-kernel

On Fri, 15 Mar 2024 at 08:10, Eric Van Hensbergen <ericvh@kernel.org> wrote:
>
> fs/9p changes for the 6.9 merge window

Entirely tangential, but your pgp key drives me insane, and it finally
drove me over the edge.

One of your uid's has your own name mis-spelled. This is not new.

Please tell me there's a reason for it.

          Linus

^ permalink raw reply	[relevance 99%]

* Re: [GIT PULL] dlm fixes for 6.9
  @ 2024-03-15 17:10 87% ` Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-03-15 17:10 UTC (permalink / raw)
  To: David Teigland; +Cc: linux-kernel, gfs2

On Thu, 14 Mar 2024 at 11:43, David Teigland <teigland@redhat.com> wrote:
>
> Fix two refcounting bugs from recent changes:
> - misuse of atomic_dec_and_test results in missed ref decrement
> - wrong variable assignment results in another missed ref decrement

I pulled this, and then I unpulled it again.

That code is insane.

This is *NOT* sane or valid code:

+               while (atomic_read(&lkb->lkb_wait_count)) {
+                       if (atomic_dec_and_test(&lkb->lkb_wait_count))
+                               list_del_init(&lkb->lkb_wait_reply);
+
+                       unhold_lkb(lkb);
+               }

the above is completely crazy. That's simply not how atomics work.

What's the point of using a refcount - an atomic one at that - if you
just use it as a counter. That's not a "reference count", that's
literally just broken.

The whole - and *ONLY* - point of a refcount is that you are counting
references. References that *YOU* hold. Not that somebody else is
holding and you are releasing.

If you're the only holder of any counts, don't make them atomic, don't
put them in a data structure. But you're *not* the only holder of that
refcount here, are you?

Using atomics for this kind of sequence shows some crazy crazy
behavior. It's not valid to say "ok, as long as this atomic is not
zero, let's decrement it and test if it's not zero".

Because for an atomic value to MAKE SENSE IN THE FIRST PLACE, there
could be somebody else that comes in and also possibly decrements it.

And if that happens between the test of "is this zero" and "did I
decrement it to zero", you now had two decrements, and that value is
now negative. So you didn't really have an atomic value, because you
did two operations on it.

And dammit, if that mutex means that it cannot happen, then WHY WAS IT
AN ATOMIC IN THE FIRST PLACE?

IOW, if you have locking that protects the value, then atomic accesses
are STILL wrong.

So there is not a single situation where I can see the above kind of
code ever being valid.

Now, if the issue is that you want to clean up something that is never
getting cleaned up by anybody else, and this is a fatal error, and
you're just trying to fix things up (badly), and you know that this is
all racy but the code is trying to kill a dead data structure, then
you

 (a) need a damn big comment (bigger than the comment is already)

 (b) should *NOT* pretend to do some stupid "atomic decrement and test" loop

IOW, if what you want to do is get rid of stuck entries and set the
refcount to zero, then doing that would probably be something like

        /* This is broken, but.. */
        stale = atomic_xchg(&lkb->lkb_wait_count, 0);
        if (stale) {
                list_del_init(&lkb->lkb_wait_reply);
                do { unhold_lkb(lkb); } while (--stale);
        }

and it needs a much bigger comment than that "This is broken".

(And I don't know if you want that list_del_init() before or after the
'unhold N times' loop).

The above is still completely broken, but at least it doesn't do some
kind of odd non-atomic test and decrement stuff in a loop, and
hopefully makes it clear that we're very much talking about fixing up
stale final values.
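
For anybody who wants to poke at the difference outside the kernel,
here is a userspace model using C11 atomics. It is only a sketch:
'struct fake_lkb', fake_unhold() and the two drain functions are
invented for the example, and atomic_exchange() stands in for the
kernel's atomic_xchg().

```c
#include <stdatomic.h>

/* Hypothetical stand-in for an lkb: just the two counters that matter. */
struct fake_lkb {
	atomic_int wait_count;	/* models lkb_wait_count */
	int refs;		/* models the references unhold_lkb() drops */
};

static void fake_unhold(struct fake_lkb *lkb)
{
	lkb->refs--;
}

/*
 * Racy pattern: the read and the decrement are two separate atomic
 * operations, so another thread can decrement in between and the
 * counter can go negative.
 */
static void drain_racy(struct fake_lkb *lkb)
{
	while (atomic_load(&lkb->wait_count)) {
		atomic_fetch_sub(&lkb->wait_count, 1);
		fake_unhold(lkb);
	}
}

/*
 * Single-step pattern: one atomic exchange claims every outstanding
 * count at once.  Any concurrent caller sees 0 afterwards, and only
 * this caller drops the 'stale' references it claimed.
 * Returns the number of counts it cleaned up.
 */
static int drain_once(struct fake_lkb *lkb)
{
	int stale = atomic_exchange(&lkb->wait_count, 0);
	int i;

	for (i = 0; i < stale; i++)
		fake_unhold(lkb);
	return stale;
}
```

Single-threaded, both variants end up in the same state. The
difference only shows with a second decrementer: drain_once() touches
the counter exactly once, so there is no window between the test and
the decrement.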

And no, I didn't look at the code around it. Because I really think
that "while (atomic_read(...)" loop cannot POSSIBLY be correct,
regardless of any context.

                Linus

^ permalink raw reply	[relevance 87%]

* Re: [patch 5/9] x86: Cure per CPU madness on UP
  @ 2024-03-15 16:42 97%     ` Linus Torvalds
    0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-03-15 16:42 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: Thomas Gleixner, LKML, x86, Uros Bizjak, linux-sparse, lkp,
	oe-kbuild-all

On Fri, 15 Mar 2024 at 09:17, Guenter Roeck <linux@roeck-us.net> wrote:
>
> [    3.291087] RIP: 0010:rapl_cpu_online+0xf2/0x110
> [    3.291087] Code: 05 ff 8e 07 03 40 42 0f 00 48 89 43 60 e8 56 5f 12 00 8b 15 b4 84 61 02 48 8b 05 01 8f 07 03 48 c7 83 90 00 00 00 e0 84 80 b6 <48> 89 9c d0 38 01 00 00 e9 2b ff ff ff b8 f4 ff ff ff e9 47 ff ff

The code is

  mov    %rax,0x60(%rbx)
  call   0x125f5f
  mov    0x26184b4(%rip),%edx
  mov    0x3078f01(%rip),%rax
  movq   $0xffffffffb68084e0,0x90(%rbx)
  mov    %rbx,0x138(%rax,%rdx,8)                <-- trapping instruction
  jmp    <backwards>

with %rdx being some index having the value 0xffffffed (-19).

That's ENODEV.

Without line numbers (if you have debug info for that kernel, it's
good to run "scripts/decode_stacktrace.sh" on stack traces) it's hard
to really know what's up, but I strongly suspect that it's this:

        rapl_pmus->pmus[topology_logical_die_id(cpu)] = pmu;

because we have

   topology_logical_die_id(cpu) ->
       (cpu_data(cpu).topo.logical_die_id)

and we have

    c->topo.logical_die_id = topology_get_logical_id(apicid, TOPO_DIE_DOMAIN);

and topology_get_logical_id() does this:

        if (lvlid >= MAX_LOCAL_APIC)
                return -ERANGE;
        if (!test_bit(lvlid, apic_maps[at_level].map))
                return -ENODEV;

so that -ENODEV is not entirely unlikely for a UP run.
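
A small userspace sketch of that failure mode, assuming Linux errno
values (ENODEV == 19). NR_SLOTS, as_u32() and store_checked() are
names made up for the example and are not the rapl code:

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define NR_SLOTS 4	/* arbitrary table size for the example */

/* How a negative errno looks once it lands in a 32-bit register. */
static uint32_t as_u32(int v)
{
	return (uint32_t)v;
}

/*
 * Store 'val' in table[id] only when 'id' is a usable index.  A lookup
 * that can fail with a negative errno must be checked before its
 * result is used as a subscript.  Returns 0 on success, or a negative
 * error.
 */
static int store_checked(void *table[NR_SLOTS], int id, void *val)
{
	if (id < 0)
		return id;		/* e.g. -ENODEV from the lookup */
	if (id >= NR_SLOTS)
		return -ERANGE;
	table[id] = val;
	return 0;
}
```

as_u32(-ENODEV) is the 0xffffffed from the register dump, and using
that as an array index scaled by 8 is how you end up storing far
outside the pmus[] array.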

This also explains why it *used* to work - that whole thing is new to
the current merge window and came in through commit ca7e91776912
("Merge tag 'x86-apic-2024-03-10' of ...").

Thomas, over to you. I wonder if maybe all those topology macros
should just return 0 on a UP build, but that
topology_get_logical_id() thing looks a bit wrong regardless.

It really shouldn't depend on local apic data for configs that may not
*have* a local apic.

                 Linus

^ permalink raw reply	[relevance 97%]

* Re: [GIT PULL] lsm/lsm-pr-20240314
  @ 2024-03-14 23:05 99% ` Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-03-14 23:05 UTC (permalink / raw)
  To: Paul Moore; +Cc: linux-security-module, linux-kernel

On Thu, 14 Mar 2024 at 13:31, Paul Moore <paul@paul-moore.com> wrote:
>
> I would like if you could merge these patches, I believe fixing the
> syscall signature problem now poses very little risk and will help us
> avoid the management overhead of compat syscall variants in the future.
> However, I'll understand if you're opposed, just let me know and I'll
> get you a compat version of this pull request as soon as we can get
> something written/tested/verified.

No, attempting to just fix it after-the-fact in the hopes that nobody
actually uses the new system call yet sounds like the right thing to
do.

6.8 has been out for just days, and I see it's marked for stable, so
hopefully nobody ever even sees the mistake. I can't imagine that the
new system call is that eagerly used.

Famous last words.

               Linus

^ permalink raw reply	[relevance 99%]

* Re: [GIT PULL] platform-drivers-x86 for v6.9-1
       [not found]     <65f2d9d4.050a0220.b240.7bddSMTPIN_ADDED_BROKEN@mx.google.com>
@ 2024-03-14 18:36 97% ` Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-03-14 18:36 UTC (permalink / raw)
  To: Ilpo Järvinen; +Cc: LKML, PDx86, Hans de Goede, Andy Shevchenko

On Thu, 14 Mar 2024 at 04:04, Ilpo Järvinen
<ilpo.jarvinen@linux.intel.com> wrote:
>
> Here is the main PDx86 PR for v6.9.

So I've obviously pulled this, and pr-tracker-bot already replied to
that effect.

However, it turns out that the pr-tracker-bot reply didn't thread
correctly for me, and I looked into why.

Your SMTP setup is oddly broken. It looks like your original email was
sent with a bogus Message-ID.

So in my headers, I see how gmail has added a properly formatted Message-ID:

  <65f2d9d4.050a0220.b240.7bddSMTPIN_ADDED_BROKEN@mx.google.com>

and lists your original broken one as

  <4844b67c9b1feca386eb739a4592bdbf.Ilpo Järvinen
<ilpo.jarvinen@linux.intel.com>>

which indeed is completely wrong.

I have no idea how you managed that, since your headers don't actually
seem to specify the MUA you used. But whatever it was, it's very very
mis-configured.

The pr-tracker-bot reply does have that original Message-ID in its
threading notes:

  In-Reply-To: <4844b67c9b1feca386eb739a4592bdbf.Ilpo Järvinen
<ilpo.jarvinen@linux.intel.com>>
  References: <4844b67c9b1feca386eb739a4592bdbf.Ilpo Järvinen
<ilpo.jarvinen@linux.intel.com>>

but it doesn't thread for me because the message-id from the original
email got rewritten as something valid.
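
As a rough illustration of why the original Message-ID above is rejected
while the rewritten one is accepted, here is a minimal sketch of a
simplified check. The function name msgid_looks_valid() is hypothetical,
and the rules are a deliberately reduced approximation of the RFC 5322
"<id-left@id-right>" form (no whitespace, no nested angle brackets,
exactly one '@'), not a full parser and not what any particular mail
server actually runs.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Simplified Message-ID sanity check (hypothetical helper).
 * Accepts "<left@right>" where neither side is empty and the body
 * contains no whitespace, control characters or angle brackets.
 */
bool msgid_looks_valid(const char *s)
{
	size_t len = strlen(s);
	size_t i, at_count = 0, at_pos = 0;

	/* Shortest acceptable form is "<a@b>" */
	if (len < 5 || s[0] != '<' || s[len - 1] != '>')
		return false;

	for (i = 1; i < len - 1; i++) {
		unsigned char ch = (unsigned char)s[i];

		/* Spaces, control characters and nested brackets are not allowed */
		if (ch <= ' ' || ch == 0x7f || ch == '<' || ch == '>')
			return false;
		if (ch == '@') {
			at_count++;
			at_pos = i;
		}
	}

	/* Exactly one '@', with at least one character on each side */
	return at_count == 1 && at_pos > 1 && at_pos < len - 2;
}
```

With this check, an ID that embeds a display name and a second
bracketed address fails on the first space, which is the same shape of
breakage as the one quoted above.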

Can you please look into fixing whatever MUA you used for sending that
pull request?

This is obviously not a deal breaker, but it's odd.

              Linus

^ permalink raw reply	[relevance 97%]

* Re: [GIT PULL] bcachefs updates for 6.9
  @ 2024-03-14 17:15 99%           ` Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-03-14 17:15 UTC (permalink / raw)
  To: Kent Overstreet
  Cc: Darrick J. Wong, linux-bcachefs, linux-fsdevel, linux-kernel

On Wed, 13 Mar 2024 at 15:28, Kent Overstreet <kent.overstreet@linux.dev> wrote:
>
> Sorry, you were talking about mean absolute deviation; that does work
> here.

Yes, I meant mean, not median.

But the confusion is my fault - I wrote MAD and then to "explain"
that, I put "median" in my own email - so you read it right the first
time, and it was just me being sloppy and confusing things.

They are both called MAD in their own contexts, and they are much too
easy to confuse.
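
To make the distinction concrete, here is a small userspace sketch that
computes both quantities for the same samples. The function names are
hypothetical and floating point is used only for readability; this is
not kernel code.

```c
#include <stdlib.h>
#include <string.h>

/* Absolute value without pulling in libm */
static double absd(double v)
{
	return v < 0 ? -v : v;
}

/* qsort() comparison for doubles */
static int cmp_double(const void *a, const void *b)
{
	double x = *(const double *)a, y = *(const double *)b;

	return (x > y) - (x < y);
}

/* Median of v[0..n-1]; sorts v in place, n must be non-zero */
static double median_inplace(double *v, size_t n)
{
	qsort(v, n, sizeof(*v), cmp_double);
	return (n & 1) ? v[n / 2] : (v[n / 2 - 1] + v[n / 2]) / 2.0;
}

/* Mean absolute deviation: average of |x - mean| */
double mean_abs_dev(const double *x, size_t n)
{
	double mean = 0.0, sum = 0.0;
	size_t i;

	if (!n)
		return 0.0;
	for (i = 0; i < n; i++)
		mean += x[i];
	mean /= (double)n;
	for (i = 0; i < n; i++)
		sum += absd(x[i] - mean);
	return sum / (double)n;
}

/* Median absolute deviation: median of |x - median|; -1.0 on allocation failure */
double median_abs_dev(const double *x, size_t n)
{
	double *tmp, med, result;
	size_t i;

	if (!n)
		return 0.0;
	tmp = malloc(n * sizeof(*tmp));
	if (!tmp)
		return -1.0;
	memcpy(tmp, x, n * sizeof(*tmp));
	med = median_inplace(tmp, n);
	for (i = 0; i < n; i++)
		tmp[i] = absd(x[i] - med);
	result = median_inplace(tmp, n);
	free(tmp);
	return result;
}
```

For the samples 1, 2, 3, 4, 100 the mean version is dominated by the
outlier while the median version barely notices it, and only the median
version needs a sort.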

My bad,

               Linus

^ permalink raw reply	[relevance 99%]

* Re: [git pull] drm for 6.9-rc1
  @ 2024-03-14  1:49 99% ` Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-03-14  1:49 UTC (permalink / raw)
  To: Dave Airlie, Animesh Manna, Jani Nikula; +Cc: Daniel Vetter, dri-devel, LKML

On Tue, 12 Mar 2024 at 21:07, Dave Airlie <airlied@gmail.com> wrote:
>
> I've done a trial merge into your tree from a few hours ago, there
> are definitely some slightly messy conflicts, I've pushed a sample
> branch here:

I appreciate your sample merges since I like verifying my end result,
but I think your merge is wrong.

I got two differences when I did the merge. The one in
intel_dp_detect() I think is just syntactic - I ended up placing the

        if (!intel_dp_is_edp(intel_dp))
                intel_psr_init_dpcd(intel_dp);

differently than you did (I did it *after* the tunnel_detect()).

I don't _think_ that placement matters, but somebody more familiar
with the code should check it out. Added Animesh and Jani to the
participants.

But I think the TP_printk() your merge ends up with for the xe_bo_move
trace event is actively wrong. You don't have the destination for the
move in the printk.

Or maybe I got it wrong. Our merges end up _close_, but not identical.

               Linus

^ permalink raw reply	[relevance 99%]

* Re: Unexplained long boot delays [Was Re: [GIT PULL] RCU changes for v6.9]
  @ 2024-03-14  1:15 97%                       ` Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-03-14  1:15 UTC (permalink / raw)
  To: Florian Fainelli
  Cc: Russell King (Oracle),
	Joel Fernandes, Boqun Feng, Anna-Maria Behnsen, linux-kernel,
	kernel-team, paulmck, mingo, tglx, rcu, neeraj.upadhyay, urezki,
	qiang.zhang1211, frederic, bigeasy, chenzhongjin, yangjihong1,
	rostedt, Justin Chen

On Wed, 13 Mar 2024 at 16:29, Florian Fainelli <f.fainelli@gmail.com> wrote:
>
> On this specific commit 7ee988770326fca440472200c3eb58935fe712f6, there
> is a 100% failure for at least 3 devices out of the 16 that are running
> the test.

Hmm.  I have no idea what is going on, and the unimac-mdio probe
function (one of the things that seem to take forever on your setup)
looks fairly simple.

There don't even seem to be any timers involved.

That said - one of the things it does is

  unimac_mdio_probe ->
    unimac_mdio_clk_set ->
      clk_prepare_enable

and maybe that's a pattern, because you report that
brcm_pcie_resume_noirq is another problem spot (on resume).

And guess what brcm_pcie_resume_noirq() does?

Yup. clk_prepare_enable().

So I'm wondering if there's some interaction with some clock driver?
That might explain why it shows up on some arm platforms but not
elsewhere.

I may be barking *entirely* up the wrong tree, though. I was just
looking at that unimac probe and going "there's absolutely _nothing_
timer-related here" and that clk thing looked like it might at least
have _some_ relevance.

            Linus

^ permalink raw reply	[relevance 97%]

* Re: [GIT PULL] bcachefs updates for 6.9
  @ 2024-03-13 21:51 99%     ` Linus Torvalds
    0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-03-13 21:51 UTC (permalink / raw)
  To: Kent Overstreet
  Cc: Darrick J. Wong, linux-bcachefs, linux-fsdevel, linux-kernel

On Wed, 13 Mar 2024 at 14:34, Kent Overstreet <kent.overstreet@linux.dev> wrote:
>
> I liked your MAD suggestion, but the catch was that we need an
> exponentially weighted version,

The code for the weighted version literally doesn't change.

The variance value is different, but the difference between MAD and
standard deviation is basically just a constant factor (which will be
different for different distributions, but so what? Any _particular_
case will have a particular distribution).

So why would a constant factor make _any_ difference for any
exponential weighting?

Anyway, feel free to keep your code in bcachefs.

And maybe xfs even wants to copy that code. I don't care, it seems
stupid, but that's a filesystem choice.

But if we're making it a generic kernel library, it needs to be sane.
Not making people do 64-bit square roots and 128-bit divides just for
a random statistical element.
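
For illustration only, an exponentially weighted mean absolute deviation
can be kept with nothing but 64-bit adds, subtracts and shifts. The
sketch below is an assumption about what such a helper could look like:
struct ewma_mad and its functions are hypothetical names, not an
existing kernel interface, and the weight is restricted to 1/2^shift to
avoid division.

```c
#include <stdint.h>

/* Hypothetical running statistics, all in the same unit as the samples */
struct ewma_mad {
	uint64_t mean;		/* exponentially weighted mean */
	uint64_t mad;		/* exponentially weighted mean absolute deviation */
	unsigned int shift;	/* new samples are weighted by 1 / 2^shift */
	int primed;		/* set once the first sample has been seen */
};

void ewma_mad_init(struct ewma_mad *s, unsigned int shift)
{
	s->mean = 0;
	s->mad = 0;
	s->shift = shift;
	s->primed = 0;
}

void ewma_mad_update(struct ewma_mad *s, uint64_t sample)
{
	uint64_t dev;

	if (!s->primed) {
		/* First sample seeds the mean; no deviation yet */
		s->mean = sample;
		s->primed = 1;
		return;
	}

	/* |sample - mean|, computed against the mean before this update */
	dev = sample > s->mean ? sample - s->mean : s->mean - sample;

	/* mean += (sample - mean) / 2^shift, kept unsigned */
	if (sample > s->mean)
		s->mean += dev >> s->shift;
	else
		s->mean -= dev >> s->shift;

	/* mad += (dev - mad) / 2^shift, kept unsigned */
	if (dev > s->mad)
		s->mad += (dev - s->mad) >> s->shift;
	else
		s->mad -= (s->mad - dev) >> s->shift;
}
```

The shifts truncate, so small deviations are rounded away; a real
implementation would probably keep a few fractional bits. The point of
the sketch is only that no square root and no 128-bit arithmetic is
involved.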

              Linus

^ permalink raw reply	[relevance 99%]

* Re: [RFC PATCH 1/2] smp: Implement serialized smp_call_function APIs
  @ 2024-03-13 21:19 99%   ` Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-03-13 21:19 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Peter Oskolkov, linux-kernel, Peter Zijlstra, Paul E . McKenney,
	Boqun Feng, Andrew Hunter, Maged Michael, gromer, Avi Kivity,
	Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman

On Wed, 13 Mar 2024 at 13:56, Mathieu Desnoyers
<mathieu.desnoyers@efficios.com> wrote:
>
> Introduce serialized smp_call_function APIs to limit the number of
> concurrent smp_call_function IPIs which can be sent to a given CPU to a
> maximum of two: one broadcast and one specifically targeting the CPU.

So honestly, with only one user, I think the serialization code
should be solidly in that one user, not in kernel/smp.c.

Also, this kind of extra complexity does require numbers to argue for it.

             Linus

^ permalink raw reply	[relevance 99%]

* Re: [GIT PULL] bcachefs updates for 6.9
  @ 2024-03-13 20:47 94% ` Linus Torvalds
    0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-03-13 20:47 UTC (permalink / raw)
  To: Kent Overstreet, Darrick J. Wong
  Cc: linux-bcachefs, linux-fsdevel, linux-kernel

On Tue, 12 Mar 2024 at 18:10, Kent Overstreet <kent.overstreet@linux.dev> wrote:
>
> Hi Linus, few patches for you - plus a simple merge conflict with VFS
> changes:

The conflicts are trivial.

The "make random bcachefs code be a library function" stuff I looked
at, decided is senseless, and ended up meaning that I'm not pulling
this without a lot more explanation (and honestly, I don't think the
explanations would hold water).

That "stdio_redirect_printf()" and darray_char stuff is just
horrendous interfaces with no explanations. The interfaces are
disgusting.

Keep it in your own code where it belongs, don't try to make it some
generic library thing.

And if you *do* make it a library thing, it needs to be

 (a) much more explained

 (b) have much saner naming, and fewer disgusting and completely
nonsensical interfaces ("DARRAY()").

And no, finding one other filesystem to share this kind of code is not
sufficient to try to claim it's a sane interface and sane naming.

But the main dealbreaker is the insane math.

And dammit, we talked about the idiotic "mean and variance" garbage
long ago. It was wrong back then, it's *still* wrong.

You didn't explain why it couldn't use the *much* simpler MAD (median
absolute deviation) instead of using variance.

That bad decision directly results in that pointless use of overly
complex 128-bit math.

I called it insanely over-engineered back then, and as far as I can
tell, absolutely *NOTHING* has changed apart from some slight type
name details.

As long as you made it some kind of bcachefs-only thing, I don't mind.

But now you're trying to push this garbage as some kind of generic
library code that others would use, and that immediately means that I
*do* mind insanely overengineered interfaces.

The time_stats stuff otherwise looks at least like a sane interface
with names and uses, but the use of that horrendous infrastructure
scuttles it.

              Linus

^ permalink raw reply	[relevance 94%]

* Re: [GIT PULL] vfs pidfd
  @ 2024-03-13 19:40 99%             ` Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-03-13 19:40 UTC (permalink / raw)
  To: Christian Brauner; +Cc: linux-fsdevel, linux-kernel

On Wed, 13 Mar 2024 at 10:10, Christian Brauner <brauner@kernel.org> wrote:
>
> If you're fine with it I would ask you to please just apply it [..]

I'll take it directly, no problem.

Thanks,
             Linus

^ permalink raw reply	[relevance 99%]

* Re: [GIT PULL] slab updates for 6.9
  @ 2024-03-13  3:54 99% ` Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-03-13  3:54 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: David Rientjes, Joonsoo Kim, Christoph Lameter, Pekka Enberg,
	Andrew Morton, linux-mm, LKML, patches, Roman Gushchin,
	Hyeonggon Yoo, Chengming Zhou, Xiongwei Song

On Tue, 12 Mar 2024 at 02:55, Vlastimil Babka <vbabka@suse.cz> wrote:
>
>       Also deprecate SLAB_MEM_SPREAD which was only
>   used by SLAB, so it's a no-op since SLAB removal. Assign it an explicit zero
>   value.  The removals of the flag usage are handled independently in the
>   respective subsystems, with a final removal of any leftover usage planned
>   for the next release.

I already had the patch ready to go:

    https://lore.kernel.org/all/CAHk-=wji0u+OOtmAOD-5JV3SXcRJF___k_+8XNKmak0yd5vW1Q@mail.gmail.com/

so I just did a "git stash apply" and got rid of the final stragglers.
No need to have various random maintainers have to worry about a flag
that hasn't had any meaning since 6.7, and very little before that
either.

              Linus

^ permalink raw reply	[relevance 99%]

* Re: [GIT PULL] Networking for v6.9
    2024-03-12 20:17 99% ` Linus Torvalds
@ 2024-03-13  1:00 99% ` Linus Torvalds
  1 sibling, 0 replies; 200+ results
From: Linus Torvalds @ 2024-03-13  1:00 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: davem, netdev, linux-kernel, pabeni, bpf

On Mon, 11 Mar 2024 at 21:25, Jakub Kicinski <kuba@kernel.org> wrote:
>
>  - Large effort by Eric to lower rtnl_lock pressure and remove locks:

W00t!

Pulled. The rtnl lock is probably my least favorite kernel lock. It's
been one of the few global locks we have left (at least that matters).

There are others (I'm not claiming tasklist_lock is great), but
rtnl_lock has certainly been "up there" with the worst of them.

            Linus

^ permalink raw reply	[relevance 99%]

* Re: Unexplained long boot delays [Was Re: [GIT PULL] RCU changes for v6.9]
  @ 2024-03-12 21:44 99%       ` Linus Torvalds
    0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-03-12 21:44 UTC (permalink / raw)
  To: Florian Fainelli
  Cc: Boqun Feng, linux-kernel, kernel-team, paulmck, mingo, tglx, rcu,
	joel, neeraj.upadhyay, urezki, qiang.zhang1211, frederic,
	bigeasy, anna-maria, chenzhongjin, yangjihong1, rostedt

On Tue, 12 Mar 2024 at 14:34, Florian Fainelli <f.fainelli@gmail.com> wrote:
>
> and here is a log where this fails:
>
> https://gist.github.com/ffainelli/ed08a2b3e853f59343786ebd20364fc8

You could try the 'initcall_debug' kernel command line.

It will make the above *much* noisier, but it might - thanks to all
the new noise - show exactly *what* is being crazy slow to initialize.

Because right now it's just radio silence in between those

  [    1.926435] bcmgenet f0480000.ethernet: GENET 5.0 EPHY: 0x0000
  [  162.148135] unimac-mdio unimac-mdio.0: Broadcom UniMAC MDIO bus

things, and that's presumably because some random initcall there just
takes forever to time out.

             Linus

^ permalink raw reply	[relevance 99%]

* Re: [GIT PULL] Networking for v6.9
  @ 2024-03-12 21:11 95%       ` Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-03-12 21:11 UTC (permalink / raw)
  To: Jakub Kicinski, Jens Axboe, Johannes Thumshirn
  Cc: davem, netdev, linux-kernel, pabeni, bpf, Tejun Heo

On Tue, 12 Mar 2024 at 13:47, Jakub Kicinski <kuba@kernel.org> wrote:
>
> With your tree as of 65d287c7eb1d it gets to prompt but dies soon after
> when prod services kick in (dunno what rpm Kdump does but says iocost
> so adding Tejun):

Both of your traces are timers that seem to either lock up in ioc_now():

   https://lore.kernel.org/all/20240312133427.1a744844@kernel.org/

and now it looks like ioc_timer_fn():

  https://lore.kernel.org/all/20240312134739.248e6bd3@kernel.org/

But in neither case does it actually look like it's a lockup on a *lock*.

IOW, the NMI isn't happening on some spin_lock sequence or anything like that.

Yes, ioc_now() could have been looping on the seq read-lock if the
sequence number was odd. But the writers do seem to be done with
interrupts disabled, plus then you wouldn't have this lockup in
ioc_timer_fn, so it's probably not that.
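
The "looping while the sequence number is odd" behaviour can be modelled
in userspace with C11 atomics. This is a minimal sketch of the general
sequence-counter pattern, assuming a single writer; struct seq_clock and
its functions are hypothetical and are not the kernel's seqcount API or
the blk-iocost code.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical value protected by a sequence counter */
struct seq_clock {
	atomic_uint seq;	/* odd while the writer is mid-update */
	_Atomic uint64_t now;	/* the protected value */
};

void seq_clock_init(struct seq_clock *c)
{
	atomic_init(&c->seq, 0);
	atomic_init(&c->now, 0);
}

/* Single writer assumed: bump to odd, update, bump back to even */
void seq_clock_write(struct seq_clock *c, uint64_t now)
{
	atomic_fetch_add(&c->seq, 1);
	atomic_store(&c->now, now);
	atomic_fetch_add(&c->seq, 1);
}

/* Reader retries until it sees the same even sequence before and after */
uint64_t seq_clock_read(struct seq_clock *c)
{
	unsigned int start;
	uint64_t val;

	for (;;) {
		start = atomic_load(&c->seq);
		if (start & 1)
			continue;	/* writer in progress: keep spinning */
		val = atomic_load(&c->now);
		if (atomic_load(&c->seq) == start)
			return val;	/* no writer interfered */
	}
}
```

If a writer were ever to leave the counter odd, the reader would spin in
that loop forever, which is the failure mode being ruled out above.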

And yes, ioc_timer_fn() does take locks, but again, that doesn't seem
to be where it is hanging.

So it smells like it's an endless loop in ioc_timer_fn() to me, or
perhaps retriggering the timer itself infinitely.

Which would then explain both of those traces (that endless loop would
call ioc_now() as part of it).

The blk-iocost.c code itself hasn't changed, but the timer code has
gone through big changes.

That said, there's a more blk-related change: da4c8c3d0975 ("block:
cache current nsec time in struct blk_plug").

*And* your second dump is from that

        period_vtime = now.vnow - ioc->period_at_vtime;
        if (WARN_ON_ONCE(!period_vtime)) {

so it smells like the blk-iocost code is just completely confused by
the time caching. Jens?

Jakub, it might be worth seeing if just reverting that commit
da4c8c3d0975 makes the problem go away. Otherwise a bisect might be
needed...

          Linus

^ permalink raw reply	[relevance 95%]

* Re: [GIT PULL] vfs pidfd
  @ 2024-03-12 20:21 99%         ` Linus Torvalds
    0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-03-12 20:21 UTC (permalink / raw)
  To: Christian Brauner; +Cc: linux-fsdevel, linux-kernel

On Tue, 12 Mar 2024 at 13:09, Christian Brauner <brauner@kernel.org> wrote:
>
> It's used to compare pidfs and someone actually already sent a pull
> request for this to another project iirc. So it'd be good to keep that
> property.

Hmm. If people really do care, I guess we should spend the effort on
making those things unique.

> But if your point is that we don't care about this for 32bit then I do
> agree. We could do away with the checks completely and just accept the
> truncation for 32bit. If that's your point feel free to just remove the
> 32bit handling in the patch and apply it. Let me know. Maybe I
> misunderstood.

I personally don't care about 32-bit any more, but it also feels wrong
to just say that it's ok to depend on something on a 64-bit kernel,
but not on a 32-bit one.

So let's go with your patch. It's not like it's a problem to spend the
(very little) extra effort to do a 64-bit inode number.
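
The truncation being discussed can be shown with a tiny model: two
distinct 64-bit identifiers that share their low 32 bits become
indistinguishable once squeezed into a 32-bit field. The helper names
below are hypothetical and this says nothing about how the actual patch
handles the 32-bit case.

```c
#include <stdbool.h>
#include <stdint.h>

/* Model of storing a 64-bit identifier in a 32-bit inode number field */
uint32_t ino_truncate32(uint64_t id)
{
	return (uint32_t)id;	/* keeps only the low 32 bits */
}

/* True if two identifiers would compare equal after truncation */
bool same_ino32(uint64_t a, uint64_t b)
{
	return ino_truncate32(a) == ino_truncate32(b);
}
```

For example, 0x100000001 and 0x1 are different 64-bit values but compare
equal after truncation, so anything comparing inode numbers for identity
would be fooled.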

             Linus

^ permalink raw reply	[relevance 99%]

* Re: [GIT PULL] Networking for v6.9
  @ 2024-03-12 20:17 99% ` Linus Torvalds
    2024-03-13  1:00 99% ` Linus Torvalds
  1 sibling, 1 reply; 200+ results
From: Linus Torvalds @ 2024-03-12 20:17 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: davem, netdev, linux-kernel, pabeni, bpf

On Mon, 11 Mar 2024 at 21:25, Jakub Kicinski <kuba@kernel.org> wrote:
>
> I get what looks like blk-iocost deadlock when I try to run
> your current tree on real Meta servers :(

Hmm. This "it breaks on real hardware, but works in virtual boxes"
sounds like it might be the DM queue limit issue.

Did the tree you tested with perhaps have commit 8e0ef4128694 (which
came in yesterday through the block merge, merge commit 1ddeeb2a058d,
just after 11am Monday), but not the revert (commit bff4b74625fe, six
hours later)?

IOW, just how current was that "current"? Your email was sent multiple
hours after the revert happened and was pushed out, but I would not be
surprised if your testing was done with something that was in that
broken window.

So if you merged some *other* tree than one from that six-hour window,
please holler - because there's something else going on and we need to
get the block people on it.

               Linus

^ permalink raw reply	[relevance 99%]

* Re: [GIT PULL] AFFS update for 6.9
  @ 2024-03-12 20:02 54% ` Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-03-12 20:02 UTC (permalink / raw)
  To: David Sterba; +Cc: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 904 bytes --]

On Mon, 11 Mar 2024 at 12:37, David Sterba <dsterba@suse.com> wrote:
>
> please pull one change to AFFS that removes use of SLAB_MEM_SPREAD,
> which is going to be removed from MM code.

I've pulled this, but I don't really see the point in removing these
one by one like this.

SLAB_MEM_SPREAD is already a no-op, the MM people could just do a
coccinelle thing to remove it everywhere.

I think you could do 90% even just using a few variations of 'sed', eg
variations on

   git grep -l 'SLAB_MEM_SPREAD' |
        xargs sed -i 's/SLAB_MEM_SPREAD *|//'

   git grep -l 'SLAB_MEM_SPREAD' |
        xargs sed -i 's/| *SLAB_MEM_SPREAD//'

and then some manual fixups for (a) whitespace cleanup of the result
and (b) the couple of cases where it wasn't a bitwise or into other
fields (or where the bitwise or was on a different line)

And then you'd end up with something like the attached.

        Linus

[-- Attachment #2: patch.diff --]
[-- Type: text/x-patch, Size: 23779 bytes --]

 drivers/dax/super.c               |  3 +--
 drivers/usb/isp1760/isp1760-hcd.c |  8 +++-----
 fs/9p/v9fs.c                      |  2 +-
 fs/adfs/super.c                   |  2 +-
 fs/befs/linuxvfs.c                |  3 +--
 fs/bfs/inode.c                    |  2 +-
 fs/ceph/super.c                   | 18 +++++++++---------
 fs/coda/inode.c                   |  4 ++--
 fs/erofs/super.c                  |  2 +-
 fs/exfat/cache.c                  |  2 +-
 fs/exfat/super.c                  |  2 +-
 fs/ext2/super.c                   |  3 +--
 fs/ext4/super.c                   |  3 +--
 fs/fat/cache.c                    |  2 +-
 fs/fat/inode.c                    |  2 +-
 fs/freevxfs/vxfs_super.c          |  2 +-
 fs/gfs2/main.c                    |  1 -
 fs/hpfs/super.c                   |  2 +-
 fs/isofs/inode.c                  |  2 +-
 fs/jffs2/super.c                  |  2 +-
 fs/nfs/direct.c                   |  3 +--
 fs/nfs/inode.c                    |  2 +-
 fs/nfs/nfs42xattr.c               |  2 +-
 fs/ntfs3/super.c                  |  2 +-
 fs/ocfs2/dlmfs/dlmfs.c            |  2 +-
 fs/ocfs2/super.c                  |  7 +++----
 fs/overlayfs/super.c              |  2 +-
 fs/qnx4/inode.c                   |  2 +-
 fs/quota/dquot.c                  |  2 +-
 fs/smb/client/cifsfs.c            |  2 +-
 fs/tracefs/inode.c                |  1 -
 fs/ubifs/super.c                  |  4 ++--
 fs/udf/super.c                    |  1 -
 fs/ufs/super.c                    |  3 +--
 fs/vboxsf/super.c                 |  3 +--
 fs/xfs/xfs_super.c                |  7 +++----
 fs/zonefs/super.c                 |  2 +-
 include/linux/slab.h              |  2 --
 mm/slab.h                         |  1 -
 net/socket.c                      |  2 +-
 net/sunrpc/rpc_pipe.c             |  2 +-
 41 files changed, 52 insertions(+), 69 deletions(-)

diff --git a/drivers/dax/super.c b/drivers/dax/super.c
index f4b635526345..a0244f6bb44b 100644
--- a/drivers/dax/super.c
+++ b/drivers/dax/super.c
@@ -532,8 +532,7 @@ static int dax_fs_init(void)
 	int rc;
 
 	dax_cache = kmem_cache_create("dax_cache", sizeof(struct dax_device), 0,
-			(SLAB_HWCACHE_ALIGN|SLAB_RECLAIM_ACCOUNT|
-			 SLAB_MEM_SPREAD|SLAB_ACCOUNT),
+			SLAB_HWCACHE_ALIGN | SLAB_RECLAIM_ACCOUNT | SLAB_ACCOUNT,
 			init_once);
 	if (!dax_cache)
 		return -ENOMEM;
diff --git a/drivers/usb/isp1760/isp1760-hcd.c b/drivers/usb/isp1760/isp1760-hcd.c
index 76862ba40f35..0e5e4cb74c87 100644
--- a/drivers/usb/isp1760/isp1760-hcd.c
+++ b/drivers/usb/isp1760/isp1760-hcd.c
@@ -2521,21 +2521,19 @@ static const struct hc_driver isp1760_hc_driver = {
 int __init isp1760_init_kmem_once(void)
 {
 	urb_listitem_cachep = kmem_cache_create("isp1760_urb_listitem",
-			sizeof(struct urb_listitem), 0, SLAB_TEMPORARY |
-			SLAB_MEM_SPREAD, NULL);
+			sizeof(struct urb_listitem), 0, SLAB_TEMPORARY, NULL);
 
 	if (!urb_listitem_cachep)
 		return -ENOMEM;
 
 	qtd_cachep = kmem_cache_create("isp1760_qtd",
-			sizeof(struct isp1760_qtd), 0, SLAB_TEMPORARY |
-			SLAB_MEM_SPREAD, NULL);
+			sizeof(struct isp1760_qtd), 0, SLAB_TEMPORARY, NULL);
 
 	if (!qtd_cachep)
 		goto destroy_urb_listitem;
 
 	qh_cachep = kmem_cache_create("isp1760_qh", sizeof(struct isp1760_qh),
-			0, SLAB_TEMPORARY | SLAB_MEM_SPREAD, NULL);
+			0, SLAB_TEMPORARY, NULL);
 
 	if (!qh_cachep)
 		goto destroy_qtd;
diff --git a/fs/9p/v9fs.c b/fs/9p/v9fs.c
index 61dbe52bb3a3..281a1ed03a04 100644
--- a/fs/9p/v9fs.c
+++ b/fs/9p/v9fs.c
@@ -637,7 +637,7 @@ static int v9fs_init_inode_cache(void)
 	v9fs_inode_cache = kmem_cache_create("v9fs_inode_cache",
 					  sizeof(struct v9fs_inode),
 					  0, (SLAB_RECLAIM_ACCOUNT|
-					      SLAB_MEM_SPREAD|SLAB_ACCOUNT),
+					      SLAB_ACCOUNT),
 					  v9fs_inode_init_once);
 	if (!v9fs_inode_cache)
 		return -ENOMEM;
diff --git a/fs/adfs/super.c b/fs/adfs/super.c
index e8bfc38239cd..9354b14bbfe3 100644
--- a/fs/adfs/super.c
+++ b/fs/adfs/super.c
@@ -249,7 +249,7 @@ static int __init init_inodecache(void)
 	adfs_inode_cachep = kmem_cache_create("adfs_inode_cache",
 					     sizeof(struct adfs_inode_info),
 					     0, (SLAB_RECLAIM_ACCOUNT|
-						SLAB_MEM_SPREAD|SLAB_ACCOUNT),
+						SLAB_ACCOUNT),
 					     init_once);
 	if (adfs_inode_cachep == NULL)
 		return -ENOMEM;
diff --git a/fs/befs/linuxvfs.c b/fs/befs/linuxvfs.c
index 2b4dda047450..d76f406d3b2e 100644
--- a/fs/befs/linuxvfs.c
+++ b/fs/befs/linuxvfs.c
@@ -435,8 +435,7 @@ befs_init_inodecache(void)
 {
 	befs_inode_cachep = kmem_cache_create_usercopy("befs_inode_cache",
 				sizeof(struct befs_inode_info), 0,
-				(SLAB_RECLAIM_ACCOUNT|SLAB_MEM_SPREAD|
-					SLAB_ACCOUNT),
+				SLAB_RECLAIM_ACCOUNT | SLAB_ACCOUNT,
 				offsetof(struct befs_inode_info,
 					i_data.symlink),
 				sizeof_field(struct befs_inode_info,
diff --git a/fs/bfs/inode.c b/fs/bfs/inode.c
index 355957dbce39..db81570c9637 100644
--- a/fs/bfs/inode.c
+++ b/fs/bfs/inode.c
@@ -259,7 +259,7 @@ static int __init init_inodecache(void)
 	bfs_inode_cachep = kmem_cache_create("bfs_inode_cache",
 					     sizeof(struct bfs_inode_info),
 					     0, (SLAB_RECLAIM_ACCOUNT|
-						SLAB_MEM_SPREAD|SLAB_ACCOUNT),
+						SLAB_ACCOUNT),
 					     init_once);
 	if (bfs_inode_cachep == NULL)
 		return -ENOMEM;
diff --git a/fs/ceph/super.c b/fs/ceph/super.c
index 5ec102f6b1ac..885cb5d4e771 100644
--- a/fs/ceph/super.c
+++ b/fs/ceph/super.c
@@ -928,36 +928,36 @@ static int __init init_caches(void)
 	ceph_inode_cachep = kmem_cache_create("ceph_inode_info",
 				      sizeof(struct ceph_inode_info),
 				      __alignof__(struct ceph_inode_info),
-				      SLAB_RECLAIM_ACCOUNT|SLAB_MEM_SPREAD|
-				      SLAB_ACCOUNT, ceph_inode_init_once);
+				      SLAB_RECLAIM_ACCOUNT | SLAB_ACCOUNT,
+				      ceph_inode_init_once);
 	if (!ceph_inode_cachep)
 		return -ENOMEM;
 
-	ceph_cap_cachep = KMEM_CACHE(ceph_cap, SLAB_MEM_SPREAD);
+	ceph_cap_cachep = KMEM_CACHE(ceph_cap, 0);
 	if (!ceph_cap_cachep)
 		goto bad_cap;
-	ceph_cap_snap_cachep = KMEM_CACHE(ceph_cap_snap, SLAB_MEM_SPREAD);
+	ceph_cap_snap_cachep = KMEM_CACHE(ceph_cap_snap, 0);
 	if (!ceph_cap_snap_cachep)
 		goto bad_cap_snap;
 	ceph_cap_flush_cachep = KMEM_CACHE(ceph_cap_flush,
-					   SLAB_RECLAIM_ACCOUNT|SLAB_MEM_SPREAD);
+					   SLAB_RECLAIM_ACCOUNT);
 	if (!ceph_cap_flush_cachep)
 		goto bad_cap_flush;
 
 	ceph_dentry_cachep = KMEM_CACHE(ceph_dentry_info,
-					SLAB_RECLAIM_ACCOUNT|SLAB_MEM_SPREAD);
+					SLAB_RECLAIM_ACCOUNT);
 	if (!ceph_dentry_cachep)
 		goto bad_dentry;
 
-	ceph_file_cachep = KMEM_CACHE(ceph_file_info, SLAB_MEM_SPREAD);
+	ceph_file_cachep = KMEM_CACHE(ceph_file_info, 0);
 	if (!ceph_file_cachep)
 		goto bad_file;
 
-	ceph_dir_file_cachep = KMEM_CACHE(ceph_dir_file_info, SLAB_MEM_SPREAD);
+	ceph_dir_file_cachep = KMEM_CACHE(ceph_dir_file_info, 0);
 	if (!ceph_dir_file_cachep)
 		goto bad_dir_file;
 
-	ceph_mds_request_cachep = KMEM_CACHE(ceph_mds_request, SLAB_MEM_SPREAD);
+	ceph_mds_request_cachep = KMEM_CACHE(ceph_mds_request, 0);
 	if (!ceph_mds_request_cachep)
 		goto bad_mds_req;
 
diff --git a/fs/coda/inode.c b/fs/coda/inode.c
index a50356c541f6..6898dc621011 100644
--- a/fs/coda/inode.c
+++ b/fs/coda/inode.c
@@ -72,8 +72,8 @@ int __init coda_init_inodecache(void)
 {
 	coda_inode_cachep = kmem_cache_create("coda_inode_cache",
 				sizeof(struct coda_inode_info), 0,
-				SLAB_RECLAIM_ACCOUNT|SLAB_MEM_SPREAD|
-				SLAB_ACCOUNT, init_once);
+				SLAB_RECLAIM_ACCOUNT | SLAB_ACCOUNT,
+				init_once);
 	if (coda_inode_cachep == NULL)
 		return -ENOMEM;
 	return 0;
diff --git a/fs/erofs/super.c b/fs/erofs/super.c
index 9b4b66dcdd4f..8b6bf9ae1a59 100644
--- a/fs/erofs/super.c
+++ b/fs/erofs/super.c
@@ -885,7 +885,7 @@ static int __init erofs_module_init(void)
 
 	erofs_inode_cachep = kmem_cache_create("erofs_inode",
 			sizeof(struct erofs_inode), 0,
-			SLAB_RECLAIM_ACCOUNT | SLAB_MEM_SPREAD | SLAB_ACCOUNT,
+			SLAB_RECLAIM_ACCOUNT | SLAB_ACCOUNT,
 			erofs_inode_init_once);
 	if (!erofs_inode_cachep)
 		return -ENOMEM;
diff --git a/fs/exfat/cache.c b/fs/exfat/cache.c
index 5a2f119b7e8c..7cc200d89821 100644
--- a/fs/exfat/cache.c
+++ b/fs/exfat/cache.c
@@ -46,7 +46,7 @@ int exfat_cache_init(void)
 {
 	exfat_cachep = kmem_cache_create("exfat_cache",
 				sizeof(struct exfat_cache),
-				0, SLAB_RECLAIM_ACCOUNT|SLAB_MEM_SPREAD,
+				0, SLAB_RECLAIM_ACCOUNT,
 				exfat_cache_init_once);
 	if (!exfat_cachep)
 		return -ENOMEM;
diff --git a/fs/exfat/super.c b/fs/exfat/super.c
index fcb658267765..3d5ea2cfad66 100644
--- a/fs/exfat/super.c
+++ b/fs/exfat/super.c
@@ -813,7 +813,7 @@ static int __init init_exfat_fs(void)
 
 	exfat_inode_cachep = kmem_cache_create("exfat_inode_cache",
 			sizeof(struct exfat_inode_info),
-			0, SLAB_RECLAIM_ACCOUNT | SLAB_MEM_SPREAD,
+			0, SLAB_RECLAIM_ACCOUNT,
 			exfat_inode_init_once);
 	if (!exfat_inode_cachep) {
 		err = -ENOMEM;
diff --git a/fs/ext2/super.c b/fs/ext2/super.c
index 01f9addc8b1f..cabea887314d 100644
--- a/fs/ext2/super.c
+++ b/fs/ext2/super.c
@@ -213,8 +213,7 @@ static int __init init_inodecache(void)
 {
 	ext2_inode_cachep = kmem_cache_create_usercopy("ext2_inode_cache",
 				sizeof(struct ext2_inode_info), 0,
-				(SLAB_RECLAIM_ACCOUNT|SLAB_MEM_SPREAD|
-					SLAB_ACCOUNT),
+				SLAB_RECLAIM_ACCOUNT | SLAB_ACCOUNT,
 				offsetof(struct ext2_inode_info, i_data),
 				sizeof_field(struct ext2_inode_info, i_data),
 				init_once);
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index a8ba84eabab2..59c72b6dd153 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -1500,8 +1500,7 @@ static int __init init_inodecache(void)
 {
 	ext4_inode_cachep = kmem_cache_create_usercopy("ext4_inode_cache",
 				sizeof(struct ext4_inode_info), 0,
-				(SLAB_RECLAIM_ACCOUNT|SLAB_MEM_SPREAD|
-					SLAB_ACCOUNT),
+				SLAB_RECLAIM_ACCOUNT | SLAB_ACCOUNT,
 				offsetof(struct ext4_inode_info, i_data),
 				sizeof_field(struct ext4_inode_info, i_data),
 				init_once);
diff --git a/fs/fat/cache.c b/fs/fat/cache.c
index 738e427e2d21..2af424e200b3 100644
--- a/fs/fat/cache.c
+++ b/fs/fat/cache.c
@@ -47,7 +47,7 @@ int __init fat_cache_init(void)
 {
 	fat_cache_cachep = kmem_cache_create("fat_cache",
 				sizeof(struct fat_cache),
-				0, SLAB_RECLAIM_ACCOUNT|SLAB_MEM_SPREAD,
+				0, SLAB_RECLAIM_ACCOUNT,
 				init_once);
 	if (fat_cache_cachep == NULL)
 		return -ENOMEM;
diff --git a/fs/fat/inode.c b/fs/fat/inode.c
index 5c813696d1ff..d9e6fbb6f246 100644
--- a/fs/fat/inode.c
+++ b/fs/fat/inode.c
@@ -787,7 +787,7 @@ static int __init fat_init_inodecache(void)
 	fat_inode_cachep = kmem_cache_create("fat_inode_cache",
 					     sizeof(struct msdos_inode_info),
 					     0, (SLAB_RECLAIM_ACCOUNT|
-						SLAB_MEM_SPREAD|SLAB_ACCOUNT),
+						SLAB_ACCOUNT),
 					     init_once);
 	if (fat_inode_cachep == NULL)
 		return -ENOMEM;
diff --git a/fs/freevxfs/vxfs_super.c b/fs/freevxfs/vxfs_super.c
index e6e2a2185e7c..42e03b6b1cc7 100644
--- a/fs/freevxfs/vxfs_super.c
+++ b/fs/freevxfs/vxfs_super.c
@@ -307,7 +307,7 @@ vxfs_init(void)
 
 	vxfs_inode_cachep = kmem_cache_create_usercopy("vxfs_inode",
 			sizeof(struct vxfs_inode_info), 0,
-			SLAB_RECLAIM_ACCOUNT|SLAB_MEM_SPREAD,
+			SLAB_RECLAIM_ACCOUNT,
 			offsetof(struct vxfs_inode_info, vii_immed.vi_immed),
 			sizeof_field(struct vxfs_inode_info,
 				vii_immed.vi_immed),
diff --git a/fs/gfs2/main.c b/fs/gfs2/main.c
index 79be0cdc730c..04cadc02e5a6 100644
--- a/fs/gfs2/main.c
+++ b/fs/gfs2/main.c
@@ -111,7 +111,6 @@ static int __init init_gfs2_fs(void)
 	gfs2_inode_cachep = kmem_cache_create("gfs2_inode",
 					      sizeof(struct gfs2_inode),
 					      0,  SLAB_RECLAIM_ACCOUNT|
-						  SLAB_MEM_SPREAD|
 						  SLAB_ACCOUNT,
 					      gfs2_init_inode_once);
 	if (!gfs2_inode_cachep)
diff --git a/fs/hpfs/super.c b/fs/hpfs/super.c
index 6b0ba3c1efba..314834a078e9 100644
--- a/fs/hpfs/super.c
+++ b/fs/hpfs/super.c
@@ -255,7 +255,7 @@ static int init_inodecache(void)
 	hpfs_inode_cachep = kmem_cache_create("hpfs_inode_cache",
 					     sizeof(struct hpfs_inode_info),
 					     0, (SLAB_RECLAIM_ACCOUNT|
-						SLAB_MEM_SPREAD|SLAB_ACCOUNT),
+						SLAB_ACCOUNT),
 					     init_once);
 	if (hpfs_inode_cachep == NULL)
 		return -ENOMEM;
diff --git a/fs/isofs/inode.c b/fs/isofs/inode.c
index 3e4d53e26f94..25fca44149dd 100644
--- a/fs/isofs/inode.c
+++ b/fs/isofs/inode.c
@@ -93,7 +93,7 @@ static int __init init_inodecache(void)
 	isofs_inode_cachep = kmem_cache_create("isofs_inode_cache",
 					sizeof(struct iso_inode_info),
 					0, (SLAB_RECLAIM_ACCOUNT|
-					SLAB_MEM_SPREAD|SLAB_ACCOUNT),
+					SLAB_ACCOUNT),
 					init_once);
 	if (!isofs_inode_cachep)
 		return -ENOMEM;
diff --git a/fs/jffs2/super.c b/fs/jffs2/super.c
index f99591a634b4..aede1be4dc0c 100644
--- a/fs/jffs2/super.c
+++ b/fs/jffs2/super.c
@@ -387,7 +387,7 @@ static int __init init_jffs2_fs(void)
 	jffs2_inode_cachep = kmem_cache_create("jffs2_i",
 					     sizeof(struct jffs2_inode_info),
 					     0, (SLAB_RECLAIM_ACCOUNT|
-						SLAB_MEM_SPREAD|SLAB_ACCOUNT),
+						SLAB_ACCOUNT),
 					     jffs2_i_init_once);
 	if (!jffs2_inode_cachep) {
 		pr_err("error: Failed to initialise inode cache\n");
diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c
index c03926a1cc73..7af5d270de28 100644
--- a/fs/nfs/direct.c
+++ b/fs/nfs/direct.c
@@ -1037,8 +1037,7 @@ int __init nfs_init_directcache(void)
 {
 	nfs_direct_cachep = kmem_cache_create("nfs_direct_cache",
 						sizeof(struct nfs_direct_req),
-						0, (SLAB_RECLAIM_ACCOUNT|
-							SLAB_MEM_SPREAD),
+						0, SLAB_RECLAIM_ACCOUNT,
 						NULL);
 	if (nfs_direct_cachep == NULL)
 		return -ENOMEM;
diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
index ebb8d60e1152..93ea49a7eb61 100644
--- a/fs/nfs/inode.c
+++ b/fs/nfs/inode.c
@@ -2372,7 +2372,7 @@ static int __init nfs_init_inodecache(void)
 	nfs_inode_cachep = kmem_cache_create("nfs_inode_cache",
 					     sizeof(struct nfs_inode),
 					     0, (SLAB_RECLAIM_ACCOUNT|
-						SLAB_MEM_SPREAD|SLAB_ACCOUNT),
+						SLAB_ACCOUNT),
 					     init_once);
 	if (nfs_inode_cachep == NULL)
 		return -ENOMEM;
diff --git a/fs/nfs/nfs42xattr.c b/fs/nfs/nfs42xattr.c
index 49aaf28a6950..b6e3d8f77b91 100644
--- a/fs/nfs/nfs42xattr.c
+++ b/fs/nfs/nfs42xattr.c
@@ -1017,7 +1017,7 @@ int __init nfs4_xattr_cache_init(void)
 
 	nfs4_xattr_cache_cachep = kmem_cache_create("nfs4_xattr_cache_cache",
 	    sizeof(struct nfs4_xattr_cache), 0,
-	    (SLAB_RECLAIM_ACCOUNT|SLAB_MEM_SPREAD),
+	    (SLAB_RECLAIM_ACCOUNT),
 	    nfs4_xattr_cache_init_once);
 	if (nfs4_xattr_cache_cachep == NULL)
 		return -ENOMEM;
diff --git a/fs/ntfs3/super.c b/fs/ntfs3/super.c
index cef5467fd928..9df7c20d066f 100644
--- a/fs/ntfs3/super.c
+++ b/fs/ntfs3/super.c
@@ -1825,7 +1825,7 @@ static int __init init_ntfs_fs(void)
 
 	ntfs_inode_cachep = kmem_cache_create(
 		"ntfs_inode_cache", sizeof(struct ntfs_inode), 0,
-		(SLAB_RECLAIM_ACCOUNT | SLAB_MEM_SPREAD | SLAB_ACCOUNT),
+		(SLAB_RECLAIM_ACCOUNT | SLAB_ACCOUNT),
 		init_once);
 	if (!ntfs_inode_cachep) {
 		err = -ENOMEM;
diff --git a/fs/ocfs2/dlmfs/dlmfs.c b/fs/ocfs2/dlmfs/dlmfs.c
index 85215162c9dd..7fc0e920eda7 100644
--- a/fs/ocfs2/dlmfs/dlmfs.c
+++ b/fs/ocfs2/dlmfs/dlmfs.c
@@ -578,7 +578,7 @@ static int __init init_dlmfs_fs(void)
 	dlmfs_inode_cache = kmem_cache_create("dlmfs_inode_cache",
 				sizeof(struct dlmfs_inode_private),
 				0, (SLAB_HWCACHE_ALIGN|SLAB_RECLAIM_ACCOUNT|
-					SLAB_MEM_SPREAD|SLAB_ACCOUNT),
+					SLAB_ACCOUNT),
 				dlmfs_init_once);
 	if (!dlmfs_inode_cache) {
 		status = -ENOMEM;
diff --git a/fs/ocfs2/super.c b/fs/ocfs2/super.c
index a70aff17d455..b3f860888e93 100644
--- a/fs/ocfs2/super.c
+++ b/fs/ocfs2/super.c
@@ -1706,18 +1706,17 @@ static int ocfs2_initialize_mem_caches(void)
 				       sizeof(struct ocfs2_inode_info),
 				       0,
 				       (SLAB_HWCACHE_ALIGN|SLAB_RECLAIM_ACCOUNT|
-						SLAB_MEM_SPREAD|SLAB_ACCOUNT),
+						SLAB_ACCOUNT),
 				       ocfs2_inode_init_once);
 	ocfs2_dquot_cachep = kmem_cache_create("ocfs2_dquot_cache",
 					sizeof(struct ocfs2_dquot),
 					0,
-					(SLAB_HWCACHE_ALIGN|SLAB_RECLAIM_ACCOUNT|
-						SLAB_MEM_SPREAD),
+					(SLAB_HWCACHE_ALIGN|SLAB_RECLAIM_ACCOUNT),
 					NULL);
 	ocfs2_qf_chunk_cachep = kmem_cache_create("ocfs2_qf_chunk_cache",
 					sizeof(struct ocfs2_quota_chunk),
 					0,
-					(SLAB_RECLAIM_ACCOUNT|SLAB_MEM_SPREAD),
+					(SLAB_RECLAIM_ACCOUNT),
 					NULL);
 	if (!ocfs2_inode_cachep || !ocfs2_dquot_cachep ||
 	    !ocfs2_qf_chunk_cachep) {
diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
index 36d4b8b1f784..a40fc7e05525 100644
--- a/fs/overlayfs/super.c
+++ b/fs/overlayfs/super.c
@@ -1503,7 +1503,7 @@ static int __init ovl_init(void)
 	ovl_inode_cachep = kmem_cache_create("ovl_inode",
 					     sizeof(struct ovl_inode), 0,
 					     (SLAB_RECLAIM_ACCOUNT|
-					      SLAB_MEM_SPREAD|SLAB_ACCOUNT),
+					      SLAB_ACCOUNT),
 					     ovl_inode_init_once);
 	if (ovl_inode_cachep == NULL)
 		return -ENOMEM;
diff --git a/fs/qnx4/inode.c b/fs/qnx4/inode.c
index 7b5711f76709..d79841e94428 100644
--- a/fs/qnx4/inode.c
+++ b/fs/qnx4/inode.c
@@ -378,7 +378,7 @@ static int init_inodecache(void)
 	qnx4_inode_cachep = kmem_cache_create("qnx4_inode_cache",
 					     sizeof(struct qnx4_inode_info),
 					     0, (SLAB_RECLAIM_ACCOUNT|
-						SLAB_MEM_SPREAD|SLAB_ACCOUNT),
+						SLAB_ACCOUNT),
 					     init_once);
 	if (qnx4_inode_cachep == NULL)
 		return -ENOMEM;
diff --git a/fs/quota/dquot.c b/fs/quota/dquot.c
index 1f0c754416b6..eb6e9d95dea1 100644
--- a/fs/quota/dquot.c
+++ b/fs/quota/dquot.c
@@ -2984,7 +2984,7 @@ static int __init dquot_init(void)
 	dquot_cachep = kmem_cache_create("dquot",
 			sizeof(struct dquot), sizeof(unsigned long) * 4,
 			(SLAB_HWCACHE_ALIGN|SLAB_RECLAIM_ACCOUNT|
-				SLAB_MEM_SPREAD|SLAB_PANIC),
+				SLAB_PANIC),
 			NULL);
 
 	order = 0;
diff --git a/fs/smb/client/cifsfs.c b/fs/smb/client/cifsfs.c
index fb368b191eef..e0d8c79cdde1 100644
--- a/fs/smb/client/cifsfs.c
+++ b/fs/smb/client/cifsfs.c
@@ -1664,7 +1664,7 @@ cifs_init_inodecache(void)
 	cifs_inode_cachep = kmem_cache_create("cifs_inode_cache",
 					      sizeof(struct cifsInodeInfo),
 					      0, (SLAB_RECLAIM_ACCOUNT|
-						SLAB_MEM_SPREAD|SLAB_ACCOUNT),
+						SLAB_ACCOUNT),
 					      cifs_init_once);
 	if (cifs_inode_cachep == NULL)
 		return -ENOMEM;
diff --git a/fs/tracefs/inode.c b/fs/tracefs/inode.c
index d65ffad4c327..5545e6bf7d26 100644
--- a/fs/tracefs/inode.c
+++ b/fs/tracefs/inode.c
@@ -731,7 +731,6 @@ static int __init tracefs_init(void)
 	tracefs_inode_cachep = kmem_cache_create("tracefs_inode_cache",
 						 sizeof(struct tracefs_inode),
 						 0, (SLAB_RECLAIM_ACCOUNT|
-						     SLAB_MEM_SPREAD|
 						     SLAB_ACCOUNT),
 						 init_once);
 	if (!tracefs_inode_cachep)
diff --git a/fs/ubifs/super.c b/fs/ubifs/super.c
index d2881041b393..7f4031a15f4d 100644
--- a/fs/ubifs/super.c
+++ b/fs/ubifs/super.c
@@ -2434,8 +2434,8 @@ static int __init ubifs_init(void)
 
 	ubifs_inode_slab = kmem_cache_create("ubifs_inode_slab",
 				sizeof(struct ubifs_inode), 0,
-				SLAB_MEM_SPREAD | SLAB_RECLAIM_ACCOUNT |
-				SLAB_ACCOUNT, &inode_slab_ctor);
+				SLAB_RECLAIM_ACCOUNT | SLAB_ACCOUNT,
+				&inode_slab_ctor);
 	if (!ubifs_inode_slab)
 		return -ENOMEM;
 
diff --git a/fs/udf/super.c b/fs/udf/super.c
index 928a04d9d9e0..6f420f4ca005 100644
--- a/fs/udf/super.c
+++ b/fs/udf/super.c
@@ -177,7 +177,6 @@ static int __init init_inodecache(void)
 	udf_inode_cachep = kmem_cache_create("udf_inode_cache",
 					     sizeof(struct udf_inode_info),
 					     0, (SLAB_RECLAIM_ACCOUNT |
-						 SLAB_MEM_SPREAD |
 						 SLAB_ACCOUNT),
 					     init_once);
 	if (!udf_inode_cachep)
diff --git a/fs/ufs/super.c b/fs/ufs/super.c
index a480810cd4e3..44666afc6209 100644
--- a/fs/ufs/super.c
+++ b/fs/ufs/super.c
@@ -1470,8 +1470,7 @@ static int __init init_inodecache(void)
 {
 	ufs_inode_cachep = kmem_cache_create_usercopy("ufs_inode_cache",
 				sizeof(struct ufs_inode_info), 0,
-				(SLAB_RECLAIM_ACCOUNT|SLAB_MEM_SPREAD|
-					SLAB_ACCOUNT),
+				(SLAB_RECLAIM_ACCOUNT | SLAB_ACCOUNT),
 				offsetof(struct ufs_inode_info, i_u1.i_symlink),
 				sizeof_field(struct ufs_inode_info,
 					i_u1.i_symlink),
diff --git a/fs/vboxsf/super.c b/fs/vboxsf/super.c
index 1fb8f4df60cb..cabe8ac4fefc 100644
--- a/fs/vboxsf/super.c
+++ b/fs/vboxsf/super.c
@@ -339,8 +339,7 @@ static int vboxsf_setup(void)
 	vboxsf_inode_cachep =
 		kmem_cache_create("vboxsf_inode_cache",
 				  sizeof(struct vboxsf_inode), 0,
-				  (SLAB_RECLAIM_ACCOUNT | SLAB_MEM_SPREAD |
-				   SLAB_ACCOUNT),
+				  SLAB_RECLAIM_ACCOUNT | SLAB_ACCOUNT,
 				  vboxsf_inode_init_once);
 	if (!vboxsf_inode_cachep) {
 		err = -ENOMEM;
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index 00fbd5b6e582..59c8c0541bdd 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -2043,8 +2043,7 @@ xfs_init_caches(void)
 
 	xfs_buf_cache = kmem_cache_create("xfs_buf", sizeof(struct xfs_buf), 0,
 					 SLAB_HWCACHE_ALIGN |
-					 SLAB_RECLAIM_ACCOUNT |
-					 SLAB_MEM_SPREAD,
+					 SLAB_RECLAIM_ACCOUNT,
 					 NULL);
 	if (!xfs_buf_cache)
 		goto out;
@@ -2109,14 +2108,14 @@ xfs_init_caches(void)
 					   sizeof(struct xfs_inode), 0,
 					   (SLAB_HWCACHE_ALIGN |
 					    SLAB_RECLAIM_ACCOUNT |
-					    SLAB_MEM_SPREAD | SLAB_ACCOUNT),
+					    SLAB_ACCOUNT),
 					   xfs_fs_inode_init_once);
 	if (!xfs_inode_cache)
 		goto out_destroy_efi_cache;
 
 	xfs_ili_cache = kmem_cache_create("xfs_ili",
 					 sizeof(struct xfs_inode_log_item), 0,
-					 SLAB_RECLAIM_ACCOUNT | SLAB_MEM_SPREAD,
+					 SLAB_RECLAIM_ACCOUNT,
 					 NULL);
 	if (!xfs_ili_cache)
 		goto out_destroy_inode_cache;
diff --git a/fs/zonefs/super.c b/fs/zonefs/super.c
index 236a6d88306f..c6a124e8d565 100644
--- a/fs/zonefs/super.c
+++ b/fs/zonefs/super.c
@@ -1422,7 +1422,7 @@ static int __init zonefs_init_inodecache(void)
 {
 	zonefs_inode_cachep = kmem_cache_create("zonefs_inode_cache",
 			sizeof(struct zonefs_inode_info), 0,
-			(SLAB_RECLAIM_ACCOUNT | SLAB_MEM_SPREAD | SLAB_ACCOUNT),
+			SLAB_RECLAIM_ACCOUNT | SLAB_ACCOUNT,
 			NULL);
 	if (zonefs_inode_cachep == NULL)
 		return -ENOMEM;
diff --git a/include/linux/slab.h b/include/linux/slab.h
index b5f5ee8308d0..995f0cdc2b70 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -96,8 +96,6 @@
  */
 /* Defer freeing slabs to RCU */
 #define SLAB_TYPESAFE_BY_RCU	((slab_flags_t __force)0x00080000U)
-/* Spread some memory over cpuset */
-#define SLAB_MEM_SPREAD		((slab_flags_t __force)0x00100000U)
 /* Trace allocations and frees */
 #define SLAB_TRACE		((slab_flags_t __force)0x00200000U)
 
diff --git a/mm/slab.h b/mm/slab.h
index 54deeb0428c6..f4534eefb35d 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -469,7 +469,6 @@ static inline bool is_kmalloc_cache(struct kmem_cache *s)
 			      SLAB_STORE_USER | \
 			      SLAB_TRACE | \
 			      SLAB_CONSISTENCY_CHECKS | \
-			      SLAB_MEM_SPREAD | \
 			      SLAB_NOLEAKTRACE | \
 			      SLAB_RECLAIM_ACCOUNT | \
 			      SLAB_TEMPORARY | \
diff --git a/net/socket.c b/net/socket.c
index ed3df2f749bf..7e9c8fc9a5b4 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -343,7 +343,7 @@ static void init_inodecache(void)
 					      0,
 					      (SLAB_HWCACHE_ALIGN |
 					       SLAB_RECLAIM_ACCOUNT |
-					       SLAB_MEM_SPREAD | SLAB_ACCOUNT),
+					       SLAB_ACCOUNT),
 					      init_once);
 	BUG_ON(sock_inode_cachep == NULL);
 }
diff --git a/net/sunrpc/rpc_pipe.c b/net/sunrpc/rpc_pipe.c
index dcc2b4f49e77..910a5d850d04 100644
--- a/net/sunrpc/rpc_pipe.c
+++ b/net/sunrpc/rpc_pipe.c
@@ -1490,7 +1490,7 @@ int register_rpc_pipefs(void)
 	rpc_inode_cachep = kmem_cache_create("rpc_inode_cache",
 				sizeof(struct rpc_inode),
 				0, (SLAB_HWCACHE_ALIGN|SLAB_RECLAIM_ACCOUNT|
-						SLAB_MEM_SPREAD|SLAB_ACCOUNT),
+						SLAB_ACCOUNT),
 				init_once);
 	if (!rpc_inode_cachep)
 		return -ENOMEM;

^ permalink raw reply related	[relevance 54%]

* Re: [GIT PULL] vfs pidfd
  @ 2024-03-12 16:23 99%     ` Linus Torvalds
    0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-03-12 16:23 UTC (permalink / raw)
  To: Christian Brauner; +Cc: linux-fsdevel, linux-kernel

On Tue, 12 Mar 2024 at 07:16, Christian Brauner <brauner@kernel.org> wrote:
>
> No, the size of struct pid was the main reason but I don't think it
> matters. A side-effect was that we could easily enforce 64bit inode
> numbers. But realistically it's trivial enough to work around. Here's a
> patch for what I think is pretty simple appended. Does that work?

This looks eminently sane to me. Not that I actually _tested_ it, but
since my testing would have compared it to my current setup (64-bit
and CONFIG_FS_PID=y) any testing would have been pointless because
that case didn't change.

Looking at the patch, I do wonder how much we even care about 64-bit
inodes. I'd like to point out how 'path_from_stashed()' only takes a
'unsigned long ino' anyway, and I don't think anything really cares
about either the high bits *or* the uniqueness of that inode number..

And similarly, i_ino isn't actually *used* for anything but naming to
user space.
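
To make the "high bits" point concrete, here is a minimal user-space
sketch, not kernel code. It assumes a 32-bit build, where 'unsigned long'
is 32 bits wide, and models that with a made-up 'ulong32_t' type. The
helpers 'ino_as_seen_by_callee()', 'pass_ino()' and 'ino_sketch_demo()'
are hypothetical; only the 'unsigned long ino' parameter type comes from
the message above.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: model a 32-bit kernel, where "unsigned long" is 32 bits. */
typedef uint32_t ulong32_t;

/* Hypothetical callee with the same "unsigned long ino" parameter shape. */
static inline ulong32_t ino_as_seen_by_callee(ulong32_t ino)
{
	return ino;
}

/* Pass a 64-bit inode number through the narrower parameter type. */
static inline ulong32_t pass_ino(uint64_t ino64)
{
	/* The conversion keeps only the low 32 bits of the value. */
	return ino_as_seen_by_callee((ulong32_t)ino64);
}

/* Two distinct 64-bit numbers look identical once the high bits are gone. */
static inline void ino_sketch_demo(void)
{
	assert(pass_ino(0x100000002ULL) == pass_ino(0x2ULL));
}
```

On a 64-bit build the parameter is wide enough and nothing is lost, which
is why the difference only shows up on 32-bit configurations.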

So I'm not at all sure the whole 64-bit checks are worth it. Am I
missing something else?

                Linus

^ permalink raw reply	[relevance 99%]

* Re: [GIT PULL] EDAC updates for v6.9
  @ 2024-03-12  2:25 99%     ` Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-03-12  2:25 UTC (permalink / raw)
  To: Randy Dunlap; +Cc: Borislav Petkov, x86-ml, lkml, linux-edac

On Mon, 11 Mar 2024 at 19:24, Randy Dunlap <rdunlap@infradead.org> wrote:
>
> and there's an extra/trailing ';'.

Ayup, I fixed that too while I was in there anyway.

              Linus

^ permalink raw reply	[relevance 99%]

* Re: [GIT PULL] EDAC updates for v6.9
  @ 2024-03-12  1:12 99% ` Linus Torvalds
    0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-03-12  1:12 UTC (permalink / raw)
  To: Borislav Petkov; +Cc: x86-ml, lkml, linux-edac

On Mon, 11 Mar 2024 at 08:57, Borislav Petkov <bp@alien8.de> wrote:
>
> -       return topology_die_id(err->cpu) % amd_get_nodes_per_socket();
> +       return topology_amd_node_id(err->cpu) % topology_amd_nodes_per_pkg();

Ho humm. Lookie here:

    static inline unsigned int topology_amd_nodes_per_pkg(void)
    { return 0; };

that's the UP case.

Yeah, I'm assuming nobody tests this for UP, but it's clearly wrong to
potentially do that modulus by zero.

So I made the merge also change that UP case of
topology_amd_nodes_per_pkg() to return 1.

Because dammit, not only is a mod-by-zero wrong, a UP system most
definitely has one node per package, not zero.
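
As a rough user-space sketch of the problem and the fix (this is not the
kernel source): the names 'nodes_per_pkg_up_old()',
'nodes_per_pkg_up_fixed()' and 'node_in_pkg()' are invented for the
illustration, and only the "return 0" versus "return 1" difference is
taken from the text above.

```c
#include <assert.h>

/* Hypothetical stand-in for the old UP stub: zero nodes per package. */
static inline unsigned int nodes_per_pkg_up_old(void)
{
	return 0;
}

/* Hypothetical stand-in for the fixed UP stub: one node per package. */
static inline unsigned int nodes_per_pkg_up_fixed(void)
{
	return 1;
}

/*
 * Same shape as the EDAC expression "node_id % nodes_per_pkg".
 * The divisor must be non-zero: in C, "x % 0" is undefined behaviour.
 */
static inline unsigned int node_in_pkg(unsigned int node_id,
				       unsigned int nodes_per_pkg)
{
	assert(nodes_per_pkg != 0);
	return node_id % nodes_per_pkg;
}
```

With the fixed stub, 'node_in_pkg(id, nodes_per_pkg_up_fixed())' is always
0, which matches a system that has exactly one node. With the old stub the
same call would trip the assertion instead of silently dividing by zero.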

               Linus

^ permalink raw reply	[relevance 99%]

* Re: [GIT PULL] x86/sev for v6.9-rc1
  @ 2024-03-12  0:50 99% ` Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-03-12  0:50 UTC (permalink / raw)
  To: Borislav Petkov; +Cc: x86-ml, lkml

On Mon, 11 Mar 2024 at 08:19, Borislav Petkov <bp@alien8.de> wrote:
>
> If you're merging tip pull requests in the chronological order you've
> received them, you'll encounter a couple of simple merge conflicts.

It's not exactly chronological - I tend to go by areas and by
submitter, but it tends to approximate chronological most of the
time..

> I'm adding how I've resolved them at the end of this message in case
> you wanna compare notes.

Hmm. I took a slightly different approach:

> diff --cc arch/x86/include/asm/coco.h
> index 76c310b19b11,21940ef8d290..42871bb262d0
> --- a/arch/x86/include/asm/coco.h
> +++ b/arch/x86/include/asm/coco.h
> @@@ -10,9 -11,15 +11,15 @@@ enum cc_vendor
>         CC_VENDOR_INTEL,
>   };
>
>  -extern enum cc_vendor cc_vendor;
> + extern u64 cc_mask;
> +
>   #ifdef CONFIG_ARCH_HAS_CC_PLATFORM
>  +extern enum cc_vendor cc_vendor;

I put the 'cc_mask' declaration inside the #ifdef too.

Because those two variables are defined together, and without
CONFIG_ARCH_HAS_CC_PLATFORM the whole coco/ subdirectory that defines
them won't even be built, as far as I can tell.

And I don't see any _use_ of 'cc_mask' anywhere outside of that one
'cc_set_mask()' inline function and the coco/core.c file. So declaring
it only when it's all enabled seems to be the right thing.
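
A stand-alone sketch of that layout, compilable in user space, might look
like the following. It is an illustration under assumptions, not the
actual header: the 'u64' typedef, the 'CC_VENDOR_NONE' and 'CC_VENDOR_AMD'
enumerators, the whole '#else' fallback and the 'coco_fallback_demo()'
helper are filled in for the example. Only the two 'extern' declarations
sitting together inside the '#ifdef' reflect the resolution described
above.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t u64;	/* user-space stand-in for the kernel's u64 */

enum cc_vendor {
	CC_VENDOR_NONE,		/* assumed enumerator */
	CC_VENDOR_AMD,		/* assumed enumerator */
	CC_VENDOR_INTEL,
};

#ifdef CONFIG_ARCH_HAS_CC_PLATFORM
/* Both variables are defined by the coco/ code, so declare them together. */
extern enum cc_vendor cc_vendor;
extern u64 cc_mask;

static inline void cc_set_mask(u64 mask)
{
	cc_mask = mask;
}
#else
/* Assumed fallback: nothing defines the variables, so none are declared. */
#define cc_vendor (CC_VENDOR_NONE)

static inline void cc_set_mask(u64 mask)
{
	(void)mask;	/* there is no cc_mask to store into */
}

/* Callers still compile, and no cc_mask symbol is needed at link time. */
static inline void coco_fallback_demo(void)
{
	cc_set_mask(0x1234);
	assert(cc_vendor == CC_VENDOR_NONE);
}
#endif
```

Built without CONFIG_ARCH_HAS_CC_PLATFORM defined, nothing in the sketch
refers to a 'cc_mask' object, so a stray use elsewhere would fail at
compile time rather than at link time.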

Let's hope my artistic merge resolution doesn't end up coming back to bite me.

           Linus

^ permalink raw reply	[relevance 99%]

* Re: [PATCH 26/30] sched: handle preempt=voluntary under PREEMPT_AUTO
  @ 2024-03-11 20:23 95%                       ` Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-03-11 20:23 UTC (permalink / raw)
  To: Ankur Arora
  Cc: paulmck, Joel Fernandes, linux-kernel, tglx, peterz, akpm, luto,
	bp, dave.hansen, hpa, mingo, juri.lelli, vincent.guittot, willy,
	mgorman, jpoimboe, mark.rutland, jgross, andrew.cooper3, bristot,
	mathieu.desnoyers, geert, glaubitz, anton.ivanov, mattst88,
	krypton, rostedt, David.Laight, richard, mjguzik, jon.grimm,
	bharata, raghavendra.kt, boris.ostrovsky, konrad.wilk

On Mon, 11 Mar 2024 at 13:10, Ankur Arora <ankur.a.arora@oracle.com> wrote:
>
> Ah, I see your point. Basically, keep the lazy semantics but -- in
> addition -- also provide the ability to dynamically toggle
> cond_resched(), might_resched() as a feature to help move this along
> further.

Please, let's not make up any random hypotheticals.

Honestly, if we ever hit the hypothetical scenario that Paul outlined, let's

 (a) deal with it THEN, when we actually know what the situation is

 (b) learn and document what it is that actually causes the odd behavior

IOW, instead of assuming that some "cond_resched()" case would even be
the right thing to do, maybe there are other issues going on?  Let's
not paper over them by keeping some hack around - and *if* some
cond_resched() model is actually the right model in some individual
place, let's make it the rule that *when* we hit that case, we
document it.

And we should absolutely not have some hypothetical case keep us from
just doing the right thing and getting rid of the existing
cond_resched().

Because any potential future case is *not* going to be the same
cond_resched() that the current case is anyway. It is going to have
some very different cause.

                  Linus

^ permalink raw reply	[relevance 95%]

* Re: [GIT PULL] vfs pidfd
  @ 2024-03-11 20:05 99% ` Linus Torvalds
    0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-03-11 20:05 UTC (permalink / raw)
  To: Christian Brauner; +Cc: linux-fsdevel, linux-kernel

On Fri, 8 Mar 2024 at 02:14, Christian Brauner <brauner@kernel.org> wrote:
>
> * Move pidfds from the anonymous inode infrastructure to a tiny
>   pseudo filesystem. This will unblock further work that we weren't able
>   to do simply because of the very justified limitations of anonymous
>   inodes. Moving pidfds to a tiny pseudo filesystem allows for statx on
>   pidfds to become useful for the first time. They can now be compared
>   by inode number which are unique for the system lifetime.

So I obviously pulled this already, but I did have one question - we
don't make nsfs conditional, and I'm not convinced we should make
pidfs conditional either.

I think (and *hope*) all the semantic annoyances got sorted out, and I
don't think there are any realistic size advantages to not enabling
CONFIG_FS_PID.

Is there some fundamental reason for that config entry to exist?

            Linus

^ permalink raw reply	[relevance 99%]

* Linux 6.8
@ 2024-03-10 21:06 51% Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-03-10 21:06 UTC (permalink / raw)
  To: Linux Kernel Mailing List

So it took a bit longer for the commit counts to come down this
release than I tend to prefer, but a lot of that seemed to be about
various selftest updates (networking in particular) rather than any
actual real sign of problems. And the last two weeks have been pretty
quiet, so I feel there's no real reason to delay 6.8.  We always have
some straggling work, and we'll end up having some of it pushed to
stable rather than hold up the new code. Nothing worrisome enough to
keep the regular release schedule from happening.

As usual, the shortlog below is just for the last week since rc7, the
overall changes in 6.8 are obviously much much bigger. This is not the
historically big release that 6.7 was - we seem to be back to a fairly
average release size for the last few years. You can see it in the
overall diffstats too - this looks like an average release in pretty
much all respects, and we don't have (for example) any obvious big new
filesystems or architectures. I think the biggest single new thing in
6.8 is probably the new Xe drm driver, but honestly, the big bulk of
changes are just various random updates and fixes all over.

Just as it should be.

In a sea of normality, one thing that stands out is a bit of random
git numerology.  This is the last mainline kernel to have less than
ten million git objects. In fact, we're at 9.996 million objects, so
we got really close to crossing that not-milestone if it hadn't been
for the nice calming down in the last couple of weeks. Other trees -
notably linux-next - obviously are already comfortably over that
limit.

Of course, there is absolutely nothing special about it apart from a
nice round number.  Git doesn't care.

Anyway, this all obviously means that tomorrow the merge window for
6.9 opens, and I already have several pull requests pending. Thanks to
everybody who sent in early pull requests, you know who you are. But
before that excitement commences, please do spend a bit of time with
the now boring old status quo and give 6.8 a good test, ok?

              Linus

---

Al Raj Hassain (1):
      ASoC: amd: yc: Add HP Pavilion Aero Laptop 13-be2xxx(8BD6) into DMI quirk table

Alan Stern (1):
      USB: usb-storage: Prevent divide-by-0 error in isd200_ata_command

Alban Boyé (1):
      ASoC: Intel: bytcr_rt5640: Add an extra entry for the Chuwi Vi8 tablet

Alex Deucher (1):
      drm/amd/display: handle range offsets in VRR ranges

Alexander Usyskin (3):
      mei: me: add arrow lake point S DID
      mei: me: add arrow lake point H DID
      mei: gsc_proxy: match component when GSC is on different bus

Andreas Pape (1):
      ASoC: rcar: adg: correct TIMSEL setting for SSI9

Andrew Ballance (1):
      scripts/gdb/symbols: fix invalid escape sequence warning

Andrey Skvortsov (1):
      crypto: sun8i-ce - Fix use after free in unprepare

Andy Chi (1):
      ALSA: hda/realtek: fix mute/micmute LEDs for HP EliteBook

Animesh Manna (1):
      drm/i915/panelreplay: Move out psr_init_dpcd() from init_connector()

Antonio Borneo (1):
      pinctrl: stm32: fix PM support for stm32mp257

Arnd Bergmann (1):
      net: bql: fix building with BQL disabled

Aya Levin (1):
      net/mlx5: Fix fw reporter diagnose output

Badhri Jagan Sridharan (1):
      usb: typec: tpcm: Fix PORT_RESET behavior for self powered devices

Bart Van Assche (2):
      Revert "fs/aio: Make io_cancel() generate completions again"
      fs/aio: Check IOCB_AIO_RW before the struct aio_kiocb conversion

Bartosz Golaszewski (1):
      pinctrl: don't put the reference to GPIO device in pinctrl_pins_show()

Charles Keepax (1):
      spi: cs42l43: Don't limit native CS to the first chip select

Christophe JAILLET (1):
      i2c: wmt: Fix an error handling path in wmt_i2c_probe()

Coiby Xu (1):
      integrity: eliminate unnecessary "Problem loading X.509 certificate" msg

Cong Yang (1):
      drm/panel: boe-tv101wum-nl6: Fine tune Himax83102-j02 panel HFP and HBP (again)

Cosmin Tanislav (2):
      iio: accel: adxl367: fix DEVID read after reset
      iio: accel: adxl367: fix I2C FIFO data register

Daniel Baluta (1):
      MAINTAINERS: Use a proper mailinglist for NXP i.MX development

Daniel Borkmann (2):
      xdp, bonding: Fix feature flags when there are no slave devs anymore
      selftests/bpf: Fix up xdp bonding test wrt feature flags

Dave Airlie (1):
      nouveau: lock the client object tree.

Dawei Li (1):
      firmware: microchip: Fix over-requested allocation size

Dmitry Baryshkov (1):
      Revert "arm64: dts: qcom: msm8996: Hook up MPM"

Douglas Anderson (3):
      Revert "tty: serial: simplify qcom_geni_serial_send_chunk_fifo()"
      drm/udl: Add ARGB8888 as a format
      Revert "drm/udl: Add ARGB8888 as a format"

Edmund Raile (1):
      firewire: ohci: prevent leak of left-over IRQ on unbind

Eduard Zingerman (2):
      bpf: check bpf_func_state->callback_depth when pruning states
      selftests/bpf: test case for callback_depth states pruning logic

Edward Adam Davis (1):
      net/rds: fix WARNING in rds_conn_connect_if_down

Ekansh Gupta (1):
      misc: fastrpc: Pass proper arguments to scm call

Emeel Hakim (1):
      net/mlx5e: Fix MACsec state loss upon state update in offload path

Emil Tantilov (1):
      idpf: disable local BH when scheduling napi for marker packets

Eric Dumazet (2):
      geneve: make sure to pull inner header in geneve_rx()
      net/ipv6: avoid possible UAF in ip6_route_mpath_notify()

Fabio Estevam (1):
      ARM: imx_v6_v7_defconfig: Restore CONFIG_BACKLIGHT_CLASS_DEVICE

Florian Kauer (1):
      igc: avoid returning frame twice in XDP_REDIRECT

Florian Westphal (1):
      netfilter: nft_ct: fix l3num expectations with inet pseudo family

Francesco Dolcini (1):
      ARM: dts: imx7: remove DSI port endpoints

Frej Drejhammar (1):
      comedi: comedi_8255: Correct error in subdevice initialization

Gao Xiang (2):
      erofs: fix uninitialized page cache reported by KMSAN
      erofs: apply proper VMA alignment for memory mapped files on THP

Gavin Li (1):
      Revert "net/mlx5: Block entering switchdev mode with ns inconsistency"

Geliang Tang (1):
      selftests: mptcp: diag: return KSFT_FAIL not test_cnt

Guillaume Nault (1):
      xfrm: Clear low order bits of ->flowi4_tos in decode_session4().

Hans de Goede (2):
      misc: lis3lv02d_i2c: Fix regulators getting en-/dis-abled twice on suspend/resume
      platform/x86: p2sb: On Goldmont only cache P2SB and SPI devfn BAR

Harshit Mogalapalli (1):
      platform/x86/amd/pmf: Fix missing error code in amd_pmf_init_smart_pc()

Heiner Kallweit (2):
      i2c: i801: Fix using mux_pdev before it's set
      i2c: i801: Avoid potential double call to gpiod_remove_lookup_table

Herbert Xu (1):
      crypto: rk3288 - Fix use after free in unprepare

Horatiu Vultur (1):
      net: sparx5: Fix use after free inside sparx5_del_mact_entry

Ian Abbott (1):
      comedi: comedi_test: Prevent timers rescheduling during deletion

Imre Deak (2):
      drm: Fix output poll work for drm_kms_helper_poll=n
      drm/i915/dp: Fix connector DSC HW state readout

Ivan Vecera (1):
      i40e: Fix firmware version comparison function

Jacob Keller (1):
      ice: virtchnl: stop pretending to support RSS over AQ or registers

Jakub Kicinski (2):
      page_pool: fix netlink dump stop/resume
      dpll: move all dpll<>netdev helpers to dpll code

Janusz Krzysztofik (1):
      drm/i915/selftests: Fix dependency of some timeouts on HZ

Jason Xing (12):
      netrom: Fix a data-race around sysctl_netrom_default_path_quality
      netrom: Fix a data-race around sysctl_netrom_obsolescence_count_initialiser
      netrom: Fix data-races around sysctl_netrom_network_ttl_initialiser
      netrom: Fix a data-race around sysctl_netrom_transport_timeout
      netrom: Fix a data-race around sysctl_netrom_transport_maximum_tries
      netrom: Fix a data-race around sysctl_netrom_transport_acknowledge_delay
      netrom: Fix a data-race around sysctl_netrom_transport_busy_delay
      netrom: Fix a data-race around sysctl_netrom_transport_requested_window_size
      netrom: Fix a data-race around sysctl_netrom_transport_no_activity_timeout
      netrom: Fix a data-race around sysctl_netrom_routing_control
      netrom: Fix a data-race around sysctl_netrom_link_fails_count
      netrom: Fix data-races around sysctl_net_busy_read

Javier Carrasco (1):
      Revert "Input: bcm5974 - check endpoint type before starting traffic"

Jean-Baptiste Maneyrol (2):
      iio: imu: inv_mpu6050: fix FIFO parsing when empty
      iio: imu: inv_mpu6050: fix frequency setting when chip is off

Jernej Skrabec (1):
      arm64: dts: allwinner: h616: Add Orange Pi Zero 2W to Makefile

Jesse Brandeburg (1):
      ice: fix typo in assignment

Jianbo Liu (2):
      net/mlx5: E-switch, Change flow rule destination checking
      net/mlx5e: Change the warning when ignore_flow_level is not supported

Johan Hovold (4):
      arm64: dts: qcom: sc8280xp-crd: limit pcie4 link speed
      arm64: dts: qcom: sc8280xp-x13s: limit pcie4 link speed
      phy: qcom-qmp-combo: fix drm bridge registration
      phy: qcom-qmp-combo: fix type-c switch registration

Jon Hunter (1):
      arm64: tegra: Fix Tegra234 MGBE power-domains

Kailang Yang (2):
      ALSA: hda/realtek - Fix headset Mic no show at resume back for Lenovo ALC897 platform
      ALSA: hda/realtek - Add Headset Mic supported Acer NB platform

Kamalesh Babulal (1):
      cgroup/cpuset: Fix retval in update_cpumask()

Karol Herbst (1):
      drm/nouveau: fix stale locked mutex in nouveau_gem_ioctl_pushbuf

Kees Cook (2):
      iio: pressure: dlhl60d: Initialize empty DLH bytes
      init/Kconfig: lower GCC version check for -Warray-bounds

Konrad Dybcio (1):
      arm64: dts: qcom: sm6115: Fix missing interconnect-names

Krishna Kurapati (1):
      usb: gadget: ncm: Fix handling of zero block length packets

Lena Wang (1):
      netfilter: nf_conntrack_h323: Add protection for bmp length out of range

Leon Romanovsky (1):
      xfrm: Pass UDP encapsulation in TX packet offload

Li Ma (1):
      drm/amd/swsmu: modify the gfx activity scaling

Linus Torvalds (2):
      iov_iter: get rid of 'copy_mc' flag
      Linux 6.8

Liu Ying (1):
      arm64: dts: imx8mp: Fix LDB clocks property

Ma Jun (1):
      drm/amdgpu/pm: Fix the error of pwm1_enable setting

Maciej Fijalkowski (3):
      ixgbe: {dis, en}able irqs in ixgbe_txrx_ring_{dis, en}able
      i40e: disable NAPI right after disabling irqs when handling xsk_pool
      ice: reorder disabling IRQ and NAPI in ice_qp_dis

Marek Vasut (1):
      arm64: dts: imx8mp: Fix TC9595 reset GPIO on DH i.MX8M Plus DHCOM SoM

Masahisa Kojima (1):
      MAINTAINERS: net: netsec: add myself as co-maintainer

Mathias Krause (1):
      Input: synaptics-rmi4 - fix UAF of IRQ domain on driver removal

Mathias Nyman (2):
      usb: port: Don't try to peer unused USB ports based on location
      xhci: Fix failure to detect ring expansion need.

Matthew Auld (1):
      drm/tests/buddy: fix print format

Matthieu Baerts (NGI0) (1):
      selftests: mptcp: diag: avoid extra waiting

Max Nguyen (1):
      Input: xpad - add additional HyperX Controller Identifiers

Melissa Wen (1):
      drm/amd/display: check dc_link before dereferencing

Michael Kelley (8):
      Drivers: hv: vmbus: Calculate ring buffer size for more efficient use of memory
      fbdev/hyperv_fb: Fix logic error for Gen2 VMs in hvfb_getmem()
      Drivers: hv: vmbus: Remove duplication and cleanup code in create_gpadl_header()
      Drivers: hv: vmbus: Update indentation in create_gpadl_header()
      Documentation: hyperv: Add overview of PCI pass-thru device support
      x86/hyperv: Use slow_virt_to_phys() in page transition hypervisor callback
      x86/mm: Regularize set_memory_p() parameters and make non-static
      x86/hyperv: Make encrypted/decrypted changes safe for load_unaligned_zeropad()

Michal Schmidt (1):
      ice: fix uninitialized dplls mutex usage

Michal Swiatkowski (1):
      ice: reconfig host after changing MSI-X on VF

Mika Westerberg (1):
      thunderbolt: Fix NULL pointer dereference in tb_port_update_credits()

Mike Yu (2):
      xfrm: fix xfrm child route lookup for packet offload
      xfrm: set skb control buffer based on packet offload as well

Moshe Shemesh (1):
      net/mlx5: Check capability for fw_reset

Nathan Chancellor (1):
      xfrm: Avoid clang fortify warning in copy_to_user_tmpl()

Neil Armstrong (3):
      arm64: dts: qcom: sm8650-qrd: add gpio74 as reserved gpio
      arm64: dts: qcom: sm8650-mtp: add gpio74 as reserved gpio
      usb: typec: ucsi: fix UCSI on SM8550 & SM8650 Qualcomm devices

Nicolas Pitre (1):
      vt: fix unicode buffer corruption when deleting characters

Niklas Cassel (1):
      mailmap: fix Kishon's email

Niklas Söderlund (1):
      dt-bindings: net: renesas,ethertsn: Document default for delays

Nirmoy Das (1):
      drm/i915: Check before removing mm notifier

Nuno Sa (1):
      counter: fix privdata alignment

Oleksij Rempel (1):
      net: lan78xx: fix runtime PM count underflow on link stop

Pablo Neira Ayuso (3):
      netfilter: nf_tables: disallow anonymous set with timeout flag
      netfilter: nf_tables: reject constant set with timeout
      netfilter: nf_tables: mark set as dead when unbinding anonymous
set with timeout

Paolo Bonzini (1):
      SEV: disable SEV-ES DebugSwap by default

Peter Collingbourne (1):
      serial: 8250_dw: Do not reclock if already at correct rate

Peter Martincic (1):
      hv_utils: Allow implicit ICTIMESYNCFLAG_SYNC

Puranjay Mohan (1):
      arm64: prohibit probing on arch_kunwind_consume_entry()

Qi Zheng (1):
      mm: userfaultfd: fix unexpected change to src_folio when UFFDIO_MOVE fails

Quentin Schulz (2):
      regulator: rk808: fix buck range on RK806
      regulator: rk808: fix LDO range on RK806

RD Babiera (1):
      usb: typec: altmodes/displayport: create sysfs nodes as driver's
default device attribute group

Rahul Rameshbabu (2):
      net/mlx5e: Use a memory barrier to enforce PTP WQ xmit
submission tracking occurs after populating the metadata_map
      net/mlx5e: Switch to using _bh variant of of spinlock API in
port timestamping NAPI poll context

Rand Deeb (1):
      net: ice: Fix potential NULL pointer dereference in ice_bridge_setlink()

Ricardo B. Marliere (1):
      Drivers: hv: vmbus: make hv_bus const

Rickard x Andersson (1):
      tty: serial: imx: Fix broken RS485

Rob Herring (1):
      ASoC: dt-bindings: nvidia: Fix 'lge' vendor prefix

Rodrigo Vivi (1):
      drm/xe: Return immediately on tile_init failure

Saeed Mahameed (1):
      Revert "net/mlx5e: Check the number of elements before walk TC rhashtable"

Sasha Neftin (1):
      intel: legacy: Partial revert of field get conversion

Saurabh Sengar (1):
      x86/hyperv: Allow 15-bit APIC IDs for VTL platforms

Sean Christopherson (8):
      KVM: x86: Mark target gfn of emulated atomic instruction as dirty
      KVM: Make KVM_MEM_GUEST_MEMFD mutually exclusive with KVM_MEM_READONLY
      KVM: x86: Update KVM_SW_PROTECTED_VM docs to make it clear they're a WIP
      KVM: x86/mmu: Restrict KVM_SW_PROTECTED_VM to the TDP MMU
      KVM: selftests: Create GUEST_MEMFD for relevant invalid flags testcases
      KVM: selftests: Add a testcase to verify GUEST_MEMFD and
READONLY are exclusive
      KVM: SVM: Flush pages under kvm->lock to fix UAF in
svm_register_enc_region()
      KVM: x86/mmu: Retry fault before acquiring mmu_lock if mapping is changing

Sherry Sun (1):
      tty: serial: fsl_lpuart: avoid idle preamble pending if CTS is enabled

Stefan Binding (3):
      ALSA: hda: cs35l41: Support Lenovo Thinkbook 16P
      ALSA: hda/realtek: Add quirks for Lenovo Thinkbook 16P laptops
      ALSA: hda: cs35l41: Overwrite CS35L41 configuration for ASUS UM5302LA

Steven Rostedt (Google) (7):
      tracing/net_sched: Fix tracepoints that save qdisc_dev() as a string
      tracing: Remove precision vsnprintf() check from print event
      tracing: Limit trace_seq size to just 8K and not depend on
architecture PAGE_SIZE
      tracing: Limit trace_marker writes to just 4K
      ring-buffer: Fix waking up ring buffer readers
      ring-buffer: Fix resetting of shortest_full
      tracing: Use .flush() call to wake up readers

Stuart Henderson (4):
      ASoC: madera: Fix typo in madera_set_fll_clks shift value
      ASoC: wm8962: Enable oscillator if selecting WM8962_FLL_OSC
      ASoC: wm8962: Enable both SPKOUTR_ENA and SPKOUTL_ENA in mono mode
      ASoC: wm8962: Fix up incorrect error message in wm8962_set_fll

Sumit Garg (1):
      tee: optee: Fix kernel panic caused by incorrect error handling

Suraj Kandpal (3):
      drm/i915/hdcp: Move to direct reads for HDCP
      drm/i915/hdcp: Remove additional timing for reading mst hdcp message
      drm/i915/hdcp: Extract hdcp structure from correct connector

Thierry Reding (1):
      arm64: tegra: Set the correct PHY mode for MGBE

Tobias Jakobi (Compleo) (1):
      net: dsa: microchip: fix register write order in ksz8_ind_write8()

Toke Høiland-Jørgensen (1):
      cpumap: Zero-initialise xdp_rxq_info struct before running XDP program

Tommy Huang (1):
      i2c: aspeed: Fix the dummy irq expected print

Tvrtko Ursulin (1):
      MAINTAINERS: Update email address for Tvrtko Ursulin

Uwe Kleine-König (1):
      Input: gpio_keys_polled - suppress deferred probe error for gpio

Vasileios Amoiridis (1):
      iio: pressure: Fixes BMP38x and BMP390 SPI support

Ville Syrjälä (1):
      drm/i915: Don't explode when the dig port we don't have an AUX CH

Vlastimil Babka (2):
      mm, vmscan: prevent infinite loop for costly GFP_NOIO |
__GFP_RETRY_MAYFAIL allocations
      mm, mmap: fix vma_merge() case 7 with vma_ops->close

Waiman Long (1):
      cgroup/cpuset: Fix a memory leak in update_exclusive_cpumask()

Wentong Wu (1):
      mei: Add Meteor Lake support for IVSC device

Xiubo Li (1):
      libceph: init the cursor when preparing sparse read in msgr2

Yicong Yang (1):
      serial: port: Don't suspend if the port is still busy

Yongzhi Liu (1):
      net: pds_core: Fix possible double free in error handling path

songxiebing (1):
      ALSA: hda: optimize the probe codec process

^ permalink raw reply	[relevance 51%]

* Re: [PATCH 0/6] tracing/ring-buffer: Fix wakeup of ring buffer waiters
  2024-03-08 21:39 99%     ` Linus Torvalds
@ 2024-03-08 21:41 99%       ` Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-03-08 21:41 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, linux-trace-kernel, Masami Hiramatsu, Mark Rutland,
	Mathieu Desnoyers, Andrew Morton, joel, linke li, Rabin Vincent

On Fri, 8 Mar 2024 at 13:39, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> So the above "complexity" is *literally* just changing the
>
>                   (new = atomic_read_acquire(&my->seq)) != old
>
> condition to
>
>                   should_exit ||
>                   (new = atomic_read_acquire(&my->seq)) != old

.. and obviously you'll need to add the exit condition to the actual
"deal with events" loop too.

                Linus

^ permalink raw reply	[relevance 99%]

* Re: [PATCH 0/6] tracing/ring-buffer: Fix wakeup of ring buffer waiters
  @ 2024-03-08 21:39 99%     ` Linus Torvalds
  2024-03-08 21:41 99%       ` Linus Torvalds
  0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-03-08 21:39 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, linux-trace-kernel, Masami Hiramatsu, Mark Rutland,
	Mathieu Desnoyers, Andrew Morton, joel, linke li, Rabin Vincent

On Fri, 8 Mar 2024 at 13:33, Steven Rostedt <rostedt@goodmis.org> wrote:
>
> There's two layers:
>
> 1) the ring buffer has the above simple producer / consumer.
>    Where the wake ups can happen at the point of where the buffer has
>    the amount filled that the consumer wants to start consuming with.
>
> 2) The tracing layer; Here on close of a file, the consumers need to be
>    woken up and not wait again. And just take whatever was there to finish
>    reading.
>
>    There's also another case that the ioctl() just kicks the current
>    readers out, but doesn't care about new readers.

But that's the beauty of just using the wait_event() model.

Just add that "exit" condition to the condition.

So the above "complexity" is *literally* just changing the

                  (new = atomic_read_acquire(&my->seq)) != old

condition to

                  should_exit ||
                  (new = atomic_read_acquire(&my->seq)) != old

(replace "should_exit" with whatever that condition is, of course) and
the wait_event() logic will take care of the rest.
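
A rough userspace sketch of that combined condition, using C11 atomics
rather than the kernel API. The "should_exit" flag is a hypothetical
field, and the polling loop is only a stand-in for wait_event(), which
would sleep on the wait queue instead. It is an illustration under
those assumptions, not the tracing code:

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for the "struct xyz" in the thread. */
struct xyz {
	atomic_int seq;          /* bumped once per produced event */
	atomic_bool should_exit; /* hypothetical "stop waiting" flag */
};

/*
 * The wait condition: true when the waiter has been told to exit or
 * when the sequence number has moved on from "old". The sequence
 * number is loaded first so that *new is always initialized, even
 * when the exit flag alone ends the wait.
 */
static bool seq_changed_or_exit(struct xyz *my, int old, int *new)
{
	*new = atomic_load_explicit(&my->seq, memory_order_acquire);
	return atomic_load_explicit(&my->should_exit, memory_order_acquire) ||
	       *new != old;
}

/* Polling stand-in for wait_event(); the kernel version blocks instead. */
static int wait_for_seq_change(struct xyz *my, int old)
{
	int new;

	while (!seq_changed_or_exit(my, old, &new))
		; /* spin; a real waiter would sleep on a wait queue here */
	return new;
}

/* Producer side: publish one event. The kernel version also calls wake_up(). */
static void add_seq_event(struct xyz *my)
{
	atomic_fetch_add_explicit(&my->seq, 1, memory_order_release);
}
```

Written literally as "should_exit || (new = ...) != old", the
short-circuit would leave "new" unassigned whenever the exit flag is
set, which is why the sketch reads the sequence number before testing
the flag.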

             Linus

^ permalink raw reply	[relevance 99%]

* Re: [PATCH 0/6] tracing/ring-buffer: Fix wakeup of ring buffer waiters
  @ 2024-03-08 20:39 96% ` Linus Torvalds
    0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-03-08 20:39 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, linux-trace-kernel, Masami Hiramatsu, Mark Rutland,
	Mathieu Desnoyers, Andrew Morton, joel, linke li, Rabin Vincent

On Fri, 8 Mar 2024 at 10:38, Steven Rostedt <rostedt@goodmis.org> wrote:
>
> A patch was sent to "fix" the wait_index variable that is used to help with
> waking of waiters on the ring buffer. The patch was rejected, but I started
> looking at associated code. Discussing it on IRC with Mathieu Desnoyers
> we discovered a design flaw.

Honestly, all of this seems excessively complicated.

And your new locking shouldn't be necessary if you just do things much
more simply.

Here's what I *think* you should do:

  struct xyz {
        ...
        atomic_t seq;
        struct wait_queue_head seq_wait;
        ...
  };

with the consumer doing something very simple like this:

        int seq = atomic_read_acquire(&my->seq);
        for (;;) {
                .. consume outstanding events ..
                seq = wait_for_seq_change(seq, my);
        }

and the producer being similarly trivial, just having a
"add_seq_event()" at the end:

        ... add whatever event ..
        add_seq_event(my);

And the helper functions for this are really darn simple:

  static inline int wait_for_seq_change(int old, struct xyz *my)
  {
        int new;
        wait_event(my->seq_wait,
                (new = atomic_read_acquire(&my->seq)) != old);
        return new;
  }

  static inline void add_seq_event(struct xyz *my)
  {
        atomic_fetch_inc_release(&my->seq);
        wake_up(&my->seq_wait);
  }

Note how you don't need any new locks, and note how "wait_event()"
will do all the required optimistic stuff for you (ie it will check
that "has seq changed" before even bothering to add itself to the wait
queue etc).

So the above is not only short and sweet, it generates fairly good
code too, and doesn't it look really simple and fairly understandable?

And - AS ALWAYS - the above isn't actually tested in any way, shape or form.

                 Linus

^ permalink raw reply	[relevance 96%]

* Re: [PATCH] rcutorture: Fix rcu_torture_pipe_update_one()/rcu_torture_writer() data race and concurrency bug
  @ 2024-03-07 22:09 87%                                     ` Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-03-07 22:09 UTC (permalink / raw)
  To: Julia Lawall
  Cc: Paul E. McKenney, Mathieu Desnoyers, Steven Rostedt, linke li,
	joel, boqun.feng, dave, frederic, jiangshanlai, josh,
	linux-kernel, qiang.zhang1211, quic_neeraju, rcu

On Thu, 7 Mar 2024 at 13:40, Julia Lawall <julia.lawall@inria.fr> wrote:
>
> I tried the following:
>
> @@
> expression x;
> @@
>
> *WRITE_ONCE(x,<+...READ_ONCE(x)...+>)
>
> This gave a number of results, shown below.  Let me know if some of them
> are undesirable.

Well, all the ones you list do look like garbage.

That said, quite often the garbage does seem to be "we don't actually
care about the result". Several of them look like statistics.

Some of them look outright nasty, though:

> --- /home/julia/linux/net/netfilter/nf_tables_api.c
> +++ /tmp/nothing/net/netfilter/nf_tables_api.c
> @@ -10026,8 +10026,6 @@ static unsigned int nft_gc_seq_begin(str
>         unsigned int gc_seq;
>
>         /* Bump gc counter, it becomes odd, this is the busy mark. */
> -       gc_seq = READ_ONCE(nft_net->gc_seq);
> -       WRITE_ONCE(nft_net->gc_seq, ++gc_seq);

The above is garbage code, and the comment implies that it is garbage
code that _should_ be reliable.

> diff -u -p /home/julia/linux/fs/xfs/xfs_icache.c /tmp/nothing/fs/xfs/xfs_icache.c
> --- /home/julia/linux/fs/xfs/xfs_icache.c
> +++ /tmp/nothing/fs/xfs/xfs_icache.c
> @@ -2076,8 +2076,6 @@ xfs_inodegc_queue(
>         cpu_nr = get_cpu();
>         gc = this_cpu_ptr(mp->m_inodegc);
>         llist_add(&ip->i_gclist, &gc->list);
> -       items = READ_ONCE(gc->items);
> -       WRITE_ONCE(gc->items, items + 1);

In contrast, this is also garbage code, but the only user of it seems
to be a heuristic, so if 'items' is off by one (or by a hundred), it
probably doesn't matter.

The xfs code is basically using that 'items' count to decide if it
really wants to do GC or not.

This is actually a case where having an "UNSAFE_INCREMENTISH()" macro
might make sense.

That said, this is also a case where using a "local_t" and using
"local_add_return()" might be a better option. It falls back on true
atomics, but at least on x86 you probably get *better* code generation
for the "incrementish" operation than you get with READ_ONCE ->
WRITE_ONCE.


> diff -u -p /home/julia/linux/kernel/rcu/tree.c /tmp/nothing/kernel/rcu/tree.c
> --- /home/julia/linux/kernel/rcu/tree.c
> +++ /tmp/nothing/kernel/rcu/tree.c
> @@ -1620,8 +1620,6 @@ static void rcu_gp_fqs(bool first_time)
>         /* Clear flag to prevent immediate re-entry. */
>         if (READ_ONCE(rcu_state.gp_flags) & RCU_GP_FLAG_FQS) {
>                 raw_spin_lock_irq_rcu_node(rnp);
> -               WRITE_ONCE(rcu_state.gp_flags,
> -                          READ_ONCE(rcu_state.gp_flags) & ~RCU_GP_FLAG_FQS);
>                 raw_spin_unlock_irq_rcu_node(rnp);

This smells bad to me. The code is holding a lock, but apparently not
one that protects gp_flags.

And that READ_ONCE->WRITE_ONCE sequence can corrupt all the other flags.

Maybe it's fine for some reason (that reason being either that the
ONCE operations aren't actually needed at all, or because nobody
*really* cares about the flags), but it smells.
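
To make the "corrupt all the other flags" worry concrete, here is a
single-threaded replay of one bad interleaving, written with C11
atomics in userspace. The flag names are hypothetical, and it assumes
nothing serializes the two writers; whether that is actually true of
gp_flags is exactly the open question above:

```c
#include <stdatomic.h>

#define FLAG_FQS   0x1u /* hypothetical bit that "thread A" clears */
#define FLAG_OTHER 0x2u /* hypothetical bit that "thread B" sets */

/*
 * A reads the flags, B sets FLAG_OTHER, then A writes back its stale
 * snapshot minus FLAG_FQS. B's update is silently overwritten.
 */
static unsigned int racy_clear(atomic_uint *flags)
{
	unsigned int snap;

	/* A: the READ_ONCE() half */
	snap = atomic_load_explicit(flags, memory_order_relaxed);
	/* B: lands in the window between A's load and A's store */
	atomic_fetch_or_explicit(flags, FLAG_OTHER, memory_order_relaxed);
	/* A: the WRITE_ONCE() half, based on the stale snapshot */
	atomic_store_explicit(flags, snap & ~FLAG_FQS, memory_order_relaxed);

	return atomic_load_explicit(flags, memory_order_relaxed);
}

/* Same two updates, but A uses one indivisible read-modify-write. */
static unsigned int rmw_clear(atomic_uint *flags)
{
	atomic_fetch_or_explicit(flags, FLAG_OTHER, memory_order_relaxed);
	atomic_fetch_and_explicit(flags, ~FLAG_FQS, memory_order_relaxed);

	return atomic_load_explicit(flags, memory_order_relaxed);
}
```

Starting from FLAG_FQS alone, racy_clear() ends with no bits set, while
rmw_clear() keeps FLAG_OTHER.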

> @@ -1882,8 +1880,6 @@ static void rcu_report_qs_rsp(unsigned l
>  {
>         raw_lockdep_assert_held_rcu_node(rcu_get_root());
>         WARN_ON_ONCE(!rcu_gp_in_progress());
> -       WRITE_ONCE(rcu_state.gp_flags,
> -                  READ_ONCE(rcu_state.gp_flags) | RCU_GP_FLAG_FQS);
>         raw_spin_unlock_irqrestore_rcu_node(rcu_get_root(), flags);

Same field, same lock held, same odd smelly pattern.

> -       WRITE_ONCE(rcu_state.gp_flags,
> -                  READ_ONCE(rcu_state.gp_flags) | RCU_GP_FLAG_FQS);
>         raw_spin_unlock_irqrestore_rcu_node(rnp_old, flags);

.. and again.

> --- /home/julia/linux/drivers/net/ethernet/cavium/liquidio/cn23xx_vf_device.c
> +++ /tmp/nothing/drivers/net/ethernet/cavium/liquidio/cn23xx_vf_device.c
> @@ -80,8 +80,6 @@ static int cn23xx_vf_reset_io_queues(str
>                                 q_no);
>                         return -1;
>                 }
> -               WRITE_ONCE(reg_val, READ_ONCE(reg_val) &
> -                          ~CN23XX_PKT_INPUT_CTL_RST);
>                 octeon_write_csr64(oct, CN23XX_VF_SLI_IQ_PKT_CONTROL64(q_no),
>                                    READ_ONCE(reg_val));

I suspect this is garbage that has been triggered by the usual
mindless "fix the symptoms, not the bug" as a result of an "undefined
behavior report".

> --- /home/julia/linux/kernel/kcsan/kcsan_test.c
> +++ /tmp/nothing/kernel/kcsan/kcsan_test.c
> @@ -381,7 +381,6 @@ static noinline void test_kernel_change_
>                 test_var ^= TEST_CHANGE_BITS;
>                 kcsan_nestable_atomic_end();
>         } else
> -               WRITE_ONCE(test_var, READ_ONCE(test_var) ^ TEST_CHANGE_BITS);

Presumably this is intentionally testing whether KCSAN notices these
things at all.

> diff -u -p /home/julia/linux/arch/s390/kernel/idle.c /tmp/nothing/arch/s390/kernel/idle.c
>         /* Account time spent with enabled wait psw loaded as idle time. */
> -       WRITE_ONCE(idle->idle_time, READ_ONCE(idle->idle_time) + idle_time);
> -       WRITE_ONCE(idle->idle_count, READ_ONCE(idle->idle_count) + 1);
>         account_idle_time(cputime_to_nsecs(idle_time));

This looks like another "UNSAFE_INCREMENTISH()" case.

> --- /home/julia/linux/mm/mmap.c
> +++ /tmp/nothing/mm/mmap.c
> @@ -3476,7 +3476,6 @@ bool may_expand_vm(struct mm_struct *mm,
>
>  void vm_stat_account(struct mm_struct *mm, vm_flags_t flags, long npages)
>  {
> -       WRITE_ONCE(mm->total_vm, READ_ONCE(mm->total_vm)+npages);

As does this.

> diff -u -p /home/julia/linux/fs/xfs/libxfs/xfs_iext_tree.c /tmp/nothing/fs/xfs/libxfs/xfs_iext_tree.c
>  static inline void xfs_iext_inc_seq(struct xfs_ifork *ifp)
>  {
> -       WRITE_ONCE(ifp->if_seq, READ_ONCE(ifp->if_seq) + 1);
>  }

Ugh. A sequence count that is "incrementish"? That smells wrong to me.
But I didn't go look at the users. Maybe it's another case of "we
don't *actually* care about the sequence count".

>
> +++ /tmp/nothing/drivers/net/ethernet/cavium/liquidio/cn23xx_pf_device.c
> @@ -379,8 +379,6 @@ static int cn23xx_reset_io_queues(struct
>                                 q_no);
>                         return -1;
>                 }
> -               WRITE_ONCE(reg_val, READ_ONCE(reg_val) &
> -                       ~CN23XX_PKT_INPUT_CTL_RST);
> ....
> -               WRITE_ONCE(d64, READ_ONCE(d64) &
> -                                       (~(CN23XX_PKT_INPUT_CTL_RING_ENB)));
> -               WRITE_ONCE(d64, READ_ONCE(d64) | CN23XX_PKT_INPUT_CTL_RST);


More "likely wrong" cases.

> +++ /tmp/nothing/mm/kfence/kfence_test.c
> @@ -501,7 +501,6 @@ static void test_kmalloc_aligned_oob_wri
>          * fault immediately after it.
>          */
>         expect.addr = buf + size;
> -       WRITE_ONCE(*expect.addr, READ_ONCE(*expect.addr) + 1);

Looks like questionable test-code again.

> +++ /tmp/nothing/io_uring/io_uring.c
> @@ -363,7 +363,6 @@ static void io_account_cq_overflow(struc
>  {
>         struct io_rings *r = ctx->rings;
>
> -       WRITE_ONCE(r->cq_overflow, READ_ONCE(r->cq_overflow) + 1);
>         ctx->cq_extra--;

Bah. Looks like garbage, but the kernel doesn't actually use that
value. Looks like a random number generator exposed to user space.
Presumably this is another "statistics, but I don't care enough".

> @@ -2403,8 +2402,6 @@ static bool io_get_sqe(struct io_ring_ct
> -                       WRITE_ONCE(ctx->rings->sq_dropped,
> -                                  READ_ONCE(ctx->rings->sq_dropped) + 1);

As is the above.

> +++ /tmp/nothing/security/apparmor/apparmorfs.c
> @@ -596,7 +596,6 @@ static __poll_t ns_revision_poll(struct
>
>  void __aa_bump_ns_revision(struct aa_ns *ns)
>  {
> -       WRITE_ONCE(ns->revision, READ_ONCE(ns->revision) + 1);
>         wake_up_interruptible(&ns->wait);

This looks like somebody copied the RCU / tracing pattern?

> +++ /tmp/nothing/arch/riscv/kvm/vmid.c
> @@ -90,7 +90,6 @@ void kvm_riscv_gstage_vmid_update(struct
>
>         /* First user of a new VMID version? */
>         if (unlikely(vmid_next == 0)) {
> -               WRITE_ONCE(vmid_version, READ_ONCE(vmid_version) + 1);
>                 vmid_next = 1;

Looks bogus and wrong. An unreliable address space version does _not_
sound sane, but who knows.

Anyway, from a quick look, there's a mix of "this is just wrong" and a
couple of "this seems to just want approximate statistics".

Maybe the RCU 'flags' field is using WRITE_ONCE() because while the
spinlock protects the bit changes, there are readers that look at
other bits with READ_ONCE.

That would imply that the READ_ONCE->WRITE_ONCE is just broken garbage
- the WRITE_ONCE() part may be right, but the READ_ONCE is wrong
because the value is stable.

                  Linus

^ permalink raw reply	[relevance 87%]

* Re: [PATCH] rcutorture: Fix rcu_torture_pipe_update_one()/rcu_torture_writer() data race and concurrency bug
  @ 2024-03-07 20:00 96%                               ` Linus Torvalds
    0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-03-07 20:00 UTC (permalink / raw)
  To: paulmck
  Cc: Mathieu Desnoyers, Steven Rostedt, linke li, joel, boqun.feng,
	dave, frederic, jiangshanlai, josh, linux-kernel,
	qiang.zhang1211, quic_neeraju, rcu

On Thu, 7 Mar 2024 at 11:47, Paul E. McKenney <paulmck@kernel.org> wrote:
>
> > - The per-thread counter (Thread-Local Storage) incremented by a single
> >   thread, read by various threads concurrently, is a good target
> >   for WRITE_ONCE()/READ_ONCE() pairing. This is actually what we do in
> >   various liburcu implementations which track read-side critical sections
> >   per-thread.
>
> Agreed, but do any of these use WRITE_ONCE(x, READ_ONCE(x) + 1) or
> similar?

Absolutely not.

The READ_ONCE->WRITE_ONCE pattern is almost certainly a bug.

The valid reason to have a WRITE_ONCE() is that there's a _later_
READ_ONCE() on another CPU.

So WRITE_ONCE->READ_ONCE (across threads) is very valid. But
READ_ONCE->WRITE_ONCE (inside a thread) simply is not a valid
operation.
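
As an illustration of why the load-then-store form is not an atomic
increment, the following userspace sketch (C11 atomics, hypothetical
names) replays in one thread what happens when anything else updates
the counter between the two halves. The "intruder" stands in for
another CPU, an interrupt or a preempting task:

```c
#include <stdatomic.h>

static atomic_int hits; /* hypothetical shared counter */

/* Whatever else touches the counter: another CPU, an IRQ, a preempting task. */
static void intruder(void)
{
	atomic_fetch_add_explicit(&hits, 1, memory_order_relaxed);
}

/* The WRITE_ONCE(x, READ_ONCE(x) + 1) shape, with the intruder in the window. */
static void incrementish(void)
{
	int v = atomic_load_explicit(&hits, memory_order_relaxed); /* READ_ONCE() */

	intruder(); /* runs between the load and the store */
	atomic_store_explicit(&hits, v + 1, memory_order_relaxed); /* WRITE_ONCE() */
}

/* A real read-modify-write: there is no window for the intruder to hit. */
static void increment(void)
{
	intruder();
	atomic_fetch_add_explicit(&hits, 1, memory_order_relaxed);
}
```

Starting from zero, incrementish() leaves the counter at 1 (one update
lost), while increment() leaves it at 2.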

We do have things like "local_t", which allows for non-smp-safe local
thread atomic accesses, but they explicitly are *NOT* about some kind
of READ_ONCE -> WRITE_ONCE sequence that by definition cannot be
atomic unless you disable interrupts and disable preemption (at which
point they become pointless and only generate worse code).

But the point of "local_t" is that you can do things that are safe if
there are no SMP issues. They are kind of an extension of the
percpu_add() kind of operations.

In fact, I think it might be interesting to catch those
READ_ONCE->WRITE_ONCE chains (perhaps with coccinelle?) because they
are a sign of bugs.

Now, there's certainly some possibility of "I really don't care about
some stats, I'm willing to do non-smp-safe and non-thread safe
operations if they are faster". So I'm not saying a
READ_ONCE->WRITE_ONCE data dependency is _always_ a bug, but I do
think it's a pattern that is very very close to being one.

             Linus

^ permalink raw reply	[relevance 96%]

* Re: [PATCH] rcutorture: Fix rcu_torture_pipe_update_one()/rcu_torture_writer() data race and concurrency bug
  2024-03-07  2:43 93%                     ` Linus Torvalds
@ 2024-03-07  2:49 99%                       ` Linus Torvalds
    1 sibling, 0 replies; 200+ results
From: Linus Torvalds @ 2024-03-07  2:49 UTC (permalink / raw)
  To: paulmck
  Cc: Steven Rostedt, linke li, joel, boqun.feng, dave, frederic,
	jiangshanlai, josh, linux-kernel, mathieu.desnoyers,
	qiang.zhang1211, quic_neeraju, rcu

On Wed, 6 Mar 2024 at 18:43, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> I dunno.

Oh, and just looking at that patch, I still think the code is confused.

On the reading side, we have:

    pipe_count = smp_load_acquire(&p->rtort_pipe_count);
    if (pipe_count > RCU_TORTURE_PIPE_LEN) {
        /* Should not happen, but... */

where that comment clearly says that the pipe_count we read (whether
with READ_ONCE() or with my smp_load_acquire() suggestion) should
never be larger than RCU_TORTURE_PIPE_LEN.

But the writing side very clearly did:

    i = rp->rtort_pipe_count;
    if (i > RCU_TORTURE_PIPE_LEN)
        i = RCU_TORTURE_PIPE_LEN;
    ...
    smp_store_release(&rp->rtort_pipe_count, ++i);

(again, syntactically it could have been "i + 1" instead of my "++i" -
same value), so clearly the writing side *can* write a value that is >
RCU_TORTURE_PIPE_LEN.
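
The arithmetic can be checked in isolation. This is only the clamp and
increment lifted out of the writer; the helper name is made up, and the
value 10 for RCU_TORTURE_PIPE_LEN is assumed here for illustration:

```c
#define RCU_TORTURE_PIPE_LEN 10 /* assumed value, for illustration only */

/* Writer-side update as quoted above: clamp first, then increment. */
static int next_pipe_count(int i)
{
	if (i > RCU_TORTURE_PIPE_LEN)
		i = RCU_TORTURE_PIPE_LEN;
	return i + 1; /* the value handed to smp_store_release() */
}
```

With i == RCU_TORTURE_PIPE_LEN the clamp does nothing and the stored
value is RCU_TORTURE_PIPE_LEN + 1, which is the case the reader-side
comment says should not happen.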

So while the whole READ/WRITE_ONCE vs smp_load_acquire/store_release
is one thing that might be worth looking at, I think there are other
very confusing aspects here.

           Linus

^ permalink raw reply	[relevance 99%]

* Re: [PATCH] rcutorture: Fix rcu_torture_pipe_update_one()/rcu_torture_writer() data race and concurrency bug
  @ 2024-03-07  2:43 93%                     ` Linus Torvalds
  2024-03-07  2:49 99%                       ` Linus Torvalds
    0 siblings, 2 replies; 200+ results
From: Linus Torvalds @ 2024-03-07  2:43 UTC (permalink / raw)
  To: paulmck
  Cc: Steven Rostedt, linke li, joel, boqun.feng, dave, frederic,
	jiangshanlai, josh, linux-kernel, mathieu.desnoyers,
	qiang.zhang1211, quic_neeraju, rcu

[-- Attachment #1: Type: text/plain, Size: 1173 bytes --]

On Wed, 6 Mar 2024 at 18:29, Paul E. McKenney <paulmck@kernel.org> wrote:
>
> TL;DR:  Those ->rtort_pipe_count increments cannot run concurrently
> with each other or any other update of that field, so that update-side
> READ_ONCE() call is unnecessary and the update-side plain C-language
> read is OK.  The WRITE_ONCE() calls are there for the benefit of the
> lockless read-side accesses to rtort_pipe_count.

Ahh. Ok. That makes a bit more sense.

So if that's the case, then the "updating side" should never use
READ_ONCE, because there's nothing else to protect against.

Honestly, this all makes me think that we'd be *much* better off
showing the real "handoff" with smp_store_release() and
smp_load_acquire().

IOW, something like this (TOTALLY UNTESTED!) patch, perhaps?

And please note that this patch is not only untested, it really is a
very handwavy patch.

I'm sending it as a patch just because it's a more precise way of
saying "I think the writers and readers could use the store-release ->
load-acquire not just to avoid any worries about accessing things
once, but also as a way to show the directional 'flow' of the data".
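
For readers less familiar with the pairing, a minimal userspace
analogue in C11 atomics is sketched below. The struct and function
names are hypothetical and this is not the rcutorture code; it only
shows the direction of the handoff, with the release store on the
writer side and the acquire load on the reader side:

```c
#include <stdatomic.h>
#include <stdbool.h>

struct item {
	int payload;           /* plain data, written before the handoff */
	atomic_int pipe_count; /* the handoff variable */
};

/* Writer: fill in the payload, then publish it with a store-release. */
static void publish(struct item *it, int payload, int count)
{
	it->payload = payload;
	atomic_store_explicit(&it->pipe_count, count, memory_order_release);
}

/*
 * Reader: load-acquire the count. Only if the expected count is seen
 * is the payload written before the matching release guaranteed to
 * be visible, so the payload is not touched otherwise.
 */
static bool consume(struct item *it, int expected, int *payload)
{
	if (atomic_load_explicit(&it->pipe_count, memory_order_acquire) != expected)
		return false;
	*payload = it->payload;
	return true;
}
```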

I dunno.

           Linus

[-- Attachment #2: patch.diff --]
[-- Type: text/x-patch, Size: 1690 bytes --]

 kernel/rcu/rcutorture.c | 11 +++++------
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/kernel/rcu/rcutorture.c b/kernel/rcu/rcutorture.c
index 7567ca8e743c..60b74df3eae2 100644
--- a/kernel/rcu/rcutorture.c
+++ b/kernel/rcu/rcutorture.c
@@ -461,12 +461,12 @@ rcu_torture_pipe_update_one(struct rcu_torture *rp)
 		WRITE_ONCE(rp->rtort_chkp, NULL);
 		smp_store_release(&rtrcp->rtc_ready, 1); // Pair with smp_load_acquire().
 	}
-	i = READ_ONCE(rp->rtort_pipe_count);
+	i = rp->rtort_pipe_count;
 	if (i > RCU_TORTURE_PIPE_LEN)
 		i = RCU_TORTURE_PIPE_LEN;
 	atomic_inc(&rcu_torture_wcount[i]);
-	WRITE_ONCE(rp->rtort_pipe_count, i + 1);
-	if (rp->rtort_pipe_count >= RCU_TORTURE_PIPE_LEN) {
+	smp_store_release(&rp->rtort_pipe_count, ++i);
+	if (i >= RCU_TORTURE_PIPE_LEN) {
 		rp->rtort_mbtest = 0;
 		return true;
 	}
@@ -1408,8 +1408,7 @@ rcu_torture_writer(void *arg)
 			if (i > RCU_TORTURE_PIPE_LEN)
 				i = RCU_TORTURE_PIPE_LEN;
 			atomic_inc(&rcu_torture_wcount[i]);
-			WRITE_ONCE(old_rp->rtort_pipe_count,
-				   old_rp->rtort_pipe_count + 1);
+			smp_store_release(&old_rp->rtort_pipe_count, ++i);
 
 			// Make sure readers block polled grace periods.
 			if (cur_ops->get_gp_state && cur_ops->poll_gp_state) {
@@ -1991,7 +1990,7 @@ static bool rcu_torture_one_read(struct torture_random_state *trsp, long myid)
 	rcu_torture_reader_do_mbchk(myid, p, trsp);
 	rtrsp = rcutorture_loop_extend(&readstate, trsp, rtrsp);
 	preempt_disable();
-	pipe_count = READ_ONCE(p->rtort_pipe_count);
+	pipe_count = smp_load_acquire(&p->rtort_pipe_count);
 	if (pipe_count > RCU_TORTURE_PIPE_LEN) {
 		/* Should not happen, but... */
 		pipe_count = RCU_TORTURE_PIPE_LEN;

^ permalink raw reply related	[relevance 93%]

* Re: [PATCH v2] x86: disable non-instrumented version of copy_mc when KMSAN is enabled
  @ 2024-03-07  0:09 99%       ` Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-03-07  0:09 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: Alexander Potapenko, Marco Elver, Dmitry Vyukov, kasan-dev, LKML,
	the arch/x86 maintainers, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, H. Peter Anvin

On Wed, 6 Mar 2024 at 14:08, Tetsuo Handa
<penguin-kernel@i-love.sakura.ne.jp> wrote:
>
> Something like below one?

I'd rather leave the regular fallbacks (to memcpy and copy_to_user())
alone, and I'd just put the

        kmsan_memmove(dst, src, len - ret);

etc in the places that currently just call the MC copy functions.

The copy_mc_to_user() logic is already set up for that, since it has
to do the __uaccess_begin/end().

Changing copy_mc_to_kernel() to look visually the same would only
improve on this horror-show, I feel.

Obviously some kmsan person needs to validate your kmsan_memmove() thing, but

> Can we assume that 0 <= ret <= len is always true?

Yes. It had better be for other reasons.

                  Linus

^ permalink raw reply	[relevance 99%]

* Re: [PATCH] rcutorture: Fix rcu_torture_pipe_update_one()/rcu_torture_writer() data race and concurrency bug
  2024-03-06 19:46 92%                 ` Linus Torvalds
@ 2024-03-06 20:20 97%                   ` Linus Torvalds
    1 sibling, 0 replies; 200+ results
From: Linus Torvalds @ 2024-03-06 20:20 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Paul E. McKenney, linke li, joel, boqun.feng, dave, frederic,
	jiangshanlai, josh, linux-kernel, mathieu.desnoyers,
	qiang.zhang1211, quic_neeraju, rcu

On Wed, 6 Mar 2024 at 11:46, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> That 'rtort_pipe_count' should be an atomic_t, and the "add one and
> return the old value" should be an "atomic_inc_return()-1" (the "-1"
> is because "inc_return" returns the *new* value).

Bah. I am lost in a twisty maze of operations, all the same.

One final correction to myself: if you want the old value, the nicer
thing to use is probably just "atomic_fetch_inc()".

It generates the same result as "atomic_inc_return()-1", but since we
do have that native "return old value" variant of this, let's just use
it.

So the rules are "atomic_op_return()" returns the new value after the
op, and "atomic_fetch_op()" returns the old value.

For some ops, this matters more than for others. For 'add' like
operations, you can deduce the old from the new (and vice versa).

But for bitwise ops, only the "fetch" version makes much sense,
because you can see the end result from that, but you can't figure out
the original value from the final one.

And to *really* confuse things, as with the memory ordering variants,
we don't always have the full complement of operations.

So we have atomic_fetch_and() (returns old version) and atomic_and()
(doesn't return any version), but we don't have "atomic_and_return()"
because it's less useful.

But for 'inc' we have all three.
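
The same naming split can be mimicked in userspace. C11's <stdatomic.h>
only provides the "fetch" (old value) forms, so in this sketch the
"_return" flavour is derived from it. The helper names are illustrative
wrappers, not the kernel functions:

```c
#include <stdatomic.h>

/* Analogue of atomic_fetch_inc(): returns the OLD value. */
static int fetch_inc(atomic_int *v)
{
	return atomic_fetch_add(v, 1);
}

/* Analogue of atomic_inc_return(): returns the NEW value (old + 1). */
static int inc_return(atomic_int *v)
{
	return atomic_fetch_add(v, 1) + 1;
}

/*
 * Analogue of atomic_fetch_and(): returns the OLD value. The caller
 * can compute the new value as (old & mask), but the cleared bits
 * could not be recovered from the new value alone.
 */
static int fetch_and(atomic_int *v, int mask)
{
	return atomic_fetch_and(v, mask);
}
```

For example, starting from 5, fetch_inc() returns 5 and a following
inc_return() returns 7.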

                        Linus

^ permalink raw reply	[relevance 97%]

* Re: [PATCH] rcutorture: Fix rcu_torture_pipe_update_one()/rcu_torture_writer() data race and concurrency bug
  @ 2024-03-06 20:06 94%                   ` Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-03-06 20:06 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Paul E. McKenney, linke li, joel, boqun.feng, dave, frederic,
	jiangshanlai, josh, linux-kernel, mathieu.desnoyers,
	qiang.zhang1211, quic_neeraju, rcu

On Wed, 6 Mar 2024 at 11:45, Steven Rostedt <rostedt@goodmis.org> wrote:
>
> Here's the back story. I received the following patch:
>
>   https://lore.kernel.org/all/tencent_BA1473492BC618B473864561EA3AB1418908@qq.com/
>
> I didn't like it. My reply was:
>
>         > -     rbwork->wait_index++;
>         > +     WRITE_ONCE(rbwork->wait_index, READ_ONCE(rbwork->wait_index) + 1);
>
>         I mean the above is really ugly. If this is the new thing to do, we need
>         better macros.
>
>         If anything, just convert it to an atomic_t.

The right thing is definitely to convert it to an atomic_t.

The memory barriers can probably also be turned into atomic ordering,
although we don't always have all the variants.

But for example, that

                /* Make sure to see the new wait index */
                smp_rmb();
                if (wait_index != work->wait_index)
                        break;

looks odd, and should probably do an "atomic_read_acquire()" instead
of an rmb and a (non-atomic and non-READ_ONCE) read.

The first READ_ONCE() should probably also be that atomic_read_acquire() op.

On the writing side, my gut feel is that the

        rbwork->wait_index++;
        /* make sure the waiters see the new index */
        smp_wmb();

should be an "atomic_inc_release(&rbwork->wait_index);" but we don't
actually have that operation. We only have the "release" versions for
things that return a value.

So it would probably need to be either

        atomic_inc(&rbwork->wait_index);
        /* make sure the waiters see the new index */
        smp_wmb();

or

        atomic_inc_return_release(&rbwork->wait_index);

or we'd need to add the "basic atomics with ordering semantics" (which
we aren't going to do unless we end up with a lot more people who want
them).
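
As a rough user-space sketch of the ordering being discussed, assuming
C11 <stdatomic.h> as a stand-in for the kernel's atomic_long_t helpers
(the function names below are made up for illustration, and a C11
release increment is only an approximation of "atomic_inc() followed
by smp_wmb()"):

```c
#include <stdatomic.h>

/* Hypothetical stand-in for the 'wait_index' field of struct rb_irq_work. */
static atomic_long wait_index;

/* Writer: bump the index; release ordering publishes earlier stores. */
static void wake_waiters(void)
{
	atomic_fetch_add_explicit(&wait_index, 1, memory_order_release);
}

/* Reader: take the initial snapshot of the index with acquire ordering. */
static long snapshot_index(void)
{
	return atomic_load_explicit(&wait_index, memory_order_acquire);
}

/* Reader: has the index moved since the snapshot was taken? */
static int index_changed(long snap)
{
	return atomic_load_explicit(&wait_index, memory_order_acquire) != snap;
}
```

Whether acquire/release is the ordering the ring-buffer code actually
needs is exactly the open question here; the sketch only shows what
the atomic form of the two sides could look like.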

I dunno. I didn't look all *that* closely at the code. The above might
be garbage too. Somebody who actually knows the code should think
about what ordering they actually were looking for.

(And I note that 'wait_index' is of type 'long' in 'struct
rb_irq_work', so I guess it should be "atomic_long_t" instead -  just
shows how little attention I paid on the first read-through, which
should make everybody go "I need to double-check Linus here")

               Linus

^ permalink raw reply	[relevance 94%]

* Re: [PATCH] rcutorture: Fix rcu_torture_pipe_update_one()/rcu_torture_writer() data race and concurrency bug
  @ 2024-03-06 19:46 92%                 ` Linus Torvalds
  2024-03-06 20:20 97%                   ` Linus Torvalds
    0 siblings, 2 replies; 200+ results
From: Linus Torvalds @ 2024-03-06 19:46 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Paul E. McKenney, linke li, joel, boqun.feng, dave, frederic,
	jiangshanlai, josh, linux-kernel, mathieu.desnoyers,
	qiang.zhang1211, quic_neeraju, rcu

On Wed, 6 Mar 2024 at 11:27, Steven Rostedt <rostedt@goodmis.org> wrote:
>
> Note this has nothing to do with tracing. This thread is in RCU. I just
> happen to receive the same patch "fix" for my code.

Ok, googling for rtort_pipe_count, I can only state that that code is
complete garbage.

And no amount of READ_ONCE/WRITE_ONCE will fix it.

For one thing, we have this code:

        WRITE_ONCE(rp->rtort_pipe_count, i + 1);
        if (rp->rtort_pipe_count >= RCU_TORTURE_PIPE_LEN) {

which is broken by design. The compiler is allowed to (and probably
does) turn that into just

        WRITE_ONCE(rp->rtort_pipe_count, i + 1);
        if (i + 1 >= RCU_TORTURE_PIPE_LEN) {

which only results in the question "Why didn't the source code do that
obvious simplification itself?"

So that code is actively *STUPID*. It's randomly mixing READ_ONCE and
regular reads in ways that just makes me go: "there's no saving this
shit".

This needs fixing. Having tests that have random code in them only
makes me doubt that the *TEST* itself is correct, rather than the code
it is trying to actually test.

And dammit, none of that makes sense anyway. This is not some
performance-critical code. Why is it not using proper atomics if there
is an actual data race?

The reason to use READ_ONCE() and WRITE_ONCE() is that they can be a
lot faster than atomics, or - more commonly - because you have some
fundamental algorithm that doesn't do arithmetic, but cares about some
"state at time X" (the RCU _pointer_ being one such obvious case, but
doing an *increment* sure as hell isn't).

So using those READ_ONCE/WRITE_ONCE macros for that thing is
fundamentally wrong to begin with.

The question should not be "should we add another READ_ONCE()". The
question should be "what drugs were people on when writing this code"?

People - please just stop writing garbage.

That 'rtort_pipe_count' should be an atomic_t, and the "add one and
return the old value" should be an "atomic_inc_return()-1" (the "-1"
is because "inc_return" returns the *new* value).

And feel free to add "_relaxed()" to that atomic op because this code
doesn't care about ordering of that counter. It will help on some
architectures, but as mentioned, this is not performance-critical code
to begin with.
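
A minimal user-space sketch of that "add one and return the old value"
operation, assuming C11 <stdatomic.h> in place of the kernel's
atomic_t (pipe_count, pipe_count_advance() and pipe_full_after() are
hypothetical names, not the rcutorture code):

```c
#include <stdatomic.h>

/* Hypothetical stand-in for rp->rtort_pipe_count. */
static atomic_int pipe_count;

/*
 * Atomically add one and return the OLD value, with relaxed ordering.
 * The kernel spelling would be "atomic_inc_return_relaxed(&v) - 1";
 * C11's fetch_add already returns the old value, so no "-1" is needed.
 */
static int pipe_count_advance(void)
{
	return atomic_fetch_add_explicit(&pipe_count, 1, memory_order_relaxed);
}

/* Test the value we just produced instead of re-reading the counter. */
static int pipe_full_after(int old, int pipe_len)
{
	return old + 1 >= pipe_len;
}
```

The limit check works on the value returned by the atomic op, so there
is no second, unordered read of the counter to reason about.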

                Linus

^ permalink raw reply	[relevance 92%]

* Re: [PATCH] rcutorture: Fix rcu_torture_pipe_update_one()/rcu_torture_writer() data race and concurrency bug
  2024-03-06 19:01 95%             ` Linus Torvalds
@ 2024-03-06 19:27 95%               ` Linus Torvalds
      1 sibling, 1 reply; 200+ results
From: Linus Torvalds @ 2024-03-06 19:27 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Paul E. McKenney, linke li, joel, boqun.feng, dave, frederic,
	jiangshanlai, josh, linux-kernel, mathieu.desnoyers,
	qiang.zhang1211, quic_neeraju, rcu

On Wed, 6 Mar 2024 at 11:01, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> In some individual tracing C file where it has a comment above it how
> it's braindamaged and unsafe and talking about why it's ok in that
> particular context? Go wild.

Actually, I take that back.

Even in a random C file, the naming makes no sense. There's no "once" about it.

So if you want to do something like

   #define UNSAFE_INCREMENTISH(x) (WRITE_ONCE(x, READ_ONCE(x) + 1))

then that's fine, I guess. Because that's what the operation is.

It's not safe, and it's not an increment, but it _approximates_ an
increment most of the time. So UNSAFE_INCREMENTISH() pretty much
perfectly describes what it is doing.

Note that you'll also almost certainly end up with worse code
generation, ie don't expect to see a single "inc" instruction (or "add
$1") for the above.

Because at least for gcc, the volatiles involved with those "ONCE"
operations end up often generating much worse code, so rather than an
"inc" instruction, you'll almost certainly get "load+add+store" and
the inevitable code expansion and extra register use.
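
To make the "not safe, and not an increment" point concrete, here is a
single-threaded sketch that replays one possible interleaving of two
CPUs by hand. The READ_ONCE/WRITE_ONCE definitions are simplified
user-space approximations (plain volatile accesses, using the GNU
__typeof__ extension), not the kernel's real macros:

```c
/* Simplified approximations of the kernel macros: volatile access only. */
#define READ_ONCE(x)		(*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, val)	(*(volatile __typeof__(x) *)&(x) = (val))

static long counter;

/* Two "CPUs" interleaved by hand: both load before either stores. */
static long lost_update_demo(void)
{
	long cpu0, cpu1;

	counter = 0;
	cpu0 = READ_ONCE(counter);	/* CPU0 loads 0 */
	cpu1 = READ_ONCE(counter);	/* CPU1 loads 0 */
	WRITE_ONCE(counter, cpu0 + 1);	/* CPU0 stores 1 */
	WRITE_ONCE(counter, cpu1 + 1);	/* CPU1 stores 1 again */
	return counter;			/* two "increments", one was lost */
}
```

Each individual load and store is perfectly well-defined; it is the
load+add+store sequence as a whole that is not atomic.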

I really don't know what you want to do, but it smells bad. A big
comment about why you'd want that "incrementish" operation will be
needed.

To me, this smells like "Steven did something fundamentally wrong
again, some tool is now complaining about it, and Steven doesn't want
to fix the problem but instead paper over it again".

Not a good look.

But I don't have a link to the original report, and I'm not thrilled
enough about this to go looking for it.

                  Linus

^ permalink raw reply	[relevance 95%]

* Re: [PATCH] rcutorture: Fix rcu_torture_pipe_update_one()/rcu_torture_writer() data race and concurrency bug
  @ 2024-03-06 19:01 95%             ` Linus Torvalds
  2024-03-06 19:27 95%               ` Linus Torvalds
    0 siblings, 2 replies; 200+ results
From: Linus Torvalds @ 2024-03-06 19:01 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Paul E. McKenney, linke li, joel, boqun.feng, dave, frederic,
	jiangshanlai, josh, linux-kernel, mathieu.desnoyers,
	qiang.zhang1211, quic_neeraju, rcu

On Wed, 6 Mar 2024 at 10:53, Steven Rostedt <rostedt@goodmis.org> wrote:
>
> Now, are you OK with an addition of ADD_ONCE() and/or INC_ONCE()? So that we
> don't have to look at:
>
>         WRITE_ONCE(a, READ_ONCE(a) + 1);
>
> ?

In a generic header file under include/linux/?

Absolutely not. The above is a completely broken operation. There is
no way in hell we should expose it as a general helper.

So there is no way we'd add that kind of sh*t-for-brains operation in
(for example) our <asm/rwonce.h> header file next to the normal
READ/WRITE_ONCE defines.

In some individual tracing C file where it has a comment above it how
it's braindamaged and unsafe and talking about why it's ok in that
particular context? Go wild.

But honestly, I do not see when a ADD_ONCE() would ever be a valid
thing to do, and *if* it's a valid thing to do, why you'd do it with
READ_ONCE and WRITE_ONCE.

If you don't care about races, just do a simple "++" and be done with
it. The end result is random.

Adding a "ADD_ONCE()" macro doesn't make it one whit less random. It
just makes a broken concept even uglier.

So honestly, I think the ADD_ONCE macro not only needs to be in some
tracing-specific C file, the comment needs to be pretty damn big too.
Because as a random number generator, it's not even a very good one.
So you need to explain *why* you want a particularly bad random number
generator in the first place.

                  Linus

^ permalink raw reply	[relevance 95%]

* Re: [PATCH] rcutorture: Fix rcu_torture_pipe_update_one()/rcu_torture_writer() data race and concurrency bug
  @ 2024-03-06 18:43 87%         ` Linus Torvalds
    0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-03-06 18:43 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Paul E. McKenney, linke li, joel, boqun.feng, dave, frederic,
	jiangshanlai, josh, linux-kernel, mathieu.desnoyers,
	qiang.zhang1211, quic_neeraju, rcu

On Wed, 6 Mar 2024 at 09:59, Steven Rostedt <rostedt@goodmis.org> wrote:
>
> IIRC, the original purpose of READ_ONCE() and WRITE_ONCE() was to make sure
> that the compiler only reads or writes the variable "once". Hence the name.
> That way after a load, you don't need to worry that the content of the
> variable you read isn't going to be read again from the original location
> because the compiler decided to save stack space and registers.
>
> But that macro has now been extended for other purposes.

Not really.

Tearing of simple types (as opposed to structures or bitfields or
"more than one word" or whatever) has never really been a real
concern.

It keeps being brought up as a "compilers could do this", but it's
basically just BS fear-mongering. Compilers _don't_ do it, and the
reason why compilers don't do it isn't some "compilers are trying to
be nice" issue, but simply a "it is insane and generates worse code"
issue.

So what happens is that READ_ONCE() and WRITE_ONCE() have always been
about reading and writing *consistent* values. There is no locking,
but the idea is - and has always been - that you get one *single*
answer from READ_ONCE(), and that single answer will always be
consistent with something that has been written by WRITE_ONCE.

That's often useful - lots of code doesn't really care if you get the
old or the new value, but the code *does* care that it gets *one*
value, and not some random mix of "I tested one value for validity,
then it got reloaded due to register pressure, and I actually used
another value".

And not some "I read one value, and it was a mix of two other values".

But in order to get those semantics, the READ_ONCE() and WRITE_ONCE()
macros don't do just the 'volatile' (to get the "no reloads"
guarantee), but they also do that "simple types" check.

So READ_ONCE/WRITE_ONCE has never really been "extended for other
purposes". The purpose has always been the same: one single consistent
value.
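
A small sketch of the "test one value, use the same value" pattern,
assuming a simplified volatile-cast READ_ONCE() and a made-up
'current_config' pointer that some other thread may replace at any
time (neither is taken from real kernel code):

```c
#include <stddef.h>

/* Simplified approximation of the kernel macro: one volatile load. */
#define READ_ONCE(x)	(*(const volatile __typeof__(x) *)&(x))

struct config {
	int limit;
};

/* Hypothetical shared pointer; may be changed concurrently. */
static struct config *current_config;

/* Load the pointer exactly once, then test and use that same value. */
static int get_limit(int fallback)
{
	struct config *cfg = READ_ONCE(current_config);

	if (!cfg)
		return fallback;
	return cfg->limit;	/* the tested value, never a reload */
}
```

Without the READ_ONCE(), the compiler would be free to check
'current_config' for NULL and then load it again for the dereference.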

What did happen is that our *original* name for this was not "read vs
write", but just "access".

So instead of "READ_ONCE(x)" you'd do "ACCESS_ONCE(x)", and instead of
"WRITE_ONCE(x,y)" you'd do "ACCESS_ONCE(x) = y".

And, to make matters more interesting, we had code that did that on
things that were *not* simple values. IOW, we'd have things like
ACCESS_ONCE() on things that literally *couldn't* be accessed as one
single value.

The most notable was accessing page table entries, which on multiple
architectures (including plain old 32-bit x86) ended up being two
words.

So the extension that *did* happen is that READ_ONCE and WRITE_ONCE
actually verify that the type is simple, and that you can't do a
64-bit READ_ONCE on a 32-bit architecture. Because then while you
might guarantee that the value isn't reloaded multiple times, you
cannot guarantee that you actually get a value that is consistent with
a WRITE_ONCE (because the reads and writes are both two operations).
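
The "simple types" check can be approximated in user space with a
compile-time size test. This is only a sketch under the assumption
that "simple" means "no wider than a long"; the kernel's own check
differs in its details, and ASSERT_WORD_SIZED() is a made-up name:

```c
/* Fails to compile (negative array size) if x is wider than a long. */
#define ASSERT_WORD_SIZED(x) \
	((void)sizeof(char[sizeof(x) <= sizeof(long) ? 1 : -1]))

/* One volatile load, but only for types that fit in a machine word. */
#define READ_ONCE(x) \
	(ASSERT_WORD_SIZED(x), *(const volatile __typeof__(x) *)&(x))

static long shared_word = 5;

static long read_word(void)
{
	return READ_ONCE(shared_word);
}

/*
 * By contrast, something like a two-word page table entry:
 *
 *	struct two_words { long lo, hi; } pte;
 *	... READ_ONCE(pte) ...
 *
 * would be rejected at build time instead of silently tearing.
 */
```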

Now, we've gotten rid of the whole ACCESS_ONCE() thing, and so some of
that history is no longer visible (although you can still see that
pattern in the rseq self-tests).

So yes, READ_ONCE/WRITE_ONCE do control "tearing", but realistically,
it was always only about the "complex values" kind of tearing that the
old ACCESS_ONCE() model silently and incorrectly allowed.

              Linus

^ permalink raw reply	[relevance 87%]

* Re: linux-next: build warning after merge of the vfs-brauner tree
  @ 2024-03-06  4:47 99%     ` Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-03-06  4:47 UTC (permalink / raw)
  To: Stephen Rothwell
  Cc: Christian Brauner, Tong Tiangen, Linux Kernel Mailing List,
	Linux Next Mailing List

On Tue, 5 Mar 2024 at 20:37, Stephen Rothwell <sfr@canb.auug.org.au> wrote:
>
> +static struct page *dump_page_copy(struct page *src, struct page *dst)
> +{
> +        return NULL;
> +}

No, it needs to be "return src;" not NULL.

That

  #define dump_page_copy(src, dst) ((dst), (src))

was supposed to be a "use 'dst', return 'src'" macro, and is correct
as that. The problem - as you noticed - is that it causes that "left
side of comma expression has no effect" warning.

(Technically it *does* have an effect - exactly the "argument is used"
one - but the compiler warning does make sense).

Actually, the simplest thing to do is probably just

  #define dump_page_free(x) ((void)(x))
  #define dump_page_copy(src, dst) (src)

where the "use" of the 'dump_page' argument is that dump_page_free()
void cast, and dump_page_copy() simply doesn't need to use it at all.
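
A stand-alone sketch of that variant, with a dummy 'struct page' and a
made-up emit_one() caller so it can be compiled outside the kernel.
The point is only that 'dump_page' is "used" by the void cast in
dump_page_free(), so neither the unused-variable warning nor the
comma-expression warning should trigger:

```c
#include <stddef.h>

struct page {
	int id;		/* dummy stand-in for the kernel's struct page */
};

#define dump_page_alloc()		((struct page *)8)	/* Not NULL */
#define dump_page_free(x)		((void)(x))
#define dump_page_copy(src, dst)	(src)

/* Hypothetical caller shaped like the loop body in dump_user_range(). */
static struct page *emit_one(struct page *page)
{
	struct page *dump_page = dump_page_alloc();
	struct page *out;

	out = dump_page_copy(page, dump_page);	/* returns 'page' itself */
	dump_page_free(dump_page);		/* the "use" of dump_page */
	return out;
}
```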

Christian?

            Linus

^ permalink raw reply	[relevance 99%]

* Re: linux-next: build warning after merge of the vfs-brauner tree
  @ 2024-03-06  2:48 99% ` Linus Torvalds
    0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-03-06  2:48 UTC (permalink / raw)
  To: Stephen Rothwell
  Cc: Christian Brauner, Tong Tiangen, Linux Kernel Mailing List,
	Linux Next Mailing List

On Tue, 5 Mar 2024 at 15:51, Stephen Rothwell <sfr@canb.auug.org.au> wrote:
>
> fs/coredump.c: In function 'dump_user_range':
> fs/coredump.c:923:40: warning: left-hand operand of comma expression has no effect [-Wunused-value]
>   923 | #define dump_page_copy(src, dst) ((dst), (src))
>       |                                        ^
> fs/coredump.c:948:58: note: in expansion of macro 'dump_page_copy'
>   948 |                         int stop = !dump_emit_page(cprm, dump_page_copy(page, dump_page));
>       |                                                          ^~~~~~~~~~~~~~
>
> Introduced by commit
>
>   4630f2caafcd ("coredump: get machine check errors early rather than during iov_iter")

Bah. It comes from that

  #define dump_page_copy(src,dst) ((dst),(src))

and I did it that way because I wanted to avoid *another* warning,
namely the "dst not used" thing.

But it would have probably been better to either make it an inline
function, or maybe an explicit cast, eg

  #define dump_page_copy(src,dst) ((void)(dst),(src))

or whatever.

                   Linus

^ permalink raw reply	[relevance 99%]

* Re: [PATCH v2] x86: disable non-instrumented version of copy_mc when KMSAN is enabled
  @ 2024-03-05 17:57 93%   ` Linus Torvalds
    0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-03-05 17:57 UTC (permalink / raw)
  To: Tetsuo Handa, Alexander Potapenko, Marco Elver, Dmitry Vyukov
  Cc: LKML, the arch/x86 maintainers, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, H. Peter Anvin

[ For the KMSAN people I brought in: this is the patch I'm NAK'ing:

    https://lore.kernel.org/all/3b7dbd88-0861-4638-b2d2-911c97a4cadf@I-love.SAKURA.ne.jp/

  and it looks like you were already cc'd on earlier versions (which
were even more broken) ]

On Tue, 5 Mar 2024 at 03:31, Tetsuo Handa
<penguin-kernel@i-love.sakura.ne.jp> wrote:
>
> Ping?

Please don't add new people and 'ping' without context. Very annoying.

That said, after having to search for it, that whole patch is
disgusting. Why make duplicated complex conditionals when you could
have just had the tests inside one #ifndef?

Also, that patch means that a KMSAN kernel potentially simply no
longer works on admittedly crappy hardware that almost doesn't exist.

So now a debug feature changes actual semantics in a big way. Not ok.

So I think this patch is ugly but also doubly incorrect.

I think the KMSAN people need to tell us how to tell kmsan that it's a
memcpy (and about the "I'm going to touch this part of memory", needed
for the "copy_mc_to_user" side).

So somebody needs to abstract out that

        depot_stack_handle_t origin;

        if (!kmsan_enabled || kmsan_in_runtime())
                return;

        kmsan_enter_runtime();
        /* Using memmove instead of memcpy doesn't affect correctness. */
        kmsan_internal_memmove_metadata(dst, (void *)src, n);
        kmsan_leave_runtime();

        set_retval_metadata(shadow, origin);

kind of thing, and expose it as a helper function for "I did something
that looks like a memory copy", the same way that we currently have
kmsan_copy_page_meta()

Because NO, IT IS NEVER CORRECT TO USE __msan_memcpy FOR THE MC COPIES.

So no. NAK on that patch. It's completely and utterly wrong.

The onus is firmly on the KMSAN people to give kernel people a way to
tell KMSAN to shut the f&%^ up about that.

End result: don't bother the x86 people until KMSAN has the required support.

               Linus

^ permalink raw reply	[relevance 93%]

* Re: [PATCH] coredump: get machine check errors early rather than during iov_iter
  @ 2024-03-05 17:29 71%     ` Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-03-05 17:29 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Christian Brauner, Tong Tiangen, linux-fsdevel, linux-kernel,
	wangkefeng.wang, Guohanjun, David Howells, Al Viro,
	Alexander Viro, Jan Kara, Andrew Morton

[-- Attachment #1: Type: text/plain, Size: 548 bytes --]

On Tue, 5 Mar 2024 at 08:39, Jens Axboe <axboe@kernel.dk> wrote:
>
> For what it's worth, checking the two patches, it's basically the one
> that Linus sent. I think it should have a From: based on that, and I
> also do not see Linus actually signing off on the patch, though that
> has been added to this one.
>
> Would probably be sane to get this one resent before applying, properly
> done.

I have a sign-off in my own test-tree, so it's all ok.

Sending my changelog just in case somebody wants to mix-and-match the two.

              Linus

[-- Attachment #2: 0001-iov_iter-get-rid-of-copy_mc-flag.patch --]
[-- Type: text/x-patch, Size: 8640 bytes --]

From 1077a0a82d0f9b93df4d66a63c5f758b11dc1bbb Mon Sep 17 00:00:00 2001
From: Linus Torvalds <torvalds@linux-foundation.org>
Date: Sat, 2 Mar 2024 09:35:13 -0800
Subject: [PATCH] iov_iter: get rid of 'copy_mc' flag

This flag is only set by one single user: the magical core dumping code
that looks up user pages one by one, and then writes them out using
their kernel addresses (by using a BVEC_ITER).

That actually ends up being a huge problem, because while we do use
copy_mc_to_kernel() for this case and it is able to handle the possible
machine checks involved, nothing else is really ready to handle the
failures caused by the machine check.

In particular, as reported by Tong Tiangen, we don't actually support
fault_in_iov_iter_readable() on a machine check area.

As a result, the usual logic for writing things to a file under a
filesystem lock, which involves doing a copy with page faults disabled
and then if that fails trying to fault pages in without holding the
locks with fault_in_iov_iter_readable() does not work at all.

We could decide to always just make the MC copy "succeed" (and filling
the destination with zeroes), and that would then create a core dump
file that just ignores any machine checks.

But honestly, this single special case has been problematic before, and
means that all the normal iov_iter code ends up slightly more complex
and slower.

See for example commit c9eec08bac96 ("iov_iter: Don't deal with
iter->copy_mc in memcpy_from_iter_mc()") where David Howells
re-organized the code just to avoid having to check the 'copy_mc' flags
inside the inner iov_iter loops.

So considering that we have exactly one user, and that one user is a
non-critical special case that doesn't actually ever trigger in real
life (Tong found this with manual error injection), the sane solution is
to just decide that the onus of handling the machine check lies on that
user instead.

Ergo, do the copy_mc_to_kernel() in the core dump logic itself, copying
the user data to a stable kernel page before writing it out.

Reported-by: Tong Tiangen <tongtiangen@huawei.com>
Link: https://lore.kernel.org/all/4e80924d-9c85-f13a-722a-6a5d2b1c225a@huawei.com/
Reviewed-by: David Howells <dhowells@redhat.com>
Tested-by: David Howells <dhowells@redhat.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 fs/coredump.c       | 41 ++++++++++++++++++++++++++++++++++++++---
 include/linux/uio.h | 16 ----------------
 lib/iov_iter.c      | 23 -----------------------
 3 files changed, 38 insertions(+), 42 deletions(-)

diff --git a/fs/coredump.c b/fs/coredump.c
index f258c17c1841..6a9b9f3280d8 100644
--- a/fs/coredump.c
+++ b/fs/coredump.c
@@ -872,6 +872,9 @@ static int dump_emit_page(struct coredump_params *cprm, struct page *page)
 	loff_t pos;
 	ssize_t n;
 
+	if (!page)
+		return 0;
+
 	if (cprm->to_skip) {
 		if (!__dump_skip(cprm, cprm->to_skip))
 			return 0;
@@ -884,7 +887,6 @@ static int dump_emit_page(struct coredump_params *cprm, struct page *page)
 	pos = file->f_pos;
 	bvec_set_page(&bvec, page, PAGE_SIZE, 0);
 	iov_iter_bvec(&iter, ITER_SOURCE, &bvec, 1, PAGE_SIZE);
-	iov_iter_set_copy_mc(&iter);
 	n = __kernel_write_iter(cprm->file, &iter, &pos);
 	if (n != PAGE_SIZE)
 		return 0;
@@ -895,10 +897,40 @@ static int dump_emit_page(struct coredump_params *cprm, struct page *page)
 	return 1;
 }
 
+/*
+ * If we might get machine checks from kernel accesses during the
+ * core dump, let's get those errors early rather than during the
+ * IO. This is not performance-critical enough to warrant having
+ * all the machine check logic in the iovec paths.
+ */
+#ifdef copy_mc_to_kernel
+
+#define dump_page_alloc() alloc_page(GFP_KERNEL)
+#define dump_page_free(x) __free_page(x)
+static struct page *dump_page_copy(struct page *src, struct page *dst)
+{
+	void *buf = kmap_local_page(src);
+	size_t left = copy_mc_to_kernel(page_address(dst), buf, PAGE_SIZE);
+	kunmap_local(buf);
+	return left ? NULL : dst;
+}
+
+#else
+
+#define dump_page_alloc() ((struct page *)8) // Not NULL
+#define dump_page_free(x) do { } while (0)
+#define dump_page_copy(src,dst) ((dst),(src))
+
+#endif
+
 int dump_user_range(struct coredump_params *cprm, unsigned long start,
 		    unsigned long len)
 {
 	unsigned long addr;
+	struct page *dump_page = dump_page_alloc();
+
+	if (!dump_page)
+		return 0;
 
 	for (addr = start; addr < start + len; addr += PAGE_SIZE) {
 		struct page *page;
@@ -912,14 +944,17 @@ int dump_user_range(struct coredump_params *cprm, unsigned long start,
 		 */
 		page = get_dump_page(addr);
 		if (page) {
-			int stop = !dump_emit_page(cprm, page);
+			int stop = !dump_emit_page(cprm, dump_page_copy(page, dump_page));
 			put_page(page);
-			if (stop)
+			if (stop) {
+				dump_page_free(dump_page);
 				return 0;
+			}
 		} else {
 			dump_skip(cprm, PAGE_SIZE);
 		}
 	}
+	dump_page_free(dump_page);
 	return 1;
 }
 #endif
diff --git a/include/linux/uio.h b/include/linux/uio.h
index bea9c89922d9..00cebe2b70de 100644
--- a/include/linux/uio.h
+++ b/include/linux/uio.h
@@ -40,7 +40,6 @@ struct iov_iter_state {
 
 struct iov_iter {
 	u8 iter_type;
-	bool copy_mc;
 	bool nofault;
 	bool data_source;
 	size_t iov_offset;
@@ -248,22 +247,8 @@ size_t _copy_from_iter_flushcache(void *addr, size_t bytes, struct iov_iter *i);
 
 #ifdef CONFIG_ARCH_HAS_COPY_MC
 size_t _copy_mc_to_iter(const void *addr, size_t bytes, struct iov_iter *i);
-static inline void iov_iter_set_copy_mc(struct iov_iter *i)
-{
-	i->copy_mc = true;
-}
-
-static inline bool iov_iter_is_copy_mc(const struct iov_iter *i)
-{
-	return i->copy_mc;
-}
 #else
 #define _copy_mc_to_iter _copy_to_iter
-static inline void iov_iter_set_copy_mc(struct iov_iter *i) { }
-static inline bool iov_iter_is_copy_mc(const struct iov_iter *i)
-{
-	return false;
-}
 #endif
 
 size_t iov_iter_zero(size_t bytes, struct iov_iter *);
@@ -355,7 +340,6 @@ static inline void iov_iter_ubuf(struct iov_iter *i, unsigned int direction,
 	WARN_ON(direction & ~(READ | WRITE));
 	*i = (struct iov_iter) {
 		.iter_type = ITER_UBUF,
-		.copy_mc = false,
 		.data_source = direction,
 		.ubuf = buf,
 		.count = count,
diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index e0aa6b440ca5..cf2eb2b2f983 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -166,7 +166,6 @@ void iov_iter_init(struct iov_iter *i, unsigned int direction,
 	WARN_ON(direction & ~(READ | WRITE));
 	*i = (struct iov_iter) {
 		.iter_type = ITER_IOVEC,
-		.copy_mc = false,
 		.nofault = false,
 		.data_source = direction,
 		.__iov = iov,
@@ -244,27 +243,9 @@ size_t _copy_mc_to_iter(const void *addr, size_t bytes, struct iov_iter *i)
 EXPORT_SYMBOL_GPL(_copy_mc_to_iter);
 #endif /* CONFIG_ARCH_HAS_COPY_MC */
 
-static __always_inline
-size_t memcpy_from_iter_mc(void *iter_from, size_t progress,
-			   size_t len, void *to, void *priv2)
-{
-	return copy_mc_to_kernel(to + progress, iter_from, len);
-}
-
-static size_t __copy_from_iter_mc(void *addr, size_t bytes, struct iov_iter *i)
-{
-	if (unlikely(i->count < bytes))
-		bytes = i->count;
-	if (unlikely(!bytes))
-		return 0;
-	return iterate_bvec(i, bytes, addr, NULL, memcpy_from_iter_mc);
-}
-
 static __always_inline
 size_t __copy_from_iter(void *addr, size_t bytes, struct iov_iter *i)
 {
-	if (unlikely(iov_iter_is_copy_mc(i)))
-		return __copy_from_iter_mc(addr, bytes, i);
 	return iterate_and_advance(i, bytes, addr,
 				   copy_from_user_iter, memcpy_from_iter);
 }
@@ -633,7 +614,6 @@ void iov_iter_kvec(struct iov_iter *i, unsigned int direction,
 	WARN_ON(direction & ~(READ | WRITE));
 	*i = (struct iov_iter){
 		.iter_type = ITER_KVEC,
-		.copy_mc = false,
 		.data_source = direction,
 		.kvec = kvec,
 		.nr_segs = nr_segs,
@@ -650,7 +630,6 @@ void iov_iter_bvec(struct iov_iter *i, unsigned int direction,
 	WARN_ON(direction & ~(READ | WRITE));
 	*i = (struct iov_iter){
 		.iter_type = ITER_BVEC,
-		.copy_mc = false,
 		.data_source = direction,
 		.bvec = bvec,
 		.nr_segs = nr_segs,
@@ -679,7 +658,6 @@ void iov_iter_xarray(struct iov_iter *i, unsigned int direction,
 	BUG_ON(direction & ~1);
 	*i = (struct iov_iter) {
 		.iter_type = ITER_XARRAY,
-		.copy_mc = false,
 		.data_source = direction,
 		.xarray = xarray,
 		.xarray_start = start,
@@ -703,7 +681,6 @@ void iov_iter_discard(struct iov_iter *i, unsigned int direction, size_t count)
 	BUG_ON(direction != READ);
 	*i = (struct iov_iter){
 		.iter_type = ITER_DISCARD,
-		.copy_mc = false,
 		.data_source = false,
 		.count = count,
 		.iov_offset = 0
-- 
2.44.0.rc1.22.g64314bd58b


^ permalink raw reply related	[relevance 71%]

* Re: [GIT PULL] tracing: Prevent trace_marker being bigger than unsigned short
  @ 2024-03-05  0:17 99%                                 ` Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-03-05  0:17 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: LKML, Masami Hiramatsu, Mathieu Desnoyers, Sachin Sant

On Mon, 4 Mar 2024 at 15:50, Steven Rostedt <rostedt@goodmis.org> wrote:
>
> But this still isn't fixing anything. It's just adding a limit.

Limiting things to a common maximum size is a good thing. The kernel
limits much more important things for very good reasons.

The kernel really shouldn't have big strings. EVER.  And it literally
shows in our kernel infrastructure. It showed in that vsnprintf
precision thing. It shows in our implementation choices, where we tend
to have simplistic implementations because doing things a byte at a
time is simple and cheap when the strings are limited in size (and we
don't want fancy and can't use vector state anyway).

If something as core as a pathname can be limited to 4kB, then
something as unimportant as a trace string had better be limited too.
Because we simply DO NOT WANT to have to deal with longer strings in
the kernel.

             Linus

^ permalink raw reply	[relevance 99%]

* Re: [GIT PULL] tracing: Prevent trace_marker being bigger than unsigned short
  @ 2024-03-04 23:20 98%                           ` Linus Torvalds
    0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-03-04 23:20 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: LKML, Masami Hiramatsu, Mathieu Desnoyers, Sachin Sant

On Mon, 4 Mar 2024 at 14:08, Steven Rostedt <rostedt@goodmis.org> wrote:
>
> Fine, I'll just remove the precision as that's not needed. There was no
> other overflows involved here.

I really want you to add the size check on the trace buffer *creation* side.

I don't understand why you refuse to accept the fact that the
precision warning found a PROBLEM.

And no, the fix was never to paper over the problem by limiting the
precision field. Hiding a problem isn't fixing it.

And no, the fix was also never to chop up the printing of the string
in smaller pieces to paper over the precision field. Again,
hiding a problem isn't fixing it.

And finally, NO, the fix was also never to add extra debug code to see
that there was a NUL character there.

The fix was *always* to simply not accept insanely long strings in the
first place, and make sure that the field was correctly *set*.

IOW, at *creation* time the code needed a proper check for length
(which obviously indirectly includes checking for the terminating NUL
character at that point).
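
As a sketch of what "check at creation time" could look like, here is
a user-space record constructor that refuses oversized input and
stores an explicit NUL. The MARKER_MAX limit, 'struct marker' and
marker_create() are all invented for illustration and are not the
tracing code's real names or limits:

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define MARKER_MAX 4096		/* hypothetical upper bound */

struct marker {
	size_t len;
	char text[MARKER_MAX + 1];
};

/* Validate the length once, when the record is created. */
static int marker_create(struct marker *m, const char *buf, size_t count)
{
	const char *nul;

	if (count > MARKER_MAX)
		return -EINVAL;		/* refuse crazy strings up front */

	/* Stop at an embedded NUL, but never look past 'count' bytes. */
	nul = memchr(buf, '\0', count);
	m->len = nul ? (size_t)(nul - buf) : count;
	memcpy(m->text, buf, m->len);
	m->text[m->len] = '\0';		/* always terminated when stored */
	return 0;
}
```

With the length and terminator guaranteed here, the printing side can
use a plain "%s" and needs no precision field or extra checks.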

Why do these threads with you always have to end up this long? Why do
I have to explain every single step of the way that you need to *FIX*
the problem, not try to hide it with new extra code?

                  Linus

^ permalink raw reply	[relevance 98%]

* Re: [GIT PULL] tracing: Prevent trace_marker being bigger than unsigned short
  @ 2024-03-04 21:50 99%                       ` Linus Torvalds
    0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-03-04 21:50 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: LKML, Masami Hiramatsu, Mathieu Desnoyers, Sachin Sant

On Mon, 4 Mar 2024 at 13:40, Steven Rostedt <rostedt@goodmis.org> wrote:
>
> As I mentioned that the design is based on that the allocated buffer size is
> the string length rounded up to the word size, all I need to do is to make
> sure that there's a nul terminating byte within the last word of the
> allocated buffer. Then "%s" is all I need.

Please don't add pointless code that helps nothing.

> Would this work for you?

No. This code only adds debug code, and doesn't actually improve anything.

We *have* debug code already. Things like KASAN already find array
overruns, and your ex-tempore debug code adds zero actual value.

That, btw, is why your old stupid precision code was not only
triggering warnings, but was ACTIVELY DETRIMENTAL.

All that precision code could ever do was to potentially hide bugs if
the string wasn't NUL-terminated.

So no. I absolutely do NOT want you to write more code to hide bugs or
do half-arsed checking.

I want you to *simplify* the code, and put proper limits in place for strings.

I want to see the code that actually notices when somebody generates a
crazy string, and stops that garbage in its tracks.

What I do *not* want to see is more ad-hoc code that tries to deal
with the symptoms of you not having done so.

                 Linus

^ permalink raw reply	[relevance 99%]

* Re: [bug report] dead loop in generic_perform_write() //Re: [PATCH v7 07/12] iov_iter: Convert iterate*() to inline funcs
  @ 2024-03-04 18:32 99%                 ` Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-03-04 18:32 UTC (permalink / raw)
  To: David Howells
  Cc: Tong Tiangen, Al Viro, Jens Axboe, Christoph Hellwig,
	Christian Brauner, David Laight, Matthew Wilcox, Jeff Layton,
	linux-fsdevel, linux-block, linux-mm, netdev, linux-kernel,
	Kefeng Wang

On Mon, 4 Mar 2024 at 03:56, David Howells <dhowells@redhat.com> wrote:
>
> That said, I wonder if:
>
>         #ifdef copy_mc_to_kernel
>
> should be:
>
>         #ifdef CONFIG_ARCH_HAS_COPY_MC

Hmm. Maybe. We do have that

  #ifdef copy_mc_to_kernel

pattern already in <linux/uaccess.h>, so clearly we've done it both ways.

I personally like the "just test for the thing you are using" model,
which is then why I did it that way, but I don't have hugely strong
opinions on it.
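
For readers unfamiliar with the idiom, a stand-alone sketch of "test
for the thing you are using": the provider defines a macro with the
same name as the function, and users test that name with #ifdef. The
function body below is a plain memcpy() stub and have_mc_copy() is a
made-up helper, so this only illustrates the preprocessor pattern:

```c
#include <stddef.h>
#include <string.h>

/* Stub standing in for an architecture-provided helper. */
static size_t copy_mc_to_kernel(void *dst, const void *src, size_t len)
{
	memcpy(dst, src, len);
	return 0;	/* number of bytes NOT copied */
}
/* Self-referential define: lets users detect the helper by name. */
#define copy_mc_to_kernel copy_mc_to_kernel

#ifdef copy_mc_to_kernel
static int have_mc_copy(void) { return 1; }
#else
static int have_mc_copy(void) { return 0; }
#endif
```

The alternative is a separate CONFIG_ARCH_HAS_COPY_MC style symbol,
which names the feature rather than the function being called.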

> and whether it's possible to find out dynamically if MCEs can occur at all.

I really wanted to do something like that, and look at the source page
to decide "is this a pmem page that can cause machine checks", but I
didn't find any obvious way to do that.

Improvement suggestions more than welcome.

               Linus

^ permalink raw reply	[relevance 99%]

* Linux 6.8-rc7
@ 2024-03-03 21:15 51% Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-03-03 21:15 UTC (permalink / raw)
  To: Linux Kernel Mailing List

So we finally have a week where things have calmed down, and in fact
6.8-rc7 is smaller than usual at this point in time. So if that keeps
up (but that's a fairly notable "if") I won't feel like I need to do
an rc8 this release after all.

So no guarantees, but assuming no bad surprises, we'll have the final
6.8 next weekend.

You can see the rc7 fixes in the shortlog below, and I don't think
there's anything particularly notable in there. It's not only fairly
small for an rc7, all the stats look fairly normal: just over half of
the diff is driver fixes, with the rest being a fairly random mix of
arch updates (powerpc and RISC-V dominate - although "dominate" may
not be the right word when it's all pretty small) some filesystem fixes
(btrfs stands out), some core networking and mm fixes, and some more
networking selftest updates.

It really is all pretty small. Let's hope it stays that way,

                  Linus

---

Abel Vesa (1):
      phy: qualcomm: eusb2-repeater: Rework init to drop redundant zero-out loop

Alex Deucher (1):
      Revert "drm/amd/pm: resolve reboot exception for si oland"

Alexander Ofitserov (1):
      gtp: fix use-after-free and null-ptr-deref in gtp_newlink()

Alexander Stein (1):
      phy: freescale: phy-fsl-imx8-mipi-dphy: Fix alias name to use dashes

Alexandre Ghiti (3):
      riscv: Fix build error if !CONFIG_ARCH_ENABLE_HUGEPAGE_MIGRATION
      Revert "riscv: mm: support Svnapot in huge vmap"
      riscv: Fix pte_leaf_size() for NAPOT

Amritha Nambiar (1):
      ice: Fix ASSERT_RTNL() warning during certain scenarios

Andre Werner (1):
      net: smsc95xx: add support for SYS TEC USB-SPEmodule1

Andy Shevchenko (1):
      gpiolib: Fix the error path order in gpiochip_add_data_with_key()

Aneesh Kumar K.V (IBM) (1):
      mm/debug_vm_pgtable: fix BUG_ON with pud advanced test

Ard Biesheuvel (3):
      crypto: arm64/neonbs - fix out-of-bounds access on short input
      efivarfs: Drop redundant cleanup on fill_super() failure
      efivarfs: Drop 'duplicates' bool parameter on efivar_init()

Arkadiusz Kubalewski (4):
      ice: fix dpll input pin phase_adjust value updates
      ice: fix dpll and dpll_pin data access on PF reset
      ice: fix dpll periodic work data updates on PF reset
      ice: fix pin phase adjust updates on PF reset

Arnd Bergmann (3):
      efi/capsule-loader: fix incorrect allocation size
      scsi: mpi3mr: Reduce stack usage in mpi3mr_refresh_sas_ports()
      drm/xe/mmio: fix build warning for BAR resize on 32-bit

Arturas Moskvinas (1):
      gpio: 74x164: Enable output pins after registers are reset

Bart Van Assche (1):
      fs/aio: Make io_cancel() generate completions again

Bartosz Golaszewski (1):
      gpio: fix resource unwinding order in error path

Benjamin Berg (1):
      wifi: iwlwifi: mvm: ensure offloading TID queue exists

Bjorn Andersson (1):
      pmdomain: qcom: rpmhpd: Fix enabled_corner aggregation

Byungchul Park (1):
      mm/vmscan: fix a bug calling wakeup_kswapd() with a wrong zone index

Christian König (1):
      drm/ttm/tests: depend on UML || COMPILE_TEST

Christophe Kerello (1):
      mmc: mmci: stm32: fix DMA API overlapping mappings warning

Christophe Leroy (1):
      kunit: Fix again checksum tests on big endian CPUs

Colin Ian King (1):
      ASoC: qcom: Fix uninitialized pointer dmactl

Conor Dooley (1):
      RISC-V: Ignore V from the riscv,isa DT property on older T-Head CPUs

Cristian Marussi (1):
      pmdomain: arm: Fix NULL dereference on scmi_perf_domain removal

Curtis Klein (1):
      dmaengine: fsl-qdma: init irq after reg initialization

Dave Airlie (1):
      nouveau: report byte usage in VRAM usage.

David Howells (1):
      afs: Fix endless loop in directory parsing

David Sterba (1):
      btrfs: dev-replace: properly validate device names

Davide Caratti (1):
      mptcp: fix double-free on socket dismantle

Dimitris Vlachos (1):
      riscv: Sparse-Memory/vmemmap out-of-bounds fix

Dmitry Baryshkov (2):
      phy: qcom-qmp-usb: fix v3 offsets data
      Revert "drm/msm/dp: use drm_bridge_hpd_notify() to report HPD
status changes"

Doug Smythies (1):
      cpufreq: intel_pstate: fix pstate limits enforcement for
adjust_perf call back

Elad Nachman (3):
      mtd: rawnand: marvell: fix layouts
      mmc: sdhci-xenon: fix PHY init clock stability
      mmc: sdhci-xenon: add timeout for PHY init complete

Emmanuel Grumbach (1):
      wifi: iwlwifi: mvm: fix the TXF mapping for BZ devices

Eniac Zhang (1):
      ALSA: hda/realtek: fix mute/micmute LED For HP mt440

Eric Dumazet (3):
      ipv6: fix potential "struct net" leak in inet6_rtm_getaddr()
      dpll: rely on rcu for netdev_dpll_pin()
      dpll: fix build failure due to rcu_dereference_check() on unknown type

Fei Wu (1):
      perf: RISCV: Fix panic on pmu overflow handler

Felix Fietkau (1):
      wifi: mac80211: only call drv_sta_rc_update for uploaded stations

Fenghua Yu (2):
      dmaengine: idxd: Remove shadow Event Log head stored in idxd
      dmaengine: idxd: Ensure safe user copy of completion record

Filipe Manana (6):
      btrfs: send: don't issue unnecessary zero writes for trailing hole
      btrfs: fix data races when accessing the reserved amount of block reserves
      btrfs: fix data race at btrfs_use_block_rsv() when accessing block reserve
      btrfs: fix race between ordered extent completion and fiemap
      btrfs: ensure fiemap doesn't race with writes when
FIEMAP_FLAG_SYNC is given
      btrfs: fix double free of anonymous device after snapshot creation failure

Florian Westphal (4):
      netlink: add nla be16/32 types to minlen array
      net: ip_tunnel: prevent perpetual headroom growth
      netfilter: bridge: confirm multicast packets before passing them
up the stack
      selftests: netfilter: add bridge conntrack + multicast test case

Francois Dugast (1):
      drm/xe/uapi: Remove unused flags

Frank Li (2):
      dmaengine: fsl-edma: correct max_segment_size setting
      dmaengine: fsl-qdma: add __iomem and struct in union to fix sparse warning

Frédéric Danis (1):
      Bluetooth: mgmt: Fix limited discoverable off timeout

Gaurav Batra (1):
      powerpc/pseries/iommu: IOMMU table is not initialized for kdump
over SR-IOV

Geliang Tang (3):
      mptcp: map v4 address to v6 when destroying subflow
      selftests: mptcp: rm subflow with v4/v4mapped addr
      selftests: mptcp: join: add ss mptcp support check

Geoff Levand (1):
      ps3/gelic: Fix SKB allocation

Gergo Koteles (1):
      ALSA: hda/realtek: tas2781: enable subwoofer volume control

Haiyue Wang (1):
      Documentations: correct net_cachelines title for struct inet_sock

Han Xu (1):
      mtd: spinand: gigadevice: Fix the get ecc status issue

Hans Peter (1):
      ALSA: hda/realtek: Enable Mute LED on HP 840 G8 (MB 8AB8)

Hans de Goede (1):
      power: supply: bq27xxx-i2c: Do not free non existing IRQ

Herbert Xu (1):
      crypto: lskcipher - Copy IV in lskcipher glue code always

Ignat Korchagin (1):
      netfilter: nf_tables: allow NFPROTO_INET in nft_(match/target)_validate()

Ivan Semenov (1):
      mmc: core: Fix eMMC initialization with 1-bit bus connection

Jakub Kicinski (4):
      net: veth: clear GRO when clearing XDP even when down
      selftests: net: veth: test syncing GRO and XDP state while device is down
      veth: try harder when allocating queue memory
      tools: ynl: fix handling of multiple mcast groups

Jakub Raczynski (1):
      stmmac: Clear variable when destroying workqueue

Janaki Ramaiah Thota (1):
      Bluetooth: hci_qca: Set BDA quirk bit if fwnode exists in DT

Jaroslav Kysela (1):
      ALSA: pcm: clarify and fix default msbits value for all formats

Jason Gunthorpe (1):
      iommufd/selftest: Don't check map/unmap pairing with HUGE_PAGES

Javier Carrasco (1):
      net: usb: dm9601: fix wrong return value in dm9601_mdio_read

Jay Ajit Mate (1):
      ALSA: hda/realtek: Fix top speaker connection on Dell Inspiron
16 Plus 7630

Jeff Johnson (2):
      MAINTAINERS: wifi: update Jeff Johnson e-mail address
      MAINTAINERS: wifi: Add N: ath1*k entries to match .yaml files

Jeremy Kerr (1):
      net: mctp: take ownership of skb in mctp_local_output

Jiawei Wang (2):
      ASoC: amd: yc: add new YC platform variant (0x63) support
      ASoC: amd: yc: Fix non-functional mic on Lenovo 21J2

Jiri Bohac (1):
      x86/e820: Don't reserve SETUP_RNG_SEED in e820

Jiri Slaby (SUSE) (1):
      fbcon: always restore the old font data in fbcon_do_set_font()

Jisheng Zhang (1):
      riscv: tlb: fix __p*d_free_tlb()

Johan Hovold (4):
      drm/bridge: aux-hpd: fix OF node leaks
      drm/bridge: aux-hpd: separate allocation and registration
      soc: qcom: pmic_glink_altmode: fix drm bridge use-after-free
      Bluetooth: hci_bcm4377: do not mark valid bd_addr as invalid

Johannes Berg (1):
      wifi: nl80211: reject iftype change with mesh ID change

Johannes Thumshirn (1):
      btrfs: zoned: don't skip block group profile checks on conventional zones

Johnny Hsieh (1):
      ASoC: amd: yc: Add Lenovo ThinkBook 21J0 into DMI quirk table

Jonas Dreßler (1):
      Bluetooth: hci_sync: Check the correct flag before starting a scan

José Roberto de Souza (1):
      drm/xe/uapi: Remove DRM_XE_VM_BIND_FLAG_ASYNC comment left over

Joy Zou (1):
      dmaengine: fsl-edma: correct calculation of 'nbytes' in
multi-fifo scenario

Justin Iurman (1):
      uapi: in6: replace temporary label with rfc9486

Kai-Heng Feng (1):
      Bluetooth: Enforce validation on max value of connection interval

Kailang Yang (1):
      ALSA: hda/realtek - ALC285 reduce pop noise from Headphone port

Kory Maincent (6):
      dmaengine: dw-edma: Fix the ch_count hdma callback
      dmaengine: dw-edma: Fix wrong interrupt bit set for HDMA
      dmaengine: dw-edma: HDMA_V0_REMOTEL_STOP_INT_EN typo fix
      dmaengine: dw-edma: Add HDMA remote interrupt configuration
      dmaengine: dw-edma: HDMA: Add sync read before starting the DMA
transfer in remote setup
      dmaengine: dw-edma: eDMA: Add sync read before starting the DMA
transfer in remote setup

Kurt Kanzenbach (1):
      net: stmmac: Complete meta data only when enabled

Lin Ma (1):
      rtnetlink: fix error logic of IFLA_BRIDGE_FLAGS writing back

Linus Torvalds (1):
      Linux 6.8-rc7

Lorenzo Stoakes (1):
      MAINTAINERS: add memory mapping entry with reviewers

Lucas De Marchi (1):
      drm/xe: Use pointers in trace events

Luiz Augusto von Dentz (2):
      Bluetooth: hci_sync: Fix accept_list when attempting to suspend
      Bluetooth: hci_event: Fix handling of HCI_EV_IO_CAPA_REQUEST

Lukas Bulwahn (1):
      MAINTAINERS: repair entry for MICROCHIP MCP16502 PMIC DRIVER

Lukasz Majewski (2):
      net: hsr: Fix typo in the hsr_forward_do() function comment
      net: hsr: Use correct offset for HSR TLV values in supervisory HSR frames

Ma Jun (1):
      drm/amdgpu/pm: Fix the power1_min_cap value

Maarten Lankhorst (1):
      drm/xe: Add uapi for dumpable bos

Marco Elver (2):
      stackdepot: use variable size records for non-evictable entries
      kasan: revert eviction of stack traces in generic mode

Mark Brown (1):
      spi: Drop mismerged fix

Mark O'Donovan (1):
      fs/ntfs3: fix build without CONFIG_NTFS3_LZX_XPRESS

Masami Hiramatsu (Google) (1):
      fprobe: Fix to allocate entry_data_size buffer with rethook instances

Matthew Auld (3):
      drm/buddy: fix range bias
      drm/buddy: check range allocation matches alignment
      drm/tests/drm_buddy: add alloc_range_bias test

Matthew Brost (3):
      drm/xe: Fix execlist splat
      drm/xe: Don't support execlists in xe_gt_tlb_invalidation layer
      drm/xe: Use vmalloc for array of bind allocation in bind IOCTL

Matthieu Baerts (NGI0) (1):
      mptcp: avoid printing warning once on client side

Michael Ellerman (1):
      selftests/powerpc: Fix fpu_signal failures

Mickaël Salaün (3):
      selinux: fix lsm_get_self_attr()
      apparmor: fix lsm_get_self_attr()
      landlock: Fix asymmetric private inodes referring

Mika Kuoppala (2):
      drm/xe: Expose user fence from xe_sync_entry
      drm/xe: Deny unbinds if uapi ufence pending

Mikko Perttunen (1):
      gpu: host1x: Skip reset assert on Tegra186

Ming Lei (1):
      block: define bvec_iter as __packed __aligned(4)

Miquel Raynal (1):
      mtd: Fix possible refcounting issue when going through partition nodes

Naresh Solanki (1):
      regulator: max5970: Fix regulator child node name

Nathan Chancellor (2):
      kbuild: Add -Wa,--fatal-warnings to as-instr invocation
      RISC-V: Drop invalid test from CONFIG_AS_HAS_OPTION_ARCH

Nathan Lynch (1):
      powerpc/rtas: use correct function name for resetting TCE tables

Nhat Pham (1):
      mm: cachestat: fix folio read-after-free in cache walk

Nicolin Chen (3):
      iommufd: Fix iopt_access_list_id overwrite bug
      iommufd/selftest: Fix mock_dev_num bug
      iommufd: Fix protection fault in iommufd_test_syz_conv_iova

Oleksij Rempel (3):
      lan78xx: enable auto speed configuration for LAN7850 if no
EEPROM is detected
      net: lan78xx: fix "softirq work is pending" error
      igb: extend PTP timestamp adjustments to i211

Paolo Abeni (5):
      mptcp: push at DSS boundaries
      mptcp: fix snd_wnd initialization for passive socket
      mptcp: fix potential wake-up event loss
      mptcp: fix possible deadlock in subflow diag
      selftests: mptcp: explicitly trigger the listener diag code-path

Paolo Bonzini (2):
      x86/cpu: Allow reducing x86_phys_bits during early_identify_cpu()
      x86/cpu/intel: Detect TME keyid bits before setting MTRR mask registers

Paulo Zanoni (1):
      drm/xe: get rid of MAX_BINDS

Peng Ma (1):
      dmaengine: fsl-qdma: fix SoC may hang on 16 byte unaligned read

Prike Liang (1):
      drm/amdgpu: Enable gpu reset for S3 abort cases on Raven series

Priyanka Dandamudi (2):
      drm/xe/xe_bo_move: Enhance xe_bo_move trace
      drm/xe/xe_trace: Add move_lacks_source detail to xe_bo_move trace

Rafael J. Wysocki (1):
      Revert "ACPI: EC: Use a spin lock without disabing interrupts"

Randy Dunlap (1):
      net: ethernet: adi: move PHYLIB from vendor to driver symbol

Ranjan Kumar (1):
      scsi: mpt3sas: Prevent sending diag_reset when the controller is ready

Richard Fitzgerald (2):
      ASoC: cs35l56: Must clear HALO_STATE before issuing SYSTEM_RESET
      ASoC: soc-card: Fix missing locking in snd_soc_card_get_kcontrol()

Rob Clark (1):
      soc: qcom: pmic_glink: Fix boot when QRTR=m

Ryan Lin (1):
      drm/amd/display: Add monitor patch for specific eDP

Ryosuke Yasuoka (1):
      netlink: Fix kernel-infoleak-after-free in __skb_datagram_iter

Sabrina Dubroca (4):
      tls: decrement decrypt_pending if no async completion will be called
      tls: fix peeking with sync+async decryption
      tls: separate no-async decryption request handling from async
      tls: fix use-after-free on failed backlog decryption

Samuel Holland (4):
      MAINTAINERS: Update SiFive driver maintainers
      riscv: Fix enabling cbo.zero when running in M-mode
      riscv: Add a custom ISA extension for the [ms]envcfg CSR
      riscv: Save/restore envcfg CSR during CPU suspend

Saravana Kannan (1):
      of: property: fw_devlink: Fix stupid bug in remote-endpoint parsing

Shannon Nelson (3):
      ionic: check before releasing pci regions
      ionic: check cmd_regs before copying in or out
      ionic: restore netdev feature bits after reset

Shiyang Ruan (1):
      xfs: drop experimental warning for FSDAX

Sid Pranjale (1):
      drm/nouveau: keep DMA buffers required for suspend/resume

Srinivasan Shanmugam (1):
      drm/amd/display: Prevent potential buffer overflow in map_hw_resources

Tadeusz Struk (1):
      dmaengine: ptdma: use consistent DMA masks

Takashi Iwai (2):
      ALSA: ump: Fix the discard error code from snd_ump_legacy_open()
      ALSA: Drop leftover snd-rtctimer stuff from Makefile

Takashi Sakamoto (2):
      ALSA: firewire-lib: fix to check cycle continuity
      firewire: core: use long bus reset on gap count error

Tetsuo Handa (1):
      tomoyo: fix UAF write bug in tomoyo_write_control()

Thierry Reding (1):
      drm/tegra: Remove existing framebuffer only if we support display

Thomas Weißschuh (1):
      power: supply: mm8013: select REGMAP_I2C

Théo Lebrun (4):
      spi: cadence-qspi: fix pointer reference in runtime PM hooks
      spi: cadence-qspi: remove system-wide suspend helper calls from
runtime PM hooks
      spi: cadence-qspi: put runtime in runtime PM hooks names
      spi: cadence-qspi: add system-wide suspend and resume callbacks

Tim Schumacher (1):
      efivarfs: Request at most 512 bytes for variable names

Vadim Shakirov (2):
      drivers: perf: added capabilities for legacy PMU
      drivers: perf: ctr_get_width function for legacy is not defined

Vladimir Oltean (1):
      net: dpaa: fman_memac: accept phy-interface-type = "10gbase-r"
in the device tree

Willian Wang (1):
      ALSA: hda/realtek: Add special fixup for Lenovo 14IRP8

Xiubo Li (1):
      ceph: switch to corrected encoding of max_xattr_size in mdsmap

Yang Yingliang (1):
      phy: qcom: phy-qcom-m31: fix wrong pointer pass to PTR_ERR()

Yangyu Chen (1):
      riscv: mm: fix NOCACHE_THEAD does not set bit[61] correctly

Ying Hsu (1):
      Bluetooth: Avoid potential use-after-free in hci_error_reset

Yochai Hagvi (1):
      ice: fix connection state of DPLL and out pin

Yuezhang Mo (1):
      exfat: fix appending discontinuous clusters to empty file

Yunjian Wang (1):
      tun: Fix xdp_rxq_info's queue_index when detaching

Yuxuan Hu (1):
      Bluetooth: rfcomm: Fix null-ptr-deref in rfcomm_check_security

Zhangfei Gao (1):
      iommu/sva: Fix SVA handle sharing in multi device case

Zijun Hu (3):
      Bluetooth: hci_event: Fix wrongly recorded wakeup BD_ADDR
      Bluetooth: qca: Fix wrong event type for patch config command
      Bluetooth: qca: Fix triggering coredump implementation

Zong Li (1):
      riscv: add CALLER_ADDRx support

^ permalink raw reply	[relevance 51%]

* Re: [GIT PULL] tracing: Prevent trace_marker being bigger than unsigned short
  @ 2024-03-03 20:09 79%                 ` Linus Torvalds
    0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-03-03 20:09 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: LKML, Masami Hiramatsu, Mathieu Desnoyers, Sachin Sant

On Sun, 3 Mar 2024 at 11:07, Steven Rostedt <rostedt@goodmis.org> wrote:
>
> The string in question isn't some random string. It's a print event on
> the ring buffer where the size is strlen(p) rounded up to word size.
> That means, max will be no bigger than word-size - 1 greater than
> strlen(p). That means the chunks of 1024 will never land in the middle
> of garbage.

What a piece of unbelievable crap.

So you didn't actually want the precision in the first place, then you
started limiting it to an insane value because the printk code
complains about insane precision, and then you want to "fix" that by
printing it out in chunks where you know the chunk size won't hit
garbage, but that ends up being a random implementation detail that
you don't even document in that chunking code.

> > What was wrong with saying "don't do that"? You seem to be bending
> > over backwards to doing stupid things, and then insisting on doing
> > them entirely wrong.
>
> Don't do what?

You have this pattern of not actually thinking through the code AT
ALL, and just fixing symptoms, and then making the code worse.

The whole "let's avoid the symptom of the kernel printk code telling
us that 32kB string precision is crazy by putting a 32kB-1 limit on
it" was clearly just papering over a symptom, not fixing the problem.

Doing insane chunking in 1kB pieces was another "let's paper over the
symptom, not fix the problem".

And now you finally admit that the actual problem was that YOUR
PRECISION WAS STUPID TO BEGIN WITH.

Do you really not see what the truly _fundamental_ problem here is?

Kernel code doesn't "paper over" stuff. We do things *right*. No more
of this crap.

You really need to take a deep look at what you are doing. I spend
more time on your pull requests than I want to, exactly because you
have had this pattern of doing something wrong in the first place, and
then adding MORE CODE to paper over all the problems that initial
wrong decision causes.

This was *exactly* the same thing that happened on the tracefs
side. You did things wrong, and then you spent a lot of effort in
trying to patch up the resulting problems, instead of going back and
doing it *right*.

And honestly, I still think that the fundamental mistake you have done
is to let people say "I want to have these big strings" and you just
roll over and say "sure, we'll create shit kernel code for you".

WTF do you think it's fine to say "let people do insane things"
instead of just telling people that no, we have sane and small limits.

As a maintainer, one of your jobs is to say "No, we're not doing crazy
stuff". I still think that having so big strings that this came up in
the first place is a sign of the deeper problem, and then the fact
that you had an insane and pointless precision field was just a small
implementation issue.

Doing tracing in the kernel is not some kind of general-purpose thing.
It's ok - in fact, it's a really damn good idea - to just tell people
"yes, you can add strings, but dammit, there needs to be sanity to
it".

So I now tell you that you should

 (a) get rid of the stupid and nonsensical precision

 (b) tell people that their strings are limited (and that 4kB is an
_upper_ value to sane string lengths in the kernel)

 (c) really fundamentally stop with the "paper over" things approach
to kernel programming

Large strings are not a "feature". They are a bug.

It's also sad that apparently your strings are counted, but you don't
count them very well, so instead of just using the count (which is
*much* cheaper) you end up using '%s' and do things until you hit a
NUL byte.  Guess what? All our printk infrastructure is designed for
small strings, so '%s' isn't exactly optimized, because we expected
sanity. It ends up in a loop that literally does things one byte at a
time.

And no. The solution is *not* to paper that over by making '%s'
printing more efficient. It's not supposed to need that kind of
efficiency.

Christ. *IF* large strings were a good idea, and you actually almost
have the length encoded, this whole tracing code could have used the
fact that you have that approximate count to do something like

    len = fieldsize-1;
    len &= ~(sizeof(unsigned long)-1);
    len += strlen(fieldbase + len);
    seq_write(buf, fieldbase, len);

and at least now you'd have something *efficient*. Which is at least a
source of pride in itself.
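
The length computation above can be tried out in userspace with a sketch like the following. It assumes the field size is strlen()+1 rounded up to a multiple of sizeof(unsigned long), so that the terminating NUL always sits in the last word of the field; the helper names are hypothetical, and the final seq_write() call is left out because it only exists in the kernel.

```c
#include <stddef.h>
#include <string.h>

/* Round n up to a multiple of the word size (hypothetical helper). */
static size_t round_up_word(size_t n)
{
	const size_t w = sizeof(unsigned long);

	return (n + w - 1) & ~(w - 1);
}

/*
 * Recover the exact string length from an approximate (word-rounded)
 * field size. Assumes fieldsize >= 1 and that the NUL terminator lies
 * somewhere in the last word of the field.
 */
static size_t counted_strlen(const char *fieldbase, size_t fieldsize)
{
	size_t len = fieldsize - 1;

	len &= ~(sizeof(unsigned long) - 1);	/* start of the last word */
	len += strlen(fieldbase + len);		/* walk at most one word */
	return len;
}
```

With a 20-character string, round_up_word(21) is 24, and counted_strlen() only has to scan the tail of the buffer byte by byte instead of the whole string.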

But since I really think the core of the problem is "we shouldn't have
allowed this kind of crap", I think efficiency - while a source of
pride - is polishing a turd and more of the "paper over the
fundamental issue".

And no, the above code sequence isn't wonderfully pretty either. Using
'strlen()' to find the last NUL character in a word is disgusting, and
our kernel 'strlen()' isn't some optimized thing either.

We do have fancier cases for fancier code (the word-at-a-time stuff),
but at least the above only walks things one byte at a time for a tiny
sequence.

                   Linus

^ permalink raw reply	[relevance 79%]

* Re: [GIT PULL] tracing: Prevent trace_marker being bigger than unsigned short
  @ 2024-03-03 17:38 99%             ` Linus Torvalds
    0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-03-03 17:38 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: LKML, Masami Hiramatsu, Mathieu Desnoyers, Sachin Sant

On Sun, 3 Mar 2024 at 04:59, Steven Rostedt <rostedt@goodmis.org> wrote:
>
> -       trace_seq_printf(s, ": %.*s", max, field->buf);
> +       trace_seq_puts(s, ": ");
> +       /* Write 1K chunks at a time */
> +       p = field->buf;
> +       do {
> +               int pre = max > 1024 ? 1024 : max;
> +               trace_seq_printf(s, "%.*s", pre, p);
> +               max -= pre;
> +               p += pre;
> +       } while (max > 0);

The above loop is complete garbage.

If 'p' is a string, you're randomly just walking past the end of the
string with 'p += pre'

And if 'p' isn't a string but has a fixed size, you shouldn't use '%s'
in the first place, you should just use seq_write().
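
A small userspace model of the objection, as a sketch only: snprintf() stands in for trace_seq_printf(), a chunk size parameter stands in for the hard-coded 1024, and both helper names are hypothetical. When the buffer holds a NUL-terminated string shorter than 'max', every chunk after the first starts past the terminator and prints whatever bytes happen to be there.

```c
#include <stdio.h>
#include <string.h>

/* Model of the quoted loop: print 'max' bytes of p in "%.*s" chunks. */
static size_t print_chunked(char *out, size_t outsz, const char *p,
			    int max, int chunk)
{
	size_t used = 0;

	if (outsz == 0 || chunk <= 0)
		return 0;
	out[0] = '\0';
	while (max > 0) {
		int pre = max > chunk ? chunk : max;
		int n = snprintf(out + used, outsz - used, "%.*s", pre, p);

		if (n < 0 || (size_t)n >= outsz - used)
			break;		/* output buffer full; stop */
		used += (size_t)n;
		max -= pre;
		p += pre;		/* steps past the NUL if the string was short */
	}
	return used;
}

/* Counted-buffer alternative: copy exactly len bytes, no format parsing. */
static size_t write_counted(char *out, size_t outsz, const char *p, size_t len)
{
	if (len > outsz)
		len = outsz;
	memcpy(out, p, len);
	return len;
}
```

Feeding print_chunked() a 16-byte field that contains "abc", a NUL, and then stale bytes, with a chunk size of 4, emits the stale bytes as well; a single "%.*s" over the whole field stops at the NUL.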

Just stop. You are doing things entirely wrong, and you're just adding
random code.

I'm not taking *any* fixes from you as things are now, you're once
again only making things worse.

What was wrong with saying "don't do that"? You seem to be bending
over backwards to doing stupid things, and then insisting on doing
them entirely wrong.

               Linus

^ permalink raw reply	[relevance 99%]

* Re: arch/x86/include/asm/processor.h:698:16: sparse: sparse: incorrect type in initializer (different address spaces)
  @ 2024-03-02 22:49 99%           ` Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-03-02 22:49 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: kernel test robot, oe-kbuild-all, linux-kernel, Arjan van de Ven,
	x86, Luc Van Oostenryck, Sparse Mailing-list

On Sat, 2 Mar 2024 at 14:00, Thomas Gleixner <tglx@linutronix.de> wrote:
>
> I had commented out both. But the real reason is the EXPORT_SYMBOL,
> which obviously wants to be EXPORT_PER_CPU_SYMBOL_GPL...

Side note: while it's nice to hear that sparse kind of got this right,
I wonder what gcc does when we start using the named address spaces
for percpu variables.

We actively make EXPORT_PER_CPU_SYMBOL_XYZ be a no-op for sparse
exactly because sparse ended up warning about the regular
EXPORT_SYMBOL, and we didn't have any "real" per-cpu export model.

So EXPORT_PER_CPU_SYMBOL_GPL() is kind of an artificial "shut up
sparse". But with __seg_gs/fs support for native percpu symbols with
gcc, I wonder if we'll hit the same thing. Or is there something that
makes gcc not warn about the named address spaces?

Because in many ways the gcc named address spaces _should_ be pretty
much equivalent to the sparse ones.

            Linus

^ permalink raw reply	[relevance 99%]

* Re: [GIT PULL] tracing: Prevent trace_marker being bigger than unsigned short
  @ 2024-03-02 20:55 99%         ` Linus Torvalds
    0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-03-02 20:55 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: LKML, Masami Hiramatsu, Mathieu Desnoyers, Sachin Sant

On Sat, 2 Mar 2024 at 12:47, Steven Rostedt <rostedt@goodmis.org> wrote:
>
> I'm fine with just making it 4K with a comment saying that "4K is the
> minimum page size on most archs, and to keep this consistent for crazy
> architectures like PowerPC and its 64K pages, we hard code 4K to keep
> all architectures acting the same".

4k is at least a somewhat sane limit, and yes, being hw-independent is
a good idea.

We have other strings that have that limit for similar reasons (ie PATH_MAX).

                Linus

^ permalink raw reply	[relevance 99%]

* Re: [GIT PULL] tracing: Prevent trace_marker being bigger than unsigned short
    2024-03-02 20:25 99%     ` Linus Torvalds
@ 2024-03-02 20:33 99%     ` Linus Torvalds
    1 sibling, 1 reply; 200+ results
From: Linus Torvalds @ 2024-03-02 20:33 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: LKML, Masami Hiramatsu, Mathieu Desnoyers

On Sat, 2 Mar 2024 at 12:00, Steven Rostedt <rostedt@goodmis.org> wrote:
>
> The error isn't printk, it's vsnprintf() that is writing to a seq_file
> to user space. There's no stack or printk involved here.

Look again. The code uses 'struct printf_spec' and we literally have a

   static_assert(sizeof(struct printf_spec) == 8);

because we want the compiler to generate sane calling conventions and
not waste space and code with arguments on the stack. That's literally
why we do all those limits in a bitfield - because the code in
question is written to say "unreasonable people can go screw
themselves".
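
To make the size argument concrete, here is a sketch of a format spec packed into bitfields so that the whole thing fits in 8 bytes. The struct name, field names and widths are assumptions chosen for illustration (a signed 16-bit precision is what would give a 32kB-1 ceiling); bitfield layout is implementation-defined, so the size assertion is only expected to hold on common GCC/Clang ABIs.

```c
#include <assert.h>

/* Illustrative layout only: two 32-bit groups of bitfields. */
struct spec_sketch {
	unsigned int type:8;		/* which conversion */
	signed int   field_width:24;	/* width of the output field */
	unsigned int flags:8;
	unsigned int base:8;
	signed int   precision:16;	/* max chars/digits */
};

/* Small enough to pass around by value in registers. */
static_assert(sizeof(struct spec_sketch) == 8, "spec must stay 8 bytes");

#define SKETCH_PRECISION_MAX ((1 << 15) - 1)	/* 32767 */

/* Clamp a requested precision into what the 16-bit field can hold. */
static int clamp_precision(int precision)
{
	if (precision < 0)
		return 0;
	if (precision > SKETCH_PRECISION_MAX)
		return SKETCH_PRECISION_MAX;
	return precision;
}
```

Widening the precision field to accept larger strings would push the struct past 8 bytes, which is the trade-off the message above refuses to make.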

I'm not interested in arguing this. We're not doing some completely
idiotic "let's edge up to the physical limit of what our printk code
is willing to do".

I'm perfectly happy having that WARN_ON() to continue to tell people
they are doing stupid things that won't work.

And if you ever decide that a sane limit is ok, you can send that in.

            Linus

^ permalink raw reply	[relevance 99%]

* Re: [GIT PULL] tracing: Prevent trace_marker being bigger than unsigned short
  @ 2024-03-02 20:25 99%     ` Linus Torvalds
  2024-03-02 20:33 99%     ` Linus Torvalds
  1 sibling, 0 replies; 200+ results
From: Linus Torvalds @ 2024-03-02 20:25 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: LKML, Masami Hiramatsu, Mathieu Desnoyers

On Sat, 2 Mar 2024 at 12:00, Steven Rostedt <rostedt@goodmis.org> wrote:
>
> I don't have control over the strings. Anyone can do in user space:
>
>         fd = open("/sys/kernel/tracing/trace_marker", O_WRONLY);
>         r = write(fd, huge_string, 10000000);

So?

Stop the stupidity.

You already limit the string.

Just limit it to a sane value. If somebody uses a 10kB trace marker,
return an error, or just truncate it to 100 bytes.
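
The two options (error out, or truncate) could look roughly like this in a write handler. This is a userspace sketch under stated assumptions: MARKER_MAX and marker_check_len() are hypothetical names, and the 4096-byte cap is just one example of a "sane value", not something taken from the tracing code.

```c
#include <errno.h>
#include <stddef.h>

#define MARKER_MAX 4096		/* hypothetical cap on one marker write */

/*
 * Decide how many bytes of a 'count'-byte write to accept.
 * Returns the accepted length, or -EINVAL if the write is oversized
 * and the caller asked for rejection rather than truncation.
 */
static long marker_check_len(size_t count, int truncate)
{
	if (count <= MARKER_MAX)
		return (long)count;
	if (truncate)
		return MARKER_MAX;
	return -EINVAL;
}
```

Either policy keeps the limit in one obvious place at the point where the string enters the kernel, instead of having the output side cope with arbitrarily large strings later.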

You already were willing to truncate it to 32kB. Use your brain, and
realize that 32kB is a ridiculous limit.

Why do I even need to tell you this? I'm getting really tired of
having these idiotic arguments with you.

         Linus

^ permalink raw reply	[relevance 99%]

* Re: [bug report] dead loop in generic_perform_write() //Re: [PATCH v7 07/12] iov_iter: Convert iterate*() to inline funcs
  2024-03-02 18:06 93%                 ` Linus Torvalds
@ 2024-03-02 18:11 99%                   ` Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-03-02 18:11 UTC (permalink / raw)
  To: Tong Tiangen
  Cc: Al Viro, David Howells, Jens Axboe, Christoph Hellwig,
	Christian Brauner, David Laight, Matthew Wilcox, Jeff Layton,
	linux-fsdevel, linux-block, linux-mm, netdev, linux-kernel,
	Kefeng Wang

On Sat, 2 Mar 2024 at 10:06, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> In other words, it's the usual "Enterprise Hardware" situation. Looks
> fancy on paper, costs an arm and a leg, and the reality is just sad,
> sad, sad.

Don't get me wrong. I'm sure large companies are more than willing to
sell other large companies very expensive support contracts and have
engineers that they fly out to deal with the problems all these
enterprise solutions have.

The problem *will* get fixed somehow, it's just going to cost you. A lot.

Because THAT is what Enterprise Hardware is all about.

                  Linus

^ permalink raw reply	[relevance 99%]

* Re: [bug report] dead loop in generic_perform_write() //Re: [PATCH v7 07/12] iov_iter: Convert iterate*() to inline funcs
  @ 2024-03-02 18:06 93%                 ` Linus Torvalds
  2024-03-02 18:11 99%                   ` Linus Torvalds
  0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-03-02 18:06 UTC (permalink / raw)
  To: Tong Tiangen
  Cc: Al Viro, David Howells, Jens Axboe, Christoph Hellwig,
	Christian Brauner, David Laight, Matthew Wilcox, Jeff Layton,
	linux-fsdevel, linux-block, linux-mm, netdev, linux-kernel,
	Kefeng Wang

On Sat, 2 Mar 2024 at 01:37, Tong Tiangen <tongtiangen@huawei.com> wrote:
>
> I think this solution has two impacts:
> 1. Although it is not a performance-critical path, the CPU usage may be
> affected by one more memory copy in some large-memory applications.

Compared to the IO, the extra memory copy is a non-issue.

If anything, getting rid of the "copy_mc" flag removes extra code in a
much more important path (ie the normal iov_iter code).

> 2. If a hardware memory error occurs in "good location" and the
> ".copy_mc" is removed, the kernel will panic.

That's always true. We do not support non-recoverable machine checks
on kernel memory. Never have, and realistically probably never will.

In fact, as far as I know, the hardware that caused all this code in
the first place no longer exists, and never really made it to wide
production.

The machine checks in question happened on pmem, now killed by Intel.
It's possible that somebody wants to use it for something else, but
let's hope any future implementations are less broken than the
unbelievable sh*tshow that caused all this code in the first place.

The whole copy_mc_to_kernel() mess exists mainly due to broken pmem
devices along with old and broken CPU's that did not deal correctly
with machine checks inside the regular memory copy ('rep movs') code,
and caused hung machines.

IOW, notice how 'copy_mc_to_kernel()' just becomes a regular
'memcpy()' on fixed hardware, and how we have that disgusting
copy_mc_fragile_key that gets enabled for older CPU cores.

And yes, we then have copy_mc_enhanced_fast_string() which isn't
*that* disgusting, and that actually handles machine checks properly
on more modern hardware, but it's still very much "the hardware is
misdesigned, it has no testing, and nobody sane should depend on this".

In other words, it's the usual "Enterprise Hardware" situation. Looks
fancy on paper, costs an arm and a leg, and the reality is just sad,
sad, sad.

               Linus

^ permalink raw reply	[relevance 93%]

* Re: [GIT PULL] tracing: Prevent trace_marker being bigger than unsigned short
  @ 2024-03-02 17:24 99% ` Linus Torvalds
    0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-03-02 17:24 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: LKML, Masami Hiramatsu, Mathieu Desnoyers

On Sat, 2 Mar 2024 at 08:10, Steven Rostedt <rostedt@goodmis.org> wrote:
>
> - The change to allow trace_marker writes to be as big as the trace_seq can
>   hold, and also the change that increases the size of the trace_seq to two
>   pages, caused PowerPC kselftest trace_marker test to fail. The trace_marker
>   kselftest writes up to subbuffer size which is determined by PAGE_SIZE.
>   On PowerPC, the PAGE_SIZE can be 64K, which means the selftest will write
>   a string that is around 64K in size. The output of the trace_marker is
>   performed with a vsnprintf("%.*s", size, string), but this write would make
>   the size greater than 32K, which is the max precision of "%.*s", and that
>   causes a kernel warning. The fix is simply to keep the write of trace_marker
>   less than or equal to max signed short.

Please don't just add random limits that are based on other random limits.

That printk warning is for "you did something obviously crazy".

That does NOT MEAN that you now should limit your strings to something
JUST BORDERLINE CRAZY.

See?

There is not a way in hell that printing a 32kB string in tracing is
valid. EVER.

So stop it. Stop making limits be some random implementation detail.
Make limits *sane*.

Make a *sane* limit for tracing. Not an "avoid being called crazy" limit.

Honestly, I suspect that a sane limit for tracing strings is likely on
the order of tens or maybe hundreds of bytes. Not some kind of "fits
in a short" that is just printk saying "I refuse to waste memory on
the stack".

Side note: for similar reasons the field-width is a 24-bit integer.
And no, if you think that passing a 16MB field width is sane, you need
to rethink your life. Again, that's a small implementation detail, not
a "let's explore how stupid we can be".
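A minimal userspace sketch of the distinction drawn above; it is not code from the thread. The cap is chosen from what a trace string should reasonably be, so the "%.*s" precision limit never comes into play. TRACE_STR_MAX and format_trace_string() are hypothetical names, and 256 is an assumed value picked only for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical policy limit, picked for what a trace string should be. */
#define TRACE_STR_MAX 256

/*
 * Format at most TRACE_STR_MAX bytes of 'src' into 'dst'.
 * Returns the number of bytes written, not counting the NUL.
 */
static size_t format_trace_string(char *dst, size_t dst_size,
				  const char *src, size_t len)
{
	int n;

	assert(dst_size > 0);
	if (len > TRACE_STR_MAX)
		len = TRACE_STR_MAX;	/* clamp to the policy limit */

	/* The precision is now far below anything printf could reject. */
	n = snprintf(dst, dst_size, "%.*s", (int)len, src);
	if (n < 0)
		return 0;
	return (size_t)n < dst_size ? (size_t)n : dst_size - 1;
}
```

The point of the design is that the constant is a statement about tracing, so it would stay the same even if the printf implementation changed its internal field sizes.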

          Linus

^ permalink raw reply	[relevance 99%]

* Re: [bug report] dead loop in generic_perform_write() //Re: [PATCH v7 07/12] iov_iter: Convert iterate*() to inline funcs
  2024-02-29 17:32 91%           ` Linus Torvalds
@ 2024-03-02  2:59 76%             ` Linus Torvalds
      0 siblings, 2 replies; 200+ results
From: Linus Torvalds @ 2024-03-02  2:59 UTC (permalink / raw)
  To: Tong Tiangen, Al Viro
  Cc: David Howells, Jens Axboe, Christoph Hellwig, Christian Brauner,
	David Laight, Matthew Wilcox, Jeff Layton, linux-fsdevel,
	linux-block, linux-mm, netdev, linux-kernel, Kefeng Wang

[-- Attachment #1: Type: text/plain, Size: 1600 bytes --]

On Thu, 29 Feb 2024 at 09:32, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> One option might be to make a failed memcpy_from_iter_mc() set another
> flag in the iter, and then make fault_in_iov_iter_readable() test that
> flag and return 'len' if that flag is set.
>
> Something like that (wild handwaving) should get the right error handling.
>
> The simpler alternative is maybe something like the attached.
> COMPLETELY UNTESTED. Maybe I've confused myself with all the different
> indirection mazes in the iov_iter code.

Actually, I think the right model is to get rid of that horrendous
.copy_mc field entirely.

We only have one single place that uses it - that nasty core dumping
code. And that code is *not* performance critical.

And not only isn't it performance-critical, it already does all the
core dumping one page at a time because it doesn't want to write pages
that were never mapped into user space.

So what we can do is

 (a) make the core dumping code *copy* the page to a good location
with copy_mc_to_kernel() first

 (b) remove this horrendous .copy_mc crap entirely from iov_iter

This is slightly complicated by the fact that copy_mc_to_kernel() may
not even exist, and architectures that don't have it don't want the
silly extra copy. So we need to abstract the "copy to temporary page"
code a bit. But that's probably a good thing anyway in that it forces
us to have nice interfaces.
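Step (a) can be modelled in plain userspace C as a rough sketch. mc_copy_stub() is a hypothetical stand-in for copy_mc_to_kernel() that only imitates its "returns bytes not copied" convention; PAGE_BYTES, the poison_off parameter and dump_page_copy_sketch() are likewise assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PAGE_BYTES 4096

/*
 * Hypothetical stand-in for copy_mc_to_kernel(): copies 'len' bytes but
 * stops at 'poison_off' (if it is below 'len') to imitate a machine
 * check. Like the real helper, it returns the number of bytes NOT copied.
 */
static size_t mc_copy_stub(void *dst, const void *src, size_t len,
			   size_t poison_off)
{
	size_t good = poison_off < len ? poison_off : len;

	memcpy(dst, src, good);
	return len - good;
}

/*
 * Step (a): copy the source page to a known-good bounce page first.
 * Returns the bounce page on success, or NULL if the copy hit an error.
 */
static const unsigned char *dump_page_copy_sketch(const unsigned char *src,
						   unsigned char *bounce,
						   size_t poison_off)
{
	assert(src && bounce);
	return mc_copy_stub(bounce, src, PAGE_BYTES, poison_off) ? NULL : bounce;
}
```

With the error surfaced here, the later write only ever reads the bounce page, which is why the iterator no longer needs a machine-check mode.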

End result: something like the attached.

AGAIN: THIS IS ENTIRELY UNTESTED.

But hey, so was clearly all the .copy_mc code too that this removes, so...

               Linus

[-- Attachment #2: patch.diff --]
[-- Type: text/x-patch, Size: 6155 bytes --]

 fs/coredump.c       | 41 ++++++++++++++++++++++++++++++++++++++---
 include/linux/uio.h | 16 ----------------
 lib/iov_iter.c      | 23 -----------------------
 3 files changed, 38 insertions(+), 42 deletions(-)

diff --git a/fs/coredump.c b/fs/coredump.c
index f258c17c1841..6a9b9f3280d8 100644
--- a/fs/coredump.c
+++ b/fs/coredump.c
@@ -872,6 +872,9 @@ static int dump_emit_page(struct coredump_params *cprm, struct page *page)
 	loff_t pos;
 	ssize_t n;
 
+	if (!page)
+		return 0;
+
 	if (cprm->to_skip) {
 		if (!__dump_skip(cprm, cprm->to_skip))
 			return 0;
@@ -884,7 +887,6 @@ static int dump_emit_page(struct coredump_params *cprm, struct page *page)
 	pos = file->f_pos;
 	bvec_set_page(&bvec, page, PAGE_SIZE, 0);
 	iov_iter_bvec(&iter, ITER_SOURCE, &bvec, 1, PAGE_SIZE);
-	iov_iter_set_copy_mc(&iter);
 	n = __kernel_write_iter(cprm->file, &iter, &pos);
 	if (n != PAGE_SIZE)
 		return 0;
@@ -895,10 +897,40 @@ static int dump_emit_page(struct coredump_params *cprm, struct page *page)
 	return 1;
 }
 
+/*
+ * If we might get machine checks from kernel accesses during the
+ * core dump, let's get those errors early rather than during the
+ * IO. This is not performance-critical enough to warrant having
+ * all the machine check logic in the iovec paths.
+ */
+#ifdef copy_mc_to_kernel
+
+#define dump_page_alloc() alloc_page(GFP_KERNEL)
+#define dump_page_free(x) __free_page(x)
+static struct page *dump_page_copy(struct page *src, struct page *dst)
+{
+	void *buf = kmap_local_page(src);
+	size_t left = copy_mc_to_kernel(page_address(dst), buf, PAGE_SIZE);
+	kunmap_local(buf);
+	return left ? NULL : dst;
+}
+
+#else
+
+#define dump_page_alloc() ((struct page *)8) // Not NULL
+#define dump_page_free(x) do { } while (0)
+#define dump_page_copy(src,dst) ((dst),(src))
+
+#endif
+
 int dump_user_range(struct coredump_params *cprm, unsigned long start,
 		    unsigned long len)
 {
 	unsigned long addr;
+	struct page *dump_page = dump_page_alloc();
+
+	if (!dump_page)
+		return 0;
 
 	for (addr = start; addr < start + len; addr += PAGE_SIZE) {
 		struct page *page;
@@ -912,14 +944,17 @@ int dump_user_range(struct coredump_params *cprm, unsigned long start,
 		 */
 		page = get_dump_page(addr);
 		if (page) {
-			int stop = !dump_emit_page(cprm, page);
+			int stop = !dump_emit_page(cprm, dump_page_copy(page, dump_page));
 			put_page(page);
-			if (stop)
+			if (stop) {
+				dump_page_free(dump_page);
 				return 0;
+			}
 		} else {
 			dump_skip(cprm, PAGE_SIZE);
 		}
 	}
+	dump_page_free(dump_page);
 	return 1;
 }
 #endif
diff --git a/include/linux/uio.h b/include/linux/uio.h
index bea9c89922d9..00cebe2b70de 100644
--- a/include/linux/uio.h
+++ b/include/linux/uio.h
@@ -40,7 +40,6 @@ struct iov_iter_state {
 
 struct iov_iter {
 	u8 iter_type;
-	bool copy_mc;
 	bool nofault;
 	bool data_source;
 	size_t iov_offset;
@@ -248,22 +247,8 @@ size_t _copy_from_iter_flushcache(void *addr, size_t bytes, struct iov_iter *i);
 
 #ifdef CONFIG_ARCH_HAS_COPY_MC
 size_t _copy_mc_to_iter(const void *addr, size_t bytes, struct iov_iter *i);
-static inline void iov_iter_set_copy_mc(struct iov_iter *i)
-{
-	i->copy_mc = true;
-}
-
-static inline bool iov_iter_is_copy_mc(const struct iov_iter *i)
-{
-	return i->copy_mc;
-}
 #else
 #define _copy_mc_to_iter _copy_to_iter
-static inline void iov_iter_set_copy_mc(struct iov_iter *i) { }
-static inline bool iov_iter_is_copy_mc(const struct iov_iter *i)
-{
-	return false;
-}
 #endif
 
 size_t iov_iter_zero(size_t bytes, struct iov_iter *);
@@ -355,7 +340,6 @@ static inline void iov_iter_ubuf(struct iov_iter *i, unsigned int direction,
 	WARN_ON(direction & ~(READ | WRITE));
 	*i = (struct iov_iter) {
 		.iter_type = ITER_UBUF,
-		.copy_mc = false,
 		.data_source = direction,
 		.ubuf = buf,
 		.count = count,
diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index e0aa6b440ca5..cf2eb2b2f983 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -166,7 +166,6 @@ void iov_iter_init(struct iov_iter *i, unsigned int direction,
 	WARN_ON(direction & ~(READ | WRITE));
 	*i = (struct iov_iter) {
 		.iter_type = ITER_IOVEC,
-		.copy_mc = false,
 		.nofault = false,
 		.data_source = direction,
 		.__iov = iov,
@@ -244,27 +243,9 @@ size_t _copy_mc_to_iter(const void *addr, size_t bytes, struct iov_iter *i)
 EXPORT_SYMBOL_GPL(_copy_mc_to_iter);
 #endif /* CONFIG_ARCH_HAS_COPY_MC */
 
-static __always_inline
-size_t memcpy_from_iter_mc(void *iter_from, size_t progress,
-			   size_t len, void *to, void *priv2)
-{
-	return copy_mc_to_kernel(to + progress, iter_from, len);
-}
-
-static size_t __copy_from_iter_mc(void *addr, size_t bytes, struct iov_iter *i)
-{
-	if (unlikely(i->count < bytes))
-		bytes = i->count;
-	if (unlikely(!bytes))
-		return 0;
-	return iterate_bvec(i, bytes, addr, NULL, memcpy_from_iter_mc);
-}
-
 static __always_inline
 size_t __copy_from_iter(void *addr, size_t bytes, struct iov_iter *i)
 {
-	if (unlikely(iov_iter_is_copy_mc(i)))
-		return __copy_from_iter_mc(addr, bytes, i);
 	return iterate_and_advance(i, bytes, addr,
 				   copy_from_user_iter, memcpy_from_iter);
 }
@@ -633,7 +614,6 @@ void iov_iter_kvec(struct iov_iter *i, unsigned int direction,
 	WARN_ON(direction & ~(READ | WRITE));
 	*i = (struct iov_iter){
 		.iter_type = ITER_KVEC,
-		.copy_mc = false,
 		.data_source = direction,
 		.kvec = kvec,
 		.nr_segs = nr_segs,
@@ -650,7 +630,6 @@ void iov_iter_bvec(struct iov_iter *i, unsigned int direction,
 	WARN_ON(direction & ~(READ | WRITE));
 	*i = (struct iov_iter){
 		.iter_type = ITER_BVEC,
-		.copy_mc = false,
 		.data_source = direction,
 		.bvec = bvec,
 		.nr_segs = nr_segs,
@@ -679,7 +658,6 @@ void iov_iter_xarray(struct iov_iter *i, unsigned int direction,
 	BUG_ON(direction & ~1);
 	*i = (struct iov_iter) {
 		.iter_type = ITER_XARRAY,
-		.copy_mc = false,
 		.data_source = direction,
 		.xarray = xarray,
 		.xarray_start = start,
@@ -703,7 +681,6 @@ void iov_iter_discard(struct iov_iter *i, unsigned int direction, size_t count)
 	BUG_ON(direction != READ);
 	*i = (struct iov_iter){
 		.iter_type = ITER_DISCARD,
-		.copy_mc = false,
 		.data_source = false,
 		.count = count,
 		.iov_offset = 0

^ permalink raw reply related	[relevance 76%]

* Re: [PATCH 1/3] kci-gitlab: Introducing GitLab-CI Pipeline for Kernel Testing
  @ 2024-03-01 20:10 96%           ` Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-03-01 20:10 UTC (permalink / raw)
  To: Nikolai Kondrashov
  Cc: Maxime Ripard, Helen Koike, linuxtv-ci, dave.pigott,
	linux-kernel, dri-devel, linux-kselftest, gustavo.padovan,
	pawiecz, tales.aparecida, workflows, kernelci, skhan, kunit-dev,
	nfraprado, davidgow, cocci, Julia.Lawall, laura.nao,
	ricardo.canuelo, kernel, gregkh

On Fri, 1 Mar 2024 at 02:27, Nikolai Kondrashov <spbnick@gmail.com> wrote:
>
> I agree, it's hard to imagine even a simple majority agreeing on how GitLab CI
> should be done. Still, we would like to help people, who are interested in
> this kind of thing, to set it up. How about we reframe this contribution as a
> sort of template, or a reference for people to start their setup with,
> assuming that most maintainers would want to tweak it? We would also be glad
> to stand by for questions and help, as people try to use it.

Ack. I think seeing it as a library for various gitlab CI models would
be a lot more palatable. Particularly if you can then show that yes,
it is also relevant to our currently existing drm case.

So I'm not objecting to having (for example) some kind of CI helper
templates - I think a logical place would be in tools/ci/ which is
kind of alongside our tools/testing subdirectory.

(And then perhaps have a 'gitlab' directory under that. I'm not sure
whether - and how much - commonality there might be between the
different CI models of different hosts).

Just to clarify: when I say "a logical place", I very much want to
emphasize the "a" - maybe there are better places, and I'm not saying
that is the only possible place. But it sounds more logical to me than
some.

            Linus

^ permalink raw reply	[relevance 96%]

* Re: [PATCH for 6.8] tomoyo: fix UAF write bug in tomoyo_write_control()
  @ 2024-03-01 19:14 99%   ` Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-03-01 19:14 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: Sam Sun, paul, syzkaller, takedakn, jmorris, serge,
	linux-security-module, linux-kernel

On Fri, 1 Mar 2024 at 05:04, Tetsuo Handa
<penguin-kernel@i-love.sakura.ne.jp> wrote:
>
> I couldn't reproduce this problem in my environment, but I believe
> this does fix a bug. Linus, can you directly apply to linux.git ?

Thanks. Applied,

        Linus

^ permalink raw reply	[relevance 99%]

* Re: [PATCH RFC 4/4] UNFINISHED mm, fs: use kmem_cache_charge() in path_openat()
  @ 2024-03-01 17:51 90%   ` Linus Torvalds
    0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-03-01 17:51 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Josh Poimboeuf, Jeff Layton, Chuck Lever, Kees Cook,
	Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Andrew Morton, Roman Gushchin, Hyeonggon Yoo, Johannes Weiner,
	Michal Hocko, Shakeel Butt, Muchun Song, Alexander Viro,
	Christian Brauner, Jan Kara, linux-mm, linux-kernel, cgroups,
	linux-fsdevel

On Fri, 1 Mar 2024 at 09:07, Vlastimil Babka <vbabka@suse.cz> wrote:
>
> This is just an example of using the kmem_cache_charge() API.  I think
> it's placed in a place that's applicable for Linus's example [1]
> although he mentions do_dentry_open() - I have followed from strace()
> showing openat(2) to path_openat() doing the alloc_empty_file().

Thanks. This is not the right patch, but yes, patches 1-3 look very nice to me.

> The idea is that filp_cachep stops being SLAB_ACCOUNT. Allocations that
> want to be accounted immediately can use GFP_KERNEL_ACCOUNT. I did that
> in alloc_empty_file_noaccount() (despite the contradictory name but the
> noaccount refers to something else, right?) as IIUC it's about
> kernel-internal opens.

Yeah, the "noaccount" function is about not accounting it towards nr_files.

That said, I don't think it necessarily needs to do the memory
accounting either - it's literally for cases where we're never going
to install the file descriptor in any user space.

Your change to use GFP_KERNEL_ACCOUNT isn't exactly wrong, but I don't
think it's really the right thing either, because

> Why is this unfinished:
>
> - there are other callers of alloc_empty_file() which I didn't adjust so
>   they simply became memcg-unaccounted. I haven't investigated for which
>   ones it would make also sense to separate the allocation and accounting.
>   Maybe alloc_empty_file() would need to get a parameter to control
>   this.

Right. I think the natural and logical way to deal with this is to
just say "we account when we add the file to the fdtable".

IOW, just have fd_install() do it. That's the really natural point,
and also makes it very logical why alloc_empty_file_noaccount()
wouldn't need to do the GFP_KERNEL_ACCOUNT.

> - I don't know how to properly unwind the accounting failure case. It
>   seems like a new case because when we succeed the open, there's no
>   further error path at least in path_openat().

Yeah, let me think about this part. Because fd_install() is the right
point, but that too does not really allow for error handling.

Yes, we could close things and fail it, but it really is much too late
at this point.

What I *think* I'd want for this case is

 (a) allow the accounting to go over by a bit

 (b) make sure there's a cheap way to ask (before) about "did we go
over the limit"

IOW, the accounting never needed to be byte-accurate to begin with,
and making it fail (cheaply and early) on the next file allocation is
fine.

Just make it really cheap. Can we do that?

For example, maybe don't bother with the whole "bytes and pages"
stuff. Just a simple "are we more than one page over?" kind of
question. Without the 'stock_lock' mess for sub-page bytes etc

How would that look? Would it result in something that can be done
cheaply without locking and atomics and without excessive pointer
indirection through many levels of memcg data structures?
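A rough single-threaded sketch of points (a) and (b), assuming a plain counter is enough to show the shape. struct charge_sketch and every function name here are hypothetical; this is not the memcg API, and it deliberately ignores the concurrency and per-CPU questions raised above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-group counter; not the memcg data structure. */
struct charge_sketch {
	size_t used;	/* bytes charged so far */
	size_t limit;	/* soft limit in bytes */
};

/* (b) Cheap question asked before allocating: are we already over? */
static bool over_limit(const struct charge_sketch *c)
{
	return c->used > c->limit;
}

/* (a) Charging never fails, so the total may overshoot by one object. */
static void charge(struct charge_sketch *c, size_t bytes)
{
	c->used += bytes;
}

/* Allocation-time check followed by an install-time charge. */
static bool open_file_sketch(struct charge_sketch *c, size_t file_bytes)
{
	assert(c);
	if (over_limit(c))
		return false;	/* fail early, before any work is done */
	charge(c, file_bytes);	/* the install-time step, cannot fail */
	return true;
}
```

The overshoot is bounded by one object per opener, and the failure moves to the next allocation, where there is still a normal error path.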

             Linus

^ permalink raw reply	[relevance 90%]

* Re: [GIT PULL] Networking for v6.8-rc7
  @ 2024-02-29 20:56 99% ` Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-02-29 20:56 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: davem, netdev, linux-kernel, pabeni

On Thu, 29 Feb 2024 at 12:39, Jakub Kicinski <kuba@kernel.org> wrote:
>
> A few hours late, the commit on top fixes an odd "rcu_dereference()
> needs to know full type" build issue I can't repro..

Ugh. That change literally makes a single load instruction be a
function call. Pretty sad, particularly with all the crazy CPU
mitigations causing that to be even more expensive than it is already.

I really don't see how that error can happen, it sounds very odd.

Oh well.

          Linus

^ permalink raw reply	[relevance 99%]

* Re: [PATCH 1/3] kci-gitlab: Introducing GitLab-CI Pipeline for Kernel Testing
  @ 2024-02-29 20:21 95%       ` Linus Torvalds
    0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-02-29 20:21 UTC (permalink / raw)
  To: Nikolai Kondrashov
  Cc: Maxime Ripard, Helen Koike, linuxtv-ci, dave.pigott,
	linux-kernel, dri-devel, linux-kselftest, gustavo.padovan,
	pawiecz, tales.aparecida, workflows, kernelci, skhan, kunit-dev,
	nfraprado, davidgow, cocci, Julia.Lawall, laura.nao,
	ricardo.canuelo, kernel, gregkh

On Thu, 29 Feb 2024 at 01:23, Nikolai Kondrashov <spbnick@gmail.com> wrote:
>
> However, I think a better approach would be *not* to add the .gitlab-ci.yaml
> file in the root of the source tree, but instead change the very same repo
> setting to point to a particular entry YAML, *inside* the repo (somewhere
> under "ci" directory) instead.

I really don't want some kind of top-level CI for the base kernel project.

We already have the situation that the drm people have their own ci
model. I'm ok with that, partly because then at least the maintainers
of that subsystem can agree on the rules for that one subsystem.

I'm not at all interested in having something that people will then
either fight about, or - more likely - ignore, at the top level
because there isn't some global agreement about what the rules are.

For example, even just running checkpatch is often a stylistic thing,
and not everybody agrees about all the checkpatch warnings.

I would suggest the CI project be separate from the kernel.

And having that slack channel that is restricted to particular
companies is just another sign of this whole disease.

If you want to make a google/microsoft project to do kernel CI, then
more power to you, but don't expect it to be some kind of agreed-upon
kernel project when it's a closed system.

               Linus

^ permalink raw reply	[relevance 95%]

* Re: [bug report] dead loop in generic_perform_write() //Re: [PATCH v7 07/12] iov_iter: Convert iterate*() to inline funcs
  @ 2024-02-29 17:32 91%           ` Linus Torvalds
  2024-03-02  2:59 76%             ` Linus Torvalds
  0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-02-29 17:32 UTC (permalink / raw)
  To: Tong Tiangen, Al Viro
  Cc: David Howells, Jens Axboe, Christoph Hellwig, Christian Brauner,
	David Laight, Matthew Wilcox, Jeff Layton, linux-fsdevel,
	linux-block, linux-mm, netdev, linux-kernel, Kefeng Wang

[-- Attachment #1: Type: text/plain, Size: 1794 bytes --]

On Thu, 29 Feb 2024 at 00:13, Tong Tiangen <tongtiangen@huawei.com> wrote:
>
> See the logic before this patch, always success (((void)(K),0)) is
> returned for three types: ITER_BVEC, ITER_KVEC and ITER_XARRAY.

No, look closer.

Yes, the iterate_and_advance() macro does that "((void)(K),0)" to make
the compiler generate better code for those cases (because then the
compiler can see that the return value is a compile-time zero), but
notice how _copy_mc_to_iter() didn't use that macro back then. It used
the unvarnished __iterate_and_advance() exactly so that the MC copy
case would *not* get that "always return zero" behavior.
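The effect of that comma expression can be shown in isolation. This is a minimal sketch, not the iov_iter macros themselves: STEP_NEVER_FAILS, STEP_MAY_FAIL and both step functions are hypothetical names made up for the illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical step function: copies and reports bytes NOT copied. */
static size_t copy_step(void *dst, const void *src, size_t len)
{
	assert(dst && src);
	memcpy(dst, src, len);
	return 0;
}

/* Hypothetical step that copies nothing, as after a machine check. */
static size_t failing_step(void *dst, const void *src, size_t len)
{
	(void)dst;
	(void)src;
	return len;
}

/*
 * Evaluate the step for its side effect, then yield a literal 0. The
 * compiler can fold every "did it fall short?" test that follows.
 */
#define STEP_NEVER_FAILS(K)	((void)(K), 0)

/* Keep the real return value, so a short copy stays visible. */
#define STEP_MAY_FAIL(K)	(K)
```

Wrapping failing_step() in STEP_NEVER_FAILS() still yields 0, which is exactly why a copy that can fall short must not go through the "always zero" variant.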

That goes back to (in a different form) at least commit 1b4fb5ffd79b
("iov_iter: teach iterate_{bvec,xarray}() about possible short
copies").

But hardly anybody ever tests this machine-check special case code, so
who knows when it broke again.

I'm just looking at the source code, and with all the macro games it's
*really* hard to follow, so I may well be missing something.

> Maybe we're all gonna fix it back? as follows:

No. We could do it for the kvec and xarray case, just to get better
code generation again (not that I looked at it, so who knows), but the
one case that actually uses memcpy_from_iter_mc() needs to react to a
short write.

One option might be to make a failed memcpy_from_iter_mc() set another
flag in the iter, and then make fault_in_iov_iter_readable() test that
flag and return 'len' if that flag is set.

Something like that (wild handwaving) should get the right error handling.
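The flag idea can be modelled in plain C to see why it ends the retry loop. This is a sketch under loud assumptions: struct iter_sketch, its fields, EFAULT_SKETCH and all three functions are hypothetical, and the loop is only a caricature of generic_perform_write() with a retry cap added so the dead loop is observable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define EFAULT_SKETCH 14	/* stand-in for EFAULT */

/* Hypothetical iterator: just enough state to model the retry loop. */
struct iter_sketch {
	bool poisoned;	/* source memory triggers a machine check */
	bool mc_failed;	/* proposed flag: a copy already hit an error */
};

/* Returns bytes copied; a poisoned source makes no progress at all. */
static size_t copy_sketch(struct iter_sketch *i, size_t bytes)
{
	if (i->poisoned) {
		i->mc_failed = true;
		return 0;
	}
	return bytes;
}

/* Returns bytes NOT faulted in. 'honor_flag' enables the proposed test. */
static size_t fault_in_sketch(const struct iter_sketch *i, size_t bytes,
			      bool honor_flag)
{
	if (honor_flag && i->mc_failed)
		return bytes;	/* report total failure */
	return 0;		/* kernel address: no user page fault */
}

/*
 * Model of the write loop. Returns 0 on success, -EFAULT_SKETCH on a
 * clean failure, and 1 if 'max_tries' ran out (the dead loop).
 */
static int write_loop_sketch(struct iter_sketch *i, size_t bytes,
			     bool honor_flag, int max_tries)
{
	assert(max_tries > 0);
	while (max_tries--) {
		if (fault_in_sketch(i, bytes, honor_flag) == bytes)
			return -EFAULT_SKETCH;
		if (copy_sketch(i, bytes))
			return 0;
		/* zero progress: "goto again" */
	}
	return 1;
}
```

Without the flag, a poisoned source spins until the cap runs out; with it, the second pass through the fault-in check reports total failure and the loop breaks.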

The simpler alternative is maybe something like the attached.
COMPLETELY UNTESTED. Maybe I've confused myself with all the different
indirection mazes in the iov_iter code.

                     Linus

[-- Attachment #2: patch.diff --]
[-- Type: text/x-patch, Size: 618 bytes --]

 lib/iov_iter.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index e0aa6b440ca5..5236c16734e0 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -248,7 +248,10 @@ static __always_inline
 size_t memcpy_from_iter_mc(void *iter_from, size_t progress,
 			   size_t len, void *to, void *priv2)
 {
-	return copy_mc_to_kernel(to + progress, iter_from, len);
+	size_t n = copy_mc_to_kernel(to + progress, iter_from, len);
+	if (n)
+		memset(to + progress + len - n, 0, n);
+	return 0;
 }
 
 static size_t __copy_from_iter_mc(void *addr, size_t bytes, struct iov_iter *i)

^ permalink raw reply related	[relevance 91%]

* Re: [bug report] dead loop in generic_perform_write() //Re: [PATCH v7 07/12] iov_iter: Convert iterate*() to inline funcs
  2024-02-28 21:21 99%     ` Linus Torvalds
@ 2024-02-28 22:57 99%       ` Linus Torvalds
    0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-02-28 22:57 UTC (permalink / raw)
  To: Tong Tiangen
  Cc: David Howells, Jens Axboe, Al Viro, Christoph Hellwig,
	Christian Brauner, David Laight, Matthew Wilcox, Jeff Layton,
	linux-fsdevel, linux-block, linux-mm, netdev, linux-kernel,
	Kefeng Wang

On Wed, 28 Feb 2024 at 13:21, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> Hmm. If the copy doesn't succeed and make any progress at all, then
> the code in generic_perform_write() after the "goto again"
>
>                 //[4]
>                 if (unlikely(fault_in_iov_iter_readable(i, bytes) ==
>                               bytes)) {
>
> should break out of the loop.

Ahh. I see the problem. Or at least part of it.

The iter is an ITER_BVEC.

And fault_in_iov_iter_readable() "knows" that an ITER_BVEC cannot
fail. Because obviously it's a kernel address, so no user page fault.

But for the machine check case, ITER_BVEC very much can fail.

This should never have worked in the first place.

What a crock.

Do we need to make iterate_bvec() always succeed fully, and make
copy_mc_to_kernel() zero out the end?

                   Linus

^ permalink raw reply	[relevance 99%]

* Re: [bug report] dead loop in generic_perform_write() //Re: [PATCH v7 07/12] iov_iter: Convert iterate*() to inline funcs
  @ 2024-02-28 21:21 99%     ` Linus Torvalds
  2024-02-28 22:57 99%       ` Linus Torvalds
  0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-02-28 21:21 UTC (permalink / raw)
  To: Tong Tiangen
  Cc: David Howells, Jens Axboe, Al Viro, Christoph Hellwig,
	Christian Brauner, David Laight, Matthew Wilcox, Jeff Layton,
	linux-fsdevel, linux-block, linux-mm, netdev, linux-kernel,
	Kefeng Wang

On Sat, 17 Feb 2024 at 19:13, Tong Tiangen <tongtiangen@huawei.com> wrote:
>
> After this patch:
>    copy_page_from_iter_atomic()
>      -> iterate_and_advance2()
>        -> iterate_bvec()
>          -> remain = step()
>
> With CONFIG_ARCH_HAS_COPY_MC, the step() is copy_mc_to_kernel(), which
> returns "bytes not copied".
>
> When a memory error occurs during step(), the value of "left" equals
> the value of "part" (not a single byte is copied successfully). In this
> case, iterate_bvec() returns 0, and copy_page_from_iter_atomic() also
> returns 0. The callback shmem_write_end()[2] also returns 0. Finally,
> generic_perform_write() goes to "goto again"[3], and the loop restarts.
> The checks at [4] and [5] are never entered, so the loop cannot exit
> and a dead loop occurs.

Hmm. If the copy doesn't succeed and make any progress at all, then
the code in generic_perform_write() after the "goto again"

                //[4]
                if (unlikely(fault_in_iov_iter_readable(i, bytes) ==
                              bytes)) {
                        status = -EFAULT;
                        break;
                }

should break out of the loop.

So either your analysis looks a bit flawed, or I'm missing something.
Likely I'm missing something really obvious.

Why does the copy_mc_to_kernel() fail, but the
fault_in_iov_iter_readable() succeeds?

              Linus

^ permalink raw reply	[relevance 99%]

* Re: [GIT PULL] hotfixes for 6.8-rc7
  @ 2024-02-28  0:51 99% ` Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-02-28  0:51 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-mm, mm-commits, linux-kernel

On Tue, 27 Feb 2024 at 14:56, Andrew Morton <akpm@linux-foundation.org> wrote:
>
> 6 hotfixes.  3 are cc:stable and the remainder address post-6.7 issues
> or aren't considered appropriate for backporting.

Hmm. I notice that you add "Link:" pointers to lore, but you do so
even for emails that have been sent to you without any lists, so that
they don't actually exist on lore..

IOW, that link-generating automation of yours looks a bit overly aggressive.

                Linus

^ permalink raw reply	[relevance 99%]

* Re: [PATCH 3/3] cxl/region: Use cond_guard() in show_targetN()
  @ 2024-02-27 22:34 86%       ` Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-02-27 22:34 UTC (permalink / raw)
  To: Dan Williams
  Cc: peterz, gregkh, Ira Weiny, Dave Jiang, Jonathan Cameron,
	Fabio M. De Francesco, linux-kernel, linux-cxl

[-- Attachment #1: Type: text/plain, Size: 2053 bytes --]

On Tue, 27 Feb 2024 at 13:42, Dan Williams <dan.j.williams@intel.com> wrote:
>
> I will also note that these last 3 statements, nuking the proposal from
> space, I find excessive. Yes, on the internet no one can hear you being
> subtle, but the "MORE READABLE" and "NOTHING" were pretty darn
> unequivocal, especially coming from the person who has absolute final
> say on what enters his project.

Heh. It's not just "no one can hear you being subtle", sometimes it's
also "people don't take hints". It can be hard to tell..

Anyway, it's not that I hate the guard things in general. But I do
think they need to be used carefully, and I do think it's very
important that they have clean interfaces.

The current setup came about after quite long discussions about
getting reasonable syntax, and I'm still a bit worried even about the
current simpler ones.

And by "simpler ones" I don't mean our current scoped_cond_guard()
thing. We have exactly one user of it, and I have considered getting
rid of that one user because I think it's absolutely horrid. I haven't
figured out a better syntax for it.

For the non-scoped version, I actually think there *would* be a better
syntax - putting the error case after the macro (the way we put the
success case after the macro for the scoped one).

In fact, maybe the solution is to make the scoped and non-scoped
versions act very similar: we could do something like this:

        [scoped_]cond_guard(name, args) { success } else { fail };

and that syntax feels much more C-like to me.
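A userspace sketch of why a macro that ends in a bare 'if' gives that syntax. It assumes GCC or Clang for the cleanup attribute; struct toy_lock, toy_trylock(), toy_unlock(), scoped_cond_guard_sketch() and do_work_sketch() are all hypothetical names, and this is not the kernel's cleanup.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy lock; trylock fails if it is already held. */
struct toy_lock { bool held; };

static struct toy_lock *toy_trylock(struct toy_lock *l)
{
	if (l->held)
		return NULL;
	l->held = true;
	return l;
}

/* Cleanup handler: runs when the guard variable goes out of scope. */
static void toy_unlock(struct toy_lock **lp)
{
	if (*lp)
		(*lp)->held = false;
}

/*
 * One-shot for loop that declares the guard, followed by a bare 'if'.
 * Because the macro ends in 'if (...)', the caller's block is the
 * success branch and a trailing 'else' is the failure branch.
 */
#define scoped_cond_guard_sketch(lock)					\
	for (struct toy_lock *scope					\
		__attribute__((cleanup(toy_unlock))) = toy_trylock(lock), \
	     *done = NULL; !done; done = (void *)1)			\
		if (scope)

/* Returns 0 if the work ran under the lock, -1 if the lock was busy. */
static int do_work_sketch(struct toy_lock *l, int *counter)
{
	scoped_cond_guard_sketch(l) {
		assert(l->held);
		(*counter)++;
	} else {
		return -1;
	}
	return 0;
}
```

The failure action is ordinary code after the block rather than a macro argument, and the unlock runs on both the fall-through and the early-return path.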

So maybe something like the attached (TOTALLY UNTESTED!!) patch for
the scoped version, and then the non-scoped version would have the
same syntax (except it would have to generate that __UNIQUE_ID()
thing, of course).

I haven't thought much about this. But I think this would be more
acceptable to me, and also solve some of the ugliness with the current
pre-existing scoped_cond_guard().

I dunno. PeterZ did the existing stuff, but he's offlined due to
shoulder problems so not likely to chip in.

              Linus

[-- Attachment #2: patch.diff --]
[-- Type: text/x-patch, Size: 1999 bytes --]

 include/linux/cleanup.h | 7 +++----
 kernel/ptrace.c         | 5 +++--
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/include/linux/cleanup.h b/include/linux/cleanup.h
index c2d09bc4f976..a015ac9517a6 100644
--- a/include/linux/cleanup.h
+++ b/include/linux/cleanup.h
@@ -142,7 +142,7 @@ static inline class_##_name##_t class_##_name##ext##_constructor(_init_args) \
  *	for conditional locks the loop body is skipped when the lock is not
  *	acquired.
  *
- * scoped_cond_guard (name, fail, args...) { }:
+ * scoped_cond_guard (name, args...) { } [ else { fail } ]:
  *      similar to scoped_guard(), except it does fail when the lock
  *      acquire fails.
  *
@@ -169,11 +169,10 @@ static inline class_##_name##_t class_##_name##ext##_constructor(_init_args) \
 	for (CLASS(_name, scope)(args),					\
 	     *done = NULL; __guard_ptr(_name)(&scope) && !done; done = (void *)1)
 
-#define scoped_cond_guard(_name, _fail, args...) \
+#define scoped_cond_guard(_name, args...) \
 	for (CLASS(_name, scope)(args), \
 	     *done = NULL; !done; done = (void *)1) \
-		if (!__guard_ptr(_name)(&scope)) _fail; \
-		else
+		if (__guard_ptr(_name)(&scope))
 
 /*
  * Additional helper macros for generating lock guards with types, either for
diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 2fabd497d659..f509b21a5711 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -441,8 +441,7 @@ static int ptrace_attach(struct task_struct *task, long request,
 	 * SUID, SGID and LSM creds get determined differently
 	 * under ptrace.
 	 */
-	scoped_cond_guard (mutex_intr, return -ERESTARTNOINTR,
-			   &task->signal->cred_guard_mutex) {
+	scoped_cond_guard (mutex_intr, &task->signal->cred_guard_mutex) {
 
 		scoped_guard (task_lock, task) {
 			retval = __ptrace_may_access(task, PTRACE_MODE_ATTACH_REALCREDS);
@@ -466,6 +465,8 @@ static int ptrace_attach(struct task_struct *task, long request,
 
 			ptrace_set_stopped(task);
 		}
+	} else {
+		return -ERESTARTNOINTR;
 	}
 
 	/*
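
A minimal userspace sketch of the if/else-shaped scoped guard described
above, assuming GCC/Clang's cleanup attribute. Everything named toy_* and
scoped_toy_guard() is hypothetical and stands in for the kernel's CLASS()
machinery; it only shows why ending the macro in a bare "if" lets the
caller attach a normal "else" for the failure path.

```c
#include <stddef.h>

/* Hypothetical toy lock: nonzero 'held' means somebody owns it. */
struct toy_lock { int held; };

static int toy_trylock(struct toy_lock *l)
{
	if (l->held)
		return 0;
	l->held = 1;
	return 1;
}

static void toy_unlock(struct toy_lock *l) { l->held = 0; }

/* Guard object: 'lock' stays NULL when the acquire failed. */
struct toy_guard { struct toy_lock *lock; };

static struct toy_guard toy_guard_init(struct toy_lock *l)
{
	struct toy_guard g = { toy_trylock(l) ? l : NULL };
	return g;
}

/* Runs automatically when 'scope' leaves scope, also on early return. */
static void toy_guard_exit(struct toy_guard *g)
{
	if (g->lock)
		toy_unlock(g->lock);
}

/* The macro ends in a bare "if", so the caller supplies "{ } else { }". */
#define scoped_toy_guard(l)						\
	for (struct toy_guard scope					\
		__attribute__((cleanup(toy_guard_exit))) =		\
			toy_guard_init(l), *done = NULL;		\
	     !done; done = (void *)1)					\
		if (scope.lock)

static struct toy_lock demo_lock;
static int counter;

/* Returns 0 after bumping the counter, or -1 if the lock was busy. */
static int bump_counter(void)
{
	scoped_toy_guard(&demo_lock) {
		counter++;
	} else {
		return -1;
	}
	return 0;
}
```

The failure handling reads like ordinary C control flow, and the unlock
still happens on every path out of the success block.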


* Re: [PATCH 3/3] cxl/region: Use cond_guard() in show_targetN()
  @ 2024-02-27 20:55 98%   ` Linus Torvalds
    0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-02-27 20:55 UTC (permalink / raw)
  To: Dan Williams
  Cc: peterz, gregkh, Ira Weiny, Dave Jiang, Jonathan Cameron,
	Fabio M. De Francesco, linux-kernel, linux-cxl

On Tue, 27 Feb 2024 at 08:49, Dan Williams <dan.j.williams@intel.com> wrote:
>
> -       rc = down_read_interruptible(&cxl_region_rwsem);
> -       if (rc)
> -               return rc;
> +       cond_guard(rwsem_read_intr, return -EINTR, &cxl_region_rwsem);

Yeah, this is an example of how NOT to do things.

If you can't make the syntax be something clean and sane like

        if (!cond_guard(rwsem_read_intr, &cxl_region_rwsem))
                return -EINTR;

then this code should simply not be converted to guards AT ALL.

Note that we have a perfectly fine way to do conditional lock guarding
by simply using helper functions, which actually makes code MORE
READABLE:

        if (down_read_interruptible(&cxl_region_rwsem))
                return -EINTR;
        rc = do_locked_function();
        up_read(&cxl_region_rwsem);
        return rc;

and notice how there are no special cases, no multiple unlocks, no
NOTHING. And the syntax is clean.

Honestly, if people are going to use 'guard' to write crap code, we
need to really stop that in its tracks.

There is no upside to making up new interfaces that only generate garbage.

This is final. I'm not willing to even entertain this kind of crap.

                     Linus
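
A self-contained sketch of the helper-function pattern, in userspace C.
The toy_rwsem type and the toy_* functions are hypothetical stand-ins,
assuming an interruptible acquire that returns 0 on success and -EINTR
on failure; show_target_locked() is an invented example body.

```c
#include <errno.h>

/* Hypothetical stand-in for a reader/writer semaphore. */
struct toy_rwsem { int writer; int readers; };

/* Returns 0 on success; a held writer models an interrupted wait. */
static int toy_down_read_interruptible(struct toy_rwsem *s)
{
	if (s->writer)
		return -EINTR;
	s->readers++;
	return 0;
}

static void toy_up_read(struct toy_rwsem *s) { s->readers--; }

static struct toy_rwsem region_rwsem;
static int region_targets = 4;

/* Runs with the lock held, so it can return early from anywhere. */
static int show_target_locked(int pos)
{
	if (pos < 0 || pos >= region_targets)
		return -ENXIO;
	return pos * 10;
}

/* One lock, one unlock, no special cases in the caller. */
static int show_target(int pos)
{
	int rc;

	if (toy_down_read_interruptible(&region_rwsem))
		return -EINTR;
	rc = show_target_locked(pos);
	toy_up_read(&region_rwsem);
	return rc;
}
```

All the early returns live in the locked helper, so the wrapper has a
single unlock and does not need a guard at all.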


* Re: [PATCH 1/3] cleanup: Add cond_guard() to conditional guards
  @ 2024-02-27 20:49 98%   ` Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-02-27 20:49 UTC (permalink / raw)
  To: Dan Williams
  Cc: peterz, gregkh, Dave Jiang, Ira Weiny, Jonathan Cameron,
	Fabio M. De Francesco, linux-kernel, linux-cxl

On Tue, 27 Feb 2024 at 08:48, Dan Williams <dan.j.williams@intel.com> wrote:
>
>         cond_guard(mutex_intr, return -EINTR, &mutex);

Again, this is *not* helping make code readable and less likely to have bugs.

The macro has obvious deficiencies, like the "_fail" argument not
being surrounded by  "{ }" (the equivalent of parenthesizing an
expression argument), but even with that trivial fix the syntax is
just too ugly to live, and doesn't match normal C syntax.

And yes, we have other macros that don't have normal C syntax, and
they are ugly too (example: #define CHKINFO(ret) in
drivers/video/fbdev/hgafb.c), but we should have higher standards for
globally visible helpers, and we should have *MUCH* higher standards
for helpers that are supposed to be all about reducing mistakes.

Bad / odd syntax does not reduce mistakes.

If a sane 'guard' model doesn't work for some code, the answer is not
to make an insane guard model. The answer is to not use 'guard' in
code like that.

               Linus
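
To make the "{ }" point concrete, here is a small sketch with two
hypothetical macros (CHECK_UNBRACED and CHECK_BRACED are invented for
illustration and are not the proposed cond_guard()). When the statement
argument is pasted in without braces, only its first statement ends up
under the "if".

```c
/* Hypothetical macros, for illustration only. */
#define CHECK_UNBRACED(ok, fail)	if (!(ok)) fail
#define CHECK_BRACED(ok, fail)		if (!(ok)) { fail; }

static int failures_unbraced;
static int failures_braced;

static int try_unbraced(int ok)
{
	/* Expands to: if (!(ok)) failures_unbraced++; return -1; */
	CHECK_UNBRACED(ok, failures_unbraced++; return -1);
	return 0;	/* never reached: the return above is unconditional */
}

static int try_braced(int ok)
{
	/* Expands to: if (!(ok)) { failures_braced++; return -1; } */
	CHECK_BRACED(ok, failures_braced++; return -1);
	return 0;
}
```

With the unbraced form, try_unbraced(1) returns -1 even though the check
passed. Bracing fixes that, but the call site still does not look like
normal C.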


* Re: [PATCH 2/3] cleanup: Introduce cond_no_free_ptr()
  @ 2024-02-27 20:40 99%   ` Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-02-27 20:40 UTC (permalink / raw)
  To: Dan Williams; +Cc: peterz, gregkh, Jonathan Cameron, linux-kernel, linux-cxl

On Tue, 27 Feb 2024 at 08:49, Dan Williams <dan.j.williams@intel.com> wrote:
>
>     5/ cond_no_free_ptr(rc == 0, return rc, res, name);

Ugh. Honestly, this is all too ugly for words.

The whole - and only - point for the cond_guard() is to make mistakes
less likely.

This is not it. This makes mistakes unreadable and undebuggable.

             Linus


* Re: Linux regressions report for mainline [2024-02-25]
  @ 2024-02-26 17:33 99%   ` Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-02-26 17:33 UTC (permalink / raw)
  To: Linux regressions mailing list; +Cc: LKML, ntfs3, Konstantin Komarov

On Sun, 25 Feb 2024 at 06:21, Linux regression tracking (Thorsten
Leemhuis) <regressions@leemhuis.info> wrote:
>
> Sorry, forgot something: there is a patch to fix a ntfs3 build problem
> that was posted 10+ days ago[1] that didn't get any reaction from the
> ntfs3 maintainer at all. Given the history of occasional slow responses
> for that subsystem I thought I'd let you know in case you want to pick
> the fix up directly; but if you do, consider using v2 of the patch[2].

Ack. Picked up directly.

               Linus


* Linux 6.8-rc6
@ 2024-02-25 23:57 42% Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-02-25 23:57 UTC (permalink / raw)
  To: Linux Kernel Mailing List

Another week, another rc. Nothing here really stands out.

Last week I said that I was hoping things would calm down a bit.
Technically things did calm down a bit, and rc6 is smaller than rc5
was. But not by a huge amount, and honestly, while there's nothing
really alarming here, there's more here than I would really like at
this point in the release.

So this may end up being one of those releases that get an rc8. We'll
see. The fact that we have a few more commits than I would really wish
for might not be a huge issue when a noticeable portion of said
commits end up being about self-tests etc.

So right now I'm still on the fence about things. Most of the stuff
here is really just fairly trivial driver updates (and those self-test
ones), but we do have regressions being tracked still, so...

Just reading through the appended shortlog, a lot of this really _is_
very trivial, and some of the core stuff (like the RCU fixes by Al)
are so esoteric that I kind of doubt anybody has ever hit them in real
life.  But still.. Over 300 non-merge fixes in the last week isn't
exactly quiet.

I'm clearly not ready to make that "do we do an rc8" decision right
now. I'll give it another week until I have to make that decision.

                 Linus

---

Aaro Koskinen (1):
      usb: gadget: omap_udc: fix USB gadget regression on Palm TE

Al Viro (15):
      Revert "get rid of DCACHE_GENOCIDE"
      erofs: fix handling kern_mount() failure
      fs/super.c: don't drop ->s_user_ns until we free struct super_block itself
      rcu pathwalk: prevent bogus hard errors from may_lookup()
      affs: free affs_sb_info with kfree_rcu()
      exfat: move freeing sbi, upcase table and dropping nls into
rcu-delayed helper
      hfsplus: switch to rcu-delayed unloading of nls and freeing ->s_fs_info
      afs: fix __afs_break_callback() / afs_drop_open_mmap() race
      nfs: make nfs_set_verifier() safe for use in RCU pathwalk
      nfs: fix UAF on pathwalk running into umount
      procfs: move dropping pde and pid from ->evict_inode() to ->free_inode()
      procfs: make freeing proc_fs_info rcu-delayed
      fuse: fix UAF in rcu pathwalks
      cifs_get_link(): bail out in unsafe case
      ext4_get_link(): fix breakage in RCU mode

Alex Elder (1):
      net: ipa: don't overrun IPA suspend interrupt registers

Alexander Gordeev (1):
      net/iucv: fix the allocation size of iucv_path_table array

Alexander Stein (1):
      arm64: dts: tqma8mpql: fix audio codec iov-supply

Alison Schofield (4):
      x86/numa: Fix the address overlap check in numa_fill_memblks()
      x86/numa: Fix the sort compare func used in numa_fill_memblks()
      cxl/region: Handle endpoint decoders in cxl_region_find_decoder()
      cxl/region: Allow out of order assembly of autodiscovered regions

Amit Machhiwal (1):
      KVM: PPC: Book3S HV: Fix L2 guest reboot failure due to empty
'arch_compat'

Andreas Larsson (1):
      usb: uhci-grlib: Explicitly include linux/platform_device.h

Andrey Jr. Melnikov (1):
      ahci: asm1064: correct count of reported ports

Andrzej Kacprowski (1):
      accel/ivpu: Don't enable any tiles by default on VPU40xx

Andy Yan (4):
      arm64: dts: rockchip: aliase sdmmc as mmc1 for Cool Pi 4B
      arm64: dts: rockchip: aliase sdmmc as mmc1 for Cool Pi CM5 EVB
      arm64: dts: rockchip: rename vcc5v0_usb30_host regulator for
Cool Pi CM5 EVB
      arm64: dts: rockchip: Fix the num-lanes of pcie3x4 on Cool Pi CM5 EVB

Anshuman Khandual (1):
      mm/memblock: add MEMBLOCK_RSRV_NOINIT into flagname[] array

Armin Wolf (1):
      drm/amd/display: Fix memory leak in dm_sw_fini()

Arnd Bergmann (4):
      RDMA/srpt: fix function pointer cast warnings
      nouveau: fix function cast warnings
      iommu/vt-d: Fix constant-out-of-range warning
      dm-integrity, dm-verity: reduce stack usage for recheck

Arunpravin Paneer Selvam (1):
      drm/buddy: Modify duplicate list_splice_tail call

Ashutosh Dixit (2):
      drm/xe/xe_gt_idle: Drop redundant newline in name
      drm/xe: Fix modpost warning on xe_mocs kunit module

Baokun Li (1):
      cachefiles: fix memory leak in cachefiles_add_cache()

Bart Van Assche (2):
      RDMA/srpt: Support specifying the srpt_service_guid parameter
      fs/aio: Restrict kiocb_set_cancel_fn() to I/O submitted via libaio

Benjamin Gray (1):
      kasan: guard release_free_meta() shadow access with kasan_arch_is_ready()

Brian Foster (1):
      bcachefs: fix iov_iter count underflow on sub-block dio read

Chen Jun (1):
      irqchip/mbigen: Don't use bus_get_dev_root() to find the parent

Chengming Zhou (1):
      mm/zswap: invalidate duplicate entry when !zswap_enabled

Chris Morgan (1):
      arm64: dts: rockchip: Correct Indiedroid Nova GPIO Names

Conor Dooley (1):
      riscv: dts: sifive: add missing #interrupt-cells to pmic

Corey Minyard (1):
      i2c: imx: when being a target, mark the last read as processed

Damien Le Moal (2):
      ata: libata-core: Do not try to set sleeping devices to standby
      ata: libata-core: Do not call ata_dev_power_set_standby() twice

Dan Carpenter (2):
      scsi: ufs: Uninitialized variable in ufshcd_devfreq_target()
      drm/nouveau/mmu/r535: uninitialized variable in r535_bar_new_()

Dan Williams (2):
      acpi/ghes: Remove CXL CPER notifications
      cxl/acpi: Fix load failures due to single window creation failure

Daniel Vacek (1):
      IB/hfi1: Fix sdma.h tx->num_descs off-by-one error

Daniil Dulov (1):
      afs: Increase buffer size in afs_update_volume_status()

Dave Airlie (3):
      nouveau/gsp: add kconfig option to enable GSP paths by default
      nouveau: add an ioctl to return vram bar size.
      nouveau: add an ioctl to report vram usage

Dave Jiang (4):
      cxl: Change 'struct cxl_memdev_state' *_perf_list to single
'struct cxl_dpa_perf'
      cxl: Remove unnecessary type cast in cxl_qos_class_verify()
      cxl: Fix sysfs export of qos_class for memdev
      cxl/test: Add support for qos_class checking

David Howells (1):
      netfs: Fix missing zero-length check in unbuffered write

Dmitry Baryshkov (1):
      Revert "iommu/arm-smmu: Convert to domain_alloc_paging()"

Don Brace (1):
      scsi: smartpqi: Fix disable_managed_interrupts

Emil Renner Berthing (1):
      gpiolib: Handle no pin_ranges in gpiochip_generic_config()

Eric Dumazet (3):
      ipv4: properly combine dev_base_seq and ipv4.dev_addr_genid
      ipv6: properly combine dev_base_seq and ipv6.dev_addr_genid
      net: implement lockless setsockopt(SO_PEEK_OFF)

Erik Kurzinger (2):
      drm/syncobj: call drm_syncobj_fence_add_wait when WAIT_AVAILABLE
flag is set
      drm/syncobj: handle NULL fence in syncobj_eventfd_entry_func

Fabio Estevam (2):
      Revert "arm64: dts: imx8mp-dhcom-pdk3: Describe the USB-C connector"
      Revert "arm64: dts: imx8mn-var-som-symphony: Describe the USB-C connector"

Florian Fainelli (1):
      net: bcmasp: Indicate MAC is in charge of PHY PM

Florian Westphal (2):
      netfilter: nf_tables: set dormant flag on hook register failure
      netfilter: nf_tables: use kzalloc for hook allocation

Frank Li (2):
      usb: cdns3: fixed memory use after free at cdns3_gadget_ep_disable()
      usb: cdns3: fix memory double free when handle zero packet

Gaurav Batra (1):
      powerpc/pseries/iommu: DLPAR add doesn't completely initialize
pci_controller

Geert Uytterhoeven (2):
      soc: microchip: Fix POLARFIRE_SOC_SYS_CTRL input prompt
      ARM: dts: renesas: rcar-gen2: Add missing #interrupt-cells to DA9063 nodes

Geliang Tang (2):
      mptcp: add needs_id for userspace appending addr
      mptcp: add needs_id for netlink appending addr

Gianmarco Lusvardi (1):
      bpf, scripts: Correct GPL license name

Greg Joyce (1):
      block: sed-opal: handle empty atoms when parsing response

Guenter Roeck (4):
      MAINTAINERS: Drop myself as maintainer of TYPEC port controller drivers
      parisc: Fix stack unwinder
      lib/Kconfig.debug: TEST_IOV_ITER depends on MMU
      hwmon: (nct6775) Fix access to temperature configuration registers

Hangbin Liu (1):
      selftests: bonding: set active slave to primary eth1 specifically

Hans de Goede (8):
      platform/x86: intel: int0002_vgpio: Pass IRQF_ONESHOT to request_irq()
      platform/x86: touchscreen_dmi: Allow partial (prefix) matches
for ACPI names
      platform/x86: touchscreen_dmi: Consolidate Goodix upside-down
touchscreen data
      platform/x86: x86-android-tablets: Fix keyboard touchscreen on
Lenovo Yogabook1 X90
      platform/x86: Add new get_serdev_controller() helper
      platform/x86: x86-android-tablets: Fix serdev instantiation no
longer working
      platform/x86: x86-android-tablets: Fix acer_b1_750_goodix_gpios name
      platform/x86: intel-vbtn: Stop calling "VBDL" from notify_handler

Hari Bathini (1):
      bpf: Fix warning for bpf_cpumask in verifier

Heiko Carstens (3):
      s390/configs: provide compat topic configuration target
      s390/configs: enable INIT_STACK_ALL_ZERO in all configurations
      s390/configs: update default configurations

Heiko Stuebner (2):
      arm64: dts: rockchip: drop unneeded status from rk3588-jaguar gpio-leds
      arm64: dts: rockchip: set num-cs property for spi on px30

Helge Deller (1):
      Revert "parisc: Only list existing CPUs in cpu_possible_mask"

Hojin Nam (1):
      perf: CXL: fix CPMU filter value mask length

Horatiu Vultur (1):
      net: sparx5: Add spinlock for frame transmission from CPU

Hou Tao (3):
      x86/mm: Move is_vsyscall_vaddr() into asm/vsyscall.h
      x86/mm: Disallow vsyscall page read for copy_from_kernel_nofault()
      selftest/bpf: Test the read of vsyscall page under x86-64

Huacai Chen (3):
      LoongArch: Disable IRQ before init_fn() for nonboot CPUs
      LoongArch: Update cpu_sibling_map when disabling nonboot CPUs
      LoongArch: Call early_init_fdt_scan_reserved_mem() earlier

Jakub Kicinski (5):
      net/sched: act_mirred: use the backlog for mirred ingress
      net/sched: act_mirred: don't override retval if we already lost the skb
      docs: netdev: update the link to the CI repo
      tools: ynl: make sure we always pass yarg to mnl_cb_run
      tools: ynl: don't leak mcast_groups on init error

Jason Gunthorpe (4):
      iommufd: Reject non-zero data_type if no data_len is provided
      s390: use the correct count for __iowrite64_copy()
      iommu/arm-smmu-v3: Do not use GFP_KERNEL under as spinlock
      iommu/sva: Restore SVA handle sharing

Javier Martinez Canillas (1):
      sparc: Fix undefined reference to fb_is_primary_device

Jeremy Kerr (1):
      net: mctp: put sock on tag allocation failure

Jianbo Liu (1):
      net/sched: flower: Add lock protection when remove filter handle

Jiri Pirko (1):
      devlink: fix port dump cmd type

Joao Martins (9):
      iommufd/iova_bitmap: Bounds check mapped::pages access
      iommufd/iova_bitmap: Switch iova_bitmap::bitmap to an u8 array
      iommufd/selftest: Test u64 unaligned bitmaps
      iommufd/iova_bitmap: Handle recording beyond the mapped pages
      iommufd/selftest: Refactor dirty bitmap tests
      iommufd/selftest: Refactor mock_domain_read_and_clear_dirty()
      iommufd/selftest: Hugepage mock domain support
      iommufd/selftest: Add mock IO hugepages tests
      iommufd/iova_bitmap: Consider page offset for the pages to be pinned

Johan Jonker (1):
      arm64: dts: rockchip: Drop interrupts property from rk3328
pwm-rockchip node

Johannes Weiner (1):
      mm: memcontrol: clarify swapaccount=0 deprecation warning

Jonathan Corbet (1):
      docs: Instruct LaTeX to cope with deeper nesting

Josef Bacik (1):
      btrfs: fix deadlock with fiemap and extent locking

Justin Chen (1):
      net: bcmasp: Sanity check is off by one

Justin Iurman (2):
      Fix write to cloned skb in ipv6_hop_ioam()
      selftests: ioam: refactoring to align with the fix

Kairui Song (1):
      mm/swap: fix race when skipping swapcache

Kalesh AP (5):
      RDMA/bnxt_re: Avoid creating fence MR for newer adapters
      RDMA/bnxt_re: Remove a redundant check inside bnxt_re_vf_res_config
      RDMA/bnxt_re: Fix unconditional fence for newer adapters
      RDMA/bnxt_re: Return error for SRQ resize
      RDMA/bnxt_re: Add a missing check in bnxt_qplib_query_srq

Kamal Heib (1):
      RDMA/qedr: Fix qedr_create_user_qp error flow

Kees Cook (1):
      enic: Avoid false positive under FORTIFY_SOURCE

Kent Overstreet (6):
      bcachefs: fix backpointer_to_text() when dev does not exist
      bcachefs: Kill __GFP_NOFAIL in buffered read path
      bcachefs: Fix BTREE_ITER_FILTER_SNAPSHOTS on inodes btree
      bcachefs: Fix bch2_journal_flush_device_pins()
      bcachefs: Fix check_snapshot() memcpy
      bcachefs: fix bch2_save_backtrace()

Krishna Kurapati (1):
      usb: gadget: ncm: Avoid dropping datagrams of properly parsed NTBs

Krzysztof Kozlowski (3):
      riscv: dts: starfive: replace underscores in node names
      arm64: dts: rockchip: minor rk3588 whitespace cleanup
      LoongArch: dts: Minor whitespace cleanup

Kuniyuki Iwashima (3):
      dccp/tcp: Unhash sk from ehash for tb2 alloc failure after
check_estalblished().
      arp: Prevent overflow in arp_req_get().
      af_unix: Drop oob_skb ref before purging queue in GC.

Kurt Kanzenbach (1):
      net: stmmac: Fix EST offset for dwmac 5.10

Lad Prabhakar (1):
      cache: ax45mp_cache: Align end size to cache boundary in
ax45mp_dma_cache_wback()

Leon Romanovsky (1):
      RDMA/mlx5: Fix fortify source warning while accessing Eth segment

Lewis Huang (1):
      drm/amd/display: Only allow dig mapping to pwrseq in new asic

Li Ming (1):
      cxl/pci: Skip to handle RAS errors if CXL.mem device is detached

Lino Sanfilippo (2):
      serial: stm32: do not always set SER_RS485_RX_DURING_TX if RS485
is enabled
      serial: amba-pl011: Fix DMA transmission in RS485 mode

Linus Torvalds (3):
      sched/membarrier: reduce the ability to hammer on sys_membarrier
      drm/tests/drm_buddy: fix build failure on 32-bit targets
      Linux 6.8-rc6

Lucas Stach (1):
      bus: imx-weim: fix valid range check

Ma Jun (1):
      drm/amdgpu: Fix the runtime resume failure issue

Marc Dionne (2):
      netfs: Fix i_dio_count leak on DIO read past i_size
      afs: Fix ignored callbacks over ipv4

Marek Vasut (1):
      arm64: dts: imx8mp: Disable UART4 by default on Data Modul
i.MX8M Plus eDM SBC

Mario Limonciello (5):
      platform/x86/amd/pmf: Fix a suspend hang on Framework 13
      platform/x86/amd/pmf: Add debugging message for missing policy data
      platform/x86/amd/pmf: Fixup error handling for amd_pmf_init_smart_pc()
      platform/x86/amd/pmf: Fix a potential race with policy binary sideload
      platform/x86: thinkpad_acpi: Only update profile if successfully converted

Mark Brown (3):
      usb: typec: tpcm: Fix issues with power being removed during reset
      arm64/sme: Restore SME registers on exit from suspend
      arm64/sme: Restore SMCR_EL1.EZT0 on exit from suspend

Mark Pearson (1):
      platform/x86: think-lmi: Fix password opcode ordering for workstations

Mark Zhang (1):
      IB/mlx5: Don't expose debugfs entries for RRoCE general
parameters if not supported

Martin Blumenstingl (1):
      drm/meson: Don't remove bridges which are created by other drivers

Martin K. Petersen (2):
      scsi: sd: usb_storage: uas: Access media prior to querying
device properties
      scsi: core: Consult supported VPD page list prior to fetching page

Martin KaFai Lau (2):
      bpf: Fix racing between bpf_timer_cancel_and_free and bpf_timer_cancel
      selftests/bpf: Test racing between bpf_timer_cancel_and_free and
bpf_timer_cancel

Matthew Auld (1):
      drm/tests/drm_buddy: fix 32b build

Matthew Brost (3):
      drm/xe: Fix xe_vma_set_pte_size
      drm/xe: Add XE_VMA_PTE_64K VMA flag
      drm/xe: Return 2MB page size for compact 64k PTEs

Matthieu Baerts (NGI0) (7):
      selftests: mptcp: pm nl: also list skipped tests
      selftests: mptcp: pm nl: avoid error msg on older kernels
      selftests: mptcp: diag: fix bash warnings on older kernels
      selftests: mptcp: simult flows: fix some subtest names
      selftests: mptcp: userspace_pm: unique subtest names
      selftests: mptcp: diag: unique 'in use' subtest names
      selftests: mptcp: diag: unique 'cestab' subtest names

Max Kellermann (2):
      parisc/ftrace: add missing CONFIG_DYNAMIC_FTRACE check
      parisc/kprobes: always include asm-generic/kprobes.h

Maxime Ripard (1):
      drm/i915/tv: Fix TV mode

Melissa Wen (1):
      drm/amd/display: fix null-pointer dereference on edid reading

Mike Marciniszyn (1):
      RDMA/irdma: Fix KASAN issue with tasklet

Mike Snitzer (1):
      dm-crypt, dm-integrity, dm-verity: bump target version

Mikulas Patocka (5):
      dm-integrity: recheck the integrity tag after a failure
      dm-verity: recheck the hash after a failure
      dm-crypt: don't modify the data when using authenticated encryption
      dm-crypt: recheck the integrity tag after a failure
      dm-verity, dm-crypt: align "struct bvec_iter" correctly

Muhammad Usama Anjum (1):
      selftests/iommu: fix the config fragment

Mustafa Ismail (2):
      RDMA/irdma: Set the CQ read threshold for GEN 1
      RDMA/irdma: Add AE for too many RNRS

Nam Cao (1):
      irqchip/sifive-plic: Enable interrupt if needed before EOI

Naohiro Aota (1):
      scsi: target: pscsi: Fix bio_put() for error case

Nhat Pham (1):
      mm/swap_state: update zswap LRU's protection range with the folio locked

Nikita Shubin (1):
      ARM: ep93xx: Add terminator to gpiod_lookup_table

Oliver Upton (3):
      KVM: arm64: vgic-its: Test for valid IRQ in its_sync_lpi_pending_table()
      KVM: arm64: vgic-its: Test for valid IRQ in MOVALL handler
      irqchip/gic-v3-its: Do not assume vPE tables are preallocated

Ondrej Jirman (1):
      Revert "usb: typec: tcpm: reset counter when enter into
unattached state after try role"

Pablo Neira Ayuso (3):
      netfilter: nft_flow_offload: reset dst in route object after
setting up flow
      netfilter: nft_flow_offload: release dst in case direct xmit path is used
      netfilter: nf_tables: register hooks last when adding new chain/flowtable

Palmer Dabbelt (1):
      tty: hvc: Don't enable the RISC-V SBI console by default

Paolo Abeni (4):
      mptcp: fix lockless access in subflow ULP diag
      mptcp: fix data races on local_id
      mptcp: fix data races on remote_id
      mptcp: fix duplicate subflow creation

Pavel Sakharov (1):
      net: stmmac: Fix incorrect dereference in interrupt handlers

Pawan Gupta (5):
      x86/bugs: Add asm helpers for executing VERW
      x86/entry_64: Add VERW just before userspace transition
      x86/entry_32: Add VERW just before userspace transition
      x86/bugs: Use ALTERNATIVE() instead of mds_user_clear static key
      KVM/VMX: Move VERW closer to VMentry for MDS mitigation

Pawel Laszczak (2):
      usb: cdnsp: blocked some cdns3 specific code
      usb: cdnsp: fixed issue with incorrect detecting CDNSP family controllers

Peter Oberparleiter (1):
      s390/cio: fix invalid -EBUSY on ccw_device_start

Qu Wenruo (1):
      btrfs: defrag: avoid unnecessary defrag caused by incorrect extent size

Radhey Shyam Pandey (1):
      ata: ahci_ceva: fix error handling for Xilinx GT PHY support

Randy Dunlap (2):
      scsi: jazz_esp: Only build if SCSI core is builtin
      net: ethernet: adi: requires PHYLIB support

Rob Herring (5):
      arm64: dts: freescale: Disable interrupt_map check
      arm: dts: Fix dtc interrupt_provider warnings
      arm64: dts: Fix dtc interrupt_provider warnings
      arm: dts: Fix dtc interrupt_map warnings
      arm64: dts: qcom: Fix interrupt-map cell sizes

Robert Richter (1):
      cxl/pci: Fix disabling memory if DVSEC CXL Range does not match
a CFMWS window

Rémi Denis-Courmont (2):
      phonet: take correct lock to peek at the RX queue
      phonet/pep: fix racy skb_queue_empty() use

Sabrina Dubroca (5):
      tls: break out of main loop when PEEK gets a non-data record
      tls: stop recv() if initial process_rx_list gave us non-DATA
      tls: don't skip over different type records from the rx_list
      selftests: tls: add test for merging of same-type control messages
      selftests: tls: add test for peeking past a record of a different type

Samasth Norway Ananda (1):
      firmware: microchip: fix wrong sizeof argument

Sandeep Dhavale (1):
      erofs: fix refcount on the metabuf used for inode lookup

Sean Christopherson (1):
      KVM/VMX: Use BT+JNC, i.e. EFLAGS.CF to select VMRESUME vs. VMLAUNCH

Sebastian Andrzej Siewior (1):
      xsk: Add truesize to skb_add_rx_frag().

Sebastian Reichel (1):
      arm64: dts: rockchip: mark system power controller on rk3588-evb1

SeongJae Park (4):
      mm/damon/core: check apply interval in damon_do_apply_schemes()
      mm/damon/sysfs-schemes: handle schemes sysfs dir removal before
commit_schemes_quota_goals
      mm/damon/reclaim: fix quota stauts loss due to online tunings
      mm/damon/lru_sort: fix quota status loss due to online tunings

Shakeel Butt (1):
      MAINTAINERS: mailmap: update Shakeel's email address

Shannon Nelson (1):
      ionic: use pci_is_enabled not open code

Shigeru Yoshida (1):
      bpf, sockmap: Fix NULL pointer dereference in
sk_psock_verdict_data_ready()

Shiraz Saleem (1):
      RDMA/irdma: Validate max_send_wr and max_recv_wr

Shyam Sundar S K (2):
      platform/x86/amd/pmf: Remove smart_pc_status enum
      platform/x86/amd/pmf: Fix TEE enact command failure after
suspend and resume

Siddharth Vadapalli (1):
      net: phy: realtek: Fix rtl8211f_config_init() for RTL8211F(D)(I)-VD-CG PHY

Simon Horman (1):
      MAINTAINERS: Add framer headers to NETWORKING [GENERAL]

Srinivasan Shanmugam (1):
      drm/amd/display: Fix potential null pointer dereference in dc_dmub_srv

Steven Rostedt (Google) (1):
      ring-buffer: Do not let subbuf be bigger than write mask

Subbaraya Sundeep (1):
      octeontx2-af: Consider the action set by PF

Swapnil Patel (1):
      drm/amd/display: fix input states translation error for dcn35 & dcn351

Terry Tritton (1):
      selftests/mm: uffd-unit-test check if huge page size is 0

Thinh Nguyen (1):
      usb: dwc3: gadget: Don't disconnect if not started

Thomas Hellström (2):
      drm/xe/uapi: Remove support for persistent exec_queues
      drm/ttm: Fix an invalid freeing on already freed page in error path

Tobias Waldekranz (2):
      net: bridge: switchdev: Skip MDB replays of deferred events on offload
      net: bridge: switchdev: Ensure deferred event delivery on unoffload

Tom Parkin (1):
      l2tp: pass correct message length to ip6_append_data

Tudor Ambarus (2):
      dt-bindings: clock: gs101: rename cmu_misc clock-names
      clk: samsung: clk-gs101: comply with the new dt cmu_misc clock names

Uwe Kleine-König (1):
      ARM: dts: rockchip: Drop interrupts property from pwm-rockchip nodes

Vasiliy Kovalev (3):
      ipv6: sr: fix possible use-after-free and null-ptr-deref
      devlink: fix possible use-after-free and memory leaks in devlink_init()
      gtp: fix use-after-free and null-ptr-deref in gtp_genl_dump_pdp()

Vegard Nossum (1):
      docs: translations: use attribute to store current language

Vidya Sagar (1):
      PCI/MSI: Prevent MSI hardware interrupt number truncation

WANG Xuerui (3):
      LoongArch: KVM: Fix input validation of _kvm_get_cpucfg() &
kvm_check_cpucfg()
      LoongArch: KVM: Rename _kvm_get_cpucfg() to _kvm_get_cpucfg_mask()
      LoongArch: KVM: Streamline kvm_check_cpucfg() and improve comments

Wayne Lin (1):
      drm/amd/display: adjust few initialization order in dm

Will Deacon (1):
      Revert "arm64: jump_label: use constraints "Si" instead of "i""

Xu Yang (2):
      usb: roles: fix NULL pointer issue when put module's reference
      usb: roles: don't get/set_role() when usb_role_switch is unregistered

Yafang Shao (2):
      bpf: Fix an issue due to uninitialized bpf_iter_task
      selftests/bpf: Add negtive test cases for task iter

Yi Liu (9):
      iommu/vt-d: Track nested domains in parent
      iommu/vt-d: Add __iommu_flush_iotlb_psi()
      iommu/vt-d: Add missing iotlb flush for parent domain
      iommu/vt-d: Update iotlb in nested domain attach
      iommu/vt-d: Add missing device iotlb flush for parent domain
      iommu/vt-d: Remove domain parameter for intel_pasid_setup_dirty_tracking()
      iommu/vt-d: Wrap the dirty tracking loop to be a helper
      iommu/vt-d: Add missing dirty tracking set for parent domain
      iommu/vt-d: Set SSADE when attaching to a parent with dirty tracking

Yishai Hadas (1):
      RDMA/mlx5: Relax DEVX access upon modify commands

Yosry Ahmed (1):
      mm: zswap: fix missing folio cleanup in writeback race path

Yu Kuai (6):
      md: Fix missing release of 'active_io' for flush
      md: Don't ignore suspended array in md_check_recovery()
      md: Don't ignore read-only array in md_check_recovery()
      md: Make sure md_do_sync() will set MD_RECOVERY_DONE
      md: Don't register sync_thread for reshape directly
      md: Don't suspend the array for interrupted reshape

Zhipeng Lu (1):
      IB/hfi1: Fix a memleak in init_credit_return

zhenwei pi (1):
      crypto: virtio/akcipher - Fix stack overflow on memcpy


* Re: [PATCH next v2 08/11] minmax: Add min_const() and max_const()
  @ 2024-02-25 17:13 97%   ` Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-02-25 17:13 UTC (permalink / raw)
  To: David Laight
  Cc: linux-kernel, Netdev, dri-devel, Jens Axboe,
	Matthew Wilcox (Oracle),
	Christoph Hellwig, linux-btrfs, Andrew Morton, Andy Shevchenko,
	David S . Miller, Dan Carpenter, Jani Nikula

On Sun, 25 Feb 2024 at 08:53, David Laight <David.Laight@aculab.com> wrote:
>
> The expansions of min() and max() contain statement expressions so are
> not valid for static initialisers.
> min_const() and max_const() are expressions so can be used for static
> initialisers.

I hate the name.

Naming shouldn't be about an implementation detail, particularly not
an esoteric one like the "C constant expression" rule. That can be
useful for some internal helper functions or macros, but not for
something that random people are supposed to USE.

Telling some random developer that inside an array size declaration or
a static initializer you need to use "max_const()" because it needs to
syntactically be a constant expression, and our regular "max()"
function isn't that, is just *horrid*.

No, please just use the traditional C model of just using ALL CAPS for
macro names that don't act like a function.

Yes, yes, that may end up requiring getting rid of some current users of

  #define MIN(a,b) ((a)<(b) ? (a):(b))

but dammit, we don't actually have _that_ many of them, and why should
we have random drivers doing that anyway?

              Linus
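
The distinction discussed above can be sketched in a few lines of C. This is only an illustration, not the kernel's actual minmax.h: min_once() is a hypothetical, simplified stand-in for a statement-expression style min(), and HDR_LEN, PAYLOAD_LEN, scratch and shortest are made-up names. It assumes GCC or Clang, since statement expressions and __typeof__ are GNU extensions.

```c
#include <stddef.h>

/* Plain-expression macros in the traditional ALL CAPS style. With constant
 * arguments they yield a C constant expression, but they may evaluate an
 * argument twice. */
#define MIN(a, b) ((a) < (b) ? (a) : (b))
#define MAX(a, b) ((a) > (b) ? (a) : (b))

/* Hypothetical simplified stand-in for a function-like min(): a GNU
 * statement expression that evaluates each argument exactly once.
 * It is NOT a constant expression. */
#define min_once(a, b) ({          \
	__typeof__(a) _a = (a);    \
	__typeof__(b) _b = (b);    \
	_a < _b ? _a : _b; })

enum { HDR_LEN = 16, PAYLOAD_LEN = 64 };	/* made-up sizes */

/* Both uses below need a constant expression, so only the plain
 * macros work here. */
static char scratch[MAX(HDR_LEN, PAYLOAD_LEN)];
static const int shortest = MIN(HDR_LEN, PAYLOAD_LEN);

/* This would be rejected by the compiler, because a statement expression
 * is not allowed outside of a function:
 *
 *	static const int bad = min_once(HDR_LEN, PAYLOAD_LEN);
 */

size_t scratch_size(void)
{
	return sizeof(scratch);
}

int shortest_len(void)
{
	return shortest;
}

/* Count how often each macro evaluates an argument with a side effect. */
int evals_with_MIN(void)
{
	int n = 0;

	(void)MIN(++n, 10);	/* ++n runs in the test and in the result */
	return n;
}

int evals_with_min_once(void)
{
	int n = 0;

	(void)min_once(++n, 10);	/* ++n runs once, into _a */
	return n;
}
```

The trade-off is the one the ALL CAPS convention is meant to signal: MIN()/MAX() are usable where the language demands a constant expression, but they do not behave like a function when an argument has side effects, whereas the statement-expression form behaves like a function but cannot appear in an array size or a static initializer.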


Results 1-200 of ~40000
2022-03-02  4:34     [PATCH 00/19] Enable -Wshadow=local for kernel/sched Matthew Wilcox (Oracle)
2024-04-16 21:15     ` Kees Cook
2024-04-17  0:29 99%   ` Linus Torvalds
2024-04-17  0:50 90%     ` Linus Torvalds
2022-05-27 11:29     [GIT PULL] Crypto Fixes for 5.19 Herbert Xu
2022-08-02  6:05     ` [GIT PULL] Crypto Update for 5.20 Herbert Xu
2022-10-04  8:54       ` [GIT PULL] Crypto Update for 6.1 Herbert Xu
2022-12-14  8:15         ` [GIT PULL] Crypto Update for 6.2 Herbert Xu
2023-02-20  5:22           ` [GIT PULL] Crypto Update for 6.3 Herbert Xu
2023-04-24  4:52             ` [GIT PULL] Crypto Update for 6.4 Herbert Xu
2023-06-29  5:06               ` [GIT PULL] Crypto Update for 6.5 Herbert Xu
2023-08-28  9:22                 ` [GIT PULL] Crypto Update for 6.6 Herbert Xu
2023-11-02  6:56                   ` [GIT PULL] Crypto Update for 6.7 Herbert Xu
2024-01-09 22:17                     ` [GIT PULL] Crypto Update for 6.8 Herbert Xu
2024-03-15  3:04                       ` [GIT PULL] Crypto Update for 6.9 Herbert Xu
2024-03-15 21:51 99%                     ` Linus Torvalds
2023-04-23 23:55     [syzbot] [kernel?] KCSAN: data-race in __fput / __tty_hangup (4) Tetsuo Handa
2023-04-24  0:44     ` Al Viro
2023-04-24  1:09       ` Tetsuo Handa
2023-04-25 14:47         ` Tetsuo Handa
2023-04-25 16:03           ` Al Viro
2023-04-25 22:09             ` Tetsuo Handa
2023-04-26 11:05               ` [PATCH] tty: tty_io: remove hung_up_tty_fops Tetsuo Handa
2023-05-14  1:02                 ` [PATCH v2] " Tetsuo Handa
2023-05-30 10:44                   ` Greg Kroah-Hartman
2023-05-30 11:57                     ` Tetsuo Handa
2023-05-30 12:51                       ` Greg Kroah-Hartman
2024-04-27  6:20                         ` [PATCH v3] " Tetsuo Handa
2024-04-27 19:02 96%                       ` Linus Torvalds
2024-04-28 10:19                             ` Tetsuo Handa
2024-04-28 18:50 99%                           ` Linus Torvalds
2024-04-29 13:55                                 ` Marco Elver
2024-04-29 15:38 99%                               ` Linus Torvalds
2024-05-01 18:45                                     ` Paul E. McKenney
2024-05-01 18:56 99%                                   ` Linus Torvalds
2024-05-01 19:02                                         ` Paul E. McKenney
2024-05-01 20:14                                           ` Marco Elver
2024-05-01 21:06 97%                                         ` Linus Torvalds
2024-05-01 21:20 94%                                           ` Linus Torvalds
2023-09-25 12:02     [PATCH v7 00/12] iov_iter: Convert the iterator macros into inline funcs David Howells
2023-09-25 12:03     ` [PATCH v7 07/12] iov_iter: Convert iterate*() to " David Howells
2024-02-18  3:13       ` [bug report] dead loop in generic_perform_write() //Re: " Tong Tiangen
2024-02-28 21:21 99%     ` Linus Torvalds
2024-02-28 22:57 99%       ` Linus Torvalds
2024-02-29  8:13             ` Tong Tiangen
2024-02-29 17:32 91%           ` Linus Torvalds
2024-03-02  2:59 76%             ` Linus Torvalds
2024-03-02  9:37                   ` Tong Tiangen
2024-03-02 18:06 93%                 ` Linus Torvalds
2024-03-02 18:11 99%                   ` Linus Torvalds
2024-03-04 11:56                   ` David Howells
2024-03-04 18:32 99%                 ` Linus Torvalds
2024-02-13  5:55     [PATCH 00/30] PREEMPT_AUTO: support lazy rescheduling Ankur Arora
2024-04-23 15:21     ` Shrikanth Hegde
2024-04-23 16:13 97%   ` Linus Torvalds
2024-02-13  5:55     [PATCH 26/30] sched: handle preempt=voluntary under PREEMPT_AUTO Ankur Arora
2024-03-03  1:08     ` Joel Fernandes
2024-03-05  8:11       ` Ankur Arora
2024-03-06 20:42         ` Joel Fernandes
2024-03-07 19:01           ` Paul E. McKenney
2024-03-08  0:15             ` Joel Fernandes
2024-03-08  0:42               ` Paul E. McKenney
2024-03-08  4:22                 ` Ankur Arora
2024-03-08 21:33                   ` Paul E. McKenney
2024-03-11  4:50                     ` Ankur Arora
2024-03-11 19:26                       ` Paul E. McKenney
2024-03-11 20:09                         ` Ankur Arora
2024-03-11 20:23 95%                       ` Linus Torvalds
2024-02-25 13:21     Linux regressions report for mainline [2024-02-25] Regzbot (on behalf of Thorsten Leemhuis)
2024-02-25 14:21     ` Linux regression tracking (Thorsten Leemhuis)
2024-02-26 17:33 99%   ` Linus Torvalds
2024-02-25 16:46     [PATCH next v2 00/11] minmax: Optimise to reduce .i line length David Laight
2024-02-25 16:53     ` [PATCH next v2 08/11] minmax: Add min_const() and max_const() David Laight
2024-02-25 17:13 97%   ` Linus Torvalds
2024-02-25 23:57 42% Linux 6.8-rc6 Linus Torvalds
2024-02-27 16:48     [PATCH 0/2] cleanup: A couple extensions for conditional resource management Dan Williams
2024-02-27 16:48     ` [PATCH 1/3] cleanup: Add cond_guard() to conditional guards Dan Williams
2024-02-27 20:49 98%   ` Linus Torvalds
2024-02-27 16:48     ` [PATCH 2/3] cleanup: Introduce cond_no_free_ptr() Dan Williams
2024-02-27 20:40 99%   ` Linus Torvalds
2024-02-27 16:49     ` [PATCH 3/3] cxl/region: Use cond_guard() in show_targetN() Dan Williams
2024-02-27 20:55 98%   ` Linus Torvalds
2024-02-27 21:41         ` Dan Williams
2024-02-27 22:34 86%       ` Linus Torvalds
2024-02-27 22:56     [GIT PULL] hotfixes for 6.8-rc7 Andrew Morton
2024-02-28  0:51 99% ` Linus Torvalds
2024-02-28 22:55     [PATCH 0/3] kci-gitlab: Introducing GitLab-CI Pipeline for Kernel Testing Helen Koike
2024-02-28 22:55     ` [PATCH 1/3] " Helen Koike
2024-02-29  9:02       ` Maxime Ripard
2024-02-29  9:23         ` Nikolai Kondrashov
2024-02-29 20:21 95%       ` Linus Torvalds
2024-03-01 10:27             ` Nikolai Kondrashov
2024-03-01 20:10 96%           ` Linus Torvalds
2024-02-29 20:39     [GIT PULL] Networking for v6.8-rc7 Jakub Kicinski
2024-02-29 20:56 99% ` Linus Torvalds
2024-03-01  8:32     [Linux Kernel Bug] KASAN: slab-out-of-bounds Write in tomoyo_write_control Sam Sun
2024-03-01 13:04     ` [PATCH for 6.8] tomoyo: fix UAF write bug in tomoyo_write_control() Tetsuo Handa
2024-03-01 19:14 99%   ` Linus Torvalds
2024-03-01 17:07     [PATCH RFC 0/4] memcg_kmem hooks refactoring and kmem_cache_charge() Vlastimil Babka
2024-03-01 17:07     ` [PATCH RFC 4/4] UNFINISHED mm, fs: use kmem_cache_charge() in path_openat() Vlastimil Babka
2024-03-01 17:51 90%   ` Linus Torvalds
2024-03-24  2:27         ` Al Viro
2024-03-24 17:44 94%       ` Linus Torvalds
2024-03-01 20:12     arch/x86/include/asm/processor.h:698:16: sparse: sparse: incorrect type in initializer (different address spaces) kernel test robot
2024-03-01 21:57     ` Thomas Gleixner
2024-03-01 22:26       ` Thomas Gleixner
2024-03-02 11:37         ` Thomas Gleixner
2024-03-02 15:44           ` Thomas Gleixner
2024-03-02 22:00             ` Thomas Gleixner
2024-03-02 22:49 99%           ` Linus Torvalds
2024-03-01 22:52     [PATCH v2] x86: disable non-instrumented version of copy_mc when KMSAN is enabled Tetsuo Handa
2024-03-05 11:31     ` Tetsuo Handa
2024-03-05 17:57 93%   ` Linus Torvalds
2024-03-06 22:08         ` Tetsuo Handa
2024-03-07  0:09 99%       ` Linus Torvalds
2024-03-02 16:12     [GIT PULL] tracing: Prevent trace_marker being bigger than unsigned short Steven Rostedt
2024-03-02 17:24 99% ` Linus Torvalds
2024-03-02 19:59       ` Steven Rostedt
2024-03-02 20:25 99%     ` Linus Torvalds
2024-03-02 20:33 99%     ` Linus Torvalds
2024-03-02 20:47           ` Steven Rostedt
2024-03-02 20:55 99%         ` Linus Torvalds
2024-03-03 12:59               ` Steven Rostedt
2024-03-03 17:38 99%             ` Linus Torvalds
2024-03-03 19:07                   ` Steven Rostedt
2024-03-03 20:09 79%                 ` Linus Torvalds
2024-03-03 21:00                       ` Steven Rostedt
2024-03-04 21:42                         ` Steven Rostedt
2024-03-04 21:50 99%                       ` Linus Torvalds
2024-03-04 22:10                             ` Steven Rostedt
2024-03-04 23:20 98%                           ` Linus Torvalds
2024-03-04 23:47                                 ` Steven Rostedt
2024-03-04 23:52                                   ` Steven Rostedt
2024-03-05  0:17 99%                                 ` Linus Torvalds
2024-03-03 21:15 51% Linux 6.8-rc7 Linus Torvalds
2024-03-04 10:12     [patch 0/9] x86: Cure tons of sparse warnings (mostly __percpu) Thomas Gleixner
2024-03-04 10:12     ` [patch 5/9] x86: Cure per CPU madness on UP Thomas Gleixner
2024-03-15 16:17       ` Guenter Roeck
2024-03-15 16:42 97%     ` Linus Torvalds
2024-03-15 17:40           ` Thomas Gleixner
2024-03-15 22:55             ` Thomas Gleixner
2024-03-15 23:23 93%           ` Linus Torvalds
2024-03-16  1:11                 ` Thomas Gleixner
2024-03-16  1:23 99%               ` Linus Torvalds
2024-03-04 16:19     [PATCH] rcutorture: Fix rcu_torture_pipe_update_one()/rcu_torture_writer() data race and concurrency bug Joel Fernandes
2024-03-05  6:24     ` linke li
2024-03-06 15:37       ` Steven Rostedt
2024-03-06 17:36         ` Paul E. McKenney
2024-03-06 18:01           ` Steven Rostedt
2024-03-06 18:43 87%         ` Linus Torvalds
2024-03-06 18:55               ` Steven Rostedt
2024-03-06 19:01 95%             ` Linus Torvalds
2024-03-06 19:27 95%               ` Linus Torvalds
2024-03-06 19:47                     ` Steven Rostedt
2024-03-06 20:06 94%                   ` Linus Torvalds
2024-03-06 19:27                   ` Steven Rostedt
2024-03-06 19:46 92%                 ` Linus Torvalds
2024-03-06 20:20 97%                   ` Linus Torvalds
2024-03-07  2:29                       ` Paul E. McKenney
2024-03-07  2:43 93%                     ` Linus Torvalds
2024-03-07  2:49 99%                       ` Linus Torvalds
2024-03-07  3:06                           ` Mathieu Desnoyers
2024-03-07  3:37                             ` Paul E. McKenney
2024-03-07 13:53                               ` Mathieu Desnoyers
2024-03-07 19:47                                 ` Paul E. McKenney
2024-03-07 20:00 96%                               ` Linus Torvalds
2024-03-07 20:57                                     ` Paul E. McKenney
2024-03-07 21:40                                       ` Julia Lawall
2024-03-07 22:09 87%                                     ` Linus Torvalds
2024-03-05 13:33     [PATCH] coredump: get machine check errors early rather than during iov_iter Tong Tiangen
2024-03-05 16:33     ` Christian Brauner
2024-03-05 16:39       ` Jens Axboe
2024-03-05 17:29 71%     ` Linus Torvalds
2024-03-05 23:51     linux-next: build warning after merge of the vfs-brauner tree Stephen Rothwell
2024-03-06  2:48 99% ` Linus Torvalds
2024-03-06  4:37       ` Stephen Rothwell
2024-03-06  4:47 99%     ` Linus Torvalds
2024-03-08 10:13     [GIT PULL] vfs pidfd Christian Brauner
2024-03-11 20:05 99% ` Linus Torvalds
2024-03-12 14:15       ` Christian Brauner
2024-03-12 16:23 99%     ` Linus Torvalds
2024-03-12 20:09           ` Christian Brauner
2024-03-12 20:21 99%         ` Linus Torvalds
2024-03-13 17:10               ` Christian Brauner
2024-03-13 19:40 99%             ` Linus Torvalds
2024-03-08 17:15     [GIT PULL] RCU changes for v6.9 Boqun Feng
2024-03-12 20:32     ` Unexplained long boot delays [Was Re: [GIT PULL] RCU changes for v6.9] Florian Fainelli
2024-03-12 21:07       ` Boqun Feng
2024-03-12 21:34         ` Florian Fainelli
2024-03-12 21:44 99%       ` Linus Torvalds
2024-03-12 23:48             ` Boqun Feng
2024-03-13 16:01               ` Joel Fernandes
2024-03-13 21:30                 ` Florian Fainelli
2024-03-13 21:59                   ` Russell King (Oracle)
2024-03-13 22:04                     ` Florian Fainelli
2024-03-13 22:49                       ` Russell King (Oracle)
2024-03-13 23:29                         ` Florian Fainelli
2024-03-14  1:15 97%                       ` Linus Torvalds
2024-03-08 18:38     [PATCH 0/6] tracing/ring-buffer: Fix wakeup of ring buffer waiters Steven Rostedt
2024-03-08 20:39 96% ` Linus Torvalds
2024-03-08 21:35       ` Steven Rostedt
2024-03-08 21:39 99%     ` Linus Torvalds
2024-03-08 21:41 99%       ` Linus Torvalds
2024-03-10 21:06 51% Linux 6.8 Linus Torvalds
2024-03-11 15:19     [GIT PULL] x86/sev for v6.9-rc1 Borislav Petkov
2024-03-12  0:50 99% ` Linus Torvalds
2024-03-11 15:57     [GIT PULL] EDAC updates for v6.9 Borislav Petkov
2024-03-12  1:12 99% ` Linus Torvalds
2024-03-12  2:24       ` Randy Dunlap
2024-03-12  2:25 99%     ` Linus Torvalds
2024-03-11 19:30     [GIT PULL] AFFS update for 6.9 David Sterba
2024-03-12 20:02 54% ` Linus Torvalds
2024-03-12  4:25     [GIT PULL] Networking for v6.9 Jakub Kicinski
2024-03-12 20:17 99% ` Linus Torvalds
2024-03-12 20:34       ` Jakub Kicinski
2024-03-12 20:47         ` Jakub Kicinski
2024-03-12 21:11 95%       ` Linus Torvalds
2024-03-13  1:00 99% ` Linus Torvalds
2024-03-12  9:55     [GIT PULL] slab updates for 6.9 Vlastimil Babka
2024-03-13  3:54 99% ` Linus Torvalds
2024-03-13  1:10     [GIT PULL] bcachefs " Kent Overstreet
2024-03-13 20:47 94% ` Linus Torvalds
2024-03-13 21:34       ` Kent Overstreet
2024-03-13 21:51 99%     ` Linus Torvalds
2024-03-13 22:22           ` Kent Overstreet
2024-03-13 22:28             ` Kent Overstreet
2024-03-14 17:15 99%           ` Linus Torvalds
2024-03-13  4:06     [git pull] drm for 6.9-rc1 Dave Airlie
2024-03-14  1:49 99% ` Linus Torvalds
2024-03-13 20:56     [RFC PATCH 0/2] Introduce serialized smp_call_function APIs Mathieu Desnoyers
2024-03-13 20:56     ` [RFC PATCH 1/2] smp: Implement " Mathieu Desnoyers
2024-03-13 21:19 99%   ` Linus Torvalds
     [not found]     <65f2d9d4.050a0220.b240.7bddSMTPIN_ADDED_BROKEN@mx.google.com>
2024-03-14 18:36 97% ` [GIT PULL] platform-drivers-x86 for v6.9-1 Linus Torvalds
2024-03-14 18:43     [GIT PULL] dlm fixes for 6.9 David Teigland
2024-03-15 17:10 87% ` Linus Torvalds
2024-03-14 19:43     [GIT PULL] clk changes for the merge window Stephen Boyd
2024-03-15 18:54 99% ` Linus Torvalds
2024-03-14 20:31     [GIT PULL] lsm/lsm-pr-20240314 Paul Moore
2024-03-14 23:05 99% ` Linus Torvalds
2024-03-15 11:03     [GIT PULL]: Generic phy updates for v6.9 Vinod Koul
2024-03-15 19:22 99% ` Linus Torvalds
2024-03-16 18:05       ` Vinod Koul
2024-03-16 18:23 99%     ` Linus Torvalds
2024-03-15 15:10     [GIT PULL] fs/9p patches for 6.9 merge window Eric Van Hensbergen
2024-03-15 17:17 99% ` Linus Torvalds
2024-03-15 16:29     [GIT PULL] tracing: Updates for v6.9 Steven Rostedt
2024-03-16 16:31 99% ` Linus Torvalds
2024-03-16 16:59 93%   ` Linus Torvalds
2024-03-16 18:18 97%     ` Linus Torvalds
2024-03-16 18:20         ` Steven Rostedt
2024-03-16 18:42 93%       ` Linus Torvalds
2024-03-16 20:00             ` Borislav Petkov
2024-03-16 20:42 88%           ` Linus Torvalds
2024-03-15 17:49     [GIT PULL] KVM changes for Linux 6.9 merge window Paolo Bonzini
2024-03-15 22:28 98% ` Linus Torvalds
2024-03-15 23:32       ` Oliver Upton
2024-03-15 23:49         ` Oliver Upton
2024-03-16  8:48           ` Paolo Bonzini
2024-03-16 16:01 99%         ` Linus Torvalds
2024-03-16  0:24       ` [PATCH] Revert "KVM: arm64: Snapshot all non-zero RES0/RES1 sysreg fields for later checking" Oliver Upton
2024-03-16  0:51 99%     ` Linus Torvalds
2024-03-18 12:19     [GIT PULL] vfs fixes Christian Brauner
2024-03-18 19:14 92% ` Linus Torvalds
2024-03-18 19:41 99%   ` Linus Torvalds
2024-03-18 15:30     [GIT PULL v2] tracing: Updates for v6.9 Steven Rostedt
2024-03-19 16:23 96% ` Linus Torvalds
2024-03-19 17:06       ` Steven Rostedt
2024-03-19 17:13         ` Steven Rostedt
2024-03-19 21:03           ` Nathan Chancellor
2024-03-19 21:22 99%         ` Linus Torvalds
2024-03-18 21:25     [GIT PULL v2] dlm fixes for 6.9 David Teigland
2024-03-18 22:44 99% ` Linus Torvalds
2024-03-19  7:41     [GIT PULL] virtio: features, fixes Michael S. Tsirkin
2024-03-19 18:03 98% ` Linus Torvalds
2024-03-19 14:12     [GIT PULL] more s390 updates for 6.9 merge window Heiko Carstens
2024-03-19 18:54 97% ` Linus Torvalds
2024-03-19 16:36     [PATCH v1 1/3] mm: kmsan: implement kmsan_memmove() Alexander Potapenko
2024-03-19 16:36     ` [PATCH v1 2/3] instrumented.h: add instrument_memcpy_before, instrument_memcpy_after Alexander Potapenko
2024-03-19 17:52 99%   ` Linus Torvalds
2024-03-19 16:36     ` [PATCH v1 3/3] x86: call instrumentation hooks from copy_mc.c Alexander Potapenko
2024-03-19 17:58 99%   ` Linus Torvalds
2024-03-20 10:18     [PATCH v2 1/3] mm: kmsan: implement kmsan_memmove() Alexander Potapenko
2024-03-20 16:04 99% ` Linus Torvalds
2024-03-20 15:22     [GIT PULL] tracing/tools: Updates for v6.9 Steven Rostedt
2024-03-20 23:40 99% ` Linus Torvalds
2024-03-21  4:09     [GIT PULL] Hyper-V commits for 6.9 Wei Liu
2024-03-21 17:06 99% ` Linus Torvalds
2024-03-22 23:25       ` Wei Liu
2024-03-22 23:42 99%     ` Linus Torvalds
2024-03-21 12:55     [GIT PULL] remoteproc updates for v6.9 Bjorn Andersson
2024-03-21 18:08     ` Bjorn Andersson
2024-03-21 18:05 99%   ` Linus Torvalds
2024-03-21 13:02     [GIT PULL] Char/Misc driver changes for 6.9-rc1 Greg KH
2024-03-21 13:48     ` Nathan Chancellor
2024-03-21 18:10 99%   ` Linus Torvalds
2024-03-21 18:12 99%     ` Linus Torvalds
2024-03-21 18:30         ` Nathan Chancellor
2024-03-21 20:28 99%       ` Linus Torvalds
2024-03-27 16:56 97% ` Linus Torvalds
2024-03-27 20:26 99%   ` Linus Torvalds
2024-03-22 16:52     [PATCH v4 00/16] x86-64: Stack protector and percpu improvements Brian Gerst
2024-03-23 11:39     ` Uros Bizjak
2024-03-23 13:22       ` Brian Gerst
2024-03-23 16:16 94%     ` Linus Torvalds
2024-03-23 17:06 96%       ` Linus Torvalds
2024-03-22 19:12     [GIT PULL] SCSI postmerge updates for the 6.8+ merge window James Bottomley
2024-03-22 19:55 99% ` Linus Torvalds
2024-03-22 20:24       ` James Bottomley
2024-03-22 20:34 99%     ` Linus Torvalds
2024-03-22 23:38     [WIP 0/3] Memory model and atomic API in Rust Boqun Feng
2024-03-22 23:57     ` Kent Overstreet
2024-03-23  0:12 88%   ` Linus Torvalds
2024-03-23  0:21         ` Kent Overstreet
2024-03-23  0:36 84%       ` Linus Torvalds
2024-03-25 13:56             ` Philipp Stanner
2024-03-25 17:44 80%           ` Linus Torvalds
2024-03-25 18:59                 ` Kent Overstreet
2024-03-25 19:44 84%               ` Linus Torvalds
2024-03-26  0:05                     ` Dr. David Alan Gilbert
2024-03-26  3:49 76%                   ` Linus Torvalds
2024-03-27 16:16                         ` comex
2024-03-27 18:50                           ` Kent Overstreet
2024-03-27 19:07 89%                         ` Linus Torvalds
2024-03-27 19:41                               ` Kent Overstreet
2024-03-27 20:45 88%                             ` Linus Torvalds
2024-03-27 21:41                                   ` Kent Overstreet
2024-03-27 22:57 94%                                 ` Linus Torvalds
2024-04-08 16:02                 ` Matthew Wilcox
2024-04-08 17:01 75%               ` Linus Torvalds
2024-04-08 18:14                     ` Al Viro
2024-04-08 20:05 85%                   ` Linus Torvalds
2024-03-24 21:56 72% Linux 6.9-rc1 Linus Torvalds
2024-03-25 14:09     [PATCH 1/2] locking/pvqspinlock: Use try_cmpxchg_acquire() in trylock_clear_pending() Uros Bizjak
2024-04-11 13:33     ` [tip: locking/core] " tip-bot2 for Uros Bizjak
2024-04-11 16:31 99%   ` Linus Torvalds
2024-03-26 14:38     [GIT PULL] tpmdd changes for v6.9-rc2 Jarkko Sakkinen
2024-03-30 22:32 99% ` Linus Torvalds
2024-03-31  5:57       ` Jarkko Sakkinen
2024-03-31 17:01 99%     ` Linus Torvalds
     [not found]     <CADyTPEwt=ZNams+1bpMB1F9w_vUdPsGCt92DBQxxq_VtaLoTdw@mail.gmail.com>
2023-01-20  3:15     ` PROBLEM: Only one CPU active on Ultra 60 since ~4.8 (regression) Nick Bowler
2023-01-21 13:31       ` Linux kernel regression tracking (Thorsten Leemhuis)
2024-03-22  4:57         ` Nick Bowler
2024-03-28 19:36           ` Linux regression tracking (Thorsten Leemhuis)
2024-03-28 20:09 95%         ` Linus Torvalds
2024-03-31 18:24     [PATCH v2] HID: i2c-hid: Revert to await reset ACK before reading report descriptor Kenny Levinsen
2024-04-22 17:10     ` Linux regression tracking (Thorsten Leemhuis)
2024-04-23 14:59       ` Benjamin Tissoires
2024-04-24 16:56         ` regression fixes sitting in subsystem git trees for a week or longer (was: Re: [PATCH v2] HID: i2c-hid: Revert to await reset ACK before reading report descriptor) Thorsten Leemhuis
2024-04-24 18:53 99%       ` Linus Torvalds
2024-03-31 22:05 43% Linux 6.9-rc2 Linus Torvalds
2024-04-02  2:14     [linus:master] [x86/bugs] 4535e1a417: WARNING:at_arch/x86/kernel/alternative.c:#apply_returns kernel test robot
2024-04-03 12:23     ` [PATCH] x86/retpoline: Fix a missing return thunk warning (was: Re: [linus:master] [x86/bugs] 4535e1a417: WARNING:at_arch/x86/kernel/alternative.c:#apply_returns) Borislav Petkov
2024-04-03 16:45 99%   ` Linus Torvalds
2024-04-03 17:05         ` Borislav Petkov
2024-04-03 17:13 99%       ` Linus Torvalds
2024-04-02 14:11     [GIT PULL] security changes for v6.9-rc3 Roberto Sassu
2024-04-02 19:39 92% ` Linus Torvalds
2024-04-02 19:57 96%   ` Linus Torvalds
2024-04-02 21:00         ` Al Viro
2024-04-02 21:35 99%       ` Linus Torvalds
2024-04-02 20:53     user-space concurrent pipe buffer scheduler interactions Michael Clark
2024-04-03 16:56 99% ` Linus Torvalds
2024-04-03 20:39       ` Michael Clark
2024-04-03 20:57 99%     ` Linus Torvalds
2024-04-03  9:07     [RESEND][PATCH v3] security: Place security_path_post_mknod() where the original IMA call was Roberto Sassu
2024-04-03 16:59 99% ` Linus Torvalds
2024-04-04 22:53 81% More annoying code generation by clang Linus Torvalds
2024-04-06 10:56     ` Ingo Molnar
2024-04-06 12:30       ` Uros Bizjak
2024-04-06 15:39 99%     ` Linus Torvalds
2024-04-06 16:04 87%       ` Linus Torvalds
2024-04-08  8:49     ` Peter Zijlstra
2024-04-08 18:32 99%   ` Linus Torvalds
2024-04-08 19:42 77%     ` Linus Torvalds
2024-04-07 20:39 41% Linux 6.9-rc3 Linus Torvalds
2024-04-08 17:47     [PATCH RFC cmpxchg 0/8] Provide emulation for one- and two-byte cmpxchg() Paul E. McKenney
2024-04-08 17:49     ` [PATCH cmpxchg 08/14] parisc: add u16 support to cmpxchg() Paul E. McKenney
2024-04-08 20:10 99%   ` Linus Torvalds
2024-04-10 13:24     [GIT PULL] turbostat 2024.04.10 Len Brown
2024-04-10 20:18 99% ` Linus Torvalds
2024-04-11 18:20       ` Len Brown
2024-04-11 19:14 99%     ` Linus Torvalds
2024-04-10 16:38     [GIT PULL for v6.9-rc4] media fixes Mauro Carvalho Chehab
2024-04-10 20:53 98% ` Linus Torvalds
2024-04-11  0:10 76% [PATCH] vfs: relax linkat() AT_EMPTY_PATH - aka flink() - requirements Linus Torvalds
2024-04-11  0:20 99% ` Linus Torvalds
2024-04-11  2:39 96% ` Linus Torvalds
2024-04-11  9:04       ` Christian Brauner
2024-04-11 12:25         ` Christian Brauner
2024-04-11 16:21 99%       ` Linus Torvalds
2024-04-11 16:15 93%     ` Linus Torvalds
2024-04-11 16:44           ` Charles Mirabile
2024-04-11 17:29             ` Charles Mirabile
2024-04-11 17:35               ` Charles Mirabile
2024-04-11 18:13 99%             ` Linus Torvalds
2024-04-11 19:34 89%               ` Linus Torvalds
2024-04-12  7:45                     ` Christian Brauner
2024-04-12 15:36 99%                   ` Linus Torvalds
2024-04-11 20:08                   ` Charles Mirabile
2024-04-11 20:22 99%                 ` Linus Torvalds
2024-04-12  9:07     ` Christian Brauner
2024-04-12 17:43 95%   ` Linus Torvalds
2024-04-13  9:41         ` Christian Brauner
2024-04-13 15:16           ` Christian Brauner
2024-04-13 17:07 99%         ` Linus Torvalds
2024-04-12 14:32     [GIT PULL] tracing: Fixes for v6.9 Steven Rostedt
2024-04-12 16:07 99% ` Linus Torvalds
2024-04-12 16:15       ` Steven Rostedt
2024-04-12 16:20         ` Randy Dunlap
2024-04-12 16:21 99%       ` Linus Torvalds
2024-04-12 16:20 99%     ` Linus Torvalds
2024-04-12 18:10     [PATCH 0/3] x86/bugs: BHI fixes / improvements - round 2 Josh Poimboeuf
2024-04-12 18:10     ` [PATCH v2 1/3] x86/bugs: Only harden syscalls when needed Josh Poimboeuf
2024-04-15  7:32       ` Nikolay Borisov
2024-04-15 15:16 99%     ` Linus Torvalds
2024-04-15 15:27           ` Nikolay Borisov
2024-04-15 15:47 98%         ` Linus Torvalds
2024-04-14 20:48 43% Linux 6.9-rc4 Linus Torvalds
2024-04-15 16:35     [PATCH v10 0/5] Introduce mseal jeffxu
2024-04-15 16:35     ` [PATCH v10 1/5] mseal: Wire up mseal syscall jeffxu
2024-04-15 18:12       ` Muhammad Usama Anjum
2024-04-15 18:21 96%     ` Linus Torvalds
2024-04-17 23:45     [GIT PULL] Btrfs fixes for 6.9-rc5 David Sterba
2024-04-18  0:14 99% ` Linus Torvalds
2024-04-20 11:12     [PATCH v2] tty: n_gsm: restrict tty devices to attach Tetsuo Handa
2024-04-20 17:34 97% ` Linus Torvalds
2024-04-20 18:02 99%   ` Linus Torvalds
2024-04-20 18:05 99%     ` Linus Torvalds
2024-04-21 13:28           ` Tetsuo Handa
2024-04-21 16:04 95%         ` Linus Torvalds
2024-04-21 17:18 87%           ` Linus Torvalds
2024-04-23 15:26                 ` Tetsuo Handa
2024-04-23 16:37 99%               ` Linus Torvalds
2024-04-21 19:53 47% Linux 6.9-rc5 Linus Torvalds
2024-04-23 16:33 89% [PATCH] tty: add the option to have a tty reject a new ldisc Linus Torvalds
2024-04-25 17:45     [GIT PULL] ACPI fixes for v6.9-rc6 Rafael J. Wysocki
2024-04-25 18:58 95% ` Linus Torvalds
2024-04-25 19:01 99%   ` Linus Torvalds
2024-04-25 19:18 96%   ` Linus Torvalds
2024-04-27 20:00     [syzbot] [bpf?] [trace?] possible deadlock in force_sig_info_to_task syzbot
2024-04-27 23:13     ` Hillf Danton
2024-04-28 20:01 91%   ` Linus Torvalds
2024-04-28 20:22 96%     ` Linus Torvalds
2024-04-28 23:23         ` Hillf Danton
2024-04-29  0:50 99%       ` Linus Torvalds
2024-04-29  1:33 75%         ` Linus Torvalds
2024-04-29  8:00               ` [PATCH] x86/mm: Remove broken vsyscall emulation code from the page fault code Ingo Molnar
2024-04-29 15:51 99%             ` Linus Torvalds
2024-04-29 18:47 95%               ` Linus Torvalds
2024-04-29 19:07 98%                 ` Linus Torvalds
2024-04-29 23:29                       ` Andy Lutomirski
2024-04-30  0:05 99%                     ` Linus Torvalds
2024-04-30  6:16 51%           ` [tip: x86/urgent] " tip-bot2 for Linus Torvalds
2024-05-01  7:50 50%           ` tip-bot2 for Linus Torvalds
2024-04-28  8:24     [GIT PULL] scheduler fixes Ingo Molnar
2024-04-28  8:42     ` Ingo Molnar
2024-04-28 19:13 99%   ` Linus Torvalds
2024-04-28 20:58 43% Linux 6.9-rc6 Linus Torvalds
2024-04-29 14:47     [PATCH] bounds: Use the right number of bits for power-of-two CONFIG_NR_CPUS Matthew Wilcox (Oracle)
2024-04-29 15:32 99% ` Linus Torvalds
