* Re: [PATCH v3] tty: tty_io: remove hung_up_tty_fops
2024-05-01 21:06 97% ` Linus Torvalds
@ 2024-05-01 21:20 94% ` Linus Torvalds
0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-05-01 21:20 UTC (permalink / raw)
To: Marco Elver
Cc: paulmck, Tetsuo Handa, Greg Kroah-Hartman, Dmitry Vyukov, syzbot,
linux-kernel, syzkaller-bugs, Nathan Chancellor, Arnd Bergmann,
Al Viro, Jiri Slaby
On Wed, 1 May 2024 at 14:06, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> So it would be something like
>
> const struct file_operations * __data_racy f_op;
>
> and only the load of f_op would be volatile - not the pointer itself.
Note that in reality, we'd actually prefer the compiler to treat that
"__data_racy" as volatile in the sense of "don't reload this value",
but at the same time be the opposite of volatile in the sense that
using one read multiple times is actually a good idea.
IOW, the problem is rematerialization ("read the value more than once
when there is just one access in the source"), not strictly a "read
the value separately each time it is accessed".
We've actually had that before: it's not that we want each access to
force a read from memory, we want to avoid a TOCTOU race.
Many of our "READ_ONCE()" uses are of that kind, and using "volatile"
sadly generates horrible code, but is the only way to tell the
compiler to not ever rematerialize the value by loading it _twice_.
I'd love to see an extension where "const volatile" basically means
exactly that: the volatile tells the compiler that it can't
rematerialize by doing the load multiple times, but the "const" would
say that if the compiler sees two or more accesses, it can still CSE
them.
Oh well. Thankfully it's not a hugely common code generation problem.
It comes up every once in a while, and I think the last time this
worry came up, we had gcc people tell us that they don't
actually ever rematerialize loads from memory.
Of course, that was an implementation issue, not a guarantee.
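To make the rematerialization worry concrete, here is a minimal user-space sketch. It is not kernel code: MY_READ_ONCE() is a hypothetical stand-in modeled on the volatile-cast idea behind READ_ONCE(), it relies on the GCC/Clang __typeof__ extension, and the table and shared_idx names are made up for illustration. The point is that the bounds check and the array access both use one snapshot of the shared value, so a concurrent writer cannot slip a different value in between the check and the use.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for READ_ONCE(): the volatile cast makes the
 * compiler emit exactly one load for this access, so it cannot
 * rematerialize the value by loading it a second time.
 */
#define MY_READ_ONCE(x) (*(const volatile __typeof__(x) *)&(x))

#define TABLE_SIZE 8
static int table[TABLE_SIZE];

/* Hypothetical shared index that another thread could change at any time. */
static size_t shared_idx;

/* Snapshot the index once, then check and use that same local copy. */
static int lookup_checked(void)
{
	size_t idx = MY_READ_ONCE(shared_idx);	/* the single load */

	if (idx >= TABLE_SIZE)			/* time of check */
		return -1;
	return table[idx];			/* time of use, same idx */
}

/* Single-threaded walk-through of both outcomes. */
void toctou_demo(void)
{
	table[3] = 42;
	shared_idx = 3;
	assert(lookup_checked() == 42);

	shared_idx = TABLE_SIZE;		/* out of range */
	assert(lookup_checked() == -1);
}
```

Once idx is a local, the compiler is free to reuse it as often as it likes, which is the "one read, used multiple times" behaviour described above. What the volatile cast costs is that two separate MY_READ_ONCE() calls can never be merged.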
Linus
^ permalink raw reply [relevance 94%]
* Re: [PATCH v3] tty: tty_io: remove hung_up_tty_fops
@ 2024-05-01 21:06 97% ` Linus Torvalds
2024-05-01 21:20 94% ` Linus Torvalds
0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-05-01 21:06 UTC (permalink / raw)
To: Marco Elver
Cc: paulmck, Tetsuo Handa, Greg Kroah-Hartman, Dmitry Vyukov, syzbot,
linux-kernel, syzkaller-bugs, Nathan Chancellor, Arnd Bergmann,
Al Viro, Jiri Slaby
On Wed, 1 May 2024 at 13:15, Marco Elver <elver@google.com> wrote:
>
> This is relatively trivial:
>
> #ifdef __SANITIZE_THREAD__
> #define __data_racy volatile
> #endif
I really wouldn't want to make a code generation difference, but I
guess when the sanitizer is on, the compiler generating crap code
isn't a huge deal.
> In some cases it might cause the compiler to complain if converting a
> volatile pointer to a non-volatile pointer
No. Note that it's not the *pointer* that is volatile, it's the
structure member.
So it would be something like
const struct file_operations * __data_racy f_op;
and only the load of f_op would be volatile - not the pointer itself.
Of course, if somebody then does "&file->f_op" to get a pointer to a
pointer, *that* would now be a volatile pointer, but I don't see
people doing that.
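A small user-space sketch of where the qualifier lands, under the assumption that the annotation expands to plain volatile. The macro is spelled my_data_racy here to stay out of the reserved-identifier namespace, and the struct layouts are reduced to the bare minimum; none of this is the real kernel definition.

```c
#include <assert.h>

/* Assumed expansion for a sanitizer build; empty otherwise (hypothetical). */
#define my_data_racy volatile

struct file_operations {
	int (*open)(void);
};

struct file {
	/*
	 * The qualifier sits after the '*', so it applies to the member
	 * f_op itself, not to the struct file_operations it points to.
	 */
	const struct file_operations *my_data_racy f_op;
};

static int dummy_open(void)
{
	return 7;
}

static const struct file_operations dummy_fops = { .open = dummy_open };

static int call_open(struct file *file)
{
	/*
	 * Loading the member is a volatile access, but the loaded value is
	 * an ordinary pointer to const data, so no qualifier is discarded.
	 */
	const struct file_operations *fops = file->f_op;

	return fops->open();
}

/* Only the address of the member is a pointer to a volatile object. */
static const struct file_operations *volatile *f_op_slot(struct file *file)
{
	return &file->f_op;
}

void member_qualifier_demo(void)
{
	struct file f = { .f_op = &dummy_fops };

	assert(call_open(&f) == 7);
	assert(*f_op_slot(&f) == &dummy_fops);
}
```

Assigning the result of f_op_slot() to a plain "const struct file_operations **" would drop the volatile qualifier and draw a compiler diagnostic, which is the one case mentioned above.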
So I guess this might be a way forward. Anybody want to verify?
Now, the "hung_up_tty_fops" *do* need to be expanded to have hung up
ops for every op that is non-NULL in the normal tty ops. That was a
real bug. We'd also want to add a big comment to the tty fops to make
sure anybody who adds a new tty f_op member populates the hung up
version too.
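The "every op that is non-NULL in the normal ops needs a hung up counterpart" rule can be sketched as a mechanical check. This is a toy model only: struct demo_fops, its four members, and the table names are hypothetical and far smaller than the real struct file_operations. An X-macro lists the members once, so the struct definition and the check cannot drift apart.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*op_fn)(void);

/* Hypothetical, heavily reduced list of operations. */
#define DEMO_OPS(X) X(read) X(write) X(poll) X(splice_read)

struct demo_fops {
#define X(name) op_fn name;
	DEMO_OPS(X)
#undef X
};

static int normal_op(void)  { return 0; }
static int hung_up_op(void) { return -5; }	/* stands in for an -EIO stub */

static const struct demo_fops normal_fops = {
	.read = normal_op, .write = normal_op,
	.poll = normal_op, .splice_read = normal_op,
};

/* Complete: every op of normal_fops has a hung up version. */
static const struct demo_fops hung_up_complete = {
	.read = hung_up_op, .write = hung_up_op,
	.poll = hung_up_op, .splice_read = hung_up_op,
};

/* Incomplete: splice_read was added to normal_fops but forgotten here. */
static const struct demo_fops hung_up_incomplete = {
	.read = hung_up_op, .write = hung_up_op, .poll = hung_up_op,
};

/* Count ops that are set in 'normal' but missing in 'hung_up'. */
static size_t count_missing(const struct demo_fops *normal,
			    const struct demo_fops *hung_up)
{
	size_t missing = 0;

#define X(name) if (normal->name && !hung_up->name) missing++;
	DEMO_OPS(X)
#undef X
	return missing;
}

void hung_up_demo(void)
{
	assert(count_missing(&normal_fops, &hung_up_complete) == 0);
	assert(count_missing(&normal_fops, &hung_up_incomplete) == 1);
}
```

A missing entry matters because a caller that sees the hung up table and finds NULL takes the "operation not supported" path instead of the hung up stub, which is a different behaviour from the one intended.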
Linus
^ permalink raw reply [relevance 97%]
* Re: [PATCH v3] tty: tty_io: remove hung_up_tty_fops
@ 2024-05-01 18:56 99% ` Linus Torvalds
0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-05-01 18:56 UTC (permalink / raw)
To: paulmck
Cc: Marco Elver, Tetsuo Handa, Greg Kroah-Hartman, Dmitry Vyukov,
syzbot, linux-kernel, syzkaller-bugs, Nathan Chancellor,
Arnd Bergmann, Al Viro, Jiri Slaby
On Wed, 1 May 2024 at 11:46, Paul E. McKenney <paulmck@kernel.org> wrote:
>
> In short, I for one do greatly value KCSAN's help. Along with that of
> a great many other tools, none of which are perfect, but all of which
> are helpful.
It's not that I don't value what KCSAN does, but I really think this
is a KCSAN issue.
I absolutely *detest* these crazy "randomly add data race annotations".
Could we instead annotate particular structure fields? I don't want to
mark things actually "volatile", because that then causes the compiler
to generate absolutely horrendous code. But some KCSAN equivalent of
"this field has data races, and we don't care" kind of annotation
would be lovely..
Linus
^ permalink raw reply [relevance 99%]
* [tip: x86/urgent] x86/mm: Remove broken vsyscall emulation code from the page fault code
2024-04-29 1:33 75% ` Linus Torvalds
2024-04-30 6:16 51% ` [tip: x86/urgent] " tip-bot2 for Linus Torvalds
@ 2024-05-01 7:50 50% ` tip-bot2 for Linus Torvalds
2 siblings, 0 replies; 200+ results
From: tip-bot2 for Linus Torvalds @ 2024-05-01 7:50 UTC (permalink / raw)
To: linux-tip-commits
Cc: syzbot+83e7f982ca045ab4405c, Linus Torvalds, Ingo Molnar,
Jiri Olsa, Andy Lutomirski, x86, linux-kernel
The following commit has been merged into the x86/urgent branch of tip:
Commit-ID: 02b670c1f88e78f42a6c5aee155c7b26960ca054
Gitweb: https://git.kernel.org/tip/02b670c1f88e78f42a6c5aee155c7b26960ca054
Author: Linus Torvalds <torvalds@linux-foundation.org>
AuthorDate: Mon, 29 Apr 2024 10:00:51 +02:00
Committer: Ingo Molnar <mingo@kernel.org>
CommitterDate: Wed, 01 May 2024 09:41:43 +02:00
x86/mm: Remove broken vsyscall emulation code from the page fault code
The syzbot-reported stack trace from hell in this discussion thread
actually has three nested page faults:
https://lore.kernel.org/r/000000000000d5f4fc0616e816d4@google.com
... and I think that's actually the important thing here:
- the first page fault is from user space, and triggers the vsyscall
emulation.
- the second page fault is from __do_sys_gettimeofday(), and that should
just have caused the exception that then sets the return value to
-EFAULT
- the third nested page fault is due to _raw_spin_unlock_irqrestore() ->
preempt_schedule() -> trace_sched_switch(), which then causes a BPF
trace program to run, which does that bpf_probe_read_compat(), which
causes that page fault under pagefault_disable().
It's quite the nasty backtrace, and there's a lot going on.
The problem is literally the vsyscall emulation, which sets
current->thread.sig_on_uaccess_err = 1;
and that causes the fixup_exception() code to send the signal *despite* the
exception being caught.
And I think that is in fact completely bogus. It's completely bogus
exactly because it sends that signal even when it *shouldn't* be sent -
like for the BPF user mode trace gathering.
In other words, I think the whole "sig_on_uaccess_err" thing is entirely
broken, because it makes any nested page-faults do all the wrong things.
Now, arguably, I don't think anybody should enable vsyscall emulation any
more, but this test case clearly does.
I think we should just make the "send SIGSEGV" be something that the
vsyscall emulation does on its own, not this broken per-thread state for
something that isn't actually per thread.
The x86 page fault code actually tried to deal with the "incorrect nesting"
by having that:
if (in_interrupt())
return;
which ignores the sig_on_uaccess_err case when it happens in interrupts,
but as shown by this example, these nested page faults do not need to be
about interrupts at all.
IOW, I think the only right thing is to remove that horrendously broken
code.
The attached patch looks like the ObviouslyCorrect(tm) thing to do.
NOTE! This broken code goes back to this commit in 2011:
4fc3490114bb ("x86-64: Set siginfo and context on vsyscall emulation faults")
... and back then the reason was to get all the siginfo details right.
Honestly, I do not for a moment believe that it's worth getting the siginfo
details right here, but part of the commit says:
This fixes issues with UML when vsyscall=emulate.
... and so my patch to remove this garbage will probably break UML in this
situation.
I do not believe that anybody should be running with vsyscall=emulate in
2024 in the first place, much less if you are doing things like UML. But
let's see if somebody screams.
Reported-and-tested-by: syzbot+83e7f982ca045ab4405c@syzkaller.appspotmail.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Andy Lutomirski <luto@kernel.org>
Link: https://lore.kernel.org/r/CAHk-=wh9D6f7HUkDgZHKmDCHUQmp+Co89GP+b8+z+G56BKeyNg@mail.gmail.com
---
arch/x86/entry/vsyscall/vsyscall_64.c | 28 +---------------------
arch/x86/include/asm/processor.h | 1 +-
arch/x86/mm/fault.c | 33 +--------------------------
3 files changed, 3 insertions(+), 59 deletions(-)
diff --git a/arch/x86/entry/vsyscall/vsyscall_64.c b/arch/x86/entry/vsyscall/vsyscall_64.c
index a3c0df1..2fb7d53 100644
--- a/arch/x86/entry/vsyscall/vsyscall_64.c
+++ b/arch/x86/entry/vsyscall/vsyscall_64.c
@@ -98,11 +98,6 @@ static int addr_to_vsyscall_nr(unsigned long addr)
static bool write_ok_or_segv(unsigned long ptr, size_t size)
{
- /*
- * XXX: if access_ok, get_user, and put_user handled
- * sig_on_uaccess_err, this could go away.
- */
-
if (!access_ok((void __user *)ptr, size)) {
struct thread_struct *thread = &current->thread;
@@ -120,10 +115,8 @@ static bool write_ok_or_segv(unsigned long ptr, size_t size)
bool emulate_vsyscall(unsigned long error_code,
struct pt_regs *regs, unsigned long address)
{
- struct task_struct *tsk;
unsigned long caller;
int vsyscall_nr, syscall_nr, tmp;
- int prev_sig_on_uaccess_err;
long ret;
unsigned long orig_dx;
@@ -172,8 +165,6 @@ bool emulate_vsyscall(unsigned long error_code,
goto sigsegv;
}
- tsk = current;
-
/*
* Check for access_ok violations and find the syscall nr.
*
@@ -234,12 +225,8 @@ bool emulate_vsyscall(unsigned long error_code,
goto do_ret; /* skip requested */
/*
- * With a real vsyscall, page faults cause SIGSEGV. We want to
- * preserve that behavior to make writing exploits harder.
+ * With a real vsyscall, page faults cause SIGSEGV.
*/
- prev_sig_on_uaccess_err = current->thread.sig_on_uaccess_err;
- current->thread.sig_on_uaccess_err = 1;
-
ret = -EFAULT;
switch (vsyscall_nr) {
case 0:
@@ -262,23 +249,12 @@ bool emulate_vsyscall(unsigned long error_code,
break;
}
- current->thread.sig_on_uaccess_err = prev_sig_on_uaccess_err;
-
check_fault:
if (ret == -EFAULT) {
/* Bad news -- userspace fed a bad pointer to a vsyscall. */
warn_bad_vsyscall(KERN_INFO, regs,
"vsyscall fault (exploit attempt?)");
-
- /*
- * If we failed to generate a signal for any reason,
- * generate one here. (This should be impossible.)
- */
- if (WARN_ON_ONCE(!sigismember(&tsk->pending.signal, SIGBUS) &&
- !sigismember(&tsk->pending.signal, SIGSEGV)))
- goto sigsegv;
-
- return true; /* Don't emulate the ret. */
+ goto sigsegv;
}
regs->ax = ret;
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 811548f..78e51b0 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -472,7 +472,6 @@ struct thread_struct {
unsigned long iopl_emul;
unsigned int iopl_warn:1;
- unsigned int sig_on_uaccess_err:1;
/*
* Protection Keys Register for Userspace. Loaded immediately on
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 622d12e..bba4e02 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -723,39 +723,8 @@ kernelmode_fixup_or_oops(struct pt_regs *regs, unsigned long error_code,
WARN_ON_ONCE(user_mode(regs));
/* Are we prepared to handle this kernel fault? */
- if (fixup_exception(regs, X86_TRAP_PF, error_code, address)) {
- /*
- * Any interrupt that takes a fault gets the fixup. This makes
- * the below recursive fault logic only apply to a faults from
- * task context.
- */
- if (in_interrupt())
- return;
-
- /*
- * Per the above we're !in_interrupt(), aka. task context.
- *
- * In this case we need to make sure we're not recursively
- * faulting through the emulate_vsyscall() logic.
- */
- if (current->thread.sig_on_uaccess_err && signal) {
- sanitize_error_code(address, &error_code);
-
- set_signal_archinfo(address, error_code);
-
- if (si_code == SEGV_PKUERR) {
- force_sig_pkuerr((void __user *)address, pkey);
- } else {
- /* XXX: hwpoison faults will set the wrong code. */
- force_sig_fault(signal, si_code, (void __user *)address);
- }
- }
-
- /*
- * Barring that, we can do the fixup and be happy.
- */
+ if (fixup_exception(regs, X86_TRAP_PF, error_code, address))
return;
- }
/*
* AMD erratum #91 manifests as a spurious page fault on a PREFETCH
^ permalink raw reply related [relevance 50%]
* [tip: x86/urgent] x86/mm: Remove broken vsyscall emulation code from the page fault code
2024-04-29 1:33 75% ` Linus Torvalds
@ 2024-04-30 6:16 51% ` tip-bot2 for Linus Torvalds
2024-05-01 7:50 50% ` tip-bot2 for Linus Torvalds
2 siblings, 0 replies; 200+ results
From: tip-bot2 for Linus Torvalds @ 2024-04-30 6:16 UTC (permalink / raw)
To: linux-tip-commits
Cc: syzbot+83e7f982ca045ab4405c, Linus Torvalds, Ingo Molnar,
Jiri Olsa, Andy Lutomirski, x86, linux-kernel
The following commit has been merged into the x86/urgent branch of tip:
Commit-ID: c9e1dc9825319392b44d3c22493dc543075933b9
Gitweb: https://git.kernel.org/tip/c9e1dc9825319392b44d3c22493dc543075933b9
Author: Linus Torvalds <torvalds@linux-foundation.org>
AuthorDate: Mon, 29 Apr 2024 10:00:51 +02:00
Committer: Ingo Molnar <mingo@kernel.org>
CommitterDate: Tue, 30 Apr 2024 08:08:30 +02:00
x86/mm: Remove broken vsyscall emulation code from the page fault code
The syzbot-reported stack trace from hell in this discussion thread
actually has three nested page faults:
https://lore.kernel.org/r/000000000000d5f4fc0616e816d4@google.com
... and I think that's actually the important thing here:
- the first page fault is from user space, and triggers the vsyscall
emulation.
- the second page fault is from __do_sys_gettimeofday(), and that should
just have caused the exception that then sets the return value to
-EFAULT
- the third nested page fault is due to _raw_spin_unlock_irqrestore() ->
preempt_schedule() -> trace_sched_switch(), which then causes a BPF
trace program to run, which does that bpf_probe_read_compat(), which
causes that page fault under pagefault_disable().
It's quite the nasty backtrace, and there's a lot going on.
The problem is literally the vsyscall emulation, which sets
current->thread.sig_on_uaccess_err = 1;
and that causes the fixup_exception() code to send the signal *despite* the
exception being caught.
And I think that is in fact completely bogus. It's completely bogus
exactly because it sends that signal even when it *shouldn't* be sent -
like for the BPF user mode trace gathering.
In other words, I think the whole "sig_on_uaccess_err" thing is entirely
broken, because it makes any nested page-faults do all the wrong things.
Now, arguably, I don't think anybody should enable vsyscall emulation any
more, but this test case clearly does.
I think we should just make the "send SIGSEGV" be something that the
vsyscall emulation does on its own, not this broken per-thread state for
something that isn't actually per thread.
The x86 page fault code actually tried to deal with the "incorrect nesting"
by having that:
if (in_interrupt())
return;
which ignores the sig_on_uaccess_err case when it happens in interrupts,
but as shown by this example, these nested page faults do not need to be
about interrupts at all.
IOW, I think the only right thing is to remove that horrendously broken
code.
The attached patch looks like the ObviouslyCorrect(tm) thing to do.
NOTE! This broken code goes back to this commit in 2011:
4fc3490114bb ("x86-64: Set siginfo and context on vsyscall emulation faults")
... and back then the reason was to get all the siginfo details right.
Honestly, I do not for a moment believe that it's worth getting the siginfo
details right here, but part of the commit says:
This fixes issues with UML when vsyscall=emulate.
... and so my patch to remove this garbage will probably break UML in this
situation.
I do not believe that anybody should be running with vsyscall=emulate in
2024 in the first place, much less if you are doing things like UML. But
let's see if somebody screams.
Reported-and-tested-by: syzbot+83e7f982ca045ab4405c@syzkaller.appspotmail.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Andy Lutomirski <luto@kernel.org>
Link: https://lore.kernel.org/r/CAHk-=wh9D6f7HUkDgZHKmDCHUQmp+Co89GP+b8+z+G56BKeyNg@mail.gmail.com
---
arch/x86/entry/vsyscall/vsyscall_64.c | 25 +-------------------
arch/x86/include/asm/processor.h | 1 +-
arch/x86/mm/fault.c | 33 +--------------------------
3 files changed, 3 insertions(+), 56 deletions(-)
diff --git a/arch/x86/entry/vsyscall/vsyscall_64.c b/arch/x86/entry/vsyscall/vsyscall_64.c
index a3c0df1..3b0f61b 100644
--- a/arch/x86/entry/vsyscall/vsyscall_64.c
+++ b/arch/x86/entry/vsyscall/vsyscall_64.c
@@ -98,11 +98,6 @@ static int addr_to_vsyscall_nr(unsigned long addr)
static bool write_ok_or_segv(unsigned long ptr, size_t size)
{
- /*
- * XXX: if access_ok, get_user, and put_user handled
- * sig_on_uaccess_err, this could go away.
- */
-
if (!access_ok((void __user *)ptr, size)) {
struct thread_struct *thread = &current->thread;
@@ -123,7 +118,6 @@ bool emulate_vsyscall(unsigned long error_code,
struct task_struct *tsk;
unsigned long caller;
int vsyscall_nr, syscall_nr, tmp;
- int prev_sig_on_uaccess_err;
long ret;
unsigned long orig_dx;
@@ -234,12 +228,8 @@ bool emulate_vsyscall(unsigned long error_code,
goto do_ret; /* skip requested */
/*
- * With a real vsyscall, page faults cause SIGSEGV. We want to
- * preserve that behavior to make writing exploits harder.
+ * With a real vsyscall, page faults cause SIGSEGV.
*/
- prev_sig_on_uaccess_err = current->thread.sig_on_uaccess_err;
- current->thread.sig_on_uaccess_err = 1;
-
ret = -EFAULT;
switch (vsyscall_nr) {
case 0:
@@ -262,23 +252,12 @@ bool emulate_vsyscall(unsigned long error_code,
break;
}
- current->thread.sig_on_uaccess_err = prev_sig_on_uaccess_err;
-
check_fault:
if (ret == -EFAULT) {
/* Bad news -- userspace fed a bad pointer to a vsyscall. */
warn_bad_vsyscall(KERN_INFO, regs,
"vsyscall fault (exploit attempt?)");
-
- /*
- * If we failed to generate a signal for any reason,
- * generate one here. (This should be impossible.)
- */
- if (WARN_ON_ONCE(!sigismember(&tsk->pending.signal, SIGBUS) &&
- !sigismember(&tsk->pending.signal, SIGSEGV)))
- goto sigsegv;
-
- return true; /* Don't emulate the ret. */
+ goto sigsegv;
}
regs->ax = ret;
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 811548f..78e51b0 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -472,7 +472,6 @@ struct thread_struct {
unsigned long iopl_emul;
unsigned int iopl_warn:1;
- unsigned int sig_on_uaccess_err:1;
/*
* Protection Keys Register for Userspace. Loaded immediately on
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 622d12e..bba4e02 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -723,39 +723,8 @@ kernelmode_fixup_or_oops(struct pt_regs *regs, unsigned long error_code,
WARN_ON_ONCE(user_mode(regs));
/* Are we prepared to handle this kernel fault? */
- if (fixup_exception(regs, X86_TRAP_PF, error_code, address)) {
- /*
- * Any interrupt that takes a fault gets the fixup. This makes
- * the below recursive fault logic only apply to a faults from
- * task context.
- */
- if (in_interrupt())
- return;
-
- /*
- * Per the above we're !in_interrupt(), aka. task context.
- *
- * In this case we need to make sure we're not recursively
- * faulting through the emulate_vsyscall() logic.
- */
- if (current->thread.sig_on_uaccess_err && signal) {
- sanitize_error_code(address, &error_code);
-
- set_signal_archinfo(address, error_code);
-
- if (si_code == SEGV_PKUERR) {
- force_sig_pkuerr((void __user *)address, pkey);
- } else {
- /* XXX: hwpoison faults will set the wrong code. */
- force_sig_fault(signal, si_code, (void __user *)address);
- }
- }
-
- /*
- * Barring that, we can do the fixup and be happy.
- */
+ if (fixup_exception(regs, X86_TRAP_PF, error_code, address))
return;
- }
/*
* AMD erratum #91 manifests as a spurious page fault on a PREFETCH
^ permalink raw reply related [relevance 51%]
* Re: [PATCH] x86/mm: Remove broken vsyscall emulation code from the page fault code
@ 2024-04-30 0:05 99% ` Linus Torvalds
0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-04-30 0:05 UTC (permalink / raw)
To: Andy Lutomirski
Cc: Ingo Molnar, Hillf Danton, Peter Anvin, Adrian Bunk, syzbot,
Tetsuo Handa, andrii, bpf, linux-kernel, syzkaller-bugs
On Mon, 29 Apr 2024 at 16:30, Andy Lutomirski <luto@amacapital.net> wrote:
>
> What strange page table handling do we do for XONLY?
Ahh, I misread set_vsyscall_pgtable_user_bits(). It's used for EMULATE
not for XONLY.
And the code in pti_setup_vsyscall() is just wrong, and does it for all cases.
> So I think we should remove EMULATE before removing XONLY.
Ok, looking at that again, I don't disagree. I misread that XONLY as
mapping it executable, but it is actually just mapping it readable.
Yes, let's remove EMULATE, and keep XONLY.
Linus
^ permalink raw reply [relevance 99%]
* Re: [PATCH] x86/mm: Remove broken vsyscall emulation code from the page fault code
2024-04-29 18:47 95% ` Linus Torvalds
@ 2024-04-29 19:07 98% ` Linus Torvalds
0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-04-29 19:07 UTC (permalink / raw)
To: Ingo Molnar
Cc: Hillf Danton, Andy Lutomirski, Peter Anvin, Adrian Bunk, syzbot,
Tetsuo Handa, andrii, bpf, linux-kernel, syzkaller-bugs
On Mon, 29 Apr 2024 at 11:47, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> In particular, I think the page fault emulation code should be moved
> from do_user_addr_fault() to do_kern_addr_fault(), and the horrible
> hack that is fault_in_kernel_space() should be removed (it is what now
> makes a vsyscall page fault be treated as a user address, and the only
> _reason_ for that is that we do the vsyscall handling in the wrong
> place).
Final note: we should also remove the XONLY option entirely, and
remove all the strange page table handling we currently do for it.
It won't work anyway on future CPUs with LASS, and we *have* to
emulate things (and not in the page fault path, I think LASS will
cause a GP fault).
I think the LASS patches ended up just disabling LASS if people wanted
vsyscall, which is probably the worst case.
Again, this is more of a "I think we have more work to do", and should
all happen after that sig_on_uaccess_err stuff is gone.
I guess that patch to rip out sig_on_uaccess_err needs to go into 6.9
and even be marked for stable, since it most definitely breaks some
stuff currently. Even if that "some stuff" is pretty esoteric (ie
"vsyscall=emulate" together with tracing).
Linus
^ permalink raw reply [relevance 98%]
* Re: [PATCH] x86/mm: Remove broken vsyscall emulation code from the page fault code
2024-04-29 15:51 99% ` Linus Torvalds
@ 2024-04-29 18:47 95% ` Linus Torvalds
2024-04-29 19:07 98% ` Linus Torvalds
0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-04-29 18:47 UTC (permalink / raw)
To: Ingo Molnar
Cc: Hillf Danton, Andy Lutomirski, Peter Anvin, Adrian Bunk, syzbot,
Tetsuo Handa, andrii, bpf, linux-kernel, syzkaller-bugs
On Mon, 29 Apr 2024 at 08:51, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> Well, Hillf had it go through the syzbot testing, and Jiri seems to
> have tested it on his setup too, so it looks like it's all good, and
> you can change the "Not-Yet-Signed-off-by" to be a proper sign-off
> from me.
Side note: having looked more at this, I suspect we have room for
further cleanups in this area.
In particular, I think the page fault emulation code should be moved
from do_user_addr_fault() to do_kern_addr_fault(), and the horrible
hack that is fault_in_kernel_space() should be removed (it is what now
makes a vsyscall page fault be treated as a user address, and the only
_reason_ for that is that we do the vsyscall handling in the wrong
place).
I also think that the vsyscall emulation code should just be cleaned
up - instead of looking up the system call number and then calling the
__x64_xyz() system call stub, I think we should just write out the
code in-place. That would get the SIGSEGV cases right too, and I think
it would actually clean up the code. We already do almost everything
but the (trivial) low-level ops anyway.
But I think my patch to remove the 'sig_on_uaccess_err' should just go
in first, since it fixes a real and present issue. And then if
somebody has the energy - or if it turns out that we actually need to
get the SIGSEGV siginfo details right - we can do the other cleanups.
They are mostly unrelated, but the current sig_on_uaccess_err code
just makes everything more complicated and needs to go.
Linus
^ permalink raw reply [relevance 95%]
* Re: [PATCH] x86/mm: Remove broken vsyscall emulation code from the page fault code
@ 2024-04-29 15:51 99% ` Linus Torvalds
2024-04-29 18:47 95% ` Linus Torvalds
0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-04-29 15:51 UTC (permalink / raw)
To: Ingo Molnar
Cc: Hillf Danton, Andy Lutomirski, Peter Anvin, Adrian Bunk, syzbot,
Tetsuo Handa, andrii, bpf, linux-kernel, syzkaller-bugs
On Mon, 29 Apr 2024 at 01:00, Ingo Molnar <mingo@kernel.org> wrote:
>
> I did some Simple Testing™, and nothing seemed to break in any way visible
> to me, and the diffstat is lovely:
>
> 3 files changed, 3 insertions(+), 56 deletions(-)
>
> Might stick this into tip:x86/mm and see what happens?
Well, Hillf had it go through the syzbot testing, and Jiri seems to
have tested it on his setup too, so it looks like it's all good, and
you can change the "Not-Yet-Signed-off-by" to be a proper sign-off
from me.
It would be good to have some UML testing done, but at the same time I
do think that anybody running UML on modern kernels should be running
a modern user-mode setup too, so while the exact SIGSEGV details may
have been an issue in 2011, I don't think it's reasonable to think
that it's an issue in 2024.
Linus
^ permalink raw reply [relevance 99%]
* Re: [PATCH v3] tty: tty_io: remove hung_up_tty_fops
@ 2024-04-29 15:38 99% ` Linus Torvalds
0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-04-29 15:38 UTC (permalink / raw)
To: Marco Elver
Cc: Tetsuo Handa, Greg Kroah-Hartman, Dmitry Vyukov, syzbot,
linux-kernel, syzkaller-bugs, Nathan Chancellor, Arnd Bergmann,
Al Viro, Jiri Slaby, Paul E. McKenney
On Mon, 29 Apr 2024 at 06:56, Marco Elver <elver@google.com> wrote:
>
> A WRITE_ONCE() / READ_ONCE() pair would do it here. What should we use instead?
Why would we annotate an "any other code generation is insane" issue at all?
When we do chained pointer loads in
file->f_op->op()
and we say "I don't care what value I get for the middle one", I don't
see the value in annotating that at all.
There is no compiler that will sanely and validly do a pointer chain
load by *anything* but a load. And it doesn't matter to us if it then
spills and reloads, it will *STILL* be a load.
We're not talking about "extract different bits in separate
operations". We're talking about following one pointer that can point
to two separate static values.
Reality matters. A *lot* more than some "C standard" that we already
have ignored for decades because it's not strong enough.
Linus
^ permalink raw reply [relevance 99%]
* Re: [PATCH] bounds: Use the right number of bits for power-of-two CONFIG_NR_CPUS
@ 2024-04-29 15:32 99% ` Linus Torvalds
0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-04-29 15:32 UTC (permalink / raw)
To: Matthew Wilcox (Oracle)
Cc: linux-kernel, Михаил Новоселов, Ильфат Гаптрахманов, stable,
Rik van Riel, Mel Gorman, Peter Zijlstra, Ingo Molnar, Andrew Morton
On Mon, 29 Apr 2024 at 07:48, Matthew Wilcox (Oracle)
<willy@infradead.org> wrote:
>
> bits_per() rounds up to the next power of two when passed a power of
> two. This causes crashes on some machines and configurations.
Bah. Your patch is *still* wrong, because bits_per() thinks you need
one bit for a zero value, so when you do
bits_per(CONFIG_NR_CPUS - 1)
and some insane person has enabled SMP and managed to set
CONFIG_NR_CPUS to 1, the math is *still* broken.
The right thing to do is
order_base_2(CONFIG_NR_CPUS)
and 'bits_per()' should be avoided, having completely crazy semantics
(you can tell how almost all users actually do "x-1" as the argument).
We should probably get rid of that horrid bits_per() entirely.
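The difference is easiest to see with numbers. The helpers below are user-space re-implementations modeled on the semantics of bits_per() and order_base_2() as described here; the my_ prefixed names are made up for this sketch and they are not the kernel macros themselves. The quantity wanted is the number of bits needed to index N CPUs, i.e. to store the values 0 .. N-1.

```c
#include <assert.h>

/* floor(log2(n)) for n > 0 */
static unsigned int my_ilog2(unsigned long n)
{
	unsigned int log = 0;

	while (n >>= 1)
		log++;
	return log;
}

/* Bits needed to store the value n; note that 0 still costs one bit. */
static unsigned int my_bits_per(unsigned long n)
{
	return n == 0 ? 1 : my_ilog2(n) + 1;
}

/* ceil(log2(n)): bits needed to tell n distinct items apart. */
static unsigned int my_order_base_2(unsigned long n)
{
	return n <= 1 ? 0 : my_ilog2(n - 1) + 1;
}

void nr_cpus_bits_demo(void)
{
	/* NR_CPUS = 4: CPU numbers 0..3 fit in 2 bits. */
	assert(my_bits_per(4) == 3);		/* original bug: one too many */
	assert(my_bits_per(4 - 1) == 2);	/* the "x-1" workaround */
	assert(my_order_base_2(4) == 2);

	/* NR_CPUS = 1: only CPU 0 exists, so 0 bits are needed. */
	assert(my_bits_per(1 - 1) == 1);	/* workaround is still off */
	assert(my_order_base_2(1) == 0);

	/* Non-power-of-two: 5 CPUs need 3 bits either way. */
	assert(my_bits_per(5 - 1) == 3);
	assert(my_order_base_2(5) == 3);
}
```

So the "x-1" form agrees with order_base_2() for every N of 2 or more and differs only at N = 1, which is exactly the corner case called out above.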
I applied your patch with that fixed (which admittedly makes it all
*my* patch, but applying it as yours just to get the changelog).
Linus
^ permalink raw reply [relevance 99%]
* Re: [syzbot] [bpf?] [trace?] possible deadlock in force_sig_info_to_task
2024-04-29 0:50 99% ` Linus Torvalds
@ 2024-04-29 1:33 75% ` Linus Torvalds
` (2 more replies)
0 siblings, 3 replies; 200+ results
From: Linus Torvalds @ 2024-04-29 1:33 UTC (permalink / raw)
To: Hillf Danton, Andy Lutomirski, Peter Anvin, Ingo Molnar, Adrian Bunk
Cc: syzbot, Tetsuo Handa, andrii, bpf, linux-kernel, syzkaller-bugs
[-- Attachment #1: Type: text/plain, Size: 3180 bytes --]
On Sun, 28 Apr 2024 at 17:50, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> But the immediate problem is
> not the user space access, it's that something goes horribly wrong
> *around* it.
Side note: that stack trace from hell actually has three nested page
faults, and I think that's actually the important thing here:
- the first page fault is from user space, and triggers the vsyscall emulation.
- the second page fault is from __do_sys_gettimeofday, and that
should just have caused the exception that then sets the return value
to -EFAULT
- the third nested page fault is due to _raw_spin_unlock_irqrestore
-> preempt_schedule -> trace_sched_switch, which then causes that bpf
trace program to run, which does that bpf_probe_read_compat, which
causes that page fault under pagefault_disable().
It's quite the nasty backtrace, and there's a lot going on.
And I think I finally see what may be going on. The problem is
literally the vsyscall emulation, which sets
current->thread.sig_on_uaccess_err = 1;
and that causes the fixup_exception() code to send the signal
*despite* the exception being caught.
And I think that is in fact completely bogus. It's completely bogus
exactly because it sends that signal even when it *shouldn't* be sent
- like for the bpf user mode trace gathering.
In other words, I think the whole "sig_on_uaccess_err" thing is
entirely broken, because it makes any nested page-faults do all the
wrong things.
Now, arguably, I don't think anybody should enable vsyscall emulation
any more, but this test case clearly does.
I think we should just make the "send SIGSEGV" be something that the
vsyscall emulation does on its own, not this broken per-thread state
for something that isn't actually per thread.
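The failure mode can be modeled in a few lines of user-space C. This is a toy simulation and not the kernel code: the function names, the signals_sent counter, and the single global flag are all stand-ins chosen for illustration. It contrasts a flag that stays set across the whole emulated call with an emulation that checks its own return value and sends the signal itself.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

static bool sig_on_uaccess_err;	/* models the per-thread flag */
static int signals_sent;	/* counts simulated SIGSEGV deliveries */

/* Models a kernel-mode fault that has an exception fixup and is caught. */
static int handle_caught_fault(void)
{
	if (sig_on_uaccess_err)
		signals_sent++;	/* sent although the fault was caught */
	return -EFAULT;
}

/* Models unrelated nested code (say, a tracing hook) that faults harmlessly. */
static void unrelated_probe_read(void)
{
	(void)handle_caught_fault();
}

/* Flag-based scheme: anything that faults while the flag is set signals. */
static int emulate_with_flag(bool user_ptr_bad)
{
	int ret = 0;

	sig_on_uaccess_err = true;
	unrelated_probe_read();		/* nested, unrelated fault */
	if (user_ptr_bad)
		ret = handle_caught_fault();
	sig_on_uaccess_err = false;
	return ret;
}

/* Flag-free scheme: the emulation decides based on its own result. */
static int emulate_without_flag(bool user_ptr_bad)
{
	int ret = 0;

	unrelated_probe_read();		/* nested fault stays harmless */
	if (user_ptr_bad)
		ret = handle_caught_fault();
	if (ret == -EFAULT)
		signals_sent++;		/* emulation sends the signal itself */
	return ret;
}

void nested_fault_demo(void)
{
	signals_sent = 0;
	emulate_with_flag(false);	/* user pointer was fine... */
	assert(signals_sent == 1);	/* ...yet a signal went out */

	signals_sent = 0;
	emulate_without_flag(false);
	assert(signals_sent == 0);

	signals_sent = 0;
	emulate_without_flag(true);
	assert(signals_sent == 1);
}
```

In the flag-based variant the nested fault is charged to the emulated call even though it has nothing to do with the user pointer. In the flag-free variant only the emulation's own -EFAULT result leads to a signal.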
The x86 page fault code actually tried to deal with the "incorrect
nesting" by having that
if (in_interrupt())
return;
which ignores the sig_on_uaccess_err case when it happens in
interrupts, but as shown by this example, these nested page faults do
not need to be about interrupts at all.
IOW, I think the only right thing is to remove that horrendously broken code.
The attached patch is ENTIRELY UNTESTED, but looks like the
ObviouslyCorrect(tm) thing to do.
NOTE! This broken code goes back to commit 4fc3490114bb ("x86-64: Set
siginfo and context on vsyscall emulation faults") in 2011, and back
then the reason was to get all the siginfo details right. Honestly, I
do not for a moment believe that it's worth getting the siginfo
details right here, but part of the commit says
This fixes issues with UML when vsyscall=emulate.
and so my patch to remove this garbage will probably break UML in this
situation.
I cannot find it in myself to care, since I do not believe that
anybody should be running with vsyscall=emulate in 2024 in the first
place, much less if you are doing things like UML. But let's see if
somebody screams.
Also, somebody should obviously test my COMPLETELY UNTESTED patch.
Did I make it clear enough that this is UNTESTED and just does
crapectomy on something that is clearly broken?
Linus "UNTESTED" Torvalds
[-- Attachment #2: patch.diff --]
[-- Type: text/x-patch, Size: 4002 bytes --]
arch/x86/entry/vsyscall/vsyscall_64.c | 25 ++-----------------------
arch/x86/include/asm/processor.h | 1 -
arch/x86/mm/fault.c | 33 +--------------------------------
3 files changed, 3 insertions(+), 56 deletions(-)
diff --git a/arch/x86/entry/vsyscall/vsyscall_64.c b/arch/x86/entry/vsyscall/vsyscall_64.c
index a3c0df11d0e6..3b0f61b2ea6d 100644
--- a/arch/x86/entry/vsyscall/vsyscall_64.c
+++ b/arch/x86/entry/vsyscall/vsyscall_64.c
@@ -98,11 +98,6 @@ static int addr_to_vsyscall_nr(unsigned long addr)
static bool write_ok_or_segv(unsigned long ptr, size_t size)
{
- /*
- * XXX: if access_ok, get_user, and put_user handled
- * sig_on_uaccess_err, this could go away.
- */
-
if (!access_ok((void __user *)ptr, size)) {
struct thread_struct *thread = &current->thread;
@@ -123,7 +118,6 @@ bool emulate_vsyscall(unsigned long error_code,
struct task_struct *tsk;
unsigned long caller;
int vsyscall_nr, syscall_nr, tmp;
- int prev_sig_on_uaccess_err;
long ret;
unsigned long orig_dx;
@@ -234,12 +228,8 @@ bool emulate_vsyscall(unsigned long error_code,
goto do_ret; /* skip requested */
/*
- * With a real vsyscall, page faults cause SIGSEGV. We want to
- * preserve that behavior to make writing exploits harder.
+ * With a real vsyscall, page faults cause SIGSEGV.
*/
- prev_sig_on_uaccess_err = current->thread.sig_on_uaccess_err;
- current->thread.sig_on_uaccess_err = 1;
-
ret = -EFAULT;
switch (vsyscall_nr) {
case 0:
@@ -262,23 +252,12 @@ bool emulate_vsyscall(unsigned long error_code,
break;
}
- current->thread.sig_on_uaccess_err = prev_sig_on_uaccess_err;
-
check_fault:
if (ret == -EFAULT) {
/* Bad news -- userspace fed a bad pointer to a vsyscall. */
warn_bad_vsyscall(KERN_INFO, regs,
"vsyscall fault (exploit attempt?)");
-
- /*
- * If we failed to generate a signal for any reason,
- * generate one here. (This should be impossible.)
- */
- if (WARN_ON_ONCE(!sigismember(&tsk->pending.signal, SIGBUS) &&
- !sigismember(&tsk->pending.signal, SIGSEGV)))
- goto sigsegv;
-
- return true; /* Don't emulate the ret. */
+ goto sigsegv;
}
regs->ax = ret;
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 811548f131f4..78e51b0d6433 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -472,7 +472,6 @@ struct thread_struct {
unsigned long iopl_emul;
unsigned int iopl_warn:1;
- unsigned int sig_on_uaccess_err:1;
/*
* Protection Keys Register for Userspace. Loaded immediately on
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 622d12ec7f08..bba4e020dd64 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -723,39 +723,8 @@ kernelmode_fixup_or_oops(struct pt_regs *regs, unsigned long error_code,
WARN_ON_ONCE(user_mode(regs));
/* Are we prepared to handle this kernel fault? */
- if (fixup_exception(regs, X86_TRAP_PF, error_code, address)) {
- /*
- * Any interrupt that takes a fault gets the fixup. This makes
- * the below recursive fault logic only apply to a faults from
- * task context.
- */
- if (in_interrupt())
- return;
-
- /*
- * Per the above we're !in_interrupt(), aka. task context.
- *
- * In this case we need to make sure we're not recursively
- * faulting through the emulate_vsyscall() logic.
- */
- if (current->thread.sig_on_uaccess_err && signal) {
- sanitize_error_code(address, &error_code);
-
- set_signal_archinfo(address, error_code);
-
- if (si_code == SEGV_PKUERR) {
- force_sig_pkuerr((void __user *)address, pkey);
- } else {
- /* XXX: hwpoison faults will set the wrong code. */
- force_sig_fault(signal, si_code, (void __user *)address);
- }
- }
-
- /*
- * Barring that, we can do the fixup and be happy.
- */
+ if (fixup_exception(regs, X86_TRAP_PF, error_code, address))
return;
- }
/*
* AMD erratum #91 manifests as a spurious page fault on a PREFETCH
* Re: [syzbot] [bpf?] [trace?] possible deadlock in force_sig_info_to_task
@ 2024-04-29 0:50 99% ` Linus Torvalds
2024-04-29 1:33 75% ` Linus Torvalds
0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-04-29 0:50 UTC (permalink / raw)
To: Hillf Danton
Cc: syzbot, Tetsuo Handa, andrii, bpf, linux-kernel, syzkaller-bugs
On Sun, 28 Apr 2024 at 16:23, Hillf Danton <hdanton@sina.com> wrote:
>
> So is game like copying from/putting to user with runqueue locked
> at the first place.
No, that should be perfectly fine. In fact, it's even normal. It would
happen any time you have any kind of tracing thing, where looking up
the user mode frame involves doing user accesses with page faults
disabled.
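As a rough userspace analogue of that pattern (this is not the kernel's copy_from_user_nofault(), and copy_nofault() is a made-up name), the sketch below lets the kernel do the read on the caller's behalf through a pipe, so a bad source address comes back as -EFAULT instead of a SIGSEGV. It assumes Linux pipe semantics and small copies.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical helper: copy 'len' bytes from 'src' to 'dst', returning
 * -EFAULT rather than crashing when 'src' is not readable. write(2)
 * makes the kernel read 'src' for us and report the failure as an error.
 */
static int copy_nofault(void *dst, const void *src, size_t len)
{
	int fds[2];
	int ret = 0;

	if (len == 0)
		return 0;
	if (len > 4096)		/* stay well under the pipe capacity */
		return -EINVAL;
	if (pipe(fds) < 0)
		return -errno;

	if (write(fds[1], src, len) != (ssize_t)len)
		ret = -EFAULT;	/* source was (partly) unreadable */
	else if (read(fds[0], dst, len) != (ssize_t)len)
		ret = -EIO;

	close(fds[0]);
	close(fds[1]);
	return ret;
}
```

The point of the shape is the same as in the tracing case: the caller gets an error code to act on, and nothing about the failure should turn into a signal.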
The runqueue lock is irrelevant. As mentioned, it's only a symptom of
something else going wrong.
Now, judging by the syz reproducer, the trigger for this all is almost
certainly that
bpf$BPF_RAW_TRACEPOINT_OPEN(0x11,
&(0x7f00000000c0)={&(0x7f0000000080)='sched_switch\x00', r0}, 0x10)
and that probably causes the instability. But the immediate problem is
not the user space access, it's that something goes horribly wrong
*around* it.
> Plus as per another syzbot report [1], bpf could make trouble with
> workqueue pool locked.
That seems to be entirely different. There's no unexplained page fault
in that case; that seems to be purely a "take lock in the wrong order" problem.
Linus
* Linux 6.9-rc6
@ 2024-04-28 20:58 43% Linus Torvalds
0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-04-28 20:58 UTC (permalink / raw)
To: Linux Kernel Mailing List
Things continue to look pretty normal, and nothing here really stands
out. The biggest single change that stands out in the diffstat is
literally a documentation update, everything else looks pretty small
and spread out.
We have the usual driver updates (mainly networking and gpu but some
updates elsewhere), some filesystem updates (mainly smb, bcachefs,
nfsd reverts, and some ntfs compat updates), and misc other fixes all
over - wifi fixes, arm dts fixlets, yadda yadda.
Nothing looks particularly big or bad. Shortlog appended for details,
please do keep testing,
Linus
---
Abdelrahman Morsy (1):
HID: mcp-2221: cancel delayed_work only when CONFIG_IIO is enabled
Akhil R (1):
dmaengine: tegra186: Fix residual calculation
Alex Deucher (1):
drm/amdgpu/sdma5.2: use legacy HDP flush for SDMA2/3
Alex Elder (1):
mailmap: add entries for Alex Elder
Alexey Brodkin (1):
ARC: [plat-hsdk]: Remove misplaced interrupt-cells property
Alice Ryhl (1):
rust: don't select CONSTRUCTORS
Andrei Simion (2):
ARM: dts: microchip: at91-sama7g5ek: Replace
regulator-suspend-voltage with the valid property
ARM: dts: microchip: at91-sama7g54_curiosity: Replace
regulator-suspend-voltage with the valid property
Andrew Jones (1):
RISC-V: selftests: cbo: Ensure asm operands match constraints, take 2
Andrey Ryabinin (1):
stackdepot: respect __GFP_NOLOCKDEP allocation flag
Andy Shevchenko (2):
idma64: Don't try to serve interrupts when device is powered off
gpio: tangier: Use correct type for the IRQ chip data
Andy Yan (1):
arm64: dts: rockchip: Fix the i2c address of es8316 on Cool Pi CM5
AngeloGioacchino Del Regno (1):
soc: mediatek: mtk-svs: Append "-thermal" to thermal zone names
Arkadiusz Kubalewski (1):
dpll: fix dpll_pin_on_pin_register() for multiple parent pins
Arnd Bergmann (2):
dmaengine: owl: fix register access functions
mtd: diskonchip: work around ubsan link failure
Arınç ÜNAL (1):
arm64: dts: rockchip: set PHY address of MT7531 switch to 0x1f
Aswin Unnikrishnan (1):
rust: remove `params` from `module` macro example
Avraham Stern (1):
wifi: iwlwifi: mvm: remove old PASN station when adding a new one
Baoquan He (1):
LoongArch: Fix Kconfig item and left code related to CRASH_CORE
Bartosz Golaszewski (1):
Bluetooth: qca: set power_ctrl_enabled on NULL returned by
gpiod_get_optional()
Ben Zong-You Xie (1):
perf riscv: Fix the warning due to the incompatible type
Benjamin Tissoires (1):
MAINTAINERS: update Benjamin's email address
Benno Lossin (1):
rust: macros: fix soundness issue in `module!` macro
Bibo Mao (1):
LoongArch: Lately init pmu after smp is online
Bjorn Helgaas (1):
ARC: Fix typos
Bo-Wei Chen (1):
docs: rust: fix improper rendering in Arch Support page
Christian Brauner (3):
ntfs3: serve as alias for the legacy ntfs driver
ntfs3: enforce read-only when used as legacy ntfs driver
ntfs3: add legacy ntfs file operations
Christian Gmeiner (1):
Revert "drm/etnaviv: Expose a few more chipspecs to userspace"
Christian Marangi (2):
mtd: rawnand: qcom: Fix broken OP_RESET_DEVICE command in
qcom_misc_cmd_type_exec()
mtd: limit OTP NVMEM cell parse to non-NAND devices
Christoph Müllner (2):
riscv: thead: Rename T-Head PBMT to MAE
riscv: T-Head: Test availability bit before enabling MAE errata
Chuck Lever (3):
Revert "svcrdma: Add Write chunk WRs to the RPC's Send WR chain"
Revert "NFSD: Reschedule CB operations when backchannel rpc_clnt
is shut down"
Revert "NFSD: Convert the callback workqueue to use delayed_work"
Chun-Yi Lee (1):
Bluetooth: hci_sync: Using hci_cmd_sync_submit when removing Adv Monitor
Clément Léger (2):
riscv: hwprobe: fix invalid sign extension for RISCV_HWPROBE_EXT_ZVFHMIN
selftests: sud_test: return correct emulated syscall value on RISC-V
Conor Dooley (1):
rust: make mutually exclusive with CFI_CLANG
Cristian Ciocaltea (1):
phy: phy-rockchip-samsung-hdptx: Select CONFIG_RATIONAL
Dan Carpenter (1):
net: ti: icssg-prueth: Fix signedness bug in prueth_init_rx_chns()
Dan Williams (1):
cxl/core: Fix potential payload size confusion in cxl_mem_get_poison()
Daniel Golle (2):
soc: mediatek: mtk-socinfo: depends on CONFIG_SOC_BUS
net: phy: mediatek-ge-soc: follow netdev LED trigger semantics
Daniel Okazaki (1):
eeprom: at24: fix memory corruption race condition
Daniele Palmas (1):
net: usb: qmi_wwan: add Telit FN920C04 compositions
David Bauer (1):
vxlan: drop packets from invalid src-address
David Christensen (1):
MAINTAINERS: eth: mark IBM eHEA as an Orphan
David Hildenbrand (1):
LoongArch: Fix a build error due to __tlb_remove_tlb_entry()
David Howells (4):
cifs: Fix reacquisition of volume cookie on still-live connection
cifs: Add tracing for the cifs_tcon struct refcounting
netfs: Fix writethrough-mode error handling
netfs: Fix the pre-flush when appending to a file in writethrough mode
David Kaplan (1):
x86/cpu: Fix check for RDPKRU in __show_regs()
David Sterba (1):
btrfs: remove colon from messages with state
Derek Foreman (1):
drm/etnaviv: fix tx clock gating on some GC7000 variants
Dragan Simic (2):
arm64: dts: rockchip: Remove unsupported node from the Pinebook Pro dts
arm64: dts: rockchip: Designate the system power controller on QuartzPro64
Duanqiang Wen (3):
net: libwx: fix alloc msix vectors failed
Revert "net: txgbe: fix i2c dev name cannot match clkdev"
Revert "net: txgbe: fix clk_name exceed MAX_DEV_ID limits"
Duoming Zhou (1):
ax25: Fix netdev refcount issue
Edward Liaw (1):
selftests/harness: remove use of LINE_MAX
Eric Dumazet (4):
icmp: prevent possible NULL dereferences from icmp_build_probe()
net: fix sk_memory_allocated_{add|sub} vs softirqs
ipv4: check for NULL idev in ip_route_use_hint()
net: usb: ax88179_178a: stop lying about skb->truesize
Eric Van Hensbergen (1):
fs/9p: mitigate inode collisions
Erwan Velu (1):
i40e: Report MFS in decimal base instead of hex
Felix Fietkau (1):
wifi: mac80211: split mesh fast tx cache into local/proxied/forwarded
Felix Kuehling (3):
drm/amdkfd: Fix eviction fence handling
drm/amdgpu: Update BO eviction priorities
drm/amdkfd: Fix rescheduling of restore worker
Fenghua Yu (1):
dmaengine: idxd: Fix oops during rmmod on single-CPU platforms
Gabor Juhos (1):
phy: qcom: m31: match requested regulator name with dt schema
Geert Uytterhoeven (1):
net: ravb: Fix registered interrupt names
Guanrui Huang (1):
irqchip/gic-v3-its: Prevent double free on error
Guenter Roeck (1):
MAINTAINERS: Drop entry for PCA9541 bus master selector
Gustavo A. R. Silva (1):
smb: client: Fix struct_group() usage in __packed structs
Günther Noack (1):
fs: Return ENOTTY directly if FS_IOC_GETUUID or FS_IOC_GETFSSYSFSPATH fail
Hangbin Liu (1):
bridge/br_netlink.c: no need to return void function
Hans de Goede (1):
phy: ti: tusb1210: Resolve charger-det crash if charger psy is
unregistered
Himal Prasad Ghimiray (2):
drm/xe: Remove sysfs only once on action add failure
drm/xe: call free_gsc_pkt only once on action add failure
Huacai Chen (1):
LoongArch: Fix callchain parse error with kernel tracepoint events
Hyunwoo Kim (3):
tcp: Fix Use-After-Free in tcp_ao_connect_init
net: gtp: Fix Use-After-Free in gtp_dellink
net: openvswitch: Fix Use-After-Free in ovs_ct_exit
Ido Schimmel (12):
mlxsw: core: Unregister EMAD trap using FORWARD action
mlxsw: core_env: Fix driver initialization with old firmware
mlxsw: pci: Fix driver initialization with old firmware
mlxsw: spectrum_acl_tcam: Fix race in region ID allocation
mlxsw: spectrum_acl_tcam: Fix race during rehash delayed work
mlxsw: spectrum_acl_tcam: Fix possible use-after-free during
activity update
mlxsw: spectrum_acl_tcam: Fix possible use-after-free during rehash
mlxsw: spectrum_acl_tcam: Rate limit error message
mlxsw: spectrum_acl_tcam: Fix memory leak during rehash
mlxsw: spectrum_acl_tcam: Fix warning during rehash
mlxsw: spectrum_acl_tcam: Fix incorrect list API usage
mlxsw: spectrum_acl_tcam: Fix memory leak when canceling rehash work
Igor Artemiev (1):
wifi: cfg80211: fix the order of arguments for trace events of
the tx_rx_evt class
Ikjoon Jang (1):
arm64: dts: mediatek: mt8183: Add power-domains properity to mfgcfg
Iskander Amara (2):
arm64: dts: rockchip: enable internal pull-up for Q7_THRM# on RK3399 Puma
arm64: dts: rockchip: fix alphabetical ordering RK3399 puma
Ismael Luceno (1):
ipvs: Fix checksumming on GSO of SCTP packets
Jack Xiao (1):
drm/amdgpu/mes: fix use-after-free issue
Jacob Keller (1):
ice: fix LAG and VF lock dependency in ice_reset_vf()
Jakub Kicinski (2):
tools: ynl: don't ignore errors in NLMSG_DONE messages
eth: bnxt: fix counting packets discarded due to OOM and netpoll
Jarred White (1):
ACPI: CPPC: Fix bit_offset shift in MASK_VAL() macro
Jason Reeder (1):
net: ethernet: ti: am65-cpts: Fix PTPv1 message type on TX packets
Jiantao Shan (1):
LoongArch: Fix access error when read fault on a write-only VMA
Johan Hovold (5):
phy: qcom: qmp-combo: fix VCO div offset on v5_5nm and v6
arm64: dts: qcom: sc8280xp: add missing PCIe minimum OPP
Bluetooth: qca: fix invalid device address check
Bluetooth: qca: fix NULL-deref on non-serdev suspend
Bluetooth: qca: fix NULL-deref on non-serdev setup
Johannes Berg (12):
wifi: mac80211: check EHT/TTLM action frame length
wifi: mac80211: don't use rate mask for scanning
Revert "wifi: iwlwifi: bump FW API to 90 for BZ/SC devices"
wifi: mac80211: fix idle calculation with multi-link
wifi: mac80211: mlme: re-parse with correct mode
wifi: mac80211: mlme: fix memory leak
wifi: mac80211: mlme: re-parse if AP mode is less than client
wifi: nl80211: don't free NULL coalescing rule
wifi: mac80211_hwsim: init peer measurement result
wifi: mac80211: remove link before AP
wifi: mac80211: fix unaligned le16 access
wifi: iwlwifi: mvm: fix link ID management
Johannes Thumshirn (1):
btrfs: fix information leak in btrfs_ioctl_logical_to_ino()
Johannes Weiner (1):
mm: zswap: fix shrinker NULL crash with cgroup_disable=memory
Jose Ignacio Tornos Martinez (1):
arm64: dts: rockchip: regulator for sd needs to be always on for BPI-R2Pro
Joshua Ashton (1):
drm/amd/display: Set color_mgmt_changed to true on unsuspend
Justin Chen (1):
net: bcmasp: fix memory leak when bringing down interface
Kalle Valo (1):
wifi: ath11k: use RCU when accessing struct inet6_dev::ac_list
Kenny Levinsen (1):
HID: i2c-hid: Revert to await reset ACK before reading report descriptor
Kent Overstreet (14):
bcachefs: Fix null ptr deref in twf from BCH_IOCTL_FSCK_OFFLINE
bcachefs: node scan: ignore multiple nodes with same seq if interior
bcachefs: make sure to release last journal pin in replay
bcachefs: Fix bch2_dev_btree_bitmap_marked_sectors() shift
bcachefs: KEY_TYPE_error is allowed for reflink
bcachefs: fix leak in bch2_gc_write_reflink_key
bcachefs: Fix bio alloc in check_extent_checksum()
bcachefs: Check for journal entries overruning end of sb clean section
bcachefs: Fix missing call to bch2_fs_allocator_background_exit()
bcachefs: bkey_cached.btree_trans_barrier_seq needs to be a ulong
bcachefs: Tweak btree key cache shrinker so it actually frees
bcachefs: Fix deadlock in journal write path
bcachefs: Fix inode early destruction path
bcachefs: If we run merges at a lower watermark, they must be nonblocking
Kirill A. Shutemov (1):
x86/tdx: Preserve shared bit on mprotect()
Krzysztof Kozlowski (4):
arm64: dts: rockchip: drop panel port unit address in GRU Scarlet
arm64: dts: rockchip: drop redundant pcie-reset-suspend in Scarlet Dumo
arm64: dts: rockchip: drop redundant disable-gpios in Lubancat 1
arm64: dts: rockchip: drop redundant disable-gpios in Lubancat 2
Kuniyuki Iwashima (1):
af_unix: Suppress false-positive lockdep splat for spin_lock()
in __unix_gc().
Laine Taffin Altman (1):
rust: init: remove impl Zeroable for Infallible
Lang Yu (2):
drm/amdkfd: make sure VM is ready for updating operations
drm/amdgpu/umsch: don't execute umsch test when GPU is in reset/suspend
Lijo Lazar (2):
drm/amdgpu: Assign correct bits for SDMA HDP flush
drm/amd/pm: Restore config space after reset
Linus Torvalds (1):
Linux 6.9-rc6
Louis Chauvet (1):
dmaengine: xilinx: xdma: Fix synchronization issue
Luca Weiss (1):
arm64: dts: qcom: Fix type of "wdog" IRQs for remoteprocs
Lucas Stach (1):
drm/atomic-helper: fix parameter order in
drm_format_conv_state_copy() call
Luiz Augusto von Dentz (3):
Bluetooth: hci_sync: Use advertised PHYs on hci_le_ext_create_conn_sync
Bluetooth: hci_event: Fix sending HCI_OP_READ_ENC_KEY_SIZE
Bluetooth: MGMT: Fix failing to MGMT_OP_ADD_UUID/MGMT_OP_REMOVE_UUID
Lukas Wunner (1):
igc: Fix LED-related deadlock on driver unbind
MD Danish Anwar (1):
net: phy: dp83869: Fix MII mode failure
Ma Jun (1):
drm/amdgpu/pm: Remove gpu_od if it's an empty directory
Maksim Kiselev (1):
mmc: sdhci-of-dwcmshc: th1520: Increase tuning loop count to 128
Manivannan Sadhasivam (3):
arm64: dts: qcom: sm8450: Fix the msi-map entries
arm64: dts: qcom: sm8550: Fix the msi-map entries
arm64: dts: qcom: sm8650: Fix the msi-map entries
Mantas Pucka (1):
mmc: sdhci-msm: pervent access to suspended controller
Marcel Ziswiler (1):
phy: freescale: imx8m-pcie: fix pcie link-up instability
Marek Vasut (1):
arm64: dts: imx8mp: Fix assigned-clocks for second CSI2
Marios Makassikis (1):
ksmbd: clear RENAME_NOREPLACE before calling vfs_rename
Matthew Sakai (1):
dm vdo murmurhash: remove unneeded semicolon
Matthew Wilcox (Oracle) (3):
mm: create FOLIO_FLAG_FALSE and FOLIO_TYPE_OPS macros
mm: support page_mapcount() on page_has_type() pages
mm: turn folio_test_hugetlb into a PageType
Matthias Schiffer (1):
net: dsa: mv88e6xx: fix supported_interfaces setup in
mv88e6250_phylink_get_caps()
Maximilian Luz (2):
firmware: qcom: uefisecapp: Fix memory related IO errors and crashes
arm64: dts: qcom: sc8180x: Fix ss_phy_irq for secondary USB controller
Miaohe Lin (1):
mm/hugetlb: fix DEBUG_LOCKS_WARN_ON(1) when dissolve_free_hugetlb_folio()
Michael Chan (1):
bnxt_en: Fix error recovery for 5760X (P7) chips
Michael Heimpold (1):
ARM: dts: imx6ull-tarragon: fix USB over-current polarity
Michal Tomek (1):
phy: rockchip-snps-pcie3: fix bifurcation on rk3588
Michal Wajdeczko (1):
drm/xe/guc: Fix arguments passed to relay G2H handlers
Miguel Ojeda (2):
kbuild: rust: remove unneeded `@rustc_cfg` to avoid ICE
kbuild: rust: force `alloc` extern to allow "empty" Rust files
Mikhail Kobuk (2):
phy: marvell: a3700-comphy: Fix out of bounds read
phy: marvell: a3700-comphy: Fix hardcoded array size
Ming Lei (1):
dm: restore synchronous close of device mapper block device
Miquel Raynal (2):
dmaengine: xilinx: xdma: Fix wrong offsets in the buffers
addresses in dma descriptor
dmaengine: xilinx: xdma: Clarify kdoc in XDMA driver
Miri Korenblit (1):
wifi: iwlwifi: mvm: return uid from iwl_mvm_build_scan_cmd
Muhammad Usama Anjum (2):
selftests: mm: fix unused and uninitialized variable warning
selftests: mm: protection_keys: save/restore nr_hugepages value
from launch script
Muhammed Efe Cetin (1):
arm64: dts: rockchip: mark system power controller and fix typo
on orangepi-5-plus
Mukul Joshi (2):
drm/amdgpu: Fix leak when GPU memory allocation fails
drm/amdkfd: Add VRAM accounting for SVM migration
Nam Cao (2):
HID: i2c-hid: remove I2C_HID_READ_PENDING flag to prevent lock-up
fbdev: fix incorrect address computation in deferred IO
Namjae Jeon (4):
ksmbd: fix slab-out-of-bounds in smb2_allocate_rsp_buf
ksmbd: validate request buffer size in smb2_allocate_rsp_buf()
ksmbd: common: use struct_group_attr instead of struct_group for
network_open_info
ksmbd: add continuous availability share parameter
Naohiro Aota (1):
btrfs: scrub: run relocation repair when/only needed
Nathan Chancellor (2):
bcachefs: Fix format specifier in validate_bset_keys()
Bluetooth: Fix type of len in {l2cap,sco}_sock_getsockopt_old()
Nuno Pereira (1):
HID: nintendo: Fix N64 controller being identified as mouse
Nícolas F. R. A. Prado (5):
arm64: dts: mediatek: mt8192: Add missing gce-client-reg to mutex
arm64: dts: mediatek: mt8195: Add missing gce-client-reg to vpp/vdosys
arm64: dts: mediatek: mt8195: Add missing gce-client-reg to mutex
arm64: dts: mediatek: mt8195: Add missing gce-client-reg to mutex1
arm64: dts: mediatek: cherry: Describe CPU supplies
Oleg Nesterov (2):
sched/isolation: Prevent boot crash when the boot CPU is nohz_full
sched/isolation: Fix boot crash when maxcpus < first housekeeping CPU
Pablo Neira Ayuso (1):
netfilter: nf_tables: honor table dormant flag from netdev
release event path
Patrik Jakobsson (1):
drm/gma500: Remove lid code
Paul Geurts (1):
NFC: trf7970a: disable all regulators on removal
Paulo Alcantara (1):
smb: client: fix rename(2) regression against samba
Peter Münster (1):
net: b44: set pause params only when interface is up
Peter Xu (1):
mm/hugetlb: fix missing hugetlb_lock for resv uncharge
Peyton Lee (1):
drm/amdgpu/vpe: fix vpe dpm setup failed
Pin-yen Lin (4):
arm64: dts: mediatek: mt8192-asurada: Update min voltage
constraint for MT6315
arm64: dts: mediatek: mt8195-cherry: Update min voltage
constraint for MT6315
arm64: dts: mediatek: mt8183-kukui: Use default min voltage for MT6358
arm64: dts: mediatek: mt8186-corsola: Update min voltage
constraint for Vgpu
Prathamesh Shete (1):
gpio: tegra186: Fix tegra186_gpio_is_accessible() check
Prike Liang (1):
drm/amdgpu: Fix the ring buffer size for queue VM flush
Qu Wenruo (1):
btrfs: fix wrong block_start calculation for btrfs_drop_extent_map_range()
Quentin Schulz (3):
arm64: dts: rockchip: enable internal pull-up on Q7_USB_ID for RK3399 Puma
arm64: dts: rockchip: enable internal pull-up on PCIE_WAKE# for
RK3399 Puma
arm64: dts: rockchip: add regulators for PCIe on RK3399 Puma Haikou
Rafael J. Wysocki (1):
ACPI: PM: s2idle: Evaluate all Low-Power S0 Idle _DSM functions
Rafał Miłecki (9):
arm64: dts: mediatek: mt7622: fix clock controllers
arm64: dts: mediatek: mt7622: fix IR nodename
arm64: dts: mediatek: mt7622: fix ethernet controller "compatible"
arm64: dts: mediatek: mt7622: drop "reset-names" from thermal block
arm64: dts: mediatek: mt7986: drop invalid properties from ethsys
arm64: dts: mediatek: mt7986: drop "#reset-cells" from Ethernet controller
arm64: dts: mediatek: mt7986: drop invalid thermal block clock
arm64: dts: mediatek: mt7986: prefix BPI-R3 cooling maps with "map-"
arm64: dts: mediatek: mt2712: fix validation errors
Rahul Rameshbabu (4):
macsec: Enable devices to advertise whether they update sk_buff
md_dst during offloads
ethernet: Add helper for assigning packet type when dest address
does not match device address
macsec: Detect if Rx skb is macsec-related for offloading
devices that update md_dst
net/mlx5e: Advertise mlx5 ethernet driver updates sk_buff md_dst
for MACsec
Rajendra Nayak (1):
arm64: dts: qcom: x1e80100: Fix the compatible for cluster idle states
Rex Zhang (1):
dmaengine: idxd: Convert spinlock to mutex to lock evl workqueue
Richard Kinder (1):
wifi: mac80211: ensure beacon is non-S1G prior to extracting the
beacon timestamp field
Rob Herring (3):
dt-bindings: rockchip: grf: Add missing type to 'pcie-phy' node
dt-bindings: eeprom: at24: Fix ST M24C64-D compatible schema
arm64: dts: rockchip: Fix USB interface compatible string on
kobol-helios64
Sabrina Dubroca (1):
tls: fix lockless read of strp->msg_ready in ->poll
Samuel Holland (2):
riscv: Fix TASK_SIZE on 64-bit NOMMU
riscv: Fix loading 64-bit NOMMU kernels past the start of RAM
Sean Anderson (1):
dma: xilinx_dpdma: Fix locking
Sean Christopherson (2):
cpu: Re-enable CPU mitigations by default for !X86 architectures
cpu: Ignore "mitigations" kernel parameter if CPU_MITIGATIONS=n
Sean Wang (1):
Bluetooth: btusb: mediatek: Fix double free of skb in coredump
Sebastian Reichel (2):
phy: rockchip-snps-pcie3: fix clearing PHP_GRF_PCIESEL_CON bits
phy: rockchip: naneng-combphy: Fix mux on rk3588
Sergei Antonov (1):
mmc: moxart: fix handling of sgm->consumed, otherwise WARN_ON triggers
Sindhu Devale (1):
i40e: Do not use WQ_MEM_RECLAIM flag for workqueue
Stephen Boyd (2):
phy: qcom: qmp-combo: Fix VCO div offset on v3
phy: qcom: qmp-combo: Fix register base for QSERDES_DP_PHY_MODE
Steve French (2):
smb3: missing lock when picking channel
smb3: fix lock ordering potential deadlock in cifs_sync_mid_result
Su Hui (1):
octeontx2-af: fix the double free in rvu_npc_freemem()
Sudheer Mogilappagari (1):
iavf: Fix TC config comparison with existing adapter TC config
Sweet Tea Dorminy (1):
btrfs: fallback if compressed IO fails for ENOSPC
Takayuki Nagata (1):
cifs: reinstate original behavior again for forceuid/forcegid
Tetsuo Handa (1):
profiling: Remove create_prof_cpu_mask().
Thorsten Leemhuis (6):
docs: verify/bisect: use git switch, tag kernel, and various fixes
docs: verify/bisect: add and fetch stable branches ahead of time
docs: verify/bisect: proper headlines and more spacing
docs: verify/bisect: explain testing reverts, patches and newer code
docs: verify/bisect: describe how to use a build host
docs: verify/bisect: stable regressions: first stable, then mainline
Tianchen Ding (2):
sched/eevdf: Always update V if se->on_rq when reweighting
sched/eevdf: Fix miscalculation in reweight_entity() when se is not curr
Tom Lendacky (1):
x86/sev: Check for MWAITX and MONITORX opcodes in the #VC handler
Uwe Kleine-König (1):
MAINTAINERS: Update Uwe's email address, drop SIOX maintenance
Vanshidhar Konda (1):
ACPI: CPPC: Fix access width used for PCC registers
Vijendar Mukunda (1):
soundwire: amd: fix for wake interrupt handling for clockstop mode
Vikas Gupta (2):
bnxt_en: refactor reset close code
bnxt_en: Fix the PCI-AER routines
Vineet Gupta (2):
ARC: Fix -Wmissing-prototypes warnings
ARC: mm: fix new code about cache aliasing
Vinod Koul (1):
dmaengine: Revert "dmaengine: pl330: issue_pending waits until WFP state"
Vishal Moola (Oracle) (1):
hugetlb: check for anon_vma prior to folio allocation
WangYuli (1):
Bluetooth: btusb: Add Realtek RTL8852BE support ID 0x0bda:0x4853
Wedson Almeida Filho (2):
rust: phy: implement `Send` for `Registration`
rust: kernel: require `Send` for `Module` implementations
Wenkuan Wang (1):
x86/CPU/AMD: Add models 0x10-0x1f to the Zen5 range
William Zhang (1):
mtd: rawnand: brcmnand: Fix data access violation for STB chip
Wolfram Sang (1):
i2c: smbus: fix NULL function pointer dereference
Xuewen Yan (1):
sched/eevdf: Prevent vlag from going out of bounds in reweight_eevdf()
Yaraslau Furman (1):
HID: logitech-dj: allow mice to use all types of reports
Yick Xie (1):
udp: preserve the connected status if only UDP cmsg
Yu Kuai (1):
block: fix module reference leakage from bdev_open_by_dev error path
Zhang Lixu (1):
HID: intel-ish-hid: ipc: Fix dev_err usage with uninitialized dev->devc
Zhu Lingshan (1):
vDPA: code clean for vhost_vdpa uapi
Zijun Hu (1):
Bluetooth: btusb: Fix triggering coredump implementation for QCA
* Re: [syzbot] [bpf?] [trace?] possible deadlock in force_sig_info_to_task
2024-04-28 20:01 91% ` Linus Torvalds
@ 2024-04-28 20:22 96% ` Linus Torvalds
1 sibling, 0 replies; 200+ results
From: Linus Torvalds @ 2024-04-28 20:22 UTC (permalink / raw)
To: Hillf Danton; +Cc: syzbot, andrii, bpf, linux-kernel, syzkaller-bugs
On Sun, 28 Apr 2024 at 13:01, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> The *problem* here is that the page fault doesn't actually happen on a
> user access, it happens on the *ret* instruction in
> rep_movs_alternative itself (which doesn't have an exception fixup,
> obviously, because no exception is supposed to happen there!):
Actually, there's another page fault deeper in that call chain:
asm_exc_page_fault+0x26/0x30 arch/x86/include/asm/idtentry.h:623
RIP: 0010:__put_user_handle_exception+0x0/0x10 arch/x86/lib/putuser.S:125
Code: 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 0f 01 cb 48 89 01 31
c9 0f 01 ca c3 cc cc cc cc 66 2e 0f 1f 84 00 00 00 00 00 66 90 <0f> 01
ca b9 f2 ff ff ff c3 cc cc cc cc 0f 1f 00 90 90 90 90 90 90
RSP: 0000:ffffc90004137d98 EFLAGS: 00050202
RAX: 00000000662d5943 RBX: 0000000000000000 RCX: 0000000000000019
RDX: 0000000000000000 RSI: ffffffff8bcaca20 RDI: ffffffff8c1eaba0
RBP: ffffc90004137e50 R08: ffffffff8fa7cd6f R09: 1ffffffff1f4f9ad
R10: dffffc0000000000 R11: fffffbfff1f4f9ae R12: ffffc90004137de0
R13: dffffc0000000000 R14: 1ffff92000826fb8 R15: 0000000000000019
__do_sys_gettimeofday kernel/time/time.c:147 [inline]
__se_sys_gettimeofday+0xd9/0x240 kernel/time/time.c:140
which is also nonsensical, since that "<0f> 01 ca" code is just the
"CLAC" instruction (which is the first instruction of
__put_user_handle_exception, which is the exception fixup for the
__put_user() functions).
So that seems to be the *first* problem spot, actually. It too is
incomprehensible to me. I must be missing something. A "clac"
instruction cannot take a page fault (except for the instruction fetch
itself, of course).
So if the page fault on the 'RET' instruction was odd, the page fault
on the CLAC is *really* odd.
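For anyone wanting to double-check the decoding, here is a small C sketch that finds the "<xx>" marker in an oops "Code:" line and names the instruction that starts there. faulting_insn() is a made-up helper, and it only knows the handful of encodings that matter in this thread (c3 = ret, cc = int3, 0f 01 ca = clac, 0f 01 cb = stac).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: locate the "<xx>" marker in a "Code:" dump and
 * classify the instruction beginning at that byte. Anything outside
 * the few known encodings is reported as "unknown".
 */
static const char *faulting_insn(const char *code)
{
	unsigned int b0 = 0, b1 = 0, b2 = 0;
	const char *p = strchr(code, '<');
	int n;

	if (!p)
		return "no marker";

	/* Whitespace in the format also matches a wrapped line. */
	n = sscanf(p, "<%2x> %2x %2x", &b0, &b1, &b2);
	if (n >= 1 && b0 == 0xc3)
		return "ret";
	if (n >= 1 && b0 == 0xcc)
		return "int3";
	if (n == 3 && b0 == 0x0f && b1 == 0x01 && b2 == 0xca)
		return "clac";
	if (n == 3 && b0 == 0x0f && b1 == 0x01 && b2 == 0xcb)
		return "stac";
	return "unknown";
}
```

Fed the dump above, this reports "clac". The bytes that follow, "b9 f2 ff ff ff", load 0xfffffff2 into %ecx, which is -14, i.e. -EFAULT, consistent with this being the put_user exception fixup.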
That original page fault looks like it's just from one of the
put_user() calls in gettimeofday():
if (put_user(ts.tv_sec, &tv->tv_sec) ||
put_user(ts.tv_nsec / 1000, &tv->tv_usec))
and yes, they can fault, but I'm not seeing how that then points to
the CLAC in the exception handler.
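For reference, the expected behavior of that pair of calls can be sketched in plain C. This is a toy model, not the kernel implementation: toy_put_user() and toy_gettimeofday() are made-up names, and a NULL pointer stands in for a faulting user address.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Stand-in for put_user(): 0 on success, -EFAULT on a "bad" pointer. */
static int toy_put_user(long val, long *uptr)
{
	if (!uptr)		/* NULL models an address that faults */
		return -EFAULT;
	*uptr = val;
	return 0;
}

/*
 * Same shape as the gettimeofday() snippet: the first failing store
 * short-circuits the "||" and the whole call returns -EFAULT.
 */
static int toy_gettimeofday(long sec, long nsec, long *sec_ptr, long *usec_ptr)
{
	if (toy_put_user(sec, sec_ptr) ||
	    toy_put_user(nsec / 1000, usec_ptr))
		return -EFAULT;
	return 0;
}
```

A fault in either store is supposed to end up as a plain -EFAULT return through the fixup, which is why a page fault reported on the fixup code itself makes no sense.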
Linus
* Re: [syzbot] [bpf?] [trace?] possible deadlock in force_sig_info_to_task
@ 2024-04-28 20:01 91% ` Linus Torvalds
2024-04-28 20:22 96% ` Linus Torvalds
0 siblings, 2 replies; 200+ results
From: Linus Torvalds @ 2024-04-28 20:01 UTC (permalink / raw)
To: Hillf Danton; +Cc: syzbot, andrii, bpf, linux-kernel, syzkaller-bugs
On Sat, 27 Apr 2024 at 16:13, Hillf Danton <hdanton@sina.com> wrote:
>
> > -> #0 (&sighand->siglock){....}-{2:2}:
> > check_prev_add kernel/locking/lockdep.c:3134 [inline]
> > check_prevs_add kernel/locking/lockdep.c:3253 [inline]
> > validate_chain+0x18cb/0x58e0 kernel/locking/lockdep.c:3869
> > __lock_acquire+0x1346/0x1fd0 kernel/locking/lockdep.c:5137
> > lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5754
> > __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
> > _raw_spin_lock_irqsave+0xd5/0x120 kernel/locking/spinlock.c:162
> > force_sig_info_to_task+0x68/0x580 kernel/signal.c:1334
> > force_sig_fault_to_task kernel/signal.c:1733 [inline]
> > force_sig_fault+0x12c/0x1d0 kernel/signal.c:1738
> > __bad_area_nosemaphore+0x127/0x780 arch/x86/mm/fault.c:814
> > handle_page_fault arch/x86/mm/fault.c:1505 [inline]
>
> Given page fault with runqueue locked, bpf makes trouble instead of
> helping anything in this case.
That's not the odd thing here.
Look, the callchain is:
> > exc_page_fault+0x612/0x8e0 arch/x86/mm/fault.c:1563
> > asm_exc_page_fault+0x26/0x30 arch/x86/include/asm/idtentry.h:623
> > rep_movs_alternative+0x22/0x70 arch/x86/lib/copy_user_64.S:48
> > copy_user_generic arch/x86/include/asm/uaccess_64.h:110 [inline]
> > raw_copy_from_user arch/x86/include/asm/uaccess_64.h:125 [inline]
> > __copy_from_user_inatomic include/linux/uaccess.h:87 [inline]
> > copy_from_user_nofault+0xbc/0x150 mm/maccess.c:125
IOW, this is all doing a copy from user with page faults disabled, and
it shouldn't have caused a signal to be sent, so the whole
__bad_area_nosemaphore -> force_sig_fault path is bad.
The *problem* here is that the page fault doesn't actually happen on a
user access, it happens on the *ret* instruction in
rep_movs_alternative itself (which doesn't have an exception fixup,
obviously, because no exception is supposed to happen there!):
RIP: 0010:rep_movs_alternative+0x22/0x70 arch/x86/lib/copy_user_64.S:50
Code: 90 90 90 90 90 90 90 90 f3 0f 1e fa 48 83 f9 40 73 40 83 f9 08
73 21 85 c9 74 0f 8a 06 88 07 48 ff c7 48 ff c6 48 ff c9 75 f1 <c3> cc
cc cc cc 66 0f 1f 84 00 00 0$
RSP: 0000:ffffc90004137468 EFLAGS: 00050002
RAX: ffffffff8205ce4e RBX: dffffc0000000000 RCX: 0000000000000002
RDX: 0000000000000000 RSI: 0000000000000900 RDI: ffffc900041374e8
RBP: ffff88802d039784 R08: 0000000000000005 R09: ffffffff8205ce37
R10: 0000000000000003 R11: ffff88802d038000 R12: 1ffff11005a072f0
R13: 0000000000000900 R14: 0000000000000002 R15: ffffc900041374e8
where decoding that "Code:" line gives this:
0: f3 0f 1e fa endbr64
4: 48 83 f9 40 cmp $0x40,%rcx
8: 73 40 jae 0x4a
a: 83 f9 08 cmp $0x8,%ecx
d: 73 21 jae 0x30
f: 85 c9 test %ecx,%ecx
11: 74 0f je 0x22
13: 8a 06 mov (%rsi),%al
15: 88 07 mov %al,(%rdi)
17: 48 ff c7 inc %rdi
1a: 48 ff c6 inc %rsi
1d: 48 ff c9 dec %rcx
20: 75 f1 jne 0x13
22:* c3 ret <-- trapping instruction
but I have no idea why the 'ret' instruction would take a page fault.
It really shouldn't.
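
For readers who want to reproduce the offset shown in the decode above,
here is a minimal userspace sketch (not kernel code) that finds the byte
an oops "Code:" line marks with <..>. The helper names
trapping_byte_index() and hex_digit() are hypothetical, and the sketch
assumes the bytes are whitespace-separated two-digit hex tokens, as in
the dump quoted here.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Convert one hex digit to its value; the caller has checked isxdigit(). */
static unsigned int hex_digit(char c)
{
	if (c >= '0' && c <= '9')
		return (unsigned int)(c - '0');
	return (unsigned int)(tolower((unsigned char)c) - 'a') + 10;
}

/*
 * Return the zero-based index of the byte marked as <xx> in a "Code:"
 * string and store its value in *byte. Returns -1 when no well-formed
 * marker is present. Hypothetical helper, not a kernel API.
 */
static int trapping_byte_index(const char *code, unsigned int *byte)
{
	const char *p = code;
	int index = 0;

	assert(code != NULL && byte != NULL);

	while (*p) {
		if (isspace((unsigned char)*p)) {
			p++;
			continue;
		}
		if (*p == '<') {
			/* Expect exactly two hex digits followed by '>'. */
			if (isxdigit((unsigned char)p[1]) &&
			    isxdigit((unsigned char)p[2]) && p[3] == '>') {
				*byte = hex_digit(p[1]) * 16 + hex_digit(p[2]);
				return index;
			}
			return -1;
		}
		/* Skip one token and count it as one byte before the marker. */
		while (*p && !isspace((unsigned char)*p))
			p++;
		index++;
	}
	return -1;
}
```

Applied to the dump above, the marked byte follows eight 0x90 padding
bytes and 0x22 bytes of function body, which agrees with the reported
rep_movs_alternative+0x22.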
Now, it's not like 'ret' instructions can't take page faults, but it
sure shouldn't happen in the *kernel*. The reasons for page faults on
'ret' instructions are:
- the instruction itself takes a page fault
- the stack pointer is bogus
- possibly because the stack *contents* are bogus (at least some x86
instructions that jump will check the destination in the jump
instruction itself, although I didn't think 'ret' was one of them)
but for the kernel, none of these actually seem to be the case
normally. And even abnormally I don't see this being an issue, since
the exception backtrace is happily shown (ie the stack looks all
good).
So this dump is just *WEIRD*.
End result: the problem is not about any kind of deadlock on circular
locking. That's just the symptom of that odd page fault that shouldn't
have happened, and that I don't quite see how it happened.
Linus
^ permalink raw reply [relevance 91%]
* Re: [GIT PULL] scheduler fixes
@ 2024-04-28 19:13 99% ` Linus Torvalds
0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-04-28 19:13 UTC (permalink / raw)
To: Ingo Molnar
Cc: Vincent Guittot, linux-kernel, Peter Zijlstra, Thomas Gleixner,
Juri Lelli, Daniel Bristot de Oliveira, Valentin Schneider
On Sun, 28 Apr 2024 at 01:42, Ingo Molnar <mingo@kernel.org> wrote:
>
> Merge note: in case you are wondering about the timestamps, I ninja-rebased
> these two commits shortly before the pull request to fix an annoying typo
> in a commit title.
Hmm. You also forgot to have a diffstat..
Linus
^ permalink raw reply [relevance 99%]
* Re: [PATCH v3] tty: tty_io: remove hung_up_tty_fops
@ 2024-04-28 18:50 99% ` Linus Torvalds
0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-04-28 18:50 UTC (permalink / raw)
To: Tetsuo Handa
Cc: Greg Kroah-Hartman, Dmitry Vyukov, syzbot, linux-kernel,
syzkaller-bugs, Nathan Chancellor, Arnd Bergmann, Al Viro,
Jiri Slaby
On Sun, 28 Apr 2024 at 03:20, Tetsuo Handa
<penguin-kernel@i-love.sakura.ne.jp> wrote:
>
>
> If we keep the current model, WRITE_ONCE() is not sufficient.
>
> My understanding is that KCSAN's report like
I find it obnoxious that these are NOT REAL PROBLEMS.
It's KCSAN that is broken and doesn't allow us to just tell it to
sanely ignore things.
I don't want to add stupid and pointless annotations for broken tooling.
Can you instead just ask the KCSAN people to have some mode where we
can annotate a pointer as a "use one or the other", and just shut that
thing up that way?
Because no, we're not adding some idiotic "f_op()" wrapper just to
shut KCSAN up about a non-issue.
Linus
^ permalink raw reply [relevance 99%]
* Re: [PATCH v3] tty: tty_io: remove hung_up_tty_fops
@ 2024-04-27 19:02 96% ` Linus Torvalds
0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-04-27 19:02 UTC (permalink / raw)
To: Tetsuo Handa
Cc: Greg Kroah-Hartman, Dmitry Vyukov, syzbot, linux-kernel,
syzkaller-bugs, Nathan Chancellor, Arnd Bergmann, Al Viro,
Jiri Slaby
On Fri, 26 Apr 2024 at 23:21, Tetsuo Handa
<penguin-kernel@i-love.sakura.ne.jp> wrote:
>
> syzbot is reporting data race between __tty_hangup() and __fput(), for
> filp->f_op readers are not holding tty->files_lock.
Hmm. I looked round, and we actually have another case of this:
snd_card_disconnect() also does
mfile->file->f_op = &snd_shutdown_f_ops;
and I don't think tty->files_lock (or, in the sound case,
&card->files_lock) is at all relevant, since the users of f_ops don't
use it or care.
That said, I really think we'd be better off just keeping the current
model, and have the "you get one or the other". For the two cases that
do this, do that f_op replacement with a WRITE_ONCE(), and just make
the rule be that you have to have all the same ops in both the
original and the shutdown version.
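
The "you get one or the other" model can be illustrated with a small
userspace sketch. Everything here is an assumption made for
illustration: the READ_ONCE()/WRITE_ONCE() macros are simplified
stand-ins for the kernel ones (pointer and scalar types only, relying on
the GCC/Clang __typeof__ extension), and struct ops, struct file_like,
hang_up() and same_ops_present() are hypothetical names, not the tty or
sound code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel macros; not the real definitions. */
#define WRITE_ONCE(x, val) (*(volatile __typeof__(x) *)&(x) = (val))
#define READ_ONCE(x)       (*(const volatile __typeof__(x) *)&(x))

struct ops {
	int (*read)(void);
	int (*write)(void);
};

static int live_read(void)  { return 1; }
static int live_write(void) { return 2; }
static int dead_read(void)  { return -5; }	/* stand-in for -EIO */
static int dead_write(void) { return -5; }

static const struct ops live_ops     = { .read = live_read, .write = live_write };
static const struct ops shutdown_ops = { .read = dead_read, .write = dead_write };

struct file_like {
	const struct ops *f_op;
};

/* The rule from the text: every op in the original also exists in the shutdown set. */
static bool same_ops_present(const struct ops *orig, const struct ops *shut)
{
	assert(orig != NULL && shut != NULL);
	return (!orig->read || shut->read) && (!orig->write || shut->write);
}

/* Writer side: publish the replacement table with a single store. */
static void hang_up(struct file_like *f)
{
	WRITE_ONCE(f->f_op, &shutdown_ops);
}

/* Reader side: load the pointer once, then only use that snapshot. */
static int do_read(struct file_like *f)
{
	const struct ops *ops = READ_ONCE(f->f_op);

	return ops->read();
}
```

A reader that races with hang_up() sees either the old table or the new
one, and both are safe to call as long as the two tables provide the
same set of operations.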
I do *not* think it's at all better to replace (in two different
places) the racy f_op thing with another racy 'hungup' flag.
The sound case is actually a bit more involved, since it tries to deal
with module counts. That looks potentially bogus. It does
fops_get(mfile->file->f_op);
after it has installed the snd_shutdown_f_ops, but in snd_open() it
has done the proper
replace_fops(file, new_fops);
which actually drops the module count for the old one. So the sound
case seems to possibly leak a module ref on disconnect. That's a
separate issue, though.
Linus
^ permalink raw reply [relevance 96%]
* Re: [GIT PULL] ACPI fixes for v6.9-rc6
2024-04-25 18:58 95% ` Linus Torvalds
2024-04-25 19:01 99% ` Linus Torvalds
@ 2024-04-25 19:18 96% ` Linus Torvalds
1 sibling, 0 replies; 200+ results
From: Linus Torvalds @ 2024-04-25 19:18 UTC (permalink / raw)
To: Rafael J. Wysocki, Jarred White
Cc: ACPI Devel Maling List, Linux PM, Linux Kernel Mailing List
On Thu, 25 Apr 2024 at 11:58, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> And maybe this time, it's not a buggy mess?
Actually, even with MASK_VAL() fixed, I think it's *STILL* a buggy mess.
Why? Because the *uses* of MASK_VAL() seem entirely bogus.
In particular, we have this in cpc_write():
if (reg->space_id == ACPI_ADR_SPACE_SYSTEM_MEMORY)
val = MASK_VAL(reg, val);
switch (size) {
case 8:
writeb_relaxed(val, vaddr);
break;
case 16:
writew_relaxed(val, vaddr);
break;
...
and I strongly suspect that it needs to update the 'vaddr' too. Something like
if (reg->space_id == ACPI_ADR_SPACE_SYSTEM_MEMORY) {
val = MASK_VAL(reg, val);
#ifdef __LITTLE_ENDIAN
vaddr += reg->bit_offset >> 3;
if (reg->bit_offset & 7)
return -EFAULT;
#else
/* Fixme if we ever care */
if (reg->bit_offset)
return -EFAULT;
#endif
}
*might* be changing this in the right direction, but it's unclear and
I neither know the CPC rules, nor did I think _that_ much about it.
Anyway, the take-away should be that all this code is entirely broken
and somebody didn't think enough about it.
It's possible that that whole cpc_write() ACPI_ADR_SPACE_SYSTEM_MEMORY
case should be done as a 64-bit "read-mask-write" sequence.
Possibly with "reg->bit_offset == 0" and the 8/16/32/64-bit cases as a
special case for "just do the write".
Or, maybe writes with a non-zero bit offset shouldn't be allowed at
all, and there are CPC rules that aren't checked. I don't know. I only
know that the current code is seriously broken.
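
As an illustration only, a 64-bit "read-mask-write" sequence could look
like the sketch below. It operates on ordinary memory rather than MMIO,
the function name rmw_field() is hypothetical, and the assumption that
the low bit_width bits of val are what gets written is exactly the kind
of CPC rule the text says is unverified.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Replace the field [bit_offset, bit_offset + bit_width) of *word with
 * the low bit_width bits of val, leaving all other bits untouched.
 * Returns 0 on success, -1 if the field does not fit in 64 bits.
 */
static int rmw_field(uint64_t *word, unsigned int bit_offset,
		     unsigned int bit_width, uint64_t val)
{
	uint64_t mask;

	assert(word != NULL);

	if (bit_width == 0 || bit_width > 64 || bit_offset > 64 - bit_width)
		return -1;

	/* A shift by 64 is undefined in C, so the full-width case is special. */
	mask = (bit_width == 64) ? ~(uint64_t)0
				 : (((uint64_t)1 << bit_width) - 1);

	/* Read, clear the old field, merge in the new value, write back. */
	*word = (*word & ~(mask << bit_offset)) | ((val & mask) << bit_offset);
	return 0;
}
```

With bit_offset == 0 and a width of 8, 16, 32 or 64 this degenerates to
a plain store of that size, which matches the "just do the write"
special case mentioned above.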
Linus
^ permalink raw reply [relevance 96%]
* Re: [GIT PULL] ACPI fixes for v6.9-rc6
2024-04-25 18:58 95% ` Linus Torvalds
@ 2024-04-25 19:01 99% ` Linus Torvalds
2024-04-25 19:18 96% ` Linus Torvalds
1 sibling, 0 replies; 200+ results
From: Linus Torvalds @ 2024-04-25 19:01 UTC (permalink / raw)
To: Rafael J. Wysocki
Cc: ACPI Devel Maling List, Linux PM, Linux Kernel Mailing List
On Thu, 25 Apr 2024 at 11:58, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> When that macro now has had TWO independent bugs, how about you just
> write it out with explicit types and without any broken "helpers":
>
> static inline u64 MASK_VAL(const struct cpc_reg *reg, u64 val)
> {
> u64 mask = (1ull << reg->bit_width)-1;
> return (val >> reg->bit_offset) & mask;
> }
>
> which is a few more lines, but doesn't that make it a whole lot more readable?
>
> And maybe this time, it's not a buggy mess?
Just to clarify: that was written in the MUA, and entirely untested.
Somebody should still verify it, but really, with already now two
bugs, that macro needs fixing for good, and the "for good" should be
looking at least _something_ like the above.
And despite needing fixing, I've done the pull, since bug #2 is at
least less bad than bug#1 was.
Linus
^ permalink raw reply [relevance 99%]
* Re: [GIT PULL] ACPI fixes for v6.9-rc6
@ 2024-04-25 18:58 95% ` Linus Torvalds
2024-04-25 19:01 99% ` Linus Torvalds
2024-04-25 19:18 96% ` Linus Torvalds
0 siblings, 2 replies; 200+ results
From: Linus Torvalds @ 2024-04-25 18:58 UTC (permalink / raw)
To: Rafael J. Wysocki
Cc: ACPI Devel Maling List, Linux PM, Linux Kernel Mailing List
On Thu, 25 Apr 2024 at 10:46, Rafael J. Wysocki <rafael@kernel.org> wrote:
>
> - Fix bit offset computation in MASK_VAL() macro used for applying
> a bitmask to a new CPPC register value (Jarred White).
Honestly, that code should never have used GENMASK() in the first place.
When a helper macro is more complicated than just doing the obvious
thing without it, it's not a helper macro any more.
Doing
GENMASK(((reg)->bit_width) - 1, 0)
is literally more work than just doing the obvious thing
((1ul << (reg)->bit_width) - 1)
and using that "helper" macro was actually more error-prone too as
shown by this example, because of the whole "inclusive or not" issue.
BUT!
Even with that simpler model, that's still entirely buggy, since 'val'
is 64-bit, and these GENMASK tricks only work on 'long'.
Which happens to be ok on x86-64, of course, and maybe in practice all
fields are less than 32 bits in width anyway so maybe it even works on
32-bit, but this all smells HORRIBLY WRONG.
And no, the fix is *NOT* to make that GENMASK() mindlessly just be
GENMASK_ULL(). That fixes the immediate bug, but it shows - once again
- how mindlessly using "helper macros" is not the right thing to do.
When that macro now has had TWO independent bugs, how about you just
write it out with explicit types and without any broken "helpers":
static inline u64 MASK_VAL(const struct cpc_reg *reg, u64 val)
{
u64 mask = (1ull << reg->bit_width)-1;
return (val >> reg->bit_offset) & mask;
}
which is a few more lines, but doesn't that make it a whole lot more readable?
And maybe this time, it's not a buggy mess?
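
A userspace rendering of the helper above, as a sketch for trying it
out: struct cpc_reg_like is a hypothetical stand-in that carries only
the two fields used here, and the explicit check for bit_width == 64 is
an addition not present in the version above (a shift by the full width
of the type is undefined in C).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct cpc_reg; only the fields used here. */
struct cpc_reg_like {
	uint8_t bit_width;
	uint8_t bit_offset;
};

/* Extract the bit_width-wide field at bit_offset, all in 64-bit arithmetic. */
static uint64_t mask_val(const struct cpc_reg_like *reg, uint64_t val)
{
	uint64_t mask;

	assert(reg != NULL);
	assert(reg->bit_width >= 1 && reg->bit_width <= 64);
	assert(reg->bit_offset < 64);

	/* Avoid the undefined 1 << 64 when the field covers the whole word. */
	mask = (reg->bit_width == 64) ? ~(uint64_t)0
				      : (((uint64_t)1 << reg->bit_width) - 1);

	return (val >> reg->bit_offset) & mask;
}
```

A 40-bit field is the interesting case: it cannot be represented by a
mask built from a 32-bit 'long', but works here because every
intermediate value is a uint64_t.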
Linus
^ permalink raw reply [relevance 95%]
* Re: regression fixes sitting in subsystem git trees for a week or longer (was: Re: [PATCH v2] HID: i2c-hid: Revert to await reset ACK before reading report descriptor)
@ 2024-04-24 18:53 99% ` Linus Torvalds
0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-04-24 18:53 UTC (permalink / raw)
To: Thorsten Leemhuis
Cc: Jiri Kosina, Douglas Anderson, Hans de Goede, linux-input,
linux-kernel, Kenny Levinsen, Benjamin Tissoires,
Linux regressions mailing list
On Wed, 24 Apr 2024 at 09:56, Thorsten Leemhuis
<regressions@leemhuis.info> wrote:
>
> out of interest: what's your stance on regression fixes sitting in
> subsystem git trees for a week or longer before being mainlined?
Annoying, but probably depends on circumstances. The fact that it took
a while to even be noticed presumably means it's not common or holding
anything up.
That said, the last HID pull I have is from March 14. If the issue is
just that there's nothing else happening, I think people should just
point me to the patch and say "can you apply this single fix?"
Linus
^ permalink raw reply [relevance 99%]
* Re: [PATCH v2] tty: n_gsm: restrict tty devices to attach
@ 2024-04-23 16:37 99% ` Linus Torvalds
0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-04-23 16:37 UTC (permalink / raw)
To: Tetsuo Handa
Cc: Greg Kroah-Hartman, Jiri Slaby, Andrew Morton, Starke, Daniel,
LKML, linux-security-module
On Tue, 23 Apr 2024 at 08:26, Tetsuo Handa
<penguin-kernel@i-love.sakura.ne.jp> wrote:
>
> On 2024/04/22 1:04, Linus Torvalds wrote:
> >
> > Actually, another option would be to just return an error at 'set_ldisc()' time.
>
> This patch works for me. You can propose a formal patch.
Ok, I wrote a commit message, added your tested-by, and sent it out
https://lore.kernel.org/all/20240423163339.59780-1-torvalds@linux-foundation.org/
let's see if anybody has better ideas, but that patch at least looks
palatable to me.
Linus
^ permalink raw reply [relevance 99%]
* [PATCH] tty: add the option to have a tty reject a new ldisc
@ 2024-04-23 16:33 89% Linus Torvalds
0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-04-23 16:33 UTC (permalink / raw)
To: Greg Kroah-Hartman
Cc: linux-kernel, Linus Torvalds, Tetsuo Handa, Jiri Slaby,
Andrew Morton, Daniel Starke, syzbot
... and use it to limit the virtual terminals to just N_TTY. They are
kind of special, and in particular, the "con_write()" routine violates
the "writes cannot sleep" rule that some ldiscs rely on.
This avoids the
BUG: sleeping function called from invalid context at kernel/printk/printk.c:2659
when N_GSM has been attached to a virtual console, and gsmld_write()
calls con_write() while holding a spinlock, and con_write() then tries
to get the console lock.
Tested-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Cc: Jiri Slaby <jirislaby@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Daniel Starke <daniel.starke@siemens.com>
Reported-by: syzbot <syzbot+dbac96d8e73b61aa559c@syzkaller.appspotmail.com>
Closes: https://syzkaller.appspot.com/bug?extid=dbac96d8e73b61aa559c
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
drivers/tty/tty_ldisc.c | 6 ++++++
drivers/tty/vt/vt.c | 10 ++++++++++
include/linux/tty_driver.h | 8 ++++++++
3 files changed, 24 insertions(+)
diff --git a/drivers/tty/tty_ldisc.c b/drivers/tty/tty_ldisc.c
index 3f68e213df1f..d80e9d4c974b 100644
--- a/drivers/tty/tty_ldisc.c
+++ b/drivers/tty/tty_ldisc.c
@@ -545,6 +545,12 @@ int tty_set_ldisc(struct tty_struct *tty, int disc)
goto out;
}
+ if (tty->ops->ldisc_ok) {
+ retval = tty->ops->ldisc_ok(tty, disc);
+ if (retval)
+ goto out;
+ }
+
old_ldisc = tty->ldisc;
/* Shutdown the old discipline. */
diff --git a/drivers/tty/vt/vt.c b/drivers/tty/vt/vt.c
index 9b5b98dfc8b4..cd87e3d1291e 100644
--- a/drivers/tty/vt/vt.c
+++ b/drivers/tty/vt/vt.c
@@ -3576,6 +3576,15 @@ static void con_cleanup(struct tty_struct *tty)
tty_port_put(&vc->port);
}
+/*
+ * We can't deal with anything but the N_TTY ldisc,
+ * because we can sleep in our write() routine.
+ */
+static int con_ldisc_ok(struct tty_struct *tty, int ldisc)
+{
+ return ldisc == N_TTY ? 0 : -EINVAL;
+}
+
static int default_color = 7; /* white */
static int default_italic_color = 2; // green (ASCII)
static int default_underline_color = 3; // cyan (ASCII)
@@ -3695,6 +3704,7 @@ static const struct tty_operations con_ops = {
.resize = vt_resize,
.shutdown = con_shutdown,
.cleanup = con_cleanup,
+ .ldisc_ok = con_ldisc_ok,
};
static struct cdev vc0_cdev;
diff --git a/include/linux/tty_driver.h b/include/linux/tty_driver.h
index 7372124fbf90..dd4b31ce6d5d 100644
--- a/include/linux/tty_driver.h
+++ b/include/linux/tty_driver.h
@@ -154,6 +154,13 @@ struct serial_struct;
*
* Optional. Called under the @tty->termios_rwsem. May sleep.
*
+ * @ldisc_ok: ``int ()(struct tty_struct *tty, int ldisc)``
+ *
+ * This routine allows the @tty driver to decide if it can deal
+ * with a particular @ldisc.
+ *
+ * Optional. Called under the @tty->ldisc_sem and @tty->termios_rwsem.
+ *
* @set_ldisc: ``void ()(struct tty_struct *tty)``
*
* This routine allows the @tty driver to be notified when the device's
@@ -372,6 +379,7 @@ struct tty_operations {
void (*hangup)(struct tty_struct *tty);
int (*break_ctl)(struct tty_struct *tty, int state);
void (*flush_buffer)(struct tty_struct *tty);
+ int (*ldisc_ok)(struct tty_struct *tty, int ldisc);
void (*set_ldisc)(struct tty_struct *tty);
void (*wait_until_sent)(struct tty_struct *tty, int timeout);
void (*send_xchar)(struct tty_struct *tty, u8 ch);
--
2.44.0.330.g4d18c88175
^ permalink raw reply related [relevance 89%]
* Re: [PATCH 00/30] PREEMPT_AUTO: support lazy rescheduling
@ 2024-04-23 16:13 97% ` Linus Torvalds
0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-04-23 16:13 UTC (permalink / raw)
To: Shrikanth Hegde
Cc: Ankur Arora, Thomas Gleixner, peterz, paulmck, akpm, luto, bp,
dave.hansen, hpa, mingo, juri.lelli, vincent.guittot, willy,
mgorman, jpoimboe, mark.rutland, jgross, andrew.cooper3, bristot,
mathieu.desnoyers, geert, glaubitz, anton.ivanov, mattst88,
krypton, rostedt, David.Laight, richard, mjguzik, jon.grimm,
bharata, raghavendra.kt, boris.ostrovsky, konrad.wilk, LKML,
Michael Ellerman, Nicholas Piggin
On Tue, 23 Apr 2024 at 08:23, Shrikanth Hegde <sshegde@linux.ibm.com> wrote:
>
> Tried this patch on PowerPC by defining LAZY similar to x86. The change is below.
> Kept it at PREEMPT=none for PREEMPT_AUTO.
>
> Running into soft lockup on large systems (40Cores, SMT8) and seeing close to 100%
> regression on small system ( 12 Cores, SMT8). More details are after the patch.
>
> Are these the only arch bits that need to be defined? am I missing something very
> basic here? will try to debug this further. Any inputs?
I don't think powerpc uses the generic *_exit_to_user_mode() helper
functions, so you'll need to also add that logic to the low-level
powerpc code.
IOW, on x86, with this patch series, patch 06/30 did this:
- if (ti_work & _TIF_NEED_RESCHED)
+ if (ti_work & (_TIF_NEED_RESCHED | _TIF_NEED_RESCHED_LAZY))
schedule();
in kernel/entry/common.c exit_to_user_mode_loop().
But that works on x86 because it uses the irqentry_exit_to_user_mode().
On PowerPC, I think you need to at least fix up
interrupt_exit_user_prepare_main()
similarly (and any other paths like that - I used to know the powerpc
code, but that was long long LOOONG ago).
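
To see why an exit path that only tests the old flag would misbehave,
consider this userspace model. The flag values, the mask names and
handle_exit_work() are all hypothetical and chosen for illustration;
the real bit assignments are architecture-specific and the real loop
lives in the entry code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical bit values; the real ones are architecture-specific. */
#define NEED_RESCHED		(1UL << 3)
#define NEED_RESCHED_LAZY	(1UL << 4)

#define OLD_RESCHED_MASK	(NEED_RESCHED)
#define NEW_RESCHED_MASK	(NEED_RESCHED | NEED_RESCHED_LAZY)

/*
 * Model one pass of an exit-to-user work check: if any bit in
 * resched_mask is pending, "schedule" (here: clear those bits) and
 * report that a reschedule happened.
 */
static bool handle_exit_work(unsigned long *ti_work, unsigned long resched_mask)
{
	assert(ti_work != NULL);

	if (*ti_work & resched_mask) {
		*ti_work &= ~resched_mask;
		return true;
	}
	return false;
}
```

With only the lazy bit pending, the old mask never triggers a
reschedule and the bit stays set, while the new mask handles it. That
is the change patch 06/30 makes for the generic path, and the kind of
change the text suggests the powerpc exit path would also need.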
Linus
^ permalink raw reply [relevance 97%]
* Linux 6.9-rc5
@ 2024-04-21 19:53 47% Linus Torvalds
0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-04-21 19:53 UTC (permalink / raw)
To: Linux Kernel Mailing List
Another week, another -rc. Things look fairly normal, although the
diffstat for rc5 looks a bit wonky due to another rash of bcachefs
fixes, and a perf tools header sync with the main kernel headers.
But if you ignore those oddities, it all looks pretty normal and
things appear fairly calm. Which is just as well, since the first part
of the week I was on a quick trip to Seattle, and the second part of
the week I've been doing a passable imitation of the Fontana di Trevi,
except my medium is mucus. Sooo much mucus.
Anyway, moving on..
Apart from the already mentioned bcachefs and header updates, it's
mostly various drivers (gpu, networking, usb, tty, sound..) some
architecture updates (mainly x86 kvm), some small MM patches, some
core networking, a couple of small filesystem updates (fuse, 9p, nfsd)
and just random singleton patches elsewhere.
Shortlog appended for anybody who wants to get a feel for the details,
Linus
---
Ai Chao (1):
ALSA: hda/realtek - Enable audio jacks of Haier Boyue G42 with ALC269VC
Alan Stern (1):
fs: sysfs: Fix reference leak in sysfs_break_active_protection()
Alex Deucher (3):
Revert "drm/amd/display: fix USB-C flag update after enc10 feature init"
drm/radeon: make -fstrict-flex-arrays=3 happy
drm/radeon: silence UBSAN warning (v3)
Alexander Usyskin (1):
mei: me: disable RPL-S on SPS and IGN firmwares
Amir Goldstein (2):
fuse: fix wrong ff->iomode state changes from parallel dio write
fuse: fix parallel dio write on file open in passthrough mode
Andrew Jones (1):
KVM: selftests: fix supported_flags for riscv
Andy Shevchenko (5):
gpio: wcove: Use -ENOTSUPP consistently
gpio: crystalcove: Use -ENOTSUPP consistently
serial: 8250_pci: Remove redundant PCI IDs
serial: core: Clearing the circular buffer before NULLifying it
gpiolib: swnode: Remove wrong header inclusion
AngeloGioacchino Del Regno (2):
usb: typec: mux: it5205: Fix ChipID value typo
dt-bindings: pwm: mediatek,pwm-disp: Document power-domains property
Anshuman Khandual (1):
arm64/hugetlb: Fix page table walk in huge_pte_alloc()
Ard Biesheuvel (2):
arm64/head: Drop unnecessary pre-disable-MMU workaround
arm64/head: Disable MMU at EL2 before clearing HCR_EL2.E2H
Arınç ÜNAL (2):
net: dsa: mt7530: fix mirroring frames received on local port
net: dsa: mt7530: fix port mirroring for MT7988 SoC switch
Asbjørn Sloth Tønnesen (2):
net: sparx5: flower: fix fragment flags handling
octeontx2-pf: fix FLOW_DIS_IS_FRAGMENT implementation
Bart Van Assche (1):
scsi: core: Fix handling of SCMD_FAIL_IF_RECOVERING
Borislav Petkov (AMD) (1):
x86/retpolines: Enable the default thunk warning only on relevant configs
Carlos Llamas (1):
binder: check offset alignment in binder_get_object()
Carolina Jubran (2):
net/mlx5e: Acquire RTNL lock before RQs/SQs activation/deactivation
net/mlx5e: Prevent deadlock while disabling aRFS
Chao Yu (1):
bcachefs: fix error path of __bch2_read_super()
Christian A. Ehrhardt (1):
usb: typec: ucsi: Fix connector check on init
Christian König (3):
drm/ttm: stop pooling cached NUMA pages v2
drm/amdgpu: remove invalid resource->start check v2
drm/amdgpu: fix visible VRAM handling during faults
Christoph Hellwig (1):
block: propagate partition scanning errors to the BLKRRPART ioctl
Christophe JAILLET (1):
KVM: SVM: Remove a useless zeroing of allocated memory
Chuanhong Guo (1):
USB: serial: option: add support for Fibocom FM650/FG650
Coia Prant (1):
USB: serial: option: add Lonsung U8300/U9300 product
Dan Carpenter (1):
serial: 8250_lpc18xx: disable clks on error in probe()
Daniel Golle (1):
clk: mediatek: mt7988-infracfg: fix clocks for 2nd PCIe port
Daniele Palmas (1):
USB: serial: option: add Telit FN920C04 rmnet compositions
Danny Lin (1):
fuse: fix leaked ENOSYS error on first statx call
Dave Airlie (1):
nouveau: fix instmem race condition around ptr stores
David Hildenbrand (1):
mm/madvise: make MADV_POPULATE_(READ|WRITE) handle VM_FAULT_RETRY properly
David Matlack (4):
KVM: x86/mmu: Write-protect L2 SPTEs in TDP MMU when clearing dirty status
KVM: x86/mmu: Remove function comments above
clear_dirty_{gfn_range,pt_masked}()
KVM: x86/mmu: Fix and clarify comments about clearing D-bit vs.
write-protecting
KVM: selftests: Add coverage of EPT-disabled to vmx_dirty_log_test
Dmitry Baryshkov (2):
drm/panel: visionox-rm69299: don't unregister DSI device
drm/panel: novatek-nt36682e: don't unregister DSI device
Dmitry Safonov (4):
selftests/tcp_ao: Make RST tests less flaky
selftests/tcp_ao: Zero-init tcp_ao_info_opt
selftests/tcp_ao: Fix fscanf() call for format-security
selftests/tcp_ao: Printing fixes to confirm with format-security
Emil Kronborg (1):
serial: mxs-auart: add spinlock around changing cts state
Eric Biggers (1):
x86/cpufeatures: Fix dependencies for GFNI, VAES, and VPCLMULQDQ
Eric Dumazet (1):
net/sched: Fix mirred deadlock on device recursion
Eric Van Hensbergen (2):
fs/9p: remove erroneous nlink init from legacy stat2inode
fs/9p: Revert "fs/9p: fix dups even in uncached mode"
Fabio Estevam (1):
usb: misc: onboard_usb_hub: Disable the USB hub clock on failure
Felix Fietkau (1):
net: ethernet: mtk_eth_soc: fix WED + wifi reset
Felix Kuehling (1):
drm/amdkfd: Fix memory leak in create_process failure
Finn Thain (1):
serial/pmac_zilog: Remove flawed mitigation for rx irq flood
Florian Westphal (1):
netfilter: nft_set_pipapo: do not free live element
Gerd Bayer (1):
s390/ism: Properly fix receive message buffer allocation
Gil Fine (2):
thunderbolt: Fix wake configurations after device unplug
thunderbolt: Avoid notify PM core about runtime PM resume
Greg Kroah-Hartman (1):
Revert "usb: cdc-wdm: close race between read and workqueue"
Hans de Goede (1):
serial: 8250_dw: Revert: Do not reclock if already at correct rate
Hou Wenlong (1):
x86/fred: Fix incorrect error code printout in fred_bad_type()
Huayu Zhang (1):
ALSA: hda/realtek: Fix volumn control of ThinkBook 16P Gen4
Jakub Kicinski (2):
inet: bring NLM_DONE out to a separate recv() again
selftests: kselftest_harness: fix Clang warning about zero-length format
James Bottomley (1):
MAINTAINERS: update to working email address
Jason A. Donenfeld (2):
random: handle creditable entropy from atomic process context
Revert "vmgenid: emit uevent when VMGENID updates"
Jason Gunthorpe (1):
iommufd: Add missing IOMMUFD_DRIVER kconfig for the selftest
Jeff Layton (1):
9p: explicitly deny setlease attempts
Jeongjun Park (1):
nilfs2: fix OOB in nilfs_set_de_type
Jerry Meng (1):
USB: serial: option: support Quectel EM060K sub-models
Joakim Sindholt (4):
fs/9p: only translate RWX permissions for plain 9P2000
fs/9p: translate O_TRUNC into OTRUNC
fs/9p: fix the cache always being enabled on files with qid flags
fs/9p: drop inodes immediately on non-.L too
Jose Ignacio Tornos Martinez (1):
net: usb: ax88179_178a: avoid writing the mac address before first reading
Josh Poimboeuf (1):
x86/bugs: Fix BHI retpoline check
Kai-Heng Feng (1):
usb: Disable USB3 LPM at shutdown
Kees Cook (1):
ubsan: Add awareness of signed integer overflow traps
Kent Overstreet (23):
bcachefs: Don't use bch2_btree_node_lock_write_nofail() in btree
split path
bcachefs: Fix UAFs of btree_insert_entry array
bcachefs: Check for packed bkeys that are too big
bcachefs: btree node scan: handle encrypted nodes
bcachefs: fix unsafety in bch2_extent_ptr_to_text()
bcachefs: fix unsafety in bch2_stripe_to_text()
bcachefs: fix race in bch2_btree_node_evict()
bcachefs: don't queue btree nodes for rewrites during scan
bcachefs: Standardize helpers for printing enum strs with bounds checks
bcachefs: Go rw if running any explicit recovery passes
bcachefs: Fix deadlock in journal replay
bcachefs: Fix missing write refs in fs fio paths
bcachefs: Run merges at BCH_WATERMARK_btree
bcachefs: Disable merges from interior update path
bcachefs: Fix btree node merging on write buffer btrees
bcachefs: add missing bounds check in __bch2_bkey_val_invalid()
bcachefs: Interior known are required to have known key types
bcachefs: add safety checks in bch2_btree_node_fill()
bcachefs: Fix bch2_btree_node_fill() for !path
bcachefs: sysfs internal/trigger_journal_flush
bcachefs: bch_member.btree_allocated_bitmap
bcachefs: Check for backpointer bucket_offset >= bucket size
bcachefs: set_btree_iter_dontneed also clears should_be_locked
Konrad Dybcio (1):
interconnect: qcom: x1e80100: Remove inexistent ACV_PERF BCM
Krzysztof Kozlowski (2):
usb: phy: MAINTAINERS: mark Freescale USB PHY as orphaned
gpio: lpc32xx: fix module autoloading
Kuniyuki Iwashima (2):
af_unix: Call manage_oob() for every skb in unix_stream_read_generic().
af_unix: Don't peek OOB data without MSG_OOB.
Kyle Tso (1):
usb: typec: tcpm: Correct the PDO counting in pd_set
Lei Chen (1):
tun: limit printing rate when illegal packet received by tun dev
Li Nan (1):
blk-iocost: do not WARN if iocg was already offlined
Linus Torvalds (1):
Linux 6.9-rc5
Lokesh Gidra (1):
userfaultfd: change src_folio after ensuring it's unpinned in UFFDIO_MOVE
Lyude Paul (2):
drm/nouveau/kms/nv50-: Disable AUX bus for disconnected DP ports
drm/nouveau/dp: Don't probe eDP ports twice harder
Maarten Lankhorst (1):
drm/xe: Fix bo leak in intel_fb_bo_framebuffer_init
Manivannan Sadhasivam (1):
scsi: ufs: qcom: Add missing interconnect bandwidth values for Gear 5
Marcin Szycik (1):
ice: Fix checking for unsupported keys on non-tunnel device
Mario Limonciello (4):
platform/x86/amd: pmf: Decrease error message to debug
platform/x86/amd: pmf: Add infrastructure for quirking supported funcs
platform/x86/amd: pmf: Add quirk for ROG Zephyrus G14
platform/x86/amd/pmc: Extend Framework 13 quirk to more BIOSes
Mark Zhang (1):
RDMA/cm: Print the old state when cm_destroy_id gets timeout
Masami Hiramatsu (Google) (1):
bootconfig: Fix the kerneldoc of _xbc_exit()
Mathias Nyman (1):
xhci: Fix root hub port null pointer dereference in xhci tracepoints
Mathieu Desnoyers (1):
sched: Add missing memory barrier in switch_mm_cid
Matthew Auld (1):
drm/xe/vm: prevent UAF with asid based lookup
Mauro Carvalho Chehab (1):
ALSA: hda/realtek: Add quirks for Huawei Matebook D14 NBLB-WAX9N
Maxim Levitsky (1):
KVM: selftests: fix max_guest_memory_test with more that 256 vCPUs
Maíra Canal (1):
drm/v3d: Don't increment `enabled_ns` twice
Miaohe Lin (2):
mm/memory-failure: fix deadlock when hugetlb_optimize_vmemmap is enabled
fork: defer linking file vma until vma is fully initialized
Michael Ellerman (2):
powerpc/crypto/chacha-p10: Fix failure on non Power10
Documentation: embargoed-hardware-issues.rst: Add myself for Power
Michael Guralnik (1):
RDMA/mlx5: Fix port number for counter query in multi-port configuration
Michal Swiatkowski (2):
ice: tc: check src_vsi in case of traffic from VF
ice: tc: allow zero flags in parsing tc flower
Mika Westerberg (1):
thunderbolt: Do not create DisplayPort tunnels on adapters of
the same router
Mike Tipton (1):
interconnect: Don't access req_list while it's being manipulated
Mikhail Kobuk (1):
drm: nv04: Fix out of bounds access
Minas Harutyunyan (1):
usb: dwc2: host: Fix dereference issue in DDMA completion flow.
Muhammad Usama Anjum (1):
iommufd: Add config needed for iommufd_fail_nth
Namhyung Kim (11):
perf annotate: Make sure to call symbol__annotate2() in TUI
perf lock contention: Add a missing NULL check
tools/include: Sync uapi/drm/i915_drm.h with the kernel sources
tools/include: Sync uapi/linux/fs.h with the kernel sources
tools/include: Sync uapi/linux/kvm.h and asm/kvm.h with the kernel sources
tools/include: Sync uapi/sound/asound.h with the kernel sources
tools/include: Sync x86 CPU feature headers with the kernel sources
tools/include: Sync x86 asm/irq_vectors.h with the kernel sources
tools/include: Sync x86 asm/msr-index.h with the kernel sources
tools/include: Sync asm-generic/bitops/fls.h with the kernel sources
tools/include: Sync arm64 asm/cputype.h with the kernel sources
Naohiro Aota (2):
btrfs: zoned: do not flag ZEROOUT on non-dirty extent buffer
btrfs: zoned: add ASSERT and WARN for EXTENT_BUFFER_ZONED_ZEROOUT handling
Naoya Horiguchi (1):
MAINTAINERS: update Naoya Horiguchi's email address
Nathan Chancellor (2):
configs/hardening: Fix disabling UBSAN configurations
configs/hardening: Disable CONFIG_UBSAN_SIGNED_WRAP
Nathan Lynch (1):
selftests/powerpc/papr-vpd: Fix missing variable initialization
Nikita Zhandarovich (1):
comedi: vmk80xx: fix incomplete endpoint checking
Norihiko Hama (1):
usb: gadget: f_ncm: Fix UAF ncm object at re-bind after usb ep
transport error
Oliver Neukum (1):
usb: xhci: correct return value in case of STS_HCE
Oscar Salvador (6):
mm,page_owner: update metadata for tail pages
mm,page_owner: fix refcount imbalance
mm,page_owner: fix accounting of pages when migrating
mm,page_owner: fix printing of stack records
mm,swapops: update check in is_pfn_swap_entry for hwpoison entries
mm,page_owner: defer enablement of static branch
Pablo Neira Ayuso (7):
netfilter: br_netfilter: skip conntrack input hook for promisc packets
netfilter: nft_set_pipapo: walk over current view on netlink dump
netfilter: flowtable: validate pppoe header
netfilter: flowtable: incorrect pppoe tuple
netfilter: nf_tables: missing iterator type in lookup walk
netfilter: nf_tables: restore set elements when delete set fails
netfilter: nf_tables: fix memleak in map from abort path
Paul Barker (4):
net: ravb: Count packets instead of descriptors in R-Car RX path
net: ravb: Allow RX loop to move past DMA mapping errors
net: ravb: Fix GbEth jumbo packet RX checksum handling
net: ravb: Fix RX byte accounting for jumbo packets
Paul Cercueil (2):
usb: gadget: functionfs: Fix inverted DMA fence direction
usb: gadget: functionfs: Wait for fences before enqueueing DMABUF
Peter Oberparleiter (3):
s390/qdio: handle deferred cc1
s390/cio: fix race condition during online processing
s390/cio: log fake IRB events
Peter Xu (1):
mm/userfaultfd: allow hugetlb change protection upon poison entry
Phillip Lougher (1):
Squashfs: check the inode number is not the invalid value of zero
Pin-yen Lin (1):
clk: mediatek: Do a runtime PM get on controllers during probe
Qiang Zhang (1):
bootconfig: use memblock_free_late to free xbc memory to buddy
Qu Wenruo (1):
btrfs: do not wait for short bulk allocation
Raag Jadav (1):
pwm: dwc: allow suspend/resume for 16 channels
Rafael J. Wysocki (1):
thermal/debugfs: Add missing count increment to thermal_debug_tz_trip_up()
Rahul Rameshbabu (1):
net/mlx5e: Use channel mdev reference instead of global mdev
instance for coalescing
Randy Dunlap (1):
peci: linux/peci.h: fix Excess kernel-doc description warning
Richard Genoud (1):
MAINTAINERS: mailmap: update Richard Genoud's email address
Rick Edgecombe (1):
KVM: x86/mmu: x86: Don't overflow lpage_info when checking attributes
Ricky Wu (1):
misc: rtsx: Fix rts5264 driver status incorrect when card removed
Sakari Ailus (2):
Revert "mei: vsc: Call wake_up() in the threaded IRQ handler"
mei: vsc: Unregister interrupt handler for system suspend
Samuel Thibault (1):
speakup: Avoid crash on very long word
Sandipan Das (1):
KVM: x86/pmu: Do not mask LVTPC when handling a PMI on AMD platforms
Sean Christopherson (20):
KVM: Add helpers to consolidate gfn_to_pfn_cache's page split check
KVM: Check validity of offset+length of gfn_to_pfn_cache prior
to activation
KVM: Explicitly disallow activatating a gfn_to_pfn_cache with INVALID_GPA
KVM: x86/pmu: Disable support for adaptive PEBS
KVM: x86/pmu: Set enable bits for GP counters in
PERF_GLOBAL_CTRL at "RESET"
KVM: selftests: Verify post-RESET value of PERF_GLOBAL_CTRL in PMCs test
KVM: SVM: Create a stack frame in __svm_vcpu_run() for unwinding
KVM: SVM: Wrap __svm_sev_es_vcpu_run() with #ifdef CONFIG_KVM_AMD_SEV
KVM: SVM: Drop 32-bit "support" from __svm_sev_es_vcpu_run()
KVM: SVM: Clobber RAX instead of RBX when discarding spec_ctrl_intercepted
KVM: SVM: Save/restore non-volatile GPRs in SEV-ES VMRUN via
host save area
KVM: SVM: Save/restore args across SEV-ES VMRUN via host save area
KVM: SVM: Create a stack frame in __svm_sev_es_vcpu_run()
KVM: x86: Stop compiling vmenter.S with OBJECT_FILES_NON_STANDARD
KVM: x86: Snapshot if a vCPU's vendor model is AMD vs. Intel compatible
KVM: VMX: Snapshot LBR capabilities during module initialization
perf/x86/intel: Expose existence of callback support to KVM
KVM: VMX: Disable LBR virtualization if the CPU doesn't support
LBR callstacks
KVM: x86/mmu: Precisely invalidate MMU root_role during CPUID update
KVM: Drop unused @may_block param from gfn_to_pfn_cache_invalidate_start()
Serge Semin (3):
net: stmmac: Apply half-duplex-less constraint for DW QoS Eth only
net: stmmac: Fix max-speed being ignored on queue re-init
net: stmmac: Fix IP-cores specific MAC capabilities
Shay Drory (2):
net/mlx5: Lag, restore buckets number to default after hash LAG
deactivation
net/mlx5: Restore mistakenly dropped parts in register devlink flow
Shenghao Ding (2):
ALSA: hda/tas2781: correct the register for pow calibrated data
ALSA: hda/tas2781: Add new vendor_id and subsystem_id to support
ThinkPad ICE-1
Shengyu Li (1):
selftests/harness: Prevent infinite loop due to Assert in FIXTURE_TEARDOWN
Shivaprasad G Bhat (1):
powerpc/iommu: Refactor spapr_tce_platform_iommu_attach_dev()
Siddharth Vadapalli (1):
net: ethernet: ti: am65-cpsw-nuss: cleanup DMA Channels before using them
Srinivas Pandruvada (2):
platform/x86: ISST: Add Granite Rapids-D to HPM CPU list
platform/x86/intel-uncore-freq: Increase minor number support
Stephen Boyd (5):
clk: Remove prepare_lock hold assertion in __clk_release()
clk: Don't hold prepare_lock when calling kref_put()
clk: Initialize struct clk_core kref earlier
clk: Get runtime PM before walking tree during disable_unused
clk: Get runtime PM before walking tree for clk_summary
Steven Rostedt (Google) (1):
SUNRPC: Fix rpcgss_context trace event acceptor field
Sumanth Korikkar (1):
mm/shmem: inline shmem_is_huge() for disabled transparent hugepages
Sven Schnelle (1):
s390/mm: Fix NULL pointer dereference
Takashi Iwai (1):
ALSA: seq: ump: Fix conversion from MIDI2 to MIDI1 UMP messages
Tao Su (1):
KVM: VMX: Ignore MKTME KeyID bits when intercepting #PF for
allow_smaller_maxphyaddr
Tariq Toukan (1):
net/mlx5: SD, Handle possible devcom ERR_PTR
Thinh Nguyen (1):
usb: dwc3: ep0: Don't reset resource alloc flag
Tony Lindgren (2):
serial: core: Fix regression when runtime PM is not enabled
serial: core: Fix missing shutdown and startup for serial base port
Uwe Kleine-König (5):
clk: Provide !COMMON_CLK dummy for devm_clk_rate_exclusive_get()
usb: gadget: fsl: Initialize udc before using it
MAINTAINERS: Drop Li Yang as their email address stopped working
serial: stm32: Return IRQ_NONE in the ISR if no handling happend
serial: stm32: Reset .throttled state in .startup()
Vanillan Wang (2):
net:usb:qmi_wwan: support Rolling modules
USB: serial: option: add Rolling RW101-GL and RW135-GL support
Vasily Gorbik (1):
NFSD: fix endianness issue in nfsd4_encode_fattr4
Vitalii Torshyn (1):
ALSA: hda/realtek: Fixes for Asus GU605M and GA403U sound
Vitaly Rodionov (1):
ALSA: hda/realtek: Add quirk for HP SnowWhite laptops
Xin Li (Intel) (1):
x86/fred: Fix INT80 emulation for FRED
Yang Li (1):
cuse: add kernel-doc comments to cuse_process_init_reply()
Yanjun.Zhu (1):
RDMA/rxe: Fix the problem "mutex_destroy missing"
Yaxiong Tian (1):
arm64: hibernate: Fix level3 translation fault in swsusp_save()
Yuanhe Shu (1):
selftests/ftrace: Limit length in subsystem-enable tests
Yuntao Wang (1):
init/main.c: Fix potential static_command_line memory overflow
Yuri Benditovich (1):
net: change maximum number of UDP segments to 128
Zack Rusin (3):
drm/vmwgfx: Fix prime import/export
drm/vmwgfx: Fix crtc's atomic check conditional
drm/vmwgfx: Sort primary plane formats by order of preference
Ziyang Xuan (2):
netfilter: nf_tables: Fix potential data-race in __nft_expr_type_get()
netfilter: nf_tables: Fix potential data-race in __nft_obj_type_get()
bolan wang (1):
USB: serial: option: add Fibocom FM135-GL variants
xinhui pan (1):
drm/amdgpu: validate the parameters of bo mapping operations more clearly
^ permalink raw reply [relevance 47%]
* Re: [PATCH v2] tty: n_gsm: restrict tty devices to attach
2024-04-21 16:04 95% ` Linus Torvalds
@ 2024-04-21 17:18 87% ` Linus Torvalds
0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-04-21 17:18 UTC (permalink / raw)
To: Tetsuo Handa
Cc: Greg Kroah-Hartman, Jiri Slaby, Andrew Morton, Starke, Daniel,
LKML, linux-security-module
[-- Attachment #1: Type: text/plain, Size: 1219 bytes --]
On Sun, 21 Apr 2024 at 09:04, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> The only option is to *mark* the ones that are atomic. Which was my suggestion.
Actually, another option would be to just return an error at 'set_ldisc()' time.
Sadly, the actual "tty->ops->set_ldisc()" function not only returns
'void' (easy enough to change - there aren't that many of them), but
it's called too late after the old ldisc has already been dropped.
It's basically a "inform tty about new ldisc" and is not useful for a
"is this ok"?
But we could trivially add a "ldisc_ok()" function, and have the vt
driver say "I only accept N_TTY".
Something like this ENTIRELY UNTESTED patch.
Again - this is untested, and maybe there are other tty drivers that
have issues with the stranger line disciplines, but this at least
seems simple and fairly easy to explain why we do what we do..
And if pty's really need the same thing, that would be easy to add.
But I actually think that at least pty slaves should *not* limit
ldiscs, because the whole point of a pty slave is to look like another
tty. If you want to emulate a serial device over a network, the way to
do it would be with a pty.
Hmm?
Linus
[-- Attachment #2: patch.diff --]
[-- Type: text/x-patch, Size: 2497 bytes --]
drivers/tty/tty_ldisc.c | 6 ++++++
drivers/tty/vt/vt.c | 10 ++++++++++
include/linux/tty_driver.h | 8 ++++++++
3 files changed, 24 insertions(+)
diff --git a/drivers/tty/tty_ldisc.c b/drivers/tty/tty_ldisc.c
index 3f68e213df1f..d80e9d4c974b 100644
--- a/drivers/tty/tty_ldisc.c
+++ b/drivers/tty/tty_ldisc.c
@@ -545,6 +545,12 @@ int tty_set_ldisc(struct tty_struct *tty, int disc)
goto out;
}
+ if (tty->ops->ldisc_ok) {
+ retval = tty->ops->ldisc_ok(tty, disc);
+ if (retval)
+ goto out;
+ }
+
old_ldisc = tty->ldisc;
/* Shutdown the old discipline. */
diff --git a/drivers/tty/vt/vt.c b/drivers/tty/vt/vt.c
index 9b5b98dfc8b4..cd87e3d1291e 100644
--- a/drivers/tty/vt/vt.c
+++ b/drivers/tty/vt/vt.c
@@ -3576,6 +3576,15 @@ static void con_cleanup(struct tty_struct *tty)
tty_port_put(&vc->port);
}
+/*
+ * We can't deal with anything but the N_TTY ldisc,
+ * because we can sleep in our write() routine.
+ */
+static int con_ldisc_ok(struct tty_struct *tty, int ldisc)
+{
+ return ldisc == N_TTY ? 0 : -EINVAL;
+}
+
static int default_color = 7; /* white */
static int default_italic_color = 2; // green (ASCII)
static int default_underline_color = 3; // cyan (ASCII)
@@ -3695,6 +3704,7 @@ static const struct tty_operations con_ops = {
.resize = vt_resize,
.shutdown = con_shutdown,
.cleanup = con_cleanup,
+ .ldisc_ok = con_ldisc_ok,
};
static struct cdev vc0_cdev;
diff --git a/include/linux/tty_driver.h b/include/linux/tty_driver.h
index 7372124fbf90..dd4b31ce6d5d 100644
--- a/include/linux/tty_driver.h
+++ b/include/linux/tty_driver.h
@@ -154,6 +154,13 @@ struct serial_struct;
*
* Optional. Called under the @tty->termios_rwsem. May sleep.
*
+ * @ldisc_ok: ``int ()(struct tty_struct *tty, int ldisc)``
+ *
+ * This routine allows the @tty driver to decide if it can deal
+ * with a particular @ldisc.
+ *
+ * Optional. Called under the @tty->ldisc_sem and @tty->termios_rwsem.
+ *
* @set_ldisc: ``void ()(struct tty_struct *tty)``
*
* This routine allows the @tty driver to be notified when the device's
@@ -372,6 +379,7 @@ struct tty_operations {
void (*hangup)(struct tty_struct *tty);
int (*break_ctl)(struct tty_struct *tty, int state);
void (*flush_buffer)(struct tty_struct *tty);
+ int (*ldisc_ok)(struct tty_struct *tty, int ldisc);
void (*set_ldisc)(struct tty_struct *tty);
void (*wait_until_sent)(struct tty_struct *tty, int timeout);
void (*send_xchar)(struct tty_struct *tty, u8 ch);
^ permalink raw reply related [relevance 87%]
* Re: [PATCH v2] tty: n_gsm: restrict tty devices to attach
@ 2024-04-21 16:04 95% ` Linus Torvalds
2024-04-21 17:18 87% ` Linus Torvalds
0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-04-21 16:04 UTC (permalink / raw)
To: Tetsuo Handa
Cc: Greg Kroah-Hartman, Jiri Slaby, Andrew Morton, Starke, Daniel,
LKML, linux-security-module
On Sun, 21 Apr 2024 at 06:28, Tetsuo Handa
<penguin-kernel@i-love.sakura.ne.jp> wrote:
>
> "struct tty_ldisc_ops" says that ->write() function (e.g. gsmld_write())
> is allowed to sleep and "struct tty_operations" says that ->write() function
> (e.g. con_write()) is not allowed to sleep.
Well, clearly con_write() *is* allowed to sleep. The very first thing
it does is that
console_lock();
thing, which uses a sleeping semaphore.
But yes, the comment in the header does say "may not sleep".
Clearly that comment doesn't actually reflect reality - and never did.
The console lock sleeping isn't some new thing (ie it doesn't come
from the somewhat recent printk changes).
So the comment is bogus and wrong.
> Thus, I initially proposed
> https://lkml.kernel.org/r/9cd9d3eb-418f-44cc-afcf-7283d51252d6@I-love.SAKURA.ne.jp
> which makes con_write() no-op when called with IRQs disabled.
The thing is, that's not the only thing that makes atomic context.
And some atomic contexts cannot be detected at run-time, they are
purely static (ie being inside a spinlock with a !PREEMPT kernel
build).
So you cannot test for this.
The only option is to *mark* the ones that are atomic. Which was my suggestion.
> My major/minor approach is based on a suggestion from Jiri that we just somehow
> disallow attaching this line discipline to a console
Since we already know that the comment is garbage, why do you think
it's just a con_write() that has this issue?
And if it is only the console that has this issue, why are you testing
for other major/minor numbers?
> Now, your 'struct tty_operations' flag saying 'my ->write() function is OK with
> atomic context' is expected to be set to all drivers.
I'm not convinced. The only thing I know is that the comment in
question is wrong, and has been wrong for over a decade (and honestly,
probably pretty much forever).
So how confident are we that other tty write functions are ok?
Also, since you think that only con_write() has a problem, why the
heck are you then testing for ptys etc? From a quick check, the
pty->ops->write() function is fine.
Linus
^ permalink raw reply [relevance 95%]
* Re: [PATCH v2] tty: n_gsm: restrict tty devices to attach
2024-04-20 18:02 99% ` Linus Torvalds
@ 2024-04-20 18:05 99% ` Linus Torvalds
0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-04-20 18:05 UTC (permalink / raw)
To: Tetsuo Handa
Cc: Greg Kroah-Hartman, Jiri Slaby, Andrew Morton, Starke, Daniel,
LKML, linux-security-module
On Sat, 20 Apr 2024 at 11:02, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> Most other normal tty devices just expect ->write() to be called in
> normal process context, so if we do a line discipline flag, it would
^^^^^^^^^^^^^^^^^^^^
> have to be something like "I'm ok with being called with interrupts
> disabled", and then the n_gsm ->open function would just check that.
Not line discipline - it would be a 'struct tty_operations' flag
saying 'my ->write() function is ok with atomic context'.
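A minimal userspace sketch of that kind of opt-in flag, assuming a made-up ops table: 'struct demo_tty_ops', 'write_atomic_ok' and 'demo_ldisc_open()' are hypothetical names for illustration, not existing kernel interfaces. The point is only that the driver declares the property statically, since it cannot be tested for at run time.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a driver ops table (not struct tty_operations). */
struct demo_tty_ops {
	int (*write)(const char *buf, size_t len);
	bool write_atomic_ok;	/* assumed flag: ->write() never sleeps */
};

struct demo_tty {
	const struct demo_tty_ops *ops;
};

/* Pretend console write: in the real world this one may sleep. */
static int demo_con_write(const char *buf, size_t len)
{
	(void)buf;
	return (int)len;
}

/* Pretend serial write that is safe to call from atomic context. */
static int demo_serial_write(const char *buf, size_t len)
{
	(void)buf;
	return (int)len;
}

static const struct demo_tty_ops demo_con_ops = {
	.write = demo_con_write,
	.write_atomic_ok = false,
};

static const struct demo_tty_ops demo_serial_ops = {
	.write = demo_serial_write,
	.write_atomic_ok = true,
};

const struct demo_tty demo_console = { .ops = &demo_con_ops };
const struct demo_tty demo_serial = { .ops = &demo_serial_ops };

/* A line discipline that calls ->write() atomically refuses unmarked drivers. */
int demo_ldisc_open(const struct demo_tty *tty)
{
	if (tty->ops->write == NULL)
		return -EINVAL;
	if (!tty->ops->write_atomic_ok)
		return -EINVAL;
	return 0;
}
```

With this shape, drivers that never set the flag are rejected by default, so an unaudited ->write() cannot end up being called with interrupts disabled.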
Linus
^ permalink raw reply [relevance 99%]
* Re: [PATCH v2] tty: n_gsm: restrict tty devices to attach
2024-04-20 17:34 97% ` Linus Torvalds
@ 2024-04-20 18:02 99% ` Linus Torvalds
2024-04-20 18:05 99% ` Linus Torvalds
0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-04-20 18:02 UTC (permalink / raw)
To: Tetsuo Handa
Cc: Greg Kroah-Hartman, Jiri Slaby, Andrew Morton, Starke, Daniel,
LKML, linux-security-module
On Sat, 20 Apr 2024 at 10:34, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> Alternatively, we could go the opposite way, and have some flag in the
> line discipline that says "I can be a console", and just check that in
> tty_set_ldisc() for the console.
Actually, I take that back. It's not /dev/console that is the problem,
that just happened to be the one oops I looked at.
Most other normal tty devices just expect ->write() to be called in
normal process context, so if we do a line discipline flag, it would
have to be something like "I'm ok with being called with interrupts
disabled", and then the n_gsm ->open function would just check that.
So it would end up being just another form of that
+ if (tty->ops->set_serial == NULL)
+ return -EINVAL;
check - but maybe more explicit and prettier.
Because a real serial driver might not be ok with it either, if it
uses a semaphore or something.
Whatever. I think the 'set_serial' test would at least be an improvement.
Linus
^ permalink raw reply [relevance 99%]
* Re: [PATCH v2] tty: n_gsm: restrict tty devices to attach
@ 2024-04-20 17:34 97% ` Linus Torvalds
2024-04-20 18:02 99% ` Linus Torvalds
0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-04-20 17:34 UTC (permalink / raw)
To: Tetsuo Handa
Cc: Greg Kroah-Hartman, Jiri Slaby, Andrew Morton, Starke, Daniel,
LKML, linux-security-module
On Sat, 20 Apr 2024 at 04:12, Tetsuo Handa
<penguin-kernel@i-love.sakura.ne.jp> wrote:
>
> Since n_gsm is designed to be used for serial port [1], reject attaching to
> virtual consoles and PTY devices, by checking tty's device major/minor
> numbers at gsmld_open().
If we really just want to restrict it to serial devices, then do
something like this:
drivers/tty/n_gsm.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/tty/n_gsm.c b/drivers/tty/n_gsm.c
index 4036566febcb..24425ef35b2b 100644
--- a/drivers/tty/n_gsm.c
+++ b/drivers/tty/n_gsm.c
@@ -3629,6 +3629,8 @@ static int gsmld_open(struct tty_struct *tty)
if (tty->ops->write == NULL)
return -EINVAL;
+ if (tty->ops->set_serial == NULL)
+ return -EINVAL;
/* Attach our ldisc data */
gsm = gsm_alloc_mux();
which at least matches the current (largely useless) pattern of
checking for a write function.
I think all real serial sub-drivers already have that 'set_serial()'
function, and if there are some that don't, we could just add a dummy
for them. No?
Alternatively, we could go the opposite way, and have some flag in the
line discipline that says "I can be a console", and just check that in
tty_set_ldisc() for the console.
That would probably be a good idea regardless, but likely requires more effort.
But this kind of random major number testing seems wrong. It's trying
to deal with the _symptoms_, not some deeper truth.
Linus
^ permalink raw reply [relevance 97%]
* Re: [GIT PULL] Btrfs fixes for 6.9-rc5
@ 2024-04-18 0:14 99% ` Linus Torvalds
0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-04-18 0:14 UTC (permalink / raw)
To: David Sterba; +Cc: linux-btrfs, linux-kernel
On Wed, 17 Apr 2024 at 16:53, David Sterba <dsterba@suse.com> wrote:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux.git tags/for-6.9-rc4-tag
No such tag. I see the branch 'for-6.9-rc4' with the right commit,
but not the signed tag. Forgot to push out?
Linus
^ permalink raw reply [relevance 99%]
* Re: [PATCH 00/19] Enable -Wshadow=local for kernel/sched
2024-04-17 0:29 99% ` Linus Torvalds
@ 2024-04-17 0:50 90% ` Linus Torvalds
0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-04-17 0:50 UTC (permalink / raw)
To: Kees Cook
Cc: Matthew Wilcox (Oracle),
Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
Daniel Bristot de Oliveira, linux-kernel
On Tue, 16 Apr 2024 at 17:29, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> So what is the solution to
>
> #define MAX(a,b) ({ \
Side note: do not focus on the macro name. I'm not interested in "the
solution is MAX3()" kinds of answers.
And the macro doesn't have to be physically nested like that.
The macro could be a list traversal thing. Appended is an example
list traversal macro that is type-safe and simple to use, and would
absolutely improve on our current "list_for_each_entry()" in many
ways.
Imagine now traversing a list within an entry that happens while
traversing an outer one. Which is not at all an odd thing, IOW, you'd
have
traverse(bus_list, bus) {
traverse(&bus->devices, device) {
.. do something with the device ..
}
}
this kind of macro use that has internal variables that will
inevitably shadow each other when used in some kind of nested model is
pretty fundamental.
So no. The answer is *NOT* some kind of "don't do that then".
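To make the nesting concrete, here is a much simplified, self-contained sketch. It is not the type-safe macro appended below: 'struct demo_item', 'demo_traverse()' and 'demo_sum_children()' are hypothetical names, and the list is a plain singly linked one. The macro's internal '__raw' cursor is declared once per loop, so the inner loop's '__raw' shadows the outer one, and the code is still correct.

```c
#include <stddef.h>

/* Hypothetical singly linked item with an optional nested list. */
struct demo_item {
	int value;
	struct demo_item *next;		/* next sibling, NULL at the end */
	struct demo_item *children;	/* head of the nested list, may be NULL */
};

/*
 * Simplified traversal macro: '__raw' is an internal cursor, 'it' is the
 * caller-visible iterator. Both live only for the duration of the loop.
 */
#define demo_traverse(head, it)					\
	for (struct demo_item *__raw = (head), *it = __raw;	\
	     __raw != NULL;					\
	     __raw = __raw->next, it = __raw)

/* Sum the values of every child of every top-level item. */
int demo_sum_children(struct demo_item *buses)
{
	int sum = 0;

	demo_traverse(buses, bus) {
		/* The inner __raw shadows the outer __raw here, harmlessly. */
		demo_traverse(bus->children, dev) {
			sum += dev->value;
		}
	}
	return sum;
}
```

Building something like this with -Wshadow would be expected to flag the inner '__raw', even though each loop only ever touches its own cursor.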
Linus
PS. The list traversal thing below worked at some point. It's an old
snippet of mine, it might not work any more. It depends on the kernel
'list_head' definitions, it's not a standalone example.
---
#define list_traversal_head(type, name, member) union { \
struct list_head name; \
struct type *name##_traversal_type; \
struct type##_##name##_##member##_traversal_struct *name##_traversal_info; \
}
#define list_traversal_node(name) union { \
struct list_head name; \
int name##_traversal_node; \
}
#define DEFINE_TRAVERSAL(from, name, to, member) \
struct to##_##name##_##member##_traversal_struct { \
char dummy[offsetof(struct to, member##_traversal_node)]; \
struct list_head node; \
}
#define __traverse_type(head, ext) typeof(head##ext)
#define traverse_type(head, ext) __traverse_type(head, ext)
#define traverse_offset(head) \
offsetof(traverse_type(*head,_traversal_info), node)
#define traverse_is_head(head, raw) \
((void *)(raw) == (void *)(head))
/*
* Very annoying. We want 'node' to be of the right type, and __raw to be
* the underlying "struct list_head". But while we can declare multiple
* variables in a for-loop in C99, we can't declare multiple _types_.
*
* So __raw has the wrong type, making everything pointlessly uglier.
*/
#define traverse(head, node) \
for (typeof(*head##_traversal_type) __raw = (void *)(head)->next, node; \
node = (void *)__raw + traverse_offset(*head), !traverse_is_head(head, __raw); \
__raw = (void *) ((struct list_head *)__raw)->next)
struct first_struct {
int offset[6];
list_traversal_head(second_struct, head, entry);
};
struct second_struct {
int hash;
int offset[17];
list_traversal_node(entry);
};
DEFINE_TRAVERSAL(first_struct, head, second_struct, entry);
struct second_struct *find(struct first_struct *p)
{
traverse(&p->head, node) {
if (node->hash == 1234)
return node;
}
return NULL;
}
^ permalink raw reply [relevance 90%]
* Re: [PATCH 00/19] Enable -Wshadow=local for kernel/sched
@ 2024-04-17 0:29 99% ` Linus Torvalds
2024-04-17 0:50 90% ` Linus Torvalds
0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-04-17 0:29 UTC (permalink / raw)
To: Kees Cook
Cc: Matthew Wilcox (Oracle),
Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
Daniel Bristot de Oliveira, linux-kernel
On Tue, 16 Apr 2024 at 14:15, Kees Cook <keescook@chromium.org> wrote:
>
> I was looking at -Wshadow=local again, and remembered this series. It
> sounded like things were close, but a tweak was needed. What would be
> next to get this working?
So what is the solution to
#define MAX(a,b) ({ \
typeof(a) __a = (a); \
typeof(b) __b = (b); \
__a > __b ? __a : __b; \
})
int test(int a, int b, int c)
{
return MAX(a, MAX(b,c));
}
where -Wshadow=all causes insane warnings that are bogus garbage?
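For illustration, a self-contained variant of the same pattern is sketched here. It assumes GNU C statement expressions, spells the keyword '__typeof__' so it also builds in strict ISO modes, and uses the made-up names 'DEMO_MAX' and 'demo_max3'. The inner '__a' and '__b' shadow the outer ones only inside the inner statement expression, so the result is still what you'd expect.

```c
/* Same shape as the MAX() above; the names are illustrative only. */
#define DEMO_MAX(a, b) ({			\
	__typeof__(a) __a = (a);		\
	__typeof__(b) __b = (b);		\
	__a > __b ? __a : __b;			\
})

/*
 * The inner expansion declares its own __a and __b inside the
 * initializer of the outer __b. They shadow the outer variables,
 * but each expansion only ever reads its own copies.
 */
int demo_max3(int a, int b, int c)
{
	return DEMO_MAX(a, DEMO_MAX(b, c));
}
```

So the shadowing here is an artifact of macro expansion, not a sign of a bug in the source.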
Honestly, Willy's patch-series is a hack to avoid this kind of very
natural nested macro pattern.
But it's a horrible hack, and it does it by making the code actively worse.
Here's the deal: if we can't handle something like the above without
warning, -Wshadow isn't getting enabled.
Because we don't write worse code because of bad warnings.
IOW, what is the sane way to just say "this variable can shadow the
use site, and it's fine"?
Without that kind of out, I don't think -Wshadow=local is workable.
Linus
^ permalink raw reply [relevance 99%]
* Re: [PATCH v10 1/5] mseal: Wire up mseal syscall
@ 2024-04-15 18:21 96% ` Linus Torvalds
0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-04-15 18:21 UTC (permalink / raw)
To: Muhammad Usama Anjum
Cc: jeffxu, akpm, keescook, jannh, sroettger, willy, gregkh, corbet,
Liam.Howlett, surenb, merimus, rdunlap, jeffxu, jorgelo, groeck,
linux-kernel, linux-kselftest, linux-mm, pedro.falcato,
dave.hansen, linux-hardening, deraadt
On Mon, 15 Apr 2024 at 11:11, Muhammad Usama Anjum
<usama.anjum@collabora.com> wrote:
>
> It isn't logical to wire up something which isn't present
Actually, with system calls, the rules end up being almost opposite.
There's no point in adding the code if it's not reachable. So adding
the system call code before adding the wiring makes no sense.
So you have two cases: add the stubs first, or add the code first.
Neither does anything without the other.
So then you go "add both in the same commit" option, which ends up
being horrible from a "review the code" standpoint. The two parts are
entirely different and mixing them up makes the patch very unclear
(and has very different target audiences for reviewing it - the MM
people really shouldn't have to look at the architecture wiring
parts).
End result: there are no "this is the logical ordering" cases.
But the "wire up system calls" part actually has some reasons to be first:
- it reserves the system call number
- it adds the "when system call isn't enabled, return -ENOSYS"
conditional system call logic
so I actually tend to prefer this ordering when it comes to system calls.
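A rough userspace model of those two points, under stated assumptions: the table, the 'DEMO_NR_*' numbers and 'demo_dispatch()' are hypothetical and only mimic the idea, they are not the kernel's actual syscall table or its conditional-syscall machinery. A number can be reserved and routed to a "not implemented" stub before any real code exists.

```c
#include <errno.h>
#include <stddef.h>

typedef long (*demo_syscall_fn)(void);

/* Stub used for numbers that are wired up but have no implementation. */
static long demo_ni_syscall(void)
{
	return -ENOSYS;
}

/* An already implemented call, just so the table has a real entry. */
static long demo_getanswer(void)
{
	return 42;
}

enum {
	DEMO_NR_GETANSWER = 0,
	DEMO_NR_MSEAL = 1,	/* number reserved first, code comes later */
	DEMO_NR_MAX = 2,
};

static const demo_syscall_fn demo_table[DEMO_NR_MAX] = {
	[DEMO_NR_GETANSWER] = demo_getanswer,
	[DEMO_NR_MSEAL] = demo_ni_syscall,
};

/* Out-of-range numbers and unimplemented entries both yield -ENOSYS. */
long demo_dispatch(unsigned int nr)
{
	if (nr >= DEMO_NR_MAX)
		return -ENOSYS;
	return demo_table[nr]();
}
```

Once the implementation lands in a later patch, only the table entry changes; callers that probed the number earlier simply saw -ENOSYS.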
Linus
^ permalink raw reply [relevance 96%]
* Re: [PATCH v2 1/3] x86/bugs: Only harden syscalls when needed
@ 2024-04-15 15:47 98% ` Linus Torvalds
0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-04-15 15:47 UTC (permalink / raw)
To: Nikolay Borisov
Cc: Josh Poimboeuf, x86, linux-kernel, Daniel Sneddon, Pawan Gupta,
Thomas Gleixner, Alexandre Chartre, Konrad Rzeszutek Wilk,
Peter Zijlstra, Greg Kroah-Hartman, Sean Christopherson,
Andrew Cooper, Dave Hansen, KP Singh, Waiman Long,
Borislav Petkov, Ingo Molnar
On Mon, 15 Apr 2024 at 08:27, Nikolay Borisov <nik.borisov@suse.com> wrote:
>
> Same as with every issue - assess the problem and develop fixes.
No. Let's have at least all the infrastructure in place to be a bit proactive.
> Let's be honest, the indirect branches in the syscall handler aren't the
> biggest problem
Oh, they have been.
> it's the stacked LSMs.
Hopefully those will get fixed too.
There's a few other fairly reachable ones (the timer indirection ones
are much too close, and VFS file ops aren't entirely out of reach).
But maybe some day we'll be in a situation where it's actually fairly
hard to reach indirect kernel calls from untrusted user space.
The system call ones are pretty much always the first ones, though.
> And even if those get fixes
> chances are the security people will likely find some other avenue of
> attack, I think even now the attack is somewhat hard to pull off.
No disagreement about that. I think outright sw bugs are still the
99.9% thing. But let's learn from history instead of "assess the
problem" every time anew.
Linus
^ permalink raw reply [relevance 98%]
* Re: [PATCH v2 1/3] x86/bugs: Only harden syscalls when needed
@ 2024-04-15 15:16 99% ` Linus Torvalds
0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-04-15 15:16 UTC (permalink / raw)
To: Nikolay Borisov
Cc: Josh Poimboeuf, x86, linux-kernel, Daniel Sneddon, Pawan Gupta,
Thomas Gleixner, Alexandre Chartre, Konrad Rzeszutek Wilk,
Peter Zijlstra, Greg Kroah-Hartman, Sean Christopherson,
Andrew Cooper, Dave Hansen, KP Singh, Waiman Long,
Borislav Petkov, Ingo Molnar
On Mon, 15 Apr 2024 at 00:37, Nikolay Borisov <nik.borisov@suse.com> wrote:
>
> To ask again, what do we gain by having this syscall hardening at the
> same time as the always on BHB scrubbing sequence?
What happens the next time some indirect call problem comes up?
If we had had *one* hardware bug in this area, that would be one
thing. But this has been going on for a decade now.
Linus
^ permalink raw reply [relevance 99%]
* Linux 6.9-rc4
@ 2024-04-14 20:48 43% Linus Torvalds
0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-04-14 20:48 UTC (permalink / raw)
To: Linux Kernel Mailing List
Nothing particularly unusual going on this week - some new hw
mitigations may stand out, but after a decade of this I can't really
call it "unusual" any more, can I? We also had a bit more bcachefs
fixes, and a turbostat tool update, but other than that it's the
regular drop of random stuff all over.
Drivers end up being the bulk of the remaining stuff, and we still had
some timer fallout from the big timer updates this merge window.
Nothing else really strikes me, but the full shortlog is appended as
usual - easy enough to just scan through to get kind of a flavor of
what has been going on.
Linus
---
Aaro Koskinen (6):
ARM: OMAP2+: fix bogus MMC GPIO labels on Nokia N8x0
ARM: OMAP2+: fix N810 MMC gpiod table
mmc: omap: fix broken slot switch lookup
mmc: omap: fix deferred probe
mmc: omap: restore original power up/down steps
ARM: OMAP2+: fix USB regression on Nokia N8x0
Abhinav Kumar (1):
drm/msm/dp: fix typo in dp_display_handle_port_status_changed()
Adam Dunlap (1):
x86/apic: Force native_apic_mem_read() to use the MOV instruction
Adrian Hunter (1):
bug: Fix no-return-statement warning with !CONFIG_BUG
Alex Constantino (1):
Revert "drm/qxl: simplify qxl_fence_wait"
Alex Deucher (1):
drm/amdgpu: always force full reset for SOC21
Alex Hung (2):
drm/amd/display: Skip on writeback when it's not applicable
drm/amd/display: Return max resolution supported by DWB
Alexander Wetzel (1):
scsi: sg: Avoid race in error handling & drop bogus warn
Alexey Izbyshev (1):
io_uring: Fix io_cqring_wait() not restoring sigmask on
get_timespec64() failure
Amir Goldstein (1):
kernfs: annotate different lockdep class for of->mutex of writable files
Anna-Maria Behnsen (1):
PM: s2idle: Make sure CPUs will wakeup directly on resume
Archie Pusaka (1):
Bluetooth: l2cap: Don't double set the HCI_CONN_MGMT_CONNECTED bit
Ard Biesheuvel (1):
gcc-plugins/stackleak: Avoid .head.text section
Arnd Bergmann (8):
ubsan: fix unused variable warning in test module
nouveau: fix function cast warning
lib: checksum: hide unused expected_csum_ipv6_magic[]
irqflags: Explicitly ignore lockdep_hrtimer_exit() argument
ipv6: fib: hide unused 'pn' variable
ipv4/route: avoid unused-but-set-variable warning
net/mlx5: fix possible stack overflows
tracing: hide unused ftrace_event_id_fops
Arınç ÜNAL (2):
net: dsa: mt7530: fix enabling EEE on MT7531 switch on all boards
net: dsa: mt7530: trap link-local frames regardless of ST Port State
Ashutosh Dixit (1):
drm/xe: Label RING_CONTEXT_CONTROL as masked
Bagas Sanjaya (2):
Documentation: filesystems: Add bcachefs toctree
MAINTAINERS: Add entry for bcachefs documentation
Bernhard Rosenkränzer (1):
platform/x86: acer-wmi: Add support for Acer PH18-71
Bjorn Helgaas (1):
Revert "PCI: Mark LSI FW643 to avoid bus reset"
Boris Brezillon (1):
drm/panfrost: Fix the error path in panfrost_mmu_map_fault_addr()
Boris Burkov (6):
btrfs: qgroup: correctly model root qgroup rsv in convert
btrfs: qgroup: fix qgroup prealloc rsv leak in subvolume operations
btrfs: record delayed inode root in transaction
btrfs: qgroup: convert PREALLOC to PERTRANS after record_root_in_trans
btrfs: make btrfs_clear_delalloc_extent() free delalloc reserve
btrfs: always clear PERTRANS metadata during commit
Breno Leitao (1):
virtio_net: Do not send RSS key if it is not supported
Brett Creeley (1):
pds_core: Fix pdsc_check_pci_health function to use work thread
Carolina Jubran (4):
net/mlx5e: RSS, Block changing channels number when RXFH is configured
net/mlx5e: Fix mlx5e_priv_init() cleanup flow
net/mlx5e: HTB, Fix inconsistencies with QoS SQs number
net/mlx5e: RSS, Block XOR hash with over 128 channels
Chen Yu (1):
tools/power turbostat: Do not print negative LPI residency
Cosmin Ratiu (2):
net/mlx5: Properly link new fs rules into the tree
net/mlx5: Correctly compare pkt reformat ids
Cristian Marussi (1):
firmware: arm_scmi: Make raw debugfs entries non-seekable
Damien Le Moal (2):
ata: ahci: Add mask_port_map module parameter
ata: libata-scsi: Fix ata_scsi_dev_rescan() error path
Dan Carpenter (2):
bcachefs: fix ! vs ~ typo in __clear_bit_le64()
scsi: qla2xxx: Fix off by one in qla_edif_app_getstats()
Daniel Machon (1):
net: sparx5: fix wrong config being used when reconfiguring PCS
Daniel Sneddon (3):
x86/bhi: Define SPEC_CTRL_BHI_DIS_S
KVM: x86: Add BHI_NO
x86/bugs: Fix return type of spectre_bhi_state()
Dave Airlie (2):
nouveau: fix devinit paths to only handle display on GSP.
amdkfd: use calloc instead of kzalloc to avoid integer overflow
Dave Jiang (6):
cxl/core/regs: Fix usage of map->reg_type in cxl_decode_regblock() before assigned
cxl: Remove checking of iter in cxl_endpoint_get_perf_coordinates()
cxl: Fix retrieving of access_coordinates in PCIe path
cxl: Fix incorrect region perf data calculation
cxl: Consolidate dport access_coordinate ->hb_coord and ->sw_coord into ->coord
cxl: Add checks to access_coordinate calculation to fail missing data
David Arinzon (4):
net: ena: Fix potential sign extension issue
net: ena: Wrong missing IO completions check order
net: ena: Fix incorrect descriptor free behavior
net: ena: Set tx_info->xdpf value to NULL
David McFarland (1):
platform/x86/intel/hid: Don't wake on 5-button releases
Dexuan Cui (1):
swiotlb: do not set total_used to 0 in swiotlb_create_debugfs_files()
Dillon Varone (1):
drm/amd/display: Do not recursively call manual trigger programming
Dmitry Antipov (1):
Bluetooth: Fix memory leak in hci_req_sync_complete()
Dmitry Baryshkov (3):
drm/msm/dpu: don't allow overriding data from catalog
drm/msm/dpu: make error messages at dpu_core_irq_register_callback() more sensible
dt-bindings: display/msm: sm8150-mdss: add DP node
Doug Smythies (1):
tools/power turbostat: Fix added raw MSR output
Eric Dumazet (6):
xsk: validate user input for XDP_{UMEM|COMPLETION}_FILL_RING
geneve: fix header validation in geneve[6]_xmit_skb
net: add copy_safe_from_sockptr() helper
mISDN: fix MISDN_TIME_STAMP handling
nfc: llcp: fix nfc_llcp_setsockopt() unsafe copies
netfilter: complete validation of user input
Erni Sri Satya Vennela (1):
x86/hyperv: Cosmetic changes for hv_apic.c
Fabio Estevam (2):
ARM: dts: imx7-mba7: Use 'no-mmc' property
ARM: dts: imx7s-warp: Pass OV2680 link-frequencies
Frank Li (8):
arm64: dts: imx8-ss-conn: fix usdhc wrong lpcg clock order
arm64: dts: imx8-ss-lsio: fix pwm lpcg indices
arm64: dts: imx8-ss-conn: fix usb lpcg indices
arm64: dts: imx8-ss-dma: fix spi lpcg indices
arm64: dts: imx8-ss-dma: fix pwm lpcg indices
arm64: dts: imx8-ss-dma: fix adc lpcg indices
arm64: dts: imx8-ss-dma: fix can lpcg indices
arm64: dts: imx8qm-ss-dma: fix can lpcg indices
Fudongwang (1):
drm/amd/display: fix disable otg wa logic in DCN316
Gavin Shan (3):
vhost: Add smp_rmb() in vhost_vq_avail_empty()
vhost: Add smp_rmb() in vhost_enable_notify()
arm64: tlb: Fix TLBI RANGE operand
Geetha sowjanya (1):
octeontx2-af: Fix NIX SQ mode and BP config
Gerd Bayer (2):
s390/ism: fix receive message buffer allocation
Revert "s390/ism: fix receive message buffer allocation"
Gergo Koteles (1):
platform/x86: lg-laptop: fix %s null argument warning
Gwendal Grignou (2):
platform/x86: intel-vbtn: Use acpi_has_method to check for switch
platform/x86: intel-vbtn: Update tablet mode switch at end of probe
Haiyue Wang (1):
io-uring: correct typo in comment for IOU_F_TWQ_LAZY_WAKE
Hans de Goede (2):
ACPI: scan: Do not increase dep_unmet for already met dependencies
platform/x86: toshiba_acpi: Silence logging for some events
Hariprasad Kelam (1):
octeontx2-pf: Fix transmit scheduler resource leak
Harish Kasiviswanathan (1):
drm/amdkfd: Reset GPU on queue preemption failure
Harry Wentland (2):
drm/amd/display: Program VSC SDP colorimetry for all DP sinks >= 1.4
drm/amd/display: Set VSC SDP Colorimetry same way for MST and SST
Heiner Kallweit (2):
r8169: fix LED-related deadlock on module removal
r8169: add missing conditional compiling for call to r8169_remove_leds
Himal Prasad Ghimiray (1):
drm/xe/xe_migrate: Cast to output precision before multiplying operands
Hongbo Li (1):
bcachefs: fix the count of nr_freed_pcpu after changing bc->freed_nonpcpu list
Huacai Chen (7):
mm: Move lowmem_page_address() a little later
LoongArch: Make {virt, phys, page, pfn} translation work with KFENCE
LoongArch: Make virt_addr_valid()/__virt_addr_valid() work with KFENCE
LoongArch: Update dts for Loongson-2K1000 to support ISA/LPC
LoongArch: Update dts for Loongson-2K2000 to support ISA/LPC
LoongArch: Update dts for Loongson-2K2000 to support PCI-MSI
LoongArch: Update dts for Loongson-2K2000 to support GMAC/GNET
Igor Pylypiv (1):
ata: libata-core: Allow command duration limits detection for ACS-4 drives
Ilya Maximets (1):
net: openvswitch: fix unwanted error log on timeout policy probing
Ingo Molnar (1):
x86/bugs: Rename various 'ia32_cap' variables to 'x86_arch_cap_msr'
Irui Wang (1):
media: mediatek: vcodec: Handle VP9 superframe bitstream with 8 sub-frames
Jacek Lawrynowicz (5):
accel/ivpu: Remove d3hot_after_power_off WA
accel/ivpu: Put NPU back to D3hot after failed resume
accel/ivpu: Return max freq for DRM_IVPU_PARAM_CORE_CLOCK_RATE
accel/ivpu: Fix missed error message after VPU rename
accel/ivpu: Fix deadlock in context_xa
Jacob Pan (1):
iommu/vt-d: Allocate local memory for page request queue
Jammy Huang (1):
drm/ast: Fix soft lockup
Jeff Layton (1):
MAINTAINERS: remove myself as a Reviewer for Ceph
Jens Wiklander (1):
firmware: arm_ffa: Fix the partition ID check in ffa_notification_info_get()
Jiaxun Yang (1):
MIPS: scall: Save thread_info.syscall unconditionally on entry
Jiri Benc (1):
ipv6: fix race condition between ipv6_get_ifaddr and ipv6_del_addr
Johan Hovold (2):
drm/msm/dp: fix runtime PM leak on disconnect
drm/msm/dp: fix runtime PM leak on connect failure
John Harrison (1):
drm/i915/guc: Fix the fix for reset lock confusion
John Stultz (3):
selftests: timers: Fix valid-adjtimex signed left-shift undefined behavior
selftests: timers: Fix posix_timers ksft_print_msg() warning
selftests: timers: Fix abs() warning in posix_timers test
Josh Poimboeuf (7):
x86/bugs: Change commas to semicolons in 'spectre_v2' sysfs file
x86/bugs: Fix BHI documentation
x86/bugs: Cache the value of MSR_IA32_ARCH_CAPABILITIES
x86/bugs: Fix BHI handling of RRSBA
x86/bugs: Clarify that syscall hardening isn't a BHI mitigation
x86/bugs: Remove CONFIG_BHI_MITIGATION_AUTO and spectre_bhi=auto
x86/bugs: Replace CONFIG_SPECTRE_BHI_{ON,OFF} with CONFIG_MITIGATION_SPECTRE_BHI
Justin Ernst (1):
tools/power/turbostat: Fix uncore frequency file string
Karthik Poosa (1):
drm/xe/hwmon: Cast result to output precision on left shift of operand
Kees Cook (2):
randomize_kstack: Improve entropy diffusion
nouveau/gsp: Avoid addressing beyond end of rpc->entries
Kenneth Feng (1):
drm/amd/pm: fix the high voltage issue after unload
Kent Overstreet (19):
bcachefs: Make snapshot_is_ancestor() safe
bcachefs: Bump limit in btree_trans_too_many_iters()
bcachefs: Move btree_updates to debugfs
bcachefs: Further improve btree_update_to_text()
bcachefs: Print shutdown journal sequence number
bcachefs: Fix rebalance from durability=0 device
bcachefs: fix rand_delete unit test
bcachefs: Fix BCH_IOCTL_FSCK_OFFLINE for encrypted filesystems
bcachefs: Disable errors=panic for BCH_IOCTL_FSCK_OFFLINE
bcachefs: JOURNAL_SPACE_LOW
bcachefs: Fix gap buffer bug in bch2_journal_key_insert_take()
bcachefs: fix bch2_get_acl() transaction restart handling
bcachefs: fix eytzinger0_find_gt()
bcachefs: Fix check_topology() when using node scan
bcachefs: Don't scan for btree nodes when we can reconstruct
bcachefs: btree_node_scan: Respect member.data_allowed
bcachefs: Fix a race in btree_update_nodes_written()
bcachefs: Kill read lock dropping in bch2_btree_node_lock_write_nofail()
bcachefs: Fix __bch2_btree_and_journal_iter_init_node_iter()
Krzysztof Kozlowski (3):
virtio: store owner from modules with register_virtio_driver()
MAINTAINERS: Change Krzysztof Kozlowski's email address
iommu: mtk: fix module autoloading
Kuniyuki Iwashima (1):
af_unix: Clear stale u->oob_skb.
Kuogee Hsieh (1):
drm/msm/dp: assign correct DP controller ID to x1e80100 interface table
Kwangjin Ko (1):
cxl/core: Fix initialization of mbox_cmd.size_out in get event
Lang Yu (1):
drm/amdgpu/umsch: reinitialize write pointer in hw init
Len Brown (4):
tools/power turbostat: Expand probe_intel_uncore_frequency()
tools/power turbostat: Fix warning upon failed /dev/cpu_dma_latency read
tools/power turbostat: enhance -D (debug counter dump) output
tools/power turbostat: v2024.04.10
Li Ma (1):
drm/amd/display: add DCN 351 version for microcode load
Li Zhijian (1):
hv: vmbus: Convert sprintf() family to sysfs_emit() family
Lijo Lazar (3):
drm/amdgpu: Refine IB schedule error logging
drm/amdgpu: Reset dGPU if suspend got aborted
drm/amdgpu: Fix VCN allocation in CPX partition
Linus Torvalds (3):
x86/syscall: Don't force use of indirect calls for system calls
Kconfig: add some hidden tabs on purpose
Linux 6.9-rc4
Lu Baolu (1):
iommu/vt-d: Fix WARN_ON in iommu probe path
Luca Weiss (1):
drm/msm/adreno: Set highest_bank_bit for A619
Lucas De Marchi (1):
drm/xe/display: Fix double mutex initialization
Luiz Augusto von Dentz (7):
Bluetooth: ISO: Don't reject BT_ISO_QOS if parameters are unset
Bluetooth: hci_sync: Fix using the same interval and window for Coded PHY
Bluetooth: SCO: Fix not validating setsockopt user input
Bluetooth: RFCOMM: Fix not validating setsockopt user input
Bluetooth: L2CAP: Fix not validating setsockopt user input
Bluetooth: ISO: Fix not validating setsockopt user input
Bluetooth: hci_sock: Fix not validating setsockopt user input
Manivannan Sadhasivam (1):
MAINTAINERS: Drop Gustavo Pimentel as PCI DWC Maintainer
Marek Vasut (2):
net: ks8851: Inline ks8851_rx_skb()
net: ks8851: Handle softirqs at the end of IRQ thread to fix hang
Masami Hiramatsu (1):
fs/proc: Skip bootloader comment if no embedded kernel parameters
Maurizio Lombardi (1):
scsi: target: Fix SELinux error when systemd-modules loads the target module
Michael Kelley (2):
swiotlb: fix swiotlb_bounce() to do partial sync's correctly
Drivers: hv: vmbus: Don't free ring buffers that couldn't be re-encrypted
Michael Liang (1):
net/mlx5: offset comp irq index in name by one
Michael S. Tsirkin (1):
vhost-vdpa: change ioctl # for VDPA_GET_VRING_SIZE
Michal Luczaj (1):
af_unix: Fix garbage collector racing against connect()
Miguel Ojeda (1):
drm/msm: fix the `CRASHDUMP_READ` target of `a6xx_get_shader_block()`
Minda Chen (2):
net: stmmac: mmc_core: Add GMAC LPI statistics
net: stmmac: mmc_core: Add GMAC mmc tx/rx missing statistics
Ming Lei (2):
block: fix q->blkg_list corruption during disk rebind
block: allow device to have both virt_boundary_mask and max segment size
Namhyung Kim (1):
perf/x86: Fix out of range data
Nathan Chancellor (1):
selftests: kselftest: Mark functions that unconditionally call exit() as __noreturn
NeilBrown (1):
ceph: redirty page before returning AOP_WRITEPAGE_ACTIVATE
Nianyao Tang (1):
irqchip/gic-v3-its: Fix VSYNC referencing an unmapped VPE on GIC v4.1
Nicolas Dufresne (1):
media: mediatek: vcodec: Fix oops when HEVC init fails
Noah Loomans (1):
platform/chrome: cros_ec_uart: properly fix race condition
Nuno Das Neves (1):
mshyperv: Introduce hv_numa_node_to_pxm_info()
Oleg Nesterov (2):
selftests/timers/posix_timers: Reimplement check_timer_distribution()
selftests: kselftest: Fix build failure with NOLIBC
Patryk Wlazlyn (11):
tools/power turbostat: Print ucode revision only if valid
tools/power turbostat: Read base_hz and bclk from CPUID.16H if available
tools/power turbostat: Add --no-msr option
tools/power turbostat: Add --no-perf option
tools/power turbostat: Add reading aperf and mperf via perf API
tools/power turbostat: detect and disable unavailable BICs at runtime
tools/power turbostat: add early exits for permission checks
tools/power turbostat: Clear added counters when in no-msr mode
tools/power turbostat: Add proper re-initialization for perf file descriptors
tools/power turbostat: read RAPL counters via perf
tools/power turbostat: Add selftests
Paulo Alcantara (2):
smb: client: fix NULL ptr deref in cifs_mark_open_handles_for_deleted_file()
smb: client: instantiate when creating SFU files
Pavan Chebbi (1):
bnxt_en: Reset PTP tx_avail after possible firmware reset
Pavel Begunkov (1):
io_uring/net: restore msg_control on sendzc retry
Pawan Gupta (4):
x86/bhi: Add support for clearing branch history at syscall entry
x86/bhi: Enumerate Branch History Injection (BHI) bug
x86/bhi: Add BHI mitigation knob
x86/bhi: Mitigate KVM by default
Peng Liu (1):
tools/power turbostat: Fix Bzy_MHz documentation typo
Petr Tesarik (2):
swiotlb: extend buffer pre-padding to alloc_align_mask if necessary
u64_stats: fix u64_stats_init() for lockdep when used repeatedly in one file
Pierre Gondois (1):
firmware: arm_scmi: Fix wrong fastchannel initialization
Prasad Pandit (1):
tracing: Fix FTRACE_RECORD_RECURSION_SIZE Kconfig entry
Raag Jadav (1):
ACPI: bus: allow _UID matching for integer zero
Rahul Rameshbabu (1):
net/mlx5e: Do not produce metadata freelist entries in Tx port ts WQE xmit
Randy Dunlap (1):
LoongArch: Include linux/sizes.h in addrspace.h to prevent build errors
Rick Edgecombe (4):
Drivers: hv: vmbus: Leak pages if set_memory_encrypted() fails
Drivers: hv: vmbus: Track decrypted status in vmbus_gpadl
hv_netvsc: Don't free decrypted memory
uio_hv_generic: Don't free decrypted memory
Rik van Riel (1):
blk-iocost: avoid out of bounds shift
Samuel Holland (1):
cache: sifive_ccache: Partially convert to a platform driver
Sean Christopherson (1):
x86/cpu: Actually turn off mitigations by default for SPECULATION_MITIGATIONS=n
Sebastian Andrzej Siewior (1):
locking: Make rwsem_assert_held_write_nolockdep() build with PREEMPT_RT=y
Shay Drory (2):
net/mlx5: E-switch, store eswitch pointer before registering devlink_param
net/mlx5: Register devlink first under devlink lock
Shradha Gupta (1):
hv/hv_kvp_daemon: Handle IPv4 and Ipv6 combination for keyfile format
Stephen Boyd (1):
drm/msm: Add newlines to some debug prints
Steve French (2):
smb3: fix Open files on server counter going negative
smb3: fix broken reconnect when password changing on the server by allowing password rotation
Steven Rostedt (Google) (1):
ring-buffer: Only update pages_touched when a new page is touched
Sumeet Pawnikar (1):
platform/x86/intel/hid: Add Lunar Lake and Arrow Lake support
Suraj Kandpal (1):
drm/i915/hdcp: Fix get remote hdcp capability function
Sven Eckelmann (1):
batman-adv: Avoid infinite loop trying to resize local TT
Tao Zhou (1):
drm/amdgpu: implement IRQ_STATE_ENABLE for SDMA v4.4.2
Tariq Toukan (1):
net/mlx5: Disallow SRIOV switchdev mode when in multi-PF netdev
Thierry Reding (1):
gpu: host1x: Do not setup DMA for virtual devices
Thomas Bertschinger (1):
bcachefs: create debugfs dir for each btree
Thomas Gleixner (5):
timekeeping: Use READ/WRITE_ONCE() for tick_do_timer_cpu
x86/topology: Don't update cpu_possible_map in topo_set_cpuids()
x86/cpu/amd: Make the CPUID 0x80000008 parser correct
x86/cpu/amd: Make the NODEID_MSR union actually work
x86/cpu/amd: Move TOPOEXT enablement into the topology parser
Thorsten Blum (3):
bcachefs: Rename struct field swap to prevent macro naming collision
compiler.h: Add missing quote in macro comment
zonefs: Use str_plural() to fix Coccinelle warning
Tim Harvey (2):
arm64: dts: freescale: imx8mp-venice-gw72xx-2x: fix USB vbus regulator
arm64: dts: freescale: imx8mp-venice-gw73xx-2x: fix USB vbus regulator
Tim Huang (2):
drm/amd/pm: fixes a random hang in S4 for SMU v13.0.4/11
drm/amdgpu: fix incorrect number of active RBs for gfx11
Uwe Kleine-König (1):
MAINTAINERS: Drop Li Yang as their email address stopped working
Vasant Hegde (3):
iommu/amd: Fix possible irq lock inversion dependency issue
iommu/amd: Do not enable SNP when V2 page table is enabled
iommu/amd: Change log message severity
Vikas Gupta (2):
bnxt_en: Fix possible memory leak in bnxt_rdma_aux_device_init()
bnxt_en: Fix error recovery for RoCE ulp client
Ville Syrjälä (7):
drm/client: Fully protect modes[] with dev->mode_config.mutex
drm/i915/cdclk: Fix CDCLK programming order when pipes are active
drm/i915/cdclk: Fix voltage_level programming edge case
drm/i915/psr: Disable PSR when bigjoiner is used
drm/i915: Disable port sync when bigjoiner is used
drm/i915: Disable live M/N updates when using bigjoiner
drm/i915/vrr: Disable VRR when using bigjoiner
Wachowski, Karol (3):
accel/ivpu: Check return code of ipc->lock init
accel/ivpu: Fix PCI D0 state entry in resume
accel/ivpu: Improve clarity of MMU error messages
Wei Yang (3):
memblock tests: fix undefined reference to `early_pfn_to_nid'
memblock tests: fix undefined reference to `panic'
memblock tests: fix undefined reference to `BIT'
Wenjing Liu (1):
drm/amd/display: always reset ODM mode in context when adding first plane
Wyes Karny (1):
tools/power turbostat: Increase the limit for fd opened
Xiang Chen (2):
scsi: hisi_sas: Handle the NCQ error returned by D2H frame
scsi: hisi_sas: Modify the deadline for ata_wait_after_reset()
Xianting Tian (1):
vhost: correct misleading printing information
Xiubo Li (1):
ceph: switch to use cap_delay_lock for the unlink delay list
Xuchun Shang (1):
iommu/vt-d: Fix wrong use of pasid config
Yang Li (1):
eventfs: Fix kernel-doc comments to functions
Yifan Zhang (2):
drm/amdgpu: add smu 14.0.1 discovery support
drm/amdgpu: differentiate external rev id for gfx 11.5.0
Yu Kuai (2):
raid1: fix use-after-free for original bio in raid1_write_request()
block: fix that blk_time_get_ns() doesn't update time after schedule
Yunfei Dong (3):
media: mediatek: vcodec: adding lock to protect decoder context list
media: mediatek: vcodec: adding lock to protect encoder context list
media: mediatek: vcodec: support 36 bits physical address
Yuquan Wang (1):
cxl/mem: Fix for the index of Clear Event Record Handle
Zack Rusin (1):
drm/vmwgfx: Enable DMA mappings with SEV
Zhang Rui (6):
tools/power/turbostat: Enable MSR_CORE_C1_RES support for ICX
tools/power/turbostat: Cache graphics sysfs path
tools/power/turbostat: Unify graphics sysfs snapshots
tools/power/turbostat: Introduce BIC_SAM_mc6/BIC_SAMMHz/BIC_SAMACTMHz
tools/power/turbostat: Add support for new i915 sysfs knobs
tools/power/turbostat: Add support for Xe sysfs knobs
ZhenGuo Yin (1):
drm/amdgpu: clear set_q_mode_offs when VM changed
Zheng Yejian (1):
kprobes: Fix possible use-after-free issue on kprobe registration
Zhenhua Huang (1):
fs/proc: remove redundant comments from /proc/bootconfig
Zhigang Luo (1):
amd/amdkfd: sync all devices to wait all processes being evicted
Zhongwei (1):
drm/amd/display: Adjust dprefclk by down spread percentage.
lima1002 (1):
drm/amd/swsmu: Update smu v14.0.0 headers to be 14.0.1 compatible
shaoyunl (2):
drm/amdgpu : Add mes_log_enable to control mes log feature
drm/amdgpu : Increase the mes log buffer size as per new MES FW version
* Re: [PATCH] vfs: relax linkat() AT_EMPTY_PATH - aka flink() - requirements
@ 2024-04-13 17:07 99% ` Linus Torvalds
0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-04-13 17:07 UTC (permalink / raw)
To: Christian Brauner
Cc: linux-fsdevel, linux-kernel, Andrew Lutomirski, Peter Anvin,
Alexander Viro, Jan Kara
On Sat, 13 Apr 2024 at 08:16, Christian Brauner <brauner@kernel.org> wrote:
>
> I think it should be ok to allow AT_EMPTY_PATH with NULL because
> userspace can detect whether the kernel allows that by passing
> AT_EMPTY_PATH with a NULL path argument and they would get an error back
> that would tell them that this kernel doesn't support NULL paths.
Yeah, it should return -1 / EFAULT on older kernels.
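As a rough illustration of that detection idea, here is a minimal userspace sketch. It assumes statx() as the probe call (the thread itself only talks about AT_EMPTY_PATH in general), and the helper name kernel_accepts_null_empty_path() is made up for this example. A kernel that does not accept a NULL path is expected to fail the call with EFAULT.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>         /* open(), AT_EMPTY_PATH */
#include <linux/stat.h>    /* struct statx, STATX_BASIC_STATS */
#include <sys/syscall.h>   /* SYS_statx */
#include <unistd.h>        /* syscall(), close() */

#ifndef AT_EMPTY_PATH
#define AT_EMPTY_PATH 0x1000
#endif

/*
 * Hypothetical helper (not from the thread): returns 1 if the running
 * kernel accepts a NULL path together with AT_EMPTY_PATH for statx(),
 * 0 if it rejects the NULL pointer with EFAULT, and -1 on any other
 * error.
 */
int kernel_accepts_null_empty_path(void)
{
	struct statx stx;
	int fd, saved;
	long rc;

	fd = open(".", O_RDONLY | O_DIRECTORY);
	if (fd < 0)
		return -1;

	/* Raw syscall, so no libc wrapper gets to look at the NULL path. */
	rc = syscall(SYS_statx, fd, NULL, AT_EMPTY_PATH,
		     STATX_BASIC_STATS, &stx);
	saved = errno;
	close(fd);

	if (rc == 0)
		return 1;
	return saved == EFAULT ? 0 : -1;
}
```

A caller would probe once at startup and fall back to passing "" when the helper returns 0.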
> I'd like to try a patch for this next week. It's a good opportunity to
> get into some of the more gritty details of this area.
>
> From a rough first glance most AT_EMPTY_PATH users should be covered by
> adapting getname_flags() accordingly.
>
> Imho, this could likely be done by introducing a single struct filename
> null_filename.
It's probably better to try to special-case it entirely.
See commit 9013c51c630a ("vfs: mostly undo glibc turning 'fstat()'
into 'fstatat(AT_EMPTY_PATH)'") and the numbers in there in
particular.
That still leaves performance on the table exactly because it has to
do that extra "get_user()" to check for an empty path, but it avoids
not only the pathname allocation, but also the setup for the pathname
walk.
If we had a NULL case there, I'd expect that fstatat() and fstat()
would perform the same (modulo a couple of instructions).
Of course, the performance of get_user() will vary depending on
microarchitecture. If you don't have SMAP, it's cheap. It's the
STAC/CLAC that is most of the cost, and the exact cost of those will
then depend on implementations - they *could* be much faster than they
are.
Linus
* Re: [PATCH] vfs: relax linkat() AT_EMPTY_PATH - aka flink() - requirements
@ 2024-04-12 17:43 95% ` Linus Torvalds
0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-04-12 17:43 UTC (permalink / raw)
To: Christian Brauner
Cc: linux-fsdevel, linux-kernel, Andrew Lutomirski, Peter Anvin,
Alexander Viro, Jan Kara
Side note: I'd really like to relax another unrelated AT_EMPTY_PATH
issue: we should just allow a NULL path for that case.
The requirement that you pass an actual empty string is insane. It's
wrong. And it adds a noticeable amount of expense to this path,
because just getting the single byte and looking it up is fairly
expensive.
This was more noticeable because glibc at one point (still?) did
newfstatat(6, "", buf, AT_EMPTY_PATH)
when it should have just done a simple "fstat()".
So there were (are?) a *LOT* of AT_EMPTY_PATH users, and they all do a
pointless "let's copy a string from user space".
And yes, I know exactly why AT_EMPTY_PATH exists: because POSIX
traditionally says that a path of "" has to return -ENOENT, not the
current working directory. So AT_EMPTY_PATH basically says "allow the
empty path for lookup".
But while it *allows* the empty path, it doesn't *force* it, so it
doesn't mean "avoid the lookup", and we really end up doing a lot of
extra work just for this case. Just the user string copy is a big deal
because of the whole overhead of accessing user space, but it's also
the whole "allocate memory for the path etc".
If we either said "a NULL path with AT_EMPTY_PATH means empty", or
even just added a new AT_NULL_PATH thing that means "path has to be
NULL, and it means the same as AT_EMPTY_PATH with an empty path", we'd
be able to avoid quite a bit of pointless work.
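To make the equivalence concrete, here is a small userspace sketch (not from the original mail; the function name is invented for illustration) that stats one descriptor both ways. The fstatat() form is the one that makes the kernel copy and inspect the empty string from user space, while plain fstat() needs no path at all.

```c
#define _GNU_SOURCE
#include <fcntl.h>      /* open(), AT_EMPTY_PATH */
#include <sys/stat.h>   /* fstat(), fstatat() */
#include <unistd.h>     /* close() */

#ifndef AT_EMPTY_PATH
#define AT_EMPTY_PATH 0x1000
#endif

/*
 * Hypothetical helper: stat the same descriptor with fstat() and with
 * fstatat(fd, "", ..., AT_EMPTY_PATH).  Returns 1 if both calls
 * describe the same inode, 0 if they differ, -1 if any call fails.
 */
int fstat_matches_empty_path(const char *path)
{
	struct stat a, b;
	int fd, ok;

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;

	if (fstat(fd, &a) != 0 || fstatat(fd, "", &b, AT_EMPTY_PATH) != 0)
		ok = -1;
	else
		ok = (a.st_dev == b.st_dev && a.st_ino == b.st_ino);

	close(fd);
	return ok;
}
```

Both calls return the same information; the only difference is the extra path handling the second form forces on the kernel.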
Linus
* Re: [GIT PULL] tracing: Fixes for v6.9
@ 2024-04-12 16:21 99% ` Linus Torvalds
0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-04-12 16:21 UTC (permalink / raw)
To: Randy Dunlap
Cc: Steven Rostedt, LKML, Masami Hiramatsu, Mathieu Desnoyers,
Andrew Morton, Arnd Bergmann, Prasad Pandit, Yang Li
On Fri, 12 Apr 2024 at 09:20, Randy Dunlap <rdunlap@infradead.org> wrote:
> >>
> >> Argh. What parser is this? We need to fix this craziness.
>
> something that fedora cares about.
> out-of-tree I expect.
Ok, that shit will now be broken immediately by me adding tabs to our
Kconfig file.
Because no, some out-of-tree garbage is not relevant, and if they
don't fix it out of tree, that's *their* problem, not ours.
Linus
* Re: [GIT PULL] tracing: Fixes for v6.9
@ 2024-04-12 16:20 99% ` Linus Torvalds
1 sibling, 0 replies; 200+ results
From: Linus Torvalds @ 2024-04-12 16:20 UTC (permalink / raw)
To: Steven Rostedt
Cc: LKML, Masami Hiramatsu, Mathieu Desnoyers, Andrew Morton,
Arnd Bergmann, Prasad Pandit, Yang Li
On Fri, 12 Apr 2024 at 09:15, Steven Rostedt <rostedt@goodmis.org> wrote:
>
> Note, the tab is here:
Yeah, yeah, I checked.
I also checked that the normal "make defconfig" does not care.
In fact, I'm seriously inclined to make sure that our main Kconfig
file has several tabs in several places, just to make damn sure that
any broken sh*t is fixed.
Because no, the fix is *not* to try to fix invisible problems in the
Kconfig files themselves.
I've pulled your thing, but any parsers that think tabs and spaces are
different need to either be fixed, or they need to be shunned.
Linus
* Re: [GIT PULL] tracing: Fixes for v6.9
@ 2024-04-12 16:07 99% ` Linus Torvalds
0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-04-12 16:07 UTC (permalink / raw)
To: Steven Rostedt
Cc: LKML, Masami Hiramatsu, Mathieu Desnoyers, Andrew Morton,
Arnd Bergmann, Prasad Pandit, Yang Li
On Fri, 12 Apr 2024 at 07:29, Steven Rostedt <rostedt@goodmis.org> wrote:
>
> - Replace bad tab with space in Kconfig for FTRACE_RECORD_RECURSION_SIZE
Argh. What parser is this? We need to fix this craziness.
Yes, yes, we have "tabs and spaces" issues due to the fundamental
brokenness of make, and we can't get rid of *that* bogosity.
But for our own Kconfig files? Whitespace is whitespace (ignoring
crazy unicode extensions), we need to get away from "tabs and spaces
act differently".
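For illustration only, a whitespace-agnostic tokenizer can be as small as the sketch below. This is not the kernel's Kconfig lexer and not the out-of-tree parser under discussion; split_two_words() is a hypothetical helper that simply treats tabs and spaces as the same separator.

```c
#include <string.h>

/*
 * Hypothetical helper: split a Kconfig-style line into its first two
 * whitespace-separated words, treating spaces and tabs identically.
 * The line is modified in place.  Returns the number of words found
 * (0, 1 or 2); unused output pointers are set to NULL.
 */
int split_two_words(char *line, char **first, char **second)
{
	static const char ws[] = " \t\r\n";
	char *p = line + strspn(line, ws);	/* skip leading blanks */

	*first = *second = NULL;
	if (*p == '\0')
		return 0;

	*first = p;
	p += strcspn(p, ws);			/* end of first word */
	if (*p == '\0')
		return 1;
	*p++ = '\0';

	p += strspn(p, ws);			/* blanks between words */
	if (*p == '\0')
		return 1;

	*second = p;
	p[strcspn(p, ws)] = '\0';		/* cut after second word */
	return 2;
}
```

With this, "\tconfig FOO" and "    config\tFOO" yield the same two words, which is the behaviour being asked for.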
Linus
* Re: [PATCH] vfs: relax linkat() AT_EMPTY_PATH - aka flink() - requirements
@ 2024-04-12 15:36 99% ` Linus Torvalds
0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-04-12 15:36 UTC (permalink / raw)
To: Christian Brauner
Cc: Charles Mirabile, Alexander Viro, Jan Kara, linux-fsdevel,
linux-kernel, Andrew Lutomirski, Peter Anvin
On Fri, 12 Apr 2024 at 00:46, Christian Brauner <brauner@kernel.org> wrote:
>
> Hm, I would like to avoid adding an exception for O_PATH.
Ack. It's not the important or really relevant part.
Linus
* Re: [PATCH] vfs: relax linkat() AT_EMPTY_PATH - aka flink() - requirements
@ 2024-04-11 20:22 99% ` Linus Torvalds
0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-04-11 20:22 UTC (permalink / raw)
To: Charles Mirabile
Cc: Christian Brauner, Alexander Viro, Jan Kara, linux-fsdevel,
linux-kernel, Andrew Lutomirski, Peter Anvin
On Thu, 11 Apr 2024 at 13:08, Charles Mirabile <cmirabil@redhat.com> wrote:
>
> The problem with this is that another process might be able to access
> the file via that name during the brief period before it is
> unlinked. If I am not using NFS, I am always going to prefer using
> O_TMPFILE. I would rather be able to do that without restriction even
> if it isn't the most robust solution by your definition.
Oh, absolutely. I think the right pattern is basically some variation of
fd = open(filename, O_TMPFILE | O_WRONLY, 0600);
if (fd < 0) {
char template[...] = ".tmpfileXXXXXX";
fd = mkstemp(template);
unlink(template);
}
.. now act on fd to initialize it ..
linkat(fd, "", AT_FDCWD, "finalname", AT_EMPTY_PATH);
which should work reasonably well in various environments.
Clearly O_TMPFILE is the superior option when it exists. I'm just
saying that anything that *relies* on it existing is dubious.
Linus
* Re: [PATCH] vfs: relax linkat() AT_EMPTY_PATH - aka flink() - requirements
2024-04-11 18:13 99% ` Linus Torvalds
@ 2024-04-11 19:34 89% ` Linus Torvalds
1 sibling, 1 reply; 200+ results
From: Linus Torvalds @ 2024-04-11 19:34 UTC (permalink / raw)
To: Charles Mirabile
Cc: Christian Brauner, Alexander Viro, Jan Kara, linux-fsdevel,
linux-kernel, Andrew Lutomirski, Peter Anvin
On Thu, 11 Apr 2024 at 11:13, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> So while I understand your motivation, I actually think it's actively
> wrong to special-case __O_TMPFILE, because it encourages a pattern
> that is bad.
Just to clarify: I think the ns_capable() change is a good idea and
makes sense. The whole "limited to global root" makes no sense if the
file was opened within a namespace, and I think it always just came
from the better check not being obvious at the point where
AT_EMPTY_PATH was checked for.
Similarly, while the FMODE_PATH test _looks_ very similar to an
O_TMPFILE check, I think it's fundamentally different in a conceptual
sense: not only is FMODE_PATH filesystem-agnostic, a FMODE_PATH file
is *only* useful as a pathname (ie no read/write semantics).
And so if a FMODE_PATH file descriptor is passed in from the outside,
I feel like the "you cannot use this to create a path" is kind of a
fundamentally nonsensical rule.
IOW, whoever is passing that FMODE_PATH file descriptor around must
have actually thought about it, and must have opened it with O_PATH,
and it isn't useful for anything else than as a starting point for a
path lookup.
So while I don't think the __O_TMPFILE exception would necessarily be
wrong per se, I am afraid that it would result in people writing
convenient code that "appears to work" in testing, but then fails when
run in an environment where the directory is mounted over NFS (or any
other filesystem that doesn't do ->tmpfile()).
I am certainly open to being convinced otherwise, but I really think that
the real pattern to aim for should just be "look, I opened the file
myself, then filled in the detail, and now I'm doing a linkat() to
expose it" and that the real protection issue should be that "my
credentials are the same for open and linkat".
The other rules (ie the capability check or the FMODE_PATH case) would
be literally about situations where you *want* to pass things around
between protection domains.
In that context, the ns_capable() and the FMODE_PATH check make sense to me.
In contrast, the __O_TMPFILE check just feels like a random detail.
Hmm?
Anyway, end result of that is that this is what that part of the patch
looks like for me right now:
+ if (flags & LOOKUP_DFD_MATCH_CREDS) {
+ const struct cred *cred = f.file->f_cred;
+ if (!(f.file->f_mode & FMODE_PATH) &&
+ cred != current_cred() &&
+ !ns_capable(cred->user_ns, CAP_DAC_READ_SEARCH)) {
+ fdput(f);
+ return ERR_PTR(-ENOENT);
+ }
+ }
and that _seems_ sensible to me.
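Spelled out as a plain userspace model, the condition in that hunk reads as below. The struct and its field names are stand-ins invented for this sketch; only the boolean logic is taken from the hunk above, and it applies only when LOOKUP_DFD_MATCH_CREDS is set.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel state the hunk inspects. */
struct flink_request {
	bool opened_with_o_path;	/* models f_mode & FMODE_PATH */
	bool same_cred_as_open;		/* models f_cred == current_cred() */
	bool capable_in_file_ns;	/* models ns_capable(cred->user_ns,
					 * CAP_DAC_READ_SEARCH) */
};

/*
 * The kernel hunk returns -ENOENT only when the file is not an O_PATH
 * file, the credentials differ from those used at open time, and the
 * caller lacks the capability in the file's user namespace.  Any one
 * of the three is enough to let the lookup proceed.
 */
bool flink_allowed(const struct flink_request *req)
{
	return req->opened_with_o_path ||
	       req->same_cred_as_open ||
	       req->capable_in_file_ns;
}
```

The common "I opened it myself and now link it" case is the middle term; the other two cover passing descriptors between protection domains.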
But yes, this all has been something that we have failed to do right
for at least a quarter of a century so far, so this needs a *lot* of
thought, even if the patch itself is rather small and looks relatively
obvious.
Linus
* Re: [GIT PULL] turbostat 2024.04.10
@ 2024-04-11 19:14 99% ` Linus Torvalds
0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-04-11 19:14 UTC (permalink / raw)
To: Len Brown; +Cc: Linux PM list, Linux Kernel Mailing List
On Thu, 11 Apr 2024 at 11:20, Len Brown <lenb@kernel.org> wrote:
>
> ISTR that once upon a time at the kernel summit you expressed a
> preference that things like utilities (which sometimes depend on merge
> window changes) come in after rc1 is declared to basically stay out of
> the way.
That may have been true at some point, but probably long ago - the
merge windows have been so reliable that it's just not an issue any
more.
So I'd rather see people hold to the normal release cycle, and aim to
have the rc releases for fixes or major problems.
We also used to allow entirely new drivers etc outside the release
cycle as a "this cannot regress" exception to the normal rules, but
that has also been largely abandoned as the release cycle is just
short enough that it makes no sense.
So the "new hardware support" rule has basically been watered down
over the years, and has become a "new hardware IDs are fine" kind of
rule, where adding basically just a PCI ID or OF matching entry
or similar is still fine, but no more "whole new drivers".
Linus
* Re: [PATCH] vfs: relax linkat() AT_EMPTY_PATH - aka flink() - requirements
@ 2024-04-11 18:13 99% ` Linus Torvalds
2024-04-11 19:34 89% ` Linus Torvalds
0 siblings, 2 replies; 200+ results
From: Linus Torvalds @ 2024-04-11 18:13 UTC (permalink / raw)
To: Charles Mirabile
Cc: Christian Brauner, Alexander Viro, Jan Kara, linux-fsdevel,
linux-kernel, Andrew Lutomirski, Peter Anvin
On Thu, 11 Apr 2024 at 10:35, Charles Mirabile <cmirabil@redhat.com> wrote:
>
> And a slightly dubious addition to bypass these checks for tmpfiles
> across the board.
Does this make sense?
I 100% agree that one of the primary reasons why people want flink()
is that "open tmpfile, finalize contents and permissions, then link
the final result into the filesystem".
But I would expect that the "same credentials as open" check is the
one that really matters.
And __O_TMPFILE is just a special case that might not even be used -
it's entirely possible to just do the same with a real file (ie
non-O_TMPFILE) and link it in place and remove the original.
Not to mention that ->tmpfile() isn't necessarily even available, so
the whole concept of "use O_TMPFILE and then linkat" is actually
broken. It *has* to be able to fall back to a regular file to work at
all on NFS.
So while I understand your motivation, I actually think it's actively
wrong to special-case __O_TMPFILE, because it encourages a pattern
that is bad.
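
For illustration, the "O_TMPFILE if available, regular file otherwise"
pattern described above could look roughly like this in user space. This
is only a sketch: publish_file() and write_all() are hypothetical helper
names, and the O_TMPFILE branch links the file in through /proc/self/fd
(the unprivileged route documented in open(2)) rather than through
AT_EMPTY_PATH.

```c
#define _GNU_SOURCE		/* for O_TMPFILE */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Write the whole buffer, retrying on short writes and EINTR. */
static int write_all(int fd, const char *buf, size_t len)
{
	while (len > 0) {
		ssize_t n = write(fd, buf, len);

		if (n < 0) {
			if (errno == EINTR)
				continue;
			return -1;
		}
		buf += n;
		len -= (size_t)n;
	}
	return 0;
}

/*
 * publish_file() is a hypothetical helper, not a kernel or libc API.
 * It makes dir/name appear only once its contents are complete.
 * Returns 0 on success, -1 on failure (for example if the name exists).
 */
static int publish_file(const char *dir, const char *name,
			const char *data, size_t len)
{
	char final[PATH_MAX], tmp[PATH_MAX], proc[64];
	int fd, ret;

	assert(dir && name && data);
	if (snprintf(final, sizeof(final), "%s/%s", dir, name) >= (int)sizeof(final))
		return -1;

	/* Preferred: an unnamed file, linked in via its /proc/self/fd entry. */
	fd = open(dir, O_TMPFILE | O_WRONLY, 0600);
	if (fd >= 0) {
		snprintf(proc, sizeof(proc), "/proc/self/fd/%d", fd);
		ret = write_all(fd, data, len);
		if (ret == 0)
			ret = linkat(AT_FDCWD, proc, AT_FDCWD, final,
				     AT_SYMLINK_FOLLOW);
		close(fd);
		if (ret == 0)
			return 0;
		/* Otherwise fall through and try the regular-file route. */
	}

	/* Fallback: a regular temporary file, hard-linked, then removed. */
	if (snprintf(tmp, sizeof(tmp), "%s/.%s.XXXXXX", dir, name) >= (int)sizeof(tmp))
		return -1;
	fd = mkstemp(tmp);
	if (fd < 0)
		return -1;
	ret = write_all(fd, data, len);
	close(fd);
	if (ret == 0)
		ret = link(tmp, final);
	unlink(tmp);
	return ret == 0 ? 0 : -1;
}
```

A caller would use it as publish_file("/some/dir", "result", buf, len).
Both branches refuse to overwrite an existing name, so the caller sees
the same behavior whether or not the filesystem implements ->tmpfile().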
Linus
^ permalink raw reply [relevance 99%]
* Re: [tip: locking/core] locking/pvqspinlock: Use try_cmpxchg_acquire() in trylock_clear_pending()
@ 2024-04-11 16:31 99% ` Linus Torvalds
0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-04-11 16:31 UTC (permalink / raw)
To: linux-kernel
Cc: linux-tip-commits, Uros Bizjak, Ingo Molnar, Waiman Long, x86
On Thu, 11 Apr 2024 at 06:33, tip-bot2 for Uros Bizjak
<tip-bot2@linutronix.de> wrote:
>
> Use try_cmpxchg_acquire(*ptr, &old, new) instead of
> cmpxchg_relaxed(*ptr, old, new) == old in trylock_clear_pending().
The above commit message is horribly confusing and wrong.
I was going "that's not right", because it says "use acquire instead
of relaxed" memory ordering, and then goes on to say "No functional
change intended".
But it turns out the *code* was always acquire, and it's only the
commit message that is wrong, presumably due to a bit too much
cut-and-paste.
But please fix the commit message, and use the right memory ordering
in the explanations too.
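
To make the point concrete, here is a user-space model of the two
spellings, using C11 atomics as a stand-in for the kernel primitives.
The toy_* names and the two constants are assumptions made for the
sketch, not the kernel's definitions. Both forms use acquire ordering on
success, so converting one into the other is not a functional change.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Toy values, standing in for the real pending/locked encodings. */
#define TOY_PENDING_VAL	0x100u
#define TOY_LOCKED_VAL	0x001u

/* Models cmpxchg_acquire(): returns the value that was found at *ptr. */
static unsigned int toy_cmpxchg_acquire(_Atomic unsigned int *ptr,
					unsigned int old, unsigned int newval)
{
	atomic_compare_exchange_strong_explicit(ptr, &old, newval,
						memory_order_acquire,
						memory_order_relaxed);
	return old;	/* unchanged on success, observed value on failure */
}

/* Models try_cmpxchg_acquire(): returns success, updates *old on failure. */
static bool toy_try_cmpxchg_acquire(_Atomic unsigned int *ptr,
				    unsigned int *old, unsigned int newval)
{
	return atomic_compare_exchange_strong_explicit(ptr, old, newval,
						       memory_order_acquire,
						       memory_order_relaxed);
}

/* The "cmpxchg_acquire(...) == old" form. */
static bool trylock_clear_pending_before(_Atomic unsigned int *lock)
{
	return toy_cmpxchg_acquire(lock, TOY_PENDING_VAL, TOY_LOCKED_VAL) ==
	       TOY_PENDING_VAL;
}

/* The "try_cmpxchg_acquire(...)" form: same ordering, same result. */
static bool trylock_clear_pending_after(_Atomic unsigned int *lock)
{
	unsigned int old = TOY_PENDING_VAL;

	return toy_try_cmpxchg_acquire(lock, &old, TOY_LOCKED_VAL);
}
```

The only difference between the two is how the result is reported: the
try_ form hands back a boolean and writes the observed value into "old"
on failure, which tends to generate better code on x86.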
Linus
^ permalink raw reply [relevance 99%]
* Re: [PATCH] vfs: relax linkat() AT_EMPTY_PATH - aka flink() - requirements
@ 2024-04-11 16:21 99% ` Linus Torvalds
0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-04-11 16:21 UTC (permalink / raw)
To: Christian Brauner
Cc: Alexander Viro, Jan Kara, linux-fsdevel, linux-kernel,
Andrew Lutomirski, Peter Anvin
On Thu, 11 Apr 2024 at 05:25, Christian Brauner <brauner@kernel.org> wrote:
>
> Btw, I think we should try to avoid putting this into path_init() and
> confine this to linkat() itself imho. The way I tried to do it was by
> presetting a root for filename_lookup(); means we also don't need a
> LOOKUP_* flag for this as this is mostly a linkat thing.
So I had the exact reverse reaction to your patch - I felt that using
that 'root' thing was the hacky case.
The lookup flag may be limited to linkat(), but it makes the code
smaller and clearer, and avoids having multiple places where we check
dfd.
And that 'root' argument really is the special hacky case, and is not
actually used by any normal system call path, and is meant for
internal kernel use rather than any generic case.
Linus
^ permalink raw reply [relevance 99%]
* Re: [PATCH] vfs: relax linkat() AT_EMPTY_PATH - aka flink() - requirements
@ 2024-04-11 16:15 93% ` Linus Torvalds
1 sibling, 1 reply; 200+ results
From: Linus Torvalds @ 2024-04-11 16:15 UTC (permalink / raw)
To: Christian Brauner, Charles Mirabile
Cc: Alexander Viro, Jan Kara, linux-fsdevel, linux-kernel,
Andrew Lutomirski, Peter Anvin
On Thu, 11 Apr 2024 at 02:05, Christian Brauner <brauner@kernel.org> wrote:
>
> I had a similar discussion a while back someone requested that we relax
> permissions so linkat can be used in containers.
Hmm.
Ok, that's different - it just wants root to be able to do it, but
"root" being just in the container itself.
I don't think that's all that useful - I think one of the issues with
linkat(AT_EMPTY_PATH) is exactly that "it's only useful for root",
which means that it's effectively useless. Inside a container or out.
Because very few loads run as root-only (and fewer still run with any
capability bits that aren't just "root or nothing").
Before I did all this, I did a Debian code search for linkat with
AT_EMPTY_PATH, and it's almost non-existent. And I think it's exactly
because of this "when it's only useful for root, it's hardly useful at
all" issue.
(Of course, my Debian code search may have been broken).
So I suspect your special case is actually largely useless, and what
the container user actually wanted was what my patch does, but they
didn't think that was possible, so they asked to just extend the
"root" notion.
I've added Charles to the Cc.
But yes, with my patch, it would now be trivial to make that
capable(CAP_DAC_READ_SEARCH)
test also be
ns_capable(f.file->f_cred->user_ns, CAP_DAC_READ_SEARCH)
instead. I suspect not very many would care any more, but it does seem
conceptually sensible.
As to your patch - I don't like your nd->root games in that patch at
all. That looks odd.
Yes, it makes lookup ignore the dfd (so you avoid the TOCTOU issue),
but it also makes lookup ignore "/". Which happens to be ok with an
empty path, but still...
So it feels to me like that patch of yours mis-uses something that is
just meant for vfs_path_lookup().
It may happen to work, but it smells really odd to me.
Linus
^ permalink raw reply [relevance 93%]
* Re: [PATCH] vfs: relax linkat() AT_EMPTY_PATH - aka flink() - requirements
2024-04-11 0:10 76% [PATCH] vfs: relax linkat() AT_EMPTY_PATH - aka flink() - requirements Linus Torvalds
2024-04-11 0:20 99% ` Linus Torvalds
@ 2024-04-11 2:39 96% ` Linus Torvalds
2 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-04-11 2:39 UTC (permalink / raw)
To: Alexander Viro, Christian Brauner, Jan Kara
Cc: linux-fsdevel, linux-kernel, Andrew Lutomirski, Peter Anvin
On Wed, 10 Apr 2024 at 17:10, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> + if (flags & LOOKUP_DFD_MATCH_CREDS) {
> + if (f.file->f_cred != current_cred() &&
> + !capable(CAP_DAC_READ_SEARCH)) {
> + fdput(f);
> + return ERR_PTR(-ENOENT);
> + }
> + }
Side note: I suspect that this could possibly be relaxed further, by
making the rule be that if something has been explicitly opened to be
used as a path (ie O_PATH was used at open time), we can link to it
even across different credentials.
IOW, the above could perhaps even be
+ if (flags & LOOKUP_DFD_MATCH_CREDS) {
+ if (!(f.file->f_mode & FMODE_PATH) &&
+ f.file->f_cred != current_cred() &&
+ !capable(CAP_DAC_READ_SEARCH)) {
+ fdput(f);
+ return ERR_PTR(-ENOENT);
+ }
+ }
which would _allow_ people to pass in paths as file descriptors if
they actually wanted to.
After all, the only thing you can do with an O_PATH file descriptor is
to use it as a path - there would be no other reason to use O_PATH in
the first place. So if you now pass it to somebody else, clearly you
are intentionally trying to make it available *as* a path.
So you could imagine doing something like this:
// Open path as root
int fd = open("filename", O_PATH);
// drop privileges
// setresuid(..) or chmod() or enter new namespace or whatever
linkat(fd, "", AT_FDCWD, "newname", AT_EMPTY_PATH);
and it would open the path with one set of privileges, but then
intentionally go into a more restricted mode and create a link to the
source within that restricted environment.
Sensible? Who knows. I'm just throwing this out as another "this may
be the solution to our historical flink() issues".
Linus
^ permalink raw reply [relevance 96%]
* Re: [PATCH] vfs: relax linkat() AT_EMPTY_PATH - aka flink() - requirements
2024-04-11 0:10 76% [PATCH] vfs: relax linkat() AT_EMPTY_PATH - aka flink() - requirements Linus Torvalds
@ 2024-04-11 0:20 99% ` Linus Torvalds
2024-04-11 2:39 96% ` Linus Torvalds
2 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-04-11 0:20 UTC (permalink / raw)
To: Alexander Viro, Christian Brauner, Jan Kara
Cc: linux-fsdevel, linux-kernel, Andrew Lutomirski, Peter Anvin
On Wed, 10 Apr 2024 at 17:10, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> "The definition of insanity is doing the same thing over and over
> again and expecting different results”
Note that I'm sending this patch out not because I plan to commit it,
but to see if people can shoot holes in the concept.
There's a reason why people have tried to do this for decades.
There's also a reason why it has not worked out well.
Linus
^ permalink raw reply [relevance 99%]
* [PATCH] vfs: relax linkat() AT_EMPTY_PATH - aka flink() - requirements
@ 2024-04-11 0:10 76% Linus Torvalds
2024-04-11 0:20 99% ` Linus Torvalds
` (2 more replies)
0 siblings, 3 replies; 200+ results
From: Linus Torvalds @ 2024-04-11 0:10 UTC (permalink / raw)
To: Alexander Viro, Christian Brauner, Jan Kara
Cc: linux-fsdevel, linux-kernel, Linus Torvalds, Andrew Lutomirski,
Peter Anvin
"The definition of insanity is doing the same thing over and over
again and expecting different results”
We've tried to do this before, most recently with commit bb2314b47996
("fs: Allow unprivileged linkat(..., AT_EMPTY_PATH) aka flink") about a
decade ago.
But the effort goes back even further than that, eg this thread back
from 1998 that is so old that we don't even have it archived in lore:
https://lkml.org/lkml/1998/3/10/108
which also points out some of the reasons why it's dangerous.
Or, how about then in 2003:
https://lkml.org/lkml/2003/4/6/112
where we went through some of the same arguments, just with different
people involved.
In particular, having access to a file descriptor does not necessarily
mean that you have access to the path that was used for lookup, and
there may be very good reasons why you absolutely must not have access
to a path to said file.
For example, if we were passed a file descriptor from the outside into
some limited environment (think chroot, but also user namespaces etc) a
'flink()' system call could now make that file visible inside a context
where it's not supposed to be visible.
In the process the user may also be able to re-open it with permissions
that the original file descriptor did not have (eg a read-only file
descriptor may be associated with an underlying file that is writable).
Another variation on this is if somebody else (typically root) opens a
file in a directory that is not accessible to others, and passes the
file descriptor on as a read-only file. Again, the access to the file
descriptor does not imply that you should have access to a path to the
file in the filesystem.
So while we have tried this several times in the past, it never works.
The last time we did this, that commit bb2314b47996 quickly got reverted
again in commit f0cc6ffb8ce8 (Revert "fs: Allow unprivileged linkat(...,
AT_EMPTY_PATH) aka flink"), with a note saying "We may re-do this once
the whole discussion about the interface is done".
Well, the discussion is long done, and didn't come to any resolution.
There's no question that 'flink()' would be a useful operation, but it's
a dangerous one.
However, it does turn out that since 2008 (commit d76b0d9b2d87: "CRED:
Use creds in file structs") we have had a fairly straightforward way to
check whether the file descriptor was opened by the same credentials as
the credentials of the flink().
That allows the most common patterns that people want to use, which tend
to be to either open the source carefully (ie using the openat2()
RESOLVE_xyz flags, and/or checking ownership with fstat() before
linking), or to use O_TMPFILE and fill in the file contents before it's
exposed to the world with linkat().
But it also means that if the file descriptor was opened by somebody
else, or we've gone through a credentials change since, the operation no
longer works (unless we have CAP_DAC_READ_SEARCH capabilities, as
before).
Note that the credential equality check is done by using pointer
equality, which means that it's not enough that you have effectively the
same user - they have to be literally identical, since our credentials
are using copy-on-write semantics.
So you can't change your credentials to something else and try to change
it back to the same ones between the open() and the linkat(). This is
not meant to be some kind of generic permission check, this is literally
meant as a "the open and link calls are 'atomic' wrt user credentials"
check.
It also means that you can't just move things between namespaces,
because the credentials aren't just a list of uid's and gid's: they
include the pointer to the user_ns that the capabilities are relative
to.
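
A toy user-space model of why pointer equality is stricter than "same
uid". Everything here (struct toy_cred, toy_prepare_creds(),
toy_same_creds()) is hypothetical and only mimics the copy-on-write
behavior described above; it is not the kernel's struct cred API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Toy stand-in for a credentials object. */
struct toy_cred {
	unsigned int uid;
	unsigned int gid;
	const void *user_ns;	/* namespace the capabilities are relative to */
};

/* Copy-on-write: any change starts from a fresh copy of the old creds. */
static struct toy_cred *toy_prepare_creds(const struct toy_cred *old)
{
	struct toy_cred *copy = malloc(sizeof(*copy));

	if (copy)
		*copy = *old;
	return copy;
}

/* The check compares identity, not contents. */
static bool toy_same_creds(const struct toy_cred *opened_with,
			   const struct toy_cred *current)
{
	return opened_with == current;
}
```

With this model, changing the uid away and back yields an object with
identical contents but a different address, so the check fails, which is
exactly the "open and link are atomic wrt credentials" property.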
So let's try this one more time and see if maybe this approach ends up
being workable after all.
Cc: Andrew Lutomirski <luto@kernel.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Peter Anvin <hpa@zytor.com>
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
fs/namei.c | 17 ++++++++++++-----
include/linux/namei.h | 1 +
2 files changed, 13 insertions(+), 5 deletions(-)
diff --git a/fs/namei.c b/fs/namei.c
index c5b2a25be7d0..3c684014eb40 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -2422,6 +2422,14 @@ static const char *path_init(struct nameidata *nd, unsigned flags)
if (!f.file)
return ERR_PTR(-EBADF);
+ if (flags & LOOKUP_DFD_MATCH_CREDS) {
+ if (f.file->f_cred != current_cred() &&
+ !capable(CAP_DAC_READ_SEARCH)) {
+ fdput(f);
+ return ERR_PTR(-ENOENT);
+ }
+ }
+
dentry = f.file->f_path.dentry;
if (*s && unlikely(!d_can_lookup(dentry))) {
@@ -4641,14 +4649,13 @@ int do_linkat(int olddfd, struct filename *old, int newdfd,
goto out_putnames;
}
/*
- * To use null names we require CAP_DAC_READ_SEARCH
+ * To use null names we require CAP_DAC_READ_SEARCH or
+ * that the open-time creds of the dfd matches current.
* This ensures that not everyone will be able to create
* handlink using the passed filedescriptor.
*/
- if (flags & AT_EMPTY_PATH && !capable(CAP_DAC_READ_SEARCH)) {
- error = -ENOENT;
- goto out_putnames;
- }
+ if (flags & AT_EMPTY_PATH)
+ how |= LOOKUP_DFD_MATCH_CREDS;
if (flags & AT_SYMLINK_FOLLOW)
how |= LOOKUP_FOLLOW;
diff --git a/include/linux/namei.h b/include/linux/namei.h
index 74e0cc14ebf8..678ffe4acf99 100644
--- a/include/linux/namei.h
+++ b/include/linux/namei.h
@@ -44,6 +44,7 @@ enum {LAST_NORM, LAST_ROOT, LAST_DOT, LAST_DOTDOT};
#define LOOKUP_BENEATH 0x080000 /* No escaping from starting point. */
#define LOOKUP_IN_ROOT 0x100000 /* Treat dirfd as fs root. */
#define LOOKUP_CACHED 0x200000 /* Only do cached lookup */
+#define LOOKUP_DFD_MATCH_CREDS 0x400000 /* Require that dfd creds match current */
/* LOOKUP_* flags which do scope-related checks based on the dirfd. */
#define LOOKUP_IS_SCOPED (LOOKUP_BENEATH | LOOKUP_IN_ROOT)
--
2.44.0.330.g4d18c88175
^ permalink raw reply related [relevance 76%]
* Re: [GIT PULL for v6.9-rc4] media fixes
@ 2024-04-10 20:53 98% ` Linus Torvalds
0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-04-10 20:53 UTC (permalink / raw)
To: Mauro Carvalho Chehab
Cc: Greg Kroah-Hartman, Andrew Morton, Linux Media Mailing List,
Linux Kernel Mailing List
On Wed, 10 Apr 2024 at 09:39, Mauro Carvalho Chehab <mchehab@kernel.org> wrote:
>
> - some fixes causing oops on mediatec vcodec encoder/decoder.
Well, I certainly hope it's not the fixes that cause oopses. That
would be the opposite of a fix.
However, having fixed that, I also find some of the fixes in here
rather broken: commit d353c3c34af0 ("media: mediatek: vcodec: support
36 bits physical address") has a "fix" for a cast like this:
- dec->bs_dma = (unsigned long)bs->dma_addr;
+ dec->bs_dma = (uint64_t)bs->dma_addr;
but the underlying problem was in fact that the cast was WRONG TO EVEN EXIST.
Both 'bs_dma' and 'dma_addr' are integers. The cast is pointless and
wrong. It makes the code look like it is doing something else than
what it's doing, and that something else would be wrong anyway (ie if
it is a cast from a pointer, it would be doubly wrong).
IOW, as far as I can tell, the fix *should* have been to just remove
the cast entirely since it was pointless.
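
A small user-space sketch of the difference. The type names are
stand-ins chosen for the example: toy_dma_addr_t plays a 64-bit
dma_addr_t, and toy_ulong32_t plays "unsigned long" on a 32-bit kernel.
The actual field widths in the driver are an assumption here.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t toy_dma_addr_t;	/* stand-in for a 64-bit dma_addr_t */
typedef uint32_t toy_ulong32_t;		/* stand-in for a 32-bit unsigned long */

/* The old code: the cast silently drops everything above bit 31. */
static uint64_t toy_copy_with_ulong_cast(toy_dma_addr_t addr)
{
	return (toy_ulong32_t)addr;
}

/* No cast at all: an integer-to-integer assignment just works. */
static uint64_t toy_copy_without_cast(toy_dma_addr_t addr)
{
	return addr;
}
```

For a 36-bit address such as 0x812345678 the first version returns
0x12345678 and the second returns the address unchanged. A (uint64_t)
cast gives the same result as no cast, which is why it adds nothing.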
I've pulled this, but please people - make the pull request
description make sense, and when fixing bugs, please think about the
code a bit more than just do a mindless conversion.
Linus
^ permalink raw reply [relevance 98%]
* Re: [GIT PULL] turbostat 2024.04.10
@ 2024-04-10 20:18 99% ` Linus Torvalds
0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-04-10 20:18 UTC (permalink / raw)
To: Len Brown; +Cc: Linux PM list, Linux Kernel Mailing List
On Wed, 10 Apr 2024 at 06:24, Len Brown <lenb@kernel.org> wrote:
>
> Turbostat version 2024.04.10
Tssk. Things like this should still come in during the merge window
and preferably be in linux-next.
I have pulled this, since it's obviously just tooling (and the
maintainer file pattern update), but still...
Linus
^ permalink raw reply [relevance 99%]
* Re: [PATCH cmpxchg 08/14] parisc: add u16 support to cmpxchg()
@ 2024-04-08 20:10 99% ` Linus Torvalds
0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-04-08 20:10 UTC (permalink / raw)
To: Paul E. McKenney
Cc: linux-arch, linux-kernel, elver, akpm, tglx, peterz, dianders,
pmladek, Arnd Bergmann, Al Viro
On Mon, 8 Apr 2024 at 10:50, Paul E. McKenney <paulmck@kernel.org> wrote:
>
> And get rid of manual truncation down to u8, etc. in there - the
> only reason for those is to avoid bogus warnings about constant
> truncation from sparse, and those are easy to avoid by turning
> that switch into conditional expression.
I support the use of the conditional, but why add the 16-bit case when
it turns out we don't want it after all?
Linus
^ permalink raw reply [relevance 99%]
* Re: [WIP 0/3] Memory model and atomic API in Rust
@ 2024-04-08 20:05 85% ` Linus Torvalds
0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-04-08 20:05 UTC (permalink / raw)
To: Al Viro
Cc: Matthew Wilcox, Philipp Stanner, Kent Overstreet, Boqun Feng,
rust-for-linux, linux-kernel, linux-arch, llvm, Miguel Ojeda,
Alex Gaynor, Wedson Almeida Filho, Gary Guo,
Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
Alan Stern, Andrea Parri, Will Deacon, Peter Zijlstra,
Nicholas Piggin, David Howells, Jade Alglave, Luc Maranget,
Paul E. McKenney, Akira Yokosawa, Daniel Lustig, Joel Fernandes,
Nathan Chancellor, Nick Desaulniers, kent.overstreet,
Greg Kroah-Hartman, elver, Mark Rutland, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
Catalin Marinas, linux-arm-kernel, linux-fsdevel
On Mon, 8 Apr 2024 at 11:14, Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> FWIW, PA-RISC is no better - the same "fetch and replace with constant"
> kind of primitive as for sparc32, only the constant is (u32)0 instead
> of (u8)~0. And unlike sparc64, 64bit variant didn't get better.
Heh. The thing about PA-RISC is that it is actually *so* much worse
that it was never useful for an arithmetic type.
IOW, the fact that sparc used just a byte meant that the atomic_t
hackery on sparc still gave us 24 useful bits in a 32-bit atomic_t.
So long ago, we used to have an arithmetic atomic_t that was 32-bit on
all sane architectures, but only had a 24-bit range on sparc.
And I know you know all this, I'm just explaining the horror for the audience.
On PA-RISC you couldn't do that horrendous trick, so parisc just used
the "we use a hashed spinlock for all atomics", and "atomic_t" was a
regular full-sized integer type.
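
The "hashed spinlock for all atomics" idea can be sketched in user space
as follows. This is an illustration only: the toy_* names and the table
size are made up, a C11 atomic_flag stands in for the one primitive the
hardware does provide, and the real kernel code also has to deal with
interrupts, which this sketch ignores.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define TOY_HASH_SIZE 4

/* A small table of locks shared by every "atomic" variable. */
static atomic_flag toy_locks[TOY_HASH_SIZE] = {
	ATOMIC_FLAG_INIT, ATOMIC_FLAG_INIT, ATOMIC_FLAG_INIT, ATOMIC_FLAG_INIT
};

/* Pick a lock from the address of the variable being operated on. */
static atomic_flag *toy_lock_for(const void *addr)
{
	uintptr_t a = (uintptr_t)addr;

	return &toy_locks[(a >> 4) % TOY_HASH_SIZE];
}

/* The counter itself is a plain full-sized int; the lock lives elsewhere. */
static int toy_atomic_add_return(int i, int *v)
{
	atomic_flag *lock = toy_lock_for(v);
	int ret;

	while (atomic_flag_test_and_set_explicit(lock, memory_order_acquire))
		;	/* spin until the hashed lock is free */
	*v += i;
	ret = *v;
	atomic_flag_clear_explicit(lock, memory_order_release);
	return ret;
}
```

Because the lock is found by hashing the address, no bits of the value
are sacrificed, which is the contrast with the 24-bit sparc scheme.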
Anyway, the sparc 24-bit atomics were actually replaced by the PA-RISC
version back twenty years ago (almost to the day):
https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git/commit/?id=373f1583c5c5
and while we still had some left-over of that horror in the git tree
up until 2011 (until commit 348738afe530: "sparc32: drop unused
atomic24 support") we probably should have made the
"arch_atomic_xyz()" ops work on generic types rather than "atomic_t"
for a long long time, so that you could use them on other things than
"atomic_t" and friends.
You can see the casting horror here, for example:
include/asm-generic/bitops/atomic.h
where we do that cast from "volatile unsigned long *p" to
"atomic_long_t *" just to use the raw_atomic_long_xyz() operations.
It would make more sense if the raw atomics took that "native"
volatile unsigned long pointer directly.
(And here that "volatile" is not because it's necessarily used as a
volatile - it is - but simply because it's the most permissive type of
pointer. You can see other places using "const volatile unsigned long"
pointers for the same reason: passing in a non-const or non-volatile
pointer is perfectly fine).
Linus
^ permalink raw reply [relevance 85%]
* Re: More annoying code generation by clang
2024-04-08 18:32 99% ` Linus Torvalds
@ 2024-04-08 19:42 77% ` Linus Torvalds
0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-04-08 19:42 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Ingo Molnar, Thomas Gleixner, Peter Anvin,
the arch/x86 maintainers, Linux Kernel Mailing List
[-- Attachment #1: Type: text/plain, Size: 1904 bytes --]
On Mon, 8 Apr 2024 at 11:32, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> It's been reported long ago, it seems to be hard to fix.
>
> I suspect the issue is that the inline asm format is fairly closely
> related to the gcc machine descriptions (look at the machine
> descriptor files in gcc, and if you can ignore the horrid LISP-style
> syntax you see how close they are).
Actually, one of the github issues pages has more of an explanation
(and yes, it's tied to impedance issues between the inline asm syntax
and how clang works):
https://github.com/llvm/llvm-project/issues/20571#issuecomment-980933442
so I wrote more of a commit log and did that "ASM_SOURCE_G" thing
(except I decided to call it "input" instead of "source", since that's
the standard inline asm language).
This version also has that output size fixed, and the commit message
talks about it.
This does *not* fix other inline asms to use "ASM_INPUT_G/RM".
I think it's mainly some of the bitop code that people have noticed
before - fls and variable_ffs() and friends.
I suspect clang is more common in the arm64 world than it is for
x86-64 kernel developers, and arm64 inline asm basically never uses
"rm" or "g" since arm64 doesn't have instructions that take either a
register or a memory operand.
Anyway, with gcc this generates
cmp (%rdx),%ebx; sbb %rax,%rax # _7->max_fds, fd, __mask
IOW, it uses the memory location for "max_fds". It couldn't do that
before, because it used to think that it always had to do the compare
in 64 bits, and the memory location is only 32-bit.
With clang, this generates
movl (%rcx), %eax
cmpl %eax, %edi
sbbq %rdi, %rdi
which has that extra register use, but is at least much better than
what it used to generate with crazy "load into register, spill to
stack, then compare against stack contents".
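
For readers who don't speak cmp/sbb: the value being computed is the
"0 - (index < size)" mask. A plain C rendering of just the arithmetic
is below. It is a sketch of the semantics only, with hypothetical toy_*
names; a compiler is free to turn this into a branch, so unlike the asm
it is not a speculation barrier.

```c
#include <assert.h>
#include <limits.h>

/* All ones when index is in range, zero otherwise. */
static unsigned long toy_index_mask(unsigned long index, unsigned long size)
{
	return 0UL - (unsigned long)(index < size);
}

/* In-range indexes pass through, out-of-range ones are clamped to 0. */
static unsigned long toy_index_nospec(unsigned long index, unsigned long size)
{
	return index & toy_index_mask(index, size);
}
```

The sbb instruction produces the same all-ones-or-zero value directly
from the carry flag that cmp sets, without any conditional jump.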
Linus
[-- Attachment #2: 0001-x86-improve-array_index_mask_nospec-code-generation.patch --]
[-- Type: text/x-patch, Size: 4554 bytes --]
From 7779d285040bab685296da2cd0afe9d2d7b58969 Mon Sep 17 00:00:00 2001
From: Linus Torvalds <torvalds@linux-foundation.org>
Date: Mon, 8 Apr 2024 11:38:30 -0700
Subject: [PATCH] x86: improve array_index_mask_nospec() code generation
Don't force the inputs to be 'unsigned long', when the comparison can
easily be done in 32-bit if that's more appropriate.
Note that while we can look at the inputs to choose an appropriate size
for the compare instruction, the output is fixed at 'unsigned long'.
That's not technically optimal either, since a 32-bit 'sbbl' would often
be sufficient.
But for the outgoing mask we don't know how the mask ends up being used
(ie we have uses that have an incoming 32-bit array index, but end up
using the mask for other things). That said, it only costs the extra
REX prefix to always generate the 64-bit mask.
[ A 'sbbl' also always technically generates a 64-bit mask, but with the
upper 32 bits clear: that's fine for when the incoming index that will
be masked is already 32-bit, but not if you use the mask to mask a
pointer afterwards, like the file table lookup does ]
Also, work around clang problems with asm constraints that have multiple
possibilities, particularly "g" and "rm". Clang seems to turn inputs
like that into the most generic form, which is the memory input - but to
make matters worse, clang won't even use a possible original memory
location, but will spill the value to stack, and use the stack for the
asm input.
See
https://github.com/llvm/llvm-project/issues/20571#issuecomment-980933442
for some explanation of why clang has this strange behavior, but the end
result is that "g" and "rm" really end up generating horrid code.
Link: https://github.com/llvm/llvm-project/issues/20571
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
arch/x86/include/asm/barrier.h | 24 ++++++++++--------------
include/linux/compiler-clang.h | 12 ++++++++++++
include/linux/compiler_types.h | 9 +++++++++
3 files changed, 31 insertions(+), 14 deletions(-)
diff --git a/arch/x86/include/asm/barrier.h b/arch/x86/include/asm/barrier.h
index 66e57c010392..234fd892e39e 100644
--- a/arch/x86/include/asm/barrier.h
+++ b/arch/x86/include/asm/barrier.h
@@ -33,20 +33,16 @@
* Returns:
* 0 - (index < size)
*/
-static __always_inline unsigned long array_index_mask_nospec(unsigned long index,
- unsigned long size)
-{
- unsigned long mask;
-
- asm volatile ("cmp %1,%2; sbb %0,%0;"
- :"=r" (mask)
- :"g"(size),"r" (index)
- :"cc");
- return mask;
-}
-
-/* Override the default implementation from linux/nospec.h. */
-#define array_index_mask_nospec array_index_mask_nospec
+#define array_index_mask_nospec(idx,sz) ({ \
+ typeof((idx)+(sz)) __idx = (idx); \
+ typeof(__idx) __sz = (sz); \
+ unsigned long __mask; \
+ asm volatile ("cmp %1,%2; sbb %0,%0" \
+ :"=r" (__mask) \
+ :ASM_INPUT_G (__sz), \
+ "r" (__idx) \
+ :"cc"); \
+ __mask; })
/* Prevent speculative execution past this barrier. */
#define barrier_nospec() asm volatile("lfence":::"memory")
diff --git a/include/linux/compiler-clang.h b/include/linux/compiler-clang.h
index 49feac0162a5..0dee061fd7a6 100644
--- a/include/linux/compiler-clang.h
+++ b/include/linux/compiler-clang.h
@@ -118,3 +118,15 @@
#define __diag_ignore_all(option, comment) \
__diag_clang(13, ignore, option)
+
+/*
+ * clang has horrible behavior with "g" or "rm" constraints for asm
+ * inputs, turning them into something worse than "m". Avoid using
+ * constraints with multiple possible uses (but "ir" seems to be ok):
+ *
+ * https://github.com/llvm/llvm-project/issues/20571
+ * https://github.com/llvm/llvm-project/issues/30873
+ * https://github.com/llvm/llvm-project/issues/34837
+ */
+#define ASM_INPUT_G "ir"
+#define ASM_INPUT_RM "r"
diff --git a/include/linux/compiler_types.h b/include/linux/compiler_types.h
index 2abaa3a825a9..e53acd310545 100644
--- a/include/linux/compiler_types.h
+++ b/include/linux/compiler_types.h
@@ -380,6 +380,15 @@ struct ftrace_likely_data {
#define asm_goto_output(x...) asm volatile goto(x)
#endif
+/*
+ * Clang has trouble with constraints with multiple
+ * alternative behaviors (mainly "g" and "rm").
+ */
+#ifndef ASM_INPUT_G
+ #define ASM_INPUT_G "g"
+ #define ASM_INPUT_RM "rm"
+#endif
+
#ifdef CONFIG_CC_HAS_ASM_INLINE
#define asm_inline asm __inline
#else
--
2.44.0.330.g4d18c88175
^ permalink raw reply related [relevance 77%]
* Re: More annoying code generation by clang
@ 2024-04-08 18:32 99% ` Linus Torvalds
2024-04-08 19:42 77% ` Linus Torvalds
0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-04-08 18:32 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Ingo Molnar, Thomas Gleixner, Peter Anvin,
the arch/x86 maintainers, Linux Kernel Mailing List
On Mon, 8 Apr 2024 at 01:49, Peter Zijlstra <peterz@infradead.org> wrote:
>
> Should this not carry a comment about the "ir" constraint wanting to be
> "g" except for clang being daft?
Yeah. Except I think I'll do something like
/* Clang messes up "g" as an asm source */
#define ASM_SOURCE_G "ir"
in <linux/compiler-clang.h>, and
#ifndef ASM_SOURCE_G
#define ASM_SOURCE_G "g"
#endif
in linux/compiler.h.
> (I really wish clang would go fix this, it keeps coming up time and
> again).
It's been reported long ago, it seems to be hard to fix.
I suspect the issue is that the inline asm format is fairly closely
related to the gcc machine descriptions (look at the machine
descriptor files in gcc, and if you can ignore the horrid LISP-style
syntax you see how close they are).
And clang has a different model and needs to "translate" things, and
that one doesn't translate.
It's not like we don't have workarounds for gcc bugs in this area too
(eg "asm_goto_output()", née "asm_volatile_goto()").
There was another bug in my patch, though: the output mask should
always be "unsigned long", not tied to the input type.
Linus
^ permalink raw reply [relevance 99%]
* Re: [WIP 0/3] Memory model and atomic API in Rust
@ 2024-04-08 17:01 75% ` Linus Torvalds
0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-04-08 17:01 UTC (permalink / raw)
To: Matthew Wilcox
Cc: Philipp Stanner, Kent Overstreet, Boqun Feng, rust-for-linux,
linux-kernel, linux-arch, llvm, Miguel Ojeda, Alex Gaynor,
Wedson Almeida Filho, Gary Guo, Björn Roy Baron,
Benno Lossin, Andreas Hindborg, Alice Ryhl, Alan Stern,
Andrea Parri, Will Deacon, Peter Zijlstra, Nicholas Piggin,
David Howells, Jade Alglave, Luc Maranget, Paul E. McKenney,
Akira Yokosawa, Daniel Lustig, Joel Fernandes, Nathan Chancellor,
Nick Desaulniers, kent.overstreet, Greg Kroah-Hartman, elver,
Mark Rutland, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, x86, H. Peter Anvin, Catalin Marinas,
linux-arm-kernel, linux-fsdevel
On Mon, 8 Apr 2024 at 09:02, Matthew Wilcox <willy@infradead.org> wrote:
>
> What annoys me is that 'volatile' accesses have (at least) two distinct
> meanings:
> - Make this access untorn
> - Prevent various optimisations (code motion,
> common-subexpression-elimination, ...)
Oh, I'm not at all trying to say that "volatile" is great.
My argument was that the C (and C++, and Rust) model of attaching
memory ordering to objects is actively bad, and limiting.
Because the whole "the access rules are context-dependent" is really
fundamental. Anybody who designs an atomic model around the object is
simply not doing it right.
Now, the "volatile" rules actually make sense in a historical
"hardware access" context. So I do not think "volatile" is great, but
I also don't think K&R were incompetent. "volatile" makes perfect
sense in the historical setting of "direct hardware access".
It just so happens that there weren't other tools, so then you end up
using "volatile" for cached memory too when you want to get "access
once" semantics, and then it isn't great.
And then you have *too* many tools on the standards bodies, and they
don't understand atomics, and don't understand volatile, and they have
been told that "volatile" isn't great for atomics because it doesn't
have memory ordering semantics, but do not understand the actual
problem space.
So those people - who in some cases spent decades arguing about (and
sometimes against) "volatile" - think that despite all the problems, the
solution for atomics is to make the *same* mistake, and tie it to the
data and the type system, not the action.
Which is honestly just plain *stupid*. What made sense for 'volatile'
in a historical setting, absolutely does not make sense for atomics.
> As an example, folio_migrate_flags() (in mm/migrate.c):
>
> if (folio_test_error(folio))
> folio_set_error(newfolio);
> if (folio_test_referenced(folio))
> folio_set_referenced(newfolio);
> if (folio_test_uptodate(folio))
> folio_mark_uptodate(newfolio);
>
> ... which becomes...
[ individual load and store code generation removed ]
> In my ideal world, the compiler would turn this into:
>
> newfolio->flags |= folio->flags & MIGRATE_MASK;
Well, honestly, we should just write the code that way, and not expect
too much of the compiler.
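As a sketch of why the hand-written form is equivalent, here is a small model with made-up flag bits (the real PG_* values and MIGRATE_MASK are different, and this ignores any ordering that helpers like folio_mark_uptodate() may add):

```c
/* Hypothetical flag bits, for illustration only. */
#define FLAG_ERROR       (1UL << 0)
#define FLAG_REFERENCED  (1UL << 1)
#define FLAG_UPTODATE    (1UL << 2)
#define FLAG_LOCKED      (1UL << 3)	/* deliberately not migrated */

#define SKETCH_MIGRATE_MASK (FLAG_ERROR | FLAG_REFERENCED | FLAG_UPTODATE)

/* One test and one set per flag, in the shape of the quoted code. */
static unsigned long copy_flags_one_by_one(unsigned long dst, unsigned long src)
{
	if (src & FLAG_ERROR)
		dst |= FLAG_ERROR;
	if (src & FLAG_REFERENCED)
		dst |= FLAG_REFERENCED;
	if (src & FLAG_UPTODATE)
		dst |= FLAG_UPTODATE;
	return dst;
}

/* The single masked OR: same result, one load and one store. */
static unsigned long copy_flags_masked(unsigned long dst, unsigned long src)
{
	return dst | (src & SKETCH_MIGRATE_MASK);
}
```

Both functions produce the same value for every input; the difference is only in how many separate accesses the compiler is forced to emit when each set is an atomic operation.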
We don't currently have a "generic atomic or" operation, but we
probably should have one.
For our own historical reasons, while we have a few generic atomic
operations: bit operations, cmpxchg, etc, most of our arithmetic and
logical ops all rely on a special "atomic_t" type (later extended with
"atomic_long_t").
The reason? The garbage that is legacy Sparc atomics.
Sparc historically basically didn't have any atomics outside of the
'test and set byte' one, so if you wanted an atomic counter thing, and
you cared about sparc, you had to play games with "some bits of the
counter are the atomic byte lock".
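A rough sketch of that kind of game, assuming the GCC/Clang __atomic_test_and_set() and __atomic_clear() builtins as a stand-in for a hardware "test and set byte" instruction. This is simplified: it keeps the lock byte next to the counter rather than packing it into the counter word, and the names are hypothetical:

```c
/*
 * Hypothetical counter for a machine whose only atomic primitive is
 * "test and set byte": the byte acts as a tiny spinlock around a
 * plain integer.
 */
struct sketch_counter {
	unsigned char lock;	/* the only thing updated atomically */
	int value;		/* protected by 'lock' */
};

static inline int sketch_counter_add(struct sketch_counter *c, int i)
{
	int ret;

	/* Spin until the test-and-set observes the byte as clear. */
	while (__atomic_test_and_set(&c->lock, __ATOMIC_ACQUIRE))
		;

	c->value += i;
	ret = c->value;

	/* Release the byte lock. */
	__atomic_clear(&c->lock, __ATOMIC_RELEASE);
	return ret;
}
```

Because the lock has to live with the counter, the operation only works on a dedicated type, which is where an "atomic_t"-style wrapper comes from.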
And we do not care about that Sparc horror any *more*, but we used to.
End result: instead of having "do atomic ops on a normal type" - which
would be a lot more powerful - we have this model of "do atomic ops on
atomic_t".
We could fix that now. Instead of having architectures define
        arch_atomic_or(int i, atomic_t *v)
operations, we could - and should - make the 'arch' atomics be
        arch_atomic_or(int i, unsigned int *v)
and then we'd still keep the "atomic_t" around for type safety
reasons, but code that just wants to act on an "int" (or a "long")
atomically could just do so.
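A sketch of what that re-org could look like, under the assumption that the arch-level op is implemented with a compiler builtin. The sketch_ names are hypothetical and this is not the kernel's actual API:

```c
/*
 * Hypothetical arch-level op acting on a plain unsigned int, as
 * proposed above. __atomic_fetch_or() is a GCC/Clang builtin standing
 * in for the per-architecture implementation.
 */
static inline void sketch_arch_atomic_or(int i, unsigned int *v)
{
	__atomic_fetch_or(v, (unsigned int)i, __ATOMIC_RELAXED);
}

/* The wrapper type stays around purely for type safety. */
typedef struct {
	unsigned int counter;
} sketch_atomic_t;

static inline void sketch_atomic_or(int i, sketch_atomic_t *v)
{
	sketch_arch_atomic_or(i, &v->counter);
}
```

Code that owns a plain "unsigned int" can call the arch-level op directly, while existing users keep going through the typed wrapper.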
But in your case, I don't think you actually need it:
> Part of that is us being dumb; folio_set_foo() should be __folio_set_foo()
> because this folio is newly allocated and nobody else can be messing
> with its flags word yet. I failed to spot that at the time I was doing
> the conversion from SetPageFoo to folio_set_foo.
This is part of my "context matters" rant and why I do *not* think
atomics should be tied to the object, but to the operation.
The compiler generally doesn't know the context rules (insert "some
languages do know in some cases" here), which is why we as programmers
should just use different operations when we do.
In this case, since it's a new folio that hasn't been exposed to
anybody, you should just have done exactly that kind of
        newfolio->flags |= folio->flags & MIGRATE_MASK;
which we already do in the page initialization code when we know we
own the flags (set_page_zone, set_page_node, set_page_section).
We've generally avoided doing this, though - even the buddy
allocator seldom does it. The only case of manual "I know I own the
flags" I know of (apart from the initialization itself) is
        ->flags &= ~PAGE_FLAGS_CHECK_AT_FREE;
        ...
        ->flags &= ~PAGE_FLAGS_CHECK_AT_PREP;
kinds of things at free/alloc time.
> But if the compiler people could give us something a little more
> granular than "scary volatile access disable everything", that would
> be nice. Also hard, because now you have to figure out what this new
> thing interacts with and when is it safe to do what.
I think it would be lovely to have some kind of "atomic access"
operations that the compiler could still combine when it can see that
"this is invisible at a cache access level".
But as things are now, we do have most of those in the kernel, and
what you ask for can either be done today, or could be done (like that
"arch_atomic_or()") with a bit of re-org.
Linus
^ permalink raw reply [relevance 75%]
* Linux 6.9-rc3
@ 2024-04-07 20:39 41% Linus Torvalds
0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-04-07 20:39 UTC (permalink / raw)
To: Linux Kernel Mailing List
Ok, so this rc3 looks a bit different than the usual ones, because
there's a large series to bcachefs to do filesystem repair after
corruption. Not normally something we'd see in an rc kernel, but hey,
if you had a corrupted bcachefs filesystem you'd probably want this,
and if you thought bcachefs was stable already, I have a bridge to
sell you. Special deal only for you, real cheap.
The bcachefs part is a bit over a third of the patch, and if you
ignore that part, things look fairly normal, although there's perhaps
a bit more sound SoC noise than is common.
So the rest is mostly drivers (already mentioned sound, but also
networking and gpu), architecture fixes (mainly x86 and s390, some
arm64), some other filesystem noise (mainly smb client), some selftest
updates, and a random smattering elsewhere.
It's not really all that big, although the bcachefs changes do make it
bigger than typical for an rc3.
Shortlog appended, please keep testing,
Linus
---
Adam Goldman (1):
firewire: ohci: mask bus reset interrupts between ISR and bottom half
Aleksandr Loktionov (2):
i40e: fix i40e_count_filters() to count only active/new filters
i40e: fix vf may be used uninitialized in this function warning
Aleksandr Mishin (2):
net: phy: micrel: Fix potential null pointer dereference
octeontx2-af: Add array index check
Alexandre Ghiti (2):
riscv: Fix warning by declaring arch_cpu_idle() as noinstr
riscv: Disable preemption when using patch_map()
Alexey Makhalov (1):
MAINTAINERS: change vmware.com addresses to broadcom.com
Amadeusz Sławiński (1):
ASoC: Intel: avs: boards: Add modules description
Andi Shyti (4):
drm/i915/gt: Limit the reserved VM space to only the platforms
that need it
drm/i915/gt: Disable HW load balancing for CCS
drm/i915/gt: Do not generate the command streamer for all the CCS
drm/i915/gt: Enable only one CCS for compute workload
Andreas Schwab (1):
riscv: use KERN_INFO in do_trap
Andrey Albershteyn (1):
xfs: allow cross-linking special files without project quota
Andrii Nakryiko (2):
bpf: put uprobe link's path and task in release callback
bpf: support deferring bpf_link dealloc to after RCU grace period
André Apitzsch (1):
regulator: tps65132: Add of_match table
Ankit Nautiyal (1):
drm/i915/dp: Fix the computation for compressed_bpp for DISPLAY < 13
Anna-Maria Behnsen (1):
timers/migration: Return early on deactivation
Antoine Tenart (5):
udp: do not accept non-tunnel GSO skbs landing in a tunnel
gro: fix ownership transfer
udp: do not transition UDP GRO fraglist partial checksums to unnecessary
udp: prevent local UDP tunnel packets from being GROed
selftests: net: gro fwd: update vxlan GRO test expectations
Anton Protopopov (1):
bpf: fix possible file descriptor leaks in verifier
Anup Patel (2):
RISC-V: KVM: Fix APLIC setipnum_le/be write emulation
RISC-V: KVM: Fix APLIC in_clrip[x] read emulation
Arnd Bergmann (6):
ata: sata_sx4: fix pdc20621_get_from_dimm() on 64-bit
scsi: mylex: Fix sysfs buffer lengths
vdso: Use CONFIG_PAGE_SHIFT in vdso/datapage.h
i2c: pxa: hide unused icr_bits[] variable
ata: sata_mv: Fix PCI device ID table declaration compilation warning
x86/numa/32: Include missing <asm/pgtable_areas.h>
Arun R Murthy (1):
drm/i915/dp: Remove support for UHBR13.5
Ashish Kalra (1):
KVM: SVM: Add support for allowing zero SEV ASIDs
Atlas Yu (1):
r8169: skip DASH fw status checks when DASH is disabled
Bartosz Golaszewski (1):
gpio: cdev: check for NULL labels when sanitizing them for irqs
Bastien Nocera (1):
Bluetooth: Fix TOCTOU in HCI debugfs implementation
Björn Töpel (1):
riscv: Fix vector state restore in rt_sigreturn()
Borislav Petkov (AMD) (6):
x86/retpoline: Do the necessary fixup to the Zen3/4 srso return
thunk for !SRSO
x86/kvm/Kconfig: Have KVM_AMD_SEV select ARCH_HAS_CC_PLATFORM
x86/cc: Add cc_platform_set/_clear() helpers
x86/CPU/AMD: Track SNP host status with cc_platform_*()
x86/mce: Make sure to grab mce_sysfs_mutex in set_bank()
x86/retpoline: Add NOENDBR annotation to the SRSO dummy return thunk
Brendan Jackman (1):
Documentation: dev-tools: Add link to RV docs
Carlos Song (1):
spi: spi-fsl-lpspi: remove redundant spi_controller_put call
Chaitanya Kumar Borah (1):
ASoC: SOF: Core: Add remove_late() to sof_init_environment failure path
Charles Keepax (1):
ASoC: cs42l43: Correct extraction of data pointer in suspend/resume
Chen Ni (1):
ata: sata_gemini: Check clk_enable() result
Chengming Zhou (1):
9p: remove SLAB_MEM_SPREAD flag usage
Christian Bendiksen (1):
ALSA: hda/realtek: Add sound quirks for Lenovo Legion slim 7
16ARHA7 models
Christian Brauner (3):
block: handle BLK_OPEN_RESTRICT_WRITES correctly
block: count BLK_OPEN_RESTRICT_WRITES openers
fs,block: yield devices early
Christian Göttsche (1):
selinux: avoid dereference of garbage after mount failure
Christian Hewitt (1):
drm/panfrost: fix power transition timeout warnings
Christoffer Sandberg (1):
ALSA: hda/realtek - Fix inactive headset mic jack
Christoph Hellwig (3):
nvme-multipath: don't inherit LBA-related fields for the multipath node
nvme: split nvme_update_zone_info
nvme: don't create a multipath node for zero capacity devices
Christophe JAILLET (4):
ata: ahci_st: Remove an unused field in struct st_ahci_drv_data
vboxsf: Avoid an spurious warning if load_nls_xxx() fails
vboxsf: Remove usage of the deprecated ida_simple_xx() API
net: dsa: sja1105: Fix parameters order in sja1110_pcs_mdio_write_c45()
Chuck Lever (1):
SUNRPC: Fix a slow server-side memory leak with RPC-over-TCP
Colin Ian King (4):
KVM: selftests: Fix spelling mistake "trigged" -> "triggered"
RISC-V: KVM: Remove second semicolon
drm/nouveau/gr/gf100: Remove second semicolon
vboxsf: remove redundant variable out_len
Damien Le Moal (1):
nullblk: Fix cleanup order in null_add_dev() error path
Dan Carpenter (1):
ice: Fix freeing uninitialized pointers
Daniel Wagner (2):
nvmet-fc: move RCU read lock to nvmet_fc_assoc_exists
nvme-fc: rename free_ctrl callback to match name pattern
Dave Airlie (1):
nouveau/uvmm: fix addr/range calcs for remap operations
David Hildenbrand (2):
mm/secretmem: fix GUP-fast succeeding on secretmem folios
x86/mm/pat: fix VM_PAT handling in COW mappings
David Howells (1):
cifs: Fix caching to try to do open O_WRONLY as rdwr on server
David Thompson (1):
mlxbf_gige: stop interface during shutdown
Davide Caratti (1):
mptcp: don't account accept() of non-MPC client as fallback to TCP
Dominique Martinet (1):
9p: Fix read/write debug statements to report server reply
Donald Hunter (1):
docs: Fix bitfield handling in kernel-doc
Duanqiang Wen (1):
net: txgbe: fix i2c dev name cannot match clkdev
Duoming Zhou (1):
ax25: fix use-after-free bugs caused by ax25_ds_del_timer
Edward Liaw (1):
selftests/mm: include strings.h for ffsl
Eric Dumazet (5):
net: do not consume a cacheline for system_page_pool
erspan: make sure erspan_base_hdr is present in skb->head
net/sched: fix lockdep splat in qdisc_tree_reduce_backlog()
net/sched: act_skbmod: prevent kernel-infoleak
netfilter: validate user input for expected length
Frederic Weisbecker (1):
timers/migration: Fix ignored event due to missing CPU update
Geliang Tang (1):
selftests: mptcp: join: fix dev in check_endpoint
Gergo Koteles (1):
ASoC: tas2781: mark dvc_tlv with __maybe_unused
Guenter Roeck (2):
mean_and_variance: Drop always failing tests
nios2: Only use built-in devicetree blob if configured to do so
Haiyang Zhang (1):
net: mana: Fix Rx DMA datasize and skb_over_panic
Hannes Reinecke (1):
nvmet: implement unique discovery NQN
Hans de Goede (1):
gpiolib: Fix triggering "kobject: 'gpiochipX' is not
initialized, yet" kobject_get() errors
Hariprasad Kelam (1):
octeontx2-af: Fix issue with loading coalesced KPU profiles
Heiko Carstens (1):
s390/mm: fix NULL pointer dereference
Heiner Kallweit (1):
r8169: fix issue caused by buggy BIOS on certain boards with RTL8168d
Herve Codina (2):
driver core: Introduce device_link_wait_removal()
of: dynamic: Synchronize of_changeset_destroy() with the devlink removals
Hongbo Li (1):
bcachefs: fix trans->mem realloc in __bch2_trans_kmalloc
Horatiu Vultur (1):
net: phy: micrel: lan8814: Fix when enabling/disabling 1-step timestamping
Huai-Yuan Liu (1):
spi: mchp-pci1xxx: Fix a possible null pointer dereference in
pci1xxx_spi_probe
Hui Wang (1):
Bluetooth: hci_event: set the conn encrypted before conn establishes
I Gede Agastya Darma Laksana (1):
ALSA: hda/realtek: Update Panasonic CF-SZ6 quirk to support
headset with microphone
Ilya Leoshkevich (2):
s390/atomic: mark all functions __always_inline
s390/preempt: mark all functions __always_inline
Imre Deak (1):
drm/i915/dp: Fix DSC state HW readout for SST connectors
Ivan Vecera (2):
i40e: Enforce software interrupt during busy-poll exit
i40e: Fix VF MAC filter removal
Jaewon Kim (1):
spi: s3c64xx: Use DMA mode from fifo size
Jakub Kicinski (1):
selftests: reuseaddr_conflict: add missing new line at the end
of the output
Jakub Sitnicki (1):
bpf, sockmap: Prevent lock inversion deadlock in map delete elem
Jason A. Donenfeld (1):
x86/coco: Require seeding RNG with RDRAND on CoCo systems
Jeff Layton (2):
vboxsf: explicitly deny setlease attempts
nfsd: hold a lighter-weight client reference over CB_RECALL_ANY
Jens Axboe (7):
io_uring/rw: don't allow multishot reads without NOWAIT support
io_uring: disable io-wq execution of multishot NOWAIT requests
io_uring: use private workqueue for exit work
io_uring/kbuf: get rid of lower BGID lists
io_uring/kbuf: get rid of bl->is_ready
io_uring/kbuf: protect io_buffer_list teardown with a reference
io_uring/kbuf: hold io_buffer_list reference over mmap
Jesper Dangaard Brouer (1):
xen-netfront: Add missing skb_mark_for_recycle
Jisheng Zhang (1):
riscv: mm: implement pgprot_nx
Joan Bruguera Micó (1):
x86/bpf: Fix IP for relocating call depth accounting
Johan Hovold (5):
Revert "Bluetooth: hci_qca: Set BDA quirk bit if fwnode exists in DT"
dt-bindings: bluetooth: add 'qcom,local-bd-address-broken'
arm64: dts: qcom: sc7180-trogdor: mark bluetooth address as broken
Bluetooth: add quirk for broken address properties
Bluetooth: qca: fix device-address endianness
John Sperbeck (1):
init: open output files from cpio unpacking with O_LARGEFILE
Jose Ignacio Tornos Martinez (1):
net: usb: ax88179_178a: avoid the interface always configured as
random address
Joshua Hay (1):
idpf: fix kernel panic on unknown packet types
Jouni Högander (3):
drm/i915/psr: Calculate PIPE_SRCSZ_ERLY_TPT value
drm/i915/psr: Move writing early transport pipe src
drm/i915/psr: Fix intel_psr2_sel_fetch_et_alignment usage
Justin Stitt (1):
smb: client: replace deprecated strncpy with strscpy
Kan Liang (1):
perf/x86/intel/ds: Don't clear ->pebs_data_cfg for the last PEBS event
Kent Gibson (1):
gpio: cdev: fix missed label sanitizing in debounce_setup()
Kent Overstreet (48):
bcachefs: Fix assert in bch2_backpointer_invalid()
bcachefs: Fix journal pins in btree write buffer
bcachefs: fix mount error path
bcachefs: Add an assertion for trying to evict btree root
bcachefs: Move snapshot table size to struct snapshot_table
bcachefs: Add checks for invalid snapshot IDs
bcachefs: Don't do extent merging before journal replay is finished
bcachefs: btree_and_journal_iter now respects
trans->journal_replay_not_finished
bcachefs: Be careful about btree node splits during journal replay
bcachefs: Improved topology repair checks
bcachefs: Check btree ptr min_key in .invalid
bcachefs: Fix btree node keys accounting in topology repair path
bcachefs: Fix use after free in bch2_check_fix_ptrs()
bcachefs: Fix repair path for missing indirect extents
bcachefs: Fix use after free in check_root_trans()
bcachefs: Kill bch2_bkey_ptr_data_type()
bcachefs: Fix bch2_btree_increase_depth()
bcachefs: fix backpointer for missing alloc key msg
bcachefs: Split out recovery_passes.c
bcachefs: Add error messages to logged ops fns
bcachefs: Resume logged ops after fsck
bcachefs: Flush journal immediately after replay if we did early repair
bcachefs: Ensure bch_sb_field_ext always exists
bcachefs: bch2_run_explicit_recovery_pass_persistent()
bcachefs: Improve -o norecovery; opts.recovery_pass_limit
bcachefs: Logged op errors should be ignored
bcachefs: Fix remove_dirent()
bcachefs: Fix overlapping extent repair
bcachefs: On emergency shutdown, print out current journal sequence number
bcachefs: Fix btree node reserve
bcachefs: BCH_WATERMARK_interior_updates
bcachefs: fix nocow lock deadlock
bcachefs: Improve bch2_btree_update_to_text()
bcachefs: Check for bad needs_discard before doing discard
bcachefs: ratelimit informational fsck errors
bcachefs: Clear recovery_passes_required as they complete without errors
bcachefs: bch2_shoot_down_journal_keys()
bcachefs: Etyzinger cleanups
bcachefs: bch2_btree_root_alloc() -> bch2_btree_root_alloc_fake()
bcachefs: Don't skip fake btree roots in fsck
bcachefs: Repair pass for scanning for btree nodes
bcachefs: Topology repair now uses nodes found by scanning to fill holes
bcachefs: Flag btrees with missing data
bcachefs: Reconstruct missing snapshot nodes
bcachefs: Check for extents that point to same space
bcachefs: Subvolume reconstruction
bcachefs: reconstruct_inode()
aio: Fix null ptr deref in aio_complete() wakeup
Krzysztof Kozlowski (11):
docs: dt-bindings: add missing address/size-cells to example
dt-bindings: ufs: qcom: document SC8180X UFS
dt-bindings: ufs: qcom: document SC7180 UFS
dt-bindings: ufs: qcom: document SM6125 UFS
ptp: MAINTAINERS: drop Jeff Sipek
ata: pata_macio: drop driver owner assignment
dt-bindings: clock: keystone: remove unstable remark
dt-bindings: clock: ti: remove unstable remark
dt-bindings: remoteproc: ti,davinci: remove unstable remark
dt-bindings: soc: fsl: narrow regex for unit address to hex numbers
dt-bindings: timer: narrow regex for unit address to hex numbers
Kuniyuki Iwashima (9):
tcp: Fix bind() regression for v6-only wildcard and v4-mapped-v6
non-wildcard addresses.
tcp: Fix bind() regression for v6-only wildcard and
v4(-mapped-v6) non-wildcard addresses.
selftest: tcp: Make bind() selftest flexible.
selftest: tcp: Define the reverse order bind() tests explicitly.
selftest: tcp: Add v4-v4 and v6-v6 bind() conflict tests.
selftest: tcp: Add more bind() calls.
selftest: tcp: Add bind() tests for IPV6_V6ONLY.
selftest: tcp: Add bind() tests for SO_REUSEADDR/SO_REUSEPORT.
ipv6: Fix infinite recursion in fib6_dump_done().
Li Nan (2):
scsi: sd: Unregister device if device_add_disk() failed in sd_probe()
block: fix overflow in blk_ioctl_discard()
Linus Torvalds (1):
Linux 6.9-rc3
Luiz Augusto von Dentz (1):
Bluetooth: hci_sync: Fix not checking error on hci_cmd_sync_cancel_sync
Lukasz Majewski (1):
net: hsr: Use full string description when opening HSR network device
Luke D. Jones (1):
ALSA: hda/realtek: cs35l41: Support ASUS ROG G634JYR
Mahmoud Adam (1):
net/rds: fix possible cp null dereference
Marc Zyngier (2):
arm64: Fix early handling of FEAT_E2H0 not being implemented
KVM: arm64: Rationalise KVM banner output
Marco Pinna (1):
vsock/virtio: fix packet delivery to tap device
Mark Brown (1):
arm64/ptrace: Use saved floating point state type to determine SVE layout
Masahiro Yamada (2):
riscv: compat_vdso: install compat_vdso.so.dbg to /lib/modules/*/vdso/
riscv: compat_vdso: align VDSOAS build log
Matthew Brost (1):
drm/xe: Use ordered wq for preempt fence waiting
Michael Krummsdorf (1):
net: dsa: mv88e6xxx: fix usable ports on 88e6020
Namjae Jeon (3):
ksmbd: don't send oplock break if rename fails
ksmbd: validate payload size in ipc response
ksmbd: do not set SMB2_GLOBAL_CAP_ENCRYPTION for SMB 3.1.1
Natanael Copa (1):
tools/resolve_btfids: fix build with musl libc
Nikita Kiryushin (1):
tg3: Remove residual error handling in tg3_suspend
Nikita Travkin (2):
thermal: gov_power_allocator: Allow binding without cooling devices
thermal: gov_power_allocator: Allow binding without trip points
Oleksandr Natalenko (1):
drm/display: fix typo
Oliver Upton (1):
KVM: arm64: Fix host-programmed guest events in nVHE
Oswald Buddenhagen (1):
Revert "ALSA: emu10k1: fix synthesizer sample playback position
and caching"
Pablo Neira Ayuso (5):
netfilter: nf_tables: release batch on table validation from abort path
netfilter: nf_tables: release mutex after nft_gc_seq_end from abort path
netfilter: nf_tables: flush pending destroy work before exit_net release
netfilter: nf_tables: reject new basechain after table flag update
netfilter: nf_tables: discard table flag update with pending
basechain deletion
Paolo Abeni (2):
mptcp: prevent BPF accessing lowat from a subflow socket.
Revert "tg3: Remove residual error handling in tg3_suspend"
Paolo Bonzini (3):
KVM: SEV: fix compat ABI for KVM_MEMORY_ENCRYPT_OP
Documentation: kvm/sev: separate description of firmware
Documentation: kvm/sev: clarify usage of KVM_MEMORY_ENCRYPT_OP
Paul Barker (2):
net: ravb: Always process TX descriptor ring
net: ravb: Always update error counters
Paulo Alcantara (14):
smb: client: fix UAF in smb2_reconnect_server()
smb: client: guarantee refcounted children from parent session
smb: client: refresh referral without acquiring refpath_lock
smb: client: handle DFS tcons in cifs_construct_tcon()
smb: client: serialise cifs_construct_tcon() with cifs_mount_mutex
smb: client: fix potential UAF in cifs_debug_files_proc_show()
smb: client: fix potential UAF in cifs_dump_full_key()
smb: client: fix potential UAF in cifs_stats_proc_write()
smb: client: fix potential UAF in cifs_stats_proc_show()
smb: client: fix potential UAF in smb2_is_valid_lease_break()
smb: client: fix potential UAF in smb2_is_valid_oplock_break()
smb: client: fix potential UAF in is_valid_oplock_break()
smb: client: fix potential UAF in smb2_is_network_name_deleted()
smb: client: fix potential UAF in cifs_signal_cifsd_for_reconnect()
Peter Collingbourne (1):
stackdepot: rename pool_index to pool_index_plus_1
Peter Ujfalusi (19):
ASoC: SOF: Add dsp_max_burst_size_in_ms member to snd_sof_pcm_stream
ASoC: SOF: ipc4-topology: Save the DMA maximum burst size for PCMs
ASoC: SOF: Intel: hda-pcm: Use dsp_max_burst_size_in_ms to place
constraint
ASoC: SOF: Intel: hda: Implement get_stream_position (Linear
Link Position)
ASoC: SOF: Intel: mtl/lnl: Use the generic get_stream_position callback
ASoC: SOF: Introduce a new callback pair to be used for PCM
delay reporting
ASoC: SOF: Intel: Set the dai/host get frame/byte counter callbacks
ASoC: SOF: ipc4-pcm: Use the snd_sof_pcm_get_dai_frame_counter()
for pcm_delay
ASoC: SOF: Intel: hda-common-ops: Do not set the
get_stream_position callback
ASoC: SOF: Remove the get_stream_position callback
ASoC: SOF: ipc4-pcm: Move struct sof_ipc4_timestamp_info
definition locally
ASoC: SOF: ipc4-pcm: Combine the SOF_IPC4_PIPE_PAUSED cases in pcm_trigger
ASoC: SOF: ipc4-pcm: Invalidate the stream_start_offset in PAUSED state
ASoC: SOF: sof-pcm: Add pointer callback to sof_ipc_pcm_ops
ASoC: SOF: ipc4-pcm: Correct the delay calculation
ALSA: hda: Add pplcllpl/u members to hdac_ext_stream
ASoC: SOF: Intel: hda: Compensate LLP in case it is not reset
ASoC: SOF: Intel: hda-dsp: Skip IMR boot on ACE platforms in
case of S3 suspend
ASoC: SOF: Intel: lnl: Disable DMIC/SSP offload on remove
Peter Wang (2):
scsi: ufs: core: WLUN suspend dev/link state error recovery
scsi: ufs: core: Fix MCQ mode dev command timeout
Petr Oros (1):
ice: fix enabling RX VLAN filtering
Phil Elwell (1):
net: bcmgenet: Reset RBUF on first open
Pierre-Louis Bossart (6):
ASoC: rt5682-sdw: fix locking sequence
ASoC: rt711-sdca: fix locking sequence
ASoC: rt711-sdw: fix locking sequence
ASoC: rt712-sdca-sdw: fix locking sequence
ASoC: rt722-sdca-sdw: fix locking sequence
ASoC: rt-sdw*: add __func__ to all error logs
Piotr Wejman (1):
net: stmmac: fix rx queue priority assignment
Pu Lehui (1):
drivers/perf: riscv: Disable PERF_SAMPLE_BRANCH_* while not supported
Rander Wang (1):
ASoC: SOF: mtrace: rework mtrace timestamp setting
Randy Dunlap (7):
9p/trans_fd: remove Excess kernel-doc comment
time/timecounter: Fix inline documentation
time/timekeeping: Fix kernel-doc warnings and typos
timers: Fix kernel-doc format and add Return values
tick/sched: Fix various kernel-doc warnings
tick/sched: Fix struct tick_sched doc warnings
timers: Fix text inconsistencies and spelling
Reinette Chatre (1):
x86/resctrl: Fix uninitialized memory read when last CPU of
domain goes offline
Richard Fitzgerald (3):
ASoC: wm_adsp: Fix missing mutex_lock in wm_adsp_write_ctl()
regmap: maple: Fix cache corruption in regcache_maple_drop()
regmap: maple: Fix uninitialized symbol 'ret' warnings
Ritvik Budhiraja (1):
smb3: retrying on failed server close
Rob Clark (1):
drm/prime: Unbreak virtgpu dma-buf export
Rob Herring (1):
MAINTAINERS: Add TPM DT bindings to TPM maintainers
Roberto Sassu (1):
security: Place security_path_post_mknod() where the original IMA call was
Sami Tolvanen (1):
riscv: Mark __se_sys_* functions __used
Samuel Holland (2):
riscv: mm: Fix prototype to avoid discarding const
riscv: Fix spurious errors from __get/put_kernel_nofault
Sean Christopherson (5):
KVM: SVM: Set sev->asid in sev_asid_new() instead of overloading
the return
KVM: SVM: Use unsigned integers when dealing with ASIDs
KVM: SVM: Return -EINVAL instead of -EBUSY on attempt to re-init
SEV/SEV-ES
KVM: selftests: Fix __GUEST_ASSERT() format warnings in ARM's
arch timer test
x86/cpufeatures: Add CPUID_LNX_5 to track recently added
Linux-defined word
Sergey Shtylyov (1):
of: module: prevent NULL pointer dereference in vsnprintf()
Simon Trimmer (3):
ASoC: cs-amp-lib: Check for no firmware controls when writing calibration
ALSA: hda: cs35l56: Add ACPI device match tables
ALSA: hda/realtek: Add quirks for ASUS Laptops using CS35L56
Stefan O'Rear (1):
riscv: process: Fix kernel gp leakage
Stephen Horvath (1):
ACPI: thermal: Register thermal zones without valid trip points
Stephen Lee (1):
ASoC: ops: Fix wraparound for mask in snd_soc_get_volsw
Su Hui (1):
octeontx2-pf: check negative error code in otx2_open()
Sumanth Korikkar (1):
s390/entry: align system call table on 8 bytes
Takashi Iwai (1):
ALSA: line6: Zero-initialize message buffers
Tariq Toukan (1):
MAINTAINERS: mlx5: Add Tariq Toukan
Thomas Bertschinger (1):
bcachefs: fix misplaced newline in __bch2_inode_unpacked_to_text()
Thomas Hellström (4):
drm/xe: Use ring ops TLB invalidation for rebinds
drm/xe: Rework rebinding
drm/xe: Make TLB invalidation fences unordered
drm/xe: Move vma rebinding to the drm_exec locking loop
Thomas Richter (1):
s390/pai: fix sampling event removal for PMU device driver
Uladzislau Rezki (Sony) (2):
mm: vmalloc: bail out early in find_vmap_area() if vmap is not init
mm: vmalloc: fix lockdep warning
Uros Bizjak (1):
x86/bpf: Fix IP after emitting call depth accounting
Uwe Kleine-König (2):
pwm: Fix setting period with #pwm-cells = <1> and of_pwm_single_xlate()
OSS: dmasound/paula: Mark driver struct with __refdata to
prevent section mismatch
Victor Isaev (1):
RISC-V: Update AT_VECTOR_SIZE_ARCH for new AT_MINSIGSTKSZ
Vijendar Mukunda (3):
ASoC: amd: acp: fix for acp pdm configuration check
ASoC: amd: acp: fix for acp_init function error handling
ASoC: SOF: amd: fix for false dsp interrupts
Ville Syrjälä (2):
drm/i915/mst: Limit MST+DSC to TGL+
drm/i915/mst: Reject FEC+MST on ICL
Vincent Guittot (1):
PM: EM: fix wrong utilization estimation in em_cpu_energy()
Vitaly Chikunov (1):
tracing: Fix documentation on tp_printk cmdline option
Vitaly Kuznetsov (3):
KVM: x86: Introduce __kvm_get_hypervisor_cpuid() helper
KVM: x86: Use actual kvm_cpuid.base for clearing KVM_FEATURE_PV_UNHALT
KVM: selftests: Check that PV_UNHALT is cleared when HLT exiting
is disabled
Vitaly Lifshits (2):
e1000e: Workaround for sporadic MDI error on Meteor Lake systems
e1000e: move force SMBUS from enable ulp function to avoid PHY loss issue
Vladimir Isaev (1):
riscv: hwprobe: do not produce frtace relocation
Wei Fang (1):
net: fec: Set mac_managed_pm during probe
Weiji Wang (1):
docs: zswap: fix shell command format
Will Deacon (4):
KVM: arm64: Don't defer TLB invalidation when zapping table entries
KVM: arm64: Don't pass a TLBI level hint when zapping table entries
KVM: arm64: Use TLBI_TTL_UNKNOWN in __kvm_tlb_flush_vmid_range()
KVM: arm64: Ensure target address is granule-aligned for range TLBI
William Tu (1):
Documentation: Add documentation for eswitch attribute
Wujie Duan (1):
KVM: arm64: Fix out-of-IPA space translation fault handling
Xiaoyao Li (2):
x86/kvm: Use separate percpu variable to track the enabling of asyncpf
KVM: x86: Improve documentation of MSR_KVM_ASYNC_PF_EN
Yihang Li (1):
scsi: libsas: Align SMP request allocation to ARCH_DMA_MINALIGN
Zhang Yi (4):
ASoC: codecs: ES8326: Solve error interruption issue
ASoC: codecs: ES8326: modify clock table
ASoC: codecs: ES8326: Solve a headphone detection issue after
suspend and resume
ASoC: codecs: ES8326: Removing the control of ADC_SCALE
Ziyang Xuan (1):
netfilter: nf_tables: Fix potential data-race in
__nft_flowtable_type_get()
zhuxiaohui (1):
bcachefs: add REQ_SYNC and REQ_IDLE in write dio
^ permalink raw reply [relevance 41%]
* Re: More annoying code generation by clang
2024-04-06 15:39 99% ` Linus Torvalds
@ 2024-04-06 16:04 87% ` Linus Torvalds
0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-04-06 16:04 UTC (permalink / raw)
To: Uros Bizjak
Cc: Ingo Molnar, Nick Desaulniers, Nathan Chancellor,
Thomas Gleixner, Peter Anvin, the arch/x86 maintainers,
Linux Kernel Mailing List
On Sat, 6 Apr 2024 at 08:39, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> Because this code actually requires a data-dependency and not a
> control dependency as a correctness issue because of Spectre-v1.
Just to clarify: our comments in this code are maybe a bit odd,
because our comments are not about the Spectre-v1 issue (which
predates a rewrite) and more about the odd RCU pattern and conditional
avoidance we use here:
        unsigned long nospec_mask;
        /* Mask is a 0 for invalid fd's, ~0 for valid ones */
        nospec_mask = array_index_mask_nospec(fd, fdt->max_fds);
        /*
         * fdentry points to the 'fd' offset, or fdt->fd[0].
         * Loading from fdt->fd[0] is always safe, because the
         * array always exists.
         */
        fdentry = fdt->fd + (fd & nospec_mask);
        /* Do the load, then mask any invalid result */
        file = rcu_dereference_raw(*fdentry);
where *normally* (if RCU wasn't an issue) we'd just write this as
        file = fdt->fd[array_index_nospec(fd, fdt->max_fds)];
where the key part is that "nospec" array indexing that will not
speculatively access the array past the "max_fds".
IOW, the code naively would want to do just
        if (fd < fdt->max_fds) {
                file = fdt->fd[fd];
                ...
but we need to make sure that it can't be fooled into using a branch
mispredict and use a user-controlled index ("fd") to speculatively
access the array with an arbitrary index and then leak unrelated data
through some side channel (mostly cache access).
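A user-space sketch of how such a mask can be computed without a branch. It is modelled on the shape of the kernel's generic C fallback, but the sketch_ names are hypothetical; it assumes size fits in a signed long and that right-shifting a negative long is arithmetic (true for GCC and Clang):

```c
#include <limits.h>

#define SKETCH_BITS_PER_LONG (sizeof(long) * CHAR_BIT)

/*
 * Returns ~0UL when index < size and 0 otherwise.
 * If index >= size, either 'index' or 'size - 1 - index' has its top
 * bit set, so the inverted value is non-negative and the arithmetic
 * shift yields 0. Otherwise the inverted value is negative and the
 * shift smears the sign bit into all ones.
 */
static inline unsigned long sketch_index_mask(unsigned long index,
					      unsigned long size)
{
	return ~(long)(index | (size - 1UL - index)) >> (SKETCH_BITS_PER_LONG - 1);
}

/* Clamp an out-of-range index to 0 using only data dependencies. */
static inline unsigned long sketch_index_nospec(unsigned long index,
						unsigned long size)
{
	return index & sketch_index_mask(index, size);
}
```

Note that nothing in plain C stops a compiler from turning this back into a compare and branch, so a sketch like this shows the arithmetic but does not by itself give the guarantee the kernel needs.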
And while the normal pattern doesn't expose the mask generation and
just hides that mask in that simpler "array_index_nospec()" macro,
this code actually ends up using the same mask *twice*, because it
will later end up doing this hack:
        file = (void *)(nospec_mask & (unsigned long)file);
        if (unlikely(!file))
                return NULL;
to have just one single conditional at the end (ie we may have loaded
a non-NULL file pointer from fdt->fd[0] because an invalid index got
masked down to a zero index, and the second masking will mask away
that pointer and make it NULL because we're bad people and we know
that NULL is "bitpattern 0" and we care about the code working, not
about some unreal "NULL could be anything else" thing).
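The "use the mask twice" pattern can be modelled in user space roughly as follows. This is only a sketch: it leaves out RCU and reference counting entirely, assumes an LP64 target where a pointer fits in an unsigned long, and every sketch_ name is hypothetical:

```c
#include <limits.h>
#include <stddef.h>

struct sketch_file {
	int id;
};

struct sketch_fdtable {
	unsigned long max_fds;
	struct sketch_file **fd;	/* always has at least slot 0 */
};

/* ~0UL for index < size, 0 otherwise (assumes arithmetic right shift). */
static inline unsigned long sketch_mask(unsigned long index, unsigned long size)
{
	return ~(long)(index | (size - 1UL - index)) >> (sizeof(long) * CHAR_BIT - 1);
}

static struct sketch_file *sketch_lookup(struct sketch_fdtable *fdt,
					 unsigned long fd)
{
	unsigned long mask = sketch_mask(fd, fdt->max_fds);
	struct sketch_file **entry;
	struct sketch_file *file;

	/* First use: an invalid fd is redirected to slot 0. */
	entry = fdt->fd + (fd & mask);
	file = *entry;

	/* Second use: whatever was loaded for an invalid fd becomes NULL. */
	file = (struct sketch_file *)(mask & (unsigned long)file);
	return file;
}
```

The caller then needs only one NULL check, which covers both "invalid fd" and "valid fd with an empty slot".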
End result: this code, which is just a few lines long, has more
comments than code, and generates only a handful of instructions, is
fairly subtle but also fairly important, both for hardware security
issues and for performance.
See commit 253ca8678d30 ("Improve __fget_files_rcu() code generation
(and thus __fget_light())"), which actually started doing this "use mask
twice", and realize that that commit is what this performance
regression report is talking about:
https://lore.kernel.org/all/ZWQ+LEcfFFi4YOAU@xsang-OptiPlex-9020/
ie that whole "use masks and avoid doing the obvious thing" may be a
bit subtle, but it's what turned a 2.9% performance regression into a
3.4% improvement.
(Ok, those performance numbers are on just one random microbenchmark
and don't really matter, so take that with a pinch of salt, but if you
care about a _lot_ of random benchmarks, eventually you get good
performance overall).
Anyway, hopefully that explains the dual issue here: we care about
performance, but we also have to use a specific instruction pattern,
and can't just hope for the best.
Linus
^ permalink raw reply [relevance 87%]
* Re: More annoying code generation by clang
@ 2024-04-06 15:39 99% ` Linus Torvalds
2024-04-06 16:04 87% ` Linus Torvalds
0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-04-06 15:39 UTC (permalink / raw)
To: Uros Bizjak
Cc: Ingo Molnar, Nick Desaulniers, Nathan Chancellor,
Thomas Gleixner, Peter Anvin, the arch/x86 maintainers,
Linux Kernel Mailing List
On Sat, 6 Apr 2024 at 05:30, Uros Bizjak <ubizjak@gmail.com> wrote:
>
> FYI, please note that gcc-12 is able to synthesize carry-flag compares
> on its own:
Oh, gcc has been able to do that for much longer than that. It's an
idiomatic i386 pattern, and gcc has generated it for as long as I can
remember.
HOWEVER.
There's a big difference between "able to" and "GUARANTEED to".
Because this code actually requires a data-dependency and not a
control dependency as a correctness issue because of Spectre-v1.
So while I know very well that gcc _can_ do it, I also know very well
that there are absolutely no guarantees that gcc won't use a
conditional branch instead.
So this code needs to generate good code because it's actually
important code that shows up in benchmarks, but this code also needs
to generate a very _particular_ pattern of code, and it's not good
enough that gcc may "happen" to generate that pattern of code.
Thus the inline asm.
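As a rough sketch of the difference between "able to" and "guaranteed to": the first function below leaves the choice of instructions to the compiler, while the second pins the cmp/sbb sequence with inline asm on x86. The names mask_plain_c() and mask_forced() are made up for this example, and the non-x86 branch is only a fallback so the sketch builds elsewhere; it does not provide the guarantee.

```c
/* The compiler may turn this into cmp/sbb, or into a conditional branch. */
static unsigned long mask_plain_c(unsigned long index, unsigned long size)
{
    return index < size ? ~0UL : 0UL;
}

/* On x86 the instruction pattern is fixed: a data dependency, no branch. */
static unsigned long mask_forced(unsigned long index, unsigned long size)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned long mask;

    /* cmp sets the carry flag when index < size; sbb turns it into 0 or ~0 */
    __asm__ volatile ("cmp %1,%2; sbb %0,%0"
                      : "=r" (mask)
                      : "r" (size), "r" (index)
                      : "cc");
    return mask;
#else
    /* Fallback for other architectures: same result, no codegen guarantee. */
    return mask_plain_c(index, size);
#endif
}
```

Both functions return the same values; only mask_forced() promises how they are computed.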
Linus
^ permalink raw reply [relevance 99%]
* More annoying code generation by clang
@ 2024-04-04 22:53 81% Linus Torvalds
0 siblings, 2 replies; 200+ results
From: Linus Torvalds @ 2024-04-04 22:53 UTC (permalink / raw)
To: Ingo Molnar, Thomas Gleixner, Peter Anvin
Cc: the arch/x86 maintainers, Linux Kernel Mailing List
[-- Attachment #1: Type: text/plain, Size: 3725 bytes --]
So this doesn't really matter in any real life situation, but it
really grated on me.
Clang has this nasty habit of taking our nice asm constraints, and
turning them into worst-case garbage. It's been reported a couple of
times where we use "g" to tell the compiler that pretty much any
source to the asm works, and then clang takes that to mean "I will
take that to use 'memory'" even when that makes no sense what-so-ever.
See for example
https://lore.kernel.org/all/CAHk-=wgobnShg4c2yyMbk2p=U-wmnOmX_0=b3ZY_479Jjey2xw@mail.gmail.com/
where I was ranting about clang just doing pointlessly stupid things.
However, I found a case where yes, clang does pointlessly stupid
things, but it's at least _partly_ our fault, and gcc can't generate
optimal code either.
We have this fairly critical code in __fget_files_rcu() to look up a
'struct file *' from an fd, and it does this:
/* Mask is a 0 for invalid fd's, ~0 for valid ones */
nospec_mask = array_index_mask_nospec(fd, fdt->max_fds);
and clang makes a *horrid* mess of it, generating this code:
movl %edi, %r14d
movq 32(%rbx), %rdx
movl (%rdx), %eax
movq %rax, 8(%rsp)
cmpq 8(%rsp), %r14
sbbq %rcx, %rcx
which is just crazy. Notice how it does that "move rax to stack, then
do the compare against the stack", instead of just using %rax.
In fact, that function shouldn't have a stack frame at all, and the
only reason it is generated is because of this whole oddity.
All clang's fault, right?
Yeah, mostly. But it turns out that what really messes with clang's
little head is that the x86 array_index_mask_nospec() function is
being a bit annoying.
This is what we do:
    static __always_inline unsigned long
    array_index_mask_nospec(unsigned long index,
                            unsigned long size)
    {
        unsigned long mask;

        asm volatile ("cmp %1,%2; sbb %0,%0;"
                      :"=r" (mask)
                      :"g"(size),"r" (index)
                      :"cc");
        return mask;
    }
and look at the use again:
nospec_mask = array_index_mask_nospec(fd, fdt->max_fds);
here all the values are actually 'unsigned int'. So what happens is
that clang can't just use the fdt->max_fds value *directly* from
memory, because it needs to be expanded from 32-bit to 64-bit because
we've made our array_index_mask_nospec() function only work on 64-bit
'unsigned long' values.
So it turns out that by massaging this a bit, and making it just be a
macro - so that the asm can decide that "I can do this in 32-bit" - I
can get clang to generate much better code.
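A small sketch of why a macro helps here: the operand type can follow the arguments via typeof instead of being forced to 'unsigned long' by a function prototype. NOSPEC_OPERAND_BYTES and the two helper functions are hypothetical names used only to show which width the usual arithmetic conversions pick; __typeof__ is the gcc/clang spelling.

```c
#include <stddef.h>

/* Hypothetical helper: byte width of the type a typeof-based macro would use. */
#define NOSPEC_OPERAND_BYTES(idx, sz) (sizeof(__typeof__((idx) + (sz))))

/* Both operands are 'unsigned int', as in the fd lookup: stays 32-bit. */
static size_t width_for_fd_lookup(void)
{
    unsigned int fd = 3, max_fds = 64;

    return NOSPEC_OPERAND_BYTES(fd, max_fds);
}

/* One operand is 'unsigned long': the common type widens to match it. */
static size_t width_for_long_index(void)
{
    unsigned long index = 3;
    unsigned int size = 64;

    return NOSPEC_OPERAND_BYTES(index, size);
}
```

With 32-bit operands the compare can then be done against the 32-bit memory value directly, with no widening load.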
Clang still absolutely hates the "g" constraint, so to get clang to
really get this right I have to use "ir" instead of "g". Which is
wrong. Because gcc does this right, and could use the memory op
directly. But even gcc cannot do that with our *current* function,
because of that "the memory value is 32-bit, we require a 64-bit
value".
Anyway, I can get gcc to generate the right code:
movq 32(%r13), %rdx
cmp (%rdx),%ebx
sbb %esi,%esi
which is basically the right code for the six crazy instructions clang
generates. And if I make the "g" be "ir", I can get clang to generate
movq 32(%rdi), %rcx
movl (%rcx), %eax
cmpl %eax, %esi
sbbl %esi, %esi
which is the same thing, but with that (pointless) load to a register.
And now clang doesn't generate that stack frame at all.
Anyway, this was a long email to explain the odd attached patch.
Comments? Note that this patch is *entirely* untested, I have done
this purely by looking at the code generation in fs/file.c.
Linus
[-- Attachment #2: patch.diff --]
[-- Type: text/x-patch, Size: 1122 bytes --]
arch/x86/include/asm/barrier.h | 23 +++++++++--------------
1 file changed, 9 insertions(+), 14 deletions(-)
diff --git a/arch/x86/include/asm/barrier.h b/arch/x86/include/asm/barrier.h
index 66e57c010392..6159d2cbbfde 100644
--- a/arch/x86/include/asm/barrier.h
+++ b/arch/x86/include/asm/barrier.h
@@ -33,20 +33,15 @@
* Returns:
* 0 - (index < size)
*/
-static __always_inline unsigned long array_index_mask_nospec(unsigned long index,
- unsigned long size)
-{
- unsigned long mask;
-
- asm volatile ("cmp %1,%2; sbb %0,%0;"
- :"=r" (mask)
- :"g"(size),"r" (index)
- :"cc");
- return mask;
-}
-
-/* Override the default implementation from linux/nospec.h. */
-#define array_index_mask_nospec array_index_mask_nospec
+#define array_index_mask_nospec(idx,sz) ({ \
+ typeof((idx)+(sz)) __idx = (idx); \
+ typeof(__idx) __sz = (sz); \
+ typeof(__idx) __mask; \
+ asm volatile ("cmp %1,%2; sbb %0,%0" \
+ :"=r" (__mask) \
+ :"ir"(__sz),"r" (__idx) \
+ :"cc"); \
+ __mask; })
/* Prevent speculative execution past this barrier. */
#define barrier_nospec() asm volatile("lfence":::"memory")
^ permalink raw reply related [relevance 81%]
* Re: user-space concurrent pipe buffer scheduler interactions
@ 2024-04-03 20:57 99% ` Linus Torvalds
0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-04-03 20:57 UTC (permalink / raw)
To: Michael Clark; +Cc: Jens Axboe, Ingo Molnar, Peter Zijlstra, linux-kernel
On Wed, 3 Apr 2024 at 13:52, Michael Clark <michael@metaparadigm.com> wrote:
>
> On 4/4/24 05:56, Linus Torvalds wrote:
> > On Tue, 2 Apr 2024 at 13:54, Michael Clark <michael@metaparadigm.com> wrote:
> >>
> >> I am working on a low latency cross-platform concurrent pipe buffer
> >> using C11 threads and atomics.
> >
> > You will never get good performance doing spinlocks in user space
> > unless you actually tell the scheduler about the spinlocks, and have
> > some way to actually sleep on contention.
> >
> > Which I don't see you as having.
>
> We can work on this.
It's been tried.
Nobody ever found a use-case that is sufficiently convincing, but see
the write-up at
https://lwn.net/Articles/944895/
for a pointer to at least attempts.
Linus
^ permalink raw reply [relevance 99%]
* Re: [PATCH] x86/retpoline: Fix a missing return thunk warning (was: Re: [linus:master] [x86/bugs] 4535e1a417: WARNING:at_arch/x86/kernel/alternative.c:#apply_returns)
@ 2024-04-03 17:13 99% ` Linus Torvalds
0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-04-03 17:13 UTC (permalink / raw)
To: Borislav Petkov; +Cc: kernel test robot, oe-lkp, lkp, linux-kernel, Ingo Molnar
On Wed, 3 Apr 2024 at 10:05, Borislav Petkov <bp@alien8.de> wrote:
>
> Can you pls replace it with the below one?
Ok, done.
Linus
^ permalink raw reply [relevance 99%]
* Re: [RESEND][PATCH v3] security: Place security_path_post_mknod() where the original IMA call was
@ 2024-04-03 16:59 99% ` Linus Torvalds
0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-04-03 16:59 UTC (permalink / raw)
To: Roberto Sassu
Cc: viro, brauner, jack, paul, jmorris, serge, zohar, linux-fsdevel,
linux-kernel, linux-security-module, linux-cifs, linux-integrity,
pc, Roberto Sassu, Steve French
On Wed, 3 Apr 2024 at 02:10, Roberto Sassu
<roberto.sassu@huaweicloud.com> wrote:
>
> Move security_path_post_mknod() where the ima_post_path_mknod() call was,
> which is obviously correct from IMA/EVM perspective. IMA/EVM are the only
> in-kernel users, and only need to inspect regular files.
Thanks, applied,
Linus
^ permalink raw reply [relevance 99%]
* Re: user-space concurrent pipe buffer scheduler interactions
@ 2024-04-03 16:56 99% ` Linus Torvalds
0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-04-03 16:56 UTC (permalink / raw)
To: Michael Clark; +Cc: Jens Axboe, Ingo Molnar, Peter Zijlstra, linux-kernel
On Tue, 2 Apr 2024 at 13:54, Michael Clark <michael@metaparadigm.com> wrote:
>
> I am working on a low latency cross-platform concurrent pipe buffer
> using C11 threads and atomics.
You will never get good performance doing spinlocks in user space
unless you actually tell the scheduler about the spinlocks, and have
some way to actually sleep on contention.
Which I don't see you as having.
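For illustration only, here is a minimal Linux-specific sketch of the "sleep on contention" half of that statement, using the futex system call so that a waiter blocks in the kernel instead of spinning. The sleeping_lock type and its functions are hypothetical names, and the three-state scheme is the well-known textbook one, not anything from the code under discussion. It does nothing about telling the scheduler who holds the lock.

```c
#define _DEFAULT_SOURCE
#include <linux/futex.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* 0 = unlocked, 1 = locked, 2 = locked and there may be sleepers */
struct sleeping_lock {
    _Atomic uint32_t state;
};

static long futex_call(_Atomic uint32_t *addr, int op, uint32_t val)
{
    return syscall(SYS_futex, addr, op, val, NULL, NULL, 0);
}

static void sleeping_lock_acquire(struct sleeping_lock *l)
{
    uint32_t c = 0;

    /* Fast path: uncontended, no system call at all. */
    if (atomic_compare_exchange_strong(&l->state, &c, 1))
        return;

    /* Slow path: mark the lock contended, then sleep until woken. */
    if (c != 2)
        c = atomic_exchange(&l->state, 2);
    while (c != 0) {
        /* Sleeps only if the state is still 2 when the kernel checks it. */
        futex_call(&l->state, FUTEX_WAIT_PRIVATE, 2);
        c = atomic_exchange(&l->state, 2);
    }
}

static void sleeping_lock_release(struct sleeping_lock *l)
{
    /* Only pay for a wakeup system call if somebody may be asleep. */
    if (atomic_exchange(&l->state, 0) == 2)
        futex_call(&l->state, FUTEX_WAKE_PRIVATE, 1);
}
```

In the uncontended case both operations are a single atomic instruction; the kernel only gets involved once a second thread shows up.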
Linus
^ permalink raw reply [relevance 99%]
* Re: [PATCH] x86/retpoline: Fix a missing return thunk warning (was: Re: [linus:master] [x86/bugs] 4535e1a417: WARNING:at_arch/x86/kernel/alternative.c:#apply_returns)
@ 2024-04-03 16:45 99% ` Linus Torvalds
0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-04-03 16:45 UTC (permalink / raw)
To: Borislav Petkov; +Cc: kernel test robot, oe-lkp, lkp, linux-kernel, Ingo Molnar
On Wed, 3 Apr 2024 at 05:24, Borislav Petkov <bp@alien8.de> wrote:
>
> Subject: [PATCH] x86/retpoline: Fix a missing return thunk warning
Thanks, applied directly,
Linus
^ permalink raw reply [relevance 99%]
* Re: [GIT PULL] security changes for v6.9-rc3
@ 2024-04-02 21:35 99% ` Linus Torvalds
0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-04-02 21:35 UTC (permalink / raw)
To: Al Viro
Cc: Roberto Sassu, linux-integrity, linux-security-module,
linux-fsdevel, linux-cifs, linux-kernel, Roberto Sassu
On Tue, 2 Apr 2024 at 14:00, Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> 1) location of that hook is wrong. It's really "how do we catch
> file creation that does not come through open() - yes, you can use
> mknod(2) for that". It should've been after the call of vfs_create(),
> not the entire switch. LSM folks have a disturbing fondness of inserting
> hooks in various places, but IMO this one has no business being where
> they'd placed it. Bikeshedding regarding the name/arguments/etc. for
> that thing is, IMO, not interesting...
Hmm. I guess that's right - for a non-file node, there's nothing that
the security layer can really check after-the-fact anyway.
It's not like you can attest the contents of a character device or whatever...
> 2) the only ->mknod() instance in the tree that tries to leave
> dentry unhashed negative on success is CIFS (and only one case in it).
> From conversation with CIFS folks it's actually cheaper to instantiate
> in that case as well - leaving instantiation to the next lookup will
> cost several extra roundtrips for no good reason.
Ack.
> 3) documentation (in vfs.rst) is way too vague. The actual
> rules are
> * ->create() must instantiate on success
> * ->mkdir() is allowed to return unhashed negative on success and
> it might be forced to do so in some cases. If a caller of vfs_mkdir()
> wants the damn thing positive, it should account for such possibility and do
> a lookup. Normal callers don't care; see e.g. nfsd and overlayfs for example
> of those that do.
> * ->mknod() is interesting - historically it had been "may leave
> unhashed negative", but e.g. unix_bind() expected that it won't do so;
> the reason it didn't blow up for CIFS is that this case (SFU) of their mknod()
> does not support FIFOs and sockets anyway. Considering how few instances
> try to make use of that option and how it doesn't actually save them
> anything, I would prefer to declare that ->mknod() should act as ->create().
> * ->symlink() - not sure; there are instances that make use of that
> option (coda and hostfs). OTOH, the only callers of vfs_symlink() that
> care either way are nfsd and overlayfs, and neither is usable with coda
> or hostfs... Could go either way, but we need to say it clearly in the
> docs, whichever way we choose.
Fair enough.
Anyway, it does sound like maybe the minimal fix would be just that
"move it into the
case 0: case S_IFREG:
path".
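As a toy user-space model of that suggestion (every toy_* name here is a hypothetical stub, not the kernel's do_mknodat() or its helpers), the post-mknod hook would run only in the regular-file case, after a successful create:

```c
#define _DEFAULT_SOURCE
#include <sys/stat.h>
#include <sys/types.h>

static int toy_post_mknod_calls;    /* counts hook invocations */

/* Hypothetical stubs standing in for the VFS calls; both always succeed. */
static int toy_vfs_create(void) { return 0; }
static int toy_vfs_mknod(void)  { return 0; }
static void toy_post_mknod_hook(void) { toy_post_mknod_calls++; }

static int toy_do_mknod(mode_t mode)
{
    int error;

    switch (mode & S_IFMT) {
    case 0: case S_IFREG:
        error = toy_vfs_create();
        /* Hook lives here: only regular files, only on success. */
        if (!error)
            toy_post_mknod_hook();
        break;
    case S_IFCHR: case S_IFBLK:
    case S_IFIFO: case S_IFSOCK:
        /* Device nodes, FIFOs and sockets: nothing to appraise. */
        error = toy_vfs_mknod();
        break;
    default:
        error = -1;    /* placeholder error value for this sketch */
        break;
    }
    return error;
}
```

In this model, creating a character device or FIFO never reaches the hook, so there is no dentry state for it to trip over.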
Although if somebody already has the cifs patch to just do the
d_instantiate() for mknod, that might be even better.
I will leave this in more competent hands for now.
Let the bike-shedding commence,
Linus
^ permalink raw reply [relevance 99%]
* Re: [GIT PULL] security changes for v6.9-rc3
2024-04-02 19:39 92% ` Linus Torvalds
@ 2024-04-02 19:57 96% ` Linus Torvalds
0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-04-02 19:57 UTC (permalink / raw)
To: Roberto Sassu
Cc: linux-integrity, linux-security-module, linux-fsdevel,
linux-cifs, linux-kernel, Roberto Sassu
On Tue, 2 Apr 2024 at 12:39, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> void security_path_post_mknod(struct mnt_idmap *idmap, struct dentry *dentry)
> {
> - if (unlikely(IS_PRIVATE(d_backing_inode(dentry))))
> + struct inode *inode = d_backing_inode(dentry);
> + if (unlikely(!inode || IS_PRIVATE(inode)))
> return;
> call_void_hook(path_post_mknod, idmap, dentry);
Hmm. We do have other hooks that get called for this case.
For fsnotify_create() we actually have a comment about this:
* fsnotify_create - 'name' was linked in
*
* Caller must make sure that dentry->d_name is stable.
* Note: some filesystems (e.g. kernfs) leave @dentry negative and instantiate
* ->d_inode later
and audit_inode_child() ends up having a
if (inode)
handle_one(inode);
in it.
So in other cases we do handle the NULL, but it does seem like the
other cases actually do validly want to deal with this (ie the
fsnotify case will say "the directory that mknod was done in was
changed" even if it doesn't know what the change is).
But for the security case, it really doesn't seem to make much sense
to check a mknod() that you don't know the result of.
I do wonder if that "!inode" test might also be more specific with
"d_unhashed(dentry)". But that would only make sense if we moved this
test from security_path_post_mknod() into the caller itself, ie we
could possibly do something like this instead (or in addition to):
- if (error)
- goto out2;
- security_path_post_mknod(idmap, dentry);
+ if (!error && !d_unhashed(dentry))
+ security_path_post_mknod(idmap, dentry);
which might also be sensible.
Al? Anybody?
Linus
^ permalink raw reply [relevance 96%]
* Re: [GIT PULL] security changes for v6.9-rc3
@ 2024-04-02 19:39 92% ` Linus Torvalds
2024-04-02 19:57 96% ` Linus Torvalds
0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-04-02 19:39 UTC (permalink / raw)
To: Roberto Sassu
Cc: linux-integrity, linux-security-module, linux-fsdevel,
linux-cifs, linux-kernel, Roberto Sassu
On Tue, 2 Apr 2024 at 07:12, Roberto Sassu
<roberto.sassu@huaweicloud.com> wrote:
> A single bug fix to address a kernel panic in the newly introduced function
> security_path_post_mknod.
So I've pulled from you before, but I still don't have a signature
chain for your key (not that I can even find the key itself, much less
a signature chain).
Last time I pulled, it was after having everybody else just verify the
actual commit.
This time, the commit looks like a valid "avoid NULL", but I have to
say that I also think the security layer code in question is ENTIRELY
WRONG.
IOW, as far as I can tell, the mknod() system call may indeed leave
the dentry unhashed, and rely on anybody who then wants to use the new
special file to just do a "lookup()" to actually use it.
HOWEVER.
That also means that the whole notion of "post_path_mknod()" is
complete and utter hogwash. There is not anything that the security
layer can possibly validly do.
End result: instead of checking the 'inode' for NULL, I think the
right fix is to remove that meaningless security hook. It cannot do
anything sane, since one option is always "the inode hasn't been
initialized yet".
Put another way: any security hook that checks inode in
security_path_post_mknod() seems simply buggy.
But if we really want to do this ("if mknod creates a positive dentry,
I won't see it in lookup, so I want to appraise it now"), then we
should just deal with this in the generic layer with some hack like
this:
--- a/security/security.c
+++ b/security/security.c
@@ -1801,7 +1801,8 @@ EXPORT_SYMBOL(security_path_mknod);
*/
void security_path_post_mknod(struct mnt_idmap *idmap, struct dentry *dentry)
{
- if (unlikely(IS_PRIVATE(d_backing_inode(dentry))))
+ struct inode *inode = d_backing_inode(dentry);
+ if (unlikely(!inode || IS_PRIVATE(inode)))
return;
call_void_hook(path_post_mknod, idmap, dentry);
}
and IMA and EVM would have to do any validation at lookup() time for
the cases where the dentry wasn't hashed by ->mknod.
Anyway, all of this is to say that I don't feel like I can pull this without
(a) more acks by people
and
(b) explanations for why the simpler fix to just
security_path_post_mknod() isn't the right fix.
Linus
^ permalink raw reply [relevance 92%]
* Linux 6.9-rc2
@ 2024-03-31 22:05 43% Linus Torvalds
0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-03-31 22:05 UTC (permalink / raw)
To: Linux Kernel Mailing List
Neither snow nor rain nor heat nor gloom of night stays kernel rc releases.
Nor does Easter.
So here we are. Another week has passed, and rc2 is out. Nothing here
looks all that remarkable, and the fixes are fairly evenly spread out
(so mostly drivers, because that's the bulk of the code).
Outside of the driver fixes (see shortlog below for details), we've
got some more selftest work (mostly networking and bpf but also some
random fixes), some architecture fixes (mostly x86), some filesystem
work (xfs and btrfs) and random noise in other parts (mm, core kernel,
networking, Kbuild..).
Nothing stands out to me or looks unusual.
Linus
---
Alan Stern (3):
USB: core: Fix deadlock in usb_deauthorize_interface()
USB: core: Add hub_get() and hub_put() routines
USB: core: Fix deadlock in port "disable" sysfs attribute
Alexander Stein (1):
Revert "usb: phy: generic: Get the vbus supply"
Alexander Wetzel (1):
scsi: sg: Avoid sg device teardown race
Alexandra Winter (1):
s390/qeth: handle deferred cc1
Alexei Starovoitov (4):
bpf: Clarify bpf_arena comments.
libbpf, selftests/bpf: Adjust libbpf, bpftool, selftests to match LLVM
selftests/bpf: Remove hard coded PAGE_SIZE macro.
selftests/bpf: Add arena test case for 4Gbyte corner case
Anand Jain (2):
btrfs: validate device maj:min during open
btrfs: return accurate error code on open failure in open_fs_devices()
Andrei Matei (2):
bpf: Check bloom filter map value size
bpf: Protect against int overflow for stack access size
Andrew Price (1):
gfs2: Fix invalid metadata access in punch_hole
Andrii Nakryiko (1):
libbpf: fix u64-to-pointer cast on 32-bit arches
Andy Shevchenko (1):
gpiolib: Fix debug messaging in gpiod_find_and_request()
Andy Yan (1):
drm/rockchip: vop2: Remove AR30 and AB30 format support
Ard Biesheuvel (3):
x86/efistub: Add missing boot_params for mixed mode compat entry
efi/libstub: Cast away type warning in use of max()
x86/efistub: Reinstate soft limit for initrd loading
Arnaldo Carvalho de Melo (1):
libbpf: Define MFD_CLOEXEC if not available
Arnd Bergmann (6):
staging: vc04_services: changen strncpy() to strscpy_pad()
irqchip/armada-370-xp: Suppress unused-function warning
ACPI: APEI: EINJ: mark remove callback as non-__exit
ALSA: aoa: avoid false-positive format truncation warning
dm integrity: fix out-of-range warning
kbuild: make -Woverride-init warnings more consistent
Artem Savkov (1):
arm64: bpf: fix 32bit unconditional bswap
Arınç ÜNAL (1):
net: dsa: mt7530: fix improper frames on all 25MHz and 40MHz XTAL MT7530
Ayala Beker (1):
wifi: mac80211: correctly set active links upon TTLM
Baoquan He (1):
crash: use macro to add crashk_res into iomem early for specific arch
Barry Song (1):
mm: zswap: fix kernel BUG in sg_init_one
Bartosz Golaszewski (1):
gpio: cdev: sanitize the label before requesting the interrupt
Benjamin Berg (2):
wifi: iwlwifi: mvm: guard against invalid STA ID on removal
wifi: iwlwifi: mvm: include link ID when releasing frames
Bhanuprakash Modem (2):
drm/i915/drrs: Refactor CPU transcoder DRRS check
drm/i915/display/debugfs: Fix duplicate checks in i915_drrs_status
Bikash Hazarika (1):
scsi: qla2xxx: Update manufacturer detail
Bjørn Mork (1):
net: wwan: t7xx: Split 64bit accesses to fix alignment issues
Borislav Petkov (AMD) (3):
x86/vdso: Fix rethunk patching for vdso-image-x32.o too
x86/bugs: Fix the SRSO mitigation on Zen3/4
kbuild: Disable KCSAN for autogenerated *.mod.c intermediaries
Brent Lu (2):
ALSA: hda: intel-nhlt: add intel_nhlt_ssp_device_type() function
ASoC: SOF: ipc4-topology: support NHLT device type
Carlos Maiolino (1):
tmpfs: fix race on handling dquot rbtree
Chris Bainbridge (1):
drm/dp: Fix divide-by-zero regression on DP MST unplug with nouveau
Chris Park (1):
drm/amd/display: Prevent crash when disable stream
Chris Wilson (1):
drm/i915/gt: Reset queue_priority_hint on parking
Christian A. Ehrhardt (5):
usb: typec: ucsi: Clear EVENT_PENDING under PPM lock
usb: typec: ucsi: Check for notifications after init
usb: typec: ucsi: Ack unsupported commands
usb: typec: ucsi_acpi: Refactor and fix DELL quirk
usb: typec: ucsi: Clear UCSI_CCI_RESET_COMPLETE before reset
Christian Marangi (1):
net: phy: qcom: at803x: fix kernel panic with at8031_probe
Christoph Hellwig (1):
block: don't reject too large max_user_sectors in blk_validate_limits
Chuck Lever (2):
SUNRPC: Revert 561141dd494382217bace4d1a51d08168420eace
NFSD: CREATE_SESSION must never cache NFS4ERR_DELAY replies
Claus Hansen Ries (1):
net: ll_temac: platform_get_resource replaced by wrong function
Colin Ian King (2):
scsi: target: iscsi: Remove unused variable xfer_len
fs/9p: remove redundant pointer v9ses
Cong Liu (1):
tools/Makefile: remove cgroup target
Damien Le Moal (2):
scsi: sd: Fix TCG OPAL unlock on system resume
block: Do not force full zone append completion in req_bio_endio()
Dan Carpenter (2):
nexthop: fix uninitialized variable in nla_put_nh_group_stats()
staging: vc04_services: fix information leak in create_component()
Daniel Lezcano (1):
Revert "thermal: core: Don't update trip points inside the
hysteresis range"
Dave Airlie (1):
drm/i915: add bug.h include to i915_memcpy.c
Dave Chinner (2):
xfs: allow sunit mount option to repair bad primary sb stripe values
xfs: don't use current->journal_info
David Gow (1):
kunit: configs: Enable CONFIG_DAMON_DBGFS_DEPRECATED for --alltests
David Howells (1):
cifs: Fix duplicate fscache cookie warnings
David Thompson (2):
mlxbf_gige: stop PHY during open() error paths
mlxbf_gige: call request_irq() after NAPI initialized
Dmitry Baryshkov (1):
scsi: ufs: qcom: Provide default cycles_in_1us value
Duoming Zhou (2):
nouveau/dmem: handle kcalloc() allocation failure
ALSA: sh: aica: reorder cleanup operations to avoid UAF bugs
Edward Liaw (2):
selftests/mm: sigbus-wp test requires UFFD_FEATURE_WP_HUGETLBFS_SHMEM
selftests/mm: fix ARM related issue with fork after pthread_create
Emmanuel Grumbach (1):
wifi: iwlwifi: mvm: pick the version of SESSION_PROTECTION_NOTIF
Eric Biggers (1):
Revert "crypto: pkcs7 - remove sha1 support"
Eric Dumazet (1):
tcp: properly terminate timers for kernel sockets
Eric Huang (1):
drm/amdkfd: fix TLB flush after unmap for GFX9.4.2
Eric Van Hensbergen (1):
fs/9p: fix uninitialized values during inode evict
Felix Fietkau (1):
wifi: mac80211: check/clear fast rx for non-4addr sta VLAN changes
Filipe Manana (4):
btrfs: fix extent map leak in unexpected scenario at unpin_extent_cache()
btrfs: fix warning messages not printing interval at unpin_extent_range()
btrfs: fix message not properly printing interval when adding extent map
btrfs: use btrfs_warn() to log message at btrfs_add_extent_mapping()
Florian Westphal (1):
inet: inet_defrag: prevent sk release while still in use
Francesco Dolcini (1):
MAINTAINERS: wifi: mwifiex: add Francesco as reviewer
Gao Xiang (1):
erofs: drop experimental warning for FSDAX
George Shen (1):
drm/amd/display: Remove MPC rate control logic from DCN30 and above
Gergo Koteles (4):
ALSA: hda/tas2781: remove digital gain kcontrol
ALSA: hda/tas2781: add locks to kcontrols
ALSA: hda/tas2781: add debug statements to kcontrols
ALSA: hda/tas2781: remove useless dev_dbg from playback_hook
Guilherme G. Piccoli (1):
scsi: core: Fix unremoved procfs host directory regression
Hamza Mahfooz (1):
drm/amd/display: fix IPX enablement
Hangbin Liu (1):
scripts/bpf_doc: Use silent mode when exec make cmd
Hari Bathini (1):
bpf: fix warning for crash_kexec
Hariprasad Kelam (1):
Octeontx2-af: fix pause frame configuration in GMP mode
Harry Wentland (1):
Revert "drm/amd/display: Fix sending VSC (+ colorimetry) packets
for DP/eDP displays without PSR"
Heikki Krogerus (1):
usb: dwc3: pci: Drop duplicate ID
Herve Codina (1):
net: wan: framer: Add missing static inline qualifiers
Ido Schimmel (2):
ipv6: Fix address dump when IPv6 is disabled on an interface
selftests: vxlan_mdb: Fix failures with old libnet
Igor Artemiev (1):
wifi: cfg80211: fix rdev_dump_mpp() arguments order
Ilan Peer (1):
wifi: iwlwifi: mvm: Configure the link mapping for non-MLD FW
Ilya Leoshkevich (1):
s390/bpf: Fix bpf_plt pointer arithmetic
Ingo Molnar (2):
Documentation/x86: Fix title underline length
Revert "x86/mm/ident_map: Use gbpages only where full GB page
should be mapped."
Isak Ellmer (1):
kconfig: Fix typo HEIGTH to HEIGHT
Jakub Kicinski (2):
tools: ynl: fix setting presence bits in simple nests
selftests: netdevsim: set test timeout to 10 minutes
Jameson Thies (1):
usb: typec: ucsi: Check capabilities before cable and identity discovery
Jan Kara (1):
nfsd: Fix error cleanup path in nfsd_rename()
Janusz Krzysztofik (2):
drm/i915/hwmon: Fix locking inversion in sysfs getter
drm/i915/vma: Fix UAF on destroy against retire race
Jason Gunthorpe (2):
iommu/arm-smmu-v3: Add cpu_to_le64() around STRTAB_STE_0_V
iommu: Validate the PASID in iommu_attach_device_pasid()
Jeff Johnson (1):
wifi: mac80211: fix ieee80211_bss_*_flags kernel-doc
Jesse Brandeburg (1):
ice: fix memory corruption bug with suspend and rebuild
Jian Shen (1):
net: hns3: mark unexcuted loopback test result as UNEXECUTED
Jie Wang (1):
net: hns3: fix index limit to support all queue stats
Jocelyn Falempe (1):
drm/vmwgfx: Create debugfs ttm_resource_manager entry only if needed
Johan Hovold (1):
wifi: mac80211: fix mlme_link_id_dbg()
Johannes Berg (8):
wifi: cfg80211: add a flag to disable wireless extensions
wifi: iwlwifi: mvm: disable MLO for the time being
wifi: mac80211: fix prep_connection error path
wifi: iwlwifi: mvm: rfi: fix potential response leaks
wifi: iwlwifi: fw: don't always use FW dump trig
wifi: iwlwifi: read txq->read_ptr under lock
wifi: iwlwifi: mvm: handle debugfs names more carefully
kunit: fix wireless test dependencies
Johannes Thumshirn (3):
btrfs: zoned: use zone aware sb location for scrub
btrfs: zoned: fix use-after-free in do_zone_finish()
btrfs: zoned: don't skip block groups with 100% zone unusable
Johannes Weiner (4):
mm: cachestat: fix two shmem bugs
mm: zswap: fix writeback shinker GFP_NOIO/GFP_NOFS recursion
mm: zswap: fix data loss on SWP_SYNCHRONOUS_IO devices
drm/amdgpu: fix deadlock while reading mqd from debugfs
John Garry (1):
block: Make blk_rq_set_mixed_merge() static
John Ogness (1):
printk: Update @console_may_schedule in console_trylock_spinning()
John Sperbeck (1):
init: open /initrd.image with O_LARGEFILE
Jonathan Kim (1):
drm/amdkfd: range check cp bad op exception interrupts
Jonathon Hall (1):
drm/i915: Do not match JSL in ehl_combo_pll_div_frac_wa_needed()
Joonas Lahtinen (1):
drm/i915: Add includes for BUG_ON/BUILD_BUG_ON in i915_memcpy.c
José Roberto de Souza (1):
drm/i915: Do not print 'pxp init failed with 0' when it succeed
Juha-Pekka Heikkila (1):
drm/i915/display: Disable AuxCCS framebuffers if built for Xe
Justin Chen (2):
net: bcmasp: Bring up unimac after PHY link up
net: bcmasp: Remove phy_{suspend/resume}
Justin Stitt (1):
binfmt: replace deprecated strncpy
Justin Tee (12):
scsi: lpfc: Remove unnecessary log message in queuecommand path
scsi: lpfc: Move NPIV's transport unregistration to after
resource clean up
scsi: lpfc: Remove IRQF_ONESHOT flag from threaded IRQ handling
scsi: lpfc: Update lpfc_ramp_down_queue_handler() logic
scsi: lpfc: Replace hbalock with ndlp lock in lpfc_nvme_unregister_port()
scsi: lpfc: Release hbalock before calling lpfc_worker_wake_up()
scsi: lpfc: Use a dedicated lock for ras_fwlog state
scsi: lpfc: Define lpfc_nodelist type for ctx_ndlp ptr
scsi: lpfc: Define lpfc_dmabuf type for ctx_buf ptr
scsi: lpfc: Define types in a union for generic void *context3 ptr
scsi: lpfc: Update lpfc version to 14.4.0.1
scsi: lpfc: Copyright updates for 14.4.0.1 patches
Kees Cook (2):
selftests/exec: execveat: Improve debug reporting
selftests/exec: Convert remaining /bin/sh to /bin/bash
Ken Raeburn (1):
dm vdo murmurhash3: use kernel byteswapping routines instead of GCC ones
Kevin Loughlin (1):
x86/sev: Skip ROM range scans and validation for SEV-SNP guests
Krishna Kurapati (1):
usb: typec: ucsi: Fix race between typec_switch and role_switch
Kuan-Wei Chiu (2):
MAINTAINERS: remove incorrect M: tag for dm-devel@lists.linux.dev
MAINTAINERS: Remove incorrect M: tag for dm-devel@lists.linux.dev
Kuniyuki Iwashima (1):
netfilter: arptables: Select NETFILTER_FAMILY_ARP when building
arp_tables.c
Kurt Kanzenbach (1):
igc: Remove stale comment about Tx timestamping
Kyle Tso (3):
usb: typec: tcpm: Correct port source pdo array in pd_set callback
usb: typec: tcpm: Update PD of Type-C port upon pd_set
usb: typec: Return size of buffer if pd_set operation succeeds
Lang Yu (2):
drm/amdgpu/umsch: update UMSCH 4.0 FW interface
drm/amdgpu: enable UMSCH 4.0.6
Leonard Crestez (1):
mailmap: update entry for Leonard Crestez
Liming Sun (1):
sdhci-of-dwcmshc: disable PM runtime in dwcmshc_remove()
Linus Torvalds (4):
Fix memory leak in posix_clock_open()
Fix build errors due to new UIO_MEM_DMA_COHERENT mess
mm: clean up populate_vma_page_range() FOLL_* flag handling
Linux 6.9-rc2
Lizhi Xu (1):
fs/9p: fix uaf in in v9fs_stat2inode_dotl
Lokesh Gidra (1):
userfaultfd: fix deadlock warning when locking src and dst VMAs
Luca Weiss (1):
drm/bridge: Select DRM_KMS_HELPER for DRM_PANEL_BRIDGE
Lucas De Marchi (1):
drm/xe: Fix END redefinition
Mario Limonciello (1):
drm/amd: Flush GFXOFF requests in prepare stage
Mark Brown (2):
gpiolib: Add stubs for GPIO lookup functions
selftests/seccomp: Try to fit runtime of benchmark into timeout
Mark Rutland (1):
selftests/ftrace: Fix event filter target_func selection
Masahiro Yamada (6):
cxl: remove CONFIG_CXL_PMU entry in drivers/cxl/Kconfig
MIPS: move unselectable FIT_IMAGE_FDT_EPM5 out of the "System type" choice
kconfig: do not reparent the menu inside a choice block
export.h: remove include/asm-generic/export.h
modpost: do not make find_tosym() return NULL
x86/build: Use obj-y to descend into arch/x86/virt/
Masami Hiramatsu (Google) (1):
tracing: probes: Fix to zero initialize a local variable
Matt Bobrowski (1):
bpf: update BPF LSM designated reviewer list
Matthew Auld (5):
drm/xe/guc_submit: use jiffies for job timeout
drm/xe/queue: fix engine_class bounds check
drm/xe/device: fix XE_MAX_GT_PER_TILE check
drm/xe/device: fix XE_MAX_TILES_PER_DEVICE check
drm/xe/query: fix gt_id bounds check
Matthew Wilcox (Oracle) (1):
mm: increase folio batch size
Max Filippov (1):
exec: Fix NOMMU linux_binprm::exec in transfer_args_to_stack()
Maxim Levitsky (1):
i2c: i801: Fix a refactoring that broke a touchpad on Lenovo P1
Miguel Ojeda (2):
drm/qxl: remove unused `count` variable from `qxl_surface_id_alloc()`
drm/qxl: remove unused variable from `qxl_process_single_command()`
Mikko Rapeli (2):
mmc: core: Initialize mmc_blk_ioc_data
mmc: core: Avoid negative index with array access
Mikulas Patocka (1):
objtool: Fix compile failure when using the x32 compiler
Minas Harutyunyan (5):
usb: dwc2: host: Fix hibernation flow
usb: dwc2: host: Fix remote wakeup from hibernation
usb: dwc2: host: Fix ISOC flow in DDMA mode
usb: dwc2: gadget: Fix exiting from clock gating
usb: dwc2: gadget: LPM flow fix
Mostafa Saleh (1):
iommu/arm-smmu-v3: Fix access for STE.SHCFG
Muhammad Usama Anjum (7):
scsi: lpfc: Correct size for wqe for memset()
scsi: lpfc: Correct size for cmdwqe/rspwqe for memset()
selftests/exec: binfmt_script: Add the overall result line
according to TAP
selftests/exec: load_address: conform test to TAP format output
selftests/exec: recursion-depth: conform test to TAP format output
selftests: mm: restore settings from only parent process
selftests: dmabuf-heap: add config file for the test
Mukul Joshi (1):
drm/amdkfd: Check cgroup when returning DMABuf info
Natanel Roizenman (1):
drm/amd/display: Increase Z8 watermark times.
Nathan Chancellor (2):
hexagon: vmlinux.lds.S: handle attributes section
Documentation/llvm: Note s390 LLVM=1 support with LLVM 18.1.0 and newer
Neil Armstrong (1):
Revert "drm/bridge: Select DRM_KMS_HELPER for DRM_PANEL_BRIDGE"
Nikita Kiryushin (1):
ACPICA: debugger: check status of acpi_evaluate_object() in
acpi_db_walk_for_fields()
Nilesh Javali (1):
scsi: qla2xxx: Update version to 10.02.09.200-k
Nirmoy Das (1):
drm/xe: Remove unused xe_bo->props struct
Oliver Neukum (1):
usb: cdc-wdm: close race between read and workqueue
Oscar Salvador (1):
mm,page_owner: fix recursion
Pablo Neira Ayuso (3):
netfilter: nf_tables: reject destroy command to remove basechain hooks
netfilter: nf_tables: reject table flag and netdev basechain updates
netfilter: nf_tables: skip netdev hook unregistration if table is dormant
Paul E. McKenney (1):
x86/nmi: Upgrade NMI backtrace stall checks & messages
Pavel Sakharov (1):
dma-buf: Fix NULL pointer dereference in sanitycheck()
Peter Wang (1):
scsi: ufs: core: Add config_scsi_dev vops comment
Peter Xu (1):
mm/memory: fix missing pte marker for !page on pte zaps
Peyton Lee (1):
drm/amdgpu/vpe: power on vpe when hw_init
Ping-Ke Shih (2):
wifi: rtw89: coex: fix configuration for shared antenna for 8922A
MAINTAINERS: wifi: add git tree for Realtek WiFi drivers
Prasad Pandit (1):
dpll: indent DPLL option type by a tab
Przemek Kitszel (1):
ixgbe: avoid sleeping allocation in ixgbe_ipsec_vf_add_sa()
Pu Lehui (1):
riscv, bpf: Fix kfunc parameters incompatibility between bpf and riscv abi
Puranjay Mohan (5):
bpf: Temporarily disable atomic operations in BPF arena
bpf, arm64: fix bug in BPF_LDX_MEMSX
bpf: verifier: fix addr_space_cast from as(1) to as(0)
selftests/bpf: verifier_arena: fix mmap address for arm64
bpf: verifier: reject addr_space_cast insn without arena
Quentin Monnet (1):
MAINTAINERS: Update email address for Quentin Monnet
Quinn Tran (6):
scsi: qla2xxx: Prevent command send on chip reset
scsi: qla2xxx: Fix N2N stuck connection
scsi: qla2xxx: Split FCE|EFT trace control
scsi: qla2xxx: NVME|FCP prefer flag not being honored
scsi: qla2xxx: Fix command flush on cable pull
scsi: qla2xxx: Delay I/O Abort on PCI error
Rafael J. Wysocki (1):
genirq: Introduce IRQF_COND_ONESHOT and use it in pinctrl-amd
Raju Lakkaraju (1):
net: lan743x: Add set RFE read fifo threshold for PCI1x1x chips
Ravi Gunasekaran (1):
net: hsr: hsr_slave: Fix the promiscuous mode in offload mode
Ricardo B. Marliere (5):
scsi: sg: Make sg_sysfs_class constant
scsi: pmcraid: Make pmcraid_class constant
scsi: cxlflash: Make cxlflash_class constant
scsi: ch: Make ch_sysfs_class constant
scsi: st: Make st_sysfs_class constant
Rohit Ner (1):
scsi: ufs: core: Fix MCQ MAC configuration
Romain Naour (1):
mmc: sdhci-omap: re-tuning is needed after a pm transition to
support emmc HS200 mode
Roman Li (1):
drm/amd/display: Fix bounds check for dcn35 DcfClocks
Ryosuke Yasuoka (1):
nfc: nci: Fix uninit-value in nci_dev_up and nci_ntf_packet
Sabrina Dubroca (4):
tls: recv: process_rx_list shouldn't use an offset with kvec
tls: adjust recv return with async crypto and failed copy to userspace
selftests: tls: add test with a partially invalid iov
tls: get psock ref after taking rxlock to avoid leak
Sandeep Dhavale (1):
MAINTAINERS: erofs: add myself as reviewer
Sandipan Das (4):
x86/cpufeatures: Add new word for scattered features
perf/x86/amd/lbr: Use freeze based on availability
perf/x86/amd/core: Update and fix stalled-cycles-* events for
Zen 2 and later
perf/x86/amd/core: Define a proper ref-cycles event for Zen 4 and later
Saurav Kashyap (4):
scsi: qla2xxx: Fix double free of the ha->vp_map pointer
scsi: qla2xxx: Fix double free of fcport
scsi: qla2xxx: Change debug message during driver unload
scsi: bnx2fc: Remove spin_lock_bh while releasing resources after upload
Sergey Shtylyov (1):
MAINTAINERS: split Renesas Ethernet drivers entry
Shaul Triebitz (1):
wifi: iwlwifi: mvm: consider having one active link
Shin'ichiro Kawasaki (1):
scsi: mpi3mr: Avoid memcpy field-spanning write WARNING
Simon Trimmer (2):
ALSA: hda: cs35l56: Raise device name message log level
ALSA: hda: cs35l56: Set the init_done flag before component_add()
Stanislav Fomichev (1):
xsk: Don't assume metadata is always requested in TX completion
Steve French (1):
smb3: add trace event for mknod
Steven Zou (1):
ice: Refactor FW data type and fix bitmap casting issue
Sung Joon Kim (1):
drm/amd/display: Update dcn351 to latest dcn35 config
Taimur Hassan (1):
drm/amd/display: Send DTBCLK disable message on first commit
Tavian Barnes (1):
btrfs: fix race in read_extent_buffer_pages()
Tejas Upadhyay (1):
drm/i915/mtl: Update workaround 14018575942
Thinh Nguyen (1):
usb: dwc3: Properly set system wakeup
Thomas Gleixner (1):
MAINTAINERS: Add co-maintainers for time[rs]
Thomas Zimmermann (1):
fbdev: Select I/O-memory framebuffer ops for SBus
Tom Zanussi (1):
crypto: iaa - Fix nr_cpus < nr_iaa case
Uros Bizjak (1):
x86/percpu: Disable named address spaces for KCSAN
Ville Syrjälä (6):
drm/i915: Stop doing double audio enable/disable on SDVO and g4x+ DP
drm/i915/dsi: Go back to the previous INIT_OTP/DISPLAY_ON order, mostly
drm/i915/vrr: Generate VRR "safe window" for DSB
drm/i915/dsb: Fix DSB vblank waits when using VRR
drm/i915: Pre-populate the cursor physical dma address
drm/i915/bios: Tolerate devdata==NULL in
intel_bios_encoder_supports_dp_dual_mode()
Vitaly Chikunov (1):
selftests/mm: Fix build with _FORTIFY_SOURCE
Vitaly Prosyak (1):
drm/sched: fix null-ptr-deref in init entity
Weitao Wang (1):
USB: UAS: return ENODEV when submit urbs fail with device not attached
Wenjing Liu (1):
drm/amd/display: fix a dereference of a NULL pointer
Xi Liu (2):
drm/amd/display: increase bb clock for DCN351
drm/amd/display: Set DCN351 BB and IP the same as DCN35
Xingui Yang (2):
scsi: libsas: Add a helper sas_get_sas_addr_and_dev_type()
scsi: libsas: Fix disk not being scanned in after being removed
Xu Yang (1):
usb: typec: tcpm: fix double-free issue in tcpm_port_unregister_pd()
Yazen Ghannam (3):
RAS/AMD/FMPM: Avoid NULL ptr deref in get_saved_records()
RAS/AMD/FMPM: Safely handle saved records of various sizes
RAS: Avoid build errors when CONFIG_DEBUG_FS=n
Ye Zhang (1):
thermal: devfreq_cooling: Fix perf state when calculate dfc res_util
Yonglong Liu (1):
net: hns3: fix kernel crash when devlink reload during pf initialization
Yongzhi Liu (1):
usb: misc: ljca: Fix double free in error handling path
Zev Weiss (2):
prctl: generalize PR_SET_MDWE support check to be per-arch
ARM: prctl: reject PR_SET_MDWE on pre-ARMv6
Zoltan HERPAI (1):
pwm: img: fix pwm clock lookup
lima1002 (1):
drm/amd/swsmu: add smu 14.0.1 vcn and jpeg msg
linke li (1):
net: mark racy access on sk->sk_rcvbuf
yuan linyu (1):
usb: udc: remove warning when queue disabled ep
^ permalink raw reply [relevance 43%]
* Re: [GIT PULL] tpmdd changes for v6.9-rc2
@ 2024-03-31 17:01 99% ` Linus Torvalds
0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-03-31 17:01 UTC (permalink / raw)
To: Jarkko Sakkinen
Cc: dhowells, Peter Huewe, Jason Gunthorpe, linux-integrity,
linux-kernel, keyrings
On Sat, 30 Mar 2024 at 22:57, Jarkko Sakkinen <jarkko@kernel.org> wrote:
>
> OK, point taken and it is evolutionary issue really but definitely
> needs to be fixed.
>
> I review and test most of the stuff that goes to keyring but other
> than trusted keys, I usually pick only few patches every now and
> then to my tree.
It's perfectly fine if you send me key updates - you're listed as
maintainer etc, that's not a problem.
But when I get a tag name that says "tpmdd" and a subject that says
"tpmdd", I'm not expecting to then see key updates in the pull.
So that part of my issue was literally just that your subject line and
tag name didn't match the contents, and that just makes me go "there's
something wrong here".
So keys coming through your tree is fine per se, it's just that I want
the subject line etc to actually make sense.
Linus
^ permalink raw reply [relevance 99%]
* Re: [GIT PULL] tpmdd changes for v6.9-rc2
@ 2024-03-30 22:32 99% ` Linus Torvalds
0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-03-30 22:32 UTC (permalink / raw)
To: Jarkko Sakkinen
Cc: Peter Huewe, Jason Gunthorpe, David Howells, linux-integrity,
linux-kernel, keyrings
On Tue, 26 Mar 2024 at 07:38, Jarkko Sakkinen <jarkko@kernel.org> wrote:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd.git tags/tpmdd-v6.9-rc2
So I haven't pulled this, because the subject line (and tag name)
talks about tpmdd, but this is clearly about key handling.
Also, the actual contents seem to be very much an "update", not fixes.
And it doesn't seem to be an actual improvement, in how it now does
things from interrupts. That seems to be going backward rather than
forward.
Linus
^ permalink raw reply [relevance 99%]
* Re: PROBLEM: Only one CPU active on Ultra 60 since ~4.8 (regression)
@ 2024-03-28 20:09 95% ` Linus Torvalds
0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-03-28 20:09 UTC (permalink / raw)
To: Linux regressions mailing list, Andreas Larsson
Cc: Nick Bowler, linux-kernel, David S. Miller, sparclinux
On Thu, 28 Mar 2024 at 12:36, Linux regression tracking (Thorsten
Leemhuis) <regressions@leemhuis.info> wrote:
>
> [CCing Linus, in case I say something to his disliking]
>
> On 22.03.24 05:57, Nick Bowler wrote:
> >
> > Just a friendly reminder that this issue still happens on Linux 6.8 and
> > reverting commit 9b2f753ec237 as indicated below is still sufficient to
> > resolve the problem.
>
> FWIW, that commit 9b2f753ec23710 ("sparc64: Fix cpu_possible_mask if
> nr_cpus is set") is from v4.8. Reverting it after all that time might
> easily lead to even bigger trouble.
I'm definitely not reverting a patch from almost a decade ago as a regression.
If it took that long to find, it can't be that critical of a regression.
So yes, let's treat it as a regular bug. And let's bring in Andreas to
the discussion too (although presumably he has seen it on the
sparclinux mailing list).
Andreas, if not, here's the link to lore for the beginning of the thread:
https://lore.kernel.org/all/CADyTPEwt=ZNams+1bpMB1F9w_vUdPsGCt92DBQxxq_VtaLoTdw@mail.gmail.com/
And from a quick look I do think that commit is buggy, and yes, the
fix probably is just to revert it.
As the original report makes clear, that commit 9b2f753ec23710 is
clearly confused about the difference between "number of CPU's", and
"index of CPU numbers".
When that smp_fill_in_cpu_possible_map() does
int possible_cpus = num_possible_cpus();
and then uses that to fill in &__cpu_possible_mask, that's completely
nonsensical. Because we literally have
#define cpu_possible_mask ((const struct cpumask *)&__cpu_possible_mask)
#define num_possible_cpus() cpumask_weight(cpu_possible_mask)
so it's reading cpu_possible_mask to figure out how many cpus it might
have, and then using that number to set possibly *different* bits in
the same bitmap that is just used to judge what the max number is.
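The count-versus-index confusion described above can be shown in a small user-space sketch. This is not the sparc64 code: the 64-bit mask and the helper names (set_cpu, mask_weight, fill_by_count) are made up for illustration, and it assumes the GCC/Clang __builtin_popcountll builtin.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: mark CPU index 'cpu' as possible in a 64-bit mask. */
static uint64_t set_cpu(uint64_t mask, unsigned int cpu)
{
	assert(cpu < 64);	/* the index must fit in the mask */
	return mask | (UINT64_C(1) << cpu);
}

/* Hypothetical stand-in for cpumask_weight(): the number of set bits. */
static unsigned int mask_weight(uint64_t mask)
{
	return (unsigned int)__builtin_popcountll(mask);
}

/*
 * The confused pattern: take the *count* of possible CPUs and use it as
 * an upper bound on CPU *indices*, setting bits 0..count-1.
 */
static uint64_t fill_by_count(uint64_t possible)
{
	unsigned int count = mask_weight(possible);
	uint64_t out = 0;

	for (unsigned int cpu = 0; cpu < count; cpu++)
		out = set_cpu(out, cpu);
	return out;
}
```

With CPUs numbered 0 and 2 (mask 0x5) the weight is 2, so fill_by_count() produces 0x3: the nonexistent CPU 1 is marked possible and CPU 2 is dropped.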
So I do think a revert is called for, but I'm not going to treat this
as a regression, I'm going to just treat it as "sparc bug" and hope
that the sparc people try to figure out why that crazy code was
written.
And maybe it made more sense back a decade ago than it does now.
Andreas?
Linus
^ permalink raw reply [relevance 95%]
* Re: [WIP 0/3] Memory model and atomic API in Rust
@ 2024-03-27 22:57 94% ` Linus Torvalds
0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-03-27 22:57 UTC (permalink / raw)
To: Kent Overstreet
Cc: comex, Dr. David Alan Gilbert, Philipp Stanner, Boqun Feng,
rust-for-linux, linux-kernel, linux-arch, llvm, Miguel Ojeda,
Alex Gaynor, Wedson Almeida Filho, Gary Guo,
Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
Alan Stern, Andrea Parri, Will Deacon, Peter Zijlstra,
Nicholas Piggin, David Howells, Jade Alglave, Luc Maranget,
Paul E. McKenney, Akira Yokosawa, Daniel Lustig, Joel Fernandes,
Nathan Chancellor, Nick Desaulniers, kent.overstreet,
Greg Kroah-Hartman, Marco Elver, Mark Rutland, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
Catalin Marinas, linux-arm-kernel, linux-fsdevel
On Wed, 27 Mar 2024 at 14:41, Kent Overstreet <kent.overstreet@linux.dev> wrote:
>
>
> On the hardware end, the Mill guys were pointing out years ago that
> register renaming is a big power bottleneck in modern processors;
LOL.
The Mill guys took the arguments from the Itanium people, and turned
the crazy up to 11, with "the belt" and seemingly trying to do a
dataflow machine but not worrying over-much about memory accesses etc.
The whole "we'll deal with it in the compiler" is crazy talk.
In other words, I'll believe it when I see it. And I doubt we'll ever see it.
Linus
^ permalink raw reply [relevance 94%]
* Re: [WIP 0/3] Memory model and atomic API in Rust
@ 2024-03-27 20:45 88% ` Linus Torvalds
0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-03-27 20:45 UTC (permalink / raw)
To: Kent Overstreet
Cc: comex, Dr. David Alan Gilbert, Philipp Stanner, Boqun Feng,
rust-for-linux, linux-kernel, linux-arch, llvm, Miguel Ojeda,
Alex Gaynor, Wedson Almeida Filho, Gary Guo,
Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
Alan Stern, Andrea Parri, Will Deacon, Peter Zijlstra,
Nicholas Piggin, David Howells, Jade Alglave, Luc Maranget,
Paul E. McKenney, Akira Yokosawa, Daniel Lustig, Joel Fernandes,
Nathan Chancellor, Nick Desaulniers, kent.overstreet,
Greg Kroah-Hartman, Marco Elver, Mark Rutland, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
Catalin Marinas, linux-arm-kernel, linux-fsdevel
On Wed, 27 Mar 2024 at 12:41, Kent Overstreet <kent.overstreet@linux.dev> wrote:
>
> _But_: the lack of any aliasing guarantees means that writing through
> any pointer can invalidate practically anything, and this is a real
> problem.
It's actually much less of a problem than you make it out to be.
A lot of common aliasing information is statically visible (offsets
off the same struct pointer etc).
The big problems tend to be
(a) old in-order hardware that really wants the compiler to schedule
memory operations
(b) vectorization and HPC
and honestly, (a) is irrelevant, and (b) is where 'restrict' and
actual real vector extensions come in. In fact, the type-based
aliasing often doesn't help (because you have arrays of the same FP
types), and so you really just need to tell the compiler that your
arrays are disjoint.
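As a rough illustration of the 'restrict' point, here is a minimal sketch in standard C99. The function name scale_add is hypothetical. Both arrays have the same element type, so type-based aliasing rules tell the compiler nothing; the qualifiers are the programmer's promise that the arrays do not overlap.

```c
#include <assert.h>
#include <stddef.h>

/*
 * dst and src are both 'float', so type-based alias analysis cannot
 * separate them. 'restrict' states that they are disjoint, which is
 * what allows the loop to be vectorized without runtime overlap checks.
 */
static void scale_add(float *restrict dst, const float *restrict src,
		      float k, size_t n)
{
	/* Cheap sanity check only; it does not detect partial overlap. */
	assert(n == 0 || dst != src);

	for (size_t i = 0; i < n; i++)
		dst[i] += k * src[i];
}
```

Passing overlapping arrays to such a function is undefined behavior, so the promise is entirely the caller's responsibility.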
Yes, yes, possible aliasing means that the compiler won't generate
nice-looking code in many situations and will end up reloading values
from memory etc.
AND NONE OF THAT MATTERS IN REALITY.
Performance issues to a close approximation come from cache misses and
branch mispredicts. The aliasing issue just isn't the horrendous issue
people claim it is. It's most *definitely* not worth the absolute
garbage that is C type-based aliasing.
And yes, I do think it might be nice to have a nicer 'restrict' model,
because yes, I look at the generated asm and I see the silly code
generation too. But describing aliasing sanely in general is just hard
(both for humans _and_ for some sane machine interface), and it's very
very seldom worth the pain.
Linus
^ permalink raw reply [relevance 88%]
* Re: [GIT PULL] Char/Misc driver changes for 6.9-rc1
2024-03-27 16:56 97% ` Linus Torvalds
@ 2024-03-27 20:26 99% ` Linus Torvalds
0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-03-27 20:26 UTC (permalink / raw)
To: Greg KH, Chris Leech, Nilesh Javali, Christoph Hellwig; +Cc: linux-kernel
On Wed, 27 Mar 2024 at 09:56, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> I also *suspect* that using 'physaddr_t' is in itself pointless,
> because I *think* the physical addresses are always page-aligned
> anyway, and it would be better if the uio_mem thing just contained the
> pfn instead. Which could just be 'unsigned long pfn'.
Oddly, the uio code seems to be written to allow unaligned page buffers,
actual_pages = ((idev->info->mem[mi].addr & ~PAGE_MASK)
+ idev->info->mem[mi].size + PAGE_SIZE -1) >>
PAGE_SHIFT;
but none of the mmap routines then actually allow such a mapping, and
they all have alignment checks.
Which sounds wonderful, until you find code like this duplicated in
various uio drivers:
uiomem->memtype = UIO_MEM_PHYS;
uiomem->addr = r->start & PAGE_MASK;
uiomem->offs = r->start & ~PAGE_MASK;
uiomem->size = (uiomem->offs + resource_size(r)
+ PAGE_SIZE - 1) & PAGE_MASK;
IOW, it explicitly aligns the resources to pages, so now mmap works
again. Oh the horror.
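The two calculations quoted above can be tried out in user space with a small sketch. The SK_-prefixed names and the 4 KiB page size are assumptions for illustration only; SK_PAGE_MASK follows the kernel convention of masking off the in-page offset bits.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed for the sketch: 4 KiB pages. */
#define SK_PAGE_SHIFT 12
#define SK_PAGE_SIZE  (UINT64_C(1) << SK_PAGE_SHIFT)
#define SK_PAGE_MASK  (~(SK_PAGE_SIZE - 1))

/* Pages needed to cover [addr, addr + size), tolerating an unaligned addr. */
static uint64_t pages_spanned(uint64_t addr, uint64_t size)
{
	return ((addr & ~SK_PAGE_MASK) + size + SK_PAGE_SIZE - 1)
		>> SK_PAGE_SHIFT;
}

struct sk_region {
	uint64_t addr;	/* page-aligned base */
	uint64_t offs;	/* offset of the resource within the first page */
	uint64_t size;	/* length rounded up to whole pages */
};

/* The driver-side pattern: split the start into base + offset, round up. */
static struct sk_region align_region(uint64_t start, uint64_t len)
{
	struct sk_region r;

	r.addr = start & SK_PAGE_MASK;
	r.offs = start & ~SK_PAGE_MASK;
	r.size = (r.offs + len + SK_PAGE_SIZE - 1) & SK_PAGE_MASK;
	assert((r.addr & ~SK_PAGE_MASK) == 0);	/* base is always aligned */
	return r;
}
```

For a 0x1000-byte resource starting at 0x1800, both forms agree that two pages are involved: the aligned region is base 0x1000, offset 0x800, size 0x2000.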
But yes, that physical part of 'addr' should be a pfn. Sadly, all of
this code is such a mess that it's a horrible job to try to fix it all
up.
So we may be stuck with the horrendous confusion that is the current
uio_mem thing.
Linus
^ permalink raw reply [relevance 99%]
* Re: [WIP 0/3] Memory model and atomic API in Rust
@ 2024-03-27 19:07 89% ` Linus Torvalds
0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-03-27 19:07 UTC (permalink / raw)
To: Kent Overstreet
Cc: comex, Dr. David Alan Gilbert, Philipp Stanner, Boqun Feng,
rust-for-linux, linux-kernel, linux-arch, llvm, Miguel Ojeda,
Alex Gaynor, Wedson Almeida Filho, Gary Guo,
Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
Alan Stern, Andrea Parri, Will Deacon, Peter Zijlstra,
Nicholas Piggin, David Howells, Jade Alglave, Luc Maranget,
Paul E. McKenney, Akira Yokosawa, Daniel Lustig, Joel Fernandes,
Nathan Chancellor, Nick Desaulniers, kent.overstreet,
Greg Kroah-Hartman, Marco Elver, Mark Rutland, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
Catalin Marinas, linux-arm-kernel, linux-fsdevel
On Wed, 27 Mar 2024 at 11:51, Kent Overstreet <kent.overstreet@linux.dev> wrote:
>
> On Wed, Mar 27, 2024 at 09:16:09AM -0700, comex wrote:
> > Meanwhile, Rust intentionally lacks strict aliasing.
>
> I wasn't aware of this. Given that unrestricted pointers are a real
> impediment to compiler optimization, I thought that with Rust we were
> finally starting to nail down a concrete enough memory model to tackle
> this safely. But I guess not?
Strict aliasing is a *horrible* mistake.
It's not even *remotely* "tackle this safely". It's the exact
opposite. It's completely broken.
Anybody who thinks strict aliasing is a good idea either
(a) doesn't understand what it means
(b) has been brainwashed by incompetent compiler people.
It's a horrendous crock that was introduced by people who thought it
was too complicated to write out "restrict" keywords, and that thought
that "let's break old working programs and make it harder to write new
programs" was a good idea.
Nobody should ever do it. The fact that Rust doesn't do the C strict
aliasing is a good thing. Really.
I suspect you have been fooled by the name. Because "strict aliasing"
sounds like a good thing. It sounds like "I know these strictly can't
alias". But despite that name, it's the complete opposite of that, and
means "I will ignore actual real aliasing even if it exists, because I
will make aliasing decisions on entirely made-up grounds".
Just say no to strict aliasing. Thankfully, there's an actual compiler
flag for that: -fno-strict-aliasing. It should absolutely have been
the default.
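A small sketch of what is at stake, with hypothetical function names. Under the C type-based aliasing rules a compiler may assume that a 'uint32_t *' and a 'float *' never refer to the same memory, even when they actually do; the memcpy() form is the usual way to reinterpret bytes that stays well defined with or without -fno-strict-aliasing.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * With strict aliasing enabled, the compiler may assume *i and *f are
 * distinct objects because their types differ, and may return 1 without
 * reloading *i. If the caller really passed aliasing pointers, that
 * assumption is made on type grounds alone, not on what memory is used.
 */
static uint32_t store_then_load(uint32_t *i, float *f)
{
	assert(i && f);
	*i = 1;
	*f = 0.0f;
	return *i;
}

/* Well-defined reinterpretation of the bytes of a float. */
static uint32_t float_bits(float f)
{
	uint32_t u;

	_Static_assert(sizeof(u) == sizeof(f), "float must be 32 bits");
	memcpy(&u, &f, sizeof(u));
	return u;
}
```

The value returned by float_bits(1.0f) is 0x3f800000 on platforms using IEEE 754 single precision, which is an assumption of this sketch rather than a guarantee of the C standard.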
Linus
^ permalink raw reply [relevance 89%]
* Re: [GIT PULL] Char/Misc driver changes for 6.9-rc1
@ 2024-03-27 16:56 97% ` Linus Torvalds
2024-03-27 20:26 99% ` Linus Torvalds
1 sibling, 1 reply; 200+ results
From: Linus Torvalds @ 2024-03-27 16:56 UTC (permalink / raw)
To: Greg KH, Chris Leech, Nilesh Javali, Christoph Hellwig; +Cc: linux-kernel
On Thu, 21 Mar 2024 at 06:02, Greg KH <gregkh@linuxfoundation.org> wrote:
>
> Char/Misc and other driver subsystem updates for 6.9-rc1
[...]
> Chris Leech (4):
> uio: introduce UIO_MEM_DMA_COHERENT type
> cnic,bnx2,bnx2x: use UIO_MEM_DMA_COHERENT
> uio_pruss: UIO_MEM_DMA_COHERENT conversion
> uio_dmem_genirq: UIO_MEM_DMA_COHERENT conversion
So this was all broken, and doesn't even build on 32-bit architectures
with 64-bit physical addresses as reported by at least Guenter.
Notably that includes i386 allmodconfig.
I fixed up the build, but I did it the mindless way. I noted in the
commit message that I think the correct fix is likely to make
'uio_mem.mem' be a union of 'physaddr_t' and 'void *' and just always
use the right member. UIO_MEM_LOGICAL and UIO_MEM_VIRTUAL should
probably use the pointer thing too.
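One possible shape for such a union, purely as a sketch: every name below is hypothetical, uint64_t stands in for a 64-bit physical address type, and this is neither the actual struct uio_mem nor the fix that was committed.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical memory types, loosely mirroring the UIO_MEM_* idea. */
enum sk_memtype { SK_MEM_PHYS, SK_MEM_LOGICAL, SK_MEM_VIRTUAL };

struct sk_uio_mem {
	enum sk_memtype memtype;
	union {
		uint64_t phys;	/* physical address, may exceed pointer width */
		void *virt;	/* kernel-side pointer */
	} addr;
};

/* Each accessor reads only the member that matches the memory type. */
static uint64_t sk_mem_phys(const struct sk_uio_mem *m)
{
	assert(m->memtype == SK_MEM_PHYS);
	return m->addr.phys;
}

static void *sk_mem_virt(const struct sk_uio_mem *m)
{
	assert(m->memtype != SK_MEM_PHYS);
	return m->addr.virt;
}
```

The point of the design is that no cast between a 64-bit physical address and a 32-bit pointer is ever needed, which is where the 32-bit build errors came from.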
I also *suspect* that using 'physaddr_t' is in itself pointless,
because I *think* the physical addresses are always page-aligned
anyway, and it would be better if the uio_mem thing just contained the
pfn instead. Which could just be 'unsigned long pfn'.
So there are proper cleanups that could be done in that area.
That's not what I did, though. I just fixed up the bad casts.
There may be other fixes pending out there, but I didn't want to delay
the 32-bit build fixes any more.
It turns out that the cnic,bnx2,bnx2x conversion avoided the problems,
almost by accident. That driver had used UIO_MEM_LOGICAL before and
had existing casts. That doesn't make it good, but at least it made it
not fail to build.
See commit 498e47cd1d1f ("Fix build errors due to new
UIO_MEM_DMA_COHERENT mess")
Linus
^ permalink raw reply [relevance 97%]
* Re: [WIP 0/3] Memory model and atomic API in Rust
@ 2024-03-26 3:49 76% ` Linus Torvalds
0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-03-26 3:49 UTC (permalink / raw)
To: Dr. David Alan Gilbert
Cc: Kent Overstreet, Philipp Stanner, Boqun Feng, rust-for-linux,
linux-kernel, linux-arch, llvm, Miguel Ojeda, Alex Gaynor,
Wedson Almeida Filho, Gary Guo, Björn Roy Baron,
Benno Lossin, Andreas Hindborg, Alice Ryhl, Alan Stern,
Andrea Parri, Will Deacon, Peter Zijlstra, Nicholas Piggin,
David Howells, Jade Alglave, Luc Maranget, Paul E. McKenney,
Akira Yokosawa, Daniel Lustig, Joel Fernandes, Nathan Chancellor,
Nick Desaulniers, kent.overstreet, Greg Kroah-Hartman, elver,
Mark Rutland, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, x86, H. Peter Anvin, Catalin Marinas,
linux-arm-kernel, linux-fsdevel
On Mon, 25 Mar 2024 at 17:05, Dr. David Alan Gilbert <dave@treblig.org> wrote:
>
> Isn't one of the aims of the Rust/C++ idea that you can't forget to access
> a shared piece of data atomically?
If that is an aim, it's a really *bad* one.
Really.
It very much should never have been an aim, and I hope it wasn't. I
think, and hope, that the source of the C++ and Rust bad decisions is
cluelessness, not active malice.
Take Rust - one big point of Rust is the whole "safe" thing, but it's
very much not a straitjacket like Pascal was. There's a "safe" part
to Rust, but equally importantly, there's also the "unsafe" part to
Rust.
The safe part is the one that most programmers are supposed to use.
It's the one that allows you to not have to worry too much about
things. It's the part that makes it much harder to screw up.
But the *unsafe* part is what makes Rust powerful. It's the part that
works behind the curtain. It's the part that may be needed to make the
safe parts *work*.
And yes, an application programmer might never actually need to use
it, and in fact in many projects the rule might be that unsafe Rust is
simply never even an option - but that doesn't mean that the unsafe
parts don't exist.
Because those unsafe parts are needed to make it all work in reality.
And you should never EVER base your whole design around the "safe"
part. Then you get a language that is a straitjacket.
So I'd very strongly argue that the core atomics should be done the
"unsafe" way - allow people to specify exactly when they want exactly
what access. Allow people to mix and match and have overlapping
partial aliases, because if you implement things like locking, you
*need* those partially aliasing accesses, and you need to make
overlapping atomics where sometimes you access only one part of the
field.
And yes, that will be unsafe, and it might even be unportable, but
it's exactly the kind of thing you need in order to avoid having to
use assembly language to do your locking.
And by all means, you should relegate that to the "unsafe corner" of
the language. And maybe don't talk about the unsafe sharp edges in the
first chapter of the book about the language.
But you should _start_ the design of your language memory model around
the unsafe "raw atomic access operations" model.
Then you can use those strictly more powerful operations, and you
create an object model *around* it.
So you create safe objects like just an atomic counter. In *that*
corner of the language, you have the "safe atomics" - they aren't the
fundamental implementation, but they are the safe wrappers *around*
the more powerful (but unsafe) core.
With that "atomic counter" you cannot forget to do atomic accesses,
because that safe corner of the world doesn't _have_ anything but the
safe atomic accesses for every time you use the object.
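The layering can be sketched in C, assuming the GCC/Clang __atomic builtins; the sk_ names are hypothetical. The macros are the raw per-access core, and the counter is a narrow wrapper built on top of it.

```c
#include <assert.h>
#include <stddef.h>

/* Raw core: the caller decides which accesses are atomic, and how. */
#define sk_atomic_load(p)	__atomic_load_n((p), __ATOMIC_RELAXED)
#define sk_atomic_add(p, v)	__atomic_add_fetch((p), (v), __ATOMIC_RELAXED)

/* Wrapper: users are expected to go through the two functions below. */
struct sk_counter {
	long v;
};

static long sk_counter_inc(struct sk_counter *c)
{
	assert(c != NULL);
	return sk_atomic_add(&c->v, 1);
}

static long sk_counter_read(struct sk_counter *c)
{
	assert(c != NULL);
	return sk_atomic_load(&c->v);
}
```

C cannot actually prevent someone from touching the field directly unless the struct is made opaque, so here the "safe corner" is a convention; a language with real encapsulation can enforce it.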
See? Having the capability to do powerful and maybe unsafe things does
not force people to expose and use all that power. You can - and
should - wrap the powerful model with safer and simpler interfaces.
This isn't something specific to atomics. Not even remotely. This is
quite fundamental. You often literally _cannot_ do interesting things
using only safe interfaces. You want safe memory allocations - but to
actually write the allocator itself, you want to have all those unsafe
escape methods - all those raw pointers with arbitrary arithmetic etc.
And if you don't have unsafe escapes, you end up doing what so many
languages did: the libraries are written in something more powerful
like C, because C literally can do things that other languages
*cannot* do.
Don't let people fool you with talk about Turing machines and similar
smoke-and-mirror garbage. It's a bedtime story for first-year CS
students. It's not true.
Not all languages are created equal. Not all languages can do the same
things. If your language doesn't have those unsafe escapes, your
language is inherently weaker, and inherently worse for it.
Linus
^ permalink raw reply [relevance 76%]
* Re: [WIP 0/3] Memory model and atomic API in Rust
@ 2024-03-25 19:44 84% ` Linus Torvalds
0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-03-25 19:44 UTC (permalink / raw)
To: Kent Overstreet
Cc: Philipp Stanner, Boqun Feng, rust-for-linux, linux-kernel,
linux-arch, llvm, Miguel Ojeda, Alex Gaynor,
Wedson Almeida Filho, Gary Guo, Björn Roy Baron,
Benno Lossin, Andreas Hindborg, Alice Ryhl, Alan Stern,
Andrea Parri, Will Deacon, Peter Zijlstra, Nicholas Piggin,
David Howells, Jade Alglave, Luc Maranget, Paul E. McKenney,
Akira Yokosawa, Daniel Lustig, Joel Fernandes, Nathan Chancellor,
Nick Desaulniers, kent.overstreet, Greg Kroah-Hartman, elver,
Mark Rutland, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, x86, H. Peter Anvin, Catalin Marinas,
linux-arm-kernel, linux-fsdevel
On Mon, 25 Mar 2024 at 11:59, Kent Overstreet <kent.overstreet@linux.dev> wrote:
>
> To be fair, "volatile" dates from an era when we didn't have the haziest
> understanding of what a working memory model for C would look like or
> why we'd even want one.
I don't disagree, but I find it very depressing that now that we *do*
know about memory models etc, the C++ memory model basically doubled
down on the same "object" model.
> The way the kernel uses volatile in e.g. READ_ONCE() is fully in line
> with modern thinking, just done with the tools available at the time. A
> more modern version would be just
>
> __atomic_load_n(ptr, __ATOMIC_RELAXED)
Yes. Again, that's the *right* model in many ways, where you mark the
*access*, not the variable. You make it completely and utterly clear
that this is a very explicit access to memory.
But that's not what C++ actually did. They went down the same old
"volatile object" road, and instead of marking the access, they mark
the object, and the way you do the above is
std::atomic_int value;
and then you just access 'value' and magic happens.
EXACTLY the same way that
volatile int value;
works, in other words. With exactly the same downsides.
And yes, I think that model is a nice shorthand. But it should be a
*shorthand*, not the basis of the model.
I do find it annoying, because the C++ people literally started out
with shorthands. The whole "pass by reference" is literally nothing
but a shorthand for pointers (ooh, scary scary pointers), where the
address-of is implied at the call site, and the 'dereference'
operation is implied at use.
So it's not that shorthands are wrong. And it's not that C++ isn't
already very fundamentally used to them. But despite that, the C++
memory model is very much designed around the broken object model, and
as already shown in this thread, it causes actual immediate problems.
And it's not just C++. Rust is supposed to be the really modern thing.
And it made the *SAME* fundamental design mistake.
IOW, the whole access size problem that Boqun described is
*inherently* tied to the fact that the C++ and Rust memory model is
badly designed from the wrong principles.
Instead of designing it as a "this is an atomic object that you can do
these operations on", it should have been "this is an atomic access,
and you can use this simple object model to have the compiler generate
the accesses for you".
This is why I claim that LKMM is fundamentally better. It didn't start
out from a bass-ackwards starting point of marking objects "atomic".
And yes, the LKMM is a bit awkward, because we don't have the
shorthands, so you have to write out "atomic_read()" and friends.
Tough. It's better to be correct than to be simple.
Linus
^ permalink raw reply [relevance 84%]
* Re: [WIP 0/3] Memory model and atomic API in Rust
@ 2024-03-25 17:44 80% ` Linus Torvalds
0 siblings, 2 replies; 200+ results
From: Linus Torvalds @ 2024-03-25 17:44 UTC (permalink / raw)
To: Philipp Stanner
Cc: Kent Overstreet, Boqun Feng, rust-for-linux, linux-kernel,
linux-arch, llvm, Miguel Ojeda, Alex Gaynor,
Wedson Almeida Filho, Gary Guo, Björn Roy Baron,
Benno Lossin, Andreas Hindborg, Alice Ryhl, Alan Stern,
Andrea Parri, Will Deacon, Peter Zijlstra, Nicholas Piggin,
David Howells, Jade Alglave, Luc Maranget, Paul E. McKenney,
Akira Yokosawa, Daniel Lustig, Joel Fernandes, Nathan Chancellor,
Nick Desaulniers, kent.overstreet, Greg Kroah-Hartman, elver,
Mark Rutland, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, x86, H. Peter Anvin, Catalin Marinas,
linux-arm-kernel, linux-fsdevel
On Mon, 25 Mar 2024 at 06:57, Philipp Stanner <pstanner@redhat.com> wrote:
>
> On Fri, 2024-03-22 at 17:36 -0700, Linus Torvalds wrote:
> >
> > It's kind of like our "volatile" usage. If you read the C (and C++)
> > standards, you'll find that you should use "volatile" on data types.
> > That's almost *never* what the kernel does. The kernel uses
> > "volatile"
> > in _code_ (ie READ_ONCE() etc), and uses it by casting etc.
> >
> > Compiler people don't tend to really like those kinds of things.
>
> Just for my understanding: Why don't they like it?
So I actually think most compiler people are perfectly fine with the
kernel model of mostly doing 'volatile' not on the data structures
themselves, but as accesses through casts.
It's very traditional C, and there's actually nothing particularly odd
about it. Not even from a compiler standpoint.
In fact, I personally will argue that it is fundamentally wrong to
think that the underlying data has to be volatile. A variable may be
entirely stable in some cases (ie locks held), but not in others.
So it's not the *variable* (aka "object") that is 'volatile', it's the
*context* that makes a particular access volatile.
That explains why the kernel has basically zero actual volatile
objects, and 99% of all volatile accesses are done through accessor
functions that use a cast to mark a particular access volatile.
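As a rough illustration of "volatile on the access, not on the object",
here is a minimal userspace sketch. It is not the kernel's READ_ONCE();
the macro, struct and function names are hypothetical, and __typeof__
is a GNU C extension (gcc and clang both accept it).

```c
/*
 * Hypothetical names throughout; this is not the kernel's READ_ONCE().
 * The object itself is a plain int. Only the access is marked volatile,
 * by casting the address at the point of use.
 */
#define READ_ONCE_SKETCH(x) (*(const volatile __typeof__(x) *)&(x))

struct counter_sketch {
	int value;	/* deliberately NOT declared volatile */
};

/* Caller holds whatever lock protects c->value: a plain load is fine. */
int counter_read_locked(struct counter_sketch *c)
{
	return c->value;
}

/* Lockless caller: same field, but this one access is made volatile. */
int counter_read_lockless(struct counter_sketch *c)
{
	return READ_ONCE_SKETCH(c->value);
}
```

The same field is read both ways; what differs is the context of the
access, which is the point being made above.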
But I've had negative comments from compiler people who read the
standards as language lawyers (which honestly, I despise - it's always
possible to try to argue what the meaning of some wording is), and
particularly C++ people used to be very very antsy about "volatile".
They had some truly _serious_ problems with volatile.
The C++ people spent absolutely insane amounts of time arguing about
"volatile objects" vs "accesses", and how an access through a cast
didn't make the underlying object volatile etc.
There were endless discussions because a lvalue isn't supposed to be
an access (an lvalue is something that is being acted on, and it
shouldn't imply an access because an access will then cause other
things in C++). So a statement expression that was just an lvalue
shouldn't imply an access in C++ originally, but obviously when the
thing was volatile it *had* to do so, and there was gnashing of teeth
over this all.
And all of it was purely semantic nitpicking about random wording. The
C++ people finally tried to save face by claiming that it was always
the C (not C++) rules that were unclear, and introduced the notion of
"glvalue", and it's all good now, but there's literally decades of
language lawyering and pointless nitpicking about the difference
between "objects" and "accesses".
Sane people didn't care, but if you reported a compiler bug about
volatile use, you had better be ready to sit back and be flamed for
how your volatile pointer cast wasn't an "object" and that the
compiler that clearly generated wrong code was technically correct,
and that your mother was a hamster.
It's a bit like the NULL debacle. Another thing that took the C++
people a couple of decades to admit they were wrong all along, and
that NULL isn't actually 'integer zero' in any sane language that
claims to care deeply about types.
[ And again, to save face, at no point did they say "ok, '(void *)0'
is fine" - they introduced a new 'nullptr' thing just so that they
wouldn't have to admit that their decades of arguing was just them
being wrong. You'll find another decade of arguments explaining the
finer details about _that_ difference ]
It turns out that the people who are language-lawyering nitpickers are
then happy to be "proven right" by adding some more pointless
syntactic language-lawyering language.
Which I guess makes sense, but to the rest of us it all looks a bit pointless.
Linus
^ permalink raw reply [relevance 80%]
* Linux 6.9-rc1
@ 2024-03-24 21:56 72% Linus Torvalds
0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-03-24 21:56 UTC (permalink / raw)
To: Linux Kernel Mailing List
So two weeks have passed, the merge window is over, and v6.9-rc1 is
tagged and pushed out.
This merge window looks to be fairly normal. If you look at the diffs,
you'd think that the bulk of all the changes are AMD GPU header files
again, and you'd not be entirely wrong. About 40% of the whole 6.9-rc1
patch is indeed just the auto-generated AMD GPU definitions. I wish
this was unusual, but it's a pattern.
Anyway, while that is a lot of the actual changes by pure line
numbers, it's all just basically noise and not meaningful in the big
picture.
In contrast, what _is_ meaningful is a couple of very core updates.
The timer subsystem had a fairly big rewrite, to have per-cpu timer
wheels to improve performance of timers, which can be a big deal
particularly for networking. The other fairly notable core update is
to the workqueue subsystem, where one notable addition is for BH
workqueue support. That's notable mainly because it means we finally
have a way away from tasklets. The tasklet interface has basically
been deprecated for a long while, but we've never really had any good
alternatives (with threaded interrupt handlers being one suggested
use-case, but not realistic in many cases).
The core updates should be entirely invisible to users, as they don't
involve any semantic changes, just expanded capabilities. Of course,
being somewhat big changes, they did cause a few issues, but we've
hopefully already caught all the big deals.
Anyway, there's obviously also all the usual updates, and even when
you ignore the recurring AMD header drop, more than half of the actual
patch is - as usual - various driver updates all over. And all the
other usual suspects: architecture updates, various filesystems (old
ntfs core removal might be worth noting), core networking, VM and
kernel. And tooling and documentation.
Please commence testing,
Linus
---
Alex Williamson (1):
VFIO updates
Alexandre Belloni (2):
i3c updates
RTC updates
Amir Goldstein (1):
overlayfs fixes
Andreas Larsson (1):
sparc updates
Andrew Morton (2):
MM updates
non-MM updates
Andy Shevchenko (1):
auxdisplay updates
Ard Biesheuvel (3):
EFI updates
EFI fix
EFI fixes
Arnd Bergmann (6):
SoC device tree updates
ARM SoC driver updates
ARM SoC code updates
ARM defconfig updates
asm-generic updates
more ARM SoC updates
Bartosz Golaszewski (1):
gpio updates
Bjorn Andersson (3):
remoteproc updates
rpmsg updates
hwspinlock updates
Bjorn Helgaas (1):
PCI updates
Boqun Feng (1):
RCU updates
Borislav Petkov (7):
RAS fixlet
x86 cpu update
x86 MTRR update
resource control updates
x86 SEV updates
misc x86 fixes
EDAC updates
Casey Schaufler (1):
smack updates
Catalin Marinas (2):
arm64 updates
arm64 fixes
Chandan Babu (2):
xfs updates
xfs fixes
Christian Brauner (8):
misc vfs updates
ntfs update
iomap updates
pidfd updates
file locking updates
block handle updates
vfs uuid updates
vfs fixes
Christoph Hellwig (2):
dma-mapping updates
dma-mapping fixes
Chuck Lever (1):
nfsd updates
Damien Le Moal (1):
zonefs update
Dan Williams (1):
CXL updates
Dave Airlie (2):
drm updates
drm fixes
Dave Hansen (4):
x86 mm updates
x86 tdx update
x86 RFDS mitigation
x86 APIC fixup
Dave Jiang (1):
libnvdimm updates
David Sterba (3):
btrfs updates
affs update
btrfs fix
David Teigland (1):
dlm updates
Dmitry Torokhov (1):
input updates
Dominik Brodowski (1):
PCMCIA updates
Eric Biggers (2):
fscrypt updates
fsverity update
Eric Van Hensbergen (1):
9p updates
Gao Xiang (1):
erofs updates
Geert Uytterhoeven (1):
m68k updates
Greg KH (5):
USB / Thunderbolt updates
tty / serial driver updates
staging driver updates
char/misc and other driver subsystem updates
driver core updates
Guenter Roeck (1):
hwmon updates
Heiko Carstens (2):
s390 updates
more s390 updates
Helge Deller (2):
parisc architecture updates and fixes
fbdev updates
Herbert Xu (1):
crypto updates
Huacai Chen (1):
LoongArch updates
Ilpo Järvinen (1):
x86 platform driver updates
Ilya Dryomov (1):
ceph updates
Ingo Molnar (10):
locking updates
scheduler updates
x86 asm updates
x86 build updates
x86 cleanups
core x86 updates
x86 boot updates
x86 perf event fixes
timer fix
irq fix
Jaegeuk Kim (1):
f2fs update
Jakub Kicinski (2):
networking updates
networking fixes
James Bottomley (2):
SCSI updates
more SCSI updates
Jan Kara (2):
fsnotify updates
ext2, isofs, udf, and quota updates
Jarkko Sakkinen (1):
tpm updates
Jason Gunthorpe (1):
rdma updates
Jassi Brar (1):
mailbox updates
Jens Axboe (5):
io_uring updates
block updates
block fixes
more io_uring updates
more block updates
Jiri Kosina (1):
HID updates
Joel Granados (1):
sysctl updates
Joerg Roedel (1):
iommu updates
John Paul Adrian Glaubitz (1):
sh updates
Jonathan Corbet (2):
documentation updates
more documentation updates
Juergen Gross (1):
xen updates
Julia Lawall (1):
coccinelle update
Kees Cook (5):
pstore updates
execve updates
hardening updates
seccomp updates
more hardening updates
Kent Overstreet (2):
bcachefs updates
bcachefs fixes
Lee Jones (3):
MFD updates
backlight updates
LED updates
Linus Walleij (1):
pin control updates
Luis Chamberlain (1):
modules updates
Mark Brown (5):
regmap updates
regulator updates
spi updates
regulator fix
spi fixes
Masahiro Yamada (1):
Kbuild updates
Masami Hiramatsu (1):
probes updates
Mauro Carvalho Chehab (1):
media updates
Michael Ellerman (2):
powerpc updates
more powerpc updates
Michael Tsirkin (1):
virtio updates
Mickaël Salaün (1):
landlock updates
Miguel Ojeda (2):
compiler attributes update
Rust updates
Mike Marshall (1):
orangefs updates
Mike Snitzer (4):
device mapper updates
device mapper BH workqueue conversion
device mapper VDO target
device mapper fixes
Miklos Szeredi (1):
fuse updates
Miquel Raynal (1):
MTD updates
Namhyung Kim (1):
perf tools updates
Namjae Jeon (1):
exfat updates
Niklas Cassel (2):
ata updates
ata fix
Palmer Dabbelt (1):
RISC-V updates
Paolo Bonzini (1):
kvm updates
Paul Moore (4):
audit updates
selinux updates
lsm updates
lsm fixes
Petr Mladek (1):
printk updates
Rafael Wysocki (6):
power management updates
ACPI updates
thermal control updates
more thermal control updates
more ACPI updates
more power management updates
Richard Weinberger (1):
UBI and UBIFS updates
Rob Herring (1):
devicetree updates
Russell King (1):
ARM updates
Sebastian Reichel (2):
HSI updates
power supply and reset updates
Shuah Khan (2):
kselftest update
KUnit updates
Stafford Horne (1):
OpenRISC updates
Stephen Boyd (1):
clk updates
Steve French (3):
smb client updates
smb server updates
smb client fixes
Steven Rostedt (4):
tracing updates
tracing updates
ktest updates
trace tool updates
Takashi Iwai (3):
sound updates
sound fixes
more sound fixes
Takashi Sakamoto (1):
firewire updates
Ted Ts'o (1):
ext4 updates
Tejun Heo (3):
workqueue updates
workqueue BH conversions
cgroup updates
Thomas Bogendoerfer (1):
MIPS updates
Thomas Gleixner (14):
irq updates
MSI updates
cpu core updates
clocksource updates
timer updates
x86 APIC updates
x86 FRED support
x86 entry update
core entry fix
irq fixes
more clocksource updates
timer fixes
scheduler doc clarification
x86 fixes
Trond Myklebust (1):
NFS client updates
Tzung-Bi Shih (1):
chrome platform firmware updates
Ulf Hansson (2):
MMC updates
pmdomain updates
Uwe Kleine-König (2):
pwm updates
siox updates
Vinod Koul (3):
soundwire updates
dmaengine updates
phy updates
Vlastimil Babka (1):
slab updates
Wei Liu (1):
hyperv updates
Wim Van Sebroeck (1):
watchdog updates
Wolfram Sang (2):
i2c updates
more i2c updates
Yury Norov (1):
bitmap updates
^ permalink raw reply [relevance 72%]
* Re: [PATCH RFC 4/4] UNFINISHED mm, fs: use kmem_cache_charge() in path_openat()
@ 2024-03-24 17:44 94% ` Linus Torvalds
0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-03-24 17:44 UTC (permalink / raw)
To: Al Viro
Cc: Vlastimil Babka, Josh Poimboeuf, Jeff Layton, Chuck Lever,
Kees Cook, Christoph Lameter, Pekka Enberg, David Rientjes,
Joonsoo Kim, Andrew Morton, Roman Gushchin, Hyeonggon Yoo,
Johannes Weiner, Michal Hocko, Shakeel Butt, Muchun Song,
Christian Brauner, Jan Kara, linux-mm, linux-kernel, cgroups,
linux-fsdevel
[ Al, I hope your email works now ]
On Sat, 23 Mar 2024 at 19:27, Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> We can have the same file occurring in many slots of many descriptor tables,
> obviously. So it would have to be a flag (in ->f_mode?) set by it, for
> "someone's already charged for it", or you'll end up with really insane
> crap on each fork(), dup(), etc.
Nope.
That flag already exists in the slab code itself with this patch. The
kmem_cache_charge() thing itself just sets the "I'm charged" bit in
the slab header, and you're done. Any subsequent fd_install (with dup,
or fork or whatever) simply is irrelevant.
In fact, dup and fork and friends won't need to worry about this,
because they only work on files that have already been installed, so
they know the file is already accounted.
So it's only the initial open() case that needs to do the
kmem_cache_charge() as it does the fd_install.
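To make the "charge once, at the first install" idea concrete, here is
a toy userspace model. It is only a sketch under stated assumptions:
every name below is hypothetical, and the real "I'm charged" bit lives
in the slab code as described above, not in the file object.

```c
#include <stdbool.h>

/* Hypothetical stand-ins, not kernel types. */
struct fake_memcg {
	unsigned long charged_objects;
};

struct fake_file {
	bool charged;		/* models the "I'm charged" bit */
	unsigned int installs;	/* how many descriptor slots refer to it */
};

/* Idempotent: a second call on an already charged object does nothing. */
void fake_charge(struct fake_file *f, struct fake_memcg *cg)
{
	if (f->charged)
		return;
	f->charged = true;
	cg->charged_objects++;
}

/* The initial open() path: charge, then install. */
void fake_open_install(struct fake_file *f, struct fake_memcg *cg)
{
	fake_charge(f, cg);
	f->installs++;
}

/* dup()/fork() paths: the file is already installed, hence accounted. */
void fake_dup(struct fake_file *f)
{
	f->installs++;
}
```

However many times the file is duplicated afterwards, the accounting
happens exactly once.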
> But there's also MAP_ANON with its setup_shmem_file(), with the resulting
> file not going into descriptor tables at all, and that's not a rare thing.
Just making alloc_file_pseudo() do a SLAB_ACCOUNT should take care of
all the normal cases.
For once, the core allocator is not exposed very much, so we can
literally just look at "who does alloc_file*()" and it turns out it's
all pretty well abstracted out.
So I think it's mainly the three cases of 'alloc_empty_file()' that
would be affected and need to check that they actually do the
fd_install() (or release it).
Everything else should either not account at all (if they know they
are doing temporary kernel things), or always account (eg
alloc_file_pseudo()).
Linus
^ permalink raw reply [relevance 94%]
* Re: [PATCH v4 00/16] x86-64: Stack protector and percpu improvements
2024-03-23 16:16 94% ` Linus Torvalds
@ 2024-03-23 17:06 96% ` Linus Torvalds
0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-03-23 17:06 UTC (permalink / raw)
To: Brian Gerst, Arnd Bergmann
Cc: Uros Bizjak, linux-kernel, x86, Ingo Molnar, Thomas Gleixner,
Borislav Petkov, H . Peter Anvin, David.Laight
On Sat, 23 Mar 2024 at 09:16, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> And we might as well also do the semi-yearly compiler version review.
> We raised the minimum to 4.9 almost four years ago, and then the jump
> to 5.1 was first for arm64 due to a serious gcc code generation bug
> and then globally in Sept 2021.
Looking at RHEL, I find a page that claims
RHEL9 : gcc 11.x in app stream
RHEL8 : gcc 8.x or gcc 9.x in app stream.
RHEL7 : gcc 4.8.x
so RHEL7 is already immaterial from a kernel compiler standpoint, and
so it looks like at least as far as RHEL is concerned, we could just
jump to gcc 8.1 as a minimum.
RHEL also has a "Developer Toolset" that allows you to pick a compiler
upgrade, so it's not *quite* as black-and-white as that, but it does
seem like we could at some point just pick gcc-8 as a new minimum with
very little pain on that front.
The SLES situation seems somewhat similar, with SLES12 being 4.8.x and
SLES15 being 7.3. But again with a "Development Tools Module" setup.
So that *might* argue for 7.3.
I can't make sense of Debian releases. There's "stable" (bookworm)
that comes with gcc-12.2, but there's oldstable, oldoldstable, and
various "archived" releases still under LTS. I can't even begin to
guess what may be relevant.
I don't think we care that deeply on the kernel side, other than a
"maybe we should be a bit more proactive about raising gcc version
requirements". I don't think we have any huge issues right now with
old gcc versions.
Linus
^ permalink raw reply [relevance 96%]
* Re: [PATCH v4 00/16] x86-64: Stack protector and percpu improvements
@ 2024-03-23 16:16 94% ` Linus Torvalds
2024-03-23 17:06 96% ` Linus Torvalds
0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-03-23 16:16 UTC (permalink / raw)
To: Brian Gerst, Arnd Bergmann
Cc: Uros Bizjak, linux-kernel, x86, Ingo Molnar, Thomas Gleixner,
Borislav Petkov, H . Peter Anvin, David.Laight
On Sat, 23 Mar 2024 at 06:23, Brian Gerst <brgerst@gmail.com> wrote:
>
> One small issue is that Kconfig would silently disable stackprotector
> if the compiler doesn't support the new options. That said, the
> number of people that this would affect is very small, as just about
> any modern distribution ships a compiler newer than 8.1.
Yes, let's make the rule be that you can still compile the kernel with
gcc-5.1+, but you can't get stackprotector support unless you have
gcc-8.1+.
I'd hate to add the objtool support for an old compiler - this is a
hardening feature, not a core feature, and anybody who insists on old
compilers just won't get it.
And we have other cases like this where various debug features depend
on the gcc version, eg
config CC_HAS_WORKING_NOSANITIZE_ADDRESS
def_bool !CC_IS_GCC || GCC_VERSION >= 80300
so we could easily do the same for stack protector support.
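Thresholds like 80300 are easier to read once the encoding is spelled
out. The sketch below assumes the usual major * 10000 + minor * 100 +
patchlevel scheme; the function name is made up for illustration.

```c
/*
 * Hypothetical helper showing how a "GCC_VERSION >= 80300" style
 * threshold is built, assuming major*10000 + minor*100 + patchlevel.
 */
int gcc_version_code(int major, int minor, int patchlevel)
{
	return major * 10000 + minor * 100 + patchlevel;
}
```

Under that assumption gcc 8.3.0 encodes as 80300, gcc 8.1.0 as 80100
and gcc 5.1.0 as 50100, so a stack protector check against 8.1 would
compare against 80100.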
And we might as well also do the semi-yearly compiler version review.
We raised the minimum to 4.9 almost four years ago, and then the jump
to 5.1 was first for arm64 due to a serious gcc code generation bug
and then globally in Sept 2021.
So it's probably time to think about that anyway,
That said, we don't actually have all that many gcc version checks
around any more, so I think the jump to 5.1 got rid of the worst of
the horrors. Most of the GCC_VERSION checks are either in gcc-plugins
(which we should just remove, imnsho - not the version checks, the
plugins entirely), or for various random minor details (warning
enablement and the asm goto workaround).
So there doesn't seem to be a major reason to up the versioning, since
the stack protector thing can just be disabled for older versions.
But maybe even enterprise distros have upgraded anyway, and we should
be proactive.
Cc'ing Arnd, who has historically been one of the people pushing this.
He may no longer care because we haven't had huge issues.
Linus
^ permalink raw reply [relevance 94%]
* Re: [WIP 0/3] Memory model and atomic API in Rust
@ 2024-03-23 0:36 84% ` Linus Torvalds
0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-03-23 0:36 UTC (permalink / raw)
To: Kent Overstreet
Cc: Boqun Feng, rust-for-linux, linux-kernel, linux-arch, llvm,
Miguel Ojeda, Alex Gaynor, Wedson Almeida Filho, Gary Guo,
Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
Alan Stern, Andrea Parri, Will Deacon, Peter Zijlstra,
Nicholas Piggin, David Howells, Jade Alglave, Luc Maranget,
Paul E. McKenney, Akira Yokosawa, Daniel Lustig, Joel Fernandes,
Nathan Chancellor, Nick Desaulniers, kent.overstreet,
Greg Kroah-Hartman, elver, Mark Rutland, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
Catalin Marinas, linux-arm-kernel, linux-fsdevel
On Fri, 22 Mar 2024 at 17:21, Kent Overstreet <kent.overstreet@linux.dev> wrote:
>
> Besides that there's cross arch support to think about - it's hard to
> imagine us ever ditching our own atomics.
Well, that's one of the advantages of using compiler builtins -
projects that do want cross-architecture support, but that aren't
actually maintaining their _own_ architecture support.
So I very much see the lure of compiler support for that kind of
situation - to write portable code without having to know or care
about architecture details.
This is one reason I think the kernel is kind of odd and special -
because in the kernel, we obviously very fundamentally have to care
about the architecture details _anyway_, so then having the
architecture also define things like atomics is just a pretty small
(and relatively straightforward) detail.
The same argument goes for compiler builtins vs inline asm. In the
kernel, we have to have people who are intimately familiar with the
architecture _anyway_, so inline asms and architecture-specific header
files aren't some big pain-point: they'd be needed _anyway_.
But in some random user level program, where all you want is an
efficient way to do "find first bit"? Then using a compiler intrinsic
makes a lot more sense.
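For that user level case, a minimal sketch using the gcc/clang
__builtin_ctzl() intrinsic could look like this. The wrapper name and
its "-1 for no bits set" convention are assumptions made here, not an
existing API.

```c
/*
 * Hypothetical wrapper: index of the lowest set bit, or -1 if none.
 * __builtin_ctzl() is undefined for a zero argument, so that case is
 * handled explicitly.
 */
int first_set_bit(unsigned long word)
{
	if (word == 0)
		return -1;
	return __builtin_ctzl(word);
}
```

The compiler picks whatever instruction the target offers, so the
caller never needs to know the architecture.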
> I was thinking about something more incremental - just an optional mode
> where our atomics were C atomics underneath. It'd probably give the
> compiler people a much more effective way to test their stuff than
> anything they have now.
I suspect it might be painful, and some compiler people would throw
their hands up in horror, because the C++ atomics model is based
fairly solidly on atomic types, and the kernel memory model is much
more fluid.
Boqun already mentioned the "mixing access sizes", which is actually
quite fundamental in the kernel, where we play lots of games with that
(typically around locking, where you find patterns like unlock writing
a zero to a single byte, even though the whole lock data structure is
a word). And sometimes the access size games are very explicit (eg
lib/lockref.c).
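A toy version of that "byte store into a word-sized lock" pattern, as a
sketch only: the layout and names are hypothetical, and it uses the
gcc/clang __atomic builtins as a userspace stand-in for the kernel's
own primitives.

```c
#include <stdint.h>

/*
 * Hypothetical layout, not any real kernel lock: byte 0 is the
 * "locked" flag, the remaining bytes stand in for other state.
 */
union word_lock_sketch {
	uint32_t whole;
	uint8_t bytes[4];
};

/* Acquire: byte-sized atomic exchange on the flag only. */
int word_lock_try(union word_lock_sketch *l)
{
	return __atomic_exchange_n(&l->bytes[0], 1, __ATOMIC_ACQUIRE) == 0;
}

/* Release: store a single zero byte, leaving the other bytes alone. */
void word_lock_release(union word_lock_sketch *l)
{
	__atomic_store_n(&l->bytes[0], 0, __ATOMIC_RELEASE);
}

/* Elsewhere the very same memory is read as one full word. */
uint32_t word_lock_snapshot(union word_lock_sketch *l)
{
	return __atomic_load_n(&l->whole, __ATOMIC_RELAXED);
}
```

The same location is accessed with two different sizes, which is what
a model built around fixed atomic object types has trouble expressing.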
But it actually goes deeper than that. While we do have "atomic_t" etc
for arithmetic atomics, and that probably would map fairly well to C++
atomics, in other cases we simply base our atomics not on _types_, but
on code.
IOW, we do things like "cmpxchg()", and the target of that atomic
access is just a regular data structure field.
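A userspace sketch of that style, with the atomicity attached to the
operation and not to the type. The struct and function names are
hypothetical, and the gcc/clang __atomic_compare_exchange_n() builtin
stands in for the kernel's cmpxchg().

```c
#include <stddef.h>

struct node_sketch {
	struct node_sketch *next;
	int data;
};

struct stack_sketch {
	struct node_sketch *head;	/* plain pointer, no atomic type */
};

/* Lockless push: the compare-and-exchange is what makes it atomic. */
void stack_push(struct stack_sketch *s, struct node_sketch *n)
{
	struct node_sketch *old = __atomic_load_n(&s->head, __ATOMIC_RELAXED);

	do {
		n->next = old;
		/* On failure the builtin reloads the current head into 'old'. */
	} while (!__atomic_compare_exchange_n(&s->head, &old, n,
					      0 /* strong */,
					      __ATOMIC_RELEASE,
					      __ATOMIC_RELAXED));
}
```

Code that owns the structure exclusively (say, during setup) can still
touch s->head with ordinary loads and stores.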
It's kind of like our "volatile" usage. If you read the C (and C++)
standards, you'll find that you should use "volatile" on data types.
That's almost *never* what the kernel does. The kernel uses "volatile"
in _code_ (ie READ_ONCE() etc), and uses it by casting etc.
Compiler people don't tend to really like those kinds of things.
Linus
^ permalink raw reply [relevance 84%]
* Re: [WIP 0/3] Memory model and atomic API in Rust
@ 2024-03-23 0:12 88% ` Linus Torvalds
0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-03-23 0:12 UTC (permalink / raw)
To: Kent Overstreet
Cc: Boqun Feng, rust-for-linux, linux-kernel, linux-arch, llvm,
Miguel Ojeda, Alex Gaynor, Wedson Almeida Filho, Gary Guo,
Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
Alan Stern, Andrea Parri, Will Deacon, Peter Zijlstra,
Nicholas Piggin, David Howells, Jade Alglave, Luc Maranget,
Paul E. McKenney, Akira Yokosawa, Daniel Lustig, Joel Fernandes,
Nathan Chancellor, Nick Desaulniers, kent.overstreet,
Greg Kroah-Hartman, elver, Mark Rutland, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
Catalin Marinas, linux-arm-kernel, linux-fsdevel
On Fri, 22 Mar 2024 at 16:57, Kent Overstreet <kent.overstreet@linux.dev> wrote:
>
> I wonder about that. The disadvantage of only supporting LKMM atomics is
> that we'll be incompatible with third party code, and we don't want to
> be rolling all of our own data structures forever.
Honestly, having seen the shit-show that is language standards bodies
and incomplete compiler support, I do not understand why people think
that we wouldn't want to roll our own.
The C++ memory model may be reliable in another decade. And then a
decade after *that*, we can drop support for the pre-reliable
compilers.
People who think that compilers do things right just because they are
automated simply don't know what they are talking about.
It was just a couple of days ago that I was pointed at
https://github.com/llvm/llvm-project/issues/64188
which is literally the compiler completely missing a C++ memory barrier.
And when the compiler itself is fundamentally buggy, you're kind of
screwed. When you roll your own, you can work around the bugs in
compilers.
And this is all doubly true when it is something that the kernel does,
and very few other projects do. For example, we're often better off
using inline asm over dubious builtins that have "native" compiler
support for them, but little actual real coverage. It really is often
a "ok, this builtin has actually been used for a decade, so it's
hopefully stable now".
We have years of examples of builtins either being completely broken
(as in "immediate crash" broken), or simply generating crap code that
is actively worse than using the inline asm.
The memory ordering isn't going to be at all different. Moving it into
the compiler doesn't solve problems. It creates them.
Linus
^ permalink raw reply [relevance 88%]
* Re: [GIT PULL] Hyper-V commits for 6.9
@ 2024-03-22 23:42 99% ` Linus Torvalds
0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-03-22 23:42 UTC (permalink / raw)
To: Wei Liu; +Cc: Linux Kernel List, Linux on Hyper-V List, kys, haiyangz, decui
On Fri, 22 Mar 2024 at 16:25, Wei Liu <wei.liu@kernel.org> wrote:
>
> Hmm... I thought I refreshed it right before the expiration date. I
> pushed it to Ubuntu's keyserver.
Ok, I can find it there.
> I will check if something's wrong.
>
> Do you have a keyserver that you prefer?
The problem with keyservers is that there's so many of them, and
everybody uses different keyservers, and the propagation of pgp keys
across keyservers hasn't really worked for over a decade by now.
Maybe keys eventually propagate, but I have my doubts.
My default keyserver appears to be hkps://keys.openpgp.org, but the
pgp key git tree on kernel.org is the one I then look at when some key
isn't there (or is there, but hasn't been updated).
Linus
^ permalink raw reply [relevance 99%]
* Re: [GIT PULL] SCSI postmerge updates for the 6.8+ merge window
@ 2024-03-22 20:34 99% ` Linus Torvalds
0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-03-22 20:34 UTC (permalink / raw)
To: James Bottomley; +Cc: Andrew Morton, linux-scsi, linux-kernel
On Fri, 22 Mar 2024 at 13:24, James Bottomley
<James.Bottomley@hansenpartnership.com> wrote:
>
> OK, try this (I've updated the scsi-misc tag with it as well)
Well there we go. I really had no idea what the pull was supposed to do.
And while I end up looking at individual commits for random smaller
subsystems when it's unclear (sometimes just for language barrier
issues), for long-time maintainers of bigger stuff I kind of expect
better.
Linus
^ permalink raw reply [relevance 99%]
* Re: [GIT PULL] SCSI postmerge updates for the 6.8+ merge window
@ 2024-03-22 19:55 99% ` Linus Torvalds
0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-03-22 19:55 UTC (permalink / raw)
To: James Bottomley; +Cc: Andrew Morton, linux-scsi, linux-kernel
On Fri, 22 Mar 2024 at 12:12, James Bottomley
<James.Bottomley@hansenpartnership.com> wrote:
>
> Eleven patches that are based on the rw_hint branch of the vfs tree
> which contained the base block and fs changes needed to support this.
> 8 patches are in the debug driver and 3 in the core.
Please people - the number of patches involved is entirely immaterial.
I want my merge messages to say what those patches *do*.
This whole "how many patches" thing is a disease. It's not even
remotely interesting. I see the size of the patch in the diffstat, and
that actually has some meaning in the sense of "how much does this
pull actually change", whether it's in one patch or a hundred.
I have absolutely *zero* idea what the above pull request actually
asks me to pull.
So I won't.
Linus
^ permalink raw reply [relevance 99%]
* Re: [GIT PULL] Char/Misc driver changes for 6.9-rc1
@ 2024-03-21 20:28 99% ` Linus Torvalds
0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-03-21 20:28 UTC (permalink / raw)
To: Nathan Chancellor
Cc: Greg KH, Andrew Morton, Arnd Bergmann, linux-kernel, llvm
On Thu, 21 Mar 2024 at 11:30, Nathan Chancellor <nathan@kernel.org> wrote:
>
> Since GCC does not appear to emit warnings for newer C features that it
> allows even with older 'gnu' standard values by default (I think it does
> with '-pedantic'?), perhaps we should just disable -Wc23-extensions
> altogether? Not sure how big of a hammer this is, I think this type of
> warning is the only thing I have seen come from -Wc23-extensions...
It looks like adding -Wno-c23-extensions would only work with more
recent clang versions, so it wouldn't actually fix the build problems,
just make them even harder for developers to actually notice.
Oh well. It's not like this is all that common a problem, so I think
we'll just have to live with it, and hope that people don't do that
"label at end of statement" very often.
(I think it's case statements too, not just labels, too lazy to look
up the details again)
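For reference, the portable way to write such code is to follow the
label with an empty statement, so that it never sits directly before
the closing brace. A small sketch, with a made-up function purely for
illustration:

```c
/* Hypothetical example: count the even values in v[0..n-1]. */
int count_even(const int *v, int n)
{
	int i, count = 0;

	for (i = 0; i < n; i++) {
		if (v[i] & 1)
			goto next;
		count++;
next:
		;	/* null statement: the label now labels a statement */
	}
	return count;
}
```

Dropping the lone ';' after the label gives the construct discussed
above, which is what the warning is about.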
Linus
^ permalink raw reply [relevance 99%]
* Re: [GIT PULL] Char/Misc driver changes for 6.9-rc1
2024-03-21 18:10 99% ` Linus Torvalds
@ 2024-03-21 18:12 99% ` Linus Torvalds
1 sibling, 0 replies; 200+ results
From: Linus Torvalds @ 2024-03-21 18:12 UTC (permalink / raw)
To: Nathan Chancellor
Cc: Greg KH, Andrew Morton, Arnd Bergmann, linux-kernel, llvm
On Thu, 21 Mar 2024 at 11:10, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> So the "labels without a statement" thing is not only a long-time gcc
> behavior (admittedly due to a parsing bug), afaik it's becoming
> "standard C" in C23.
Actually, let me take that back. I think it's only a proposal (WG14
N2508), I have no idea if it's actually going to be standard.
Linus
^ permalink raw reply [relevance 99%]
* Re: [GIT PULL] Char/Misc driver changes for 6.9-rc1
@ 2024-03-21 18:10 99% ` Linus Torvalds
2024-03-21 18:12 99% ` Linus Torvalds
0 siblings, 2 replies; 200+ results
From: Linus Torvalds @ 2024-03-21 18:10 UTC (permalink / raw)
To: Nathan Chancellor
Cc: Greg KH, Andrew Morton, Arnd Bergmann, linux-kernel, llvm
On Thu, 21 Mar 2024 at 06:48, Nathan Chancellor <nathan@kernel.org> wrote:
>
> That build warning actually happens with clang, not GCC as far as I am
> aware, and it is actually a hard build error with older versions of
> clang
So the "labels without a statement" thing is not only a long-time gcc
behavior (admittedly due to a parsing bug), afaik it's becoming
"standard C" in C23.
Does clang have a flag to allow this?
Considering that gcc doesn't warn for it, and that it will become
official at some point anyway, I think this might be a thing that we
might be better off just accepting, rather than be in the situation
where people write code that compiles fine with gcc and don't notice
that clang will error out.
So yes, clang is being correct, but in this case it only causes problems.
Linus
^ permalink raw reply [relevance 99%]
* Re: [GIT PULL] remoteproc updates for v6.9
@ 2024-03-21 18:05 99% ` Linus Torvalds
0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-03-21 18:05 UTC (permalink / raw)
To: Bjorn Andersson
Cc: linux-remoteproc, linux-kernel, Andrew Davis, Neil Armstrong,
Arnaud Pouliquen, Krzysztof Kozlowski, Sibi Sankar, Abel Vesa,
Dmitry Baryshkov, Joakim Zhang, Mathieu Poirier
On Thu, 21 Mar 2024 at 11:03, Bjorn Andersson <andersson@kernel.org> wrote:
>
> I was further notified that this conflicts with your tree, Linus. Below
> is the resolution for this conflict.
Heh. This email came in after the pr-tracker-bot email notifying you
that it's already done..
I think I got it all right, it didn't seem at all controversial, but
maybe you should double-check.
Linus
^ permalink raw reply [relevance 99%]
* Re: [GIT PULL] Hyper-V commits for 6.9
@ 2024-03-21 17:06 99% ` Linus Torvalds
0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-03-21 17:06 UTC (permalink / raw)
To: Wei Liu; +Cc: Linux Kernel List, Linux on Hyper-V List, kys, haiyangz, decui
On Wed, 20 Mar 2024 at 21:09, Wei Liu <wei.liu@kernel.org> wrote:
>
> ssh://git@gitolite.kernel.org/pub/scm/linux/kernel/git/hyperv/linux.git tags/hyperv-next-signed-20240320
Pulled, but...
Your pgp key expired two weeks ago. Please extend the expiration date
(and not something small!) and make sure to refresh the kernel.org
repo and/or other keyservers.
Linus
^ permalink raw reply [relevance 99%]
* Re: [GIT PULL] tracing/tools: Updates for v6.9
@ 2024-03-20 23:40 99% ` Linus Torvalds
0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-03-20 23:40 UTC (permalink / raw)
To: Steven Rostedt
Cc: LKML, Masami Hiramatsu, Mathieu Desnoyers, Daniel Bristot de Oliveira
On Wed, 20 Mar 2024 at 08:19, Steven Rostedt <rostedt@goodmis.org> wrote:
>
> - Update makefiles for latency-collector and RTLA, using tools/build/
> makefiles like perf does, inheriting its benefits.
Lovely. Now it all worked for me, and gave me the legible
Auto-detecting system features:
... libtraceevent: [ on ]
... libtracefs: [ OFF ]
libtracefs is missing. Please install libtracefs-dev/libtracefs-devel
Makefile.config:29: *** Please, check the errors above.. Stop.
and after installing libtracefs-devel as suggested it finished cleanly.
Let's see if it works for others too,
Linus
^ permalink raw reply [relevance 99%]
* Re: [PATCH v2 1/3] mm: kmsan: implement kmsan_memmove()
@ 2024-03-20 16:04 99% ` Linus Torvalds
0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-03-20 16:04 UTC (permalink / raw)
To: Alexander Potapenko
Cc: akpm, linux-kernel, linux-mm, kasan-dev, tglx, x86, Tetsuo Handa,
Dmitry Vyukov, Marco Elver
On Wed, 20 Mar 2024 at 03:18, Alexander Potapenko <glider@google.com> wrote:
>
> Provide a hook that can be used by custom memcpy implementations to tell
> KMSAN that the metadata needs to be copied. Without that, false positive
> reports are possible in the cases where KMSAN fails to intercept memory
> initialization.
Thanks, the series looks fine to me now with the updated 3/3.
I assume it will go through Andrew's -mm tree?
Linus
^ permalink raw reply [relevance 99%]
* Re: [GIT PULL v2] tracing: Updates for v6.9
@ 2024-03-19 21:22 99% ` Linus Torvalds
0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-03-19 21:22 UTC (permalink / raw)
To: Nathan Chancellor
Cc: Steven Rostedt, LKML, Masami Hiramatsu, Mathieu Desnoyers,
Alison Schofield, Beau Belgrave, Huang Yiwei, John Garry,
Randy Dunlap, Thorsten Blum, Vincent Donnefort, linke li, llvm
On Tue, 19 Mar 2024 at 14:03, Nathan Chancellor <nathan@kernel.org> wrote:
>
> For what it's worth, I applied that change and built ARCH=x86_64
> defconfig with LLVM 18.1.1 from [1] but it does not appear to help the
> instances of -Wstring-compare; in fact, it adds some additional warnings
> that I have not seen before. I have attached the full build log.
Hmm. I'm no longer seeing any problems with commit 24f5bb9f24ad
("tracing: Just use strcmp() for testing __string() and __assign_str()
match").
But that's clang 17.0.6.
The patch that Steven sent out (and that I applied) is a bit different
from his "I'll change it to this" email, though. A couple of casts and
parentheses different.
So maybe current -git works for you?
Linus
^ permalink raw reply [relevance 99%]
* Re: [GIT PULL] more s390 updates for 6.9 merge window
@ 2024-03-19 18:54 97% ` Linus Torvalds
0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-03-19 18:54 UTC (permalink / raw)
To: Heiko Carstens; +Cc: Vasily Gorbik, Alexander Gordeev, linux-s390, linux-kernel
On Tue, 19 Mar 2024 at 07:12, Heiko Carstens <hca@linux.ibm.com> wrote:
>
> - Add new bitwise types and helper functions and use them in s390 specific
> drivers and code to make it easier to find virtual vs physical address
> usage bugs.
Hmm. Because you still want to be able to do arithmetic on them, this
is really what "__nocast" should be used for rather than "__bitwise".
__bitwise was intended (as the name implies) for things that can only
be mixed bitwise with similar types. It was _mainly_ for big-endian vs
little-endian marking, where it's actually perfectly fine to do
bitwise operations on two big-endian values without ever translating
them to "cpu endianness", but you can't for example do normal
arithmetic on them.
So __bitwise has those very specific rules that seem odd until you
realize what the reasons for them are.
In contrast, your types actually *would* be fine with arithmetic and
logical operations being done on them, and that is what "__nocast"
really was meant to be.
But we basically never had much use for __nocast in the kernel, and
largely as a result __nocast was never fleshed out to work very well
(and it gets lost *much* too easily), so __bitwise it is.
Oh well.
It looks like it's not a lot of arithmetic you want to allow anyway,
so I guess the fact that __bitwise forces you to do some silly helper
functions for that isn't too much of an issue.
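As a rough illustration of the "silly helper functions" point (a sketch only, not the s390 code; the demo_* names are made up, and the macros mimic the way the kernel headers make these annotations vanish outside of sparse):

```c
#include <stdint.h>

/*
 * Under sparse (__CHECKER__) these become type attributes; for a normal
 * compiler they expand to nothing, so the code below builds either way.
 */
#ifdef __CHECKER__
#define __bitwise __attribute__((bitwise))
#define __force   __attribute__((force))
#else
#define __bitwise
#define __force
#endif

/* Hypothetical address type, for illustration only. */
typedef uint64_t __bitwise demo_phys_t;

/*
 * Arithmetic is not allowed directly on a __bitwise type, so it goes
 * through a helper that force-casts to the plain integer and back.
 */
static inline demo_phys_t demo_phys_add(demo_phys_t addr, uint64_t offset)
{
	return (__force demo_phys_t)((__force uint64_t)addr + offset);
}

/* Bitwise operations between two values of the same type need no cast. */
static inline demo_phys_t demo_phys_mask(demo_phys_t addr, demo_phys_t mask)
{
	return addr & mask;
}
```

With a regular compiler the two helpers are just integer operations; the intent is that only sparse would complain about an open-coded "addr + offset" on the annotated type.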
Linus
^ permalink raw reply [relevance 97%]
* Re: [GIT PULL] virtio: features, fixes
@ 2024-03-19 18:03 98% ` Linus Torvalds
0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-03-19 18:03 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: kvm, virtualization, netdev, linux-kernel, alex.williamson,
andrew, david, dtatulea, eperezma, feliu, gregkh, jasowang,
jean-philippe, jonah.palmer, leiyang, lingshan.zhu,
maxime.coquelin, ricardo, shannon.nelson, stable, steven.sistare,
suzuki.poulose, xuanzhuo, yishaih
On Tue, 19 Mar 2024 at 00:41, Michael S. Tsirkin <mst@redhat.com> wrote:
>
> virtio: features, fixes
>
> Per vq sizes in vdpa.
> Info query for block devices support in vdpa.
> DMA sync callbacks in vduse.
>
> Fixes, cleanups.
Grr. I thought the merge message was a bit too terse, but I let it slide.
But only after pushing it out do I notice that not only was the pull
request message overly terse, you had also rebased this all just
moments before sending the pull request and didn't even give a hint of
a reason for that.
So I missed that, and the merge is out now, but this was NOT OK.
Yes, rebasing happens. But last-minute rebasing needs to be explained,
not some kind of nasty surprise after-the-fact.
And that pull request explanation was really borderline even *without*
that issue.
Linus
^ permalink raw reply [relevance 98%]
* Re: [PATCH v1 3/3] x86: call instrumentation hooks from copy_mc.c
@ 2024-03-19 17:58 99% ` Linus Torvalds
0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-03-19 17:58 UTC (permalink / raw)
To: Alexander Potapenko
Cc: akpm, linux-kernel, linux-mm, kasan-dev, tglx, x86,
Dmitry Vyukov, Marco Elver, Tetsuo Handa
On Tue, 19 Mar 2024 at 09:37, Alexander Potapenko <glider@google.com> wrote:
>
> if (copy_mc_fragile_enabled) {
> __uaccess_begin();
> + instrument_copy_to_user(dst, src, len);
> ret = copy_mc_fragile((__force void *)dst, src, len);
> __uaccess_end();
I'd actually prefer that instrument_copy_to_user() to be *outside* the
__uaccess_begin.
In fact, I'm a bit surprised that objtool didn't complain about it in that form.
__uaccess_begin() causes the CPU to accept kernel accesses to user
mode, and I don't think instrument_copy_to_user() has any business
actually touching user mode memory.
In fact it might be better to rename the function and change the prototype to
instrument_src(src, len);
because you really can't sanely instrument the destination of a user
copy, but "instrument_src()" might be useful in other situations than
just user copies.
Hmm?
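A userspace sketch of the ordering being suggested, assuming nothing about the real kernel helpers: every demo_* name below is a hypothetical stand-in, and the "window" is just a flag so the ordering can be observed.

```c
#include <stddef.h>
#include <string.h>

/* All names here are made-up stand-ins, not kernel APIs. */
static int demo_uaccess_open;        /* 1 while the "user access window" is open */
static int demo_hook_ran_in_window;  /* set if the hook ever ran inside the window */

static inline void demo_uaccess_begin(void) { demo_uaccess_open = 1; }
static inline void demo_uaccess_end(void)   { demo_uaccess_open = 0; }

/* The hook only looks at the kernel-side source buffer. */
static inline void demo_instrument_src(const void *src, size_t len)
{
	(void)src;
	(void)len;
	if (demo_uaccess_open)
		demo_hook_ran_in_window = 1;
}

/* Returns the number of bytes left uncopied (always 0 in this sketch). */
static inline size_t demo_copy(void *dst, const void *src, size_t len)
{
	demo_instrument_src(src, len);	/* before the window opens */
	demo_uaccess_begin();
	memcpy(dst, src, len);		/* the only work done inside the window */
	demo_uaccess_end();
	return 0;
}
```

The point of the shape is that the window brackets only the copy itself, so the instrumentation call never executes with user access enabled.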
Linus
^ permalink raw reply [relevance 99%]
* Re: [PATCH v1 2/3] instrumented.h: add instrument_memcpy_before, instrument_memcpy_after
@ 2024-03-19 17:52 99% ` Linus Torvalds
0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-03-19 17:52 UTC (permalink / raw)
To: Alexander Potapenko
Cc: akpm, linux-kernel, linux-mm, kasan-dev, tglx, x86,
Dmitry Vyukov, Marco Elver, Tetsuo Handa
On Tue, 19 Mar 2024 at 09:37, Alexander Potapenko <glider@google.com> wrote:
>
> +/**
> + * instrument_memcpy_after - add instrumentation before non-instrumented memcpy
Spot the cut-and-paste.
Linus
^ permalink raw reply [relevance 99%]
* Re: [GIT PULL v2] tracing: Updates for v6.9
@ 2024-03-19 16:23 96% ` Linus Torvalds
0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-03-19 16:23 UTC (permalink / raw)
To: Steven Rostedt
Cc: LKML, Masami Hiramatsu, Mathieu Desnoyers, Alison Schofield,
Beau Belgrave, Huang Yiwei, John Garry, Randy Dunlap,
Thorsten Blum, Vincent Donnefort, linke li
On Mon, 18 Mar 2024 at 08:28, Steven Rostedt <rostedt@goodmis.org> wrote:
>
> - Added checks to make sure that the source of __string() is also the
> source of __assign_str() so that it can be safely removed in the next
> merge window.
Aargh.
I didn't notice this initially, because it doesn't happen with gcc (or
maybe not with allmodconfig), but with clang I get
CC [M] net/sunrpc/sched.o
In file included from net/sunrpc/sched.c:31:
In file included from ./include/trace/events/sunrpc.h:2524:
In file included from ./include/trace/define_trace.h:102:
In file included from ./include/trace/trace_events.h:419:
include/trace/events/sunrpc.h:707:4: error: result of comparison
against a string literal is unspecified (use an explicit string
comparison function instead) [-Werror,-Wstring-compare]
and then about 250 lines of messy "explanations" for how it was
expanded because it happens on line 709 too in the same macro, and it
ends up being three macros deep or something.
So no, this all needs to be re-done. That
WARN_ON_ONCE(__builtin_constant_p(src) ? \
strcmp((src), __data_offsets.dst##_ptr_) : \
(src) != __data_offsets.dst##_ptr_); \
does *NOT* work.
Also, looking at that __assign_str() macro, it seems literally insane.
On the next line it will do
memcpy(__str__, __data_offsets.dst##_ptr_ ? : \
EVENT_NULL_STR, __len__); \
so now it checks "__data_offsets.dst##_ptr_" for NULL - but that's one
line after it just did that strcmp on it.
WTF?
This code is completely bogus.
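For illustration only, and not the fix that went into the tree: the clang warning presumably fires because the "(src) != ..." arm is still compiled with a string literal as src, even when the ternary would pick the strcmp() arm. A hypothetical helper (demo_src_mismatch is a made-up name) that compares contents and does the NULL test before calling strcmp() could look like this:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical check: returns true when the two sources disagree.
 * Contents are compared with strcmp(), never by address, and NULL is
 * handled first so strcmp() is not handed a NULL pointer.
 */
static inline bool demo_src_mismatch(const char *src, const char *saved)
{
	if (src == NULL || saved == NULL)
		return src != saved;	/* they agree only if both are NULL */
	return strcmp(src, saved) != 0;
}
```

Because the comparison lives in a function taking "const char *" rather than in a macro, no pointer comparison against a literal is ever spelled out at the call site.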
Linus
^ permalink raw reply [relevance 96%]
* Re: [GIT PULL v2] dlm fixes for 6.9
@ 2024-03-18 22:44 99% ` Linus Torvalds
0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-03-18 22:44 UTC (permalink / raw)
To: David Teigland; +Cc: linux-kernel, gfs2
On Mon, 18 Mar 2024 at 14:25, David Teigland <teigland@redhat.com> wrote:
>
> I dropped the commit with the bad atomic usage, and replaced it with two
> other commits: the first reverts the unnecessary change that began using
> atomic_t for lkb_wait_count, and the second adds comments to the recovery
> code that forcibly resets the wait_count state.
Ok, the diff certainly looks saner. I didn't look at the code outside
the context of the diff, so that's literally just going by the patches
themselves, but I appreciate the comment ("The wait_count will almost
always be 1, but in case of an overlapping unlock/cancel it could be
2: see ..") and yes, it just makes the old atomic thing sound even
odder.
Thanks,
Linus
^ permalink raw reply [relevance 99%]
* Re: [GIT PULL] vfs fixes
2024-03-18 19:14 92% ` Linus Torvalds
@ 2024-03-18 19:41 99% ` Linus Torvalds
0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-03-18 19:41 UTC (permalink / raw)
To: Christian Brauner; +Cc: linux-fsdevel, linux-kernel
On Mon, 18 Mar 2024 at 12:14, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> IOW, isn't the 'get()' always basically paired with the mounting? And
> the 'put()' would probably be best done in kill_block_super()?
.. or alternative handwavy approach:
The fundamental _reason_ for the ->get/put seems to be to make the
'holder' lifetime be at least as long as the 'struct file' it is
associated with. No?
So how about we just make the 'holder' always *be* a 'struct file *'? That
(a) gets rid of the typeless 'void *' part
(b) is already what it is for normal cases (ie O_EXCL file opens).
Wouldn't it be lovely if we just made the rule be that 'holder' *is*
the file pointer, and got rid of a lot of typeless WTF code?
Again, this comment (and the previous email) is more based on "this
does not feel right to me" than anything else.
That code just makes my skin itch. I can't say it's _wrong_, but it
just FeelsWrongToMe(tm).
Linus
^ permalink raw reply [relevance 99%]
* Re: [GIT PULL] vfs fixes
@ 2024-03-18 19:14 92% ` Linus Torvalds
2024-03-18 19:41 99% ` Linus Torvalds
0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-03-18 19:14 UTC (permalink / raw)
To: Christian Brauner; +Cc: linux-fsdevel, linux-kernel
On Mon, 18 Mar 2024 at 05:20, Christian Brauner <brauner@kernel.org> wrote:
>
> * Take a passive reference on the superblock when opening a block device
> so the holder is available to concurrent callers from the block layer.
So I've pulled this, but I have to admit that I hate it.
The bdev "holder" logic is an abomination. And "struct blk_holder_ops"
is horrendous.
Afaik, we have exactly two cases of "struct blk_holder_ops" in the
whole kernel, and you edited one of them.
And the other one is in bcachefs, and is a completely empty one with
no actual ops, so I think that one shouldn't exist.
In other words, we have only *one* actual set of "holder ops". That
makes me suspicious in the first place.
Now, let's then look at that new "holder->put_holder" use. It has
_one_ single user too, which is bd_end_claim(), which is called from
one place, which is bdev_release(). Which in turn is called from
exactly one place, which is blkdev_release(). Which is the release
function for def_blk_fops. Which is called from __fput() on the last
release of the file.
Fine, fine, fine. So let's chase down *who* actually uses that single
"blk_holder_ops". And it turns out that it's used in three places:
fs/super.c, fs/ext4/super.c, and fs/xfs/xfs_super.c.
So in those three cases, it would be absolutely *wrong* if the
'holder' was anything but the super-block (because that's what the new
get/put functions require for any of this to work).
This all smells horribly bad to me. The code looks and acts like it is
some generic interface, but in reality it really isn't. Yes, bcachefs
seems to make up some random holder (it's a one-byte kmalloc that
isn't actually used), and a random holder op structure (it's empty, as
mentioned), but none of this makes any sense at all.
I get the feeling that the "get/put" operations should just be done in
the three places that currently use that 'fs_holder_ops'.
IOW, isn't the 'get()' always basically paired with the mounting? And
the 'put()' would probably be best done in kill_block_super()?
I don't know. Maybe I missed something really important, but this
smells like a specific case that simply shouldn't have gotten this
kind of "generic infrastructure" solution.
Linus
^ permalink raw reply [relevance 92%]
* Re: [GIT PULL] tracing: Updates for v6.9
@ 2024-03-16 20:42 88% ` Linus Torvalds
0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-03-16 20:42 UTC (permalink / raw)
To: Borislav Petkov
Cc: Steven Rostedt, LKML, Masami Hiramatsu, Mathieu Desnoyers,
Beau Belgrave, Chengming Zhou, Huang Yiwei, John Garry,
Randy Dunlap, Thorsten Blum, Vincent Donnefort, linke li,
Daniel Bristot de Oliveira, x86-ml
On Sat, 16 Mar 2024 at 13:00, Borislav Petkov <bp@alien8.de> wrote:
>
> On Sat, Mar 16, 2024 at 11:42:42AM -0700, Linus Torvalds wrote:
> > Now, I'm not suggesting anything like the multiple topic branches from
> > -tip (from a quick check, there's been a total of 25 tip/tip topic
> > branches merged just this merge window), but for clear new features
> > definitely.
>
> So some of those branches are really tiny (1-2 patches) during some
> cycles so I have often wondered whether I should merge those small
> branches into a single pull...
>
> So as not to have too many tiny pull requests.
>
> Any preference?
Not really any strong preferences.
The really tiny ones are so easy to pull that pulling a few random
ones just isn't an issue.
I've been known to occasionally end up doing an octopus merge if I
decide that I might as well just merge multiple small branches in one
go, but honestly, I stopped doing that because it's just simpler to do
two really trivial merges than to even bother thinking about "should I
just merge these all together".
So I don't mind getting three or more random small pulls if they all
still make sense (ie they are clearly separate things).
Now, if you send me three separate pulls for basically the same
conceptual thing, that might annoy me just because it would be so
pointless.
But if it's a "one pull to fix a single-line issue in resource
control, and another pull to fix a single-line issue in objtool", then
those make perfect sense to keep separate, even if they are both
trivial and small.
And on the other hand, if you have a couple of trivial branches with
no real pattern, and decide to just merge them into one that fixes
"misc x86 problems", and the end result is still completely trivial
and there are no surprises or gotchas, that's not wrong either.
And sometimes, merging and sending me just one pull request is
absolutely the right thing.
For example, the ARM SoC trees tend to just merge "umbrella" updates
into one single pull request, and I prefer that - because I see no
point in getting ten different "this is the drivers for SoC xyz"
thing.
So then it's still a clear topic branch ("ARM SoC drivers"), but they
kept multiple branches for different SoC's and sent me just one pull
request.
End result: there's no one right thing. Make it make sense. Probably
the only real rule is
- try to keep conceptually different things separate just for cleanliness
- definitely keep fundamental new features or anything that _might_
be questionable in a branch of its own
but there aren't some kind of black-and-white rules for "this is so
small that it's not worth sending on its own".
This merge window, I think I currently have something like ~15 merges
that ended up being literally just a couple of lines (maybe spread
over two or three files). I don't mind at all. If that's all that
happened, that's fine.
Linus
^ permalink raw reply [relevance 88%]
* Re: [GIT PULL] tracing: Updates for v6.9
@ 2024-03-16 18:42 93% ` Linus Torvalds
0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-03-16 18:42 UTC (permalink / raw)
To: Steven Rostedt
Cc: LKML, Masami Hiramatsu, Mathieu Desnoyers, Beau Belgrave,
Chengming Zhou, Huang Yiwei, John Garry, Randy Dunlap,
Thorsten Blum, Vincent Donnefort, linke li,
Daniel Bristot de Oliveira
On Sat, 16 Mar 2024 at 11:20, Steven Rostedt <rostedt@goodmis.org> wrote:
>
> 1) Rebase without them (I know how much you love rebasing)
This.
Except honestly, the pulls are getting to be so complicated for me
because I have to check them, that I'd really like you to start doing
topic branches for individual things.
That's what we ended up doing with the security layers too, because
there were too many cases of "that is broken, I can't pull it", and
then having one single branch for everything meant that it was always
a "all or nothing" thing.
The security layer issues have largely gone away, but I still pull
things individually, and I think it actually ended up working out
well. Yes, I see more pulls, but not only are they clearer for me, the
code history ends up being much clearer too.
So topic branches tend to make for more actual pull requests, but when
the individual pulls are smaller and have clear "this branch does XYZ
and nothing more", it turns out that the actual effort per pull ends
up being less, and it actually clarifies things a lot too.
In fact, the x86 -tip people ended up doing topic branches just to
make things easier to review, rather than any "I can't pull that"
issues, and I think it actually ended up being something that they
preferred to do anyway.
Now, I'm not suggesting anything like the multiple topic branches from
-tip (from a quick check, there's been a total of 25 tip/tip topic
branches merged just this merge window), but for clear new features
definitely.
And no cross-merges between those topic branches, because that defeats
the whole purpose.
Do you have to do it for the current situation where I just can't take
the mmap stuff? No. But please look at it going forward.
Linus
^ permalink raw reply [relevance 93%]
* Re: [GIT PULL]: Generic phy updates for v6.9
@ 2024-03-16 18:23 99% ` Linus Torvalds
0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-03-16 18:23 UTC (permalink / raw)
To: Vinod Koul; +Cc: LKML
On Sat, 16 Mar 2024 at 11:05, Vinod Koul <vkoul@kernel.org> wrote:
>
> On 15-03-24, 12:22, Linus Torvalds wrote:
> >
> > That is not a valid signed tag, and I can't find one in that repo.
>
> It was pushed: tags/phy-for-6.9, I erred in generating the request for
> sure
Ahh. I did do a "git ls-remote" to try to find it, but I must have
messed up searching for it.
> git://git.kernel.org/pub/scm/linux/kernel/git/phy/linux-phy.git tags/phy-for-6.9
Thanks, now pulled.
Linus
^ permalink raw reply [relevance 99%]
* Re: [GIT PULL] tracing: Updates for v6.9
2024-03-16 16:59 93% ` Linus Torvalds
@ 2024-03-16 18:18 97% ` Linus Torvalds
1 sibling, 0 replies; 200+ results
From: Linus Torvalds @ 2024-03-16 18:18 UTC (permalink / raw)
To: Steven Rostedt
Cc: LKML, Masami Hiramatsu, Mathieu Desnoyers, Beau Belgrave,
Chengming Zhou, Huang Yiwei, John Garry, Randy Dunlap,
Thorsten Blum, Vincent Donnefort, linke li,
Daniel Bristot de Oliveira
On Sat, 16 Mar 2024 at 09:59, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> - I'd suggest marking it all VM_DONTCOPY | VM_IO | VM_DONTEXPAND to
> not let people play games with the mapping.
You already did set VM_DONTCOPY (and VM_DONTDUMP is a good idea too).
And you cleared VM_MAYWRITE. Those are all good.
I'd also suggest requiring the mmap to be MAP_SHARED.
With a read-only mapping, that doesn't really do all that much, but I
don't think you actually need the vm_ops at all once you do everything
at mmap() time, and then it causes a SIGBUS instead of a "insert zero
page".
And _technically_ it could tell the architecture code to try to align
the mapping to the cache aliasing boundaries.
Of course, because of how you insert the meta-page at the beginning of
the mapping, you end up with the actual page table entries not aligned
anyway, so it doesn't actually help the cache coloring, but it's still
conceptually the right thing to do. So even if it ends up mostly just
a "document the fact that these are shared with the kernel" flag, I
think it's a good idea.
Linus
^ permalink raw reply [relevance 97%]
* Re: [GIT PULL] tracing: Updates for v6.9
2024-03-16 16:31 99% ` Linus Torvalds
@ 2024-03-16 16:59 93% ` Linus Torvalds
2024-03-16 18:18 97% ` Linus Torvalds
0 siblings, 2 replies; 200+ results
From: Linus Torvalds @ 2024-03-16 16:59 UTC (permalink / raw)
To: Steven Rostedt
Cc: LKML, Masami Hiramatsu, Mathieu Desnoyers, Beau Belgrave,
Chengming Zhou, Huang Yiwei, John Garry, Randy Dunlap,
Thorsten Blum, Vincent Donnefort, linke li,
Daniel Bristot de Oliveira
On Sat, 16 Mar 2024 at 09:31, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> So instead of merging a new feature that was mis-designed and is
> already having code working around its mis-design, I'm not merging it
> at all.
Here's a clue: when hacking up VFS code, ask for ACK's from the VFS people.
And when hacking up MM code, make damn sure that you have VM people involved.
No more of this "random code that happens to work in my tests"
garbage. Yes, I'm sure that others have done this same disgusting page
counting hack and this was copied-and-pasted from some other
disgusting source, but because of all the history, I'm now looking at
tracing pulls carefully, and I'm simply not allowing any broken hacks.
So in addition to getting actual VM people to help you with mapping
stuff (hard requirement), I would also suggest:
- your allocation has to be live over the whole mmap (and that's due
to other fundamental issues - you're not even trying to deal with
actual dynamic allocations and thank Cthulhu for that), and the code
is literally designed that way, so then faulting pages in one at a
time and refcounting them one at a time is just pointless and wrong.
Just do it all at mmap time.
- I'd suggest marking it all VM_DONTCOPY | VM_IO | VM_DONTEXPAND to
not let people play games with the mapping.
- avoid all the sub-page ref-counts entirely by using VM_PFNMAP, and
use vm_insert_pages()
and a random note:
- from a TLB pressure standpoint, it might be a good idea to try to
keep the page table entries naturally aligned, so putting that one
status page at the beginning is likely a bad idea. It will typically
mean that hardware that can silently use larger TLB entries for
aligned pages won't be able to do so.
but the effect of that is likely fairly small.
Linus
^ permalink raw reply [relevance 93%]
* Re: [GIT PULL] tracing: Updates for v6.9
@ 2024-03-16 16:31 99% ` Linus Torvalds
2024-03-16 16:59 93% ` Linus Torvalds
0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-03-16 16:31 UTC (permalink / raw)
To: Steven Rostedt
Cc: LKML, Masami Hiramatsu, Mathieu Desnoyers, Beau Belgrave,
Chengming Zhou, Huang Yiwei, John Garry, Randy Dunlap,
Thorsten Blum, Vincent Donnefort, linke li,
Daniel Bristot de Oliveira
On Fri, 15 Mar 2024 at 09:27, Steven Rostedt <rostedt@goodmis.org> wrote:
>
> - Add ring_buffer memory mappings
I pulled this, looked at it, and unpulled it again.
I don't want to have years of "fix up the mistakes after the fact".
This is all done entirely incorrectly, and just as an example of that,
subbuf_map_prepare() is another case of "tracing code works around the
fact that it did things wrong in the first place".
So instead of merging a new feature that was mis-designed and is
already having code working around its mis-design, I'm not merging it
at all.
Linus
^ permalink raw reply [relevance 99%]
* Re: [GIT PULL] KVM changes for Linux 6.9 merge window
@ 2024-03-16 16:01 99% ` Linus Torvalds
0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-03-16 16:01 UTC (permalink / raw)
To: Paolo Bonzini
Cc: Oliver Upton, Marc Zyngier, Catalin Marinas, Mark Rutland,
Will Deacon, linux-kernel, kvm
On Sat, 16 Mar 2024 at 01:48, Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> Linus, were you compiling with allyesconfig so that you got
> CONFIG_KVM_ARM64_RES_BITS_PARANOIA on?
Regular allmodconfig.
> You can also make CONFIG_KVM_ARM64_RES_BITS_PARANOIA depend on !COMPILE_TEST.
No.
WTF is wrong with you?
You're saying "let's turn off this compile-time sanity check when
we're doing compile testing".
That's insane.
The sanity check was WRONG. People hadn't tested it. Stephen points
out that it was reported to you almost a month ago in
https://lore.kernel.org/linux-next/20240222220349.1889c728@canb.auug.org.au/
and you're still trying to just *HIDE* this garbage?
Stop it.
Linus
^ permalink raw reply [relevance 99%]
* Re: [patch 5/9] x86: Cure per CPU madness on UP
@ 2024-03-16 1:23 99% ` Linus Torvalds
0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-03-16 1:23 UTC (permalink / raw)
To: Thomas Gleixner
Cc: Guenter Roeck, LKML, x86, Uros Bizjak, linux-sparse, lkp,
oe-kbuild-all, Arnd Bergmann
On Fri, 15 Mar 2024 at 18:11, Thomas Gleixner <tglx@linutronix.de> wrote:
>
> You wish. We still support 486 and some of the still produced 486 clones
> do not have a local APIC.
Ouch. I was _sure_ we had dropped i486 support too due to cmpxchg8b.
But apparently that was just a discussion, and my wishful thinking,
and we never actually followed through.
Linus
^ permalink raw reply [relevance 99%]
* Re: [PATCH] Revert "KVM: arm64: Snapshot all non-zero RES0/RES1 sysreg fields for later checking"
@ 2024-03-16 0:51 99% ` Linus Torvalds
0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-03-16 0:51 UTC (permalink / raw)
To: Oliver Upton
Cc: Paolo Bonzini, Marc Zyngier, Catalin Marinas, Mark Rutland,
Will Deacon, linux-kernel, kvm, kvmarm, James Morse,
Suzuki K Poulose, Zenghui Yu
On Fri, 15 Mar 2024 at 17:25, Oliver Upton <oliver.upton@linux.dev> wrote:
>
> This reverts commits 99101dda29e3186b1356b0dc4dbb835c02c71ac9 and
> b80b701d5a67d07f4df4a21e09cb31f6bc1feeca.
Applied. Thanks,
Linus
^ permalink raw reply [relevance 99%]
* Re: [patch 5/9] x86: Cure per CPU madness on UP
@ 2024-03-15 23:23 93% ` Linus Torvalds
0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-03-15 23:23 UTC (permalink / raw)
To: Thomas Gleixner
Cc: Guenter Roeck, LKML, x86, Uros Bizjak, linux-sparse, lkp, oe-kbuild-all
On Fri, 15 Mar 2024 at 15:55, Thomas Gleixner <tglx@linutronix.de> wrote:
>
> Not really. The problem is that a SMP build can run on a UP machine w/o
> APIC or command line disables the APIC and will run into the exactly
> same problem. The only case where we know that it is impossible is when
> APIC support is disabled, which is silly but topic for a different
> discussion.
Oh, I agree - that was why I said that it shouldn't depend on a local
APIC on machines that may not even have one.
That "may not even have one" can still be a static option - we
technically allow 32-bit UP kernel to not enable X86_UP_APIC, although
it might be time to drop that option.
> So the proper thing to do is to check for num_possible_cpus() == 1 in
> that function.
I think that's _one_ proper thing. I still think that the deeper
problem is that it still looks at local apic rules even when those
rules are completely nonsensical.
For example, that MAX_LOCAL_APIC range test may not matter simply
because it's testing a constant value, but it still smells entirely
wrong to even check for that, when the system doesn't necessarily have
one.
So I think your patch may fix the immediate bug, but I think it's
still just a band-aid.
Either we should just make all machines look like they have the proper
local apic mappings, or we shouldn't look at any local apic rules AT
ALL.
So I'd rather see those apic_maps[] just be properly filled in.
> Sure you can argue that we could avoid it for SMP=n builds completely,
> but I think the right thing to do is to aim for removing CONFIG_SMP and
> make the UP build a subset of a generic SMP capable build which has
> CONFIG_NR_CPUS=1, i.e. num_possible_cpus() = 1. Why?
I wouldn't be entirely opposed to just doing that. UP has become
fairly irrelevant.
That said, UP is *not* entirely irrelevant on other architectures, and
if we drop UP support on x86, we'll be effectively dropping a lot of
coverage testing. The number of people who do cross-compilers is
pretty small.
End result: I'd *much* rather get rid of X86_UP_APIC and the "nolapic"
kernel command line, and say "even UP has to have a local APIC".
We already require a Pentium-class CPU, so in practice we already
require that local APIC setup. And yes, machines existed where it
could be turned off, but I don't think that is relevant any more.
Put another way: I think "UP config for wider build testing" is a
_lot_ more relevant than "no LAPIC support".
Linus
^ permalink raw reply [relevance 93%]
* Re: [GIT PULL] KVM changes for Linux 6.9 merge window
@ 2024-03-15 22:28 98% ` Linus Torvalds
0 siblings, 2 replies; 200+ results
From: Linus Torvalds @ 2024-03-15 22:28 UTC (permalink / raw)
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Catalin Marinas,
Mark Rutland, Will Deacon
Cc: linux-kernel, kvm
On Fri, 15 Mar 2024 at 10:49, Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> https://git.kernel.org/pub/scm/virt/kvm/kvm.git tags/for-linus
Argh.
This causes my arm64 build to fail, but since I don't do that between
every pull, I didn't notice until after I had already pushed things
out.
I get a failure on arch/arm64/kvm/check-res-bits.h (line 60):
BUILD_BUG_ON(ID_AA64DFR1_EL1_RES0 != (GENMASK_ULL(63, 0)));
and at least in my build, the generated sysreg-defs.h file has
#define ID_AA64DFR1_EL1_RES0 (UL(0))
so yeah, it most definitely doesn't match that GENMASK_ULL(63, 0).
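A small userspace sketch of that mismatch, under stated assumptions: DEMO_GENMASK_ULL is a stand-in written for this example (not the kernel macro itself), DEMO_RES0_GENERATED just restates the value quoted above, and C11 static_assert plays the role of BUILD_BUG_ON.

```c
#include <assert.h>	/* static_assert (C11) */
#include <stdint.h>

/* Stand-in for a 64-bit mask with bits h..l set, for illustration only. */
#define DEMO_GENMASK_ULL(h, l) \
	(((~0ULL) << (l)) & ((~0ULL) >> (63 - (h))))

/* The value the generated header was reported to contain. */
#define DEMO_RES0_GENERATED 0ULL

/* GENMASK(63, 0) is the all-ones value ... */
static_assert(DEMO_GENMASK_ULL(63, 0) == UINT64_MAX, "full-width mask");

/* ... so a generated value of 0 can never be equal to it. */
static_assert(DEMO_RES0_GENERATED != DEMO_GENMASK_ULL(63, 0),
	      "generated RES0 differs from the expected mask");

/* Runtime form of the same comparison. */
static inline int demo_res0_matches(uint64_t generated, uint64_t expected)
{
	return generated == expected;
}
```

Flipping the second static_assert to "==" would stop the build at compile time, which is the kind of failure described here.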
I did *not* go delve into how arch/arm64/tools/gen-sysreg.awk works. I
don't really do awk any more.
The immediate cause of the failure is commit b80b701d5a67 ("KVM:
arm64: Snapshot all non-zero RES0/RES1 sysreg fields for later
checking") but I hope it worked at *some* point. I can't see how.
I would guess / assume that commit cfc680bb04c5 ("arm64: sysreg: Add
layout for ID_AA64MMFR4_EL1") is also involved, but having recoiled in
horror from the awk script, I really can't even begin to guess at what
is going on.
Bringing in other people who hopefully can sort this out.
Linus
^ permalink raw reply [relevance 98%]
* Re: [GIT PULL] Crypto Update for 6.9
@ 2024-03-15 21:51 99% ` Linus Torvalds
0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-03-15 21:51 UTC (permalink / raw)
To: Herbert Xu
Cc: David S. Miller, Linux Kernel Mailing List, Linux Crypto Mailing List
On Thu, 14 Mar 2024 at 20:04, Herbert Xu <herbert@gondor.apana.org.au> wrote:
>
> Drivers:
>
> - Add queue stop/query debugfs support in hisilicon/qm.
There's a lot more than that in there. Fairly big Intel qat updates
from what I can see, for example.
Linus
^ permalink raw reply [relevance 99%]
* Re: [GIT PULL]: Generic phy updates for v6.9
@ 2024-03-15 19:22 99% ` Linus Torvalds
0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-03-15 19:22 UTC (permalink / raw)
To: Vinod Koul; +Cc: LKML
On Fri, 15 Mar 2024 at 04:03, Vinod Koul <vkoul@kernel.org> wrote:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/phy/linux-phy.git next
That is not a valid signed tag, and I can't find one in that repo.
I know you know how to do this right, so please send me a proper pull
request with a signed tag like you usually do...
Linus
^ permalink raw reply [relevance 99%]
* Re: [GIT PULL] clk changes for the merge window
@ 2024-03-15 18:54 99% ` Linus Torvalds
0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-03-15 18:54 UTC (permalink / raw)
To: Stephen Boyd; +Cc: Michael Turquette, linux-clk, linux-kernel
On Thu, 14 Mar 2024 at 12:43, Stephen Boyd <sboyd@kernel.org> wrote:
>
> I'm hoping that we can make that into a genpd that drivers attach
> instead, but this API should help drivers simplify in the meantime.
.. and I'm hoping that name dies in the code too, not just in the
directory structure.
'genpd' really makes absolutely zero sense as a name to anybody
outside of that legacy clique.
Linus
^ permalink raw reply [relevance 99%]
* Re: [GIT PULL] fs/9p patches for 6.9 merge window
@ 2024-03-15 17:17 99% ` Linus Torvalds
0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-03-15 17:17 UTC (permalink / raw)
To: Eric Van Hensbergen; +Cc: v9fs, linux-kernel
On Fri, 15 Mar 2024 at 08:10, Eric Van Hensbergen <ericvh@kernel.org> wrote:
>
> fs/9p changes for the 6.9 merge window
Entirely tangential, but your pgp key drives me insane, and it finally
drove me over the edge.
One of your uid's has your own name mis-spelled. This is not new.
Please tell me there's a reason for it.
Linus
^ permalink raw reply [relevance 99%]
* Re: [GIT PULL] dlm fixes for 6.9
@ 2024-03-15 17:10 87% ` Linus Torvalds
0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-03-15 17:10 UTC (permalink / raw)
To: David Teigland; +Cc: linux-kernel, gfs2
On Thu, 14 Mar 2024 at 11:43, David Teigland <teigland@redhat.com> wrote:
>
> Fix two refcounting bugs from recent changes:
> - misuse of atomic_dec_and_test results in missed ref decrement
> - wrong variable assignment results in another missed ref decrement
I pulled this, and then I unpulled it again.
That code is insane.
This is *NOT* sane or valid code:
+ while (atomic_read(&lkb->lkb_wait_count)) {
+ if (atomic_dec_and_test(&lkb->lkb_wait_count))
+ list_del_init(&lkb->lkb_wait_reply);
+
+ unhold_lkb(lkb);
+ }
the above is completely crazy. That's simply not how atomics work.
What's the point of using a refcount - an atomic one at that - if you
just use it as a counter. That's not a "reference count", that's
literally just broken.
The whole - and *ONLY* - point of a refcount is that you are counting
references. References that *YOU* hold. Not that somebody else is
holding and you are releasing.
If you're the only holder of any counts, don't make them atomic, don't
put them in a data structure. But you're *not* the only holder of that
refcount here, are you?
Using atomics for this kind of sequence shows some crazy crazy
behavior. It's not valid to say "ok, as long as this atomic is not
zero, let's decrement it and test if it's not zero".
Because for an atomic value to MAKE SENSE IN THE FIRST PLACE, there
could be somebody else that comes in and also possibly decrements it.
And if that happens between the test of "is this zero" and "did I
decrement it to zero", you now had two decrements, and that value is
now negative. So you didn't really have an atomic value, because you
did two operations on it.
And dammit, if that mutex means that it cannot happen, then WHY WAS IT
AN ATOMIC IN THE FIRST PLACE?
IOW, if you have locking that protects the value, then atomic accesses
are STILL wrong.
So there is not a single situation where I can see the above kind of
code ever being valid.
Now, if the issue is that you want to clean up something that is never
getting cleaned up by anybody else, and this is a fatal error, and
you're just trying to fix things up (badly), and you know that this is
all racy but the code is trying to kill a dead data structure, then
you should
(a) have a damn big comment (bigger than the comment that is already there)
(b) *NOT* pretend to do some stupid "atomic decrement and test" loop
IOW, if what you want to do is get rid of stuck entries and set the
refcount to zero, then doing that would probably be something like
/* This is broken, but.. */
stale = atomic_xchg(&lkb->lkb_wait_count, 0);
if (stale) {
list_del_init(&lkb->lkb_wait_reply);
do { unhold_lkb(lkb); } while (--stale);
}
and it needs a much bigger comment than that "This is broken".
(And I don't know if you want that list_del_init() before or after the
'unhold N times' loop).
The above is still completely broken, but at least it doesn't do some
kind of odd non-atomic test and decrement stuff in a loop, and
hopefully makes it clear that we're very much talking about fixing up
stale final values
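The difference between the two patterns can be sketched in userspace with C11 atomics. This is only an illustration under stated assumptions: struct sketch_lkb, its fields and both helpers are hypothetical stand-ins, not the dlm code, and atomic_exchange() plays the role that atomic_xchg() plays in the snippet above.

```c
#include <stdatomic.h>

/* Hypothetical stand-in for the lkb; only the counter matters here. */
struct sketch_lkb {
	atomic_int wait_count;
	int unholds;	/* how many times we "unhold" the object */
};

/* Racy pattern: the load and the decrement are two separate atomic
 * operations, so another thread may decrement in between them and
 * push the count below zero. */
static void drain_racy(struct sketch_lkb *lkb)
{
	while (atomic_load(&lkb->wait_count)) {
		atomic_fetch_sub(&lkb->wait_count, 1);
		lkb->unholds++;
	}
}

/* Single-operation pattern: one exchange takes whatever count was
 * there and leaves zero behind, so no other thread can see a value
 * that is half-way through being drained. */
static int drain_xchg(struct sketch_lkb *lkb)
{
	int stale = atomic_exchange(&lkb->wait_count, 0);

	for (int i = 0; i < stale; i++)
		lkb->unholds++;
	return stale;
}
```

In a single thread both helpers end with the counter at zero; the point is that only drain_xchg() reads and clears the counter in one atomic step. As noted above, even that version needs a big comment explaining why dropping somebody else's references is acceptable.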
And no, I didn't look at the code around it. Because I really think
that "while (atomic_read(...)" loop cannot POSSIBLY be correct,
regardless of any context.
Linus
^ permalink raw reply [relevance 87%]
* Re: [patch 5/9] x86: Cure per CPU madness on UP
@ 2024-03-15 16:42 97% ` Linus Torvalds
0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-03-15 16:42 UTC (permalink / raw)
To: Guenter Roeck
Cc: Thomas Gleixner, LKML, x86, Uros Bizjak, linux-sparse, lkp,
oe-kbuild-all
On Fri, 15 Mar 2024 at 09:17, Guenter Roeck <linux@roeck-us.net> wrote:
>
> [ 3.291087] RIP: 0010:rapl_cpu_online+0xf2/0x110
> [ 3.291087] Code: 05 ff 8e 07 03 40 42 0f 00 48 89 43 60 e8 56 5f 12 00 8b 15 b4 84 61 02 48 8b 05 01 8f 07 03 48 c7 83 90 00 00 00 e0 84 80 b6 <48> 89 9c d0 38 01 00 00 e9 2b ff ff ff b8 f4 ff ff ff e9 47 ff ff
The code is
mov %rax,0x60(%rbx)
call 0x125f5f
mov 0x26184b4(%rip),%edx
mov 0x3078f01(%rip),%rax
movq $0xffffffffb68084e0,0x90(%rbx)
mov %rbx,0x138(%rax,%rdx,8) <-- trapping instruction
jmp <backwards>
with %rdx being some index having the value 0xffffffed (-19).
That's ENODEV.
Without line numbers (if you have debug info for that kernel, it's
good to run "scripts/decode_stacktrace.sh" on stack traces) it's hard
to really know what's up, but I strongly suspect that it's this:
rapl_pmus->pmus[topology_logical_die_id(cpu)] = pmu;
because we have
topology_logical_die_id(cpu) ->
(cpu_data(cpu).topo.logical_die_id)
and we have
c->topo.logical_die_id = topology_get_logical_id(apicid, TOPO_DIE_DOMAIN);
and topology_get_logical_id() does this:
if (lvlid >= MAX_LOCAL_APIC)
return -ERANGE;
if (!test_bit(lvlid, apic_maps[at_level].map))
return -ENODEV;
so that -ENODEV is not entirely unlikely for a UP run.
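To make the arithmetic concrete, here is a minimal userspace sketch, assuming Linux errno values (ENODEV is 19). All function names are hypothetical stand-ins, not the x86 topology code, and the bounds check is just one defensive pattern for illustration, not the fix discussed in this thread.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in: returns a logical id, or a negative errno
 * when the id is not present in the map. */
static int sketch_get_logical_id(int present)
{
	return present ? 0 : -ENODEV;
}

/* The 32-bit pattern an index register holds for that return value. */
static uint32_t as_index_bits(int id)
{
	return (uint32_t)id;
}

/* Guarded store: refuse negative or out-of-range ids instead of
 * using them as an array index. */
static int store_pmu(void **pmus, int nr, int id, void *pmu)
{
	if (id < 0 || id >= nr)
		return -EINVAL;
	pmus[id] = pmu;
	return 0;
}
```

With ENODEV equal to 19, as_index_bits(-ENODEV) is 0xffffffed, which matches the %rdx value in the oops, and an unguarded pmus[id] store with that index lands far outside the array.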
This also explains why it *used* to work - that whole thing is new to
the current merge window and came in through commit ca7e91776912
("Merge tag 'x86-apic-2024-03-10' of ...").
Thomas, over to you. I wonder if maybe all those topology macros
should just return 0 on an UP build, but that
topology_get_logical_id() thing looks a bit wrong regardless.
It really shouldn't depend on local apic data for configs that may not
*have* a local apic.
Linus
^ permalink raw reply [relevance 97%]
* Re: [GIT PULL] lsm/lsm-pr-20240314
@ 2024-03-14 23:05 99% ` Linus Torvalds
0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-03-14 23:05 UTC (permalink / raw)
To: Paul Moore; +Cc: linux-security-module, linux-kernel
On Thu, 14 Mar 2024 at 13:31, Paul Moore <paul@paul-moore.com> wrote:
>
> I would like if you could merge these patches, I believe fixing the
> syscall signature problem now poses very little risk and will help us
> avoid the management overhead of compat syscall variants in the future.
> However, I'll understand if you're opposed, just let me know and I'll
> get you a compat version of this pull request as soon as we can get
> something written/tested/verified.
No, attempting to just fix it after-the-fact in the hopes that nobody
actually uses the new system call yet sounds like the right thing to
do.
6.8 has been out for just days, and I see it's marked for stable, so
hopefully nobody ever even sees the mistake. I can't imagine that the
new system call is that eagerly used.
Famous last words.
Linus
^ permalink raw reply [relevance 99%]
* Re: [GIT PULL] platform-drivers-x86 for v6.9-1
[not found] <65f2d9d4.050a0220.b240.7bddSMTPIN_ADDED_BROKEN@mx.google.com>
@ 2024-03-14 18:36 97% ` Linus Torvalds
0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-03-14 18:36 UTC (permalink / raw)
To: Ilpo Järvinen; +Cc: LKML, PDx86, Hans de Goede, Andy Shevchenko
On Thu, 14 Mar 2024 at 04:04, Ilpo Järvinen
<ilpo.jarvinen@linux.intel.com> wrote:
>
> Here is the main PDx86 PR for v6.9.
So I've obviously pulled this, and pr-tracker-bot already replied to
that effect.
However, it turns out that the pr-tracker-bot reply didn't thread
correctly for me, and I looked into why.
Your SMTP setup is oddly broken. It looks like your original email was
sent with a bogus Message-ID.
So in my headers, I see how gmail has added a properly formatted Message-ID:
<65f2d9d4.050a0220.b240.7bddSMTPIN_ADDED_BROKEN@mx.google.com>
and lists your original broken one as
<4844b67c9b1feca386eb739a4592bdbf.Ilpo Järvinen
<ilpo.jarvinen@linux.intel.com>>
which indeed is completely wrong.
I have no idea how you managed that, since your headers don't actually
seem to specify the MUA you used. But whatever it was, it's very very
mis-configured.
The pr-tracker-bot reply does have that original Message-ID in its
threading notes:
In-Reply-To: <4844b67c9b1feca386eb739a4592bdbf.Ilpo Järvinen
<ilpo.jarvinen@linux.intel.com>>
References: <4844b67c9b1feca386eb739a4592bdbf.Ilpo Järvinen
<ilpo.jarvinen@linux.intel.com>>
but it doesn't thread for me because the message-id from the original
email got rewritten as something valid.
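A rough shape check shows why the original ID is rejected. This is a simplified sketch, not a full RFC 5322 parser: the function name is hypothetical, and it only tests for surrounding angle brackets, exactly one '@' with text on both sides, and no whitespace or nested brackets inside.

```c
#include <string.h>

/* Simplified Message-ID shape check (hypothetical helper, not a real
 * mail library API). Returns 1 if the string looks like <left@right>. */
static int sketch_msgid_ok(const char *id)
{
	size_t len = strlen(id);
	int at = 0;

	if (len < 5 || id[0] != '<' || id[len - 1] != '>')
		return 0;
	/* The '@' needs something on both sides of it. */
	if (id[1] == '@' || id[len - 2] == '@')
		return 0;
	for (size_t i = 1; i + 1 < len; i++) {
		char c = id[i];

		/* No whitespace or nested angle brackets inside the ID. */
		if (c == ' ' || c == '\t' || c == '<' || c == '>')
			return 0;
		if (c == '@')
			at++;
	}
	return at == 1;
}
```

The gmail-generated ID passes this check, while the original one fails on the embedded space and the nested angle brackets.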
Can you please look into fixing whatever MUA you used for sending that
pull request?
This is obviously not a deal breaker, but it's odd.
Linus
^ permalink raw reply [relevance 97%]
* Re: [GIT PULL] bcachefs updates for 6.9
@ 2024-03-14 17:15 99% ` Linus Torvalds
0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-03-14 17:15 UTC (permalink / raw)
To: Kent Overstreet
Cc: Darrick J. Wong, linux-bcachefs, linux-fsdevel, linux-kernel
On Wed, 13 Mar 2024 at 15:28, Kent Overstreet <kent.overstreet@linux.dev> wrote:
>
> Sorry, you were talking about mean absolute deviation; that does work
> here.
Yes, I meant mean, not median.
But the confusion is my fault - I wrote MAD and then to "explain"
that, I put "median" in my own email - so you read it right the first
time, and it was just me being sloppy and confusing things.
They are both called MAD in their own contexts, and they are much too
easy to confuse.
My bad,
Linus
^ permalink raw reply [relevance 99%]
* Re: [git pull] drm for 6.9-rc1
@ 2024-03-14 1:49 99% ` Linus Torvalds
0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-03-14 1:49 UTC (permalink / raw)
To: Dave Airlie, Animesh Manna, Jani Nikula; +Cc: Daniel Vetter, dri-devel, LKML
On Tue, 12 Mar 2024 at 21:07, Dave Airlie <airlied@gmail.com> wrote:
>
> I've done a trial merge into your tree from a few hours ago, there
> are definitely some slightly messy conflicts, I've pushed a sample
> branch here:
I appreciate your sample merges since I like verifying my end result,
but I think your merge is wrong.
I got two differences when I did the merge. The one in
intel_dp_detect() I think is just syntactic - I ended up placing the
if (!intel_dp_is_edp(intel_dp))
intel_psr_init_dpcd(intel_dp);
differently than you did (I did it *after* the tunnel_detect()).
I don't _think_ that placement matters, but somebody more familiar
with the code should check it out. Added Animesh and Jani to the
participants.
But I think the TP_printk() your merge ends up with for the xe_bo_move
trace event is actively wrong. You don't have the destination for the
move in the printk.
Or maybe I got it wrong. Our merges end up _close_, but not identical.
Linus
^ permalink raw reply [relevance 99%]
* Re: Unexplained long boot delays [Was Re: [GIT PULL] RCU changes for v6.9]
@ 2024-03-14 1:15 97% ` Linus Torvalds
0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-03-14 1:15 UTC (permalink / raw)
To: Florian Fainelli
Cc: Russell King (Oracle),
Joel Fernandes, Boqun Feng, Anna-Maria Behnsen, linux-kernel,
kernel-team, paulmck, mingo, tglx, rcu, neeraj.upadhyay, urezki,
qiang.zhang1211, frederic, bigeasy, chenzhongjin, yangjihong1,
rostedt, Justin Chen
On Wed, 13 Mar 2024 at 16:29, Florian Fainelli <f.fainelli@gmail.com> wrote:
>
> On this specific commit 7ee988770326fca440472200c3eb58935fe712f6, there
> is a 100% failure for at least 3 devices out of the 16 that are running
> the test.
Hmm. I have no idea what is going on, and the unimac-mdio probe
function (one of the things that seem to take forever on your setup)
looks fairly simple.
There doesn't even seem to be any timers involved.
That said - one of the things it does is
unimac_mdio_probe ->
unimac_mdio_clk_set ->
clk_prepare_enable
and maybe that's a pattern, because you report that
brcm_pcie_resume_noirq is another problem spot (on resume).
And guess what brcm_pcie_resume_noirq() does?
Yup. clk_prepare_enable().
So I'm wondering if there's some interaction with some clock driver?
That might explain why it shows up on some arm platforms but not
elsewhere.
I may be barking *entirely* up the wrong tree, though. I was just
looking at that unimac probe and going "there's absolutely _nothing_
timer-related here" and that clk thing looked like it might at least
have _some_ relevance.
Linus
^ permalink raw reply [relevance 97%]
* Re: [GIT PULL] bcachefs updates for 6.9
@ 2024-03-13 21:51 99% ` Linus Torvalds
0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-03-13 21:51 UTC (permalink / raw)
To: Kent Overstreet
Cc: Darrick J. Wong, linux-bcachefs, linux-fsdevel, linux-kernel
On Wed, 13 Mar 2024 at 14:34, Kent Overstreet <kent.overstreet@linux.dev> wrote:
>
> I liked your MAD suggestion, but the catch was that we need an
> exponentially weighted version,
The code for the weighted version literally doesn't change.
The variance value is different, but the difference between MAD and
standard deviation is basically just a constant factor (which will be
different for different distributions, but so what? Any _particular_
case will have a particular distribution).
So why would a constant factor make _any_ difference for any
exponential weighting?
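As an illustration of that point, here is a minimal sketch of an exponentially weighted mean absolute deviation in plain 64-bit integer math. It is an assumption-laden example, not the bcachefs code and not a proposed kernel interface: the struct, the function and the 1/8 weight are all hypothetical.

```c
#include <stdint.h>

/* Hypothetical running statistics: both values are exponentially
 * weighted, with each new sample given a weight of 1/8. */
struct sketch_ewma {
	int64_t mean;	/* exponentially weighted mean */
	int64_t mad;	/* exponentially weighted mean absolute deviation */
};

#define SKETCH_WEIGHT_DIV 8

static void sketch_ewma_update(struct sketch_ewma *s, int64_t sample)
{
	int64_t dev = sample - s->mean;
	int64_t absdev = dev < 0 ? -dev : dev;

	/* Same weighting step for both; the only difference from a
	 * variance tracker is |dev| here instead of dev * dev, so no
	 * square root and no wider-than-64-bit intermediate is needed. */
	s->mean += dev / SKETCH_WEIGHT_DIV;
	s->mad += (absdev - s->mad) / SKETCH_WEIGHT_DIV;
}
```

Starting from zeroes, feeding in 80 gives a mean of 10 and a deviation of 10; feeding in 10 next leaves the mean at 10 and decays the deviation to 9.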
Anyway, feel free to keep your code in bcachefs.
And maybe xfs even wants to copy that code. I don't care, it seems
stupid, but that's a filesystem choice.
But if we're making it a generic kernel library, it needs to be sane.
Not making people do 64-bit square roots and 128-bit divides just for
a random statistical element.
Linus
^ permalink raw reply [relevance 99%]
* Re: [RFC PATCH 1/2] smp: Implement serialized smp_call_function APIs
@ 2024-03-13 21:19 99% ` Linus Torvalds
0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-03-13 21:19 UTC (permalink / raw)
To: Mathieu Desnoyers
Cc: Peter Oskolkov, linux-kernel, Peter Zijlstra, Paul E . McKenney,
Boqun Feng, Andrew Hunter, Maged Michael, gromer, Avi Kivity,
Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman
On Wed, 13 Mar 2024 at 13:56, Mathieu Desnoyers
<mathieu.desnoyers@efficios.com> wrote:
>
> Introduce serialized smp_call_function APIs to limit the number of
> concurrent smp_call_function IPIs which can be sent to a given CPU to a
> maximum of two: one broadcast and one specifically targeting the CPU.
So honestly, with only one user, I think the serialization code
should be solidly in that one user, not in kernel/smp.c.
Also, this kind of extra complexity does require numbers to argue for it.
Linus
^ permalink raw reply [relevance 99%]
* Re: [GIT PULL] bcachefs updates for 6.9
@ 2024-03-13 20:47 94% ` Linus Torvalds
0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-03-13 20:47 UTC (permalink / raw)
To: Kent Overstreet, Darrick J. Wong
Cc: linux-bcachefs, linux-fsdevel, linux-kernel
On Tue, 12 Mar 2024 at 18:10, Kent Overstreet <kent.overstreet@linux.dev> wrote:
>
> Hi Linus, few patches for you - plus a simple merge conflict with VFS
> changes:
The conflicts are trivial.
The "make random bcachefs code be a library function" stuff I looked
at, decided is senseless, and ended up meaning that I'm not pulling
this without a lot more explanation (and honestly, I don't think the
explanations would hold water).
That "stdio_redirect_printf()" and darray_char stuff is just
horrendous interfaces with no explanations. The interfaces are
disgusting.
Keep it in your own code where it belongs, don't try to make it some
generic library thing.
And if you *do* make it a library thing, it needs to be
(a) much more explained
(b) have much saner naming, and fewer disgusting and completely
nonsensical interfaces ("DARRAY()").
And no, finding one other filesystem to share this kind of code is not
sufficient to try to claim it's a sane interface and sane naming.
But the main dealbreaker is the insane math.
And dammit, we talked about the idiotic "mean and variance" garbage
long ago. It was wrong back then, it's *still* wrong.
You didn't explain why it couldn't use the *much* simpler MAD (median
absolute deviation) instead of using variance.
That bad decision directly results in that pointless use of overly
complex 128-bit math.
I called it insanely over-engineered back then, and as far as I can
tell, absolutely *NOTHING* has changed apart from some slight type
name details.
As long as you made it some kind of bcachefs-only thing, I don't mind.
But now you're trying to push this garbage as some kind of generic
library code that others would use, and that immediately means that I
*do* mind insanely overengineered interfaces.
The time_stats stuff otherwise looks at least like a sane interface
with names and uses, but the use of that horrendous infrastructure
scuttles it.
Linus
^ permalink raw reply [relevance 94%]
* Re: [GIT PULL] vfs pidfd
@ 2024-03-13 19:40 99% ` Linus Torvalds
0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-03-13 19:40 UTC (permalink / raw)
To: Christian Brauner; +Cc: linux-fsdevel, linux-kernel
On Wed, 13 Mar 2024 at 10:10, Christian Brauner <brauner@kernel.org> wrote:
>
> If you're fine with it I would ask you to please just apply it [..]
I'll take it directly, no problem.
Thanks,
Linus
^ permalink raw reply [relevance 99%]
* Re: [GIT PULL] slab updates for 6.9
@ 2024-03-13 3:54 99% ` Linus Torvalds
0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-03-13 3:54 UTC (permalink / raw)
To: Vlastimil Babka
Cc: David Rientjes, Joonsoo Kim, Christoph Lameter, Pekka Enberg,
Andrew Morton, linux-mm, LKML, patches, Roman Gushchin,
Hyeonggon Yoo, Chengming Zhou, Xiongwei Song
On Tue, 12 Mar 2024 at 02:55, Vlastimil Babka <vbabka@suse.cz> wrote:
>
> Also deprecate SLAB_MEM_SPREAD which was only
> used by SLAB, so it's a no-op since SLAB removal. Assign it an explicit zero
> value. The removals of the flag usage are handled independently in the
> respective subsystems, with a final removal of any leftover usage planned
> for the next release.
I already had the patch ready to go:
https://lore.kernel.org/all/CAHk-=wji0u+OOtmAOD-5JV3SXcRJF___k_+8XNKmak0yd5vW1Q@mail.gmail.com/
so I just did a "git stash apply" and got rid of the final stragglers.
No need to have various random maintainers have to worry about a flag
that hasn't had any meaning since 6.7, and very little before that
either.
Linus
^ permalink raw reply [relevance 99%]
* Re: [GIT PULL] Networking for v6.9
2024-03-12 20:17 99% ` Linus Torvalds
@ 2024-03-13 1:00 99% ` Linus Torvalds
1 sibling, 0 replies; 200+ results
From: Linus Torvalds @ 2024-03-13 1:00 UTC (permalink / raw)
To: Jakub Kicinski; +Cc: davem, netdev, linux-kernel, pabeni, bpf
On Mon, 11 Mar 2024 at 21:25, Jakub Kicinski <kuba@kernel.org> wrote:
>
> - Large effort by Eric to lower rtnl_lock pressure and remove locks:
W00t!
Pulled. The rtnl lock is probably my least favorite kernel lock. It's
been one of the few global locks we have left (at least that matters).
There are others (I'm not claiming tasklist_lock is great), but
rtnl_lock has certainly been "up there" with the worst of them.
Linus
^ permalink raw reply [relevance 99%]
* Re: Unexplained long boot delays [Was Re: [GIT PULL] RCU changes for v6.9]
@ 2024-03-12 21:44 99% ` Linus Torvalds
0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-03-12 21:44 UTC (permalink / raw)
To: Florian Fainelli
Cc: Boqun Feng, linux-kernel, kernel-team, paulmck, mingo, tglx, rcu,
joel, neeraj.upadhyay, urezki, qiang.zhang1211, frederic,
bigeasy, anna-maria, chenzhongjin, yangjihong1, rostedt
On Tue, 12 Mar 2024 at 14:34, Florian Fainelli <f.fainelli@gmail.com> wrote:
>
> and here is a log where this fails:
>
> https://gist.github.com/ffainelli/ed08a2b3e853f59343786ebd20364fc8
You could try the 'initcall_debug' kernel command line.
It will make the above *much* noisier, but it might - thanks to all
the new noise - show exactly *what* is being crazy slow to initialize.
Because right now it's just radio silence in between those
[ 1.926435] bcmgenet f0480000.ethernet: GENET 5.0 EPHY: 0x0000
[ 162.148135] unimac-mdio unimac-mdio.0: Broadcom UniMAC MDIO bus
things, and that's presumably because some random initcall there just
takes forever to time out.
Linus
^ permalink raw reply [relevance 99%]
* Re: [GIT PULL] Networking for v6.9
@ 2024-03-12 21:11 95% ` Linus Torvalds
0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-03-12 21:11 UTC (permalink / raw)
To: Jakub Kicinski, Jens Axboe, Johannes Thumshirn
Cc: davem, netdev, linux-kernel, pabeni, bpf, Tejun Heo
On Tue, 12 Mar 2024 at 13:47, Jakub Kicinski <kuba@kernel.org> wrote:
>
> With your tree as of 65d287c7eb1d it gets to prompt but dies soon after
> when prod services kick in (dunno what rpm Kdump does but says iocost
> so adding Tejun):
Both of your traces are timers that seem to either lock up in ioc_now():
https://lore.kernel.org/all/20240312133427.1a744844@kernel.org/
and now it looks like ioc_timer_fn():
https://lore.kernel.org/all/20240312134739.248e6bd3@kernel.org/
But in neither case does it actually look like it's a lockup on a *lock*.
IOW, the NMI isn't happening on some spin_lock sequence or anything like that.
Yes, ioc_now() could have been looping on the seq read-lock if the
sequence number was odd. But the writers do seem to be done with
interrupts disabled, plus then you wouldn't have this lockup in
ioc_timer_fn, so it's probably not that.
And yes, ioc_timer_fn() does take locks, but again, that doesn't seem
to be where it is hanging.
So it smells like it's an endless loop in ioc_timer_fn() to me, or
perhaps retriggering the timer itself infinitely.
Which would then explain both of those traces (that endless loop would
call ioc_now() as part of it).
The blk-iocost.c code itself hasn't changed, but the timer code has
gone through big changes.
That said, there's a more blk-related change: da4c8c3d0975 ("block:
cache current nsec time in struct blk_plug").
*And* your second dump is from that
period_vtime = now.vnow - ioc->period_at_vtime;
if (WARN_ON_ONCE(!period_vtime)) {
so it smells like the blk-iocost code is just completely confused by
the time caching. Jens?
Jakub, it might be worth seeing if just reverting that commit
da4c8c3d0975 makes the problem go away. Otherwise a bisect might be
needed...
Linus
^ permalink raw reply [relevance 95%]
* Re: [GIT PULL] vfs pidfd
@ 2024-03-12 20:21 99% ` Linus Torvalds
0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-03-12 20:21 UTC (permalink / raw)
To: Christian Brauner; +Cc: linux-fsdevel, linux-kernel
On Tue, 12 Mar 2024 at 13:09, Christian Brauner <brauner@kernel.org> wrote:
>
> It's used to compare pidfs and someone actually already sent a pull
> request for this to another project iirc. So it'd be good to keep that
> property.
Hmm. If people really do care, I guess we should spend the effort on
making those things unique.
> But if your point is that we don't care about this for 32bit then I do
> agree. We could do away with the checks completely and just accept the
> truncation for 32bit. If that's your point feel free to just remove the
> 32bit handling in the patch and apply it. Let me know. Maybe I
> misunderstood.
I personally don't care about 32-bit any more, but it also feels wrong
to just say that it's ok depending on something on a 64-bit kernel,
but not a 32-bit one.
So let's go with your patch. It's not like it's a problem to spend the
(very little) extra effort to do a 64-bit inode number.
Linus
^ permalink raw reply [relevance 99%]
* Re: [GIT PULL] Networking for v6.9
@ 2024-03-12 20:17 99% ` Linus Torvalds
2024-03-13 1:00 99% ` Linus Torvalds
1 sibling, 1 reply; 200+ results
From: Linus Torvalds @ 2024-03-12 20:17 UTC (permalink / raw)
To: Jakub Kicinski; +Cc: davem, netdev, linux-kernel, pabeni, bpf
On Mon, 11 Mar 2024 at 21:25, Jakub Kicinski <kuba@kernel.org> wrote:
>
> I get what looks like blk-iocost deadlock when I try to run
> your current tree on real Meta servers :(
Hmm. This "it breaks on real hardware, but works in virtual boxes"
sounds like it might be the DM queue limit issue.
Did the tree you tested with perhaps have commit 8e0ef4128694 (which
came in yesterday through the block merge, merge commit 1ddeeb2a058d
just after 11am Monday), but not the revert (commit bff4b74625fe, six
hours later)?
IOW, just how current was that "current"? Your email was sent multiple
hours after the revert happened and was pushed out, but I would not be
surprised if your testing was done with something that was in that
broken window.
So if you merged some *other* tree than one from that six-hour window,
please holler - because there's something else going on and we need to
get the block people on it.
Linus
^ permalink raw reply [relevance 99%]
* Re: [GIT PULL] AFFS update for 6.9
@ 2024-03-12 20:02 54% ` Linus Torvalds
0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-03-12 20:02 UTC (permalink / raw)
To: David Sterba; +Cc: linux-kernel
[-- Attachment #1: Type: text/plain, Size: 904 bytes --]
On Mon, 11 Mar 2024 at 12:37, David Sterba <dsterba@suse.com> wrote:
>
> please pull one change to AFFS that removes use of SLAB_MEM_SPREAD,
> which is going to be removed from MM code.
I've pulled this, but I don't really see the point in removing these
one by one like this.
SLAB_MEM_SPREAD is already a no-op, the MM people could just do a
coccinelle thing to remove it everywhere.
I think you could do 90% even just using a few variations of 'sed', eg
variations on
git grep -l 'SLAB_MEM_SPREAD' |
xargs sed -i 's/SLAB_MEM_SPREAD *|//'
git grep -l 'SLAB_MEM_SPREAD' |
xargs sed -i 's/| *SLAB_MEM_SPREAD//'
and then some manual fixups for (a) whitespace cleanup of the result
and (b) the couple of cases where it wasn't a bitwise or into other
fields (or where the bitwise or was on a different line)
And then you'd end up with something like the attached.
Linus
[-- Attachment #2: patch.diff --]
[-- Type: text/x-patch, Size: 23779 bytes --]
drivers/dax/super.c | 3 +--
drivers/usb/isp1760/isp1760-hcd.c | 8 +++-----
fs/9p/v9fs.c | 2 +-
fs/adfs/super.c | 2 +-
fs/befs/linuxvfs.c | 3 +--
fs/bfs/inode.c | 2 +-
fs/ceph/super.c | 18 +++++++++---------
fs/coda/inode.c | 4 ++--
fs/erofs/super.c | 2 +-
fs/exfat/cache.c | 2 +-
fs/exfat/super.c | 2 +-
fs/ext2/super.c | 3 +--
fs/ext4/super.c | 3 +--
fs/fat/cache.c | 2 +-
fs/fat/inode.c | 2 +-
fs/freevxfs/vxfs_super.c | 2 +-
fs/gfs2/main.c | 1 -
fs/hpfs/super.c | 2 +-
fs/isofs/inode.c | 2 +-
fs/jffs2/super.c | 2 +-
fs/nfs/direct.c | 3 +--
fs/nfs/inode.c | 2 +-
fs/nfs/nfs42xattr.c | 2 +-
fs/ntfs3/super.c | 2 +-
fs/ocfs2/dlmfs/dlmfs.c | 2 +-
fs/ocfs2/super.c | 7 +++----
fs/overlayfs/super.c | 2 +-
fs/qnx4/inode.c | 2 +-
fs/quota/dquot.c | 2 +-
fs/smb/client/cifsfs.c | 2 +-
fs/tracefs/inode.c | 1 -
fs/ubifs/super.c | 4 ++--
fs/udf/super.c | 1 -
fs/ufs/super.c | 3 +--
fs/vboxsf/super.c | 3 +--
fs/xfs/xfs_super.c | 7 +++----
fs/zonefs/super.c | 2 +-
include/linux/slab.h | 2 --
mm/slab.h | 1 -
net/socket.c | 2 +-
net/sunrpc/rpc_pipe.c | 2 +-
41 files changed, 52 insertions(+), 69 deletions(-)
diff --git a/drivers/dax/super.c b/drivers/dax/super.c
index f4b635526345..a0244f6bb44b 100644
--- a/drivers/dax/super.c
+++ b/drivers/dax/super.c
@@ -532,8 +532,7 @@ static int dax_fs_init(void)
int rc;
dax_cache = kmem_cache_create("dax_cache", sizeof(struct dax_device), 0,
- (SLAB_HWCACHE_ALIGN|SLAB_RECLAIM_ACCOUNT|
- SLAB_MEM_SPREAD|SLAB_ACCOUNT),
+ SLAB_HWCACHE_ALIGN | SLAB_RECLAIM_ACCOUNT | SLAB_ACCOUNT,
init_once);
if (!dax_cache)
return -ENOMEM;
diff --git a/drivers/usb/isp1760/isp1760-hcd.c b/drivers/usb/isp1760/isp1760-hcd.c
index 76862ba40f35..0e5e4cb74c87 100644
--- a/drivers/usb/isp1760/isp1760-hcd.c
+++ b/drivers/usb/isp1760/isp1760-hcd.c
@@ -2521,21 +2521,19 @@ static const struct hc_driver isp1760_hc_driver = {
int __init isp1760_init_kmem_once(void)
{
urb_listitem_cachep = kmem_cache_create("isp1760_urb_listitem",
- sizeof(struct urb_listitem), 0, SLAB_TEMPORARY |
- SLAB_MEM_SPREAD, NULL);
+ sizeof(struct urb_listitem), 0, SLAB_TEMPORARY, NULL);
if (!urb_listitem_cachep)
return -ENOMEM;
qtd_cachep = kmem_cache_create("isp1760_qtd",
- sizeof(struct isp1760_qtd), 0, SLAB_TEMPORARY |
- SLAB_MEM_SPREAD, NULL);
+ sizeof(struct isp1760_qtd), 0, SLAB_TEMPORARY, NULL);
if (!qtd_cachep)
goto destroy_urb_listitem;
qh_cachep = kmem_cache_create("isp1760_qh", sizeof(struct isp1760_qh),
- 0, SLAB_TEMPORARY | SLAB_MEM_SPREAD, NULL);
+ 0, SLAB_TEMPORARY, NULL);
if (!qh_cachep)
goto destroy_qtd;
diff --git a/fs/9p/v9fs.c b/fs/9p/v9fs.c
index 61dbe52bb3a3..281a1ed03a04 100644
--- a/fs/9p/v9fs.c
+++ b/fs/9p/v9fs.c
@@ -637,7 +637,7 @@ static int v9fs_init_inode_cache(void)
v9fs_inode_cache = kmem_cache_create("v9fs_inode_cache",
sizeof(struct v9fs_inode),
0, (SLAB_RECLAIM_ACCOUNT|
- SLAB_MEM_SPREAD|SLAB_ACCOUNT),
+ SLAB_ACCOUNT),
v9fs_inode_init_once);
if (!v9fs_inode_cache)
return -ENOMEM;
diff --git a/fs/adfs/super.c b/fs/adfs/super.c
index e8bfc38239cd..9354b14bbfe3 100644
--- a/fs/adfs/super.c
+++ b/fs/adfs/super.c
@@ -249,7 +249,7 @@ static int __init init_inodecache(void)
adfs_inode_cachep = kmem_cache_create("adfs_inode_cache",
sizeof(struct adfs_inode_info),
0, (SLAB_RECLAIM_ACCOUNT|
- SLAB_MEM_SPREAD|SLAB_ACCOUNT),
+ SLAB_ACCOUNT),
init_once);
if (adfs_inode_cachep == NULL)
return -ENOMEM;
diff --git a/fs/befs/linuxvfs.c b/fs/befs/linuxvfs.c
index 2b4dda047450..d76f406d3b2e 100644
--- a/fs/befs/linuxvfs.c
+++ b/fs/befs/linuxvfs.c
@@ -435,8 +435,7 @@ befs_init_inodecache(void)
{
befs_inode_cachep = kmem_cache_create_usercopy("befs_inode_cache",
sizeof(struct befs_inode_info), 0,
- (SLAB_RECLAIM_ACCOUNT|SLAB_MEM_SPREAD|
- SLAB_ACCOUNT),
+ SLAB_RECLAIM_ACCOUNT | SLAB_ACCOUNT,
offsetof(struct befs_inode_info,
i_data.symlink),
sizeof_field(struct befs_inode_info,
diff --git a/fs/bfs/inode.c b/fs/bfs/inode.c
index 355957dbce39..db81570c9637 100644
--- a/fs/bfs/inode.c
+++ b/fs/bfs/inode.c
@@ -259,7 +259,7 @@ static int __init init_inodecache(void)
bfs_inode_cachep = kmem_cache_create("bfs_inode_cache",
sizeof(struct bfs_inode_info),
0, (SLAB_RECLAIM_ACCOUNT|
- SLAB_MEM_SPREAD|SLAB_ACCOUNT),
+ SLAB_ACCOUNT),
init_once);
if (bfs_inode_cachep == NULL)
return -ENOMEM;
diff --git a/fs/ceph/super.c b/fs/ceph/super.c
index 5ec102f6b1ac..885cb5d4e771 100644
--- a/fs/ceph/super.c
+++ b/fs/ceph/super.c
@@ -928,36 +928,36 @@ static int __init init_caches(void)
ceph_inode_cachep = kmem_cache_create("ceph_inode_info",
sizeof(struct ceph_inode_info),
__alignof__(struct ceph_inode_info),
- SLAB_RECLAIM_ACCOUNT|SLAB_MEM_SPREAD|
- SLAB_ACCOUNT, ceph_inode_init_once);
+ SLAB_RECLAIM_ACCOUNT | SLAB_ACCOUNT,
+ ceph_inode_init_once);
if (!ceph_inode_cachep)
return -ENOMEM;
- ceph_cap_cachep = KMEM_CACHE(ceph_cap, SLAB_MEM_SPREAD);
+ ceph_cap_cachep = KMEM_CACHE(ceph_cap, 0);
if (!ceph_cap_cachep)
goto bad_cap;
- ceph_cap_snap_cachep = KMEM_CACHE(ceph_cap_snap, SLAB_MEM_SPREAD);
+ ceph_cap_snap_cachep = KMEM_CACHE(ceph_cap_snap, 0);
if (!ceph_cap_snap_cachep)
goto bad_cap_snap;
ceph_cap_flush_cachep = KMEM_CACHE(ceph_cap_flush,
- SLAB_RECLAIM_ACCOUNT|SLAB_MEM_SPREAD);
+ SLAB_RECLAIM_ACCOUNT);
if (!ceph_cap_flush_cachep)
goto bad_cap_flush;
ceph_dentry_cachep = KMEM_CACHE(ceph_dentry_info,
- SLAB_RECLAIM_ACCOUNT|SLAB_MEM_SPREAD);
+ SLAB_RECLAIM_ACCOUNT);
if (!ceph_dentry_cachep)
goto bad_dentry;
- ceph_file_cachep = KMEM_CACHE(ceph_file_info, SLAB_MEM_SPREAD);
+ ceph_file_cachep = KMEM_CACHE(ceph_file_info, 0);
if (!ceph_file_cachep)
goto bad_file;
- ceph_dir_file_cachep = KMEM_CACHE(ceph_dir_file_info, SLAB_MEM_SPREAD);
+ ceph_dir_file_cachep = KMEM_CACHE(ceph_dir_file_info, 0);
if (!ceph_dir_file_cachep)
goto bad_dir_file;
- ceph_mds_request_cachep = KMEM_CACHE(ceph_mds_request, SLAB_MEM_SPREAD);
+ ceph_mds_request_cachep = KMEM_CACHE(ceph_mds_request, 0);
if (!ceph_mds_request_cachep)
goto bad_mds_req;
diff --git a/fs/coda/inode.c b/fs/coda/inode.c
index a50356c541f6..6898dc621011 100644
--- a/fs/coda/inode.c
+++ b/fs/coda/inode.c
@@ -72,8 +72,8 @@ int __init coda_init_inodecache(void)
{
coda_inode_cachep = kmem_cache_create("coda_inode_cache",
sizeof(struct coda_inode_info), 0,
- SLAB_RECLAIM_ACCOUNT|SLAB_MEM_SPREAD|
- SLAB_ACCOUNT, init_once);
+ SLAB_RECLAIM_ACCOUNT | SLAB_ACCOUNT,
+ init_once);
if (coda_inode_cachep == NULL)
return -ENOMEM;
return 0;
diff --git a/fs/erofs/super.c b/fs/erofs/super.c
index 9b4b66dcdd4f..8b6bf9ae1a59 100644
--- a/fs/erofs/super.c
+++ b/fs/erofs/super.c
@@ -885,7 +885,7 @@ static int __init erofs_module_init(void)
erofs_inode_cachep = kmem_cache_create("erofs_inode",
sizeof(struct erofs_inode), 0,
- SLAB_RECLAIM_ACCOUNT | SLAB_MEM_SPREAD | SLAB_ACCOUNT,
+ SLAB_RECLAIM_ACCOUNT | SLAB_ACCOUNT,
erofs_inode_init_once);
if (!erofs_inode_cachep)
return -ENOMEM;
diff --git a/fs/exfat/cache.c b/fs/exfat/cache.c
index 5a2f119b7e8c..7cc200d89821 100644
--- a/fs/exfat/cache.c
+++ b/fs/exfat/cache.c
@@ -46,7 +46,7 @@ int exfat_cache_init(void)
{
exfat_cachep = kmem_cache_create("exfat_cache",
sizeof(struct exfat_cache),
- 0, SLAB_RECLAIM_ACCOUNT|SLAB_MEM_SPREAD,
+ 0, SLAB_RECLAIM_ACCOUNT,
exfat_cache_init_once);
if (!exfat_cachep)
return -ENOMEM;
diff --git a/fs/exfat/super.c b/fs/exfat/super.c
index fcb658267765..3d5ea2cfad66 100644
--- a/fs/exfat/super.c
+++ b/fs/exfat/super.c
@@ -813,7 +813,7 @@ static int __init init_exfat_fs(void)
exfat_inode_cachep = kmem_cache_create("exfat_inode_cache",
sizeof(struct exfat_inode_info),
- 0, SLAB_RECLAIM_ACCOUNT | SLAB_MEM_SPREAD,
+ 0, SLAB_RECLAIM_ACCOUNT,
exfat_inode_init_once);
if (!exfat_inode_cachep) {
err = -ENOMEM;
diff --git a/fs/ext2/super.c b/fs/ext2/super.c
index 01f9addc8b1f..cabea887314d 100644
--- a/fs/ext2/super.c
+++ b/fs/ext2/super.c
@@ -213,8 +213,7 @@ static int __init init_inodecache(void)
{
ext2_inode_cachep = kmem_cache_create_usercopy("ext2_inode_cache",
sizeof(struct ext2_inode_info), 0,
- (SLAB_RECLAIM_ACCOUNT|SLAB_MEM_SPREAD|
- SLAB_ACCOUNT),
+ SLAB_RECLAIM_ACCOUNT | SLAB_ACCOUNT,
offsetof(struct ext2_inode_info, i_data),
sizeof_field(struct ext2_inode_info, i_data),
init_once);
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index a8ba84eabab2..59c72b6dd153 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -1500,8 +1500,7 @@ static int __init init_inodecache(void)
{
ext4_inode_cachep = kmem_cache_create_usercopy("ext4_inode_cache",
sizeof(struct ext4_inode_info), 0,
- (SLAB_RECLAIM_ACCOUNT|SLAB_MEM_SPREAD|
- SLAB_ACCOUNT),
+ SLAB_RECLAIM_ACCOUNT | SLAB_ACCOUNT,
offsetof(struct ext4_inode_info, i_data),
sizeof_field(struct ext4_inode_info, i_data),
init_once);
diff --git a/fs/fat/cache.c b/fs/fat/cache.c
index 738e427e2d21..2af424e200b3 100644
--- a/fs/fat/cache.c
+++ b/fs/fat/cache.c
@@ -47,7 +47,7 @@ int __init fat_cache_init(void)
{
fat_cache_cachep = kmem_cache_create("fat_cache",
sizeof(struct fat_cache),
- 0, SLAB_RECLAIM_ACCOUNT|SLAB_MEM_SPREAD,
+ 0, SLAB_RECLAIM_ACCOUNT,
init_once);
if (fat_cache_cachep == NULL)
return -ENOMEM;
diff --git a/fs/fat/inode.c b/fs/fat/inode.c
index 5c813696d1ff..d9e6fbb6f246 100644
--- a/fs/fat/inode.c
+++ b/fs/fat/inode.c
@@ -787,7 +787,7 @@ static int __init fat_init_inodecache(void)
fat_inode_cachep = kmem_cache_create("fat_inode_cache",
sizeof(struct msdos_inode_info),
0, (SLAB_RECLAIM_ACCOUNT|
- SLAB_MEM_SPREAD|SLAB_ACCOUNT),
+ SLAB_ACCOUNT),
init_once);
if (fat_inode_cachep == NULL)
return -ENOMEM;
diff --git a/fs/freevxfs/vxfs_super.c b/fs/freevxfs/vxfs_super.c
index e6e2a2185e7c..42e03b6b1cc7 100644
--- a/fs/freevxfs/vxfs_super.c
+++ b/fs/freevxfs/vxfs_super.c
@@ -307,7 +307,7 @@ vxfs_init(void)
vxfs_inode_cachep = kmem_cache_create_usercopy("vxfs_inode",
sizeof(struct vxfs_inode_info), 0,
- SLAB_RECLAIM_ACCOUNT|SLAB_MEM_SPREAD,
+ SLAB_RECLAIM_ACCOUNT,
offsetof(struct vxfs_inode_info, vii_immed.vi_immed),
sizeof_field(struct vxfs_inode_info,
vii_immed.vi_immed),
diff --git a/fs/gfs2/main.c b/fs/gfs2/main.c
index 79be0cdc730c..04cadc02e5a6 100644
--- a/fs/gfs2/main.c
+++ b/fs/gfs2/main.c
@@ -111,7 +111,6 @@ static int __init init_gfs2_fs(void)
gfs2_inode_cachep = kmem_cache_create("gfs2_inode",
sizeof(struct gfs2_inode),
0, SLAB_RECLAIM_ACCOUNT|
- SLAB_MEM_SPREAD|
SLAB_ACCOUNT,
gfs2_init_inode_once);
if (!gfs2_inode_cachep)
diff --git a/fs/hpfs/super.c b/fs/hpfs/super.c
index 6b0ba3c1efba..314834a078e9 100644
--- a/fs/hpfs/super.c
+++ b/fs/hpfs/super.c
@@ -255,7 +255,7 @@ static int init_inodecache(void)
hpfs_inode_cachep = kmem_cache_create("hpfs_inode_cache",
sizeof(struct hpfs_inode_info),
0, (SLAB_RECLAIM_ACCOUNT|
- SLAB_MEM_SPREAD|SLAB_ACCOUNT),
+ SLAB_ACCOUNT),
init_once);
if (hpfs_inode_cachep == NULL)
return -ENOMEM;
diff --git a/fs/isofs/inode.c b/fs/isofs/inode.c
index 3e4d53e26f94..25fca44149dd 100644
--- a/fs/isofs/inode.c
+++ b/fs/isofs/inode.c
@@ -93,7 +93,7 @@ static int __init init_inodecache(void)
isofs_inode_cachep = kmem_cache_create("isofs_inode_cache",
sizeof(struct iso_inode_info),
0, (SLAB_RECLAIM_ACCOUNT|
- SLAB_MEM_SPREAD|SLAB_ACCOUNT),
+ SLAB_ACCOUNT),
init_once);
if (!isofs_inode_cachep)
return -ENOMEM;
diff --git a/fs/jffs2/super.c b/fs/jffs2/super.c
index f99591a634b4..aede1be4dc0c 100644
--- a/fs/jffs2/super.c
+++ b/fs/jffs2/super.c
@@ -387,7 +387,7 @@ static int __init init_jffs2_fs(void)
jffs2_inode_cachep = kmem_cache_create("jffs2_i",
sizeof(struct jffs2_inode_info),
0, (SLAB_RECLAIM_ACCOUNT|
- SLAB_MEM_SPREAD|SLAB_ACCOUNT),
+ SLAB_ACCOUNT),
jffs2_i_init_once);
if (!jffs2_inode_cachep) {
pr_err("error: Failed to initialise inode cache\n");
diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c
index c03926a1cc73..7af5d270de28 100644
--- a/fs/nfs/direct.c
+++ b/fs/nfs/direct.c
@@ -1037,8 +1037,7 @@ int __init nfs_init_directcache(void)
{
nfs_direct_cachep = kmem_cache_create("nfs_direct_cache",
sizeof(struct nfs_direct_req),
- 0, (SLAB_RECLAIM_ACCOUNT|
- SLAB_MEM_SPREAD),
+ 0, SLAB_RECLAIM_ACCOUNT,
NULL);
if (nfs_direct_cachep == NULL)
return -ENOMEM;
diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
index ebb8d60e1152..93ea49a7eb61 100644
--- a/fs/nfs/inode.c
+++ b/fs/nfs/inode.c
@@ -2372,7 +2372,7 @@ static int __init nfs_init_inodecache(void)
nfs_inode_cachep = kmem_cache_create("nfs_inode_cache",
sizeof(struct nfs_inode),
0, (SLAB_RECLAIM_ACCOUNT|
- SLAB_MEM_SPREAD|SLAB_ACCOUNT),
+ SLAB_ACCOUNT),
init_once);
if (nfs_inode_cachep == NULL)
return -ENOMEM;
diff --git a/fs/nfs/nfs42xattr.c b/fs/nfs/nfs42xattr.c
index 49aaf28a6950..b6e3d8f77b91 100644
--- a/fs/nfs/nfs42xattr.c
+++ b/fs/nfs/nfs42xattr.c
@@ -1017,7 +1017,7 @@ int __init nfs4_xattr_cache_init(void)
nfs4_xattr_cache_cachep = kmem_cache_create("nfs4_xattr_cache_cache",
sizeof(struct nfs4_xattr_cache), 0,
- (SLAB_RECLAIM_ACCOUNT|SLAB_MEM_SPREAD),
+ (SLAB_RECLAIM_ACCOUNT),
nfs4_xattr_cache_init_once);
if (nfs4_xattr_cache_cachep == NULL)
return -ENOMEM;
diff --git a/fs/ntfs3/super.c b/fs/ntfs3/super.c
index cef5467fd928..9df7c20d066f 100644
--- a/fs/ntfs3/super.c
+++ b/fs/ntfs3/super.c
@@ -1825,7 +1825,7 @@ static int __init init_ntfs_fs(void)
ntfs_inode_cachep = kmem_cache_create(
"ntfs_inode_cache", sizeof(struct ntfs_inode), 0,
- (SLAB_RECLAIM_ACCOUNT | SLAB_MEM_SPREAD | SLAB_ACCOUNT),
+ (SLAB_RECLAIM_ACCOUNT | SLAB_ACCOUNT),
init_once);
if (!ntfs_inode_cachep) {
err = -ENOMEM;
diff --git a/fs/ocfs2/dlmfs/dlmfs.c b/fs/ocfs2/dlmfs/dlmfs.c
index 85215162c9dd..7fc0e920eda7 100644
--- a/fs/ocfs2/dlmfs/dlmfs.c
+++ b/fs/ocfs2/dlmfs/dlmfs.c
@@ -578,7 +578,7 @@ static int __init init_dlmfs_fs(void)
dlmfs_inode_cache = kmem_cache_create("dlmfs_inode_cache",
sizeof(struct dlmfs_inode_private),
0, (SLAB_HWCACHE_ALIGN|SLAB_RECLAIM_ACCOUNT|
- SLAB_MEM_SPREAD|SLAB_ACCOUNT),
+ SLAB_ACCOUNT),
dlmfs_init_once);
if (!dlmfs_inode_cache) {
status = -ENOMEM;
diff --git a/fs/ocfs2/super.c b/fs/ocfs2/super.c
index a70aff17d455..b3f860888e93 100644
--- a/fs/ocfs2/super.c
+++ b/fs/ocfs2/super.c
@@ -1706,18 +1706,17 @@ static int ocfs2_initialize_mem_caches(void)
sizeof(struct ocfs2_inode_info),
0,
(SLAB_HWCACHE_ALIGN|SLAB_RECLAIM_ACCOUNT|
- SLAB_MEM_SPREAD|SLAB_ACCOUNT),
+ SLAB_ACCOUNT),
ocfs2_inode_init_once);
ocfs2_dquot_cachep = kmem_cache_create("ocfs2_dquot_cache",
sizeof(struct ocfs2_dquot),
0,
- (SLAB_HWCACHE_ALIGN|SLAB_RECLAIM_ACCOUNT|
- SLAB_MEM_SPREAD),
+ (SLAB_HWCACHE_ALIGN|SLAB_RECLAIM_ACCOUNT),
NULL);
ocfs2_qf_chunk_cachep = kmem_cache_create("ocfs2_qf_chunk_cache",
sizeof(struct ocfs2_quota_chunk),
0,
- (SLAB_RECLAIM_ACCOUNT|SLAB_MEM_SPREAD),
+ (SLAB_RECLAIM_ACCOUNT),
NULL);
if (!ocfs2_inode_cachep || !ocfs2_dquot_cachep ||
!ocfs2_qf_chunk_cachep) {
diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
index 36d4b8b1f784..a40fc7e05525 100644
--- a/fs/overlayfs/super.c
+++ b/fs/overlayfs/super.c
@@ -1503,7 +1503,7 @@ static int __init ovl_init(void)
ovl_inode_cachep = kmem_cache_create("ovl_inode",
sizeof(struct ovl_inode), 0,
(SLAB_RECLAIM_ACCOUNT|
- SLAB_MEM_SPREAD|SLAB_ACCOUNT),
+ SLAB_ACCOUNT),
ovl_inode_init_once);
if (ovl_inode_cachep == NULL)
return -ENOMEM;
diff --git a/fs/qnx4/inode.c b/fs/qnx4/inode.c
index 7b5711f76709..d79841e94428 100644
--- a/fs/qnx4/inode.c
+++ b/fs/qnx4/inode.c
@@ -378,7 +378,7 @@ static int init_inodecache(void)
qnx4_inode_cachep = kmem_cache_create("qnx4_inode_cache",
sizeof(struct qnx4_inode_info),
0, (SLAB_RECLAIM_ACCOUNT|
- SLAB_MEM_SPREAD|SLAB_ACCOUNT),
+ SLAB_ACCOUNT),
init_once);
if (qnx4_inode_cachep == NULL)
return -ENOMEM;
diff --git a/fs/quota/dquot.c b/fs/quota/dquot.c
index 1f0c754416b6..eb6e9d95dea1 100644
--- a/fs/quota/dquot.c
+++ b/fs/quota/dquot.c
@@ -2984,7 +2984,7 @@ static int __init dquot_init(void)
dquot_cachep = kmem_cache_create("dquot",
sizeof(struct dquot), sizeof(unsigned long) * 4,
(SLAB_HWCACHE_ALIGN|SLAB_RECLAIM_ACCOUNT|
- SLAB_MEM_SPREAD|SLAB_PANIC),
+ SLAB_PANIC),
NULL);
order = 0;
diff --git a/fs/smb/client/cifsfs.c b/fs/smb/client/cifsfs.c
index fb368b191eef..e0d8c79cdde1 100644
--- a/fs/smb/client/cifsfs.c
+++ b/fs/smb/client/cifsfs.c
@@ -1664,7 +1664,7 @@ cifs_init_inodecache(void)
cifs_inode_cachep = kmem_cache_create("cifs_inode_cache",
sizeof(struct cifsInodeInfo),
0, (SLAB_RECLAIM_ACCOUNT|
- SLAB_MEM_SPREAD|SLAB_ACCOUNT),
+ SLAB_ACCOUNT),
cifs_init_once);
if (cifs_inode_cachep == NULL)
return -ENOMEM;
diff --git a/fs/tracefs/inode.c b/fs/tracefs/inode.c
index d65ffad4c327..5545e6bf7d26 100644
--- a/fs/tracefs/inode.c
+++ b/fs/tracefs/inode.c
@@ -731,7 +731,6 @@ static int __init tracefs_init(void)
tracefs_inode_cachep = kmem_cache_create("tracefs_inode_cache",
sizeof(struct tracefs_inode),
0, (SLAB_RECLAIM_ACCOUNT|
- SLAB_MEM_SPREAD|
SLAB_ACCOUNT),
init_once);
if (!tracefs_inode_cachep)
diff --git a/fs/ubifs/super.c b/fs/ubifs/super.c
index d2881041b393..7f4031a15f4d 100644
--- a/fs/ubifs/super.c
+++ b/fs/ubifs/super.c
@@ -2434,8 +2434,8 @@ static int __init ubifs_init(void)
ubifs_inode_slab = kmem_cache_create("ubifs_inode_slab",
sizeof(struct ubifs_inode), 0,
- SLAB_MEM_SPREAD | SLAB_RECLAIM_ACCOUNT |
- SLAB_ACCOUNT, &inode_slab_ctor);
+ SLAB_RECLAIM_ACCOUNT | SLAB_ACCOUNT,
+ &inode_slab_ctor);
if (!ubifs_inode_slab)
return -ENOMEM;
diff --git a/fs/udf/super.c b/fs/udf/super.c
index 928a04d9d9e0..6f420f4ca005 100644
--- a/fs/udf/super.c
+++ b/fs/udf/super.c
@@ -177,7 +177,6 @@ static int __init init_inodecache(void)
udf_inode_cachep = kmem_cache_create("udf_inode_cache",
sizeof(struct udf_inode_info),
0, (SLAB_RECLAIM_ACCOUNT |
- SLAB_MEM_SPREAD |
SLAB_ACCOUNT),
init_once);
if (!udf_inode_cachep)
diff --git a/fs/ufs/super.c b/fs/ufs/super.c
index a480810cd4e3..44666afc6209 100644
--- a/fs/ufs/super.c
+++ b/fs/ufs/super.c
@@ -1470,8 +1470,7 @@ static int __init init_inodecache(void)
{
ufs_inode_cachep = kmem_cache_create_usercopy("ufs_inode_cache",
sizeof(struct ufs_inode_info), 0,
- (SLAB_RECLAIM_ACCOUNT|SLAB_MEM_SPREAD|
- SLAB_ACCOUNT),
+ (SLAB_RECLAIM_ACCOUNT | SLAB_ACCOUNT),
offsetof(struct ufs_inode_info, i_u1.i_symlink),
sizeof_field(struct ufs_inode_info,
i_u1.i_symlink),
diff --git a/fs/vboxsf/super.c b/fs/vboxsf/super.c
index 1fb8f4df60cb..cabe8ac4fefc 100644
--- a/fs/vboxsf/super.c
+++ b/fs/vboxsf/super.c
@@ -339,8 +339,7 @@ static int vboxsf_setup(void)
vboxsf_inode_cachep =
kmem_cache_create("vboxsf_inode_cache",
sizeof(struct vboxsf_inode), 0,
- (SLAB_RECLAIM_ACCOUNT | SLAB_MEM_SPREAD |
- SLAB_ACCOUNT),
+ SLAB_RECLAIM_ACCOUNT | SLAB_ACCOUNT,
vboxsf_inode_init_once);
if (!vboxsf_inode_cachep) {
err = -ENOMEM;
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index 00fbd5b6e582..59c8c0541bdd 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -2043,8 +2043,7 @@ xfs_init_caches(void)
xfs_buf_cache = kmem_cache_create("xfs_buf", sizeof(struct xfs_buf), 0,
SLAB_HWCACHE_ALIGN |
- SLAB_RECLAIM_ACCOUNT |
- SLAB_MEM_SPREAD,
+ SLAB_RECLAIM_ACCOUNT,
NULL);
if (!xfs_buf_cache)
goto out;
@@ -2109,14 +2108,14 @@ xfs_init_caches(void)
sizeof(struct xfs_inode), 0,
(SLAB_HWCACHE_ALIGN |
SLAB_RECLAIM_ACCOUNT |
- SLAB_MEM_SPREAD | SLAB_ACCOUNT),
+ SLAB_ACCOUNT),
xfs_fs_inode_init_once);
if (!xfs_inode_cache)
goto out_destroy_efi_cache;
xfs_ili_cache = kmem_cache_create("xfs_ili",
sizeof(struct xfs_inode_log_item), 0,
- SLAB_RECLAIM_ACCOUNT | SLAB_MEM_SPREAD,
+ SLAB_RECLAIM_ACCOUNT,
NULL);
if (!xfs_ili_cache)
goto out_destroy_inode_cache;
diff --git a/fs/zonefs/super.c b/fs/zonefs/super.c
index 236a6d88306f..c6a124e8d565 100644
--- a/fs/zonefs/super.c
+++ b/fs/zonefs/super.c
@@ -1422,7 +1422,7 @@ static int __init zonefs_init_inodecache(void)
{
zonefs_inode_cachep = kmem_cache_create("zonefs_inode_cache",
sizeof(struct zonefs_inode_info), 0,
- (SLAB_RECLAIM_ACCOUNT | SLAB_MEM_SPREAD | SLAB_ACCOUNT),
+ SLAB_RECLAIM_ACCOUNT | SLAB_ACCOUNT,
NULL);
if (zonefs_inode_cachep == NULL)
return -ENOMEM;
diff --git a/include/linux/slab.h b/include/linux/slab.h
index b5f5ee8308d0..995f0cdc2b70 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -96,8 +96,6 @@
*/
/* Defer freeing slabs to RCU */
#define SLAB_TYPESAFE_BY_RCU ((slab_flags_t __force)0x00080000U)
-/* Spread some memory over cpuset */
-#define SLAB_MEM_SPREAD ((slab_flags_t __force)0x00100000U)
/* Trace allocations and frees */
#define SLAB_TRACE ((slab_flags_t __force)0x00200000U)
diff --git a/mm/slab.h b/mm/slab.h
index 54deeb0428c6..f4534eefb35d 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -469,7 +469,6 @@ static inline bool is_kmalloc_cache(struct kmem_cache *s)
SLAB_STORE_USER | \
SLAB_TRACE | \
SLAB_CONSISTENCY_CHECKS | \
- SLAB_MEM_SPREAD | \
SLAB_NOLEAKTRACE | \
SLAB_RECLAIM_ACCOUNT | \
SLAB_TEMPORARY | \
diff --git a/net/socket.c b/net/socket.c
index ed3df2f749bf..7e9c8fc9a5b4 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -343,7 +343,7 @@ static void init_inodecache(void)
0,
(SLAB_HWCACHE_ALIGN |
SLAB_RECLAIM_ACCOUNT |
- SLAB_MEM_SPREAD | SLAB_ACCOUNT),
+ SLAB_ACCOUNT),
init_once);
BUG_ON(sock_inode_cachep == NULL);
}
diff --git a/net/sunrpc/rpc_pipe.c b/net/sunrpc/rpc_pipe.c
index dcc2b4f49e77..910a5d850d04 100644
--- a/net/sunrpc/rpc_pipe.c
+++ b/net/sunrpc/rpc_pipe.c
@@ -1490,7 +1490,7 @@ int register_rpc_pipefs(void)
rpc_inode_cachep = kmem_cache_create("rpc_inode_cache",
sizeof(struct rpc_inode),
0, (SLAB_HWCACHE_ALIGN|SLAB_RECLAIM_ACCOUNT|
- SLAB_MEM_SPREAD|SLAB_ACCOUNT),
+ SLAB_ACCOUNT),
init_once);
if (!rpc_inode_cachep)
return -ENOMEM;
^ permalink raw reply related [relevance 54%]
* Re: [GIT PULL] vfs pidfd
@ 2024-03-12 16:23 99% ` Linus Torvalds
0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-03-12 16:23 UTC (permalink / raw)
To: Christian Brauner; +Cc: linux-fsdevel, linux-kernel
On Tue, 12 Mar 2024 at 07:16, Christian Brauner <brauner@kernel.org> wrote:
>
> No, the size of struct pid was the main reason but I don't think it
> matters. A side-effect was that we could easily enforce 64bit inode
> numbers. But realistically it's trivial enough to work around. I've
> appended a patch for what I think is a pretty simple fix. Does that work?
This looks eminently sane to me. Not that I actually _tested_ it, but
since my testing would have compared it to my current setup (64-bit
and CONFIG_FS_PID=y) any testing would have been pointless because
that case didn't change.
Looking at the patch, I do wonder how much we even care about 64-bit
inodes. I'd like to point out how 'path_from_stashed()' only takes a
'unsigned long ino' anyway, and I don't think anything really cares
about either the high bits *or* the uniqueness of that inode number..
And similarly, i_ino isn't actually *used* for anything but naming to
user space.
So I'm not at all sure the whole 64-bit checks are worth it. Am I
missing something else?
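
The point about the high bits can be sketched in plain C. This is a
userspace model under the assumption that 'unsigned long' is 32 bits
wide; 'ulong32_t' and 'ino_as_ulong32()' are made-up names for
illustration, not kernel code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of `unsigned long` on a 32-bit build. */
typedef uint32_t ulong32_t;

static_assert(sizeof(ulong32_t) < sizeof(uint64_t),
	      "the model type must be narrower than a 64-bit inode number");

/* What a 64-bit inode number becomes when passed as a 32-bit unsigned long. */
static inline ulong32_t ino_as_ulong32(uint64_t ino64)
{
	/* Unsigned narrowing is well defined: the value is reduced modulo 2^32. */
	return (ulong32_t)ino64;
}
```

In this model, two inode numbers that differ only above bit 31 collapse
to the same value, so any 64-bit guarantee is already lost once the
number goes through an 'unsigned long' parameter on such a build.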
Linus
^ permalink raw reply [relevance 99%]
* Re: [GIT PULL] EDAC updates for v6.9
@ 2024-03-12 2:25 99% ` Linus Torvalds
0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-03-12 2:25 UTC (permalink / raw)
To: Randy Dunlap; +Cc: Borislav Petkov, x86-ml, lkml, linux-edac
On Mon, 11 Mar 2024 at 19:24, Randy Dunlap <rdunlap@infradead.org> wrote:
>
> and there's an extra/trailing ';'.
Ayup, I fixed that too while I was in there anyway.
Linus
^ permalink raw reply [relevance 99%]
* Re: [GIT PULL] EDAC updates for v6.9
@ 2024-03-12 1:12 99% ` Linus Torvalds
0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-03-12 1:12 UTC (permalink / raw)
To: Borislav Petkov; +Cc: x86-ml, lkml, linux-edac
On Mon, 11 Mar 2024 at 08:57, Borislav Petkov <bp@alien8.de> wrote:
>
> - return topology_die_id(err->cpu) % amd_get_nodes_per_socket();
> + return topology_amd_node_id(err->cpu) % topology_amd_nodes_per_pkg();
Ho humm. Lookie here:
static inline unsigned int topology_amd_nodes_per_pkg(void)
{ return 0; };
that's the UP case.
Yeah, I'm assuming nobody tests this for UP, but it's clearly wrong to
potentially do that modulus by zero.
So I made the merge also change that UP case of
topology_amd_nodes_per_pkg() to return 1.
Because dammit, not only is a mod-by-zero wrong, a UP system most
definitely has one node per package, not zero.
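
To make the hazard concrete, here is a minimal userspace sketch in C.
The names 'up_nodes_per_pkg_old()', 'up_nodes_per_pkg_fixed()' and
'node_in_pkg()' are hypothetical stand-ins, not the kernel's helpers;
only the shape of the modulus comes from the quoted hunk.

```c
#include <assert.h>

/* Hypothetical stand-in for the UP stub as quoted above: reports zero nodes. */
static inline unsigned int up_nodes_per_pkg_old(void)
{
	return 0;
}

/* Hypothetical stand-in for the stub after the merge fixup: one node. */
static inline unsigned int up_nodes_per_pkg_fixed(void)
{
	return 1;
}

/* Same shape as the quoted expression: node id modulo nodes per package. */
static inline unsigned int node_in_pkg(unsigned int node_id,
				       unsigned int nodes_per_pkg)
{
	/* "x % 0" is undefined behaviour in C, so refuse a zero divisor. */
	assert(nodes_per_pkg != 0);
	return node_id % nodes_per_pkg;
}
```

With the fixed stub every node id maps to 0, which matches a package
with a single node; with the old stub the assertion would trip before
the division is ever reached.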
Linus
^ permalink raw reply [relevance 99%]
* Re: [GIT PULL] x86/sev for v6.9-rc1
@ 2024-03-12 0:50 99% ` Linus Torvalds
0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-03-12 0:50 UTC (permalink / raw)
To: Borislav Petkov; +Cc: x86-ml, lkml
On Mon, 11 Mar 2024 at 08:19, Borislav Petkov <bp@alien8.de> wrote:
>
> If you're merging tip pull requests in the chronological order you've
> received them, you'll encounter a couple of simple merge conflicts.
It's not exactly chronological - I tend to go by areas and by
submitter, but it tends to approximate chronological most of the
time..
> I'm adding how I've resolved them at the end of this message in case
> you wanna compare notes.
Hmm. I took a slightly different approach:
> diff --cc arch/x86/include/asm/coco.h
> index 76c310b19b11,21940ef8d290..42871bb262d0
> --- a/arch/x86/include/asm/coco.h
> +++ b/arch/x86/include/asm/coco.h
> @@@ -10,9 -11,15 +11,15 @@@ enum cc_vendor
> CC_VENDOR_INTEL,
> };
>
> -extern enum cc_vendor cc_vendor;
> + extern u64 cc_mask;
> +
> #ifdef CONFIG_ARCH_HAS_CC_PLATFORM
> +extern enum cc_vendor cc_vendor;
I put the 'cc_mask' declaration inside the #ifdef too.
Because those two variables are defined together, and without
CONFIG_ARCH_HAS_CC_PLATFORM the whole coco/ subdirectory that defines
them won't even be built, as far as I can tell.
And I don't see any _use_ of 'cc_mask' anywhere outside of that one
'cc_set_mask()' inline function and the coco/core.c file. So declaring
it only when it's all enabled seems to be the right thing.
Let's hope my artistic merge resolution doesn't end up coming back to bite me.
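
The layout described above can be sketched roughly as follows. This is
a self-contained userspace model, not the real asm/coco.h: the local
'u64' typedef, the '#define' that switches the config option on, the
stub in the '#else' branch and the definitions at the bottom (which
stand in for coco/core.c) are all assumptions made so that the
fragment compiles on its own.

```c
#include <assert.h>

typedef unsigned long long u64;
static_assert(sizeof(u64) == 8, "u64 is expected to be 64 bits wide");

enum cc_vendor {
	CC_VENDOR_NONE,
	CC_VENDOR_AMD,
	CC_VENDOR_INTEL,
};

/* Assumption for this sketch: pretend the option is enabled. */
#define CONFIG_ARCH_HAS_CC_PLATFORM 1

#ifdef CONFIG_ARCH_HAS_CC_PLATFORM
/* Both variables are defined together, so they are declared together. */
extern enum cc_vendor cc_vendor;
extern u64 cc_mask;

static inline void cc_set_mask(u64 mask)
{
	cc_mask = mask;
}
#else
/* Hypothetical stub: nothing defines cc_mask here, so nothing touches it. */
static inline void cc_set_mask(u64 mask)
{
	(void)mask;
}
#endif

#ifdef CONFIG_ARCH_HAS_CC_PLATFORM
/* Stand-in for the definitions that would live in coco/core.c. */
enum cc_vendor cc_vendor = CC_VENDOR_NONE;
u64 cc_mask;
#endif
```

Keeping the declaration next to its only user means a build without
the option cannot accidentally reference a symbol that is never
defined.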
Linus
^ permalink raw reply [relevance 99%]
* Re: [PATCH 26/30] sched: handle preempt=voluntary under PREEMPT_AUTO
@ 2024-03-11 20:23 95% ` Linus Torvalds
0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-03-11 20:23 UTC (permalink / raw)
To: Ankur Arora
Cc: paulmck, Joel Fernandes, linux-kernel, tglx, peterz, akpm, luto,
bp, dave.hansen, hpa, mingo, juri.lelli, vincent.guittot, willy,
mgorman, jpoimboe, mark.rutland, jgross, andrew.cooper3, bristot,
mathieu.desnoyers, geert, glaubitz, anton.ivanov, mattst88,
krypton, rostedt, David.Laight, richard, mjguzik, jon.grimm,
bharata, raghavendra.kt, boris.ostrovsky, konrad.wilk
On Mon, 11 Mar 2024 at 13:10, Ankur Arora <ankur.a.arora@oracle.com> wrote:
>
> Ah, I see your point. Basically, keep the lazy semantics but -- in
> addition -- also provide the ability to dynamically toggle
> cond_resched(), might_resched() as a feature to help move this along
> further.
Please, let's not make up any random hypotheticals.
Honestly, if we ever hit the hypothetical scenario that Paul outlined, let's
(a) deal with it THEN, when we actually know what the situation is
(b) learn and document what it is that actually causes the odd behavior
IOW, instead of assuming that some "cond_resched()" case would even be
the right thing to do, maybe there are other issues going on? Let's
not paper over them by keeping some hack around - and *if* some
cond_resched() model is actually the right model in some individual
place, let's make it the rule that *when* we hit that case, we
document it.
And we should absolutely not have some hypothetical case keep us from
just doing the right thing and getting rid of the existing
cond_resched().
Because any potential future case is *not* going to be the same
cond_resched() that the current case is anyway. It is going to have
some very different cause.
Linus
^ permalink raw reply [relevance 95%]
* Re: [GIT PULL] vfs pidfd
@ 2024-03-11 20:05 99% ` Linus Torvalds
0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-03-11 20:05 UTC (permalink / raw)
To: Christian Brauner; +Cc: linux-fsdevel, linux-kernel
On Fri, 8 Mar 2024 at 02:14, Christian Brauner <brauner@kernel.org> wrote:
>
> * Move pidfds from the anonymous inode infrastructure to a tiny
> pseudo filesystem. This will unblock further work that we weren't able
> to do simply because of the very justified limitations of anonymous
> inodes. Moving pidfds to a tiny pseudo filesystem allows for statx on
> pidfds to become useful for the first time. They can now be compared
> by inode number, which is unique for the system lifetime.
So I obviously pulled this already, but I did have one question - we
don't make nsfs conditional, and I'm not convinced we should make
pidfs conditional either.
I think (and *hope*) all the semantic annoyances got sorted out, and I
don't think there are any realistic size advantages to not enabling
CONFIG_FS_PID.
Is there some fundamental reason for that config entry to exist?
Linus
^ permalink raw reply [relevance 99%]
* Linux 6.8
@ 2024-03-10 21:06 51% Linus Torvalds
0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-03-10 21:06 UTC (permalink / raw)
To: Linux Kernel Mailing List
So it took a bit longer for the commit counts to come down this
release than I tend to prefer, but a lot of that seemed to be about
various selftest updates (networking in particular) rather than any
actual real sign of problems. And the last two weeks have been pretty
quiet, so I feel there's no real reason to delay 6.8. We always have
some straggling work, and we'll end up having some of it pushed to
stable rather than hold up the new code. Nothing worrisome enough to
keep the regular release schedule from happening.
As usual, the shortlog below is just for the last week since rc7, the
overall changes in 6.8 are obviously much much bigger. This is not the
historically big release that 6.7 was - we seem to be back to a fairly
average release size for the last few years. You can see it in the
overall diffstats too - this looks like an average release in pretty
much all respects, and we don't have (for example) any obvious big new
filesystems or architectures. I think the biggest single new thing in
6.8 is probably the new Xe drm driver, but honestly, the big bulk of
changes are just various random updates and fixes all over.
Just as it should be.
In a sea of normality, one thing that stands out is a bit of random
git numerology. This is the last mainline kernel to have fewer than
ten million git objects. In fact, we're at 9.996 million objects, so
we got really close to crossing that not-milestone if it hadn't been
for the nice calming down in the last couple of weeks. Other trees -
notably linux-next - obviously are already comfortably over that
limit.
Of course, there is absolutely nothing special about it apart from a
nice round number. Git doesn't care.
Anyway, this all obviously means that tomorrow the merge window for
6.9 opens, and I already have several pull requests pending. Thanks to
everybody who sent in early pull requests, you know who you are. But
before that excitement commences, please do spend a bit of time with
the now boring old status quo and give 6.8 a good test, ok?
Linus
---
Al Raj Hassain (1):
ASoC: amd: yc: Add HP Pavilion Aero Laptop 13-be2xxx(8BD6) into
DMI quirk table
Alan Stern (1):
USB: usb-storage: Prevent divide-by-0 error in isd200_ata_command
Alban Boyé (1):
ASoC: Intel: bytcr_rt5640: Add an extra entry for the Chuwi Vi8 tablet
Alex Deucher (1):
drm/amd/display: handle range offsets in VRR ranges
Alexander Usyskin (3):
mei: me: add arrow lake point S DID
mei: me: add arrow lake point H DID
mei: gsc_proxy: match component when GSC is on different bus
Andreas Pape (1):
ASoC: rcar: adg: correct TIMSEL setting for SSI9
Andrew Ballance (1):
scripts/gdb/symbols: fix invalid escape sequence warning
Andrey Skvortsov (1):
crypto: sun8i-ce - Fix use after free in unprepare
Andy Chi (1):
ALSA: hda/realtek: fix mute/micmute LEDs for HP EliteBook
Animesh Manna (1):
drm/i915/panelreplay: Move out psr_init_dpcd() from init_connector()
Antonio Borneo (1):
pinctrl: stm32: fix PM support for stm32mp257
Arnd Bergmann (1):
net: bql: fix building with BQL disabled
Aya Levin (1):
net/mlx5: Fix fw reporter diagnose output
Badhri Jagan Sridharan (1):
usb: typec: tpcm: Fix PORT_RESET behavior for self powered devices
Bart Van Assche (2):
Revert "fs/aio: Make io_cancel() generate completions again"
fs/aio: Check IOCB_AIO_RW before the struct aio_kiocb conversion
Bartosz Golaszewski (1):
pinctrl: don't put the reference to GPIO device in pinctrl_pins_show()
Charles Keepax (1):
spi: cs42l43: Don't limit native CS to the first chip select
Christophe JAILLET (1):
i2c: wmt: Fix an error handling path in wmt_i2c_probe()
Coiby Xu (1):
integrity: eliminate unnecessary "Problem loading X.509 certificate" msg
Cong Yang (1):
drm/panel: boe-tv101wum-nl6: Fine tune Himax83102-j02 panel HFP
and HBP (again)
Cosmin Tanislav (2):
iio: accel: adxl367: fix DEVID read after reset
iio: accel: adxl367: fix I2C FIFO data register
Daniel Baluta (1):
MAINTAINERS: Use a proper mailinglist for NXP i.MX development
Daniel Borkmann (2):
xdp, bonding: Fix feature flags when there are no slave devs anymore
selftests/bpf: Fix up xdp bonding test wrt feature flags
Dave Airlie (1):
nouveau: lock the client object tree.
Dawei Li (1):
firmware: microchip: Fix over-requested allocation size
Dmitry Baryshkov (1):
Revert "arm64: dts: qcom: msm8996: Hook up MPM"
Douglas Anderson (3):
Revert "tty: serial: simplify qcom_geni_serial_send_chunk_fifo()"
drm/udl: Add ARGB8888 as a format
Revert "drm/udl: Add ARGB8888 as a format"
Edmund Raile (1):
firewire: ohci: prevent leak of left-over IRQ on unbind
Eduard Zingerman (2):
bpf: check bpf_func_state->callback_depth when pruning states
selftests/bpf: test case for callback_depth states pruning logic
Edward Adam Davis (1):
net/rds: fix WARNING in rds_conn_connect_if_down
Ekansh Gupta (1):
misc: fastrpc: Pass proper arguments to scm call
Emeel Hakim (1):
net/mlx5e: Fix MACsec state loss upon state update in offload path
Emil Tantilov (1):
idpf: disable local BH when scheduling napi for marker packets
Eric Dumazet (2):
geneve: make sure to pull inner header in geneve_rx()
net/ipv6: avoid possible UAF in ip6_route_mpath_notify()
Fabio Estevam (1):
ARM: imx_v6_v7_defconfig: Restore CONFIG_BACKLIGHT_CLASS_DEVICE
Florian Kauer (1):
igc: avoid returning frame twice in XDP_REDIRECT
Florian Westphal (1):
netfilter: nft_ct: fix l3num expectations with inet pseudo family
Francesco Dolcini (1):
ARM: dts: imx7: remove DSI port endpoints
Frej Drejhammar (1):
comedi: comedi_8255: Correct error in subdevice initialization
Gao Xiang (2):
erofs: fix uninitialized page cache reported by KMSAN
erofs: apply proper VMA alignment for memory mapped files on THP
Gavin Li (1):
Revert "net/mlx5: Block entering switchdev mode with ns inconsistency"
Geliang Tang (1):
selftests: mptcp: diag: return KSFT_FAIL not test_cnt
Guillaume Nault (1):
xfrm: Clear low order bits of ->flowi4_tos in decode_session4().
Hans de Goede (2):
misc: lis3lv02d_i2c: Fix regulators getting en-/dis-abled twice
on suspend/resume
platform/x86: p2sb: On Goldmont only cache P2SB and SPI devfn BAR
Harshit Mogalapalli (1):
platform/x86/amd/pmf: Fix missing error code in amd_pmf_init_smart_pc()
Heiner Kallweit (2):
i2c: i801: Fix using mux_pdev before it's set
i2c: i801: Avoid potential double call to gpiod_remove_lookup_table
Herbert Xu (1):
crypto: rk3288 - Fix use after free in unprepare
Horatiu Vultur (1):
net: sparx5: Fix use after free inside sparx5_del_mact_entry
Ian Abbott (1):
comedi: comedi_test: Prevent timers rescheduling during deletion
Imre Deak (2):
drm: Fix output poll work for drm_kms_helper_poll=n
drm/i915/dp: Fix connector DSC HW state readout
Ivan Vecera (1):
i40e: Fix firmware version comparison function
Jacob Keller (1):
ice: virtchnl: stop pretending to support RSS over AQ or registers
Jakub Kicinski (2):
page_pool: fix netlink dump stop/resume
dpll: move all dpll<>netdev helpers to dpll code
Janusz Krzysztofik (1):
drm/i915/selftests: Fix dependency of some timeouts on HZ
Jason Xing (12):
netrom: Fix a data-race around sysctl_netrom_default_path_quality
netrom: Fix a data-race around
sysctl_netrom_obsolescence_count_initialiser
netrom: Fix data-races around sysctl_netrom_network_ttl_initialiser
netrom: Fix a data-race around sysctl_netrom_transport_timeout
netrom: Fix a data-race around sysctl_netrom_transport_maximum_tries
netrom: Fix a data-race around sysctl_netrom_transport_acknowledge_delay
netrom: Fix a data-race around sysctl_netrom_transport_busy_delay
netrom: Fix a data-race around
sysctl_netrom_transport_requested_window_size
netrom: Fix a data-race around sysctl_netrom_transport_no_activity_timeout
netrom: Fix a data-race around sysctl_netrom_routing_control
netrom: Fix a data-race around sysctl_netrom_link_fails_count
netrom: Fix data-races around sysctl_net_busy_read
Javier Carrasco (1):
Revert "Input: bcm5974 - check endpoint type before starting traffic"
Jean-Baptiste Maneyrol (2):
iio: imu: inv_mpu6050: fix FIFO parsing when empty
iio: imu: inv_mpu6050: fix frequency setting when chip is off
Jernej Skrabec (1):
arm64: dts: allwinner: h616: Add Orange Pi Zero 2W to Makefile
Jesse Brandeburg (1):
ice: fix typo in assignment
Jianbo Liu (2):
net/mlx5: E-switch, Change flow rule destination checking
net/mlx5e: Change the warning when ignore_flow_level is not supported
Johan Hovold (4):
arm64: dts: qcom: sc8280xp-crd: limit pcie4 link speed
arm64: dts: qcom: sc8280xp-x13s: limit pcie4 link speed
phy: qcom-qmp-combo: fix drm bridge registration
phy: qcom-qmp-combo: fix type-c switch registration
Jon Hunter (1):
arm64: tegra: Fix Tegra234 MGBE power-domains
Kailang Yang (2):
ALSA: hda/realtek - Fix headset Mic no show at resume back for
Lenovo ALC897 platform
ALSA: hda/realtek - Add Headset Mic supported Acer NB platform
Kamalesh Babulal (1):
cgroup/cpuset: Fix retval in update_cpumask()
Karol Herbst (1):
drm/nouveau: fix stale locked mutex in nouveau_gem_ioctl_pushbuf
Kees Cook (2):
iio: pressure: dlhl60d: Initialize empty DLH bytes
init/Kconfig: lower GCC version check for -Warray-bounds
Konrad Dybcio (1):
arm64: dts: qcom: sm6115: Fix missing interconnect-names
Krishna Kurapati (1):
usb: gadget: ncm: Fix handling of zero block length packets
Lena Wang (1):
netfilter: nf_conntrack_h323: Add protection for bmp length out of range
Leon Romanovsky (1):
xfrm: Pass UDP encapsulation in TX packet offload
Li Ma (1):
drm/amd/swsmu: modify the gfx activity scaling
Linus Torvalds (2):
iov_iter: get rid of 'copy_mc' flag
Linux 6.8
Liu Ying (1):
arm64: dts: imx8mp: Fix LDB clocks property
Ma Jun (1):
drm/amdgpu/pm: Fix the error of pwm1_enable setting
Maciej Fijalkowski (3):
ixgbe: {dis, en}able irqs in ixgbe_txrx_ring_{dis, en}able
i40e: disable NAPI right after disabling irqs when handling xsk_pool
ice: reorder disabling IRQ and NAPI in ice_qp_dis
Marek Vasut (1):
arm64: dts: imx8mp: Fix TC9595 reset GPIO on DH i.MX8M Plus DHCOM SoM
Masahisa Kojima (1):
MAINTAINERS: net: netsec: add myself as co-maintainer
Mathias Krause (1):
Input: synaptics-rmi4 - fix UAF of IRQ domain on driver removal
Mathias Nyman (2):
usb: port: Don't try to peer unused USB ports based on location
xhci: Fix failure to detect ring expansion need.
Matthew Auld (1):
drm/tests/buddy: fix print format
Matthieu Baerts (NGI0) (1):
selftests: mptcp: diag: avoid extra waiting
Max Nguyen (1):
Input: xpad - add additional HyperX Controller Identifiers
Melissa Wen (1):
drm/amd/display: check dc_link before dereferencing
Michael Kelley (8):
Drivers: hv: vmbus: Calculate ring buffer size for more
efficient use of memory
fbdev/hyperv_fb: Fix logic error for Gen2 VMs in hvfb_getmem()
Drivers: hv: vmbus: Remove duplication and cleanup code in
create_gpadl_header()
Drivers: hv: vmbus: Update indentation in create_gpadl_header()
Documentation: hyperv: Add overview of PCI pass-thru device support
x86/hyperv: Use slow_virt_to_phys() in page transition hypervisor callback
x86/mm: Regularize set_memory_p() parameters and make non-static
x86/hyperv: Make encrypted/decrypted changes safe for
load_unaligned_zeropad()
Michal Schmidt (1):
ice: fix uninitialized dplls mutex usage
Michal Swiatkowski (1):
ice: reconfig host after changing MSI-X on VF
Mika Westerberg (1):
thunderbolt: Fix NULL pointer dereference in tb_port_update_credits()
Mike Yu (2):
xfrm: fix xfrm child route lookup for packet offload
xfrm: set skb control buffer based on packet offload as well
Moshe Shemesh (1):
net/mlx5: Check capability for fw_reset
Nathan Chancellor (1):
xfrm: Avoid clang fortify warning in copy_to_user_tmpl()
Neil Armstrong (3):
arm64: dts: qcom: sm8650-qrd: add gpio74 as reserved gpio
arm64: dts: qcom: sm8650-mtp: add gpio74 as reserved gpio
usb: typec: ucsi: fix UCSI on SM8550 & SM8650 Qualcomm devices
Nicolas Pitre (1):
vt: fix unicode buffer corruption when deleting characters
Niklas Cassel (1):
mailmap: fix Kishon's email
Niklas Söderlund (1):
dt-bindings: net: renesas,ethertsn: Document default for delays
Nirmoy Das (1):
drm/i915: Check before removing mm notifier
Nuno Sa (1):
counter: fix privdata alignment
Oleksij Rempel (1):
net: lan78xx: fix runtime PM count underflow on link stop
Pablo Neira Ayuso (3):
netfilter: nf_tables: disallow anonymous set with timeout flag
netfilter: nf_tables: reject constant set with timeout
netfilter: nf_tables: mark set as dead when unbinding anonymous
set with timeout
Paolo Bonzini (1):
SEV: disable SEV-ES DebugSwap by default
Peter Collingbourne (1):
serial: 8250_dw: Do not reclock if already at correct rate
Peter Martincic (1):
hv_utils: Allow implicit ICTIMESYNCFLAG_SYNC
Puranjay Mohan (1):
arm64: prohibit probing on arch_kunwind_consume_entry()
Qi Zheng (1):
mm: userfaultfd: fix unexpected change to src_folio when UFFDIO_MOVE fails
Quentin Schulz (2):
regulator: rk808: fix buck range on RK806
regulator: rk808: fix LDO range on RK806
RD Babiera (1):
usb: typec: altmodes/displayport: create sysfs nodes as driver's
default device attribute group
Rahul Rameshbabu (2):
net/mlx5e: Use a memory barrier to enforce PTP WQ xmit
submission tracking occurs after populating the metadata_map
net/mlx5e: Switch to using _bh variant of of spinlock API in
port timestamping NAPI poll context
Rand Deeb (1):
net: ice: Fix potential NULL pointer dereference in ice_bridge_setlink()
Ricardo B. Marliere (1):
Drivers: hv: vmbus: make hv_bus const
Rickard x Andersson (1):
tty: serial: imx: Fix broken RS485
Rob Herring (1):
ASoC: dt-bindings: nvidia: Fix 'lge' vendor prefix
Rodrigo Vivi (1):
drm/xe: Return immediately on tile_init failure
Saeed Mahameed (1):
Revert "net/mlx5e: Check the number of elements before walk TC rhashtable"
Sasha Neftin (1):
intel: legacy: Partial revert of field get conversion
Saurabh Sengar (1):
x86/hyperv: Allow 15-bit APIC IDs for VTL platforms
Sean Christopherson (8):
KVM: x86: Mark target gfn of emulated atomic instruction as dirty
KVM: Make KVM_MEM_GUEST_MEMFD mutually exclusive with KVM_MEM_READONLY
KVM: x86: Update KVM_SW_PROTECTED_VM docs to make it clear they're a WIP
KVM: x86/mmu: Restrict KVM_SW_PROTECTED_VM to the TDP MMU
KVM: selftests: Create GUEST_MEMFD for relevant invalid flags testcases
KVM: selftests: Add a testcase to verify GUEST_MEMFD and
READONLY are exclusive
KVM: SVM: Flush pages under kvm->lock to fix UAF in
svm_register_enc_region()
KVM: x86/mmu: Retry fault before acquiring mmu_lock if mapping is changing
Sherry Sun (1):
tty: serial: fsl_lpuart: avoid idle preamble pending if CTS is enabled
Stefan Binding (3):
ALSA: hda: cs35l41: Support Lenovo Thinkbook 16P
ALSA: hda/realtek: Add quirks for Lenovo Thinkbook 16P laptops
ALSA: hda: cs35l41: Overwrite CS35L41 configuration for ASUS UM5302LA
Steven Rostedt (Google) (7):
tracing/net_sched: Fix tracepoints that save qdisc_dev() as a string
tracing: Remove precision vsnprintf() check from print event
tracing: Limit trace_seq size to just 8K and not depend on
architecture PAGE_SIZE
tracing: Limit trace_marker writes to just 4K
ring-buffer: Fix waking up ring buffer readers
ring-buffer: Fix resetting of shortest_full
tracing: Use .flush() call to wake up readers
Stuart Henderson (4):
ASoC: madera: Fix typo in madera_set_fll_clks shift value
ASoC: wm8962: Enable oscillator if selecting WM8962_FLL_OSC
ASoC: wm8962: Enable both SPKOUTR_ENA and SPKOUTL_ENA in mono mode
ASoC: wm8962: Fix up incorrect error message in wm8962_set_fll
Sumit Garg (1):
tee: optee: Fix kernel panic caused by incorrect error handling
Suraj Kandpal (3):
drm/i915/hdcp: Move to direct reads for HDCP
drm/i915/hdcp: Remove additional timing for reading mst hdcp message
drm/i915/hdcp: Extract hdcp structure from correct connector
Thierry Reding (1):
arm64: tegra: Set the correct PHY mode for MGBE
Tobias Jakobi (Compleo) (1):
net: dsa: microchip: fix register write order in ksz8_ind_write8()
Toke Høiland-Jørgensen (1):
cpumap: Zero-initialise xdp_rxq_info struct before running XDP program
Tommy Huang (1):
i2c: aspeed: Fix the dummy irq expected print
Tvrtko Ursulin (1):
MAINTAINERS: Update email address for Tvrtko Ursulin
Uwe Kleine-König (1):
Input: gpio_keys_polled - suppress deferred probe error for gpio
Vasileios Amoiridis (1):
iio: pressure: Fixes BMP38x and BMP390 SPI support
Ville Syrjälä (1):
drm/i915: Don't explode when the dig port we don't have an AUX CH
Vlastimil Babka (2):
mm, vmscan: prevent infinite loop for costly GFP_NOIO |
__GFP_RETRY_MAYFAIL allocations
mm, mmap: fix vma_merge() case 7 with vma_ops->close
Waiman Long (1):
cgroup/cpuset: Fix a memory leak in update_exclusive_cpumask()
Wentong Wu (1):
mei: Add Meteor Lake support for IVSC device
Xiubo Li (1):
libceph: init the cursor when preparing sparse read in msgr2
Yicong Yang (1):
serial: port: Don't suspend if the port is still busy
Yongzhi Liu (1):
net: pds_core: Fix possible double free in error handling path
songxiebing (1):
ALSA: hda: optimize the probe codec process
^ permalink raw reply [relevance 51%]
* Re: [PATCH 0/6] tracing/ring-buffer: Fix wakeup of ring buffer waiters
2024-03-08 21:39 99% ` Linus Torvalds
@ 2024-03-08 21:41 99% ` Linus Torvalds
0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-03-08 21:41 UTC (permalink / raw)
To: Steven Rostedt
Cc: linux-kernel, linux-trace-kernel, Masami Hiramatsu, Mark Rutland,
Mathieu Desnoyers, Andrew Morton, joel, linke li, Rabin Vincent
On Fri, 8 Mar 2024 at 13:39, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> So the above "complexity" is *literally* just changing the
>
> (new = atomic_read_acquire(&my->seq)) != old
>
> condition to
>
> should_exit ||
> (new = atomic_read_acquire(&my->seq)) != old
.. and obviously you'll need to add the exit condition to the actual
"deal with events" loop too.
Linus
^ permalink raw reply [relevance 99%]
* Re: [PATCH 0/6] tracing/ring-buffer: Fix wakeup of ring buffer waiters
@ 2024-03-08 21:39 99% ` Linus Torvalds
2024-03-08 21:41 99% ` Linus Torvalds
0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-03-08 21:39 UTC (permalink / raw)
To: Steven Rostedt
Cc: linux-kernel, linux-trace-kernel, Masami Hiramatsu, Mark Rutland,
Mathieu Desnoyers, Andrew Morton, joel, linke li, Rabin Vincent
On Fri, 8 Mar 2024 at 13:33, Steven Rostedt <rostedt@goodmis.org> wrote:
>
> There's two layers:
>
> 1) the ring buffer has the above simple producer / consumer.
> Where the wake ups can happen at the point of where the buffer has
> the amount filled that the consumer wants to start consuming with.
>
> 2) The tracing layer; Here on close of a file, the consumers need to be
> woken up and not wait again. And just take whatever was there to finish
> reading.
>
> There's also another case that the ioctl() just kicks the current
> readers out, but doesn't care about new readers.
But that's the beauty of just using the wait_event() model.
Just add that "exit" condition to the condition.
So the above "complexity" is *literally* just changing the

    (new = atomic_read_acquire(&my->seq)) != old

condition to

    should_exit ||
    (new = atomic_read_acquire(&my->seq)) != old

(replace "should_exit" with whatever that condition is, of course) and
the wait_event() logic will take care of the rest.
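
A minimal userspace sketch of that condition, using C11 atomics as stand-ins for atomic_t and atomic_read_acquire(). The names "struct xyz_exit_sim", "should_exit" as a struct member, and "seq_changed_or_exit()" are hypothetical and not from the patch series. One detail the sketch makes explicit: because "||" short-circuits, "new" is never assigned when the exit flag is set, so the helper gives it a defined value first.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the structure being waited on. */
struct xyz_exit_sim {
    atomic_int seq;          /* bumped by the producer for each event */
    atomic_bool should_exit; /* set when waiters must stop waiting */
};

/*
 * The wait condition: returns true when the waiter should stop sleeping.
 * *new_seq always holds a defined value on return, even when the exit
 * flag short-circuits the sequence check.
 */
static bool seq_changed_or_exit(struct xyz_exit_sim *my, int old, int *new_seq)
{
    *new_seq = old;
    return atomic_load_explicit(&my->should_exit, memory_order_acquire) ||
           (*new_seq = atomic_load_explicit(&my->seq,
                                            memory_order_acquire)) != old;
}
```

A caller that sees the helper return true with *new_seq == old knows it was woken by the exit flag rather than by a new event.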
Linus
^ permalink raw reply [relevance 99%]
* Re: [PATCH 0/6] tracing/ring-buffer: Fix wakeup of ring buffer waiters
@ 2024-03-08 20:39 96% ` Linus Torvalds
0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-03-08 20:39 UTC (permalink / raw)
To: Steven Rostedt
Cc: linux-kernel, linux-trace-kernel, Masami Hiramatsu, Mark Rutland,
Mathieu Desnoyers, Andrew Morton, joel, linke li, Rabin Vincent
On Fri, 8 Mar 2024 at 10:38, Steven Rostedt <rostedt@goodmis.org> wrote:
>
> A patch was sent to "fix" the wait_index variable that is used to help with
> waking of waiters on the ring buffer. The patch was rejected, but I started
> looking at associated code. Discussing it on IRC with Mathieu Desnoyers
> we discovered a design flaw.
Honestly, all of this seems excessively complicated.
And your new locking shouldn't be necessary if you just do things much
more simply.
Here's what I *think* you should do:
    struct xyz {
        ...
        atomic_t seq;
        struct wait_queue_head seq_wait;
        ...
    };
with the consumer doing something very simple like this:
    int seq = atomic_read_acquire(&my->seq);
    for (;;) {
        .. consume outstanding events ..
        seq = wait_for_seq_change(seq, my);
    }
and the producer being similarly trivial, just having a
"add_seq_event()" at the end:
    ... add whatever event ..
    add_seq_event(my);
And the helper functions for this are really darn simple:
    static inline int wait_for_seq_change(int old, struct xyz *my)
    {
        int new;
        wait_event(my->seq_wait,
            (new = atomic_read_acquire(&my->seq)) != old);
        return new;
    }

    static inline void add_seq_event(struct xyz *my)
    {
        atomic_fetch_inc_release(&my->seq);
        wake_up(&my->seq_wait);
    }
Note how you don't need any new locks, and note how "wait_event()"
will do all the required optimistic stuff for you (ie it will check
that "has seq changed" before even bothering to add itself to the wait
queue etc).
So the above is not only short and sweet, it generates fairly good
code too, and doesn't it look really simple and fairly understandable?
And - AS ALWAYS - the above isn't actually tested in any way, shape or form.
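
For experimenting outside the kernel, the same shape can be approximated in userspace. This is only a sketch under stated assumptions: "struct xyz_sim" and "producer_thread()" are hypothetical names, C11 atomics stand in for atomic_t, and a pthread mutex plus condition variable stands in for the wait queue. Unlike wait_event(), a condition variable needs that mutex to avoid a lost wakeup, so the sketch is not lock-free.

```c
#include <pthread.h>
#include <stdatomic.h>

/* Userspace stand-in: mutex + condvar play the role of the wait queue. */
struct xyz_sim {
    atomic_int seq;
    pthread_mutex_t lock;
    pthread_cond_t seq_wait;
};

static void xyz_sim_init(struct xyz_sim *my)
{
    atomic_init(&my->seq, 0);
    pthread_mutex_init(&my->lock, NULL);
    pthread_cond_init(&my->seq_wait, NULL);
}

/* Block until seq differs from 'old', then return the new value. */
static int wait_for_seq_change(int old, struct xyz_sim *my)
{
    int new_seq;

    /* Optimistic check first: do not touch the lock if seq already moved. */
    new_seq = atomic_load_explicit(&my->seq, memory_order_acquire);
    if (new_seq != old)
        return new_seq;

    pthread_mutex_lock(&my->lock);
    while ((new_seq = atomic_load_explicit(&my->seq,
                                           memory_order_acquire)) == old)
        pthread_cond_wait(&my->seq_wait, &my->lock);
    pthread_mutex_unlock(&my->lock);
    return new_seq;
}

/* Producer side: publish one event and wake any waiters. */
static void add_seq_event(struct xyz_sim *my)
{
    /*
     * Increment under the lock so a waiter cannot miss the wakeup
     * between re-checking seq and going to sleep.
     */
    pthread_mutex_lock(&my->lock);
    atomic_fetch_add_explicit(&my->seq, 1, memory_order_release);
    pthread_mutex_unlock(&my->lock);
    pthread_cond_broadcast(&my->seq_wait);
}

/* Thread entry point that produces a single event. */
static void *producer_thread(void *arg)
{
    add_seq_event(arg);
    return NULL;
}
```

A consumer would call wait_for_seq_change() in its loop with the last value it saw, exactly as in the kernel-side loop above.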
Linus
^ permalink raw reply [relevance 96%]
* Re: [PATCH] rcutorture: Fix rcu_torture_pipe_update_one()/rcu_torture_writer() data race and concurrency bug
@ 2024-03-07 22:09 87% ` Linus Torvalds
0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-03-07 22:09 UTC (permalink / raw)
To: Julia Lawall
Cc: Paul E. McKenney, Mathieu Desnoyers, Steven Rostedt, linke li,
joel, boqun.feng, dave, frederic, jiangshanlai, josh,
linux-kernel, qiang.zhang1211, quic_neeraju, rcu
On Thu, 7 Mar 2024 at 13:40, Julia Lawall <julia.lawall@inria.fr> wrote:
>
> I tried the following:
>
> @@
> expression x;
> @@
>
> *WRITE_ONCE(x,<+...READ_ONCE(x)...+>)
>
> This gave a number of results, shown below. Let me know if some of them
> are undesirable.
Well, all the ones you list do look like garbage.
That said, quite often the garbage does seem to be "we don't actually
care about the result". Several of them look like statistics.
Some of them look outright nasty, though:
> --- /home/julia/linux/net/netfilter/nf_tables_api.c
> +++ /tmp/nothing/net/netfilter/nf_tables_api.c
> @@ -10026,8 +10026,6 @@ static unsigned int nft_gc_seq_begin(str
> unsigned int gc_seq;
>
> /* Bump gc counter, it becomes odd, this is the busy mark. */
> - gc_seq = READ_ONCE(nft_net->gc_seq);
> - WRITE_ONCE(nft_net->gc_seq, ++gc_seq);
The above is garbage code, and the comment implies that it is garbage
code that _should_ be reliable.
> diff -u -p /home/julia/linux/fs/xfs/xfs_icache.c /tmp/nothing/fs/xfs/xfs_icache.c
> --- /home/julia/linux/fs/xfs/xfs_icache.c
> +++ /tmp/nothing/fs/xfs/xfs_icache.c
> @@ -2076,8 +2076,6 @@ xfs_inodegc_queue(
> cpu_nr = get_cpu();
> gc = this_cpu_ptr(mp->m_inodegc);
> llist_add(&ip->i_gclist, &gc->list);
> - items = READ_ONCE(gc->items);
> - WRITE_ONCE(gc->items, items + 1);
In contrast, this is also garbage code, but the only user of it seems
to be a heuristic, so if 'items' is off by one (or by a hundred), it
probably doesn't matter.
The xfs code is basically using that 'items' count to decide if it
really wants to do GC or not.
This is actually a case where having a "UNSAFE_INCREMENTISH()" macro
might make sense.
That said, this is also a case where using a "local_t" and using
"local_add_return()" might be a better option. It falls back on true
atomics, but at least on x86 you probably get *better* code generation
for the "incrementish" operation than you get with READ_ONCE ->
WRITE_ONCE.
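
To make the distinction concrete, here is a userspace sketch in C11. No "UNSAFE_INCREMENTISH()" macro exists in the kernel; the definition below is a hypothetical illustration of what such a helper would mean, and "exact_add_return()" is likewise a made-up name. The exact variant uses a full atomic add, which is stronger than what local_t provides in the kernel.

```c
#include <stdatomic.h>

/*
 * Hypothetical "incrementish" helper: one relaxed load followed by one
 * relaxed store. Each access happens once, but the pair is NOT an atomic
 * read-modify-write, so concurrent callers can lose updates.
 */
#define UNSAFE_INCREMENTISH(ptr)                                      \
    atomic_store_explicit((ptr),                                      \
        atomic_load_explicit((ptr), memory_order_relaxed) + 1,        \
        memory_order_relaxed)

/*
 * Exact alternative: a real atomic add that returns the new value,
 * in the spirit of local_add_return().
 */
static inline long exact_add_return(atomic_long *ctr, long n)
{
    return atomic_fetch_add_explicit(ctr, n, memory_order_relaxed) + n;
}
```

The macro name carries the warning: anything that needs a reliable count should use the exact helper instead.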
> diff -u -p /home/julia/linux/kernel/rcu/tree.c /tmp/nothing/kernel/rcu/tree.c
> --- /home/julia/linux/kernel/rcu/tree.c
> +++ /tmp/nothing/kernel/rcu/tree.c
> @@ -1620,8 +1620,6 @@ static void rcu_gp_fqs(bool first_time)
> /* Clear flag to prevent immediate re-entry. */
> if (READ_ONCE(rcu_state.gp_flags) & RCU_GP_FLAG_FQS) {
> raw_spin_lock_irq_rcu_node(rnp);
> - WRITE_ONCE(rcu_state.gp_flags,
> - READ_ONCE(rcu_state.gp_flags) & ~RCU_GP_FLAG_FQS);
> raw_spin_unlock_irq_rcu_node(rnp);
This smells bad to me. The code is holding a lock, but apparently not
one that protects gp_flags.
And that READ_ONCE->WRITE_ONCE sequence can corrupt all the other flags.
Maybe it's fine for some reason (that reason being either that the
ONCE operations aren't actually needed at all, or because nobody
*really* cares about the flags), but it smells.
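
The corruption is easy to replay by hand in userspace. The sketch below assumes two updaters that are not serialized by a common lock, and it uses hypothetical flag names (SIM_FLAG_INIT, SIM_FLAG_FQS) rather than the real RCU definitions. It steps through one bad interleaving deterministically instead of relying on real threads.

```c
#include <stdatomic.h>

#define SIM_FLAG_INIT 0x1u /* hypothetical flag bits, for illustration only */
#define SIM_FLAG_FQS  0x2u

/*
 * Replays a bad interleaving by hand: both "CPUs" load the flags word
 * before either one stores its result back.
 */
static unsigned int racy_flag_update(unsigned int start)
{
    atomic_uint flags;
    unsigned int a, b;

    atomic_init(&flags, start);
    a = atomic_load_explicit(&flags, memory_order_relaxed); /* CPU A reads */
    b = atomic_load_explicit(&flags, memory_order_relaxed); /* CPU B reads */
    /* CPU A sets INIT. */
    atomic_store_explicit(&flags, a | SIM_FLAG_INIT, memory_order_relaxed);
    /* CPU B clears FQS from its stale copy, wiping out A's INIT bit. */
    atomic_store_explicit(&flags, b & ~SIM_FLAG_FQS, memory_order_relaxed);
    return atomic_load(&flags);
}

/* The same two updates done as real atomic read-modify-write operations. */
static unsigned int atomic_flag_update(unsigned int start)
{
    atomic_uint flags;

    atomic_init(&flags, start);
    atomic_fetch_or(&flags, SIM_FLAG_INIT);
    atomic_fetch_and(&flags, ~SIM_FLAG_FQS);
    return atomic_load(&flags);
}
```

Starting from a word with only the FQS bit set, the racy version ends with no bits set, while the atomic version ends with INIT set as intended.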
> @@ -1882,8 +1880,6 @@ static void rcu_report_qs_rsp(unsigned l
> {
> raw_lockdep_assert_held_rcu_node(rcu_get_root());
> WARN_ON_ONCE(!rcu_gp_in_progress());
> - WRITE_ONCE(rcu_state.gp_flags,
> - READ_ONCE(rcu_state.gp_flags) | RCU_GP_FLAG_FQS);
> raw_spin_unlock_irqrestore_rcu_node(rcu_get_root(), flags);
Same field, same lock held, same odd smelly pattern.
> - WRITE_ONCE(rcu_state.gp_flags,
> - READ_ONCE(rcu_state.gp_flags) | RCU_GP_FLAG_FQS);
> raw_spin_unlock_irqrestore_rcu_node(rnp_old, flags);
.. and again.
> --- /home/julia/linux/drivers/net/ethernet/cavium/liquidio/cn23xx_vf_device.c
> +++ /tmp/nothing/drivers/net/ethernet/cavium/liquidio/cn23xx_vf_device.c
> @@ -80,8 +80,6 @@ static int cn23xx_vf_reset_io_queues(str
> q_no);
> return -1;
> }
> - WRITE_ONCE(reg_val, READ_ONCE(reg_val) &
> - ~CN23XX_PKT_INPUT_CTL_RST);
> octeon_write_csr64(oct, CN23XX_VF_SLI_IQ_PKT_CONTROL64(q_no),
> READ_ONCE(reg_val));
I suspect this is garbage that has been triggered by the usual
mindless "fix the symptoms, not the bug" as a result of a "undefined
behavior report".
> --- /home/julia/linux/kernel/kcsan/kcsan_test.c
> +++ /tmp/nothing/kernel/kcsan/kcsan_test.c
> @@ -381,7 +381,6 @@ static noinline void test_kernel_change_
> test_var ^= TEST_CHANGE_BITS;
> kcsan_nestable_atomic_end();
> } else
> - WRITE_ONCE(test_var, READ_ONCE(test_var) ^ TEST_CHANGE_BITS);
Presumably this is intentionally testing whether KCSAN notices these
things at all.
> diff -u -p /home/julia/linux/arch/s390/kernel/idle.c /tmp/nothing/arch/s390/kernel/idle.c
> /* Account time spent with enabled wait psw loaded as idle time. */
> - WRITE_ONCE(idle->idle_time, READ_ONCE(idle->idle_time) + idle_time);
> - WRITE_ONCE(idle->idle_count, READ_ONCE(idle->idle_count) + 1);
> account_idle_time(cputime_to_nsecs(idle_time));
This looks like another "UNSAFE_INCREMENTISH()" case.
> --- /home/julia/linux/mm/mmap.c
> +++ /tmp/nothing/mm/mmap.c
> @@ -3476,7 +3476,6 @@ bool may_expand_vm(struct mm_struct *mm,
>
> void vm_stat_account(struct mm_struct *mm, vm_flags_t flags, long npages)
> {
> - WRITE_ONCE(mm->total_vm, READ_ONCE(mm->total_vm)+npages);
As does this.
> diff -u -p /home/julia/linux/fs/xfs/libxfs/xfs_iext_tree.c /tmp/nothing/fs/xfs/libxfs/xfs_iext_tree.c
> static inline void xfs_iext_inc_seq(struct xfs_ifork *ifp)
> {
> - WRITE_ONCE(ifp->if_seq, READ_ONCE(ifp->if_seq) + 1);
> }
Ugh. A sequence count that is "incrementish"? That smells wrong to me.
But I didn't go look at the users. Maybe it's another case of "we
don't *actually* care about the sequence count".
>
> +++ /tmp/nothing/drivers/net/ethernet/cavium/liquidio/cn23xx_pf_device.c
> @@ -379,8 +379,6 @@ static int cn23xx_reset_io_queues(struct
> q_no);
> return -1;
> }
> - WRITE_ONCE(reg_val, READ_ONCE(reg_val) &
> - ~CN23XX_PKT_INPUT_CTL_RST);
> ....
> - WRITE_ONCE(d64, READ_ONCE(d64) &
> - (~(CN23XX_PKT_INPUT_CTL_RING_ENB)));
> - WRITE_ONCE(d64, READ_ONCE(d64) | CN23XX_PKT_INPUT_CTL_RST);
More "likely wrong" cases.
> +++ /tmp/nothing/mm/kfence/kfence_test.c
> @@ -501,7 +501,6 @@ static void test_kmalloc_aligned_oob_wri
> * fault immediately after it.
> */
> expect.addr = buf + size;
> - WRITE_ONCE(*expect.addr, READ_ONCE(*expect.addr) + 1);
Looks like questionable test-code again.
> +++ /tmp/nothing/io_uring/io_uring.c
> @@ -363,7 +363,6 @@ static void io_account_cq_overflow(struc
> {
> struct io_rings *r = ctx->rings;
>
> - WRITE_ONCE(r->cq_overflow, READ_ONCE(r->cq_overflow) + 1);
> ctx->cq_extra--;
Bah. Looks like garbage, but the kernel doesn't actually use that
value. Looks like a random number generator exposed to user space.
Presumably this is another "statistics, but I don't care enough".
> @@ -2403,8 +2402,6 @@ static bool io_get_sqe(struct io_ring_ct
> - WRITE_ONCE(ctx->rings->sq_dropped,
> - READ_ONCE(ctx->rings->sq_dropped) + 1);
As is the above.
> +++ /tmp/nothing/security/apparmor/apparmorfs.c
> @@ -596,7 +596,6 @@ static __poll_t ns_revision_poll(struct
>
> void __aa_bump_ns_revision(struct aa_ns *ns)
> {
> - WRITE_ONCE(ns->revision, READ_ONCE(ns->revision) + 1);
> wake_up_interruptible(&ns->wait);
This looks like somebody copied the RCU / tracing pattern?
> +++ /tmp/nothing/arch/riscv/kvm/vmid.c
> @@ -90,7 +90,6 @@ void kvm_riscv_gstage_vmid_update(struct
>
> /* First user of a new VMID version? */
> if (unlikely(vmid_next == 0)) {
> - WRITE_ONCE(vmid_version, READ_ONCE(vmid_version) + 1);
> vmid_next = 1;
Looks bogus and wrong. An unreliable address space version does _not_
sound sane, but who knows.
Anyway, from a quick look, there's a mix of "this is just wrong" and a
couple of "this seems to just want approximate statistics".
Maybe the RCU 'flags' field is using WRITE_ONCE() because while the
spinlock protects the bit changes, there are readers that look at
other bits with READ_ONCE.
That would imply that the READ_ONCE->WRITE_ONCE is just broken garbage
- the WRITE_ONCE() part may be right, but the READ_ONCE is wrong
because the value is stable.
Linus
^ permalink raw reply [relevance 87%]
* Re: [PATCH] rcutorture: Fix rcu_torture_pipe_update_one()/rcu_torture_writer() data race and concurrency bug
@ 2024-03-07 20:00 96% ` Linus Torvalds
0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-03-07 20:00 UTC (permalink / raw)
To: paulmck
Cc: Mathieu Desnoyers, Steven Rostedt, linke li, joel, boqun.feng,
dave, frederic, jiangshanlai, josh, linux-kernel,
qiang.zhang1211, quic_neeraju, rcu
On Thu, 7 Mar 2024 at 11:47, Paul E. McKenney <paulmck@kernel.org> wrote:
>
> > - The per-thread counter (Thread-Local Storage) incremented by a single
> > thread, read by various threads concurrently, is a good target
> > for WRITE_ONCE()/READ_ONCE() pairing. This is actually what we do in
> > various liburcu implementations which track read-side critical sections
> > per-thread.
>
> Agreed, but do any of these use WRITE_ONCE(x, READ_ONCE(x) + 1) or
> similar?
Absolutely not.
The READ_ONCE->WRITE_ONCE pattern is almost certainly a bug.
The valid reason to have a WRITE_ONCE() is that there's a _later_
READ_ONCE() on another CPU.
So WRITE_ONCE->READ_ONCE (across threads) is very valid. But
READ_ONCE->WRITE_ONCE (inside a thread) simply is not a valid
operation.
We do have things like "local_t", which allows for non-smp-safe local
thread atomic accesses, but they explicitly are *NOT* about some kind
of READ_ONCE -> WRITE_ONCE sequence that by definition cannot be
atomic unless you disable interrupts and disable preemption (at which
point they become pointless and only generate worse code).
But the point of "local_t" is that you can do things that are safe if
there are no SMP issues. They are kind of an extension of the
percpu_add() kind of operations.
In fact, I think it might be interesting to catch those
READ_ONCE->WRITE_ONCE chains (perhaps with coccinelle?) because they
are a sign of bugs.
Now, there's certainly some possibility of "I really don't care about
some stats, I'm willing to do non-smp-safe and non-thread safe
operations if they are faster". So I'm not saying a
READ_ONCE->WRITE_ONCE data dependency is _always_ a bug, but I do
think it's a pattern that is very very close to being one.
Linus
^ permalink raw reply [relevance 96%]
* Re: [PATCH] rcutorture: Fix rcu_torture_pipe_update_one()/rcu_torture_writer() data race and concurrency bug
2024-03-07 2:43 93% ` Linus Torvalds
@ 2024-03-07 2:49 99% ` Linus Torvalds
1 sibling, 0 replies; 200+ results
From: Linus Torvalds @ 2024-03-07 2:49 UTC (permalink / raw)
To: paulmck
Cc: Steven Rostedt, linke li, joel, boqun.feng, dave, frederic,
jiangshanlai, josh, linux-kernel, mathieu.desnoyers,
qiang.zhang1211, quic_neeraju, rcu
On Wed, 6 Mar 2024 at 18:43, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> I dunno.
Oh, and just looking at that patch, I still think the code is confused.
On the reading side, we have:
pipe_count = smp_load_acquire(&p->rtort_pipe_count);
if (pipe_count > RCU_TORTURE_PIPE_LEN) {
/* Should not happen, but... */
where that comment clearly says that the pipe_count we read (whether
with READ_ONCE() or with my smp_load_acquire() suggestion) should
never be larger than RCU_TORTURE_PIPE_LEN.
But the writing side very clearly did:
    i = rp->rtort_pipe_count;
    if (i > RCU_TORTURE_PIPE_LEN)
        i = RCU_TORTURE_PIPE_LEN;
    ...
    smp_store_release(&rp->rtort_pipe_count, ++i);
(again, syntactically it could have been "i + 1" instead of my "++i" -
same value), so clearly the writing side *can* write a value that is >
RCU_TORTURE_PIPE_LEN.
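
The arithmetic can be checked in isolation. In this sketch "PIPE_LEN_SIM" is a hypothetical stand-in for RCU_TORTURE_PIPE_LEN and "next_pipe_count()" is a made-up helper that mirrors only the clamp-then-increment step of the writer side.

```c
#define PIPE_LEN_SIM 10 /* hypothetical stand-in for RCU_TORTURE_PIPE_LEN */

/*
 * Mirrors the writer-side arithmetic: clamp the current count to the
 * limit, then return the pre-incremented value that would be stored.
 */
static int next_pipe_count(int current)
{
    int i = current;

    if (i > PIPE_LEN_SIM)
        i = PIPE_LEN_SIM;
    return ++i; /* the value handed to smp_store_release() in the patch */
}
```

Once the count has reached the limit, every further update stores the limit plus one, which is exactly the value the reader-side comment calls "should not happen".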
So while the whole READ/WRITE_ONCE vs smp_load_acquire/store_release
is one thing that might be worth looking at, I think there are other
very confusing aspects here.
Linus
^ permalink raw reply [relevance 99%]
* Re: [PATCH] rcutorture: Fix rcu_torture_pipe_update_one()/rcu_torture_writer() data race and concurrency bug
@ 2024-03-07 2:43 93% ` Linus Torvalds
2024-03-07 2:49 99% ` Linus Torvalds
0 siblings, 2 replies; 200+ results
From: Linus Torvalds @ 2024-03-07 2:43 UTC (permalink / raw)
To: paulmck
Cc: Steven Rostedt, linke li, joel, boqun.feng, dave, frederic,
jiangshanlai, josh, linux-kernel, mathieu.desnoyers,
qiang.zhang1211, quic_neeraju, rcu
[-- Attachment #1: Type: text/plain, Size: 1173 bytes --]
On Wed, 6 Mar 2024 at 18:29, Paul E. McKenney <paulmck@kernel.org> wrote:
>
> TL;DR: Those ->rtort_pipe_count increments cannot run concurrently
> with each other or any other update of that field, so that update-side
> READ_ONCE() call is unnecessary and the update-side plain C-language
> read is OK. The WRITE_ONCE() calls are there for the benefit of the
> lockless read-side accesses to rtort_pipe_count.
Ahh. Ok. That makes a bit more sense.
So if that's the case, then the "updating side" should never use
READ_ONCE, because there's nothing else to protect against.
Honestly, this all makes me think that we'd be *much* better off
showing the real "handoff" with smp_store_release() and
smp_load_acquire().
IOW, something like this (TOTALLY UNTESTED!) patch, perhaps?
And please note that this patch is not only untested, it really is a
very handwavy patch.
I'm sending it as a patch just because it's a more precise way of
saying "I think the writers and readers could use the store-release ->
load-acquire not just to avoid any worries about accessing things
once, but also as a way to show the directional 'flow' of the data".
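
As a userspace illustration of that directional flow, here is a sketch with C11 atomics, where memory_order_release and memory_order_acquire play the roles of smp_store_release() and smp_load_acquire(). The names "struct handoff_sim", "handoff_writer()" and "handoff_reader()" are hypothetical and unrelated to rcutorture.

```c
#include <pthread.h>
#include <stdatomic.h>

struct handoff_sim {
    int payload;      /* plain data, written before the release store */
    atomic_int ready; /* marks the point where the data is handed off */
};

/* Writer: fill in the data, then publish it with a release store. */
static void *handoff_writer(void *arg)
{
    struct handoff_sim *h = arg;

    h->payload = 42;
    atomic_store_explicit(&h->ready, 1, memory_order_release);
    return NULL;
}

/*
 * Reader: the acquire load pairs with the release store, so once 'ready'
 * is seen as nonzero the plain read of 'payload' observes the writer's
 * value.
 */
static int handoff_reader(struct handoff_sim *h)
{
    while (!atomic_load_explicit(&h->ready, memory_order_acquire))
        ; /* spin until the writer has published */
    return h->payload;
}
```

The writer never needs an annotated load of its own data; only the single store that hands the data over, and the single load that picks it up, carry ordering.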
I dunno.
Linus
[-- Attachment #2: patch.diff --]
[-- Type: text/x-patch, Size: 1690 bytes --]
kernel/rcu/rcutorture.c | 11 +++++------
1 file changed, 5 insertions(+), 6 deletions(-)
diff --git a/kernel/rcu/rcutorture.c b/kernel/rcu/rcutorture.c
index 7567ca8e743c..60b74df3eae2 100644
--- a/kernel/rcu/rcutorture.c
+++ b/kernel/rcu/rcutorture.c
@@ -461,12 +461,12 @@ rcu_torture_pipe_update_one(struct rcu_torture *rp)
WRITE_ONCE(rp->rtort_chkp, NULL);
smp_store_release(&rtrcp->rtc_ready, 1); // Pair with smp_load_acquire().
}
- i = READ_ONCE(rp->rtort_pipe_count);
+ i = rp->rtort_pipe_count;
if (i > RCU_TORTURE_PIPE_LEN)
i = RCU_TORTURE_PIPE_LEN;
atomic_inc(&rcu_torture_wcount[i]);
- WRITE_ONCE(rp->rtort_pipe_count, i + 1);
- if (rp->rtort_pipe_count >= RCU_TORTURE_PIPE_LEN) {
+ smp_store_release(&rp->rtort_pipe_count, ++i);
+ if (i >= RCU_TORTURE_PIPE_LEN) {
rp->rtort_mbtest = 0;
return true;
}
@@ -1408,8 +1408,7 @@ rcu_torture_writer(void *arg)
if (i > RCU_TORTURE_PIPE_LEN)
i = RCU_TORTURE_PIPE_LEN;
atomic_inc(&rcu_torture_wcount[i]);
- WRITE_ONCE(old_rp->rtort_pipe_count,
- old_rp->rtort_pipe_count + 1);
+ smp_store_release(&old_rp->rtort_pipe_count, ++i);
// Make sure readers block polled grace periods.
if (cur_ops->get_gp_state && cur_ops->poll_gp_state) {
@@ -1991,7 +1990,7 @@ static bool rcu_torture_one_read(struct torture_random_state *trsp, long myid)
rcu_torture_reader_do_mbchk(myid, p, trsp);
rtrsp = rcutorture_loop_extend(&readstate, trsp, rtrsp);
preempt_disable();
- pipe_count = READ_ONCE(p->rtort_pipe_count);
+ pipe_count = smp_load_acquire(&p->rtort_pipe_count);
if (pipe_count > RCU_TORTURE_PIPE_LEN) {
/* Should not happen, but... */
pipe_count = RCU_TORTURE_PIPE_LEN;
^ permalink raw reply related [relevance 93%]
* Re: [PATCH v2] x86: disable non-instrumented version of copy_mc when KMSAN is enabled
@ 2024-03-07 0:09 99% ` Linus Torvalds
0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-03-07 0:09 UTC (permalink / raw)
To: Tetsuo Handa
Cc: Alexander Potapenko, Marco Elver, Dmitry Vyukov, kasan-dev, LKML,
the arch/x86 maintainers, Thomas Gleixner, Ingo Molnar,
Borislav Petkov, Dave Hansen, H. Peter Anvin
On Wed, 6 Mar 2024 at 14:08, Tetsuo Handa
<penguin-kernel@i-love.sakura.ne.jp> wrote:
>
> Something like below one?
I'd rather leave the regular fallbacks (to memcpy and copy_to_user())
alone, and I'd just put the
kmsan_memmove(dst, src, len - ret);
etc in the places that currently just call the MC copy functions.
The copy_mc_to_user() logic is already set up for that, since it has
to do the __uaccess_begin/end().
Changing copy_mc_to_kernel() to look visually the same would only
improve on this horror-show, I feel.
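
The "len - ret" arithmetic can be sketched in userspace. Everything here is a hypothetical stand-in: "copy_mc_sim()" imitates a copy that stops at a simulated fault and returns the number of bytes NOT copied, and "mark_copied_sim()" only records how many bytes an annotation such as the proposed kmsan_memmove() would be asked to cover.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for a machine-check-safe copy: copies up to a
 * simulated fault offset and returns the number of bytes NOT copied.
 */
static size_t copy_mc_sim(void *dst, const void *src, size_t len,
                          size_t fault_at)
{
    size_t ok = fault_at < len ? fault_at : len;

    memcpy(dst, src, ok);
    return len - ok;
}

/* Hypothetical annotation hook: remembers how many bytes were marked. */
static size_t marked_bytes;

static void mark_copied_sim(void *dst, const void *src, size_t n)
{
    (void)dst;
    (void)src;
    marked_bytes = n;
}

/* Annotate only the bytes that were really copied. */
static size_t copy_and_annotate(void *dst, const void *src, size_t len,
                                size_t fault_at)
{
    size_t ret = copy_mc_sim(dst, src, len, fault_at);

    /* 0 <= ret <= len holds here, so len - ret cannot underflow. */
    mark_copied_sim(dst, src, len - ret);
    return ret;
}
```

The sketch relies on the same invariant the question asks about: the return value never exceeds the requested length.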
Obviously some kmsan person needs to validate your kmsan_memmove() thing, but
> Can we assume that 0 <= ret <= len is always true?
Yes. It had better be for other reasons.
Linus
^ permalink raw reply [relevance 99%]
* Re: [PATCH] rcutorture: Fix rcu_torture_pipe_update_one()/rcu_torture_writer() data race and concurrency bug
2024-03-06 19:46 92% ` Linus Torvalds
@ 2024-03-06 20:20 97% ` Linus Torvalds
1 sibling, 0 replies; 200+ results
From: Linus Torvalds @ 2024-03-06 20:20 UTC (permalink / raw)
To: Steven Rostedt
Cc: Paul E. McKenney, linke li, joel, boqun.feng, dave, frederic,
jiangshanlai, josh, linux-kernel, mathieu.desnoyers,
qiang.zhang1211, quic_neeraju, rcu
On Wed, 6 Mar 2024 at 11:46, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> That 'rtort_pipe_count' should be an atomic_t, and the "add one and
> return the old value" should be an "atomic_inc_return()-1" (the "-1"
> is because "inc_return" returns the *new* value).
Bah. I am lost in a twisty maze of operations, all the same.
One final correction to myself: if you want the old value, the nicer
thing to use is probably just "atomic_fetch_inc()".
It generates the same result as "atomic_inc_return()-1", but since we
do have that native "return old value" variant of this, let's just use
it.
So the rules are "atomic_op_return()" returns the new value after the
op, and "atomic_fetch_op()" returns the old value.
For some ops, this matters more than for others. For 'add' like
operations, you can deduce the old from the new (and vice versa).
But for bitwise ops, only the 'fetch' version makes much sense,
because you can see the end result from that, but you can't figure out
the original value from the final one.
And to *really* confuse things, as with the memory ordering variants,
we don't always have the full complement of operations.
So we have atomic_fetch_and() (returns old version) and atomic_and()
(doesn't return any version), but we don't have "atomic_and_return()"
because it's less useful.
But for 'inc' we have all three.
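
The naming rule maps directly onto C11, which only offers the "fetch" form. A small userspace sketch, with hypothetical "sim_" wrappers that are not the kernel API:

```c
#include <stdatomic.h>

/* "fetch" form: returns the value from BEFORE the operation. */
static inline int sim_atomic_fetch_inc(atomic_int *v)
{
    return atomic_fetch_add(v, 1);
}

/* "return" form: returns the value from AFTER the operation. */
static inline int sim_atomic_inc_return(atomic_int *v)
{
    return atomic_fetch_add(v, 1) + 1;
}

/*
 * Bitwise "fetch" form: the caller gets the old value and can compute
 * the new one as (old & mask); the reverse is not possible because the
 * cleared bits are gone.
 */
static inline int sim_atomic_fetch_and(atomic_int *v, int mask)
{
    return atomic_fetch_and(v, mask);
}
```

For the increment, either form can be derived from the other by adding or subtracting one; for the mask, only the old value carries enough information.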
Linus
^ permalink raw reply [relevance 97%]
* Re: [PATCH] rcutorture: Fix rcu_torture_pipe_update_one()/rcu_torture_writer() data race and concurrency bug
@ 2024-03-06 20:06 94% ` Linus Torvalds
0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-03-06 20:06 UTC (permalink / raw)
To: Steven Rostedt
Cc: Paul E. McKenney, linke li, joel, boqun.feng, dave, frederic,
jiangshanlai, josh, linux-kernel, mathieu.desnoyers,
qiang.zhang1211, quic_neeraju, rcu
On Wed, 6 Mar 2024 at 11:45, Steven Rostedt <rostedt@goodmis.org> wrote:
>
> Here's the back story. I received the following patch:
>
> https://lore.kernel.org/all/tencent_BA1473492BC618B473864561EA3AB1418908@qq.com/
>
> I didn't like it. My reply was:
>
> > - rbwork->wait_index++;
> > + WRITE_ONCE(rbwork->wait_index, READ_ONCE(rbwork->wait_index) + 1);
>
> I mean the above is really ugly. If this is the new thing to do, we need
> better macros.
>
> If anything, just convert it to an atomic_t.
The right thing is definitely to convert it to an atomic_t.
The memory barriers can probably also be turned into atomic ordering,
although we don't always have all the variants.
But for example, that
    /* Make sure to see the new wait index */
    smp_rmb();
    if (wait_index != work->wait_index)
        break;
looks odd, and should probably do an "atomic_read_acquire()" instead
of a rmb and a (non-atomic and non-READ_ONCE) thing.
The first READ_ONCE() should probably also be that atomic_read_acquire() op.
On the writing side, my gut feel is that the
    rbwork->wait_index++;
    /* make sure the waiters see the new index */
    smp_wmb();
should be an "atomic_inc_release(&rbwork->wait_index);" but we don't
actually have that operation. We only have the "release" versions for
things that return a value.
So it would probably need to be either
    atomic_inc(&rbwork->wait_index);
    /* make sure the waiters see the new index */
    smp_wmb();
or
    atomic_inc_return_release(&rbwork->wait_index);
or we'd need to add the "basic atomics with ordering semantics" (which
we aren't going to do unless we end up with a lot more people who want
them).
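As a rough illustration of the release/acquire pairing suggested above, here is a userspace sketch that assumes C11 <stdatomic.h> in place of the kernel's atomic_long_t. The struct and function names are hypothetical and do not come from the ring buffer code:

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for the structure holding 'wait_index'. */
struct waiter_state {
    atomic_long wait_index;
};

/*
 * Writer side: bump the index with release ordering, so stores made
 * before the bump are visible to a reader that observes the new index
 * with an acquire load. Returns the new index.
 */
static inline long wake_waiters(struct waiter_state *s)
{
    return atomic_fetch_add_explicit(&s->wait_index, 1,
                                     memory_order_release) + 1;
}

/* Reader side: acquire load, pairs with the release increment above. */
static inline long read_wait_index(struct waiter_state *s)
{
    return atomic_load_explicit(&s->wait_index, memory_order_acquire);
}

/* Loop exit test for a waiter: has the index moved since we sampled it? */
static inline int index_changed(struct waiter_state *s, long sampled)
{
    return read_wait_index(s) != sampled;
}

static inline void wait_index_demo(void)
{
    struct waiter_state s;
    long sampled;

    atomic_init(&s.wait_index, 0);
    sampled = read_wait_index(&s);
    assert(!index_changed(&s, sampled));
    assert(wake_waiters(&s) == 1);
    assert(index_changed(&s, sampled));
}
```

Whether release/acquire is actually the ordering the real code needs is exactly the open question raised in this mail; the sketch only shows the shape of the pairing.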
I dunno. I didn't look all *that* closely at the code. The above might
be garbage too. Somebody who actually knows the code should think
about what ordering they actually were looking for.
(And I note that 'wait_index' is of type 'long' in 'struct
rb_irq_work', so I guess it should be "atomic_long_t" instead - just
shows how little attention I paid on the first read-through, which
should make everybody go "I need to double-check Linus here")
Linus
^ permalink raw reply [relevance 94%]
* Re: [PATCH] rcutorture: Fix rcu_torture_pipe_update_one()/rcu_torture_writer() data race and concurrency bug
@ 2024-03-06 19:46 92% ` Linus Torvalds
2024-03-06 20:20 97% ` Linus Torvalds
0 siblings, 2 replies; 200+ results
From: Linus Torvalds @ 2024-03-06 19:46 UTC (permalink / raw)
To: Steven Rostedt
Cc: Paul E. McKenney, linke li, joel, boqun.feng, dave, frederic,
jiangshanlai, josh, linux-kernel, mathieu.desnoyers,
qiang.zhang1211, quic_neeraju, rcu
On Wed, 6 Mar 2024 at 11:27, Steven Rostedt <rostedt@goodmis.org> wrote:
>
> Note this has nothing to do with tracing. This thread is in RCU. I just
> happen to receive the same patch "fix" for my code.
Ok, googling for rtort_pipe_count, I can only state that that code is
complete garbage.
And no amount of READ_ONCE/WRITE_ONCE will fix it.
For one thing, we have this code:
    WRITE_ONCE(rp->rtort_pipe_count, i + 1);
    if (rp->rtort_pipe_count >= RCU_TORTURE_PIPE_LEN) {
which is broken by design. The compiler is allowed to (and probably
does) turn that into just
    WRITE_ONCE(rp->rtort_pipe_count, i + 1);
    if (i + 1 >= RCU_TORTURE_PIPE_LEN) {
which only results in the question "Why didn't the source code do that
obvious simplification itself?"
So that code is actively *STUPID*. It's randomly mixing READ_ONCE and
regular reads in ways that just makes me go: "there's no saving this
shit".
This needs fixing. Having tests that have random code in them only
makes me doubt that the *TEST* itself is correct, rather than the code
it is trying to actually test.
And dammit, none of that makes sense anyway. This is not some
performance-critical code. Why is it not using proper atomics if there
is an actual data race?
The reason to use READ_ONCE() and WRITE_ONCE() is that they can be a
lot faster than atomics, or - more commonly - because you have some
fundamental algorithm that doesn't do arithmetic, but cares about some
"state at time X" (the RCU _pointer_ being one such obvious case, but
doing an *increment* sure as hell isn't).
So using those READ_ONCE/WRITE_ONCE macros for that thing is
fundamentally wrong to begin with.
The question should not be "should we add another READ_ONCE()". The
question should be "what drugs were people on when writing this code"?
People - please just stop writing garbage.
That 'rtort_pipe_count' should be an atomic_t, and the "add one and
return the old value" should be an "atomic_inc_return()-1" (the "-1"
is because "inc_return" returns the *new* value).
And feel free to add "_relaxed()" to that atomic op because this code
doesn't care about ordering of that counter. It will help on some
architectures, but as mentioned, this is not performance-critical code
to begin with.
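A small sketch of what "use the value the atomic op returned" could look like, assuming userspace C11 atomics with relaxed ordering rather than the kernel API. PIPE_LEN and pipe_count_advance() are illustrative names, not the rcutorture code:

```c
#include <assert.h>
#include <stdatomic.h>

#define PIPE_LEN 10  /* hypothetical stand-in for RCU_TORTURE_PIPE_LEN */

/*
 * Increment the counter and report whether it has reached the end of
 * the pipe. The decision is based on the value returned by the atomic
 * operation itself, never on a second, separate read of the counter.
 */
static inline int pipe_count_advance(atomic_int *count)
{
    int old = atomic_fetch_add_explicit(count, 1, memory_order_relaxed);

    return old + 1 >= PIPE_LEN;
}

static inline void pipe_count_demo(void)
{
    atomic_int count;

    atomic_init(&count, PIPE_LEN - 2);
    assert(pipe_count_advance(&count) == 0);  /* 8 -> 9, not yet full */
    assert(pipe_count_advance(&count) == 1);  /* 9 -> 10, reached end */
    assert(atomic_load(&count) == PIPE_LEN);
}
```

Because the comparison uses the returned value, two concurrent callers can never both act on the same count.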
Linus
^ permalink raw reply [relevance 92%]
* Re: [PATCH] rcutorture: Fix rcu_torture_pipe_update_one()/rcu_torture_writer() data race and concurrency bug
2024-03-06 19:01 95% ` Linus Torvalds
@ 2024-03-06 19:27 95% ` Linus Torvalds
1 sibling, 1 reply; 200+ results
From: Linus Torvalds @ 2024-03-06 19:27 UTC (permalink / raw)
To: Steven Rostedt
Cc: Paul E. McKenney, linke li, joel, boqun.feng, dave, frederic,
jiangshanlai, josh, linux-kernel, mathieu.desnoyers,
qiang.zhang1211, quic_neeraju, rcu
On Wed, 6 Mar 2024 at 11:01, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> In some individual tracing C file where it has a comment above it how
> it's braindamaged and unsafe and talking about why it's ok in that
> particular context? Go wild.
Actually, I take that back.
Even in a random C file, the naming makes no sense. There's no "once" about it.
So if you want to do something like
    #define UNSAFE_INCREMENTISH(x) (WRITE_ONCE(x, READ_ONCE(x) + 1))
then that's fine, I guess. Because that's what the operation is.
It's not safe, and it's not an increment, but it _approximates_ an
increment most of the time. So UNSAFE_INCREMENTISH() pretty much
perfectly describes what it is doing.
Note that you'll also almost certainly end up with worse code
generation, ie don't expect to see a single "inc" instruction (or "add
$1") for the above.
Because at least for gcc, the volatiles involved with those "ONCE"
operations end up often generating much worse code, so rather than an
"inc" instruction, you'll almost certainly get "load+add+store" and
the inevitable code expansion and extra register use.
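To make the "approximates an increment most of the time" point concrete, here is a deterministic single-threaded replay of one bad interleaving. It assumes simplified, int-only userspace approximations of READ_ONCE/WRITE_ONCE (plain volatile accesses); the function names are invented for this sketch:

```c
#include <assert.h>

/* Simplified userspace approximations of the kernel macros, int only. */
static inline int read_once_int(const int *p)
{
    return *(const volatile int *)p;
}

static inline void write_once_int(int *p, int val)
{
    *(volatile int *)p = val;
}

/*
 * Replay the interleaving where both "CPUs" load the counter before
 * either of them stores. Each side does load + add + store, and the
 * second store silently overwrites the first.
 */
static inline int replay_lost_update(int start)
{
    int counter = start;
    int cpu0 = read_once_int(&counter);  /* CPU 0 loads          */
    int cpu1 = read_once_int(&counter);  /* CPU 1 loads the same */

    write_once_int(&counter, cpu0 + 1);  /* CPU 0 stores         */
    write_once_int(&counter, cpu1 + 1);  /* CPU 1 overwrites it  */
    return counter;
}

static inline void lost_update_demo(void)
{
    /* Two "increments" happened, but the counter moved by only one. */
    assert(replay_lost_update(0) == 1);
}
```

The volatile accesses guarantee that each load and store happens exactly once, but nothing makes the load+add+store sequence atomic.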
I really don't know what you want to do, but it smells bad. A big
comment about why you'd want that "incrementish" operation will be
needed.
To me, this smells like "Steven did something fundamentally wrong
again, some tool is now complaining about it, and Steven doesn't want
to fix the problem but instead paper over it again".
Not a good look.
But I don't have a link to the original report, and I'm not thrilled
enough about this to go looking for it.
Linus
^ permalink raw reply [relevance 95%]
* Re: [PATCH] rcutorture: Fix rcu_torture_pipe_update_one()/rcu_torture_writer() data race and concurrency bug
@ 2024-03-06 19:01 95% ` Linus Torvalds
2024-03-06 19:27 95% ` Linus Torvalds
0 siblings, 2 replies; 200+ results
From: Linus Torvalds @ 2024-03-06 19:01 UTC (permalink / raw)
To: Steven Rostedt
Cc: Paul E. McKenney, linke li, joel, boqun.feng, dave, frederic,
jiangshanlai, josh, linux-kernel, mathieu.desnoyers,
qiang.zhang1211, quic_neeraju, rcu
On Wed, 6 Mar 2024 at 10:53, Steven Rostedt <rostedt@goodmis.org> wrote:
>
> Now, are you OK with an addition of ADD_ONCE() and/or INC_ONCE()? So that we
> don't have to look at:
>
> WRITE_ONCE(a, READ_ONCE(a) + 1);
>
> ?
In a generic header file under include/linux/?
Absolutely not. The above is a completely broken operation. There is
no way in hell we should expose it as a general helper.
So there is no way we'd add that kind of sh*t-for-brains operation in
(for example) our <asm/rwonce.h> header file next to the normal
READ/WRITE_ONCE defines.
In some individual tracing C file where it has a comment above it how
it's braindamaged and unsafe and talking about why it's ok in that
particular context? Go wild.
But honestly, I do not see when a ADD_ONCE() would ever be a valid
thing to do, and *if* it's a valid thing to do, why you'd do it with
READ_ONCE and WRITE_ONCE.
If you don't care about races, just do a simple "++" and be done with
it. The end result is random.
Adding a "ADD_ONCE()" macro doesn't make it one whit less random. It
just makes a broken concept even uglier.
So honestly, I think the ADD_ONCE macro not only needs to be in some
tracing-specific C file, the comment needs to be pretty damn big too.
Because as a random number generator, it's not even a very good one.
So you need to explain *why* you want a particularly bad random number
generator in the first place.
Linus
^ permalink raw reply [relevance 95%]
* Re: [PATCH] rcutorture: Fix rcu_torture_pipe_update_one()/rcu_torture_writer() data race and concurrency bug
@ 2024-03-06 18:43 87% ` Linus Torvalds
0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-03-06 18:43 UTC (permalink / raw)
To: Steven Rostedt
Cc: Paul E. McKenney, linke li, joel, boqun.feng, dave, frederic,
jiangshanlai, josh, linux-kernel, mathieu.desnoyers,
qiang.zhang1211, quic_neeraju, rcu
On Wed, 6 Mar 2024 at 09:59, Steven Rostedt <rostedt@goodmis.org> wrote:
>
> IIRC, the original purpose of READ_ONCE() and WRITE_ONCE() was to make sure
> that the compiler only reads or writes the variable "once". Hence the name.
> That way after a load, you don't need to worry that the content of the
> variable you read isn't going to be read again from the original location
> because the compiler decided to save stack space and registers.
>
> But that macro has now been extended for other purposes.
Not really.
Tearing of simple types (as opposed to structures or bitfields or
"more than one word" or whatever) has never really been a real
concern.
It keeps being brought up as a "compilers could do this", but it's
basically just BS fear-mongering. Compilers _don't_ do it, and the
reason why compilers don't do it isn't some "compilers are trying to
be nice" issue, but simply a "it is insane and generates worse code"
issue.
So what happens is that READ_ONCE() and WRITE_ONCE() have always been
about reading and writing *consistent* values. There is no locking,
but the idea is - and has always been - that you get one *single*
answer from READ_ONCE(), and that single answer will always be
consistent with something that has been written by WRITE_ONCE.
That's often useful - lots of code doesn't really care if you get the
old or the new value, but the code *does* care that it gets *one*
value, and not some random mix of "I tested one value for validity,
then it got reloaded due to register pressure, and I actually used
another value".
And not some "I read one value, and it was a mix of two other values".
But in order to get those semantics, the READ_ONCE() and WRITE_ONCE()
macros don't do just the 'volatile' (to get the "no reloads"
guarantee), but they also do that "simple types" check.
So READ_ONCE/WRITE_ONCE has never really been "extended for other
purposes". The purpose has always been the same: one single consistent
value.
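The "one single consistent value" usage can be sketched as follows, assuming a simplified size_t-only userspace stand-in for READ_ONCE. TABLE_SIZE and lookup_once() are hypothetical names used only for this illustration:

```c
#include <assert.h>
#include <stddef.h>

#define TABLE_SIZE 4

/* Simplified userspace approximation of READ_ONCE() for size_t. */
static inline size_t read_once_size(const size_t *p)
{
    return *(const volatile size_t *)p;
}

/*
 * Validate and use the *same* snapshot of a shared index. Without the
 * single load, the compiler could legally reload *shared_idx after the
 * bounds check, and a concurrent writer could slip a bad index in
 * between the test and the use. Returns -1 for an out-of-range index.
 */
static inline int lookup_once(const int *table, const size_t *shared_idx)
{
    size_t idx = read_once_size(shared_idx);  /* exactly one load */

    if (idx >= TABLE_SIZE)
        return -1;
    return table[idx];
}

static inline void lookup_once_demo(void)
{
    const int table[TABLE_SIZE] = { 10, 20, 30, 40 };
    size_t idx = 2;

    assert(lookup_once(table, &idx) == 30);
    idx = 7;
    assert(lookup_once(table, &idx) == -1);
}
```

The code does not care whether it sees the old or the new index, only that the checked value and the used value are the same one.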
What did happen is that our *original* name for this was not "read vs
write", but just "access".
So instead of "READ_ONCE(x)" you'd do "ACCESS_ONCE(x)", and instead of
"WRITE_ONCE(x,y)" you'd do "ACCESS_ONCE(x) = y".
And, to make matters more interesting, we had code that did that on
things that were *not* simple values. IOW, we'd have things like
ACCESS_ONCE() on things that literally *couldn't* be accessed as one
single value.
The most notable was accessing page table entries, which on multiple
architectures (including plain old 32-bit x86) ended up being two
words.
So the extension that *did* happen is that READ_ONCE and WRITE_ONCE
actually verify that the type is simple, and that you can't do a
64-bit READ_ONCE on a 32-bit architecture. Because then while you
might guarantee that the value isn't reloaded multiple times, you
cannot guarantee that you actually get a value that is consistent with
a WRITE_ONCE (because the reads and writes are both two operations).
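A compile-time "simple type" check of the kind described above might look roughly like this. It is an assumption-laden sketch: fits_in_one_word() is a made-up macro, and the kernel's real check in its rwonce headers may differ in detail:

```c
#include <assert.h>

/*
 * Hypothetical size check: true only for types whose size matches one
 * of the machine's basic integer sizes, so that a single load or store
 * instruction can plausibly cover the whole object.
 */
#define fits_in_one_word(t) \
    (sizeof(t) == sizeof(char) || sizeof(t) == sizeof(short) || \
     sizeof(t) == sizeof(int)  || sizeof(t) == sizeof(long))

/* Stand-in for something like a two-word page table entry. */
struct two_words {
    long lo;
    long hi;
};

/* Simple scalars pass the check... */
static_assert(fits_in_one_word(int), "int is a single access");
static_assert(fits_in_one_word(long), "long is a single access");

/* ...while a two-word object is rejected at build time. */
static_assert(!fits_in_one_word(struct two_words),
              "a two-word struct cannot be read as one value");
```

With such a check wired into the accessor macro, an ACCESS_ONCE()-style use on a two-word object fails to build instead of silently producing a value that may mix two writes.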
Now, we've gotten rid of the whole ACCESS_ONCE() thing, and so some of
that history is no longer visible (although you can still see that
pattern in the rseq self-tests).
So yes, READ_ONCE/WRITE_ONCE do control "tearing", but realistically,
it was always only about the "complex values" kind of tearing that the
old ACCESS_ONCE() model silently and incorrectly allowed.
Linus
^ permalink raw reply [relevance 87%]
* Re: linux-next: build warning after merge of the vfs-brauner tree
@ 2024-03-06 4:47 99% ` Linus Torvalds
0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-03-06 4:47 UTC (permalink / raw)
To: Stephen Rothwell
Cc: Christian Brauner, Tong Tiangen, Linux Kernel Mailing List,
Linux Next Mailing List
On Tue, 5 Mar 2024 at 20:37, Stephen Rothwell <sfr@canb.auug.org.au> wrote:
>
> +static struct page *dump_page_copy(struct page *src, struct page *dst)
> +{
> + return NULL;
> +}
No, it needs to be "return src;" not NULL.
That
    #define dump_page_copy(src, dst) ((dst), (src))
was supposed to be a "use 'dst', return 'src'" macro, and is correct
as that. The problem - as you noticed - is that it causes that "left
side of comma expression has no effect" warning.
(Technically it *does* have an effect - exactly the "argument is used"
one - but the compiler warning does make sense).
Actually, the simplest thing to do is probably just
    #define dump_page_free(x) ((void)(x))
    #define dump_page_copy(src, dst) (src)
where the "use" of the 'dump_page' argument is that dump_page_free()
void cast, and dump_page_copy() simply doesn't need to use it at all.
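The two warning-free variants discussed in this thread can be tried in isolation. This is a userspace sketch with a dummy 'struct page'; the macro name dump_page_copy_cast is invented here so that both variants can coexist in one file:

```c
#include <assert.h>

struct page { int id; };  /* dummy stand-in for the kernel's struct page */

/*
 * Variant A: keep the comma expression, but cast the left operand to
 * void so the compiler knows that discarding it is intentional.
 */
#define dump_page_copy_cast(src, dst) ((void)(dst), (src))

/*
 * Variant B: the copy stub ignores 'dst' entirely, and the free stub
 * provides the "use" of the variable through its void cast.
 */
#define dump_page_free(x) ((void)(x))
#define dump_page_copy(src, dst) (src)

static inline void dump_page_stub_demo(void)
{
    struct page src = { 1 };
    struct page scratch = { 2 };

    /* Both stubs hand back the source page unchanged. */
    assert(dump_page_copy_cast(&src, &scratch) == &src);
    assert(dump_page_copy(&src, &scratch) == &src);
    dump_page_free(&scratch);
}
```

Either way the stub evaluates to 'src', which is the property the "return NULL" inline function in the quoted patch got wrong.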
Christian?
Linus
^ permalink raw reply [relevance 99%]
* Re: linux-next: build warning after merge of the vfs-brauner tree
@ 2024-03-06 2:48 99% ` Linus Torvalds
0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-03-06 2:48 UTC (permalink / raw)
To: Stephen Rothwell
Cc: Christian Brauner, Tong Tiangen, Linux Kernel Mailing List,
Linux Next Mailing List
On Tue, 5 Mar 2024 at 15:51, Stephen Rothwell <sfr@canb.auug.org.au> wrote:
>
> fs/coredump.c: In function 'dump_user_range':
> fs/coredump.c:923:40: warning: left-hand operand of comma expression has no effect [-Wunused-value]
> 923 | #define dump_page_copy(src, dst) ((dst), (src))
> | ^
> fs/coredump.c:948:58: note: in expansion of macro 'dump_page_copy'
> 948 | int stop = !dump_emit_page(cprm, dump_page_copy(page, dump_page));
> | ^~~~~~~~~~~~~~
>
> Introduced by commit
>
> 4630f2caafcd ("coredump: get machine check errors early rather than during iov_iter")
Bah. It comes from that
    #define dump_page_copy(src,dst) ((dst),(src))
and I did it that way because I wanted to avoid *another* warning,
namely the "dst not used" thing.
But it would have probably been better to either make it an inline
function, or maybe an explicit cast, eg
    #define dump_page_copy(src,dst) ((void)(dst),(src))
or whatever.
Linus
^ permalink raw reply [relevance 99%]
* Re: [PATCH v2] x86: disable non-instrumented version of copy_mc when KMSAN is enabled
@ 2024-03-05 17:57 93% ` Linus Torvalds
0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-03-05 17:57 UTC (permalink / raw)
To: Tetsuo Handa, Alexander Potapenko, Marco Elver, Dmitry Vyukov
Cc: LKML, the arch/x86 maintainers, Thomas Gleixner, Ingo Molnar,
Borislav Petkov, Dave Hansen, H. Peter Anvin
[ For the KMSAN people I brought in: this is the patch I'm NAK'ing:
https://lore.kernel.org/all/3b7dbd88-0861-4638-b2d2-911c97a4cadf@I-love.SAKURA.ne.jp/
and it looks like you were already cc'd on earlier versions (which
were even more broken) ]
On Tue, 5 Mar 2024 at 03:31, Tetsuo Handa
<penguin-kernel@i-love.sakura.ne.jp> wrote:
>
> Ping?
Please don't add new people and 'ping' without context. Very annoying.
That said, after having to search for it, that whole patch is
disgusting. Why make duplicated complex conditionals when you could
have just had the tests inside one #ifndef?
Also, that patch means that a KMSAN kernel potentially simply no
longer works on admittedly crappy hardware that almost doesn't exist.
So now a debug feature changes actual semantics in a big way. Not ok.
So I think this patch is ugly but also doubly incorrect.
I think the KMSAN people need to tell us how to tell kmsan that it's a
memcpy (and about the "I'm going to touch this part of memory", needed
for the "copy_mc_to_user" side).
So somebody needs to abstract out that
    depot_stack_handle_t origin;
    if (!kmsan_enabled || kmsan_in_runtime())
        return;
    kmsan_enter_runtime();
    /* Using memmove instead of memcpy doesn't affect correctness. */
    kmsan_internal_memmove_metadata(dst, (void *)src, n);
    kmsan_leave_runtime();
    set_retval_metadata(shadow, origin);
kind of thing, and expose it as a helper function for "I did something
that looks like a memory copy", the same way that we currently have
kmsan_copy_page_meta()
Because NO, IT IS NEVER CORRECT TO USE __msan_memcpy FOR THE MC COPIES.
So no. NAK on that patch. It's completely and utterly wrong.
The onus is firmly on the KMSAN people to give kernel people a way to
tell KMSAN to shut the f&%^ up about that.
End result: don't bother the x86 people until KMSAN has the required support.
Linus
^ permalink raw reply [relevance 93%]
* Re: [PATCH] coredump: get machine check errors early rather than during iov_iter
@ 2024-03-05 17:29 71% ` Linus Torvalds
0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-03-05 17:29 UTC (permalink / raw)
To: Jens Axboe
Cc: Christian Brauner, Tong Tiangen, linux-fsdevel, linux-kernel,
wangkefeng.wang, Guohanjun, David Howells, Al Viro,
Alexander Viro, Jan Kara, Andrew Morton
[-- Attachment #1: Type: text/plain, Size: 548 bytes --]
On Tue, 5 Mar 2024 at 08:39, Jens Axboe <axboe@kernel.dk> wrote:
>
> For what it's worth, checking the two patches, it's basically the one
> that Linus sent. I think it should have a From: based on that, and I
> also do not see Linus actually signing off on the patch, though that
> has been added to this one.
>
> Would probably be sane to get this one resent before applying, properly
> done.
I have a sign-off in my own test-tree, so it's all ok.
Sending my changelog just in case somebody wants to mix-and-match the two.
Linus
[-- Attachment #2: 0001-iov_iter-get-rid-of-copy_mc-flag.patch --]
[-- Type: text/x-patch, Size: 8640 bytes --]
From 1077a0a82d0f9b93df4d66a63c5f758b11dc1bbb Mon Sep 17 00:00:00 2001
From: Linus Torvalds <torvalds@linux-foundation.org>
Date: Sat, 2 Mar 2024 09:35:13 -0800
Subject: [PATCH] iov_iter: get rid of 'copy_mc' flag
This flag is only set by one single user: the magical core dumping code
that looks up user pages one by one, and then writes them out using
their kernel addresses (by using a BVEC_ITER).
That actually ends up being a huge problem, because while we do use
copy_mc_to_kernel() for this case and it is able to handle the possible
machine checks involved, nothing else is really ready to handle the
failures caused by the machine check.
In particular, as reported by Tong Tiangen, we don't actually support
fault_in_iov_iter_readable() on a machine check area.
As a result, the usual logic for writing things to a file under a
filesystem lock, which involves doing a copy with page faults disabled
and then if that fails trying to fault pages in without holding the
locks with fault_in_iov_iter_readable() does not work at all.
We could decide to always just make the MC copy "succeed" (and filling
the destination with zeroes), and that would then create a core dump
file that just ignores any machine checks.
But honestly, this single special case has been problematic before, and
means that all the normal iov_iter code ends up slightly more complex
and slower.
See for example commit c9eec08bac96 ("iov_iter: Don't deal with
iter->copy_mc in memcpy_from_iter_mc()") where David Howells
re-organized the code just to avoid having to check the 'copy_mc' flags
inside the inner iov_iter loops.
So considering that we have exactly one user, and that one user is a
non-critical special case that doesn't actually ever trigger in real
life (Tong found this with manual error injection), the sane solution is
to just decide that the onus of handling the machine check lies on that
user instead.
Ergo, do the copy_mc_to_kernel() in the core dump logic itself, copying
the user data to a stable kernel page before writing it out.
Reported-by: Tong Tiangen <tongtiangen@huawei.com>
Link: https://lore.kernel.org/all/4e80924d-9c85-f13a-722a-6a5d2b1c225a@huawei.com/
Reviewed-by: David Howells <dhowells@redhat.com>
Tested-by: David Howells <dhowells@redhat.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
fs/coredump.c | 41 ++++++++++++++++++++++++++++++++++++++---
include/linux/uio.h | 16 ----------------
lib/iov_iter.c | 23 -----------------------
3 files changed, 38 insertions(+), 42 deletions(-)
diff --git a/fs/coredump.c b/fs/coredump.c
index f258c17c1841..6a9b9f3280d8 100644
--- a/fs/coredump.c
+++ b/fs/coredump.c
@@ -872,6 +872,9 @@ static int dump_emit_page(struct coredump_params *cprm, struct page *page)
loff_t pos;
ssize_t n;
+ if (!page)
+ return 0;
+
if (cprm->to_skip) {
if (!__dump_skip(cprm, cprm->to_skip))
return 0;
@@ -884,7 +887,6 @@ static int dump_emit_page(struct coredump_params *cprm, struct page *page)
pos = file->f_pos;
bvec_set_page(&bvec, page, PAGE_SIZE, 0);
iov_iter_bvec(&iter, ITER_SOURCE, &bvec, 1, PAGE_SIZE);
- iov_iter_set_copy_mc(&iter);
n = __kernel_write_iter(cprm->file, &iter, &pos);
if (n != PAGE_SIZE)
return 0;
@@ -895,10 +897,40 @@ static int dump_emit_page(struct coredump_params *cprm, struct page *page)
return 1;
}
+/*
+ * If we might get machine checks from kernel accesses during the
+ * core dump, let's get those errors early rather than during the
+ * IO. This is not performance-critical enough to warrant having
+ * all the machine check logic in the iovec paths.
+ */
+#ifdef copy_mc_to_kernel
+
+#define dump_page_alloc() alloc_page(GFP_KERNEL)
+#define dump_page_free(x) __free_page(x)
+static struct page *dump_page_copy(struct page *src, struct page *dst)
+{
+ void *buf = kmap_local_page(src);
+ size_t left = copy_mc_to_kernel(page_address(dst), buf, PAGE_SIZE);
+ kunmap_local(buf);
+ return left ? NULL : dst;
+}
+
+#else
+
+#define dump_page_alloc() ((struct page *)8) // Not NULL
+#define dump_page_free(x) do { } while (0)
+#define dump_page_copy(src,dst) ((dst),(src))
+
+#endif
+
int dump_user_range(struct coredump_params *cprm, unsigned long start,
unsigned long len)
{
unsigned long addr;
+ struct page *dump_page = dump_page_alloc();
+
+ if (!dump_page)
+ return 0;
for (addr = start; addr < start + len; addr += PAGE_SIZE) {
struct page *page;
@@ -912,14 +944,17 @@ int dump_user_range(struct coredump_params *cprm, unsigned long start,
*/
page = get_dump_page(addr);
if (page) {
- int stop = !dump_emit_page(cprm, page);
+ int stop = !dump_emit_page(cprm, dump_page_copy(page, dump_page));
put_page(page);
- if (stop)
+ if (stop) {
+ dump_page_free(dump_page);
return 0;
+ }
} else {
dump_skip(cprm, PAGE_SIZE);
}
}
+ dump_page_free(dump_page);
return 1;
}
#endif
diff --git a/include/linux/uio.h b/include/linux/uio.h
index bea9c89922d9..00cebe2b70de 100644
--- a/include/linux/uio.h
+++ b/include/linux/uio.h
@@ -40,7 +40,6 @@ struct iov_iter_state {
struct iov_iter {
u8 iter_type;
- bool copy_mc;
bool nofault;
bool data_source;
size_t iov_offset;
@@ -248,22 +247,8 @@ size_t _copy_from_iter_flushcache(void *addr, size_t bytes, struct iov_iter *i);
#ifdef CONFIG_ARCH_HAS_COPY_MC
size_t _copy_mc_to_iter(const void *addr, size_t bytes, struct iov_iter *i);
-static inline void iov_iter_set_copy_mc(struct iov_iter *i)
-{
- i->copy_mc = true;
-}
-
-static inline bool iov_iter_is_copy_mc(const struct iov_iter *i)
-{
- return i->copy_mc;
-}
#else
#define _copy_mc_to_iter _copy_to_iter
-static inline void iov_iter_set_copy_mc(struct iov_iter *i) { }
-static inline bool iov_iter_is_copy_mc(const struct iov_iter *i)
-{
- return false;
-}
#endif
size_t iov_iter_zero(size_t bytes, struct iov_iter *);
@@ -355,7 +340,6 @@ static inline void iov_iter_ubuf(struct iov_iter *i, unsigned int direction,
WARN_ON(direction & ~(READ | WRITE));
*i = (struct iov_iter) {
.iter_type = ITER_UBUF,
- .copy_mc = false,
.data_source = direction,
.ubuf = buf,
.count = count,
diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index e0aa6b440ca5..cf2eb2b2f983 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -166,7 +166,6 @@ void iov_iter_init(struct iov_iter *i, unsigned int direction,
WARN_ON(direction & ~(READ | WRITE));
*i = (struct iov_iter) {
.iter_type = ITER_IOVEC,
- .copy_mc = false,
.nofault = false,
.data_source = direction,
.__iov = iov,
@@ -244,27 +243,9 @@ size_t _copy_mc_to_iter(const void *addr, size_t bytes, struct iov_iter *i)
EXPORT_SYMBOL_GPL(_copy_mc_to_iter);
#endif /* CONFIG_ARCH_HAS_COPY_MC */
-static __always_inline
-size_t memcpy_from_iter_mc(void *iter_from, size_t progress,
- size_t len, void *to, void *priv2)
-{
- return copy_mc_to_kernel(to + progress, iter_from, len);
-}
-
-static size_t __copy_from_iter_mc(void *addr, size_t bytes, struct iov_iter *i)
-{
- if (unlikely(i->count < bytes))
- bytes = i->count;
- if (unlikely(!bytes))
- return 0;
- return iterate_bvec(i, bytes, addr, NULL, memcpy_from_iter_mc);
-}
-
static __always_inline
size_t __copy_from_iter(void *addr, size_t bytes, struct iov_iter *i)
{
- if (unlikely(iov_iter_is_copy_mc(i)))
- return __copy_from_iter_mc(addr, bytes, i);
return iterate_and_advance(i, bytes, addr,
copy_from_user_iter, memcpy_from_iter);
}
@@ -633,7 +614,6 @@ void iov_iter_kvec(struct iov_iter *i, unsigned int direction,
WARN_ON(direction & ~(READ | WRITE));
*i = (struct iov_iter){
.iter_type = ITER_KVEC,
- .copy_mc = false,
.data_source = direction,
.kvec = kvec,
.nr_segs = nr_segs,
@@ -650,7 +630,6 @@ void iov_iter_bvec(struct iov_iter *i, unsigned int direction,
WARN_ON(direction & ~(READ | WRITE));
*i = (struct iov_iter){
.iter_type = ITER_BVEC,
- .copy_mc = false,
.data_source = direction,
.bvec = bvec,
.nr_segs = nr_segs,
@@ -679,7 +658,6 @@ void iov_iter_xarray(struct iov_iter *i, unsigned int direction,
BUG_ON(direction & ~1);
*i = (struct iov_iter) {
.iter_type = ITER_XARRAY,
- .copy_mc = false,
.data_source = direction,
.xarray = xarray,
.xarray_start = start,
@@ -703,7 +681,6 @@ void iov_iter_discard(struct iov_iter *i, unsigned int direction, size_t count)
BUG_ON(direction != READ);
*i = (struct iov_iter){
.iter_type = ITER_DISCARD,
- .copy_mc = false,
.data_source = false,
.count = count,
.iov_offset = 0
--
2.44.0.rc1.22.g64314bd58b
^ permalink raw reply related [relevance 71%]
* Re: [GIT PULL] tracing: Prevent trace_marker being bigger than unsigned short
@ 2024-03-05 0:17 99% ` Linus Torvalds
0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-03-05 0:17 UTC (permalink / raw)
To: Steven Rostedt; +Cc: LKML, Masami Hiramatsu, Mathieu Desnoyers, Sachin Sant
On Mon, 4 Mar 2024 at 15:50, Steven Rostedt <rostedt@goodmis.org> wrote:
>
> But this still isn't fixing anything. It's just adding a limit.
Limiting things to a common maximum size is a good thing. The kernel
limits much more important things for very good reasons.
The kernel really shouldn't have big strings. EVER. And it literally
shows in our kernel infrastructure. It showed in that vsnprintf
precision thing. It shows in our implementation choices, where we tend
to have simplistic implementations because doing things a byte at a
time is simple and cheap when the strings are limited in size (and we
don't want fancy and can't use vector state anyway).
If something as core as a pathname can be limited to 4kB, then
something as unimportant as a trace string had better be limited too.
Because we simply DO NOT WANT to have to deal with longer strings in
the kernel.
Linus
^ permalink raw reply [relevance 99%]
* Re: [GIT PULL] tracing: Prevent trace_marker being bigger than unsigned short
@ 2024-03-04 23:20 98% ` Linus Torvalds
0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-03-04 23:20 UTC (permalink / raw)
To: Steven Rostedt; +Cc: LKML, Masami Hiramatsu, Mathieu Desnoyers, Sachin Sant
On Mon, 4 Mar 2024 at 14:08, Steven Rostedt <rostedt@goodmis.org> wrote:
>
> Fine, I'll just remove the precision as that's not needed. There was no
> other overflows involved here.
I really want you to add the size check on the trace buffer *creation* side.
I don't understand why you refuse to accept the fact that the
precision warning found a PROBLEM.
And no, the fix was never to paper over the problem by limiting the
precision field. Hiding a problem isn't fixing it.
And no, the fix was also never to chop up the printing of the string
in smaller pieces to paper over the precision field. Again,
hiding a problem isn't fixing it.
And finally, NO, the fix was also never to add extra debug code to see
that there was a NUL character there.
The fix was *always* to simply not accept insanely long strings in the
first place, and make sure that the field was correctly *set*.
IOW, at *creation* time the code needed a proper check for length
(which obviously indirectly includes checking for the terminating NUL
character at that point).
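A creation-time check of that kind could be sketched like this in plain userspace C. MARKER_MAX, struct marker_entry and marker_record() are hypothetical names, and the limit is arbitrary; none of this is the actual tracing code:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define MARKER_MAX 64  /* hypothetical limit, not the kernel's value */

struct marker_entry {
    char buf[MARKER_MAX + 1];  /* room for the terminating NUL */
};

/*
 * Reject oversized input when the entry is created. On success the
 * entry is always NUL-terminated, so every later consumer can use a
 * plain "%s" without precision tricks or extra checks.
 */
static inline int marker_record(struct marker_entry *e,
                                const char *src, size_t len)
{
    if (len > MARKER_MAX)
        return -EINVAL;
    memcpy(e->buf, src, len);
    e->buf[len] = '\0';
    return 0;
}

static inline void marker_record_demo(void)
{
    struct marker_entry e;
    char big[MARKER_MAX + 2];

    memset(big, 'x', sizeof(big));
    assert(marker_record(&e, "hello", 5) == 0);
    assert(strcmp(e.buf, "hello") == 0);
    /* Too long: refused up front, existing entry left untouched. */
    assert(marker_record(&e, big, sizeof(big)) == -EINVAL);
    assert(strcmp(e.buf, "hello") == 0);
}
```

Truncating to the limit instead of returning an error would be an equally valid policy; the point is only that the limit is enforced where the string enters the buffer.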
Why do these threads with you always have to end up this long? Why do
I have to explain every single step of the way that you need to *FIX*
the problem, not try to hide it with new extra code?
Linus
^ permalink raw reply [relevance 98%]
* Re: [GIT PULL] tracing: Prevent trace_marker being bigger than unsigned short
@ 2024-03-04 21:50 99% ` Linus Torvalds
0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-03-04 21:50 UTC (permalink / raw)
To: Steven Rostedt; +Cc: LKML, Masami Hiramatsu, Mathieu Desnoyers, Sachin Sant
On Mon, 4 Mar 2024 at 13:40, Steven Rostedt <rostedt@goodmis.org> wrote:
>
> As I mentioned that the design is based on that the allocated buffer size is
> the string length rounded up to the word size, all I need to do is to make
> sure that there's a nul terminating byte within the last word of the
> allocated buffer. Then "%s" is all I need.
Please don't add pointless code that helps nothing.
> Would this work for you?
No. This code only adds debug code, and doesn't actually improve anything.
We *have* debug code already. Things like KASAN already find array
overruns, and your ex-tempore debug code adds zero actual value.
That, btw, is why your old stupid precision code was not only
triggering warnings, but was ACTIVELY DETRIMENTAL.
All that precision code could ever do was to potentially hide bugs if
the string wasn't NUL-terminated.
So no. I absolutely do NOT want you to write more code to hide bugs or
do half-arsed checking.
I want you to *simplify* the code, and put proper limits in place for strings.
I want to see the code that actually notices when somebody generates a
crazy string, and stops that garbage in its tracks.
What I do *not* want to see is more ad-hoc code that tries to deal
with the symptoms of you not having done so.
Linus
^ permalink raw reply [relevance 99%]
* Re: [bug report] dead loop in generic_perform_write() //Re: [PATCH v7 07/12] iov_iter: Convert iterate*() to inline funcs
@ 2024-03-04 18:32 99% ` Linus Torvalds
0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-03-04 18:32 UTC (permalink / raw)
To: David Howells
Cc: Tong Tiangen, Al Viro, Jens Axboe, Christoph Hellwig,
Christian Brauner, David Laight, Matthew Wilcox, Jeff Layton,
linux-fsdevel, linux-block, linux-mm, netdev, linux-kernel,
Kefeng Wang
On Mon, 4 Mar 2024 at 03:56, David Howells <dhowells@redhat.com> wrote:
>
> That said, I wonder if:
>
> #ifdef copy_mc_to_kernel
>
> should be:
>
> #ifdef CONFIG_ARCH_HAS_COPY_MC
Hmm. Maybe. We do have that
#ifdef copy_mc_to_kernel
pattern already in <linux/uaccess.h>, so clearly we've done it both ways.
I personally like the "just test for the thing you are using" model,
which is then why I did it that way, but I don't have hugely strong
opinions on it.
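As a rough illustration of the "just test for the thing you are using" model described above, here is a minimal user-space sketch. The names demo_copy_mc and DEMO_ARCH_HAS_COPY_MC are hypothetical and are not taken from the kernel headers; the point is only that an implementation which defines a same-named macro lets generic code test for the function itself rather than for a separate config symbol.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical "arch" side: an architecture with a special copy routine
 * defines the function AND a macro of the same name. Build with
 * -DDEMO_ARCH_HAS_COPY_MC to simulate such an architecture.
 */
#ifdef DEMO_ARCH_HAS_COPY_MC
static inline unsigned long demo_copy_mc(void *dst, const void *src, size_t n)
{
	assert(dst && src);
	memcpy(dst, src, n);	/* stand-in for a machine-check-safe copy */
	return 0;		/* number of bytes NOT copied */
}
#define demo_copy_mc demo_copy_mc
#endif

/*
 * Hypothetical "generic" side: test for the thing being used, and only
 * provide the plain fallback when nobody defined it.
 */
#ifndef demo_copy_mc
static inline unsigned long demo_copy_mc(void *dst, const void *src, size_t n)
{
	assert(dst && src);
	memcpy(dst, src, n);
	return 0;		/* number of bytes NOT copied */
}
#endif
```

Callers simply use demo_copy_mc() and never need to know which of the two definitions they got.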
> and whether it's possible to find out dynamically if MCEs can occur at all.
I really wanted to do something like that, and look at the source page
to decide "is this a pmem page that can cause machine checks", but I
didn't find any obvious way to do that.
Improvement suggestions more than welcome.
Linus
^ permalink raw reply [relevance 99%]
* Linux 6.8-rc7
@ 2024-03-03 21:15 51% Linus Torvalds
0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-03-03 21:15 UTC (permalink / raw)
To: Linux Kernel Mailing List
So we finally have a week where things have calmed down, and in fact
6.8-rc7 is smaller than usual at this point in time. So if that keeps
up (but that's a fairly notable "if") I won't feel like I need to do
an rc8 this release after all.
So no guarantees, but assuming no bad surprises, we'll have the final
6.8 next weekend.
You can see the rc7 fixes in the shortlog below, and I don't think
there's anything particularly notable in there. It's not only fairly
small for an rc7, all the stats look fairly normal: just over half of
the diff is driver fixes, with the rest being a fairly random mix of
arch updates (powerpc and RISC-V dominate - although "dominate" may
not be the right word when it's all pretty small), some filesystem fixes
(btrfs stands out), some core networking and mm fixes, and some more
networking selftest updates.
It really is all pretty small. Let's hope it stays that way,
Linus
---
Abel Vesa (1):
phy: qualcomm: eusb2-repeater: Rework init to drop redundant zero-out loop
Alex Deucher (1):
Revert "drm/amd/pm: resolve reboot exception for si oland"
Alexander Ofitserov (1):
gtp: fix use-after-free and null-ptr-deref in gtp_newlink()
Alexander Stein (1):
phy: freescale: phy-fsl-imx8-mipi-dphy: Fix alias name to use dashes
Alexandre Ghiti (3):
riscv: Fix build error if !CONFIG_ARCH_ENABLE_HUGEPAGE_MIGRATION
Revert "riscv: mm: support Svnapot in huge vmap"
riscv: Fix pte_leaf_size() for NAPOT
Amritha Nambiar (1):
ice: Fix ASSERT_RTNL() warning during certain scenarios
Andre Werner (1):
net: smsc95xx: add support for SYS TEC USB-SPEmodule1
Andy Shevchenko (1):
gpiolib: Fix the error path order in gpiochip_add_data_with_key()
Aneesh Kumar K.V (IBM) (1):
mm/debug_vm_pgtable: fix BUG_ON with pud advanced test
Ard Biesheuvel (3):
crypto: arm64/neonbs - fix out-of-bounds access on short input
efivarfs: Drop redundant cleanup on fill_super() failure
efivarfs: Drop 'duplicates' bool parameter on efivar_init()
Arkadiusz Kubalewski (4):
ice: fix dpll input pin phase_adjust value updates
ice: fix dpll and dpll_pin data access on PF reset
ice: fix dpll periodic work data updates on PF reset
ice: fix pin phase adjust updates on PF reset
Arnd Bergmann (3):
efi/capsule-loader: fix incorrect allocation size
scsi: mpi3mr: Reduce stack usage in mpi3mr_refresh_sas_ports()
drm/xe/mmio: fix build warning for BAR resize on 32-bit
Arturas Moskvinas (1):
gpio: 74x164: Enable output pins after registers are reset
Bart Van Assche (1):
fs/aio: Make io_cancel() generate completions again
Bartosz Golaszewski (1):
gpio: fix resource unwinding order in error path
Benjamin Berg (1):
wifi: iwlwifi: mvm: ensure offloading TID queue exists
Bjorn Andersson (1):
pmdomain: qcom: rpmhpd: Fix enabled_corner aggregation
Byungchul Park (1):
mm/vmscan: fix a bug calling wakeup_kswapd() with a wrong zone index
Christian König (1):
drm/ttm/tests: depend on UML || COMPILE_TEST
Christophe Kerello (1):
mmc: mmci: stm32: fix DMA API overlapping mappings warning
Christophe Leroy (1):
kunit: Fix again checksum tests on big endian CPUs
Colin Ian King (1):
ASoC: qcom: Fix uninitialized pointer dmactl
Conor Dooley (1):
RISC-V: Ignore V from the riscv,isa DT property on older T-Head CPUs
Cristian Marussi (1):
pmdomain: arm: Fix NULL dereference on scmi_perf_domain removal
Curtis Klein (1):
dmaengine: fsl-qdma: init irq after reg initialization
Dave Airlie (1):
nouveau: report byte usage in VRAM usage.
David Howells (1):
afs: Fix endless loop in directory parsing
David Sterba (1):
btrfs: dev-replace: properly validate device names
Davide Caratti (1):
mptcp: fix double-free on socket dismantle
Dimitris Vlachos (1):
riscv: Sparse-Memory/vmemmap out-of-bounds fix
Dmitry Baryshkov (2):
phy: qcom-qmp-usb: fix v3 offsets data
Revert "drm/msm/dp: use drm_bridge_hpd_notify() to report HPD
status changes"
Doug Smythies (1):
cpufreq: intel_pstate: fix pstate limits enforcement for
adjust_perf call back
Elad Nachman (3):
mtd: rawnand: marvell: fix layouts
mmc: sdhci-xenon: fix PHY init clock stability
mmc: sdhci-xenon: add timeout for PHY init complete
Emmanuel Grumbach (1):
wifi: iwlwifi: mvm: fix the TXF mapping for BZ devices
Eniac Zhang (1):
ALSA: hda/realtek: fix mute/micmute LED For HP mt440
Eric Dumazet (3):
ipv6: fix potential "struct net" leak in inet6_rtm_getaddr()
dpll: rely on rcu for netdev_dpll_pin()
dpll: fix build failure due to rcu_dereference_check() on unknown type
Fei Wu (1):
perf: RISCV: Fix panic on pmu overflow handler
Felix Fietkau (1):
wifi: mac80211: only call drv_sta_rc_update for uploaded stations
Fenghua Yu (2):
dmaengine: idxd: Remove shadow Event Log head stored in idxd
dmaengine: idxd: Ensure safe user copy of completion record
Filipe Manana (6):
btrfs: send: don't issue unnecessary zero writes for trailing hole
btrfs: fix data races when accessing the reserved amount of block reserves
btrfs: fix data race at btrfs_use_block_rsv() when accessing block reserve
btrfs: fix race between ordered extent completion and fiemap
btrfs: ensure fiemap doesn't race with writes when
FIEMAP_FLAG_SYNC is given
btrfs: fix double free of anonymous device after snapshot creation failure
Florian Westphal (4):
netlink: add nla be16/32 types to minlen array
net: ip_tunnel: prevent perpetual headroom growth
netfilter: bridge: confirm multicast packets before passing them
up the stack
selftests: netfilter: add bridge conntrack + multicast test case
Francois Dugast (1):
drm/xe/uapi: Remove unused flags
Frank Li (2):
dmaengine: fsl-edma: correct max_segment_size setting
dmaengine: fsl-qdma: add __iomem and struct in union to fix sparse warning
Frédéric Danis (1):
Bluetooth: mgmt: Fix limited discoverable off timeout
Gaurav Batra (1):
powerpc/pseries/iommu: IOMMU table is not initialized for kdump
over SR-IOV
Geliang Tang (3):
mptcp: map v4 address to v6 when destroying subflow
selftests: mptcp: rm subflow with v4/v4mapped addr
selftests: mptcp: join: add ss mptcp support check
Geoff Levand (1):
ps3/gelic: Fix SKB allocation
Gergo Koteles (1):
ALSA: hda/realtek: tas2781: enable subwoofer volume control
Haiyue Wang (1):
Documentations: correct net_cachelines title for struct inet_sock
Han Xu (1):
mtd: spinand: gigadevice: Fix the get ecc status issue
Hans Peter (1):
ALSA: hda/realtek: Enable Mute LED on HP 840 G8 (MB 8AB8)
Hans de Goede (1):
power: supply: bq27xxx-i2c: Do not free non existing IRQ
Herbert Xu (1):
crypto: lskcipher - Copy IV in lskcipher glue code always
Ignat Korchagin (1):
netfilter: nf_tables: allow NFPROTO_INET in nft_(match/target)_validate()
Ivan Semenov (1):
mmc: core: Fix eMMC initialization with 1-bit bus connection
Jakub Kicinski (4):
net: veth: clear GRO when clearing XDP even when down
selftests: net: veth: test syncing GRO and XDP state while device is down
veth: try harder when allocating queue memory
tools: ynl: fix handling of multiple mcast groups
Jakub Raczynski (1):
stmmac: Clear variable when destroying workqueue
Janaki Ramaiah Thota (1):
Bluetooth: hci_qca: Set BDA quirk bit if fwnode exists in DT
Jaroslav Kysela (1):
ALSA: pcm: clarify and fix default msbits value for all formats
Jason Gunthorpe (1):
iommufd/selftest: Don't check map/unmap pairing with HUGE_PAGES
Javier Carrasco (1):
net: usb: dm9601: fix wrong return value in dm9601_mdio_read
Jay Ajit Mate (1):
ALSA: hda/realtek: Fix top speaker connection on Dell Inspiron
16 Plus 7630
Jeff Johnson (2):
MAINTAINERS: wifi: update Jeff Johnson e-mail address
MAINTAINERS: wifi: Add N: ath1*k entries to match .yaml files
Jeremy Kerr (1):
net: mctp: take ownership of skb in mctp_local_output
Jiawei Wang (2):
ASoC: amd: yc: add new YC platform variant (0x63) support
ASoC: amd: yc: Fix non-functional mic on Lenovo 21J2
Jiri Bohac (1):
x86/e820: Don't reserve SETUP_RNG_SEED in e820
Jiri Slaby (SUSE) (1):
fbcon: always restore the old font data in fbcon_do_set_font()
Jisheng Zhang (1):
riscv: tlb: fix __p*d_free_tlb()
Johan Hovold (4):
drm/bridge: aux-hpd: fix OF node leaks
drm/bridge: aux-hpd: separate allocation and registration
soc: qcom: pmic_glink_altmode: fix drm bridge use-after-free
Bluetooth: hci_bcm4377: do not mark valid bd_addr as invalid
Johannes Berg (1):
wifi: nl80211: reject iftype change with mesh ID change
Johannes Thumshirn (1):
btrfs: zoned: don't skip block group profile checks on conventional zones
Johnny Hsieh (1):
ASoC: amd: yc: Add Lenovo ThinkBook 21J0 into DMI quirk table
Jonas Dreßler (1):
Bluetooth: hci_sync: Check the correct flag before starting a scan
José Roberto de Souza (1):
drm/xe/uapi: Remove DRM_XE_VM_BIND_FLAG_ASYNC comment left over
Joy Zou (1):
dmaengine: fsl-edma: correct calculation of 'nbytes' in
multi-fifo scenario
Justin Iurman (1):
uapi: in6: replace temporary label with rfc9486
Kai-Heng Feng (1):
Bluetooth: Enforce validation on max value of connection interval
Kailang Yang (1):
ALSA: hda/realtek - ALC285 reduce pop noise from Headphone port
Kory Maincent (6):
dmaengine: dw-edma: Fix the ch_count hdma callback
dmaengine: dw-edma: Fix wrong interrupt bit set for HDMA
dmaengine: dw-edma: HDMA_V0_REMOTEL_STOP_INT_EN typo fix
dmaengine: dw-edma: Add HDMA remote interrupt configuration
dmaengine: dw-edma: HDMA: Add sync read before starting the DMA
transfer in remote setup
dmaengine: dw-edma: eDMA: Add sync read before starting the DMA
transfer in remote setup
Kurt Kanzenbach (1):
net: stmmac: Complete meta data only when enabled
Lin Ma (1):
rtnetlink: fix error logic of IFLA_BRIDGE_FLAGS writing back
Linus Torvalds (1):
Linux 6.8-rc7
Lorenzo Stoakes (1):
MAINTAINERS: add memory mapping entry with reviewers
Lucas De Marchi (1):
drm/xe: Use pointers in trace events
Luiz Augusto von Dentz (2):
Bluetooth: hci_sync: Fix accept_list when attempting to suspend
Bluetooth: hci_event: Fix handling of HCI_EV_IO_CAPA_REQUEST
Lukas Bulwahn (1):
MAINTAINERS: repair entry for MICROCHIP MCP16502 PMIC DRIVER
Lukasz Majewski (2):
net: hsr: Fix typo in the hsr_forward_do() function comment
net: hsr: Use correct offset for HSR TLV values in supervisory HSR frames
Ma Jun (1):
drm/amdgpu/pm: Fix the power1_min_cap value
Maarten Lankhorst (1):
drm/xe: Add uapi for dumpable bos
Marco Elver (2):
stackdepot: use variable size records for non-evictable entries
kasan: revert eviction of stack traces in generic mode
Mark Brown (1):
spi: Drop mismerged fix
Mark O'Donovan (1):
fs/ntfs3: fix build without CONFIG_NTFS3_LZX_XPRESS
Masami Hiramatsu (Google) (1):
fprobe: Fix to allocate entry_data_size buffer with rethook instances
Matthew Auld (3):
drm/buddy: fix range bias
drm/buddy: check range allocation matches alignment
drm/tests/drm_buddy: add alloc_range_bias test
Matthew Brost (3):
drm/xe: Fix execlist splat
drm/xe: Don't support execlists in xe_gt_tlb_invalidation layer
drm/xe: Use vmalloc for array of bind allocation in bind IOCTL
Matthieu Baerts (NGI0) (1):
mptcp: avoid printing warning once on client side
Michael Ellerman (1):
selftests/powerpc: Fix fpu_signal failures
Mickaël Salaün (3):
selinux: fix lsm_get_self_attr()
apparmor: fix lsm_get_self_attr()
landlock: Fix asymmetric private inodes referring
Mika Kuoppala (2):
drm/xe: Expose user fence from xe_sync_entry
drm/xe: Deny unbinds if uapi ufence pending
Mikko Perttunen (1):
gpu: host1x: Skip reset assert on Tegra186
Ming Lei (1):
block: define bvec_iter as __packed __aligned(4)
Miquel Raynal (1):
mtd: Fix possible refcounting issue when going through partition nodes
Naresh Solanki (1):
regulator: max5970: Fix regulator child node name
Nathan Chancellor (2):
kbuild: Add -Wa,--fatal-warnings to as-instr invocation
RISC-V: Drop invalid test from CONFIG_AS_HAS_OPTION_ARCH
Nathan Lynch (1):
powerpc/rtas: use correct function name for resetting TCE tables
Nhat Pham (1):
mm: cachestat: fix folio read-after-free in cache walk
Nicolin Chen (3):
iommufd: Fix iopt_access_list_id overwrite bug
iommufd/selftest: Fix mock_dev_num bug
iommufd: Fix protection fault in iommufd_test_syz_conv_iova
Oleksij Rempel (3):
lan78xx: enable auto speed configuration for LAN7850 if no
EEPROM is detected
net: lan78xx: fix "softirq work is pending" error
igb: extend PTP timestamp adjustments to i211
Paolo Abeni (5):
mptcp: push at DSS boundaries
mptcp: fix snd_wnd initialization for passive socket
mptcp: fix potential wake-up event loss
mptcp: fix possible deadlock in subflow diag
selftests: mptcp: explicitly trigger the listener diag code-path
Paolo Bonzini (2):
x86/cpu: Allow reducing x86_phys_bits during early_identify_cpu()
x86/cpu/intel: Detect TME keyid bits before setting MTRR mask registers
Paulo Zanoni (1):
drm/xe: get rid of MAX_BINDS
Peng Ma (1):
dmaengine: fsl-qdma: fix SoC may hang on 16 byte unaligned read
Prike Liang (1):
drm/amdgpu: Enable gpu reset for S3 abort cases on Raven series
Priyanka Dandamudi (2):
drm/xe/xe_bo_move: Enhance xe_bo_move trace
drm/xe/xe_trace: Add move_lacks_source detail to xe_bo_move trace
Rafael J. Wysocki (1):
Revert "ACPI: EC: Use a spin lock without disabing interrupts"
Randy Dunlap (1):
net: ethernet: adi: move PHYLIB from vendor to driver symbol
Ranjan Kumar (1):
scsi: mpt3sas: Prevent sending diag_reset when the controller is ready
Richard Fitzgerald (2):
ASoC: cs35l56: Must clear HALO_STATE before issuing SYSTEM_RESET
ASoC: soc-card: Fix missing locking in snd_soc_card_get_kcontrol()
Rob Clark (1):
soc: qcom: pmic_glink: Fix boot when QRTR=m
Ryan Lin (1):
drm/amd/display: Add monitor patch for specific eDP
Ryosuke Yasuoka (1):
netlink: Fix kernel-infoleak-after-free in __skb_datagram_iter
Sabrina Dubroca (4):
tls: decrement decrypt_pending if no async completion will be called
tls: fix peeking with sync+async decryption
tls: separate no-async decryption request handling from async
tls: fix use-after-free on failed backlog decryption
Samuel Holland (4):
MAINTAINERS: Update SiFive driver maintainers
riscv: Fix enabling cbo.zero when running in M-mode
riscv: Add a custom ISA extension for the [ms]envcfg CSR
riscv: Save/restore envcfg CSR during CPU suspend
Saravana Kannan (1):
of: property: fw_devlink: Fix stupid bug in remote-endpoint parsing
Shannon Nelson (3):
ionic: check before releasing pci regions
ionic: check cmd_regs before copying in or out
ionic: restore netdev feature bits after reset
Shiyang Ruan (1):
xfs: drop experimental warning for FSDAX
Sid Pranjale (1):
drm/nouveau: keep DMA buffers required for suspend/resume
Srinivasan Shanmugam (1):
drm/amd/display: Prevent potential buffer overflow in map_hw_resources
Tadeusz Struk (1):
dmaengine: ptdma: use consistent DMA masks
Takashi Iwai (2):
ALSA: ump: Fix the discard error code from snd_ump_legacy_open()
ALSA: Drop leftover snd-rtctimer stuff from Makefile
Takashi Sakamoto (2):
ALSA: firewire-lib: fix to check cycle continuity
firewire: core: use long bus reset on gap count error
Tetsuo Handa (1):
tomoyo: fix UAF write bug in tomoyo_write_control()
Thierry Reding (1):
drm/tegra: Remove existing framebuffer only if we support display
Thomas Weißschuh (1):
power: supply: mm8013: select REGMAP_I2C
Théo Lebrun (4):
spi: cadence-qspi: fix pointer reference in runtime PM hooks
spi: cadence-qspi: remove system-wide suspend helper calls from
runtime PM hooks
spi: cadence-qspi: put runtime in runtime PM hooks names
spi: cadence-qspi: add system-wide suspend and resume callbacks
Tim Schumacher (1):
efivarfs: Request at most 512 bytes for variable names
Vadim Shakirov (2):
drivers: perf: added capabilities for legacy PMU
drivers: perf: ctr_get_width function for legacy is not defined
Vladimir Oltean (1):
net: dpaa: fman_memac: accept phy-interface-type = "10gbase-r"
in the device tree
Willian Wang (1):
ALSA: hda/realtek: Add special fixup for Lenovo 14IRP8
Xiubo Li (1):
ceph: switch to corrected encoding of max_xattr_size in mdsmap
Yang Yingliang (1):
phy: qcom: phy-qcom-m31: fix wrong pointer pass to PTR_ERR()
Yangyu Chen (1):
riscv: mm: fix NOCACHE_THEAD does not set bit[61] correctly
Ying Hsu (1):
Bluetooth: Avoid potential use-after-free in hci_error_reset
Yochai Hagvi (1):
ice: fix connection state of DPLL and out pin
Yuezhang Mo (1):
exfat: fix appending discontinuous clusters to empty file
Yunjian Wang (1):
tun: Fix xdp_rxq_info's queue_index when detaching
Yuxuan Hu (1):
Bluetooth: rfcomm: Fix null-ptr-deref in rfcomm_check_security
Zhangfei Gao (1):
iommu/sva: Fix SVA handle sharing in multi device case
Zijun Hu (3):
Bluetooth: hci_event: Fix wrongly recorded wakeup BD_ADDR
Bluetooth: qca: Fix wrong event type for patch config command
Bluetooth: qca: Fix triggering coredump implementation
Zong Li (1):
riscv: add CALLER_ADDRx support
^ permalink raw reply [relevance 51%]
* Re: [GIT PULL] tracing: Prevent trace_marker being bigger than unsigned short
@ 2024-03-03 20:09 79% ` Linus Torvalds
0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-03-03 20:09 UTC (permalink / raw)
To: Steven Rostedt; +Cc: LKML, Masami Hiramatsu, Mathieu Desnoyers, Sachin Sant
On Sun, 3 Mar 2024 at 11:07, Steven Rostedt <rostedt@goodmis.org> wrote:
>
> The string in question isn't some random string. It's a print event on
> the ring buffer where the size is strlen(p) rounded up to word size.
> That means, max will be no bigger than word-size - 1 greater than
> strlen(p). That means the chunks of 1024 will never land in the middle
> of garbage.
What a piece of unbelievable crap.
So you didn't actually want the precision in the first place, then you
started limiting it to an insane value because the printk code
complains about insane precision, and then you want to "fix" that by
printing it out in chunks where you know the chunk size won't hit
garbage, but that ends up being a random implementation detail that
you don't even document in that chunking code.
> > What was wrong with saying "don't do that"? You seem to be bending
> > over backwards to doing stupid things, and then insisting on doing
> > them entirely wrong.
>
> Don't do what?
You have this pattern of not actually thinking through the code AT
ALL, and just fixing symptoms, and then making the code worse.
The whole "let's avoid the symptom of the kernel printk code telling
us that 32kB string precision is crazy by putting a 32kB-1 limit on
it" was clearly just papering over a symptom, not fixing the problem.
Doing insane chunking in 1kB pieces was another "let's paper over the
symptom, not fix the problem".
And now you finally admit that the actual problem was that YOUR
PRECISION WAS STUPID TO BEGIN WITH.
Do you really not see what the truly _fundamental_ problem here is?
Kernel code doesn't "paper over" stuff. We do things *right*. No more
of this crap.
You really need to take a deep look at what you are doing. I spend
more time on your pull requests than I want to, exactly because you
have had this pattern of doing something wrong in the first place, and
then adding MORE CODE to paper over all the problems that initial
wrong decision causes.
This was *exactly* the same thing that happened on the tracefs
side. You did things wrong, and then you spent a lot of effort in
trying to patch up the resulting problems, instead of going back and
doing it *right*.
And honestly, I still think that the fundamental mistake you have done
is to let people say "I want to have these big strings" and you just
roll over and say "sure, we'll create shit kernel code for you".
WTF do you think it's fine to say "let people do insane things"
instead of just telling people that no, we have sane and small limits.
As a maintainer, one of your jobs is to say "No, we're not doing crazy
stuff". I still think that having so big strings that this came up in
the first place is a sign of the deeper problem, and then the fact
that you had an insane and pointless precision field was just a small
implementation issue.
Doing tracing in the kernel is not some kind of general-purpose thing.
It's ok - in fact, it's a really damn good idea - to just tell people
"yes, you can add strings, but dammit, there needs to be sanity to
it".
So I now tell you that you should
(a) get rid of the stupid and nonsensical precision
(b) tell people that their strings are limited (and that 4kB is an
_upper_ value to sane string lengths in the kernel)
(c) really fundamentally stop with the "paper over" things approach
to kernel programming
Large strings are not a "feature". They are a bug.
It's also sad that apparently your strings are counted, but you don't
count them very well, so instead of just using the count (which is
*much* cheaper) you end up using '%s' and do things until you hit a
NUL byte. Guess what? All our printk infrastructure is designed for
small strings, so '%s' isn't exactly optimized, because we expected
sanity. It ends up in a loop that literally does things one byte at a
time.
And no. The solution is *not* to paper that over by making '%s'
printing more efficient. It's not supposed to need that kind of
efficiency.
Christ. *IF* large strings were a good idea, and you actually almost
have the length encoded, this whole tracing code could have used the
fact that you have that approximate count to do something like
len = fieldsize-1;
len &= ~(sizeof(unsigned long)-1);
len += strlen(fieldbase + len);
seq_write(buf, fieldbase, len);
and at least now you'd have something *efficient*. Which is at least a
source of pride in itself.
But since I really think the core of the problem is "we shouldn't have
allowed this kind of crap", I think efficiency - while a source of
pride - is polishing a turd and more of the "paper over the
fundamental issue".
And no, the above code sequence isn't wonderfully pretty either. Using
'strlen()' to find the last NUL character in a word is disgusting, and
our kernel 'strlen()' isn't some optimized thing either.
We do have fancier cases for fancier code (the word-at-a-time stuff),
but at least the above only walks things one byte at a time for a tiny
sequence.
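For readers who want to try the length computation above outside the kernel, here is a small user-space sketch. It assumes (as the thread describes, but this is still an assumption of the sketch) that the field size is the string length plus its NUL, rounded up to sizeof(unsigned long), so the terminating NUL always sits in the last word. The helper names demo_field_size and demo_field_strlen are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical: buffer size for a string of 'slen' chars plus its NUL,
 * rounded up to a whole number of words. */
static size_t demo_field_size(size_t slen)
{
	const size_t w = sizeof(unsigned long);

	return (slen + 1 + w - 1) / w * w;
}

/*
 * Hypothetical helper mirroring the quoted sequence: 'base' points to
 * 'fieldsize' bytes, fieldsize is a non-zero multiple of the word size,
 * and the terminating NUL lies somewhere in the final word.
 */
static size_t demo_field_strlen(const char *base, size_t fieldsize)
{
	size_t len;

	assert(fieldsize > 0 && fieldsize % sizeof(unsigned long) == 0);

	len = fieldsize - 1;
	len &= ~(sizeof(unsigned long) - 1);	/* offset of the last word */
	len += strlen(base + len);		/* scans at most one word */
	return len;
}
```

The byte-at-a-time scan is then bounded by one word, regardless of how long the string is. If the NUL is not in the last word, the result is wrong, which is why the layout assumption matters.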
Linus
^ permalink raw reply [relevance 79%]
* Re: [GIT PULL] tracing: Prevent trace_marker being bigger than unsigned short
@ 2024-03-03 17:38 99% ` Linus Torvalds
0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-03-03 17:38 UTC (permalink / raw)
To: Steven Rostedt; +Cc: LKML, Masami Hiramatsu, Mathieu Desnoyers, Sachin Sant
On Sun, 3 Mar 2024 at 04:59, Steven Rostedt <rostedt@goodmis.org> wrote:
>
> - trace_seq_printf(s, ": %.*s", max, field->buf);
> + trace_seq_puts(s, ": ");
> + /* Write 1K chunks at a time */
> + p = field->buf;
> + do {
> + int pre = max > 1024 ? 1024 : max;
> + trace_seq_printf(s, "%.*s", pre, p);
> + max -= pre;
> + p += pre;
> + } while (max > 0);
The above loop is complete garbage.
If 'p' is a string, you're randomly just walking past the end of the
string with 'p += pre'
And if 'p' isn't a string but has a fixed size, you shouldn't use '%s'
in the first place, you should just use seq_write().
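To make the objection concrete, the following user-space sketch replays the quoted chunk loop with snprintf() standing in for trace_seq_printf(). The struct and function names are hypothetical, and the caller must own at least 'max' bytes so the replay itself stays in bounds. It reports how far the pointer moved versus how many characters "%.*s" actually produced.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical result record for the replay below. */
struct demo_walk {
	int advanced;	/* bytes the pointer moved */
	int emitted;	/* characters actually formatted */
};

/*
 * Replays the 1K chunk loop over a caller-owned buffer of at least
 * 'max' bytes (max must be positive).
 */
static struct demo_walk demo_chunk_walk(const char *buf, int max)
{
	struct demo_walk w = { 0, 0 };
	char out[1025];
	const char *p = buf;

	assert(buf != NULL && max > 0);

	do {
		int pre = max > 1024 ? 1024 : max;

		/* "%.*s" stops at the first NUL or after 'pre' chars */
		w.emitted += snprintf(out, sizeof(out), "%.*s", pre, p);
		max -= pre;
		p += pre;	/* moves on even if the string ended earlier */
		w.advanced += pre;
	} while (max > 0);

	return w;
}
```

With a short string in a large buffer, 'advanced' ends up far larger than 'emitted': the loop keeps stepping through bytes that lie beyond the end of the string. If the data is really a counted buffer, a counted write of the known length avoids the question entirely.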
Just stop. You are doing things entirely wrong, and you're just adding
random code.
I'm not taking *any* fixes from you as things are now, you're once
again only making things worse.
What was wrong with saying "don't do that"? You seem to be bending
over backwards to doing stupid things, and then insisting on doing
them entirely wrong.
Linus
^ permalink raw reply [relevance 99%]
* Re: arch/x86/include/asm/processor.h:698:16: sparse: sparse: incorrect type in initializer (different address spaces)
@ 2024-03-02 22:49 99% ` Linus Torvalds
0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-03-02 22:49 UTC (permalink / raw)
To: Thomas Gleixner
Cc: kernel test robot, oe-kbuild-all, linux-kernel, Arjan van de Ven,
x86, Luc Van Oostenryck, Sparse Mailing-list
On Sat, 2 Mar 2024 at 14:00, Thomas Gleixner <tglx@linutronix.de> wrote:
>
> I had commented out both. But the real reason is the EXPORT_SYMBOL,
> which obviously wants to be EXPORT_PER_CPU_SYMBOL_GPL...
Side note: while it's nice to hear that sparse kind of got this right,
I wonder what gcc does when we start using the named address spaces
for percpu variables.
We actively make EXPORT_PER_CPU_SYMBOL_XYZ be a no-op for sparse
exactly because sparse ended up warning about the regular
EXPORT_SYMBOL, and we didn't have any "real" per-cpu export model.
So EXPORT_PER_CPU_SYMBOL_GPL() is kind of an artificial "shut up
sparse". But with __seg_gs/fs support for native percpu symbols with
gcc, I wonder if we'll hit the same thing. Or is there something that
makes gcc not warn about the named address spaces?
Because in many ways the gcc named address spaces _should_ be pretty
much equivalent to the sparse ones.
Linus
^ permalink raw reply [relevance 99%]
* Re: [GIT PULL] tracing: Prevent trace_marker being bigger than unsigned short
@ 2024-03-02 20:55 99% ` Linus Torvalds
0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-03-02 20:55 UTC (permalink / raw)
To: Steven Rostedt; +Cc: LKML, Masami Hiramatsu, Mathieu Desnoyers, Sachin Sant
On Sat, 2 Mar 2024 at 12:47, Steven Rostedt <rostedt@goodmis.org> wrote:
>
> I'm fine with just making it 4K with a comment saying that "4K is the
> minimum page size on most archs, and to keep this consistent for crazy
> architectures like PowerPC and it's 64K pages, we hard code 4K to keep
> all architectures acting the same".
4k is at least a somewhat sane limit, and yes, being hw-independent is
a good idea.
We have other strings that have that limit for similar reasons (ie PATH_MAX).
Linus
^ permalink raw reply [relevance 99%]
* Re: [GIT PULL] tracing: Prevent trace_marker being bigger than unsigned short
2024-03-02 20:25 99% ` Linus Torvalds
@ 2024-03-02 20:33 99% ` Linus Torvalds
1 sibling, 1 reply; 200+ results
From: Linus Torvalds @ 2024-03-02 20:33 UTC (permalink / raw)
To: Steven Rostedt; +Cc: LKML, Masami Hiramatsu, Mathieu Desnoyers
On Sat, 2 Mar 2024 at 12:00, Steven Rostedt <rostedt@goodmis.org> wrote:
>
> The error isn't printk, it's vsnprintf() that is writing to a seq_file
> to user space. There's no stack or printk involved here.
Look again. The code uses 'struct printf_spec' and we literally have a
static_assert(sizeof(struct printf_spec) == 8);
because we want the compiler to generate sane calling conventions and
not waste space and code with arguments on the stack. That's literally
why we do all those limits in a bitfield - because the code in
question is written to say "unreasonable people can go screw
themselves".
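The size constraint can be sketched in user space as follows. The struct below is a hypothetical stand-in, not the kernel's definition: its field names and widths are illustrative, chosen only to match what the thread states (an 8-byte total, a precision that fits in a signed short, and a 24-bit field width). Bitfield layout is implementation-defined, so the size assertion is an assumption that holds on common GCC/Clang targets.

```c
#include <assert.h>

#define DEMO_PRECISION_MAX 32767	/* largest value a signed 16-bit field holds */
#define DEMO_PRECISION_MIN (-32768)

/* Hypothetical packed format descriptor: 8+24 and 8+8+16 bits. */
struct demo_spec {
	unsigned int type:8;
	signed int field_width:24;
	unsigned int flags:8;
	unsigned int base:8;
	signed int precision:16;
};

/* The whole descriptor is meant to fit in a single 64-bit value. */
_Static_assert(sizeof(struct demo_spec) == 8, "demo_spec must stay 8 bytes");

/*
 * Hypothetical setter: stores the precision if it fits, otherwise clamps
 * it into [0, DEMO_PRECISION_MAX]. Returns 1 if the value fit unchanged
 * and 0 if it had to be clamped (where real code might warn).
 */
static int demo_set_precision(struct demo_spec *spec, int prec)
{
	int fits = (prec >= DEMO_PRECISION_MIN && prec <= DEMO_PRECISION_MAX);

	assert(spec != 0);
	if (!fits)
		prec = prec < 0 ? 0 : DEMO_PRECISION_MAX;
	spec->precision = prec;
	return fits;
}
```

A precision derived from a 64K page therefore cannot be represented at all, which is the condition the warning discussed in this thread reports.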
I'm not interested in arguing this. We're not doing some completely
idiotic "let's edge up to the physical limit of what our printk code
is willing to do".
I'm perfectly happy having that WARN_ON() to continue to tell people
they are doing stupid things that won't work.
And if you ever decide that a sane limit is ok, you can send that in.
Linus
^ permalink raw reply [relevance 99%]
* Re: [GIT PULL] tracing: Prevent trace_marker being bigger than unsigned short
@ 2024-03-02 20:25 99% ` Linus Torvalds
2024-03-02 20:33 99% ` Linus Torvalds
1 sibling, 0 replies; 200+ results
From: Linus Torvalds @ 2024-03-02 20:25 UTC (permalink / raw)
To: Steven Rostedt; +Cc: LKML, Masami Hiramatsu, Mathieu Desnoyers
On Sat, 2 Mar 2024 at 12:00, Steven Rostedt <rostedt@goodmis.org> wrote:
>
> I don't have control over the strings. Anyone can do in user space:
>
> fd = open("/sys/kernel/tracing/trace_marker", O_WRONLY);
> r = write(fd, huge_string, 10000000);
So?
Stop the stupidity.
You already limit the string.
Just limit it to a sane value. If somebody uses a 10kB trace marker,
return an error, or just truncate it to 100 bytes.
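The two policies mentioned here (reject or truncate) can be sketched as a small length check. This is a user-space illustration only: demo_marker_len and DEMO_MARKER_MAX are hypothetical names, and the 4096-byte cap is an assumption borrowed from the 4kB upper bound discussed elsewhere in this thread, not a value taken from the tracing code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define DEMO_MARKER_MAX 4096	/* hypothetical cap, independent of PAGE_SIZE */

/*
 * Hypothetical helper: returns the number of bytes to accept from a
 * write of 'requested' bytes. Oversized writes are either truncated to
 * the cap (truncate == 1) or rejected with -EINVAL (truncate == 0).
 */
static long demo_marker_len(size_t requested, int truncate)
{
	assert(truncate == 0 || truncate == 1);

	if (requested <= DEMO_MARKER_MAX)
		return (long)requested;

	return truncate ? (long)DEMO_MARKER_MAX : -EINVAL;
}
```

Either policy keeps the limit a deliberate, documented number rather than a side effect of some other subsystem's internal limit.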
You already were willing to truncate it to 32kB. Use your brain, and
realize that 32kB is a ridiculous limit.
Why do I even need to tell you this? I'm getting really tired of
having these idiotic arguments with you.
Linus
^ permalink raw reply [relevance 99%]
* Re: [bug report] dead loop in generic_perform_write() //Re: [PATCH v7 07/12] iov_iter: Convert iterate*() to inline funcs
2024-03-02 18:06 93% ` Linus Torvalds
@ 2024-03-02 18:11 99% ` Linus Torvalds
0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-03-02 18:11 UTC (permalink / raw)
To: Tong Tiangen
Cc: Al Viro, David Howells, Jens Axboe, Christoph Hellwig,
Christian Brauner, David Laight, Matthew Wilcox, Jeff Layton,
linux-fsdevel, linux-block, linux-mm, netdev, linux-kernel,
Kefeng Wang
On Sat, 2 Mar 2024 at 10:06, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> In other words, it's the usual "Enterprise Hardware" situation. Looks
> fancy on paper, costs an arm and a leg, and the reality is just sad,
> sad, sad.
Don't get me wrong. I'm sure large companies are more than willing to
sell other large companies very expensive support contracts and have
engineers that they fly out to deal with the problems all these
enterprise solutions have.
The problem *will* get fixed somehow, it's just going to cost you. A lot.
Because THAT is what Enterprise Hardware is all about.
Linus
^ permalink raw reply [relevance 99%]
* Re: [bug report] dead loop in generic_perform_write() //Re: [PATCH v7 07/12] iov_iter: Convert iterate*() to inline funcs
@ 2024-03-02 18:06 93% ` Linus Torvalds
2024-03-02 18:11 99% ` Linus Torvalds
0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-03-02 18:06 UTC (permalink / raw)
To: Tong Tiangen
Cc: Al Viro, David Howells, Jens Axboe, Christoph Hellwig,
Christian Brauner, David Laight, Matthew Wilcox, Jeff Layton,
linux-fsdevel, linux-block, linux-mm, netdev, linux-kernel,
Kefeng Wang
On Sat, 2 Mar 2024 at 01:37, Tong Tiangen <tongtiangen@huawei.com> wrote:
>
> I think this solution has two impacts:
> 1. Although it is not a performance-critical path, the CPU usage may be
> affected by one more memory copy in some large-memory applications.
Compared to the IO, the extra memory copy is a non-issue.
If anything, getting rid of the "copy_mc" flag removes extra code in a
much more important path (ie the normal iov_iter code).
> 2. If a hardware memory error occurs in "good location" and the
> ".copy_mc" is removed, the kernel will panic.
That's always true. We do not support non-recoverable machine checks
on kernel memory. Never have, and realistically probably never will.
In fact, as far as I know, the hardware that caused all this code in
the first place no longer exists, and never really made it to wide
production.
The machine checks in question happened on pmem, now killed by Intel.
It's possible that somebody wants to use it for something else, but
let's hope any future implementations are less broken than the
unbelievable sh*tshow that caused all this code in the first place.
The whole copy_mc_to_kernel() mess exists mainly due to broken pmem
devices along with old and broken CPU's that did not deal correctly
with machine checks inside the regular memory copy ('rep movs') code,
and caused hung machines.
IOW, notice how 'copy_mc_to_kernel()' just becomes a regular
'memcpy()' on fixed hardware, and how we have that disgusting
copy_mc_fragile_key that gets enabled for older CPU cores.
And yes, we then have copy_mc_enhanced_fast_string() which isn't
*that* disgusting, and that actually handles machine checks properly
on more modern hardware, but it's still very much "the hardware is
misdesigned, it has no testing, and nobody sane should depend on this".
In other words, it's the usual "Enterprise Hardware" situation. Looks
fancy on paper, costs an arm and a leg, and the reality is just sad,
sad, sad.
Linus
^ permalink raw reply [relevance 93%]
* Re: [GIT PULL] tracing: Prevent trace_marker being bigger than unsigned short
@ 2024-03-02 17:24 99% ` Linus Torvalds
0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-03-02 17:24 UTC (permalink / raw)
To: Steven Rostedt; +Cc: LKML, Masami Hiramatsu, Mathieu Desnoyers
On Sat, 2 Mar 2024 at 08:10, Steven Rostedt <rostedt@goodmis.org> wrote:
>
> - The change to allow trace_marker writes to be as big as the trace_seq can
> hold, and also the change that increases the size of the trace_seq to two
> pages, caused PowerPC kselftest trace_marker test to fail. The trace_marker
> kselftest writes up to subbuffer size which is determined by PAGE_SIZE.
> On PowerPC, the PAGE_SIZE can be 64K, which means the selftest will write
> a string that is around 64K in size. The output of the trace_marker is
> performed with a vsnprintf("%.*s", size, string), but this write would make
> the size greater than 32K, which is the max precision of "%.*s", and that
> causes a kernel warning. The fix is simply to keep the write of trace_marker
> less than or equal to max signed short.
Please don't just add random limits that are based on other random limits.
That printk warning is for "you did something obviously crazy".
That does NOT MEAN that you now should limit your strings to something
JUST BORDERLINE CRAZY.
See?
There is not a way in hell that printing a 32kB string in tracing is
valid. EVER.
So stop it. Stop making limits be some random implementation detail.
Make limits *sane*.
Make a *sane* limit for tracing. Not an "avoid being called crazy" limit.
Honestly, I suspect that a sane limit for tracing strings is likely on
the order of tens or maybe hundreds of bytes. Not some kind of "fits
in a short" that is just printk saying "I refuse to waste memory on
the stack".
Side note: for similar reasons the field-width is a 24-bit integer.
And no, if you think that passing a 16MB field width is sane, you need
to rethink your life. Again, that's a small implementation detail, not
a "let's explore how stupid we can be".
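To make the difference concrete, here is a rough userspace sketch of a limit that is picked by the caller for sanity reasons. The 256-byte value and the names TRACE_STR_SANE_MAX and format_marker() are made up for this illustration and are not taken from the tracing code; the only point is that the clamp does not depend on whatever the "%.*s" precision happens to tolerate.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical per-subsystem limit: chosen because it is a sane size
 * for a trace string, not derived from any printf-internal limit. */
#define TRACE_STR_SANE_MAX 256

/*
 * Format at most TRACE_STR_SANE_MAX bytes of 's' into 'out'.
 * Returns what snprintf() returns: the length that was (or would
 * have been) written, not counting the terminating NUL.
 */
static int format_marker(char *out, size_t outsz, const char *s, size_t len)
{
	int prec = len > TRACE_STR_SANE_MAX ? TRACE_STR_SANE_MAX : (int)len;

	return snprintf(out, outsz, "%.*s", prec, s);
}
```

With a clamp like this the precision passed to "%.*s" stays far away from any implementation limit, whatever the page size is.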
Linus
^ permalink raw reply [relevance 99%]
* Re: [bug report] dead loop in generic_perform_write() //Re: [PATCH v7 07/12] iov_iter: Convert iterate*() to inline funcs
2024-02-29 17:32 91% ` Linus Torvalds
@ 2024-03-02 2:59 76% ` Linus Torvalds
0 siblings, 2 replies; 200+ results
From: Linus Torvalds @ 2024-03-02 2:59 UTC (permalink / raw)
To: Tong Tiangen, Al Viro
Cc: David Howells, Jens Axboe, Christoph Hellwig, Christian Brauner,
David Laight, Matthew Wilcox, Jeff Layton, linux-fsdevel,
linux-block, linux-mm, netdev, linux-kernel, Kefeng Wang
[-- Attachment #1: Type: text/plain, Size: 1600 bytes --]
On Thu, 29 Feb 2024 at 09:32, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> One option might be to make a failed memcpy_from_iter_mc() set another
> flag in the iter, and then make fault_in_iov_iter_readable() test that
> flag and return 'len' if that flag is set.
>
> Something like that (wild handwaving) should get the right error handling.
>
> The simpler alternative is maybe something like the attached.
> COMPLETELY UNTESTED. Maybe I've confused myself with all the different
> indirection mazes in the iov_iter code.
Actually, I think the right model is to get rid of that horrendous
.copy_mc field entirely.
We only have one single place that uses it - that nasty core dumping
code. And that code is *not* performance critical.
And not only isn't it performance-critical, it already does all the
core dumping one page at a time because it doesn't want to write pages
that were never mapped into user space.
So what we can do is
(a) make the core dumping code *copy* the page to a good location
with copy_mc_to_kernel() first
(b) remove this horrendous .copy_mc crap entirely from iov_iter
This is slightly complicated by the fact that copy_mc_to_kernel() may
not even exist, and architectures that don't have it don't want the
silly extra copy. So we need to abstract the "copy to temporary page"
code a bit. But that's probably a good thing anyway in that it forces
us to have nice interfaces.
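As a rough userspace model of that idea (this is not the attached patch, and every toy_* name, the tiny page size and the 'poison_at' parameter are invented for the illustration): the fallible copy happens first into a bounce page, and the emit step only ever sees either a known-good page or NULL.

```c
#include <stddef.h>
#include <string.h>

#define TOY_PAGE_SIZE	64		/* stand-in for PAGE_SIZE */
#define TOY_NO_POISON	((size_t)-1)	/* "source page is fine" */

/*
 * Hypothetical stand-in for a machine-check-safe copy: copies until
 * the (simulated) poisoned offset and returns the bytes NOT copied.
 */
static size_t toy_copy_mc(void *dst, const void *src, size_t len,
			  size_t poison_at)
{
	size_t ok = poison_at < len ? poison_at : len;

	memcpy(dst, src, ok);
	return len - ok;
}

/* Copy the source page to the bounce page; NULL means "could not read it". */
static const void *toy_dump_page_copy(const void *src, void *bounce,
				      size_t poison_at)
{
	return toy_copy_mc(bounce, src, TOY_PAGE_SIZE, poison_at) ? NULL : bounce;
}

/* Emit one page; returns 1 on success and 0 if there is nothing sane to emit. */
static int toy_dump_emit_page(const void *page, size_t *emitted)
{
	if (!page)
		return 0;
	*emitted += TOY_PAGE_SIZE;
	return 1;
}
```

The IO path in this model never needs to know that machine checks exist, which is the whole point of doing the copy up front.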
End result: something like the attached.
AGAIN: THIS IS ENTIRELY UNTESTED.
But hey, so was clearly all the .copy_mc code too that this removes, so...
Linus
[-- Attachment #2: patch.diff --]
[-- Type: text/x-patch, Size: 6155 bytes --]
fs/coredump.c | 41 ++++++++++++++++++++++++++++++++++++++---
include/linux/uio.h | 16 ----------------
lib/iov_iter.c | 23 -----------------------
3 files changed, 38 insertions(+), 42 deletions(-)
diff --git a/fs/coredump.c b/fs/coredump.c
index f258c17c1841..6a9b9f3280d8 100644
--- a/fs/coredump.c
+++ b/fs/coredump.c
@@ -872,6 +872,9 @@ static int dump_emit_page(struct coredump_params *cprm, struct page *page)
loff_t pos;
ssize_t n;
+ if (!page)
+ return 0;
+
if (cprm->to_skip) {
if (!__dump_skip(cprm, cprm->to_skip))
return 0;
@@ -884,7 +887,6 @@ static int dump_emit_page(struct coredump_params *cprm, struct page *page)
pos = file->f_pos;
bvec_set_page(&bvec, page, PAGE_SIZE, 0);
iov_iter_bvec(&iter, ITER_SOURCE, &bvec, 1, PAGE_SIZE);
- iov_iter_set_copy_mc(&iter);
n = __kernel_write_iter(cprm->file, &iter, &pos);
if (n != PAGE_SIZE)
return 0;
@@ -895,10 +897,40 @@ static int dump_emit_page(struct coredump_params *cprm, struct page *page)
return 1;
}
+/*
+ * If we might get machine checks from kernel accesses during the
+ * core dump, let's get those errors early rather than during the
+ * IO. This is not performance-critical enough to warrant having
+ * all the machine check logic in the iovec paths.
+ */
+#ifdef copy_mc_to_kernel
+
+#define dump_page_alloc() alloc_page(GFP_KERNEL)
+#define dump_page_free(x) __free_page(x)
+static struct page *dump_page_copy(struct page *src, struct page *dst)
+{
+ void *buf = kmap_local_page(src);
+ size_t left = copy_mc_to_kernel(page_address(dst), buf, PAGE_SIZE);
+ kunmap_local(buf);
+ return left ? NULL : dst;
+}
+
+#else
+
+#define dump_page_alloc() ((struct page *)8) // Not NULL
+#define dump_page_free(x) do { } while (0)
+#define dump_page_copy(src,dst) ((dst),(src))
+
+#endif
+
int dump_user_range(struct coredump_params *cprm, unsigned long start,
unsigned long len)
{
unsigned long addr;
+ struct page *dump_page = dump_page_alloc();
+
+ if (!dump_page)
+ return 0;
for (addr = start; addr < start + len; addr += PAGE_SIZE) {
struct page *page;
@@ -912,14 +944,17 @@ int dump_user_range(struct coredump_params *cprm, unsigned long start,
*/
page = get_dump_page(addr);
if (page) {
- int stop = !dump_emit_page(cprm, page);
+ int stop = !dump_emit_page(cprm, dump_page_copy(page, dump_page));
put_page(page);
- if (stop)
+ if (stop) {
+ dump_page_free(dump_page);
return 0;
+ }
} else {
dump_skip(cprm, PAGE_SIZE);
}
}
+ dump_page_free(dump_page);
return 1;
}
#endif
diff --git a/include/linux/uio.h b/include/linux/uio.h
index bea9c89922d9..00cebe2b70de 100644
--- a/include/linux/uio.h
+++ b/include/linux/uio.h
@@ -40,7 +40,6 @@ struct iov_iter_state {
struct iov_iter {
u8 iter_type;
- bool copy_mc;
bool nofault;
bool data_source;
size_t iov_offset;
@@ -248,22 +247,8 @@ size_t _copy_from_iter_flushcache(void *addr, size_t bytes, struct iov_iter *i);
#ifdef CONFIG_ARCH_HAS_COPY_MC
size_t _copy_mc_to_iter(const void *addr, size_t bytes, struct iov_iter *i);
-static inline void iov_iter_set_copy_mc(struct iov_iter *i)
-{
- i->copy_mc = true;
-}
-
-static inline bool iov_iter_is_copy_mc(const struct iov_iter *i)
-{
- return i->copy_mc;
-}
#else
#define _copy_mc_to_iter _copy_to_iter
-static inline void iov_iter_set_copy_mc(struct iov_iter *i) { }
-static inline bool iov_iter_is_copy_mc(const struct iov_iter *i)
-{
- return false;
-}
#endif
size_t iov_iter_zero(size_t bytes, struct iov_iter *);
@@ -355,7 +340,6 @@ static inline void iov_iter_ubuf(struct iov_iter *i, unsigned int direction,
WARN_ON(direction & ~(READ | WRITE));
*i = (struct iov_iter) {
.iter_type = ITER_UBUF,
- .copy_mc = false,
.data_source = direction,
.ubuf = buf,
.count = count,
diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index e0aa6b440ca5..cf2eb2b2f983 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -166,7 +166,6 @@ void iov_iter_init(struct iov_iter *i, unsigned int direction,
WARN_ON(direction & ~(READ | WRITE));
*i = (struct iov_iter) {
.iter_type = ITER_IOVEC,
- .copy_mc = false,
.nofault = false,
.data_source = direction,
.__iov = iov,
@@ -244,27 +243,9 @@ size_t _copy_mc_to_iter(const void *addr, size_t bytes, struct iov_iter *i)
EXPORT_SYMBOL_GPL(_copy_mc_to_iter);
#endif /* CONFIG_ARCH_HAS_COPY_MC */
-static __always_inline
-size_t memcpy_from_iter_mc(void *iter_from, size_t progress,
- size_t len, void *to, void *priv2)
-{
- return copy_mc_to_kernel(to + progress, iter_from, len);
-}
-
-static size_t __copy_from_iter_mc(void *addr, size_t bytes, struct iov_iter *i)
-{
- if (unlikely(i->count < bytes))
- bytes = i->count;
- if (unlikely(!bytes))
- return 0;
- return iterate_bvec(i, bytes, addr, NULL, memcpy_from_iter_mc);
-}
-
static __always_inline
size_t __copy_from_iter(void *addr, size_t bytes, struct iov_iter *i)
{
- if (unlikely(iov_iter_is_copy_mc(i)))
- return __copy_from_iter_mc(addr, bytes, i);
return iterate_and_advance(i, bytes, addr,
copy_from_user_iter, memcpy_from_iter);
}
@@ -633,7 +614,6 @@ void iov_iter_kvec(struct iov_iter *i, unsigned int direction,
WARN_ON(direction & ~(READ | WRITE));
*i = (struct iov_iter){
.iter_type = ITER_KVEC,
- .copy_mc = false,
.data_source = direction,
.kvec = kvec,
.nr_segs = nr_segs,
@@ -650,7 +630,6 @@ void iov_iter_bvec(struct iov_iter *i, unsigned int direction,
WARN_ON(direction & ~(READ | WRITE));
*i = (struct iov_iter){
.iter_type = ITER_BVEC,
- .copy_mc = false,
.data_source = direction,
.bvec = bvec,
.nr_segs = nr_segs,
@@ -679,7 +658,6 @@ void iov_iter_xarray(struct iov_iter *i, unsigned int direction,
BUG_ON(direction & ~1);
*i = (struct iov_iter) {
.iter_type = ITER_XARRAY,
- .copy_mc = false,
.data_source = direction,
.xarray = xarray,
.xarray_start = start,
@@ -703,7 +681,6 @@ void iov_iter_discard(struct iov_iter *i, unsigned int direction, size_t count)
BUG_ON(direction != READ);
*i = (struct iov_iter){
.iter_type = ITER_DISCARD,
- .copy_mc = false,
.data_source = false,
.count = count,
.iov_offset = 0
^ permalink raw reply related [relevance 76%]
* Re: [PATCH 1/3] kci-gitlab: Introducing GitLab-CI Pipeline for Kernel Testing
@ 2024-03-01 20:10 96% ` Linus Torvalds
0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-03-01 20:10 UTC (permalink / raw)
To: Nikolai Kondrashov
Cc: Maxime Ripard, Helen Koike, linuxtv-ci, dave.pigott,
linux-kernel, dri-devel, linux-kselftest, gustavo.padovan,
pawiecz, tales.aparecida, workflows, kernelci, skhan, kunit-dev,
nfraprado, davidgow, cocci, Julia.Lawall, laura.nao,
ricardo.canuelo, kernel, gregkh
On Fri, 1 Mar 2024 at 02:27, Nikolai Kondrashov <spbnick@gmail.com> wrote:
>
> I agree, it's hard to imagine even a simple majority agreeing on how GitLab CI
> should be done. Still, we would like to help people, who are interested in
> this kind of thing, to set it up. How about we reframe this contribution as a
> sort of template, or a reference for people to start their setup with,
> assuming that most maintainers would want to tweak it? We would also be glad
> to stand by for questions and help, as people try to use it.
Ack. I think seeing it as a library for various gitlab CI models would
be a lot more palatable. Particularly if you can then show that yes,
it is also relevant to our currently existing drm case.
So I'm not objecting to having (for example) some kind of CI helper
templates - I think a logical place would be in tools/ci/ which is
kind of alongside our tools/testing subdirectory.
(And then perhaps have a 'gitlab' directory under that. I'm not sure
whether - and how much - commonality there might be between the
different CI models of different hosts).
Just to clarify: when I say "a logical place", I very much want to
emphasize the "a" - maybe there are better places, and I'm not saying
that is the only possible place. But it sounds more logical to me than
some.
Linus
^ permalink raw reply [relevance 96%]
* Re: [PATCH for 6.8] tomoyo: fix UAF write bug in tomoyo_write_control()
@ 2024-03-01 19:14 99% ` Linus Torvalds
0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-03-01 19:14 UTC (permalink / raw)
To: Tetsuo Handa
Cc: Sam Sun, paul, syzkaller, takedakn, jmorris, serge,
linux-security-module, linux-kernel
On Fri, 1 Mar 2024 at 05:04, Tetsuo Handa
<penguin-kernel@i-love.sakura.ne.jp> wrote:
>
> I couldn't reproduce this problem in my environment, but I believe
> this does fix a bug. Linus, can you directly apply to linux.git ?
Thanks. Applied.
Linus
^ permalink raw reply [relevance 99%]
* Re: [PATCH RFC 4/4] UNFINISHED mm, fs: use kmem_cache_charge() in path_openat()
@ 2024-03-01 17:51 90% ` Linus Torvalds
0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-03-01 17:51 UTC (permalink / raw)
To: Vlastimil Babka
Cc: Josh Poimboeuf, Jeff Layton, Chuck Lever, Kees Cook,
Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
Andrew Morton, Roman Gushchin, Hyeonggon Yoo, Johannes Weiner,
Michal Hocko, Shakeel Butt, Muchun Song, Alexander Viro,
Christian Brauner, Jan Kara, linux-mm, linux-kernel, cgroups,
linux-fsdevel
On Fri, 1 Mar 2024 at 09:07, Vlastimil Babka <vbabka@suse.cz> wrote:
>
> This is just an example of using the kmem_cache_charge() API. I think
> it's placed in a place that's applicable for Linus's example [1]
> although he mentions do_dentry_open() - I have followed from strace()
> showing openat(2) to path_openat() doing the alloc_empty_file().
Thanks. This is not the right patch, but yes, patches 1-3 look very nice to me.
> The idea is that filp_cachep stops being SLAB_ACCOUNT. Allocations that
> want to be accounted immediately can use GFP_KERNEL_ACCOUNT. I did that
> in alloc_empty_file_noaccount() (despite the contradictory name but the
> noaccount refers to something else, right?) as IIUC it's about
> kernel-internal opens.
Yeah, the "noaccount" function is about not accounting it towards nr_files.
That said, I don't think it necessarily needs to do the memory
accounting either - it's literally for cases where we're never going
to install the file descriptor in any user space.
Your change to use GFP_KERNEL_ACCOUNT isn't exactly wrong, but I don't
think it's really the right thing either, because
> Why is this unfinished:
>
> - there are other callers of alloc_empty_file() which I didn't adjust so
> they simply became memcg-unaccounted. I haven't investigated for which
> ones it would make also sense to separate the allocation and accounting.
> Maybe alloc_empty_file() would need to get a parameter to control
> this.
Right. I think the natural and logical way to deal with this is to
just say "we account when we add the file to the fdtable".
IOW, just have fd_install() do it. That's the really natural point,
and also makes it very logical why alloc_empty_file_noaccount()
wouldn't need to do the GFP_KERNEL_ACCOUNT.
> - I don't know how to properly unwind the accounting failure case. It
> seems like a new case because when we succeed the open, there's no
> further error path at least in path_openat().
Yeah, let me think about this part. Because fd_install() is the right
point, but that too does not really allow for error handling.
Yes, we could close things and fail it, but it really is much too late
at this point.
What I *think* I'd want for this case is
(a) allow the accounting to go over by a bit
(b) make sure there's a cheap way to ask (before) about "did we go
over the limit"
IOW, the accounting never needed to be byte-accurate to begin with,
and making it fail (cheaply and early) on the next file allocation is
fine.
Just make it really cheap. Can we do that?
For example, maybe don't bother with the whole "bytes and pages"
stuff. Just a simple "are we more than one page over?" kind of
question. Without the 'stock_lock' mess for sub-page bytes etc.
How would that look? Would it result in something that can be done
cheaply without locking and atomics and without excessive pointer
indirection through many levels of memcg data structures?
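Purely as a strawman for what (a) and (b) could mean, here is a userspace sketch. It is not memcg code: the toy_* names are invented, the counter is in whole pages, and it still uses one relaxed atomic, which a real implementation would probably want to replace with something per-CPU.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical soft-limit counter, in units of pages. */
struct toy_counter {
	atomic_long usage;	/* pages charged so far */
	long limit;		/* soft limit, in pages */
};

static void toy_counter_init(struct toy_counter *c, long limit)
{
	atomic_init(&c->usage, 0);
	c->limit = limit;
}

/* (a) Charging never fails: usage is allowed to overshoot the limit. */
static void toy_charge(struct toy_counter *c, long pages)
{
	atomic_fetch_add_explicit(&c->usage, pages, memory_order_relaxed);
}

static void toy_uncharge(struct toy_counter *c, long pages)
{
	atomic_fetch_sub_explicit(&c->usage, pages, memory_order_relaxed);
}

/* (b) The cheap question, asked early, before the next allocation. */
static bool toy_over_limit(struct toy_counter *c)
{
	return atomic_load_explicit(&c->usage, memory_order_relaxed) > c->limit;
}
```

In this model the install step would call toy_charge() unconditionally, and only the next allocation would look at toy_over_limit() and fail early.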
Linus
^ permalink raw reply [relevance 90%]
* Re: [GIT PULL] Networking for v6.8-rc7
@ 2024-02-29 20:56 99% ` Linus Torvalds
0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-02-29 20:56 UTC (permalink / raw)
To: Jakub Kicinski; +Cc: davem, netdev, linux-kernel, pabeni
On Thu, 29 Feb 2024 at 12:39, Jakub Kicinski <kuba@kernel.org> wrote:
>
> A few hours late, the commit on top fixes an odd "rcu_dereference()
> needs to know full type" build issue I can't repro..
Ugh. That change literally makes a single load instruction be a
function call. Pretty sad, particularly with all the crazy CPU
mitigations causing that to be even more expensive than it is already.
I really don't see how that error can happen, it sounds very odd.
Oh well.
Linus
^ permalink raw reply [relevance 99%]
* Re: [PATCH 1/3] kci-gitlab: Introducing GitLab-CI Pipeline for Kernel Testing
@ 2024-02-29 20:21 95% ` Linus Torvalds
0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-02-29 20:21 UTC (permalink / raw)
To: Nikolai Kondrashov
Cc: Maxime Ripard, Helen Koike, linuxtv-ci, dave.pigott,
linux-kernel, dri-devel, linux-kselftest, gustavo.padovan,
pawiecz, tales.aparecida, workflows, kernelci, skhan, kunit-dev,
nfraprado, davidgow, cocci, Julia.Lawall, laura.nao,
ricardo.canuelo, kernel, gregkh
On Thu, 29 Feb 2024 at 01:23, Nikolai Kondrashov <spbnick@gmail.com> wrote:
>
> However, I think a better approach would be *not* to add the .gitlab-ci.yaml
> file in the root of the source tree, but instead change the very same repo
> setting to point to a particular entry YAML, *inside* the repo (somewhere
> under "ci" directory) instead.
I really don't want some kind of top-level CI for the base kernel project.
We already have the situation that the drm people have their own ci
model. I'm ok with that, partly because then at least the maintainers
of that subsystem can agree on the rules for that one subsystem.
I'm not at all interested in having something that people will then
either fight about, or - more likely - ignore, at the top level
because there isn't some global agreement about what the rules are.
For example, even just running checkpatch is often a stylistic thing,
and not everybody agrees about all the checkpatch warnings.
I would suggest the CI project be separate from the kernel.
And having that slack channel that is restricted to particular
companies is just another sign of this whole disease.
If you want to make a google/microsoft project to do kernel CI, then
more power to you, but don't expect it to be some kind of agreed-upon
kernel project when it's a closed system.
Linus
^ permalink raw reply [relevance 95%]
* Re: [bug report] dead loop in generic_perform_write() //Re: [PATCH v7 07/12] iov_iter: Convert iterate*() to inline funcs
@ 2024-02-29 17:32 91% ` Linus Torvalds
2024-03-02 2:59 76% ` Linus Torvalds
0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-02-29 17:32 UTC (permalink / raw)
To: Tong Tiangen, Al Viro
Cc: David Howells, Jens Axboe, Christoph Hellwig, Christian Brauner,
David Laight, Matthew Wilcox, Jeff Layton, linux-fsdevel,
linux-block, linux-mm, netdev, linux-kernel, Kefeng Wang
[-- Attachment #1: Type: text/plain, Size: 1794 bytes --]
On Thu, 29 Feb 2024 at 00:13, Tong Tiangen <tongtiangen@huawei.com> wrote:
>
> See the logic before this patch, always success (((void)(K),0)) is
> returned for three types: ITER_BVEC, ITER_KVEC and ITER_XARRAY.
No, look closer.
Yes, the iterate_and_advance() macro does that "((void)(K),0)" to make
the compiler generate better code for those cases (because then the
compiler can see that the return value is a compile-time zero), but
notice how _copy_mc_to_iter() didn't use that macro back then. It used
the unvarnished __iterate_and_advance() exactly so that the MC copy
case would *not* get that "always return zero" behavior.
That goes back to (in a different form) at least commit 1b4fb5ffd79b
("iov_iter: teach iterate_{bvec,xarray}() about possible short
copies").
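A small standalone illustration of the difference, with invented names (STEP_NOFAIL, toy_step and the two toy_iterate_* wrappers are not the kernel macros, just a model of them): wrapping the step in "((void)(K), 0)" throws the result away, while the unvarnished call still reports a short copy.

```c
#include <stddef.h>
#include <string.h>

/* Evaluate K for its side effects, then yield a compile-time zero. */
#define STEP_NOFAIL(K)	((void)(K), (size_t)0)

/*
 * Hypothetical fallible copy step: copies up to 'fail_at' bytes and
 * returns the number of bytes NOT copied.
 */
static size_t toy_step(void *dst, const void *src, size_t len, size_t fail_at)
{
	size_t ok = fail_at < len ? fail_at : len;

	memcpy(dst, src, ok);
	return len - ok;
}

/* "Cannot fail" variant: the compiler sees a constant zero result. */
static size_t toy_iterate_nofail(void *dst, const void *src, size_t len,
				 size_t fail_at)
{
	return STEP_NOFAIL(toy_step(dst, src, len, fail_at));
}

/* Unvarnished variant: a short copy is visible to the caller. */
static size_t toy_iterate_checked(void *dst, const void *src, size_t len,
				  size_t fail_at)
{
	return toy_step(dst, src, len, fail_at);
}
```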
But hardly anybody ever tests this machine-check special case code, so
who knows when it broke again.
I'm just looking at the source code, and with all the macro games it's
*really* hard to follow, so I may well be missing something.
> Maybe we're all gonna fix it back? as follows:
No. We could do it for the kvec and xarray case, just to get better
code generation again (not that I looked at it, so who knows), but the
one case that actually uses memcpy_from_iter_mc() needs to react to a
short write.
One option might be to make a failed memcpy_from_iter_mc() set another
flag in the iter, and then make fault_in_iov_iter_readable() test that
flag and return 'len' if that flag is set.
Something like that (wild handwaving) should get the right error handling.
The simpler alternative is maybe something like the attached.
COMPLETELY UNTESTED. Maybe I've confused myself with all the different
indirection mazes in the iov_iter code.
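The "always succeed, zero the rest" idea, modelled in userspace with made-up names (toy_copy_short and toy_copy_zero_tail are not kernel functions). The sketch assumes the copy proceeds front to back, so the uncopied tail starts at dst + len - n:

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical fallible copy: copies up to 'fail_at' bytes and returns
 * the number of bytes NOT copied.
 */
static size_t toy_copy_short(void *dst, const void *src, size_t len,
			     size_t fail_at)
{
	size_t ok = fail_at < len ? fail_at : len;

	memcpy(dst, src, ok);
	return len - ok;
}

/*
 * Never report a short copy: zero-fill whatever could not be read
 * and claim full success.
 */
static size_t toy_copy_zero_tail(void *dst, const void *src, size_t len,
				 size_t fail_at)
{
	size_t n = toy_copy_short(dst, src, len, fail_at);

	if (n)
		memset((char *)dst + len - n, 0, n);
	return 0;
}
```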
Linus
[-- Attachment #2: patch.diff --]
[-- Type: text/x-patch, Size: 618 bytes --]
lib/iov_iter.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index e0aa6b440ca5..5236c16734e0 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -248,7 +248,10 @@ static __always_inline
size_t memcpy_from_iter_mc(void *iter_from, size_t progress,
size_t len, void *to, void *priv2)
{
- return copy_mc_to_kernel(to + progress, iter_from, len);
+ size_t n = copy_mc_to_kernel(to + progress, iter_from, len);
+ if (n)
+ memset(to + progress + len - n, 0, n);
+ return 0;
}
static size_t __copy_from_iter_mc(void *addr, size_t bytes, struct iov_iter *i)
^ permalink raw reply related [relevance 91%]
* Re: [bug report] dead loop in generic_perform_write() //Re: [PATCH v7 07/12] iov_iter: Convert iterate*() to inline funcs
2024-02-28 21:21 99% ` Linus Torvalds
@ 2024-02-28 22:57 99% ` Linus Torvalds
0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-02-28 22:57 UTC (permalink / raw)
To: Tong Tiangen
Cc: David Howells, Jens Axboe, Al Viro, Christoph Hellwig,
Christian Brauner, David Laight, Matthew Wilcox, Jeff Layton,
linux-fsdevel, linux-block, linux-mm, netdev, linux-kernel,
Kefeng Wang
On Wed, 28 Feb 2024 at 13:21, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> Hmm. If the copy doesn't succeed and make any progress at all, then
> the code in generic_perform_write() after the "goto again"
>
> //[4]
> if (unlikely(fault_in_iov_iter_readable(i, bytes) ==
> bytes)) {
>
> should break out of the loop.
Ahh. I see the problem. Or at least part of it.
The iter is an ITER_BVEC.
And fault_in_iov_iter_readable() "knows" that an ITER_BVEC cannot
fail. Because obviously it's a kernel address, so no user page fault.
But for the machine check case, ITER_BVEC very much can fail.
This should never have worked in the first place.
What a crock.
Do we need to make iterate_bvec() always succeed fully, and make
copy_mc_to_kernel() zero out the end?
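The failure mode, reduced to a userspace toy (all names are invented, and 'max_spins' exists only so the model terminates): the fault-in check says "fine" for a kernel-backed iterator, the copy makes no progress, and nothing ever breaks the loop.

```c
#include <stdbool.h>
#include <stddef.h>

enum toy_iter_kind { TOY_ITER_USER, TOY_ITER_BVEC };
enum toy_result { TOY_DONE = 0, TOY_EFAULT = -1, TOY_STUCK = -2 };

/*
 * Model of the fault-in check: returns the bytes NOT faulted in.
 * A kernel-backed (BVEC) iterator is assumed to never fault.
 */
static size_t toy_fault_in(enum toy_iter_kind kind, size_t bytes,
			   bool source_bad)
{
	if (kind == TOY_ITER_BVEC)
		return 0;
	return source_bad ? bytes : 0;
}

/* Model of the copy step: returns bytes copied, 0 if the source is bad. */
static size_t toy_copy(size_t bytes, bool source_bad)
{
	return source_bad ? 0 : bytes;
}

/* Model of the write loop; TOY_STUCK stands in for "loops forever". */
static enum toy_result toy_perform_write(enum toy_iter_kind kind, size_t bytes,
					 bool source_bad, unsigned int max_spins)
{
	size_t left = bytes;
	unsigned int spins = 0;

	while (left) {
		if (spins++ >= max_spins)
			return TOY_STUCK;
		if (toy_fault_in(kind, left, source_bad) == left)
			return TOY_EFAULT;
		left -= toy_copy(left, source_bad);
	}
	return TOY_DONE;
}
```

A bad user-space source gets TOY_EFAULT here, while a bad BVEC source only ever ends in TOY_STUCK.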
Linus
^ permalink raw reply [relevance 99%]
* Re: [bug report] dead loop in generic_perform_write() //Re: [PATCH v7 07/12] iov_iter: Convert iterate*() to inline funcs
@ 2024-02-28 21:21 99% ` Linus Torvalds
2024-02-28 22:57 99% ` Linus Torvalds
0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-02-28 21:21 UTC (permalink / raw)
To: Tong Tiangen
Cc: David Howells, Jens Axboe, Al Viro, Christoph Hellwig,
Christian Brauner, David Laight, Matthew Wilcox, Jeff Layton,
linux-fsdevel, linux-block, linux-mm, netdev, linux-kernel,
Kefeng Wang
On Sat, 17 Feb 2024 at 19:13, Tong Tiangen <tongtiangen@huawei.com> wrote:
>
> After this patch:
> copy_page_from_iter_atomic()
> -> iterate_and_advance2()
> -> iterate_bvec()
> -> remain = step()
>
> With CONFIG_ARCH_HAS_COPY_MC, the step() is copy_mc_to_kernel() which
> return "bytes not copied".
>
> When a memory error occurs during step(), the value of "left" equal to
> the value of "part" (no one byte is copied successfully). In this case,
> iterate_bvec() returns 0, and copy_page_from_iter_atomic() also returns
> 0. The callback shmem_write_end()[2] also returns 0. Finally,
> generic_perform_write() goes to "goto again"[3], and the loop restarts.
> [4][5] cannot enter and exit the loop, then deadloop occurs.
Hmm. If the copy doesn't succeed and make any progress at all, then
the code in generic_perform_write() after the "goto again"
        //[4]
        if (unlikely(fault_in_iov_iter_readable(i, bytes) ==
                      bytes)) {
                status = -EFAULT;
                break;
        }
should break out of the loop.
So either your analysis looks a bit flawed, or I'm missing something.
Likely I'm missing something really obvious.
Why does the copy_mc_to_kernel() fail, but the
fault_in_iov_iter_readable() succeeds?
Linus
^ permalink raw reply [relevance 99%]
* Re: [GIT PULL] hotfixes for 6.8-rc7
@ 2024-02-28 0:51 99% ` Linus Torvalds
0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-02-28 0:51 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-mm, mm-commits, linux-kernel
On Tue, 27 Feb 2024 at 14:56, Andrew Morton <akpm@linux-foundation.org> wrote:
>
> 6 hotfixes. 3 are cc:stable and the remainder address post-6.7 issues
> or aren't considered appropriate for backporting.
Hmm. I notice that you add "Link:" pointers to lore, but you do so
even for emails that have been sent to you without any lists, so that
they don't actually exist on lore..
IOW, that link-generating automation of yours looks a bit overly aggressive.
Linus
^ permalink raw reply [relevance 99%]
* Re: [PATCH 3/3] cxl/region: Use cond_guard() in show_targetN()
@ 2024-02-27 22:34 86% ` Linus Torvalds
0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-02-27 22:34 UTC (permalink / raw)
To: Dan Williams
Cc: peterz, gregkh, Ira Weiny, Dave Jiang, Jonathan Cameron,
Fabio M. De Francesco, linux-kernel, linux-cxl
[-- Attachment #1: Type: text/plain, Size: 2053 bytes --]
On Tue, 27 Feb 2024 at 13:42, Dan Williams <dan.j.williams@intel.com> wrote:
>
> I will also note that these last 3 statements, nuking the proposal from
> space, I find excessive. Yes, on the internet no one can hear you being
> subtle, but the "MORE READABLE" and "NOTHING" were pretty darn
> unequivocal, especially coming from the person who has absolute final
> say on what enters his project.
Heh. It's not just "no one can hear you being subtle", sometimes it's
also "people don't take hints". It can be hard to tell..
Anyway, it's not that I hate the guard things in general. But I do
think they need to be used carefully, and I do think it's very
important that they have clean interfaces.
The current setup came about after quite long discussions about
getting reasonable syntax, and I'm still a bit worried even about the
current simpler ones.
And by "simpler ones" I don't mean our current scoped_cond_guard()
thing. We have exactly one user of it, and I have considered getting
rid of that one user because I think it's absolutely horrid. I haven't
figured out a better syntax for it.
For the non-scoped version, I actually think there *would* be a better
syntax - putting the error case after the macro (the way we put the
success case after the macro for the scoped one).
In fact, maybe the solution is to make the scoped and non-scoped
versions act very similar: we could do something like this:
[scoped_]cond_guard(name, args) { success } else { fail };
and that syntax feels much more C-like to me.
So maybe something like the attached (TOTALLY UNTESTED!!) patch for
the scoped version, and then the non-scoped version would have the
same syntax (except it would have to generate that __UNIQUE_ID()
thing, of course).
I haven't thought much about this. But I think this would be more
acceptable to me, and also solve some of the ugliness with the current
pre-existing scoped_cond_guard().
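To show the shape of that syntax outside the kernel, here is a toy userspace version. Everything named toy_* is invented for the example and it is not the cleanup.h implementation; it relies on the GCC/clang 'cleanup' variable attribute, the same compiler feature the kernel guards are built on. The macro ends in a bare "if", so the caller's "else" attaches to it naturally.

```c
#include <stdbool.h>
#include <stddef.h>

struct toy_lock { bool held; };

static bool toy_trylock(struct toy_lock *l)
{
	if (l->held)
		return false;
	l->held = true;
	return true;
}

static void toy_unlock(struct toy_lock *l)
{
	l->held = false;
}

/* Guard object: 'lock' is NULL when the lock was not acquired. */
struct toy_guard { struct toy_lock *lock; };

static struct toy_guard toy_guard_ctor(struct toy_lock *l)
{
	struct toy_guard g = { toy_trylock(l) ? l : NULL };

	return g;
}

/* Runs automatically when the guard variable goes out of scope. */
static void toy_guard_release(struct toy_guard *g)
{
	if (g->lock)
		toy_unlock(g->lock);
}

/* One-shot for loop ending in a bare 'if': the caller supplies the
 * success block and, optionally, an 'else' block for the failure. */
#define toy_scoped_cond_guard(lockp)					\
	for (struct toy_guard _g					\
		__attribute__((cleanup(toy_guard_release))) =		\
		toy_guard_ctor(lockp), *_done = NULL;			\
	     !_done; _done = &_g)					\
		if (_g.lock)

static int toy_increment(struct toy_lock *l, int *counter)
{
	toy_scoped_cond_guard(l) {
		(*counter)++;		/* runs with the lock held */
	} else {
		return -1;		/* lock was not acquired */
	}
	return 0;			/* guard already released here */
}
```

The call site in toy_increment() reads like an ordinary if/else, with no statement hidden inside a macro argument.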
I dunno. PeterZ did the existing stuff, but he's offlined due to
shoulder problems so not likely to chip in.
Linus
[-- Attachment #2: patch.diff --]
[-- Type: text/x-patch, Size: 1999 bytes --]
include/linux/cleanup.h | 7 +++----
kernel/ptrace.c | 5 +++--
2 files changed, 6 insertions(+), 6 deletions(-)
diff --git a/include/linux/cleanup.h b/include/linux/cleanup.h
index c2d09bc4f976..a015ac9517a6 100644
--- a/include/linux/cleanup.h
+++ b/include/linux/cleanup.h
@@ -142,7 +142,7 @@ static inline class_##_name##_t class_##_name##ext##_constructor(_init_args) \
* for conditional locks the loop body is skipped when the lock is not
* acquired.
*
- * scoped_cond_guard (name, fail, args...) { }:
+ * scoped_cond_guard (name, args...) { } [ else { fail } :
* similar to scoped_guard(), except it does fail when the lock
* acquire fails.
*
@@ -169,11 +169,10 @@ static inline class_##_name##_t class_##_name##ext##_constructor(_init_args) \
for (CLASS(_name, scope)(args), \
*done = NULL; __guard_ptr(_name)(&scope) && !done; done = (void *)1)
-#define scoped_cond_guard(_name, _fail, args...) \
+#define scoped_cond_guard(_name, args...) \
for (CLASS(_name, scope)(args), \
*done = NULL; !done; done = (void *)1) \
- if (!__guard_ptr(_name)(&scope)) _fail; \
- else
+ if (__guard_ptr(_name)(&scope))
/*
* Additional helper macros for generating lock guards with types, either for
diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 2fabd497d659..f509b21a5711 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -441,8 +441,7 @@ static int ptrace_attach(struct task_struct *task, long request,
* SUID, SGID and LSM creds get determined differently
* under ptrace.
*/
- scoped_cond_guard (mutex_intr, return -ERESTARTNOINTR,
- &task->signal->cred_guard_mutex) {
+ scoped_cond_guard (mutex_intr, &task->signal->cred_guard_mutex) {
scoped_guard (task_lock, task) {
retval = __ptrace_may_access(task, PTRACE_MODE_ATTACH_REALCREDS);
@@ -466,6 +465,8 @@ static int ptrace_attach(struct task_struct *task, long request,
ptrace_set_stopped(task);
}
+ } else {
+ return -ERESTARTNOINTR;
}
/*
^ permalink raw reply related [relevance 86%]
* Re: [PATCH 3/3] cxl/region: Use cond_guard() in show_targetN()
@ 2024-02-27 20:55 98% ` Linus Torvalds
0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-02-27 20:55 UTC (permalink / raw)
To: Dan Williams
Cc: peterz, gregkh, Ira Weiny, Dave Jiang, Jonathan Cameron,
Fabio M. De Francesco, linux-kernel, linux-cxl
On Tue, 27 Feb 2024 at 08:49, Dan Williams <dan.j.williams@intel.com> wrote:
>
> - rc = down_read_interruptible(&cxl_region_rwsem);
> - if (rc)
> - return rc;
> + cond_guard(rwsem_read_intr, return -EINTR, &cxl_region_rwsem);
Yeah, this is an example of how NOT to do things.
If you can't make the syntax be something clean and sane like
        if (!cond_guard(rwsem_read_intr, &cxl_region_rwsem))
                return -EINTR;
then this code should simply not be converted to guards AT ALL.
Note that we have a perfectly fine way to do conditional lock guarding
by simply using helper functions, which actually makes code MORE
READABLE:
        if (down_read_interruptible(&cxl_region_rwsem))
                return -EINTR;
        rc = do_locked_function();
        up_read(&cxl_region_rwsem);
        return rc;
and notice how there are no special cases, no multiple unlocks, no
NOTHING. And the syntax is clean.
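Spelled out as a compilable userspace toy (the toy_* names and the fake semaphore are invented for the example; the fake down function follows the usual convention of returning 0 on success and -EINTR when interrupted): the helper can have as many early returns as it likes, and the wrapper still has exactly one lock and one unlock.

```c
#include <errno.h>
#include <stdbool.h>

/* Toy stand-in for an rwsem: just counts readers. */
struct toy_rwsem {
	int readers;
	bool signal_pending;	/* simulates an interrupted sleep */
};

/* Returns 0 on success, -EINTR if "interrupted". */
static int toy_down_read_interruptible(struct toy_rwsem *sem)
{
	if (sem->signal_pending)
		return -EINTR;
	sem->readers++;
	return 0;
}

static void toy_up_read(struct toy_rwsem *sem)
{
	sem->readers--;
}

/* Runs with the lock held; free to return early wherever it wants. */
static int toy_show_target_locked(const int *targets, int nr, int pos)
{
	if (pos < 0 || pos >= nr)
		return -ENXIO;
	if (targets[pos] < 0)
		return 0;		/* empty slot: nothing to show */
	return targets[pos];
}

/* One lock, one unlock, no special cases. */
static int toy_show_target(struct toy_rwsem *sem, const int *targets,
			   int nr, int pos)
{
	int rc;

	if (toy_down_read_interruptible(sem))
		return -EINTR;
	rc = toy_show_target_locked(targets, nr, pos);
	toy_up_read(sem);
	return rc;
}
```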
Honestly, if people are going to use 'guard' to write crap code, we
need to really stop that in its tracks.
There is no upside to making up new interfaces that only generate garbage.
This is final. I'm not willing to even entertain this kind of crap.
Linus
^ permalink raw reply [relevance 98%]
* Re: [PATCH 1/3] cleanup: Add cond_guard() to conditional guards
@ 2024-02-27 20:49 98% ` Linus Torvalds
0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-02-27 20:49 UTC (permalink / raw)
To: Dan Williams
Cc: peterz, gregkh, Dave Jiang, Ira Weiny, Jonathan Cameron,
Fabio M. De Francesco, linux-kernel, linux-cxl
On Tue, 27 Feb 2024 at 08:48, Dan Williams <dan.j.williams@intel.com> wrote:
>
> cond_guard(mutex_intr, return -EINTR, &mutex);
Again, this is *not* helping make code readable and less likely to have bugs.
The macro has obvious deficiencies, like the "_fail" argument not
being surrounded by "{ }" (the equivalent of parenthesizing an
expression argument), but even with that trivial fix the syntax is
just too ugly to live, and doesn't match normal C syntax.
And yes, we have other macros that don't have normal C syntax, and
they are ugly too (example: #define CHKINFO(ret) in
drivers/video/fbdev/hgafb.c), but we should have higher standards for
globally visible helpers, and we should have *MUCH* higher standards
for helpers that are supposed to be all about reducing mistakes.
Bad / odd syntax does not reduce mistakes.
If a sane 'guard' model doesn't work for some code, the answer is not
to make an insane guard model. The answer is to not use 'guard' in
code like that.
Linus
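The point about the "_fail" argument not being wrapped in "{ }" can be illustrated with a small standalone example. The two macros below are hypothetical simplifications written for illustration, not the proposed cond_guard(); they only show what happens when a multi-statement argument is pasted after an "if" without braces.

```c
#include <assert.h>

/* Hypothetical: _fail is pasted bare, so only its first statement is guarded. */
#define CHECK_UNBRACED(ok, _fail)	if (!(ok)) _fail

/* Hypothetical: braces keep every statement of _fail under the condition. */
#define CHECK_BRACED(ok, _fail)		do { if (!(ok)) { _fail; } } while (0)

static int failures;

static int use_unbraced(int ok)
{
	/* Expands to: if (!(ok)) failures++; return -1; */
	CHECK_UNBRACED(ok, failures++; return -1);
	return 0;	/* never reached */
}

static int use_braced(int ok)
{
	/* Both statements of _fail run only when ok is false. */
	CHECK_BRACED(ok, failures++; return -1);
	return 0;
}
```

With the unbraced variant, use_unbraced(1) still returns -1 even though the check passed, because the "return" escaped the "if". The braced variant fixes that one defect, but the call site still carries control flow hidden inside a macro argument, which is the syntax objection raised in the message.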
* Re: [PATCH 2/3] cleanup: Introduce cond_no_free_ptr()
@ 2024-02-27 20:40 99% ` Linus Torvalds
0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-02-27 20:40 UTC (permalink / raw)
To: Dan Williams; +Cc: peterz, gregkh, Jonathan Cameron, linux-kernel, linux-cxl
On Tue, 27 Feb 2024 at 08:49, Dan Williams <dan.j.williams@intel.com> wrote:
>
> 5/ cond_no_free_ptr(rc == 0, return rc, res, name);
Ugh. Honestly, this is all too ugly for words.
The whole - and only - point for the cond_guard() is to make mistakes
less likely.
This is not it. This makes mistakes unreadable and undebuggable.
Linus
* Re: Linux regressions report for mainline [2024-02-25]
@ 2024-02-26 17:33 99% ` Linus Torvalds
0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-02-26 17:33 UTC (permalink / raw)
To: Linux regressions mailing list; +Cc: LKML, ntfs3, Konstantin Komarov
On Sun, 25 Feb 2024 at 06:21, Linux regression tracking (Thorsten
Leemhuis) <regressions@leemhuis.info> wrote:
>
> Sorry, forgot something: there is a patch to fix a ntfs3 build problem
> that was posted 10+ days ago[1] that didn't get any reaction from the
> ntfs3 maintainer at all. Given the history of occasional slow responses
> for that subsystem I thought I'd let you know in case you want to pick
> the fix up directly; but if you do, consider using v2 of the patch[2].
Ack. Picked up directly.
Linus
* Linux 6.8-rc6
@ 2024-02-25 23:57 42% Linus Torvalds
0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-02-25 23:57 UTC (permalink / raw)
To: Linux Kernel Mailing List
Another week, another rc. Nothing here really stands out.
Last week I said that I was hoping things would calm down a bit.
Technically things did calm down a bit, and rc6 is smaller than rc5
was. But not by a huge amount, and honestly, while there's nothing
really alarming here, there's more here than I would really like at
this point in the release.
So this may end up being one of those releases that get an rc8. We'll
see. The fact that we have a few more commits than I would really wish
for might not be a huge issue when a noticeable portion of said
commits end up being about self-tests etc.
So right now I'm still on the fence about things. Most of the stuff
here is really just fairly trivial driver updates (and those self-test
ones), but we do have regressions being tracked still, so...
Just reading through the appended shortlog, a lot of this really _is_
very trivial, and some of the core stuff (like the RCU fixes by Al)
are so esoteric that I kind of doubt anybody has ever hit them in real
life. But still.. Over 300 non-merge fixes in the last week isn't
exactly quiet.
I'm clearly not ready to make that "do we do an rc8" decision right
now. I'll give it another week until I have to make that decision.
Linus
---
Aaro Koskinen (1):
usb: gadget: omap_udc: fix USB gadget regression on Palm TE
Al Viro (15):
Revert "get rid of DCACHE_GENOCIDE"
erofs: fix handling kern_mount() failure
fs/super.c: don't drop ->s_user_ns until we free struct super_block itself
rcu pathwalk: prevent bogus hard errors from may_lookup()
affs: free affs_sb_info with kfree_rcu()
exfat: move freeing sbi, upcase table and dropping nls into
rcu-delayed helper
hfsplus: switch to rcu-delayed unloading of nls and freeing ->s_fs_info
afs: fix __afs_break_callback() / afs_drop_open_mmap() race
nfs: make nfs_set_verifier() safe for use in RCU pathwalk
nfs: fix UAF on pathwalk running into umount
procfs: move dropping pde and pid from ->evict_inode() to ->free_inode()
procfs: make freeing proc_fs_info rcu-delayed
fuse: fix UAF in rcu pathwalks
cifs_get_link(): bail out in unsafe case
ext4_get_link(): fix breakage in RCU mode
Alex Elder (1):
net: ipa: don't overrun IPA suspend interrupt registers
Alexander Gordeev (1):
net/iucv: fix the allocation size of iucv_path_table array
Alexander Stein (1):
arm64: dts: tqma8mpql: fix audio codec iov-supply
Alison Schofield (4):
x86/numa: Fix the address overlap check in numa_fill_memblks()
x86/numa: Fix the sort compare func used in numa_fill_memblks()
cxl/region: Handle endpoint decoders in cxl_region_find_decoder()
cxl/region: Allow out of order assembly of autodiscovered regions
Amit Machhiwal (1):
KVM: PPC: Book3S HV: Fix L2 guest reboot failure due to empty
'arch_compat'
Andreas Larsson (1):
usb: uhci-grlib: Explicitly include linux/platform_device.h
Andrey Jr. Melnikov (1):
ahci: asm1064: correct count of reported ports
Andrzej Kacprowski (1):
accel/ivpu: Don't enable any tiles by default on VPU40xx
Andy Yan (4):
arm64: dts: rockchip: aliase sdmmc as mmc1 for Cool Pi 4B
arm64: dts: rockchip: aliase sdmmc as mmc1 for Cool Pi CM5 EVB
arm64: dts: rockchip: rename vcc5v0_usb30_host regulator for
Cool Pi CM5 EVB
arm64: dts: rockchip: Fix the num-lanes of pcie3x4 on Cool Pi CM5 EVB
Anshuman Khandual (1):
mm/memblock: add MEMBLOCK_RSRV_NOINIT into flagname[] array
Armin Wolf (1):
drm/amd/display: Fix memory leak in dm_sw_fini()
Arnd Bergmann (4):
RDMA/srpt: fix function pointer cast warnings
nouveau: fix function cast warnings
iommu/vt-d: Fix constant-out-of-range warning
dm-integrity, dm-verity: reduce stack usage for recheck
Arunpravin Paneer Selvam (1):
drm/buddy: Modify duplicate list_splice_tail call
Ashutosh Dixit (2):
drm/xe/xe_gt_idle: Drop redundant newline in name
drm/xe: Fix modpost warning on xe_mocs kunit module
Baokun Li (1):
cachefiles: fix memory leak in cachefiles_add_cache()
Bart Van Assche (2):
RDMA/srpt: Support specifying the srpt_service_guid parameter
fs/aio: Restrict kiocb_set_cancel_fn() to I/O submitted via libaio
Benjamin Gray (1):
kasan: guard release_free_meta() shadow access with kasan_arch_is_ready()
Brian Foster (1):
bcachefs: fix iov_iter count underflow on sub-block dio read
Chen Jun (1):
irqchip/mbigen: Don't use bus_get_dev_root() to find the parent
Chengming Zhou (1):
mm/zswap: invalidate duplicate entry when !zswap_enabled
Chris Morgan (1):
arm64: dts: rockchip: Correct Indiedroid Nova GPIO Names
Conor Dooley (1):
riscv: dts: sifive: add missing #interrupt-cells to pmic
Corey Minyard (1):
i2c: imx: when being a target, mark the last read as processed
Damien Le Moal (2):
ata: libata-core: Do not try to set sleeping devices to standby
ata: libata-core: Do not call ata_dev_power_set_standby() twice
Dan Carpenter (2):
scsi: ufs: Uninitialized variable in ufshcd_devfreq_target()
drm/nouveau/mmu/r535: uninitialized variable in r535_bar_new_()
Dan Williams (2):
acpi/ghes: Remove CXL CPER notifications
cxl/acpi: Fix load failures due to single window creation failure
Daniel Vacek (1):
IB/hfi1: Fix sdma.h tx->num_descs off-by-one error
Daniil Dulov (1):
afs: Increase buffer size in afs_update_volume_status()
Dave Airlie (3):
nouveau/gsp: add kconfig option to enable GSP paths by default
nouveau: add an ioctl to return vram bar size.
nouveau: add an ioctl to report vram usage
Dave Jiang (4):
cxl: Change 'struct cxl_memdev_state' *_perf_list to single
'struct cxl_dpa_perf'
cxl: Remove unnecessary type cast in cxl_qos_class_verify()
cxl: Fix sysfs export of qos_class for memdev
cxl/test: Add support for qos_class checking
David Howells (1):
netfs: Fix missing zero-length check in unbuffered write
Dmitry Baryshkov (1):
Revert "iommu/arm-smmu: Convert to domain_alloc_paging()"
Don Brace (1):
scsi: smartpqi: Fix disable_managed_interrupts
Emil Renner Berthing (1):
gpiolib: Handle no pin_ranges in gpiochip_generic_config()
Eric Dumazet (3):
ipv4: properly combine dev_base_seq and ipv4.dev_addr_genid
ipv6: properly combine dev_base_seq and ipv6.dev_addr_genid
net: implement lockless setsockopt(SO_PEEK_OFF)
Erik Kurzinger (2):
drm/syncobj: call drm_syncobj_fence_add_wait when WAIT_AVAILABLE
flag is set
drm/syncobj: handle NULL fence in syncobj_eventfd_entry_func
Fabio Estevam (2):
Revert "arm64: dts: imx8mp-dhcom-pdk3: Describe the USB-C connector"
Revert "arm64: dts: imx8mn-var-som-symphony: Describe the USB-C connector"
Florian Fainelli (1):
net: bcmasp: Indicate MAC is in charge of PHY PM
Florian Westphal (2):
netfilter: nf_tables: set dormant flag on hook register failure
netfilter: nf_tables: use kzalloc for hook allocation
Frank Li (2):
usb: cdns3: fixed memory use after free at cdns3_gadget_ep_disable()
usb: cdns3: fix memory double free when handle zero packet
Gaurav Batra (1):
powerpc/pseries/iommu: DLPAR add doesn't completely initialize
pci_controller
Geert Uytterhoeven (2):
soc: microchip: Fix POLARFIRE_SOC_SYS_CTRL input prompt
ARM: dts: renesas: rcar-gen2: Add missing #interrupt-cells to DA9063 nodes
Geliang Tang (2):
mptcp: add needs_id for userspace appending addr
mptcp: add needs_id for netlink appending addr
Gianmarco Lusvardi (1):
bpf, scripts: Correct GPL license name
Greg Joyce (1):
block: sed-opal: handle empty atoms when parsing response
Guenter Roeck (4):
MAINTAINERS: Drop myself as maintainer of TYPEC port controller drivers
parisc: Fix stack unwinder
lib/Kconfig.debug: TEST_IOV_ITER depends on MMU
hwmon: (nct6775) Fix access to temperature configuration registers
Hangbin Liu (1):
selftests: bonding: set active slave to primary eth1 specifically
Hans de Goede (8):
platform/x86: intel: int0002_vgpio: Pass IRQF_ONESHOT to request_irq()
platform/x86: touchscreen_dmi: Allow partial (prefix) matches
for ACPI names
platform/x86: touchscreen_dmi: Consolidate Goodix upside-down
touchscreen data
platform/x86: x86-android-tablets: Fix keyboard touchscreen on
Lenovo Yogabook1 X90
platform/x86: Add new get_serdev_controller() helper
platform/x86: x86-android-tablets: Fix serdev instantiation no
longer working
platform/x86: x86-android-tablets: Fix acer_b1_750_goodix_gpios name
platform/x86: intel-vbtn: Stop calling "VBDL" from notify_handler
Hari Bathini (1):
bpf: Fix warning for bpf_cpumask in verifier
Heiko Carstens (3):
s390/configs: provide compat topic configuration target
s390/configs: enable INIT_STACK_ALL_ZERO in all configurations
s390/configs: update default configurations
Heiko Stuebner (2):
arm64: dts: rockchip: drop unneeded status from rk3588-jaguar gpio-leds
arm64: dts: rockchip: set num-cs property for spi on px30
Helge Deller (1):
Revert "parisc: Only list existing CPUs in cpu_possible_mask"
Hojin Nam (1):
perf: CXL: fix CPMU filter value mask length
Horatiu Vultur (1):
net: sparx5: Add spinlock for frame transmission from CPU
Hou Tao (3):
x86/mm: Move is_vsyscall_vaddr() into asm/vsyscall.h
x86/mm: Disallow vsyscall page read for copy_from_kernel_nofault()
selftest/bpf: Test the read of vsyscall page under x86-64
Huacai Chen (3):
LoongArch: Disable IRQ before init_fn() for nonboot CPUs
LoongArch: Update cpu_sibling_map when disabling nonboot CPUs
LoongArch: Call early_init_fdt_scan_reserved_mem() earlier
Jakub Kicinski (5):
net/sched: act_mirred: use the backlog for mirred ingress
net/sched: act_mirred: don't override retval if we already lost the skb
docs: netdev: update the link to the CI repo
tools: ynl: make sure we always pass yarg to mnl_cb_run
tools: ynl: don't leak mcast_groups on init error
Jason Gunthorpe (4):
iommufd: Reject non-zero data_type if no data_len is provided
s390: use the correct count for __iowrite64_copy()
iommu/arm-smmu-v3: Do not use GFP_KERNEL under as spinlock
iommu/sva: Restore SVA handle sharing
Javier Martinez Canillas (1):
sparc: Fix undefined reference to fb_is_primary_device
Jeremy Kerr (1):
net: mctp: put sock on tag allocation failure
Jianbo Liu (1):
net/sched: flower: Add lock protection when remove filter handle
Jiri Pirko (1):
devlink: fix port dump cmd type
Joao Martins (9):
iommufd/iova_bitmap: Bounds check mapped::pages access
iommufd/iova_bitmap: Switch iova_bitmap::bitmap to an u8 array
iommufd/selftest: Test u64 unaligned bitmaps
iommufd/iova_bitmap: Handle recording beyond the mapped pages
iommufd/selftest: Refactor dirty bitmap tests
iommufd/selftest: Refactor mock_domain_read_and_clear_dirty()
iommufd/selftest: Hugepage mock domain support
iommufd/selftest: Add mock IO hugepages tests
iommufd/iova_bitmap: Consider page offset for the pages to be pinned
Johan Jonker (1):
arm64: dts: rockchip: Drop interrupts property from rk3328
pwm-rockchip node
Johannes Weiner (1):
mm: memcontrol: clarify swapaccount=0 deprecation warning
Jonathan Corbet (1):
docs: Instruct LaTeX to cope with deeper nesting
Josef Bacik (1):
btrfs: fix deadlock with fiemap and extent locking
Justin Chen (1):
net: bcmasp: Sanity check is off by one
Justin Iurman (2):
Fix write to cloned skb in ipv6_hop_ioam()
selftests: ioam: refactoring to align with the fix
Kairui Song (1):
mm/swap: fix race when skipping swapcache
Kalesh AP (5):
RDMA/bnxt_re: Avoid creating fence MR for newer adapters
RDMA/bnxt_re: Remove a redundant check inside bnxt_re_vf_res_config
RDMA/bnxt_re: Fix unconditional fence for newer adapters
RDMA/bnxt_re: Return error for SRQ resize
RDMA/bnxt_re: Add a missing check in bnxt_qplib_query_srq
Kamal Heib (1):
RDMA/qedr: Fix qedr_create_user_qp error flow
Kees Cook (1):
enic: Avoid false positive under FORTIFY_SOURCE
Kent Overstreet (6):
bcachefs: fix backpointer_to_text() when dev does not exist
bcachefs: Kill __GFP_NOFAIL in buffered read path
bcachefs: Fix BTREE_ITER_FILTER_SNAPSHOTS on inodes btree
bcachefs: Fix bch2_journal_flush_device_pins()
bcachefs: Fix check_snapshot() memcpy
bcachefs: fix bch2_save_backtrace()
Krishna Kurapati (1):
usb: gadget: ncm: Avoid dropping datagrams of properly parsed NTBs
Krzysztof Kozlowski (3):
riscv: dts: starfive: replace underscores in node names
arm64: dts: rockchip: minor rk3588 whitespace cleanup
LoongArch: dts: Minor whitespace cleanup
Kuniyuki Iwashima (3):
dccp/tcp: Unhash sk from ehash for tb2 alloc failure after
check_estalblished().
arp: Prevent overflow in arp_req_get().
af_unix: Drop oob_skb ref before purging queue in GC.
Kurt Kanzenbach (1):
net: stmmac: Fix EST offset for dwmac 5.10
Lad Prabhakar (1):
cache: ax45mp_cache: Align end size to cache boundary in
ax45mp_dma_cache_wback()
Leon Romanovsky (1):
RDMA/mlx5: Fix fortify source warning while accessing Eth segment
Lewis Huang (1):
drm/amd/display: Only allow dig mapping to pwrseq in new asic
Li Ming (1):
cxl/pci: Skip to handle RAS errors if CXL.mem device is detached
Lino Sanfilippo (2):
serial: stm32: do not always set SER_RS485_RX_DURING_TX if RS485
is enabled
serial: amba-pl011: Fix DMA transmission in RS485 mode
Linus Torvalds (3):
sched/membarrier: reduce the ability to hammer on sys_membarrier
drm/tests/drm_buddy: fix build failure on 32-bit targets
Linux 6.8-rc6
Lucas Stach (1):
bus: imx-weim: fix valid range check
Ma Jun (1):
drm/amdgpu: Fix the runtime resume failure issue
Marc Dionne (2):
netfs: Fix i_dio_count leak on DIO read past i_size
afs: Fix ignored callbacks over ipv4
Marek Vasut (1):
arm64: dts: imx8mp: Disable UART4 by default on Data Modul
i.MX8M Plus eDM SBC
Mario Limonciello (5):
platform/x86/amd/pmf: Fix a suspend hang on Framework 13
platform/x86/amd/pmf: Add debugging message for missing policy data
platform/x86/amd/pmf: Fixup error handling for amd_pmf_init_smart_pc()
platform/x86/amd/pmf: Fix a potential race with policy binary sideload
platform/x86: thinkpad_acpi: Only update profile if successfully converted
Mark Brown (3):
usb: typec: tpcm: Fix issues with power being removed during reset
arm64/sme: Restore SME registers on exit from suspend
arm64/sme: Restore SMCR_EL1.EZT0 on exit from suspend
Mark Pearson (1):
platform/x86: think-lmi: Fix password opcode ordering for workstations
Mark Zhang (1):
IB/mlx5: Don't expose debugfs entries for RRoCE general
parameters if not supported
Martin Blumenstingl (1):
drm/meson: Don't remove bridges which are created by other drivers
Martin K. Petersen (2):
scsi: sd: usb_storage: uas: Access media prior to querying
device properties
scsi: core: Consult supported VPD page list prior to fetching page
Martin KaFai Lau (2):
bpf: Fix racing between bpf_timer_cancel_and_free and bpf_timer_cancel
selftests/bpf: Test racing between bpf_timer_cancel_and_free and
bpf_timer_cancel
Matthew Auld (1):
drm/tests/drm_buddy: fix 32b build
Matthew Brost (3):
drm/xe: Fix xe_vma_set_pte_size
drm/xe: Add XE_VMA_PTE_64K VMA flag
drm/xe: Return 2MB page size for compact 64k PTEs
Matthieu Baerts (NGI0) (7):
selftests: mptcp: pm nl: also list skipped tests
selftests: mptcp: pm nl: avoid error msg on older kernels
selftests: mptcp: diag: fix bash warnings on older kernels
selftests: mptcp: simult flows: fix some subtest names
selftests: mptcp: userspace_pm: unique subtest names
selftests: mptcp: diag: unique 'in use' subtest names
selftests: mptcp: diag: unique 'cestab' subtest names
Max Kellermann (2):
parisc/ftrace: add missing CONFIG_DYNAMIC_FTRACE check
parisc/kprobes: always include asm-generic/kprobes.h
Maxime Ripard (1):
drm/i915/tv: Fix TV mode
Melissa Wen (1):
drm/amd/display: fix null-pointer dereference on edid reading
Mike Marciniszyn (1):
RDMA/irdma: Fix KASAN issue with tasklet
Mike Snitzer (1):
dm-crypt, dm-integrity, dm-verity: bump target version
Mikulas Patocka (5):
dm-integrity: recheck the integrity tag after a failure
dm-verity: recheck the hash after a failure
dm-crypt: don't modify the data when using authenticated encryption
dm-crypt: recheck the integrity tag after a failure
dm-verity, dm-crypt: align "struct bvec_iter" correctly
Muhammad Usama Anjum (1):
selftests/iommu: fix the config fragment
Mustafa Ismail (2):
RDMA/irdma: Set the CQ read threshold for GEN 1
RDMA/irdma: Add AE for too many RNRS
Nam Cao (1):
irqchip/sifive-plic: Enable interrupt if needed before EOI
Naohiro Aota (1):
scsi: target: pscsi: Fix bio_put() for error case
Nhat Pham (1):
mm/swap_state: update zswap LRU's protection range with the folio locked
Nikita Shubin (1):
ARM: ep93xx: Add terminator to gpiod_lookup_table
Oliver Upton (3):
KVM: arm64: vgic-its: Test for valid IRQ in its_sync_lpi_pending_table()
KVM: arm64: vgic-its: Test for valid IRQ in MOVALL handler
irqchip/gic-v3-its: Do not assume vPE tables are preallocated
Ondrej Jirman (1):
Revert "usb: typec: tcpm: reset counter when enter into
unattached state after try role"
Pablo Neira Ayuso (3):
netfilter: nft_flow_offload: reset dst in route object after
setting up flow
netfilter: nft_flow_offload: release dst in case direct xmit path is used
netfilter: nf_tables: register hooks last when adding new chain/flowtable
Palmer Dabbelt (1):
tty: hvc: Don't enable the RISC-V SBI console by default
Paolo Abeni (4):
mptcp: fix lockless access in subflow ULP diag
mptcp: fix data races on local_id
mptcp: fix data races on remote_id
mptcp: fix duplicate subflow creation
Pavel Sakharov (1):
net: stmmac: Fix incorrect dereference in interrupt handlers
Pawan Gupta (5):
x86/bugs: Add asm helpers for executing VERW
x86/entry_64: Add VERW just before userspace transition
x86/entry_32: Add VERW just before userspace transition
x86/bugs: Use ALTERNATIVE() instead of mds_user_clear static key
KVM/VMX: Move VERW closer to VMentry for MDS mitigation
Pawel Laszczak (2):
usb: cdnsp: blocked some cdns3 specific code
usb: cdnsp: fixed issue with incorrect detecting CDNSP family controllers
Peter Oberparleiter (1):
s390/cio: fix invalid -EBUSY on ccw_device_start
Qu Wenruo (1):
btrfs: defrag: avoid unnecessary defrag caused by incorrect extent size
Radhey Shyam Pandey (1):
ata: ahci_ceva: fix error handling for Xilinx GT PHY support
Randy Dunlap (2):
scsi: jazz_esp: Only build if SCSI core is builtin
net: ethernet: adi: requires PHYLIB support
Rob Herring (5):
arm64: dts: freescale: Disable interrupt_map check
arm: dts: Fix dtc interrupt_provider warnings
arm64: dts: Fix dtc interrupt_provider warnings
arm: dts: Fix dtc interrupt_map warnings
arm64: dts: qcom: Fix interrupt-map cell sizes
Robert Richter (1):
cxl/pci: Fix disabling memory if DVSEC CXL Range does not match
a CFMWS window
Rémi Denis-Courmont (2):
phonet: take correct lock to peek at the RX queue
phonet/pep: fix racy skb_queue_empty() use
Sabrina Dubroca (5):
tls: break out of main loop when PEEK gets a non-data record
tls: stop recv() if initial process_rx_list gave us non-DATA
tls: don't skip over different type records from the rx_list
selftests: tls: add test for merging of same-type control messages
selftests: tls: add test for peeking past a record of a different type
Samasth Norway Ananda (1):
firmware: microchip: fix wrong sizeof argument
Sandeep Dhavale (1):
erofs: fix refcount on the metabuf used for inode lookup
Sean Christopherson (1):
KVM/VMX: Use BT+JNC, i.e. EFLAGS.CF to select VMRESUME vs. VMLAUNCH
Sebastian Andrzej Siewior (1):
xsk: Add truesize to skb_add_rx_frag().
Sebastian Reichel (1):
arm64: dts: rockchip: mark system power controller on rk3588-evb1
SeongJae Park (4):
mm/damon/core: check apply interval in damon_do_apply_schemes()
mm/damon/sysfs-schemes: handle schemes sysfs dir removal before
commit_schemes_quota_goals
mm/damon/reclaim: fix quota stauts loss due to online tunings
mm/damon/lru_sort: fix quota status loss due to online tunings
Shakeel Butt (1):
MAINTAINERS: mailmap: update Shakeel's email address
Shannon Nelson (1):
ionic: use pci_is_enabled not open code
Shigeru Yoshida (1):
bpf, sockmap: Fix NULL pointer dereference in
sk_psock_verdict_data_ready()
Shiraz Saleem (1):
RDMA/irdma: Validate max_send_wr and max_recv_wr
Shyam Sundar S K (2):
platform/x86/amd/pmf: Remove smart_pc_status enum
platform/x86/amd/pmf: Fix TEE enact command failure after
suspend and resume
Siddharth Vadapalli (1):
net: phy: realtek: Fix rtl8211f_config_init() for RTL8211F(D)(I)-VD-CG PHY
Simon Horman (1):
MAINTAINERS: Add framer headers to NETWORKING [GENERAL]
Srinivasan Shanmugam (1):
drm/amd/display: Fix potential null pointer dereference in dc_dmub_srv
Steven Rostedt (Google) (1):
ring-buffer: Do not let subbuf be bigger than write mask
Subbaraya Sundeep (1):
octeontx2-af: Consider the action set by PF
Swapnil Patel (1):
drm/amd/display: fix input states translation error for dcn35 & dcn351
Terry Tritton (1):
selftests/mm: uffd-unit-test check if huge page size is 0
Thinh Nguyen (1):
usb: dwc3: gadget: Don't disconnect if not started
Thomas Hellström (2):
drm/xe/uapi: Remove support for persistent exec_queues
drm/ttm: Fix an invalid freeing on already freed page in error path
Tobias Waldekranz (2):
net: bridge: switchdev: Skip MDB replays of deferred events on offload
net: bridge: switchdev: Ensure deferred event delivery on unoffload
Tom Parkin (1):
l2tp: pass correct message length to ip6_append_data
Tudor Ambarus (2):
dt-bindings: clock: gs101: rename cmu_misc clock-names
clk: samsung: clk-gs101: comply with the new dt cmu_misc clock names
Uwe Kleine-König (1):
ARM: dts: rockchip: Drop interrupts property from pwm-rockchip nodes
Vasiliy Kovalev (3):
ipv6: sr: fix possible use-after-free and null-ptr-deref
devlink: fix possible use-after-free and memory leaks in devlink_init()
gtp: fix use-after-free and null-ptr-deref in gtp_genl_dump_pdp()
Vegard Nossum (1):
docs: translations: use attribute to store current language
Vidya Sagar (1):
PCI/MSI: Prevent MSI hardware interrupt number truncation
WANG Xuerui (3):
LoongArch: KVM: Fix input validation of _kvm_get_cpucfg() &
kvm_check_cpucfg()
LoongArch: KVM: Rename _kvm_get_cpucfg() to _kvm_get_cpucfg_mask()
LoongArch: KVM: Streamline kvm_check_cpucfg() and improve comments
Wayne Lin (1):
drm/amd/display: adjust few initialization order in dm
Will Deacon (1):
Revert "arm64: jump_label: use constraints "Si" instead of "i""
Xu Yang (2):
usb: roles: fix NULL pointer issue when put module's reference
usb: roles: don't get/set_role() when usb_role_switch is unregistered
Yafang Shao (2):
bpf: Fix an issue due to uninitialized bpf_iter_task
selftests/bpf: Add negtive test cases for task iter
Yi Liu (9):
iommu/vt-d: Track nested domains in parent
iommu/vt-d: Add __iommu_flush_iotlb_psi()
iommu/vt-d: Add missing iotlb flush for parent domain
iommu/vt-d: Update iotlb in nested domain attach
iommu/vt-d: Add missing device iotlb flush for parent domain
iommu/vt-d: Remove domain parameter for intel_pasid_setup_dirty_tracking()
iommu/vt-d: Wrap the dirty tracking loop to be a helper
iommu/vt-d: Add missing dirty tracking set for parent domain
iommu/vt-d: Set SSADE when attaching to a parent with dirty tracking
Yishai Hadas (1):
RDMA/mlx5: Relax DEVX access upon modify commands
Yosry Ahmed (1):
mm: zswap: fix missing folio cleanup in writeback race path
Yu Kuai (6):
md: Fix missing release of 'active_io' for flush
md: Don't ignore suspended array in md_check_recovery()
md: Don't ignore read-only array in md_check_recovery()
md: Make sure md_do_sync() will set MD_RECOVERY_DONE
md: Don't register sync_thread for reshape directly
md: Don't suspend the array for interrupted reshape
Zhipeng Lu (1):
IB/hfi1: Fix a memleak in init_credit_return
zhenwei pi (1):
crypto: virtio/akcipher - Fix stack overflow on memcpy
* Re: [PATCH next v2 08/11] minmax: Add min_const() and max_const()
@ 2024-02-25 17:13 97% ` Linus Torvalds
0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-02-25 17:13 UTC (permalink / raw)
To: David Laight
Cc: linux-kernel, Netdev, dri-devel, Jens Axboe,
Matthew Wilcox (Oracle),
Christoph Hellwig, linux-btrfs, Andrew Morton, Andy Shevchenko,
David S . Miller, Dan Carpenter, Jani Nikula
On Sun, 25 Feb 2024 at 08:53, David Laight <David.Laight@aculab.com> wrote:
>
> The expansions of min() and max() contain statement expressions so are
> not valid for static initialisers.
> min_const() and max_const() are expressions so can be used for static
> initialisers.
I hate the name.
Naming shouldn't be about an implementation detail, particularly not
an esoteric one like the "C constant expression" rule. That can be
useful for some internal helper functions or macros, but not for
something that random people are supposed to USE.
Telling some random developer that inside an array size declaration or
a static initializer you need to use "max_const()" because it needs to
syntactically be a constant expression, and our regular "max()"
function isn't that, is just *horrid*.
No, please just use the traditional C model of just using ALL CAPS for
macro names that don't act like a function.
Yes, yes, that may end up requiring getting rid of some current users of
    #define MIN(a,b) ((a)<(b) ? (a):(b))
but dammit, we don't actually have _that_ many of them, and why should
we have random drivers doing that anyway?
Linus
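The distinction drawn in this message can be sketched as follows. This is an illustrative sketch, not the kernel's minmax.h: MAX() is a plain expression macro, while max_once() is a hypothetical, simplified stand-in for a function-like max() built on a GNU statement expression (a GCC/Clang extension). The helper names bump_with_MAX() and bump_with_max_once() are likewise made up for the example.

```c
#include <assert.h>

/* ALL CAPS: a plain expression, usable where C requires a constant. */
#define MAX(a, b)	((a) > (b) ? (a) : (b))

/*
 * Hypothetical, simplified stand-in for a function-like max(): the GNU
 * statement expression evaluates each argument exactly once, but it is
 * not a constant expression.
 */
#define max_once(a, b) ({		\
	__typeof__(a) _a = (a);		\
	__typeof__(b) _b = (b);		\
	_a > _b ? _a : _b; })

/* Both uses need a constant expression, so only MAX() works here. */
static char scratch[MAX(sizeof(int), sizeof(long))];
static const int limit = MAX(3, 7);

/* MAX() evaluates the winning argument twice: *i is incremented twice. */
static int bump_with_MAX(int *i)
{
	return MAX((*i)++, 3);
}

/* max_once() evaluates each argument once: *i is incremented once. */
static int bump_with_max_once(int *i)
{
	return max_once((*i)++, 3);
}
```

The ALL CAPS spelling tells the reader that MAX() does not behave like a function call (arguments with side effects are unsafe), which is a property a user can reason about, unlike the "constant expression" rule that a name such as max_const() exposes.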
2022-03-02 4:34 [PATCH 00/19] Enable -Wshadow=local for kernel/sched Matthew Wilcox (Oracle)
2024-04-16 21:15 ` Kees Cook
2024-04-17 0:29 99% ` Linus Torvalds
2024-04-17 0:50 90% ` Linus Torvalds
2022-05-27 11:29 [GIT PULL] Crypto Fixes for 5.19 Herbert Xu
2022-08-02 6:05 ` [GIT PULL] Crypto Update for 5.20 Herbert Xu
2022-10-04 8:54 ` [GIT PULL] Crypto Update for 6.1 Herbert Xu
2022-12-14 8:15 ` [GIT PULL] Crypto Update for 6.2 Herbert Xu
2023-02-20 5:22 ` [GIT PULL] Crypto Update for 6.3 Herbert Xu
2023-04-24 4:52 ` [GIT PULL] Crypto Update for 6.4 Herbert Xu
2023-06-29 5:06 ` [GIT PULL] Crypto Update for 6.5 Herbert Xu
2023-08-28 9:22 ` [GIT PULL] Crypto Update for 6.6 Herbert Xu
2023-11-02 6:56 ` [GIT PULL] Crypto Update for 6.7 Herbert Xu
2024-01-09 22:17 ` [GIT PULL] Crypto Update for 6.8 Herbert Xu
2024-03-15 3:04 ` [GIT PULL] Crypto Update for 6.9 Herbert Xu
2024-03-15 21:51 99% ` Linus Torvalds
2023-04-23 23:55 [syzbot] [kernel?] KCSAN: data-race in __fput / __tty_hangup (4) Tetsuo Handa
2023-04-24 0:44 ` Al Viro
2023-04-24 1:09 ` Tetsuo Handa
2023-04-25 14:47 ` Tetsuo Handa
2023-04-25 16:03 ` Al Viro
2023-04-25 22:09 ` Tetsuo Handa
2023-04-26 11:05 ` [PATCH] tty: tty_io: remove hung_up_tty_fops Tetsuo Handa
2023-05-14 1:02 ` [PATCH v2] " Tetsuo Handa
2023-05-30 10:44 ` Greg Kroah-Hartman
2023-05-30 11:57 ` Tetsuo Handa
2023-05-30 12:51 ` Greg Kroah-Hartman
2024-04-27 6:20 ` [PATCH v3] " Tetsuo Handa
2024-04-27 19:02 96% ` Linus Torvalds
2024-04-28 10:19 ` Tetsuo Handa
2024-04-28 18:50 99% ` Linus Torvalds
2024-04-29 13:55 ` Marco Elver
2024-04-29 15:38 99% ` Linus Torvalds
2024-05-01 18:45 ` Paul E. McKenney
2024-05-01 18:56 99% ` Linus Torvalds
2024-05-01 19:02 ` Paul E. McKenney
2024-05-01 20:14 ` Marco Elver
2024-05-01 21:06 97% ` Linus Torvalds
2024-05-01 21:20 94% ` Linus Torvalds
2023-09-25 12:02 [PATCH v7 00/12] iov_iter: Convert the iterator macros into inline funcs David Howells
2023-09-25 12:03 ` [PATCH v7 07/12] iov_iter: Convert iterate*() to " David Howells
2024-02-18 3:13 ` [bug report] dead loop in generic_perform_write() //Re: " Tong Tiangen
2024-02-28 21:21 99% ` Linus Torvalds
2024-02-28 22:57 99% ` Linus Torvalds
2024-02-29 8:13 ` Tong Tiangen
2024-02-29 17:32 91% ` Linus Torvalds
2024-03-02 2:59 76% ` Linus Torvalds
2024-03-02 9:37 ` Tong Tiangen
2024-03-02 18:06 93% ` Linus Torvalds
2024-03-02 18:11 99% ` Linus Torvalds
2024-03-04 11:56 ` David Howells
2024-03-04 18:32 99% ` Linus Torvalds
2024-02-13 5:55 [PATCH 00/30] PREEMPT_AUTO: support lazy rescheduling Ankur Arora
2024-04-23 15:21 ` Shrikanth Hegde
2024-04-23 16:13 97% ` Linus Torvalds
2024-02-13 5:55 [PATCH 26/30] sched: handle preempt=voluntary under PREEMPT_AUTO Ankur Arora
2024-03-03 1:08 ` Joel Fernandes
2024-03-05 8:11 ` Ankur Arora
2024-03-06 20:42 ` Joel Fernandes
2024-03-07 19:01 ` Paul E. McKenney
2024-03-08 0:15 ` Joel Fernandes
2024-03-08 0:42 ` Paul E. McKenney
2024-03-08 4:22 ` Ankur Arora
2024-03-08 21:33 ` Paul E. McKenney
2024-03-11 4:50 ` Ankur Arora
2024-03-11 19:26 ` Paul E. McKenney
2024-03-11 20:09 ` Ankur Arora
2024-03-11 20:23 95% ` Linus Torvalds
2024-02-25 13:21 Linux regressions report for mainline [2024-02-25] Regzbot (on behalf of Thorsten Leemhuis)
2024-02-25 14:21 ` Linux regression tracking (Thorsten Leemhuis)
2024-02-26 17:33 99% ` Linus Torvalds
2024-02-25 16:46 [PATCH next v2 00/11] minmax: Optimise to reduce .i line length David Laight
2024-02-25 16:53 ` [PATCH next v2 08/11] minmax: Add min_const() and max_const() David Laight
2024-02-25 17:13 97% ` Linus Torvalds
2024-02-25 23:57 42% Linux 6.8-rc6 Linus Torvalds
2024-02-27 16:48 [PATCH 0/2] cleanup: A couple extensions for conditional resource management Dan Williams
2024-02-27 16:48 ` [PATCH 1/3] cleanup: Add cond_guard() to conditional guards Dan Williams
2024-02-27 20:49 98% ` Linus Torvalds
2024-02-27 16:48 ` [PATCH 2/3] cleanup: Introduce cond_no_free_ptr() Dan Williams
2024-02-27 20:40 99% ` Linus Torvalds
2024-02-27 16:49 ` [PATCH 3/3] cxl/region: Use cond_guard() in show_targetN() Dan Williams
2024-02-27 20:55 98% ` Linus Torvalds
2024-02-27 21:41 ` Dan Williams
2024-02-27 22:34 86% ` Linus Torvalds
2024-02-27 22:56 [GIT PULL] hotfixes for 6.8-rc7 Andrew Morton
2024-02-28 0:51 99% ` Linus Torvalds
2024-02-28 22:55 [PATCH 0/3] kci-gitlab: Introducing GitLab-CI Pipeline for Kernel Testing Helen Koike
2024-02-28 22:55 ` [PATCH 1/3] " Helen Koike
2024-02-29 9:02 ` Maxime Ripard
2024-02-29 9:23 ` Nikolai Kondrashov
2024-02-29 20:21 95% ` Linus Torvalds
2024-03-01 10:27 ` Nikolai Kondrashov
2024-03-01 20:10 96% ` Linus Torvalds
2024-02-29 20:39 [GIT PULL] Networking for v6.8-rc7 Jakub Kicinski
2024-02-29 20:56 99% ` Linus Torvalds
2024-03-01 8:32 [Linux Kernel Bug] KASAN: slab-out-of-bounds Write in tomoyo_write_control Sam Sun
2024-03-01 13:04 ` [PATCH for 6.8] tomoyo: fix UAF write bug in tomoyo_write_control() Tetsuo Handa
2024-03-01 19:14 99% ` Linus Torvalds
2024-03-01 17:07 [PATCH RFC 0/4] memcg_kmem hooks refactoring and kmem_cache_charge() Vlastimil Babka
2024-03-01 17:07 ` [PATCH RFC 4/4] UNFINISHED mm, fs: use kmem_cache_charge() in path_openat() Vlastimil Babka
2024-03-01 17:51 90% ` Linus Torvalds
2024-03-24 2:27 ` Al Viro
2024-03-24 17:44 94% ` Linus Torvalds
2024-03-01 20:12 arch/x86/include/asm/processor.h:698:16: sparse: sparse: incorrect type in initializer (different address spaces) kernel test robot
2024-03-01 21:57 ` Thomas Gleixner
2024-03-01 22:26 ` Thomas Gleixner
2024-03-02 11:37 ` Thomas Gleixner
2024-03-02 15:44 ` Thomas Gleixner
2024-03-02 22:00 ` Thomas Gleixner
2024-03-02 22:49 99% ` Linus Torvalds
2024-03-01 22:52 [PATCH v2] x86: disable non-instrumented version of copy_mc when KMSAN is enabled Tetsuo Handa
2024-03-05 11:31 ` Tetsuo Handa
2024-03-05 17:57 93% ` Linus Torvalds
2024-03-06 22:08 ` Tetsuo Handa
2024-03-07 0:09 99% ` Linus Torvalds
2024-03-02 16:12 [GIT PULL] tracing: Prevent trace_marker being bigger than unsigned short Steven Rostedt
2024-03-02 17:24 99% ` Linus Torvalds
2024-03-02 19:59 ` Steven Rostedt
2024-03-02 20:25 99% ` Linus Torvalds
2024-03-02 20:33 99% ` Linus Torvalds
2024-03-02 20:47 ` Steven Rostedt
2024-03-02 20:55 99% ` Linus Torvalds
2024-03-03 12:59 ` Steven Rostedt
2024-03-03 17:38 99% ` Linus Torvalds
2024-03-03 19:07 ` Steven Rostedt
2024-03-03 20:09 79% ` Linus Torvalds
2024-03-03 21:00 ` Steven Rostedt
2024-03-04 21:42 ` Steven Rostedt
2024-03-04 21:50 99% ` Linus Torvalds
2024-03-04 22:10 ` Steven Rostedt
2024-03-04 23:20 98% ` Linus Torvalds
2024-03-04 23:47 ` Steven Rostedt
2024-03-04 23:52 ` Steven Rostedt
2024-03-05 0:17 99% ` Linus Torvalds
2024-03-03 21:15 51% Linux 6.8-rc7 Linus Torvalds
2024-03-04 10:12 [patch 0/9] x86: Cure tons of sparse warnings (mostly __percpu) Thomas Gleixner
2024-03-04 10:12 ` [patch 5/9] x86: Cure per CPU madness on UP Thomas Gleixner
2024-03-15 16:17 ` Guenter Roeck
2024-03-15 16:42 97% ` Linus Torvalds
2024-03-15 17:40 ` Thomas Gleixner
2024-03-15 22:55 ` Thomas Gleixner
2024-03-15 23:23 93% ` Linus Torvalds
2024-03-16 1:11 ` Thomas Gleixner
2024-03-16 1:23 99% ` Linus Torvalds
2024-03-04 16:19 [PATCH] rcutorture: Fix rcu_torture_pipe_update_one()/rcu_torture_writer() data race and concurrency bug Joel Fernandes
2024-03-05 6:24 ` linke li
2024-03-06 15:37 ` Steven Rostedt
2024-03-06 17:36 ` Paul E. McKenney
2024-03-06 18:01 ` Steven Rostedt
2024-03-06 18:43 87% ` Linus Torvalds
2024-03-06 18:55 ` Steven Rostedt
2024-03-06 19:01 95% ` Linus Torvalds
2024-03-06 19:27 95% ` Linus Torvalds
2024-03-06 19:47 ` Steven Rostedt
2024-03-06 20:06 94% ` Linus Torvalds
2024-03-06 19:27 ` Steven Rostedt
2024-03-06 19:46 92% ` Linus Torvalds
2024-03-06 20:20 97% ` Linus Torvalds
2024-03-07 2:29 ` Paul E. McKenney
2024-03-07 2:43 93% ` Linus Torvalds
2024-03-07 2:49 99% ` Linus Torvalds
2024-03-07 3:06 ` Mathieu Desnoyers
2024-03-07 3:37 ` Paul E. McKenney
2024-03-07 13:53 ` Mathieu Desnoyers
2024-03-07 19:47 ` Paul E. McKenney
2024-03-07 20:00 96% ` Linus Torvalds
2024-03-07 20:57 ` Paul E. McKenney
2024-03-07 21:40 ` Julia Lawall
2024-03-07 22:09 87% ` Linus Torvalds
2024-03-05 13:33 [PATCH] coredump: get machine check errors early rather than during iov_iter Tong Tiangen
2024-03-05 16:33 ` Christian Brauner
2024-03-05 16:39 ` Jens Axboe
2024-03-05 17:29 71% ` Linus Torvalds
2024-03-05 23:51 linux-next: build warning after merge of the vfs-brauner tree Stephen Rothwell
2024-03-06 2:48 99% ` Linus Torvalds
2024-03-06 4:37 ` Stephen Rothwell
2024-03-06 4:47 99% ` Linus Torvalds
2024-03-08 10:13 [GIT PULL] vfs pidfd Christian Brauner
2024-03-11 20:05 99% ` Linus Torvalds
2024-03-12 14:15 ` Christian Brauner
2024-03-12 16:23 99% ` Linus Torvalds
2024-03-12 20:09 ` Christian Brauner
2024-03-12 20:21 99% ` Linus Torvalds
2024-03-13 17:10 ` Christian Brauner
2024-03-13 19:40 99% ` Linus Torvalds
2024-03-08 17:15 [GIT PULL] RCU changes for v6.9 Boqun Feng
2024-03-12 20:32 ` Unexplained long boot delays [Was Re: [GIT PULL] RCU changes for v6.9] Florian Fainelli
2024-03-12 21:07 ` Boqun Feng
2024-03-12 21:34 ` Florian Fainelli
2024-03-12 21:44 99% ` Linus Torvalds
2024-03-12 23:48 ` Boqun Feng
2024-03-13 16:01 ` Joel Fernandes
2024-03-13 21:30 ` Florian Fainelli
2024-03-13 21:59 ` Russell King (Oracle)
2024-03-13 22:04 ` Florian Fainelli
2024-03-13 22:49 ` Russell King (Oracle)
2024-03-13 23:29 ` Florian Fainelli
2024-03-14 1:15 97% ` Linus Torvalds
2024-03-08 18:38 [PATCH 0/6] tracing/ring-buffer: Fix wakeup of ring buffer waiters Steven Rostedt
2024-03-08 20:39 96% ` Linus Torvalds
2024-03-08 21:35 ` Steven Rostedt
2024-03-08 21:39 99% ` Linus Torvalds
2024-03-08 21:41 99% ` Linus Torvalds
2024-03-10 21:06 51% Linux 6.8 Linus Torvalds
2024-03-11 15:19 [GIT PULL] x86/sev for v6.9-rc1 Borislav Petkov
2024-03-12 0:50 99% ` Linus Torvalds
2024-03-11 15:57 [GIT PULL] EDAC updates for v6.9 Borislav Petkov
2024-03-12 1:12 99% ` Linus Torvalds
2024-03-12 2:24 ` Randy Dunlap
2024-03-12 2:25 99% ` Linus Torvalds
2024-03-11 19:30 [GIT PULL] AFFS update for 6.9 David Sterba
2024-03-12 20:02 54% ` Linus Torvalds
2024-03-12 4:25 [GIT PULL] Networking for v6.9 Jakub Kicinski
2024-03-12 20:17 99% ` Linus Torvalds
2024-03-12 20:34 ` Jakub Kicinski
2024-03-12 20:47 ` Jakub Kicinski
2024-03-12 21:11 95% ` Linus Torvalds
2024-03-13 1:00 99% ` Linus Torvalds
2024-03-12 9:55 [GIT PULL] slab updates for 6.9 Vlastimil Babka
2024-03-13 3:54 99% ` Linus Torvalds
2024-03-13 1:10 [GIT PULL] bcachefs " Kent Overstreet
2024-03-13 20:47 94% ` Linus Torvalds
2024-03-13 21:34 ` Kent Overstreet
2024-03-13 21:51 99% ` Linus Torvalds
2024-03-13 22:22 ` Kent Overstreet
2024-03-13 22:28 ` Kent Overstreet
2024-03-14 17:15 99% ` Linus Torvalds
2024-03-13 4:06 [git pull] drm for 6.9-rc1 Dave Airlie
2024-03-14 1:49 99% ` Linus Torvalds
2024-03-13 20:56 [RFC PATCH 0/2] Introduce serialized smp_call_function APIs Mathieu Desnoyers
2024-03-13 20:56 ` [RFC PATCH 1/2] smp: Implement " Mathieu Desnoyers
2024-03-13 21:19 99% ` Linus Torvalds
[not found] <65f2d9d4.050a0220.b240.7bddSMTPIN_ADDED_BROKEN@mx.google.com>
2024-03-14 18:36 97% ` [GIT PULL] platform-drivers-x86 for v6.9-1 Linus Torvalds
2024-03-14 18:43 [GIT PULL] dlm fixes for 6.9 David Teigland
2024-03-15 17:10 87% ` Linus Torvalds
2024-03-14 19:43 [GIT PULL] clk changes for the merge window Stephen Boyd
2024-03-15 18:54 99% ` Linus Torvalds
2024-03-14 20:31 [GIT PULL] lsm/lsm-pr-20240314 Paul Moore
2024-03-14 23:05 99% ` Linus Torvalds
2024-03-15 11:03 [GIT PULL]: Generic phy updates for v6.9 Vinod Koul
2024-03-15 19:22 99% ` Linus Torvalds
2024-03-16 18:05 ` Vinod Koul
2024-03-16 18:23 99% ` Linus Torvalds
2024-03-15 15:10 [GIT PULL] fs/9p patches for 6.9 merge window Eric Van Hensbergen
2024-03-15 17:17 99% ` Linus Torvalds
2024-03-15 16:29 [GIT PULL] tracing: Updates for v6.9 Steven Rostedt
2024-03-16 16:31 99% ` Linus Torvalds
2024-03-16 16:59 93% ` Linus Torvalds
2024-03-16 18:18 97% ` Linus Torvalds
2024-03-16 18:20 ` Steven Rostedt
2024-03-16 18:42 93% ` Linus Torvalds
2024-03-16 20:00 ` Borislav Petkov
2024-03-16 20:42 88% ` Linus Torvalds
2024-03-15 17:49 [GIT PULL] KVM changes for Linux 6.9 merge window Paolo Bonzini
2024-03-15 22:28 98% ` Linus Torvalds
2024-03-15 23:32 ` Oliver Upton
2024-03-15 23:49 ` Oliver Upton
2024-03-16 8:48 ` Paolo Bonzini
2024-03-16 16:01 99% ` Linus Torvalds
2024-03-16 0:24 ` [PATCH] Revert "KVM: arm64: Snapshot all non-zero RES0/RES1 sysreg fields for later checking" Oliver Upton
2024-03-16 0:51 99% ` Linus Torvalds
2024-03-18 12:19 [GIT PULL] vfs fixes Christian Brauner
2024-03-18 19:14 92% ` Linus Torvalds
2024-03-18 19:41 99% ` Linus Torvalds
2024-03-18 15:30 [GIT PULL v2] tracing: Updates for v6.9 Steven Rostedt
2024-03-19 16:23 96% ` Linus Torvalds
2024-03-19 17:06 ` Steven Rostedt
2024-03-19 17:13 ` Steven Rostedt
2024-03-19 21:03 ` Nathan Chancellor
2024-03-19 21:22 99% ` Linus Torvalds
2024-03-18 21:25 [GIT PULL v2] dlm fixes for 6.9 David Teigland
2024-03-18 22:44 99% ` Linus Torvalds
2024-03-19 7:41 [GIT PULL] virtio: features, fixes Michael S. Tsirkin
2024-03-19 18:03 98% ` Linus Torvalds
2024-03-19 14:12 [GIT PULL] more s390 updates for 6.9 merge window Heiko Carstens
2024-03-19 18:54 97% ` Linus Torvalds
2024-03-19 16:36 [PATCH v1 1/3] mm: kmsan: implement kmsan_memmove() Alexander Potapenko
2024-03-19 16:36 ` [PATCH v1 2/3] instrumented.h: add instrument_memcpy_before, instrument_memcpy_after Alexander Potapenko
2024-03-19 17:52 99% ` Linus Torvalds
2024-03-19 16:36 ` [PATCH v1 3/3] x86: call instrumentation hooks from copy_mc.c Alexander Potapenko
2024-03-19 17:58 99% ` Linus Torvalds
2024-03-20 10:18 [PATCH v2 1/3] mm: kmsan: implement kmsan_memmove() Alexander Potapenko
2024-03-20 16:04 99% ` Linus Torvalds
2024-03-20 15:22 [GIT PULL] tracing/tools: Updates for v6.9 Steven Rostedt
2024-03-20 23:40 99% ` Linus Torvalds
2024-03-21 4:09 [GIT PULL] Hyper-V commits for 6.9 Wei Liu
2024-03-21 17:06 99% ` Linus Torvalds
2024-03-22 23:25 ` Wei Liu
2024-03-22 23:42 99% ` Linus Torvalds
2024-03-21 12:55 [GIT PULL] remoteproc updates for v6.9 Bjorn Andersson
2024-03-21 18:08 ` Bjorn Andersson
2024-03-21 18:05 99% ` Linus Torvalds
2024-03-21 13:02 [GIT PULL] Char/Misc driver changes for 6.9-rc1 Greg KH
2024-03-21 13:48 ` Nathan Chancellor
2024-03-21 18:10 99% ` Linus Torvalds
2024-03-21 18:12 99% ` Linus Torvalds
2024-03-21 18:30 ` Nathan Chancellor
2024-03-21 20:28 99% ` Linus Torvalds
2024-03-27 16:56 97% ` Linus Torvalds
2024-03-27 20:26 99% ` Linus Torvalds
2024-03-22 16:52 [PATCH v4 00/16] x86-64: Stack protector and percpu improvements Brian Gerst
2024-03-23 11:39 ` Uros Bizjak
2024-03-23 13:22 ` Brian Gerst
2024-03-23 16:16 94% ` Linus Torvalds
2024-03-23 17:06 96% ` Linus Torvalds
2024-03-22 19:12 [GIT PULL] SCSI postmerge updates for the 6.8+ merge window James Bottomley
2024-03-22 19:55 99% ` Linus Torvalds
2024-03-22 20:24 ` James Bottomley
2024-03-22 20:34 99% ` Linus Torvalds
2024-03-22 23:38 [WIP 0/3] Memory model and atomic API in Rust Boqun Feng
2024-03-22 23:57 ` Kent Overstreet
2024-03-23 0:12 88% ` Linus Torvalds
2024-03-23 0:21 ` Kent Overstreet
2024-03-23 0:36 84% ` Linus Torvalds
2024-03-25 13:56 ` Philipp Stanner
2024-03-25 17:44 80% ` Linus Torvalds
2024-03-25 18:59 ` Kent Overstreet
2024-03-25 19:44 84% ` Linus Torvalds
2024-03-26 0:05 ` Dr. David Alan Gilbert
2024-03-26 3:49 76% ` Linus Torvalds
2024-03-27 16:16 ` comex
2024-03-27 18:50 ` Kent Overstreet
2024-03-27 19:07 89% ` Linus Torvalds
2024-03-27 19:41 ` Kent Overstreet
2024-03-27 20:45 88% ` Linus Torvalds
2024-03-27 21:41 ` Kent Overstreet
2024-03-27 22:57 94% ` Linus Torvalds
2024-04-08 16:02 ` Matthew Wilcox
2024-04-08 17:01 75% ` Linus Torvalds
2024-04-08 18:14 ` Al Viro
2024-04-08 20:05 85% ` Linus Torvalds
2024-03-24 21:56 72% Linux 6.9-rc1 Linus Torvalds
2024-03-25 14:09 [PATCH 1/2] locking/pvqspinlock: Use try_cmpxchg_acquire() in trylock_clear_pending() Uros Bizjak
2024-04-11 13:33 ` [tip: locking/core] " tip-bot2 for Uros Bizjak
2024-04-11 16:31 99% ` Linus Torvalds
2024-03-26 14:38 [GIT PULL] tpmdd changes for v6.9-rc2 Jarkko Sakkinen
2024-03-30 22:32 99% ` Linus Torvalds
2024-03-31 5:57 ` Jarkko Sakkinen
2024-03-31 17:01 99% ` Linus Torvalds
[not found] <CADyTPEwt=ZNams+1bpMB1F9w_vUdPsGCt92DBQxxq_VtaLoTdw@mail.gmail.com>
2023-01-20 3:15 ` PROBLEM: Only one CPU active on Ultra 60 since ~4.8 (regression) Nick Bowler
2023-01-21 13:31 ` Linux kernel regression tracking (Thorsten Leemhuis)
2024-03-22 4:57 ` Nick Bowler
2024-03-28 19:36 ` Linux regression tracking (Thorsten Leemhuis)
2024-03-28 20:09 95% ` Linus Torvalds
2024-03-31 18:24 [PATCH v2] HID: i2c-hid: Revert to await reset ACK before reading report descriptor Kenny Levinsen
2024-04-22 17:10 ` Linux regression tracking (Thorsten Leemhuis)
2024-04-23 14:59 ` Benjamin Tissoires
2024-04-24 16:56 ` regression fixes sitting in subsystem git trees for a week or longer (was: Re: [PATCH v2] HID: i2c-hid: Revert to await reset ACK before reading report descriptor) Thorsten Leemhuis
2024-04-24 18:53 99% ` Linus Torvalds
2024-03-31 22:05 43% Linux 6.9-rc2 Linus Torvalds
2024-04-02 2:14 [linus:master] [x86/bugs] 4535e1a417: WARNING:at_arch/x86/kernel/alternative.c:#apply_returns kernel test robot
2024-04-03 12:23 ` [PATCH] x86/retpoline: Fix a missing return thunk warning (was: Re: [linus:master] [x86/bugs] 4535e1a417: WARNING:at_arch/x86/kernel/alternative.c:#apply_returns) Borislav Petkov
2024-04-03 16:45 99% ` Linus Torvalds
2024-04-03 17:05 ` Borislav Petkov
2024-04-03 17:13 99% ` Linus Torvalds
2024-04-02 14:11 [GIT PULL] security changes for v6.9-rc3 Roberto Sassu
2024-04-02 19:39 92% ` Linus Torvalds
2024-04-02 19:57 96% ` Linus Torvalds
2024-04-02 21:00 ` Al Viro
2024-04-02 21:35 99% ` Linus Torvalds
2024-04-02 20:53 user-space concurrent pipe buffer scheduler interactions Michael Clark
2024-04-03 16:56 99% ` Linus Torvalds
2024-04-03 20:39 ` Michael Clark
2024-04-03 20:57 99% ` Linus Torvalds
2024-04-03 9:07 [RESEND][PATCH v3] security: Place security_path_post_mknod() where the original IMA call was Roberto Sassu
2024-04-03 16:59 99% ` Linus Torvalds
2024-04-04 22:53 81% More annoying code generation by clang Linus Torvalds
2024-04-06 10:56 ` Ingo Molnar
2024-04-06 12:30 ` Uros Bizjak
2024-04-06 15:39 99% ` Linus Torvalds
2024-04-06 16:04 87% ` Linus Torvalds
2024-04-08 8:49 ` Peter Zijlstra
2024-04-08 18:32 99% ` Linus Torvalds
2024-04-08 19:42 77% ` Linus Torvalds
2024-04-07 20:39 41% Linux 6.9-rc3 Linus Torvalds
2024-04-08 17:47 [PATCH RFC cmpxchg 0/8] Provide emulation for one- and two-byte cmpxchg() Paul E. McKenney
2024-04-08 17:49 ` [PATCH cmpxchg 08/14] parisc: add u16 support to cmpxchg() Paul E. McKenney
2024-04-08 20:10 99% ` Linus Torvalds
2024-04-10 13:24 [GIT PULL] turbostat 2024.04.10 Len Brown
2024-04-10 20:18 99% ` Linus Torvalds
2024-04-11 18:20 ` Len Brown
2024-04-11 19:14 99% ` Linus Torvalds
2024-04-10 16:38 [GIT PULL for v6.9-rc4] media fixes Mauro Carvalho Chehab
2024-04-10 20:53 98% ` Linus Torvalds
2024-04-11 0:10 76% [PATCH] vfs: relax linkat() AT_EMPTY_PATH - aka flink() - requirements Linus Torvalds
2024-04-11 0:20 99% ` Linus Torvalds
2024-04-11 2:39 96% ` Linus Torvalds
2024-04-11 9:04 ` Christian Brauner
2024-04-11 12:25 ` Christian Brauner
2024-04-11 16:21 99% ` Linus Torvalds
2024-04-11 16:15 93% ` Linus Torvalds
2024-04-11 16:44 ` Charles Mirabile
2024-04-11 17:29 ` Charles Mirabile
2024-04-11 17:35 ` Charles Mirabile
2024-04-11 18:13 99% ` Linus Torvalds
2024-04-11 19:34 89% ` Linus Torvalds
2024-04-12 7:45 ` Christian Brauner
2024-04-12 15:36 99% ` Linus Torvalds
2024-04-11 20:08 ` Charles Mirabile
2024-04-11 20:22 99% ` Linus Torvalds
2024-04-12 9:07 ` Christian Brauner
2024-04-12 17:43 95% ` Linus Torvalds
2024-04-13 9:41 ` Christian Brauner
2024-04-13 15:16 ` Christian Brauner
2024-04-13 17:07 99% ` Linus Torvalds
2024-04-12 14:32 [GIT PULL] tracing: Fixes for v6.9 Steven Rostedt
2024-04-12 16:07 99% ` Linus Torvalds
2024-04-12 16:15 ` Steven Rostedt
2024-04-12 16:20 ` Randy Dunlap
2024-04-12 16:21 99% ` Linus Torvalds
2024-04-12 16:20 99% ` Linus Torvalds
2024-04-12 18:10 [PATCH 0/3] x86/bugs: BHI fixes / improvements - round 2 Josh Poimboeuf
2024-04-12 18:10 ` [PATCH v2 1/3] x86/bugs: Only harden syscalls when needed Josh Poimboeuf
2024-04-15 7:32 ` Nikolay Borisov
2024-04-15 15:16 99% ` Linus Torvalds
2024-04-15 15:27 ` Nikolay Borisov
2024-04-15 15:47 98% ` Linus Torvalds
2024-04-14 20:48 43% Linux 6.9-rc4 Linus Torvalds
2024-04-15 16:35 [PATCH v10 0/5] Introduce mseal jeffxu
2024-04-15 16:35 ` [PATCH v10 1/5] mseal: Wire up mseal syscall jeffxu
2024-04-15 18:12 ` Muhammad Usama Anjum
2024-04-15 18:21 96% ` Linus Torvalds
2024-04-17 23:45 [GIT PULL] Btrfs fixes for 6.9-rc5 David Sterba
2024-04-18 0:14 99% ` Linus Torvalds
2024-04-20 11:12 [PATCH v2] tty: n_gsm: restrict tty devices to attach Tetsuo Handa
2024-04-20 17:34 97% ` Linus Torvalds
2024-04-20 18:02 99% ` Linus Torvalds
2024-04-20 18:05 99% ` Linus Torvalds
2024-04-21 13:28 ` Tetsuo Handa
2024-04-21 16:04 95% ` Linus Torvalds
2024-04-21 17:18 87% ` Linus Torvalds
2024-04-23 15:26 ` Tetsuo Handa
2024-04-23 16:37 99% ` Linus Torvalds
2024-04-21 19:53 47% Linux 6.9-rc5 Linus Torvalds
2024-04-23 16:33 89% [PATCH] tty: add the option to have a tty reject a new ldisc Linus Torvalds
2024-04-25 17:45 [GIT PULL] ACPI fixes for v6.9-rc6 Rafael J. Wysocki
2024-04-25 18:58 95% ` Linus Torvalds
2024-04-25 19:01 99% ` Linus Torvalds
2024-04-25 19:18 96% ` Linus Torvalds
2024-04-27 20:00 [syzbot] [bpf?] [trace?] possible deadlock in force_sig_info_to_task syzbot
2024-04-27 23:13 ` Hillf Danton
2024-04-28 20:01 91% ` Linus Torvalds
2024-04-28 20:22 96% ` Linus Torvalds
2024-04-28 23:23 ` Hillf Danton
2024-04-29 0:50 99% ` Linus Torvalds
2024-04-29 1:33 75% ` Linus Torvalds
2024-04-29 8:00 ` [PATCH] x86/mm: Remove broken vsyscall emulation code from the page fault code Ingo Molnar
2024-04-29 15:51 99% ` Linus Torvalds
2024-04-29 18:47 95% ` Linus Torvalds
2024-04-29 19:07 98% ` Linus Torvalds
2024-04-29 23:29 ` Andy Lutomirski
2024-04-30 0:05 99% ` Linus Torvalds
2024-04-30 6:16 51% ` [tip: x86/urgent] " tip-bot2 for Linus Torvalds
2024-05-01 7:50 50% ` tip-bot2 for Linus Torvalds
2024-04-28 8:24 [GIT PULL] scheduler fixes Ingo Molnar
2024-04-28 8:42 ` Ingo Molnar
2024-04-28 19:13 99% ` Linus Torvalds
2024-04-28 20:58 43% Linux 6.9-rc6 Linus Torvalds
2024-04-29 14:47 [PATCH] bounds: Use the right number of bits for power-of-two CONFIG_NR_CPUS Matthew Wilcox (Oracle)
2024-04-29 15:32 99% ` Linus Torvalds