linux-kernel.vger.kernel.org archive mirror
* Re: [PATCH] eventfs: Have inodes have unique inode numbers
  @ 2024-01-26 22:48 98%             ` Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-01-26 22:48 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Mathieu Desnoyers, Steven Rostedt, LKML, Linux Trace Devel,
	Masami Hiramatsu, Christian Brauner, Ajay Kaher,
	Geert Uytterhoeven, linux-fsdevel

On Fri, 26 Jan 2024 at 14:34, Matthew Wilcox <willy@infradead.org> wrote:
>
> On Fri, Jan 26, 2024 at 05:14:12PM -0500, Mathieu Desnoyers wrote:
> > I would suggest this straightforward solution to this:
> >
> > a) define a EVENTFS_MAX_INODES (e.g. 4096 * 8),
> >
> > b) keep track of inode allocation in a bitmap (within a single page),
> >
> > c) disallow allocating more than "EVENTFS_MAX_INODES" in eventfs.
>
> ... reinventing the IDA?

Guys, this is a random number that is *so* interesting that I
seriously think we shouldn't have it at all.

End result: nobody should care. Even the general VFS layer doesn't care.

It literally avoids inode number zero, not because it would be a bad
inode number, but simply because of some random historical oddity.

In fact, I don't think we even have a reason for it. We have a commit
2adc376c5519 ("vfs: avoid creation of inode number 0 in get_next_ino")
and that one calls out glibc for not deleting them. That makes no
sense to me, but whatever.

But note how the generic function does *not* try to make them unique,
for example. They are just "unique enough".

The generic function *does* care about being scalable in an SMP
environment. To a disturbing degree. Oh well.

                Linus

^ permalink raw reply	[relevance 98%]
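
A minimal userspace sketch, not taken from the thread, of the bitmap scheme proposed in steps (a) to (c) of the quoted suggestion. Only the `EVENTFS_MAX_INODES` name and value come from the message; `struct ino_bitmap`, `ino_alloc()` and `ino_free()` are hypothetical, bit 0 is reserved here only to mirror the "avoid inode number zero" remark, and the code is not thread-safe. It is meant to show how much machinery even the "straightforward" variant needs.

```c
#include <assert.h>
#include <limits.h>
#include <string.h>

/* Limit from the suggestion: one 4096-byte page worth of bits. */
#define EVENTFS_MAX_INODES (4096 * 8)
#define BITS_PER_WORD      (sizeof(unsigned long) * CHAR_BIT)
#define BITMAP_WORDS       (EVENTFS_MAX_INODES / BITS_PER_WORD)

/* Hypothetical allocator state: one bit per inode number. */
struct ino_bitmap {
	unsigned long words[BITMAP_WORDS];
};

static void ino_bitmap_init(struct ino_bitmap *bm)
{
	assert(bm != NULL);
	memset(bm, 0, sizeof(*bm));
	bm->words[0] |= 1UL;	/* reserve 0 so it is never handed out */
}

/* Return the lowest free inode number, or -1 once the limit is reached. */
static long ino_alloc(struct ino_bitmap *bm)
{
	assert(bm != NULL);
	for (size_t w = 0; w < BITMAP_WORDS; w++) {
		if (bm->words[w] == ULONG_MAX)
			continue;	/* every bit in this word is taken */
		for (size_t b = 0; b < BITS_PER_WORD; b++) {
			if (!(bm->words[w] & (1UL << b))) {
				bm->words[w] |= 1UL << b;
				return (long)(w * BITS_PER_WORD + b);
			}
		}
	}
	return -1;
}

/* Release a previously allocated number; out-of-range values are ignored. */
static void ino_free(struct ino_bitmap *bm, long ino)
{
	assert(bm != NULL);
	if (ino <= 0 || ino >= EVENTFS_MAX_INODES)
		return;
	bm->words[ino / BITS_PER_WORD] &= ~(1UL << (ino % BITS_PER_WORD));
}
```

Allocation, release, search and locking all have to be written and maintained, which is the complexity the reply above objects to for a number that nobody has been shown to need.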

* Re: [GIT PULL] Enable -Wstringop-overflow globally
  @ 2024-01-26 22:36 99%       ` Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-01-26 22:36 UTC (permalink / raw)
  To: Kees Cook
  Cc: Gustavo A. R. Silva, Gustavo A. R. Silva, linux-hardening, linux-kernel

On Fri, 26 Jan 2024 at 14:24, Kees Cook <keescook@chromium.org> wrote:
>
> I think xe has some other weird problems too. This may be related (under
> allocating):
>
> ../drivers/gpu/drm/xe/xe_vm.c: In function 'xe_vma_create':
> ../drivers/gpu/drm/xe/xe_vm.c:806:21: warning: allocation of insufficient size '224' for type 'struct xe_vma' with size '368' [-Walloc-size]
>   806 |                 vma = kzalloc(sizeof(*vma) - sizeof(struct xe_userptr),
>       |                     ^

That code is indeed odd, but there's a comment in the xe_vma definition

        /**
         * @userptr: user pointer state, only allocated for VMAs that are
         * user pointers
         */
        struct xe_userptr userptr;

although I agree that it should probably simply be made a final
variably-sized array instead (and then you make that array size be
0/1).

               Linus

^ permalink raw reply	[relevance 99%]
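
A small sketch, not taken from the thread, of the "final variably-sized array with size 0/1" idea. The structures are hypothetical stand-ins (the real `struct xe_vma` and `struct xe_userptr` are much larger), and `vma_alloc_size()` and `vma_create()` are invented names. The point is that the allocation size is built up from the base structure rather than computed by subtracting a member from `sizeof(*vma)`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the user pointer state. */
struct userptr_state {
	unsigned long notifier_seq;
	int initial_bind;
};

/* Hypothetical stand-in for the VMA. */
struct vma {
	unsigned long start;
	unsigned long end;
	/* Final flexible array member: holds 0 or 1 elements. */
	struct userptr_state userptr[];
};

/* Size of a VMA with (1) or without (0) the trailing user pointer state. */
static size_t vma_alloc_size(int is_userptr)
{
	assert(is_userptr == 0 || is_userptr == 1);
	return sizeof(struct vma) +
	       (size_t)is_userptr * sizeof(struct userptr_state);
}

/* Allocate a zeroed VMA; the caller frees it with free(). */
static struct vma *vma_create(int is_userptr)
{
	return calloc(1, vma_alloc_size(is_userptr));
}
```

With this layout the compiler sees an allocation that is never smaller than the declared type. In kernel code the size would more likely be computed with the `struct_size()` helper than by hand.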

* Re: [PATCH] eventfs: Have inodes have unique inode numbers
  @ 2024-01-26 22:29 99%           ` Linus Torvalds
    1 sibling, 0 replies; 200+ results
From: Linus Torvalds @ 2024-01-26 22:29 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Steven Rostedt, LKML, Linux Trace Devel, Masami Hiramatsu,
	Christian Brauner, Ajay Kaher, Geert Uytterhoeven, linux-fsdevel

On Fri, 26 Jan 2024 at 14:14, Mathieu Desnoyers
<mathieu.desnoyers@efficios.com> wrote:
>
> I do however have a concern with the approach of using the same
> inode number for various files on the same filesystem: AFAIU it
> breaks userspace ABI expectations.

Virtual filesystems have always done that in various ways.

Look at the whole discussion about the size of the file. Then look at /proc.

And honestly, eventfs needs to be simplified. It's a mess. It's less
of a mess than it used to be, but people should *NOT* think that it's
a real filesystem.

Don't use some POSIX standard as an expectation for things like /proc,
/sys or tracefs.

              Linus

^ permalink raw reply	[relevance 99%]

* Re: [PATCH] eventfs: Have inodes have unique inode numbers
  @ 2024-01-26 22:26 99%           ` Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-01-26 22:26 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: LKML, Linux Trace Devel, Masami Hiramatsu, Mathieu Desnoyers,
	Christian Brauner, Ajay Kaher, Geert Uytterhoeven, linux-fsdevel

On Fri, 26 Jan 2024 at 14:09, Steven Rostedt <rostedt@goodmis.org> wrote:
>
> I'm not at my computer, but when I tried deleting that, it caused issues with the lookup code.

The VFS layer should be serializing all lookups of the same name. If
it didn't, we'd have serious issues on other filesystems.

So you should never get more than one concurrent lookup of one
particular entry, and as long as the dentry exists, you should then
not get a new one. It's one of the things that the VFS layer does to
make things simple for the filesystem.

But it's worth noting that that is about *one* entry. You can get
concurrent lookups in the same directory for different names.

Another thing that worries me is that odd locking that releases the
lock in the middle. I don't understand why you release the
tracefs_mutex() over create_file(), for example. There's a lot of
"take, drop, re-take, re-drop" of that mutex that seems strange.

           Linus

^ permalink raw reply	[relevance 99%]

* Re: [GIT PULL] Btrfs fixes for 6.8-rc2
  @ 2024-01-26 22:02 99%           ` Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-01-26 22:02 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: dsterba, David Sterba, linux-btrfs, linux-kernel

On Fri, 26 Jan 2024 at 13:56, Qu Wenruo <wqu@suse.com> wrote:
>
> On 2024/1/27 08:21, Linus Torvalds wrote:
> >
> > Allocation lifetime problems?
>
> Could be, thus it may be better to output the flags of the first page
> for tree-checker.

Note that the fact that it magically went away certainly implies that
it never "really" existed, and that something was using a pointer or
similar.

IOW, this is not some IO that got scribbled over, or a cache that got
corrupted. If it had been real corruption, I would have expected that
it would have stayed around in memory.

                 Linus

^ permalink raw reply	[relevance 99%]

* Re: [GIT PULL] Btrfs fixes for 6.8-rc2
  @ 2024-01-26 21:51 99%       ` Linus Torvalds
    0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-01-26 21:51 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: dsterba, David Sterba, linux-btrfs, linux-kernel

On Fri, 26 Jan 2024 at 13:39, Qu Wenruo <wqu@suse.com> wrote:
>
> Oh, I forgot the most obvious problem.
>
> This means the extent buffer is full of garbage.

Allocation lifetime problems?

> What's the page size of the system? 4K or 16K or 64K?

This is a bog-standard x86-64 system. With 32 cores (and 64 threads),
but there's nothing remotely odd about it, except for the fact that
it's running a very recent kernel...

             Linus

^ permalink raw reply	[relevance 99%]

* Re: [PATCH] eventfs: Have inodes have unique inode numbers
  2024-01-26 21:36 98%     ` Linus Torvalds
@ 2024-01-26 21:49 99%       ` Linus Torvalds
      0 siblings, 2 replies; 200+ results
From: Linus Torvalds @ 2024-01-26 21:49 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: LKML, Linux Trace Devel, Masami Hiramatsu, Mathieu Desnoyers,
	Christian Brauner, Ajay Kaher, Geert Uytterhoeven, linux-fsdevel

On Fri, 26 Jan 2024 at 13:36, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> If you have more than 4 billion inodes, something is really really wrong.

Btw, once again, the vfs layer function you took this from *does* have
some reason to worry. Somebody might be doing 'pipe()' in a loop.

Also, if your worry is "what if somebody mounts that thing a million
times", the solution to *that* would have been to make it a per-sb
counter, which I think would be cleaner anyway.

But my real issue is that I think you would be *much* better off just
deleting code, instead of adding new code.

For example, what purpose does 'e->dentry' and 'ei->d_children[]' have?
Isn't that entirely a left-over from the bad old days?

So please try to look at things to *fix* and simplify, not at things
to mess around with and make more complicated.

              Linus

^ permalink raw reply	[relevance 99%]

* Re: [PATCH] eventfs: Have inodes have unique inode numbers
    2024-01-26 21:31 99%     ` Linus Torvalds
@ 2024-01-26 21:36 98%     ` Linus Torvalds
  2024-01-26 21:49 99%       ` Linus Torvalds
  1 sibling, 1 reply; 200+ results
From: Linus Torvalds @ 2024-01-26 21:36 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: LKML, Linux Trace Devel, Masami Hiramatsu, Mathieu Desnoyers,
	Christian Brauner, Ajay Kaher, Geert Uytterhoeven, linux-fsdevel

On Fri, 26 Jan 2024 at 13:26, Steven Rostedt <rostedt@goodmis.org> wrote:
>
> I'd be happy to change that patch to what I originally did before deciding
> to copy get_next_ino():
>
> unsigned int tracefs_get_next_ino(int files)
> {
>         static atomic_t next_inode;
>         unsigned int res;
>
>         do {
>                 res = atomic_add_return(files + 1, &next_inode);
>
>                 /* Check for overflow */
>         } while (unlikely(res < files + 1));
>
>         return res - files;

Still entirely pointless.

If you have more than 4 billion inodes, something is really really wrong.

So why is it ten lines instead of one?

Dammit, this is a number that NOBODY HAS SHOWN IS EVEN WORTH EXISTING
IN THE FIRST PLACE.

So no. I'm not taking this. End of discussion. My point stands: I want
this filesystem *stabilized*, and in a sane format.

Look to *simplify* things. Send me patches that *remove* complexity,
not add new complexity that you have zero evidence is worth it.

Face it, eventfs isn't some kind of "real filesystem". It shouldn't
even attempt to look like one.

If somebody goes "I want to tar this thing up", you should laugh in
their face and call them names, not say "sure, let me whip up a
50-line patch to make this fragile thing even more complex".

            Linus

^ permalink raw reply	[relevance 98%]
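
To illustrate the "ten lines instead of one" remark, here is a userspace sketch using C11 atomics instead of the kernel's `atomic_t`. It is not from the thread: `reserve_inos()` and `next_ino` are hypothetical names, and the sketch assumes that wrap-around after roughly four billion allocations is acceptable, which is the position taken in the message above.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>

/* Hypothetical global counter; zero-initialized as a static object. */
static atomic_uint next_ino;

/*
 * Reserve `files + 1` consecutive numbers (one for the directory plus one
 * per file) and return the first of them. There is no overflow loop:
 * the counter is simply allowed to wrap.
 */
static unsigned int reserve_inos(unsigned int files)
{
	assert(files < UINT_MAX);	/* files + 1 must not wrap to 0 */
	return atomic_fetch_add(&next_ino, files + 1) + 1;
}
```

For the first caller with `files == 3`, the counter moves from 0 to 4 and the function returns 1, so that caller owns numbers 1 through 4. This matches what the quoted loop computes as `res - files`, without the retry logic.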

* Re: [PATCH] eventfs: Have inodes have unique inode numbers
  @ 2024-01-26 21:31 99%     ` Linus Torvalds
  2024-01-26 21:36 98%     ` Linus Torvalds
  1 sibling, 0 replies; 200+ results
From: Linus Torvalds @ 2024-01-26 21:31 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: LKML, Linux Trace Devel, Masami Hiramatsu, Mathieu Desnoyers,
	Christian Brauner, Ajay Kaher, Geert Uytterhoeven, linux-fsdevel

On Fri, 26 Jan 2024 at 13:26, Steven Rostedt <rostedt@goodmis.org> wrote:
>
> So we keep the same inode number until something breaks with it, even
> though, using unique ones is not that complicated?

Using unique ones for directories was a trivial cleanup.

The file case is clearly different. I thought it would be the same
trivial one-liner, but nope.

When you have to add 30 lines of code just to get unique inode numbers
that nobody has shown any interest in, it's 30 lines too much.

And when it happens in a filesystem that has a history of copying code
from the VFS layer and having nasty bugs, it's *definitely* too much.

Simplify. If you can clean things up and we have a few releases of
not-horrendous-bugs every other day, I may change my mind.

As it is, I feel like I have to waste my time checking all your
patches, and I'm saying "it's not worth it".

               Linus

^ permalink raw reply	[relevance 99%]

* Re: [GIT PULL] Enable -Wstringop-overflow globally
  @ 2024-01-26 21:22 99% ` Linus Torvalds
    0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-01-26 21:22 UTC (permalink / raw)
  To: Gustavo A. R. Silva; +Cc: Kees Cook, linux-hardening, linux-kernel

On Mon, 22 Jan 2024 at 07:29, Gustavo A. R. Silva <gustavoars@kernel.org> wrote:
>
> Enable -Wstringop-overflow globally

I suspect I'll have to revert this.

On arm64, I get a "writing 16 bytes into a region of size 0" in the Xe driver

   drivers/gpu/drm/xe/xe_gt_pagefault.c:340

but I haven't looked into it much yet.

It's not some gcc-11 issue, though, this is with gcc version 13.2.1

It looks like the kernel test robot reported this too (for s390), at

    https://lore.kernel.org/all/202401161031.hjGJHMiJ-lkp@intel.com/T/

and in that case it was gcc-13.2.0.

So I don't think the issue is about gcc-11 at all, but about other
random details.

             Linus

^ permalink raw reply	[relevance 99%]

* Re: [PATCH] eventfs: Have inodes have unique inode numbers
  @ 2024-01-26 20:25 94% ` Linus Torvalds
    0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-01-26 20:25 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: LKML, Linux Trace Devel, Masami Hiramatsu, Mathieu Desnoyers,
	Christian Brauner, Ajay Kaher, Geert Uytterhoeven, linux-fsdevel

Steven,
 stop making things more complicated than they need to be.

And dammit, STOP COPYING VFS LAYER FUNCTIONS.

It was a bad idea last time, it's a horribly bad idea this time too.

I'm not taking this kind of crap.

The whole "get_next_ino()" should be "atomic64_add_return()". End of story.

You aren't special. If the VFS functions don't work for you, you don't
use them, but dammit, you also don't then steal them without
understanding what they do, and why they were necessary.

The reason get_next_ino() is critical is because it's used by things
like pipes and sockets etc that get created at high rates, and the
inode numbers most definitely do not get cached.

You copied that function without understanding why it does what it
does, and as a result your code IS GARBAGE.

AGAIN.

Honestly, kill this thing with fire. It was a bad idea. I'm putting my
foot down, and you are *NOT* doing unique regular file inode numbers
until somebody points to a real problem.

Because this whole "I make up problems, and then I write overly
complicated crap code to solve them" has to stop.

No more. This stops here.

I don't want to see a single eventfs patch that doesn't have a real
bug report associated with it. And the next time I see you copying VFS
functions (or any other core functions) without understanding what the
f*ck they do, and why they do it, I'm going to put you in my
spam-filter for a week.

I'm done. I'm really *really* tired of having to look at eventfs garbage.

              Linus

^ permalink raw reply	[relevance 94%]

* Re: [GIT PULL] Btrfs fixes for 6.8-rc2
    2024-01-22 22:34 99% ` Linus Torvalds
@ 2024-01-26 19:25 98% ` Linus Torvalds
    1 sibling, 1 reply; 200+ results
From: Linus Torvalds @ 2024-01-26 19:25 UTC (permalink / raw)
  To: David Sterba, Qu Wenruo; +Cc: linux-btrfs, linux-kernel

On Mon, 22 Jan 2024 at 10:34, David Sterba <dsterba@suse.com> wrote:
>
>   git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux.git tags/for-6.8-rc1-tag

I have no idea if this is related to the new fixes, but I have never
seen it before:

  BTRFS critical (device dm-0): corrupted node, root=256
block=8550954455682405139 owner mismatch, have 11858205567642294356
expect [256, 18446744073709551360]
  BTRFS critical (device dm-0): corrupted node, root=256
block=8550954455682405139 owner mismatch, have 11858205567642294356
expect [256, 18446744073709551360]
  BTRFS critical (device dm-0): corrupted node, root=256
block=8550954455682405139 owner mismatch, have 11858205567642294356
expect [256, 18446744073709551360]
  SELinux: inode_doinit_use_xattr:  getxattr returned 117 for dev=dm-0
ino=5737268
  SELinux: inode_doinit_use_xattr:  getxattr returned 117 for dev=dm-0
ino=5737267

and it caused an actual warning to be printed for my kernel tree from 'git':

   error: failed to stat 'sound/pci/ice1712/se.c': Structure needs cleaning

(and yes, 117 is EUCLEAN, aka "Structure needs cleaning")

The problem seems to have self-corrected, because it didn't happen
when repeating the command, and that file that failed to stat looks
perfectly fine.

But it is clearly worrisome.

The "owner mismatch" check isn't new - it was added back in 5.19 in
commit 88c602ab4460 ("btrfs: tree-checker: check extent buffer owner
against owner rootid"). So something else must have changed to trigger
it.

           Linus

^ permalink raw reply	[relevance 98%]
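
A short sketch, not taken from the thread, for decoding two of the numbers in the log above on a Linux system. `EUCLEAN` is a Linux-specific errno constant, and the helper names are hypothetical. The upper bound printed in "expect [256, 18446744073709551360]" is 2^64 - 256, which is what `(uint64_t)-256` evaluates to.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Text for an errno value such as the 117 returned by getxattr. */
static const char *describe_errno(int err)
{
	return strerror(err);
}

/* Upper bound of the expected owner range, written as a negative offset. */
static uint64_t owner_upper_bound(void)
{
	return (uint64_t)-256;
}

/* Check the two values against what the log printed. */
static void check_log_values(void)
{
	assert(EUCLEAN == 117);
	assert(owner_upper_bound() == 18446744073709551360ULL);
}
```

On glibc, `describe_errno(EUCLEAN)` should give the same "Structure needs cleaning" text that git reported. Other C libraries may word it differently.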

* Re: [for-linus][PATCH 1/3] eventfs: Have the inodes all for files and directories all be the same
  @ 2024-01-26 19:09 99%                               ` Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-01-26 19:09 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Christian Brauner, Geert Uytterhoeven, Kees Cook, linux-kernel,
	Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers, Andrew Morton,
	Al Viro, Ajay Kaher

On Fri, 26 Jan 2024 at 08:26, Steven Rostedt <rostedt@goodmis.org> wrote:
>
> On Fri, 26 Jan 2024 11:11:39 +0100
> Christian Brauner <brauner@kernel.org> wrote:
>
> > The size would be one thing. The other is that tar requires unique inode
> > numbers for all files iirc (That's why we have this whole btrfs problem
> > - let's not get into this here -  where inode numbers aren't unique and
> > are duplicated per subvolume.).
>
> Well, I guess that answers Linus's question about wondering if there's any
> user space program that actually cares what the inodes are for files. The
> answer is "yes" and the program is "tar".

Well, the fact that it hits snapshots, shows that the real problem is
just "tar does stupid things that it shouldn't do".

Yes, inode numbers used to be special, and there's history behind it.
But we should basically try very hard to walk away from that broken
history.

An inode number just isn't a unique descriptor any more. We're not
living in the 1970s, and filesystems have changed.

              Linus

^ permalink raw reply	[relevance 99%]
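
For context, a sketch (not from the thread) of the check that archiving tools are generally understood to rely on: treating the pair (`st_dev`, `st_ino`) from `stat()` as the identity of a file. Whether tar does exactly this is an assumption here, and `same_file()` is a hypothetical helper name.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/stat.h>

/*
 * Return 1 if both paths appear to name the same file, 0 if they do not,
 * and -1 if either path cannot be examined.
 */
static int same_file(const char *a, const char *b)
{
	struct stat sa, sb;

	assert(a != NULL && b != NULL);
	if (stat(a, &sa) != 0 || stat(b, &sb) != 0)
		return -1;
	return sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
}
```

On a filesystem that hands the same inode number to several files, this comparison reports distinct files as identical, which is the kind of confusion discussed in the quoted exchange. The reply above argues that the inode number should not be trusted as a unique descriptor in the first place.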

* Re: [PATCH] eventfs: Give files a default of PAGE_SIZE size
  @ 2024-01-26 19:06 99%     ` Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-01-26 19:06 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: LKML, Linux Trace Kernel, Masami Hiramatsu, Mathieu Desnoyers,
	Christian Brauner, Ajay Kaher, Geert Uytterhoeven, linux-fsdevel

On Fri, 26 Jan 2024 at 10:41, Steven Rostedt <rostedt@goodmis.org> wrote:
>
> Fine, but I still plan on sending you the update to give all files unique
> inode numbers. If it screws up tar, it could possibly screw up something
> else.

Well, that in many ways just regularizes the code, and the dynamic
inode numbers are actually prettier than the odd fixed date-based one
you picked. I assume it's your birthdate (although I don't know what
the directory ino number was).

               Linus

^ permalink raw reply	[relevance 99%]

* Re: [PATCH] eventfs: Give files a default of PAGE_SIZE size
  @ 2024-01-26 18:31 99% ` Linus Torvalds
    0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-01-26 18:31 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: LKML, Linux Trace Kernel, Masami Hiramatsu, Mathieu Desnoyers,
	Christian Brauner, Ajay Kaher, Geert Uytterhoeven, linux-fsdevel

On Fri, 26 Jan 2024 at 10:18, Steven Rostedt <rostedt@goodmis.org> wrote:
>
> By following what sysfs does, and give files a default size of PAGE_SIZE,
> it allows the tar to work. No event file is greater than PAGE_SIZE.

No, please. Just don't.

Nobody has asked for this, and nobody sane should use 'tar' on tracefs anyway.

It hasn't worked before, so saying "now you can use tar" is just a
*bad* idea. There is no upside, only downsides, with tar either (a)
not working at all on older kernels or (b) complaining about how the
size isn't reliable on newer ones.

So please. You should *NOT* look at "this makes tar work, albeit badly".

You should look at whether it improves REAL LOADS. And it doesn't. All
it does is add a hack for a bad idea. Leave it alone.

                   Linus

^ permalink raw reply	[relevance 99%]

* [tip: x86/mm] x86/mm: Get rid of conditional IF flag handling in page fault path
  2024-01-25 17:34 89% [PATCH] x86: mm: get rid of conditional IF flag handling in page fault path Linus Torvalds
@ 2024-01-26  9:39 60% ` tip-bot2 for Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: tip-bot2 for Linus Torvalds @ 2024-01-26  9:39 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Linus Torvalds, Ingo Molnar, Thomas Gleixner, Andy Lutomirski,
	Brian Gerst, Denys Vlasenko, H. Peter Anvin, Josh Poimboeuf,
	Uros Bizjak, Sean Christopherson, x86, linux-kernel

The following commit has been merged into the x86/mm branch of tip:

Commit-ID:     8f588afe6256c50b3d1f8a671828fc4aab421c05
Gitweb:        https://git.kernel.org/tip/8f588afe6256c50b3d1f8a671828fc4aab421c05
Author:        Linus Torvalds <torvalds@linux-foundation.org>
AuthorDate:    Thu, 25 Jan 2024 09:34:57 -08:00
Committer:     Ingo Molnar <mingo@kernel.org>
CommitterDate: Fri, 26 Jan 2024 10:27:54 +01:00

x86/mm: Get rid of conditional IF flag handling in page fault path

We had this nonsensical code that would happily handle kernel page
faults with interrupts disabled, which makes no sense at all.

It turns out that this is legacy code that _used_ to make sense, back
when we enabled IRQs as early as possible, and we used to have this code
sequence essentially immediately after reading the faulting address from
the %cr2 register.

Back then, we could have kernel page faults to populate the vmalloc area
with interrupts disabled, and they would need to stay disabled for that
case.

However, the code in question has been moved down in the page fault
handling, and is now in the "handle faults in user addresses" section,
and apparently nobody ever noticed that it no longer makes sense to
handle these page faults with interrupts conditionally disabled.

So replace the conditional IRQ enable:

        if (regs->flags & X86_EFLAGS_IF)
                local_irq_enable();

with an unconditional one, and add a temporary WARN_ON_ONCE() if some
codepath actually does do page faults with interrupts disabled (without
also doing a pagefault_disable(), of course).

NOTE! We used to allow user space to disable interrupts with iopl(3).
That is no longer true since commits:

 a24ca9976843 ("x86/iopl: Remove legacy IOPL option")
 b968e84b509d ("x86/iopl: Fake iopl(3) CLI/STI usage")

so the WARN_ON_ONCE() is valid for both the kernel and user situation.

For some of the history relevant to this code, see particularly commit
8c914cb704a1 ("x86_64: actively synchronize vmalloc area when
registering certain callbacks"), which moved this below the vmalloc fault
handling.

Now that the user_mode() check is irrelevant, we can also move the
FAULT_FLAG_USER flag setting down to where the other flag settings are
done.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Uros Bizjak <ubizjak@gmail.com>
Cc: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20240125173457.1281880-1-torvalds@linux-foundation.org
---
 arch/x86/mm/fault.c | 27 ++++++++++++++-------------
 1 file changed, 14 insertions(+), 13 deletions(-)

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 679b09c..150e002 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -1302,21 +1302,14 @@ void do_user_addr_fault(struct pt_regs *regs,
 		return;
 	}
 
-	/*
-	 * It's safe to allow irq's after cr2 has been saved and the
-	 * vmalloc fault has been handled.
-	 *
-	 * User-mode registers count as a user access even for any
-	 * potential system fault or CPU buglet:
-	 */
-	if (user_mode(regs)) {
-		local_irq_enable();
-		flags |= FAULT_FLAG_USER;
-	} else {
-		if (regs->flags & X86_EFLAGS_IF)
-			local_irq_enable();
+	/* Legacy check - remove this after verifying that it doesn't trigger */
+	if (WARN_ON_ONCE(!(regs->flags & X86_EFLAGS_IF))) {
+		bad_area_nosemaphore(regs, error_code, address);
+		return;
 	}
 
+	local_irq_enable();
+
 	perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS, 1, regs, address);
 
 	/*
@@ -1332,6 +1325,14 @@ void do_user_addr_fault(struct pt_regs *regs,
 	if (error_code & X86_PF_INSTR)
 		flags |= FAULT_FLAG_INSTRUCTION;
 
+	/*
+	 * We set FAULT_FLAG_USER based on the register state, not
+	 * based on X86_PF_USER. User space accesses that cause
+	 * system page faults are still user accesses.
+	 */
+	if (user_mode(regs))
+		flags |= FAULT_FLAG_USER;
+
 #ifdef CONFIG_X86_64
 	/*
 	 * Faults in the vsyscall page might need emulation.  The

^ permalink raw reply related	[relevance 60%]
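
A userspace model, not kernel code, of the control flow the patch above leaves in do_user_addr_fault(). `X86_EFLAGS_IF` (bit 9) and `X86_PF_INSTR` (bit 4) use their real bit positions. Everything prefixed `MODEL_` or `model_` is hypothetical, the real FAULT_FLAG_* values differ, and the other flag handling in the function is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define X86_EFLAGS_IF	(1UL << 9)	/* interrupt-enable flag in EFLAGS */
#define X86_PF_INSTR	(1UL << 4)	/* fault was an instruction fetch */

/* Hypothetical values for this model only. */
#define MODEL_FLAG_INSTRUCTION	(1U << 0)
#define MODEL_FLAG_USER		(1U << 1)

struct model_regs {
	unsigned long flags;	/* saved EFLAGS */
	bool user_mode;		/* stands in for user_mode(regs) */
};

/*
 * Return false where the patched code would WARN once and bail out via
 * bad_area_nosemaphore(); otherwise compute the fault flags.
 */
static bool model_user_addr_fault(const struct model_regs *regs,
				  unsigned long error_code,
				  unsigned int *fault_flags)
{
	assert(regs != NULL && fault_flags != NULL);

	if (!(regs->flags & X86_EFLAGS_IF))
		return false;

	/* The real code calls local_irq_enable() here, unconditionally. */

	*fault_flags = 0;
	if (error_code & X86_PF_INSTR)
		*fault_flags |= MODEL_FLAG_INSTRUCTION;
	if (regs->user_mode)
		*fault_flags |= MODEL_FLAG_USER;
	return true;
}
```

The model makes the two changes visible: interrupt enabling no longer depends on user_mode(), and the user flag is set alongside the other flag settings.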

* Re: [PATCH] softirq: fix memory corruption when freeing tasklet_struct
  @ 2024-01-25 19:51 84% ` Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-01-25 19:51 UTC (permalink / raw)
  To: Mikulas Patocka, Tejun Heo
  Cc: Thomas Gleixner, linux-kernel, dm-devel, Mike Snitzer,
	Ignat Korchagin, Damien Le Moal, Bob Liu, Hou Tao,
	Nathan Huckleberry, Peter Zijlstra, Ingo Molnar

On Thu, 25 Jan 2024 at 10:30, Mikulas Patocka <mpatocka@redhat.com> wrote:
>
> There's a problem with the tasklet API - there is no reliable way how to
> free a structure that contains tasklet_struct. The problem is that the
> function tasklet_action_common calls tasklet_unlock(t) after it called the
> callback. If the callback does something that frees tasklet_struct,
> tasklet_unlock(t) would write into free memory.

Ugh.

I see what you're doing, but I have to say, I dislike this patch
immensely. It feels like a serious misdesign that is then papered over
with a hack.

I'd much rather see us trying to move away from tasklets entirely in
cases like this. Just say "you cannot do that".

In fact, of the two cases that want this new functionality, at least
dm-verity already makes tasklets a conditional feature that isn't even
enabled by default, and that was only introduced in the last couple of
years.

So I think dm-verity would be better off just removing tasklet use,
and we should check whether there are better models for handling the
latency issue.

The dm-crypt.c case looks different, but similar. I'm not sure why it
doesn't just use the workqueue for the "in interrupt" case. Like
dm-verity, it already does have a workqueue option, and it's a
setup-time option to say "don't use the workqueue for reads / writes".
But it feels like the code should just say "tough luck, in interrupt
context we *will* use workqueues".

So honestly, both of the cases you bring up seem to be just BUGGY. The
fix is not to extend tasklets to a new thing, the fix is to say "those
two uses of tasklets were broken, and should go away".

End result: I would suggest:

 - just get rid of the actively buggy use of tasklets. It's not
necessary in either case.

 - look at introducing a "low-latency atomic workqueue" that looks
*exactly* like a regular workqueue, but has the rule that it's per-cpu
and functions on it cannot sleep

because I think one common issue with workqueues - which are better
designed than tasklets - is that scheduling latency.

I think if we introduced a workqueue that worked more like a tasklet -
in that it's run in softirq context - but doesn't have the interface
mistakes of tasklets, a number of existing workqueue users might
decide that that is exactly what they want.

So we could have a per-cpu 'atomic_wq' that things can be scheduled
on, and that runs from softirqs just like tasklets, and shares the
workqueue queueing infrastructure but doesn't use the workqueue
threads.

Yes, the traditional use of workqueues is to be able to sleep and do
things in process context, so that sounds a bit odd, but let's face
it, we

 (a) already have multiple classes of workqueues

 (b) avoiding deep - and possibly recursive - stack depths is another
reason people use workqueues

 (c) avoiding interrupt context is a real concern, even if you don't
want to sleep

and I really *really* would like to get rid of tasklets entirely.

They started as this very specific hardcoded softirq thing used by
some drivers, and then the notion was generalized.

And I think it was generalized badly, as shown by this example.

I have added Tejun to the cc, so that he can throw his hands up in
horror and say "Linus, you're crazy, your drug-fueled idea would be
horrid because of Xyz".

But *maybe* Tejun has been taking the same drugs I have, and goes
"yeah, that would fit well".

Tejun? Please tell me I'm not on some bad crack..

               Linus

^ permalink raw reply	[relevance 84%]
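
A userspace sketch, not from the thread, of the property that makes it safe for a callback to free the object embedding its work item: the runner finishes all bookkeeping before invoking the callback and never touches the item afterwards. All names here (`struct work_item`, `run_work()`, `struct request`, `complete_request()`) are hypothetical, and this is a model of the idea rather than of the kernel's workqueue code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

struct work_item {
	void (*fn)(struct work_item *);
	bool pending;
};

/* Hypothetical runner: bookkeeping first, callback last. */
static void run_work(struct work_item *w)
{
	void (*fn)(struct work_item *);

	assert(w != NULL && w->fn != NULL);
	fn = w->fn;
	w->pending = false;	/* last access to *w by the runner */
	fn(w);			/* the callback may free the object holding w */
	/* w must not be dereferenced here: it may already be freed. */
}

/* An object that embeds its work item, as drivers typically do. */
struct request {
	int id;
	struct work_item work;
};

static int completed_requests;

/* Callback that frees its own container. */
static void complete_request(struct work_item *w)
{
	struct request *req =
		(struct request *)((char *)w - offsetof(struct request, work));

	completed_requests++;
	free(req);
}
```

The problem described in the quoted report is the opposite ordering, where the runner writes to the item after the callback has returned.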

* Re: [GIT PULL] final round of SCSI updates for the 6.7+ merge window
  @ 2024-01-25 17:56 96%             ` Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-01-25 17:56 UTC (permalink / raw)
  To: Theodore Ts'o
  Cc: Alexander Gordeev, G, James Bottomley, Andrew Morton, linux-scsi,
	linux-kernel

On Tue, 23 Jan 2024 at 21:36, Theodore Ts'o <tytso@mit.edu> wrote:
>
> If we told those people who want to pursue key rotation to just
> always upload keys to the Kernel keyring [..]

As long as the keys exist in the kernel.org keyring, it's all good.

That said, I still claim that nobody has *ever* had a valid and
meaningful reason to have expiry dates, so I want to stop you right
there when you talk about "people who want to pursue key rotation".

The absolute *first* thing you should tell those people is "Why? Don't
bother, it's just added pain for no gain".

It's like revocation keys. To a very close approximation, never in the
history of the universe have they been useful and meaningful.

The fact that the keyservers don't even work any more has made them
even less so, since now the revocations will never really spread
anyway.

So no. Let's not encourage people to do this silly thing.

If you ABSOLUTELY HAVE TO have expiration dates and other silly games,
yes, I will complain if I can't then easily get your key from the
single reliably working remaining setup.

But if you cannot explain exactly why you absolutely need to do it and
have some external entity that forces you to do silly things ("Your
daughter has been kidnapped, and you're not Liam Neeson"), the answer
should not be "remember to update the key at kernel.org", but simply a
plain "DON'T".

               Linus

^ permalink raw reply	[relevance 96%]

* [PATCH] x86: mm: get rid of conditional IF flag handling in page fault path
@ 2024-01-25 17:34 89% Linus Torvalds
  2024-01-26  9:39 60% ` [tip: x86/mm] x86/mm: Get " tip-bot2 for Linus Torvalds
  0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-01-25 17:34 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar
  Cc: arch/x86 maintainers, linux-kernel, Linus Torvalds

We had this nonsensical code that would happily handle kernel page
faults with interrupts disabled, which makes no sense at all.

It turns out that this is legacy code that _used_ to make sense, back
when we enabled irqs as early as possible, and we used to have this code
sequence essentially immediately after reading the faulting address from
register %cr2.

Back then, we could have kernel page faults to populate the vmalloc area
with interrupts disabled, and they would need to stay disabled for that
case.

However, the code in question has been moved down in the page fault
handling, and is now in the "handle faults in user addresses" section,
and apparently nobody ever noticed that it no longer makes sense to
handle these page faults with interrupts conditionally disabled.

So replace the conditional irq enable

        if (regs->flags & X86_EFLAGS_IF)
                local_irq_enable();

with an unconditional one, and add a temporary WARN_ON_ONCE() if some
codepath actually does do page faults with interrupts disabled (without
also doing a pagefault_disable(), of course).

NOTE! We used to allow user space to disable interrupts with iopl(3).
That is no longer true since commits

 a24ca9976843 ("x86/iopl: Remove legacy IOPL option")
 b968e84b509d ("x86/iopl: Fake iopl(3) CLI/STI usage")

so the WARN_ON_ONCE() is valid for both the kernel and user situation.
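
As a rough user-space illustration of the behavior described above (not the
kernel code itself), the model below bails out when a fault arrives with
interrupts disabled and warns only the first time. The names fault_regs,
MODEL_EFLAGS_IF, warn_on_once() and model_user_addr_fault() are made up for
this sketch.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for X86_EFLAGS_IF (bit 9 of EFLAGS). */
#define MODEL_EFLAGS_IF 0x200UL

struct fault_regs {
	unsigned long flags;
};

static int warnings_printed;	/* counts how often the warning fired */
static bool irqs_enabled;	/* models the CPU interrupt state */

/* Like WARN_ON_ONCE(): report only the first hit, return cond every time. */
static bool warn_on_once(bool cond, bool *warned)
{
	if (cond && !*warned) {
		*warned = true;
		warnings_printed++;
		fprintf(stderr, "WARNING: page fault with interrupts disabled\n");
	}
	return cond;
}

/* Returns true if the fault goes on to normal handling. */
static bool model_user_addr_fault(const struct fault_regs *regs)
{
	static bool warned;

	if (warn_on_once(!(regs->flags & MODEL_EFLAGS_IF), &warned))
		return false;	/* treated as a bad area */

	irqs_enabled = true;	/* unconditional, as in the patch */
	return true;
}
```

Calling model_user_addr_fault() twice with the IF bit clear rejects both
faults but prints the warning once.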

For some of the history relevant to this code, see particularly commit
8c914cb704a1 ("x86_64: actively synchronize vmalloc area when
registering certain callbacks") which moved this below the vmalloc fault
handling.

Now that the user_mode() check is irrelevant, we can also move the
FAULT_FLAG_USER flag setting down to where the other flag settings are
done.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 arch/x86/mm/fault.c | 27 ++++++++++++++-------------
 1 file changed, 14 insertions(+), 13 deletions(-)

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 679b09cfe241..150e002e0884 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -1302,21 +1302,14 @@ void do_user_addr_fault(struct pt_regs *regs,
 		return;
 	}
 
-	/*
-	 * It's safe to allow irq's after cr2 has been saved and the
-	 * vmalloc fault has been handled.
-	 *
-	 * User-mode registers count as a user access even for any
-	 * potential system fault or CPU buglet:
-	 */
-	if (user_mode(regs)) {
-		local_irq_enable();
-		flags |= FAULT_FLAG_USER;
-	} else {
-		if (regs->flags & X86_EFLAGS_IF)
-			local_irq_enable();
+	/* Legacy check - remove this after verifying that it doesn't trigger */
+	if (WARN_ON_ONCE(!(regs->flags & X86_EFLAGS_IF))) {
+		bad_area_nosemaphore(regs, error_code, address);
+		return;
 	}
 
+	local_irq_enable();
+
 	perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS, 1, regs, address);
 
 	/*
@@ -1332,6 +1325,14 @@ void do_user_addr_fault(struct pt_regs *regs,
 	if (error_code & X86_PF_INSTR)
 		flags |= FAULT_FLAG_INSTRUCTION;
 
+	/*
+	 * We set FAULT_FLAG_USER based on the register state, not
+	 * based on X86_PF_USER. User space accesses that cause
+	 * system page faults are still user accesses.
+	 */
+	if (user_mode(regs))
+		flags |= FAULT_FLAG_USER;
+
 #ifdef CONFIG_X86_64
 	/*
 	 * Faults in the vsyscall page might need emulation.  The
-- 
2.43.0.5.g38fb137bdb


^ permalink raw reply related	[relevance 89%]

* Re: [6.8-rc1 Regression] Unable to exec apparmor_parser from virt-aa-helper
  @ 2024-01-25 17:17 99%                 ` Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-01-25 17:17 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: Kees Cook, John Johansen, Paul Moore, Kevin Locke, Josh Triplett,
	Mateusz Guzik, Al Viro, linux-mm, linux-fsdevel, linux-kernel,
	linux-security-module, Kentaro Takeda

On Thu, 25 Jan 2024 at 06:17, Tetsuo Handa
<penguin-kernel@i-love.sakura.ne.jp> wrote:
>
> On 2024/01/25 3:27, Linus Torvalds wrote:
> > The whole cred use of current->in_execve in tomoyo should
> > *also* be fixed, but I didn't even try to follow what it actually
> > wanted.
>
> Due to TOMOYO's unique domain transition (transits to new domain before
> execve() succeeds and returns to old domain if execve() failed), TOMOYO
> depends on a tricky ordering shown below.

Ok, that doesn't really clarify anything for me.

I'm less interested in what the call paths are, and more like "_Why_
is all this needed for tomoyo?"

Why doesn't tomoyo just install the new cred at "commit_creds()" time?

(The security hooks that surround that are
"->bprm_committing_creds()" and "->bprm_committed_creds()")

IOW, the whole "save things across two *independent* execve() calls"
seems crazy.

Very strange and confusing.

                    Linus

^ permalink raw reply	[relevance 99%]

* Re: Strange EFAULT on mips64el returned by syscall when another thread is forking
  2024-01-24 21:54 93%         ` Linus Torvalds
@ 2024-01-24 22:10 99%           ` Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-01-24 22:10 UTC (permalink / raw)
  To: Xi Ruoyao
  Cc: Andreas Schwab, Ben Hutchings, linux-mips, linux-kernel,
	Jiaxun Yang, Thomas Bogendoerfer, libc-alpha

On Wed, 24 Jan 2024 at 13:54, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
>
> And I think the "fails with any integer in [1, 8)" is because the MIPS
> "copy_from_user()" code is likely doing something special for those
> small copies.

.Lcopy_bytes_checklen\@: does COPY_BYTE(0) for the first access, which is

#define COPY_BYTE(N)                    \
        LOADB(t0, N(src), .Ll_exc\@);   \
        SUB     len, len, 1;            \
        beqz    len, .Ldone\@;          \
        STOREB(t0, N(dst), .Ls_exc_p1\@)

so yeah, for "copy_to_user()" (which is what that "read (fd, buf, 7)"
will do), we have that user space write ("STOREB()") in the branch
delay slot of the length test.

So that matches.

And it only fails when

 (a) you're unlucky, and that stack buffer

          char buf[16] = {};

     happens to be just under the last page that has been accessed, so
you get a page fault

 (b) you hit a mmap_sem already being locked, presumably because
another thread is doing that fork().

Anyway, I'm pretty sure this is the bug, now some MIPS person just
needs to fix the MIPS version of "instruction_pointer()" to do what
"exception_epc()" already does.

              Linus

^ permalink raw reply	[relevance 99%]

* Re: Strange EFAULT on mips64el returned by syscall when another thread is forking
  @ 2024-01-24 21:54 93%         ` Linus Torvalds
  2024-01-24 22:10 99%           ` Linus Torvalds
  0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-01-24 21:54 UTC (permalink / raw)
  To: Xi Ruoyao
  Cc: Andreas Schwab, Ben Hutchings, linux-mips, linux-kernel,
	Jiaxun Yang, Thomas Bogendoerfer, libc-alpha

On Wed, 24 Jan 2024 at 13:33, Xi Ruoyao <xry111@xry111.site> wrote:
>
> Re-posting the broken test case for Ben (I also added a waitpid call to
> prevent PID exhaustion):

Funky, funky.

>       ssize_t ret = read (fd, buf, 7);
>       if (ret == -1 && errno == EFAULT)
>         abort ();

So I think I have a clue:

> and the "interesting" aspects:
>
> 1. If I change the third parameter of "read" to any value >= 8, it no
> longer fails.  But it fails with any integer in [1, 8).

One change (the only one, really), is that now that MIPS uses
lock_mm_and_find_vma(), it also has this code:

        if (regs && !user_mode(regs)) {
                unsigned long ip = instruction_pointer(regs);
                if (!search_exception_tables(ip))
                        return false;
        }

in case the mmap trylock fails.

That code protects against the deadlock case of "we hold the mmap
lock, and take a kernel page fault due to a bug, and that page fault
happens to be to user space, and the page fault code then deadlocks on
the mmap lock".

It's a rare bug, but it's so nasty to debug that x86 has had that code
pretty much forever, and the lock_mm_and_find_vma() helper got it that
way. MIPS was clearly expecting kernel debugging to happen on other
platforms ;)

And I think the "fails with any integer in [1, 8)" is because the MIPS
"copy_from_user()" code is likely doing something special for those
small copies.

And I note that the MIPS extable.c code uses

        fixup = search_exception_tables(exception_epc(regs));

Note the difference: lock_mm_and_find_vma() uses
instruction_pointer(regs), extable.c uses exception_epc(regs).

The former is just "((regs)->cp0_epc)", while the latter is some
complex mess due to MIPS delay slots and isa16.

My *suspicion* is that instruction_pointer() needs to be fixed to do
the same full exception_epc() thing.
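
To make the suspected mismatch concrete, here is a small user-space model.
It assumes the usual MIPS rule that a fault in a branch delay slot reports
the address of the branch, with the faulting instruction four bytes later.
The struct, the table contents and all model_* names are invented for the
sketch, and isa16 handling is left out.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified register state; field names are illustrative. */
struct model_regs {
	unsigned long epc;	/* address reported by the CPU */
	bool in_delay_slot;	/* models the "branch delay" indication */
};

/* Addresses of instructions that are allowed to fault on user memory. */
static const unsigned long model_extable[] = { 0x1004, 0x2010 };

static bool model_search_extable(unsigned long ip)
{
	size_t n = sizeof(model_extable) / sizeof(model_extable[0]);

	for (size_t i = 0; i < n; i++)
		if (model_extable[i] == ip)
			return true;
	return false;
}

/* The simple variant: just the raw reported address. */
static unsigned long model_instruction_pointer(const struct model_regs *regs)
{
	return regs->epc;
}

/*
 * The adjusted variant: in a delay slot the reported address names the
 * branch, so the faulting instruction is the next one.
 */
static unsigned long model_exception_epc(const struct model_regs *regs)
{
	return regs->in_delay_slot ? regs->epc + 4 : regs->epc;
}
```

With epc = 0x1000 and in_delay_slot set, the raw lookup misses the table
entry at 0x1004 while the adjusted lookup finds it, which is the kind of
miss that would turn a valid user access into a spurious failure.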

But honestly, I absolutely detest delay slots and refuse to touch
anything MIPS for that reason.

And there could certainly be something else going on too. But that odd
size limitation, and the fact that it only happens on MIPS, does make
me think the above analysis is right.

I guess you could test it by changing the two cases of
'instruction_pointer(regs)' in mm/memory.c to use exception_epc(regs)
instead. It will only build on MIPS, but for *testing* that theory
out, it's fine.

Over to MIPS people..

                        Linus

^ permalink raw reply	[relevance 93%]

* Re: [PATCH] exec: Check __FMODE_EXEC instead of in_execve for LSMs
  @ 2024-01-24 20:47 91%     ` Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-01-24 20:47 UTC (permalink / raw)
  To: Kees Cook
  Cc: Jann Horn, Josh Triplett, Kevin Locke, John Johansen, Paul Moore,
	James Morris, Serge E. Hallyn, Kentaro Takeda, Tetsuo Handa,
	Alexander Viro, Christian Brauner, Jan Kara, Eric Biederman,
	Andrew Morton, Sebastian Andrzej Siewior, linux-fsdevel,
	linux-mm, apparmor, linux-security-module, linux-kernel,
	linux-hardening

On Wed, 24 Jan 2024 at 12:15, Kees Cook <keescook@chromium.org> wrote:
>
> Hmpf, and frustratingly Ubuntu (and Debian) still builds with
> CONFIG_USELIB, even though it was reported[2] to them almost 4 years ago.

Well, we could just remove the __FMODE_EXEC from uselib.

It's kind of wrong anyway.

Unlike a real execve(), where the target executable actually takes
control and you can't actually control it (except with ptrace, of
course), 'uselib()' really is just a wrapper around a special mmap.

And you can see it in the "acc_mode" flags: uselib already requires
MAY_READ for that reason. So you cannot uselib() a non-readable file,
unlike execve().

So I think just removing __FMODE_EXEC would just do the
RightThing(tm), and changes nothing for any sane situation.

In fact, I don't think __FMODE_EXEC really ever did anything for the
uselib() case, so removing it *really* shouldn't matter, and only fix
the new AppArmor / Tomoyo use.

Of course, as you say, not having CONFIG_USELIB enabled at all is the
_truly_ sane thing, but the only thing that used the FMODE_EXEC bit
were landlock and some special-case nfs stuff.

And at least the nfs stuff was about "don't require read permissions
for exec", which was already wrong for the uselib() case as per above.

So I think the simple oneliner is literally just

  --- a/fs/exec.c
  +++ b/fs/exec.c
  @@ -128,7 +128,7 @@ SYSCALL_DEFINE1(uselib, const char __user *, library)
        struct filename *tmp = getname(library);
        int error = PTR_ERR(tmp);
        static const struct open_flags uselib_flags = {
  -             .open_flag = O_LARGEFILE | O_RDONLY | __FMODE_EXEC,
  +             .open_flag = O_LARGEFILE | O_RDONLY,
                .acc_mode = MAY_READ | MAY_EXEC,
                .intent = LOOKUP_OPEN,
                .lookup_flags = LOOKUP_FOLLOW,

but I obviously have nothing that uses uselib(). I don't see how it
really *could* break anything, though, exactly because of that

                .acc_mode = MAY_READ | MAY_EXEC,

that means that the *regular* permission checks already require the
file to be readable. Never mind any LSM checks that might be confused.
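
The acc_mode argument can be sketched as a tiny permission model. The bit
values and the model_* helpers are illustrative assumptions, not the
kernel's permission code; the only point is that asking for read plus
execute is strictly stricter than asking for execute alone.

```c
#include <stdbool.h>

/* Illustrative bit values for this sketch only. */
#define MODEL_MAY_EXEC 0x1
#define MODEL_MAY_READ 0x4

/* True if every requested access bit is granted by the file's permissions. */
static bool model_permission(int granted, int acc_mode)
{
	return (granted & acc_mode) == acc_mode;
}

/* execve() only needs execute permission on the target. */
static bool model_can_execve(int granted)
{
	return model_permission(granted, MODEL_MAY_EXEC);
}

/* uselib() asks for MAY_READ | MAY_EXEC, like the open_flags in the diff. */
static bool model_can_uselib(int granted)
{
	return model_permission(granted, MODEL_MAY_READ | MODEL_MAY_EXEC);
}
```

An execute-only file passes model_can_execve() but fails model_can_uselib(),
so a "don't require read permission for exec" special case has nothing to
act on in the uselib() path.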

           Linus

^ permalink raw reply	[relevance 91%]

* Re: [6.8-rc1 Regression] Unable to exec apparmor_parser from virt-aa-helper
  @ 2024-01-24 19:41 99%                 ` Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-01-24 19:41 UTC (permalink / raw)
  To: Kees Cook
  Cc: Kentaro Takeda, Tetsuo Handa, John Johansen, Paul Moore,
	Kevin Locke, Josh Triplett, Mateusz Guzik, Al Viro, linux-mm,
	linux-fsdevel, linux-kernel, linux-security-module

On Wed, 24 Jan 2024 at 11:02, Kees Cook <keescook@chromium.org> wrote:
>
> Yup. Should I post a formal patch, or do you want to commit what you've
> got (with the "file" -> "f" fix)?

I took your formal patch. Thanks,

               Linus

^ permalink raw reply	[relevance 99%]

* Re: [6.8-rc1 Regression] Unable to exec apparmor_parser from virt-aa-helper
  2024-01-24 18:27 89%             ` Linus Torvalds
@ 2024-01-24 18:29 99%               ` Linus Torvalds
      2 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-01-24 18:29 UTC (permalink / raw)
  To: Kees Cook, Kentaro Takeda, Tetsuo Handa, John Johansen, Paul Moore
  Cc: Kevin Locke, Josh Triplett, Mateusz Guzik, Al Viro, linux-mm,
	linux-fsdevel, linux-kernel, linux-security-module

On Wed, 24 Jan 2024 at 10:27, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> UNTESTED

.. and just to check who is awake, I used 'file->f_flags &
__FMODE_EXEC' in tomoyo when 'file' doesn't exist as a variable.

It should be 'f->f_flags & __FMODE_EXEC'.

That way it at least compiles.

              Linus

^ permalink raw reply	[relevance 99%]

* Re: [6.8-rc1 Regression] Unable to exec apparmor_parser from virt-aa-helper
  2024-01-24 17:27 99%           ` Linus Torvalds
@ 2024-01-24 18:27 89%             ` Linus Torvalds
  2024-01-24 18:29 99%               ` Linus Torvalds
                                 ` (2 more replies)
  0 siblings, 3 replies; 200+ results
From: Linus Torvalds @ 2024-01-24 18:27 UTC (permalink / raw)
  To: Kees Cook, Kentaro Takeda, Tetsuo Handa, John Johansen, Paul Moore
  Cc: Kevin Locke, Josh Triplett, Mateusz Guzik, Al Viro, linux-mm,
	linux-fsdevel, linux-kernel, linux-security-module

[-- Attachment #1: Type: text/plain, Size: 1450 bytes --]

On Wed, 24 Jan 2024 at 09:27, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> IOW, I think the goal here should be "minimal fix" followed by "remove
> that horrendous thing".

Ugh. The tomoyo use is even *more* disgusting, in how it uses it for
"tomoyo_domain()" entirely independently of even the ->file_open()
callback.

So for tomoyo, it's not about the file open, it's about
tomoyo_cred_prepare() and friends.

So the patch I posted probably fixes apparmor, but only breaks tomoyo
instead, because tomoyo really does seem to use it around the whole
security_bprm_creds_for_exec() thing.

Now, tomoyo *also* uses it for the file_open() callback, just to confuse things.

IOW, I think the right thing to do is to split this in two:

 - leave the existing ->in_execve for the bprm_creds dance in
bprm_execve(). Horrendous and disgusting.

 - the ->file_open() thing is changed to check file->f_flags

(with a comment about how FMODE_EXEC is in f_flags, not f_mode like it
should be).

IOW, I think the patch I posted earlier - and Kees' version of the
same thing - is just broken. This attached patch might work.

And as noted, since it checks __FMODE_EXEC, it now allows the uselib()
case too. I think that's ok.
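
A minimal sketch of the per-file check, assuming made-up flag values and a
cut-down struct (model_file, MODEL_FMODE_EXEC and model_open_is_exec() are
not kernel names):

```c
#include <stdbool.h>

/* Illustrative values only; the real definitions live in the kernel headers. */
#define MODEL_O_RDONLY   0x00u
#define MODEL_FMODE_EXEC 0x20u

struct model_file {
	unsigned int f_flags;	/* open flags, where the exec marker ends up */
};

/*
 * Models the ->file_open() decision: an open made for execution carries
 * the marker in f_flags, so no task-wide state has to be consulted.
 */
static bool model_open_is_exec(const struct model_file *f)
{
	return (f->f_flags & MODEL_FMODE_EXEC) != 0;
}
```

Because the marker travels with the file, the answer does not depend on
when the open happens relative to the rest of the exec sequence, and any
opener that sets the marker (uselib() included) is treated the same way.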

UNTESTED. But I think this is at least a movement in the right
direction. The whole cred use of current->in_execve in tomoyo should
*also* be fixed, but I didn't even try to follow what it actually
wanted.

           Linus

[-- Attachment #2: patch.diff --]
[-- Type: text/x-patch, Size: 1366 bytes --]

 security/apparmor/lsm.c  | 4 +++-
 security/tomoyo/tomoyo.c | 5 +++--
 2 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/security/apparmor/lsm.c b/security/apparmor/lsm.c
index 7717354ce095..98e1150bee9d 100644
--- a/security/apparmor/lsm.c
+++ b/security/apparmor/lsm.c
@@ -469,8 +469,10 @@ static int apparmor_file_open(struct file *file)
 	 * Cache permissions granted by the previous exec check, with
 	 * implicit read and executable mmap which are required to
 	 * actually execute the image.
+	 *
+	 * Illogically, FMODE_EXEC is in f_flags, not f_mode.
 	 */
-	if (current->in_execve) {
+	if (file->f_flags & __FMODE_EXEC) {
 		fctx->allow = MAY_EXEC | MAY_READ | AA_EXEC_MMAP;
 		return 0;
 	}
diff --git a/security/tomoyo/tomoyo.c b/security/tomoyo/tomoyo.c
index 3c3af149bf1c..e8fb02b716aa 100644
--- a/security/tomoyo/tomoyo.c
+++ b/security/tomoyo/tomoyo.c
@@ -327,8 +327,9 @@ static int tomoyo_file_fcntl(struct file *file, unsigned int cmd,
  */
 static int tomoyo_file_open(struct file *f)
 {
-	/* Don't check read permission here if called from execve(). */
-	if (current->in_execve)
+	/* Don't check read permission here if execve(). */
+	/* Illogically, FMODE_EXEC is in f_flags, not f_mode. */
+	if (file->f_flags & __FMODE_EXEC)
 		return 0;
 	return tomoyo_check_open_permission(tomoyo_domain(), &f->f_path,
 					    f->f_flags);

^ permalink raw reply related	[relevance 89%]

* Re: [6.8-rc1 Regression] Unable to exec apparmor_parser from virt-aa-helper
  @ 2024-01-24 17:27 99%           ` Linus Torvalds
  2024-01-24 18:27 89%             ` Linus Torvalds
  0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-01-24 17:27 UTC (permalink / raw)
  To: Kees Cook
  Cc: Kevin Locke, John Johansen, Josh Triplett, Mateusz Guzik,
	Al Viro, linux-mm, linux-fsdevel, linux-kernel,
	linux-security-module

On Wed, 24 Jan 2024 at 09:21, Kees Cook <keescook@chromium.org> wrote:
>
> I opted to tie "current->in_execve" lifetime to bprm lifetime just to
> have a clean boundary (i.e. strictly in alloc/free_bprm()).

Honestly, the less unnecessary churn that horrible flag causes, the better.

IOW, I think the goal here should be "minimal fix" followed by "remove
that horrendous thing".

              Linus

^ permalink raw reply	[relevance 99%]

* Re: [6.8-rc1 Regression] Unable to exec apparmor_parser from virt-aa-helper
  2024-01-24 16:54 99%     ` Linus Torvalds
@ 2024-01-24 17:10 93%       ` Linus Torvalds
    0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-01-24 17:10 UTC (permalink / raw)
  To: Kees Cook
  Cc: Kevin Locke, John Johansen, Josh Triplett, Mateusz Guzik,
	Al Viro, linux-mm, linux-fsdevel, linux-kernel,
	linux-security-module

[-- Attachment #1: Type: text/plain, Size: 848 bytes --]

On Wed, 24 Jan 2024 at 08:54, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> Hmm. That whole thing is disgusting. I think it should have checked
> FMODE_EXEC, and I have no idea why it doesn't.

Maybe because FMODE_EXEC gets set for uselib() calls too? I dunno. I
think it would be even better if we had the 'intent' flags from
'struct open_flags' available, but they aren't there in the
file_open() security chain.

Anyway, moving current->in_execve earlier looks fairly trivial, but I
worry about the randomness. I'd be *so* much happier if this crazy
flag went away, and it got changed to look at the open intent instead.

Attached patch is ENTIRELY UNTESTED. And disgusting.

I went back and looked. This whole disgusting thing goes back to 2009
and commit f9ce1f1cda8b ("Add in_execve flag into task_struct").

              Linus

[-- Attachment #2: patch.diff --]
[-- Type: text/x-patch, Size: 1795 bytes --]

 fs/exec.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/fs/exec.c b/fs/exec.c
index 8cdd5b2dd09c..fc1d6befe830 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1843,7 +1843,6 @@ static int bprm_execve(struct linux_binprm *bprm)
 	 * where setuid-ness is evaluated.
 	 */
 	check_unsafe_exec(bprm);
-	current->in_execve = 1;
 	sched_mm_cid_before_execve(current);
 
 	sched_exec();
@@ -1860,7 +1859,6 @@ static int bprm_execve(struct linux_binprm *bprm)
 	sched_mm_cid_after_execve(current);
 	/* execve succeeded */
 	current->fs->in_exec = 0;
-	current->in_execve = 0;
 	rseq_execve(current);
 	user_events_execve(current);
 	acct_update_integrals(current);
@@ -1879,7 +1877,6 @@ static int bprm_execve(struct linux_binprm *bprm)
 
 	sched_mm_cid_after_execve(current);
 	current->fs->in_exec = 0;
-	current->in_execve = 0;
 
 	return retval;
 }
@@ -1910,6 +1907,7 @@ static int do_execveat_common(int fd, struct filename *filename,
 	/* We're below the limit (still or again), so we don't want to make
 	 * further execve() calls fail. */
 	current->flags &= ~PF_NPROC_EXCEEDED;
+	current->in_execve = 1;
 
 	bprm = alloc_bprm(fd, filename, flags);
 	if (IS_ERR(bprm)) {
@@ -1965,6 +1963,7 @@ static int do_execveat_common(int fd, struct filename *filename,
 	free_bprm(bprm);
 
 out_ret:
+	current->in_execve = 0;
 	putname(filename);
 	return retval;
 }
@@ -1985,6 +1984,7 @@ int kernel_execve(const char *kernel_filename,
 	if (IS_ERR(filename))
 		return PTR_ERR(filename);
 
+	current->in_execve = 1;
 	bprm = alloc_bprm(fd, filename, 0);
 	if (IS_ERR(bprm)) {
 		retval = PTR_ERR(bprm);
@@ -2024,6 +2024,7 @@ int kernel_execve(const char *kernel_filename,
 out_free:
 	free_bprm(bprm);
 out_ret:
+	current->in_execve = 0;
 	putname(filename);
 	return retval;
 }

^ permalink raw reply related	[relevance 93%]

* Re: [6.8-rc1 Regression] Unable to exec apparmor_parser from virt-aa-helper
  2024-01-24 16:46 99%   ` Linus Torvalds
@ 2024-01-24 16:54 99%     ` Linus Torvalds
  2024-01-24 17:10 93%       ` Linus Torvalds
  0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-01-24 16:54 UTC (permalink / raw)
  To: Kees Cook
  Cc: Kevin Locke, John Johansen, Josh Triplett, Mateusz Guzik,
	Al Viro, linux-mm, linux-fsdevel, linux-kernel,
	linux-security-module

On Wed, 24 Jan 2024 at 08:46, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> If the code ends up deciding "is this an exec" based on some state
> flag that hasn't been set, that would explain it.
>
> Something like "current->in_execve", perhaps?

Yeah, that looks like exactly what some of the security layer is testing.

Hmm. That whole thing is disgusting. I think it should have checked
FMODE_EXEC, and I have no idea why it doesn't.

                 Linus

^ permalink raw reply	[relevance 99%]

* Re: [6.8-rc1 Regression] Unable to exec apparmor_parser from virt-aa-helper
  @ 2024-01-24 16:46 99%   ` Linus Torvalds
  2024-01-24 16:54 99%     ` Linus Torvalds
  0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-01-24 16:46 UTC (permalink / raw)
  To: Kees Cook
  Cc: Kevin Locke, John Johansen, Josh Triplett, Mateusz Guzik,
	Al Viro, linux-mm, linux-fsdevel, linux-kernel,
	linux-security-module

On Wed, 24 Jan 2024 at 08:35, Kees Cook <keescook@chromium.org> wrote:
>
> Oh, yikes. This means the LSM lost the knowledge that this open is an
> _exec_, not a _read_.
>
> I will start looking at this. John might be able to point me in the
> right direction more quickly, though.

One obvious change in -rc1 is that the exec open was moved much
earlier: commit 978ffcbf00d8 ("execve: open the executable file before
doing anything else").

If the code ends up deciding "is this an exec" based on some state
flag that hasn't been set, that would explain it.

Something like "current->in_execve", perhaps?
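
The ordering hazard can be shown with a toy model. Everything here is
hypothetical (model_task, hook_sees_exec() and the two model_exec_*
functions); it only illustrates how a hook that trusts a task-wide flag
stops seeing "this is an exec" once the open moves ahead of the flag.

```c
#include <stdbool.h>

struct model_task {
	bool in_execve;		/* task-wide "we are inside exec" marker */
};

/* An LSM-style open hook that trusts the task-wide flag. */
static bool hook_sees_exec(const struct model_task *t)
{
	return t->in_execve;
}

/* Old ordering: the flag is set first, then the executable is opened. */
static bool model_exec_old_order(struct model_task *t)
{
	bool seen;

	t->in_execve = true;
	seen = hook_sees_exec(t);	/* the open happens here */
	t->in_execve = false;
	return seen;
}

/* New ordering: the executable is opened before the flag is set. */
static bool model_exec_new_order(struct model_task *t)
{
	bool seen = hook_sees_exec(t);	/* the open happens here */

	t->in_execve = true;
	/* ... rest of the exec work ... */
	t->in_execve = false;
	return seen;
}
```

In the old order the hook reports an exec; in the new order it sees a
plain open and would apply the ordinary read checks.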

               Linus

^ permalink raw reply	[relevance 99%]

* Re: [PATCH 34/82] ipc: Refactor intentional wrap-around calculation
  @ 2024-01-23 18:06 93%       ` Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-01-23 18:06 UTC (permalink / raw)
  To: Kees Cook
  Cc: linux-hardening, Andrew Morton, Liam R. Howlett, Mark Brown,
	Mike Kravetz, Vasily Averin, Alexander Mikhalitsyn,
	Gustavo A. R. Silva, Bill Wendling, Justin Stitt, linux-kernel

On Mon, 22 Jan 2024 at 17:38, Kees Cook <keescook@chromium.org> wrote:
>
> I've tried to find the right balance between not enough details and too
> much. I guess I got it wrong.

My complaint isn't about the level of detail.

My complaint is about how the commit log IS ACTIVELY MISLEADING
GARBAGE and does not match the actual patch in any way, shape, or
form.

It talks about completely irrelevant issues that simply have nothing
to do with it.

It talks about undefined behavior and about a "unsigned wrap-around
sanitizer[2]", which is nonsensical, since there is no undefined
behavior to sanitize. It literally gives a link to a github "issue"
for that claim, but when you follow the link, it's actually about
*signed* overflow, which is something entirely different.

And honestly, the patch itself is garbage. The code is fine. Any
"sanitizer" that complains about that code is pure and utter shite.

Really.

If you actually have some real "detect unsigned wraparound" tool
(NOTE: that is *NOT* undefined behavior, and that is *NOT* a
"sanitizer", it's at most some helpful checker), then such a tool had
better recognize the perfectly fine traditional idiom for this, which
is to do the addition and check that the result is smaller. Like the
code does.
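
For reference, the traditional idiom looks like the sketch below. The
function name model_add_wraps() is invented for the example; the arithmetic
relies only on the C rule that unsigned addition wraps modulo 2^N.

```c
#include <stdbool.h>
#include <limits.h>

/*
 * Adds a and b, storing the (possibly wrapped) sum in *sum.
 * Returns true if the addition wrapped around.
 */
static bool model_add_wraps(unsigned long a, unsigned long b,
			    unsigned long *sum)
{
	*sum = a + b;		/* well-defined: reduced modulo ULONG_MAX + 1 */
	return *sum < a;	/* a wrapped sum is smaller than either operand */
}
```

For example, ULONG_MAX + 1 wraps to 0, which is smaller than ULONG_MAX, so
the check fires; 1 + 2 gives 3, which is not smaller than 1, so it does not.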

See what I'm saying? The patch is garbage. Any sanitizer that would
complain about the old code is garbage. And the commit message is
worse than garbage, it is actively misleading to the point that I'd
call it lying, trying to confuse the issues by bringing up things that
are utterly and entirely irrelevant to the patch.

So:

 - get rid of that commit message that is lying garbage

 - fix the so-called "sanitizer".

 - stop calling the unsigned wrap-around a "sanitizer" and talking
about "undefined behavior" in the same sentence, since it's neither.

Do you really not see why I think that thing is actively *WRONG*?

           Linus

^ permalink raw reply	[relevance 93%]

* Re: [BUG] BUG: kernel NULL pointer dereference at ttm_device_init+0xb4
       [not found]           ` <27c3d1e9-5933-47a9-9c33-ff8ec13f40d3@amd.com>
@ 2024-01-23  1:25 99%         ` Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-01-23  1:25 UTC (permalink / raw)
  To: Bhardwaj, Rajneesh
  Cc: Steven Rostedt, LKML, Felix Kuehling, Christian König, dri-devel

On Mon, 22 Jan 2024 at 16:56, Bhardwaj, Rajneesh
<rajneesh.bhardwaj@amd.com> wrote:
>
> I think a fix might already be in flight. Please see  Linux-Kernel Archive: Re: [PATCH] drm/ttm: fix ttm pool initialization for no-dma-device drivers (iu.edu)

Please use lore.kernel.org that doesn't corrupt whitespace in patches
or lose header information:

  https://lore.kernel.org/lkml/20240113213347.9562-1-pchelkin@ispras.ru/

although that seems to be a strange definition of "in flight". It was
sent out 8 days ago, and apparently nobody thought to include it in
the drm fixes pile that came in last Friday.

So it made it into rc1, even though it was reported a week before.

It also looks like some mailing list there is mangling emails - if you
use 'all' instead of 'lkml', lore reports multiple emails with the
same message-id, and it all looks messier as a result.

I assume it's dri-devel@lists.freedesktop.org that messes up, mainly
because I don't tend to see this behaviour when only the usual
kernel.org mailing lists are involved.

              Linus

^ permalink raw reply	[relevance 99%]

* Re: [PATCH 34/82] ipc: Refactor intentional wrap-around calculation
  @ 2024-01-23  1:07 94%   ` Linus Torvalds
    0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-01-23  1:07 UTC (permalink / raw)
  To: Kees Cook
  Cc: linux-hardening, Andrew Morton, Liam R. Howlett, Mark Brown,
	Mike Kravetz, Vasily Averin, Alexander Mikhalitsyn,
	Gustavo A. R. Silva, Bill Wendling, Justin Stitt, linux-kernel

On Mon, 22 Jan 2024 at 16:46, Kees Cook <keescook@chromium.org> wrote:
>
> Refactor open-coded unsigned wrap-around addition test to use
> check_add_overflow(),

NAK.

First off, none of this has anything to do with -fno-strict-overflow.
We do that, because without it gcc ends up doing various odd and
surprising things, the same way it does with strict-aliasing.

IOW, you should think of -fno-strict-overflow as a hardening thing.
Any optimization that depends on "this can overflow, so I can do
anything I want" is just a dangerous optimization for the kernel.

It matches -fno-strict-aliasing and -fno-delete-null-pointer-checks,
in other words.

And I do not understand why you mention it in the first place, since
this code USES UNSIGNED INTEGER ARITHMETIC, and thus has absolutely
nothing to do with that no-strict-overflow flag.

So the commit message is actively misleading and broken. Unsigned
arithmetic has very well-defined behavior, and the code uses that with
a very traditional and valid test.

The comment about "redundant open-coded addition" is also PURE
GARBAGE, since the compiler will trivially do the CSE - and on the
source code level your modified code is actively bigger and uglier.

So your patch improves neither code generation nor source code.
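
As a side-by-side sketch of the two styles, assuming a GCC or Clang
compiler that provides __builtin_add_overflow (the function names and the
start/len parameters are hypothetical, and this is not the ipc code from
the patch):

```c
#include <stdbool.h>
#include <limits.h>

/* Open-coded: the addition appears twice in the source. */
static bool end_open_coded(unsigned long start, unsigned long len,
			   unsigned long *end)
{
	if (start + len < start)
		return false;	/* wrapped around */
	*end = start + len;
	return true;
}

/* Builtin-based: one explicit addition plus an overflow flag. */
static bool end_builtin(unsigned long start, unsigned long len,
			unsigned long *end)
{
	unsigned long sum;

	if (__builtin_add_overflow(start, len, &sum))
		return false;	/* wrapped around */
	*end = sum;
	return true;
}
```

Both functions accept and reject the same inputs. The repeated "start + len"
in the first one is a textbook common subexpression, so an optimizing
compiler is expected to compute it once.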

And if there's some unsigned wrap-around checker that doesn't
understand this traditional way of doing overflow checking, that piece
of crap needs fixing.

I don't want to see mindless conversion patches that work around some
broken tooling.

I want to see them even less when pretty much EVERY SINGLE WORD in the
commit message seems to be actively misleading and irrelevant garbage.

Stop making the world a worse place.

                 Linus

^ permalink raw reply	[relevance 94%]

* Re: [BUG] BUG: kernel NULL pointer dereference at ttm_device_init+0xb4
  @ 2024-01-23  0:43 99%     ` Linus Torvalds
       [not found]           ` <27c3d1e9-5933-47a9-9c33-ff8ec13f40d3@amd.com>
  0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-01-23  0:43 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: LKML, Rajneesh Bhardwaj, Felix Kuehling, Christian König, dri-devel

On Mon, 22 Jan 2024 at 15:17, Steven Rostedt <rostedt@goodmis.org> wrote:
>
> Perhaps this is the real fix?

If you send a signed-off version, I'll apply it asap.

Thanks,
                 Linus

^ permalink raw reply	[relevance 99%]

* Re: [GIT PULL] Btrfs fixes for 6.8-rc2
  2024-01-22 22:54 99%   ` Linus Torvalds
@ 2024-01-22 23:01 99%     ` Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-01-22 23:01 UTC (permalink / raw)
  To: David Sterba, Qu Wenruo; +Cc: linux-btrfs, linux-kernel

On Mon, 22 Jan 2024 at 14:54, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> Let me reboot to verify that at least my machine boots.

My tree with that commit reverted does indeed boot:

  Revert "btrfs: zstd: fix and simplify the inline extent decompression"

is working ok for me.

I do not think I have anything odd in my Kconfig, and I didn't see any
messages, and there is nothing logged either - just a hang at boot.

                Linus

^ permalink raw reply	[relevance 99%]

* Re: [GIT PULL] Btrfs fixes for 6.8-rc2
  2024-01-22 22:34 99% ` Linus Torvalds
@ 2024-01-22 22:54 99%   ` Linus Torvalds
  2024-01-22 23:01 99%     ` Linus Torvalds
  0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-01-22 22:54 UTC (permalink / raw)
  To: David Sterba, Qu Wenruo; +Cc: linux-btrfs, linux-kernel

On Mon, 22 Jan 2024 at 14:34, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> Bah. These fixes are garbage. Now my machine doesn't even boot. I'm
> bisecting

My bisection says

   1e7f6def8b2370ecefb54b3c8f390ff894b0c51b is the first bad commit

but I'll still have to verify by testing the revert on top of my current tree.

It did revert cleanly, but I also note that if the zstd case is wrong,
I assume the other very similar commits (for zlib and lzo) are
potentially also wrong.

Let me reboot to verify that at least my machine boots.

              Linus

^ permalink raw reply	[relevance 99%]

* Re: [GIT PULL] Btrfs fixes for 6.8-rc2
  @ 2024-01-22 22:34 99% ` Linus Torvalds
  2024-01-22 22:54 99%   ` Linus Torvalds
  2024-01-26 19:25 98% ` Linus Torvalds
  1 sibling, 1 reply; 200+ results
From: Linus Torvalds @ 2024-01-22 22:34 UTC (permalink / raw)
  To: David Sterba; +Cc: linux-btrfs, linux-kernel

On Mon, 22 Jan 2024 at 10:34, David Sterba <dsterba@suse.com> wrote:
>
> please pull the following fixes.

Bah. These fixes are garbage. Now my machine doesn't even boot. I'm
bisecting, but it's between

good: e94dfb7a2935 ("btrfs: pass btrfs_io_geometry into btrfs_max_io_len")

bad: f398e70dd69e ("btrfs: tree-checker: fix inline ref size in error messages")

Not ok.

               Linus

^ permalink raw reply	[relevance 99%]

* Re: [PATCH v3 1/2] eventfs: Have the inodes all for files and directories all be the same
  @ 2024-01-22 22:02 99%     ` Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-01-22 22:02 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Steven Rostedt, linux-kernel, linux-trace-kernel,
	Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers,
	Christian Brauner, Al Viro, Ajay Kaher, linux-fsdevel

On Mon, 22 Jan 2024 at 13:59, Darrick J. Wong <djwong@kernel.org> wrote:
>
>          though I don't think
> leaking raw kernel pointers is an awesome idea.

Yeah, I wasn't all that comfortable even with trying to hash it
(because I think the number of source bits is small enough that even
with a crypto hash, it's trivially brute-forceable).

See

   https://lore.kernel.org/all/20240122152748.46897388@gandalf.local.home/

for the current patch under discussion (and it contains a link _to_
said discussion).

           Linus

^ permalink raw reply	[relevance 99%]

* Re: [for-linus][PATCH 1/3] eventfs: Have the inodes all for files and directories all be the same
  2024-01-22 17:39 99%             ` Linus Torvalds
@ 2024-01-22 18:19 94%               ` Linus Torvalds
    0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-01-22 18:19 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Geert Uytterhoeven, Kees Cook, linux-kernel, Masami Hiramatsu,
	Mark Rutland, Mathieu Desnoyers, Andrew Morton,
	Christian Brauner, Al Viro, Ajay Kaher

[-- Attachment #1: Type: text/plain, Size: 884 bytes --]

On Mon, 22 Jan 2024 at 09:39, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> Actually, why not just add an inode number to your data structures,
> at least for directories? And just do a static increment on it as they
> get registered?
>
> That avoids the whole issue with possibly leaking kernel address data.

The 'nlink = 1' thing doesn't seem to make 'find' any happier for this
case, sadly.

But the inode number in the 'struct eventfs_inode' looks trivial. And
doesn't even grow that structure on 64-bit architectures at least,
because the struct is already 64-bit aligned, and had only one 32-bit
entry at the end.

On 32-bit architectures the structure size grows, but I'm not sure the
allocation size grows. Our kmalloc() is quantized at odd numbers.

IOW, this trivial patch seems to be much safer than worrying about
some pointer exposure.
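
The size argument above can be checked with a small user-space sketch.
The two structs below are simplified stand-ins with hypothetical names
(they are not the real 'struct eventfs_inode'): a pointer-aligned member
followed by 32 bits of bitfields, with and without an extra 32-bit field.
On a typical LP64 target the new field lands in what used to be tail
padding, so the size should not change; on a 32-bit target it grows by
one word.

```c
#include <stddef.h>

/* Stand-in for the layout before the patch (hypothetical, simplified):
 * a pointer-aligned member followed by 32 bits worth of bitfields. */
struct ei_before {
	void		*ptr;
	unsigned int	is_freed:1;
	unsigned int	is_events:1;
	unsigned int	nr_entries:30;
};

/* Same layout with the extra 32-bit inode number added. */
struct ei_after {
	void		*ptr;
	unsigned int	ino;
	unsigned int	is_freed:1;
	unsigned int	is_events:1;
	unsigned int	nr_entries:30;
};

/* How many bytes the structure grew by when 'ino' was added. */
size_t ei_growth(void)
{
	return sizeof(struct ei_after) - sizeof(struct ei_before);
}
```

Whether the kmalloc() allocation size changes on 32-bit is a separate
question that this sketch does not try to answer.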

              Linus

[-- Attachment #2: patch.diff --]
[-- Type: text/x-patch, Size: 1593 bytes --]

 fs/tracefs/event_inode.c | 6 ++++--
 fs/tracefs/internal.h    | 1 +
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c
index 6795fda2af19..0b52ec111cf3 100644
--- a/fs/tracefs/event_inode.c
+++ b/fs/tracefs/event_inode.c
@@ -395,8 +395,7 @@ static struct dentry *create_dir(struct eventfs_inode *ei, struct dentry *parent
 	inode->i_op = &eventfs_root_dir_inode_operations;
 	inode->i_fop = &eventfs_file_operations;
 
-	/* All directories will have the same inode number */
-	inode->i_ino = EVENTFS_DIR_INODE_INO;
+	inode->i_ino = ei->ino;
 
 	ti = get_tracefs(inode);
 	ti->flags |= TRACEFS_EVENT_INODE;
@@ -859,6 +858,7 @@ struct eventfs_inode *eventfs_create_dir(const char *name, struct eventfs_inode
 					 int size, void *data)
 {
 	struct eventfs_inode *ei;
+	static int ino_counter = EVENTFS_DIR_INODE_INO;
 
 	if (!parent)
 		return ERR_PTR(-EINVAL);
@@ -889,6 +889,8 @@ struct eventfs_inode *eventfs_create_dir(const char *name, struct eventfs_inode
 	INIT_LIST_HEAD(&ei->list);
 
 	mutex_lock(&eventfs_mutex);
+	ei->ino = ++ino_counter;
+
 	if (!parent->is_freed) {
 		list_add_tail(&ei->list, &parent->children);
 		ei->d_parent = parent->dentry;
diff --git a/fs/tracefs/internal.h b/fs/tracefs/internal.h
index 12b7d0150ae9..1a574d306ea9 100644
--- a/fs/tracefs/internal.h
+++ b/fs/tracefs/internal.h
@@ -64,6 +64,7 @@ struct eventfs_inode {
 		struct llist_node	llist;
 		struct rcu_head		rcu;
 	};
+	unsigned int			ino;
 	unsigned int			is_freed:1;
 	unsigned int			is_events:1;
 	unsigned int			nr_entries:30;

^ permalink raw reply related	[relevance 94%]

* Re: [for-linus][PATCH 1/3] eventfs: Have the inodes all for files and directories all be the same
  2024-01-22 17:37 97%           ` Linus Torvalds
@ 2024-01-22 17:39 99%             ` Linus Torvalds
  2024-01-22 18:19 94%               ` Linus Torvalds
  0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-01-22 17:39 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Geert Uytterhoeven, Kees Cook, linux-kernel, Masami Hiramatsu,
	Mark Rutland, Mathieu Desnoyers, Andrew Morton,
	Christian Brauner, Al Viro, Ajay Kaher

On Mon, 22 Jan 2024 at 09:37, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> Yeah, limiting it to directories will at least somewhat help the
> address leaking.

Actually, why not just add an inode number to your data structures,
at least for directories? And just do a static increment on it as they
get registered?

That avoids the whole issue with possibly leaking kernel address data.

              Linus

^ permalink raw reply	[relevance 99%]

* Re: [for-linus][PATCH 1/3] eventfs: Have the inodes all for files and directories all be the same
  @ 2024-01-22 17:37 97%           ` Linus Torvalds
  2024-01-22 17:39 99%             ` Linus Torvalds
  0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-01-22 17:37 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Geert Uytterhoeven, Kees Cook, linux-kernel, Masami Hiramatsu,
	Mark Rutland, Mathieu Desnoyers, Andrew Morton,
	Christian Brauner, Al Viro, Ajay Kaher

On Mon, 22 Jan 2024 at 08:46, Steven Rostedt <rostedt@goodmis.org> wrote:
>
> I can add this patch to make sure directory inodes are unique, as it causes
> a regression in find, but keep the file inodes the same.

Yeah, limiting it to directories will at least somewhat help the
address leaking.

However, I also note that you never did the "set i_nlink to one"
trick, which is the traditional thing to do to tell 'find' that it
cannot do its directory optimization thing.

I'm not sure that the nlink trick disables this part of the find
sanity checks, but the *first* thing to check would be something like
this

  --- a/fs/tracefs/inode.c
  +++ b/fs/tracefs/inode.c
  @@ -182,6 +182,7 @@ static int tracefs_getattr(struct mnt_idmap *idmap,

        set_tracefs_inode_owner(inode);
        generic_fillattr(idmap, request_mask, inode, stat);
  +     stat->nlink = 1;
        return 0;
   }

because it might just fix the issue.

Having nlink == 1 is how non-unix filesystems (like FAT etc) indicate
that you can't try to count directory entries to optimize traversal.

And it is possible that that is where the whole find thing comes from,
but who knows, it could be a generic loop detector that runs
independently of the usual link detection.
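
For reference, the convention being discussed: on traditional Unix
filesystems a directory's link count is 2 plus the number of its
subdirectories, and tools may use that to stop looking for
subdirectories early. A link count below 2 tells them the number means
nothing. A minimal user-space sketch of that rule (the helper names are
made up for illustration):

```c
#include <sys/stat.h>

/* Number of subdirectories implied by a directory's link count, or -1
 * when the count carries no such information (nlink < 2). */
long subdir_hint(unsigned long nlink)
{
	if (nlink < 2)
		return -1;
	return (long)(nlink - 2);
}

/* Convenience wrapper: stat a path and apply the rule above.
 * Returns -2 if the path cannot be stat'ed or is not a directory. */
long subdir_hint_for_path(const char *path)
{
	struct stat st;

	if (stat(path, &st) != 0 || !S_ISDIR(st.st_mode))
		return -2;
	return subdir_hint((unsigned long)st.st_nlink);
}
```

With the getattr change above, every tracefs directory would report
nlink == 1, so subdir_hint() would return -1 for all of them.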

               Linus

^ permalink raw reply	[relevance 97%]

* Re: [PATCH RFC 1/4] fs/locks: Fix file lock cache accounting, again
  @ 2024-01-22  5:10 85%               ` Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-01-22  5:10 UTC (permalink / raw)
  To: Shakeel Butt
  Cc: Roman Gushchin, Josh Poimboeuf, Vlastimil Babka, Jeff Layton,
	Chuck Lever, Johannes Weiner, Michal Hocko, linux-kernel,
	Jens Axboe, Tejun Heo, Vasily Averin, Michal Koutny, Waiman Long,
	Muchun Song, Jiri Kosina, cgroups, linux-mm

On Wed, 17 Jan 2024 at 14:56, Shakeel Butt <shakeelb@google.com> wrote:
> >
> > So I don't see how we can make it really cheap (say, less than 5% overhead)
> > without caching pre-accounted objects.
>
> Maybe this is what we want. Now we are down to just SLUB, maybe such
> caching of pre-accounted objects can be in SLUB layer and we can
> decide to keep this caching per-kmem-cache opt-in or always on.

So it turns out that we have another case of SLAB_ACCOUNT being quite
a big expense, and it's actually the normal - but failed - open() or
execve() case.

See the thread at

    https://lore.kernel.org/all/CAHk-=whw936qzDLBQdUz-He5WK_0fRSWwKAjtbVsMGfX70Nf_Q@mail.gmail.com/

and to see the effect in profiles, you can use this EXTREMELY stupid
test program:

    #include <fcntl.h>

    int main(int argc, char **argv)
    {
        for (int i = 0; i < 10000000; i++)
                open("nonexistent", O_RDONLY);
    }

where the point of course is that the "nonexistent" pathname doesn't
actually exist (so don't create a file called that for the test).

What happens is that open() allocates a 'struct file *' early from the
filp kmem_cache, which has SLAB_ACCOUNT set. So we'll do accounting
for it, fail the pathname open, and free it again, which uncharges
the accounting.

Now, in this case, I actually have a suggestion: could we please just
make SLAB_ACCOUNT be something that we do *after* the allocation, kind
of the same way the zeroing works?

IOW, I'd love to get rid of slab_pre_alloc_hook() entirely, and make
slab_post_alloc_hook() do all the "charge the memcg if required".

Obviously that means that now a failure to charge the memcg would have
to then de-allocate things, but that's an uncommon path and would be
marked unlikely and not be in the hot path at all.

Now, the reason I would prefer that is that the *second* step would be to

 (a) expose a "kmem_cache_charge()" function that takes a
*non*-accounted slab allocation, and turns it into an accounted one
(and obviously this is why you want to do everything in the post-alloc
hook: just try to share this code)

 (b) remove the SLAB_ACCOUNT from the filp_cachep, making all file
allocations start out unaccounted.

 (c) when we have *actually* looked up the pathname and opened the file
successfully, at *that* point we'd do a

        error = kmem_cache_charge(filp_cachep, file);

    in do_dentry_open() to turn the unaccounted file pointer into an
accounted one (and if that fails, we do the cleanup and free it, of
course, exactly like we do when file_get_write_access() fails)

which means that now the failure case doesn't unnecessarily charge the
allocation that never ends up being finalized.
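
The ordering in (a)-(c) can be modelled in user space. This is only a
toy sketch: kmem_cache_charge() is the *proposed* interface and does not
exist, so toy_charge() below is a made-up stand-in that just counts
charge operations, and toy_open() is a made-up stand-in for the open
path.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Toy accounting state: number of charge operations performed so far. */
static unsigned long toy_charge_ops;

struct toy_file {
	bool charged;
};

/* Stand-in for the proposed kmem_cache_charge(): account an existing,
 * so far unaccounted object. Always succeeds in this model. */
static int toy_charge(struct toy_file *f)
{
	f->charged = true;
	toy_charge_ops++;
	return 0;
}

/* Models open(): allocate unaccounted, do the "lookup", and charge
 * only once the lookup has succeeded. */
struct toy_file *toy_open(bool path_exists)
{
	struct toy_file *f = calloc(1, sizeof(*f));

	if (!f)
		return NULL;
	if (!path_exists || toy_charge(f) != 0) {
		free(f);	/* failure path: nothing to uncharge */
		return NULL;
	}
	return f;
}

unsigned long toy_charge_count(void)
{
	return toy_charge_ops;
}
```

In this model a failed open never touches the accounting counters at
all, which is the whole point of moving the charge after the lookup.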

NOTE! I think this would clean up mm/slub.c too, simply because it
would get rid of that memcg_slab_pre_alloc_hook() entirely, and get
rid of the need to carry the "struct obj_cgroup **objcgp" pointer
along until the post-alloc hook: everything would be done post-alloc.

The actual kmem_cache_free() code already deals with "this slab hasn't
been accounted" because it obviously has to deal with allocations that
were done without __GFP_ACCOUNT anyway. So there's no change needed on
the freeing path, it already has to handle this all gracefully.

I may be missing something, but it would seem to have very little
downside, and fix a case that actually is visible right now.

              Linus

^ permalink raw reply	[relevance 85%]

* Linux 6.8-rc1
@ 2024-01-21 22:23 76% Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-01-21 22:23 UTC (permalink / raw)
  To: Linux Kernel Mailing List

So this wasn't the most pleasant merge window, but most of the
unpleasantness was entirely unrelated to the code base and almost
entirely related to nasty weather. Just a few technical hiccups. And
after a very big 6.7 release, 6.8 looks to actually be smaller than
average, although not really all that significantly so.

And while maybe a bit smaller than usual (I blame the holidays),
things generally look pretty normal. The bulk is driver updates (GPU
and networking drivers are the big areas as always, but there's a bit
of everything), but we've also got a fair chunk of filesystem updates
(mainly core vfs, bcachefs, xfs and btrfs) and obviously all the usual
arch updates.

The rest is all over: docs, tooling, core kernel, mm and networking.
My mergelog below gives some kind of high-level overview.

Let the testing and calming down begin,

                 Linus

--

Al Viro (6):
    minixfs updates
    rename updates
    dcache updates
    misc filesystem updates
    nfsctl update
    bcachefs locking fix

Alex Williamson (1):
    VFIO updates

Alexander Gordeev (2):
    s390 updates
    more s390 updates

Alexandre Belloni (2):
    i3c updates
    RTC updates

Amir Goldstein (1):
    overlayfs updates

Andreas Gruenbacher (1):
    gfs2 updates

Andrew Morton (3):
    MM updates
    non-MM updates
    misc hotfixes

Aneesh Kumar (1):
    powerpc fixes

Anna Schumaker (1):
    nfs client updates

Ard Biesheuvel (1):
    EFI updates

Arnaldo Carvalho de Melo (1):
    perf tools updates

Arnd Bergmann (5):
    asm-generic cleanups
    SoC DT updates
    SoC driver updates
    ARM SoC code updates
    ARM SoC defconfig updates

Bartosz Golaszewski (2):
    gpio updates
    gpio fixes

Bjorn Andersson (3):
    rpmsg updates
    remoteproc updates
    hwspinlock updates

Bjorn Helgaas (1):
    pci updates

Borislav Petkov (7):
    EDAC updates
    x86 microcode updates
    misc x86 updates
    x86 paravirt updates
    x86 SEV updates
    x86 cpu feature updates
    x86 RAS updates

Chandan Babu (2):
    xfs updates
    xfs fix

Christian Brauner (8):
    misc vfs updates
    vfs super updates
    vfs mount updates
    vfs rw updates
    vfs cachefiles updates
    vfs iov_iter cleanups
    vfs fixes
    netfs updates

Christoph Hellwig (2):
    dma-mapping updates
    dma-mapping fixes

Chuck Lever (1):
    nfsd updates

Corey Minyard (1):
    IPMI updates

Damien Le Moal (1):
    ata updates

Dan Williams (1):
    CXL (Compute Express Link) updates

Daniel Thompson (1):
    kgdb update

Dave Airlie (3):
    drm updates
    drm fixes
    more drm fixes

Dave Hansen (2):
    x86 SGX updates
    x86 TDX updates

David Howells (1):
    afs updates

David Kleikamp (1):
    jfs updates

David Sterba (1):
    btrfs updates

David Teigland (1):
    dlm updates

Dennis Zhou (1):
    percpu updates

Dmitry Torokhov (1):
    input updates

Eric Biggers (2):
    fscrypt updates
    fscrypt fix

Gabriel Krisman Bertazi (1):
    unicode updates

Gao Xiang (2):
    erofs updates
    erofs fixes

Geert Uytterhoeven (1):
    m68k updates

Greg KH (5):
    char/misc and other driver updates
    driver core updates
    staging driver updates
    tty / serial updates
    USB / Thunderbolt updates

Guenter Roeck (2):
    hwmon updates
    hwmon fix

Hans de Goede (1):
    x86 platform driver updates

Helge Deller (3):
    fbdev updates
    parisc updates
    fbdev fix

Herbert Xu (1):
    crypto updates

Huacai Chen (1):
    LoongArch updates

Ilya Dryomov (1):
    ceph updates

Ingo Molar (1):
    locking updates

Ingo Molnar (16):
    x86 apic updates
    x86 asm updates
    x86 boot updates
    x86 build updates
    x86 cleanups
    x86 core updates
    x86 entry updates
    objtool fixlet
    debugobject update
    generic syscall updates
    CPU hotplug updates
    timer subsystem updates
    irq subsystem updates
    performance events updates
    scheduler updates
    scheduler fix

Ira Weiny (1):
    libnvdimm updates

Jaegeuk Kim (1):
    f2fs update

Jakub Kicinski (1):
    networking fixes

James Bottomley (2):
    SCSI updates
    SCSI updates

Jan Kara (2):
    small quota cleanup
    fsnotify updates

Jarkko Sakkinen (1):
    tpm updates

Jason Gunthorpe (2):
    rdma updates
    iommufd updates

Jassi Brar (1):
    mailbox updates

Jens Axboe (4):
    block updates
    io_uring updates
    io_uring fixes
    block fixes

Jiri Kosina (1):
    HID updates

Joerg Roedel (1):
    iommu updates

Johan Hovold (1):
    GNSS updates

John Johansen (1):
    AppArmor updates

John Paul Adrian Glaubitz (1):
    sh updates

Jonathan Corbet (2):
    documentation update
    documentation fixes

Juergen Gross (1):
    xen updates

Julia Lawall (1):
    coccinelle updates

Kees Cook (3):
    pstore updates
    hardening updates
    strlcpy removal

Kent Overstreet (4):
    bcachefs updates
    header cleanups
    header fix
    more bcachefs updates

Lee Jones (3):
    mfd updates
    LED updates
    backlight updates

Linus Walleij (1):
    pin control updates

Luis Chamberlain (2):
    sysctl updates
    module updates

Mark Brown (4):
    regmap updates
    regulator updates
    spi updates
    spi fix

Masahiro Yamada (1):
    Kbuild updates

Masami Hiramatsu (1):
    probes update

Mauro Carvalho Chehab (1):
    media updates

Max Filippov (1):
    Xtensa updates

Michael Ellerman (1):
    powerpc updates

Michael Tsirkin (1):
    virtio updates

Michal Simek (1):
    microblaze updates

Mickaël Salaün (1):
    Landlock updates

Miguel Ojeda (3):
    Rust updates
    clang-format updates
    auxdisplay update

Mike Rapoport (1):
    memblock update

Mimi Zohar (1):
    integrity updates

Miquel Raynal (1):
    mtd updates

Namjae Jeon (1):
    exfat updates

Neeraj Upadhyay (1):
    RCU updates

Palmer Dabbelt (2):
    RISC-V updates
    more RISC-V updates

Paolo Abeni (1):
    networking updates

Paolo Bonzini (1):
    kvm updates

Paul Moore (3):
    audit updates
    selinux updates
    security module updates

Rafael Wysocki (6):
    ACPI updates
    thermal control updates
    power management updates
    more power management updates
    more ACPI updates
    more thermal control updates

Richard Weinberger (2):
    UBI and UBIFS updates
    UML updates

Rob Herring (2):
    devicetree updates
    devicetree header detangling

Russell King (1):
    ARM updates

Sebastian Reichel (2):
    HSI update
    power supply and reset updates

Shuah Khan (3):
    nolibc updates
    KUnit updates
    kselftest update

Stephen Boyd (1):
    clk updates

Steve French (4):
    smb client fixes
    smb server updates
    more smb server updates
    smb client updates

Steven Rostedt (2):
    tracing updates
    eventfs updates

Takashi Iwai (2):
    sound updates
    sound fixes

Takashi Sakamoto (1):
    firewire updates

Ted Ts'o (1):
    ext4 updates

Tejun Heo (1):
    cgroup updates

Thierry Reding (1):
    pwm updates

Thomas Bogendoerfer (1):
    MIPS updates

Thomas Gleixner (1):
    timer updates

Tzung-Bi Shih (2):
    chrome platform updates
    chrome platform firmware updates

Ulf Hansson (2):
    pmdomain updates
    MMC updates

Uwe Kleine-König (1):
    pwm fixes

Vinod Koul (3):
    soundwire updates
    phy updates
    dmaengine updates

Vlastimil Babka (1):
    slab updates

Will Deacon (2):
    arm64 updates
    arm64 fixes

Wim Van Sebroeck (1):
    watchdog updates

Wolfram Sang (1):
    i2c updates

^ permalink raw reply	[relevance 76%]

* Re: [GIT PULL] More bcachefs updates for 6.8-rc1
  @ 2024-01-21 22:05 99% ` Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-01-21 22:05 UTC (permalink / raw)
  To: Kent Overstreet; +Cc: linux-bcachefs, linux-fsdevel, linux-kernel

On Sun, 21 Jan 2024 at 13:35, Kent Overstreet <kent.overstreet@linux.dev> wrote:
>
> Hi Linus, another small bcachefs pull. Some fixes, some refactoring,
> some minor features.

I'm taking this, but only because bcachefs is new.

You need to be aware that the merge window is for *merging*. Not for
new development.

And almost all of the code here is new development.

What you send during the merge window is stuff that should all have
been ready *before* the merge window opened, not whatever random
changes you made during it.

Now, fixes happen any time, but for that argument to work they need to
be real fixes. Not "reorganize the code to make things easier to fix"
with the fix being something small on top of a big change.

                Linus

^ permalink raw reply	[relevance 99%]

* Re: [GIT PULL] bitmap patches for v6.8
  @ 2024-01-21 21:47 91% ` Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-01-21 21:47 UTC (permalink / raw)
  To: Yury Norov
  Cc: Linux Kernel Mailing List, Alexandra Winter, Andy Shevchenko,
	Bart Van Assche, Bjorn Helgaas, Chengming Zhou, Dave Jiang,
	Edward Cree, Fenghua Yu, Geert Uytterhoeven, Greg Ungerer,
	Guanjun, Hans Verkuil, Jan Kara, Jens Axboe,
	John Paul Adrian Glaubitz, Mathieu Desnoyers, Michael Kelley,
	Oliver Neukum, Peter Zijlstra, Rasmus Villemoes,
	Sean Christopherson, Takashi Iwai, Tony Lu, Vinod Koul,
	Vitaly Kuznetsov, Wei Liu, Wen Gu, Will Deacon

So I've left this to be my last pull request, because I hate how our
header files are growing, and this part:

 include/linux/find.h | 301 ++++++++++++++++++++++++++++++-
 1 file changed, 297 insertions(+), 4 deletions(-)

in particular.

Nobody includes <linux/find.h> directly, but indirectly pretty much
*every* single kernel C file includes it.

Looking at some basic stats of my dependency files in my tree, 4426 of
4526 object files (~98%) depend on find.h because they get it through
*some* path that ends up with bitmap.h -> find.h.

And honestly, the number of files that actually want the new functions
is basically just a tiny handful. It's also not obvious how useful
those optimizations are, considering that a lot of the loops are
*tiny*. I looked at a few cases, and the size of the bitmap it was
iterating over was often in the 2-4 range, sometimes (like
RTW89_TXCH_NUM) 13, etc.

In radio-shark, you replaced a loop like this

        for (i = 0; i < 2; i++) {

with that for_each_test_and_clear_bit(), and it *really* isn't clear
that it was worth it. It sure wasn't performance-critical to begin
with.
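
To make the comparison concrete, here is a user-space sketch of the two
styles on a single word. Both helpers are hypothetical and non-atomic
(the kernel macro is built on atomic bit operations); the second one
relies on the GCC/Clang builtin __builtin_ctzl().

```c
#include <limits.h>

/* Plain loop: test each of the first nbits bits, clear the ones that
 * are set, and return how many were set. Assumes nbits does not exceed
 * the number of bits in an unsigned long. */
unsigned int drain_bits_loop(unsigned long *word, unsigned int nbits)
{
	unsigned int i, n = 0;

	for (i = 0; i < nbits; i++) {
		unsigned long mask = 1UL << i;

		if (*word & mask) {
			*word &= ~mask;
			n++;
		}
	}
	return n;
}

/* Find-first-set style: jump straight to each set bit. */
unsigned int drain_bits_ffs(unsigned long *word, unsigned int nbits)
{
	const unsigned int bits_per_long = sizeof(unsigned long) * CHAR_BIT;
	unsigned long live = *word &
		(nbits >= bits_per_long ? ~0UL : (1UL << nbits) - 1);
	unsigned int n = 0;

	while (live) {
		unsigned int bit = (unsigned int)__builtin_ctzl(live);

		live &= live - 1;		/* drop the lowest set bit */
		*word &= ~(1UL << bit);
		n++;
	}
	return n;
}
```

For a two-bit bitmap both versions do at most two iterations, so any
difference would have to be shown by measurement, not assumed.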

In general, if an "optimization" doesn't have any performance numbers
attached to it, is it an optimization at all?

So I finally ended up pulling this, but after looking at the patch I
went "this is adding more lines than it removes, has no performance
numbers, and grows a core header file that is included by absolutely
everything by a third".

.. and then I decided to just unpull it again.

                  Linus

^ permalink raw reply	[relevance 91%]

* Re: [PATCH] media: solo6x10: replace max(a, min(b, c)) by clamp(b, a, c)
  @ 2024-01-21 19:57 99%   ` Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-01-21 19:57 UTC (permalink / raw)
  To: Hans Verkuil
  Cc: Aurelien Jarno, linux-kernel, Bluecherry Maintainers,
	Anton Sviridenko, Andrey Utkin, Ismael Luceno,
	Mauro Carvalho Chehab, open list:SOFTLOGIC 6x10 MPEG CODEC,
	Andy Shevchenko, Andrew Morton,
	Matthew Wilcox, Christoph Hellwig,
	Jason A . Donenfeld, Jiri Slaby, stable, David Laight

On Sun, 14 Jan 2024 at 03:04, Hans Verkuil <hverkuil@xs4all.nl> wrote:
>
> I'll pick this up as a fix for v6.8.
>
> Linus, if you prefer to pick this up directly, then that's fine as well.

Bah, missed this email, and so a belated note that I picked the patch
up as commit 31e97d7c9ae3.

It even got your Reviewed-by thanks to b4 picking that up automatically.

               Linus

^ permalink raw reply	[relevance 99%]

* Re: [GIT PULL] final round of SCSI updates for the 6.7+ merge window
  @ 2024-01-21 18:48 96%         ` Linus Torvalds
    0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-01-21 18:48 UTC (permalink / raw)
  To: Theodore Ts'o, Alexander Gordeev
  Cc: G, James Bottomley, Andrew Morton, linux-scsi, linux-kernel

On Sat, 20 Jan 2024 at 22:30, Theodore Ts'o <tytso@mit.edu> wrote:
>
> Linus, you haven't been complaining about my key, which hopefully
> means that I'm not causing you headaches

Well, honestly, while I pointed out that if everybody was expiring
keys, I'd have this headache once or twice a week, the reality is that
pretty much nobody is. There's James, you, and a handful of others.

So in practice, I hit this every couple of months, not weekly. And if
I can pick up updates from the usual sources, it's all fine. James'
setup just doesn't match anybody else's, so it's grating.

I do end up having a fair number of signatures that show up as expired
for me in the tree. Some may well be because it's literally an old key
that has been left behind - it may have been fine at the time, but now
it shows as expired. It is what it is, and I'm not going to worry
about it.

But every time I do a pull, and the key doesn't verify, my git hook
gives me a warning, and so those things are a somewhat regular
annoyance just because then I have to go and check.

And I just checked: with James' key now fixed, it's currently just
Alexander Gordeev that shows up as recently expired with me not
knowing where to get an update.

That key expired two days ago - I'm pretty sure it was fine last pull.

Alexander?

              Linus

^ permalink raw reply	[relevance 96%]

* Re: [GIT PULL] execve updates for v6.8-rc1
  2024-01-11 17:42 99%                 ` Linus Torvalds
@ 2024-01-20 22:18 94%                   ` Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-01-20 22:18 UTC (permalink / raw)
  To: Al Viro
  Cc: Josh Triplett, Kees Cook, Kees Cook, linux-kernel, Alexey Dobriyan

On Thu, 11 Jan 2024 at 09:42, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> It's the "don't allocate filp until you actually need it" that looks
> nasty. And yes, atomic_open() is part of the problem, but so is the
> fact that we end up saving some flags in the filp early.

So just an update on this, since I came back to it.

It turns out that the bulk of the cost of the 'struct file *'
allocation is actually the exact same thing that was discussed the
other day about file locking: it's the fact that the file allocation
is done with SLAB_ACCOUNT. See

    https://lore.kernel.org/all/CAHk-=wg_CoTOfkREgaQQA6oJ5nM9ZKYrTn=E1r-JnvmQcgWpSg@mail.gmail.com/

and that thread on the recent file locking accounting discussion.

Now, the allocation itself isn't free either, but the SLAB_ACCOUNT
really does make it noticeably more expensive than it should be.

It's a bit more spread out: the cost of the slab allocation itself is
mainly the (optimized) path that does a cmpxchg and the memset, but
the SLAB_ACCOUNT cost is spread out (in mod_objcg_state,
__memcg_slab_post_alloc_hook, obj_cgroup_charge,
__memcg_slab_free_hook).

And that actually opens the door up for a _somewhat_ simple partial
workaround: instead of using SLAB_ACCOUNT, we could just do the memcg
accounting when we set FMODE_OPEN instead, and de-account it when we
free the filp (which checks FMODE_OPEN since other cleanup is
dependent on that anyway).

That would help not just the execve() case, but the whole "open
non-existent file" case too.

And I suspect "open()" with ENOENT is actually way more common than
execve() is. All those open->fstat paths for various perfectly normal
loads.

Anyway, I didn't actually write that patch, but I did want to mention
it as a smaller-scale fix (because getting rid of the 'struct file'
allocation entirely does look somewhat painful).

End result: I committed my "move do_open_execat() to the beginning of
execve()" patch, since it's clearly an improvement on the existing
behavior, and that whole "struct file allocations are unnecessarily
expensive" issue is a separate thing.

              Linus

^ permalink raw reply	[relevance 94%]

* Re: [PATCH next v4 0/5] minmax: Relax type checks in min() and max().
  @ 2024-01-20 21:33 99%               ` Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-01-20 21:33 UTC (permalink / raw)
  To: David Laight
  Cc: Stephen Rothwell, Jiri Slaby, linux-kernel, Andy Shevchenko,
	Andrew Morton, Matthew Wilcox (Oracle),
	Christoph Hellwig, Jason A. Donenfeld

[ Going through some pending issues now that I've mostly emptied my pull queue ]

On Wed, 10 Jan 2024 at 14:58, David Laight <David.Laight@aculab.com> wrote:
>
> The first check in __types_ok() can go, the second one (with the '+ 0')
> (added to promote char to int) includes the first one.

That turns out to not be true. An expression like

  min(u8, unsigned int)

is fine because the underlying types are compatible.

But the promotion to 'int' makes the first argument be a signed
integer, and is no longer compatible with the second argument.
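
The promotion effect is easy to see in isolation. The macro below is a
simplified signedness probe written for this sketch (the kernel has its
own helper for this in its headers); it uses the GCC/Clang __typeof__
extension. The helper function names are made up.

```c
/* Simplified probe: does the type of expression x hold negative values?
 * (-1 converted to the type compares below 1 only for signed types.) */
#define EXPR_IS_SIGNED(x) \
	(((__typeof__(x))-1) < ((__typeof__(x))1))

/* An 8-bit unsigned value on its own: unsigned. */
int u8_is_signed(void)
{
	unsigned char v = 200;

	return EXPR_IS_SIGNED(v);
}

/* The same value after "+ 0": integer promotion turns it into 'int'. */
int u8_plus_zero_is_signed(void)
{
	unsigned char v = 200;

	return EXPR_IS_SIGNED(v + 0);
}

/* An unsigned int stays unsigned with or without "+ 0". */
int uint_is_signed(void)
{
	unsigned int v = 200;

	return EXPR_IS_SIGNED(v + 0);
}
```

So a check on the unpromoted types sees two unsigned operands, while a
check on the promoted types sees 'int' against 'unsigned int'.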

             Linus

^ permalink raw reply	[relevance 99%]

* Re: [GIT PULL] final round of SCSI updates for the 6.7+ merge window
  @ 2024-01-20 19:35 99%     ` Linus Torvalds
    0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-01-20 19:35 UTC (permalink / raw)
  To: James Bottomley; +Cc: Andrew Morton, linux-scsi, linux-kernel

On Sat, 20 Jan 2024 at 11:09, James Bottomley
<James.Bottomley@hansenpartnership.com> wrote:
>
> It also seems that this magic option combination works better (just
> tried it on an old laptop that had my expired keys)
>
> gpg --auto-key-locate clear,dane --locate-external-key james.bottomley@hansenpartnership.com

So now I have a new subkey.

However, I note that you really do not seem to have gotten the message:

  sub   nistp256 2018-01-23 [S] [expires: 2026-01-16]
        E76040DB76CA3D176708F9AAE742C94CEE98AC85

WTF? What happened to "stop doing these idiotic short expirations"?

What's the advantage of all this stupid and pointless pain? Why didn't
you extend it by AT LEAST five years?

Has the expiration date *EVER* had a single good reason for it?

From a quick git lookup, in the last year I have pulled from 160
people. Imagine if they all set two-year expiration dates. Do the
math: I'd see pointlessly expired keys probably on average once or
twice a week.

Guess why I don't? BECAUSE NOBODY ELSE DOES THAT POINTLESS EXPIRY DANCE.

Why do you insist on being the problem?

Stop it. Really. I'm tired of the pointless extra work. PGP keys are a
disaster, and you keep on making things worse than they need to be.

               Linus

^ permalink raw reply	[relevance 99%]

* Re: [GIT PULL] final round of SCSI updates for the 6.7+ merge window
  @ 2024-01-20 17:52 97% ` Linus Torvalds
    0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-01-20 17:52 UTC (permalink / raw)
  To: James Bottomley; +Cc: Andrew Morton, linux-scsi, linux-kernel

On Sat, 20 Jan 2024 at 07:26, James Bottomley
<James.Bottomley@hansenpartnership.com> wrote:
>
> As requested, I did a longer extension of my gpg keys, so my key needs
> refreshing, before you pull, to fix the expiry date.  You can get my
> updates via DANE using:
>
> gpg --auto-key-locate dane --recv D5606E73C8B46271BEAD9ADF814AE47C214854D6

No I can't.

I get

  $ gpg --auto-key-locate dane --recv D5606E73C8B46271BEAD9ADF814AE47C214854D6
  gpg: key 814AE47C214854D6: "James Bottomley
<James.Bottomley@HansenPartnership.com>" not changed
  gpg: Total number processed: 1
  gpg:              unchanged: 1

Fine - maybe I already had the update from the last time...

But no:

  git log --show-signature

says

  commit c25b24fa72c734f8cd6c31a13548013263b26286 (HEAD -> master)
  merged tag 'scsi-misc'
  gpg: Signature made Sat 20 Jan 2024 07:22:08 PST
  gpg:                using ECDSA key E76040DB76CA3D176708F9AAE742C94CEE98AC85
  gpg:                issuer "james.bottomley@hansenpartnership.com"
  gpg: Good signature from "James Bottomley
<James.Bottomley@HansenPartnership.com>" [full]
  gpg:                 aka "James Bottomley <jejb@kernel.org>" [full]
  gpg:                 aka "[jpeg image of size 5254]" [full]
  gpg:                 aka "James Bottomley <jejb@linux.vnet.ibm.com>" [unknown]
  gpg:                 aka "James Bottomley <jejb@linux.ibm.com>" [unknown]
  gpg: Note: This key has expired!
  Primary key fingerprint: D560 6E73 C8B4 6271 BEAD  9ADF 814A E47C 2148 54D6
       Subkey fingerprint: E760 40DB 76CA 3D17 6708  F9AA E742 C94C EE98 AC85

and fighting that ^&%%$^ gpg command to try to figure out why, I still see

  gpg: Note: signature key E742C94CEE98AC85 expired 2024-01-16 11:39:15
  sub   nistp256 2018-01-23 [S] [expired: 2024-01-16]
        E760 40DB 76CA 3D17 6708  F9AA E742 C94C EE98 AC85

and that's the one you signed with.

This mess continues to happen only with your crazy setup.

                        Linus

^ permalink raw reply	[relevance 97%]

* Re: [PATCH 1/2] x86: Remove dynamic NOP selection
  @ 2024-01-20 17:00 94%       ` Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-01-20 17:00 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Thorsten Glaser, Peter Zijlstra, x86, rostedt, linux-kernel,
	linux-toolchains, jpoimboe, alexei.starovoitov, mhiramat

On Sat, 20 Jan 2024 at 00:28, H. Peter Anvin <hpa@zytor.com> wrote:
>
> %eiz was something that binutils used to put in when disassembling certain redundant encodings with SIB at some point.

Yeah, it's purely (bad) syntactic sugar for "no register". Somebody
decided that the fact that so many RISC architectures have a "zero
register" means that they should make x86 look like it has a "zero
register" too.

I assume it regularized some very silly decoding issue, but it was horrible.

It's not the worst thing I've ever seen - in objdump output, and it's
easy to just remove with a sed script or a simple search-and-replace
in the editor.  Unlike some of the other "design" choices of objdump.

On that note, does anybody have a better disassembler than objdump? Or
even a script around it to make it more useful? I do use "objdump
--disassemble" a fair amount, and I hate how bad it is.

My pet peeve is the crazy relocation handling (or lack thereof). IOW,
if I do something like

    objdump --disassemble \
        --no-show-raw-insn \
        --no-addresses \
        kernel/exit.o

I get output like this:

        call   <delayed_put_task_struct+0x1a>

which is garbage: it's not calling delayed_put_task_struct+0x1a at all,
that's just "the offset bytes are all zero because the data is in the
relocation".

And if I add "-r" to get relocation info, I get

        call   <delayed_put_task_struct+0x1a>
                        R_X86_64_PLT32  rethook_flush_task-0x4

which shows the raw relocation data, but with truly mind-bogglingly
horrendous syntax.

Is there some sane tool that just does the sane thing and shows this as

        call   rethook_flush_task

which is what the thing actually means?
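
[Illustration, not part of the original mail.] A minimal sketch of the kind of wrapper script asked about here, assuming the "objdump --disassemble -r --no-show-raw-insn --no-addresses" output looks like the two-line sample quoted above. The function name fold_relocations is hypothetical, and only the simple case is handled: a call or jmp followed by an R_X86_64_PLT32 or R_X86_64_PC32 relocation with a -0x4 addend, which for a 4-byte relative branch means "the symbol itself". Anything else is passed through unchanged.

```python
import re

# A relocation line as printed by "objdump -r", e.g.
#     R_X86_64_PLT32  rethook_flush_task-0x4
# An optional "offset:" prefix is tolerated in case --no-addresses is not used.
RELOC = re.compile(
    r"^\s*(?:[0-9a-f]+:\s+)?R_X86_64_(?:PLT32|PC32)\s+(\S+)-0x4\s*$"
)

# A branch whose printed target is only a placeholder, e.g.
#     call   <delayed_put_task_struct+0x1a>
BRANCH = re.compile(r"^(\s*(?:call|jmp)\s+)<[^>]+>\s*$")


def fold_relocations(lines):
    """Replace 'call <placeholder>' + relocation line with 'call symbol'."""
    out = []
    for raw in lines:
        line = raw.rstrip("\n")
        reloc = RELOC.match(line)
        if reloc and out:
            branch = BRANCH.match(out[-1])
            if branch:
                # Rewrite the previous branch line and drop the relocation line.
                out[-1] = branch.group(1) + reloc.group(1)
                continue
        out.append(line)
    return out
```

Fed the two sample lines above, this yields the single line "        call   rethook_flush_task". Relocations against data, section-relative targets, and other addends are deliberately left alone rather than guessed at.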

And no, the llvm-objdump thing isn't any better. It isn't compatible
with the GNU binutils objdump, but it does the same insanely bad
decoding.

            Linus

^ permalink raw reply	[relevance 94%]

* Re: [RFC PATCH v3 11/11] mseal:add documentation
  @ 2024-01-20 16:40 99%                 ` Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-01-20 16:40 UTC (permalink / raw)
  To: Theo de Raadt
  Cc: Jeff Xu, Stephen Röttger, Jeff Xu, akpm, keescook, jannh,
	willy, gregkh, jorgelo, groeck, linux-kernel, linux-kselftest,
	linux-mm, pedro.falcato, dave.hansen, linux-hardening

On Sat, 20 Jan 2024 at 07:23, Theo de Raadt <deraadt@openbsd.org> wrote:
>
> There is one large difference remaining between mimmutable() and mseal(),
> which is how other system calls behave.
>
> We return EPERM for failures in all the system calls that fail upon
> immutable memory (since Oct 2022).
>
> You are returning EACCES.
>
> Before it is too late, do you want to reconsider that return value, or
> do you have a justification for the choice?

I don't think there's any real reason for the difference.

Jeff - mind changing the EACCES to EPERM, and we'll have something
that is more-or-less compatible between Linux and OpenBSD?

             Linus

^ permalink raw reply	[relevance 99%]

* Re: [GIT PULL] strlcpy removal for v6.8-rc1
  @ 2024-01-19 23:59 99%     ` Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-01-19 23:59 UTC (permalink / raw)
  To: Kees Cook
  Cc: linux-kernel, Andrew Morton, Andy Shevchenko, Andy Whitcroft,
	Azeem Shaikh, Brian Foster, Dwaipayan Ray, Joe Perches,
	Kent Overstreet, linux-bcachefs, linux-hardening, Lukas Bulwahn

On Fri, 19 Jan 2024 at 14:53, Kees Cook <keescook@chromium.org> wrote:
>
> Sorry, I should have called that out in the PR, but the commit itself
> had my rationale for intentionally leaving those in:
>
>     Leave mentions in Documentation (about its deprecation), and in
>     checkpatch.pl (to help migrate host-only tools/ usage).

Hmm. Yeah, I guess the host tooling is an issue, although there
strlcpy makes a lot more sense since I think it exists in various user
space libraries (while strscpy() is kernel-only).

> If you feel like that's not right, I can either respin or send a
> follow-up patch?

Oh, I already took the pull request, I was just reacting to leftovers.
This is not a big deal.

              Linus

^ permalink raw reply	[relevance 99%]

* Re: [GIT PULL] strlcpy removal for v6.8-rc1
  @ 2024-01-19 22:00 99% ` Linus Torvalds
    0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-01-19 22:00 UTC (permalink / raw)
  To: Kees Cook
  Cc: linux-kernel, Andrew Morton, Andy Shevchenko, Andy Whitcroft,
	Azeem Shaikh, Brian Foster, Dwaipayan Ray, Joe Perches,
	Kent Overstreet, linux-bcachefs, linux-hardening, Lukas Bulwahn

On Fri, 19 Jan 2024 at 13:14, Kees Cook <keescook@chromium.org> wrote:
>
> The kernel is now free of the strlcpy() API!

.. still mentioned in docs and checkpatch. Maybe remove that too?

              Linus

^ permalink raw reply	[relevance 99%]

* Re: [GIT PULL] dma-mapping fixes for Linux 6.8
  @ 2024-01-19  0:52 99% ` Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-01-19  0:52 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-kernel, iommu

On Wed, 17 Jan 2024 at 23:30, Christoph Hellwig <hch@infradead.org> wrote:
>
> are available in the Git repository at:
>
>   git.infradead.org:public_git/dma-mapping.git tags/dma-mapping-6.8-2024-01-18

Yeah, that doesn't work at all.  Please fix your scripts to use the
proper public facing side like

   git://git.infradead.org/users/hch/dma-mapping tags/dma-mapping-6.8-2024-01-18

instead of the ssh address you use to upload there.

I've pulled from the proper place, but please don't make me do that
for all your pulls.

In fact, this is the first time this happened, so you must have
changed some workflow for the worse..

             Linus

^ permalink raw reply	[relevance 99%]

* Re: [GIT PULL] power-supply changes for 6.8
  @ 2024-01-18  0:11 99%   ` Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-01-18  0:11 UTC (permalink / raw)
  To: Nathan Chancellor; +Cc: Sebastian Reichel, linux-kernel, linux-pm

On Wed, 17 Jan 2024 at 10:00, Nathan Chancellor <nathan@kernel.org> wrote:
>
> This is missing a fix for building with older compilers:

Dropped from my queue, will wait for a fixed pull request. Thanks for noticing,

               Linus

^ permalink raw reply	[relevance 99%]

* Re: [PULL REQUEST] i2c-for-6.8-rc1-fixed
  @ 2024-01-18  0:02 99% ` Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-01-18  0:02 UTC (permalink / raw)
  To: Wolfram Sang, Linus Torvalds, linux-i2c, linux-kernel,
	Peter Rosin, Bartosz Golaszewski, Andi Shyti, Kim Phillips

On Wed, 17 Jan 2024 at 13:30, Wolfram Sang <wsa@kernel.org> wrote:
>
>  And a big series for the
> designware-driver needed to be reverted because issues have been
> reported late in the cycle and no incremental fix has been found yet.
> This is the fixed pull request with a missing revert added.

Honestly, with three quarters of the commits being the broken series,
followed by reverting it, I get the feeling that this would be better
rebased.

I don't like rebasing, but I also don't like "look, we had most of
these commits broken, so we just reverted them all" all noticed before
it even hits my tree.

So I really feel like at that point you go "this branch was a failure"
and start anew - aka rebase. Along with a big explanation of why a
recent rebase ended up happening, so that there is no confusion about
it.

              Linus

^ permalink raw reply	[relevance 99%]

* Re: [GIT PULL] Backlight for v6.8
  @ 2024-01-17 23:38 99% ` Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-01-17 23:38 UTC (permalink / raw)
  To: Lee Jones; +Cc: Linux Kernel Mailing List, Daniel Thompson

On Tue, 16 Jan 2024 at 08:42, Lee Jones <lee@kernel.org> wrote:
>
>   git://git.kernel.org/pub/scm/linux/kernel/git/lee/backlight.git backlight-next-6.8

-ENOSUCHTAG.

Did you forget to push out?

                  Linus

^ permalink raw reply	[relevance 99%]

* Re: [PATCH RFC 1/4] fs/locks: Fix file lock cache accounting, again
  @ 2024-01-17 20:20 92%       ` Linus Torvalds
    0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-01-17 20:20 UTC (permalink / raw)
  To: Josh Poimboeuf, Vlastimil Babka
  Cc: Jeff Layton, Chuck Lever, Shakeel Butt, Roman Gushchin,
	Johannes Weiner, Michal Hocko, linux-kernel, Jens Axboe,
	Tejun Heo, Vasily Averin, Michal Koutny, Waiman Long,
	Muchun Song, Jiri Kosina, cgroups, linux-mm

On Wed, 17 Jan 2024 at 11:39, Josh Poimboeuf <jpoimboe@kernel.org> wrote:
>
> That's a good point.  If the microbenchmark isn't likely to be even
> remotely realistic, maybe we should just revert the revert until if/when
> somebody shows a real world impact.
>
> Linus, any objections to that?

We use SLAB_ACCOUNT for much more common allocations like queued
signals, so I would tend to agree with Jeff that it's probably just
some not very interesting microbenchmark that shows any file locking
effects from SLAB_ACCOUNT, not any real use.

That said, those benchmarks do matter. It's very easy to say "not
relevant in the big picture" and then the end result is that
everything is a bit of a pig.

And the regression was absolutely *ENORMOUS*. We're not talking "a few
percent". We're talking a 33% regression that caused the revert:

   https://lore.kernel.org/lkml/20210907150757.GE17617@xsang-OptiPlex-9020/

I wish our SLAB_ACCOUNT wasn't such a pig. Rather than account every
single allocation, it would be much nicer to account at a bigger
granularity, possibly by having per-thread counters first before
falling back to the obj_cgroup_charge. Whatever.
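
[Illustration, not part of the original mail.] A toy model of the "account at a bigger granularity" idea, assuming a per-thread counter that only touches the shared counter once its local drift reaches a batch size. This is not kernel code and says nothing about how memcg is actually implemented; SharedCounter, BatchedCharger and the batch size of 64 are all hypothetical names and values chosen for the sketch.

```python
class SharedCounter:
    """Stands in for the expensive, shared charge (the slow path)."""

    def __init__(self):
        self.charged = 0   # net objects charged so far
        self.updates = 0   # how many times the slow path was taken


class BatchedCharger:
    """Per-thread front end: accumulate locally, flush in batches."""

    def __init__(self, shared, batch=64):
        self.shared = shared
        self.batch = batch
        self.local = 0     # net charges not yet pushed to the shared counter

    def charge(self, n=1):
        self.local += n
        self._maybe_flush()

    def uncharge(self, n=1):
        self.local -= n
        self._maybe_flush()

    def _maybe_flush(self):
        # Only take the slow path once the local drift is large enough.
        if abs(self.local) >= self.batch:
            self.flush()

    def flush(self):
        if self.local:
            self.shared.charged += self.local
            self.shared.updates += 1
            self.local = 0
```

In this model a loop that charges and immediately uncharges one object, which is roughly the pattern of the microbenchmark discussed here, never reaches the shared counter at all. The price is that the shared total can lag behind the true value by up to the batch size per thread.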

It's kind of stupid to have a benchmark that just allocates and
deallocates a file lock in quick succession spend lots of time
incrementing and decrementing cgroup charges for that repeated
alloc/free.

However, that problem with SLAB_ACCOUNT is not the fault of file
locking, but more of a slab issue.

End result: I think we should bring in Vlastimil and whoever else is
doing SLAB_ACCOUNT things, and have them look at that side.

And then just enable SLAB_ACCOUNT for file locks. But very much look
at silly costs in SLAB_ACCOUNT first, at least for trivial
"alloc/free" patterns..

Vlastimil? Who would be the best person to look at that SLAB_ACCOUNT
thing? See commit 3754707bcc3e (Revert "memcg: enable accounting for
file lock caches") for the history here.

                 Linus

^ permalink raw reply	[relevance 92%]

* Heads up - effectively offline for now
@ 2024-01-13 21:31 99% Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-01-13 21:31 UTC (permalink / raw)
  To: Greg Kroah-Hartman, Andrew Morton, Christian Brauner, Al Viro,
	Thomas Gleixner
  Cc: Linux Kernel Mailing List

Just a note to say that the merge window is paused as far as I'm
concerned, because we've lost power and internet thanks to a winter
storm. Of course, this is Oregon, so "storm" here is what some people
would probably consider "somewhat windy", and "winter" here means that
the temperature is approaching -10°C.

There's apparently about 100k people without power, and I doubt our
neighborhood is the priority, so I expect to be without power for some
time still. I hope I'm wrong, but a few years ago it took more than a
week to restore power due to all the downed trees. It's hopefully
nowhere near that, but..

And before anybody says "just go to a Starbucks and work from there",
the scariest thing out there - apart from possibly downed trees and
power lines - is other drivers.  I'll stay put.

             Linus

^ permalink raw reply	[relevance 99%]

* Re: [GIT PULL] Scheduler changes for v6.8
  2024-01-13  1:24 99%                               ` Linus Torvalds
@ 2024-01-13  1:31 99%                                 ` Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-01-13  1:31 UTC (permalink / raw)
  To: Qais Yousef
  Cc: Vincent Guittot, Dietmar Eggemann, Ingo Molnar, linux-kernel,
	Peter Zijlstra, Thomas Gleixner, Juri Lelli, Steven Rostedt,
	Ben Segall, Mel Gorman, Daniel Bristot de Oliveira,
	Valentin Schneider

On Fri, 12 Jan 2024 at 17:24, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> With a *working* kernel, I get events, setting the frequency to either
> 2.2GHz (idle) or 3.8GHz (work).

Just to fix that - not 3.8GHz, but in addition to 2.2 I see 2.8 or 3.7:

  ...
  <idle>-0       [034] d.s..   208.340412: cpu_frequency: state=2200000 cpu_id=34
     cc1-101686  [034] d.h..   208.342402: cpu_frequency: state=2800000 cpu_id=34
     cc1-101686  [034] d.h..   208.343401: cpu_frequency: state=3700000 cpu_id=34
      sh-108794  [029] d.h..   216.401014: cpu_frequency: state=2200000 cpu_id=29
      sh-108794  [029] d....   216.402670: cpu_frequency: state=2800000 cpu_id=29
genksyms-108565  [029] d.h..   216.404005: cpu_frequency: state=3700000 cpu_id=29
  ...

etc.
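
[Illustration, not part of the original mail.] A small sketch for summarizing a trace like the one above, assuming the text comes from /sys/kernel/tracing/trace with the power/cpu_frequency event enabled and that "state" is the target frequency in kHz. The helper names are hypothetical. The pattern tolerates a line break between "cpu_frequency:" and "state=", since mail clients tend to wrap these lines.

```python
import re

# task-pid [cpu] flags timestamp: cpu_frequency: state=<kHz> cpu_id=<n>
EVENT = re.compile(
    r"\[(\d+)\]\s+\S+\s+(\d+\.\d+):\s+cpu_frequency:\s+"
    r"state=(\d+)\s+cpu_id=(\d+)"
)


def parse_cpu_frequency(trace_text):
    """Return a list of (timestamp_seconds, cpu_id, frequency_khz) tuples."""
    events = []
    for m in EVENT.finditer(trace_text):
        events.append((float(m.group(2)), int(m.group(4)), int(m.group(3))))
    return events


def frequencies_seen(events):
    """Return the distinct frequencies (kHz) that were requested, sorted."""
    return sorted({khz for _, _, khz in events})
```

On the excerpt above this reports three distinct states (2200000, 2800000 and 3700000 kHz), while a trace with no cpu_frequency events at all gives an empty list.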

              Linus

^ permalink raw reply	[relevance 99%]

* Re: [GIT PULL] Scheduler changes for v6.8
  @ 2024-01-13  1:24 99%                               ` Linus Torvalds
  2024-01-13  1:31 99%                                 ` Linus Torvalds
  0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-01-13  1:24 UTC (permalink / raw)
  To: Qais Yousef
  Cc: Vincent Guittot, Dietmar Eggemann, Ingo Molnar, linux-kernel,
	Peter Zijlstra, Thomas Gleixner, Juri Lelli, Steven Rostedt,
	Ben Segall, Mel Gorman, Daniel Bristot de Oliveira,
	Valentin Schneider

On Fri, 12 Jan 2024 at 17:04, Qais Yousef <qyousef@layalina.io> wrote:
>
> That is odd. I can't see how the patch can cause this yet, could you try with
> a different compiler if possible?

I use two different compilers - I do my allmodconfig builds with gcc,
and the kernels I boot with clang, so my problems have been with a
kernel built with

   clang version 17.0.6

but to check that it's not a compiler issue I just did another try
with my current public tip of tree (ie *without* any reverts for this
issue) and gcc:

    gcc version 13.2.1

and the behavior is exactly the same: all cores are stuck at 2.2GHz.

So no, it's not compiler-dependent.

> I usually use perfetto but it should be easy to see frequency updates from
> power/cpu_frequency trace event.
>
>         echo 1 | sudo tee /sys/kernel/tracing/tracing_on
>         echo 1 | sudo tee /sys/kernel/tracing/events/power/cpu_frequency/enable
>         sudo cat /sys/kernel/tracing/trace

Shows absolutely nothing. Or rather, it shows the header with

  # entries-in-buffer/entries-written: 0/0   #P:64

and that's it.

With a *working* kernel, I get events, setting the frequency to either
2.2GHz (idle) or 3.8GHz (work).

IOW, the tracing output is 100% consistent with "that commit breaks everything".

                  Linus

^ permalink raw reply	[relevance 99%]

* Re: [GIT PULL] Scheduler changes for v6.8
  2024-01-12 20:49 99%                         ` Linus Torvalds
@ 2024-01-12 21:04 97%                           ` Linus Torvalds
    0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-01-12 21:04 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: Qais Yousef, Dietmar Eggemann, Ingo Molnar, linux-kernel,
	Peter Zijlstra, Thomas Gleixner, Juri Lelli, Steven Rostedt,
	Ben Segall, Mel Gorman, Daniel Bristot de Oliveira,
	Valentin Schneider

On Fri, 12 Jan 2024 at 12:49, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> All cores stay at 2.2GHz (ok, so there's noise, but we're
> talking "within a couple of MHz of 2.2GHz").

Note: that is true also when every CPU is fully loaded and I do a real
full build.

So the "empty make" is just my quick test that happens to be
single-threaded and should take just 20s. All my real builds slow down
too, because all CPUs stay at the minimum frequency.

And I just verified that Ingo's revert that only reverts two commits
(commit 60ee1706bd11 in the tip tree), makes things work correctly for
me.

Not surprising, since the bisection clearly pointed at just commit
9c0b4bb7f6303c being the one that caused the issue, but I decided to
just double-check anyway.

So with that revert, for the single-threaded case I see 4GHz+ numbers
(they spread from a single CPU to multiple CPUs once you run the
benchmark a few times).

And then when I run a full parallel build (rather than the
single-threaded empty one), the frequencies drop to ~3.85GHz for the
all-cpu case.

                Linus

^ permalink raw reply	[relevance 97%]

* Re: [GIT PULL] Scheduler changes for v6.8
  2024-01-12 20:30 92%                       ` Linus Torvalds
@ 2024-01-12 20:49 99%                         ` Linus Torvalds
  2024-01-12 21:04 97%                           ` Linus Torvalds
  0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-01-12 20:49 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: Qais Yousef, Dietmar Eggemann, Ingo Molnar, linux-kernel,
	Peter Zijlstra, Thomas Gleixner, Juri Lelli, Steven Rostedt,
	Ben Segall, Mel Gorman, Daniel Bristot de Oliveira,
	Valentin Schneider

On Fri, 12 Jan 2024 at 12:30, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> I will test Vincent's test-patch next.

The patch at

    https://lore.kernel.org/all/ZZ+ixagkxRPYyTCE@vingu-book/

makes absolutely no difference. All cores stay at 2.2GHz (ok, so
there's noise, but we're talking "within a couple of MHz of 2.2GHz").

             Linus

^ permalink raw reply	[relevance 99%]

* Re: [GIT PULL] Scheduler changes for v6.8
  @ 2024-01-12 20:30 92%                       ` Linus Torvalds
  2024-01-12 20:49 99%                         ` Linus Torvalds
  0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-01-12 20:30 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: Qais Yousef, Dietmar Eggemann, Ingo Molnar, linux-kernel,
	Peter Zijlstra, Thomas Gleixner, Juri Lelli, Steven Rostedt,
	Ben Segall, Mel Gorman, Daniel Bristot de Oliveira,
	Valentin Schneider

Ok, so testing a bit more. On a working kernel, when I do an empty
"make" (which is the fast test I've used), it's all single-threaded
because it's just 'make' doing tons of stat calls and string
operations.

And "cat /proc/cpuinfo | grep MHz" shows a nice clear signal:

  ...
  cpu MHz : 2200.000
  cpu MHz : 2200.000
  cpu MHz : 4425.339
  cpu MHz : 2200.000
  ...

so it boosts up to the top boost frequency.
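
[Illustration, not part of the original mail.] A minimal sketch of the same check done programmatically, assuming /proc/cpuinfo text with one "cpu MHz" line per CPU as in the excerpt above. The function names, the 2200 MHz base and the 5 MHz slack are assumptions taken from the numbers in this thread, not from any tool.

```python
def cpu_mhz(cpuinfo_text):
    """Return the 'cpu MHz' value of every CPU, in order of appearance."""
    freqs = []
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "cpu MHz":
            freqs.append(float(value))
    return freqs


def boosted_cpus(freqs, base_mhz=2200.0, slack_mhz=5.0):
    """Return indexes of CPUs running noticeably above the base frequency."""
    return [i for i, f in enumerate(freqs) if f > base_mhz + slack_mhz]
```

With the four sample lines above, only the third CPU counts as boosted. On a kernel showing the problem described in this thread the list stays empty, whatever the load.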

Without the revert, doing the same thing, what I see is very
different. It's all just

  ...
  cpu MHz : 2200.000
  cpu MHz : 2200.000
  cpu MHz : 2200.000
  cpu MHz : 2200.000
  ...

which certainly explains why it takes 45s rather than 22s to do a full
empty build.

Btw, the "full empty build" I do is literally just

    timestamp sh -c "make -j128 > ../makes"

where 'timestamp' is my stupid little wrapper program that just shows
elapsed time as the command is progressing (as opposed to "time",
which just shows it at the end).
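
[Illustration, not part of the original mail.] The 'timestamp' program itself is not shown anywhere in this thread, so the following is only a guess at the behaviour described: run a command and keep printing the elapsed time while it is still running. The function name and the one-second refresh interval are assumptions.

```python
import subprocess
import sys
import time


def run_with_elapsed(argv, interval=1.0, out=sys.stderr):
    """Run argv, refreshing an elapsed-time display until it exits.

    Returns (exit_status, elapsed_seconds).
    """
    start = time.monotonic()
    proc = subprocess.Popen(argv)
    while True:
        try:
            status = proc.wait(timeout=interval)
            break
        except subprocess.TimeoutExpired:
            # Still running: overwrite the same terminal line with the new time.
            print(f"\r{time.monotonic() - start:7.1f}s", end="", file=out, flush=True)
    elapsed = time.monotonic() - start
    print(f"\r{elapsed:7.1f}s", file=out)
    return status, elapsed
```

Something like run_with_elapsed(["sh", "-c", "make -j128 > ../makes"]) would then mirror the command line above.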

Side note: that 4425.339 is very much the boost frequency, 'cpupower' reports

  $ cpupower frequency-info
  analyzing CPU 0:
    driver: acpi-cpufreq
    CPUs which run at the same hardware frequency: 0
    CPUs which need to have their frequency coordinated by software: 0
    maximum transition latency:  Cannot determine or is not supported.
    hardware limits: 2.20 GHz - 3.70 GHz
    available frequency steps:  3.70 GHz, 2.80 GHz, 2.20 GHz
    available cpufreq governors: conservative ondemand userspace powersave performance schedutil
    current policy: frequency should be within 2.20 GHz and 3.70 GHz.
                  The governor "schedutil" may decide which speed to use
                  within this range.
    current CPU frequency: Unable to call hardware
    current CPU frequency: 2.20 GHz (asserted by call to kernel)
    boost state support:
      Supported: yes
      Active: no

and for all I know the scheduler got confused by the fact that it
thinks the hardware limits are 2.2-3.7 GHz. But the 3970X has a boost
frequency of 4.5GHz, and yes, I very much want it.

I will test Vincent's test-patch next.

                Linus

^ permalink raw reply	[relevance 92%]

* Re: [git pull] drm for 6.8
  @ 2024-01-12 19:33 96% ` Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-01-12 19:33 UTC (permalink / raw)
  To: Dave Airlie; +Cc: Daniel Vetter, dri-devel, LKML

On Wed, 10 Jan 2024 at 11:49, Dave Airlie <airlied@gmail.com> wrote:
>
> Let me know if there are any issues,

Your testing is seriously lacking.

This doesn't even build. The reason seems to be that commit
b49e894c3fd8 ("drm/i915: Replace custom intel runtime_pm tracker with
ref_tracker library") changed the 'intel_wakeref_t' type from a
'depot_stack_handle_t' to 'unsigned long', and as a result did this:

-       drm_dbg(&i915->drm, "async_put_wakeref %u\n",
+       drm_dbg(&i915->drm, "async_put_wakeref %lu\n",
                power_domains->async_put_wakeref);

meanwhile, the Xe driver has this:

  drivers/gpu/drm/xe/compat-i915-headers/intel_wakeref.h:
        typedef bool intel_wakeref_t;

which has never been valid, but now the build fails with

  drivers/gpu/drm/i915/display/intel_display_power.c: In function ‘print_async_put_domains_state’:
  drivers/gpu/drm/i915/display/intel_display_power.c:408:29: error: format ‘%lu’ expects argument of type ‘long unsigned int’, but argument 5 has type ‘int’ [-Werror=format=]

because the drm header files have this disgusting thing where a
*header* file includes a *C* file:

  In file included from ./include/drm/drm_mm.h:51,
                 from drivers/gpu/drm/xe/xe_bo_types.h:11,
                 from drivers/gpu/drm/xe/xe_bo.h:11,
                 from ./drivers/gpu/drm/xe/compat-i915-headers/gem/i915_gem_object.h:11,
                 from ./drivers/gpu/drm/xe/compat-i915-headers/i915_drv.h:15,
                 from drivers/gpu/drm/i915/display/intel_display_power.c:8:

nasty.

I made it build by fixing that broken Xe compat header file, but this
is definitely *NOT* how things should have worked. How did this ever
get to me without any kind of build testing?

And why the %^!@$% does a header file include a C file? That's wrong
regardless of this bug.

                   Linus

^ permalink raw reply	[relevance 96%]

* Re: [GIT PULL] first round of SCSI updates for the 6.7+ merge window
  @ 2024-01-12 18:34 99%           ` Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-01-12 18:34 UTC (permalink / raw)
  To: Konstantin Ryabitsev
  Cc: James Bottomley, Andrew Morton, linux-scsi, linux-kernel

On Fri, 12 Jan 2024 at 06:27, Konstantin Ryabitsev
<konstantin@linuxfoundation.org> wrote:
>
> I'm piping up just because I know how to get the output you want

Oh, I know how to get the output - I can read a man-page.

I'm just saying that the default output is unbelievably bad, and
subkeys are really atrocious from a usability standpoint, with
expiration making things even worse.

And being bad from a usability standpoint here is in the context of
gpg. That's a very low bar to begin with.

               Linus

^ permalink raw reply	[relevance 99%]

* Re: [GIT PULL] f2fs update for 6.8-rc1
  @ 2024-01-12 18:18 99%     ` Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-01-12 18:18 UTC (permalink / raw)
  To: Al Viro
  Cc: Jaegeuk Kim, Linux Kernel Mailing List, Linux F2FS Dev Mailing List

On Thu, 11 Jan 2024 at 23:12, Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> Where would you end up with old_dir_page != NULL and old_dir_entry == NULL?

D'oh.

You are of course right, and I missed that connection. Happily my
merge still works, just isn't as minimal as yours.

I see that Jaegeuk already posted the patch for the cleanup.

              Linus

^ permalink raw reply	[relevance 99%]

* Re: [GIT PULL] f2fs update for 6.8-rc1
  @ 2024-01-12  5:05 96% ` Linus Torvalds
    0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-01-12  5:05 UTC (permalink / raw)
  To: Jaegeuk Kim, Al Viro
  Cc: Linux Kernel Mailing List, Linux F2FS Dev Mailing List

On Thu, 11 Jan 2024 at 10:28, Jaegeuk Kim <jaegeuk@kernel.org> wrote:
>
>   git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs.git tags/f2fs-for-6.8-rc1

Hmm. I got a somewhat confusing conflict in f2fs_rename().

And honestly, I really don't know what the right resolution is. What I
ended up with was this:

        if (old_is_dir) {
                if (old_dir_entry)
                        f2fs_set_link(old_inode, old_dir_entry,
                                                old_dir_page, new_dir);
                else
                        f2fs_put_page(old_dir_page, 0);
                f2fs_i_links_write(old_dir, false);
        }

which seems to me to be the right thing as a resolution. But I note
that linux-next has something different, and it is because Al said in

      https://lore.kernel.org/all/20231220013402.GW1674809@ZenIV/

that the resolution should just be

        if (old_dir_entry)
                f2fs_set_link(old_inode, old_dir_entry, old_dir_page, new_dir);
        if (old_is_dir)
                f2fs_i_links_write(old_dir, false);

instead.

Now, some of those differences are artificial - old_dir_entry can only
be set if old_is_dir is set, so the nesting difference is kind of a
red herring.

But I feel like that f2fs_put_page() is actually needed, or you end up
with a reference leak.

So despite the fact that Al is never wrong, I ended up going with my
gut, and kept my resolution that is different from linux-next.

End result: I'm now very leery of my merge. On the one hand, I think
it's right. On the other hand, the likelihood that Al is wrong is
pretty low.

So please double- and triple-check that merge, and please send in a
fix for it. Presumably with a comment along the lines of "Al was
right, don't try to overthink things".

Hubris. That's the word for thinking you know better than Al.

                Linus

^ permalink raw reply	[relevance 96%]

* Re: [GIT PULL] Documentation for 6.8
  @ 2024-01-12  3:53 99% ` Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-01-12  3:53 UTC (permalink / raw)
  To: Jonathan Corbet; +Cc: linux-doc, linux-kernel

On Mon, 8 Jan 2024 at 10:59, Jonathan Corbet <corbet@lwn.net> wrote:
>
> - The minimum Sphinx requirement has been raised to 2.4.4, following a
>   warning that was added in 6.2.

Well, speaking of warnings, github now has this "dependabot" thing
that warns about bad minimum requirements due to tooling that has
security issues.

And it warns about our "jinja2 < 3.1" requirement, because apparently
that can cause issues:

  "The xmlattr filter in affected versions of Jinja accepts keys
containing spaces. XML/HTML attributes cannot contain spaces, as each
would then be interpreted as a separate attribute. If an application
accepts keys (as opposed to only values) as user input, and renders
these in pages that other users see as well, an attacker could use
this to inject other attributes and perform XSS. Note that accepting
keys as user input is not common or a particularly intended use case
of the xmlattr filter, and an application doing so should already be
verifying what keys are provided regardless of this fix"

with affected versions being marked as < 3.1.3 and fixed in Jinja2 3.1.3

I'm ignoring this github dependabot warning since the issue seems to
be rather irrelevant for our doc use, but I thought I'd mention it.

            Linus

^ permalink raw reply	[relevance 99%]

* Re: [PATCH v4 5/6] add listmount(2) syscall
  @ 2024-01-12  3:40 93%             ` Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-01-12  3:40 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: Yoshinori Sato, Rich Felker, John Paul Adrian Glaubitz,
	Miklos Szeredi, linux-fsdevel, linux-kernel, linux-api,
	linux-man, linux-security-module, Karel Zak, Ian Kent,
	David Howells, Al Viro, Christian Brauner, Amir Goldstein,
	Matthew House, Florian Weimer, Arnd Bergmann

On Thu, 11 Jan 2024 at 15:57, Guenter Roeck <linux@roeck-us.net> wrote:
>
> I wonder if something may be wrong with the definition and use of __m
> for u64 accesses. The code below also fixes the build problem.

Ok, that looks like the right workaround.

> But then I really don't know what
>
> struct __large_struct { unsigned long buf[100]; };
> #define __m(x) (*(struct __large_struct __user *)(x))

This is a historical pattern we've used because the gcc docs weren't
100% clear on what "m" does, and whether it might for example end up
loading the value from memory into a register, spilling it to the
stack, and then using the stack address...

Using the whole "tell the compiler it accesses a big structure" turns
the memory access into "BLKmode" (in gcc terms) and makes sure that
never happens.

NOTE! I'm not sure it ever did happen with gcc, but we have literally
seen clang do that "load from memory, spill to stack, and then use the
stack address for the asm". Crazy, I know. See

  https://lore.kernel.org/all/CAHk-=wgobnShg4c2yyMbk2p=U-wmnOmX_0=b3ZY_479Jjey2xw@mail.gmail.com/

where I point to clang doing basically exactly that with the "rm"
constraint for another case entirely. I consider it a clang bug, but
happily I've never seen the "load only to spill" in a case where the
"stupid code generation" turned into "actively buggy code generation".

If it ever does, we may need to turn the "m" into a "p" and a memory
clobber, which will generate horrendous code. Or we may just need to
tell clang developers that enough is enough, and that they actually
need to take the asm constraints more seriously.

                Linus

^ permalink raw reply	[relevance 93%]

* Re: [GIT PULL] bcachefs updates for 6.8
  @ 2024-01-11 23:58 99%               ` Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-01-11 23:58 UTC (permalink / raw)
  To: Kees Cook
  Cc: Matthew Wilcox, Kent Overstreet, linux-bcachefs, linux-fsdevel,
	linux-kernel, linux-hardening

On Thu, 11 Jan 2024 at 15:42, Kees Cook <keescook@chromium.org> wrote:
>
> Another ugly idea would be to do a treewide replacement of "func" to
> "func_deprecated", and make "func" just a wrapper for it that is marked
> with __deprecated.

That's probably not a horrible idea, at least when we're talking a
reasonable number of users (ie when we're talking "tens of users" like
strlcpy is now).

We should probably generally rename functions much more aggressively
any time the "signature" changes.

We've had situations where the semantics changed but not enough to
necessarily trigger type warnings, and then renaming things is just a
good thing just to avoid mistakes. Even if it's temporary and you plan
on renaming things back.

And with a coccinelle script (that should be documented in the patch)
it's not necessarily all that painful to do.

                Linus

^ permalink raw reply	[relevance 99%]

* Re: [GIT PULL] first round of SCSI updates for the 6.7+ merge window
  @ 2024-01-11 23:50 92%       ` Linus Torvalds
    0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-01-11 23:50 UTC (permalink / raw)
  To: James Bottomley; +Cc: Andrew Morton, linux-scsi, linux-kernel

On Thu, 11 Jan 2024 at 15:28, James Bottomley
<James.Bottomley@hansenpartnership.com> wrote:
>
> You installed the special "make it even harder to use" version didn't
> you?

We call that the standard version. Because "harder to use" comes with
the base package.

You have the same one:

> Because for me (gpg 2.4.3) it gives
>
> jejb@lingrow:~> gpg --list-key E76040DB76CA3D176708F9AAE742C94CEE98AC85
> pub   rsa2048 2011-09-23 [SC] [expires: 2026-03-11]
>       D5606E73C8B46271BEAD9ADF814AE47C214854D6
> uid           [ultimate] James Bottomley <James.Bottomley@HansenPartnership.com>
> uid           [ultimate] James Bottomley <jejb@linux.vnet.ibm.com>
> uid           [ultimate] James Bottomley <jejb@kernel.org>
> uid           [ultimate] [jpeg image of size 5254]
> uid           [ultimate] James Bottomley <jejb@linux.ibm.com>
> uid           [ultimate] James Bottomley <jejb@hansenpartnership.com>
> sub   nistp256 2018-01-23 [S] [expires: 2024-01-16]
> sub   nistp256 2018-01-23 [E] [expires: 2024-01-16]
> sub   nistp256 2023-07-20 [A] [expires: 2024-01-16]

Look closer.

NOWHERE there does it mention E76040D.. Nowhere.

Really.

Yeah, it says that a key that I didn't even ask for has subkeys.  It
doesn't say what those subkeys are, nor does it say which one matches
the one I actually asked for.

Yes, you clearly have Stockholm syndrome and think that this is all
normal and exactly what you would expect to see.

I happen to think it's unbelievable garbage, and I think subkeys are
something that makes gpg even harder to use than it would otherwise
be.

Here's a clue: if I ask "ls" to show a file, do you think it would be
ok if "ls" instead said "here's the directory the file is in, and here
are the dates of all the files inside that directory"?

Or would you say that such a program is crap? Honestly now...

And the above is actually being *generous* to gpg. The reality is even
worse. Try this:

   gpg --list-key 37AAA9562C5CBD0C

and notice how it doesn't even list the subkey I asked about. Not even
with '--with-subkey-fingerprint'.

And no, I'm not just making up particularly bad examples. This is the
reality I deal with all the time when people use expiration dates on
their keys.

The above "show me the key" is *literally* the key you used a decade ago:

    git show --oneline --show-signature 233ba2c5ffcf

and this is one of millions of reasons why I despise gpg and subkeys in
particular. That key was valid at the time, and as far as I know
there's no way for git to say "was it expired at the time", so now all
those signatures flag as invalid.

Plus the "--list-key" thing NOT EVEN SHOWING THE KEY I ASKED FOR.

Christ.

Ok, I'm over it now. I just wanted to rant about my least favourite
program ever, and how you trigger all the worst parts of it.

           Linus

^ permalink raw reply	[relevance 92%]

* Re: [GIT PULL] first round of SCSI updates for the 6.7+ merge window
  @ 2024-01-11 22:53 99%     ` Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-01-11 22:53 UTC (permalink / raw)
  To: James Bottomley; +Cc: Andrew Morton, linux-scsi, linux-kernel

On Thu, 11 Jan 2024 at 14:47, James Bottomley
<James.Bottomley@hansenpartnership.com> wrote:
>
> Well, I did already tell you that I bypass the pgp keyservers because I
> use a DNSSEC based DANE entry instead:
>
> https://lore.kernel.org/all/1564171685.9950.14.camel@HansenPartnership.com/

I think I dimly remember seeing that email.

But honestly, that just reinforces my point: this is yet ANOTHER
magical thing you have to know about gpg, and that nobody but you uses.

So if you insist on using these things that are obscure, you need to
keep reminding people. Every time your keys are close to expiry, send
out an email saying "To update my key, use this magical command line".

If gpg did that auto-locate automatically, and it all JustWorked(tm),
it would be one thing. But that is clearly against the design
principles of pgp and gpg.

              Linus

^ permalink raw reply	[relevance 99%]

* Re: [GIT PULL] first round of SCSI updates for the 6.7+ merge window
  2024-01-11 22:36 98% ` Linus Torvalds
  @ 2024-01-11 22:47 97%   ` Linus Torvalds
    1 sibling, 1 reply; 200+ results
From: Linus Torvalds @ 2024-01-11 22:47 UTC (permalink / raw)
  To: James Bottomley; +Cc: Andrew Morton, linux-scsi, linux-kernel

On Thu, 11 Jan 2024 at 14:36, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> Stop making a bad pgp experience even worse - for no reason and
> absolutely zero upside.

Side note: even getting gpg to show the subkeys was just an exercise
in frustration.

For example, I'd expect that when you do

   gpg --list-key E76040DB76CA3D176708F9AAE742C94CEE98AC85

it would show the details of that key. No, it does not. It doesn't
even *mention* that key.

Because this is gpg, and the project motto was probably "pgp was
designed to be hard to use, and by golly, we'll take that to 11".

And no, adding "-vv" to get more verbose output doesn't help. That
just makes gpg show more *other* keys.

Now, obviously, in order to actually show the key I *asked* gpg to
list, I also have to use the "--with-subkey-fingerprint". OBVIOUSLY.

I can hear everybody go all Homer on me and say "Well, duh, dummy".

So yes, I realize that my frustration with pgp is because I'm just too
stupid to understand how wonderful the UX really is, but my point is
that you're really making it worse by using pointless features that
actively make it all so much less usable than it already is.

Subkeys and expiration date make a bad experience worse.

Yes, I blame myself for thinking pgp was a good model for tag signing.
What can I say? I didn't expect people to actively try to use every
bad feature.

                Linus

^ permalink raw reply	[relevance 97%]

* Re: [GIT PULL] first round of SCSI updates for the 6.7+ merge window
  @ 2024-01-11 22:36 98% ` Linus Torvalds
    2024-01-11 22:47 97%   ` Linus Torvalds
  0 siblings, 2 replies; 200+ results
From: Linus Torvalds @ 2024-01-11 22:36 UTC (permalink / raw)
  To: James Bottomley; +Cc: Andrew Morton, linux-scsi, linux-kernel

On Wed, 10 Jan 2024 at 12:48, James Bottomley
<James.Bottomley@hansenpartnership.com> wrote:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi.git scsi-misc

Ok, I note that this has been signed with ECDSA key
E76040DB76CA3D176708F9AAE742C94CEE98AC85, and while it is currently
available and up-to-date at kernel.org, it shows as

  sub   nistp256 2018-01-23 [S] [expires: 2024-01-16]
        E76040DB76CA3D176708F9AAE742C94CEE98AC85

note that expiration date: it's five days in the future.

Can I please ask you for the umpteenth time to STOP DICKING AROUND
WITH SHORT EXPIRATION DATES!

The pgp keyservers work *so* badly these days that refreshing keys is
a joke. The whole expiration date thing has always been a bad joke,
and only makes pgp an even worse UX than it already is (and damn,
that's saying a lot - pgp is some nasty stuff).

When you make a new key, or when you extend the expiration date, do it
properly. Give it a lifetime that is a big fraction of a decade. Or
two.

Because your keys constantly end up being expired, and they are making
the experience of pulling from you a pain - because I actually *check*
the keys.

Stop making a bad pgp experience even worse - for no reason and
absolutely zero upside.

                Linus

^ permalink raw reply	[relevance 98%]

* Re: [PATCH v4 5/6] add listmount(2) syscall
  @ 2024-01-11 20:14 82%         ` Linus Torvalds
    0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-01-11 20:14 UTC (permalink / raw)
  To: Guenter Roeck, Yoshinori Sato, Rich Felker, John Paul Adrian Glaubitz
  Cc: Miklos Szeredi, linux-fsdevel, linux-kernel, linux-api,
	linux-man, linux-security-module, Karel Zak, Ian Kent,
	David Howells, Al Viro, Christian Brauner, Amir Goldstein,
	Matthew House, Florian Weimer, Arnd Bergmann

[-- Attachment #1: Type: text/plain, Size: 2074 bytes --]

On Thu, 11 Jan 2024 at 10:57, Guenter Roeck <linux@roeck-us.net> wrote:
>
> Any variance of put_user() with &buf[ctr] or buf + ctr fails
> if ctr is a variable and permitted to be != 0.

Crazy. But the 64-bit put_user() is a bit special and tends to require
more registers (the 64-bit value is passed in two registers), so that
probably then results in the ICE.

Side note: looking at the SH version of __put_user_u64(), I think it's
buggy and is missing the exception handler for the second 32-bit move.
I dunno, I don't read sh asm, but it looks suspicious.

> The following works. Would this be acceptable ?

It might be very easy to trigger this once again if somebody goes "that's silly".

That said, I also absolutely detest the "error handling" in that
function. It's horrible.

Noticing the user access error in the middle is just sad, and if that
was just handled better and at least the range was checked first, the
overflow error couldn't happen and checking for it is thus pointless.

And looking at it all, it really looks like the whole interface is
broken. The "bufsize" argument isn't the size of the buffer at all.
It's the number of entries.

Extra confusingly, in the *other* system call, bufsize is in fact the
size of the buffer.

And the 'ctr' overflow checking is doubly garbage, because the only
reason *that* can happen is that we didn't check the incoming
arguments properly.

Same goes for the whole array_index_nospec() - it's pointless, because
the user controls what that code checks against anyway, so there's no
point to trying to manage some range checking.

The only range checking there that matters would be the one that
put_user() has to do against the address space size, but that's done
by put_user().
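
To make the "check the range first" point concrete, here is a minimal
userspace sketch of the same idea: reject an entry count whose size in
bytes could not fit in a size_t, once, before the loop, and then just
walk the buffer. The names fill_ids() and MAX_ENTRIES are made up for
illustration; the kernel code would use access_ok() and put_user()
rather than a plain store.

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>

/* Largest entry count whose size in bytes still fits in a size_t. */
#define MAX_ENTRIES (SIZE_MAX / sizeof(uint64_t))

/*
 * Copy up to 'nentries' ids from 'src' (which holds 'nsrc' ids) into 'buf'.
 * Returns the number of ids written, or -1 if 'nentries' is unreasonable.
 * Hypothetical helper for illustration, not a kernel function.
 */
static ssize_t fill_ids(const uint64_t *src, size_t nsrc,
			uint64_t *buf, size_t nentries)
{
	ssize_t ret = 0;

	/* Validate the argument once, up front; the loop needs no overflow checks. */
	if (nentries > MAX_ENTRIES)
		return -1;

	for (size_t i = 0; i < nsrc && nentries; i++) {
		*buf++ = src[i];
		ret++;
		nentries--;
	}
	return ret;
}
```

Because the count is bounded before the loop starts, the return value
cannot overflow and there is nothing left to check per iteration.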

End result: that thing needs a rewrite.

The SH put_user64() needs to be looked at too, but in the meantime,
maybe something like this fixes the problems with listmount?

NOTE! ENTIRELY untested, but that naming and lack of argument sanity
checking really is horrendous. We should have caught this earlier.

                   Linus

[-- Attachment #2: patch.diff --]
[-- Type: text/x-patch, Size: 2265 bytes --]

 fs/namespace.c | 31 +++++++++++++++++--------------
 1 file changed, 17 insertions(+), 14 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index ef1fd6829814..df74f4769733 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -5043,12 +5043,17 @@ static struct mount *listmnt_next(struct mount *curr)
 }
 
 static ssize_t do_listmount(struct mount *first, struct path *orig, u64 mnt_id,
-			    u64 __user *buf, size_t bufsize,
+			    u64 __user *buf, size_t nentries,
 			    const struct path *root)
 {
 	struct mount *r;
-	ssize_t ctr;
-	int err;
+	const size_t maxentries = (size_t)-1 >> 3;
+	ssize_t ret;
+
+	if (unlikely(nentries > maxentries))
+		return -EFAULT;
+	if (!access_ok(buf, nentries * sizeof(*buf)))
+		return -EFAULT;
 
 	/*
 	 * Don't trigger audit denials. We just want to determine what
@@ -5058,26 +5063,24 @@ static ssize_t do_listmount(struct mount *first, struct path *orig, u64 mnt_id,
 	    !ns_capable_noaudit(&init_user_ns, CAP_SYS_ADMIN))
 		return -EPERM;
 
-	err = security_sb_statfs(orig->dentry);
-	if (err)
-		return err;
+	ret = security_sb_statfs(orig->dentry);
+	if (ret)
+		return ret;
 
-	for (ctr = 0, r = first; r && ctr < bufsize; r = listmnt_next(r)) {
+	for (ret = 0, r = first; r && nentries; r = listmnt_next(r)) {
 		if (r->mnt_id_unique == mnt_id)
 			continue;
 		if (!is_path_reachable(r, r->mnt.mnt_root, orig))
 			continue;
-		ctr = array_index_nospec(ctr, bufsize);
-		if (put_user(r->mnt_id_unique, buf + ctr))
+		if (put_user(r->mnt_id_unique, buf))
 			return -EFAULT;
-		if (check_add_overflow(ctr, 1, &ctr))
-			return -ERANGE;
+		buf++, ret++; nentries--;
 	}
-	return ctr;
+	return ret;
 }
 
 SYSCALL_DEFINE4(listmount, const struct mnt_id_req __user *, req,
-		u64 __user *, buf, size_t, bufsize, unsigned int, flags)
+		u64 __user *, buf, size_t, nentries, unsigned int, flags)
 {
 	struct mnt_namespace *ns = current->nsproxy->mnt_ns;
 	struct mnt_id_req kreq;
@@ -5111,7 +5114,7 @@ SYSCALL_DEFINE4(listmount, const struct mnt_id_req __user *, req,
 	else
 		first = mnt_find_id_at(ns, last_mnt_id + 1);
 
-	ret = do_listmount(first, &orig, mnt_id, buf, bufsize, &root);
+	ret = do_listmount(first, &orig, mnt_id, buf, nentries, &root);
 err:
 	path_put(&root);
 	up_read(&namespace_sem);

^ permalink raw reply related	[relevance 82%]

* Re: [GIT PULL] RCU changes for v6.8
  @ 2024-01-11 19:12 99% ` Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-01-11 19:12 UTC (permalink / raw)
  To: Neeraj Upadhyay (AMD)
  Cc: linux-kernel, kernel-team, paulmck, mingo, tglx, rcu, boqun.feng,
	joel, neeraj.upadhyay, urezki, qiang.zhang1211

On Thu, 11 Jan 2024 at 10:33, Neeraj Upadhyay (AMD)
<neeraj.iitr10@gmail.com> wrote:
>
> Please pull the latest RCU git tree from:
>
>   https://github.com/neeraju/linux.git tags/rcu.release.v6.8

Not pulled yet - I have a big pile to go - but an ack that I got the email.

              Linus

^ permalink raw reply	[relevance 99%]

* Re: [GIT PULL] Scheduler changes for v6.8
  2024-01-11 17:45 99%           ` Linus Torvalds
@ 2024-01-11 17:53 99%             ` Linus Torvalds
    0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-01-11 17:53 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: Ingo Molnar, linux-kernel, Peter Zijlstra, Thomas Gleixner,
	Juri Lelli, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Daniel Bristot de Oliveira, Valentin Schneider

On Thu, 11 Jan 2024 at 09:45, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> On Thu, 11 Jan 2024 at 00:11, Vincent Guittot
> <vincent.guittot@linaro.org> wrote:
> >
> > Could you confirm that cpufreq governor is schedutil and the driver is
> > amd-pstate on your system ?
>
> schedutil yes, amd-pstate no. I actually just use acpi_cpufreq

Bah. Hit 'send' mistakenly too soon, thus the abrupt end and
unfinished quoting removal.

And don't ask me why it's acpi-cpufreq-driven. I have X86_AMD_PSTATE=y, but

    /sys/devices/system/cpu/cpufreq/policy0/scaling_driver

clearly says 'acpi-cpufreq'. Maybe I'm looking in the wrong place. My dmesg says

    amd_pstate: the _CPC object is not present in SBIOS or ACPI disabled

which is presumably the reason my machine uses acpi-cpufreq.
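
For anyone who wants to do the same check from a program rather than
with 'cat', a minimal sketch follows. read_first_line() is a
hypothetical helper, not an existing API; it only assumes the sysfs
file holds the driver name on its first line, as the path above
suggests.

```c
#include <stdio.h>
#include <string.h>

/*
 * Read the first line of 'path' into 'buf', without the trailing newline.
 * Returns 0 on success, -1 if the file cannot be opened or is empty.
 * 'len' must be at least 1.
 */
static int read_first_line(const char *path, char *buf, size_t len)
{
	FILE *f = fopen(path, "r");

	if (!f)
		return -1;
	if (!fgets(buf, (int)len, f)) {
		fclose(f);
		return -1;
	}
	fclose(f);
	buf[strcspn(buf, "\n")] = '\0';
	return 0;
}
```

Calling it with the scaling_driver path for policy0 and a small buffer
should yield the name of the cpufreq driver actually in use on that
machine.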

I will also test out your other questions, but I need to go back and
do more pull requests first.

              Linus

^ permalink raw reply	[relevance 99%]

* Re: [GIT PULL] Scheduler changes for v6.8
  @ 2024-01-11 17:45 99%           ` Linus Torvalds
  2024-01-11 17:53 99%             ` Linus Torvalds
  0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-01-11 17:45 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: Ingo Molnar, linux-kernel, Peter Zijlstra, Thomas Gleixner,
	Juri Lelli, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Daniel Bristot de Oliveira, Valentin Schneider

On Thu, 11 Jan 2024 at 00:11, Vincent Guittot
<vincent.guittot@linaro.org> wrote:
>
> Could you confirm that cpufreq governor is schedutil and the driver is
> amd-pstate on your system ?

schedutil yes, amd-pstate no. I actually just use acpi_cpufreq

>
> Also I'm interested by the output of the amd_pstate to confirm that it uses the
> adjust_perf callback
>
> I suppose that you don't use uclamp feature and amd doesn't use EAS so that let
> the change of the min parameter of adjust_perf which was probably always 0
> unless you use deadline scheduler and which now takes into account irq pressure.
>
> Could you try the patch below which restores the previous min value ?
>
> ---
>  kernel/sched/cpufreq_schedutil.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> index 95c3c097083e..3fe8ac6ce9cc 100644
> --- a/kernel/sched/cpufreq_schedutil.c
> +++ b/kernel/sched/cpufreq_schedutil.c
> @@ -194,10 +194,11 @@ unsigned long sugov_effective_cpu_perf(int cpu, unsigned long actual,
>  static void sugov_get_util(struct sugov_cpu *sg_cpu, unsigned long boost)
>  {
>         unsigned long min, max, util = cpu_util_cfs_boost(sg_cpu->cpu);
> +       struct rq *rq = cpu_rq(sg_cpu->cpu);
>
>         util = effective_cpu_util(sg_cpu->cpu, util, &min, &max);
>         util = max(util, boost);
> -       sg_cpu->bw_min = min;
> +       sg_cpu->bw_min = cpu_bw_dl(rq);
>         sg_cpu->util = sugov_effective_cpu_perf(sg_cpu->cpu, util, min, max);
>  }
>
> @@ -442,7 +443,7 @@ static void sugov_update_single_perf(struct update_util_data *hook, u64 time,
>             sugov_cpu_is_busy(sg_cpu) && sg_cpu->util < prev_util)
>                 sg_cpu->util = prev_util;
>
> -       cpufreq_driver_adjust_perf(sg_cpu->cpu, sg_cpu->bw_min,
> +       cpufreq_driver_adjust_perf(sg_cpu->cpu, map_util_perf(sg_cpu->bw_min),
>                                    sg_cpu->util, max_cap);
>
>         sg_cpu->sg_policy->last_freq_update_time = time;
> --
> 2.34.1
>
>
> >
> > I'll keep that revert in my private test-tree for now (so that I have
> > a working machine again), but I'll move it to my main branch soon
> > unless somebody has a quick fix for this problem.
> >
> >                 Linus

^ permalink raw reply	[relevance 99%]

* Re: [GIT PULL] execve updates for v6.8-rc1
  @ 2024-01-11 17:42 99%                 ` Linus Torvalds
  2024-01-20 22:18 94%                   ` Linus Torvalds
  0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-01-11 17:42 UTC (permalink / raw)
  To: Al Viro
  Cc: Josh Triplett, Kees Cook, Kees Cook, linux-kernel, Alexey Dobriyan

On Thu, 11 Jan 2024 at 02:05, Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> Something like (completely untested) delta below, perhaps?

No, this looks horrible.

This doesn't actually get rid of the early filp allocation for
execve(), it only seems to get rid of the repeated allocation for when
the RCU lookup fails.

And *that* is much easier to get rid of differently: just do the file
allocation in do_filp_open(), instead of path_openat. We'd need to
have some way to make sure that there is no left-over crud from the
RCU path into the next stage, but that doesn't look bad.

So the "path_openat() allocates filp on each invocation" looks fairly easy.

It's the "don't allocate filp until you actually need it" that looks
nasty. And yes, atomic_open() is part of the problem, but so is the
fact that we end up saving some flags in the filp early.

                  Linus

^ permalink raw reply	[relevance 99%]

* Re: [GIT PULL] execve updates for v6.8-rc1
    @ 2024-01-11 17:37 99%               ` Linus Torvalds
  1 sibling, 0 replies; 200+ results
From: Linus Torvalds @ 2024-01-11 17:37 UTC (permalink / raw)
  To: Al Viro
  Cc: Josh Triplett, Kees Cook, Kees Cook, linux-kernel, Alexey Dobriyan

On Thu, 11 Jan 2024 at 01:47, Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> Two things, both related to ->atomic_open():

Yeah, I was staring at the atomic_open() cases, and just thought that
we could allocate the filp early for that.

It wouldn't matter for normal filesystems, so from a performance
standpoint it would be ok.

My handwavy thinking was that we'd remove 'filp' from the arguments we
pass around, and instead make it be a member of 'struct nameidata',
and then the different codepaths could decide that "now I need the
filp, so I'll instantiate it".
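
A toy userspace sketch of that "instantiate on demand" shape, purely
to illustrate the idea. Everything here is an assumption: fake_file
and fake_nameidata are stand-ins for 'struct file' and 'struct
nameidata', and nd_get_filp()/nd_put_filp() are invented names, not
anything that exists in fs/namei.c.

```c
#include <stdlib.h>

/* Stand-ins for the kernel's 'struct file' and 'struct nameidata'. */
struct fake_file {
	int flags;
};

struct fake_nameidata {
	int open_flags;
	struct fake_file *filp;		/* NULL until some codepath needs it */
};

/* Allocate the file object on first use; later callers get the same one. */
static struct fake_file *nd_get_filp(struct fake_nameidata *nd)
{
	if (!nd->filp) {
		nd->filp = calloc(1, sizeof(*nd->filp));
		if (nd->filp)
			nd->filp->flags = nd->open_flags;
	}
	return nd->filp;
}

/* Release whatever was instantiated; a no-op if the lookup failed early. */
static void nd_put_filp(struct fake_nameidata *nd)
{
	free(nd->filp);
	nd->filp = NULL;
}
```

A lookup that fails before anybody calls nd_get_filp() never pays for
the allocation at all, which is the whole attraction.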

But then I looked more at the code, and it seemed to get quite messy,
quite fast.

               Linus

^ permalink raw reply	[relevance 99%]

* Re: [GIT PULL] bcachefs updates for 6.8
  @ 2024-01-11  1:47 99%         ` Linus Torvalds
    0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-01-11  1:47 UTC (permalink / raw)
  To: Kent Overstreet
  Cc: Kees Cook, linux-bcachefs, linux-fsdevel, linux-kernel, linux-hardening

On Wed, 10 Jan 2024 at 16:58, Kent Overstreet <kent.overstreet@linux.dev> wrote:
>
> ...And how does that make any sense? "The warnings weren't getting
> cleaned up, so get rid of them - except not really, just move them off
> to the side so they'll be more annoying when they do come up"...

Honestly, the checkpatch warnings are often garbage too.

The whole deprecation warnings never worked. They don't work in
checkpatch either.

> Perhaps we could've just switched to deprecation warnings being on in a
> W=1 build?

No, because the whole idea of "let me mark something deprecated and
then not just remove it" is GARBAGE.

If somebody wants to deprecate something, it is up to *them* to finish
the job. Not annoy thousands of other developers with idiotic
warnings.

            Linus

^ permalink raw reply	[relevance 99%]

* Re: [PATCH v4 5/6] add listmount(2) syscall
  @ 2024-01-11  0:32 99%     ` Linus Torvalds
    0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-01-11  0:32 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: Miklos Szeredi, linux-fsdevel, linux-kernel, linux-api,
	linux-man, linux-security-module, Karel Zak, Ian Kent,
	David Howells, Al Viro, Christian Brauner, Amir Goldstein,
	Matthew House, Florian Weimer, Arnd Bergmann

On Wed, 10 Jan 2024 at 14:23, Guenter Roeck <linux@roeck-us.net> wrote:
>
> with this patch in the tree, all sh4 builds fail with ICE.
>
> during RTL pass: final
> In file included from fs/namespace.c:11:
> fs/namespace.c: In function '__se_sys_listmount':
> include/linux/syscalls.h:258:9: internal compiler error: in change_address_1, at emit-rtl.c:2275

We do have those very ugly SYSCALL_DEFINEx() macros, but I'm not
seeing _anything_ that would be odd about the listmount case.

And the "__se_sys" thing in particular is just a fairly trivial wrapper.

It does use that asmlinkage_protect() thing, and it is unquestionably
horrendously ugly (staring too long at <linux/syscalls.h> has been
known to cause madness and despair), but we do that for *every* single
system call and I don't see why the new listmount entry would be
different.

           Linus

^ permalink raw reply	[relevance 99%]

* Re: [GIT PULL] Scheduler changes for v6.8
  2024-01-10 22:41 99%     ` Linus Torvalds
@ 2024-01-10 22:57 99%       ` Linus Torvalds
    0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-01-10 22:57 UTC (permalink / raw)
  To: Ingo Molnar, Vincent Guittot
  Cc: linux-kernel, Peter Zijlstra, Thomas Gleixner, Juri Lelli,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Daniel Bristot de Oliveira, Valentin Schneider

On Wed, 10 Jan 2024 at 14:41, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> It's one of these two:
>
>   f12560779f9d sched/cpufreq: Rework iowait boost
>   9c0b4bb7f630 sched/cpufreq: Rework schedutil governor performance estimation
>
> one more boot to go, then I'll try to revert whichever causes my
> machine to perform horribly much worse.

I guess it should come as no surprise that the result is

   9c0b4bb7f6303c9c4e2e34984c46f5a86478f84d is the first bad commit

but to revert cleanly I will have to revert all of

      b3edde44e5d4 ("cpufreq/schedutil: Use a fixed reference frequency")
      f12560779f9d ("sched/cpufreq: Rework iowait boost")
      9c0b4bb7f630 ("sched/cpufreq: Rework schedutil governor
performance estimation")

This is on a 32-core (64-thread) AMD Ryzen Threadripper 3970X, fwiw.

I'll keep that revert in my private test-tree for now (so that I have
a working machine again), but I'll move it to my main branch soon
unless somebody has a quick fix for this problem.

                Linus

^ permalink raw reply	[relevance 99%]

* Re: [GIT PULL] Scheduler changes for v6.8
  2024-01-10 22:19 99%   ` Linus Torvalds
@ 2024-01-10 22:41 99%     ` Linus Torvalds
  2024-01-10 22:57 99%       ` Linus Torvalds
  0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-01-10 22:41 UTC (permalink / raw)
  To: Ingo Molnar, Vincent Guittot
  Cc: linux-kernel, Peter Zijlstra, Thomas Gleixner, Juri Lelli,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Daniel Bristot de Oliveira, Valentin Schneider

On Wed, 10 Jan 2024 at 14:19, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> Just a note that I'm currently bisecting into this merge for a
> horrendous performance regression.
>
> It makes my empty kernel build go from 22 seconds to 44 seconds, and
> makes a full kernel build enormously slower too.
>
> I haven't finished the bisection, but it's now inside *just* this
> pull, so I can already tell that I'm going to revert something in
> here, because this has been making my merge window miserable.

It's one of these two:

  f12560779f9d sched/cpufreq: Rework iowait boost
  9c0b4bb7f630 sched/cpufreq: Rework schedutil governor performance estimation

one more boot to go, then I'll try to revert whichever causes my
machine to perform horribly much worse.

             Linus

^ permalink raw reply	[relevance 99%]

* Re: [GIT PULL] Scheduler changes for v6.8
  @ 2024-01-10 22:19 99%   ` Linus Torvalds
  2024-01-10 22:41 99%     ` Linus Torvalds
  0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-01-10 22:19 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, Peter Zijlstra, Thomas Gleixner, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Daniel Bristot de Oliveira, Valentin Schneider

On Mon, 8 Jan 2024 at 06:07, Ingo Molnar <mingo@kernel.org> wrote:
>
> Please pull the latest sched/core git tree from:
>
>    git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git sched-core-2024-01-08

Just a note that I'm currently bisecting into this merge for a
horrendous performance regression.

It makes my empty kernel build go from 22 seconds to 44 seconds, and
makes a full kernel build enormously slower too.

I haven't finished the bisection, but it's now inside *just* this
pull, so I can already tell that I'm going to revert something in
here, because this has been making my merge window miserable.

You've been warned,

                Linus

^ permalink raw reply	[relevance 99%]

* Re: [GIT PULL] lsm/lsm-pr-20240105
  @ 2024-01-10 20:22 96%     ` Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-01-10 20:22 UTC (permalink / raw)
  To: Paul Moore; +Cc: linux-security-module, linux-kernel

On Wed, 10 Jan 2024 at 11:54, Paul Moore <paul@paul-moore.com> wrote:
>
> Thanks for pulling the changes, I'm sorry the syscall table entries
> for the LSM syscalls were not how you want to see them, but I'm more
> than a little confused as to what exactly we did wrong here.

Look at commit 5f42375904b0 ("LSM: wireup Linux Security Module
syscalls") and notice for example this:

  --- a/arch/x86/entry/syscalls/syscall_64.tbl
  +++ b/arch/x86/entry/syscalls/syscall_64.tbl
  @@ -378,6 +378,9 @@
   454    common  futex_wake              sys_futex_wake
   455    common  futex_wait              sys_futex_wait
   456    common  futex_requeue           sys_futex_requeue
  +457    common  lsm_get_self_attr       sys_lsm_get_self_attr
  +458    common  lsm_set_self_attr       sys_lsm_set_self_attr
  +459    common  lsm_list_modules        sys_lsm_list_modules

Ok, fine - you added your new system calls to the end of the table.
Sure, I ended up having to fix them up because the "end of the table"
was different by the time I merged your tree, but that wasn't the
problem.

The problem is here - in the same commit:

  --- a/tools/perf/arch/x86/entry/syscalls/syscall_64.tbl
  +++ b/tools/perf/arch/x86/entry/syscalls/syscall_64.tbl
  @@ -375,6 +375,9 @@
   451    common  cachestat               sys_cachestat
   452    common  fchmodat2               sys_fchmodat2
   453    64      map_shadow_stack        sys_map_shadow_stack
  +454    common  lsm_get_self_attr       sys_lsm_get_self_attr
  +455    common  lsm_set_self_attr       sys_lsm_set_self_attr
  +456    common  lsm_list_modules        sys_lsm_list_modules

note how you updated the tools copy WITH THE WRONG NUMBERS!

You just added them at the end of the table again, and just
incremented the numbers, but that was complete nonsense, because the
numbers didn't actually match the real system call numbers, because
that tools table hadn't been updated for new system calls - because it
hadn't needed them.

Yeah, our tooling header duplication is annoying, but the old
situation where the tooling just used various kernel headers directly
and would randomly break when kernel changes were made was even worse.

End result: avoid touching the tooling headers - and if you have to,
you need to *think* about it.
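
The kind of check that would have caught this is easy to sketch. The
code below is only an illustration under stated assumptions:
'struct syscall_entry' and count_mismatches() are hypothetical, and a
real check would have to parse the two .tbl files rather than use
hand-written arrays.

```c
#include <stddef.h>
#include <string.h>

struct syscall_entry {
	int nr;
	const char *name;
};

/*
 * Count the entries in 'copy' whose name also appears in 'ref' but with a
 * different number. Zero means the copy agrees with the reference table.
 * Assumes names are unique within 'ref'.
 */
static size_t count_mismatches(const struct syscall_entry *ref, size_t nref,
			       const struct syscall_entry *copy, size_t ncopy)
{
	size_t bad = 0;

	for (size_t i = 0; i < ncopy; i++)
		for (size_t j = 0; j < nref; j++)
			if (!strcmp(copy[i].name, ref[j].name) &&
			    copy[i].nr != ref[j].nr)
				bad++;
	return bad;
}
```

Fed the numbers from the two hunks above, with the arch table as the
reference, all three lsm_* entries in the tools copy would be flagged.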

           Linus

^ permalink raw reply	[relevance 96%]

* Re: [GIT PULL] execve updates for v6.8-rc1
  @ 2024-01-10 20:12 92%             ` Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-01-10 20:12 UTC (permalink / raw)
  To: Kees Cook, Al Viro
  Cc: Josh Triplett, Kees Cook, linux-kernel, Alexey Dobriyan

On Wed, 10 Jan 2024 at 11:24, Kees Cook <keescook@chromium.org> wrote:
>
> I've been trying to figure out how to measure only the execve portion of
> a workload (with perf)[1] to get a more real-world measurement, but the
> above does show improvements for the "open once early". I'll get the
> behavior landed in -next after the merge window closes, and we can
> continue examining if we can make do_filp_open() better...

Well, so the workload that shows the "early open" issue most is
actually something disgusting like this:

    #include <unistd.h>

    int main(int argc, char **argv, char **envp)
    {
        for (int i = 0; i < 10000000; i++)
                execve("nonexistent", argv, envp);
        return 0;
    }

and it's trivial to run under perf. You'll get something like this
with my patch:

   8.65%  [k] strncpy_from_user
   8.37%  [k] kmem_cache_alloc
   7.71%  [k] kmem_cache_free
   5.14%  [k] mod_objcg_state
   4.84%  [k] link_path_walk
   4.36%  [k] memset_orig
   3.93%  [k] percpu_counter_add_batch
   3.66%  [k] do_syscall_64
   3.63%  [k] path_openat

and with the hacky "open twice" you'll see that kmem_cache_alloc/free
should be much lower - it still does a kmem_cache_alloc/free() pair
for the pathname, but the 'struct file *' allocation/free goes away.

Anyway, you can see a very similar issue by replacing the "execve()" line with

                open("nonexistent", O_RDONLY);

instead, and for exactly the same reason. Just to show that this issue
isn't execve-related.
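
Spelled out as a helper, the open() variant might look like the sketch
below. count_failed_opens() is a made-up name, and counting the
failures is an addition for illustration; the loop in the original
test program just calls open() and ignores the result.

```c
#include <fcntl.h>
#include <unistd.h>

/*
 * Try to open 'path' read-only 'iterations' times and return how many
 * attempts failed. Any descriptor that does get opened is closed again.
 */
static long count_failed_opens(const char *path, long iterations)
{
	long failed = 0;

	for (long i = 0; i < iterations; i++) {
		int fd = open(path, O_RDONLY);

		if (fd < 0)
			failed++;
		else
			close(fd);
	}
	return failed;
}
```

Calling it from main() with "nonexistent" and the same iteration count
as the execve loop, and running the result under perf, should show the
same kind of alloc/free pattern.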

I really think that the "open twice" is wrong. It will look
artificially good in this "does not exist" case, but it will penalize
other cases, and it just hides this issue.

Without either of the patches, you'll see that execve case spend all
its time counting environment variables, and be much slower as a
result. Instead of that "strncpy_from_user()", you'll see
"strnlen_user()" and copy_from_user() shoot up because of that.

The above perf profile is actually quite good in general: the slab
alloc/free is a big issue only because nothing else is.

Oh, and the above profile depends *heavily* on your particular
microarchitecture and which mitigations you have in place. System call
overhead might be at the top, for example.

And the cost of "strncpy_from_user()" is so high above not because we
do a lot of copies (it's just that shortish filename), but simply
mainly because user copies are so insanely expensive on some uarchs
due to CLAC/STAC being expensive.

So even a short filename copy can end up taking more than the whole path walk.

So your exact mileage will vary, but you should see that pattern of
"kmem_cache_alloc/free" (and the "strnlen_user()" issue with none of
the patches being applied) etc etc.

                     Linus

^ permalink raw reply	[relevance 92%]

* Re: [PATCH next v4 0/5] minmax: Relax type checks in min() and max().
  @ 2024-01-10 19:35 99%           ` Linus Torvalds
    0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-01-10 19:35 UTC (permalink / raw)
  To: Stephen Rothwell
  Cc: Jiri Slaby, David Laight, linux-kernel, Andy Shevchenko,
	Andrew Morton, Matthew Wilcox (Oracle),
	Christoph Hellwig, Jason A. Donenfeld

On Tue, 9 Jan 2024 at 22:17, Stephen Rothwell <sfr@canb.auug.org.au> wrote:
>
> > Can somebody else confirm similar time differences? Or is it just me?
>
> I was hopeful, but:

Yeah, my build times seem to be very unstable for some reason, and
seem to fluctuate fairly widely. I'm not sure what triggers it.

The min/max simplification helps, but I think my "big change" thing
was mostly due to other fluctuations.

It would be lovely to have some performance automation to find build
time regressions, although at least for me, one source of regressions
tends to be system updates with new compilers ;(

            Linus

^ permalink raw reply	[relevance 99%]

* Re: [PATCH] keys, dns: Fix missing size check of V1 server-list header
  @ 2024-01-10 18:52 99%           ` Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-01-10 18:52 UTC (permalink / raw)
  To: David Howells
  Cc: Pengfei Xu, eadavis, Simon Horman, Markus Suvanto,
	Jeffrey E Altman, Marc Dionne, Wang Lei, Jeff Layton,
	Steve French, Jarkko Sakkinen, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, linux-afs, keyrings, linux-cifs,
	linux-nfs, ceph-devel, netdev, linux-fsdevel, linux-kernel,
	heng.su

On Wed, 10 Jan 2024 at 09:23, David Howells <dhowells@redhat.com> wrote:
>
> Meh.  Does the attached fix it for you?

Bah. Obvious fix is obvious.

Mind sending it as a proper patch with sign-off etc, and we'll get
this fixed and marked for stable.

           Linus

^ permalink raw reply	[relevance 99%]

* Re: [GIT PULL] Btrfs updates for 6.8
  @ 2024-01-10 17:34 99% ` Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-01-10 17:34 UTC (permalink / raw)
  To: David Sterba; +Cc: linux-btrfs, linux-kernel

On Fri, 5 Jan 2024 at 11:04, David Sterba <dsterba@suse.com> wrote:
>
> There are possible minor merge conflicts reported by linux-next.

Bah. The block open mode changes were ugly. I did my best to make the
end result legible.

You may want to note the btrfs_open_mode() helper I added and possibly
do it differently.

                    Linus

^ permalink raw reply	[relevance 99%]

* Re: [GIT PULL] execve updates for v6.8-rc1
  @ 2024-01-10  3:54 91%           ` Linus Torvalds
      1 sibling, 1 reply; 200+ results
From: Linus Torvalds @ 2024-01-10  3:54 UTC (permalink / raw)
  To: Josh Triplett, Al Viro
  Cc: Kees Cook, Kees Cook, linux-kernel, Alexey Dobriyan

On Tue, 9 Jan 2024 at 18:21, Josh Triplett <josh@joshtriplett.org> wrote:
>
> Instead, here are some numbers from Linus's suggested benchmark
> (modified to use execvpe, and to count down rather than up so it doesn't
> need two arguments; modified version and benchmark driver script
> attached; compiled with `musl-gcc -Wall -O3 -s -static`):

Yeah, that's better. I had actually only benchmarked the success case.

And I see what the problem is: the "sane" way that only opens the
pathname once does so using

        file = do_filp_open(fd, name, &open_exec_flags);

and the "open path twice" does the first open with

        retval = filename_lookup(AT_FDCWD, filename, 0, &path, NULL);

and guess what the difference is?

The difference is that path_openat() starts out with

        file = alloc_empty_file(op->open_flag, current_cred());

and when the open fails, it will free the file with

        fput(file);

So if there are a lot of failures (because "." is at the end of the
path), it will have done a fair amount of those useless file
allocations and frees.

And - not surprisingly - the "open once" is then faster if there are
*not* a lot of failures, when the executable is found early in the
PATH.

Now, there's no fundamental *reason* to do that alloc_empty_file()
early, except for how the code is structured.

It partly makes the error handling simpler and since all the cases
want the filp in the end, doing it at the top means that it's only
done once.

And we occasionally do use the file pointer early (ie lookup_open()
will clear/set FMODE_CREATED in it even if it doesn't otherwise touch
the file pointer) even before the final lookup - and at creation time,
atomic_open() will actually want it for the final lookup.

Anyway, the real fix is definitely to just fix do_filp_open() to at
least not be *too* eager to allocate a file pointer.

In fact, that code is oddly non-optimal for another reason: it does
that "allocate and free file" not just when the path lookup fails, but
it does it for things like RCU lookup failures too.

So what happens is that if RCU lookup fails, do_filp_open() will call
path_openat() twice: first with LOOKUP_RCU, and then without it. And
path_openat() will allocate that "struct file *" twice.

On NFS - or other filesystems that can return ESTALE - it will in fact
do it three times.

That's pretty disgusting.
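
As a rough userspace model of the retry structure described above (this is not the kernel code; every toy_ name and the toy_fs structure are invented for illustration), the following counts how often a file would be allocated for a single open when the allocation sits at the top of the per-attempt function:

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for LOOKUP_RCU and LOOKUP_REVAL. */
enum { TOY_LOOKUP_RCU = 1, TOY_LOOKUP_REVAL = 2 };

struct toy_fs {
	bool rcu_walk_fails;	/* the RCU-mode walk has to be retried */
	bool returns_estale;	/* NFS-like: stale until revalidated */
	int file_allocs;	/* how many times a file got allocated */
};

/* Models path_openat(): the file is allocated before any lookup is done. */
static int toy_path_openat(struct toy_fs *fs, int flags)
{
	fs->file_allocs++;	/* stands in for alloc_empty_file() */
	if ((flags & TOY_LOOKUP_RCU) && fs->rcu_walk_fails)
		return -ECHILD;	/* the file would be freed again here */
	if (!(flags & TOY_LOOKUP_REVAL) && fs->returns_estale)
		return -ESTALE;	/* and here */
	return 0;
}

/* Models the retry ladder: RCU walk, then ref-walk, then revalidation. */
static int toy_do_filp_open(struct toy_fs *fs)
{
	int err = toy_path_openat(fs, TOY_LOOKUP_RCU);

	if (err == -ECHILD)
		err = toy_path_openat(fs, 0);
	if (err == -ESTALE)
		err = toy_path_openat(fs, TOY_LOOKUP_REVAL);
	return err;
}
```

In this model a clean open allocates once, a failed RCU walk allocates twice, and a failed RCU walk followed by ESTALE allocates three times, which is the pattern complained about above.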

Al, comments? We *could* just special-case the execve() code not to
use do_filp_open() and avoid this issue that way, but it does feel
like even the regular open() case is pessimal with that whole RCU
situation.

                Linus

^ permalink raw reply	[relevance 91%]

* Re: [GIT PULL] execve updates for v6.8-rc1
  @ 2024-01-09 23:40 72%       ` Linus Torvalds
    0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-01-09 23:40 UTC (permalink / raw)
  To: Josh Triplett; +Cc: Kees Cook, Kees Cook, linux-kernel, Alexey Dobriyan

On Tue, 9 Jan 2024 at 10:57, Josh Triplett <josh@joshtriplett.org> wrote:
>
> But I do think the spawnbench
> benchmark I provided (which has fork-execvpe and vfork-execvpe and
> posix_spawnp variants) is representative of real-world patterns for how
> programs execute other programs on $PATH.

No, it really isn't.

I already pointed out that the benchmark is entirely broken, because
what it times is broken.

It basically shows the difference in times between the parent and
child both doing a clock_gettime(), but what happens in between those
is
 - the parent doing a fork
 - the child doing an execve
 - the child reading the time
and that makes it a "at which point did we happen to schedule"
benchmark, not an execve() benchmark.

Just as an example, imagine if we schedule the child immediately after
the fork, and the parent doesn't run at all.

That obviously gives the minimum time difference - what your benchmark
then treats as "best". Agreed?

Except does it?

You have random details like "who happens to do a page fault and has
to copy the stack page that has been marked COW, and that both the
parent and child have mapped and both will write to immediately after
the fork()"

If the child runs first, the child will take that page fault, and the
child will have to do the page copy and insert it into its own VM.

So maybe it's better if the parent runs first and takes the page fault
and does the copy, and the child runs on another CPU just a little bit
later, and sees that it now has an exclusive page and doesn't need to
copy it at all? Maybe it gets to the execve() faster that way, and
gets a lower time difference just by pure luck? Or at least has a CPU
core of its own while the parent does something else?

Now, with "fork()" *something* has to do the page copy before the
execve() happens, unless it's all very lucky and the child happens to
run with the stack just at a page boundary and just gets its own page
that way.

I suspect you'll get the best performance if you run everything on
just one CPU, and don't try to spread things out, at least if your L2
caches are big enough to fit there - just for the best cache
utilization.

Because if you try to run the two loads on different CPU cores (and
maybe avoid HT siblings too, to get the best throughput), you'll have
to push all the cached contents from the parent to the child.

And maybe that's ok on this one. It's almost certainly a good thing on
*some* loads, particularly if the child then ends up having more work
it does longer-term.

And yes, our scheduler tries to actually take cache affinity etc into
account, although the interaction with fork() may or may not be
optimal.

But my point is that what you are testing isn't actually the execve()
cycle, you're basically testing all these scheduler interactions on a
benchmark that doesn't actually match any real load.

Now, using vfork() instead of fork() will improve things, both from a
performance standpoint and from a "not as much a scheduler benchmark"
standpoint.

At least we don't have the issue with COW pages and trying to aim for
cache re-use, because there will be no overlap in execution of the
child and parent while they share the same VM. The parent is going to
stop in vfork(), the child is obviously best run on the same CPU until
it does an execve() and releases the parent, and at that point it's
*probably* best to try to run the new child on a different CPU, and
bring the parent back on the original CPU.

Except that behavior (which sounds to me like the best option in
general) is not AT ALL what your benchmark would consider the best
option - because all your spawn bench thing looks at is how quickly
the child gets to run, so things like "cache placement for parent" no
longer matter at all for spawnbench.

So for that benchmark, instead of maybe trying to keep the parent
local to its own caches, and run the (new) child with no cache
footprint on another CPU, the best numbers for your benchmark probably
come from running the new execve() on the same CPU and not running the
parent at all until later.

And those are good numbers for the spawnbench just because the process
was already on that CPU in the kernel, so not running the parent where
it makes sense is good, because all that matters by then is that you
want to run the child asap.

See? your benchmark doesn't actually even *attempt* to time how good
our fork-execve sequence is. It times something entirely different. It
basically gives the best score to a scheduler decision that probably
doesn't even make sense.

Or maybe it does. Who knows? Maybe we *should* change the scheduler to
do what is best for spawnbench.

But do you see why I think it's at least as much a scheduler benchmark
as it is an execve() one, and why I think it's likely not a very good
benchmark at all, because I _suspect_ that the best numbers come from
doing things that may or may not make sense..

Now, I sent out that other benchmark, which at least avoids the whole
scheduler thing, because it does everything as one single process. I'm
not saying that's a sensible benchmark _either_, but at least it's
targeted to just execve().

Another option would be to not time the whole "parent clock_gettime ->
child clock_gettime" sequence that makes no sense, but to just time
the whole "fork-execve-exit-wait" sequence (which you can do in the
parent).

Because at that point, you're not timing the difference between two
random points (where scheduling decisions will change what happens
between them), you're actually timing the cost of the *whole*
sequence. Sure, scheduling will still matter for the details, but at
least you've timed the whole work, rather than timed a random *part*
of the work where other things are then ignored entirely.

For example, once you time the whole thing, it's no longer a "did the
parent or the child do the COW copy"? We don't care. One or the other
has to take the cost, and it's part of the *whole* cost of the
operation. Sure, scheduling decisions will still end up mattering, so
it's not a pure execve() benchmark, but at least now it's a benchmark
for the whole load, not just a random part of it.
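
A minimal sketch of that "time the whole sequence in the parent" idea, assuming a POSIX system with CLOCK_MONOTONIC (the function name time_whole_spawn_ns and its interface are made up for this illustration, and this is not the spawnbench code):

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L	/* for clock_gettime() */
#endif
#include <stdint.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/*
 * Time one complete fork-execve-exit-wait cycle, measured only in the
 * parent.  Returns elapsed nanoseconds, or -1 on error.  The child's
 * wait status is stored in *status.
 */
static int64_t time_whole_spawn_ns(const char *path, char *const argv[],
				   char *const envp[], int *status)
{
	struct timespec start, end;
	pid_t pid;

	if (clock_gettime(CLOCK_MONOTONIC, &start))
		return -1;

	pid = fork();
	if (pid < 0)
		return -1;
	if (pid == 0) {
		execve(path, argv, envp);
		_exit(127);	/* only reached if execve() failed */
	}

	/* Both timestamps are taken by the same process. */
	if (waitpid(pid, status, 0) != pid)
		return -1;
	if (clock_gettime(CLOCK_MONOTONIC, &end))
		return -1;

	return (int64_t)(end.tv_sec - start.tv_sec) * 1000000000LL +
	       (end.tv_nsec - start.tv_nsec);
}
```

Whoever ends up paying for the COW copy, that cost lands inside the measured interval, so the number no longer depends on which side happened to take the fault. Scheduling still affects the result; it just no longer decides which part of the work gets counted.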

              Linus

^ permalink raw reply	[relevance 72%]

* Re: [GIT PULL] lsm/lsm-pr-20240105
  @ 2024-01-09 21:07 99% ` Linus Torvalds
    0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-01-09 21:07 UTC (permalink / raw)
  To: Paul Moore; +Cc: linux-security-module, linux-kernel

On Fri, 5 Jan 2024 at 15:21, Paul Moore <paul@paul-moore.com> wrote:
>
>             The highlights of the LSM pull
> request are below, but before we get to that I want to mention that I
> expect you will hit merge conflicts in the arch-specific syscall
> tables as well as in the doc userspace-api documentation index.  Some
> of these conflicts exist in your tree now (syscall tables), with some
> others likely depending on what is submitted from linux-next and the
> order in which you merge things.  All of the conflicts that I've seen
> have been rather trivial and easily resolved, but I wanted to give you
> a heads-up; if you want me to resolve any of these let me know.

The tooling header file updates by the LSM tree were particularly annoying.

Not because the conflicts were hard per se, but because you had done
the header files wrong in the first place.

Your version of the tooling header files just didn't match the real
ones, as you had added your new system calls at the end mindlessly,
without noticing that others had *not* done so, so all your tooling
header system call number additions were just the wrong numbers
entirely.

I fixed it up, but it added an extra layer of "this is just annoying".
You'd have been better off not touching the tooling headers at all,
rather than touch them incorrectly.

            Linus

^ permalink raw reply	[relevance 99%]

* Re: [syzbot] [kernel?] WARNING in signal_wake_up_state
  @ 2024-01-09 19:05 94% ` Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-01-09 19:05 UTC (permalink / raw)
  To: syzbot, Oleg Nesterov, Eric W. Biederman
  Cc: linux-kernel, luto, michael.christie, mst, peterz, syzkaller-bugs, tglx

Oleg/Eric, can you make any sense of this?

On Tue, 9 Jan 2024 at 10:18, syzbot
<syzbot+c6d438f2d77f96cae7c2@syzkaller.appspotmail.com> wrote:
>
> The issue was bisected to:
>
> commit f9010dbdce911ee1f1af1398a24b1f9f992e0080

Hmm. This smells more like a "that triggers the problem" than a cause.

Because the warning itself is

> WARNING: CPU: 1 PID: 5069 at kernel/signal.c:771 signal_wake_up_state+0xfa/0x120 kernel/signal.c:771

That's

        lockdep_assert_held(&t->sighand->siglock);

at the top of the function, with the call trace being

>  signal_wake_up include/linux/sched/signal.h:448 [inline]

just a wrapper setting 'state'.

>  zap_process fs/coredump.c:373 [inline]

That's zap_process() that does a

        for_each_thread(start, t) {

and then does a

                        signal_wake_up(t, 1);

on each thread.

>  zap_threads fs/coredump.c:392 [inline]

And this is zap_threads(), which does

        spin_lock_irq(&tsk->sighand->siglock);
        ...
                nr = zap_process(tsk, exit_code);

Strange. The sighand->siglock is definitely taken.

The for_each_thread() must be hitting a thread with a different
sighand, but it's basically a

        list_for_each_entry_rcu(..)

walking over the tsk->signal->thread_head list.

But if CLONE_THREAD is set (so that we share that 'tsk->signal'), then
we always require that CLONE_SIGHAND is also set:

        if ((clone_flags & CLONE_THREAD) && !(clone_flags & CLONE_SIGHAND))
                return ERR_PTR(-EINVAL);

so we most definitely should have the same ->sighand if we have the
same ->signal. And that's true very much for that vhost_task_create()
case too.
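
The flag rule quoted above can be exercised in isolation. This is only a userspace restatement of that one check, with the flag values copied from the Linux UAPI headers and TOY_ names chosen here so that nothing clashes with the real definitions:

```c
#include <errno.h>

/* Same numeric values as CLONE_SIGHAND and CLONE_THREAD. */
#define TOY_CLONE_SIGHAND	0x00000800UL
#define TOY_CLONE_THREAD	0x00010000UL

/*
 * A thread-group member shares ->signal, so it must also share
 * ->sighand: CLONE_THREAD without CLONE_SIGHAND is rejected.
 */
static int toy_check_clone_flags(unsigned long clone_flags)
{
	if ((clone_flags & TOY_CLONE_THREAD) &&
	    !(clone_flags & TOY_CLONE_SIGHAND))
		return -EINVAL;
	return 0;
}
```

Given that rule, any task reachable through the shared thread list should have been created with a shared sighand as well, which is why the lockdep warning looks so odd.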

So as far as I can see, that bisected commit does add a new case of
threaded signal handling, but in no way explains the problem.

Is there some odd exit race? The thread is removed with

        list_del_rcu(&p->thread_node);

in __exit_signal -> __unhash_process(), and despite the RCU
annotations, all these parts seem to hold the right locks too (ie
sighand->siglock is held by __exit_signal too), so I don't even see
any delayed de-allocation issue or anything like that.

Thus bringing in Eric/Oleg to see if they see something I miss.

Original email at

    https://lore.kernel.org/all/000000000000a41b82060e875721@google.com/

for your pleasure.

            Linus

^ permalink raw reply	[relevance 94%]

* Re: [GIT PULL] x86/mm changes for v6.8
  2024-01-09  2:06 99% ` Linus Torvalds
@ 2024-01-09  3:57 85%   ` Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-01-09  3:57 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: linux-kernel, the arch/x86 maintainers, Peter Zijlstra

[-- Attachment #1: Type: text/plain, Size: 743 bytes --]

On Mon, 8 Jan 2024 at 18:06, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> This does not even compile for me.
>
>   arch/x86/include/asm/uaccess_64.h: In function ‘__untagged_addr’:
>   arch/x86/include/asm/uaccess_64.h:25:28: error: implicit declaration
> of function ‘__my_cpu_var’; did you mean ‘put_cpu_var’?
> [-Werror=implicit-function-declaration]

Side note: the whole __my_cpu_var() reminds me of the attached patch
that I have in my testing tree, and have been carrying along for a
number of months now.

I definitely think it's the right thing to do, so here it is again,
even if it is only tangentially related to the build failure wrt this
broken pull request.

                   Linus

[-- Attachment #2: 0001-x86-clean-up-fpu-switching-to-not-load-current-in-th.patch --]
[-- Type: text/x-patch, Size: 4341 bytes --]

From 14f81cfd3aa3b53be9ad05801cdc7d7de91f807a Mon Sep 17 00:00:00 2001
From: Linus Torvalds <torvalds@linux-foundation.org>
Date: Mon, 16 Oct 2023 16:04:11 -0700
Subject: [PATCH] x86: clean up fpu switching to not load 'current' in the
 middle of task switching

It happens to work, but it's very very wrong, because our 'current'
macro is magic that is supposedly loading a stable value.

It just happens to be not quite stable enough and the compilers re-load
the value enough for this code to work.  But it's wrong.

It also generates worse code.

So fix it.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 arch/x86/include/asm/fpu/sched.h | 10 ++++++----
 arch/x86/kernel/process_32.c     |  7 +++----
 arch/x86/kernel/process_64.c     |  7 +++----
 3 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/arch/x86/include/asm/fpu/sched.h b/arch/x86/include/asm/fpu/sched.h
index ca6e5e5f16b2..c485f1944c5f 100644
--- a/arch/x86/include/asm/fpu/sched.h
+++ b/arch/x86/include/asm/fpu/sched.h
@@ -37,10 +37,12 @@ extern void fpu_flush_thread(void);
  * The FPU context is only stored/restored for a user task and
  * PF_KTHREAD is used to distinguish between kernel and user threads.
  */
-static inline void switch_fpu_prepare(struct fpu *old_fpu, int cpu)
+static inline void switch_fpu_prepare(struct task_struct *old, int cpu)
 {
 	if (cpu_feature_enabled(X86_FEATURE_FPU) &&
-	    !(current->flags & (PF_KTHREAD | PF_USER_WORKER))) {
+	    !(old->flags & (PF_KTHREAD | PF_USER_WORKER))) {
+		struct fpu *old_fpu = &old->thread.fpu;
+
 		save_fpregs_to_fpstate(old_fpu);
 		/*
 		 * The save operation preserved register state, so the
@@ -60,10 +62,10 @@ static inline void switch_fpu_prepare(struct fpu *old_fpu, int cpu)
  * Delay loading of the complete FPU state until the return to userland.
  * PKRU is handled separately.
  */
-static inline void switch_fpu_finish(void)
+static inline void switch_fpu_finish(struct task_struct *new)
 {
 	if (cpu_feature_enabled(X86_FEATURE_FPU))
-		set_thread_flag(TIF_NEED_FPU_LOAD);
+		set_tsk_thread_flag(new, TIF_NEED_FPU_LOAD);
 }
 
 #endif /* _ASM_X86_FPU_SCHED_H */
diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c
index 708c87b88cc1..0917c7f25720 100644
--- a/arch/x86/kernel/process_32.c
+++ b/arch/x86/kernel/process_32.c
@@ -156,13 +156,12 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
 {
 	struct thread_struct *prev = &prev_p->thread,
 			     *next = &next_p->thread;
-	struct fpu *prev_fpu = &prev->fpu;
 	int cpu = smp_processor_id();
 
 	/* never put a printk in __switch_to... printk() calls wake_up*() indirectly */
 
-	if (!test_thread_flag(TIF_NEED_FPU_LOAD))
-		switch_fpu_prepare(prev_fpu, cpu);
+	if (!test_tsk_thread_flag(prev_p, TIF_NEED_FPU_LOAD))
+		switch_fpu_prepare(prev_p, cpu);
 
 	/*
 	 * Save away %gs. No need to save %fs, as it was saved on the
@@ -209,7 +208,7 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
 
 	raw_cpu_write(pcpu_hot.current_task, next_p);
 
-	switch_fpu_finish();
+	switch_fpu_finish(next_p);
 
 	/* Load the Intel cache allocation PQR MSR. */
 	resctrl_sched_in(next_p);
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 33b268747bb7..1553e19904e0 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -562,14 +562,13 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
 {
 	struct thread_struct *prev = &prev_p->thread;
 	struct thread_struct *next = &next_p->thread;
-	struct fpu *prev_fpu = &prev->fpu;
 	int cpu = smp_processor_id();
 
 	WARN_ON_ONCE(IS_ENABLED(CONFIG_DEBUG_ENTRY) &&
 		     this_cpu_read(pcpu_hot.hardirq_stack_inuse));
 
-	if (!test_thread_flag(TIF_NEED_FPU_LOAD))
-		switch_fpu_prepare(prev_fpu, cpu);
+	if (!test_tsk_thread_flag(prev_p, TIF_NEED_FPU_LOAD))
+		switch_fpu_prepare(prev_p, cpu);
 
 	/* We must save %fs and %gs before load_TLS() because
 	 * %fs and %gs may be cleared by load_TLS().
@@ -623,7 +622,7 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
 	raw_cpu_write(pcpu_hot.current_task, next_p);
 	raw_cpu_write(pcpu_hot.top_of_stack, task_top_of_stack(next_p));
 
-	switch_fpu_finish();
+	switch_fpu_finish(next_p);
 
 	/* Reload sp0. */
 	update_task_stack(next_p);
-- 
2.43.0.5.g38fb137bdb


^ permalink raw reply related	[relevance 85%]

* Re: [GIT PULL] execve updates for v6.8-rc1
  2024-01-09  1:53 99%     ` Linus Torvalds
@ 2024-01-09  3:28 84%       ` Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-01-09  3:28 UTC (permalink / raw)
  To: Kees Cook; +Cc: Kees Cook, linux-kernel, Alexey Dobriyan, Josh Triplett

[-- Attachment #1: Type: text/plain, Size: 3428 bytes --]

On Mon, 8 Jan 2024 at 17:53, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> Because I *guarantee* that we can trivially write another benchmark
> that shows that looking up the pathname twice is worse.

Ok, so I just took a look at the alleged benchmark that was used for
the "look up twice" argument.

It looks quite broken.

What it seems to do is to "fork+execve" on a small file, and do
clock_gettime() in the parent and in the child, and add up the
differences between the times.

But that's just testing random scheduler interactions, not the speed
of fork/exec.

IOW, that one improves performance if you always run the child first
after the fork(), so that the child runs immediately, finishes the
work, and when the parent then resumes, it reads the completed result
from the pipe.

It will give big behavior changes for any scheduling behavior - like
trying to run children concurrently on another CPU vs running it
immediately on the same CPU etc etc.

Using "vfork()" instead of "fork()" will remove *one* variable, in
that it will force that "child runs first" behavior that you want, and
would likely help performance a lot. But even then you'll end up with
a scheduling benchmark: when the child does "execve()" that will now
wake up the parent again, and the *optimal* behavior is probably to
run the child fully until it does "exit" (well, at least until it runs
"clock_gettime()") before scheduling the parent.

You might get that by just forcing it all to run on one single CPU,
unless the wakeup by the execve() synchronously wakes up the parent.

IOW, you can probably get closer to the numbers you want with vfork(),
but even then it's a crap-shoot and depends on scheduling.

If you want to actually test execve() itself, you shouldn't use fork()
at all - you should literally execve() in a loop, using the execve()
argument as the "loop variable". That will actually test execve(), not
the scheduling of the child, which will be pretty random.

IOW, something (truly stupid) like the attached, and then you do

    $ gcc -O2 --static t.c
    $ time ./a.out 100000 1

to time a hundred thousand execve() calls.

Look ma, no fork, vfork, or scheduler interactions.

Of course, if you then want to check the pathname lookup failure cost,
you'd need to change the "execve()" into a "execvpe()" and play around
with the PATH variable, putting "." in different places etc. And you
might want to write your own PATH lookup one, to make sure it actually
uses the "execve()" system call and not "stat()" to find the
executable.
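
One possible shape for such a hand-rolled PATH walk, as a sketch only: exec_via_path is a hypothetical helper, it takes the search path as an explicit argument instead of reading $PATH, and its error handling is much simpler than what a libc execvpe() does. The point is that every candidate goes straight to execve(), with no stat() or access() probing first:

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Try "file" in each colon-separated component of "path" by calling
 * execve() directly.  Returns -1 with errno set if every attempt fails;
 * on success it does not return at all.
 */
static int exec_via_path(const char *file, char *const argv[],
			 char *const envp[], const char *path)
{
	char candidate[4096];
	int saved_errno = ENOENT;
	const char *p = path;

	for (;;) {
		size_t len = strcspn(p, ":");
		int n;

		/* An empty component means the current directory. */
		if (len == 0)
			n = snprintf(candidate, sizeof(candidate), "./%s", file);
		else
			n = snprintf(candidate, sizeof(candidate), "%.*s/%s",
				     (int)len, p, file);

		if (n > 0 && (size_t)n < sizeof(candidate)) {
			execve(candidate, argv, envp);
			/* Still here, so execve() failed for this one. */
			if (errno != ENOENT && errno != ENOTDIR)
				saved_errno = errno;
		}

		if (p[len] == '\0')
			break;
		p += len + 1;
	}

	errno = saved_errno;
	return -1;
}
```

With something like this, moving "." around in the path argument changes how many failed execve() lookups happen before the successful one, and that count is the variable of interest for the lookup-failure cost.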

.. and do you want to then check using "execveat()" (new model) vs
"path string created by appending in user space" (classic model)?

Tons of variables. For example, modern "execveat()" behavior is
*probably* using a small pathname that is looked up by opening the
different directories in $PATH, but the old-school thing that creates
pathnames all in user space and then does "execve()" on them will
probably have fairly heavy path lookup costs.

So now the whole "look up path twice" might be very differently
expensive depending on just how you ended up dealing with the $PATH
components. It *could* be cheap. Or it might be looking up a long
path.

End result: there's a million interactions here. You need to decide
what you want to test. But you *definitely* shouldn't decide to test
some random scheduler behavior and call it "execve cost".

                Linus

[-- Attachment #2: t.c --]
[-- Type: text/x-c-code, Size: 360 bytes --]

#include <unistd.h>
#include <stdlib.h>
#include <stdio.h>

int main(int argc, char **argv, char **envp)
{
	char buffer[10];
	int n, m;

	if (argc < 3)
		exit(1);
	n = atoi(argv[1]);
	if (n <= 0)
		exit(2);
	m = atoi(argv[2]);
	if (m >= n)
		exit(0);
	snprintf(buffer, sizeof(buffer), "%d", m+1);
	argv[2] = buffer;
	execve("./a.out", argv, envp);
	exit(3);
}

^ permalink raw reply	[relevance 84%]

* Re: [GIT PULL] x86/mm changes for v6.8
  @ 2024-01-09  2:06 99% ` Linus Torvalds
  2024-01-09  3:57 85%   ` Linus Torvalds
  0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-01-09  2:06 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: linux-kernel, the arch/x86 maintainers, Peter Zijlstra

On Mon, 8 Jan 2024 at 03:35, Ingo Molnar <mingo@kernel.org> wrote:
>
>  - Robustify pfn_to_kaddr()
>
>  - Improve the __untagged_addr() code: RIP-relative addresses are fine these days
>    and generate better code, and update misleading/outdated comments as well.

This does not even compile for me.

  arch/x86/include/asm/uaccess_64.h: In function ‘__untagged_addr’:
  arch/x86/include/asm/uaccess_64.h:25:28: error: implicit declaration
of function ‘__my_cpu_var’; did you mean ‘put_cpu_var’?
[-Werror=implicit-function-declaration]

WTH?

Maybe this has worked in your tree by mistake because there was some
branch dependency that just happened to work out because you had
merged things in a different order.

But that would very much not be ok regardless. Those branches should
be tested independently, and clearly they were not.

             Linus

^ permalink raw reply	[relevance 99%]

* Re: [GIT PULL] execve updates for v6.8-rc1
  @ 2024-01-09  1:53 99%     ` Linus Torvalds
  2024-01-09  3:28 84%       ` Linus Torvalds
    1 sibling, 1 reply; 200+ results
From: Linus Torvalds @ 2024-01-09  1:53 UTC (permalink / raw)
  To: Kees Cook; +Cc: Kees Cook, linux-kernel, Alexey Dobriyan, Josh Triplett

On Mon, 8 Jan 2024 at 17:48, Kees Cook <kees@kernel.org> wrote:
>
> This was exactly the feedback I had originally and wrote almost what you suggest:
>
> https://lore.kernel.org/lkml/202209161637.9EDAF6B18@keescook/
>
> But the perf testing of my proposed "look it up once" patch showed a
> net loss to the successful execs which no one could explain. In the
> end we went with the original proposal.

Basing things on one random benchmark which must clearly have some
very particular cache effects or something is not ok.

End result: I'm not taking a random "look up filename twice because we
can't understand what is going on".

Because I *guarantee* that we can trivially write another benchmark
that shows that looking up the pathname twice is worse.

                Linus

^ permalink raw reply	[relevance 99%]

* Re: [GIT PULL] vfs mount api updates
  @ 2024-01-09  1:02 99% ` Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-01-09  1:02 UTC (permalink / raw)
  To: Christian Brauner, Catalin Marinas, Will Deacon
  Cc: linux-fsdevel, linux-kernel

On Fri, 5 Jan 2024 at 04:47, Christian Brauner <brauner@kernel.org> wrote:
>
> This contains the work to retrieve detailed information about mounts via two
> new system calls.

Gaah. While I have an arm64 laptop now, I don't do arm64 builds in
between each pull like I do x86 ones.

I *did* just start one, because I got the arm64 pull request.

And this fails the arm64 build, because __NR_statmount and
__NR_listmount (457 and 458 respectively) exceed the compat system
call array size, which is

arch/arm64/include/asm/unistd.h:
  #define __NR_compat_syscalls            457
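
To spell out the off-by-one with the numbers quoted above (a toy bounds check, not the arm64 code; the TOY_ names are illustrative): a table sized 457 has valid indices 0 through 456, so a syscall numbered 457 is already out of range.

```c
#include <stdbool.h>

/* Values quoted in the mail above; the TOY_ names are illustrative only. */
#define TOY_NR_COMPAT_SYSCALLS	457
#define TOY_NR_STATMOUNT	457
#define TOY_NR_LISTMOUNT	458

/* A table with N entries has valid indices 0..N-1. */
static bool toy_fits_compat_table(int nr)
{
	return nr >= 0 && nr < TOY_NR_COMPAT_SYSCALLS;
}
```

Under that check neither new syscall number fits, so the limit would have to go up to at least 459 to cover both.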

I don't think this is a merge error, I think the error is there in the
original, but I'm about to go off and have dinner, so I'm just sending
this out for now.

How was this not noted in linux-next? Am I missing something?

Now, admittedly this looks like an easy mistake to make due to that
whole odd situation where the compat system calls are listed in
unistd32.h, but then the max number is in unistd.h, but I would still
have expected this to have raised flags before it hit my tree..

                Linus

^ permalink raw reply	[relevance 99%]

* Re: [GIT PULL] execve updates for v6.8-rc1
  2024-01-09  0:30 99%   ` Linus Torvalds
@ 2024-01-09  0:46 99%     ` Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-01-09  0:46 UTC (permalink / raw)
  To: Kees Cook; +Cc: linux-kernel, Alexey Dobriyan, Josh Triplett

On Mon, 8 Jan 2024 at 16:30, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> Again - note the "might". Somebody needs to actually test it.  I may
> try to do that in between pulls.

It boots. It builds a kernel. It must be perfect.

             Linus

^ permalink raw reply	[relevance 99%]

* Re: [GIT PULL] execve updates for v6.8-rc1
  2024-01-09  0:19 83% ` Linus Torvalds
@ 2024-01-09  0:30 99%   ` Linus Torvalds
  2024-01-09  0:46 99%     ` Linus Torvalds
    1 sibling, 1 reply; 200+ results
From: Linus Torvalds @ 2024-01-09  0:30 UTC (permalink / raw)
  To: Kees Cook; +Cc: linux-kernel, Alexey Dobriyan, Josh Triplett

On Mon, 8 Jan 2024 at 16:19, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> Anyway, I want to repeat: this patch is UNTESTED. It compiles for me.

Actually, I take that back. I did a clang build, and clang noted that
my "remove the retval initialization as unnecessary" was wrong,
because the

                if (!bprm->fdpath)
                        goto out_free;

code path in alloc_bprm() still wanted that initial -ENOMEM initialization.

So you need to fix the

        int retval;

in alloc_bprm() to be back to the original

        int retval = -ENOMEM;

but then it might all work.
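
The underlying pattern, reduced to a userspace sketch (toy_alloc_bprm, struct toy_bprm and both extra parameters are invented for illustration, and malloc() stands in for the kernel allocators): when a shared error label returns retval, every goto taken before retval is first assigned depends on the initializer.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

struct toy_bprm {
	char *fdpath;
};

/*
 * fail_fdpath simulates the fdpath allocation failing; init_err stands
 * in for the return value of a later init step such as bprm_mm_init().
 */
static int toy_alloc_bprm(struct toy_bprm **out, bool fail_fdpath, int init_err)
{
	struct toy_bprm *bprm;
	int retval = -ENOMEM;	/* needed by the early "goto out_free" */

	bprm = calloc(1, sizeof(*bprm));
	if (!bprm)
		return -ENOMEM;

	bprm->fdpath = fail_fdpath ? NULL : malloc(16);
	if (!bprm->fdpath)
		goto out_free;	/* retval has not been assigned yet */

	retval = init_err;
	if (!retval) {
		*out = bprm;
		return 0;
	}

out_free:
	free(bprm->fdpath);
	free(bprm);
	return retval;
}
```

Dropping the initializer would make the first goto return an indeterminate value, which is the kind of thing the clang build flagged.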

Again - note the "might". Somebody needs to actually test it.  I may
try to do that in between pulls.

            Linus

^ permalink raw reply	[relevance 99%]

* Re: [GIT PULL] execve updates for v6.8-rc1
  @ 2024-01-09  0:19 83% ` Linus Torvalds
  2024-01-09  0:30 99%   ` Linus Torvalds
    0 siblings, 2 replies; 200+ results
From: Linus Torvalds @ 2024-01-09  0:19 UTC (permalink / raw)
  To: Kees Cook; +Cc: linux-kernel, Alexey Dobriyan, Josh Triplett

[-- Attachment #1: Type: text/plain, Size: 1007 bytes --]

On Mon, 8 Jan 2024 at 10:35, Kees Cook <keescook@chromium.org> wrote:
>
> Josh Triplett (1):
>       fs/exec.c: Add fast path for ENOENT on PATH search before allocating mm

No, we're not doing this.

If you want to open the file before the allocations, then dammit, do
exactly that.

Don't look up the path twice. Something (ENTIRELY UNTESTED) like this
patch that just moves the open from "bprm_execve()" to "alloc_bprm()".
It actually cleans up the odd BINPRM_FLAGS_PATH_INACCESSIBLE case too,
by setting it where it makes sense.

Anyway, I want to repeat: this patch is UNTESTED. It compiles for me.
But that is literally all the testing it has gotten apart from a
cursory "this patch looks sane".

There might be something seriously wrong with this patch, but it at
least makes sense, unlike that horror that will look up the filename
twice.

I bet whatever benchmark did the original was not using long filenames
with lots of components, or was only testing the ENOENT case.

                   Linus

[-- Attachment #2: patch.diff --]
[-- Type: text/x-patch, Size: 4547 bytes --]

 fs/exec.c | 71 ++++++++++++++++++++++++++++++++-------------------------------
 1 file changed, 36 insertions(+), 35 deletions(-)

diff --git a/fs/exec.c b/fs/exec.c
index 4aa19b24f281..a7f6f50a453f 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1507,12 +1507,24 @@ static void free_bprm(struct linux_binprm *bprm)
 	kfree(bprm);
 }
 
-static struct linux_binprm *alloc_bprm(int fd, struct filename *filename)
+static struct linux_binprm *alloc_bprm(int fd, struct filename *filename, int flags)
 {
-	struct linux_binprm *bprm = kzalloc(sizeof(*bprm), GFP_KERNEL);
-	int retval = -ENOMEM;
-	if (!bprm)
-		goto out;
+	struct linux_binprm *bprm;
+	struct file *file;
+	int retval;
+
+	file = do_open_execat(fd, filename, flags);
+	if (IS_ERR(file))
+		return ERR_CAST(file);
+
+	bprm = kzalloc(sizeof(*bprm), GFP_KERNEL);
+	if (!bprm) {
+		allow_write_access(file);
+		fput(file);
+		return ERR_PTR(-ENOMEM);
+	}
+
+	bprm->file = file;
 
 	if (fd == AT_FDCWD || filename->name[0] == '/') {
 		bprm->filename = filename->name;
@@ -1525,18 +1537,28 @@ static struct linux_binprm *alloc_bprm(int fd, struct filename *filename)
 		if (!bprm->fdpath)
 			goto out_free;
 
+		/*
+		 * Record that a name derived from an O_CLOEXEC fd will be
+		 * inaccessible after exec.  This allows the code in exec to
+		 * choose to fail when the executable is not mmaped into the
+		 * interpreter and an open file descriptor is not passed to
+		 * the interpreter.  This makes for a better user experience
+		 * than having the interpreter start and then immediately fail
+		 * when it finds the executable is inaccessible.
+		 */
+		if (get_close_on_exec(fd))
+			bprm->interp_flags |= BINPRM_FLAGS_PATH_INACCESSIBLE;
+
 		bprm->filename = bprm->fdpath;
 	}
 	bprm->interp = bprm->filename;
 
 	retval = bprm_mm_init(bprm);
-	if (retval)
-		goto out_free;
-	return bprm;
+	if (!retval)
+		return bprm;
 
 out_free:
 	free_bprm(bprm);
-out:
 	return ERR_PTR(retval);
 }
 
@@ -1807,10 +1829,8 @@ static int exec_binprm(struct linux_binprm *bprm)
 /*
  * sys_execve() executes a new program.
  */
-static int bprm_execve(struct linux_binprm *bprm,
-		       int fd, struct filename *filename, int flags)
+static int bprm_execve(struct linux_binprm *bprm)
 {
-	struct file *file;
 	int retval;
 
 	retval = prepare_bprm_creds(bprm);
@@ -1826,26 +1846,8 @@ static int bprm_execve(struct linux_binprm *bprm,
 	current->in_execve = 1;
 	sched_mm_cid_before_execve(current);
 
-	file = do_open_execat(fd, filename, flags);
-	retval = PTR_ERR(file);
-	if (IS_ERR(file))
-		goto out_unmark;
-
 	sched_exec();
 
-	bprm->file = file;
-	/*
-	 * Record that a name derived from an O_CLOEXEC fd will be
-	 * inaccessible after exec.  This allows the code in exec to
-	 * choose to fail when the executable is not mmaped into the
-	 * interpreter and an open file descriptor is not passed to
-	 * the interpreter.  This makes for a better user experience
-	 * than having the interpreter start and then immediately fail
-	 * when it finds the executable is inaccessible.
-	 */
-	if (bprm->fdpath && get_close_on_exec(fd))
-		bprm->interp_flags |= BINPRM_FLAGS_PATH_INACCESSIBLE;
-
 	/* Set the unchanging part of bprm->cred */
 	retval = security_bprm_creds_for_exec(bprm);
 	if (retval)
@@ -1875,7 +1877,6 @@ static int bprm_execve(struct linux_binprm *bprm,
 	if (bprm->point_of_no_return && !fatal_signal_pending(current))
 		force_fatal_sig(SIGSEGV);
 
-out_unmark:
 	sched_mm_cid_after_execve(current);
 	current->fs->in_exec = 0;
 	current->in_execve = 0;
@@ -1910,7 +1911,7 @@ static int do_execveat_common(int fd, struct filename *filename,
 	 * further execve() calls fail. */
 	current->flags &= ~PF_NPROC_EXCEEDED;
 
-	bprm = alloc_bprm(fd, filename);
+	bprm = alloc_bprm(fd, filename, flags);
 	if (IS_ERR(bprm)) {
 		retval = PTR_ERR(bprm);
 		goto out_ret;
@@ -1959,7 +1960,7 @@ static int do_execveat_common(int fd, struct filename *filename,
 		bprm->argc = 1;
 	}
 
-	retval = bprm_execve(bprm, fd, filename, flags);
+	retval = bprm_execve(bprm);
 out_free:
 	free_bprm(bprm);
 
@@ -1984,7 +1985,7 @@ int kernel_execve(const char *kernel_filename,
 	if (IS_ERR(filename))
 		return PTR_ERR(filename);
 
-	bprm = alloc_bprm(fd, filename);
+	bprm = alloc_bprm(fd, filename, 0);
 	if (IS_ERR(bprm)) {
 		retval = PTR_ERR(bprm);
 		goto out_ret;
@@ -2019,7 +2020,7 @@ int kernel_execve(const char *kernel_filename,
 	if (retval < 0)
 		goto out_free;
 
-	retval = bprm_execve(bprm, fd, filename, 0);
+	retval = bprm_execve(bprm);
 out_free:
 	free_bprm(bprm);
 out_ret:

^ permalink raw reply related	[relevance 83%]

* Re: [PATCH next v4 0/5] minmax: Relax type checks in min() and max().
  2024-01-08 20:04 83%     ` Linus Torvalds
@ 2024-01-08 21:11 99%       ` Linus Torvalds
    0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-01-08 21:11 UTC (permalink / raw)
  To: Jiri Slaby
  Cc: David Laight, linux-kernel, Andy Shevchenko, Andrew Morton,
	Matthew Wilcox (Oracle),
	Christoph Hellwig, Jason A. Donenfeld

On Mon, 8 Jan 2024 at 12:04, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> So we *could* plan on that, remove the checks from min/max, and use
> something like the attached patch.

Whee.

On my machine, that patch makes an "allmodconfig" build go from

    10:41 elapsed

to

     8:46 elapsed

so that min/max type checking is almost 20% of the build time.

Yeah, I think we need to get rid of it.

Can somebody else confirm similar time differences? Or is it just me?
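
For anyone comparing their own numbers, the arithmetic behind the "almost
20%" can be sketched like this. The helper names are made up for
illustration; only the two wall-clock times come from the build above.

```c
/* Convert a "minutes:seconds" wall-clock time to seconds. */
static int to_seconds(int minutes, int seconds)
{
	return minutes * 60 + seconds;
}

/* Percentage of the 'before' time that the change removed. */
static double percent_saved(int before_s, int after_s)
{
	return 100.0 * (before_s - after_s) / before_s;
}
```

With 10:41 (641 seconds) before and 8:46 (526 seconds) after, the saving is
115 seconds, so percent_saved(641, 526) comes out at roughly 17.9.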

                 Linus

^ permalink raw reply	[relevance 99%]

* Re: [PATCH next v4 0/5] minmax: Relax type checks in min() and max().
  2024-01-08 18:19 97%   ` Linus Torvalds
@ 2024-01-08 20:04 83%     ` Linus Torvalds
  2024-01-08 21:11 99%       ` Linus Torvalds
  0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-01-08 20:04 UTC (permalink / raw)
  To: Jiri Slaby
  Cc: David Laight, linux-kernel, Andy Shevchenko, Andrew Morton,
	Matthew Wilcox (Oracle),
	Christoph Hellwig, Jason A. Donenfeld

[-- Attachment #1: Type: text/plain, Size: 721 bytes --]

On Mon, 8 Jan 2024 at 10:19, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> That said, I'm sure this thing exists to a smaller degree elsewhere. I
> wonder if we could simplify our min/max type tests..

Hmm. Gcc seems to have fixed the old (horrid) behavior of warning
about comparing an unsigned variable with a (signed) positive constant
integer, which caused lots of completely unacceptable warnings.

Which means that maybe we could some day enable -Wsign-compare, if we
just fix all the cases we didn't care about because the warning was
fundamentally broken and useless anyway.

So we *could* plan on that, remove the checks from min/max, and use
something like the attached patch.
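
As a reminder of what the min/max checks and -Wsign-compare are both trying
to catch, here is a minimal sketch. The function names are hypothetical.
The conversion is written out explicitly so the snippet builds cleanly
under any warning level; a bare "a < b" with these types behaves the same
way and is the form the compiler warning is about.

```c
#include <stdbool.h>

/*
 * What a mixed signed/unsigned comparison really does: 'a' is converted
 * to unsigned first, so -1 compares as UINT_MAX.
 */
static bool naive_less(int a, unsigned int b)
{
	return (unsigned int)a < b;
}

/* Handle the negative case explicitly, then compare as unsigned. */
static bool careful_less(int a, unsigned int b)
{
	return a < 0 || (unsigned int)a < b;
}
```

naive_less(-1, 1) is false while careful_less(-1, 1) is true, and that is
the kind of runtime surprise the type checks were added for.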

               Linus

[-- Attachment #2: patch.diff --]
[-- Type: text/x-patch, Size: 4682 bytes --]

 arch/x86/Makefile          |  2 --
 include/linux/irqchip.h    |  3 +++
 include/linux/minmax.h     | 31 +++----------------------------
 init/Kconfig               |  4 ++++
 scripts/Makefile.extrawarn |  1 +
 5 files changed, 11 insertions(+), 30 deletions(-)

diff --git a/arch/x86/Makefile b/arch/x86/Makefile
index 1a068de12a56..b4994eb934bc 100644
--- a/arch/x86/Makefile
+++ b/arch/x86/Makefile
@@ -186,8 +186,6 @@ ifeq ($(ACCUMULATE_OUTGOING_ARGS), 1)
 	KBUILD_CFLAGS += $(call cc-option,-maccumulate-outgoing-args,)
 endif
 
-# Workaround for a gcc prelease that unfortunately was shipped in a suse release
-KBUILD_CFLAGS += -Wno-sign-compare
 #
 KBUILD_CFLAGS += -fno-asynchronous-unwind-tables
 
diff --git a/include/linux/irqchip.h b/include/linux/irqchip.h
index d5e6024cb2a8..6488f3a3ca5c 100644
--- a/include/linux/irqchip.h
+++ b/include/linux/irqchip.h
@@ -20,6 +20,9 @@
 /* Undefined on purpose */
 extern of_irq_init_cb_t typecheck_irq_init_cb;
 
+#define __typecheck(x, y) \
+	(!!(sizeof((typeof(x) *)1 == (typeof(y) *)1)))
+
 #define typecheck_irq_init_cb(fn)					\
 	(__typecheck(typecheck_irq_init_cb, &fn) ? fn : fn)
 
diff --git a/include/linux/minmax.h b/include/linux/minmax.h
index 2ec559284a9f..b2e42c741859 100644
--- a/include/linux/minmax.h
+++ b/include/linux/minmax.h
@@ -8,37 +8,16 @@
 #include <linux/types.h>
 
 /*
- * min()/max()/clamp() macros must accomplish three things:
+ * min()/max()/clamp() macros must accomplish two things:
  *
  * - Avoid multiple evaluations of the arguments (so side-effects like
  *   "x++" happen only once) when non-constant.
  * - Retain result as a constant expressions when called with only
  *   constant expressions (to avoid tripping VLA warnings in stack
  *   allocation usage).
- * - Perform signed v unsigned type-checking (to generate compile
- *   errors instead of nasty runtime surprises).
- * - Unsigned char/short are always promoted to signed int and can be
- *   compared against signed or unsigned arguments.
- * - Unsigned arguments can be compared against non-negative signed constants.
- * - Comparison of a signed argument against an unsigned constant fails
- *   even if the constant is below __INT_MAX__ and could be cast to int.
+ *
+ * Hopefully, sign comparison warnings can be done by the compilers.
  */
-#define __typecheck(x, y) \
-	(!!(sizeof((typeof(x) *)1 == (typeof(y) *)1)))
-
-/* is_signed_type() isn't a constexpr for pointer types */
-#define __is_signed(x) 								\
-	__builtin_choose_expr(__is_constexpr(is_signed_type(typeof(x))),	\
-		is_signed_type(typeof(x)), 0)
-
-/* True for a non-negative signed int constant */
-#define __is_noneg_int(x)	\
-	(__builtin_choose_expr(__is_constexpr(x) && __is_signed(x), x, -1) >= 0)
-
-#define __types_ok(x, y) 					\
-	(__is_signed(x) == __is_signed(y) ||			\
-		__is_signed((x) + 0) == __is_signed((y) + 0) ||	\
-		__is_noneg_int(x) || __is_noneg_int(y))
 
 #define __cmp_op_min <
 #define __cmp_op_max >
@@ -48,8 +27,6 @@
 #define __cmp_once(op, x, y, unique_x, unique_y) ({	\
 	typeof(x) unique_x = (x);			\
 	typeof(y) unique_y = (y);			\
-	static_assert(__types_ok(x, y),			\
-		#op "(" #x ", " #y ") signedness error, fix types or consider u" #op "() before " #op "_t()"); \
 	__cmp(op, unique_x, unique_y); })
 
 #define __careful_cmp(op, x, y)					\
@@ -67,8 +44,6 @@
 	static_assert(__builtin_choose_expr(__is_constexpr((lo) > (hi)), 	\
 			(lo) <= (hi), true),					\
 		"clamp() low limit " #lo " greater than high limit " #hi);	\
-	static_assert(__types_ok(val, lo), "clamp() 'lo' signedness error");	\
-	static_assert(__types_ok(val, hi), "clamp() 'hi' signedness error");	\
 	__clamp(unique_val, unique_lo, unique_hi); })
 
 #define __careful_clamp(val, lo, hi) ({					\
diff --git a/init/Kconfig b/init/Kconfig
index 9ffb103fc927..0245253203c0 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -876,6 +876,10 @@ config CC_NO_ARRAY_BOUNDS
 	bool
 	default y if CC_IS_GCC && GCC_VERSION >= 110000 && GCC11_NO_ARRAY_BOUNDS
 
+# -Wsign-compare has traditionally been horrific
+config CC_NO_SIGN_COMPARE
+	bool
+	default y
 #
 # For architectures that know their GCC __int128 support is sound
 #
diff --git a/scripts/Makefile.extrawarn b/scripts/Makefile.extrawarn
index 2fe6f2828d37..edef0cbcf7d4 100644
--- a/scripts/Makefile.extrawarn
+++ b/scripts/Makefile.extrawarn
@@ -25,6 +25,7 @@ endif
 KBUILD_CPPFLAGS-$(CONFIG_WERROR) += -Werror
 KBUILD_CPPFLAGS += $(KBUILD_CPPFLAGS-y)
 KBUILD_CFLAGS-$(CONFIG_CC_NO_ARRAY_BOUNDS) += -Wno-array-bounds
+KBUILD_CFLAGS-$(CONFIG_CC_NO_SIGN_COMPARE) += -Wno-sign-compare
 
 ifdef CONFIG_CC_IS_CLANG
 # The kernel builds with '-std=gnu11' so use of GNU extensions is acceptable.

^ permalink raw reply related	[relevance 83%]

* Re: [PATCH next v4 0/5] minmax: Relax type checks in min() and max().
  @ 2024-01-08 18:19 97%   ` Linus Torvalds
  2024-01-08 20:04 83%     ` Linus Torvalds
  0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-01-08 18:19 UTC (permalink / raw)
  To: Jiri Slaby
  Cc: David Laight, linux-kernel, Andy Shevchenko, Andrew Morton,
	Matthew Wilcox (Oracle),
	Christoph Hellwig, Jason A. Donenfeld

On Mon, 8 Jan 2024 at 03:46, Jiri Slaby <jirislaby@gmail.com> wrote:
>
>    CPP [M] drivers/media/pci/solo6x10/solo6x10-p2m.i
> real    0m45,002s
>
> $ git revert 867046cc7027703f60a46339ffde91a1970f2901
>    CPP [M] drivers/media/pci/solo6x10/solo6x10-p2m.i
> real    0m11,132s
>
> $ git revert 4ead534fba42fc4fd41163297528d2aa731cd121
>    CPP [M] drivers/media/pci/solo6x10/solo6x10-p2m.i
> real    0m3,711s

Ouch. Yeah, that's unfortunate. There's a lot of nested nasty macro
expansion there, but that timing is excessive.

Sparse actually complains about that file:

  drivers/media/pci/solo6x10/solo6x10-p2m.c:309:13: error: too long token expansion
  drivers/media/pci/solo6x10/solo6x10-p2m.c:310:17: error: too long token expansion

and while that is a sparse limitation, it's still interesting. Having
that file expand to 122M is not ok.

In this case, I suspect the right thing to do is to simply not use
min()/max() in that header at all, but do something like

  --- a/drivers/media/pci/solo6x10/solo6x10-offsets.h
  +++ b/drivers/media/pci/solo6x10/solo6x10-offsets.h
  @@ -56,2 +56,5 @@

  +#define MIN(X, Y) ((X) < (Y) ? (X) : (Y))
  +#define MAX(X, Y) ((X) > (Y) ? (X) : (Y))
  +
   #define SOLO_MP4E_EXT_ADDR(__solo) \
  @@ -59,4 +62,4 @@
   #define SOLO_MP4E_EXT_SIZE(__solo) \
  -     max((..),                               \
  -         min(((..) - \
  +     MAX((..),                               \
  +         MIN(((..) - \
                 ..), 0x00ff0000))
  @@ -67,4 +70,4 @@
   #define SOLO_JPEG_EXT_SIZE(__solo) \
  -     max(..,                         \
  -         min(..)
  +     MAX(..,                         \
  +         MIN(..)

and avoid this issue.
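
A self-contained sketch of the trade-off, with made-up constants (the
EXAMPLE_* names and values are hypothetical, not the solo6x10 ones): plain
ternary macros expand to very little and still give a constant expression,
but they evaluate an argument twice, which only matters when the argument
has side effects.

```c
/* Plain ternary macros: cheap to expand, arguments may be evaluated twice. */
#define MIN(X, Y) ((X) < (Y) ? (X) : (Y))
#define MAX(X, Y) ((X) > (Y) ? (X) : (Y))

/* Hypothetical constants, loosely shaped like a nested size computation. */
#define EXAMPLE_TOTAL	0x01000000
#define EXAMPLE_USED	0x00200000
#define EXAMPLE_SIZE \
	MAX(0x00100000, MIN(EXAMPLE_TOTAL - EXAMPLE_USED, 0x00ff0000))

/* Still an integer constant expression, so it can initialize an enum. */
enum { example_size = EXAMPLE_SIZE };

static int call_count;

static int next_value(void)
{
	return ++call_count;
}

/* Arguments with side effects are the case these macros get wrong. */
static int min_with_side_effect(void)
{
	call_count = 0;
	return MIN(next_value(), 10);	/* next_value() runs twice */
}
```

For header constants like the ones in that file there are no side effects,
so the double evaluation costs nothing and the expansion stays small.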

That said, I'm sure this thing exists to a smaller degree elsewhere. I
wonder if we could simplify our min/max type tests..

             Linus

^ permalink raw reply	[relevance 97%]

* Linux 6.7
@ 2024-01-07 20:29 62% Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-01-07 20:29 UTC (permalink / raw)
  To: Linux Kernel Mailing List

So we had a little bit more going on last week compared to the holiday
week before that, but certainly not enough to make me think we'd want
to delay this any further.

End result: 6.7 is (in number of commits: over 17k non-merge commits,
with 1k+ merges) one of the largest kernel releases we've ever had,
but the extra rc8 week was purely due to timing with the holidays, not
about any difficulties with the larger release.

The main changes this last week were a few DRM updates (mainly fixes
for new hw enablement in this version - both amd and nouveau), some
more bcachefs fixes (and bcachefs is obviously new to 6.7 and one of
the reasons for the large number of commits), and then a few random
driver updates. And a smattering of minor noise elsewhere.

The shortlog is appended - there really isn't much there, you can
scroll through it quickly if you care about the details - and this
obviously means that tomorrow the merge window for 6.8 opens. I
already have two dozen+ early pull requests pending - thank you.

But please do kick the tires of this before the fun of the next
development series starts. Ok?

                 Linus

---

Aabish Malik (1):
      ALSA: hda/realtek: enable SND_PCI_QUIRK for hp pavilion 14-ec1xxx series

Adrian Cinal (1):
      net: bcmgenet: Fix FCS generation for fragmented skbuffs

Alex Deucher (2):
      drm/amd/display: add nv12 bounding box
      drm/amdgpu: skip gpu_info fw loading on navi12

Alex Henrie (1):
      Revert "net: ipv6/addrconf: clamp preferred_lft to the minimum required"

Alexander Lobakin (1):
      idpf: fix corrupted frames and skb leaks in singleq mode

Andrii Staikov (1):
      i40e: Restore VF MSI-X state during PCI reset

Andy Chi (1):
      ALSA: hda/realtek: fix mute/micmute LEDs for a HP ZBook

Arkadiusz Kubalewski (1):
      ice: dpll: fix phase offset value

Arnd Bergmann (1):
      ALSA: hda: cs35l41: fix building without CONFIG_SPI

Asad Kamal (5):
      drm/amd/pm: Use separate metric table for APU
      drm/amd/pm: Update metric table for jpeg/vcn data
      drm/amd/pm: Add mem_busy_percent for GCv9.4.3 apu
      drm/amd/pm: Add gpu_metrics_v1_5
      drm/amd/pm: Use gpu_metrics_v1_5 for SMUv13.0.6

Attreyee Mukherjee (1):
      Documentation/i2c: fix spelling error in i2c-address-translators

Baolin Wang (1):
      mm: memcg: fix split queue list crash when large folio migration

Benjamin Bara (1):
      i2c: core: Fix atomic xfer check for non-preempt config

Benjamin Berg (2):
      wifi: mac80211: do not re-add debugfs entries during resume
      wifi: mac80211: add/remove driver debugfs entries as appropriate

Bjorn Helgaas (2):
      Revert "PCI/ASPM: Remove pcie_aspm_pm_state_change()"
      MAINTAINERS: Orphan Cadence PCIe IP

Brad Cowie (1):
      netfilter: nf_nat: fix action not being set for all ct states

Chancel Liu (1):
      ASoC: fsl_rpmsg: Fix error handler with pm_runtime_enable

Chen Ni (1):
      asix: Add check for usbnet_get_endpoints

Claudiu Beznea (1):
      net: ravb: Wait for operating mode to be applied

Dave Airlie (9):
      nouveau/gsp: add three notifier callbacks that we see in normal operation (v2)
      nouveau/gsp: drop some acpi related debug
      nouveau: fix disp disabling with GSP
      nouveau/gsp: free acpi object after use
      nouveau/gsp: free userd allocation.
      nouveau/gsp: convert gsp errors to generic errors
      nouveau/gsp: don't free ctrl messages on errors
      nouveau/gsp: always free the alloc messages on r535
      nouveau: push event block/allowing out of the fence context

David Thompson (1):
      mlxbf_gige: fix receive packet race condition

Dinghao Liu (1):
      net/qla3xxx: fix potential memleak in ql_alloc_buffer_queues

Dmitry Safonov (2):
      net/tcp_sigpool: Use kref_get_unless_zero()
      net/tcp: Only produce AO/MD5 logs if there are any keys

Douglas Anderson (3):
      drm/bridge: parade-ps8640: Never store more than msg->size bytes in AUX xfer
      drm/bridge: ti-sn65dsi86: Never store more than msg->size bytes in AUX xfer
      drm/bridge: ps8640: Fix size mismatch warning w/ len

Eugen Hristev (1):
      ASoC: mediatek: mt8186: fix AUD_PAD_TOP register and offset

Geert Uytterhoeven (1):
      mmc: core: Cancel delayed work before releasing host

Geoffrey D. Bennett (1):
      ALSA: scarlett2: Convert meter levels from little-endian

Gergo Koteles (4):
      ALSA: hda/tas2781: do not use regcache
      ALSA: hda/tas2781: fix typos in comment
      ALSA: hda/tas2781: move set_drv_data outside tasdevice_init
      ALSA: hda/tas2781: remove sound controls in unbind

Hangbin Liu (1):
      selftests: bonding: do not set port down when adding to bond

Hangyu Hua (1):
      net: sched: em_text: fix possible memory leak in em_text_destroy()

Jeff Layton (1):
      nfsd: drop the nfsd_put helper

Jeffrey Hugo (1):
      accel/qaic: Implement quirk for SOC_HW_VERSION

Jiajun Xie (1):
      mm: fix unmap_mapping_range high bits shift bug

Jingbo Xu (2):
      mm: fix arithmetic for bdi min_ratio
      mm: fix arithmetic for max_prop_frac when setting max_ratio

Jinghao Jia (1):
      x86/kprobes: fix incorrect return address calculation in kprobe_emulate_call_indirect

Jocelyn Falempe (1):
      drm/mgag200: Fix gamma lut not initialized for G200ER, G200EV, G200SE

Johannes Berg (1):
      wifi: iwlwifi: pcie: don't synchronize IRQs from IRQ

John Johansen (1):
      apparmor: Fix move_mount mediation by detecting if source is detached

Jorge Ramirez-Ortiz (1):
      mmc: rpmb: fixes pause retune on all RPMB partitions.

Joshua Ashton (1):
      drm/amd/display: Fix sending VSC (+ colorimetry) packets for DP/eDP displays without PSR

Jörn-Thorben Hinz (1):
      net: Implement missing getsockopt(SO_TIMESTAMPING_NEW)

Kai-Heng Feng (1):
      r8169: Fix PCI error on system resume

Katarzyna Wieczerzycka (1):
      ice: Fix link_down_on_close message

Ke Xiao (1):
      i40e: fix use-after-free in i40e_aqc_add_filters()

Kent Overstreet (13):
      bcachefs: Fix extents iteration + snapshots interaction
      bcachefs: fix invalid free in dio write path
      bcachefs: fix setting version_upgrade_complete
      bcachefs: Factor out darray resize slowpath
      bcachefs: Switch darray to kvmalloc()
      bcachefs: DARRAY_PREALLOCATED()
      bcachefs: fix buffer overflow in nocow write path
      bcachefs: move BCH_SB_ERRS() to sb-errors_types.h
      bcachefs: prt_bitflags_vector()
      bcachefs: Add persistent identifiers for recovery passes
      bcachefs: bch_sb.recovery_passes_required
      bcachefs: bch_sb_field_downgrade
      bcachefs: make RO snapshots actually RO

Khaled Almahallawy (1):
      drm/i915/dp: Fix passing the correct DPCD_REV for drm_dp_set_phy_test_pattern

Kurt Kanzenbach (3):
      igc: Report VLAN EtherType matching back to user
      igc: Check VLAN TCI mask
      igc: Check VLAN EtherType mask

Linus Torvalds (2):
      x86/csum: clean up `csum_partial' further
      Linux 6.7

Lukas Bulwahn (1):
      MAINTAINERS: wifi: brcm80211: remove non-existing SHA-cyfmac-dev-list@infineon.com

Lyude Paul (2):
      drm/nouveau/gsp: Fix ACPI MXDM/MXDS method invocations
      drm/nouveau/dp: Honor GSP link training retry timeouts

Marc Dionne (1):
      net: Save and restore msg_namelen in sock_sendmsg

Marcin Wojtas (1):
      MAINTAINERS: Update mvpp2 driver email

Mark Brown (4):
      ASoC: meson: g12a-toacodec: Validate written enum values
      ASoC: meson: g12a-tohdmitx: Validate written enum values
      ASoC: meson: g12a-toacodec: Fix event generation
      ASoC: meson: g12a-tohdmitx: Fix event generation for S/PDIF mux

Mathieu Othacehe (1):
      mailmap: add entries for Mathieu Othacehe

Matthieu Baerts (1):
      MAINTAINERS: add Geliang as reviewer for MPTCP

Michael Chan (1):
      bnxt_en: Remove mis-applied code from bnxt_cfg_ntp_filters()

Mike Kravetz (1):
      MAINTAINERS: remove hugetlb maintainer Mike Kravetz

Naoya Horiguchi (1):
      MAINTAINERS: hand over hwpoison maintainership to Miaohe Lin

Naveen Mamindlapalli (2):
      octeontx2-af: Always configure NIX TX link credits based on max frame size
      octeontx2-af: Re-enable MAC TX in otx2_stop processing

Ngai-Mint Kwan (1):
      ice: Shut down VSI with "link-down-on-close" enabled

Noah Goldstein (1):
      x86/csum: Remove unnecessary odd handling

Pablo Neira Ayuso (3):
      netfilter: nf_tables: set transport offset from mac header for netdev/egress
      netfilter: nf_tables: skip set commit for deleted/destroyed sets
      netfilter: nft_immediate: drop chain reference counter on error

Paolo Abeni (1):
      mptcp: prevent tcp diag from closing listener subflows

Paolo Bonzini (1):
      KVM: x86/pmu: fix masking logic for MSR_CORE_PERF_GLOBAL_CTRL

Paul Greenwalt (1):
      ice: fix Get link status data length

Pavan Kumar Linga (1):
      idpf: avoid compiler introduced padding in virtchnl2_rss_key struct

Peter Ujfalusi (1):
      ASoC: SOF: Intel: hda-codec: Delay the codec device registration

Pranjal Ramajor Asha Kanojiya (1):
      accel/qaic: Fix GEM import path code

Radu Pirea (NXP OSS) (1):
      MAINTAINERS: step down as TJA11XX C45 maintainer

Randy Dunlap (1):
      net: phy: linux/phy.h: fix Excess kernel-doc description warning

Rik van Riel (1):
      mm: align larger anonymous mappings on THP boundaries

Rodrigo Cataldo (1):
      igc: Fix hicredit calculation

Sagi Maimon (1):
      ptp: ocp: fix bug in unregistering the DPLL subsystem

Sarannya S (1):
      net: qrtr: ns: Return 0 if server port is not present

Shin'ichiro Kawasaki (1):
      Revert "platform/x86: p2sb: Allow p2sb_bar() calls during PCI device probe"

Shyam Prasad N (3):
      cifs: after disabling multichannel, mark tcon for reconnect
      cifs: cifs_chan_is_iface_active should be called with chan_lock held
      cifs: do not depend on release_iface for maintaining iface_list

Siddh Raman Pant (2):
      nfc: llcp_core: Hold a ref to llcp_local->dev when holding a ref to llcp_local
      nfc: Do not send datagram if socket state isn't LLCP_BOUND

Siddhesh Dharme (1):
      ALSA: hda/realtek: Fix mute and mic-mute LEDs for HP ProBook 440 G6

Stefan Wahren (2):
      ARM: sun9i: smp: Fix array-index-out-of-bounds read in sunxi_mc_smp_init
      ARM: sun9i: smp: fix return code check of of_property_match_string

Steven Rostedt (Google) (2):
      tracefs: Check for dentry->d_inode exists in set_gid()
      eventfs: Fix bitwise fields for "is_events"

Sudheer Mogilappagari (1):
      i40e: Fix filter input checks to prevent config with invalid values

Suman Ghosh (1):
      octeontx2-af: Fix marking couple of structure as __packed

Suren Baghdasaryan (1):
      arch/mm/fault: fix major fault accounting when retrying under per-VMA lock

Takashi Sakamoto (1):
      firewire: ohci: suppress unexpected system reboot in AMD Ryzen machines and ASM108x/VT630x PCIe cards

Tetsuo Handa (1):
      mm: shrinker: use kvzalloc_node() from expand_one_shrinker_info()

Thomas Lange (1):
      net: Implement missing SO_TIMESTAMPING_NEW cmsg support

Umesh Nerlige Ramappa (1):
      drm/i915/perf: Update handling of MMIO triggered reports

Wayne Lin (1):
      drm/amd/display: pbn_div need be updated for hotplug event

Wen Gu (1):
      net/smc: fix invalid link access in dumping SMC-R connections

Wenchao Chen (1):
      mmc: sdhci-sprd: Fix eMMC init failure after hw reset

Xuan Zhuo (1):
      virtio_net: fix missing dma unmap for resize

Yu Zhao (1):
      mm/mglru: skip special VMAs in lru_gen_look_around()

Yuntao Wang (1):
      efi/x86: Fix the missing KASLR_FLAG bit in boot_params->hdr.loadflags

Zack Rusin (1):
      MAINTAINERS: change vmware.com addresses to broadcom.com

Zhipeng Lu (1):
      sfc: fix a double-free bug in efx_probe_filters

Ziyang Huang (1):
      mmc: meson-mx-sdhc: Fix initialization frozen issue

wangkeqi (1):
      connector: Fix proc_event_num_listeners count not cleared

^ permalink raw reply	[relevance 62%]

* Re: [RFC PATCH v4 2/4] mseal: add mseal syscall
  @ 2024-01-07 18:41 99%   ` Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-01-07 18:41 UTC (permalink / raw)
  To: jeffxu
  Cc: akpm, keescook, jannh, sroettger, willy, gregkh, usama.anjum,
	jeffxu, jorgelo, groeck, linux-kernel, linux-kselftest, linux-mm,
	pedro.falcato, dave.hansen, linux-hardening, deraadt

One comment:

On Thu, 4 Jan 2024 at 10:51, <jeffxu@chromium.org> wrote:
>
> diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c
> index 9a846439b36a..02280199069b 100644
> --- a/kernel/sys_ni.c
> +++ b/kernel/sys_ni.c
> @@ -193,6 +193,7 @@ COND_SYSCALL(migrate_pages);
>  COND_SYSCALL(move_pages);
>  COND_SYSCALL(set_mempolicy_home_node);
>  COND_SYSCALL(cachestat);
> +COND_SYSCALL(mseal);
>
>  COND_SYSCALL(perf_event_open);
>  COND_SYSCALL(accept4);

Move this part to PATCH 1/4, so that it all builds cleanly.

Other than that, this seems all reasonable to me now.

                  Linus

^ permalink raw reply	[relevance 99%]

* Re: include/asm-generic/unaligned.h:119:16: sparse: sparse: cast truncates bits from constant value (aa01a0 becomes a0)
  @ 2024-01-07  5:54 98%   ` Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-01-07  5:54 UTC (permalink / raw)
  To: Dmitry Torokhov
  Cc: kernel test robot, Arnd Bergmann, linux-sparse, Chris Morgan,
	oe-kbuild-all, linux-kernel

On Sat, 6 Jan 2024 at 16:42, Dmitry Torokhov <dmitry.torokhov@gmail.com> wrote:
>
> This is not really a kernel/driver bug, just sparse being over-eager
> with truncation detection. I wonder if we could make sparse skip this
> check on forced casts like this:

No, please don't.

Just face the fact that using integer casts to mask bits off is a bad idea.

Yes, we could say "explicit casting is ok", since it's really the
hidden implicit casts changing values that sparse complains about, but
your solution is really ugly:

>  static inline void __put_unaligned_be24(const u32 val, u8 *p)
>  {
> -       *p++ = val >> 16;
> -       *p++ = val >> 8;
> -       *p++ = val;
> +       *p++ = (__force u8)(val >> 16);
> +       *p++ = (__force u8)(val >> 8);
> +       *p++ = (__force u8)val;
>  }

That's just disgusting.

The *natural* thing to do is to simply make the masking itself be
explicit - not the cast. IOW, just write it as

        *p++ = (val >> 16) & 0xff;
        *p++ = (val >> 8) & 0xff;
        *p++ = val & 0xff;

and doesn't that look much more natural?

Sure, the compiler will then just notice "you're assigning to a char,
so I don't actually need to do any masking at all", but now sparse
won't complain because there's no "cast silently drops bits" issue any
more.

And while the code is a bit more to read, I think it is actually to
some degree more obvious to a human too what is going on.

No?
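
Spelled out as a standalone userspace sketch, with hypothetical names
put_be24() and get_be24() standing in for the kernel helpers and <stdint.h>
types instead of u8/u32:

```c
#include <stdint.h>

/* Store the low 24 bits of val at p, most significant byte first. */
static inline void put_be24(uint32_t val, uint8_t *p)
{
	*p++ = (val >> 16) & 0xff;
	*p++ = (val >> 8) & 0xff;
	*p++ = val & 0xff;
}

/* Matching load, included only so the store can be checked. */
static inline uint32_t get_be24(const uint8_t *p)
{
	return ((uint32_t)p[0] << 16) | ((uint32_t)p[1] << 8) | p[2];
}
```

Storing 0xaa01a0 this way yields the bytes aa 01 a0, and the explicit
masks make it obvious that anything above bit 23 is dropped on purpose.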

              Linus

^ permalink raw reply	[relevance 98%]

* Re: x86/csum: Remove unnecessary odd handling
  @ 2024-01-06 19:32 90%                           ` Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-01-06 19:32 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David Laight, Noah Goldstein, kernel test robot, x86,
	oe-kbuild-all, linux-kernel, tglx, mingo, bp, dave.hansen, hpa

On Sat, 6 Jan 2024 at 02:26, Eric Dumazet <edumazet@google.com> wrote:
>
> On a related note, at least with clang, I found that csum_ipv6_magic()
> is needlessly using temporary on-stack storage,
> showing a stall on Cascade Lake unless I am patching add32_with_carry() :

This is a known compiler bug in clang:

    https://github.com/llvm/llvm-project/issues/20571
    https://github.com/llvm/llvm-project/issues/30873
    https://github.com/llvm/llvm-project/issues/34837

and while we could always just use "r" for constraints, it will

 (a) possibly run out of registers in some situations

 (b) pessimize gcc that does this right

Can you please push the clang people to not do the stupid thing they
do now, which is to say "oh, I can use registers _or_ memory, so I'll
always use memory".

Btw, it's actually much worse than that, and clang is just doing
incredibly broken things. Switching to "r" just hides the garbage.

Because sometimes the source really *is* memory, ie we have

    net/ipv4/udp.c:
                 csum = csum_add(csum, frags->csum);

and then it's criminally stupid to load it into a register when it can
be just used directly.

But clang says "criminally stupid? *I* will show you stupid!" -
because what *clang* does with the above is this thing of beauty:

        movl    136(%rax), %edi
        movl    %edi, 28(%rsp)
        addl    28(%rsp), %ecx
        adcl    $0, %ecx

which has turned from "criminally stupid" to "it's art, I tell you -
you're not supposed to understand it".

Anyway, this is *literally* a clang bug. Complain to clang people for
being *extra* stupid - we told the compiler that it can use a register
or memory, and clang decided "I'll use memory", so then when we gave
it a memory location, it said "no, not *that* memory - I'll just
reload it on stack".

In contrast, gcc gets this right - and for that udp.c case it just generates

        addl 136(%rax),%ecx     # frags_67->D.58941.D.58869.D.58836.csum, a
        adcl $0,%ecx    # a

like it should.
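
For reference, the construct under discussion looks roughly like the
sketch below. It is modelled on the kernel helper but written for
userspace with <stdint.h> types, and the non-x86 fallback branch is my
addition so the snippet builds anywhere. The "rm" constraint is the part
that matters: it tells the compiler the second operand may be either a
register or a memory location.

```c
#include <stdint.h>

/* 32-bit add that folds the carry-out back into the result. */
static inline uint32_t add32_with_carry(uint32_t a, uint32_t b)
{
#if defined(__x86_64__) || defined(__i386__)
	/* "rm": the compiler may pick a register or a memory operand for b. */
	__asm__("addl %2,%0\n\t"
		"adcl $0,%0"
		: "=r" (a)
		: "0" (a), "rm" (b));
	return a;
#else
	/* Portable fallback: detect the wrap and add the carry back in. */
	uint32_t sum = a + b;

	return sum + (sum < a);
#endif
}
```

When b already lives in memory, as with frags->csum, a compiler that
honours "rm" sensibly can use the memory operand directly in the addl.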

And for csum_ipv6_magic, gcc generates this:

        addl %edx,%eax  # tmp112, a
        adcl $0,%eax    # a

IOW, the kernel is *right*, and this is purely clang being completely bogus.

I really don't want to penalize a good compiler because a bad one
can't get its act together.

               Linus

^ permalink raw reply	[relevance 90%]

* Re: x86/csum: Remove unnecessary odd handling
  @ 2024-01-06  0:18 99%                       ` Linus Torvalds
    0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-01-06  0:18 UTC (permalink / raw)
  To: David Laight
  Cc: Noah Goldstein, kernel test robot, x86, oe-kbuild-all,
	linux-kernel, edumazet, tglx, mingo, bp, dave.hansen, hpa

On Fri, 5 Jan 2024 at 15:53, David Laight <David.Laight@aculab.com> wrote:
>
> I'd have to fix his benchmark code first :-)
> You can't use the TSC unless you lock the cpu frequency.
> The longer the test runs for the faster the cpu will run.

They'll stabilize, it has some cycle result aging code.

But yes, set the CPU policy to 'performance' or do performance
counters if you care deeply.

> On a related point, do you remember what the 'killer app'
> was for doing the checksum in copy_to/from_user?

No. It's a long time ago, and many things have changed since.

It's possible the copy-and-csum is not worth it any more, simply
because all modern network cards will do the csum for us, and I think
loopback sets a flag saying "no need to checksum" too.

But I do have a strong memory of it being a big deal back when. A
_loong_ time ago.

          Linus

^ permalink raw reply	[relevance 99%]

* Re: x86/csum: Remove unnecessary odd handling
  @ 2024-01-05 18:05 82%                   ` Linus Torvalds
    0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-01-05 18:05 UTC (permalink / raw)
  To: David Laight
  Cc: Noah Goldstein, kernel test robot, x86, oe-kbuild-all,
	linux-kernel, edumazet, tglx, mingo, bp, dave.hansen, hpa

[-- Attachment #1: Type: text/plain, Size: 450 bytes --]

On Fri, 5 Jan 2024 at 02:41, David Laight <David.Laight@aculab.com> wrote:
>
> Interesting, I'm pretty sure trying to get two blocks of
>  'adc' scheduled in parallel like that doesn't work.

You should check out the benchmark at

       https://github.com/fenrus75/csum_partial

and see if you can improve on it. I'm including the patch (on top of
that code by Arjan) to implement the actual current kernel version as
"New version".

         Linus

[-- Attachment #2: p --]
[-- Type: application/octet-stream, Size: 4840 bytes --]

From 6ff7f7a72a4855970b1621ac9724c44c393a6d44 Mon Sep 17 00:00:00 2001
From: Linus Torvalds <torvalds@linux-foundation.org>
Date: Fri, 5 Jan 2024 09:46:32 -0800
Subject: [PATCH] Add the current kernel version as "New version"

---
 Makefile       |   3 --
 csum_partial.c | 117 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 115 insertions(+), 5 deletions(-)

diff --git a/Makefile b/Makefile
index e4b1bb3..4e29f8a 100644
--- a/Makefile
+++ b/Makefile
@@ -17,6 +17,3 @@ chain2.svg: graphs/chain2.dot
 chain2a.svg: graphs/chain2a.dot
 	dot -Tsvg -O graphs/chain2a.dot  
 	mv graphs/chain2a.dot.svg chain2a.svg
-	
-	
-	
\ No newline at end of file
diff --git a/csum_partial.c b/csum_partial.c
index 4db0d97..ddf6acd 100644
--- a/csum_partial.c
+++ b/csum_partial.c
@@ -14,13 +14,28 @@
 #include <time.h>
 
 typedef uint32_t __wsum;
+typedef uint32_t __u32;
+typedef uint64_t __u64;
 typedef uint64_t u64;
 typedef uint32_t u32;
+# define likely(x) __builtin_expect(!!(x), 1)
 # define unlikely(x) __builtin_expect(!!(x), 0)
 
+#define __force
+
 #define LOOPCOUNT 102400
 #define PACKETSIZE 40
 
+/**
+ * ror64 - rotate a 64-bit value right
+ * @word: value to rotate
+ * @shift: bits to roll
+ */
+static inline __u64 ror64(__u64 word, unsigned int shift)
+{
+	return (word >> (shift & 63)) | (word << ((-shift) & 63));
+}
+
 static inline unsigned long load_unaligned_zeropad(const void *addr)
 {
 	unsigned long ret, dummy;
@@ -484,7 +499,105 @@ static inline __wsum nulltest(const void *buff, int len, __wsum sum)
 {
 	return 2;
 }
+static inline __wsum csum_finalize_sum(u64 temp64)
+{
+	return (__force __wsum)((temp64 + ror64(temp64, 32)) >> 32);
+}
 
+static inline unsigned long update_csum_40b(unsigned long sum, const unsigned long m[5])
+{
+	asm("addq %1,%0\n\t"
+	     "adcq %2,%0\n\t"
+	     "adcq %3,%0\n\t"
+	     "adcq %4,%0\n\t"
+	     "adcq %5,%0\n\t"
+	     "adcq $0,%0"
+		:"+r" (sum)
+		:"m" (m[0]), "m" (m[1]), "m" (m[2]),
+		 "m" (m[3]), "m" (m[4]));
+	return sum;
+}
+
+/*
+ * Do a checksum on an arbitrary memory area.
+ * Returns a 32bit checksum.
+ *
+ * This isn't as time critical as it used to be because many NICs
+ * do hardware checksumming these days.
+ *
+ * Still, with CHECKSUM_COMPLETE this is called to compute
+ * checksums on IPv6 headers (40 bytes) and other small parts.
+ * it's best to have buff aligned on a 64-bit boundary
+ */
+__wsum csum_partial_new(const void *buff, int len, __wsum sum)
+{
+	u64 temp64 = (__force u64)sum;
+
+	/* Do two 40-byte chunks in parallel to get better ILP */
+	if (likely(len >= 80)) {
+		u64 temp64_2 = 0;
+		do {
+			temp64 = update_csum_40b(temp64, buff);
+			temp64_2 = update_csum_40b(temp64_2, buff + 40);
+			buff += 80;
+			len -= 80;
+		} while (len >= 80);
+
+		asm("addq %1,%0\n\t"
+		    "adcq $0,%0"
+		    :"+r" (temp64): "r" (temp64_2));
+	}
+
+	/*
+	 * len == 40 is the hot case due to IPv6 headers, so return
+	 * early for that exact case without checking the tail bytes.
+	 */
+	if (len >= 40) {
+		temp64 = update_csum_40b(temp64, buff);
+		len -= 40;
+		if (!len)
+			return csum_finalize_sum(temp64);
+		buff += 40;
+	}
+
+	if (len & 32) {
+		asm("addq 0*8(%[src]),%[res]\n\t"
+		    "adcq 1*8(%[src]),%[res]\n\t"
+		    "adcq 2*8(%[src]),%[res]\n\t"
+		    "adcq 3*8(%[src]),%[res]\n\t"
+		    "adcq $0,%[res]"
+		    : [res] "+r"(temp64)
+		    : [src] "r"(buff), "m"(*(const char(*)[32])buff));
+		buff += 32;
+	}
+	if (len & 16) {
+		asm("addq 0*8(%[src]),%[res]\n\t"
+		    "adcq 1*8(%[src]),%[res]\n\t"
+		    "adcq $0,%[res]"
+		    : [res] "+r"(temp64)
+		    : [src] "r"(buff), "m"(*(const char(*)[16])buff));
+		buff += 16;
+	}
+	if (len & 8) {
+		asm("addq 0*8(%[src]),%[res]\n\t"
+		    "adcq $0,%[res]"
+		    : [res] "+r"(temp64)
+		    : [src] "r"(buff), "m"(*(const char(*)[8])buff));
+		buff += 8;
+	}
+	if (len & 7) {
+		unsigned int shift = (-len << 3) & 63;
+		unsigned long trail;
+
+		trail = (load_unaligned_zeropad(buff) << shift) >> shift;
+
+		asm("addq %[trail],%[res]\n\t"
+		    "adcq $0,%[res]"
+		    : [res] "+r"(temp64)
+		    : [trail] "r"(trail));
+	}
+	return csum_finalize_sum(temp64);
+}
 
 double cycles[64];
 int cyclecount[64];
@@ -612,6 +725,7 @@ int main(int argc, char **argv)
 
 	MEASURE(2,  csum_partial, "Upcoming linux kernel version");
 	MEASURE(4,  csum_specialized, "Specialized to size 40");
+	MEASURE(6,  csum_partial_new, "New version");
 	MEASURE(22, csum_partial_no_odd, "Odd-alignment handling removed");
 	MEASURE(24, csum_partial_dead_code, "Dead code elimination           ");
 	MEASURE(28, csum_partial_ACX, "ADX interleaved ");
@@ -619,7 +733,6 @@ int main(int argc, char **argv)
 	MEASURE(34, csum_partial_32bit, "32 bit train ");
 	MEASURE(36, csum_partial_zero_sum, "Assume zero input sum");
 
-
 	report();
 	}
-}
\ No newline at end of file
+}

^ permalink raw reply related	[relevance 82%]
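
The csum_finalize_sum() helper in the patch above folds the 64-bit accumulator to 32 bits by adding the value to its own 32-bit rotation and keeping the upper half. A small sketch, assuming only standard C, that puts this next to the more obvious "add the two halves, then add the carry" fold so the two can be compared. The *_sketch and fold_* names are illustrative and not part of the patch.

```c
#include <stdint.h>

/* Rotate a 64-bit value right; same formula as ror64() in the patch. */
static inline uint64_t ror64_sketch(uint64_t word, unsigned int shift)
{
	return (word >> (shift & 63)) | (word << ((-shift) & 63));
}

/* Fold as in the patch: add the value to its 32-bit rotation, keep the top. */
static inline uint32_t fold_rotate(uint64_t v)
{
	return (uint32_t)((v + ror64_sketch(v, 32)) >> 32);
}

/* Naive fold: add the two 32-bit halves, then add the carry back in. */
static inline uint32_t fold_naive(uint64_t v)
{
	uint64_t s = (v >> 32) + (v & 0xffffffffu);

	return (uint32_t)(s + (s >> 32));
}
```

The rotate form works because the low half of the 64-bit sum produces exactly the carry that the high half then absorbs, so no separate carry step is needed.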

* Re: [GIT PULL] Final KVM fix for Linux 6.7
  @ 2024-01-05 17:38 99%     ` Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-01-05 17:38 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: Paolo Bonzini, linux-kernel, kvm, peterz, paulmck

On Fri, 5 Jan 2024 at 09:29, Sean Christopherson <seanjc@google.com> wrote:
>
> Ha!  That's what I suggested too, clearly Paolo is the weird one :-)

Well, it's technically one fewer operation to do it our way, but
Paolo's version is

 (a) textually one character shorter

 (b) something the compiler can (and likely will) munge anyway, since
boolean operation optimizations are common

 (c) with the 'andn' instruction, the "fewer operations" isn't
necessarily fewer instructions

Of course, we can't currently use 'andn' in kernel code due to it
being much too new and requiring BMI1. Plus the memory op version is
the wrong way around (ie the "not" part of the op only works on
register inputs), but _eventually_ that might have been an argument.

            Linus

^ permalink raw reply	[relevance 99%]

* Re: [GIT PULL] Final KVM fix for Linux 6.7
  @ 2024-01-05 17:21 99% ` Linus Torvalds
    0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-01-05 17:21 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: linux-kernel, kvm, peterz, paulmck

On Thu, 4 Jan 2024 at 07:48, Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> * Fix Boolean logic in intel_guest_get_msrs

I think the intention of the original was to write this as

        .guest = intel_ctrl & ~(cpuc->intel_ctrl_host_mask | pebs_mask),

but your version certainly works too.

Pulled.

           Linus

^ permalink raw reply	[relevance 99%]
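
The two spellings being compared in these messages are equal by De Morgan's law. A minimal sketch, assuming the pulled version is the chained form (the thread does not quote it, so that is a guess); the function and parameter names are illustrative only.

```c
#include <stdint.h>

/* Grouped form from the message above: one OR, one NOT, one AND. */
static inline uint64_t mask_grouped(uint64_t ctrl, uint64_t host_mask,
				    uint64_t pebs_mask)
{
	return ctrl & ~(host_mask | pebs_mask);
}

/* Chained form: two NOTs and two ANDs, same result by De Morgan's law. */
static inline uint64_t mask_chained(uint64_t ctrl, uint64_t host_mask,
				    uint64_t pebs_mask)
{
	return ctrl & ~host_mask & ~pebs_mask;
}
```

A compiler is free to turn either form into the other, which is why the operation count in the source says little about the generated code.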

* Re: x86/csum: Remove unnecessary odd handling
  2024-01-04 23:36 99%             ` Linus Torvalds
@ 2024-01-05  0:33 99%               ` Linus Torvalds
    0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-01-05  0:33 UTC (permalink / raw)
  To: Noah Goldstein
  Cc: David Laight, kernel test robot, x86, oe-kbuild-all,
	linux-kernel, edumazet, tglx, mingo, bp, dave.hansen, hpa

On Thu, 4 Jan 2024 at 15:36, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> Anyway, since I looked at the thing originally, and feel like I know
> the x86 side and understand the strange IP csum too, I just applied it
> directly.

I ended up just applying my 40-byte cleanup thing too that I've been
keeping in my own tree since posting it (as the "Silly csum
improvement. Maybe" patch).

I've been running it on my own machine since last June, and I finally
even enabled the csum KUnit test just to double-check it.

               Linus

^ permalink raw reply	[relevance 99%]

* Re: x86/csum: Remove unnecessary odd handling
  @ 2024-01-04 23:36 99%             ` Linus Torvalds
  2024-01-05  0:33 99%               ` Linus Torvalds
  0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-01-04 23:36 UTC (permalink / raw)
  To: Noah Goldstein
  Cc: David Laight, kernel test robot, x86, oe-kbuild-all,
	linux-kernel, edumazet, tglx, mingo, bp, dave.hansen, hpa

On Thu, 4 Jan 2024 at 15:28, Noah Goldstein <goldstein.w.n@gmail.com> wrote:
>
> ping

Bah. I think this keeps falling through the cracks because the
networking people consider this an architecture thing, and the x86
people probably consider this a networking thing.

Anyway, since I looked at the thing originally, and feel like I know
the x86 side and understand the strange IP csum too, I just applied it
directly.

              Linus

^ permalink raw reply	[relevance 99%]

* Re: [for-next][PATCH 2/3] eventfs: Stop using dcache_readdir() for getdents()
  @ 2024-01-04 20:18 98%         ` Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-01-04 20:18 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers,
	Andrew Morton, Ajay Kaher, Al Viro, Christian Brauner

On Thu, 4 Jan 2024 at 12:04, Steven Rostedt <rostedt@goodmis.org> wrote:
>
> Also, I just realized it breaks if we update the 'c--' before the callback. :-/
>
> I have to put this check *after* the callback check.

What? No.

> Reason being, the callback can say "this event doesn't get this file" and
> return 0, which tells eventfs to skip this file.

So yes, there seems to be a bug there, in that ctx->pos is only
updated for successful callbacks (and not for "ignored entry").

But that just means that you should always update 'ctx->pos' as you
'continue' the loop.

The logical place to do that would be in the for-loop itself, which
actually is very natural for the simple case, ie you should just do

        for (i = 0; i < ei->nr_entries; i++, ctx->pos++) {

but in the list_for_each_entry_srcu() case the 'update' part of the
for-loop isn't actually accessible, so it would have to be at the
'continue' point(s).

Which is admittedly a bit annoying.

Looking at that I'm actually surprised that I don't recall that we'd
have hit that issue with our 'for_each_xyz()' loops before.

The update for our "for_each_xyz()" helpers are all hardcoded to just
do the "next iterator" thing, and there's no nice way to take
advantage of the normal for-loop semantics of "do this at the end of
the loop"

            Linus

^ permalink raw reply	[relevance 98%]
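
The point about advancing the position in the for-loop's update clause can be shown with a small userspace model. This is only a sketch: dir_ctx_sketch, emit_entries() and the visible[] array are hypothetical stand-ins for the readdir context, the eventfs loop, and the callback that may say "skip this file".

```c
#include <stddef.h>

struct dir_ctx_sketch {
	long pos;	/* index of the next entry to look at */
};

/*
 * Report up to 'max' visible entries starting at ctx->pos.
 * visible[i] == 0 models a callback that says "skip this file".
 * Returns the number of indices written to out[].
 */
static size_t emit_entries(struct dir_ctx_sketch *ctx, const int *visible,
			   size_t nr_entries, size_t *out, size_t max)
{
	size_t n = 0;
	size_t i;

	for (i = (size_t)ctx->pos; i < nr_entries && n < max; i++, ctx->pos++) {
		if (!visible[i])
			continue;	/* pos still advances via the update clause */
		out[n++] = i;
	}
	return n;
}
```

Because the increment lives in the update clause, every 'continue' advances the position too, so a later call resumes after the skipped entries instead of revisiting them. A list_for_each_entry style macro hides that clause, which is why the increment would have to be repeated at each 'continue' there.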

* Re: [PATCH] tracefs/eventfs: Use root and instance inodes as default ownership
  2024-01-04 19:35 93%             ` Linus Torvalds
@ 2024-01-04 20:02 99%               ` Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-01-04 20:02 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Al Viro, LKML, Linux Trace Kernel, Masami Hiramatsu,
	Mathieu Desnoyers, Christian Brauner, linux-fsdevel,
	Greg Kroah-Hartman, Jonathan Corbet, linux-doc

On Thu, 4 Jan 2024 at 11:35, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>>
> Which is *NOT* the inode, because the 'struct file' has other things
> in it (the file position, the permissions that were used at open time
> etc, close-on-exec state etc etc).

That close-on-exec thing was a particularly bad example of things that
are in the 'struct file', because it's in fact the only thing that
*isn't* in 'struct file' and is associated directly with the 'int fd'.

But hopefully the intent was clear despite me picking a particularly
bad example.

            Linus

^ permalink raw reply	[relevance 99%]

* Re: [PATCH] tracefs/eventfs: Use root and instance inodes as default ownership
  @ 2024-01-04 19:35 93%             ` Linus Torvalds
  2024-01-04 20:02 99%               ` Linus Torvalds
  0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-01-04 19:35 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Al Viro, LKML, Linux Trace Kernel, Masami Hiramatsu,
	Mathieu Desnoyers, Christian Brauner, linux-fsdevel,
	Greg Kroah-Hartman, Jonathan Corbet, linux-doc

On Thu, 4 Jan 2024 at 11:14, Steven Rostedt <rostedt@goodmis.org> wrote:
>
> "file descriptor" - is just what maps to a specific inode.

Nope. Technically and traditionally, file descriptor is just the
integer index that is used to look up a 'struct file *'.

Except in the kernel, we really just tend to use that term (well, I
do) for the 'struct file *' itself, since the integer 'fd' is usually
not really relevant except at the system call interface.

Which is *NOT* the inode, because the 'struct file' has other things
in it (the file position, the permissions that were used at open time
etc, close-on-exec state etc etc).

> "file description" - is how the file is accessed (position in the file and
>                         flags associated to how it was opened)

That's a horrible term that shouldn't be used at all. Apparently some
people use it for what is our 'struct file *', also known as a "file
table entry".  Avoid it.

If anything, just use "fd" for the integer representation, and "file"
for the pointer to a 'struct file'.

But most of the time the two are conceptually interchangeable, in that
an 'fd' just translates directly to a 'struct file *'.

Note that while there's that conceptual direct translation, there's
also very much a "time of use" issue, in that a "fd -> file"
translation happens at one particular time and in one particular user
context, and then it's *done* (so closing and possibly re-using the fd
after it's been looked up does not actually affect an existing 'struct
file *').

And while 'fd -> file' lookup is quick and common, the other way
doesn't exist, because multiple 'fd's can map to one 'struct file *'
thanks to dup() (and 'fork()', since a 'fd -> file' translation always
happens within the context of a particular user space, an 'fd' in one
process is obviously not the same as an 'fd' in another one).

               Linus

^ permalink raw reply	[relevance 93%]
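
The split described in the two messages above (file position lives in the shared 'struct file', close-on-exec lives with the integer fd) is visible from user space. A minimal sketch using POSIX calls; the helper name is made up, and the temporary file comes from tmpfile() purely for convenience.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700	/* for fileno() and friends under strict -std= */
#endif
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Returns 1 if a dup()ed descriptor shares the file offset with the
 * original but has its own close-on-exec flag, 0 if not, -1 on error.
 */
static int dup_shares_offset_not_cloexec(void)
{
	FILE *tmp = tmpfile();
	int fd, fd2, ok;

	if (!tmp)
		return -1;
	fd = fileno(tmp);
	fd2 = dup(fd);
	if (fd2 < 0) {
		fclose(tmp);
		return -1;
	}

	/* Move the offset through one fd and mark only that fd close-on-exec. */
	if (lseek(fd, 123, SEEK_SET) != 123 ||
	    fcntl(fd, F_SETFD, FD_CLOEXEC) < 0) {
		close(fd2);
		fclose(tmp);
		return -1;
	}

	ok = lseek(fd2, 0, SEEK_CUR) == 123 &&		/* offset is shared */
	     (fcntl(fd, F_GETFD) & FD_CLOEXEC) &&	/* flag set on this fd */
	     !(fcntl(fd2, F_GETFD) & FD_CLOEXEC);	/* but not on the dup */

	close(fd2);
	fclose(tmp);
	return ok;
}
```

Two fds pointing at one open file is also why there is no "file -> fd" reverse lookup: the answer would not be unique.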

* Re: [PATCH] tracefs/eventfs: Use root and instance inodes as default ownership
  @ 2024-01-04 19:21 92%             ` Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-01-04 19:21 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Al Viro, LKML, Linux Trace Kernel, Masami Hiramatsu,
	Mathieu Desnoyers, Christian Brauner, linux-fsdevel,
	Greg Kroah-Hartman, Jonathan Corbet, linux-doc

On Thu, 4 Jan 2024 at 11:09, Steven Rostedt <rostedt@goodmis.org> wrote:
>
> My mistake was thinking that the dentry was attached more to the path than
> the inode. But that doesn't seem to be the case. I wasn't sure if there was
> a way to get to a dentry from the inode.

Yeah, so dentry->inode and path->dentry are one-way translations,
because the other way can have multiple different cases.

IOW, a path will specify *one* dentry, and a dentry will specify *one*
inode, but one inode can be associated with multiple dentries, and
there may be other undiscovered dentries that *would* point to it but
aren't even cached right now.

And a single dentry can be part of multiple paths, thanks to bind mounts.

The "inode->i_dentry" list is *not* a way to look up all dentries,
because - as mentioned - there may be potential other paths (and thus
other dentries) that lead to the same inode that just haven't been
looked up yet (or that have already been aged out of the cache).

Of course any *particular* filesystem may not have hard links (so one
inode has only one possible dentry), and you may not have bind mounts,
and it might be one of the virtual filesystems where everything is
always in memory, so none of the above problems are guaranteed to be
the case in any *particular* situation.

But it's all part of why the dcache is actually really subtle. It's
not just the RCU lookup rules and the specialized locking (both
lockref and the rather complicated rules about d_lock ordering), it's
also that whole "yeah, the filesystem only sees a 'dentry', but
because of bind mounts the vfs layer actually does things internally
in terms of 'struct path' in order to be able to then show that single
filesystem in multiple places".

Etc etc.

There's a reason Al Viro ends up owning the dcache. Nobody else can
wrap their tiny little minds around it all.

               Linus

^ permalink raw reply	[relevance 92%]
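
The "one inode, several names" case from the message above is easy to reproduce with a hard link. A userspace sketch, assuming a /tmp that supports hard links; the helper name and the temporary directory template are illustrative.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700	/* for mkdtemp() under strict -std= modes */
#endif
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Create a file and a hard link to it in a fresh temporary directory.
 * Returns 1 if both names resolve to the same inode, 0 if not, -1 on error.
 */
static int two_names_one_inode(void)
{
	char dir[] = "/tmp/inode-sketch-XXXXXX";
	char a[64], b[64];
	struct stat sa, sb;
	int ret = -1;
	FILE *f;

	if (!mkdtemp(dir))
		return -1;
	snprintf(a, sizeof(a), "%s/a", dir);
	snprintf(b, sizeof(b), "%s/b", dir);

	f = fopen(a, "w");
	if (f) {
		fclose(f);
		/* Two directory entries, so two possible dentries, one inode. */
		if (link(a, b) == 0 && stat(a, &sa) == 0 && stat(b, &sb) == 0)
			ret = sa.st_dev == sb.st_dev &&
			      sa.st_ino == sb.st_ino &&
			      sa.st_nlink == 2;
		unlink(b);
		unlink(a);
	}
	rmdir(dir);
	return ret;
}
```

Going from the inode number back to "all names" has no cheap answer here either: the link count says how many names exist, not where they are.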

* Re: [git pull] drm fixes for 6.8
  @ 2024-01-04 18:50 99% ` Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-01-04 18:50 UTC (permalink / raw)
  To: Dave Airlie; +Cc: Daniel Vetter, dri-devel, LKML

On Wed, 3 Jan 2024 at 18:30, Dave Airlie <airlied@gmail.com> wrote:
>
> These were from over the holiday period, mainly i915, a couple of
> qaic, bridge and an mgag200.
>
> I have a set of nouveau fixes that I'll send after this, that might be
> too rich for you at this point.
>
> I expect there might also be some more regular fixes before 6.8, but
> they should be minor.

I'm assuming you're just confused about the numbering, and meant 6.7
here and in the subject line.

This seems to be too small of a pull to be an early pull request for
the 6.8 merge window.

               Linus

^ permalink raw reply	[relevance 99%]

* Re: [for-next][PATCH 2/3] eventfs: Stop using dcache_readdir() for getdents()
  @ 2024-01-04 18:46 97%   ` Linus Torvalds
    0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-01-04 18:46 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers,
	Andrew Morton, Ajay Kaher, Al Viro, Christian Brauner

On Thu, 4 Jan 2024 at 08:46, Steven Rostedt <rostedt@goodmis.org> wrote:
>
>         list_for_each_entry_srcu(ei_child, &ei->children, list,
>                                  srcu_read_lock_held(&eventfs_srcu)) {
> +
> +               if (c > 0) {
> +                       c--;
> +                       continue;
>                 }

Thanks for putting that at the top, I really do think it's not just
more efficient, but "more correct" too - ie if some entry that *used*
to exist and was previously counted by 'pos' went away, it's actually
*better* to count it again if we still see it, in order to not skip
subsequent entries that haven't been seen.

And that very fact actually makes me wonder:

>         for (i = 0; i < ei->nr_entries; i++) {
> +               void *cdata = ei->data;
> +
> +               if (c > 0) {
> +                       c--;
> +                       continue;
> +               }

The 'ei->nr_entries' things are in a stable array, so the indexing for
them cannot change (ie even if "is_freed" were to be set the array is
still stable).

So I wonder if - just from a 'pos' iterator stability standpoint - you
should change the tracefs directory iterator to always start with the
non-directory entries in ei->entries[]?

That way, even if concurrent dynamic add/remove events might change
the 'ei->children' list, it could never cause an 'ei->entries[]' to
disappear (or be returned twice).

This is very nitpicky and I doubt it matters, because I doubt the
whole "ls on a tracefs directory while changing it" case matters, but
I thought I'd mention it.

              Linus

^ permalink raw reply	[relevance 97%]

* Re: [for-next][PATCH 3/3] tracefs/eventfs: Use root and instance inodes as default ownership
  @ 2024-01-04 18:38 99%   ` Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-01-04 18:38 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers,
	Andrew Morton, Al Viro, Christian Brauner, Greg Kroah-Hartman

On Thu, 4 Jan 2024 at 08:46, Steven Rostedt <rostedt@goodmis.org> wrote:
>
> Instead of walking the dentries on mount/remount to update the gid values of
> all the dentries if a gid option is specified on mount, just update the root
> inode. Add .getattr, .setattr, and .permissions on the tracefs inode
> operations to update the permissions of the files and directories.

Looks mostly good, thanks. This may add more lines than it removes,
but the lines it adds are *much* simpler than the removed ones.

I don't understand why you do those odd TRACEFS_INSTANCE_INODE games.
That seems entirely new functionality. The old 'set_gid()' thing did
none of that, and just forced everything to new gid values.

IOW, this seems entirely random. I *suspect* that you have just tried
to retain some odd random semantics that happened to be the result of
a random implementation detail that came out of the dentry tree not
necessarily being fully populated by the time you did the remount.

So this seems wrong.

             Linus

^ permalink raw reply	[relevance 99%]

* Re: [PATCH] eventfs: Stop using dcache_readdir() for getdents()
  @ 2024-01-03 22:17 99%             ` Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-01-03 22:17 UTC (permalink / raw)
  To: Al Viro
  Cc: Steven Rostedt, LKML, Linux Trace Kernel, Masami Hiramatsu,
	Mathieu Desnoyers, Christian Brauner, linux-fsdevel,
	Greg Kroah-Hartman

On Wed, 3 Jan 2024 at 14:14, Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> On Wed, Jan 03, 2024 at 01:54:36PM -0800, Linus Torvalds wrote:
>
> > Again: UNTESTED, and meant as a "this is another way to avoid messing
> > with the dentry tree manually, and just using the VFS interfaces we
> > already have"
>
> That would break chown(), though.

Right. That's why I had that note about

   So take this as a "this might work", but it probably needs a bit more
   work - eventfs_set_attr() should set some bit in the inode to say
   "these have been set manually", and then revalidate would say "I'll
   not touch inodes that have that bit set".

and how my example patch overrides things a bit *too* aggressively.

That said, I think the patch that Steven just sent out is the right
direction to go regardless. The d_revalidate() thing was literally
just a "we can do this many different ways".

                     Linus

^ permalink raw reply	[relevance 99%]

* Re: [PATCH] eventfs: Stop using dcache_readdir() for getdents()
  @ 2024-01-03 22:14 97%             ` Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-01-03 22:14 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: LKML, Linux Trace Kernel, Masami Hiramatsu, Mathieu Desnoyers,
	Al Viro, Christian Brauner, linux-fsdevel, Greg Kroah-Hartman

On Wed, 3 Jan 2024 at 14:04, Steven Rostedt <rostedt@goodmis.org> wrote:
>
> I actually have something almost working too. Here's the WIP. It only works
> for tracefs, and now eventfs needs to be updated as the "events" directory
> no longer has the right ownership. So I need a way to link the eventfs
> entries to use the tracefs default conditionally.

Ack. So the ->getattr() and ->permission() thing is a bit more
targeted to "look at modes", and is probably better just for that
reason.

Doing it in d_revalidate() is a bit hacky, and impacts path lookup a
bit even when not necessary. But it's still a lot less evil than
walking the dentry tree manually.

So that d_revalidate() patch was more of a "I think you can make it
smaller by just hooking in at this layer". So d_revalidate ends up
with a smaller patch, I think, but it has the problem that now you
*have* to be able to deal with things in RCU context.

In contrast, doing it in ->getattr() and ->permission() ends up
meaning you can use sleeping locks etc if you need to serialize, for
example. So it's a bit more specific to the whole issue of "deal with
modes and ownership", but it is *also* a bit more flexible in how you
can then do it.

Anyway, your patch looks fine from a quick scan.

                  Linus

^ permalink raw reply	[relevance 97%]

* Re: [PATCH] eventfs: Stop using dcache_readdir() for getdents()
  2024-01-03 21:54 80%         ` Linus Torvalds
  @ 2024-01-03 22:06 99%           ` Linus Torvalds
    2 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-01-03 22:06 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: LKML, Linux Trace Kernel, Masami Hiramatsu, Mathieu Desnoyers,
	Al Viro, Christian Brauner, linux-fsdevel, Greg Kroah-Hartman

On Wed, 3 Jan 2024 at 13:54, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> Here's an updated patch that builds, and is PURELY AN EXAMPLE.

Oh, and while doing this patch, I found another bug in tracefs,
although it happily is one that doesn't have any way to trigger.

Tracefs has code like this:

        if (dentry->d_inode->i_mode & S_IFDIR) {

and that's very wrong. S_IFDIR is not a bitmask, it's a value that is
part of S_IFMT.

The reason this bug doesn't have any way to trigger is that I think
tracefs can only have S_IFMT values of S_IFDIR and S_IFREG, and those
happen to not have any bits in common, so doing it as a bit test is
wrong, but happens to work.

The test *should* be done as

        if (S_ISDIR(dentry->d_inode->i_mode)) {

(note "IS" vs "IF" - not the greatest user experience ever, but hey,
it harkens back to Ye Olden Times).

                Linus

^ permalink raw reply	[relevance 99%]
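
The difference between the bit test and S_ISDIR() shows up as soon as a file type other than directory or regular file is involved. A small sketch; the helper names are made up, and the block-device and socket cases rely on the Linux values of the S_IF* constants (S_IFBLK and S_IFSOCK both contain the S_IFDIR bit there).

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700	/* exposes the S_IF* constants under strict -std= */
#endif
#include <sys/stat.h>

/* Buggy: treats S_IFDIR as if it were an independent flag bit. */
static inline int is_dir_bit_test(mode_t mode)
{
	return (mode & S_IFDIR) != 0;
}

/* Correct: compares the whole type field, (mode & S_IFMT) == S_IFDIR. */
static inline int is_dir_type_test(mode_t mode)
{
	return S_ISDIR(mode);
}
```

For directories and regular files the two agree, which matches the observation that the tracefs bug cannot currently trigger. For a block device or socket mode the bit test reports a directory and S_ISDIR() does not.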

* Re: [PATCH v2 08/11] tty: splice_read: disable
  @ 2024-01-03 21:57 99%         ` Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-01-03 21:57 UTC (permalink / raw)
  To: Oliver Giles
  Cc: Jiri Slaby, Ahelenia Ziemiańska, Jens Axboe,
	Christian Brauner, Alexander Viro, linux-fsdevel,
	Greg Kroah-Hartman, linux-kernel, linux-serial

On Wed, 3 Jan 2024 at 13:34, Oliver Giles <ohw.giles@gmail.com> wrote:
>
> I'm happy to report that that particular SSL VPN tool is no longer
> around.

Ahh, well that simplifies things and we can then just remove the tty
splice support again.

Of course, maybe then somebody else will report on some other odd
user, but ... fingers crossed.

                Linus

^ permalink raw reply	[relevance 99%]

* Re: [PATCH] eventfs: Stop using dcache_readdir() for getdents()
  2024-01-03 19:57 98%       ` Linus Torvalds
@ 2024-01-03 21:54 80%         ` Linus Torvalds
                               ` (2 more replies)
  0 siblings, 3 replies; 200+ results
From: Linus Torvalds @ 2024-01-03 21:54 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: LKML, Linux Trace Kernel, Masami Hiramatsu, Mathieu Desnoyers,
	Al Viro, Christian Brauner, linux-fsdevel, Greg Kroah-Hartman

[-- Attachment #1: Type: text/plain, Size: 1303 bytes --]

On Wed, 3 Jan 2024 at 11:57, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> Or, you know, you could do what I've told you to do at least TEN TIMES
> already, which is to not mess with any of this, and just implement the
> '->permission()' callback (and getattr() to just make 'ls' look sane
> too, rather than silently saying "we'll act as if gid is set right,
> but not show it").

Actually, an even simpler option might be to just do this all at
d_revalidate() time.

Here's an updated patch that builds, and is PURELY AN EXAMPLE. I think
it "works", but it currently always resets the inode mode/uid/gid
unconditionally, which is wrong - it should not do so if the inode has
been manually set.

So take this as a "this might work", but it probably needs a bit more
work - eventfs_set_attr() should set some bit in the inode to say
"these have been set manually", and then revalidate would say "I'll
not touch inodes that have that bit set".

Or something.

Anyway, this patch is now relative to your latest pull request, so it
has the check for dentry->d_inode in set_gid() (and still removes the
whole function).

Again: UNTESTED, and meant as a "this is another way to avoid messing
with the dentry tree manually, and just using the VFS interfaces we
already have"

               Linus

[-- Attachment #2: patch.diff --]
[-- Type: text/x-patch, Size: 5197 bytes --]

 fs/tracefs/inode.c | 147 ++++++++++-------------------------------------------
 1 file changed, 26 insertions(+), 121 deletions(-)

diff --git a/fs/tracefs/inode.c b/fs/tracefs/inode.c
index bc86ffdb103b..5bc9e1a23a31 100644
--- a/fs/tracefs/inode.c
+++ b/fs/tracefs/inode.c
@@ -183,87 +183,6 @@ struct tracefs_fs_info {
 	struct tracefs_mount_opts mount_opts;
 };
 
-static void change_gid(struct dentry *dentry, kgid_t gid)
-{
-	if (!dentry->d_inode)
-		return;
-	dentry->d_inode->i_gid = gid;
-}
-
-/*
- * Taken from d_walk, but without he need for handling renames.
- * Nothing can be renamed while walking the list, as tracefs
- * does not support renames. This is only called when mounting
- * or remounting the file system, to set all the files to
- * the given gid.
- */
-static void set_gid(struct dentry *parent, kgid_t gid)
-{
-	struct dentry *this_parent;
-	struct list_head *next;
-
-	this_parent = parent;
-	spin_lock(&this_parent->d_lock);
-
-	change_gid(this_parent, gid);
-repeat:
-	next = this_parent->d_subdirs.next;
-resume:
-	while (next != &this_parent->d_subdirs) {
-		struct tracefs_inode *ti;
-		struct list_head *tmp = next;
-		struct dentry *dentry = list_entry(tmp, struct dentry, d_child);
-		next = tmp->next;
-
-		/* Note, getdents() can add a cursor dentry with no inode */
-		if (!dentry->d_inode)
-			continue;
-
-		spin_lock_nested(&dentry->d_lock, DENTRY_D_LOCK_NESTED);
-
-		change_gid(dentry, gid);
-
-		/* If this is the events directory, update that too */
-		ti = get_tracefs(dentry->d_inode);
-		if (ti && (ti->flags & TRACEFS_EVENT_INODE))
-			eventfs_update_gid(dentry, gid);
-
-		if (!list_empty(&dentry->d_subdirs)) {
-			spin_unlock(&this_parent->d_lock);
-			spin_release(&dentry->d_lock.dep_map, _RET_IP_);
-			this_parent = dentry;
-			spin_acquire(&this_parent->d_lock.dep_map, 0, 1, _RET_IP_);
-			goto repeat;
-		}
-		spin_unlock(&dentry->d_lock);
-	}
-	/*
-	 * All done at this level ... ascend and resume the search.
-	 */
-	rcu_read_lock();
-ascend:
-	if (this_parent != parent) {
-		struct dentry *child = this_parent;
-		this_parent = child->d_parent;
-
-		spin_unlock(&child->d_lock);
-		spin_lock(&this_parent->d_lock);
-
-		/* go into the first sibling still alive */
-		do {
-			next = child->d_child.next;
-			if (next == &this_parent->d_subdirs)
-				goto ascend;
-			child = list_entry(next, struct dentry, d_child);
-		} while (unlikely(child->d_flags & DCACHE_DENTRY_KILLED));
-		rcu_read_unlock();
-		goto resume;
-	}
-	rcu_read_unlock();
-	spin_unlock(&this_parent->d_lock);
-	return;
-}
-
 static int tracefs_parse_options(char *data, struct tracefs_mount_opts *opts)
 {
 	substring_t args[MAX_OPT_ARGS];
@@ -315,49 +234,12 @@ static int tracefs_parse_options(char *data, struct tracefs_mount_opts *opts)
 	return 0;
 }
 
-static int tracefs_apply_options(struct super_block *sb, bool remount)
-{
-	struct tracefs_fs_info *fsi = sb->s_fs_info;
-	struct inode *inode = d_inode(sb->s_root);
-	struct tracefs_mount_opts *opts = &fsi->mount_opts;
-	umode_t tmp_mode;
-
-	/*
-	 * On remount, only reset mode/uid/gid if they were provided as mount
-	 * options.
-	 */
-
-	if (!remount || opts->opts & BIT(Opt_mode)) {
-		tmp_mode = READ_ONCE(inode->i_mode) & ~S_IALLUGO;
-		tmp_mode |= opts->mode;
-		WRITE_ONCE(inode->i_mode, tmp_mode);
-	}
-
-	if (!remount || opts->opts & BIT(Opt_uid))
-		inode->i_uid = opts->uid;
-
-	if (!remount || opts->opts & BIT(Opt_gid)) {
-		/* Set all the group ids to the mount option */
-		set_gid(sb->s_root, opts->gid);
-	}
-
-	return 0;
-}
-
 static int tracefs_remount(struct super_block *sb, int *flags, char *data)
 {
-	int err;
 	struct tracefs_fs_info *fsi = sb->s_fs_info;
 
 	sync_filesystem(sb);
-	err = tracefs_parse_options(data, &fsi->mount_opts);
-	if (err)
-		goto fail;
-
-	tracefs_apply_options(sb, true);
-
-fail:
-	return err;
+	return tracefs_parse_options(data, &fsi->mount_opts);
 }
 
 static int tracefs_show_options(struct seq_file *m, struct dentry *root)
@@ -399,8 +281,33 @@ static void tracefs_dentry_iput(struct dentry *dentry, struct inode *inode)
 	iput(inode);
 }
 
+static int tracefs_d_revalidate(struct dentry *dentry, unsigned int flags)
+{
+	struct tracefs_fs_info *fsi = dentry->d_sb->s_fs_info;
+	struct tracefs_mount_opts *opts = &fsi->mount_opts;
+	struct inode *inode;
+
+	rcu_read_lock();
+	inode = d_inode_rcu(dentry);
+	if (inode) {
+		if (opts->opts & BIT(Opt_mode)) {
+			umode_t tmp_mode;
+			tmp_mode = READ_ONCE(inode->i_mode) & ~S_IALLUGO;
+			tmp_mode |= opts->mode;
+			WRITE_ONCE(inode->i_mode, tmp_mode);
+		}
+		if (opts->opts & BIT(Opt_uid))
+			inode->i_uid = opts->uid;
+		if (opts->opts & BIT(Opt_gid))
+			inode->i_gid = opts->gid;
+	}
+	rcu_read_unlock();
+	return 0;
+}
+
 static const struct dentry_operations tracefs_dentry_operations = {
 	.d_iput = tracefs_dentry_iput,
+	.d_revalidate = tracefs_d_revalidate,
 };
 
 static int trace_fill_super(struct super_block *sb, void *data, int silent)
@@ -427,8 +334,6 @@ static int trace_fill_super(struct super_block *sb, void *data, int silent)
 	sb->s_op = &tracefs_super_operations;
 	sb->s_d_op = &tracefs_dentry_operations;
 
-	tracefs_apply_options(sb, false);
-
 	return 0;
 
 fail:

^ permalink raw reply related	[relevance 80%]

* Re: [PATCH] eventfs: Stop using dcache_readdir() for getdents()
  @ 2024-01-03 19:57 98%       ` Linus Torvalds
  2024-01-03 21:54 80%         ` Linus Torvalds
  0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-01-03 19:57 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: LKML, Linux Trace Kernel, Masami Hiramatsu, Mathieu Desnoyers,
	Al Viro, Christian Brauner, linux-fsdevel, Greg Kroah-Hartman

On Wed, 3 Jan 2024 at 11:52, Steven Rostedt <rostedt@goodmis.org> wrote:
>
> This doesn't work because for tracefs (not eventfs) the dentries are
> created at boot up and before the file system is mounted. This means you
> can't even set a gid in /etc/fstab. This will cause a regression.

Which is why I suggested

   "I think the whole thing was triggered by commit 49d67e445742, and
    maybe the fix is to just revert that commit"

there was never any coherent reason for that commit, since the
permissions are dealt with at the mount point.

So this all was triggered by that original change that makes little
sense. The fact that you then apparently changed other things
afterwards too might need fixing.

Or, you know, you could do what I've told you to do at least TEN TIMES
already, which is to not mess with any of this, and just implement the
'->permission()' callback (and getattr() to just make 'ls' look sane
too, rather than silently saying "we'll act as if gid is set right,
but not show it").
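
A minimal user-space sketch of the idea, not kernel code: instead of
walking every dentry to rewrite i_gid, the mount-option gid is
consulted when access is checked and when attributes are reported.
The struct and function names below (mount_opts, node, effective_gid,
may_read) are hypothetical and exist only for illustration; the real
->permission() and getattr() hooks have different signatures.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the gid= mount option. */
struct mount_opts {
	bool gid_set;
	unsigned int gid;
};

/* Hypothetical model of a file's stored ownership and mode bits. */
struct node {
	unsigned int uid;
	unsigned int gid;
	unsigned int mode;
};

/* What a getattr-like hook would report, so 'ls' shows the gid in effect. */
static inline unsigned int effective_gid(const struct node *n,
					 const struct mount_opts *o)
{
	return o->gid_set ? o->gid : n->gid;
}

/* What a permission-like hook would decide for a read request. */
static inline bool may_read(const struct node *n, const struct mount_opts *o,
			    unsigned int uid, unsigned int gid)
{
	if (uid == 0)
		return true;			/* root is always allowed here */
	if (uid == n->uid)
		return (n->mode & 0400) != 0;	/* owner bits */
	if (gid == effective_gid(n, o))
		return (n->mode & 0040) != 0;	/* group bits, mount gid wins */
	return (n->mode & 0004) != 0;		/* other bits */
}
```

Because nothing is stored per file, a remount with a different gid
would take effect on the next check without touching any dentry.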

Why do you keep bringing up things that I've told you solutions for many times?

                 Linus

^ permalink raw reply	[relevance 98%]

* Re: [PATCH v2 08/11] tty: splice_read: disable
  @ 2024-01-03 19:14 99%     ` Linus Torvalds
    0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-01-03 19:14 UTC (permalink / raw)
  To: Jiri Slaby, Oliver Giles
  Cc: Ahelenia Ziemiańska, Jens Axboe, Christian Brauner,
	Alexander Viro, linux-fsdevel, Greg Kroah-Hartman, linux-kernel,
	linux-serial

On Wed, 3 Jan 2024 at 03:36, Jiri Slaby <jirislaby@kernel.org> wrote:
>
> What are those "things" doing that "splice to tty", I don't recall and
> the commit message above ^^^ does not spell that out. Linus?

It's some annoying SSL VPN thing that splices to pppd:

   https://lore.kernel.org/all/C8KER7U60WXE.25UFD8RE6QZQK@oguc/

and I'd be happy to try to limit splice to tty's to maybe just the one
case that pppd uses.

So I don't think we should remove splice_write for tty's entirely, but
maybe we can limit it to only the case that the VPN thing used.

I never saw the original issue personally, and I think only Oliver
reported it, so cc'ing Oliver.

Maybe that VPN thing already has the pty in non-blocking mode, for
example, and we could make the tty splicing fail for any blocking op?

                Linus

^ permalink raw reply	[relevance 99%]

* Re: [PATCH] eventfs: Stop using dcache_readdir() for getdents()
  @ 2024-01-03 19:04 97%       ` Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2024-01-03 19:04 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: LKML, Linux Trace Kernel, Masami Hiramatsu, Mathieu Desnoyers

On Wed, 3 Jan 2024 at 10:50, Steven Rostedt <rostedt@goodmis.org> wrote:
> I think these changes are a bit much for -rc8, don't you?

Oh, absolutely.

None of this is rc8 material apart from the oops fix in your pull
request (which my patch that then removes the whole function did *not*
have - so that patch won't apply as-is to your tree).

But let's aim for a tracefs that doesn't play games with the dcache.

Basically, the dcache is *much* too subtle for a filesystem to mess
with. You should either:

 - be a fully virtual filesystem where the dcache just maintains
everything, and you don't mess with it because you don't need to (eg
tmpfs etc). Everything is in the dcache, and you don't need to touch
it, because you don't even care - the dcache is doing everything for
you.

 - be a "normal" filesystem where the dcache is just a cache, and you
maintain your *own* file data structures, and just get normal lookup
etc ops, and you don't mess with the dcache because it is just doing
its caching thing that you as a filesystem don't care about.

and in both of those cases the filesystem just never has to really
delve into it. But tracefs had this abomination where it maintained
its own data structures, _and_ it tried to make them coherent with the
dcache that maintained part of it. That's the part I hated.

               Linus

^ permalink raw reply	[relevance 97%]

* Re: [PATCH] eventfs: Stop using dcache_readdir() for getdents()
  2024-01-03 18:12 95% ` Linus Torvalds
@ 2024-01-03 18:38 87%   ` Linus Torvalds
      0 siblings, 2 replies; 200+ results
From: Linus Torvalds @ 2024-01-03 18:38 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: LKML, Linux Trace Kernel, Masami Hiramatsu, Mathieu Desnoyers

[-- Attachment #1: Type: text/plain, Size: 1290 bytes --]

On Wed, 3 Jan 2024 at 10:12, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> Much better. Now eventfs looks more like a real filesystem, and less
> like an eldritch horror monster that is parts of dcache tacked onto a
> pseudo-filesystem.

Oh, except I think you still need to just remove the 'set_gid()' mess.

It's disgusting and it's wrong, and it's not even what the 'uid'
option does (it only sets the root inode uid).

If you remount the filesystem with different gid values, you get to
keep both broken pieces. And if it isn't a remount, then setting the
root uid is sufficient.

I think the whole thing was triggered by commit 49d67e445742, and
maybe the fix is to just revert that commit.

That commit makes no sense in general, since the default mounting
position for tracefs that the kernel sets up is only accessible to
root anyway.

Alternatively, just do the ->permission() thing, and allow access to
the group in the mount options.

Getting rid of set_gid() would be this attached lovely patch:

 fs/tracefs/inode.c | 83 ++----------------------------------------------------
 1 file changed, 2 insertions(+), 81 deletions(-)

and would get rid of the final (?) piece of disgusting dcache hackery
that tracefs most definitely should not have.

             Linus

[-- Attachment #2: patch.diff --]
[-- Type: text/x-patch, Size: 2965 bytes --]

 fs/tracefs/inode.c | 83 ++----------------------------------------------------
 1 file changed, 2 insertions(+), 81 deletions(-)

diff --git a/fs/tracefs/inode.c b/fs/tracefs/inode.c
index 62524b20964e..a22253037e3e 100644
--- a/fs/tracefs/inode.c
+++ b/fs/tracefs/inode.c
@@ -183,83 +183,6 @@ struct tracefs_fs_info {
 	struct tracefs_mount_opts mount_opts;
 };
 
-static void change_gid(struct dentry *dentry, kgid_t gid)
-{
-	if (!dentry->d_inode)
-		return;
-	dentry->d_inode->i_gid = gid;
-}
-
-/*
- * Taken from d_walk, but without he need for handling renames.
- * Nothing can be renamed while walking the list, as tracefs
- * does not support renames. This is only called when mounting
- * or remounting the file system, to set all the files to
- * the given gid.
- */
-static void set_gid(struct dentry *parent, kgid_t gid)
-{
-	struct dentry *this_parent;
-	struct list_head *next;
-
-	this_parent = parent;
-	spin_lock(&this_parent->d_lock);
-
-	change_gid(this_parent, gid);
-repeat:
-	next = this_parent->d_subdirs.next;
-resume:
-	while (next != &this_parent->d_subdirs) {
-		struct tracefs_inode *ti;
-		struct list_head *tmp = next;
-		struct dentry *dentry = list_entry(tmp, struct dentry, d_child);
-		next = tmp->next;
-
-		spin_lock_nested(&dentry->d_lock, DENTRY_D_LOCK_NESTED);
-
-		change_gid(dentry, gid);
-
-		/* If this is the events directory, update that too */
-		ti = get_tracefs(dentry->d_inode);
-		if (ti && (ti->flags & TRACEFS_EVENT_INODE))
-			eventfs_update_gid(dentry, gid);
-
-		if (!list_empty(&dentry->d_subdirs)) {
-			spin_unlock(&this_parent->d_lock);
-			spin_release(&dentry->d_lock.dep_map, _RET_IP_);
-			this_parent = dentry;
-			spin_acquire(&this_parent->d_lock.dep_map, 0, 1, _RET_IP_);
-			goto repeat;
-		}
-		spin_unlock(&dentry->d_lock);
-	}
-	/*
-	 * All done at this level ... ascend and resume the search.
-	 */
-	rcu_read_lock();
-ascend:
-	if (this_parent != parent) {
-		struct dentry *child = this_parent;
-		this_parent = child->d_parent;
-
-		spin_unlock(&child->d_lock);
-		spin_lock(&this_parent->d_lock);
-
-		/* go into the first sibling still alive */
-		do {
-			next = child->d_child.next;
-			if (next == &this_parent->d_subdirs)
-				goto ascend;
-			child = list_entry(next, struct dentry, d_child);
-		} while (unlikely(child->d_flags & DCACHE_DENTRY_KILLED));
-		rcu_read_unlock();
-		goto resume;
-	}
-	rcu_read_unlock();
-	spin_unlock(&this_parent->d_lock);
-	return;
-}
-
 static int tracefs_parse_options(char *data, struct tracefs_mount_opts *opts)
 {
 	substring_t args[MAX_OPT_ARGS];
@@ -332,10 +255,8 @@ static int tracefs_apply_options(struct super_block *sb, bool remount)
 	if (!remount || opts->opts & BIT(Opt_uid))
 		inode->i_uid = opts->uid;
 
-	if (!remount || opts->opts & BIT(Opt_gid)) {
-		/* Set all the group ids to the mount option */
-		set_gid(sb->s_root, opts->gid);
-	}
+	if (!remount || opts->opts & BIT(Opt_gid))
+		inode->i_gid = opts->gid;
 
 	return 0;
 }

^ permalink raw reply related	[relevance 87%]

* Re: [PATCH] eventfs: Stop using dcache_readdir() for getdents()
  @ 2024-01-03 18:12 95% ` Linus Torvalds
  2024-01-03 18:38 87%   ` Linus Torvalds
  0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2024-01-03 18:12 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: LKML, Linux Trace Kernel, Masami Hiramatsu, Mathieu Desnoyers

On Wed, 3 Jan 2024 at 07:24, Steven Rostedt <rostedt@goodmis.org> wrote:
>
> Instead, just have eventfs have its own iterate_shared callback function
> that will fill in the dent entries. This simplifies the code quite a bit.

Much better. Now eventfs looks more like a real filesystem, and less
like an eldritch horror monster that is parts of dcache tacked onto a
pseudo-filesystem.

However, one request, and one nit:

> Also, remove the "lookup" parameter to the create_file/dir_dentry() and
> always have it return a dentry that has its ref count incremented, and
> have the caller call the dput. This simplifies that code as well.

Can you please do that as a separate patch, where the first patch just
cleans up the directory iteration, and the second patch then goes "now
there are no more callers that have the 'lookup' argument set to
false".

Because as-is, the patch is kind of two things mixed up.

The small nit is this:

> +static int eventfs_iterate(struct file *file, struct dir_context *ctx)
>  {
> +       /*
> +        * Need to create the dentries and inodes to have a consistent
> +        * inode number.
> +        */
>         list_for_each_entry_srcu(ei_child, &ei->children, list,
>                                  srcu_read_lock_held(&eventfs_srcu)) {
> -               d = create_dir_dentry(ei, ei_child, parent, false);
> -               if (d) {
> -                       ret = add_dentries(&dentries, d, cnt);
> -                       if (ret < 0)
> -                               break;
> -                       cnt++;
> +
> +               if (ei_child->is_freed)
> +                       continue;
> +
> +               name = ei_child->name;
> +
> +               dentry = create_dir_dentry(ei, ei_child, ei_dentry);
> +               if (!dentry)
> +                       goto out;
> +               ino = dentry->d_inode->i_ino;
> +               dput(dentry);
> +
> +               if (c > 0) {
> +                       c--;
> +                       continue;
>                 }

Just move this "is the position before this name" up to the top of the
loop. Even above the "is_freed" part.

Let's just always count all the entries in the child list.

And same for the ei->nr_entries loop:

>         for (i = 0; i < ei->nr_entries; i++) {

where there's no point in creating that dentry just to look up the
inode number, only to then decide "Oh, we already iterated past this
part, so let's not do anything with it".

This wouldn't seem to matter much with a big enough getdents buffer
(which is the normal user level behavior), but it actually does,
because we don't keep track of "we have read to the end of the
directory".

So every readdir ends up effectively doing getdents at least twice:
once to read the directory entries, and then once to just be told
"that was all".

End result: you should strive very hard to *not* waste time on the
directory entries that have already been read, and are less than
'ctx->pos'.
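
A user-space sketch of that ordering, assuming a simplified model
rather than the eventfs code: the position check comes first, every
list entry counts toward the position whether it is freed or not, and
the expensive step (standing in for creating a dentry to learn the
inode number) only runs for entries that will actually be emitted.
All names here (entry, expensive_ino_lookup, iterate) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a child entry; not the kernel structure. */
struct entry {
	const char *name;
	int is_freed;
};

/* Counts how often the expensive per-entry work runs. */
static int expensive_calls;

/* Stand-in for creating a dentry just to read its inode number. */
static unsigned long expensive_ino_lookup(size_t idx)
{
	expensive_calls++;
	return 1000 + (unsigned long)idx;
}

/*
 * Emit up to 'max' inode numbers into out[], resuming at *pos.
 * *pos counts every list entry already consumed, freed or not.
 * Returns the number of entries emitted and advances *pos.
 */
static size_t iterate(const struct entry *list, size_t n, size_t *pos,
		      unsigned long *out, size_t max)
{
	size_t skip = *pos;
	size_t emitted = 0;

	for (size_t i = 0; i < n; i++) {
		/* Position check first: no work for entries already read. */
		if (skip > 0) {
			skip--;
			continue;
		}
		if (emitted == max)
			break;
		(*pos)++;
		if (list[i].is_freed)
			continue;
		out[emitted++] = expensive_ino_lookup(i);
	}
	return emitted;
}
```

In this model the second call, the one that only reports "that was
all", walks the list but never reaches the expensive lookup.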

             Linus

^ permalink raw reply	[relevance 95%]

* Linux 6.7-rc8
@ 2023-12-31 21:04 86% Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2023-12-31 21:04 UTC (permalink / raw)
  To: Linux Kernel Mailing List

So as expected, pretty much nothing happened over the holiday week.
We've got literally just 45 files changed, and almost a third of those
files aren't even kernel code (ie things like selftests, scripting,
Kconfig and maintainer file updates). And some of the rest is
prep-work and cleanups for future (real) changes.

But we do have a couple of real fixes in there, and I suspect we'll
get a few more next week as people come back from their food-induced
torpor.

So rc8 is mostly just a placeholder, and a "I do rc's each week,
whether they matter or not". Shortlog appended for completeness.

And hey, regardless of whether all you peeps are interested in testing
another rc or not, here's to hoping you all had a good 2023, and
wishes for an even better 2024!

                     Linus

---

Alvin Šipraga (2):
      get_maintainer: correctly parse UTF-8 encoded names in files
      get_maintainer: remove stray punctuation when cleaning file emails

Andy Shevchenko (2):
      MAINTAINERS: Remove Andy from GPIO maintainers
      MAINTAINERS: Add a missing file to the INTEL GPIO section

Arnd Bergmann (2):
      kexec: fix KEXEC_FILE dependencies
      kexec: select CRYPTO from KEXEC_FILE instead of depending on it

Baokun Li (1):
      mm/filemap: avoid buffered read/write race to read inconsistent data

Bartosz Golaszewski (1):
      MAINTAINERS: split out the uAPI into a new section

Charan Teja Kalla (1):
      mm: migrate high-order folios in swap cache correctly

Christoph Hellwig (1):
      block: renumber QUEUE_FLAG_HW_WC

Coly Li (1):
      badblocks: avoid checking invalid range in badblocks_check()

David E. Box (3):
      platform/x86/intel/pmc: Add suspend callback
      platform/x86/intel/pmc: Allow reenabling LTRs
      platform/x86/intel/pmc: Move GBE LTR ignore to suspend callback

David Laight (3):
      locking/osq_lock: Move the definition of optimistic_spin_node into osq_lock.c
      locking/osq_lock: Clarify osq_wait_next() calling convention
      locking/osq_lock: Clarify osq_wait_next()

Edward Adam Davis (1):
      keys, dns: Fix missing size check of V1 server-list header

Helge Deller (2):
      linux/export: Fix alignment for 64-bit ksymtab entries
      linux/export: Ensure natural alignment of kcrctab array

Jialu Xu (1):
      gen_compile_commands.py: fix path resolve with symlinks in it

Kent Overstreet (4):
      bcachefs: fix BCH_FSCK_ERR enum
      bcachefs: Fix insufficient disk reservation with compression + snapshots
      bcachefs: Fix leakage of internal error code
      bcachefs: Fix promotes

Linus Torvalds (1):
      Linux 6.7-rc8

Masahiro Yamada (1):
      kbuild: fix build ID symlinks to installed debug VDSO files

Matthew Wilcox (Oracle) (4):
      mm/memory-failure: pass the folio and the page to collect_procs()
      mm/memory-failure: check the mapcount of the precise page
      mm/memory-failure: cast index to loff_t before shifting it
      mailmap: add an old address for Naoya Horiguchi

Muhammad Usama Anjum (1):
      selftests: secretmem: floor the memory size to the multiple of page_size

Namjae Jeon (1):
      ksmbd: fix slab-out-of-bounds in smb_strndup_from_utf16()

Nathan Chancellor (1):
      MAINTAINERS: Add scripts/clang-tools to Kbuild section

Nico Pache (1):
      kunit: kasan_test: disable fortify string checker on kmalloc_oob_memset

Shin'ichiro Kawasaki (1):
      platform/x86: p2sb: Allow p2sb_bar() calls during PCI device probe

Sidhartha Kumar (1):
      maple_tree: do not preallocate nodes for slot stores

Stefan Hajnoczi (1):
      virtio_blk: fix snprintf truncation compiler warning

Steven Rostedt (Google) (4):
      eventfs: Fix file and directory uid and gid ownership
      ring-buffer: Fix wake ups when buffer_percent is set to 100
      tracing: Fix blocked reader of snapshot buffer
      ftrace: Fix modification of direct_function hash while in use

Xuan Zhuo (1):
      virtio_ring: fix syncs DMA memory with different direction

^ permalink raw reply	[relevance 86%]

* Re: [PATCH next 4/5] locking/osq_lock: Optimise per-cpu data accesses.
  2023-12-30 20:41 87%   ` Linus Torvalds
@ 2023-12-30 20:59 95%     ` Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2023-12-30 20:59 UTC (permalink / raw)
  To: David Laight
  Cc: linux-kernel, peterz, longman, mingo, will, boqun.feng,
	xinhui.pan, virtualization, Zeng Heng

On Sat, 30 Dec 2023 at 12:41, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> UNTESTED patch to just do the "this_cpu_write()" parts attached.
> Again, note how we do end up doing that this_cpu_ptr conversion later
> anyway, but at least it's off the critical path.

Also note that while 'this_cpu_ptr()' doesn't exactly generate lovely
code, it really is still better than caching a value in memory.

At least the memory location that 'this_cpu_ptr()' accesses is
slightly more likely to be hot (and is right next to the cpu number,
iirc).

That said, I think we should fix this_cpu_ptr() to not ever generate
that disgusting cltq just because the cpu pointer has the wrong
signedness. I don't quite know how to do it, but this:

  -#define per_cpu_offset(x) (__per_cpu_offset[x])
  +#define per_cpu_offset(x) (__per_cpu_offset[(unsigned)(x)])

at least helps a *bit*. It gets rid of the cltq, at least, but if
somebody actually passes in an 'unsigned long' cpuid, it would cause
an unnecessary truncation.
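
A small user-space sketch of that trade-off, assuming a 32-bit
'unsigned int' and using a made-up table in place of
__per_cpu_offset: the cast lets the compiler zero-extend the index
instead of sign-extending it, but a wider index is silently truncated.
The names demo_offsets, demo_offset_signed and demo_offset_unsigned
are hypothetical.

```c
#include <assert.h>

/* Hypothetical stand-in for the per-CPU offset table. */
static unsigned long demo_offsets[4] = { 0x1000, 0x2000, 0x3000, 0x4000 };

/* Index used as-is: an 'int' index has to be sign-extended to 64 bits. */
#define demo_offset_signed(x)	(demo_offsets[x])

/* Index cast to unsigned: zero-extended, but wider values are truncated. */
#define demo_offset_unsigned(x)	(demo_offsets[(unsigned)(x)])
```

With these definitions both macros return the same element for a
small 'int' index, while demo_offset_unsigned((1ULL << 32) + 1) reads
element 1 because the upper bits are dropped by the cast.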

And gcc still generates

        subl    $1, %eax        #, cpu_nr
        addq    __per_cpu_offset(,%rax,8), %rcx

instead of just doing

        addq    __per_cpu_offset-8(,%rax,8), %rcx

because it still needs to clear the upper 32 bits and doesn't know
that the 'xchg()' already did that.

Oh well. I guess even without the -1/+1 games by the OSQ code, we
would still end up with a "movl" just to do that upper bits clearing
that the compiler doesn't know is unnecessary.

I don't think we have any reasonable way to tell the compiler that the
register output of our xchg() inline asm has the upper 32 bits clear.

              Linus

^ permalink raw reply	[relevance 95%]

* Re: [PATCH next 4/5] locking/osq_lock: Optimise per-cpu data accesses.
  @ 2023-12-30 20:41 87%   ` Linus Torvalds
  2023-12-30 20:59 95%     ` Linus Torvalds
  0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2023-12-30 20:41 UTC (permalink / raw)
  To: David Laight
  Cc: linux-kernel, peterz, longman, mingo, will, boqun.feng,
	xinhui.pan, virtualization, Zeng Heng

[-- Attachment #1: Type: text/plain, Size: 3129 bytes --]

On Fri, 29 Dec 2023 at 12:57, David Laight <David.Laight@aculab.com> wrote:
>
> this_cpu_ptr() is rather more expensive than raw_cpu_read() since
> the latter can use an 'offset from register' (%gs for x86-64).
>
> Add a 'self' field to 'struct optimistic_spin_node' that can be
> read with raw_cpu_read(), initialise on first call.

No, this is horrible.

The problem isn't the "this_cpu_ptr()", it's the rest of the code.

>  bool osq_lock(struct optimistic_spin_queue *lock)
>  {
> -       struct optimistic_spin_node *node = this_cpu_ptr(&osq_node);
> +       struct optimistic_spin_node *node = raw_cpu_read(osq_node.self);

No. Both of these are crap.

>         struct optimistic_spin_node *prev, *next;
>         int old;
>
> -       if (unlikely(node->cpu == OSQ_UNLOCKED_VAL))
> -               node->cpu = encode_cpu(smp_processor_id());
> +       if (unlikely(!node)) {
> +               int cpu = encode_cpu(smp_processor_id());
> +               node = decode_cpu(cpu);
> +               node->self = node;
> +               node->cpu = cpu;
> +       }

The proper fix here is to not do that silly

        node = this_cpu_ptr(&osq_node);
        ..
        node->next = NULL;

dance at all, but to simply do

        this_cpu_write(osq_node.next, NULL);

in the first place. That makes the whole thing just a single store off
the segment descriptor.

Yes, you'll eventually end up doing that

        node = this_cpu_ptr(&osq_node);

thing because it then wants to use that raw pointer to do

        WRITE_ONCE(prev->next, node);

but that's a separate issue and still does not make it worth it to
create a pointless self-pointer.

Btw, if you *really* want to solve that separate issue, then make the
optimistic_spin_node struct not contain the pointers at all, but the
CPU numbers, and then turn those numbers into the pointers the exact
same way it does for the "lock->tail" thing, ie doing that whole

        prev = decode_cpu(old);

dance. That *may* then result in avoiding turning them into pointers
at all in some cases.

Also, I think that you might want to look into making OSQ_UNLOCKED_VAL
be -1 instead, and add something like

  #define IS_OSQ_UNLOCKED(x) ((int)(x)<0)

and that would then avoid the +1 / -1 games in encoding/decoding the
CPU numbers. It causes silly code generated like this:

        subl    $1, %eax        #, cpu_nr
...
        cltq
        addq    __per_cpu_offset(,%rax,8), %rcx

which seems honestly stupid. The cltq is there for sign-extension,
which is because all these things are "int", and the "subl" will
zero-extend to 64-bit, not sign-extend.

At that point, I think gcc might be able to just generate

        addq    __per_cpu_offset-8(,%rax,8), %rcx

but honestly, I think it would be nicer to just have decode_cpu() do

        unsigned int cpu_nr = encoded_cpu_val;

        return per_cpu_ptr(&osq_node, cpu_nr);

and not have the -1/+1 at all.
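
A user-space sketch contrasting the two encodings, under the
assumption (taken from the "+1 / -1 games" described above) that the
current scheme reserves 0 for "unlocked" and stores CPU n as n + 1.
The node array stands in for the per-CPU osq_node, and every name
prefixed with demo_ or alt_ is hypothetical.

```c
#include <assert.h>

#define DEMO_NR_CPUS 4	/* hypothetical CPU count for the sketch */

/* Hypothetical stand-in for the per-CPU queue node. */
struct demo_node {
	int cpu;
};

static struct demo_node demo_nodes[DEMO_NR_CPUS];

/* Scheme with 0 as "unlocked": CPU n is stored as n + 1. */
#define DEMO_OSQ_UNLOCKED_VAL 0

static inline int demo_encode_cpu(int cpu_nr)
{
	return cpu_nr + 1;
}

static inline struct demo_node *demo_decode_cpu(int encoded)
{
	int cpu_nr = encoded - 1;	/* the subtraction seen as "subl $1" */

	assert(cpu_nr >= 0 && cpu_nr < DEMO_NR_CPUS);
	return &demo_nodes[cpu_nr];
}

/* Suggested alternative: -1 as "unlocked", CPU numbers stored unchanged. */
#define ALT_OSQ_UNLOCKED_VAL (-1)
#define IS_OSQ_UNLOCKED(x) ((int)(x) < 0)

static inline int alt_encode_cpu(int cpu_nr)
{
	return cpu_nr;
}

static inline struct demo_node *alt_decode_cpu(int encoded)
{
	unsigned int cpu_nr = encoded;	/* unsigned index, no adjustment */

	assert(cpu_nr < DEMO_NR_CPUS);
	return &demo_nodes[cpu_nr];
}
```

Both variants map a CPU number to the same node; the alternative only
changes which value is reserved for "unlocked" and drops the
arithmetic from the decode path.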

Hmm?

UNTESTED patch to just do the "this_cpu_write()" parts attached.
Again, note how we do end up doing that this_cpu_ptr conversion later
anyway, but at least it's off the critical path.

                 Linus

[-- Attachment #2: patch.diff --]
[-- Type: text/x-patch, Size: 1083 bytes --]

 kernel/locking/osq_lock.c | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/kernel/locking/osq_lock.c b/kernel/locking/osq_lock.c
index 75a6f6133866..c3a166b7900c 100644
--- a/kernel/locking/osq_lock.c
+++ b/kernel/locking/osq_lock.c
@@ -92,14 +92,14 @@ osq_wait_next(struct optimistic_spin_queue *lock,
 
 bool osq_lock(struct optimistic_spin_queue *lock)
 {
-	struct optimistic_spin_node *node = this_cpu_ptr(&osq_node);
+	struct optimistic_spin_node *node;
 	struct optimistic_spin_node *prev, *next;
 	int curr = encode_cpu(smp_processor_id());
 	int old;
 
-	node->locked = 0;
-	node->next = NULL;
-	node->cpu = curr;
+	this_cpu_write(osq_node.next, NULL);
+	this_cpu_write(osq_node.locked, 0);
+	this_cpu_write(osq_node.cpu, curr);
 
 	/*
 	 * We need both ACQUIRE (pairs with corresponding RELEASE in
@@ -112,7 +112,9 @@ bool osq_lock(struct optimistic_spin_queue *lock)
 		return true;
 
 	prev = decode_cpu(old);
-	node->prev = prev;
+	this_cpu_write(osq_node.prev, prev);
+
+	node = this_cpu_ptr(&osq_node);
 
 	/*
 	 * osq_lock()			unqueue

^ permalink raw reply related	[relevance 87%]

* Re: [PATCH next 0/5] locking/osq_lock: Optimisations to osq_lock code
      @ 2023-12-30 19:40 99% ` Linus Torvalds
  2 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2023-12-30 19:40 UTC (permalink / raw)
  To: David Laight
  Cc: linux-kernel, peterz, longman, mingo, will, boqun.feng,
	xinhui.pan, virtualization, Zeng Heng

On Fri, 29 Dec 2023 at 12:52, David Laight <David.Laight@aculab.com> wrote:
>
> David Laight (5):
>   Move the definition of optimistic_spin_node into osq_lock.c
>   Clarify osq_wait_next()

I took these two as preparatory independent patches, with that
osq_wait_next() clarification split into two.

I also did the renaming that Waiman asked for.

           Linus

^ permalink raw reply	[relevance 99%]

* Re: [PATCH next 3/5] locking/osq_lock: Clarify osq_wait_next()
  @ 2023-12-29 22:54 97%   ` Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2023-12-29 22:54 UTC (permalink / raw)
  To: David Laight
  Cc: linux-kernel, peterz, longman, mingo, will, boqun.feng,
	xinhui.pan, virtualization, Zeng Heng

On Fri, 29 Dec 2023 at 12:56, David Laight <David.Laight@aculab.com> wrote:
>
> osq_wait_next() is passed 'prev' from osq_lock() and NULL from osq_unlock()
> but only needs the 'cpu' value to write to lock->tail.
> Just pass prev->cpu or OSQ_UNLOCKED_VAL instead.
>
> Also directly return NULL or 'next' instead of breaking the loop.

Please split these two totally independent things out of the patch,
just to make things much more obvious.

I like the new calling convention, but I don't like how the patch
isn't obviously just that.

In fact, I'd take your patch #1 and just the calling convention change
from #3 as "these are obviously not changing anything at all, only
moving things to more local places".

I'd also take the other part of #3 as a "clearly doesn't change
anything" but it should be a separate patch, and it should be done
differently: make 'next' be local to just *inside* the for-loop (in
fact, make it local to the if-statement that sets it), to clarify the
whole thing that it can never be non-NULL at the top of the loop, and
can never have any long-term semantics.

The other parts actually change some logic, and would need the OSQ
people to take a more serious look.

            Linus

^ permalink raw reply	[relevance 97%]

* Re: [GIT PULL] hotfixes for 6.7
  @ 2023-12-28  0:36 99% ` Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2023-12-28  0:36 UTC (permalink / raw)
  To: Andrew Morton; +Cc: mm-commits, linux-mm, linux-kernel

On Wed, 27 Dec 2023 at 15:03, Andrew Morton <akpm@linux-foundation.org> wrote:
>
> Baokun Li (1):
>       mm/filemap: avoid buffered read/write race to read inconsistent data

Hmm. I wonder if we should have made the i_size_read/write helpers be
smp_load_acquire/store_release()?

The existing smp_wmb() are almost accidental, and aren't primarily
about the inode size, but about the page/folio uptodate bit. I guess
they work, but it's all a bit messy.

Which might *also* be better off with acquire/release, but we don't
have those bitops, I guess. Oh well.
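
A user-space analogy for what acquire/release size helpers would look
like, written with C11 atomics rather than the kernel's
smp_load_acquire()/smp_store_release(). This is a sketch under that
assumption, not the i_size_read()/i_size_write() implementation, and
the names fake_inode, size_write_release, size_read_acquire and
append_data are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <string.h>

/* Hypothetical stand-in for an inode: a size plus the data it covers. */
struct fake_inode {
	_Atomic long long size;
	char data[64];
};

/* Writer side: the release store orders earlier data writes before it. */
static inline void size_write_release(struct fake_inode *inode, long long size)
{
	atomic_store_explicit(&inode->size, size, memory_order_release);
}

/* Reader side: the acquire load orders later data reads after it. */
static inline long long size_read_acquire(struct fake_inode *inode)
{
	return atomic_load_explicit(&inode->size, memory_order_acquire);
}

/* Copy the data in first, then publish the new size. */
static inline void append_data(struct fake_inode *inode, const char *src,
			       long long len)
{
	long long old = atomic_load_explicit(&inode->size,
					     memory_order_relaxed);

	assert(len >= 0 && old + len <= (long long)sizeof(inode->data));
	memcpy(inode->data + old, src, (size_t)len);
	size_write_release(inode, old + len);
}
```

A reader that observes the new size through size_read_acquire() is
then guaranteed to see the bytes copied before the size was published,
without a separate write barrier at each call site.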

             Linus

^ permalink raw reply	[relevance 99%]

* Linux 6.7-rc7
@ 2023-12-24  0:42 48% Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2023-12-24  0:42 UTC (permalink / raw)
  To: Linux Kernel Mailing List

Normally I do rc releases on a Sunday afternoon, but since tomorrow is
Xmas Eve, and the festivities will have started (or at least I'll be
driving to the store a few times for everything that we forgot - not a
year has passed without _some_ last-minute "Oh, we need ..."), I'm
doing rc7 on a Saturday instead.

As I already mentioned in an earlier email or two, while things
look fine and we *could* release a final 6.7 next weekend as per the
usual schedule, I'm not going to do that. It's the holidays, lots of
people have already been off for a week or more, and plan on being off
for the upcoming week (or more).

So next weekend is going to be rc8, and I expect that it will be small
as nobody should be around.

And then we might get back to a more normal schedule the week after. Maybe.

Anyway, rc7 itself looks fairly normal. It's actually a bit bigger
than rc6 was, but not hugely so, and nothing in here looks at all
strange. Please do give it a whirl if you have the time and the
energy, but let's face it, I expect things to be very quiet and this
to be one of those "nothing happens" weeks. Because even if you aren't
celebrating this time of year, you might take advantage of the peace
and quiet.

              Linus

---

Alex Lu (1):
      Bluetooth: Add more enc key size check

Alexander Atanasov (1):
      scsi: core: Always send batch on reset or error handling command

Alexis Lothoré (1):
      pinctrl: at91-pio4: use dedicated lock class for IRQ

Alper Ak (1):
      USB: serial: option: add Quectel EG912Y module support

Alvin Lee (1):
      drm/amd/display: Revert " drm/amd/display: Use channel_width = 2 for vram table 3.0"

Amir Goldstein (1):
      ovl: fix dentry reference leak after changes to underlying layers

Andrew Davis (1):
      ARM: dts: dra7: Fix DRA7 L3 NoC node register size

Andrew Jones (1):
      KVM: riscv: selftests: Fix get-reg-list print_reg defaults

Andy Gospodarek (1):
      bnxt_en: do not map packet buffers twice

Ankit Nautiyal (1):
      drm/i915/display: Get bigjoiner config before dsc config during readout

Arnd Bergmann (2):
      Bluetooth: hci_event: shut up a false-positive warning
      x86/xen: add CPU dependencies for 32-bit build

Avraham Stern (1):
      wifi: iwlwifi: pcie: avoid a NULL pointer dereference

Benjamin Bigler (1):
      spi: spi-imx: correctly configure burst length when using dma

Bjorn Andersson (1):
      interconnect: qcom: icc-rpm: Fix peak rate calculation

Can Guo (1):
      scsi: ufs: core: Let the sq_lock protect sq_tail_slot access

Carolina Jubran (1):
      net/mlx5e: XDP, Drop fragmented packets larger than MTU size

ChanWoo Lee (1):
      scsi: ufs: qcom: Return ufs_qcom_clk_scale_*() errors in ufs_qcom_clk_scale_notify()

Charlene Liu (1):
      drm/amd/display: get dprefclk ss info from integration info table

Charles Keepax (1):
      ASoC: cs42l43: Don't enable bias sense during type detect

Chen-Yu Tsai (1):
      wifi: cfg80211: Add my certificate

Chris Mi (1):
      net/mlx5e: Decrease num_block_tc when unblock tc offload

Christoffer Sandberg (1):
      Input: soc_button_array - add mapping for airplane mode button

Chuck Lever (3):
      NFSD: Revert 6c41d9a9bd0298002805758216a9c44e38a8500d
      NFSD: Revert 738401a9bd1ac34ccd5723d69640a4adbb1a4bc0
      SUNRPC: Revert 5f7fc5d69f6e92ec0b38774c387f5cf7812c5806

Chukun Pan (1):
      arm64: dts: allwinner: h616: update emac for Orange Pi Zero 3

Clément Villeret (1):
      ALSA: hda/realtek: Add quirk for ASUS ROG GV302XA

Curtis Malainey (1):
      ASoC: SOF: mediatek: mt8186: Revert Add Google Steelix topology compatible

Dan Carpenter (3):
      net/mlx5e: Fix error code in mlx5e_tc_action_miss_mapping_get()
      net/mlx5e: Fix error codes in alloc_branch_attr()
      usb: fotg210-hcd: delete an incorrect bounds test

Dan Williams (1):
      driver core: Add a guard() definition for the device_lock()

Daniel Golle (1):
      net: phy: skip LED triggers on PHYs on SFP modules

Daniel Hill (1):
      bcachefs: improve modprobe support by providing softdeps

Dave Ertman (1):
      ice: alter feature support check for SRIOV and LAG

David Ahern (1):
      net/ipv6: Revert remove expired routes with a separated list of routes

David Howells (5):
      afs: Fix the dynamic root's d_delete to always delete unused dentries
      afs: Fix dynamic root lookup DNS check
      keys, dns: Allow key types (eg. DNS) to be reclaimed immediately on expiry
      afs: Fix overwriting of result of DNS query
      afs: Fix use-after-free due to get/remove race in volume tree

David Lechner (1):
      iio: triggered-buffer: prevent possible freeing of wrong buffer

Dinghao Liu (1):
      net/mlx5e: fix a potential double-free in fs_udp_create_groups

Edward Adam Davis (1):
      wifi: mac80211: check if the existing link config remains unchanged

Eric Dumazet (3):
      net: sched: ife: fix potential use-after-free
      net/rose: fix races in rose_kill_by_device()
      net: check dev->gso_max_size in gso_features_check()

Esther Shimanovich (1):
      Input: i8042 - add nomux quirk for Acer P459-G2-M

Fabio Estevam (1):
      dt-bindings: nvmem: mxs-ocotp: Document fsl,ocotp

Fedor Pchelkin (1):
      net: 9p: avoid freeing uninit memory in p9pdu_vreadf

Felix Fietkau (1):
      wifi: mt76: fix crash with WED rx support enabled

Frédéric Danis (1):
      Bluetooth: L2CAP: Send reject on command corrupted request

Geert Uytterhoeven (1):
      reset: Fix crash when freeing non-existent optional resets

Geliang Tang (2):
      selftests: mptcp: join: fix subflow_send_ack lookup
      mailmap: add entries for Geliang Tang

George Stark (1):
      iio: adc: meson: add separate config for axg SoC family

Gergo Koteles (2):
      ALSA: hda/tas2781: select program 0, conf 0 by default
      ASoC: tas2781: check the validity of prm_no/cfg_no

Ghanshyam Agrawal (1):
      kselftest: alsa: fixed a print formatting warning

Gil Fine (1):
      thunderbolt: Fix minimum allocated USB 3.x and PCIe bandwidth

Guilherme G. Piccoli (1):
      HID: nintendo: Prevent divide-by-zero on code

Haibo Chen (1):
      iio: adc: imx93: add four channels for imx93 adc

Hamza Mahfooz (1):
      drm/amd/display: disable FPO and SubVP for older DMUB versions on DCN32x

Hangbin Liu (1):
      kselftest: rtnetlink.sh: use grep_fail when expecting the cmd fail

Hans de Goede (3):
      Input: atkbd - skip ATKBD_CMD_GETID in translated mode
      ASoC: Intel: bytcr_rt5640: Add quirk for the Medion Lifetab S10346
      ASoC: Intel: bytcr_rt5640: Add new swapped-speakers quirk

Haoran Liu (1):
      Input: ipaq-micro-keys - add error handling for devm_kmemdup

Heiko Carstens (2):
      s390/vx: fix save/restore of fpu kernel context
      s390: update defconfigs

Herve Codina (1):
      lib/vsprintf: Fix %pfwf when current node refcount == 0

Hyunwoo Kim (1):
      Bluetooth: af_bluetooth: Fix Use-After-Free in bt_sock_recvmsg

Imre Deak (1):
      drm/i915/mtl: Fix HDMI/DP PLL clock selection

Ivan Vecera (1):
      i40e: Fix ST code value for Clause 45

JP Kobryn (1):
      9p: prevent read overrun in protocol dump tracepoint

Jacob Keller (1):
      ice: stop trashing VF VSI aggregator node ID information

Jan Kara (1):
      bcachefs: Fix determining required file handle length

Javier Carrasco (3):
      iio: common: ms_sensors: ms_sensors_i2c: fix humidity conversion time table
      iio: tmag5273: fix temperature offset
      iio: adc: MCP3564: fix calib_bias and calib_scale range checks

Jensen Huang (1):
      i2c: rk3x: fix potential spinlock recursion on poll

Jeremie Knuesel (1):
      ALSA: usb-audio: Increase delay in MOTU M quirk

Jerome Brunet (1):
      ASoC: hdmi-codec: fix missing report for jack initial status

Jianbo Liu (1):
      net/mlx5e: Fix overrun reported by coverity

Jijie Shao (1):
      net: hns3: add new maintainer for the HNS3 ethernet driver

Jiri Olsa (1):
      bpf: Add missing BPF_LINK_TYPE invocations

Johan Hovold (1):
      usb: typec: ucsi: fix gpio-based orientation detection

Johannes Berg (8):
      wifi: ieee80211: don't require protected vendor action frames
      wifi: iwlwifi: pcie: add another missing bh-disable for rxq->lock
      wifi: mac80211: don't re-add debugfs during reconfig
      wifi: mac80211: check defragmentation succeeded
      wifi: mac80211: mesh: check element parsing succeeded
      wifi: mac80211: mesh_plink: fix matches_local logic
      wifi: cfg80211: fix certs build to not depend on file order
      debugfs: initialize cancellations earlier

John Fastabend (2):
      bpf: syzkaller found null ptr deref in unix_bpf proto add
      bpf: sockmap, test for unconnected af_unix sock

Jose Ignacio Tornos Martinez (1):
      net: usb: ax88179_178a: avoid failed operations when device is disconnected

Josip Pavic (1):
      drm/amd/display: dereference variable before checking for zero

José Pekkarinen (1):
      Input: psmouse - enable Synaptics InterTouch for ThinkPad L14 G1

Karthik Poosa (1):
      drm/i915/hwmon: Fix static analysis tool reported issues

Keith Busch (1):
      Revert "nvme-fc: fix race between error recovery and creating association"

Kent Gibson (1):
      gpiolib: cdev: add gpio_device locking wrapper around gpio_ioctl()

Kent Overstreet (5):
      bcachefs: Fix nocow locks deadlock
      bcachefs: print explicit recovery pass message only once
      bcachefs: btree_node_u64s_with_format() takes nr keys
      bcachefs; guard against overflow in btree node split
      bcachefs: Fix bch2_alloc_sectors_start_trans() error handling

Konrad Dybcio (1):
      interconnect: qcom: sm8250: Enable sync_state

Krzysztof Kozlowski (1):
      reset: hisilicon: hi6220: fix Wvoid-pointer-to-enum-cast warning

Kunwu Chan (1):
      ARM: OMAP2+: Fix null pointer dereference and memory leak in omap_soc_device_init

Lai Peter Jun Ann (1):
      net: stmmac: fix incorrect flag check in timestamp interrupt

Larysa Zaremba (1):
      ice: Fix PF with enabled XDP going no-carrier after reset

Linus Torvalds (2):
      posix-timers: Get rid of [COMPAT_]SYS_NI() uses
      Linux 6.7-rc7

Liu Jian (2):
      net: check vlan filter feature in vlan_vids_add_by_dev() and vlan_vids_del_by_dev()
      selftests: add vlan hw filter tests

Lorenzo Bianconi (1):
      net: ethernet: mtk_wed: fix possible NULL pointer dereference in mtk_wed_wo_queue_tx_clean()

Louis Chauvet (1):
      spi: atmel: Fix clock issue when using devices with different polarities

Luca Weiss (1):
      Input: xpad - add Razer Wolverine V2 support

Luiz Augusto von Dentz (3):
      Bluetooth: Fix not notifying when connection encryption changes
      Bluetooth: hci_event: Fix not checking if HCI_OP_INQUIRY has been sent
      Bluetooth: hci_core: Fix hci_conn_hash_lookup_cis

Macpaul Lin (1):
      arm64: dts: mediatek: mt8395-genio-1200-evk: add interrupt-parent for mt6360

Marc Zyngier (5):
      KVM: arm64: vgic: Simplify kvm_vgic_destroy()
      KVM: arm64: vgic: Add a non-locking primitive for kvm_vgic_vcpu_destroy()
      KVM: arm64: vgic: Force vcpu vgic teardown on vcpu destroy
      KVM: arm64: vgic: Ensure that slots_lock is held in vgic_register_all_redist_iodevs()
      KVM: Convert comment into an assertion in kvm_io_bus_register_dev()

Mario Limonciello (5):
      pinctrl: amd: Mask non-wake source pins with interrupt enabled at suspend
      platform/x86/amd/pmc: Move platform defines to header
      platform/x86/amd/pmc: Only run IRQ1 firmware version check on Cezanne
      platform/x86/amd/pmc: Move keyboard wakeup disablement detection to pmc-quirks
      platform/x86/amd/pmc: Disable keyboard wakeup on AMD Framework 13

Marius Cristea (1):
      iio: adc: MCP3564: fix hardware identification logic

Mark Glover (1):
      USB: serial: ftdi_sio: update Actisense PIDs constant names

Martin K. Petersen (1):
      Revert "scsi: aacraid: Reply queue mapping to CPUs based on IRQ affinity"

Matthew Wilcox (Oracle) (1):
      ida: Fix crash in ida_free when the bitmap is empty

Matthieu Baerts (1):
      mptcp: fill in missing MODULE_DESCRIPTION()

Matti Vaittinen (1):
      iio: kx022a: Fix acceleration value scaling

Maurizio Lombardi (1):
      nvme-pci: fix sleeping function called from interrupt context

Michael Roth (1):
      KVM: SEV: Do not intercept accesses to MSR_IA32_XSS for SEV-ES guests

Michal Schmidt (1):
      ice: fix theoretical out-of-bounds access in ethtool link modes

Mike Snitzer (2):
      dm audit: fix Kconfig so DM_AUDIT depends on BLK_DEV_DM
      MAINTAINERS: remove stale info for DEVICE-MAPPER

Mike Tipton (1):
      interconnect: Treat xlate() returning NULL node as an error

Mikulas Patocka (1):
      dm-integrity: don't modify bio's immutable bio_vec in integrity_metadata()

Miquel Raynal (3):
      spi: atmel: Do not cancel a transfer upon any signal
      spi: atmel: Drop unused defines
      spi: atmel: Prevent spi transfers from being killed

Moshe Shemesh (1):
      net/mlx5: Fix fw tracer first block check

Nam Cao (3):
      pinctrl: starfive: jh7110: ignore disabled device tree nodes
      pinctrl: starfive: jh7100: ignore disabled device tree nodes
      spi: cadence: revert "Add SPI transfer delays"

NeilBrown (2):
      nfsd: call nfsd_last_thread() before final nfsd_put()
      nfsd: hold nfsd_mutex across entire netlink operation

Nitin Rawat (1):
      scsi: ufs: core: Store min and max clk freq from OPP table

Nuno Sa (2):
      iio: imu: adis16475: add spi_device_id table
      iio: imu: adis16475: use bit numbers in assign_bit()

Oliver Upton (1):
      KVM: selftests: Ensure sysreg-defs.h is generated at the expected path

Paolo Abeni (1):
      mptcp: fix inconsistent state on fastopen race

Paolo Bonzini (1):
      KVM: selftests: Fix dynamic generation of configuration names

Patrick Rudolph (3):
      pinctrl: cy8c95x0: Fix typo
      pinctrl: cy8c95x0: Fix regression
      pinctrl: cy8c95x0: Fix get_pincfg

Paulo Alcantara (5):
      smb: client: fix OOB in cifsd when receiving compounded resps
      smb: client: fix OOB in SMB2_query_info_init()
      smb: client: fix OOB in smbCalcSize()
      smb: client: fix potential OOB in cifs_dump_detail()
      smb: client: fix potential OOB in smb2_dump_detail()

Pavel Kozlov (1):
      ARC: add hugetlb definitions

Philip Yang (1):
      drm/amdkfd: svm range always mapped flag not working on APU

Quan Nguyen (1):
      i2c: aspeed: Handle the coalesced stop conditions with the start conditions.

Rafał Miłecki (1):
      nvmem: brcm_nvram: store a copy of NVRAM content

Rahul Rameshbabu (2):
      net/mlx5e: Correct snprintf truncation handling for fw_version buffer
      net/mlx5e: Correct snprintf truncation handling for fw_version buffer used by representors

Rajvi Jingar (1):
      platform/x86/intel/pmc: Fix hang in pmc_core_send_ltr_ignore()

Randy Dunlap (1):
      tracing/synthetic: fix kernel-doc warnings

Reinhard Speyerer (1):
      USB: serial: option: add Quectel RM500Q R13 firmware support

Ricardo Rivera-Matos (3):
      ASoC: cs35l45: Use modern pm_ops
      ASoC: cs35l45: Prevent IRQ handling when suspending/resuming
      ASoC: cs35l45: Prevents spinning during runtime suspend

Richard Fitzgerald (1):
      ASoC: Intel: soc-acpi-intel-mtl-match: Change CS35L56 prefixes to AMPn

Ronald Wahl (1):
      net: ks8851: Fix TX stall caused by TX buffer overrun

Rouven Czerwinski (1):
      net: rfkill: gpio: set GPIO direction

Ryan McClelland (1):
      HID: nintendo: fix initializer element is not constant error

Shengjiu Wang (1):
      ASoC: fsl_sai: Fix channel swap issue on i.MX8MP

Shifeng Li (2):
      net/mlx5e: Fix slab-out-of-bounds in mlx5_query_nic_vport_mac_list()
      net/mlx5e: Fix a race in command alloc flow

Shigeru Yoshida (1):
      net: Return error from sk_stream_wait_connect() if sk_wait_event() fails

Shyam Prasad N (2):
      cifs: fix a pending undercount of srv_count
      cifs: do not let cifs_chan_update_iface deallocate channels

Slark Xiao (1):
      USB: serial: option: add Foxconn T99W265 with new baseline

Srinivas Pandruvada (2):
      Revert "iio: hid-sensor-als: Add light chromaticity support"
      Revert "iio: hid-sensor-als: Add light color temperature support"

Stefan Binding (9):
      ALSA: hda: cs35l41: Add config table to support many laptops without _DSD
      ALSA: hda: cs35l41: Support additional ASUS ROG 2023 models
      ALSA: hda/realtek: Add quirks for ASUS ROG 2023 models
      ALSA: hda: cs35l41: Support additional ASUS Zenbook 2022 Models
      ALSA: hda/realtek: Add quirks for ASUS Zenbook 2022 Models
      ALSA: hda: cs35l41: Support additional ASUS Zenbook 2023 Models
      ALSA: hda/realtek: Add quirks for ASUS Zenbook 2023 Models
      ALSA: hda: cs35l41: Do not allow uninitialised variables to be freed
      ALSA: hda: cs35l41: Only add SPI CS GPIO if SPI is enabled in kernel

Steven Rostedt (Google) (3):
      ring-buffer: Fix slowpath of interrupted event
      eventfs: Have event files and directories default to parent uid and gid
      tracing / synthetic: Disable events after testing in synth_event_gen_test_init()

Su Hui (1):
      iio: imu: inv_mpu6050: fix an error code problem in inv_mpu6050_read_raw

Suman Ghosh (1):
      octeontx2-pf: Fix graceful exit during PFC configuration failure

Tasos Sahanidis (1):
      usb-storage: Add quirk for incorrect WP on Kingston DT Ultimate 3.0 G3

Thomas Bertschinger (1):
      bcachefs: fix invalid memory access in bch2_fs_alloc() error path

Thomas Gleixner (4):
      x86/smpboot/64: Handle X2APIC BIOS inconsistency gracefully
      x86/alternatives: Sync core before enabling interrupts
      x86/alternatives: Disable interrupts and sync when optimizing NOPs in place
      x86/acpi: Handle bogus MADT APIC tables gracefully

Thomas Weißschuh (1):
      net: avoid build bug in skb extension length calculation

Tony Lindgren (2):
      bus: ti-sysc: Flush posted write only after srst_udelay
      ARM: dts: Fix occasional boot hang for am3 usb

Uwe Kleine-König (1):
      Input: amimouse - convert to platform remove callback returning void

Ville Syrjälä (2):
      drm/i915: Reject async flips with bigjoiner
      drm/i915/dmc: Don't enable any pipe DMC events

Vineet Gupta (5):
      ARC: entry: SAVE_ABI_CALLEE_REG: ISA/ABI specific helper
      ARC: entry: move ARCompact specific bits out of entry.h
      ARC: mm: retire support for aliasing VIPT D$
      ARC: fix spare error
      ARC: fix smatch warning

Vineeth Vijayan (1):
      s390/scm: fix virtual vs physical address confusion

Vishnu Sankar (1):
      platform/x86: thinkpad_acpi: fix for incorrect fan reporting on some ThinkPad systems

Vlad Buslov (4):
      Revert "net/mlx5e: fix double free of encap_header in update funcs"
      Revert "net/mlx5e: fix double free of encap_header"
      net/mlx5e: fix double free of encap_header
      net/mlx5: Refactor mlx5_flow_destination->rep pointer to vport num

Vladimir Oltean (2):
      net: mscc: ocelot: fix eMAC TX RMON stats for bucket 256-511 and above
      net: mscc: ocelot: fix pMAC TX RMON stats for bucket 256-511 and above

Wadim Egorov (1):
      iio: adc: ti_am335x_adc: Fix return value check of tiadc_request_dma()

Wayne Lin (1):
      drm/amd/display: Add case for dcn35 to support usb4 dmub hpd event

Wei Yongjun (1):
      scsi: bnx2fc: Fix skb double free in bnx2fc_rcv()

Xiao Yao (1):
      Bluetooth: MGMT/SMP: Fix address type when using SMP over BREDR/LE

Yang Yingliang (1):
      i2c: qcom-geni: fix missing clk_disable_unprepare() and geni_se_resources_off()

Yaxiong Tian (1):
      thunderbolt: Fix memory leak in margining_port_remove()

Ying Hsu (1):
      Bluetooth: Fix deadlock in vhci_send_frame

Yong-Xuan Wang (1):
      RISCV: KVM: update external interrupt atomically for IMSIC swfile

Yu Kuai (1):
      dm-raid: delay flushing event_work() after reconfig_mutex is released

Yury Norov (1):
      net: mana: select PAGE_POOL

ZhenGuo Yin (1):
      drm/amdgpu: re-create idle bo's PTE during VM state machine reset

Zhipeng Lu (1):
      ethernet: atheros: fix a memleak in atl1e_setup_ring_resources

Zizhi Wo (1):
      fs: cifs: Fix atime update check

duanqiangwen (1):
      net: libwx: fix memory leak on free page

xiongxin (1):
      gpio: dwapb: mask/unmask IRQ when disable/enale it

^ permalink raw reply	[relevance 48%]

* Re: [GIT PULL] afs, dns: Fix dynamic root interaction with negative DNS
  @ 2023-12-23 19:14 88%   ` Linus Torvalds
    1 sibling, 0 replies; 200+ results
From: Linus Torvalds @ 2023-12-23 19:14 UTC (permalink / raw)
  To: Simon Horman
  Cc: David Howells, Markus Suvanto, Marc Dionne, Wang Lei,
	Jeff Layton, Steve French, Jarkko Sakkinen, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, linux-afs, keyrings,
	linux-cifs, linux-nfs, ceph-devel, netdev, linux-fsdevel,
	linux-kernel, Edward Adam Davis

[-- Attachment #1: Type: text/plain, Size: 1337 bytes --]

On Sat, 23 Dec 2023 at 09:29, Simon Horman <horms@kernel.org> wrote:
>
>
>         if (data[0] == 0) {
>                 /* It may be a server list. */
> -               if (datalen <= sizeof(*bin))
> +               if (datalen <= sizeof(*v1))
>                         return -EINVAL;
>
>                 bin = (const struct dns_payload_header *)data;

Ugh, I hate how it checks the size of a *different* structure than the
one it then assigns the pointer to.

So I get the feeling that we should get rid of 'bin' entirely, and
just use the 'v1' pointer, since it literally checks that that is what
it is, and then the size check matches the thing we're casting things
to.

So then "bin->xyz" becomes "v1->hdr.xyz".

Yes, the patch becomes a bit bigger, but I think the end result is a
whole lot more obvious and maintainable.

I'd also move the remaining 'v1' variable declaration to the inner
context where it's actually used.
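
As a rough userspace illustration of the point above (not part of the
attached patch): the length check and the structure being read should be
the same type. The struct layouts, the PAYLOAD_IS_SERVER_LIST value and
the parse_server_list() helper below are hypothetical stand-ins, not the
real dns_resolver definitions, and the sketch copies the header with
memcpy() rather than casting the pointer as the kernel code does.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for the dns_resolver header structures. */
struct payload_header {
	uint8_t zero;		/* 0 marks a binary payload */
	uint8_t content;	/* what kind of payload this is */
	uint8_t version;	/* format version */
};

struct server_list_v1_header {
	struct payload_header hdr;	/* common header comes first */
	uint8_t source;
	uint8_t status;
	uint8_t nr_servers;
};

#define PAYLOAD_IS_SERVER_LIST 1	/* assumed value, for illustration */

/* Returns the status byte on success, -1 if the payload is rejected. */
int parse_server_list(const uint8_t *data, size_t datalen)
{
	struct server_list_v1_header v1;

	/* The size check uses the same type that is read below. */
	if (datalen <= sizeof(v1))
		return -1;

	memcpy(&v1, data, sizeof(v1));

	/* Common header fields are reached through v1.hdr. */
	if (v1.hdr.content != PAYLOAD_IS_SERVER_LIST || v1.hdr.version != 1)
		return -1;

	return v1.status;
}
```

With only one header variable in play, there is no second structure whose
size could be checked by mistake.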

IOW, I personally would be much happier with a patch like the attached, but I

 (a) don't want to take credit for this, since my change is purely syntactic

 (b) have not tested this patch apart from checking that it compiles
in at least one config

so honestly, I'd love to see this patch come back to me with sign-offs
and tested-bys by the actual people who noticed this.

Hmm?

                 Linus

[-- Attachment #2: patch.diff --]
[-- Type: text/x-patch, Size: 1751 bytes --]

 net/dns_resolver/dns_key.c | 19 +++++++++----------
 1 file changed, 9 insertions(+), 10 deletions(-)

diff --git a/net/dns_resolver/dns_key.c b/net/dns_resolver/dns_key.c
index 2a6d363763a2..f18ca02aa95a 100644
--- a/net/dns_resolver/dns_key.c
+++ b/net/dns_resolver/dns_key.c
@@ -91,8 +91,6 @@ const struct cred *dns_resolver_cache;
 static int
 dns_resolver_preparse(struct key_preparsed_payload *prep)
 {
-	const struct dns_server_list_v1_header *v1;
-	const struct dns_payload_header *bin;
 	struct user_key_payload *upayload;
 	unsigned long derrno;
 	int ret;
@@ -103,27 +101,28 @@ dns_resolver_preparse(struct key_preparsed_payload *prep)
 		return -EINVAL;
 
 	if (data[0] == 0) {
+		const struct dns_server_list_v1_header *v1;
+
 		/* It may be a server list. */
-		if (datalen <= sizeof(*bin))
+		if (datalen <= sizeof(*v1))
 			return -EINVAL;
 
-		bin = (const struct dns_payload_header *)data;
-		kenter("[%u,%u],%u", bin->content, bin->version, datalen);
-		if (bin->content != DNS_PAYLOAD_IS_SERVER_LIST) {
+		v1 = (const struct dns_server_list_v1_header *)data;
+		kenter("[%u,%u],%u", v1->hdr.content, v1->hdr.version, datalen);
+		if (v1->hdr.content != DNS_PAYLOAD_IS_SERVER_LIST) {
 			pr_warn_ratelimited(
 				"dns_resolver: Unsupported content type (%u)\n",
-				bin->content);
+				v1->hdr.content);
 			return -EINVAL;
 		}
 
-		if (bin->version != 1) {
+		if (v1->hdr.version != 1) {
 			pr_warn_ratelimited(
 				"dns_resolver: Unsupported server list version (%u)\n",
-				bin->version);
+				v1->hdr.version);
 			return -EINVAL;
 		}
 
-		v1 = (const struct dns_server_list_v1_header *)bin;
 		if ((v1->status != DNS_LOOKUP_GOOD &&
 		     v1->status != DNS_LOOKUP_GOOD_WITH_BAD)) {
 			if (prep->expiry == TIME64_MAX)

^ permalink raw reply related	[relevance 88%]

* Re: [GIT PULL] tracing: Fix eventfs ownership again
  @ 2023-12-22 22:24 99% ` Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2023-12-22 22:24 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: LKML, Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers,
	Dongliang Cui, Hongyu Jin

On Fri, 22 Dec 2023 at 05:29, Steven Rostedt <rostedt@goodmis.org> wrote:
>
> tracing: Fix eventfs ownership

Instead of doing these daily pulls for fixes that fix the previous fix
that fixed another fix from a week ago, I'll just wait a few weeks and
maybe it will actually be right then.

                 Linus

^ permalink raw reply	[relevance 99%]

* Re: [GIT PULL] tracing: A few more fixes for 6.7
  @ 2023-12-21 20:01 94%     ` Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2023-12-21 20:01 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: LKML, Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers,
	Randy Dunlap, Alexander Graf

On Thu, 21 Dec 2023 at 11:27, Steven Rostedt <rostedt@goodmis.org> wrote:
>
> Luckily, that's easy to get to. All I need to do is:
>
> static void update_inode_attr(struct dentry *dentry, struct inode *inode,
>                               struct eventfs_attr *attr, umode_t mode)
> {
>         struct tracefs_fs_info *fsi = dentry->d_sb->s_fs_info;
>         struct tracefs_mount_opts *opts = &fsi->mount_opts;
>
>         /* Default the ownership to what it was mounted as */
>         inode->i_uid = opts->uid;
>         inode->i_gid = opts->gid;

I think you should add

>         inode->i_mode = mode;

to that "default setup", which not only makes things more consistent,
it also means that you can then remove it from here:

>         if (!attr) {
>                 inode->i_mode = mode;
>                 return;
>         }

.. and the 'else' side from here:

>         if (attr->mode & EVENTFS_SAVE_MODE)
>                 inode->i_mode = attr->mode & EVENTFS_MODE_MASK;
>         else
>                 inode->i_mode = mode;

and it all looks a lot more clear and obvious.

"Set things to default values, then if we have attr and the specific
fields are set in those attrs, update them".

Instead of having this odd "do one thing for gid/uid, another for mode".
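
A minimal userspace sketch of that "defaults first, then overrides"
shape, under stated assumptions: the structs, the DEMO_SAVE_* flags and
update_node_attr() are simplified, hypothetical stand-ins for the
eventfs code quoted above, and the uid/gid override flags in particular
are assumed for illustration rather than taken from this thread.

```c
#include <stddef.h>

/* Hypothetical flag layout: low bits hold the mode, high bits say what was saved. */
#define DEMO_SAVE_MODE	(1u << 16)
#define DEMO_SAVE_UID	(1u << 17)
#define DEMO_SAVE_GID	(1u << 18)
#define DEMO_MODE_MASK	(DEMO_SAVE_MODE - 1)

struct demo_attr  { unsigned int mode, uid, gid; };	/* saved overrides */
struct demo_opts  { unsigned int uid, gid; };		/* mount options */
struct demo_inode { unsigned int mode, uid, gid; };

void update_node_attr(struct demo_inode *inode, const struct demo_opts *opts,
		      const struct demo_attr *attr, unsigned int mode)
{
	/* Step 1: set every field to its default, unconditionally. */
	inode->uid = opts->uid;
	inode->gid = opts->gid;
	inode->mode = mode;

	/* Step 2: no saved attributes means the defaults stand. */
	if (!attr)
		return;

	/* Step 3: override only the fields that were explicitly saved. */
	if (attr->mode & DEMO_SAVE_MODE)
		inode->mode = attr->mode & DEMO_MODE_MASK;
	if (attr->mode & DEMO_SAVE_UID)
		inode->uid = attr->uid;
	if (attr->mode & DEMO_SAVE_GID)
		inode->gid = attr->gid;
}
```

Because all three fields get their defaults in one place, the NULL-attr
path and the "flag not set" path cannot drift apart.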

> > I still claim that the whole dynamic ftrace stuff was a huge mistake,
> > and that the real solution should always have been to just use one
> > single inode for every file (and use that 'attr' that you track and
> > the '->getattr()' callback to make them all *look* different to
> > users).
>
> Files now do not even have meta-data, and that saved 2 megs per trace
> instance. I only keep meta data for the directories. The files themselves
> are created via callback functions.

I bet that was basically *all* just the inodes.

The dentries take up very little space, and the fact that you didn't
keep the dentries around meant that you instead replaced them with
that 'struct eventfs_file' which probably takes up as much room as the
dentries ever did - and now when you use them, you obviously use
*more* memory since it duplicates the data in the dentries, including
the name etc.

So I bet you use *more* memory than if you just kept the dentry tree
around, and this dynamic creation has then caused a number of bugs and
a lot of extra complexity - things like having to re-implement your
own readdir() etc, much of which has been buggy.

And when you fix the resulting bugs, the end result is often
disgusting. I'm talking about things like commit ef36b4f92868
("eventfs: Remember what dentries were created on dir open"), which
does things like re-use file->private_data for two entirely different
things (is it a 'cursor' or a 'dlist'? Who can know? That thing makes
me gag).

Honestly, that was just one example of "that code does some truly ugly
things because the whole notion is mis-designed".

            Linus

^ permalink raw reply	[relevance 94%]

* Re: [PATCH] afs: Fix overwriting of result of DNS query
  @ 2023-12-21 18:01 99% ` Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2023-12-21 18:01 UTC (permalink / raw)
  To: David Howells
  Cc: Anastasia Belova, Marc Dionne, linux-afs, linux-fsdevel,
	linux-kernel, lvc-project

On Thu, 21 Dec 2023 at 07:09, David Howells <dhowells@redhat.com> wrote:
>
> Could you apply this fix, please?

Ok, so this is just *annoying*.

Why did you send me this as a patch, and then *twenty minutes* later
you send me an AFS pull request that does *not* include this patch?

WTF?

I've applied this, but I'm really annoyed, because it really feels
like you went out of your way to just generate unnecessary noise and
pointless workflow churn.

It's not even like the pull request contained anything different. The
patch _and_ the pull request were both not just about AFS, but about
DNS issues in AFS.

Get your act together.

                    Linus

^ permalink raw reply	[relevance 99%]

* Re: [GIT PULL] tracing: A few more fixes for 6.7
  @ 2023-12-21 17:45 98% ` Linus Torvalds
    0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2023-12-21 17:45 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: LKML, Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers,
	Randy Dunlap, Alexander Graf

On Thu, 21 Dec 2023 at 07:26, Steven Rostedt <rostedt@goodmis.org> wrote:
>
> - Fix eventfs files to inherit the ownership of its parent directory.
>   The dynamic creating of dentries in eventfs did not take into
>   account if the tracefs file system was mounted with a gid/uid,
>   and would still default to the gid/uid of root. This is a regression.

Honestly, this seems to still be entirely buggy. In fact, it looks
buggy in two different ways:

 (a) if 'attr' is NULL, none of this logic is triggered, and uid/gid
is still left as root despite the explicit mount options

 (b) if somebody has done a chown/chgrp on the directory, the new
dynamic creation logic seems to create any files inside that directory
with the new uid/gid.

Maybe (a) cannot happen, but that code in update_inode_attr() does
have a check for a NULL attr, so either it can happen, or that check
is bogus.

And (b) just looks messy.  Maybe you've disallowed chown/chgrp on
tracefs, I didn't check. But why would it inherit the parent uid/gid?
That just doesn't seem to make any sense at all.

I still claim that the whole dynamic ftrace stuff was a huge mistake,
and that the real solution should always have been to just use one
single inode for every file (and use that 'attr' that you track and
the '->getattr()' callback to make them all *look* different to
users).

               Linus

^ permalink raw reply	[relevance 98%]

* Re: [linus:master] [x86/entry] be5341eb0d: WARNING:CPU:#PID:#at_int80_emulation
  @ 2023-12-21  5:38 98%             ` Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2023-12-21  5:38 UTC (permalink / raw)
  To: Sami Tolvanen
  Cc: Andrew Cooper, Borislav Petkov, kernel test robot,
	Thomas Gleixner, oe-lkp, lkp, linux-kernel, Dave Hansen,
	Kirill A. Shutemov, xen-devel

On Wed, 20 Dec 2023 at 15:40, Sami Tolvanen <samitolvanen@google.com> wrote:
>
> I tested the patch with the 0-day bot reproducer and it does fix the
> warning. My usual arm64 and riscv configs also seem to build and boot
> just fine.

Thanks. I've been running it on my machine too, and still don't see
anything wrong with it..

I suspect all sane people are already on xmas break, which explains
why people are being quiet. They _should_ be.

But since I'm not in that sane group, I decided to just bypass the
normal channels and apply it directly.

It really isn't all that critical, since I don't expect anybody to
actually disable the posix timer subsystem: I think the config
variable came out of the kernel minimization project, and it's
probably much more likely that people turn off CFI (particularly since
you afaik still need to build with clang to get it) than that they'd
turn off the posix timer support.

But I think it's a worthy cleanup of some messy system call macros, so
I wanted to put this behind us whether it truly matters or not.

            Linus

^ permalink raw reply	[relevance 98%]

* Re: [PATCH] wifi: brcmfmac: cfg80211: Use WSEC to set SAE password
  @ 2023-12-20  1:44 96%               ` Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2023-12-20  1:44 UTC (permalink / raw)
  To: Hector Martin
  Cc: Kalle Valo, Daniel Berlin, Arend van Spriel, Arend van Spriel,
	Franky Lin, Hante Meuleman, SHA-cyfmac-dev-list, asahi,
	brcm80211-dev-list.pdl, linux-kernel, linux-wireless,
	David Airlie, Daniel Vetter

On Tue, 19 Dec 2023 at 16:06, Hector Martin <marcan@marcan.st> wrote:
>
> On 2023/12/19 23:42, Kalle Valo wrote:
> >
> > Why is it that every patch Hector submits seems to end up with flame
> > wars?

Well, I do think some of it is Hector's personality and forceful
approaches, but I do think part of it is also the area in question.

Because I do agree with Hector that..

> Just recently a patch was posted to remove the Infineon list from
> MAINTAINERS because that company cares so little they have literally
> stopped accepting emails from us. Meanwhile they are telling their
> customers that they do not recommend upstream brcmfmac and they should
> use their downstream driver [1].

Unquestionably Broadcom is not helping maintain things, and I think it
should matter.

As Hector says, they point to their random driver dumps on their site
that you can't even download unless you are a "Broadcom community
member" or whatever, and hey - any company that works that way should
be seen as pretty much hostile to any actual maintenance and proper
development.

If Daniel and Hector are responsive to actual problem reports for the
changes they cause, I do think that should count a lot.

I don't think Cypress support should necessarily be removed (or marked
broken), but if the sae_password code already doesn't work, _that_
part certainly shouldn't hold things up?

Put another way: if we effectively don't have a driver maintainer that
can test things, and somebody is willing to step up, shouldn't we take
that person up on it?

                  Linus

^ permalink raw reply	[relevance 96%]

* Re: [linus:master] [x86/entry] be5341eb0d: WARNING:CPU:#PID:#at_int80_emulation
  2023-12-19 20:17 99%       ` Linus Torvalds
@ 2023-12-19 23:15 71%         ` Linus Torvalds
    0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2023-12-19 23:15 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Borislav Petkov, kernel test robot, Thomas Gleixner, oe-lkp, lkp,
	linux-kernel, Dave Hansen, Kirill A. Shutemov, xen-devel

[-- Attachment #1: Type: text/plain, Size: 1280 bytes --]

On Tue, 19 Dec 2023 at 12:17, Linus Torvalds
<torvalds@linuxfoundation.org> wrote:
>
> That said, I still think that just getting rid of this horrid special
> case for posix timers is the right thing, and we should just remove
> that SYS_NI() alias thing entirely.

IOW, something like the attached patch.

It's not extensively tested, but hey, the diffstat looks nice:

  arch/arm64/include/asm/syscall_wrapper.h |  4 ---
  arch/riscv/include/asm/syscall_wrapper.h |  5 ----
  arch/s390/include/asm/syscall_wrapper.h  | 13 +--------
  arch/x86/include/asm/syscall_wrapper.h   | 34 +++---------------------
  kernel/sys_ni.c                          | 14 ++++++++++
  kernel/time/posix-stubs.c                | 45 --------------------------------
  6 files changed, 19 insertions(+), 96 deletions(-)

and it builds in at least a *couple* of configurations, including with
CONFIG_POSIX_TIMERS disabled.

I did *not* check whether it might fix the warning, since I doubt my
user space would even boot without that posix timer support (actually,
honestly, because I'm just lazy and "it _looks_ fine to me" was the
main real thing).

But that SYS_NI() thing really does deserve to die, as it was purely
used as a hack for some random timer system calls.

Comments?

            Linus

[-- Attachment #2: patch.diff --]
[-- Type: text/x-patch, Size: 10456 bytes --]

 arch/arm64/include/asm/syscall_wrapper.h |  4 ---
 arch/riscv/include/asm/syscall_wrapper.h |  5 ----
 arch/s390/include/asm/syscall_wrapper.h  | 13 +--------
 arch/x86/include/asm/syscall_wrapper.h   | 34 +++---------------------
 kernel/sys_ni.c                          | 14 ++++++++++
 kernel/time/posix-stubs.c                | 45 --------------------------------
 6 files changed, 19 insertions(+), 96 deletions(-)

diff --git a/arch/arm64/include/asm/syscall_wrapper.h b/arch/arm64/include/asm/syscall_wrapper.h
index d977713ec0ba..abb57bc54305 100644
--- a/arch/arm64/include/asm/syscall_wrapper.h
+++ b/arch/arm64/include/asm/syscall_wrapper.h
@@ -44,9 +44,6 @@
 		return sys_ni_syscall();						\
 	}
 
-#define COMPAT_SYS_NI(name) \
-	SYSCALL_ALIAS(__arm64_compat_sys_##name, sys_ni_posix_timers);
-
 #endif /* CONFIG_COMPAT */
 
 #define __SYSCALL_DEFINEx(x, name, ...)						\
@@ -81,6 +78,5 @@
 	}
 
 asmlinkage long __arm64_sys_ni_syscall(const struct pt_regs *__unused);
-#define SYS_NI(name) SYSCALL_ALIAS(__arm64_sys_##name, sys_ni_posix_timers);
 
 #endif /* __ASM_SYSCALL_WRAPPER_H */
diff --git a/arch/riscv/include/asm/syscall_wrapper.h b/arch/riscv/include/asm/syscall_wrapper.h
index 1d7942c8a6cb..eeec04b7dae6 100644
--- a/arch/riscv/include/asm/syscall_wrapper.h
+++ b/arch/riscv/include/asm/syscall_wrapper.h
@@ -46,9 +46,6 @@ asmlinkage long __riscv_sys_ni_syscall(const struct pt_regs *);
 		return sys_ni_syscall();						\
 	}
 
-#define COMPAT_SYS_NI(name) \
-	SYSCALL_ALIAS(__riscv_compat_sys_##name, sys_ni_posix_timers);
-
 #endif /* CONFIG_COMPAT */
 
 #define __SYSCALL_DEFINEx(x, name, ...)						\
@@ -82,6 +79,4 @@ asmlinkage long __riscv_sys_ni_syscall(const struct pt_regs *);
 		return sys_ni_syscall();					\
 	}
 
-#define SYS_NI(name) SYSCALL_ALIAS(__riscv_sys_##name, sys_ni_posix_timers);
-
 #endif /* __ASM_SYSCALL_WRAPPER_H */
diff --git a/arch/s390/include/asm/syscall_wrapper.h b/arch/s390/include/asm/syscall_wrapper.h
index 9286430fe729..35c1d1b860d8 100644
--- a/arch/s390/include/asm/syscall_wrapper.h
+++ b/arch/s390/include/asm/syscall_wrapper.h
@@ -63,10 +63,6 @@
 	cond_syscall(__s390x_sys_##name);				\
 	cond_syscall(__s390_sys_##name)
 
-#define SYS_NI(name)							\
-	SYSCALL_ALIAS(__s390x_sys_##name, sys_ni_posix_timers);		\
-	SYSCALL_ALIAS(__s390_sys_##name, sys_ni_posix_timers)
-
 #define COMPAT_SYSCALL_DEFINEx(x, name, ...)						\
 	long __s390_compat_sys##name(struct pt_regs *regs);				\
 	ALLOW_ERROR_INJECTION(__s390_compat_sys##name, ERRNO);				\
@@ -85,15 +81,11 @@
 
 /*
  * As some compat syscalls may not be implemented, we need to expand
- * COND_SYSCALL_COMPAT in kernel/sys_ni.c and COMPAT_SYS_NI in
- * kernel/time/posix-stubs.c to cover this case as well.
+ * COND_SYSCALL_COMPAT in kernel/sys_ni.c to cover this case as well.
  */
 #define COND_SYSCALL_COMPAT(name)					\
 	cond_syscall(__s390_compat_sys_##name)
 
-#define COMPAT_SYS_NI(name)						\
-	SYSCALL_ALIAS(__s390_compat_sys_##name, sys_ni_posix_timers)
-
 #define __S390_SYS_STUBx(x, name, ...)						\
 	long __s390_sys##name(struct pt_regs *regs);				\
 	ALLOW_ERROR_INJECTION(__s390_sys##name, ERRNO);				\
@@ -124,9 +116,6 @@
 #define COND_SYSCALL(name)						\
 	cond_syscall(__s390x_sys_##name)
 
-#define SYS_NI(name)							\
-	SYSCALL_ALIAS(__s390x_sys_##name, sys_ni_posix_timers)
-
 #define __S390_SYS_STUBx(x, fullname, name, ...)
 
 #endif /* CONFIG_COMPAT */
diff --git a/arch/x86/include/asm/syscall_wrapper.h b/arch/x86/include/asm/syscall_wrapper.h
index fd2669b1cb2d..21f9407be5d3 100644
--- a/arch/x86/include/asm/syscall_wrapper.h
+++ b/arch/x86/include/asm/syscall_wrapper.h
@@ -86,9 +86,6 @@ extern long __ia32_sys_ni_syscall(const struct pt_regs *regs);
 		return sys_ni_syscall();				\
 	}
 
-#define __SYS_NI(abi, name)						\
-	SYSCALL_ALIAS(__##abi##_##name, sys_ni_posix_timers);
-
 #ifdef CONFIG_X86_64
 #define __X64_SYS_STUB0(name)						\
 	__SYS_STUB0(x64, sys_##name)
@@ -100,13 +97,10 @@ extern long __ia32_sys_ni_syscall(const struct pt_regs *regs);
 #define __X64_COND_SYSCALL(name)					\
 	__COND_SYSCALL(x64, sys_##name)
 
-#define __X64_SYS_NI(name)						\
-	__SYS_NI(x64, sys_##name)
 #else /* CONFIG_X86_64 */
 #define __X64_SYS_STUB0(name)
 #define __X64_SYS_STUBx(x, name, ...)
 #define __X64_COND_SYSCALL(name)
-#define __X64_SYS_NI(name)
 #endif /* CONFIG_X86_64 */
 
 #if defined(CONFIG_X86_32) || defined(CONFIG_IA32_EMULATION)
@@ -120,13 +114,10 @@ extern long __ia32_sys_ni_syscall(const struct pt_regs *regs);
 #define __IA32_COND_SYSCALL(name)					\
 	__COND_SYSCALL(ia32, sys_##name)
 
-#define __IA32_SYS_NI(name)						\
-	__SYS_NI(ia32, sys_##name)
 #else /* CONFIG_X86_32 || CONFIG_IA32_EMULATION */
 #define __IA32_SYS_STUB0(name)
 #define __IA32_SYS_STUBx(x, name, ...)
 #define __IA32_COND_SYSCALL(name)
-#define __IA32_SYS_NI(name)
 #endif /* CONFIG_X86_32 || CONFIG_IA32_EMULATION */
 
 #ifdef CONFIG_IA32_EMULATION
@@ -135,8 +126,7 @@ extern long __ia32_sys_ni_syscall(const struct pt_regs *regs);
  * additional wrappers (aptly named __ia32_sys_xyzzy) which decode the
  * ia32 regs in the proper order for shared or "common" syscalls. As some
  * syscalls may not be implemented, we need to expand COND_SYSCALL in
- * kernel/sys_ni.c and SYS_NI in kernel/time/posix-stubs.c to cover this
- * case as well.
+ * kernel/sys_ni.c to cover this case as well.
  */
 #define __IA32_COMPAT_SYS_STUB0(name)					\
 	__SYS_STUB0(ia32, compat_sys_##name)
@@ -148,14 +138,10 @@ extern long __ia32_sys_ni_syscall(const struct pt_regs *regs);
 #define __IA32_COMPAT_COND_SYSCALL(name)				\
 	__COND_SYSCALL(ia32, compat_sys_##name)
 
-#define __IA32_COMPAT_SYS_NI(name)					\
-	__SYS_NI(ia32, compat_sys_##name)
-
 #else /* CONFIG_IA32_EMULATION */
 #define __IA32_COMPAT_SYS_STUB0(name)
 #define __IA32_COMPAT_SYS_STUBx(x, name, ...)
 #define __IA32_COMPAT_COND_SYSCALL(name)
-#define __IA32_COMPAT_SYS_NI(name)
 #endif /* CONFIG_IA32_EMULATION */
 
 
@@ -175,13 +161,10 @@ extern long __ia32_sys_ni_syscall(const struct pt_regs *regs);
 #define __X32_COMPAT_COND_SYSCALL(name)					\
 	__COND_SYSCALL(x64, compat_sys_##name)
 
-#define __X32_COMPAT_SYS_NI(name)					\
-	__SYS_NI(x64, compat_sys_##name)
 #else /* CONFIG_X86_X32_ABI */
 #define __X32_COMPAT_SYS_STUB0(name)
 #define __X32_COMPAT_SYS_STUBx(x, name, ...)
 #define __X32_COMPAT_COND_SYSCALL(name)
-#define __X32_COMPAT_SYS_NI(name)
 #endif /* CONFIG_X86_X32_ABI */
 
 
@@ -212,17 +195,12 @@ extern long __ia32_sys_ni_syscall(const struct pt_regs *regs);
 
 /*
  * As some compat syscalls may not be implemented, we need to expand
- * COND_SYSCALL_COMPAT in kernel/sys_ni.c and COMPAT_SYS_NI in
- * kernel/time/posix-stubs.c to cover this case as well.
+ * COND_SYSCALL_COMPAT in kernel/sys_ni.c to cover this case as well.
  */
 #define COND_SYSCALL_COMPAT(name) 					\
 	__IA32_COMPAT_COND_SYSCALL(name)				\
 	__X32_COMPAT_COND_SYSCALL(name)
 
-#define COMPAT_SYS_NI(name)						\
-	__IA32_COMPAT_SYS_NI(name)					\
-	__X32_COMPAT_SYS_NI(name)
-
 #endif /* CONFIG_COMPAT */
 
 #define __SYSCALL_DEFINEx(x, name, ...)					\
@@ -243,8 +221,8 @@ extern long __ia32_sys_ni_syscall(const struct pt_regs *regs);
  * As the generic SYSCALL_DEFINE0() macro does not decode any parameters for
  * obvious reasons, and passing struct pt_regs *regs to it in %rdi does not
  * hurt, we only need to re-define it here to keep the naming congruent to
- * SYSCALL_DEFINEx() -- which is essential for the COND_SYSCALL() and SYS_NI()
- * macros to work correctly.
+ * SYSCALL_DEFINEx() -- which is essential for the COND_SYSCALL() macro
+ * to work correctly.
  */
 #define SYSCALL_DEFINE0(sname)						\
 	SYSCALL_METADATA(_##sname, 0);					\
@@ -257,10 +235,6 @@ extern long __ia32_sys_ni_syscall(const struct pt_regs *regs);
 	__X64_COND_SYSCALL(name)					\
 	__IA32_COND_SYSCALL(name)
 
-#define SYS_NI(name)							\
-	__X64_SYS_NI(name)						\
-	__IA32_SYS_NI(name)
-
 
 /*
  * For VSYSCALLS, we need to declare these three syscalls with the new
diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c
index e1a6e3c675c0..9a846439b36a 100644
--- a/kernel/sys_ni.c
+++ b/kernel/sys_ni.c
@@ -201,6 +201,20 @@ COND_SYSCALL(recvmmsg_time32);
 COND_SYSCALL_COMPAT(recvmmsg_time32);
 COND_SYSCALL_COMPAT(recvmmsg_time64);
 
+/* Posix timer syscalls may be configured out */
+COND_SYSCALL(timer_create);
+COND_SYSCALL(timer_gettime);
+COND_SYSCALL(timer_getoverrun);
+COND_SYSCALL(timer_settime);
+COND_SYSCALL(timer_delete);
+COND_SYSCALL(clock_adjtime);
+COND_SYSCALL(getitimer);
+COND_SYSCALL(setitimer);
+COND_SYSCALL(alarm);
+COND_SYSCALL_COMPAT(timer_create);
+COND_SYSCALL_COMPAT(getitimer);
+COND_SYSCALL_COMPAT(setitimer);
+
 /*
  * Architecture specific syscalls: see further below
  */
diff --git a/kernel/time/posix-stubs.c b/kernel/time/posix-stubs.c
index 828aeecbd1e8..9b6fcb8d85e7 100644
--- a/kernel/time/posix-stubs.c
+++ b/kernel/time/posix-stubs.c
@@ -17,40 +17,6 @@
 #include <linux/time_namespace.h>
 #include <linux/compat.h>
 
-#ifdef CONFIG_ARCH_HAS_SYSCALL_WRAPPER
-/* Architectures may override SYS_NI and COMPAT_SYS_NI */
-#include <asm/syscall_wrapper.h>
-#endif
-
-asmlinkage long sys_ni_posix_timers(void)
-{
-	pr_err_once("process %d (%s) attempted a POSIX timer syscall "
-		    "while CONFIG_POSIX_TIMERS is not set\n",
-		    current->pid, current->comm);
-	return -ENOSYS;
-}
-
-#ifndef SYS_NI
-#define SYS_NI(name)  SYSCALL_ALIAS(sys_##name, sys_ni_posix_timers)
-#endif
-
-#ifndef COMPAT_SYS_NI
-#define COMPAT_SYS_NI(name)  SYSCALL_ALIAS(compat_sys_##name, sys_ni_posix_timers)
-#endif
-
-SYS_NI(timer_create);
-SYS_NI(timer_gettime);
-SYS_NI(timer_getoverrun);
-SYS_NI(timer_settime);
-SYS_NI(timer_delete);
-SYS_NI(clock_adjtime);
-SYS_NI(getitimer);
-SYS_NI(setitimer);
-SYS_NI(clock_adjtime32);
-#ifdef __ARCH_WANT_SYS_ALARM
-SYS_NI(alarm);
-#endif
-
 /*
  * We preserve minimal support for CLOCK_REALTIME and CLOCK_MONOTONIC
  * as it is easy to remain compatible with little code. CLOCK_BOOTTIME
@@ -158,18 +124,7 @@ SYSCALL_DEFINE4(clock_nanosleep, const clockid_t, which_clock, int, flags,
 				 which_clock);
 }
 
-#ifdef CONFIG_COMPAT
-COMPAT_SYS_NI(timer_create);
-#endif
-
-#if defined(CONFIG_COMPAT) || defined(CONFIG_ALPHA)
-COMPAT_SYS_NI(getitimer);
-COMPAT_SYS_NI(setitimer);
-#endif
-
 #ifdef CONFIG_COMPAT_32BIT_TIME
-SYS_NI(timer_settime32);
-SYS_NI(timer_gettime32);
 
 SYSCALL_DEFINE2(clock_settime32, const clockid_t, which_clock,
 		struct old_timespec32 __user *, tp)


* Re: [linus:master] [x86/entry] be5341eb0d: WARNING:CPU:#PID:#at_int80_emulation
  @ 2023-12-19 20:17 99%       ` Linus Torvalds
  2023-12-19 23:15 71%         ` Linus Torvalds
  0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2023-12-19 20:17 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Borislav Petkov, kernel test robot, Thomas Gleixner, oe-lkp, lkp,
	linux-kernel, Dave Hansen, Kirill A. Shutemov, xen-devel

On Tue, 19 Dec 2023 at 11:15, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>
> -asmlinkage long sys_ni_posix_timers(void);
> +asmlinkage long sys_ni_posix_timers(const struct pt_regs *regs);

I don't think it should be asmlinkage. That means "use legacy asm
calling conventions", and for x86-32 that means pass on stack. Which I
don't think these actually are.

I think it's an old artefact, and it doesn't matter for something that
doesn't take any arguments, but when you add an argument it's actively
wrong.

Of course, that argument isn't _used_, so it still doesn't matter, but
if the point is to use the right prototype, I think we should just
make it be

    long sys_ni_posix_timers(const struct pt_regs *regs);

although I think Sami's suggestion is probably nicer.

That said, I still think that just getting rid of this horrid special
case for posix timers is the right thing, and we should just remove
that SYS_NI() alias thing entirely.

          Linus


* Re: [linus:master] [x86/entry] be5341eb0d: WARNING:CPU:#PID:#at_int80_emulation
  @ 2023-12-19 18:20 90%   ` Linus Torvalds
    0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2023-12-19 18:20 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: kernel test robot, Thomas Gleixner, oe-lkp, lkp, linux-kernel,
	Dave Hansen, Kirill A. Shutemov, xen-devel

On Tue, 19 Dec 2023 at 01:58, Borislav Petkov <bp@alien8.de> wrote:
>
> Looking at the dmesg, I think you missed the most important part - the
> preceding line:
>
> [   13.480504][   T48] CFI failure at int80_emulation+0x67/0xb0 (target: sys_ni_posix_timers+0x0/0x70; expected type: 0xb02b34d9)
>                         ^^^^^^^^^^^

So I think the issue here is that sys_ni_posix_timers is just a linker
alias that is used for any non-implemented posix timer system call.

See:

  #define __SYS_NI(abi, name)                                             \
        SYSCALL_ALIAS(__##abi##_##name, sys_ni_posix_timers);

and this all worked fine when the actual call to this was done in
assembly code that happily just called that function directly and
didn't care about any argument types.

But commit be5341eb0d43 ("x86/entry: Convert INT 0x80 emulation to
IDTENTRY") moved that call from assembly into C, and in the process
ended up enabling CFI for it all, and now the compiler will check that
the function types match. Which they don't, because we use that dummy
function (I don't think they do in general).

I don't know what the best fix is. Either CFI should be turned off for
that call, or we should make sure to generate those NI system calls
with the proper types.

The asm didn't care - as long as the function put -ENOSYS in %rax, it
did the right thing - but the kCFI stuff means that the C code now
cares (and checks) that prototypes etc really match.

Maybe we should just get rid of SYS_NI() _entirely_.

I think the only user is the posix-timers stuff, and everything else
uses COND_SYSCALL(), which actually *generates* all the proper weak
functions with all the proper function signatures, instead of playing
around with linker aliases that don't have them.
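
A minimal userspace sketch of that difference, assuming GCC or Clang
(for __attribute__((weak))). The names struct fake_regs, COND_STUB,
example_timer_create, example_timer_delete and call_entry are
hypothetical; they only mimic the shape of COND_SYSCALL() and are not
the kernel macros themselves. Each stub is emitted with the full
prototype, so an indirect call through a typed pointer matches the
callee, which is the property a kCFI-style check relies on:

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct pt_regs. */
struct fake_regs {
	long args[6];
};

/* Every table entry shares this one signature. */
typedef long (*syscall_fn)(const struct fake_regs *);

/*
 * Emit a weak stub with the full, correct prototype. A strong
 * definition elsewhere would override it at link time; without one,
 * the stub is used and reports "not implemented".
 */
#define COND_STUB(name)							\
	__attribute__((weak)) long name(const struct fake_regs *unused)	\
	{								\
		(void)unused;						\
		return -ENOSYS;						\
	}

COND_STUB(example_timer_create)
COND_STUB(example_timer_delete)

static const syscall_fn example_table[] = {
	example_timer_create,
	example_timer_delete,
};

/* Indirect call through a pointer whose type matches the callee. */
long call_entry(size_t nr, const struct fake_regs *regs)
{
	if (nr >= sizeof(example_table) / sizeof(example_table[0]))
		return -ENOSYS;
	return example_table[nr](regs);
}
```

With no strong definitions linked in, call_entry(0, &regs) returns
-ENOSYS from the weak stub. A linker alias to a single void-argument
function would return the same value, but the callee's type would no
longer match the pointer type used for the call.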

Afaik, the only reason the posix timers do that odd alias is because
they want to have that

        pr_err_once("process %d (%s) attempted a POSIX timer syscall "
                    "while CONFIG_POSIX_TIMERS is not set\n",
                    current->pid, current->comm);

which I don't think is really worth it. It goes back to 2016 when the
posix timers subsystem became configurable, and I doubt it is worth it
any more (and it was probably of dubious use even at the time).

But I've not had anything to do with the low-level kCFI stuff, and
I'll leave it to Thomas whether that SYS_NI() mess should just be
removed.

I do like the notion of just removing SYS_NI entirely, replacing it
with the standard COND_SYSCALL() thing (and same for the COMPAT
variants, of course).

Thomas?

               Linus


* Re: [PATCH 0/5] replace magic numbers in GDT descriptors
  @ 2023-12-19 17:33 99% ` Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2023-12-19 17:33 UTC (permalink / raw)
  To: Vegard Nossum
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, linux-kernel, Brian Gerst, Peter Zijlstra

On Tue, 19 Dec 2023 at 07:12, Vegard Nossum <vegard.nossum@oracle.com> wrote:
>
> Vegard Nossum (5):
>   x86: provide new infrastructure for GDT descriptors
>   x86: replace magic numbers in GDT descriptors, part 1
>   x86: replace magic numbers in GDT descriptors, part 2
>   x86: always set A (accessed) flag in GDT descriptors
>   x86: add DB flag to 32-bit percpu GDT entry

All these patches look fine to me, but I will again leave it to the
x86 maintainers whether they want to apply them. But feel free to add
my Ack if you do.

The end result does look a *lot* more legible, with something like

   DESC_DATA64 | DESC_USER

instead of just a raw number like 0xc0f3.
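
As a rough illustration of why the named form reads better, here is a
small sketch that builds the same 16-bit value from individual bits.
The EX_* names are hypothetical and are not the macros from the patch
series; the bit positions follow the x86 segment-descriptor layout
(access byte in bits 0-7, flag bits in 12-15), and the grouping into
EX_DATA_FLAT and EX_USER is only an assumption about how such helpers
might be composed:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names; bit positions per the x86 descriptor layout. */
#define EX_ACCESSED	0x0001u	/* type bit 0: accessed */
#define EX_WRITABLE	0x0002u	/* type bit 1: writable data segment */
#define EX_S		0x0010u	/* code/data (not system) descriptor */
#define EX_DPL(dpl)	((uint16_t)((dpl) << 5))	/* privilege level */
#define EX_PRESENT	0x0080u	/* segment present */
#define EX_DB		0x4000u	/* 32-bit default size */
#define EX_GRAN_4K	0x8000u	/* limit counted in 4 KiB units */

#define EX_DATA		(EX_S | EX_PRESENT | EX_ACCESSED | EX_WRITABLE)
#define EX_DATA_FLAT	(EX_DATA | EX_GRAN_4K | EX_DB)
#define EX_USER		EX_DPL(3)

/* The named form and the magic number describe the same bits. */
static_assert((EX_DATA_FLAT | EX_USER) == 0xc0f3, "flags mismatch");

uint16_t example_user_data_flags(void)
{
	return EX_DATA_FLAT | EX_USER;
}
```

The compile-time check is the useful part: if a named constant is ever
composed wrongly, the build fails instead of a stray bit hiding inside
a hex literal.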

So while this is unlikely to be a maintenance burden (since we look at
these things so seldom, and they never really change), I think it's a
nice readability improvement.

The fact that Vegard found two oddities while doing this series just
reinforces that readability issue. Neither of them were bugs, but they
were odd inconsistencies.

               Linus


* Re: pull-request: bpf-next 2023-12-18
  @ 2023-12-19  1:17 99%   ` Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2023-12-19  1:17 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Alexei Starovoitov, davem, edumazet, pabeni, daniel, andrii,
	peterz, brauner, netdev, bpf, kernel-team, linux-kernel

On Mon, 18 Dec 2023 at 16:55, Jakub Kicinski <kuba@kernel.org> wrote:
>
> LGTM, but what do I know about file systems.. Adding LKML to the CC
> list, if anyone has any late comments on the BPF token come forward
> now, pretty please?

See my crossed email reply.

The file descriptor handling is FUNDAMENTALLY wrong. The first time
that happened, we chalked it up to a mistake. Now it's something
worse.

Please don't pull until at least that part is fixed.

I tried to review the token patches, but honestly, I got to that part
and I just gave up.

We had this whole discussion more than 6 months ago:

  https://lore.kernel.org/all/20230517-allabendlich-umgekehrt-8cc81f8313ac@brauner/

and I really thought the bpf people had *understood* that their
special use of "fd == 0" was wrong.
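
A short sketch of the general point, assuming the complaint is about
treating descriptor 0 as "no descriptor given". The struct and function
names are hypothetical and are not an actual bpf interface. Descriptor
0 is a valid descriptor (usually stdin), so only a negative value can
safely mean "none":

```c
#include <stdbool.h>

/* Hypothetical options struct; not an actual bpf interface. */
struct example_opts {
	int fd;	/* a descriptor, or -1 when none was supplied */
};

/* Wrong: descriptor 0 is a perfectly valid descriptor. */
bool has_fd_zero_sentinel(const struct example_opts *o)
{
	return o->fd != 0;
}

/* Conventional: only negative values mean "no descriptor". */
bool has_fd(const struct example_opts *o)
{
	return o->fd >= 0;
}
```

A caller that passes fd 0 is rejected by the first check and accepted
by the second.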

But it seems that they never did. Once is a mistake. Twice is a
choice. And the bpf people have chosen insanity.

               Linus


* Linux 6.7-rc6
@ 2023-12-17 23:53 50% Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2023-12-17 23:53 UTC (permalink / raw)
  To: Linux Kernel Mailing List

Hmm. Nothing really stands out for this rc, which is all good. The
diffstat looks mostly nice and flat (which tends just to be a sign of
"small changes spread out"), with the exception of a couple of random
drivers that just had a bit more churn than others (mellanox and intel
iavf ethernet driver).

The other thing that stands out in the diffstat (although a lot less),
is some removal of some legacy debugging code that harkens back to the
copy-on-write credentials that were introduced in 2008, and that then
caused us to later introduce some self-checking code for that change.
I think we can lay that to rest, since that copy-on-write credential
model has now been around for 15 years and has probably never actually
found anything (the report that caused it is sadly lost in the mists
of time and the long-ago demise of kerneloops.org). In related news,
apparently nobody is silly enough to enable that code anyway.

That one was triggered by a "we should re-order the members of the
'cred' structure for the debug case because one of the types changed
size", but rather than fix that code up, I asked Jens to just remove
the verification code that nobody enables and that isn't really
relevant any more.

But realistically, despite those few blips on the diffstat, most of
this ends up just being various random fixes all over. Filesystems are
maybe showing up more than usual (smb client and server, btrfs,
bcachefs and fuse), and we've got some tracing, mm and selftest
updates, but the bulk of it all is still (as usual) various random
driver fixes.

Shortlog appended. Please do give this a test in between the
last-minute xmas shopping or whatever else is going on ...

              Linus

---

Al Viro (2):
      fix ufs_get_locked_folio() breakage
      io_uring/cmd: fix breakage in SOCKET_URING_OP_SIOC* implementation

Alex Bee (1):
      clk: rockchip: rk3128: Fix SCLK_SDMMC's clock name

Alex Deucher (2):
      drm/amdgpu: fix buffer funcs setting order on suspend harder
      drm/amdgpu/sdma5.2: add begin/end_use ring callbacks

Alison Schofield (3):
      cxl/core: Always hold region_rwsem while reading poison lists
      cxl/memdev: Hold region_rwsem during inject and clear poison ops
      kernel/resource: Increment by align value in get_free_mem_region()

Amelie Delaunay (1):
      dmaengine: stm32-dma: avoid bitfield overflow assertion

Amir Goldstein (1):
      fuse: disable FOPEN_PARALLEL_DIRECT_WRITES with FUSE_DIRECT_IO_ALLOW_MMAP

Andrew Davis (1):
      phy: ti: gmii-sel: Fix register offset when parent is not a syscon node

Andrew Halaney (1):
      net: stmmac: Handle disabled MDIO busses from devicetree

Andrzej Kacprowski (1):
      accel/ivpu/37xx: Fix interrupt_clear_with_0 WA initialization

AngeloGioacchino Del Regno (1):
      drm/mediatek: mtk_disp_gamma: Fix breakage due to merge issue

Ard Biesheuvel (1):
      efi/x86: Avoid physical KASLR on older Dell systems

Baokun Li (1):
      ext4: prevent the normalized size from exceeding EXT_MAX_BLOCKS

Baoquan He (6):
      riscv: fix VMALLOC_START definition
      loongarch, kexec: change dependency of object files
      m68k, kexec: fix the incorrect ifdeffery and build dependency of CONFIG_KEXEC
      mips, kexec: fix the incorrect ifdeffery and dependency of CONFIG_KEXEC
      sh, kexec: fix the incorrect ifdeffery and dependency of CONFIG_KEXEC
      x86, kexec: fix the wrong ifdeffery CONFIG_KEXEC

Beau Belgrave (1):
      eventfs: Fix events beyond NAME_MAX blocking tasks

Bjorn Helgaas (1):
      Revert "PCI: acpiphp: Reassign resources on bridge if necessary"

Boris Burkov (5):
      btrfs: free qgroup reserve when ORDERED_IOERR is set
      btrfs: fix qgroup_free_reserved_data int overflow
      btrfs: free qgroup pertrans reserve on transaction abort
      btrfs: don't clear qgroup reserved bit in release_folio
      btrfs: ensure releasing squota reserve on head refs

Brian Foster (1):
      bcachefs: don't attempt rw on unfreeze when shutdown

Chao Song (1):
      soundwire: intel_ace2x: fix AC timing setting for ACE2.x

Chengfeng Ye (2):
      atm: solos-pci: Fix potential deadlock on &cli_queue_lock
      atm: solos-pci: Fix potential deadlock on &tx_queue_lock

Chris Mi (2):
      net/mlx5e: Disable IPsec offload support if not FW steering
      net/mlx5e: TC, Don't offload post action rule if not supported

Chris Morgan (1):
      clk: rockchip: rk3568: Add PLL rate for 292.5MHz

Christian König (2):
      drm/amdgpu: fix tear down order in amdgpu_vm_pt_free
      drm/amdgpu: warn when there are still mappings when a BO is destroyed v2

Dan Carpenter (1):
      net/mlx5: Fix a NULL vs IS_ERR() check

Dan Williams (1):
      cxl/hdm: Fix dpa translation locking

Daniel Hill (1):
      bcachefs: rebalance shouldn't attempt to compress unwritten extents

Dave Jiang (2):
      cxl/hdm: Fix a benign lockdep splat
      cxl: Add cxl_num_decoders_committed() usage to cxl_test

David Arinzon (4):
      net: ena: Destroy correct number of xdp queues upon failure
      net: ena: Fix xdp drops handling due to multibuf packets
      net: ena: Fix DMA syncing in XDP path when SWIOTLB is on
      net: ena: Fix XDP redirection error

David Heidelberg (1):
      dt-bindings: panel-simple-dsi: move LG 5" HD TFT LCD panel into DSI yaml

David Hildenbrand (1):
      selftests/mm: cow: print ksft header before printing anything else

David Howells (1):
      afs: Fix refcount underflow from error handling race

David Stevens (1):
      mm/shmem: fix race in shmem_undo_range w/THP

Dinghao Liu (1):
      qed: Fix a potential use-after-free in qed_cxt_tables_alloc

Dmitrii Galantsev (1):
      drm/amd/pm: fix pp_*clk_od typo

Dong Chenchen (1):
      net: Remove acked SYN flag from packet in the transmit queue correctly

Eric Dumazet (2):
      tcp: fix tcp_disordered_ack() vs usec TS resolution
      net: prevent mss overflow in skb_segment()

Fangzhi Zuo (1):
      drm/amd/display: Populate dtbclk from bounding box

Farouk Bouabid (1):
      drm/panel: ltk050h3146w: Set burst mode for ltk050h3148w

Finley Xiao (1):
      clk: rockchip: rk3128: Fix aclk_peri_src's parent

Florent Revest (1):
      team: Fix use-after-free when an option instance allocation fails

Frank Li (1):
      dmaengine: fsl-edma: fix DMA channel leak in eDMAv4

Gavin Li (1):
      net/mlx5e: Check netdev pointer before checking its net ns

Gergo Koteles (4):
      ALSA: hda/tas2781: leave hda_component in usable state
      ALSA: hda/tas2781: handle missing EFI calibration data
      ALSA: hda/tas2781: call cleanup functions only once
      ALSA: hda/tas2781: reset the amp before component_add

Guanjun (2):
      dmaengine: idxd: Protect int_handle field in hw descriptor
      dmaengine: idxd: Fix incorrect descriptions for GRPCFG register

Hamza Mahfooz (1):
      drm/amd/display: fix hw rotated modes when PSR-SU is enabled

Hangyu Hua (1):
      fuse: dax: set fc->dax to NULL in fuse_dax_conn_free()

Hans de Goede (1):
      platform/x86: intel-vbtn: Fix missing tablet-mode-switch events

Haren Myneni (1):
      powerpc/pseries/vas: Migration suspend waits for no in-progress open windows

Hariprasad Kelam (3):
      octeontx2-pf: Fix promisc mcam entry action
      octeontx2-af: Update RSS algorithm index
      octeontx2-af: Fix pause frame configuration

Hartmut Knaack (1):
      ALSA: hda/realtek: Apply mute LED quirk for HP15-db

Hyunwoo Kim (3):
      atm: Fix Use-After-Free in do_vcc_ioctl
      net/rose: Fix Use-After-Free in rose_ioctl
      appletalk: Fix Use-After-Free in atalk_ioctl

Ignat Korchagin (1):
      kexec: drop dependency on ARCH_SUPPORTS_KEXEC from CRASH_DUMP

Igor Russkikh (1):
      net: atlantic: fix double free in ring reinit logic

Ioana Ciornei (2):
      dpaa2-switch: fix size of the dma_unmap
      dpaa2-switch: do not ask for MDB, VLAN and FDB replay

Ira Weiny (2):
      cxl/cdat: Free correct buffer on checksum error
      cxl/pmu: Ensure put_device on pmu devices

Jagadeesh Kona (1):
      clk: qcom: Fix SM_CAMCC_8550 dependencies

Jai Luthra (1):
      dmaengine: ti: k3-psil-am62a: Fix SPI PDMA data

Jakub Kicinski (1):
      Revert "tcp: disable tcp_autocorking for socket when TCP_NODELAY flag is set"

James Houghton (1):
      arm64: mm: Always make sw-dirty PTEs hw-dirty in pte_modify

Jan Kara (1):
      ext4: fix warning in ext4_dio_write_end_io()

Jani Nikula (3):
      drm/crtc: fix uninitialized variable use
      drm/i915/edp: don't write to DP_LINK_BW_SET when using rate select
      drm/edid: also call add modes in EDID connector update fallback

Jason-JH.Lin (1):
      drm/mediatek: Add spinlock for setting vblank event in atomic_begin

Jens Axboe (3):
      io_uring/poll: don't enable lazy wake for POLLEXCLUSIVE
      cred: switch to using atomic_long_t
      cred: get rid of CONFIG_DEBUG_CREDENTIALS

Jianbo Liu (2):
      net/mlx5e: Reduce eswitch mode_lock protection context
      net/mlx5e: Check the number of elements before walk TC rhashtable

Jiaxun Yang (1):
      PCI: loongson: Limit MRRS to 256

Jiri Kosina (1):
      mailmap: add address mapping for Jiri Kosina

Jiri Pirko (1):
      dpll: sanitize possible null pointer dereference in dpll_pin_parent_pin_set()

Johan Hovold (6):
      PCI/ASPM: Add pci_enable_link_state_locked()
      PCI: vmd: Fix potential deadlock when enabling ASPM
      PCI: qcom: Fix potential deadlock when enabling ASPM
      PCI: qcom: Clean up ASPM comment
      PCI/ASPM: Clean up __pci_disable_link_state() 'sem' parameter
      PCI/ASPM: Add pci_disable_link_state_locked() lockdep assert

John Hubbard (1):
      Revert "selftests: error out if kernel header files are not yet built"

Josef Bacik (1):
      btrfs: do not allow non subvolume root targets for snapshot

Judy Hsiao (1):
      neighbour: Don't let neigh_forced_gc() disable preemption for long

Kai Vehmanen (2):
      ALSA: hda/hdmi: add force-connect quirk for NUC5CPYB
      ALSA: hda/hdmi: add force-connect quirks for ASUSTeK Z170 variants

Kalesh AP (1):
      bnxt_en: Fix wrong return value check in bnxt_close_nic()

Karsten Graul (1):
      MAINTAINERS: remove myself as maintainer of SMC

Kefeng Wang (1):
      mm: fix VMA heap bounds checking

Kent Overstreet (10):
      bcachefs: Don't drop journal pins in exit path
      bcachefs; Don't use btree write buffer until journal replay is finished
      bcachefs: Fix a journal deadlock in replay
      bcachefs: Fix bch2_extent_drop_ptrs() call
      bcachefs: Convert compression_stats to for_each_btree_key2
      bcachefs: Don't run indirect extent trigger unless inserting/deleting
      bcachefs: Fix creating snapshot with implict source
      bcachefs: Fix deleted inode check for dirs
      bcachefs: Fix uninitialized var in bch2_journal_replay()
      bcachefs: Close journal entry if necessary when flushing all pins

Krister Johansen (1):
      fuse: share lookup state between submount and its parent

Krzysztof Kozlowski (3):
      soundwire: stream: fix NULL pointer dereference for multi_link
      stmmac: dwmac-loongson: drop useless check for compatible fallback
      MIPS: dts: loongson: drop incorrect dwmac fallback compatible

Leon Romanovsky (4):
      net/mlx5e: Honor user choice of IPsec replay window size
      net/mlx5e: Ensure that IPsec sequence packet number starts from 1
      net/mlx5e: Remove exposure of IPsec RX flow steering struct
      net/mlx5e: Tidy up IPsec NAT-T SA discovery

Lingkai Dong (1):
      drm: Fix FD ownership check in drm_master_check_perm()

Linus Torvalds (1):
      Linux 6.7-rc6

Lyude Paul (1):
      drm/nouveau/kms/nv50-: Don't allow inheritance of headless iors

Maciej Żenczykowski (1):
      net: ipv6: support reporting otherwise unknown prefix flags in RTM_NEWPREFIX

Mario Limonciello (4):
      HID: i2c-hid: Add IDEA5002 to i2c_hid_acpi_blacklist[]
      drm/amd/display: Restore guard against default backlight value < 1 nit
      drm/amd/display: Disable PSR-SU on Parade 0803 TCON again
      drm/amd: Fix a probing order problem on SDMA 2.4

Mark Rutland (1):
      perf: Fix perf_event_validate_size() lockdep splat

Mathieu Desnoyers (1):
      ring-buffer: Fix 32-bit rb_time_read() race with rb_time_cmpxchg()

Michael Chan (1):
      bnxt_en: Fix HWTSTAMP_FILTER_ALL packet timestamp logic

Michael Ellerman (1):
      MAINTAINERS: powerpc: Add Aneesh & Naveen

Michael Walle (2):
      drm/mediatek: fix kernel oops if no crtc is found
      phy: mediatek: mipi: mt8183: fix minimal supported frequency

Mikhail Khvainitski (1):
      HID: lenovo: Restrict detection of patched firmware only to USB cptkbd

Moshe Shemesh (2):
      net/mlx5e: Fix possible deadlock on mlx5e_tx_timeout_work
      net/mlx5: Nack sync reset request when HotPlug is enabled

Namjae Jeon (8):
      ksmbd: set epoch in create context v2 lease
      ksmbd: set v2 lease capability
      ksmbd: downgrade RWH lease caching state to RH for directory
      ksmbd: send v2 lease break notification for directory
      ksmbd: lazy v2 lease break on smb2_write()
      ksmbd: avoid duplicate opinfo_put() call on error of smb21_lease_break_ack()
      ksmbd: fix wrong allocation size update in smb2_open()
      ksmbd: fix wrong name of SMB2_CREATE_ALLOCATION_SIZE

Nikolay Kuratov (1):
      vsock/virtio: Fix unsigned integer wrap around in virtio_transport_has_space()

Patrisious Haddad (2):
      net/mlx5e: Unify esw and normal IPsec status table creation/destruction
      net/mlx5e: Add IPsec and ASO syndromes check in HW

Paulo Alcantara (4):
      smb: client: fix OOB in receive_encrypted_standard()
      smb: client: fix potential OOBs in smb2_parse_contexts()
      smb: client: fix NULL deref in asn1_ber_decoder()
      smb: client: fix OOB in smb2_query_reparse_point()

Pavel Begunkov (1):
      io_uring/af_unix: disable sending io_uring over sockets

Piotr Gardocki (2):
      iavf: Introduce new state machines for flow director
      iavf: Handle ntuple on/off based on new state machines for flow director

Radu Bulie (1):
      net: fec: correct queue selection

Randy Dunlap (2):
      platform/x86: thinkpad_acpi: fix kernel-doc warnings
      platform/x86: intel_ips: fix kernel-doc formatting

Robin Murphy (1):
      perf/arm-cmn: Fail DTC counter allocation correctly

Ronald Wahl (1):
      dmaengine: ti: k3-psil-am62: Fix SPI PDMA data

Saleemkhan Jamadar (1):
      drm/amdgpu/jpeg: configure doorbell for each playback

Salvatore Dipietro (1):
      tcp: disable tcp_autocorking for socket when TCP_NODELAY flag is set

Sebastian Parschauer (1):
      HID: Add quirk for Labtec/ODDOR/aikeec handbrake

SeongJae Park (1):
      mm/damon/core: make damon_start() waits until kdamond_fn() starts

Shinas Rasheed (2):
      octeon_ep: initialise control mbox tasks before using APIs
      octeon_ep: explicitly test for firmware ready value

Shubhrajyoti Datta (1):
      EDAC/versal: Read num_csrows and num_chans using the correct bitfield macro

Slawomir Laba (1):
      iavf: Fix iavf_shutdown to call iavf_remove instead iavf_close

Sneh Shah (1):
      net: stmmac: dwmac-qcom-ethqos: Fix drops in 10M SGMII RX

Somnath Kotur (1):
      bnxt_en: Clear resource reservation during resume

Sreekanth Reddy (1):
      bnxt_en: Fix skb recycling logic in bnxt_deliver_skb()

Stefan Wahren (3):
      qca_debug: Prevent crash on TX ring changes
      qca_debug: Fix ethtool -G iface tx behavior
      qca_spi: Fix reset behavior

Steven Rostedt (Google) (12):
      ring-buffer: Fix writing to the buffer with max_data_size
      tracing: Have large events show up as '[LINE TOO BIG]' instead of nothing
      ring-buffer: Fix memory leak of free page
      tracing: Update snapshot buffer on resize if it is allocated
      ring-buffer: Do not update before stamp when switching sub-buffers
      ring-buffer: Have saved event hold the entire event
      tracing: Add size check when printing trace_marker output
      ring-buffer: Do not try to put back write_stamp
      ring-buffer: Remove useless update to write_stamp in rb_try_to_discard()
      ring-buffer: Fix a race in rb_time_cmpxchg() for 32 bit archs
      ring-buffer: Have rb_time_cmpxchg() set the msb counter too
      ring-buffer: Do not record in NMI if the arch does not support cmpxchg in NMI

Stuart Lee (1):
      drm/mediatek: Fix access violation in mtk_drm_crtc_dma_dev_get

Su Hui (1):
      phy: sunplus: return negative error code in sp_usb_phy_probe

Taimur Hassan (1):
      drm/amd/display: Revert "Fix conversions between bytes and KB"

Thierry Reding (1):
      drm/nouveau: Fixup gk20a instobj hierarchy

Tvrtko Ursulin (2):
      drm/i915/selftests: Fix engine reset count storage for multi-tile
      drm/i915: Use internal class when counting engine resets

Tyler Fanelli (2):
      fuse: Rename DIRECT_IO_RELAX to DIRECT_IO_ALLOW_MMAP
      docs/fuse-io: Document the usage of DIRECT_IO_ALLOW_MMAP

Ville Syrjälä (3):
      drm/i915: Fix remapped stride with CCS on ADL+
      drm/i915: Fix intel_atomic_setup_scalers() plane_state handling
      drm/i915: Fix ADL+ tiled plane stride when the POT stride is smaller than the original

Vlad Buslov (1):
      net/sched: act_ct: Take per-cb reference to tcf_ct_flow_table

Wang Yao (1):
      efi/loongarch: Use load address to calculate kernel entry address

Weihao Li (1):
      clk: rockchip: rk3128: Fix HCLK_OTG gate register

Xiaolei Wang (2):
      dmaengine: fsl-edma: Do not suspend and resume the masked dma channel when the system is sleeping
      dmaengine: fsl-edma: Add judgment on enabling round robin arbitration

Yan Jun (1):
      HID: apple: Add "hfd.cn" and "WKB603" to the list of non-apple keyboards

Yang Yingliang (1):
      dmaengine: fsl-edma: fix wrong pointer check in fsl_edma3_attach_pd()

Yanteng Si (1):
      stmmac: dwmac-loongson: Make sure MDIO is initialized before use

Ye Bin (1):
      jbd2: fix soft lockup in journal_finish_inode_data_buffers()

Yu Zhao (4):
      mm/mglru: fix underprotected page cache
      mm/mglru: try to stop at high watermarks
      mm/mglru: respect min_ttl_ms with memcgs
      mm/mglru: reclaim offlined memcgs harder

Yuntao Wang (1):
      crash_core: fix the check for whether crashkernel is from high memory

Yusong Gao (1):
      sign-file: Fix incorrect return values check

Zhang Yi (2):
      jbd2: correct the printing of write_flags in jbd2_write_superblock()
      jbd2: increase the journal IO's priority

Zheng Yejian (1):
      tracing: Fix uaf issue when open the hist or hist_debug file

Zhipeng Lu (1):
      octeontx2-af: fix a use-after-free in rvu_nix_register_reporters

Ziqi Zhao (1):
      drm/crtc: Fix uninit-value bug in drm_mode_setcrtc

Zizhi Wo (1):
      ksmbd: fix memory leak in smb2_lock()

^ permalink raw reply	[relevance 50%]

* Re: [PATCH 3/3] x86/sigreturn: Reject system segements
  @ 2023-12-17 21:40 99%       ` Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2023-12-17 21:40 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Brian Gerst, linux-kernel, x86, Ingo Molnar, Thomas Gleixner,
	Borislav Petkov, Peter Zijlstra, Michal Luczaj

On Sun, 17 Dec 2023 at 13:08, H. Peter Anvin <hpa@zytor.com> wrote:
>
> On December 13, 2023 10:54:00 AM PST, Linus Torvalds <torvalds@linuxfoundation.org> wrote:
> >Side note: the SS/CS checks could be stricter than the usual selector tests.
> >
> >In particular, normal segments can be Null segments. But CS/SS must not be.
> >
> >Also, since you're now checking the validity, maybe we shouldn't do
> >the "force cpl3" any more, and just make it an error to try to load a
> >non-cpl3 segment at sigreturn..
> >
> >That forcing was literally just because we weren't checking it for sanity...
> >
> >           Linus
>
> Not to mention that changing a null descriptor to 3 is wrong.

I don't think it is. All of 0-3 are "Null selectors". The RPL of the
selector simply doesn't matter when the index is zero, afaik.

But we obviously only do this for CS/SS, which can't be (any kind of)
Null selector and iret will GP on them regardless of the RPL in the
selector.
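
To make the selector layout concrete, here is a minimal user-space sketch
(not kernel code). It assumes the architectural selector format: index in
bits 15..3, table indicator in bit 2, RPL in bits 1..0. All helper names
(sel_rpl, sel_is_null, sel_ok_for_cs_ss) are made up for illustration, and
the last one only sketches the stricter CS/SS check discussed above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed selector layout: bits 15..3 index, bit 2 TI (0 = GDT), bits 1..0 RPL. */
#define SEL_RPL_MASK 0x3u
#define SEL_TI_MASK  0x4u

static inline unsigned sel_rpl(uint16_t sel)    { return sel & SEL_RPL_MASK; }
static inline bool     sel_is_ldt(uint16_t sel) { return (sel & SEL_TI_MASK) != 0; }
static inline unsigned sel_index(uint16_t sel)  { return (unsigned)sel >> 3; }

/* A Null selector is GDT index 0 with any RPL, i.e. the values 0..3. */
static inline bool sel_is_null(uint16_t sel)
{
	return (sel & ~SEL_RPL_MASK) == 0;
}

/* Hypothetical stricter check for CS/SS: never Null, and always RPL 3. */
static inline bool sel_ok_for_cs_ss(uint16_t sel)
{
	return !sel_is_null(sel) && sel_rpl(sel) == 3;
}
```

With this, forcing the RPL bits of a Null selector (0 -> 3) still leaves a
Null selector, which is the point made above.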

              Linus

^ permalink raw reply	[relevance 99%]

* Re: [GIT PULL] hotfixes for 6.7-rc6
  @ 2023-12-17  0:16 99%       ` Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2023-12-17  0:16 UTC (permalink / raw)
  To: Yu Zhao
  Cc: Andrew Morton, linux-mm, linux-kernel, Jesse Barnes,
	Suren Baghdasaryan, Guru Anbalagane, David Rientjes

On Fri, 15 Dec 2023 at 20:57, Yu Zhao <yuzhao@google.com> wrote:
>
> There has been a short-term plan, i.e., moving some of folio->flags to
> the lower bits of folio->lru so that we can drop the Kconfig
> constraint. I have discussed this with Willy but never acted on it. My
> priority has been to surface more of our ideas that can potentially
> save users money on memory to the community. I'm CC'ing our team
> leads. Please feel free to let us know your preference on the
> priority.

This is definitely an "eventually" thing on my wishlist, so I was more
just wanting to hear that there is a plan, and somebody working on
it..

Thanks,

            Linus

^ permalink raw reply	[relevance 99%]

* Re: [PATCH 1/3] x86: Move TSS and LDT to end of the GDT
  @ 2023-12-16 18:40 95%         ` Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2023-12-16 18:40 UTC (permalink / raw)
  To: Vegard Nossum
  Cc: Brian Gerst, linux-kernel, x86, Ingo Molnar, Thomas Gleixner,
	Borislav Petkov, H . Peter Anvin, Peter Zijlstra

On Sat, 16 Dec 2023 at 10:25, Vegard Nossum <vegard.nossum@oracle.com> wrote:
>
> While preparing the patch I also came across some things that are
> unclear to me:
>
> - why do we want some segments with the A (accessed) bit set and some
> with it cleared -- is there an actual reason for the difference, or
> could we just set it for all of them?

I think it's random, and an effect of just having hardcoded numbers
and not having any structure to it.

But I do think you're right that we should just start with all
kernel-created segment descriptors marked as accessed. I do not
believe that we have any actual *use* for the descriptor access bit.

> - why does setup_percpu_segment() want the DB (size) flag clear? This
> seems to indicate that it's a 16-bit segment -- is this correct?

I think it's nonsensical and doesn't matter, and is another mistake
from us just having random numbers.

I don't think the DB bit matters except for when it's used for the
code or stack segment (or, apparently, if it's a grow-down segment).
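
As an illustration of the two bits in question, here is a small user-space
sketch. It assumes the descriptor flags are packed into a 16-bit word with
the access byte in bits 0-7 (bit 0 = Accessed, bit 4 = S, bits 5-6 = DPL,
bit 7 = P) and AVL/L/D-B/G in bits 12-15. The DEMO_* names and helpers are
invented for this example and are not the kernel's own macros.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed 16-bit flags layout (illustrative names, not kernel macros). */
#define DEMO_DESC_ACCESSED 0x0001u  /* type bit 0: set by the CPU on load */
#define DEMO_DESC_S        0x0010u  /* 1 = code/data, 0 = system segment  */
#define DEMO_DESC_PRESENT  0x0080u
#define DEMO_DESC_DB       0x4000u  /* default operation size / big flag  */
#define DEMO_DESC_G        0x8000u  /* limit granularity is 4k pages      */

static inline bool desc_accessed(uint16_t flags)
{
	return (flags & DEMO_DESC_ACCESSED) != 0;
}

static inline bool desc_db(uint16_t flags)
{
	return (flags & DEMO_DESC_DB) != 0;
}

static inline unsigned desc_dpl(uint16_t flags)
{
	return ((unsigned)flags >> 5) & 0x3u;
}

/* Pre-mark a descriptor as accessed so the CPU never has to write it back. */
static inline uint16_t desc_mark_accessed(uint16_t flags)
{
	return (uint16_t)(flags | DEMO_DESC_ACCESSED);
}
```

Under that layout, a value like 0xc09b decodes as an accessed, 32-bit,
DPL 0 segment, while 0x8092 has both the A and DB bits clear.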

So I think your patch looks good, and I would keep it in that form if
it makes it easier to just verify that it generates an identical
kernel image.

And then as a separate patch, I would remove that DB bit clear thing.

Anyway, I do like your patch, and I think the fact that you found
those oddities is a good argument *for* the patch, but at the same
time I think I'll just bow to the x86 maintainers who may think that
this is churn in an area that they'd rather not touch any more.

So consider that an "ack" from me, but with that caveat of yes, I
think a binary diff would be a good thing because this is *so* odd and
low-level and maybe people just think it's not worth it.

Thanks,

                  Linus

^ permalink raw reply	[relevance 95%]

* Re: [RFC] HACK: overlayfs: Optimize overlay/restore creds
  @ 2023-12-16 18:26 90%         ` Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2023-12-16 18:26 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Vinicius Costa Gomes, hu1.chen, miklos, malini.bhandaru,
	tim.c.chen, mikko.ylinen, lizhen.you, linux-unionfs,
	linux-kernel, linux-fsdevel, Christian Brauner, David Howells

On Sat, 16 Dec 2023 at 02:16, Amir Goldstein <amir73il@gmail.com> wrote:
>
> As a matter of fact, maybe it makes sense to embed a non-refcounted
> copy in the struct used for the guard:

No, please don't. A couple of reasons:

 - that 'struct cred' is not an insignificant size, so stack usage is noticeable

 - we really should strive to avoid passing pointers to random stack
elements around

Don't get me wrong - we pass structures around on the stack all the
time, but it _has_ been a problem with stack usage. Those things tend
to grow..

So in general, the primary use of "pointers to stack objects" is for
when it's either trivially tiny, or when it's a struct that is
explicitly designed for that purpose as a kind of an "extended set of
arguments" (think things like the "tlb_state" for the TLB flushing, or
the various iterator structures we use etc).

When we have a real mainline kernel struct like 'struct cred' that
commonly gets passed around as a pointer argument that *isn't* on the
stack, I get nervous when people then pass it around on the stack too.
It's just too easy to mistakenly pass it off with the wrong lifetime,
and stack corruption is *so* nasty to debug that it's just horrendous.

Yes, lifetime problems are nasty to debug even when it's not some
mis-use of a stack object, but at least for slab allocations etc we
have various generic debug tools that help find them.

For the "you accessed things under the stack, possibly from the wrong
thread", I don't think any of our normal debug coverage will help at
all.

So yes, stack allocations are efficient and fast, and we do use them,
but please don't use them for something like 'struct cred' that has a
proper allocator function normally.

I just removed the CONFIG_DEBUG_CREDENTIALS code, because the fix for
a potential overflow made it have bad padding, and rather than fix the
padding I thought it was better to just remove the long-unused debug
code that just made that thing even more unwieldy than it is.

But I thought that largely because our 'struct cred' use has been
quite stable for a long time (and the original impetus for all that
debug code was the long-ago switch to using the copy-on-write
behavior).

Let's not break that stability with suddenly having a "sometimes it's
allocated on the stack" model.

             Linus

^ permalink raw reply	[relevance 90%]

* Re: [GIT PULL] hotfixes for 6.7-rc6
  @ 2023-12-15 20:11 99% ` Linus Torvalds
    0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2023-12-15 20:11 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-mm, linux-kernel

On Fri, 15 Dec 2023 at 07:16, Andrew Morton <akpm@linux-foundation.org> wrote:
>
> Yu Zhao (4):
>       mm/mglru: fix underprotected page cache
>       mm/mglru: try to stop at high watermarks
>       mm/mglru: respect min_ttl_ms with memcgs
>       mm/mglru: reclaim offlined memcgs harder

Entirely unrelated to this pull request (which I already pulled and
pushed out, as noted by pr-tracker-bot), since I looked at these it
just reminded me about a question I've had for a while...

Do we have any long-term (or even short-term?) plans to just make
mglru be the one and only model?

Yes, right now it's not just a Kconfig choice, but a real technical
issue too: it depends on having enough flags available, so we have
that "cannot use it on 32-bit with sparsemem".

But I'm hoping there is a plan or a workaround for that?

Because I feel like we really don't want to keep this "two different
models" situation around forever.

                     Linus

^ permalink raw reply	[relevance 99%]

* Re: [PATCH] ring-buffer: Remove 32bit timestamp logic
  @ 2023-12-14 20:50 99%           ` Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2023-12-14 20:50 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-arch, LKML, Linux Trace Kernel, Masami Hiramatsu,
	Mark Rutland, Mathieu Desnoyers

On Thu, 14 Dec 2023 at 12:35, Steven Rostedt <rostedt@goodmis.org> wrote:
>
> On Thu, 14 Dec 2023 11:44:55 -0800
> Linus Torvalds <torvalds@linux-foundation.org> wrote:
>
> > On Thu, 14 Dec 2023 at 08:55, Steven Rostedt <rostedt@goodmis.org> wrote:
> > >
> > > And yes, this does get called in NMI context.
> >
> > Not on an i486-class machine they won't. You don't have a local apic
> > on those, and they won't have any NMI sources under our control (ie
> > NMI does exist, but we're talking purely legacy NMI for "motherboard
> > problems" like RAM parity errors etc)
>
> Ah, so we should not worry about being in NMI context without a 64bit cmpxchg?

.. on x86.

Elsewhere, who knows?

It is *probably* true in most situations. '32-bit' => 'legacy' =>
'less likely to have fancy profiling / irq setups'.

But I really don't know.

> > So no. You need to forget about the whole "do a 64-bit cmpxchg on
> > 32-bit architectures" as being some kind of solution in the short
> > term.
>
> But do all archs have an implementation of cmpxchg64, even if it requires
> disabling interrupts? If not, then I definitely cannot remove this code.

We have a generic header file, so anybody who uses that would get the
fallback version, ie

arch_cmpxchg64 -> generic_cmpxchg64_local -> __generic_cmpxchg64_local

which does that irq disabling thing.

But no, not everybody is guaranteed to use that fallback. From a quick
look, ARC, hexagon and CSky don't do this, for example.

And then I got bored and stopped looking.

My guess is that *most* 32-bit architectures do not have a 64-bit
cmpxchg - not even the irq-safe one.

For the UP case you can do your own, of course.

            Linus

^ permalink raw reply	[relevance 99%]

* Re: [PATCH v3] ring-buffer: Remove 32bit timestamp logic
  2023-12-14 20:30 99%     ` Linus Torvalds
@ 2023-12-14 20:32 99%       ` Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2023-12-14 20:32 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: LKML, Linux Trace Kernel, Masami Hiramatsu, Mark Rutland,
	Mathieu Desnoyers, Linux Arch

On Thu, 14 Dec 2023 at 12:30, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> Read my email. Don't do random x86-centric things. We have that
>
>   #ifndef system_has_cmpxchg64
>       #define system_has_cmpxchg64() false
>   #endif
>
> which should work.

And again, by "should work" I mean that it would disable this entirely
on things like arm32 until the arm people decide they care. But at
least it won't use an unsafe non-working 64-bit cmpxchg.

And no, for 6.7, only fix reported bugs. No big reorgs at all,
particularly for something that likely has never been hit by any user
and sounds like this all just came out of discussion.

              Linus

^ permalink raw reply	[relevance 99%]

* Re: [PATCH v3] ring-buffer: Remove 32bit timestamp logic
  @ 2023-12-14 20:30 99%     ` Linus Torvalds
  2023-12-14 20:32 99%       ` Linus Torvalds
  0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2023-12-14 20:30 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: LKML, Linux Trace Kernel, Masami Hiramatsu, Mark Rutland,
	Mathieu Desnoyers, Linux Arch

On Thu, 14 Dec 2023 at 12:18, Steven Rostedt <rostedt@goodmis.org> wrote:
>
> For this issue of the 64bit cmpxchg, is there any config that works for any
> arch that do not have a safe 64-bit cmpxchg? At least for 486, is the
> second half of the if condition reasonable?
>
>         if (IS_ENABLED(CONFIG_X86_32) && !IS_ENABLED(CONFIG_X86_CMPXCHG64)) {
>                 if (unlikely(in_nmi()))
>                         return NULL;
>         }

No.

Read my email. Don't do random x86-centric things. We have that

  #ifndef system_has_cmpxchg64
      #define system_has_cmpxchg64() false
  #endif

which should work.

NOTE! The above is for 32-bit architectures only! For 64-bit ones,
just use cmpxchg directly. And if you need a 128-bit one,
there's system_has_cmpxchg128...

                 Linus

^ permalink raw reply	[relevance 99%]

* Re: [RFC PATCH v3 11/11] mseal:add documentation
  @ 2023-12-14 20:14 86%           ` Linus Torvalds
    0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2023-12-14 20:14 UTC (permalink / raw)
  To: Stephen Röttger
  Cc: Jeff Xu, jeffxu, akpm, keescook, jannh, willy, gregkh, jorgelo,
	groeck, linux-kernel, linux-kselftest, linux-mm, pedro.falcato,
	dave.hansen, linux-hardening, deraadt

On Thu, 14 Dec 2023 at 10:07, Stephen Röttger <sroettger@google.com> wrote:
>
> AIUI, the madvise(DONTNEED) should effectively only change the content of
> anonymous pages, i.e. it's similar to a memset(0) in that case. That's why we
> added this special case: if you want to madvise(DONTNEED) an anonymous page,
> you should have write permissions to the page.

Hmm. I actually would be happier if we just made that change in
general. Maybe even without sealing, but I agree that it *definitely*
makes sense in general as a sealing thing.

IOW, just saying

 "madvise(DONTNEED) needs write permissions to an anonymous mapping when sealed"

makes 100% sense to me. Having a separate _flag_ to give sensible
semantics is just odd.

IOW, what I really want is exactly that "sensible semantics, not random flags".

Particularly for new system calls with fairly specialized use, I think
it's very important that the semantics are sensible on a conceptual
level, and that we do not add system calls that are based on "random
implementation issue of the day".

Yes, yes, then as we have to maintain things long-term, and we hit
some compatibility issue, at *THAT* point we'll end up facing nasty
"we had an implementation that had these semantics in practice, so now
we're stuck with it", but when introducing a new system call, let's
try really hard not to start off from those kinds of random things.

Wouldn't it be lovely if we can just come up with a sane set of "this
is what it means to seal a vma", and enumerate those, and make those
sane conceptual rules be the initial definition. By all means have a
"flags" argument for future cases when we figure out there was
something wrong or the notion needed to be extended, but if we already
*start* with random extensions, I feel there's something wrong with
the whole concept.

So I would really wish for the first version of

     mseal(start, len, flags);

to have "flags=0" be the one and only case we actually handle
initially, and only add a single PROT_SEAL flag to mmap() that says
"create this mapping already pre-sealed".

Strive very hard to make sealing be a single VM_SEALED flag in the
vma->vm_flags that we already have, just admit that none of this
matters on 32-bit architectures, so that VM_SEALED can just use one of
the high flags that we have several free of (and that pkeys already
depends on), and make this a standard feature with no #ifdef's.

Can chrome live with that? And what would the required semantics be?
I'll start the list:

 - you can't unmap or remap in any way (including over-mapping)

 - you can't change protections (but with architecture support like
pkey, you can obviously change the protections indirectly with PKRU
etc)

 - you can't do VM operations that change data without the area being
writable (so the DONTNEED case - maybe there are others)

 - anything else?
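
The rules listed so far can be sketched as a toy user-space model. This is
purely illustrative: the toy_* names, the struct, and the choice of -EPERM
and -EINVAL as error codes are assumptions made for this example, not
anything a real mseal() implementation is known to do.

```c
#include <errno.h>
#include <stdbool.h>

#define TOY_VM_WRITE  0x1u
#define TOY_VM_SEALED 0x2u  /* stand-in for a single VM_SEALED bit */

struct toy_vma {
	unsigned flags;
	bool anonymous;
};

/* Only flags == 0 is handled; anything else is rejected for now. */
static inline int toy_mseal(struct toy_vma *vma, unsigned long flags)
{
	if (flags != 0)
		return -EINVAL;
	vma->flags |= TOY_VM_SEALED;
	return 0;
}

/* Rule 1: no unmapping or remapping of a sealed area. */
static inline int toy_munmap(const struct toy_vma *vma)
{
	return (vma->flags & TOY_VM_SEALED) ? -EPERM : 0;
}

/* Rule 2: no protection changes on a sealed area. */
static inline int toy_mprotect(struct toy_vma *vma, bool writable)
{
	if (vma->flags & TOY_VM_SEALED)
		return -EPERM;
	if (writable)
		vma->flags |= TOY_VM_WRITE;
	else
		vma->flags &= ~TOY_VM_WRITE;
	return 0;
}

/* Rule 3: DONTNEED on sealed anonymous memory needs write permission. */
static inline int toy_madvise_dontneed(const struct toy_vma *vma)
{
	if ((vma->flags & TOY_VM_SEALED) && vma->anonymous &&
	    !(vma->flags & TOY_VM_WRITE))
		return -EPERM;
	return 0;
}
```

The model needs no per-operation flags: one sealed bit plus the existing
write permission is enough to express all three rules.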

Wouldn't it be lovely to have just a single notion of sealing that is
well-documented and makes sense, and doesn't require people to worry
about odd special cases?

And yes, we'd have the 'flags' argument for future special cases, and
hope really hard that it's never needed.

           Linus

^ permalink raw reply	[relevance 86%]

* Re: [PATCH v3] ring-buffer: Remove 32bit timestamp logic
  @ 2023-12-14 19:46 99% ` Linus Torvalds
    0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2023-12-14 19:46 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: LKML, Linux Trace Kernel, Masami Hiramatsu, Mark Rutland,
	Mathieu Desnoyers, Linux Arch

On Thu, 14 Dec 2023 at 09:53, Steven Rostedt <rostedt@goodmis.org> wrote:
>
> +       /*
> +        * For architectures that can not do cmpxchg() in NMI, or require
> +        * disabling interrupts to do 64-bit cmpxchg(), do not allow them
> +        * to record in NMI context.
> +        */
> +       if ((!IS_ENABLED(CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG) ||
> +            (IS_ENABLED(CONFIG_X86_32) && !IS_ENABLED(CONFIG_X86_CMPXCHG64))) &&
> +           unlikely(in_nmi())) {
> +               return NULL;
> +       }

Again, this is COMPLETE GARBAGE.

You're using "ARCH_HAVE_NMI_SAFE_CMPXCHG" to test something that just
isn't what it's about.

Having a NMI-safe cmpxchg does *not* mean that you actually have a
NMI-safe 64-bit version.

You can't test it that way.

Stop making random changes that just happen to work on the one machine
you tested it on.

           Linus


^ permalink raw reply	[relevance 99%]

* Re: [PATCH] ring-buffer: Remove 32bit timestamp logic
  @ 2023-12-14 19:44 90%       ` Linus Torvalds
    0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2023-12-14 19:44 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-arch, LKML, Linux Trace Kernel, Masami Hiramatsu,
	Mark Rutland, Mathieu Desnoyers

On Thu, 14 Dec 2023 at 08:55, Steven Rostedt <rostedt@goodmis.org> wrote:
>
> And yes, this does get called in NMI context.

Not on an i486-class machine they won't. You don't have a local apic
on those, and they won't have any NMI sources under our control (ie
NMI does exist, but we're talking purely legacy NMI for "motherboard
problems" like RAM parity errors etc)

> I had a patch that added:
>
> +       /* ring buffer does cmpxchg, make sure it is safe in NMI context */
> +       if (!IS_ENABLED(CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG) &&
> +           (unlikely(in_nmi()))) {
> +               return NULL;
> +       }

CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG doesn't work on x86 in this context,
because the issue is that yes, there's a safe 'cmpxchg', but that's
not what you want.

You want the safe cmpxchg64, which is an entirely different beast.

And honestly, I don't think that NMI_SAFE_CMPXCHG covers the
double-word case anywhere else either, except purely by luck.

In mm/slub.c, we also use a double-wide cmpxchg, and there the rule
has literally been that it's conditional on

 (a) system_has_cmpxchg64() existing as a macro

 (b) using that macro to then gate - at runtime - whether it actually
works or not

I think - but didn't check - that we essentially only enable the
two-word case on x86 as a result, and fall back to the slow case on
all other architectures - and on the i486 case.

That said, other architectures *do* have a working double-word
cmpxchg, but I wouldn't guarantee it. For example, 32-bit arm does
have one using ldrexd/strexd, but that only exists on arm v6+.

And guess what? You'll silently get a "disable interrupts, do it as a
non-atomic load-store" on arm too for that case. And again, pre-v6 arm
is about as relevant as i486 is, but my point is, that double-word
cmpxchg you rely on simply DOES NOT EXIST on 32-bit platforms except
under special circumstances.

So this isn't a "x86 is the odd man out". This is literally generic.

> Now back to my original question. Are you OK with me sending this to you
> now, or should I send you just the subtle fixes to the 32-bit rb_time_*
> code and keep this patch for the merge window?

I'm absolutely not taking some untested random "let's do 64-bit
cmpxchg that we know is broken on 32-bit using broken conditionals"
shit.

What *would* work is that slab approach, which is essentially

  #ifndef system_has_cmpxchg64
      #define system_has_cmpxchg64() false
  #endif

        ...
        if (!system_has_cmpxchg64())
                return error or slow case

        do_64bit_cmpxchg_case();

(although the slub case is much more indirect, and uses a
__CMPXCHG_DOUBLE flag that only gets set when that define exists etc).
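
For readers who want something compilable, the gate above can be modelled
in user space roughly like this. It is only a sketch: C11 atomics stand in
for the kernel's cmpxchg64, ts_update() is a hypothetical caller, and the
-EOPNOTSUPP / -EAGAIN return values are assumptions for the example.

```c
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Same fallback as above: anything that does not define it gets "false". */
#ifndef system_has_cmpxchg64
#define system_has_cmpxchg64() false
#endif

/*
 * Hypothetical caller: try to move *ts from old to new_val, but only
 * when a real 64-bit cmpxchg is known to exist.
 */
static inline int ts_update(_Atomic uint64_t *ts, uint64_t old, uint64_t new_val)
{
	if (!system_has_cmpxchg64())
		return -EOPNOTSUPP;	/* caller takes the error or slow path */

	return atomic_compare_exchange_strong(ts, &old, new_val) ? 0 : -EAGAIN;
}
```

Built as-is, nothing defines system_has_cmpxchg64, so ts_update() always
reports -EOPNOTSUPP and leaves the value alone, which mirrors how every
architecture without the define would be cut off.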

But that would literally cut off support for all non-x86 32-bit architectures.

So no. You need to forget about the whole "do a 64-bit cmpxchg on
32-bit architectures" as being some kind of solution in the short
term.

               Linus


^ permalink raw reply	[relevance 90%]

* Re: [PATCH] ring-buffer: Remove 32bit timestamp logic
  @ 2023-12-14  6:53 92%   ` Linus Torvalds
    0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2023-12-14  6:53 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-arch, LKML, Linux Trace Kernel, Masami Hiramatsu,
	Mark Rutland, Mathieu Desnoyers

On Wed, 13 Dec 2023 at 18:45, Steven Rostedt <rostedt@goodmis.org> wrote:
>
> tl;dr;  The ring-buffer timestamp requires a 64-bit cmpxchg to keep the
> timestamps in sync (only in the slow paths). I was told that 64-bit cmpxchg
> can be extremely slow on 32-bit architectures. So I created a rb_time_t
> that on 64-bit was a normal local64_t type, and on 32-bit it's represented
> by 3 32-bit words and a counter for synchronization. But this now requires
> three 32-bit cmpxchgs for where one simple 64-bit cmpxchg would do.

It's not that a 64-bit cmpxchg is even slow. It doesn't EXIST AT ALL
on older 32-bit x86 machines.

Which is why we have

    arch/x86/lib/cmpxchg8b_emu.S

which emulates it on machines that don't have the CX8 capability
("CX8" being the x86 capability flag name for the cmpxchg8b
instruction, aka 64-bit cmpxchg).

Which only works because those older 32-bit cpu's also don't do SMP,
so there are no SMP cache coherency issues, only interrupt atomicity
issues.

IOW, the way to do an atomic 64-bit cmpxchg on the affected hardware
is to simply disable interrupts.

In other words - it's not just slow.  It's *really* slow. As in 10x
slower, not "slightly slower".
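
As a rough user-space model of that emulation (not the actual assembly in
cmpxchg8b_emu.S), the idea looks like the sketch below. The fake_irq_*
helpers are stand-ins invented for this example, since real interrupt
masking is not available outside the kernel, and the whole thing is only
"atomic" under the same assumption as above: a single CPU.

```c
#include <stdint.h>

/* Stand-in for the CPU interrupt-enable state; purely illustrative. */
static unsigned long irq_disable_depth;

static inline unsigned long fake_irq_save(void)
{
	return irq_disable_depth++;	/* return the previous state */
}

static inline void fake_irq_restore(unsigned long flags)
{
	irq_disable_depth = flags;
}

/*
 * Emulated 64-bit compare-and-exchange for a uniprocessor: with
 * interrupts off nothing else can run between the load and the store.
 * Returns the value found in *ptr; the store happened iff it equals old.
 */
static inline uint64_t emulated_cmpxchg64(uint64_t *ptr, uint64_t old,
					  uint64_t new_val)
{
	unsigned long flags = fake_irq_save();
	uint64_t prev = *ptr;

	if (prev == old)
		*ptr = new_val;
	fake_irq_restore(flags);
	return prev;
}
```

The cost is in the interrupt disable/enable pair around every operation,
and none of this protects against another CPU, which is why it only works
on machines that cannot do SMP in the first place.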

> We started discussing how much time this is actually saving to be worth the
> complexity, and actually found some hardware to test. One Atom processor.

That atom processor won't actually show the issue. It's much too
recent. So your "test" is actually worthless.

And you probably did this all with a kernel config that had
CONFIG_X86_CMPXCHG64 set anyway, which wouldn't even boot on a i486
machine.

So in fact your test was probably doubly broken, in that not only
didn't you test the slow case, you tested something that wouldn't even
have worked in the environment where the slow case happened.

Now, the real question is if anybody cares about CPUs that don't have
cmpxchg8b support.

Because in practice, it's really just old 486-class machines (and a
couple of clone manufacturers who _claimed_ to be Pentium class, but
weren't - there was also some odd thing with Windows breaking if you
had CPUID claiming to support CX8).

We dropped support for the original 80386 some time ago. I'd actually
be willing to drop support for all pre-cmpxchg8b machines, and get rid
of the emulation.

I also suspect that from a perf angle, none of this matters. The
emulation being slow probably is a non-issue, simply because even if
you run on an old i486 machine, you probably won't be doing perf or
tracing on it.

             Linus

^ permalink raw reply	[relevance 92%]

* Re: [PATCH] get_maintainer: correctly parse UTF-8 encoded names in files
  @ 2023-12-14  1:41 99%         ` Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2023-12-14  1:41 UTC (permalink / raw)
  To: Alvin Šipraga
  Cc: Joe Perches, Duje Mihanović,
	Alvin Šipraga, Konstantin Ryabitsev, linux-kernel

On Wed, 13 Dec 2023 at 17:06, Alvin Šipraga <ALSI@bang-olufsen.dk> wrote:
>
> Sorry to be a nuisance, but could you please have another look below and
> reconsider this patch? Otherwise NAK is fine, but I wanted to follow up
> on this as it solves an actual, albeit minor, issue for people with
> unusual names when sending and receiving patches.

The patch seems bogus, because it shouldn't have any "Latin" encoding
issues at all.

Opening as utf8 makes sense, but the "Latin" part of the regular
expressions seem bogus.

IOW, isn't '\p{L}' the right pattern for a "letter"? Isn't that what
we actually care about here?

Replacing one locale bug with just another locale bug seems pointless.

           Linus

^ permalink raw reply	[relevance 99%]

* Re: [RFC PATCH v3 11/11] mseal:add documentation
  @ 2023-12-14  1:31 89%       ` Linus Torvalds
    0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2023-12-14  1:31 UTC (permalink / raw)
  To: Jeff Xu
  Cc: jeffxu, akpm, keescook, jannh, sroettger, willy, gregkh, jorgelo,
	groeck, linux-kernel, linux-kselftest, linux-mm, pedro.falcato,
	dave.hansen, linux-hardening, deraadt

On Wed, 13 Dec 2023 at 16:36, Jeff Xu <jeffxu@google.com> wrote:
>
>
> > IOW, when would you *ever* say "seal this area, but MADV_DONTNEED is ok"?
> >
> The MADV_DONTNEED is OK for file-backed mapping.

Right. It makes no semantic difference. So there's no point to it.

My point was that you added this magic flag for "not ok for RO anon mapping".

It's such a *completely* random flag, that I go "that's just crazy
random - make sealing _always_ disallow that case".

So what I object to in this series is basically random small details
that should just be part of the basic act of sealing.

I think sealing should just mean "you can't do any operations that
have semantic meaning for the mapping, because it is SEALED".

So I think sealing should automatically mean "can't do MADV_DONTNEED
on anon memory", because that's basically equivalent to a munmap/remap
operation.

I also think that sealing should just automatically mean "can't do
mprotect any more".

And yes, the OpenBSD semantics of "immutable" apparently allowed
reducing permissions, but even the openbsd man-page seems to think
that was a bug, so we should just not allow it. And the openbsd case
seems to be because of how they made certain things immutable by
default, which is different from what this mseal() thing is.

End result: I'd really like to make the thing conceptually simpler,
rather than add all those random (*very* random in case of
MADV_DONTNEED) special cases.

Is there any actual practical example of why you'd want a half-sealed thing?

And no, I didn't read the pdf that was attached. If it can't just be
explained in plain language, it's not an explanation.

I'd love for "sealed" to be just a single bit in the vm_flags things
that we already have. Not a config option. Not some complicated thing
that is hard to explain. A simple "I have set up this mapping, you
can't change it any more".
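
To make the "single bit" idea concrete, here is a minimal user-space
sketch. It is not kernel code: every name in it (toy_vma, TOY_VM_SEALED,
toy_check_op and so on) is hypothetical, and the flag values are
arbitrary. It only models the rule argued for above: once the bit is
set, anything with semantic meaning for the mapping is refused, and
MADV_DONTNEED is refused on anonymous memory only.

```c
#include <errno.h>

/* Hypothetical flag bits for a toy mapping; NOT the kernel's vm_flags values. */
#define TOY_VM_ANON   0x1u
#define TOY_VM_SEALED 0x2u

/* Operations that would change the mapping in this toy model. */
enum toy_op { TOY_MUNMAP, TOY_MREMAP, TOY_MPROTECT, TOY_MADV_DONTNEED };

struct toy_vma {
    unsigned int flags;
};

/* Sealing is one-way: this model has no unseal operation. */
static inline void toy_seal(struct toy_vma *vma)
{
    vma->flags |= TOY_VM_SEALED;
}

/* Return 0 if the operation may proceed, -EPERM if the seal forbids it. */
static inline int toy_check_op(const struct toy_vma *vma, enum toy_op op)
{
    if (!(vma->flags & TOY_VM_SEALED))
        return 0;

    switch (op) {
    case TOY_MADV_DONTNEED:
        /*
         * Dropping anonymous pages changes what the mapping contains,
         * so it is treated like munmap/remap. For file-backed memory
         * it makes no semantic difference, so it is allowed.
         */
        return (vma->flags & TOY_VM_ANON) ? -EPERM : 0;
    default:
        /* munmap, mremap and mprotect are all refused once sealed. */
        return -EPERM;
    }
}
```

There is no "types" bitmask to pass around here: the only per-mapping
state is whether the seal bit is set.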

And if it cannot be that kind of thing, I want to have clear and
obvious examples of why it can't be that simple thing.

Not a pdf file that describes some google-chrome design. Something
down-to-earth and practical (and not a "we might want this in the
future" thing either).

IOW, what is wrong with "THIS VMA SETUP CANNOT BE CHANGED ANY MORE"?

Nothing less, but also nothing more. No random odd bits that need explaining.

              Linus

^ permalink raw reply	[relevance 89%]

* Re: [PATCH 1/3] x86: Move TSS and LDT to end of the GDT
  2023-12-13 18:51 96%   ` Linus Torvalds
@ 2023-12-13 19:08 97%     ` Linus Torvalds
    0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2023-12-13 19:08 UTC (permalink / raw)
  To: Brian Gerst
  Cc: linux-kernel, x86, Ingo Molnar, Thomas Gleixner, Borislav Petkov,
	H . Peter Anvin, Peter Zijlstra

On Wed, 13 Dec 2023 at 10:51, Linus Torvalds
<torvalds@linuxfoundation.org> wrote:
>
> We have GDT_ENTRY_PERCPU for example, which is a kernel-only segment.
> It also happens to be 32-bit only, it doesn't matter for the thing
> you're trying to fix, but that valid_user_selector() thing is then
> used on x86-32 too.
>
> So the ESPFIX and per-cpu segments are kernel-only, but then the VDSO
> getcpu one is a user segment.
>
> And the PnP and APM BIOS segments are similarly kernel-only.

Final (?) note: when looking at this, I have to say that our
GDT_ENTRY_INIT() and GDT_ENTRY() macros are horrendous.

I know exactly *why* they are horrendous, with all the history of
passing in raw flags values, etc, and you can most certainly see that
whole thing in the GDT_ENTRY() macro. It's used in assembly code in a
couple of cases too.

But then you look at GDT_ENTRY_INIT(), and it turns that illegible
"flags" value into (slightly more) legible S/DPL/etc values. So it
literally makes people use those odd "this is how this is encoded"
values even when the code actually wants to use a structure definition
that has the flags split out.

I guess it's much too much work to really fix things, but maybe we
could at least add #defines and comments for the special values.

So instead of

        GDT_ENTRY_INIT(0xc093, 0, 0xfffff)

we could maybe have

       #define GDT_ENTRY_FLAGS(type,s,dpl,p,avl,l,d,g) \
                ((type) | \
                 ((s) << 4) | \
                 ((dpl) << 5) | ....

and have #defines for those 0xc093 values (with comments), so that we'd have

        GDT_ENTRY_INIT(KERNEL_DATA_FLAGS, 0, 0xfffff)

instead of a magic 0xc093 number.
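
A filled-in sketch of that macro, for illustration only. The names
GDT_ENTRY_FLAGS and KERNEL_DATA_FLAGS are the ones used above. The bit
positions for p, avl, l, d and g are assumptions taken from the
standard x86 segment descriptor layout (bits 8-11 of the 16-bit flags
word hold the high limit bits and are left zero), and the masking of
each field is an addition for safety, not something from this thread.

```c
#include <stdint.h>

/*
 * Pack the descriptor attribute fields into the 16-bit "flags" word.
 * Assumed layout (x86 segment descriptor attributes):
 *   bits 0-3   type
 *   bit  4     S   (1 = code/data, 0 = system)
 *   bits 5-6   DPL
 *   bit  7     P   (present)
 *   bits 8-11  limit 19:16, not part of the flags, left as zero
 *   bit  12    AVL
 *   bit  13    L   (64-bit code)
 *   bit  14    D/B (default operand size)
 *   bit  15    G   (4K granularity)
 */
#define GDT_ENTRY_FLAGS(type, s, dpl, p, avl, l, d, g) \
    ((((type) & 0xfu))        | \
     (((s)    & 0x1u) << 4)   | \
     (((dpl)  & 0x3u) << 5)   | \
     (((p)    & 0x1u) << 7)   | \
     (((avl)  & 0x1u) << 12)  | \
     (((l)    & 0x1u) << 13)  | \
     (((d)    & 0x1u) << 14)  | \
     (((g)    & 0x1u) << 15))

/* Type 3 = read/write data, accessed; ring 0, present, 32-bit, 4K granular. */
#define KERNEL_DATA_FLAGS GDT_ENTRY_FLAGS(3, 1, 0, 1, 0, 0, 1, 1)

/* Decode helper: pull the DPL back out of a packed flags word. */
static inline unsigned int gdt_flags_dpl(uint16_t flags)
{
    return (flags >> 5) & 0x3u;
}
```

With these definitions KERNEL_DATA_FLAGS evaluates to 0xc093, so the
named form and the magic number describe the same descriptor.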

This would require some nit-picky "read all those values and know the
crazy descriptor table layout" thing. Maybe somebody has a serious
case of insomnia and boredom?

           Linus

^ permalink raw reply	[relevance 97%]

* Re: [PATCH 3/3] x86/sigreturn: Reject system segements
  @ 2023-12-13 18:54 99%   ` Linus Torvalds
    0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2023-12-13 18:54 UTC (permalink / raw)
  To: Brian Gerst
  Cc: linux-kernel, x86, Ingo Molnar, Thomas Gleixner, Borislav Petkov,
	H . Peter Anvin, Peter Zijlstra, Michal Luczaj

On Wed, 13 Dec 2023 at 08:34, Brian Gerst <brgerst@gmail.com> wrote:
>
> @@ -98,7 +98,11 @@ static bool ia32_restore_sigcontext(struct pt_regs *regs,
>
>         /* Get CS/SS and force CPL3 */
>         regs->cs = sc.cs | 0x03;
> +       if (!valid_user_selector(regs->cs))
> +               return false;
>         regs->ss = sc.ss | 0x03;
> +       if (!valid_user_selector(regs->ss))
> +               return false;

Side note: the SS/CS checks could be stricter than the usual selector tests.

In particular, normal segments can be Null segments. But CS/SS must not be.

Also, since you're now checking the validity, maybe we shouldn't do
the "force cpl3" any more, and just make it an error to try to load a
non-cpl3 segment at sigreturn..

That forcing was literally just because we weren't checking it for sanity...
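
A rough sketch of what the stricter CS/SS test could look like,
assuming the usual x86 selector encoding (RPL in bits 0-1, table
indicator in bit 2, index above that). The helper names are
hypothetical and not from the patch. It covers only the two extra
conditions mentioned here (not null, already CPL3); whether the
selector refers to a user-visible segment would still be up to
valid_user_selector().

```c
#include <stdbool.h>
#include <stdint.h>

#define SEL_RPL_MASK 0x3u   /* requested privilege level, bits 0-1 */
#define SEL_USER_RPL 0x3u   /* CPL3 */

/*
 * A selector is null when its index and table-indicator bits are all
 * zero; the RPL bits do not matter, so 0, 1, 2 and 3 are all null.
 */
static inline bool selector_is_null(uint16_t sel)
{
    return (sel & ~SEL_RPL_MASK) == 0;
}

/*
 * Hypothetical stricter check for CS/SS at sigreturn: reject null
 * selectors and reject anything not already at CPL3, instead of
 * silently OR-ing in 0x03.
 */
static inline bool strict_user_cs_ss(uint16_t sel)
{
    if (selector_is_null(sel))
        return false;
    return (sel & SEL_RPL_MASK) == SEL_USER_RPL;
}
```

Note that with the old "| 0x03" forcing, a null selector of 0 would
have become 3, which is still null, so forcing alone never caught
that case.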

           Linus

^ permalink raw reply	[relevance 99%]

* Re: [PATCH 1/3] x86: Move TSS and LDT to end of the GDT
  @ 2023-12-13 18:51 96%   ` Linus Torvalds
  2023-12-13 19:08 97%     ` Linus Torvalds
  0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2023-12-13 18:51 UTC (permalink / raw)
  To: Brian Gerst
  Cc: linux-kernel, x86, Ingo Molnar, Thomas Gleixner, Borislav Petkov,
	H . Peter Anvin, Peter Zijlstra

On Wed, 13 Dec 2023 at 08:34, Brian Gerst <brgerst@gmail.com> wrote:
>
> This will make testing for system segments easier.

It seems to make more sense organizationally too, with the special
non-data/code segments clearly separate at the end.

So I think this is fine conceptually.

HOWEVER, I think that you might want to expand on this a bit more,
because there are other special segment selectors that might not be
things you want to expose to user space.

We have GDT_ENTRY_PERCPU for example, which is a kernel-only segment.
It also happens to be 32-bit only, it doesn't matter for the thing
you're trying to fix, but that valid_user_selector() thing is then
used on x86-32 too.

So the ESPFIX and per-cpu segments are kernel-only, but then the VDSO
getcpu one is a user segment.

And the PnP and APM BIOS segments are similarly kernel-only.

But then the VDSO getcpu segment is user-visible, in the middle, and
again, it's 32-bit only but that whole GDT_SYSTEM_START thing is
supposed to work there too.

End result: this seems incomplete and not really fully baked.

I wonder if instead of GDT_SYSTEM_START, you'd be better off just
making a trivial constant bitmap of "these are user visible segments
in the GDT". No need to re-order things, just have something like

   #define USER_SEGMENTS_MASK \
        ((1ul << GDT_ENTRY_DEFAULT_USER_CS) | \
         ,,,,

and use that for the test (remember to check for GDT_ENTRIES as the max).
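
Spelled out a little further, as a sketch only: USER_SEGMENTS_MASK,
GDT_ENTRY_DEFAULT_USER_CS and GDT_ENTRIES are the names used above,
while the slot numbers, the two extra entries in the mask and the
helper gdt_index_is_user_visible() are assumptions made up for the
example. A real version would list every user-visible entry for the
configuration in question.

```c
#include <stdbool.h>

/* Illustrative slot numbers only; placeholders, not copied from a header. */
#define GDT_ENTRY_DEFAULT_USER32_CS 4
#define GDT_ENTRY_DEFAULT_USER_DS   5
#define GDT_ENTRY_DEFAULT_USER_CS   6
#define GDT_ENTRIES                 16

/* One bit per GDT slot that user space is allowed to load. */
#define USER_SEGMENTS_MASK \
    ((1ul << GDT_ENTRY_DEFAULT_USER32_CS) | \
     (1ul << GDT_ENTRY_DEFAULT_USER_DS)   | \
     (1ul << GDT_ENTRY_DEFAULT_USER_CS))

/*
 * Test a GDT index against the constant bitmap. The bounds check comes
 * first so that an out-of-range index never reaches the shift.
 */
static inline bool gdt_index_is_user_visible(unsigned int idx)
{
    if (idx >= GDT_ENTRIES)
        return false;
    return (USER_SEGMENTS_MASK >> idx) & 1ul;
}
```

Because the mask is a compile-time constant, kernel-only entries that
sit between user-visible ones are simply left out of it, and nothing
in the GDT needs to move.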

Hmm?

             Linus

^ permalink raw reply	[relevance 96%]

* Re: [RFC PATCH v3 11/11] mseal:add documentation
  @ 2023-12-13  0:39 99%   ` Linus Torvalds
    0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2023-12-13  0:39 UTC (permalink / raw)
  To: jeffxu
  Cc: akpm, keescook, jannh, sroettger, willy, gregkh, jeffxu, jorgelo,
	groeck, linux-kernel, linux-kselftest, linux-mm, pedro.falcato,
	dave.hansen, linux-hardening, deraadt

On Tue, 12 Dec 2023 at 15:17, <jeffxu@chromium.org> wrote:
> +
> +**types**: bit mask to specify the sealing types, they are:

I really want a real-life use-case for more than one bit of "don't modify".

IOW, when would you *ever* say "seal this area, but MADV_DONTNEED is ok"?

Or when would you *ever* say "seal this area, but mprotect() is ok"?

IOW, I want to know why we don't just do the BSD immutable thing, and
why we need this multi-level sealing thing.

               Linus

^ permalink raw reply	[relevance 99%]

* Linux 6.7-rc5
@ 2023-12-10 22:53 39% Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2023-12-10 22:53 UTC (permalink / raw)
  To: Linux Kernel Mailing List

Well, this has been a week of travel, jetlag, and then a few days of
getting over a nasty cold for me.

I think I'm mostly over it, but it does mean that I'm very happy that
things have been pretty calm and it wasn't a problem that I was
feeling pretty miserable at one point and sat at the computer only
sporadically as a result.

The stats for rc5 all look very normal - the bulk is drivers (gpu,
networking and sound being the biggest areas, but we've got a bit of
everything in there), and then we have the usual mix of architecture
fixes, filesystems, networking, core kernel and some selftest updates.

Nothing looks particularly scary, which is good, because if it had
been, I wouldn't have had the capacity to deal with it last week.

Let's hope it stays that way even as I am getting better. Because the
holidays are almost upon us, and I'm woefully underprepared.

                Linus

---

Adrián Larumbe (2):
      drm/panfrost: Consider dma-buf imported objects as resident
      drm/panfrost: Fix incorrect updating of current device frequency

Ahmad Fatoum (1):
      MAINTAINERS: reinstate freescale ARM64 DT directory in i.MX entry

Aleksandrs Vinarskis (1):
      ALSA: hda/realtek: fix speakers on XPS 9530 (2023)

Alex Bee (2):
      arm64: dts: rockchip: Expand reg size of vdec node for RK3399
      ARM: dts: rockchip: Fix sdmmc_pwren's pinmux setting for RK3128

Alex Deucher (2):
      drm/amd/display: Increase frame warning limit with KASAN or KCSAN in dml
      drm/amdgpu: fix buffer funcs setting order on suspend

Alexander Stein (4):
      arm64: dt: imx93: tqma9352-mba93xxla: Fix LPUART2 pad config
      dt-bindings: pwm: imx-pwm: Unify #pwm-cells for all compatibles
      arm64: dts: freescale: imx8-ss-lsio: Fix #pwm-cells
      arm64: dts: freescale: imx8-ss-dma: Fix #pwm-cells

Alexander Usyskin (1):
      mei: pxp: fix mei_pxp_send_message return value

Alvin Lee (1):
      drm/amd/display: Use channel_width = 2 for vram table 3.0

Andi Shyti (1):
      serial: ma35d1: Validate console index before assignment

Andrew Jones (1):
      RISC-V: hwprobe: Always use u64 for extension bits

Andrew Morton (2):
      MAINTAINERS: add Andrew Morton for lib/*
      mm/memory.c:zap_pte_range() print bad swap entry

Andy Shevchenko (2):
      units: add missing header
      serial: 8250_dw: Add ACPI ID for Granite Rapids-D UART

AngeloGioacchino Del Regno (8):
      ASoC: SOF: mediatek: mt8186: Add Google Steelix topology compatible
      arm64: dts: mediatek: mt8195: Fix PM suspend/resume with venc clocks
      arm64: dts: mediatek: mt8183: Fix unit address for scp reserved memory
      arm64: dts: mediatek: mt8183-evb: Fix unit_address_vs_reg warning on ntc
      arm64: dts: mediatek: mt8173-evb: Fix regulator-fixed node names
      arm64: dts: mediatek: mt8183: Move thermal-zones to the root node
      arm64: dts: mediatek: mt8186: Change gpu speedbin nvmem cell name
      arm64: dts: mediatek: cherry: Fix interrupt cells for MT6360 on I2C7

Antoniu Miclaus (1):
      hwmon: max31827: include regulator header

Armin Wolf (3):
      platform/x86: wmi: Skip blocks with zero instances
      hwmon: (acpi_power_meter) Fix 4.29 MW bug
      hwmon: (corsair-psu) Fix probe when built-in

Arnd Bergmann (2):
      ARM: PL011: Fix DMA support
      drm/bridge: tc358768: select CONFIG_VIDEOMODE_HELPERS

Ashwin Dayanand Kamat (1):
      x86/sev: Fix kernel crash due to late update to read-only ghcb_version

Ayush Singh (1):
      greybus: gb-beagleplay: Ensure le for values in transport

Bagas Sanjaya (1):
      MAINTAINERS: drop Antti Palosaari

Baoquan He (2):
      drivers/base/cpu: crash data showing should depends on KEXEC_CORE
      kernel/Kconfig.kexec: drop select of KEXEC for CRASH_DUMP

Bin Li (1):
      ALSA: hda/realtek: Enable headset on Lenovo M90 Gen5

Bitao Hu (1):
      nvme: fix deadlock between reset and scan

Boerge Struempfel (1):
      gpiolib: sysfs: Fix error handling on failed export

Borislav Petkov (AMD) (1):
      x86/CPU/AMD: Check vendor in the AMD microcode callback

Brett Creeley (1):
      ionic: Fix dim work handling in split interrupt mode

Cameron Williams (1):
      parport: Add support for Brainboxes IX/UC/PX parallel cards

Chancel Liu (1):
      ASoC: imx-rpmsg: SND_SOC_IMX_RPMSG should depend on OF and I2C

Charles Keepax (1):
      ASoC: wm8974: Correct boost mixer inputs

Charlie Jenkins (3):
      riscv: Safely remove entries from relocation list
      riscv: Correct type casting in module loading
      Support rv32 ULEB128 test

Chester Lin (2):
      MAINTAINERS: change the S32G2 maintainer's email address.
      .mailmap: add a new address mapping for Chester Lin

Christophe JAILLET (1):
      hwmon: (nzxt-kraken2) Fix error handling path in kraken2_probe()

ChunHao Lin (1):
      r8169: fix rtl8125b PAUSE frames blasting when suspended

Claudio Imbrenda (2):
      KVM: s390: vsie: fix wrong VIR 37 when MSO is used
      KVM: s390/mm: Properly reset no-dat

Clément Léger (1):
      riscv: fix misaligned access handling of C.SWSP and C.SDSP

Colin Ian King (1):
      hwmon: ltc2991: Fix spelling mistake "contiuous" -> "continuous"

Conor Dooley (2):
      riscv: dts: sophgo: remove address-cells from intc node
      riscv: dts: microchip: move timebase-frequency to mpfs.dtsi

D. Wythe (1):
      netfilter: bpf: fix bad registration on nf_defrag

Dan Carpenter (1):
      io_uring/kbuf: Fix an NULL vs IS_ERR() bug in io_alloc_pbuf_ring()

Daniel Borkmann (1):
      packet: Move reference count in packet_sock to atomic_long_t

Daniel Mack (1):
      serial: sc16is7xx: address RX timeout interrupt errata

Daniil Maximov (1):
      net: atlantic: Fix NULL dereference of skb pointer in

Dave Airlie (1):
      nouveau/tu102: flush all pdbs on vmm flush

David Howells (3):
      cifs: Fix flushing, invalidation and file size with copy_file_range()
      cifs: Fix flushing, invalidation and file size with FICLONE
      cifs: Fix non-availability of dedup breaking generic/304

David Jeffery (1):
      md/raid6: use valid sector values to determine if an I/O should wait on the reshape

David Lin (1):
      ASoC: nau8822: Fix incorrect type in assignment and cast to restricted __be16

David Rau (1):
      ASoC: da7219: Support low DC impedance headset

David Thompson (1):
      mlxbf-bootctl: correctly identify secure boot with development keys

David Woodhouse (1):
      KVM: selftests: add -MP to CFLAGS

Dinghao Liu (3):
      ASoC: wm_adsp: fix memleak in wm_adsp_buffer_populate
      scsi: be2iscsi: Fix a memleak in beiscsi_init_wrb_handle()
      net: bnxt: fix a potential use-after-free in bnxt_init_tc

Dmitry Safonov (5):
      Documentation/tcp: Fix an obvious typo
      net/tcp: Consistently align TCP-AO option in the header
      net/tcp: Limit TCP_AO_REPAIR to non-listen sockets
      net/tcp: Don't add key with non-matching VRF on connected sockets
      net/tcp: Don't store TCP-AO maclen on reqsk

Douglas Anderson (5):
      r8152: Hold the rtnl_lock for all of reset
      r8152: Add RTL8152_INACCESSIBLE checks to more loops
      r8152: Add RTL8152_INACCESSIBLE to r8156b_wait_loading_flash()
      r8152: Add RTL8152_INACCESSIBLE to r8153_pre_firmware_1()
      r8152: Add RTL8152_INACCESSIBLE to r8153_aldps_en()

Elliot Berman (1):
      freezer,sched: Do not restore saved_state of a thawed task

Eric Dumazet (2):
      ipv6: fix potential NULL deref in fib6_add()
      tcp: do not accept ACK of bytes we never sent

Eric Woudstra (1):
      arm64: dts: mt7986: fix emmc hs400 mode without uboot initialization

Eugen Hristev (3):
      arm64: dts: mediatek: mt8186: fix clock names for power domains
      arm64: dts: mediatek: mt7622: fix memory node warning check
      arm64: dts: mediatek: mt8183-kukui-jacuzzi: fix dsi unnecessary cells properties

Fabio Estevam (4):
      ARM: dts: imx6ul-pico: Describe the Ethernet PHY clock
      ARM: dts: imx28-xea: Pass the 'model' property
      dt-bindings: lcdif: Properly describe the i.MX23 interrupts
      dt-bindings: display: adi,adv75xx: Document #sound-dai-cells

Florian Fainelli (2):
      pwm: bcm2835: Fix NPD in suspend/resume
      scripts/gdb: fix lx-device-list-bus and lx-device-list-class

Florian Westphal (2):
      netfilter: nft_set_pipapo: skip inactive elements during set walk
      netfilter: nf_tables: fix 'exist' matching on bigendian arches

Francesco Dolcini (1):
      platform/surface: aggregator: fix recv_buf() return value

Frank Wunderlich (2):
      arm64: dts: mt7986: define 3W max power to both SFP on BPI-R3
      arm64: dts: mt7986: change cooling trips

Geetha sowjanya (3):
      octeontx2-af: Fix mcs sa cam entries size
      octeontx2-af: Fix mcs stats register address
      octeontx2-af: Add missing mcs flr handler call

Georg Gottleuber (1):
      nvme-pci: Add sleep quirk for Kingston drives

Greg Kroah-Hartman (1):
      Revert "greybus: gb-beagleplay: Ensure le for values in transport"

Haibo Chen (2):
      arm64: dts: imx93: update gpio node name to align with register address
      arm64: dts: imx8ulp: update gpio node name to align with register address

Hans de Goede (3):
      platform/x86: asus-wmi: Move i8042 filter install to shared asus-wmi code
      platform/x86: asus-wmi: Change q500a_i8042_filter() into a generic i8042-filter
      platform/x86: asus-wmi: Filter Volume key presses if also reported via atkbd

Hawking Zhang (1):
      drm/amdgpu: Update fw version for boot time error query

Heiko Carstens (1):
      checkstack: fix printed address

Heiko Stuebner (2):
      arm64: dts: rockchip: fix rk356x pcie msg interrupt name
      arm64: dts: rockchip: drop interrupt-names property from rk3588s dfi

Heiner Kallweit (1):
      leds: trigger: netdev: fix RTNL handling to prevent potential deadlock

Helge Deller (1):
      parisc: Fix asm operand number out of range build error in bug table

Hengqi Chen (3):
      LoongArch: Preserve syscall nr across execve()
      LoongArch: BPF: Don't sign extend memory load operand
      LoongArch: BPF: Don't sign extend function return value

Hsin-Yi Wang (1):
      arm64: dts: mt8183: kukui: Fix underscores in node names

Hugh Dickins (1):
      mm: fix oops when filemap_map_pmd() without prealloc_pte

Hui Zhou (1):
      nfp: flower: fix for take a mutex lock in soft irq context and rcu lock

Ian Rogers (2):
      perf metrics: Avoid segv if default metricgroup isn't set
      perf list: Fix JSON segfault by setting the used skip_duplicate_pmus callback

Ido Schimmel (2):
      psample: Require 'CAP_NET_ADMIN' when joining "packets" group
      drop_monitor: Require 'CAP_SYS_ADMIN' when joining "events" group

Ilkka Koskinen (1):
      perf vendor events arm64: AmpereOne: Add missing DefaultMetricgroupName fields

Inki Dae (1):
      drm/exynos: fix a wrong error checking

Ivan Lipski (1):
      drm/amd/display: Add monitor patch for specific eDP

Ivan Orlov (1):
      ALSA: pcmtest: stop timer before buffer is released

Ivan Vecera (1):
      i40e: Fix unexpected MFS warning message

Jack Wang (4):
      RDMA/rtrs-srv: Do not unconditionally enable irq
      RDMA/rtrs-clt: Start hb after path_up
      RDMA/rtrs-clt: Fix the max_send_wr setting
      RDMA/rtrs-clt: Remove the warnings for req in_use check

Jacob Keller (1):
      iavf: validate tx_coalesce_usecs even if rx_coalesce_usecs is zero

Jakub Kicinski (1):
      MAINTAINERS: exclude 9p from networking

James Clark (1):
      coresight: Fix crash when Perf and sysfs modes are used concurrently

Jason Gunthorpe (2):
      iommufd: Add iommufd_ctx to iommufd_put_object()
      iommufd: Do not UAF during iommufd_put_object()

Jason Zhang (1):
      ALSA: pcm: fix out-of-bounds in snd_pcm_state_names

Jens Axboe (1):
      io_uring/kbuf: check for buffer list readiness after NULL check

Jeremy Soller (1):
      ASoC: amd: yc: Add DMI entry to support System76 Pangolin 13

Jiadong Zhu (1):
      drm/amdgpu: disable MCBP by default

Jianheng Zhang (1):
      net: stmmac: fix FPE events losing

Jiaxun Yang (3):
      MIPS: Loongson64: Reserve vgabios memory on boot
      MIPS: Loongson64: Enable DMA noncoherent support
      MIPS: Loongson64: Handle more memory types passed from firmware

Jiexun Wang (1):
      mm/madvise: add cond_resched() in madvise_cold_or_pageout_pte_range()

Jinyang He (1):
      LoongArch: Set unwind stack type to unknown rather than set error flag

Jiri Olsa (2):
      bpf: Fix prog_array_map_poke_run map poke update
      selftests/bpf: Add test for early update in prog_array_map_poke_run

Johan Hovold (1):
      ASoC: soc-pcm: fix up bad merge

Johannes Berg (1):
      Revert "debugfs: annotate debugfs handlers vs. removal with lockdep"

John Fastabend (2):
      net: tls, update curr on splice as well
      bpf: sockmap, updating the sg structure should also update curr

Jonas Karlman (1):
      arm64: dts: rockchip: Expand reg size of vdec node for RK3328

Junhao He (4):
      hwtracing: hisi_ptt: Add dummy callback pmu::read()
      coresight: ultrasoc-smb: Fix sleep while close preempt in enable_smb
      coresight: ultrasoc-smb: Config SMB buffer before register sink
      coresight: ultrasoc-smb: Fix uninitialized before use buf_hw_base

Junxian Huang (2):
      RDMA/hns: Fix unnecessary err return when using invalid congest control algorithm
      MAINTAINERS: Add Chengchang Tang as Hisilicon RoCE maintainer

Kalesh AP (1):
      RDMA/bnxt_re: Correct module description string

Kamil Duljas (3):
      ASoC: Intel: Skylake: Fix mem leak in few functions
      ASoC: SOF: topology: Fix mem leak in sof_dai_load()
      ASoC: Intel: Skylake: mem leak in skl register function

Keith Busch (3):
      nvme: introduce helper function to get ctrl state
      nvme: ensure reset state check ordering
      nvme-ioctl: move capable() admin check to the end

Kelly Kane (1):
      r8152: add vendor/device ID pair for ASUS USB-C2500

Kirill A. Shutemov (2):
      x86/coco: Disable 32-bit emulation by default on TDX and SEV
      x86/tdx: Allow 32-bit emulation by default

Konrad Dybcio (1):
      dt-bindings: interrupt-controller: Allow #power-domain-cells

Konstantin Aladyshev (1):
      usb: gadget: f_hid: fix report descriptor allocation

Krzysztof Kozlowski (2):
      ARM: dts: rockchip: minor whitespace cleanup around '='
      arm64: dts: rockchip: minor whitespace cleanup around '='

Kuan-Ying Lee (1):
      scripts/gdb/tasks: fix lx-ps command error

Kunkun Jiang (1):
      KVM: arm64: GICv4: Do not perform a map to a mapped vLPI

Kunwu Chan (3):
      platform/mellanox: Add null pointer checks for devm_kasprintf()
      platform/mellanox: Check devm_hwmon_device_register_with_groups() return value
      ARM: imx: Check return value of devm_kasprintf in imx_mmdc_perf_init

Lad Prabhakar (1):
      riscv: errata: andes: Probe for IOCP only once in boot stage

Li Ma (1):
      drm/amd/swsmu: update smu v14_0_0 driver if version and metrics table

Lijo Lazar (4):
      drm/amdgpu: Restrict extended wait to PSP v13.0.6
      drm/amdgpu: Add NULL checks for function pointers
      drm/amdgpu: Update HDP 4.4.2 clock gating flags
      drm/amdgpu: Avoid querying DRM MGCG status

Like Xu (2):
      KVM: x86: Get CPL directly when checking if loaded vCPU is in kernel mode
      KVM: x86: Remove 'return void' expression for 'void function'

Linus Torvalds (1):
      Linux 6.7-rc5

Liu Shixin (2):
      Revert "mm/kmemleak: move the initialisation of object to __link_object"
      mm/kmemleak: move set_track_prepare() outside raw_spinlocks

Lizhi Xu (1):
      squashfs: squashfs_read_data need to check if the length is 0

Lorenzo Bianconi (1):
      net: veth: fix packet segmentation in veth_convert_skb_to_xdp_buff

Lorenzo Pieralisi (1):
      firmware: arm_ffa: Fix ffa_notification_info_get() IDs handling

Luca Ceresoli (1):
      of: dynamic: Fix of_reconfig_get_state_change() return value documentation

Lukasz Luba (1):
      powercap: DTPM: Fix missing cpufreq_cpu_put() calls

Luke D. Jones (1):
      platform/x86: asus-wmi: disable USB0 hub on ROG Ally before suspend

Maciej Strozek (2):
      ASoC: cs43130: Fix the position of const qualifier
      ASoC: cs43130: Fix incorrect frame delay configuration

Malcolm Hart (1):
      ASoC: amd: yc: Fix non-functional mic on ASUS E1504FA

Marcin Szycik (1):
      ice: Restore fix disabling RX VLAN filtering

Marian Postevca (1):
      ASoC: amd: acp: Add support for a new Huawei Matebook laptop

Mario Limonciello (1):
      ALSA: hda/realtek: Add Framework laptop 16 to quirks

Mathias Nyman (1):
      Revert "xhci: Loosen RPM as default policy to cover for AMD xHC 1.1"

Matthias Reichl (1):
      regmap: fix bogus error on regcache_sync success

Matus Malych (1):
      ASoC: amd: yc: Add HP 255 G10 into quirk table

Md Haris Iqbal (3):
      RDMA/rtrs-srv: Check return values while processing info request
      RDMA/rtrs-srv: Free srv_mr iu only when always_invalidate is true
      RDMA/rtrs-srv: Destroy path files after making sure no IOs in-flight

Michael Walle (1):
      dt-bindings: display: mediatek: dsi: remove Xinlei's mail

Michal Swiatkowski (1):
      ice: change vfs.num_msix_per to vf->num_msix

Mike Kravetz (1):
      hugetlb: fix null-ptr-deref in hugetlb_vma_lock_write

Mike Marciniszyn (3):
      RDMA/core: Fix umem iterator when PAGE_SIZE is greater then HCA pgsz
      RDMA/irdma: Ensure iWarp QP queue memory is OS paged aligned
      RDMA/irdma: Fix support for 64k pages

Ming Lei (1):
      lib/group_cpus.c: avoid acquiring cpu hotplug lock in group_cpus_evenly

Miquel Raynal (1):
      nvmem: Do not expect fixed layouts to grab a layout driver

Mukesh Ojha (1):
      devcoredump: Send uevent once devcd is ready

Mustafa Ismail (2):
      RDMA/irdma: Do not modify to SQD on error
      RDMA/irdma: Add wait for suspend on SQD

Nathan Rossi (1):
      arm64: dts: imx8mp: imx8mq: Add parkmode-disable-ss-quirk on DWC3

Naveen Mamindlapalli (1):
      octeontx2-pf: consider both Rx and Tx packet stats for adaptive interrupt coalescing

Naveen N Rao (1):
      powerpc/ftrace: Fix stack teardown in ftrace_no_trace

Neil Armstrong (1):
      ASoC: codecs: lpass-tx-macro: set active_decimator correct default value

Nico Pache (1):
      selftests/mm: prevent duplicate runs caused by TEST_GEN_PROGS

Nitesh Shetty (1):
      nvme: prevent potential spectre v1 gadget

Nithin Dabilpuram (1):
      octeontx2-af: Adjust Tx credits when MCS external bypass is disabled

Nícolas F. R. A. Prado (1):
      dt: dt-extract-compatibles: Don't follow symlinks when walking tree

Pablo Neira Ayuso (2):
      netfilter: nf_tables: bail out on mismatching dynset and set expressions
      netfilter: nf_tables: validate family when identifying table via handle

Paolo Abeni (1):
      tcp: fix mid stream window clamp.

Pascal Noël (1):
      ALSA: hda/realtek: Apply quirk for ASUS UM3504DA

Paulo Alcantara (1):
      smb: client: fix potential NULL deref in parse_dfs_referrals()

Pavel Begunkov (2):
      io_uring: fix mutex_unlock with unreferenced ctx
      io_uring/af_unix: disable sending io_uring over sockets

Peng Fan (1):
      arm64: dts: imx93: correct mediamix power

Peter Ujfalusi (5):
      ASoC: Intel: skl_hda_dsp_generic: Drop HDMI routes when HDMI is not available
      ASoC: Intel: sof_sdw: Always register the HDMI dai links
      ASoC: hdac_hda: Conditionally register dais for HDMI and Analog
      ASoC: SOF: ipc4-topology: Correct data structures for the SRC module
      ASoC: SOF: ipc4-topology: Correct data structures for the GAIN module

Peter Xu (4):
      mm/pagemap: fix ioctl(PAGEMAP_SCAN) on vma check
      mm/pagemap: fix wr-protect even if PM_SCAN_WP_MATCHING not set
      mm/selftests: fix pagemap_ioctl memory map test
      mm/Kconfig: make userfaultfd a menuconfig

Peter Zijlstra (1):
      perf: Fix perf_event_validate_size()

Petr Pavlu (3):
      tracing: Fix incomplete locking when disabling buffered events
      tracing: Fix a warning when allocating buffered events fails
      tracing: Fix a possible race when disabling buffered events

Phil Sutter (1):
      netfilter: xt_owner: Fix for unsafe access of sk->sk_socket

Philipp Zabel (1):
      ARM: dts: imx7: Declare timers compatible with fsl,imx6dl-gpt

RD Babiera (1):
      usb: typec: class: fix typec_altmode_put_partner to put plugs

Rafael J. Wysocki (1):
      ACPI: utils: Fix error path in acpi_evaluate_reference()

Rahul Bhansali (1):
      octeontx2-af: Update Tx link register range

Randy Dunlap (2):
      hv_netvsc: rndis_filter needs to select NLS
      greybus: BeaglePlay driver needs CRC_CCITT

Ranjani Sridharan (2):
      ASoC: SOF: ipc4-topology: Add core_mask in struct snd_sof_pipeline
      ASoC: SOF: sof-audio: Modify logic for enabling/disabling topology cores

Rob Herring (2):
      arm64: dts: rockchip: Fix PCI node addresses on rk3399-gru
      dt-bindings: perf: riscv,pmu: drop unneeded quotes

Robin Murphy (1):
      iommufd/selftest: Fix _test_mock_dirty_bitmaps()

Roman Gushchin (1):
      mm: kmem: properly initialize local objcg variable in current_obj_cgroup()

Roman Li (1):
      drm/amd/display: Fix array-index-out-of-bounds in dml2

Ronald Wahl (3):
      serial: 8250: 8250_omap: Do not start RX DMA on THRI interrupt
      serial: 8250_omap: Add earlycon support for the AM654 UART controller
      serial: 8250: 8250_omap: Clear UART_HAS_RHR_IT_DIS bit

Roy Luo (1):
      USB: gadget: core: adjust uevent timing on gadget unbind

Ryusuke Konishi (2):
      nilfs2: fix missing error check for sb_set_blocksize call
      nilfs2: prevent WARNING in nilfs_sufile_set_segment_usage()

Sam Edwards (2):
      arm64: dts: rockchip: Fix Turing RK1 interrupt pinctrls
      arm64: dts: rockchip: Fix eMMC Data Strobe PD on rk3588

Samuel Holland (1):
      riscv: Fix SMP when shadow call stacks are enabled

Sarah Grant (1):
      ALSA: usb-audio: Add Pioneer DJM-450 mixer controls

Sascha Hauer (1):
      dt-bindings: soc: rockchip: grf: add rockchip,rk3588-pmugrf

Sean Christopherson (3):
      KVM: Set file_operations.owner appropriately for all such structures
      Revert "KVM: Prevent module exit until all VMs are freed"
      KVM: SVM: Update EFER software model on CR0 trap for SEV-ES

Sean Nyekjaer (1):
      net: dsa: microchip: provide a list of valid protocols for xmit handler

SeongJae Park (2):
      mm/damon/core: copy nr_accesses when splitting region
      mm/damon/sysfs-schemes: add timeout for update_schemes_tried_regions

Shannon Nelson (4):
      pds_vdpa: fix up format-truncation complaint
      pds_vdpa: clear config callback when status goes to 0
      pds_vdpa: set features order
      ionic: fix snprintf format length warning

Shengjiu Wang (3):
      ASoC: fsl_sai: Fix no frame sync clock issue on i.MX8MP
      ASoC: fsl_xcvr: Enable 2 * TX bit clock for spdif only case
      ASoC: fsl_xcvr: refine the requested phy clock frequency

Shifeng Li (2):
      RDMA/irdma: Fix UAF in irdma_sc_ccq_get_cqe_info()
      RDMA/irdma: Avoid free the non-cqp_request scratch

Shigeru Yoshida (2):
      RDMA/core: Fix uninit-value access in ib_get_eth_speed()
      ipv4: ip_gre: Avoid skb_pull() failure in ipgre_xmit()

Shin'ichiro Kawasaki (1):
      nvme: improve NVME_HOST_AUTH and NVME_TARGET_AUTH config descriptions

Shuming Fan (1):
      ASoC: rt5650: add mutex to avoid the jack detection failure

Shyam Prasad N (2):
      Revert "cifs: reconnect work should have reference on server struct"
      cifs: reconnect worker should take reference on server struct unconditionally

Sidhartha Kumar (1):
      mm/hugetlb: have CONFIG_HUGETLB_PAGE select CONFIG_XARRAY_MULTI

Srinivas Kandagatla (2):
      ASoC: ops: add correct range check for limiting volume
      ASoC: qcom: sc8280xp: Limit speaker digital volumes

Stefan Eichenberger (1):
      arm64: dts: imx8-apalis: set wifi regulator to always-on

Stefan Kerkmann (1):
      ARM: dts: imx6q: skov: fix ethernet clock regression

Stefan Wahren (1):
      ARM: dts: bcm2711-rpi-400: Fix delete-node of led_act

Stefan Wiehler (1):
      mips/smp: Call rcutree_report_cpu_starting() earlier

Stefano Garzarella (1):
      vsock/virtio: fix "comparison of distinct pointer types lacks a cast" warning

Steve Sistare (1):
      vdpa/mlx5: preserve CVQ vringh index

Steven Rostedt (Google) (5):
      tracing: Always update snapshot buffer size
      tracing: Stop current tracer when resizing buffer
      tracing: Disable snapshot buffer when stopping instance tracers
      ring-buffer: Force absolute timestamp on discard of event
      ring-buffer: Test last update in 32bit version of __rb_time_read()

Su Hui (3):
      misc: mei: client.c: return negative error code in mei_cl_write
      misc: mei: client.c: fix problem of return '-EOVERFLOW' in mei_cl_write
      highmem: fix a memory copy problem in memcpy_from_folio

Subbaraya Sundeep (2):
      octeontx2-pf: Add missing mutex lock in otx2_get_pauseparam
      octeontx2-af: Check return value of nix_get_nixlf before using nixlf

Sudeep Holla (8):
      firmware: arm_ffa: Declare ffa_bus_type structure in the header
      firmware: arm_ffa: Allow FF-A initialisation even when notification fails
      firmware: arm_ffa: Setup the partitions after the notification initialisation
      firmware: arm_ffa: Add checks for the notification enabled state
      firmware: arm_ffa: Fix FFA notifications cleanup path
      firmware: arm_ffa: Fix the size of the allocation in ffa_partitions_cleanup()
      firmware: arm_scmi: Fix frequency truncation by promoting multiplier type
      firmware: arm_scmi: Fix possible frequency truncation when using level indexing mode

Sumanth Korikkar (2):
      mm/memory_hotplug: add missing mem_hotplug_lock
      mm/memory_hotplug: fix error handling in add_memory_resource()

Sumit Garg (1):
      tee: optee: Fix supplicant based device enumeration

Takashi Iwai (1):
      ALSA: hda/realtek: Add quirk for Lenovo Yoga Pro 7

Tejun Heo (1):
      workqueue: Make sure that wq_unbound_cpumask is never empty

Thinh Tran (1):
      net/tg3: fix race condition in tg3_reset_task()

Thomas Bogendoerfer (1):
      MIPS: kernel: Clear FPU states when setting up kernel threads

Thomas Gleixner (2):
      x86/entry: Convert INT 0x80 emulation to IDTENTRY
      x86/entry: Do not allow external 0x80 interrupts

Thomas Reichinger (1):
      arcnet: restoring support for multiple Sohard Arcnet cards

Thomas Zimmermann (1):
      drm/atomic-helpers: Invoke end_fb_access while owning plane state

Tiezhu Yang (2):
      LoongArch: BPF: Fix sign-extension mov instructions
      LoongArch: BPF: Fix unconditional bswap instructions

Tim Bosse (1):
      ALSA: hda/realtek: add new Framework laptop to quirks

Tim Van Patten (1):
      cgroup_freezer: cgroup_freezing: Check if not frozen

Timur Tabi (1):
      nouveau/gsp: document some aspects of GSP-RM

Tobias Waldekranz (1):
      net: dsa: mv88e6xxx: Restore USXGMII support for 6393X

Uwe Kleine-König (1):
      coresight: etm4x: Remove bogous __exit annotation for some functions

Vegard Nossum (1):
      Documentation: coresight: fix `make refcheckdocs` warning

Ville Syrjälä (4):
      drm/i915: Check pipe active state in {planes,vrr}_{enabling,disabling}()
      drm/i915: Skip some timing checks on BXT/GLK DSI transcoders
      drm/i915/mst: Fix .mode_valid_ctx() return values
      drm/i915/mst: Reject modes that require the bigjoiner

WANG Rui (1):
      LoongArch: Apply dynamic relocations for LLD

Wen Gu (1):
      net/smc: fix missing byte order conversion in CLC handshake

Xi Ruoyao (1):
      LoongArch: Slightly clean up drdtime()

Xiang Yang (1):
      drm/exynos: fix a potential error pointer dereference

Xiaolei Wang (1):
      arm64: dts: imx8qm: Add imx8qm's own pm to avoid panic during startup

Yang Wang (2):
      drm/amd/pm: support new mca smu error code decoding
      drm/amdgpu: optimize the printing order of error data

Yewon Choi (1):
      xsk: Skip polling event check for unbound socket

Yi Zhang (1):
      ndtest: fix typo class_regster -> class_register

Yicong Yang (2):
      hwtracing: hisi_ptt: Handle the interrupt in hardirq context
      hwtracing: hisi_ptt: Don't try to attach a task

Yonghong Song (1):
      bpf: Fix a verifier bug due to incorrect branch offset comparison with cpu=v4

Yonglong Liu (2):
      net: hns: fix wrong head when modify the tx feature when sending packets
      net: hns: fix fake link up on xge port

Yu Kuai (4):
      md: fix missing flush of sync_work
      md: don't leave 'MD_RECOVERY_FROZEN' in error path of md_set_readonly()
      md: fix stopping sync thread
      md: split MD_RECOVERY_NEEDED out of mddev_resume

Zhipeng Lu (1):
      octeontx2-af: fix a use-after-free in rvu_npa_register_reporters

angquan yu (1):
      KVM: selftests: Actually print out magic token in NX hugepages skip message

heminhong (1):
      drm/i915: correct the input parameter on _intel_dsb_commit()

^ permalink raw reply	[relevance 39%]

* Re: [PATCH 0/2] x86: UMIP emulation leaking kernel addresses
  @ 2023-12-09 20:08 99%   ` Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2023-12-09 20:08 UTC (permalink / raw)
  To: Brian Gerst
  Cc: Michal Luczaj, x86, tglx, mingo, bp, dave.hansen, shuah, luto,
	linux-kernel

On Sat, 9 Dec 2023 at 09:16, Brian Gerst <brgerst@gmail.com> wrote:
>
> A different way to plug this is to harden ptrace (and sigreturn) to
> verify that the segments are code or data type segments instead of
> relying on an IRET fault.

I think that is likely a good idea regardless of this particular issue.

And I don't think you need to even check the segment for any kind of
validity - all you need to check is that it's a valid selector.

And we *kind* of do that already, with the x86 ptrace code checking

  static inline bool invalid_selector(u16 value)
  {
        return unlikely(value != 0 && (value & SEGMENT_RPL_MASK) != USER_RPL);
  }

but the thing is, I think we could limit that a lot more.

I think the only valid GDT entries are 0-15 (that includes the default
kernel segments, but they don't contain anything interesting), so we
could tighten that selector check to say that it has to be either an
LDT entry or a GDT selector with an index below 16.

So add some kind of requirement for "(value & 4) || (value < 8*16)", perhaps?
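
A minimal userspace sketch of the tightened check suggested above. The
mask values mirror the usual x86 selector layout (RPL in bits 0-1, table
indicator in bit 2); the function name and the GDT_LIMIT_SKETCH constant
are hypothetical and exist only for illustration, and this is not an
actual kernel patch.

```c
#include <stdbool.h>
#include <stdint.h>

#define SEGMENT_RPL_MASK 0x3        /* bits 0-1: requested privilege level */
#define SEGMENT_TI_MASK  0x4        /* bit 2: table indicator, set for LDT */
#define USER_RPL         0x3
#define GDT_LIMIT_SKETCH (8 * 16)   /* assumption: only GDT entries 0-15 are valid */

/* Returns true when the selector should be rejected. */
static bool invalid_selector_sketch(uint16_t value)
{
	/* The null selector stays allowed, as in the existing check. */
	if (value == 0)
		return false;

	/* Existing rule: anything non-null must carry user RPL. */
	if ((value & SEGMENT_RPL_MASK) != USER_RPL)
		return true;

	/* Proposed extra rule: LDT entry, or a GDT selector below entry 16. */
	return !((value & SEGMENT_TI_MASK) || value < GDT_LIMIT_SKETCH);
}
```

With this rule a user-RPL GDT selector for entry 16 or above is rejected,
while any user-RPL LDT selector is still accepted.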

              Linus

^ permalink raw reply	[relevance 99%]

* Re: [syzbot] [kernel?] possible deadlock in stack_depot_put
       [not found]           ` <20231206112215.1381-1-hdanton@sina.com>
@ 2023-12-06 11:40 97%         ` Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2023-12-06 11:40 UTC (permalink / raw)
  To: Hillf Danton
  Cc: Petr Mladek, Tetsuo Handa, syzbot, linux-kernel, Matthew Wilcox,
	John Ogness, Waiman Long, syzkaller-bugs

On Wed, 6 Dec 2023 at 20:22, Hillf Danton <hdanton@sina.com> wrote:
>
> Given the same pattern in both up() and __mutex_unlock_slowpath() where
> acquire raw spinlock to wake waiter up, it is safe to unlock mutex in
> irq context.

What? No. That spinlock is exactly why it is NOT OK to unlock a mutex
in irq context.

If somebody else is trying to get or release the mutex at the same
time an interrupt happens, you now have an immediate deadlock.

No spinlocks - raw or not - are irq safe.

The only way you make them irq-safe is by disabling interrupts
entirely across the locked region, which the mutex code very much does
not do, and does not want to do.
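
The deadlock described above can be modelled in plain userspace C. This is
only a single-CPU simulation under stated assumptions: the atomic flag
stands in for the spinlock inside the mutex, the "interrupt" is an
ordinary function call, and every sim_* name is hypothetical rather than
a kernel API.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* One attempt to take the lock; a real spinlock loops until this succeeds. */
static bool sim_trylock(atomic_flag *lock)
{
	return !atomic_flag_test_and_set(lock);
}

static void sim_unlock(atomic_flag *lock)
{
	atomic_flag_clear(lock);
}

/*
 * Models an interrupt handler on the same CPU trying to take the lock.
 * If the interrupted code already holds it, the handler would spin
 * forever, because the holder cannot run again until the handler returns.
 */
static bool sim_irq_would_deadlock(atomic_flag *lock)
{
	if (sim_trylock(lock)) {
		sim_unlock(lock);
		return false;
	}
	return true;
}

/* Task context takes the lock and an interrupt fires during that window. */
static bool sim_run(bool irqs_disabled)
{
	atomic_flag lock = ATOMIC_FLAG_INIT;
	bool deadlock = false;

	sim_trylock(&lock);                 /* task context holds the lock */
	if (!irqs_disabled)                 /* interrupt lands inside the region */
		deadlock = sim_irq_would_deadlock(&lock);
	sim_unlock(&lock);
	if (irqs_disabled)                  /* pending interrupt runs after unlock */
		deadlock = sim_irq_would_deadlock(&lock);

	return deadlock;
}
```

In this model sim_run(false) reports the deadlock and sim_run(true) does
not, which is the difference between a plain spinlock and one taken with
interrupts disabled across the locked region.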

So no. Mutexes are not usable from interrupts.

So repeat after me: MUTEXES CANNOT BE USED IN ANY FORM IN INTERRUPT
CONTEXT. End of story.

Other locks do work. Completions are designed to be done from
interrupts. And our legacy semaphores were irq-safe (for wakeups) from
day one, which is then why the spinlock in the legacy semaphore is
done with interrupts disabled, and why you can do "down_trylock()" and
"up()" in interrupt context.

But mutexes wanted to consciously avoid that, partly *exactly* because
they didn't want to have the more expensive irq-safe spinlocks
(particularly with the debugging versions).

            Linus

^ permalink raw reply	[relevance 97%]

* Re: [PATCH -tip 1/3] x86/percpu: Fix "const_pcpu_hot" version generation failure
  @ 2023-12-03 22:19 99% ` Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2023-12-03 22:19 UTC (permalink / raw)
  To: Uros Bizjak
  Cc: x86, linux-kernel, Andy Lutomirski, Brian Gerst, Denys Vlasenko,
	Ingo Molnar, H . Peter Anvin, Peter Zijlstra, Thomas Gleixner,
	Josh Poimboeuf

On Mon, 4 Dec 2023 at 07:12, Uros Bizjak <ubizjak@gmail.com> wrote:
>
> +/*
> + * The generic per-cpu infrastrucutre is not suitable for
> + * reading const-qualified variables.
> + */
> +#define this_cpu_read_const(pcp)       ({ BUG(); (typeof(pcp))0; })

NAK. Absolutely not.

No way in hell is it acceptable to make this a run-time BUG. If it
doesn't work, it needs to be a compile failure. End of story.
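
One way to turn such a misuse into a build error rather than a run-time
BUG is sketched below. It is not the fix that was adopted for the per-cpu
code; it assumes a C11 compiler with GNU statement expressions (gcc or
clang), handles only int variables, and all *_sketch names are
hypothetical.

```c
/* Evaluates to 1 when x is a const-qualified int lvalue, 0 otherwise. */
#define IS_CONST_INT_SKETCH(x) _Generic(&(x), const int *: 1, default: 0)

/* Reads x, but refuses to compile when x is const-qualified. */
#define read_int_nonconst_sketch(x)					\
	({								\
		_Static_assert(!IS_CONST_INT_SKETCH(x),			\
			"const-qualified variable needs another accessor"); \
		(x);							\
	})

static int counter_sketch = 42;
static const int limit_sketch = 7;

static int demo_read_sketch(void)
{
	/*
	 * Uncommenting the next line stops the build with the message
	 * above instead of failing when the code runs:
	 *
	 *	return read_int_nonconst_sketch(limit_sketch);
	 */
	(void)limit_sketch;
	return read_int_nonconst_sketch(counter_sketch);
}
```

The check costs nothing at run time: the assertion is resolved entirely
by the compiler.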

                Linus

^ permalink raw reply	[relevance 99%]

* Linux 6.7-rc4
@ 2023-12-03 10:18 47% Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2023-12-03 10:18 UTC (permalink / raw)
  To: Linux Kernel Mailing List

Another -rc with slightly odd timing due to time zones and travel
(hey, it's Sunday afternoon *somewhere* right now), but it's the last
trip of the year, so we won't be seeing any more of that.

Of course, instead of travel, we have the holidays coming up. As
usual, that makes for an interesting release cadence, but at least
this time I think the timing ends up working out, with the holidays
happening during the tail end of the release schedule.

And that "tail end of the release schedule" is while the current 6.7
release is supposed to be very quiet anyway, which sounds nice and
like it all is working out just fine from a timing perspective.  But
the tail end of the release is then also when developers are supposed
to get ready for the _next_ merge window.

So while it all looks superficially convenient from a 6.7 release
schedule point of view, it almost certainly means that we'll have to
do something about the 6.8 merge window.

We'll see. Maybe people will decide to try to get their ducks lined up
super-early for 6.8, or maybe we'll delay the next merge window or
something. I haven't decided yet, and nobody has emailed me in a panic
about it (yet).

*Anyway*, right now we're still a few weeks away from that, and this
is just the rc4 release. And things look fine for now, with a fairly
small rc4 - although that might also be due to me not being the only
developer on the road for conferences...

The appended shortlog gives the details, but the last week looks
pretty normal, with drivers dominating (drm and particularly the AMD
GPU side showing up in the diffstat). But we've got a little bit of
everything, including tooling, filesystems (bcachefs showing up, but
noise elsewhere too) and core networking. Some minor architecture
fixes too.

Please test,

          Linus

---

Abdul Halim, Mohd Syazwan (1):
      iommu/vt-d: Add MTL to quirk list to skip TE disabling

Adrian Hunter (6):
      mmc: block: Do not lose cache flush during CQE error recovery
      mmc: cqhci: Increase recovery halt timeout
      mmc: block: Be sure to wait while busy in CQE error recovery
      mmc: block: Retry commands in CQE error recovery
      mmc: cqhci: Warn of halt or task clear failure
      mmc: cqhci: Fix task clearing in CQE error recovery

Alex Deucher (1):
      drm/amdgpu: fix AGP addressing when GART is not at 0

Alex Sierra (1):
      drm/amdgpu: Force order between a read and write to the same address

Alvin Lee (3):
      drm/amd/display: Include udelay when waiting for INBOX0 ACK
      drm/amd/display: Use DRAM speed from validation for dummy p-state
      drm/amd/display: Increase num voltage states to 40

Antonio Borneo (1):
      pinctrl: stm32: fix array read out of bound

Ard Biesheuvel (1):
      arm64: Avoid enabling KPTI unnecessarily

Arnaldo Carvalho de Melo (1):
      tools: Disable __packed attribute compiler warning due to -Werror=attributes

Arnd Bergmann (1):
      media: pci: mgb4: add COMMON_CLK dependency

Ayush Jain (1):
      cpufreq/amd-pstate: Only print supported EPP values for performance governor

Bart Van Assche (1):
      block: Document the role of the two attribute groups

Ben Greear (1):
      wifi: mac80211: handle 320 MHz in ieee80211_ht_cap_ie_to_sta_ht_cap

Bragatheswaran Manickavel (1):
      btrfs: ref-verify: fix memory leaks in btrfs_ref_tree_mod()

Brett Creeley (2):
      vfio/pds: Fix mutex lock->magic != lock warning
      vfio/pds: Fix possible sleep while in atomic context

Brian Foster (1):
      bcachefs: preserve device path as device name

Camille Cho (1):
      drm/amd/display: Simplify brightness initialization

Candice Li (1):
      drm/amdgpu: Update EEPROM I2C address for smu v13_0_0

Charles Keepax (1):
      pinctrl: lochnagar: Don't build on MIPS

Chen Ni (1):
      pinctrl: stm32: Add check for devm_kcalloc

Chester Lin (2):
      pinctrl: s32cc: Avoid possible string truncation
      dt-bindings: pinctrl: s32g2: change a maintainer email address

Christian König (1):
      dma-buf: fix check in dma_resv_add_fence

Christoph Niedermaier (1):
      cpufreq: imx6q: Don't disable 792 Mhz OPP unnecessarily

Claudiu Beznea (6):
      net: ravb: Check return value of reset_control_deassert()
      net: ravb: Use pm_runtime_resume_and_get()
      net: ravb: Make write access to CXR35 first before accessing other EMAC registers
      net: ravb: Start TX queues after HW initialization succeeded
      net: ravb: Stop DMA in case of failures on ravb_open()
      net: ravb: Keep reverse order of operations in ravb_remove()

Damien Le Moal (2):
      scsi: Change SCSI device boolean fields to single bit flags
      scsi: sd: Fix system start for ATA devices

Dan Carpenter (4):
      media: v4l2-subdev: Fix a 64bit bug
      wifi: iwlwifi: mvm: fix an error code in iwl_mvm_mld_add_sta()
      xen/events: fix error code in xen_bind_pirq_msi_to_irq()
      nouveau/gsp/r535: remove a stray unlock in r535_gsp_rpc_send()

Daniel Borkmann (1):
      netkit: Reject IFLA_NETKIT_PEER_INFO in netkit_change_link

Daniel Mentz (1):
      iommu: Fix printk arg in of_iommu_get_resv_regions()

Dave Airlie (1):
      nouveau: find the smallest page allocation to cover a buffer alloc.

Dave Ertman (1):
      ice: Fix VF Reset paths when interface in a failed over aggregate

Dave Jiang (1):
      ACPI: Fix ARM32 platforms compile issue introduced by fw_table changes

David Howells (2):
      cifs: Fix FALLOC_FL_ZERO_RANGE by setting i_size if EOF moved
      cifs: Fix FALLOC_FL_INSERT_RANGE by setting i_size after EOF moved

David Sterba (1):
      btrfs: fix 64bit compat send ioctl arguments not initializing version member

Dinghao Liu (1):
      drm/amd/pm: fix a memleak in aldebaran_tables_init

Dmitry Antipov (2):
      uapi: propagate __struct_group() attributes to the container union
      smb: client, common: fix fortify warnings

Dmitry Baryshkov (1):
      MAINTAINERS: list all Qualcomm IOMMU drivers in the QUALCOMM IOMMU entry

Dmytro Laktyushkin (1):
      drm/amd/display: update dcn315 lpddr pstate latency

Edward Adam Davis (1):
      mptcp: fix uninit-value in mptcp_incoming_options

Elena Salomatkina (1):
      octeontx2-af: Fix possible buffer overflow

Ewan D. Milne (1):
      nvme: check for valid nvme_identify_ns() before using it

Felix Kuehling (1):
      Revert "drm/prime: Unexport helpers for fd/handle conversion"

Filipe Manana (2):
      btrfs: fix off-by-one when checking chunk map includes logical address
      btrfs: make error messages more clear when getting a chunk map

Furong Xu (1):
      net: stmmac: xgmac: Disable FPE MMC interrupts

Gautham R. Shenoy (1):
      cpufreq/amd-pstate: Fix the return value of amd_pstate_fast_switch()

Geetha sowjanya (1):
      octeontx2-pf: Fix adding mbox work queue entry when num_vfs > 64

Greg Ungerer (2):
      net: dsa: mv88e6xxx: fix marvell 6350 switch probing
      net: dsa: mv88e6xxx: fix marvell 6350 probe crash

Gustavo A. R. Silva (3):
      gcc-plugins: randstruct: Update code comment in relayout_struct()
      neighbour: Fix __randomize_layout crash in struct neighbour
      nouveau/gsp: replace zero-length array with flex-array member and use __counted_by

Hamza Mahfooz (1):
      drm/amd/display: fix ABM disablement

Hans de Goede (1):
      ACPI: video: Use acpi_video_device for cooling-dev driver data

Hawking Zhang (1):
      drm/amdgpu: Do not issue gpu reset from nbio v7_9 bif interrupt

Heiner Kallweit (2):
      r8169: fix deadlock on RTL8125 in jumbo mtu mode
      r8169: prevent potential deadlock in rtl8169_close

Hou Tao (1):
      bpf: Add missed allocation hint for bpf_mem_cache_alloc_flags()

Ilya Bakoulin (1):
      drm/amd/display: Fix MPCC 1DLUT programming

Ioana Ciornei (2):
      dpaa2-eth: increase the needed headroom to account for alignment
      dpaa2-eth: recycle the RX buffer only after all processing done

JP Kobryn (1):
      kprobes: consistent rcu api usage for kretprobe holder

Jakub Kicinski (2):
      ethtool: don't propagate EOPNOTSUPP from dumps
      tools: ynl-gen: always construct struct ynl_req_state

Jann Horn (1):
      btrfs: send: ensure send_fd is writable

Jason Gunthorpe (1):
      iommu: Flow ERR_PTR out from __iommu_domain_alloc()

Jens Axboe (8):
      io_uring: don't allow discontig pages for IORING_SETUP_NO_MMAP
      io_uring: don't guard IORING_OFF_PBUF_RING with SETUP_NO_MMAP
      io_uring: enable io_mem_alloc/free to be used in other parts
      io_uring/kbuf: defer release of mapped buffer rings
      io_uring/kbuf: recycle freed mapped buffer ring entries
      io_uring/kbuf: prune deferred locked cache when tearing down
      io_uring: free io_buffer_list entries via RCU
      io_uring: use fget/fput consistently

Jiawen Wu (1):
      net: libwx: fix memory leak on msix entry

Johannes Berg (9):
      wifi: cfg80211: fix CQM for non-range use
      wifi: cfg80211: lock wiphy mutex for rfkill poll
      wifi: cfg80211: hold wiphy mutex for send_interface
      debugfs: fix automount d_fsdata usage
      debugfs: annotate debugfs handlers vs. removal with lockdep
      debugfs: add API to allow debugfs operations cancellation
      wifi: cfg80211: add locked debugfs wrappers
      wifi: mac80211: use wiphy locked debugfs helpers for agg_status
      wifi: mac80211: use wiphy locked debugfs for sdata/link

John Fastabend (2):
      bpf, sockmap: af_unix stream sockets need to hold ref for pair sock
      bpf, sockmap: Add af_unix test with both sockets in map

Jonathan Kim (1):
      drm/amdgpu: update xgmi num links info post gc9.4.2

Juergen Gross (1):
      x86/xen: fix percpu vcpu_info allocation

Kailang Yang (2):
      ALSA: hda/realtek: Headset Mic VREF to 100%
      ALSA: hda/realtek: Add supported ALC257 for ChromeOS

Keith Busch (1):
      nvme-core: check for too small lba shift

Kent Overstreet (22):
      closures: CLOSURE_CALLBACK() to fix type punning
      bcachefs: Put erasure coding behind an EXPERIMENTAL kconfig option
      bcachefs: bch2_moving_ctxt_flush_all()
      bcachefs: Make sure bch2_move_ratelimit() also waits for move_ops
      bcachefs: Don't stop copygc thread on device resize
      bcachefs: Start gc, copygc, rebalance threads after initing writes ref
      bcachefs: Fix an endianness conversion
      bcachefs: Proper refcounting for journal_keys
      bcachefs: deallocate_extra_replicas()
      bcachefs: Data update path won't accidentaly grow replicas
      bcachefs: Fix ec + durability calculation
      bcachefs: bpos is misaligned on big endian
      bcachefs: Fix zstd compress workspace size
      bcachefs: Add missing validation for jset_entry_data_usage
      bcachefs: Fix bucket data type for stripe buckets
      bcachefs: Fix split_race livelock
      bcachefs: trace_move_extent_start_fail() now includes errcode
      bcachefs: -EROFS doesn't count as move_extent_start_fail
      bcachefs: move journal seq assertion
      bcachefs: Fix race between btree writes and metadata drop
      bcachefs: Convert gc_alloc_start() to for_each_btree_key2()
      bcachefs: Extra kthread_should_stop() calls for copygc

Kornel Dulęba (1):
      mmc: sdhci-pci-gli: Disable LPM during initialization

Kunwu Chan (1):
      iommu/vt-d: Set variable intel_dirty_ops to static

Laurent Pinchart (1):
      media: vsp1: Remove unbalanced .s_stream(0) calls

Li Ma (1):
      drm/amdgpu: add init_registers for nbio v7.11

Lijo Lazar (1):
      drm/amdgpu: Use another offset for GC 9.4.3 remap

Linus Torvalds (1):
      Linux 6.7-rc4

Linus Walleij (4):
      pinctrl: cy8c95x0: Fix doc warning
      Revert "drm/bridge: panel: Check device dependency before managing device link"
      Revert "driver core: Export device_is_dependent() to modules"
      Revert "drm/bridge: panel: Add a device link between drm device and panel device"

Liu Ying (2):
      drm/bridge: panel: Check device dependency before managing device link
      driver core: Export device_is_dependent() to modules

Lorenzo Bianconi (1):
      wifi: mt76: mt7925: fix typo in mt7925_init_he_caps

Lu Baolu (5):
      iommu/vt-d: Support enforce_cache_coherency only for empty domains
      iommu/vt-d: Omit devTLB invalidation requests when TES=0
      iommu/vt-d: Disable PCI ATS in legacy passthrough mode
      iommu/vt-d: Make context clearing consistent with context mapping
      iommu/vt-d: Fix incorrect cache invalidation for mm notification

Lu Yao (1):
      drm/amdgpu: Fix cat debugfs amdgpu_regs_didt causes kernel null pointer

Lukasz Luba (1):
      powercap: DTPM: Fix unneeded conversions to micro-Watts

Maria Yu (1):
      pinctrl: avoid reload of p state in list iteration

Mario Limonciello (1):
      drm/amd: Enable PCIe PME from D3

Mark O'Donovan (1):
      nvme: fine-tune sending of first keep-alive

Markus Weippert (1):
      bcache: revert replacing IS_ERR_OR_NULL with IS_ERR

Martin Tůma (1):
      media: mgb4: Added support for T200 card variant

Masami Hiramatsu (Google) (1):
      rethook: Use __rcu pointer for rethook::handler

Maurizio Lombardi (1):
      nvme-core: fix a memory leak in nvme_ns_info_from_identify()

Maxime Ripard (1):
      kunit: Warn if tests are slow

Michael Roth (1):
      efi/unaccepted: Fix off-by-one when checking for overlapping ranges

Michael Strauss (1):
      drm/amd/display: Do not read DPREFCLK spread info from LUT on DCN35

Michael-CY Lee (1):
      wifi: avoid offset calculation on NULL pointer

Michal Wajdeczko (1):
      kunit: Reset suite counter right before running tests

Mikulas Patocka (2):
      dm-verity: align struct dm_verity_fec_io properly
      dm-flakey: start allocating with MAX_ORDER

Ming Lei (2):
      block: move .bd_inode into 1st cacheline of block_device
      blk-mq: don't count completed flush data request as inflight in case of quiesce

Ming Yen Hsieh (1):
      wifi: mt76: mt7921: fix 6GHz disabled by the missing default CLC config

Mukul Joshi (1):
      drm/amdkfd: Use common function for IP version check

Namhyung Kim (14):
      tools headers UAPI: Update tools's copy of drm headers
      tools headers UAPI: Update tools's copy of fscrypt.h header
      tools headers UAPI: Update tools's copy of kvm.h header
      tools headers UAPI: Update tools's copy of mount.h header
      tools headers UAPI: Update tools's copy of vhost.h header
      tools headers UAPI: Update tools's copy of unistd.h header
      tools headers: Update tools's copy of socket.h header
      tools headers: Update tools's copy of x86/asm headers
      tools headers: Update tools's copy of arm64/asm headers
      tools headers: Update tools's copy of s390/asm headers
      tools/perf: Update tools's copy of x86 syscall table
      tools/perf: Update tools's copy of powerpc syscall table
      tools/perf: Update tools's copy of s390 syscall table
      tools/perf: Update tools's copy of mips syscall table

Namjae Jeon (6):
      ksmbd: fix possible deadlock in smb2_open
      ksmbd: separately allocate ci per dentry
      ksmbd: move oplock handling after unlock parent dir
      ksmbd: release interim response after sending status pending response
      ksmbd: move setting SMB2_FLAGS_ASYNC_COMMAND and AsyncId
      ksmbd: don't update ->op_state as OPLOCK_STATE_NONE on error

Nicholas Kazlauskas (8):
      drm/amd/display: Add z-state support policy for dcn35
      drm/amd/display: Update DCN35 watermarks
      drm/amd/display: Add Z8 watermarks for DML2 bbox overrides
      drm/amd/display: Feed SR and Z8 watermarks into DML2 for DCN35
      drm/amd/display: Remove min_dst_y_next_start check for Z8
      drm/amd/display: Update min Z8 residency time to 2100 for DCN314
      drm/amd/display: Update DCN35 clock table policy
      drm/amd/display: Allow DTBCLK disable for DCN35

Nicholas Piggin (1):
      KVM: PPC: Book3S HV: Fix KVM_RUN clobbering FP/VEC user registers

Oldřich Jedlička (1):
      wifi: mac80211: do not pass AP_VLAN vif pointer to drivers during flush

Oliver Upton (2):
      tools perf: Add arm64 sysreg files to MANIFEST
      perf build: Ensure sysreg-defs Makefile respects output dir

Paulo Alcantara (2):
      smb: client: fix missing mode bits for SMB symlinks
      smb: client: report correct st_size for SMB and NFS symlinks

Perry Yuan (1):
      drm/amdgpu: optimize RLC powerdown notification on Vangogh

Peter Ujfalusi (1):
      ALSA: hda: intel-nhlt: Ignore vbps when looking for DMIC 32 bps format

Peter Wang (1):
      scsi: ufs: core: Clear cmd if abort succeeds in MCQ mode

Prike Liang (1):
      drm/amdgpu: correct the amdgpu runtime dereference usage count

Qu Wenruo (4):
      btrfs: tree-checker: add type and sequence check for inline backrefs
      btrfs: do not abort transaction if there is already an existing qgroup
      btrfs: add dmesg output for first mount and last unmount of a filesystem
      btrfs: free the allocated memory if btrfs_alloc_page_array() fails

Richard Fitzgerald (2):
      kunit: test: Avoid cast warning when adding kfree() as an action
      ALSA: hda: cs35l56: Enable low-power hibernation mode on SPI

Ritesh Harjani (IBM) (1):
      ext2: Fix ki_pos update for DIO buffered-io fallback case

Robin Murphy (1):
      iommu: Avoid more races around device probe

Sean Christopherson (1):
      vfio: Drop vfio_file_iommu_group() stub to fudge around a KVM wart

Stanislav Fomichev (1):
      netdevsim: Don't accept device bound programs

Stefan Binding (2):
      ALSA: hda: cs35l41: Remove unnecessary boolean state variable firmware_running
      ALSA: cs35l41: Fix for old systems which do not support command

Stephan Gerhold (3):
      cpufreq: qcom-nvmem: Enable virtual power domain devices
      cpufreq: qcom-nvmem: Preserve PM domain votes in system suspend
      pmdomain: qcom: rpmpd: Set GENPD_FLAG_ACTIVE_WAKEUP

Subbaraya Sundeep (1):
      octeontx2-pf: Restore TC ingress police rules when interface is up

Sung Joon Kim (1):
      drm/amd/display: Fix black screen on video playback with embedded panel

Taimur Hassan (3):
      drm/amd/display: Remove config update
      drm/amd/display: Fix conversions between bytes and KB
      drm/amd/display: Fix some HostVM parameters in DML

Takashi Iwai (2):
      leds: class: Don't expose color sysfs entry
      ALSA: hda: Disable power-save on KONTRON SinglePC

Thomas Hellström (1):
      drm/gpuvm: Fix deprecated license identifier

Tim Huang (1):
      drm/amdgpu: fix memory overflow in the IB test

Timothy Pearson (1):
      powerpc: Don't clobber f0/vs0 during fp|altivec register save

Tvrtko Ursulin (1):
      drm/i915/gsc: Mark internal GSC engine with reserved uabi class

Tzuyi Chang (1):
      pinctrl: realtek: Fix logical error when finding descriptor

Ulf Hansson (1):
      pmdomain: arm: Avoid polling for scmi_perf_domain

Vasiliy Kovalev (1):
      ALSA: hda - Fix speaker and headset mic pin config for CHUWI CoreBook XPro

Ville Syrjälä (2):
      drm/i915: Also check for VGA converter in eDP probe
      drm/i915: Call intel_pre_plane_updates() also for pipes getting enabled

Wenchao Chen (1):
      mmc: sdhci-sprd: Fix vqmmc not shutting down after the card was pulled

Wenjing Liu (1):
      drm/amd/display: fix a pipe mapping error in dcn32_fpu

Willem de Bruijn (4):
      selftests/net: ipsec: fix constant out of range
      selftests/net: fix a char signedness issue
      selftests/net: unix: fix unused variable compiler warning
      selftests/net: mptcp: fix uninitialized variable warnings

Wu Bo (2):
      dm verity: initialize fec io before freeing it
      dm verity: don't perform FEC for failed readahead IO

Wyes Karny (1):
      cpufreq/amd-pstate: Fix scaling_min_freq and scaling_max_freq update

Yang Jihong (2):
      perf kwork: Fix a build error on 32-bit
      perf lock contention: Fix a build error on 32-bit

Yang Yingliang (2):
      drm/panel: nt36523: fix return value check in nt36523_probe()
      firewire: core: fix possible memory leak in create_units()

Yoshihiro Shimoda (4):
      net: rswitch: Fix type of ret in rswitch_start_xmit()
      net: rswitch: Fix return value in rswitch_start_xmit()
      net: rswitch: Fix missing dev_kfree_skb_any() in error path
      ravb: Fix races between ravb_tx_timeout_work() and net related ops

Yu Kuai (1):
      block: warn once for each partition in bio_check_ro()

ZhenGuo Yin (1):
      drm/amdkfd: Free gang_ctx_bo and wptr_bo in pqm_uninit

Zhengchao Shao (1):
      ipv4: igmp: fix refcnt uaf issue when receiving igmp query packet

Zhongwei (1):
      drm/amd/display: force toggle rate wa for first link training for a retimer

Zongmin Zhou (1):
      ksmbd: prevent memory leak on error return

ndesaulniers@google.com (1):
      MAINTAINERS: refresh LLVM support

wuqiang.matt (1):
      lib: objpool: fix head overrun on RK3588 SBC

xiazhengqiao (1):
      drm/panel: starry-2081101qfh032011-53g: Fine tune the panel power sequence

^ permalink raw reply	[relevance 47%]

* Re: [GIT PULL] Pin control fixes for v6.7 minus one patch
  @ 2023-11-29 15:48 99% ` Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2023-11-29 15:48 UTC (permalink / raw)
  To: Linus Walleij
  Cc: open list:GPIO SUBSYSTEM, linux-kernel, Maria Yu, Charles Keepax

On Wed, 29 Nov 2023 at 07:18, Linus Walleij <linus.walleij@linaro.org> wrote:
>
> Here is an updated tag on a branch where the only change
> is to drop the locking READ_ONCE() patch until we know
> more about what is going on here.

Bah. I already pulled the previous one and pushed out before reading
more emails and noticing you had so quickly re-done it.

So the READ_ONCE() workaround is there now, but I hope there will be a
future patch that explains (and fixes) whatever made the value change
from underneath that code.

          Linus

^ permalink raw reply	[relevance 99%]

* Re: [GIT PULL] Pin control fixes for v6.7
  @ 2023-11-29 14:55 99% ` Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2023-11-29 14:55 UTC (permalink / raw)
  To: Linus Walleij
  Cc: open list:GPIO SUBSYSTEM, linux-kernel, Maria Yu, Charles Keepax,
	Chester Lin

On Wed, 29 Nov 2023 at 04:09, Linus Walleij <linus.walleij@linaro.org> wrote:
>
> The most interesting patch is the list iterator fix in the core by Maria
> Yu, it took a while for me to realize what was going on there.

That commit message still doesn't explain what the problem was.

Why is p->state volatile there? It seems to be a serious locking bug
if p->state can randomly change there, and the READ_ONCE() looks like
a "this hides the problem" rather than an actual real fix.

                   Linus

^ permalink raw reply	[relevance 99%]

* Re: [linus:master] [file] 0ede61d858: will-it-scale.per_thread_ops -2.9% regression
  @ 2023-11-27 17:10 87%       ` Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2023-11-27 17:10 UTC (permalink / raw)
  To: Christian Brauner
  Cc: kernel test robot, oe-lkp, lkp, linux-kernel, Jann Horn,
	linux-doc, linuxppc-dev, intel-gfx, linux-fsdevel, gfs2, bpf,
	ying.huang, feng.tang, fengwei.yin

On Mon, 27 Nov 2023 at 02:27, Christian Brauner <brauner@kernel.org> wrote:
>
> So I've picked up your patch (vfs.misc). It's clever alright so thanks
> for the comments in there otherwise I would've stared at this for far
> too long.

Note that I should probably have commented on one other thing: that
whole "just load from fd[0] is always safe, because the fd[] array
always exists".

IOW, that whole "load and mask" thing only works when you know the
array exists at all.

Doing that "just mask the index" wouldn't be valid if "size = 0" is an
option and might mean that we don't have an array at all (i.e. if "->fd"
itself could be NULL).

But we never have a completely empty file descriptor array, and
fdp->fd is never NULL.  At a minimum 'max_fds' is NR_OPEN_DEFAULT.

(The whole 'tsk->files' could be NULL, but only for kernel threads or
when exiting, so fget_task() will check for *that*, but it's a
separate thing)

So that's why it's safe to *entirely* remove the whole

                if (unlikely(fd >= fdt->max_fds))

test, and do it *all* with just "mask the index, and mask the resulting load".

Because we can *always* do that load at "fdt->fd[0]", and we want to
check the result for NULL anyway, so the "mask at the end and check
for NULL" is both natural and generates very good code.

Anyway, not a big deal, but it might be worth noting before somebody
tries the same trick on some other array that *could* be zero-sized
and with a NULL base pointer, and where that 'array[0]' access isn't
necessarily guaranteed to be ok.
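
To make that precondition concrete, here is a minimal user-space sketch.
It is not the kernel code: 'struct ptr_array', lookup_checked() and
lookup_masked() are hypothetical names, and the mask is computed with a
plain comparison, so it carries none of the speculation guarantees of
array_index_mask_nospec(). The point is only that the branchless variant
loads slot 0 unconditionally, so it is valid only when the array is
known to exist.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical array type: unlike fdt->fd, this one may be empty. */
struct ptr_array {
	size_t nr;	/* number of slots, may be 0 */
	void **base;	/* may be NULL when nr == 0 */
};

/* Safe for any array, including an empty one: the check guards the load. */
static void *lookup_checked(const struct ptr_array *a, size_t idx)
{
	if (idx >= a->nr)
		return NULL;
	return a->base[idx];
}

/*
 * Branchless variant. It always performs a load, from base[0] when idx
 * is out of range, so it is only valid when base != NULL and nr >= 1.
 */
static void *lookup_masked(const struct ptr_array *a, size_t idx)
{
	/* All ones when idx is in range, zero otherwise */
	uintptr_t mask = (uintptr_t)0 - (uintptr_t)(idx < a->nr);

	/* Debug-only statement of the invariant the trick relies on */
	assert(a->base != NULL && a->nr >= 1);

	return (void *)(mask & (uintptr_t)a->base[idx & mask]);
}
```

With an empty array (nr == 0, base == NULL) only lookup_checked() is
usable; lookup_masked() would dereference a NULL base.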

> It's a little unpleasant because of the cast-orama going on before we
> check the file pointer but I don't see that it's in any way wrong.

In my cleanup phase - which was a bit messy - I did wonder if I should
have some helper for it, since it shows up in both __fget_files_rcu()
and in files_lookup_fd_raw().

So I *could* have tried to add something like a
"masked_rcu_dereference()" that took the base pointer, the index, and
the mask, and did that whole dance.

Or I could have had just a "mask_pointer()" function, which we do
occasionally do in other places too (ie we hide data in low bits, and
then we mask them away when the pointer is used as a pointer).
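
For readers who haven't seen the low-bits pattern, a rough user-space
sketch follows. The helper was never added, so the signature of
mask_pointer() here is an assumption, as are TAG_BITS, tag_pointer() and
pointer_tag(); it also assumes objects aligned to at least four bytes so
the two low bits are free.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: two low bits carry data, so pointers must be 4-byte aligned */
#define TAG_BITS ((uintptr_t)0x3)

/* Hide a small tag in the low bits of an aligned pointer */
static inline void *tag_pointer(void *p, unsigned long tag)
{
	assert(((uintptr_t)p & TAG_BITS) == 0 && tag <= TAG_BITS);
	return (void *)((uintptr_t)p | tag);
}

/* Extract the tag without touching the pointer bits */
static inline unsigned long pointer_tag(const void *p)
{
	return (unsigned long)((uintptr_t)p & TAG_BITS);
}

/* Mask the tag away before the value is used as a pointer again */
static inline void *mask_pointer(void *p)
{
	return (void *)((uintptr_t)p & ~TAG_BITS);
}
```

The fd lookup uses the same cast-and-mask shape, except that its mask is
either all ones or zero rather than a fixed set of low bits.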

But with only two users, it seemed to add more conceptual complexity
than it's worth, and I was not convinced that we'd want to expose that
pattern and have others use it.

So having a helper might clarify things, but it might also encourage
wrong users. I dunno.

I suspect the only real use for this ends up being this very special
"access the fdt->fd[] array using a file descriptor".

Anyway, that's why I largely just did it with comments, and commented
both places - and just kept the cast there in the open.

             Linus

^ permalink raw reply	[relevance 87%]

* Linux 6.7-rc3
@ 2023-11-27  4:13 53% Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2023-11-27  4:13 UTC (permalink / raw)
  To: Linux Kernel Mailing List

The diffstat here is dominated by a couple of reverts of some Realtek
phy code (accounting for almost a third of the diff).

But ignoring that, it's mostly fairly small, and all over the place.
Ethernet drivers, smb client fixes, bpf selftests stand out as bigger
areas, but we have random small driver updates (block, gpu, nvme, hid,
usb) and some arch fixes (x86, parisc, loongarch, arm64) too. Some
misc filesystem fixes.

Shortlog appended, and gives some flavor of what was going on last week.

               Linus

---

Abel Vesa (1):
      drm/msm/dp: don't touch DP subconnector property in eDP case

Alex Elder (1):
      net: ipa: fix one GSI register field width

Alexander Stein (1):
      usb: dwc3: Fix default mode initialization

Andrzej Hajda (1):
      drm/i915: do not clean GT table on error path

Andy Shevchenko (1):
      platform/x86: intel_telemetry: Fix kernel doc descriptions

Ani Sinha (1):
      hv/hv_kvp_daemon: Some small fixes for handling NM keyfiles

Aoba K (1):
      HID: multitouch: Add quirk for HONOR GLO-GXXX touchpad

Arnd Bergmann (3):
      nvme: target: fix nvme_keyring_id() references
      nvme: target: fix Kconfig select statements
      nvme: tcp: fix compile-time checks for TLS mode

Arseniy Krasnov (1):
      vsock/test: fix SEQPACKET message bounds test

Asuna Yang (1):
      USB: serial: option: add Luat Air72*U series products

Badhri Jagan Sridharan (2):
      usb: typec: tcpm: Skip hard reset when in error recovery
      usb: typec: tcpm: Fix sink caps op current check

Bibo Mao (1):
      LoongArch: Implement constant timer shutdown interface

Bjorn Andersson (1):
      drm/msm/dpu: Add missing safe_lut_tbl in sc8280xp catalog

Borislav Petkov (AMD) (2):
      x86/microcode: Remove the driver announcement and version
      x86/microcode: Rework early revisions reporting

Brett Raye (1):
      HID: glorious: fix Glorious Model I HID report

Charles Mirabile (1):
      io_uring/fs: consider link->flags when getting path for LINKAT

Charles Yi (1):
      HID: fix HID device resource race between HID core and debugging support

Chen Ni (1):
      ata: pata_isapnp: Add missing error check for devm_ioport_map()

Chengming Zhou (1):
      block/null_blk: Fix double blk_mq_start_request() warning

Christoph Hellwig (5):
      filemap: add a per-mapping stable writes flag
      block: update the stable_writes flag in bdev_add
      xfs: clean up FS_XFLAG_REALTIME handling in xfs_ioctl_setattr_xflags
      xfs: respect the stable writes flag on the RT device
      nvmet: nul-terminate the NQNs passed in the connect command

Christophe JAILLET (1):
      USB: typec: tps6598x: Fix a memory leak in an error handling path

Chuck Lever (1):
      libfs: getdents() should return 0 after reaching EOD

Chunfeng Yun (1):
      usb: xhci-mtk: fix in-ep's start-split check failure

Colin Ian King (1):
      bcache: remove redundant assignment to variable cur_idx

Coly Li (5):
      bcache: avoid oversize memory allocation by small stripe_size
      bcache: check return value from btree_node_alloc_replacement()
      bcache: replace a mistaken IS_ERR() by IS_ERR_OR_NULL() in btree_gc_coalesce()
      bcache: add code comments for bch_btree_node_get() and __bch_btree_node_alloc()
      bcache: avoid NULL checking to c->root in run_cache_set()

Cong Yang (1):
      drm/panel: boe-tv101wum-nl6: Fine tune Himax83102-j02 panel HFP and HBP

D. Wythe (1):
      net/smc: avoid data corruption caused by decline

Damien Le Moal (1):
      block: Remove blk_set_runtime_active()

Dan Carpenter (1):
      drm/msm: remove unnecessary NULL check

Daniel Borkmann (6):
      net, vrf: Move dstats structure to core
      net: Move {l,t,d}stats allocation to core and convert veth & vrf
      netkit: Add tstats per-CPU traffic counters
      bpf, netkit: Add indirect call wrapper for fetching peer dev
      selftests/bpf: De-veth-ize the tc_redirect test case
      selftests/bpf: Add netkit to tc_redirect selftest

Dapeng Mi (1):
      perf/x86/intel: Correct incorrect 'or' operation for PMU capabilities

Darrick J. Wong (2):
      xfs: clean up dqblk extraction
      xfs: dquot recovery does not validate the recovered dquot

Dave Airlie (1):
      nouveau/gsp: allocate enough space for all channel ids.

David Howells (8):
      rxrpc: Fix some minor issues with bundle tracing
      rxrpc: Fix RTT determination to use any ACK as a source
      rxrpc: Defer the response to a PING ACK until we've parsed it
      afs: Fix afs_server_list to be cleaned up with RCU
      afs: Make error on cell lookup failure consistent with OpenAFS
      afs: Return ENOENT if no cell DNS record can be found
      afs: Fix file locking on R/O volumes to operate in local mode
      afs: Mark a superblock for an R/O or Backup volume as SB_RDONLY

David Woodhouse (2):
      ACPI: processor_idle: use raw_safe_halt() in acpi_idle_play_dead()
      PM: tools: Fix sleepgraph syntax error

Denis Benato (2):
      HID: hid-asus: add const to read-only outgoing usb buffer
      HID: hid-asus: reset the backlight brightness level on resume

Dmitry Baryshkov (2):
      drm/msm: remove exra drm_kms_helper_poll_init() call
      drm/msm/dp: attach the DP subconnector property

Eduard Zingerman (11):
      selftests/bpf: track tcp payload offset as scalar in xdp_synproxy
      selftests/bpf: track string payload offset as scalar in strobemeta
      selftests/bpf: fix bpf_loop_bench for new callback verification scheme
      bpf: extract __check_reg_arg() utility function
      bpf: extract setup_func_entry() utility function
      bpf: verify callbacks as if they are called unknown number of times
      selftests/bpf: tests for iterating callbacks
      bpf: widening for callback iterators
      selftests/bpf: test widening for iterating callbacks
      bpf: keep track of max number of bpf_loop callback iterations
      selftests/bpf: check if max number of bpf_loop iterations is tracked

Eric Dumazet (1):
      wireguard: use DEV_STATS_INC()

Ferry Meng (1):
      erofs: simplify erofs_read_inode()

Gao Xiang (1):
      MAINTAINERS: erofs: add EROFS webpage

Gerd Bayer (1):
      s390/ism: ism driver implies smc protocol

Gil Fine (1):
      thunderbolt: Set lane bonding bit only for downstream port

Gustavo A. R. Silva (1):
      xen: privcmd: Replace zero-length array with flex-array member and use __counted_by

Haiyang Zhang (2):
      hv_netvsc: fix race of netvsc and VF register_netdevice
      hv_netvsc: Fix race of register_netdevice_notifier and VF register

Hamish Martin (2):
      HID: mcp2221: Set driver data before I2C adapter add
      HID: mcp2221: Allow IO to start during probe

Hannes Reinecke (5):
      nvme-tcp: only evaluate 'tls' option if TLS is selected
      nvme: catch errors from nvme_configure_metadata()
      nvme: blank out authentication fabrics options if not configured
      nvmet-tcp: always initialize tls_handshake_tmo_work
      nvme: move nvme_stop_keep_alive() back to original position

Hans de Goede (5):
      ACPI: PM: Add acpi_device_fix_up_power_children() function
      ACPI: video: Use acpi_device_fix_up_power_children()
      ACPI: resource: Skip IRQ override on ASUS ExpertBook B1402CVA
      MAINTAINERS: Drop Mark Gross as maintainer for x86 platform drivers
      usb: misc: ljca: Fix enumeration error on Dell Latitude 9420

Hao Ge (1):
      dpll: Fix potential msg memleak when genlmsg_put_reply failed

Harshit Mogalapalli (4):
      platform/x86: hp-bioscfg: Simplify return check in hp_add_other_attributes()
      platform/x86: hp-bioscfg: move mutex_lock() down in hp_add_other_attributes()
      platform/x86: hp-bioscfg: Fix error handling in hp_add_other_attributes()
      platform/x86: hp-bioscfg: Remove unused obj in hp_add_other_attributes()

Heikki Krogerus (1):
      usb: typec: tipd: Supply also I2C driver data

Heiko Carstens (2):
      s390: remove odd comment
      scripts/checkstack.pl: match all stack sizes for s390

Heiner Kallweit (1):
      Revert "net: r8169: Disable multicast filter for RTL8168H and RTL8107E"

Helge Deller (9):
      parisc: Mark ex_table entries 32-bit aligned in assembly.h
      parisc: Mark ex_table entries 32-bit aligned in uaccess.h
      parisc: Mark altinstructions read-only and 32-bit aligned
      parisc: Mark jump_table naturally aligned
      parisc: Mark lock_aligned variables 16-byte aligned on SMP
      parisc: Ensure 32-bit alignment on parisc unwind section
      parisc: Use natural CPU alignment for bug_table
      parisc: Drop the HP-UX ENOSYM and EREMOTERELEASE error codes
      parisc: Reduce size of the bug_table on 64-bit kernel by half

Huacai Chen (3):
      LoongArch: Add __percpu annotation for __percpu_read()/__percpu_write()
      LoongArch: Silence the boot warning about 'nokaslr'
      LoongArch: Mark {dmw,tlb}_virt_to_page() exports as non-GPL

Ian Kent (1):
      autofs: add: new_inode check in autofs_fill_super()

Imre Deak (1):
      drm/i915/dp_mst: Fix race between connector registration and setup

Ivan Vecera (1):
      i40e: Fix adding unsupported cloud filters

Jacek Lawrynowicz (1):
      accel/ivpu/37xx: Fix hangs related to MMIO reset

Jacob Keller (3):
      ice: remove ptp_tx ring parameter flag
      ice: unify logic for programming PFINT_TSYN_MSK
      ice: restore timestamp configuration after device reset

Jakub Kicinski (4):
      net: fill in MODULE_DESCRIPTION()s for SOCK_DIAG modules
      docs: netdev: try to guide people on dealing with silence
      tools: ynl: fix header path for nfsd
      tools: ynl: fix duplicate op name in devlink

Jan Höppner (1):
      s390/dasd: protect device queue against concurrent access

Jann Horn (1):
      tls: fix NULL deref on tls_sw_splice_eof() with empty record

Jean Delvare (1):
      stmmac: dwmac-loongson: Add architecture dependency

Jiawen Wu (1):
      net: wangxun: fix kernel panic due to null pointer

Jingbo Xu (1):
      erofs: fix NULL dereference of dif->bdev_handle in fscache mode

Jiri Kosina (1):
      Revert "HID: logitech-dj: Add support for a new lightspeed receiver iteration"

Jithu Joseph (1):
      MAINTAINERS: Remove stale entry for SBL platform driver

Johan Hovold (11):
      Revert "phy: realtek: usb: Add driver for the Realtek SoC USB 3.0 PHY"
      Revert "phy: realtek: usb: Add driver for the Realtek SoC USB 2.0 PHY"
      Revert "usb: phy: add usb phy notify port status API"
      dt-bindings: usb: hcd: add missing phy name to example
      USB: xhci-plat: fix legacy PHY double init
      dt-bindings: usb: qcom,dwc3: fix example wakeup interrupt types
      USB: dwc3: qcom: fix wakeup after probe deferral
      USB: dwc3: qcom: simplify wakeup interrupt setup
      USB: dwc3: qcom: fix resource leaks on probe deferral
      USB: dwc3: qcom: fix software node leak on probe errors
      USB: dwc3: qcom: fix ACPI platform device leak

Jonas Karlman (1):
      drm/rockchip: vop: Fix color for RGB888/BGR888 format on VOP full

Jonathan Marek (1):
      drm/msm/dsi: use the correct VREG_CTRL_1 value for 4nm cphy

Jose Ignacio Tornos Martinez (1):
      net: usb: ax88179_178a: fix failed operations during ax88179_reset

Kees Cook (1):
      MAINTAINERS: Add netdev subsystem profile link

Keith Busch (2):
      swiotlb-xen: provide the "max_mapping_size" method
      io_uring: fix off-by one bvec index

Kunwu Chan (1):
      ipv4: Correct/silence an endian warning in __ip_do_redirect

Lech Perczak (2):
      USB: serial: option: don't claim interface 4 for ZTE MF290
      net: usb: qmi_wwan: claim interface 4 for ZTE MF290

Li Nan (4):
      nbd: fold nbd config initialization into nbd_alloc_config()
      nbd: factor out a helper to get nbd_config without holding 'config_lock'
      nbd: fix null-ptr-dereference while accessing 'nbd->config'
      nbd: pass nbd_sock to nbd_read_reply() instead of index

Linus Torvalds (2):
      asm-generic: qspinlock: fix queued_spin_value_unlocked() implementation
      Linux 6.7-rc3

Long Li (1):
      hv_netvsc: Mark VF as slave before exposing it to user-mode

Lorenzo Bianconi (1):
      net: veth: fix ethtool stats reporting

Marek Vasut (2):
      drm/panel: simple: Fix Innolux G101ICE-L01 bus flags
      drm/panel: simple: Fix Innolux G101ICE-L01 timings

Mark Brown (1):
      kselftest/arm64: Fix output formatting for za-fork

Mark O'Donovan (2):
      nvme-auth: unlock mutex in one place only
      nvme-auth: set explanation code for failure2 msgs

Masahiro Yamada (2):
      LoongArch: Add dependency between vmlinuz.efi and vmlinux.efi
      arm64: add dependency between vmlinuz.efi and Image

Mathieu Desnoyers (1):
      MAINTAINERS: TRACING: Add Mathieu Desnoyers as Reviewer

Mika Westerberg (2):
      thunderbolt: Send uevent after asymmetric/symmetric switch
      thunderbolt: Only add device router DP IN to the head of the DP resource list

Mikhail Zaslonko (1):
      s390/ipl: add missing IPL_TYPE_ECKD_DUMP case to ipl_init()

Ming Lei (3):
      blk-throttle: fix lockdep warning of "cgroup_mutex or RCU read lock required!"
      blk-cgroup: avoid to warn !rcu_read_lock_held() in blkg_lookup()
      blk-cgroup: bypass blkcg_deactivate_policy after destroying

Mingzhe Zou (3):
      bcache: fixup init dirty data errors
      bcache: fixup lock c->root error
      bcache: fixup multi-threaded bch_sectors_dirty_init() wake-up race

Muhammad Muzammil (1):
      s390/dasd: resolve spelling mistake

Nguyen Dinh Phi (1):
      nfc: virtual_ncidev: Add variable to check if ndev is running

Niklas Neronin (1):
      usb: config: fix iteration issue in 'usb_get_bos_descriptor()'

Oliver Neukum (3):
      usb: aqc111: check packet for fixup for true limit
      HID: add ALWAYS_POLL quirk for Apple kb
      USB: dwc2: write HCINT with INTMASK applied

Omar Sandoval (1):
      iov_iter: fix copy_page_to_iter_nofault()

Paolo Abeni (1):
      kselftest: rtnetlink: fix ip route command typo

Paulo Alcantara (4):
      smb: client: implement ->query_reparse_point() for SMB1
      smb: client: introduce ->parse_reparse_point()
      smb: client: set correct file type from NFS reparse points
      smb: client: introduce cifs_sfu_make_node()

Pawel Laszczak (1):
      usb: cdnsp: Fix deadlock issue during using NCM gadget

Peilin Ye (2):
      veth: Use tstats per-CPU traffic counters
      bpf: Fix dev's rx stats for bpf_redirect_peer traffic

Peter Zijlstra (1):
      lockdep: Fix block chain corruption

Puliang Lu (1):
      USB: serial: option: fix FM101R-GL defines

Raju Rangoju (3):
      amd-xgbe: handle corner-case during sfp hotplug
      amd-xgbe: handle the corner-case during tx completion
      amd-xgbe: propagate the correct speed and duplex status

Rand Deeb (1):
      bcache: prevent potential division by zero error

Ricardo Ribalda (1):
      usb: dwc3: set the dma max_seg_size

Ritvik Budhiraja (1):
      cifs: fix use after free for iface while disabling secondary channels

Samuel Holland (1):
      net: axienet: Fix check for partial TX checksum

Saurabh Sengar (1):
      x86/hyperv: Fix the detection of E820_TYPE_PRAM in a Gen2 VM

Shyam Sundar S K (1):
      platform/x86/amd/pmc: adjust getting DRAM size behavior

Simon Horman (1):
      MAINTAINERS: Add indirect_call_wrapper.h to NETWORKING [GENERAL]

Song Liu (1):
      md: fix bi_status reporting in md_end_clone_io

Stanley Chang (1):
      usb: dwc3: add missing of_node_put and platform_device_put

Stefan Berger (1):
      fs: Pass AT_GETATTR_NOSEC flag to getattr interface function

Stefan Eichenberger (2):
      dt-bindings: usb: microchip,usb5744: Add second supply
      usb: misc: onboard-hub: add support for Microchip USB5744

Stefano Stabellini (1):
      arm/xen: fix xen_vcpu_info allocation alignment

Steven Rostedt (Google) (6):
      eventfs: Remove expectation that ei->is_freed means ei->dentry == NULL
      eventfs: Do not invalidate dentry in create_file/dir_dentry()
      eventfs: Use GFP_NOFS for allocation when eventfs_mutex is held
      eventfs: Move taking of inode_lock into dcache_dir_open_wrapper()
      eventfs: Do not allow NULL parent to eventfs_start_creating()
      eventfs: Make sure that parent->d_inode is locked in creating files/dirs

Stuart Hayhurst (1):
      platform/x86: ideapad-laptop: Set max_brightness before using it

Suman Ghosh (2):
      octeontx2-pf: Fix memory leak during interface down
      octeontx2-pf: Fix ntuple rule creation to direct packet to VF with higher Rx queue than its PF

Thomas Richter (1):
      s390/pai: cleanup event initialization

Thomas Zimmermann (1):
      drm/ast: Disconnect BMC if physical connector is connected

Uros Bizjak (1):
      x86/hyperv: Use atomic_try_cmpxchg() to micro-optimize hv_nmi_unknown()

Victor Fragoso (1):
      USB: serial: option: add Fibocom L7xx modules

WANG Rui (2):
      LoongArch: Explicitly set -fdirect-access-external-data for vmlinux
      LoongArch: Record pc instead of offset in la_abs relocation

Wentong Wu (1):
      usb: misc: ljca: Drop _ADR support to get ljca children devices

Will Deacon (1):
      arm64: mm: Fix "rodata=on" when CONFIG_RODATA_FULL_DEFAULT_ENABLED=y

Xuxin Xiong (1):
      drm/panel: auo,b101uan08.3: Fine tune the panel power sequence

Yanteng Si (2):
      Docs/LoongArch: Update links in LoongArch introduction.rst
      Docs/zh_CN/LoongArch: Update links in LoongArch introduction.rst

Yihong Cao (1):
      HID: apple: add Jamesdonkey and A3R to non-apple keyboards list

^ permalink raw reply	[relevance 53%]

* Re: [linus:master] [file] 0ede61d858: will-it-scale.per_thread_ops -2.9% regression
  2023-11-26 20:23 74% ` Linus Torvalds
@ 2023-11-26 23:20 79%   ` Linus Torvalds
    0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2023-11-26 23:20 UTC (permalink / raw)
  To: kernel test robot
  Cc: Christian Brauner, oe-lkp, lkp, linux-kernel, Jann Horn,
	linux-doc, linuxppc-dev, intel-gfx, linux-fsdevel, gfs2, bpf,
	ying.huang, feng.tang, fengwei.yin

[-- Attachment #1: Type: text/plain, Size: 698 bytes --]

On Sun, 26 Nov 2023 at 12:23, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> IOW, I might have messed up some "trivial cleanup" when prepping for
> sending it out...

Bah. Famous last words. One of the "trivial cleanups" made the code
more "obvious" by renaming the nospec mask as just "mask".

And that trivial rename broke that patch *entirely*, because now that
name shadowed the "fmode_t" mask argument.

Don't even ask how long it took me to go from "I *tested* this,
dammit, now it doesn't work at all" to "Oh God, I'm so stupid".
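
The failure mode is easy to reproduce outside the kernel. The sketch
below is a made-up reduction, not the patch itself: mode_ok_shadowed()
and mode_ok_fixed() are hypothetical names, and the inner variable
stands in for the nospec mask. The inner declaration hides the 'mask'
parameter, so the later test uses the wrong value.

```c
#include <assert.h>

/* Returns 1 if 'mode' has none of the bits in 'mask' set. Broken version. */
static int mode_ok_shadowed(unsigned int mode, unsigned int mask)
{
	{
		/* Renamed local: this now shadows the 'mask' parameter */
		unsigned long mask = ~0UL;

		/* Tests against the all-ones local, not the caller's mask */
		if (mode & mask)
			return 0;
		return 1;
	}
}

/* Same logic with the local keeping its own name */
static int mode_ok_fixed(unsigned int mode, unsigned int mask)
{
	{
		unsigned long nospec = ~0UL;

		(void)nospec;	/* unused in this reduction */

		/* Tests against the caller's mask, as intended */
		if (mode & mask)
			return 0;
		return 1;
	}
}
```

Building with -Wshadow on GCC or Clang flags the inner declaration in
mode_ok_shadowed().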

So that nobody else would waste any time on this, attached is a new
attempt. This time actually tested *after* the changes.

                  Linus

[-- Attachment #2: 0001-Improve-__fget_files_rcu-code-generation-and-thus-__.patch --]
[-- Type: text/x-patch, Size: 5014 bytes --]

From 45f70b5413a654d20ead410c533ec1d604bdb1e2 Mon Sep 17 00:00:00 2001
From: Linus Torvalds <torvalds@linux-foundation.org>
Date: Sun, 26 Nov 2023 12:24:38 -0800
Subject: [PATCH] Improve __fget_files_rcu() code generation (and thus __fget_light())

Commit 0ede61d8589c ("file: convert to SLAB_TYPESAFE_BY_RCU") caused a
performance regression as reported by the kernel test robot.

The __fget_light() function is one of those critical ones for some
loads, and the code generation was unnecessarily impacted.  Let's just
write that function better.

Reported-by: kernel test robot <oliver.sang@intel.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Jann Horn <jannh@google.com>
Cc: Mateusz Guzik <mjguzik@gmail.com>
Closes: https://lore.kernel.org/oe-lkp/202311201406.2022ca3f-oliver.sang@intel.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 fs/file.c               | 48 +++++++++++++++++++++++++----------------
 include/linux/fdtable.h | 15 ++++++++-----
 2 files changed, 40 insertions(+), 23 deletions(-)

diff --git a/fs/file.c b/fs/file.c
index 5fb0b146e79e..608b4802c214 100644
--- a/fs/file.c
+++ b/fs/file.c
@@ -959,31 +959,42 @@ static inline struct file *__fget_files_rcu(struct files_struct *files,
 		struct file *file;
 		struct fdtable *fdt = rcu_dereference_raw(files->fdt);
 		struct file __rcu **fdentry;
+		unsigned long nospec;
 
-		if (unlikely(fd >= fdt->max_fds))
+		/* Mask is a 0 for invalid fd's, ~0 for valid ones */
+		nospec = array_index_mask_nospec(fd, fdt->max_fds);
+
+		/* fdentry points to the 'fd' offset, or fdt->fd[0] */
+		fdentry = fdt->fd + (fd&nospec);
+
+		/* Do the load, then mask any invalid result */
+		file = rcu_dereference_raw(*fdentry);
+		file = (void *)(nospec & (unsigned long)file);
+
+		if (unlikely(!file))
 			return NULL;
 
-		fdentry = fdt->fd + array_index_nospec(fd, fdt->max_fds);
+		/*
+		 * Ok, we have a file pointer that was valid at
+		 * some point, but it might have become stale since.
+		 *
+		 * We need to confirm it by incrementing the refcount
+		 * and then check the lookup again.
+		 *
+		 * atomic_long_inc_not_zero() gives us a full memory
+		 * barrier. We only really need an 'acquire' one to
+		 * protect the loads below, but we don't have that.
+		 */
+		if (unlikely(!atomic_long_inc_not_zero(&file->f_count)))
+			continue;
 
 		/*
-		 * Ok, we have a file pointer. However, because we do
-		 * this all locklessly under RCU, we may be racing with
-		 * that file being closed.
-		 *
 		 * Such a race can take two forms:
 		 *
 		 *  (a) the file ref already went down to zero and the
 		 *      file hasn't been reused yet or the file count
 		 *      isn't zero but the file has already been reused.
-		 */
-		file = __get_file_rcu(fdentry);
-		if (unlikely(!file))
-			return NULL;
-
-		if (unlikely(IS_ERR(file)))
-			continue;
-
-		/*
+		 *
 		 *  (b) the file table entry has changed under us.
 		 *       Note that we don't need to re-check the 'fdt->fd'
 		 *       pointer having changed, because it always goes
@@ -991,7 +1002,8 @@ static inline struct file *__fget_files_rcu(struct files_struct *files,
 		 *
 		 * If so, we need to put our ref and try again.
 		 */
-		if (unlikely(rcu_dereference_raw(files->fdt) != fdt)) {
+		if (unlikely(file != rcu_dereference_raw(*fdentry)) ||
+		    unlikely(rcu_dereference_raw(files->fdt) != fdt)) {
 			fput(file);
 			continue;
 		}
@@ -1128,13 +1140,13 @@ static unsigned long __fget_light(unsigned int fd, fmode_t mask)
 	 * atomic_read_acquire() pairs with atomic_dec_and_test() in
 	 * put_files_struct().
 	 */
-	if (atomic_read_acquire(&files->count) == 1) {
+	if (likely(atomic_read_acquire(&files->count) == 1)) {
 		file = files_lookup_fd_raw(files, fd);
 		if (!file || unlikely(file->f_mode & mask))
 			return 0;
 		return (unsigned long)file;
 	} else {
-		file = __fget(fd, mask);
+		file = __fget_files(files, fd, mask);
 		if (!file)
 			return 0;
 		return FDPUT_FPUT | (unsigned long)file;
diff --git a/include/linux/fdtable.h b/include/linux/fdtable.h
index bc4c3287a65e..80bd7789bab1 100644
--- a/include/linux/fdtable.h
+++ b/include/linux/fdtable.h
@@ -83,12 +83,17 @@ struct dentry;
 static inline struct file *files_lookup_fd_raw(struct files_struct *files, unsigned int fd)
 {
 	struct fdtable *fdt = rcu_dereference_raw(files->fdt);
+	unsigned long mask = array_index_mask_nospec(fd, fdt->max_fds);
+	struct file *needs_masking;
 
-	if (fd < fdt->max_fds) {
-		fd = array_index_nospec(fd, fdt->max_fds);
-		return rcu_dereference_raw(fdt->fd[fd]);
-	}
-	return NULL;
+	/*
+	 * 'mask' is zero for an out-of-bounds fd, all ones for ok.
+	 * 'fd&mask' is 'fd' for ok, or 0 for out of bounds.
+	 *
+	 * Accessing fdt->fd[0] is ok, but needs masking of the result.
+	 */
+	needs_masking = rcu_dereference_raw(fdt->fd[fd&mask]);
+	return (struct file *)(mask & (unsigned long)needs_masking);
 }
 
 static inline struct file *files_lookup_fd_locked(struct files_struct *files, unsigned int fd)
-- 
2.43.0.5.g38fb137bdb


^ permalink raw reply related	[relevance 79%]

* Re: [linus:master] [file] 0ede61d858: will-it-scale.per_thread_ops -2.9% regression
  @ 2023-11-26 20:23 74% ` Linus Torvalds
  2023-11-26 23:20 79%   ` Linus Torvalds
  0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2023-11-26 20:23 UTC (permalink / raw)
  To: kernel test robot
  Cc: Christian Brauner, oe-lkp, lkp, linux-kernel, Jann Horn,
	linux-doc, linuxppc-dev, intel-gfx, linux-fsdevel, gfs2, bpf,
	ying.huang, feng.tang, fengwei.yin

[-- Attachment #1: Type: text/plain, Size: 3134 bytes --]

On Sun, 19 Nov 2023 at 23:11, kernel test robot <oliver.sang@intel.com> wrote:
>
> kernel test robot noticed a -2.9% regression of will-it-scale.per_thread_ops on:
>
> commit: 0ede61d8589cc2d93aa78230d74ac58b5b8d0244 ("file: convert to SLAB_TYPESAFE_BY_RCU")

Ok, so __fget_light() is one of our more performance-critical things,
and that commit ends up making it call a rather more expensive version
in __get_file_rcu(), so we have:

>      30.90 ±  4%     -20.6       10.35 ±  2%  perf-profile.self.cycles-pp.__fget_light
>       0.00           +26.5       26.48        perf-profile.self.cycles-pp.__get_file_rcu

and that "20% decrease balanced by 26% increase elsewhere" then
directly causes the ~3% regression.

I took a look at the code generation, and honestly, I think we're
better off just making __fget_files_rcu() have special logic for this
all, and not use __get_file_rcu().

The 'fd' case really is special because we need to do that
non-speculative pointer access.

Because it turns out that we already have to use array_index_nospec()
to safely generate that pointer to the fd entry, and once you have to
do that "use non-speculative accesses to generate a safe pointer", you
might as well just go whole hog.

So this takes a different approach, and uses the
array_index_mask_nospec() thing that we have exactly to generate that
non-speculative mask for these things:

        /* Mask is a 0 for invalid fd's, ~0 for valid ones */
        mask = array_index_mask_nospec(fd, fdt->max_fds);

and then it does something you can consider either horribly clever, or
horribly ugly (this first part is basically just
array_index_nospec()):

        /* fdentry points to the 'fd' offset, or fdt->fd[0] */
        fdentry = fdt->fd + (fd&mask);

and then we can just *unconditionally* do the load - but since we
might be loading fd[0] for an invalid case, we need to mask the result
too:

        /* Do the load, then mask any invalid result */
        file = rcu_dereference_raw(*fdentry);
        file = (void *)(mask & (unsigned long)file);

but now we have done everything without any conditionals, and the only
conditional is now "did we load NULL" - which includes that "we masked
the bad value".
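
Put together as a stand-alone user-space sketch, the three steps look
roughly like this. Everything here is illustrative: 'struct table',
index_mask() and lookup_masked() are hypothetical names, and
index_mask() is a plain comparison that only mimics the return value of
array_index_mask_nospec(), without any of its protection against
speculation. The table is assumed to always have at least one slot.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical table: 'slot' is never NULL and 'max' is always >= 1 */
struct table {
	size_t max;
	void **slot;
};

/* All ones when idx < size, zero otherwise (illustrative stand-in only) */
static inline uintptr_t index_mask(size_t idx, size_t size)
{
	return (uintptr_t)0 - (uintptr_t)(idx < size);
}

static void *lookup_masked(const struct table *t, size_t idx)
{
	uintptr_t mask = index_mask(idx, t->max);

	/* Points at slot[idx], or at slot[0] for an out-of-range idx */
	void **entry = t->slot + (idx & mask);

	/* Unconditional load, then mask away any invalid result */
	void *p = *entry;

	return (void *)(mask & (uintptr_t)p);
}
```

The caller is left with a single test, "is the result NULL", which
covers both an empty slot and an out-of-range index.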

Then we just do that atomic_long_inc_not_zero() on the file, and
re-check the pointer chain we used.
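
As a rough illustration of that "take a reference, then re-check" step,
here is a user-space sketch using C11 atomics. It is an assumption-laden
reduction: 'struct obj', get_unless_zero(), put_ref() and lookup_get()
are hypothetical names, there is a single slot instead of an fd table,
and it leaves out the memory-lifetime rules (RCU plus
SLAB_TYPESAFE_BY_RCU in the kernel) that make touching a possibly stale
object safe in the first place.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct obj {
	atomic_long refs;
};

/* Take a reference only if the count has not already dropped to zero */
static bool get_unless_zero(struct obj *o)
{
	long old = atomic_load(&o->refs);

	while (old != 0) {
		/* On failure 'old' is reloaded and the loop re-tests it */
		if (atomic_compare_exchange_weak(&o->refs, &old, old + 1))
			return true;
	}
	return false;
}

static void put_ref(struct obj *o)
{
	atomic_fetch_sub(&o->refs, 1);
}

/* Look up the object published in '*slot' and return it with a reference */
static struct obj *lookup_get(struct obj *_Atomic *slot)
{
	for (;;) {
		struct obj *o = atomic_load(slot);

		if (!o)
			return NULL;

		/* Count already hit zero: the object is going away, retry */
		if (!get_unless_zero(o))
			continue;

		/* Re-check: the slot must still point at what we pinned */
		if (o == atomic_load(slot))
			return o;

		/* The slot changed under us: drop the ref and try again */
		put_ref(o);
	}
}
```

The retry on a zero count assumes, as with file descriptors, that
whoever drops the last reference has already cleared or replaced the
slot, so the loop makes progress.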

I made files_lookup_fd_raw() do the same thing.

The end result is much nicer code generation at least from my quick
check. And I assume the regression will be gone, and hopefully even
turned into an improvement since this is so hot.

Comments? I also looked at that odd OPTIMIZER_HIDE_VAR() that
__get_file_rcu() does, and I don't get it. Both things come from
volatile accesses, I don't see the point of those games, but I also
didn't care, since it's no longer in a critical code path.

Christian?

NOTE! This patch is not well tested. I verified an earlier version of
this, but have been playing with it since, so caveat emptor.

IOW, I might have messed up some "trivial cleanup" when prepping for
sending it out...

              Linus

[-- Attachment #2: patch.diff --]
[-- Type: text/x-patch, Size: 4120 bytes --]

 fs/file.c               | 48 ++++++++++++++++++++++++++++++------------------
 include/linux/fdtable.h | 15 ++++++++++-----
 2 files changed, 40 insertions(+), 23 deletions(-)

diff --git a/fs/file.c b/fs/file.c
index 5fb0b146e79e..c74a6e8687d9 100644
--- a/fs/file.c
+++ b/fs/file.c
@@ -959,31 +959,42 @@ static inline struct file *__fget_files_rcu(struct files_struct *files,
 		struct file *file;
 		struct fdtable *fdt = rcu_dereference_raw(files->fdt);
 		struct file __rcu **fdentry;
+		unsigned long mask;
 
-		if (unlikely(fd >= fdt->max_fds))
+		/* Mask is a 0 for invalid fd's, ~0 for valid ones */
+		mask = array_index_mask_nospec(fd, fdt->max_fds);
+
+		/* fdentry points to the 'fd' offset, or fdt->fd[0] */
+		fdentry = fdt->fd + (fd&mask);
+
+		/* Do the load, then mask any invalid result */
+		file = rcu_dereference_raw(*fdentry);
+		file = (void *)(mask & (unsigned long)file);
+
+		if (unlikely(!file))
 			return NULL;
 
-		fdentry = fdt->fd + array_index_nospec(fd, fdt->max_fds);
+		/*
+		 * Ok, we have a file pointer that was valid at
+		 * some point, but it might have become stale since.
+		 *
+		 * We need to confirm it by incrementing the refcount
+		 * and then check the lookup again.
+		 *
+		 * atomic_long_inc_not_zero() gives us a full memory
+		 * barrier. We only really need an 'acquire' one to
+		 * protect the loads below, but we don't have that.
+		 */
+		if (unlikely(!atomic_long_inc_not_zero(&file->f_count)))
+			continue;
 
 		/*
-		 * Ok, we have a file pointer. However, because we do
-		 * this all locklessly under RCU, we may be racing with
-		 * that file being closed.
-		 *
 		 * Such a race can take two forms:
 		 *
 		 *  (a) the file ref already went down to zero and the
 		 *      file hasn't been reused yet or the file count
 		 *      isn't zero but the file has already been reused.
-		 */
-		file = __get_file_rcu(fdentry);
-		if (unlikely(!file))
-			return NULL;
-
-		if (unlikely(IS_ERR(file)))
-			continue;
-
-		/*
+		 *
 		 *  (b) the file table entry has changed under us.
 		 *       Note that we don't need to re-check the 'fdt->fd'
 		 *       pointer having changed, because it always goes
@@ -991,7 +1002,8 @@ static inline struct file *__fget_files_rcu(struct files_struct *files,
 		 *
 		 * If so, we need to put our ref and try again.
 		 */
-		if (unlikely(rcu_dereference_raw(files->fdt) != fdt)) {
+		if (unlikely(file != rcu_dereference_raw(*fdentry)) ||
+		    unlikely(rcu_dereference_raw(files->fdt) != fdt)) {
 			fput(file);
 			continue;
 		}
@@ -1128,13 +1140,13 @@ static unsigned long __fget_light(unsigned int fd, fmode_t mask)
 	 * atomic_read_acquire() pairs with atomic_dec_and_test() in
 	 * put_files_struct().
 	 */
-	if (atomic_read_acquire(&files->count) == 1) {
+	if (likely(atomic_read_acquire(&files->count) == 1)) {
 		file = files_lookup_fd_raw(files, fd);
 		if (!file || unlikely(file->f_mode & mask))
 			return 0;
 		return (unsigned long)file;
 	} else {
-		file = __fget(fd, mask);
+		file = __fget_files(files, fd, mask);
 		if (!file)
 			return 0;
 		return FDPUT_FPUT | (unsigned long)file;
diff --git a/include/linux/fdtable.h b/include/linux/fdtable.h
index bc4c3287a65e..a8a8b4d24619 100644
--- a/include/linux/fdtable.h
+++ b/include/linux/fdtable.h
@@ -83,12 +83,17 @@ struct dentry;
 static inline struct file *files_lookup_fd_raw(struct files_struct *files, unsigned int fd)
 {
 	struct fdtable *fdt = rcu_dereference_raw(files->fdt);
+	unsigned long mask = array_index_mask_nospec(fd, fdt->max_fds);
+	struct file *needs_masking;
 
-	if (fd < fdt->max_fds) {
-		fd = array_index_nospec(fd, fdt->max_fds);
-		return rcu_dereference_raw(fdt->fd[fd]);
-	}
-	return NULL;
+	/*
+	 * 'mask' is zero for an out-of-bounds fd, all ones for ok.
+	 * 'fd&mask' is 'fd' for ok, or 0 for out of bounds.
+	 *
+	 * Accessing fdt->fd[0] is ok, but needs masking of the result.
+	 */
+	needs_masking = rcu_dereference_raw(fdt->fd[fd&mask]);
+	return (struct file *)(mask & (unsigned long)needs_masking);
 }
 
 static inline struct file *files_lookup_fd_locked(struct files_struct *files, unsigned int fd)
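
A rough userspace sketch of the masking idea in the patch above, for
illustration only. The names index_mask(), struct slot_table and
lookup_masked() are made up here. index_mask() is a plain-C stand-in for
array_index_mask_nospec() and gives none of its speculation guarantees.
The sketch assumes the table always has at least one slot, so that
loading slot[0] for a bad index is safe.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical table type: 'max' valid slots, max >= 1 assumed. */
struct slot_table {
	void **slot;
	size_t max;
};

/* All ones when idx < size, zero otherwise (stand-in, not the kernel helper). */
static uintptr_t index_mask(size_t idx, size_t size)
{
	return (uintptr_t)0 - (uintptr_t)(idx < size);
}

static void *lookup_masked(const struct slot_table *t, size_t idx)
{
	uintptr_t mask = index_mask(idx, t->max);
	void *entry;

	assert(t->max >= 1);

	/* A bad index collapses to slot 0, so the load is always in bounds. */
	entry = t->slot[idx & mask];

	/* Whatever slot 0 held is then masked away for a bad index. */
	return (void *)((uintptr_t)entry & mask);
}
```

An in-range index returns the stored pointer; anything at or past 'max'
comes back as NULL, with a single NULL test left for the caller.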

^ permalink raw reply related	[relevance 74%]

* Re: [GIT PULL] fbdev fixes and updates for v6.7-rc3
  @ 2023-11-26 16:29 99% ` Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2023-11-26 16:29 UTC (permalink / raw)
  To: Helge Deller; +Cc: linux-kernel, linux-fbdev, dri-devel

On Sat, 25 Nov 2023 at 22:58, Helge Deller <deller@gmx.de> wrote:
>
> please pull some small fbdev fixes for 6.7-rc3.

These all seem to be pure cleanups, not bug fixes.

Please resend during the merge window.

            Linus

^ permalink raw reply	[relevance 99%]

* Re: [PATCHES] assorted dcache stuff
  @ 2023-11-24 21:41 99% ` Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2023-11-24 21:41 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, Christian Brauner, linux-kernel

On Thu, 23 Nov 2023 at 22:05, Al Viro <viro@zeniv.linux.org.uk> wrote:
>
>         Assorted dcache cleanups.

Looks obvious enough to me.

Famous last words.

                  Linus

^ permalink raw reply	[relevance 99%]

* Re: [RFC][PATCHSET v3] simplifying fast_dput(), dentry_kill() et.al.
    @ 2023-11-24 21:28 99% ` Linus Torvalds
  1 sibling, 0 replies; 200+ results
From: Linus Torvalds @ 2023-11-24 21:28 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, Christian Brauner, linux-kernel

On Thu, 23 Nov 2023 at 22:02, Al Viro <viro@zeniv.linux.org.uk> wrote:
>
>         The series below is the fallout of trying to document the dentry
> refcounting and life cycle - basically, getting rid of the bits that
> had been too subtle and ugly to write them up.

Apart from my RCU note, this looks like "Al knows what he's doing" to me.

Although I'm inclined to agree with Amir on the "no need to call out
kabi" on patch#3. It's also not like we've ever cared: as long as you
convert all users, kabi is simply not relevant.

            Linus

^ permalink raw reply	[relevance 99%]

* Re: [PATCH v3 02/21] coda_flag_children(): cope with dentries turning negative
  @ 2023-11-24 21:22 99%     ` Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2023-11-24 21:22 UTC (permalink / raw)
  To: Al Viro, Paul E. McKenney; +Cc: linux-fsdevel, Christian Brauner, linux-kernel

On Thu, 23 Nov 2023 at 22:04, Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> ->d_lock on parent does not stabilize ->d_inode of child.
> We don't do much with that inode in there, but we need
> at least to avoid struct inode getting freed under us...

Gaah. We've gone back and forth on this. Being non-preemptible is
already equivalent to rcu read locking.

From Documentation/RCU/rcu_dereference.rst:

                            With the new consolidated
        RCU flavors, an RCU read-side critical section is entered
        using rcu_read_lock(), anything that disables bottom halves,
        anything that disables interrupts, or anything that disables
        preemption.

so I actually think the coda code is already mostly fine, because that
parent spin_lock may not stabilize d_child per se, but it *does* imply
a RCU read lock.

So I think you should drop the rcu_read_lock/rcu_read_unlock from that patch.

But that

                struct inode *inode = d_inode_rcu(de);

conversion is required to get a stable inode pointer.

So half of this patch is unnecessary.

Adding Paul to the cc just to verify that the docs are up-to-date and
that we're still good here.

Because we've gone back-and-forth on the "spinlocks are an implied RCU
read-side critical section" a couple of times.

                  Linus

^ permalink raw reply	[relevance 99%]

* Re: [PATCH WIP v1 00/20] mm: precise "mapped shared" vs. "mapped exclusively" detection for PTE-mapped THP / partially-mappable folios
  @ 2023-11-24 20:55 92% ` Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2023-11-24 20:55 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-kernel, linux-mm, Andrew Morton, Ryan Roberts,
	Matthew Wilcox, Hugh Dickins, Yin Fengwei, Yang Shi, Ying Huang,
	Zi Yan, Peter Zijlstra, Ingo Molnar, Will Deacon, Waiman Long,
	Paul E. McKenney

On Fri, 24 Nov 2023 at 05:26, David Hildenbrand <david@redhat.com> wrote:
>
> Are you interested in some made-up math, new locking primitives and
> slightly unpleasant performance numbers on first sight? :)

Ugh. I'm not loving the "I have a proof, but it's too big to fit in
the margin" model of VM development.

This does seem to be very subtle.

Also, please benchmark what your rmap changes do to just plain regular
pages - it *looks* like maybe all you did was to add some
VM_WARN_ON_FOLIO() for those cases, but I have this strong memory of
that

        if (likely(!compound)) {

case being very critical on all the usual cases (and the cleanups by
Hugh last year were nice).

I get the feeling that you are trying to optimize a particular case
that is special enough that some less complicated model might work.

Just by looking at your benchmarks, I *think* the case you actually
want to optimize is "THP -> fork -> child exit/execve -> parent write
COW reuse" where the THP page was really never in more than two VM's,
and the second VM was an almost accidental temporary thing that is
just about the whole "fork->exec/exit" model.

Which makes me really feel like your rmap_id is very over-engineered.
It seems to be designed to handle all the generic cases, but it seems
like the main cause for it is a very specific case that I _feel_
should be something that could be tracked with *way* less information
(eg just have a "pointer to owner vma, and a simple counter of
non-owners").
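
One possible reading of that "owner pointer plus counter" idea, as a
userspace sketch only. Nothing here comes from the patch series: struct
owner_track and the track_*() helpers are hypothetical, the owner is an
opaque handle rather than a real vma, and each address space is assumed
to map the page at most once.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-page state: one owner plus a count of everyone else. */
struct owner_track {
	const void *owner;          /* opaque handle for the owning mapping */
	unsigned int nr_non_owners; /* mappings that are not the owner */
};

/* Record a new mapping. The very first mapper becomes the owner. */
static void track_map(struct owner_track *t, const void *who)
{
	assert(who != NULL);
	if (!t->owner && !t->nr_non_owners)
		t->owner = who;
	else if (t->owner != who)
		t->nr_non_owners++;
}

/* Drop a mapping. If the owner leaves, nobody is treated as exclusive. */
static void track_unmap(struct owner_track *t, const void *who)
{
	assert(who != NULL);
	if (t->owner == who)
		t->owner = NULL;
	else if (t->nr_non_owners > 0)
		t->nr_non_owners--;
}

/* True only for the owner, and only while no one else maps the page. */
static bool track_exclusive(const struct owner_track *t, const void *who)
{
	return t->owner == who && t->nr_non_owners == 0;
}
```

In the fork -> child exit -> parent write sequence, the parent stays the
owner throughout, the child bumps and then drops the counter, and the
parent is exclusive again by the time it writes.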

I dunno. I was cc'd, I looked at the patches, but I suspect I'm not
really the target audience. If Hugh is ok with this kind of
complexity, I bow to a higher authority. This *does* seem to add a lot
of conceptual complexity to something that is already complicated.

           Linus

^ permalink raw reply	[relevance 92%]

* Re: [GIT PULL] vfs fixes
  2023-11-24 18:52 99%   ` Linus Torvalds
@ 2023-11-24 20:12 99%     ` Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2023-11-24 20:12 UTC (permalink / raw)
  To: Christian Brauner, Omar Sandoval, David Howells
  Cc: linux-fsdevel, linux-kernel

On Fri, 24 Nov 2023 at 10:52, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> Gaah. I guess it's the VM_IOREMAP case that is the cause of all this horridness.
>
> So we'd have to know not to mess with IO mappings. Annoying.

Doing a debian code search, I see a number of programs that do a
"stat()" on the kcore file, to get some notion of "system memory
size". I don't think it's valid, but whatever. We probably shouldn't
change it.

I also see some programs that actually read the ELF notes and sections
for dumping purposes.

But does anybody actually run gdb on that thing or similar? That's the
original model for that file, but it was always more of a gimmick than
anything else.

Because we could just say "read zeroes from KCORE_VMALLOC" and be done
with it that way.

                  Linus

^ permalink raw reply	[relevance 99%]

* Re: [GIT PULL] vfs fixes
  2023-11-24 18:25 60% ` Linus Torvalds
@ 2023-11-24 18:52 99%   ` Linus Torvalds
  2023-11-24 20:12 99%     ` Linus Torvalds
  0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2023-11-24 18:52 UTC (permalink / raw)
  To: Christian Brauner, Omar Sandoval, David Howells
  Cc: linux-fsdevel, linux-kernel

On Fri, 24 Nov 2023 at 10:25, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> I just like how the patch looks:
>
>  6 files changed, 1 insertion(+), 368 deletions(-)
>
> and those 350+ deleted lines really looked disgusting to me.

Gaah. I guess it's the VM_IOREMAP case that is the cause of all this horridness.

So we'd have to know not to mess with IO mappings. Annoying.

But I think my patch could still act as a starting point, except that

                case KCORE_VMALLOC:

would have to have some kind of "if this is not a regular vmalloc,
just skip it" logic.

So I guess we can't remove all those lines, but we *could* replace all
the vread_iter() code with a "bool can_I_vread_this()" instead. So the
fallback would still be to just do the bounce buffer copy.

Or something.

Oh well.

               Linus

^ permalink raw reply	[relevance 99%]

* Re: [GIT PULL] afs: Miscellaneous fixes
  @ 2023-11-24 18:39 99% ` Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2023-11-24 18:39 UTC (permalink / raw)
  To: David Howells; +Cc: Marc Dionne, linux-afs, linux-fsdevel, linux-kernel

On Fri, 24 Nov 2023 at 07:52, David Howells <dhowells@redhat.com> wrote:
> Btw, I did want to ask about (5): Does a superblock being marked SB_RDONLY
> imply immutability to the application?

Obviously not - any network filesystem can and will change from under
you, even if the local copy is read-only.

So SB_RDONLY can only mean that writes to that instance of the
filesystem will fail.

It's a bit stronger than MNT_READONLY, in that for a *local*
filesystem, SB_RDONLY tends to mean that it's truly immutable (while
MNT_READONLY is obviously per mount) but even then some sub-mount
thing (and I guess the AFS snapshot is a good example of that) might
expose the same filesystem through multiple superblocks.

Exactly like a network filesystem inevitably will.

In any case, any user space that thinks SB_RDONLY is some kind of
immutability signal is clearly buggy. At a minimum, such user space
would have to limit itself to particular filesystem types and say "I
know _this_ filesystem can have only one superblock". And I'd argue
that while that might work in practice, it's insane.

                Linus

^ permalink raw reply	[relevance 99%]

* Re: [GIT PULL] vfs fixes
  @ 2023-11-24 18:25 60% ` Linus Torvalds
  2023-11-24 18:52 99%   ` Linus Torvalds
  0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2023-11-24 18:25 UTC (permalink / raw)
  To: Christian Brauner, Omar Sandoval, David Howells
  Cc: linux-fsdevel, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 2661 bytes --]

On Fri, 24 Nov 2023 at 02:28, Christian Brauner <brauner@kernel.org> wrote:
>
> * Fix a bug introduced with the iov_iter rework from last cycle.
>
>   This broke /proc/kcore by copying too much and without the correct
>   offset.

Ugh. I think the whole /proc/kcore vmalloc handling is just COMPLETELY broken.

It does this:

        /*
         * vmalloc uses spinlocks, so we optimistically try to
         * read memory. If this fails, fault pages in and try
         * again until we are done.
         */
        while (true) {
                read += vread_iter(iter, src, left);
                if (read == tsz)
                        break;

                src += read;
                left -= read;

                if (fault_in_iov_iter_writeable(iter, left)) {
                        ret = -EFAULT;
                        goto out;
                }
        }


and that is just broken beyond words for two totally independent reasons:

 (a) vread_iter() looks like it can fail because of not having a
     source, and return 0 (I dunno - it seems to try to avoid that, but
     it all looks pretty dodgy)

     At that point fault_in_iov_iter_writeable() will try to fault in
     the destination, which may work just fine, but if the source was
     the problem, you'd have an endless loop.

 (b) That "read += X" is completely broken anyway. It should be just a
     "=". So that whole loop is crazy broken, and only works for the
     case where you get it all in one go. This code is crap.
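
To make point (b) concrete, here is a small userspace trace of the
offsets such a loop would request when the copy helper returns short
counts. It is a sketch under assumptions: short_read() is a made-up
stand-in that hands back at most CHUNK bytes per call, no memory is
actually copied, and MAX_STEPS only exists to bound the trace. The
second function shows one way of keeping the per-call count and the
running total apart; it is not what the attached patch does, which
removes the loop entirely.

```c
#include <assert.h>
#include <stddef.h>

#define CHUNK     3	/* assumed: helper copies at most this much per call */
#define MAX_STEPS 8	/* bound on the trace length, for the demo only */

struct step {
	size_t off;	/* source offset requested */
	size_t len;	/* bytes the helper reported */
};

static size_t short_read(size_t left)
{
	return left < CHUNK ? left : CHUNK;
}

/* Same shape as the quoted loop: 'read' is both total and advance. */
static int buggy_steps(size_t tsz, struct step *trace)
{
	size_t read = 0, off = 0, left = tsz;
	int n = 0;

	assert(trace != NULL);
	while (n < MAX_STEPS) {
		size_t got = short_read(left);

		trace[n].off = off;
		trace[n].len = got;
		n++;
		read += got;
		if (read == tsz)
			break;
		off += read;	/* advances by the running total */
		left -= read;	/* ditto, and can wrap around */
	}
	return n;
}

/* Advance by what this call returned, compare the total against tsz. */
static int fixed_steps(size_t tsz, struct step *trace)
{
	size_t done = 0, off = 0, left = tsz;
	int n = 0;

	assert(trace != NULL);
	while (n < MAX_STEPS) {
		size_t got = short_read(left);

		trace[n].off = off;
		trace[n].len = got;
		n++;
		done += got;
		if (done == tsz)
			break;
		off += got;
		left -= got;
	}
	return n;
}
```

With a 9-byte request, the second variant asks for offsets 0, 3 and 6,
while the first one jumps from 3 to 9 and then to 15, so it skips part
of the source and runs past the end of the range.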

Now, I think it all works in practice for one simple reason: I doubt
anybody uses this (and it looks like the callees in the while loop try
very hard to always fill the whole area - maybe people noticed the
first bug and tried to work around it that way).

I guess there is at least one test program, but it presumably doesn't
trigger or care about the bugs here.

But I think we can get rid of this all, and just deal with the
KCORE_VMALLOC case exactly the same way we already deal with VMEMMAP
and TEXT: by just doing copy_from_kernel_nofault() into a bounce
buffer, and then doing a regular _copy_to_iter() or whatever.

NOTE! I looked at the code, and threw up in my mouth a little, and
maybe I missed something. Maybe it all works fine. But Omar - since
you found the original problem, may I implore you to test this
attached patch?

I just like how the patch looks:

 6 files changed, 1 insertion(+), 368 deletions(-)

and those 350+ deleted lines really looked disgusting to me.

This patch is on top of the pull I did, because obviously the fix in
that pull was correct, I just think we should go further and get rid
of this whole mess entirely.

                Linus

[-- Attachment #2: patch.diff --]
[-- Type: text/x-patch, Size: 11981 bytes --]

 fs/proc/kcore.c         |  26 +----
 include/linux/uio.h     |   3 -
 include/linux/vmalloc.h |   3 -
 lib/iov_iter.c          |  33 ------
 mm/nommu.c              |   9 --
 mm/vmalloc.c            | 295 ------------------------------------------------
 6 files changed, 1 insertion(+), 368 deletions(-)

diff --git a/fs/proc/kcore.c b/fs/proc/kcore.c
index 6422e569b080..83a39f4d1ddc 100644
--- a/fs/proc/kcore.c
+++ b/fs/proc/kcore.c
@@ -504,31 +504,6 @@ static ssize_t read_kcore_iter(struct kiocb *iocb, struct iov_iter *iter)
 		}
 
 		switch (m->type) {
-		case KCORE_VMALLOC:
-		{
-			const char *src = (char *)start;
-			size_t read = 0, left = tsz;
-
-			/*
-			 * vmalloc uses spinlocks, so we optimistically try to
-			 * read memory. If this fails, fault pages in and try
-			 * again until we are done.
-			 */
-			while (true) {
-				read += vread_iter(iter, src, left);
-				if (read == tsz)
-					break;
-
-				src += read;
-				left -= read;
-
-				if (fault_in_iov_iter_writeable(iter, left)) {
-					ret = -EFAULT;
-					goto out;
-				}
-			}
-			break;
-		}
 		case KCORE_USER:
 			/* User page is handled prior to normal kernel page: */
 			if (copy_to_iter((char *)start, tsz, iter) != tsz) {
@@ -555,6 +530,7 @@ static ssize_t read_kcore_iter(struct kiocb *iocb, struct iov_iter *iter)
 				break;
 			}
 			fallthrough;
+		case KCORE_VMALLOC:
 		case KCORE_VMEMMAP:
 		case KCORE_TEXT:
 			/*
diff --git a/include/linux/uio.h b/include/linux/uio.h
index b6214cbf2a43..993a6bd8bdd3 100644
--- a/include/linux/uio.h
+++ b/include/linux/uio.h
@@ -187,9 +187,6 @@ static inline size_t copy_folio_from_iter_atomic(struct folio *folio,
 	return copy_page_from_iter_atomic(&folio->page, offset, bytes, i);
 }
 
-size_t copy_page_to_iter_nofault(struct page *page, unsigned offset,
-				 size_t bytes, struct iov_iter *i);
-
 static __always_inline __must_check
 size_t copy_to_iter(const void *addr, size_t bytes, struct iov_iter *i)
 {
diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
index c720be70c8dd..f8885045f4d2 100644
--- a/include/linux/vmalloc.h
+++ b/include/linux/vmalloc.h
@@ -247,9 +247,6 @@ static inline void set_vm_flush_reset_perms(void *addr)
 }
 #endif
 
-/* for /proc/kcore */
-extern long vread_iter(struct iov_iter *iter, const char *addr, size_t count);
-
 /*
  *	Internals.  Don't use..
  */
diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index 8ff6824a1005..6d2b79973622 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -394,39 +394,6 @@ size_t copy_page_to_iter(struct page *page, size_t offset, size_t bytes,
 }
 EXPORT_SYMBOL(copy_page_to_iter);
 
-size_t copy_page_to_iter_nofault(struct page *page, unsigned offset, size_t bytes,
-				 struct iov_iter *i)
-{
-	size_t res = 0;
-
-	if (!page_copy_sane(page, offset, bytes))
-		return 0;
-	if (WARN_ON_ONCE(i->data_source))
-		return 0;
-	page += offset / PAGE_SIZE; // first subpage
-	offset %= PAGE_SIZE;
-	while (1) {
-		void *kaddr = kmap_local_page(page);
-		size_t n = min(bytes, (size_t)PAGE_SIZE - offset);
-
-		n = iterate_and_advance(i, n, kaddr + offset,
-					copy_to_user_iter_nofault,
-					memcpy_to_iter);
-		kunmap_local(kaddr);
-		res += n;
-		bytes -= n;
-		if (!bytes || !n)
-			break;
-		offset += n;
-		if (offset == PAGE_SIZE) {
-			page++;
-			offset = 0;
-		}
-	}
-	return res;
-}
-EXPORT_SYMBOL(copy_page_to_iter_nofault);
-
 size_t copy_page_from_iter(struct page *page, size_t offset, size_t bytes,
 			 struct iov_iter *i)
 {
diff --git a/mm/nommu.c b/mm/nommu.c
index b6dc558d3144..1612b3a601fd 100644
--- a/mm/nommu.c
+++ b/mm/nommu.c
@@ -199,15 +199,6 @@ unsigned long vmalloc_to_pfn(const void *addr)
 }
 EXPORT_SYMBOL(vmalloc_to_pfn);
 
-long vread_iter(struct iov_iter *iter, const char *addr, size_t count)
-{
-	/* Don't allow overflow */
-	if ((unsigned long) addr + count < count)
-		count = -(unsigned long) addr;
-
-	return copy_to_iter(addr, count, iter);
-}
-
 /*
  *	vmalloc  -  allocate virtually contiguous memory
  *
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index d12a17fc0c17..79889a10e18d 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -802,31 +802,6 @@ unsigned long vmalloc_nr_pages(void)
 	return atomic_long_read(&nr_vmalloc_pages);
 }
 
-/* Look up the first VA which satisfies addr < va_end, NULL if none. */
-static struct vmap_area *find_vmap_area_exceed_addr(unsigned long addr)
-{
-	struct vmap_area *va = NULL;
-	struct rb_node *n = vmap_area_root.rb_node;
-
-	addr = (unsigned long)kasan_reset_tag((void *)addr);
-
-	while (n) {
-		struct vmap_area *tmp;
-
-		tmp = rb_entry(n, struct vmap_area, rb_node);
-		if (tmp->va_end > addr) {
-			va = tmp;
-			if (tmp->va_start <= addr)
-				break;
-
-			n = n->rb_left;
-		} else
-			n = n->rb_right;
-	}
-
-	return va;
-}
-
 static struct vmap_area *__find_vmap_area(unsigned long addr, struct rb_root *root)
 {
 	struct rb_node *n = root->rb_node;
@@ -3562,276 +3537,6 @@ void *vmalloc_32_user(unsigned long size)
 }
 EXPORT_SYMBOL(vmalloc_32_user);
 
-/*
- * Atomically zero bytes in the iterator.
- *
- * Returns the number of zeroed bytes.
- */
-static size_t zero_iter(struct iov_iter *iter, size_t count)
-{
-	size_t remains = count;
-
-	while (remains > 0) {
-		size_t num, copied;
-
-		num = min_t(size_t, remains, PAGE_SIZE);
-		copied = copy_page_to_iter_nofault(ZERO_PAGE(0), 0, num, iter);
-		remains -= copied;
-
-		if (copied < num)
-			break;
-	}
-
-	return count - remains;
-}
-
-/*
- * small helper routine, copy contents to iter from addr.
- * If the page is not present, fill zero.
- *
- * Returns the number of copied bytes.
- */
-static size_t aligned_vread_iter(struct iov_iter *iter,
-				 const char *addr, size_t count)
-{
-	size_t remains = count;
-	struct page *page;
-
-	while (remains > 0) {
-		unsigned long offset, length;
-		size_t copied = 0;
-
-		offset = offset_in_page(addr);
-		length = PAGE_SIZE - offset;
-		if (length > remains)
-			length = remains;
-		page = vmalloc_to_page(addr);
-		/*
-		 * To do safe access to this _mapped_ area, we need lock. But
-		 * adding lock here means that we need to add overhead of
-		 * vmalloc()/vfree() calls for this _debug_ interface, rarely
-		 * used. Instead of that, we'll use an local mapping via
-		 * copy_page_to_iter_nofault() and accept a small overhead in
-		 * this access function.
-		 */
-		if (page)
-			copied = copy_page_to_iter_nofault(page, offset,
-							   length, iter);
-		else
-			copied = zero_iter(iter, length);
-
-		addr += copied;
-		remains -= copied;
-
-		if (copied != length)
-			break;
-	}
-
-	return count - remains;
-}
-
-/*
- * Read from a vm_map_ram region of memory.
- *
- * Returns the number of copied bytes.
- */
-static size_t vmap_ram_vread_iter(struct iov_iter *iter, const char *addr,
-				  size_t count, unsigned long flags)
-{
-	char *start;
-	struct vmap_block *vb;
-	struct xarray *xa;
-	unsigned long offset;
-	unsigned int rs, re;
-	size_t remains, n;
-
-	/*
-	 * If it's area created by vm_map_ram() interface directly, but
-	 * not further subdividing and delegating management to vmap_block,
-	 * handle it here.
-	 */
-	if (!(flags & VMAP_BLOCK))
-		return aligned_vread_iter(iter, addr, count);
-
-	remains = count;
-
-	/*
-	 * Area is split into regions and tracked with vmap_block, read out
-	 * each region and zero fill the hole between regions.
-	 */
-	xa = addr_to_vb_xa((unsigned long) addr);
-	vb = xa_load(xa, addr_to_vb_idx((unsigned long)addr));
-	if (!vb)
-		goto finished_zero;
-
-	spin_lock(&vb->lock);
-	if (bitmap_empty(vb->used_map, VMAP_BBMAP_BITS)) {
-		spin_unlock(&vb->lock);
-		goto finished_zero;
-	}
-
-	for_each_set_bitrange(rs, re, vb->used_map, VMAP_BBMAP_BITS) {
-		size_t copied;
-
-		if (remains == 0)
-			goto finished;
-
-		start = vmap_block_vaddr(vb->va->va_start, rs);
-
-		if (addr < start) {
-			size_t to_zero = min_t(size_t, start - addr, remains);
-			size_t zeroed = zero_iter(iter, to_zero);
-
-			addr += zeroed;
-			remains -= zeroed;
-
-			if (remains == 0 || zeroed != to_zero)
-				goto finished;
-		}
-
-		/*it could start reading from the middle of used region*/
-		offset = offset_in_page(addr);
-		n = ((re - rs + 1) << PAGE_SHIFT) - offset;
-		if (n > remains)
-			n = remains;
-
-		copied = aligned_vread_iter(iter, start + offset, n);
-
-		addr += copied;
-		remains -= copied;
-
-		if (copied != n)
-			goto finished;
-	}
-
-	spin_unlock(&vb->lock);
-
-finished_zero:
-	/* zero-fill the left dirty or free regions */
-	return count - remains + zero_iter(iter, remains);
-finished:
-	/* We couldn't copy/zero everything */
-	spin_unlock(&vb->lock);
-	return count - remains;
-}
-
-/**
- * vread_iter() - read vmalloc area in a safe way to an iterator.
- * @iter:         the iterator to which data should be written.
- * @addr:         vm address.
- * @count:        number of bytes to be read.
- *
- * This function checks that addr is a valid vmalloc'ed area, and
- * copy data from that area to a given buffer. If the given memory range
- * of [addr...addr+count) includes some valid address, data is copied to
- * proper area of @buf. If there are memory holes, they'll be zero-filled.
- * IOREMAP area is treated as memory hole and no copy is done.
- *
- * If [addr...addr+count) doesn't includes any intersects with alive
- * vm_struct area, returns 0. @buf should be kernel's buffer.
- *
- * Note: In usual ops, vread() is never necessary because the caller
- * should know vmalloc() area is valid and can use memcpy().
- * This is for routines which have to access vmalloc area without
- * any information, as /proc/kcore.
- *
- * Return: number of bytes for which addr and buf should be increased
- * (same number as @count) or %0 if [addr...addr+count) doesn't
- * include any intersection with valid vmalloc area
- */
-long vread_iter(struct iov_iter *iter, const char *addr, size_t count)
-{
-	struct vmap_area *va;
-	struct vm_struct *vm;
-	char *vaddr;
-	size_t n, size, flags, remains;
-
-	addr = kasan_reset_tag(addr);
-
-	/* Don't allow overflow */
-	if ((unsigned long) addr + count < count)
-		count = -(unsigned long) addr;
-
-	remains = count;
-
-	spin_lock(&vmap_area_lock);
-	va = find_vmap_area_exceed_addr((unsigned long)addr);
-	if (!va)
-		goto finished_zero;
-
-	/* no intersects with alive vmap_area */
-	if ((unsigned long)addr + remains <= va->va_start)
-		goto finished_zero;
-
-	list_for_each_entry_from(va, &vmap_area_list, list) {
-		size_t copied;
-
-		if (remains == 0)
-			goto finished;
-
-		vm = va->vm;
-		flags = va->flags & VMAP_FLAGS_MASK;
-		/*
-		 * VMAP_BLOCK indicates a sub-type of vm_map_ram area, need
-		 * be set together with VMAP_RAM.
-		 */
-		WARN_ON(flags == VMAP_BLOCK);
-
-		if (!vm && !flags)
-			continue;
-
-		if (vm && (vm->flags & VM_UNINITIALIZED))
-			continue;
-
-		/* Pair with smp_wmb() in clear_vm_uninitialized_flag() */
-		smp_rmb();
-
-		vaddr = (char *) va->va_start;
-		size = vm ? get_vm_area_size(vm) : va_size(va);
-
-		if (addr >= vaddr + size)
-			continue;
-
-		if (addr < vaddr) {
-			size_t to_zero = min_t(size_t, vaddr - addr, remains);
-			size_t zeroed = zero_iter(iter, to_zero);
-
-			addr += zeroed;
-			remains -= zeroed;
-
-			if (remains == 0 || zeroed != to_zero)
-				goto finished;
-		}
-
-		n = vaddr + size - addr;
-		if (n > remains)
-			n = remains;
-
-		if (flags & VMAP_RAM)
-			copied = vmap_ram_vread_iter(iter, addr, n, flags);
-		else if (!(vm && (vm->flags & VM_IOREMAP)))
-			copied = aligned_vread_iter(iter, addr, n);
-		else /* IOREMAP area is treated as memory hole */
-			copied = zero_iter(iter, n);
-
-		addr += copied;
-		remains -= copied;
-
-		if (copied != n)
-			goto finished;
-	}
-
-finished_zero:
-	spin_unlock(&vmap_area_lock);
-	/* zero-fill memory holes */
-	return count - remains + zero_iter(iter, remains);
-finished:
-	/* Nothing remains, or We couldn't copy/zero everything. */
-	spin_unlock(&vmap_area_lock);
-
-	return count - remains;
-}
-
 /**
  * remap_vmalloc_range_partial - map vmalloc pages to userspace
  * @vma:		vma to cover

^ permalink raw reply related	[relevance 60%]

* Re: [tip: x86/mm] x86/mm: Ensure input to pfn_to_kaddr() is treated as a 64-bit type
  @ 2023-11-23 19:00 95%   ` Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2023-11-23 19:00 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-tip-commits, Dave Hansen, H. Peter Anvin, Michael Roth,
	Ingo Molnar, Andy Lutomirski, Peter Zijlstra, Rik van Riel, x86

On Thu, 23 Nov 2023 at 02:31, tip-bot2 for Michael Roth
<tip-bot2@linutronix.de> wrote:
>
> On 64-bit platforms, the pfn_to_kaddr() macro requires that the input
> value is 64 bits in order to ensure that valid address bits don't get
> lost when shifting that input by PAGE_SHIFT to calculate the physical
> address to provide a virtual address for.

Bah. The commit is obviously fine, but can we please just get rid of
that broken pfn_to_kaddr() thing entirely?

It's a bogus mis-spelling of pfn_to_virt(), and I don't know why that
horrid thing exists. In *all* other situations we talk about "virt"
for kernel virtual addresses, I don't know where that horrid "kaddr"
came from (ie "virt_to_page()" and friends).

Most notably, we have "virt_to_pfn()" being quite commonly used. We
don't even have that kaddr_to_pfn(), which just shows *how* bogus this
whole "pfn_to_kaddr()" crud is.

The good news is that there aren't a ton of users. Anybody willing to
just do a search-and-replace and get rid of this pointless and wrong
thing?

Using "pfn_to_virt()" has the added advantage that we have a generic
implementation of it that isn't duplicated pointlessly for N
architectures, and that didn't have this bug:

  static inline void *pfn_to_virt(unsigned long pfn)
  {
        return __va(pfn << PAGE_SHIFT);
  }
  #define pfn_to_virt pfn_to_virt

Hmm?

Amusingly (or sadly), we have s390 holding up the flag of sanity, and having

    #define pfn_to_kaddr(pfn)  pfn_to_virt(pfn)

and then we'd only need to fix the hexagon version of that macro
(since Hexagon made its own version, with the old bug - but I guess
Hexagon is 32-bit only and hopefully never grows 64-bit (??) so maybe
nobody cares).
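
For reference, the truncation being discussed is easy to reproduce in
userspace. This is only a sketch: DEMO_PAGE_SHIFT and the two helper
names are made up, 4 KiB pages are assumed, and plain integers stand in
for kernel addresses.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_PAGE_SHIFT 12	/* assumption: 4 KiB pages */

/* Shift done in 32 bits: anything above bit 31 is lost. */
static uint64_t phys_from_pfn_narrow(uint32_t pfn)
{
	return (uint32_t)(pfn << DEMO_PAGE_SHIFT);
}

/* Widen first, then shift: the high bits survive. */
static uint64_t phys_from_pfn_wide(uint32_t pfn)
{
	return (uint64_t)pfn << DEMO_PAGE_SHIFT;
}
```

A pfn of 0x100000 (the 4 GiB mark) comes out as 0 from the narrow
version and as 0x100000000 from the wide one; below 4 GiB the two agree.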

           Linus

^ permalink raw reply	[relevance 95%]

* Re: [regression] microcode files missing in initramfs imgages from dracut (was Re: [PATCH] x86: Clean up remaining references to CONFIG_MICROCODE_AMD)
  @ 2023-11-22 21:08 94%                 ` Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2023-11-22 21:08 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Linux regressions mailing list, lukas.bulwahn, dave.hansen, hpa,
	kernel-janitors, linux-kernel, mingo, tglx, x86

On Wed, 22 Nov 2023 at 12:51, Borislav Petkov <bp@alien8.de> wrote:
>
> My only worry here is that we're making a precedent and basically saying
> that it is ok for tools to grep .config to figure out what is supported
> by the kernel. And then other tools might follow.

Yes, I agree that it's not optimal, but I would hate to have some odd
"let's add another ELF note" churn too, for (presumably) increasingly
obscure reasons.

It looks like dracut has been doing this forever, and in fact back in
2015 apparently had the exact same issue (that never made it to kernel
developers, or at least not to me), when the kernel
CONFIG_MICROCODE_xyz_EARLY config went away, and became just
CONFIG_MICROCODE_xyz.

The whole "check kernel config" in dracut seems to go back to 2014, so
it's been that way for almost a decade by now.

Honestly, I think the right approach may be to just remove the check
again from dracut entirely - the intent seems to be to make the initrd
smaller when people don't support microcode updates, but does that
ever actually *happen*?

There are dracut command lines, like "--early-microcode" and
"--no-early-microcode", so people who really want to save space could
just force it that way. Doing the CONFIG_xyz check seems broken.

But that's for the dracut people to worry about.

I guess we on the kernel side could help with "make install" etc, but
we've (intentionally) tried to insulate us from distros having
distro-specific installkernel scripts, so we don't really have a good
way to pass information down to the installkernel side.

It *would* make sense if we just had some actual arguments we might
pass down. Right now we just do

        exec "${file}" "${KERNELRELEASE}" "${KBUILD_IMAGE}" System.map \
                "${INSTALL_PATH}"

so basically the only argument we pass down is that INSTALL_PATH
(which is just "/boot" by default).

            Linus


* Re: [regression] microcode files missing in initramfs imgages from dracut (was Re: [PATCH] x86: Clean up remaining references to CONFIG_MICROCODE_AMD)
  @ 2023-11-22 20:35 97%             ` Linus Torvalds
    0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2023-11-22 20:35 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Linux regressions mailing list, lukas.bulwahn, dave.hansen, hpa,
	kernel-janitors, linux-kernel, mingo, tglx, x86

On Wed, 22 Nov 2023 at 07:58, Borislav Petkov <bp@alien8.de> wrote:
>
> IMO, yes, we should not break userspace but dracut is special. And it
> parses willy nilly kernel internals which are not ABI to begin with.

I don't think the "dracut is special" is the thing that matters.

The real issue is that hey, if dracut in its incompetence doesn't
include the microcode on the initrd, that doesn't really matter much.
It's fairly easily fixable, and at worst it will mean that we end up
having CPU mitigations that aren't optimal. Since most of those are BS
anyway, it really doesn't seem critical.

Sure, it's a "regression" in that you don't get the microcode update
included, but from a user perspective things should still continue to
work.

End result: this seems to be pretty solidly a distro issue.

IOW, the whole "users are the only thing that matters" pretty much
means that it's a non-issue. Things continued to work, to the point
that I'm actually surprised anybody even noticed.

That said, I don't think some ELF note is the fix either. I think we
might as well leave it at CONFIG_MICROCODE. Maybe add a note in the
kernel Kconfig that this thing matters for dracut.

Dracut also checks for CONFIG_ACPI_INITRD_TABLE_OVERRIDE. It's a
similar "normal users don't care" case.

              Linus


* Re: [PATCH 2/2] zstd: Backport Huffman speed improvement from upstream
  @ 2023-11-21 20:54 97%           ` Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2023-11-21 20:54 UTC (permalink / raw)
  To: Nick Terrell
  Cc: Nick Terrell, Linux Kernel Mailing List, Yann Collet,
	Kernel Team, Giovanni Cabiddu

On Tue, 21 Nov 2023 at 12:35, Nick Terrell <terrelln@meta.com> wrote:
> >
> > Honestly, any coding rule that includes "don't use the do-while-zero
> > construct" is actively broken shit.
> >
> > Please just fix your upstream rules. Because they are incredible garbage.
>
> Yeah, that’s the plan. Visual Studios fixed that compiler bug in VS2015 [0],
> so we should be safe to migrate to safer macros.

I don't even use MSVS, but a minute of googling shows that you should
never have done that silly "avoid sane C", and you should always just
have done

  #pragma warning (disable: 4127)

for MSVC.

Honestly, the fact that the result was instead to disable that
standard - and required - construct in the project makes me worry
about the whole zstd thing. WTF?

The do-while-zero construct is _so_ important that there are (sane)
projects that literally *require* the use of it. See for example MISRA
code safety rules.

The kernel rules aren't quite that strict, but yes, do-while-zero is
very much "you should *absolutely* do this" along with all the usual
"make sure you have parentheses around macro arguments" rules.

We had some RFC patches for this area:

   https://lore.kernel.org/all/20230511152951.1970870-1-mathieu.desnoyers@efficios.com/

And on that note, when I googled for the solution to the MSVC brain
damage, I was distressed by how many hits I saw where people thought
the do-while-zero pattern was some "legacy pattern".

That just shows that there are lots of incompetent people who simply do
not understand why it's actually *required* for reliable parsing of
macros.  This is not some "historical stylistic" issue, it's literally
a correctness issue for generic macro usage.
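
To make the correctness point concrete, here is a minimal sketch. It is not
taken from the zstd or kernel sources, and every macro and function name in
it is made up for illustration. A bare-brace macro followed by the usual ';'
expands to a block plus an empty statement, which cuts an unbraced if off
from its else. The do-while-zero form expands to a single statement that
needs the ';', so it can be used anywhere a function call can.

```c
#include <assert.h>

static int first_calls, second_calls, fallback_calls;

static void first(void)    { first_calls++; }
static void second(void)   { second_calls++; }
static void fallback(void) { fallback_calls++; }

/*
 * Bare-brace form (hypothetical): "BOTH_BRACES();" expands to "{ ... };".
 * The stray ';' terminates the if statement, so a following "else" is a
 * syntax error unless the caller drops the ';' at the call site.
 */
#define BOTH_BRACES()  { first(); second(); }

/* do-while-zero form: expands to one statement that requires the ';'. */
#define BOTH()  do { first(); second(); } while (0)

static void dispatch(int cond)
{
    if (cond)
        BOTH();     /* swapping in BOTH_BRACES() here breaks the build */
    else
        fallback();
}

/* Sanity check: each branch runs exactly the calls it should. */
static void check_dispatch(void)
{
    dispatch(1);
    assert(first_calls == 1 && second_calls == 1 && fallback_calls == 0);
    dispatch(0);
    assert(first_calls == 1 && second_calls == 1 && fallback_calls == 1);
}
```

Leaving out the wrapper entirely is worse still: with a plain
"first(); second();" body, only the first call would be guarded by the if.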

           Linus


* Re: [PATCH 2/2] zstd: Backport Huffman speed improvement from upstream
  @ 2023-11-21 20:12 99%       ` Linus Torvalds
    0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2023-11-21 20:12 UTC (permalink / raw)
  To: Nick Terrell
  Cc: Nick Terrell, Linux Kernel Mailing List, Yann Collet,
	Kernel Team, Giovanni Cabiddu

On Tue, 21 Nov 2023 at 11:59, Nick Terrell <terrelln@meta.com> wrote:
>
> W.r.t. do { } while (0), our older Visual Studios CI jobs failed on the
> do { } while (0) macros, because it complained about constant false
> branches.

Wow. That is some truly incompetent compiler people.

I mean, really. As in "Why would you ever use that kind of garbage"
incompetence.

Honestly, any coding rule that includes "don't use the do-while-zero
construct" is actively broken shit.

Please just fix your upstream rules. Because they are incredible garbage.

               Linus


* Re: [PATCH 2/2] zstd: Backport Huffman speed improvement from upstream
  @ 2023-11-21 17:23 99%   ` Linus Torvalds
    0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 2023-11-21 17:23 UTC (permalink / raw)
  To: Nick Terrell
  Cc: linux-kernel, Yann Collet, Nick Terrell, Kernel Team,
	Giovanni Cabiddu, Nick Terrell

On Mon, 20 Nov 2023 at 16:52, Nick Terrell <nickrterrell@gmail.com> wrote:
>
> +/* Calls X(N) for each stream 0, 1, 2, 3. */
> +#define HUF_4X_FOR_EACH_STREAM(X) \
> +    {                             \
> +        X(0)                      \
> +        X(1)                      \
> +        X(2)                      \
> +        X(3)                      \
> +    }
> +
> +/* Calls X(N, var) for each stream 0, 1, 2, 3. */
> +#define HUF_4X_FOR_EACH_STREAM_WITH_VAR(X, var) \
> +    {                                           \
> +        X(0, (var))                             \
> +        X(1, (var))                             \
> +        X(2, (var))                             \
> +        X(3, (var))                             \
> +    }
> +

What shitty compilers do you need to be compatible with?

Because at least for Linux, the above is one single #define:

    #define FOUR(X,y...) do { \
        X(0,##y); X(1,##y); X(2,##y); X(3,##y); \
    } while (0)

and it does the right thing for any number of arguments, ie

    FOUR(fn)
    FOUR(fn1,a)
    FOUR(fn2,a,b)

expands to

    do { fn(0); fn(1); fn(2); fn(3); } while (0)
    do { fn1(0,a); fn1(1,a); fn1(2,a); fn1(3,a); } while (0)
    do { fn2(0,a,b); fn2(1,a,b); fn2(2,a,b); fn2(3,a,b); } while (0)

so unless you need to support some completely garbage compiler
upstream, please just do the single #define.
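
As a rough, compilable sketch of the single #define in use: the callback
names below (add_index, add_scaled, add_affine) and the sum_* helpers are
hypothetical and exist only for this illustration. The macro relies on the
GNU named-variadic and ", ##" extensions, so this assumes gcc or clang in
their default GNU mode.

```c
#include <assert.h>

/* The single macro from above: calls X(N, extra args...) for N = 0..3. */
#define FOUR(X, y...) do { \
    X(0, ##y); X(1, ##y); X(2, ##y); X(3, ##y); \
} while (0)

static int total;

/* Hypothetical per-stream callbacks taking zero, one and two extra args. */
static void add_index(int i)                { total += i; }
static void add_scaled(int i, int k)        { total += i * k; }
static void add_affine(int i, int k, int b) { total += i * k + b; }

static int sum_plain(void)
{
    total = 0;
    FOUR(add_index);            /* 0 + 1 + 2 + 3 */
    return total;
}

static int sum_scaled(int k)
{
    total = 0;
    FOUR(add_scaled, k);        /* (0 + 1 + 2 + 3) * k */
    return total;
}

static int sum_affine(int k, int b)
{
    total = 0;
    FOUR(add_affine, k, b);     /* (0 + 1 + 2 + 3) * k + 4 * b */
    return total;
}

/* Sanity check of the three call shapes. */
static void check_four(void)
{
    assert(sum_plain() == 6);
    assert(sum_scaled(10) == 60);
    assert(sum_affine(2, 1) == 16);
}
```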

               Linus


* Re: [PULL REQUEST] i2c-for-6.7-rc2
  @ 2023-11-20 17:32 99%     ` Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 2023-11-20 17:32 UTC (permalink / raw)
  To: Will Deacon
  Cc: Wolfram Sang, linux-i2c, linux-kernel, Peter Rosin,
	Bartosz Golaszewski, Andi Shyti, Catalin Marinas

On Mon, 20 Nov 2023 at 07:05, Will Deacon <will@kernel.org> wrote:
>
> and I think the high-level problem was something like:
>
> 1. CPU x writes some stuff to memory (I think one example was i2c_dw_xfer()
>    setting 'dev->msg_read_idx' to 0)
> 2. CPU x writes to an I/O register on this I2C controller which generates
>    an IRQ (end of i2c_dw_xfer_init())
> 3. CPU y takes the IRQ
> 4. CPU y reads 'dev->msg_read_idx' and doesn't see the write from (1)
>
> (i2c folks: please chime in if I got this wrong)
>
> the issue being that the writes in (1) are not ordered before the I/O
> access in (2) if the relaxed accessor is used.

Ok, then removing relaxed is indeed the right thing to do. Because
yes, it's an actual ordering issue with the IO write, not some locking
issue.
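
The ordering requirement can be sketched in user space with C11 atomics.
This is only an analogy and not the driver code: struct fake_dev,
start_xfer() and irq_handler() are hypothetical names, and an atomic flag
stands in for the I/O register. The release store plays the role of the
non-relaxed write: everything written before it is visible to whoever
observes the flag with an acquire load. With memory_order_relaxed on the
store, that guarantee would be gone, which is roughly the situation in
steps (1) to (4).

```c
#include <assert.h>
#include <stdatomic.h>

struct fake_dev {
    int msg_read_idx;       /* ordinary memory, as in step (1) */
    atomic_int doorbell;    /* stands in for the I/O register of step (2) */
};

/* "CPU x": set up state, then ring the doorbell with release ordering. */
static void start_xfer(struct fake_dev *dev)
{
    dev->msg_read_idx = 0;
    atomic_store_explicit(&dev->doorbell, 1, memory_order_release);
}

/*
 * "CPU y": only look at the state after observing the doorbell with
 * acquire ordering. Returns -1 if the doorbell has not been rung yet.
 */
static int irq_handler(struct fake_dev *dev)
{
    if (atomic_load_explicit(&dev->doorbell, memory_order_acquire) != 1)
        return -1;
    return dev->msg_read_idx;
}

/*
 * Single-threaded sanity check of the hand-off logic. It does not by
 * itself demonstrate the cross-CPU ordering, only the protocol.
 */
static void check_handoff(void)
{
    struct fake_dev dev = { .msg_read_idx = 7 };

    atomic_init(&dev.doorbell, 0);
    assert(irq_handler(&dev) == -1);
    start_xfer(&dev);
    assert(irq_handler(&dev) == 0);
}
```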

Thanks for filling in the details, that patch looked iffy to me, but
it does sound like everything is good.

             Linus


2021-03-12 11:32     [PATCH 0/2] x86: Remove ideal_nops[] Peter Zijlstra
2021-03-12 11:32     ` [PATCH 1/2] x86: Remove dynamic NOP selection Peter Zijlstra
2024-01-20  6:58       ` Thorsten Glaser
2024-01-20  8:22         ` H. Peter Anvin
2024-01-20 17:00 94%       ` Linus Torvalds
2023-08-25 14:12     [PATCH] x86: Clean up remaining references to CONFIG_MICROCODE_AMD Lukas Bulwahn
2023-11-12 15:03     ` [regression] microcode files missing in initramfs imgages from dracut (was Re: [PATCH] x86: Clean up remaining references to CONFIG_MICROCODE_AMD) Linux regression tracking (Thorsten Leemhuis)
2023-11-12 18:10       ` Borislav Petkov
2023-11-22  9:15         ` Linux regression tracking (Thorsten Leemhuis)
2023-11-22 11:58           ` Borislav Petkov
2023-11-22 15:34             ` Linux regression tracking (Thorsten Leemhuis)
2023-11-22 15:57               ` Borislav Petkov
2023-11-22 20:35 97%             ` Linus Torvalds
2023-11-22 20:51                   ` Borislav Petkov
2023-11-22 21:08 94%                 ` Linus Torvalds
2023-09-18  8:14     [PATCH next v4 0/5] minmax: Relax type checks in min() and max() David Laight
2024-01-08 11:46     ` Jiri Slaby
2024-01-08 18:19 97%   ` Linus Torvalds
2024-01-08 20:04 83%     ` Linus Torvalds
2024-01-08 21:11 99%       ` Linus Torvalds
2024-01-10  6:17             ` Stephen Rothwell
2024-01-10 19:35 99%           ` Linus Torvalds
2024-01-10 22:58                 ` David Laight
2024-01-20 21:33 99%               ` Linus Torvalds
2023-09-20 19:23     x86/csum: Remove unnecessary odd handling Noah Goldstein
2023-09-23  3:24     ` kernel test robot
2023-09-23 14:05       ` Noah Goldstein
2023-09-23 21:13         ` David Laight
2023-09-24 14:35           ` Noah Goldstein
2023-12-23 22:18             ` Noah Goldstein
2024-01-04 23:28               ` Noah Goldstein
2024-01-04 23:36 99%             ` Linus Torvalds
2024-01-05  0:33 99%               ` Linus Torvalds
2024-01-05 10:41                     ` David Laight
2024-01-05 18:05 82%                   ` Linus Torvalds
2024-01-05 23:52                         ` David Laight
2024-01-06  0:18 99%                       ` Linus Torvalds
2024-01-06 10:26                             ` Eric Dumazet
2024-01-06 19:32 90%                           ` Linus Torvalds
2023-10-14 17:22     [PATCH] get_maintainer: correctly parse UTF-8 encoded names in files Alvin Šipraga
2023-10-16 14:37     ` Duje Mihanović
2023-10-16 22:17       ` Joe Perches
2023-10-16 23:56         ` Alvin Šipraga
2023-12-14  1:06           ` Alvin Šipraga
2023-12-14  1:41 99%         ` Linus Torvalds
2023-10-18 11:59     ovl: ovl_fs::creator_cred::usage scalability issues Amir Goldstein
2023-12-14 22:02     ` [RFC] HACK: overlayfs: Optimize overlay/restore creds Vinicius Costa Gomes
2023-12-15 10:30       ` Amir Goldstein
2023-12-15 20:00         ` Vinicius Costa Gomes
2023-12-16 10:16           ` Amir Goldstein
2023-12-16 18:26 90%         ` Linus Torvalds
2023-10-25 14:01     [PATCH v4 0/6] querying mount attributes Miklos Szeredi
2023-10-25 14:02     ` [PATCH v4 5/6] add listmount(2) syscall Miklos Szeredi
2024-01-10 22:23       ` Guenter Roeck
2024-01-11  0:32 99%     ` Linus Torvalds
2024-01-11 18:57           ` Guenter Roeck
2024-01-11 20:14 82%         ` Linus Torvalds
2024-01-11 23:57               ` Guenter Roeck
2024-01-12  3:40 93%             ` Linus Torvalds
2023-10-28 12:23     [GIT PULL] Scheduler changes for v6.7 Ingo Molnar
2024-01-08 14:07     ` [GIT PULL] Scheduler changes for v6.8 Ingo Molnar
2024-01-10 22:19 99%   ` Linus Torvalds
2024-01-10 22:41 99%     ` Linus Torvalds
2024-01-10 22:57 99%       ` Linus Torvalds
2024-01-11  8:11             ` Vincent Guittot
2024-01-11 17:45 99%           ` Linus Torvalds
2024-01-11 17:53 99%             ` Linus Torvalds
2024-01-11 18:16                   ` Vincent Guittot
2024-01-12 14:23                     ` Dietmar Eggemann
2024-01-12 18:18                       ` Qais Yousef
2024-01-12 19:03                         ` Vincent Guittot
2024-01-12 20:30 92%                       ` Linus Torvalds
2024-01-12 20:49 99%                         ` Linus Torvalds
2024-01-12 21:04 97%                           ` Linus Torvalds
2024-01-13  1:04                                 ` Qais Yousef
2024-01-13  1:24 99%                               ` Linus Torvalds
2024-01-13  1:31 99%                                 ` Linus Torvalds
2023-11-07  6:05     [PATCH] wifi: brcmfmac: cfg80211: Use WSEC to set SAE password Hector Martin
2023-12-17 11:25     ` Kalle Valo
2023-12-19  8:52       ` Arend Van Spriel
2023-12-19 11:01         ` Hector Martin
2023-12-19 13:46           ` Arend van Spriel
     [not found]             ` <CAF4BwTXNtu30DAgBXo4auDaDK0iWc9Ch8f=EH+facQ-_F-oMUQ@mail.gmail.com>
2023-12-19 14:42               ` Kalle Valo
2023-12-20  0:06                 ` Hector Martin
2023-12-20  1:44 96%               ` Linus Torvalds
2023-11-18  0:04     [PULL REQUEST] i2c-for-6.7-rc2 Wolfram Sang
2023-11-18 17:56     ` Linus Torvalds
2023-11-20 15:05       ` Will Deacon
2023-11-20 17:32 99%     ` Linus Torvalds
2023-11-20  7:11     [linus:master] [file] 0ede61d858: will-it-scale.per_thread_ops -2.9% regression kernel test robot
2023-11-26 20:23 74% ` Linus Torvalds
2023-11-26 23:20 79%   ` Linus Torvalds
2023-11-27 10:27         ` Christian Brauner
2023-11-27 17:10 87%       ` Linus Torvalds
2023-11-21  1:03     [PATCH 0/2] zstd: import upstream v1.5.5 Nick Terrell
2023-11-21  1:03     ` [PATCH 2/2] zstd: Backport Huffman speed improvement from upstream Nick Terrell
2023-11-21 17:23 99%   ` Linus Torvalds
2023-11-21 19:59         ` Nick Terrell
2023-11-21 20:12 99%       ` Linus Torvalds
2023-11-21 20:35             ` Nick Terrell
2023-11-21 20:54 97%           ` Linus Torvalds
2023-11-22 16:37     [PATCH v3] x86: Ensure input to pfn_to_kaddr() is treated as a 64-bit type Michael Roth
2023-11-23 10:31     ` [tip: x86/mm] x86/mm: " tip-bot2 for Michael Roth
2023-11-23 19:00 95%   ` Linus Torvalds
2023-11-24  6:02     [RFC][PATCHSET v3] simplifying fast_dput(), dentry_kill() et.al Al Viro
2023-11-24  6:04     ` [PATCH v3 01/21] switch nfsd_client_rmdir() to use of simple_recursive_removal() Al Viro
2023-11-24  6:04       ` [PATCH v3 02/21] coda_flag_children(): cope with dentries turning negative Al Viro
2023-11-24 21:22 99%     ` Linus Torvalds
2023-11-24 21:28 99% ` [RFC][PATCHSET v3] simplifying fast_dput(), dentry_kill() et.al Linus Torvalds
2023-11-24  6:05     [PATCHES] assorted dcache stuff Al Viro
2023-11-24 21:41 99% ` Linus Torvalds
2023-11-24 10:27     [GIT PULL] vfs fixes Christian Brauner
2023-11-24 18:25 60% ` Linus Torvalds
2023-11-24 18:52 99%   ` Linus Torvalds
2023-11-24 20:12 99%     ` Linus Torvalds
2023-11-24 13:26     [PATCH WIP v1 00/20] mm: precise "mapped shared" vs. "mapped exclusively" detection for PTE-mapped THP / partially-mappable folios David Hildenbrand
2023-11-24 20:55 92% ` Linus Torvalds
2023-11-24 15:52     [GIT PULL] afs: Miscellaneous fixes David Howells
2023-11-24 18:39 99% ` Linus Torvalds
2023-11-25 21:07     [syzbot] [kernel?] possible deadlock in stack_depot_put syzbot
     [not found]     ` <20231205113107.1324-1-hdanton@sina.com>
2023-12-05 12:00       ` Tetsuo Handa
2023-12-06  9:42         ` Petr Mladek
     [not found]           ` <20231206112215.1381-1-hdanton@sina.com>
2023-12-06 11:40 97%         ` Linus Torvalds
2023-11-26  6:58     [GIT PULL] fbdev fixes and updates for v6.7-rc3 Helge Deller
2023-11-26 16:29 99% ` Linus Torvalds
2023-11-27  4:13 53% Linux 6.7-rc3 Linus Torvalds
2023-11-29 12:09     [GIT PULL] Pin control fixes for v6.7 Linus Walleij
2023-11-29 14:55 99% ` Linus Torvalds
2023-11-29 15:18     [GIT PULL] Pin control fixes for v6.7 minus one patch Linus Walleij
2023-11-29 15:48 99% ` Linus Torvalds
2023-12-03 10:18 47% Linux 6.7-rc4 Linus Torvalds
2023-12-03 22:10     [PATCH -tip 1/3] x86/percpu: Fix "const_pcpu_hot" version generation failure Uros Bizjak
2023-12-03 22:19 99% ` Linus Torvalds
2023-12-06  0:43     [PATCH 0/2] x86: UMIP emulation leaking kernel addresses Michal Luczaj
2023-12-09 17:16     ` Brian Gerst
2023-12-09 20:08 99%   ` Linus Torvalds
2023-12-10 22:53 39% Linux 6.7-rc5 Linus Torvalds
2023-12-12 23:16     [RFC PATCH v3 00/11] Introduce mseal() jeffxu
2023-12-12 23:17     ` [RFC PATCH v3 11/11] mseal:add documentation jeffxu
2023-12-13  0:39 99%   ` Linus Torvalds
2023-12-14  0:35         ` Jeff Xu
2023-12-14  1:31 89%       ` Linus Torvalds
2023-12-14 18:06             ` Stephen Röttger
2023-12-14 20:14 86%           ` Linus Torvalds
2023-12-14 22:52                 ` Jeff Xu
2024-01-20 15:23                   ` Theo de Raadt
2024-01-20 16:40 99%                 ` Linus Torvalds
2023-12-13 16:34     [PATCH 0/3] Reject setting system segments from userspace Brian Gerst
2023-12-13 16:34     ` [PATCH 1/3] x86: Move TSS and LDT to end of the GDT Brian Gerst
2023-12-13 18:51 96%   ` Linus Torvalds
2023-12-13 19:08 97%     ` Linus Torvalds
2023-12-16 18:24           ` Vegard Nossum
2023-12-16 18:40 95%         ` Linus Torvalds
2023-12-13 16:34     ` [PATCH 3/3] x86/sigreturn: Reject system segements Brian Gerst
2023-12-13 18:54 99%   ` Linus Torvalds
2023-12-17 21:07         ` H. Peter Anvin
2023-12-17 21:40 99%       ` Linus Torvalds
2023-12-14  2:11     [PATCH] ring-buffer: Remove 32bit timestamp logic Steven Rostedt
2023-12-14  2:46     ` Steven Rostedt
2023-12-14  6:53 92%   ` Linus Torvalds
2023-12-14 16:56         ` Steven Rostedt
2023-12-14 19:44 90%       ` Linus Torvalds
2023-12-14 20:36             ` Steven Rostedt
2023-12-14 20:50 99%           ` Linus Torvalds
2023-12-14 17:54     [PATCH v3] " Steven Rostedt
2023-12-14 19:46 99% ` Linus Torvalds
2023-12-14 20:19       ` Steven Rostedt
2023-12-14 20:30 99%     ` Linus Torvalds
2023-12-14 20:32 99%       ` Linus Torvalds
2023-12-15 15:16     [GIT PULL] hotfixes for 6.7-rc6 Andrew Morton
2023-12-15 20:11 99% ` Linus Torvalds
2023-12-15 20:22       ` Andrew Morton
2023-12-16  4:56         ` Yu Zhao
2023-12-17  0:16 99%       ` Linus Torvalds
2023-12-17 23:53 50% Linux 6.7-rc6 Linus Torvalds
     [not found]     <20231219000520.34178-1-alexei.starovoitov@gmail.com>
2023-12-19  0:55     ` pull-request: bpf-next 2023-12-18 Jakub Kicinski
2023-12-19  1:17 99%   ` Linus Torvalds
2023-12-19  8:49     [linus:master] [x86/entry] be5341eb0d: WARNING:CPU:#PID:#at_int80_emulation kernel test robot
2023-12-19  9:58     ` Borislav Petkov
2023-12-19 18:20 90%   ` Linus Torvalds
2023-12-19 19:15         ` Andrew Cooper
2023-12-19 20:17 99%       ` Linus Torvalds
2023-12-19 23:15 71%         ` Linus Torvalds
2023-12-20 23:39               ` Sami Tolvanen
2023-12-21  5:38 98%             ` Linus Torvalds
2023-12-19 15:11     [PATCH 0/5] replace magic numbers in GDT descriptors Vegard Nossum
2023-12-19 17:33 99% ` Linus Torvalds
2023-12-21  3:08     [PATCH v2 00/11] Avoid unprivileged splice(file->)/(->socket) pipe exclusion Ahelenia Ziemiańska
2023-12-21  3:09     ` [PATCH v2 08/11] tty: splice_read: disable Ahelenia Ziemiańska
2024-01-03 11:36       ` Jiri Slaby
2024-01-03 19:14 99%     ` Linus Torvalds
2024-01-03 21:34           ` Oliver Giles
2024-01-03 21:57 99%         ` Linus Torvalds
2023-12-21 15:09     [PATCH] afs: Fix overwriting of result of DNS query David Howells
2023-12-21 18:01 99% ` Linus Torvalds
2023-12-21 15:27     [GIT PULL] tracing: A few more fixes for 6.7 Steven Rostedt
2023-12-21 17:45 98% ` Linus Torvalds
2023-12-21 19:28       ` Steven Rostedt
2023-12-21 20:01 94%     ` Linus Torvalds
2023-12-21 15:30     [GIT PULL] afs, dns: Fix dynamic root interaction with negative DNS David Howells
2023-12-23 17:28     ` Simon Horman
2023-12-23 19:14 88%   ` Linus Torvalds
2023-12-24  0:02       ` [PATCH] keys, dns: Fix missing size check of V1 server-list header David Howells
2024-01-10 10:14         ` David Howells
2024-01-10 11:06           ` Pengfei Xu
2024-01-10 17:23             ` David Howells
2024-01-10 18:52 99%           ` Linus Torvalds
2023-12-22 13:29     [GIT PULL] tracing: Fix eventfs ownership again Steven Rostedt
2023-12-22 22:24 99% ` Linus Torvalds
2023-12-24  0:42 48% Linux 6.7-rc7 Linus Torvalds
2023-12-27 23:03     [GIT PULL] hotfixes for 6.7 Andrew Morton
2023-12-28  0:36 99% ` Linus Torvalds
2023-12-29 20:51     [PATCH next 0/5] locking/osq_lock: Optimisations to osq_lock code David Laight
2023-12-29 20:56     ` [PATCH next 3/5] locking/osq_lock: Clarify osq_wait_next() David Laight
2023-12-29 22:54 97%   ` Linus Torvalds
2023-12-29 20:57     ` [PATCH next 4/5] locking/osq_lock: Optimise per-cpu data accesses David Laight
2023-12-30 20:41 87%   ` Linus Torvalds
2023-12-30 20:59 95%     ` Linus Torvalds
2023-12-30 19:40 99% ` [PATCH next 0/5] locking/osq_lock: Optimisations to osq_lock code Linus Torvalds
2023-12-31 21:04 86% Linux 6.7-rc8 Linus Torvalds
2024-01-03 15:25     [PATCH] eventfs: Stop using dcache_readdir() for getdents() Steven Rostedt
2024-01-03 18:12 95% ` Linus Torvalds
2024-01-03 18:38 87%   ` Linus Torvalds
2024-01-03 18:51         ` Steven Rostedt
2024-01-03 19:04 97%       ` Linus Torvalds
2024-01-03 19:53         ` Steven Rostedt
2024-01-03 19:57 98%       ` Linus Torvalds
2024-01-03 21:54 80%         ` Linus Torvalds
2024-01-03 22:05               ` Steven Rostedt
2024-01-03 22:14 97%             ` Linus Torvalds
2024-01-03 22:06 99%           ` Linus Torvalds
2024-01-03 22:14               ` Al Viro
2024-01-03 22:17 99%             ` Linus Torvalds
2024-01-04  1:32     [PATCH] tracefs/eventfs: Use root and instance inodes as default ownership Steven Rostedt
2024-01-04  1:48     ` Al Viro
2024-01-04  2:25       ` Steven Rostedt
2024-01-04  4:39         ` Al Viro
2024-01-04 15:05           ` Steven Rostedt
2024-01-04 18:25             ` Al Viro
2024-01-04 19:10               ` Steven Rostedt
2024-01-04 19:21 92%             ` Linus Torvalds
2024-01-04 19:15               ` Steven Rostedt
2024-01-04 19:35 93%             ` Linus Torvalds
2024-01-04 20:02 99%               ` Linus Torvalds
2024-01-04  2:29     [git pull] drm fixes for 6.8 Dave Airlie
2024-01-04 18:50 99% ` Linus Torvalds
2024-01-04 15:48     [GIT PULL] Final KVM fix for Linux 6.7 Paolo Bonzini
2024-01-05 17:21 99% ` Linus Torvalds
2024-01-05 17:29       ` Sean Christopherson
2024-01-05 17:38 99%     ` Linus Torvalds
2024-01-04 16:47     [for-next][PATCH 0/3] tracefs/eventfs: Updates for 6.8 Steven Rostedt
2024-01-04 16:47     ` [for-next][PATCH 2/3] eventfs: Stop using dcache_readdir() for getdents() Steven Rostedt
2024-01-04 18:46 97%   ` Linus Torvalds
2024-01-04 19:02         ` Steven Rostedt
2024-01-04 20:05           ` Steven Rostedt
2024-01-04 20:18 98%         ` Linus Torvalds
2024-01-04 16:47     ` [for-next][PATCH 3/3] tracefs/eventfs: Use root and instance inodes as default ownership Steven Rostedt
2024-01-04 18:38 99%   ` Linus Torvalds
2024-01-04 18:51     [RFC PATCH v4 0/4] Introduce mseal() jeffxu
2024-01-04 18:51     ` [RFC PATCH v4 2/4] mseal: add mseal syscall jeffxu
2024-01-07 18:41 99%   ` Linus Torvalds
2024-01-05 12:46     [GIT PULL] vfs mount api updates Christian Brauner
2024-01-09  1:02 99% ` Linus Torvalds
2024-01-05 19:03     [GIT PULL] Btrfs updates for 6.8 David Sterba
2024-01-10 17:34 99% ` Linus Torvalds
2024-01-05 23:21     [GIT PULL] lsm/lsm-pr-20240105 Paul Moore
2024-01-09 21:07 99% ` Linus Torvalds
2024-01-10 19:54       ` Paul Moore
2024-01-10 20:22 96%     ` Linus Torvalds
2024-01-06 17:41     include/asm-generic/unaligned.h:119:16: sparse: sparse: cast truncates bits from constant value (aa01a0 becomes a0) kernel test robot
2024-01-07  0:42     ` Dmitry Torokhov
2024-01-07  5:54 98%   ` Linus Torvalds
2024-01-07 20:29 62% Linux 6.7 Linus Torvalds
2024-01-08 11:35     [GIT PULL] x86/mm changes for v6.8 Ingo Molnar
2024-01-09  2:06 99% ` Linus Torvalds
2024-01-09  3:57 85%   ` Linus Torvalds
2024-01-08 17:05     [GIT PULL] bitmap patches " Yury Norov
2024-01-21 21:47 91% ` Linus Torvalds
2024-01-08 18:35     [GIT PULL] execve updates for v6.8-rc1 Kees Cook
2024-01-09  0:19 83% ` Linus Torvalds
2024-01-09  0:30 99%   ` Linus Torvalds
2024-01-09  0:46 99%     ` Linus Torvalds
2024-01-09  1:48       ` Kees Cook
2024-01-09  1:53 99%     ` Linus Torvalds
2024-01-09  3:28 84%       ` Linus Torvalds
2024-01-09 18:57         ` Josh Triplett
2024-01-09 23:40 72%       ` Linus Torvalds
2024-01-10  2:21             ` Josh Triplett
2024-01-10  3:54 91%           ` Linus Torvalds
2024-01-11  9:47                 ` Al Viro
2024-01-11 10:05                   ` Al Viro
2024-01-11 17:42 99%                 ` Linus Torvalds
2024-01-20 22:18 94%                   ` Linus Torvalds
2024-01-11 17:37 99%               ` Linus Torvalds
2024-01-10 19:24               ` Kees Cook
2024-01-10 20:12 92%             ` Linus Torvalds
2024-01-08 18:59     [GIT PULL] Documentation for 6.8 Jonathan Corbet
2024-01-12  3:53 99% ` Linus Torvalds
2024-01-09 18:18     [syzbot] [kernel?] WARNING in signal_wake_up_state syzbot
2024-01-09 19:05 94% ` Linus Torvalds
2024-01-10  4:40     [PATCH] keys, dns: Fix missing size check of V1 server-list header Pengfei Xu
2024-01-10 19:36     [GIT PULL] bcachefs updates for 6.8 Kent Overstreet
2024-01-10 23:48     ` Kees Cook
2024-01-11  0:04       ` Kent Overstreet
2024-01-11  0:39         ` Kees Cook
2024-01-11  0:58           ` Kent Overstreet
2024-01-11  1:47 99%         ` Linus Torvalds
2024-01-11 22:57               ` Matthew Wilcox
2024-01-11 23:42                 ` Kees Cook
2024-01-11 23:58 99%               ` Linus Torvalds
2024-01-10 19:49     [git pull] drm " Dave Airlie
2024-01-12 19:33 96% ` Linus Torvalds
2024-01-10 20:48     [GIT PULL] first round of SCSI updates for the 6.7+ merge window James Bottomley
2024-01-11 22:36 98% ` Linus Torvalds
2024-01-11 22:47       ` James Bottomley
2024-01-11 22:53 99%     ` Linus Torvalds
2024-01-11 22:47 97%   ` Linus Torvalds
2024-01-11 23:28         ` James Bottomley
2024-01-11 23:50 92%       ` Linus Torvalds
2024-01-12 14:27             ` Konstantin Ryabitsev
2024-01-12 18:34 99%           ` Linus Torvalds
2024-01-11 18:28     [GIT PULL] f2fs update for 6.8-rc1 Jaegeuk Kim
2024-01-12  5:05 96% ` Linus Torvalds
2024-01-12  7:12       ` Al Viro
2024-01-12 18:18 99%     ` Linus Torvalds
2024-01-11 18:32     [GIT PULL] RCU changes for v6.8 Neeraj Upadhyay (AMD)
2024-01-11 19:12 99% ` Linus Torvalds
2024-01-13 18:33     [PATCH] media: solo6x10: replace max(a, min(b, c)) by clamp(b, a, c) Aurelien Jarno
2024-01-14 11:04     ` Hans Verkuil
2024-01-21 19:57 99%   ` Linus Torvalds
2024-01-13 21:31 99% Heads up - effectively offline for now Linus Torvalds
2024-01-15 20:43     [GIT PULL] power-supply changes for 6.8 Sebastian Reichel
2024-01-17 18:00     ` Nathan Chancellor
2024-01-18  0:11 99%   ` Linus Torvalds
2024-01-16 16:42     [GIT PULL] Backlight for v6.8 Lee Jones
2024-01-17 23:38 99% ` Linus Torvalds
2024-01-16 22:55     [PATCH v3 0/2] eventfs: Create dentries and inodes at dir open Steven Rostedt
2024-01-16 22:55     ` [PATCH v3 1/2] eventfs: Have the inodes all for files and directories all be the same Steven Rostedt
2024-01-22 21:59       ` Darrick J. Wong
2024-01-22 22:02 99%     ` Linus Torvalds
2024-01-17 14:35     [for-linus][PATCH 0/3] eventfs: A few more fixes for 6.8 Steven Rostedt
2024-01-17 14:35     ` [for-linus][PATCH 1/3] eventfs: Have the inodes all for files and directories all be the same Steven Rostedt
2024-01-22 10:38       ` Geert Uytterhoeven
2024-01-22 15:06         ` Steven Rostedt
2024-01-22 16:23           ` Geert Uytterhoeven
2024-01-22 16:47             ` Steven Rostedt
2024-01-22 17:37 97%           ` Linus Torvalds
2024-01-22 17:39 99%             ` Linus Torvalds
2024-01-22 18:19 94%               ` Linus Torvalds
2024-01-22 19:44                     ` Steven Rostedt
2024-01-25 17:40                       ` Christian Brauner
2024-01-25 18:07                         ` Steven Rostedt
2024-01-25 18:08                           ` Steven Rostedt
2024-01-26  8:07                             ` Geert Uytterhoeven
2024-01-26 10:11                               ` Christian Brauner
2024-01-26 16:25                                 ` Steven Rostedt
2024-01-26 19:09 99%                               ` Linus Torvalds
2024-01-17 16:14     [PATCH RFC 0/4] Fix file lock cache accounting, again Josh Poimboeuf
2024-01-17 16:14     ` [PATCH RFC 1/4] fs/locks: " Josh Poimboeuf
2024-01-17 19:00       ` Jeff Layton
2024-01-17 19:39         ` Josh Poimboeuf
2024-01-17 20:20 92%       ` Linus Torvalds
2024-01-17 21:02             ` Shakeel Butt
2024-01-17 22:20               ` Roman Gushchin
2024-01-17 22:56                 ` Shakeel Butt
2024-01-22  5:10 85%               ` Linus Torvalds
2024-01-17 21:30     [PULL REQUEST] i2c-for-6.8-rc1-fixed Wolfram Sang
2024-01-18  0:02 99% ` Linus Torvalds
2024-01-18  7:30     [GIT PULL] dma-mapping fixes for Linux 6.8 Christoph Hellwig
2024-01-19  0:52 99% ` Linus Torvalds
2024-01-19 21:14     [GIT PULL] strlcpy removal for v6.8-rc1 Kees Cook
2024-01-19 22:00 99% ` Linus Torvalds
2024-01-19 22:53       ` Kees Cook
2024-01-19 23:59 99%     ` Linus Torvalds
2024-01-20 15:26     [GIT PULL] final round of SCSI updates for the 6.7+ merge window James Bottomley
2024-01-20 17:52 97% ` Linus Torvalds
2024-01-20 19:09       ` James Bottomley
2024-01-20 19:35 99%     ` Linus Torvalds
2024-01-21  6:30           ` Theodore Ts'o
2024-01-21 18:48 96%         ` Linus Torvalds
2024-01-24  5:36               ` Theodore Ts'o
2024-01-25 17:56 96%             ` Linus Torvalds
2024-01-21 21:35     [GIT PULL] More bcachefs updates for 6.8-rc1 Kent Overstreet
2024-01-21 22:05 99% ` Linus Torvalds
2024-01-21 22:23 76% Linux 6.8-rc1 Linus Torvalds
2024-01-22 15:29     [GIT PULL] Enable -Wstringop-overflow globally Gustavo A. R. Silva
2024-01-26 21:22 99% ` Linus Torvalds
2024-01-26 21:30       ` Gustavo A. R. Silva
2024-01-26 22:24         ` Kees Cook
2024-01-26 22:36 99%       ` Linus Torvalds
2024-01-22 18:33     [GIT PULL] Btrfs fixes for 6.8-rc2 David Sterba
2024-01-22 22:34 99% ` Linus Torvalds
2024-01-22 22:54 99%   ` Linus Torvalds
2024-01-22 23:01 99%     ` Linus Torvalds
2024-01-26 19:25 98% ` Linus Torvalds
2024-01-26 20:00       ` David Sterba
2024-01-26 21:39         ` Qu Wenruo
2024-01-26 21:51 99%       ` Linus Torvalds
2024-01-26 21:56             ` Qu Wenruo
2024-01-26 22:02 99%           ` Linus Torvalds
2024-01-22 23:06     [BUG] BUG: kernel NULL pointer dereference at ttm_device_init+0xb4 Steven Rostedt
2024-01-22 23:15     ` Steven Rostedt
2024-01-22 23:19       ` Steven Rostedt
2024-01-23  0:43 99%     ` Linus Torvalds
     [not found]           ` <27c3d1e9-5933-47a9-9c33-ff8ec13f40d3@amd.com>
2024-01-23  1:25 99%         ` Linus Torvalds
2024-01-23  0:26     [PATCH 00/82] overflow: Refactor open-coded arithmetic wrap-around Kees Cook
2024-01-23  0:27     ` [PATCH 34/82] ipc: Refactor intentional wrap-around calculation Kees Cook
2024-01-23  1:07 94%   ` Linus Torvalds
2024-01-23  1:38         ` Kees Cook
2024-01-23 18:06 93%       ` Linus Torvalds
2024-01-24 10:42     Strange EFAULT on mips64el returned by syscall when another thread is forking Xi Ruoyao
2024-01-24 11:59     ` Andreas Schwab
2024-01-24 12:49       ` Xi Ruoyao
2024-01-24 16:13         ` Xi Ruoyao
2024-01-24 21:32           ` Xi Ruoyao
2024-01-24 21:54 93%         ` Linus Torvalds
2024-01-24 22:10 99%           ` Linus Torvalds
2024-01-24 16:19     [6.8-rc1 Regression] Unable to exec apparmor_parser from virt-aa-helper Kevin Locke
2024-01-24 16:35     ` Kees Cook
2024-01-24 16:46 99%   ` Linus Torvalds
2024-01-24 16:54 99%     ` Linus Torvalds
2024-01-24 17:10 93%       ` Linus Torvalds
2024-01-24 17:21             ` Kees Cook
2024-01-24 17:27 99%           ` Linus Torvalds
2024-01-24 18:27 89%             ` Linus Torvalds
2024-01-24 18:29 99%               ` Linus Torvalds
2024-01-24 19:02                   ` Kees Cook
2024-01-24 19:41 99%                 ` Linus Torvalds
2024-01-25 14:16                   ` Tetsuo Handa
2024-01-25 17:17 99%                 ` Linus Torvalds
2024-01-24 19:22     [PATCH] exec: Check __FMODE_EXEC instead of in_execve for LSMs Kees Cook
2024-01-24 19:58     ` Jann Horn
2024-01-24 20:15       ` Kees Cook
2024-01-24 20:47 91%     ` Linus Torvalds
2024-01-25 17:34 89% [PATCH] x86: mm: get rid of conditional IF flag handling in page fault path Linus Torvalds
2024-01-26  9:39 60% ` [tip: x86/mm] x86/mm: Get " tip-bot2 for Linus Torvalds
2024-01-25 18:29     [PATCH] softirq: fix memory corruption when freeing tasklet_struct Mikulas Patocka
2024-01-25 19:51 84% ` Linus Torvalds
2024-01-26 18:18     [PATCH] eventfs: Give files a default of PAGE_SIZE size Steven Rostedt
2024-01-26 18:31 99% ` Linus Torvalds
2024-01-26 18:41       ` Steven Rostedt
2024-01-26 19:06 99%     ` Linus Torvalds
2024-01-26 20:02     [PATCH] eventfs: Have inodes have unique inode numbers Steven Rostedt
2024-01-26 20:25 94% ` Linus Torvalds
2024-01-26 21:26       ` Steven Rostedt
2024-01-26 21:31 99%     ` Linus Torvalds
2024-01-26 21:36 98%     ` Linus Torvalds
2024-01-26 21:49 99%       ` Linus Torvalds
2024-01-26 22:08             ` Steven Rostedt
2024-01-26 22:26 99%           ` Linus Torvalds
2024-01-26 22:14             ` Mathieu Desnoyers
2024-01-26 22:29 99%           ` Linus Torvalds
2024-01-26 22:34               ` Matthew Wilcox
2024-01-26 22:48 98%             ` Linus Torvalds
