On Mon, 16 Oct 2023 at 13:35, Nadav Amit wrote: > > I have encountered several such issues before [1], and while some have been fixed, > some have not (I looked at switch_fpu_finish()), and might under the right/wrong > circumstances use the wrongly-“cached” current. Moreover, perhaps new problems > have been added since my old patch. Yeah, that fpu switching is disgusting and borderline buggy. And yes, it would trigger problems when caching the value of 'current'. I don't particularly love the patch you pointed at, because it seems to have only fixed the switch_fpu_finish() case, which is the one that presumably triggered issues, but that's not a very pretty fix. switch_fpu_prepare() has the exact same problem, and in fact is likely the *source* of the issue, because that's the original "load current" that then ends up being cached incorrectly later in __switch_to(). The whole struct fpu *prev_fpu = &prev->fpu; thing in __switch_to() is pretty ugly. There's no reason why we should look at that 'prev_fpu' pointer there, or pass it down. And it only generates worse code, in how it loads 'current' when t__switch_to() has the right task pointers. So the attached patch is, I think, the right thing to do. It may not be the *complete* fix, but at least for the config I tested, this makes all loads of 'current' go away in the resulting generated assembly, and the only access to '%gs:pcpu_hot(%rip)' is the write to update it: movq %rbx, %gs:pcpu_hot(%rip) from that raw_cpu_write(pcpu_hot.current_task, next_p); code. Thomas, I think you've touched this code last, but even that isn't very recent. The attached patch not only cleans this up, it actually generates better code too: (a) it removes one push/pop pair at entry/exit because there's one less register used (no 'current') (b) it removes that pointless load of 'current' because it just uses the right argument: - #APP - movq %gs:pcpu_hot(%rip), %r12 - #NO_APP - testq $16384, (%r12) + testq $16384, (%rdi) so I think this is the right thing to do. I checked that the 32-bit code builds and looks sane too. I do think the 'old/new' naming in the FPU code should probably be 'prev/next' to match the switch_to() naming, but I didn't do that. Comments? Linus Linus