I think I narrowed this down. __native_flush_tlb_single() depends on cpu_tlbstate.loaded_mm_asid matching what is in CR3. But, while we are still "early" in boot, CR3 has hardware ASID=0, but cpu_tlbstate.loaded_mm_asid=0 which is actually hardware ASID=1. So, we have ASID=0 in CR3 and we try to *flush* ASID=1 with INVPCID, which does nothing for us, effectively missing the TLB flush. I think we need to steer __native_flush_tlb_single() into the "!this_cpu_has(X86_FEATURE_INVPCID_SINGLE)" path if we get called before initialize_tlbstate_and_flush() gives us a "real" ASID in CR3, but I haven't found a nice way to do it, yet. We probably also need a debugging warning in there to read CR3 and check it against cpu_tlbstate.loaded_mm_asid. I'll look at this in some more detail tomorrow if nobody beats me to it.