From: Valentin Schneider <vschneid@redhat.com>
To: Peter Zijlstra <peterz@infradead.org>,
	Dave Hansen <dave.hansen@intel.com>
Cc: "Nadav Amit" <namit@vmware.com>,
	"Linux Kernel Mailing List" <linux-kernel@vger.kernel.org>,
	"linux-trace-kernel@vger.kernel.org"
	<linux-trace-kernel@vger.kernel.org>,
	"linux-doc@vger.kernel.org" <linux-doc@vger.kernel.org>,
	"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
	linux-mm <linux-mm@kvack.org>, bpf <bpf@vger.kernel.org>,
	"the arch/x86 maintainers" <x86@kernel.org>,
	"rcu@vger.kernel.org" <rcu@vger.kernel.org>,
	"linux-kselftest@vger.kernel.org"
	<linux-kselftest@vger.kernel.org>,
	"Steven Rostedt" <rostedt@goodmis.org>,
	"Masami Hiramatsu" <mhiramat@kernel.org>,
	"Jonathan Corbet" <corbet@lwn.net>,
	"Thomas Gleixner" <tglx@linutronix.de>,
	"Ingo Molnar" <mingo@redhat.com>,
	"Borislav Petkov" <bp@alien8.de>,
	"Dave Hansen" <dave.hansen@linux.intel.com>,
	"H. Peter Anvin" <hpa@zytor.com>,
	"Paolo Bonzini" <pbonzini@redhat.com>,
	"Wanpeng Li" <wanpengli@tencent.com>,
	"Vitaly Kuznetsov" <vkuznets@redhat.com>,
	"Andy Lutomirski" <luto@kernel.org>,
	"Frederic Weisbecker" <frederic@kernel.org>,
	"Paul E. McKenney" <paulmck@kernel.org>,
	"Neeraj Upadhyay" <quic_neeraju@quicinc.com>,
	"Joel Fernandes" <joel@joelfernandes.org>,
	"Josh Triplett" <josh@joshtriplett.org>,
	"Boqun Feng" <boqun.feng@gmail.com>,
	"Mathieu Desnoyers" <mathieu.desnoyers@efficios.com>,
	"Lai Jiangshan" <jiangshanlai@gmail.com>,
	Zqiang <qiang.zhang1211@gmail.com>,
	"Andrew Morton" <akpm@linux-foundation.org>,
	"Uladzislau Rezki" <urezki@gmail.com>,
	"Christoph Hellwig" <hch@infradead.org>,
	"Lorenzo Stoakes" <lstoakes@gmail.com>,
	"Josh Poimboeuf" <jpoimboe@kernel.org>,
	"Jason Baron" <jbaron@akamai.com>,
	"Kees Cook" <keescook@chromium.org>,
	"Sami Tolvanen" <samitolvanen@google.com>,
	"Ard Biesheuvel" <ardb@kernel.org>,
	"Nicholas Piggin" <npiggin@gmail.com>,
	"Juerg Haefliger" <juerg.haefliger@canonical.com>,
	"Nicolas Saenz Julienne" <nsaenz@kernel.org>,
	"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	"Dan Carpenter" <error27@gmail.com>,
	"Chuang Wang" <nashuiliang@gmail.com>,
	"Yang Jihong" <yangjihong1@huawei.com>,
	"Petr Mladek" <pmladek@suse.com>,
	"Jason A. Donenfeld" <Jason@zx2c4.com>,
	"Song Liu" <song@kernel.org>,
	"Julian Pidancet" <julian.pidancet@oracle.com>,
	"Tom Lendacky" <thomas.lendacky@amd.com>,
	"Dionna Glaze" <dionnaglaze@google.com>,
	"Thomas Weißschuh" <linux@weissschuh.net>,
	"Juri Lelli" <juri.lelli@redhat.com>,
	"Daniel Bristot de Oliveira" <bristot@redhat.com>,
	"Marcelo Tosatti" <mtosatti@redhat.com>,
	"Yair Podemsky" <ypodemsk@redhat.com>
Subject: Re: [RFC PATCH v2 20/20] x86/mm, mm/vmalloc: Defer flush_tlb_kernel_range() targeting NOHZ_FULL CPUs
Date: Tue, 25 Jul 2023 15:03:15 +0100
Message-ID: <xhsmhr0owrsv0.mognet@vschneid.remote.csb>
In-Reply-To: <20230725132155.GJ3765278@hirez.programming.kicks-ass.net>


Sorry, I missed Dave's email, so now I'm taking my time to page (hah!)
all of this in.

On 25/07/23 15:21, Peter Zijlstra wrote:
> On Mon, Jul 24, 2023 at 10:40:04AM -0700, Dave Hansen wrote:
>
>> TLB flushes for freed page tables are another game entirely.  The CPU is
>> free to cache any part of the paging hierarchy it wants at any time.
>> It's also free to set accessed and dirty bits at any time, even for
>> instructions that may never execute architecturally.
>>
>> That basically means that if you have *ANY* freed page table page
>> *ANYWHERE* in the page table hierarchy of any CPU at any time ... you're
>> screwed.
>>
>> There's no reasoning about accesses or ordering.  As soon as the CPU
>> does *anything*, it's out to get you.
>>

OK, I feel like I need to go back and do some more reading now, but I think
I get the difference. Thanks for spelling it out.

>> You're going to need to do something a lot more radical to deal with
>> free page table pages.
>
> Ha! IIRC the only thing we can reasonably do there is to have strict
> per-cpu page-tables such that NOHZ_FULL CPUs can be isolated. That is,
> as long as the per-cpu tables do not contain -- and have never contained
> -- a particular table page, we can avoid flushing it. Because if it
> never was there, it also couldn't have speculatively loaded it.
>
> Now, x86 doesn't really do per-cpu page tables easily (otherwise we'd
> have done them ages ago) and doing them is going to be *major* surgery
> and pain.
>
> Other than that, we must take the TLBI-IPI when freeing
> page-table-pages.
>
>
> But yeah, I think Nadav is right, vmalloc.c never frees page-tables (or
> at least, I couldn't find it in a hurry either), but if we're going to
> be doing this, then that file must include a very prominent comment
> explaining it must never actually do so either.
>

I also couldn't find any freeing of the page-table-pages; I'll do another
pass and sharpen my quill for a big fat comment.

> Not being able to free page-tables might be a 'problem' if we're going
> to be doing more of HUGE_VMALLOC, because that means it becomes rather
> hard to swizzle from small to large pages.
