* [PATCH v2] x86/mm/tlb: avoid reading mm_tlb_gen when possible
@ 2022-06-06 18:01 Nadav Amit
2022-06-06 20:48 ` Andy Lutomirski
` (2 more replies)
0 siblings, 3 replies; 9+ messages in thread
From: Nadav Amit @ 2022-06-06 18:01 UTC (permalink / raw)
To: Dave Hansen
Cc: LKML, Nadav Amit, Peter Zijlstra, Ingo Molnar, Andy Lutomirski,
Thomas Gleixner, x86
From: Nadav Amit <namit@vmware.com>
On extreme TLB shootdown storms, the mm's tlb_gen cacheline is highly
contended and reading it should (arguably) be avoided as much as
possible.
Currently, flush_tlb_func() reads the mm's tlb_gen unconditionally,
even when it is not necessary (e.g., the mm was already switched).
This is wasteful.
Moreover, one of the existing optimizations is to read mm's tlb_gen to
see if there are additional in-flight TLB invalidations and flush the
entire TLB in such a case. However, if the request's tlb_gen was already
flushed, the benefit of checking the mm's tlb_gen is likely to be offset
by the overhead of the check itself.
Running will-it-scale with tlb_flush1_threads shows a considerable
benefit on 56-core Skylake (up to +24%):
threads    Baseline (v5.17+)    +Patch
 1         159960               160202
 5         310808               308378  (-0.7%)
10         479110               490728
15         526771               562528
20         534495               587316
25         547462               628296
30         579616               666313
35         594134               701814
40         612288               732967
45         617517               749727
50         637476               735497
55         614363               778913  (+24%)
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86@kernel.org
Signed-off-by: Nadav Amit <namit@vmware.com>
--
Note: The benchmarked kernels include Dave's revert of commit
6035152d8eeb ("x86/mm/tlb: Open-code on_each_cpu_cond_mask() for
tlb_is_not_lazy()").
---
arch/x86/mm/tlb.c | 18 +++++++++++++++++-
1 file changed, 17 insertions(+), 1 deletion(-)
diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index d400b6d9d246..d9314cc8b81f 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -734,10 +734,10 @@ static void flush_tlb_func(void *info)
const struct flush_tlb_info *f = info;
struct mm_struct *loaded_mm = this_cpu_read(cpu_tlbstate.loaded_mm);
u32 loaded_mm_asid = this_cpu_read(cpu_tlbstate.loaded_mm_asid);
- u64 mm_tlb_gen = atomic64_read(&loaded_mm->context.tlb_gen);
u64 local_tlb_gen = this_cpu_read(cpu_tlbstate.ctxs[loaded_mm_asid].tlb_gen);
bool local = smp_processor_id() == f->initiating_cpu;
unsigned long nr_invalidate = 0;
+ u64 mm_tlb_gen;
/* This code cannot presently handle being reentered. */
VM_WARN_ON(!irqs_disabled());
@@ -771,6 +771,22 @@ static void flush_tlb_func(void *info)
return;
}
+ if (f->new_tlb_gen <= local_tlb_gen) {
+ /*
+ * The TLB is already up to date in respect to f->new_tlb_gen.
+ * While the core might be still behind mm_tlb_gen, checking
+ * mm_tlb_gen unnecessarily would have negative caching effects
+ * so avoid it.
+ */
+ return;
+ }
+
+ /*
+ * Defer mm_tlb_gen reading as long as possible to avoid cache
+ * contention.
+ */
+ mm_tlb_gen = atomic64_read(&loaded_mm->context.tlb_gen);
+
if (unlikely(local_tlb_gen == mm_tlb_gen)) {
/*
* There's nothing to do: we're already up to date. This can
--
2.25.1
^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH v2] x86/mm/tlb: avoid reading mm_tlb_gen when possible
2022-06-06 18:01 [PATCH v2] x86/mm/tlb: avoid reading mm_tlb_gen when possible Nadav Amit
@ 2022-06-06 20:48 ` Andy Lutomirski
2022-06-06 21:06 ` Nadav Amit
2022-06-07 16:38 ` [tip: x86/mm] x86/mm/tlb: Avoid " tip-bot2 for Nadav Amit
2022-07-08 3:27 ` [PATCH v2] x86/mm/tlb: avoid " Hugh Dickins
2 siblings, 1 reply; 9+ messages in thread
From: Andy Lutomirski @ 2022-06-06 20:48 UTC (permalink / raw)
To: Nadav Amit, Dave Hansen
Cc: Linux Kernel Mailing List, Nadav Amit, Peter Zijlstra (Intel),
Ingo Molnar, Thomas Gleixner, the arch/x86 maintainers
On Mon, Jun 6, 2022, at 11:01 AM, Nadav Amit wrote:
> From: Nadav Amit <namit@vmware.com>
>
> On extreme TLB shootdown storms, the mm's tlb_gen cacheline is highly
> contended and reading it should (arguably) be avoided as much as
> possible.
>
> Currently, flush_tlb_func() reads the mm's tlb_gen unconditionally,
> even when it is not necessary (e.g., the mm was already switched).
> This is wasteful.
>
> Moreover, one of the existing optimizations is to read mm's tlb_gen to
> see if there are additional in-flight TLB invalidations and flush the
> entire TLB in such a case. However, if the request's tlb_gen was already
> flushed, the benefit of checking the mm's tlb_gen is likely to be offset
> by the overhead of the check itself.
>
> Running will-it-scale with tlb_flush1_threads show a considerable
> benefit on 56-core Skylake (up to +24%):
Acked-by: Andy Lutomirski <luto@kernel.org>
But...
I'm suspicious that the analysis is missing something. Under this kind of workload, there are a whole bunch of flushes being initiated, presumably in parallel. Each flush does an RMW on mm_tlb_gen (which will make the cacheline exclusive on the initiating CPU). And each flush sends out an IPI, and the IPI handler reads mm_tlb_gen (which makes the cacheline shared) when it updates the local tlb_gen. So you're doing (at least!) an E->S and S->E transition per flush. Your patch doesn't change this.
But your patch does add a whole new case in which the IPI handler simply doesn't flush! I think it takes either quite a bit of racing or a well-timed context switch to hit that case, but, if you hit it, then you skip a flush and you skip the read of mm_tlb_gen.
Have you tested what happens if you do something like your patch but you also make the mm_tlb_gen read unconditional? I'm curious if there's more to the story than you're seeing.
You could also contemplate a somewhat evil hack in which you don't read mm_tlb_gen even if you *do* flush and instead use f->new_tlb_gen. That would potentially do a bit of extra flushing but would avoid the flush path causing the E->S transition. (Which may be of dubious value for real workloads, since I don't think there's a credible way to avoid having context switches read mm_tlb_gen.)
--Andy
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH v2] x86/mm/tlb: avoid reading mm_tlb_gen when possible
2022-06-06 20:48 ` Andy Lutomirski
@ 2022-06-06 21:06 ` Nadav Amit
2022-06-07 9:07 ` Nadav Amit
0 siblings, 1 reply; 9+ messages in thread
From: Nadav Amit @ 2022-06-06 21:06 UTC (permalink / raw)
To: Andy Lutomirski
Cc: Dave Hansen, Linux Kernel Mailing List, Peter Zijlstra (Intel),
Ingo Molnar, Thomas Gleixner, the arch/x86 maintainers,
Nadav Amit
On Jun 6, 2022, at 1:48 PM, Andy Lutomirski <luto@kernel.org> wrote:
> ⚠ External Email
>
> On Mon, Jun 6, 2022, at 11:01 AM, Nadav Amit wrote:
>> From: Nadav Amit <namit@vmware.com>
>>
>> On extreme TLB shootdown storms, the mm's tlb_gen cacheline is highly
>> contended and reading it should (arguably) be avoided as much as
>> possible.
>>
>> Currently, flush_tlb_func() reads the mm's tlb_gen unconditionally,
>> even when it is not necessary (e.g., the mm was already switched).
>> This is wasteful.
>>
>> Moreover, one of the existing optimizations is to read mm's tlb_gen to
>> see if there are additional in-flight TLB invalidations and flush the
>> entire TLB in such a case. However, if the request's tlb_gen was already
>> flushed, the benefit of checking the mm's tlb_gen is likely to be offset
>> by the overhead of the check itself.
>>
>> Running will-it-scale with tlb_flush1_threads show a considerable
>> benefit on 56-core Skylake (up to +24%):
>
> Acked-by: Andy Lutomirski <luto@kernel.org>
>
> But...
>
> I'm suspicious that the analysis is missing something. Under this kind of workload, there are a whole bunch of flushes being initiated, presumably in parallel. Each flush does an RMW on mm_tlb_gen (which will make the cacheline exclusive on the initiating CPU). And each flush sends out an IPI, and the IPI handler reads mm_tlb_gen (which makes the cacheline shared) when it updates the local tlb_gen. So you're doing (at least!) an E->S and S->E transition per flush. Your patch doesn't change this.
>
> But your patch does add a whole new case in which the IPI handler simply doesn't flush! I think it takes either quite a bit of racing or a well-timed context switch to hit that case, but, if you hit it, then you skip a flush and you skip the read of mm_tlb_gen.
>
> Have you tested what happens if you do something like your patch but you also make the mm_tlb_gen read unconditional? I'm curious if there's more to the story than you're seeing.
>
> You could also contemplate a somewhat evil hack in which you don't read mm_tlb_gen even if you *do* flush and instead use f->new_tlb_gen. That would potentially do a bit of extra flushing but would avoid the flush path causing the E->S transition. (Which may be of dubious value for real workloads, since I don't think there's a credible way to avoid having context switches read mm_tlb_gen.)
Thanks Andy. I still think that the performance comes from saving cache
accesses, which are skipped in certain cases in this workload. I would note
that this patch comes from me profiling will-it-scale, after Dave complained
that I ruined the performance in some other patch. So this is not a random
“I tried something and it’s better”.
I vaguely remember profiling the number of cache-[something] and seeing an
effect, and I cannot explain such performance improvement by just skipping a
flush. But...
Having said all of that, I will run at least the first experiment that you
asked for. I was considering skipping reading mm_tlb_gen completely, but for
the reasons that you mentioned considered it as something that might
introduce performance regression for workloads that are more important than
will-it-scale.
I would also admit that I am not sure how to completely prevent speculative
read of mm->tlb_gen. I guess a serializing instruction is out of the
question, so this optimization is a best-effort.
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH v2] x86/mm/tlb: avoid reading mm_tlb_gen when possible
2022-06-06 21:06 ` Nadav Amit
@ 2022-06-07 9:07 ` Nadav Amit
0 siblings, 0 replies; 9+ messages in thread
From: Nadav Amit @ 2022-06-07 9:07 UTC (permalink / raw)
To: Andy Lutomirski
Cc: Dave Hansen, Linux Kernel Mailing List, Peter Zijlstra (Intel),
Ingo Molnar, Thomas Gleixner, the arch/x86 maintainers
On Jun 6, 2022, at 2:06 PM, Nadav Amit <namit@vmware.com> wrote:
> On Jun 6, 2022, at 1:48 PM, Andy Lutomirski <luto@kernel.org> wrote:
>
>> ⚠ External Email
>>
>> On Mon, Jun 6, 2022, at 11:01 AM, Nadav Amit wrote:
>>> From: Nadav Amit <namit@vmware.com>
>>>
>>> On extreme TLB shootdown storms, the mm's tlb_gen cacheline is highly
>>> contended and reading it should (arguably) be avoided as much as
>>> possible.
>>>
>>> Currently, flush_tlb_func() reads the mm's tlb_gen unconditionally,
>>> even when it is not necessary (e.g., the mm was already switched).
>>> This is wasteful.
>>>
>>> Moreover, one of the existing optimizations is to read mm's tlb_gen to
>>> see if there are additional in-flight TLB invalidations and flush the
>>> entire TLB in such a case. However, if the request's tlb_gen was already
>>> flushed, the benefit of checking the mm's tlb_gen is likely to be offset
>>> by the overhead of the check itself.
>>>
>>> Running will-it-scale with tlb_flush1_threads show a considerable
>>> benefit on 56-core Skylake (up to +24%):
>>
>> Acked-by: Andy Lutomirski <luto@kernel.org>
>>
>> But...
>>
>> I'm suspicious that the analysis is missing something. Under this kind of workload, there are a whole bunch of flushes being initiated, presumably in parallel. Each flush does an RMW on mm_tlb_gen (which will make the cacheline exclusive on the initiating CPU). And each flush sends out an IPI, and the IPI handler reads mm_tlb_gen (which makes the cacheline shared) when it updates the local tlb_gen. So you're doing (at least!) an E->S and S->E transition per flush. Your patch doesn't change this.
>>
>> But your patch does add a whole new case in which the IPI handler simply doesn't flush! I think it takes either quite a bit of racing or a well-timed context switch to hit that case, but, if you hit it, then you skip a flush and you skip the read of mm_tlb_gen.
>>
>> Have you tested what happens if you do something like your patch but you also make the mm_tlb_gen read unconditional? I'm curious if there's more to the story than you're seeing.
>>
>> You could also contemplate a somewhat evil hack in which you don't read mm_tlb_gen even if you *do* flush and instead use f->new_tlb_gen. That would potentially do a bit of extra flushing but would avoid the flush path causing the E->S transition. (Which may be of dubious value for real workloads, since I don't think there's a credible way to avoid having context switches read mm_tlb_gen.)
>
> Thanks Andy. I still think that the performance comes from saving cache
> accesses, which are skipped in certain cases in this workload. I would note
> that this patch comes from me profiling will-it-scale, after Dave complained
> that I ruined the performance in some other patch. So this is not a random
> “I tried something and it’s better”.
>
> I vaguely remember profiling the number of cache-[something] and seeing an
> effect, and I cannot explain such performance improvement by just skipping a
> flush. But...
>
> Having said all of that, I will run at least the first experiment that you
> asked for. I was considering skipping reading mm_tlb_gen completely, but for
> the reasons that you mentioned considered it as something that might
> introduce performance regression for workloads that are more important than
> will-it-scale.
>
> I would also admit that I am not sure how to completely prevent speculative
> read of mm->tlb_gen. I guess a serializing instruction is out of the
> question, so this optimization is a best-effort.
Here are the results of my runs. Note that these results are not comparable
to the ones before. This time I had an older machine (Haswell) and the
configuration is slightly different (IIRC the previous run was with PTI
disabled and now it is enabled; arguably this is less favorable for this
patch, since the cache-effect part of the overall TLB shootdown is smaller).
As you noted, Andy, there are two things - related in my mind - that the
patch does. It returns early with no flush if f->new_tlb_gen <= local_tlb_gen,
and it tries to avoid touching mm->tlb_gen to minimize cache effects.
You asked me to run experiments that separate the two effects.
threads    5.18.1    +patch    no early return,     early return,
                               touch mm->tlb_gen    no mm->tlb_gen [*]
 1         159504    159705    159284               159499
 5         326725    320440    323629               303195
10         489841    497678    498601               442976
15         552282    576148    570337               503709
20         588333    628960    619342               551789
25         637319    675633    659985               591575
30         643372    691581    670599               613017
35         677259    706157    689624               646873
40         659309    728078    716718               655364
45         670985    735346    696558               665500
[*] mm->tlb_gen is completely removed from flush_tlb_func() in this setup
Now, clearly this experiment is limited, and I did not measure the number of
TLB shootdowns, number of cache-misses, etc.
Having said that, the conclusions I get to:
1. The performance benefit appears to come from both the early return and
avoiding touching mm->tlb_gen.
2. Even in this setup, your optimization (Andy) of checking mm->tlb_gen,
pays off. Removing mm->tlb_gen completely from flush_tlb_func is bad.
Now, if you ask me how this whole thing can be further improved, then I
would say that perhaps on remote cores it is best to do the actual TLB flush
after we went over the TLB shootdown entries and figured out if we have
multiple outstanding TLB shootdowns and what the new generation is (using
max of f->new_tlb_gen; without touching mm->tlb_gen). It can be done by
adding a stage in flush_smp_call_function_queue(), but it might slightly
break the abstraction layers.
Having said all of that, I think that the performance improvement is
considerable even in this config (which I remind, is less favorable for the
benchmark).
^ permalink raw reply [flat|nested] 9+ messages in thread
* [tip: x86/mm] x86/mm/tlb: Avoid reading mm_tlb_gen when possible
2022-06-06 18:01 [PATCH v2] x86/mm/tlb: avoid reading mm_tlb_gen when possible Nadav Amit
2022-06-06 20:48 ` Andy Lutomirski
@ 2022-06-07 16:38 ` tip-bot2 for Nadav Amit
2022-07-08 3:27 ` [PATCH v2] x86/mm/tlb: avoid " Hugh Dickins
2 siblings, 0 replies; 9+ messages in thread
From: tip-bot2 for Nadav Amit @ 2022-06-07 16:38 UTC (permalink / raw)
To: linux-tip-commits
Cc: Nadav Amit, Dave Hansen, Peter Zijlstra (Intel),
Andy Lutomirski, x86, linux-kernel
The following commit has been merged into the x86/mm branch of tip:
Commit-ID: aa44284960d550eb4d8614afdffebc68a432a9b4
Gitweb: https://git.kernel.org/tip/aa44284960d550eb4d8614afdffebc68a432a9b4
Author: Nadav Amit <namit@vmware.com>
AuthorDate: Mon, 06 Jun 2022 11:01:23 -07:00
Committer: Dave Hansen <dave.hansen@linux.intel.com>
CommitterDate: Tue, 07 Jun 2022 08:48:03 -07:00
x86/mm/tlb: Avoid reading mm_tlb_gen when possible
On extreme TLB shootdown storms, the mm's tlb_gen cacheline is highly
contended and reading it should (arguably) be avoided as much as
possible.
Currently, flush_tlb_func() reads the mm's tlb_gen unconditionally,
even when it is not necessary (e.g., the mm was already switched).
This is wasteful.
Moreover, one of the existing optimizations is to read mm's tlb_gen to
see if there are additional in-flight TLB invalidations and flush the
entire TLB in such a case. However, if the request's tlb_gen was already
flushed, the benefit of checking the mm's tlb_gen is likely to be offset
by the overhead of the check itself.
Running will-it-scale with tlb_flush1_threads show a considerable
benefit on 56-core Skylake (up to +24%):
threads Baseline (v5.17+) +Patch
1 159960 160202
5 310808 308378 (-0.7%)
10 479110 490728
15 526771 562528
20 534495 587316
25 547462 628296
30 579616 666313
35 594134 701814
40 612288 732967
45 617517 749727
50 637476 735497
55 614363 778913 (+24%)
Signed-off-by: Nadav Amit <namit@vmware.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Andy Lutomirski <luto@kernel.org>
Link: https://lkml.kernel.org/r/20220606180123.2485171-1-namit@vmware.com
---
arch/x86/mm/tlb.c | 18 +++++++++++++++++-
1 file changed, 17 insertions(+), 1 deletion(-)
diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index d400b6d..d9314cc 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -734,10 +734,10 @@ static void flush_tlb_func(void *info)
const struct flush_tlb_info *f = info;
struct mm_struct *loaded_mm = this_cpu_read(cpu_tlbstate.loaded_mm);
u32 loaded_mm_asid = this_cpu_read(cpu_tlbstate.loaded_mm_asid);
- u64 mm_tlb_gen = atomic64_read(&loaded_mm->context.tlb_gen);
u64 local_tlb_gen = this_cpu_read(cpu_tlbstate.ctxs[loaded_mm_asid].tlb_gen);
bool local = smp_processor_id() == f->initiating_cpu;
unsigned long nr_invalidate = 0;
+ u64 mm_tlb_gen;
/* This code cannot presently handle being reentered. */
VM_WARN_ON(!irqs_disabled());
@@ -771,6 +771,22 @@ static void flush_tlb_func(void *info)
return;
}
+ if (f->new_tlb_gen <= local_tlb_gen) {
+ /*
+ * The TLB is already up to date in respect to f->new_tlb_gen.
+ * While the core might be still behind mm_tlb_gen, checking
+ * mm_tlb_gen unnecessarily would have negative caching effects
+ * so avoid it.
+ */
+ return;
+ }
+
+ /*
+ * Defer mm_tlb_gen reading as long as possible to avoid cache
+ * contention.
+ */
+ mm_tlb_gen = atomic64_read(&loaded_mm->context.tlb_gen);
+
if (unlikely(local_tlb_gen == mm_tlb_gen)) {
/*
* There's nothing to do: we're already up to date. This can
^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH v2] x86/mm/tlb: avoid reading mm_tlb_gen when possible
2022-06-06 18:01 [PATCH v2] x86/mm/tlb: avoid reading mm_tlb_gen when possible Nadav Amit
2022-06-06 20:48 ` Andy Lutomirski
2022-06-07 16:38 ` [tip: x86/mm] x86/mm/tlb: Avoid " tip-bot2 for Nadav Amit
@ 2022-07-08 3:27 ` Hugh Dickins
2022-07-08 4:23 ` Nadav Amit
2 siblings, 1 reply; 9+ messages in thread
From: Hugh Dickins @ 2022-07-08 3:27 UTC (permalink / raw)
To: Nadav Amit
Cc: Andrew Morton, Dave Hansen, LKML, Nadav Amit, Peter Zijlstra,
Ingo Molnar, Andy Lutomirski, Thomas Gleixner, x86, linux-mm
On Mon, 6 Jun 2022, Nadav Amit wrote:
> From: Nadav Amit <namit@vmware.com>
>
> On extreme TLB shootdown storms, the mm's tlb_gen cacheline is highly
> contended and reading it should (arguably) be avoided as much as
> possible.
>
> Currently, flush_tlb_func() reads the mm's tlb_gen unconditionally,
> even when it is not necessary (e.g., the mm was already switched).
> This is wasteful.
>
> Moreover, one of the existing optimizations is to read mm's tlb_gen to
> see if there are additional in-flight TLB invalidations and flush the
> entire TLB in such a case. However, if the request's tlb_gen was already
> flushed, the benefit of checking the mm's tlb_gen is likely to be offset
> by the overhead of the check itself.
>
> Running will-it-scale with tlb_flush1_threads show a considerable
> benefit on 56-core Skylake (up to +24%):
>
> threads Baseline (v5.17+) +Patch
> 1 159960 160202
> 5 310808 308378 (-0.7%)
> 10 479110 490728
> 15 526771 562528
> 20 534495 587316
> 25 547462 628296
> 30 579616 666313
> 35 594134 701814
> 40 612288 732967
> 45 617517 749727
> 50 637476 735497
> 55 614363 778913 (+24%)
>
> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> Cc: Dave Hansen <dave.hansen@linux.intel.com>
> Cc: Ingo Molnar <mingo@kernel.org>
> Cc: Andy Lutomirski <luto@kernel.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: x86@kernel.org
> Signed-off-by: Nadav Amit <namit@vmware.com>
>
> --
>
> Note: The benchmarked kernels include Dave's revert of commit
> 6035152d8eeb ("x86/mm/tlb: Open-code on_each_cpu_cond_mask() for
> tlb_is_not_lazy()
> ---
> arch/x86/mm/tlb.c | 18 +++++++++++++++++-
> 1 file changed, 17 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
> index d400b6d9d246..d9314cc8b81f 100644
> --- a/arch/x86/mm/tlb.c
> +++ b/arch/x86/mm/tlb.c
> @@ -734,10 +734,10 @@ static void flush_tlb_func(void *info)
> const struct flush_tlb_info *f = info;
> struct mm_struct *loaded_mm = this_cpu_read(cpu_tlbstate.loaded_mm);
> u32 loaded_mm_asid = this_cpu_read(cpu_tlbstate.loaded_mm_asid);
> - u64 mm_tlb_gen = atomic64_read(&loaded_mm->context.tlb_gen);
> u64 local_tlb_gen = this_cpu_read(cpu_tlbstate.ctxs[loaded_mm_asid].tlb_gen);
> bool local = smp_processor_id() == f->initiating_cpu;
> unsigned long nr_invalidate = 0;
> + u64 mm_tlb_gen;
>
> /* This code cannot presently handle being reentered. */
> VM_WARN_ON(!irqs_disabled());
> @@ -771,6 +771,22 @@ static void flush_tlb_func(void *info)
> return;
> }
>
> + if (f->new_tlb_gen <= local_tlb_gen) {
> + /*
> + * The TLB is already up to date in respect to f->new_tlb_gen.
> + * While the core might be still behind mm_tlb_gen, checking
> + * mm_tlb_gen unnecessarily would have negative caching effects
> + * so avoid it.
> + */
> + return;
> + }
> +
> + /*
> + * Defer mm_tlb_gen reading as long as possible to avoid cache
> + * contention.
> + */
> + mm_tlb_gen = atomic64_read(&loaded_mm->context.tlb_gen);
> +
> if (unlikely(local_tlb_gen == mm_tlb_gen)) {
> /*
> * There's nothing to do: we're already up to date. This can
> --
> 2.25.1
I'm sorry, but bisection and reversion show that this commit,
aa44284960d550eb4d8614afdffebc68a432a9b4 in current linux-next,
is responsible for the "internal compiler error: Segmentation fault"s
I get when running kernel builds on tmpfs in 1G memory, lots of swapping.
That tmpfs is using huge pages as much as it can, so splitting and
collapsing, compaction and page migration entailed, in case that's
relevant (maybe this commit is perfect, but there's a TLB flushing
bug over there in mm which this commit just exposes).
Whether those segfaults happen without the huge page element,
I have not done enough testing to tell - there are other bugs with
swapping in current linux-next, indeed, I wouldn't even have found
this one, if I hadn't already been on a bisection for another bug,
and got thrown off course by these segfaults.
I hope that you can work out what might be wrong with this,
but meantime I think it needs to be reverted.
Thanks,
Hugh
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH v2] x86/mm/tlb: avoid reading mm_tlb_gen when possible
2022-07-08 3:27 ` [PATCH v2] x86/mm/tlb: avoid " Hugh Dickins
@ 2022-07-08 4:23 ` Nadav Amit
2022-07-08 5:56 ` Nadav Amit
0 siblings, 1 reply; 9+ messages in thread
From: Nadav Amit @ 2022-07-08 4:23 UTC (permalink / raw)
To: Hugh Dickins
Cc: Andrew Morton, Dave Hansen, LKML, Peter Zijlstra, Ingo Molnar,
Andy Lutomirski, Thomas Gleixner, x86, linux-mm
On Jul 7, 2022, at 8:27 PM, Hugh Dickins <hughd@google.com> wrote:
> On Mon, 6 Jun 2022, Nadav Amit wrote:
>
>> From: Nadav Amit <namit@vmware.com>
>>
>> On extreme TLB shootdown storms, the mm's tlb_gen cacheline is highly
>> contended and reading it should (arguably) be avoided as much as
>> possible.
>>
>> Currently, flush_tlb_func() reads the mm's tlb_gen unconditionally,
>> even when it is not necessary (e.g., the mm was already switched).
>> This is wasteful.
>>
>> Moreover, one of the existing optimizations is to read mm's tlb_gen to
>> see if there are additional in-flight TLB invalidations and flush the
>> entire TLB in such a case. However, if the request's tlb_gen was already
>> flushed, the benefit of checking the mm's tlb_gen is likely to be offset
>> by the overhead of the check itself.
>>
>> Running will-it-scale with tlb_flush1_threads show a considerable
>> benefit on 56-core Skylake (up to +24%):
>>
>> threads Baseline (v5.17+) +Patch
>> 1 159960 160202
>> 5 310808 308378 (-0.7%)
>> 10 479110 490728
>> 15 526771 562528
>> 20 534495 587316
>> 25 547462 628296
>> 30 579616 666313
>> 35 594134 701814
>> 40 612288 732967
>> 45 617517 749727
>> 50 637476 735497
>> 55 614363 778913 (+24%)
>>
>> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
>> Cc: Dave Hansen <dave.hansen@linux.intel.com>
>> Cc: Ingo Molnar <mingo@kernel.org>
>> Cc: Andy Lutomirski <luto@kernel.org>
>> Cc: Thomas Gleixner <tglx@linutronix.de>
>> Cc: x86@kernel.org
>> Signed-off-by: Nadav Amit <namit@vmware.com>
>>
>> --
>>
>> Note: The benchmarked kernels include Dave's revert of commit
>> 6035152d8eeb ("x86/mm/tlb: Open-code on_each_cpu_cond_mask() for
>> tlb_is_not_lazy()
>> ---
>> arch/x86/mm/tlb.c | 18 +++++++++++++++++-
>> 1 file changed, 17 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
>> index d400b6d9d246..d9314cc8b81f 100644
>> --- a/arch/x86/mm/tlb.c
>> +++ b/arch/x86/mm/tlb.c
>> @@ -734,10 +734,10 @@ static void flush_tlb_func(void *info)
>> const struct flush_tlb_info *f = info;
>> struct mm_struct *loaded_mm = this_cpu_read(cpu_tlbstate.loaded_mm);
>> u32 loaded_mm_asid = this_cpu_read(cpu_tlbstate.loaded_mm_asid);
>> - u64 mm_tlb_gen = atomic64_read(&loaded_mm->context.tlb_gen);
>> u64 local_tlb_gen = this_cpu_read(cpu_tlbstate.ctxs[loaded_mm_asid].tlb_gen);
>> bool local = smp_processor_id() == f->initiating_cpu;
>> unsigned long nr_invalidate = 0;
>> + u64 mm_tlb_gen;
>>
>> /* This code cannot presently handle being reentered. */
>> VM_WARN_ON(!irqs_disabled());
>> @@ -771,6 +771,22 @@ static void flush_tlb_func(void *info)
>> return;
>> }
>>
>> + if (f->new_tlb_gen <= local_tlb_gen) {
>> + /*
>> + * The TLB is already up to date in respect to f->new_tlb_gen.
>> + * While the core might be still behind mm_tlb_gen, checking
>> + * mm_tlb_gen unnecessarily would have negative caching effects
>> + * so avoid it.
>> + */
>> + return;
>> + }
>> +
>> + /*
>> + * Defer mm_tlb_gen reading as long as possible to avoid cache
>> + * contention.
>> + */
>> + mm_tlb_gen = atomic64_read(&loaded_mm->context.tlb_gen);
>> +
>> if (unlikely(local_tlb_gen == mm_tlb_gen)) {
>> /*
>> * There's nothing to do: we're already up to date. This can
>> --
>> 2.25.1
>
> I'm sorry, but bisection and reversion show that this commit,
> aa44284960d550eb4d8614afdffebc68a432a9b4 in current linux-next,
> is responsible for the "internal compiler error: Segmentation fault"s
> I get when running kernel builds on tmpfs in 1G memory, lots of swapping.
>
> That tmpfs is using huge pages as much as it can, so splitting and
> collapsing, compaction and page migration entailed, in case that's
> relevant (maybe this commit is perfect, but there's a TLB flushing
> bug over there in mm which this commit just exposes).
>
> Whether those segfaults happen without the huge page element,
> I have not done enough testing to tell - there are other bugs with
> swapping in current linux-next, indeed, I wouldn't even have found
> this one, if I hadn't already been on a bisection for another bug,
> and got thrown off course by these segfaults.
>
> I hope that you can work out what might be wrong with this,
> but meantime I think it needs to be reverted.
I find it always surprising how trivial one liners fail.
As you probably know, debugging these kinds of things is hard. I see two
possible cases:
1. The failure is directly related to this optimization. The immediate
suspect in my mind is something to do with PCID/ASID.
2. The failure is due to another bug that was papered over by “enough” TLB
flushes.
I will look into the code. But if it is possible, it would be helpful to
know whether you get the failure with the “nopcid” kernel parameter. If it
passes, it wouldn’t say much, but if it fails, I think (2) is more likely.
Not arguing about a revert, but, in some way, if the test fails, it can
indicate that the optimization “works”…
I’ll put some time to look deeper into the code, but it would be very
helpful if you can let me know what happens with nopcid.
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH v2] x86/mm/tlb: avoid reading mm_tlb_gen when possible
2022-07-08 4:23 ` Nadav Amit
@ 2022-07-08 5:56 ` Nadav Amit
2022-07-08 6:59 ` Nadav Amit
0 siblings, 1 reply; 9+ messages in thread
From: Nadav Amit @ 2022-07-08 5:56 UTC (permalink / raw)
To: Hugh Dickins
Cc: Andrew Morton, Dave Hansen, LKML, Peter Zijlstra, Ingo Molnar,
Andy Lutomirski, Thomas Gleixner, x86, linux-mm
On Jul 7, 2022, at 9:23 PM, Nadav Amit <nadav.amit@gmail.com> wrote:
> On Jul 7, 2022, at 8:27 PM, Hugh Dickins <hughd@google.com> wrote:
>
>> On Mon, 6 Jun 2022, Nadav Amit wrote:
>>
>>> From: Nadav Amit <namit@vmware.com>
>>>
>>> On extreme TLB shootdown storms, the mm's tlb_gen cacheline is highly
>>> contended and reading it should (arguably) be avoided as much as
>>> possible.
>>>
>>> Currently, flush_tlb_func() reads the mm's tlb_gen unconditionally,
>>> even when it is not necessary (e.g., the mm was already switched).
>>> This is wasteful.
>>>
>>> Moreover, one of the existing optimizations is to read mm's tlb_gen to
>>> see if there are additional in-flight TLB invalidations and flush the
>>> entire TLB in such a case. However, if the request's tlb_gen was already
>>> flushed, the benefit of checking the mm's tlb_gen is likely to be offset
>>> by the overhead of the check itself.
>>>
>>> Running will-it-scale with tlb_flush1_threads show a considerable
>>> benefit on 56-core Skylake (up to +24%):
>>>
>>> threads Baseline (v5.17+) +Patch
>>> 1 159960 160202
>>> 5 310808 308378 (-0.7%)
>>> 10 479110 490728
>>> 15 526771 562528
>>> 20 534495 587316
>>> 25 547462 628296
>>> 30 579616 666313
>>> 35 594134 701814
>>> 40 612288 732967
>>> 45 617517 749727
>>> 50 637476 735497
>>> 55 614363 778913 (+24%)
>>>
>>> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
>>> Cc: Dave Hansen <dave.hansen@linux.intel.com>
>>> Cc: Ingo Molnar <mingo@kernel.org>
>>> Cc: Andy Lutomirski <luto@kernel.org>
>>> Cc: Thomas Gleixner <tglx@linutronix.de>
>>> Cc: x86@kernel.org
>>> Signed-off-by: Nadav Amit <namit@vmware.com>
>>>
>>> --
>>>
>>> Note: The benchmarked kernels include Dave's revert of commit
>>> 6035152d8eeb ("x86/mm/tlb: Open-code on_each_cpu_cond_mask() for
>>> tlb_is_not_lazy()")
>>> ---
>>> arch/x86/mm/tlb.c | 18 +++++++++++++++++-
>>> 1 file changed, 17 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
>>> index d400b6d9d246..d9314cc8b81f 100644
>>> --- a/arch/x86/mm/tlb.c
>>> +++ b/arch/x86/mm/tlb.c
>>> @@ -734,10 +734,10 @@ static void flush_tlb_func(void *info)
>>> const struct flush_tlb_info *f = info;
>>> struct mm_struct *loaded_mm = this_cpu_read(cpu_tlbstate.loaded_mm);
>>> u32 loaded_mm_asid = this_cpu_read(cpu_tlbstate.loaded_mm_asid);
>>> - u64 mm_tlb_gen = atomic64_read(&loaded_mm->context.tlb_gen);
>>> u64 local_tlb_gen = this_cpu_read(cpu_tlbstate.ctxs[loaded_mm_asid].tlb_gen);
>>> bool local = smp_processor_id() == f->initiating_cpu;
>>> unsigned long nr_invalidate = 0;
>>> + u64 mm_tlb_gen;
>>>
>>> /* This code cannot presently handle being reentered. */
>>> VM_WARN_ON(!irqs_disabled());
>>> @@ -771,6 +771,22 @@ static void flush_tlb_func(void *info)
>>> return;
>>> }
>>>
>>> + if (f->new_tlb_gen <= local_tlb_gen) {
>>> + /*
>>> + * The TLB is already up to date in respect to f->new_tlb_gen.
>>> + * While the core might be still behind mm_tlb_gen, checking
>>> + * mm_tlb_gen unnecessarily would have negative caching effects
>>> + * so avoid it.
>>> + */
>>> + return;
>>> + }
>>> +
>>> + /*
>>> + * Defer mm_tlb_gen reading as long as possible to avoid cache
>>> + * contention.
>>> + */
>>> + mm_tlb_gen = atomic64_read(&loaded_mm->context.tlb_gen);
>>> +
>>> if (unlikely(local_tlb_gen == mm_tlb_gen)) {
>>> /*
>>> * There's nothing to do: we're already up to date. This can
>>> --
>>> 2.25.1
>>
>> I'm sorry, but bisection and reversion show that this commit,
>> aa44284960d550eb4d8614afdffebc68a432a9b4 in current linux-next,
>> is responsible for the "internal compiler error: Segmentation fault"s
>> I get when running kernel builds on tmpfs in 1G memory, lots of swapping.
>>
>> That tmpfs is using huge pages as much as it can, so splitting and
>> collapsing, compaction and page migration entailed, in case that's
>> relevant (maybe this commit is perfect, but there's a TLB flushing
>> bug over there in mm which this commit just exposes).
>>
>> Whether those segfaults happen without the huge page element,
>> I have not done enough testing to tell - there are other bugs with
>> swapping in current linux-next, indeed, I wouldn't even have found
>> this one, if I hadn't already been on a bisection for another bug,
>> and got thrown off course by these segfaults.
>>
>> I hope that you can work out what might be wrong with this,
>> but meantime I think it needs to be reverted.
>
> I find it always surprising how trivial one-liners fail.
>
> As you probably know, debugging these kinds of things is hard. I see two
> possible cases:
>
> 1. The failure is directly related to this optimization. The immediate
> suspect in my mind is something to do with PCID/ASID.
>
> 2. The failure is due to another bug that was papered over by “enough” TLB
> flushes.
>
> I will look into the code. But if it is possible, it would be helpful to
> know whether you get the failure with the “nopcid” kernel parameter. If it
> passes, it wouldn’t say much, but if it fails, I think (2) is more likely.
>
> Not arguing about a revert, but, in some way, if the test fails, it can
> indicate that the optimization “works”…
>
> I’ll put some time to look deeper into the code, but it would be very
> helpful if you can let me know what happens with nopcid.
Actually, only using “nopcid” would most likely make it go away if we have
PTI enabled. So to get a good indication, a check whether it reproduces with
“nopti” and “nopcid” is needed.
I don’t have a better answer yet. Still trying to see what might have gone
wrong.
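For reference, the optimization being debugged above can be modeled in plain C. This is a hypothetical userspace sketch, not kernel code: the names mirror arch/x86/mm/tlb.c, but struct mm_model, flush_tlb_func_model() and the mm_tlb_gen_reads counter are invented for illustration. It shows how the early return skips the contended read of the mm's tlb_gen when the request's generation was already flushed locally:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the generation bookkeeping in flush_tlb_func().
 * In the kernel, mm->context.tlb_gen is an atomic64_t whose cacheline is
 * heavily contended during shootdown storms; here a plain counter tracks
 * how often the model reads it. */
struct mm_model {
	uint64_t tlb_gen;		/* atomic64_t in the kernel */
};

struct flush_info {
	uint64_t new_tlb_gen;		/* generation this request asks for */
};

static unsigned long mm_tlb_gen_reads;	/* counts the contended reads */

/* Returns true if a flush was performed, false if it was skipped. */
static bool flush_tlb_func_model(struct mm_model *mm,
				 const struct flush_info *f,
				 uint64_t *local_tlb_gen)
{
	/*
	 * Patched behavior: if the requested generation was already
	 * flushed locally, return before touching mm->tlb_gen at all.
	 */
	if (f->new_tlb_gen <= *local_tlb_gen)
		return false;

	/* Deferred (formerly unconditional) read of the shared generation. */
	mm_tlb_gen_reads++;
	uint64_t mm_tlb_gen = mm->tlb_gen;

	/* Flush everything up to mm_tlb_gen; the actual invalidation is
	 * elided in this sketch. */
	*local_tlb_gen = mm_tlb_gen;
	return true;
}
```

Under this model, a request whose generation is already covered never touches the shared cacheline, which is where the will-it-scale gains in the patch description come from.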
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH v2] x86/mm/tlb: avoid reading mm_tlb_gen when possible
2022-07-08 5:56 ` Nadav Amit
@ 2022-07-08 6:59 ` Nadav Amit
0 siblings, 0 replies; 9+ messages in thread
From: Nadav Amit @ 2022-07-08 6:59 UTC (permalink / raw)
To: Hugh Dickins
Cc: Andrew Morton, Dave Hansen, LKML, Peter Zijlstra, Ingo Molnar,
Andy Lutomirski, Thomas Gleixner, x86, linux-mm
On Jul 7, 2022, at 10:56 PM, Nadav Amit <nadav.amit@gmail.com> wrote:
> [full quote of the earlier exchange snipped]
> Actually, only using “nopcid” would most likely make it go away if we have
> PTI enabled. So to get a good indication, a check whether it reproduces with
> “nopti” and “nopcid” is needed.
>
> I don’t have a better answer yet. Still trying to see what might have gone
> wrong.
Ok. My bad. Sorry. arch_tlbbatch_flush() does not set any generation in
flush_tlb_info. Bad.
It should be fixed by something like the following; I’ll send a proper patch tomorrow:
diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index d9314cc8b81f..9f19894c322f 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -771,7 +771,7 @@ static void flush_tlb_func(void *info)
return;
}
- if (f->new_tlb_gen <= local_tlb_gen) {
+ if (unlikely(f->mm && f->new_tlb_gen <= local_tlb_gen)) {
/*
* The TLB is already up to date in respect to f->new_tlb_gen.
* While the core might be still behind mm_tlb_gen, checking
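The failure mode the one-line fix addresses can be sketched in a small userspace model (illustrative only; field names follow arch/x86/mm/tlb.c, and the two helper functions are invented for this sketch). arch_tlbbatch_flush() queues a flush_tlb_info with mm == NULL and new_tlb_gen never assigned (zero in this model), so the bare "new_tlb_gen <= local_tlb_gen" check from the original patch treats every batched flush as already done and skips it:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Sketch of a flush request; for batched flushes the kernel leaves
 * new_tlb_gen unset, modeled here as 0. */
struct flush_tlb_info {
	void *mm;		/* NULL for batched/global flushes */
	uint64_t new_tlb_gen;	/* unassigned (0 here) when mm == NULL */
};

/* Buggy check from the original patch: 0 <= local_tlb_gen always holds,
 * so a required batched flush is wrongly skipped. */
static bool should_skip_buggy(const struct flush_tlb_info *f,
			      uint64_t local_tlb_gen)
{
	return f->new_tlb_gen <= local_tlb_gen;
}

/* Fixed check: the early return only applies to mm-specific flushes,
 * which are the only ones that carry a meaningful new_tlb_gen. */
static bool should_skip_fixed(const struct flush_tlb_info *f,
			      uint64_t local_tlb_gen)
{
	return f->mm && f->new_tlb_gen <= local_tlb_gen;
}
```

With the f->mm guard, batched flushes always fall through to the full flush path, while mm-specific requests keep the cacheline-avoiding early return.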
^ permalink raw reply related [flat|nested] 9+ messages in thread
end of thread, other threads:[~2022-07-08 7:00 UTC | newest]
Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-06-06 18:01 [PATCH v2] x86/mm/tlb: avoid reading mm_tlb_gen when possible Nadav Amit
2022-06-06 20:48 ` Andy Lutomirski
2022-06-06 21:06 ` Nadav Amit
2022-06-07 9:07 ` Nadav Amit
2022-06-07 16:38 ` [tip: x86/mm] x86/mm/tlb: Avoid " tip-bot2 for Nadav Amit
2022-07-08 3:27 ` [PATCH v2] x86/mm/tlb: avoid " Hugh Dickins
2022-07-08 4:23 ` Nadav Amit
2022-07-08 5:56 ` Nadav Amit
2022-07-08 6:59 ` Nadav Amit