* RFC: Intervals to schedule the worker for mix_interrupt_randomness().
@ 2022-02-28 18:12 Sebastian Andrzej Siewior
  2022-02-28 18:58 ` Jason A. Donenfeld
  0 siblings, 1 reply; 3+ messages in thread
From: Sebastian Andrzej Siewior @ 2022-02-28 18:12 UTC (permalink / raw)
  To: Jason A. Donenfeld
  Cc: linux-kernel, linux-crypto, Thomas Gleixner, Peter Zijlstra,
	Eric Biggers, Theodore Ts'o, Dominik Brodowski

Hi Jason,

I was debugging my backport and looking for a bug, but then I figured
out that everything works as intended.

add_interrupt_randomness() has this piece:
|         if (new_count < 64 && (!time_after(now, fast_pool->last + HZ) ||
|                                unlikely(crng_init == 0)))
|                 return;

I was reading this wrong the whole time, as in (with context):
   If new_count is greater than or equal to 64 and at least a second
   passed by, schedule the worker for mix_interrupt_randomness().

Which means a worker once a second on a CPU, or at longer intervals if
the CPU is idle. But in reality this is:
   If new_count is greater than or equal to 64 _or_ at least a second
   passed by, then schedule the worker for mix_interrupt_randomness().
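
In other words (just De Morgan applied to the check above, not a
patch), the worker ends up being queued whenever either condition
holds:

|         if (new_count >= 64 ||
|             (time_after(now, fast_pool->last + HZ) && crng_init != 0))
|                 /* fall through to the code that queues the worker */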

This explains why there are so many workers pending/running for
mix_interrupt_randomness() in my testing. To give you an example of why
this got my attention:

| [   15.763823] random: mix_interrupt_randomness() CPU16, HAZ: 2
| [   15.763823] random: mix_interrupt_randomness() CPU17, HAZ: 2
| [   15.763826] random: mix_interrupt_randomness() CPU07, HAZ: 2
| [   15.763827] random: mix_interrupt_randomness() CPU06, HAZ: 4
| [   15.763827] random: mix_interrupt_randomness() CPU04, HAZ: 2
| [   18.579328] random: mix_interrupt_randomness() CPU18, HAZ: 3
| [   18.579357] random: mix_interrupt_randomness() CPU16, HAZ: 1
| [   18.579358] random: mix_interrupt_randomness() CPU03, HAZ: 2
| [   18.579358] random: mix_interrupt_randomness() CPU17, HAZ: 1
| [   18.579359] random: mix_interrupt_randomness() CPU04, HAZ: 1
| [   18.579360] random: mix_interrupt_randomness() CPU05, HAZ: 1
| [   18.579361] random: mix_interrupt_randomness() CPU06, HAZ: 1
| [   18.579362] random: mix_interrupt_randomness() CPU07, HAZ: 1
| [   20.531244] random: mix_interrupt_randomness() CPU18, HAZ: 2
| [   20.531266] random: mix_interrupt_randomness() CPU16, HAZ: 1
| [   20.531267] random: mix_interrupt_randomness() CPU03, HAZ: 2
| [   20.531267] random: mix_interrupt_randomness() CPU17, HAZ: 1
| [   20.531269] random: mix_interrupt_randomness() CPU04, HAZ: 1
| [   20.531270] random: mix_interrupt_randomness() CPU05, HAZ: 1
| [   20.531270] random: mix_interrupt_randomness() CPU06, HAZ: 1
| [   20.531271] random: mix_interrupt_randomness() CPU07, HAZ: 1
| [   22.515212] random: mix_interrupt_randomness() CPU18, HAZ: 2
| [   22.515240] random: mix_interrupt_randomness() CPU16, HAZ: 2
| [   22.515240] random: mix_interrupt_randomness() CPU17, HAZ: 1
| [   22.515241] random: mix_interrupt_randomness() CPU03, HAZ: 1
| [   22.515242] random: mix_interrupt_randomness() CPU04, HAZ: 1
| [   22.515242] random: mix_interrupt_randomness() CPU05, HAZ: 1
| [   22.515244] random: mix_interrupt_randomness() CPU06, HAZ: 1
| [   22.515244] random: mix_interrupt_randomness() CPU07, HAZ: 1
| [   23.948447] random: mix_interrupt_randomness() CPU18, HAZ: 1
| [   23.948744] random: mix_interrupt_randomness() CPU16, HAZ: 1
| [   24.531151] random: mix_interrupt_randomness() CPU17, HAZ: 1
| [   24.531152] random: mix_interrupt_randomness() CPU03, HAZ: 1
| [   24.531153] random: mix_interrupt_randomness() CPU04, HAZ: 1
| [   24.531153] random: mix_interrupt_randomness() CPU05, HAZ: 1
| [   24.531154] random: mix_interrupt_randomness() CPU06, HAZ: 1
| [   24.531155] random: mix_interrupt_randomness() CPU07, HAZ: 1
| [   25.034401] random: mix_interrupt_randomness() CPU16, HAZ: 2
| [   25.074542] random: mix_interrupt_randomness() CPU18, HAZ: 38
| [   25.566450] random: mix_interrupt_randomness() CPU03, HAZ: 19
| [   25.598532] random: mix_interrupt_randomness() CPU05, HAZ: 15
| [   25.726046] random: mix_interrupt_randomness() CPU18, HAZ: 64
| [   25.802257] random: mix_interrupt_randomness() CPU18, HAZ: 64
| [   26.085382] random: mix_interrupt_randomness() CPU18, HAZ: 64

This output comes from the hack at the end of the email (not properly
formatted). Since the box is idle and runs NO_HZ, it is possible that a
CPU is idle for a second or longer. If you look at the beginning of the
output, CPU16 scheduled the worker for mix_interrupt_randomness() with
fast_pool::count = 2. So did CPUs 17, 7 and 6. At the end of the log you
see that CPU18 got busy and scheduled the worker more frequently since
it accumulated 64 interrupts. This output covers about 10 seconds, and
CPUs 0 and 1 are not part of the log.

Is this really what we want? With HAZ=1 the CPU was woken up from the
idle poll routine, so the registers should always have the same content
since it is always the same while() routine calling the CPU's idle
function. At least get_reg() rotates them, but instruction_pointer()
would always return the same value. (Side note: on 64bit the upper 32
bits of the IP register should be all 0xff…ff for the kernel and
0x00…00 for userland, so this looks like one bit of entropy. I mention
this now because I noticed that 32bit throws an additional register
into the mix to make up for the smaller register size.)

Wouldn't it make sense, entropy-wise, to gather more entropy (say the
64 events) before consuming it? And to wait at least a second even if
more entropy was gathered in the meantime?
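
Roughly something like this is what I have in mind (only a sketch of
the idea, ignoring the crng_init == 0 case, not a proposed patch):

|         /* require both: at least 64 events AND at least one second */
|         if (new_count < 64 || !time_after(now, fast_pool->last + HZ))
|                 return;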

With CONFIG_HZ_PERIODIC you have at least CONFIG_HZ interrupts per
second on each CPU. "At least" because you have the timer tick
interrupt and you may have additional interrupts from your devices.
With HZ=1000 that is 1000 timer tick interrupts per second, i.e. with
the 64-interrupt threshold the worker can be scheduled up to roughly 15
times per second per CPU from the tick alone.
That is the one extreme. The other is NO_HZ and an idle box. [ Now that
I look around it appears that risc-v has no get_irq_regs() and
random_get_entropy() may return 0. Spooky. ]

--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -1254,6 +1254,7 @@ static unsigned long get_reg(struct fast_pool *f, struct pt_regs *regs)
        return *ptr;
 }
 
+enum { MIX_INFLIGHT = 1U << 31 };
 static void mix_interrupt_randomness(struct work_struct *work)
 {
        struct fast_pool *fast_pool = container_of(work, struct fast_pool, mix);
@@ -1271,6 +1272,9 @@ static void mix_interrupt_randomness(struct work_struct *work)
         * consistent view, before we reenable irqs again.
         */
        memcpy(pool, fast_pool->pool32, sizeof(pool));
+       pr_err("%s() CPU%02d, HAZ: %d\n", __func__, smp_processor_id(),
+              READ_ONCE(fast_pool->count) & ~MIX_INFLIGHT);
+       WARN_ON((READ_ONCE(fast_pool->count) & MIX_INFLIGHT) == 0);
        fast_pool->count = 0;
        fast_pool->last = jiffies;
        local_irq_enable();
@@ -1288,7 +1292,6 @@ static void mix_interrupt_randomness(struct work_struct *work)
 
 void add_interrupt_randomness(int irq)
 {
-       enum { MIX_INFLIGHT = 1U << 31 };
        cycles_t cycles = random_get_entropy();
        unsigned long now = jiffies;
        struct fast_pool *fast_pool = this_cpu_ptr(&irq_randomness);

Sebastian

* Re: RFC: Intervals to schedule the worker for mix_interrupt_randomness().
  2022-02-28 18:12 RFC: Intervals to schedule the worker for mix_interrupt_randomness() Sebastian Andrzej Siewior
@ 2022-02-28 18:58 ` Jason A. Donenfeld
  2022-03-01 15:58   ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 3+ messages in thread
From: Jason A. Donenfeld @ 2022-02-28 18:58 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: LKML, Linux Crypto Mailing List, Thomas Gleixner, Peter Zijlstra,
	Eric Biggers, Theodore Ts'o, Dominik Brodowski

Hi Sebastian,

I'm actually trying quite hard not to change the details of entropy
gathering for 5.18. There are lots of little arguments for why each
little thing is the way it is, and these have built up over a long
period of time, and reworking /that/ requires some anthropological
work to understand all the original intents, which I consider to be a
bit of a different "project" from what I'm working on now with
random.c. So I'd like to minimize changes to the semantics. Right now,
those semantics are:

A) crng_init==0: pre_init_inject after 64 interrupts.
B) crng_init!=0: mix_pool_bytes after 64 interrupts OR after 1 second
has elapsed.
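
Restated as pseudo code against the check in add_interrupt_randomness()
(my own paraphrase of when each path triggers, not the actual structure
of random.c):

|         if (crng_init == 0) {
|                 if (new_count >= 64)
|                         /* (A): pre_init_inject */
|         } else {
|                 if (new_count >= 64 || time_after(now, fast_pool->last + HZ))
|                         /* (B): mix_pool_bytes */
|         }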

In trying to "reverse engineer" what the rationales for (A) and (B)
might be, I imagine the reasoning breaks down to:

A) Since crng_pre_init_inject will increase crng_init_cnt by 16, we
want to make sure it's decently entropic [maybe? kind of a weak
argument perhaps].
B) We're crediting only 1 bit in this case, so let's consider that to
have accumulated after 64 fairly fast interrupts, which may be
"regular", or after a smaller number that occur within a second, since
these might be irregular, and so perhaps more entropic each. That
sounds mostly sensible to me, and rather conservative too. So it seems
okay with me.

Maybe we can revisit these for 5.19, but I'd like not to tinker too
much with that for now. I think I can actually argue both for and
against the points I tried to synthesize in (A) and (B) above. It's
also related to what we collect, which you alluded to in your message.
Right now we're collecting:

1) random_get_entropy() (i.e. RDTSC): this is the big huge important
thing that's captured.
2) jiffies: does this matter if we're already gathering RDTSC? I can
see an argument both ways. More analysis and thought required
3) the irq number: maybe it matters, maybe it doesn't.
4) the return address: maybe it matters, maybe it doesn't.
5) the value of some register: maybe it matters, maybe it doesn't.
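
Conceptually, the per-interrupt sample is something like the following
(a rough sketch of the list above with made-up field names, not the
actual layout in random.c):

|         struct {
|                 cycles_t cycles;      /* (1) random_get_entropy() / RDTSC */
|                 unsigned long now;    /* (2) jiffies */
|                 unsigned long irq;    /* (3) the irq number */
|                 unsigned long ip;     /* (4) the return address */
|                 unsigned long reg;    /* (5) the value of some register */
|         } sample;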

Maybe this could be reduced to only capturing (1); maybe we benefit
from having all of (1)-(5). Again, I can see the argument both ways,
and I think that needs a lot more investigation and thought, and if
that's to happen, it really seems like a 5.19 thing rather than a 5.18
thing at this point.

But all this brings me to what I'm really wondering when reading your
email: do your observations matter? Are you observing a performance or
reliability issue or something like that with those workqueues
pending? Is this whole workqueue approach a mistake and we should
revert it? Or is it still okay, but you were just idly wondering about
that time limit? As you can tell, I'm mostly concerned with not
breaking something by accident.

Regards,
Jason

* Re: RFC: Intervals to schedule the worker for mix_interrupt_randomness().
  2022-02-28 18:58 ` Jason A. Donenfeld
@ 2022-03-01 15:58   ` Sebastian Andrzej Siewior
  0 siblings, 0 replies; 3+ messages in thread
From: Sebastian Andrzej Siewior @ 2022-03-01 15:58 UTC (permalink / raw)
  To: Jason A. Donenfeld
  Cc: LKML, Linux Crypto Mailing List, Thomas Gleixner, Peter Zijlstra,
	Eric Biggers, Theodore Ts'o, Dominik Brodowski

On 2022-02-28 19:58:05 [+0100], Jason A. Donenfeld wrote:
> Hi Sebastian,
Hi Jason,

> I'm actually trying quite hard not to change the details of entropy
> gathering for 5.18. There are lots of little arguments for why each
> random.c. So I'd like to minimize changes to the semantics. Right now,
> those semantics are:
> 
> A) crng_init==0: pre_init_inject after 64 interrupts.
> B) crng_init!=0: mix_pool_bytes after 64 interrupts OR after 1 second
> has elapsed.

Yes. I double checked that it was not a recent change during that
rework. So yes, let's keep it as is; I just wanted to point it out.
…
> But all this brings me to what I'm really wondering when reading your
> email: do your observations matter? Are you observing a performance or
> reliability issue or something like that with those workqueues
> pending? Is this whole workqueue approach a mistake and we should
> revert it? Or is it still okay, but you were just idly wondering about
> that time limit? As you can tell, I'm mostly concerned with not
> breaking something by accident.

I noticed it because I backported the required patches (not everything
from your queue, just the patches I needed to drop mine and have
everything working). During testing I noticed that the worker is
scheduled more often than I expected, and I checked that it is
scheduled on purpose and does not accidentally stop due to a backport
that went wrong. And since I got the condition wrong…

But you are asking about performance. I ran b.sh, which does:
- unpack a kernel to /dev/shm
- build allmodconfig

and then invoked it with "perf stat -r 5 -a --table ./b.sh" to get some
numbers. I applied your complete queue on top of v5.17-rc6, and the
result was:

| Performance counter stats for 'system wide' (5 runs):
|
|     45.502.822,32 msec cpu-clock                 #   32,014 CPUs utilized            ( +-  0,05% )
|         9.479.371      context-switches          #  208,419 /sec                     ( +-  0,08% )
|           839.380      cpu-migrations            #   18,455 /sec                     ( +-  0,38% )
|       624.839.341      page-faults               #   13,738 K/sec                    ( +-  0,00% )
|105.297.794.633.131      cycles                    #    2,315 GHz                      ( +-  0,01% )
|77.238.191.940.405      stalled-cycles-frontend   #   73,37% frontend cycles idle     ( +-  0,01% )
|56.724.314.805.475      stalled-cycles-backend    #   53,89% backend cycles idle      ( +-  0,02% )
|69.889.082.499.264      instructions              #    0,66  insn per cycle         
|                                                  #    1,10  stalled cycles per insn  ( +-  0,00% )
|14.670.304.314.265      branches                  #  322,550 M/sec                    ( +-  0,00% )
|   561.326.606.978      branch-misses             #    3,83% of all branches          ( +-  0,02% )
|
|          # Table of individual measurements:
|          1419,113 (-2,247) #
|          1422,552 (+1,192) #
|          1420,773 (-0,587) #
|          1422,362 (+1,002) #
|          1422,001 (+0,641) #
|
|          # Final result:
|          1421,360 +- 0,641 seconds time elapsed  ( +-  0,05% )

I also checked a few commits earlier, before the workqueue rework started, at
  "random: rewrite header introductory comment":
| Performance counter stats for 'system wide' (5 runs):
|
|     45.508.013,44 msec cpu-clock                 #   32,034 CPUs utilized            ( +-  0,05% )
|         9.456.280      context-switches          #  208,017 /sec                     ( +-  0,11% )
|           837.148      cpu-migrations            #   18,415 /sec                     ( +-  0,30% )
|       624.851.749      page-faults               #   13,745 K/sec                    ( +-  0,00% )
|105.289.268.852.002      cycles                    #    2,316 GHz                      ( +-  0,01% )
|77.233.457.186.415      stalled-cycles-frontend   #   73,38% frontend cycles idle     ( +-  0,02% )
|56.740.014.447.074      stalled-cycles-backend    #   53,91% backend cycles idle      ( +-  0,02% )
|69.882.802.096.982      instructions              #    0,66  insn per cycle
|                                                  #    1,10  stalled cycles per insn  ( +-  0,00% )
|14.670.395.601.080      branches                  #  322,716 M/sec                    ( +-  0,00% )
|   560.846.203.691      branch-misses             #    3,82% of all branches          ( +-  0,01% )
|
|          # Table of individual measurements:
|          1418,288 (-2,347) #
|          1420,425 (-0,210) #
|          1420,633 (-0,001) #
|          1421,665 (+1,030) #
|          1422,162 (+1,528) #
|
|          # Final result:
|          1420,635 +- 0,669 seconds time elapsed  ( +-  0,05% )

and then on v5.17-rc6:
| Performance counter stats for 'system wide' (5 runs):
|
|     45.459.406,05 msec cpu-clock                 #   32,009 CPUs utilized            ( +-  0,04% )
|         9.460.380      context-switches          #  208,171 /sec                     ( +-  0,10% )
|           837.571      cpu-migrations            #   18,430 /sec                     ( +-  0,30% )
|       624.859.326      page-faults               #   13,750 K/sec                    ( +-  0,00% )
|105.247.262.852.106      cycles                    #    2,316 GHz                      ( +-  0,01% )
|77.185.603.119.285      stalled-cycles-frontend   #   73,34% frontend cycles idle     ( +-  0,01% )
|56.688.996.383.094      stalled-cycles-backend    #   53,87% backend cycles idle      ( +-  0,02% )
|69.883.077.705.602      instructions              #    0,66  insn per cycle         
|                                                  #    1,10  stalled cycles per insn  ( +-  0,00% )
|14.670.347.661.094      branches                  #  322,813 M/sec                    ( +-  0,00% )
|   561.066.414.554      branch-misses             #    3,82% of all branches          ( +-  0,01% )
|
|          # Table of individual measurements:
|          1418,142 (-2,061) #
|          1420,187 (-0,016) #
|          1421,242 (+1,039) #
|          1420,800 (+0,597) #
|          1420,644 (+0,441) #
|
|          # Final result:
|          1420,203 +- 0,542 seconds time elapsed  ( +-  0,04% )

It does not appear that anything stands out.

> Regards,
> Jason

Sebastian
