All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Jason A. Donenfeld" <Jason@zx2c4.com>
To: Ingo Molnar <mingo@kernel.org>
Cc: linux-kernel@vger.kernel.org, patches@lists.linux.dev,
	tglx@linutronix.de, linux-crypto@vger.kernel.org,
	linux-api@vger.kernel.org, x86@kernel.org,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	Adhemerval Zanella Netto <adhemerval.zanella@linaro.org>,
	Carlos O'Donell <carlos@redhat.com>,
	Florian Weimer <fweimer@redhat.com>,
	Arnd Bergmann <arnd@arndb.de>, Jann Horn <jannh@google.com>,
	Christian Brauner <brauner@kernel.org>,
	linux-mm@kvack.org,
	Linus Torvalds <torvalds@linux-foundation.org>
Subject: Re: [PATCH v14 2/7] mm: add VM_DROPPABLE for designating always lazily freeable mappings
Date: Tue, 3 Jan 2023 19:51:38 +0100	[thread overview]
Message-ID: <Y7R5OokY7P+H2vuD@zx2c4.com> (raw)
In-Reply-To: <Y7Rw1plb/pqPiWgg@gmail.com>

On Tue, Jan 03, 2023 at 07:15:50PM +0100, Ingo Molnar wrote:
> Frankly, I don't appreciate your condescending discussion style that 
> borders on the toxic, and to save time I'm nacking this technical approach 
> until both the patch-set and your reaction to constructive review feedback 
> improves:
> 
>     NAcked-by: Ingo Molnar <mingo@kernel.org>

Your initial email to me did not strike me as constructive at all. All I
gotta say is that you really seem to live up to your reputation here...

But trying to steer things back to the technical realm:

> For a single architecture: x86.
> 
> And it's only 19 lines because x86 already happens to have a bunch of 
> complexity implemented, such as a safe instruction decoder that allows the 
> skipping of an instruction - which relies on thousands of lines of 
> complexity.
> 
> On an architecture where this isn't present, it would have to be 
> implemented to support the instruction-skipping aspect of VM_DROPPABLE.

My assumption is actually the opposite: that x86 (CISC) is basically the
most complex, and that the RISC architectures will all be a good deal
more trivial -- e.g. just adding 4 to IP on some. It looks like some
architectures also already have mature decoders where required.

> Even on x86, it's not common today for the software-decoder to be used in 
> unprivileged code - primary use was debugging & instrumentation code. So 
> your patches bring this piece of complexity to a much larger scope of 
> untrusted user-space functionality.

As far as I can tell, this decoder *is* used with userspace already.
It's used by SEV and by UMIP, in a totally userspace accessible way. Am
I misunderstanding its use there? It looks to me like that operates on
untrusted code.

*However* - if your big objection to this patch is that the instruction
skipping is problematic, we could actually punt that part. The result
will be that userspace just retries the memory write and the fault
happens again, and eventually it succeeds. From a perspective of
vgetrandom(), that's perhaps worse -- with this v14 patchset, it'll
immediately fallback to the syscall under memory pressure -- but you
could argue that nobody really cares about performance at that point
anyway, and so just retrying the fault until it succeeds is a less
complex behavior that would be just fine.

Let me know if you think that'd be an acceptable compromise, and I'll
roll it into v15. As a preview, it pretty much amounts to dropping 3/7
and editing the commit message in this 2/7 patch.

> I did not suggest to swap it: my suggestion is to just pin these vDSO data 
> pages. The per thread memory overhead is infinitesimal on the vast majority 
> of the target systems, and the complexity trade-off you are proposing is 
> poorly reasoned IMO.
> 
> I think my core point that it would be much simpler to simply pin those 
> pages and not introduce rarely-excercised 'discardable memory' semantics in 
> Linux is a fair one - so it's straightforward to lift this NAK.

Okay so this is where I think we're really not lined up and is a large
part of why I wondered whether you'd read the commit messages before
dismissing this. This VM_DROPPABLE mapping comes as a result of a
vgetrandom_alloc() syscall, which (g)libc makes at some point, and then
the result of that is passed to the vDSO getrandom() function. The
memory in vgetrandom_alloc() is then carved up, one per thread, with
(g)libc's various internal pthread creation/exit functions.

So that means this isn't a thing that's trivially limited to just one
per thread. Userspace can call vgetrandom_alloc() all it wants.

Thus, I'm having a hard time seeing how relaxing rlimits here as you
suggested doesn't amount to an rlimit backdoor. I'm also not seeing
other fancy options for "pinning pages" as you mentioned in this email.
Something about having the kernel allocate them on clone()? That seems
terrible and complex. And if you do want this all to go through mlock(),
somehow, there's still the fork() inheritabiity issue. (This was all
discussed on the thread a few versions ago that surfaced these issues,
by the way.)

So I'm not really seeing any good alternatives, no matter how hard I
squint at your suggestions. Maybe you can elaborate a bit?
Alternatively, perhaps the compromise I suggested above where we ditch
the instruction decoder stuff is actually fine with you?

> rarely-excercised 'discardable memory' semantics in 
> Linux is a fair one - so it's straightforward to lift this NAK.

I still don't think calling this "rarely-exercised" is true. Desktop
machines regularly OOM with lots of Chrome tabs, and this is
functionality that'll be in glibc, so it'll be exercised quite often.
Even on servers, many operators work with the philosophy that unused RAM
is wasted RAM, and so servers are run pretty close to capacity. Not to
mention Android, where lots of handsets have way too little RAM.
Swapping and memory pressure and so forth is very real. So claiming that
this is somehow obscure or rarely used or what have you isn't very
accurate. These are code paths that will certainly get exercised.

Jason

  reply	other threads:[~2023-01-03 18:51 UTC|newest]

Thread overview: 52+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-01-01 16:29 [PATCH v14 0/7] implement getrandom() in vDSO Jason A. Donenfeld
2023-01-01 16:29 ` [PATCH v14 1/7] x86: lib: Separate instruction decoder MMIO type from MMIO trace Jason A. Donenfeld
2023-01-03 10:32   ` Ingo Molnar
2023-01-03 14:51     ` Jason A. Donenfeld
2023-01-03 17:00       ` Ingo Molnar
2023-01-03 17:29         ` Borislav Petkov
2023-01-03 17:30           ` Jason A. Donenfeld
2023-01-03 17:47             ` Ingo Molnar
2023-01-03 17:48               ` Jason A. Donenfeld
2023-01-04 20:25               ` Ingo Molnar
2023-01-04 20:29                 ` Jason A. Donenfeld
2023-01-03 11:00   ` [tip: x86/asm] x86/insn: Avoid namespace clash by separating instruction decoder MMIO type from MMIO trace type tip-bot2 for Jason A. Donenfeld
2023-01-03 17:53   ` [tip: x86/urgent] " tip-bot2 for Jason A. Donenfeld
2023-01-01 16:29 ` [PATCH v14 2/7] mm: add VM_DROPPABLE for designating always lazily freeable mappings Jason A. Donenfeld
2023-01-03 10:50   ` Ingo Molnar
2023-01-03 15:01     ` Jason A. Donenfeld
2023-01-03 18:15       ` Ingo Molnar
2023-01-03 18:51         ` Jason A. Donenfeld [this message]
2023-01-03 18:36     ` Andy Lutomirski
2023-01-03 19:05       ` Jason A. Donenfeld
2023-01-03 20:52         ` Andy Lutomirski
2023-01-03 19:19       ` Linus Torvalds
2023-01-03 19:35         ` Jason A. Donenfeld
2023-01-03 19:54           ` Linus Torvalds
2023-01-03 20:03             ` Jason A. Donenfeld
2023-01-03 20:15               ` Linus Torvalds
2023-01-03 20:25                 ` Linus Torvalds
2023-01-03 20:44                 ` Jason A. Donenfeld
2023-01-05 21:57                   ` Yann Droneaud
2023-01-05 22:57                     ` Jason A. Donenfeld
2023-01-06  1:02                       ` Linus Torvalds
2023-01-06  2:08                         ` Linus Torvalds
2023-01-06  2:42                           ` Jason A. Donenfeld
2023-01-06 20:53                           ` Andy Lutomirski
2023-01-06 21:10                             ` Linus Torvalds
2023-01-10 11:01                               ` Dr. Greg
2023-01-06 21:36                             ` Jason A. Donenfeld
2023-01-06 21:42                           ` Matthew Wilcox
2023-01-06 22:06                             ` Linus Torvalds
2023-01-06  2:14                         ` Jason A. Donenfeld
2023-01-09 10:34             ` Florian Weimer
2023-01-09 14:28               ` Linus Torvalds
2023-01-11  7:27                 ` Eric Biggers
2023-01-11 12:07                   ` Linus Torvalds
2023-01-01 16:29 ` [PATCH v14 3/7] x86: mm: Skip faulting instruction for VM_DROPPABLE faults Jason A. Donenfeld
2023-01-01 16:29 ` [PATCH v14 4/7] random: add vgetrandom_alloc() syscall Jason A. Donenfeld
2023-01-01 16:29 ` [PATCH v14 5/7] arch: allocate vgetrandom_alloc() syscall number Jason A. Donenfeld
2023-01-01 16:29 ` [PATCH v14 6/7] random: introduce generic vDSO getrandom() implementation Jason A. Donenfeld
2023-01-01 16:29 ` [PATCH v14 7/7] x86: vdso: Wire up getrandom() vDSO implementation Jason A. Donenfeld
2023-01-12 17:27   ` Christophe Leroy
2023-01-12 17:49     ` Jason A. Donenfeld
2023-01-11 22:23 ` [PATCH v14 0/7] implement getrandom() in vDSO Mathieu Desnoyers

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Y7R5OokY7P+H2vuD@zx2c4.com \
    --to=jason@zx2c4.com \
    --cc=adhemerval.zanella@linaro.org \
    --cc=arnd@arndb.de \
    --cc=brauner@kernel.org \
    --cc=carlos@redhat.com \
    --cc=fweimer@redhat.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=jannh@google.com \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-crypto@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mingo@kernel.org \
    --cc=patches@lists.linux.dev \
    --cc=tglx@linutronix.de \
    --cc=torvalds@linux-foundation.org \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.