linux-kernel.vger.kernel.org archive mirror
From: Andy Lutomirski <luto@kernel.org>
To: "Jason A. Donenfeld" <Jason@zx2c4.com>
Cc: Andy Lutomirski <luto@kernel.org>, Ingo Molnar <mingo@kernel.org>,
	linux-kernel@vger.kernel.org, patches@lists.linux.dev,
	tglx@linutronix.de, linux-crypto@vger.kernel.org,
	linux-api@vger.kernel.org, x86@kernel.org,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	Adhemerval Zanella Netto <adhemerval.zanella@linaro.org>,
	"Carlos O'Donell" <carlos@redhat.com>,
	Florian Weimer <fweimer@redhat.com>,
	Arnd Bergmann <arnd@arndb.de>, Jann Horn <jannh@google.com>,
	Christian Brauner <brauner@kernel.org>,
	linux-mm@kvack.org,
	Linus Torvalds <torvalds@linux-foundation.org>
Subject: Re: [PATCH v14 2/7] mm: add VM_DROPPABLE for designating always lazily freeable mappings
Date: Tue, 3 Jan 2023 12:52:07 -0800
Message-ID: <CALCETrXaHPZMNx7g2NS9-5ShG3i74615W7gKQ2tmr4xpvgTBkA@mail.gmail.com>
In-Reply-To: <Y7R8Zq6sIKAIprtr@zx2c4.com>

On Tue, Jan 3, 2023 at 11:06 AM Jason A. Donenfeld <Jason@zx2c4.com> wrote:
>
> Hi Andy,
>
> Thanks for your constructive suggestions.
>
> On Tue, Jan 03, 2023 at 10:36:01AM -0800, Andy Lutomirski wrote:
> > > > c) If there's not enough memory to service a page fault, it's not fatal,
> > > >    and no signal is sent. Instead, writes are simply lost.
> >
> > This just seems massively overcomplicated to me.  If there isn't
> > enough memory to fault in a page of code, we don't have some magic
> > instruction emulator in the kernel.  We either OOM or we wait for
> > memory to show up.
>
> Before addressing the other parts of your email, I thought I'd touch on
> this. Quoting from the email I just wrote Ingo:
>
> | *However* - if your big objection to this patch is that the instruction
> | skipping is problematic, we could actually punt that part. The result
> | will be that userspace just retries the memory write and the fault
> | happens again, and eventually it succeeds. From a perspective of
> | vgetrandom(), that's perhaps worse -- with this v14 patchset, it'll
> | immediately fall back to the syscall under memory pressure -- but you
> | could argue that nobody really cares about performance at that point
> | anyway, and so just retrying the fault until it succeeds is a less
> | complex behavior that would be just fine.
> |
> | Let me know if you think that'd be an acceptable compromise, and I'll
> | roll it into v15. As a preview, it pretty much amounts to dropping 3/7
> | and editing the commit message in this 2/7 patch.
>
> IOW, I think the main ideas of the patch work just fine without "point
> c" with the instruction skipping. Instead, waiting/retrying could
> potentially work. So, okay, it seems like the two of you both hate the
> instruction decoder stuff, so I'll plan on working that part out, in one
> way or another, for v15.
>
> > On Tue, Jan 3, 2023 at 2:50 AM Ingo Molnar <mingo@kernel.org> wrote:
> > > > The vDSO getrandom() implementation works with a buffer allocated with a
> > > > new system call that has certain requirements:
> > > >
> > > > - It shouldn't be written to core dumps.
> > > >   * Easy: VM_DONTDUMP.
> > > > - It should be zeroed on fork.
> > > >   * Easy: VM_WIPEONFORK.
> >
> > I have a rather different suggestion: make a special mapping.  Jason,
> > you're trying to shoehorn all kinds of bizarre behavior into the core
> > mm, and none of that seems to me to belong to the core mm.  Instead,
> > have an actual special mapping with callbacks that does the right
> > thing.  No fancy VM flags.
>
> Oooo! I like this. Avoiding adding VM_* flags would indeed be nice.
> I had seen things that I thought looked in this direction with the shmem
> API, but when I got into the details, it looked like this was meant for
> something else and couldn't address most of what I wanted here.
>
> If you say this is possible, I'll look again to see if I can figure it
> out. Though, if you have some API name at the top of your head, you
> might save me some code squinting time.

Look for _install_special_mapping().
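For readers unfamiliar with the term: on x86 the vDSO itself is installed with _install_special_mapping(), and a special mapping is listed in /proc/self/maps under the name stored in its struct vm_special_mapping (for example "[vdso]"). The minimal userspace sketch below assumes Linux, a mounted /proc, and a libc that provides getauxval(); it finds the "[vdso]" line and checks that it contains the address the kernel reports in AT_SYSINFO_EHDR. It only observes an existing special mapping. Creating one, as suggested above, requires kernel code and is not shown here.

```c
#include <stdio.h>
#include <string.h>
#include <sys/auxv.h>

/* Find the "[vdso]" line in /proc/self/maps and store its address range.
 * Returns 0 on success, -1 if /proc is unavailable or there is no [vdso]
 * line (for example because the vDSO was disabled at boot). */
static int find_vdso_range(unsigned long *start, unsigned long *end)
{
    FILE *maps = fopen("/proc/self/maps", "r");
    char line[8192];
    int found = -1;

    if (!maps)
        return -1;
    while (fgets(line, sizeof(line), maps)) {
        /* Special mappings appear under the name taken from their
         * struct vm_special_mapping, e.g. "[vdso]" or "[vvar]". */
        if (strstr(line, "[vdso]") &&
            sscanf(line, "%lx-%lx", start, end) == 2) {
            found = 0;
            break;
        }
    }
    fclose(maps);
    return found;
}

/* Returns 1 if the vDSO ELF header address passed in the auxiliary vector
 * lies inside the [vdso] mapping, 0 if it does not, -1 on lookup failure. */
static int vdso_matches_auxv(void)
{
    unsigned long start, end;
    unsigned long ehdr = getauxval(AT_SYSINFO_EHDR);

    if (ehdr == 0 || find_vdso_range(&start, &end) != 0)
        return -1;
    return ehdr >= start && ehdr < end;
}
```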

--Andy

> > Want to mlock it?  No, don't do that -- that's absurd.  Just arrange
> > so that, if it gets evicted, it's not written out anywhere.  And when
> > it gets faulted back in it does the right thing -- see above.
>
> Presumably mlock calls are redirected to some function pointer so I can
> just return EINTR?

Or just don't worry about it.  If someone mlock()s it, that's their
problem.  The point is that no one needs to.

>
> > Zero on fork?  I'm sure that's manageable with a special mapping.  If
> > not, you can add a new vm operation or similar to make it work.  (Kind
> > of like how we extended special mappings to get mremap right a couple
> > years ago.)  But maybe you don't want to *zero* it on fork and you want
> > to do something more intelligent.  Fine -- you control ->fault!
>
> Doing something more intelligent would be an interesting development, I
> guess... But, before I think about that, all mappings have flags;
> couldn't I *still* set VM_WIPEONFORK on the special mapping? Or does the
> API you have in mind not work that way? (Side note: I also want
> VM_DONTDUMP to work.)

You really want unmap (the pages, not the vma) on fork, not wipe on
fork.  It'll be VM_SHARED, and I'm not sure what VM_WIPEONFORK |
VM_SHARED does.
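On ordinary memory, userspace can already request the first two properties from Jason's list with madvise(2): MADV_DONTDUMP sets VM_DONTDUMP and MADV_WIPEONFORK sets VM_WIPEONFORK. On the VM_SHARED question, madvise(2) documents that MADV_WIPEONFORK applies only to private anonymous pages and fails with EINVAL on MAP_SHARED ranges, so userspace cannot build that combination at all. That says nothing about what the flags would do if kernel code set them directly on a special mapping. The sketch below assumes Linux 4.14 or later and headers that define MADV_WIPEONFORK; it exercises the userspace madvise() path, not the special-mapping approach discussed here.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Map one anonymous page with the given sharing flag (MAP_PRIVATE or
 * MAP_SHARED), fill it with 0xAA, then apply MADV_DONTDUMP and
 * MADV_WIPEONFORK. Returns 0 and stores the page in *out if both calls
 * succeed; otherwise unmaps the page and returns the failing errno. */
static int try_wipeonfork(int share_flag, unsigned char **out)
{
    long page = sysconf(_SC_PAGESIZE);
    unsigned char *p = mmap(NULL, page, PROT_READ | PROT_WRITE,
                            share_flag | MAP_ANONYMOUS, -1, 0);

    if (p == MAP_FAILED)
        return errno;
    memset(p, 0xAA, page);

    if (madvise(p, page, MADV_DONTDUMP) != 0 ||
        madvise(p, page, MADV_WIPEONFORK) != 0) {
        int err = errno;
        munmap(p, page);
        return err;
    }
    *out = p;
    return 0;
}

/* Fork; the child reports through its exit status whether the first byte
 * of the page reads as zero. Returns 1 if the child saw zero, 0 if it saw
 * the parent's data, -1 on error. */
static int child_sees_zero(const unsigned char *p)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(p[0] == 0 ? 0 : 1);
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status) == 0;
}
```

For a private mapping the child sees a zero-filled page while the parent keeps its data; for a shared mapping the MADV_WIPEONFORK call is refused.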


Thread overview: 52+ messages
2023-01-01 16:29 [PATCH v14 0/7] implement getrandom() in vDSO Jason A. Donenfeld
2023-01-01 16:29 ` [PATCH v14 1/7] x86: lib: Separate instruction decoder MMIO type from MMIO trace Jason A. Donenfeld
2023-01-03 10:32   ` Ingo Molnar
2023-01-03 14:51     ` Jason A. Donenfeld
2023-01-03 17:00       ` Ingo Molnar
2023-01-03 17:29         ` Borislav Petkov
2023-01-03 17:30           ` Jason A. Donenfeld
2023-01-03 17:47             ` Ingo Molnar
2023-01-03 17:48               ` Jason A. Donenfeld
2023-01-04 20:25               ` Ingo Molnar
2023-01-04 20:29                 ` Jason A. Donenfeld
2023-01-03 11:00   ` [tip: x86/asm] x86/insn: Avoid namespace clash by separating instruction decoder MMIO type from MMIO trace type tip-bot2 for Jason A. Donenfeld
2023-01-03 17:53   ` [tip: x86/urgent] " tip-bot2 for Jason A. Donenfeld
2023-01-01 16:29 ` [PATCH v14 2/7] mm: add VM_DROPPABLE for designating always lazily freeable mappings Jason A. Donenfeld
2023-01-03 10:50   ` Ingo Molnar
2023-01-03 15:01     ` Jason A. Donenfeld
2023-01-03 18:15       ` Ingo Molnar
2023-01-03 18:51         ` Jason A. Donenfeld
2023-01-03 18:36     ` Andy Lutomirski
2023-01-03 19:05       ` Jason A. Donenfeld
2023-01-03 20:52         ` Andy Lutomirski [this message]
2023-01-03 19:19       ` Linus Torvalds
2023-01-03 19:35         ` Jason A. Donenfeld
2023-01-03 19:54           ` Linus Torvalds
2023-01-03 20:03             ` Jason A. Donenfeld
2023-01-03 20:15               ` Linus Torvalds
2023-01-03 20:25                 ` Linus Torvalds
2023-01-03 20:44                 ` Jason A. Donenfeld
2023-01-05 21:57                   ` Yann Droneaud
2023-01-05 22:57                     ` Jason A. Donenfeld
2023-01-06  1:02                       ` Linus Torvalds
2023-01-06  2:08                         ` Linus Torvalds
2023-01-06  2:42                           ` Jason A. Donenfeld
2023-01-06 20:53                           ` Andy Lutomirski
2023-01-06 21:10                             ` Linus Torvalds
2023-01-10 11:01                               ` Dr. Greg
2023-01-06 21:36                             ` Jason A. Donenfeld
2023-01-06 21:42                           ` Matthew Wilcox
2023-01-06 22:06                             ` Linus Torvalds
2023-01-06  2:14                         ` Jason A. Donenfeld
2023-01-09 10:34             ` Florian Weimer
2023-01-09 14:28               ` Linus Torvalds
2023-01-11  7:27                 ` Eric Biggers
2023-01-11 12:07                   ` Linus Torvalds
2023-01-01 16:29 ` [PATCH v14 3/7] x86: mm: Skip faulting instruction for VM_DROPPABLE faults Jason A. Donenfeld
2023-01-01 16:29 ` [PATCH v14 4/7] random: add vgetrandom_alloc() syscall Jason A. Donenfeld
2023-01-01 16:29 ` [PATCH v14 5/7] arch: allocate vgetrandom_alloc() syscall number Jason A. Donenfeld
2023-01-01 16:29 ` [PATCH v14 6/7] random: introduce generic vDSO getrandom() implementation Jason A. Donenfeld
2023-01-01 16:29 ` [PATCH v14 7/7] x86: vdso: Wire up getrandom() vDSO implementation Jason A. Donenfeld
2023-01-12 17:27   ` Christophe Leroy
2023-01-12 17:49     ` Jason A. Donenfeld
2023-01-11 22:23 ` [PATCH v14 0/7] implement getrandom() in vDSO Mathieu Desnoyers
