From: "Andy Lutomirski" <luto@kernel.org>
To: "Jann Horn" <jannh@google.com>, "Peter Oskolkov" <posk@google.com>
Cc: "Peter Oskolkov" <posk@posk.io>,
	"Peter Zijlstra (Intel)" <peterz@infradead.org>,
	"Ingo Molnar" <mingo@redhat.com>,
	"Thomas Gleixner" <tglx@linutronix.de>,
	"Linux Kernel Mailing List" <linux-kernel@vger.kernel.org>,
	"Linux API" <linux-api@vger.kernel.org>,
	"Paul Turner" <pjt@google.com>, "Ben Segall" <bsegall@google.com>,
	"Andrei Vagin" <avagin@google.com>,
	"Thierry Delisle" <tdelisle@uwaterloo.ca>
Subject: Re: [PATCH 2/4 v0.5] sched/umcg: RFC: add userspace atomic helpers
Date: Tue, 14 Sep 2021 09:52:08 -0700
Message-ID: <d656e605-4f89-4ea2-8baf-f7786f0630d9@www.fastmail.com>
In-Reply-To: <CAG48ez0mgCXpXnqAUsa0TcFBPjrid-74Gj=xG8HZqj2n+OPoKw@mail.gmail.com>



On Thu, Sep 9, 2021, at 2:20 PM, Jann Horn wrote:
> On Thu, Sep 9, 2021 at 9:07 PM Peter Oskolkov <posk@google.com> wrote:
> > On Wed, Sep 8, 2021 at 4:39 PM Jann Horn <jannh@google.com> wrote:
> >
> > Thanks a lot for the reviews, Jann!
> >
> > I understand how to address most of your comments. However, one issue
> > I'm not sure what to do about:
> >
> > [...]
> >
> > > If this function is not allowed to sleep, as the comment says...
> >
> > [...]
> >
> > > ... then I'm pretty sure you can't call fix_pagefault() here, which
> > > acquires the mmap semaphore (which may involve sleeping) and then goes
> > > through the pagefault handling path (which can also sleep for various
> > > reasons, like allocating memory for pagetables, loading pages from
> > > disk / NFS / FUSE, and so on).
> >
> > <quote from peterz@ from
> > https://lore.kernel.org/lkml/20210609125435.GA68187@worktop.programming.kicks-ass.net/>:
> >   So a PF_UMCG_WORKER would be added to sched_submit_work()'s PF_*_WORKER
> >   path to capture these tasks blocking. The umcg_sleeping() hook added
> >   there would:
> >
> >     put_user(BLOCKED, umcg_task->umcg_status);
> >     ...
> > </quote>
> >
> > Which is basically what I am doing here: in sched_submit_work() I need
> > to read/write to userspace; and we cannot sleep in
> > sched_submit_work(), I believe.
> >
> > If you are right that it is impossible to deal with pagefaults from
> > within non-sleepable contexts, I see two options:
> >
> > Option 1: as you suggest, pin pages holding struct umcg_task in sys_umcg_ctl;
> 
> FWIW, there is a variant on this that might also be an option:
> 
> You can create a new memory mapping from kernel code and stuff pages
> into it that were originally allocated as normal kernel pages. This is
> done in a bunch of places, e.g.:

With a custom mapping, you don’t need to pin pages at all, I think.  As long as you can reconstruct the contents of the shared page and you’re willing to do some slightly careful synchronization, you can detect that the page is missing when you try to update it and skip the update. The vm_ops->fault handler can repopulate the page the next time it’s accessed.
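
To make the skip-and-repopulate idea concrete, here is a minimal userspace model of it in C. It is only a sketch: struct task_model, model_update(), model_fault() and model_reclaim() are hypothetical names invented for illustration, malloc() stands in for page allocation, and the "slightly careful synchronization" mentioned above is deliberately left out because the model is single-threaded.

```c
/* Userspace model only: every name here is hypothetical, not a kernel API. */
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

struct task_model {
	uint32_t state;        /* authoritative copy, always resident */
	uint32_t *shared_page; /* stands in for the user-visible page; NULL if absent */
};

/* Non-sleeping update path: never faults the page in, skips it if absent. */
static bool model_update(struct task_model *t, uint32_t new_state)
{
	assert(t != NULL);
	t->state = new_state;          /* the authoritative copy is always updated */
	if (!t->shared_page)
		return false;          /* page missing: skip the user-visible write */
	*t->shared_page = new_state;
	return true;
}

/* Stand-in for a vm_ops->fault handler: rebuild the page from kept state. */
static uint32_t *model_fault(struct task_model *t)
{
	assert(t != NULL);
	if (!t->shared_page) {
		t->shared_page = malloc(sizeof(*t->shared_page));
		if (!t->shared_page)
			return NULL;
		*t->shared_page = t->state; /* reconstruct the contents */
	}
	return t->shared_page;
}

/* Stand-in for reclaim dropping the page. */
static void model_reclaim(struct task_model *t)
{
	assert(t != NULL);
	free(t->shared_page);
	t->shared_page = NULL;
}
```

The property the model is meant to show is that an update made while the page is absent is not lost: the next simulated fault rebuilds the page from the authoritative state, so nothing has to stay pinned.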

All that being said, I feel like I’m missing something. The point of this is to send what the old M:N folks called “scheduler activations”, right?  Wouldn’t it be more efficient to explicitly wake something blockable/pollable and write the message into a more efficient data structure?  Polling one page per task from userspace seems like it will have inherently high latency due to the polling interval and will also have very poor locality.  Or am I missing something?
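
For illustration, the "wake something pollable and write the message into a data structure" shape could look roughly like the userspace sketch below: events go into a small ring, and an eventfd serves as the pollable wakeup source. This is an assumption about what such an interface might resemble, not anything from the patchset. struct event_ring and the ring_*() helpers are hypothetical names, and the sketch is single-threaded, so a real producer/consumer pair would need atomics or barriers on head and tail.

```c
/* Userspace illustration only; struct and function names are hypothetical. */
#include <assert.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

#define RING_SLOTS 64u /* power of two so that index masking works */

struct event_ring {
	uint32_t task_id[RING_SLOTS];
	uint32_t head; /* next slot to write */
	uint32_t tail; /* next slot to read */
	int efd;       /* pollable wakeup source */
};

static int ring_init(struct event_ring *r)
{
	assert(r != NULL);
	r->head = r->tail = 0;
	r->efd = eventfd(0, EFD_NONBLOCK | EFD_CLOEXEC);
	return r->efd < 0 ? -1 : 0;
}

/* Producer: record "this task blocked" and wake whoever waits on efd. */
static int ring_push(struct event_ring *r, uint32_t id)
{
	uint64_t one = 1;

	if (r->head - r->tail == RING_SLOTS)
		return -1; /* ring full */
	r->task_id[r->head++ & (RING_SLOTS - 1)] = id;
	return write(r->efd, &one, sizeof(one)) == (ssize_t)sizeof(one) ? 0 : -1;
}

/* Consumer: call after poll() reports efd readable; returns events copied. */
static int ring_drain(struct event_ring *r, uint32_t *out, int max)
{
	uint64_t pending;
	int n = 0;

	/* Reset the wakeup counter; failing with EAGAIN just means it was 0. */
	ssize_t rc = read(r->efd, &pending, sizeof(pending));
	(void)rc;

	while (r->tail != r->head && n < max)
		out[n++] = r->task_id[r->tail++ & (RING_SLOTS - 1)];
	return n;
}
```

A consumer would block in poll() or epoll_wait() on efd and then call ring_drain() with a buffer of at least RING_SLOTS entries, so one wakeup covers every event queued since the last drain and no per-task page has to be scanned.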

> 
>
> Note that what I'm suggesting here is a bit unusual - normally only
> the vDSO is a "special mapping", other APIs tend to use mappings that
> are backed by files. But I think we probably don't want to have a file
> involved here...
> 

A file would be weird: the lifetime and SCM_RIGHTS interactions may be unpleasant.

> If you decide to go this route, you should probably CC
> linux-mm@kvack.org (for general memory management) and Andy Lutomirski
> (who has tinkered around in vDSO-related code a lot).
> 

Who’s that? :)

> > or
> >
> > Option 2: add more umcg-related kernel state to task_struct so that
> > reading/writing to userspace is not necessary in sched_submit_work().
> >
> > The first option sounds much better from the code simplicity point of
> > view, but I'm not sure if it is a viable approach, i.e. I'm afraid
> > we'll get a hard NACK here, as a non-privileged process will be able
> > to force the kernel to pin a page per task/thread.
> 
> To clarify: It's entirely normal that userspace processes can force
> the kernel to hold on to some amounts of memory that can't be paged
> out - consider e.g. pagetables and kernel objects referenced by file
> descriptors. So an API that pins limited amounts of memory that are
> also mapped in userspace isn't inherently special. But pinning pages
> that were originally allocated as normal userspace memory can be more
> problematic because that memory might be hugepages, or file pages, or
> it might prevent khugepaged from being able to defragment memory
> because the pinned page was allocated in ZONE_MOVABLE.
> 
> 
> > We may get around
> > it by first pinning a limited number of pages, then having the
> > userspace allocate structs umcg_task on those pages, so that a pinned
> > page would cover more than a single task/thread. And have a sysctl
> > that limits the number of pinned pages per MM.
> 
> I think that you wouldn't necessarily need a sysctl for that if the
> kernel can enforce that you don't have more pages allocated than you
> need for the maximum number of threads that have ever been running
> under the process, and you also use __GFP_ACCOUNT so that cgroups can
> correctly attribute the memory usage.
> 
> > Peter Z., could you, please, comment here? Do you think pinning pages
> > to hold structs umcg_task is acceptable?
> 
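
The page-budget rule quoted above (never hold more pages than the peak thread count requires) comes down to a ceiling division. The sketch below works through that arithmetic. pages_needed() and may_allocate_page() are hypothetical helper names, and the slot and page sizes are parameters because the real size of struct umcg_task is not assumed here.

```c
/* Arithmetic sketch only; the helper names are hypothetical. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Pages needed to give each of max_threads one slot, assuming slots are
 * packed into pages and never straddle a page boundary.
 */
static size_t pages_needed(size_t max_threads, size_t slot_size,
			   size_t page_size)
{
	size_t per_page;

	assert(slot_size > 0 && slot_size <= page_size);
	per_page = page_size / slot_size;
	return (max_threads + per_page - 1) / per_page; /* ceiling division */
}

/* Allow another page only while the process is below its own peak need. */
static bool may_allocate_page(size_t pages_allocated, size_t max_threads_seen,
			      size_t slot_size, size_t page_size)
{
	return pages_allocated <
	       pages_needed(max_threads_seen, slot_size, page_size);
}
```

With a 64-byte slot and 4096-byte pages, for example, one page covers 64 threads, so a process that has peaked at 65 threads would be allowed a second page but not a third.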
