Re: [PATCH 2/4] arch/x86: implement the process_vm_exec syscall

* Re: [PATCH 2/4] arch/x86: implement the process_vm_exec syscall
       [not found]   ` <CAG48ez3UrzPE8rkucTgCu8ggcTEjx_h3Gj2FES1qM-uv2KD8bQ@mail.gmail.com>
@ 2021-07-02  6:22     ` Andrei Vagin
  2021-07-02 11:51       ` Jann Horn
  0 siblings, 1 reply; 6+ messages in thread
From: Andrei Vagin @ 2021-07-02  6:22 UTC (permalink / raw)
  To: Jann Horn
  Cc: linux-kernel, linux-api, linux-um, criu, avagin, Andrew Morton,
	Andy Lutomirski, Anton Ivanov, Christian Brauner, Dmitry Safonov,
	Ingo Molnar, Jeff Dike, Mike Rapoport, Michael Kerrisk,
	Oleg Nesterov, Peter Zijlstra, Richard Weinberger,
	Thomas Gleixner, linux-mm

On Mon, Jun 28, 2021 at 06:13:29PM +0200, Jann Horn wrote:
> On Wed, Apr 14, 2021 at 7:59 AM Andrei Vagin <avagin@gmail.com> wrote:
> > This change introduces the new system call:
> > process_vm_exec(pid_t pid, struct sigcontext *uctx, unsigned long flags,
> >                 siginfo_t * uinfo, sigset_t *sigmask, size_t sizemask)
> >
> > process_vm_exec allows to execute the current process in an address
> > space of another process.
> [...]
> 
> I still think that this whole API is fundamentally the wrong approach
> because it tries to shoehorn multiple usecases with different
> requirements into a single API. But that aside:

Here, I can't agree with you, but this is discussed in the parallel
thread.

> 
> > +static void swap_mm(struct mm_struct *prev_mm, struct mm_struct *target_mm)
> > +{
> > +       struct task_struct *tsk = current;
> > +       struct mm_struct *active_mm;
> > +
> > +       task_lock(tsk);
> > +       /* Hold off tlb flush IPIs while switching mm's */
> > +       local_irq_disable();
> > +
> > +       sync_mm_rss(prev_mm);
> > +
> > +       vmacache_flush(tsk);
> > +
> > +       active_mm = tsk->active_mm;
> > +       if (active_mm != target_mm) {
> > +               mmgrab(target_mm);
> > +               tsk->active_mm = target_mm;
> > +       }
> > +       tsk->mm = target_mm;
> 
> I'm pretty sure you're not currently allowed to overwrite the ->mm
> pointer of a userspace thread. For example, zap_threads() assumes that
> all threads running under a process have the same ->mm. (And if you're
> fiddling with ->mm stuff, you should probably CC linux-mm@.)
> 
> As far as I understand, only kthreads are allowed to do this (as
> implemented in kthread_use_mm()).

kthread_use_mm() was renamed from use_mm in the v5.8 kernel. Before
that, it wasn't used for user processes in the kernel, but it was
exported for modules, and we used it without any visible problems. We
understood that there could be some issues like zap_threads and it was
one of reasons why we decided to introduce this system call.

I understand that there are no places in the kernel where we change mm
of user threads back and forth, but are there any real concerns why we
should not do that? I agree that zap_threads should be fixed, but it
will the easy one.

> 
> > +       switch_mm_irqs_off(active_mm, target_mm, tsk);
> > +       local_irq_enable();
> > +       task_unlock(tsk);
> > +#ifdef finish_arch_post_lock_switch
> > +       finish_arch_post_lock_switch();
> > +#endif
> > +
> > +       if (active_mm != target_mm)
> > +               mmdrop(active_mm);
> > +}

^ permalink raw reply	[flat|nested] 6+ messages in thread