From: Jason Gunthorpe <jgg@ziepe.ca>
To: Andrea Arcangeli <aarcange@redhat.com>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Yu Zhao <yuzhao@google.com>, Andy Lutomirski <luto@kernel.org>,
	Peter Xu <peterx@redhat.com>, Pavel Emelyanov <xemul@openvz.org>,
	Mike Kravetz <mike.kravetz@oracle.com>,
	Mike Rapoport <rppt@linux.vnet.ibm.com>,
	Minchan Kim <minchan@kernel.org>, Will Deacon <will@kernel.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Hugh Dickins <hughd@google.com>,
	"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	Matthew Wilcox <willy@infradead.org>,
	Oleg Nesterov <oleg@redhat.com>, Jann Horn <jannh@google.com>,
	Kees Cook <keescook@chromium.org>,
	John Hubbard <jhubbard@nvidia.com>,
	Leon Romanovsky <leonro@nvidia.com>, Jan Kara <jack@suse.cz>,
	Kirill Tkhai <ktkhai@virtuozzo.com>
Subject: Re: [PATCH 0/2] page_count can't be used to decide when wp_page_copy
Date: Fri, 8 Jan 2021 14:19:45 -0400
Message-ID: <20210108181945.GF504133@ziepe.ca>
In-Reply-To: <X/iPtCktcQHwuK5T@redhat.com>

On Fri, Jan 08, 2021 at 12:00:36PM -0500, Andrea Arcangeli wrote:
> > The majority cannot be converted to notifiers because they are DMA
> > based. Every one of those is an ABI for something, and does not expect
> > extra privilege to function. It would be a major breaking change to
> > have pin_user_pages require some cap.
> 
> ... what makes them safe is to be transient GUP pin and not long
> term.
> 
> Please note the "long term" in the underlined line.

Many of them are long term, though only 50 or so have been marked
specifically with FOLL_LONGTERM. I don't see how we can make such a
major ABI break.

Looking at it, vmsplice() is simply wrong. A long term page pin must
use pin_user_pages() with either FOLL_LONGTERM|FOLL_WRITE (write mode)
or FOLL_LONGTERM|FOLL_FORCE|FOLL_WRITE (read mode).

i.e. it must COW, and it must reject cases that are not long-term safe,
like DAX, CMA and so on.
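A rough sketch of the flag rule above, for illustration only. The SKETCH_FOLL_* constants and longterm_pin_flags() are hypothetical local stand-ins, not the kernel's definitions (the numeric values are made up); the point is only that FOLL_WRITE is always set so the pin breaks COW, and FOLL_FORCE is added for read-mode users.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's gup flags. The values are
 * illustrative only and are NOT taken from include/linux/mm.h. */
enum {
	SKETCH_FOLL_WRITE    = 1u << 0,
	SKETCH_FOLL_FORCE    = 1u << 1,
	SKETCH_FOLL_LONGTERM = 1u << 2,
};

/* Return the flags a long-term pin would pass under the rule above.
 * FOLL_WRITE is always set so that establishing the pin breaks COW.
 * For read-mode users FOLL_FORCE is added as well, so that the write
 * lookup is still permitted on a private (COW) mapping the caller
 * only has read access to. */
static unsigned int longterm_pin_flags(bool want_write)
{
	unsigned int flags = SKETCH_FOLL_LONGTERM | SKETCH_FOLL_WRITE;

	if (!want_write)
		flags |= SKETCH_FOLL_FORCE;
	return flags;
}
```

In this sketch both modes end up with FOLL_WRITE set; only the read-mode variant carries FOLL_FORCE.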

These are the well-established rules; vmsplice does not get a pass
simply because it uses a CPU memory copy as its "DMA".

> speaking in practice. io_uring has similar concern but it can use mmu
> notifier, so it can totally fix it and be 100% safe from this.

IIRC io_uring does use FOLL_LONGTERM and FOLL_WRITE.

> The scheduled disclosure date was 2020-08-25 so I can freely explain
> the case that motivated all these changes.
> 
> case A)
> 
> if !fork() {
>    // in child
>    mmap one page
>    vmsplice takes gup pin long term on such page
>    munmap one page
>    // mapcount == 1 (parent mm)
>    // page_count == 2 (gup in child, and parent mm)
> } else {
>    parent writes to the page
>    // mapcount == 1, wp_page_reuse
> }
> 
> parent did a COW with mapcount == 1 so the parent will take over a
> page that is still GUP pinned in the child. 

Sorry, I'm missing something: how does mmapping a fresh new page in
the child impact the parent?

I guess the issue is not the mmap, but GUPing a shared page in a way
that doesn't trigger COW during GUP, then munmapping that page so that
a future COW fault in the parent reuses it, leaking access.

It seems enforcing FOLL_WRITE to always COW on GUP closes this, right?
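The scenario can be checked with a toy reference-count model. This is a simplified sketch, not kernel code: toy_page, simulate_case_a() and the counters are hypothetical, and it assumes (per the reading above) that the page was already shared COW between parent and child by fork(), rather than freshly mmapped in the child. The reuse decision is the mapcount == 1 check from the quoted case A.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy page: mapcount = page tables mapping it, refcount = all
 * references, including gup pins. Real accounting is more involved. */
struct toy_page {
	int mapcount;
	int refcount;
};

struct toy_result {
	int shared_mapcount;            /* parent's page at the write fault */
	int shared_refcount;
	bool parent_reused_pinned_page; /* the leak described in case A */
};

/* cow_on_gup == false: the child pins without breaking COW (the
 * vmsplice behaviour). cow_on_gup == true: the pin uses FOLL_WRITE
 * semantics, so the child first gets its own private copy. */
static struct toy_result simulate_case_a(bool cow_on_gup)
{
	struct toy_page shared = { .mapcount = 2, .refcount = 2 }; /* after fork */
	struct toy_page child_copy = { 0, 0 };
	struct toy_page *pinned;
	struct toy_result res;

	if (cow_on_gup) {
		/* COW break in the child: drop the shared page, map a copy. */
		shared.mapcount--;
		shared.refcount--;
		child_copy.mapcount = 1;
		child_copy.refcount = 1;
		pinned = &child_copy;
	} else {
		pinned = &shared;
	}
	pinned->refcount++;             /* long-term pin taken by the child */

	/* Child munmaps: the mapping and its reference go, the pin stays. */
	pinned->mapcount--;
	pinned->refcount--;

	/* Parent write fault: a mapcount-based check reuses the page. */
	bool reuse = shared.mapcount == 1;

	res.shared_mapcount = shared.mapcount;
	res.shared_refcount = shared.refcount;
	res.parent_reused_pinned_page = reuse && pinned == &shared;
	return res;
}
```

Without COW on GUP the model reproduces the quoted counts (mapcount == 1, page_count == 2) and the parent reuses the pinned page; with COW on GUP the pin lands on the child's private copy, so the parent's reuse no longer exposes anything.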

This is what all correct FOLL_LONGTERM users do today; it is required
for many other reasons beyond this interesting security issue.

> However, you know full well in the second case it is a feature and not
> a bug, that wp_page_reuse is called instead, and in fact it has to be
> called or it's a bug (and that's the bug page_count in do_wp_page
> introduces).

What I was trying to explain below is that I think we agreed a page
under an active FOLL_LONGTERM pin *cannot* be write protected.

Establishing the FOLL_LONGTERM pin (for read or write) must *always*
break the write protection and the VM *cannot* later establish a new
write protection on that page while the pin is active.

Indeed, it is complete nonsense to try and write protect a page that
has active DMA write activity! Changing the CPU page protection bits
will not stop any DMA! Doing so will inevitably become a security
problem with an attack similar to what you described.

So this is what was done during fork(): fork will no longer write
protect pages under FOLL_LONGTERM to make them COWable; instead it
will copy them at fork time.
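A minimal sketch of that fork-time decision for one private anonymous page. The types and helpers here are hypothetical; maybe_dma_pinned stands in for the kernel's pin tracking (as far as I know the real check is built on page_maybe_dma_pinned(), but this model does not reproduce it).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for an anonymous page as seen by fork(). */
struct toy_anon_page {
	bool maybe_dma_pinned;   /* stand-in for the kernel's pin tracking */
	bool writable_in_parent; /* parent PTE write bit */
};

enum fork_action {
	FORK_SHARE_WRPROTECT,    /* classic COW: write protect and share */
	FORK_COPY_NOW,           /* pinned: give the child its own copy */
};

/* A page under a long-term pin must not be write protected, so it is
 * copied into the child at fork time instead of being shared COW. */
static enum fork_action fork_action_for(const struct toy_anon_page *p)
{
	if (p->maybe_dma_pinned)
		return FORK_COPY_NOW;
	return FORK_SHARE_WRPROTECT;
}

/* Apply the decision to the parent's mapping and report whether it is
 * still writable afterwards; it must be while the pin is active. */
static bool parent_stays_writable(struct toy_anon_page *p)
{
	if (fork_action_for(p) == FORK_SHARE_WRPROTECT)
		p->writable_in_parent = false;
	return p->writable_in_parent;
}
```

The design choice is that the pinned page never loses its write bit in the parent, so the parent keeps seeing the same physical page as the device.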

Any other place doing write protect must also follow these same
rules.

I wasn't aware this could be used to create a security problem, but it
does make sense. Write protect really must mean that writes to the
memory stop, and that is fundamentally incompatible with active DMA.

Thus write protect of pages under DMA must be forbidden, as a matter
of security.
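A toy illustration of why CPU write protection means nothing to a device, and of a write-protect path that refuses pinned pages. Everything here is hypothetical: toy_try_wrprotect() stands in for any write-protect path (uffd-wp, soft-dirty clearing, and so on), and returning -EBUSY is an arbitrary choice for the sketch, not an existing kernel interface.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

struct toy_pinned_page {
	bool pte_writable;   /* what the CPU page tables say */
	int  longterm_pins;  /* active FOLL_LONGTERM pins */
	int  contents;
};

/* A device writes through its pin, not through the CPU page tables,
 * so pte_writable is never consulted here. */
static void toy_dma_write(struct toy_pinned_page *p, int value)
{
	if (p->longterm_pins > 0)
		p->contents = value;
}

/* Hypothetical write-protect helper following the rule above: refuse
 * while a long-term pin is active, since clearing the CPU write bit
 * would not stop the device anyway. */
static int toy_try_wrprotect(struct toy_pinned_page *p)
{
	if (p->longterm_pins > 0)
		return -EBUSY;
	p->pte_writable = false;
	return 0;
}
```

If the write bit is cleared anyway on a pinned page, a subsequent toy_dma_write() still lands in the page, which is exactly the mismatch between "write protected" and "not being written" described above.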

Jason
