From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-14.7 required=3.0 tests=DKIMWL_WL_MED,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH, MAILING_LIST_MULTI,SIGNED_OFF_BY,SPF_PASS,URIBL_BLOCKED,USER_IN_DEF_DKIM_WL autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 470DCC43441 for ; Sat, 10 Nov 2018 18:45:26 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id E68D020858 for ; Sat, 10 Nov 2018 18:45:25 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="ZX020Zn0" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org E68D020858 Authentication-Results: mail.kernel.org; dmarc=fail (p=reject dis=none) header.from=google.com Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727277AbeKKEbU (ORCPT ); Sat, 10 Nov 2018 23:31:20 -0500 Received: from mail-ua1-f68.google.com ([209.85.222.68]:35644 "EHLO mail-ua1-f68.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727110AbeKKEbT (ORCPT ); Sat, 10 Nov 2018 23:31:19 -0500 Received: by mail-ua1-f68.google.com with SMTP id d8so1769726ual.2 for ; Sat, 10 Nov 2018 10:45:21 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=mime-version:in-reply-to:references:from:date:message-id:subject:to :cc:content-transfer-encoding; bh=8AMLtmmurCtRfJVdyZgA2kK00jaD0RApM2FvUpjayjg=; b=ZX020Zn0rGW7bEwBPLkENmG2C45zBBZX19p5G+JHxFBjP3eUNpghpl76MhYEjLdSvZ SvCEtM1yOvkThmHBtewMvEpwcubO8qNpEPP442PxSIhlzk46OmUPEAYgdZXSXbwPddcQ pOljUksJIxbTzE+gV5NgH43vdDwylG9Fz7guKPnyx9FTiJaAs4wO7fPIclOMlRpdJMc8 NWIWbBHfASOQ77xEpUwMfEBR/1twh3kRjBUdxO2ecfnX2LoB8pYLavtXKT9BoKWxhLRg fv/6qztMStIpjKxtbmt52X059hO/Y2IyiVD/XX6VekxHTHIFJeLpbgDH2XLUTyU0oUNX PuIw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:in-reply-to:references:from:date :message-id:subject:to:cc:content-transfer-encoding; bh=8AMLtmmurCtRfJVdyZgA2kK00jaD0RApM2FvUpjayjg=; b=PU/tZLOCZP95FcM1borqoW4bjCYeKfOC4RfvcylPF5HiDGQl5PiCT6J9SaXH1gzSID t5oklMmxOfiIlpfqAZ53g2pY/M6P8dMk+mpK7bNhmCllZtgXEwFM1aRitGX1fq17mvXb whCy1CdiPspRka8vGXyvvCrWutgs2SDj1Q04q+m+7t3VY7n5PWByeqN5xK8kItRa0viA Vn+zABvADvgpoo7DP/ppVD8wwE38CRsZLUO/vskJeFjR5f/bjzlvDX0Pqqqaym8B0rOk JiK6jwHzRgRs+jVAhLTr26YusrXhjXRXUsukFnN9a3Bf6G0E1XawtoTnPpfOmo0cOEO+ rUKg== X-Gm-Message-State: AGRZ1gIf4gYtdv/x7ZK14+HIDEI01BYzjevJ/jtSU/WDlvY4qSFUJzE0 HS6Vr799NtW9Uz4IWvMev+hfRJvxdA2xXzsld5ktEQ== X-Google-Smtp-Source: AJdET5eO4tXb6GLsmvt1mMKh8qWHpfQl5PK+Vd+WP9U3tJh/WU1xAy0FFH5Hc2/wz/qznKF6EGq4F8VQChTvPJBZDoI= X-Received: by 2002:ab0:5648:: with SMTP id z8mr5851121uaa.126.1541875520806; Sat, 10 Nov 2018 10:45:20 -0800 (PST) MIME-Version: 1.0 Received: by 2002:a67:f48d:0:0:0:0:0 with HTTP; Sat, 10 Nov 2018 10:45:19 -0800 (PST) In-Reply-To: <20181110182405.GB242356@google.com> References: <20181108041537.39694-1-joel@joelfernandes.org> <20181110032005.GA22238@google.com> <69CE06CC-E47C-4992-848A-66EB23EE6C74@amacapital.net> <20181110182405.GB242356@google.com> From: Daniel Colascione Date: Sat, 10 Nov 2018 10:45:19 -0800 Message-ID: Subject: Re: [PATCH v3 resend 1/2] mm: Add an F_SEAL_FUTURE_WRITE seal to memfd To: Joel Fernandes Cc: Andy Lutomirski , Jann Horn , kernel list , John Reck , John Stultz , Todd Kjos , Greg Kroah-Hartman , Christoph Hellwig , Al Viro , Andrew Morton , Bruce Fields , Jeff Layton , Khalid Aziz , Lei.Yang@windriver.com, linux-fsdevel@vger.kernel.org, linux-kselftest@vger.kernel.org, Linux-MM , marcandre.lureau@redhat.com, Mike Kravetz , Minchan Kim , Shuah Khan , Valdis Kletnieks , Hugh Dickins , Linux API Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Sat, Nov 10, 2018 at 10:24 AM, Joel Fernandes w= rote: > Thanks Andy for your thoughts, my comments below: > > On Fri, Nov 09, 2018 at 10:05:14PM -0800, Andy Lutomirski wrote: >> >> >> > On Nov 9, 2018, at 7:20 PM, Joel Fernandes wr= ote: >> > >> >> On Fri, Nov 09, 2018 at 10:19:03PM +0100, Jann Horn wrote: >> >>> On Fri, Nov 9, 2018 at 10:06 PM Jann Horn wrote: >> >>> On Fri, Nov 9, 2018 at 9:46 PM Joel Fernandes (Google) >> >>> wrote: >> >>>> Android uses ashmem for sharing memory regions. We are looking forw= ard >> >>>> to migrating all usecases of ashmem to memfd so that we can possibl= y >> >>>> remove the ashmem driver in the future from staging while also >> >>>> benefiting from using memfd and contributing to it. Note staging dr= ivers >> >>>> are also not ABI and generally can be removed at anytime. >> >>>> >> >>>> One of the main usecases Android has is the ability to create a reg= ion >> >>>> and mmap it as writeable, then add protection against making any >> >>>> "future" writes while keeping the existing already mmap'ed >> >>>> writeable-region active. This allows us to implement a usecase whe= re >> >>>> receivers of the shared memory buffer can get a read-only view, whi= le >> >>>> the sender continues to write to the buffer. >> >>>> See CursorWindow documentation in Android for more details: >> >>>> https://developer.android.com/reference/android/database/CursorWind= ow >> >>>> >> >>>> This usecase cannot be implemented with the existing F_SEAL_WRITE s= eal. >> >>>> To support the usecase, this patch adds a new F_SEAL_FUTURE_WRITE s= eal >> >>>> which prevents any future mmap and write syscalls from succeeding w= hile >> >>>> keeping the existing mmap active. >> >>> >> >>> Please CC linux-api@ on patches like this. If you had done that, I >> >>> might have criticized your v1 patch instead of your v3 patch... >> >>> >> >>>> The following program shows the seal >> >>>> working in action: >> >>> [...] >> >>>> Cc: jreck@google.com >> >>>> Cc: john.stultz@linaro.org >> >>>> Cc: tkjos@google.com >> >>>> Cc: gregkh@linuxfoundation.org >> >>>> Cc: hch@infradead.org >> >>>> Reviewed-by: John Stultz >> >>>> Signed-off-by: Joel Fernandes (Google) >> >>>> --- >> >>> [...] >> >>>> diff --git a/mm/memfd.c b/mm/memfd.c >> >>>> index 2bb5e257080e..5ba9804e9515 100644 >> >>>> --- a/mm/memfd.c >> >>>> +++ b/mm/memfd.c >> >>> [...] >> >>>> @@ -219,6 +220,25 @@ static int memfd_add_seals(struct file *file, = unsigned int seals) >> >>>> } >> >>>> } >> >>>> >> >>>> + if ((seals & F_SEAL_FUTURE_WRITE) && >> >>>> + !(*file_seals & F_SEAL_FUTURE_WRITE)) { >> >>>> + /* >> >>>> + * The FUTURE_WRITE seal also prevents growing and = shrinking >> >>>> + * so we need them to be already set, or requested = now. >> >>>> + */ >> >>>> + int test_seals =3D (seals | *file_seals) & >> >>>> + (F_SEAL_GROW | F_SEAL_SHRINK); >> >>>> + >> >>>> + if (test_seals !=3D (F_SEAL_GROW | F_SEAL_SHRINK)) = { >> >>>> + error =3D -EINVAL; >> >>>> + goto unlock; >> >>>> + } >> >>>> + >> >>>> + spin_lock(&file->f_lock); >> >>>> + file->f_mode &=3D ~(FMODE_WRITE | FMODE_PWRITE); >> >>>> + spin_unlock(&file->f_lock); >> >>>> + } >> >>> >> >>> So you're fiddling around with the file, but not the inode? How are >> >>> you preventing code like the following from re-opening the file as >> >>> writable? >> >>> >> >>> $ cat memfd.c >> >>> #define _GNU_SOURCE >> >>> #include >> >>> #include >> >>> #include >> >>> #include >> >>> #include >> >>> #include >> >>> >> >>> int main(void) { >> >>> int fd =3D syscall(__NR_memfd_create, "testfd", 0); >> >>> if (fd =3D=3D -1) err(1, "memfd"); >> >>> char path[100]; >> >>> sprintf(path, "/proc/self/fd/%d", fd); >> >>> int fd2 =3D open(path, O_RDWR); >> >>> if (fd2 =3D=3D -1) err(1, "reopen"); >> >>> printf("reopen successful: %d\n", fd2); >> >>> } >> >>> $ gcc -o memfd memfd.c >> >>> $ ./memfd >> >>> reopen successful: 4 >> >>> $ >> >>> >> >>> That aside: I wonder whether a better API would be something that >> >>> allows you to create a new readonly file descriptor, instead of >> >>> fiddling with the writability of an existing fd. >> >> >> >> My favorite approach would be to forbid open() on memfds, hope that >> >> nobody notices the tiny API break, and then add an ioctl for "reopen >> >> this memfd with reduced permissions" - but that's just my personal >> >> opinion. >> > >> > I did something along these lines and it fixes the issue, but I forbid= open >> > of memfd only when the F_SEAL_FUTURE_WRITE seal is in place. So then i= ts not >> > an ABI break because this is a brand new seal. That seems the least in= trusive >> > solution and it works. Do you mind testing it and I'll add your and Te= sted-by >> > to the new fix? The patch is based on top of this series. >> > >> > ---8<----------- >> > From: "Joel Fernandes (Google)" >> > Subject: [PATCH] mm/memfd: Fix possible promotion to writeable of seal= ed memfd >> > >> > Jann Horn found that reopening an F_SEAL_FUTURE_WRITE sealed memfd >> > through /proc/self/fd/N symlink as writeable succeeds. The simplest fi= x >> > without causing ABI breakages and ugly VFS hacks is to simply deny all >> > opens on F_SEAL_FUTURE_WRITE sealed fds. >> > >> > Reported-by: Jann Horn >> > Signed-off-by: Joel Fernandes (Google) >> > --- >> > mm/shmem.c | 18 ++++++++++++++++++ >> > 1 file changed, 18 insertions(+) >> > >> > diff --git a/mm/shmem.c b/mm/shmem.c >> > index 446942677cd4..5b378c486b8f 100644 >> > --- a/mm/shmem.c >> > +++ b/mm/shmem.c >> > @@ -3611,7 +3611,25 @@ static const struct address_space_operations sh= mem_aops =3D { >> > .error_remove_page =3D generic_error_remove_page, >> > }; >> > >> > +/* Could arrive here for memfds opened through /proc/ */ >> > +int shmem_open(struct inode *inode, struct file *file) >> > +{ >> > + struct shmem_inode_info *info =3D SHMEM_I(inode); >> > + >> > + /* >> > + * memfds for which future writes have been prevented >> > + * should not be reopened, say, through /proc/pid/fd/N >> > + * symlinks otherwise it can cause a sealed memfd to be >> > + * promoted to writable. >> > + */ >> > + if (info->seals & F_SEAL_FUTURE_WRITE) >> > + return -EACCES; >> > + >> > + return 0; >> > +} >> >> The result of this series is very warty. We have a concept of seals, and= they all work similarly, except the new future write seal. That one: >> > > I don't see it as warty, different seals will work differently. It works > quite well for our usecase, and since Linux is all about solving real > problems in the real work, it would be useful to have it. > >> - causes a probably-observable effect in the file mode in F_GETFL. > > Wouldn't that be the right thing to observe anyway? > >> - causes reopen to fail. > > So this concern isn't true anymore if we make reopen fail only for WRITE > opens as Daniel suggested. I will make this change so that the security f= ix > is a clean one. > >> - does *not* affect other struct files that may already exist on the sam= e inode. > > TBH if you really want to block all writes to the file, then you want > F_SEAL_WRITE, not this seal. The usecase we have is the fd is sent over I= PC > to another process and we want to prevent any new writes in the receiver > side. There is no way this other receiving process can have an existing f= d > unless it was already sent one without the seal applied. The proposed se= al > could be renamed to F_SEAL_FD_WRITE if that is preferred. > >> - mysteriously malfunctions if you try to set it again on another struct >> file that already exists >> > > I didn't follow this, could you explain more? > >> - probably is insecure when used on hugetlbfs. > > The usecase is not expected to prevent all writes, indeed the usecase > requires existing mmaps to continue to be able to write into the memory m= ap. > So would you call that a security issue too? The use of the seal wants to > allow existing mmap regions to be continue to be written into (I mentione= d > more details in the cover letter). > >> I see two reasonable solutions: >> >> 1. Don=E2=80=99t fiddle with the struct file at all. Instead make the in= ode flag >> work by itself. > > Currently, the various VFS paths check only the struct file's f_mode to d= eny > writes of already opened files. This would mean more checking in all thos= e > paths (and modification of all those paths). > > Anyway going with that idea, we could > 1. call deny_write_access(file) from the memfd's seal path which decremen= ts > the inode::i_writecount. > 2. call get_write_access(inode) in the various VFS paths in addition to > checking for FMODE_*WRITE and deny the write (incase i_writecount is nega= tive) > > That will prevent both reopens, and writes from succeeding. However I wor= ry a > bit about 2 not being too familiar with VFS internals, about what the > consequences of doing that may be. IMHO, modifying both the inode and the struct file separately is fine, since they mean different things. In regular filesystems, it's fine to have a read-write open file description for a file whose inode grants write permission to nobody. Speaking of which: is fchmod enough to prevent this attack? >> 2. Don=E2=80=99t call it a =E2=80=9Cseal=E2=80=9D. Instead fix the /pro= c hole and add an API to >> drop write access on an existing struct file. >> >> I personally prefer #2. > > So I don't think proc has a hole looking at the code yesterday. As I was > saying in other thread, its just how symlinks work. If we were to apply t= his > API to files in general, and we had symlinks to files and we tried to reo= pen the > file, we would run into the same issue - that's why I believe it has noth= ing > to do with proc, and the fix has to be in the underlying inode being poin= ted > to. Does that make sense or did I miss something? +1. Another point to consider is that even if we did somehow fix the "proc hole", any other means (perhaps in the future) of obtaining an inode reference would then be insecure, perhaps silently. For example --- is there a way to use open_by_handle_at(2) to open a memfd file?