From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1754070AbaFMRXs (ORCPT ); Fri, 13 Jun 2014 13:23:48 -0400
Received: from mail-ve0-f178.google.com ([209.85.128.178]:41794 "EHLO
	mail-ve0-f178.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1753803AbaFMRXp (ORCPT );
	Fri, 13 Jun 2014 13:23:45 -0400
MIME-Version: 1.0
In-Reply-To:
References: <1402655819-14325-1-git-send-email-dh.herrmann@gmail.com>
	<1402655819-14325-8-git-send-email-dh.herrmann@gmail.com>
From: Andy Lutomirski
Date: Fri, 13 Jun 2014 10:23:23 -0700
Message-ID:
Subject: Re: [RFC v3 7/7] shm: isolate pinned pages when sealing files
To: David Herrmann
Cc: "linux-kernel@vger.kernel.org", Michael Kerrisk, Ryan Lortie,
	Linus Torvalds, Andrew Morton, "linux-mm@kvack.org",
	Linux FS Devel, Linux API, Greg Kroah-Hartman, John Stultz,
	Lennart Poettering, Daniel Mack, Kay Sievers, Hugh Dickins,
	Tony Battersby
Content-Type: text/plain; charset=UTF-8
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, Jun 13, 2014 at 8:27 AM, David Herrmann wrote:
> Hi
>
> On Fri, Jun 13, 2014 at 5:06 PM, Andy Lutomirski wrote:
>> On Fri, Jun 13, 2014 at 3:36 AM, David Herrmann wrote:
>>> When setting SEAL_WRITE, we must make sure nobody has a writable reference
>>> to the pages (via GUP or similar). We currently check references and wait
>>> some time for them to be dropped. This, however, might fail for several
>>> reasons, including:
>>>  - the page is pinned for longer than we wait
>>>  - while we wait, someone takes an already pinned page for read-access
>>>
>>> Therefore, this patch introduces page-isolation. When sealing a file with
>>> SEAL_WRITE, we copy all pages that have an elevated ref-count. The newpage
>>> is put in place atomically, the old page is detached and left alone. It
>>> will get reclaimed once the last external user dropped it.
>>>
>>> Signed-off-by: David Herrmann
>>
>> Won't this have unexpected effects?
>>
>> Thread 1: start read into mapping backed by fd
>>
>> Thread 2: SEAL_WRITE
>>
>> Thread 1: read finishes.  now the page doesn't match the sealed page
>
> Just to be clear: you're talking about read() calls that write into
> the memfd? (like my FUSE example does) Your language might be
> ambiguous to others as "read into" actually implies a write.
>
> No, this does not have unexpected effects. But yes, your conclusion is
> right. To be clear, this behavior would be part of the API. Any
> asynchronous write might be cut off by SEAL_WRITE _iff_ you unmap your
> buffer before the write finishes. But you actually have to extend your
> example:
>
> Thread 1: p = mmap(memfd, SIZE);
> Thread 1: h = async_read(some_fd, p, SIZE);
> Thread 1: munmap(p, SIZE);
> Thread 2: SEAL_WRITE
> Thread 1: async_wait(h);
>
> If you don't do the unmap(), then SEAL_WRITE will fail due to an
> elevated i_mmap_writable. I think this is fine. In fact, I remember
> reading that async-IO is not required to resolve user-space addresses
> at the time of the syscall, but might delay it to the time of the
> actual write. But you're right, it would be misleading that the AIO
> operation returns success. This would be part of the memfd-API,
> though. And if you mess with your address space while running an
> async-IO operation on it, you're screwed anyway.

Ok, I missed the part where you had to munmap to trigger the oddity.
That seems fine to me.

>
> Btw., your sealing use-case is really odd. No-one guarantees that the
> SEAL_WRITE happens _after_ you schedule your async-read. In case you
> have some synchronization there, you just have to move it after
> waiting for your async-io to finish.
>
> Does that clear things up?

I think so.

--Andy
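
For readers who want to try the mapping rule discussed above (SEAL_WRITE
fails while a writable mapping exists, and can succeed after munmap()),
here is a minimal userspace sketch. It is written under assumptions: it
uses the names from the interface as it was later merged (memfd_create(2)
and fcntl() with F_ADD_SEALS / F_SEAL_WRITE), not the names used in this
RFC, and seal_write_demo() and DEMO_SIZE are hypothetical names made up
for the example. It does not issue any async I/O and does not exercise
the page isolation proposed in this patch.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Fallbacks for older libc headers; values match the Linux UAPI. */
#ifndef MFD_ALLOW_SEALING
#define MFD_ALLOW_SEALING 0x0002U
#endif
#ifndef F_ADD_SEALS
#define F_ADD_SEALS 1033
#endif
#ifndef F_SEAL_WRITE
#define F_SEAL_WRITE 0x0008
#endif

#define DEMO_SIZE 4096 /* hypothetical buffer size for the demo */

/*
 * Hypothetical helper. Tries to add the write seal twice: once while a
 * writable shared mapping exists, once after munmap().
 * Stores the errno of the first attempt (0 if it succeeded) in
 * *errno_while_mapped and returns the result of the second attempt,
 * or -1 if setting up the memfd or the mapping failed.
 */
int seal_write_demo(int *errno_while_mapped)
{
    *errno_while_mapped = 0;

    /* Raw syscall so the sketch also builds against older glibc. */
    int fd = (int)syscall(SYS_memfd_create, "seal-demo", MFD_ALLOW_SEALING);
    if (fd < 0)
        return -1;
    if (ftruncate(fd, DEMO_SIZE) < 0) {
        close(fd);
        return -1;
    }

    void *p = mmap(NULL, DEMO_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) {
        close(fd);
        return -1;
    }

    /* A writable shared mapping is still alive: sealing should be refused. */
    if (fcntl(fd, F_ADD_SEALS, F_SEAL_WRITE) < 0)
        *errno_while_mapped = errno;

    /* Drop the mapping, then retry. */
    munmap(p, DEMO_SIZE);
    int rc = fcntl(fd, F_ADD_SEALS, F_SEAL_WRITE);

    close(fd);
    return rc;
}
```

On a kernel with file sealing, the first attempt is expected to fail with
EBUSY and the second to return 0. The case debated in this thread sits in
the window after munmap(): an async read that still holds a reference to
the old pages would no longer block the seal.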