From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1754384AbbEOJi3 (ORCPT ); Fri, 15 May 2015 05:38:29 -0400
Received: from mail.phunq.net ([184.71.0.62]:58026 "EHLO starbase.phunq.net"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1754290AbbEOJi1 (ORCPT ); Fri, 15 May 2015 05:38:27 -0400
Message-ID: <5555BE99.1030803@phunq.net>
Date: Fri, 15 May 2015 02:38:33 -0700
From: Daniel Phillips
User-Agent: Mozilla/5.0 (X11; Linux i686; rv:31.0) Gecko/20100101 Thunderbird/31.6.0
MIME-Version: 1.0
To: Rik van Riel, linux-kernel@vger.kernel.org
CC: linux-fsdevel@vger.kernel.org, tux3@tux3.org, OGAWA Hirofumi,
	mgorman@suse.de, Andrea Arcangeli, Peter Zijlstra
Subject: Re: [FYI] tux3: Core changes
References: <8f886f13-6550-4322-95be-93244ae61045@phunq.net>
	<55545C2F.8040207@phunq.net> <55549C2F.6000103@redhat.com>
	<5555388F.5010909@phunq.net> <555562AE.9020204@redhat.com>
In-Reply-To: <555562AE.9020204@redhat.com>
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On 05/14/2015 08:06 PM, Rik van Riel wrote:
> On 05/14/2015 08:06 PM, Daniel Phillips wrote:
>>> The issue is that things like ptrace, AIO, infiniband
>>> RDMA, and other direct memory access subsystems can take
>>> a reference to page A, which Tux3 clones into a new page B
>>> when the process writes it.
>>>
>>> However, while the process now points at page B, ptrace,
>>> AIO, infiniband, etc will still be pointing at page A.
>>>
>>> This causes the process and the other subsystem to each
>>> look at a different page, instead of at shared state,
>>> causing ptrace to do nothing, AIO and RDMA data to be
>>> invisible (or corrupted), etc...
>>
>> Is this a bit like page migration?
>
> Yes. Page migration will fail if there is an "extra"
> reference to the page that is not accounted for by
> the migration code.
>
> Only pages that have no extra refcount can be migrated.
>
> Similarly, your cow code needs to fail if there is an
> extra reference count pinning the page. As long as
> the page has a user that you cannot migrate, you cannot
> move any of the other users over. They may rely on data
> written by the hidden-to-you user, and the hidden-to-you
> user may write to the page when you think it is a read
> only stable snapshot.

Please bear with me as I study these cases one by one.

First one is ptrace. Only for executable files, right? Maybe we don't
need to fork pages in executable files.

Uprobes... If somebody puts a breakpoint in a page and we fork it, the
replacement page has a copy of the breakpoint, and all the code on the
page. Did anything break?

Note: we have the option of being cowardly and just not doing page
forking for mmapped files, or certain kinds of mmapped files, etc. But
first we should give it the old college try, to see if absolute
perfection is possible and practical.

Regards,

Daniel
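To make Rik's rule concrete, here is a toy user-space model in Python. It is not kernel code, and the names ToyPage, get, put, naive_fork and try_fork_page are hypothetical. It also assumes that the references the filesystem "accounts for" are exactly two, one from the page cache and one from the process mapping. The first case shows how the process and a pinned user diverge when a pinned page is forked anyway. The second shows a guard that, like page migration, refuses to fork while an unaccounted reference is held.

```python
from dataclasses import dataclass
from typing import Optional

# Toy user-space model of the rule Rik describes; this is NOT kernel
# code. ToyPage, get/put, naive_fork and try_fork_page are hypothetical
# names chosen for illustration only.


@dataclass
class ToyPage:
    data: bytearray
    refcount: int = 0  # one reference per holder: cache, mapping, pin, ...


def get(page: ToyPage) -> ToyPage:
    """Take a reference, as any user of the page would."""
    page.refcount += 1
    return page


def put(page: ToyPage) -> None:
    """Drop a reference."""
    page.refcount -= 1


def naive_fork(page: ToyPage) -> ToyPage:
    """Clone the page for the writer without checking for extra users."""
    return ToyPage(bytearray(page.data), refcount=1)


def try_fork_page(page: ToyPage, expected_refs: int) -> Optional[ToyPage]:
    """Clone the page only if every reference is one the filesystem
    accounts for (expected_refs). Any extra reference, such as a pin held
    by AIO, RDMA or ptrace, makes the fork fail, as page migration does."""
    if page.refcount > expected_refs:
        return None
    return ToyPage(bytearray(page.data), refcount=1)


# Assumed accounted-for references: page cache + process mapping.
EXPECTED_REFS = 2

# Case 1: naive forking with a hidden pin. The process writes to the
# clone B while the pinned user keeps looking at A, so the views diverge.
a = ToyPage(bytearray(b"hello"))
get(a)                 # page cache
get(a)                 # process mapping
pin = get(a)           # extra reference, e.g. an RDMA or AIO pin
b = naive_fork(a)
b.data[:5] = b"HELLO"  # process writes through its new page
print(bytes(pin.data), bytes(b.data))  # b'hello' b'HELLO'

# Case 2: guarded forking refuses while the pin is held...
p = ToyPage(bytearray(b"hello"))
get(p)
get(p)
pin2 = get(p)
print(try_fork_page(p, EXPECTED_REFS))  # None
# ...and succeeds once only accounted-for references remain.
put(pin2)
clone = try_fork_page(p, EXPECTED_REFS)
print(bytes(clone.data) if clone else None)  # b'hello'
```

The model only counts references. It cannot tell which holder will write, so the conservative choice is to refuse the fork whenever the count exceeds what the filesystem expects.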