From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S934357AbbEOIJL (ORCPT ); Fri, 15 May 2015 04:09:11 -0400
Received: from cantor2.suse.de ([195.135.220.15]:59012 "EHLO mx2.suse.de"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S933558AbbEOIJG (ORCPT ); Fri, 15 May 2015 04:09:06 -0400
Date: Fri, 15 May 2015 09:09:03 +0100
From: Mel Gorman
To: Rik van Riel
Cc: Daniel Phillips, linux-kernel@vger.kernel.org,
	linux-fsdevel@vger.kernel.org, tux3@tux3.org, OGAWA Hirofumi,
	Andrea Arcangeli, Peter Zijlstra
Subject: Re: [FYI] tux3: Core changes
Message-ID: <20150515080902.GU2462@suse.de>
References: <8f886f13-6550-4322-95be-93244ae61045@phunq.net>
	<55545C2F.8040207@phunq.net> <55549C2F.6000103@redhat.com>
	<5555388F.5010909@phunq.net> <555562AE.9020204@redhat.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=iso-8859-15
Content-Disposition: inline
In-Reply-To: <555562AE.9020204@redhat.com>
User-Agent: Mutt/1.5.21 (2010-09-15)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, May 14, 2015 at 11:06:22PM -0400, Rik van Riel wrote:
> On 05/14/2015 08:06 PM, Daniel Phillips wrote:
> > Hi Rik,
> >
> > Added Mel, Andrea and Peterz to CC as interested parties. There are
> > probably others, please just jump in.
> >
> > On 05/14/2015 05:59 AM, Rik van Riel wrote:
> >> On 05/14/2015 04:26 AM, Daniel Phillips wrote:
> >>> Hi Rik,
> >>>
> >>> Our linux-tux3 tree currently carries this 652 line diff
> >>> against core, to make Tux3 work. This is mainly by Hirofumi, except
> >>> the fs-writeback.c hook, which is by me. The main part you may be
> >>> interested in is rmap.c, which addresses the issues raised at the
> >>> 2013 Linux Storage Filesystem and MM Summit in San Francisco.[1]
> >>>
> >>> LSFMM: Page forking
> >>> http://lwn.net/Articles/548091/
> >>>
> >>> This is just a FYI. An upcoming Tux3 report will be a tour of the page
> >>> forking design and implementation. For now, this is just to give a
> >>> general sense of what we have done. We heard there are concerns about
> >>> how ptrace will work. I really am not familiar with the issue, could
> >>> you please explain what you were thinking of there?
> >>
> >> The issue is that things like ptrace, AIO, infiniband
> >> RDMA, and other direct memory access subsystems can take
> >> a reference to page A, which Tux3 clones into a new page B
> >> when the process writes it.
> >>
> >> However, while the process now points at page B, ptrace,
> >> AIO, infiniband, etc will still be pointing at page A.
> >>
> >> This causes the process and the other subsystem to each
> >> look at a different page, instead of at shared state,
> >> causing ptrace to do nothing, AIO and RDMA data to be
> >> invisible (or corrupted), etc...
> >
> > Is this a bit like page migration?
>
> Yes. Page migration will fail if there is an "extra"
> reference to the page that is not accounted for by
> the migration code.
>

When I said it's not like page migration, I was referring to the fact
that a COW on a pinned page for RDMA is a different problem to page
migration. The COW of a pinned page can lead to lost writes or
corruption depending on the ordering of events. Page migration fails
when there are unexpected problems in order to avoid this class of
issue. That is fine for page migration but may be a critical failure
in a filesystem, depending on exactly why the copy is required.

-- 
Mel Gorman
SUSE Labs
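To make the ordering dependence concrete, here is a toy userspace model of
the scenario described in this thread. It is only a sketch and not kernel
code: Page, Mapping, fork_page, try_migrate, dma_write and run are all
hypothetical names invented for illustration, and a plain counter stands in
for whatever reference a DMA-style subsystem would actually hold.

```python
# Toy userspace model; not kernel code. All names here are hypothetical.

class Page:
    """A page of data plus a count of outside pins (e.g. an RDMA or AIO reference)."""
    def __init__(self, data: bytes):
        self.data = bytearray(data)
        self.pins = 0


class Mapping:
    """What the process sees: a pointer to its current page."""
    def __init__(self, page: Page):
        self.page = page

    def fork_page(self) -> Page:
        """Copy the current page and repoint the process at the copy.
        Pins held on the old page are NOT moved to the new one."""
        old = self.page
        self.page = Page(bytes(old.data))
        return old

    def try_migrate(self) -> bool:
        """Migration-style behaviour: refuse to copy while a pin is outstanding."""
        if self.page.pins > 0:
            return False
        self.page = Page(bytes(self.page.data))
        return True


def dma_write(page: Page, offset: int, payload: bytes) -> None:
    """Stand-in for a device writing through a reference it took earlier."""
    page.data[offset:offset + len(payload)] = payload


def run(dma_before_fork: bool) -> bytes:
    """Return what the process reads after one DMA write and one page fork."""
    mapping = Mapping(Page(b"...."))
    pinned = mapping.page              # the device takes its reference to page A
    pinned.pins += 1
    if dma_before_fork:
        dma_write(pinned, 0, b"DMA!")  # lands in A, then gets copied into B
        mapping.fork_page()
    else:
        mapping.fork_page()
        dma_write(pinned, 0, b"DMA!")  # lands in A, but the process now reads B
    return bytes(mapping.page.data)


print(run(dma_before_fork=True))    # b'DMA!'
print(run(dma_before_fork=False))   # b'....'
```

In this model the same two events give different results depending on their
order: a device write that arrives after the fork lands in page A and is
never seen by the process. try_migrate shows the other policy, where the
copy is simply refused while a pin is outstanding. That is acceptable when
the copy is optional, but not necessarily when a filesystem needs it.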