From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751185AbbGEMzC (ORCPT ); Sun, 5 Jul 2015 08:55:02 -0400 Received: from mail.parknet.co.jp ([210.171.160.6]:38777 "EHLO mail.parknet.co.jp" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750798AbbGEMyz (ORCPT ); Sun, 5 Jul 2015 08:54:55 -0400 From: OGAWA Hirofumi To: Jan Kara Cc: Daniel Phillips , David Lang , Rik van Riel , tux3@tux3.org, linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org Subject: Re: [FYI] tux3: Core changes References: <5563F5C8.2040806@redhat.com> <67294911-1776-46b8-916d-0e5642a38725@phunq.net> <20150526070910.GA3307@quack.suse.cz> <20150526090058.GA8024@quack.suse.cz> <5564D60E.6000306@phunq.net> <20150527084138.GD2590@quack.suse.cz> <87a8vtdqfz.fsf@mail.parknet.co.jp> <20150623161247.GP2427@quack.suse.cz> Date: Sun, 05 Jul 2015 21:54:45 +0900 In-Reply-To: <20150623161247.GP2427@quack.suse.cz> (Jan Kara's message of "Tue, 23 Jun 2015 18:12:47 +0200") Message-ID: <87k2ueepd6.fsf@mail.parknet.co.jp> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/25.0.50 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Jan Kara writes: >> I'm not sure I'm understanding your pseudocode logic correctly though. >> This logic doesn't seems to be a page forking specific issue. And >> this pseudocode logic seems to be missing the locking and revalidate of >> page. >> >> If you can show more details, it would be helpful to see more, and >> discuss the issue of page forking, or we can think about how to handle >> the corner cases. >> >> Well, before that, why need more details? >> >> For example, replace the page fork at (4) with "truncate", "punch >> hole", or "invalidate page". >> >> Those operations remove the old page from radix tree, so the >> userspace's write creates the new page, and HW still refererences the >> old page. (I.e. situation should be same with page forking, in my >> understand of this pseudocode logic.) > > Yes, if userspace truncates the file, the situation we end up with is > basically the same. However for truncate to happen some malicious process > has to come and truncate the file - a failure scenario that is acceptable > for most use cases since it doesn't happen unless someone is actively > trying to screw you. With page forking it is enough for flusher thread > to start writeback for that page to trigger the problem - event that is > basically bound to happen without any other userspace application > interfering. Acceptable conclusion is where came from? That pseudocode logic doesn't say about usage at all. And even if assume it is acceptable, as far as I can see, for example /proc/sys/vm/drop_caches is enough to trigger, or a page on non-exists block (sparse file. i.e. missing disk space check in your logic). And if really no any lock/check, there would be another races. >> IOW, this pseudocode logic seems to be broken without page forking if >> no lock and revalidate. Usually, we prevent unpleasant I/O by >> lock_page or PG_writeback, and an obsolated page is revalidated under >> lock_page. > > Well, good luck with converting all the get_user_pages() users in kernel to > use lock_page() or PG_writeback checks to avoid issues with page forking. I > don't think that's really feasible. What does all get_user_pages() conversion mean? Well, maybe right more or less, I also think there is the issue in/around get_user_pages() that we have to tackle. IMO, if there is a code that pseudocode logic actually, it is the breakage. And "it is acceptable and limitation, and give up to fix", I don't think it is the right way to go. If there is really code broken like your logic, I think we should fix. Could you point which code is using your logic? Since that seems to be so racy, I can't believe yet there are that racy codes actually. >> For page forking, we may also be able to prevent similar situation by >> locking, flags, and revalidate. But those details might be different >> with current code, because page states are different. > > Sorry, I don't understand what do you mean in this paragraph. Can you > explain it a bit more? This just means a forked page (old page) and a truncated page have different set of flags and state, so we may have to adjust revalidation. Thanks. -- OGAWA Hirofumi From mboxrd@z Thu Jan 1 00:00:00 1970 From: OGAWA Hirofumi Subject: Re: [FYI] tux3: Core changes Date: Sun, 05 Jul 2015 21:54:45 +0900 Message-ID: <87k2ueepd6.fsf@mail.parknet.co.jp> References: <5563F5C8.2040806@redhat.com> <67294911-1776-46b8-916d-0e5642a38725@phunq.net> <20150526070910.GA3307@quack.suse.cz> <20150526090058.GA8024@quack.suse.cz> <5564D60E.6000306@phunq.net> <20150527084138.GD2590@quack.suse.cz> <87a8vtdqfz.fsf@mail.parknet.co.jp> <20150623161247.GP2427@quack.suse.cz> Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Cc: David Lang , Rik van Riel , tux3@tux3.org, linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, Daniel Phillips To: Jan Kara Return-path: In-Reply-To: <20150623161247.GP2427@quack.suse.cz> (Jan Kara's message of "Tue, 23 Jun 2015 18:12:47 +0200") List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: tux3-bounces@phunq.net Sender: "Tux3" List-Id: linux-fsdevel.vger.kernel.org Jan Kara writes: >> I'm not sure I'm understanding your pseudocode logic correctly though. >> This logic doesn't seems to be a page forking specific issue. And >> this pseudocode logic seems to be missing the locking and revalidate of >> page. >> >> If you can show more details, it would be helpful to see more, and >> discuss the issue of page forking, or we can think about how to handle >> the corner cases. >> >> Well, before that, why need more details? >> >> For example, replace the page fork at (4) with "truncate", "punch >> hole", or "invalidate page". >> >> Those operations remove the old page from radix tree, so the >> userspace's write creates the new page, and HW still refererences the >> old page. (I.e. situation should be same with page forking, in my >> understand of this pseudocode logic.) > > Yes, if userspace truncates the file, the situation we end up with is > basically the same. However for truncate to happen some malicious process > has to come and truncate the file - a failure scenario that is acceptable > for most use cases since it doesn't happen unless someone is actively > trying to screw you. With page forking it is enough for flusher thread > to start writeback for that page to trigger the problem - event that is > basically bound to happen without any other userspace application > interfering. Acceptable conclusion is where came from? That pseudocode logic doesn't say about usage at all. And even if assume it is acceptable, as far as I can see, for example /proc/sys/vm/drop_caches is enough to trigger, or a page on non-exists block (sparse file. i.e. missing disk space check in your logic). And if really no any lock/check, there would be another races. >> IOW, this pseudocode logic seems to be broken without page forking if >> no lock and revalidate. Usually, we prevent unpleasant I/O by >> lock_page or PG_writeback, and an obsolated page is revalidated under >> lock_page. > > Well, good luck with converting all the get_user_pages() users in kernel to > use lock_page() or PG_writeback checks to avoid issues with page forking. I > don't think that's really feasible. What does all get_user_pages() conversion mean? Well, maybe right more or less, I also think there is the issue in/around get_user_pages() that we have to tackle. IMO, if there is a code that pseudocode logic actually, it is the breakage. And "it is acceptable and limitation, and give up to fix", I don't think it is the right way to go. If there is really code broken like your logic, I think we should fix. Could you point which code is using your logic? Since that seems to be so racy, I can't believe yet there are that racy codes actually. >> For page forking, we may also be able to prevent similar situation by >> locking, flags, and revalidate. But those details might be different >> with current code, because page states are different. > > Sorry, I don't understand what do you mean in this paragraph. Can you > explain it a bit more? This just means a forked page (old page) and a truncated page have different set of flags and state, so we may have to adjust revalidation. Thanks. -- OGAWA Hirofumi