From: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
To: Jan Kara <jack@suse.cz>
Cc: Daniel Phillips <daniel@phunq.net>, David Lang <david@lang.hm>,
    Rik van Riel <riel@redhat.com>, tux3@tux3.org,
    linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org
Subject: Re: [FYI] tux3: Core changes
Date: Sun, 05 Jul 2015 21:54:45 +0900
Message-ID: <87k2ueepd6.fsf@mail.parknet.co.jp>
In-Reply-To: <20150623161247.GP2427@quack.suse.cz> (Jan Kara's message of "Tue, 23 Jun 2015 18:12:47 +0200")

Jan Kara <jack@suse.cz> writes:

>> I'm not sure I'm understanding your pseudocode logic correctly though.
>> This logic doesn't seem to be a page forking specific issue. And
>> this pseudocode logic seems to be missing the locking and revalidate of
>> page.
>>
>> If you can show more details, it would be helpful to see more, and
>> discuss the issue of page forking, or we can think about how to handle
>> the corner cases.
>>
>> Well, before that, why need more details?
>>
>> For example, replace the page fork at (4) with "truncate", "punch
>> hole", or "invalidate page".
>>
>> Those operations remove the old page from radix tree, so the
>> userspace's write creates the new page, and HW still references the
>> old page. (I.e. situation should be same with page forking, in my
>> understanding of this pseudocode logic.)
>
> Yes, if userspace truncates the file, the situation we end up with is
> basically the same. However for truncate to happen some malicious process
> has to come and truncate the file - a failure scenario that is acceptable
> for most use cases since it doesn't happen unless someone is actively
> trying to screw you. With page forking it is enough for flusher thread
> to start writeback for that page to trigger the problem - event that is
> basically bound to happen without any other userspace application
> interfering.

Where does the conclusion that this is acceptable come from? That
pseudocode logic says nothing about usage at all.

And even if we assume it is acceptable, as far as I can see,
/proc/sys/vm/drop_caches, for example, is enough to trigger it, as is a
page on a non-existent block (a sparse file, i.e. the disk space check
is missing in your logic). And if there really is no lock or check at
all, there would be other races too.

>> IOW, this pseudocode logic seems to be broken without page forking if
>> no lock and revalidate. Usually, we prevent unpleasant I/O by
>> lock_page or PG_writeback, and an obsolete page is revalidated under
>> lock_page.
>
> Well, good luck with converting all the get_user_pages() users in kernel to
> use lock_page() or PG_writeback checks to avoid issues with page forking. I
> don't think that's really feasible.

What does converting all get_user_pages() users mean? Well, you may be
more or less right; I also think there is an issue in and around
get_user_pages() that we have to tackle.

IMO, if there is code that actually follows that pseudocode logic, that
code is broken. And I don't think "it is acceptable, it is a
limitation, so we give up on fixing it" is the right way to go. If
there really is code broken in the way your logic describes, I think we
should fix it.

Could you point to the code that uses your logic? Since it seems so
racy, I can't yet believe that such racy code actually exists.

>> For page forking, we may also be able to prevent similar situation by
>> locking, flags, and revalidate. But those details might be different
>> with current code, because page states are different.
>
> Sorry, I don't understand what do you mean in this paragraph. Can you
> explain it a bit more?

This just means that a forked page (the old page) and a truncated page
have different sets of flags and state, so we may have to adjust the
revalidation.

Thanks.
-- 
OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
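
Editor's illustration: the "lock, then revalidate" pattern mentioned in the message can be sketched with a toy model. This is a minimal sketch in Python, not kernel code. `Page`, `Mapping`, `detach`, and `write_revalidated` are hypothetical names invented for the illustration; `detach` stands in for any event that removes a page from the cache (truncate, hole punch, invalidate, or a page fork). The sketch assumes every writer takes the page lock, so it does not model a device that keeps writing through an old reference without locking, which is the case Jan Kara raises.

```python
import threading


class Page:
    """Toy stand-in for a page-cache page (not the kernel's struct page)."""

    def __init__(self, mapping, index):
        self.lock = threading.Lock()  # plays the role of lock_page()
        self.mapping = mapping        # set to None once detached from the cache
        self.index = index
        self.data = b""


class Mapping:
    """Toy stand-in for an address_space: maps index -> Page."""

    def __init__(self):
        self.pages = {}

    def find_or_create(self, index):
        page = self.pages.get(index)
        if page is None:
            page = Page(self, index)
            self.pages[index] = page
        return page

    def detach(self, index):
        """Model truncate / hole punch / invalidate / fork: drop the old page."""
        page = self.pages.pop(index, None)
        if page is not None:
            with page.lock:
                page.mapping = None
        return page


def write_revalidated(mapping, page, data):
    """Write via a possibly stale reference, revalidating under the lock."""
    while True:
        with page.lock:
            if page.mapping is mapping:  # still the cache's page for this index
                page.data = data
                return page
        # Stale reference: the page left the cache while we held it,
        # so look up (or create) the current page and retry.
        page = mapping.find_or_create(page.index)


m = Mapping()
old = m.find_or_create(0)   # long-lived reference taken earlier
m.detach(0)                 # truncate-like event happens in between
current = write_revalidated(m, old, b"new")
```

In this model the write lands on the page currently in the cache and the detached page is left untouched. The check on `page.mapping` is the part that, per the message, may need adjusting for forked pages, since their flags and state differ from those of truncated pages.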