linux-fsdevel.vger.kernel.org archive mirror
From: Jan Kara <jack@suse.cz>
To: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Cc: David Lang <david@lang.hm>, Rik van Riel <riel@redhat.com>,
	Jan Kara <jack@suse.cz>,
	tux3@tux3.org, linux-kernel@vger.kernel.org,
	linux-fsdevel@vger.kernel.org, Daniel Phillips <daniel@phunq.net>
Subject: Re: [FYI] tux3: Core changes
Date: Thu, 9 Jul 2015 18:05:28 +0200
Message-ID: <20150709160528.GK2900@quack.suse.cz>
In-Reply-To: <87k2ueepd6.fsf@mail.parknet.co.jp>

On Sun 05-07-15 21:54:45, OGAWA Hirofumi wrote:
> Jan Kara <jack@suse.cz> writes:
> 
> >> I'm not sure I'm understanding your pseudocode logic correctly though.
> >> This logic doesn't seem to be a page-forking-specific issue.  And
> >> this pseudocode logic seems to be missing the locking and revalidation
> >> of the page.
> >> 
> >> If you can show more details, that would help us look into it further
> >> and discuss the page forking issue, or we can think about how to handle
> >> the corner cases.
> >> 
> >> Well, before that, why do we need more details?
> >> 
> >> For example, replace the page fork at (4) with "truncate", "punch
> >> hole", or "invalidate page".
> >> 
> >> Those operations remove the old page from the radix tree, so a
> >> userspace write creates a new page while the HW still references the
> >> old page.  (I.e. the situation should be the same as with page forking,
> >> as far as I understand this pseudocode logic.)
> >
> > Yes, if userspace truncates the file, the situation we end up with is
> > basically the same. However for truncate to happen some malicious process
> > has to come and truncate the file - a failure scenario that is acceptable
> > for most use cases since it doesn't happen unless someone is actively
> > trying to screw you. With page forking it is enough for the flusher thread
> > to start writeback for that page to trigger the problem, an event that is
> > basically bound to happen without any other userspace application
> > interfering.
> 
> Where does the "acceptable" conclusion come from? That pseudocode logic
> doesn't say anything about usage. And even if we assume it is acceptable,
> as far as I can see, /proc/sys/vm/drop_caches, for example, is enough to
> trigger it, or a page on a non-existent block (a sparse file, i.e. your
> logic is missing a disk space check). And if there is really no lock or
> check at all, there would be other races.

So drop_caches won't cause any issues because it avoids mmaped pages.
Also page reclaim and page migration don't cause any issues because
they avoid pages with an increased refcount (and an increased refcount would
stop drop_caches from reclaiming the page as well, even without the mmaped
check). Generally, an elevated page refcount currently guarantees that the page
isn't migrated, reclaimed, or otherwise detached from the mapping (except
for truncate, where the mapping-index combination becomes invalid), and
your page forking would change that assumption, which IMHO has a big
potential for breakage somewhere. And frankly I fail to see why you
and Daniel care so much about this corner case, because from a performance
POV it's IMHO a non-issue, and you bother with page forking because of
performance, don't you?
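To make the refcount argument concrete, here is a toy Python model (not kernel code; Page, Mapping, pin(), try_reclaim(), fork_for_writeback() and user_write() are hypothetical names). It assumes reclaim checks the refcount as described above, and that page forking installs a copy in the page cache at writeback time without looking at extra references.

```python
from dataclasses import dataclass, field

# Toy model, not kernel code: all names here are hypothetical.

@dataclass
class Page:
    data: bytes
    refcount: int = 1  # 1 == only the page cache holds the page

@dataclass
class Mapping:
    pages: dict = field(default_factory=dict)  # index -> Page

def pin(mapping, index):
    """get_user_pages()-style pin: take an extra reference."""
    page = mapping.pages[index]
    page.refcount += 1
    return page

def try_reclaim(mapping, index):
    """Reclaim/migration model: leave pages that someone else holds."""
    page = mapping.pages[index]
    if page.refcount > 1:
        return False
    del mapping.pages[index]
    return True

def fork_for_writeback(mapping, index):
    """Page forking model: install a copy regardless of extra references."""
    old = mapping.pages[index]
    mapping.pages[index] = Page(old.data)
    old.refcount -= 1  # the page cache drops its reference to the old page
    return old         # the old page would go to the block layer

def user_write(mapping, index, data):
    """A write(2) or mmap store lands on whatever page the cache holds now."""
    mapping.pages[index].data = data

m = Mapping({0: Page(b"old")})
hw = pin(m, 0)                  # a driver pins the page for DMA
assert not try_reclaim(m, 0)    # the pinned page stays attached
fork_for_writeback(m, 0)        # writeback forks the page anyway
user_write(m, 0, b"new")
print(hw.data, m.pages[0].data)  # b'old' b'new'
```

Reclaim respects the pin, but the forked page silently splits what the device sees from what userspace writes.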

> >> IOW, this pseudocode logic seems to be broken even without page forking
> >> if there is no locking and revalidation.  Usually, we prevent unpleasant
> >> I/O with lock_page or PG_writeback, and an obsolete page is revalidated
> >> under lock_page.
> >
> > Well, good luck with converting all the get_user_pages() users in kernel to
> > use lock_page() or PG_writeback checks to avoid issues with page forking. I
> > don't think that's really feasible.
> 
> What do you mean by converting all get_user_pages() users? Well, you may
> be more or less right; I also think there is an issue in/around
> get_user_pages() that we have to tackle.
> 
> 
> IMO, if there actually is code following that pseudocode logic, it is
> the breakage. And I don't think "it is acceptable, it is a limitation,
> give up on fixing it" is the right way to go. If there really is code
> broken like your logic, I think we should fix it.
> 
> Could you point out which code uses your logic? Since it seems so racy,
> I can't yet believe that such racy code actually exists.

So you can have a look, for example, at
drivers/media/v4l2-core/videobuf2-dma-contig.c, which implements setting up
a video device buffer at a virtual address specified by the user. Now I don't
know whether there really is any userspace video program that sets up the
video buffer in an mmaped file. I would agree with you that it would be a
strange thing to do, but I've seen enough strange userspace code that I
would not be too surprised.

Another example of a similar kind is
drivers/infiniband/core/umem.c, where we again set up a buffer for InfiniBand
cards at a user-specified virtual address. And there are more drivers like
that in the kernel.
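The lock-and-revalidate idiom mentioned earlier in the thread protects code that looks the page up again for each access. A driver like those above pins the page once and keeps using it for DMA, so it never gets a chance to revalidate. A toy Python model, assuming Page.lock stands in for lock_page() and Page.mapping for page->mapping (Page, Mapping, replace() and locked_write() are hypothetical names):

```python
import threading

# Toy model, not kernel code: Page.lock stands in for lock_page() and
# Page.mapping for page->mapping. All names are hypothetical.

class Page:
    def __init__(self, data):
        self.data = data
        self.mapping = None
        self.lock = threading.Lock()

class Mapping:
    def __init__(self):
        self.pages = {}  # index -> Page

    def insert(self, index, page):
        page.mapping = self
        self.pages[index] = page

    def replace(self, index, new_page):
        """Truncate/invalidate/fork: detach the old page under its lock."""
        old = self.pages[index]
        with old.lock:
            old.mapping = None
            self.insert(index, new_page)
        return old

def locked_write(mapping, index, data):
    """Short-lived user: look up, lock, revalidate, retry if stale."""
    while True:
        page = mapping.pages[index]
        with page.lock:
            if page.mapping is mapping:
                page.data = data
                return page
        # The page was detached between lookup and lock; look it up again.

m = Mapping()
m.insert(0, Page(b"old"))
dma_buf = m.pages[0]         # long-term holder, e.g. a pinned DMA buffer
m.replace(0, Page(b"old"))   # page replaced (truncate, invalidate, or fork)
written = locked_write(m, 0, b"new")
print(dma_buf.data, dma_buf.mapping is None, written.data)  # b'old' True b'new'
```

The short-lived writer ends up on the current page, while the long-term holder keeps the detached one.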

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR