linux-fsdevel.vger.kernel.org archive mirror
From: David Lang <david@lang.hm>
To: Mel Gorman <mgorman@suse.de>
Cc: Daniel Phillips <daniel@phunq.net>,
	Rik van Riel <riel@redhat.com>,
	linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	tux3@tux3.org, OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>
Subject: Re: [FYI] tux3: Core changes
Date: Sat, 16 May 2015 15:38:04 -0700 (PDT)
Message-ID: <alpine.DEB.2.02.1505161525580.3147@nftneq.ynat.uz>
In-Reply-To: <20150515110041.GV2462@suse.de>

On Fri, 15 May 2015, Mel Gorman wrote:

> On Fri, May 15, 2015 at 02:54:48AM -0700, Daniel Phillips wrote:
>>
>>
>> On 05/15/2015 01:09 AM, Mel Gorman wrote:
>>> On Thu, May 14, 2015 at 11:06:22PM -0400, Rik van Riel wrote:
>>>> On 05/14/2015 08:06 PM, Daniel Phillips wrote:
>>>>>> The issue is that things like ptrace, AIO, infiniband
>>>>>> RDMA, and other direct memory access subsystems can take
>>>>>> a reference to page A, which Tux3 clones into a new page B
>>>>>> when the process writes it.
>>>>>>
>>>>>> However, while the process now points at page B, ptrace,
>>>>>> AIO, infiniband, etc will still be pointing at page A.
>>>>>>
>>>>>> This causes the process and the other subsystem to each
>>>>>> look at a different page, instead of at shared state,
>>>>>> causing ptrace to do nothing, AIO and RDMA data to be
>>>>>> invisible (or corrupted), etc...
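A minimal user-space sketch of the divergence described above. It assumes a hypothetical `struct toy_page` in place of a kernel page and a plain pointer standing in for the reference held by ptrace, AIO or RDMA; none of these names are kernel or tux3 APIs:

```c
#include <stdlib.h>
#include <string.h>

#define TOY_PAGE_SIZE 16

/* Hypothetical stand-in for a page of file data; not a kernel structure. */
struct toy_page {
    char data[TOY_PAGE_SIZE];
};

/* Clone page A into a new page B, as a page-forking filesystem would when
 * the process writes to A. The caller owns (and must free) the copy. */
static struct toy_page *toy_fork_page(const struct toy_page *a)
{
    struct toy_page *b = malloc(sizeof *b);

    if (b != NULL)
        memcpy(b, a, sizeof *b);
    return b;
}
```

After the fork, writes made through the process's mapping land in B, while whoever still holds the old reference keeps reading and writing A, so neither side sees the other's changes.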
>>>>>
>>>>> Is this a bit like page migration?
>>>>
>>>> Yes. Page migration will fail if there is an "extra"
>>>> reference to the page that is not accounted for by
>>>> the migration code.
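The rule Rik describes can be modelled as a reference-count comparison. This is a hypothetical sketch (`struct toy_refpage` and `toy_can_migrate` are invented names), not the kernel's migration code:

```c
#include <stdbool.h>

/* Hypothetical page with a reference count; not a kernel structure. */
struct toy_refpage {
    int refcount;   /* all references currently held on the page */
};

/* Migration may only move the page when every reference is one the
 * migration code accounts for. An extra pin (e.g. one held for DMA)
 * means some other user would keep pointing at the old page, so the
 * migration backs off instead of proceeding. */
static bool toy_can_migrate(const struct toy_refpage *p, int expected_refs)
{
    return p->refcount == expected_refs;
}
```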
>>>
>>> When I said it's not like page migration, I was referring to the fact
>>> that a COW on a pinned page for RDMA is a different problem to page
>>> migration. The COW of a pinned page can lead to lost writes or
>>> corruption depending on the ordering of events.
>>
>> I see the lost writes case, but not the corruption case,
>
> Data corruption can occur depending on the ordering of events and the
> application's expectations. If a process starts IO, RDMA pins the page
> for read, and forks are combined with writes from another thread, then
> when the IO completes the reads may not be visible. The application may
> take improper action at that point.

If tux3 forks the page and writes the copy while the original page is being
modified by other things, some of the changes won't be in the version that is
written (and this could catch partial writes, with 'interesting' results, if
the forking happens at the wrong time).

But if the original page gets re-marked as needing to be written out when it's 
changed by one of the other things that are accessing it, there shouldn't be any 
long-term corruption.
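A toy model of this argument, assuming (as the paragraph does) that every modification of the page, whoever makes it, marks it dirty again. `struct toy_cached_page`, `toy_modify` and `toy_writeback` are hypothetical names; the `memcpy` in `toy_writeback` plays the role of the forked copy that goes to disk:

```c
#include <stdbool.h>
#include <string.h>

#define TOY_SIZE 16

/* Hypothetical cached page: its contents plus a dirty flag. */
struct toy_cached_page {
    char data[TOY_SIZE];
    bool dirty;
};

/* Hypothetical on-disk copy of the page. */
struct toy_disk_block {
    char data[TOY_SIZE];
};

/* Any change to the page, by the process or by another agent, is assumed
 * to mark it dirty again (the assumption made in the text). */
static void toy_modify(struct toy_cached_page *pg, const char *text)
{
    strncpy(pg->data, text, TOY_SIZE - 1);
    pg->data[TOY_SIZE - 1] = '\0';
    pg->dirty = true;
}

/* Writeback snapshots (forks) the current contents to disk and clears the
 * dirty flag; changes made after the snapshot are not in this copy. */
static void toy_writeback(struct toy_cached_page *pg, struct toy_disk_block *blk)
{
    if (!pg->dirty)
        return;
    memcpy(blk->data, pg->data, TOY_SIZE);
    pg->dirty = false;
}
```

The assumption carries the whole argument: a change that does not redirty the page is never picked up by a later writeback.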

As far as short-term corruption goes, any time you have a page mmapped it could
get written out at any time, with only some of the application's changes applied
to it, so this sort of corruption could happen anyway, couldn't it?

> Users of RDMA are typically expected to use MADV_DONTFORK to avoid this
> class of problem.
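A minimal sketch of that advice using the real `madvise(2)` call. The buffer that will be registered for RDMA is excluded from `fork()`, so the child does not share it and later writes by the parent cannot trigger a copy-on-write that leaves the adapter pointing at a stale page. `alloc_dontfork_buffer` is a hypothetical helper, and the RDMA registration step itself is left out:

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map an anonymous, page-aligned buffer and mark it MADV_DONTFORK so that
 * fork() does not share it with a child (no COW on pages pinned for DMA).
 * Returns NULL on failure. len should be a multiple of the page size. */
static void *alloc_dontfork_buffer(size_t len)
{
    void *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    if (buf == MAP_FAILED)
        return NULL;
    if (madvise(buf, len, MADV_DONTFORK) != 0) {
        munmap(buf, len);
        return NULL;
    }
    return buf;
}
```

libibverbs users can get the same effect from `ibv_fork_init()`, which makes the library apply this advice to memory it registers.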
>
> You can choose to not define this as data corruption because the kernel
> is not directly involved, and that's your call.
>
>> Do you
>> mean corruption by changing a page already in writeout? If so,
>> don't all filesystems have that problem?
>>
>
> No, the problem is different. Backing devices requiring stable pages will
> block the write until the IO is complete. For those that do not require
> stable pages it's ok to allow the write as long as the page is dirtied so
> that it'll be written out again and no data is lost.

So if tux3 is prevented from forking the page in cases where the write would be
blocked, and otherwise gets the page forked again for follow-up writes if it's
modified again, won't this be the same thing?
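A toy sketch of the policy proposed here, with hypothetical names throughout (not tux3 code). A write to a page under writeback either waits, when the device needs stable pages, so no fork happens; or it forks the page, leaving the in-flight copy untouched and putting the new data in a dirty copy that will be written out later:

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define TOY_SIZE 16

/* Hypothetical cached page; not a kernel or tux3 structure. */
struct toy_fpage {
    char data[TOY_SIZE];
    bool dirty;
    bool under_writeback;
};

enum toy_result { TOY_OK, TOY_MUST_WAIT, TOY_NOMEM };

/* Apply a write of text to pg. On TOY_OK, *out is the page that now holds
 * the data: pg itself, or a fresh dirty fork if pg was under writeback. */
static enum toy_result toy_write(struct toy_fpage *pg, bool needs_stable,
                                 const char *text, struct toy_fpage **out)
{
    struct toy_fpage *target = pg;

    if (pg->under_writeback) {
        if (needs_stable)
            return TOY_MUST_WAIT;          /* stable pages: block, do not fork */
        target = malloc(sizeof *target);
        if (target == NULL)
            return TOY_NOMEM;
        memcpy(target, pg, sizeof *target); /* fork: IO keeps the old page */
        target->under_writeback = false;
    }
    strncpy(target->data, text, TOY_SIZE - 1);
    target->data[TOY_SIZE - 1] = '\0';
    target->dirty = true;                  /* new data is written out later */
    *out = target;
    return TOY_OK;
}
```

The sketch also shows the limit raised earlier in the thread: anything still holding a pointer to the old page keeps using it after the fork.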

David Lang

>> If RDMA to a mmapped file races with write(2) to the same file,
>> maybe it is reasonable and expected to lose some data.
>>
>
> In the RDMA case, there is at least application awareness to work around
> the problems. Normally it's ok to have both mapped and write() access
> to data although userspace might need a lock to co-ordinate updates and
> event ordering.
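On Linux, `write(2)` and a `MAP_SHARED` mapping of the same file go through the same page cache, so the mapping sees the new bytes; the lock is only there to order the updates. A minimal sketch for threads in one process, using a pthread mutex as that lock and hypothetical helper names, and assuming every path that updates the file (including stores through the mapping) takes the same lock. Older C libraries need `-pthread` at build time:

```c
#define _DEFAULT_SOURCE
#include <pthread.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* One mutex shared by every thread that updates or reads the file. */
static pthread_mutex_t file_lock = PTHREAD_MUTEX_INITIALIZER;

/* Create an unlinked temporary file of len bytes and map it MAP_SHARED.
 * Returns the fd and stores the mapping in *map, or returns -1. */
static int open_shared_mapping(size_t len, char **map)
{
    char path[] = "/tmp/toy-mapXXXXXX";
    int fd = mkstemp(path);

    if (fd < 0)
        return -1;
    unlink(path);
    if (ftruncate(fd, (off_t)len) != 0) {
        close(fd);
        return -1;
    }
    *map = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    if (*map == MAP_FAILED) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Update the file with pwrite(2) while holding the lock. */
static int locked_pwrite(int fd, const void *buf, size_t len, off_t off)
{
    ssize_t n;

    pthread_mutex_lock(&file_lock);
    n = pwrite(fd, buf, len, off);
    pthread_mutex_unlock(&file_lock);
    return n == (ssize_t)len ? 0 : -1;
}

/* Copy bytes out of the shared mapping under the same lock, so a reader
 * never runs concurrently with a locked update. */
static void locked_map_read(const char *map, size_t off, char *out, size_t len)
{
    pthread_mutex_lock(&file_lock);
    memcpy(out, map + off, len);
    pthread_mutex_unlock(&file_lock);
}
```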
>
>

