All of lore.kernel.org
 help / color / mirror / Atom feed
From: John Hubbard <jhubbard@nvidia.com>
To: Matthew Wilcox <willy@infradead.org>,
	Dan Williams <dan.j.williams@intel.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>,
	Christopher Lameter <cl@linux.com>,
	linux-rdma <linux-rdma@vger.kernel.org>,
	Linux MM <linux-mm@kvack.org>, Michal Hocko <mhocko@kernel.org>
Subject: Re: [LSFMM] RDMA data corruption potential during FS writeback
Date: Wed, 23 May 2018 16:03:25 -0700	[thread overview]
Message-ID: <414dceeb-1cd1-61e4-2e5b-31f06242aba3@nvidia.com> (raw)
In-Reply-To: <20180521143830.GA25109@bombadil.infradead.org>

On 05/21/2018 07:38 AM, Matthew Wilcox wrote:
> On Fri, May 18, 2018 at 08:51:38PM -0700, Dan Williams wrote:
-------------8<------------------------------------------------
> 
> Suggestion 1:
> 
> in get_user_pages_fast(), mark the page as dirty, but don't tag the radix
> tree entry as dirty.  Then vmscan() won't find it when it's looking to
> write out dirty pages.  Only mark it as dirty in the radix tree once we
> call set_page_dirty_lock().
> 
> Suggestion 2:
> 
> in get_user_pages_fast(), replace the page in the radix tree with a special
> entry that means "page under io".  In set_page_dirty_lock(), replace the
> "page under io" entry with the struct page pointer.

This second one feels a simpler to me. If no one sees huge problems with this,
I can put this together and try it out, because I have a few nicely reproducible
bugs that I can test this on.

But with either approach, a quick question first: will this do the right thing
for the other two use cases below?

    a) ftruncate

    b) deleting the inode and dropping all references to it (only the 
       get_user_pages reference remains)

...or is some other way to sneak in and try_to_free_buffers() on a 
page in this state?

Also, just to be sure I'm on the same page, is it accurate to claim that we
would then have the following updated guidelines for device drivers and
user space?

1. You can safely DMA to file-backed memory that you've pinned via
get_user_pages (with the usual caveats about getting the pages you think
you're getting), if you are careful to avoid truncating or deleting the 
file out from under get_user_pages.

In other words, this pattern is supported:

    get_user_pages (on file-backed memory from a persistent storage filesystem)
    ...do some DMA
    set_page_dirty_lock
    put_page

2. Furthermore, even if you are less careful, you still won't crash the kernel,
The worst that could happen is to corrupt your data, due to interrupting the
writeback.

The possibility of data corruption is bad, but it's also arguably both
self-inflicted and avoidable. Anyway, even so, it's an improvement: the bugs
I'm seeing would definitely get fixed with this.
    
> 
> Both of these suggestions have trouble with simultaneous sub-page IOs to the
> same page.  Do we care?  I suspect we might as pages get larger (see also:
> supporting THP pages in the page cache).
> 

I don't *think* we care. At least, no examples occur to me where this would
cause a problem.

thanks,
-- 
John Hubbard
NVIDIA

  reply	other threads:[~2018-05-23 23:04 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-05-18 14:37 [LSFMM] RDMA data corruption potential during FS writeback Christopher Lameter
2018-05-18 15:49 ` Jason Gunthorpe
2018-05-18 16:47   ` Christopher Lameter
2018-05-18 17:36     ` Jason Gunthorpe
2018-05-18 20:23       ` Dan Williams
2018-05-19  2:33         ` John Hubbard
2018-05-19  3:24           ` Jason Gunthorpe
2018-05-19  3:51             ` Dan Williams
2018-05-19  5:38               ` John Hubbard
2018-05-21 14:38               ` Matthew Wilcox
2018-05-23 23:03                 ` John Hubbard [this message]
2018-05-21 13:37             ` Christopher Lameter
2018-05-21 13:59           ` Christopher Lameter

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=414dceeb-1cd1-61e4-2e5b-31f06242aba3@nvidia.com \
    --to=jhubbard@nvidia.com \
    --cc=cl@linux.com \
    --cc=dan.j.williams@intel.com \
    --cc=jgg@ziepe.ca \
    --cc=linux-mm@kvack.org \
    --cc=linux-rdma@vger.kernel.org \
    --cc=mhocko@kernel.org \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.