From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <SRS0=0NEp=ZA=kvack.org=owner-linux-mm@kernel.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level: 
X-Spam-Status: No, score=-0.6 required=3.0 tests=FREEMAIL_FORGED_FROMDOMAIN,
	FREEMAIL_FROM,HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,SPF_HELO_NONE,
	SPF_PASS autolearn=no autolearn_force=no version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
	by smtp.lore.kernel.org (Postfix) with ESMTP id E05CAC5DF60
	for <linux-mm@archiver.kernel.org>; Fri,  8 Nov 2019 09:38:56 +0000 (UTC)
Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17])
	by mail.kernel.org (Postfix) with ESMTP id C0062214DA
	for <linux-mm@archiver.kernel.org>; Fri,  8 Nov 2019 09:38:55 +0000 (UTC)
DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org C0062214DA
Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=sina.com
Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org
Received: by kanga.kvack.org (Postfix)
	id 70B4D6B0010; Fri,  8 Nov 2019 04:38:55 -0500 (EST)
Received: by kanga.kvack.org (Postfix, from userid 40)
	id 6E1C76B0266; Fri,  8 Nov 2019 04:38:55 -0500 (EST)
X-Delivered-To: int-list-linux-mm@kvack.org
Received: by kanga.kvack.org (Postfix, from userid 63042)
	id 5F7EA6B0269; Fri,  8 Nov 2019 04:38:55 -0500 (EST)
X-Delivered-To: linux-mm@kvack.org
Received: from forelay.hostedemail.com (smtprelay0110.hostedemail.com [216.40.44.110])
	by kanga.kvack.org (Postfix) with ESMTP id 4B2B06B0010
	for <linux-mm@kvack.org>; Fri,  8 Nov 2019 04:38:55 -0500 (EST)
Received: from smtpin21.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251])
	by forelay04.hostedemail.com (Postfix) with SMTP id F32D3A8F9
	for <linux-mm@kvack.org>; Fri,  8 Nov 2019 09:38:54 +0000 (UTC)
X-FDA: 76132610988.21.hose97_87d6b1fcecd17
X-HE-Tag: hose97_87d6b1fcecd17
X-Filterd-Recvd-Size: 5246
Received: from mail3-167.sinamail.sina.com.cn (mail3-167.sinamail.sina.com.cn [202.108.3.167])
	by imf45.hostedemail.com (Postfix) with SMTP
	for <linux-mm@kvack.org>; Fri,  8 Nov 2019 09:38:53 +0000 (UTC)
Received: from unknown (HELO localhost.localdomain)([114.244.162.243])
	by sina.com with ESMTP
	id 5DC537A70000A857; Fri, 8 Nov 2019 17:38:49 +0800 (CST)
X-Sender: hdanton@sina.com
X-Auth-ID: hdanton@sina.com
X-SMAIL-MID: 714224629127
From: Hillf Danton <hdanton@sina.com>
To: Jerome Glisse <jglisse@redhat.com>
Cc: Hillf Danton <hdanton@sina.com>,
	John Hubbard <jhubbard@nvidia.com>,
	linux-mm <linux-mm@kvack.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-kernel <linux-kernel@vger.kernel.org>,
	Vlastimil Babka <vbabka@suse.cz>,
	Mel Gorman <mgorman@suse.de>,
	Dan Williams <dan.j.williams@intel.com>,
	Ira Weiny <ira.weiny@intel.com>,
	Christoph Hellwig <hch@lst.de>,
	Jonathan Corbet <corbet@lwn.net>
Subject: Re: [RFC] mm: gup: add helper page_try_gup_pin(page)
Date: Fri,  8 Nov 2019 17:38:37 +0800
Message-Id: <20191108093837.1696-1-hdanton@sina.com>
In-Reply-To: <20191107095017.17544-1-hdanton@sina.com>
References: <20191103112113.8256-1-hdanton@sina.com> <20191104043420.15648-1-hdanton@sina.com> <20191104102050.15988-1-hdanton@sina.com> <20191105042755.7292-1-hdanton@sina.com> <20191106092240.1712-1-hdanton@sina.com> <20191107095017.17544-1-hdanton@sina.com>
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4
Sender: owner-linux-mm@kvack.org
Precedence: bulk
X-Loop: owner-majordomo@kvack.org
List-ID: <linux-mm.kvack.org>


On Thu, 7 Nov 2019 09:57:48 -0500 Jerome Glisse wrote:
>
> I am not sure i follow ? Today we can not differentiate between GUP
> and regular get_page(), if you use some combination of specific fs
> and hardware you might get some BUG_ON() throws at you depending on
> how lucky/unlucky you are. We can not solve this without being able
> to differentiate between GUP and regular get_page(). Hence why John's
> patchset is the first step in the right direction.
>
What is the second one? And when? By whom?

> If there is no GUP on a page then regular writeback happens as it has
> for years now so in absence of GUP i do not see any issue.
>
>
> > > still something where there is no agreement as far as i remember the
> > > outcome of the last discussion we had. I expect this will be a topic
> > > at next LSF/MM or maybe something we can flush out before.
> >
> > These are the constraints we know
> >
> > A, multiple gup pins
> > B, mutual data corruptions
> > C, no break of existing use cases
> > D, zero copy
>
> ? What you mean by zero copy ?
>
Snippet that can be found at https://lwn.net/Articles/784574/

"get_user_pages() is a way to map user-space memory into the kernel's
address space; it will ensure that all of the requested pages have
been faulted into RAM (and locked there) and provide a kernel mapping
that, in turn, can be used for direct access by the kernel or (more
often) to set up zero-copy I/O operations."

> > E, feel free to add
> >
> > then what is preventing an agreement like bounce page?
>
> There are 2 sides (AFAIR):
>     - do not write back GUPed page and wait until GUP goes away to
>       write them. But GUP can last as long as the uptime and we can
>       lose data on power failure.
>     - use a bounce page so that there is a chance we have some data
>       on power failure
>
> >
> > Because page migrate and reclaim have been working for a while with
> > gup pin taken into account, detecting it has no priority in any form
> > over the agreement on how to make a writeback page stable.
>
> migrate just ignores GUPed pages and thus there is no issue with migrate.
> writeback is a special case here because some filesystems need a stable
> page content and also we need to inhibit some fs specific things that
> trigger BUG_ON() in set_page_dirty*()
>
Which drivers so far have been snared by the BUG_ON()? Is there any
chance to fix them one after another? Otherwise what is making them
special (long-lived pin)?

After setting page dirty, is there any pending DMA transfer to the
dirty page? If yes, what is the point of doing writeback for corrupted
data? If no, what is preventing the gup pin from being released?

> > What seems more important, restriction B above makes C hard to meet
> > in any feasible approach trying to keep a writeback page stable, and
> > zero-copy makes it harder AFAICS.
>
> writeback can use a bounce page; zero copy, i.e. not having to use a
> bounce page, is not an issue. In fact in some cases we already use
> bounce pages (at the block device level).
>
> >
> > > In any case my opinion is bounce page is the best thing we can do,
> > > from application and FS point of view it mimics the characteristics
> > > of regular write-back just as if the write protection window of the
> > > write-backed page was infinitely short.
> >
> > A 100-line patch tells more than a 200-line explanation can and helps
> > to shorten the discussion prior to reaching an agreement.
>
> It is not that trivial, you need to make sure every layer from fs down
> to block device driver behaves properly in front of a bounce page. We have
> such a mechanism for bio but it is at the bio level but maybe it can be
> dumped one level.
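
To make the bounce page idea concrete, below is a userspace toy model,
not kernel code. It assumes a page is just a byte array plus a pin flag,
and the names page_model, device_dma_write(), writeback_source() and
bounce_demo() are made up for illustration; none of them exist in mm/.
It only shows why a private copy gives writeback stable content while
DMA keeps scribbling on the pinned page, and says nothing about how the
fs and block layers would have to cooperate.

```c
#include <assert.h>
#include <string.h>

#define MODEL_PAGE_SIZE 4096

/* Hypothetical stand-in for a page that a device may still DMA into. */
struct page_model {
	unsigned char data[MODEL_PAGE_SIZE];
	int gup_pinned;	/* nonzero while a (long-lived) GUP pin exists */
};

/* Pretend the device overwrites the whole pinned page at any time. */
static void device_dma_write(struct page_model *page, unsigned char val)
{
	memset(page->data, val, MODEL_PAGE_SIZE);
}

/*
 * Pick the buffer that writeback should submit: the page itself when
 * nobody pins it, otherwise a private snapshot in the bounce buffer.
 */
static const unsigned char *writeback_source(const struct page_model *page,
					     unsigned char *bounce)
{
	if (!page->gup_pinned)
		return page->data;
	memcpy(bounce, page->data, MODEL_PAGE_SIZE);
	return bounce;
}

/* Returns 1 if writeback kept a stable copy despite a later DMA write. */
static int bounce_demo(void)
{
	static struct page_model page;
	static unsigned char bounce[MODEL_PAGE_SIZE];
	const unsigned char *src;

	page.gup_pinned = 1;
	device_dma_write(&page, 0xaa);
	src = writeback_source(&page, bounce);

	/* DMA keeps going after the snapshot was taken. */
	device_dma_write(&page, 0xbb);

	return src == bounce &&
	       src[0] == 0xaa && src[MODEL_PAGE_SIZE - 1] == 0xaa &&
	       page.data[0] == 0xbb;
}
```

Calling bounce_demo() from a main() is enough to try it. The cost in
this model is one memcpy() per pinned page under writeback, which is the
price paid for not having zero copy on that path.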