From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Tue, 28 Apr 2020 16:43:18 +1000
From: Dave Chinner
To: "Ruan, Shiyang"
Cc: Matthew Wilcox, linux-kernel@vger.kernel.org, linux-xfs@vger.kernel.org,
	linux-nvdimm@lists.01.org, linux-mm@kvack.org, linux-fsdevel@vger.kernel.org,
	darrick.wong@oracle.com, dan.j.williams@intel.com, hch@lst.de,
	rgoldwyn@suse.de, "Qi, Fuli", "Gotou, Yasunori"
Subject: Re: Reply: Re: [RFC PATCH 0/8] dax: Add a dax-rmap tree to support reflink
Message-ID: <20200428064318.GG2040@dread.disaster.area>
References: <20200427084750.136031-1-ruansy.fnst@cn.fujitsu.com>
	<20200427122836.GD29705@bombadil.infradead.org>
In-Reply-To: 
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
User-Agent: Mutt/1.10.1 (2018-07-13)
On Tue, Apr 28, 2020 at 06:09:47AM +0000, Ruan, Shiyang wrote:
> 
> On 2020/4/27 20:28:36, "Matthew Wilcox" wrote:
> 
> >On Mon, Apr 27, 2020 at 04:47:42PM +0800, Shiyang Ruan wrote:
> >> This patchset is an attempt to resolve the shared 'page cache'
> >> problem for fsdax.
> >>
> >> In order to track multiple mappings and indexes on one page, I
> >> introduced a dax-rmap rb-tree to manage the relationship.  A dax
> >> entry will be associated more than once if it is shared.  The second
> >> time we associate this entry, we create this rb-tree and store its
> >> root in page->private (not used in fsdax).  We insert (->mapping,
> >> ->index) in dax_associate_entry() and delete it in
> >> dax_disassociate_entry().
> >
> >Do we really want to track all of this on a per-page basis?  I would
> >have thought a per-extent basis was more useful.  Essentially, create
> >a new address_space for each shared extent.  Per page just seems like
> >a huge overhead.
> 
> Per-extent tracking sounds like a nice idea to me.  I hadn't thought of
> it yet...
> 
> But the extent info is maintained by the filesystem.  I think we need a
> way to obtain this info from the FS when associating a page.  It may be
> a bit complicated.  Let me think about it...

That's why I want the -user of this association- to do a filesystem
callout instead of keeping its own naive tracking infrastructure.
The filesystem can do an efficient, on-demand reverse mapping lookup
from its own extent tracking infrastructure, and there's zero runtime
overhead when there are no errors present.

At the moment, this "dax association" is used to "report" a storage
media error directly to userspace.  I say "report" because what it
actually does is kill userspace processes dead.  The storage media
error really needs to be reported to the owner of the storage media,
which in the case of FS-DAX is the filesystem.  The filesystem can
then look up all the owners of that bad media range (i.e. the
filesystem block it corresponds to) and take the appropriate action,
e.g.

- if it falls in filesystem metadata, shut down the filesystem;
- if it falls in user data, call the "kill userspace dead" routines
  for each mapping/index tuple the filesystem finds for the LBA range
  where the media error occurred.

Right now, if the media error is in filesystem metadata, the
filesystem isn't even told about it.  The filesystem can't even shut
down - the error is just dropped on the floor, and we won't notice
there is a problem until the filesystem next tries to reference that
metadata.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
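
As a rough illustration of the callout model described above (not code
from this patchset: every name here, such as dax_holder_ops,
demo_fs_notify_failure and the static rmap table, is invented for the
sketch, and a real implementation would be kernel code driven by the
filesystem's own rmap btree rather than a fixed table), a
self-contained userspace C mock-up might look like this:

	/*
	 * Sketch: the dax/pmem layer reports a bad device range to the
	 * filesystem via a callout; the filesystem resolves ownership on
	 * demand and either shuts down (metadata) or kills the owners of
	 * each (mapping, index) tuple (user data).
	 */
	#include <stdbool.h>
	#include <stddef.h>
	#include <stdint.h>
	#include <stdio.h>

	/* One owner record the filesystem's rmap lookup would return. */
	struct fs_rmap_owner {
		bool		is_metadata;	/* block belongs to fs metadata */
		const char	*mapping;	/* stand-in for struct address_space * */
		uint64_t	index;		/* page index within that mapping */
	};

	/* Hook the dax/pmem layer would call when it detects a media error. */
	struct dax_holder_ops {
		int (*notify_failure)(void *fs_private, uint64_t daddr, uint64_t len);
	};

	/* Toy "rmap btree": which owners reference which device blocks. */
	static const struct {
		uint64_t		start, len;
		struct fs_rmap_owner	owner;
	} rmap[] = {
		{   0, 64, { .is_metadata = true } },
		{ 128, 16, { false, "inode 42 mapping", 3 } },
		{ 128, 16, { false, "inode 99 mapping", 7 } },	/* reflinked copy */
	};

	static void kill_userspace_mapping(const struct fs_rmap_owner *o)
	{
		/* A kernel version would walk the mapping's VMAs and send SIGBUS. */
		printf("SIGBUS owners of %s, index %llu\n",
		       o->mapping, (unsigned long long)o->index);
	}

	static int demo_fs_notify_failure(void *fs_private, uint64_t daddr,
					  uint64_t len)
	{
		(void)fs_private;

		for (size_t i = 0; i < sizeof(rmap) / sizeof(rmap[0]); i++) {
			/* Skip extents that do not overlap the bad range. */
			if (daddr + len <= rmap[i].start ||
			    daddr >= rmap[i].start + rmap[i].len)
				continue;
			if (rmap[i].owner.is_metadata) {
				printf("media error in metadata: shutting down filesystem\n");
				return -1;
			}
			kill_userspace_mapping(&rmap[i].owner);
		}
		return 0;
	}

	static const struct dax_holder_ops demo_ops = {
		.notify_failure = demo_fs_notify_failure,
	};

	int main(void)
	{
		/* The device reports the bad range to the fs, not to userspace. */
		demo_ops.notify_failure(NULL, 130, 4);	/* reflinked user data */
		demo_ops.notify_failure(NULL, 10, 1);	/* filesystem metadata */
		return 0;
	}

In this model the device hands the bad range to the filesystem once,
the filesystem resolves it to owners on demand (including every sharer
of a reflinked block), and nothing is tracked per page while the
device is healthy.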