Date: Fri, 5 Jun 2020 11:30:23 +1000
From: Dave Chinner
To: "Darrick J. Wong"
Cc: Ruan Shiyang, Matthew Wilcox, "linux-kernel@vger.kernel.org",
    "linux-xfs@vger.kernel.org", "linux-nvdimm@lists.01.org",
    "linux-mm@kvack.org", "linux-fsdevel@vger.kernel.org",
    "dan.j.williams@intel.com", "hch@lst.de", "rgoldwyn@suse.de",
    "Qi, Fuli", "Gotou, Yasunori"
Subject: Reply: Re: [RFC PATCH 0/8] dax: Add a dax-rmap tree to support reflink
Message-ID: <20200605013023.GZ2040@dread.disaster.area>
References: <20200427084750.136031-1-ruansy.fnst@cn.fujitsu.com>
    <20200427122836.GD29705@bombadil.infradead.org>
    <20200428064318.GG2040@dread.disaster.area>
    <153e13e6-8685-fb0d-6bd3-bb553c06bf51@cn.fujitsu.com>
    <20200604145107.GA1334206@magnolia>
In-Reply-To: <20200604145107.GA1334206@magnolia>
On Thu, Jun 04, 2020 at 07:51:07AM -0700, Darrick J. Wong wrote:
> On Thu, Jun 04, 2020 at 03:37:42PM +0800, Ruan Shiyang wrote:
> >
> >
> > On 2020/4/28 2:43 PM, Dave Chinner wrote:
> > > On Tue, Apr 28, 2020 at 06:09:47AM +0000, Ruan, Shiyang wrote:
> > > >
> > > > On 2020/4/27 20:28:36, "Matthew Wilcox" wrote:
> > > >
> > > > > On Mon, Apr 27, 2020 at 04:47:42PM +0800, Shiyang Ruan wrote:
> > > > > > This patchset is an attempt to resolve the shared 'page cache'
> > > > > > problem for fsdax.
> > > > > >
> > > > > > In order to track multiple mappings and indexes on one page, I
> > > > > > introduced a dax-rmap rb-tree to manage the relationship. A dax entry
> > > > > > will be associated more than once if it is shared. The second time we
> > > > > > associate this entry, we create this rb-tree and store its root in
> > > > > > page->private (not used in fsdax). We insert (->mapping, ->index) in
> > > > > > dax_associate_entry() and delete it in dax_disassociate_entry().
> > > > >
> > > > > Do we really want to track all of this on a per-page basis? I would
> > > > > have thought a per-extent basis was more useful. Essentially, create
> > > > > a new address_space for each shared extent. Per page just seems like
> > > > > a huge overhead.
> > > > >
> > > > Per-extent tracking is a nice idea to me. I haven't thought of it
> > > > yet...
> > > >
> > > > But the extent info is maintained by the filesystem. I think we need a
> > > > way to obtain this info from the FS when associating a page. It may be
> > > > a bit complicated. Let me think about it...
> > >
> > > That's why I want the -user of this association- to do a filesystem
> > > callout instead of keeping its own naive tracking infrastructure.
> > > The filesystem can do an efficient, on-demand reverse mapping lookup
> > > from its own extent tracking infrastructure, and there's zero
> > > runtime overhead when there are no errors present.
> >
> > Hi Dave,
> >
> > I ran into some difficulties when trying to implement the per-extent rmap
> > tracking. So I re-read your comments and found that I had misunderstood
> > what you described here.
> >
> > I think what you mean is: we don't need the in-memory dax-rmap tracking
> > now. Just ask the FS for the owner information associated with a page
> > when a memory failure occurs. So the per-page (or even per-extent)
> > dax-rmap is unnecessary in this case. Is this right?
>
> Right. XFS already has its own rmap tree.

*nod*

> > Based on this, we only need to store the extent information of a fsdax
> > page in its ->mapping (by searching the FS). Then we obtain the owners of
> > this page (also by searching the FS) when a memory failure or other rmap
> > case occurs.
>
> I don't even think you need that much. All you need is the "physical"
> offset of that page within the pmem device (e.g. 'this is the 307th 4k
> page == offset 1257472 since the start of /dev/pmem0') and xfs can look
> up the owner of that range of physical storage and deal with it as
> needed.

Right. If we have the dax device associated with the page that had
the failure, then we can determine the offset of the page into the
block device address space and that's all we need to find the owner
of the page in the filesystem.

Note that there may actually be no owner - the page that had the
fault might land in free space, in which case we can simply zero the
page and clear the error.

> > So a fsdax page is no longer associated with a specific file, but with
> > an FS (or the pmem device). I think it's easier to understand and
> > implement.

Effectively, yes.
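A minimal user-space sketch of the lookup described above, assuming 4 KiB pages: convert a page index into a byte offset within the device, then ask a reverse-mapping table which files own that offset. The `rmap_rec` layout and all function names here are hypothetical illustrations, not XFS code or a kernel API. Shared (reflinked) extents show up as several owners, and zero owners is the free-space case where the page can be zeroed and the error cleared.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12 /* assumption: 4 KiB pages, as in the example above */

/* Hypothetical reverse-mapping record: one extent of physical storage on
 * the pmem device and the file range that owns it. A simplified model,
 * not the on-disk XFS rmap btree format. */
struct rmap_rec {
	uint64_t phys_start;  /* byte offset from the start of the device */
	uint64_t len;         /* extent length in bytes */
	uint64_t owner_ino;   /* owning inode number */
	uint64_t file_offset; /* byte offset of the extent within that file */
};

/* Byte offset of a page within the device: page index times page size. */
static uint64_t page_index_to_offset(uint64_t page_index)
{
	return page_index << SKETCH_PAGE_SHIFT;
}

/*
 * Find every record covering @offset. With reflink, one physical block can
 * be shared by several files, so there may be several owners. Up to
 * @out_max matches are stored in @out; the return value is the total number
 * found, and 0 means the offset lies in free space.
 */
static size_t rmap_find_owners(const struct rmap_rec *recs, size_t nr,
			       uint64_t offset,
			       const struct rmap_rec **out, size_t out_max)
{
	size_t found = 0;

	for (size_t i = 0; i < nr; i++) {
		/* Subtraction form avoids overflow in phys_start + len. */
		if (offset >= recs[i].phys_start &&
		    offset - recs[i].phys_start < recs[i].len) {
			if (found < out_max)
				out[found] = &recs[i];
			found++;
		}
	}
	return found;
}

/* File offset that corresponds to device @offset inside an owning record. */
static uint64_t rmap_file_offset(const struct rmap_rec *rec, uint64_t offset)
{
	return rec->file_offset + (offset - rec->phys_start);
}
```

A linear scan keeps the sketch short; a filesystem would answer the same query from an indexed structure such as the XFS rmap btree rather than walking every record.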
But we shouldn't need to actually associate the page with anything
at the filesystem level because it is already associated with a DAX
device at a lower level via a dev_pagemap.

The hardware page fault already runs through
memory_failure_dev_pagemap() before it gets to the DAX code, so
really all we need to do is have that function pass us the page, the
offset into the device and, say, the struct dax_device associated
with that page so we can get to the filesystem superblock we can
then use for rmap lookups on...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
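The notification path proposed above can be modelled in user space roughly as follows. This is a sketch under stated assumptions, not kernel code: `dax_device_model`, `dax_holder_ops_model`, `model_memory_failure()` and `fs_notify_failure()` are hypothetical names, and the model assumes the dax device knows its first pfn, its size in pages, and which filesystem superblock holds it. The failure handler only translates the failing pfn into a device offset and calls back into the holder; all ownership knowledge stays in the filesystem.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_PAGE_SHIFT 12 /* assumption: 4 KiB pages */

/* Hypothetical stand-in for a filesystem superblock. */
struct super_block_model {
	const char *fs_name;
	int errors_handled;   /* how many failures were reported to us */
	uint64_t last_offset; /* device offset of the most recent failure */
};

/* Hypothetical callback table the holder (filesystem) registers. */
struct dax_holder_ops_model {
	/* Called with the byte range of the failed memory within the device. */
	int (*notify_failure)(void *holder, uint64_t offset, uint64_t len);
};

/* Hypothetical dax device: knows its pfn range and who holds it. */
struct dax_device_model {
	uint64_t start_pfn; /* first pfn covered by the device's dev_pagemap */
	uint64_t nr_pages;  /* number of pages in the device */
	void *holder;       /* e.g. the superblock mounted on this device */
	const struct dax_holder_ops_model *ops;
};

/* Models what the memory failure path would do under the proposal:
 * turn the failing pfn into a device offset and tell the holder. */
static int model_memory_failure(struct dax_device_model *dax, uint64_t pfn)
{
	if (pfn < dax->start_pfn || pfn - dax->start_pfn >= dax->nr_pages)
		return -ENXIO; /* pfn does not belong to this device */
	if (!dax->holder || !dax->ops || !dax->ops->notify_failure)
		return -EOPNOTSUPP; /* nobody to notify: generic handling */

	uint64_t offset = (pfn - dax->start_pfn) << MODEL_PAGE_SHIFT;
	return dax->ops->notify_failure(dax->holder, offset,
					1ULL << MODEL_PAGE_SHIFT);
}

/* Filesystem side: a real filesystem would run its rmap lookup here. */
static int fs_notify_failure(void *holder, uint64_t offset, uint64_t len)
{
	struct super_block_model *sb = holder;

	(void)len;
	sb->last_offset = offset;
	sb->errors_handled++;
	return 0;
}

static const struct dax_holder_ops_model fs_holder_ops = {
	.notify_failure = fs_notify_failure,
};
```

Keeping only a holder pointer and a callback at the device level matches the zero runtime overhead mentioned earlier in the thread: nothing is tracked per page, and the filesystem is consulted only when a failure actually occurs.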