From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-6.7 required=3.0 tests=DKIM_SIGNED,DKIM_VALID, DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH,MAILING_LIST_MULTI,SIGNED_OFF_BY, SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id BB29BC74A35 for ; Wed, 10 Jul 2019 21:50:52 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 93958214AF for ; Wed, 10 Jul 2019 21:50:52 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="dIVcKYak" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727680AbfGJVuv (ORCPT ); Wed, 10 Jul 2019 17:50:51 -0400 Received: from mail-io1-f68.google.com ([209.85.166.68]:37018 "EHLO mail-io1-f68.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727625AbfGJVuv (ORCPT ); Wed, 10 Jul 2019 17:50:51 -0400 Received: by mail-io1-f68.google.com with SMTP id q22so8086656iog.4; Wed, 10 Jul 2019 14:50:50 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc:content-transfer-encoding; bh=szjTCXs/+qzPjQoJIlXrmJ5QRPtMUoSrM9cPDUs6dgg=; b=dIVcKYaknlt37uO6ESWKfchDa+WB9+B5g+y3PeiUwxUyVLmkiVYp7v9ARUuAw2Aufo qGkvuaajAEo9I2w3whE7ewAT903bK7+tGpe6rM1uH/dblZd38Kbrk49gH/IdMkurUTRH 1D+MBld/AiyH02qS4bit48NM4gq7fpia4ATUjr3PN2It5TORTrmoNFIOFCnZcHcPLijF gnGSAJot0Ytfr1kUIAeV0yMTCIMnBkOqE2hkdXhb6RM0Tqy3+9wYqHvzBqCzfbq7Q8QG V/27WRLaSAGyOfYo0qFCPMIkBs5G7IEB+UfDqJo4bSEsfwbScXtZuaQLgiwrvg6DJFrp Vj7g== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc:content-transfer-encoding; bh=szjTCXs/+qzPjQoJIlXrmJ5QRPtMUoSrM9cPDUs6dgg=; b=I6eXmfA9sFaPxRZA3+FBNcfiwMSmfCPuD6vRwvbm7xTGiSj4VqcKu+gAN7D3QHwqFk T3b+8dqP3y9DgkCoCTbO31cW6H2FOorXeqcOEtYQzozZDeGMT3JDSXT3NogOtECAjR+V HBAav4ECHoFgX75SHl5U+5KH2Vu9Gy4voWwKoNiOhnyyu0g6pwJAXA5d1TW2Hz45fOrK zQcmJ+lWXspgidSir4QelSDNb1hgYg8ferc8ccAJUPYpeqmcrAJa1pFlBD+CTb5KlJYd iWPfuTGWa4PrnDKxYXYfX62Vv8WbGFu31x5HJ4ImnTSfsnnWdImUJlsM3JptiEFzjUbE Mp4w== X-Gm-Message-State: APjAAAUIYMiPBjLJbisJQtezp/svGwujOvcgK91JfSu0LCCgmUBz8eqM 37BLU87K8cJqH6DFFL+Nsgo1SW7CggnVG2ObBYYyolzE X-Google-Smtp-Source: APXvYqywSqqa4SHlujvNVxfUYsAaEqPdPXoF6aCEi5Pmeih57jtNFLzWD6Y8hjCf/hTm6N2bo/WYmWoiS6J9eljuGtQ= X-Received: by 2002:a5e:9e42:: with SMTP id j2mr360059ioq.133.1562795450420; Wed, 10 Jul 2019 14:50:50 -0700 (PDT) MIME-Version: 1.0 References: <20190703075502.79782-1-yuchao0@huawei.com> <39944e50-5888-f900-1954-91be2b12ea5b@huawei.com> In-Reply-To: <39944e50-5888-f900-1954-91be2b12ea5b@huawei.com> From: =?UTF-8?Q?Andreas_Gr=C3=BCnbacher?= Date: Wed, 10 Jul 2019 23:50:39 +0200 Message-ID: Subject: Re: [RFC PATCH] iomap: generalize IOMAP_INLINE to cover tail-packing case To: Chao Yu Cc: Christoph Hellwig , "Darrick J. Wong" , linux-xfs@vger.kernel.org, Linux FS-devel Mailing List , Gao Xiang , chao@kernel.org Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable Sender: linux-fsdevel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org Am Mi., 10. Juli 2019 um 12:31 Uhr schrieb Chao Yu : > Hi Andreas, > > Thanks for your review in advance. :) > > On 2019/7/10 7:32, Andreas Gr=C3=BCnbacher wrote: > > Hi Chao, > > > > Am Mi., 3. Juli 2019 um 09:55 Uhr schrieb Chao Yu : > >> Some filesystems like erofs/reiserfs have the ability to pack tail > >> data into metadata, e.g.: > >> IOMAP_MAPPED [0, 8192] > >> IOMAP_INLINE [8192, 8200] > >> > >> However current IOMAP_INLINE type has assumption that: > >> - inline data should be locating at page #0. > >> - inline size should equal to .i_size > >> Those restriction fail to convert to use iomap IOMAP_INLINE in erofs, > >> so this patch tries to relieve above limits to make IOMAP_INLINE more > >> generic to cover tail-packing case. > > > > this patch suffers from the following problems: > > > > * Fiemap distinguishes between FIEMAP_EXTENT_DATA_INLINE and > > FIEMAP_EXTENT_DATA_TAIL. Someone will sooner or later inevitably use > > iomap_fiemap on a filesystem with tail packing, so if we don't make > > the same distinction in iomap, iomap_fiemap will silently produce the > > wrong result. This means that IOMAP_TAIL should become a separate > > mapping type. > > I'm a little confused, IIUC, IOMAP_TAIL and FIEMAP_EXTENT_DATA_TAIL are w= ith > different semantics: > > - IOMAP_TAIL: > we may introduce this flag to cover tail-packing case, in where we merg= e > small-sized data in the tail of file into free space of its metadata. > - FIEMAP_EXTENT_DATA_TAIL: > quoted from Documentation/filesystems/fiemap.txt > " This will also set FIEMAP_EXTENT_NOT_ALIGNED > Data is packed into a block with data from other files." > It looks like this flag indicates that blocks from different files stores= into > one common block. Well, maybe FIEMAP_EXTENT_DATA_INLINE is indeed the more appropriate flag in your scenario. In that case, we should be fine on the fiemap front. > I'm not familiar with fiemap flags' history, but maybe FIEMAP_EXTENT_DATA= _TAIL > should be better to rename to FIEMAP_EXTENT_DATA_PACKING to avoid ambigui= ty. And > then FIEMAP_EXTENT_DATA_INLINE will be enough to cover normal inline case= and > tail-packing case? Those definitions are user-visible. > > * IOMAP_INLINE mappings always start at offset 0 and span an entire > > page, so they are always page aligned. IOMAP_TAIL mappings only need > > Why can't #0 page consist of more than one block/mapping? I didn't get wh= at's > difference here. Currently, iomap_write_begin will read the inline data into page #0 and mark that page up-to-date. There's a check for that in iomap_write_end_inline. If a page contains more than one mapping, we won't be able to mark the entire page up to date anymore; we'd need to track which parts of the page are up to date and which are not. Iomap supports two tracking mechanisms, buffer heads and iomap_page, and we'd need to implement that tracking code for each of those cases. > > to be block aligned. If the block size is smaller than the page size, > > - reiserfs tries to find last page's content, if the size of that page's = valid > content is smaller than threshold (at least less than one block), reiserf= s can > do the packing. > > - erofs' block size is 4kb which is the same as page size. > > > Actually, for IOMAP_TAIL, there is no users who need to map more than one= block > into metadata, so I'm not sure that we should support that, or we just ne= ed to > let code preparing for that to those potential users? How about architectures with PAGE_SIZE > 4096? I'm assuming that erofs doesn't require block size =3D=3D PAGE_SIZE? > Thanks, > > > a tail page can consist of more than one mapping. So > > iomap_read_inline_data needs to be changed to only copy a single block > > out of iomap->inline_data, and we need to adjust iomap_write_begin and > > iomap_readpage_actor accordingly. > > > > * Functions iomap_read_inline_data, iomap_write_end_inline, and > > iomap_dio_inline_actor currently all assume that we are operating on > > page 0, and that iomap->inline_data points at the data at offset 0. > > That's no longer the case for IOMAP_TAIL mappings. We need to look > > only at the offset within the page / block there. > > > > * There are some asserts like the following int he code: > > > > BUG_ON(size > PAGE_SIZE - offset_in_page(iomap->inline_data)); > > > > Those should probably be tightened to check for block boundaries > > instead of page boundaries, i.e. something like: > > > > BUG_ON(size > i_blocksize(inode) - > > offset_in_block(iomap->inline_data, inode)); > > > >> Signed-off-by: Chao Yu > >> --- > >> fs/iomap.c | 4 +--- > >> 1 file changed, 1 insertion(+), 3 deletions(-) > >> > >> diff --git a/fs/iomap.c b/fs/iomap.c > >> index 12654c2e78f8..d1c16b692d31 100644 > >> --- a/fs/iomap.c > >> +++ b/fs/iomap.c > >> @@ -264,13 +264,12 @@ static void > >> iomap_read_inline_data(struct inode *inode, struct page *page, > >> struct iomap *iomap) > >> { > >> - size_t size =3D i_size_read(inode); > >> + size_t size =3D iomap->length; > > > > Function iomap_read_inline_data fills the entire page here, not only > > the iomap->length part, so this is wrong. But see my comment above > > about changing iomap_read_inline_data to read a single block above as > > well. > > > >> void *addr; > >> > >> if (PageUptodate(page)) > >> return; > >> > >> - BUG_ON(page->index); > >> BUG_ON(size > PAGE_SIZE - offset_in_page(iomap->inline_data)); > >> > >> addr =3D kmap_atomic(page); > >> @@ -293,7 +292,6 @@ iomap_readpage_actor(struct inode *inode, loff_t p= os, loff_t length, void *data, > >> sector_t sector; > >> > >> if (iomap->type =3D=3D IOMAP_INLINE) { > >> - WARN_ON_ONCE(pos); > >> iomap_read_inline_data(inode, page, iomap); > >> return PAGE_SIZE; > >> } > > > > Those last two changes look right to me. > > > > Thanks, > > Andreas > > . > > At this point, can I ask how important this packing mechanism is to you? I can see a point in implementing inline files, which help because there tends to be a large number of very small files. But for not-so-small files, is saving an extra block really worth the trouble, especially given how cheap storage has become? Thanks, Andreas