From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.0 required=3.0 tests=BAYES_00,DKIM_INVALID, DKIM_SIGNED,HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,SPF_HELO_NONE, SPF_PASS,URIBL_BLOCKED,USER_AGENT_SANE_1 autolearn=no autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id B5404C4346A for ; Mon, 21 Sep 2020 16:20:55 +0000 (UTC) Received: from ml01.01.org (ml01.01.org [198.145.21.10]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 38BBA222BB for ; Mon, 21 Sep 2020 16:20:55 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=fail reason="signature verification failed" (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="XGcgXnxE" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 38BBA222BB Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=redhat.com Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=linux-nvdimm-bounces@lists.01.org Received: from ml01.vlan13.01.org (localhost [IPv6:::1]) by ml01.01.org (Postfix) with ESMTP id AD0EF145940E9; Mon, 21 Sep 2020 09:20:54 -0700 (PDT) Received-SPF: Pass (mailfrom) identity=mailfrom; client-ip=207.211.31.120; helo=us-smtp-1.mimecast.com; envelope-from=mpatocka@redhat.com; receiver= Received: from us-smtp-1.mimecast.com (us-smtp-delivery-1.mimecast.com [207.211.31.120]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-SHA384 (256/256 bits)) (No client certificate requested) by ml01.01.org (Postfix) with ESMTPS id 5BF46145940E6 for ; Mon, 21 Sep 2020 09:20:51 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1600705250; 
h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=jQWTc66kqIVZ12N6Cbr6RkCq5AT8IRw3RQfZVWbTbrI=; b=XGcgXnxEjLG2twMNuMLbR23HEyJGYVMPfpCp+7sgv7/EViAfHiXfVN9vbJwIueWxpZdFJy DNA3XCeByF0o8lODIdF92J4qcRiYvmieiQWjPPaAwCTVyQ9w3j4pSCtZntoHD6Y/eyyaPY 84E2c3vqHbOL5XIROi/rs6w/yaInxXs= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-515-cblGsGTfOVqBQ7_yEP5lXA-1; Mon, 21 Sep 2020 12:20:46 -0400 X-MC-Unique: cblGsGTfOVqBQ7_yEP5lXA-1 Received: from smtp.corp.redhat.com (int-mx07.intmail.prod.int.phx2.redhat.com [10.5.11.22]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id D80471005E5B; Mon, 21 Sep 2020 16:20:43 +0000 (UTC) Received: from file01.intranet.prod.int.rdu2.redhat.com (file01.intranet.prod.int.rdu2.redhat.com [10.11.5.7]) by smtp.corp.redhat.com (Postfix) with ESMTPS id A19A010013D0; Mon, 21 Sep 2020 16:20:43 +0000 (UTC) Received: from file01.intranet.prod.int.rdu2.redhat.com (localhost [127.0.0.1]) by file01.intranet.prod.int.rdu2.redhat.com (8.14.4/8.14.4) with ESMTP id 08LGKhfl006434; Mon, 21 Sep 2020 12:20:43 -0400 Received: from localhost (mpatocka@localhost) by file01.intranet.prod.int.rdu2.redhat.com (8.14.4/8.14.4/Submit) with ESMTP id 08LGKgrb006430; Mon, 21 Sep 2020 12:20:42 -0400 X-Authentication-Warning: file01.intranet.prod.int.rdu2.redhat.com: mpatocka owned process doing -bs Date: Mon, 21 Sep 2020 12:20:42 -0400 (EDT) From: Mikulas Patocka X-X-Sender: mpatocka@file01.intranet.prod.int.rdu2.redhat.com To: Dan Williams Subject: NVFS XFS metadata (was: [PATCH] pmem: export the symbols __copy_user_flushcache and __copy_from_user_flushcache) In-Reply-To: Message-ID: References: User-Agent: Alpine 2.02 (LRH 1266 2009-07-14) MIME-Version: 1.0 
X-Scanned-By: MIMEDefang 2.84 on 10.5.11.22
Message-ID-Hash: WCT2XZSJQWLJCBJEPSB6GLNTUEHWP5WE
X-Message-ID-Hash: WCT2XZSJQWLJCBJEPSB6GLNTUEHWP5WE
X-MailFrom: mpatocka@redhat.com
X-Mailman-Rule-Hits: nonmember-moderation
X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation
CC: Linus Torvalds, Alexander Viro, Andrew Morton, Matthew Wilcox, Jan Kara, Eric Sandeen, Dave Chinner, "Tadakamadla, Rajesh (DCIG/CDI/HPS Perf)", Linux Kernel Mailing List, linux-fsdevel, linux-nvdimm
X-Mailman-Version: 3.1.1
Precedence: list
List-Id: "Linux-nvdimm developer list."
Content-Type: TEXT/PLAIN; charset="us-ascii"
Content-Transfer-Encoding: 7bit

On Wed, 16 Sep 2020, Mikulas Patocka wrote:

>
>
> On Wed, 16 Sep 2020, Dan Williams wrote:
>
> > On Wed, Sep 16, 2020 at 10:24 AM Mikulas Patocka wrote:
> > >
> > > > My first question about nvfs is how it compares to a daxfs with
> > > > executables and other binaries configured to use page cache with the
> > > > new per-file dax facility?
> > >
> > > nvfs is faster than dax-based filesystems on metadata-heavy operations
> > > because it doesn't have the overhead of the buffer cache and bios. See
> > > this: http://people.redhat.com/~mpatocka/nvfs/BENCHMARKS
> >
> > ...and that metadata problem is intractable upstream? Christoph poked
> > at bypassing the block layer for xfs metadata operations [1], I just
> > have not had time to carry that further.
> >
> > [1]: "xfs: use dax_direct_access for log writes", although it seems
> > he's dropped that branch from his xfs.git
>
> XFS is very big. I wanted to create something small.

Another difference is that XFS metadata is optimized for disks and SSDs.
On disks and SSDs, reading one byte costs as much as reading a full block,
so we must pack as much information into each block as possible.
XFS uses b+trees for file block mapping and for directories - that is a
reasonable decision there, because b+trees minimize the number of disk
accesses.

On persistent memory, each access has its own cost, so NVFS uses metadata
structures that minimize the number of cache lines accessed (rather than
the number of blocks accessed). For block mapping, NVFS uses the classic
Unix direct/indirect blocks - if a file block is mapped by a third-level
indirect block, we do just three memory accesses and we are done. If we
used b+trees, the number of accesses would be much larger than three (we
would have to do a binary search in each b+tree node we visit).

The same goes for directories - NVFS hashes the file name and uses a radix
tree to locate the directory page that holds the directory entry. XFS
b+trees would result in many more accesses than the radix tree.

Regarding journaling - NVFS doesn't do it, because persistent memory is so
fast that we can just check the filesystem after a crash. NVFS has a
multithreaded fsck that can process 3 million inodes per second. XFS does
journaling (a reasonable decision for disks, where fsck took hours), and
that adds overhead to all filesystem operations.
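
To make the block-mapping argument above concrete, here is a minimal sketch
in C. It is not NVFS code: the number of direct pointers (N_DIRECT), the
number of pointers per indirect block (PTRS_PER_BLOCK) and both function
names are assumptions chosen only for illustration. The first function
returns how many indirect blocks must be followed for a given file block;
the second returns the worst-case number of key probes for one binary
search in a sorted node, which is the per-node cost the b+tree comparison
refers to.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout parameters, NOT taken from NVFS. */
#define N_DIRECT        16u   /* direct block pointers in the inode */
#define PTRS_PER_BLOCK  512u  /* e.g. 4 KiB block / 8-byte pointers */
#define MAX_DEPTH       3     /* single, double, triple indirect    */

/*
 * Classic Unix direct/indirect mapping.
 * Returns the number of indirect blocks to follow (0 for a direct
 * pointer, 1..3 for indirect levels) and stores in idx[0..depth-1] the
 * slot to read at each level, top level first. Returns -1 if the file
 * block is beyond the triple-indirect range.
 */
static int map_file_block(uint64_t file_block, uint32_t idx[MAX_DEPTH])
{
    assert(idx != NULL);

    if (file_block < N_DIRECT)
        return 0;
    file_block -= N_DIRECT;

    uint64_t span = PTRS_PER_BLOCK;   /* blocks covered at this depth */
    for (int depth = 1; depth <= MAX_DEPTH; depth++) {
        if (file_block < span) {
            /* Split the offset into one slot index per level. */
            for (int level = depth - 1; level >= 0; level--) {
                idx[level] = (uint32_t)(file_block % PTRS_PER_BLOCK);
                file_block /= PTRS_PER_BLOCK;
            }
            return depth;
        }
        file_block -= span;
        span *= PTRS_PER_BLOCK;
    }
    return -1;
}

/* Worst-case key probes for one binary search over nkeys sorted keys. */
static unsigned binary_search_probes(unsigned nkeys)
{
    unsigned probes = 0;
    while (nkeys > 0) {
        probes++;
        nkeys /= 2;
    }
    return probes;
}
```

With these assumed constants, any block in the third-level range costs
three pointer loads, while a single worst-case binary search in one
255-key node already costs 8 probes, before descending to the next node.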
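
The directory lookup described above (hash the name, then walk a radix
tree to the directory page) can be sketched the same way. Again, this is
not NVFS code: the hash function (FNV-1a is used here as a stand-in), the
fan-out, the tree depth and all names are assumptions. The point is only
that each level costs one slot read selected directly by hash bits, with
no key comparisons inside a node.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical radix-tree shape, NOT taken from NVFS. */
#define RADIX_BITS    4u
#define RADIX_FANOUT  (1u << RADIX_BITS)
#define RADIX_DEPTH   2u

struct radix_node {
    /* Child node, or the directory page at the last level. */
    void *slot[RADIX_FANOUT];
};

/* 32-bit FNV-1a, used here only as a stand-in for the real name hash. */
static uint32_t name_hash(const char *name)
{
    uint32_t h = 2166136261u;
    for (; *name != '\0'; name++) {
        h ^= (uint8_t)*name;
        h *= 16777619u;
    }
    return h;
}

/*
 * Walk a fixed-depth radix tree keyed by the low
 * RADIX_DEPTH * RADIX_BITS bits of the hash, most significant group
 * first. Returns the directory page or NULL, and reports the number of
 * slot reads performed in *accesses.
 */
static void *radix_lookup(const struct radix_node *root, uint32_t hash,
                          unsigned *accesses)
{
    const struct radix_node *node = root;
    void *p = NULL;

    assert(root != NULL && accesses != NULL);
    *accesses = 0;
    for (unsigned level = 0; level < RADIX_DEPTH; level++) {
        unsigned shift = (RADIX_DEPTH - 1 - level) * RADIX_BITS;
        p = node->slot[(hash >> shift) & (RADIX_FANOUT - 1)];
        (*accesses)++;
        if (p == NULL)
            return NULL;      /* no directory page for this hash */
        node = p;             /* descend; not dereferenced after last level */
    }
    return p;
}
```

A lookup that succeeds performs exactly RADIX_DEPTH slot reads, and a miss
stops at the first empty slot. The entry still has to be found inside the
returned directory page, which this sketch does not model.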

Mikulas