From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from casper.infradead.org (casper.infradead.org [90.155.50.34])
    (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
    (No client certificate requested)
    by smtp.subspace.kernel.org (Postfix) with ESMTPS id E1C4E2C99
    for ; Fri, 5 Nov 2021 03:32:46 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed;
    d=infradead.org; s=casper.20170209; h=In-Reply-To:Content-Type:MIME-Version:
    References:Message-ID:Subject:Cc:To:From:Date:Sender:Reply-To:
    Content-Transfer-Encoding:Content-ID:Content-Description;
    bh=vYTj4+Qj7wwu4cVg0DWamUnt0bl+yIq5bTcfsNjRRqw=; b=GDZA848HxP31gXPJrbJb1hsEqw
    0f4xGgqaPeTYr/jnGmCuCwIDm9/vBUN70m1anN1+xf4jYdLKo8bnkpv98Tz/4LSFfutV+5RUbsHcn
    xvpui7RzzIwOqB9/ri4wv9GDDtkkShT1u8edH0UBE2oBmmJEo1CNw0G0wxCGXXKlubAjOM2Ro/Ehs
    bQ0RTb1mP+goFvBzd5NdSO+UkHqegStEbVRdq3ZH+RO1ssVK/8iJEk/+Yic/9NxIG4O8ZIrUL3aQU
    GV3eG6t1cGKVIkDmU4bZ62AO0lzIYsinGEt5zLBc3KGEkXCrJ55GrQMAElwFrvICgT/tM4n5HS7z3
    Ew1FygLQ==;
Received: from willy by casper.infradead.org with local (Exim 4.94.2 #2 (Red Hat Linux))
    id 1mipvv-006JD0-HF; Fri, 05 Nov 2021 03:30:39 +0000
Date: Fri, 5 Nov 2021 03:30:27 +0000
From: Matthew Wilcox
To: Theodore Ts'o
Cc: "Darrick J. Wong", Dan Williams, Christoph Hellwig, Eric Sandeen,
    Mike Snitzer, Ira Weiny, device-mapper development, linux-xfs,
    Linux NVDIMM, linux-s390, linux-fsdevel, linux-erofs@lists.ozlabs.org,
    linux-ext4, virtualization@lists.linux-foundation.org
Subject: Re: futher decouple DAX from block devices
Message-ID:
References: <20211018044054.1779424-1-hch@lst.de>
    <21ff4333-e567-2819-3ae0-6a2e83ec7ce6@sandeen.net>
    <20211104081740.GA23111@lst.de>
    <20211104173417.GJ2237511@magnolia>
    <20211104173559.GB31740@lst.de>
    <20211104190443.GK24333@magnolia>
Precedence: bulk
X-Mailing-List: nvdimm@lists.linux.dev
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To:

On Thu, Nov 04, 2021 at 11:09:19PM -0400, Theodore Ts'o wrote:
> On Thu, Nov 04, 2021 at 12:04:43PM -0700, Darrick J. Wong wrote:
> > > Note that I've avoided implementing read/write fops for dax devices
> > > partly out of concern for not wanting to figure out shared-mmap vs
> > > write coherence issues, but also because of a bet with Dave Hansen
> > > that device-dax not grow features like what happened to hugetlbfs. So
> > > it would seem mkfs would need to switch to mmap I/O, or bite the
> > > bullet and implement read/write fops in the driver.
> >
> > That ... would require a fair amount of userspace changes, though at
> > least e2fsprogs has pluggable io drivers, which would make mmapping a
> > character device not too awful.
> >
> > xfsprogs would be another story -- porting the buffer cache might not be
> > too bad, but mkfs and repair seem to issue pread/pwrite calls directly.
> > Note that xfsprogs explicitly screens out chardevs.
>
> It's not just e2fsprogs and xfsprogs. There's also udev, blkid,
> potentially systemd unit generators to kick off fsck runs, etc.
> There are probably any number of user scripts which assume that file
> systems are mounted on block devices --- for example, by looking at
> the output of lsblk, etc.
>
> Also note that block devices have O_EXCL support to provide locking
> against attempts to run mkfs on a mounted file system. If you move
> dax file systems to be mounted on a character mode device, that would
> have to be replicated as well, etc. So I suspect that a large number
> of subtle things would break, and I'd strongly recommend against going
> down that path.

Agreed. There were reasons we decided to present pmem as "block device
with extra functionality" rather than try to cram all the block layer
functionality (eg submitting BIOs for filesystem metadata) into a
character device. Some of those assumptions might be worth re-examining,
but I haven't seen anything that makes me say "this is obviously better
than what we did at the time".
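
For readers following the O_EXCL and chardev points quoted above, here is a
minimal userspace sketch of the two checks, assuming Linux semantics. It is
not taken from e2fsprogs, xfsprogs, or any tool named in the thread; the
helper names node_kind and is_block_device_busy are hypothetical. It relies
on the documented Linux behaviour that open(2) with O_EXCL (and without
O_CREAT) on a block device fails with EBUSY while the device is in use by
the system, for example mounted.

```python
import errno
import os
import stat


def node_kind(path):
    """Return "block", "char", or "other" for the file at path."""
    mode = os.stat(path).st_mode
    if stat.S_ISBLK(mode):
        return "block"
    if stat.S_ISCHR(mode):
        return "char"
    return "other"


def is_block_device_busy(path):
    """Hypothetical helper: report whether a block device is claimed.

    Assumes Linux: an O_EXCL open of a block device returns EBUSY while
    the device is mounted or otherwise held exclusively.
    """
    # Refuse anything that is not a block device, in the spirit of tools
    # that screen out character devices before doing any I/O.
    if node_kind(path) != "block":
        raise ValueError(f"{path} is not a block device")
    try:
        fd = os.open(path, os.O_RDONLY | os.O_EXCL)
    except OSError as exc:
        if exc.errno == errno.EBUSY:
            return True
        raise  # permission errors and the like are not "busy"
    os.close(fd)
    return False
```

A character device offers no equivalent guarantee through this open flag,
which is one concrete piece of behaviour that would need replicating if
file systems were mounted on character devices.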