From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751465AbdJWLUu (ORCPT ); Mon, 23 Oct 2017 07:20:50 -0400
Received: from mail-oi0-f50.google.com ([209.85.218.50]:54688 "EHLO mail-oi0-f50.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751405AbdJWLUr (ORCPT ); Mon, 23 Oct 2017 07:20:47 -0400
X-Google-Smtp-Source: ABhQp+SgXJDvdSfzVhFWHrkacRJU4lgJMyn1O001GK7LuDyoESewwRGv9gxaGHFQ/xZlgC0oenTgtazB+Cg3xlJreW8=
MIME-Version: 1.0
In-Reply-To: <20171023124427.10d15ee3@mschwideX1>
References: <150846713528.24336.4459262264611579791.stgit@dwillia2-desk3.amr.corp.intel.com> <150846714747.24336.14704246566580871364.stgit@dwillia2-desk3.amr.corp.intel.com> <20171020075735.GA14378@lst.de> <20171020162933.GA26320@lst.de> <20171023071835.67ee5210@mschwideX1> <20171023124427.10d15ee3@mschwideX1>
From: Dan Williams
Date: Mon, 23 Oct 2017 04:20:46 -0700
Message-ID:
Subject: Re: [PATCH v3 02/13] dax: require 'struct page' for filesystem dax
To: Martin Schwidefsky
Cc: Christoph
Hellwig , Andrew Morton , Jan Kara , "linux-nvdimm@lists.01.org" , Benjamin Herrenschmidt , Heiko Carstens , "linux-kernel@vger.kernel.org" , linux-xfs@vger.kernel.org, Linux MM , Jeff Moyer , Paul Mackerras , Michael Ellerman , linux-fsdevel , Ross Zwisler , Gerald Schaefer
Content-Type: text/plain; charset="UTF-8"
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Transfer-Encoding: 8bit
X-MIME-Autoconverted: from quoted-printable to 8bit by nfs id v9NBKrK9028928

On Mon, Oct 23, 2017 at 3:44 AM, Martin Schwidefsky wrote:
> On Mon, 23 Oct 2017 01:55:20 -0700
> Dan Williams wrote:
>
>> On Sun, Oct 22, 2017 at 10:18 PM, Martin Schwidefsky wrote:
>> > On Fri, 20 Oct 2017 18:29:33 +0200
>> > Christoph Hellwig wrote:
>> >
>> >> On Fri, Oct 20, 2017 at 08:23:02AM -0700, Dan Williams wrote:
>> >> > Yes, however it seems these drivers / platforms have been living with
>> >> > the lack of struct page for a long time. So they either don't use DAX,
>> >> > or they have a constrained use case that never triggers
>> >> > get_user_pages(). If it is the latter then they could introduce a new
>> >> > configuration option that bypasses the pfn_t_devmap() check in
>> >> > bdev_dax_supported() and fix up the get_user_pages() paths to fail.
>> >> > So, I'd like to understand how these drivers have been using DAX
>> >> > support without struct page to see if we need a workaround or we can
>> >> > go ahead and delete this support. If the usage is limited to
>> >> > execute-in-place perhaps we can do a constrained ->direct_access() for
>> >> > just that case.
>> >>
>> >> For axonram I doubt anyone is using it any more - it was a very for
>> >> the IBM Cell blades, which were produced in a rather limited number.
>> >> And Cell basically seems to be dead as far as I can tell.
>> >>
>> >> For S/390 Martin might be able to help out what the status of xpram
>> >> in general and DAX support in particular is.
>> >
>> > This goes back to the time where DAX was called XIP. The initial design
>> > point has been *not* to have struct pages for a large read-only memory
>> > area. There is a block device driver for z/VM that maps a DCSS segment
>> > somewhere in memory (no struct page!) with e.g. the complete /usr
>> > filesystem. The xpram driver is a different beast and has nothing to
>> > do with XIP/DAX.
>> >
>> > Now, if any, there are very few users of the dcssblk driver out there.
>> > The idea to save a few megabytes for /usr never really took off.
>> >
>> > We have to look at our get_user_pages() implementation to see how hard
>> > it would be to make it fail if the target address is for an area without
>> > struct pages.
>>
>> For read-only memory I think we can enable a subset of DAX, and
>> explicitly turn off the paths that require get_user_pages(). However,
>> I wonder if anyone has tested DAX with dcssblk because fork() requires
>> get_user_pages()?
>
> I did not test it recently, someone else might have. Gerald?
>
> Looking at the code I see this in the s390 version of gup_pte_range:
>
>         mask = (write ? _PAGE_PROTECT : 0) | _PAGE_INVALID | _PAGE_SPECIAL;
>         ...
>                 if ((pte_val(pte) & mask) != 0)
>                         return 0;
>         ...
>
> The XIP code used the pte_mkspecial mechanics to make it work. As far as
> I can see the pfn_t_devmap returns true for the DAX mappings, yes?

Yes, but that's only for get_user_pages_fast() support.

> Then I would say that dcssblk and DAX currently do not work together.

I think at a minimum we need a new pfn_t flag for the 'special' bit to
at least indicate that DAX mappings of dcssblk and axonram do not
support normal get_user_pages(). Then I don't need to explicitly
disable DAX in the !pfn_t_devmap() case. I think I also want to split
the "pfn_to_virt()" and the "sector to pfn" operations into distinct
dax_operations rather than doing both in one ->direct_access(). This
supports storing pfns in the fs/dax radix rather than sectors.

In other words, the pfn_t_devmap() requirement was only about making
get_user_pages() safely fail, and pte_special() fills that
requirement.
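
[Editor's note] The gup_pte_range check quoted in the message can be modeled in plain userspace C to see why a PTE with the special bit set is always rejected, for reads as well as writes. This is only a sketch under stated assumptions: the flag values and the helper name gup_fast_may_pin() are made up for illustration and are not the real s390 _PAGE_* definitions or a kernel function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit values for illustration only; the real s390
 * _PAGE_PROTECT / _PAGE_INVALID / _PAGE_SPECIAL values differ. */
#define PAGE_PROTECT 0x1u /* entry is write-protected */
#define PAGE_INVALID 0x2u /* entry is not present */
#define PAGE_SPECIAL 0x4u /* no struct page behind this pfn */

/*
 * Userspace model of the quoted mask test. Returns true when the
 * lockless path could take a page reference, false when it has to
 * give up on the entry (the "return 0" branch in the quoted code).
 */
static bool gup_fast_may_pin(uint32_t pte_flags, bool write)
{
    /* PAGE_PROTECT only matters when write access is requested. */
    uint32_t mask = (write ? PAGE_PROTECT : 0u) | PAGE_INVALID | PAGE_SPECIAL;

    return (pte_flags & mask) == 0;
}
```

With this model, gup_fast_may_pin(PAGE_SPECIAL, false) is false while gup_fast_may_pin(PAGE_PROTECT, false) is true, which matches the point made in the thread that marking a mapping special is enough to keep this path away from memory that has no struct page.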