From: Linus Torvalds <torvalds@linux-foundation.org>
To: Dan Williams <dan.j.williams@intel.com>
Cc: Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Boaz Harrosh <boaz@plexistor.com>, Jan Kara <jack@suse.cz>,
	Mike Snitzer <snitzer@redhat.com>, Neil Brown <neilb@suse.de>,
	Benjamin Herrenschmidt <benh@kernel.crashing.org>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	Heiko Carstens <heiko.carstens@de.ibm.com>,
	Chris Mason <clm@fb.com>, Paul Mackerras <paulus@samba.org>,
	"H. Peter Anvin" <hpa@zytor.com>, Christoph Hellwig <hch@lst.de>,
	Alasdair Kergon <agk@redhat.com>,
	linux-nvdimm@lists.01.org, Ingo Molnar <mingo@kernel.org>,
	Mel Gorman <mgorman@suse.de>,
	Matthew Wilcox <willy@linux.intel.com>,
	Ross Zwisler <ross.zwisler@linux.intel.com>,
	Rik van Riel <riel@redhat.com>,
	Martin Schwidefsky <schwidefsky@de.ibm.com>,
	Jens Axboe <axboe@kernel.dk>, Theodore Ts'o <tytso@mit.edu>,
	"Martin K. Petersen" <martin.petersen@oracle.com>,
	Julia Lawall <Julia.Lawall@lip6.fr>, Tejun Heo <tj@kernel.org>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>
Subject: Re: [PATCH v2 00/10] evacuate struct page from the block layer, introduce __pfn_t
Date: Wed, 6 May 2015 15:10:07 -0700
Message-ID: <CA+55aFzON9617c2_Amep0ngLq91kfrPiSccdZakxir82iekUiA@mail.gmail.com>
In-Reply-To: <20150506200219.40425.74411.stgit@dwillia2-desk3.amr.corp.intel.com>

On Wed, May 6, 2015 at 1:04 PM, Dan Williams <dan.j.williams@intel.com> wrote:
>
> The motivation for this change is persistent memory and the desire to
> use it not only via the pmem driver, but also as a memory target for I/O
> (DAX, O_DIRECT, DMA, RDMA, etc) in other parts of the kernel.

I detest this approach.

I'd much rather go exactly the other way around, and do the dynamic
"struct page" instead.

Add a flag to "struct page" to mark it as a fake entry and teach
"page_to_pfn()" to look up the actual pfn some way (that union that
contains "index" looks like a good target to also contain 'pfn', for
example).
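
As a rough illustration of the idea above (not code from this thread,
and not the kernel's real definitions), here is a toy userspace model.
The names PG_FAKE, BASE_PFN, mem_map and make_fake_page() are all
hypothetical; the only point is that page_to_pfn() branches on a flag
and, for a fake entry, reads the pfn from the union that otherwise
holds "index".

```c
/* Toy model only: this is NOT the kernel's struct page. */
#define PG_FAKE   (1UL << 0)   /* hypothetical flag marking a fake entry */
#define BASE_PFN  0x100UL      /* hypothetical first pfn of the static array */
#define NR_PAGES  16

struct page {
	unsigned long flags;
	union {
		unsigned long index;   /* normal page: offset within its mapping */
		unsigned long pfn;     /* fake page: the frame number it stands for */
	};
};

/* Stands in for the static array that describes ordinary RAM. */
static struct page mem_map[NR_PAGES];

/* Fake entries carry their pfn; normal entries derive it from position. */
static unsigned long page_to_pfn(const struct page *page)
{
	if (page->flags & PG_FAKE)
		return page->pfn;
	return BASE_PFN + (unsigned long)(page - mem_map);
}

/* Build a dynamic "struct page" for a pfn that has no static entry. */
static struct page make_fake_page(unsigned long pfn)
{
	struct page p = { .flags = PG_FAKE, .pfn = pfn };
	return p;
}
```

In this sketch a caller that already has a "struct page" never needs
to know which kind it holds; only page_to_pfn() does.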

Especially if this is mainly for persistent storage, we'll never have
issues with worrying about writing it back under memory pressure, so
allocating a "struct page" for these things shouldn't be a problem.
There are likely only a few paths that actually generate IO for those
things.

In other words, I'd really like our basic infrastructure to be for the
*normal* case, and the "struct page" is about so much more than just
"what's the target for IO". For normal IO, "struct page" is also what
serializes the IO so that you have a consistent view of the end
result, and there's obviously the reference count there too. So I
really *really* think that "struct page" is the better entity for
describing the actual IO, because it's the common and the generic
thing, while a "pfn" is not actually *enough* for IO in general, and
you now end up having to look up the "struct page" for the locking and
refcounting etc.

If you go the other way, and instead generate a "struct page" from the
pfn for the few cases that need it, you put the onus on odd behavior
where it belongs.

Yes, it might not be any simpler in the end, but I think it would be
conceptually much better.

                    Linus
