From: Boaz Harrosh <boaz@plexistor.com>
To: Rik van Riel <riel@redhat.com>,
	Matthew Wilcox <willy@linux.intel.com>,
	Andrew Morton <akpm@linux-foundation.org>
Cc: Dan Williams <dan.j.williams@intel.com>,
	linux-kernel@vger.kernel.org, linux-arch@vger.kernel.org,
	axboe@kernel.dk, linux-nvdimm@ml01.01.org,
	Dave Hansen <dave.hansen@linux.intel.com>,
	linux-raid@vger.kernel.org, mgorman@suse.de, hch@infradead.org,
	linux-fsdevel@vger.kernel.org,
	"Michael S. Tsirkin" <mst@redhat.com>
Subject: Re: [RFC PATCH 0/7] evacuate struct page from the block layer
Date: Sun, 22 Mar 2015 17:51:26 +0200
Message-ID: <550EE4FE.1070009@plexistor.com>
In-Reply-To: <550C490E.1080708@redhat.com>

On 03/20/2015 06:21 PM, Rik van Riel wrote:
> On 03/19/2015 09:43 AM, Matthew Wilcox wrote:
> 
>> 1. Construct struct pages for persistent memory
>> 1a. Permanently
>> 1b. While the pages are under I/O
> 
> Michael Tsirkin and I have been doing some thinking about what
> it would take to allocate struct pages per 2MB area permanently,
> and allocate additional struct pages for 4kB pages on demand,
> when a 2MB area is broken up into 4kB pages.
> 
> This should work for both DRAM and persistent memory.
> 

My thoughts as well; this need *not* be a hugely invasive change. It is,
however, careful surgery in very core code, with lots of sleepless, scary
nights and testing to make sure all the side effects are ironed out.

BTW: Basic core block code may very well work with:
	bv_page, bv_len > PAGE_SIZE bv_offset > PAGE_SIZE.

  Meaning the pfn range behind bv_page is contiguous in physical space
  (and virtual, of course). So much so that there are already rumors that
  this is supposed to be supported, and there are already out-of-tree
  drivers that use this today by kmalloc-ing a higher-order page and
  feeding BIOs with bv_len=64K.

  But go outside the block layer, say to networking via iscsi, and this
  breaks pretty fast. Let's fix that, then let's introduce a:
	page_size(page)
  A page already knows its size (i.e. whether it belongs to a 2M THP).
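
  To make the idea concrete, here is a rough userspace sketch, not
  kernel code. The struct page below is a stand-in that only records an
  allocation order, and page_size(), bvec_fits_page() and
  bvec_nr_base_pages() are hypothetical helpers written for illustration;
  only the three bio_vec field names come from the real structure.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SHIFT 12
#define PAGE_SIZE  ((size_t)1 << PAGE_SHIFT)

/* Userspace stand-in for struct page: it only records the allocation order. */
struct page {
	unsigned int order;	/* 0 = 4K, 4 = 64K, 9 = 2M with 4K base pages */
};

/* Hypothetical helper: bytes of contiguous memory this descriptor covers. */
static inline size_t page_size(const struct page *page)
{
	return PAGE_SIZE << page->order;
}

/* Mirrors the three bio_vec fields discussed above. */
struct bio_vec {
	struct page *bv_page;
	unsigned int bv_len;
	unsigned int bv_offset;
};

/* Nonzero if the segment stays inside the contiguous range of its page. */
static int bvec_fits_page(const struct bio_vec *bv)
{
	return (size_t)bv->bv_offset + bv->bv_len <= page_size(bv->bv_page);
}

/* How many PAGE_SIZE chunks a consumer that assumes 4K pages must walk. */
static size_t bvec_nr_base_pages(const struct bio_vec *bv)
{
	size_t first, last;

	assert(bv->bv_len > 0);
	first = bv->bv_offset >> PAGE_SHIFT;
	last = ((size_t)bv->bv_offset + bv->bv_len - 1) >> PAGE_SHIFT;
	return last - first + 1;
}
```

  With an order-4 page, a segment of bv_len=64K at bv_offset=0 fits and
  spans 16 base pages; code that compares against PAGE_SIZE instead of
  page_size(page) is where such a segment would be mishandled.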

> I am still not convinced it is worthwhile to have struct pages
> for persistent memory though, but I am willing to change my mind.
> 

If we want copy-less, we need a common memory descriptor carrier. Today
this is struct page. So for me your above statement means:
	"still not convinced I care about copy-less pmem"

Otherwise you either enhance what you have today or devise a new
system, which means changing the whole kernel.

Lastly: Why does pmem need to wait out-of-tree? Even you say above that
machines with lots of DRAM can benefit from the HUGE-to-4k split. So why
not let pmem waste 4k pages like everyone else, and fix it as above
down the line, both for pmem and RAM, and save both ways?
Why do we need to first change the whole kernel, then have pmem? Why not
use the current infrastructure, for better or for worse, and incrementally
do better?
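
For a sense of scale, a small back-of-the-envelope sketch of the
descriptor cost at each granularity. It assumes a 64-byte struct page,
which is typical on x86-64 but not guaranteed; STRUCT_PAGE_BYTES and
memmap_bytes() are names made up for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: typical x86-64 sizeof(struct page); varies with config. */
#define STRUCT_PAGE_BYTES 64ULL

/* Bytes of page descriptors needed to cover mem_bytes at a given granule. */
static uint64_t memmap_bytes(uint64_t mem_bytes, uint64_t granule_bytes)
{
	assert(granule_bytes != 0);
	return (mem_bytes / granule_bytes) * STRUCT_PAGE_BYTES;
}
```

Under that assumption, 1 TiB described at 4 KiB granularity costs
16 GiB of descriptors, while the same 1 TiB at 2 MiB granularity costs
32 MiB.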

May I call you on the phone to try and work things out? I believe the
huge page thing + 4k on demand is not a very big change, as long as
	struct page *page is left as is, everywhere.

But it may *now* carry a different physically/virtually contiguous payload
bigger than 4k. Is PAGE_SIZE not the real bug? Let's fix that problem.

Thanks
Boaz

