linux-kernel.vger.kernel.org archive mirror
From: Boaz Harrosh <openosd@gmail.com>
To: Dave Hansen <dave.hansen@intel.com>,
	Boaz Harrosh <boaz@plexistor.com>,
	Ross Zwisler <ross.zwisler@linux.intel.com>,
	Jens Axboe <axboe@fb.com>,
	Matthew Wilcox <matthew.r.wilcox@intel.com>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	linux-nvdimm@ml01.01.org, Toshi Kani <toshi.kani@hp.com>,
	linux-mm@kvack.org
Cc: Andrew Morton <akpm@linux-foundation.org>,
	linux-kernel <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 5/9] mm: Let sparse_{add,remove}_one_section receive a node_id
Date: Sun, 14 Sep 2014 12:36:23 +0300
Message-ID: <54156197.5050303@gmail.com>
In-Reply-To: <5411D6D9.5080107@intel.com>

On 09/11/2014 08:07 PM, Dave Hansen wrote:
<>
> 
> OK, that sounds like it will work.  The "leaked until the next mount"
> sounds disastrous, but I'm sure you'll fix that.  I can see how it might
> lead to some fragmentation if only small amounts are ever pinned, but
> not a deal-breaker.
> 

There is no such thing as fragmentation with memory mapped storage ;-)

<>
> I'm saying that, if we have a 'struct page' for the memory, we should
> try to make the mmap()s more normal.  This enables all kinds of things
> that DAX does not support today, like direct I/O.
> 

What? No! Direct I/O is fully supported, including all of its APIs. Do
you mean open(O_DIRECT) and io_submit(..)? Yes, that is fully supported.

In fact all IO is direct IO. There is never a page cache on the way, hence direct.

BTW: These patches enable something else. Say FSA is a DAX filesystem and
FSB is a regular disk filesystem, then:
	fda = open(/mnt/FSA);
	pa = mmap(fda, ...);

	fdb = open(/mnt/FSB, O_DIRECT);
	io_submit(fdb,..,pa ,..);
	/* i.e. pa is used as the IO buffer in the iocb submitted for fdb */

Before these patches the above does not work and falls back to buffered IO,
but with these patches it works.
Please note this is true for the submitted pmem driver. With brd, which
also supports DAX, this already works, because brd always uses pages.

<>
> Great, so we at least agree that this adds complexity.
> 

But the complexity is already there. DAX by Matthew is to go in soon, I hope.
Surely these added pages do not add that much to the complexity.

<>
> 
> OK, so I think I at least understand the scope of the patch set and the
> limitations.  I think I've summarized the limitations:
> 
> 1. Approach requires all of RAM+Pmem to be direct-mapped (rules out
>    almost all 32-bit systems, or any 64-bit systems with more than 64TB
>    of RAM+pmem-storage)

Yes, for NOW

> 2. Approach is currently incompatible with some kernel code that
>    requires a 'struct page' (such as direct I/O), and all kernel code
>    that requires knowledge of zones or NUMA nodes.

NO!
Direct IO - supported
NUMA - supported

"all kernel code that requires knowledge of zones" - Not needed

> 3. Approach requires 1/64 of the amount of storage to be consumed by
>    RAM for a pseudo 'struct page'.  If you had 64GB of storage and 1GB
>    of RAM, you would simply run out of RAM.
> 

Yes, so in a system as above with 64GB of pmem, 1GB of pmem will need to be
set aside and hotplugged as volatile memory. This already works today, BTW:
you can set aside a portion of an NvDIMM and hotplug it as system memory.

We are already used to paying that ratio for RAM.
As a kernel-config choice, that ratio can also be paid for pmem. This is
why I left it a configuration option.
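
For reference, the 1/64 ratio discussed above can be checked with a small
calculation. This is only a sketch: the 4096-byte page size and the 64-byte
'struct page' size are assumptions (typical for x86_64 configurations, not
stated in this thread), and struct_page_overhead() is a hypothetical helper
name, not a kernel function.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Assumed values, not taken from the thread: 4 KiB pages and a
 * 64-byte struct page. Other architectures or configs may differ.
 */
#define ASSUMED_PAGE_SIZE        4096ULL
#define ASSUMED_STRUCT_PAGE_SIZE 64ULL

/*
 * Hypothetical helper: bytes of memory needed to hold one struct page
 * for every page_size-sized page of a pmem region of pmem_bytes.
 */
static uint64_t struct_page_overhead(uint64_t pmem_bytes,
				     uint64_t page_size,
				     uint64_t struct_page_size)
{
	assert(page_size != 0);
	return (pmem_bytes / page_size) * struct_page_size;
}
```

Under these assumptions,
struct_page_overhead(64ULL << 30, ASSUMED_PAGE_SIZE, ASSUMED_STRUCT_PAGE_SIZE)
gives 1ULL << 30, i.e. 64GB of pmem needs 1GB for its page structs, which is
the ratio we already pay for RAM.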

> Did I miss any?
> 

Thanks
Boaz

