From: Dan Williams <dan.j.williams@intel.com>
To: Huaisheng HS1 Ye <yehs1@lenovo.com>
Cc: Jeff Moyer <jmoyer@redhat.com>,
	Matthew Wilcox <willy@infradead.org>,
	Michal Hocko <mhocko@suse.com>,
	linux-nvdimm <linux-nvdimm@lists.01.org>,
	Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>,
	NingTing Cheng <chengnt@lenovo.com>,
	Dave Hansen <dave.hansen@intel.com>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	"pasha.tatashin@oracle.com" <pasha.tatashin@oracle.com>,
	Linux MM <linux-mm@kvack.org>, "colyli@suse.de" <colyli@suse.de>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Sasha Levin <alexander.levin@verizon.com>,
	Mel Gorman <mgorman@techsingularity.net>,
	Vlastimil Babka <vbabka@suse.cz>,
	Mikulas Patocka <mpatocka@redhat.com>
Subject: Re: [External] Re: [RFC PATCH v1 0/6] use mm to manage NVDIMM (pmem) zone
Date: Mon, 7 May 2018 20:52:57 -0700
Message-ID: <CAPcyv4imJSVaSBTcjLSi6RMpN7PBhe5DMZUf93rbwMcgvcYVDQ@mail.gmail.com>
In-Reply-To: <HK2PR03MB1684659175EB0A11E75E9B61929A0@HK2PR03MB1684.apcprd03.prod.outlook.com>

On Mon, May 7, 2018 at 7:59 PM, Huaisheng HS1 Ye <yehs1@lenovo.com> wrote:
>>
>>Dan Williams <dan.j.williams@intel.com> writes:
>>
>>> On Mon, May 7, 2018 at 11:46 AM, Matthew Wilcox <willy@infradead.org> wrote:
>>>> On Mon, May 07, 2018 at 10:50:21PM +0800, Huaisheng Ye wrote:
>>>>> Traditionally, NVDIMMs are treated by the mm (memory management)
>>>>> subsystem as the DEVICE zone, which is a virtual zone whose start
>>>>> and end pfn are both equal to 0; mm doesn't manage NVDIMM directly
>>>>> as DRAM. Instead, the kernel uses the corresponding drivers, which
>>>>> are located at drivers/nvdimm/, drivers/acpi/nfit and fs/, to
>>>>> implement NVDIMM memory allocation and freeing via the memory
>>>>> hotplug implementation.
>>>>
>>>> You probably want to let linux-nvdimm know about this patch set.
>>>> Adding to the cc.
>>>
>>> Yes, thanks for that!
>>>
>>>> Also, I only received patches 0 and 4.  What happened
>>>> to 1-3, 5 and 6?
>>>>
>>>>> With the current kernel, many of mm's classical features, like the
>>>>> buddy system, the swap mechanism and the page cache, can't be used
>>>>> with NVDIMM. What we are doing is expanding the kernel mm's
>>>>> capability so that it can handle NVDIMM like DRAM. Furthermore, we
>>>>> make mm treat DRAM and NVDIMM separately, which means mm can put
>>>>> only the critical pages into the NVDIMM
>>
>>Please define "critical pages."
>>
>>>>> zone; here we created a new zone type, the NVM zone. That is to
>>>>> say, traditional (or normal) pages are stored in the DRAM scope,
>>>>> i.e. the Normal, DMA32 and DMA zones, but the critical pages, which
>>>>> we hope can be recovered after a power failure or system crash, are
>>>>> made persistent by storing them in the NVM zone.
>>
>>[...]
>>
>>> I think adding yet one more mm-zone is the wrong direction. Instead,
>>> what we have been considering is a mechanism to allow a device-dax
>>> instance to be given back to the kernel as a distinct numa node
>>> managed by the VM. It seems it's time to dust off those patches.
>>
>>What's the use case?  The above patch description seems to indicate an
>>intent to recover contents after a power loss.  Without seeing the whole
>>series, I'm not sure how that's accomplished in a safe or meaningful
>>way.
>>
>>Huaisheng, could you provide a bit more background?
>>
>
> Currently, in our mind, an ideal use scenario is that we put all page
> cache into ZONE_NVM. Without any doubt, the page cache is an efficient
> and common caching implementation, but it has the disadvantage that all
> dirty data within it risks being lost on a power failure or system
> crash. If we put all page cache into NVDIMMs, all dirty data will be
> safe.
>
> Most importantly, the page cache is different from dm-cache or bcache.
> The page cache lives in mm, so it performs much better than other write
> caches, which sit at the storage level.

Can you be more specific? I think the only fundamental performance
difference between the page cache and a block caching driver is that
page cache pages can be DMA'ed directly to lower-level storage.
However, I believe that problem is solvable, i.e. we can teach
dm-cache to perform the equivalent of in-kernel direct I/O when
transferring data between the cache and the backing storage, in the
case where the cache is composed of persistent memory.
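
To make that concrete, here is a rough sketch of what I mean; this is
illustrative only, not code from this series, and it is written against
roughly 4.17-era DAX APIs (dax_direct_access() has since grown extra
arguments), so treat the exact signatures as assumptions:

#include <linux/dax.h>
#include <linux/blkdev.h>
#include <linux/pfn_t.h>
#include <linux/string.h>	/* memcpy_flushcache() */

/*
 * Copy one page worth of data straight into a pmem-backed cache
 * device, bypassing the page cache.  Assumes len <= PAGE_SIZE and
 * leaves out the fencing a real caching target would need for crash
 * consistency.
 */
static int pmem_cache_copy_in(struct block_device *cache_bdev,
			      pgoff_t pgoff, const void *src, size_t len)
{
	struct dax_device *dax_dev;
	void *kaddr;
	pfn_t pfn;
	long avail;
	int id, rc = 0;

	dax_dev = fs_dax_get_by_bdev(cache_bdev);
	if (!dax_dev)
		return -EOPNOTSUPP;	/* cache device is not DAX capable */

	id = dax_read_lock();
	avail = dax_direct_access(dax_dev, pgoff, 1, &kaddr, &pfn);
	if (avail < 1) {
		rc = avail < 0 ? avail : -EIO;
		goto out;
	}

	/* Write directly to persistent memory, flushing CPU caches. */
	memcpy_flushcache(kaddr, src, len);
out:
	dax_read_unlock(id);
	put_dax(dax_dev);
	return rc;
}

The same path in the other direction, a plain copy out of the kaddr
returned by dax_direct_access(), is what I mean by the equivalent of
in-kernel direct I/O for the cache device.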

>
> At present we have implemented the NVM zone on a two-socket (NUMA)
> product based on the Lenovo Purley platform, and we can add an NVM flag
> to the page cache allocation interface, so that all of the system's
> page cache is stored safely in NVDIMM.
>
> Now we are focusing on how to recover data from the page cache after
> power-on. That way the dirty pages would be safe, and much of the cost
> of cache warm-up would be saved, because many pages were already stored
> in ZONE_NVM before the power failure.

I don't see how ZONE_NVM fits into a persistent page cache solution.
All of the mm structures that maintain the page cache are built to be
volatile. Once you build the infrastructure to persist and restore the
state of the page cache, it is no longer the traditional page cache;
it becomes something much closer to dm-cache or a filesystem.
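
To put it more concretely (an illustrative fragment, not something from
this series): even if the page contents sit in a pmem-backed zone, a
lookup like the one below still goes through the radix tree / XArray
rooted in struct address_space, and those tree nodes, together with
struct page, the LRU lists and the dirty tags, are ordinary volatile
allocations that are gone after a power cycle.

#include <linux/pagemap.h>

/*
 * The data page might live in pmem, but find_get_page() walks an index
 * held entirely in volatile memory.  After a power loss there is no
 * index left with which to find the surviving data.
 */
static struct page *lookup_cached_page(struct address_space *mapping,
				       pgoff_t index)
{
	return find_get_page(mapping, index);
}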

One nascent idea from Dave Chinner is to teach xfs how to be a block
server for an upper level filesystem. His aim is sub-volume and
snapshot support, but I wonder if caching could be adapted into that
model?

In any event I think persisting and restoring cache state needs to be
designed before deciding if changes to the mm are needed.
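
To give a sense of what that design work involves, the cache would need
to keep, on the pmem media itself, enough metadata to rebind cached data
to its backing file or device after a reboot. The record below is a
hypothetical sketch of the minimum such state, not anything that exists
in the kernel today:

#include <linux/types.h>

/*
 * Hypothetical on-media record for one cached extent.  Persisting and
 * checksumming records like this, and replaying them at mount time, is
 * what makes such a scheme look like dm-cache or a filesystem rather
 * than the page cache.
 */
struct pcache_extent_record {
	__le64	backing_dev;	/* stable id of the backing device */
	__le64	backing_ino;	/* inode being cached */
	__le64	file_offset;	/* offset of this extent in the file */
	__le64	pmem_offset;	/* where the cached data lives in pmem */
	__le32	length;		/* extent length in bytes */
	__le32	flags;		/* dirty, valid, ... */
	__le64	generation;	/* detect records stale vs. the backing store */
	__le32	csum;		/* metadata checksum */
	__le32	pad;
};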

