From: "Elliott, Robert (Server Storage)" <Elliott@hp.com>
To: Christoph Hellwig <hch@lst.de>,
	"linux-nvdimm@ml01.01.org" <linux-nvdimm@ml01.01.org>,
	"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"x86@kernel.org" <x86@kernel.org>
Cc: "ross.zwisler@linux.intel.com" <ross.zwisler@linux.intel.com>,
	"axboe@kernel.dk" <axboe@kernel.dk>,
	"boaz@plexistor.com" <boaz@plexistor.com>,
	"Kani, Toshimitsu" <toshi.kani@hp.com>
Subject: RE: another pmem variant V2
Date: Tue, 31 Mar 2015 22:11:29 +0000
Message-ID: <94D0CD8314A33A4D9D801C0FE68B40295A853392@G9W0745.americas.hpqcorp.net>
In-Reply-To: <1427358764-6126-1-git-send-email-hch@lst.de>



> -----Original Message-----
> From: linux-kernel-owner@vger.kernel.org
> [mailto:linux-kernel-owner@vger.kernel.org] On Behalf Of Christoph Hellwig
> Sent: Thursday, March 26, 2015 3:33 AM
> To: linux-nvdimm@ml01.01.org; linux-fsdevel@vger.kernel.org;
> linux-kernel@vger.kernel.org; x86@kernel.org
> Cc: ross.zwisler@linux.intel.com; axboe@kernel.dk; boaz@plexistor.com
> Subject: another pmem variant V2
> 
> Here is another version of the same trivial pmem driver, because two
> obviously aren't enough.  The first patch is the same pmem driver
> that Ross posted a short time ago, just modified to use platform_devices
> to find the persistent memory region instead of hardcoding it in the
> Kconfig.  This keeps pmem.c separate from any discovery mechanism,
> but still allows auto-discovery.
> 
...
> This has been tested both with a real NVDIMM on a system with a type 12
> capable bios, as well as with "fake persistent" memory using the memmap=
> option.
> 
> Changes since V1:
>   - s/E820_PROTECTED_KERN/E820_PMEM/g
>   - map the persistent memory as uncached
>   - better kernel parameter description
>   - various typo fixes
>   - MODULE_LICENSE fix

I used fio to measure 4 KiB random read and write IOPS
on a 2-socket x86 DDR4 system with various cache attributes:

attr    read      write     notes
----    ----      -----     -----
UC      37 K      21 K      ioremap_nocache
WB      3.6 M     2.5 M     ioremap
WC      764 K     3.7 M     ioremap_wc
WT      <not tested yet>    ioremap_wt

So, although UC and WT are the only modes certain to be safe,
the V2 default of UC provides abysmal performance - worse than
a consumer-class SATA SSD.
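
To put the UC numbers in bandwidth terms, IOPS at a fixed I/O size can
be converted to throughput.  A minimal sketch follows; the helper names
are hypothetical, and it assumes "K" in the table means 1000.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes moved per second at a given IOPS rate and I/O size. */
static uint64_t iops_to_bytes_per_sec(uint64_t iops, uint64_t io_bytes)
{
    return iops * io_bytes;
}

/* Same figure in whole MiB/s, rounded down. */
static uint64_t iops_to_mib_per_sec(uint64_t iops, uint64_t io_bytes)
{
    return iops_to_bytes_per_sec(iops, io_bytes) >> 20;
}
```

For the UC row, iops_to_mib_per_sec(37000, 4096) gives 144 MiB/s for
reads and iops_to_mib_per_sec(21000, 4096) gives 82 MiB/s for writes,
both well under the roughly 600 MB/s that a 6 Gb/s SATA link can carry.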

A solution for x86 is to use the MOVNTI instruction in WB
mode.  Its non-temporal hint sends the store through a buffer
like the write combining buffer, so it neither fills the cache
nor stalls everything else in the CPU.  The kernel already
uses that instruction (with SFENCE at the end) in the nocache
variant of __copy_from_user() - see
arch/x86/lib/copy_user_nocache_64.S.

If I made the change from memcpy() to __copy_from_user()
correctly, that results in:

attr        read      write     notes
----        ----      -----     -----
WB w/NTI    2.4 M     2.6 M     __copy_from_user()
WC w/NTI    3.2 M     2.1 M     __copy_from_user()
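
The MOVNTI-plus-SFENCE pattern described above can be sketched in user
space with compiler intrinsics.  This is only an illustration, not the
kernel's copy routine: nt_copy() is a hypothetical name, it assumes an
8-byte-aligned destination and a length that is a multiple of 8, and it
falls back to memcpy() when not built for x86-64.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__x86_64__)
#include <emmintrin.h>  /* _mm_stream_si64 (MOVNTI), SSE2 */
#include <xmmintrin.h>  /* _mm_sfence (SFENCE) */
#endif

/*
 * Hypothetical helper: copy len bytes to dst using non-temporal stores.
 * Assumes dst is 8-byte aligned and len is a multiple of 8.
 */
static void nt_copy(void *dst, const void *src, size_t len)
{
    assert(((uintptr_t)dst & 7) == 0 && (len & 7) == 0);
#if defined(__x86_64__)
    long long *d = dst;
    const unsigned char *s = src;

    for (size_t i = 0; i < len / 8; i++) {
        long long v;

        memcpy(&v, s + 8 * i, sizeof(v)); /* ordinary cached load */
        _mm_stream_si64(&d[i], v);        /* MOVNTI: store with NT hint */
    }
    _mm_sfence(); /* order the weakly-ordered stores before later stores */
#else
    memcpy(dst, src, len); /* fallback when not built for x86-64 */
#endif
}
```

The single SFENCE after the loop mirrors the "SFENCE at the end"
arrangement mentioned above; fencing after every store would defeat the
write combining.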

There is also a non-temporal streaming load hint instruction
called MOVNTDQA that might be helpful for reads for both WB
and WC. I don't see any existing kernel memcpy-like function
that utilizes this instruction, so I haven't tried it yet.
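
For illustration only, a user-space copy loop built on MOVNTDQA might
look like the sketch below.  The function names are hypothetical, the
source must be 16-byte aligned with a length that is a multiple of 16,
and the code assumes GCC or clang on x86-64 (otherwise it falls back to
memcpy()).  It shows how the instruction is issued, not a performance
result: on ordinary WB memory the processor may ignore the hint.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
#include <smmintrin.h>  /* _mm_stream_load_si128 (MOVNTDQA), SSE4.1 */
#define HAVE_STREAM_LOAD 1
#endif

#ifdef HAVE_STREAM_LOAD
/* Built for SSE4.1 regardless of the global compiler flags. */
static __attribute__((target("sse4.1")))
void stream_load_copy_sse41(void *dst, void *src, size_t len)
{
    __m128i *s = src;  /* non-const: older headers take a plain __m128i * */
    __m128i *d = dst;

    for (size_t i = 0; i < len / 16; i++) {
        __m128i v = _mm_stream_load_si128(&s[i]); /* MOVNTDQA load */

        _mm_storeu_si128(&d[i], v);               /* ordinary store */
    }
}
#endif

/*
 * Hypothetical helper: copy len bytes from a 16-byte-aligned src,
 * using streaming loads when the CPU supports SSE4.1.
 */
static void stream_load_copy(void *dst, void *src, size_t len)
{
    assert(((uintptr_t)src & 15) == 0 && (len & 15) == 0);
#ifdef HAVE_STREAM_LOAD
    if (__builtin_cpu_supports("sse4.1")) {
        stream_load_copy_sse41(dst, src, len);
        return;
    }
#endif
    memcpy(dst, src, len); /* fallback: no SSE4.1 or unsupported compiler */
}
```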


Intel 64 and IA-32 Architectures
Software Developer's Manual excerpts (Jan 2015)
===============================================
"The non-temporal move instructions (MOVNTI, MOVNTQ, MOVNTDQ,
MOVNTPS, and MOVNTPD) allow data to be moved from the
processor's registers directly into system memory without
being also written into the L1, L2, and/or L3 caches. These
instructions can be used to prevent cache pollution when
operating on data that is going to be modified only once
before being stored back into system memory. ...

MOVNTI
...
The non-temporal hint is implemented by using a write
combining (WC) memory type protocol when writing the
data to memory. Using this protocol, the processor
does not write the data into the cache hierarchy,
nor does it fetch the corresponding cache line from
memory into the cache hierarchy.
...

MOVNTDQA Provides a non-temporal hint that can cause
adjacent 16-byte items within an aligned 64-byte region
(a streaming line) to be fetched and held in a small
set of temporary buffers ("streaming load buffers"). 
Subsequent streaming loads to other aligned 16-byte 
items in the same streaming line may be supplied from
the streaming load buffer and can improve throughput.
...
A processor implementation may make use of the 
non-temporal hint associated with this instruction if
the memory source is WC (write combining) memory type. 
An implementation may also make use of the non-temporal
hint associated with this instruction if the memory
source is WB (writeback) memory type."


