From: Dan Williams <dan.j.williams@intel.com>
To: Toshi Kani <toshi.kani@hp.com>
Cc: Borislav Petkov <bp@alien8.de>,
	Ross Zwisler <ross.zwisler@linux.intel.com>,
	"H. Peter Anvin" <hpa@zytor.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Arnd Bergmann <arnd@arndb.de>,
	linux-mm@kvack.org,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	X86 ML <x86@kernel.org>,
	"linux-nvdimm@lists.01.org" <linux-nvdimm@lists.01.org>,
	jgross@suse.com, Stefan Bader <stefan.bader@canonical.com>,
	Andy Lutomirski <luto@amacapital.net>,
	hmh@hmh.eng.br, yigal@plexistor.com,
	Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>,
	"Elliott, Robert (Server Storage)" <Elliott@hp.com>,
	mcgrof@suse.com, Christoph Hellwig <hch@lst.de>,
	Matthew Wilcox <willy@linux.intel.com>
Subject: Re: [PATCH v10 12/12] drivers/block/pmem: Map NVDIMM with ioremap_wt()
Date: Fri, 29 May 2015 11:19:57 -0700
Message-ID: <CAPcyv4g+zYFkEYpa0HCh0Q+2C3wWNr6v3ZU143h52OKf=U=Qvw@mail.gmail.com>
In-Reply-To: <1432911782.23540.55.camel@misato.fc.hp.com>

On Fri, May 29, 2015 at 8:03 AM, Toshi Kani <toshi.kani@hp.com> wrote:
> On Fri, 2015-05-29 at 07:43 -0700, Dan Williams wrote:
>> On Fri, May 29, 2015 at 2:11 AM, Borislav Petkov <bp@alien8.de> wrote:
>> > On Wed, May 27, 2015 at 09:19:04AM -0600, Toshi Kani wrote:
>> >> The pmem driver maps NVDIMM with ioremap_nocache() as we cannot
>> >> write back the contents of the CPU caches in case of a crash.
>> >>
>> >> This patch changes to use ioremap_wt(), which provides uncached
>> >> writes but cached reads, for improving read performance.
>> >>
>> >> Signed-off-by: Toshi Kani <toshi.kani@hp.com>
>> >> ---
>> >>  drivers/block/pmem.c |    4 ++--
>> >>  1 file changed, 2 insertions(+), 2 deletions(-)
>> >>
>> >> diff --git a/drivers/block/pmem.c b/drivers/block/pmem.c
>> >> index eabf4a8..095dfaa 100644
>> >> --- a/drivers/block/pmem.c
>> >> +++ b/drivers/block/pmem.c
>> >> @@ -139,11 +139,11 @@ static struct pmem_device *pmem_alloc(struct device *dev, struct resource *res)
>> >>       }
>> >>
>> >>       /*
>> >> -      * Map the memory as non-cachable, as we can't write back the contents
>> >> +      * Map the memory as write-through, as we can't write back the contents
>> >>        * of the CPU caches in case of a crash.
>> >>        */
>> >>       err = -ENOMEM;
>> >> -     pmem->virt_addr = ioremap_nocache(pmem->phys_addr, pmem->size);
>> >> +     pmem->virt_addr = ioremap_wt(pmem->phys_addr, pmem->size);
>> >>       if (!pmem->virt_addr)
>> >>               goto out_release_region;
>> >
>> > Dan, Ross, what about this one?
>> >
>> > ACK to pick it up as a temporary solution?
>>
>> I see that is_new_memtype_allowed() is updated to disallow some
>> combinations, but the manual seems to imply any mixing of memory types
>> is unsupported.  Which worries me even in the current code where we
>> have uncached mappings in the driver, and potentially cached DAX
>> mappings handed out to userspace.
>
> is_new_memtype_allowed() is not to allow some combinations of mixing of
> memory types.  When it is allowed, the requested type of ioremap_xxx()
> is changed to match with the existing map type, so that mixing of memory
> types does not happen.

Yes, but if the caller was expecting one memory type and gets another
one, that is something I think the driver would want to know.  At a
minimum, I don't think we want to get emails about pmem driver
performance problems when someone's platform is silently degrading WB
to UC, for example.
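
A minimal userspace sketch of this concern, not kernel code.  It
assumes a simplified model in which a request against an already-mapped
range is always changed to match the existing type; the real
is_new_memtype_allowed() logic only allows some combinations.  All names
here (enum memtype, struct range_state, resolve_memtype(),
map_checked()) are hypothetical and exist only for illustration:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical cache types for this sketch; not the kernel's enum. */
enum memtype { MT_UC, MT_WC, MT_WT, MT_WB, MT_COUNT };

static const char *memtype_name(enum memtype t)
{
	static const char *const names[MT_COUNT] = { "UC", "WC", "WT", "WB" };

	assert(t >= MT_UC && t < MT_COUNT);
	return names[t];
}

/* Hypothetical record of whether a physical range is already mapped. */
struct range_state {
	bool mapped;
	enum memtype existing;
};

/*
 * Simplified assumption: if the range is already mapped, the request is
 * changed to the existing type so that memory types are never mixed.
 * Returns the effective type and reports whether it differs.
 */
static enum memtype resolve_memtype(const struct range_state *rs,
				    enum memtype requested, bool *changed)
{
	enum memtype effective = rs->mapped ? rs->existing : requested;

	*changed = (effective != requested);
	return effective;
}

/* A caller that fails loudly instead of running with a different type. */
static int map_checked(const struct range_state *rs, enum memtype requested)
{
	bool changed;
	enum memtype got = resolve_memtype(rs, requested, &changed);

	if (changed) {
		fprintf(stderr, "requested %s, got %s\n",
			memtype_name(requested), memtype_name(got));
		return -1;
	}
	return 0;
}
```

With a range already mapped UC, map_checked(&rs, MT_WB) returns -1 and
logs the mismatch, rather than handing back a slower mapping without
telling the caller.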

> DAX uses vm_insert_mixed(), which does not even check the existing map
> type to the physical address.

Right, I think that's a problem...

>> A general quibble separate from this patch is that we don't have a way
>> of knowing if ioremap() will reject or change our requested memory
>> type.  Shouldn't the driver be explicitly requesting a known valid
>> type in advance?
>
> I agree we need a solution here.
>
>> Lastly we now have the PMEM API patches from Ross out for review where
>> he is assuming cached mappings with non-temporal writes:
>> https://lists.01.org/pipermail/linux-nvdimm/2015-May/000929.html.
>> This gives us WC semantics on writes which I believe has the nice
>> property of reducing the number of write transactions to memory.
>> Also, the numbers in the paper seem to be assuming DAX operation, but
>> this ioremap_wt() is in the driver and typically behind a file system.
>> Are the numbers relevant to that usage mode?
>
> I have not looked into Ross's changes yet, but they do not seem to
> replace the use of ioremap_nocache().  If his changes can use WB type
> reliably, yes, we do not need a temporary solution of using ioremap_wt()
> in this driver.

Hmm, yes you're right, it seems those patches did not change the
implementation to use ioremap_cache()... which happens to not be
implemented on all architectures.  I'll take a look.


