* [PATCH v2] x86, pmem: fix broken __copy_user_nocache cache-bypass assumptions
@ 2017-04-08  0:12 Dan Williams
  2017-04-10 18:53 ` Kani, Toshimitsu
  0 siblings, 1 reply; 8+ messages in thread
From: Dan Williams @ 2017-04-08  0:12 UTC (permalink / raw)
  To: linux-nvdimm
  Cc: Jan Kara, Toshi Kani, Matthew Wilcox, x86, linux-kernel, stable,
	Christoph Hellwig, Jeff Moyer, Ingo Molnar, Al Viro,
	H. Peter Anvin, Thomas Gleixner, Ross Zwisler

Before we rework the "pmem api" to stop abusing __copy_user_nocache()
for memcpy_to_pmem() we need to fix cases where we may strand dirty data
in the cpu cache. The problem occurs when copy_from_iter_pmem() is used
for arbitrary data transfers from userspace. There is no guarantee that
these transfers, performed by dax_iomap_actor(), will have aligned
destinations or aligned transfer lengths. Backstop the usage of
__copy_user_nocache() with explicit cache management in these unaligned
cases.

Yes, copy_from_iter_pmem() is now too big for an inline, but addressing
that is saved for a later patch that moves the entirety of the "pmem
api" into the pmem driver directly.

Fixes: 5de490daec8b ("pmem: add copy_from_iter_pmem() and clear_pmem()")
Cc: <stable@vger.kernel.org>
Cc: <x86@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Toshi Kani <toshi.kani@hpe.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
[toshi: trailing bytes flush only needed in the 4B misalign case]
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
v2: Change the condition for flushing the last cacheline of the
    destination from 8-byte to 4-byte misalignment (Toshi)

 arch/x86/include/asm/pmem.h |   41 ++++++++++++++++++++++++++++++-----------
 1 file changed, 30 insertions(+), 11 deletions(-)

diff --git a/arch/x86/include/asm/pmem.h b/arch/x86/include/asm/pmem.h
index 2c1ebeb4d737..cf4e68faedc4 100644
--- a/arch/x86/include/asm/pmem.h
+++ b/arch/x86/include/asm/pmem.h
@@ -55,7 +55,8 @@ static inline int arch_memcpy_from_pmem(void *dst, const void *src, size_t n)
  * @size:	number of bytes to write back
  *
  * Write back a cache range using the CLWB (cache line write back)
- * instruction.
+ * instruction. Note that @size is internally rounded up to be cache
+ * line size aligned.
  */
 static inline void arch_wb_cache_pmem(void *addr, size_t size)
 {
@@ -69,15 +70,6 @@ static inline void arch_wb_cache_pmem(void *addr, size_t size)
 		clwb(p);
 }
 
-/*
- * copy_from_iter_nocache() on x86 only uses non-temporal stores for iovec
- * iterators, so for other types (bvec & kvec) we must do a cache write-back.
- */
-static inline bool __iter_needs_pmem_wb(struct iov_iter *i)
-{
-	return iter_is_iovec(i) == false;
-}
-
 /**
  * arch_copy_from_iter_pmem - copy data from an iterator to PMEM
  * @addr:	PMEM destination address
@@ -94,7 +86,34 @@ static inline size_t arch_copy_from_iter_pmem(void *addr, size_t bytes,
 	/* TODO: skip the write-back by always using non-temporal stores */
 	len = copy_from_iter_nocache(addr, bytes, i);
 
-	if (__iter_needs_pmem_wb(i))
+	/*
+	 * In the iovec case on x86_64 copy_from_iter_nocache() uses
+	 * non-temporal stores for the bulk of the transfer, but we need
+	 * to manually flush if the transfer is unaligned. In the
+	 * non-iovec case the entire destination needs to be flushed.
+	 */
+	if (iter_is_iovec(i)) {
+		unsigned long dest = (unsigned long) addr;
+
+		/*
+		 * If the destination is not 8-byte aligned then
+		 * __copy_user_nocache (on x86_64) uses cached copies
+		 */
+		if (dest & 8) {
+			arch_wb_cache_pmem(addr, 1);
+			dest = ALIGN(dest, 8);
+		}
+
+		/*
+		 * If the remaining transfer length, after accounting
+		 * for destination alignment, is not 4-byte aligned
+		 * then __copy_user_nocache() falls back to cached
+		 * copies for the trailing bytes in the final cacheline
+		 * of the transfer.
+		 */
+		if ((bytes - (dest - (unsigned long) addr)) & 4)
+			arch_wb_cache_pmem(addr + bytes - 1, 1);
+	} else
 		arch_wb_cache_pmem(addr, bytes);
 
 	return len;

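For illustration, here is a stand-alone user-space sketch of the head/tail
accounting the patch is guarding against. The names and numbers are made up
for the example, and it conservatively assumes the copy routine only uses
non-temporal stores for naturally aligned 8-byte words:

	/* Illustration only, not kernel code. */
	#include <stdio.h>
	#include <stddef.h>

	int main(void)
	{
		unsigned long dest = 0x1003;	/* hypothetical destination   */
		size_t bytes = 25;		/* hypothetical length (>= 8) */

		/* bytes written with cached stores before the 8-byte boundary */
		size_t head = (8 - (dest & 7)) & 7;
		/* bytes written with cached stores after the 8-byte bulk */
		size_t tail = (bytes - head) & 7;

		if (head)
			printf("flush cacheline of %#lx (%zu-byte cached head)\n",
			       dest, head);
		if (tail)
			printf("flush cacheline of %#lx (%zu-byte cached tail)\n",
			       dest + bytes - 1, tail);
		return 0;
	}

Both flushes land on cachelines that may hold dirty data after the copy, which
is what the explicit arch_wb_cache_pmem() calls above are backstopping.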

* RE: [PATCH v2] x86, pmem: fix broken __copy_user_nocache cache-bypass assumptions
  2017-04-08  0:12 [PATCH v2] x86, pmem: fix broken __copy_user_nocache cache-bypass assumptions Dan Williams
@ 2017-04-10 18:53 ` Kani, Toshimitsu
  2017-04-10 21:13   ` Dan Williams
  0 siblings, 1 reply; 8+ messages in thread
From: Kani, Toshimitsu @ 2017-04-10 18:53 UTC (permalink / raw)
  To: Dan Williams, linux-nvdimm@lists.01.org
  Cc: Jan Kara, Matthew Wilcox, x86, linux-kernel, stable,
	Christoph Hellwig, Jeff Moyer, Ingo Molnar, Al Viro,
	H. Peter Anvin, Thomas Gleixner, Ross Zwisler

> Subject: [PATCH v2] x86, pmem: fix broken __copy_user_nocache cache-
> bypass assumptions
> 
> Before we rework the "pmem api" to stop abusing __copy_user_nocache()
> for memcpy_to_pmem() we need to fix cases where we may strand dirty data
> in the cpu cache. The problem occurs when copy_from_iter_pmem() is used
> for arbitrary data transfers from userspace. There is no guarantee that
> these transfers, performed by dax_iomap_actor(), will have aligned
> destinations or aligned transfer lengths. Backstop the usage of
> __copy_user_nocache() with explicit cache management in these unaligned
> cases.
> 
> Yes, copy_from_iter_pmem() is now too big for an inline, but addressing
> that is saved for a later patch that moves the entirety of the "pmem
> api" into the pmem driver directly.
 :
> ---
> v2: Change the condition for flushing the last cacheline of the
>     destination from 8-byte to 4-byte misalignment (Toshi)
 :
>  arch/x86/include/asm/pmem.h |   41 ++++++++++++++++++++++++++++++----
 :
> @@ -94,7 +86,34 @@ static inline size_t arch_copy_from_iter_pmem(void
> *addr, size_t bytes,
>  	/* TODO: skip the write-back by always using non-temporal stores */
>  	len = copy_from_iter_nocache(addr, bytes, i);
> 
> -	if (__iter_needs_pmem_wb(i))
> +	/*
> +	 * In the iovec case on x86_64 copy_from_iter_nocache() uses
> +	 * non-temporal stores for the bulk of the transfer, but we need
> +	 * to manually flush if the transfer is unaligned. In the
> +	 * non-iovec case the entire destination needs to be flushed.
> +	 */
> +	if (iter_is_iovec(i)) {
> +		unsigned long dest = (unsigned long) addr;
> +
> +		/*
> +		 * If the destination is not 8-byte aligned then
> +		 * __copy_user_nocache (on x86_64) uses cached copies
> +		 */
> +		if (dest & 8) {
> +			arch_wb_cache_pmem(addr, 1);
> +			dest = ALIGN(dest, 8);
> +		}
> +
> +		/*
> +		 * If the remaining transfer length, after accounting
> +		 * for destination alignment, is not 4-byte aligned
> +		 * then __copy_user_nocache() falls back to cached
> +		 * copies for the trailing bytes in the final cacheline
> +		 * of the transfer.
> +		 */
> +		if ((bytes - (dest - (unsigned long) addr)) & 4)
> +			arch_wb_cache_pmem(addr + bytes - 1, 1);
> +	} else
>  		arch_wb_cache_pmem(addr, bytes);
> 
>  	return len;

Thanks for the update.  I think the alignment check should be based on
the following note in copy_user_nocache.

 * Note: Cached memory copy is used when destination or size is not
 * naturally aligned. That is:
 *  - Require 8-byte alignment when size is 8 bytes or larger.
 *  - Require 4-byte alignment when size is 4 bytes.

So, I think the code may be something like this.  I also made the following changes:
 - Mask with 7, not 8.
 - ALIGN with cacheline size, instead of 8.
 - Add a (bytes > flushed) test, since the unsigned long calculation would otherwise turn a
   negative difference into a huge positive value.

        if (bytes < 8) {
                if ((dest & 3) || (bytes != 4))
                        arch_wb_cache_pmem(addr, 1);
        } else {
                if (dest & 7) {
                        dest = ALIGN(dest, boot_cpu_data.x86_clflush_size);
                        arch_wb_cache_pmem(addr, 1);
                }

                flushed = dest - (unsigned long) addr;
                if ((bytes > flushed) && ((bytes - flushed) & 7))
                        arch_wb_cache_pmem(addr + bytes - 1, 1);
        }
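
Folded into arch_copy_from_iter_pmem(), the whole helper would then look
roughly like this (an untested sketch of the idea, not a formal patch):

        static inline size_t arch_copy_from_iter_pmem(void *addr, size_t bytes,
                        struct iov_iter *i)
        {
                size_t len;

                /* TODO: skip the write-back by always using non-temporal stores */
                len = copy_from_iter_nocache(addr, bytes, i);

                /*
                 * In the iovec case copy_from_iter_nocache() goes through
                 * __copy_user_nocache(), which falls back to cached stores
                 * when the destination or size is not naturally aligned, so
                 * flush the first and/or last cacheline as needed.  For
                 * other iterator types flush the entire destination.
                 */
                if (iter_is_iovec(i)) {
                        unsigned long flushed, dest = (unsigned long) addr;

                        if (bytes < 8) {
                                if ((dest & 3) || (bytes != 4))
                                        arch_wb_cache_pmem(addr, 1);
                        } else {
                                if (dest & 7) {
                                        dest = ALIGN(dest, boot_cpu_data.x86_clflush_size);
                                        arch_wb_cache_pmem(addr, 1);
                                }

                                flushed = dest - (unsigned long) addr;
                                if (bytes > flushed && ((bytes - flushed) & 7))
                                        arch_wb_cache_pmem(addr + bytes - 1, 1);
                        }
                } else
                        arch_wb_cache_pmem(addr, bytes);

                return len;
        }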

Thanks,
-Toshi


* Re: [PATCH v2] x86, pmem: fix broken __copy_user_nocache cache-bypass assumptions
  2017-04-10 18:53 ` Kani, Toshimitsu
@ 2017-04-10 21:13   ` Dan Williams
  2017-04-10 21:28     ` Kani, Toshimitsu
  0 siblings, 1 reply; 8+ messages in thread
From: Dan Williams @ 2017-04-10 21:13 UTC (permalink / raw)
  To: Kani, Toshimitsu
  Cc: linux-nvdimm@lists.01.org, Jan Kara, Matthew Wilcox, x86,
	linux-kernel, stable, Christoph Hellwig, Jeff Moyer, Ingo Molnar,
	Al Viro, H. Peter Anvin, Thomas Gleixner, Ross Zwisler

On Mon, Apr 10, 2017 at 11:53 AM, Kani, Toshimitsu <toshi.kani@hpe.com> wrote:
>> Subject: [PATCH v2] x86, pmem: fix broken __copy_user_nocache cache-
>> bypass assumptions
>>
>> Before we rework the "pmem api" to stop abusing __copy_user_nocache()
>> for memcpy_to_pmem() we need to fix cases where we may strand dirty data
>> in the cpu cache. The problem occurs when copy_from_iter_pmem() is used
>> for arbitrary data transfers from userspace. There is no guarantee that
>> these transfers, performed by dax_iomap_actor(), will have aligned
>> destinations or aligned transfer lengths. Backstop the usage of
>> __copy_user_nocache() with explicit cache management in these unaligned
>> cases.
>>
>> Yes, copy_from_iter_pmem() is now too big for an inline, but addressing
>> that is saved for a later patch that moves the entirety of the "pmem
>> api" into the pmem driver directly.
>  :
>> ---
>> v2: Change the condition for flushing the last cacheline of the
>>     destination from 8-byte to 4-byte misalignment (Toshi)
>  :
>>  arch/x86/include/asm/pmem.h |   41 ++++++++++++++++++++++++++++++----
>  :
>> @@ -94,7 +86,34 @@ static inline size_t arch_copy_from_iter_pmem(void
>> *addr, size_t bytes,
>>  	/* TODO: skip the write-back by always using non-temporal stores */
>>  	len = copy_from_iter_nocache(addr, bytes, i);
>>
>> -	if (__iter_needs_pmem_wb(i))
>> +	/*
>> +	 * In the iovec case on x86_64 copy_from_iter_nocache() uses
>> +	 * non-temporal stores for the bulk of the transfer, but we need
>> +	 * to manually flush if the transfer is unaligned. In the
>> +	 * non-iovec case the entire destination needs to be flushed.
>> +	 */
>> +	if (iter_is_iovec(i)) {
>> +		unsigned long dest = (unsigned long) addr;
>> +
>> +		/*
>> +		 * If the destination is not 8-byte aligned then
>> +		 * __copy_user_nocache (on x86_64) uses cached copies
>> +		 */
>> +		if (dest & 8) {
>> +			arch_wb_cache_pmem(addr, 1);
>> +			dest = ALIGN(dest, 8);
>> +		}
>> +
>> +		/*
>> +		 * If the remaining transfer length, after accounting
>> +		 * for destination alignment, is not 4-byte aligned
>> +		 * then __copy_user_nocache() falls back to cached
>> +		 * copies for the trailing bytes in the final cacheline
>> +		 * of the transfer.
>> +		 */
>> +		if ((bytes - (dest - (unsigned long) addr)) & 4)
>> +			arch_wb_cache_pmem(addr + bytes - 1, 1);
>> +	} else
>>  		arch_wb_cache_pmem(addr, bytes);
>>
>>  	return len;
>
> Thanks for the update.  I think the alignment check should be based on
> the following note in copy_user_nocache.
>
>  * Note: Cached memory copy is used when destination or size is not
>  * naturally aligned. That is:
>  *  - Require 8-byte alignment when size is 8 bytes or larger.
>  *  - Require 4-byte alignment when size is 4 bytes.
>
> So, I think the code may be something like this.  I also made the following changes:

Thanks!

>  - Mask with 7, not 8.

Yes, good catch.

>  - ALIGN with cacheline size, instead of 8.
>  - Add (bytes > flushed) test since calculation with unsigned long still results in a negative
>    value (as a positive value).
>
>         if (bytes < 8) {
>                 if ((dest & 3) || (bytes != 4))
>                         arch_wb_cache_pmem(addr, 1);
>         } else {
>                 if (dest & 7) {
>                         dest = ALIGN(dest, boot_cpu_data.x86_clflush_size);

Why align the destination to the next cacheline? As far as I can see
the ALIGN_DESTINATION macro in arch/x86/include/asm/asm.h only aligns
to the next 8-byte boundary.


* RE: [PATCH v2] x86, pmem: fix broken __copy_user_nocache cache-bypass assumptions
  2017-04-10 21:13   ` Dan Williams
@ 2017-04-10 21:28     ` Kani, Toshimitsu
  2017-04-10 21:35       ` Dan Williams
  0 siblings, 1 reply; 8+ messages in thread
From: Kani, Toshimitsu @ 2017-04-10 21:28 UTC (permalink / raw)
  To: Dan Williams
  Cc: linux-nvdimm@lists.01.org, Jan Kara, Matthew Wilcox, x86,
	linux-kernel, stable, Christoph Hellwig, Jeff Moyer, Ingo Molnar,
	Al Viro, H. Peter Anvin, Thomas Gleixner, Ross Zwisler

> > Thanks for the update.  I think the alignment check should be based on
> > the following note in copy_user_nocache.
> >
> >  * Note: Cached memory copy is used when destination or size is not
> >  * naturally aligned. That is:
> >  *  - Require 8-byte alignment when size is 8 bytes or larger.
> >  *  - Require 4-byte alignment when size is 4 bytes.
> >
> > So, I think the code may be something like this.  I also made the following
> changes:
> 
> Thanks!
> 
> >  - Mask with 7, not 8.
> 
> Yes, good catch.
> 
> >  - ALIGN with cacheline size, instead of 8.
> >  - Add (bytes > flushed) test since calculation with unsigned long still results
> in a negative
> >    value (as a positive value).
> >
> >         if (bytes < 8) {
> >                 if ((dest & 3) || (bytes != 4))
> >                         arch_wb_cache_pmem(addr, 1);
> >         } else {
> >                 if (dest & 7) {
> >                         dest = ALIGN(dest, boot_cpu_data.x86_clflush_size);
> 
> Why align the destination to the next cacheline? As far as I can see
> the ALIGN_DESTINATION macro in arch/x86/include/asm/asm.h only aligns
> to the next 8-byte boundary.

The clflush here flushes for the cacheline size.  So, we do not need to flush
the same cacheline again when the unaligned tail is in the same line.
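
For example, assuming a 64-byte cacheline, with dest = 0x1003 and bytes = 40
(numbers made up for illustration):

        arch_wb_cache_pmem(addr, 1);            /* flushes 0x1000 - 0x103f */
        dest = ALIGN(dest, 64);                 /* 0x1040                  */
        flushed = dest - (unsigned long) addr;  /* 61                      */

Here bytes (40) <= flushed (61), so the tail check is skipped: the last byte at
0x102a sits in the line we already flushed.  Aligning to 8 instead would give
flushed = 5 and trigger a redundant flush of that same line.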

Thanks,
-Toshi


* Re: [PATCH v2] x86, pmem: fix broken __copy_user_nocache cache-bypass assumptions
  2017-04-10 21:28     ` Kani, Toshimitsu
@ 2017-04-10 21:35       ` Dan Williams
  2017-04-10 21:45         ` Kani, Toshimitsu
  0 siblings, 1 reply; 8+ messages in thread
From: Dan Williams @ 2017-04-10 21:35 UTC (permalink / raw)
  To: Kani, Toshimitsu
  Cc: linux-nvdimm@lists.01.org, Jan Kara, Matthew Wilcox, x86,
	linux-kernel, stable, Christoph Hellwig, Jeff Moyer, Ingo Molnar,
	Al Viro, H. Peter Anvin, Thomas Gleixner, Ross Zwisler

On Mon, Apr 10, 2017 at 2:28 PM, Kani, Toshimitsu <toshi.kani@hpe.com> wrote:
>> > Thanks for the update.  I think the alignment check should be based on
>> > the following note in copy_user_nocache.
>> >
>> >  * Note: Cached memory copy is used when destination or size is not
>> >  * naturally aligned. That is:
>> >  *  - Require 8-byte alignment when size is 8 bytes or larger.
>> >  *  - Require 4-byte alignment when size is 4 bytes.
>> >
>> > So, I think the code may be something like this.  I also made the following
>> changes:
>>
>> Thanks!
>>
>> >  - Mask with 7, not 8.
>>
>> Yes, good catch.
>>
>> >  - ALIGN with cacheline size, instead of 8.
>> >  - Add (bytes > flushed) test since calculation with unsigned long still results
>> in a negative
>> >    value (as a positive value).
>> >
>> >         if (bytes < 8) {
>> >                 if ((dest & 3) || (bytes != 4))
>> >                         arch_wb_cache_pmem(addr, 1);
>> >         } else {
>> >                 if (dest & 7) {
>> >                         dest = ALIGN(dest, boot_cpu_data.x86_clflush_size);
>>
>> Why align the destination to the next cacheline? As far as I can see
>> the ALIGN_DESTINATION macro in arch/x86/include/asm/asm.h only aligns
>> to the next 8-byte boundary.
>
> The clflush here flushes for the cacheline size.  So, we do not need to flush
> the same cacheline again when the unaligned tail is in the same line.

Ok, makes sense. Last question, can't we reduce the check to be:

        if ((bytes > flushed) && ((bytes - flushed) & 3))

...since if 'bytes' was 4-byte aligned we would have performed
non-temporal stores.

Can I add your Signed-off-by: on v3?


* RE: [PATCH v2] x86, pmem: fix broken __copy_user_nocache cache-bypass assumptions
  2017-04-10 21:35       ` Dan Williams
@ 2017-04-10 21:45         ` Kani, Toshimitsu
  2017-04-10 22:09           ` Dan Williams
  0 siblings, 1 reply; 8+ messages in thread
From: Kani, Toshimitsu @ 2017-04-10 21:45 UTC (permalink / raw)
  To: Dan Williams
  Cc: linux-nvdimm@lists.01.org, Jan Kara, Matthew Wilcox, x86,
	linux-kernel, stable, Christoph Hellwig, Jeff Moyer, Ingo Molnar,
	Al Viro, H. Peter Anvin, Thomas Gleixner, Ross Zwisler

> On Mon, Apr 10, 2017 at 2:28 PM, Kani, Toshimitsu <toshi.kani@hpe.com>
> wrote:
> >> > Thanks for the update.  I think the alignment check should be based on
> >> > the following note in copy_user_nocache.
> >> >
> >> >  * Note: Cached memory copy is used when destination or size is not
> >> >  * naturally aligned. That is:
> >> >  *  - Require 8-byte alignment when size is 8 bytes or larger.
> >> >  *  - Require 4-byte alignment when size is 4 bytes.
> >> >
> >> > So, I think the code may be something like this.  I also made the following
> >> changes:
> >>
> >> Thanks!
> >>
> >> >  - Mask with 7, not 8.
> >>
> >> Yes, good catch.
> >>
> >> >  - ALIGN with cacheline size, instead of 8.
> >> >  - Add (bytes > flushed) test since calculation with unsigned long still
> results
> >> in a negative
> >> >    value (as a positive value).
> >> >
> >> >         if (bytes < 8) {
> >> >                 if ((dest & 3) || (bytes != 4))
> >> >                         arch_wb_cache_pmem(addr, 1);
> >> >         } else {
> >> >                 if (dest & 7) {
> >> >                         dest = ALIGN(dest, boot_cpu_data.x86_clflush_size);
> >>
> >> Why align the destination to the next cacheline? As far as I can see
> >> the ALIGN_DESTINATION macro in arch/x86/include/asm/asm.h only aligns
> >> to the next 8-byte boundary.
> >
> > The clflush here flushes for the cacheline size.  So, we do not need to flush
> > the same cacheline again when the unaligned tail is in the same line.
> 
> Ok, makes sense. Last question, can't we reduce the check to be:
> 
>         if ((bytes > flushed) && ((bytes - flushed) & 3))
> 
> ...since if 'bytes' was 4-byte aligned we would have performed
> non-temporal stores.

That is not the documented behavior of copy_user_nocache, but as long as the pmem
version of copy_user_nocache follows the same implemented behavior, yes, that
works.

> Can I add your Signed-off-by: on v3?

Sure.

Thanks,
-Toshi


* Re: [PATCH v2] x86, pmem: fix broken __copy_user_nocache cache-bypass assumptions
  2017-04-10 21:45         ` Kani, Toshimitsu
@ 2017-04-10 22:09           ` Dan Williams
  2017-04-10 22:47             ` Kani, Toshimitsu
  0 siblings, 1 reply; 8+ messages in thread
From: Dan Williams @ 2017-04-10 22:09 UTC (permalink / raw)
  To: Kani, Toshimitsu
  Cc: linux-nvdimm@lists.01.org, Jan Kara, Matthew Wilcox, x86,
	linux-kernel, stable, Christoph Hellwig, Jeff Moyer, Ingo Molnar,
	Al Viro, H. Peter Anvin, Thomas Gleixner, Ross Zwisler

On Mon, Apr 10, 2017 at 2:45 PM, Kani, Toshimitsu <toshi.kani@hpe.com> wrote:
>> On Mon, Apr 10, 2017 at 2:28 PM, Kani, Toshimitsu <toshi.kani@hpe.com>
>> wrote:
>> >> > Thanks for the update.  I think the alignment check should be based on
>> >> > the following note in copy_user_nocache.
>> >> >
>> >> >  * Note: Cached memory copy is used when destination or size is not
>> >> >  * naturally aligned. That is:
>> >> >  *  - Require 8-byte alignment when size is 8 bytes or larger.
>> >> >  *  - Require 4-byte alignment when size is 4 bytes.
>> >> >
>> >> > So, I think the code may be something like this.  I also made the following
>> >> changes:
>> >>
>> >> Thanks!
>> >>
>> >> >  - Mask with 7, not 8.
>> >>
>> >> Yes, good catch.
>> >>
>> >> >  - ALIGN with cacheline size, instead of 8.
>> >> >  - Add (bytes > flushed) test since calculation with unsigned long still
>> results
>> >> in a negative
>> >> >    value (as a positive value).
>> >> >
>> >> >         if (bytes < 8) {
>> >> >                 if ((dest & 3) || (bytes != 4))
>> >> >                         arch_wb_cache_pmem(addr, 1);
>> >> >         } else {
>> >> >                 if (dest & 7) {
>> >> >                         dest = ALIGN(dest, boot_cpu_data.x86_clflush_size);
>> >>
>> >> Why align the destination to the next cacheline? As far as I can see
>> >> the ALIGN_DESTINATION macro in arch/x86/include/asm/asm.h only aligns
>> >> to the next 8-byte boundary.
>> >
>> > The clflush here flushes for the cacheline size.  So, we do not need to flush
>> > the same cacheline again when the unaligned tail is in the same line.
>>
>> Ok, makes sense. Last question, can't we reduce the check to be:
>>
>>         if ((bytes > flushed) && ((bytes - flushed) & 3))
>>
>> ...since if 'bytes' was 4-byte aligned we would have performed
>> non-temporal stores.
>
> That is not the documented behavior of copy_user_nocache, but as long as the pmem
> version of copy_user_nocache follows the same implemented behavior, yes, that
> works.

Hmm, sorry this comment confuses me, I'm only referring to the current
version of __copy_user_nocache not the new pmem version. The way I
read the current code we only ever jump to the cached copy loop
(.L_1b_cache_copy_loop) if the trailing byte-count is 4-byte
misaligned.
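
For example (as I read the current implementation, with the head already
flushed and the destination now 8-byte aligned; numbers are hypothetical):

        remaining = 20: 2 x 8-byte + 1 x 4-byte non-temporal stores, nothing cached
        remaining = 22: 2 x 8-byte + 1 x 4-byte non-temporal stores + 2 cached bytes,
                        so (remaining & 3) != 0 and the final cacheline needs a flush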


* RE: [PATCH v2] x86, pmem: fix broken __copy_user_nocache cache-bypass assumptions
  2017-04-10 22:09           ` Dan Williams
@ 2017-04-10 22:47             ` Kani, Toshimitsu
  0 siblings, 0 replies; 8+ messages in thread
From: Kani, Toshimitsu @ 2017-04-10 22:47 UTC (permalink / raw)
  To: Dan Williams
  Cc: linux-nvdimm@lists.01.org, Jan Kara, Matthew Wilcox, x86,
	linux-kernel, stable, Christoph Hellwig, Jeff Moyer, Ingo Molnar,
	Al Viro, H. Peter Anvin, Thomas Gleixner, Ross Zwisler

> >> > The clflush here flushes for the cacheline size.  So, we do not need to
> flush
> >> > the same cacheline again when the unaligned tail is in the same line.
> >>
> >> Ok, makes sense. Last question, can't we reduce the check to be:
> >>
> >>         if ((bytes > flushed) && ((bytes - flushed) & 3))
> >>
> >> ...since if 'bytes' was 4-byte aligned we would have performed
> >> non-temporal stores.
> >
> > That is not documented behavior of copy_user_nocache, but as long as the
> pmem
> > version of copy_user_nocache follows the same implemented behavior, yes,
> that
> > works.
> 
> Hmm, sorry this comment confuses me, I'm only referring to the current
> version of __copy_user_nocache not the new pmem version. The way I
> read the current code we only ever jump to the cached copy loop
> (.L_1b_cache_copy_loop) if the trailing byte-count is 4-byte
> misaligned.

Yes, you are right, and that's how the code is implemented.  I added this trailing
4-byte handling for the >=8B case, which is shared with the <8B case, since it was
easy to do.  But I considered it a bonus.  For the function to state that it handles
4B alignment in the >=8B case as well, it would also need to handle a 4B-aligned
destination; otherwise it is inconsistent.  Since I did not see much point in
supporting such a case, I simply documented in the Note that 8-byte alignment is
required for the >=8B case.

Thanks,
-Toshi
 


Thread overview: 8 messages
2017-04-08  0:12 [PATCH v2] x86, pmem: fix broken __copy_user_nocache cache-bypass assumptions Dan Williams
2017-04-10 18:53 ` Kani, Toshimitsu
2017-04-10 21:13   ` Dan Williams
2017-04-10 21:28     ` Kani, Toshimitsu
2017-04-10 21:35       ` Dan Williams
2017-04-10 21:45         ` Kani, Toshimitsu
2017-04-10 22:09           ` Dan Williams
2017-04-10 22:47             ` Kani, Toshimitsu
