linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] scsi: lpfc: use memcpy_toio instead of writeq
@ 2018-02-23 15:36 Arnd Bergmann
  2018-02-23 15:59 ` Andy Shevchenko
                   ` (3 more replies)
  0 siblings, 4 replies; 14+ messages in thread
From: Arnd Bergmann @ 2018-02-23 15:36 UTC (permalink / raw)
  To: James Smart, Dick Kennedy, James E.J. Bottomley, Martin K. Petersen
  Cc: Arnd Bergmann, Hannes Reinecke, Johannes Thumshirn, linux-scsi,
	linux-kernel

32-bit architectures generally cannot use writeq(), so we now get a build
failure for the lpfc driver:

drivers/scsi/lpfc/lpfc_sli.c: In function 'lpfc_sli4_wq_put':
drivers/scsi/lpfc/lpfc_sli.c:145:4: error: implicit declaration of function 'writeq'; did you mean 'writeb'? [-Werror=implicit-function-declaration]

Another problem here is that writing out actual data (unlike accessing
mmio registers) means we must write the data with the same endianess
that we have read from memory, but writeq() will perform byte swaps
and add barriers inbetween accesses as we do for registers.

Using memcpy_toio() should do the right thing here, using register
sized stores with correct endianess conversion and barriers (i.e. none),
but on some architectures might fall back to byte-size access.

Side note: shouldn't the driver use ioremap_wc() instead of ioremap()
to get a write-combining mapping on all architectures that support this?

Fixes: 1351e69fc6db ("scsi: lpfc: Add push-to-adapter support to sli4")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
 drivers/scsi/lpfc/lpfc_sli.c | 9 +++------
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/drivers/scsi/lpfc/lpfc_sli.c b/drivers/scsi/lpfc/lpfc_sli.c
index 4ce3ca6f4b79..6749d41753b4 100644
--- a/drivers/scsi/lpfc/lpfc_sli.c
+++ b/drivers/scsi/lpfc/lpfc_sli.c
@@ -115,7 +115,6 @@ lpfc_sli4_wq_put(struct lpfc_queue *q, union lpfc_wqe *wqe)
 	struct lpfc_register doorbell;
 	uint32_t host_index;
 	uint32_t idx;
-	uint32_t i = 0;
 	uint8_t *tmp;
 
 	/* sanity check on queue memory */
@@ -138,12 +137,10 @@ lpfc_sli4_wq_put(struct lpfc_queue *q, union lpfc_wqe *wqe)
 	if (q->phba->sli3_options & LPFC_SLI4_PHWQ_ENABLED)
 		bf_set(wqe_wqid, &wqe->generic.wqe_com, q->queue_id);
 	lpfc_sli_pcimem_bcopy(wqe, temp_wqe, q->entry_size);
-	if (q->dpp_enable && q->phba->cfg_enable_dpp) {
+	if (q->dpp_enable && q->phba->cfg_enable_dpp)
 		/* write to DPP aperture taking advatage of Combined Writes */
-		tmp = (uint8_t *)wqe;
-		for (i = 0; i < q->entry_size; i += sizeof(uint64_t))
-			writeq(*((uint64_t *)(tmp + i)), q->dpp_regaddr + i);
-	}
+		memcpy_toio(tmp, q->dpp_regaddr, q->entry_size);
+
 	/* ensure WQE bcopy and DPP flushed before doorbell write */
 	wmb();
 
-- 
2.9.0

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH] scsi: lpfc: use memcpy_toio instead of writeq
  2018-02-23 15:36 [PATCH] scsi: lpfc: use memcpy_toio instead of writeq Arnd Bergmann
@ 2018-02-23 15:59 ` Andy Shevchenko
  2018-02-23 16:13   ` Andy Shevchenko
  2018-02-23 16:41 ` David Laight
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 14+ messages in thread
From: Andy Shevchenko @ 2018-02-23 15:59 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: James Smart, Dick Kennedy, James E.J. Bottomley,
	Martin K. Petersen, Hannes Reinecke, Johannes Thumshirn,
	linux-scsi, Linux Kernel Mailing List

On Fri, Feb 23, 2018 at 5:36 PM, Arnd Bergmann <arnd@arndb.de> wrote:
> 32-bit architectures generally cannot use writeq(), so we now get a build
> failure for the lpfc driver:
>
> drivers/scsi/lpfc/lpfc_sli.c: In function 'lpfc_sli4_wq_put':
> drivers/scsi/lpfc/lpfc_sli.c:145:4: error: implicit declaration of function 'writeq'; did you mean 'writeb'? [-Werror=implicit-function-declaration]
>
> Another problem here is that writing out actual data (unlike accessing
> mmio registers) means we must write the data with the same endianess
> that we have read from memory, but writeq() will perform byte swaps
> and add barriers inbetween accesses as we do for registers.
>
> Using memcpy_toio() should do the right thing here, using register
> sized stores with correct endianess conversion and barriers (i.e. none),
> but on some architectures might fall back to byte-size access.
>
> Side note: shouldn't the driver use ioremap_wc() instead of ioremap()
> to get a write-combining mapping on all architectures that support this?

IIRC memcpy_toio() doesn't increment the destination address.

lo_hi or hi_lo helpers sound better.

> Fixes: 1351e69fc6db ("scsi: lpfc: Add push-to-adapter support to sli4")
> Signed-off-by: Arnd Bergmann <arnd@arndb.de>

> @@ -115,7 +115,6 @@ lpfc_sli4_wq_put(struct lpfc_queue *q, union lpfc_wqe *wqe)
>         struct lpfc_register doorbell;
>         uint32_t host_index;
>         uint32_t idx;
> -       uint32_t i = 0;
>         uint8_t *tmp;
>
>         /* sanity check on queue memory */
> @@ -138,12 +137,10 @@ lpfc_sli4_wq_put(struct lpfc_queue *q, union lpfc_wqe *wqe)
>         if (q->phba->sli3_options & LPFC_SLI4_PHWQ_ENABLED)
>                 bf_set(wqe_wqid, &wqe->generic.wqe_com, q->queue_id);
>         lpfc_sli_pcimem_bcopy(wqe, temp_wqe, q->entry_size);
> -       if (q->dpp_enable && q->phba->cfg_enable_dpp) {
> +       if (q->dpp_enable && q->phba->cfg_enable_dpp)
>                 /* write to DPP aperture taking advatage of Combined Writes */
> -               tmp = (uint8_t *)wqe;
> -               for (i = 0; i < q->entry_size; i += sizeof(uint64_t))
> -                       writeq(*((uint64_t *)(tmp + i)), q->dpp_regaddr + i);
> -       }
> +               memcpy_toio(tmp, q->dpp_regaddr, q->entry_size);
> +
>         /* ensure WQE bcopy and DPP flushed before doorbell write */
>         wmb();
>
> --
> 2.9.0
>



-- 
With Best Regards,
Andy Shevchenko

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] scsi: lpfc: use memcpy_toio instead of writeq
  2018-02-23 15:59 ` Andy Shevchenko
@ 2018-02-23 16:13   ` Andy Shevchenko
  0 siblings, 0 replies; 14+ messages in thread
From: Andy Shevchenko @ 2018-02-23 16:13 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: James Smart, Dick Kennedy, James E.J. Bottomley,
	Martin K. Petersen, Hannes Reinecke, Johannes Thumshirn,
	linux-scsi, Linux Kernel Mailing List

On Fri, Feb 23, 2018 at 5:59 PM, Andy Shevchenko
<andy.shevchenko@gmail.com> wrote:
> On Fri, Feb 23, 2018 at 5:36 PM, Arnd Bergmann <arnd@arndb.de> wrote:

> IIRC memcpy_toio() doesn't increment the destination address.
>
> lo_hi or hi_lo helpers sound better.

Ah, sorry, I messed up with writesl() / etc.

memcpy_toio() has another side-effect, but here it seems unimportant.

-- 
With Best Regards,
Andy Shevchenko

^ permalink raw reply	[flat|nested] 14+ messages in thread

* RE: [PATCH] scsi: lpfc: use memcpy_toio instead of writeq
  2018-02-23 15:36 [PATCH] scsi: lpfc: use memcpy_toio instead of writeq Arnd Bergmann
  2018-02-23 15:59 ` Andy Shevchenko
@ 2018-02-23 16:41 ` David Laight
  2018-02-23 16:44   ` Arnd Bergmann
  2018-02-23 16:51   ` Andy Shevchenko
  2018-02-23 21:02 ` Arnd Bergmann
  2018-02-25 10:02 ` Johannes Thumshirn
  3 siblings, 2 replies; 14+ messages in thread
From: David Laight @ 2018-02-23 16:41 UTC (permalink / raw)
  To: 'Arnd Bergmann',
	James Smart, Dick Kennedy, James E.J. Bottomley,
	Martin K. Petersen
  Cc: Hannes Reinecke, Johannes Thumshirn, linux-scsi, linux-kernel

From: Arnd Bergmann
> Sent: 23 February 2018 15:37
> 
> 32-bit architectures generally cannot use writeq(), so we now get a build
> failure for the lpfc driver:
> 
> drivers/scsi/lpfc/lpfc_sli.c: In function 'lpfc_sli4_wq_put':
> drivers/scsi/lpfc/lpfc_sli.c:145:4: error: implicit declaration of function 'writeq'; did you mean
> 'writeb'? [-Werror=implicit-function-declaration]
> 
> Another problem here is that writing out actual data (unlike accessing
> mmio registers) means we must write the data with the same endianess
> that we have read from memory, but writeq() will perform byte swaps
> and add barriers inbetween accesses as we do for registers.
> 
> Using memcpy_toio() should do the right thing here, using register
> sized stores with correct endianess conversion and barriers (i.e. none),
> but on some architectures might fall back to byte-size access.
...

Have you looked at the performance impact of this on x86?
Last time I looked memcpy_toio() aliased directly to memcpy().
memcpy() is run-time patched between several different algorithms.
On recent Intel cpus memcpy() is implemented as 'rep movsb' relying
on the hardware to DTRT.
For uncached accesses (typical for io) the 'RT' has to be byte transfers.
So instead of the 8 byte transfers (on 64 bit) you get single bytes.
This won't be what is intended!
memcpy_toio() should probably use 'rep movsd' for the bulk of the transfer.

	David

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] scsi: lpfc: use memcpy_toio instead of writeq
  2018-02-23 16:41 ` David Laight
@ 2018-02-23 16:44   ` Arnd Bergmann
  2018-02-23 16:51   ` Andy Shevchenko
  1 sibling, 0 replies; 14+ messages in thread
From: Arnd Bergmann @ 2018-02-23 16:44 UTC (permalink / raw)
  To: David Laight
  Cc: James Smart, Dick Kennedy, James E.J. Bottomley,
	Martin K. Petersen, Hannes Reinecke, Johannes Thumshirn,
	linux-scsi, linux-kernel

On Fri, Feb 23, 2018 at 5:41 PM, David Laight <David.Laight@aculab.com> wrote:
> From: Arnd Bergmann
>> Sent: 23 February 2018 15:37
>>
>> 32-bit architectures generally cannot use writeq(), so we now get a build
>> failure for the lpfc driver:
>>
>> drivers/scsi/lpfc/lpfc_sli.c: In function 'lpfc_sli4_wq_put':
>> drivers/scsi/lpfc/lpfc_sli.c:145:4: error: implicit declaration of function 'writeq'; did you mean
>> 'writeb'? [-Werror=implicit-function-declaration]
>>
>> Another problem here is that writing out actual data (unlike accessing
>> mmio registers) means we must write the data with the same endianess
>> that we have read from memory, but writeq() will perform byte swaps
>> and add barriers inbetween accesses as we do for registers.
>>
>> Using memcpy_toio() should do the right thing here, using register
>> sized stores with correct endianess conversion and barriers (i.e. none),
>> but on some architectures might fall back to byte-size access.
> ...
>
> Have you looked at the performance impact of this on x86?
> Last time I looked memcpy_toio() aliased directly to memcpy().
> memcpy() is run-time patched between several different algorithms.
> On recent Intel cpus memcpy() is implemented as 'rep movsb' relying
> on the hardware to DTRT.
> For uncached accesses (typical for io) the 'RT' has to be byte transfers.
> So instead of the 8 byte transfers (on 64 bit) you get single bytes.
> This won't be what is intended!
> memcpy_toio() should probably use 'rep movsd' for the bulk of the transfer.

I'm not that familiar with x86, but I would have guessed that on a
write-combining I/O mapping, the hardware will combine the 'rep movsb'
output data the same was as on a cacheable mapping.

      Arnd

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] scsi: lpfc: use memcpy_toio instead of writeq
  2018-02-23 16:41 ` David Laight
  2018-02-23 16:44   ` Arnd Bergmann
@ 2018-02-23 16:51   ` Andy Shevchenko
  2018-02-23 16:53     ` Andy Shevchenko
  2018-02-23 17:09     ` David Laight
  1 sibling, 2 replies; 14+ messages in thread
From: Andy Shevchenko @ 2018-02-23 16:51 UTC (permalink / raw)
  To: David Laight
  Cc: Arnd Bergmann, James Smart, Dick Kennedy, James E.J. Bottomley,
	Martin K. Petersen, Hannes Reinecke, Johannes Thumshirn,
	linux-scsi, linux-kernel

On Fri, Feb 23, 2018 at 6:41 PM, David Laight <David.Laight@aculab.com> wrote:
> From: Arnd Bergmann
>> Sent: 23 February 2018 15:37
>>
>> 32-bit architectures generally cannot use writeq(), so we now get a build
>> failure for the lpfc driver:
>>
>> drivers/scsi/lpfc/lpfc_sli.c: In function 'lpfc_sli4_wq_put':
>> drivers/scsi/lpfc/lpfc_sli.c:145:4: error: implicit declaration of function 'writeq'; did you mean
>> 'writeb'? [-Werror=implicit-function-declaration]
>>
>> Another problem here is that writing out actual data (unlike accessing
>> mmio registers) means we must write the data with the same endianess
>> that we have read from memory, but writeq() will perform byte swaps
>> and add barriers inbetween accesses as we do for registers.
>>
>> Using memcpy_toio() should do the right thing here, using register
>> sized stores with correct endianess conversion and barriers (i.e. none),
>> but on some architectures might fall back to byte-size access.
> ...
>
> Have you looked at the performance impact of this on x86?
> Last time I looked memcpy_toio() aliased directly to memcpy().
> memcpy() is run-time patched between several different algorithms.
> On recent Intel cpus memcpy() is implemented as 'rep movsb' relying
> on the hardware to DTRT.
> For uncached accesses (typical for io) the 'RT' has to be byte transfers.
> So instead of the 8 byte transfers (on 64 bit) you get single bytes.
> This won't be what is intended!
> memcpy_toio() should probably use 'rep movsd' for the bulk of the transfer.

Maybe I'm wrong but it uses movsq on 64-bit and movsl on 32-bit.

The side-effect I referred previously is about tails, i.e. unaligned
bytes are transferred in portions
like
  7 on 64-bit will be  4 + 2 + 1,
  5 = 4 + 1
etc

Similar way on 32-bit.

-- 
With Best Regards,
Andy Shevchenko

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] scsi: lpfc: use memcpy_toio instead of writeq
  2018-02-23 16:51   ` Andy Shevchenko
@ 2018-02-23 16:53     ` Andy Shevchenko
  2018-02-23 17:09     ` David Laight
  1 sibling, 0 replies; 14+ messages in thread
From: Andy Shevchenko @ 2018-02-23 16:53 UTC (permalink / raw)
  To: David Laight
  Cc: Arnd Bergmann, James Smart, Dick Kennedy, James E.J. Bottomley,
	Martin K. Petersen, Hannes Reinecke, Johannes Thumshirn,
	linux-scsi, linux-kernel

On Fri, Feb 23, 2018 at 6:51 PM, Andy Shevchenko
<andy.shevchenko@gmail.com> wrote:
> On Fri, Feb 23, 2018 at 6:41 PM, David Laight <David.Laight@aculab.com> wrote:
>> From: Arnd Bergmann
>>> Sent: 23 February 2018 15:37
>>>
>>> 32-bit architectures generally cannot use writeq(), so we now get a build
>>> failure for the lpfc driver:
>>>
>>> drivers/scsi/lpfc/lpfc_sli.c: In function 'lpfc_sli4_wq_put':
>>> drivers/scsi/lpfc/lpfc_sli.c:145:4: error: implicit declaration of function 'writeq'; did you mean
>>> 'writeb'? [-Werror=implicit-function-declaration]
>>>
>>> Another problem here is that writing out actual data (unlike accessing
>>> mmio registers) means we must write the data with the same endianess
>>> that we have read from memory, but writeq() will perform byte swaps
>>> and add barriers inbetween accesses as we do for registers.
>>>
>>> Using memcpy_toio() should do the right thing here, using register
>>> sized stores with correct endianess conversion and barriers (i.e. none),
>>> but on some architectures might fall back to byte-size access.
>> ...
>>
>> Have you looked at the performance impact of this on x86?
>> Last time I looked memcpy_toio() aliased directly to memcpy().
>> memcpy() is run-time patched between several different algorithms.
>> On recent Intel cpus memcpy() is implemented as 'rep movsb' relying
>> on the hardware to DTRT.
>> For uncached accesses (typical for io) the 'RT' has to be byte transfers.
>> So instead of the 8 byte transfers (on 64 bit) you get single bytes.
>> This won't be what is intended!
>> memcpy_toio() should probably use 'rep movsd' for the bulk of the transfer.
>
> Maybe I'm wrong but it uses movsq on 64-bit and movsl on 32-bit.
>
> The side-effect I referred previously is about tails, i.e. unaligned
> bytes are transferred in portions
> like
>   7 on 64-bit will be  4 + 2 + 1,
>   5 = 4 + 1
> etc
>
> Similar way on 32-bit.

Same for leading bytes as well.

arch/x86/lib/memcpy_64.S

So, I *hope* that in the code in question there is no unaligned access is used.

-- 
With Best Regards,
Andy Shevchenko

^ permalink raw reply	[flat|nested] 14+ messages in thread

* RE: [PATCH] scsi: lpfc: use memcpy_toio instead of writeq
  2018-02-23 16:51   ` Andy Shevchenko
  2018-02-23 16:53     ` Andy Shevchenko
@ 2018-02-23 17:09     ` David Laight
  2018-02-23 17:12       ` Andy Shevchenko
  1 sibling, 1 reply; 14+ messages in thread
From: David Laight @ 2018-02-23 17:09 UTC (permalink / raw)
  To: 'Andy Shevchenko'
  Cc: Arnd Bergmann, James Smart, Dick Kennedy, James E.J. Bottomley,
	Martin K. Petersen, Hannes Reinecke, Johannes Thumshirn,
	linux-scsi, linux-kernel

From: Andy Shevchenko
> Sent: 23 February 2018 16:51
> On Fri, Feb 23, 2018 at 6:41 PM, David Laight <David.Laight@aculab.com> wrote:
> > From: Arnd Bergmann
> >> Sent: 23 February 2018 15:37
> >>
> >> 32-bit architectures generally cannot use writeq(), so we now get a build
> >> failure for the lpfc driver:
> >>
> >> drivers/scsi/lpfc/lpfc_sli.c: In function 'lpfc_sli4_wq_put':
> >> drivers/scsi/lpfc/lpfc_sli.c:145:4: error: implicit declaration of function 'writeq'; did you mean
> >> 'writeb'? [-Werror=implicit-function-declaration]
> >>
> >> Another problem here is that writing out actual data (unlike accessing
> >> mmio registers) means we must write the data with the same endianess
> >> that we have read from memory, but writeq() will perform byte swaps
> >> and add barriers inbetween accesses as we do for registers.
> >>
> >> Using memcpy_toio() should do the right thing here, using register
> >> sized stores with correct endianess conversion and barriers (i.e. none),
> >> but on some architectures might fall back to byte-size access.
> > ...
> >
> > Have you looked at the performance impact of this on x86?
> > Last time I looked memcpy_toio() aliased directly to memcpy().
> > memcpy() is run-time patched between several different algorithms.
> > On recent Intel cpus memcpy() is implemented as 'rep movsb' relying
> > on the hardware to DTRT.
> > For uncached accesses (typical for io) the 'RT' has to be byte transfers.
> > So instead of the 8 byte transfers (on 64 bit) you get single bytes.
> > This won't be what is intended!
> > memcpy_toio() should probably use 'rep movsd' for the bulk of the transfer.
> 
> Maybe I'm wrong but it uses movsq on 64-bit and movsl on 32-bit.

(Let's not argue about the instruction mnemonic). 

You might expect that, but last time I looked at the bus cycles on a PCIe slave
that wasn't what I saw.

> The side-effect I referred previously is about tails, i.e. unaligned
> bytes are transferred in portions
> like
>   7 on 64-bit will be  4 + 2 + 1,
>   5 = 4 + 1

on 64bit memcpy() is allowed to do:
	(long *)(tgt+len)[-1] = (long *)(src+len)[-1];
	rep_movsq(tgt, src, len >> 3);
provided the length is at least 8.

The misaligned PCIe transfer generates a single TLP covering 12 bytes with the
relevant byte enables set for the first and last 32bit words.

	David

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] scsi: lpfc: use memcpy_toio instead of writeq
  2018-02-23 17:09     ` David Laight
@ 2018-02-23 17:12       ` Andy Shevchenko
  2018-02-23 17:45         ` David Laight
  0 siblings, 1 reply; 14+ messages in thread
From: Andy Shevchenko @ 2018-02-23 17:12 UTC (permalink / raw)
  To: David Laight
  Cc: Arnd Bergmann, James Smart, Dick Kennedy, James E.J. Bottomley,
	Martin K. Petersen, Hannes Reinecke, Johannes Thumshirn,
	linux-scsi, linux-kernel

On Fri, Feb 23, 2018 at 7:09 PM, David Laight <David.Laight@aculab.com> wrote:
> From: Andy Shevchenko
>> Sent: 23 February 2018 16:51
>> On Fri, Feb 23, 2018 at 6:41 PM, David Laight <David.Laight@aculab.com> wrote:


>> The side-effect I referred previously is about tails, i.e. unaligned
>> bytes are transferred in portions
>> like
>>   7 on 64-bit will be  4 + 2 + 1,
>>   5 = 4 + 1
>
> on 64bit memcpy() is allowed to do:
>         (long *)(tgt+len)[-1] = (long *)(src+len)[-1];
>         rep_movsq(tgt, src, len >> 3);
> provided the length is at least 8.
>
> The misaligned PCIe transfer generates a single TLP covering 12 bytes with the
> relevant byte enables set for the first and last 32bit words.

But is it guaranteed on any type of bus?
memcpy_toio() is a generic helper, so, first of all we need to be sure
what CPU on its side does, this is I think is pretty straight forward
since it's all written in asm for 64-bit case.
So, what about buses?

-- 
With Best Regards,
Andy Shevchenko

^ permalink raw reply	[flat|nested] 14+ messages in thread

* RE: [PATCH] scsi: lpfc: use memcpy_toio instead of writeq
  2018-02-23 17:12       ` Andy Shevchenko
@ 2018-02-23 17:45         ` David Laight
  0 siblings, 0 replies; 14+ messages in thread
From: David Laight @ 2018-02-23 17:45 UTC (permalink / raw)
  To: 'Andy Shevchenko'
  Cc: Arnd Bergmann, James Smart, Dick Kennedy, James E.J. Bottomley,
	Martin K. Petersen, Hannes Reinecke, Johannes Thumshirn,
	linux-scsi, linux-kernel

From: Andy Shevchenko
> Sent: 23 February 2018 17:13
> To: David Laight
> Cc: Arnd Bergmann; James Smart; Dick Kennedy; James E.J. Bottomley; Martin K. Petersen; Hannes
> Reinecke; Johannes Thumshirn; linux-scsi@vger.kernel.org; linux-kernel@vger.kernel.org
> Subject: Re: [PATCH] scsi: lpfc: use memcpy_toio instead of writeq
> 
> On Fri, Feb 23, 2018 at 7:09 PM, David Laight <David.Laight@aculab.com> wrote:
> > From: Andy Shevchenko
> >> Sent: 23 February 2018 16:51
> >> On Fri, Feb 23, 2018 at 6:41 PM, David Laight <David.Laight@aculab.com> wrote:
> 
> 
> >> The side-effect I referred previously is about tails, i.e. unaligned
> >> bytes are transferred in portions
> >> like
> >>   7 on 64-bit will be  4 + 2 + 1,
> >>   5 = 4 + 1
> >
> > on 64bit memcpy() is allowed to do:
> >         (long *)(tgt+len)[-1] = (long *)(src+len)[-1];
> >         rep_movsq(tgt, src, len >> 3);
> > provided the length is at least 8.
> >
> > The misaligned PCIe transfer generates a single TLP covering 12 bytes with the
> > relevant byte enables set for the first and last 32bit words.
> 
> But is it guaranteed on any type of bus?
> memcpy_toio() is a generic helper, so, first of all we need to be sure
> what CPU on its side does, this is I think is pretty straight forward
> since it's all written in asm for 64-bit case.

I've just done a compile test, on x86-64 memcpy_toio() generates a
call to memcpy() (checked with objdump -dr).
That is on a system running a 4.14 kernel, so is probably using the system
headers from that release.
I'd need to do a run-time test on a newer system verify what the PCIe slave
sees - but I changed our driver to do its own copy loops in order to avoid
byte transfers some time ago.

FWIW I was originally doing copy_to/from_user() directly to PCIe memory addresses!

On x86 'memory' on devices can always be accesses by simple instructions.
Hardware 'IO' addresses are not valid for memcpy_to/fromio().

	David

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] scsi: lpfc: use memcpy_toio instead of writeq
  2018-02-23 15:36 [PATCH] scsi: lpfc: use memcpy_toio instead of writeq Arnd Bergmann
  2018-02-23 15:59 ` Andy Shevchenko
  2018-02-23 16:41 ` David Laight
@ 2018-02-23 21:02 ` Arnd Bergmann
  2018-02-24 22:24   ` James Smart
  2018-02-25 10:02 ` Johannes Thumshirn
  3 siblings, 1 reply; 14+ messages in thread
From: Arnd Bergmann @ 2018-02-23 21:02 UTC (permalink / raw)
  To: James Smart, Dick Kennedy, James E.J. Bottomley, Martin K. Petersen
  Cc: Arnd Bergmann, Hannes Reinecke, Johannes Thumshirn, linux-scsi,
	Linux Kernel Mailing List

On Fri, Feb 23, 2018 at 4:36 PM, Arnd Bergmann <arnd@arndb.de> wrote:
> @@ -138,12 +137,10 @@ lpfc_sli4_wq_put(struct lpfc_queue *q, union lpfc_wqe *wqe)
>         if (q->phba->sli3_options & LPFC_SLI4_PHWQ_ENABLED)
>                 bf_set(wqe_wqid, &wqe->generic.wqe_com, q->queue_id);
>         lpfc_sli_pcimem_bcopy(wqe, temp_wqe, q->entry_size);
> -       if (q->dpp_enable && q->phba->cfg_enable_dpp) {
> +       if (q->dpp_enable && q->phba->cfg_enable_dpp)
>                 /* write to DPP aperture taking advatage of Combined Writes */
> -               tmp = (uint8_t *)wqe;
> -               for (i = 0; i < q->entry_size; i += sizeof(uint64_t))
> -                       writeq(*((uint64_t *)(tmp + i)), q->dpp_regaddr + i);
> -       }
> +               memcpy_toio(tmp, q->dpp_regaddr, q->entry_size);
> +
>         /* ensure WQE bcopy and DPP flushed before doorbell write */
>         wmb();
>

Not sure where we are with the question of whether memcpy_toio
is a good replacement or not, but further build testing showed that
my patch was completely broken in more than one way:

I mixed up the source and destination arguments, and I used
the uninitialized 'tmp' instead of 'wqe'. Don't try this patch.

       Arnd

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] scsi: lpfc: use memcpy_toio instead of writeq
  2018-02-23 21:02 ` Arnd Bergmann
@ 2018-02-24 22:24   ` James Smart
  0 siblings, 0 replies; 14+ messages in thread
From: James Smart @ 2018-02-24 22:24 UTC (permalink / raw)
  To: Arnd Bergmann, Dick Kennedy, James E.J. Bottomley, Martin K. Petersen
  Cc: Hannes Reinecke, Johannes Thumshirn, linux-scsi,
	Linux Kernel Mailing List

About to post a patch to fix. Rather than fidgeting with the copy 
routine, I want to go back to what we originally proposed - writeq() on 
64bit, writel() on 32-bit.

-- james


On 2/23/2018 1:02 PM, Arnd Bergmann wrote:
> On Fri, Feb 23, 2018 at 4:36 PM, Arnd Bergmann <arnd@arndb.de> wrote:
>> @@ -138,12 +137,10 @@ lpfc_sli4_wq_put(struct lpfc_queue *q, union lpfc_wqe *wqe)
>>          if (q->phba->sli3_options & LPFC_SLI4_PHWQ_ENABLED)
>>                  bf_set(wqe_wqid, &wqe->generic.wqe_com, q->queue_id);
>>          lpfc_sli_pcimem_bcopy(wqe, temp_wqe, q->entry_size);
>> -       if (q->dpp_enable && q->phba->cfg_enable_dpp) {
>> +       if (q->dpp_enable && q->phba->cfg_enable_dpp)
>>                  /* write to DPP aperture taking advatage of Combined Writes */
>> -               tmp = (uint8_t *)wqe;
>> -               for (i = 0; i < q->entry_size; i += sizeof(uint64_t))
>> -                       writeq(*((uint64_t *)(tmp + i)), q->dpp_regaddr + i);
>> -       }
>> +               memcpy_toio(tmp, q->dpp_regaddr, q->entry_size);
>> +
>>          /* ensure WQE bcopy and DPP flushed before doorbell write */
>>          wmb();
>>
> Not sure where we are with the question of whether memcpy_toio
> is a good replacement or not, but further build testing showed that
> my patch was completely broken in more than one way:
>
> I mixed up the source and destination arguments, and I used
> the uninitialized 'tmp' instead of 'wqe'. Don't try this patch.
>
>         Arnd

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] scsi: lpfc: use memcpy_toio instead of writeq
  2018-02-23 15:36 [PATCH] scsi: lpfc: use memcpy_toio instead of writeq Arnd Bergmann
                   ` (2 preceding siblings ...)
  2018-02-23 21:02 ` Arnd Bergmann
@ 2018-02-25 10:02 ` Johannes Thumshirn
  2018-02-26  9:03   ` Arnd Bergmann
  3 siblings, 1 reply; 14+ messages in thread
From: Johannes Thumshirn @ 2018-02-25 10:02 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: James Smart, Dick Kennedy, James E.J. Bottomley,
	Martin K. Petersen, Hannes Reinecke, linux-scsi, linux-kernel

Arnd Bergmann <arnd@arndb.de> writes:
> 32-bit architectures generally cannot use writeq(), so we now get a build
> failure for the lpfc driver:
>
> drivers/scsi/lpfc/lpfc_sli.c: In function 'lpfc_sli4_wq_put':
> drivers/scsi/lpfc/lpfc_sli.c:145:4: error: implicit declaration of function 'writeq'; did you mean 'writeb'? [-Werror=implicit-function-declaration]
Hi Arnd,
why can't we use the writeq() from 'io-64-nonatomic-lo-hi.h'? I always
thought these are compat versions for 32 Bit archs and even asked James
to do so, what's why he did the change in the first place.

My apologies for this James.

Thanks,
        Johannes


>
> Another problem here is that writing out actual data (unlike accessing
> mmio registers) means we must write the data with the same endianess
> that we have read from memory, but writeq() will perform byte swaps
> and add barriers inbetween accesses as we do for registers.
>
> Using memcpy_toio() should do the right thing here, using register
> sized stores with correct endianess conversion and barriers (i.e. none),
> but on some architectures might fall back to byte-size access.
>
> Side note: shouldn't the driver use ioremap_wc() instead of ioremap()
> to get a write-combining mapping on all architectures that support this?
>
> Fixes: 1351e69fc6db ("scsi: lpfc: Add push-to-adapter support to sli4")
> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
> ---
>  drivers/scsi/lpfc/lpfc_sli.c | 9 +++------
>  1 file changed, 3 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/scsi/lpfc/lpfc_sli.c b/drivers/scsi/lpfc/lpfc_sli.c
> index 4ce3ca6f4b79..6749d41753b4 100644
> --- a/drivers/scsi/lpfc/lpfc_sli.c
> +++ b/drivers/scsi/lpfc/lpfc_sli.c
> @@ -115,7 +115,6 @@ lpfc_sli4_wq_put(struct lpfc_queue *q, union lpfc_wqe *wqe)
>  	struct lpfc_register doorbell;
>  	uint32_t host_index;
>  	uint32_t idx;
> -	uint32_t i = 0;
>  	uint8_t *tmp;
>  
>  	/* sanity check on queue memory */
> @@ -138,12 +137,10 @@ lpfc_sli4_wq_put(struct lpfc_queue *q, union lpfc_wqe *wqe)
>  	if (q->phba->sli3_options & LPFC_SLI4_PHWQ_ENABLED)
>  		bf_set(wqe_wqid, &wqe->generic.wqe_com, q->queue_id);
>  	lpfc_sli_pcimem_bcopy(wqe, temp_wqe, q->entry_size);
> -	if (q->dpp_enable && q->phba->cfg_enable_dpp) {
> +	if (q->dpp_enable && q->phba->cfg_enable_dpp)
>  		/* write to DPP aperture taking advatage of Combined Writes */
> -		tmp = (uint8_t *)wqe;
> -		for (i = 0; i < q->entry_size; i += sizeof(uint64_t))
> -			writeq(*((uint64_t *)(tmp + i)), q->dpp_regaddr + i);
> -	}
> +		memcpy_toio(tmp, q->dpp_regaddr, q->entry_size);
> +
>  	/* ensure WQE bcopy and DPP flushed before doorbell write */
>  	wmb();

-- 
Johannes Thumshirn                                          Storage
jthumshirn@suse.de                                +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] scsi: lpfc: use memcpy_toio instead of writeq
  2018-02-25 10:02 ` Johannes Thumshirn
@ 2018-02-26  9:03   ` Arnd Bergmann
  0 siblings, 0 replies; 14+ messages in thread
From: Arnd Bergmann @ 2018-02-26  9:03 UTC (permalink / raw)
  To: Johannes Thumshirn
  Cc: James Smart, Dick Kennedy, James E.J. Bottomley,
	Martin K. Petersen, Hannes Reinecke, linux-scsi,
	Linux Kernel Mailing List

On Sun, Feb 25, 2018 at 11:02 AM, Johannes Thumshirn <jthumshirn@suse.de> wrote:
> Arnd Bergmann <arnd@arndb.de> writes:
>> 32-bit architectures generally cannot use writeq(), so we now get a build
>> failure for the lpfc driver:
>>
>> drivers/scsi/lpfc/lpfc_sli.c: In function 'lpfc_sli4_wq_put':
>> drivers/scsi/lpfc/lpfc_sli.c:145:4: error: implicit declaration of function 'writeq'; did you mean 'writeb'? [-Werror=implicit-function-declaration]
> Hi Arnd,
> why can't we use the writeq() from 'io-64-nonatomic-lo-hi.h'? I always
> thought these are compat versions for 32 Bit archs and even asked James
> to do so, what's why he did the change in the first place.

That could work as well, if someone can figure out what the correct incantation
is that works on big-endian 32-bit architectures. I think the simplest
version that
works everywhere would be

lo_hi_writeq(__le32_to_cpup(__le32 __force *)p) |
(u64)__le32_to_cpup(__le32 __force *)p +1) << 32));

but this is ugly as hell and makes my head spin. I definitely would't want that
applied without being tested properly first on a variety of architectures.

There are three variants that I'd prefer over that:

- use memcpy_toio() and change the x86 implementation to guarantee aligned
  word aligned accesses on the output if at all possible (which seems to be
  a good idea anyway)
- use a loop around __raw_writeq()/__raw_writel() depending on CONFIG_64BIT
- add generic memcpy_toio_32() and memcpy_toio_64() helpers in linux/io.h
  and use those.

       Arnd

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2018-02-26  9:04 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-02-23 15:36 [PATCH] scsi: lpfc: use memcpy_toio instead of writeq Arnd Bergmann
2018-02-23 15:59 ` Andy Shevchenko
2018-02-23 16:13   ` Andy Shevchenko
2018-02-23 16:41 ` David Laight
2018-02-23 16:44   ` Arnd Bergmann
2018-02-23 16:51   ` Andy Shevchenko
2018-02-23 16:53     ` Andy Shevchenko
2018-02-23 17:09     ` David Laight
2018-02-23 17:12       ` Andy Shevchenko
2018-02-23 17:45         ` David Laight
2018-02-23 21:02 ` Arnd Bergmann
2018-02-24 22:24   ` James Smart
2018-02-25 10:02 ` Johannes Thumshirn
2018-02-26  9:03   ` Arnd Bergmann

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).