linuxppc-dev.lists.ozlabs.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] vfio: Fix endianness handling for emulated BARs
@ 2014-06-18 11:36 Alexey Kardashevskiy
  2014-06-18 18:35 ` Alex Williamson
  0 siblings, 1 reply; 26+ messages in thread
From: Alexey Kardashevskiy @ 2014-06-18 11:36 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Nikunj A Dadhania, kvm, Alexey Kardashevskiy, linux-kernel,
	Alexander Graf, Alex Williamson

VFIO exposes BARs to user space as a byte stream so userspace can
read it using pread()/pwrite(). Since this is a byte stream, VFIO should
not do byte swapping and simply return values as it gets them from
PCI device.

Instead, the existing code assumes that byte stream in read/write is
little-endian and it fixes endianness for values which it passes to
ioreadXX/iowriteXX helpers. This works for little-endian as PCI is
little endian and le32_to_cpu/... are stubs.

This also works for big endian but rather by an accident: it reads 4 bytes
from the stream (@val is big endian), converts to CPU format (which should
be big endian) as it was little endian (@val becomes actually little
endian) and calls iowrite32() which does not do swapping on big endian
system.

This removes byte swapping and makes use ioread32be/iowrite32be
(and 16bit versions) on big-endian systems. The "be" helpers take
native endian values and do swapping at the moment of writing to a PCI
register using one of "store byte-reversed" instructions.

Suggested-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 drivers/vfio/pci/vfio_pci_rdwr.c | 20 ++++++++++++++++----
 1 file changed, 16 insertions(+), 4 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci_rdwr.c b/drivers/vfio/pci/vfio_pci_rdwr.c
index 210db24..f363b5a 100644
--- a/drivers/vfio/pci/vfio_pci_rdwr.c
+++ b/drivers/vfio/pci/vfio_pci_rdwr.c
@@ -21,6 +21,18 @@
 
 #include "vfio_pci_private.h"
 
+#ifdef __BIG_ENDIAN__
+#define ioread16_native		ioread16be
+#define ioread32_native		ioread32be
+#define iowrite16_native	iowrite16be
+#define iowrite32_native	iowrite32be
+#else
+#define ioread16_native		ioread16
+#define ioread32_native		ioread32
+#define iowrite16_native	iowrite16
+#define iowrite32_native	iowrite32
+#endif
+
 /*
  * Read or write from an __iomem region (MMIO or I/O port) with an excluded
  * range which is inaccessible.  The excluded range drops writes and fills
@@ -50,9 +62,9 @@ static ssize_t do_io_rw(void __iomem *io, char __user *buf,
 				if (copy_from_user(&val, buf, 4))
 					return -EFAULT;
 
-				iowrite32(le32_to_cpu(val), io + off);
+				iowrite32_native(val, io + off);
 			} else {
-				val = cpu_to_le32(ioread32(io + off));
+				val = ioread32_native(io + off);
 
 				if (copy_to_user(buf, &val, 4))
 					return -EFAULT;
@@ -66,9 +78,9 @@ static ssize_t do_io_rw(void __iomem *io, char __user *buf,
 				if (copy_from_user(&val, buf, 2))
 					return -EFAULT;
 
-				iowrite16(le16_to_cpu(val), io + off);
+				iowrite16_native(val, io + off);
 			} else {
-				val = cpu_to_le16(ioread16(io + off));
+				val = ioread16_native(io + off);
 
 				if (copy_to_user(buf, &val, 2))
 					return -EFAULT;
-- 
2.0.0

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH] vfio: Fix endianness handling for emulated BARs
  2014-06-18 11:36 [PATCH] vfio: Fix endianness handling for emulated BARs Alexey Kardashevskiy
@ 2014-06-18 18:35 ` Alex Williamson
  2014-06-19  0:50   ` Alexey Kardashevskiy
  0 siblings, 1 reply; 26+ messages in thread
From: Alex Williamson @ 2014-06-18 18:35 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: kvm, Nikunj A Dadhania, linux-kernel, Alexander Graf, linuxppc-dev

On Wed, 2014-06-18 at 21:36 +1000, Alexey Kardashevskiy wrote:
> VFIO exposes BARs to user space as a byte stream so userspace can
> read it using pread()/pwrite(). Since this is a byte stream, VFIO should
> not do byte swapping and simply return values as it gets them from
> PCI device.
> 
> Instead, the existing code assumes that byte stream in read/write is
> little-endian and it fixes endianness for values which it passes to
> ioreadXX/iowriteXX helpers. This works for little-endian as PCI is
> little endian and le32_to_cpu/... are stubs.

vfio read32:

val = cpu_to_le32(ioread32(io + off));

Where the typical x86 case, ioread32 is:

#define ioread32(addr)          readl(addr)

and readl is:

__le32_to_cpu(__raw_readl(addr));

So we do canceling byte swaps, which are both nops on x86, and end up
returning device endian, which we assume is little endian.

vfio write32 is similar:

iowrite32(le32_to_cpu(val), io + off);

The implicit cpu_to_le32 of iowrite32() and our explicit swap cancel
out, so input data is device endian, which is assumed little.

> This also works for big endian but rather by an accident: it reads 4 bytes
> from the stream (@val is big endian), converts to CPU format (which should
> be big endian) as it was little endian (@val becomes actually little
> endian) and calls iowrite32() which does not do swapping on big endian
> system.

Really?

In arch/powerpc/kernel/iomap.c iowrite32() is just a wrapper around
writel(), which seems to use the generic implementation, which does
include a cpu_to_le32.

I also see other big endian archs like parisc doing cpu_to_le32 on
iowrite32, so I don't think this statement is true.  I imagine it's
probably working for you because the swap cancel.

> This removes byte swapping and makes use ioread32be/iowrite32be
> (and 16bit versions) on big-endian systems. The "be" helpers take
> native endian values and do swapping at the moment of writing to a PCI
> register using one of "store byte-reversed" instructions.

So now you want iowrite32() on little endian and iowrite32be() on big
endian, the former does a cpu_to_le32 (which is a nop on little endian)
and the latter does a cpu_to_be32 (which is a nop on big endian)...
should we just be using __raw_writel() on both?  There doesn't actually
seem to be any change in behavior here, it just eliminates back-to-back
byte swaps, which are a nop on x86, but not power, right?

> Suggested-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> ---
>  drivers/vfio/pci/vfio_pci_rdwr.c | 20 ++++++++++++++++----
>  1 file changed, 16 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/vfio/pci/vfio_pci_rdwr.c b/drivers/vfio/pci/vfio_pci_rdwr.c
> index 210db24..f363b5a 100644
> --- a/drivers/vfio/pci/vfio_pci_rdwr.c
> +++ b/drivers/vfio/pci/vfio_pci_rdwr.c
> @@ -21,6 +21,18 @@
>  
>  #include "vfio_pci_private.h"
>  
> +#ifdef __BIG_ENDIAN__
> +#define ioread16_native		ioread16be
> +#define ioread32_native		ioread32be
> +#define iowrite16_native	iowrite16be
> +#define iowrite32_native	iowrite32be
> +#else
> +#define ioread16_native		ioread16
> +#define ioread32_native		ioread32
> +#define iowrite16_native	iowrite16
> +#define iowrite32_native	iowrite32
> +#endif
> +
>  /*
>   * Read or write from an __iomem region (MMIO or I/O port) with an excluded
>   * range which is inaccessible.  The excluded range drops writes and fills
> @@ -50,9 +62,9 @@ static ssize_t do_io_rw(void __iomem *io, char __user *buf,
>  				if (copy_from_user(&val, buf, 4))
>  					return -EFAULT;
>  
> -				iowrite32(le32_to_cpu(val), io + off);
> +				iowrite32_native(val, io + off);
>  			} else {
> -				val = cpu_to_le32(ioread32(io + off));
> +				val = ioread32_native(io + off);
>  
>  				if (copy_to_user(buf, &val, 4))
>  					return -EFAULT;
> @@ -66,9 +78,9 @@ static ssize_t do_io_rw(void __iomem *io, char __user *buf,
>  				if (copy_from_user(&val, buf, 2))
>  					return -EFAULT;
>  
> -				iowrite16(le16_to_cpu(val), io + off);
> +				iowrite16_native(val, io + off);
>  			} else {
> -				val = cpu_to_le16(ioread16(io + off));
> +				val = ioread16_native(io + off);
>  
>  				if (copy_to_user(buf, &val, 2))
>  					return -EFAULT;

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] vfio: Fix endianness handling for emulated BARs
  2014-06-18 18:35 ` Alex Williamson
@ 2014-06-19  0:50   ` Alexey Kardashevskiy
  2014-06-19  1:50     ` Alexey Kardashevskiy
  2014-06-19  1:50     ` Alexey Kardashevskiy
  0 siblings, 2 replies; 26+ messages in thread
From: Alexey Kardashevskiy @ 2014-06-19  0:50 UTC (permalink / raw)
  To: Alex Williamson
  Cc: kvm, Nikunj A Dadhania, linux-kernel, Alexander Graf, linuxppc-dev

On 06/19/2014 04:35 AM, Alex Williamson wrote:
> On Wed, 2014-06-18 at 21:36 +1000, Alexey Kardashevskiy wrote:
>> VFIO exposes BARs to user space as a byte stream so userspace can
>> read it using pread()/pwrite(). Since this is a byte stream, VFIO should
>> not do byte swapping and simply return values as it gets them from
>> PCI device.
>>
>> Instead, the existing code assumes that byte stream in read/write is
>> little-endian and it fixes endianness for values which it passes to
>> ioreadXX/iowriteXX helpers. This works for little-endian as PCI is
>> little endian and le32_to_cpu/... are stubs.
> 
> vfio read32:
> 
> val = cpu_to_le32(ioread32(io + off));
> 
> Where the typical x86 case, ioread32 is:
> 
> #define ioread32(addr)          readl(addr)
> 
> and readl is:
> 
> __le32_to_cpu(__raw_readl(addr));
> 
> So we do canceling byte swaps, which are both nops on x86, and end up
> returning device endian, which we assume is little endian.
> 
> vfio write32 is similar:
> 
> iowrite32(le32_to_cpu(val), io + off);
> 
> The implicit cpu_to_le32 of iowrite32() and our explicit swap cancel
> out, so input data is device endian, which is assumed little.
> 
>> This also works for big endian but rather by an accident: it reads 4 bytes
>> from the stream (@val is big endian), converts to CPU format (which should
>> be big endian) as it was little endian (@val becomes actually little
>> endian) and calls iowrite32() which does not do swapping on big endian
>> system.
> 
> Really?
> 
> In arch/powerpc/kernel/iomap.c iowrite32() is just a wrapper around
> writel(), which seems to use the generic implementation, which does
> include a cpu_to_le32.


Ouch, wrong comment. iowrite32() does swapping. My bad.


> 
> I also see other big endian archs like parisc doing cpu_to_le32 on
> iowrite32, so I don't think this statement is true.  I imagine it's
> probably working for you because the swap cancel.
> 
>> This removes byte swapping and makes use ioread32be/iowrite32be
>> (and 16bit versions) on big-endian systems. The "be" helpers take
>> native endian values and do swapping at the moment of writing to a PCI
>> register using one of "store byte-reversed" instructions.
> 
> So now you want iowrite32() on little endian and iowrite32be() on big
> endian, the former does a cpu_to_le32 (which is a nop on little endian)
> and the latter does a cpu_to_be32 (which is a nop on big endian)...
> should we just be using __raw_writel() on both?


We can do that too. The beauty of iowrite32be on ppc64 is that it does not
swap and write separately, it is implemented via the "Store Word
Byte-Reverse Indexed X-form" single instruction.

And some archs (do not know which ones) may add memory barriers in their
implementations of ioread/iowrite. __raw_writel is too raw :)

>  There doesn't actually
> seem to be any change in behavior here, it just eliminates back-to-back
> byte swaps, which are a nop on x86, but not power, right?

Exactly. No dependency for QEMU.


> 
>> Suggested-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>> ---
>>  drivers/vfio/pci/vfio_pci_rdwr.c | 20 ++++++++++++++++----
>>  1 file changed, 16 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/vfio/pci/vfio_pci_rdwr.c b/drivers/vfio/pci/vfio_pci_rdwr.c
>> index 210db24..f363b5a 100644
>> --- a/drivers/vfio/pci/vfio_pci_rdwr.c
>> +++ b/drivers/vfio/pci/vfio_pci_rdwr.c
>> @@ -21,6 +21,18 @@
>>  
>>  #include "vfio_pci_private.h"
>>  
>> +#ifdef __BIG_ENDIAN__
>> +#define ioread16_native		ioread16be
>> +#define ioread32_native		ioread32be
>> +#define iowrite16_native	iowrite16be
>> +#define iowrite32_native	iowrite32be
>> +#else
>> +#define ioread16_native		ioread16
>> +#define ioread32_native		ioread32
>> +#define iowrite16_native	iowrite16
>> +#define iowrite32_native	iowrite32
>> +#endif
>> +
>>  /*
>>   * Read or write from an __iomem region (MMIO or I/O port) with an excluded
>>   * range which is inaccessible.  The excluded range drops writes and fills
>> @@ -50,9 +62,9 @@ static ssize_t do_io_rw(void __iomem *io, char __user *buf,
>>  				if (copy_from_user(&val, buf, 4))
>>  					return -EFAULT;
>>  
>> -				iowrite32(le32_to_cpu(val), io + off);
>> +				iowrite32_native(val, io + off);
>>  			} else {
>> -				val = cpu_to_le32(ioread32(io + off));
>> +				val = ioread32_native(io + off);
>>  
>>  				if (copy_to_user(buf, &val, 4))
>>  					return -EFAULT;
>> @@ -66,9 +78,9 @@ static ssize_t do_io_rw(void __iomem *io, char __user *buf,
>>  				if (copy_from_user(&val, buf, 2))
>>  					return -EFAULT;
>>  
>> -				iowrite16(le16_to_cpu(val), io + off);
>> +				iowrite16_native(val, io + off);
>>  			} else {
>> -				val = cpu_to_le16(ioread16(io + off));
>> +				val = ioread16_native(io + off);
>>  
>>  				if (copy_to_user(buf, &val, 2))
>>  					return -EFAULT;
> 
> 
> 


-- 
Alexey

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] vfio: Fix endianness handling for emulated BARs
  2014-06-19  0:50   ` Alexey Kardashevskiy
@ 2014-06-19  1:50     ` Alexey Kardashevskiy
  2014-06-19  1:50     ` Alexey Kardashevskiy
  1 sibling, 0 replies; 26+ messages in thread
From: Alexey Kardashevskiy @ 2014-06-19  1:50 UTC (permalink / raw)
  To: Alex Williamson
  Cc: kvm, Nikunj A Dadhania, linux-kernel, Alexander Graf, linuxppc-dev

On 06/19/2014 10:50 AM, Alexey Kardashevskiy wrote:
> On 06/19/2014 04:35 AM, Alex Williamson wrote:
>> On Wed, 2014-06-18 at 21:36 +1000, Alexey Kardashevskiy wrote:
>>> VFIO exposes BARs to user space as a byte stream so userspace can
>>> read it using pread()/pwrite(). Since this is a byte stream, VFIO should
>>> not do byte swapping and simply return values as it gets them from
>>> PCI device.
>>>
>>> Instead, the existing code assumes that byte stream in read/write is
>>> little-endian and it fixes endianness for values which it passes to
>>> ioreadXX/iowriteXX helpers. This works for little-endian as PCI is
>>> little endian and le32_to_cpu/... are stubs.
>>
>> vfio read32:
>>
>> val = cpu_to_le32(ioread32(io + off));
>>
>> Where the typical x86 case, ioread32 is:
>>
>> #define ioread32(addr)          readl(addr)
>>
>> and readl is:
>>
>> __le32_to_cpu(__raw_readl(addr));
>>
>> So we do canceling byte swaps, which are both nops on x86, and end up
>> returning device endian, which we assume is little endian.
>>
>> vfio write32 is similar:
>>
>> iowrite32(le32_to_cpu(val), io + off);
>>
>> The implicit cpu_to_le32 of iowrite32() and our explicit swap cancel
>> out, so input data is device endian, which is assumed little.
>>
>>> This also works for big endian but rather by an accident: it reads 4 bytes
>>> from the stream (@val is big endian), converts to CPU format (which should
>>> be big endian) as it was little endian (@val becomes actually little
>>> endian) and calls iowrite32() which does not do swapping on big endian
>>> system.
>>
>> Really?
>>
>> In arch/powerpc/kernel/iomap.c iowrite32() is just a wrapper around
>> writel(), which seems to use the generic implementation, which does
>> include a cpu_to_le32.
> 
> 
> Ouch, wrong comment. iowrite32() does swapping. My bad.
> 
> 
>>
>> I also see other big endian archs like parisc doing cpu_to_le32 on
>> iowrite32, so I don't think this statement is true.  I imagine it's
>> probably working for you because the swap cancel.
>>
>>> This removes byte swapping and makes use ioread32be/iowrite32be
>>> (and 16bit versions) on big-endian systems. The "be" helpers take
>>> native endian values and do swapping at the moment of writing to a PCI
>>> register using one of "store byte-reversed" instructions.
>>
>> So now you want iowrite32() on little endian and iowrite32be() on big
>> endian, the former does a cpu_to_le32 (which is a nop on little endian)
>> and the latter does a cpu_to_be32 (which is a nop on big endian)...
>> should we just be using __raw_writel() on both?
> 
> 
> We can do that too. The beauty of iowrite32be on ppc64 is that it does not
> swap and write separately, it is implemented via the "Store Word
> Byte-Reverse Indexed X-form" single instruction.
> 
> And some archs (do not know which ones) may add memory barriers in their
> implementations of ioread/iowrite. __raw_writel is too raw :)
> 
>>  There doesn't actually
>> seem to be any change in behavior here, it just eliminates back-to-back
>> byte swaps, which are a nop on x86, but not power, right?
> 
> Exactly. No dependency for QEMU.

How about that:
===

VFIO exposes BARs to user space as a byte stream so userspace can
read it using pread()/pwrite(). Since this is a byte stream, VFIO should
not do byte swapping and simply return values as it gets them from
PCI device.

Instead, the existing code assumes that byte stream in read/write is
little-endian and it fixes endianness for values which it passes to
ioreadXX/iowriteXX helpers in native format. The IO helpers do swapping
again. Since both byte swaps are nops on little-endian host, this works.

This also works for big endian but rather by an accident: it reads 4 bytes
from the stream (@val is big endian), converts to CPU format (which should
be big endian) as it was little endian (and @val becomes actually little
endian) and calls iowrite32() which does swapping on big endian
system again. So byte swap gets cancelled, __raw_writel() receives
a native value and then
*(volatile unsigned int __force *)PCI_FIX_ADDR(addr) = v;
just does the right thing.

This removes byte swaps and makes use of ioread32be/iowrite32be
(and 16bit versions) which do explicit byte swapping at the moment
of write to a PCI register. PPC64 uses a special "Store Word
Byte-Reverse Indexed X-form" instruction which does swap and store.
===

any better?




>>> Suggested-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
>>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>>> ---
>>>  drivers/vfio/pci/vfio_pci_rdwr.c | 20 ++++++++++++++++----
>>>  1 file changed, 16 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/drivers/vfio/pci/vfio_pci_rdwr.c b/drivers/vfio/pci/vfio_pci_rdwr.c
>>> index 210db24..f363b5a 100644
>>> --- a/drivers/vfio/pci/vfio_pci_rdwr.c
>>> +++ b/drivers/vfio/pci/vfio_pci_rdwr.c
>>> @@ -21,6 +21,18 @@
>>>  
>>>  #include "vfio_pci_private.h"
>>>  
>>> +#ifdef __BIG_ENDIAN__
>>> +#define ioread16_native		ioread16be
>>> +#define ioread32_native		ioread32be
>>> +#define iowrite16_native	iowrite16be
>>> +#define iowrite32_native	iowrite32be
>>> +#else
>>> +#define ioread16_native		ioread16
>>> +#define ioread32_native		ioread32
>>> +#define iowrite16_native	iowrite16
>>> +#define iowrite32_native	iowrite32
>>> +#endif
>>> +
>>>  /*
>>>   * Read or write from an __iomem region (MMIO or I/O port) with an excluded
>>>   * range which is inaccessible.  The excluded range drops writes and fills
>>> @@ -50,9 +62,9 @@ static ssize_t do_io_rw(void __iomem *io, char __user *buf,
>>>  				if (copy_from_user(&val, buf, 4))
>>>  					return -EFAULT;
>>>  
>>> -				iowrite32(le32_to_cpu(val), io + off);
>>> +				iowrite32_native(val, io + off);
>>>  			} else {
>>> -				val = cpu_to_le32(ioread32(io + off));
>>> +				val = ioread32_native(io + off);
>>>  
>>>  				if (copy_to_user(buf, &val, 4))
>>>  					return -EFAULT;
>>> @@ -66,9 +78,9 @@ static ssize_t do_io_rw(void __iomem *io, char __user *buf,
>>>  				if (copy_from_user(&val, buf, 2))
>>>  					return -EFAULT;
>>>  
>>> -				iowrite16(le16_to_cpu(val), io + off);
>>> +				iowrite16_native(val, io + off);
>>>  			} else {
>>> -				val = cpu_to_le16(ioread16(io + off));
>>> +				val = ioread16_native(io + off);
>>>  
>>>  				if (copy_to_user(buf, &val, 2))
>>>  					return -EFAULT;
>>
>>
>>
> 
> 


-- 
Alexey

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] vfio: Fix endianness handling for emulated BARs
  2014-06-19  0:50   ` Alexey Kardashevskiy
  2014-06-19  1:50     ` Alexey Kardashevskiy
@ 2014-06-19  1:50     ` Alexey Kardashevskiy
  2014-06-19  3:48       ` Alexey Kardashevskiy
  1 sibling, 1 reply; 26+ messages in thread
From: Alexey Kardashevskiy @ 2014-06-19  1:50 UTC (permalink / raw)
  To: Alex Williamson
  Cc: kvm, Nikunj A Dadhania, linux-kernel, Alexander Graf, linuxppc-dev

On 06/19/2014 10:50 AM, Alexey Kardashevskiy wrote:
> On 06/19/2014 04:35 AM, Alex Williamson wrote:
>> On Wed, 2014-06-18 at 21:36 +1000, Alexey Kardashevskiy wrote:
>>> VFIO exposes BARs to user space as a byte stream so userspace can
>>> read it using pread()/pwrite(). Since this is a byte stream, VFIO should
>>> not do byte swapping and simply return values as it gets them from
>>> PCI device.
>>>
>>> Instead, the existing code assumes that byte stream in read/write is
>>> little-endian and it fixes endianness for values which it passes to
>>> ioreadXX/iowriteXX helpers. This works for little-endian as PCI is
>>> little endian and le32_to_cpu/... are stubs.
>>
>> vfio read32:
>>
>> val = cpu_to_le32(ioread32(io + off));
>>
>> Where the typical x86 case, ioread32 is:
>>
>> #define ioread32(addr)          readl(addr)
>>
>> and readl is:
>>
>> __le32_to_cpu(__raw_readl(addr));
>>
>> So we do canceling byte swaps, which are both nops on x86, and end up
>> returning device endian, which we assume is little endian.
>>
>> vfio write32 is similar:
>>
>> iowrite32(le32_to_cpu(val), io + off);
>>
>> The implicit cpu_to_le32 of iowrite32() and our explicit swap cancel
>> out, so input data is device endian, which is assumed little.
>>
>>> This also works for big endian but rather by an accident: it reads 4 bytes
>>> from the stream (@val is big endian), converts to CPU format (which should
>>> be big endian) as it was little endian (@val becomes actually little
>>> endian) and calls iowrite32() which does not do swapping on big endian
>>> system.
>>
>> Really?
>>
>> In arch/powerpc/kernel/iomap.c iowrite32() is just a wrapper around
>> writel(), which seems to use the generic implementation, which does
>> include a cpu_to_le32.
> 
> 
> Ouch, wrong comment. iowrite32() does swapping. My bad.
> 
> 
>>
>> I also see other big endian archs like parisc doing cpu_to_le32 on
>> iowrite32, so I don't think this statement is true.  I imagine it's
>> probably working for you because the swap cancel.
>>
>>> This removes byte swapping and makes use ioread32be/iowrite32be
>>> (and 16bit versions) on big-endian systems. The "be" helpers take
>>> native endian values and do swapping at the moment of writing to a PCI
>>> register using one of "store byte-reversed" instructions.
>>
>> So now you want iowrite32() on little endian and iowrite32be() on big
>> endian, the former does a cpu_to_le32 (which is a nop on little endian)
>> and the latter does a cpu_to_be32 (which is a nop on big endian)...
>> should we just be using __raw_writel() on both?
> 
> 
> We can do that too. The beauty of iowrite32be on ppc64 is that it does not
> swap and write separately, it is implemented via the "Store Word
> Byte-Reverse Indexed X-form" single instruction.
> 
> And some archs (do not know which ones) may add memory barriers in their
> implementations of ioread/iowrite. __raw_writel is too raw :)
> 
>>  There doesn't actually
>> seem to be any change in behavior here, it just eliminates back-to-back
>> byte swaps, which are a nop on x86, but not power, right?
> 
> Exactly. No dependency for QEMU.

How about that:
===

VFIO exposes BARs to user space as a byte stream so userspace can
read it using pread()/pwrite(). Since this is a byte stream, VFIO should
not do byte swapping and simply return values as it gets them from
PCI device.

Instead, the existing code assumes that byte stream in read/write is
little-endian and it fixes endianness for values which it passes to
ioreadXX/iowriteXX helpers in native format. The IO helpers do swapping
again. Since both byte swaps are nops on little-endian host, this works.

This also works for big endian but rather by an accident: it reads 4 bytes
from the stream (@val is big endian), converts to CPU format (which should
be big endian) as it was little endian (and @val becomes actually little
endian) and calls iowrite32() which does swapping on big endian
system again. So byte swap gets cancelled, __raw_writel() receives
a native value and then
*(volatile unsigned int __force *)PCI_FIX_ADDR(addr) = v;
just does the right thing.

This removes byte swaps and makes use of ioread32be/iowrite32be
(and 16bit versions) which do explicit byte swapping at the moment
of write to a PCI register. PPC64 uses a special "Store Word
Byte-Reverse Indexed X-form" instruction which does swap and store.
===

any better?




>>> Suggested-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
>>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>>> ---
>>>  drivers/vfio/pci/vfio_pci_rdwr.c | 20 ++++++++++++++++----
>>>  1 file changed, 16 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/drivers/vfio/pci/vfio_pci_rdwr.c b/drivers/vfio/pci/vfio_pci_rdwr.c
>>> index 210db24..f363b5a 100644
>>> --- a/drivers/vfio/pci/vfio_pci_rdwr.c
>>> +++ b/drivers/vfio/pci/vfio_pci_rdwr.c
>>> @@ -21,6 +21,18 @@
>>>  
>>>  #include "vfio_pci_private.h"
>>>  
>>> +#ifdef __BIG_ENDIAN__
>>> +#define ioread16_native		ioread16be
>>> +#define ioread32_native		ioread32be
>>> +#define iowrite16_native	iowrite16be
>>> +#define iowrite32_native	iowrite32be
>>> +#else
>>> +#define ioread16_native		ioread16
>>> +#define ioread32_native		ioread32
>>> +#define iowrite16_native	iowrite16
>>> +#define iowrite32_native	iowrite32
>>> +#endif
>>> +
>>>  /*
>>>   * Read or write from an __iomem region (MMIO or I/O port) with an excluded
>>>   * range which is inaccessible.  The excluded range drops writes and fills
>>> @@ -50,9 +62,9 @@ static ssize_t do_io_rw(void __iomem *io, char __user *buf,
>>>  				if (copy_from_user(&val, buf, 4))
>>>  					return -EFAULT;
>>>  
>>> -				iowrite32(le32_to_cpu(val), io + off);
>>> +				iowrite32_native(val, io + off);
>>>  			} else {
>>> -				val = cpu_to_le32(ioread32(io + off));
>>> +				val = ioread32_native(io + off);
>>>  
>>>  				if (copy_to_user(buf, &val, 4))
>>>  					return -EFAULT;
>>> @@ -66,9 +78,9 @@ static ssize_t do_io_rw(void __iomem *io, char __user *buf,
>>>  				if (copy_from_user(&val, buf, 2))
>>>  					return -EFAULT;
>>>  
>>> -				iowrite16(le16_to_cpu(val), io + off);
>>> +				iowrite16_native(val, io + off);
>>>  			} else {
>>> -				val = cpu_to_le16(ioread16(io + off));
>>> +				val = ioread16_native(io + off);
>>>  
>>>  				if (copy_to_user(buf, &val, 2))
>>>  					return -EFAULT;
>>
>>
>>
> 
> 


-- 
Alexey

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] vfio: Fix endianness handling for emulated BARs
  2014-06-19  1:50     ` Alexey Kardashevskiy
@ 2014-06-19  3:48       ` Alexey Kardashevskiy
  2014-06-19  5:30         ` Bharat.Bhushan
  2014-06-20  3:21         ` Alex Williamson
  0 siblings, 2 replies; 26+ messages in thread
From: Alexey Kardashevskiy @ 2014-06-19  3:48 UTC (permalink / raw)
  To: Alex Williamson
  Cc: kvm, Nikunj A Dadhania, linux-kernel, Alexander Graf, linuxppc-dev

On 06/19/2014 11:50 AM, Alexey Kardashevskiy wrote:
> On 06/19/2014 10:50 AM, Alexey Kardashevskiy wrote:
>> On 06/19/2014 04:35 AM, Alex Williamson wrote:
>>> On Wed, 2014-06-18 at 21:36 +1000, Alexey Kardashevskiy wrote:
>>>> VFIO exposes BARs to user space as a byte stream so userspace can
>>>> read it using pread()/pwrite(). Since this is a byte stream, VFIO should
>>>> not do byte swapping and simply return values as it gets them from
>>>> PCI device.
>>>>
>>>> Instead, the existing code assumes that byte stream in read/write is
>>>> little-endian and it fixes endianness for values which it passes to
>>>> ioreadXX/iowriteXX helpers. This works for little-endian as PCI is
>>>> little endian and le32_to_cpu/... are stubs.
>>>
>>> vfio read32:
>>>
>>> val = cpu_to_le32(ioread32(io + off));
>>>
>>> Where the typical x86 case, ioread32 is:
>>>
>>> #define ioread32(addr)          readl(addr)
>>>
>>> and readl is:
>>>
>>> __le32_to_cpu(__raw_readl(addr));
>>>
>>> So we do canceling byte swaps, which are both nops on x86, and end up
>>> returning device endian, which we assume is little endian.
>>>
>>> vfio write32 is similar:
>>>
>>> iowrite32(le32_to_cpu(val), io + off);
>>>
>>> The implicit cpu_to_le32 of iowrite32() and our explicit swap cancel
>>> out, so input data is device endian, which is assumed little.
>>>
>>>> This also works for big endian but rather by an accident: it reads 4 bytes
>>>> from the stream (@val is big endian), converts to CPU format (which should
>>>> be big endian) as it was little endian (@val becomes actually little
>>>> endian) and calls iowrite32() which does not do swapping on big endian
>>>> system.
>>>
>>> Really?
>>>
>>> In arch/powerpc/kernel/iomap.c iowrite32() is just a wrapper around
>>> writel(), which seems to use the generic implementation, which does
>>> include a cpu_to_le32.
>>
>>
>> Ouch, wrong comment. iowrite32() does swapping. My bad.
>>
>>
>>>
>>> I also see other big endian archs like parisc doing cpu_to_le32 on
>>> iowrite32, so I don't think this statement is true.  I imagine it's
>>> probably working for you because the swap cancel.
>>>
>>>> This removes byte swapping and makes use ioread32be/iowrite32be
>>>> (and 16bit versions) on big-endian systems. The "be" helpers take
>>>> native endian values and do swapping at the moment of writing to a PCI
>>>> register using one of "store byte-reversed" instructions.
>>>
>>> So now you want iowrite32() on little endian and iowrite32be() on big
>>> endian, the former does a cpu_to_le32 (which is a nop on little endian)
>>> and the latter does a cpu_to_be32 (which is a nop on big endian)...
>>> should we just be using __raw_writel() on both?
>>
>>
>> We can do that too. The beauty of iowrite32be on ppc64 is that it does not
>> swap and write separately, it is implemented via the "Store Word
>> Byte-Reverse Indexed X-form" single instruction.
>>
>> And some archs (do not know which ones) may add memory barriers in their
>> implementations of ioread/iowrite. __raw_writel is too raw :)
>>
>>>  There doesn't actually
>>> seem to be any change in behavior here, it just eliminates back-to-back
>>> byte swaps, which are a nop on x86, but not power, right?
>>
>> Exactly. No dependency for QEMU.
> 
> How about that:
> ===
> 
> VFIO exposes BARs to user space as a byte stream so userspace can
> read it using pread()/pwrite(). Since this is a byte stream, VFIO should
> not do byte swapping and simply return values as it gets them from
> PCI device.
> 
> Instead, the existing code assumes that byte stream in read/write is
> little-endian and it fixes endianness for values which it passes to
> ioreadXX/iowriteXX helpers in native format. The IO helpers do swapping
> again. Since both byte swaps are nops on little-endian host, this works.
> 
> This also works for big endian but rather by an accident: it reads 4 bytes
> from the stream (@val is big endian), converts to CPU format (which should
> be big endian) as it was little endian (and @val becomes actually little
> endian) and calls iowrite32() which does swapping on big endian
> system again. So byte swap gets cancelled, __raw_writel() receives
> a native value and then
> *(volatile unsigned int __force *)PCI_FIX_ADDR(addr) = v;
> just does the right thing.

I am wrong here, sorry. This is what happens when you watch soccer between
2am and 4am :)


> 
> This removes byte swaps and makes use of ioread32be/iowrite32be
> (and 16bit versions) which do explicit byte swapping at the moment
> of write to a PCI register. PPC64 uses a special "Store Word
> Byte-Reverse Indexed X-form" instruction which does swap and store.

No swapping is done here if we use ioread32be as it calls in_be32 and that
animal does "lwz" which is simple load from memory.

So @val (16/32 bit variable on stack) will have different values on LE and
BE but since we do not handle it the host and just memcpy it to the buffer,
nothing breaks here.


So it should be like this:
===
VFIO exposes BARs to user space as a byte stream so userspace can
read it using pread()/pwrite(). Since this is a byte stream, VFIO should
not do byte swapping and simply return values as it gets them from
PCI device and copy_to_user will save bytes in the correct
same true for writes.

Instead, the existing code assumes that byte stream in read/write is
little-endian and it fixes endianness for values which it passes to
ioreadXX/iowriteXX helpers in native format. The IO helpers do swapping
again. Since both byte swaps are nops on little-endian host, this works.

This also works for big endian but rather by an accident: it reads 4 bytes
from the stream (@val is big endian), converts to CPU format (which should
be big endian) as it was little endian (and @val becomes actually little
endian) and calls iowrite32() which does swapping on big endian
system again. So byte swap in the host gets cancelled and __raw_writel()
writes the value which was swapped originally by the guest.

This removes byte swaps and makes use of ioread32be/iowrite32be
(and 16bit versions) which do not do byte swap on BE hosts.
For LE hosts, ioread32/iowrite32 are still used.

===


> ===
> 
> any better?
> 
> 
> 
> 
>>>> Suggested-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
>>>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>>>> ---
>>>>  drivers/vfio/pci/vfio_pci_rdwr.c | 20 ++++++++++++++++----
>>>>  1 file changed, 16 insertions(+), 4 deletions(-)
>>>>
>>>> diff --git a/drivers/vfio/pci/vfio_pci_rdwr.c b/drivers/vfio/pci/vfio_pci_rdwr.c
>>>> index 210db24..f363b5a 100644
>>>> --- a/drivers/vfio/pci/vfio_pci_rdwr.c
>>>> +++ b/drivers/vfio/pci/vfio_pci_rdwr.c
>>>> @@ -21,6 +21,18 @@
>>>>  
>>>>  #include "vfio_pci_private.h"
>>>>  
>>>> +#ifdef __BIG_ENDIAN__
>>>> +#define ioread16_native		ioread16be
>>>> +#define ioread32_native		ioread32be
>>>> +#define iowrite16_native	iowrite16be
>>>> +#define iowrite32_native	iowrite32be
>>>> +#else
>>>> +#define ioread16_native		ioread16
>>>> +#define ioread32_native		ioread32
>>>> +#define iowrite16_native	iowrite16
>>>> +#define iowrite32_native	iowrite32
>>>> +#endif
>>>> +
>>>>  /*
>>>>   * Read or write from an __iomem region (MMIO or I/O port) with an excluded
>>>>   * range which is inaccessible.  The excluded range drops writes and fills
>>>> @@ -50,9 +62,9 @@ static ssize_t do_io_rw(void __iomem *io, char __user *buf,
>>>>  				if (copy_from_user(&val, buf, 4))
>>>>  					return -EFAULT;
>>>>  
>>>> -				iowrite32(le32_to_cpu(val), io + off);
>>>> +				iowrite32_native(val, io + off);
>>>>  			} else {
>>>> -				val = cpu_to_le32(ioread32(io + off));
>>>> +				val = ioread32_native(io + off);
>>>>  
>>>>  				if (copy_to_user(buf, &val, 4))
>>>>  					return -EFAULT;
>>>> @@ -66,9 +78,9 @@ static ssize_t do_io_rw(void __iomem *io, char __user *buf,
>>>>  				if (copy_from_user(&val, buf, 2))
>>>>  					return -EFAULT;
>>>>  
>>>> -				iowrite16(le16_to_cpu(val), io + off);
>>>> +				iowrite16_native(val, io + off);
>>>>  			} else {
>>>> -				val = cpu_to_le16(ioread16(io + off));
>>>> +				val = ioread16_native(io + off);
>>>>  
>>>>  				if (copy_to_user(buf, &val, 2))
>>>>  					return -EFAULT;
>>>
>>>
>>>
>>
>>
> 
> 


-- 
Alexey

^ permalink raw reply	[flat|nested] 26+ messages in thread

* RE: [PATCH] vfio: Fix endianness handling for emulated BARs
  2014-06-19  3:48       ` Alexey Kardashevskiy
@ 2014-06-19  5:30         ` Bharat.Bhushan
  2014-06-19  6:17           ` Alexey Kardashevskiy
  2014-06-20  3:21         ` Alex Williamson
  1 sibling, 1 reply; 26+ messages in thread
From: Bharat.Bhushan @ 2014-06-19  5:30 UTC (permalink / raw)
  To: Alexey Kardashevskiy, Alex Williamson
  Cc: Alexander Graf, linuxppc-dev, Nikunj A Dadhania, kvm, linux-kernel

DQoNCj4gLS0tLS1PcmlnaW5hbCBNZXNzYWdlLS0tLS0NCj4gRnJvbTogTGludXhwcGMtZGV2IFtt
YWlsdG86bGludXhwcGMtZGV2LQ0KPiBib3VuY2VzK2JoYXJhdC5iaHVzaGFuPWZyZWVzY2FsZS5j
b21AbGlzdHMub3psYWJzLm9yZ10gT24gQmVoYWxmIE9mIEFsZXhleQ0KPiBLYXJkYXNoZXZza2l5
DQo+IFNlbnQ6IFRodXJzZGF5LCBKdW5lIDE5LCAyMDE0IDk6MTggQU0NCj4gVG86IEFsZXggV2ls
bGlhbXNvbg0KPiBDYzoga3ZtQHZnZXIua2VybmVsLm9yZzsgTmlrdW5qIEEgRGFkaGFuaWE7IGxp
bnV4LWtlcm5lbEB2Z2VyLmtlcm5lbC5vcmc7DQo+IEFsZXhhbmRlciBHcmFmOyBsaW51eHBwYy1k
ZXZAbGlzdHMub3psYWJzLm9yZw0KPiBTdWJqZWN0OiBSZTogW1BBVENIXSB2ZmlvOiBGaXggZW5k
aWFubmVzcyBoYW5kbGluZyBmb3IgZW11bGF0ZWQgQkFScw0KPiANCj4gT24gMDYvMTkvMjAxNCAx
MTo1MCBBTSwgQWxleGV5IEthcmRhc2hldnNraXkgd3JvdGU6DQo+ID4gT24gMDYvMTkvMjAxNCAx
MDo1MCBBTSwgQWxleGV5IEthcmRhc2hldnNraXkgd3JvdGU6DQo+ID4+IE9uIDA2LzE5LzIwMTQg
MDQ6MzUgQU0sIEFsZXggV2lsbGlhbXNvbiB3cm90ZToNCj4gPj4+IE9uIFdlZCwgMjAxNC0wNi0x
OCBhdCAyMTozNiArMTAwMCwgQWxleGV5IEthcmRhc2hldnNraXkgd3JvdGU6DQo+ID4+Pj4gVkZJ
TyBleHBvc2VzIEJBUnMgdG8gdXNlciBzcGFjZSBhcyBhIGJ5dGUgc3RyZWFtIHNvIHVzZXJzcGFj
ZSBjYW4NCj4gPj4+PiByZWFkIGl0IHVzaW5nIHByZWFkKCkvcHdyaXRlKCkuIFNpbmNlIHRoaXMg
aXMgYSBieXRlIHN0cmVhbSwgVkZJTw0KPiA+Pj4+IHNob3VsZCBub3QgZG8gYnl0ZSBzd2FwcGlu
ZyBhbmQgc2ltcGx5IHJldHVybiB2YWx1ZXMgYXMgaXQgZ2V0cw0KPiA+Pj4+IHRoZW0gZnJvbSBQ
Q0kgZGV2aWNlLg0KPiA+Pj4+DQo+ID4+Pj4gSW5zdGVhZCwgdGhlIGV4aXN0aW5nIGNvZGUgYXNz
dW1lcyB0aGF0IGJ5dGUgc3RyZWFtIGluIHJlYWQvd3JpdGUNCj4gPj4+PiBpcyBsaXR0bGUtZW5k
aWFuIGFuZCBpdCBmaXhlcyBlbmRpYW5uZXNzIGZvciB2YWx1ZXMgd2hpY2ggaXQgcGFzc2VzDQo+
ID4+Pj4gdG8gaW9yZWFkWFgvaW93cml0ZVhYIGhlbHBlcnMuIFRoaXMgd29ya3MgZm9yIGxpdHRs
ZS1lbmRpYW4gYXMgUENJDQo+ID4+Pj4gaXMgbGl0dGxlIGVuZGlhbiBhbmQgbGUzMl90b19jcHUv
Li4uIGFyZSBzdHVicy4NCj4gPj4+DQo+ID4+PiB2ZmlvIHJlYWQzMjoNCj4gPj4+DQo+ID4+PiB2
YWwgPSBjcHVfdG9fbGUzMihpb3JlYWQzMihpbyArIG9mZikpOw0KPiA+Pj4NCj4gPj4+IFdoZXJl
IHRoZSB0eXBpY2FsIHg4NiBjYXNlLCBpb3JlYWQzMiBpczoNCj4gPj4+DQo+ID4+PiAjZGVmaW5l
IGlvcmVhZDMyKGFkZHIpICAgICAgICAgIHJlYWRsKGFkZHIpDQo+ID4+Pg0KPiA+Pj4gYW5kIHJl
YWRsIGlzOg0KPiA+Pj4NCj4gPj4+IF9fbGUzMl90b19jcHUoX19yYXdfcmVhZGwoYWRkcikpOw0K
PiA+Pj4NCj4gPj4+IFNvIHdlIGRvIGNhbmNlbGluZyBieXRlIHN3YXBzLCB3aGljaCBhcmUgYm90
aCBub3BzIG9uIHg4NiwgYW5kIGVuZA0KPiA+Pj4gdXAgcmV0dXJuaW5nIGRldmljZSBlbmRpYW4s
IHdoaWNoIHdlIGFzc3VtZSBpcyBsaXR0bGUgZW5kaWFuLg0KPiA+Pj4NCj4gPj4+IHZmaW8gd3Jp
dGUzMiBpcyBzaW1pbGFyOg0KPiA+Pj4NCj4gPj4+IGlvd3JpdGUzMihsZTMyX3RvX2NwdSh2YWwp
LCBpbyArIG9mZik7DQo+ID4+Pg0KPiA+Pj4gVGhlIGltcGxpY2l0IGNwdV90b19sZTMyIG9mIGlv
d3JpdGUzMigpIGFuZCBvdXIgZXhwbGljaXQgc3dhcCBjYW5jZWwNCj4gPj4+IG91dCwgc28gaW5w
dXQgZGF0YSBpcyBkZXZpY2UgZW5kaWFuLCB3aGljaCBpcyBhc3N1bWVkIGxpdHRsZS4NCj4gPj4+
DQo+ID4+Pj4gVGhpcyBhbHNvIHdvcmtzIGZvciBiaWcgZW5kaWFuIGJ1dCByYXRoZXIgYnkgYW4g
YWNjaWRlbnQ6IGl0IHJlYWRzDQo+ID4+Pj4gNCBieXRlcyBmcm9tIHRoZSBzdHJlYW0gKEB2YWwg
aXMgYmlnIGVuZGlhbiksIGNvbnZlcnRzIHRvIENQVQ0KPiA+Pj4+IGZvcm1hdCAod2hpY2ggc2hv
dWxkIGJlIGJpZyBlbmRpYW4pIGFzIGl0IHdhcyBsaXR0bGUgZW5kaWFuIChAdmFsDQo+ID4+Pj4g
YmVjb21lcyBhY3R1YWxseSBsaXR0bGUNCj4gPj4+PiBlbmRpYW4pIGFuZCBjYWxscyBpb3dyaXRl
MzIoKSB3aGljaCBkb2VzIG5vdCBkbyBzd2FwcGluZyBvbiBiaWcNCj4gPj4+PiBlbmRpYW4gc3lz
dGVtLg0KPiA+Pj4NCj4gPj4+IFJlYWxseT8NCj4gPj4+DQo+ID4+PiBJbiBhcmNoL3Bvd2VycGMv
a2VybmVsL2lvbWFwLmMgaW93cml0ZTMyKCkgaXMganVzdCBhIHdyYXBwZXIgYXJvdW5kDQo+ID4+
PiB3cml0ZWwoKSwgd2hpY2ggc2VlbXMgdG8gdXNlIHRoZSBnZW5lcmljIGltcGxlbWVudGF0aW9u
LCB3aGljaCBkb2VzDQo+ID4+PiBpbmNsdWRlIGEgY3B1X3RvX2xlMzIuDQo+ID4+DQo+ID4+DQo+
ID4+IE91Y2gsIHdyb25nIGNvbW1lbnQuIGlvd3JpdGUzMigpIGRvZXMgc3dhcHBpbmcuIE15IGJh
ZC4NCj4gPj4NCj4gPj4NCj4gPj4+DQo+ID4+PiBJIGFsc28gc2VlIG90aGVyIGJpZyBlbmRpYW4g
YXJjaHMgbGlrZSBwYXJpc2MgZG9pbmcgY3B1X3RvX2xlMzIgb24NCj4gPj4+IGlvd3JpdGUzMiwg
c28gSSBkb24ndCB0aGluayB0aGlzIHN0YXRlbWVudCBpcyB0cnVlLiAgSSBpbWFnaW5lIGl0J3MN
Cj4gPj4+IHByb2JhYmx5IHdvcmtpbmcgZm9yIHlvdSBiZWNhdXNlIHRoZSBzd2FwIGNhbmNlbC4N
Cj4gPj4+DQo+ID4+Pj4gVGhpcyByZW1vdmVzIGJ5dGUgc3dhcHBpbmcgYW5kIG1ha2VzIHVzZSBp
b3JlYWQzMmJlL2lvd3JpdGUzMmJlDQo+ID4+Pj4gKGFuZCAxNmJpdCB2ZXJzaW9ucykgb24gYmln
LWVuZGlhbiBzeXN0ZW1zLiBUaGUgImJlIiBoZWxwZXJzIHRha2UNCj4gPj4+PiBuYXRpdmUgZW5k
aWFuIHZhbHVlcyBhbmQgZG8gc3dhcHBpbmcgYXQgdGhlIG1vbWVudCBvZiB3cml0aW5nIHRvIGEN
Cj4gPj4+PiBQQ0kgcmVnaXN0ZXIgdXNpbmcgb25lIG9mICJzdG9yZSBieXRlLXJldmVyc2VkIiBp
bnN0cnVjdGlvbnMuDQo+ID4+Pg0KPiA+Pj4gU28gbm93IHlvdSB3YW50IGlvd3JpdGUzMigpIG9u
IGxpdHRsZSBlbmRpYW4gYW5kIGlvd3JpdGUzMmJlKCkgb24NCj4gPj4+IGJpZyBlbmRpYW4sIHRo
ZSBmb3JtZXIgZG9lcyBhIGNwdV90b19sZTMyICh3aGljaCBpcyBhIG5vcCBvbiBsaXR0bGUNCj4g
Pj4+IGVuZGlhbikgYW5kIHRoZSBsYXR0ZXIgZG9lcyBhIGNwdV90b19iZTMyICh3aGljaCBpcyBh
IG5vcCBvbiBiaWcgZW5kaWFuKS4uLg0KPiA+Pj4gc2hvdWxkIHdlIGp1c3QgYmUgdXNpbmcgX19y
YXdfd3JpdGVsKCkgb24gYm90aD8NCj4gPj4NCj4gPj4NCj4gPj4gV2UgY2FuIGRvIHRoYXQgdG9v
LiBUaGUgYmVhdXR5IG9mIGlvd3JpdGUzMmJlIG9uIHBwYzY0IGlzIHRoYXQgaXQNCj4gPj4gZG9l
cyBub3Qgc3dhcCBhbmQgd3JpdGUgc2VwYXJhdGVseSwgaXQgaXMgaW1wbGVtZW50ZWQgdmlhIHRo
ZSAiU3RvcmUNCj4gPj4gV29yZCBCeXRlLVJldmVyc2UgSW5kZXhlZCBYLWZvcm0iIHNpbmdsZSBp
bnN0cnVjdGlvbi4NCj4gPj4NCj4gPj4gQW5kIHNvbWUgYXJjaHMgKGRvIG5vdCBrbm93IHdoaWNo
IG9uZXMpIG1heSBhZGQgbWVtb3J5IGJhcnJpZXJzIGluDQo+ID4+IHRoZWlyIGltcGxlbWVudGF0
aW9ucyBvZiBpb3JlYWQvaW93cml0ZS4gX19yYXdfd3JpdGVsIGlzIHRvbyByYXcgOikNCj4gPj4N
Cj4gPj4+ICBUaGVyZSBkb2Vzbid0IGFjdHVhbGx5DQo+ID4+PiBzZWVtIHRvIGJlIGFueSBjaGFu
Z2UgaW4gYmVoYXZpb3IgaGVyZSwgaXQganVzdCBlbGltaW5hdGVzDQo+ID4+PiBiYWNrLXRvLWJh
Y2sgYnl0ZSBzd2Fwcywgd2hpY2ggYXJlIGEgbm9wIG9uIHg4NiwgYnV0IG5vdCBwb3dlciwgcmln
aHQ/DQo+ID4+DQo+ID4+IEV4YWN0bHkuIE5vIGRlcGVuZGVuY3kgZm9yIFFFTVUuDQo+ID4NCj4g
PiBIb3cgYWJvdXQgdGhhdDoNCj4gPiA9PT0NCj4gPg0KPiA+IFZGSU8gZXhwb3NlcyBCQVJzIHRv
IHVzZXIgc3BhY2UgYXMgYSBieXRlIHN0cmVhbSBzbyB1c2Vyc3BhY2UgY2FuIHJlYWQNCj4gPiBp
dCB1c2luZyBwcmVhZCgpL3B3cml0ZSgpLiBTaW5jZSB0aGlzIGlzIGEgYnl0ZSBzdHJlYW0sIFZG
SU8gc2hvdWxkDQo+ID4gbm90IGRvIGJ5dGUgc3dhcHBpbmcgYW5kIHNpbXBseSByZXR1cm4gdmFs
dWVzIGFzIGl0IGdldHMgdGhlbSBmcm9tIFBDSQ0KPiA+IGRldmljZS4NCj4gPg0KPiA+IEluc3Rl
YWQsIHRoZSBleGlzdGluZyBjb2RlIGFzc3VtZXMgdGhhdCBieXRlIHN0cmVhbSBpbiByZWFkL3dy
aXRlIGlzDQo+ID4gbGl0dGxlLWVuZGlhbiBhbmQgaXQgZml4ZXMgZW5kaWFubmVzcyBmb3IgdmFs
dWVzIHdoaWNoIGl0IHBhc3NlcyB0bw0KPiA+IGlvcmVhZFhYL2lvd3JpdGVYWCBoZWxwZXJzIGlu
IG5hdGl2ZSBmb3JtYXQuIFRoZSBJTyBoZWxwZXJzIGRvDQo+ID4gc3dhcHBpbmcgYWdhaW4uIFNp
bmNlIGJvdGggYnl0ZSBzd2FwcyBhcmUgbm9wcyBvbiBsaXR0bGUtZW5kaWFuIGhvc3QsIHRoaXMN
Cj4gd29ya3MuDQo+ID4NCj4gPiBUaGlzIGFsc28gd29ya3MgZm9yIGJpZyBlbmRpYW4gYnV0IHJh
dGhlciBieSBhbiBhY2NpZGVudDogaXQgcmVhZHMgNA0KPiA+IGJ5dGVzIGZyb20gdGhlIHN0cmVh
bSAoQHZhbCBpcyBiaWcgZW5kaWFuKSwgY29udmVydHMgdG8gQ1BVIGZvcm1hdA0KPiA+ICh3aGlj
aCBzaG91bGQgYmUgYmlnIGVuZGlhbikgYXMgaXQgd2FzIGxpdHRsZSBlbmRpYW4gKGFuZCBAdmFs
IGJlY29tZXMNCj4gPiBhY3R1YWxseSBsaXR0bGUNCj4gPiBlbmRpYW4pIGFuZCBjYWxscyBpb3dy
aXRlMzIoKSB3aGljaCBkb2VzIHN3YXBwaW5nIG9uIGJpZyBlbmRpYW4gc3lzdGVtDQo+ID4gYWdh
aW4uIFNvIGJ5dGUgc3dhcCBnZXRzIGNhbmNlbGxlZCwgX19yYXdfd3JpdGVsKCkgcmVjZWl2ZXMg
YSBuYXRpdmUNCj4gPiB2YWx1ZSBhbmQgdGhlbiAqKHZvbGF0aWxlIHVuc2lnbmVkIGludCBfX2Zv
cmNlICopUENJX0ZJWF9BRERSKGFkZHIpID0NCj4gPiB2OyBqdXN0IGRvZXMgdGhlIHJpZ2h0IHRo
aW5nLg0KPiANCj4gSSBhbSB3cm9uZyBoZXJlLCBzb3JyeS4gVGhpcyBpcyB3aGF0IGhhcHBlbnMg
d2hlbiB5b3Ugd2F0Y2ggc29jY2VyIGJldHdlZW4gMmFtDQo+IGFuZCA0YW0gOikNCj4gDQo+IA0K
PiA+DQo+ID4gVGhpcyByZW1vdmVzIGJ5dGUgc3dhcHMgYW5kIG1ha2VzIHVzZSBvZiBpb3JlYWQz
MmJlL2lvd3JpdGUzMmJlIChhbmQNCj4gPiAxNmJpdCB2ZXJzaW9ucykgd2hpY2ggZG8gZXhwbGlj
aXQgYnl0ZSBzd2FwcGluZyBhdCB0aGUgbW9tZW50IG9mIHdyaXRlDQo+ID4gdG8gYSBQQ0kgcmVn
aXN0ZXIuIFBQQzY0IHVzZXMgYSBzcGVjaWFsICJTdG9yZSBXb3JkIEJ5dGUtUmV2ZXJzZQ0KPiA+
IEluZGV4ZWQgWC1mb3JtIiBpbnN0cnVjdGlvbiB3aGljaCBkb2VzIHN3YXAgYW5kIHN0b3JlLg0K
PiANCj4gTm8gc3dhcHBpbmcgaXMgZG9uZSBoZXJlIGlmIHdlIHVzZSBpb3JlYWQzMmJlIGFzIGl0
IGNhbGxzIGluX2JlMzIgYW5kIHRoYXQNCj4gYW5pbWFsIGRvZXMgImx3eiIgd2hpY2ggaXMgc2lt
cGxlIGxvYWQgZnJvbSBtZW1vcnkuDQo+IA0KPiBTbyBAdmFsICgxNi8zMiBiaXQgdmFyaWFibGUg
b24gc3RhY2spIHdpbGwgaGF2ZSBkaWZmZXJlbnQgdmFsdWVzIG9uIExFIGFuZCBCRQ0KPiBidXQg
c2luY2Ugd2UgZG8gbm90IGhhbmRsZSBpdCB0aGUgaG9zdCBhbmQganVzdCBtZW1jcHkgaXQgdG8g
dGhlIGJ1ZmZlciwgbm90aGluZw0KPiBicmVha3MgaGVyZS4NCj4gDQo+IA0KPiBTbyBpdCBzaG91
bGQgYmUgbGlrZSB0aGlzOg0KPiA9PT0NCj4gVkZJTyBleHBvc2VzIEJBUnMgdG8gdXNlciBzcGFj
ZSBhcyBhIGJ5dGUgc3RyZWFtIHNvIHVzZXJzcGFjZSBjYW4gcmVhZCBpdCB1c2luZw0KPiBwcmVh
ZCgpL3B3cml0ZSgpLiBTaW5jZSB0aGlzIGlzIGEgYnl0ZSBzdHJlYW0sIFZGSU8gc2hvdWxkIG5v
dCBkbyBieXRlIHN3YXBwaW5nDQo+IGFuZCBzaW1wbHkgcmV0dXJuIHZhbHVlcyBhcyBpdCBnZXRz
IHRoZW0gZnJvbSBQQ0kgZGV2aWNlIGFuZCBjb3B5X3RvX3VzZXIgd2lsbA0KPiBzYXZlIGJ5dGVz
IGluIHRoZSBjb3JyZWN0IHNhbWUgdHJ1ZSBmb3Igd3JpdGVzLg0KDQoiIGNvcHlfdG9fdXNlciB3
aWxsIHNhdmUgYnl0ZXMgaW4gdGhlIGNvcnJlY3QiIC0tLT8gLS0tICJzYW1lIHRydWUgZm9yIHdy
aXRlcyIuDQoNClRoYW5rcw0KLUJoYXJhdA0KDQo+IA0KPiBJbnN0ZWFkLCB0aGUgZXhpc3Rpbmcg
Y29kZSBhc3N1bWVzIHRoYXQgYnl0ZSBzdHJlYW0gaW4gcmVhZC93cml0ZSBpcyBsaXR0bGUtDQo+
IGVuZGlhbiBhbmQgaXQgZml4ZXMgZW5kaWFubmVzcyBmb3IgdmFsdWVzIHdoaWNoIGl0IHBhc3Nl
cyB0byBpb3JlYWRYWC9pb3dyaXRlWFgNCj4gaGVscGVycyBpbiBuYXRpdmUgZm9ybWF0LiBUaGUg
SU8gaGVscGVycyBkbyBzd2FwcGluZyBhZ2Fpbi4gU2luY2UgYm90aCBieXRlDQo+IHN3YXBzIGFy
ZSBub3BzIG9uIGxpdHRsZS1lbmRpYW4gaG9zdCwgdGhpcyB3b3Jrcy4NCg0KPiANCj4gVGhpcyBh
bHNvIHdvcmtzIGZvciBiaWcgZW5kaWFuIGJ1dCByYXRoZXIgYnkgYW4gYWNjaWRlbnQ6IGl0IHJl
YWRzIDQgYnl0ZXMgZnJvbQ0KPiB0aGUgc3RyZWFtIChAdmFsIGlzIGJpZyBlbmRpYW4pLCBjb252
ZXJ0cyB0byBDUFUgZm9ybWF0ICh3aGljaCBzaG91bGQgYmUgYmlnDQo+IGVuZGlhbikgYXMgaXQg
d2FzIGxpdHRsZSBlbmRpYW4gKGFuZCBAdmFsIGJlY29tZXMgYWN0dWFsbHkgbGl0dGxlDQo+IGVu
ZGlhbikgYW5kIGNhbGxzIGlvd3JpdGUzMigpIHdoaWNoIGRvZXMgc3dhcHBpbmcgb24gYmlnIGVu
ZGlhbiBzeXN0ZW0gYWdhaW4uIFNvDQo+IGJ5dGUgc3dhcCBpbiB0aGUgaG9zdCBnZXRzIGNhbmNl
bGxlZCBhbmQgX19yYXdfd3JpdGVsKCkgd3JpdGVzIHRoZSB2YWx1ZSB3aGljaA0KPiB3YXMgc3dh
cHBlZCBvcmlnaW5hbGx5IGJ5IHRoZSBndWVzdC4NCj4gDQo+IFRoaXMgcmVtb3ZlcyBieXRlIHN3
YXBzIGFuZCBtYWtlcyB1c2Ugb2YgaW9yZWFkMzJiZS9pb3dyaXRlMzJiZSAoYW5kIDE2Yml0DQo+
IHZlcnNpb25zKSB3aGljaCBkbyBub3QgZG8gYnl0ZSBzd2FwIG9uIEJFIGhvc3RzLg0KPiBGb3Ig
TEUgaG9zdHMsIGlvcmVhZDMyL2lvd3JpdGUzMiBhcmUgc3RpbGwgdXNlZC4NCj4gDQo+ID09PQ0K
PiANCj4gDQo+ID4gPT09DQo+ID4NCj4gPiBhbnkgYmV0dGVyPw0KPiA+DQo+ID4NCj4gPg0KPiA+
DQo+ID4+Pj4gU3VnZ2VzdGVkLWJ5OiBCZW5qYW1pbiBIZXJyZW5zY2htaWR0IDxiZW5oQGtlcm5l
bC5jcmFzaGluZy5vcmc+DQo+ID4+Pj4gU2lnbmVkLW9mZi1ieTogQWxleGV5IEthcmRhc2hldnNr
aXkgPGFpa0BvemxhYnMucnU+DQo+ID4+Pj4gLS0tDQo+ID4+Pj4gIGRyaXZlcnMvdmZpby9wY2kv
dmZpb19wY2lfcmR3ci5jIHwgMjAgKysrKysrKysrKysrKysrKy0tLS0NCj4gPj4+PiAgMSBmaWxl
IGNoYW5nZWQsIDE2IGluc2VydGlvbnMoKyksIDQgZGVsZXRpb25zKC0pDQo+ID4+Pj4NCj4gPj4+
PiBkaWZmIC0tZ2l0IGEvZHJpdmVycy92ZmlvL3BjaS92ZmlvX3BjaV9yZHdyLmMNCj4gPj4+PiBi
L2RyaXZlcnMvdmZpby9wY2kvdmZpb19wY2lfcmR3ci5jDQo+ID4+Pj4gaW5kZXggMjEwZGIyNC4u
ZjM2M2I1YSAxMDA2NDQNCj4gPj4+PiAtLS0gYS9kcml2ZXJzL3ZmaW8vcGNpL3ZmaW9fcGNpX3Jk
d3IuYw0KPiA+Pj4+ICsrKyBiL2RyaXZlcnMvdmZpby9wY2kvdmZpb19wY2lfcmR3ci5jDQo+ID4+
Pj4gQEAgLTIxLDYgKzIxLDE4IEBADQo+ID4+Pj4NCj4gPj4+PiAgI2luY2x1ZGUgInZmaW9fcGNp
X3ByaXZhdGUuaCINCj4gPj4+Pg0KPiA+Pj4+ICsjaWZkZWYgX19CSUdfRU5ESUFOX18NCj4gPj4+
PiArI2RlZmluZSBpb3JlYWQxNl9uYXRpdmUJCWlvcmVhZDE2YmUNCj4gPj4+PiArI2RlZmluZSBp
b3JlYWQzMl9uYXRpdmUJCWlvcmVhZDMyYmUNCj4gPj4+PiArI2RlZmluZSBpb3dyaXRlMTZfbmF0
aXZlCWlvd3JpdGUxNmJlDQo+ID4+Pj4gKyNkZWZpbmUgaW93cml0ZTMyX25hdGl2ZQlpb3dyaXRl
MzJiZQ0KPiA+Pj4+ICsjZWxzZQ0KPiA+Pj4+ICsjZGVmaW5lIGlvcmVhZDE2X25hdGl2ZQkJaW9y
ZWFkMTYNCj4gPj4+PiArI2RlZmluZSBpb3JlYWQzMl9uYXRpdmUJCWlvcmVhZDMyDQo+ID4+Pj4g
KyNkZWZpbmUgaW93cml0ZTE2X25hdGl2ZQlpb3dyaXRlMTYNCj4gPj4+PiArI2RlZmluZSBpb3dy
aXRlMzJfbmF0aXZlCWlvd3JpdGUzMg0KPiA+Pj4+ICsjZW5kaWYNCj4gPj4+PiArDQo+ID4+Pj4g
IC8qDQo+ID4+Pj4gICAqIFJlYWQgb3Igd3JpdGUgZnJvbSBhbiBfX2lvbWVtIHJlZ2lvbiAoTU1J
TyBvciBJL08gcG9ydCkgd2l0aCBhbg0KPiBleGNsdWRlZA0KPiA+Pj4+ICAgKiByYW5nZSB3aGlj
aCBpcyBpbmFjY2Vzc2libGUuICBUaGUgZXhjbHVkZWQgcmFuZ2UgZHJvcHMgd3JpdGVzDQo+ID4+
Pj4gYW5kIGZpbGxzIEBAIC01MCw5ICs2Miw5IEBAIHN0YXRpYyBzc2l6ZV90IGRvX2lvX3J3KHZv
aWQgX19pb21lbSAqaW8sIGNoYXINCj4gX191c2VyICpidWYsDQo+ID4+Pj4gIAkJCQlpZiAoY29w
eV9mcm9tX3VzZXIoJnZhbCwgYnVmLCA0KSkNCj4gPj4+PiAgCQkJCQlyZXR1cm4gLUVGQVVMVDsN
Cj4gPj4+Pg0KPiA+Pj4+IC0JCQkJaW93cml0ZTMyKGxlMzJfdG9fY3B1KHZhbCksIGlvICsgb2Zm
KTsNCj4gPj4+PiArCQkJCWlvd3JpdGUzMl9uYXRpdmUodmFsLCBpbyArIG9mZik7DQo+ID4+Pj4g
IAkJCX0gZWxzZSB7DQo+ID4+Pj4gLQkJCQl2YWwgPSBjcHVfdG9fbGUzMihpb3JlYWQzMihpbyAr
IG9mZikpOw0KPiA+Pj4+ICsJCQkJdmFsID0gaW9yZWFkMzJfbmF0aXZlKGlvICsgb2ZmKTsNCj4g
Pj4+Pg0KPiA+Pj4+ICAJCQkJaWYgKGNvcHlfdG9fdXNlcihidWYsICZ2YWwsIDQpKQ0KPiA+Pj4+
ICAJCQkJCXJldHVybiAtRUZBVUxUOw0KPiA+Pj4+IEBAIC02Niw5ICs3OCw5IEBAIHN0YXRpYyBz
c2l6ZV90IGRvX2lvX3J3KHZvaWQgX19pb21lbSAqaW8sIGNoYXIgX191c2VyDQo+ICpidWYsDQo+
ID4+Pj4gIAkJCQlpZiAoY29weV9mcm9tX3VzZXIoJnZhbCwgYnVmLCAyKSkNCj4gPj4+PiAgCQkJ
CQlyZXR1cm4gLUVGQVVMVDsNCj4gPj4+Pg0KPiA+Pj4+IC0JCQkJaW93cml0ZTE2KGxlMTZfdG9f
Y3B1KHZhbCksIGlvICsgb2ZmKTsNCj4gPj4+PiArCQkJCWlvd3JpdGUxNl9uYXRpdmUodmFsLCBp
byArIG9mZik7DQo+ID4+Pj4gIAkJCX0gZWxzZSB7DQo+ID4+Pj4gLQkJCQl2YWwgPSBjcHVfdG9f
bGUxNihpb3JlYWQxNihpbyArIG9mZikpOw0KPiA+Pj4+ICsJCQkJdmFsID0gaW9yZWFkMTZfbmF0
aXZlKGlvICsgb2ZmKTsNCj4gPj4+Pg0KPiA+Pj4+ICAJCQkJaWYgKGNvcHlfdG9fdXNlcihidWYs
ICZ2YWwsIDIpKQ0KPiA+Pj4+ICAJCQkJCXJldHVybiAtRUZBVUxUOw0KPiA+Pj4NCj4gPj4+DQo+
ID4+Pg0KPiA+Pg0KPiA+Pg0KPiA+DQo+ID4NCj4gDQo+IA0KPiAtLQ0KPiBBbGV4ZXkNCj4gX19f
X19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX18NCj4gTGludXhwcGMt
ZGV2IG1haWxpbmcgbGlzdA0KPiBMaW51eHBwYy1kZXZAbGlzdHMub3psYWJzLm9yZw0KPiBodHRw
czovL2xpc3RzLm96bGFicy5vcmcvbGlzdGluZm8vbGludXhwcGMtZGV2DQo=

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] vfio: Fix endianness handling for emulated BARs
  2014-06-19  5:30         ` Bharat.Bhushan
@ 2014-06-19  6:17           ` Alexey Kardashevskiy
  0 siblings, 0 replies; 26+ messages in thread
From: Alexey Kardashevskiy @ 2014-06-19  6:17 UTC (permalink / raw)
  To: Bharat.Bhushan, Alex Williamson
  Cc: Alexander Graf, linuxppc-dev, Nikunj A Dadhania, kvm, linux-kernel

On 06/19/2014 03:30 PM, Bharat.Bhushan@freescale.com wrote:
> 
> 
>> -----Original Message-----
>> From: Linuxppc-dev [mailto:linuxppc-dev-
>> bounces+bharat.bhushan=freescale.com@lists.ozlabs.org] On Behalf Of Alexey
>> Kardashevskiy
>> Sent: Thursday, June 19, 2014 9:18 AM
>> To: Alex Williamson
>> Cc: kvm@vger.kernel.org; Nikunj A Dadhania; linux-kernel@vger.kernel.org;
>> Alexander Graf; linuxppc-dev@lists.ozlabs.org
>> Subject: Re: [PATCH] vfio: Fix endianness handling for emulated BARs
>>
>> On 06/19/2014 11:50 AM, Alexey Kardashevskiy wrote:
>>> On 06/19/2014 10:50 AM, Alexey Kardashevskiy wrote:
>>>> On 06/19/2014 04:35 AM, Alex Williamson wrote:
>>>>> On Wed, 2014-06-18 at 21:36 +1000, Alexey Kardashevskiy wrote:
>>>>>> VFIO exposes BARs to user space as a byte stream so userspace can
>>>>>> read it using pread()/pwrite(). Since this is a byte stream, VFIO
>>>>>> should not do byte swapping and simply return values as it gets
>>>>>> them from PCI device.
>>>>>>
>>>>>> Instead, the existing code assumes that byte stream in read/write
>>>>>> is little-endian and it fixes endianness for values which it passes
>>>>>> to ioreadXX/iowriteXX helpers. This works for little-endian as PCI
>>>>>> is little endian and le32_to_cpu/... are stubs.
>>>>>
>>>>> vfio read32:
>>>>>
>>>>> val = cpu_to_le32(ioread32(io + off));
>>>>>
>>>>> Where the typical x86 case, ioread32 is:
>>>>>
>>>>> #define ioread32(addr)          readl(addr)
>>>>>
>>>>> and readl is:
>>>>>
>>>>> __le32_to_cpu(__raw_readl(addr));
>>>>>
>>>>> So we do canceling byte swaps, which are both nops on x86, and end
>>>>> up returning device endian, which we assume is little endian.
>>>>>
>>>>> vfio write32 is similar:
>>>>>
>>>>> iowrite32(le32_to_cpu(val), io + off);
>>>>>
>>>>> The implicit cpu_to_le32 of iowrite32() and our explicit swap cancel
>>>>> out, so input data is device endian, which is assumed little.
>>>>>
>>>>>> This also works for big endian but rather by an accident: it reads
>>>>>> 4 bytes from the stream (@val is big endian), converts to CPU
>>>>>> format (which should be big endian) as it was little endian (@val
>>>>>> becomes actually little
>>>>>> endian) and calls iowrite32() which does not do swapping on big
>>>>>> endian system.
>>>>>
>>>>> Really?
>>>>>
>>>>> In arch/powerpc/kernel/iomap.c iowrite32() is just a wrapper around
>>>>> writel(), which seems to use the generic implementation, which does
>>>>> include a cpu_to_le32.
>>>>
>>>>
>>>> Ouch, wrong comment. iowrite32() does swapping. My bad.
>>>>
>>>>
>>>>>
>>>>> I also see other big endian archs like parisc doing cpu_to_le32 on
>>>>> iowrite32, so I don't think this statement is true.  I imagine it's
>>>>> probably working for you because the swap cancel.
>>>>>
>>>>>> This removes byte swapping and makes use ioread32be/iowrite32be
>>>>>> (and 16bit versions) on big-endian systems. The "be" helpers take
>>>>>> native endian values and do swapping at the moment of writing to a
>>>>>> PCI register using one of "store byte-reversed" instructions.
>>>>>
>>>>> So now you want iowrite32() on little endian and iowrite32be() on
>>>>> big endian, the former does a cpu_to_le32 (which is a nop on little
>>>>> endian) and the latter does a cpu_to_be32 (which is a nop on big endian)...
>>>>> should we just be using __raw_writel() on both?
>>>>
>>>>
>>>> We can do that too. The beauty of iowrite32be on ppc64 is that it
>>>> does not swap and write separately, it is implemented via the "Store
>>>> Word Byte-Reverse Indexed X-form" single instruction.
>>>>
>>>> And some archs (do not know which ones) may add memory barriers in
>>>> their implementations of ioread/iowrite. __raw_writel is too raw :)
>>>>
>>>>>  There doesn't actually
>>>>> seem to be any change in behavior here, it just eliminates
>>>>> back-to-back byte swaps, which are a nop on x86, but not power, right?
>>>>
>>>> Exactly. No dependency for QEMU.
>>>
>>> How about that:
>>> ===
>>>
>>> VFIO exposes BARs to user space as a byte stream so userspace can read
>>> it using pread()/pwrite(). Since this is a byte stream, VFIO should
>>> not do byte swapping and simply return values as it gets them from PCI
>>> device.
>>>
>>> Instead, the existing code assumes that byte stream in read/write is
>>> little-endian and it fixes endianness for values which it passes to
>>> ioreadXX/iowriteXX helpers in native format. The IO helpers do
>>> swapping again. Since both byte swaps are nops on little-endian host, this
>> works.
>>>
>>> This also works for big endian but rather by an accident: it reads 4
>>> bytes from the stream (@val is big endian), converts to CPU format
>>> (which should be big endian) as it was little endian (and @val becomes
>>> actually little
>>> endian) and calls iowrite32() which does swapping on big endian system
>>> again. So byte swap gets cancelled, __raw_writel() receives a native
>>> value and then *(volatile unsigned int __force *)PCI_FIX_ADDR(addr) =
>>> v; just does the right thing.
>>
>> I am wrong here, sorry. This is what happens when you watch soccer between 2am
>> and 4am :)
>>
>>
>>>
>>> This removes byte swaps and makes use of ioread32be/iowrite32be (and
>>> 16bit versions) which do explicit byte swapping at the moment of write
>>> to a PCI register. PPC64 uses a special "Store Word Byte-Reverse
>>> Indexed X-form" instruction which does swap and store.
>>
>> No swapping is done here if we use ioread32be as it calls in_be32 and that
>> animal does "lwz" which is simple load from memory.
>>
>> So @val (16/32 bit variable on stack) will have different values on LE and BE
>> but since we do not handle it the host and just memcpy it to the buffer, nothing
>> breaks here.
>>
>>
>> So it should be like this:
>> ===
>> VFIO exposes BARs to user space as a byte stream so userspace can read it using
>> pread()/pwrite(). Since this is a byte stream, VFIO should not do byte swapping
>> and simply return values as it gets them from PCI device and copy_to_user will
>> save bytes in the correct same true for writes.
> 
> " copy_to_user will save bytes in the correct" ---? --- "same true for writes".


Oops. "correct order" is it.




> 
> Thanks
> -Bharat
> 
>>
>> Instead, the existing code assumes that byte stream in read/write is little-
>> endian and it fixes endianness for values which it passes to ioreadXX/iowriteXX
>> helpers in native format. The IO helpers do swapping again. Since both byte
>> swaps are nops on little-endian host, this works.
> 
>>
>> This also works for big endian but rather by an accident: it reads 4 bytes from
>> the stream (@val is big endian), converts to CPU format (which should be big
>> endian) as it was little endian (and @val becomes actually little
>> endian) and calls iowrite32() which does swapping on big endian system again. So
>> byte swap in the host gets cancelled and __raw_writel() writes the value which
>> was swapped originally by the guest.
>>
>> This removes byte swaps and makes use of ioread32be/iowrite32be (and 16bit
>> versions) which do not do byte swap on BE hosts.
>> For LE hosts, ioread32/iowrite32 are still used.
>>
>> ===
>>
>>
>>> ===
>>>
>>> any better?
>>>
>>>
>>>
>>>
>>>>>> Suggested-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
>>>>>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>>>>>> ---
>>>>>>  drivers/vfio/pci/vfio_pci_rdwr.c | 20 ++++++++++++++++----
>>>>>>  1 file changed, 16 insertions(+), 4 deletions(-)
>>>>>>
>>>>>> diff --git a/drivers/vfio/pci/vfio_pci_rdwr.c
>>>>>> b/drivers/vfio/pci/vfio_pci_rdwr.c
>>>>>> index 210db24..f363b5a 100644
>>>>>> --- a/drivers/vfio/pci/vfio_pci_rdwr.c
>>>>>> +++ b/drivers/vfio/pci/vfio_pci_rdwr.c
>>>>>> @@ -21,6 +21,18 @@
>>>>>>
>>>>>>  #include "vfio_pci_private.h"
>>>>>>
>>>>>> +#ifdef __BIG_ENDIAN__
>>>>>> +#define ioread16_native		ioread16be
>>>>>> +#define ioread32_native		ioread32be
>>>>>> +#define iowrite16_native	iowrite16be
>>>>>> +#define iowrite32_native	iowrite32be
>>>>>> +#else
>>>>>> +#define ioread16_native		ioread16
>>>>>> +#define ioread32_native		ioread32
>>>>>> +#define iowrite16_native	iowrite16
>>>>>> +#define iowrite32_native	iowrite32
>>>>>> +#endif
>>>>>> +
>>>>>>  /*
>>>>>>   * Read or write from an __iomem region (MMIO or I/O port) with an
>> excluded
>>>>>>   * range which is inaccessible.  The excluded range drops writes
>>>>>> and fills @@ -50,9 +62,9 @@ static ssize_t do_io_rw(void __iomem *io, char
>> __user *buf,
>>>>>>  				if (copy_from_user(&val, buf, 4))
>>>>>>  					return -EFAULT;
>>>>>>
>>>>>> -				iowrite32(le32_to_cpu(val), io + off);
>>>>>> +				iowrite32_native(val, io + off);
>>>>>>  			} else {
>>>>>> -				val = cpu_to_le32(ioread32(io + off));
>>>>>> +				val = ioread32_native(io + off);
>>>>>>
>>>>>>  				if (copy_to_user(buf, &val, 4))
>>>>>>  					return -EFAULT;
>>>>>> @@ -66,9 +78,9 @@ static ssize_t do_io_rw(void __iomem *io, char __user
>> *buf,
>>>>>>  				if (copy_from_user(&val, buf, 2))
>>>>>>  					return -EFAULT;
>>>>>>
>>>>>> -				iowrite16(le16_to_cpu(val), io + off);
>>>>>> +				iowrite16_native(val, io + off);
>>>>>>  			} else {
>>>>>> -				val = cpu_to_le16(ioread16(io + off));
>>>>>> +				val = ioread16_native(io + off);
>>>>>>
>>>>>>  				if (copy_to_user(buf, &val, 2))
>>>>>>  					return -EFAULT;
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>> --
>> Alexey
>> _______________________________________________
>> Linuxppc-dev mailing list
>> Linuxppc-dev@lists.ozlabs.org
>> https://lists.ozlabs.org/listinfo/linuxppc-dev


-- 
Alexey

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] vfio: Fix endianness handling for emulated BARs
  2014-06-19  3:48       ` Alexey Kardashevskiy
  2014-06-19  5:30         ` Bharat.Bhushan
@ 2014-06-20  3:21         ` Alex Williamson
  2014-06-20 14:14           ` Alexey Kardashevskiy
  2014-06-20 23:12           ` Benjamin Herrenschmidt
  1 sibling, 2 replies; 26+ messages in thread
From: Alex Williamson @ 2014-06-20  3:21 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: kvm, Nikunj A Dadhania, linux-kernel, Alexander Graf, linuxppc-dev

On Thu, 2014-06-19 at 13:48 +1000, Alexey Kardashevskiy wrote:
> On 06/19/2014 11:50 AM, Alexey Kardashevskiy wrote:
> > On 06/19/2014 10:50 AM, Alexey Kardashevskiy wrote:
> >> On 06/19/2014 04:35 AM, Alex Williamson wrote:
> >>> On Wed, 2014-06-18 at 21:36 +1000, Alexey Kardashevskiy wrote:
> >>>> VFIO exposes BARs to user space as a byte stream so userspace can
> >>>> read it using pread()/pwrite(). Since this is a byte stream, VFIO should
> >>>> not do byte swapping and simply return values as it gets them from
> >>>> PCI device.
> >>>>
> >>>> Instead, the existing code assumes that byte stream in read/write is
> >>>> little-endian and it fixes endianness for values which it passes to
> >>>> ioreadXX/iowriteXX helpers. This works for little-endian as PCI is
> >>>> little endian and le32_to_cpu/... are stubs.
> >>>
> >>> vfio read32:
> >>>
> >>> val = cpu_to_le32(ioread32(io + off));
> >>>
> >>> Where the typical x86 case, ioread32 is:
> >>>
> >>> #define ioread32(addr)          readl(addr)
> >>>
> >>> and readl is:
> >>>
> >>> __le32_to_cpu(__raw_readl(addr));
> >>>
> >>> So we do canceling byte swaps, which are both nops on x86, and end up
> >>> returning device endian, which we assume is little endian.
> >>>
> >>> vfio write32 is similar:
> >>>
> >>> iowrite32(le32_to_cpu(val), io + off);
> >>>
> >>> The implicit cpu_to_le32 of iowrite32() and our explicit swap cancel
> >>> out, so input data is device endian, which is assumed little.
> >>>
> >>>> This also works for big endian but rather by an accident: it reads 4 bytes
> >>>> from the stream (@val is big endian), converts to CPU format (which should
> >>>> be big endian) as it was little endian (@val becomes actually little
> >>>> endian) and calls iowrite32() which does not do swapping on big endian
> >>>> system.
> >>>
> >>> Really?
> >>>
> >>> In arch/powerpc/kernel/iomap.c iowrite32() is just a wrapper around
> >>> writel(), which seems to use the generic implementation, which does
> >>> include a cpu_to_le32.
> >>
> >>
> >> Ouch, wrong comment. iowrite32() does swapping. My bad.
> >>
> >>
> >>>
> >>> I also see other big endian archs like parisc doing cpu_to_le32 on
> >>> iowrite32, so I don't think this statement is true.  I imagine it's
> >>> probably working for you because the swap cancel.
> >>>
> >>>> This removes byte swapping and makes use ioread32be/iowrite32be
> >>>> (and 16bit versions) on big-endian systems. The "be" helpers take
> >>>> native endian values and do swapping at the moment of writing to a PCI
> >>>> register using one of "store byte-reversed" instructions.
> >>>
> >>> So now you want iowrite32() on little endian and iowrite32be() on big
> >>> endian, the former does a cpu_to_le32 (which is a nop on little endian)
> >>> and the latter does a cpu_to_be32 (which is a nop on big endian)...
> >>> should we just be using __raw_writel() on both?
> >>
> >>
> >> We can do that too. The beauty of iowrite32be on ppc64 is that it does not
> >> swap and write separately, it is implemented via the "Store Word
> >> Byte-Reverse Indexed X-form" single instruction.
> >>
> >> And some archs (do not know which ones) may add memory barriers in their
> >> implementations of ioread/iowrite. __raw_writel is too raw :)
> >>
> >>>  There doesn't actually
> >>> seem to be any change in behavior here, it just eliminates back-to-back
> >>> byte swaps, which are a nop on x86, but not power, right?
> >>
> >> Exactly. No dependency for QEMU.
> > 
> > How about that:
> > ===
> > 
> > VFIO exposes BARs to user space as a byte stream so userspace can
> > read it using pread()/pwrite(). Since this is a byte stream, VFIO should
> > not do byte swapping and simply return values as it gets them from
> > PCI device.
> > 
> > Instead, the existing code assumes that byte stream in read/write is
> > little-endian and it fixes endianness for values which it passes to
> > ioreadXX/iowriteXX helpers in native format. The IO helpers do swapping
> > again. Since both byte swaps are nops on little-endian host, this works.
> > 
> > This also works for big endian but rather by an accident: it reads 4 bytes
> > from the stream (@val is big endian), converts to CPU format (which should
> > be big endian) as it was little endian (and @val becomes actually little
> > endian) and calls iowrite32() which does swapping on big endian
> > system again. So byte swap gets cancelled, __raw_writel() receives
> > a native value and then
> > *(volatile unsigned int __force *)PCI_FIX_ADDR(addr) = v;
> > just does the right thing.
> 
> I am wrong here, sorry. This is what happens when you watch soccer between
> 2am and 4am :)
> 
> 
> > 
> > This removes byte swaps and makes use of ioread32be/iowrite32be
> > (and 16bit versions) which do explicit byte swapping at the moment
> > of write to a PCI register. PPC64 uses a special "Store Word
> > Byte-Reverse Indexed X-form" instruction which does swap and store.
> 
> No swapping is done here if we use ioread32be as it calls in_be32 and that
> animal does "lwz" which is simple load from memory.
> 
> So @val (16/32 bit variable on stack) will have different values on LE and
> BE but since we do not handle it the host and just memcpy it to the buffer,
> nothing breaks here.
> 
> 
> So it should be like this:
> ===
> VFIO exposes BARs to user space as a byte stream so userspace can
> read it using pread()/pwrite(). Since this is a byte stream, VFIO should
> not do byte swapping and simply return values as it gets them from
> PCI device and copy_to_user will save bytes in the correct
> same true for writes.
> 
> Instead, the existing code assumes that byte stream in read/write is
> little-endian and it fixes endianness for values which it passes to
> ioreadXX/iowriteXX helpers in native format. The IO helpers do swapping
> again. Since both byte swaps are nops on little-endian host, this works.
> 
> This also works for big endian but rather by an accident: it reads 4 bytes
> from the stream (@val is big endian), converts to CPU format (which should
> be big endian) as it was little endian (and @val becomes actually little
> endian) and calls iowrite32() which does swapping on big endian
> system again. So byte swap in the host gets cancelled and __raw_writel()
> writes the value which was swapped originally by the guest.
> 
> This removes byte swaps and makes use of ioread32be/iowrite32be
> (and 16bit versions) which do not do byte swap on BE hosts.
> For LE hosts, ioread32/iowrite32 are still used.
> 
> ===

Working on big endian being an accident may be a matter of perspective.
The comment remains that this patch doesn't actually fix anything except
the overhead on big endian systems doing redundant byte swapping and
maybe the philosophy that vfio regions are little endian.

I'm still not a fan of iowrite vs iowritebe, there must be something we
can use that doesn't have an implicit swap.  Calling it iowrite*_native
is also an abuse of the namespace.  Next thing we know some common code
will legitimately use that name.  If we do need to define an alias
(which I'd like to avoid) it should be something like vfio_iowrite32.
Thanks,

Alex

> > ===
> > 
> > any better?
> > 
> > 
> > 
> > 
> >>>> Suggested-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> >>>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> >>>> ---
> >>>>  drivers/vfio/pci/vfio_pci_rdwr.c | 20 ++++++++++++++++----
> >>>>  1 file changed, 16 insertions(+), 4 deletions(-)
> >>>>
> >>>> diff --git a/drivers/vfio/pci/vfio_pci_rdwr.c b/drivers/vfio/pci/vfio_pci_rdwr.c
> >>>> index 210db24..f363b5a 100644
> >>>> --- a/drivers/vfio/pci/vfio_pci_rdwr.c
> >>>> +++ b/drivers/vfio/pci/vfio_pci_rdwr.c
> >>>> @@ -21,6 +21,18 @@
> >>>>  
> >>>>  #include "vfio_pci_private.h"
> >>>>  
> >>>> +#ifdef __BIG_ENDIAN__
> >>>> +#define ioread16_native		ioread16be
> >>>> +#define ioread32_native		ioread32be
> >>>> +#define iowrite16_native	iowrite16be
> >>>> +#define iowrite32_native	iowrite32be
> >>>> +#else
> >>>> +#define ioread16_native		ioread16
> >>>> +#define ioread32_native		ioread32
> >>>> +#define iowrite16_native	iowrite16
> >>>> +#define iowrite32_native	iowrite32
> >>>> +#endif
> >>>> +
> >>>>  /*
> >>>>   * Read or write from an __iomem region (MMIO or I/O port) with an excluded
> >>>>   * range which is inaccessible.  The excluded range drops writes and fills
> >>>> @@ -50,9 +62,9 @@ static ssize_t do_io_rw(void __iomem *io, char __user *buf,
> >>>>  				if (copy_from_user(&val, buf, 4))
> >>>>  					return -EFAULT;
> >>>>  
> >>>> -				iowrite32(le32_to_cpu(val), io + off);
> >>>> +				iowrite32_native(val, io + off);
> >>>>  			} else {
> >>>> -				val = cpu_to_le32(ioread32(io + off));
> >>>> +				val = ioread32_native(io + off);
> >>>>  
> >>>>  				if (copy_to_user(buf, &val, 4))
> >>>>  					return -EFAULT;
> >>>> @@ -66,9 +78,9 @@ static ssize_t do_io_rw(void __iomem *io, char __user *buf,
> >>>>  				if (copy_from_user(&val, buf, 2))
> >>>>  					return -EFAULT;
> >>>>  
> >>>> -				iowrite16(le16_to_cpu(val), io + off);
> >>>> +				iowrite16_native(val, io + off);
> >>>>  			} else {
> >>>> -				val = cpu_to_le16(ioread16(io + off));
> >>>> +				val = ioread16_native(io + off);
> >>>>  
> >>>>  				if (copy_to_user(buf, &val, 2))
> >>>>  					return -EFAULT;
> >>>
> >>>
> >>>
> >>
> >>
> > 
> > 
> 
> 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] vfio: Fix endianness handling for emulated BARs
  2014-06-20  3:21         ` Alex Williamson
@ 2014-06-20 14:14           ` Alexey Kardashevskiy
  2014-06-20 23:16             ` Benjamin Herrenschmidt
  2014-06-20 23:12           ` Benjamin Herrenschmidt
  1 sibling, 1 reply; 26+ messages in thread
From: Alexey Kardashevskiy @ 2014-06-20 14:14 UTC (permalink / raw)
  To: Alex Williamson
  Cc: kvm, Nikunj A Dadhania, linux-kernel, Alexander Graf, linuxppc-dev

On 06/20/2014 01:21 PM, Alex Williamson wrote:
> On Thu, 2014-06-19 at 13:48 +1000, Alexey Kardashevskiy wrote:
>> On 06/19/2014 11:50 AM, Alexey Kardashevskiy wrote:
>>> On 06/19/2014 10:50 AM, Alexey Kardashevskiy wrote:
>>>> On 06/19/2014 04:35 AM, Alex Williamson wrote:
>>>>> On Wed, 2014-06-18 at 21:36 +1000, Alexey Kardashevskiy wrote:
>>>>>> VFIO exposes BARs to user space as a byte stream so userspace can
>>>>>> read it using pread()/pwrite(). Since this is a byte stream, VFIO should
>>>>>> not do byte swapping and simply return values as it gets them from
>>>>>> PCI device.
>>>>>>
>>>>>> Instead, the existing code assumes that byte stream in read/write is
>>>>>> little-endian and it fixes endianness for values which it passes to
>>>>>> ioreadXX/iowriteXX helpers. This works for little-endian as PCI is
>>>>>> little endian and le32_to_cpu/... are stubs.
>>>>>
>>>>> vfio read32:
>>>>>
>>>>> val = cpu_to_le32(ioread32(io + off));
>>>>>
>>>>> Where the typical x86 case, ioread32 is:
>>>>>
>>>>> #define ioread32(addr)          readl(addr)
>>>>>
>>>>> and readl is:
>>>>>
>>>>> __le32_to_cpu(__raw_readl(addr));
>>>>>
>>>>> So we do canceling byte swaps, which are both nops on x86, and end up
>>>>> returning device endian, which we assume is little endian.
>>>>>
>>>>> vfio write32 is similar:
>>>>>
>>>>> iowrite32(le32_to_cpu(val), io + off);
>>>>>
>>>>> The implicit cpu_to_le32 of iowrite32() and our explicit swap cancel
>>>>> out, so input data is device endian, which is assumed little.
>>>>>
>>>>>> This also works for big endian but rather by an accident: it reads 4 bytes
>>>>>> from the stream (@val is big endian), converts to CPU format (which should
>>>>>> be big endian) as it was little endian (@val becomes actually little
>>>>>> endian) and calls iowrite32() which does not do swapping on big endian
>>>>>> system.
>>>>>
>>>>> Really?
>>>>>
>>>>> In arch/powerpc/kernel/iomap.c iowrite32() is just a wrapper around
>>>>> writel(), which seems to use the generic implementation, which does
>>>>> include a cpu_to_le32.
>>>>
>>>>
>>>> Ouch, wrong comment. iowrite32() does swapping. My bad.
>>>>
>>>>
>>>>>
>>>>> I also see other big endian archs like parisc doing cpu_to_le32 on
>>>>> iowrite32, so I don't think this statement is true.  I imagine it's
>>>>> probably working for you because the swap cancel.
>>>>>
>>>>>> This removes byte swapping and makes use ioread32be/iowrite32be
>>>>>> (and 16bit versions) on big-endian systems. The "be" helpers take
>>>>>> native endian values and do swapping at the moment of writing to a PCI
>>>>>> register using one of "store byte-reversed" instructions.
>>>>>
>>>>> So now you want iowrite32() on little endian and iowrite32be() on big
>>>>> endian, the former does a cpu_to_le32 (which is a nop on little endian)
>>>>> and the latter does a cpu_to_be32 (which is a nop on big endian)...
>>>>> should we just be using __raw_writel() on both?
>>>>
>>>>
>>>> We can do that too. The beauty of iowrite32be on ppc64 is that it does not
>>>> swap and write separately, it is implemented via the "Store Word
>>>> Byte-Reverse Indexed X-form" single instruction.
>>>>
>>>> And some archs (do not know which ones) may add memory barriers in their
>>>> implementations of ioread/iowrite. __raw_writel is too raw :)
>>>>
>>>>>  There doesn't actually
>>>>> seem to be any change in behavior here, it just eliminates back-to-back
>>>>> byte swaps, which are a nop on x86, but not power, right?
>>>>
>>>> Exactly. No dependency for QEMU.
>>>
>>> How about that:
>>> ===
>>>
>>> VFIO exposes BARs to user space as a byte stream so userspace can
>>> read it using pread()/pwrite(). Since this is a byte stream, VFIO should
>>> not do byte swapping and simply return values as it gets them from
>>> PCI device.
>>>
>>> Instead, the existing code assumes that byte stream in read/write is
>>> little-endian and it fixes endianness for values which it passes to
>>> ioreadXX/iowriteXX helpers in native format. The IO helpers do swapping
>>> again. Since both byte swaps are nops on little-endian host, this works.
>>>
>>> This also works for big endian but rather by an accident: it reads 4 bytes
>>> from the stream (@val is big endian), converts to CPU format (which should
>>> be big endian) as it was little endian (and @val becomes actually little
>>> endian) and calls iowrite32() which does swapping on big endian
>>> system again. So byte swap gets cancelled, __raw_writel() receives
>>> a native value and then
>>> *(volatile unsigned int __force *)PCI_FIX_ADDR(addr) = v;
>>> just does the right thing.
>>
>> I am wrong here, sorry. This is what happens when you watch soccer between
>> 2am and 4am :)
>>
>>
>>>
>>> This removes byte swaps and makes use of ioread32be/iowrite32be
>>> (and 16bit versions) which do explicit byte swapping at the moment
>>> of write to a PCI register. PPC64 uses a special "Store Word
>>> Byte-Reverse Indexed X-form" instruction which does swap and store.
>>
>> No swapping is done here if we use ioread32be as it calls in_be32 and that
>> animal does "lwz" which is simple load from memory.
>>
>> So @val (16/32 bit variable on stack) will have different values on LE and
>> BE but since we do not handle it the host and just memcpy it to the buffer,
>> nothing breaks here.
>>
>>
>> So it should be like this:
>> ===
>> VFIO exposes BARs to user space as a byte stream so userspace can
>> read it using pread()/pwrite(). Since this is a byte stream, VFIO should
>> not do byte swapping and simply return values as it gets them from
>> PCI device and copy_to_user will save bytes in the correct
>> same true for writes.
>>
>> Instead, the existing code assumes that byte stream in read/write is
>> little-endian and it fixes endianness for values which it passes to
>> ioreadXX/iowriteXX helpers in native format. The IO helpers do swapping
>> again. Since both byte swaps are nops on little-endian host, this works.
>>
>> This also works for big endian but rather by an accident: it reads 4 bytes
>> from the stream (@val is big endian), converts to CPU format (which should
>> be big endian) as it was little endian (and @val becomes actually little
>> endian) and calls iowrite32() which does swapping on big endian
>> system again. So byte swap in the host gets cancelled and __raw_writel()
>> writes the value which was swapped originally by the guest.
>>
>> This removes byte swaps and makes use of ioread32be/iowrite32be
>> (and 16bit versions) which do not do byte swap on BE hosts.
>> For LE hosts, ioread32/iowrite32 are still used.
>>
>> ===
> 
> Working on big endian being an accident may be a matter of perspective.
> The comment remains that this patch doesn't actually fix anything except
> the overhead on big endian systems doing redundant byte swapping and
> maybe the philosophy that vfio regions are little endian.


They are little-endian only between ioread32() and copy_to_user() calls,
besides that it is a bytes stream which does not have endianness so I do
not understand the comment about philosophy...


> I'm still not a fan of iowrite vs iowritebe, there must be something we
> can use that doesn't have an implicit swap.  Calling it iowrite*_native
> is also an abuse of the namespace.  Next thing we know some common code
> will legitimately use that name.  If we do need to define an alias
> (which I'd like to avoid) it should be something like vfio_iowrite32.


We can still use __raw_writel&co, would that be ok?



> Thanks,
> 
> Alex
> 
>>> ===
>>>
>>> any better?
>>>
>>>
>>>
>>>
>>>>>> Suggested-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
>>>>>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>>>>>> ---
>>>>>>  drivers/vfio/pci/vfio_pci_rdwr.c | 20 ++++++++++++++++----
>>>>>>  1 file changed, 16 insertions(+), 4 deletions(-)
>>>>>>
>>>>>> diff --git a/drivers/vfio/pci/vfio_pci_rdwr.c b/drivers/vfio/pci/vfio_pci_rdwr.c
>>>>>> index 210db24..f363b5a 100644
>>>>>> --- a/drivers/vfio/pci/vfio_pci_rdwr.c
>>>>>> +++ b/drivers/vfio/pci/vfio_pci_rdwr.c
>>>>>> @@ -21,6 +21,18 @@
>>>>>>  
>>>>>>  #include "vfio_pci_private.h"
>>>>>>  
>>>>>> +#ifdef __BIG_ENDIAN__
>>>>>> +#define ioread16_native		ioread16be
>>>>>> +#define ioread32_native		ioread32be
>>>>>> +#define iowrite16_native	iowrite16be
>>>>>> +#define iowrite32_native	iowrite32be
>>>>>> +#else
>>>>>> +#define ioread16_native		ioread16
>>>>>> +#define ioread32_native		ioread32
>>>>>> +#define iowrite16_native	iowrite16
>>>>>> +#define iowrite32_native	iowrite32
>>>>>> +#endif
>>>>>> +
>>>>>>  /*
>>>>>>   * Read or write from an __iomem region (MMIO or I/O port) with an excluded
>>>>>>   * range which is inaccessible.  The excluded range drops writes and fills
>>>>>> @@ -50,9 +62,9 @@ static ssize_t do_io_rw(void __iomem *io, char __user *buf,
>>>>>>  				if (copy_from_user(&val, buf, 4))
>>>>>>  					return -EFAULT;
>>>>>>  
>>>>>> -				iowrite32(le32_to_cpu(val), io + off);
>>>>>> +				iowrite32_native(val, io + off);
>>>>>>  			} else {
>>>>>> -				val = cpu_to_le32(ioread32(io + off));
>>>>>> +				val = ioread32_native(io + off);
>>>>>>  
>>>>>>  				if (copy_to_user(buf, &val, 4))
>>>>>>  					return -EFAULT;
>>>>>> @@ -66,9 +78,9 @@ static ssize_t do_io_rw(void __iomem *io, char __user *buf,
>>>>>>  				if (copy_from_user(&val, buf, 2))
>>>>>>  					return -EFAULT;
>>>>>>  
>>>>>> -				iowrite16(le16_to_cpu(val), io + off);
>>>>>> +				iowrite16_native(val, io + off);
>>>>>>  			} else {
>>>>>> -				val = cpu_to_le16(ioread16(io + off));
>>>>>> +				val = ioread16_native(io + off);
>>>>>>  
>>>>>>  				if (copy_to_user(buf, &val, 2))
>>>>>>  					return -EFAULT;
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>
>>
> 
> 
> 


-- 
Alexey

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] vfio: Fix endianness handling for emulated BARs
  2014-06-20  3:21         ` Alex Williamson
  2014-06-20 14:14           ` Alexey Kardashevskiy
@ 2014-06-20 23:12           ` Benjamin Herrenschmidt
  2014-06-24 10:11             ` Alexey Kardashevskiy
  1 sibling, 1 reply; 26+ messages in thread
From: Benjamin Herrenschmidt @ 2014-06-20 23:12 UTC (permalink / raw)
  To: Alex Williamson
  Cc: kvm, Nikunj A Dadhania, Alexey Kardashevskiy, linux-kernel,
	Alexander Graf, linuxppc-dev

On Thu, 2014-06-19 at 21:21 -0600, Alex Williamson wrote:

> Working on big endian being an accident may be a matter of perspective

 :-)

> The comment remains that this patch doesn't actually fix anything except
> the overhead on big endian systems doing redundant byte swapping and
> maybe the philosophy that vfio regions are little endian.

Yes, that works by accident because technically VFIO is a transport and
thus shouldn't perform any endian swapping of any sort, which remains
the responsibility of the end driver which is the only one to know
whether a given BAR location is a a register or some streaming data
and in the former case whether it's LE or BE (some PCI devices are BE
even ! :-)

But yes, in the end, it works with the dual "cancelling" swaps and the
overhead of those swaps is probably drowned in the noise of the syscall
overhead.

> I'm still not a fan of iowrite vs iowritebe, there must be something we
> can use that doesn't have an implicit swap.

Sadly there isn't ... In the old day we didn't even have the "be"
variant and readl/writel style accessors still don't have them either
for all archs.

There is __raw_readl/writel but here the semantics are much more than
just "don't swap", they also don't have memory barriers (which means
they are essentially useless to most drivers unless those are platform
specific drivers which know exactly what they are doing, or in the rare
cases such as accessing a framebuffer which we know never have side
effects). 

>  Calling it iowrite*_native is also an abuse of the namespace.


>  Next thing we know some common code
> will legitimately use that name. 

I might make sense to those definitions into a common header. There have
been a handful of cases in the past that wanted that sort of "native
byte order" MMIOs iirc (though don't ask me for examples, I can't really
remember).

>  If we do need to define an alias
> (which I'd like to avoid) it should be something like vfio_iowrite32.
> Thanks,

Cheers,
Ben.

> Alex
> 
> > > ===
> > > 
> > > any better?
> > > 
> > > 
> > > 
> > > 
> > >>>> Suggested-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> > >>>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> > >>>> ---
> > >>>>  drivers/vfio/pci/vfio_pci_rdwr.c | 20 ++++++++++++++++----
> > >>>>  1 file changed, 16 insertions(+), 4 deletions(-)
> > >>>>
> > >>>> diff --git a/drivers/vfio/pci/vfio_pci_rdwr.c b/drivers/vfio/pci/vfio_pci_rdwr.c
> > >>>> index 210db24..f363b5a 100644
> > >>>> --- a/drivers/vfio/pci/vfio_pci_rdwr.c
> > >>>> +++ b/drivers/vfio/pci/vfio_pci_rdwr.c
> > >>>> @@ -21,6 +21,18 @@
> > >>>>  
> > >>>>  #include "vfio_pci_private.h"
> > >>>>  
> > >>>> +#ifdef __BIG_ENDIAN__
> > >>>> +#define ioread16_native		ioread16be
> > >>>> +#define ioread32_native		ioread32be
> > >>>> +#define iowrite16_native	iowrite16be
> > >>>> +#define iowrite32_native	iowrite32be
> > >>>> +#else
> > >>>> +#define ioread16_native		ioread16
> > >>>> +#define ioread32_native		ioread32
> > >>>> +#define iowrite16_native	iowrite16
> > >>>> +#define iowrite32_native	iowrite32
> > >>>> +#endif
> > >>>> +
> > >>>>  /*
> > >>>>   * Read or write from an __iomem region (MMIO or I/O port) with an excluded
> > >>>>   * range which is inaccessible.  The excluded range drops writes and fills
> > >>>> @@ -50,9 +62,9 @@ static ssize_t do_io_rw(void __iomem *io, char __user *buf,
> > >>>>  				if (copy_from_user(&val, buf, 4))
> > >>>>  					return -EFAULT;
> > >>>>  
> > >>>> -				iowrite32(le32_to_cpu(val), io + off);
> > >>>> +				iowrite32_native(val, io + off);
> > >>>>  			} else {
> > >>>> -				val = cpu_to_le32(ioread32(io + off));
> > >>>> +				val = ioread32_native(io + off);
> > >>>>  
> > >>>>  				if (copy_to_user(buf, &val, 4))
> > >>>>  					return -EFAULT;
> > >>>> @@ -66,9 +78,9 @@ static ssize_t do_io_rw(void __iomem *io, char __user *buf,
> > >>>>  				if (copy_from_user(&val, buf, 2))
> > >>>>  					return -EFAULT;
> > >>>>  
> > >>>> -				iowrite16(le16_to_cpu(val), io + off);
> > >>>> +				iowrite16_native(val, io + off);
> > >>>>  			} else {
> > >>>> -				val = cpu_to_le16(ioread16(io + off));
> > >>>> +				val = ioread16_native(io + off);
> > >>>>  
> > >>>>  				if (copy_to_user(buf, &val, 2))
> > >>>>  					return -EFAULT;
> > >>>
> > >>>
> > >>>
> > >>
> > >>
> > > 
> > > 
> > 
> > 
> 
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] vfio: Fix endianness handling for emulated BARs
  2014-06-20 14:14           ` Alexey Kardashevskiy
@ 2014-06-20 23:16             ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 26+ messages in thread
From: Benjamin Herrenschmidt @ 2014-06-20 23:16 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: kvm, Nikunj A Dadhania, linux-kernel, Alexander Graf,
	Alex Williamson, linuxppc-dev

On Sat, 2014-06-21 at 00:14 +1000, Alexey Kardashevskiy wrote:

> We can still use __raw_writel&co, would that be ok?

No unless you understand precisely what kind of memory barriers each
platform require for these.

Cheers,
Ben.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] vfio: Fix endianness handling for emulated BARs
  2014-06-20 23:12           ` Benjamin Herrenschmidt
@ 2014-06-24 10:11             ` Alexey Kardashevskiy
  2014-06-24 10:41               ` Alexander Graf
  0 siblings, 1 reply; 26+ messages in thread
From: Alexey Kardashevskiy @ 2014-06-24 10:11 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Alex Williamson
  Cc: Nikunj A Dadhania, linuxppc-dev, linux-kernel, kvm, Alexander Graf

On 06/21/2014 09:12 AM, Benjamin Herrenschmidt wrote:
> On Thu, 2014-06-19 at 21:21 -0600, Alex Williamson wrote:
> 
>> Working on big endian being an accident may be a matter of perspective
> 
>  :-)
> 
>> The comment remains that this patch doesn't actually fix anything except
>> the overhead on big endian systems doing redundant byte swapping and
>> maybe the philosophy that vfio regions are little endian.
> 
> Yes, that works by accident because technically VFIO is a transport and
> thus shouldn't perform any endian swapping of any sort, which remains
> the responsibility of the end driver which is the only one to know
> whether a given BAR location is a a register or some streaming data
> and in the former case whether it's LE or BE (some PCI devices are BE
> even ! :-)
> 
> But yes, in the end, it works with the dual "cancelling" swaps and the
> overhead of those swaps is probably drowned in the noise of the syscall
> overhead.
> 
>> I'm still not a fan of iowrite vs iowritebe, there must be something we
>> can use that doesn't have an implicit swap.
> 
> Sadly there isn't ... In the old day we didn't even have the "be"
> variant and readl/writel style accessors still don't have them either
> for all archs.
> 
> There is __raw_readl/writel but here the semantics are much more than
> just "don't swap", they also don't have memory barriers (which means
> they are essentially useless to most drivers unless those are platform
> specific drivers which know exactly what they are doing, or in the rare
> cases such as accessing a framebuffer which we know never have side
> effects). 
> 
>>  Calling it iowrite*_native is also an abuse of the namespace.
> 
> 
>>  Next thing we know some common code
>> will legitimately use that name. 
> 
> I might make sense to those definitions into a common header. There have
> been a handful of cases in the past that wanted that sort of "native
> byte order" MMIOs iirc (though don't ask me for examples, I can't really
> remember).
> 
>>  If we do need to define an alias
>> (which I'd like to avoid) it should be something like vfio_iowrite32.


Ping?

We need to make a decision whether to move those xxx_native() helpers
somewhere (where?) or leave the patch as is (as we figured out that
iowriteXX functions implement barriers and we cannot just use raw
accessors) and fix commit log to explain everything.

Thanks!




>> Thanks,
> 
> Cheers,
> Ben.
> 
>> Alex
>>
>>>> ===
>>>>
>>>> any better?
>>>>
>>>>
>>>>
>>>>
>>>>>>> Suggested-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
>>>>>>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>>>>>>> ---
>>>>>>>  drivers/vfio/pci/vfio_pci_rdwr.c | 20 ++++++++++++++++----
>>>>>>>  1 file changed, 16 insertions(+), 4 deletions(-)
>>>>>>>
>>>>>>> diff --git a/drivers/vfio/pci/vfio_pci_rdwr.c b/drivers/vfio/pci/vfio_pci_rdwr.c
>>>>>>> index 210db24..f363b5a 100644
>>>>>>> --- a/drivers/vfio/pci/vfio_pci_rdwr.c
>>>>>>> +++ b/drivers/vfio/pci/vfio_pci_rdwr.c
>>>>>>> @@ -21,6 +21,18 @@
>>>>>>>  
>>>>>>>  #include "vfio_pci_private.h"
>>>>>>>  
>>>>>>> +#ifdef __BIG_ENDIAN__
>>>>>>> +#define ioread16_native		ioread16be
>>>>>>> +#define ioread32_native		ioread32be
>>>>>>> +#define iowrite16_native	iowrite16be
>>>>>>> +#define iowrite32_native	iowrite32be
>>>>>>> +#else
>>>>>>> +#define ioread16_native		ioread16
>>>>>>> +#define ioread32_native		ioread32
>>>>>>> +#define iowrite16_native	iowrite16
>>>>>>> +#define iowrite32_native	iowrite32
>>>>>>> +#endif
>>>>>>> +
>>>>>>>  /*
>>>>>>>   * Read or write from an __iomem region (MMIO or I/O port) with an excluded
>>>>>>>   * range which is inaccessible.  The excluded range drops writes and fills
>>>>>>> @@ -50,9 +62,9 @@ static ssize_t do_io_rw(void __iomem *io, char __user *buf,
>>>>>>>  				if (copy_from_user(&val, buf, 4))
>>>>>>>  					return -EFAULT;
>>>>>>>  
>>>>>>> -				iowrite32(le32_to_cpu(val), io + off);
>>>>>>> +				iowrite32_native(val, io + off);
>>>>>>>  			} else {
>>>>>>> -				val = cpu_to_le32(ioread32(io + off));
>>>>>>> +				val = ioread32_native(io + off);
>>>>>>>  
>>>>>>>  				if (copy_to_user(buf, &val, 4))
>>>>>>>  					return -EFAULT;
>>>>>>> @@ -66,9 +78,9 @@ static ssize_t do_io_rw(void __iomem *io, char __user *buf,
>>>>>>>  				if (copy_from_user(&val, buf, 2))
>>>>>>>  					return -EFAULT;
>>>>>>>  
>>>>>>> -				iowrite16(le16_to_cpu(val), io + off);
>>>>>>> +				iowrite16_native(val, io + off);
>>>>>>>  			} else {
>>>>>>> -				val = cpu_to_le16(ioread16(io + off));
>>>>>>> +				val = ioread16_native(io + off);
>>>>>>>  
>>>>>>>  				if (copy_to_user(buf, &val, 2))
>>>>>>>  					return -EFAULT;
>>>>>>
>>>>>>
>>>>>>

-- 
Alexey

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] vfio: Fix endianness handling for emulated BARs
  2014-06-24 10:11             ` Alexey Kardashevskiy
@ 2014-06-24 10:41               ` Alexander Graf
  2014-06-24 12:50                 ` Alexey Kardashevskiy
  2014-06-24 21:46                 ` Benjamin Herrenschmidt
  0 siblings, 2 replies; 26+ messages in thread
From: Alexander Graf @ 2014-06-24 10:41 UTC (permalink / raw)
  To: Alexey Kardashevskiy, Benjamin Herrenschmidt, Alex Williamson
  Cc: Nikunj A Dadhania, linuxppc-dev, linux-kernel, kvm


On 24.06.14 12:11, Alexey Kardashevskiy wrote:
> On 06/21/2014 09:12 AM, Benjamin Herrenschmidt wrote:
>> On Thu, 2014-06-19 at 21:21 -0600, Alex Williamson wrote:
>>
>>> Working on big endian being an accident may be a matter of perspective
>>   :-)
>>
>>> The comment remains that this patch doesn't actually fix anything except
>>> the overhead on big endian systems doing redundant byte swapping and
>>> maybe the philosophy that vfio regions are little endian.
>> Yes, that works by accident because technically VFIO is a transport and
>> thus shouldn't perform any endian swapping of any sort, which remains
>> the responsibility of the end driver which is the only one to know
>> whether a given BAR location is a a register or some streaming data
>> and in the former case whether it's LE or BE (some PCI devices are BE
>> even ! :-)
>>
>> But yes, in the end, it works with the dual "cancelling" swaps and the
>> overhead of those swaps is probably drowned in the noise of the syscall
>> overhead.
>>
>>> I'm still not a fan of iowrite vs iowritebe, there must be something we
>>> can use that doesn't have an implicit swap.
>> Sadly there isn't ... In the old day we didn't even have the "be"
>> variant and readl/writel style accessors still don't have them either
>> for all archs.
>>
>> There is __raw_readl/writel but here the semantics are much more than
>> just "don't swap", they also don't have memory barriers (which means
>> they are essentially useless to most drivers unless those are platform
>> specific drivers which know exactly what they are doing, or in the rare
>> cases such as accessing a framebuffer which we know never have side
>> effects).
>>
>>>   Calling it iowrite*_native is also an abuse of the namespace.
>>
>>>   Next thing we know some common code
>>> will legitimately use that name.
>> I might make sense to those definitions into a common header. There have
>> been a handful of cases in the past that wanted that sort of "native
>> byte order" MMIOs iirc (though don't ask me for examples, I can't really
>> remember).
>>
>>>   If we do need to define an alias
>>> (which I'd like to avoid) it should be something like vfio_iowrite32.
>
> Ping?
>
> We need to make a decision whether to move those xxx_native() helpers
> somewhere (where?) or leave the patch as is (as we figured out that
> iowriteXX functions implement barriers and we cannot just use raw
> accessors) and fix commit log to explain everything.

Is there actually any difference in generated code with this patch 
applied and without? I would hope that iowrite..() is inlined and 
cancels out the cpu_to_le..() calls that are also inlined?


Alex

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] vfio: Fix endianness handling for emulated BARs
  2014-06-24 10:41               ` Alexander Graf
@ 2014-06-24 12:50                 ` Alexey Kardashevskiy
  2014-06-24 12:52                   ` Alexander Graf
  2014-06-24 21:46                 ` Benjamin Herrenschmidt
  1 sibling, 1 reply; 26+ messages in thread
From: Alexey Kardashevskiy @ 2014-06-24 12:50 UTC (permalink / raw)
  To: Alexander Graf, Benjamin Herrenschmidt, Alex Williamson
  Cc: Nikunj A Dadhania, linuxppc-dev, linux-kernel, kvm

On 06/24/2014 08:41 PM, Alexander Graf wrote:
> 
> On 24.06.14 12:11, Alexey Kardashevskiy wrote:
>> On 06/21/2014 09:12 AM, Benjamin Herrenschmidt wrote:
>>> On Thu, 2014-06-19 at 21:21 -0600, Alex Williamson wrote:
>>>
>>>> Working on big endian being an accident may be a matter of perspective
>>>   :-)
>>>
>>>> The comment remains that this patch doesn't actually fix anything except
>>>> the overhead on big endian systems doing redundant byte swapping and
>>>> maybe the philosophy that vfio regions are little endian.
>>> Yes, that works by accident because technically VFIO is a transport and
>>> thus shouldn't perform any endian swapping of any sort, which remains
>>> the responsibility of the end driver which is the only one to know
>>> whether a given BAR location is a a register or some streaming data
>>> and in the former case whether it's LE or BE (some PCI devices are BE
>>> even ! :-)
>>>
>>> But yes, in the end, it works with the dual "cancelling" swaps and the
>>> overhead of those swaps is probably drowned in the noise of the syscall
>>> overhead.
>>>
>>>> I'm still not a fan of iowrite vs iowritebe, there must be something we
>>>> can use that doesn't have an implicit swap.
>>> Sadly there isn't ... In the old day we didn't even have the "be"
>>> variant and readl/writel style accessors still don't have them either
>>> for all archs.
>>>
>>> There is __raw_readl/writel but here the semantics are much more than
>>> just "don't swap", they also don't have memory barriers (which means
>>> they are essentially useless to most drivers unless those are platform
>>> specific drivers which know exactly what they are doing, or in the rare
>>> cases such as accessing a framebuffer which we know never have side
>>> effects).
>>>
>>>>   Calling it iowrite*_native is also an abuse of the namespace.
>>>
>>>>   Next thing we know some common code
>>>> will legitimately use that name.
>>> I might make sense to those definitions into a common header. There have
>>> been a handful of cases in the past that wanted that sort of "native
>>> byte order" MMIOs iirc (though don't ask me for examples, I can't really
>>> remember).
>>>
>>>>   If we do need to define an alias
>>>> (which I'd like to avoid) it should be something like vfio_iowrite32.
>>
>> Ping?
>>
>> We need to make a decision whether to move those xxx_native() helpers
>> somewhere (where?) or leave the patch as is (as we figured out that
>> iowriteXX functions implement barriers and we cannot just use raw
>> accessors) and fix commit log to explain everything.
> 
> Is there actually any difference in generated code with this patch applied
> and without? I would hope that iowrite..() is inlined and cancels out the
> cpu_to_le..() calls that are also inlined?

iowrite32 is a non-inline function so conversions take place so are the
others. And sorry but I fail to see why this matters. We are not trying to
accelerate things, we are removing redundant operations which confuse
people who read the code.


-- 
Alexey

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] vfio: Fix endianness handling for emulated BARs
  2014-06-24 12:50                 ` Alexey Kardashevskiy
@ 2014-06-24 12:52                   ` Alexander Graf
  2014-06-24 13:01                     ` Alexey Kardashevskiy
  0 siblings, 1 reply; 26+ messages in thread
From: Alexander Graf @ 2014-06-24 12:52 UTC (permalink / raw)
  To: Alexey Kardashevskiy, Benjamin Herrenschmidt, Alex Williamson
  Cc: Nikunj A Dadhania, linuxppc-dev, linux-kernel, kvm


On 24.06.14 14:50, Alexey Kardashevskiy wrote:
> On 06/24/2014 08:41 PM, Alexander Graf wrote:
>> On 24.06.14 12:11, Alexey Kardashevskiy wrote:
>>> On 06/21/2014 09:12 AM, Benjamin Herrenschmidt wrote:
>>>> On Thu, 2014-06-19 at 21:21 -0600, Alex Williamson wrote:
>>>>
>>>>> Working on big endian being an accident may be a matter of perspective
>>>>    :-)
>>>>
>>>>> The comment remains that this patch doesn't actually fix anything except
>>>>> the overhead on big endian systems doing redundant byte swapping and
>>>>> maybe the philosophy that vfio regions are little endian.
>>>> Yes, that works by accident because technically VFIO is a transport and
>>>> thus shouldn't perform any endian swapping of any sort, which remains
>>>> the responsibility of the end driver which is the only one to know
>>>> whether a given BAR location is a a register or some streaming data
>>>> and in the former case whether it's LE or BE (some PCI devices are BE
>>>> even ! :-)
>>>>
>>>> But yes, in the end, it works with the dual "cancelling" swaps and the
>>>> overhead of those swaps is probably drowned in the noise of the syscall
>>>> overhead.
>>>>
>>>>> I'm still not a fan of iowrite vs iowritebe, there must be something we
>>>>> can use that doesn't have an implicit swap.
>>>> Sadly there isn't ... In the old day we didn't even have the "be"
>>>> variant and readl/writel style accessors still don't have them either
>>>> for all archs.
>>>>
>>>> There is __raw_readl/writel but here the semantics are much more than
>>>> just "don't swap", they also don't have memory barriers (which means
>>>> they are essentially useless to most drivers unless those are platform
>>>> specific drivers which know exactly what they are doing, or in the rare
>>>> cases such as accessing a framebuffer which we know never have side
>>>> effects).
>>>>
>>>>>    Calling it iowrite*_native is also an abuse of the namespace.
>>>>>    Next thing we know some common code
>>>>> will legitimately use that name.
>>>> I might make sense to those definitions into a common header. There have
>>>> been a handful of cases in the past that wanted that sort of "native
>>>> byte order" MMIOs iirc (though don't ask me for examples, I can't really
>>>> remember).
>>>>
>>>>>    If we do need to define an alias
>>>>> (which I'd like to avoid) it should be something like vfio_iowrite32.
>>> Ping?
>>>
>>> We need to make a decision whether to move those xxx_native() helpers
>>> somewhere (where?) or leave the patch as is (as we figured out that
>>> iowriteXX functions implement barriers and we cannot just use raw
>>> accessors) and fix commit log to explain everything.
>> Is there actually any difference in generated code with this patch applied
>> and without? I would hope that iowrite..() is inlined and cancels out the
>> cpu_to_le..() calls that are also inlined?
> iowrite32 is a non-inline function so conversions take place so are the
> others. And sorry but I fail to see why this matters. We are not trying to
> accelerate things, we are removing redundant operations which confuse
> people who read the code.

The confusion depends on where you're coming from. If you happen to know 
that "iowrite32" writes in LE, then the LE conversion makes a lot of sense.

I don't have a strong feeling either way though and will let Alex decide 
on the path forward :).


Alex

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] vfio: Fix endianness handling for emulated BARs
  2014-06-24 12:52                   ` Alexander Graf
@ 2014-06-24 13:01                     ` Alexey Kardashevskiy
  2014-06-24 13:22                       ` Alexander Graf
  0 siblings, 1 reply; 26+ messages in thread
From: Alexey Kardashevskiy @ 2014-06-24 13:01 UTC (permalink / raw)
  To: Alexander Graf, Benjamin Herrenschmidt, Alex Williamson
  Cc: Nikunj A Dadhania, linuxppc-dev, linux-kernel, kvm

On 06/24/2014 10:52 PM, Alexander Graf wrote:
> 
> On 24.06.14 14:50, Alexey Kardashevskiy wrote:
>> On 06/24/2014 08:41 PM, Alexander Graf wrote:
>>> On 24.06.14 12:11, Alexey Kardashevskiy wrote:
>>>> On 06/21/2014 09:12 AM, Benjamin Herrenschmidt wrote:
>>>>> On Thu, 2014-06-19 at 21:21 -0600, Alex Williamson wrote:
>>>>>
>>>>>> Working on big endian being an accident may be a matter of perspective
>>>>>    :-)
>>>>>
>>>>>> The comment remains that this patch doesn't actually fix anything except
>>>>>> the overhead on big endian systems doing redundant byte swapping and
>>>>>> maybe the philosophy that vfio regions are little endian.
>>>>> Yes, that works by accident because technically VFIO is a transport and
>>>>> thus shouldn't perform any endian swapping of any sort, which remains
>>>>> the responsibility of the end driver which is the only one to know
>>>>> whether a given BAR location is a a register or some streaming data
>>>>> and in the former case whether it's LE or BE (some PCI devices are BE
>>>>> even ! :-)
>>>>>
>>>>> But yes, in the end, it works with the dual "cancelling" swaps and the
>>>>> overhead of those swaps is probably drowned in the noise of the syscall
>>>>> overhead.
>>>>>
>>>>>> I'm still not a fan of iowrite vs iowritebe, there must be something we
>>>>>> can use that doesn't have an implicit swap.
>>>>> Sadly there isn't ... In the old day we didn't even have the "be"
>>>>> variant and readl/writel style accessors still don't have them either
>>>>> for all archs.
>>>>>
>>>>> There is __raw_readl/writel but here the semantics are much more than
>>>>> just "don't swap", they also don't have memory barriers (which means
>>>>> they are essentially useless to most drivers unless those are platform
>>>>> specific drivers which know exactly what they are doing, or in the rare
>>>>> cases such as accessing a framebuffer which we know never have side
>>>>> effects).
>>>>>
>>>>>>    Calling it iowrite*_native is also an abuse of the namespace.
>>>>>>    Next thing we know some common code
>>>>>> will legitimately use that name.
>>>>> I might make sense to those definitions into a common header. There have
>>>>> been a handful of cases in the past that wanted that sort of "native
>>>>> byte order" MMIOs iirc (though don't ask me for examples, I can't really
>>>>> remember).
>>>>>
>>>>>>    If we do need to define an alias
>>>>>> (which I'd like to avoid) it should be something like vfio_iowrite32.
>>>> Ping?
>>>>
>>>> We need to make a decision whether to move those xxx_native() helpers
>>>> somewhere (where?) or leave the patch as is (as we figured out that
>>>> iowriteXX functions implement barriers and we cannot just use raw
>>>> accessors) and fix commit log to explain everything.
>>> Is there actually any difference in generated code with this patch applied
>>> and without? I would hope that iowrite..() is inlined and cancels out the
>>> cpu_to_le..() calls that are also inlined?
>> iowrite32 is a non-inline function so conversions take place so are the
>> others. And sorry but I fail to see why this matters. We are not trying to
>> accelerate things, we are removing redundant operations which confuse
>> people who read the code.
> 
> The confusion depends on where you're coming from. If you happen to know
> that "iowrite32" writes in LE, then the LE conversion makes a lot of sense.

It was like this (and this is just confusing):

iowrite32(le32_to_cpu(val), io + off);

What would make sense (according to you and I would understand this) is this:

iowrite32(cpu_to_le32(val), io + off);


Or I missed your point, did I?


> I don't have a strong feeling either way though and will let Alex decide on
> the path forward :)

It would probably help if you picked the side :)


-- 
Alexey

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] vfio: Fix endianness handling for emulated BARs
  2014-06-24 13:01                     ` Alexey Kardashevskiy
@ 2014-06-24 13:22                       ` Alexander Graf
  2014-06-24 14:21                         ` Alex Williamson
  0 siblings, 1 reply; 26+ messages in thread
From: Alexander Graf @ 2014-06-24 13:22 UTC (permalink / raw)
  To: Alexey Kardashevskiy, Benjamin Herrenschmidt, Alex Williamson
  Cc: Nikunj A Dadhania, linuxppc-dev, linux-kernel, kvm


On 24.06.14 15:01, Alexey Kardashevskiy wrote:
> On 06/24/2014 10:52 PM, Alexander Graf wrote:
>> On 24.06.14 14:50, Alexey Kardashevskiy wrote:
>>> On 06/24/2014 08:41 PM, Alexander Graf wrote:
>>>> On 24.06.14 12:11, Alexey Kardashevskiy wrote:
>>>>> On 06/21/2014 09:12 AM, Benjamin Herrenschmidt wrote:
>>>>>> On Thu, 2014-06-19 at 21:21 -0600, Alex Williamson wrote:
>>>>>>
>>>>>>> Working on big endian being an accident may be a matter of perspective
>>>>>>     :-)
>>>>>>
>>>>>>> The comment remains that this patch doesn't actually fix anything except
>>>>>>> the overhead on big endian systems doing redundant byte swapping and
>>>>>>> maybe the philosophy that vfio regions are little endian.
>>>>>> Yes, that works by accident because technically VFIO is a transport and
>>>>>> thus shouldn't perform any endian swapping of any sort, which remains
>>>>>> the responsibility of the end driver which is the only one to know
>>>>>> whether a given BAR location is a a register or some streaming data
>>>>>> and in the former case whether it's LE or BE (some PCI devices are BE
>>>>>> even ! :-)
>>>>>>
>>>>>> But yes, in the end, it works with the dual "cancelling" swaps and the
>>>>>> overhead of those swaps is probably drowned in the noise of the syscall
>>>>>> overhead.
>>>>>>
>>>>>>> I'm still not a fan of iowrite vs iowritebe, there must be something we
>>>>>>> can use that doesn't have an implicit swap.
>>>>>> Sadly there isn't ... In the old day we didn't even have the "be"
>>>>>> variant and readl/writel style accessors still don't have them either
>>>>>> for all archs.
>>>>>>
>>>>>> There is __raw_readl/writel but here the semantics are much more than
>>>>>> just "don't swap", they also don't have memory barriers (which means
>>>>>> they are essentially useless to most drivers unless those are platform
>>>>>> specific drivers which know exactly what they are doing, or in the rare
>>>>>> cases such as accessing a framebuffer which we know never have side
>>>>>> effects).
>>>>>>
>>>>>>>     Calling it iowrite*_native is also an abuse of the namespace.
>>>>>>>     Next thing we know some common code
>>>>>>> will legitimately use that name.
>>>>>> I might make sense to those definitions into a common header. There have
>>>>>> been a handful of cases in the past that wanted that sort of "native
>>>>>> byte order" MMIOs iirc (though don't ask me for examples, I can't really
>>>>>> remember).
>>>>>>
>>>>>>>     If we do need to define an alias
>>>>>>> (which I'd like to avoid) it should be something like vfio_iowrite32.
>>>>> Ping?
>>>>>
>>>>> We need to make a decision whether to move those xxx_native() helpers
>>>>> somewhere (where?) or leave the patch as is (as we figured out that
>>>>> iowriteXX functions implement barriers and we cannot just use raw
>>>>> accessors) and fix commit log to explain everything.
>>>> Is there actually any difference in generated code with this patch applied
>>>> and without? I would hope that iowrite..() is inlined and cancels out the
>>>> cpu_to_le..() calls that are also inlined?
>>> iowrite32 is a non-inline function so conversions take place so are the
>>> others. And sorry but I fail to see why this matters. We are not trying to
>>> accelerate things, we are removing redundant operations which confuse
>>> people who read the code.
>> The confusion depends on where you're coming from. If you happen to know
>> that "iowrite32" writes in LE, then the LE conversion makes a lot of sense.
> It was like this (and this is just confusing):
>
> iowrite32(le32_to_cpu(val), io + off);
>
> What would make sense (according to you and I would understand this) is this:
>
> iowrite32(cpu_to_le32(val), io + off);
>
>
> Or I missed your point, did I?

No, you didn't miss it. I think for people who know how iowrite32() 
works the above is obvious. I find the fact that iowrite32() writes in 
LE always pretty scary though ;).

So IMHO we should either create new, generic iowrite helpers that don't 
do any endian swapping at all or do iowrite32(cpu_to_le32(val)) calls.


Alex

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] vfio: Fix endianness handling for emulated BARs
  2014-06-24 13:22                       ` Alexander Graf
@ 2014-06-24 14:21                         ` Alex Williamson
  2014-06-24 14:33                           ` Alexey Kardashevskiy
  0 siblings, 1 reply; 26+ messages in thread
From: Alex Williamson @ 2014-06-24 14:21 UTC (permalink / raw)
  To: Alexander Graf
  Cc: kvm, Nikunj A Dadhania, Alexey Kardashevskiy, linux-kernel, linuxppc-dev

On Tue, 2014-06-24 at 15:22 +0200, Alexander Graf wrote:
> On 24.06.14 15:01, Alexey Kardashevskiy wrote:
> > On 06/24/2014 10:52 PM, Alexander Graf wrote:
> >> On 24.06.14 14:50, Alexey Kardashevskiy wrote:
> >>> On 06/24/2014 08:41 PM, Alexander Graf wrote:
> >>>> On 24.06.14 12:11, Alexey Kardashevskiy wrote:
> >>>>> On 06/21/2014 09:12 AM, Benjamin Herrenschmidt wrote:
> >>>>>> On Thu, 2014-06-19 at 21:21 -0600, Alex Williamson wrote:
> >>>>>>
> >>>>>>> Working on big endian being an accident may be a matter of perspective
> >>>>>>     :-)
> >>>>>>
> >>>>>>> The comment remains that this patch doesn't actually fix anything except
> >>>>>>> the overhead on big endian systems doing redundant byte swapping and
> >>>>>>> maybe the philosophy that vfio regions are little endian.
> >>>>>> Yes, that works by accident because technically VFIO is a transport and
> >>>>>> thus shouldn't perform any endian swapping of any sort, which remains
> >>>>>> the responsibility of the end driver which is the only one to know
> >>>>>> whether a given BAR location is a a register or some streaming data
> >>>>>> and in the former case whether it's LE or BE (some PCI devices are BE
> >>>>>> even ! :-)
> >>>>>>
> >>>>>> But yes, in the end, it works with the dual "cancelling" swaps and the
> >>>>>> overhead of those swaps is probably drowned in the noise of the syscall
> >>>>>> overhead.
> >>>>>>
> >>>>>>> I'm still not a fan of iowrite vs iowritebe, there must be something we
> >>>>>>> can use that doesn't have an implicit swap.
> >>>>>> Sadly there isn't ... In the old day we didn't even have the "be"
> >>>>>> variant and readl/writel style accessors still don't have them either
> >>>>>> for all archs.
> >>>>>>
> >>>>>> There is __raw_readl/writel but here the semantics are much more than
> >>>>>> just "don't swap", they also don't have memory barriers (which means
> >>>>>> they are essentially useless to most drivers unless those are platform
> >>>>>> specific drivers which know exactly what they are doing, or in the rare
> >>>>>> cases such as accessing a framebuffer which we know never have side
> >>>>>> effects).
> >>>>>>
> >>>>>>>     Calling it iowrite*_native is also an abuse of the namespace.
> >>>>>>>     Next thing we know some common code
> >>>>>>> will legitimately use that name.
> >>>>>> I might make sense to those definitions into a common header. There have
> >>>>>> been a handful of cases in the past that wanted that sort of "native
> >>>>>> byte order" MMIOs iirc (though don't ask me for examples, I can't really
> >>>>>> remember).
> >>>>>>
> >>>>>>>     If we do need to define an alias
> >>>>>>> (which I'd like to avoid) it should be something like vfio_iowrite32.
> >>>>> Ping?
> >>>>>
> >>>>> We need to make a decision whether to move those xxx_native() helpers
> >>>>> somewhere (where?) or leave the patch as is (as we figured out that
> >>>>> iowriteXX functions implement barriers and we cannot just use raw
> >>>>> accessors) and fix commit log to explain everything.
> >>>> Is there actually any difference in generated code with this patch applied
> >>>> and without? I would hope that iowrite..() is inlined and cancels out the
> >>>> cpu_to_le..() calls that are also inlined?
> >>> iowrite32 is a non-inline function so conversions take place so are the
> >>> others. And sorry but I fail to see why this matters. We are not trying to
> >>> accelerate things, we are removing redundant operations which confuse
> >>> people who read the code.
> >> The confusion depends on where you're coming from. If you happen to know
> >> that "iowrite32" writes in LE, then the LE conversion makes a lot of sense.
> > It was like this (and this is just confusing):
> >
> > iowrite32(le32_to_cpu(val), io + off);
> >
> > What would make sense (according to you and I would understand this) is this:
> >
> > iowrite32(cpu_to_le32(val), io + off);
> >
> >
> > Or I missed your point, did I?
> 
> No, you didn't miss it. I think for people who know how iowrite32() 
> works the above is obvious. I find the fact that iowrite32() writes in 
> LE always pretty scary though ;).
> 
> So IMHO we should either create new, generic iowrite helpers that don't 
> do any endian swapping at all or do iowrite32(cpu_to_le32(val)) calls.

I'm one of those people for whom iowrite32(le32_to_cpu(val)) makes sense
and keeps the byte order consistent regardless of the platform, while
iowrite32(val) or iowrite32be(val) makes me scratch my head and try to
remember that the byte swaps are a nop on the given platforms.  As Ben
noted, a native, no-swap ioread/write doesn't exist, but perhaps should.
I'd prefer an attempt be made to make it exist before adding
vfio-specific macros.  vfio is arguably doing the right thing here given
the functions available.  Thanks,

Alex

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] vfio: Fix endianness handling for emulated BARs
  2014-06-24 14:21                         ` Alex Williamson
@ 2014-06-24 14:33                           ` Alexey Kardashevskiy
  2014-06-24 14:40                             ` David Laight
                                               ` (2 more replies)
  0 siblings, 3 replies; 26+ messages in thread
From: Alexey Kardashevskiy @ 2014-06-24 14:33 UTC (permalink / raw)
  To: Alex Williamson, Alexander Graf
  Cc: Nikunj A Dadhania, linuxppc-dev, linux-kernel, kvm

On 06/25/2014 12:21 AM, Alex Williamson wrote:
> On Tue, 2014-06-24 at 15:22 +0200, Alexander Graf wrote:
>> On 24.06.14 15:01, Alexey Kardashevskiy wrote:
>>> On 06/24/2014 10:52 PM, Alexander Graf wrote:
>>>> On 24.06.14 14:50, Alexey Kardashevskiy wrote:
>>>>> On 06/24/2014 08:41 PM, Alexander Graf wrote:
>>>>>> On 24.06.14 12:11, Alexey Kardashevskiy wrote:
>>>>>>> On 06/21/2014 09:12 AM, Benjamin Herrenschmidt wrote:
>>>>>>>> On Thu, 2014-06-19 at 21:21 -0600, Alex Williamson wrote:
>>>>>>>>
>>>>>>>>> Working on big endian being an accident may be a matter of perspective
>>>>>>>>     :-)
>>>>>>>>
>>>>>>>>> The comment remains that this patch doesn't actually fix anything except
>>>>>>>>> the overhead on big endian systems doing redundant byte swapping and
>>>>>>>>> maybe the philosophy that vfio regions are little endian.
>>>>>>>> Yes, that works by accident because technically VFIO is a transport and
>>>>>>>> thus shouldn't perform any endian swapping of any sort, which remains
>>>>>>>> the responsibility of the end driver which is the only one to know
>>>>>>>> whether a given BAR location is a a register or some streaming data
>>>>>>>> and in the former case whether it's LE or BE (some PCI devices are BE
>>>>>>>> even ! :-)
>>>>>>>>
>>>>>>>> But yes, in the end, it works with the dual "cancelling" swaps and the
>>>>>>>> overhead of those swaps is probably drowned in the noise of the syscall
>>>>>>>> overhead.
>>>>>>>>
>>>>>>>>> I'm still not a fan of iowrite vs iowritebe, there must be something we
>>>>>>>>> can use that doesn't have an implicit swap.
>>>>>>>> Sadly there isn't ... In the old day we didn't even have the "be"
>>>>>>>> variant and readl/writel style accessors still don't have them either
>>>>>>>> for all archs.
>>>>>>>>
>>>>>>>> There is __raw_readl/writel but here the semantics are much more than
>>>>>>>> just "don't swap", they also don't have memory barriers (which means
>>>>>>>> they are essentially useless to most drivers unless those are platform
>>>>>>>> specific drivers which know exactly what they are doing, or in the rare
>>>>>>>> cases such as accessing a framebuffer which we know never have side
>>>>>>>> effects).
>>>>>>>>
>>>>>>>>>     Calling it iowrite*_native is also an abuse of the namespace.
>>>>>>>>>     Next thing we know some common code
>>>>>>>>> will legitimately use that name.
>>>>>>>> I might make sense to those definitions into a common header. There have
>>>>>>>> been a handful of cases in the past that wanted that sort of "native
>>>>>>>> byte order" MMIOs iirc (though don't ask me for examples, I can't really
>>>>>>>> remember).
>>>>>>>>
>>>>>>>>>     If we do need to define an alias
>>>>>>>>> (which I'd like to avoid) it should be something like vfio_iowrite32.
>>>>>>> Ping?
>>>>>>>
>>>>>>> We need to make a decision whether to move those xxx_native() helpers
>>>>>>> somewhere (where?) or leave the patch as is (as we figured out that
>>>>>>> iowriteXX functions implement barriers and we cannot just use raw
>>>>>>> accessors) and fix commit log to explain everything.
>>>>>> Is there actually any difference in generated code with this patch applied
>>>>>> and without? I would hope that iowrite..() is inlined and cancels out the
>>>>>> cpu_to_le..() calls that are also inlined?
>>>>> iowrite32 is a non-inline function so conversions take place so are the
>>>>> others. And sorry but I fail to see why this matters. We are not trying to
>>>>> accelerate things, we are removing redundant operations which confuse
>>>>> people who read the code.
>>>> The confusion depends on where you're coming from. If you happen to know
>>>> that "iowrite32" writes in LE, then the LE conversion makes a lot of sense.
>>> It was like this (and this is just confusing):
>>>
>>> iowrite32(le32_to_cpu(val), io + off);
>>>
>>> What would make sense (according to you and I would understand this) is this:
>>>
>>> iowrite32(cpu_to_le32(val), io + off);
>>>
>>>
>>> Or I missed your point, did I?
>>
>> No, you didn't miss it. I think for people who know how iowrite32() 
>> works the above is obvious. I find the fact that iowrite32() writes in 
>> LE always pretty scary though ;).
>>
>> So IMHO we should either create new, generic iowrite helpers that don't 
>> do any endian swapping at all or do iowrite32(cpu_to_le32(val)) calls.
> 
> I'm one of those people for whom iowrite32(le32_to_cpu(val)) makes sense


I do not understand why @val is considered LE here and need to be converted
to CPU. Really. I truly believe it should be cpu_to_le32().


> and keeps the byte order consistent regardless of the platform, while
> iowrite32(val) or iowrite32be(val) makes me scratch my head and try to
> remember that the byte swaps are a nop on the given platforms.  As Ben
> noted, a native, no-swap ioread/write doesn't exist, but perhaps should.
> I'd prefer an attempt be made to make it exist before adding
> vfio-specific macros.  vfio is arguably doing the right thing here given
> the functions available.  Thanks,


-- 
Alexey

^ permalink raw reply	[flat|nested] 26+ messages in thread

* RE: [PATCH] vfio: Fix endianness handling for emulated BARs
  2014-06-24 14:33                           ` Alexey Kardashevskiy
@ 2014-06-24 14:40                             ` David Laight
  2014-06-24 14:43                             ` Alex Williamson
  2014-06-24 21:54                             ` Benjamin Herrenschmidt
  2 siblings, 0 replies; 26+ messages in thread
From: David Laight @ 2014-06-24 14:40 UTC (permalink / raw)
  To: 'Alexey Kardashevskiy', Alex Williamson, Alexander Graf
  Cc: linuxppc-dev, linux-kernel, Nikunj A Dadhania, kvm

RnJvbTogQWxleGV5IEthcmRhc2hldnNraXkNCi4uLg0KPiA+PiBTbyBJTUhPIHdlIHNob3VsZCBl
aXRoZXIgY3JlYXRlIG5ldywgZ2VuZXJpYyBpb3dyaXRlIGhlbHBlcnMgdGhhdCBkb24ndA0KPiA+
PiBkbyBhbnkgZW5kaWFuIHN3YXBwaW5nIGF0IGFsbCBvciBkbyBpb3dyaXRlMzIoY3B1X3RvX2xl
MzIodmFsKSkgY2FsbHMuDQo+ID4NCj4gPiBJJ20gb25lIG9mIHRob3NlIHBlb3BsZSBmb3Igd2hv
bSBpb3dyaXRlMzIobGUzMl90b19jcHUodmFsKSkgbWFrZXMgc2Vuc2UNCj4gDQo+IA0KPiBJIGRv
IG5vdCB1bmRlcnN0YW5kIHdoeSBAdmFsIGlzIGNvbnNpZGVyZWQgTEUgaGVyZSBhbmQgbmVlZCB0
byBiZSBjb252ZXJ0ZWQNCj4gdG8gQ1BVLiBSZWFsbHkuIEkgdHJ1bHkgYmVsaWV2ZSBpdCBzaG91
bGQgYmUgY3B1X3RvX2xlMzIoKS4NCg0KVGhpbmsgYWJvdXQgaXQgY2FyZWZ1bGx5Li4uDQoNCkFw
cGFyZW50bHkgaW93cml0ZTMyKCkgaXMgd3JpdGluZyBhICdjcHUnIHZhbHVlIG91dCAnbGUnLg0K
U28gaWYgeW91IGhhdmUgYSAnbGUnIHZhbHVlIHlvdSBuZWVkIHRvIGNvbnZlcnQgaXQgdG8gJ2Nw
dScgZmlyc3QuDQoNCkkgcmVhbGx5IHdvbid0IGFzayBob3cgJ2JlJyBwcGMgbWFuYWdlZCB0byBn
ZXQgJ2xlJyBvbi1jaGlwIHBlcmlwaGVyYWxzLg0KDQoJRGF2aWQNCg0K

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] vfio: Fix endianness handling for emulated BARs
  2014-06-24 14:33                           ` Alexey Kardashevskiy
  2014-06-24 14:40                             ` David Laight
@ 2014-06-24 14:43                             ` Alex Williamson
  2014-06-24 16:26                               ` Alexey Kardashevskiy
  2014-06-24 21:54                             ` Benjamin Herrenschmidt
  2 siblings, 1 reply; 26+ messages in thread
From: Alex Williamson @ 2014-06-24 14:43 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: kvm, Nikunj A Dadhania, Alexander Graf, linux-kernel, linuxppc-dev

On Wed, 2014-06-25 at 00:33 +1000, Alexey Kardashevskiy wrote:
> On 06/25/2014 12:21 AM, Alex Williamson wrote:
> > On Tue, 2014-06-24 at 15:22 +0200, Alexander Graf wrote:
> >> On 24.06.14 15:01, Alexey Kardashevskiy wrote:
> >>> On 06/24/2014 10:52 PM, Alexander Graf wrote:
> >>>> On 24.06.14 14:50, Alexey Kardashevskiy wrote:
> >>>>> On 06/24/2014 08:41 PM, Alexander Graf wrote:
> >>>>>> On 24.06.14 12:11, Alexey Kardashevskiy wrote:
> >>>>>>> On 06/21/2014 09:12 AM, Benjamin Herrenschmidt wrote:
> >>>>>>>> On Thu, 2014-06-19 at 21:21 -0600, Alex Williamson wrote:
> >>>>>>>>
> >>>>>>>>> Working on big endian being an accident may be a matter of perspective
> >>>>>>>>     :-)
> >>>>>>>>
> >>>>>>>>> The comment remains that this patch doesn't actually fix anything except
> >>>>>>>>> the overhead on big endian systems doing redundant byte swapping and
> >>>>>>>>> maybe the philosophy that vfio regions are little endian.
> >>>>>>>> Yes, that works by accident because technically VFIO is a transport and
> >>>>>>>> thus shouldn't perform any endian swapping of any sort, which remains
> >>>>>>>> the responsibility of the end driver which is the only one to know
> >>>>>>>> whether a given BAR location is a a register or some streaming data
> >>>>>>>> and in the former case whether it's LE or BE (some PCI devices are BE
> >>>>>>>> even ! :-)
> >>>>>>>>
> >>>>>>>> But yes, in the end, it works with the dual "cancelling" swaps and the
> >>>>>>>> overhead of those swaps is probably drowned in the noise of the syscall
> >>>>>>>> overhead.
> >>>>>>>>
> >>>>>>>>> I'm still not a fan of iowrite vs iowritebe, there must be something we
> >>>>>>>>> can use that doesn't have an implicit swap.
> >>>>>>>> Sadly there isn't ... In the old day we didn't even have the "be"
> >>>>>>>> variant and readl/writel style accessors still don't have them either
> >>>>>>>> for all archs.
> >>>>>>>>
> >>>>>>>> There is __raw_readl/writel but here the semantics are much more than
> >>>>>>>> just "don't swap", they also don't have memory barriers (which means
> >>>>>>>> they are essentially useless to most drivers unless those are platform
> >>>>>>>> specific drivers which know exactly what they are doing, or in the rare
> >>>>>>>> cases such as accessing a framebuffer which we know never have side
> >>>>>>>> effects).
> >>>>>>>>
> >>>>>>>>>     Calling it iowrite*_native is also an abuse of the namespace.
> >>>>>>>>>     Next thing we know some common code
> >>>>>>>>> will legitimately use that name.
> >>>>>>>> I might make sense to those definitions into a common header. There have
> >>>>>>>> been a handful of cases in the past that wanted that sort of "native
> >>>>>>>> byte order" MMIOs iirc (though don't ask me for examples, I can't really
> >>>>>>>> remember).
> >>>>>>>>
> >>>>>>>>>     If we do need to define an alias
> >>>>>>>>> (which I'd like to avoid) it should be something like vfio_iowrite32.
> >>>>>>> Ping?
> >>>>>>>
> >>>>>>> We need to make a decision whether to move those xxx_native() helpers
> >>>>>>> somewhere (where?) or leave the patch as is (as we figured out that
> >>>>>>> iowriteXX functions implement barriers and we cannot just use raw
> >>>>>>> accessors) and fix commit log to explain everything.
> >>>>>> Is there actually any difference in generated code with this patch applied
> >>>>>> and without? I would hope that iowrite..() is inlined and cancels out the
> >>>>>> cpu_to_le..() calls that are also inlined?
> >>>>> iowrite32 is a non-inline function so conversions take place so are the
> >>>>> others. And sorry but I fail to see why this matters. We are not trying to
> >>>>> accelerate things, we are removing redundant operations which confuse
> >>>>> people who read the code.
> >>>> The confusion depends on where you're coming from. If you happen to know
> >>>> that "iowrite32" writes in LE, then the LE conversion makes a lot of sense.
> >>> It was like this (and this is just confusing):
> >>>
> >>> iowrite32(le32_to_cpu(val), io + off);
> >>>
> >>> What would make sense (according to you and I would understand this) is this:
> >>>
> >>> iowrite32(cpu_to_le32(val), io + off);
> >>>
> >>>
> >>> Or I missed your point, did I?
> >>
> >> No, you didn't miss it. I think for people who know how iowrite32() 
> >> works the above is obvious. I find the fact that iowrite32() writes in 
> >> LE always pretty scary though ;).
> >>
> >> So IMHO we should either create new, generic iowrite helpers that don't 
> >> do any endian swapping at all or do iowrite32(cpu_to_le32(val)) calls.
> > 
> > I'm one of those people for whom iowrite32(le32_to_cpu(val)) makes sense
> 
> 
> I do not understand why @val is considered LE here and need to be converted
> to CPU. Really. I truly believe it should be cpu_to_le32().

Because iowrite32 is defined to take a cpu byte order value and write it
as little endian.

> > and keeps the byte order consistent regardless of the platform, while
> > iowrite32(val) or iowrite32be(val) makes me scratch my head and try to
> > remember that the byte swaps are a nop on the given platforms.  As Ben
> > noted, a native, no-swap ioread/write doesn't exist, but perhaps should.
> > I'd prefer an attempt be made to make it exist before adding
> > vfio-specific macros.  vfio is arguably doing the right thing here given
> > the functions available.  Thanks,
> 
> 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] vfio: Fix endianness handling for emulated BARs
  2014-06-24 14:43                             ` Alex Williamson
@ 2014-06-24 16:26                               ` Alexey Kardashevskiy
  0 siblings, 0 replies; 26+ messages in thread
From: Alexey Kardashevskiy @ 2014-06-24 16:26 UTC (permalink / raw)
  To: Alex Williamson
  Cc: kvm, Nikunj A Dadhania, Alexander Graf, linux-kernel, linuxppc-dev

On 06/25/2014 12:43 AM, Alex Williamson wrote:
> On Wed, 2014-06-25 at 00:33 +1000, Alexey Kardashevskiy wrote:
>> On 06/25/2014 12:21 AM, Alex Williamson wrote:
>>> On Tue, 2014-06-24 at 15:22 +0200, Alexander Graf wrote:
>>>> On 24.06.14 15:01, Alexey Kardashevskiy wrote:
>>>>> On 06/24/2014 10:52 PM, Alexander Graf wrote:
>>>>>> On 24.06.14 14:50, Alexey Kardashevskiy wrote:
>>>>>>> On 06/24/2014 08:41 PM, Alexander Graf wrote:
>>>>>>>> On 24.06.14 12:11, Alexey Kardashevskiy wrote:
>>>>>>>>> On 06/21/2014 09:12 AM, Benjamin Herrenschmidt wrote:
>>>>>>>>>> On Thu, 2014-06-19 at 21:21 -0600, Alex Williamson wrote:
>>>>>>>>>>
>>>>>>>>>>> Working on big endian being an accident may be a matter of perspective
>>>>>>>>>>     :-)
>>>>>>>>>>
>>>>>>>>>>> The comment remains that this patch doesn't actually fix anything except
>>>>>>>>>>> the overhead on big endian systems doing redundant byte swapping and
>>>>>>>>>>> maybe the philosophy that vfio regions are little endian.
>>>>>>>>>> Yes, that works by accident because technically VFIO is a transport and
>>>>>>>>>> thus shouldn't perform any endian swapping of any sort, which remains
>>>>>>>>>> the responsibility of the end driver which is the only one to know
>>>>>>>>>> whether a given BAR location is a a register or some streaming data
>>>>>>>>>> and in the former case whether it's LE or BE (some PCI devices are BE
>>>>>>>>>> even ! :-)
>>>>>>>>>>
>>>>>>>>>> But yes, in the end, it works with the dual "cancelling" swaps and the
>>>>>>>>>> overhead of those swaps is probably drowned in the noise of the syscall
>>>>>>>>>> overhead.
>>>>>>>>>>
>>>>>>>>>>> I'm still not a fan of iowrite vs iowritebe, there must be something we
>>>>>>>>>>> can use that doesn't have an implicit swap.
>>>>>>>>>> Sadly there isn't ... In the old day we didn't even have the "be"
>>>>>>>>>> variant and readl/writel style accessors still don't have them either
>>>>>>>>>> for all archs.
>>>>>>>>>>
>>>>>>>>>> There is __raw_readl/writel but here the semantics are much more than
>>>>>>>>>> just "don't swap", they also don't have memory barriers (which means
>>>>>>>>>> they are essentially useless to most drivers unless those are platform
>>>>>>>>>> specific drivers which know exactly what they are doing, or in the rare
>>>>>>>>>> cases such as accessing a framebuffer which we know never have side
>>>>>>>>>> effects).
>>>>>>>>>>
>>>>>>>>>>>     Calling it iowrite*_native is also an abuse of the namespace.
>>>>>>>>>>>     Next thing we know some common code
>>>>>>>>>>> will legitimately use that name.
>>>>>>>>>> I might make sense to those definitions into a common header. There have
>>>>>>>>>> been a handful of cases in the past that wanted that sort of "native
>>>>>>>>>> byte order" MMIOs iirc (though don't ask me for examples, I can't really
>>>>>>>>>> remember).
>>>>>>>>>>
>>>>>>>>>>>     If we do need to define an alias
>>>>>>>>>>> (which I'd like to avoid) it should be something like vfio_iowrite32.
>>>>>>>>> Ping?
>>>>>>>>>
>>>>>>>>> We need to make a decision whether to move those xxx_native() helpers
>>>>>>>>> somewhere (where?) or leave the patch as is (as we figured out that
>>>>>>>>> iowriteXX functions implement barriers and we cannot just use raw
>>>>>>>>> accessors) and fix commit log to explain everything.
>>>>>>>> Is there actually any difference in generated code with this patch applied
>>>>>>>> and without? I would hope that iowrite..() is inlined and cancels out the
>>>>>>>> cpu_to_le..() calls that are also inlined?
>>>>>>> iowrite32 is a non-inline function so conversions take place so are the
>>>>>>> others. And sorry but I fail to see why this matters. We are not trying to
>>>>>>> accelerate things, we are removing redundant operations which confuse
>>>>>>> people who read the code.
>>>>>> The confusion depends on where you're coming from. If you happen to know
>>>>>> that "iowrite32" writes in LE, then the LE conversion makes a lot of sense.
>>>>> It was like this (and this is just confusing):
>>>>>
>>>>> iowrite32(le32_to_cpu(val), io + off);
>>>>>
>>>>> What would make sense (according to you and I would understand this) is this:
>>>>>
>>>>> iowrite32(cpu_to_le32(val), io + off);
>>>>>
>>>>>
>>>>> Or I missed your point, did I?
>>>>
>>>> No, you didn't miss it. I think for people who know how iowrite32() 
>>>> works the above is obvious. I find the fact that iowrite32() writes in 
>>>> LE always pretty scary though ;).
>>>>
>>>> So IMHO we should either create new, generic iowrite helpers that don't 
>>>> do any endian swapping at all or do iowrite32(cpu_to_le32(val)) calls.
>>>
>>> I'm one of those people for whom iowrite32(le32_to_cpu(val)) makes sense
>>
>>
>> I do not understand why @val is considered LE here and need to be converted
>> to CPU. Really. I truly believe it should be cpu_to_le32().
> 
> Because iowrite32 is defined to take a cpu byte order value and write it
> as little endian.


Ok, then neither le32_to_cpu() nor cpu_to_le32() should be there at all, if
we are talking about not scratching anyone's head :)



>>> and keeps the byte order consistent regardless of the platform, while
>>> iowrite32(val) or iowrite32be(val) makes me scratch my head and try to
>>> remember that the byte swaps are a nop on the given platforms.  As Ben
>>> noted, a native, no-swap ioread/write doesn't exist, but perhaps should.
>>> I'd prefer an attempt be made to make it exist before adding
>>> vfio-specific macros.  vfio is arguably doing the right thing here given
>>> the functions available.  Thanks,


I do not mind to make that atempt but what exactly would make sense here?
Try moving macros to include/asm-generic/io.h? Something else? Thanks.



-- 
Alexey

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] vfio: Fix endianness handling for emulated BARs
  2014-06-24 10:41               ` Alexander Graf
  2014-06-24 12:50                 ` Alexey Kardashevskiy
@ 2014-06-24 21:46                 ` Benjamin Herrenschmidt
  1 sibling, 0 replies; 26+ messages in thread
From: Benjamin Herrenschmidt @ 2014-06-24 21:46 UTC (permalink / raw)
  To: Alexander Graf
  Cc: kvm, Nikunj A Dadhania, Alexey Kardashevskiy, linux-kernel,
	Alex Williamson, linuxppc-dev

On Tue, 2014-06-24 at 12:41 +0200, Alexander Graf wrote:
> Is there actually any difference in generated code with this patch 
> applied and without? I would hope that iowrite..() is inlined and 
> cancels out the cpu_to_le..() calls that are also inlined?

No, the former uses byteswapping asm, the compiler can't "cancel" it
out, but the overhead of the additional byteswap might not be
measurable.

Cheers,
Ben.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] vfio: Fix endianness handling for emulated BARs
  2014-06-24 14:33                           ` Alexey Kardashevskiy
  2014-06-24 14:40                             ` David Laight
  2014-06-24 14:43                             ` Alex Williamson
@ 2014-06-24 21:54                             ` Benjamin Herrenschmidt
  2014-06-25  2:43                               ` Alexey Kardashevskiy
  2 siblings, 1 reply; 26+ messages in thread
From: Benjamin Herrenschmidt @ 2014-06-24 21:54 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: kvm, Nikunj A Dadhania, Alexander Graf, linux-kernel,
	Alex Williamson, linuxppc-dev

On Wed, 2014-06-25 at 00:33 +1000, Alexey Kardashevskiy wrote:
> 
> I do not understand why @val is considered LE here and need to be
> converted
> to CPU. Really. I truly believe it should be cpu_to_le32().

No. Both are slightly wrong semantically but le32_to_cpu() is less
wrong :-)

iowrite32 supposedly takes a "cpu" value as argument and writes an "le"
value. So if anything, you need something that converts to a "cpu" value
before you call iowrite32.

Now it's still slightly wrong because the "input" to le32_to_cpu() is
supposed to be an "LE" value but of course here it's not, it's a "cpu"
value too :-)

But yes, I agree with aw here, either do nothing or stick a set of
iowriteXX_native or something equivalent in the generic iomap header,
define them in term of iowrite32be/iowrite32 based on the compile time
endian of the arch.

Hitting asm-generic/iomap.h I think will cover all archs except ARM.
For ARM, just hack arch/arm/asm/io.h

Cheers,
Ben.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] vfio: Fix endianness handling for emulated BARs
  2014-06-24 21:54                             ` Benjamin Herrenschmidt
@ 2014-06-25  2:43                               ` Alexey Kardashevskiy
  0 siblings, 0 replies; 26+ messages in thread
From: Alexey Kardashevskiy @ 2014-06-25  2:43 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: kvm, Nikunj A Dadhania, Alexander Graf, linux-kernel,
	Alex Williamson, linuxppc-dev

On 06/25/2014 07:54 AM, Benjamin Herrenschmidt wrote:
> On Wed, 2014-06-25 at 00:33 +1000, Alexey Kardashevskiy wrote:
>>
>> I do not understand why @val is considered LE here and need to be
>> converted
>> to CPU. Really. I truly believe it should be cpu_to_le32().
> 
> No. Both are slightly wrong semantically but le32_to_cpu() is less
> wrong :-)
>
> iowrite32 supposedly takes a "cpu" value as argument and writes an "le"
> value. So if anything, you need something that converts to a "cpu" value
> before you call iowrite32.
> 
> Now it's still slightly wrong because the "input" to le32_to_cpu() is
> supposed to be an "LE" value but of course here it's not, it's a "cpu"
> value too :-)
> 
> But yes, I agree with aw here, either do nothing or stick a set of
> iowriteXX_native or something equivalent in the generic iomap header,
> define them in term of iowrite32be/iowrite32 based on the compile time
> endian of the arch.


Ok. I'll do nothing.


> Hitting asm-generic/iomap.h I think will cover all archs except ARM.
> For ARM, just hack arch/arm/asm/io.h


-- 
Alexey

^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2014-06-25  2:43 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-06-18 11:36 [PATCH] vfio: Fix endianness handling for emulated BARs Alexey Kardashevskiy
2014-06-18 18:35 ` Alex Williamson
2014-06-19  0:50   ` Alexey Kardashevskiy
2014-06-19  1:50     ` Alexey Kardashevskiy
2014-06-19  1:50     ` Alexey Kardashevskiy
2014-06-19  3:48       ` Alexey Kardashevskiy
2014-06-19  5:30         ` Bharat.Bhushan
2014-06-19  6:17           ` Alexey Kardashevskiy
2014-06-20  3:21         ` Alex Williamson
2014-06-20 14:14           ` Alexey Kardashevskiy
2014-06-20 23:16             ` Benjamin Herrenschmidt
2014-06-20 23:12           ` Benjamin Herrenschmidt
2014-06-24 10:11             ` Alexey Kardashevskiy
2014-06-24 10:41               ` Alexander Graf
2014-06-24 12:50                 ` Alexey Kardashevskiy
2014-06-24 12:52                   ` Alexander Graf
2014-06-24 13:01                     ` Alexey Kardashevskiy
2014-06-24 13:22                       ` Alexander Graf
2014-06-24 14:21                         ` Alex Williamson
2014-06-24 14:33                           ` Alexey Kardashevskiy
2014-06-24 14:40                             ` David Laight
2014-06-24 14:43                             ` Alex Williamson
2014-06-24 16:26                               ` Alexey Kardashevskiy
2014-06-24 21:54                             ` Benjamin Herrenschmidt
2014-06-25  2:43                               ` Alexey Kardashevskiy
2014-06-24 21:46                 ` Benjamin Herrenschmidt

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).