All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC PATCH kernel] vfio-pci: Allow write combining
@ 2017-10-09  2:50 Alexey Kardashevskiy
  2017-10-10 21:55 ` Alex Williamson
  0 siblings, 1 reply; 23+ messages in thread
From: Alexey Kardashevskiy @ 2017-10-09  2:50 UTC (permalink / raw)
  To: kvm
  Cc: Alexey Kardashevskiy, David Gibson, Eric Auger,
	Benjamin Herrenschmidt, Alex Williamson

At the moment the protection in VFIO MMIO mappings is forced to
_PAGE_NON_IDEMPOTENT which means that write combining is not really
available to the userspace even for prefetchable 64bit MMIO BARs.

This replaces pgprot_noncached() with a platform specific
phys_mem_access_prot() when available and depending on the platform
the vm_page_prot may be set to _PAGE_TOLERANT allowing to exploit
the write combining feature.

The guest drivers still have to use _wc versions of
the ioremap/pci_ioremap API to get write combininig working.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---

This should allow DPDK and radix guests (x86, POWERPC, etc) to
do write combining.

POWERPC hash guests should not be affected by this change, it should
work even without this.
---
 drivers/vfio/pci/vfio_pci.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index f041b1a6cf66..014192b42724 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -1156,8 +1156,13 @@ static int vfio_pci_mmap(void *device_data, struct vm_area_struct *vma)
 	}
 
 	vma->vm_private_data = vdev;
-	vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
 	vma->vm_pgoff = (pci_resource_start(pdev, index) >> PAGE_SHIFT) + pgoff;
+#ifdef __HAVE_PHYS_MEM_ACCESS_PROT
+	vma->vm_page_prot = phys_mem_access_prot(NULL, vma->vm_pgoff,
+			req_len, vma->vm_page_prot);
+#else
+	vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
+#endif
 
 	return remap_pfn_range(vma, vma->vm_start, vma->vm_pgoff,
 			       req_len, vma->vm_page_prot);
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [RFC PATCH kernel] vfio-pci: Allow write combining
  2017-10-09  2:50 [RFC PATCH kernel] vfio-pci: Allow write combining Alexey Kardashevskiy
@ 2017-10-10 21:55 ` Alex Williamson
  2017-10-11  2:05   ` Alexey Kardashevskiy
  0 siblings, 1 reply; 23+ messages in thread
From: Alex Williamson @ 2017-10-10 21:55 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: kvm, David Gibson, Eric Auger, Benjamin Herrenschmidt

On Mon,  9 Oct 2017 13:50:00 +1100
Alexey Kardashevskiy <aik@ozlabs.ru> wrote:

> At the moment the protection in VFIO MMIO mappings is forced to
> _PAGE_NON_IDEMPOTENT which means that write combining is not really
> available to the userspace even for prefetchable 64bit MMIO BARs.
> 
> This replaces pgprot_noncached() with a platform specific
> phys_mem_access_prot() when available and depending on the platform
> the vm_page_prot may be set to _PAGE_TOLERANT allowing to exploit
> the write combining feature.
> 
> The guest drivers still have to use _wc versions of
> the ioremap/pci_ioremap API to get write combininig working.
> 
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> ---
> 
> This should allow DPDK and radix guests (x86, POWERPC, etc) to
> do write combining.
> 
> POWERPC hash guests should not be affected by this change, it should
> work even without this.
> ---
>  drivers/vfio/pci/vfio_pci.c | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
> index f041b1a6cf66..014192b42724 100644
> --- a/drivers/vfio/pci/vfio_pci.c
> +++ b/drivers/vfio/pci/vfio_pci.c
> @@ -1156,8 +1156,13 @@ static int vfio_pci_mmap(void *device_data, struct vm_area_struct *vma)
>  	}
>  
>  	vma->vm_private_data = vdev;
> -	vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
>  	vma->vm_pgoff = (pci_resource_start(pdev, index) >> PAGE_SHIFT) + pgoff;
> +#ifdef __HAVE_PHYS_MEM_ACCESS_PROT
> +	vma->vm_page_prot = phys_mem_access_prot(NULL, vma->vm_pgoff,
> +			req_len, vma->vm_page_prot);
> +#else
> +	vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
> +#endif
>  
>  	return remap_pfn_range(vma, vma->vm_start, vma->vm_pgoff,
>  			       req_len, vma->vm_page_prot);

Are you testing __HAVE_PHYS_MEM_ACCESS_PROT because the version of
phys_mem_access_prot() defined in drivers/char/mem.c can dereference
@file and we're hoping that platforms we care about won't both define
__HAVE_PHYS_MEM_ACCESS_PROT and look at @file?  Sketchy.  Thanks,

Alex

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC PATCH kernel] vfio-pci: Allow write combining
  2017-10-10 21:55 ` Alex Williamson
@ 2017-10-11  2:05   ` Alexey Kardashevskiy
  2017-10-11  2:42     ` Alex Williamson
  0 siblings, 1 reply; 23+ messages in thread
From: Alexey Kardashevskiy @ 2017-10-11  2:05 UTC (permalink / raw)
  To: Alex Williamson; +Cc: kvm, David Gibson, Eric Auger, Benjamin Herrenschmidt

On 11/10/17 08:55, Alex Williamson wrote:
> On Mon,  9 Oct 2017 13:50:00 +1100
> Alexey Kardashevskiy <aik@ozlabs.ru> wrote:
> 
>> At the moment the protection in VFIO MMIO mappings is forced to
>> _PAGE_NON_IDEMPOTENT which means that write combining is not really
>> available to the userspace even for prefetchable 64bit MMIO BARs.
>>
>> This replaces pgprot_noncached() with a platform specific
>> phys_mem_access_prot() when available and depending on the platform
>> the vm_page_prot may be set to _PAGE_TOLERANT allowing to exploit
>> the write combining feature.
>>
>> The guest drivers still have to use _wc versions of
>> the ioremap/pci_ioremap API to get write combininig working.
>>
>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>> ---
>>
>> This should allow DPDK and radix guests (x86, POWERPC, etc) to
>> do write combining.
>>
>> POWERPC hash guests should not be affected by this change, it should
>> work even without this.
>> ---
>>  drivers/vfio/pci/vfio_pci.c | 7 ++++++-
>>  1 file changed, 6 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
>> index f041b1a6cf66..014192b42724 100644
>> --- a/drivers/vfio/pci/vfio_pci.c
>> +++ b/drivers/vfio/pci/vfio_pci.c
>> @@ -1156,8 +1156,13 @@ static int vfio_pci_mmap(void *device_data, struct vm_area_struct *vma)
>>  	}
>>  
>>  	vma->vm_private_data = vdev;
>> -	vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
>>  	vma->vm_pgoff = (pci_resource_start(pdev, index) >> PAGE_SHIFT) + pgoff;
>> +#ifdef __HAVE_PHYS_MEM_ACCESS_PROT
>> +	vma->vm_page_prot = phys_mem_access_prot(NULL, vma->vm_pgoff,
>> +			req_len, vma->vm_page_prot);
>> +#else
>> +	vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
>> +#endif
>>  
>>  	return remap_pfn_range(vma, vma->vm_start, vma->vm_pgoff,
>>  			       req_len, vma->vm_page_prot);
> 
> Are you testing __HAVE_PHYS_MEM_ACCESS_PROT because the version of
> phys_mem_access_prot() defined in drivers/char/mem.c can dereference
> @file and we're hoping that platforms we care about won't both define
> __HAVE_PHYS_MEM_ACCESS_PROT and look at @file?  

No.

That version in mem.c is static and not exported at all and I do not
understand why it got this name. Every other instance of
phys_mem_access_prot is accompanied by

#define __HAVE_PHYS_MEM_ACCESS_PROT

But 3 instances (ia64, x86, mips) are not exported (arm, arm64, ppc are)
and v2 of this will come with 3 more single line patches, if we decide to
proceed.

The only version which actually looks at @file is in
arch/mips/loongson64/common/mem.c and I do not know what to do about it
(can it do VFIO at all?), I could pass a file there but no actual code
would use it anyway.


> Sketchy.  Thanks,
> 
> Alex
> 


-- 
Alexey

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC PATCH kernel] vfio-pci: Allow write combining
  2017-10-11  2:05   ` Alexey Kardashevskiy
@ 2017-10-11  2:42     ` Alex Williamson
  2017-10-11  2:56       ` Alexey Kardashevskiy
  0 siblings, 1 reply; 23+ messages in thread
From: Alex Williamson @ 2017-10-11  2:42 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: kvm, David Gibson, Eric Auger, Benjamin Herrenschmidt

On Wed, 11 Oct 2017 13:05:00 +1100
Alexey Kardashevskiy <aik@ozlabs.ru> wrote:

> On 11/10/17 08:55, Alex Williamson wrote:
> > On Mon,  9 Oct 2017 13:50:00 +1100
> > Alexey Kardashevskiy <aik@ozlabs.ru> wrote:
> >   
> >> At the moment the protection in VFIO MMIO mappings is forced to
> >> _PAGE_NON_IDEMPOTENT which means that write combining is not really
> >> available to the userspace even for prefetchable 64bit MMIO BARs.
> >>
> >> This replaces pgprot_noncached() with a platform specific
> >> phys_mem_access_prot() when available and depending on the platform
> >> the vm_page_prot may be set to _PAGE_TOLERANT allowing to exploit
> >> the write combining feature.
> >>
> >> The guest drivers still have to use _wc versions of
> >> the ioremap/pci_ioremap API to get write combininig working.
> >>
> >> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> >> ---
> >>
> >> This should allow DPDK and radix guests (x86, POWERPC, etc) to
> >> do write combining.
> >>
> >> POWERPC hash guests should not be affected by this change, it should
> >> work even without this.
> >> ---
> >>  drivers/vfio/pci/vfio_pci.c | 7 ++++++-
> >>  1 file changed, 6 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
> >> index f041b1a6cf66..014192b42724 100644
> >> --- a/drivers/vfio/pci/vfio_pci.c
> >> +++ b/drivers/vfio/pci/vfio_pci.c
> >> @@ -1156,8 +1156,13 @@ static int vfio_pci_mmap(void *device_data, struct vm_area_struct *vma)
> >>  	}
> >>  
> >>  	vma->vm_private_data = vdev;
> >> -	vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
> >>  	vma->vm_pgoff = (pci_resource_start(pdev, index) >> PAGE_SHIFT) + pgoff;
> >> +#ifdef __HAVE_PHYS_MEM_ACCESS_PROT
> >> +	vma->vm_page_prot = phys_mem_access_prot(NULL, vma->vm_pgoff,
> >> +			req_len, vma->vm_page_prot);
> >> +#else
> >> +	vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
> >> +#endif
> >>  
> >>  	return remap_pfn_range(vma, vma->vm_start, vma->vm_pgoff,
> >>  			       req_len, vma->vm_page_prot);  
> > 
> > Are you testing __HAVE_PHYS_MEM_ACCESS_PROT because the version of
> > phys_mem_access_prot() defined in drivers/char/mem.c can dereference
> > @file and we're hoping that platforms we care about won't both define
> > __HAVE_PHYS_MEM_ACCESS_PROT and look at @file?    
> 
> No.
> 
> That version in mem.c is static and not exported at all and I do not
> understand why it got this name. Every other instance of
> phys_mem_access_prot is accompanied by
> 
> #define __HAVE_PHYS_MEM_ACCESS_PROT

I did miss that mem.c was static there, but I think the point is still
valid, file being NULL doesn't seem to be a universally expected option.

> But 3 instances (ia64, x86, mips) are not exported (arm, arm64, ppc are)
> and v2 of this will come with 3 more single line patches, if we decide to
> proceed.
> 
> The only version which actually looks at @file is in
> arch/mips/loongson64/common/mem.c and I do not know what to do about it
> (can it do VFIO at all?), I could pass a file there but no actual code
> would use it anyway.

The question is not whether this particular platform could use vfio,
but instead is whether vfio is using the function correctly.  That, I
really don't know.

But also...

arch/arm64/mm/mmu.c:
pgprot_t phys_mem_access_prot(struct file *file, unsigned long pfn,
                              unsigned long size, pgprot_t vma_prot)
{
        if (!pfn_valid(pfn))
                return pgprot_noncached(vma_prot);
        else if (file->f_flags & O_SYNC)
                return pgprot_writecombine(vma_prot);
        return vma_prot;
}

Why do we get to ignore dereferencing file on an arch that we
definitely care about?  Thanks,

Alex

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC PATCH kernel] vfio-pci: Allow write combining
  2017-10-11  2:42     ` Alex Williamson
@ 2017-10-11  2:56       ` Alexey Kardashevskiy
  2017-10-11 15:35         ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 23+ messages in thread
From: Alexey Kardashevskiy @ 2017-10-11  2:56 UTC (permalink / raw)
  To: Alex Williamson; +Cc: kvm, David Gibson, Eric Auger, Benjamin Herrenschmidt

On 11/10/17 13:42, Alex Williamson wrote:
> On Wed, 11 Oct 2017 13:05:00 +1100
> Alexey Kardashevskiy <aik@ozlabs.ru> wrote:
> 
>> On 11/10/17 08:55, Alex Williamson wrote:
>>> On Mon,  9 Oct 2017 13:50:00 +1100
>>> Alexey Kardashevskiy <aik@ozlabs.ru> wrote:
>>>   
>>>> At the moment the protection in VFIO MMIO mappings is forced to
>>>> _PAGE_NON_IDEMPOTENT which means that write combining is not really
>>>> available to the userspace even for prefetchable 64bit MMIO BARs.
>>>>
>>>> This replaces pgprot_noncached() with a platform specific
>>>> phys_mem_access_prot() when available and depending on the platform
>>>> the vm_page_prot may be set to _PAGE_TOLERANT allowing to exploit
>>>> the write combining feature.
>>>>
>>>> The guest drivers still have to use _wc versions of
>>>> the ioremap/pci_ioremap API to get write combininig working.
>>>>
>>>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>>>> ---
>>>>
>>>> This should allow DPDK and radix guests (x86, POWERPC, etc) to
>>>> do write combining.
>>>>
>>>> POWERPC hash guests should not be affected by this change, it should
>>>> work even without this.
>>>> ---
>>>>  drivers/vfio/pci/vfio_pci.c | 7 ++++++-
>>>>  1 file changed, 6 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
>>>> index f041b1a6cf66..014192b42724 100644
>>>> --- a/drivers/vfio/pci/vfio_pci.c
>>>> +++ b/drivers/vfio/pci/vfio_pci.c
>>>> @@ -1156,8 +1156,13 @@ static int vfio_pci_mmap(void *device_data, struct vm_area_struct *vma)
>>>>  	}
>>>>  
>>>>  	vma->vm_private_data = vdev;
>>>> -	vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
>>>>  	vma->vm_pgoff = (pci_resource_start(pdev, index) >> PAGE_SHIFT) + pgoff;
>>>> +#ifdef __HAVE_PHYS_MEM_ACCESS_PROT
>>>> +	vma->vm_page_prot = phys_mem_access_prot(NULL, vma->vm_pgoff,
>>>> +			req_len, vma->vm_page_prot);
>>>> +#else
>>>> +	vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
>>>> +#endif
>>>>  
>>>>  	return remap_pfn_range(vma, vma->vm_start, vma->vm_pgoff,
>>>>  			       req_len, vma->vm_page_prot);  
>>>
>>> Are you testing __HAVE_PHYS_MEM_ACCESS_PROT because the version of
>>> phys_mem_access_prot() defined in drivers/char/mem.c can dereference
>>> @file and we're hoping that platforms we care about won't both define
>>> __HAVE_PHYS_MEM_ACCESS_PROT and look at @file?    
>>
>> No.
>>
>> That version in mem.c is static and not exported at all and I do not
>> understand why it got this name. Every other instance of
>> phys_mem_access_prot is accompanied by
>>
>> #define __HAVE_PHYS_MEM_ACCESS_PROT
> 
> I did miss that mem.c was static there, but I think the point is still
> valid, file being NULL doesn't seem to be a universally expected option.
>
> 
>> But 3 instances (ia64, x86, mips) are not exported (arm, arm64, ppc are)
>> and v2 of this will come with 3 more single line patches, if we decide to
>> proceed.
>>
>> The only version which actually looks at @file is in
>> arch/mips/loongson64/common/mem.c and I do not know what to do about it
>> (can it do VFIO at all?), I could pass a file there but no actual code
>> would use it anyway.
> 
> The question is not whether this particular platform could use vfio,
> but instead is whether vfio is using the function correctly.  That, I
> really don't know.

Who does? :) Let's cc: him.

>
> 
> But also...
> 
> arch/arm64/mm/mmu.c:
> pgprot_t phys_mem_access_prot(struct file *file, unsigned long pfn,
>                               unsigned long size, pgprot_t vma_prot)
> {
>         if (!pfn_valid(pfn))
>                 return pgprot_noncached(vma_prot);
>         else if (file->f_flags & O_SYNC)
>                 return pgprot_writecombine(vma_prot);
>         return vma_prot;
> }
> 
> Why do we get to ignore dereferencing file on an arch that we
> definitely care about?  Thanks,

Oopsie. This is because I overlooked it. Others do not use it. So I do need
a file. But in the current scheme where all BARs share one fd - it won't
work - I simply cannot allow WC on non-prefetchable BARs :-/



-- 
Alexey

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC PATCH kernel] vfio-pci: Allow write combining
  2017-10-11  2:56       ` Alexey Kardashevskiy
@ 2017-10-11 15:35         ` Benjamin Herrenschmidt
  2017-10-16  5:54           ` Alexey Kardashevskiy
  0 siblings, 1 reply; 23+ messages in thread
From: Benjamin Herrenschmidt @ 2017-10-11 15:35 UTC (permalink / raw)
  To: Alexey Kardashevskiy, Alex Williamson; +Cc: kvm, David Gibson, Eric Auger

On Wed, 2017-10-11 at 13:56 +1100, Alexey Kardashevskiy wrote:
> Oopsie. This is because I overlooked it. Others do not use it. So I do need
> a file. But in the current scheme where all BARs share one fd - it won't
> work - I simply cannot allow WC on non-prefetchable BARs :-/

This is an oversight in the design of VFIO-PCI, it should have a way to
specify write combine, either implicitely via such an arch hook, or
explicitely via an ioctl prior to mapping the BARs for example.

Alex, what do you reckon is the best approach here ?

Cheers,
Ben.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC PATCH kernel] vfio-pci: Allow write combining
  2017-10-11 15:35         ` Benjamin Herrenschmidt
@ 2017-10-16  5:54           ` Alexey Kardashevskiy
  2017-10-16  6:00             ` David Gibson
  0 siblings, 1 reply; 23+ messages in thread
From: Alexey Kardashevskiy @ 2017-10-16  5:54 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Alex Williamson; +Cc: kvm, David Gibson, Eric Auger

On 12/10/17 02:35, Benjamin Herrenschmidt wrote:
> On Wed, 2017-10-11 at 13:56 +1100, Alexey Kardashevskiy wrote:
>> Oopsie. This is because I overlooked it. Others do not use it. So I do need
>> a file. But in the current scheme where all BARs share one fd - it won't
>> work - I simply cannot allow WC on non-prefetchable BARs :-/
> 
> This is an oversight in the design of VFIO-PCI, it should have a way to
> specify write combine, either implicitely via such an arch hook, or
> explicitely via an ioctl prior to mapping the BARs for example.
> 
> Alex, what do you reckon is the best approach here ?

/me wonders if it is yet another issue for the dead issues bucket, just
like the msix mapping one :)



-- 
Alexey

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC PATCH kernel] vfio-pci: Allow write combining
  2017-10-16  5:54           ` Alexey Kardashevskiy
@ 2017-10-16  6:00             ` David Gibson
  2017-10-16  7:36               ` Alexey Kardashevskiy
  0 siblings, 1 reply; 23+ messages in thread
From: David Gibson @ 2017-10-16  6:00 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: Benjamin Herrenschmidt, Alex Williamson, kvm, Eric Auger

[-- Attachment #1: Type: text/plain, Size: 1140 bytes --]

On Mon, Oct 16, 2017 at 04:54:08PM +1100, Alexey Kardashevskiy wrote:
> On 12/10/17 02:35, Benjamin Herrenschmidt wrote:
> > On Wed, 2017-10-11 at 13:56 +1100, Alexey Kardashevskiy wrote:
> >> Oopsie. This is because I overlooked it. Others do not use it. So I do need
> >> a file. But in the current scheme where all BARs share one fd - it won't
> >> work - I simply cannot allow WC on non-prefetchable BARs :-/
> > 
> > This is an oversight in the design of VFIO-PCI, it should have a way to
> > specify write combine, either implicitely via such an arch hook, or
> > explicitely via an ioctl prior to mapping the BARs for example.
> > 
> > Alex, what do you reckon is the best approach here ?
> 
> /me wonders if it is yet another issue for the dead issues bucket, just
> like the msix mapping one :)

Maybe.  Alexey, maybe you can make up a list of things that we (me,
you, BenH) need to discuss with Alex W at KVM Forum?

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC PATCH kernel] vfio-pci: Allow write combining
  2017-10-16  6:00             ` David Gibson
@ 2017-10-16  7:36               ` Alexey Kardashevskiy
  2017-10-16  8:01                 ` David Gibson
  2017-10-16  8:38                 ` Benjamin Herrenschmidt
  0 siblings, 2 replies; 23+ messages in thread
From: Alexey Kardashevskiy @ 2017-10-16  7:36 UTC (permalink / raw)
  To: David Gibson; +Cc: Benjamin Herrenschmidt, Alex Williamson, kvm, Eric Auger


[-- Attachment #1.1: Type: text/plain, Size: 1492 bytes --]

On 16/10/17 17:00, David Gibson wrote:
> On Mon, Oct 16, 2017 at 04:54:08PM +1100, Alexey Kardashevskiy wrote:
>> On 12/10/17 02:35, Benjamin Herrenschmidt wrote:
>>> On Wed, 2017-10-11 at 13:56 +1100, Alexey Kardashevskiy wrote:
>>>> Oopsie. This is because I overlooked it. Others do not use it. So I do need
>>>> a file. But in the current scheme where all BARs share one fd - it won't
>>>> work - I simply cannot allow WC on non-prefetchable BARs :-/
>>>
>>> This is an oversight in the design of VFIO-PCI, it should have a way to
>>> specify write combine, either implicitely via such an arch hook, or
>>> explicitely via an ioctl prior to mapping the BARs for example.
>>>
>>> Alex, what do you reckon is the best approach here ?
>>
>> /me wonders if it is yet another issue for the dead issues bucket, just
>> like the msix mapping one :)
> 
> Maybe.  Alexey, maybe you can make up a list of things that we (me,
> you, BenH) need to discuss with Alex W at KVM Forum?

"you" - you meant me? I am not coming over there :(

The list is:

1. Allow msix mapping to the userspace (to address non-64k-aligned msix bar)

2. Allow write combining in vfio for the userspace (kvm guest is kinda
special and may simply ignore mapping flags in some configs but PPC radix
guests still rely on this)

3. what callback and where needs to be added to inform HV/PR KVM about VFIO
group, like IOMMUMR::add_vfio_group() proposal or something.

Thanks!


-- 
Alexey


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 839 bytes --]

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC PATCH kernel] vfio-pci: Allow write combining
  2017-10-16  7:36               ` Alexey Kardashevskiy
@ 2017-10-16  8:01                 ` David Gibson
  2017-11-06  5:44                   ` Alexey Kardashevskiy
  2017-10-16  8:38                 ` Benjamin Herrenschmidt
  1 sibling, 1 reply; 23+ messages in thread
From: David Gibson @ 2017-10-16  8:01 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: Benjamin Herrenschmidt, Alex Williamson, kvm, Eric Auger

[-- Attachment #1: Type: text/plain, Size: 1830 bytes --]

On Mon, Oct 16, 2017 at 06:36:29PM +1100, Alexey Kardashevskiy wrote:
> On 16/10/17 17:00, David Gibson wrote:
> > On Mon, Oct 16, 2017 at 04:54:08PM +1100, Alexey Kardashevskiy wrote:
> >> On 12/10/17 02:35, Benjamin Herrenschmidt wrote:
> >>> On Wed, 2017-10-11 at 13:56 +1100, Alexey Kardashevskiy wrote:
> >>>> Oopsie. This is because I overlooked it. Others do not use it. So I do need
> >>>> a file. But in the current scheme where all BARs share one fd - it won't
> >>>> work - I simply cannot allow WC on non-prefetchable BARs :-/
> >>>
> >>> This is an oversight in the design of VFIO-PCI, it should have a way to
> >>> specify write combine, either implicitely via such an arch hook, or
> >>> explicitely via an ioctl prior to mapping the BARs for example.
> >>>
> >>> Alex, what do you reckon is the best approach here ?
> >>
> >> /me wonders if it is yet another issue for the dead issues bucket, just
> >> like the msix mapping one :)
> > 
> > Maybe.  Alexey, maybe you can make up a list of things that we (me,
> > you, BenH) need to discuss with Alex W at KVM Forum?
> 
> "you" - you meant me? I am not coming over there :(

Oh.. I thought you were.

> The list is:
> 
> 1. Allow msix mapping to the userspace (to address non-64k-aligned msix bar)
> 
> 2. Allow write combining in vfio for the userspace (kvm guest is kinda
> special and may simply ignore mapping flags in some configs but PPC radix
> guests still rely on this)
> 
> 3. what callback and where needs to be added to inform HV/PR KVM about VFIO
> group, like IOMMUMR::add_vfio_group() proposal or something.

Thanks.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC PATCH kernel] vfio-pci: Allow write combining
  2017-10-16  7:36               ` Alexey Kardashevskiy
  2017-10-16  8:01                 ` David Gibson
@ 2017-10-16  8:38                 ` Benjamin Herrenschmidt
  2017-10-16 11:11                   ` Alexey Kardashevskiy
  1 sibling, 1 reply; 23+ messages in thread
From: Benjamin Herrenschmidt @ 2017-10-16  8:38 UTC (permalink / raw)
  To: Alexey Kardashevskiy, David Gibson; +Cc: Alex Williamson, kvm, Eric Auger

On Mon, 2017-10-16 at 18:36 +1100, Alexey Kardashevskiy wrote:
> 
> 1. Allow msix mapping to the userspace (to address non-64k-aligned msix bar)
> 
> 2. Allow write combining in vfio for the userspace (kvm guest is kinda
> special and may simply ignore mapping flags in some configs but PPC radix
> guests still rely on this)

Why ? The "G" bit is entirely under control of the guest afaik. This
would only affect qemu itself. It's still useful for things like dpdk
using vfio.

> 3. what callback and where needs to be added to inform HV/PR KVM about VFIO
> group, like IOMMUMR::add_vfio_group() proposal or something.

Can you elaborate a bit ? I haven't followed this.

Cheers,
Ben.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC PATCH kernel] vfio-pci: Allow write combining
  2017-10-16  8:38                 ` Benjamin Herrenschmidt
@ 2017-10-16 11:11                   ` Alexey Kardashevskiy
  2017-10-18  7:33                     ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 23+ messages in thread
From: Alexey Kardashevskiy @ 2017-10-16 11:11 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, David Gibson
  Cc: Alex Williamson, kvm, Eric Auger, Paul Mackerras

On 16/10/17 19:38, Benjamin Herrenschmidt wrote:
> On Mon, 2017-10-16 at 18:36 +1100, Alexey Kardashevskiy wrote:
>>
>> 1. Allow msix mapping to the userspace (to address non-64k-aligned msix bar)
>>
>> 2. Allow write combining in vfio for the userspace (kvm guest is kinda
>> special and may simply ignore mapping flags in some configs but PPC radix
>> guests still rely on this)
> 
> Why ? The "G" bit is entirely under control of the guest afaik.

Yes, for hash guests. I am not sure sure about radix, Paul pointed me to
the code in KVM which uses the VFIO's mapping VMA.

> This
> would only affect qemu itself. It's still useful for things like dpdk
> using vfio.

Correct, this is useful regardless KVM.

>> 3. what callback and where needs to be added to inform HV/PR KVM about VFIO
>> group, like IOMMUMR::add_vfio_group() proposal or something.
> 
> Can you elaborate a bit ? I haven't followed this.

David knows :)



-- 
Alexey

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC PATCH kernel] vfio-pci: Allow write combining
  2017-10-16 11:11                   ` Alexey Kardashevskiy
@ 2017-10-18  7:33                     ` Benjamin Herrenschmidt
  2017-10-18  9:00                       ` Alexey Kardashevskiy
  0 siblings, 1 reply; 23+ messages in thread
From: Benjamin Herrenschmidt @ 2017-10-18  7:33 UTC (permalink / raw)
  To: Alexey Kardashevskiy, David Gibson
  Cc: Alex Williamson, kvm, Eric Auger, Paul Mackerras

On Mon, 2017-10-16 at 22:11 +1100, Alexey Kardashevskiy wrote:
> On 16/10/17 19:38, Benjamin Herrenschmidt wrote:
> > On Mon, 2017-10-16 at 18:36 +1100, Alexey Kardashevskiy wrote:
> > > 
> > > 1. Allow msix mapping to the userspace (to address non-64k-aligned msix bar)
> > > 
> > > 2. Allow write combining in vfio for the userspace (kvm guest is kinda
> > > special and may simply ignore mapping flags in some configs but PPC radix
> > > guests still rely on this)
> > 
> > Why ? The "G" bit is entirely under control of the guest afaik.
> 
> Yes, for hash guests. I am not sure sure about radix, Paul pointed me to
> the code in KVM which uses the VFIO's mapping VMA.

With radix, the HW will honor the G bit set in the guest page tables.

> 
> > This
> > would only affect qemu itself. It's still useful for things like dpdk
> > using vfio.
> 
> Correct, this is useful regardless KVM.
> 
> > > 3. what callback and where needs to be added to inform HV/PR KVM about VFIO
> > > group, like IOMMUMR::add_vfio_group() proposal or something.
> > 
> > Can you elaborate a bit ? I haven't followed this.
> 
> David knows :)
> 
> 
> 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC PATCH kernel] vfio-pci: Allow write combining
  2017-10-18  7:33                     ` Benjamin Herrenschmidt
@ 2017-10-18  9:00                       ` Alexey Kardashevskiy
  2017-10-18 14:21                         ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 23+ messages in thread
From: Alexey Kardashevskiy @ 2017-10-18  9:00 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, David Gibson
  Cc: Alex Williamson, kvm, Eric Auger, Paul Mackerras

On 18/10/17 18:33, Benjamin Herrenschmidt wrote:
> On Mon, 2017-10-16 at 22:11 +1100, Alexey Kardashevskiy wrote:
>> On 16/10/17 19:38, Benjamin Herrenschmidt wrote:
>>> On Mon, 2017-10-16 at 18:36 +1100, Alexey Kardashevskiy wrote:
>>>>
>>>> 1. Allow msix mapping to the userspace (to address non-64k-aligned msix bar)
>>>>
>>>> 2. Allow write combining in vfio for the userspace (kvm guest is kinda
>>>> special and may simply ignore mapping flags in some configs but PPC radix
>>>> guests still rely on this)
>>>
>>> Why ? The "G" bit is entirely under control of the guest afaik.
>>
>> Yes, for hash guests. I am not sure sure about radix, Paul pointed me to
>> the code in KVM which uses the VFIO's mapping VMA.
> 
> With radix, the HW will honor the G bit set in the guest page tables.


I am sure that you know better, Paul just pointed me to this:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/powerpc/kvm/book3s_64_mmu_radix.c?h=v4.13#n403


> 
>>
>>> This
>>> would only affect qemu itself. It's still useful for things like dpdk
>>> using vfio.
>>
>> Correct, this is useful regardless KVM.
>>
>>>> 3. what callback and where needs to be added to inform HV/PR KVM about VFIO
>>>> group, like IOMMUMR::add_vfio_group() proposal or something.
>>>
>>> Can you elaborate a bit ? I haven't followed this.
>>
>> David knows :)
>>
>>
>>


-- 
Alexey

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC PATCH kernel] vfio-pci: Allow write combining
  2017-10-18  9:00                       ` Alexey Kardashevskiy
@ 2017-10-18 14:21                         ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 23+ messages in thread
From: Benjamin Herrenschmidt @ 2017-10-18 14:21 UTC (permalink / raw)
  To: Alexey Kardashevskiy, David Gibson
  Cc: Alex Williamson, kvm, Eric Auger, Paul Mackerras

On Wed, 2017-10-18 at 20:00 +1100, Alexey Kardashevskiy wrote:
> > 
> > With radix, the HW will honor the G bit set in the guest page tables.
> 
> 
> I am sure that you know better, Paul just pointed me to this:
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/powerpc/kvm/book3s_64_mmu_radix.c?h=v4.13#n403

It's in the architecture somewhere, if you look at 3.0B in the radix
section about nested radix, there's a table for the nesting of
attributes.

Here you'll see that regardless of the G bit value set by the host,
the HW will use the G bit value set by the guest.

However, we should still provide a way to exploit WC if anything for
things like DPDK.

Cheers,
Ben.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC PATCH kernel] vfio-pci: Allow write combining
  2017-10-16  8:01                 ` David Gibson
@ 2017-11-06  5:44                   ` Alexey Kardashevskiy
  2017-11-14  2:23                     ` David Gibson
  0 siblings, 1 reply; 23+ messages in thread
From: Alexey Kardashevskiy @ 2017-11-06  5:44 UTC (permalink / raw)
  To: David Gibson, Benjamin Herrenschmidt; +Cc: Alex Williamson, kvm, Eric Auger


[-- Attachment #1.1: Type: text/plain, Size: 1753 bytes --]

On 16/10/17 19:01, David Gibson wrote:
> On Mon, Oct 16, 2017 at 06:36:29PM +1100, Alexey Kardashevskiy wrote:
>> On 16/10/17 17:00, David Gibson wrote:
>>> On Mon, Oct 16, 2017 at 04:54:08PM +1100, Alexey Kardashevskiy wrote:
>>>> On 12/10/17 02:35, Benjamin Herrenschmidt wrote:
>>>>> On Wed, 2017-10-11 at 13:56 +1100, Alexey Kardashevskiy wrote:
>>>>>> Oopsie. This is because I overlooked it. Others do not use it. So I do need
>>>>>> a file. But in the current scheme where all BARs share one fd - it won't
>>>>>> work - I simply cannot allow WC on non-prefetchable BARs :-/
>>>>>
>>>>> This is an oversight in the design of VFIO-PCI, it should have a way to
>>>>> specify write combine, either implicitely via such an arch hook, or
>>>>> explicitely via an ioctl prior to mapping the BARs for example.
>>>>>
>>>>> Alex, what do you reckon is the best approach here ?
>>>>
>>>> /me wonders if it is yet another issue for the dead issues bucket, just
>>>> like the msix mapping one :)
>>>
>>> Maybe.  Alexey, maybe you can make up a list of things that we (me,
>>> you, BenH) need to discuss with Alex W at KVM Forum?
>>
>> "you" - you meant me? I am not coming over there :(
> 
> Oh.. I thought you were.
> 
>> The list is:
>>
>> 1. Allow msix mapping to the userspace (to address non-64k-aligned msix bar)
>>
>> 2. Allow write combining in vfio for the userspace (kvm guest is kinda
>> special and may simply ignore mapping flags in some configs but PPC radix
>> guests still rely on this)
>>
>> 3. what callback and where needs to be added to inform HV/PR KVM about VFIO
>> group, like IOMMUMR::add_vfio_group() proposal or something.
> 
> Thanks.


Any luck with these?

Ben, ping.


-- 
Alexey


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 854 bytes --]

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC PATCH kernel] vfio-pci: Allow write combining
  2017-11-06  5:44                   ` Alexey Kardashevskiy
@ 2017-11-14  2:23                     ` David Gibson
  2017-11-14  2:29                       ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 23+ messages in thread
From: David Gibson @ 2017-11-14  2:23 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: Benjamin Herrenschmidt, Alex Williamson, kvm, Eric Auger

[-- Attachment #1: Type: text/plain, Size: 2283 bytes --]

On Mon, Nov 06, 2017 at 04:44:03PM +1100, Alexey Kardashevskiy wrote:
> On 16/10/17 19:01, David Gibson wrote:
> > On Mon, Oct 16, 2017 at 06:36:29PM +1100, Alexey Kardashevskiy wrote:
> >> On 16/10/17 17:00, David Gibson wrote:
> >>> On Mon, Oct 16, 2017 at 04:54:08PM +1100, Alexey Kardashevskiy wrote:
> >>>> On 12/10/17 02:35, Benjamin Herrenschmidt wrote:
> >>>>> On Wed, 2017-10-11 at 13:56 +1100, Alexey Kardashevskiy wrote:
> >>>>>> Oopsie. This is because I overlooked it. Others do not use it. So I do need
> >>>>>> a file. But in the current scheme where all BARs share one fd - it won't
> >>>>>> work - I simply cannot allow WC on non-prefetchable BARs :-/
> >>>>>
> >>>>> This is an oversight in the design of VFIO-PCI, it should have a way to
> >>>>> specify write combine, either implicitely via such an arch hook, or
> >>>>> explicitely via an ioctl prior to mapping the BARs for example.
> >>>>>
> >>>>> Alex, what do you reckon is the best approach here ?
> >>>>
> >>>> /me wonders if it is yet another issue for the dead issues bucket, just
> >>>> like the msix mapping one :)
> >>>
> >>> Maybe.  Alexey, maybe you can make up a list of things that we (me,
> >>> you, BenH) need to discuss with Alex W at KVM Forum?
> >>
> >> "you" - you meant me? I am not coming over there :(
> > 
> > Oh.. I thought you were.
> > 
> >> The list is:
> >>
> >> 1. Allow msix mapping to the userspace (to address non-64k-aligned msix bar)

We have a new plan on this - I'll discuss it over IRC.

> >> 2. Allow write combining in vfio for the userspace (kvm guest is kinda
> >> special and may simply ignore mapping flags in some configs but PPC radix
> >> guests still rely on this)

AIUI this isn't for radix, but for DPDK things that we need this.  Ben
talked about it a bit, but I don't know what the outcome was.

> >> 3. what callback and where needs to be added to inform HV/PR KVM about VFIO
> >> group, like IOMMUMR::add_vfio_group() proposal or something.

This was discussed, and I'm still thinking about it.  It's kind of
curly.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC PATCH kernel] vfio-pci: Allow write combining
  2017-11-14  2:23                     ` David Gibson
@ 2017-11-14  2:29                       ` Benjamin Herrenschmidt
  2017-11-14 16:28                         ` Alex Williamson
  0 siblings, 1 reply; 23+ messages in thread
From: Benjamin Herrenschmidt @ 2017-11-14  2:29 UTC (permalink / raw)
  To: David Gibson, Alexey Kardashevskiy; +Cc: Alex Williamson, kvm, Eric Auger

On Tue, 2017-11-14 at 13:23 +1100, David Gibson wrote:
> > >> 1. Allow msix mapping to the userspace (to address non-64k-aligned msix bar)
> 
> We have a new plan on this - I'll discuss it over IRC.
> 
> > >> 2. Allow write combining in vfio for the userspace (kvm guest is kinda
> > >> special and may simply ignore mapping flags in some configs but PPC radix
> > >> guests still rely on this)
> 
> AIUI this isn't for radix, but for DPDK things that we need this.  Ben
> talked about it a bit, but I don't know what the outcome was.

So this is not a powerpc specific issue. Other archs similarily want to
be able to do write combine mappings.

The way sysfs does it is that for prefetchable BARs, it exposes both
a resourceN and a resourceN_wc file.

For VFIO it's a bit more tricky, maybe we need to game the offset using
some of it as flags but that's very fishy, or maybe we do some kind of
ioctl that selects the attributes used for that fd instance for
subsequent mappings...

I'll let Alex chose what he feels most appropriate here.

> > >> 3. what callback and where needs to be added to inform HV/PR KVM about VFIO
> > >> group, like IOMMUMR::add_vfio_group() proposal or something.
> 
> This was discussed, and I'm still thinking about it.  It's kind of
> curly.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC PATCH kernel] vfio-pci: Allow write combining
  2017-11-14  2:29                       ` Benjamin Herrenschmidt
@ 2017-11-14 16:28                         ` Alex Williamson
  2017-11-24  4:58                           ` Alexey Kardashevskiy
  0 siblings, 1 reply; 23+ messages in thread
From: Alex Williamson @ 2017-11-14 16:28 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: David Gibson, Alexey Kardashevskiy, kvm, Eric Auger

On Tue, 14 Nov 2017 13:29:02 +1100
Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote:

> On Tue, 2017-11-14 at 13:23 +1100, David Gibson wrote:
> > > >> 1. Allow msix mapping to the userspace (to address non-64k-aligned msix bar)  
> > 
> > We have a new plan on this - I'll discuss it over IRC.
> >   
> > > >> 2. Allow write combining in vfio for the userspace (kvm guest is kinda
> > > >> special and may simply ignore mapping flags in some configs but PPC radix
> > > >> guests still rely on this)  
> > 
> > AIUI this isn't for radix, but for DPDK things that we need this.  Ben
> > talked about it a bit, but I don't know what the outcome was.  
> 
> So this is not a powerpc specific issue. Other archs similarily want to
> be able to do write combine mappings.
> 
> The way sysfs does it is that for prefetchable BARs, it exposes both
> a resourceN and a resourceN_wc file.
> 
> For VFIO it's a bit more tricky, maybe we need to game the offset using
> some of it as flags but that's very fishy, or maybe we do some kind of
> ioctl that selects the attributes used for that fd instance for
> subsequent mappings...
> 
> I'll let Alex chose what he feels most appropriate here.

My order of preference would be something like:

 - mmap flags provide some way for the user to specify a wc mapping
   within existing regions

 - some other mechanism of using the existing regions

 - additional regions provided for use exclusively with wc attributes
   (generalizing PCI BAR wc regions within device specific regions)

 - additional file descriptors provided for wc access

This isn't at the top of my priority list to figure out the solution,
so whoever implements it will need to provide justification as they
move down the list from more to less preferred solutions.  Thanks,

Alex

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC PATCH kernel] vfio-pci: Allow write combining
  2017-11-14 16:28                         ` Alex Williamson
@ 2017-11-24  4:58                           ` Alexey Kardashevskiy
  2017-11-29 18:47                             ` Alex Williamson
  0 siblings, 1 reply; 23+ messages in thread
From: Alexey Kardashevskiy @ 2017-11-24  4:58 UTC (permalink / raw)
  To: Alex Williamson, Benjamin Herrenschmidt; +Cc: David Gibson, kvm, Eric Auger

On 15/11/17 03:28, Alex Williamson wrote:
> On Tue, 14 Nov 2017 13:29:02 +1100
> Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote:
> 
>> On Tue, 2017-11-14 at 13:23 +1100, David Gibson wrote:
>>>>>> 1. Allow msix mapping to the userspace (to address non-64k-aligned msix bar)  
>>>
>>> We have a new plan on this - I'll discuss it over IRC.
>>>   
>>>>>> 2. Allow write combining in vfio for the userspace (kvm guest is kinda
>>>>>> special and may simply ignore mapping flags in some configs but PPC radix
>>>>>> guests still rely on this)  
>>>
>>> AIUI this isn't for radix, but for DPDK things that we need this.  Ben
>>> talked about it a bit, but I don't know what the outcome was.  
>>
>> So this is not a powerpc specific issue. Other archs similarily want to
>> be able to do write combine mappings.
>>
>> The way sysfs does it is that for prefetchable BARs, it exposes both
>> a resourceN and a resourceN_wc file.
>>
>> For VFIO it's a bit more tricky, maybe we need to game the offset using
>> some of it as flags but that's very fishy, or maybe we do some kind of
>> ioctl that selects the attributes used for that fd instance for
>> subsequent mappings...
>>
>> I'll let Alex chose what he feels most appropriate here.
> 
> My order of preference would be something like:
> 
>  - mmap flags provide some way for the user to specify a wc mapping
>    within existing regions

There are plenty of flags but none really matches, checked with Paul.

>  - some other mechanism of using the existing regions

I can only think of madvise but it does not have appropriate flags either.


>  - additional regions provided for use exclusively with wc attributes
>    (generalizing PCI BAR wc regions within device specific regions)


Adding VFIO_PCI_BAR0_WC_REGION_INDEX for VFIO_PCI_BAR0_REGION_INDEX (and so
on for other BARs) seems a viable option.

However the comment for VFIO_PCI_xxx_REGION_INDEX says:

  VFIO_PCI_NUM_REGIONS = 9 /* Fixed user ABI, region indexes >=9 use */
                           /* device specific cap to define content. */


which limits me in where I can add new indexes, I cannot just add new _WC
indexes to that enum, can I? I cannot see any existing regions above 9 yet
though.


>  - additional file descriptors provided for wc access

It could be a capability + iocti(VFIO_DEVICE_GET_WC_RESOURCE) which would
take a BAR index, check if the BAR is prefetchable and if so - return an fd
which the userspace then could mmap(). This is won't break that ABI with 9
regions but it is the least favourable in the list...


> This isn't at the top of my priority list to figure out the solution,
> so whoever implements it will need to provide justification as they
> move down the list from more to less preferred solutions.  Thanks,

I am trying... I was really counting on you guys having this discussed in
Prague :(



-- 
Alexey

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC PATCH kernel] vfio-pci: Allow write combining
  2017-11-24  4:58                           ` Alexey Kardashevskiy
@ 2017-11-29 18:47                             ` Alex Williamson
  2017-11-30  4:20                               ` David Gibson
  0 siblings, 1 reply; 23+ messages in thread
From: Alex Williamson @ 2017-11-29 18:47 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: Benjamin Herrenschmidt, David Gibson, kvm, Eric Auger

On Fri, 24 Nov 2017 15:58:09 +1100
Alexey Kardashevskiy <aik@ozlabs.ru> wrote:

> On 15/11/17 03:28, Alex Williamson wrote:
> > On Tue, 14 Nov 2017 13:29:02 +1100
> > Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote:
> >   
> >> On Tue, 2017-11-14 at 13:23 +1100, David Gibson wrote:  
> >>>>>> 1. Allow msix mapping to the userspace (to address non-64k-aligned msix bar)    
> >>>
> >>> We have a new plan on this - I'll discuss it over IRC.
> >>>     
> >>>>>> 2. Allow write combining in vfio for the userspace (kvm guest is kinda
> >>>>>> special and may simply ignore mapping flags in some configs but PPC radix
> >>>>>> guests still rely on this)    
> >>>
> >>> AIUI this isn't for radix, but for DPDK things that we need this.  Ben
> >>> talked about it a bit, but I don't know what the outcome was.    
> >>
> >> So this is not a powerpc specific issue. Other archs similarily want to
> >> be able to do write combine mappings.
> >>
> >> The way sysfs does it is that for prefetchable BARs, it exposes both
> >> a resourceN and a resourceN_wc file.
> >>
> >> For VFIO it's a bit more tricky, maybe we need to game the offset using
> >> some of it as flags but that's very fishy, or maybe we do some kind of
> >> ioctl that selects the attributes used for that fd instance for
> >> subsequent mappings...
> >>
> >> I'll let Alex chose what he feels most appropriate here.  
> > 
> > My order of preference would be something like:
> > 
> >  - mmap flags provide some way for the user to specify a wc mapping
> >    within existing regions  
> 
> There are plenty of flags but none really matches, checked with Paul.

Is MAP_NONBLOCK off the table?  Why?
 
> >  - some other mechanism of using the existing regions  
> 
> I can only think of madvise but it does not have appropriate flags either.

Is it worth the process to define something that is appropriate?  Would
either of the above be the obvious architectural/implementation choice
if we could define a flag for it?

> >  - additional regions provided for use exclusively with wc attributes
> >    (generalizing PCI BAR wc regions within device specific regions)  
> 
> 
> Adding VFIO_PCI_BAR0_WC_REGION_INDEX for VFIO_PCI_BAR0_REGION_INDEX (and so
> on for other BARs) seems a viable option.
> 
> However the comment for VFIO_PCI_xxx_REGION_INDEX says:
> 
>   VFIO_PCI_NUM_REGIONS = 9 /* Fixed user ABI, region indexes >=9 use */
>                            /* device specific cap to define content. */
> 
> 
> which limits me in where I can add new indexes, I cannot just add new _WC
> indexes to that enum, can I? I cannot see any existing regions above 9 yet
> though.

The comment explains how to do this, you'd add a device specific region
with the type identifying it as a PCI MMIO WC region and the sub-type
probably defining the BAR index.

> >  - additional file descriptors provided for wc access  
> 
> It could be a capability + iocti(VFIO_DEVICE_GET_WC_RESOURCE) which would
> take a BAR index, check if the BAR is prefetchable and if so - return an fd
> which the userspace then could mmap(). This is won't break that ABI with 9
> regions but it is the least favourable in the list...

Do the kernel mechanics require it to be a separate file descriptor?  A
separate fd is my last choice as well, but the interfaces your were
attempting to use previously seemed to have fd granularity.

> > This isn't at the top of my priority list to figure out the solution,
> > so whoever implements it will need to provide justification as they
> > move down the list from more to less preferred solutions.  Thanks,  
> 
> I am trying... I was really counting on you guys having this discussed in
> Prague :(

Should have been there to push your agenda...  Thanks,

Alex

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC PATCH kernel] vfio-pci: Allow write combining
  2017-11-29 18:47                             ` Alex Williamson
@ 2017-11-30  4:20                               ` David Gibson
  2017-11-30 20:06                                 ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 23+ messages in thread
From: David Gibson @ 2017-11-30  4:20 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Alexey Kardashevskiy, Benjamin Herrenschmidt, kvm, Eric Auger

[-- Attachment #1: Type: text/plain, Size: 4456 bytes --]

On Wed, Nov 29, 2017 at 11:47:46AM -0700, Alex Williamson wrote:
> On Fri, 24 Nov 2017 15:58:09 +1100
> Alexey Kardashevskiy <aik@ozlabs.ru> wrote:
> 
> > On 15/11/17 03:28, Alex Williamson wrote:
> > > On Tue, 14 Nov 2017 13:29:02 +1100
> > > Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote:
> > >   
> > >> On Tue, 2017-11-14 at 13:23 +1100, David Gibson wrote:  
> > >>>>>> 1. Allow msix mapping to the userspace (to address non-64k-aligned msix bar)    
> > >>>
> > >>> We have a new plan on this - I'll discuss it over IRC.
> > >>>     
> > >>>>>> 2. Allow write combining in vfio for the userspace (kvm guest is kinda
> > >>>>>> special and may simply ignore mapping flags in some configs but PPC radix
> > >>>>>> guests still rely on this)    
> > >>>
> > >>> AIUI this isn't for radix, but for DPDK things that we need this.  Ben
> > >>> talked about it a bit, but I don't know what the outcome was.    
> > >>
> > >> So this is not a powerpc specific issue. Other archs similarily want to
> > >> be able to do write combine mappings.
> > >>
> > >> The way sysfs does it is that for prefetchable BARs, it exposes both
> > >> a resourceN and a resourceN_wc file.
> > >>
> > >> For VFIO it's a bit more tricky, maybe we need to game the offset using
> > >> some of it as flags but that's very fishy, or maybe we do some kind of
> > >> ioctl that selects the attributes used for that fd instance for
> > >> subsequent mappings...
> > >>
> > >> I'll let Alex chose what he feels most appropriate here.  
> > > 
> > > My order of preference would be something like:
> > > 
> > >  - mmap flags provide some way for the user to specify a wc mapping
> > >    within existing regions  
> > 
> > There are plenty of flags but none really matches, checked with Paul.
> 
> Is MAP_NONBLOCK off the table?  Why?
>  
> > >  - some other mechanism of using the existing regions  
> > 
> > I can only think of madvise but it does not have appropriate flags either.
> 
> Is it worth the process to define something that is appropriate?  Would
> either of the above be the obvious architectural/implementation choice
> if we could define a flag for it?
> 
> > >  - additional regions provided for use exclusively with wc attributes
> > >    (generalizing PCI BAR wc regions within device specific regions)  
> > 
> > 
> > Adding VFIO_PCI_BAR0_WC_REGION_INDEX for VFIO_PCI_BAR0_REGION_INDEX (and so
> > on for other BARs) seems a viable option.
> > 
> > However the comment for VFIO_PCI_xxx_REGION_INDEX says:
> > 
> >   VFIO_PCI_NUM_REGIONS = 9 /* Fixed user ABI, region indexes >=9 use */
> >                            /* device specific cap to define content. */
> > 
> > 
> > which limits me in where I can add new indexes, I cannot just add new _WC
> > indexes to that enum, can I? I cannot see any existing regions above 9 yet
> > though.
> 
> The comment explains how to do this, you'd add a device specific region
> with the type identifying it as a PCI MMIO WC region and the sub-type
> probably defining the BAR index.
> 
> > >  - additional file descriptors provided for wc access  
> > 
> > It could be a capability + iocti(VFIO_DEVICE_GET_WC_RESOURCE) which would
> > take a BAR index, check if the BAR is prefetchable and if so - return an fd
> > which the userspace then could mmap(). This is won't break that ABI with 9
> > regions but it is the least favourable in the list...
> 
> Do the kernel mechanics require it to be a separate file descriptor?  A
> separate fd is my last choice as well, but the interfaces your were
> attempting to use previously seemed to have fd granularity.
> 
> > > This isn't at the top of my priority list to figure out the solution,
> > > so whoever implements it will need to provide justification as they
> > > move down the list from more to less preferred solutions.  Thanks,  
> > 
> > I am trying... I was really counting on you guys having this discussed in
> > Prague :(
> 
> Should have been there to push your agenda...  Thanks,

We discussed it briefly, BenH seemed to think there wasn't a big
difficulty, IIRC, which is why we didn't spend much time on this
(compared to the other issues).  So, talk to him.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC PATCH kernel] vfio-pci: Allow write combining
  2017-11-30  4:20                               ` David Gibson
@ 2017-11-30 20:06                                 ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 23+ messages in thread
From: Benjamin Herrenschmidt @ 2017-11-30 20:06 UTC (permalink / raw)
  To: David Gibson, Alex Williamson; +Cc: Alexey Kardashevskiy, kvm, Eric Auger

On Thu, 2017-11-30 at 15:20 +1100, David Gibson wrote:
> > > > This isn't at the top of my priority list to figure out the solution,
> > > > so whoever implements it will need to provide justification as they
> > > > move down the list from more to less preferred solutions.  Thanks,  
> > > 
> > > I am trying... I was really counting on you guys having this discussed in
> > > Prague :(
> > 
> > Should have been there to push your agenda...  Thanks,
> 
> We discussed it briefly, BenH seemed to think there wasn't a big
> difficulty, IIRC, which is why we didn't spend much time on this
> (compared to the other issues).  So, talk to him.

Well, first we established that this wasn't an issue for KVM, so the
importance/urgency went down. It's still useful for DPDK.

Then we discussed quickly the various options, none of them is
particularily *difficult* as in the implementation is rather trivial,
the question is to chose which interface to provide userspace.

I don't have a strong opinion there. The mm guys might object to
hijacking MAP_NONBLOCK, otherwise that seems like the best approach, so
we should run that past them.

Cheers,
Ben.

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2017-11-30 20:06 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-10-09  2:50 [RFC PATCH kernel] vfio-pci: Allow write combining Alexey Kardashevskiy
2017-10-10 21:55 ` Alex Williamson
2017-10-11  2:05   ` Alexey Kardashevskiy
2017-10-11  2:42     ` Alex Williamson
2017-10-11  2:56       ` Alexey Kardashevskiy
2017-10-11 15:35         ` Benjamin Herrenschmidt
2017-10-16  5:54           ` Alexey Kardashevskiy
2017-10-16  6:00             ` David Gibson
2017-10-16  7:36               ` Alexey Kardashevskiy
2017-10-16  8:01                 ` David Gibson
2017-11-06  5:44                   ` Alexey Kardashevskiy
2017-11-14  2:23                     ` David Gibson
2017-11-14  2:29                       ` Benjamin Herrenschmidt
2017-11-14 16:28                         ` Alex Williamson
2017-11-24  4:58                           ` Alexey Kardashevskiy
2017-11-29 18:47                             ` Alex Williamson
2017-11-30  4:20                               ` David Gibson
2017-11-30 20:06                                 ` Benjamin Herrenschmidt
2017-10-16  8:38                 ` Benjamin Herrenschmidt
2017-10-16 11:11                   ` Alexey Kardashevskiy
2017-10-18  7:33                     ` Benjamin Herrenschmidt
2017-10-18  9:00                       ` Alexey Kardashevskiy
2017-10-18 14:21                         ` Benjamin Herrenschmidt

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.