LinuxPPC-Dev Archive on lore.kernel.org
 help / Atom feed
* [PATCH kernel] powerpc/powernv/ioda: Store correct amount of memory used for table
@ 2019-02-11  7:48 Alexey Kardashevskiy
  2019-02-12  0:20 ` David Gibson
  0 siblings, 1 reply; 4+ messages in thread
From: Alexey Kardashevskiy @ 2019-02-11  7:48 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Alexey Kardashevskiy, David Gibson

We store 2 multilevel tables in iommu_table - one for the hardware and
one with the corresponding userspace addresses. Before allocating
the tables, the iommu_table_group_ops::get_table_size() hook returns
the combined size of the two and VFIO SPAPR TCE IOMMU driver adjusts
the locked_vm counter correctly. When the table is actually allocated,
the amount of allocated memory is stored in iommu_table::it_allocated_size
and used to adjust the locked_vm counter when we release the memory used
by the table; .get_table_size() and .create_table() calculate it
independently but the result is expected to be the same.

Unfortunately the allocator does not add the userspace table size to
::it_allocated_size so when we destroy the table because of VFIO PCI
unplug (i.e. VFIO container is gone but the userspace keeps running),
we decrement locked_vm by just a half of size of memory we are releasing.
As the result, we leak locked_vm and may not be able to allocate more
IOMMU tables after few iterations of hotplug/unplug.

This adjusts it_allocated_size if the userspace addresses table was
requested (total_allocated_uas is initialized by zero).

Fixes: 090bad39b "powerpc/powernv: Add indirect levels to it_userspace"
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 arch/powerpc/platforms/powernv/pci-ioda-tce.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda-tce.c b/arch/powerpc/platforms/powernv/pci-ioda-tce.c
index 697449a..58146e1 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda-tce.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda-tce.c
@@ -313,7 +313,7 @@ long pnv_pci_ioda2_table_alloc_pages(int nid, __u64 bus_offset,
 			page_shift);
 	tbl->it_level_size = 1ULL << (level_shift - 3);
 	tbl->it_indirect_levels = levels - 1;
-	tbl->it_allocated_size = total_allocated;
+	tbl->it_allocated_size = total_allocated + total_allocated_uas;
 	tbl->it_userspace = uas;
 	tbl->it_nid = nid;
 
-- 
2.17.1


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH kernel] powerpc/powernv/ioda: Store correct amount of memory used for table
  2019-02-11  7:48 [PATCH kernel] powerpc/powernv/ioda: Store correct amount of memory used for table Alexey Kardashevskiy
@ 2019-02-12  0:20 ` David Gibson
  2019-02-12  7:33   ` Alexey Kardashevskiy
  0 siblings, 1 reply; 4+ messages in thread
From: David Gibson @ 2019-02-12  0:20 UTC (permalink / raw)
  To: Alexey Kardashevskiy; +Cc: linuxppc-dev

[-- Attachment #1: Type: text/plain, Size: 2502 bytes --]

On Mon, Feb 11, 2019 at 06:48:01PM +1100, Alexey Kardashevskiy wrote:
> We store 2 multilevel tables in iommu_table - one for the hardware and
> one with the corresponding userspace addresses. Before allocating
> the tables, the iommu_table_group_ops::get_table_size() hook returns
> the combined size of the two and VFIO SPAPR TCE IOMMU driver adjusts
> the locked_vm counter correctly. When the table is actually allocated,
> the amount of allocated memory is stored in iommu_table::it_allocated_size
> and used to adjust the locked_vm counter when we release the memory used
> by the table; .get_table_size() and .create_table() calculate it
> independently but the result is expected to be the same.

Any way we can remove that redundant calculation?  That seems like
begging for bugs.

> Unfortunately the allocator does not add the userspace table size to
> ::it_allocated_size so when we destroy the table because of VFIO PCI
> unplug (i.e. VFIO container is gone but the userspace keeps running),
> we decrement locked_vm by just a half of size of memory we are releasing.
> As the result, we leak locked_vm and may not be able to allocate more
> IOMMU tables after few iterations of hotplug/unplug.
> 
> This adjusts it_allocated_size if the userspace addresses table was
> requested (total_allocated_uas is initialized by zero).
> 
> Fixes: 090bad39b "powerpc/powernv: Add indirect levels to it_userspace"
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

> ---
>  arch/powerpc/platforms/powernv/pci-ioda-tce.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/platforms/powernv/pci-ioda-tce.c b/arch/powerpc/platforms/powernv/pci-ioda-tce.c
> index 697449a..58146e1 100644
> --- a/arch/powerpc/platforms/powernv/pci-ioda-tce.c
> +++ b/arch/powerpc/platforms/powernv/pci-ioda-tce.c
> @@ -313,7 +313,7 @@ long pnv_pci_ioda2_table_alloc_pages(int nid, __u64 bus_offset,
>  			page_shift);
>  	tbl->it_level_size = 1ULL << (level_shift - 3);
>  	tbl->it_indirect_levels = levels - 1;
> -	tbl->it_allocated_size = total_allocated;
> +	tbl->it_allocated_size = total_allocated + total_allocated_uas;
>  	tbl->it_userspace = uas;
>  	tbl->it_nid = nid;
>  

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH kernel] powerpc/powernv/ioda: Store correct amount of memory used for table
  2019-02-12  0:20 ` David Gibson
@ 2019-02-12  7:33   ` Alexey Kardashevskiy
  2019-02-12 23:57     ` Alexey Kardashevskiy
  0 siblings, 1 reply; 4+ messages in thread
From: Alexey Kardashevskiy @ 2019-02-12  7:33 UTC (permalink / raw)
  To: David Gibson; +Cc: linuxppc-dev



On 12/02/2019 11:20, David Gibson wrote:
> On Mon, Feb 11, 2019 at 06:48:01PM +1100, Alexey Kardashevskiy wrote:
>> We store 2 multilevel tables in iommu_table - one for the hardware and
>> one with the corresponding userspace addresses. Before allocating
>> the tables, the iommu_table_group_ops::get_table_size() hook returns
>> the combined size of the two and VFIO SPAPR TCE IOMMU driver adjusts
>> the locked_vm counter correctly. When the table is actually allocated,
>> the amount of allocated memory is stored in iommu_table::it_allocated_size
>> and used to adjust the locked_vm counter when we release the memory used
>> by the table; .get_table_size() and .create_table() calculate it
>> independently but the result is expected to be the same.
> 
> Any way we can remove that redundant calculation?  That seems like
> begging for bugs.


I do not see an easy way. One way could be adding a "dryrun" flag to
pnv_pci_ioda2_table_alloc_pages(), count allocated memory there and call
it from .get_table_size() but for multilevel TCEs it only allocates
first level...


>> Unfortunately the allocator does not add the userspace table size to
>> ::it_allocated_size so when we destroy the table because of VFIO PCI
>> unplug (i.e. VFIO container is gone but the userspace keeps running),
>> we decrement locked_vm by just a half of size of memory we are releasing.
>> As the result, we leak locked_vm and may not be able to allocate more
>> IOMMU tables after few iterations of hotplug/unplug.
>>
>> This adjusts it_allocated_size if the userspace addresses table was
>> requested (total_allocated_uas is initialized by zero).
>>
>> Fixes: 090bad39b "powerpc/powernv: Add indirect levels to it_userspace"
>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> 
> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
> 
>> ---
>>  arch/powerpc/platforms/powernv/pci-ioda-tce.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/arch/powerpc/platforms/powernv/pci-ioda-tce.c b/arch/powerpc/platforms/powernv/pci-ioda-tce.c
>> index 697449a..58146e1 100644
>> --- a/arch/powerpc/platforms/powernv/pci-ioda-tce.c
>> +++ b/arch/powerpc/platforms/powernv/pci-ioda-tce.c
>> @@ -313,7 +313,7 @@ long pnv_pci_ioda2_table_alloc_pages(int nid, __u64 bus_offset,
>>  			page_shift);
>>  	tbl->it_level_size = 1ULL << (level_shift - 3);
>>  	tbl->it_indirect_levels = levels - 1;
>> -	tbl->it_allocated_size = total_allocated;
>> +	tbl->it_allocated_size = total_allocated + total_allocated_uas;
>>  	tbl->it_userspace = uas;
>>  	tbl->it_nid = nid;
>>  
> 

-- 
Alexey

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH kernel] powerpc/powernv/ioda: Store correct amount of memory used for table
  2019-02-12  7:33   ` Alexey Kardashevskiy
@ 2019-02-12 23:57     ` Alexey Kardashevskiy
  0 siblings, 0 replies; 4+ messages in thread
From: Alexey Kardashevskiy @ 2019-02-12 23:57 UTC (permalink / raw)
  To: David Gibson; +Cc: linuxppc-dev



On 12/02/2019 18:33, Alexey Kardashevskiy wrote:
> 
> 
> On 12/02/2019 11:20, David Gibson wrote:
>> On Mon, Feb 11, 2019 at 06:48:01PM +1100, Alexey Kardashevskiy wrote:
>>> We store 2 multilevel tables in iommu_table - one for the hardware and
>>> one with the corresponding userspace addresses. Before allocating
>>> the tables, the iommu_table_group_ops::get_table_size() hook returns
>>> the combined size of the two and VFIO SPAPR TCE IOMMU driver adjusts
>>> the locked_vm counter correctly. When the table is actually allocated,
>>> the amount of allocated memory is stored in iommu_table::it_allocated_size
>>> and used to adjust the locked_vm counter when we release the memory used
>>> by the table; .get_table_size() and .create_table() calculate it
>>> independently but the result is expected to be the same.
>>
>> Any way we can remove that redundant calculation?  That seems like
>> begging for bugs.
> 
> 
> I do not see an easy way. One way could be adding a "dryrun" flag to
> pnv_pci_ioda2_table_alloc_pages(), count allocated memory there and call
> it from .get_table_size() but for multilevel TCEs it only allocates
> first level...


byyyyyy the way even this it_allocated_size is buggy as it is what was
actually allocated so it does not include any indirect levels which
might be allocated later although it should.

I'll simply make it tbl->it_allocated_size=.get_table_size() or just
ditch it_allocated_size, let me see...



> 
> 
>>> Unfortunately the allocator does not add the userspace table size to
>>> ::it_allocated_size so when we destroy the table because of VFIO PCI
>>> unplug (i.e. VFIO container is gone but the userspace keeps running),
>>> we decrement locked_vm by just a half of size of memory we are releasing.
>>> As the result, we leak locked_vm and may not be able to allocate more
>>> IOMMU tables after few iterations of hotplug/unplug.
>>>
>>> This adjusts it_allocated_size if the userspace addresses table was
>>> requested (total_allocated_uas is initialized by zero).
>>>
>>> Fixes: 090bad39b "powerpc/powernv: Add indirect levels to it_userspace"
>>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>>
>> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
>>
>>> ---
>>>  arch/powerpc/platforms/powernv/pci-ioda-tce.c | 2 +-
>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/arch/powerpc/platforms/powernv/pci-ioda-tce.c b/arch/powerpc/platforms/powernv/pci-ioda-tce.c
>>> index 697449a..58146e1 100644
>>> --- a/arch/powerpc/platforms/powernv/pci-ioda-tce.c
>>> +++ b/arch/powerpc/platforms/powernv/pci-ioda-tce.c
>>> @@ -313,7 +313,7 @@ long pnv_pci_ioda2_table_alloc_pages(int nid, __u64 bus_offset,
>>>  			page_shift);
>>>  	tbl->it_level_size = 1ULL << (level_shift - 3);
>>>  	tbl->it_indirect_levels = levels - 1;
>>> -	tbl->it_allocated_size = total_allocated;
>>> +	tbl->it_allocated_size = total_allocated + total_allocated_uas;
>>>  	tbl->it_userspace = uas;
>>>  	tbl->it_nid = nid;
>>>  
>>
> 

-- 
Alexey

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, back to index

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-02-11  7:48 [PATCH kernel] powerpc/powernv/ioda: Store correct amount of memory used for table Alexey Kardashevskiy
2019-02-12  0:20 ` David Gibson
2019-02-12  7:33   ` Alexey Kardashevskiy
2019-02-12 23:57     ` Alexey Kardashevskiy

LinuxPPC-Dev Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linuxppc-dev/0 linuxppc-dev/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linuxppc-dev linuxppc-dev/ https://lore.kernel.org/linuxppc-dev \
		linuxppc-dev@lists.ozlabs.org linuxppc-dev@ozlabs.org linuxppc-dev@archiver.kernel.org
	public-inbox-index linuxppc-dev


Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.ozlabs.lists.linuxppc-dev


AGPL code for this site: git clone https://public-inbox.org/ public-inbox