All of lore.kernel.org
 help / color / mirror / Atom feed
From: Alexey Kardashevskiy <aik@ozlabs.ru>
To: Gavin Shan <gwshan@linux.vnet.ibm.com>
Cc: linuxppc-dev@lists.ozlabs.org, linux-pci@vger.kernel.org,
	benh@kernel.crashing.org, bhelgaas@google.com
Subject: Re: [PATCH v4 07/21] powerpc/powernv: Release PEs dynamically
Date: Tue, 12 May 2015 10:53:29 +1000	[thread overview]
Message-ID: <55514F09.4040100@ozlabs.ru> (raw)
In-Reply-To: <20150512000308.GC4646@gwshan>

On 05/12/2015 10:03 AM, Gavin Shan wrote:
> On Mon, May 11, 2015 at 05:02:08PM +1000, Alexey Kardashevskiy wrote:
>> On 05/11/2015 04:25 PM, Gavin Shan wrote:
>>> On Sat, May 09, 2015 at 10:43:23PM +1000, Alexey Kardashevskiy wrote:
>>>> On 05/01/2015 04:02 PM, Gavin Shan wrote:
>>>>> The original code doesn't support releasing PEs dynamically, meaning
>>>>> that PE and the associated resources (IO, M32, M64 and DMA) can't
>>>>> be released when unplugging a PCI adapter from one hotpluggable slot.
>>>>>
>>>>> The patch takes object oriented methodology, introducs reference
>>>>> count to PE, which is initialized to 1 and increased with 1 when a
>>>>> new PCI device joins the PE. Once the last PCI device leaves the
>>>>> PE, the PE is going to be release together with its associated
>>>>> (IO, M32, M64, DMA) resources.
>>>>
>>>>
>>>> Too little commit log for non-trivial non-cut-n-paste 30KB patch...
>>>>
>>>
>>> Ok. I'll add more details in next revision.
>>>
>>>>>
>>>>> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>>>>> ---
>>>>>   arch/powerpc/include/asm/pci-bridge.h     |   3 +
>>>>>   arch/powerpc/kernel/pci-hotplug.c         |   5 +
>>>>>   arch/powerpc/platforms/powernv/pci-ioda.c | 658 +++++++++++++++++++-----------
>>>>>   arch/powerpc/platforms/powernv/pci.h      |   4 +-
>>>>>   4 files changed, 432 insertions(+), 238 deletions(-)
>>>>>
>>>>> diff --git a/arch/powerpc/include/asm/pci-bridge.h b/arch/powerpc/include/asm/pci-bridge.h
>>>>> index 5367eb3..a6ad4b1 100644
>>>>> --- a/arch/powerpc/include/asm/pci-bridge.h
>>>>> +++ b/arch/powerpc/include/asm/pci-bridge.h
>>>>> @@ -31,6 +31,9 @@ struct pci_controller_ops {
>>>>>   	resource_size_t (*window_alignment)(struct pci_bus *, unsigned long type);
>>>>>   	void		(*setup_bridge)(struct pci_bus *, unsigned long);
>>>>>   	void		(*reset_secondary_bus)(struct pci_dev *dev);
>>>>> +
>>>>> +	/* Called when PCI device is released */
>>>>> +	void		(*release_device)(struct pci_dev *);
>>>>>   };
>>>>>
>>>>>   /*
>>>>> diff --git a/arch/powerpc/kernel/pci-hotplug.c b/arch/powerpc/kernel/pci-hotplug.c
>>>>> index 7ed85a6..0040343 100644
>>>>> --- a/arch/powerpc/kernel/pci-hotplug.c
>>>>> +++ b/arch/powerpc/kernel/pci-hotplug.c
>>>>> @@ -29,6 +29,11 @@
>>>>>    */
>>>>>   void pcibios_release_device(struct pci_dev *dev)
>>>>>   {
>>>>> +	struct pci_controller *hose = pci_bus_to_host(dev->bus);
>>>>> +
>>>>> +	if (hose->controller_ops.release_device)
>>>>> +		hose->controller_ops.release_device(dev);
>>>>> +
>>>>>   	eeh_remove_device(dev);
>>>>>   }
>>>>>
>>>>> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
>>>>> index 910fb67..ef8c216 100644
>>>>> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
>>>>> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
>>>>> @@ -12,6 +12,8 @@
>>>>>   #undef DEBUG
>>>>>
>>>>>   #include <linux/kernel.h>
>>>>> +#include <linux/atomic.h>
>>>>> +#include <linux/kref.h>
>>>>>   #include <linux/pci.h>
>>>>>   #include <linux/crash_dump.h>
>>>>>   #include <linux/debugfs.h>
>>>>> @@ -47,6 +49,8 @@
>>>>>   /* 256M DMA window, 4K TCE pages, 8 bytes TCE */
>>>>>   #define TCE32_TABLE_SIZE	((0x10000000 / 0x1000) * 8)
>>>>>
>>>>> +static void pnv_ioda_release_pe(struct kref *kref);
>>>>> +
>>>>>   static void pe_level_printk(const struct pnv_ioda_pe *pe, const char *level,
>>>>>   			    const char *fmt, ...)
>>>>>   {
>>>>> @@ -123,25 +127,400 @@ static inline bool pnv_pci_is_mem_pref_64(unsigned long flags)
>>>>>   		(IORESOURCE_MEM_64 | IORESOURCE_PREFETCH));
>>>>>   }
>>>>>
>>>>> -static void pnv_ioda_reserve_pe(struct pnv_phb *phb, int pe_no)
>>>>> +static inline void pnv_ioda_pe_get(struct pnv_ioda_pe *pe)
>>>>>   {
>>>>> -	if (!(pe_no >= 0 && pe_no < phb->ioda.total_pe)) {
>>>>> -		pr_warn("%s: Invalid PE %d on PHB#%x\n",
>>>>> -			__func__, pe_no, phb->hose->global_number);
>>>>> +	if (!pe)
>>>>> +		return;
>>>>> +
>>>>> +	kref_get(&pe->kref);
>>>>> +}
>>>>> +
>>>>> +static inline void pnv_ioda_pe_put(struct pnv_ioda_pe *pe)
>>>>> +{
>>>>> +	unsigned int count;
>>>>> +
>>>>> +	if (!pe)
>>>>>   		return;
>>>>> +
>>>>> +	/*
>>>>> +	 * The count is initialized to 1 and increased with 1 when
>>>>> +	 * a new PCI device is bound with the PE. Once the last PCI
>>>>> +	 * device is leaving from the PE, the PE is going to be
>>>>> +	 * released.
>>>>> +	 */
>>>>> +	count = atomic_read(&pe->kref.refcount);
>>>>> +	if (count == 2)
>>>>> +		kref_sub(&pe->kref, 2, pnv_ioda_release_pe);
>>>>> +	else
>>>>> +		kref_put(&pe->kref, pnv_ioda_release_pe);
>>>>
>>>>
>>>> What if pnv_ioda_pe_get() gets called between atomic_read() and kref_sub()?
>>>>
>>>
>>> Yeah, that would have problem. But it shouldn't happen because the
>>> PCI devices are joining the parent PE# in strictly serialized mode.
>>> Same thing happens when detaching PCI devices from its parent PE.
>>
>>
>> oookay. Another thing then - why is this kref counter initialized to 1?
>> It would make sense if you did something special when the counter becomes 1
>> after decrement but you do not.
>>
>> Also, this kref thing makes sense if you do kref_put() in multiple places and
>> do not know which one will be the last one so you pass the callback to all of
>> them. Here you do kref_put/sub in one place and you read the counter - so you
>> can call pnv_ioda_release_pe() directly. And it feels like a simple atomic_t
>> would do the job just fine. If you still feel that the counter should start
>>from 1, there are atomic_dec_if_positive() and atomic_inc_not_zero() and
>> others.
>>
>
> It's good question actually. The counter is initialized to 1 when the PE
> is reserved because of M64 requirement or allocated for non-M64 case. If
> we reserve or allocate PE#, there is one thing for sure: the PCI bus has
> one PCI device (including PCI bridge) at least. After the PE# is reserved
> or allocated, the PCI device joins the PE with the result of increasing
> the counter with 1. It means the counter is 2 when PE contains one PCI
> device, and 3 when there're 2 devices. One reason for this design is that
> we just need decrease the counter if we have to release this PE in the
> window between PE reservation/allocation and first PCI device joins. I
> think you're correct that we can call pnv_ioda_release_pe() in this window.
> In this way, the counter is always reflecting the number of PCI devices
> the PE contains.


Good :) I believe it was something different 2-3 versions ago and evolved 
to this so you do not notice it straight away :)


>
>>>>> +}
>>>>> +
>>>>> +static void pnv_pci_release_device(struct pci_dev *pdev)
>>>>> +{
>>>>> +	struct pci_controller *hose = pci_bus_to_host(pdev->bus);
>>>>> +	struct pnv_phb *phb = hose->private_data;
>>>>> +	struct pci_dn *pdn = pci_get_pdn(pdev);
>>>>> +	struct pnv_ioda_pe *pe;
>>>>> +
>>>>> +	if (pdn && pdn->pe_number != IODA_INVALID_PE) {
>>>>> +		pe = &phb->ioda.pe_array[pdn->pe_number];
>>>>> +		pnv_ioda_pe_put(pe);
>>>>> +		pdn->pe_number = IODA_INVALID_PE;
>>>>>   	}
>>>>> +}
>>>>>
>>>>> -	if (test_and_set_bit(pe_no, phb->ioda.pe_alloc)) {
>>>>> -		pr_warn("%s: PE %d was assigned on PHB#%x\n",
>>>>> -			__func__, pe_no, phb->hose->global_number);
>>>>> +static void pnv_ioda_release_pe_dma(struct pnv_ioda_pe *pe)
>>>>> +{
>>>>> +	struct pnv_phb *phb = pe->phb;
>>>>> +	int index, count;
>>>>> +	unsigned long tbl_addr, tbl_size;
>>>>> +
>>>>> +	/* No DMA capability for slave PEs */
>>>>> +	if (pe->flags & PNV_IODA_PE_SLAVE)
>>>>> +		return;
>>>>> +
>>>>> +	/* Bypass DMA window */
>>>>> +	if (phb->type == PNV_PHB_IODA2 &&
>>>>> +	    pe->tce_bypass_enabled &&
>>>>> +	    pe->tce32_table &&
>>>>> +	    pe->tce32_table->set_bypass)
>>>>> +		pe->tce32_table->set_bypass(pe->tce32_table, false);
>>>>> +
>>>>> +	/* 32-bits DMA window */
>>>>> +	count = pe->tce32_seg_end - pe->tce32_seg_start;
>>>>> +	tbl_addr = pe->tce32_table->it_base;
>>>>> +	if (!count)
>>>>>   		return;
>>>>> +
>>>>> +	/* Free IOMMU table */
>>>>> +	iommu_free_table(pe->tce32_table,
>>>>> +			 of_node_full_name(phb->hose->dn));
>>>>> +
>>>>> +	/* Deconfigure TCE table */
>>>>> +	switch (phb->type) {
>>>>> +	case PNV_PHB_IODA1:
>>>>> +		for (index = 0; index < count; index++)
>>>>> +			opal_pci_map_pe_dma_window(phb->opal_id,
>>>>> +						   pe->pe_number,
>>>>> +						   pe->tce32_seg_start + index,
>>>>> +						   1,
>>>>> +						   __pa(tbl_addr) +
>>>>> +						   index * TCE32_TABLE_SIZE,
>>>>> +						   0,
>>>>> +						   0x1000);
>>>>> +		bitmap_clear(phb->ioda.tce32_segmap,
>>>>> +			     pe->tce32_seg_start,
>>>>> +			     count);
>>>>> +		tbl_size = TCE32_TABLE_SIZE * count;
>>>>> +		break;
>>>>> +	case PNV_PHB_IODA2:
>>>>> +		opal_pci_map_pe_dma_window(phb->opal_id,
>>>>> +					   pe->pe_number,
>>>>> +					   pe->pe_number << 1,
>>>>> +					   1,
>>>>> +					   __pa(tbl_addr),
>>>>> +					   0,
>>>>> +					   0x1000);
>>>>> +		tbl_size = (1ul << ilog2(phb->ioda.m32_pci_base));
>>>>> +		tbl_size = (tbl_size >> IOMMU_PAGE_SHIFT_4K) * 8;
>>>>> +		break;
>>>>> +	default:
>>>>> +		pe_warn(pe, "Unsupported PHB type %d\n", phb->type);
>>>>> +		return;
>>>>> +	}
>>>>> +
>>>>> +	/* Free memory of IOMMU table */
>>>>> +	free_pages(tbl_addr, get_order(tbl_size));
>>>>
>>>>
>>>> You just programmed the table address to TVT and then you are releasing the
>>>> pages. It does not seem right, it will leave garbage in TVT. Also, I am
>>>> adding helpers to alloc/free TCE pages in DDW patchset, you could reuse bits
>>> >from there (I'll post v10 soon, you'll be in copy and you'll have to review
>>>> that ;) ).
>>>>
>>>
>>> I assume you're talking about TVE. I don't understand how garbage will be left
>>> in TVE. opal_pci_map_pe_dma_window(), which is handled by skiboot, clear TVE
>>> with zero'ed "tce_table_size". The pages previously allocated for TCE table is
>>> released to buddy system, which can be allocated by somebody else (from buddy
>>> or slab).
>>
>> opal_pci_map_pe_dma_window() takes __pa(tbl_addr) which points to some memory
>> which is still allocated. This value goes to a table (which has 2 entries per
>> PE, one for 32bit DMA window and one for bypass/hugewindow) which PHB uses to
>> get the actual TCE table address. What is the name of this table? :) Anyway,
>> you write an address there and then you call free_pages() so after
>> free_pages(), the value in that TVE/TVT/whatever table is a garbage.
>>
>
> I don't look into your DDW code yet. Before we have DDW patchset, the bypass
> TVE (window) isn't supposed to have corresponding TCE table. I guess you might
> change the behaviour in your DDW patchset and I'll take a close look on that.
> For DMA32 window, which is the name of the table, the TVE is cleared by skiboot
> when having zero "tce_table_size" argument.
>
> 	opal_pci_map_pe_dma_window(phb->opal_id,
> 				   pe->pe_number,
> 				   pe->pe_number << 1,
> 				   1,
> 				   __pa(tbl_addr),
> 				   0,			<<<< "tce_table_size".
> 				   0x1000);


Then please, when you pass tce_table_size==0, also pass zero address/zero 
page size/zero levels, unless you have very good reason to pass non-zero 
values for these. What you have now is confusing - it looks like you are 
initializing the table - it is not obvious that "0" is the size and not 
some flags.

When people see this (which does the same thing, please correct me if I am 
wrong), they do not have questions what you are actually trying to do:

  	opal_pci_map_pe_dma_window(phb->opal_id, pe->pe_number,
  				   pe->pe_number << 1, 0, 0, 0, 0);



-- 
Alexey

WARNING: multiple messages have this Message-ID (diff)
From: Alexey Kardashevskiy <aik@ozlabs.ru>
To: Gavin Shan <gwshan@linux.vnet.ibm.com>
Cc: bhelgaas@google.com, linux-pci@vger.kernel.org,
	linuxppc-dev@lists.ozlabs.org
Subject: Re: [PATCH v4 07/21] powerpc/powernv: Release PEs dynamically
Date: Tue, 12 May 2015 10:53:29 +1000	[thread overview]
Message-ID: <55514F09.4040100@ozlabs.ru> (raw)
In-Reply-To: <20150512000308.GC4646@gwshan>

On 05/12/2015 10:03 AM, Gavin Shan wrote:
> On Mon, May 11, 2015 at 05:02:08PM +1000, Alexey Kardashevskiy wrote:
>> On 05/11/2015 04:25 PM, Gavin Shan wrote:
>>> On Sat, May 09, 2015 at 10:43:23PM +1000, Alexey Kardashevskiy wrote:
>>>> On 05/01/2015 04:02 PM, Gavin Shan wrote:
>>>>> The original code doesn't support releasing PEs dynamically, meaning
>>>>> that PE and the associated resources (IO, M32, M64 and DMA) can't
>>>>> be released when unplugging a PCI adapter from one hotpluggable slot.
>>>>>
>>>>> The patch takes object oriented methodology, introducs reference
>>>>> count to PE, which is initialized to 1 and increased with 1 when a
>>>>> new PCI device joins the PE. Once the last PCI device leaves the
>>>>> PE, the PE is going to be release together with its associated
>>>>> (IO, M32, M64, DMA) resources.
>>>>
>>>>
>>>> Too little commit log for non-trivial non-cut-n-paste 30KB patch...
>>>>
>>>
>>> Ok. I'll add more details in next revision.
>>>
>>>>>
>>>>> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>>>>> ---
>>>>>   arch/powerpc/include/asm/pci-bridge.h     |   3 +
>>>>>   arch/powerpc/kernel/pci-hotplug.c         |   5 +
>>>>>   arch/powerpc/platforms/powernv/pci-ioda.c | 658 +++++++++++++++++++-----------
>>>>>   arch/powerpc/platforms/powernv/pci.h      |   4 +-
>>>>>   4 files changed, 432 insertions(+), 238 deletions(-)
>>>>>
>>>>> diff --git a/arch/powerpc/include/asm/pci-bridge.h b/arch/powerpc/include/asm/pci-bridge.h
>>>>> index 5367eb3..a6ad4b1 100644
>>>>> --- a/arch/powerpc/include/asm/pci-bridge.h
>>>>> +++ b/arch/powerpc/include/asm/pci-bridge.h
>>>>> @@ -31,6 +31,9 @@ struct pci_controller_ops {
>>>>>   	resource_size_t (*window_alignment)(struct pci_bus *, unsigned long type);
>>>>>   	void		(*setup_bridge)(struct pci_bus *, unsigned long);
>>>>>   	void		(*reset_secondary_bus)(struct pci_dev *dev);
>>>>> +
>>>>> +	/* Called when PCI device is released */
>>>>> +	void		(*release_device)(struct pci_dev *);
>>>>>   };
>>>>>
>>>>>   /*
>>>>> diff --git a/arch/powerpc/kernel/pci-hotplug.c b/arch/powerpc/kernel/pci-hotplug.c
>>>>> index 7ed85a6..0040343 100644
>>>>> --- a/arch/powerpc/kernel/pci-hotplug.c
>>>>> +++ b/arch/powerpc/kernel/pci-hotplug.c
>>>>> @@ -29,6 +29,11 @@
>>>>>    */
>>>>>   void pcibios_release_device(struct pci_dev *dev)
>>>>>   {
>>>>> +	struct pci_controller *hose = pci_bus_to_host(dev->bus);
>>>>> +
>>>>> +	if (hose->controller_ops.release_device)
>>>>> +		hose->controller_ops.release_device(dev);
>>>>> +
>>>>>   	eeh_remove_device(dev);
>>>>>   }
>>>>>
>>>>> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
>>>>> index 910fb67..ef8c216 100644
>>>>> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
>>>>> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
>>>>> @@ -12,6 +12,8 @@
>>>>>   #undef DEBUG
>>>>>
>>>>>   #include <linux/kernel.h>
>>>>> +#include <linux/atomic.h>
>>>>> +#include <linux/kref.h>
>>>>>   #include <linux/pci.h>
>>>>>   #include <linux/crash_dump.h>
>>>>>   #include <linux/debugfs.h>
>>>>> @@ -47,6 +49,8 @@
>>>>>   /* 256M DMA window, 4K TCE pages, 8 bytes TCE */
>>>>>   #define TCE32_TABLE_SIZE	((0x10000000 / 0x1000) * 8)
>>>>>
>>>>> +static void pnv_ioda_release_pe(struct kref *kref);
>>>>> +
>>>>>   static void pe_level_printk(const struct pnv_ioda_pe *pe, const char *level,
>>>>>   			    const char *fmt, ...)
>>>>>   {
>>>>> @@ -123,25 +127,400 @@ static inline bool pnv_pci_is_mem_pref_64(unsigned long flags)
>>>>>   		(IORESOURCE_MEM_64 | IORESOURCE_PREFETCH));
>>>>>   }
>>>>>
>>>>> -static void pnv_ioda_reserve_pe(struct pnv_phb *phb, int pe_no)
>>>>> +static inline void pnv_ioda_pe_get(struct pnv_ioda_pe *pe)
>>>>>   {
>>>>> -	if (!(pe_no >= 0 && pe_no < phb->ioda.total_pe)) {
>>>>> -		pr_warn("%s: Invalid PE %d on PHB#%x\n",
>>>>> -			__func__, pe_no, phb->hose->global_number);
>>>>> +	if (!pe)
>>>>> +		return;
>>>>> +
>>>>> +	kref_get(&pe->kref);
>>>>> +}
>>>>> +
>>>>> +static inline void pnv_ioda_pe_put(struct pnv_ioda_pe *pe)
>>>>> +{
>>>>> +	unsigned int count;
>>>>> +
>>>>> +	if (!pe)
>>>>>   		return;
>>>>> +
>>>>> +	/*
>>>>> +	 * The count is initialized to 1 and increased with 1 when
>>>>> +	 * a new PCI device is bound with the PE. Once the last PCI
>>>>> +	 * device is leaving from the PE, the PE is going to be
>>>>> +	 * released.
>>>>> +	 */
>>>>> +	count = atomic_read(&pe->kref.refcount);
>>>>> +	if (count == 2)
>>>>> +		kref_sub(&pe->kref, 2, pnv_ioda_release_pe);
>>>>> +	else
>>>>> +		kref_put(&pe->kref, pnv_ioda_release_pe);
>>>>
>>>>
>>>> What if pnv_ioda_pe_get() gets called between atomic_read() and kref_sub()?
>>>>
>>>
>>> Yeah, that would have problem. But it shouldn't happen because the
>>> PCI devices are joining the parent PE# in strictly serialized mode.
>>> Same thing happens when detaching PCI devices from its parent PE.
>>
>>
>> oookay. Another thing then - why is this kref counter initialized to 1?
>> It would make sense if you did something special when the counter becomes 1
>> after decrement but you do not.
>>
>> Also, this kref thing makes sense if you do kref_put() in multiple places and
>> do not know which one will be the last one so you pass the callback to all of
>> them. Here you do kref_put/sub in one place and you read the counter - so you
>> can call pnv_ioda_release_pe() directly. And it feels like a simple atomic_t
>> would do the job just fine. If you still feel that the counter should start
>>from 1, there are atomic_dec_if_positive() and atomic_inc_not_zero() and
>> others.
>>
>
> It's good question actually. The counter is initialized to 1 when the PE
> is reserved because of M64 requirement or allocated for non-M64 case. If
> we reserve or allocate PE#, there is one thing for sure: the PCI bus has
> one PCI device (including PCI bridge) at least. After the PE# is reserved
> or allocated, the PCI device joins the PE with the result of increasing
> the counter with 1. It means the counter is 2 when PE contains one PCI
> device, and 3 when there're 2 devices. One reason for this design is that
> we just need decrease the counter if we have to release this PE in the
> window between PE reservation/allocation and first PCI device joins. I
> think you're correct that we can call pnv_ioda_release_pe() in this window.
> In this way, the counter is always reflecting the number of PCI devices
> the PE contains.


Good :) I believe it was something different 2-3 versions ago and evolved 
to this so you do not notice it straight away :)


>
>>>>> +}
>>>>> +
>>>>> +static void pnv_pci_release_device(struct pci_dev *pdev)
>>>>> +{
>>>>> +	struct pci_controller *hose = pci_bus_to_host(pdev->bus);
>>>>> +	struct pnv_phb *phb = hose->private_data;
>>>>> +	struct pci_dn *pdn = pci_get_pdn(pdev);
>>>>> +	struct pnv_ioda_pe *pe;
>>>>> +
>>>>> +	if (pdn && pdn->pe_number != IODA_INVALID_PE) {
>>>>> +		pe = &phb->ioda.pe_array[pdn->pe_number];
>>>>> +		pnv_ioda_pe_put(pe);
>>>>> +		pdn->pe_number = IODA_INVALID_PE;
>>>>>   	}
>>>>> +}
>>>>>
>>>>> -	if (test_and_set_bit(pe_no, phb->ioda.pe_alloc)) {
>>>>> -		pr_warn("%s: PE %d was assigned on PHB#%x\n",
>>>>> -			__func__, pe_no, phb->hose->global_number);
>>>>> +static void pnv_ioda_release_pe_dma(struct pnv_ioda_pe *pe)
>>>>> +{
>>>>> +	struct pnv_phb *phb = pe->phb;
>>>>> +	int index, count;
>>>>> +	unsigned long tbl_addr, tbl_size;
>>>>> +
>>>>> +	/* No DMA capability for slave PEs */
>>>>> +	if (pe->flags & PNV_IODA_PE_SLAVE)
>>>>> +		return;
>>>>> +
>>>>> +	/* Bypass DMA window */
>>>>> +	if (phb->type == PNV_PHB_IODA2 &&
>>>>> +	    pe->tce_bypass_enabled &&
>>>>> +	    pe->tce32_table &&
>>>>> +	    pe->tce32_table->set_bypass)
>>>>> +		pe->tce32_table->set_bypass(pe->tce32_table, false);
>>>>> +
>>>>> +	/* 32-bits DMA window */
>>>>> +	count = pe->tce32_seg_end - pe->tce32_seg_start;
>>>>> +	tbl_addr = pe->tce32_table->it_base;
>>>>> +	if (!count)
>>>>>   		return;
>>>>> +
>>>>> +	/* Free IOMMU table */
>>>>> +	iommu_free_table(pe->tce32_table,
>>>>> +			 of_node_full_name(phb->hose->dn));
>>>>> +
>>>>> +	/* Deconfigure TCE table */
>>>>> +	switch (phb->type) {
>>>>> +	case PNV_PHB_IODA1:
>>>>> +		for (index = 0; index < count; index++)
>>>>> +			opal_pci_map_pe_dma_window(phb->opal_id,
>>>>> +						   pe->pe_number,
>>>>> +						   pe->tce32_seg_start + index,
>>>>> +						   1,
>>>>> +						   __pa(tbl_addr) +
>>>>> +						   index * TCE32_TABLE_SIZE,
>>>>> +						   0,
>>>>> +						   0x1000);
>>>>> +		bitmap_clear(phb->ioda.tce32_segmap,
>>>>> +			     pe->tce32_seg_start,
>>>>> +			     count);
>>>>> +		tbl_size = TCE32_TABLE_SIZE * count;
>>>>> +		break;
>>>>> +	case PNV_PHB_IODA2:
>>>>> +		opal_pci_map_pe_dma_window(phb->opal_id,
>>>>> +					   pe->pe_number,
>>>>> +					   pe->pe_number << 1,
>>>>> +					   1,
>>>>> +					   __pa(tbl_addr),
>>>>> +					   0,
>>>>> +					   0x1000);
>>>>> +		tbl_size = (1ul << ilog2(phb->ioda.m32_pci_base));
>>>>> +		tbl_size = (tbl_size >> IOMMU_PAGE_SHIFT_4K) * 8;
>>>>> +		break;
>>>>> +	default:
>>>>> +		pe_warn(pe, "Unsupported PHB type %d\n", phb->type);
>>>>> +		return;
>>>>> +	}
>>>>> +
>>>>> +	/* Free memory of IOMMU table */
>>>>> +	free_pages(tbl_addr, get_order(tbl_size));
>>>>
>>>>
>>>> You just programmed the table address to TVT and then you are releasing the
>>>> pages. It does not seem right, it will leave garbage in TVT. Also, I am
>>>> adding helpers to alloc/free TCE pages in DDW patchset, you could reuse bits
>>> >from there (I'll post v10 soon, you'll be in copy and you'll have to review
>>>> that ;) ).
>>>>
>>>
>>> I assume you're talking about TVE. I don't understand how garbage will be left
>>> in TVE. opal_pci_map_pe_dma_window(), which is handled by skiboot, clear TVE
>>> with zero'ed "tce_table_size". The pages previously allocated for TCE table is
>>> released to buddy system, which can be allocated by somebody else (from buddy
>>> or slab).
>>
>> opal_pci_map_pe_dma_window() takes __pa(tbl_addr) which points to some memory
>> which is still allocated. This value goes to a table (which has 2 entries per
>> PE, one for 32bit DMA window and one for bypass/hugewindow) which PHB uses to
>> get the actual TCE table address. What is the name of this table? :) Anyway,
>> you write an address there and then you call free_pages() so after
>> free_pages(), the value in that TVE/TVT/whatever table is a garbage.
>>
>
> I don't look into your DDW code yet. Before we have DDW patchset, the bypass
> TVE (window) isn't supposed to have corresponding TCE table. I guess you might
> change the behaviour in your DDW patchset and I'll take a close look on that.
> For DMA32 window, which is the name of the table, the TVE is cleared by skiboot
> when having zero "tce_table_size" argument.
>
> 	opal_pci_map_pe_dma_window(phb->opal_id,
> 				   pe->pe_number,
> 				   pe->pe_number << 1,
> 				   1,
> 				   __pa(tbl_addr),
> 				   0,			<<<< "tce_table_size".
> 				   0x1000);


Then please, when you pass tce_table_size==0, also pass zero address/zero 
page size/zero levels, unless you have very good reason to pass non-zero 
values for these. What you have now is confusing - it looks like you are 
initializing the table - it is not obvious that "0" is the size and not 
some flags.

When people see this (which does the same thing, please correct me if I am 
wrong), they do not have questions what you are actually trying to do:

  	opal_pci_map_pe_dma_window(phb->opal_id, pe->pe_number,
  				   pe->pe_number << 1, 0, 0, 0, 0);



-- 
Alexey

  reply	other threads:[~2015-05-12  0:53 UTC|newest]

Thread overview: 184+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-05-01  6:02 [PATCH v4 00/21] PowerPC/PowerNV: PCI Slot Management Gavin Shan
2015-05-01  6:02 ` Gavin Shan
2015-05-01  6:02 ` [PATCH v4 01/21] pci: Add pcibios_setup_bridge() Gavin Shan
2015-05-01  6:02   ` Gavin Shan
2015-05-07 22:12   ` Bjorn Helgaas
2015-05-07 22:12     ` Bjorn Helgaas
2015-05-11  1:59     ` Gavin Shan
2015-05-11  1:59       ` Gavin Shan
2015-05-01  6:02 ` [PATCH v4 02/21] powerpc/powernv: Enable M64 on P7IOC Gavin Shan
2015-05-01  6:02   ` Gavin Shan
2015-05-09  0:18   ` Alexey Kardashevskiy
2015-05-09  0:18     ` Alexey Kardashevskiy
2015-05-11  4:37     ` Gavin Shan
2015-05-11  4:37       ` Gavin Shan
2015-05-01  6:02 ` [PATCH v4 03/21] powerpc/powernv: M64 support improvement Gavin Shan
2015-05-01  6:02   ` Gavin Shan
2015-05-09 10:24   ` Alexey Kardashevskiy
2015-05-09 10:24     ` Alexey Kardashevskiy
2015-05-11  4:47     ` Gavin Shan
2015-05-11  4:47       ` Gavin Shan
2015-05-01  6:02 ` [PATCH v4 04/21] powerpc/powernv: Improve IO and M32 mapping Gavin Shan
2015-05-01  6:02   ` Gavin Shan
2015-05-09 10:53   ` Alexey Kardashevskiy
2015-05-09 10:53     ` Alexey Kardashevskiy
2015-05-11  4:52     ` Gavin Shan
2015-05-11  4:52       ` Gavin Shan
2015-05-01  6:02 ` [PATCH v4 05/21] powerpc/powernv: Improve DMA32 segment assignment Gavin Shan
2015-05-01  6:02   ` Gavin Shan
2015-05-01  6:02 ` [PATCH v4 06/21] powerpc/powernv: Create PEs dynamically Gavin Shan
2015-05-01  6:02   ` Gavin Shan
2015-05-09 11:43   ` Alexey Kardashevskiy
2015-05-09 11:43     ` Alexey Kardashevskiy
2015-05-11  4:55     ` Gavin Shan
2015-05-11  4:55       ` Gavin Shan
2015-05-01  6:02 ` [PATCH v4 07/21] powerpc/powernv: Release " Gavin Shan
2015-05-01  6:02   ` Gavin Shan
2015-05-09 12:43   ` Alexey Kardashevskiy
2015-05-09 12:43     ` Alexey Kardashevskiy
2015-05-11  6:25     ` Gavin Shan
2015-05-11  6:25       ` Gavin Shan
2015-05-11  7:02       ` Alexey Kardashevskiy
2015-05-11  7:02         ` Alexey Kardashevskiy
2015-05-12  0:03         ` Gavin Shan
2015-05-12  0:03           ` Gavin Shan
2015-05-12  0:53           ` Alexey Kardashevskiy [this message]
2015-05-12  0:53             ` Alexey Kardashevskiy
2015-05-12  1:25             ` Gavin Shan
2015-05-12  1:25               ` Gavin Shan
2015-05-01  6:02 ` [PATCH v4 08/21] powerpc/powernv: Drop pnv_ioda_setup_dev_PE() Gavin Shan
2015-05-01  6:02   ` Gavin Shan
2015-05-09 12:45   ` Alexey Kardashevskiy
2015-05-09 12:45     ` Alexey Kardashevskiy
2015-05-01  6:02 ` [PATCH v4 09/21] powerpc/powernv: Use PCI slot reset infrastructure Gavin Shan
2015-05-01  6:02   ` Gavin Shan
2015-05-09 13:41   ` Alexey Kardashevskiy
2015-05-09 13:41     ` Alexey Kardashevskiy
2015-05-11  6:45     ` Gavin Shan
2015-05-11  6:45       ` Gavin Shan
2015-05-11  7:16       ` Alexey Kardashevskiy
2015-05-11  7:16         ` Alexey Kardashevskiy
2015-05-01  6:02 ` [PATCH v4 10/21] powerpc/powernv: Fundamental reset for PCI bus reset Gavin Shan
2015-05-01  6:02   ` Gavin Shan
2015-05-09 14:12   ` Alexey Kardashevskiy
2015-05-09 14:12     ` Alexey Kardashevskiy
2015-05-11  6:47     ` Gavin Shan
2015-05-11  6:47       ` Gavin Shan
2015-05-11  7:17       ` Alexey Kardashevskiy
2015-05-11  7:17         ` Alexey Kardashevskiy
2015-05-12  0:04         ` Gavin Shan
2015-05-12  0:04           ` Gavin Shan
2015-05-01  6:02 ` [PATCH v4 11/21] powerpc/pci: Don't scan empty slot Gavin Shan
2015-05-01  6:02   ` Gavin Shan
2015-05-01  6:02 ` [PATCH v4 12/21] powerpc/pci: Move pcibios_find_pci_bus() around Gavin Shan
2015-05-01  6:02   ` Gavin Shan
2015-05-01  6:03 ` [PATCH v4 13/21] powerpc/powernv: Introduce pnv_pci_poll() Gavin Shan
2015-05-01  6:03   ` Gavin Shan
2015-05-09 14:30   ` Alexey Kardashevskiy
2015-05-09 14:30     ` Alexey Kardashevskiy
2015-05-11  7:19     ` Gavin Shan
2015-05-11  7:19       ` Gavin Shan
2015-05-01  6:03 ` [PATCH v4 14/21] powerpc/powernv: Functions to get/reset PCI slot status Gavin Shan
2015-05-01  6:03   ` Gavin Shan
2015-05-09 14:44   ` Alexey Kardashevskiy
2015-05-09 14:44     ` Alexey Kardashevskiy
2015-05-01  6:03 ` [PATCH v4 15/21] powerpc/pci: Delay creating pci_dn Gavin Shan
2015-05-01  6:03   ` Gavin Shan
2015-05-09 14:55   ` Alexey Kardashevskiy
2015-05-09 14:55     ` Alexey Kardashevskiy
2015-05-11  7:21     ` Gavin Shan
2015-05-11  7:21       ` Gavin Shan
2015-05-01  6:03 ` [PATCH v4 16/21] powerpc/pci: Create eeh_dev while " Gavin Shan
2015-05-01  6:03   ` Gavin Shan
2015-05-09 15:08   ` Alexey Kardashevskiy
2015-05-09 15:08     ` Alexey Kardashevskiy
2015-05-11  7:24     ` Gavin Shan
2015-05-11  7:24       ` Gavin Shan
2015-05-01  6:03 ` [PATCH v4 17/21] powerpc/pci: Export traverse_pci_device_nodes() Gavin Shan
2015-05-01  6:03   ` Gavin Shan
2015-05-01  6:03 ` [PATCH v4 18/21] powerpc/pci: Update bridge windows on PCI plugging Gavin Shan
2015-05-01  6:03   ` Gavin Shan
2015-05-01  6:03 ` [PATCH v4 19/21] drivers/of: Support adding sub-tree Gavin Shan
2015-05-01  6:03   ` Gavin Shan
2015-05-01 12:54   ` Rob Herring
2015-05-01 12:54     ` Rob Herring
2015-05-01 15:22     ` Benjamin Herrenschmidt
2015-05-01 15:22       ` Benjamin Herrenschmidt
2015-05-01 18:46       ` Rob Herring
2015-05-01 18:46         ` Rob Herring
2015-05-01 22:57         ` Benjamin Herrenschmidt
2015-05-01 22:57           ` Benjamin Herrenschmidt
2015-05-01 23:29           ` Benjamin Herrenschmidt
2015-05-01 23:29             ` Benjamin Herrenschmidt
2015-05-02  2:48             ` Benjamin Herrenschmidt
2015-05-02  2:48               ` Benjamin Herrenschmidt
2015-05-04  1:30               ` Gavin Shan
2015-05-04  1:30                 ` Gavin Shan
2015-05-04  4:51                 ` Benjamin Herrenschmidt
2015-05-04  4:51                   ` Benjamin Herrenschmidt
2015-05-04  0:23             ` Gavin Shan
2015-05-04  0:23               ` Gavin Shan
     [not found]           ` <1430521038.7979.70.camel-XVmvHMARGAS8U2dJNN8I7kB+6BGkLq7r@public.gmane.org>
2015-05-04 16:41             ` Pantelis Antoniou
2015-05-04 16:41               ` Pantelis Antoniou
2015-05-04 16:41               ` Pantelis Antoniou
2015-05-04 21:14               ` Benjamin Herrenschmidt
2015-05-04 21:14                 ` Benjamin Herrenschmidt
2015-05-13 23:35                 ` Benjamin Herrenschmidt
2015-05-13 23:35                   ` Benjamin Herrenschmidt
     [not found]                   ` <1431560124.20218.91.camel-XVmvHMARGAS8U2dJNN8I7kB+6BGkLq7r@public.gmane.org>
2015-05-14  0:18                     ` Rob Herring
2015-05-14  0:18                       ` Rob Herring
2015-05-14  0:18                       ` Rob Herring
     [not found]                       ` <CAL_JsqKqTa5eg3eOqx3bkeNdO_920WwDiRbQaxwWLEWpCypFmA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2015-05-14  0:54                         ` Benjamin Herrenschmidt
2015-05-14  0:54                           ` Benjamin Herrenschmidt
2015-05-14  0:54                           ` Benjamin Herrenschmidt
2015-05-14  6:23                           ` Pantelis Antoniou
2015-05-14  6:23                             ` Pantelis Antoniou
2015-05-14  6:46                             ` Benjamin Herrenschmidt
2015-05-14  6:46                               ` Benjamin Herrenschmidt
2015-05-14  7:04                               ` Pantelis Antoniou
2015-05-14  7:04                                 ` Pantelis Antoniou
     [not found]                                 ` <3988EABE-3DE9-4E1C-9778-22E35138E359-wVdstyuyKrO8r51toPun2/C9HSW9iNxf@public.gmane.org>
2015-05-14  7:14                                   ` Benjamin Herrenschmidt
2015-05-14  7:14                                     ` Benjamin Herrenschmidt
2015-05-14  7:14                                     ` Benjamin Herrenschmidt
2015-05-14  7:19                                     ` Pantelis Antoniou
2015-05-14  7:19                                       ` Pantelis Antoniou
2015-05-14  7:19                                       ` Pantelis Antoniou
     [not found]                                       ` <75F026CA-5AC1-4106-B2F0-AB0D006DEF5A-wVdstyuyKrO8r51toPun2/C9HSW9iNxf@public.gmane.org>
2015-05-14  7:25                                         ` Benjamin Herrenschmidt
2015-05-14  7:25                                           ` Benjamin Herrenschmidt
2015-05-14  7:25                                           ` Benjamin Herrenschmidt
2015-05-14  7:29                                           ` Benjamin Herrenschmidt
2015-05-14  7:29                                             ` Benjamin Herrenschmidt
     [not found]                                           ` <1431588358.4160.42.camel-XVmvHMARGAS8U2dJNN8I7kB+6BGkLq7r@public.gmane.org>
2015-05-14  7:34                                             ` Pantelis Antoniou
2015-05-14  7:34                                               ` Pantelis Antoniou
2015-05-14  7:34                                               ` Pantelis Antoniou
     [not found]                                               ` <D7FC0542-DD1A-428F-8E75-81620C6D83DC-wVdstyuyKrO8r51toPun2/C9HSW9iNxf@public.gmane.org>
2015-05-14  7:47                                                 ` Benjamin Herrenschmidt
2015-05-14  7:47                                                   ` Benjamin Herrenschmidt
2015-05-14  7:47                                                   ` Benjamin Herrenschmidt
2015-05-14 11:02                                                   ` Pantelis Antoniou
2015-05-14 11:02                                                     ` Pantelis Antoniou
2015-05-14 11:02                                                     ` Pantelis Antoniou
2015-05-14 23:25                                                     ` Benjamin Herrenschmidt
2015-05-14 23:25                                                       ` Benjamin Herrenschmidt
     [not found]                           ` <1431564871.4160.8.camel-XVmvHMARGAS8U2dJNN8I7kB+6BGkLq7r@public.gmane.org>
2015-06-07  7:54                             ` Grant Likely
2015-06-07  7:54                               ` Grant Likely
     [not found]                               ` <20150607075422.6ECE9C40A12-WNowdnHR2B42iJbIjFUEsiwD8/FfD2ys@public.gmane.org>
2015-06-08 20:57                                 ` Benjamin Herrenschmidt
2015-06-08 20:57                                   ` Benjamin Herrenschmidt
     [not found]                                   ` <1433797073.4526.163.camel-XVmvHMARGAS8U2dJNN8I7kB+6BGkLq7r@public.gmane.org>
2015-06-08 21:34                                     ` Grant Likely
2015-06-08 21:34                                       ` Grant Likely
2015-06-10  6:55                                       ` Gavin Shan
2015-05-03 23:28     ` Gavin Shan
2015-05-03 23:28       ` Gavin Shan
2015-05-15  1:27   ` Gavin Shan
2015-05-15  1:27     ` Gavin Shan
2015-05-01  6:03 ` [PATCH v4 20/21] powerpc/powernv: Select OF_DYNAMIC Gavin Shan
2015-05-01  6:03   ` Gavin Shan
2015-05-01  6:03 ` [PATCH v4 21/21] pci/hotplug: PowerPC PowerNV PCI hotplug driver Gavin Shan
2015-05-01  6:03   ` Gavin Shan
2015-05-09 15:54   ` Alexey Kardashevskiy
2015-05-09 15:54     ` Alexey Kardashevskiy
2015-05-11  7:38     ` Gavin Shan
2015-05-11  7:38       ` Gavin Shan
2015-05-08 23:59 ` [PATCH v4 00/21] PowerPC/PowerNV: PCI Slot Management Alexey Kardashevskiy
2015-05-08 23:59   ` Alexey Kardashevskiy
2015-05-11  7:40   ` Gavin Shan
2015-05-11  7:40     ` Gavin Shan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=55514F09.4040100@ozlabs.ru \
    --to=aik@ozlabs.ru \
    --cc=benh@kernel.crashing.org \
    --cc=bhelgaas@google.com \
    --cc=gwshan@linux.vnet.ibm.com \
    --cc=linux-pci@vger.kernel.org \
    --cc=linuxppc-dev@lists.ozlabs.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.