From: Alexey Kardashevskiy <aik@ozlabs.ru>
To: alistair@popple.id.au
Cc: Sam Bobroff <sbobroff@linux.ibm.com>,
Oliver O'Halloran <oohall@gmail.com>,
linuxppc-dev@lists.ozlabs.org,
David Gibson <david@gibson.dropbear.id.au>
Subject: Re: [PATCH kernel v3 3/3] powerpc/powernv/ioda2: Create bigger default window with 64k IOMMU pages
Date: Tue, 9 Jul 2019 10:57:27 +1000 [thread overview]
Message-ID: <8f0b1403-89b7-e16d-dadb-8ae92ee60620@ozlabs.ru> (raw)
In-Reply-To: <2908012.2rHUNcR828@townsend>
On 08/07/2019 17:01, alistair@popple.id.au wrote:
> Hi Alexey,
>
> Couple of comment/questions below.
>
>> - /*
>> - * Reserve page 0 so it will not be used for any mappings.
>> - * This avoids buggy drivers that consider page 0 to be invalid
>> - * to crash the machine or even lose data.
>> - */
>> - if (tbl->it_offset == 0)
>> - set_bit(0, tbl->it_map);
>> + tbl->it_reserved_start = res_start;
>> + tbl->it_reserved_end = res_end;
>> + iommu_table_reserve_pages(tbl);
>
> Personally I think it would be cleaner to set tbl->it_reserved_start/end in
> iommu_table_reserve_pages() rather than separating the setup like this.
Yeah I suppose...
>
>>
>> /* We only split the IOMMU table if we have 1GB or more of space */
>> if ((tbl->it_size << tbl->it_page_shift) >= (1UL * 1024 * 1024 *
>> 1024))
>> @@ -727,12 +755,7 @@ static void iommu_table_free(struct kref *kref)
>> return;
>> }
>>
>> - /*
>> - * In case we have reserved the first bit, we should not emit
>> - * the warning below.
>> - */
>> - if (tbl->it_offset == 0)
>> - clear_bit(0, tbl->it_map);
>> + iommu_table_release_pages(tbl);
>>
>> /* verify that table contains no entries */
>> if (!bitmap_empty(tbl->it_map, tbl->it_size))
>> @@ -1037,8 +1060,7 @@ int iommu_take_ownership(struct iommu_table *tbl)
>> for (i = 0; i < tbl->nr_pools; i++)
>> spin_lock(&tbl->pools[i].lock);
>>
>> - if (tbl->it_offset == 0)
>> - clear_bit(0, tbl->it_map);
>> + iommu_table_reserve_pages(tbl);
>>
>> if (!bitmap_empty(tbl->it_map, tbl->it_size)) {
>> pr_err("iommu_tce: it_map is not empty");
>> @@ -1068,9 +1090,7 @@ void iommu_release_ownership(struct iommu_table
>> *tbl)
>>
>> memset(tbl->it_map, 0, sz);
>>
>> - /* Restore bit#0 set by iommu_init_table() */
>> - if (tbl->it_offset == 0)
>> - set_bit(0, tbl->it_map);
>> + iommu_table_release_pages(tbl);
>
> Are the above two changes correct?
No they are not, this is a bug. Thanks for spotting, repost is coming.
> From the perspective of code
> behaviour this
> seems to swap the set/clear_bit() calls. For example above you are
> replacing
> set_bit(0, tbl->it_map) with a call to iommu_table_release_pages() which
> does
> clear_bit(0, tbl->it_map) instead.
>
> - Alistair
>
>> for (i = 0; i < tbl->nr_pools; i++)
>> spin_unlock(&tbl->pools[i].lock);
>> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c
>> b/arch/powerpc/platforms/powernv/pci-ioda.c index
>> 126602b4e399..ce2efdb3900d 100644
>> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
>> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
>> @@ -2422,6 +2422,7 @@ static long
>> pnv_pci_ioda2_setup_default_config(struct
>> pnv_ioda_pe *pe) {
>> struct iommu_table *tbl = NULL;
>> long rc;
>> + unsigned long res_start, res_end;
>>
>> /*
>> * crashkernel= specifies the kdump kernel's maximum memory at
>> @@ -2435,19 +2436,46 @@ static long
>> pnv_pci_ioda2_setup_default_config(struct pnv_ioda_pe *pe) * DMA
>> window can
>> be larger than available memory, which will
>> * cause errors later.
>> */
>> - const u64 window_size = min((u64)pe->table_group.tce32_size,
>> max_memory);
>> + const u64 maxblock = 1UL << (PAGE_SHIFT + MAX_ORDER - 1);
>>
>> - rc = pnv_pci_ioda2_create_table(&pe->table_group, 0,
>> - IOMMU_PAGE_SHIFT_4K,
>> - window_size,
>> - POWERNV_IOMMU_DEFAULT_LEVELS, false, &tbl);
>> + /*
>> + * We create the default window as big as we can. The constraint is
>> + * the max order of allocation possible. The TCE tableis likely to
>> + * end up being multilevel and with on-demand allocation in place,
>> + * the initial use is not going to be huge as the default window
>> aims
>> + * to support cripplied devices (i.e. not fully 64bit DMAble) only.
>> + */
>> + /* iommu_table::it_map uses 1 bit per IOMMU page, hence 8 */
>> + const u64 window_size = min((maxblock * 8) << PAGE_SHIFT,
>> max_memory);
>> + /* Each TCE level cannot exceed maxblock so go multilevel if
>> needed */
>> + unsigned long tces_order = ilog2(window_size >> PAGE_SHIFT);
>> + unsigned long tcelevel_order = ilog2(maxblock >> 3);
>> + unsigned int levels = tces_order / tcelevel_order;
>> +
>> + if (tces_order % tcelevel_order)
>> + levels += 1;
>> + /*
>> + * We try to stick to default levels (which is >1 at the moment) in
>> + * order to save memory by relying on on-demain TCE level
>> allocation.
>> + */
>> + levels = max_t(unsigned int, levels, POWERNV_IOMMU_DEFAULT_LEVELS);
>> +
>> + rc = pnv_pci_ioda2_create_table(&pe->table_group, 0, PAGE_SHIFT,
>> + window_size, levels, false, &tbl);
>> if (rc) {
>> pe_err(pe, "Failed to create 32-bit TCE table, err %ld",
>> rc);
>> return rc;
>> }
>>
>> - iommu_init_table(tbl, pe->phb->hose->node);
>> + /* We use top part of 32bit space for MMIO so exclude it from DMA */
>> + res_start = 0;
>> + res_end = 0;
>> + if (window_size > pe->phb->ioda.m32_pci_base) {
>> + res_start = pe->phb->ioda.m32_pci_base >> tbl->it_page_shift;
>> + res_end = min(window_size, SZ_4G) >> tbl->it_page_shift;
>> + }
>> + iommu_init_table_res(tbl, pe->phb->hose->node, res_start, res_end);
>>
>> rc = pnv_pci_ioda2_set_window(&pe->table_group, 0, tbl);
>> if (rc) {
>
--
Alexey
next prev parent reply other threads:[~2019-07-09 0:59 UTC|newest]
Thread overview: 21+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-05-30 7:03 [PATCH kernel v3 0/3] powerpc/ioda2: Yet another attempt to allow DMA masks between 32 and 59 Alexey Kardashevskiy
2019-05-30 7:03 ` [PATCH kernel v3 1/3] powerpc/iommu: Allow bypass-only for DMA Alexey Kardashevskiy
2019-06-03 2:03 ` David Gibson
2019-05-30 7:03 ` [PATCH kernel v3 2/3] powerpc/powernv/ioda2: Allocate TCE table levels on demand for default DMA window Alexey Kardashevskiy
2019-07-08 7:01 ` alistair
2019-05-30 7:03 ` [PATCH kernel v3 3/3] powerpc/powernv/ioda2: Create bigger default window with 64k IOMMU pages Alexey Kardashevskiy
2019-07-08 7:01 ` alistair
2019-07-09 0:57 ` Alexey Kardashevskiy [this message]
2019-06-06 4:11 ` [PATCH kernel v3 0/3] powerpc/ioda2: Yet another attempt to allow DMA masks between 32 and 59 Shawn Anastasio
2019-06-06 7:17 ` Alistair Popple
2019-06-06 12:07 ` Oliver
2019-06-07 1:41 ` Alistair Popple
2019-06-10 5:19 ` Alexey Kardashevskiy
2019-06-12 5:05 ` Shawn Anastasio
2019-06-12 6:16 ` Oliver O'Halloran
2019-06-12 19:14 ` Shawn Anastasio
2019-06-12 7:07 ` Alexey Kardashevskiy
2019-06-12 19:15 ` Shawn Anastasio
2019-06-18 4:26 ` Shawn Anastasio
2019-06-18 6:39 ` Alexey Kardashevskiy
2019-06-18 7:00 ` Shawn Anastasio
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=8f0b1403-89b7-e16d-dadb-8ae92ee60620@ozlabs.ru \
--to=aik@ozlabs.ru \
--cc=alistair@popple.id.au \
--cc=david@gibson.dropbear.id.au \
--cc=linuxppc-dev@lists.ozlabs.org \
--cc=oohall@gmail.com \
--cc=sbobroff@linux.ibm.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).