From mboxrd@z Thu Jan 1 00:00:00 1970 From: Julien Grall Subject: Re: [RFC PATCH v3 05/18] xen/arm: ITS: Port ITS driver to xen Date: Mon, 22 Jun 2015 18:16:14 +0100 Message-ID: <558842DE.10100@citrix.com> References: <1434974517-12136-1-git-send-email-vijay.kilari@gmail.com> <1434974517-12136-6-git-send-email-vijay.kilari@gmail.com> Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Return-path: In-Reply-To: <1434974517-12136-6-git-send-email-vijay.kilari@gmail.com> List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Sender: xen-devel-bounces@lists.xen.org Errors-To: xen-devel-bounces@lists.xen.org To: vijay.kilari@gmail.com, Ian.Campbell@citrix.com, julien.grall@citrix.com, stefano.stabellini@eu.citrix.com, stefano.stabellini@citrix.com, tim@xen.org, xen-devel@lists.xen.org Cc: Prasun.Kapoor@caviumnetworks.com, Vijaya Kumar K , manish.jaggi@caviumnetworks.com List-Id: xen-devel@lists.xenproject.org Hi, On 22/06/15 13:01, vijay.kilari@gmail.com wrote: > From: Vijaya Kumar K > > Only required changes from Linux ITS driver is ported > and compiled Can you list the changes you took from Linux? The coding style is not the same so it's hard to know what is the difference. It would also have been nice to get the latest fix from Linux and say on which version you are based. [..] > +/* > + * The ITS structure - contains most of the infrastructure, with the > + * msi_controller, the command queue, the collections, and the list of > + * devices writing to it. > + */ > +struct its_node { > + spinlock_t lock; > + struct list_head entry; > + void __iomem *base; The indentation seems wrong. > + unsigned long phys_base; > + unsigned long phys_size; > + its_cmd_block *cmd_base; > + its_cmd_block *cmd_write; > + void *tables[GITS_BASER_NR_REGS]; > + struct its_collection *collections; > + u64 flags; > + u32 ite_size; > + struct dt_device_node *dt_node; > +}; > + > +#define ITS_ITT_ALIGN SZ_256 > + > +static LIST_HEAD(its_nodes); > +static DEFINE_SPINLOCK(its_lock); > +static struct rdist_prop *gic_rdists; > + > +#define gic_data_rdist() (per_cpu(rdist, smp_processor_id())) > +#define gic_data_rdist_rd_base() (per_cpu(rdist, smp_processor_id()).rbase) You could use gic_data_rdist().rbase [..] > +#ifdef DEBUG_GIC_ITS > +void dump_cmd(its_cmd_block *cmd) > +{ > + printk("ITS: Phys_cmd CMD[0] = 0x%lx CMD[1] = 0x%lx CMD[2] = 0x%lx CMD[3] = 0x%lx\n", > + cmd->raw_cmd[0], cmd->raw_cmd[1], cmd->raw_cmd[2], cmd->raw_cmd[3]); > +} > +#endif > + > +#define ITS_CMD_QUEUE_SZ SZ_64K > +#define ITS_CMD_QUEUE_NR_ENTRIES (ITS_CMD_QUEUE_SZ / sizeof(its_cmd_block)) > + > +#define ALIGN(x, a) (((x) + (a - 1)) & ~(a - 1)) Please use ROUNDUP which does the same job. > + > +typedef struct its_collection *(*its_cmd_builder_t)(its_cmd_block *, > + struct its_cmd_desc *); > + > +static struct its_collection *its_build_mapd_cmd(its_cmd_block *cmd, > + struct its_cmd_desc *desc) > +{ > + unsigned long itt_addr; > + u8 size; > + > + size = max(order_base_2(desc->its_mapd_cmd.dev->nr_ites), 1); Why did you replace the ilog2 by order_base_2? > + itt_addr = __pa(desc->its_mapd_cmd.dev->itt_addr); > + itt_addr = ALIGN(itt_addr, ITS_ITT_ALIGN); > + > + memset(cmd, 0x0, sizeof(its_cmd_block)); You duplicate this line in ever build_*_cmd function. I think you should call only in one place before the builder. > + cmd->mapd.cmd = GITS_CMD_MAPD; > + cmd->mapd.devid = desc->its_mapd_cmd.dev->device_id; > + cmd->mapd.size = size - 1; > + cmd->mapd.itt = itt_addr >> 8; I think the code is more difficult to read without the helpers. You opencode every trick in all the builder rather than in a single place. > + cmd->mapd.valid = desc->its_mapd_cmd.valid; > + > +#ifdef DEBUG_GIC_ITS > + dump_cmd(cmd); > +#endif The dump can be done in a single place rather than in every builder. > + /* Take first collection for sync */ > + return &desc->its_mapd_cmd.dev->its->collections[0]; > +} > + > +static struct its_collection *its_build_mapc_cmd(its_cmd_block *cmd, > + struct its_cmd_desc *desc) > +{ > + memset(cmd, 0x0, sizeof(its_cmd_block)); > + cmd->mapc.cmd = GITS_CMD_MAPC; > + cmd->mapc.col = desc->its_mapc_cmd.col->col_id; > + /* > + * Thought target address field is only 32 bit. I don't understand what you mean with "thought". > + * So take bit[48:16] Can you explain why you are using only the [48:16]? I.e that target addresses must be 64KB aligned... > + */ > + cmd->mapc.ta = desc->its_mapc_cmd.col->target_address >> 16; > + cmd->mapc.valid = desc->its_mapc_cmd.valid; > + > +#ifdef DEBUG_GIC_ITS > + dump_cmd(cmd); > +#endif Missing newline. [..] > +static void its_send_single_command(struct its_node *its, > + its_cmd_builder_t builder, > + struct its_cmd_desc *desc) > +{ > + its_cmd_block *cmd, *sync_cmd, *next_cmd; > + struct its_collection *sync_col; > + unsigned long flags; > + > + spin_lock_irqsave(&its->lock, flags); > + > + cmd = its_allocate_entry(its); > + if ( !cmd ) > + { /* We're soooooo screewed... */ I think the /* ... */ should go on a separate line. > + its_err_ratelimit("ITS can't allocate, dropping command\n"); > + spin_unlock_irqrestore(&its->lock, flags); > + return; > + } > + sync_col = builder(cmd, desc); > + its_flush_cmd(its, cmd); > + > + if ( sync_col ) > + { > + sync_cmd = its_allocate_entry(its); > + if ( !sync_cmd ) > + { > + its_err_ratelimit("ITS can't SYNC, skipping\n"); > + goto post; > + } > + sync_cmd->sync.cmd = GITS_CMD_SYNC; > + sync_cmd->sync.ta = sync_col->target_address >> 16; This is the second place where you use target_address with ">> 16". May I ask why you dropped the helpers? At least it keeping such shift in a single place. Also, given that all the usage of target_address is done with shift. Shouldn't you just do the shift once at setup? [..] > +/* > + * We allocate 64kB for PROPBASE. That gives us at most 64K LPIs to > + * deal with (one configuration byte per interrupt). PENDBASE has to > + * be 64kB aligned (one bit per LPI, plus 8192 bits for SPI/PPI/SGI). > + */ > +#define LPI_PROPBASE_SZ SZ_64K > +#define LPI_PENDBASE_SZ (LPI_PROPBASE_SZ / 8 + SZ_1K) > + > +/* > + * This is how many bits of ID we need, including the useless ones. > + */ > +#define LPI_NRBITS fls(LPI_PROPBASE_SZ + SZ_8K) - 1 Missing ( ... ) to ensure that LPI_NRBITS will be expanded correctly. Although there is no need to switch to fls(...) you recently introduced ilog2. > + > +static int __init its_alloc_lpi_tables(void) > +{ > + paddr_t paddr; > + > + gic_rdists->prop_page = > + alloc_xenheap_pages(get_order_from_bytes(LPI_PROPBASE_SZ), 0); > + > + if ( !gic_rdists->prop_page ) > + { > + its_err("Failed to allocate PROPBASE\n"); > + return -ENOMEM; > + } > + > + paddr = __pa(gic_rdists->prop_page); > + its_info("GIC: using LPI property table @%pa\n", &paddr); > + > + /* Priority 0xa0, Group-1, disabled */ Well, you are not sure that GIC_PRI_IRQ is equal to 0xa0. > + memset(gic_rdists->prop_page, > + GIC_PRI_IRQ | LPI_PROP_GROUP1, > + LPI_PROPBASE_SZ); > + > + /* Make sure the GIC will observe the written configuration */ > + clean_and_invalidate_dcache_va_range(gic_rdists->prop_page, > + LPI_PROPBASE_SZ); > + > + return 0; > +} > + > +static const char *its_base_type_string[] = { > + [GITS_BASER_TYPE_DEVICE] = "Devices", > + [GITS_BASER_TYPE_VCPU] = "Virtual CPUs", > + [GITS_BASER_TYPE_CPU] = "Physical CPUs", > + [GITS_BASER_TYPE_COLLECTION] = "Interrupt Collections", > + [GITS_BASER_TYPE_RESERVED5] = "Reserved (5)", > + [GITS_BASER_TYPE_RESERVED6] = "Reserved (6)", > + [GITS_BASER_TYPE_RESERVED7] = "Reserved (7)", > +}; > + > +static void its_free_tables(struct its_node *its) > +{ > + int i; > + > + for ( i = 0; i < GITS_BASER_NR_REGS; i++ ) > + { > + if ( its->tables[i] ) > + { > + xfree(its->tables[i]); As said on the previous version, the memory for the table is allocated via alloc_xenheap_pages. So freeing the memory should be done via free_xenheap_pages. > + its->tables[i] = NULL; > + } > + } > +} > + > +static int its_alloc_tables(struct its_node *its) > +{ > + int err; > + int i; > + int psz = SZ_64K; > + u64 shr = GITS_BASER_InnerShareable; > + > + for ( i = 0; i < GITS_BASER_NR_REGS; i++ ) > + { > + u64 val = readq_relaxed(its->base + GITS_BASER + i * 8); > + u64 type = GITS_BASER_TYPE(val); > + u64 entry_size = GITS_BASER_ENTRY_SIZE(val); > + int order = get_order_from_bytes(psz); > + int alloc_size; > + u64 tmp; > + void *base; > + > + if (type == GITS_BASER_TYPE_NONE) if ( ... ) > + continue; > + > + /* > + * Allocate as many entries as required to fit the > + * range of device IDs that the ITS can grok... The ID > + * space being incredibly sparse, this results in a > + * massive waste of memory. > + * > + * For other tables, only allocate a single page. > + */ > + if ( type == GITS_BASER_TYPE_DEVICE ) > + { > + u64 typer = readq_relaxed(its->base + GITS_TYPER); > + u32 ids = GITS_TYPER_DEVBITS(typer); > + > + order = get_order_from_bytes((1UL << ids) * 8); Why 8 and not entry_size? Also, the latest Linux version is using max(...) to round up to the default page granularity if the size is smaller. > + if (order >= MAX_ORDER) > + { > + order = MAX_ORDER - 1; > + its_warn("Device Table too large, reduce its page order to %u\n", > + order); > + } > + } > + > + alloc_size = (1 << order) * PAGE_SIZE; > + base = alloc_xenheap_pages(order, 0); > + if ( !base ) > + { > + err = -ENOMEM; > + goto out_free; > + } > + memset(base, 0, alloc_size); > + its->tables[i] = base; > +retry_baser: > + val = (__pa(base) | > + (type << GITS_BASER_TYPE_SHIFT) | > + ((entry_size - 1) << GITS_BASER_ENTRY_SIZE_SHIFT) | > + GITS_BASER_WaWb | Same here, there was Linux changes to use non-cacheable where there is no shareability. > + shr | > + GITS_BASER_VALID); > + > + switch (psz) { > + case SZ_4K: > + val |= GITS_BASER_PAGE_SIZE_4K; > + break; > + case SZ_16K: > + val |= GITS_BASER_PAGE_SIZE_16K; > + break; > + case SZ_64K: > + val |= GITS_BASER_PAGE_SIZE_64K; > + break; > + } > + > + val |= (((alloc_size / psz) - 1) & 0xffUL); I don't think the & 0xffUL is useful. If alloc_size is too big it would result to an invalid val anyway. FWIW, Linux doesn't have it. > + > + writeq_relaxed(val, its->base + GITS_BASER + i * 8); > + tmp = readq_relaxed(its->base + GITS_BASER + i * 8); > + > + if ( (val ^ tmp) & GITS_BASER_SHAREABILITY_MASK ) > + { > + /* > + * Shareability didn't stick. Just use > + * whatever the read reported, which is likely > + * to be the only thing this redistributor > + * supports. > + */ > + shr = tmp & GITS_BASER_SHAREABILITY_MASK; > + goto retry_baser; > + } > + > + if ( (val ^ tmp) & GITS_BASER_PAGE_SIZE_MASK ) > + { > + /* > + * Page size didn't stick. Let's try a smaller > + * size and retry. If we reach 4K, then > + * something is horribly wrong... > + */ > + switch (psz) { > + case SZ_16K: > + psz = SZ_4K; > + goto retry_baser; > + case SZ_64K: > + psz = SZ_16K; > + goto retry_baser; > + } > + } > + > + /* skip comparin cacheability fields as they are implementation /* * Fooo */ And s/comparin/comparing/ > + * defined. > + */ > + val = val << 5; > + tmp = tmp << 5; Why do you need a such trick? The whole purpose of the below check is to verify that the value is correctly taken by the ITS. As the field should be RW, we would be in big trouble if the cache is not taken into account by the ITS. FWIW, Linux doesn't do it. > + if ( val != tmp ) > + { > + its_err("ITS: GITS_BASER%d doesn't stick: %lx %lx\n", > + i, (unsigned long) val, (unsigned long) tmp); You won't print a correct value here with your "val <<". [..] > + /* > + * Make sure any change to the table is observable by the GIC. > + */ > + dsb(sy); > + > + /* set PROPBASE */ > + val = (__pa(gic_rdists->prop_page) | > + GICR_PROPBASER_InnerShareable | > + GICR_PROPBASER_WaWb | > + ((LPI_NRBITS - 1) & GICR_PROPBASER_IDBITS_MASK)); > + > + writeq_relaxed(val, rbase + GICR_PROPBASER); > + tmp = readq_relaxed(rbase + GICR_PROPBASER); > + > + if ( (tmp ^ val) & GICR_PROPBASER_SHAREABILITY_MASK ) > + { If there is no shareability, Linux is dropping the cacheability. > + its_info("GIC: using cache flushing for LPI property table\n"); > + gic_rdists->flags |= RDIST_FLAGS_PROPBASE_NEEDS_FLUSHING; > + } > + > + /* set PENDBASE */ > + val = (__pa(pend_page) | > + GICR_PROPBASER_InnerShareable | > + GICR_PROPBASER_WaWb); > + > + writeq_relaxed(val, rbase + GICR_PENDBASER); > + Ditto [..] > +static void its_cpu_init_collection(void) > +{ > + struct its_node *its; > + int cpu; > + > + spin_lock(&its_lock); > + cpu = smp_processor_id(); > + > + list_for_each_entry(its, &its_nodes, entry) > + { > + u64 target; > + /* > + * We now have to bind each collection to its target > + * redistributor. > + */ > + if ( readq_relaxed(its->base + GITS_TYPER) & GITS_TYPER_PTA ) > + { > + /* > + * This ITS wants the physical address of the > + * redistributor. > + */ > + target = gic_data_rdist().phys_base; > + } > + else > + { > + /* > + * This ITS wants a linear CPU number. > + */ > + target = readq_relaxed(gic_data_rdist_rd_base() + GICR_TYPER); > + target = GICR_TYPER_CPU_NUMBER(target); Why did you drop the << 16? Although, as said earlier, given the usage of target_address you could do shift >> directly in this function rather than on multiple__ place. [..] > + > +static int its_probe(struct dt_device_node *node) > +{ [..] > + err = its_alloc_collections(its); > + if ( err ) > + goto out_free_tables; > + > + baser = (__pa(its->cmd_base) | > + GITS_CBASER_WaWb | > + GITS_CBASER_InnerShareable | > + (ITS_CMD_QUEUE_SZ / SZ_4K - 1) | > + GITS_CBASER_VALID); > + > + writeq_relaxed(baser, its->base + GITS_CBASER); > + tmp = readq_relaxed(its->base + GITS_CBASER); Missing checking the shareability. > + writeq_relaxed(0, its->base + GITS_CWRITER); > + writel_relaxed(GITS_CTLR_ENABLE, its->base + GITS_CTLR); > + > + if ( (tmp ^ baser) & GITS_BASER_SHAREABILITY_MASK ) > + { > + its_info("ITS: using cache flushing for cmd queue\n"); > + its->flags |= ITS_FLAGS_CMDQ_NEEDS_FLUSHING; > + } Technically you should check that the ITS node contains a property "msi-controller". [..] > +int its_init(struct rdist_prop *rdists) > +{ > + struct dt_device_node *np = NULL; > + > + static const struct dt_device_match its_device_ids[] __initconst = If the structure is __initconst, the function should be __init too. > + { > + DT_MATCH_GIC_ITS, > + { /* sentinel */ }, > + }; > + > + for (np = dt_find_matching_node(NULL, its_device_ids); np; > + np = dt_find_matching_node(np, its_device_ids)) > + its_probe(np); > + > + if ( list_empty(&its_nodes) ) > + { > + its_warn("ITS: No ITS available, not enabling LPIs\n"); > + return -ENXIO; > + } > + > + gic_rdists = rdists; > + its_alloc_lpi_tables(); > + > + BUILD_BUG_ON(sizeof(its_cmd_block) != 32); Why this build bug on here? Shouldn't it be part of the builder code? [..] > diff --git a/xen/include/asm-arm/gic-its.h b/xen/include/asm-arm/gic-its.h > new file mode 100644 > index 0000000..4e42f7f > --- /dev/null > +++ b/xen/include/asm-arm/gic-its.h > @@ -0,0 +1,214 @@ > +/* > + * Copyright (C) 2015 Cavium Inc. > + * Vijaya Kumar K > + * > + * This program is free software; you can redistribute it and/or modify > + * it under the terms of the GNU General Public License version 2 as > + * published by the Free Software Foundation. > + * > + * This program is distributed in the hope that it will be useful, > + * but WITHOUT ANY WARRANTY; without even the implied warranty of > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the > + * GNU General Public License for more details. > + * > + * You should have received a copy of the GNU General Public License > + * along with this program. If not, see . > + */ > + > +#ifndef __ASM_ARM_GIC_ITS_H__ > +#define __ASM_ARM_GIC_ITS_H__ > + > +#include > + > +/* > + * Collection structure - just an ID, and a redistributor address to > + * ping. We use one per CPU as a bag of interrupts assigned to this s/bag/collection/ > + * CPU. > + */ > +struct its_collection { > + u64 target_address; > + u16 col_id; > +}; > + > +/* ITS command structures */ > +typedef struct __packed { > + u64 cmd:8; > + u64 res1:56; > + u64 res2:64; > + u64 col:16; > + u64 ta:32; > + u64 res3:15; > + u64 valid:1; > + u64 res4:64; > +}mapc_cmd_t; Missing space > + > +typedef struct __packed { > + u64 cmd:8; > + u64 res1:24; > + u64 devid:32; > + u64 size:5; > + u64 res2:59; > + u64 res3:8; > + u64 itt:40; > + u64 res4:15; > + u64 valid:1; > + u64 res5:64; > +}mapd_cmd_t; ditto > + > +typedef struct __packed { > + u64 cmd:8; > + u64 res1:24; > + u64 devid:32; > + u64 event:32; > + u64 res2:32; > + u64 col:16; > + u64 res3:48; > + u64 res4:64; > +}mapi_cmd_t; > + ditto > +typedef struct __packed { > + u64 cmd:8; > + u64 res1:24; > + u64 devid:32; > + u64 event:32; > + u64 phy_id:32; > + u64 col:16; > + u64 res3:48; > + u64 res4:64; > +}mapvi_cmd_t; > + ditto > +typedef struct __packed { > + u64 cmd:8; > + u64 res1:24; > + u64 devid:32; > + u64 event:32; > + u64 res2:32; > + u64 col:16; > + u64 res3:48; > + u64 res4:64; > +}movi_cmd_t; ditto > + > +typedef struct __packed { > + u64 cmd:8; > + u64 res1:24; > + u64 devid:32; > + u64 event:32; > + u64 res2:32; > + u64 res3:64; > + u64 res4:64; > +}discard_cmd_t; ditto > + > +typedef struct __packed { > + u64 cmd:8; > + u64 res1:24; > + u64 devid:32; > + u64 event:32; > + u64 res2:32; > + u64 res3:64; > + u64 res4:64; > +}inv_cmd_t; ditto > + > +typedef struct __packed { > + u64 cmd:8; > + u64 res1:56; > + u64 res2:64; > + u64 res3:16; > + u64 ta1:32; > + u64 res4:16; > + u64 res5:16; > + u64 ta2:32; > + u64 res6:16; > +}movall_cmd_t; ditto > + > +typedef struct __packed { > + u64 cmd:8; > + u64 res1:56; > + u64 res2:64; > + u64 col:16; > + u64 res3:48; > + u64 res4:64; > +}invall_cmd_t; ditto > + > +typedef struct __packed { > + u64 cmd:8; > + u64 res1:24; > + u64 devid:32; > + u64 event:32; > + u64 res2:32; > + u64 res3:64; > + u64 res4:64; > +}int_cmd_t; ditto > + > +typedef struct __packed { > + u64 cmd:8; > + u64 res1:24; > + u64 devid:32; > + u64 event:32; > + u64 res2:32; > + u64 res3:64; > + u64 res4:64; > +}clear_cmd_t; ditto > + > +typedef struct __packed { > + u64 cmd:8; > + u64 res1:56; > + u64 res2:64; > + u64 res3:16; > + u64 ta:32; > + u64 res4:16; > + u64 res5:64; > +}sync_cmd_t; ditto > + > +typedef union { > + u64 raw_cmd[4]; > + mapc_cmd_t mapc; > + mapd_cmd_t mapd; > + mapi_cmd_t mapi; > + mapvi_cmd_t mapvi; > + movi_cmd_t movi; > + movall_cmd_t movall; > + inv_cmd_t inv; > + invall_cmd_t invall; > + discard_cmd_t discard; > + int_cmd_t int_cmd; > + clear_cmd_t clear; > + sync_cmd_t sync; > +}its_cmd_block; ditto > +/* > + * The ITS view of a device - belongs to an ITS, a collection, owns an > + * interrupt translation table, and a list of interrupts. > + */ > +struct its_device { > + /* Physical ITS */ > + struct its_node *its; > + /* Device ITT address */ > + paddr_t itt_addr; > + /* Device ITT size */ > + unsigned long itt_size; > + /* Physical LPI map */ > + unsigned long *lpi_map; > + /* First Physical LPI number assigned */ > + u32 lpi_base; > + /* Number of Physical LPIs assigned */ > + int nr_lpis; > + /* Number of ITES entries */ > + u32 nr_ites; > + /* Physical Device id */ > + u32 device_id; > +}; > + > +static inline uint8_t its_decode_cmd(its_cmd_block *cmd) > +{ > + return cmd->raw_cmd[0] & 0xff; > +} Please defined any exported function of gic-v3-its.c here and not in patches where they are used. [..] > diff --git a/xen/include/asm-arm/gic.h b/xen/include/asm-arm/gic.h > index 9e2acb7..e9d5f36 100644 > --- a/xen/include/asm-arm/gic.h > +++ b/xen/include/asm-arm/gic.h > @@ -161,6 +161,7 @@ > DT_MATCH_COMPATIBLE("arm,gic-400") > > #define DT_MATCH_GIC_V3 DT_MATCH_COMPATIBLE("arm,gic-v3") > +#define DT_MATCH_GIC_ITS DT_MATCH_COMPATIBLE("arm,gic-v3-its") > > /* > * GICv3 registers that needs to be saved/restored > diff --git a/xen/include/asm-arm/gic_v3_defs.h b/xen/include/asm-arm/gic_v3_defs.h > index acbb906..dc4fe14 100644 > --- a/xen/include/asm-arm/gic_v3_defs.h > +++ b/xen/include/asm-arm/gic_v3_defs.h > @@ -45,9 +45,11 @@ > #define GICC_SRE_EL2_DIB (1UL << 2) > #define GICC_SRE_EL2_ENEL1 (1UL << 3) > > +#define GICR_CTL_ENABLE (1U << 0) Why? You've defined GICR_CTLR_ENABLE_LPIS below and don't use it. Regards, -- Julien Grall