From: Andrew Cooper
Subject: Interrupt code cleanup [RFC]
Date: Wed, 31 Aug 2011 16:52:13 +0100
Message-ID: <4E5E58AD.9060500@citrix.com>
To: "xen-devel@lists.xensource.com", Keir Fraser
List-Id: xen-devel@lists.xenproject.org

Following several recent subtle but serious bugs in the core of the Xen interrupt code, I am starting an extended project to clean up the code, remove the false logic (there are several areas where the logic is wrong now that per-cpu IDTs have been introduced), and to try to document some of the code and general flow.

So far, I have already done the following:

1) Folded irq_status into irq_cfg. This saves 744 bytes per CPU of data which is frequently referenced.

2) Introduced irq_cfg.old_vector, for the same reason that cpu_mask and old_cpu_mask exist. This removes a brute-force search in __clear_irq_vector() and allows for 3).

3) Removed irq_vector and replaced its functionality with irq_cfg.vector (and old_vector where relevant), because the logic surrounding irq_vector is false with per-cpu IDTs.

I have done some preliminary analysis of the code and data (before the changes listed above), assuming 64bit, NR_CPUS = 256 and a non-debug build:

* irq_desc is 140 bytes of data (cache aligned to 64 bytes), so 3 cache lines.

* irq_cfg is 83 bytes of data (not cache aligned), so 2 (possibly 3) cache lines.

* irq_desc.affinity and pending_mask seem functionally related to irq_cfg, but live in a separate structure.

* irq_desc.status and irq_cfg.move_in_progress hold overlapping information which could possibly be simplified.

* irq_desc.irq duplicates data which is almost always a parameter to irq-related functions, and which is trivial to calculate using pointer arithmetic if not otherwise available. (irq_desc[irq].irq === irq across the codebase.)

* irq_desc.chip_data is a pointer to an irq_cfg which is always available using the irq number and the irq_cfg array. (irq_desc[irq].chip_data === irq_cfg[irq] across the codebase.)

While the above is not a specific indication of things which need to be changed, it is interesting to note that irq_desc without the irq and chip_data fields becomes 128 bytes. Also, if the cpumask_t's were moved to irq_cfg, then irq_desc would be 64 bytes and would not scale with NR_CPUS at all.

Having said that, my real question is: is there a good reason why irq_desc and irq_cfg are separate structures? They are both referenced in all interrupt code paths.

At the moment, all cache alignment (including xmalloc and friends) is done by the compiler on the assumption that the cache line length is 64 bytes. Is it sensible to try to pull the L1 cache line size out of cpuid at boot time and work with that? Are there many systems in common use which do not have 64 byte cache lines?
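On that last point, the CLFLUSH line size is reported by CPUID leaf 1 (EBX bits 15:8, in 8-byte units, valid when the CLFSH feature bit is set), so it could be read once at boot and used in place of the hard-coded 64. A rough standalone sketch, not actual Xen code, with purely illustrative names:

/*
 * Rough standalone sketch (not Xen code): read the CLFLUSH/cache line
 * size from CPUID leaf 1 at boot.  Names are illustrative only.
 */
#include <stdint.h>

static inline void cpuid(uint32_t leaf,
                         uint32_t *a, uint32_t *b, uint32_t *c, uint32_t *d)
{
    asm volatile ( "cpuid"
                   : "=a" (*a), "=b" (*b), "=c" (*c), "=d" (*d)
                   : "a" (leaf), "c" (0) );
}

/* Cache line size in bytes, falling back to 64 if CPUID does not say. */
static unsigned int detect_cache_line_size(void)
{
    uint32_t eax, ebx, ecx, edx;

    cpuid(1, &eax, &ebx, &ecx, &edx);

    /* EDX bit 19 (CLFSH) indicates that EBX[15:8] holds the CLFLUSH
     * line size in 8-byte units. */
    if ( edx & (1u << 19) )
        return ((ebx >> 8) & 0xff) * 8;

    return 64;
}

Whether anything other than 64 ever comes back on hardware we care about is, of course, the open question above.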
One error which I am aware of, and will be sorting out, is to do with the allocation of vectors for GSIs. The EOI broadcast from the local APIC to the IO-APICs only takes the vector into account, not the target CPU, meaning that two different interrupts with the same vector but different target CPUs will cross-EOI each other.

Are there any areas people have noticed where the irq code is limited and could be improved? One which comes to mind is that, at the moment, an irq migration requires the same free vector on all target PCPUs, which becomes problematic if the affinity covers more than one CPU on systems with large numbers of virtual functions, each with large numbers of interrupts. (I have seen one case where this problem caused a guest failure because Xen could not allocate a vector for passthrough, even though it had room to do so. I fear the problem will get worse in the future.)

Sorry if this appears a bit jumbled - it is still a bit of a brainstorm at the moment, but I wanted people's views/opinions.

-- 
Andrew Cooper - Dom0 Kernel Engineer, Citrix XenServer
T: +44 (0)1223 225 900, http://www.citrix.com