Date: Fri, 25 Mar 2011 21:52:28 +0000
Subject: advice sought: practicality of SMP cache coherency implemented in assembler (and a hardware detect line)
From: Luke Kenneth Casson Leighton
To: linux-kernel@vger.kernel.org

folks, hi,

i've hit an unusual situation where i'm responsible for evaluating and setting the general specification of a new multi-core processor, but it's based around a RISC core that cannot be significantly changed without going through some very expensive verification procedures.

in the discussions, the engineer responsible for it said that modifying the cache is prohibitively expensive and time-consuming, but that one possible workaround would be to have a hardware detection mechanism for cache-write conflicts, which would generate an interrupt, in response to which you would simply run some assembly code to flush the relevant 1st-level cache line. the detection mechanism could be tacked on, would be very quick and easy to implement, and would deliver the interrupt to the specific processor whose data required flushing. (a very rough sketch of what i imagine the handler looking like is below, after the questions.)

now, whilst it tickles my hardware hacker fancy like anything - i feel this could be used for many other purposes, such as implementing spin-locks - i have some concerns about the performance implications, and i'm not qualified or experienced enough to say one way or the other whether it's a stonking good idea or just outright mad.

so, bearing in mind that sensible answers will likely result in offers of a consulting contract to actually *implement* the software / assembly code for the linux kernel modifications required (yes, linux is already available for this RISC processor type - but only in single-core), i would greatly appreciate some help in getting answers to these questions:

* is this even a good idea? does it "fly"?

* if it does work, at what point does the number of cores involved just make it... completely impractical? over 2? over 4? 8? 16?

* i believe the cache lines in the 1st-level data cache are 8 bytes (and the AMBA / AXI bus on each is 64 bits wide) - is that reasonable?

* does anyone know of any other processors that have actually implemented software-driven cache coherency, esp. ones with the linux kernel running on them, and if so, how do they do?

considerate and informative answers would be much appreciated - i must apologise that i will be immediately unsubscribing from the linux-kernel list, and re-subscribing again in the near future, but will be watching responses via the web-based list archives: the number of messages on lkml is too high to do otherwise.
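to make the question a little more concrete, here's roughly the shape of thing i imagine on the software side. emphasis on "roughly": every name in it (CACHE_CONFLICT_IRQ, read_conflict_addr, flush_dcache_line) is made up, because the detect logic and the core's cache-maintenance instruction are precisely the bits that haven't been specified yet.

#include <linux/init.h>
#include <linux/interrupt.h>

/* hypothetical irq line raised by the cache-write-conflict detect logic,
 * delivered to the specific cpu whose stale copy needs flushing */
#define CACHE_CONFLICT_IRQ	42

/* hypothetical: read back the physical address of the conflicting 8-byte
 * line, as latched by the detection hardware in some status register */
static inline unsigned long read_conflict_addr(void)
{
	return 0;	/* would be a register read on the real core */
}

/* hypothetical: flush (or invalidate) the single L1 D-cache line covering
 * paddr - on the real core this would be a line or two of assembler */
static inline void flush_dcache_line(unsigned long paddr)
{
}

static irqreturn_t cache_conflict_isr(int irq, void *dev_id)
{
	unsigned long paddr = read_conflict_addr();

	flush_dcache_line(paddr);

	return IRQ_HANDLED;
}

static int __init cache_conflict_init(void)
{
	/* would really be done from the platform's smp bring-up code */
	return request_irq(CACHE_CONFLICT_IRQ, cache_conflict_isr, 0,
			   "cache-conflict", NULL);
}
early_initcall(cache_conflict_init);

the entire worry, of course, is how often something like that ends up firing under a real SMP workload and what it costs each time it does.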
also, for those of you who remember it: whilst it was fun in a scary kind of way, it would be nice if this didn't turn into the free-for-all whopper-thread that occurred back in 2005 or so. this multi-core processor is going to be based around an existing, proven, well-established 20-year-old RISC core that has been running linux for over a decade; it just has never been put into an SMP arrangement before, and we're on rather short timescales to get it done.

l.