From mboxrd@z Thu Jan 1 00:00:00 1970
From: Christopher Kenna
Date: Sat, 28 Apr 2012 02:09:44 +0000
Subject: OpenSPARC T1 Processor L2 Cache Question
Message-Id:
List-Id:
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
To: sparclinux@vger.kernel.org

Hello,

My question is simple: does the L2 cache in the OpenSPARC T1 (Niagara) processor use index hashing to determine the cache set index for a cache line? I can only find sources stating that the T2 does this [1]. Index hashing appears to have been a new feature in the T2, added to utilize the L2 cache more efficiently [2], which *implies* that the T1 does not do it. However, I cannot confirm this in experiments on a T1 that I have. More background information is below.

The OpenSPARC T1 Microarchitecture Specification [3] states that the L2 cache in the UltraSPARC T1 has a total size of 3 MB and is divided into 4 banks. Each bank is 768 KB, uses a 64-byte line size, and has 1024 12-way set-associative sets. Bits [7:6] of the physical address determine which bank an address maps to. Since there are 1024 sets per bank, it would make sense for physical address bits [17:8] to select the set (these 10 bits are the next most significant after the bank-select bits). I am hoping that these bits alone determine the set (without hashing); however, some experiments I ran do not support this.

Basically, the experiment is to precisely control the virtual-to-physical memory mapping of a buffer in a process's address space and then read the buffer back while using the perf-events subsystem (which uses the processor's hardware performance counters) to monitor L2 cache misses. Assuming that index hashing is not used, I compare the results when the buffer is allocated such that it uses all cache sets equally and when it is restricted to only the first 128 cache sets (since I operate at the granularity of 8 KB pages).
I expected the number of L2 cache misses in the second experiment to be much higher due to conflict misses (all memory mapping to the same sets), but the results are actually the same in both cases.

To achieve this mapping, I made a miscdevice and set it up so that I can mmap it and get back a buffer with the proper virtual-to-physical address mapping. I use vm_insert_page() to insert the pages into the process's virtual address space. The physical address of each page is determined with page_to_phys(), and then I mask bits [17:13] to determine which cache sets the page maps into.

Thanks for any comments, referrals, or other help you can provide.

--
Christopher Kenna

[1] http://www.opensparc.net/offers/OpenSPARC_Internals_Book.pdf
[2] http://www.paume.itb.ac.id/~iidc/opensparc/training%20openSPARC/pdf/06-KG-T2-Solaris-changes-final.pdf
[3] http://users.ece.utexas.edu/~mcdermot/vlsi-2/OpenSPARCT1_Micro_Arch.pdf