Date: Tue, 6 Apr 2004 23:50:00 -0700
From: Paul Jackson
To: Denis Vlasenko
Cc: colpatch@us.ibm.com, wli@holomorphy.com, linux-kernel@vger.kernel.org
Subject: Re: [Patch 17/23] mask v2 = [6/7] nodemask_t_ia64_changes
Message-Id: <20040406235000.6c06af9a.pj@sgi.com>
In-Reply-To: <200404070855.03742.vda@port.imtp.ilyichevsk.odessa.ua>
References: <20040401122802.23521599.pj@sgi.com>
	<20040401131240.00f7d74d.pj@sgi.com>
	<20040406043732.6fb2df9f.pj@sgi.com>
	<200404070855.03742.vda@port.imtp.ilyichevsk.odessa.ua>
Organization: SGI

Denis asked:
> why such a simple thing require 700 bytes of code in the first place?

Well ... it doesn't "require" 700 bytes of code.  But it is currently
consuming that much in this patch set, each time a for-each-node loop
is invoked.

This is because "for_each_online_node" boils down to two copies of
"find_next_bit" (one to get the first bit, another to get each next
bit), and in the file include/asm-ia64/bitops.h, find_next_bit() is
the following hefty chunk of inline code:

/*
 * Find next bit in a bitmap reasonably efficiently..
 */
static inline int
find_next_bit(const void *addr, unsigned long size, unsigned long offset)
{
	unsigned long *p = ((unsigned long *) addr) + (offset >> 6);
	unsigned long result = offset & ~63UL;
	unsigned long tmp;

	if (offset >= size)
		return size;
	size -= result;
	offset &= 63UL;
	if (offset) {
		tmp = *(p++);
		tmp &= ~0UL << offset;
		if (size < 64)
			goto found_first;
		if (tmp)
			goto found_middle;
		size -= 64;
		result += 64;
	}
	while (size & ~63UL) {
		if ((tmp = *(p++)))
			goto found_middle;
		result += 64;
		size -= 64;
	}
	if (!size)
		return result;
	tmp = *p;
found_first:
	tmp &= ~0UL >> (64 - size);
	if (tmp == 0UL)			/* Are any bits set? */
		return result + size;	/* Nope. */
found_middle:
	return result + __ffs(tmp);
}

===

Some things that Matthew might want to try:

 1) Don't inline the ia64 find_next_bit.

 2) Hunt down and minimize uses of find_next_bit (benefits more than
    just numamask).

 3) Instead of having the loop macro evaluate to:

	for (i = first_node(mask); i < MAX_NUMNODES; i = next_node(i, mask))

    rather have it evaluate to something like this (node_isset is more
    efficient):

	for ( ({ i = 0; while (!node_isset(i, mask) && i < MAX_NUMNODES) i++; i; });
	      i < MAX_NUMNODES;
	      ({ i++; while (!node_isset(i, mask) && i < MAX_NUMNODES) i++; i; }) )

Hmmm ... (3) looks rather nice (in an ugly sort of way ...).  It might
be worth moving lower, perhaps into bitmap, for use by both cpumask and
nodemask.

-- 
                          I won't rest till it's the best ...
                          Programmer, Linux Scalability
                          Paul Jackson 1.650.933.1373