From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.7 required=3.0 tests=DKIMWL_WL_HIGH,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,MAILING_LIST_MULTI,SPF_PASS,USER_AGENT_MUTT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 3CED2C43381 for ; Thu, 21 Feb 2019 03:54:03 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 0672B2147A for ; Thu, 21 Feb 2019 03:54:02 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=default; t=1550721243; bh=1AhAnlvD4N5kI//xf6oJXdEFYqCubX4gcaVFnS7Q4jc=; h=Date:From:To:Cc:Subject:References:In-Reply-To:List-ID:From; b=LczAsyS8on8qTeiPlmtYVziIlGbVFUTFvqusqg7vOYEKK3hYpdrXwN7xCD/IN4Tdw MkUfa2gU/XSjVAVhJWB2vNNNtVDejwtskHu5ZmMI7lVsH0I+UD8xdgfKE4ppP6Eyyq Hcfh15vOgD+9aoFRwMgni+EE4bmIquOZdZ8usjKw= Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726573AbfBUDyB (ORCPT ); Wed, 20 Feb 2019 22:54:01 -0500 Received: from mail.kernel.org ([198.145.29.99]:48130 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726016AbfBUDyA (ORCPT ); Wed, 20 Feb 2019 22:54:00 -0500 Received: from localhost (lfbn-1-18527-45.w90-101.abo.wanadoo.fr [90.101.69.45]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPSA id D895B20851; Thu, 21 Feb 2019 03:53:58 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=default; t=1550721239; bh=1AhAnlvD4N5kI//xf6oJXdEFYqCubX4gcaVFnS7Q4jc=; h=Date:From:To:Cc:Subject:References:In-Reply-To:From; b=GrBdkMZI8ajYV85QnCrH50t0t0QEiPaLjtDCR0/yvjICp9qt9nR3gzjKGxtueY8Qb E4gIQowyH/BM/ScbOMmCBSSL1CpcF0f1B2+lHXzNZtECnTdF6X+ewf7kgrHA7IV2xj sWSlKcZ/yMyQ35OxQ2okKIEOCHv05QTtvU9L444c= Date: Thu, 21 Feb 2019 04:53:56 +0100 From: Frederic Weisbecker To: Linus Torvalds Cc: LKML , Sebastian Andrzej Siewior , Peter Zijlstra , Mauro Carvalho Chehab , "David S . Miller" , Thomas Gleixner , "Paul E . McKenney" , Frederic Weisbecker , Pavan Kondeti , Ingo Molnar , Joel Fernandes Subject: Re: [PATCH 05/32] locking/lockdep: Prepare valid_state() to handle plain masks Message-ID: <20190221035354.GA22364@lenoir> References: <20190212171423.8308-1-frederic@kernel.org> <20190212171423.8308-6-frederic@kernel.org> <20190213151648.GD8524@lenoir> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.9.4 (2018-02-28) Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, Feb 13, 2019 at 11:47:13AM -0800, Linus Torvalds wrote: > On Wed, Feb 13, 2019 at 7:16 AM Frederic Weisbecker wrote: > > > > > > If "vectors" only has the high hit set, you end up with "fs" having > > > the value "64". > > > > > > And then "vectors >>= fs" is undefined and won't actually do anything > > > at all on x86. > > > > Oh! ok didn't know that... > > So in general, shift counts >= width of the type (or negative) are undefined. > > They can sometimes happen to work (that's the "undefined" part ;), but > it's not reliable or portable. > > It's why you occasionally see things like > > drivers/block/sx8.c: > tmp = (blk_rq_pos(rq) >> 16) >> 16; > > to get the upper 32 bits of the value. It is written with that odd > double shift, rather than being written as ">> 32". That way it works > even if the sector type happens to be 32-bit (and the compiler will > just end up turning it into a zero if it's an unsigned 32-bit type > since it's compile-time obvious). Ok, I see. > > > I see, perhaps I should use for_each_set_bit() that should take care about those > > details? > > That would _work_, but don't do that. "for_each_set_bit()" works on > bitmaps in memory, and is slow for a simple word case. In addition to > being slow, it uses the Linux tradition of working on bitmaps that are > comprised of "unsigned long". So it has byte order issues too. > > So for_each_set_bit() is useful when you have real arrays of bits and > are using the "set_bit()" etc interfaces. Yeah I suspected some overhead. > > When you're actually working on just a single variable, your "__ffs()" > model works fine, you just need to be careful to _not_ do the "+1" and > then use it for shifts. > > Also, it actually turns out that if you want to be really clever, you > can play tricks if you don't care about the exact bit *number*. > > For example, this expression: > > v = a & (a-1); > > will remove the lowest bit set from 'a' very cheaply. So what you can > do is something like this: > > void for_each_bit_in_mask(u64 mask) > { > while (mask) { > u64 newmask = mask & (mask-1); > u64 onebit = mask ^ newmask; > mask = newmask; > do_something_with(onebit); > } > } > > to do some operation on each bit set, and quite efficiently and > without any undefined behavior or expensive shifts. > > But the above trick does require that you are happy to just see the > bit *mask*, not the bit *number*. I'm not sure any of your cases match > that. Nice, I couldn't resist introducing such a headache in my set ;-) unfortunately I indeed need the bit number itself most of the time. So following your 1st advice, I should rather do something along the lines of: nr = 0; while (mask) { fs = __ffs64(mask); mask >>= fs; mask >>= 1; nr += fs + 1; process_bit_nr(nr - 1); } And define a for_each_lock_usage_bit(usage_mask) on top of it. Thanks a lot!