From: Michael Ellerman
To: Frank Rowand, Sebastian Andrzej Siewior, devicetree@vger.kernel.org, linuxppc-dev@lists.ozlabs.org
Cc: Rob Herring, Benjamin Herrenschmidt, Paul Mackerras, Thomas Gleixner, Frank Rowand
Subject: Re: [RFC] Efficiency of the phandle_cache on ppc64/SLOF
References: <20191129151056.o5c44lm5lb4wsr4r@linutronix.de>
Date: Tue, 03 Dec 2019 15:12:44 +1100
Message-ID: <87tv6idp37.fsf@mpe.ellerman.id.au>

Frank Rowand writes:
> On 11/29/19 9:10 AM, Sebastian Andrzej Siewior wrote:
>> I've been looking at phandle_cache and noticed the following: The raw
>> phandle value as generated by dtc starts at zero and is incremented by
>> one for each phandle entry. The qemu pSeries model is using Slof (which
>> is probably the same thing as used on real hardware) and this looks like
>> a pointer value for the phandle.
>> With
>>   qemu-system-ppc64le -m 16G -machine pseries -smp 8
>>
>> I got the following output:
>> | entries: 64
>> | phandle 7e732468 slot 28 hash c
>> | phandle 7e732ad0 slot 10 hash 27
>> | phandle 7e732ee8 slot 28 hash 3a
>> | phandle 7e734160 slot 20 hash 36
>> | phandle 7e734318 slot 18 hash 3a
>> | phandle 7e734428 slot 28 hash 33
>> | phandle 7e734538 slot 38 hash 2c
>> | phandle 7e734850 slot 10 hash e
>> | phandle 7e735220 slot 20 hash 2d
>> | phandle 7e735bf0 slot 30 hash d
>> | phandle 7e7365c0 slot 0 hash 2d
>> | phandle 7e736f90 slot 10 hash d
>> | phandle 7e737960 slot 20 hash 2d
>> | phandle 7e738330 slot 30 hash d
>> | phandle 7e738d00 slot 0 hash 2d
>> | phandle 7e739730 slot 30 hash 38
>> | phandle 7e73bd08 slot 8 hash 17
>> | phandle 7e73c2e0 slot 20 hash 32
>> | phandle 7e73c7f8 slot 38 hash 37
>> | phandle 7e782420 slot 20 hash 13
>> | phandle 7e782ed8 slot 18 hash 1b
>> | phandle 7e73ce28 slot 28 hash 39
>> | phandle 7e73d390 slot 10 hash 22
>> | phandle 7e73d9a8 slot 28 hash 1a
>> | phandle 7e73dc28 slot 28 hash 37
>> | phandle 7e73de00 slot 0 hash a
>> | phandle 7e73e028 slot 28 hash 0
>> | phandle 7e7621a8 slot 28 hash 36
>> | phandle 7e73e458 slot 18 hash 1e
>> | phandle 7e73e608 slot 8 hash 1e
>> | phandle 7e740078 slot 38 hash 28
>> | phandle 7e740180 slot 0 hash 1d
>> | phandle 7e740240 slot 0 hash 33
>> | phandle 7e740348 slot 8 hash 29
>> | phandle 7e740410 slot 10 hash 2
>> | phandle 7e740eb0 slot 30 hash 3e
>> | phandle 7e745390 slot 10 hash 33
>> | phandle 7e747b08 slot 8 hash c
>> | phandle 7e748528 slot 28 hash f
>> | phandle 7e74a6e0 slot 20 hash 18
>> | phandle 7e74aab0 slot 30 hash b
>> | phandle 7e74f788 slot 8 hash d
>> | Used entries: 8, hashed: 29
>>
>> So the hash array has 64 entries, out of which only 8 are populated.
>> Using hash_32() populates 29 entries.
>> Could someone with real hardware verify this?
>> I'm not sure how important this is performance-wise; it just looks like
>> a waste to use only 1/8 of the array.
>
> The hash used is based on the assumptions you noted, and as stated in the
> code, that phandle property values are in a contiguous range of 1..n
> (not starting from zero), which is what dtc generates.

> We knew that for systems that do not match the assumptions that the hash
> will not be optimal.

If we're going to have the phandle cache it should at least make some
attempt to work across the systems that Linux supports.

> Unless there is a serious performance problem for
> such systems, I do not want to make the phandle hash code more complicated
> to optimize for these cases. And the pseries have been performing ok
> without phandle related performance issues that I remember hearing since
> before the cache was added, which could have only helped the performance.
> Yes, if your observations are correct, some memory is being wasted, but
> a 64 entry cache is not very large on a pseries.

A single line change to use an actual hash function is hardly
complicating it, compared to the amount of code already there.

cheers
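
For anyone wanting to reproduce the slot/hash comparison outside the kernel,
here is a minimal user-space sketch. It assumes the current cache index is the
phandle masked with (entries - 1), which matches the "slot" column in the
quoted output, and it uses a stand-in for hash_32() built on the 32-bit
golden-ratio multiplier 0x61C88647. The names slot_mask(), slot_hash() and
count_used() are made up for illustration and are not kernel functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_BITS 6                      /* 64 entries, as in the quoted output */
#define CACHE_SIZE (1u << CACHE_BITS)
#define GOLDEN_RATIO_32 0x61C88647u       /* assumed hash_32() multiplier */

/* Assumed current scheme: keep only the low bits of the phandle. */
static uint32_t slot_mask(uint32_t phandle)
{
	return phandle & (CACHE_SIZE - 1);
}

/* User-space stand-in for hash_32(phandle, CACHE_BITS). */
static uint32_t slot_hash(uint32_t phandle)
{
	return (uint32_t)(phandle * GOLDEN_RATIO_32) >> (32 - CACHE_BITS);
}

/* Count how many distinct cache slots a set of phandles lands in. */
static unsigned int count_used(const uint32_t *ph, size_t n,
			       uint32_t (*slot_of)(uint32_t))
{
	uint64_t seen = 0;                /* one bit per slot */
	unsigned int used = 0;

	assert(CACHE_SIZE <= 64);         /* the bitmap below holds 64 slots */
	for (size_t i = 0; i < n; i++)
		seen |= UINT64_C(1) << slot_of(ph[i]);
	for (; seen; seen &= seen - 1)    /* clear the lowest set bit each pass */
		used++;
	return used;
}

/* The SLOF phandle values from the quoted qemu pseries output. */
static const uint32_t slof_phandles[] = {
	0x7e732468, 0x7e732ad0, 0x7e732ee8, 0x7e734160, 0x7e734318, 0x7e734428,
	0x7e734538, 0x7e734850, 0x7e735220, 0x7e735bf0, 0x7e7365c0, 0x7e736f90,
	0x7e737960, 0x7e738330, 0x7e738d00, 0x7e739730, 0x7e73bd08, 0x7e73c2e0,
	0x7e73c7f8, 0x7e782420, 0x7e782ed8, 0x7e73ce28, 0x7e73d390, 0x7e73d9a8,
	0x7e73dc28, 0x7e73de00, 0x7e73e028, 0x7e7621a8, 0x7e73e458, 0x7e73e608,
	0x7e740078, 0x7e740180, 0x7e740240, 0x7e740348, 0x7e740410, 0x7e740eb0,
	0x7e745390, 0x7e747b08, 0x7e748528, 0x7e74a6e0, 0x7e74aab0, 0x7e74f788,
};

#define NUM_PHANDLES (sizeof(slof_phandles) / sizeof(slof_phandles[0]))
```

Calling count_used(slof_phandles, NUM_PHANDLES, slot_mask) and then the same
with slot_hash should give the "Used entries: 8, hashed: 29" figures from the
quoted output. The SLOF values all look 8-byte aligned, so the low three bits
carry no information and a plain mask can reach at most 8 of the 64 slots.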