Subject: Re: [RFC] Efficiency of the phandle_cache on ppc64/SLOF
To: Sebastian Andrzej Siewior, devicetree@vger.kernel.org, linuxppc-dev@lists.ozlabs.org
Cc: Rob Herring, Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, Thomas Gleixner, Frank Rowand
References: <20191129151056.o5c44lm5lb4wsr4r@linutronix.de>
From: Frank Rowand
Date: Fri, 29 Nov 2019 20:14:47 -0600
In-Reply-To: <20191129151056.o5c44lm5lb4wsr4r@linutronix.de>
X-Mailing-List: devicetree@vger.kernel.org

On 11/29/19 9:10 AM, Sebastian Andrzej Siewior wrote:
> I've been looking at phandle_cache and noticed the following: The raw
> phandle value as generated by dtc starts at zero and is incremented by
> one for each phandle entry. The qemu pSeries model is using Slof (which
> is probably the same thing as used on real hardware) and this looks like
> a pointer value for the phandle.
> With
> 	qemu-system-ppc64le -m 16G -machine pseries -smp 8
>
> I got the following output:
> | entries: 64
> | phandle 7e732468 slot 28 hash c
> | phandle 7e732ad0 slot 10 hash 27
> | phandle 7e732ee8 slot 28 hash 3a
> | phandle 7e734160 slot 20 hash 36
> | phandle 7e734318 slot 18 hash 3a
> | phandle 7e734428 slot 28 hash 33
> | phandle 7e734538 slot 38 hash 2c
> | phandle 7e734850 slot 10 hash e
> | phandle 7e735220 slot 20 hash 2d
> | phandle 7e735bf0 slot 30 hash d
> | phandle 7e7365c0 slot 0 hash 2d
> | phandle 7e736f90 slot 10 hash d
> | phandle 7e737960 slot 20 hash 2d
> | phandle 7e738330 slot 30 hash d
> | phandle 7e738d00 slot 0 hash 2d
> | phandle 7e739730 slot 30 hash 38
> | phandle 7e73bd08 slot 8 hash 17
> | phandle 7e73c2e0 slot 20 hash 32
> | phandle 7e73c7f8 slot 38 hash 37
> | phandle 7e782420 slot 20 hash 13
> | phandle 7e782ed8 slot 18 hash 1b
> | phandle 7e73ce28 slot 28 hash 39
> | phandle 7e73d390 slot 10 hash 22
> | phandle 7e73d9a8 slot 28 hash 1a
> | phandle 7e73dc28 slot 28 hash 37
> | phandle 7e73de00 slot 0 hash a
> | phandle 7e73e028 slot 28 hash 0
> | phandle 7e7621a8 slot 28 hash 36
> | phandle 7e73e458 slot 18 hash 1e
> | phandle 7e73e608 slot 8 hash 1e
> | phandle 7e740078 slot 38 hash 28
> | phandle 7e740180 slot 0 hash 1d
> | phandle 7e740240 slot 0 hash 33
> | phandle 7e740348 slot 8 hash 29
> | phandle 7e740410 slot 10 hash 2
> | phandle 7e740eb0 slot 30 hash 3e
> | phandle 7e745390 slot 10 hash 33
> | phandle 7e747b08 slot 8 hash c
> | phandle 7e748528 slot 28 hash f
> | phandle 7e74a6e0 slot 20 hash 18
> | phandle 7e74aab0 slot 30 hash b
> | phandle 7e74f788 slot 8 hash d
> | Used entries: 8, hashed: 29
>
> So the hash array has 64 entries, out of which only 8 are populated. Using
> hash_32() populates 29 entries.
> Could someone with real hardware verify this?
> I'm not sure how important this is performance-wise; it just looks like a
> waste to use only 1/8 of the array.
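For anyone who wants to reproduce the slot counts outside the kernel, here is a rough userspace sketch. It is not the code in drivers/of/base.c. The names sketch_hash_32(), slots_used() and sample_phandles are made up for illustration, and sketch_hash_32() only assumes the usual multiply-by-GOLDEN_RATIO_32-and-shift form of hash_32(). The sample values are the first 17 phandles from the output quoted above. They are all 8-byte aligned, so the low three bits are always zero and the 6-bit mask can reach at most 8 of the 64 slots.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed to match the kernel's 32-bit golden ratio constant. */
#define GOLDEN_RATIO_32 0x61C88647u

/* First 17 phandle values from the quoted qemu/SLOF output. */
static const uint32_t sample_phandles[] = {
	0x7e732468, 0x7e732ad0, 0x7e732ee8, 0x7e734160, 0x7e734318,
	0x7e734428, 0x7e734538, 0x7e734850, 0x7e735220, 0x7e735bf0,
	0x7e7365c0, 0x7e736f90, 0x7e737960, 0x7e738330, 0x7e738d00,
	0x7e739730, 0x7e73bd08,
};
#define SAMPLE_COUNT (sizeof(sample_phandles) / sizeof(sample_phandles[0]))

enum slot_fn { SLOT_MASK, SLOT_HASH };

/* Sketch of hash_32(): multiply, then keep the top 'bits' bits. */
static uint32_t sketch_hash_32(uint32_t val, unsigned int bits)
{
	assert(bits >= 1 && bits <= 32);
	return (uint32_t)(val * GOLDEN_RATIO_32) >> (32 - bits);
}

/*
 * Count how many distinct slots of a (1 << bits)-entry table are hit,
 * either by masking the low bits or by the multiplicative hash.
 * Limited to 64 entries so a single 64-bit bitmap is enough.
 */
static unsigned int slots_used(const uint32_t *ph, size_t n,
			       unsigned int bits, enum slot_fn fn)
{
	uint64_t seen = 0;
	unsigned int used = 0;
	size_t i;

	assert(bits >= 1 && bits <= 6);
	for (i = 0; i < n; i++) {
		uint32_t slot = (fn == SLOT_MASK)
			? (ph[i] & ((1u << bits) - 1))
			: sketch_hash_32(ph[i], bits);
		seen |= (uint64_t)1 << slot;
	}
	/* Clear the lowest set bit until the bitmap is empty. */
	for (; seen; seen &= seen - 1)
		used++;
	return used;
}
```

Calling slots_used(sample_phandles, SAMPLE_COUNT, 6, SLOT_MASK) and then the same with SLOT_HASH compares the two indexing schemes on this subset. Only the low bits differ between the schemes; the cache lookup logic itself is not modelled here.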
The hash used is based on the assumptions you noted, and as stated in the
code: phandle property values are in a contiguous range of 1..n (not
starting from zero), which is what dtc generates.

We knew that the hash would not be optimal for systems that do not match
these assumptions. Unless there is a serious performance problem for such
systems, I do not want to make the phandle hash code more complicated to
optimize for these cases. And pseries has been performing ok: I do not
remember hearing of any phandle related performance issues, even from
before the cache was added, and the cache could only have helped
performance. Yes, if your observations are correct, some memory is being
wasted, but a 64 entry cache is not very large on a pseries.

There is already some push back from Rob that the existing code is more
complex than needed (e.g. variable cache size).

-Frank

>
> The patch used for testing:
>
> diff --git a/drivers/of/base.c b/drivers/of/base.c
> index 1d667eb730e19..2640d4bc81a9a 100644
> --- a/drivers/of/base.c
> +++ b/drivers/of/base.c
> @@ -197,6 +197,7 @@ void of_populate_phandle_cache(void)
>  	u32 cache_entries;
>  	struct device_node *np;
>  	u32 phandles = 0;
> +	struct device_node **cache2;
>  
>  	raw_spin_lock_irqsave(&devtree_lock, flags);
>  
> @@ -214,14 +215,32 @@ void of_populate_phandle_cache(void)
>  
>  	phandle_cache = kcalloc(cache_entries, sizeof(*phandle_cache),
>  				GFP_ATOMIC);
> +	cache2 = kcalloc(cache_entries, sizeof(*phandle_cache), GFP_ATOMIC);
>  	if (!phandle_cache)
>  		goto out;
>  
> +	pr_err("%s(%d) entries: %d\n", __func__, __LINE__, cache_entries);
>  	for_each_of_allnodes(np)
>  		if (np->phandle && np->phandle != OF_PHANDLE_ILLEGAL) {
> +			int slot;
>  			of_node_get(np);
>  			phandle_cache[np->phandle & phandle_cache_mask] = np;
> +			slot = hash_32(np->phandle, __ffs(cache_entries));
> +			cache2[slot] = np;
> +			pr_err("%s(%d) phandle %x slot %x hash %x\n", __func__, __LINE__,
> +			       np->phandle, np->phandle & phandle_cache_mask, slot);
>  		}
> +	{
> +		int i, filled = 0, filled_hash = 0;
> +
> +		for (i = 0; i < cache_entries; i++) {
> +			if (phandle_cache[i])
> +				filled++;
> +			if (cache2[i])
> +				filled_hash++;
> +		}
> +		pr_err("%s(%d) Used entries: %d, hashed: %d\n", __func__, __LINE__, filled, filled_hash);
> +	}
>  
>  out:
>  	raw_spin_unlock_irqrestore(&devtree_lock, flags);
>
> Sebastian
>
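
As a footnote to the 1..n assumption in the reply, the following userspace sketch shows why plain masking works well for dtc-style phandles. It is only an illustration, not kernel code. The helper name slots_used_contiguous() is hypothetical, and the table is capped at 64 entries so that a single 64-bit bitmap can track occupancy.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Count the distinct slots hit when phandles 1..n index a table of
 * 'entries' slots via (phandle & (entries - 1)).
 * 'entries' must be a power of two and at most 64.
 */
static unsigned int slots_used_contiguous(uint32_t n, uint32_t entries)
{
	uint64_t seen = 0;
	unsigned int used = 0;
	uint32_t ph;

	assert(entries >= 1 && entries <= 64);
	assert((entries & (entries - 1)) == 0);

	for (ph = 1; ph <= n; ph++)
		seen |= (uint64_t)1 << (ph & (entries - 1));

	/* Clear the lowest set bit until the bitmap is empty. */
	for (; seen; seen &= seen - 1)
		used++;
	return used;
}
```

With contiguous values, every phandle lands in its own slot until the table is full, so slots_used_contiguous(42, 64) fills 42 slots. The pointer-like values in the quoted output break that property because their low bits carry almost no information.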