From: Rob Herring <robh+dt@kernel.org>
To: Frank Rowand <frowand.list@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>,
Sebastian Andrzej Siewior <bigeasy@linutronix.de>,
devicetree@vger.kernel.org,
linuxppc-dev <linuxppc-dev@lists.ozlabs.org>,
Benjamin Herrenschmidt <benh@kernel.crashing.org>,
Paul Mackerras <paulus@samba.org>,
Thomas Gleixner <tglx@linutronix.de>
Subject: Re: [RFC] Efficiency of the phandle_cache on ppc64/SLOF
Date: Tue, 3 Dec 2019 10:56:35 -0600 [thread overview]
Message-ID: <CAL_JsqKieG5=teL7gABPKbJOQfvoS9s-ZPF-=R0yEE_LUoy-Kw@mail.gmail.com> (raw)
In-Reply-To: <67e1da87-7f5a-3972-bc16-28bae2350c12@gmail.com>
On Mon, Dec 2, 2019 at 10:28 PM Frank Rowand <frowand.list@gmail.com> wrote:
>
> On 12/2/19 10:12 PM, Michael Ellerman wrote:
> > Frank Rowand <frowand.list@gmail.com> writes:
> >> On 11/29/19 9:10 AM, Sebastian Andrzej Siewior wrote:
> >>> I've been looking at phandle_cache and noticed the following: The raw
> >>> phandle value as generated by dtc starts at zero and is incremented by
> >>> one for each phandle entry. The qemu pSeries model is using Slof (which
> >>> is probably the same thing as used on real hardware) and this looks like
> >>> a poiner value for the phandle.
> >>> With
> >>> qemu-system-ppc64le -m 16G -machine pseries -smp 8
> >>>
> >>> I got the following output:
> >>> | entries: 64
> >>> | phandle 7e732468 slot 28 hash c
> >>> | phandle 7e732ad0 slot 10 hash 27
> >>> | phandle 7e732ee8 slot 28 hash 3a
> >>> | phandle 7e734160 slot 20 hash 36
> >>> | phandle 7e734318 slot 18 hash 3a
> >>> | phandle 7e734428 slot 28 hash 33
> >>> | phandle 7e734538 slot 38 hash 2c
> >>> | phandle 7e734850 slot 10 hash e
> >>> | phandle 7e735220 slot 20 hash 2d
> >>> | phandle 7e735bf0 slot 30 hash d
> >>> | phandle 7e7365c0 slot 0 hash 2d
> >>> | phandle 7e736f90 slot 10 hash d
> >>> | phandle 7e737960 slot 20 hash 2d
> >>> | phandle 7e738330 slot 30 hash d
> >>> | phandle 7e738d00 slot 0 hash 2d
> >>> | phandle 7e739730 slot 30 hash 38
> >>> | phandle 7e73bd08 slot 8 hash 17
> >>> | phandle 7e73c2e0 slot 20 hash 32
> >>> | phandle 7e73c7f8 slot 38 hash 37
> >>> | phandle 7e782420 slot 20 hash 13
> >>> | phandle 7e782ed8 slot 18 hash 1b
> >>> | phandle 7e73ce28 slot 28 hash 39
> >>> | phandle 7e73d390 slot 10 hash 22
> >>> | phandle 7e73d9a8 slot 28 hash 1a
> >>> | phandle 7e73dc28 slot 28 hash 37
> >>> | phandle 7e73de00 slot 0 hash a
> >>> | phandle 7e73e028 slot 28 hash 0
> >>> | phandle 7e7621a8 slot 28 hash 36
> >>> | phandle 7e73e458 slot 18 hash 1e
> >>> | phandle 7e73e608 slot 8 hash 1e
> >>> | phandle 7e740078 slot 38 hash 28
> >>> | phandle 7e740180 slot 0 hash 1d
> >>> | phandle 7e740240 slot 0 hash 33
> >>> | phandle 7e740348 slot 8 hash 29
> >>> | phandle 7e740410 slot 10 hash 2
> >>> | phandle 7e740eb0 slot 30 hash 3e
> >>> | phandle 7e745390 slot 10 hash 33
> >>> | phandle 7e747b08 slot 8 hash c
> >>> | phandle 7e748528 slot 28 hash f
> >>> | phandle 7e74a6e0 slot 20 hash 18
> >>> | phandle 7e74aab0 slot 30 hash b
> >>> | phandle 7e74f788 slot 8 hash d
> >>> | Used entries: 8, hashed: 29
> >>>
> >>> So the hash array has 64 entries out which only 8 are populated. Using
> >>> hash_32() populates 29 entries.
> >>> Could someone with real hardware verify this?
> >>> I'm not sure how important this performance wise, it looks just like a
> >>> waste using only 1/8 of the array.
> >>
> >> The hash used is based on the assumptions you noted, and as stated in the
> >> code, that phandle property values are in a contiguous range of 1..n
> >> (not starting from zero), which is what dtc generates.
> >
> >> We knew that for systems that do not match the assumptions that the hash
> >> will not be optimal.
> >
> > If we're going to have the phandle cache it should at least make some
> > attempt to work across the systems that Linux supports.
> >
> >> Unless there is a serious performance problem for
> >> such systems, I do not want to make the phandle hash code more complicated
> >> to optimize for these cases. And the pseries have been performing ok
> >> without phandle related performance issues that I remember hearing since
> >> before the cache was added, which could have only helped the performance.
> >> Yes, if your observations are correct, some memory is being wasted, but
> >> a 64 entry cache is not very large on a pseries.
> >
> > A single line change to use an actual hash function is hardly
> > complicating it, compared to the amount of code already there.
>
> With a dtc generated devicetree, the hash is perfect, with no
> misses. That single line change then makes the hash bad for
> dtc generated devicetrees.
To quantify that, I did some tests with the hash algo and it's about a
10% collision rate using phandle values of 1-$cache_size. There's
never more than 2 phandles colliding in a slot. The actual impact
would be less given 100% thrashing seems unlikely.
> The cache was added to address a problem with a system with a
> dtc generated devicetree.
The cache was added to address a problem with a system with a large
number of nodes and phandles (814 phandles in the reported case). dtc
happens to be used in that case.
> I had not heard of any phandle
> lookup performance issues on ppc systems. An imperfect hash
> for ppc should not make the ppc performance worse (unless
> every single phandle value hashes to a single bucket). So
> the ppc performance is likely better than it was before
> the hash was added, even without an optimal hash algorithm.
>
> So the change would not be a single line change. It would
> be a change to use different hash algorithms for dtc
> generated device trees versus other device trees.
>
> Another possibility would be to make the cache be dependent
> upon not CONFIG_PPC. It might be possible to disable the
> cache with a minimal code change.
I'd rather not do that.
And yes, as mentioned earlier I don't like the complexity. I didn't
from the start and I'm I'm still of the opinion we should have a
fixed or 1 time sized true cache (i.e. smaller than total # of
phandles). That would solve the RT memory allocation and locking issue
too.
For reference, the performance difference between the current
implementation (assuming fixes haven't regressed it) was ~400ms vs. a
~340ms improvement with a 64 entry cache (using a mask, not a hash).
IMO, 340ms improvement was good enough.
Rob
next prev parent reply other threads:[~2019-12-03 16:56 UTC|newest]
Thread overview: 20+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-11-29 15:10 [RFC] Efficiency of the phandle_cache on ppc64/SLOF Sebastian Andrzej Siewior
2019-11-30 2:14 ` Frank Rowand
2019-12-02 11:07 ` Sebastian Andrzej Siewior
2019-12-03 4:12 ` Michael Ellerman
2019-12-03 4:28 ` Frank Rowand
2019-12-03 16:56 ` Rob Herring [this message]
2019-12-05 16:35 ` Sebastian Andrzej Siewior
2019-12-06 2:01 ` Frank Rowand
2019-12-09 13:35 ` Sebastian Andrzej Siewior
2019-12-10 1:51 ` Rob Herring
2019-12-10 8:17 ` Frank Rowand
2019-12-10 12:46 ` Frank Rowand
2019-12-11 14:42 ` Rob Herring
2019-12-06 1:52 ` Frank Rowand
2019-12-08 6:59 ` Frank Rowand
2019-12-03 4:03 ` Michael Ellerman
2019-12-03 18:35 ` Segher Boessenkool
2019-12-06 1:37 ` Frank Rowand
2019-12-06 23:40 ` Segher Boessenkool
2019-12-08 4:30 ` Frank Rowand
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to='CAL_JsqKieG5=teL7gABPKbJOQfvoS9s-ZPF-=R0yEE_LUoy-Kw@mail.gmail.com' \
--to=robh+dt@kernel.org \
--cc=benh@kernel.crashing.org \
--cc=bigeasy@linutronix.de \
--cc=devicetree@vger.kernel.org \
--cc=frowand.list@gmail.com \
--cc=linuxppc-dev@lists.ozlabs.org \
--cc=mpe@ellerman.id.au \
--cc=paulus@samba.org \
--cc=tglx@linutronix.de \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).