linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Matthew Wilcox <willy@infradead.org>
To: Manfred Spraul <manfred@colorfullife.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>,
	Waiman Long <longman@redhat.com>,
	Michael Kerrisk <mtk.manpages@gmail.com>,
	"Eric W. Biederman" <ebiederm@xmission.com>,
	"Luis R. Rodriguez" <mcgrof@kernel.org>,
	Kees Cook <keescook@chromium.org>,
	linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	Andrew Morton <akpm@linux-foundation.org>,
	Al Viro <viro@zeniv.linux.org.uk>,
	Stanislav Kinsbursky <skinsbursky@parallels.com>,
	Linux Containers <containers@lists.linux-foundation.org>,
	linux-api@vger.kernel.org
Subject: Re: [RFC][PATCH] ipc: Remove IPCMNI
Date: Thu, 29 Mar 2018 12:32:54 -0700	[thread overview]
Message-ID: <20180329193254.GA22300@bombadil.infradead.org> (raw)
In-Reply-To: <05772f83-d680-aea1-b222-cef2430dcc83@colorfullife.com>

On Thu, Mar 29, 2018 at 08:07:44PM +0200, Manfred Spraul wrote:
> Hello Mathew,
> 
> On 03/29/2018 12:56 PM, Matthew Wilcox wrote:
> > On Thu, Mar 29, 2018 at 10:47:45AM +0200, Manfred Spraul wrote:
> > > > > > > > This can be implemented trivially with the current code
> > > > > > > > using idr_alloc_cyclic.
> > > Is there a performance impact?
> > > Right now, the idr tree is only large if there are lots of objects.
> > > What happens if we have only 1 object, with id=INT_MAX-1?
> > The radix tree uses a branching factor of 64 entries (6 bits) per level.
> > The maximum ID is 31 bits (positive signed 32-bit integer).  So the
> > worst case for a single object is 6 pointer dereferences to find the
> > object anywhere in the range (INT_MAX/2 - INT_MAX].  That will read 12
> > cachelines.  If we were to constrain ourselves to a maximum of INT_MAX/2
> > (30 bits), we'd reduce that to 5 pointer dereferences and 10 cachelines.
> I'm concerned about the up to 6 branches.
> But this is just guessing, we need a test with a realistic workload.

Yes, and once there's a realistic workload, I'll be happy to prioritise
adapting the data structure to reduce the pointer chases.

FWIW, the plan is this:

There's currently an unused 32-bit field (on 64-bit machines) which I plan
to make the 'base' field.  So at each step down the tree, one subtracts
that field from the index in order to decide which slot to look at next.
Something like this:

index=0x3000'0000
(root) -> order=24
          offset=48 -> order=18
                       offset=0 -> order=12
                                   offset=0 -> order=6
                                               offset=0 -> order=0
                                                           offset=0 -> data

compresses to a single node:
(root) -> order=0
          base=3000'0000
          offset=0 -> data

If one has one entry at 5 and another entry at 0x3000'0000, the tree
looks like this (three nodes):

(root) -> order=24
          base=0
          offset=0  -> order=0
                       base=0
                       offset=5 -> entry1
          offset=48 -> order=0
                       base=0
                       offset=0 -> entry2

The trick is making sure that looking up offset 0x300'1000 returns NULL
and not entry2, but I can make that work.

An alternative is to go to something a little more libJudy and have
mutating internal nodes in the tree that can represent this kind of
situation in a more compact form.  There's a tradeoff to be made between
simplicity of implementation, cost of insertion, cost of lookup and
memory consumption.  I don't know where the right balance point is yet.

  parent reply	other threads:[~2018-03-29 19:33 UTC|newest]

Thread overview: 42+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-03-12 20:15 [PATCH v4 0/6] ipc: Clamp *mni to the real IPCMNI limit Waiman Long
2018-03-12 20:15 ` [PATCH v4 1/6] sysctl: Add flags to support min/max range clamping Waiman Long
2018-03-12 20:44   ` Luis R. Rodriguez
2018-03-12 20:48     ` Waiman Long
2018-03-13 17:46   ` Eric W. Biederman
2018-03-13 18:49     ` Waiman Long
2018-03-12 20:15 ` [PATCH v4 2/6] proc/sysctl: Check for invalid flags bits Waiman Long
2018-03-12 20:46   ` Luis R. Rodriguez
2018-03-12 20:54     ` Waiman Long
2018-03-12 20:59       ` Luis R. Rodriguez
2018-03-12 21:02         ` Waiman Long
2018-03-12 20:52   ` Andrew Morton
2018-03-12 22:12     ` Waiman Long
2018-03-12 22:42       ` Andrew Morton
2018-03-12 20:15 ` [PATCH v4 3/6] sysctl: Warn when a clamped sysctl parameter is set out of range Waiman Long
2018-03-12 20:50   ` Luis R. Rodriguez
2018-03-12 21:07     ` Waiman Long
2018-03-12 21:00   ` Andrew Morton
2018-03-12 21:04     ` Waiman Long
2018-03-12 20:15 ` [PATCH v4 4/6] ipc: Clamp msgmni and shmmni to the real IPCMNI limit Waiman Long
2018-03-13 18:17   ` Eric W. Biederman
2018-03-13 18:39     ` Waiman Long
2018-03-13 20:29       ` Eric W. Biederman
2018-03-13 21:06         ` Waiman Long
2018-03-15  0:49           ` [RFC][PATCH] ipc: Remove IPCMNI Eric W. Biederman
2018-03-15 17:02             ` Waiman Long
2018-03-15 19:00               ` Eric W. Biederman
2018-03-15 21:46                 ` Waiman Long
2018-03-29  2:14                   ` Davidlohr Bueso
2018-03-29  8:47                     ` Manfred Spraul
2018-03-29 10:56                       ` Matthew Wilcox
2018-03-29 18:07                         ` Manfred Spraul
2018-03-29 18:52                           ` Eric W. Biederman
2018-03-29 19:32                           ` Matthew Wilcox [this message]
2018-03-29 20:08                       ` Eric W. Biederman
2018-03-15 19:45             ` Matthew Wilcox
2018-03-12 20:15 ` [PATCH v4 5/6] ipc: Clamp semmni to the real IPCMNI limit Waiman Long
2018-03-12 20:52   ` Luis R. Rodriguez
2018-03-12 20:59     ` Waiman Long
2018-03-12 20:15 ` [PATCH v4 6/6] test_sysctl: Add range clamping test Waiman Long
2018-03-12 20:53   ` Luis R. Rodriguez
2018-03-12 21:00     ` Waiman Long

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180329193254.GA22300@bombadil.infradead.org \
    --to=willy@infradead.org \
    --cc=akpm@linux-foundation.org \
    --cc=containers@lists.linux-foundation.org \
    --cc=dave@stgolabs.net \
    --cc=ebiederm@xmission.com \
    --cc=keescook@chromium.org \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=longman@redhat.com \
    --cc=manfred@colorfullife.com \
    --cc=mcgrof@kernel.org \
    --cc=mtk.manpages@gmail.com \
    --cc=skinsbursky@parallels.com \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).