SELinux Archive on lore.kernel.org
 help / color / Atom feed
From: Ondrej Mosnacek <omosnace@redhat.com>
To: Stephen Smalley <sds@tycho.nsa.gov>
Cc: Jeff Vander Stoep <jeffv@google.com>,
	SElinux list <selinux@vger.kernel.org>,
	Paul Moore <paul@paul-moore.com>,
	will@kernel.org, Jovana Knezevic <jovanak@google.com>
Subject: Re: [PATCH v5] selinux: sidtab: reverse lookup hash table
Date: Thu, 7 Nov 2019 16:42:28 +0100
Message-ID: <CAFqZXNvxn-g3_m-49xBtONwBJwEh7GkrFvy025oX1pe9KcyG1g@mail.gmail.com> (raw)
In-Reply-To: <9b0959ff-93be-9086-063e-5403accabc8e@tycho.nsa.gov>

On Thu, Nov 7, 2019 at 3:54 PM Stephen Smalley <sds@tycho.nsa.gov> wrote:
> On 11/7/19 5:17 AM, Jeff Vander Stoep wrote:
> > This replaces the reverse table lookup and reverse cache with a
> > hashtable which improves cache-miss reverse-lookup times from
> > O(n) to O(1)* and maintains the same performance as a reverse
> > cache hit.
> >
> > This reduces the time needed to add a new sidtab entry from ~500us
> > to 5us on a Pixel 3 when there are ~10,000 sidtab entries.
> >
> > The implementation uses the kernel's generic hashtable API,
> > It uses the context's string represtation as the hash source,
> > and the kernels generic string hashing algorithm full_name_hash()
> > to reduce the string to a 32 bit value.
> >
> > This change also maintains the improvement introduced in
> > commit ee1a84fdfeed ("selinux: overhaul sidtab to fix bug and improve
> > performance") which removed the need to keep the current sidtab
> > locked during policy reload. It does however introduce periodic
> > locking of the target sidtab while converting the hashtable. Sidtab
> > entries are never modified or removed, so the context struct stored
> > in the sid_to_context tree can also be used for the context_to_sid
> > hashtable to reduce memory usage.
> >
> > This bug was reported by:
> > - On the selinux bug tracker.
> >    BUG: kernel softlockup due to too many SIDs/contexts #37
> >    https://github.com/SELinuxProject/selinux-kernel/issues/37
> > - Jovana Knezevic on Android's bugtracker.
> >    Bug: 140252993
> >    "During multi-user performance testing, we create and remove users
> >    many times. selinux_android_restorecon_pkgdir goes from 1ms to over
> >    20ms after about 200 user creations and removals. Accumulated over
> >    ~280 packages, that adds a significant time to user creation,
> >    making perf benchmarks unreliable."
> >
> > * Hashtable lookup is only O(1) when n < the number of buckets.
> >
> > Changes in V2:
> > -The hashtable uses sidtab_entry_leaf objects directly so these
> > objects are shared between the sid_to_context lookup tree and the
> > context_to_sid hashtable. This simplifies memory allocation and
> > was suggested by Ondrej Mosnacek.
> > -The new sidtab hash stats file in selinuxfs has been moved out of
> > the avc dir and into a new "ss" dir.
> >
> > V3:
> > -Add lock nesting notation.
> >
> > V4:
> > -Moved to *_rcu variants of the various hashtable functions
> > as suggested by Ondrej Mosnacek and Will Deacon.
> > -Naming/spelling fixups.
>
> This still triggers a splat for me in running Ondrej's reproducer (also
> ran selinux-testsuite first, not sure if that mattered):
>
> SELinux:  Converting 3784 SID table entries...
> kernel: ------------[ cut here ]------------
> syscall 1 left IRQs disabled
> WARNING: CPU: 1 PID: 8313 at arch/x86/entry/common.c:261
> syscall_return_slowpathModules linked in: crypto_user
> scsi_transport_iscsi xt_multiport bluetooth ecdh_kernel:  crc32_pclmul
> snd_hda_codec snd_hwdep snd_hda_core ghash_clmulni_intel sCPU: 1 PID:
> 8313 Comm: runcon Not tainted 5.4.0-rc1+ #57
> Hardware name: Dell Inc. OptiPlex 7050/0NW6H5, BIOS 1.8.3 03/23/2018
> RIP: 0010:syscall_return_slowpath+0x1e1/0x4a0
> Code: ff ff e9 60 ff ff ff e8 2d ad 05 00 e9 2a ff ff ff 48 8d 7d 78 e8
> cf cf 50RSP: 0018:ffff8885a6adff28 EFLAGS: 00010082
> RAX: 0000000000000000 RBX: 0000000000000080 RCX: 0000000000000000
> RDX: 0000000000000003 RSI: 0000000000000000 RDI: ffffed10b4d5bfd7
> RBP: ffff8885a6adff58 R08: ffffffffaa219f90 R09: fffffbfff5883615
> R10: fffffbfff5883614 R11: ffffffffac41b0a3 R12: 0000000000000000
> R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
> FS:  00007f3e32c67080(0000) GS:ffff888822a40000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007f3e32f145d0 CR3: 00000005a7476005 CR4: 00000000003606e0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Call Trace:
> kernel:  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> RIP: 0033:0x7f3e32e11467
> Code: 64 89 02 48 c7 c0 ff ff ff ff eb bb 0f 1f 80 00 00 00 00 f3 0f 1e
> fa 64 8bRSP: 002b:00007ffec6482bc8 EFLAGS: 00000246 ORIG_RAX:
> 0000000000000001
> RAX: 000000000000002f RBX: 00007ffec6483d88 RCX: 00007f3e32e11467
> RDX: 000000000000002f RSI: 000055b75e5a5600 RDI: 0000000000000003
> RBP: 000055b75e5a5600 R08: 00000000ffffffff R09: 00007ffec6482a60
> R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000003
> R13: 00007ffec6483c4c R14: 0000000000000000 R15: 0000000000000000
> irq event stamp: 7292
> hardirqs last  enabled at (7291): [<ffffffffab26d9bb>]
> _raw_write_unlock_irqresthardirqs last disabled at (7292):
> [<ffffffffab26db61>] _raw_spin_lock_irqsave+0xsoftirqs last  enabled at
> (7036): [<ffffffffab600494>] __do_softirq+0x494/0x665
> softirqs last disabled at (7025): [<ffffffffaa168b8f>] irq_exit+0x13f/0x160
> kernel: ---[ end trace 1557497a64e0b001 ]---
> kernel: ------------[ cut here ]------------
>
> Some comments below.  Recommend running it by the RCU wizards e.g. Paul
> McKenney too given the subtleties of getting RCU right.
>
> <snip>
>
> > diff --git a/security/selinux/ss/sidtab.c b/security/selinux/ss/sidtab.c
> > index 7d49994e8d5f..c052e620246c 100644
> > --- a/security/selinux/ss/sidtab.c
> > +++ b/security/selinux/ss/sidtab.c
> > @@ -17,26 +17,42 @@
> <snip>
> > +static u32 context_to_sid(struct sidtab *s, struct context *context)
> > +{
> > +     struct sidtab_entry_leaf *entry;
> > +
> > +     rcu_read_lock();
> > +     hash_for_each_possible_rcu(s->context_to_sid, entry, list,
> > +                                context->hash) {
> > +             if (context_cmp(&entry->context, context)) {
> > +                     rcu_read_unlock();
> > +                     return entry->sid;
>
> This looks unsafe to me; I would have assumed you need to at least save
> entry->sid to a temporary before rcu_read_unlock()?
>
> > @@ -305,29 +284,36 @@ static int sidtab_reverse_lookup(struct sidtab *s, struct context *context,
> >               rc = -ENOMEM;
> >               dst_convert = sidtab_do_lookup(convert->target, count, 1);
> >               if (!dst_convert) {
> > -                     context_destroy(dst);
> > +                     context_destroy(&dst->context);
> >                       goto out_unlock;
> >               }
> >
> > -             rc = convert->func(context, dst_convert, convert->args);
> > +             rc = convert->func(context, &dst_convert->context,
> > +                             convert->args);
> >               if (rc) {
> > -                     context_destroy(dst);
> > +                     context_destroy(&dst->context);
> >                       goto out_unlock;
> >               }
> > +             dst_convert->sid = index_to_sid(count);
> > +             convert->target->count = count + 1;
> >
> >               /* at this point we know the insert won't fail */
> > -             convert->target->count = count + 1;
> > +             spin_lock_irqsave_nested(&convert->target->lock, flags,
> > +                             SINGLE_DEPTH_NESTING);
> > +             hash_add_rcu(convert->target->context_to_sid,
> > +                             &dst_convert->list, dst_convert->context.hash);
> > +             spin_unlock_irqrestore(&convert->target->lock, flags);
>
> Still having a hard time understanding why we need to lock the target.
> The target here is always the newsidtab allocated by
> security_load_policy(), not yet exposed to any other threads in the system?

I believe this is to avoid a race with the hash_add() in
sidtab_convert_hashtable(), which locks only the target table.
However, I don't like the fact that we need to mess with inconsistent
lock nesting here... I think it would be better to just lock the
master's spinlock in sidtab_convert_hashtable() - it will then block
more when entries would be added to sidtab, but that shouldn't be a
big concern in practice.

-- 
Ondrej Mosnacek <omosnace at redhat dot com>
Software Engineer, Security Technologies
Red Hat, Inc.


  reply index

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-11-07 10:17 Jeff Vander Stoep
2019-11-07 12:01 ` Jeffrey Vander Stoep
2019-11-14 23:35   ` Paul Moore
2019-11-07 14:54 ` Stephen Smalley
2019-11-07 15:42   ` Ondrej Mosnacek [this message]
2019-11-07 15:59     ` Jeffrey Vander Stoep
2019-11-07 15:49   ` Jeffrey Vander Stoep
2019-11-07 16:12 ` Ondrej Mosnacek
2019-11-07 16:44   ` Will Deacon
2019-11-07 18:41     ` Ondrej Mosnacek
2019-11-07 20:06       ` Jeffrey Vander Stoep

Reply instructions:

You may reply publically to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAFqZXNvxn-g3_m-49xBtONwBJwEh7GkrFvy025oX1pe9KcyG1g@mail.gmail.com \
    --to=omosnace@redhat.com \
    --cc=jeffv@google.com \
    --cc=jovanak@google.com \
    --cc=paul@paul-moore.com \
    --cc=sds@tycho.nsa.gov \
    --cc=selinux@vger.kernel.org \
    --cc=will@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

SELinux Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/selinux/0 selinux/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 selinux selinux/ https://lore.kernel.org/selinux \
		selinux@vger.kernel.org
	public-inbox-index selinux

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.selinux


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git