SELinux Archive on lore.kernel.org
 help / color / Atom feed
From: Stephen Smalley <sds@tycho.nsa.gov>
To: Jeff Vander Stoep <jeffv@google.com>, selinux@vger.kernel.org
Cc: omosnace@redhat.com, paul@paul-moore.com, will@kernel.org,
	Jovana Knezevic <jovanak@google.com>
Subject: Re: [PATCH v5] selinux: sidtab: reverse lookup hash table
Date: Thu, 7 Nov 2019 09:54:10 -0500
Message-ID: <9b0959ff-93be-9086-063e-5403accabc8e@tycho.nsa.gov> (raw)
In-Reply-To: <20191107101743.203699-1-jeffv@google.com>

On 11/7/19 5:17 AM, Jeff Vander Stoep wrote:
> This replaces the reverse table lookup and reverse cache with a
> hashtable which improves cache-miss reverse-lookup times from
> O(n) to O(1)* and maintains the same performance as a reverse
> cache hit.
> 
> This reduces the time needed to add a new sidtab entry from ~500us
> to 5us on a Pixel 3 when there are ~10,000 sidtab entries.
> 
> The implementation uses the kernel's generic hashtable API,
> It uses the context's string represtation as the hash source,
> and the kernels generic string hashing algorithm full_name_hash()
> to reduce the string to a 32 bit value.
> 
> This change also maintains the improvement introduced in
> commit ee1a84fdfeed ("selinux: overhaul sidtab to fix bug and improve
> performance") which removed the need to keep the current sidtab
> locked during policy reload. It does however introduce periodic
> locking of the target sidtab while converting the hashtable. Sidtab
> entries are never modified or removed, so the context struct stored
> in the sid_to_context tree can also be used for the context_to_sid
> hashtable to reduce memory usage.
> 
> This bug was reported by:
> - On the selinux bug tracker.
>    BUG: kernel softlockup due to too many SIDs/contexts #37
>    https://github.com/SELinuxProject/selinux-kernel/issues/37
> - Jovana Knezevic on Android's bugtracker.
>    Bug: 140252993
>    "During multi-user performance testing, we create and remove users
>    many times. selinux_android_restorecon_pkgdir goes from 1ms to over
>    20ms after about 200 user creations and removals. Accumulated over
>    ~280 packages, that adds a significant time to user creation,
>    making perf benchmarks unreliable."
> 
> * Hashtable lookup is only O(1) when n < the number of buckets.
> 
> Changes in V2:
> -The hashtable uses sidtab_entry_leaf objects directly so these
> objects are shared between the sid_to_context lookup tree and the
> context_to_sid hashtable. This simplifies memory allocation and
> was suggested by Ondrej Mosnacek.
> -The new sidtab hash stats file in selinuxfs has been moved out of
> the avc dir and into a new "ss" dir.
> 
> V3:
> -Add lock nesting notation.
> 
> V4:
> -Moved to *_rcu variants of the various hashtable functions
> as suggested by Ondrej Mosnacek and Will Deacon.
> -Naming/spelling fixups.

This still triggers a splat for me in running Ondrej's reproducer (also 
ran selinux-testsuite first, not sure if that mattered):

SELinux:  Converting 3784 SID table entries...
kernel: ------------[ cut here ]------------
syscall 1 left IRQs disabled
WARNING: CPU: 1 PID: 8313 at arch/x86/entry/common.c:261 
syscall_return_slowpathModules linked in: crypto_user 
scsi_transport_iscsi xt_multiport bluetooth ecdh_kernel:  crc32_pclmul 
snd_hda_codec snd_hwdep snd_hda_core ghash_clmulni_intel sCPU: 1 PID: 
8313 Comm: runcon Not tainted 5.4.0-rc1+ #57
Hardware name: Dell Inc. OptiPlex 7050/0NW6H5, BIOS 1.8.3 03/23/2018
RIP: 0010:syscall_return_slowpath+0x1e1/0x4a0
Code: ff ff e9 60 ff ff ff e8 2d ad 05 00 e9 2a ff ff ff 48 8d 7d 78 e8 
cf cf 50RSP: 0018:ffff8885a6adff28 EFLAGS: 00010082
RAX: 0000000000000000 RBX: 0000000000000080 RCX: 0000000000000000
RDX: 0000000000000003 RSI: 0000000000000000 RDI: ffffed10b4d5bfd7
RBP: ffff8885a6adff58 R08: ffffffffaa219f90 R09: fffffbfff5883615
R10: fffffbfff5883614 R11: ffffffffac41b0a3 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
FS:  00007f3e32c67080(0000) GS:ffff888822a40000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f3e32f145d0 CR3: 00000005a7476005 CR4: 00000000003606e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
kernel:  entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x7f3e32e11467
Code: 64 89 02 48 c7 c0 ff ff ff ff eb bb 0f 1f 80 00 00 00 00 f3 0f 1e 
fa 64 8bRSP: 002b:00007ffec6482bc8 EFLAGS: 00000246 ORIG_RAX: 
0000000000000001
RAX: 000000000000002f RBX: 00007ffec6483d88 RCX: 00007f3e32e11467
RDX: 000000000000002f RSI: 000055b75e5a5600 RDI: 0000000000000003
RBP: 000055b75e5a5600 R08: 00000000ffffffff R09: 00007ffec6482a60
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000003
R13: 00007ffec6483c4c R14: 0000000000000000 R15: 0000000000000000
irq event stamp: 7292
hardirqs last  enabled at (7291): [<ffffffffab26d9bb>] 
_raw_write_unlock_irqresthardirqs last disabled at (7292): 
[<ffffffffab26db61>] _raw_spin_lock_irqsave+0xsoftirqs last  enabled at 
(7036): [<ffffffffab600494>] __do_softirq+0x494/0x665
softirqs last disabled at (7025): [<ffffffffaa168b8f>] irq_exit+0x13f/0x160
kernel: ---[ end trace 1557497a64e0b001 ]---
kernel: ------------[ cut here ]------------

Some comments below.  Recommend running it by the RCU wizards e.g. Paul 
McKenney too given the subtleties of getting RCU right.

<snip>

> diff --git a/security/selinux/ss/sidtab.c b/security/selinux/ss/sidtab.c
> index 7d49994e8d5f..c052e620246c 100644
> --- a/security/selinux/ss/sidtab.c
> +++ b/security/selinux/ss/sidtab.c
> @@ -17,26 +17,42 @@
<snip>
> +static u32 context_to_sid(struct sidtab *s, struct context *context)
> +{
> +	struct sidtab_entry_leaf *entry;
> +
> +	rcu_read_lock();
> +	hash_for_each_possible_rcu(s->context_to_sid, entry, list,
> +				   context->hash) {
> +		if (context_cmp(&entry->context, context)) {
> +			rcu_read_unlock();
> +			return entry->sid;

This looks unsafe to me; I would have assumed you need to at least save 
entry->sid to a temporary before rcu_read_unlock()?

> @@ -305,29 +284,36 @@ static int sidtab_reverse_lookup(struct sidtab *s, struct context *context,
>   		rc = -ENOMEM;
>   		dst_convert = sidtab_do_lookup(convert->target, count, 1);
>   		if (!dst_convert) {
> -			context_destroy(dst);
> +			context_destroy(&dst->context);
>   			goto out_unlock;
>   		}
>   
> -		rc = convert->func(context, dst_convert, convert->args);
> +		rc = convert->func(context, &dst_convert->context,
> +				convert->args);
>   		if (rc) {
> -			context_destroy(dst);
> +			context_destroy(&dst->context);
>   			goto out_unlock;
>   		}
> +		dst_convert->sid = index_to_sid(count);
> +		convert->target->count = count + 1;
>   
>   		/* at this point we know the insert won't fail */
> -		convert->target->count = count + 1;
> +		spin_lock_irqsave_nested(&convert->target->lock, flags,
> +				SINGLE_DEPTH_NESTING);
> +		hash_add_rcu(convert->target->context_to_sid,
> +				&dst_convert->list, dst_convert->context.hash);
> +		spin_unlock_irqrestore(&convert->target->lock, flags);

Still having a hard time understanding why we need to lock the target. 
The target here is always the newsidtab allocated by 
security_load_policy(), not yet exposed to any other threads in the system?

>   	}
>   
>   	if (context->len)
>   		pr_info("SELinux:  Context %s is not valid (left unmapped).\n",
>   			context->str);
>   
> -	sidtab_rcache_push(s, count);
> -	*index = count;
> +	*sid = index_to_sid(count);
>   
> -	/* write entries before writing new count */
> +	/* Write entries before updating count. */
>   	smp_store_release(&s->count, count + 1);
> +	hash_add_rcu(s->context_to_sid, &dst->list, dst->context.hash);

Exactly what guarantee do we think we are getting from placing the 
hash_add_rcu() after the smp_store_release() on s->count, and is that 
guarantee well-founded?

> @@ -474,7 +463,7 @@ static void sidtab_destroy_tree(union sidtab_entry_inner entry, u32 level)
>   
>   		for (i = 0; i < SIDTAB_LEAF_ENTRIES; i++)
>   			context_destroy(&node->entries[i].context);
> -		kfree(node);
> +		kfree_rcu(node, rcu);

Is it safe to do this without having deleted it from the hash and 
without having taken the spinlock?

>   	}
>   }
>   

> diff --git a/security/selinux/ss/sidtab.h b/security/selinux/ss/sidtab.h
> index 1f4763141aa1..f1f505df97ba 100644
> --- a/security/selinux/ss/sidtab.h
> +++ b/security/selinux/ss/sidtab.h
> @@ -83,11 +88,11 @@ struct sidtab {
>   	struct sidtab_convert_params *convert;
>   	spinlock_t lock;
>   
> -	/* reverse lookup cache - access atomically via {READ|WRITE}_ONCE() */
> -	u32 rcache[SIDTAB_RCACHE_SIZE];
> -
>   	/* index == SID - 1 (no entry for SECSID_NULL) */
>   	struct sidtab_isid_entry isids[SECINITSID_NUM];
> +
> +	/* Hash table for fast reverse context-to-sid lookups. */
> +	DECLARE_HASHTABLE(context_to_sid, SIDTAB_HASH_BITS);

Should this and/or any of the associated RCU-protected structures be 
marked with __rcu for sparse checking?

>   };
>   
>   int sidtab_init(struct sidtab *s);



  parent reply index

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-11-07 10:17 Jeff Vander Stoep
2019-11-07 12:01 ` Jeffrey Vander Stoep
2019-11-14 23:35   ` Paul Moore
2019-11-07 14:54 ` Stephen Smalley [this message]
2019-11-07 15:42   ` Ondrej Mosnacek
2019-11-07 15:59     ` Jeffrey Vander Stoep
2019-11-07 15:49   ` Jeffrey Vander Stoep
2019-11-07 16:12 ` Ondrej Mosnacek
2019-11-07 16:44   ` Will Deacon
2019-11-07 18:41     ` Ondrej Mosnacek
2019-11-07 20:06       ` Jeffrey Vander Stoep

Reply instructions:

You may reply publically to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=9b0959ff-93be-9086-063e-5403accabc8e@tycho.nsa.gov \
    --to=sds@tycho.nsa.gov \
    --cc=jeffv@google.com \
    --cc=jovanak@google.com \
    --cc=omosnace@redhat.com \
    --cc=paul@paul-moore.com \
    --cc=selinux@vger.kernel.org \
    --cc=will@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

SELinux Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/selinux/0 selinux/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 selinux selinux/ https://lore.kernel.org/selinux \
		selinux@vger.kernel.org
	public-inbox-index selinux

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.selinux


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git