* RCU issue with SELinux (Re: SELINUX performance issues)
@ 2004-08-16 9:33 Kaigai Kohei
2004-08-16 15:19 ` James Morris
0 siblings, 1 reply; 31+ messages in thread
From: Kaigai Kohei @ 2004-08-16 9:33 UTC (permalink / raw)
To: SELinux-ML(Eng), Linux Kernel ML(Eng); +Cc: James Morris
Hello, everyone.
On Sat, 7 Aug 2004 22:57:08 -0400 (EDT), James Morris <jmorris@redhat.com> wrote:
> > The biggest problem is the global lock:
> >
> > avc_has_perm_noaudit:
> > spin_lock_irqsave(&avc_lock, flags);
> >
> > Any chance we can get rid of it? Maybe with RCU?
>
> Yes, known problem. I plan on trying RCU soon, Rik was looking at a
> seqlock approach.
I'm interested in the scalability of SELinux, and I have tried both
rwlock and RCU approaches.
First, I simply replaced spin_lock_irqsave() with (read|write)_lock_irqsave(),
but a performance improvement was observed only in hackbench, not in OSDL Re-AIM.
Next, I tried the RCU approach and came across the following problem.
Some AVC entries are referred to directly via the avc_entry_ref structure
embedded in various resource objects (task_struct, inode, and so on).
Thus a reader may still refer to an invalidated AVC entry after it has
been detached from the AVC hash list.
Since the RCU model only allows forward scanning of a list,
such direct references to AVC entries are not appropriate.
In my opinion, the direct reference to the AVC entry should be removed
to avoid this problem and improve the scalability of SELinux.
The purpose of the direct reference is to speed up consecutive access
control checks on the same object, so removing it may cost some performance.
But I think the impact is not significant: there are 512 hash slots while
the number of AVC entries is fixed at 410, so we can reach the target
AVC entry in one or two probes on average.
Is the approach of removing direct references to AVC entries acceptable?
I'll try to consider this issue further.
--------
Kai Gai, Linux Promotion Center, NEC
E-mail: kaigai@ak.jp.nec.com
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: RCU issue with SELinux (Re: SELINUX performance issues)
2004-08-16 9:33 RCU issue with SELinux (Re: SELINUX performance issues) Kaigai Kohei
@ 2004-08-16 15:19 ` James Morris
2004-08-20 13:36 ` Kaigai Kohei
0 siblings, 1 reply; 31+ messages in thread
From: James Morris @ 2004-08-16 15:19 UTC (permalink / raw)
To: Kaigai Kohei; +Cc: SELinux-ML(Eng), Linux Kernel ML(Eng), Stephen Smalley
On Mon, 16 Aug 2004, Kaigai Kohei wrote:
> Is the approach of removing direct references to AVC entries acceptable?
>
> I'll try to consider this issue further.
Sure, if you can make it work without problems.
- James
--
James Morris
<jmorris@redhat.com>
* Re: RCU issue with SELinux (Re: SELINUX performance issues)
2004-08-16 15:19 ` James Morris
@ 2004-08-20 13:36 ` Kaigai Kohei
2004-08-20 14:53 ` James Morris
` (3 more replies)
0 siblings, 4 replies; 31+ messages in thread
From: Kaigai Kohei @ 2004-08-20 13:36 UTC (permalink / raw)
To: James Morris, Stephen Smalley; +Cc: SELinux-ML(Eng), Linux Kernel ML(Eng)
[-- Attachment #1: Type: text/plain, Size: 2883 bytes --]
Hello, everyone.
On Tuesday, August 17, 2004 12:19 AM, James Morris wrote:
> > Is the approach of removing direct references to AVC entries acceptable?
> >
> > I'll try to consider this issue further.
>
> Sure, if you can make it work without problems.
The attached patches against the 2.6.8.1 kernel improve the performance
and scalability of SELinux using an RCU approach.
The evaluation results are as follows:
<Environment>
CPU: Itanium2(1GHz) x 4/8/16/32
Memory: enough (no swap)
OS: 2.6.8.1 (SELinux disabled by 'selinux=0' boot option)
2.6.8.1 (SELinux enabled)
2.6.8.1 + rwlock patch by KaiGai
2.6.8.1 + RCU patch by KaiGai
The test program iterates write() to files on tmpfs 500,000 times in parallel.
It is attached as 'rcutest.c'.
We observed performance improvement and perfect scalability.
[rcutest Loop=500000 Parallel=<Num of CPU>]
- The upper row is the real time [seconds].
- In the lower row, the left value is the user time and the right value is the system time.
                  --- 4CPU ---    --- 8CPU ---    --- 16CPU ---    --- 32CPU ---
2.6.8.1(disable)    8.0158[s]       8.0183[s]       8.0146[s]        8.0037[s]
                  (2.08/ 6.07)    (1.86/ 6.31)    (0.78/ 7.33)     (2.03/ 5.94)
2.6.8.1(enable)    78.0957[s]     319.0451[s]    1322.0313[s]        too long
                  (2.47/76.48)   (2.47/316.96)   (2.43/1319.8)      (---/---)
2.6.8.1.rwlock     20.0100[s]      49.0390[s]      90.0200[s]      177.0533[s]
                  (2.57/17.52)    (2.45/46.93)    (2.37/87.78)     (2.41/175.1)
2.6.8.1.rcu        11.0464[s]      11.0205[s]      11.0372[s]       11.0496[s]
                  (2.64/ 8.82)    (2.21/ 8.98)    (2.67/ 8.68)     (2.51/ 8.95)
-------------------------------------------------------------------------
These patches achieve lock-free read access to the AVC cache.
Significant changes are as follows:
- Direct references via avc_entry_ref were removed for RCU.
- Some structures (avc_entry/avc_node/avc_cache) were changed for RCU.
- avc_node objects are now allocated dynamically rather than statically.
  avc_reclaim_node() reclaims some avc_node objects when the number of
  objects exceeds AVC_CACHE_THRESHOLD.
- For updating the av_decision structure, I came up with a new RCU primitive,
  list_replace_rcu(), in include/linux/list.h.
  It atomically switches the linked pointers so that an old object is
  replaced by a new one. avc_update_node() uses it to update the av_decision
  structure: the original object is duplicated, the copy is updated, and the
  copy then replaces the original.
  We now pay a higher cost for updating an av_decision structure, as
  compensation for lock-free read access. But in my opinion, updates to
  av_decision are rare.
NOTE: The fifth arguments of avc_has_perm() and avc_has_perm_noaudit()
are no longer necessary, but the prototype declarations and all the callers
remain unchanged, since I want to avoid messing up the patches.
Any comments?
----------
KaiGai <kaigai@ak.jp.nec.com>
[-- Attachment #2: selinux.rcu-2.6.8.1.patch --]
[-- Type: application/octet-stream, Size: 21368 bytes --]
diff -rNU4 linux-2.6.8.1/security/selinux/avc.c linux-2.6.8.1.rcu/security/selinux/avc.c
--- linux-2.6.8.1/security/selinux/avc.c 2004-08-14 19:55:48.000000000 +0900
+++ linux-2.6.8.1.rcu/security/selinux/avc.c 2004-08-20 20:38:06.000000000 +0900
@@ -35,28 +35,31 @@
#include "av_perm_to_string.h"
#include "objsec.h"
#define AVC_CACHE_SLOTS 512
-#define AVC_CACHE_MAXNODES 410
+#define AVC_CACHE_THRESHOLD 410
+#define AVC_CACHE_RECLAIM 16
struct avc_entry {
u32 ssid;
u32 tsid;
u16 tclass;
struct av_decision avd;
- int used; /* used recently */
+ atomic_t used; /* used recently */
};
struct avc_node {
+ struct list_head list;
+ struct rcu_head rhead;
struct avc_entry ae;
- struct avc_node *next;
};
struct avc_cache {
- struct avc_node *slots[AVC_CACHE_SLOTS];
- u32 lru_hint; /* LRU hint for reclaim scan */
- u32 active_nodes;
- u32 latest_notif; /* latest revocation notification */
+ struct list_head slots[AVC_CACHE_SLOTS];
+ spinlock_t slots_lock[AVC_CACHE_SLOTS]; /* Lock for RCU update */
+ atomic_t lru_hint; /* LRU hint for reclaim scan */
+ atomic_t active_nodes;
+ u32 latest_notif; /* latest revocation notification */
};
struct avc_callback_node {
int (*callback) (u32 event, u32 ssid, u32 tsid,
@@ -69,10 +72,8 @@
u32 perms;
struct avc_callback_node *next;
};
-static spinlock_t avc_lock = SPIN_LOCK_UNLOCKED;
-static struct avc_node *avc_node_freelist;
static struct avc_cache avc_cache;
static unsigned avc_cache_stats[AVC_NSTATS];
static struct avc_callback_node *avc_callbacks;
@@ -187,52 +188,44 @@
* Initialize the access vector cache.
*/
void __init avc_init(void)
{
- struct avc_node *new;
int i;
- for (i = 0; i < AVC_CACHE_MAXNODES; i++) {
- new = kmalloc(sizeof(*new), GFP_ATOMIC);
- if (!new) {
- printk(KERN_WARNING "avc: only able to allocate "
- "%d entries\n", i);
- break;
- }
- memset(new, 0, sizeof(*new));
- new->next = avc_node_freelist;
- avc_node_freelist = new;
- }
-
+ for (i = 0; i < AVC_CACHE_SLOTS; i++) {
+ INIT_LIST_HEAD(&avc_cache.slots[i]);
+ avc_cache.slots_lock[i] = SPIN_LOCK_UNLOCKED;
+ }
+ atomic_set(&avc_cache.active_nodes, 0);
+ atomic_set(&avc_cache.lru_hint, 0);
+
audit_log(current->audit_context, "AVC INITIALIZED\n");
}
#if 0
static void avc_hash_eval(char *tag)
{
int i, chain_len, max_chain_len, slots_used;
struct avc_node *node;
+ struct list_head *pos;
unsigned long flags;
- spin_lock_irqsave(&avc_lock,flags);
+ rcu_read_lock();
slots_used = 0;
max_chain_len = 0;
for (i = 0; i < AVC_CACHE_SLOTS; i++) {
- node = avc_cache.slots[i];
- if (node) {
+ if (!list_empty(&avc_cache.slots[i])) {
slots_used++;
chain_len = 0;
- while (node) {
+ list_for_each_rcu(pos, &avc_cache.slots[i])
chain_len++;
- node = node->next;
- }
if (chain_len > max_chain_len)
max_chain_len = chain_len;
}
}
- spin_unlock_irqrestore(&avc_lock,flags);
+ rcu_read_unlock();
printk(KERN_INFO "\n");
printk(KERN_INFO "%s avc: %d entries and %d/%d buckets used, longest "
"chain length %d\n", tag, avc_cache.active_nodes, slots_used,
@@ -242,188 +235,194 @@
static inline void avc_hash_eval(char *tag)
{ }
#endif
-static inline struct avc_node *avc_reclaim_node(void)
-{
- struct avc_node *prev, *cur;
- int hvalue, try;
+static void avc_node_free(struct rcu_head *rhead) {
+ struct avc_node *node;
+ node = container_of(rhead, struct avc_node, rhead);
+ kfree(node);
+}
- hvalue = avc_cache.lru_hint;
- for (try = 0; try < 2; try++) {
- do {
- prev = NULL;
- cur = avc_cache.slots[hvalue];
- while (cur) {
- if (!cur->ae.used)
- goto found;
+static inline void avc_reclaim_node(void)
+{
+ struct avc_node *node;
+ struct list_head *pos;
+ int hvalue, try, ecx;
- cur->ae.used = 0;
+ for (try = 0, ecx = 0; try < AVC_CACHE_SLOTS; try++ ) {
+ hvalue = atomic_inc_return(&avc_cache.lru_hint) % AVC_CACHE_SLOTS;
- prev = cur;
- cur = cur->next;
+ if (spin_trylock(&avc_cache.slots_lock[hvalue])) {
+ list_for_each_rcu(pos, &avc_cache.slots[hvalue]) {
+ node = list_entry(pos, struct avc_node, list);
+ if (!atomic_dec_and_test(&node->ae.used)) {
+ /* Recently Unused */
+ list_del_rcu(&node->list);
+ call_rcu(&node->rhead, avc_node_free);
+ atomic_dec(&avc_cache.active_nodes);
+ ecx++;
+ if (ecx >= AVC_CACHE_RECLAIM) {
+ spin_unlock(&avc_cache.slots_lock[hvalue]);
+ goto out;
+ }
+ }
}
- hvalue = (hvalue + 1) & (AVC_CACHE_SLOTS - 1);
- } while (hvalue != avc_cache.lru_hint);
+ spin_unlock(&avc_cache.slots_lock[hvalue]);
+ }
}
-
- panic("avc_reclaim_node");
-
-found:
- avc_cache.lru_hint = hvalue;
-
- if (prev == NULL)
- avc_cache.slots[hvalue] = cur->next;
- else
- prev->next = cur->next;
-
- return cur;
+out:
+ return;
}
-static inline struct avc_node *avc_claim_node(u32 ssid,
- u32 tsid, u16 tclass)
+static inline struct avc_node *avc_get_node(void)
{
struct avc_node *new;
- int hvalue;
+ int actives;
- hvalue = avc_hash(ssid, tsid, tclass);
- if (avc_node_freelist) {
- new = avc_node_freelist;
- avc_node_freelist = avc_node_freelist->next;
- avc_cache.active_nodes++;
- } else {
- new = avc_reclaim_node();
- if (!new)
- goto out;
+ new = kmalloc(sizeof(struct avc_node), GFP_ATOMIC);
+ if (new) {
+ actives = atomic_inc_return(&avc_cache.active_nodes);
+ if (actives > AVC_CACHE_THRESHOLD)
+ avc_reclaim_node();
}
- new->ae.used = 1;
- new->ae.ssid = ssid;
- new->ae.tsid = tsid;
- new->ae.tclass = tclass;
- new->next = avc_cache.slots[hvalue];
- avc_cache.slots[hvalue] = new;
-
-out:
return new;
}
static inline struct avc_node *avc_search_node(u32 ssid, u32 tsid,
u16 tclass, int *probes)
{
- struct avc_node *cur;
+ struct list_head *pos;
+ struct avc_node *node, *ret = NULL;
int hvalue;
int tprobes = 1;
hvalue = avc_hash(ssid, tsid, tclass);
- cur = avc_cache.slots[hvalue];
- while (cur != NULL &&
- (ssid != cur->ae.ssid ||
- tclass != cur->ae.tclass ||
- tsid != cur->ae.tsid)) {
+ list_for_each_rcu(pos, &avc_cache.slots[hvalue]) {
+ node = list_entry(pos, struct avc_node, list);
+ if (ssid == node->ae.ssid &&
+ tclass == node->ae.tclass &&
+ tsid == node->ae.tsid) {
+ ret = node;
+ break;
+ }
tprobes++;
- cur = cur->next;
}
- if (cur == NULL) {
+ if (ret == NULL) {
/* cache miss */
goto out;
}
/* cache hit */
if (probes)
*probes = tprobes;
-
- cur->ae.used = 1;
-
+ if (atomic_read(&ret->ae.used) != 1)
+ atomic_set(&ret->ae.used, 1);
out:
- return cur;
+ return ret;
}
/**
* avc_lookup - Look up an AVC entry.
* @ssid: source security identifier
* @tsid: target security identifier
* @tclass: target security class
* @requested: requested permissions, interpreted based on @tclass
- * @aeref: AVC entry reference
*
* Look up an AVC entry that is valid for the
* @requested permissions between the SID pair
* (@ssid, @tsid), interpreting the permissions
* based on @tclass. If a valid AVC entry exists,
- * then this function updates @aeref to refer to the
- * entry and returns %0. Otherwise, this function
- * returns -%ENOENT.
+ * then this function return the avc_node.
+ * Otherwise, this function returns NULL.
*/
-int avc_lookup(u32 ssid, u32 tsid, u16 tclass,
- u32 requested, struct avc_entry_ref *aeref)
+struct avc_node *avc_lookup(u32 ssid, u32 tsid, u16 tclass, u32 requested)
{
struct avc_node *node;
- int probes, rc = 0;
+ int probes;
avc_cache_stats_incr(AVC_CAV_LOOKUPS);
node = avc_search_node(ssid, tsid, tclass,&probes);
if (node && ((node->ae.avd.decided & requested) == requested)) {
avc_cache_stats_incr(AVC_CAV_HITS);
avc_cache_stats_add(AVC_CAV_PROBES,probes);
- aeref->ae = &node->ae;
goto out;
}
avc_cache_stats_incr(AVC_CAV_MISSES);
- rc = -ENOENT;
out:
- return rc;
+ return node;
+}
+
+static int avc_latest_notif_update(int seqno, int is_insert)
+{
+ int ret = 0;
+ static spinlock_t notif_lock = SPIN_LOCK_UNLOCKED;
+
+ spin_lock(¬if_lock);
+ if (seqno < avc_cache.latest_notif) {
+ if (is_insert) {
+ printk(KERN_WARNING "avc: seqno %d < latest_notif %d\n",
+ seqno, avc_cache.latest_notif);
+ ret = -EAGAIN;
+ } else {
+ avc_cache.latest_notif = seqno;
+ }
+ }
+ spin_unlock(¬if_lock);
+ return ret;
}
/**
* avc_insert - Insert an AVC entry.
* @ssid: source security identifier
* @tsid: target security identifier
* @tclass: target security class
* @ae: AVC entry
- * @aeref: AVC entry reference
*
* Insert an AVC entry for the SID pair
* (@ssid, @tsid) and class @tclass.
* The access vectors and the sequence number are
* normally provided by the security server in
* response to a security_compute_av() call. If the
* sequence number @ae->avd.seqno is not less than the latest
* revocation notification, then the function copies
- * the access vectors into a cache entry, updates
- * @aeref to refer to the entry, and returns %0.
- * Otherwise, this function returns -%EAGAIN.
+ * the access vectors into a cache entry, returns
+ * avc_node inserted. Otherwise, this function returns NULL.
*/
-int avc_insert(u32 ssid, u32 tsid, u16 tclass,
- struct avc_entry *ae, struct avc_entry_ref *aeref)
+struct avc_node *avc_insert(u32 ssid, u32 tsid, u16 tclass, struct avc_entry *ae)
{
- struct avc_node *node;
- int rc = 0;
+ struct avc_node *node = NULL;
+ int hvalue;
- if (ae->avd.seqno < avc_cache.latest_notif) {
- printk(KERN_WARNING "avc: seqno %d < latest_notif %d\n",
- ae->avd.seqno, avc_cache.latest_notif);
- rc = -EAGAIN;
+ if (avc_latest_notif_update(ae->avd.seqno, 1))
goto out;
- }
- node = avc_claim_node(ssid, tsid, tclass);
- if (!node) {
- rc = -ENOMEM;
- goto out;
- }
+ node = avc_get_node();
+
+ if (node) {
+ hvalue = avc_hash(ssid, tsid, tclass);
+
+ INIT_LIST_HEAD(&node->list);
+ INIT_RCU_HEAD(&node->rhead);
- node->ae.avd.allowed = ae->avd.allowed;
- node->ae.avd.decided = ae->avd.decided;
- node->ae.avd.auditallow = ae->avd.auditallow;
- node->ae.avd.auditdeny = ae->avd.auditdeny;
- node->ae.avd.seqno = ae->avd.seqno;
- aeref->ae = &node->ae;
+ node->ae.ssid = ssid;
+ node->ae.tsid = tsid;
+ node->ae.tclass = tclass;
+ atomic_set(&node->ae.used, 1);
+
+ node->ae.avd.allowed = ae->avd.allowed;
+ node->ae.avd.decided = ae->avd.decided;
+ node->ae.avd.auditallow = ae->avd.auditallow;
+ node->ae.avd.auditdeny = ae->avd.auditdeny;
+ node->ae.avd.seqno = ae->avd.seqno;
+
+ list_add_rcu(&node->list, &avc_cache.slots[hvalue]);
+ }
out:
- return rc;
+ return node;
}
static inline void avc_print_ipv6_addr(struct audit_buffer *ab,
struct in6_addr *addr, u16 port,
@@ -685,63 +684,99 @@
{
return (x == y || x == SECSID_WILD || y == SECSID_WILD);
}
-static inline void avc_update_node(u32 event, struct avc_node *node, u32 perms)
+static void avc_update_node(u32 event, u32 perms, u32 ssid, u32 tsid, u16 tclass)
{
- switch (event) {
- case AVC_CALLBACK_GRANT:
- node->ae.avd.allowed |= perms;
- break;
- case AVC_CALLBACK_TRY_REVOKE:
- case AVC_CALLBACK_REVOKE:
- node->ae.avd.allowed &= ~perms;
- break;
- case AVC_CALLBACK_AUDITALLOW_ENABLE:
- node->ae.avd.auditallow |= perms;
- break;
- case AVC_CALLBACK_AUDITALLOW_DISABLE:
- node->ae.avd.auditallow &= ~perms;
- break;
- case AVC_CALLBACK_AUDITDENY_ENABLE:
- node->ae.avd.auditdeny |= perms;
- break;
- case AVC_CALLBACK_AUDITDENY_DISABLE:
- node->ae.avd.auditdeny &= ~perms;
- break;
+ int hvalue;
+ struct list_head *pos;
+ struct avc_node *new, *node, *org = NULL;
+
+ new = kmalloc(sizeof(struct avc_node), GFP_ATOMIC);
+ hvalue = avc_hash(ssid, tsid, tclass);
+
+ /* Lock the target slot */
+ spin_lock(&avc_cache.slots_lock[hvalue]);
+ list_for_each(pos, &avc_cache.slots[hvalue]) {
+ node = list_entry(pos, struct avc_node, list);
+ if (ssid==node->ae.ssid &&
+ tsid==node->ae.tsid &&
+ tclass==node->ae.tclass ) {
+ org = node;
+ break;
+ }
+ }
+
+ if (org) {
+ if (!new) {
+ list_del_rcu(&org->list);
+ call_rcu(&org->rhead, avc_node_free);
+ goto out;
+ }
+
+ /* Duplicate and Update */
+ memcpy(new, org, sizeof(struct avc_node));
+
+ switch (event) {
+ case AVC_CALLBACK_GRANT:
+ new->ae.avd.allowed |= perms;
+ break;
+ case AVC_CALLBACK_TRY_REVOKE:
+ case AVC_CALLBACK_REVOKE:
+ new->ae.avd.allowed &= ~perms;
+ break;
+ case AVC_CALLBACK_AUDITALLOW_ENABLE:
+ new->ae.avd.auditallow |= perms;
+ break;
+ case AVC_CALLBACK_AUDITALLOW_DISABLE:
+ new->ae.avd.auditallow &= ~perms;
+ break;
+ case AVC_CALLBACK_AUDITDENY_ENABLE:
+ new->ae.avd.auditdeny |= perms;
+ break;
+ case AVC_CALLBACK_AUDITDENY_DISABLE:
+ new->ae.avd.auditdeny &= ~perms;
+ break;
+ }
+
+ list_replace_rcu(&org->list,&new->list);
+ call_rcu(&org->rhead, avc_node_free);
}
+out:
+ spin_unlock(&avc_cache.slots_lock[hvalue]);
}
static int avc_update_cache(u32 event, u32 ssid, u32 tsid,
u16 tclass, u32 perms)
{
+ struct list_head *pos;
struct avc_node *node;
int i;
- unsigned long flags;
- spin_lock_irqsave(&avc_lock,flags);
+ rcu_read_lock();
if (ssid == SECSID_WILD || tsid == SECSID_WILD) {
/* apply to all matching nodes */
for (i = 0; i < AVC_CACHE_SLOTS; i++) {
- for (node = avc_cache.slots[i]; node;
- node = node->next) {
+ list_for_each_rcu(pos, &avc_cache.slots[i] ) {
+ node = list_entry(pos, struct avc_node, list);
if (avc_sidcmp(ssid, node->ae.ssid) &&
avc_sidcmp(tsid, node->ae.tsid) &&
- tclass == node->ae.tclass) {
- avc_update_node(event,node,perms);
+ tclass == node->ae.tclass ) {
+ avc_update_node(event, perms, node->ae.ssid
+ ,node->ae.tsid, node->ae.tclass);
}
}
}
} else {
/* apply to one node */
node = avc_search_node(ssid, tsid, tclass, NULL);
if (node) {
- avc_update_node(event,node,perms);
+ avc_update_node(event, perms, ssid, tsid, tclass);
}
}
- spin_unlock_irqrestore(&avc_lock,flags);
+ rcu_read_unlock();
return 0;
}
@@ -751,9 +786,8 @@
{
struct avc_callback_node *c;
u32 tretained = 0, cretained = 0;
int rc = 0;
- unsigned long flags;
/*
* try_revoke only removes permissions from the cache
* state if they are not retained by the object manager.
@@ -786,12 +820,9 @@
avc_update_cache(event,ssid,tsid,tclass,perms);
*out_retained = tretained;
}
- spin_lock_irqsave(&avc_lock,flags);
- if (seqno > avc_cache.latest_notif)
- avc_cache.latest_notif = seqno;
- spin_unlock_irqrestore(&avc_lock,flags);
+ avc_latest_notif_update(seqno, 0);
out:
return rc;
}
@@ -856,34 +887,24 @@
int avc_ss_reset(u32 seqno)
{
struct avc_callback_node *c;
int i, rc = 0;
- struct avc_node *node, *tmp;
- unsigned long flags;
+ struct list_head *pos;
+ struct avc_node *node;
avc_hash_eval("reset");
- spin_lock_irqsave(&avc_lock,flags);
+ rcu_read_lock();
for (i = 0; i < AVC_CACHE_SLOTS; i++) {
- node = avc_cache.slots[i];
- while (node) {
- tmp = node;
- node = node->next;
- tmp->ae.ssid = tmp->ae.tsid = SECSID_NULL;
- tmp->ae.tclass = SECCLASS_NULL;
- tmp->ae.avd.allowed = tmp->ae.avd.decided = 0;
- tmp->ae.avd.auditallow = tmp->ae.avd.auditdeny = 0;
- tmp->ae.used = 0;
- tmp->next = avc_node_freelist;
- avc_node_freelist = tmp;
- avc_cache.active_nodes--;
+ list_for_each_rcu(pos, &avc_cache.slots[i]) {
+ node = list_entry(pos, struct avc_node, list);
+ list_del_rcu(&node->list);
+ call_rcu(&node->rhead, avc_node_free);
}
- avc_cache.slots[i] = NULL;
}
- avc_cache.lru_hint = 0;
- spin_unlock_irqrestore(&avc_lock,flags);
+ rcu_read_unlock();
for (i = 0; i < AVC_NSTATS; i++)
avc_cache_stats[i] = 0;
@@ -895,12 +916,9 @@
goto out;
}
}
- spin_lock_irqsave(&avc_lock,flags);
- if (seqno > avc_cache.latest_notif)
- avc_cache.latest_notif = seqno;
- spin_unlock_irqrestore(&avc_lock,flags);
+ avc_latest_notif_update(seqno, 0);
out:
return rc;
}
@@ -949,9 +967,9 @@
* @ssid: source security identifier
* @tsid: target security identifier
* @tclass: target security class
* @requested: requested permissions, interpreted based on @tclass
- * @aeref: AVC entry reference
+ * @aeref: AVC entry reference(not in use)
* @avd: access vector decisions
*
* Check the AVC to determine whether the @requested permissions are granted
* for the SID pair (@ssid, @tsid), interpreting the permissions
@@ -968,72 +986,45 @@
int avc_has_perm_noaudit(u32 ssid, u32 tsid,
u16 tclass, u32 requested,
struct avc_entry_ref *aeref, struct av_decision *avd)
{
- struct avc_entry *ae;
- int rc = 0;
- unsigned long flags;
+ struct avc_node *node;
struct avc_entry entry;
+ int rc = 0;
u32 denied;
- struct avc_entry_ref ref;
- if (!aeref) {
- avc_entry_ref_init(&ref);
- aeref = &ref;
- }
-
- spin_lock_irqsave(&avc_lock, flags);
+ rcu_read_lock();
avc_cache_stats_incr(AVC_ENTRY_LOOKUPS);
- ae = aeref->ae;
- if (ae) {
- if (ae->ssid == ssid &&
- ae->tsid == tsid &&
- ae->tclass == tclass &&
- ((ae->avd.decided & requested) == requested)) {
- avc_cache_stats_incr(AVC_ENTRY_HITS);
- ae->used = 1;
- } else {
- avc_cache_stats_incr(AVC_ENTRY_DISCARDS);
- ae = NULL;
- }
- }
- if (!ae) {
- avc_cache_stats_incr(AVC_ENTRY_MISSES);
- rc = avc_lookup(ssid, tsid, tclass, requested, aeref);
- if (rc) {
- spin_unlock_irqrestore(&avc_lock,flags);
- rc = security_compute_av(ssid,tsid,tclass,requested,&entry.avd);
- if (rc)
- goto out;
- spin_lock_irqsave(&avc_lock, flags);
- rc = avc_insert(ssid,tsid,tclass,&entry,aeref);
- if (rc) {
- spin_unlock_irqrestore(&avc_lock,flags);
- goto out;
- }
+ node = avc_lookup(ssid, tsid, tclass, requested);
+ if (!node) {
+ rcu_read_unlock();
+ rc = security_compute_av(ssid,tsid,tclass,requested,&entry.avd);
+ if (rc)
+ goto out;
+ rcu_read_lock();
+ node = avc_insert(ssid,tsid,tclass,&entry);
+ if (!node) {
+ rcu_read_unlock();
+ goto out;
}
- ae = aeref->ae;
}
-
if (avd)
- memcpy(avd, &ae->avd, sizeof(*avd));
+ memcpy(avd, &node->ae.avd, sizeof(*avd));
- denied = requested & ~(ae->avd.allowed);
+ denied = requested & ~(node->ae.avd.allowed);
if (!requested || denied) {
if (selinux_enforcing) {
- spin_unlock_irqrestore(&avc_lock,flags);
rc = -EACCES;
- goto out;
} else {
- ae->avd.allowed |= requested;
- spin_unlock_irqrestore(&avc_lock,flags);
- goto out;
+ if (node->ae.avd.allowed != (node->ae.avd.allowed|requested))
+ avc_update_node(AVC_CALLBACK_GRANT
+ ,requested,ssid,tsid,tclass);
}
}
- spin_unlock_irqrestore(&avc_lock,flags);
+ rcu_read_unlock();
out:
return rc;
}
@@ -1042,9 +1033,9 @@
* @ssid: source security identifier
* @tsid: target security identifier
* @tclass: target security class
* @requested: requested permissions, interpreted based on @tclass
- * @aeref: AVC entry reference
+ * @aeref: AVC entry reference(not in use)
* @auditdata: auxiliary audit data
*
* Check the AVC to determine whether the @requested permissions are granted
* for the SID pair (@ssid, @tsid), interpreting the permissions
@@ -1061,8 +1052,8 @@
{
struct av_decision avd;
int rc;
- rc = avc_has_perm_noaudit(ssid, tsid, tclass, requested, aeref, &avd);
+ rc = avc_has_perm_noaudit(ssid, tsid, tclass, requested, NULL, &avd);
avc_audit(ssid, tsid, tclass, requested, &avd, rc, auditdata);
return rc;
}
diff -rNU4 linux-2.6.8.1/security/selinux/include/avc.h linux-2.6.8.1.rcu/security/selinux/include/avc.h
--- linux-2.6.8.1/security/selinux/include/avc.h 2004-08-14 19:54:51.000000000 +0900
+++ linux-2.6.8.1.rcu/security/selinux/include/avc.h 2004-08-20 20:40:50.000000000 +0900
@@ -118,13 +118,11 @@
*/
void __init avc_init(void);
-int avc_lookup(u32 ssid, u32 tsid, u16 tclass,
- u32 requested, struct avc_entry_ref *aeref);
+struct avc_node *avc_lookup(u32 ssid, u32 tsid, u16 tclass, u32 requested);
-int avc_insert(u32 ssid, u32 tsid, u16 tclass,
- struct avc_entry *ae, struct avc_entry_ref *out_aeref);
+struct avc_node *avc_insert(u32 ssid, u32 tsid, u16 tclass, struct avc_entry *ae);
void avc_audit(u32 ssid, u32 tsid,
u16 tclass, u32 requested,
struct av_decision *avd, int result, struct avc_audit_data *auditdata);
[-- Attachment #3: list_replace_rcu-2.6.8.1.patch --]
[-- Type: application/octet-stream, Size: 845 bytes --]
--- linux-2.6.8.1/include/linux/list.h 2004-08-14 19:55:33.000000000 +0900
+++ linux-2.6.8.1.rcu/include/linux/list.h 2004-08-20 18:04:10.000000000 +0900
@@ -194,8 +194,23 @@
__list_del(entry->prev, entry->next);
entry->prev = LIST_POISON2;
}
+/*
+ * list_replace_rcu - replace an old entry with a new one in a list
+ * @old : the element to be replaced from the list.
+ * @new : the new element to insert into the list.
+ *
+ * The old entry will be replaced with the new entry atomically.
+ */
+static inline void list_replace_rcu(struct list_head *old, struct list_head *new){
+ new->next = old->next;
+ new->prev = old->prev;
+ smp_wmb();
+ new->next->prev = new;
+ new->prev->next = new;
+}
+
/**
* list_del_init - deletes entry from list and reinitialize it.
* @entry: the element to delete from the list.
*/
[-- Attachment #4: rcutest.c --]
[-- Type: application/octet-stream, Size: 1478 bytes --]
/* -----------------------------------------
usage:
% gcc -O3 rcutest.c -o rcutest
% N=`grep -c processor /proc/cpuinfo`
% ./rcutest ${N}
Copyright: KaiGai <kaigai@ak.jp.nec.com>
----------------------------------------- */
#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/wait.h>
#include <fcntl.h>
#include <unistd.h>
#define LOOP (5000000)

void child_shmfile(void)
{
	int i, fd;
	char buf[100], fname[100];

	sprintf(fname, "/dev/shm/file.%d", getpid());
	/* O_CREAT requires a mode argument */
	fd = open(fname, O_RDWR|O_CREAT|O_TRUNC, 0644);
	if (fd < 0) {
		printf("NOTE: Process(%d) can not open %s\n", getpid(), fname);
		return;
	}
	for (i = 0; i < LOOP; i++) {
		write(fd, buf, 1);
		lseek(fd, 0, SEEK_SET);
	}
	close(fd);
	unlink(fname);
}

int main(int argc, char *argv[])
{
	int i, j, cproc, nproc;
	int sec, usec;
	struct timeval stm, etm;

	if (argc != 2)
		return 1;
	nproc = atoi(argv[1]);
	gettimeofday(&stm, NULL);
	for (i = 0, cproc = 0; i < nproc; i++) {
		j = fork();
		if (j == 0) {
			child_shmfile();
			exit(0);
		} else if (j > 0) {
			cproc++;
		}
	}
	/* wait for all children, not just the first one to exit */
	while (wait(NULL) > 0)
		;
	gettimeofday(&etm, NULL);
	sec = etm.tv_sec - stm.tv_sec;
	usec = etm.tv_usec - stm.tv_usec;
	if (usec < 0) {
		sec--;
		usec += 1000000;
	}
	/* divisor fixed: was 10000000.0, which understated the time 10x */
	printf("[%d Proc x %d Loop] Time: %.4f\n", cproc, LOOP,
	       (double)sec + (double)usec / 1000000.0);
	return 0;
}
* Re: RCU issue with SELinux (Re: SELINUX performance issues)
2004-08-20 13:36 ` Kaigai Kohei
@ 2004-08-20 14:53 ` James Morris
2004-08-24 7:27 ` Kaigai Kohei
2004-08-20 17:31 ` RCU issue with SELinux (Re: SELINUX performance issues) Luke Kenneth Casson Leighton
` (2 subsequent siblings)
3 siblings, 1 reply; 31+ messages in thread
From: James Morris @ 2004-08-20 14:53 UTC (permalink / raw)
To: Kaigai Kohei; +Cc: Stephen Smalley, SELinux-ML(Eng), Linux Kernel ML(Eng)
On Fri, 20 Aug 2004, Kaigai Kohei wrote:
> The attached patches against the 2.6.8.1 kernel improve the performance
> and scalability of SELinux using an RCU approach.
> --- 4CPU --- --- 8CPU --- --- 16CPU --- --- 32CPU ---
> 2.6.8.1(disable) 8.0158[s] 8.0183[s] 8.0146[s] 8.0037[s]
> (2.08/ 6.07) (1.86/6.31) (0.78/ 7.33) (2.03/5.94)
> 2.6.8.1(enable) 78.0957[s] 319.0451[s] 1322.0313[s] too long
> (2.47/76.48) (2.47/316.96) (2.43/1319.8) (---/---)
> 2.6.8.1.rwlock 20.0100[s] 49.0390[s] 90.0200[s] 177.0533[s]
> (2.57/17.52) (2.45/46.93) (2.37/87.78) (2.41/175.1)
> 2.6.8.1.rcu 11.0464[s] 11.0205[s] 11.0372[s] 11.0496[s]
> (2.64/8.82) (2.21/8.98) (2.67/ 8.68) (2.51/8.95)
> -------------------------------------------------------------------------
Do you have figures for 1 and 2 CPU?
Also, can you run some more benchmarks, e.g. lmbench, unixbench, dbench
etc?
- James
--
James Morris
<jmorris@redhat.com>
* Re: RCU issue with SELinux (Re: SELINUX performance issues)
2004-08-20 13:36 ` Kaigai Kohei
2004-08-20 14:53 ` James Morris
@ 2004-08-20 17:31 ` Luke Kenneth Casson Leighton
2004-08-20 18:15 ` James Morris
2004-08-20 20:19 ` Paul E. McKenney
[not found] ` <1093014789.16585.186.camel@moss-spartans.epoch.ncsc.mil>
3 siblings, 1 reply; 31+ messages in thread
From: Luke Kenneth Casson Leighton @ 2004-08-20 17:31 UTC (permalink / raw)
To: Kaigai Kohei
Cc: James Morris, Stephen Smalley, SELinux-ML(Eng), Linux Kernel ML(Eng)
On Fri, Aug 20, 2004 at 10:36:03PM +0900, Kaigai Kohei wrote:
> Hello, everyone.
>
> Tuesday, August 17, 2004 12:19 AM
> James Morris wrote:
> > > Is the approach of removing direct references to AVC entries acceptable?
> > >
> > > I'll try to consider this issue further.
> >
> > Sure, if you can make it work without problems.
>
> The attached patches against the 2.6.8.1 kernel improve the performance
> and scalability of SELinux using an RCU approach.
>
> The evaluation results are as follows:
> <Environment>
> CPU: Itanium2(1GHz) x 4/8/16/32
> Memory: enough (no swap)
> OS: 2.6.8.1 (SELinux disabled by 'selinux=0' boot option)
> 2.6.8.1 (SELinux enabled)
> 2.6.8.1 + rwlock patch by KaiGai
> 2.6.8.1 + RCU patch by KaiGai
>
> The test program iterates write() to files on tmpfs 500,000 times in parallel.
^^^^^
i presume that that is without the xattr patch which adds extended
attributes to tmpfs?
(see http://hands.com/~lkcl/selinux)
l.
* Re: RCU issue with SELinux (Re: SELINUX performance issues)
2004-08-20 17:31 ` RCU issue with SELinux (Re: SELINUX performance issues) Luke Kenneth Casson Leighton
@ 2004-08-20 18:15 ` James Morris
0 siblings, 0 replies; 31+ messages in thread
From: James Morris @ 2004-08-20 18:15 UTC (permalink / raw)
To: Luke Kenneth Casson Leighton
Cc: Kaigai Kohei, Stephen Smalley, SELinux-ML(Eng), Linux Kernel ML(Eng)
On Fri, 20 Aug 2004, Luke Kenneth Casson Leighton wrote:
> i presume that that is without the xattr patch which adds extended
> attributes to tmpfs?
Yes.
--
James Morris
<jmorris@redhat.com>
* Re: RCU issue with SELinux (Re: SELINUX performance issues)
2004-08-20 13:36 ` Kaigai Kohei
2004-08-20 14:53 ` James Morris
2004-08-20 17:31 ` RCU issue with SELinux (Re: SELINUX performance issues) Luke Kenneth Casson Leighton
@ 2004-08-20 20:19 ` Paul E. McKenney
2004-08-20 20:35 ` James Morris
2004-08-24 7:27 ` Kaigai Kohei
[not found] ` <1093014789.16585.186.camel@moss-spartans.epoch.ncsc.mil>
3 siblings, 2 replies; 31+ messages in thread
From: Paul E. McKenney @ 2004-08-20 20:19 UTC (permalink / raw)
To: Kaigai Kohei
Cc: James Morris, Stephen Smalley, SELinux-ML(Eng), Linux Kernel ML(Eng)
On Fri, Aug 20, 2004 at 10:36:03PM +0900, Kaigai Kohei wrote:
> Hello, everyone.
>
> Tuesday, August 17, 2004 12:19 AM
> James Morris wrote:
> > > Is the approach of removing direct references to AVC entries acceptable?
> > >
> > > I'll try to consider this issue further.
> >
> > Sure, if you can make it work without problems.
>
> The attached patches against the 2.6.8.1 kernel improve the performance
> and the scalability of SELinux by RCU-approach.
>
> The evaluation results are as follows:
> <Environment>
> CPU: Itanium2(1GHz) x 4/8/16/32
> Memory: enough (no swap)
> OS: 2.6.8.1 (SELinux disabled by 'selinux=0' boot option)
> 2.6.8.1 (SELinux enabled)
> 2.6.8.1 + rwlock patch by KaiGai
> 2.6.8.1 + RCU patch by KaiGai
>
> The test program iterates write() to files on tmpfs 500,000 times in parallel.
> It is attached as 'rcutest.c'.
>
> We observed performance improvement and perfect scalability.
>
> [rcutest Loop=500000 Parallel=<Num of CPU>]
> - upper column is the RealTime [second].
> - lefthand of lower column is the UserTime, righthand is the SysTime.
> --- 4CPU --- --- 8CPU --- --- 16CPU --- --- 32CPU ---
> 2.6.8.1(disable) 8.0158[s] 8.0183[s] 8.0146[s] 8.0037[s]
> (2.08/ 6.07) (1.86/6.31) (0.78/ 7.33) (2.03/5.94)
> 2.6.8.1(enable) 78.0957[s] 319.0451[s] 1322.0313[s] too long
> (2.47/76.48) (2.47/316.96) (2.43/1319.8) (---/---)
> 2.6.8.1.rwlock 20.0100[s] 49.0390[s] 90.0200[s] 177.0533[s]
> (2.57/17.52) (2.45/46.93) (2.37/87.78) (2.41/175.1)
> 2.6.8.1.rcu 11.0464[s] 11.0205[s] 11.0372[s] 11.0496[s]
> (2.64/8.82) (2.21/8.98) (2.67/ 8.68) (2.51/8.95)
> -------------------------------------------------------------------------
Nice performance increase and great scalability! I echo an earlier
email asking for 1-CPU and 2-CPU results.
> These patches achieve lock-free read access to AVC cache.
>
> Significant changes are as follows:
> - Direct references by avc_entry_ref were removed for RCU.
> - Some structures(avc_entry/avc_node/avc_cache) were changed for RCU.
> - avc_node type objects are allocated dynamically, not statically.
> avc_reclaim_node() reclaims some avc_node objects when the number of
> objects exceeds AVC_CACHE_THRESHOLD.
> - For updating av_decision structure, I came up with a new RCU primitive,
> list_replace_rcu() in include/linux/list.h .
> It is needed to switch the linked pointer atomically, and replace
> an old object with a new one. list_replace_rcu() is used in avc_update_node()
> to update av_decision structure, and an original object is duplicated and updated.
> We now pay a higher cost for updating the av_decision structure, as the
> price of lock-free read access. But, in my opinion, updates to av_decision are rare.
>
> NOTE: The fifth arguments of avc_has_perm() and avc_has_perm_noaudit()
> are not necessary. But the prototype declarations and all the callers
> remain, since I want to avoid messing up the patches.
>
> Any comments?
> ----------
> KaiGai <kaigai@ak.jp.nec.com>
The list_replace_rcu() primitive looks good. At last, a primitive that
reflects the reason for RCU's name! ;-)
A few comments interspersed for the larger patch (look for empty lines).
I am missing why some of the list insertions and deletions are SMP safe.
Thanx, Paul
> diff -rNU4 linux-2.6.8.1/security/selinux/avc.c linux-2.6.8.1.rcu/security/selinux/avc.c
> --- linux-2.6.8.1/security/selinux/avc.c 2004-08-14 19:55:48.000000000 +0900
> +++ linux-2.6.8.1.rcu/security/selinux/avc.c 2004-08-20 20:38:06.000000000 +0900
> @@ -35,28 +35,31 @@
> #include "av_perm_to_string.h"
> #include "objsec.h"
>
> #define AVC_CACHE_SLOTS 512
> -#define AVC_CACHE_MAXNODES 410
> +#define AVC_CACHE_THRESHOLD 410
> +#define AVC_CACHE_RECLAIM 16
>
> struct avc_entry {
> u32 ssid;
> u32 tsid;
> u16 tclass;
> struct av_decision avd;
> - int used; /* used recently */
> + atomic_t used; /* used recently */
> };
>
> struct avc_node {
> + struct list_head list;
> + struct rcu_head rhead;
> struct avc_entry ae;
> - struct avc_node *next;
> };
>
> struct avc_cache {
> - struct avc_node *slots[AVC_CACHE_SLOTS];
> - u32 lru_hint; /* LRU hint for reclaim scan */
> - u32 active_nodes;
> - u32 latest_notif; /* latest revocation notification */
> + struct list_head slots[AVC_CACHE_SLOTS];
> + spinlock_t slots_lock[AVC_CACHE_SLOTS]; /* Lock for RCU update */
> + atomic_t lru_hint; /* LRU hint for reclaim scan */
> + atomic_t active_nodes;
> + u32 latest_notif; /* latest revocation notification */
> };
>
> struct avc_callback_node {
> int (*callback) (u32 event, u32 ssid, u32 tsid,
> @@ -69,10 +72,8 @@
> u32 perms;
> struct avc_callback_node *next;
> };
>
> -static spinlock_t avc_lock = SPIN_LOCK_UNLOCKED;
> -static struct avc_node *avc_node_freelist;
> static struct avc_cache avc_cache;
> static unsigned avc_cache_stats[AVC_NSTATS];
> static struct avc_callback_node *avc_callbacks;
>
> @@ -187,52 +188,44 @@
> * Initialize the access vector cache.
> */
> void __init avc_init(void)
> {
> - struct avc_node *new;
> int i;
>
> - for (i = 0; i < AVC_CACHE_MAXNODES; i++) {
> - new = kmalloc(sizeof(*new), GFP_ATOMIC);
> - if (!new) {
> - printk(KERN_WARNING "avc: only able to allocate "
> - "%d entries\n", i);
> - break;
> - }
> - memset(new, 0, sizeof(*new));
> - new->next = avc_node_freelist;
> - avc_node_freelist = new;
> - }
> -
> + for (i =0; i < AVC_CACHE_SLOTS; i++) {
> + INIT_LIST_HEAD(&avc_cache.slots[i]);
> + avc_cache.slots_lock[i] = SPIN_LOCK_UNLOCKED;
> + }
> + atomic_set(&avc_cache.active_nodes, 0);
> + atomic_set(&avc_cache.lru_hint, 0);
> +
> audit_log(current->audit_context, "AVC INITIALIZED\n");
> }
>
> #if 0
> static void avc_hash_eval(char *tag)
> {
> int i, chain_len, max_chain_len, slots_used;
> struct avc_node *node;
> + struct list_head *pos;
> unsigned long flags;
>
> - spin_lock_irqsave(&avc_lock,flags);
> + rcu_read_lock();
>
> slots_used = 0;
> max_chain_len = 0;
> for (i = 0; i < AVC_CACHE_SLOTS; i++) {
> - node = avc_cache.slots[i];
> - if (node) {
> + if (!list_empty(&avc_cache.slots[i])) {
> slots_used++;
> chain_len = 0;
> - while (node) {
> + list_for_each_rcu(pos, &avc_cache.slots[i])
> chain_len++;
> - node = node->next;
> - }
> if (chain_len > max_chain_len)
> max_chain_len = chain_len;
> }
> }
>
> - spin_unlock_irqrestore(&avc_lock,flags);
> + rcu_read_unlock();
>
> printk(KERN_INFO "\n");
> printk(KERN_INFO "%s avc: %d entries and %d/%d buckets used, longest "
> "chain length %d\n", tag, avc_cache.active_nodes, slots_used,
> @@ -242,188 +235,194 @@
> static inline void avc_hash_eval(char *tag)
> { }
> #endif
>
> -static inline struct avc_node *avc_reclaim_node(void)
> -{
> - struct avc_node *prev, *cur;
> - int hvalue, try;
> +static void avc_node_free(struct rcu_head *rhead) {
> + struct avc_node *node;
> + node = container_of(rhead, struct avc_node, rhead);
> + kfree(node);
> +}
>
> - hvalue = avc_cache.lru_hint;
> - for (try = 0; try < 2; try++) {
> - do {
> - prev = NULL;
> - cur = avc_cache.slots[hvalue];
> - while (cur) {
> - if (!cur->ae.used)
> - goto found;
> +static inline void avc_reclaim_node(void)
> +{
> + struct avc_node *node;
> + struct list_head *pos;
> + int hvalue, try, ecx;
>
> - cur->ae.used = 0;
> + for (try = 0, ecx = 0; try < AVC_CACHE_SLOTS; try++ ) {
> + hvalue = atomic_inc_return(&avc_cache.lru_hint) % AVC_CACHE_SLOTS;
>
> - prev = cur;
> - cur = cur->next;
> + if (spin_trylock(&avc_cache.slots_lock[hvalue])) {
> + list_for_each_rcu(pos, &avc_cache.slots[hvalue]) {
Since we are holding the lock, the list cannot change, and the _rcu()
is not needed. Right?
> + node = list_entry(pos, struct avc_node, list);
Why not just do:
list_for_each_entry(pos, &avc_cache.slots[hvalue], list) {
rather than having the separate list_entry().
> + if (!atomic_dec_and_test(&node->ae.used)) {
> + /* Recently Unused */
> + list_del_rcu(&node->list);
> + call_rcu(&node->rhead, avc_node_free);
> + atomic_dec(&avc_cache.active_nodes);
> + ecx++;
> + if (ecx >= AVC_CACHE_RECLAIM) {
> + spin_unlock(&avc_cache.slots_lock[hvalue]);
> + goto out;
> + }
> + }
> }
> - hvalue = (hvalue + 1) & (AVC_CACHE_SLOTS - 1);
> - } while (hvalue != avc_cache.lru_hint);
> + spin_unlock(&avc_cache.slots_lock[hvalue]);
> + }
> }
> -
> - panic("avc_reclaim_node");
> -
> -found:
> - avc_cache.lru_hint = hvalue;
> -
> - if (prev == NULL)
> - avc_cache.slots[hvalue] = cur->next;
> - else
> - prev->next = cur->next;
> -
> - return cur;
> +out:
> + return;
> }
>
> -static inline struct avc_node *avc_claim_node(u32 ssid,
> - u32 tsid, u16 tclass)
> +static inline struct avc_node *avc_get_node(void)
> {
> struct avc_node *new;
> - int hvalue;
> + int actives;
>
> - hvalue = avc_hash(ssid, tsid, tclass);
> - if (avc_node_freelist) {
> - new = avc_node_freelist;
> - avc_node_freelist = avc_node_freelist->next;
> - avc_cache.active_nodes++;
> - } else {
> - new = avc_reclaim_node();
> - if (!new)
> - goto out;
> + new = kmalloc(sizeof(struct avc_node), GFP_ATOMIC);
> + if (new) {
> + actives = atomic_inc_return(&avc_cache.active_nodes);
> + if (actives > AVC_CACHE_THRESHOLD)
> + avc_reclaim_node();
> }
>
> - new->ae.used = 1;
> - new->ae.ssid = ssid;
> - new->ae.tsid = tsid;
> - new->ae.tclass = tclass;
> - new->next = avc_cache.slots[hvalue];
> - avc_cache.slots[hvalue] = new;
> -
> -out:
> return new;
> }
>
> static inline struct avc_node *avc_search_node(u32 ssid, u32 tsid,
> u16 tclass, int *probes)
> {
> - struct avc_node *cur;
> + struct list_head *pos;
> + struct avc_node *node, *ret = NULL;
> int hvalue;
> int tprobes = 1;
>
> hvalue = avc_hash(ssid, tsid, tclass);
> - cur = avc_cache.slots[hvalue];
> - while (cur != NULL &&
> - (ssid != cur->ae.ssid ||
> - tclass != cur->ae.tclass ||
> - tsid != cur->ae.tsid)) {
> + list_for_each_rcu(pos, &avc_cache.slots[hvalue]) {
> + node = list_entry(pos, struct avc_node, list);
Why not just do:
list_for_each_entry_rcu(pos, &avc_cache.slots[hvalue], list) {
rather than having the separate list_entry().
> + if (ssid == node->ae.ssid &&
> + tclass == node->ae.tclass &&
> + tsid == node->ae.tsid) {
> + ret = node;
> + break;
> + }
> tprobes++;
> - cur = cur->next;
> }
>
> - if (cur == NULL) {
> + if (ret == NULL) {
> /* cache miss */
> goto out;
> }
>
> /* cache hit */
> if (probes)
> *probes = tprobes;
> -
> - cur->ae.used = 1;
> -
> + if (atomic_read(&ret->ae.used) != 1)
> + atomic_set(&ret->ae.used, 1);
> out:
> - return cur;
> + return ret;
> }
>
> /**
> * avc_lookup - Look up an AVC entry.
> * @ssid: source security identifier
> * @tsid: target security identifier
> * @tclass: target security class
> * @requested: requested permissions, interpreted based on @tclass
> - * @aeref: AVC entry reference
> *
> * Look up an AVC entry that is valid for the
> * @requested permissions between the SID pair
> * (@ssid, @tsid), interpreting the permissions
> * based on @tclass. If a valid AVC entry exists,
> - * then this function updates @aeref to refer to the
> - * entry and returns %0. Otherwise, this function
> - * returns -%ENOENT.
> + * then this function returns the avc_node.
> + * Otherwise, this function returns NULL.
> */
> -int avc_lookup(u32 ssid, u32 tsid, u16 tclass,
> - u32 requested, struct avc_entry_ref *aeref)
> +struct avc_node *avc_lookup(u32 ssid, u32 tsid, u16 tclass, u32 requested)
> {
> struct avc_node *node;
> - int probes, rc = 0;
> + int probes;
>
> avc_cache_stats_incr(AVC_CAV_LOOKUPS);
> node = avc_search_node(ssid, tsid, tclass,&probes);
>
> if (node && ((node->ae.avd.decided & requested) == requested)) {
> avc_cache_stats_incr(AVC_CAV_HITS);
> avc_cache_stats_add(AVC_CAV_PROBES,probes);
> - aeref->ae = &node->ae;
> goto out;
> }
>
> avc_cache_stats_incr(AVC_CAV_MISSES);
> - rc = -ENOENT;
> out:
> - return rc;
> + return node;
> +}
> +
> +static int avc_latest_notif_update(int seqno, int is_insert)
> +{
> + int ret = 0;
> + static spinlock_t notif_lock = SPIN_LOCK_UNLOCKED;
> +
> + spin_lock(&notif_lock);
> + if (seqno < avc_cache.latest_notif) {
> + if (is_insert) {
> + printk(KERN_WARNING "avc: seqno %d < latest_notif %d\n",
> + seqno, avc_cache.latest_notif);
> + ret = -EAGAIN;
> + } else {
> + avc_cache.latest_notif = seqno;
> + }
> + }
> + spin_unlock(&notif_lock);
> + return ret;
> }
>
> /**
> * avc_insert - Insert an AVC entry.
> * @ssid: source security identifier
> * @tsid: target security identifier
> * @tclass: target security class
> * @ae: AVC entry
> - * @aeref: AVC entry reference
> *
> * Insert an AVC entry for the SID pair
> * (@ssid, @tsid) and class @tclass.
> * The access vectors and the sequence number are
> * normally provided by the security server in
> * response to a security_compute_av() call. If the
> * sequence number @ae->avd.seqno is not less than the latest
> * revocation notification, then the function copies
> - * the access vectors into a cache entry, updates
> - * @aeref to refer to the entry, and returns %0.
> - * Otherwise, this function returns -%EAGAIN.
> + * the access vectors into a cache entry and returns the
> + * inserted avc_node. Otherwise, this function returns NULL.
> */
> -int avc_insert(u32 ssid, u32 tsid, u16 tclass,
> - struct avc_entry *ae, struct avc_entry_ref *aeref)
> +struct avc_node *avc_insert(u32 ssid, u32 tsid, u16 tclass, struct avc_entry *ae)
> {
> - struct avc_node *node;
> - int rc = 0;
> + struct avc_node *node = NULL;
> + int hvalue;
>
> - if (ae->avd.seqno < avc_cache.latest_notif) {
> - printk(KERN_WARNING "avc: seqno %d < latest_notif %d\n",
> - ae->avd.seqno, avc_cache.latest_notif);
> - rc = -EAGAIN;
> + if (avc_latest_notif_update(ae->avd.seqno, 1))
> goto out;
> - }
>
> - node = avc_claim_node(ssid, tsid, tclass);
> - if (!node) {
> - rc = -ENOMEM;
> - goto out;
> - }
> + node = avc_get_node();
> +
> + if (node) {
> + hvalue = avc_hash(ssid, tsid, tclass);
> +
> + INIT_LIST_HEAD(&node->list);
> + INIT_RCU_HEAD(&node->rhead);
>
> - node->ae.avd.allowed = ae->avd.allowed;
> - node->ae.avd.decided = ae->avd.decided;
> - node->ae.avd.auditallow = ae->avd.auditallow;
> - node->ae.avd.auditdeny = ae->avd.auditdeny;
> - node->ae.avd.seqno = ae->avd.seqno;
> - aeref->ae = &node->ae;
> + node->ae.ssid = ssid;
> + node->ae.tsid = tsid;
> + node->ae.tclass = tclass;
> + atomic_set(&node->ae.used, 1);
> +
> + node->ae.avd.allowed = ae->avd.allowed;
> + node->ae.avd.decided = ae->avd.decided;
> + node->ae.avd.auditallow = ae->avd.auditallow;
> + node->ae.avd.auditdeny = ae->avd.auditdeny;
> + node->ae.avd.seqno = ae->avd.seqno;
> +
> + list_add_rcu(&node->list, &avc_cache.slots[hvalue]);
> + }
> out:
> - return rc;
> + return node;
> }
>
> static inline void avc_print_ipv6_addr(struct audit_buffer *ab,
> struct in6_addr *addr, u16 port,
> @@ -685,63 +684,99 @@
> {
> return (x == y || x == SECSID_WILD || y == SECSID_WILD);
> }
>
> -static inline void avc_update_node(u32 event, struct avc_node *node, u32 perms)
> +static void avc_update_node(u32 event, u32 perms, u32 ssid, u32 tsid, u16 tclass)
> {
> - switch (event) {
> - case AVC_CALLBACK_GRANT:
> - node->ae.avd.allowed |= perms;
> - break;
> - case AVC_CALLBACK_TRY_REVOKE:
> - case AVC_CALLBACK_REVOKE:
> - node->ae.avd.allowed &= ~perms;
> - break;
> - case AVC_CALLBACK_AUDITALLOW_ENABLE:
> - node->ae.avd.auditallow |= perms;
> - break;
> - case AVC_CALLBACK_AUDITALLOW_DISABLE:
> - node->ae.avd.auditallow &= ~perms;
> - break;
> - case AVC_CALLBACK_AUDITDENY_ENABLE:
> - node->ae.avd.auditdeny |= perms;
> - break;
> - case AVC_CALLBACK_AUDITDENY_DISABLE:
> - node->ae.avd.auditdeny &= ~perms;
> - break;
> + int hvalue;
> + struct list_head *pos;
> + struct avc_node *new, *node, *org = NULL;
> +
> + new = kmalloc(sizeof(struct avc_node), GFP_ATOMIC);
> + hvalue = avc_hash(ssid, tsid, tclass);
> +
> + /* Lock the target slot */
> + spin_lock(&avc_cache.slots_lock[hvalue]);
> + list_for_each(pos, &avc_cache.slots[hvalue]) {
> + node = list_entry(pos, struct avc_node, list);
Why not just do:
list_for_each_entry_rcu(pos, &avc_cache.slots[hvalue], list) {
rather than having the separate list_entry().
> + if (ssid==node->ae.ssid &&
> + tsid==node->ae.tsid &&
> + tclass==node->ae.tclass ) {
> + org = node;
> + break;
> + }
> + }
> +
> + if (org) {
> + if (!new) {
If we run out of memory, we silently delete the node that we were
wanting to update? Is this really what you want?
OK, OK, given that the "C" in "AVC" stands for "cache", I guess that
deleting the element does indeed make sense...
> + list_del_rcu(&org->list);
> + call_rcu(&org->rhead, avc_node_free);
> + goto out;
> + }
> +
> + /* Duplicate and Update */
> + memcpy(new, org, sizeof(struct avc_node));
> +
> + switch (event) {
> + case AVC_CALLBACK_GRANT:
> + new->ae.avd.allowed |= perms;
This is a 32-bit field, and is protected by a lock. Therefore, only
one update should be happening at a time. This field is 32-bit aligned
(right?), and so the write that does the update will appear atomic to
readers. Only one of the updates happens in a given call.
So it should be safe to make the updates in place, right?
Some possible reasons why it might -not- be safe to do this:
o There is some bizarre CPU out there that does not do
atomic 32-bit writes (but this would break all sorts of
stuff).
o Consecutive changes might give a slow reader the incorrect
impression that permissions are wide open. One way to take
care of this would be to force a grace period between each
change. One way to do this would be to set a bit indicating
an update, and have the call_rcu() clear it. When getting
ready to update, if the bit is still set, block until the
bit is cleared. This would guarantee that a given reader
would see at most one update.
But this would require considerable code restructuring, since
this code is still under an rcu_read_lock(). But putting
forward the thought anyway in case it is helpful.
If updates are -really- rare, a simple way to make this happen
would be to protect the updates with a sema_t, and just do
an unconditional synchronize_kernel() just before releasing
the sema_t.
Since the current code permits readers to see changes to
several different nodes, a single synchronize_kernel() should
suffice.
> + break;
> + case AVC_CALLBACK_TRY_REVOKE:
> + case AVC_CALLBACK_REVOKE:
> + new->ae.avd.allowed &= ~perms;
> + break;
> + case AVC_CALLBACK_AUDITALLOW_ENABLE:
> + new->ae.avd.auditallow |= perms;
> + break;
> + case AVC_CALLBACK_AUDITALLOW_DISABLE:
> + new->ae.avd.auditallow &= ~perms;
> + break;
> + case AVC_CALLBACK_AUDITDENY_ENABLE:
> + new->ae.avd.auditdeny |= perms;
> + break;
> + case AVC_CALLBACK_AUDITDENY_DISABLE:
> + new->ae.avd.auditdeny &= ~perms;
> + break;
> + }
> +
> + list_replace_rcu(&org->list,&new->list);
> + call_rcu(&org->rhead, avc_node_free);
> }
> +out:
> + spin_unlock(&avc_cache.slots_lock[hvalue]);
> }
>
> static int avc_update_cache(u32 event, u32 ssid, u32 tsid,
> u16 tclass, u32 perms)
> {
> + struct list_head *pos;
> struct avc_node *node;
> int i;
> - unsigned long flags;
>
> - spin_lock_irqsave(&avc_lock,flags);
> + rcu_read_lock();
>
> if (ssid == SECSID_WILD || tsid == SECSID_WILD) {
> /* apply to all matching nodes */
> for (i = 0; i < AVC_CACHE_SLOTS; i++) {
> - for (node = avc_cache.slots[i]; node;
> - node = node->next) {
> + list_for_each_rcu(pos, &avc_cache.slots[i] ) {
> + node = list_entry(pos, struct avc_node, list);
Why not just do:
list_for_each_entry_rcu(pos, &avc_cache.slots[i], list) {
rather than having the separate list_entry().
> if (avc_sidcmp(ssid, node->ae.ssid) &&
> avc_sidcmp(tsid, node->ae.tsid) &&
> - tclass == node->ae.tclass) {
> - avc_update_node(event,node,perms);
> + tclass == node->ae.tclass ) {
> + avc_update_node(event, perms, node->ae.ssid
> + ,node->ae.tsid, node->ae.tclass);
> }
> }
> }
> } else {
> /* apply to one node */
> node = avc_search_node(ssid, tsid, tclass, NULL);
> if (node) {
> - avc_update_node(event,node,perms);
> + avc_update_node(event, perms, ssid, tsid, tclass);
> }
> }
>
> - spin_unlock_irqrestore(&avc_lock,flags);
> + rcu_read_unlock();
>
> return 0;
> }
>
> @@ -751,9 +786,8 @@
> {
> struct avc_callback_node *c;
> u32 tretained = 0, cretained = 0;
> int rc = 0;
> - unsigned long flags;
>
> /*
> * try_revoke only removes permissions from the cache
> * state if they are not retained by the object manager.
> @@ -786,12 +820,9 @@
> avc_update_cache(event,ssid,tsid,tclass,perms);
> *out_retained = tretained;
> }
>
> - spin_lock_irqsave(&avc_lock,flags);
> - if (seqno > avc_cache.latest_notif)
> - avc_cache.latest_notif = seqno;
> - spin_unlock_irqrestore(&avc_lock,flags);
> + avc_latest_notif_update(seqno, 0);
>
> out:
> return rc;
> }
> @@ -856,34 +887,24 @@
> int avc_ss_reset(u32 seqno)
> {
> struct avc_callback_node *c;
> int i, rc = 0;
> - struct avc_node *node, *tmp;
> - unsigned long flags;
> + struct list_head *pos;
> + struct avc_node *node;
>
> avc_hash_eval("reset");
>
> - spin_lock_irqsave(&avc_lock,flags);
> + rcu_read_lock();
>
> for (i = 0; i < AVC_CACHE_SLOTS; i++) {
> - node = avc_cache.slots[i];
> - while (node) {
> - tmp = node;
> - node = node->next;
> - tmp->ae.ssid = tmp->ae.tsid = SECSID_NULL;
> - tmp->ae.tclass = SECCLASS_NULL;
> - tmp->ae.avd.allowed = tmp->ae.avd.decided = 0;
> - tmp->ae.avd.auditallow = tmp->ae.avd.auditdeny = 0;
> - tmp->ae.used = 0;
> - tmp->next = avc_node_freelist;
> - avc_node_freelist = tmp;
> - avc_cache.active_nodes--;
> + list_for_each_rcu(pos, &avc_cache.slots[i]) {
> + node = list_entry(pos, struct avc_node, list);
Why not just do:
list_for_each_entry_rcu(pos, &avc_cache.slots[i], list) {
rather than having the separate list_entry().
> + list_del_rcu(&node->list);
I don't see what prevents multiple list_del_rcu()s from executing in
parallel, mangling the list. Is there some higher-level lock?
If so, why do we need RCU protecting the list?
> + call_rcu(&node->rhead, avc_node_free);
> }
> - avc_cache.slots[i] = NULL;
> }
> - avc_cache.lru_hint = 0;
>
> - spin_unlock_irqrestore(&avc_lock,flags);
> + rcu_read_unlock();
>
> for (i = 0; i < AVC_NSTATS; i++)
> avc_cache_stats[i] = 0;
>
> @@ -895,12 +916,9 @@
> goto out;
> }
> }
>
> - spin_lock_irqsave(&avc_lock,flags);
> - if (seqno > avc_cache.latest_notif)
> - avc_cache.latest_notif = seqno;
> - spin_unlock_irqrestore(&avc_lock,flags);
> + avc_latest_notif_update(seqno, 0);
> out:
> return rc;
> }
>
> @@ -949,9 +967,9 @@
> * @ssid: source security identifier
> * @tsid: target security identifier
> * @tclass: target security class
> * @requested: requested permissions, interpreted based on @tclass
> - * @aeref: AVC entry reference
> + * @aeref: AVC entry reference(not in use)
> * @avd: access vector decisions
> *
> * Check the AVC to determine whether the @requested permissions are granted
> * for the SID pair (@ssid, @tsid), interpreting the permissions
> @@ -968,72 +986,45 @@
> int avc_has_perm_noaudit(u32 ssid, u32 tsid,
> u16 tclass, u32 requested,
> struct avc_entry_ref *aeref, struct av_decision *avd)
> {
> - struct avc_entry *ae;
> - int rc = 0;
> - unsigned long flags;
> + struct avc_node *node;
> struct avc_entry entry;
> + int rc = 0;
> u32 denied;
> - struct avc_entry_ref ref;
>
> - if (!aeref) {
> - avc_entry_ref_init(&ref);
> - aeref = &ref;
> - }
> -
> - spin_lock_irqsave(&avc_lock, flags);
> + rcu_read_lock();
> avc_cache_stats_incr(AVC_ENTRY_LOOKUPS);
> - ae = aeref->ae;
> - if (ae) {
> - if (ae->ssid == ssid &&
> - ae->tsid == tsid &&
> - ae->tclass == tclass &&
> - ((ae->avd.decided & requested) == requested)) {
> - avc_cache_stats_incr(AVC_ENTRY_HITS);
> - ae->used = 1;
> - } else {
> - avc_cache_stats_incr(AVC_ENTRY_DISCARDS);
> - ae = NULL;
> - }
> - }
>
> - if (!ae) {
> - avc_cache_stats_incr(AVC_ENTRY_MISSES);
> - rc = avc_lookup(ssid, tsid, tclass, requested, aeref);
> - if (rc) {
> - spin_unlock_irqrestore(&avc_lock,flags);
> - rc = security_compute_av(ssid,tsid,tclass,requested,&entry.avd);
> - if (rc)
> - goto out;
> - spin_lock_irqsave(&avc_lock, flags);
> - rc = avc_insert(ssid,tsid,tclass,&entry,aeref);
> - if (rc) {
> - spin_unlock_irqrestore(&avc_lock,flags);
> - goto out;
> - }
> + node = avc_lookup(ssid, tsid, tclass, requested);
> + if (!node) {
> + rcu_read_unlock();
> + rc = security_compute_av(ssid,tsid,tclass,requested,&entry.avd);
> + if (rc)
> + goto out;
> + rcu_read_lock();
> + node = avc_insert(ssid,tsid,tclass,&entry);
I don't see what prevents two copies of avc_insert from running in parallel.
Seems like this would trash the lists. Or, if there is a higher-level lock
that I am missing, why do we need RCU protecting the lists?
> + if (!node) {
> + rcu_read_unlock();
> + goto out;
> }
> - ae = aeref->ae;
> }
> -
> if (avd)
> - memcpy(avd, &ae->avd, sizeof(*avd));
> + memcpy(avd, &node->ae.avd, sizeof(*avd));
>
> - denied = requested & ~(ae->avd.allowed);
> + denied = requested & ~(node->ae.avd.allowed);
>
> if (!requested || denied) {
> if (selinux_enforcing) {
> - spin_unlock_irqrestore(&avc_lock,flags);
> rc = -EACCES;
> - goto out;
> } else {
> - ae->avd.allowed |= requested;
> - spin_unlock_irqrestore(&avc_lock,flags);
> - goto out;
> + if (node->ae.avd.allowed != (node->ae.avd.allowed|requested))
> + avc_update_node(AVC_CALLBACK_GRANT
> + ,requested,ssid,tsid,tclass);
> }
> }
>
> - spin_unlock_irqrestore(&avc_lock,flags);
> + rcu_read_unlock();
> out:
> return rc;
> }
>
> @@ -1042,9 +1033,9 @@
> * @ssid: source security identifier
> * @tsid: target security identifier
> * @tclass: target security class
> * @requested: requested permissions, interpreted based on @tclass
> - * @aeref: AVC entry reference
> + * @aeref: AVC entry reference(not in use)
> * @auditdata: auxiliary audit data
> *
> * Check the AVC to determine whether the @requested permissions are granted
> * for the SID pair (@ssid, @tsid), interpreting the permissions
> @@ -1061,8 +1052,8 @@
> {
> struct av_decision avd;
> int rc;
>
> - rc = avc_has_perm_noaudit(ssid, tsid, tclass, requested, aeref, &avd);
> + rc = avc_has_perm_noaudit(ssid, tsid, tclass, requested, NULL, &avd);
> avc_audit(ssid, tsid, tclass, requested, &avd, rc, auditdata);
> return rc;
> }
> diff -rNU4 linux-2.6.8.1/security/selinux/include/avc.h linux-2.6.8.1.rcu/security/selinux/include/avc.h
> --- linux-2.6.8.1/security/selinux/include/avc.h 2004-08-14 19:54:51.000000000 +0900
> +++ linux-2.6.8.1.rcu/security/selinux/include/avc.h 2004-08-20 20:40:50.000000000 +0900
> @@ -118,13 +118,11 @@
> */
>
> void __init avc_init(void);
>
> -int avc_lookup(u32 ssid, u32 tsid, u16 tclass,
> - u32 requested, struct avc_entry_ref *aeref);
> +struct avc_node *avc_lookup(u32 ssid, u32 tsid, u16 tclass, u32 requested);
>
> -int avc_insert(u32 ssid, u32 tsid, u16 tclass,
> - struct avc_entry *ae, struct avc_entry_ref *out_aeref);
> +struct avc_node *avc_insert(u32 ssid, u32 tsid, u16 tclass, struct avc_entry *ae);
>
> void avc_audit(u32 ssid, u32 tsid,
> u16 tclass, u32 requested,
> struct av_decision *avd, int result, struct avc_audit_data *auditdata);
* Re: RCU issue with SELinux (Re: SELINUX performance issues)
2004-08-20 20:19 ` Paul E. McKenney
@ 2004-08-20 20:35 ` James Morris
2004-08-24 7:27 ` Kaigai Kohei
1 sibling, 0 replies; 31+ messages in thread
From: James Morris @ 2004-08-20 20:35 UTC (permalink / raw)
To: Paul E. McKenney
Cc: Kaigai Kohei, Stephen Smalley, SELinux-ML(Eng), Linux Kernel ML(Eng)
> > + hvalue = atomic_inc_return(&avc_cache.lru_hint) % AVC_CACHE_SLOTS;
atomic_inc_return() is not implemented on ia32 or x86-64. Is there a
workaround, or do we need to implement it? (Andi Kleen suggested using
the xadd instruction and altinstructions for i386).
- James
--
James Morris
<jmorris@redhat.com>
* Re: RCU issue with SELinux (Re: SELINUX performance issues)
[not found] ` <1093014789.16585.186.camel@moss-spartans.epoch.ncsc.mil>
@ 2004-08-24 7:25 ` Kaigai Kohei
2004-08-24 15:37 ` Stephen Smalley
2004-08-24 23:02 ` RCU issue with SELinux (Re: SELINUX performance issues) Paul E. McKenney
0 siblings, 2 replies; 31+ messages in thread
From: Kaigai Kohei @ 2004-08-24 7:25 UTC (permalink / raw)
To: Stephen Smalley; +Cc: SELinux-ML(Eng), Linux Kernel ML(Eng), James Morris
[-- Attachment #1: Type: text/plain, Size: 2027 bytes --]
Hi Stephen, Thanks for your comments.
> I'm not overly familiar with RCU myself, but the comments in list.h for
> list_add_rcu suggest that you still need to hold a lock to avoid racing
> with another list_add_rcu or list_del_rcu call on the same list. But
> avc_insert is calling list_add_rcu without holding any lock; can't it
> race with another avc_insert on the same hash bucket? Do I just
> misunderstand, or is this unsafe? Thanks for clarifying.
You are right. Indeed, the per-bucket lock is also necessary when
avc_insert() is called. I have fixed this.
> I think we can likely eliminate the mutation of the node in the
> !selinux_enforcing case in avc_has_perm_noaudit, i.e. eliminate the
> entire else clause and just fall through with rc still 0. Adding the
> requested permissions to the node was simply to avoid flooding denials
> in permissive mode on the same permission check, but this can be
> addressed separately using the audit ratelimit mechanism.
I see it differently.
This simple flood-control mechanism for the audit log is necessary,
because it prevents depletion of the system log buffer and denial messages
scrolling all over the console while we are debugging the security policy
in permissive mode.
So, I reworked the avc_update_node() function and the avc_node data
structure so that avc_update_node() no longer needs kmalloc().
This approach preserves the behavior of the original implementation, but
twice the avc_node storage is needed to update without kmalloc().
It can be applied to any kind of avc_entry update, though the idea is
rather more complex.
I modified the following points:
- We hold the per-bucket lock when avc_insert() and avc_ss_reset() are
called, for safety.
- list_for_each_rcu() and list_entry() are replaced by list_for_each_entry().
- An avc_node_dual structure containing two avc_node objects is defined.
It allows avc_update_node() to run without kmalloc() or any locks.
Any comments please. Thanks.
--------
Kai Gai <kaigai@ak.jp.nec.com>
[-- Attachment #2: list_replace_rcu-2.6.8.1.patch --]
[-- Type: application/octet-stream, Size: 845 bytes --]
--- linux-2.6.8.1/include/linux/list.h 2004-08-14 19:55:33.000000000 +0900
+++ linux-2.6.8.1.rcu/include/linux/list.h 2004-08-20 18:04:10.000000000 +0900
@@ -194,8 +194,23 @@
__list_del(entry->prev, entry->next);
entry->prev = LIST_POISON2;
}
+/*
+ * list_replace_rcu - replace an old entry with a new one in a list
+ * @old : the element to be replaced in the list.
+ * @new : the new element to insert into the list.
+ *
+ * The old entry is replaced by the new entry atomically.
+ */
+static inline void list_replace_rcu(struct list_head *old, struct list_head *new){
+ new->next = old->next;
+ new->prev = old->prev;
+ smp_wmb();
+ new->next->prev = new;
+ new->prev->next = new;
+}
+
/**
* list_del_init - deletes entry from list and reinitialize it.
* @entry: the element to delete from the list.
*/
[-- Attachment #3: selinux.rcu-2.6.8.1-take2.patch --]
[-- Type: application/octet-stream, Size: 20952 bytes --]
diff -rNU4 linux-2.6.8.1/security/selinux/avc.c linux-2.6.8.1.rcu/security/selinux/avc.c
--- linux-2.6.8.1/security/selinux/avc.c 2004-08-14 19:55:48.000000000 +0900
+++ linux-2.6.8.1.rcu/security/selinux/avc.c 2004-08-24 13:30:56.000000000 +0900
@@ -35,28 +35,40 @@
#include "av_perm_to_string.h"
#include "objsec.h"
#define AVC_CACHE_SLOTS 512
-#define AVC_CACHE_MAXNODES 410
+#define AVC_CACHE_THRESHOLD 410
+#define AVC_CACHE_RECLAIM 16
struct avc_entry {
u32 ssid;
u32 tsid;
u16 tclass;
struct av_decision avd;
- int used; /* used recently */
+ atomic_t used; /* used recently */
};
+struct avc_node_dual;
struct avc_node {
+ struct list_head list;
struct avc_entry ae;
- struct avc_node *next;
+ struct avc_node *reserved;
+ struct avc_node_dual *container;
+};
+
+/* dual avc_node is necessary for update */
+struct avc_node_dual {
+ struct rcu_head rhead;
+ struct avc_node node[2];
};
struct avc_cache {
- struct avc_node *slots[AVC_CACHE_SLOTS];
- u32 lru_hint; /* LRU hint for reclaim scan */
- u32 active_nodes;
- u32 latest_notif; /* latest revocation notification */
+ struct list_head slots[AVC_CACHE_SLOTS];
+ spinlock_t slots_lock[AVC_CACHE_SLOTS];
+ /* lock for insert/update/delete and reset */
+ atomic_t lru_hint; /* LRU hint for reclaim scan */
+ atomic_t active_nodes;
+ u32 latest_notif; /* latest revocation notification */
};
struct avc_callback_node {
int (*callback) (u32 event, u32 ssid, u32 tsid,
@@ -69,10 +81,8 @@
u32 perms;
struct avc_callback_node *next;
};
-static spinlock_t avc_lock = SPIN_LOCK_UNLOCKED;
-static struct avc_node *avc_node_freelist;
static struct avc_cache avc_cache;
static unsigned avc_cache_stats[AVC_NSTATS];
static struct avc_callback_node *avc_callbacks;
@@ -187,52 +197,44 @@
* Initialize the access vector cache.
*/
void __init avc_init(void)
{
- struct avc_node *new;
int i;
- for (i = 0; i < AVC_CACHE_MAXNODES; i++) {
- new = kmalloc(sizeof(*new), GFP_ATOMIC);
- if (!new) {
- printk(KERN_WARNING "avc: only able to allocate "
- "%d entries\n", i);
- break;
- }
- memset(new, 0, sizeof(*new));
- new->next = avc_node_freelist;
- avc_node_freelist = new;
- }
-
+ for (i = 0; i < AVC_CACHE_SLOTS; i++) {
+ INIT_LIST_HEAD(&avc_cache.slots[i]);
+ avc_cache.slots_lock[i] = SPIN_LOCK_UNLOCKED;
+ }
+ atomic_set(&avc_cache.active_nodes, 0);
+ atomic_set(&avc_cache.lru_hint, 0);
+
audit_log(current->audit_context, "AVC INITIALIZED\n");
}
#if 0
static void avc_hash_eval(char *tag)
{
int i, chain_len, max_chain_len, slots_used;
struct avc_node *node;
+ struct list_head *pos;
unsigned long flags;
- spin_lock_irqsave(&avc_lock,flags);
+ rcu_read_lock();
slots_used = 0;
max_chain_len = 0;
for (i = 0; i < AVC_CACHE_SLOTS; i++) {
- node = avc_cache.slots[i];
- if (node) {
+ if (!list_empty(&avc_cache.slots[i])) {
slots_used++;
chain_len = 0;
- while (node) {
+ list_for_each_rcu(pos, &avc_cache.slots[i])
chain_len++;
- node = node->next;
- }
if (chain_len > max_chain_len)
max_chain_len = chain_len;
}
}
- spin_unlock_irqrestore(&avc_lock,flags);
+ rcu_read_unlock();
printk(KERN_INFO "\n");
printk(KERN_INFO "%s avc: %d entries and %d/%d buckets used, longest "
"chain length %d\n", tag, avc_cache.active_nodes, slots_used,
@@ -242,188 +244,208 @@
static inline void avc_hash_eval(char *tag)
{ }
#endif
-static inline struct avc_node *avc_reclaim_node(void)
-{
- struct avc_node *prev, *cur;
- int hvalue, try;
+static void avc_node_free(struct rcu_head *rhead) {
+ struct avc_node_dual *dual_node;
+ dual_node = container_of(rhead, struct avc_node_dual, rhead);
+ kfree(dual_node);
+}
- hvalue = avc_cache.lru_hint;
- for (try = 0; try < 2; try++) {
- do {
- prev = NULL;
- cur = avc_cache.slots[hvalue];
- while (cur) {
- if (!cur->ae.used)
- goto found;
+static inline void avc_reclaim_node(void)
+{
+ struct avc_node *node;
+ int hvalue, try, ecx;
- cur->ae.used = 0;
+ for (try = 0, ecx = 0; try < AVC_CACHE_SLOTS; try++ ) {
+ hvalue = atomic_inc_return(&avc_cache.lru_hint) % AVC_CACHE_SLOTS;
- prev = cur;
- cur = cur->next;
+ if (spin_trylock(&avc_cache.slots_lock[hvalue])) {
+ list_for_each_entry(node, &avc_cache.slots[hvalue], list) {
+ if (!atomic_dec_and_test(&node->ae.used)) {
+ /* Recently Unused */
+ list_del_rcu(&node->list);
+ call_rcu(&node->container->rhead, avc_node_free);
+ atomic_dec(&avc_cache.active_nodes);
+ ecx++;
+ if (ecx >= AVC_CACHE_RECLAIM) {
+ spin_unlock(&avc_cache.slots_lock[hvalue]);
+ goto out;
+ }
+ }
}
- hvalue = (hvalue + 1) & (AVC_CACHE_SLOTS - 1);
- } while (hvalue != avc_cache.lru_hint);
+ spin_unlock(&avc_cache.slots_lock[hvalue]);
+ }
}
-
- panic("avc_reclaim_node");
-
-found:
- avc_cache.lru_hint = hvalue;
-
- if (prev == NULL)
- avc_cache.slots[hvalue] = cur->next;
- else
- prev->next = cur->next;
-
- return cur;
+out:
+ return;
}
-static inline struct avc_node *avc_claim_node(u32 ssid,
- u32 tsid, u16 tclass)
+static inline struct avc_node *avc_get_node(void)
{
- struct avc_node *new;
- int hvalue;
+ struct avc_node_dual *new;
+ int actives;
- hvalue = avc_hash(ssid, tsid, tclass);
- if (avc_node_freelist) {
- new = avc_node_freelist;
- avc_node_freelist = avc_node_freelist->next;
- avc_cache.active_nodes++;
- } else {
- new = avc_reclaim_node();
- if (!new)
- goto out;
- }
+ new = kmalloc(sizeof(struct avc_node_dual), GFP_ATOMIC);
+ if (!new)
+ return NULL;
- new->ae.used = 1;
- new->ae.ssid = ssid;
- new->ae.tsid = tsid;
- new->ae.tclass = tclass;
- new->next = avc_cache.slots[hvalue];
- avc_cache.slots[hvalue] = new;
+ INIT_RCU_HEAD(&new->rhead);
+ INIT_LIST_HEAD(&new->node[0].list);
+ INIT_LIST_HEAD(&new->node[1].list);
+ new->node[0].container = new;
+ new->node[1].container = new;
+ new->node[0].reserved = &new->node[1];
+ new->node[1].reserved = &new->node[0];
-out:
- return new;
+ actives = atomic_inc_return(&avc_cache.active_nodes);
+ if (actives > AVC_CACHE_THRESHOLD)
+ avc_reclaim_node();
+
+ return &new->node[0];
}
static inline struct avc_node *avc_search_node(u32 ssid, u32 tsid,
u16 tclass, int *probes)
{
- struct avc_node *cur;
+ struct avc_node *node, *ret = NULL;
int hvalue;
int tprobes = 1;
hvalue = avc_hash(ssid, tsid, tclass);
- cur = avc_cache.slots[hvalue];
- while (cur != NULL &&
- (ssid != cur->ae.ssid ||
- tclass != cur->ae.tclass ||
- tsid != cur->ae.tsid)) {
+ list_for_each_entry(node, &avc_cache.slots[hvalue], list) {
+ if (ssid == node->ae.ssid &&
+ tclass == node->ae.tclass &&
+ tsid == node->ae.tsid) {
+ ret = node;
+ break;
+ }
tprobes++;
- cur = cur->next;
}
- if (cur == NULL) {
+ if (ret == NULL) {
/* cache miss */
goto out;
}
/* cache hit */
if (probes)
*probes = tprobes;
-
- cur->ae.used = 1;
-
+ if (atomic_read(&ret->ae.used) != 1)
+ atomic_set(&ret->ae.used, 1);
out:
- return cur;
+ return ret;
}
/**
* avc_lookup - Look up an AVC entry.
* @ssid: source security identifier
* @tsid: target security identifier
* @tclass: target security class
* @requested: requested permissions, interpreted based on @tclass
- * @aeref: AVC entry reference
*
* Look up an AVC entry that is valid for the
* @requested permissions between the SID pair
* (@ssid, @tsid), interpreting the permissions
* based on @tclass. If a valid AVC entry exists,
- * then this function updates @aeref to refer to the
- * entry and returns %0. Otherwise, this function
- * returns -%ENOENT.
+ * then this function return the avc_node.
+ * Otherwise, this function returns NULL.
*/
-int avc_lookup(u32 ssid, u32 tsid, u16 tclass,
- u32 requested, struct avc_entry_ref *aeref)
+struct avc_node *avc_lookup(u32 ssid, u32 tsid, u16 tclass, u32 requested)
{
struct avc_node *node;
- int probes, rc = 0;
+ int probes;
avc_cache_stats_incr(AVC_CAV_LOOKUPS);
node = avc_search_node(ssid, tsid, tclass,&probes);
if (node && ((node->ae.avd.decided & requested) == requested)) {
avc_cache_stats_incr(AVC_CAV_HITS);
avc_cache_stats_add(AVC_CAV_PROBES,probes);
- aeref->ae = &node->ae;
goto out;
}
avc_cache_stats_incr(AVC_CAV_MISSES);
- rc = -ENOENT;
out:
- return rc;
+ return node;
+}
+
+static int avc_latest_notif_update(int seqno, int is_insert)
+{
+ int ret = 0;
+ static spinlock_t notif_lock = SPIN_LOCK_UNLOCKED;
+
+ spin_lock(&notif_lock);
+ if (seqno < avc_cache.latest_notif) {
+ if (is_insert) {
+ printk(KERN_WARNING "avc: seqno %d < latest_notif %d\n",
+ seqno, avc_cache.latest_notif);
+ ret = -EAGAIN;
+ } else {
+ avc_cache.latest_notif = seqno;
+ }
+ }
+ spin_unlock(&notif_lock);
+ return ret;
}
/**
* avc_insert - Insert an AVC entry.
* @ssid: source security identifier
* @tsid: target security identifier
* @tclass: target security class
* @ae: AVC entry
- * @aeref: AVC entry reference
*
* Insert an AVC entry for the SID pair
* (@ssid, @tsid) and class @tclass.
* The access vectors and the sequence number are
* normally provided by the security server in
* response to a security_compute_av() call. If the
* sequence number @ae->avd.seqno is not less than the latest
* revocation notification, then the function copies
- * the access vectors into a cache entry, updates
- * @aeref to refer to the entry, and returns %0.
- * Otherwise, this function returns -%EAGAIN.
+ * the access vectors into a cache entry, returns
+ * avc_node inserted. Otherwise, this function returns NULL.
*/
-int avc_insert(u32 ssid, u32 tsid, u16 tclass,
- struct avc_entry *ae, struct avc_entry_ref *aeref)
+struct avc_node *avc_insert(u32 ssid, u32 tsid, u16 tclass, struct avc_entry *ae)
{
- struct avc_node *node;
- int rc = 0;
+ struct avc_node *pos, *node = NULL;
+ int hvalue;
- if (ae->avd.seqno < avc_cache.latest_notif) {
- printk(KERN_WARNING "avc: seqno %d < latest_notif %d\n",
- ae->avd.seqno, avc_cache.latest_notif);
- rc = -EAGAIN;
+ if (avc_latest_notif_update(ae->avd.seqno, 1))
goto out;
- }
- node = avc_claim_node(ssid, tsid, tclass);
- if (!node) {
- rc = -ENOMEM;
- goto out;
- }
+ node = avc_get_node();
- node->ae.avd.allowed = ae->avd.allowed;
- node->ae.avd.decided = ae->avd.decided;
- node->ae.avd.auditallow = ae->avd.auditallow;
- node->ae.avd.auditdeny = ae->avd.auditdeny;
- node->ae.avd.seqno = ae->avd.seqno;
- aeref->ae = &node->ae;
+ if (node) {
+ hvalue = avc_hash(ssid, tsid, tclass);
+
+ node->ae.ssid = ssid;
+ node->ae.tsid = tsid;
+ node->ae.tclass = tclass;
+ atomic_set(&node->ae.used, 1);
+
+ node->ae.avd.allowed = ae->avd.allowed;
+ node->ae.avd.decided = ae->avd.decided;
+ node->ae.avd.auditallow = ae->avd.auditallow;
+ node->ae.avd.auditdeny = ae->avd.auditdeny;
+ node->ae.avd.seqno = ae->avd.seqno;
+
+ spin_lock(&avc_cache.slots_lock[hvalue]);
+ list_for_each_entry(pos, &avc_cache.slots[hvalue], list){
+ if( pos->ae.ssid == ssid &&
+ pos->ae.tsid == tsid &&
+ pos->ae.tclass == tclass ){
+ atomic_dec(&avc_cache.active_nodes);
+ kfree(node->container);
+ goto duplicate;
+ }
+ }
+ list_add_rcu(&node->list, &avc_cache.slots[hvalue]);
+duplicate:
+ spin_unlock(&avc_cache.slots_lock[hvalue]);
+ }
out:
- return rc;
+ return node;
}
static inline void avc_print_ipv6_addr(struct audit_buffer *ab,
struct in6_addr *addr, u16 port,
@@ -685,10 +707,32 @@
{
return (x == y || x == SECSID_WILD || y == SECSID_WILD);
}
-static inline void avc_update_node(u32 event, struct avc_node *node, u32 perms)
+static void avc_update_node(u32 event, u32 perms, u32 ssid, u32 tsid, u16 tclass)
{
+ int hvalue;
+ struct avc_node *pos, *node, *org = NULL;
+
+ /* Lock the target slot */
+ hvalue = avc_hash(ssid, tsid, tclass);
+ spin_lock(&avc_cache.slots_lock[hvalue]);
+
+ list_for_each_entry(pos, &avc_cache.slots[hvalue], list){
+ if ( ssid==pos->ae.ssid &&
+ tsid==pos->ae.tsid &&
+ tclass==pos->ae.tclass ){
+ org = pos;
+ break;
+ }
+ }
+
+ if (!org)
+ goto out;
+
+ node = org->reserved;
+ /* Duplicate and Update */
+ memcpy(node, org, sizeof(struct avc_node));
switch (event) {
case AVC_CALLBACK_GRANT:
node->ae.avd.allowed |= perms;
break;
@@ -708,40 +752,42 @@
case AVC_CALLBACK_AUDITDENY_DISABLE:
node->ae.avd.auditdeny &= ~perms;
break;
}
+ list_replace_rcu(&org->list,&node->list);
+out:
+ spin_unlock(&avc_cache.slots_lock[hvalue]);
}
static int avc_update_cache(u32 event, u32 ssid, u32 tsid,
u16 tclass, u32 perms)
{
struct avc_node *node;
int i;
- unsigned long flags;
- spin_lock_irqsave(&avc_lock,flags);
+ rcu_read_lock();
if (ssid == SECSID_WILD || tsid == SECSID_WILD) {
/* apply to all matching nodes */
for (i = 0; i < AVC_CACHE_SLOTS; i++) {
- for (node = avc_cache.slots[i]; node;
- node = node->next) {
+ list_for_each_entry(node, &avc_cache.slots[i], list) {
if (avc_sidcmp(ssid, node->ae.ssid) &&
avc_sidcmp(tsid, node->ae.tsid) &&
- tclass == node->ae.tclass) {
- avc_update_node(event,node,perms);
+ tclass == node->ae.tclass ) {
+ avc_update_node(event, perms, node->ae.ssid
+ ,node->ae.tsid, node->ae.tclass);
}
}
}
} else {
/* apply to one node */
node = avc_search_node(ssid, tsid, tclass, NULL);
if (node) {
- avc_update_node(event,node,perms);
+ avc_update_node(event, perms, ssid, tsid, tclass);
}
}
- spin_unlock_irqrestore(&avc_lock,flags);
+ rcu_read_unlock();
return 0;
}
@@ -751,9 +797,8 @@
{
struct avc_callback_node *c;
u32 tretained = 0, cretained = 0;
int rc = 0;
- unsigned long flags;
/*
* try_revoke only removes permissions from the cache
* state if they are not retained by the object manager.
@@ -786,12 +831,9 @@
avc_update_cache(event,ssid,tsid,tclass,perms);
*out_retained = tretained;
}
- spin_lock_irqsave(&avc_lock,flags);
- if (seqno > avc_cache.latest_notif)
- avc_cache.latest_notif = seqno;
- spin_unlock_irqrestore(&avc_lock,flags);
+ avc_latest_notif_update(seqno, 0);
out:
return rc;
}
@@ -856,34 +898,22 @@
int avc_ss_reset(u32 seqno)
{
struct avc_callback_node *c;
int i, rc = 0;
- struct avc_node *node, *tmp;
- unsigned long flags;
+ struct avc_node *node;
avc_hash_eval("reset");
- spin_lock_irqsave(&avc_lock,flags);
-
+ rcu_read_lock();
for (i = 0; i < AVC_CACHE_SLOTS; i++) {
- node = avc_cache.slots[i];
- while (node) {
- tmp = node;
- node = node->next;
- tmp->ae.ssid = tmp->ae.tsid = SECSID_NULL;
- tmp->ae.tclass = SECCLASS_NULL;
- tmp->ae.avd.allowed = tmp->ae.avd.decided = 0;
- tmp->ae.avd.auditallow = tmp->ae.avd.auditdeny = 0;
- tmp->ae.used = 0;
- tmp->next = avc_node_freelist;
- avc_node_freelist = tmp;
- avc_cache.active_nodes--;
+ spin_lock(&avc_cache.slots_lock[i]);
+ list_for_each_entry(node, &avc_cache.slots[i], list){
+ list_del_rcu(&node->list);
+ call_rcu(&node->container->rhead, avc_node_free);
}
- avc_cache.slots[i] = NULL;
+ spin_unlock(&avc_cache.slots_lock[i]);
}
- avc_cache.lru_hint = 0;
-
- spin_unlock_irqrestore(&avc_lock,flags);
+ rcu_read_unlock();
for (i = 0; i < AVC_NSTATS; i++)
avc_cache_stats[i] = 0;
@@ -895,12 +925,9 @@
goto out;
}
}
- spin_lock_irqsave(&avc_lock,flags);
- if (seqno > avc_cache.latest_notif)
- avc_cache.latest_notif = seqno;
- spin_unlock_irqrestore(&avc_lock,flags);
+ avc_latest_notif_update(seqno, 0);
out:
return rc;
}
@@ -949,9 +976,9 @@
* @ssid: source security identifier
* @tsid: target security identifier
* @tclass: target security class
* @requested: requested permissions, interpreted based on @tclass
- * @aeref: AVC entry reference
+ * @aeref: AVC entry reference(not in use)
* @avd: access vector decisions
*
* Check the AVC to determine whether the @requested permissions are granted
* for the SID pair (@ssid, @tsid), interpreting the permissions
@@ -968,72 +995,46 @@
int avc_has_perm_noaudit(u32 ssid, u32 tsid,
u16 tclass, u32 requested,
struct avc_entry_ref *aeref, struct av_decision *avd)
{
- struct avc_entry *ae;
- int rc = 0;
- unsigned long flags;
+ struct avc_node *node;
struct avc_entry entry;
+ int rc = 0;
u32 denied;
- struct avc_entry_ref ref;
-
- if (!aeref) {
- avc_entry_ref_init(&ref);
- aeref = &ref;
- }
- spin_lock_irqsave(&avc_lock, flags);
+ rcu_read_lock();
avc_cache_stats_incr(AVC_ENTRY_LOOKUPS);
- ae = aeref->ae;
- if (ae) {
- if (ae->ssid == ssid &&
- ae->tsid == tsid &&
- ae->tclass == tclass &&
- ((ae->avd.decided & requested) == requested)) {
- avc_cache_stats_incr(AVC_ENTRY_HITS);
- ae->used = 1;
- } else {
- avc_cache_stats_incr(AVC_ENTRY_DISCARDS);
- ae = NULL;
- }
- }
- if (!ae) {
- avc_cache_stats_incr(AVC_ENTRY_MISSES);
- rc = avc_lookup(ssid, tsid, tclass, requested, aeref);
- if (rc) {
- spin_unlock_irqrestore(&avc_lock,flags);
- rc = security_compute_av(ssid,tsid,tclass,requested,&entry.avd);
- if (rc)
- goto out;
- spin_lock_irqsave(&avc_lock, flags);
- rc = avc_insert(ssid,tsid,tclass,&entry,aeref);
- if (rc) {
- spin_unlock_irqrestore(&avc_lock,flags);
- goto out;
- }
+ node = avc_lookup(ssid, tsid, tclass, requested);
+ if (!node) {
+ rcu_read_unlock();
+ rc = security_compute_av(ssid,tsid,tclass,requested,&entry.avd);
+ if (rc)
+ goto out;
+ rcu_read_lock();
+ node = avc_insert(ssid,tsid,tclass,&entry);
+ if (!node) {
+ rc = -ENOMEM;
+ rcu_read_unlock();
+ goto out;
}
- ae = aeref->ae;
}
-
if (avd)
- memcpy(avd, &ae->avd, sizeof(*avd));
+ memcpy(avd, &node->ae.avd, sizeof(*avd));
- denied = requested & ~(ae->avd.allowed);
+ denied = requested & ~(node->ae.avd.allowed);
if (!requested || denied) {
if (selinux_enforcing) {
- spin_unlock_irqrestore(&avc_lock,flags);
rc = -EACCES;
- goto out;
} else {
- ae->avd.allowed |= requested;
- spin_unlock_irqrestore(&avc_lock,flags);
- goto out;
+ if (node->ae.avd.allowed != (node->ae.avd.allowed|requested))
+ avc_update_node(AVC_CALLBACK_GRANT
+ ,requested,ssid,tsid,tclass);
}
}
- spin_unlock_irqrestore(&avc_lock,flags);
+ rcu_read_unlock();
out:
return rc;
}
@@ -1042,9 +1043,9 @@
* @ssid: source security identifier
* @tsid: target security identifier
* @tclass: target security class
* @requested: requested permissions, interpreted based on @tclass
- * @aeref: AVC entry reference
+ * @aeref: AVC entry reference(not in use)
* @auditdata: auxiliary audit data
*
* Check the AVC to determine whether the @requested permissions are granted
* for the SID pair (@ssid, @tsid), interpreting the permissions
@@ -1061,8 +1062,8 @@
{
struct av_decision avd;
int rc;
- rc = avc_has_perm_noaudit(ssid, tsid, tclass, requested, aeref, &avd);
+ rc = avc_has_perm_noaudit(ssid, tsid, tclass, requested, NULL, &avd);
avc_audit(ssid, tsid, tclass, requested, &avd, rc, auditdata);
return rc;
}
diff -rNU4 linux-2.6.8.1/security/selinux/include/avc.h linux-2.6.8.1.rcu/security/selinux/include/avc.h
--- linux-2.6.8.1/security/selinux/include/avc.h 2004-08-14 19:54:51.000000000 +0900
+++ linux-2.6.8.1.rcu/security/selinux/include/avc.h 2004-08-20 20:40:50.000000000 +0900
@@ -118,13 +118,11 @@
*/
void __init avc_init(void);
-int avc_lookup(u32 ssid, u32 tsid, u16 tclass,
- u32 requested, struct avc_entry_ref *aeref);
+struct avc_node *avc_lookup(u32 ssid, u32 tsid, u16 tclass, u32 requested);
-int avc_insert(u32 ssid, u32 tsid, u16 tclass,
- struct avc_entry *ae, struct avc_entry_ref *out_aeref);
+struct avc_node *avc_insert(u32 ssid, u32 tsid, u16 tclass, struct avc_entry *ae);
void avc_audit(u32 ssid, u32 tsid,
u16 tclass, u32 requested,
struct av_decision *avd, int result, struct avc_audit_data *auditdata);
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: RCU issue with SELinux (Re: SELINUX performance issues)
2004-08-20 14:53 ` James Morris
@ 2004-08-24 7:27 ` Kaigai Kohei
2004-08-24 13:24 ` James Morris
0 siblings, 1 reply; 31+ messages in thread
From: Kaigai Kohei @ 2004-08-24 7:27 UTC (permalink / raw)
To: James Morris; +Cc: Stephen Smalley, SELinux-ML(Eng), Linux Kernel ML(Eng)
Hi James, Thanks for your comments.
> Do you have figures for 1 and 2 CPU?
The results for 1 CPU and 2 CPUs are as follows (2.6.8.1-RCU with the take2 patch):
[write() to files on tmpfs Loop=500000 Parallel=<Num of CPU>]
-- 1CPU-- -- 2CPU-- -- 4CPU-- -- 8CPU-- --16CPU-- --32CPU--
2.6.8.1(disable) 8.2959 8.0430 8.0158 8.0183 8.0146 8.0037
2.6.8.1(enable) 11.8427 14.0358 78.0957 319.0451 1322.0313 too long
2.6.8.1.rwlock 11.2485 13.8688 20.0100 49.0390 90.0200 177.0533
2.6.8.1.rcu 11.3435 11.3319 11.0464 11.0205 11.0372 11.0496
> Also, can you run some more benchmarks, e.g. lmbench, unixbench, dbench
> etc?
The results of unixbench and dbench are as follows:
o UNIXbench
* INDEX value comparison
2.6.8.1 2.6.8.1 2.6.8.1 2.6.8.1
(Disable) (Enable) (rwlock) (RCU)
Dhrystone 2 using register variables 268.9 268.8 269.2 269.0
Double-Precision Whetstone 94.2 94.2 94.2 94.2
Execl Throughput 388.3 379.0 377.8 377.9
File Copy 1024 bufsize 2000 maxblocks 606.6 526.6 515.6 504.8
File Copy 256 bufsize 500 maxblocks 508.9 417.0 410.4 395.2
File Copy 4096 bufsize 8000 maxblocks 987.1 890.4 876.0 857.9
Pipe Throughput 525.1 406.4 404.5 408.8
Process Creation 321.2 317.8 315.9 316.3
Shell Scripts (8 concurrent) 1312.8 1276.2 1278.8 1282.8
System Call Overhead 467.1 468.7 464.1 467.2
========================================
FINAL SCORE 445.8 413.2 410.1 407.7
o dbench [ 4 processes run in parallel on 4 CPUs / 10 trials ]
---- mean ---- - STD -
2.6.8.1(disable) 860.249 [MB/s] 44.683
2.6.8.1(enable) 714.254 [MB/s] 32.359
2.6.8.1(+rwlock) 767.904 [MB/s] 27.968
2.6.8.1(+RCU) 830.678 [MB/s] 16.352
> > > + hvalue = atomic_inc_return(&avc_cache.lru_hint) % AVC_CACHE_SLOTS;
>
> atomic_inc_return() is not implemented on ia32 or x86-64. Is there a
> workaround, or do we need to implement it? (Andi Kleen suggested using
> the xadd instruction and altinstructions for i386).
Oops!
Can anybody implement atomic_inc_return() for IA-32 or x86_64?
If not, I'll try to make alternative macros or inline functions.
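As a userspace sketch of the required semantics, GCC's __sync builtin can stand in for the missing primitive. This is only an illustration under assumed names (my_atomic_t, my_atomic_inc_return); an actual i386 kernel implementation would use the lock-prefixed xadd instruction, as Andi Kleen suggested.

```c
#include <assert.h>

/* Userspace stand-in for the kernel's atomic_t. */
typedef struct {
	volatile int counter;
} my_atomic_t;

/* Atomically increment and return the NEW value -- the
 * behaviour avc_reclaim_node() needs for its lru_hint walk.
 * __sync_add_and_fetch() is a portable GCC builtin; on x86 it
 * compiles down to a lock xadd plus an add. */
static inline int my_atomic_inc_return(my_atomic_t *v)
{
	return __sync_add_and_fetch(&v->counter, 1);
}
```

The key property is that the returned value is unique per caller even under contention, so `% AVC_CACHE_SLOTS` distributes concurrent reclaimers across different buckets.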
Thanks.
--------
Kai Gai <kaigai@ak.jp.nec.com>
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: RCU issue with SELinux (Re: SELINUX performance issues)
2004-08-20 20:19 ` Paul E. McKenney
2004-08-20 20:35 ` James Morris
@ 2004-08-24 7:27 ` Kaigai Kohei
1 sibling, 0 replies; 31+ messages in thread
From: Kaigai Kohei @ 2004-08-24 7:27 UTC (permalink / raw)
To: paulmck
Cc: James Morris, Stephen Smalley, SELinux-ML(Eng), Linux Kernel ML(Eng)
Hi Paul, Thanks for your comments.
> A few comments interspersed for the larger patch (look for empty lines).
> I am missing why some of the list insertions and deletions are SMP safe.
Sorry, some of them were my bugs.
> > + if (spin_trylock(&avc_cache.slots_lock[hvalue])) {
> > + list_for_each_rcu(pos, &avc_cache.slots[hvalue]) {
>
> Since we are holding the lock, the list cannot change, and the _rcu()
> is not needed. Right?
No, only the insert/update/delete paths must hold the lock.
The _rcu suffix is necessary since the read-only path does not hold the lock.
(It was my bug that the inserting path did not hold the lock.)
> > + node = list_entry(pos, struct avc_node, list);
>
> Why not just do:
>
> list_for_each_entry(pos, &avc_cache.slots[hvalue], list) {
>
> rather than having the separate list_entry().
Your suggestion is better. I fixed it.
> > + if (org) {
> > + if (!new) {
>
> If we run out of memory, we silently delete the node that we were
> wanting to update? Is this really what you want?
>
> OK, OK, given that the "C" in "AVC" stands for "cache", I guess that
> deleting the element does indeed make sense...
Indeed, I thought it was pretty rough, and fixed avc_insert().
Now, kmalloc() is unnecessary in avc_insert().
Therefore, the update always succeeds.
(But considerably more memory is required.)
> > + switch (event) {
> > + case AVC_CALLBACK_GRANT:
> > + new->ae.avd.allowed |= perms;
>
> This is a 32-bit field, and is protected by a lock. Therefore, only
> one update should be happening at a time. This field is 32-bit aligned
> (right?), and so the write that does the update will appear atomic to
> readers. Only one of the the updates happens in a given call.
Is this about "new->ae.avd.allowed |= perms;"?
No one else can refer to 'new', since 'new' is the duplicated entry
of 'org'. It is safe whether or not the lock is held.
In addition, the problem which had its roots in avc_insert()
was solved by the take2 patch.
> o There is some bizarre CPU out there that does not do
> atomic 32-bit writes (but this would break all sorts of
> stuff).
>
> o Consecutive changes might give a slow reader the incorrect
> impression that permissions are wide open. One way to take
> care of this would be to force a grace period between each
> change. One way to do this would be to set a bit indicating
> an update, and have the call_rcu() clear it. When getting
> ready to update, if the bit is still set, block until the
> bit is cleared. This would guarantee that a given reader
> would see at most one update.
>
> But this would require considerable code restructuring, since
> this code is still under an rcu_read_lock(). But putting
> forward the thought anyway in case it is helpful.
>
> If updates are -really- rare, a simple way to make this happen
> would be to protect the updates with a sema_t, and just do
> an unconditional synchronize_kernel() just before releasing
> the sema_t.
>
> Since the current code permits readers to see changes to
> several different nodes, a single synchronize_kernel() should
> suffice.
> > + list_del_rcu(&node->list);
>
> I don't see what prevents multiple list_del_rcu()s from executing in
> parallel, mangling the list. Is there some higher-level lock?
> If so, why do we need RCU protecting the list?
I agree, and fixed it.
> > + rcu_read_lock();
> > + node = avc_insert(ssid,tsid,tclass,&entry);
>
> I don't see what prevents two copies of avc_insert from running in parallel.
> Seems like this would trash the lists. Or, if there is a higher-level lock
> that I am missing, why do we need RCU protecting the lists?
I agree, and fixed it.
Thanks.
--------
Kai Gai <kaigai@ak.jp.nec.com>
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: RCU issue with SELinux (Re: SELINUX performance issues)
2004-08-24 7:27 ` Kaigai Kohei
@ 2004-08-24 13:24 ` James Morris
2004-08-25 9:51 ` Kaigai Kohei
2004-08-25 9:52 ` [PATCH]atomic_inc_return() for i386/x86_64 (Re: RCU issue with SELinux) Kaigai Kohei
0 siblings, 2 replies; 31+ messages in thread
From: James Morris @ 2004-08-24 13:24 UTC (permalink / raw)
To: Kaigai Kohei; +Cc: Stephen Smalley, SELinux-ML(Eng), Linux Kernel ML(Eng)
On Tue, 24 Aug 2004, Kaigai Kohei wrote:
> o UNIXbench
>
> * INDEX value comparison
> 2.6.8.1 2.6.8.1 2.6.8.1 2.6.8.1
> (Disable) (Enable) (rwlock) (RCU)
> Dhrystone 2 using register variables 268.9 268.8 269.2 269.0
> Double-Precision Whetstone 94.2 94.2 94.2 94.2
> Execl Throughput 388.3 379.0 377.8 377.9 +
> File Copy 1024 bufsize 2000 maxblocks 606.6 526.6 515.6 504.8 *
> File Copy 256 bufsize 500 maxblocks 508.9 417.0 410.4 395.2 *
> File Copy 4096 bufsize 8000 maxblocks 987.1 890.4 876.0 857.9 *
> Pipe Throughput 525.1 406.4 404.5 408.8 +
> Process Creation 321.2 317.8 315.9 316.3 +
> Shell Scripts (8 concurrent) 1312.8 1276.2 1278.8 1282.8 +
> System Call Overhead 467.1 468.7 464.1 467.2 +
> ========================================
> FINAL SCORE 445.8 413.2 410.1 407.7
This benchmark somewhat characterizes 1P performance, and the ones I've
marked with (*) get noticeably worse with the RCU patch compared to the
current locking scheme. Tests marked (+) show no or insignificant
improvement.
Might be useful to compare with the lmbench macrobenchmark, to see if it
shows a similar pattern.
> o dbench [ 4 processes run in parallel on 4 CPUs / 10 trials ]
> ---- mean ---- - STD -
> 2.6.8.1(disable) 860.249 [MB/s] 44.683
> 2.6.8.1(enable) 714.254 [MB/s] 32.359
> 2.6.8.1(+rwlock) 767.904 [MB/s] 27.968
> 2.6.8.1(+RCU) 830.678 [MB/s] 16.352
Can you show the figures for 1 and 2 clients?
> Can anybody implement atomic_inc_return() for IA-32 or x86_64?
> If not, I'll try to make alternative macros or inline functions.
If you can get this done, it will be very useful, as I could also run
some benchmarks on my test systems.
- James
--
James Morris
<jmorris@redhat.com>
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: RCU issue with SELinux (Re: SELINUX performance issues)
2004-08-24 7:25 ` Kaigai Kohei
@ 2004-08-24 15:37 ` Stephen Smalley
2004-08-25 9:51 ` Kaigai Kohei
2004-08-24 23:02 ` RCU issue with SELinux (Re: SELINUX performance issues) Paul E. McKenney
1 sibling, 1 reply; 31+ messages in thread
From: Stephen Smalley @ 2004-08-24 15:37 UTC (permalink / raw)
To: Kaigai Kohei; +Cc: SELinux-ML(Eng), Linux Kernel ML(Eng), James Morris
On Tue, 2004-08-24 at 03:25, Kaigai Kohei wrote:
> You are right. Indeed, the lock for hash bucket is also necessary
> when avc_insert() is called. I fixed them.
avc_has_perm* can be called from interrupt or bh, e.g. send_sigio or
sock_rcv_skb. So using just spin_lock/spin_unlock rather than
spin_lock_irqsave/restore is unsafe, right?
--
Stephen Smalley <sds@epoch.ncsc.mil>
National Security Agency
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: RCU issue with SELinux (Re: SELINUX performance issues)
2004-08-24 7:25 ` Kaigai Kohei
2004-08-24 15:37 ` Stephen Smalley
@ 2004-08-24 23:02 ` Paul E. McKenney
2004-08-25 9:51 ` Kaigai Kohei
1 sibling, 1 reply; 31+ messages in thread
From: Paul E. McKenney @ 2004-08-24 23:02 UTC (permalink / raw)
To: Kaigai Kohei
Cc: Stephen Smalley, SELinux-ML(Eng), Linux Kernel ML(Eng), James Morris
On Tue, Aug 24, 2004 at 04:25:32PM +0900, Kaigai Kohei wrote:
> Hi Stephen, Thanks for your comments.
>
> > I'm not overly familiar with RCU myself, but the comments in list.h for
> > list_add_rcu suggest that you still need to hold a lock to avoid racing
> > with another list_add_rcu or list_del_rcu call on the same list. But
> > avc_insert is calling list_add_rcu without holding any lock; can't it
> > race with another avc_insert on the same hash bucket? Do I just
> > misunderstand, or is this unsafe? Thanks for clarifying.
>
> You are right. Indeed, the lock for hash bucket is also necessary
> when avc_insert() is called. I fixed them.
>
> > I think we can likely eliminate the mutation of the node in the
> > !selinux_enforcing case in avc_has_perm_noaudit, i.e. eliminate the
> > entire else clause and just fall through with rc still 0. Adding the
> > requested permissions to the node was simply to avoid flooding denials
> > in permissive mode on the same permission check, but this can be
> > addressed separately using the audit ratelimit mechanism.
>
> I have another opinion.
> This simple mechanism against flooding of the audit log is necessary,
> because it prevents depletion of the system log buffer and denial messages
> spilling all over the console when we are debugging the security policy in permissive mode.
> So, I improved the avc_update_node() function and avc_node data structure.
> avc_update_node() no longer needs kmalloc().
>
> This approach preserves the durability and compatibility of the original implementation,
> but twice the memory for avc_nodes is needed to update without kmalloc().
> This approach can apply to any kind of update of an avc_entry.
> This idea is rather more complex, though.
>
> I modified the following points:
> - We hold the lock for the hash bucket when avc_insert() and avc_ss_reset() are
> called, for safety.
> - list_for_each_rcu() and list_entry() are replaced by list_for_each_entry().
One subtlety here...
The traversals that are protected by rcu_read_lock() (rather than an
update-side spinlock) need to be list_for_each_entry_rcu() rather than
list_for_each_entry(). The "_rcu()" is required in order to work
reliably on Alpha, and has the added benefit of calling out exactly
which traversals are RCU-protected.
Update-side code remains list_for_each_entry().
> - The avc_node_dual structure, which contains two avc_node objects, is defined.
> It allows avc_update_node() to run without kmalloc() or any locks.
What happens when you have two consecutive updates to the same object?
Don't you have to defer the second update until a grace period has
elapsed since the first update in order to avoid confusing readers that
are still accessing the original version?
One way to do this would be to set a "don't-touch-me" bit that is
cleared by an RCU callback. An update to an element with the
"don't-touch-me" bit set would block until the bit clears. There
are probably better ways...
Thanx, Paul
> Any comments please. Thanks.
> --------
> Kai Gai <kaigai@ak.jp.nec.com>
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: RCU issue with SELinux (Re: SELINUX performance issues)
2004-08-24 15:37 ` Stephen Smalley
@ 2004-08-25 9:51 ` Kaigai Kohei
2004-08-25 15:50 ` Stephen Smalley
0 siblings, 1 reply; 31+ messages in thread
From: Kaigai Kohei @ 2004-08-25 9:51 UTC (permalink / raw)
To: Stephen Smalley; +Cc: SELinux-ML(Eng), Linux Kernel ML(Eng), James Morris
[-- Attachment #1: Type: text/plain, Size: 1047 bytes --]
Hi Stephen, thanks for your comment.
> > You are right. Indeed, the lock for hash bucket is also necessary
> > when avc_insert() is called. I fixed them.
>
> avc_has_perm* can be called from interrupt or bh, e.g. send_sigio or
> sock_rcv_skb. So using just spin_lock/spin_unlock rather than
> spin_lock_irqsave/restore is unsafe, right?
Indeed, spin_lock/spin_unlock should be replaced by spin_lock_irqsave/restore.
I fixed it.
The attached take3-patch is modified as follows:
- avc_node_dual was eliminated, per Paul E. McKenney's suggestion.
avc_update_node() calls kmalloc() and may return -ENOMEM.
(But I think this effect is quite limited.)
- All list_for_each_entry() were replaced by list_for_each_entry_rcu().
- All spin_lock()/spin_unlock() were replaced by spin_lock_irqsave()
/spin_unlock_irqrestore().
- In avc_node_insert(), if an entry with the same ssid/tsid/tclass as new
one exists, the older entry is replaced by the new one.
Thanks. I hope this will be the final version. :)
--------
Kai Gai <kaigai@ak.jp.nec.com>
[-- Attachment #2: list_replace_rcu-2.6.8.1.patch --]
[-- Type: application/octet-stream, Size: 845 bytes --]
--- linux-2.6.8.1/include/linux/list.h 2004-08-14 19:55:33.000000000 +0900
+++ linux-2.6.8.1.rcu/include/linux/list.h 2004-08-20 18:04:10.000000000 +0900
@@ -194,8 +194,23 @@
__list_del(entry->prev, entry->next);
entry->prev = LIST_POISON2;
}
+/*
+ * list_replace_rcu - replace an old entry with a new one in the list
+ * @old : the element to be replaced from the list.
+ * @new : the new element to insert to the list.
+ *
+ * The old entry is atomically replaced by the new entry.
+ */
+static inline void list_replace_rcu(struct list_head *old, struct list_head *new){
+ new->next = old->next;
+ new->prev = old->prev;
+ smp_wmb();
+ new->next->prev = new;
+ new->prev->next = new;
+}
+
/**
* list_del_init - deletes entry from list and reinitialize it.
* @entry: the element to delete from the list.
*/
[-- Attachment #3: selinux.rcu-2.6.8.1-take3.patch --]
[-- Type: application/octet-stream, Size: 21275 bytes --]
diff -rNU4 linux-2.6.8.1/security/selinux/avc.c linux-2.6.8.1.rcu/security/selinux/avc.c
--- linux-2.6.8.1/security/selinux/avc.c 2004-08-14 19:55:48.000000000 +0900
+++ linux-2.6.8.1.rcu/security/selinux/avc.c 2004-08-25 15:37:20.000000000 +0900
@@ -35,28 +35,32 @@
#include "av_perm_to_string.h"
#include "objsec.h"
#define AVC_CACHE_SLOTS 512
-#define AVC_CACHE_MAXNODES 410
+#define AVC_CACHE_THRESHOLD 410
+#define AVC_CACHE_RECLAIM 16
struct avc_entry {
u32 ssid;
u32 tsid;
u16 tclass;
struct av_decision avd;
- int used; /* used recently */
+ atomic_t used; /* used recently */
};
struct avc_node {
struct avc_entry ae;
- struct avc_node *next;
+ struct list_head list;
+ struct rcu_head rhead;
};
struct avc_cache {
- struct avc_node *slots[AVC_CACHE_SLOTS];
- u32 lru_hint; /* LRU hint for reclaim scan */
- u32 active_nodes;
- u32 latest_notif; /* latest revocation notification */
+ struct list_head slots[AVC_CACHE_SLOTS];
+ spinlock_t slots_lock[AVC_CACHE_SLOTS];
+ /* lock for insert/update/delete and reset */
+ atomic_t lru_hint; /* LRU hint for reclaim scan */
+ atomic_t active_nodes;
+ u32 latest_notif; /* latest revocation notification */
};
struct avc_callback_node {
int (*callback) (u32 event, u32 ssid, u32 tsid,
@@ -69,10 +73,8 @@
u32 perms;
struct avc_callback_node *next;
};
-static spinlock_t avc_lock = SPIN_LOCK_UNLOCKED;
-static struct avc_node *avc_node_freelist;
static struct avc_cache avc_cache;
static unsigned avc_cache_stats[AVC_NSTATS];
static struct avc_callback_node *avc_callbacks;
@@ -187,23 +189,17 @@
* Initialize the access vector cache.
*/
void __init avc_init(void)
{
- struct avc_node *new;
int i;
- for (i = 0; i < AVC_CACHE_MAXNODES; i++) {
- new = kmalloc(sizeof(*new), GFP_ATOMIC);
- if (!new) {
- printk(KERN_WARNING "avc: only able to allocate "
- "%d entries\n", i);
- break;
- }
- memset(new, 0, sizeof(*new));
- new->next = avc_node_freelist;
- avc_node_freelist = new;
- }
-
+ for (i =0; i < AVC_CACHE_SLOTS; i++) {
+ INIT_LIST_HEAD(&avc_cache.slots[i]);
+ avc_cache.slots_lock[i] = SPIN_LOCK_UNLOCKED;
+ }
+ atomic_set(&avc_cache.active_nodes, 0);
+ atomic_set(&avc_cache.lru_hint, 0);
+
audit_log(current->audit_context, "AVC INITIALIZED\n");
}
#if 0
@@ -212,27 +208,24 @@
int i, chain_len, max_chain_len, slots_used;
struct avc_node *node;
unsigned long flags;
- spin_lock_irqsave(&avc_lock,flags);
+ rcu_read_lock();
slots_used = 0;
max_chain_len = 0;
for (i = 0; i < AVC_CACHE_SLOTS; i++) {
- node = avc_cache.slots[i];
- if (node) {
+ if (!list_empty(&avc_cache.slots[i])) {
slots_used++;
chain_len = 0;
- while (node) {
+ list_for_each_entry_rcu(node, &avc_cache.slots[i], list)
chain_len++;
- node = node->next;
- }
if (chain_len > max_chain_len)
max_chain_len = chain_len;
}
}
- spin_unlock_irqrestore(&avc_lock,flags);
+ rcu_read_unlock();
printk(KERN_INFO "\n");
printk(KERN_INFO "%s avc: %d entries and %d/%d buckets used, longest "
"chain length %d\n", tag, avc_cache.active_nodes, slots_used,
@@ -242,188 +235,207 @@
static inline void avc_hash_eval(char *tag)
{ }
#endif
-static inline struct avc_node *avc_reclaim_node(void)
+static void avc_node_free(struct rcu_head *rhead) {
+ struct avc_node *node;
+ node = container_of(rhead, struct avc_node, rhead);
+ kfree(node);
+}
+
+static inline int avc_reclaim_node(void)
{
- struct avc_node *prev, *cur;
- int hvalue, try;
+ struct avc_node *node;
+ int hvalue, try, ecx;
- hvalue = avc_cache.lru_hint;
- for (try = 0; try < 2; try++) {
- do {
- prev = NULL;
- cur = avc_cache.slots[hvalue];
- while (cur) {
- if (!cur->ae.used)
- goto found;
+ for (try = 0, ecx = 0; try < AVC_CACHE_SLOTS; try++ ) {
+ hvalue = atomic_inc_return(&avc_cache.lru_hint) % AVC_CACHE_SLOTS;
- cur->ae.used = 0;
+ if (!spin_trylock(&avc_cache.slots_lock[hvalue]))
+ continue;
- prev = cur;
- cur = cur->next;
+ list_for_each_entry_rcu(node, &avc_cache.slots[hvalue], list) {
+ if (!atomic_dec_and_test(&node->ae.used)) {
+ /* Recently Unused */
+ list_del_rcu(&node->list);
+ call_rcu(&node->rhead, avc_node_free);
+ atomic_dec(&avc_cache.active_nodes);
+ ecx++;
+ if (ecx >= AVC_CACHE_RECLAIM) {
+ spin_unlock(&avc_cache.slots_lock[hvalue]);
+ goto out;
+ }
}
- hvalue = (hvalue + 1) & (AVC_CACHE_SLOTS - 1);
- } while (hvalue != avc_cache.lru_hint);
+ }
+ spin_unlock(&avc_cache.slots_lock[hvalue]);
}
-
- panic("avc_reclaim_node");
-
-found:
- avc_cache.lru_hint = hvalue;
-
- if (prev == NULL)
- avc_cache.slots[hvalue] = cur->next;
- else
- prev->next = cur->next;
-
- return cur;
+out:
+ return ecx;
}
-static inline struct avc_node *avc_claim_node(u32 ssid,
- u32 tsid, u16 tclass)
+static inline struct avc_node *avc_get_node(void)
{
struct avc_node *new;
- int hvalue;
+ int actives;
- hvalue = avc_hash(ssid, tsid, tclass);
- if (avc_node_freelist) {
- new = avc_node_freelist;
- avc_node_freelist = avc_node_freelist->next;
- avc_cache.active_nodes++;
- } else {
- new = avc_reclaim_node();
- if (!new)
- goto out;
- }
-
- new->ae.used = 1;
- new->ae.ssid = ssid;
- new->ae.tsid = tsid;
- new->ae.tclass = tclass;
- new->next = avc_cache.slots[hvalue];
- avc_cache.slots[hvalue] = new;
+ new = kmalloc(sizeof(struct avc_node), GFP_ATOMIC);
+ if (!new)
+ return NULL;
+
+ INIT_RCU_HEAD(&new->rhead);
+ INIT_LIST_HEAD(&new->list);
+
+ actives = atomic_inc_return(&avc_cache.active_nodes);
+ if (actives > AVC_CACHE_THRESHOLD)
+ avc_reclaim_node();
-out:
return new;
}
static inline struct avc_node *avc_search_node(u32 ssid, u32 tsid,
u16 tclass, int *probes)
{
- struct avc_node *cur;
+ struct avc_node *node, *ret = NULL;
int hvalue;
int tprobes = 1;
hvalue = avc_hash(ssid, tsid, tclass);
- cur = avc_cache.slots[hvalue];
- while (cur != NULL &&
- (ssid != cur->ae.ssid ||
- tclass != cur->ae.tclass ||
- tsid != cur->ae.tsid)) {
+ list_for_each_entry_rcu(node, &avc_cache.slots[hvalue], list) {
+ if (ssid == node->ae.ssid &&
+ tclass == node->ae.tclass &&
+ tsid == node->ae.tsid) {
+ ret = node;
+ break;
+ }
tprobes++;
- cur = cur->next;
}
- if (cur == NULL) {
+ if (ret == NULL) {
/* cache miss */
goto out;
}
/* cache hit */
if (probes)
*probes = tprobes;
-
- cur->ae.used = 1;
-
+ if (atomic_read(&ret->ae.used) != 1)
+ atomic_set(&ret->ae.used, 1);
out:
- return cur;
+ return ret;
}
/**
* avc_lookup - Look up an AVC entry.
* @ssid: source security identifier
* @tsid: target security identifier
* @tclass: target security class
* @requested: requested permissions, interpreted based on @tclass
- * @aeref: AVC entry reference
*
* Look up an AVC entry that is valid for the
* @requested permissions between the SID pair
* (@ssid, @tsid), interpreting the permissions
* based on @tclass. If a valid AVC entry exists,
- * then this function updates @aeref to refer to the
- * entry and returns %0. Otherwise, this function
- * returns -%ENOENT.
+ * then this function return the avc_node.
+ * Otherwise, this function returns NULL.
*/
-int avc_lookup(u32 ssid, u32 tsid, u16 tclass,
- u32 requested, struct avc_entry_ref *aeref)
+struct avc_node *avc_lookup(u32 ssid, u32 tsid, u16 tclass, u32 requested)
{
struct avc_node *node;
- int probes, rc = 0;
+ int probes;
avc_cache_stats_incr(AVC_CAV_LOOKUPS);
node = avc_search_node(ssid, tsid, tclass,&probes);
if (node && ((node->ae.avd.decided & requested) == requested)) {
avc_cache_stats_incr(AVC_CAV_HITS);
avc_cache_stats_add(AVC_CAV_PROBES,probes);
- aeref->ae = &node->ae;
goto out;
}
avc_cache_stats_incr(AVC_CAV_MISSES);
- rc = -ENOENT;
out:
- return rc;
+ return node;
+}
+
+static int avc_latest_notif_update(int seqno, int is_insert)
+{
+ int ret = 0;
+ static spinlock_t notif_lock = SPIN_LOCK_UNLOCKED;
+ unsigned long flag;
+
+ spin_lock_irqsave(¬if_lock, flag);
+ if (seqno < avc_cache.latest_notif) {
+ if (is_insert) {
+ printk(KERN_WARNING "avc: seqno %d < latest_notif %d\n",
+ seqno, avc_cache.latest_notif);
+ ret = -EAGAIN;
+ } else {
+ avc_cache.latest_notif = seqno;
+ }
+ }
+ spin_unlock_irqrestore(¬if_lock, flag);
+ return ret;
}
/**
* avc_insert - Insert an AVC entry.
* @ssid: source security identifier
* @tsid: target security identifier
* @tclass: target security class
* @ae: AVC entry
- * @aeref: AVC entry reference
*
* Insert an AVC entry for the SID pair
* (@ssid, @tsid) and class @tclass.
* The access vectors and the sequence number are
* normally provided by the security server in
* response to a security_compute_av() call. If the
* sequence number @ae->avd.seqno is not less than the latest
* revocation notification, then the function copies
- * the access vectors into a cache entry, updates
- * @aeref to refer to the entry, and returns %0.
- * Otherwise, this function returns -%EAGAIN.
+ * the access vectors into a cache entry, returns
+ * avc_node inserted. Otherwise, this function returns NULL.
*/
-int avc_insert(u32 ssid, u32 tsid, u16 tclass,
- struct avc_entry *ae, struct avc_entry_ref *aeref)
+struct avc_node *avc_insert(u32 ssid, u32 tsid, u16 tclass, struct avc_entry *ae)
{
- struct avc_node *node;
- int rc = 0;
+ struct avc_node *pos, *node = NULL;
+ int hvalue;
+ unsigned long flag;
- if (ae->avd.seqno < avc_cache.latest_notif) {
- printk(KERN_WARNING "avc: seqno %d < latest_notif %d\n",
- ae->avd.seqno, avc_cache.latest_notif);
- rc = -EAGAIN;
+ if (avc_latest_notif_update(ae->avd.seqno, 1))
goto out;
- }
- node = avc_claim_node(ssid, tsid, tclass);
- if (!node) {
- rc = -ENOMEM;
- goto out;
- }
+ node = avc_get_node();
+
+ if (node) {
+ hvalue = avc_hash(ssid, tsid, tclass);
- node->ae.avd.allowed = ae->avd.allowed;
- node->ae.avd.decided = ae->avd.decided;
- node->ae.avd.auditallow = ae->avd.auditallow;
- node->ae.avd.auditdeny = ae->avd.auditdeny;
- node->ae.avd.seqno = ae->avd.seqno;
- aeref->ae = &node->ae;
+ node->ae.ssid = ssid;
+ node->ae.tsid = tsid;
+ node->ae.tclass = tclass;
+ atomic_set(&node->ae.used, 1);
+
+ node->ae.avd.allowed = ae->avd.allowed;
+ node->ae.avd.decided = ae->avd.decided;
+ node->ae.avd.auditallow = ae->avd.auditallow;
+ node->ae.avd.auditdeny = ae->avd.auditdeny;
+ node->ae.avd.seqno = ae->avd.seqno;
+
+ spin_lock_irqsave(&avc_cache.slots_lock[hvalue], flag);
+ list_for_each_entry_rcu(pos, &avc_cache.slots[hvalue], list) {
+ if (pos->ae.ssid == ssid &&
+ pos->ae.tsid == tsid &&
+ pos->ae.tclass == tclass) {
+ list_replace_rcu(&pos->list, &node->list);
+ call_rcu(&pos->rhead, avc_node_free);
+ atomic_dec(&avc_cache.active_nodes);
+ goto found;
+ }
+ }
+ list_add_rcu(&node->list, &avc_cache.slots[hvalue]);
+found:
+ spin_unlock_irqrestore(&avc_cache.slots_lock[hvalue], flag);
+ }
out:
- return rc;
+ return node;
}
static inline void avc_print_ipv6_addr(struct audit_buffer *ab,
struct in6_addr *addr, u16 port,
@@ -685,10 +697,51 @@
{
return (x == y || x == SECSID_WILD || y == SECSID_WILD);
}
-static inline void avc_update_node(u32 event, struct avc_node *node, u32 perms)
+/**
+ * avc_update_node - update an AVC entry
+ * @event : Updating event
+ * @perms : Permission mask bits
+ * @ssid,@tsid,@tclass : identifier of an AVC entry
+ *
+ * If a valid AVC entry doesn't exist, this function returns -ENOENT.
+ * If the internal kmalloc() returns NULL, this function returns -ENOMEM.
+ * Otherwise, this function updates the AVC entry. The original avc_node
+ * object will be released later by RCU.
+ */
+static int avc_update_node(u32 event, u32 perms, u32 ssid, u32 tsid, u16 tclass)
{
+ int hvalue, rc = 0;
+ unsigned long flag;
+ struct avc_node *pos, *node, *org = NULL;
+
+ /* Lock the target slot */
+ hvalue = avc_hash(ssid, tsid, tclass);
+ spin_lock_irqsave(&avc_cache.slots_lock[hvalue], flag);
+
+ list_for_each_entry_rcu(pos, &avc_cache.slots[hvalue], list){
+ if ( ssid==pos->ae.ssid &&
+ tsid==pos->ae.tsid &&
+ tclass==pos->ae.tclass ){
+ org = pos;
+ break;
+ }
+ }
+
+ if (!org) {
+ rc = -ENOENT;
+ goto out;
+ }
+
+ node = kmalloc(sizeof(struct avc_node), GFP_ATOMIC);
+ if (!node) {
+ rc = -ENOMEM;
+ goto out;
+ }
+
+ /* Duplicate and Update */
+ memcpy(node, org, sizeof(struct avc_node));
switch (event) {
case AVC_CALLBACK_GRANT:
node->ae.avd.allowed |= perms;
break;
@@ -708,40 +761,41 @@
case AVC_CALLBACK_AUDITDENY_DISABLE:
node->ae.avd.auditdeny &= ~perms;
break;
}
+ list_replace_rcu(&org->list,&node->list);
+out:
+ spin_unlock_irqrestore(&avc_cache.slots_lock[hvalue], flag);
+
+ return rc;
}
static int avc_update_cache(u32 event, u32 ssid, u32 tsid,
u16 tclass, u32 perms)
{
struct avc_node *node;
int i;
- unsigned long flags;
- spin_lock_irqsave(&avc_lock,flags);
+ rcu_read_lock();
if (ssid == SECSID_WILD || tsid == SECSID_WILD) {
/* apply to all matching nodes */
for (i = 0; i < AVC_CACHE_SLOTS; i++) {
- for (node = avc_cache.slots[i]; node;
- node = node->next) {
+ list_for_each_entry_rcu(node, &avc_cache.slots[i], list) {
if (avc_sidcmp(ssid, node->ae.ssid) &&
avc_sidcmp(tsid, node->ae.tsid) &&
- tclass == node->ae.tclass) {
- avc_update_node(event,node,perms);
+ tclass == node->ae.tclass ) {
+ avc_update_node(event, perms, node->ae.ssid
+ ,node->ae.tsid, node->ae.tclass);
}
}
}
} else {
/* apply to one node */
- node = avc_search_node(ssid, tsid, tclass, NULL);
- if (node) {
- avc_update_node(event,node,perms);
- }
+ avc_update_node(event, perms, ssid, tsid, tclass);
}
- spin_unlock_irqrestore(&avc_lock,flags);
+ rcu_read_unlock();
return 0;
}
@@ -751,9 +805,8 @@
{
struct avc_callback_node *c;
u32 tretained = 0, cretained = 0;
int rc = 0;
- unsigned long flags;
/*
* try_revoke only removes permissions from the cache
* state if they are not retained by the object manager.
@@ -786,12 +839,9 @@
avc_update_cache(event,ssid,tsid,tclass,perms);
*out_retained = tretained;
}
- spin_lock_irqsave(&avc_lock,flags);
- if (seqno > avc_cache.latest_notif)
- avc_cache.latest_notif = seqno;
- spin_unlock_irqrestore(&avc_lock,flags);
+ avc_latest_notif_update(seqno, 0);
out:
return rc;
}
@@ -856,34 +906,23 @@
int avc_ss_reset(u32 seqno)
{
struct avc_callback_node *c;
int i, rc = 0;
- struct avc_node *node, *tmp;
- unsigned long flags;
+ unsigned long flag;
+ struct avc_node *node;
avc_hash_eval("reset");
- spin_lock_irqsave(&avc_lock,flags);
-
+ rcu_read_lock();
for (i = 0; i < AVC_CACHE_SLOTS; i++) {
- node = avc_cache.slots[i];
- while (node) {
- tmp = node;
- node = node->next;
- tmp->ae.ssid = tmp->ae.tsid = SECSID_NULL;
- tmp->ae.tclass = SECCLASS_NULL;
- tmp->ae.avd.allowed = tmp->ae.avd.decided = 0;
- tmp->ae.avd.auditallow = tmp->ae.avd.auditdeny = 0;
- tmp->ae.used = 0;
- tmp->next = avc_node_freelist;
- avc_node_freelist = tmp;
- avc_cache.active_nodes--;
+ spin_lock_irqsave(&avc_cache.slots_lock[i], flag);
+ list_for_each_entry_rcu(node, &avc_cache.slots[i], list){
+ list_del_rcu(&node->list);
+ call_rcu(&node->rhead, avc_node_free);
}
- avc_cache.slots[i] = NULL;
+ spin_unlock_irqrestore(&avc_cache.slots_lock[i], flag);
}
- avc_cache.lru_hint = 0;
-
- spin_unlock_irqrestore(&avc_lock,flags);
+ rcu_read_unlock();
for (i = 0; i < AVC_NSTATS; i++)
avc_cache_stats[i] = 0;
@@ -895,12 +934,9 @@
goto out;
}
}
- spin_lock_irqsave(&avc_lock,flags);
- if (seqno > avc_cache.latest_notif)
- avc_cache.latest_notif = seqno;
- spin_unlock_irqrestore(&avc_lock,flags);
+ avc_latest_notif_update(seqno, 0);
out:
return rc;
}
@@ -949,9 +985,9 @@
* @ssid: source security identifier
* @tsid: target security identifier
* @tclass: target security class
* @requested: requested permissions, interpreted based on @tclass
- * @aeref: AVC entry reference
+ * @aeref: AVC entry reference (not in use)
* @avd: access vector decisions
*
* Check the AVC to determine whether the @requested permissions are granted
* for the SID pair (@ssid, @tsid), interpreting the permissions
@@ -968,72 +1004,46 @@
int avc_has_perm_noaudit(u32 ssid, u32 tsid,
u16 tclass, u32 requested,
struct avc_entry_ref *aeref, struct av_decision *avd)
{
- struct avc_entry *ae;
- int rc = 0;
- unsigned long flags;
+ struct avc_node *node;
struct avc_entry entry;
+ int rc = 0;
u32 denied;
- struct avc_entry_ref ref;
- if (!aeref) {
- avc_entry_ref_init(&ref);
- aeref = &ref;
- }
-
- spin_lock_irqsave(&avc_lock, flags);
+ rcu_read_lock();
avc_cache_stats_incr(AVC_ENTRY_LOOKUPS);
- ae = aeref->ae;
- if (ae) {
- if (ae->ssid == ssid &&
- ae->tsid == tsid &&
- ae->tclass == tclass &&
- ((ae->avd.decided & requested) == requested)) {
- avc_cache_stats_incr(AVC_ENTRY_HITS);
- ae->used = 1;
- } else {
- avc_cache_stats_incr(AVC_ENTRY_DISCARDS);
- ae = NULL;
- }
- }
- if (!ae) {
- avc_cache_stats_incr(AVC_ENTRY_MISSES);
- rc = avc_lookup(ssid, tsid, tclass, requested, aeref);
- if (rc) {
- spin_unlock_irqrestore(&avc_lock,flags);
- rc = security_compute_av(ssid,tsid,tclass,requested,&entry.avd);
- if (rc)
- goto out;
- spin_lock_irqsave(&avc_lock, flags);
- rc = avc_insert(ssid,tsid,tclass,&entry,aeref);
- if (rc) {
- spin_unlock_irqrestore(&avc_lock,flags);
- goto out;
- }
+ node = avc_lookup(ssid, tsid, tclass, requested);
+ if (!node) {
+ rcu_read_unlock();
+ rc = security_compute_av(ssid,tsid,tclass,requested,&entry.avd);
+ if (rc)
+ goto out;
+ rcu_read_lock();
+ node = avc_insert(ssid,tsid,tclass,&entry);
+ if (!node) {
+ rc = -ENOMEM;
+ rcu_read_unlock();
+ goto out;
}
- ae = aeref->ae;
}
-
if (avd)
- memcpy(avd, &ae->avd, sizeof(*avd));
+ memcpy(avd, &node->ae.avd, sizeof(*avd));
- denied = requested & ~(ae->avd.allowed);
+ denied = requested & ~(node->ae.avd.allowed);
if (!requested || denied) {
if (selinux_enforcing) {
- spin_unlock_irqrestore(&avc_lock,flags);
rc = -EACCES;
- goto out;
} else {
- ae->avd.allowed |= requested;
- spin_unlock_irqrestore(&avc_lock,flags);
- goto out;
+ if (node->ae.avd.allowed != (node->ae.avd.allowed|requested))
+ avc_update_node(AVC_CALLBACK_GRANT
+ ,requested,ssid,tsid,tclass);
}
}
- spin_unlock_irqrestore(&avc_lock,flags);
+ rcu_read_unlock();
out:
return rc;
}
@@ -1042,9 +1052,9 @@
* @ssid: source security identifier
* @tsid: target security identifier
* @tclass: target security class
* @requested: requested permissions, interpreted based on @tclass
- * @aeref: AVC entry reference
+ * @aeref: AVC entry reference (not in use)
* @auditdata: auxiliary audit data
*
* Check the AVC to determine whether the @requested permissions are granted
* for the SID pair (@ssid, @tsid), interpreting the permissions
@@ -1061,8 +1071,8 @@
{
struct av_decision avd;
int rc;
- rc = avc_has_perm_noaudit(ssid, tsid, tclass, requested, aeref, &avd);
+ rc = avc_has_perm_noaudit(ssid, tsid, tclass, requested, NULL, &avd);
avc_audit(ssid, tsid, tclass, requested, &avd, rc, auditdata);
return rc;
}
diff -rNU4 linux-2.6.8.1/security/selinux/include/avc.h linux-2.6.8.1.rcu/security/selinux/include/avc.h
--- linux-2.6.8.1/security/selinux/include/avc.h 2004-08-14 19:54:51.000000000 +0900
+++ linux-2.6.8.1.rcu/security/selinux/include/avc.h 2004-08-20 20:40:50.000000000 +0900
@@ -118,13 +118,11 @@
*/
void __init avc_init(void);
-int avc_lookup(u32 ssid, u32 tsid, u16 tclass,
- u32 requested, struct avc_entry_ref *aeref);
+struct avc_node *avc_lookup(u32 ssid, u32 tsid, u16 tclass, u32 requested);
-int avc_insert(u32 ssid, u32 tsid, u16 tclass,
- struct avc_entry *ae, struct avc_entry_ref *out_aeref);
+struct avc_node *avc_insert(u32 ssid, u32 tsid, u16 tclass, struct avc_entry *ae);
void avc_audit(u32 ssid, u32 tsid,
u16 tclass, u32 requested,
struct av_decision *avd, int result, struct avc_audit_data *auditdata);
* Re: RCU issue with SELinux (Re: SELINUX performance issues)
2004-08-24 23:02 ` RCU issue with SELinux (Re: SELINUX performance issues) Paul E. McKenney
@ 2004-08-25 9:51 ` Kaigai Kohei
2004-08-25 17:34 ` Paul E. McKenney
0 siblings, 1 reply; 31+ messages in thread
From: Kaigai Kohei @ 2004-08-25 9:51 UTC (permalink / raw)
To: paulmck
Cc: Stephen Smalley, SELinux-ML(Eng), Linux Kernel ML(Eng), James Morris
Hi Paul, thanks for your comments.
> > I modified the following points:
> > - We hold the lock for the hash bucket when avc_insert() and avc_ss_reset()
> > are called, for safety.
> > - list_for_each_rcu() and list_entry() were replaced by list_for_each_entry().
>
> One subtlety here...
>
> The traversals that are protected by rcu_read_lock() (rather than an
> update-side spinlock) need to be list_for_each_entry_rcu() rather than
> list_for_each_entry(). The "_rcu()" is required in order to work
> reliably on Alpha, and has the added benefit of calling out exactly
> which traversals are RCU-protected.
>
> Update-side code remains list_for_each_entry().
It was a simple misunderstanding on my part.
I fixed them in the take3-patch.
> > - avc_node_dual structure which contains two avc_node objects is defined.
> > It allows to do avc_update_node() without kmalloc() or any locks.
>
> What happens when you have two consecutive updates to the same object?
> Don't you have to defer the second update until a grace period has
> elapsed since the first update in order to avoid confusing readers that
> are still accessing the original version?
I hadn't considered that situation. Indeed, such a thing may happen.
> One way to do this would be to set a "don't-touch-me" bit that is
> cleared by an RCU callback. An update to an element with the
> "don't-touch-me" bit set would block until the bit clears. There
> are probably better ways...
I don't think we can apply this approach in avc_update_node(),
because the calling context is not permitted to block.
I changed my mind and reworked the implementation of avc_update_node():
if kmalloc() returns NULL, avc_update_node() now returns -ENOMEM.
The effect of this prototype change is limited, because only
avc_has_perm_noaudit() and avc_update_cache() call avc_update_node().
Even if avc_update_node() returns -ENOMEM to avc_has_perm_noaudit(),
avc_has_perm_noaudit() can safely ignore it, because the update only
serves to suppress audit-log floods.
The only adverse effect is that an audit message may be printed twice.
Nobody calls avc_update_cache(); it is merely defined.
Some other trivial fixes are as follows:
- All list_for_each_entry() calls were replaced by list_for_each_entry_rcu().
- All spin_lock()/spin_unlock() calls were replaced by spin_lock_irqsave()/
spin_unlock_irqrestore().
- In avc_node_insert(), if an entry with the same ssid/tsid/tclass as the new
one exists, the older entry is replaced by the new one.
Thank you for your advice as an RCU specialist!
--------
Kai Gai <kaigai@ak.jp.nec.com>
* Re: RCU issue with SELinux (Re: SELINUX performance issues)
2004-08-24 13:24 ` James Morris
@ 2004-08-25 9:51 ` Kaigai Kohei
2004-08-25 18:31 ` James Morris
2004-08-25 9:52 ` [PATCH]atomic_inc_return() for i386/x86_64 (Re: RCU issue with SELinux) Kaigai Kohei
1 sibling, 1 reply; 31+ messages in thread
From: Kaigai Kohei @ 2004-08-25 9:51 UTC (permalink / raw)
To: James Morris; +Cc: Stephen Smalley, SELinux-ML(Eng), Linux Kernel ML(Eng)
Hi James, thanks for your comment.
> > o dbench [ 4 processes run in parallel on 4 CPUs / 10 trials ]
> > ---- mean ---- - STD -
> > 2.6.8.1(disable) 860.249 [MB/s] 44.683
> > 2.6.8.1(enable) 714.254 [MB/s] 32.359
> > 2.6.8.1(+rwlock) 767.904 [MB/s] 27.968
> > 2.6.8.1(+RCU) 830.678 [MB/s] 16.352
>
> Can you show the figures for 1 and 2 clients?
The results are as follows:
o dbench [ 1/2/4 processes run in parallel on 4 CPUs / 10 trials ]
- Average[MB/s] - -1proc- -2procs- -4procs-
2.6.8.1(disable) 247.43 473.11 891.51
2.6.8.1(enable) 218.99 434.97 761.06
2.6.8.1(+rwlock) 225.18 432.80 802.62
2.6.8.1(+RCU) 231.84 444.30 820.00
--------------------------------------------
(*) 2.6.8.1(+RCU) has the take3-patch applied.
By the way, the dbench results vary sharply from run to run,
so I don't think the differences are statistically significant.
--------
Kai Gai <kaigai@ak.jp.nec.com>
* [PATCH]atomic_inc_return() for i386/x86_64 (Re: RCU issue with SELinux)
2004-08-24 13:24 ` James Morris
2004-08-25 9:51 ` Kaigai Kohei
@ 2004-08-25 9:52 ` Kaigai Kohei
1 sibling, 0 replies; 31+ messages in thread
From: Kaigai Kohei @ 2004-08-25 9:52 UTC (permalink / raw)
To: James Morris; +Cc: Stephen Smalley, SELinux-ML(Eng), Linux Kernel ML(Eng)
[-- Attachment #1: Type: text/plain, Size: 963 bytes --]
Hi James, thanks for your comment.
> > On IA-32 or x86_64, can anybody implement atomic_inc_return()?
> > If not, I'll try to make alternative macros or inline functions.
>
> If you can get this done, it will be very useful, as I could also run
> some benchmarks on my test systems.
atomic_inc_return() is not defined for the arm, arm26, i386, x86_64 and um architectures.
The attached patch adds atomic_inc_return() and atomic_dec_return() to arm, i386 and x86_64.
For i386 and x86_64 it is implemented with the 'xaddl' instruction plus a LOCK prefix.
However, this instruction is only available on i486 and later processors,
so another implementation may be necessary for the i386SX/DX processors.
On the other hand, 'xaddl' is already used unconditionally in include/asm-i386/rwsem.h,
so I think using 'xaddl' was agreed upon some time ago.
Is it acceptable? Any comments please.
(*) The ARM implementation is a simple wrapper around atomic_add_return().
--------
Kai Gai <kaigai@ak.jp.nec.com>
[-- Attachment #2: atomic_inc_return-2.6.8.1.patch --]
[-- Type: application/octet-stream, Size: 2739 bytes --]
--- linux-2.6.8.1/include/asm-arm/atomic.h 2004-08-14 19:54:50.000000000 +0900
+++ linux-2.6.8.1.rcu/include/asm-arm/atomic.h 2004-08-24 19:31:56.000000000 +0900
@@ -194,8 +194,10 @@
#define atomic_dec(v) atomic_sub(1, v)
#define atomic_inc_and_test(v) (atomic_add_return(1, v) == 0)
#define atomic_dec_and_test(v) (atomic_sub_return(1, v) == 0)
+#define atomic_inc_return(v) (atomic_add_return(1, v))
+#define atomic_dec_return(v) (atomic_sub_return(1, v))
#define atomic_add_negative(i,v) (atomic_add_return(i, v) < 0)
/* Atomic operations are already serializing on ARM */
--- linux-2.6.8.1/include/asm-i386/atomic.h 2004-08-14 19:55:09.000000000 +0900
+++ linux-2.6.8.1.rcu/include/asm-i386/atomic.h 2004-08-25 11:57:08.000000000 +0900
@@ -175,8 +175,34 @@
:"ir" (i), "m" (v->counter) : "memory");
return c;
}
+/**
+ * atomic_add_return - add and return
+ * @v: pointer of type atomic_t
+ * @i: integer value to add
+ *
+ * Atomically adds @i to @v and returns @i + @v
+ */
+static __inline__ int atomic_add_return(int i, atomic_t *v)
+{
+ /* NOTICE: This function can be used on i486+ */
+ int __i = i;
+ __asm__ __volatile__(
+ LOCK "xaddl %0, %1;"
+ :"=r"(i)
+ :"m"(v->counter), "0"(i));
+ return i + __i;
+}
+
+static __inline__ int atomic_sub_return(int i, atomic_t *v)
+{
+ return atomic_add_return(-i,v);
+}
+
+#define atomic_inc_return(v) (atomic_add_return(1,v))
+#define atomic_dec_return(v) (atomic_sub_return(1,v))
+
/* These are x86-specific, used by some header files */
#define atomic_clear_mask(mask, addr) \
__asm__ __volatile__(LOCK "andl %0,%1" \
: : "r" (~(mask)),"m" (*addr) : "memory")
--- linux-2.6.8.1/include/asm-x86_64/atomic.h 2004-08-14 19:56:23.000000000 +0900
+++ linux-2.6.8.1.rcu/include/asm-x86_64/atomic.h 2004-08-25 11:57:36.000000000 +0900
@@ -177,8 +177,33 @@
:"ir" (i), "m" (v->counter) : "memory");
return c;
}
+/**
+ * atomic_add_return - add and return
+ * @v: pointer of type atomic_t
+ * @i: integer value to add
+ *
+ * Atomically adds @i to @v and returns @i + @v
+ */
+static __inline__ int atomic_add_return(int i, atomic_t *v)
+{
+ int __i = i;
+ __asm__ __volatile__(
+ LOCK "xaddl %0, %1;"
+ :"=r"(i)
+ :"m"(v->counter), "0"(i));
+ return i + __i;
+}
+
+static __inline__ int atomic_sub_return(int i, atomic_t *v)
+{
+ return atomic_add_return(-i,v);
+}
+
+#define atomic_inc_return(v) (atomic_add_return(1,v))
+#define atomic_dec_return(v) (atomic_sub_return(1,v))
+
/* These are x86-specific, used by some header files */
#define atomic_clear_mask(mask, addr) \
__asm__ __volatile__(LOCK "andl %0,%1" \
: : "r" (~(mask)),"m" (*addr) : "memory")
* Re: RCU issue with SELinux (Re: SELINUX performance issues)
2004-08-25 9:51 ` Kaigai Kohei
@ 2004-08-25 15:50 ` Stephen Smalley
2004-08-25 16:11 ` Stephen Smalley
2004-08-26 7:53 ` Kaigai Kohei
0 siblings, 2 replies; 31+ messages in thread
From: Stephen Smalley @ 2004-08-25 15:50 UTC (permalink / raw)
To: Kaigai Kohei; +Cc: SELinux-ML(Eng), Linux Kernel ML(Eng), James Morris
On Wed, 2004-08-25 at 05:51, Kaigai Kohei wrote:
> The attached take3-patch is modified as follows:
> - avc_node_dual was eliminated, per Paul E. McKenney's suggestion.
> avc_update_node() now calls kmalloc() and may return -ENOMEM.
> (But I think this effect is quite limited.)
> - All list_for_each_entry() calls were replaced by list_for_each_entry_rcu().
> - All spin_lock()/spin_unlock() calls were replaced by spin_lock_irqsave()/
> spin_unlock_irqrestore().
> - In avc_node_insert(), if an entry with the same ssid/tsid/tclass as the new
> one exists, the older entry is replaced by the new one.
>
> Thanks. I hope this will be the final version. :)
I haven't tracked down the cause yet, but a kernel built with all three
patches (list_replace_rcu, atomic_inc_return, and selinux.rcu take3) on
x86 doesn't allow an enforcing boot; it begins auditing denials _before_
the initial policy load (which should never happen, as
security_compute_av allows everything until the policy is loaded), and
prevents /sbin/init from loading the policy.
A few other comments on the patch:
+ new = kmalloc(sizeof(struct avc_node), GFP_ATOMIC);
+ if (!new)
+ return NULL;
Dynamically allocating the nodes at runtime (rather than pre-allocating
them and then just reclaiming them as necessary as in the current AVC)
worries me, as it introduces a new failure case for avc_has_perm.
Denying permission to a resource due to transient memory shortage is not
good for robustness. And changing the GFP_ATOMIC is not an option, as
calling context may not allow blocking. Hence, pre-allocation seems
desirable, regardless of the locking scheme.
+static int avc_latest_notif_update(int seqno, int is_insert)
+{
+ int ret = 0;
+ static spinlock_t notif_lock = SPIN_LOCK_UNLOCKED;
+ unsigned long flag;
+
+ spin_lock_irqsave(¬if_lock, flag);
+ if (seqno < avc_cache.latest_notif) {
+ if (is_insert) {
+ printk(KERN_WARNING "avc: seqno %d < latest_notif %d\n",
+ seqno, avc_cache.latest_notif);
+ ret = -EAGAIN;
+ } else {
+ avc_cache.latest_notif = seqno;
+ }
+ }
+ spin_unlock_irqrestore(¬if_lock, flag);
+ return ret;
}
In trying to merge the logic related to latest_notif, you've introduced
a bug - latest_notif should only be increased, never decreased. See the
original logic from avc_control and avc_ss_reset prior to your patch.
Those functions update the latest notif based on a policy change event.
In the insert case, you are checking that the entry is not stale, i.e.
has a smaller seqno than the latest notification due to an interleaving
with a policy change event.
+ if (node->ae.avd.allowed != (node->ae.avd.allowed|requested))
+ avc_update_node(AVC_CALLBACK_GRANT
+ ,requested,ssid,tsid,tclass);
}
The test seems unnecessary, as the function has already determined that
not all of the requested permissions were granted, so you should be able
to just unconditionally call avc_update_node here, and you only need to
pass it the denied set that has already been computed, as any other
permissions in requested were already allowed.
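The bit arithmetic behind this observation can be checked in isolation (a sketch with invented names, not the patch code): the denied set is exactly the requested bits not already allowed, so granting just the denied set yields the same result as granting all of requested.

```c
#include <assert.h>
#include <stdint.h>

/* The denied set identifies precisely the permissions that still need
 * to be granted; everything else in @requested was already allowed. */
typedef uint32_t access_vector_t;

static access_vector_t compute_denied(access_vector_t requested,
                                      access_vector_t allowed)
{
    return requested & ~allowed;
}
```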
--
Stephen Smalley <sds@epoch.ncsc.mil>
National Security Agency
* Re: RCU issue with SELinux (Re: SELINUX performance issues)
2004-08-25 15:50 ` Stephen Smalley
@ 2004-08-25 16:11 ` Stephen Smalley
2004-08-26 7:53 ` Kaigai Kohei
1 sibling, 0 replies; 31+ messages in thread
From: Stephen Smalley @ 2004-08-25 16:11 UTC (permalink / raw)
To: Kaigai Kohei; +Cc: SELinux-ML(Eng), Linux Kernel ML(Eng), James Morris
On Wed, 2004-08-25 at 11:50, Stephen Smalley wrote:
> I haven't tracked down the cause yet, but a kernel built with all three
> patches (list_replace_rcu, atomic_inc_return, and selinux.rcu take3) on
> x86 doesn't allow an enforcing boot; it begins auditing denials _before_
> the initial policy load (which should never happen, as
> security_compute_av allows everything until the policy is loaded), and
> prevents /sbin/init from loading the policy.
-int avc_lookup(u32 ssid, u32 tsid, u16 tclass,
- u32 requested, struct avc_entry_ref *aeref)
+struct avc_node *avc_lookup(u32 ssid, u32 tsid, u16 tclass, u32 requested)
{
struct avc_node *node;
- int probes, rc = 0;
+ int probes;
avc_cache_stats_incr(AVC_CAV_LOOKUPS);
node = avc_search_node(ssid, tsid, tclass,&probes);
if (node && ((node->ae.avd.decided & requested) == requested)) {
avc_cache_stats_incr(AVC_CAV_HITS);
avc_cache_stats_add(AVC_CAV_PROBES,probes);
- aeref->ae = &node->ae;
goto out;
}
avc_cache_stats_incr(AVC_CAV_MISSES);
- rc = -ENOENT;
out:
- return rc;
+ return node;
+}
Ah, I think the bug is here. avc_search_node() can return a node that
matches the (ssid,tsid,tclass) triple but doesn't include all of the
necessary decisions (the decided vector), but your avc_lookup() code
falls through nonetheless and returns it. This happens normally during
initialization prior to policy load, where security_compute_av is only
adding decisions as they are requested. Notice that the original code
set rc to -ENOENT on that path; in your case, you need to reset node to
NULL.
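The miss condition Stephen describes can be modeled in a few lines (an illustrative sketch, not the actual avc_lookup(); the struct and function names are invented): a node whose decided vector does not cover the requested permissions must be treated as a cache miss, not returned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A key-matching node with an incomplete ->decided vector is a miss:
 * the caller must receive NULL, not a partially-decided node. */
struct model_node {
    uint32_t decided;
};

static struct model_node *lookup_check(struct model_node *node,
                                       uint32_t requested)
{
    if (node && (node->decided & requested) == requested)
        return node;  /* cache hit */
    return NULL;      /* miss: do NOT fall through with node set */
}
```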
--
Stephen Smalley <sds@epoch.ncsc.mil>
National Security Agency
* Re: RCU issue with SELinux (Re: SELINUX performance issues)
2004-08-25 9:51 ` Kaigai Kohei
@ 2004-08-25 17:34 ` Paul E. McKenney
0 siblings, 0 replies; 31+ messages in thread
From: Paul E. McKenney @ 2004-08-25 17:34 UTC (permalink / raw)
To: Kaigai Kohei
Cc: Stephen Smalley, SELinux-ML(Eng), Linux Kernel ML(Eng), James Morris
On Wed, Aug 25, 2004 at 06:51:53PM +0900, Kaigai Kohei wrote:
Hello again, Kai,
> Hi Paul, thanks for your comments.
>
> > > I modified the following points:
> > > - We hold the lock for the hash bucket when avc_insert() and avc_ss_reset() are
> > > called for safety.
> > > - list_for_each_rcu() and list_entry() are replaced by list_for_entry().
> >
> > One subtlety here...
> >
> > The traversals that are protected by rcu_read_lock() (rather than an
> > update-side spinlock) need to be list_for_each_entry_rcu() rather than
> > list_for_each_entry(). The "_rcu()" is required in order to work
> > reliably on Alpha, and has the added benefit of calling out exactly
> > which traversals are RCU-protected.
> >
> > Update-side code remains list_for_each_entry().
>
> That was a simple misconception on my part.
> I fixed them in the take-3 patch.
Let's take them one at a time:
1. The list_for_each_entry_rcu() in avc_hash_eval() is correct.
It is protected by rcu_read_lock(), so it is documenting
a RCU-protected traversal of the list, and inserting needed
memory barriers on Alpha CPUs.
2. The list_for_each_entry_rcu() in avc_reclaim_node() should
instead be list_for_each_entry(), with no _rcu(). This is
because this traversal is guarded by the update-side lock,
not by RCU. The fact that the lock is held means that no
other CPU can be changing this list during the traversal.
3. The list_for_each_entry_rcu() in avc_search_node() is correct.
As in item #1, it is protected by rcu_read_lock().
4. The list_for_each_entry_rcu() in avc_insert() is more interesting,
but needs to be list_for_each_entry(). It is protected by -both-
the spinlock and by an rcu_read_lock() acquired by the caller!
However, the spinlock prevents any other CPU from modifying the
data structure, so this should be list_for_each_entry() rather
than the current list_for_each_entry_rcu().
What is happening is that the code is upgrading from a read-side
rcu_read_lock() to an update-side spin_lock_irqsave(). This
is legal, though perhaps a bit unconventional.
However, looking closer, I don't understand why the rcu_read_lock()
needs to be re-acquired before the call to avc_insert() in
avc_has_perm_noaudit():
o avc_latest_notif_update() allocates a sequence number
under a lock.
o avc_get_node() allocates and initializes a node. It does
call avc_reclaim_node(), but this function is guarded
by a spinlock.
o avc_insert() calls the above two functions, and other
than that, initializes the new node (which no other CPU
has access to), and inserts it under the spinlock.
So I do not believe that avc_insert() needs rcu_read_lock().
Unless I am missing something, the rcu_read_lock() acquired
in avc_has_perm_noaudit() should be moved after the call to
avc_insert().
I was concerned about the possibility that avc_has_perm_noaudit()
might be preempted between the rcu_read_unlock() and the
rcu_read_lock(), but this issue appears to be correctly handled
by the code in avc_insert(), which handles either case, updating
the existing node or inserting a new one.
5. The list_for_each_entry_rcu() in avc_update_node() should be
changed to list_for_each_entry(), since it is protected
by the update-side spinlock, and therefore does not need
RCU protection, as in item #2 above. This is another case
where an rcu_read_lock() is "upgraded" to a write-side lock
by acquiring a spinlock.
6. The list_for_each_entry_rcu() in avc_update_cache() is correct.
It is protected only by rcu_read_lock(), so RCU protection is
required.
7. The list_for_each_entry_rcu() in avc_ss_reset() needs to be
changed to list_for_each_entry(), since it is protected by
the update-side spinlock. Since the array is statically allocated,
there is no need for the enclosing rcu_read_lock() and
rcu_read_unlock(), and they should be removed.
I clearly need to provide more documentation. ;-) This is in progress.
See below for a patch that applies on top of your take3 patch, since
the C code might be more clear than the English in this case.
> > > - avc_node_dual structure which contains two avc_node objects is defined.
> > > It allows to do avc_update_node() without kmalloc() or any locks.
> >
> > What happens when you have two consecutive updates to the same object?
> > Don't you have to defer the second update until a grace period has
> > elapsed since the first update in order to avoid confusing readers that
> > are still accessing the original version?
>
> I hadn't imagined such a situation. Indeed, such a thing may happen.
>
> > One way to do this would be to set a "don't-touch-me" bit that is
> > cleared by an RCU callback. An update to an element with the
> > "don't-touch-me" bit set would block until the bit clears. There
> > are probably better ways...
>
> I think we can't apply this approach to the implementation
> of avc_update_node(), because the calling context is not permitted to block.
That would certainly invalidate my suggestion! ;-)
> I changed my opinion and implementation of avc_update_node().
> If kmalloc() returns NULL in avc_update_node(), it returns -ENOMEM.
>
> But the effect of this prototype change is limited, because only
> avc_has_perm_noaudit() and avc_update_cache() call avc_update_node().
>
> Even if avc_update_node() returns -ENOMEM to avc_has_perm_noaudit(),
> avc_has_perm_noaudit() can ignore it, because its purpose is only
> to control the flood of audit logs.
> The only adverse effect is that audit logs may be printed twice.
>
> Nobody calls avc_update_cache(); it is only defined, never used.
Sounds good!
> Some other trivial fixes are as follows:
> - All list_for_each_entry() were replaced by list_for_each_entry_rcu().
See above. ;-)
> - All spin_lock()/spin_unlock() were replaced by spin_lock_irqsave()
> /spin_unlock_irqrestore().
> - In avc_node_insert(), if an entry with the same ssid/tsid/tclass as the new
> one exists, the older entry is replaced by the new one.
>
> Thank you for your comments as an RCU specialist!
Thank -you- for giving RCU a try! Again, the performance results are
quite impressive!
Thanx, Paul
diff -urpN -X dontdiff linux-2.6.8.1-selinux3.rcu/security/selinux/avc.c linux-2.6.8.1-selinux3.rcu.pem/security/selinux/avc.c
--- linux-2.6.8.1-selinux3.rcu/security/selinux/avc.c Wed Aug 25 09:34:26 2004
+++ linux-2.6.8.1-selinux3.rcu.pem/security/selinux/avc.c Wed Aug 25 10:30:10 2004
@@ -253,7 +253,7 @@ static inline int avc_reclaim_node(void)
if (!spin_trylock(&avc_cache.slots_lock[hvalue]))
continue;
- list_for_each_entry_rcu(node, &avc_cache.slots[hvalue], list) {
+ list_for_each_entry(node, &avc_cache.slots[hvalue], list) {
if (!atomic_dec_and_test(&node->ae.used)) {
/* Recently Unused */
list_del_rcu(&node->list);
@@ -419,7 +419,7 @@ struct avc_node *avc_insert(u32 ssid, u3
node->ae.avd.seqno = ae->avd.seqno;
spin_lock_irqsave(&avc_cache.slots_lock[hvalue], flag);
- list_for_each_entry_rcu(pos, &avc_cache.slots[hvalue], list) {
+ list_for_each_entry(pos, &avc_cache.slots[hvalue], list) {
if (pos->ae.ssid == ssid &&
pos->ae.tsid == tsid &&
pos->ae.tclass == tclass) {
@@ -719,7 +719,7 @@ static int avc_update_node(u32 event, u3
hvalue = avc_hash(ssid, tsid, tclass);
spin_lock_irqsave(&avc_cache.slots_lock[hvalue], flag);
- list_for_each_entry_rcu(pos, &avc_cache.slots[hvalue], list){
+ list_for_each_entry(pos, &avc_cache.slots[hvalue], list){
if ( ssid==pos->ae.ssid &&
tsid==pos->ae.tsid &&
tclass==pos->ae.tclass ){
@@ -912,16 +912,14 @@ int avc_ss_reset(u32 seqno)
avc_hash_eval("reset");
- rcu_read_lock();
for (i = 0; i < AVC_CACHE_SLOTS; i++) {
spin_lock_irqsave(&avc_cache.slots_lock[i], flag);
- list_for_each_entry_rcu(node, &avc_cache.slots[i], list){
+ list_for_each_entry(node, &avc_cache.slots[i], list){
list_del_rcu(&node->list);
call_rcu(&node->rhead, avc_node_free);
}
spin_unlock_irqrestore(&avc_cache.slots_lock[i], flag);
}
- rcu_read_unlock();
for (i = 0; i < AVC_NSTATS; i++)
avc_cache_stats[i] = 0;
* Re: RCU issue with SELinux (Re: SELINUX performance issues)
2004-08-25 9:51 ` Kaigai Kohei
@ 2004-08-25 18:31 ` James Morris
0 siblings, 0 replies; 31+ messages in thread
From: James Morris @ 2004-08-25 18:31 UTC (permalink / raw)
To: Kaigai Kohei; +Cc: Stephen Smalley, SELinux-ML(Eng), Linux Kernel ML(Eng)
On Wed, 25 Aug 2004, Kaigai Kohei wrote:
> By the way, the dbench results vary sharply from run to run.
> I don't think they are statistically significant.
Try running dbench on an ext2 filesystem (including both the 'client'
file and where you run it locally).
- James
--
James Morris
<jmorris@redhat.com>
* Re: RCU issue with SELinux (Re: SELINUX performance issues)
2004-08-25 15:50 ` Stephen Smalley
2004-08-25 16:11 ` Stephen Smalley
@ 2004-08-26 7:53 ` Kaigai Kohei
2004-08-26 13:24 ` Stephen Smalley
1 sibling, 1 reply; 31+ messages in thread
From: Kaigai Kohei @ 2004-08-26 7:53 UTC (permalink / raw)
To: Stephen Smalley; +Cc: SELinux-ML(Eng), Linux Kernel ML(Eng), James Morris
Hi Stephen, thanks for your comments.
> A few other comments on the patch:
>
> + new = kmalloc(sizeof(struct avc_node), GFP_ATOMIC);
> + if (!new)
> + return NULL;
>
> Dynamically allocating the nodes at runtime (rather than pre-allocating
> them and then just reclaiming them as necessary as in the current AVC)
> worries me, as it introduces a new failure case for avc_has_perm.
> Denying permission to a resource due to transient memory shortage is not
> good for robustness. And changing the GFP_ATOMIC is not an option, as
> calling context may not allow blocking. Hence, pre-allocation seems
> desirable, regardless of the locking scheme.
In my understanding, your robustness concern is about the execution path
taken when kmalloc() returns NULL.
That is the most significant point.
--[ in the original kernel-2.6.8.1 ]--
:
rc = avc_insert(ssid,tsid,tclass,&entry,aeref);
if (rc) {
spin_unlock_irqrestore(&avc_lock,flags);
goto out;
}
:
--------------------------------------
In the original implementation, avc_insert() returns -ENOMEM when
avc_claim_node() cannot obtain an avc_entry, and avc_has_perm_noaudit()
then denies the requested permission check simply because an avc_node
could not be allocated.
(In practice avc_insert() always returns 0, because it reclaims an
avc_node under the spinlock when the freelist is empty.)
However, allocating an avc_node and making the decision according to
the SELinux policy should be considered separately.
In my RCU approach, avc_insert() can actually return NULL.
Currently, avc_has_perm_noaudit() returns -ENOMEM without making the
decision, as in the original implementation.
But with suitable recovery processing, we can still make the decision
according to the SELinux policy.
My proposal is as follows:
- Normally, the decision is made against an entry held in the AVC.
These entries are allocated by kmalloc().
- When the kmalloc() called from avc_insert() returns NULL, the
decision is made against an entry in a stack variable, and that
entry is simply not kept in the AVC.
-- e.g. in avc_has_perm_noaudit() --------------
:
struct avc_entry local_ae, *ae;
:
node = avc_insert(....);
if (!node) {
SETUP_LOCAL_AE(&local_ae,&entry.avd);
ae = &local_ae;
} else {
ae = &node->ae;
}
if (avd)
memcpy(avd, &ae->avd, sizeof(*avd));
:
------------------------------------------------
With this method, the decision can be made regardless of the result
of kmalloc(). Doesn't that address the robustness concern?
The original implementation suffers too much lock contention in
big-SMP environments, so I think pursuing the RCU-based method is
worthwhile.
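The fallback-to-stack idea can be modeled in user space (an illustrative sketch only; compute_av(), check_perm() and the alloc_ok flag are invented stand-ins, and SETUP_LOCAL_AE is not modeled): an allocation failure changes only where the decision is stored, never whether it can be made.

```c
#include <assert.h>
#include <stdint.h>

/* Model of the proposal: when cache-node allocation fails, fill a
 * stack-local entry from the security server's answer and decide
 * from that, instead of returning -ENOMEM. */
struct avd { uint32_t allowed; };
struct entry { struct avd avd; };

/* Stand-in for security_compute_av(): always produces a decision. */
static void compute_av(struct avd *avd) { avd->allowed = 0x7; }

/* alloc_ok simulates whether kmalloc(GFP_ATOMIC) succeeded. */
static int check_perm(uint32_t requested, int alloc_ok)
{
    struct entry cached, local, *ae;

    compute_av(&cached.avd);
    if (!alloc_ok) {
        local.avd = cached.avd; /* decision lives on the stack only */
        ae = &local;
    } else {
        ae = &cached;           /* decision would be cached for reuse */
    }
    return (ae->avd.allowed & requested) == requested ? 0 : -1;
}
```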
> In trying to merge the logic related to latest_notif, you've introduced
> a bug - latest_notif should only be increased, never decreased. See the
> original logic from avc_control and avc_ss_reset prior to your patch.
> Those functions update the latest notif based on a policy change event.
> In the insert case, you are checking that the entry is not stale, i.e.
> has a smaller seqno than the latest notification due to an interleaving
> with a policy change event.
Oops, that is an embarrassingly trivial bug!
I'll fix it very soon.
> + if (node->ae.avd.allowed != (node->ae.avd.allowed|requested))
> + avc_update_node(AVC_CALLBACK_GRANT
> + ,requested,ssid,tsid,tclass);
> }
>
> The test seems unnecessary, as the function has already determined that
> not all of the requested permissions were granted, so you should be able
> to just unconditionally call avc_update_node here, and you only need to
> pass it the denied set that has already been computed, as any other
> permissions in requested were already allowed.
Indeed, you are right; I'll fix this soon.
avc_update_node() is called inside the section between rcu_read_lock() and rcu_read_unlock().
> - rc = -ENOENT;
> out:
> - return rc;
> + return node;
> +}
>
> Ah, I think the bug is here.
Oops! I agree, and will fix this soon.
Please wait for an updated patch, thanks.
--------
Kai Gai <kaigai@ak.jp.nec.com>
* Re: RCU issue with SELinux (Re: SELINUX performance issues)
2004-08-26 7:53 ` Kaigai Kohei
@ 2004-08-26 13:24 ` Stephen Smalley
2004-08-27 11:07 ` Kaigai Kohei
2004-08-30 11:17 ` [PATCH]SELinux performance improvement by RCU (Re: RCU issue with SELinux) Kaigai Kohei
0 siblings, 2 replies; 31+ messages in thread
From: Stephen Smalley @ 2004-08-26 13:24 UTC (permalink / raw)
To: Kaigai Kohei; +Cc: SELinux-ML(Eng), Linux Kernel ML(Eng), James Morris
On Thu, 2004-08-26 at 03:53, Kaigai Kohei wrote:
> In my understanding, your robustness concern is about the execution path
> taken when kmalloc() returns NULL.
Correct.
> (In practice avc_insert() always returns 0, because it reclaims an avc_node
> under the spinlock when the freelist is empty.)
Yes, this is the point. avc_has_perm could not fail previously from an
out of memory condition, as the cache nodes were preallocated,
maintained on their own freelist, and reclaimed as needed.
> With this method, the decision can be made regardless of the result
> of kmalloc(). Doesn't that address the robustness concern?
> The original implementation suffers too much lock contention in
> big-SMP environments, so I think pursuing the RCU-based method is
> worthwhile.
Yes, that would address my concern. However, I'm still unclear as to
why using RCU mandates that we migrate from preallocated nodes to
dynamic allocation. I certainly agree that the existing global spinlock
doesn't scale.
> Please wait for a patch, thanks.
Thanks for working on this. Could you also supply updated performance
data when you have a newer patch? Thanks.
--
Stephen Smalley <sds@epoch.ncsc.mil>
National Security Agency
* Re: RCU issue with SELinux (Re: SELINUX performance issues)
2004-08-26 13:24 ` Stephen Smalley
@ 2004-08-27 11:07 ` Kaigai Kohei
2004-08-30 11:17 ` [PATCH]SELinux performance improvement by RCU (Re: RCU issue with SELinux) Kaigai Kohei
1 sibling, 0 replies; 31+ messages in thread
From: Kaigai Kohei @ 2004-08-27 11:07 UTC (permalink / raw)
To: Stephen Smalley; +Cc: SELinux-ML(Eng), Linux Kernel ML(Eng), James Morris
Hi Stephen, thanks for your comments.
> > With this method, the decision can be made regardless of the result
> > of kmalloc(). Doesn't that address the robustness concern?
> > The original implementation suffers too much lock contention in
> > big-SMP environments, so I think pursuing the RCU-based method is
> > worthwhile.
>
> Yes, that would address my concern. However, I'm still unclear as to
> why using RCU mandates that we migrate from preallocated nodes to
> dynamic allocation. I certainly agree that the existing global spinlock
> doesn't scale.
When avc_reclaim_node() is called, one or more nodes are reclaimed
in both approaches (spinlock/RCU).
But in the RCU approach we cannot reuse the reclaimed nodes
immediately, because we cannot guarantee that nobody still refers
to them.
(The nodes are actually released only after a non-deterministic
grace period.)
So the success of avc_reclaim_node() does not mean that we can
obtain an avc_node object immediately!
Therefore, we need to allocate a new avc_node object with kmalloc().
In the original spinlock implementation, a reclaimed node is chained
onto 'avc_node_freelist' under the spinlock, so it can be reused
immediately.
I did consider an RCU approach with pre-allocation, but I ran into
the difficulty above and gave up, falling back to kmalloc() instead.
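This constraint can be shown with a toy user-space model (illustrative only; the names and the explicit grace_period_elapsed() step are invented — in the kernel call_rcu() does this asynchronously): an unlinked node sits on a deferred list until a grace period elapses, so reclaim cannot hand it straight back to a freelist the way the old spinlocked code could.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model: under RCU, reclaim defers the node rather than freeing
 * it, and only after a (simulated) grace period may it be reused. */
struct gnode {
    struct gnode *next;
};

static struct gnode *deferred;  /* waiting for a grace period */
static struct gnode *freelist;  /* safe to reuse */

/* Models list_del_rcu() + call_rcu(): reclaim defers, not frees. */
static void reclaim(struct gnode *n)
{
    n->next = deferred;
    deferred = n;
}

/* Models the end of a grace period: deferred nodes become free. */
static void grace_period_elapsed(void)
{
    while (deferred) {
        struct gnode *n = deferred;
        deferred = n->next;
        n->next = freelist;
        freelist = n;
    }
}

static struct gnode *get_node(void)
{
    struct gnode *n = freelist;
    if (n)
        freelist = n->next;
    return n;  /* NULL forces a fresh kmalloc() in the real code */
}
```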
> > Please wait for a patch, thanks.
>
> Thanks for working on this. Could you also supply updated performance
> data when you have a newer patch? Thanks.
OK, the benchmark results will also be updated.
Thanks.
--------
Kai Gai <kaigai@ak.jp.nec.com>
* [PATCH]SELinux performance improvement by RCU (Re: RCU issue with SELinux)
2004-08-26 13:24 ` Stephen Smalley
2004-08-27 11:07 ` Kaigai Kohei
@ 2004-08-30 11:17 ` Kaigai Kohei
2004-08-30 15:35 ` Stephen Smalley
2004-08-31 15:33 ` James Morris
1 sibling, 2 replies; 31+ messages in thread
From: Kaigai Kohei @ 2004-08-30 11:17 UTC (permalink / raw)
To: Stephen Smalley; +Cc: SELinux-ML(Eng), Linux Kernel ML(Eng), James Morris
[-- Attachment #1: Type: text/plain, Size: 4345 bytes --]
Hi Stephen, thanks for your comments.
> > Please wait for a patch, thanks.
>
> Thanks for working on this. Could you also supply updated performance
> data when you have a newer patch? Thanks.
I fixed the take-3 patch according to your and Paul's suggestions.
The attached take-4 patches replace the avc_lock in security/selinux/avc.c
with lock-less read access using RCU.
The patches consist of three parts as follows:
[1] selinux.rcu-2.6.8.1-take4.patch
removed the spinlock (avc_lock) from security/selinux/avc.c,
and applied the RCU approach for lock-less read access to the AVC.
[2] list_replace_rcu-2.6.8.1.patch
added list_replace_rcu() to include/linux/list.h for atomic update
operations under the RCU model.
[3] atomic_inc_return-2.6.8.1.patch
added atomic_inc_return() and atomic_dec_return() to the i386, x86_64 and
ARM architectures. These are needed by patch [1].
[Changes by these patches from original implementation]
- Some of the data structures are re-defined.
(struct avc_entry/struct avc_node/struct avc_cache)
- The direct references from the security-private members of inode,
task_struct and so on were removed, since these references prevent
RCU from being applied. avc_entry_ref is currently unused.
- The pre-allocation of avc_node structures was replaced by on-demand
kmalloc(), which is necessary for applying RCU.
(The number of nodes is regulated by AVC_CACHE_THRESHOLD, but it is
not a hard limit.)
- avc_update_node() can return -ENOMEM when kmalloc() returns NULL.
This is not a significant problem, since the function is called only
by avc_has_perm_noaudit() to limit the flood of audit logs.
Signed-off-by: KaiGai, Kohei <kaigai@ak.jp.nec.com>
Signed-off-by: Takayoshi Kochi <t-kochi@bq.jp.nec.com>
Thanks for all the comments and suggestions.
[Performance & Scalability results] --------------------------------------
2.6.8.1(disable/enable) : Original Stock Kernel
2.6.8.1.rwlock : Improvement by rwlock(not disclosed)
2.6.8.1.rcu : Improvement by RCU(attached take-4 patch)
o 500,000 write() syscalls to a file on /dev/shm
on Itanium2 x 1/2/4/8/16/32 + enough memory (no swapping needed)
*) The number of processes equals the number of CPUs.
---------------- --1CPU-- --2CPU-- --4CPU-- -- 8CPU-- --16CPU-- --32CPU--
2.6.8.1(disable) 8.2959 8.0430 8.0158 8.0183 8.0146 8.0037
2.6.8.1(enable) 11.8427 14.0358 78.0957 319.0451 1322.0313 too long
2.6.8.1.rwlock 11.2485 13.8688 20.0100 49.0390 90.0200 177.0533
2.6.8.1.rcu 11.3754 11.2028 11.1743 11.1402 11.1216 11.2635
---------------- -------- -------- -------- --------- --------- ---------
o dbench (10 times mean results)
- Average[MB/s] - -1proc- -2procs- -4procs-
2.6.8.1(disable) 247.43 473.11 891.51
2.6.8.1(enable) 218.99 434.97 761.06
2.6.8.1(+rwlock) 225.18 432.80 802.62
2.6.8.1(+RCU) 231.16 447.62 838.15
--------------------------------------------
o UNIX-bench
* INDEX value comparison ---------------------------------------------------
2.6.8.1 2.6.8.1 2.6.8.1 2.6.8.1
(Disable) (Enable) (rwlock) (RCU)
Dhrystone 2 using register variables 268.9 268.8 269.2 268.7
Double-Precision Whetstone 94.2 94.2 94.2 94.2
Execl Throughput 388.3 379.0 377.8 377.0
File Copy 1024 bufsize 2000 maxblocks 606.6 526.6 515.6 513.1
File Copy 256 bufsize 500 maxblocks 508.9 417.0 410.4 402.9
File Copy 4096 bufsize 8000 maxblocks 987.1 890.4 876.0 864.1
Pipe Throughput 525.1 406.4 404.5 356.2
Process Creation 321.2 317.8 315.9 316.5
Shell Scripts (8 concurrent) 1312.8 1276.2 1278.8 1279.5
System Call Overhead 467.1 468.7 464.1 468.1
========================================
FINAL SCORE 445.8 413.2 410.1 403.8
----------------------------------------------------------------------------
--------
Kai Gai <kaigai@ak.jp.nec.com>
[-- Attachment #2: selinux.rcu-2.6.8.1-take4.patch --]
[-- Type: application/octet-stream, Size: 21657 bytes --]
diff -rNU4 linux-2.6.8.1/security/selinux/avc.c linux-2.6.8.1.rcu/security/selinux/avc.c
--- linux-2.6.8.1/security/selinux/avc.c 2004-08-14 19:55:48.000000000 +0900
+++ linux-2.6.8.1.rcu/security/selinux/avc.c 2004-08-30 20:00:27.000000000 +0900
@@ -3,8 +3,11 @@
*
* Authors: Stephen Smalley, <sds@epoch.ncsc.mil>
* James Morris <jmorris@redhat.com>
*
+ * Update: KaiGai, Kohei <kaigai@ak.jp.nec.com>
+ * Replaced the avc_lock spinlock by RCU.
+ *
* Copyright (C) 2003 Red Hat, Inc., James Morris <jmorris@redhat.com>
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License version 2,
@@ -35,28 +38,32 @@
#include "av_perm_to_string.h"
#include "objsec.h"
#define AVC_CACHE_SLOTS 512
-#define AVC_CACHE_MAXNODES 410
+#define AVC_CACHE_THRESHOLD 410
+#define AVC_CACHE_RECLAIM 16
struct avc_entry {
u32 ssid;
u32 tsid;
u16 tclass;
struct av_decision avd;
- int used; /* used recently */
+ atomic_t used; /* used recently */
};
struct avc_node {
struct avc_entry ae;
- struct avc_node *next;
+ struct list_head list;
+ struct rcu_head rhead;
};
struct avc_cache {
- struct avc_node *slots[AVC_CACHE_SLOTS];
- u32 lru_hint; /* LRU hint for reclaim scan */
- u32 active_nodes;
- u32 latest_notif; /* latest revocation notification */
+ struct list_head slots[AVC_CACHE_SLOTS];
+ spinlock_t slots_lock[AVC_CACHE_SLOTS];
+ /* lock for insert/update/delete and reset */
+ atomic_t lru_hint; /* LRU hint for reclaim scan */
+ atomic_t active_nodes;
+ u32 latest_notif; /* latest revocation notification */
};
struct avc_callback_node {
int (*callback) (u32 event, u32 ssid, u32 tsid,
@@ -69,10 +76,8 @@
u32 perms;
struct avc_callback_node *next;
};
-static spinlock_t avc_lock = SPIN_LOCK_UNLOCKED;
-static struct avc_node *avc_node_freelist;
static struct avc_cache avc_cache;
static unsigned avc_cache_stats[AVC_NSTATS];
static struct avc_callback_node *avc_callbacks;
@@ -187,23 +192,17 @@
* Initialize the access vector cache.
*/
void __init avc_init(void)
{
- struct avc_node *new;
int i;
- for (i = 0; i < AVC_CACHE_MAXNODES; i++) {
- new = kmalloc(sizeof(*new), GFP_ATOMIC);
- if (!new) {
- printk(KERN_WARNING "avc: only able to allocate "
- "%d entries\n", i);
- break;
- }
- memset(new, 0, sizeof(*new));
- new->next = avc_node_freelist;
- avc_node_freelist = new;
- }
-
+ for (i =0; i < AVC_CACHE_SLOTS; i++) {
+ INIT_LIST_HEAD(&avc_cache.slots[i]);
+ avc_cache.slots_lock[i] = SPIN_LOCK_UNLOCKED;
+ }
+ atomic_set(&avc_cache.active_nodes, 0);
+ atomic_set(&avc_cache.lru_hint, 0);
+
audit_log(current->audit_context, "AVC INITIALIZED\n");
}
#if 0
@@ -212,27 +211,24 @@
int i, chain_len, max_chain_len, slots_used;
struct avc_node *node;
unsigned long flags;
- spin_lock_irqsave(&avc_lock,flags);
+ rcu_read_lock();
slots_used = 0;
max_chain_len = 0;
for (i = 0; i < AVC_CACHE_SLOTS; i++) {
- node = avc_cache.slots[i];
- if (node) {
+ if (!list_empty(&avc_cache.slots[i])) {
slots_used++;
chain_len = 0;
- while (node) {
+ list_for_each_entry_rcu(node, &avc_cache.slots[i], list)
chain_len++;
- node = node->next;
- }
if (chain_len > max_chain_len)
max_chain_len = chain_len;
}
}
- spin_unlock_irqrestore(&avc_lock,flags);
+ rcu_read_unlock();
printk(KERN_INFO "\n");
printk(KERN_INFO "%s avc: %d entries and %d/%d buckets used, longest "
"chain length %d\n", tag, avc_cache.active_nodes, slots_used,
@@ -242,188 +238,210 @@
static inline void avc_hash_eval(char *tag)
{ }
#endif
-static inline struct avc_node *avc_reclaim_node(void)
+static void avc_node_free(struct rcu_head *rhead) {
+ struct avc_node *node;
+ node = container_of(rhead, struct avc_node, rhead);
+ kfree(node);
+}
+
+static inline int avc_reclaim_node(void)
{
- struct avc_node *prev, *cur;
- int hvalue, try;
+ struct avc_node *node;
+ int hvalue, try, ecx;
- hvalue = avc_cache.lru_hint;
- for (try = 0; try < 2; try++) {
- do {
- prev = NULL;
- cur = avc_cache.slots[hvalue];
- while (cur) {
- if (!cur->ae.used)
- goto found;
+ for (try = 0, ecx = 0; try < AVC_CACHE_SLOTS; try++ ) {
+ hvalue = atomic_inc_return(&avc_cache.lru_hint) % AVC_CACHE_SLOTS;
- cur->ae.used = 0;
+ if (!spin_trylock(&avc_cache.slots_lock[hvalue]))
+ continue;
- prev = cur;
- cur = cur->next;
+ list_for_each_entry(node, &avc_cache.slots[hvalue], list) {
+ if (!atomic_dec_and_test(&node->ae.used)) {
+ /* Recently Unused */
+ list_del_rcu(&node->list);
+ call_rcu(&node->rhead, avc_node_free);
+ atomic_dec(&avc_cache.active_nodes);
+ ecx++;
+ if (ecx >= AVC_CACHE_RECLAIM) {
+ spin_unlock(&avc_cache.slots_lock[hvalue]);
+ goto out;
+ }
}
- hvalue = (hvalue + 1) & (AVC_CACHE_SLOTS - 1);
- } while (hvalue != avc_cache.lru_hint);
+ }
+ spin_unlock(&avc_cache.slots_lock[hvalue]);
}
-
- panic("avc_reclaim_node");
-
-found:
- avc_cache.lru_hint = hvalue;
-
- if (prev == NULL)
- avc_cache.slots[hvalue] = cur->next;
- else
- prev->next = cur->next;
-
- return cur;
+out:
+ return ecx;
}
-static inline struct avc_node *avc_claim_node(u32 ssid,
- u32 tsid, u16 tclass)
+static inline struct avc_node *avc_get_node(void)
{
struct avc_node *new;
- int hvalue;
-
- hvalue = avc_hash(ssid, tsid, tclass);
- if (avc_node_freelist) {
- new = avc_node_freelist;
- avc_node_freelist = avc_node_freelist->next;
- avc_cache.active_nodes++;
- } else {
- new = avc_reclaim_node();
- if (!new)
- goto out;
- }
+ int actives;
- new->ae.used = 1;
- new->ae.ssid = ssid;
- new->ae.tsid = tsid;
- new->ae.tclass = tclass;
- new->next = avc_cache.slots[hvalue];
- avc_cache.slots[hvalue] = new;
+ new = kmalloc(sizeof(struct avc_node), GFP_ATOMIC);
+ if (!new)
+ return NULL;
+
+ INIT_RCU_HEAD(&new->rhead);
+ INIT_LIST_HEAD(&new->list);
+
+ actives = atomic_inc_return(&avc_cache.active_nodes);
+ if (actives > AVC_CACHE_THRESHOLD)
+ avc_reclaim_node();
-out:
return new;
}
static inline struct avc_node *avc_search_node(u32 ssid, u32 tsid,
u16 tclass, int *probes)
{
- struct avc_node *cur;
+ struct avc_node *node, *ret = NULL;
int hvalue;
int tprobes = 1;
hvalue = avc_hash(ssid, tsid, tclass);
- cur = avc_cache.slots[hvalue];
- while (cur != NULL &&
- (ssid != cur->ae.ssid ||
- tclass != cur->ae.tclass ||
- tsid != cur->ae.tsid)) {
+ list_for_each_entry_rcu(node, &avc_cache.slots[hvalue], list) {
+ if (ssid == node->ae.ssid &&
+ tclass == node->ae.tclass &&
+ tsid == node->ae.tsid) {
+ ret = node;
+ break;
+ }
tprobes++;
- cur = cur->next;
}
- if (cur == NULL) {
+ if (ret == NULL) {
/* cache miss */
goto out;
}
/* cache hit */
if (probes)
*probes = tprobes;
-
- cur->ae.used = 1;
-
+ if (atomic_read(&ret->ae.used) != 1)
+ atomic_set(&ret->ae.used, 1);
out:
- return cur;
+ return ret;
}
/**
* avc_lookup - Look up an AVC entry.
* @ssid: source security identifier
* @tsid: target security identifier
* @tclass: target security class
* @requested: requested permissions, interpreted based on @tclass
- * @aeref: AVC entry reference
*
* Look up an AVC entry that is valid for the
* @requested permissions between the SID pair
* (@ssid, @tsid), interpreting the permissions
* based on @tclass. If a valid AVC entry exists,
- * then this function updates @aeref to refer to the
- * entry and returns %0. Otherwise, this function
- * returns -%ENOENT.
+ * then this function return the avc_node.
+ * Otherwise, this function returns NULL.
*/
-int avc_lookup(u32 ssid, u32 tsid, u16 tclass,
- u32 requested, struct avc_entry_ref *aeref)
+struct avc_node *avc_lookup(u32 ssid, u32 tsid, u16 tclass, u32 requested)
{
struct avc_node *node;
- int probes, rc = 0;
+ int probes;
avc_cache_stats_incr(AVC_CAV_LOOKUPS);
node = avc_search_node(ssid, tsid, tclass,&probes);
if (node && ((node->ae.avd.decided & requested) == requested)) {
avc_cache_stats_incr(AVC_CAV_HITS);
avc_cache_stats_add(AVC_CAV_PROBES,probes);
- aeref->ae = &node->ae;
goto out;
}
+ node = NULL;
avc_cache_stats_incr(AVC_CAV_MISSES);
- rc = -ENOENT;
out:
- return rc;
+ return node;
+}
+
+static int avc_latest_notif_update(int seqno, int is_insert)
+{
+ int ret = 0;
+ static spinlock_t notif_lock = SPIN_LOCK_UNLOCKED;
+ unsigned long flag;
+
+ spin_lock_irqsave(¬if_lock, flag);
+ if (is_insert) {
+ if (seqno < avc_cache.latest_notif) {
+ printk(KERN_WARNING "avc: seqno %d < latest_notif %d\n",
+ seqno, avc_cache.latest_notif);
+ ret = -EAGAIN;
+ }
+ } else {
+ if (seqno > avc_cache.latest_notif)
+ avc_cache.latest_notif = seqno;
+ }
+ spin_unlock_irqrestore(&notif_lock, flag);
+
+ return ret;
}
/**
* avc_insert - Insert an AVC entry.
* @ssid: source security identifier
* @tsid: target security identifier
* @tclass: target security class
* @ae: AVC entry
- * @aeref: AVC entry reference
*
* Insert an AVC entry for the SID pair
* (@ssid, @tsid) and class @tclass.
* The access vectors and the sequence number are
* normally provided by the security server in
* response to a security_compute_av() call. If the
* sequence number @ae->avd.seqno is not less than the latest
* revocation notification, then the function copies
- * the access vectors into a cache entry, updates
- * @aeref to refer to the entry, and returns %0.
- * Otherwise, this function returns -%EAGAIN.
+ * the access vectors into a cache entry and returns
+ * the inserted avc_node. Otherwise, this function returns NULL.
*/
-int avc_insert(u32 ssid, u32 tsid, u16 tclass,
- struct avc_entry *ae, struct avc_entry_ref *aeref)
+struct avc_node *avc_insert(u32 ssid, u32 tsid, u16 tclass, struct avc_entry *ae)
{
- struct avc_node *node;
- int rc = 0;
+ struct avc_node *pos, *node = NULL;
+ int hvalue;
+ unsigned long flag;
- if (ae->avd.seqno < avc_cache.latest_notif) {
- printk(KERN_WARNING "avc: seqno %d < latest_notif %d\n",
- ae->avd.seqno, avc_cache.latest_notif);
- rc = -EAGAIN;
+ if (avc_latest_notif_update(ae->avd.seqno, 1))
goto out;
- }
- node = avc_claim_node(ssid, tsid, tclass);
- if (!node) {
- rc = -ENOMEM;
- goto out;
- }
+ node = avc_get_node();
+
+ if (node) {
+ hvalue = avc_hash(ssid, tsid, tclass);
- node->ae.avd.allowed = ae->avd.allowed;
- node->ae.avd.decided = ae->avd.decided;
- node->ae.avd.auditallow = ae->avd.auditallow;
- node->ae.avd.auditdeny = ae->avd.auditdeny;
- node->ae.avd.seqno = ae->avd.seqno;
- aeref->ae = &node->ae;
+ node->ae.ssid = ssid;
+ node->ae.tsid = tsid;
+ node->ae.tclass = tclass;
+ atomic_set(&node->ae.used, 1);
+
+ node->ae.avd.allowed = ae->avd.allowed;
+ node->ae.avd.decided = ae->avd.decided;
+ node->ae.avd.auditallow = ae->avd.auditallow;
+ node->ae.avd.auditdeny = ae->avd.auditdeny;
+ node->ae.avd.seqno = ae->avd.seqno;
+
+ spin_lock_irqsave(&avc_cache.slots_lock[hvalue], flag);
+ list_for_each_entry(pos, &avc_cache.slots[hvalue], list) {
+ if (pos->ae.ssid == ssid &&
+ pos->ae.tsid == tsid &&
+ pos->ae.tclass == tclass) {
+ list_replace_rcu(&pos->list, &node->list);
+ call_rcu(&pos->rhead, avc_node_free);
+ atomic_dec(&avc_cache.active_nodes);
+ goto found;
+ }
+ }
+ list_add_rcu(&node->list, &avc_cache.slots[hvalue]);
+found:
+ spin_unlock_irqrestore(&avc_cache.slots_lock[hvalue], flag);
+ }
out:
- return rc;
+ return node;
}
static inline void avc_print_ipv6_addr(struct audit_buffer *ab,
struct in6_addr *addr, u16 port,
@@ -685,10 +703,51 @@
{
return (x == y || x == SECSID_WILD || y == SECSID_WILD);
}
-static inline void avc_update_node(u32 event, struct avc_node *node, u32 perms)
+/**
+ * avc_update_node - Update an AVC entry
+ * @event: updating event
+ * @perms: permission mask bits
+ * @ssid, @tsid, @tclass: identifier of an AVC entry
+ *
+ * If a valid AVC entry doesn't exist, this function returns -ENOENT.
+ * If the internal kmalloc() returns NULL, this function returns -ENOMEM.
+ * Otherwise, this function updates the AVC entry. The original AVC entry
+ * will be released later by RCU.
+ */
+static int avc_update_node(u32 event, u32 perms, u32 ssid, u32 tsid, u16 tclass)
{
+ int hvalue, rc = 0;
+ unsigned long flag;
+ struct avc_node *pos, *node, *org = NULL;
+
+ /* Lock the target slot */
+ hvalue = avc_hash(ssid, tsid, tclass);
+ spin_lock_irqsave(&avc_cache.slots_lock[hvalue], flag);
+
+ list_for_each_entry(pos, &avc_cache.slots[hvalue], list){
+ if ( ssid==pos->ae.ssid &&
+ tsid==pos->ae.tsid &&
+ tclass==pos->ae.tclass ){
+ org = pos;
+ break;
+ }
+ }
+
+ if (!org) {
+ rc = -ENOENT;
+ goto out;
+ }
+
+ node = kmalloc(sizeof(struct avc_node), GFP_ATOMIC);
+ if (!node) {
+ rc = -ENOMEM;
+ goto out;
+ }
+
+ /* Duplicate and Update */
+ memcpy(node, org, sizeof(struct avc_node));
switch (event) {
case AVC_CALLBACK_GRANT:
node->ae.avd.allowed |= perms;
break;
@@ -708,40 +767,41 @@
case AVC_CALLBACK_AUDITDENY_DISABLE:
node->ae.avd.auditdeny &= ~perms;
break;
}
+ list_replace_rcu(&org->list,&node->list);
+out:
+ spin_unlock_irqrestore(&avc_cache.slots_lock[hvalue], flag);
+
+ return rc;
}
static int avc_update_cache(u32 event, u32 ssid, u32 tsid,
u16 tclass, u32 perms)
{
struct avc_node *node;
int i;
- unsigned long flags;
- spin_lock_irqsave(&avc_lock,flags);
+ rcu_read_lock();
if (ssid == SECSID_WILD || tsid == SECSID_WILD) {
/* apply to all matching nodes */
for (i = 0; i < AVC_CACHE_SLOTS; i++) {
- for (node = avc_cache.slots[i]; node;
- node = node->next) {
+ list_for_each_entry_rcu(node, &avc_cache.slots[i], list) {
if (avc_sidcmp(ssid, node->ae.ssid) &&
avc_sidcmp(tsid, node->ae.tsid) &&
- tclass == node->ae.tclass) {
- avc_update_node(event,node,perms);
+ tclass == node->ae.tclass ) {
+ avc_update_node(event, perms, node->ae.ssid
+ ,node->ae.tsid, node->ae.tclass);
}
}
}
} else {
/* apply to one node */
- node = avc_search_node(ssid, tsid, tclass, NULL);
- if (node) {
- avc_update_node(event,node,perms);
- }
+ avc_update_node(event, perms, ssid, tsid, tclass);
}
- spin_unlock_irqrestore(&avc_lock,flags);
+ rcu_read_unlock();
return 0;
}
@@ -751,9 +811,8 @@
{
struct avc_callback_node *c;
u32 tretained = 0, cretained = 0;
int rc = 0;
- unsigned long flags;
/*
* try_revoke only removes permissions from the cache
* state if they are not retained by the object manager.
@@ -786,12 +845,9 @@
avc_update_cache(event,ssid,tsid,tclass,perms);
*out_retained = tretained;
}
- spin_lock_irqsave(&avc_lock,flags);
- if (seqno > avc_cache.latest_notif)
- avc_cache.latest_notif = seqno;
- spin_unlock_irqrestore(&avc_lock,flags);
+ avc_latest_notif_update(seqno, 0);
out:
return rc;
}
@@ -856,34 +912,21 @@
int avc_ss_reset(u32 seqno)
{
struct avc_callback_node *c;
int i, rc = 0;
- struct avc_node *node, *tmp;
- unsigned long flags;
+ unsigned long flag;
+ struct avc_node *node;
avc_hash_eval("reset");
- spin_lock_irqsave(&avc_lock,flags);
-
for (i = 0; i < AVC_CACHE_SLOTS; i++) {
- node = avc_cache.slots[i];
- while (node) {
- tmp = node;
- node = node->next;
- tmp->ae.ssid = tmp->ae.tsid = SECSID_NULL;
- tmp->ae.tclass = SECCLASS_NULL;
- tmp->ae.avd.allowed = tmp->ae.avd.decided = 0;
- tmp->ae.avd.auditallow = tmp->ae.avd.auditdeny = 0;
- tmp->ae.used = 0;
- tmp->next = avc_node_freelist;
- avc_node_freelist = tmp;
- avc_cache.active_nodes--;
+ spin_lock_irqsave(&avc_cache.slots_lock[i], flag);
+ list_for_each_entry(node, &avc_cache.slots[i], list){
+ list_del_rcu(&node->list);
+ call_rcu(&node->rhead, avc_node_free);
}
- avc_cache.slots[i] = NULL;
+ spin_unlock_irqrestore(&avc_cache.slots_lock[i], flag);
}
- avc_cache.lru_hint = 0;
-
- spin_unlock_irqrestore(&avc_lock,flags);
for (i = 0; i < AVC_NSTATS; i++)
avc_cache_stats[i] = 0;
@@ -895,12 +938,9 @@
goto out;
}
}
- spin_lock_irqsave(&avc_lock,flags);
- if (seqno > avc_cache.latest_notif)
- avc_cache.latest_notif = seqno;
- spin_unlock_irqrestore(&avc_lock,flags);
+ avc_latest_notif_update(seqno, 0);
out:
return rc;
}
@@ -949,9 +989,9 @@
* @ssid: source security identifier
* @tsid: target security identifier
* @tclass: target security class
* @requested: requested permissions, interpreted based on @tclass
- * @aeref: AVC entry reference
+ * @aeref: AVC entry reference (unused)
* @avd: access vector decisions
*
* Check the AVC to determine whether the @requested permissions are granted
* for the SID pair (@ssid, @tsid), interpreting the permissions
@@ -968,72 +1008,44 @@
int avc_has_perm_noaudit(u32 ssid, u32 tsid,
u16 tclass, u32 requested,
struct avc_entry_ref *aeref, struct av_decision *avd)
{
- struct avc_entry *ae;
+ struct avc_node *node;
+ struct avc_entry entry, *p_ae;
int rc = 0;
- unsigned long flags;
- struct avc_entry entry;
u32 denied;
- struct avc_entry_ref ref;
- if (!aeref) {
- avc_entry_ref_init(&ref);
- aeref = &ref;
- }
-
- spin_lock_irqsave(&avc_lock, flags);
+ rcu_read_lock();
avc_cache_stats_incr(AVC_ENTRY_LOOKUPS);
- ae = aeref->ae;
- if (ae) {
- if (ae->ssid == ssid &&
- ae->tsid == tsid &&
- ae->tclass == tclass &&
- ((ae->avd.decided & requested) == requested)) {
- avc_cache_stats_incr(AVC_ENTRY_HITS);
- ae->used = 1;
- } else {
- avc_cache_stats_incr(AVC_ENTRY_DISCARDS);
- ae = NULL;
- }
- }
- if (!ae) {
- avc_cache_stats_incr(AVC_ENTRY_MISSES);
- rc = avc_lookup(ssid, tsid, tclass, requested, aeref);
- if (rc) {
- spin_unlock_irqrestore(&avc_lock,flags);
- rc = security_compute_av(ssid,tsid,tclass,requested,&entry.avd);
- if (rc)
- goto out;
- spin_lock_irqsave(&avc_lock, flags);
- rc = avc_insert(ssid,tsid,tclass,&entry,aeref);
- if (rc) {
- spin_unlock_irqrestore(&avc_lock,flags);
- goto out;
- }
- }
- ae = aeref->ae;
+ node = avc_lookup(ssid, tsid, tclass, requested);
+ if (!node) {
+ rcu_read_unlock();
+ rc = security_compute_av(ssid,tsid,tclass,requested,&entry.avd);
+ if (rc)
+ goto out;
+ rcu_read_lock();
+ node = avc_insert(ssid,tsid,tclass,&entry);
}
+ p_ae = node ? &node->ae : &entry;
+
if (avd)
- memcpy(avd, &ae->avd, sizeof(*avd));
+ memcpy(avd, &p_ae->avd, sizeof(*avd));
- denied = requested & ~(ae->avd.allowed);
+ denied = requested & ~(p_ae->avd.allowed);
if (!requested || denied) {
if (selinux_enforcing) {
- spin_unlock_irqrestore(&avc_lock,flags);
rc = -EACCES;
- goto out;
} else {
- ae->avd.allowed |= requested;
- spin_unlock_irqrestore(&avc_lock,flags);
- goto out;
+ if (node)
+ avc_update_node(AVC_CALLBACK_GRANT,requested
+ ,ssid,tsid,tclass);
}
}
- spin_unlock_irqrestore(&avc_lock,flags);
+ rcu_read_unlock();
out:
return rc;
}
@@ -1042,9 +1054,9 @@
* @ssid: source security identifier
* @tsid: target security identifier
* @tclass: target security class
* @requested: requested permissions, interpreted based on @tclass
- * @aeref: AVC entry reference
+ * @aeref: AVC entry reference (unused)
* @auditdata: auxiliary audit data
*
* Check the AVC to determine whether the @requested permissions are granted
* for the SID pair (@ssid, @tsid), interpreting the permissions
@@ -1061,8 +1073,8 @@
{
struct av_decision avd;
int rc;
- rc = avc_has_perm_noaudit(ssid, tsid, tclass, requested, aeref, &avd);
+ rc = avc_has_perm_noaudit(ssid, tsid, tclass, requested, NULL, &avd);
avc_audit(ssid, tsid, tclass, requested, &avd, rc, auditdata);
return rc;
}
diff -rNU4 linux-2.6.8.1/security/selinux/include/avc.h linux-2.6.8.1.rcu/security/selinux/include/avc.h
--- linux-2.6.8.1/security/selinux/include/avc.h 2004-08-14 19:54:51.000000000 +0900
+++ linux-2.6.8.1.rcu/security/selinux/include/avc.h 2004-08-20 20:40:50.000000000 +0900
@@ -118,13 +118,11 @@
*/
void __init avc_init(void);
-int avc_lookup(u32 ssid, u32 tsid, u16 tclass,
- u32 requested, struct avc_entry_ref *aeref);
+struct avc_node *avc_lookup(u32 ssid, u32 tsid, u16 tclass, u32 requested);
-int avc_insert(u32 ssid, u32 tsid, u16 tclass,
- struct avc_entry *ae, struct avc_entry_ref *out_aeref);
+struct avc_node *avc_insert(u32 ssid, u32 tsid, u16 tclass, struct avc_entry *ae);
void avc_audit(u32 ssid, u32 tsid,
u16 tclass, u32 requested,
struct av_decision *avd, int result, struct avc_audit_data *auditdata);
[-- Attachment #3: atomic_inc_return-2.6.8.1.patch --]
[-- Type: application/octet-stream, Size: 2739 bytes --]
--- linux-2.6.8.1/include/asm-arm/atomic.h 2004-08-14 19:54:50.000000000 +0900
+++ linux-2.6.8.1.rcu/include/asm-arm/atomic.h 2004-08-24 19:31:56.000000000 +0900
@@ -194,8 +194,10 @@
#define atomic_dec(v) atomic_sub(1, v)
#define atomic_inc_and_test(v) (atomic_add_return(1, v) == 0)
#define atomic_dec_and_test(v) (atomic_sub_return(1, v) == 0)
+#define atomic_inc_return(v) (atomic_add_return(1, v))
+#define atomic_dec_return(v) (atomic_sub_return(1, v))
#define atomic_add_negative(i,v) (atomic_add_return(i, v) < 0)
/* Atomic operations are already serializing on ARM */
--- linux-2.6.8.1/include/asm-i386/atomic.h 2004-08-14 19:55:09.000000000 +0900
+++ linux-2.6.8.1.rcu/include/asm-i386/atomic.h 2004-08-25 11:57:08.000000000 +0900
@@ -175,8 +175,34 @@
:"ir" (i), "m" (v->counter) : "memory");
return c;
}
+/**
+ * atomic_add_return - add and return
+ * @v: pointer of type atomic_t
+ * @i: integer value to add
+ *
+ * Atomically adds @i to @v and returns @i + @v
+ */
+static __inline__ int atomic_add_return(int i, atomic_t *v)
+{
+ /* NOTICE: This function can be used on i486+ */
+ int __i = i;
+ __asm__ __volatile__(
+ LOCK "xaddl %0, %1;"
+ :"=r"(i)
+ :"m"(v->counter), "0"(i));
+ return i + __i;
+}
+
+static __inline__ int atomic_sub_return(int i, atomic_t *v)
+{
+ return atomic_add_return(-i,v);
+}
+
+#define atomic_inc_return(v) (atomic_add_return(1,v))
+#define atomic_dec_return(v) (atomic_sub_return(1,v))
+
/* These are x86-specific, used by some header files */
#define atomic_clear_mask(mask, addr) \
__asm__ __volatile__(LOCK "andl %0,%1" \
: : "r" (~(mask)),"m" (*addr) : "memory")
--- linux-2.6.8.1/include/asm-x86_64/atomic.h 2004-08-14 19:56:23.000000000 +0900
+++ linux-2.6.8.1.rcu/include/asm-x86_64/atomic.h 2004-08-25 11:57:36.000000000 +0900
@@ -177,8 +177,33 @@
:"ir" (i), "m" (v->counter) : "memory");
return c;
}
+/**
+ * atomic_add_return - add and return
+ * @v: pointer of type atomic_t
+ * @i: integer value to add
+ *
+ * Atomically adds @i to @v and returns @i + @v
+ */
+static __inline__ int atomic_add_return(int i, atomic_t *v)
+{
+ int __i = i;
+ __asm__ __volatile__(
+ LOCK "xaddl %0, %1;"
+ :"=r"(i)
+ :"m"(v->counter), "0"(i));
+ return i + __i;
+}
+
+static __inline__ int atomic_sub_return(int i, atomic_t *v)
+{
+ return atomic_add_return(-i,v);
+}
+
+#define atomic_inc_return(v) (atomic_add_return(1,v))
+#define atomic_dec_return(v) (atomic_sub_return(1,v))
+
/* These are x86-specific, used by some header files */
#define atomic_clear_mask(mask, addr) \
__asm__ __volatile__(LOCK "andl %0,%1" \
: : "r" (~(mask)),"m" (*addr) : "memory")
[-- Attachment #4: list_replace_rcu-2.6.8.1.patch --]
[-- Type: application/octet-stream, Size: 845 bytes --]
--- linux-2.6.8.1/include/linux/list.h 2004-08-14 19:55:33.000000000 +0900
+++ linux-2.6.8.1.rcu/include/linux/list.h 2004-08-20 18:04:10.000000000 +0900
@@ -194,8 +194,23 @@
__list_del(entry->prev, entry->next);
entry->prev = LIST_POISON2;
}
+/*
+ * list_replace_rcu - replace old entry by new one in list
+ * @old : the element to be replaced from the list.
+ * @new : the new element to insert to the list.
+ *
+ * The old entry will be replaced by the new entry atomically.
+ */
+static inline void list_replace_rcu(struct list_head *old, struct list_head *new){
+ new->next = old->next;
+ new->prev = old->prev;
+ smp_wmb();
+ new->next->prev = new;
+ new->prev->next = new;
+}
+
/**
* list_del_init - deletes entry from list and reinitialize it.
* @entry: the element to delete from the list.
*/
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH]SELinux performance improvement by RCU (Re: RCU issue with SELinux)
2004-08-30 11:17 ` [PATCH]SELinux performance improvement by RCU (Re: RCU issue with SELinux) Kaigai Kohei
@ 2004-08-30 15:35 ` Stephen Smalley
2004-08-30 16:13 ` Paul E. McKenney
2004-08-31 15:33 ` James Morris
1 sibling, 1 reply; 31+ messages in thread
From: Stephen Smalley @ 2004-08-30 15:35 UTC (permalink / raw)
To: Kaigai Kohei
Cc: SELinux-ML(Eng), Linux Kernel ML(Eng), James Morris, Paul E. McKenney
On Mon, 2004-08-30 at 07:17, Kaigai Kohei wrote:
> I fixed the take-3 patch according to your and Paul's suggestions.
>
> The attached take-4 patches replace the avc_lock in security/selinux/avc.c
> by the lock-less read access with RCU.
Thanks. Was there a reason you didn't move the rcu_read_lock call after
the avc_insert call per the suggestion of Paul McKenney, or was that
just an oversight? No need to send a new patch, just ack whether or not
you meant to switch the order there.
--
Stephen Smalley <sds@epoch.ncsc.mil>
National Security Agency
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH]SELinux performance improvement by RCU (Re: RCU issue with SELinux)
2004-08-30 15:35 ` Stephen Smalley
@ 2004-08-30 16:13 ` Paul E. McKenney
2004-08-31 4:33 ` Kaigai Kohei
0 siblings, 1 reply; 31+ messages in thread
From: Paul E. McKenney @ 2004-08-30 16:13 UTC (permalink / raw)
To: Stephen Smalley
Cc: Kaigai Kohei, SELinux-ML(Eng), Linux Kernel ML(Eng), James Morris
On Mon, Aug 30, 2004 at 11:35:19AM -0400, Stephen Smalley wrote:
> On Mon, 2004-08-30 at 07:17, Kaigai Kohei wrote:
> > I fixed the take-3 patch according to your and Paul's suggestions.
> >
> > The attached take-4 patches replace the avc_lock in security/selinux/avc.c
> > by the lock-less read access with RCU.
>
> Thanks. Was there a reason you didn't move the rcu_read_lock call after
> the avc_insert call per the suggestion of Paul McKenney, or was that
> just an oversight? No need to send a new patch, just ack whether or not
> you meant to switch the order there.
One reason might be because I called it out in the text of my message,
but failed to put it in my patch. :-/ Of course, if there is some reason
why moving the rcu_read_lock() call is bad, I would like to know for
my own education.
Thanx, Paul
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH]SELinux performance improvement by RCU (Re: RCU issue with SELinux)
2004-08-30 16:13 ` Paul E. McKenney
@ 2004-08-31 4:33 ` Kaigai Kohei
2004-08-31 16:20 ` Paul E. McKenney
0 siblings, 1 reply; 31+ messages in thread
From: Kaigai Kohei @ 2004-08-31 4:33 UTC (permalink / raw)
To: paulmck, Stephen Smalley
Cc: SELinux-ML(Eng), Linux Kernel ML(Eng), James Morris
Hi Stephen, Paul, thanks for your comments.
> > > The attached take-4 patches replace the avc_lock in security/selinux/avc.c
> > > by the lock-less read access with RCU.
> >
> > Thanks. Was there a reason you didn't move the rcu_read_lock call after
> > the avc_insert call per the suggestion of Paul McKenney, or was that
> > just an oversight? No need to send a new patch, just ack whether or not
> > you meant to switch the order there.
>
> One reason might be because I called it out in the text of my message,
> but failed to put it in my patch. :-/ Of course, if there is some reason
> why moving the rcu_read_lock() call is bad, I would like to know for
> my own education.
In my understanding, the issue is Paul's suggestion as follows:
> So I do not believe that avc_insert() needs rcu_read_lock().
> Unless I am missing something, the rcu_read_lock() acquired
> in avc_has_perm_noaudit() should be moved after the call to
> avc_insert().
I don't move the rcu_read_lock() because of the possibility of preemption
between the spin_unlock_irqrestore() in avc_insert() and the rcu_read_lock()
which may be inserted after avc_insert() in avc_has_perm_noaudit().
When returning from avc_insert(), we can't ignore the possibility
that execution is preempted at this point.
Therefore, I didn't move rcu_read_lock() in spite of its redundancy.
If rcu_read_lock() was moved after avc_insert()
[ in avc_insert() ]----------------------------
:
spin_lock_irqsave(&avc_cache.slots_lock[hvalue], flag);
list_for_each_entry(pos, &avc_cache.slots[hvalue], list) {
:
}
list_add_rcu(&node->list, &avc_cache.slots[hvalue]);
found:
spin_unlock_irqrestore(&avc_cache.slots_lock[hvalue], flag); ---------
// +--> including preempt_enable() |
It has the danger of releasing the 'node'. V
} preemption
out: is
return node; possible
}
-----------------------------------------------
Because it's legal to hold the rcu_read_lock() twice as Paul says,
we should do it for safety.
That is why I didn't move rcu_read_lock() at this point;
sorry for the insufficient explanation.
Thanks.
--------
Kai Gai <kaigai@ak.jp.nec.com>
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH]SELinux performance improvement by RCU (Re: RCU issue with SELinux)
2004-08-30 11:17 ` [PATCH]SELinux performance improvement by RCU (Re: RCU issue with SELinux) Kaigai Kohei
2004-08-30 15:35 ` Stephen Smalley
@ 2004-08-31 15:33 ` James Morris
1 sibling, 0 replies; 31+ messages in thread
From: James Morris @ 2004-08-31 15:33 UTC (permalink / raw)
To: Kaigai Kohei
Cc: Stephen Smalley, SELinux-ML(Eng), Linux Kernel ML(Eng), Rik van Riel
I've performed some benchmarking and profiling of these patches, comparing
the vanilla kernel with SELinux disabled, the same with SELinux enabled,
and then SELinux enabled with the RCU patches.
The benchmarks I used are apachebench, lmbench, dbench and webstone.
It seems that the patches are providing good scalability from around four
CPUs and up, although baseline performance has not been significantly
improved.
This benchmarking also confirms some suspected network-specific
performance issues, which need addressing separately; e.g., port and node
caches need to be implemented.
Results + before/after profiling below.
- James
--
James Morris
<jmorris@redhat.com>
test system : 8 way 900 MHz Xeon, 4G RAM, 3 x gigabit interfaces.
orig = vanilla 2.6.8.1 kernel
nec1 = 2.6.8.1 kernel with nec patches
disabled = SELinux is disabled
enabled = SELinux is enabled (strict policy)
----------------------------------------------------------------------------
apachebench
----------------------------------------------------------------------------
Configuration: 5k file 100k times via localhost.
Transfer rate in MB/s:
# Clients 1 2 4 8
-------------------------------------------------
orig disabled : 12.269 22.339 23.610 20.867
orig enabled : 9.319 16.594 17.532 16.322
nec1 enabled : 9.783 17.573 18.373 17.006
----------------------------------------------------------------------------
dbench
----------------------------------------------------------------------------
Configuration: standard client.txt, ext2, average of 10 runs each.
Througput in MB/s:
# Clients 1 2 3 4 5 6 7 8
----------------------------------------------------------------------
orig disabled : 156.8 279.2 383.7 470.0 547.9 610.5 651.7 713.7
orig enabled : 146.2 260.0 359.1 418.8 450.3 503.6 489.5 446.2
nec1 enabled : 146.1 261.7 356.9 451.1 515.3 596.5 629.4 684.2
----------------------------------------------------------------------------
Webstone 2.5
----------------------------------------------------------------------------
Configuration:
- two physical clients, each with private gigabit to server.
- standard fileset
- two mins per run, one iteration
Server throughput in Mb/s:
# Clients 2 4 8 16 32
---------------------------------------------------------
orig disabled : 66.55 302.40 380.16 421.40 434.08
orig enabled : 64.96 276.34 358.87 388.73 389.43
nec1 enabled : 64.33 282.50 367.29 403.66 409.65
---------------------------------------------------------
----------------------------------------------------------------------------
lmbench 3.0
----------------------------------------------------------------------------
Basic system parameters
------------------------------------------------------------------------------
Host OS Description Mhz tlb cache mem scal
pages line par load
bytes
--------- ------------- ----------------------- ---- ----- ----- ------ ----
xeon4.lab orig--disable i686-pc-linux-gnu 898 32 1
xeon4.lab orig--enable i686-pc-linux-gnu 898 32 1
xeon4.lab nec1--enable i686-pc-linux-gnu 898 32 1
Processor, Processes - times in microseconds - smaller is better
------------------------------------------------------------------------------
Host OS Mhz null null open slct sig sig fork exec sh
call I/O stat clos TCP inst hndl proc proc proc
--------- ------------- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ----
xeon4.lab orig--disable 898 0.24 0.40 5.13 5.90 25.2 0.99 3.98 185. 1006 2853
xeon4.lab orig--enable 898 0.24 0.81 8.62 9.91 23.1 0.99 3.97 189. 1038 3027
xeon4.lab nec1--enable 898 0.24 0.75 8.54 9.65 23.1 0.99 4.02 187. 987. 3073
Context switching - times in microseconds - smaller is better
-------------------------------------------------------------------------
Host OS 2p/0K 2p/16K 2p/64K 8p/16K 8p/64K 16p/16K 16p/64K
ctxsw ctxsw ctxsw ctxsw ctxsw ctxsw ctxsw
--------- ------------- ------ ------ ------ ------ ------ ------- -------
xeon4.lab orig--disable 11.3 21.5 20.1 17.3 14.0 11.4 10.8
xeon4.lab orig--enable 21.0 28.2 26.4 17.5 21.3 8.92000 17.4
xeon4.lab nec1--enable 11.4 27.2 2.5600 13.1 24.0 7.62000 11.8
*Local* Communication latencies in microseconds - smaller is better
---------------------------------------------------------------------
Host OS 2p/0K Pipe AF UDP RPC/ TCP RPC/ TCP
ctxsw UNIX UDP TCP conn
--------- ------------- ----- ----- ---- ----- ----- ----- ----- ----
xeon4.lab orig--disable 11.3 26.2 25.6 52.2 95.
xeon4.lab orig--enable 21.0 26.5 45.1 97.2 148.
xeon4.lab nec1--enable 11.4 27.6 45.8 71.5 142.
File & VM system latencies in microseconds - smaller is better
-------------------------------------------------------------------------------
Host OS 0K File 10K File Mmap Prot Page 100fd
Create Delete Create Delete Latency Fault Fault selct
--------- ------------- ------ ------ ------ ------ ------- ----- ------- -----
xeon4.lab orig--disable 31.5 23.5 86.3 48.1 1778.0 0.228 2.03950 16.9
xeon4.lab orig--enable 55.7 36.0 111.7 57.5 1668.0 0.288 1.99780 15.8
xeon4.lab nec1--enable 55.3 35.0 112.8 56.3 1715.0 0.160 2.03220 15.6
*Local* Communication bandwidths in MB/s - bigger is better
-----------------------------------------------------------------------------
Host OS Pipe AF TCP File Mmap Bcopy Bcopy Mem Mem
UNIX reread reread (libc) (hand) read write
--------- ------------- ---- ---- ---- ------ ------ ------ ------ ---- -----
xeon4.lab orig--disable 519. 163. 181. 226.9 256.9 134.3 128.1 257. 199.7
xeon4.lab orig--enable 513. 163. 122. 226.4 256.9 134.3 129.5 256. 197.5
xeon4.lab nec1--enable 504. 294. 121. 226.7 257.2 134.1 129.7 257. 200.4
----------------------------------------------------------------------------
Profiling
----------------------------------------------------------------------------
apachebench (8 clients, 100k iter, 5k file, localhost):
orig-enable:
253368 total 0.1025
187668 default_idle 2932.3125
6024 avc_has_perm_noaudit 12.5500
4124 security_port_sid 25.7750
3784 finish_task_switch 29.5625
2899 .text.lock.avc 14.7157
1717 tcp_v4_rcv 0.7107
1462 __wake_up 18.2750
1032 avc_audit 0.3395
1023 ip_queue_xmit 0.6208
984 selinux_socket_sock_rcv_skb 1.3667
927 tcp_transmit_skb 0.5315
788 __kfree_skb 2.3452
750 __copy_user_intel 4.2614
680 security_node_sid 2.6562
657 netif_rx 1.4159
653 tcp_recvmsg 0.3549
644 alloc_skb 2.5156
582 ip_finish_output2 1.1734
563 __d_lookup 1.8520
541 tcp_rcv_established 0.2348
539 tcp_poll 1.2957
524 sel_netif_lookup 2.0469
523 avc_has_perm 4.2520
517 __mod_timer 1.5387
513 atomic_dec_and_lock 7.7727
506 sysenter_past_esp 4.4779
500 skb_release_data 3.4722
493 dev_queue_xmit 0.6556
490 kmem_cache_alloc 6.1250
482 lock_sock 6.0250
474 try_to_wake_up 0.6733
461 __kmalloc 3.2014
425 fget 6.6406
413 tcp_clean_rtx_queue 0.4232
413 __copy_to_user_ll 3.6875
408 sel_netif_find 6.3750
405 .text.lock.sock 2.6299
403 skb_copy_bits 0.6458
398 kmem_cache_free 4.1458
nec1-enable:
232223 total 0.0940
177690 default_idle 2776.4062
3729 finish_task_switch 29.1328
3656 security_port_sid 22.8500
1545 tcp_v4_rcv 0.6395
1422 __wake_up 17.7750
979 selinux_socket_sock_rcv_skb 1.3597
924 ip_queue_xmit 0.5607
903 tcp_transmit_skb 0.5178
863 avc_lookup 4.4948
752 __copy_user_intel 4.2727
722 memcpy 11.2812
720 __kfree_skb 2.1429
714 avc_has_perm_noaudit 2.2313
676 tcp_recvmsg 0.3674
632 netif_rx 1.3621
598 security_node_sid 2.3359
595 alloc_skb 2.3242
547 __mod_timer 1.6280
507 __d_lookup 1.6678
489 sysenter_past_esp 4.3274
485 skb_release_data 3.3681
484 tcp_rcv_established 0.2101
480 tcp_poll 1.1538
474 ip_finish_output2 0.9556
468 try_to_wake_up 0.6648
467 atomic_dec_and_lock 7.0758
431 avc_has_perm 3.5328
420 tcp_create_openreq_child 0.3125
415 lock_sock 5.1875
414 __copy_to_user_ll 3.6964
410 skb_copy_bits 0.6571
410 sel_netif_lookup 1.6016
407 fget 6.3594
398 .text.lock.sock 2.5844
397 tcp_clean_rtx_queue 0.4068
397 dev_queue_xmit 0.5279
392 do_gettimeofday 1.8846
362 sel_netif_find 5.6562
362 selinux_ip_postroute_last 0.5518
dbench (8 clients, 10 iter):
orig-enable:
432070 total 0.1749
125306 .text.lock.avc 636.0711
48451 __copy_user_intel 275.2898
34015 __copy_user_zeroing_intel 193.2670
30862 avc_has_perm_noaudit 64.2958
27203 __copy_to_user_ll 242.8839
26618 __copy_from_user_ll 237.6607
21745 avc_audit 7.1530
18133 default_idle 283.3281
6536 __d_lookup 21.5000
4034 link_path_walk 1.0962
3782 find_get_page 47.2750
3280 atomic_dec_and_lock 49.6970
2982 do_generic_mapping_read 2.8673
2931 kmap_atomic 20.3542
2504 __block_prepare_write 2.3712
2483 path_lookup 6.7473
1860 inode_has_perm 12.9167
1847 buffered_rmqueue 3.3952
1719 generic_file_aio_write_nolock 0.5165
1713 file_move 26.7656
1441 avc_has_perm 11.7154
1258 current_kernel_time 15.7250
1225 sysenter_past_esp 10.8407
1198 generic_commit_write 8.3194
1165 __find_get_block 6.0677
1123 put_page 7.7986
1097 wake_up_page 13.7125
1092 page_address 6.2045
1074 find_lock_page 6.1023
1039 selinux_file_permission 2.5975
1030 __block_commit_write 4.9519
952 kmem_cache_free 9.9167
931 kmem_cache_alloc 11.6375
916 do_page_cache_readahead 2.2019
899 in_group_p 7.0234
855 block_invalidatepage 3.5625
845 file_read_actor 3.5208
800 selinux_inode_permission 3.8462
782 find_get_pages 6.9821
nec1-enable:
275604 total 0.1115
55272 __copy_user_intel 314.0455
36716 __copy_user_zeroing_intel 208.6136
29933 __copy_from_user_ll 267.2589
29544 __copy_to_user_ll 263.7857
25789 default_idle 402.9531
4419 __d_lookup 14.5362
3636 find_get_page 45.4500
2800 do_generic_mapping_read 2.6923
2788 avc_has_perm_noaudit 8.7125
2768 link_path_walk 0.7522
2766 __block_prepare_write 2.6193
2706 path_lookup 7.3533
2549 kmap_atomic 17.7014
2282 atomic_dec_and_lock 34.5758
1935 buffered_rmqueue 3.5570
1745 generic_file_aio_write_nolock 0.5243
1614 inode_has_perm 11.2083
1344 generic_commit_write 9.3333
1238 current_kernel_time 15.4750
1190 sysenter_past_esp 10.5310
1184 file_move 18.5000
1172 __find_get_block 6.1042
1141 wake_up_page 14.2625
1098 avc_lookup 5.7188
1083 __block_commit_write 5.2067
1033 put_page 7.1736
1027 in_group_p 8.0234
1018 find_lock_page 5.7841
992 page_address 5.6364
990 selinux_file_permission 2.4750
930 memcpy 14.5312
888 block_invalidatepage 3.7000
855 file_read_actor 3.5625
846 kmem_cache_free 8.8125
828 free_hot_cold_page 2.8750
809 kmem_cache_alloc 10.1125
798 find_get_pages 7.1250
793 ext2_free_branches 1.5988
774 avc_has_perm 6.3443
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH]SELinux performance improvement by RCU (Re: RCU issue with SELinux)
2004-08-31 4:33 ` Kaigai Kohei
@ 2004-08-31 16:20 ` Paul E. McKenney
0 siblings, 0 replies; 31+ messages in thread
From: Paul E. McKenney @ 2004-08-31 16:20 UTC (permalink / raw)
To: Kaigai Kohei
Cc: Stephen Smalley, SELinux-ML(Eng), Linux Kernel ML(Eng), James Morris
On Tue, Aug 31, 2004 at 01:33:33PM +0900, Kaigai Kohei wrote:
> Hi Stephen, Paul, thanks for your comments.
>
> > > > The attached take-4 patches replace the avc_lock in security/selinux/avc.c
> > > > by the lock-less read access with RCU.
> > >
> > > Thanks. Was there a reason you didn't move the rcu_read_lock call after
> > > the avc_insert call per the suggestion of Paul McKenney, or was that
> > > just an oversight? No need to send a new patch, just ack whether or not
> > > you meant to switch the order there.
> >
> > One reason might be because I called it out in the text of my message,
> > but failed to put it in my patch. :-/ Of course, if there is some reason
> > why moving the rcu_read_lock() call is bad, I would like to know for
> > my own education.
>
> In my understanding, the issue is Paul's suggestion, as follows:
>
> > So I do not believe that avc_insert() needs rcu_read_lock().
> > Unless I am missing something, the rcu_read_lock() acquired
> > in avc_has_perm_noaudit() should be moved after the call to
> > avc_insert().
>
> I didn't move the rcu_read_lock() because of the possibility of preemption
> between the spin_unlock_irqrestore() in avc_insert() and an rcu_read_lock()
> placed after avc_insert() in avc_has_perm_noaudit().
>
> While returning from avc_insert(), we can't rule out the possibility
> that execution is preempted at exactly that point.
> Therefore, I left rcu_read_lock() where it was, in spite of its redundancy.
>
> If rcu_read_lock() were moved after avc_insert():
>
> [ in avc_insert() ]----------------------------
>         :
>     spin_lock_irqsave(&avc_cache.slots_lock[hvalue], flag);
>     list_for_each_entry(pos, &avc_cache.slots[hvalue], list) {
>         :
>     }
>     list_add_rcu(&node->list, &avc_cache.slots[hvalue]);
> found:
>     spin_unlock_irqrestore(&avc_cache.slots_lock[hvalue], flag);
>     /* includes preempt_enable(): from here until the caller takes
>      * rcu_read_lock(), preemption is possible, with the danger of
>      * the 'node' being released in the meantime. */
> }
> out:
>     return node;
> }
> -----------------------------------------------
> Because it's legal to take rcu_read_lock() twice (nested), as Paul says,
> we should do so for safety.
> That is why I didn't move rcu_read_lock() here; my explanation may have
> been lacking, sorry.
Works for me! Might be worth adding a comment, though.
Thanx, Paul
end of thread, other threads:[~2004-08-31 16:25 UTC | newest]
Thread overview: 31+ messages
2004-08-16 9:33 RCU issue with SELinux (Re: SELINUX performance issues) Kaigai Kohei
2004-08-16 15:19 ` James Morris
2004-08-20 13:36 ` Kaigai Kohei
2004-08-20 14:53 ` James Morris
2004-08-24 7:27 ` Kaigai Kohei
2004-08-24 13:24 ` James Morris
2004-08-25 9:51 ` Kaigai Kohei
2004-08-25 18:31 ` James Morris
2004-08-25 9:52 ` [PATCH]atomic_inc_return() for i386/x86_64 (Re: RCU issue with SELinux) Kaigai Kohei
2004-08-20 17:31 ` RCU issue with SELinux (Re: SELINUX performance issues) Luke Kenneth Casson Leighton
2004-08-20 18:15 ` James Morris
2004-08-20 20:19 ` Paul E. McKenney
2004-08-20 20:35 ` James Morris
2004-08-24 7:27 ` Kaigai Kohei
[not found] ` <1093014789.16585.186.camel@moss-spartans.epoch.ncsc.mil>
2004-08-24 7:25 ` Kaigai Kohei
2004-08-24 15:37 ` Stephen Smalley
2004-08-25 9:51 ` Kaigai Kohei
2004-08-25 15:50 ` Stephen Smalley
2004-08-25 16:11 ` Stephen Smalley
2004-08-26 7:53 ` Kaigai Kohei
2004-08-26 13:24 ` Stephen Smalley
2004-08-27 11:07 ` Kaigai Kohei
2004-08-30 11:17 ` [PATCH]SELinux performance improvement by RCU (Re: RCU issue with SELinux) Kaigai Kohei
2004-08-30 15:35 ` Stephen Smalley
2004-08-30 16:13 ` Paul E. McKenney
2004-08-31 4:33 ` Kaigai Kohei
2004-08-31 16:20 ` Paul E. McKenney
2004-08-31 15:33 ` James Morris
2004-08-24 23:02 ` RCU issue with SELinux (Re: SELINUX performance issues) Paul E. McKenney
2004-08-25 9:51 ` Kaigai Kohei
2004-08-25 17:34 ` Paul E. McKenney