All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v3] audit: Turn off TIF_SYSCALL_AUDIT when there are no rules
@ 2014-02-08 21:06 Andy Lutomirski
  2014-02-10 16:57 ` Oleg Nesterov
  2014-02-18 20:17   ` Eric Paris
  0 siblings, 2 replies; 13+ messages in thread
From: Andy Lutomirski @ 2014-02-08 21:06 UTC (permalink / raw)
  To: linux-audit, linux-kernel
  Cc: Andy Lutomirski, Andi Kleen, Oleg Nesterov, Steve Grubb, Eric Paris

This toggles TIF_SYSCALL_AUDIT as needed when rules change instead
of leaving it set whenever rules might be set in the future.  This
reduces syscall latency from >60ns to closer to 40ns on my laptop.

This code is a little bit tricky.  It's not safe to turn
TIF_SYSCALL_AUDIT off during a syscall that is being audited.  So,
with this change, the only thing that ever turns TIF_SYSCALL_AUDIT
off after a process has been created is __audit_syscall_exit.

This has the somewhat odd side-effect that, unless there's a
'task,never' rule, every task's first syscall will use the slow
path.  This should be more or less unnoticable (what's another
20-40ns at task creation time between friends?).

There is a user-visible effect of this change: if there are no audit
rules, then events like AVC denials will not result in a syscall
audit record being written.  I'm happy to accept this minor loss in
debuggability in exchange for a massive speedup of all system calls
on default distribution configurations.

A better solution would be to ditch the syscall entry hook entirely
and use the syscall_get_xyz calls as needed to fill in the context.
This could be done on top of this patch, but it would be more code.

Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steve Grubb <sgrubb@redhat.com>
Cc: Eric Paris <eparis@redhat.com>
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
---

Changes from v2 (actually v2.1):
 - Use for_each_process_thread instead of do_each_thread :)
 - Turn off TIF_SYSCALL_AUDIT lazily, thus hopefully avoiding all of
   the nasty cases that Oleg noticed.

Changes from v1:
 - For new tasks, set flags in a new audit_sync_flags callback instead of
   in audit_alloc (thanks, Oleg).
 - Rework locking.
 - Use irqsave/irqrestore to avoid having to think about who else might have
   taken spinlocks.

 include/linux/audit.h |  8 ++++++--
 kernel/auditfilter.c  |  4 ++--
 kernel/auditsc.c      | 57 +++++++++++++++++++++++++++++++++++++++++++++++++--
 3 files changed, 63 insertions(+), 6 deletions(-)

diff --git a/include/linux/audit.h b/include/linux/audit.h
index aa865a9..ab00ffb 100644
--- a/include/linux/audit.h
+++ b/include/linux/audit.h
@@ -298,7 +298,8 @@ static inline void audit_mmap_fd(int fd, int flags)
 		__audit_mmap_fd(fd, flags);
 }
 
-extern int audit_n_rules;
+extern void audit_inc_n_rules(void);
+extern void audit_dec_n_rules(void);
 extern int audit_signals;
 #else /* CONFIG_AUDITSYSCALL */
 static inline int audit_alloc(struct task_struct *task)
@@ -404,7 +405,10 @@ static inline void audit_mmap_fd(int fd, int flags)
 { }
 static inline void audit_ptrace(struct task_struct *t)
 { }
-#define audit_n_rules 0
+static inline void audit_inc_n_rules(void)
+{ }
+static inline void audit_dec_n_rules(void)
+{ }
 #define audit_signals 0
 #endif /* CONFIG_AUDITSYSCALL */
 
diff --git a/kernel/auditfilter.c b/kernel/auditfilter.c
index 14a78cc..4c7054b 100644
--- a/kernel/auditfilter.c
+++ b/kernel/auditfilter.c
@@ -903,7 +903,7 @@ static inline int audit_add_rule(struct audit_entry *entry)
 	}
 #ifdef CONFIG_AUDITSYSCALL
 	if (!dont_count)
-		audit_n_rules++;
+		audit_inc_n_rules();
 
 	if (!audit_match_signal(entry))
 		audit_signals++;
@@ -955,7 +955,7 @@ static inline int audit_del_rule(struct audit_entry *entry)
 
 #ifdef CONFIG_AUDITSYSCALL
 	if (!dont_count)
-		audit_n_rules--;
+		audit_dec_n_rules();
 
 	if (!audit_match_signal(entry))
 		audit_signals--;
diff --git a/kernel/auditsc.c b/kernel/auditsc.c
index 7aef2f4..d76947c 100644
--- a/kernel/auditsc.c
+++ b/kernel/auditsc.c
@@ -80,7 +80,7 @@
 #define MAX_EXECVE_AUDIT_LEN 7500
 
 /* number of audit rules */
-int audit_n_rules;
+static int audit_n_rules;
 
 /* determines whether we collect data for signals sent */
 int audit_signals;
@@ -911,6 +911,39 @@ static inline struct audit_context *audit_alloc_context(enum audit_state state)
 	return context;
 }
 
+void audit_inc_n_rules()
+{
+	struct task_struct *p, *t;
+
+	read_lock(&tasklist_lock);
+	audit_n_rules++;
+	smp_wmb();
+	if (audit_n_rules == 1) {
+		/*
+		 * We now have a rule; we need to hook syscall entry.
+		 */
+		for_each_process_thread(p, t) {
+			if (t->audit_context)
+				set_tsk_thread_flag(t, TIF_SYSCALL_AUDIT);
+		}
+	}
+	read_unlock(&tasklist_lock);
+}
+
+void audit_dec_n_rules()
+{
+	read_lock(&tasklist_lock);
+	--audit_n_rules;
+	BUG_ON(audit_n_rules < 0);
+
+	/*
+	 * If audit_n_rules == 0, then __audit_syscall_exit will clear
+	 * TIF_SYSCALL_AUDIT.
+	 */
+
+	read_unlock(&tasklist_lock);
+}
+
 /**
  * audit_alloc - allocate an audit context block for a task
  * @tsk: task
@@ -938,11 +971,12 @@ int audit_alloc(struct task_struct *tsk)
 	if (!(context = audit_alloc_context(state))) {
 		kfree(key);
 		audit_log_lost("out of memory in audit_alloc");
+		clear_tsk_thread_flag(tsk, TIF_SYSCALL_AUDIT);
 		return -ENOMEM;
 	}
 	context->filterkey = key;
 
-	tsk->audit_context  = context;
+	tsk->audit_context = context;
 	set_tsk_thread_flag(tsk, TIF_SYSCALL_AUDIT);
 	return 0;
 }
@@ -1528,6 +1562,25 @@ void __audit_syscall_exit(int success, long return_code)
 		context->filterkey = NULL;
 	}
 	tsk->audit_context = context;
+
+	if (ACCESS_ONCE(audit_n_rules) == 0) {
+		/*
+		 * Either this is the very first syscall by this process or
+		 * audit_dec_n_rules recently set audit_n_rules to zero.
+		 */
+		smp_rmb();
+
+		/* audit_inc_n_rules could increment audit_n_rules here... */
+
+		clear_tsk_thread_flag(tsk, TIF_SYSCALL_AUDIT);
+
+		smp_rmb();
+
+		if (ACCESS_ONCE(audit_n_rules) != 0) {
+			/* ... if so, set TIF_SYSCALL_AUDIT again. */
+			set_tsk_thread_flag(tsk, TIF_SYSCALL_AUDIT);
+		}
+	}
 }
 
 static inline void handle_one(const struct inode *inode)
-- 
1.8.5.3


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH v3] audit: Turn off TIF_SYSCALL_AUDIT when there are no rules
  2014-02-08 21:06 [PATCH v3] audit: Turn off TIF_SYSCALL_AUDIT when there are no rules Andy Lutomirski
@ 2014-02-10 16:57 ` Oleg Nesterov
  2014-02-10 17:29   ` Andy Lutomirski
  2014-02-18 20:17   ` Eric Paris
  1 sibling, 1 reply; 13+ messages in thread
From: Oleg Nesterov @ 2014-02-10 16:57 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: linux-audit, linux-kernel, Andi Kleen, Steve Grubb, Eric Paris

On 02/08, Andy Lutomirski wrote:
>
> +void audit_inc_n_rules()
> +{
> +	struct task_struct *p, *t;
> +
> +	read_lock(&tasklist_lock);
> +	audit_n_rules++;
> +	smp_wmb();
> +	if (audit_n_rules == 1) {
> +		/*
> +		 * We now have a rule; we need to hook syscall entry.
> +		 */
> +		for_each_process_thread(p, t) {
> +			if (t->audit_context)
> +				set_tsk_thread_flag(t, TIF_SYSCALL_AUDIT);
> +		}
> +	}
> +	read_unlock(&tasklist_lock);
> +}
> +
> +void audit_dec_n_rules()
> +{
> +	read_lock(&tasklist_lock);
> +	--audit_n_rules;
> +	BUG_ON(audit_n_rules < 0);
> +
> +	/*
> +	 * If audit_n_rules == 0, then __audit_syscall_exit will clear
> +	 * TIF_SYSCALL_AUDIT.
> +	 */
> +
> +	read_unlock(&tasklist_lock);
> +}

To be honest, I do not understand why _dec_ takes tasklist_lock...
And why _inc_ increments audit_n_rules under tasklist.

> @@ -1528,6 +1562,25 @@ void __audit_syscall_exit(int success, long return_code)
>  		context->filterkey = NULL;
>  	}
>  	tsk->audit_context = context;
> +
> +	if (ACCESS_ONCE(audit_n_rules) == 0) {
> +		/*
> +		 * Either this is the very first syscall by this process or
> +		 * audit_dec_n_rules recently set audit_n_rules to zero.
> +		 */
> +		smp_rmb();

rmb() looks wrong, we need mb() to serialize ACCESS_ONCE() and
clear_tsk_thread_flag().

But, otoh, I think we do not need any barrier at all, we can rely on
control dependency. See the recent 18c03c61444a21 "Documentation/
memory-barriers.txt: Prohibit speculative writes".

> +		/* audit_inc_n_rules could increment audit_n_rules here... */
> +
> +		clear_tsk_thread_flag(tsk, TIF_SYSCALL_AUDIT);
> +
> +		smp_rmb();

Again, I guess this should be mb() or smp_mb__after_clear_bit().


And I still think this needs more changes. Once again, I do not think
that, say, __audit_log_bprm_fcaps() should populate context->aux if
!TIF_SYSCALL_AUDIT, this list can grow indefinitely. Or __audit_signal_info()...

Perhaps __audit_syscall_exit() should also set context->dummy?

Oleg.


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v3] audit: Turn off TIF_SYSCALL_AUDIT when there are no rules
  2014-02-10 16:57 ` Oleg Nesterov
@ 2014-02-10 17:29   ` Andy Lutomirski
  2014-02-10 17:47     ` Steve Grubb
  2014-02-10 19:01     ` Andy Lutomirski
  0 siblings, 2 replies; 13+ messages in thread
From: Andy Lutomirski @ 2014-02-10 17:29 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: linux-audit, linux-kernel, Andi Kleen, Steve Grubb, Eric Paris

On Mon, Feb 10, 2014 at 8:57 AM, Oleg Nesterov <oleg@redhat.com> wrote:
> On 02/08, Andy Lutomirski wrote:
>>
>> +void audit_inc_n_rules()
>> +{
>> +     struct task_struct *p, *t;
>> +
>> +     read_lock(&tasklist_lock);
>> +     audit_n_rules++;
>> +     smp_wmb();
>> +     if (audit_n_rules == 1) {
>> +             /*
>> +              * We now have a rule; we need to hook syscall entry.
>> +              */
>> +             for_each_process_thread(p, t) {
>> +                     if (t->audit_context)
>> +                             set_tsk_thread_flag(t, TIF_SYSCALL_AUDIT);
>> +             }
>> +     }
>> +     read_unlock(&tasklist_lock);
>> +}
>> +
>> +void audit_dec_n_rules()
>> +{
>> +     read_lock(&tasklist_lock);
>> +     --audit_n_rules;
>> +     BUG_ON(audit_n_rules < 0);
>> +
>> +     /*
>> +      * If audit_n_rules == 0, then __audit_syscall_exit will clear
>> +      * TIF_SYSCALL_AUDIT.
>> +      */
>> +
>> +     read_unlock(&tasklist_lock);
>> +}
>
> To be honest, I do not understand why _dec_ takes tasklist_lock...
> And why _inc_ increments audit_n_rules under tasklist.

Bah, incorrect leftover from last time.

>
>> @@ -1528,6 +1562,25 @@ void __audit_syscall_exit(int success, long return_code)
>>               context->filterkey = NULL;
>>       }
>>       tsk->audit_context = context;
>> +
>> +     if (ACCESS_ONCE(audit_n_rules) == 0) {
>> +             /*
>> +              * Either this is the very first syscall by this process or
>> +              * audit_dec_n_rules recently set audit_n_rules to zero.
>> +              */
>> +             smp_rmb();
>
> rmb() looks wrong, we need mb() to serialize ACCESS_ONCE() and
> clear_tsk_thread_flag().

I clearly need to review the rules.  I think you're right, though --
no barrier should be needed.

>
> But, otoh, I think we do not need any barrier at all, we can rely on
> control dependency. See the recent 18c03c61444a21 "Documentation/
> memory-barriers.txt: Prohibit speculative writes".
>
>> +             /* audit_inc_n_rules could increment audit_n_rules here... */
>> +
>> +             clear_tsk_thread_flag(tsk, TIF_SYSCALL_AUDIT);
>> +
>> +             smp_rmb();
>
> Again, I guess this should be mb() or smp_mb__after_clear_bit().
>
>
> And I still think this needs more changes. Once again, I do not think
> that, say, __audit_log_bprm_fcaps() should populate context->aux if
> !TIF_SYSCALL_AUDIT, this list can grow indefinitely. Or __audit_signal_info()...
>
> Perhaps __audit_syscall_exit() should also set context->dummy?

That would work.

I'm still torn between trying to make it possible for things like
__audit_log_bprm_fcaps to start a syscall audit record in the middle
of a syscall or to just try to tighten up the current approach to the
point where it will work correctly.

Grr.  Why is all this crap tied up with syscall auditing anyway?  ISTM
it would have been a lot nicer if audit calls just immediately emitted
audit records, completely independently of the syscall machinery.

--Andy

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v3] audit: Turn off TIF_SYSCALL_AUDIT when there are no rules
  2014-02-10 17:29   ` Andy Lutomirski
@ 2014-02-10 17:47     ` Steve Grubb
  2014-02-10 18:05       ` Andy Lutomirski
  2014-02-10 19:01     ` Andy Lutomirski
  1 sibling, 1 reply; 13+ messages in thread
From: Steve Grubb @ 2014-02-10 17:47 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Oleg Nesterov, linux-audit, linux-kernel, Andi Kleen, Eric Paris

On Monday, February 10, 2014 09:29:19 AM Andy Lutomirski wrote:
> Grr.  Why is all this crap tied up with syscall auditing anyway?  ISTM
> it would have been a lot nicer if audit calls just immediately emitted
> audit records, completely independently of the syscall machinery.

Because the majority of people needing audit need syscall records for it to 
make any sense. The auxiliary records generally report on the object of the 
syscall. We still require information about who was doing something, what they 
were doing, and what the result was. 

Even if you just get the AVC's, you still don't know what happened. If you get 
a deny record, was it really denied? The system could have been in permissive 
mode and the syscall succeeded. You only get the real decision when you have 
syscall records.

-Steve

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v3] audit: Turn off TIF_SYSCALL_AUDIT when there are no rules
  2014-02-10 17:47     ` Steve Grubb
@ 2014-02-10 18:05       ` Andy Lutomirski
  0 siblings, 0 replies; 13+ messages in thread
From: Andy Lutomirski @ 2014-02-10 18:05 UTC (permalink / raw)
  To: Steve Grubb
  Cc: Oleg Nesterov, linux-audit, linux-kernel, Andi Kleen, Eric Paris

On Mon, Feb 10, 2014 at 9:47 AM, Steve Grubb <sgrubb@redhat.com> wrote:
> On Monday, February 10, 2014 09:29:19 AM Andy Lutomirski wrote:
>> Grr.  Why is all this crap tied up with syscall auditing anyway?  ISTM
>> it would have been a lot nicer if audit calls just immediately emitted
>> audit records, completely independently of the syscall machinery.
>
> Because the majority of people needing audit need syscall records for it to
> make any sense. The auxiliary records generally report on the object of the
> syscall. We still require information about who was doing something, what they
> were doing, and what the result was.
>
> Even if you just get the AVC's, you still don't know what happened. If you get
> a deny record, was it really denied? The system could have been in permissive
> mode and the syscall succeeded. You only get the real decision when you have
> syscall records.
>

Fair enough.

I'll see if I can turn this into something more workable.

--Andy

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v3] audit: Turn off TIF_SYSCALL_AUDIT when there are no rules
  2014-02-10 17:29   ` Andy Lutomirski
  2014-02-10 17:47     ` Steve Grubb
@ 2014-02-10 19:01     ` Andy Lutomirski
  2014-02-10 19:12         ` Steve Grubb
  2014-02-18 17:31       ` Eric Paris
  1 sibling, 2 replies; 13+ messages in thread
From: Andy Lutomirski @ 2014-02-10 19:01 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: linux-audit, linux-kernel, Andi Kleen, Steve Grubb, Eric Paris

On Mon, Feb 10, 2014 at 9:29 AM, Andy Lutomirski <luto@amacapital.net> wrote:
> On Mon, Feb 10, 2014 at 8:57 AM, Oleg Nesterov <oleg@redhat.com> wrote:
>> On 02/08, Andy Lutomirski wrote:
>>>
>>> +void audit_inc_n_rules()
>>> +{
>>> +     struct task_struct *p, *t;
>>> +
>>> +     read_lock(&tasklist_lock);
>>> +     audit_n_rules++;
>>> +     smp_wmb();
>>> +     if (audit_n_rules == 1) {
>>> +             /*
>>> +              * We now have a rule; we need to hook syscall entry.
>>> +              */
>>> +             for_each_process_thread(p, t) {
>>> +                     if (t->audit_context)
>>> +                             set_tsk_thread_flag(t, TIF_SYSCALL_AUDIT);
>>> +             }
>>> +     }
>>> +     read_unlock(&tasklist_lock);
>>> +}
>>> +
>>> +void audit_dec_n_rules()
>>> +{
>>> +     read_lock(&tasklist_lock);
>>> +     --audit_n_rules;
>>> +     BUG_ON(audit_n_rules < 0);
>>> +
>>> +     /*
>>> +      * If audit_n_rules == 0, then __audit_syscall_exit will clear
>>> +      * TIF_SYSCALL_AUDIT.
>>> +      */
>>> +
>>> +     read_unlock(&tasklist_lock);
>>> +}
>>
>> To be honest, I do not understand why _dec_ takes tasklist_lock...
>> And why _inc_ increments audit_n_rules under tasklist.
>
> Bah, incorrect leftover from last time.
>
>>
>>> @@ -1528,6 +1562,25 @@ void __audit_syscall_exit(int success, long return_code)
>>>               context->filterkey = NULL;
>>>       }
>>>       tsk->audit_context = context;
>>> +
>>> +     if (ACCESS_ONCE(audit_n_rules) == 0) {
>>> +             /*
>>> +              * Either this is the very first syscall by this process or
>>> +              * audit_dec_n_rules recently set audit_n_rules to zero.
>>> +              */
>>> +             smp_rmb();
>>
>> rmb() looks wrong, we need mb() to serialize ACCESS_ONCE() and
>> clear_tsk_thread_flag().
>
> I clearly need to review the rules.  I think you're right, though --
> no barrier should be needed.
>
>>
>> But, otoh, I think we do not need any barrier at all, we can rely on
>> control dependency. See the recent 18c03c61444a21 "Documentation/
>> memory-barriers.txt: Prohibit speculative writes".
>>
>>> +             /* audit_inc_n_rules could increment audit_n_rules here... */
>>> +
>>> +             clear_tsk_thread_flag(tsk, TIF_SYSCALL_AUDIT);
>>> +
>>> +             smp_rmb();
>>
>> Again, I guess this should be mb() or smp_mb__after_clear_bit().
>>
>>
>> And I still think this needs more changes. Once again, I do not think
>> that, say, __audit_log_bprm_fcaps() should populate context->aux if
>> !TIF_SYSCALL_AUDIT, this list can grow indefinitely. Or __audit_signal_info()...
>>
>> Perhaps __audit_syscall_exit() should also set context->dummy?
>
> That would work.
>
> I'm still torn between trying to make it possible for things like
> __audit_log_bprm_fcaps to start a syscall audit record in the middle
> of a syscall or to just try to tighten up the current approach to the
> point where it will work correctly.

This is worse than I thought.  Things like signal auditing can enter
the audit system from outside of a syscall.  I don't think there's
currently any way to tell whether you're in a syscall (when
TIF_SYSCALL_AUDIT is clear) so getting this to work right would
require arch help.

I'll ask what people on the Fedora list think about just changing the
default to -t task,never.

--Andy

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v3] audit: Turn off TIF_SYSCALL_AUDIT when there are no rules
  2014-02-10 19:01     ` Andy Lutomirski
@ 2014-02-10 19:12         ` Steve Grubb
  2014-02-18 17:31       ` Eric Paris
  1 sibling, 0 replies; 13+ messages in thread
From: Steve Grubb @ 2014-02-10 19:12 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Oleg Nesterov, linux-audit, linux-kernel, Andi Kleen, Eric Paris

On Monday, February 10, 2014 11:01:36 AM Andy Lutomirski wrote:
> >> And I still think this needs more changes. Once again, I do not think
> >> that, say, __audit_log_bprm_fcaps() should populate context->aux if
> >> !TIF_SYSCALL_AUDIT, this list can grow indefinitely. Or
> >> __audit_signal_info()...
> >> 
> >> Perhaps __audit_syscall_exit() should also set context->dummy?
> > 
> > That would work.
> > 
> > I'm still torn between trying to make it possible for things like
> > __audit_log_bprm_fcaps to start a syscall audit record in the middle
> > of a syscall or to just try to tighten up the current approach to the
> > point where it will work correctly.
> 
> This is worse than I thought.  Things like signal auditing can enter
> the audit system from outside of a syscall.  I don't think there's
> currently any way to tell whether you're in a syscall (when
> TIF_SYSCALL_AUDIT is clear) so getting this to work right would
> require arch help.
> 
> I'll ask what people on the Fedora list think about just changing the
> default to -t task,never.

I can't recall ever seeing the task filter used in real life. But assuming you 
wanted to audit no tasks, what is the difference between using that filter and 
never setting audit_enable in the first place?

-Steve

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v3] audit: Turn off TIF_SYSCALL_AUDIT when there are no rules
@ 2014-02-10 19:12         ` Steve Grubb
  0 siblings, 0 replies; 13+ messages in thread
From: Steve Grubb @ 2014-02-10 19:12 UTC (permalink / raw)
  To: Andy Lutomirski; +Cc: linux-audit, Andi Kleen, Oleg Nesterov, linux-kernel

On Monday, February 10, 2014 11:01:36 AM Andy Lutomirski wrote:
> >> And I still think this needs more changes. Once again, I do not think
> >> that, say, __audit_log_bprm_fcaps() should populate context->aux if
> >> !TIF_SYSCALL_AUDIT, this list can grow indefinitely. Or
> >> __audit_signal_info()...
> >> 
> >> Perhaps __audit_syscall_exit() should also set context->dummy?
> > 
> > That would work.
> > 
> > I'm still torn between trying to make it possible for things like
> > __audit_log_bprm_fcaps to start a syscall audit record in the middle
> > of a syscall or to just try to tighten up the current approach to the
> > point where it will work correctly.
> 
> This is worse than I thought.  Things like signal auditing can enter
> the audit system from outside of a syscall.  I don't think there's
> currently any way to tell whether you're in a syscall (when
> TIF_SYSCALL_AUDIT is clear) so getting this to work right would
> require arch help.
> 
> I'll ask what people on the Fedora list think about just changing the
> default to -t task,never.

I can't recall ever seeing the task filter used in real life. But assuming you 
wanted to audit no tasks, what is the difference between using that filter and 
never setting audit_enable in the first place?

-Steve

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v3] audit: Turn off TIF_SYSCALL_AUDIT when there are no rules
  2014-02-10 19:12         ` Steve Grubb
  (?)
@ 2014-02-10 20:04         ` Andy Lutomirski
  2014-02-18 17:32           ` Eric Paris
  -1 siblings, 1 reply; 13+ messages in thread
From: Andy Lutomirski @ 2014-02-10 20:04 UTC (permalink / raw)
  To: Steve Grubb
  Cc: Oleg Nesterov, linux-audit, linux-kernel, Andi Kleen, Eric Paris

On Mon, Feb 10, 2014 at 11:12 AM, Steve Grubb <sgrubb@redhat.com> wrote:
> On Monday, February 10, 2014 11:01:36 AM Andy Lutomirski wrote:
>> >> And I still think this needs more changes. Once again, I do not think
>> >> that, say, __audit_log_bprm_fcaps() should populate context->aux if
>> >> !TIF_SYSCALL_AUDIT, this list can grow indefinitely. Or
>> >> __audit_signal_info()...
>> >>
>> >> Perhaps __audit_syscall_exit() should also set context->dummy?
>> >
>> > That would work.
>> >
>> > I'm still torn between trying to make it possible for things like
>> > __audit_log_bprm_fcaps to start a syscall audit record in the middle
>> > of a syscall or to just try to tighten up the current approach to the
>> > point where it will work correctly.
>>
>> This is worse than I thought.  Things like signal auditing can enter
>> the audit system from outside of a syscall.  I don't think there's
>> currently any way to tell whether you're in a syscall (when
>> TIF_SYSCALL_AUDIT is clear) so getting this to work right would
>> require arch help.
>>
>> I'll ask what people on the Fedora list think about just changing the
>> default to -t task,never.
>
> I can't recall ever seeing the task filter used in real life. But assuming you
> wanted to audit no tasks, what is the difference between using that filter and
> never setting audit_enable in the first place?

Two possible differences:

1. You can toggle it both ways at runtime.  Setting audit_enabled is a
one-way street (at least as far TIF_SYSCALL_AUDIT is concerned).

2. Do AVC denial messages still get logged if audit_enable == 0?  If
not, then audit_enable is a non-starter.

--Andy

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v3] audit: Turn off TIF_SYSCALL_AUDIT when there are no rules
  2014-02-10 19:01     ` Andy Lutomirski
  2014-02-10 19:12         ` Steve Grubb
@ 2014-02-18 17:31       ` Eric Paris
  1 sibling, 0 replies; 13+ messages in thread
From: Eric Paris @ 2014-02-18 17:31 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Oleg Nesterov, linux-audit, linux-kernel, Andi Kleen, Steve Grubb

On Mon, 2014-02-10 at 11:01 -0800, Andy Lutomirski wrote:
> On Mon, Feb 10, 2014 at 9:29 AM, Andy Lutomirski <luto@amacapital.net> wrote:
> > On Mon, Feb 10, 2014 at 8:57 AM, Oleg Nesterov <oleg@redhat.com> wrote:
> >> On 02/08, Andy Lutomirski wrote:
> >>>
> >>> +void audit_inc_n_rules()
> >>> +{
> >>> +     struct task_struct *p, *t;
> >>> +
> >>> +     read_lock(&tasklist_lock);
> >>> +     audit_n_rules++;
> >>> +     smp_wmb();
> >>> +     if (audit_n_rules == 1) {
> >>> +             /*
> >>> +              * We now have a rule; we need to hook syscall entry.
> >>> +              */
> >>> +             for_each_process_thread(p, t) {
> >>> +                     if (t->audit_context)
> >>> +                             set_tsk_thread_flag(t, TIF_SYSCALL_AUDIT);
> >>> +             }
> >>> +     }
> >>> +     read_unlock(&tasklist_lock);
> >>> +}
> >>> +
> >>> +void audit_dec_n_rules()
> >>> +{
> >>> +     read_lock(&tasklist_lock);
> >>> +     --audit_n_rules;
> >>> +     BUG_ON(audit_n_rules < 0);
> >>> +
> >>> +     /*
> >>> +      * If audit_n_rules == 0, then __audit_syscall_exit will clear
> >>> +      * TIF_SYSCALL_AUDIT.
> >>> +      */
> >>> +
> >>> +     read_unlock(&tasklist_lock);
> >>> +}
> >>
> >> To be honest, I do not understand why _dec_ takes tasklist_lock...
> >> And why _inc_ increments audit_n_rules under tasklist.
> >
> > Bah, incorrect leftover from last time.
> >
> >>
> >>> @@ -1528,6 +1562,25 @@ void __audit_syscall_exit(int success, long return_code)
> >>>               context->filterkey = NULL;
> >>>       }
> >>>       tsk->audit_context = context;
> >>> +
> >>> +     if (ACCESS_ONCE(audit_n_rules) == 0) {
> >>> +             /*
> >>> +              * Either this is the very first syscall by this process or
> >>> +              * audit_dec_n_rules recently set audit_n_rules to zero.
> >>> +              */
> >>> +             smp_rmb();
> >>
> >> rmb() looks wrong, we need mb() to serialize ACCESS_ONCE() and
> >> clear_tsk_thread_flag().
> >
> > I clearly need to review the rules.  I think you're right, though --
> > no barrier should be needed.
> >
> >>
> >> But, otoh, I think we do not need any barrier at all, we can rely on
> >> control dependency. See the recent 18c03c61444a21 "Documentation/
> >> memory-barriers.txt: Prohibit speculative writes".
> >>
> >>> +             /* audit_inc_n_rules could increment audit_n_rules here... */
> >>> +
> >>> +             clear_tsk_thread_flag(tsk, TIF_SYSCALL_AUDIT);
> >>> +
> >>> +             smp_rmb();
> >>
> >> Again, I guess this should be mb() or smp_mb__after_clear_bit().
> >>
> >>
> >> And I still think this needs more changes. Once again, I do not think
> >> that, say, __audit_log_bprm_fcaps() should populate context->aux if
> >> !TIF_SYSCALL_AUDIT, this list can grow indefinitely. Or __audit_signal_info()...
> >>
> >> Perhaps __audit_syscall_exit() should also set context->dummy?
> >
> > That would work.
> >
> > I'm still torn between trying to make it possible for things like
> > __audit_log_bprm_fcaps to start a syscall audit record in the middle
> > of a syscall or to just try to tighten up the current approach to the
> > point where it will work correctly.

Personally, I'd say just hand the next syscall_entry(), don't try to get
that race closed that fast...   "The first syscall after a rule is added
will be audited"

> This is worse than I thought.  Things like signal auditing can enter
> the audit system from outside of a syscall.

Not sure what you mean here.  The only place I know of that we do
something like that is signal delivery to auditd itself, which is a
horrible nasty ugly ungodly terrible eat-your-babies hack...   We'd have
to special case that hack to not pay attention to TIF_SYSCALL_AUDIT or
audit_dummy_context(), but the rest might be fixable if we set/unset the
dummy spot...

> I don't think there's
> currently any way to tell whether you're in a syscall (when
> TIF_SYSCALL_AUDIT is clear) so getting this to work right would
> require arch help.

Don't understand why this is needed...

> I'll ask what people on the Fedora list think about just changing the
> default to -t task,never.
> 
> --Andy



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v3] audit: Turn off TIF_SYSCALL_AUDIT when there are no rules
  2014-02-10 20:04         ` Andy Lutomirski
@ 2014-02-18 17:32           ` Eric Paris
  0 siblings, 0 replies; 13+ messages in thread
From: Eric Paris @ 2014-02-18 17:32 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Steve Grubb, Oleg Nesterov, linux-audit, linux-kernel, Andi Kleen

On Mon, 2014-02-10 at 12:04 -0800, Andy Lutomirski wrote:
> On Mon, Feb 10, 2014 at 11:12 AM, Steve Grubb <sgrubb@redhat.com> wrote:

> 2. Do AVC denial messages still get logged if audit_enable == 0?  If
> not, then audit_enable is a non-starter.

They go out printk/dmesg/syslog


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v3] audit: Turn off TIF_SYSCALL_AUDIT when there are no rules
  2014-02-08 21:06 [PATCH v3] audit: Turn off TIF_SYSCALL_AUDIT when there are no rules Andy Lutomirski
@ 2014-02-18 20:17   ` Eric Paris
  2014-02-18 20:17   ` Eric Paris
  1 sibling, 0 replies; 13+ messages in thread
From: Eric Paris @ 2014-02-18 20:17 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: linux-audit, linux-kernel, Andi Kleen, Oleg Nesterov, Steve Grubb

On Sat, 2014-02-08 at 13:06 -0800, Andy Lutomirski wrote:
> This toggles TIF_SYSCALL_AUDIT as needed when rules change instead
> of leaving it set whenever rules might be set in the future.  This
> reduces syscall latency from >60ns to closer to 40ns on my laptop.

Al also politely reminded me it might be wise to get some perf data
about where exactly we are spending out time.  I don't know squat about
perf, but Linus always tells me to do:

perf record -g -e cycles:pp -F 25000 $YOURTEST
perf report -s symbol

(the "-s symbol" is so that you don't get separate data for the
different processes that are part of the kernel build - you'll just
want "general kernel data"), and on one of the kernel symbols just
select it and do "Zoom into kernel DSO". You should see something like
this:


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v3] audit: Turn off TIF_SYSCALL_AUDIT when there are no rules
@ 2014-02-18 20:17   ` Eric Paris
  0 siblings, 0 replies; 13+ messages in thread
From: Eric Paris @ 2014-02-18 20:17 UTC (permalink / raw)
  To: Andy Lutomirski; +Cc: linux-audit, Andi Kleen, linux-kernel, Oleg Nesterov

On Sat, 2014-02-08 at 13:06 -0800, Andy Lutomirski wrote:
> This toggles TIF_SYSCALL_AUDIT as needed when rules change instead
> of leaving it set whenever rules might be set in the future.  This
> reduces syscall latency from >60ns to closer to 40ns on my laptop.

Al also politely reminded me it might be wise to get some perf data
about where exactly we are spending out time.  I don't know squat about
perf, but Linus always tells me to do:

perf record -g -e cycles:pp -F 25000 $YOURTEST
perf report -s symbol

(the "-s symbol" is so that you don't get separate data for the
different processes that are part of the kernel build - you'll just
want "general kernel data"), and on one of the kernel symbols just
select it and do "Zoom into kernel DSO". You should see something like
this:

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2014-02-18 20:18 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-02-08 21:06 [PATCH v3] audit: Turn off TIF_SYSCALL_AUDIT when there are no rules Andy Lutomirski
2014-02-10 16:57 ` Oleg Nesterov
2014-02-10 17:29   ` Andy Lutomirski
2014-02-10 17:47     ` Steve Grubb
2014-02-10 18:05       ` Andy Lutomirski
2014-02-10 19:01     ` Andy Lutomirski
2014-02-10 19:12       ` Steve Grubb
2014-02-10 19:12         ` Steve Grubb
2014-02-10 20:04         ` Andy Lutomirski
2014-02-18 17:32           ` Eric Paris
2014-02-18 17:31       ` Eric Paris
2014-02-18 20:17 ` Eric Paris
2014-02-18 20:17   ` Eric Paris

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.