Re: [PATCH] audit: accelerate audit rule filter

From: "zhaozixuan (C)" <zhaozixuan2@huawei.com>
To: Paul Moore <paul@paul-moore.com>
Cc: "eparis@redhat.com" <eparis@redhat.com>,
	"linux-audit@redhat.com" <linux-audit@redhat.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH] audit: accelerate audit rule filter
Date: Mon, 29 Nov 2021 07:35:39 +0000	[thread overview]
Message-ID: <4aac209c744848a38bb2003d601083e4@huawei.com> (raw)
In-Reply-To: <CAHC9VhS-pPDWBRX-6q7aj1Fj5oU0M2vgJC0ipTStczYGkFVvwg@mail.gmail.com>

>On Tue, Nov 23, 2021 at 2:50 AM Zixuan Zhao <zhaozixuan2@huawei.com> wrote:
>> We used lat_syscall of lmbench3 to test the performance impact of this  
>> patch. We changed the number of rules and run lat_syscall with 1000  
>> repetitions at each test. Syscalls measured by lat_syscall are not  
>> monitored by rules.
>>
>> Before this optimization:
>>
>>              null     read    write     stat    fstat      open
>>   0 rules  1.87ms   2.74ms   2.56ms   26.31ms  4.13ms   69.66ms
>>  10 rules  2.15ms   3.13ms   3.32ms   26.99ms  4.16ms   74.70ms
>>  20 rules  2.45ms   3.97ms   3.82ms   27.05ms  4.60ms   76.35ms
>>  30 rules  2.64ms   4.52ms   3.95ms   30.30ms  4.94ms   78.94ms
>>  40 rules  2.83ms   4.97ms   4.23ms   32.16ms  5.40ms   81.88ms
>>  50 rules  3.00ms   5.30ms   4.84ms   33.49ms  5.79ms   83.20ms
>> 100 rules  4.24ms   9.75ms   7.42ms   37.68ms  6.55ms   93.70ms
>> 160 rules  5.50ms   16.89ms  12.18ms  51.53ms  17.45ms  155.40ms
>>
>> After this optimization:
>>
>>              null     read    write     stat    fstat      open
>>   0 rules  1.81ms   2.84ms   2.42ms  27.70ms   4.15ms   69.10ms
>>  10 rules  1.97ms   2.83ms   2.69ms  27.70ms   4.15ms   69.30ms
>>  20 rules  1.72ms   2.91ms   2.41ms  26.49ms   3.91ms   71.19ms
>>  30 rules  1.85ms   2.94ms   2.48ms  26.27ms   3.97ms   71.43ms
>>  40 rules  1.88ms   2.94ms   2.78ms  26.85ms   4.08ms   69.79ms
>>  50 rules  1.86ms   3.17ms   3.08ms  26.25ms   4.03ms   72.32ms
>> 100 rules  1.84ms   3.00ms   2.81ms  26.25ms   3.98ms   70.25ms
>> 160 rules  1.92ms   3.32ms   3.06ms  26.81ms   4.57ms   71.41ms
>>
>> As the result shown above, the syscall latencies increase as  the 
>> number  of rules increases, while with the patch the latencies remain stable.
>>  This could help when a user adds many audit rules for purposes (such 
>> as  attack tracing or process behavior recording) but suffers from low  
>> performance.
>
>I have general concerns about trading memory and complexity for performance gains, but beyond that the numbers you posted above don't yet make sense to me.

Thanks for your reply.

The memory cost of this patch is less than 4KB (1820 bytes on x64 and
 3640 bytes on compatible x86_64) which is trivial in many cases.
 Besides, syscalls are called frequently on a system so a small
 optimization could bring a good income.

>Why are the latency increases due to rule count not similar across the different syscalls? For example, I would think that if the increase in syscall latency was >directly attributed to the audit rule processing then the increase on the "open" syscall should be similar to that of the "null" syscall.  In other phrasing, if we >can process 160 rules in ~4ms in the "null" case, why does it take us ~86ms in the "open" case?

As to the test result, we did some investigations and concluded two
 reasons:
1. The chosen rule sets were not very suitable. Though they were not hit
 by syscalls being measured, some of them were hit by other processes,
 which reduced the system performance and affected the test result;
2. The routine of lat_syscall is much more complicated than we thought. It
 called many other syscalls during the test, which may cause the result
 not to be linear.

Due to the reasons above, we did another test. We modified audit rule sets
 and made sure they wouldn't be hit at runtime. Then, we added
 ktime_get_real_ts64 to auditsc.c to record the time of executing
 __audit_syscall_exit. We ran "stat" syscall 10000 times for each rule set
 and recorded the time interval. The result is shown below:

Before this optimization:

rule set          time
  0 rules     3843.96ns
  1 rules    13119.08ns
 10 rules    14003.13ns
 20 rules    15420.18ns
 30 rules    17284.84ns
 40 rules    19010.67ns
 50 rules    21112.63ns
100 rules    25815.02ns
130 rules    29447.09ns

After this optimization:

 rule set          time
  0 rules     3597.78ns
  1 rules    13498.73ns
 10 rules    13122.57ns
 20 rules    12874.88ns
 30 rules    14351.99ns
 40 rules    14181.07ns
 50 rules    13806.45ns
100 rules    13890.85ns
130 rules    14441.45ns

As the result showed, the interval is linearly increased before
 optimization while the interval remains stable after optimization. Note 
 that audit skips some operations if there are no rules, so there is a gap
 between 0 rule and 1 rule set.