From mboxrd@z Thu Jan  1 00:00:00 1970
Subject: Re: [RFC PATCH 2/9] audit,io_uring,io-wq: add some basic audit
	support to io_uring
From: Jens Axboe <axboe@kernel.dk>
To: Paul Moore <paul@paul-moore.com>
References: <162163367115.8379.8459012634106035341.stgit@sifl>
	<162163379461.8379.9691291608621179559.stgit@sifl>
	<f07bd213-6656-7516-9099-c6ecf4174519@gmail.com>
	<CAHC9VhRjzWxweB8d8fypUx11CX6tRBnxSWbXH+5qM1virE509A@mail.gmail.com>
	<162219f9-7844-0c78-388f-9b5c06557d06@gmail.com>
	<CAHC9VhSJuddB+6GPS1+mgcuKahrR3UZA=1iO8obFzfRE7_E0gA@mail.gmail.com>
	<8943629d-3c69-3529-ca79-d7f8e2c60c16@kernel.dk>
	<CAHC9VhTYBsh4JHhqV0Uyz=H5cEYQw48xOo=CUdXV0gDvyifPOQ@mail.gmail.com>
	<9e69e4b6-2b87-a688-d604-c7f70be894f5@kernel.dk>
	<3bef7c8a-ee70-d91d-74db-367ad0137d00@kernel.dk>
	<fa7bf4a5-5975-3e8c-99b4-c8d54c57da10@kernel.dk>
Message-ID: <a7669e4a-e7a7-7e94-f6ce-fa48311f7175@kernel.dk>
Date: Wed, 26 May 2021 12:01:55 -0600
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101
	Thunderbird/68.10.0
MIME-Version: 1.0
In-Reply-To: <fa7bf4a5-5975-3e8c-99b4-c8d54c57da10@kernel.dk>
Cc: selinux@vger.kernel.org, io-uring@vger.kernel.org,
	linux-security-module@vger.kernel.org, linux-audit@redhat.com,
	Kumar Kartikeya Dwivedi <memxor@gmail.com>, linux-fsdevel@vger.kernel.org,
	Pavel Begunkov <asml.silence@gmail.com>,
	Alexander Viro <viro@zeniv.linux.org.uk>
X-BeenThere: linux-audit@redhat.com
X-Mailman-Version: 2.1.12
Precedence: junk
List-Id: Linux Audit Discussion <linux-audit.redhat.com>
List-Unsubscribe: <https://listman.redhat.com/mailman/options/linux-audit>,
	<mailto:linux-audit-request@redhat.com?subject=unsubscribe>
List-Archive: <https://listman.redhat.com/archives/linux-audit>
List-Post: <mailto:linux-audit@redhat.com>
List-Help: <mailto:linux-audit-request@redhat.com?subject=help>
List-Subscribe: <https://listman.redhat.com/mailman/listinfo/linux-audit>,
	<mailto:linux-audit-request@redhat.com?subject=subscribe>
Sender: linux-audit-bounces@redhat.com
Errors-To: linux-audit-bounces@redhat.com
X-Scanned-By: MIMEDefang 2.79 on 10.5.11.11
Authentication-Results: relay.mimecast.com;
	auth=pass smtp.auth=CUSA124A263 smtp.mailfrom=linux-audit-bounces@redhat.com
X-Mimecast-Spam-Score: 0
X-Mimecast-Originator: redhat.com
Content-Language: en-US
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit

On 5/26/21 11:54 AM, Jens Axboe wrote:
> On 5/26/21 11:31 AM, Jens Axboe wrote:
>> On 5/26/21 11:15 AM, Jens Axboe wrote:
>>> On 5/25/21 8:04 PM, Paul Moore wrote:
>>>> On Tue, May 25, 2021 at 9:11 PM Jens Axboe <axboe@kernel.dk> wrote:
>>>>> On 5/24/21 1:59 PM, Paul Moore wrote:
>>>>>> That said, audit is not for everyone, and we have build time and
>>>>>> runtime options to help make life easier.  Beyond simply disabling
>>>>>> audit at compile time a number of Linux distributions effectively
>>>>>> shortcut audit at runtime by adding a "never" rule to the audit
>>>>>> filter, for example:
>>>>>>
>>>>>>  % auditctl -a task,never
>>>>>
>>>>> As has been brought up, the issue we're facing is that distros have
>>>>> CONFIG_AUDIT=y and hence the above is the best real world case outside
>>>>> of people doing custom kernels. My question would then be how much
>>>>> overhead the above will add, considering it's an entry/exit call per op.
>>>>> If auditctl is turned off, what is the expectation in terms of overhead?
>>>>
>>>> I commented on that case in my last email to Pavel, but I'll try to go
>>>> over it again in a little more detail.
>>>>
>>>> As we discussed earlier in this thread, we can skip the req->opcode
>>>> check before both the _entry and _exit calls, so we are left with just
>>>> the bare audit calls in the io_uring code.  As the _entry and _exit
>>>> functions are small, I've copied them and their supporting functions
>>>> below and I'll try to explain what would happen in the CONFIG_AUDIT=y,
>>>> "task,never" case.
>>>>
>>>> +  static inline struct audit_context *audit_context(void)
>>>> +  {
>>>> +    return current->audit_context;
>>>> +  }
>>>>
>>>> +  static inline bool audit_dummy_context(void)
>>>> +  {
>>>> +    void *p = audit_context();
>>>> +    return !p || *(int *)p;
>>>> +  }
>>>>
>>>> +  static inline void audit_uring_entry(u8 op)
>>>> +  {
>>>> +    if (unlikely(audit_enabled && audit_context()))
>>>> +      __audit_uring_entry(op);
>>>> +  }
>>>>
>>>> We have one if statement where the conditional checks on two
>>>> individual conditions.  The first (audit_enabled) is simply a check to
>>>> see if anyone has "turned on" auditing at runtime; historically this
>>>> worked rather well, and still does in a number of places, but ever
>>>> since systemd has taken to forcing audit on regardless of the admin's
>>>> audit configuration it is less useful.  The second (audit_context())
>>>> is a check to see if an audit_context has been allocated for the
>>>> current task.  In the case of "task,never" current->audit_context will
>>>> be NULL (see audit_alloc()) and the __audit_uring_entry() slowpath
>>>> will never be called.
>>>>
>>>> Worst case here is checking the value of audit_enabled and
>>>> current->audit_context.  Depending on which you think is more likely
>>>> we can change the order of the check so that the
>>>> current->audit_context check is first if you feel that is more likely
>>>> to be NULL than audit_enabled is to be false (it may be that way now).
>>>>
>>>> +  static inline void audit_uring_exit(int success, long code)
>>>> +  {
>>>> +    if (unlikely(!audit_dummy_context()))
>>>> +      __audit_uring_exit(success, code);
>>>> +  }
>>>>
>>>> The exit call is very similar to the entry call, but in the
>>>> "task,never" case it is very simple as the first check to be performed
>>>> is the current->audit_context check which we know to be NULL.  The
>>>> __audit_uring_exit() slowpath will never be called.
>>>
>>> I actually ran some numbers this morning. The test base is 5.13+, and
>>> CONFIG_AUDIT=y and CONFIG_AUDITSYSCALL=y are set for both the baseline
>>> test and the test with this series applied. I used your git branch as of
>>> this morning.
>>>
>>> The test case is my usual peak perf test, which is random reads at
>>> QD=128 and using polled IO. It's a single core test, not threaded. I ran
>>> two different tests - one was having a thread just do the IO, the other
>>> is using SQPOLL to do the IO for us. The device is capable of more
>>> IOPS than a single core can deliver, so we're CPU limited in this test.
>>> Hence it's a good test case as it does actual work, and shows software
>>> overhead quite nicely. Runs are very stable (less than 0.5% difference
>>> between runs on the same base), yet I did average 4 runs.
>>>
>>> Kernel		SQPOLL		IOPS		Perf diff
>>> ---------------------------------------------------------
>>> 5.13		0		3029872		0.0%
>>> 5.13		1		3031056		0.0%
>>> 5.13 + audit	0		2894160		-4.5%
>>> 5.13 + audit	1		2886168		-4.8%
>>>
>>> That's an immediate drop in perf of almost 5%. Looking at a quick
>>> profile of it (nothing fancy, just checking for 'audit' in the profile)
>>> shows this:
>>>
>>> +    2.17%  io_uring  [kernel.vmlinux]  [k] __audit_uring_entry
>>> +    0.71%  io_uring  [kernel.vmlinux]  [k] __audit_uring_exit
>>>      0.07%  io_uring  [kernel.vmlinux]  [k] __audit_syscall_entry
>>>      0.02%  io_uring  [kernel.vmlinux]  [k] __audit_syscall_exit
>>>
>>> Note that this is with _no_ rules!
>>
>> io_uring also supports a NOP command, which basically just measures
>> reqs/sec through the interface. Ran that as well:
>>
>> Kernel		SQPOLL		IOPS		Perf diff
>> ---------------------------------------------------------
>> 5.13		0		31.05M		0.0%
>> 5.13 + audit	0		25.31M		-18.5%
>>
>> and profile for the latter includes:
>>
>> +    5.19%  io_uring  [kernel.vmlinux]  [k] __audit_uring_entry
>> +    4.31%  io_uring  [kernel.vmlinux]  [k] __audit_uring_exit
>>      0.26%  io_uring  [kernel.vmlinux]  [k] __audit_syscall_entry
>>      0.08%  io_uring  [kernel.vmlinux]  [k] __audit_syscall_exit
> 
> As Pavel correctly pointed out, it looks like auditing was enabled. And
> indeed it was! Hence the above numbers are without having turned off
> auditing. Running the NOPs after having turned off audit, we get 30.6M
> IOPS, which is down about 1.5% from the baseline. The results for the
> polled random read test above did _not_ change from this, they are still
> down the same amount.
> 
> Note, and I should have included this in the first email, this is not
> any kind of argument for or against audit logging. It's purely meant to
> be a set of numbers that show how the current series impacts
> performance.

And finally, I checked whether we see any real impact if we make it
optional per opcode, and the answer is no. Using the below patch, which
effectively bypasses the audit calls unless the opcode has flagged the
need for them, I cannot measure any difference in perf (as expected).
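
For reference, the "Perf diff" figures quoted above are consistent with a
plain relative change against the baseline run. A minimal sketch of that
arithmetic, where perf_diff_pct() is a name made up here for illustration
and is not part of the series:

```c
/*
 * Relative change of 'measured' against 'baseline', in percent.
 * Negative values mean the measured run was slower than the baseline.
 * Hypothetical helper, only meant to show how the table column is derived.
 */
static double perf_diff_pct(double baseline, double measured)
{
	return (measured - baseline) / baseline * 100.0;
}
```

With the quoted numbers, perf_diff_pct(3029872, 2894160) is the polled
read case with SQPOLL off and perf_diff_pct(31.05, 25.31) is the NOP
case; they round to the -4.5% and -18.5% shown in the tables.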

To turn this into something useful, my suggestion as a viable path
forward would be:

1) Use something like the below patch and flag request types that we
   want to do audit logging for.

2) As Pavel suggested, eliminate the need for having both an entry and an
   exit hook, turning them into just one. That effectively cuts the number
   of checks and calls in half.
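
To make the two points above a bit more concrete, here is a rough
userspace model of what the combined result could look like: a per-opcode
flag gating a single audit hook. This is only a sketch under stated
assumptions. Every demo_* name is invented for illustration, the opcode
list is not the real IORING_OP_* set, and which opcodes get flagged here
is an arbitrary choice, not a proposal.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical opcode set for illustration; not the real IORING_OP_* list. */
enum demo_op { DEMO_OP_NOP, DEMO_OP_READ, DEMO_OP_OPENAT, DEMO_OP_LAST };

/* Mirrors the idea of an 'audit' bit in struct io_op_def. */
struct demo_op_def {
	unsigned audit : 1;	/* should audit */
};

/* Which ops are flagged is an arbitrary assumption for this sketch. */
static const struct demo_op_def demo_op_defs[DEMO_OP_LAST] = {
	[DEMO_OP_NOP]    = { .audit = 0 },
	[DEMO_OP_READ]   = { .audit = 0 },
	[DEMO_OP_OPENAT] = { .audit = 1 },
};

/* Stand-ins for audit_enabled and current->audit_context. */
static bool demo_audit_enabled = true;
static void *demo_audit_ctx;

/* Counts how often the (modelled) out-of-line slowpath is reached. */
static unsigned long demo_slowpath_calls;

/* Stand-in for the audit slowpath; it only counts calls. */
static void demo_audit_slowpath(enum demo_op op, int ret)
{
	(void)op;
	(void)ret;
	demo_slowpath_calls++;
}

/* Single combined hook, called once per request instead of entry + exit. */
static inline void demo_audit_uring(enum demo_op op, int ret)
{
	/* Unflagged opcodes bail out before touching any audit state. */
	if (!demo_op_defs[op].audit)
		return;
	if (demo_audit_enabled && demo_audit_ctx)
		demo_audit_slowpath(op, ret);
}

/* Issue one request; the op itself is faked and always succeeds. */
static int demo_issue(enum demo_op op)
{
	int ret = 0;

	demo_audit_uring(op, ret);
	return ret;
}
```

In this model an unflagged op such as DEMO_OP_NOP pays for one table
lookup and a branch, and only a flagged op with auditing enabled and a
non-NULL context reaches the slowpath.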

diff --git a/fs/io_uring.c b/fs/io_uring.c
index aa065808ddcf..2c7c913b786b 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -885,6 +885,8 @@ struct io_op_def {
 	unsigned		needs_async_setup : 1;
 	/* should block plug */
 	unsigned		plug : 1;
+	/* should audit */
+	unsigned		audit : 1;
 	/* size of async data needed, if any */
 	unsigned short		async_size;
 };
@@ -6122,7 +6124,7 @@ static int io_issue_sqe(struct io_kiocb *req, unsigned int issue_flags)
 	if (req->work.creds && req->work.creds != current_cred())
 		creds = override_creds(req->work.creds);
 
-	if (req->opcode < IORING_OP_LAST)
+	if (io_op_defs[req->opcode].audit)
 		audit_uring_entry(req->opcode);
 
 	switch (req->opcode) {
@@ -6231,7 +6233,7 @@ static int io_issue_sqe(struct io_kiocb *req, unsigned int issue_flags)
 		break;
 	}
 
-	if (req->opcode < IORING_OP_LAST)
+	if (io_op_defs[req->opcode].audit)
 		audit_uring_exit(!ret, ret);
 
 	if (creds)

-- 
Jens Axboe
