From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-audit-bounces@redhat.com>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by smtp.lore.kernel.org (Postfix) with ESMTPS id 4F84CC433EF
	for <linux-audit@archiver.kernel.org>; Wed, 19 Jan 2022 12:31:19 +0000 (UTC)
Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com
 [209.132.183.4]) by relay.mimecast.com with ESMTP with STARTTLS
 (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id
 us-mta-605-AqnUpNqdMLamR_A1KI96pw-1; Wed, 19 Jan 2022 07:31:09 -0500
X-MC-Unique: AqnUpNqdMLamR_A1KI96pw-1
Received: from smtp.corp.redhat.com (int-mx04.intmail.prod.int.phx2.redhat.com [10.5.11.14])
	(using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
	(No client certificate requested)
	by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 695D3760C4;
	Wed, 19 Jan 2022 12:31:05 +0000 (UTC)
Received: from colo-mx.corp.redhat.com (colo-mx02.intmail.prod.int.phx2.redhat.com [10.5.11.21])
	by smtp.corp.redhat.com (Postfix) with ESMTPS id 4D1CC752BF;
	Wed, 19 Jan 2022 12:31:04 +0000 (UTC)
Received: from lists01.pubmisc.prod.ext.phx2.redhat.com (lists01.pubmisc.prod.ext.phx2.redhat.com [10.5.19.33])
	by colo-mx.corp.redhat.com (Postfix) with ESMTP id 5435E4BB7C;
	Wed, 19 Jan 2022 12:31:01 +0000 (UTC)
Received: from smtp.corp.redhat.com (int-mx02.intmail.prod.int.rdu2.redhat.com
	[10.11.54.2])
	by lists01.pubmisc.prod.ext.phx2.redhat.com (8.13.8/8.13.8) with ESMTP
	id 20JCNFeT031407 for <linux-audit@listman.util.phx.redhat.com>;
	Wed, 19 Jan 2022 07:23:16 -0500
Received: by smtp.corp.redhat.com (Postfix)
	id 8CF5B4010A03; Wed, 19 Jan 2022 12:23:15 +0000 (UTC)
Received: from mimecast-mx02.redhat.com
	(mimecast10.extmail.prod.ext.rdu2.redhat.com [10.11.55.26])
	by smtp.corp.redhat.com (Postfix) with ESMTPS id 88A24400E132
	for <linux-audit@redhat.com>; Wed, 19 Jan 2022 12:23:15 +0000 (UTC)
Received: from us-smtp-1.mimecast.com (us-smtp-2.mimecast.com [205.139.110.61])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 3554D1C084F7
	for <linux-audit@redhat.com>; Wed, 19 Jan 2022 12:23:15 +0000 (UTC)
Received: from szxga03-in.huawei.com (szxga03-in.huawei.com
	[45.249.212.189]) by relay.mimecast.com with ESMTP with STARTTLS
	(version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id
	us-mta-176-n9Lzk5E-MAWiprw0fZCIDw-1; Wed, 19 Jan 2022 07:23:12 -0500
X-MC-Unique: n9Lzk5E-MAWiprw0fZCIDw-1
Received: from dggeme762-chm.china.huawei.com (unknown [172.30.72.53])
	by szxga03-in.huawei.com (SkyGuard) with ESMTP id 4Jf4Tv0BxHz8wNn;
	Wed, 19 Jan 2022 20:20:19 +0800 (CST)
Received: from [10.67.110.176] (10.67.110.176) by
	dggeme762-chm.china.huawei.com (10.3.19.108) with Microsoft SMTP Server
	(version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256_P256) id
	15.1.2308.21; Wed, 19 Jan 2022 20:23:08 +0800
Subject: Re: [RFC PATCH] audit: improve audit queue handling when "audit=1" on
	cmdline
To: Paul Moore <paul@paul-moore.com>, <linux-audit@redhat.com>, Xiujianfeng
	<xiujianfeng@huawei.com>, wangweiyang <wangweiyang2@huawei.com>
References: <164255558889.182404.10149317566869570438.stgit@olly>
From: cuigaosheng <cuigaosheng1@huawei.com>
Message-ID: <c50a8cb9-f828-559e-8f31-efa18248e46e@huawei.com>
Date: Wed, 19 Jan 2022 20:23:08 +0800
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101
	Thunderbird/78.6.1
MIME-Version: 1.0
In-Reply-To: <164255558889.182404.10149317566869570438.stgit@olly>
X-Originating-IP: [10.67.110.176]
X-ClientProxiedBy: dggems705-chm.china.huawei.com (10.3.19.182) To
	dggeme762-chm.china.huawei.com (10.3.19.108)
X-CFilter-Loop: Reflected
X-Mimecast-Impersonation-Protect: Policy=CLT - Impersonation Protection
	Definition; Similar Internal Domain=false;
	Similar Monitored External Domain=false;
	Custom External Domain=false; Mimecast External Domain=false;
	Newly Observed Domain=false; Internal User Name=false;
	Custom Display Name List=false; Reply-to Address Mismatch=false;
	Targeted Threat Dictionary=false;
	Mimecast Threat Dictionary=false; Custom Threat Dictionary=false
X-Scanned-By: MIMEDefang 2.84 on 10.11.54.2
X-loop: linux-audit@redhat.com
X-BeenThere: linux-audit@redhat.com
X-Mailman-Version: 2.1.12
Precedence: junk
List-Id: Linux Audit Discussion <linux-audit.redhat.com>
List-Unsubscribe: <https://listman.redhat.com/mailman/options/linux-audit>,
	<mailto:linux-audit-request@redhat.com?subject=unsubscribe>
List-Archive: <https://listman.redhat.com/archives/linux-audit>
List-Post: <mailto:linux-audit@redhat.com>
List-Help: <mailto:linux-audit-request@redhat.com?subject=help>
List-Subscribe: <https://listman.redhat.com/mailman/listinfo/linux-audit>,
	<mailto:linux-audit-request@redhat.com?subject=subscribe>
Sender: linux-audit-bounces@redhat.com
Errors-To: linux-audit-bounces@redhat.com
X-Scanned-By: MIMEDefang 2.79 on 10.5.11.14
Authentication-Results: relay.mimecast.com;
	auth=pass smtp.auth=CUSA124A263 smtp.mailfrom=linux-audit-bounces@redhat.com
X-Mimecast-Spam-Score: 0
X-Mimecast-Originator: redhat.com
Content-Type: multipart/mixed; boundary="===============6454221015519942906=="

--===============6454221015519942906==
Content-Type: multipart/alternative;
	boundary="------------B9B105ACD67DD902D2AA6736"

--------------B9B105ACD67DD902D2AA6736
Content-Type: text/plain; charset="utf-8"; format=flowed
Content-Transfer-Encoding: quoted-printable

Hi Paul,

I have a few questions about this patch; I hope they help.

> /**
>    * kauditd_retry_skb - Queue an audit record, attempt to send again to auditd
>    * @skb: audit record
> + * @error: error code (unused)
>    *
>    * Description:
>    * Not as serious as kauditd_hold_skb() as we still have a connected auditd,
>    * but for some reason we are having problems sending it audit records so
>    * queue the given record and attempt to resend.
>    */
> -static void kauditd_retry_skb(struct sk_buff *skb)
> +static void kauditd_retry_skb(struct sk_buff *skb, __always_unused int error)
>   {
> -	/* NOTE: because records should only live in the retry queue for a
> -	 * short period of time, before either being sent or moved to the hold
> -	 * queue, we don't currently enforce a limit on this queue */
> -	skb_queue_tail(&audit_retry_queue, skb);
> +	if (!audit_backlog_limit ||
> +	    skb_queue_len(&audit_retry_queue) < audit_backlog_limit) {
> +		skb_queue_tail(&audit_retry_queue, skb);
> +		return;
> +	}
> +
> +	audit_log_lost("kauditd retry queue overflow");
> +	kfree_skb(skb);
>   }

When we process the main queue, should we also printk the skb when
audit_log_lost() is called?
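
To make the question concrete, here is a rough user-space sketch in C, not kernel code. It models the bounded retry queue from the hunk above and adds the console echo being asked about just before a record is dropped. `struct record`, `struct queue`, `retry_record()` and the counters are hypothetical stand-ins for the skb, the skb queue, kauditd_retry_skb() and audit_log_lost(); the printf() stands in for kauditd_printk_skb().

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical user-space stand-ins; none of these are kernel APIs. */
struct record {
	int id;
	struct record *next;
};

struct queue {
	struct record *head, *tail;
	unsigned int len;
};

static unsigned int backlog_limit = 2;	/* models audit_backlog_limit, 0 = unlimited */
static unsigned int printed;		/* records echoed to the console */
static unsigned int lost;		/* records dropped */

/* Append a record to the end of the queue. */
static void queue_tail(struct queue *q, struct record *r)
{
	assert(r != NULL);
	r->next = NULL;
	if (q->tail)
		q->tail->next = r;
	else
		q->head = r;
	q->tail = r;
	q->len++;
}

/* Models the patched kauditd_retry_skb(), plus the echo asked about. */
static void retry_record(struct queue *retry, struct record *r)
{
	if (!backlog_limit || retry->len < backlog_limit) {
		queue_tail(retry, r);
		return;
	}

	/* the suggestion: echo the record before it is lost */
	printf("audit: record %d\n", r->id);
	printed++;
	lost++;	/* stands in for audit_log_lost() followed by kfree_skb() */
}
```

With the limit set to 2, passing three records through retry_record() leaves two on the queue and echoes the third once before it is counted as lost. Whether an echo at that point is desirable in the kernel is exactly the open question.
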
> /**
>    * kauditd_hold_skb - Queue an audit record, waiting for auditd
>    * @skb: audit record
> + * @error: error code
>    *
>    * Description:
>    * Queue the audit record, waiting for an instance of auditd.  When this
> @@ -564,19 +566,31 @@ static void kauditd_rehold_skb(struct sk_buff *skb)
>    * and queue it, if we have room.  If we want to hold on to the record, but we
>    * don't have room, record a record lost message.
>    */
> -static void kauditd_hold_skb(struct sk_buff *skb)
> +static void kauditd_hold_skb(struct sk_buff *skb, int error)
>   {
>   	/* at this point it is uncertain if we will ever send this to auditd so
>   	 * try to send the message via printk before we go any further */
>   	kauditd_printk_skb(skb);
>
>   	/* can we just silently drop the message? */
> -	if (!audit_default) {
> -		kfree_skb(skb);
> -		return;
> +	if (!audit_default)
> +		goto drop;
> +
> +	/* the hold queue is only for when the daemon goes away completely,
> +	 * not -EAGAIN failures; if we are in a -EAGAIN state requeue the
> +	 * record on the retry queue unless it's full, in which case drop it
> +	 */
> +	if (error == -EAGAIN) {
> +		if (!audit_backlog_limit ||
> +		    skb_queue_len(&audit_retry_queue) < audit_backlog_limit) {
> +			skb_queue_tail(&audit_retry_queue, skb);
> +			return;
> +		}
> +		audit_log_lost("kauditd retry queue overflow");
> +		goto drop;
>   	}
>
> -	/* if we have room, queue the message */
> +	/* if we have room in the hold queue, queue the message */
>   	if (!audit_backlog_limit ||
>   	    skb_queue_len(&audit_hold_queue) < audit_backlog_limit) {
>   		skb_queue_tail(&audit_hold_queue, skb);
> @@ -585,24 +599,30 @@ static void kauditd_hold_skb(struct sk_buff *skb)
>
>   	/* we have no other options - drop the message */
>   	audit_log_lost("kauditd hold queue overflow");
> +drop:
>   	kfree_skb(skb);
>   }
If we move skbs from audit_hold_queue to audit_retry_queue, will these
skbs be printed more than once by printk?
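
A small user-space model of this concern, under the assumption (taken from the hunk above) that kauditd_hold_skb() calls kauditd_printk_skb() before it looks at the error code. `struct record`, its `echoed` flag, `hold_record()` and the `print_once` switch are invented for illustration and do not exist in the kernel; the flag is only one possible way to suppress repeats, not something the patch proposes.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for an skb; "echoed" is an invented flag. */
struct record {
	int id;
	bool echoed;
};

static unsigned int console_lines;	/* how many times a record was printed */

/*
 * Models the patched kauditd_hold_skb(): print first, then decide where
 * the record goes.  Returns true when the record is sent back to the
 * retry queue (-EAGAIN) and false when it takes the hold queue path.
 */
static bool hold_record(struct record *r, int error, bool print_once)
{
	/* without print_once every pass through here echoes the record */
	if (!print_once || !r->echoed) {
		printf("audit: record %d\n", r->id);
		console_lines++;
		r->echoed = true;
	}

	return error == -EAGAIN;
}
```

In this model a record that bounces through hold_record() twice with -EAGAIN and then once with -ECONNREFUSED is printed three times when print_once is false, and once when it is true.
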

Thanks.

On 2022/1/19 9:26, Paul Moore wrote:
> When an admin enables audit at early boot via the "audit=1" kernel
> command line the audit queue behavior is slightly different; the
> audit subsystem goes to greater lengths to avoid dropping records,
> which unfortunately can result in problems when the audit daemon is
> forcibly stopped for an extended period of time.
>
> This patch makes a number of changes designed to improve the audit
> queuing behavior so that leaving the audit daemon in a stopped state
> for an extended period does not cause a significant impact to the
> system.
>
> - kauditd_send_queue() is now limited to looping through the
>    passed queue only once per call.  This not only prevents the
>    function from looping indefinitely when records are returned
>    to the current queue, it also allows any recovery handling in
>    kauditd_thread() to take place when kauditd_send_queue()
>    returns.
>
> - Transient netlink send errors seen as -EAGAIN now cause the
>    record to be returned to the retry queue instead of going to
>    the hold queue.  The intention of the hold queue is to store,
>    perhaps for an extended period of time, the events which led
>    up to the audit daemon going offline.  The retry queue remains
>    a temporary queue intended to protect against transient issues
>    between the kernel and the audit daemon.
>
> - The retry queue is now limited by the audit_backlog_limit
>    setting, the same as the other queues.  This allows admins
>    to bound the size of all of the audit queues on the system.
>
> - kauditd_rehold_skb() now returns records to the end of the
>    hold queue to ensure ordering is preserved in the face of
>    recent changes to kauditd_send_queue().
>
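
The single-pass behaviour described in the first bullet above can be modelled in user space roughly as follows. This is a sketch, not the kernel implementation: `struct record`, `struct queue`, `queue_tail()`, `dequeue()` and `drain_once()` are hypothetical stand-ins for sk_buff, sk_buff_head, skb_queue_tail(), skb_dequeue() and the loop in kauditd_send_queue(). Every send is assumed to fail so that each record is put back on the tail of the same queue, which is the case where an unbounded loop would never finish.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space stand-ins for sk_buff / sk_buff_head. */
struct record {
	int id;
	struct record *next;
};

struct queue {
	struct record *head, *tail;
};

/* Append a record to the end of the queue. */
static void queue_tail(struct queue *q, struct record *r)
{
	assert(r != NULL);
	r->next = NULL;
	if (q->tail)
		q->tail->next = r;
	else
		q->head = r;
	q->tail = r;
}

/* Remove and return the first record, or NULL if the queue is empty. */
static struct record *dequeue(struct queue *q)
{
	struct record *r = q->head;

	if (!r)
		return NULL;
	q->head = r->next;
	if (!q->head)
		q->tail = NULL;
	r->next = NULL;
	return r;
}

/* One pass: stop once the record that was last on entry has been handled. */
static unsigned int drain_once(struct queue *q)
{
	struct record *r = NULL;
	struct record *last = q->tail;	/* models skb_peek_tail() */
	unsigned int handled = 0;

	while (r != last && (r = dequeue(q))) {
		handled++;
		queue_tail(q, r);	/* the "send" failed: requeue at the tail */
	}
	return handled;
}
```

With three records queued, drain_once() handles each record exactly once and returns with the queue in its original order. A plain `while ((r = dequeue(q)))` loop would never terminate under the same always-fail assumption. On an empty queue both `r` and `last` start as NULL, so the loop body is never entered.
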
> Cc: stable@vger.kernel.org
> Fixes: 5b52330bbfe63 ("audit: fix auditd/kernel connection state tracking")
> Fixes: f4b3ee3c85551 ("audit: improve robustness of the audit queue handling")
> Reported-by: Gaosheng Cui <cuigaosheng1@huawei.com>
> Signed-off-by: Paul Moore <paul@paul-moore.com>
> ---
>   kernel/audit.c |   60 ++++++++++++++++++++++++++++++++++++++------------------
>   1 file changed, 41 insertions(+), 19 deletions(-)
>
> diff --git a/kernel/audit.c b/kernel/audit.c
> index e4bbe2c70c26..c45d3fe61466 100644
> --- a/kernel/audit.c
> +++ b/kernel/audit.c
> @@ -541,20 +541,22 @@ static void kauditd_printk_skb(struct sk_buff *skb)
>   /**
>    * kauditd_rehold_skb - Handle a audit record send failure in the hold queue
>    * @skb: audit record
> + * @error: error code (unused)
>    *
>    * Description:
>    * This should only be used by the kauditd_thread when it fails to flush the
>    * hold queue.
>    */
> -static void kauditd_rehold_skb(struct sk_buff *skb)
> +static void kauditd_rehold_skb(struct sk_buff *skb, __always_unused int error)
>   {
> -	/* put the record back in the queue at the same place */
> -	skb_queue_head(&audit_hold_queue, skb);
> +	/* put the record back in the queue */
> +	skb_queue_tail(&audit_hold_queue, skb);
>   }
>
>   /**
>    * kauditd_hold_skb - Queue an audit record, waiting for auditd
>    * @skb: audit record
> + * @error: error code
>    *
>    * Description:
>    * Queue the audit record, waiting for an instance of auditd.  When this
> @@ -564,19 +566,31 @@ static void kauditd_rehold_skb(struct sk_buff *skb)
>    * and queue it, if we have room.  If we want to hold on to the record, but we
>    * don't have room, record a record lost message.
>    */
> -static void kauditd_hold_skb(struct sk_buff *skb)
> +static void kauditd_hold_skb(struct sk_buff *skb, int error)
>   {
>   	/* at this point it is uncertain if we will ever send this to auditd so
>   	 * try to send the message via printk before we go any further */
>   	kauditd_printk_skb(skb);
>
>   	/* can we just silently drop the message? */
> -	if (!audit_default) {
> -		kfree_skb(skb);
> -		return;
> +	if (!audit_default)
> +		goto drop;
> +
> +	/* the hold queue is only for when the daemon goes away completely,
> +	 * not -EAGAIN failures; if we are in a -EAGAIN state requeue the
> +	 * record on the retry queue unless it's full, in which case drop it
> +	 */
> +	if (error == -EAGAIN) {
> +		if (!audit_backlog_limit ||
> +		    skb_queue_len(&audit_retry_queue) < audit_backlog_limit) {
> +			skb_queue_tail(&audit_retry_queue, skb);
> +			return;
> +		}
> +		audit_log_lost("kauditd retry queue overflow");
> +		goto drop;
>   	}
>
> -	/* if we have room, queue the message */
> +	/* if we have room in the hold queue, queue the message */
>   	if (!audit_backlog_limit ||
>   	    skb_queue_len(&audit_hold_queue) < audit_backlog_limit) {
>   		skb_queue_tail(&audit_hold_queue, skb);
> @@ -585,24 +599,30 @@ static void kauditd_hold_skb(struct sk_buff *skb)
>
>   	/* we have no other options - drop the message */
>   	audit_log_lost("kauditd hold queue overflow");
> +drop:
>   	kfree_skb(skb);
>   }
>
>   /**
>    * kauditd_retry_skb - Queue an audit record, attempt to send again to auditd
>    * @skb: audit record
> + * @error: error code (unused)
>    *
>    * Description:
>    * Not as serious as kauditd_hold_skb() as we still have a connected auditd,
>    * but for some reason we are having problems sending it audit records so
>    * queue the given record and attempt to resend.
>    */
> -static void kauditd_retry_skb(struct sk_buff *skb)
> +static void kauditd_retry_skb(struct sk_buff *skb, __always_unused int error)
>   {
> -	/* NOTE: because records should only live in the retry queue for a
> -	 * short period of time, before either being sent or moved to the hold
> -	 * queue, we don't currently enforce a limit on this queue */
> -	skb_queue_tail(&audit_retry_queue, skb);
> +	if (!audit_backlog_limit ||
> +	    skb_queue_len(&audit_retry_queue) < audit_backlog_limit) {
> +		skb_queue_tail(&audit_retry_queue, skb);
> +		return;
> +	}
> +
> +	audit_log_lost("kauditd retry queue overflow");
> +	kfree_skb(skb);
>   }
>
>   /**
> @@ -640,7 +660,7 @@ static void auditd_reset(const struct auditd_connection *ac)
>   	/* flush the retry queue to the hold queue, but don't touch the main
>   	 * queue since we need to process that normally for multicast */
>   	while ((skb = skb_dequeue(&audit_retry_queue)))
> -		kauditd_hold_skb(skb);
> +		kauditd_hold_skb(skb, -ECONNREFUSED);
>   }
>
>   /**
> @@ -714,16 +734,18 @@ static int kauditd_send_queue(struct sock *sk, u32 portid,
>   			      struct sk_buff_head *queue,
>   			      unsigned int retry_limit,
>   			      void (*skb_hook)(struct sk_buff *skb),
> -			      void (*err_hook)(struct sk_buff *skb))
> +			      void (*err_hook)(struct sk_buff *skb, int error))
>   {
>   	int rc = 0;
> -	struct sk_buff *skb;
> +	struct sk_buff *skb = NULL;
> +	struct sk_buff *skb_tail;
>   	unsigned int failed = 0;
>
>   	/* NOTE: kauditd_thread takes care of all our locking, we just use
>   	 *       the netlink info passed to us (e.g. sk and portid) */
>
> -	while ((skb = skb_dequeue(queue))) {
> +	skb_tail = skb_peek_tail(queue);
> +	while ((skb != skb_tail) && (skb = skb_dequeue(queue))) {
>   		/* call the skb_hook for each skb we touch */
>   		if (skb_hook)
>   			(*skb_hook)(skb);
> @@ -731,7 +753,7 @@ static int kauditd_send_queue(struct sock *sk, u32 portid,
>   		/* can we send to anyone via unicast? */
>   		if (!sk) {
>   			if (err_hook)
> -				(*err_hook)(skb);
> +				(*err_hook)(skb, -ECONNREFUSED);
>   			continue;
>   		}
>
> @@ -745,7 +767,7 @@ static int kauditd_send_queue(struct sock *sk, u32 portid,
>   			    rc == -ECONNREFUSED || rc == -EPERM) {
>   				sk = NULL;
>   				if (err_hook)
> -					(*err_hook)(skb);
> +					(*err_hook)(skb, rc);
>   				if (rc == -EAGAIN)
>   					rc = 0;
>   				/* continue to drain the queue */
>
> .

--------------B9B105ACD67DD902D2AA6736--

--===============6454221015519942906==
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Content-Disposition: inline

--
Linux-audit mailing list
Linux-audit@redhat.com
https://listman.redhat.com/mailman/listinfo/linux-audit
--===============6454221015519942906==--