From mboxrd@z Thu Jan  1 00:00:00 1970
From: Paul Moore <paul@paul-moore.com>
Subject: Re: [PATCH] audit: make sure we don't let the retry queue grow
	without bounds
Date: Mon, 10 Apr 2017 17:04:15 -0400
Message-ID: <CAHC9VhTr2KycQNNqManhpd9_sqz-MY4ivycn6OhK+zXk=+ek7g@mail.gmail.com>
References: <149185766728.14543.3818679159914063670.stgit@sifl>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Return-path: <linux-audit-bounces@redhat.com>
Received: from mx1.redhat.com (ext-mx07.extmail.prod.ext.phx2.redhat.com
	[10.5.110.31])
	by smtp.corp.redhat.com (Postfix) with ESMTPS id 5FA34179E9
	for <linux-audit@redhat.com>; Mon, 10 Apr 2017 21:04:18 +0000 (UTC)
Received: from mail-vk0-f68.google.com (mail-vk0-f68.google.com
	[209.85.213.68])
	(using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits))
	(No client certificate requested)
	by mx1.redhat.com (Postfix) with ESMTPS id 90B11C04B326
	for <linux-audit@redhat.com>; Mon, 10 Apr 2017 21:04:16 +0000 (UTC)
Received: by mail-vk0-f68.google.com with SMTP id z204so13721338vkd.2
	for <linux-audit@redhat.com>; Mon, 10 Apr 2017 14:04:16 -0700 (PDT)
In-Reply-To: <149185766728.14543.3818679159914063670.stgit@sifl>
List-Unsubscribe: <https://www.redhat.com/mailman/options/linux-audit>,
	<mailto:linux-audit-request@redhat.com?subject=unsubscribe>
List-Archive: <https://www.redhat.com/archives/linux-audit>
List-Post: <mailto:linux-audit@redhat.com>
List-Help: <mailto:linux-audit-request@redhat.com?subject=help>
List-Subscribe: <https://www.redhat.com/mailman/listinfo/linux-audit>,
	<mailto:linux-audit-request@redhat.com?subject=subscribe>
Sender: linux-audit-bounces@redhat.com
Errors-To: linux-audit-bounces@redhat.com
To: Paul Moore <pmoore@redhat.com>
Cc: Seth Forshee <seth.forshee@canonical.com>, linux-audit@redhat.com
List-Id: linux-audit@redhat.com

On Mon, Apr 10, 2017 at 4:54 PM, Paul Moore <pmoore@redhat.com> wrote:
> From: Paul Moore <paul@paul-moore.com>
>
> The retry queue is intended to provide a temporary buffer in the case
> of transient errors when communicating with auditd, it is not meant
> as a long life queue, that functionality is provided by the hold
> queue.
>
> This patch fixes a problem identified by Seth where the retry queue
> could grow uncontrollably if an auditd instance did not connect to
> the kernel to drain the queues.  This commit fixes this by doing the
> following:
>
> * Make sure we always call auditd_reset() if we decide the connection
> with audit is really dead.  There were some cases in
> kauditd_hold_skb() where we did not reset the connection, this patch
> relocates the reset calls to kauditd_thread() so all the error
> conditions are caught and the connection reset.  As a side effect,
> this means we could move auditd_reset() and get rid of the forward
> definition at the top of kernel/audit.c.
>
> * We never checked the status of the auditd connection when
> processing the main audit queue which meant that the retry queue
> could grow unchecked.  This patch adds a call to auditd_reset()
> after the main queue has been processed if auditd is not connected,
> the auditd_reset() call will make sure the retry and hold queues are
> correctly managed/flushed so that the retry queue remains reasonable.
>
> Reported-by: Seth Forshee <seth.forshee@canonical.com>
> Signed-off-by: Paul Moore <paul@paul-moore.com>
> ---
>  kernel/audit.c |   67 +++++++++++++++++++++++++++-----------------------------
>  1 file changed, 32 insertions(+), 35 deletions(-)

FWIW, I just ran this for about 30m randomly starting and stopping
auditd under heavy audit load and didn't notice any change in memory
usage on my test system.  If I don't hear anything negative by
tomorrow I'll send this up to Linus.

> diff --git a/kernel/audit.c b/kernel/audit.c
> index 2f4964cfde0b..a871bf80fde1 100644
> --- a/kernel/audit.c
> +++ b/kernel/audit.c
> @@ -160,7 +160,6 @@ static LIST_HEAD(audit_freelist);
>
>  /* queue msgs to send via kauditd_task */
>  static struct sk_buff_head audit_queue;
> -static void kauditd_hold_skb(struct sk_buff *skb);
>  /* queue msgs due to temporary unicast send problems */
>  static struct sk_buff_head audit_retry_queue;
>  /* queue msgs waiting for new auditd connection */
> @@ -454,30 +453,6 @@ static void auditd_set(int pid, u32 portid, struct net *net)
>  }
>
>  /**
> - * auditd_reset - Disconnect the auditd connection
> - *
> - * Description:
> - * Break the auditd/kauditd connection and move all the queued records into the
> - * hold queue in case auditd reconnects.
> - */
> -static void auditd_reset(void)
> -{
> -       struct sk_buff *skb;
> -
> -       /* if it isn't already broken, break the connection */
> -       rcu_read_lock();
> -       if (auditd_conn.pid)
> -               auditd_set(0, 0, NULL);
> -       rcu_read_unlock();
> -
> -       /* flush all of the main and retry queues to the hold queue */
> -       while ((skb = skb_dequeue(&audit_retry_queue)))
> -               kauditd_hold_skb(skb);
> -       while ((skb = skb_dequeue(&audit_queue)))
> -               kauditd_hold_skb(skb);
> -}
> -
> -/**
>   * kauditd_print_skb - Print the audit record to the ring buffer
>   * @skb: audit record
>   *
> @@ -505,9 +480,6 @@ static void kauditd_rehold_skb(struct sk_buff *skb)
>  {
>         /* put the record back in the queue at the same place */
>         skb_queue_head(&audit_hold_queue, skb);
> -
> -       /* fail the auditd connection */
> -       auditd_reset();
>  }
>
>  /**
> @@ -544,9 +516,6 @@ static void kauditd_hold_skb(struct sk_buff *skb)
>         /* we have no other options - drop the message */
>         audit_log_lost("kauditd hold queue overflow");
>         kfree_skb(skb);
> -
> -       /* fail the auditd connection */
> -       auditd_reset();
>  }
>
>  /**
> @@ -567,6 +536,30 @@ static void kauditd_retry_skb(struct sk_buff *skb)
>  }
>
>  /**
> + * auditd_reset - Disconnect the auditd connection
> + *
> + * Description:
> + * Break the auditd/kauditd connection and move all the queued records into the
> + * hold queue in case auditd reconnects.
> + */
> +static void auditd_reset(void)
> +{
> +       struct sk_buff *skb;
> +
> +       /* if it isn't already broken, break the connection */
> +       rcu_read_lock();
> +       if (auditd_conn.pid)
> +               auditd_set(0, 0, NULL);
> +       rcu_read_unlock();
> +
> +       /* flush all of the main and retry queues to the hold queue */
> +       while ((skb = skb_dequeue(&audit_retry_queue)))
> +               kauditd_hold_skb(skb);
> +       while ((skb = skb_dequeue(&audit_queue)))
> +               kauditd_hold_skb(skb);
> +}
> +
> +/**
>   * auditd_send_unicast_skb - Send a record via unicast to auditd
>   * @skb: audit record
>   *
> @@ -758,6 +751,7 @@ static int kauditd_thread(void *dummy)
>                                         NULL, kauditd_rehold_skb);
>                 if (rc < 0) {
>                         sk = NULL;
> +                       auditd_reset();
>                         goto main_queue;
>                 }
>
> @@ -767,6 +761,7 @@ static int kauditd_thread(void *dummy)
>                                         NULL, kauditd_hold_skb);
>                 if (rc < 0) {
>                         sk = NULL;
> +                       auditd_reset();
>                         goto main_queue;
>                 }
>
> @@ -775,16 +770,18 @@ static int kauditd_thread(void *dummy)
>                  * unicast, dump failed record sends to the retry queue; if
>                  * sk == NULL due to previous failures we will just do the
>                  * multicast send and move the record to the retry queue */
> -               kauditd_send_queue(sk, portid, &audit_queue, 1,
> -                                  kauditd_send_multicast_skb,
> -                                  kauditd_retry_skb);
> +               rc = kauditd_send_queue(sk, portid, &audit_queue, 1,
> +                                       kauditd_send_multicast_skb,
> +                                       kauditd_retry_skb);
> +               if (sk == NULL || rc < 0)
> +                       auditd_reset();
> +               sk = NULL;
>
>                 /* drop our netns reference, no auditd sends past this line */
>                 if (net) {
>                         put_net(net);
>                         net = NULL;
>                 }
> -               sk = NULL;
>
>                 /* we have processed all the queues so wake everyone */
>                 wake_up(&audit_backlog_wait);
>
> --
> Linux-audit mailing list
> Linux-audit@redhat.com
> https://www.redhat.com/mailman/listinfo/linux-audit

-- 
paul moore
www.paul-moore.com