audit.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 0/5] audit: refactors and fixes for potential deadlocks
@ 2023-05-11  5:21 Eiichi Tsukata
  2023-05-11  5:21 ` [PATCH v2 1/5] audit: refactor queue full checks Eiichi Tsukata
                   ` (4 more replies)
  0 siblings, 5 replies; 19+ messages in thread
From: Eiichi Tsukata @ 2023-05-11  5:21 UTC (permalink / raw)
  To: paul, eparis, linux-kernel, audit; +Cc: Eiichi Tsukata

This patchset consists of the following refactors and fixes for
potential deadlocks.

  Patch1-3: Refactorings and a small fix preferable for following deadlock fixes
  Patch4: Fix for deadlock caused by wait/wake_up race in @audit_queue full
  Patch5: Fix for deadlock caused by missing wake_up

v1->v2:
  - wait_for_kauditd(): returns 0 if @audit_queue is not full
  - Add another fix for deadlock caused by missing wake_up

Eiichi Tsukata (5):
  audit: refactor queue full checks
  audit: account backlog waiting time in audit_receive()
  audit: convert DECLARE_WAITQUEUE to DEFINE_WAIT
  audit: check if audit_queue is full after prepare_to_wait_exclusive()
  audit: do not use exclusive wait in audit_receive()

 kernel/audit.c | 78 ++++++++++++++++++++++++++++++++------------------
 1 file changed, 50 insertions(+), 28 deletions(-)

-- 
2.40.0


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH v2 1/5] audit: refactor queue full checks
  2023-05-11  5:21 [PATCH v2 0/5] audit: refactors and fixes for potential deadlocks Eiichi Tsukata
@ 2023-05-11  5:21 ` Eiichi Tsukata
  2023-05-19 20:54   ` Paul Moore
  2023-05-11  5:21 ` [PATCH v2 2/5] audit: account backlog waiting time in audit_receive() Eiichi Tsukata
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 19+ messages in thread
From: Eiichi Tsukata @ 2023-05-11  5:21 UTC (permalink / raw)
  To: paul, eparis, linux-kernel, audit; +Cc: Eiichi Tsukata

Currently audit queue full checks are done in multiple places.
Consolidate them into one audit_queue_full().

Signed-off-by: Eiichi Tsukata <eiichi.tsukata@nutanix.com>
---
 kernel/audit.c | 21 +++++++++++----------
 1 file changed, 11 insertions(+), 10 deletions(-)

diff --git a/kernel/audit.c b/kernel/audit.c
index 9bc0b0301198..c15694e1a76b 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -341,6 +341,12 @@ static inline int audit_rate_check(void)
 	return retval;
 }
 
+static inline int audit_queue_full(const struct sk_buff_head *queue)
+{
+	return audit_backlog_limit &&
+	       (skb_queue_len(queue) > audit_backlog_limit);
+}
+
 /**
  * audit_log_lost - conditionally log lost audit message event
  * @message: the message stating reason for lost audit message
@@ -579,8 +585,7 @@ static void kauditd_hold_skb(struct sk_buff *skb, int error)
 	 * record on the retry queue unless it's full, in which case drop it
 	 */
 	if (error == -EAGAIN) {
-		if (!audit_backlog_limit ||
-		    skb_queue_len(&audit_retry_queue) < audit_backlog_limit) {
+		if (!audit_queue_full(&audit_retry_queue)) {
 			skb_queue_tail(&audit_retry_queue, skb);
 			return;
 		}
@@ -589,8 +594,7 @@ static void kauditd_hold_skb(struct sk_buff *skb, int error)
 	}
 
 	/* if we have room in the hold queue, queue the message */
-	if (!audit_backlog_limit ||
-	    skb_queue_len(&audit_hold_queue) < audit_backlog_limit) {
+	if (!audit_queue_full(&audit_hold_queue)) {
 		skb_queue_tail(&audit_hold_queue, skb);
 		return;
 	}
@@ -613,8 +617,7 @@ static void kauditd_hold_skb(struct sk_buff *skb, int error)
  */
 static void kauditd_retry_skb(struct sk_buff *skb, __always_unused int error)
 {
-	if (!audit_backlog_limit ||
-	    skb_queue_len(&audit_retry_queue) < audit_backlog_limit) {
+	if (!audit_queue_full(&audit_retry_queue)) {
 		skb_queue_tail(&audit_retry_queue, skb);
 		return;
 	}
@@ -1564,8 +1567,7 @@ static void audit_receive(struct sk_buff  *skb)
 	audit_ctl_unlock();
 
 	/* can't block with the ctrl lock, so penalize the sender now */
-	if (audit_backlog_limit &&
-	    (skb_queue_len(&audit_queue) > audit_backlog_limit)) {
+	if (audit_queue_full(&audit_queue)) {
 		DECLARE_WAITQUEUE(wait, current);
 
 		/* wake kauditd to try and flush the queue */
@@ -1866,8 +1868,7 @@ struct audit_buffer *audit_log_start(struct audit_context *ctx, gfp_t gfp_mask,
 	if (!(auditd_test_task(current) || audit_ctl_owner_current())) {
 		long stime = audit_backlog_wait_time;
 
-		while (audit_backlog_limit &&
-		       (skb_queue_len(&audit_queue) > audit_backlog_limit)) {
+		while (audit_queue_full(&audit_queue)) {
 			/* wake kauditd to try and flush the queue */
 			wake_up_interruptible(&kauditd_wait);
 
-- 
2.40.0


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v2 2/5] audit: account backlog waiting time in audit_receive()
  2023-05-11  5:21 [PATCH v2 0/5] audit: refactors and fixes for potential deadlocks Eiichi Tsukata
  2023-05-11  5:21 ` [PATCH v2 1/5] audit: refactor queue full checks Eiichi Tsukata
@ 2023-05-11  5:21 ` Eiichi Tsukata
  2023-05-19 20:54   ` Paul Moore
  2023-05-11  5:21 ` [PATCH v2 3/5] audit: convert DECLARE_WAITQUEUE to DEFINE_WAIT Eiichi Tsukata
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 19+ messages in thread
From: Eiichi Tsukata @ 2023-05-11  5:21 UTC (permalink / raw)
  To: paul, eparis, linux-kernel, audit; +Cc: Eiichi Tsukata

Currently backlog waiting time in audit_receive() is not accounted as
audit_backlog_wait_time_actual. Accounts it as well.

Signed-off-by: Eiichi Tsukata <eiichi.tsukata@nutanix.com>
---
 kernel/audit.c | 44 ++++++++++++++++++++++++++------------------
 1 file changed, 26 insertions(+), 18 deletions(-)

diff --git a/kernel/audit.c b/kernel/audit.c
index c15694e1a76b..89e119ccda76 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -628,6 +628,29 @@ static void kauditd_retry_skb(struct sk_buff *skb, __always_unused int error)
 	kfree_skb(skb);
 }
 
+/**
+ * wait_for_kauditd - Wait for kauditd to drain the queue
+ * @stime: time to sleep
+ *
+ * Description:
+ * Waits for kauditd to drain the queue then adds duration of sleep time to
+ * audit_backlog_wait_time_actual as cumulative sum.
+ * Returns remaining time to sleep.
+ */
+static long wait_for_kauditd(long stime)
+{
+	long rtime;
+	DECLARE_WAITQUEUE(wait, current);
+
+	add_wait_queue_exclusive(&audit_backlog_wait, &wait);
+	set_current_state(TASK_UNINTERRUPTIBLE);
+	rtime = schedule_timeout(stime);
+	atomic_add(stime - rtime, &audit_backlog_wait_time_actual);
+	remove_wait_queue(&audit_backlog_wait, &wait);
+
+	return rtime;
+}
+
 /**
  * auditd_reset - Disconnect the auditd connection
  * @ac: auditd connection state
@@ -1568,15 +1591,9 @@ static void audit_receive(struct sk_buff  *skb)
 
 	/* can't block with the ctrl lock, so penalize the sender now */
 	if (audit_queue_full(&audit_queue)) {
-		DECLARE_WAITQUEUE(wait, current);
-
 		/* wake kauditd to try and flush the queue */
 		wake_up_interruptible(&kauditd_wait);
-
-		add_wait_queue_exclusive(&audit_backlog_wait, &wait);
-		set_current_state(TASK_UNINTERRUPTIBLE);
-		schedule_timeout(audit_backlog_wait_time);
-		remove_wait_queue(&audit_backlog_wait, &wait);
+		wait_for_kauditd(audit_backlog_wait_time);
 	}
 }
 
@@ -1874,17 +1891,8 @@ struct audit_buffer *audit_log_start(struct audit_context *ctx, gfp_t gfp_mask,
 
 			/* sleep if we are allowed and we haven't exhausted our
 			 * backlog wait limit */
-			if (gfpflags_allow_blocking(gfp_mask) && (stime > 0)) {
-				long rtime = stime;
-
-				DECLARE_WAITQUEUE(wait, current);
-
-				add_wait_queue_exclusive(&audit_backlog_wait,
-							 &wait);
-				set_current_state(TASK_UNINTERRUPTIBLE);
-				stime = schedule_timeout(rtime);
-				atomic_add(rtime - stime, &audit_backlog_wait_time_actual);
-				remove_wait_queue(&audit_backlog_wait, &wait);
+			if (gfpflags_allow_blocking(gfp_mask) && stime > 0) {
+				stime = wait_for_kauditd(stime);
 			} else {
 				if (audit_rate_check() && printk_ratelimit())
 					pr_warn("audit_backlog=%d > audit_backlog_limit=%d\n",
-- 
2.40.0


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v2 3/5] audit: convert DECLARE_WAITQUEUE to DEFINE_WAIT
  2023-05-11  5:21 [PATCH v2 0/5] audit: refactors and fixes for potential deadlocks Eiichi Tsukata
  2023-05-11  5:21 ` [PATCH v2 1/5] audit: refactor queue full checks Eiichi Tsukata
  2023-05-11  5:21 ` [PATCH v2 2/5] audit: account backlog waiting time in audit_receive() Eiichi Tsukata
@ 2023-05-11  5:21 ` Eiichi Tsukata
  2023-05-19 20:54   ` Paul Moore
  2023-05-11  5:21 ` [PATCH v2 4/5] audit: check if audit_queue is full after prepare_to_wait_exclusive() Eiichi Tsukata
  2023-05-11  5:21 ` [PATCH v2 5/5] audit: do not use exclusive wait in audit_receive() Eiichi Tsukata
  4 siblings, 1 reply; 19+ messages in thread
From: Eiichi Tsukata @ 2023-05-11  5:21 UTC (permalink / raw)
  To: paul, eparis, linux-kernel, audit; +Cc: Eiichi Tsukata

As there is no need to use DECLARE_WAITQUEUE, use simpler DEFINE_WAIT.

Signed-off-by: Eiichi Tsukata <eiichi.tsukata@nutanix.com>
---
 kernel/audit.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/kernel/audit.c b/kernel/audit.c
index 89e119ccda76..bcbb0ba33c84 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -640,13 +640,13 @@ static void kauditd_retry_skb(struct sk_buff *skb, __always_unused int error)
 static long wait_for_kauditd(long stime)
 {
 	long rtime;
-	DECLARE_WAITQUEUE(wait, current);
+	DEFINE_WAIT(wait);
 
-	add_wait_queue_exclusive(&audit_backlog_wait, &wait);
-	set_current_state(TASK_UNINTERRUPTIBLE);
+	prepare_to_wait_exclusive(&audit_backlog_wait, &wait,
+				  TASK_UNINTERRUPTIBLE);
 	rtime = schedule_timeout(stime);
 	atomic_add(stime - rtime, &audit_backlog_wait_time_actual);
-	remove_wait_queue(&audit_backlog_wait, &wait);
+	finish_wait(&audit_backlog_wait, &wait);
 
 	return rtime;
 }
-- 
2.40.0


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v2 4/5] audit: check if audit_queue is full after prepare_to_wait_exclusive()
  2023-05-11  5:21 [PATCH v2 0/5] audit: refactors and fixes for potential deadlocks Eiichi Tsukata
                   ` (2 preceding siblings ...)
  2023-05-11  5:21 ` [PATCH v2 3/5] audit: convert DECLARE_WAITQUEUE to DEFINE_WAIT Eiichi Tsukata
@ 2023-05-11  5:21 ` Eiichi Tsukata
  2023-05-19 20:54   ` Paul Moore
  2023-05-11  5:21 ` [PATCH v2 5/5] audit: do not use exclusive wait in audit_receive() Eiichi Tsukata
  4 siblings, 1 reply; 19+ messages in thread
From: Eiichi Tsukata @ 2023-05-11  5:21 UTC (permalink / raw)
  To: paul, eparis, linux-kernel, audit; +Cc: Eiichi Tsukata

Commit 7ffb8e317bae ("audit: we don't need to
__set_current_state(TASK_RUNNING)") accidentally moved queue full check
before add_wait_queue_exclusive() which introduced the following race:

    CPU1                           CPU2
  ========                       ========
  (in audit_log_start())         (in kauditd_thread())

  @audit_queue is full
                                 wake_up(&audit_backlog_wait)
                                 wait_event_freezable()
  add_wait_queue_exclusive()
  ...
  schedule_timeout()

Once this happens, both audit_log_start() and kauditd_thread() can cause
deadlock for up to backlog_wait_time waiting for each other. To prevent
the race, this patch adds @audit_queue full check after
prepare_to_wait_exclusive() and call schedule_timeout() only if the
queue is full.

Fixes: 7ffb8e317bae ("audit: we don't need to __set_current_state(TASK_RUNNING)")
Signed-off-by: Eiichi Tsukata <eiichi.tsukata@nutanix.com>
---
 kernel/audit.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/kernel/audit.c b/kernel/audit.c
index bcbb0ba33c84..6b0cc0459984 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -644,8 +644,16 @@ static long wait_for_kauditd(long stime)
 
 	prepare_to_wait_exclusive(&audit_backlog_wait, &wait,
 				  TASK_UNINTERRUPTIBLE);
-	rtime = schedule_timeout(stime);
-	atomic_add(stime - rtime, &audit_backlog_wait_time_actual);
+
+	/* need to check if the queue is full again because kauditd might have
+	 * flushed the queue and went to sleep after prepare_to_wait_exclusive()
+	 */
+	if (audit_queue_full(&audit_queue)) {
+		rtime = schedule_timeout(stime);
+		atomic_add(stime - rtime, &audit_backlog_wait_time_actual);
+	} else {
+		rtime = 0;
+	}
 	finish_wait(&audit_backlog_wait, &wait);
 
 	return rtime;
-- 
2.40.0


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v2 5/5] audit: do not use exclusive wait in audit_receive()
  2023-05-11  5:21 [PATCH v2 0/5] audit: refactors and fixes for potential deadlocks Eiichi Tsukata
                   ` (3 preceding siblings ...)
  2023-05-11  5:21 ` [PATCH v2 4/5] audit: check if audit_queue is full after prepare_to_wait_exclusive() Eiichi Tsukata
@ 2023-05-11  5:21 ` Eiichi Tsukata
  2023-05-19 20:54   ` Paul Moore
  4 siblings, 1 reply; 19+ messages in thread
From: Eiichi Tsukata @ 2023-05-11  5:21 UTC (permalink / raw)
  To: paul, eparis, linux-kernel, audit; +Cc: Eiichi Tsukata

kauditd thread issues wake_up() before it goes to sleep. The wake_up()
call wakes up only one process as waiter side uses exclusive wait.
This can be problematic when there are multiple processes (one is in
audit_receive() and others are in audit_log_start()) waiting on
audit_backlog_wait queue.

For example, if there are two processes waiting:

  Process (A): in audit_receive()
  Process (B): in audit_log_start()

And (A) is at the head of the wait queue. Then kauditd's wake_up() only
wakes up (A) leaving (B) as it is even if @audit_queue is drained. As a
result, (B) can be blocked for up to backlog_wait_time.

To prevent the issue, use non-exclusive wait in audit_receive() so that
kauditd can wake up all waiters in audit_receive().

Fixes: 8f110f530635 ("audit: ensure userspace is penalized the same as the kernel when under pressure")
Signed-off-by: Eiichi Tsukata <eiichi.tsukata@nutanix.com>
---
 kernel/audit.c | 17 +++++++++++------
 1 file changed, 11 insertions(+), 6 deletions(-)

diff --git a/kernel/audit.c b/kernel/audit.c
index 6b0cc0459984..ef48210343ae 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -631,22 +631,27 @@ static void kauditd_retry_skb(struct sk_buff *skb, __always_unused int error)
 /**
  * wait_for_kauditd - Wait for kauditd to drain the queue
  * @stime: time to sleep
+ * @exclusive: use exclusive wait if true
  *
  * Description:
  * Waits for kauditd to drain the queue then adds duration of sleep time to
  * audit_backlog_wait_time_actual as cumulative sum.
  * Returns remaining time to sleep.
  */
-static long wait_for_kauditd(long stime)
+static long wait_for_kauditd(long stime, bool exclusive)
 {
 	long rtime;
 	DEFINE_WAIT(wait);
 
-	prepare_to_wait_exclusive(&audit_backlog_wait, &wait,
-				  TASK_UNINTERRUPTIBLE);
+	if (exclusive)
+		prepare_to_wait_exclusive(&audit_backlog_wait, &wait,
+					  TASK_UNINTERRUPTIBLE);
+	else
+		prepare_to_wait(&audit_backlog_wait, &wait,
+				TASK_UNINTERRUPTIBLE);
 
 	/* need to check if the queue is full again because kauditd might have
-	 * flushed the queue and went to sleep after prepare_to_wait_exclusive()
+	 * flushed the queue and went to sleep after prepare_to_wait_*()
 	 */
 	if (audit_queue_full(&audit_queue)) {
 		rtime = schedule_timeout(stime);
@@ -1601,7 +1606,7 @@ static void audit_receive(struct sk_buff  *skb)
 	if (audit_queue_full(&audit_queue)) {
 		/* wake kauditd to try and flush the queue */
 		wake_up_interruptible(&kauditd_wait);
-		wait_for_kauditd(audit_backlog_wait_time);
+		wait_for_kauditd(audit_backlog_wait_time, false);
 	}
 }
 
@@ -1900,7 +1905,7 @@ struct audit_buffer *audit_log_start(struct audit_context *ctx, gfp_t gfp_mask,
 			/* sleep if we are allowed and we haven't exhausted our
 			 * backlog wait limit */
 			if (gfpflags_allow_blocking(gfp_mask) && stime > 0) {
-				stime = wait_for_kauditd(stime);
+				stime = wait_for_kauditd(stime, true);
 			} else {
 				if (audit_rate_check() && printk_ratelimit())
 					pr_warn("audit_backlog=%d > audit_backlog_limit=%d\n",
-- 
2.40.0


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH v2 1/5] audit: refactor queue full checks
  2023-05-11  5:21 ` [PATCH v2 1/5] audit: refactor queue full checks Eiichi Tsukata
@ 2023-05-19 20:54   ` Paul Moore
  2023-05-22  4:19     ` Eiichi Tsukata
  0 siblings, 1 reply; 19+ messages in thread
From: Paul Moore @ 2023-05-19 20:54 UTC (permalink / raw)
  To: Eiichi Tsukata, eparis, linux-kernel, audit

On May 11, 2023 Eiichi Tsukata <eiichi.tsukata@nutanix.com> wrote:
> 
> Currently audit queue full checks are done in multiple places.
> Consolidate them into one audit_queue_full().
> 
> Signed-off-by: Eiichi Tsukata <eiichi.tsukata@nutanix.com>
> ---
>  kernel/audit.c | 21 +++++++++++----------
>  1 file changed, 11 insertions(+), 10 deletions(-)
> 
> diff --git a/kernel/audit.c b/kernel/audit.c
> index 9bc0b0301198..c15694e1a76b 100644
> --- a/kernel/audit.c
> +++ b/kernel/audit.c
> @@ -341,6 +341,12 @@ static inline int audit_rate_check(void)
>  	return retval;
>  }
>  
> +static inline int audit_queue_full(const struct sk_buff_head *queue)
> +{
> +	return audit_backlog_limit &&
> +	       (skb_queue_len(queue) > audit_backlog_limit);
> +}

Regardless of the other patches in this series, this seems like a
good candidate to merge, but could you make it return a 'bool'
instead of an 'int'?

--
paul-moore.com

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v2 2/5] audit: account backlog waiting time in  audit_receive()
  2023-05-11  5:21 ` [PATCH v2 2/5] audit: account backlog waiting time in audit_receive() Eiichi Tsukata
@ 2023-05-19 20:54   ` Paul Moore
  2023-05-22  4:22     ` Eiichi Tsukata
  0 siblings, 1 reply; 19+ messages in thread
From: Paul Moore @ 2023-05-19 20:54 UTC (permalink / raw)
  To: Eiichi Tsukata, eparis, linux-kernel, audit

On May 11, 2023 Eiichi Tsukata <eiichi.tsukata@nutanix.com> wrote:
> 
> Currently backlog waiting time in audit_receive() is not accounted as
> audit_backlog_wait_time_actual. Accounts it as well.
> 
> Signed-off-by: Eiichi Tsukata <eiichi.tsukata@nutanix.com>
> ---
>  kernel/audit.c | 44 ++++++++++++++++++++++++++------------------
>  1 file changed, 26 insertions(+), 18 deletions(-)

The audit_receive() wait is different from that in audit_log_start()
as processes calling into audit_receive() are performing an explicit
audit operation whereas those processes calling audit_log_start() are
likely doing something else, e.g. opening a file, that happens to
result in an audit record being generated.  The fact that the
audit_receive() accounting logic, as well as the timeout calculation
used, is different from audit_log_start() is intentional.

--
paul-moore.com

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v2 3/5] audit: convert DECLARE_WAITQUEUE to DEFINE_WAIT
  2023-05-11  5:21 ` [PATCH v2 3/5] audit: convert DECLARE_WAITQUEUE to DEFINE_WAIT Eiichi Tsukata
@ 2023-05-19 20:54   ` Paul Moore
  0 siblings, 0 replies; 19+ messages in thread
From: Paul Moore @ 2023-05-19 20:54 UTC (permalink / raw)
  To: Eiichi Tsukata, eparis, linux-kernel, audit

On May 11, 2023 Eiichi Tsukata <eiichi.tsukata@nutanix.com> wrote:
> 
> As there is no need to use DECLARE_WAITQUEUE, use simpler DEFINE_WAIT.
> 
> Signed-off-by: Eiichi Tsukata <eiichi.tsukata@nutanix.com>
> ---
>  kernel/audit.c | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)

This change looks reasonable, but it needs to be reworked to take into
consideration the comments in patch 2/5.

--
paul-moore.com

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v2 4/5] audit: check if audit_queue is full after  prepare_to_wait_exclusive()
  2023-05-11  5:21 ` [PATCH v2 4/5] audit: check if audit_queue is full after prepare_to_wait_exclusive() Eiichi Tsukata
@ 2023-05-19 20:54   ` Paul Moore
  2023-05-22  4:28     ` Eiichi Tsukata
  0 siblings, 1 reply; 19+ messages in thread
From: Paul Moore @ 2023-05-19 20:54 UTC (permalink / raw)
  To: Eiichi Tsukata, eparis, linux-kernel, audit

On May 11, 2023 Eiichi Tsukata <eiichi.tsukata@nutanix.com> wrote:
> 
> Commit 7ffb8e317bae ("audit: we don't need to
> __set_current_state(TASK_RUNNING)") accidentally moved queue full check
> before add_wait_queue_exclusive() which introduced the following race:
> 
>     CPU1                           CPU2
>   ========                       ========
>   (in audit_log_start())         (in kauditd_thread())
> 
>   @audit_queue is full
>                                  wake_up(&audit_backlog_wait)
>                                  wait_event_freezable()
>   add_wait_queue_exclusive()
>   ...
>   schedule_timeout()
> 
> Once this happens, both audit_log_start() and kauditd_thread() can cause
> deadlock for up to backlog_wait_time waiting for each other. To prevent
> the race, this patch adds @audit_queue full check after
> prepare_to_wait_exclusive() and call schedule_timeout() only if the
> queue is full.
> 
> Fixes: 7ffb8e317bae ("audit: we don't need to __set_current_state(TASK_RUNNING)")
> Signed-off-by: Eiichi Tsukata <eiichi.tsukata@nutanix.com>
> ---
>  kernel/audit.c | 12 ++++++++++--
>  1 file changed, 10 insertions(+), 2 deletions(-)

I discussed my concerns with this patch in the last patchset, and I
believe they still apply here.

--
paul-moore.com

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v2 5/5] audit: do not use exclusive wait in audit_receive()
  2023-05-11  5:21 ` [PATCH v2 5/5] audit: do not use exclusive wait in audit_receive() Eiichi Tsukata
@ 2023-05-19 20:54   ` Paul Moore
  2023-05-22  4:44     ` Eiichi Tsukata
  0 siblings, 1 reply; 19+ messages in thread
From: Paul Moore @ 2023-05-19 20:54 UTC (permalink / raw)
  To: Eiichi Tsukata, eparis, linux-kernel, audit

On May 11, 2023 Eiichi Tsukata <eiichi.tsukata@nutanix.com> wrote:
> 
> kauditd thread issues wake_up() before it goes to sleep. The wake_up()
> call wakes up only one process as waiter side uses exclusive wait.
> This can be problematic when there are multiple processes (one is in
> audit_receive() and others are in audit_log_start()) waiting on
> audit_backlog_wait queue.
> 
> For example, if there are two processes waiting:
> 
>   Process (A): in audit_receive()
>   Process (B): in audit_log_start()
> 
> And (A) is at the head of the wait queue. Then kauditd's wake_up() only
> wakes up (A) leaving (B) as it is even if @audit_queue is drained. As a
> result, (B) can be blocked for up to backlog_wait_time.
> 
> To prevent the issue, use non-exclusive wait in audit_receive() so that
> kauditd can wake up all waiters in audit_receive().
> 
> Fixes: 8f110f530635 ("audit: ensure userspace is penalized the same as the kernel when under pressure")
> Signed-off-by: Eiichi Tsukata <eiichi.tsukata@nutanix.com>
> ---
>  kernel/audit.c | 17 +++++++++++------
>  1 file changed, 11 insertions(+), 6 deletions(-)

This was also discussed in the last patchset.

--
paul-moore.com

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v2 1/5] audit: refactor queue full checks
  2023-05-19 20:54   ` Paul Moore
@ 2023-05-22  4:19     ` Eiichi Tsukata
  0 siblings, 0 replies; 19+ messages in thread
From: Eiichi Tsukata @ 2023-05-22  4:19 UTC (permalink / raw)
  To: Paul Moore; +Cc: eparis, linux-kernel, audit



> On May 20, 2023, at 5:54, Paul Moore <paul@paul-moore.com> wrote:
> 
> On May 11, 2023 Eiichi Tsukata <eiichi.tsukata@nutanix.com> wrote:
>> 
>> Currently audit queue full checks are done in multiple places.
>> Consolidate them into one audit_queue_full().
>> 
>> Signed-off-by: Eiichi Tsukata <eiichi.tsukata@nutanix.com>
>> ---
>> kernel/audit.c | 21 +++++++++++----------
>> 1 file changed, 11 insertions(+), 10 deletions(-)
>> 
>> diff --git a/kernel/audit.c b/kernel/audit.c
>> index 9bc0b0301198..c15694e1a76b 100644
>> --- a/kernel/audit.c
>> +++ b/kernel/audit.c
>> @@ -341,6 +341,12 @@ static inline int audit_rate_check(void)
>> return retval;
>> }
>> 
>> +static inline int audit_queue_full(const struct sk_buff_head *queue)
>> +{
>> + return audit_backlog_limit &&
>> +        (skb_queue_len(queue) > audit_backlog_limit);
>> +}
> 
> Regardless of the other patches in this series, this seems like a
> good candidate to merge, but could you make it return a 'bool'
> instead of an 'int'?
> 

Sure, will fix it in v3.

Eiichi


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v2 2/5] audit: account backlog waiting time in audit_receive()
  2023-05-19 20:54   ` Paul Moore
@ 2023-05-22  4:22     ` Eiichi Tsukata
  2023-05-23 20:55       ` Paul Moore
  0 siblings, 1 reply; 19+ messages in thread
From: Eiichi Tsukata @ 2023-05-22  4:22 UTC (permalink / raw)
  To: Paul Moore; +Cc: eparis, linux-kernel, audit



> On May 20, 2023, at 5:54, Paul Moore <paul@paul-moore.com> wrote:
> 
> On May 11, 2023 Eiichi Tsukata <eiichi.tsukata@nutanix.com> wrote:
>> 
>> Currently backlog waiting time in audit_receive() is not accounted as
>> audit_backlog_wait_time_actual. Accounts it as well.
>> 
>> Signed-off-by: Eiichi Tsukata <eiichi.tsukata@nutanix.com>
>> ---
>> kernel/audit.c | 44 ++++++++++++++++++++++++++------------------
>> 1 file changed, 26 insertions(+), 18 deletions(-)
> 
> The audit_receive() wait is different from that in audit_log_start()
> as processes calling into audit_receive() are performing an explicit
> audit operation whereas those processes calling audit_log_start() are
> likely doing something else, e.g. opening a file, that happens to
> result in an audit record being generated.  The fact that the
> audit_receive() accounting logic, as well as the timeout calculation
> used, is different from audit_log_start() is intentional.
> 

The intention still sounds a bit not clear to me as both cases wait using
the same parameter “backlog_wait_time” and use the same wait
queue.

IMHO, it will be better to have some comprehensive code comments there.

Eiichi


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v2 4/5] audit: check if audit_queue is full after prepare_to_wait_exclusive()
  2023-05-19 20:54   ` Paul Moore
@ 2023-05-22  4:28     ` Eiichi Tsukata
  2023-05-23 21:02       ` Paul Moore
  0 siblings, 1 reply; 19+ messages in thread
From: Eiichi Tsukata @ 2023-05-22  4:28 UTC (permalink / raw)
  To: Paul Moore; +Cc: eparis, linux-kernel, audit



> On May 20, 2023, at 5:54, Paul Moore <paul@paul-moore.com> wrote:
> 
> On May 11, 2023 Eiichi Tsukata <eiichi.tsukata@nutanix.com> wrote:
>> 
>> Commit 7ffb8e317bae ("audit: we don't need to
>> __set_current_state(TASK_RUNNING)") accidentally moved queue full check
>> before add_wait_queue_exclusive() which introduced the following race:
>> 
>>    CPU1                           CPU2
>>  ========                       ========
>>  (in audit_log_start())         (in kauditd_thread())
>> 
>>  @audit_queue is full
>>                                 wake_up(&audit_backlog_wait)
>>                                 wait_event_freezable()
>>  add_wait_queue_exclusive()
>>  ...
>>  schedule_timeout()
>> 
>> Once this happens, both audit_log_start() and kauditd_thread() can cause
>> deadlock for up to backlog_wait_time waiting for each other. To prevent
>> the race, this patch adds @audit_queue full check after
>> prepare_to_wait_exclusive() and call schedule_timeout() only if the
>> queue is full.
>> 
>> Fixes: 7ffb8e317bae ("audit: we don't need to __set_current_state(TASK_RUNNING)")
>> Signed-off-by: Eiichi Tsukata <eiichi.tsukata@nutanix.com>
>> ---
>> kernel/audit.c | 12 ++++++++++--
>> 1 file changed, 10 insertions(+), 2 deletions(-)
> 
> I discussed my concerns with this patch in the last patchset, and I
> believe they still apply here.
> 

Please refer to the implementation of ___wait_event().
It checks the condition *after* prepare_to_wait_event().

Another similar example in the kernel code is unix_wait_for_peer().
It checks unix_recvq_full() after prepare_to_wait_exclusive().

I’m assuming this is a logical bug needs to be fixed.

Eiichi



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v2 5/5] audit: do not use exclusive wait in audit_receive()
  2023-05-19 20:54   ` Paul Moore
@ 2023-05-22  4:44     ` Eiichi Tsukata
  2023-05-23  3:58       ` Eiichi Tsukata
  0 siblings, 1 reply; 19+ messages in thread
From: Eiichi Tsukata @ 2023-05-22  4:44 UTC (permalink / raw)
  To: Paul Moore; +Cc: eparis, linux-kernel, audit



> On May 20, 2023, at 5:54, Paul Moore <paul@paul-moore.com> wrote:
> 
> On May 11, 2023 Eiichi Tsukata <eiichi.tsukata@nutanix.com> wrote:
>> 
>> kauditd thread issues wake_up() before it goes to sleep. The wake_up()
>> call wakes up only one process as waiter side uses exclusive wait.
>> This can be problematic when there are multiple processes (one is in
>> audit_receive() and others are in audit_log_start()) waiting on
>> audit_backlog_wait queue.
>> 
>> For example, if there are two processes waiting:
>> 
>>  Process (A): in audit_receive()
>>  Process (B): in audit_log_start()
>> 
>> And (A) is at the head of the wait queue. Then kauditd's wake_up() only
>> wakes up (A) leaving (B) as it is even if @audit_queue is drained. As a
>> result, (B) can be blocked for up to backlog_wait_time.
>> 
>> To prevent the issue, use non-exclusive wait in audit_receive() so that
>> kauditd can wake up all waiters in audit_receive().
>> 
>> Fixes: 8f110f530635 ("audit: ensure userspace is penalized the same as the kernel when under pressure")
>> Signed-off-by: Eiichi Tsukata <eiichi.tsukata@nutanix.com>
>> ---
>> kernel/audit.c | 17 +++++++++++------
>> 1 file changed, 11 insertions(+), 6 deletions(-)
> 
> This was also discussed in the last patchset.
> 
> 

This bug is much easily reproducible on real environments and can cause problematic
user space failure like SSH connection timeout.
Let’s not keep the bug unfixed.
(Of course we’ve already carefully tuned audit related params and user space auditd config so that our product won’t hit backlog full.)

Other ideas in my minds are:

(1) Use different wait queues in audit_receive() and audit_log_start() to guarantee kautid 
  wake_up() tries to wake up a waiter in audit_log_start().

(2) Periodically (say in every 1 sec) check if @audit_queue is full in audit_receive() to prevent 
  audit_receive() from unnecessarily waiting for so long time. 

BTW, the default backlog_wait_time is 60 * HZ which seems pretty large.
I’d appreciate if you could tell me the reason behind that value.

Eiichi

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v2 5/5] audit: do not use exclusive wait in audit_receive()
  2023-05-22  4:44     ` Eiichi Tsukata
@ 2023-05-23  3:58       ` Eiichi Tsukata
  2023-05-23 21:07         ` Paul Moore
  0 siblings, 1 reply; 19+ messages in thread
From: Eiichi Tsukata @ 2023-05-23  3:58 UTC (permalink / raw)
  To: Paul Moore; +Cc: eparis, linux-kernel, audit



> On May 22, 2023, at 13:44, Eiichi Tsukata <eiichi.tsukata@nutanix.com> wrote:
> 
> 
> 
>> On May 20, 2023, at 5:54, Paul Moore <paul@paul-moore.com> wrote:
>> 
>> On May 11, 2023 Eiichi Tsukata <eiichi.tsukata@nutanix.com> wrote:
>>> 
>>> kauditd thread issues wake_up() before it goes to sleep. The wake_up()
>>> call wakes up only one process as waiter side uses exclusive wait.
>>> This can be problematic when there are multiple processes (one is in
>>> audit_receive() and others are in audit_log_start()) waiting on
>>> audit_backlog_wait queue.
>>> 
>>> For example, if there are two processes waiting:
>>> 
>>> Process (A): in audit_receive()
>>> Process (B): in audit_log_start()
>>> 
>>> And (A) is at the head of the wait queue. Then kauditd's wake_up() only
>>> wakes up (A) leaving (B) as it is even if @audit_queue is drained. As a
>>> result, (B) can be blocked for up to backlog_wait_time.
>>> 
>>> To prevent the issue, use non-exclusive wait in audit_receive() so that
>>> kauditd can wake up all waiters in audit_receive().
>>> 
>>> Fixes: 8f110f530635 ("audit: ensure userspace is penalized the same as the kernel when under pressure")
>>> Signed-off-by: Eiichi Tsukata <eiichi.tsukata@nutanix.com>
>>> ---
>>> kernel/audit.c | 17 +++++++++++------
>>> 1 file changed, 11 insertions(+), 6 deletions(-)
>> 
>> This was also discussed in the last patchset.
>> 
>> 
> 
> This bug is much easily reproducible on real environments and can cause problematic
> user space failure like SSH connection timeout.
> Let’s not keep the bug unfixed.
> (Of course we’ve already carefully tuned audit related params and user space auditd config so that our product won’t hit backlog full.)
> 
> Other ideas in my minds are:
> 
> (1) Use different wait queues in audit_receive() and audit_log_start() to guarantee kautid 
>  wake_up() tries to wake up a waiter in audit_log_start().
> 
> (2) Periodically (say in every 1 sec) check if @audit_queue is full in audit_receive() to prevent 
>  audit_receive() from unnecessarily waiting for so long time. 
> 
> BTW, the default backlog_wait_time is 60 * HZ which seems pretty large.
> I’d appreciate if you could tell me the reason behind that value.
> 
> Eiichi

I came up with a better idea:

(3) Move wait_for_kauditd() in audit_receive() *before* audit_ctl_lock() 
 and restrict penalty only for msg_type which can queue a new audit record. (AUDIT_USER, AUDIT_TRIM, AUDIT_MAKE_EQUIV, ..)

Originally, it’s not reasonable to give penalty for innocent operation
like AUDIT_GET.
This approach makes successive audit_log_end() wake up kauditd.
Also it prevents audit_log_end() from queueing skb ignoring backlog_limit.

Eiichi 


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v2 2/5] audit: account backlog waiting time in audit_receive()
  2023-05-22  4:22     ` Eiichi Tsukata
@ 2023-05-23 20:55       ` Paul Moore
  0 siblings, 0 replies; 19+ messages in thread
From: Paul Moore @ 2023-05-23 20:55 UTC (permalink / raw)
  To: Eiichi Tsukata; +Cc: eparis, linux-kernel, audit

On Mon, May 22, 2023 at 12:22 AM Eiichi Tsukata
<eiichi.tsukata@nutanix.com> wrote:
> > On May 20, 2023, at 5:54, Paul Moore <paul@paul-moore.com> wrote:
> > On May 11, 2023 Eiichi Tsukata <eiichi.tsukata@nutanix.com> wrote:
> >>
> >> Currently backlog waiting time in audit_receive() is not accounted as
> >> audit_backlog_wait_time_actual. Accounts it as well.
> >>
> >> Signed-off-by: Eiichi Tsukata <eiichi.tsukata@nutanix.com>
> >> ---
> >> kernel/audit.c | 44 ++++++++++++++++++++++++++------------------
> >> 1 file changed, 26 insertions(+), 18 deletions(-)
> >
> > The audit_receive() wait is different from that in audit_log_start()
> > as processes calling into audit_receive() are performing an explicit
> > audit operation whereas those processes calling audit_log_start() are
> > likely doing something else, e.g. opening a file, that happens to
> > result in an audit record being generated.  The fact that the
> > audit_receive() accounting logic, as well as the timeout calculation
> > used, is different from audit_log_start() is intentional.
> >
>
> The intention still sounds a bit not clear to me as both cases wait using
> the same parameter “backlog_wait_time” and use the same wait
> queue.
>
> IMHO, it will be better to have some comprehensive code comments there.

A fair point.  I'll add that to the todo list.

-- 
paul-moore.com

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v2 4/5] audit: check if audit_queue is full after prepare_to_wait_exclusive()
  2023-05-22  4:28     ` Eiichi Tsukata
@ 2023-05-23 21:02       ` Paul Moore
  0 siblings, 0 replies; 19+ messages in thread
From: Paul Moore @ 2023-05-23 21:02 UTC (permalink / raw)
  To: Eiichi Tsukata; +Cc: eparis, linux-kernel, audit

On Mon, May 22, 2023 at 12:28 AM Eiichi Tsukata
<eiichi.tsukata@nutanix.com> wrote:
> > On May 20, 2023, at 5:54, Paul Moore <paul@paul-moore.com> wrote:
> > On May 11, 2023 Eiichi Tsukata <eiichi.tsukata@nutanix.com> wrote:
> >>
> >> Commit 7ffb8e317bae ("audit: we don't need to
> >> __set_current_state(TASK_RUNNING)") accidentally moved queue full check
> >> before add_wait_queue_exclusive() which introduced the following race:
> >>
> >>    CPU1                           CPU2
> >>  ========                       ========
> >>  (in audit_log_start())         (in kauditd_thread())
> >>
> >>  @audit_queue is full
> >>                                 wake_up(&audit_backlog_wait)
> >>                                 wait_event_freezable()
> >>  add_wait_queue_exclusive()
> >>  ...
> >>  schedule_timeout()
> >>
> >> Once this happens, both audit_log_start() and kauditd_thread() can cause
> >> deadlock for up to backlog_wait_time waiting for each other. To prevent
> >> the race, this patch adds @audit_queue full check after
> >> prepare_to_wait_exclusive() and call schedule_timeout() only if the
> >> queue is full.
> >>
> >> Fixes: 7ffb8e317bae ("audit: we don't need to __set_current_state(TASK_RUNNING)")
> >> Signed-off-by: Eiichi Tsukata <eiichi.tsukata@nutanix.com>
> >> ---
> >> kernel/audit.c | 12 ++++++++++--
> >> 1 file changed, 10 insertions(+), 2 deletions(-)
> >
> > I discussed my concerns with this patch in the last patchset, and I
> > believe they still apply here.
> >
>
> Please refer to the implementation of ___wait_event().
> It checks the condition *after* prepare_to_wait_event().
>
> Another similar example in the kernel code is unix_wait_for_peer().
> It checks unix_recvq_full() after prepare_to_wait_exclusive().
>
> I’m assuming this is a logical bug needs to be fixed.

I disagree, see my previous comments.  The fixes you've presented do
not eliminate the possibility of rescheduling which could result in
some of the issues you've described.  The proper fix for systems which
are sensitive to long scheduling delays such as this is to adjust your
audit runtime configuration so that audit does not block userspace.
Suggestions include removing the backlog limit and/or shortening the
wait time.

-- 
paul-moore.com

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v2 5/5] audit: do not use exclusive wait in audit_receive()
  2023-05-23  3:58       ` Eiichi Tsukata
@ 2023-05-23 21:07         ` Paul Moore
  0 siblings, 0 replies; 19+ messages in thread
From: Paul Moore @ 2023-05-23 21:07 UTC (permalink / raw)
  To: Eiichi Tsukata; +Cc: eparis, linux-kernel, audit

On Mon, May 22, 2023 at 11:58 PM Eiichi Tsukata
<eiichi.tsukata@nutanix.com> wrote:
> > On May 22, 2023, at 13:44, Eiichi Tsukata <eiichi.tsukata@nutanix.com> wrote:
> >> On May 20, 2023, at 5:54, Paul Moore <paul@paul-moore.com> wrote:
> >> On May 11, 2023 Eiichi Tsukata <eiichi.tsukata@nutanix.com> wrote:
> >>>
> >>> kauditd thread issues wake_up() before it goes to sleep. The wake_up()
> >>> call wakes up only one process as waiter side uses exclusive wait.
> >>> This can be problematic when there are multiple processes (one is in
> >>> audit_receive() and others are in audit_log_start()) waiting on
> >>> audit_backlog_wait queue.
> >>>
> >>> For example, if there are two processes waiting:
> >>>
> >>> Process (A): in audit_receive()
> >>> Process (B): in audit_log_start()
> >>>
> >>> And (A) is at the head of the wait queue. Then kauditd's wake_up() only
> >>> wakes up (A) leaving (B) as it is even if @audit_queue is drained. As a
> >>> result, (B) can be blocked for up to backlog_wait_time.
> >>>
> >>> To prevent the issue, use non-exclusive wait in audit_receive() so that
> >>> kauditd can wake up all waiters in audit_receive().
> >>>
> >>> Fixes: 8f110f530635 ("audit: ensure userspace is penalized the same as the kernel when under pressure")
> >>> Signed-off-by: Eiichi Tsukata <eiichi.tsukata@nutanix.com>
> >>> ---
> >>> kernel/audit.c | 17 +++++++++++------
> >>> 1 file changed, 11 insertions(+), 6 deletions(-)
> >>
> >> This was also discussed in the last patchset.
> >
> > This bug is much easily reproducible on real environments and can cause problematic
> > user space failure like SSH connection timeout.
> > Let’s not keep the bug unfixed.
> > (Of course we’ve already carefully tuned audit related params and user space auditd config so that our product won’t hit backlog full.)

Good.  Resolving your issues through audit runtime configuration is
the proper solution to this.

> > BTW, the default backlog_wait_time is 60 * HZ which seems pretty large.
> > I’d appreciate if you could tell me the reason behind that value.

I do not recall the original logic behind that value.  It is likely
that the original value predated my maintenance of the audit
subsystem.

-- 
paul-moore.com

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2023-05-23 21:07 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-05-11  5:21 [PATCH v2 0/5] audit: refactors and fixes for potential deadlocks Eiichi Tsukata
2023-05-11  5:21 ` [PATCH v2 1/5] audit: refactor queue full checks Eiichi Tsukata
2023-05-19 20:54   ` Paul Moore
2023-05-22  4:19     ` Eiichi Tsukata
2023-05-11  5:21 ` [PATCH v2 2/5] audit: account backlog waiting time in audit_receive() Eiichi Tsukata
2023-05-19 20:54   ` Paul Moore
2023-05-22  4:22     ` Eiichi Tsukata
2023-05-23 20:55       ` Paul Moore
2023-05-11  5:21 ` [PATCH v2 3/5] audit: convert DECLARE_WAITQUEUE to DEFINE_WAIT Eiichi Tsukata
2023-05-19 20:54   ` Paul Moore
2023-05-11  5:21 ` [PATCH v2 4/5] audit: check if audit_queue is full after prepare_to_wait_exclusive() Eiichi Tsukata
2023-05-19 20:54   ` Paul Moore
2023-05-22  4:28     ` Eiichi Tsukata
2023-05-23 21:02       ` Paul Moore
2023-05-11  5:21 ` [PATCH v2 5/5] audit: do not use exclusive wait in audit_receive() Eiichi Tsukata
2023-05-19 20:54   ` Paul Moore
2023-05-22  4:44     ` Eiichi Tsukata
2023-05-23  3:58       ` Eiichi Tsukata
2023-05-23 21:07         ` Paul Moore

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).