* Flush the hold queue fall into an infinite loop.
       [not found] <96f4f1cb-0e7d-6682-ce33-f7f1314cba83@huawei.com>
@ 2022-01-13 11:56 ` cuigaosheng
  2022-01-13 12:16   ` some logs about the issue // " cuigaosheng
  2022-01-13 15:22     ` Paul Moore
  0 siblings, 2 replies; 8+ messages in thread
From: cuigaosheng @ 2022-01-13 11:56 UTC (permalink / raw)
  To: cuigaosheng1, Paul Moore
  Cc: wangweiyang, linux-audit, linux-security-module, Xiujianfeng,
	linux-kernel



When we add "audit=1" to the kernel cmdline, kauditd can take up 100%
of a CPU. To reproduce:

    configurations:
    	auditctl -b 64
    	auditctl --backlog_wait_time 60000
    	auditctl -r 0
    	auditctl -w /root/aaa  -p wrx
    shell scripts:
    	#!/bin/bash
    	i=0
    	while [ $i -le 66 ]
    	do
    	    touch /root/aaa
    	    let i++
    	done
    mandatory conditions:

        add "audit=1" to the cmdline, and stop auditd with
        kill -19 <pid of /sbin/auditd> (signal 19 is SIGSTOP on x86).

  As long as audit_hold_queue is non-empty, flushing the hold queue falls into
  an infinite loop.

> 713 static int kauditd_send_queue(struct sock *sk, u32 portid,
>  714                               struct sk_buff_head *queue,
>  715                               unsigned int retry_limit,
>  716                               void (*skb_hook)(struct sk_buff *skb),
>  717                               void (*err_hook)(struct sk_buff *skb))
>  718 {
>  719         int rc = 0;
>  720         struct sk_buff *skb;
>  721         unsigned int failed = 0;
>  722
>  723         /* NOTE: kauditd_thread takes care of all our locking, we just use
>  724          *       the netlink info passed to us (e.g. sk and portid) */
>  725
>  726         while ((skb = skb_dequeue(queue))) {
>  727                 /* call the skb_hook for each skb we touch */
>  728                 if (skb_hook)
>  729                         (*skb_hook)(skb);
>  730
>  731                 /* can we send to anyone via unicast? */
>  732                 if (!sk) {
>  733                         if (err_hook)
>  734                                 (*err_hook)(skb);
>  735                         continue;
>  736                 }
>  737
>  738 retry:
>  739                 /* grab an extra skb reference in case of error */
>  740                 skb_get(skb);
>  741                 rc = netlink_unicast(sk, skb, portid, 0);
>  742                 if (rc < 0) {
>  743                         /* send failed - try a few times unless fatal error */
>  744                         if (++failed >= retry_limit ||
>  745                             rc == -ECONNREFUSED || rc == -EPERM) {
>  746                                 sk = NULL;
>  747                                 if (err_hook)
>  748                                         (*err_hook)(skb);
>  749                                 if (rc == -EAGAIN)
>  750                                         rc = 0;
>  751                                 /* continue to drain the queue */
>  752                                 continue;
>  753                         } else
>  754                                 goto retry;
>  755                 } else {
>  756                         /* skb sent - drop the extra reference and continue */
>  757                         consume_skb(skb);
>  758                         failed = 0;
>  759                 }
>  760         }
>  761
>  762         return (rc >= 0 ? 0 : rc);
>  763 }

When kauditd attempts to flush the hold queue, the queue parameter is &audit_hold_queue.
If netlink_unicast() (line 741) keeps returning -EAGAIN until retry_limit is reached, sk is set
to NULL (line 746) and err_hook (kauditd_rehold_skb) is called, which puts the skb back on
audit_hold_queue. After the continue, every skb_dequeue() (line 726) is followed by another
err_hook call (kauditd_rehold_skb, line 733) on the same queue, so the loop never terminates.
I don't really understand the value of audit_hold_queue. Can we remove it, or stop putting the
logs back via kauditd_rehold_skb when auditd is abnormal?
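
The loop described above can be reproduced outside the kernel with a toy
model. This is only a sketch under stated assumptions: struct node, struct
queue, rehold() and drain_without_receiver() are hypothetical stand-ins (not
kernel APIs) for sk_buff, sk_buff_head, kauditd_rehold_skb and the "!sk"
branch of kauditd_send_queue, and the max_iter cap exists only so that the
demonstration terminates.

```c
#include <stddef.h>

/* Hypothetical stand-ins for struct sk_buff and struct sk_buff_head. */
struct node {
	struct node *next;
	int id;
};

struct queue {
	struct node *head;
	struct node *tail;
};

/* Toy equivalent of putting a record back at the head of a queue. */
static void enqueue_head(struct queue *q, struct node *n)
{
	n->next = q->head;
	q->head = n;
	if (!q->tail)
		q->tail = n;
}

/* Toy equivalent of skb_dequeue(): pop the head, or NULL if empty. */
static struct node *dequeue(struct queue *q)
{
	struct node *n = q->head;

	if (n) {
		q->head = n->next;
		if (!q->head)
			q->tail = NULL;
		n->next = NULL;
	}
	return n;
}

/* The queue being flushed (plays the role of audit_hold_queue). */
static struct queue hold_queue;

/* Assumed behaviour of the err_hook: put the record back on hold_queue. */
static void rehold(struct node *n)
{
	enqueue_head(&hold_queue, n);
}

/*
 * Same shape as the "no receiver" path: dequeue, call err_hook, continue.
 * Returns the number of iterations executed, capped at max_iter so the
 * demonstration ends even when err_hook refills the queue being drained.
 */
static unsigned long drain_without_receiver(struct queue *q,
					    void (*err_hook)(struct node *),
					    unsigned long max_iter)
{
	unsigned long iter = 0;
	struct node *n;

	while (iter < max_iter && (n = dequeue(q))) {
		iter++;
		if (err_hook)
			(*err_hook)(n);
	}
	return iter;
}
```

With a single record on hold_queue and rehold() as the hook, the loop runs
until the cap is hit and the record is still queued afterwards. With a NULL
hook the same loop stops after one iteration.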

I look forward to your reply. Thank you very much.

Gaosheng.

     



--
Linux-audit mailing list
Linux-audit@redhat.com
https://listman.redhat.com/mailman/listinfo/linux-audit


* some logs about the issue // Re: Flush the hold queue fall into an infinite loop.
  2022-01-13 11:56 ` Flush the hold queue fall into an infinite loop cuigaosheng
@ 2022-01-13 12:16   ` cuigaosheng
  2022-01-13 15:22     ` Paul Moore
  1 sibling, 0 replies; 8+ messages in thread
From: cuigaosheng @ 2022-01-13 12:16 UTC (permalink / raw)
  To: Paul Moore
  Cc: wangweiyang, linux-audit, linux-security-module, Xiujianfeng,
	linux-kernel



The log is as follows:

> [  257.972293] CPU: 79 PID: 550 Comm: kauditd Kdump: loaded Tainted: G           OE    --------- -t - 4.18.0-147.5.2.5.h781.eulerosv2r10.x86_64 #1
> [  257.972294] Hardware name: Huawei CH121 V5/IT11SPCA1, BIOS 7.93 01/14/2021
> [  257.972295] Call Trace:
> [  257.972297]  <IRQ>
> [  257.972307]  dump_stack+0x6f/0xab
> [  257.972314]  watchdog_timer_fn+0x222/0x2e0
> [  257.972316]  ? watchdog+0x50/0x50
> [  257.972322]  __hrtimer_run_queues+0x125/0x2f0
> [  257.972326]  ? recalibrate_cpu_khz+0x10/0x10
> [  257.972329]  hrtimer_interrupt+0xe5/0x240
> [  257.972331]  ? sched_clock+0x5/0x10
> [  257.972334]  smp_apic_timer_interrupt+0x6a/0x130
> [  257.972336]  apic_timer_interrupt+0xf/0x20
> [  257.972337]  </IRQ>
> [  257.972341] RIP: 0010:_raw_spin_unlock_irqrestore+0x11/0x20
> [  257.972343] Code: ff ff 7f 5b 44 89 e8 5d 41 5c 41 5d c3 90 90 90 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 c6 07 00 0f 1f 40 00 48 89 f7 57 9d <0f> 1f 44 00 00 c3 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 c6 07
> [  257.972344] RSP: 0018:ffffb7d90e2d3e38 EFLAGS: 00000286 ORIG_RAX: ffffffffffffff13
> [  257.972347] RAX: 0000000000000286 RBX: ffff9bb017d18b00 RCX: ffff9bb017d19900
> [  257.972347] RDX: ffffffff8fb8fef0 RSI: 0000000000000286 RDI: 0000000000000286
> [  257.972348] RBP: ffffffff8fb8fef0 R08: 000000000002b3a0 R09: ffffffff8e7829a2
> [  257.972349] R10: ffffd9126778fa00 R11: 00000000000f4240 R12: ffffffff8fb8ff04
> [  257.972350] R13: 0000000000000000 R14: ffff9bb017d18bf4 R15: ffff9bb017d18b00
> [  257.972356]  ? netlink_attachskb+0xb2/0x1d0
> [  257.972362]  skb_dequeue+0x57/0x70
> [  257.972367]  kauditd_send_queue+0x37/0x100
> [  257.972369]  ? kauditd_retry_skb+0x20/0x20
> [  257.972370]  ? kauditd_send_multicast_skb+0x90/0x90
> [  257.972372]  kauditd_thread+0xa5/0x230
> [  257.972377]  ? finish_wait+0x80/0x80
> [  257.972378]  ? auditd_reset+0x90/0x90
> [  257.972381]  kthread+0x10d/0x130
> [  257.972383]  ? kthread_flush_work_fn+0x10/0x10
> [  257.972385]  ret_from_fork+0x35/0x40
> [  269.972020] Sample cputime: 3999999736 ns(HZ: 1000)
> [  269.972022] Sample cpurate: 0 us, 3984966800 sy, 0 ni, 0 id, 0 wa, 15034536 hi, 0 si, 0 st
> [  269.972023] Sample softirq:
> [  269.972023] Sample hardirq:
> [  269.972232]         no hard irqs found.
> [  269.972233] watchdog: BUG: soft lockup - CPU#79 stuck for 22s! [kauditd:550]

Thanks.



* Re: Flush the hold queue fall into an infinite loop.
  2022-01-13 11:56 ` Flush the hold queue fall into an infinite loop cuigaosheng
@ 2022-01-13 15:22     ` Paul Moore
  2022-01-13 15:22     ` Paul Moore
  1 sibling, 0 replies; 8+ messages in thread
From: Paul Moore @ 2022-01-13 15:22 UTC (permalink / raw)
  To: cuigaosheng
  Cc: linux-audit, Xiujianfeng, wangweiyang, linux-security-module,
	linux-kernel

Thanks Gaosheng for the bug report, I'm able to reproduce this and I'm
looking into it now.  I'll report back when I have a better idea of
the problem and a potential fix.

-- 
paul moore
www.paul-moore.com


* Re: Flush the hold queue fall into an infinite loop.
  2022-01-13 15:22     ` Paul Moore
@ 2022-01-14  1:22       ` cuigaosheng
  -1 siblings, 0 replies; 8+ messages in thread
From: cuigaosheng @ 2022-01-14  1:22 UTC (permalink / raw)
  To: Paul Moore
  Cc: linux-audit, Xiujianfeng, wangweiyang, linux-security-module,
	linux-kernel

I want to stop putting logs into audit_hold_queue when auditd is abnormal. It
seems that this modification goes against the design intent of audit_hold_queue; its
effect is similar to removing audit_hold_queue.

diff --git a/kernel/audit.c b/kernel/audit.c
index 2a38cbaf3ddb..a8091b1a6587 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -748,6 +748,7 @@ static int kauditd_send_queue(struct sock *sk, u32 portid,
                                         (*err_hook)(skb);
                                 if (rc == -EAGAIN)
                                         rc = 0;
+                               audit_default = AUDIT_OFF;
                                 /* continue to drain the queue */
                                 continue;
                         } else
@@ -755,6 +756,7 @@ static int kauditd_send_queue(struct sock *sk, u32 portid,
                 } else {
                         /* skb sent - drop the extra reference and continue */
                         consume_skb(skb);
+                       audit_default = audit_enabled;
                         failed = 0;
                 }
         }


* Re: Flush the hold queue fall into an infinite loop.
  2022-01-14  1:22       ` cuigaosheng
@ 2022-01-14 22:35         ` Paul Moore
  -1 siblings, 0 replies; 8+ messages in thread
From: Paul Moore @ 2022-01-14 22:35 UTC (permalink / raw)
  To: cuigaosheng
  Cc: linux-audit, Xiujianfeng, wangweiyang, linux-security-module,
	linux-kernel

On Thu, Jan 13, 2022 at 8:22 PM cuigaosheng <cuigaosheng1@huawei.com> wrote:
>
> I want to stop putting logs into audit_hold_queue when auditd is abnormal. It
> seems that this modification goes against the design intent of audit_hold_queue; its
> effect is similar to removing audit_hold_queue.
>
> diff --git a/kernel/audit.c b/kernel/audit.c
> index 2a38cbaf3ddb..a8091b1a6587 100644
> --- a/kernel/audit.c
> +++ b/kernel/audit.c
> @@ -748,6 +748,7 @@ static int kauditd_send_queue(struct sock *sk, u32 portid,
>                                          (*err_hook)(skb);
>                                  if (rc == -EAGAIN)
>                                          rc = 0;
> +                               audit_default = AUDIT_OFF;
>                                  /* continue to drain the queue */
>                                  continue;
>                          } else
> @@ -755,6 +756,7 @@ static int kauditd_send_queue(struct sock *sk, u32 portid,
>                  } else {
>                          /* skb sent - drop the extra reference and continue */
>                          consume_skb(skb);
> +                       audit_default = audit_enabled;
>                          failed = 0;
>                  }
>          }

We can't toggle the audit_default setting like this; that isn't
acceptable upstream.  I believe I have a fix, but I need to finish
testing it before I can post it for further review.

> On 2022/1/13 23:22, Paul Moore wrote:
> > On Thu, Jan 13, 2022 at 6:57 AM cuigaosheng <cuigaosheng1@huawei.com> wrote:
> >> When we add "audit=1" to the cmdline, kauditd will take up 100%
> >> of the CPU. The setup is as follows:
> >>
> >> configurations:
> >> auditctl -b 64
> >> auditctl --backlog_wait_time 60000
> >> auditctl -r 0
> >> auditctl -w /root/aaa  -p wrx
> >> shell scripts:
> >> #!/bin/bash
> >> i=0
> >> while [ $i -le 66 ]
> >> do
> >>     touch /root/aaa
> >>     let i++
> >> done
> >> mandatory conditions:
> >>
> >> add "audit=1" to the cmdline, and run kill -19 pid_number (for /sbin/auditd).
> >>
> >>   As long as the audit_hold_queue stays non-empty, flushing the hold queue will fall into
> >>   an infinite loop.
> >>
> >>   713 static int kauditd_send_queue(struct sock *sk, u32 portid,
> >>   714                               struct sk_buff_head *queue,
> >>   715                               unsigned int retry_limit,
> >>   716                               void (*skb_hook)(struct sk_buff *skb),
> >>   717                               void (*err_hook)(struct sk_buff *skb))
> >>   718 {
> >>   719         int rc = 0;
> >>   720         struct sk_buff *skb;
> >>   721         unsigned int failed = 0;
> >>   722
> >>   723         /* NOTE: kauditd_thread takes care of all our locking, we just use
> >>   724          *       the netlink info passed to us (e.g. sk and portid) */
> >>   725
> >>   726         while ((skb = skb_dequeue(queue))) {
> >>   727                 /* call the skb_hook for each skb we touch */
> >>   728                 if (skb_hook)
> >>   729                         (*skb_hook)(skb);
> >>   730
> >>   731                 /* can we send to anyone via unicast? */
> >>   732                 if (!sk) {
> >>   733                         if (err_hook)
> >>   734                                 (*err_hook)(skb);
> >>   735                         continue;
> >>   736                 }
> >>   737
> >>   738 retry:
> >>   739                 /* grab an extra skb reference in case of error */
> >>   740                 skb_get(skb);
> >>   741                 rc = netlink_unicast(sk, skb, portid, 0);
> >>   742                 if (rc < 0) {
> >>   743                         /* send failed - try a few times unless fatal error */
> >>   744                         if (++failed >= retry_limit ||
> >>   745                             rc == -ECONNREFUSED || rc == -EPERM) {
> >>   746                                 sk = NULL;
> >>   747                                 if (err_hook)
> >>   748                                         (*err_hook)(skb);
> >>   749                                 if (rc == -EAGAIN)
> >>   750                                         rc = 0;
> >>   751                                 /* continue to drain the queue */
> >>   752                                 continue;
> >>   753                         } else
> >>   754                                 goto retry;
> >>   755                 } else {
> >>   756                         /* skb sent - drop the extra reference and continue */
> >>   757                         consume_skb(skb);
> >>   758                         failed = 0;
> >>   759                 }
> >>   760         }
> >>   761
> >>   762         return (rc >= 0 ? 0 : rc);
> >>   763 }
> >>
> >> When kauditd attempts to flush the hold queue, the queue parameter is &audit_hold_queue.
> >> If netlink_unicast() (line 741) returns -EAGAIN, sk will be set to NULL (line 746), so err_hook (kauditd_rehold_skb)
> >> will be called. After the continue, skb_dequeue() (line 726) and err_hook (kauditd_rehold_skb, line 733) will
> >> fall into an infinite loop.
> >> I don't really understand the value of audit_hold_queue. Can we remove it, or stop dropping the logs
> >> into kauditd_rehold_skb when auditd is abnormal?
> > Thanks Gaosheng for the bug report, I'm able to reproduce this and I'm
> > looking into it now.  I'll report back when I have a better idea of
> > the problem and a potential fix.
> >



-- 
paul moore
www.paul-moore.com



end of thread, other threads:[~2022-01-14 22:35 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <96f4f1cb-0e7d-6682-ce33-f7f1314cba83@huawei.com>
2022-01-13 11:56 ` Flush the hold queue fall into an infinite loop cuigaosheng
2022-01-13 12:16   ` some logs about the issue // " cuigaosheng
2022-01-13 15:22   ` Paul Moore
2022-01-13 15:22     ` Paul Moore
2022-01-14  1:22     ` cuigaosheng
2022-01-14  1:22       ` cuigaosheng
2022-01-14 22:35       ` Paul Moore
2022-01-14 22:35         ` Paul Moore
