* [PATCH 1/1] kasan: fix livelock in qlist_move_cache
@ 2017-11-28  4:04 zhouzhouyi
  2017-11-28  4:05 ` Zhouyi Zhou
  0 siblings, 1 reply; 15+ messages in thread
From: zhouzhouyi @ 2017-11-28  4:04 UTC (permalink / raw)
  To: aryabinin, glider, dvyukov, kasan-dev, linux-mm, linux-kernel; +Cc: Zhouyi Zhou

From: Zhouyi Zhou <zhouzhouyi@gmail.com>

This patch fixes a livelock by conditionally releasing the CPU so that
other tasks have a chance to run.

Tested on x86_64.
Signed-off-by: Zhouyi Zhou <zhouzhouyi@gmail.com>
---
 mm/kasan/quarantine.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/mm/kasan/quarantine.c b/mm/kasan/quarantine.c
index 3a8ddf8..33eeff4 100644
--- a/mm/kasan/quarantine.c
+++ b/mm/kasan/quarantine.c
@@ -265,10 +265,13 @@ static void qlist_move_cache(struct qlist_head *from,
 				   struct kmem_cache *cache)
 {
 	struct qlist_node *curr;
+	struct qlist_head tmp_head;
+	unsigned long flags;
 
 	if (unlikely(qlist_empty(from)))
 		return;
 
+	qlist_init(&tmp_head);
 	curr = from->head;
 	qlist_init(from);
 	while (curr) {
@@ -278,10 +281,17 @@ static void qlist_move_cache(struct qlist_head *from,
 		if (obj_cache == cache)
 			qlist_put(to, curr, obj_cache->size);
 		else
-			qlist_put(from, curr, obj_cache->size);
+			qlist_put(&tmp_head, curr, obj_cache->size);
 
 		curr = next;
+
+		if (need_resched()) {
+			spin_unlock_irqrestore(&quarantine_lock, flags);
+			cond_resched();
+			spin_lock_irqsave(&quarantine_lock, flags);
+		}
 	}
+	qlist_move_all(&tmp_head, from);
 }
 
 static void per_cpu_remove_cache(void *arg)
-- 
2.1.4


* Re: [PATCH 1/1] kasan: fix livelock in qlist_move_cache
  2017-11-28  4:04 [PATCH 1/1] kasan: fix livelock in qlist_move_cache zhouzhouyi
@ 2017-11-28  4:05 ` Zhouyi Zhou
  2017-11-28  7:45   ` Dmitry Vyukov
  0 siblings, 1 reply; 15+ messages in thread
From: Zhouyi Zhou @ 2017-11-28  4:05 UTC (permalink / raw)
  To: aryabinin, Alexander Potapenko, Dmitry Vyukov, kasan-dev,
	linux-mm, linux-kernel
  Cc: Zhouyi Zhou

When there is a huge amount of quarantined cache allocations in the system,
the number of entries in global_quarantine[i] becomes very large. Meanwhile,
there is no relaxation point in the while loop of qlist_move_cache(), which
runs with quarantine_lock held. As a result, some userspace programs, for
example libvirt, will complain.


* Re: [PATCH 1/1] kasan: fix livelock in qlist_move_cache
  2017-11-28  4:05 ` Zhouyi Zhou
@ 2017-11-28  7:45   ` Dmitry Vyukov
  2017-11-28  8:00     ` Zhouyi Zhou
  0 siblings, 1 reply; 15+ messages in thread
From: Dmitry Vyukov @ 2017-11-28  7:45 UTC (permalink / raw)
  To: Zhouyi Zhou
  Cc: Andrey Ryabinin, Alexander Potapenko, kasan-dev, Linux-MM, linux-kernel

On Tue, Nov 28, 2017 at 5:05 AM, Zhouyi Zhou <zhouzhouyi@gmail.com> wrote:
> When there is a huge amount of quarantined cache allocations in the system,
> the number of entries in global_quarantine[i] becomes very large. Meanwhile,
> there is no relaxation point in the while loop of qlist_move_cache(), which
> runs with quarantine_lock held. As a result, some userspace programs, for
> example libvirt, will complain.

Hi,

The QUARANTINE_BATCHES thing was supposed to fix this problem, see the
quarantine_remove_cache() function.
What is the amount of RAM and the number of CPUs in your system?
If a system has 4GB of RAM, the quarantine size is 128MB, and that's split
into 1024 batches, so the batch size is 128KB. Even if that's filled with the
smallest objects of size 32, that's only 4K objects. And there is a
cond_resched() between the processing of every batch.
I don't understand why it causes problems in your setup. We use KASAN
extremely heavily on hundreds of machines 24x7 and we have not seen a
single report from this code...
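
As a back-of-the-envelope check of that arithmetic, here is a minimal
userspace sketch; the 1/32-of-RAM ratio and the 1024-batch count are
assumptions taken from the figures in this reply, not read out of the
kernel source:

#include <stdio.h>

int main(void)
{
	/* Quarantine is roughly 1/32 of RAM, split into 1024 batches
	 * (figures as quoted in this thread). */
	unsigned long long ram_gb[] = { 4, 128 };

	for (int i = 0; i < 2; i++) {
		unsigned long long ram   = ram_gb[i] << 30;
		unsigned long long total = ram / 32;     /* whole quarantine */
		unsigned long long batch = total / 1024; /* one batch        */

		printf("%3lluGB RAM -> quarantine %4lluMB, batch %6lluKB\n",
		       ram_gb[i], total >> 20, batch >> 10);
	}
	return 0;
}

This prints a 128KB batch for 4GB of RAM and a 4MB batch for 128GB of RAM,
matching the estimates above.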



* Re: [PATCH 1/1] kasan: fix livelock in qlist_move_cache
  2017-11-28  7:45   ` Dmitry Vyukov
@ 2017-11-28  8:00     ` Zhouyi Zhou
  2017-11-28  8:12       ` Dmitry Vyukov
  0 siblings, 1 reply; 15+ messages in thread
From: Zhouyi Zhou @ 2017-11-28  8:00 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: Andrey Ryabinin, Alexander Potapenko, kasan-dev, Linux-MM, linux-kernel

Thanks for reviewing.
   My machine has 128GB of RAM and runs many KVM virtual machines. libvirtd
always reports "internal error: received hangup / error event on socket"
under heavy memory load.
   Then, using perf top -g, I see qlist_move_cache consume 100% CPU for
several minutes.


* Re: [PATCH 1/1] kasan: fix livelock in qlist_move_cache
  2017-11-28  8:00     ` Zhouyi Zhou
@ 2017-11-28  8:12       ` Dmitry Vyukov
  2017-11-28  8:33         ` Zhouyi Zhou
  0 siblings, 1 reply; 15+ messages in thread
From: Dmitry Vyukov @ 2017-11-28  8:12 UTC (permalink / raw)
  To: Zhouyi Zhou
  Cc: Andrey Ryabinin, Alexander Potapenko, kasan-dev, Linux-MM, linux-kernel

On Tue, Nov 28, 2017 at 9:00 AM, Zhouyi Zhou <zhouzhouyi@gmail.com> wrote:
> Thanks for reviewing.
>    My machine has 128GB of RAM and runs many KVM virtual machines. libvirtd
> always reports "internal error: received hangup / error event on socket"
> under heavy memory load.
>    Then, using perf top -g, I see qlist_move_cache consume 100% CPU for
> several minutes.

For 128GB of RAM, the batch size is 4MB. Processing such a batch should not
take more than a few ms. So I am still struggling to understand how/why
your change helps and why there are issues in the first place...




* Re: [PATCH 1/1] kasan: fix livelock in qlist_move_cache
  2017-11-28  8:12       ` Dmitry Vyukov
@ 2017-11-28  8:33         ` Zhouyi Zhou
  2017-11-28  8:58           ` Dmitry Vyukov
  0 siblings, 1 reply; 15+ messages in thread
From: Zhouyi Zhou @ 2017-11-28  8:33 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: Andrey Ryabinin, Alexander Potapenko, kasan-dev, Linux-MM, linux-kernel

Hi,
   Please take a look at the function quarantine_put; I don't think the
following code limits the batch size to quarantine_batch_size. It only
advances quarantine_tail after qlist_move_all:

                qlist_move_all(q, &temp);

                spin_lock(&quarantine_lock);
                WRITE_ONCE(quarantine_size, quarantine_size + temp.bytes);
                qlist_move_all(&temp, &global_quarantine[quarantine_tail]);
                if (global_quarantine[quarantine_tail].bytes >=
                                READ_ONCE(quarantine_batch_size)) {
                        int new_tail;

                        new_tail = quarantine_tail + 1;
                        if (new_tail == QUARANTINE_BATCHES)
                                new_tail = 0;
                        if (new_tail != quarantine_head)
                                quarantine_tail = new_tail;


* Re: [PATCH 1/1] kasan: fix livelock in qlist_move_cache
  2017-11-28  8:33         ` Zhouyi Zhou
@ 2017-11-28  8:58           ` Dmitry Vyukov
  2017-11-28  9:17             ` Zhouyi Zhou
  0 siblings, 1 reply; 15+ messages in thread
From: Dmitry Vyukov @ 2017-11-28  8:58 UTC (permalink / raw)
  To: Zhouyi Zhou
  Cc: Andrey Ryabinin, Alexander Potapenko, kasan-dev, Linux-MM, linux-kernel

On Tue, Nov 28, 2017 at 9:33 AM, Zhouyi Zhou <zhouzhouyi@gmail.com> wrote:
> Hi,
>    Please take a look at the function quarantine_put; I don't think the
> following code limits the batch size to quarantine_batch_size. It only
> advances quarantine_tail after qlist_move_all:
>
>                 qlist_move_all(q, &temp);
>
>                 spin_lock(&quarantine_lock);
>                 WRITE_ONCE(quarantine_size, quarantine_size + temp.bytes);
>                 qlist_move_all(&temp, &global_quarantine[quarantine_tail]);
>                 if (global_quarantine[quarantine_tail].bytes >=
>                                 READ_ONCE(quarantine_batch_size)) {
>                         int new_tail;
>
>                         new_tail = quarantine_tail + 1;
>                         if (new_tail == QUARANTINE_BATCHES)
>                                 new_tail = 0;
>                         if (new_tail != quarantine_head)
>                                 quarantine_tail = new_tail;


As far as I can see, this code can exceed the global quarantine batch size by
at most 1 per-cpu batch. A per-cpu batch is capped at 1MB, so the max global
batch size will be 4MB+1MB, which does not radically change the situation.



* Re: [PATCH 1/1] kasan: fix livelock in qlist_move_cache
  2017-11-28  8:58           ` Dmitry Vyukov
@ 2017-11-28  9:17             ` Zhouyi Zhou
  2017-11-28  9:27               ` Dmitry Vyukov
  0 siblings, 1 reply; 15+ messages in thread
From: Zhouyi Zhou @ 2017-11-28  9:17 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: Andrey Ryabinin, Alexander Potapenko, kasan-dev, Linux-MM, linux-kernel

Hi,
    Imagine all of the QUARANTINE_BATCHES elements of the global_quarantine
array are already of size 4MB + 1MB; now a new call to quarantine_put is
invoked, and one of the elements grows to 4MB + 1MB + 1MB, and so on.
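
To make the scenario concrete, here is a small userspace sketch that only
mimics the tail-advance logic quoted from quarantine_put() earlier in this
thread, with made-up sizes and under the assumption that nothing drains the
head in the meantime; it is not the kernel code itself:

#include <stdio.h>

#define BATCHES 1024

int main(void)
{
	/* Assumed sizes: 4MB target per batch, 1MB per-cpu batches pushed in. */
	unsigned long long batch[BATCHES] = { 0 };
	unsigned long long target = 4 << 20, percpu = 1 << 20;
	int head = 0, tail = 0;

	for (int i = 0; i < BATCHES * 8; i++) {
		/* analogous to appending the per-cpu list to global_quarantine[tail] */
		batch[tail] += percpu;
		if (batch[tail] >= target) {
			int new_tail = (tail + 1) % BATCHES;
			if (new_tail != head)   /* tail may never pass head */
				tail = new_tail;
		}
	}

	/* Once new_tail catches up with head, every further put lands in the
	 * same batch, so that one batch keeps growing without bound. */
	printf("largest batch: %llu MB (target %llu MB)\n",
	       batch[tail] >> 20, target >> 20);
	return 0;
}

With these numbers the last batch ends up around 4GB, far beyond the 4MB
target, which is the growth described above.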


* Re: [PATCH 1/1] kasan: fix livelock in qlist_move_cache
  2017-11-28  9:17             ` Zhouyi Zhou
@ 2017-11-28  9:27               ` Dmitry Vyukov
  2017-11-28 11:30                 ` Zhouyi Zhou
  0 siblings, 1 reply; 15+ messages in thread
From: Dmitry Vyukov @ 2017-11-28  9:27 UTC (permalink / raw)
  To: Zhouyi Zhou
  Cc: Andrey Ryabinin, Alexander Potapenko, kasan-dev, Linux-MM, linux-kernel

On Tue, Nov 28, 2017 at 10:17 AM, Zhouyi Zhou <zhouzhouyi@gmail.com> wrote:
> Hi,
>     Imagine all of the QUARANTINE_BATCHES elements of the global_quarantine
> array are already of size 4MB + 1MB; now a new call to quarantine_put is
> invoked, and one of the elements grows to 4MB + 1MB + 1MB, and so on.


I see what you mean. Does it really happen in your case? What's the
maximum batch size that you get during your workload?

I always wondered why we don't drain the quarantine right in
quarantine_put when we overflow it. We already take quarantine_lock,
and calling cache_free should be fine in that context, since user code
already does that.
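
For what it's worth, a rough sketch of what that could look like. Names
follow the quarantine_put() excerpt quoted earlier in the thread where
possible; the rest (quarantine_max_size, QLIST_INIT, qlist_free_all) are
assumptions about the surrounding code. Untested, only meant to illustrate
the shape of the idea, not a real patch:

	/* Called from quarantine_put() with quarantine_lock held, right
	 * after the per-cpu batch has been appended to the tail batch. */
	while (quarantine_size > READ_ONCE(quarantine_max_size) &&
	       quarantine_head != quarantine_tail) {
		struct qlist_head to_free = QLIST_INIT;

		qlist_move_all(&global_quarantine[quarantine_head], &to_free);
		WRITE_ONCE(quarantine_size, quarantine_size - to_free.bytes);
		if (++quarantine_head == QUARANTINE_BATCHES)
			quarantine_head = 0;

		/* Free outside the lock; as noted above, calling the cache
		 * free path here should be no worse than what user code
		 * already does. */
		spin_unlock(&quarantine_lock);
		qlist_free_all(&to_free, NULL);
		spin_lock(&quarantine_lock);
	}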




* Re: [PATCH 1/1] kasan: fix livelock in qlist_move_cache
  2017-11-28  9:27               ` Dmitry Vyukov
@ 2017-11-28 11:30                 ` Zhouyi Zhou
  2017-11-28 17:56                   ` Dmitry Vyukov
  0 siblings, 1 reply; 15+ messages in thread
From: Zhouyi Zhou @ 2017-11-28 11:30 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: Andrey Ryabinin, Alexander Potapenko, kasan-dev, Linux-MM, linux-kernel

Hi,
   qlist_move_cache occupying 100% CPU, as seen with perf top, really did
happen in my environment yesterday; otherwise I would not have noticed the
kasan code at all.
   Currently I have difficulty making it reappear because the frontend guy
modified some user-mode code.
   What I can reproduce again and again now is:
kgdb_breakpoint () at kernel/debug/debug_core.c:1073
1073 wmb(); /* Sync point after breakpoint */
(gdb) p quarantine_batch_size
$1 = 3601946
   And by instrumenting the code, the maximum
global_quarantine[quarantine_tail].bytes reached is 6618208.
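
Assuming both counters are in bytes, as the field names suggest, that is a
target of roughly 3601946 / 2^20 ≈ 3.4 MB per batch, with an observed maximum
of 6618208 / 2^20 ≈ 6.3 MB, i.e. about 1.8x the target.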

   I do think draining the quarantine right in quarantine_put is the better
place to do it, because cache_free is fine in that context. I am willing to
do it if you think it is convenient :-)



* Re: [PATCH 1/1] kasan: fix livelock in qlist_move_cache
  2017-11-28 11:30                 ` Zhouyi Zhou
@ 2017-11-28 17:56                   ` Dmitry Vyukov
  2017-11-28 17:57                     ` Dmitry Vyukov
  0 siblings, 1 reply; 15+ messages in thread
From: Dmitry Vyukov @ 2017-11-28 17:56 UTC (permalink / raw)
  To: Zhouyi Zhou
  Cc: Andrey Ryabinin, Alexander Potapenko, kasan-dev, Linux-MM, linux-kernel

On Tue, Nov 28, 2017 at 12:30 PM, Zhouyi Zhou <zhouzhouyi@gmail.com> wrote:
> Hi,
>    qlist_move_cache occupying 100% CPU, as seen with perf top, really did
> happen in my environment yesterday; otherwise I would not have noticed the
> kasan code at all.
>    Currently I have difficulty making it reappear because the frontend guy
> modified some user-mode code.
>    What I can reproduce again and again now is:
> kgdb_breakpoint () at kernel/debug/debug_core.c:1073
> 1073 wmb(); /* Sync point after breakpoint */
> (gdb) p quarantine_batch_size
> $1 = 3601946
>    And by instrumenting the code, the maximum
> global_quarantine[quarantine_tail].bytes reached is 6618208.

On second thought, the size does not matter too much because there can be
large objects. The quarantine always quantizes by objects; we can't put part
of an object into one batch and another part into another batch. But that's
not a problem, because the overhead per object is O(1). We can push a single
4MB object and overflow the target size by 4MB, and that will be fine.
Either way, 6MB is not terribly much either. It should take milliseconds to
process.





* Re: [PATCH 1/1] kasan: fix livelock in qlist_move_cache
  2017-11-28 17:56                   ` Dmitry Vyukov
@ 2017-11-28 17:57                     ` Dmitry Vyukov
  2017-11-28 23:41                       ` Zhouyi Zhou
  0 siblings, 1 reply; 15+ messages in thread
From: Dmitry Vyukov @ 2017-11-28 17:57 UTC (permalink / raw)
  To: Zhouyi Zhou
  Cc: Andrey Ryabinin, Alexander Potapenko, kasan-dev, Linux-MM, linux-kernel

On Tue, Nov 28, 2017 at 6:56 PM, Dmitry Vyukov <dvyukov@google.com> wrote:
> On Tue, Nov 28, 2017 at 12:30 PM, Zhouyi Zhou <zhouzhouyi@gmail.com> wrote:
>> Hi,
>>    By using perf top, qlist_move_cache occupies 100% cpu did really
>> happen in my environment yesterday, or I
>> won't notice the kasan code.
>>    Currently I have difficulty to let it reappear because the frontend
>> guy modified some user mode code.
>>    I can repeat again and again now is
>> kgdb_breakpoint () at kernel/debug/debug_core.c:1073
>> 1073 wmb(); /* Sync point after breakpoint */
>> (gdb) p quarantine_batch_size
>> $1 = 3601946
>>    And by instrument code, maximum
>> global_quarantine[quarantine_tail].bytes reached is 6618208.
>
> On second thought, size does not matter too much because there can be
> large objects. Quarantine always quantize by objects, we can't part of
> an object into one batch, and another part of the object into another
> object. But it's not a problem, because overhead per objects is O(1).
> We can push a single 4MB object and overflow target size by 4MB and
> that will be fine.
> Either way, 6MB is not terribly much too. Should take milliseconds to process.
>
>
>
>
>>    I do think drain quarantine right in quarantine_put is a better
>> place to drain because cache_free is fine in
>> that context. I am willing do it if you think it is convenient :-)


Andrey, do you know of any problems with draining quarantine in push?
Do you have any objections?

But it's still not completely clear to me what problem we are solving.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/1] kasan: fix livelock in qlist_move_cache
  2017-11-28 17:57                     ` Dmitry Vyukov
@ 2017-11-28 23:41                       ` Zhouyi Zhou
  2017-11-29  4:54                         ` Zhouyi Zhou
  0 siblings, 1 reply; 15+ messages in thread
From: Zhouyi Zhou @ 2017-11-28 23:41 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: Andrey Ryabinin, Alexander Potapenko, kasan-dev, Linux-MM, linux-kernel

Hi,
    I will try to reestablish the environment and design a proof-of-concept
experiment.
Cheers

On Wed, Nov 29, 2017 at 1:57 AM, Dmitry Vyukov <dvyukov@google.com> wrote:
> On Tue, Nov 28, 2017 at 6:56 PM, Dmitry Vyukov <dvyukov@google.com> wrote:
>> On Tue, Nov 28, 2017 at 12:30 PM, Zhouyi Zhou <zhouzhouyi@gmail.com> wrote:
>>> Hi,
>>>    By using perf top, qlist_move_cache occupies 100% cpu did really
>>> happen in my environment yesterday, or I
>>> won't notice the kasan code.
>>>    Currently I have difficulty to let it reappear because the frontend
>>> guy modified some user mode code.
>>>    I can repeat again and again now is
>>> kgdb_breakpoint () at kernel/debug/debug_core.c:1073
>>> 1073 wmb(); /* Sync point after breakpoint */
>>> (gdb) p quarantine_batch_size
>>> $1 = 3601946
>>>    And by instrument code, maximum
>>> global_quarantine[quarantine_tail].bytes reached is 6618208.
>>
>> On second thought, size does not matter too much because there can be
>> large objects. Quarantine always quantize by objects, we can't part of
>> an object into one batch, and another part of the object into another
>> object. But it's not a problem, because overhead per objects is O(1).
>> We can push a single 4MB object and overflow target size by 4MB and
>> that will be fine.
>> Either way, 6MB is not terribly much too. Should take milliseconds to process.
>>
>>
>>
>>
>>>    I do think drain quarantine right in quarantine_put is a better
>>> place to drain because cache_free is fine in
>>> that context. I am willing do it if you think it is convenient :-)
>
>
> Andrey, do you know of any problems with draining quarantine in push?
> Do you have any objections?
>
> But it's still not completely clear to me what problem we are solving.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/1] kasan: fix livelock in qlist_move_cache
  2017-11-28 23:41                       ` Zhouyi Zhou
@ 2017-11-29  4:54                         ` Zhouyi Zhou
  2017-11-29  9:03                           ` Dmitry Vyukov
  0 siblings, 1 reply; 15+ messages in thread
From: Zhouyi Zhou @ 2017-11-29  4:54 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: Andrey Ryabinin, Alexander Potapenko, kasan-dev, Linux-MM, linux-kernel

Hi,
There are new discoveries!

When I saw the qlist_move_cache issue reappear in my environment,
I used kgdb to break into the function qlist_move_cache and found
that it is called because of cgroup release.

I also found that libvirt allocates a memory cgroup for each qemu it starts;
in my system, it looks like this:

root@ednserver3:/sys/fs/cgroup/memory/machine.slice# ls
cgroup.clone_children machine-qemu\x2d491_25_30.scope
machine-qemu\x2d491_40_30.scope  machine-qemu\x2d491_6_30.scope
memory.limit_in_bytes
cgroup.event_control machine-qemu\x2d491_26_30.scope
machine-qemu\x2d491_41_30.scope  machine-qemu\x2d491_7_30.scope
memory.max_usage_in_bytes
cgroup.procs machine-qemu\x2d491_27_30.scope
machine-qemu\x2d491_4_30.scope   machine-qemu\x2d491_8_30.scope
memory.move_charge_at_immigrate
machine-qemu\x2d491_10_30.scope  machine-qemu\x2d491_28_30.scope
machine-qemu\x2d491_47_30.scope  machine-qemu\x2d491_9_30.scope
memory.numa_stat
machine-qemu\x2d491_11_30.scope  machine-qemu\x2d491_29_30.scope
machine-qemu\x2d491_48_30.scope  memory.failcnt
memory.oom_control
machine-qemu\x2d491_12_30.scope  machine-qemu\x2d491_30_30.scope
machine-qemu\x2d491_49_30.scope  memory.force_empty
memory.pressure_level
machine-qemu\x2d491_13_30.scope  machine-qemu\x2d491_31_30.scope
machine-qemu\x2d491_50_30.scope  memory.kmem.failcnt
memory.soft_limit_in_bytes
machine-qemu\x2d491_17_30.scope  machine-qemu\x2d491_32_30.scope
machine-qemu\x2d491_51_30.scope  memory.kmem.limit_in_bytes
memory.stat
machine-qemu\x2d491_18_30.scope  machine-qemu\x2d491_33_30.scope
machine-qemu\x2d491_52_30.scope  memory.kmem.max_usage_in_bytes
memory.swappiness
machine-qemu\x2d491_19_30.scope  machine-qemu\x2d491_34_30.scope
machine-qemu\x2d491_5_30.scope   memory.kmem.slabinfo
memory.usage_in_bytes
machine-qemu\x2d491_20_30.scope  machine-qemu\x2d491_35_30.scope
machine-qemu\x2d491_53_30.scope  memory.kmem.tcp.failcnt
memory.use_hierarchy
machine-qemu\x2d491_21_30.scope  machine-qemu\x2d491_36_30.scope
machine-qemu\x2d491_54_30.scope  memory.kmem.tcp.limit_in_bytes
notify_on_release
machine-qemu\x2d491_22_30.scope  machine-qemu\x2d491_37_30.scope
machine-qemu\x2d491_55_30.scope  memory.kmem.tcp.max_usage_in_bytes
tasks
machine-qemu\x2d491_23_30.scope  machine-qemu\x2d491_38_30.scope
machine-qemu\x2d491_56_30.scope  memory.kmem.tcp.usage_in_bytes
machine-qemu\x2d491_24_30.scope  machine-qemu\x2d491_39_30.scope
machine-qemu\x2d491_57_30.scope  memory.kmem.usage_in_bytes

and in each memory cgroup there are many slabs:
root@ednserver3:/sys/fs/cgroup/memory/machine.slice/machine-qemu\x2d491_10_30.scope#
cat memory.kmem.slabinfo
slabinfo - version: 2.1
# name            <active_objs> <num_objs> <objsize> <objperslab>
<pagesperslab> : tunables <limit> <batchcount> <sharedfactor> :
slabdata <active_slabs> <num_slabs> <sharedavail>
kmalloc-2048           0      0   2240    3    2 : tunables   24   12
  8 : slabdata      0      0      0
kmalloc-512            0      0    704   11    2 : tunables   54   27
  8 : slabdata      0      0      0
skbuff_head_cache      0      0    384   10    1 : tunables   54   27
  8 : slabdata      0      0      0
kmalloc-1024           0      0   1216    3    1 : tunables   24   12
  8 : slabdata      0      0      0
kmalloc-192            0      0    320   12    1 : tunables  120   60
  8 : slabdata      0      0      0
pid                    3     21    192   21    1 : tunables  120   60
  8 : slabdata      1      1      0
signal_cache           0      0   1216    3    1 : tunables   24   12
  8 : slabdata      0      0      0
sighand_cache          0      0   2304    3    2 : tunables   24   12
  8 : slabdata      0      0      0
fs_cache               0      0    192   21    1 : tunables  120   60
  8 : slabdata      0      0      0
files_cache            0      0    896    4    1 : tunables   54   27
  8 : slabdata      0      0      0
task_delay_info        3     72    112   36    1 : tunables  120   60
  8 : slabdata      2      2      0
task_struct            3      3   3840    1    1 : tunables   24   12
  8 : slabdata      3      3      0
radix_tree_node        0      0    728    5    1 : tunables   54   27
  8 : slabdata      0      0      0
shmem_inode_cache      2      9    848    9    2 : tunables   54   27
  8 : slabdata      1      1      0
inode_cache           39     45    744    5    1 : tunables   54   27
  8 : slabdata      9      9      0
ext4_inode_cache       0      0   1224    3    1 : tunables   24   12
  8 : slabdata      0      0      0
sock_inode_cache       3      8    832    4    1 : tunables   54   27
  8 : slabdata      2      2      0
proc_inode_cache       0      0    816    5    1 : tunables   54   27
  8 : slabdata      0      0      0
dentry                52     90    272   15    1 : tunables  120   60
  8 : slabdata      6      6      0
anon_vma             140    348    136   29    1 : tunables  120   60
  8 : slabdata     12     12      0
anon_vma_chain       257    468    112   36    1 : tunables  120   60
  8 : slabdata     13     13      0
vm_area_struct       510    780    272   15    1 : tunables  120   60
  8 : slabdata     52     52      0
mm_struct              1      3   1280    3    1 : tunables   24   12
  8 : slabdata      1      1      0
cred_jar              12     24    320   12    1 : tunables  120   60
  8 : slabdata      2      2      0

So, when I tear down the libvirt setup, the slabs belonging to those qemus
have to go through quarantine_remove_cache.
I guess that's why qlist_move_cache occupies so many CPU cycles, and I
also guess this is what makes libvirt complain
(it waits for too long?).

Sorry for not researching the system deeply in the first place and for
submitting a patch in a hurry.

And I propose a small suggestion to improve qlist_move_cache if you
like: could we design some kind of hash mechanism
that groups the qlist_nodes according to their cache, so that we do not
have to compare against every qlist_node in the system one by one?
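
To make the idea a bit more concrete, here is a rough userspace sketch of
the kind of hashing I mean. Everything in it (toy_cache, toy_qnode,
NR_CACHE_BUCKETS, the hash function) is made up for illustration and does
not use the real mm/kasan/quarantine.c structures; it also ignores locking
and the global FIFO behaviour, and only shows why cache removal no longer
has to scan every quarantined node:

/*
 * Toy model: quarantined nodes are kept in hash buckets keyed by their
 * cache pointer, so removing one cache only walks its own bucket
 * instead of the whole quarantine.  All names are invented.
 */
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define NR_CACHE_BUCKETS 64                  /* made-up bucket count */

struct toy_cache { const char *name; };

struct toy_qnode {
	struct toy_qnode *next;
	struct toy_cache *cache;             /* owning cache, recorded at free time */
};

static struct toy_qnode *buckets[NR_CACHE_BUCKETS];

static size_t cache_hash(const struct toy_cache *c)
{
	return ((uintptr_t)c >> 4) % NR_CACHE_BUCKETS;   /* any pointer hash works */
}

static void quarantine_put(struct toy_qnode *node, struct toy_cache *c)
{
	size_t b = cache_hash(c);

	node->cache = c;
	node->next = buckets[b];
	buckets[b] = node;
}

/* Drop every quarantined object of one cache: only its bucket is walked. */
static void quarantine_remove_cache(struct toy_cache *c)
{
	struct toy_qnode **pp = &buckets[cache_hash(c)];

	while (*pp) {
		if ((*pp)->cache == c) {
			struct toy_qnode *victim = *pp;

			*pp = victim->next;
			free(victim);        /* stands in for the real object free */
		} else {
			pp = &(*pp)->next;
		}
	}
}

int main(void)
{
	struct toy_cache a = { "cache_a" }, b = { "cache_b" };

	for (int i = 0; i < 8; i++)
		quarantine_put(calloc(1, sizeof(struct toy_qnode)),
			       (i & 1) ? &a : &b);

	quarantine_remove_cache(&a);         /* never visits cache_b's nodes */
	printf("removed all quarantined objects of %s\n", a.name);
	return 0;
}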


Sorry for taking up your time
Best Wishes
Zhouyi

On Wed, Nov 29, 2017 at 7:41 AM, Zhouyi Zhou <zhouzhouyi@gmail.com> wrote:
> Hi,
>     I will try to reestablish the environment and design a proof-of-concept
> experiment.
> Cheers
>
> On Wed, Nov 29, 2017 at 1:57 AM, Dmitry Vyukov <dvyukov@google.com> wrote:
>> On Tue, Nov 28, 2017 at 6:56 PM, Dmitry Vyukov <dvyukov@google.com> wrote:
>>> On Tue, Nov 28, 2017 at 12:30 PM, Zhouyi Zhou <zhouzhouyi@gmail.com> wrote:
>>>> Hi,
>>>>    By using perf top, qlist_move_cache occupies 100% cpu did really
>>>> happen in my environment yesterday, or I
>>>> won't notice the kasan code.
>>>>    Currently I have difficulty to let it reappear because the frontend
>>>> guy modified some user mode code.
>>>>    I can repeat again and again now is
>>>> kgdb_breakpoint () at kernel/debug/debug_core.c:1073
>>>> 1073 wmb(); /* Sync point after breakpoint */
>>>> (gdb) p quarantine_batch_size
>>>> $1 = 3601946
>>>>    And by instrument code, maximum
>>>> global_quarantine[quarantine_tail].bytes reached is 6618208.
>>>
>>> On second thought, size does not matter too much because there can be
>>> large objects. Quarantine always quantize by objects, we can't part of
>>> an object into one batch, and another part of the object into another
>>> object. But it's not a problem, because overhead per objects is O(1).
>>> We can push a single 4MB object and overflow target size by 4MB and
>>> that will be fine.
>>> Either way, 6MB is not terribly much too. Should take milliseconds to process.
>>>
>>>
>>>
>>>
>>>>    I do think drain quarantine right in quarantine_put is a better
>>>> place to drain because cache_free is fine in
>>>> that context. I am willing do it if you think it is convenient :-)
>>
>>
>> Andrey, do you know of any problems with draining quarantine in push?
>> Do you have any objections?
>>
>> But it's still not completely clear to me what problem we are solving.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/1] kasan: fix livelock in qlist_move_cache
  2017-11-29  4:54                         ` Zhouyi Zhou
@ 2017-11-29  9:03                           ` Dmitry Vyukov
  0 siblings, 0 replies; 15+ messages in thread
From: Dmitry Vyukov @ 2017-11-29  9:03 UTC (permalink / raw)
  To: Zhouyi Zhou
  Cc: Andrey Ryabinin, Alexander Potapenko, kasan-dev, Linux-MM, linux-kernel

On Wed, Nov 29, 2017 at 5:54 AM, Zhouyi Zhou <zhouzhouyi@gmail.com> wrote:
> Hi,
> There are new discoveries!
>
> When I saw the qlist_move_cache issue reappear in my environment,
> I used kgdb to break into the function qlist_move_cache and found
> that it is called because of cgroup release.
>
> I also found that libvirt allocates a memory cgroup for each qemu it starts;
> in my system, it looks like this:
>
> root@ednserver3:/sys/fs/cgroup/memory/machine.slice# ls
> cgroup.clone_children machine-qemu\x2d491_25_30.scope
> machine-qemu\x2d491_40_30.scope  machine-qemu\x2d491_6_30.scope
> memory.limit_in_bytes
> cgroup.event_control machine-qemu\x2d491_26_30.scope
> machine-qemu\x2d491_41_30.scope  machine-qemu\x2d491_7_30.scope
> memory.max_usage_in_bytes
> cgroup.procs machine-qemu\x2d491_27_30.scope
> machine-qemu\x2d491_4_30.scope   machine-qemu\x2d491_8_30.scope
> memory.move_charge_at_immigrate
> machine-qemu\x2d491_10_30.scope  machine-qemu\x2d491_28_30.scope
> machine-qemu\x2d491_47_30.scope  machine-qemu\x2d491_9_30.scope
> memory.numa_stat
> machine-qemu\x2d491_11_30.scope  machine-qemu\x2d491_29_30.scope
> machine-qemu\x2d491_48_30.scope  memory.failcnt
> memory.oom_control
> machine-qemu\x2d491_12_30.scope  machine-qemu\x2d491_30_30.scope
> machine-qemu\x2d491_49_30.scope  memory.force_empty
> memory.pressure_level
> machine-qemu\x2d491_13_30.scope  machine-qemu\x2d491_31_30.scope
> machine-qemu\x2d491_50_30.scope  memory.kmem.failcnt
> memory.soft_limit_in_bytes
> machine-qemu\x2d491_17_30.scope  machine-qemu\x2d491_32_30.scope
> machine-qemu\x2d491_51_30.scope  memory.kmem.limit_in_bytes
> memory.stat
> machine-qemu\x2d491_18_30.scope  machine-qemu\x2d491_33_30.scope
> machine-qemu\x2d491_52_30.scope  memory.kmem.max_usage_in_bytes
> memory.swappiness
> machine-qemu\x2d491_19_30.scope  machine-qemu\x2d491_34_30.scope
> machine-qemu\x2d491_5_30.scope   memory.kmem.slabinfo
> memory.usage_in_bytes
> machine-qemu\x2d491_20_30.scope  machine-qemu\x2d491_35_30.scope
> machine-qemu\x2d491_53_30.scope  memory.kmem.tcp.failcnt
> memory.use_hierarchy
> machine-qemu\x2d491_21_30.scope  machine-qemu\x2d491_36_30.scope
> machine-qemu\x2d491_54_30.scope  memory.kmem.tcp.limit_in_bytes
> notify_on_release
> machine-qemu\x2d491_22_30.scope  machine-qemu\x2d491_37_30.scope
> machine-qemu\x2d491_55_30.scope  memory.kmem.tcp.max_usage_in_bytes
> tasks
> machine-qemu\x2d491_23_30.scope  machine-qemu\x2d491_38_30.scope
> machine-qemu\x2d491_56_30.scope  memory.kmem.tcp.usage_in_bytes
> machine-qemu\x2d491_24_30.scope  machine-qemu\x2d491_39_30.scope
> machine-qemu\x2d491_57_30.scope  memory.kmem.usage_in_bytes
>
> and in each memory cgroup there are many slabs:
> root@ednserver3:/sys/fs/cgroup/memory/machine.slice/machine-qemu\x2d491_10_30.scope#
> cat memory.kmem.slabinfo
> slabinfo - version: 2.1
> # name            <active_objs> <num_objs> <objsize> <objperslab>
> <pagesperslab> : tunables <limit> <batchcount> <sharedfactor> :
> slabdata <active_slabs> <num_slabs> <sharedavail>
> kmalloc-2048           0      0   2240    3    2 : tunables   24   12
>   8 : slabdata      0      0      0
> kmalloc-512            0      0    704   11    2 : tunables   54   27
>   8 : slabdata      0      0      0
> skbuff_head_cache      0      0    384   10    1 : tunables   54   27
>   8 : slabdata      0      0      0
> kmalloc-1024           0      0   1216    3    1 : tunables   24   12
>   8 : slabdata      0      0      0
> kmalloc-192            0      0    320   12    1 : tunables  120   60
>   8 : slabdata      0      0      0
> pid                    3     21    192   21    1 : tunables  120   60
>   8 : slabdata      1      1      0
> signal_cache           0      0   1216    3    1 : tunables   24   12
>   8 : slabdata      0      0      0
> sighand_cache          0      0   2304    3    2 : tunables   24   12
>   8 : slabdata      0      0      0
> fs_cache               0      0    192   21    1 : tunables  120   60
>   8 : slabdata      0      0      0
> files_cache            0      0    896    4    1 : tunables   54   27
>   8 : slabdata      0      0      0
> task_delay_info        3     72    112   36    1 : tunables  120   60
>   8 : slabdata      2      2      0
> task_struct            3      3   3840    1    1 : tunables   24   12
>   8 : slabdata      3      3      0
> radix_tree_node        0      0    728    5    1 : tunables   54   27
>   8 : slabdata      0      0      0
> shmem_inode_cache      2      9    848    9    2 : tunables   54   27
>   8 : slabdata      1      1      0
> inode_cache           39     45    744    5    1 : tunables   54   27
>   8 : slabdata      9      9      0
> ext4_inode_cache       0      0   1224    3    1 : tunables   24   12
>   8 : slabdata      0      0      0
> sock_inode_cache       3      8    832    4    1 : tunables   54   27
>   8 : slabdata      2      2      0
> proc_inode_cache       0      0    816    5    1 : tunables   54   27
>   8 : slabdata      0      0      0
> dentry                52     90    272   15    1 : tunables  120   60
>   8 : slabdata      6      6      0
> anon_vma             140    348    136   29    1 : tunables  120   60
>   8 : slabdata     12     12      0
> anon_vma_chain       257    468    112   36    1 : tunables  120   60
>   8 : slabdata     13     13      0
> vm_area_struct       510    780    272   15    1 : tunables  120   60
>   8 : slabdata     52     52      0
> mm_struct              1      3   1280    3    1 : tunables   24   12
>   8 : slabdata      1      1      0
> cred_jar              12     24    320   12    1 : tunables  120   60
>   8 : slabdata      2      2      0
>
> So, when I tear down the libvirt setup, the slabs belonging to those qemus
> have to go through quarantine_remove_cache.
> I guess that's why qlist_move_cache occupies so many CPU cycles, and I
> also guess this is what makes libvirt complain
> (it waits for too long?).
>
> Sorry for not researching the system deeply in the first place and for
> submitting a patch in a hurry.
>
> And I propose a small suggestion to improve qlist_move_cache if you
> like: could we design some kind of hash mechanism
> that groups the qlist_nodes according to their cache, so that we do not
> have to compare against every qlist_node in the system one by one?

Yes, quarantine_remove_cache() is very slow because it walks a huge
linked list, and synchronize_srcu() does not help either. It would be
great to make it faster rather than papering over the problem with
rescheds.

Please detail your scheme.
Note that quarantine needs to be a [best-effort] global FIFO and that
the main operations are actually kmalloc/kfree, so we should not
penalize them either. We also have limited memory in memory blocks.

I had some ideas, but I couldn't come up with a complete solution that
I liked.
One thing is that we could first check whether the cache actually has _any_
outstanding objects. Looking at your slabinfo dump, it seems that lots
of them don't have active objects. In that case we can skip
quarantine_remove_cache() entirely. I see there is already a function
for this:

static int shutdown_cache(struct kmem_cache *s)
{
        /* free asan quarantined objects */
        kasan_cache_shutdown(s);

        if (__kmem_cache_shutdown(s) != 0)
                return -EBUSY;

So maybe we could do just:

static int shutdown_cache(struct kmem_cache *s)
{
        if (__kmem_cache_shutdown(s) != 0) {
               /* free asan quarantined objects */
               kasan_cache_shutdown(s);
               if (__kmem_cache_shutdown(s) != 0)
                       return -EBUSY;
        }


We could also make cache freeing asynchronous. Then we could either
just wait until the cache doesn't have any active objects (walk and
check all deferred caches after each quarantine_reduce()), or
accumulate a batch of them and then walk the quarantine once and remove
objects for the whole batch of caches (this would amortize the overhead
by the batch size). As far as I understand, in lots of cases caches are
freed in large batches (cgroups, namespaces), and that's exactly when
quarantine_remove_cache() performance is a problem.
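
If it helps, here is an equally rough userspace sketch of that batching
idea. The names (defer_cache_shutdown, flush_deferred_shutdowns) are
invented and there is no locking; the real thing would have to hook into
shutdown_cache() and quarantine_reduce():

/*
 * Toy model of batched cache shutdown: dying caches are only marked and
 * queued; one later walk over the quarantine drops the objects of every
 * queued cache at once, amortizing the cost of the walk.
 */
#include <stdbool.h>
#include <stdlib.h>

struct toy_cache {
	bool dying;
	struct toy_cache *next_dying;
};

struct toy_qnode {
	struct toy_qnode *next;
	struct toy_cache *cache;
};

static struct toy_qnode *quarantine;         /* single global list */
static struct toy_cache *dying_caches;       /* caches waiting for shutdown */

/* Called instead of an immediate quarantine_remove_cache(). */
static void defer_cache_shutdown(struct toy_cache *c)
{
	c->dying = true;
	c->next_dying = dying_caches;
	dying_caches = c;
}

/* One walk over the quarantine serves every dying cache queued so far. */
static void flush_deferred_shutdowns(void (*finish)(struct toy_cache *))
{
	struct toy_qnode **pp = &quarantine;

	while (*pp) {
		if ((*pp)->cache->dying) {
			struct toy_qnode *victim = *pp;

			*pp = victim->next;
			free(victim);        /* stands in for the real object free */
		} else {
			pp = &(*pp)->next;
		}
	}

	while (dying_caches) {
		struct toy_cache *c = dying_caches;

		dying_caches = c->next_dying;
		finish(c);                   /* e.g. the rest of shutdown_cache() */
	}
}

The amortization only pays off if flush_deferred_shutdowns() runs once per
batch of dying caches rather than once per cache.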

Or we could make quarantine a doubly-linked list and then walk all
active objects in the cache (is it possible?) and remove them from
quarantine by shuffling next/prev pointers. However, this can increase
memory consumption and penalize performance of other operations.
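
A matching sketch of the doubly-linked variant (invented names again): if
each quarantined object carries prev/next pointers and the cache can
enumerate its own objects, eviction becomes per-object pointer surgery
instead of a full scan, at the cost of one extra pointer per quarantined
object:

/* Toy doubly-linked quarantine node: unlinking is O(1) per object. */
struct toy_qnode {
	struct toy_qnode *prev, *next;
};

static struct toy_qnode quarantine = { &quarantine, &quarantine };  /* list head */

static void quarantine_link(struct toy_qnode *n)
{
	n->next = quarantine.next;
	n->prev = &quarantine;
	quarantine.next->prev = n;
	quarantine.next = n;
}

/*
 * Called for each quarantined object of a dying cache, assuming the cache
 * can walk its own objects; no scan of the global quarantine is needed.
 */
static void quarantine_unlink(struct toy_qnode *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	n->prev = n->next = n;       /* leave the node self-linked */
}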

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2017-11-29  9:03 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-11-28  4:04 [PATCH 1/1] kasan: fix livelock in qlist_move_cache zhouzhouyi
2017-11-28  4:05 ` Zhouyi Zhou
2017-11-28  7:45   ` Dmitry Vyukov
2017-11-28  8:00     ` Zhouyi Zhou
2017-11-28  8:12       ` Dmitry Vyukov
2017-11-28  8:33         ` Zhouyi Zhou
2017-11-28  8:58           ` Dmitry Vyukov
2017-11-28  9:17             ` Zhouyi Zhou
2017-11-28  9:27               ` Dmitry Vyukov
2017-11-28 11:30                 ` Zhouyi Zhou
2017-11-28 17:56                   ` Dmitry Vyukov
2017-11-28 17:57                     ` Dmitry Vyukov
2017-11-28 23:41                       ` Zhouyi Zhou
2017-11-29  4:54                         ` Zhouyi Zhou
2017-11-29  9:03                           ` Dmitry Vyukov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).