All of lore.kernel.org
 help / color / mirror / Atom feed
From: Qian Cai <cai@gmx.us>
To: Waiman Long <longman@redhat.com>, Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Yang Shi <yang.shi@linux.alibaba.com>,
	arnd@arndb.de, linux kernel <linux-kernel@vger.kernel.org>,
	Catalin Marinas <catalin.marinas@arm.com>
Subject: Re: [PATCH v4] debugobjects: scale the static pool size
Date: Sun, 25 Nov 2018 23:52:24 -0500	[thread overview]
Message-ID: <5abb31e1-b5f2-718d-3a48-b0d8a73d6e5c@gmx.us> (raw)
In-Reply-To: <473f6a6e-1a14-d07c-b0f0-4d96e3232d1a@redhat.com>



On 11/25/18 8:31 PM, Waiman Long wrote:
> On 11/25/2018 03:42 PM, Qian Cai wrote:
>>
>>
>> On 11/23/18 10:01 PM, Qian Cai wrote:
>>>
>>>
>>>> On Nov 22, 2018, at 4:56 PM, Thomas Gleixner <tglx@linutronix.de>
>>>> wrote:
>>>>
>>>> On Tue, 20 Nov 2018, Qian Cai wrote:
>>>>
>>>> Looking deeper at that.
>>>>
>>>>> diff --git a/lib/debugobjects.c b/lib/debugobjects.c
>>>>> index 70935ed91125..140571aa483c 100644
>>>>> --- a/lib/debugobjects.c
>>>>> +++ b/lib/debugobjects.c
>>>>> @@ -23,9 +23,81 @@
>>>>> #define ODEBUG_HASH_BITS    14
>>>>> #define ODEBUG_HASH_SIZE    (1 << ODEBUG_HASH_BITS)
>>>>>
>>>>> -#define ODEBUG_POOL_SIZE    1024
>>>>> +#define ODEBUG_DEFAULT_POOL    512
>>>>> #define ODEBUG_POOL_MIN_LEVEL    256
>>>>>
>>>>> +/*
>>>>> + * Some debug objects are allocated during the early boot.
>>>>> Enabling some options
>>>>> + * like timers or workqueue objects may increase the size required
>>>>> significantly
>>>>> + * with large number of CPUs. For example (as today, 20 Nov. 2018),
>>>>> + *
>>>>> + * No. CPUs x 2 (worker pool) objects:
>>>>> + *
>>>>> + * start_kernel
>>>>> + *   workqueue_init_early
>>>>> + *     init_worker_pool
>>>>> + *       init_timer_key
>>>>> + *         debug_object_init
>>>>> + *
>>>>> + * No. CPUs objects (CONFIG_HIGH_RES_TIMERS):
>>>>> + *
>>>>> + * sched_init
>>>>> + *   hrtick_rq_init
>>>>> + *     hrtimer_init
>>>>> + *
>>>>> + * CONFIG_DEBUG_OBJECTS_WORK:
>>>>> + * No. CPUs x 6 (workqueue) objects:
>>>>> + *
>>>>> + * workqueue_init_early
>>>>> + *   alloc_workqueue
>>>>> + *     __alloc_workqueue_key
>>>>> + *       alloc_and_link_pwqs
>>>>> + *         init_pwq
>>>>> + *
>>>>> + * Also, plus No. CPUs objects:
>>>>> + *
>>>>> + * perf_event_init
>>>>> + *    __init_srcu_struct
>>>>> + *      init_srcu_struct_fields
>>>>> + *        init_srcu_struct_nodes
>>>>> + *          __init_work
>>>>
>>>> None of the things are actually used or required _BEFORE_
>>>> debug_objects_mem_init() is invoked.
>>>>
>>>> The reason why the call is at this place in start_kernel() is
>>>> historical. It's because back in the days when debugobjects were
>>>> added the
>>>> memory allocator was enabled way later than today. So we can just
>>>> move the
>>>> debug_objects_mem_init() call right before sched_init() I think.
>>>
>>> Well, now that kmemleak_init() seems complains that
>>> debug_objects_mem_init()
>>> is called before it.
>>>
>>> [    0.078805] kmemleak: Cannot insert 0xc000000dff930000 into the
>>> object search tree (overlaps existing)
>>> [    0.078860] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.20.0-rc3+ #3
>>> [    0.078883] Call Trace:
>>> [    0.078904] [c000000001c8fcd0] [c000000000c96b34]
>>> dump_stack+0xe8/0x164 (unreliable)
>>> [    0.078935] [c000000001c8fd20] [c000000000486e84]
>>> create_object+0x344/0x380
>>> [    0.078962] [c000000001c8fde0] [c000000000489544]
>>> early_alloc+0x108/0x1f8
>>> [    0.078989] [c000000001c8fe20] [c00000000109738c]
>>> kmemleak_init+0x1d8/0x3d4
>>> [    0.079016] [c000000001c8ff00] [c000000001054028]
>>> start_kernel+0x5c0/0x6f8
>>> [    0.079043] [c000000001c8ff90] [c00000000000ae7c]
>>> start_here_common+0x1c/0x520
>>> [    0.079070] kmemleak: Kernel memory leak detector disabled
>>> [    0.079091] kmemleak: Object 0xc000000ffd587b68 (size 40):
>>> [    0.079112] kmemleak:   comm "swapper/0", pid 0, jiffies 4294937299
>>> [    0.079135] kmemleak:   min_count = -1
>>> [    0.079153] kmemleak:   count = 0
>>> [    0.079170] kmemleak:   flags = 0x5
>>> [    0.079188] kmemleak:   checksum = 0
>>> [    0.079206] kmemleak:   backtrace:
>>> [    0.079227]      __debug_object_init+0x688/0x700
>>> [    0.079250]      debug_object_activate+0x1e0/0x350
>>> [    0.079272]      __call_rcu+0x60/0x430
>>> [    0.079292]      put_object+0x60/0x80
>>> [    0.079311]      kmemleak_init+0x2cc/0x3d4
>>> [    0.079331]      start_kernel+0x5c0/0x6f8
>>> [    0.079351]      start_here_common+0x1c/0x520
>>> [    0.079380] kmemleak: Early log backtrace:
>>> [    0.079399]    memblock_alloc_try_nid_raw+0x90/0xcc
>>> [    0.079421]    sparse_init_nid+0x144/0x51c
>>> [    0.079440]    sparse_init+0x1a0/0x238
>>> [    0.079459]    initmem_init+0x1d8/0x25c
>>> [    0.079498]    setup_arch+0x3e0/0x464
>>> [    0.079517]    start_kernel+0xa4/0x6f8
>>> [    0.079536]    start_here_common+0x1c/0x520
>>>
>>
>> So this is an chicken-egg problem. Debug objects need kmemleak_init()
>> first, so it can make use of kmemleak_ignore() for all debug objects
>> in order to avoid the overlapping like the above.
>>
>> while (obj_pool_free < debug_objects_pool_min_level) {
>>
>>      new = kmem_cache_zalloc(obj_cache, gfp);
>>      if (!new)
>>          return;
>>
>>      kmemleak_ignore(new);
>>
>> However, there seems no way to move kmemleak_init() together this
>> early in start_kernel() just before vmalloc_init() [1] because it
>> looks like it depends on things like workqueue
>> (schedule_work(&cleanup_work)) and rcu. Hence, it needs to be after
>> workqueue_init_early() and rcu_init()
>>
>> Given that, maybe the best outcome is to stick to the alternative
>> approach that works [1] rather messing up with the order of
>> debug_objects_mem_init() in start_kernel() which seems tricky. What do
>> you think?
>>
>> [1] https://goo.gl/18N78g
>> [2] https://goo.gl/My6ig6
> 
> Could you move kmemleak_init() and debug_objects_mem_init() as far up as
> possible, like before the hrtimer_init() to at least make static count
> calculation as simple as possible?
>

Well, there is only 2 x NR_CPUS difference after moved both calls just after 
rcu_init().

          Before After
64-CPU:  1114   974
160-CPU: 2774   2429
256-CPU: 3853   4378

I suppose it is possible that the timers only need the scale factor 5 instead of 
10. However, it needs to be retested for all the configurations to be sure, and 
likely need to remove all irqs calls in kmemleak_init() and subroutines because 
it is now called with irq disabled. Given the initdata will be freed anyway, 
does it really worth doing?

BTW, calling debug_objects_mem_init() before kmemleak_init() actually could 
trigger a loop on machines with 160+ CPUs until the pool is filled up,

debug_objects_pool_min_level += num_possible_cpus() * 4;

[1] while (obj_pool_free < debug_objects_pool_min_level)

kmemleak_init
   kmemleak_ignore (from replaced static debug objects)
     make_black_object
       put_object
         __call_rcu (kernel/rcu/tree.c)
           debug_rcu_head_queue
             debug_object_activate
               debug_object_init
                 fill_pool
                   kmemleak_ignore (looping in [1])
                     make_black_object
                       ...

I think until this is resolved, there is no way to move debug_objects_mem_init() 
before kmemleak_init().

  reply	other threads:[~2018-11-26  4:54 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-11-20 20:14 [PATCH v2] debugobjects: scale the static pool size Qian Cai
2018-11-20 20:54 ` Waiman Long
2018-11-20 20:56   ` Thomas Gleixner
2018-11-20 23:28 ` [PATCH v3] " Qian Cai
2018-11-20 23:38   ` Waiman Long
2018-11-20 23:54     ` Qian Cai
2018-11-20 23:58       ` Joe Perches
2018-11-21  2:11   ` [PATCH v4] " Qian Cai
2018-11-21 14:45     ` Waiman Long
2018-11-22 21:56     ` Thomas Gleixner
2018-11-23  4:31       ` [PATCH] debugobjects: call debug_objects_mem_init eariler Qian Cai
2018-11-23  4:59         ` Waiman Long
2018-11-23 21:46           ` Thomas Gleixner
2018-11-24  2:54             ` Qian Cai
2018-11-24  3:01       ` [PATCH v4] debugobjects: scale the static pool size Qian Cai
2018-11-25 20:42         ` Qian Cai
2018-11-26  1:31           ` Waiman Long
2018-11-26  4:52             ` Qian Cai [this message]
2018-11-26  9:45               ` Qian Cai
2018-11-26 10:50                 ` Catalin Marinas
2018-11-25 20:48       ` Qian Cai

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5abb31e1-b5f2-718d-3a48-b0d8a73d6e5c@gmx.us \
    --to=cai@gmx.us \
    --cc=akpm@linux-foundation.org \
    --cc=arnd@arndb.de \
    --cc=catalin.marinas@arm.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=longman@redhat.com \
    --cc=tglx@linutronix.de \
    --cc=yang.shi@linux.alibaba.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.