bpf.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Yonghong Song <yhs@fb.com>
To: Bill Wendling <morbo@google.com>
Cc: Arnaldo Carvalho de Melo <arnaldo.melo@gmail.com>,
	<dwarves@vger.kernel.org>, Alexei Starovoitov <ast@kernel.org>,
	Andrii Nakryiko <andrii@kernel.org>, bpf <bpf@vger.kernel.org>,
	<kernel-team@fb.com>
Subject: Re: [PATCH dwarves v3 3/3] dwarf_loader: permit merging all dwarf cu's for clang lto built binary
Date: Tue, 30 Mar 2021 15:28:42 -0700	[thread overview]
Message-ID: <a71634a6-4b91-1af0-a5aa-11623d391464@fb.com> (raw)
In-Reply-To: <CAGG=3QVHfkPanWc20HcjJJO57ynUfE3ZV4SV_SKEOHyEujw5qQ@mail.gmail.com>



On 3/30/21 2:44 PM, Bill Wendling wrote:
> On Tue, Mar 30, 2021 at 1:15 PM Yonghong Song <yhs@fb.com> wrote:
>> On 3/30/21 1:08 PM, Bill Wendling wrote:
>>> On Sun, Mar 28, 2021 at 1:14 PM Yonghong Song <yhs@fb.com> wrote:
>>>>
>>>> For vmlinux built with clang thin-lto or lto, there exist
>>>> cross cu type references. For example, the below can happen:
>>>>     compile unit 1:
>>>>        tag 10:  type A
>>>>     compile unit 2:
>>>>        ...
>>>>          refer to type A (tag 10 in compile unit 1)
>>>> I only checked a few but have seen type A may be a simple type
>>>> like "unsigned char" or a complex type like an array of base types.
>>>>
>>>> To resolve this issue, the tag DW_AT_producer of the first
>>>> DW_TAG_compile_unit is checked. If the binary is built
>>>> with clang lto, all debuginfo dwarf cu's will be merged
>>>> into one pahole cu which will resolve the above
>>>> cross-cu tag reference issue. To test whether a binary
>>>> is built with clang lto or not, The "clang version"
>>>> and "-flto" will be checked against DW_AT_producer string
>>>> for the first 5 debuginfo cu's. The reason is that
>>>> a few linux files disabled lto for various reasons.
>>>>
>>>> Merging cu's will create a single cu with lots of types, tags
>>>> and functions. For example with clang thin-lto built vmlinux,
>>>> I saw 9M entries in types table, 5.2M in tags table. The
>>>> below are pahole wallclock time for different hashbits:
>>>> command line: time pahole -J vmlinux
>>>>         # of hashbits            wallclock time in seconds
>>>>             15                       460
>>>>             16                       255
>>>>             17                       131
>>>>             18                       97
>>>>             19                       75
>>>>             20                       69
>>>>             21                       64
>>>>             22                       62
>>>>             23                       58
>>>>             24                       64
>>>>
>>>> The problem is with hashtags__find(), esp. the loop
>>>>       uint32_t bucket = hashtags__fn(id);
>>>>       const struct hlist_head *head = hashtable + bucket;
>>>>       hlist_for_each_entry(tpos, pos, head, hash_node) {
>>>>               if (tpos->id == id)
>>>>                       return tpos;
>>>>       }
>>>>
>>>> Say we have 9M types and (1 << 15) buckets, that means each bucket
>>>> will have roughly 64 elements. So each lookup will traverse
>>>> the loop 32 iterations on average.
>>>>
>>>> If we have 1 << 21 buckets, then each buckets will have 4 elements,
>>>> and the average number of loop iterations for hashtags__find()
>>>> will be 2.
>>>>
>>>> Note that the number of hashbits 24 makes performance worse
>>>> than 23. The reason could be that 23 hashbits can cover 8M
>>>> buckets (close to 9M for the number of entries in types table).
>>>> Higher number of hash bits allocates more memory and becomes
>>>> less cache efficient compared to 23 hashbits.
>>>>
>>>> This patch picks # of hashbits 21 as the starting value
>>>> and will try to allocate memory based on that, if memory
>>>> allocation fails, we will go with less hashbits until
>>>> we reach hashbits 15 which is the default for
>>>> non merge-cu case.
>>>>
>>>> Signed-off-by: Yonghong Song <yhs@fb.com>
>>>> ---
>>>>    dwarf_loader.c | 120 +++++++++++++++++++++++++++++++++++++++++++++++++
>>>>    1 file changed, 120 insertions(+)
>>>>
>>>> diff --git a/dwarf_loader.c b/dwarf_loader.c
>>>> index aa6372a..a51391e 100644
>>>> --- a/dwarf_loader.c
>>>> +++ b/dwarf_loader.c
>>>> @@ -51,6 +51,7 @@ struct strings *strings;
>>>>    #endif
>>>>
>>>>    static uint32_t hashtags__bits = 15;
>>>> +static uint32_t max_hashtags__bits = 21;
>>>>
>>>>    static uint32_t hashtags__fn(Dwarf_Off key)
>>>>    {
>>>> @@ -2484,6 +2485,115 @@ static int cus__load_debug_types(struct cus *cus, struct conf_load *conf,
>>>>           return 0;
>>>>    }
>>>>
>>>> +static bool cus__merging_cu(Dwarf *dw)
>>>> +{
>>>> +       uint8_t pointer_size, offset_size;
>>>> +       Dwarf_Off off = 0, noff;
>>>> +       size_t cuhl;
>>>> +       int cnt = 0;
>>>> +
>>>> +       /*
>>>> +        * Just checking the first cu is not enough.
>>>> +        * In linux, some C files may have LTO is disabled, e.g.,
>>>> +        *   e242db40be27  x86, vdso: disable LTO only for vDSO
>>>> +        *   d2dcd3e37475  x86, cpu: disable LTO for cpu.c
>>>> +        * Fortunately, disabling LTO for a particular file in a LTO build
>>>> +        * is rather an exception. Iterating 5 cu's to check whether
>>>> +        * LTO is used or not should be enough.
>>>> +        */
>>>> +       while (dwarf_nextcu(dw, off, &noff, &cuhl, NULL, &pointer_size,
>>>> +                           &offset_size) == 0) {
>>>> +               Dwarf_Die die_mem;
>>>> +               Dwarf_Die *cu_die = dwarf_offdie(dw, off + cuhl, &die_mem);
>>>> +
>>>> +               if (cu_die == NULL)
>>>> +                       break;
>>>> +
>>>> +               if (++cnt > 5)
>>>> +                       break;
>>>> +
>>>> +               const char *producer = attr_string(cu_die, DW_AT_producer);
>>>> +               if (strstr(producer, "clang version") != NULL &&
>>>> +                   strstr(producer, "-flto") != NULL)
>>>
>>> Instead of checking for flags, which can be a bit brittle, would it
>>> make more sense to scan the abbreviations to see if there are any
>>> "sec_offset" encodings used for type attributes to indicate that LTO
>>> was used?
>>
>> Do you have additional info related to "sec_offset"? I scanned through
>> my llvm-dwarfdump result and didn't find it.
>>
> Sorry about that. It was the wrong thing to check. I consulted our
> DWARF expert here and he said this.
> 
> "DW_FORM_ref_addr is the important thing used for cross-CU references,
> like in the DW_AT_abstract_origin of the DW_TAG_inlined_subroutine. In
> intra-CU references, DW_FORM_ref4 is used instead."

Thanks. I checked with lto dwarf, it should work. The only drawback is 
that it needs to traverse all DW_TAG_inlined_subroutine
and DW_TAG_type's in the cu. If nothing found, going to the next.
This is exactly what I tried to avoid in the beginning, going deep
in the dwarf cu. I could remember the maximum and minimum of the ref. 
type tags for each cu, if it is beyond the current cu
range, it means cross-cu access. But I didn't use this approach
as it is something every pahole user will pay the price regardless
of whether their binary is lto or not (mostly probably not.)

I checked my current vmlinux-lto. I will be able
to find a DW_FORM_ref_addr in the 5th cu. May not be too bad.
We could have a heuristic to only check the first 10 cu's.
If no cross cu reference, that means not lto. Not perfect.
But for sure we don't want to go through all cu's just to
find it is not lto build.

Still feel we should get whether it is a lto built binary
as soon as possible. With an explicit option, compiler flags
seem second choice...

> 
> -bw
> 
>>>
>>> Thank you for improving on my hacky patch! :-)
>>>
>>> -bw
>>>
>>>> +                       return true;
>>>> +
>>>> +               off = noff;
>>>> +       }
>>>> +
>>>> +       return false;
>>>> +}
>>>> +
>> [...]

  reply	other threads:[~2021-03-30 22:29 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-03-28 20:14 [PATCH dwarves v3 0/3] permit merging all dwarf cu's for clang lto built binary Yonghong Song
2021-03-28 20:14 ` [PATCH dwarves v3 1/3] dwarf_loader: permits flexible HASHTAGS__BITS Yonghong Song
2021-03-28 20:14 ` [PATCH dwarves v3 2/3] dwarf_loader: factor out common code to initialize a cu Yonghong Song
2021-03-28 20:14 ` [PATCH dwarves v3 3/3] dwarf_loader: permit merging all dwarf cu's for clang lto built binary Yonghong Song
2021-03-30 20:08   ` Bill Wendling
2021-03-30 20:15     ` Yonghong Song
2021-03-30 21:44       ` Bill Wendling
2021-03-30 22:28         ` Yonghong Song [this message]
2021-03-30 23:25           ` Bill Wendling
2021-03-29 17:40 ` [PATCH dwarves v3 0/3] " Arnaldo Carvalho de Melo
2021-03-30 15:10   ` Arnaldo Carvalho de Melo
2021-03-30 18:08     ` Arnaldo Carvalho de Melo
2021-03-30 18:24       ` Arnaldo Carvalho de Melo
2021-03-31  3:20         ` Yonghong Song
2021-03-31 13:54           ` Arnaldo Carvalho de Melo
2021-03-31 15:08             ` Yonghong Song
2021-03-31  0:29     ` Yonghong Song
2021-03-29 23:14 ` Nick Desaulniers

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=a71634a6-4b91-1af0-a5aa-11623d391464@fb.com \
    --to=yhs@fb.com \
    --cc=andrii@kernel.org \
    --cc=arnaldo.melo@gmail.com \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=dwarves@vger.kernel.org \
    --cc=kernel-team@fb.com \
    --cc=morbo@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).