Linux-Trace-Devel Archive on lore.kernel.org
From: Tzvetomir Stoyanov <tz.stoyanov@gmail.com>
To: Steven Rostedt <rostedt@goodmis.org>
Cc: linux-trace-devel@vger.kernel.org
Subject: Re: [PATCH v2 4/5] trace-cmd,kernel-shark: New libtracefs APIs for ftrace events and systems
Date: Thu, 9 Jan 2020 16:14:27 +0200
Message-ID: <CAPpZLN6btoinjnxL7Gv5bgyc1f1==WRJsshXqD6sqtne2Y=rNw@mail.gmail.com> (raw)
In-Reply-To: <20200108160918.32d81c4a@gandalf.local.home>

On Wed, Jan 8, 2020 at 11:09 PM Steven Rostedt <rostedt@goodmis.org> wrote:
>
> On Mon,  6 Jan 2020 17:40:57 +0200
> "Tzvetomir Stoyanov (VMware)" <tz.stoyanov@gmail.com> wrote:
>
>
> > --- /dev/null
> > +++ b/lib/tracefs/tracefs-events.c
> > @@ -0,0 +1,476 @@
> > +// SPDX-License-Identifier: LGPL-2.1
> > +/*
> > + * Copyright (C) 2008, 2009, 2010 Red Hat Inc, Steven Rostedt <srostedt@redhat.com>
> > + *
> > + * Updates:
> > + * Copyright (C) 2019, VMware, Tzvetomir Stoyanov <tz.stoyanov@gmail.com>
> > + *
> > + */
> > +#include <stdio.h>
> > +#include <stdlib.h>
> > +#include <dirent.h>
> > +#include <unistd.h>
> > +#include <errno.h>
> > +#include <sys/stat.h>
> > +#include <fcntl.h>
> > +
> > +#include "kbuffer.h"
> > +#include "tracefs.h"
> > +#include "tracefs-local.h"
> > +
> > +/**
> > + * tracefs_read_page_record - read a record off of a page
> > + * @tep: tep used to parse the page
> > + * @page: the page to read
> > + * @size: the size of the page
> > + * @last_record: last record read from this page
> > + *
> > + * If a ring buffer page is available, and the need to parse it
> > + * without having a handle, then this function can be used
>
> Not having a handle to what? Ah, I wrote this back in 2011, and that
> "handle" meant a tracecmd_input_handle. Hmm, I think we need a
> tracefs_handle of some sort for this, because otherwise it will slow
> down tracefs_iterate_raw_events() very much.
>
I'm not sure how a tracefs_handle could speed up the event iteration,
but I can think about it in the context of a different patch. The
current patch set is just a skeleton of the new library; most of the
code is moved from the application or from libtracecmd with almost no
modifications.

>
> > + *
> > + * The @tep needs to be initialized to have the page header information
> > + * already available.
> > + *
> > + * The @last_record is used to know where to read the next record from
> > + * If @last_record is NULL, the first record on the page will be read
> > + *
> > + * Returns:
> > + *  A newly allocated record that must be freed with free_record() if
>
> Hmm, free_record() needs a prefix ("tracefs_") ?
Actually free_record() is not exported as a libtracefs API, which is
confusing. There is a free_record() libtracecmd API, which is used in
the trace-cmd context. Maybe it is better to have tracefs_free_record(),
or to use the one from libtracecmd?

>
> > + *  a record is found. Otherwise NULL is returned if the record is bad
> > + *  or no more records exist
> > + */
> > +struct tep_record *
> > +tracefs_read_page_record(struct tep_handle *tep, void *page, int size,
> > +                       struct tep_record *last_record)
> > +{
> > +     unsigned long long ts;
> > +     struct kbuffer *kbuf;
> > +     struct tep_record *record = NULL;
> > +     enum kbuffer_long_size long_size;
> > +     enum kbuffer_endian endian;
> > +     void *ptr;
> > +
> > +     if (tep_is_file_bigendian(tep))
> > +             endian = KBUFFER_ENDIAN_BIG;
> > +     else
> > +             endian = KBUFFER_ENDIAN_LITTLE;
> > +
> > +     if (tep_get_header_page_size(tep) == 8)
> > +             long_size = KBUFFER_LSIZE_8;
> > +     else
> > +             long_size = KBUFFER_LSIZE_4;
> > +
> > +     kbuf = kbuffer_alloc(long_size, endian);
> > +     if (!kbuf)
> > +             return NULL;
> > +
> > +     kbuffer_load_subbuffer(kbuf, page);
> > +     if (kbuffer_subbuffer_size(kbuf) > size) {
> > +             warning("%s: page_size > size", __func__);
> > +             goto out_free;
> > +     }
> > +
> > +     if (last_record) {
> > +             if (last_record->data < page || last_record->data >= (page + size)) {
> > +                     warning("%s: bad last record (size=%u)",
> > +                             __func__, last_record->size);
> > +                     goto out_free;
> > +             }
> > +
> > +             ptr = kbuffer_read_event(kbuf, &ts);
> > +             while (ptr < last_record->data) {
>
> Here we are doing a linear search of last_record.
>
> I could probably add a quicker solution to this by finding the next
> record by passing in last_record->data and last_record->ts to a kbuffer
> function, at the bottom.
>
>
> > +                     ptr = kbuffer_next_event(kbuf, NULL);
> > +                     if (!ptr)
> > +                             break;
> > +                     if (ptr == last_record->data)
> > +                             break;
> > +             }
> > +             if (ptr != last_record->data) {
> > +                     warning("%s: could not find last_record", __func__);
> > +                     goto out_free;
> > +             }
> > +             ptr = kbuffer_next_event(kbuf, &ts);
> > +     } else
> > +             ptr = kbuffer_read_event(kbuf, &ts);
> > +
> > +     if (!ptr)
> > +             goto out_free;
> > +
> > +     record = malloc(sizeof(*record));
> > +     if (!record)
> > +             goto out_free;
> > +     memset(record, 0, sizeof(*record));
> > +
> > +     record->ts = ts;
> > +     record->size = kbuffer_event_size(kbuf);
> > +     record->record_size = kbuffer_curr_size(kbuf);
> > +     record->cpu = 0;
> > +     record->data = ptr;
> > +     record->ref_count = 1;
> > +
> > + out_free:
> > +     kbuffer_free(kbuf);
> > +     return record;
> > +}
> > +
> > +static void free_record(struct tep_record *record)
> > +{
> > +     if (!record)
> > +             return;
> > +
> > +     if (record->ref_count > 0)
> > +             record->ref_count--;
> > +     if (record->ref_count)
> > +             return;
> > +
> > +     free(record);
> > +}
> > +
> > +static int
> > +get_events_in_page(struct tep_handle *tep, void *page,
> > +                int size, int cpu,
> > +                int (*callback)(struct tep_event *,
> > +                                struct tep_record *,
> > +                                int, void *),
> > +                void *callback_context)
> > +{
> > +     struct tep_record *last_record = NULL;
> > +     struct tep_event *event = NULL;
> > +     struct tep_record *record;
> > +     int id, cnt = 0;
> > +
> > +     if (size <= 0)
> > +             return 0;
> > +
> > +     while (true) {
> > +             event = NULL;
> > +             record = tracefs_read_page_record(tep, page, size, last_record);
>
> This is very slow because the above function does a linear search to
> find the next record each time! It would be much better to keep a kbuf
> reference and use that.
>
>
> > +             if (!record)
> > +                     break;
> > +             free_record(last_record);
> > +             id = tep_data_type(tep, record);
> > +             event = tep_find_event(tep, id);
> > +             if (event && callback) {
> > +                     if (callback(event, record, cpu, callback_context))
> > +                             break;
> > +             }
> > +             last_record = record;
> > +     }
> > +     free_record(last_record);
> > +
> > +     return cnt;
> > +}
> > +
>
> Here's a patch (untested ;-) that should help speed up the reading by
> using the last record from before. This may even be able to help speed
> up other parts of the code.
>
> -- Steve
>
> diff --git a/lib/traceevent/kbuffer-parse.c b/lib/traceevent/kbuffer-parse.c
> index b18dedc4..ecbb64e9 100644
> --- a/lib/traceevent/kbuffer-parse.c
> +++ b/lib/traceevent/kbuffer-parse.c
> @@ -649,6 +649,40 @@ void *kbuffer_read_at_offset(struct kbuffer *kbuf, int offset,
>         return data;
>  }
>
> +/**
> > + * kbuffer_set_position - set the kbuffer position to a given offset and timestamp
> > + * @kbuf:      The kbuffer to read from
> > + * @offset:    The offset to set the current position to
> > + * @ts:                The timestamp to set the kbuffer to.
> + *
> + * This routine is used to set the current position saved in @kbuf.
> > + * This can be used in conjunction with kbuffer_curr_offset() to
> > + * get the offset of an event retrieved from the @kbuf subbuffer,
> > + * along with the time stamp retrieved from that same event.
> + * This will set the position of @kbuf back to the state it was when
> + * it returned that event.
> + *
> + * Returns zero on success, -1 otherwise.
> + */
> +int kbuffer_set_position(struct kbuffer *kbuf, int offset,
> +                          unsigned long long *ts)
> +{
> +       int ret;
> +
> +       offset -= kbuf->start;
> +
> +       if (offset < 0 || offset >= kbuf->size || !ts)
> +               return -1;      /* Bad offset */
> +
> +       kbuf->next = offset;
> +       ret = next_event(kbuf);
> +       if (ret < 0)
> +               return -1;
> +
> +       kbuf->timestamp = *ts;
> +       return 0;
> +}
> +
>  /**
>   * kbuffer_subbuffer_size - the size of the loaded subbuffer
>   * @kbuf:      The kbuffer to read from

Thanks Steven! I'll test it and add it in the next version of the patch set.


-- 
Tzvetomir (Ceco) Stoyanov
VMware Open Source Technology Center

Thread overview: 15+ messages
2020-01-06 15:40 [PATCH v2 0/5] tracefs library Tzvetomir Stoyanov (VMware)
2020-01-06 15:40 ` [PATCH v2 1/5] trace-cmd: Introduce libtracefs library Tzvetomir Stoyanov (VMware)
2020-01-06 15:40 ` [PATCH v2 2/5] kernel-shark: Use new tracefs library Tzvetomir Stoyanov (VMware)
2020-01-08 19:05   ` Steven Rostedt
2020-01-08 20:09   ` Steven Rostedt
2020-01-06 15:40 ` [PATCH v2 3/5] trace-cmd: New libtracefs APIs for ftrace instances Tzvetomir Stoyanov (VMware)
2020-01-08 19:37   ` Steven Rostedt
2020-01-09 13:29     ` Tzvetomir Stoyanov
2020-01-06 15:40 ` [PATCH v2 4/5] trace-cmd,kernel-shark: New libtracefs APIs for ftrace events and systems Tzvetomir Stoyanov (VMware)
2020-01-08 20:22   ` Steven Rostedt
2020-01-09 13:37     ` Tzvetomir Stoyanov
2020-01-08 21:09   ` Steven Rostedt
2020-01-09 14:14     ` Tzvetomir Stoyanov [this message]
2020-01-09  2:32   ` Steven Rostedt
2020-01-06 15:40 ` [PATCH v2 5/5] trace-cmd,kernel-shark: New libtracefs APIs for loading ftrace events Tzvetomir Stoyanov (VMware)
