From: Tzvetomir Stoyanov <tstoyanov@vmware.com>
To: Steven Rostedt <rostedt@goodmis.org>
Cc: Tzvetomir Stoyanov <tz.stoyanov@gmail.com>,
Linux Trace Devel <linux-trace-devel@vger.kernel.org>
Subject: Re: [PATCH v4] trace-cmd: Save the tracee memory map into the trace.dat file.
Date: Fri, 9 Aug 2019 12:16:09 +0000 [thread overview]
Message-ID: <CACqStofTwKeD_K+=s71_XHwWbVidarO7k28Zw0dhOoSKTtTXAQ@mail.gmail.com> (raw)
In-Reply-To: <20190808185622.67ad09e2@gandalf.local.home>
On Fri, Aug 9, 2019 at 1:56 AM Steven Rostedt <rostedt@goodmis.org> wrote:
>
> On Wed, 3 Jul 2019 14:51:34 +0300
> tz.stoyanov@gmail.com wrote:
>
> > From: "Tzvetomir Stoyanov (VMware)" <tz.stoyanov@gmail.com>
> >
> > A new trace-cmd record option is added: "--mmap". When it is set with
> > combination of -F or -P options, the memory map of the traced applications
> > is stored in the trace.dat file. A new API tracecmd_search_task_mmap()
> > can be used to look up into stored memory maps. The map is retrieved from
> > /proc/<pid>/maps file.
> >
>
> I also have some notes below.
>
> > diff --git a/Documentation/trace-cmd-record.1.txt b/Documentation/trace-cmd-record.1.txt
> > index 26a8299..4a59de9 100644
> > --- a/Documentation/trace-cmd-record.1.txt
> > +++ b/Documentation/trace-cmd-record.1.txt
> > @@ -119,6 +119,9 @@ OPTIONS
> > Used with either *-F* (or *-P* if kernel supports it) to trace the process'
> > children too.
> >
> > +*--mmap*::
> > + Used with either *-F* or *-P*, save the traced process memory map into
> > + the trace.dat file.
>
> Do we need to use -F? If we want to record the proc/mmap of the process
> that we execute (trace-cmd recode -e all some-program, where we save
> the process map of some-program). Do we have to only record
> some-program? Which is what -F will do.
>
The only reason for this limitation is that current ptrace logic in trace-cmd is
bound to filtered task, and ptrace is used to track when the process finishes.
It makes sense to have the proc/mmap in this case, I will change the ptrace
logic to not depend on filtered task.
>
> > *-C* 'clock'::
> > Set the trace clock to "clock".
> >
> > diff --git a/include/trace-cmd/trace-cmd.h b/include/trace-cmd/trace-cmd.h
> > index 6f62ab9..17edb9d 100644
> > --- a/include/trace-cmd/trace-cmd.h
> > +++ b/include/trace-cmd/trace-cmd.h
> > @@ -82,6 +82,7 @@ enum {
> > TRACECMD_OPTION_OFFSET,
> > TRACECMD_OPTION_CPUCOUNT,
> > TRACECMD_OPTION_VERSION,
> > + TRACECMD_OPTION_PIDMMAPS,
> > };
> >
> > enum {
> > @@ -97,6 +98,12 @@ struct tracecmd_ftrace {
> > int long_size;
> > };
> >
> > +struct lib_mem_map {
>
> If this is going to be exported to external users, it needs to use the
> tracecmd_ prefix. Probably should also add "proc" as well.
>
> struct tracecmd_proc_mem_map ?
>
> I'm also thinking of removing the "mem" as it's not really memory maps,
> it's really called address mapping. Perhaps just drop the "mem", and
> call it tracecmd_proc_map. And address may not even be pointing into
> memory (It could have a device I/O mapping).
>
>
This struct will not be exposed, I'll rename it to tracecmd_proc_map
move it to trace-cmd-local.h
>
> > + unsigned long long start;
> > + unsigned long long end;
> > + char *lib_name;
> > +};
> > +
> > typedef void (*tracecmd_show_data_func)(struct tracecmd_input *handle,
> > struct tep_record *record);
> > typedef void (*tracecmd_handle_init_func)(struct tracecmd_input *handle,
> > @@ -208,6 +215,8 @@ unsigned long long tracecmd_page_ts(struct tracecmd_input *handle,
> > unsigned int tracecmd_record_ts_delta(struct tracecmd_input *handle,
> > struct tep_record *record);
> >
> > +struct lib_mem_map *tracecmd_search_task_mmap(struct tracecmd_input *handle,
> > + int pid, unsigned long long addr);
> > #ifndef SWIG
> > /* hack for function graph work around */
> > extern __thread struct tracecmd_input *tracecmd_curr_thread_handle;
> > diff --git a/lib/trace-cmd/trace-input.c b/lib/trace-cmd/trace-input.c
> > index 61566ba..d9abbcf 100644
> > --- a/lib/trace-cmd/trace-input.c
> > +++ b/lib/trace-cmd/trace-input.c
> > @@ -101,6 +101,7 @@ struct tracecmd_input {
> > struct tracecmd_ftrace finfo;
> >
> > struct hook_list *hooks;
> > + struct pid_mem_maps *pid_mmaps;
> > /* file information */
> > size_t header_files_start;
> > size_t ftrace_files_start;
> > @@ -2134,6 +2135,166 @@ void tracecmd_set_ts2secs(struct tracecmd_input *handle,
> > handle->use_trace_clock = false;
> > }
> >
> > +static int trace_pid_mmap_cmp(const void *a, const void *b)
> > +{
> > + struct lib_mem_map *map_a = (struct lib_mem_map *)a;
> > + struct lib_mem_map *map_b = (struct lib_mem_map *)b;
> > +
> > + if (map_a->start > map_b->start)
> > + return 1;
> > + if (map_a->start < map_b->start)
> > + return -1;
> > + return 0;
> > +}
> > +
> > +static void mmap_free(struct pid_mem_maps *maps)
> > +{
> > + int i;
> > +
> > + if (!maps)
> > + return;
> > + if (maps->lib_maps) {
> > + for (i = 0; i < maps->nr_lib_maps; i++)
> > + free(maps->lib_maps[i].lib_name);
> > + free(maps->lib_maps);
> > + }
> > + free(maps->proc_name);
> > + free(maps);
> > +}
> > +
> > +#define STR_MMAP_LINE_MAX (PATH_MAX+34)
>
> What's the reason for the +34?
I have no idea, cannot remember what was in my head at that time.
It should be the maximum possible size of
sscanf(buf, "%x %x %s", &maps->pid, &maps->nr_lib_maps, mapname);
string, where:
%s is the name of the library, up to PATH_MAX
the two "%x", int, should be 20 (counting the leading 0x)
and we have two spaces.
so according to this, the define should be (PATH_MAX+22).
We need somehow to protect the local buffer char mapname[], to not
be overflowed.
I guess this string was in a different format in the first version of
the code, and the
define was not updated when the string was changed.
>
> > +static int trace_pid_mmap_load(struct tracecmd_input *handle, char *buf)
> > +{
> > + struct pid_mem_maps *maps = NULL;
> > + char mapname[STR_MMAP_LINE_MAX];
> > + char *line;
> > + int res;
> > + int ret;
> > + int i;
> > +
> > + maps = calloc(1, sizeof(*maps));
> > + if (!maps)
> > + return -ENOMEM;
> > +
> > + ret = -EINVAL;
> > + line = strchr(buf, '\n');
> > + if (!line)
> > + goto out_fail;
> > +
> > + *line = '\0';
> > + if (strlen(buf) > STR_MMAP_LINE_MAX)
> > + goto out_fail;
> > +
> > + res = sscanf(buf, "%x %x %s", &maps->pid, &maps->nr_lib_maps, mapname);
> > + if (res != 3)
> > + goto out_fail;
> > +
> > + ret = -ENOMEM;
> > + maps->proc_name = strdup(mapname);
> > + if (!maps->proc_name)
> > + goto out_fail;
> > +
> > + maps->lib_maps = calloc(maps->nr_lib_maps, sizeof(struct lib_mem_map));
> > + if (!maps->lib_maps)
> > + goto out_fail;
> > +
> > + buf = line + 1;
> > + line = strchr(buf, '\n');
> > + for (i = 0; i < maps->nr_lib_maps; i++) {
> > + if (!line)
> > + break;
> > + *line = '\0';
> > + if (strlen(buf) > STR_MMAP_LINE_MAX)
> > + break;
> > + res = sscanf(buf, "%llx %llx %s", &maps->lib_maps[i].start,
> > + &maps->lib_maps[i].end, mapname);
> > + if (res != 3)
> > + break;
> > + maps->lib_maps[i].lib_name = strdup(mapname);
> > + if (!maps->lib_maps[i].lib_name)
> > + goto out_fail;
> > + buf = line + 1;
> > + line = strchr(buf, '\n');
> > + }
> > +
> > + ret = -EINVAL;
> > + if (i != maps->nr_lib_maps)
> > + goto out_fail;
> > +
> > + qsort(maps->lib_maps, maps->nr_lib_maps,
> > + sizeof(*maps->lib_maps), trace_pid_mmap_cmp);
> > +
> > + maps->next = handle->pid_mmaps;
> > + handle->pid_mmaps = maps;
> > +
> > + return 0;
> > +
> > +out_fail:
> > + mmap_free(maps);
> > + return ret;
> > +}
> > +
> > +static void trace_pid_mmap_free(struct pid_mem_maps *mmaps)
> > +{
> > + struct pid_mem_maps *del;
> > +
> > + while (mmaps) {
> > + del = mmaps;
> > + mmaps = mmaps->next;
> > + mmap_free(del);
> > + }
> > +}
> > +
> > +static int trace_pid_mmap_search(const void *a, const void *b)
> > +{
> > + struct lib_mem_map *key = (struct lib_mem_map *)a;
> > + struct lib_mem_map *map = (struct lib_mem_map *)b;
> > +
> > + if (key->start >= map->end)
> > + return 1;
> > + if (key->start < map->start)
> > + return -1;
> > + return 0;
> > +}
> > +
> > +/**
> > + * tracecmd_search_task_mmap - Search task memory address map
> > + * @handle: input handle to the trace.dat file
> > + * @pid: pid of the task
> > + * @addr: address from the task memory space.
> > + *
> > + * Map of the task memory can be saved in the trace.dat file, using the option
> > + * "--mmap". If there is such information, this API can be used to look up into
> > + * this memory map to find what library is loaded at the given @addr.
> > + *
> > + * A pointer to struct lib_mem_map is returned, containing the name of
> > + * the library at given task @addr and the library start and end addresses.
> > + */
> > +struct lib_mem_map *tracecmd_search_task_mmap(struct tracecmd_input *handle,
> > + int pid, unsigned long long addr)
> > +{
> > + struct pid_mem_maps *maps;
> > + struct lib_mem_map *lib;
> > + struct lib_mem_map key;
> > +
> > + if (!handle || !handle->pid_mmaps)
> > + return NULL;
> > +
> > + maps = handle->pid_mmaps;
> > + while (maps) {
> > + if (maps->pid == pid)
> > + break;
> > + maps = maps->next;
> > + }
> > + if (!maps || !maps->nr_lib_maps || !maps->lib_maps)
> > + return NULL;
> > + key.start = addr;
> > + lib = bsearch(&key, maps->lib_maps, maps->nr_lib_maps,
> > + sizeof(*maps->lib_maps), trace_pid_mmap_search);
> > +
> > + return lib;
> > +}
> > +
> > static int handle_options(struct tracecmd_input *handle)
> > {
> > unsigned long long offset;
> > @@ -2221,9 +2382,6 @@ static int handle_options(struct tracecmd_input *handle)
> > case TRACECMD_OPTION_UNAME:
> > handle->uname = strdup(buf);
> > break;
> > - case TRACECMD_OPTION_VERSION:
> > - handle->version = strdup(buf);
> > - break;
> > case TRACECMD_OPTION_HOOK:
> > hook = tracecmd_create_event_hook(buf);
> > hook->next = handle->hooks;
> > @@ -2233,6 +2391,10 @@ static int handle_options(struct tracecmd_input *handle)
> > cpus = *(int *)buf;
> > handle->cpus = tep_read_number(handle->pevent, &cpus, 4);
> > break;
> > + case TRACECMD_OPTION_PIDMMAPS:
>
> OPTION_PROCMAPS ?
>
> > + if (buf[size] == '\0')
> > + trace_pid_mmap_load(handle, buf);
> > + break;
> > default:
> > warning("unknown option %d", option);
> > break;
> > @@ -2842,10 +3004,12 @@ void tracecmd_close(struct tracecmd_input *handle)
> > free(handle->cpu_data);
> > free(handle->uname);
> > close(handle->fd);
> > -
>
> I wouldn't remove this blank line.
>
>
> > tracecmd_free_hooks(handle->hooks);
> > handle->hooks = NULL;
> >
> > + trace_pid_mmap_free(handle->pid_mmaps);
> > + handle->pid_mmaps = NULL;
> > +
> > if (handle->flags & TRACECMD_FL_BUFFER_INSTANCE)
> > tracecmd_close(handle->parent);
> > else {
> > diff --git a/tracecmd/include/trace-local.h b/tracecmd/include/trace-local.h
> > index 1cad3cc..ae1632c 100644
> > --- a/tracecmd/include/trace-local.h
> > +++ b/tracecmd/include/trace-local.h
> > @@ -157,6 +157,14 @@ struct func_list {
> > const char *mod;
> > };
> >
> > +struct pid_mem_maps {
> > + struct pid_mem_maps *next;
> > + struct lib_mem_map *lib_maps;
> > + unsigned int nr_lib_maps;
> > + char *proc_name;
> > + int pid;
> > +};
> > +
> > struct buffer_instance {
> > struct buffer_instance *next;
> > const char *name;
> > @@ -183,6 +191,8 @@ struct buffer_instance {
> > struct tracecmd_msg_handle *msg_handle;
> > struct tracecmd_output *network_handle;
> >
> > + struct pid_mem_maps *mem_maps;
> > +
> > char *max_graph_depth;
> >
> > int flags;
> > diff --git a/tracecmd/trace-record.c b/tracecmd/trace-record.c
> > index 5dc6f17..48081d4 100644
> > --- a/tracecmd/trace-record.c
> > +++ b/tracecmd/trace-record.c
> > @@ -84,6 +84,7 @@ static int max_kb;
> > static bool use_tcp;
> >
> > static int do_ptrace;
> > +static int get_mmap;
> >
> > static int filter_task;
> > static int filter_pid = -1;
> > @@ -1062,6 +1063,120 @@ static char *make_pid_filter(char *curr_filter, const char *field)
> > return filter;
> > }
> >
> > +static int get_pid_mmaps(int pid)
>
> Should probably remove all references to anything with 'mmap' here and
> even remove "mem" as well. Again, these are technically not memory
> maps but address mappings. Subtle difference, but a solid difference
> none the less.
OK
I'll rename all 'mmaps', related to this feature, to 'procmaps', for
consistency.
>
> > +{
> > + struct buffer_instance *instance = &top_instance;
> > + struct pid_mem_maps *maps = instance->mem_maps;
> > + struct pid_mem_maps *m;
> > + unsigned long long begin, end, inode, tmp;
> > + struct lib_mem_map *map;
> > + char mapname[PATH_MAX+1];
> > + char fname[PATH_MAX+1];
> > + char buf[PATH_MAX+100];
>
> What's the reason for the +100?
This should count the maximum size of
"%llx-%llx %s %llx %s %lld "
Where we have:
3 * %llx -> 48
%lld -> 21
%s, 4 bytes permisions
%s, 5 bytes device
spaces - at least 6 bytes, can vary.
>
> > + char perm[5];
> > + char dev[6];
> > + FILE *f;
> > + int ret;
> > + int res;
> > + int i;
> > +
> > + sprintf(fname, "/proc/%d/exe", pid);
> > + ret = readlink(fname, mapname, PATH_MAX);
> > + if (ret >= PATH_MAX || ret < 0)
> > + return -ENOENT;
> > + mapname[ret] = 0;
> > +
> > + sprintf(fname, "/proc/%d/maps", pid);
> > + f = fopen(fname, "r");
> > + if (!f)
> > + return -ENOENT;
> > +
> > + while (maps) {
> > + if (pid == maps->pid)
> > + break;
> > + maps = maps->next;
> > + }
> > +
> > + ret = -ENOMEM;
> > + if (!maps) {
> > + maps = calloc(1, sizeof(*maps));
> > + if (!maps)
> > + goto out_fail;
> > + maps->pid = pid;
> > + maps->next = instance->mem_maps;
> > + instance->mem_maps = maps;
> > + } else {
> > + for (i = 0; i < maps->nr_lib_maps; i++)
> > + free(maps->lib_maps[i].lib_name);
> > + free(maps->lib_maps);
> > + maps->lib_maps = NULL;
> > + maps->nr_lib_maps = 0;
> > + free(maps->proc_name);
> > + }
> > +
> > + maps->proc_name = strdup(mapname);
> > + if (!maps->proc_name)
> > + goto out;
> > +
> > + while (fgets(buf, sizeof(buf), f)) {
> > + mapname[0] = '\0';
> > + res = sscanf(buf, "%llx-%llx %s %llx %s %lld %s",
> > + &begin, &end, perm, &tmp, dev, &inode, mapname);
>
> This is dangerous with perm and dev, as if either string is bigger than
> their size, you just created an overflow bug.
>
> buf, "%llx-%llx %4s %llx %5s %lld %" STRINGIFY(PATH_MAX) "s"
>
> Which will keep sscanf from overwriting the size of the variables.
>
> Note, the STRINGIFY() is defined as:
>
> #define _STRINGIFY(x) #x
> #define STRINGIFY(x) _STRINGIFY(x)
>
Thanks, I'll change it. I will remove perm and dev, as these
are not used in our code and their length can vary. Going to read them
with "%*s".
> -- Steve
>
>
--
Tzvetomir (Ceco) Stoyanov
VMware Open Source Technology Center
prev parent reply other threads:[~2019-08-09 12:16 UTC|newest]
Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-07-03 11:51 [PATCH v4] trace-cmd: Save the tracee memory map into the trace.dat file tz.stoyanov
2019-07-04 12:36 ` Yordan Karadzhov (VMware)
2019-08-08 22:03 ` Steven Rostedt
2019-08-08 22:56 ` Steven Rostedt
2019-08-09 12:16 ` Tzvetomir Stoyanov [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to='CACqStofTwKeD_K+=s71_XHwWbVidarO7k28Zw0dhOoSKTtTXAQ@mail.gmail.com' \
--to=tstoyanov@vmware.com \
--cc=linux-trace-devel@vger.kernel.org \
--cc=rostedt@goodmis.org \
--cc=tz.stoyanov@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).