linux-trace-users.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Format of trace.dat flyrecord section
@ 2022-06-15 15:18 Matteo Bertolino
  2022-06-16  4:25 ` Tzvetomir Stoyanov
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Matteo Bertolino @ 2022-06-15 15:18 UTC (permalink / raw)
  To: linux-trace-users

Dear community,

[Goal] I am trying to understand more about tracing. In one of my experiments, I decided to try to write my own trace.dat file (with some dummy entries).
To do that, I am following the documentation for a "version-7 .dat" provided in https://github.com/rostedt/trace-cmd/blob/master/Documentation/trace-cmd/trace-cmd.dat.v7.5.txt , 
and trying to looking at the code of libtraceevent, trace-cmd and kernelshark. I use kernelshark to be sure I am writing a good trace.

[Context] I managed to write a complete header (I double-checked checking in strategical points in trace-cmd, kernelshark and libtraceevent code). In this header, I have three sections:
- Section 0, with options 16 (header_infos), option 8 (in which I stated to have a single CPU), and option 3. The latter states that I have a cpu whose data starts at offset 4096 of the trace.
- Section 16: in which I transfers the information of `header_page` and `header_event` files.
- Section 3: the flyrecord section, whose header is followed by a padding to be one-page aligned. After the padding, there should be the CPU datas.
Yet another check of the header's correctness are outputs of commands `trace-cmd dump --flyrecord -i mytrace.dat` and `trace-cmd dump --summary -i ../mytrace_v7.dat`.

[Problem] The problems occur there. I don't manage to get the format of flyrecords. 
I understood that timestamps need to follow the structure of ringbuffers (those expressed in `header_event` file), so 5 bits for the type_len, 27 for the time_delta, and an u32 array[].
But, kernelshark entries contains also: timestamp, CPUs, PID, EVENT, TASK, LATENCY and "INFO". 

Through flybuffer schema, I can only provide TIMESTAMP divided by CPUs, but where do I take other fields?
In addition to this, I didn't understand how I can provide multiple entries.

Thanks for you possible help,
Matteo


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Format of trace.dat flyrecord section
  2022-06-15 15:18 Format of trace.dat flyrecord section Matteo Bertolino
@ 2022-06-16  4:25 ` Tzvetomir Stoyanov
  2022-06-16 13:00 ` Steven Rostedt
       [not found] ` <20220616090237.2c9dda8b@gandalf.local.home>
  2 siblings, 0 replies; 8+ messages in thread
From: Tzvetomir Stoyanov @ 2022-06-16  4:25 UTC (permalink / raw)
  To: Matteo Bertolino; +Cc: linux-trace-users, Steven Rostedt

On Wed, Jun 15, 2022 at 6:19 PM Matteo Bertolino
<matteo.bertolino@huawei.com> wrote:
>
> Dear community,
>
> [Goal] I am trying to understand more about tracing. In one of my experiments, I decided to try to write my own trace.dat file (with some dummy entries).
> To do that, I am following the documentation for a "version-7 .dat" provided in https://github.com/rostedt/trace-cmd/blob/master/Documentation/trace-cmd/trace-cmd.dat.v7.5.txt ,
> and trying to looking at the code of libtraceevent, trace-cmd and kernelshark. I use kernelshark to be sure I am writing a good trace.
>
> [Context] I managed to write a complete header (I double-checked checking in strategical points in trace-cmd, kernelshark and libtraceevent code). In this header, I have three sections:
> - Section 0, with options 16 (header_infos), option 8 (in which I stated to have a single CPU), and option 3. The latter states that I have a cpu whose data starts at offset 4096 of the trace.
> - Section 16: in which I transfers the information of `header_page` and `header_event` files.
> - Section 3: the flyrecord section, whose header is followed by a padding to be one-page aligned. After the padding, there should be the CPU datas.
> Yet another check of the header's correctness are outputs of commands `trace-cmd dump --flyrecord -i mytrace.dat` and `trace-cmd dump --summary -i ../mytrace_v7.dat`.
>
> [Problem] The problems occur there. I don't manage to get the format of flyrecords.
> I understood that timestamps need to follow the structure of ringbuffers (those expressed in `header_event` file), so 5 bits for the type_len, 27 for the time_delta, and an u32 array[].
> But, kernelshark entries contains also: timestamp, CPUs, PID, EVENT, TASK, LATENCY and "INFO".
>
> Through flybuffer schema, I can only provide TIMESTAMP divided by CPUs, but where do I take other fields?

Hi Matteo,
The best source to look for the format of the events in the ring
buffer and how to parse them is
https://github.com/rostedt/libtraceevent/ , and especially the file
https://github.com/rostedt/libtraceevent/blob/libtraceevent/src/kbuffer-parse.c
. The format of the data in the u32 array[] that you mentioned is
event specific and is described in its format file, for example look
at the events/sched/sched_wakeup/format for sched_wakeup event data,
in the tracefs directory. Notice that the first 4 fields of all
events are the same - those with "common_" prefix.

> In addition to this, I didn't understand how I can provide multiple entries.

The events in the ring buffer are recorded just one after another. As
you can find the size of each recorded event when reading and parsing
the data, it is easy to jump to the next one. Be aware that pages of
the ring buffer are always aligned to the size of the system page.
Note that there are also some special events, with types described in
the `header_event` file.

>
> Thanks for you possible help,
> Matteo
>

-- 
Tzvetomir (Ceco) Stoyanov
VMware Open Source Technology Center

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Format of trace.dat flyrecord section
  2022-06-15 15:18 Format of trace.dat flyrecord section Matteo Bertolino
  2022-06-16  4:25 ` Tzvetomir Stoyanov
@ 2022-06-16 13:00 ` Steven Rostedt
       [not found] ` <20220616090237.2c9dda8b@gandalf.local.home>
  2 siblings, 0 replies; 8+ messages in thread
From: Steven Rostedt @ 2022-06-16 13:00 UTC (permalink / raw)
  To: Matteo Bertolino; +Cc: linux-trace-users


[ Resending due to typo in mailing list address ]

On Wed, 15 Jun 2022 15:18:41 +0000
Matteo Bertolino <matteo.bertolino@huawei.com> wrote:

> Dear community,

Hi Matteo,

And welcome to our community!

> 
> [Goal] I am trying to understand more about tracing. In one of my experiments, I decided to try to write my own trace.dat file (with some dummy entries).
> To do that, I am following the documentation for a "version-7 .dat" provided in https://github.com/rostedt/trace-cmd/blob/master/Documentation/trace-cmd/trace-cmd.dat.v7.5.txt , 
> and trying to looking at the code of libtraceevent, trace-cmd and kernelshark. I use kernelshark to be sure I am writing a good trace.
> 
> [Context] I managed to write a complete header (I double-checked checking in strategical points in trace-cmd, kernelshark and libtraceevent code). In this header, I have three sections:
> - Section 0, with options 16 (header_infos), option 8 (in which I stated to have a single CPU), and option 3. The latter states that I have a cpu whose data starts at offset 4096 of the trace.
> - Section 16: in which I transfers the information of `header_page` and `header_event` files.
> - Section 3: the flyrecord section, whose header is followed by a padding to be one-page aligned. After the padding, there should be the CPU datas.
> Yet another check of the header's correctness are outputs of commands `trace-cmd dump --flyrecord -i mytrace.dat` and `trace-cmd dump --summary -i ../mytrace_v7.dat`.
> 
> [Problem] The problems occur there. I don't manage to get the format of flyrecords. 
> I understood that timestamps need to follow the structure of ringbuffers (those expressed in `header_event` file), so 5 bits for the type_len, 27 for the time_delta, and an u32 array[].

I guess you have most the meta data set up. Now the actual data is defined
by the Linux kernel, which you appear to have some idea about. The document
of the format is in the kernel here:

  https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/include/linux/ring_buffer.h#n22

And is broken up by the defined "page size" (which is really a misnomer,
because it should be the "buffer" size) in the BUFFER option.

  trace-cmd dump --options
[..]
                [Option BUFFER, 183 bytes]
1540361 [offset]
"" [name]
"local" [clock]
4096 [Page size, bytes]   <---
8 [CPUs]:


Thus the buffer is broken up by 4096 bytes, where the first page has the
timestamp information (defined in the kernel)

 # cat /sys/kernel/tracing/events/header_page
	field: u64 timestamp;	offset:0;	size:8;
 signed:0; field: local_t commit;	offset:8;	size:8;
 signed:1; field: int overwrite;	offset:8;	size:1;
 signed:1; field: char data;	offset:16;	size:4080;
 signed:1;

The timestamp is what the rest of the timestamps are based on.

The commit is the size of the data on the page (should be less than 4096 -
16)

"overwrite" means that the commit can hold flags that show that events were
missed.

"data" is where the payload of events exist.

Also stored in the trace.dat file:

  trace-cmd dump --head-page
	[Section 16 @ 37: "headers", flags 0x1, 263 bytes]
	[Header page, 205 bytes]
	field: u64 timestamp;	offset:0;	size:8;
  signed:0; field: local_t commit;	offset:8;	size:8;
  signed:1; field: int overwrite;	offset:8;	size:1;
  signed:1; field: char data;	offset:16;	size:4080;
  signed:1;


As described in ring_buffer.h, the events then have a header:

The type_len is 5 bits that are as follows:

0 - means that the length of the entire event is in the next 4 bytes,
   and the payload starts in the 4 bytes after that.
1-28 - means that the payload of the event starts in the next 4 bytes
  and is a multiple of 4 of this field.
  1 - the payload is 4 bytes
  28 - the payload is 112 bytes.
 For events bigger than 112 bytes, the size is defined by the 4 bytes,
  and this field is zero.

29 - padding
30 - time extend
31 - absolute timestamp.

The above is also stored in the trace.dat file:

  trace-cmd dump --head-event
        [Section 16 @ 37: "headers", flags 0x1, 263 bytes]
        [Header event, 205 bytes]
# compressed entry header
        type_len    :    5 bits
        time_delta  :   27 bits
        array       :   32 bits

        padding     : type == 29
        time_extend : type == 30
        time_stamp : type == 31
        data max type_len  == 28



Then the rest of the payload is defined by the events themselves, that is
stored in the trace.dat file:

 trace-cmd dump --events
[..]
        [Section 18 @ 1734: "events format", flags 0x1, 116488 bytes]
        [Events format, 110 systems]
name: xhci_urb_giveback
ID: 1425
format:
        field:unsigned short common_type;       offset:0;       size:2;
 signed:0; field:unsigned char common_flags;       offset:2;       size:1;
 signed:0; field:unsigned char common_preempt_count;       offset:3;
 size:1; signed:0; field:int common_pid;   offset:4;       size:4; signed:1;

        field:void * urb;       offset:8;       size:8; signed:0;
        field:unsigned int pipe;        offset:16;      size:4; signed:0;
        field:unsigned int stream;      offset:20;      size:4; signed:0;
        field:int status;       offset:24;      size:4; signed:1;
        field:unsigned int flags;       offset:28;      size:4; signed:0;
        field:int num_mapped_sgs;       offset:32;      size:4; signed:1;
        field:int num_sgs;      offset:36;      size:4; signed:1;
        field:int length;       offset:40;      size:4; signed:1;
        field:int actual;       offset:44;      size:4; signed:1;
        field:int epnum;        offset:48;      size:4; signed:1;
        field:int dir_in;       offset:52;      size:4; signed:1;
        field:int type; offset:56;      size:4; signed:1;
        field:int slot_id;      offset:60;      size:4; signed:1;

print fmt: "ep%d%s-%s: urb %p pipe %u slot %d length %d/%d sgs %d/%d stream
%d flags %08x", REC->epnum, REC->dir_in ? "in" : "out",
__print_symbolic(REC->type, { 3, "intr" }, { 0, "control" }, { 2, "bulk" },
{ 1, "isoc" }), REC->urb, REC->pipe, REC->slot_id, REC->actual,
REC->length, REC->num_mapped_sgs, REC->num_sgs, REC->stream, REC->flags

[..]

This defines the payload that sits in the event.


> But, kernelshark entries contains also: timestamp, CPUs, PID, EVENT, TASK, LATENCY and "INFO". 
> 
> Through flybuffer schema, I can only provide TIMESTAMP divided by CPUs, but where do I take other fields?
> In addition to this, I didn't understand how I can provide multiple entries.
> 


Hope the above helps.

-- Steve

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Format of trace.dat flyrecord section
       [not found] ` <20220616090237.2c9dda8b@gandalf.local.home>
@ 2022-06-16 13:03   ` Steven Rostedt
  2022-06-17 13:08     ` Matteo Bertolino
  0 siblings, 1 reply; 8+ messages in thread
From: Steven Rostedt @ 2022-06-16 13:03 UTC (permalink / raw)
  To: Matteo Bertolino; +Cc: linux-trace-users


OK, this is a claws-mail bug. It added the '\' in the email address when
doing a reply all.

-- Steve


On Thu, 16 Jun 2022 09:02:37 -0400
Steven Rostedt <rostedt@goodmis.org> wrote:

> This is just a test to see if the added '\' was due to a claws-mail bug or
> somehow I did it.
> 
> -- Steve


^ permalink raw reply	[flat|nested] 8+ messages in thread

* RE: Format of trace.dat flyrecord section
  2022-06-16 13:03   ` Steven Rostedt
@ 2022-06-17 13:08     ` Matteo Bertolino
  2022-06-17 13:10       ` Matteo Bertolino
  2022-06-17 15:25       ` Tzvetomir Stoyanov
  0 siblings, 2 replies; 8+ messages in thread
From: Matteo Bertolino @ 2022-06-17 13:08 UTC (permalink / raw)
  To: linux-trace-users; +Cc: Steven Rostedt, Tzvetomir Stoyanov

Dear Mr Rostedt and mr Stoyanov,
thanks for your warming welcoming and for your answers.

I had clearer ideas now. If possible, I would like to ask some clarifications.
Once everything is clear in my mind, I will write a resume with an example for the mailing list, so it can be usefulf for future experiments.

Let's suppose that I want to write some events like the "function tracer":
Thanks to your explanation, I successfully communicated in the header this format. This is confirmed by:

```
dreamer@dreamer-VB:~/Desktop/mydotdat/input$ trace-cmd dump --events -i ../mytrace_v7.dat 
	[Section 18 @ 617: "Unknown", flags 0x0, 485 bytes]
	[Events format, 1 systems]
name: function
ID: 1
format:
	field:unsigned short common_type;	offset:0;	size:2;	signed:0;
	field:unsigned char common_flags;	offset:2;	size:1;	signed:0;
	field:unsigned char common_preempt_count;	offset:3;	size:1;	signed:0;
	field:int common_pid;	offset:4;	size:4;	signed:1;

	field:unsigned long ip;	offset:8;	size:8;	signed:0;
	field:unsigned long parent_ip;	offset:16;	size:8;	signed:0;

print fmt: " %ps <-- %ps", (void *)REC->ip, (void *)REC->parent_ip
```

Now, in my trace.dat, I want to write the real information.
If you permit me, I'd like to continue this discussion with a concrete example.

Let's suppose that I want to represent the following information:

Event: function trace event
Timestamp 9 seconds
PID: 7777
Function address: 0x2266D
Parent: 0x2266E

I start by filling the header_event information:
- Writing 900000000000 on 8 bytes (timestamp).
- Writing `24`in 8 bytes (commit) --> event data for event ID are on 24 bytes, indeed: common_* are on 8 bytes, ip and parent ip on 8.
- Ignoring overwrite: indeed, I already used commit.

Then, I have the 4080 bytes of payload event, but I use only 24 of them, as expressed in commit.
- common_type is 1, (2 bytes), because I want a function event
- common_flags is 0 (1 byte)
- common_preempt_count is 0 (1 byte)
- common_pid is 7777 (4 bytes)
- ip is 0x2266D (8 bytes)
- parent_ip is 0x2266E (8 bytes)
- the remaining 4096 - 24 bytes are of padding, so I can complete the 4080 bytes of data.

Unfortunately, in this way, only CPU number and timestamp are correctly displayed on the frontend (kernelshark).
Event is marked as "unknown" and PID is wrong.
Timestamp astonishes me too: to correctly display it, I just need to write 900000000000 on 8 bytes.
Trying to put "1" on time_type (5bits), 0 on delta (27 bits) and 900000000000 on array (1*4 bytes), the timestamp is not correct.

Thus, I think I am doing something wrong. May you help me in correcting this example, please?

Another thing that did let me with a doubt is the following one: do I need a page (4096 bytes) for an event?
In other words: the 4096-24 bytes of padding to complete the 4080 bytes of "data" are mandatory to express another event?

Thanks for your help,
Matteo

-----Original Message-----
From: Steven Rostedt [mailto:rostedt@goodmis.org] 
Sent: Thursday, June 16, 2022 3:04 PM
To: Matteo Bertolino <matteo.bertolino@huawei.com>
Cc: linux-trace-users@vger.kernel.org
Subject: Re: Format of trace.dat flyrecord section


OK, this is a claws-mail bug. It added the '\' in the email address when doing a reply all.

-- Steve


On Thu, 16 Jun 2022 09:02:37 -0400
Steven Rostedt <rostedt@goodmis.org> wrote:

> This is just a test to see if the added '\' was due to a claws-mail 
> bug or somehow I did it.
> 
> -- Steve


^ permalink raw reply	[flat|nested] 8+ messages in thread

* RE: Format of trace.dat flyrecord section
  2022-06-17 13:08     ` Matteo Bertolino
@ 2022-06-17 13:10       ` Matteo Bertolino
  2022-06-17 15:25       ` Tzvetomir Stoyanov
  1 sibling, 0 replies; 8+ messages in thread
From: Matteo Bertolino @ 2022-06-17 13:10 UTC (permalink / raw)
  To: linux-trace-users; +Cc: Steven Rostedt, Tzvetomir Stoyanov

Typo in the previous mail (towards the end): " Trying to put "1" on time_type (5bits), 1 on delta (27 bits) and ....."
I also corrected the cited text, below.

Bests,
Matteo

-----Original Message-----
From: Matteo Bertolino [mailto:matteo.bertolino@huawei.com] 
Sent: Friday, June 17, 2022 3:09 PM
To: linux-trace-users@vger.kernel.org
Cc: Steven Rostedt <rostedt@goodmis.org>; Tzvetomir Stoyanov <tz.stoyanov@gmail.com>
Subject: RE: Format of trace.dat flyrecord section

Dear Mr Rostedt and mr Stoyanov,
thanks for your warming welcoming and for your answers.

I had clearer ideas now. If possible, I would like to ask some clarifications.
Once everything is clear in my mind, I will write a resume with an example for the mailing list, so it can be usefulf for future experiments.

Let's suppose that I want to write some events like the "function tracer":
Thanks to your explanation, I successfully communicated in the header this format. This is confirmed by:

```
dreamer@dreamer-VB:~/Desktop/mydotdat/input$ trace-cmd dump --events -i ../mytrace_v7.dat 
	[Section 18 @ 617: "Unknown", flags 0x0, 485 bytes]
	[Events format, 1 systems]
name: function
ID: 1
format:
	field:unsigned short common_type;	offset:0;	size:2;	signed:0;
	field:unsigned char common_flags;	offset:2;	size:1;	signed:0;
	field:unsigned char common_preempt_count;	offset:3;	size:1;	signed:0;
	field:int common_pid;	offset:4;	size:4;	signed:1;

	field:unsigned long ip;	offset:8;	size:8;	signed:0;
	field:unsigned long parent_ip;	offset:16;	size:8;	signed:0;

print fmt: " %ps <-- %ps", (void *)REC->ip, (void *)REC->parent_ip ```

Now, in my trace.dat, I want to write the real information.
If you permit me, I'd like to continue this discussion with a concrete example.

Let's suppose that I want to represent the following information:

Event: function trace event
Timestamp 9 seconds
PID: 7777
Function address: 0x2266D
Parent: 0x2266E

I start by filling the header_event information:
- Writing 900000000000 on 8 bytes (timestamp).
- Writing `24`in 8 bytes (commit) --> event data for event ID are on 24 bytes, indeed: common_* are on 8 bytes, ip and parent ip on 8.
- Ignoring overwrite: indeed, I already used commit.

Then, I have the 4080 bytes of payload event, but I use only 24 of them, as expressed in commit.
- common_type is 1, (2 bytes), because I want a function event
- common_flags is 0 (1 byte)
- common_preempt_count is 0 (1 byte)
- common_pid is 7777 (4 bytes)
- ip is 0x2266D (8 bytes)
- parent_ip is 0x2266E (8 bytes)
- the remaining 4096 - 24 bytes are of padding, so I can complete the 4080 bytes of data.

Unfortunately, in this way, only CPU number and timestamp are correctly displayed on the frontend (kernelshark).
Event is marked as "unknown" and PID is wrong.
Timestamp astonishes me too: to correctly display it, I just need to write 900000000000 on 8 bytes.
Trying to put "1" on time_type (5bits), 1 on delta (27 bits) and 900000000000 on array (1*4 bytes), the timestamp is not correct.

Thus, I think I am doing something wrong. May you help me in correcting this example, please?

Another thing that did let me with a doubt is the following one: do I need a page (4096 bytes) for an event?
In other words: the 4096-24 bytes of padding to complete the 4080 bytes of "data" are mandatory to express another event?

Thanks for your help,
Matteo

-----Original Message-----
From: Steven Rostedt [mailto:rostedt@goodmis.org]
Sent: Thursday, June 16, 2022 3:04 PM
To: Matteo Bertolino <matteo.bertolino@huawei.com>
Cc: linux-trace-users@vger.kernel.org
Subject: Re: Format of trace.dat flyrecord section


OK, this is a claws-mail bug. It added the '\' in the email address when doing a reply all.

-- Steve


On Thu, 16 Jun 2022 09:02:37 -0400
Steven Rostedt <rostedt@goodmis.org> wrote:

> This is just a test to see if the added '\' was due to a claws-mail 
> bug or somehow I did it.
> 
> -- Steve


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Format of trace.dat flyrecord section
  2022-06-17 13:08     ` Matteo Bertolino
  2022-06-17 13:10       ` Matteo Bertolino
@ 2022-06-17 15:25       ` Tzvetomir Stoyanov
  2022-06-20 15:59         ` Matteo Bertolino
  1 sibling, 1 reply; 8+ messages in thread
From: Tzvetomir Stoyanov @ 2022-06-17 15:25 UTC (permalink / raw)
  To: Matteo Bertolino; +Cc: linux-trace-users, Steven Rostedt

On Fri, Jun 17, 2022 at 4:08 PM Matteo Bertolino
<matteo.bertolino@huawei.com> wrote:
>
> Dear Mr Rostedt and mr Stoyanov,
> thanks for your warming welcoming and for your answers.
>
> I had clearer ideas now. If possible, I would like to ask some clarifications.
> Once everything is clear in my mind, I will write a resume with an example for the mailing list, so it can be usefulf for future experiments.
>
> Let's suppose that I want to write some events like the "function tracer":
> Thanks to your explanation, I successfully communicated in the header this format. This is confirmed by:
>
> ```
> dreamer@dreamer-VB:~/Desktop/mydotdat/input$ trace-cmd dump --events -i ../mytrace_v7.dat
>         [Section 18 @ 617: "Unknown", flags 0x0, 485 bytes]
>         [Events format, 1 systems]
> name: function
> ID: 1
> format:
>         field:unsigned short common_type;       offset:0;       size:2; signed:0;
>         field:unsigned char common_flags;       offset:2;       size:1; signed:0;
>         field:unsigned char common_preempt_count;       offset:3;       size:1; signed:0;
>         field:int common_pid;   offset:4;       size:4; signed:1;
>
>         field:unsigned long ip; offset:8;       size:8; signed:0;
>         field:unsigned long parent_ip;  offset:16;      size:8; signed:0;
>
> print fmt: " %ps <-- %ps", (void *)REC->ip, (void *)REC->parent_ip
> ```
>
> Now, in my trace.dat, I want to write the real information.
> If you permit me, I'd like to continue this discussion with a concrete example.
>
> Let's suppose that I want to represent the following information:
>
> Event: function trace event
> Timestamp 9 seconds
> PID: 7777
> Function address: 0x2266D
> Parent: 0x2266E
>
> I start by filling the header_event information:
> - Writing 900000000000 on 8 bytes (timestamp).
> - Writing `24`in 8 bytes (commit) --> event data for event ID are on 24 bytes, indeed: common_* are on 8 bytes, ip and parent ip on 8.
> - Ignoring overwrite: indeed, I already used commit.
>
> Then, I have the 4080 bytes of payload event, but I use only 24 of them, as expressed in commit.
> - common_type is 1, (2 bytes), because I want a function event
> - common_flags is 0 (1 byte)
> - common_preempt_count is 0 (1 byte)
> - common_pid is 7777 (4 bytes)
> - ip is 0x2266D (8 bytes)
> - parent_ip is 0x2266E (8 bytes)
> - the remaining 4096 - 24 bytes are of padding, so I can complete the 4080 bytes of data.
>
> Unfortunately, in this way, only CPU number and timestamp are correctly displayed on the frontend (kernelshark).
> Event is marked as "unknown" and PID is wrong.
> Timestamp astonishes me too: to correctly display it, I just need to write 900000000000 on 8 bytes.
> Trying to put "1" on time_type (5bits), 0 on delta (27 bits) and 900000000000 on array (1*4 bytes), the timestamp is not correct.

Hi Matteo,
The 'type_len' should be 6 for your event, as the payload is 24 bytes
(6*4). The 'time_delta' is the delta from the timestamp written in the
page header, i.e. if you put 199 in 'time_delta' the time stamp of the
event is 900000000199. One very important note - take care of the
endian, used for all numbers in the file. The endian of the file can
be different from the endian of the host, where the file is displayed.
That's why that information is written in the trace.dat - in the very
beginning of the file.

>
> Thus, I think I am doing something wrong. May you help me in correcting this example, please?
>
> Another thing that did let me with a doubt is the following one: do I need a page (4096 bytes) for an event?
> In other words: the 4096-24 bytes of padding to complete the 4080 bytes of "data" are mandatory to express another event?
>
> Thanks for your help,
> Matteo
>
> -----Original Message-----
> From: Steven Rostedt [mailto:rostedt@goodmis.org]
> Sent: Thursday, June 16, 2022 3:04 PM
> To: Matteo Bertolino <matteo.bertolino@huawei.com>
> Cc: linux-trace-users@vger.kernel.org
> Subject: Re: Format of trace.dat flyrecord section
>
>
> OK, this is a claws-mail bug. It added the '\' in the email address when doing a reply all.
>
> -- Steve
>
>
> On Thu, 16 Jun 2022 09:02:37 -0400
> Steven Rostedt <rostedt@goodmis.org> wrote:
>
> > This is just a test to see if the added '\' was due to a claws-mail
> > bug or somehow I did it.
> >
> > -- Steve
>


-- 
Tzvetomir (Ceco) Stoyanov
VMware Open Source Technology Center

^ permalink raw reply	[flat|nested] 8+ messages in thread

* RE: Format of trace.dat flyrecord section
  2022-06-17 15:25       ` Tzvetomir Stoyanov
@ 2022-06-20 15:59         ` Matteo Bertolino
  0 siblings, 0 replies; 8+ messages in thread
From: Matteo Bertolino @ 2022-06-20 15:59 UTC (permalink / raw)
  To: Tzvetomir Stoyanov; +Cc: linux-trace-users, Steven Rostedt

Dear Tzvetomir,
Thanks for your answer. Everything is clear to me now.

In the rest of this mail, I'd like to share to the community some considerations about how to structure the fly record section in a trace.dat file.
This resume is not complete at all: it only includes points that were not clear to me before exchanging with Mr. Stoyanov and Mr. Rostedt,. Maybe some people may also find it useful.

- In a Flyrecord section (ID: 3),  data are divided per CPU. After the section's header, a padding must be present to ensure the real data to start at a offset that is page-aligned (e.g., 4096 is a common situation).
- Let's imagine that, for CPU 0, data starts at offset 4096 of the trace.dat
- Real tracing data are divided in "pages"/buffers of 4096 bytes.  Each page starts with a header, whose format is explicated in "header_page" file. 

	field: u64 timestamp;	offset:0;	size:8;	signed:1;
	field: local_t commit;	offset:8;	size:8;	signed:1;
	field: int overwrite;	offset:8;	size:1;	signed:1;
	field: char data;	offset:16;	size:4080;	signed:1;

- In header_page, pay attention that fields `commit` and `overwrite` are mutually exclusive (same offset!).
- In header_page, timestamp is a 64-bit number that represent the base timestamp for all the entries of that page. Let's suppose that we write 9000000000 (9 seconds) on it: this `9000000000` will not eventually displayed on kernelshark. It's just a starting point. "data" is the buffer in which the real entries (several) will be written.
- Then, it's time to write the real entries (also named "events". Entries are composed of 2 parts: 1) a header  and 2) rest of the data.
- Events' header is explained in the file "header_event". This is the same as Linux's ringbuffer format: https://elixir.bootlin.com/linux/latest/source/include/linux/ring_buffer.h#15 ->

# compressed entry header
	type_len    :    5 bits
	time_delta  :   27 bits
	array       :   32 bits

	padding     : type == 29
	time_extend : type == 30
	time_stamp : type == 31
	data max type_len  == 28

- You pass both headers files within the section 16.
- Events'/Entries' header only explicit the timestamp of the entry. I will not explain it since it's very clear by previous exchanges with Mr. Stoyanov and Mr. Rostedt and from Linux source code. However, let me clarify some magic points.
- Let's take the base of  9 seconds. We want to add 20 ms to it. In this case, the reasoning shall be: "9s + 20 ms is represented on 24 bytes (6*4). So, I need to write 6 on the first 5 bits (type_len), 20000000 on the remaining 27 bits (time delta)" (for a total of 4 bytes) AND NOT WRITING the 32 bits of array: I don't use it, so I don't write it. Note: 6 bytes are required to contain the value of the BASE (of 9s) + the delta (of 20 ms). In this way, kernelshark will display 9.02 s.
- If we need to add another entry, keep in mind that the time delta is respect to the previous entry, and not the base anymore.
- At this point we understood how to write the timestamps, but a trace is composed of many other fields: PID, function address, caller, etc.
- First of all, we need to pick a format. Naturally we can write our own format, but, if you want to display this trace with kernelshark without writing kernelshark plugins, you need to respect one of the existing formats. For instance, let's suppose to use "function tracer" of ftrace. In debugfs, look for: /sys/kernel/debug/tracing/events/ftrace/function/format

ID: 1
format:
	field:unsigned short common_type;	offset:0;	size:2;	signed:0;
	field:unsigned char common_flags;	offset:2;	size:1;	signed:0;
	field:unsigned char common_preempt_count;	offset:3;	size:1;	signed:0;
	field:int common_pid;	offset:4;	size:4;	signed:1;

	field:unsigned long ip;	offset:8;	size:8;	signed:0;
	field:unsigned long parent_ip;	offset:16;	size:8;	signed:0;

print fmt: " %ps <-- %ps", (void *)REC->ip, (void *)REC->parent_ip

- If you didn't already do it, you need to pass this format to the trace.dat through section 18.
- common_type is the ID of the event (notice: for the previous event, it's 1, as we can notice at the beginning of the file)
- common_flags and common_preempt_count impacts the field "latency" of kernelshark.
- common_pid is, indeed, the PID
- ip and parent_ip are the trace function's address and his caller.
- The final print will impact the field "info" of kernelshark.

Hopefully this will be useful for the ftrace-users community.
Bests,
Matteo

-----Original Message-----
From: Tzvetomir Stoyanov [mailto:tz.stoyanov@gmail.com] 
Sent: Friday, June 17, 2022 5:26 PM
To: Matteo Bertolino <matteo.bertolino@huawei.com>
Cc: linux-trace-users@vger.kernel.org; Steven Rostedt <rostedt@goodmis.org>
Subject: Re: Format of trace.dat flyrecord section

On Fri, Jun 17, 2022 at 4:08 PM Matteo Bertolino <matteo.bertolino@huawei.com> wrote:
>
> Dear Mr Rostedt and mr Stoyanov,
> thanks for your warming welcoming and for your answers.
>
> I had clearer ideas now. If possible, I would like to ask some clarifications.
> Once everything is clear in my mind, I will write a resume with an example for the mailing list, so it can be usefulf for future experiments.
>
> Let's suppose that I want to write some events like the "function tracer":
> Thanks to your explanation, I successfully communicated in the header this format. This is confirmed by:
>
> ```
> dreamer@dreamer-VB:~/Desktop/mydotdat/input$ trace-cmd dump --events -i ../mytrace_v7.dat
>         [Section 18 @ 617: "Unknown", flags 0x0, 485 bytes]
>         [Events format, 1 systems]
> name: function
> ID: 1
> format:
>         field:unsigned short common_type;       offset:0;       size:2; signed:0;
>         field:unsigned char common_flags;       offset:2;       size:1; signed:0;
>         field:unsigned char common_preempt_count;       offset:3;       size:1; signed:0;
>         field:int common_pid;   offset:4;       size:4; signed:1;
>
>         field:unsigned long ip; offset:8;       size:8; signed:0;
>         field:unsigned long parent_ip;  offset:16;      size:8; signed:0;
>
> print fmt: " %ps <-- %ps", (void *)REC->ip, (void *)REC->parent_ip ```
>
> Now, in my trace.dat, I want to write the real information.
> If you permit me, I'd like to continue this discussion with a concrete example.
>
> Let's suppose that I want to represent the following information:
>
> Event: function trace event
> Timestamp 9 seconds
> PID: 7777
> Function address: 0x2266D
> Parent: 0x2266E
>
> I start by filling the header_event information:
> - Writing 900000000000 on 8 bytes (timestamp).
> - Writing `24`in 8 bytes (commit) --> event data for event ID are on 24 bytes, indeed: common_* are on 8 bytes, ip and parent ip on 8.
> - Ignoring overwrite: indeed, I already used commit.
>
> Then, I have the 4080 bytes of payload event, but I use only 24 of them, as expressed in commit.
> - common_type is 1, (2 bytes), because I want a function event
> - common_flags is 0 (1 byte)
> - common_preempt_count is 0 (1 byte)
> - common_pid is 7777 (4 bytes)
> - ip is 0x2266D (8 bytes)
> - parent_ip is 0x2266E (8 bytes)
> - the remaining 4096 - 24 bytes are of padding, so I can complete the 4080 bytes of data.
>
> Unfortunately, in this way, only CPU number and timestamp are correctly displayed on the frontend (kernelshark).
> Event is marked as "unknown" and PID is wrong.
> Timestamp astonishes me too: to correctly display it, I just need to write 900000000000 on 8 bytes.
> Trying to put "1" on time_type (5bits), 0 on delta (27 bits) and 900000000000 on array (1*4 bytes), the timestamp is not correct.

Hi Matteo,
The 'type_len' should be 6 for your event, as the payload is 24 bytes (6*4). The 'time_delta' is the delta from the timestamp written in the page header, i.e. if you put 199 in 'time_delta' the time stamp of the event is 900000000199. One very important note - take care of the endian, used for all numbers in the file. The endian of the file can be different from the endian of the host, where the file is displayed.
That's why that information is written in the trace.dat - in the very beginning of the file.

>
> Thus, I think I am doing something wrong. May you help me in correcting this example, please?
>
> Another thing that did let me with a doubt is the following one: do I need a page (4096 bytes) for an event?
> In other words: the 4096-24 bytes of padding to complete the 4080 bytes of "data" are mandatory to express another event?
>
> Thanks for your help,
> Matteo
>
> -----Original Message-----
> From: Steven Rostedt [mailto:rostedt@goodmis.org]
> Sent: Thursday, June 16, 2022 3:04 PM
> To: Matteo Bertolino <matteo.bertolino@huawei.com>
> Cc: linux-trace-users@vger.kernel.org
> Subject: Re: Format of trace.dat flyrecord section
>
>
> OK, this is a claws-mail bug. It added the '\' in the email address when doing a reply all.
>
> -- Steve
>
>
> On Thu, 16 Jun 2022 09:02:37 -0400
> Steven Rostedt <rostedt@goodmis.org> wrote:
>
> > This is just a test to see if the added '\' was due to a claws-mail 
> > bug or somehow I did it.
> >
> > -- Steve
>


--
Tzvetomir (Ceco) Stoyanov
VMware Open Source Technology Center

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2022-06-20 16:00 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-06-15 15:18 Format of trace.dat flyrecord section Matteo Bertolino
2022-06-16  4:25 ` Tzvetomir Stoyanov
2022-06-16 13:00 ` Steven Rostedt
     [not found] ` <20220616090237.2c9dda8b@gandalf.local.home>
2022-06-16 13:03   ` Steven Rostedt
2022-06-17 13:08     ` Matteo Bertolino
2022-06-17 13:10       ` Matteo Bertolino
2022-06-17 15:25       ` Tzvetomir Stoyanov
2022-06-20 15:59         ` Matteo Bertolino

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).