All of lore.kernel.org
 help / color / mirror / Atom feed
* Rotation of instance traces
@ 2022-02-18 18:45 Joel Fernandes
  2022-02-18 18:49 ` Joel Fernandes
  2022-02-18 19:22 ` Steven Rostedt
  0 siblings, 2 replies; 5+ messages in thread
From: Joel Fernandes @ 2022-02-18 18:45 UTC (permalink / raw)
  To: linux-trace-devel, Steven Rostedt

Hello!

I am seeing something weird with trace instances, that I don't see
with regular non-instance traces.

If I enable some sched events and set the buffer size to something
tiny, like 1KB, I am expecting the buffer to rotate quickly as it
overflows, but I don't see that:

  watch "cat trace | head -n 1" and I see the following

https://paste.googleplex.com/6259651512369152?raw

and if I do some heavy workloads, I still see that the top rows are
not moving and are stuck showing the same thing. I am expecting the
buffer to drop old events as it is being overwritten/rotation.

 However if I do a tail on the trace, I do see updates. So tracing is
in progress...

I did the exact same experiment on non-instance global tracing and it
works fine. I confirmed that 'overwrite' is set in the trace_options
of the instance (and non-instance).

This could be some trace formatting issue or another bug? Note I am
reading the tracefs trace file directly and not using trace-cmd for
this.

Thoughts?

Thanks,
Joel

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Rotation of instance traces
  2022-02-18 18:45 Rotation of instance traces Joel Fernandes
@ 2022-02-18 18:49 ` Joel Fernandes
  2022-02-18 19:22 ` Steven Rostedt
  1 sibling, 0 replies; 5+ messages in thread
From: Joel Fernandes @ 2022-02-18 18:49 UTC (permalink / raw)
  To: linux-trace-devel, Steven Rostedt

On Fri, Feb 18, 2022 at 1:45 PM Joel Fernandes <joel@joelfernandes.org> wrote:
>
> Hello!
>
> I am seeing something weird with trace instances, that I don't see
> with regular non-instance traces.
>
> If I enable some sched events and set the buffer size to something
> tiny, like 1KB, I am expecting the buffer to rotate quickly as it
> overflows, but I don't see that:
>
>   watch "cat trace | head -n 1" and I see the following
>
> https://paste.googleplex.com/6259651512369152?raw

Oops wrong paste link, I mean I did:
watch "cat head -n 40 trace"

And I get the following paste:
https://www.toptal.com/developers/hastebin/raw/iqaqusiyik

Thanks,

Joel

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Rotation of instance traces
  2022-02-18 18:45 Rotation of instance traces Joel Fernandes
  2022-02-18 18:49 ` Joel Fernandes
@ 2022-02-18 19:22 ` Steven Rostedt
  2022-02-18 19:28   ` Steven Rostedt
  1 sibling, 1 reply; 5+ messages in thread
From: Steven Rostedt @ 2022-02-18 19:22 UTC (permalink / raw)
  To: Joel Fernandes; +Cc: linux-trace-devel

On Fri, 18 Feb 2022 13:45:44 -0500
Joel Fernandes <joel@joelfernandes.org> wrote:

> Hello!
> 
> I am seeing something weird with trace instances, that I don't see
> with regular non-instance traces.
> 
> If I enable some sched events and set the buffer size to something
> tiny, like 1KB, I am expecting the buffer to rotate quickly as it
> overflows, but I don't see that:
> 
>   watch "cat trace | head -n 1" and I see the following
> 
> https://paste.googleplex.com/6259651512369152?raw
> 
> and if I do some heavy workloads, I still see that the top rows are
> not moving and are stuck showing the same thing. I am expecting the
> buffer to drop old events as it is being overwritten/rotation.
> 
>  However if I do a tail on the trace, I do see updates. So tracing is
> in progress...
> 
> I did the exact same experiment on non-instance global tracing and it
> works fine. I confirmed that 'overwrite' is set in the trace_options
> of the instance (and non-instance).
> 
> This could be some trace formatting issue or another bug? Note I am
> reading the tracefs trace file directly and not using trace-cmd for
> this.

It's not a bug it's a feature ;-)

The "trace" file is not a consumer file. That is, it does not clear out the
ring buffer when you read it. If you want that, use trace_pipe.

What you are seeing is the effect of the implementation of the ring buffer.
As the ring buffer is broken up into "sub buffers" which are currently a
page in size. Although Tzvetomir has RFC patches to change that:

  https://patchwork.kernel.org/project/linux-trace-devel/list/?series=594661&state=*


If you pause tracing (echo 0 > tracing_on) you can read the trace file over
and over and still get the same result. That is because it uses an iterator
to read over the ring buffer and does not modify it. But if you were to use
trace_pipe, it consumes the tracing data, and it does so by swapping out
the sub buffers from the ring buffer and giving it to the user.

Now to be able to do things like splice, and zero copy from the ring
buffer, when you read the ring buffer it swaps out a sub buffer from the
writable portion of the ring buffer and makes it into the "reader_page".
The reader page is now own by the reader, and will never be written to.
This allows the reader to take its time to read it and not worry about
being overwritten by the writer.

Now what you are seeing, is that a reader page was swapped out of the ring
buffer. This means that it will never be written to again, as it belongs to
the reader (it's the reader page). And since you are using the "trace" file
and not the "trace_pipe" file, it will not even discard this page. But as
you have not turned off tracing, the rest of the ring buffer is still being
updated.

What happens is, the start of the trace file will show you the reader page
(never be updated, and will always stay the same), and then walk into the
writable section of the ring buffer, which will constantly be updated. This
gives you exactly what you are seeing.

Like I said, it's a feature!

Now, why is it not happening in the top level file? It's because if the
reader page has not be read yet, the iterator may go right into the
writable portion and skip over the reader page as it is empty. If you did
something that causes the reader page to be filled, it will give you what
you see. So this isn't a difference between the instances and top level,
it's a difference between what you have have done to the instances before
reading the trace file, which you did not do for the top level buffer.

They both use the same code, so they should be acting identically.

-- Steve

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Rotation of instance traces
  2022-02-18 19:22 ` Steven Rostedt
@ 2022-02-18 19:28   ` Steven Rostedt
  2022-02-18 21:46     ` Joel Fernandes
  0 siblings, 1 reply; 5+ messages in thread
From: Steven Rostedt @ 2022-02-18 19:28 UTC (permalink / raw)
  To: Joel Fernandes; +Cc: linux-trace-devel

On Fri, 18 Feb 2022 14:22:16 -0500
Steven Rostedt <rostedt@goodmis.org> wrote:

> Now to be able to do things like splice, and zero copy from the ring
> buffer, when you read the ring buffer it swaps out a sub buffer from the
> writable portion of the ring buffer and makes it into the "reader_page".
> The reader page is now own by the reader, and will never be written to.

I need to correct that statement about never being written to. It can be.
If the current write head is on the page as it gets swapped out and becomes
the reader page. A new write will happen on the reader page, but once the
writer walks off the reader page and into the main ring buffer, it will
never touch the reader page again.

The read logic detects this, and will do copies of the page to send it to
the user as it can not give the page itself to the user with the writer
still on it.

-- Steve


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Rotation of instance traces
  2022-02-18 19:28   ` Steven Rostedt
@ 2022-02-18 21:46     ` Joel Fernandes
  0 siblings, 0 replies; 5+ messages in thread
From: Joel Fernandes @ 2022-02-18 21:46 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: linux-trace-devel

On Fri, Feb 18, 2022 at 2:28 PM Steven Rostedt <rostedt@goodmis.org> wrote:
>
> On Fri, 18 Feb 2022 14:22:16 -0500
> Steven Rostedt <rostedt@goodmis.org> wrote:
>
> > Now to be able to do things like splice, and zero copy from the ring
> > buffer, when you read the ring buffer it swaps out a sub buffer from the
> > writable portion of the ring buffer and makes it into the "reader_page".
> > The reader page is now own by the reader, and will never be written to.
>
> I need to correct that statement about never being written to. It can be.
> If the current write head is on the page as it gets swapped out and becomes
> the reader page. A new write will happen on the reader page, but once the
> writer walks off the reader page and into the main ring buffer, it will
> never touch the reader page again.
>
> The read logic detects this, and will do copies of the page to send it to
> the user as it can not give the page itself to the user with the writer
> still on it.

Thanks Steve! I went over your email and it makes sense. I also cannot
reproduce this issue any more so I don't know what I did :P.  I was
messing with the buffer_size_kb while doing reads from 'trace', so
maybe I managed to get the read page in a weird state? Anyway, I think
I am going to collect these traces only with 'trace_pipe_raw' , so as
long as the data in the ring buffer is consistent, that's good.

I was mostly concerned about the validity/integrity of the data stored
since the 'trace' made it look like a corruption :P , when I was doing
"head -n 40 trace" , I was seeing the timestamps of the first couple
of records updating, but the data remained the same. Almost as if, the
header of the record was changing but not the payload. I am not sure
if you have seen anything like that.

Thanks,
             Joel

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2022-02-18 21:47 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-02-18 18:45 Rotation of instance traces Joel Fernandes
2022-02-18 18:49 ` Joel Fernandes
2022-02-18 19:22 ` Steven Rostedt
2022-02-18 19:28   ` Steven Rostedt
2022-02-18 21:46     ` Joel Fernandes

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.