linux-trace-devel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Steven Rostedt <rostedt@goodmis.org>
To: Tzvetomir Stoyanov <tz.stoyanov@gmail.com>
Cc: Linux Trace Devel <linux-trace-devel@vger.kernel.org>
Subject: Re: [PATCH v6 25/45] trace-cmd: Read compressed trace data
Date: Tue, 22 Jun 2021 09:51:26 -0400	[thread overview]
Message-ID: <20210622095126.20fa7879@gandalf.local.home> (raw)
In-Reply-To: <CAPpZLN4dHKxa2DztSurSow=PbsFSyuLgkifHt8yvJ6kL8pvu8Q@mail.gmail.com>

On Tue, 22 Jun 2021 13:50:44 +0300
Tzvetomir Stoyanov <tz.stoyanov@gmail.com> wrote:

> > With large trace files, this will be an issue. Several systems setup the
> > /tmp directory as a ramfs file system (that is, it is locate in ram, and
> > not backed up on disk). If you have very large trace files, which you would
> > if you are going to bother compressing them, by uncompressing them into
> > /tmp, it could take up all the memory of the machine, or easily fill the
> > /tmp limit.  
> 
> There are a few possible approaches for solving that:
>  - use the same directory where the input trace file is located

I thought about that, but then decided against it, because there's a reason
people compress it. If we have to uncompress it to read it, I can see
people saying "why is it compressed in the first place?" When data is
compressed to save disk space (which I consider this a case), then the
reading has to uncompress it on a as-needed basis.

>  - use an environment variable for user specified temp directory for these files
>  - check if there is enough free space on the FS before uncompressing
> 
> >
> > Simply uncompressing the entire trace data is not an option. The best we
> > can do is to uncompress on a as needed basis. That would require having
> > meta data that is stored to know what pages are compressed.
> >  
> I can modify that logic to compress page by page, as the data is
> loaded by pages. Or use some of the above approaches ?

Doing it page by page is probably the most logical solution. It will make
it easier to manage without needing to create separate temporary files.

I'm guessing we need an index of each page and where they start. We need a
way to map the record offset to the page that contains it in such a way
that tracecmd_read_at() still works.

We could keep this in the file, or create it from the data. I'm thinking
saving this as a section in the file would be good as it would be quicker
for loading.

Have a section for each CPU, that maps each page with their compressed
offset in the file, and then just consider the page to be page size.

Oh, which reminds me, we need to make sure that we don't use
"getpagesize()" to determine the size of the page buffers, because I may be
making the buffers more than a single page. It must use the header_page
file in the events directory, because it that might change in the future!

Anyway, we can have this:

	buffer_page_size:	4096

/* lets say the compressed data starts at 10,000 just to make this easier
to explain. */

	u64 cpu_array[0]	10000 <- page 1 (compress to 100 bytes)
				10100 <- page 2 (compressed to 150 bytes)
				10250 <- page 3
				[...]

But the record->offset should contain the offset of the uncompressed data.
That is, if the record is on page 2 at offset 400 (uncompressed) then
offset should be:

	record->offset = 14496 (10000 + 4096 + 400)

Which would be calculated as:

	record->offset = cpu_data_start[cpu] + page * buffer_page_size + offset;

This also means that cpu_array[1] has to save its uncompressed start. That
is, even though it may start at 20,000 in the trace data file (10,000 more
than the cpu_array[0] start). It's uncompressed location needs to account
for all the cpu_array[0] pages, such that no two record's offsets will
overlap if they are on different CPUs.

	cpu_data_start[0] = 10000 (but has 1000 pages, where 1000 * 4096 = 4,096,000)

But even if cpu_array[1] starts at 20000, it has to account for the
uncompressed cpu_array[0] data, thus we have:

	cpu_data_start[1] = 4106000 (4096000 + 10000)

-- Steve

  reply	other threads:[~2021-06-22 13:51 UTC|newest]

Thread overview: 64+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-06-14  7:49 [PATCH v6 00/45] Add trace file compression Tzvetomir Stoyanov (VMware)
2021-06-14  7:49 ` [PATCH v6 01/45] trace-cmd library: Remove unused private APIs for creating trace files Tzvetomir Stoyanov (VMware)
2021-06-14  7:49 ` [PATCH v6 02/45] trace-cmd library: Remove unused API tracecmd_update_option Tzvetomir Stoyanov (VMware)
2021-06-14  7:49 ` [PATCH v6 03/45] trace-cmd: Check if file version is supported Tzvetomir Stoyanov (VMware)
2021-06-21 22:27   ` Steven Rostedt
2021-06-21 22:36     ` Steven Rostedt
2021-06-14  7:49 ` [PATCH v6 04/45] trace-cmd library: Add new API to get file version of input handler Tzvetomir Stoyanov (VMware)
2021-06-14  7:49 ` [PATCH v6 05/45] trace-cmd library: Select the file version when writing trace file Tzvetomir Stoyanov (VMware)
2021-06-14  7:49 ` [PATCH v6 06/45] trace-cmd: Add APIs for library initialization and free Tzvetomir Stoyanov (VMware)
2021-06-14  7:49 ` [PATCH v6 07/45] trace-cmd library: Add support for compression algorithms Tzvetomir Stoyanov (VMware)
2021-06-14  7:49 ` [PATCH v6 08/45] trace-cmd list: Show supported " Tzvetomir Stoyanov (VMware)
2021-06-14  7:49 ` [PATCH v6 09/45] trace-cmd library: Bump the trace file version to 7 Tzvetomir Stoyanov (VMware)
2021-06-14  7:49 ` [PATCH v6 10/45] trace-cmd library: Compress part of the trace file Tzvetomir Stoyanov (VMware)
2021-06-14  7:49 ` [PATCH v6 11/45] trace-cmd library: Read compressed " Tzvetomir Stoyanov (VMware)
2021-06-14  7:49 ` [PATCH v6 12/45] trace-cmd library: Add new API to get compression of input handler Tzvetomir Stoyanov (VMware)
2021-06-14  7:49 ` [PATCH v6 13/45] trace-cmd library: Inherit compression algorithm from input file Tzvetomir Stoyanov (VMware)
2021-06-14  7:49 ` [PATCH v6 14/45] trace-cmd library: Extend the create file APIs to support different compression Tzvetomir Stoyanov (VMware)
2021-06-14  7:49 ` [PATCH v6 15/45] trace-cmd record: Add new parameter --compression Tzvetomir Stoyanov (VMware)
2021-06-14  7:50 ` [PATCH v6 16/45] trace-cmd dump: Add support for trace files version 7 Tzvetomir Stoyanov (VMware)
2021-06-14  7:50 ` [PATCH v6 17/45] trace-cmd library: Add support for zlib compression library Tzvetomir Stoyanov (VMware)
2021-06-22  1:26   ` Steven Rostedt
2021-06-22 10:29     ` Tzvetomir Stoyanov
2021-06-22 13:31       ` Steven Rostedt
2021-06-14  7:50 ` [PATCH v6 18/45] trace-cmd library: Hide the logic for updating buffer offset Tzvetomir Stoyanov (VMware)
2021-06-22  2:10   ` Steven Rostedt
2021-06-14  7:50 ` [PATCH v6 19/45] trace-cmd: Move buffers description outside of options Tzvetomir Stoyanov (VMware)
2021-06-21 23:07   ` Steven Rostedt
2021-06-14  7:50 ` [PATCH v6 20/45] trace-cmd library: Track the offset in the option section in the trace file Tzvetomir Stoyanov (VMware)
2021-06-14  7:50 ` [PATCH v6 21/45] trace-cmd library: Add compression of the option section of " Tzvetomir Stoyanov (VMware)
2021-06-21 23:10   ` Steven Rostedt
2021-06-22 10:43     ` Tzvetomir Stoyanov
2021-06-14  7:50 ` [PATCH v6 22/45] trace-cmd library: Refactor the logic for writing trace data in the file Tzvetomir Stoyanov (VMware)
2021-06-21 23:12   ` Steven Rostedt
2021-06-14  7:50 ` [PATCH v6 23/45] trace-cmd library: Add APIs for read and write compressed data in chunks Tzvetomir Stoyanov (VMware)
2021-06-14  7:50 ` [PATCH v6 24/45] trace-cmd: Compress trace data Tzvetomir Stoyanov (VMware)
2021-06-14  7:50 ` [PATCH v6 25/45] trace-cmd: Read compressed " Tzvetomir Stoyanov (VMware)
2021-06-21 23:23   ` Steven Rostedt
2021-06-22 10:50     ` Tzvetomir Stoyanov
2021-06-22 13:51       ` Steven Rostedt [this message]
2021-06-14  7:50 ` [PATCH v6 26/45] trace-cmd library: Compress latency " Tzvetomir Stoyanov (VMware)
2021-06-14  7:50 ` [PATCH v6 27/45] trace-cmd: Read compressed " Tzvetomir Stoyanov (VMware)
2021-06-14  7:50 ` [PATCH v6 28/45] trace-cmd library: Reuse within the library the function that checks file state Tzvetomir Stoyanov (VMware)
2021-06-14  7:50 ` [PATCH v6 29/45] trace-cmd library: Make tracecmd_copy_headers() to work with output handler Tzvetomir Stoyanov (VMware)
2021-06-14  7:50 ` [PATCH v6 30/45] trace-cmd: Do not use trace file compression with streams Tzvetomir Stoyanov (VMware)
2021-06-14  7:50 ` [PATCH v6 31/45] trace-cmd library: Add new API to get file version of output handler Tzvetomir Stoyanov (VMware)
2021-06-14  7:50 ` [PATCH v6 32/45] trace-cmd: Add file state parameter to tracecmd_copy Tzvetomir Stoyanov (VMware)
2021-06-14  7:50 ` [PATCH v6 33/45] trace-cmd: Copy CPU count in tracecmd_copy Tzvetomir Stoyanov (VMware)
2021-06-21 23:27   ` Steven Rostedt
2021-06-14  7:50 ` [PATCH v6 34/45] trace-cmd: Copy buffers description " Tzvetomir Stoyanov (VMware)
2021-06-14  7:50 ` [PATCH v6 35/45] trace-cmd: Copy options " Tzvetomir Stoyanov (VMware)
2021-06-14  7:50 ` [PATCH v6 36/45] trace-cmd library: Refactor the logic for writing CPU trace data Tzvetomir Stoyanov (VMware)
2021-06-14  7:50 ` [PATCH v6 37/45] trace-cmd library: Refactor the logic for writing CPU instance " Tzvetomir Stoyanov (VMware)
2021-06-14  7:50 ` [PATCH v6 38/45] trace-cmd: Copy trace data in tracecmd_copy Tzvetomir Stoyanov (VMware)
2021-06-14  7:50 ` [PATCH v6 39/45] trace-cmd: Add compression parameter to tracecmd_copy Tzvetomir Stoyanov (VMware)
2021-06-14  7:50 ` [PATCH v6 40/45] trace-cmd: Add new command "trace-cmd convert" Tzvetomir Stoyanov (VMware)
2021-06-14  7:50 ` [PATCH v6 41/45] trace-cmd record: Update man page Tzvetomir Stoyanov (VMware)
2021-06-14  7:50 ` [PATCH v6 42/45] trace-cmd: Add convert " Tzvetomir Stoyanov (VMware)
2021-06-14  7:50 ` [PATCH v6 43/45] trace-cmd: Update bash completion Tzvetomir Stoyanov (VMware)
2021-06-14  7:50 ` [PATCH v6 44/45] trace-cmd list: Update the man page Tzvetomir Stoyanov (VMware)
2021-06-14  7:50 ` [PATCH v6 45/45] trace-cmd: Update trace.dat " Tzvetomir Stoyanov (VMware)
2021-06-22  0:37   ` Steven Rostedt
2021-06-22 11:05     ` Tzvetomir Stoyanov
2021-06-22 13:58       ` Steven Rostedt
2021-06-22  2:22 ` [PATCH v6 00/45] Add trace file compression Steven Rostedt

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210622095126.20fa7879@gandalf.local.home \
    --to=rostedt@goodmis.org \
    --cc=linux-trace-devel@vger.kernel.org \
    --cc=tz.stoyanov@gmail.com \
    --subject='Re: [PATCH v6 25/45] trace-cmd: Read compressed trace data' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).