All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] Add --show-size to git log to print message size
@ 2007-07-14 16:52 Marco Costalba
  2007-07-14 19:03 ` Junio C Hamano
                   ` (2 more replies)
  0 siblings, 3 replies; 29+ messages in thread
From: Marco Costalba @ 2007-07-14 16:52 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Git Mailing List

Print message size just before the corresponding message
to speedup the parsing by scripts/porcelains tools.

Because git log output is normally read incrementally by
porcelain tools, if message size is ignored then an
expensive seek of the delimiting char, as example '\0'
must be done when parsing the output stream.

With this patch it is possible to avoid an otherwise
mandatory seek for '\0' starting from the beginning
of log body.

Signed-off-by: Marco Costalba <mcostalba@gmail.com>
---

This little patch makes great difference in loading
performance of our loved ;-) tool....and probably
also others.

Please apply.

Thanks
Marco


 log-tree.c |    3 +++
 revision.c |    4 ++++
 revision.h |    1 +
 3 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/log-tree.c b/log-tree.c
index 8624d5a..2fb7761 100644
--- a/log-tree.c
+++ b/log-tree.c
@@ -295,6 +295,9 @@ void show_log(struct rev_info *opt,
 	if (opt->add_signoff)
  		len = append_signoff(&msgbuf, &msgbuf_len, len,
 				     opt->add_signoff);
+ 	if (opt->show_size)
+		printf("size %i\n", len);
+
  	printf("%s%s%s", msgbuf, extra, sep);
  	free(msgbuf);
 }
diff --git a/revision.c b/revision.c
index 33ee9ee..3850a1e 100644
--- a/revision.c
+++ b/revision.c
@@ -1136,6 +1136,10 @@ int setup_revisions(int argc,
 				continue;
 			}
+			if (!strcmp(arg, "--show-size")) {
+				revs->show_size = 1;
+				continue;
+			}

 			/*
 			 * Grepping the commit log
diff --git a/revision.h b/revision.h
index f46b4d5..584b3f1 100644
--- a/revision.h
+++ b/revision.h
@@ -81,6 +81,7 @@ struct rev_info {
  	const char	*log_reencode;
  	const char	*subject_prefix;
 	int		no_inline;
+	int		show_size;

 	/* Filter by commit log message */
  	struct grep_opt	*grep_filter;
-- 
1.5.3.rc0.81.g1ed84-dirty

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [PATCH] Add --show-size to git log to print message size
  2007-07-14 16:52 [PATCH] Add --show-size to git log to print message size Marco Costalba
@ 2007-07-14 19:03 ` Junio C Hamano
  2007-07-14 20:46   ` Marco Costalba
                     ` (2 more replies)
  2007-07-17  7:49 ` Andy Parkins
  2007-07-25  4:03 ` Junio C Hamano
  2 siblings, 3 replies; 29+ messages in thread
From: Junio C Hamano @ 2007-07-14 19:03 UTC (permalink / raw)
  To: Marco Costalba; +Cc: Git Mailing List

"Marco Costalba" <mcostalba@gmail.com> writes:

> Print message size just before the corresponding message
> to speedup the parsing by scripts/porcelains tools.

> diff --git a/log-tree.c b/log-tree.c
> index 8624d5a..2fb7761 100644
> --- a/log-tree.c
> +++ b/log-tree.c
> @@ -295,6 +295,9 @@ void show_log(struct rev_info *opt,
> 	if (opt->add_signoff)
>  		len = append_signoff(&msgbuf, &msgbuf_len, len,
> 				     opt->add_signoff);
> + 	if (opt->show_size)
> +		printf("size %i\n", len);
> +
>  	printf("%s%s%s", msgbuf, extra, sep);
>  	free(msgbuf);
> }

"size" is a bit vague here.  What if we later want to extend
things so that you can ask for the entire log entry size
including the patch output part (I am not saying that would be
an easy change --- I am more worried about the stability of the
external interface).  So is --show-"size".  "message-size" would
have been a bit easier to swallow, but I sense the problem runs
deeper.

The current code spits out a log message after formatting it in
its entirety in core, so we happen to have its size upfront.
Having to say the size upfront means we close the door for
alternative implementations that stream the log formatting
processing.

This is not a problem for log messages per-se, as we
traditionally even did not show a commit log over 16kB (these
days we are supposed to be unbounded, although I do not know if
anybody actually tested that).  But if we ever want to extend
this concept to cover the patch part, so that the reader can
split the "git log" output stream into individual commits with
the same "efficiency improvements" you are seeing from this
patch, that becomes a real problem, I would think.

Naturally, this reminds me of having to say Content-Length
upfront vs chunked transfer.  Essentially you are treating the
output stream from "git log" into the pipe as a sequence of
messages (and without "-p", your "size" is exactly what a
"Content-Length" header is).  The fact that this analogy works
only when the command is run without "-p" (but "--show-size"
does not check that) bothers me.  What would we do when we want
to help the readers that reads from "-p" output?

I have a more basic question. If you are reading from non "-p"
output, where do you exactly have the wasted cycles in your
reader's processing?  One immediately obvious thing is that you
would not have to repeatedly realloc your buffer to keep one
message worth of data in core, but somehow I cannot imagine that
that is the source of a huge performance boost.

One use case that this would give a huge improvement I can think
of is if you read the stream, and show only every tenth commit.
You can discard other 9 out of 10 without even looking at their
contents, and being able to read known amount of bytes and
immediately discard would certainly be much more efficient than
having to scan for NUL, only to discard.  But that does not
sound as a plausible real-life scenario.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] Add --show-size to git log to print message size
  2007-07-14 19:03 ` Junio C Hamano
@ 2007-07-14 20:46   ` Marco Costalba
  2007-07-15  9:35     ` Alex Riesen
  2007-07-16 12:04   ` Marco Costalba
  2007-07-16 17:50   ` Marco Costalba
  2 siblings, 1 reply; 29+ messages in thread
From: Marco Costalba @ 2007-07-14 20:46 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Git Mailing List

On 7/14/07, Junio C Hamano <gitster@pobox.com> wrote:
>
> "size" is a bit vague here.  What if we later want to extend
> things so that you can ask for the entire log entry size
> including the patch output part (I am not saying that would be
> an easy change --- I am more worried about the stability of the
> external interface).  So is --show-"size".  "message-size" would
> have been a bit easier to swallow, but I sense the problem runs
> deeper.

What about --section-sizes?

You can add in the output line all the sizes you want, message, patch
and future extensions separated by a space. An example output for
message and patch sizes.

sizes 456 565\n

Or, as a stream friendly alternative (and also more elegant) you can
output 'section size' before each section, so as example

commit d9e940....
section size 456
< log header and message body>
section size 565
<patch diff content>
section size 232
<other type of content>


> I have a more basic question. If you are reading from non "-p"
> output, where do you exactly have the wasted cycles in your
> reader's processing?

qgit loading works like this:

git log output is read as a series of big binary chunks by a Qt
library function that calls read(), these chunks are read each one in
a different buffer and there they stay for all the application life
time (or until a data refresh), so there is no copy of data in qgit,
the buffers are allocated and the pointers passed to read(), that's
all.

It's a kind of software DMA ;-)

The only information that qgit needs to infere at startup is where to
find the first line of each commit, for parent information and
revision's counting, all the other data is read and consumed only on
demand, i.e. for showing to user, but because only the screen visible
part of the list is needed, data is read from these buffers and parsed
*in small chunks* and only when user scrolls the view.

The problem is that to get the first line of each revision the message
boundaries of _all_ the commits must be known/found.

Because currently there is no message size information the application have to:

-get the offset of commit first line
-try to find the delimiting '\0' if existing (binary chunks could be
truncated at any point)
-get the offset of commit first line of next revision
-and so on for all the revisions

Finding the delimiting '\0' it means to loop across the whole buffers
and _this_ is the expensive and not needed part. If just after the
first line would be possible to point to the beginning of the next
revision this seeking for '\0' would be not necessary anymore.

When user asks for data of revision 'x' then because offset of
revision 'x' is known, application could just point to the correct
offset in the correct data buffer and parse out the (small) needed
info.

Hope I have explained clearly enough, I have some problems writing in
at late evening ;-)


Marco

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] Add --show-size to git log to print message size
  2007-07-14 20:46   ` Marco Costalba
@ 2007-07-15  9:35     ` Alex Riesen
  2007-07-15 10:06       ` Marco Costalba
  0 siblings, 1 reply; 29+ messages in thread
From: Alex Riesen @ 2007-07-15  9:35 UTC (permalink / raw)
  To: Marco Costalba; +Cc: Junio C Hamano, Git Mailing List

Marco Costalba, Sat, Jul 14, 2007 22:46:39 +0200:
> Finding the delimiting '\0' it means to loop across the whole buffers
> and _this_ is the expensive and not needed part. If just after the

It is _not_ expensive. It could be made expensive, though. By using
QString and QByteArray, for instance.

> first line would be possible to point to the beginning of the next
> revision this seeking for '\0' would be not necessary anymore.

But this will make your reading different: you have to handle the case
when the next revision is not _fully_ read in yet, but you already
know its size.

> Hope I have explained clearly enough, I have some problems writing in
> at late evening ;-)

P.S. BTW, why do you have some 20 source files marked executable in
your qgit4 repository?

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] Add --show-size to git log to print message size
  2007-07-15  9:35     ` Alex Riesen
@ 2007-07-15 10:06       ` Marco Costalba
  2007-07-15 10:48         ` Alex Riesen
  0 siblings, 1 reply; 29+ messages in thread
From: Marco Costalba @ 2007-07-15 10:06 UTC (permalink / raw)
  To: Alex Riesen; +Cc: Junio C Hamano, Git Mailing List

On 7/15/07, Alex Riesen <raa.lkml@gmail.com> wrote:
> Marco Costalba, Sat, Jul 14, 2007 22:46:39 +0200:
> > Finding the delimiting '\0' it means to loop across the whole buffers
> > and _this_ is the expensive and not needed part. If just after the
>
> It is _not_ expensive. It could be made expensive, though. By using
> QString and QByteArray, for instance.
>

The searching we are talking about is this (Rev::indexData() in
git_startup.cpp):

int end = ba.indexOf('\0', idx); // this is the slowest find

the starting point 'idx' is at the beginning of the log message.


Qt implemantation of indexOf() is this (src/corelib/tools/qbytearray.cpp):

int QByteArray::indexOf(char ch, int from) const
{
    if (from < 0)
        from = qMax(from + d->size, 0);
    if (from < d->size) {
        const char *n = d->data + from - 1;
        const char *e = d->data + d->size;
        while (++n != e)
        if (*n == ch)
            return  n - d->data;
    }
    return -1;
}

Hope this clears any doubts regarding (supposed) slowness of Qt classes.

> > first line would be possible to point to the beginning of the next
> > revision this seeking for '\0' would be not necessary anymore.
>
> But this will make your reading different: you have to handle the case
> when the next revision is not _fully_ read in yet, but you already
> know its size.
>

Reading and creating revision is made as a streaming, it means that
when there is new data  from git a new Rev struct (well it's a class
indeed, but there's no diference) is created and populated with index
data: offset of the rev, parents number, offset of log message and so
on.

If, *while parsing the data* a truncated rev is found (we are at EOF
and no '\0' is found) the whole rev is discarded and deleted, we wait
for some more data and restart the process.

Because the above event is quite rare given the size of the buffers
where git row data is stored, no really loss of speed occurs and we
have the (big) advantage of indexing *while* searching for '\0', so to
scan data only once.

This is how it works now.

With the proposed patch will be easier to find a truncated rev,
because as soon as we know the rev size, after reading it from the
stream, we check:

             if (revision_offset + size > byte_array_size)
                       truncated_rev;


>
> P.S. BTW, why do you have some 20 source files marked executable in
> your qgit4 repository?
>

Importing from Windows: ntfs does not handles file attributes
correctly, I should clean up permissions but I'm lazy ;-)


Marco

P.S: I have an experimental branch where the above is implemented, I
cannot publish now because it requires the --show-size change in git,
but after initial testing I have found that with the above applied the
overhead of qgit on git-log it's about of only 16%.

It means that if git-log runs in say 3 seconds (warm cache), qgit with
the same git log arguments runs in about 3.5 seconds.

With cold cache overhead is also less because disk access is accounted
on the git side ;-)

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] Add --show-size to git log to print message size
  2007-07-15 10:06       ` Marco Costalba
@ 2007-07-15 10:48         ` Alex Riesen
  2007-07-15 11:32           ` Marco Costalba
  0 siblings, 1 reply; 29+ messages in thread
From: Alex Riesen @ 2007-07-15 10:48 UTC (permalink / raw)
  To: Marco Costalba; +Cc: Junio C Hamano, Git Mailing List

Marco Costalba, Sun, Jul 15, 2007 12:06:53 +0200:
> 
> Hope this clears any doubts regarding (supposed) slowness of Qt classes.

No, it does not. Look at the implementation of memchr.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] Add --show-size to git log to print message size
  2007-07-15 10:48         ` Alex Riesen
@ 2007-07-15 11:32           ` Marco Costalba
  2007-07-15 12:29             ` Marco Costalba
  0 siblings, 1 reply; 29+ messages in thread
From: Marco Costalba @ 2007-07-15 11:32 UTC (permalink / raw)
  To: Alex Riesen; +Cc: Junio C Hamano, Git Mailing List

On 7/15/07, Alex Riesen <raa.lkml@gmail.com> wrote:
> Marco Costalba, Sun, Jul 15, 2007 12:06:53 +0200:
> >
> > Hope this clears any doubts regarding (supposed) slowness of Qt classes.
>
> No, it does not. Look at the implementation of memchr.
>

If this is the implementation you refer:
http://www.google.com/codesearch?hl=it&q=+lang:c+license:gpl+package:libc+memchr+show:6Tvt4_C8w_Q:Sdpf5XO0rrM:Emqlj8d07Ns&sa=N&cd=4&ct=rc&cs_p=http://ftp.gnu.org/gnu/glibc/glibc-2.2.1.tar.gz&cs_f=glibc-2.2.1/sysdeps/i386/bits/string.h#a0

I would say that I would prefer to avoid assemply and Qt libs seems to
do the same, perhaps for portability, but it's only a guess.

Anyway with the proposed patch to git you don't need assembly at all
(to me it's would seem an overkill)

Marco

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] Add --show-size to git log to print message size
  2007-07-15 11:32           ` Marco Costalba
@ 2007-07-15 12:29             ` Marco Costalba
  2007-07-15 12:35               ` Sean
  2007-07-15 18:14               ` Linus Torvalds
  0 siblings, 2 replies; 29+ messages in thread
From: Marco Costalba @ 2007-07-15 12:29 UTC (permalink / raw)
  To: Alex Riesen; +Cc: Junio C Hamano, Git Mailing List

To further push for git patch, please check this from current linux tree:

git log --parents --pretty=raw -z -r -p c4201214

As you can see the diff contains a '\0' value (actually removed by the patch).

qgit of course fails, as any tool based on parsing '\0' delimiting
character records. If the size of the patch was explicitly given
instead no fail will occur.

Marco

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] Add --show-size to git log to print message size
  2007-07-15 12:29             ` Marco Costalba
@ 2007-07-15 12:35               ` Sean
  2007-07-15 14:58                 ` Marco Costalba
  2007-07-15 18:14               ` Linus Torvalds
  1 sibling, 1 reply; 29+ messages in thread
From: Sean @ 2007-07-15 12:35 UTC (permalink / raw)
  To: Marco Costalba; +Cc: Alex Riesen, Junio C Hamano, Git Mailing List

On Sun, 15 Jul 2007 14:29:04 +0200
"Marco Costalba" <mcostalba@gmail.com> wrote:

> To further push for git patch, please check this from current linux tree:
> 
> git log --parents --pretty=raw -z -r -p c4201214
> 
> As you can see the diff contains a '\0' value (actually removed by the patch).
> 
> qgit of course fails, as any tool based on parsing '\0' delimiting
> character records. If the size of the patch was explicitly given
> instead no fail will occur.
> 

If you only look for ^\0 (ie. first position only) the parsing should
be okay.  Not that it helps with the performance issue you're trying
to address.

Sean

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] Add --show-size to git log to print message size
  2007-07-15 12:35               ` Sean
@ 2007-07-15 14:58                 ` Marco Costalba
  2007-07-15 15:04                   ` Sean
  0 siblings, 1 reply; 29+ messages in thread
From: Marco Costalba @ 2007-07-15 14:58 UTC (permalink / raw)
  To: Sean; +Cc: Alex Riesen, Junio C Hamano, Git Mailing List

On 7/15/07, Sean <seanlkml@sympatico.ca> wrote:
>
> If you only look for ^\0 (ie. first position only) the parsing should
> be okay.  Not that it helps with the performance issue you're trying
> to address.
>
revisions are streamed, there is only one first revision, all the
others are linked togheter in a big stream.

Actually I ended up looking for the character after the '\0' (if any)
to be 'c' that is the beginning of "commit <sha> ....." but it's a bit
hacky.


Marco

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] Add --show-size to git log to print message size
  2007-07-15 14:58                 ` Marco Costalba
@ 2007-07-15 15:04                   ` Sean
  2007-07-15 15:58                     ` Marco Costalba
  0 siblings, 1 reply; 29+ messages in thread
From: Sean @ 2007-07-15 15:04 UTC (permalink / raw)
  To: Marco Costalba; +Cc: Alex Riesen, Junio C Hamano, Git Mailing List

On Sun, 15 Jul 2007 16:58:16 +0200
"Marco Costalba" <mcostalba@gmail.com> wrote:

> On 7/15/07, Sean <seanlkml@sympatico.ca> wrote:
> >
> > If you only look for ^\0 (ie. first position only) the parsing should
> > be okay.  Not that it helps with the performance issue you're trying
> > to address.
> >
> revisions are streamed, there is only one first revision, all the
> others are linked togheter in a big stream.

Right, but how does that alter the notion that ^\0 is the safe way to
test for the "real" NUL terminator?

> Actually I ended up looking for the character after the '\0' (if any)
> to be 'c' that is the beginning of "commit <sha> ....." but it's a bit
> hacky.

Much more hacky than just testing for ^\0.  AFAIU, the NUL is
guaranteed to come at the start of the line and _all_ other NUL
characters are guaranteed to appear somewhere else.  Your current
test will fail if a "c" happens to follow NUL in the patch text.

Sean

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] Add --show-size to git log to print message size
  2007-07-15 15:04                   ` Sean
@ 2007-07-15 15:58                     ` Marco Costalba
  2007-07-15 16:16                       ` Sean
  0 siblings, 1 reply; 29+ messages in thread
From: Marco Costalba @ 2007-07-15 15:58 UTC (permalink / raw)
  To: Sean; +Cc: Alex Riesen, Junio C Hamano, Git Mailing List

On 7/15/07, Sean <seanlkml@sympatico.ca> wrote:
>
> characters are guaranteed to appear somewhere else.  Your current
> test will fail if a "c" happens to follow NUL in the patch text.
>

Yes sir, I agree with you 100%

The real issue IMHO is that a 'delimiting character' should not appear
in the content that it is trying to delimit. That's all the point
around this thing. If a real 'delimiting character' does not exist
because the content is free perhaps we should find a way to understand
when we are at revision boundaries. The simplest (and most efficient)
way is to know the size of the message in advance.

Also if git log filters out the '\0' chars in the content it is
suboptimal because we could miss to see we have a '\0' in content. BTW
the above mentioned patch in Linux tree was all around removing that
garbage '\0'. If diff output was filtered out perhaps the author could
have missed that patch opportunity.

Marco

BTW, perhaps I'm retarded, but I fail to see what to search for with ^\0.

As example if I have this output:


    Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>

diff --git a/fs/gfs2/rgrp.c b/fs/gfs2/rgrp.c
index fd3fd90..36c523d 100644
--- a/fs/gfs2/rgrp.c
+++ b/fs/gfs2/rgrp.c
@@ -444,7 +444,6 @@ static int compute_bitstructs(struct gfs2_rgrpd *rgd)
 }

 /**
-^@
  * gfs2_ri_total - Total up the file system space, according to the rindex.
  *
  */
^@commit 8fb68595d508fd30ec90939572484b263600376c
fad59c1390045b5adb7c7249ec4e77e0f868aca5
tree 218a457675c111e2224fb57998d38e45d5786bd1
parent fad59c1390045b5adb7c7249ec4e77e0f868aca5


What should I search for to find the revision boundary? "\n\0" ? is
this that you mean with ^\0

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [PATCH] Add --show-size to git log to print message size
  2007-07-15 15:58                     ` Marco Costalba
@ 2007-07-15 16:16                       ` Sean
  2007-07-15 16:27                         ` Marco Costalba
  0 siblings, 1 reply; 29+ messages in thread
From: Sean @ 2007-07-15 16:16 UTC (permalink / raw)
  To: Marco Costalba; +Cc: Alex Riesen, Junio C Hamano, Git Mailing List

On Sun, 15 Jul 2007 17:58:21 +0200
"Marco Costalba" <mcostalba@gmail.com> wrote:

> What should I search for to find the revision boundary? "\n\0" ? is
> this that you mean with ^\0

Essentially yes.  That pattern will never appear inside the comment
or patch text because both sections are always indented.  So in the
example you cite, "\n\0" would match the proper terminator, and the
"\n-\0" NUL would be ignored.

Sean

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] Add --show-size to git log to print message size
  2007-07-15 16:16                       ` Sean
@ 2007-07-15 16:27                         ` Marco Costalba
  2007-07-15 16:34                           ` Sean
  0 siblings, 1 reply; 29+ messages in thread
From: Marco Costalba @ 2007-07-15 16:27 UTC (permalink / raw)
  To: Sean; +Cc: Alex Riesen, Junio C Hamano, Git Mailing List

On 7/15/07, Sean <seanlkml@sympatico.ca> wrote:
> On Sun, 15 Jul 2007 17:58:21 +0200
> "Marco Costalba" <mcostalba@gmail.com> wrote:
>
> > What should I search for to find the revision boundary? "\n\0" ? is
> > this that you mean with ^\0
>
> Essentially yes.  That pattern will never appear inside the comment
> or patch text because both sections are always indented.  So in the
> example you cite, "\n\0" would match the proper terminator, and the
> "\n-\0" NUL would be ignored.
>

But your scheme does not fail if the patch is not \n terminated ?

It can happen if the patch adds lines at the end of a file and the
last line is not \n terminated.

In this case your scheme seems to miss the good next revision boundary.


Marco

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] Add --show-size to git log to print message size
  2007-07-15 16:27                         ` Marco Costalba
@ 2007-07-15 16:34                           ` Sean
  2007-07-15 16:54                             ` Marco Costalba
  0 siblings, 1 reply; 29+ messages in thread
From: Sean @ 2007-07-15 16:34 UTC (permalink / raw)
  To: Marco Costalba; +Cc: Alex Riesen, Junio C Hamano, Git Mailing List

On Sun, 15 Jul 2007 18:27:41 +0200
"Marco Costalba" <mcostalba@gmail.com> wrote:

> But your scheme does not fail if the patch is not \n terminated ?

I'm pretty sure there is always a preceding end-of-line marker,
before the terminating NUL; although i haven't broken down and
read the code yet.

> It can happen if the patch adds lines at the end of a file and the
> last line is not \n terminated.
> 
> In this case your scheme seems to miss the good next revision boundary.
> 

Just tested this case here with "git log --parents --pretty=raw -z -r -p"
and git prints out: "\ No newline at end of file"  followed by a newline
and then the NUL.

Sean

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] Add --show-size to git log to print message size
  2007-07-15 16:34                           ` Sean
@ 2007-07-15 16:54                             ` Marco Costalba
  0 siblings, 0 replies; 29+ messages in thread
From: Marco Costalba @ 2007-07-15 16:54 UTC (permalink / raw)
  To: Sean; +Cc: Alex Riesen, Junio C Hamano, Git Mailing List

On 7/15/07, Sean <seanlkml@sympatico.ca> wrote:
>
> Just tested this case here with "git log --parents --pretty=raw -z -r -p"
> and git prints out: "\ No newline at end of file"  followed by a newline
> and then the NUL.
>

Thanks. I think I will use your scheme so. But I really hope it's a
temporary workaround before git prints sizes with messages.

Thanks
Marco

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] Add --show-size to git log to print message size
  2007-07-15 12:29             ` Marco Costalba
  2007-07-15 12:35               ` Sean
@ 2007-07-15 18:14               ` Linus Torvalds
  2007-07-15 18:45                 ` Marco Costalba
  1 sibling, 1 reply; 29+ messages in thread
From: Linus Torvalds @ 2007-07-15 18:14 UTC (permalink / raw)
  To: Marco Costalba; +Cc: Alex Riesen, Junio C Hamano, Git Mailing List



On Sun, 15 Jul 2007, Marco Costalba wrote:
>
> To further push for git patch, please check this from current linux tree:
> 
> git log --parents --pretty=raw -z -r -p c4201214
> 
> As you can see the diff contains a '\0' value (actually removed by the patch).

So arguably maybe we should have turned that patch into a binary patch, 
but then it would have been really hard to read, and GNU patch and friends 
couldn't have read it.

So I think a better option would be:

> qgit of course fails, as any tool based on parsing '\0' delimiting
> character records. If the size of the patch was explicitly given
> instead no fail will occur.

You have an alternate fix, namely to only look at the NUL character at 
newline boundaries. Also, when it comes to "git log", you actually know 
even more: each commit begins with "commit".

A patch will always be nicely line-oriented, and will never have a NULL at 
the *beginning* of a line.

So you could actually make the parsing really strict, and look for the 
sequence

	'\n\0commit '

and that's a very trustworthy delimeter.

But yes, you can have NUL-characters in the middle of patches.

		Linus

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] Add --show-size to git log to print message size
  2007-07-15 18:14               ` Linus Torvalds
@ 2007-07-15 18:45                 ` Marco Costalba
  0 siblings, 0 replies; 29+ messages in thread
From: Marco Costalba @ 2007-07-15 18:45 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Alex Riesen, Junio C Hamano, Git Mailing List

On 7/15/07, Linus Torvalds <torvalds@linux-foundation.org> wrote:
>
> So you could actually make the parsing really strict, and look for the
> sequence
>
>         '\n\0commit '
>
> and that's a very trustworthy delimeter.
>

Thanks to you and Sean for the hint, anyhow I would wish the message
size patch applied if it is possible and there are no drawbacks. Tool
that parses git-log output would benefit form that and the impact (in
code lines) is very low and also does not change anything if not
activated.

Thanks
Marco

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] Add --show-size to git log to print message size
  2007-07-14 19:03 ` Junio C Hamano
  2007-07-14 20:46   ` Marco Costalba
@ 2007-07-16 12:04   ` Marco Costalba
  2007-07-16 12:31     ` Alex Riesen
  2007-07-16 17:50   ` Marco Costalba
  2 siblings, 1 reply; 29+ messages in thread
From: Marco Costalba @ 2007-07-16 12:04 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Git Mailing List

On 7/14/07, Junio C Hamano <gitster@pobox.com> wrote:
> "Marco Costalba" <mcostalba@gmail.com> writes:
>
>
> "size" is a bit vague here.  What if we later want to extend
> things so that you can ask for the entire log entry size
> including the patch output part (I am not saying that would be
> an easy change --- I am more worried about the stability of the
> external interface).  So is --show-"size".  "message-size" would
> have been a bit easier to swallow, but I sense the problem runs
> deeper.
>

I'm rewriting the patch + documentation cleaned up and with a
different option name (--show-log).

Currently the patch just include the log message because the diff
content (-p option) is not buffered but is written with a combination
of printf, puts, fputs, fwrite and putchar directly to stdout.

My question is, there is a way to get the quantity of bytes written to
stdout before they are printed? I'm not an expert of C stdio library,
so perhaps this is nonsense, but I was thinking of reading the size of
stout buffer before to fflush() (I don't know if it is possible).

If it is not possible I have to give up in extending the scope of this
'show me the size' patch to diff content and I will stick to the
proper name --show-log.

Thanks
Marco

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] Add --show-size to git log to print message size
  2007-07-16 12:04   ` Marco Costalba
@ 2007-07-16 12:31     ` Alex Riesen
  2007-07-16 17:50       ` Junio C Hamano
  0 siblings, 1 reply; 29+ messages in thread
From: Alex Riesen @ 2007-07-16 12:31 UTC (permalink / raw)
  To: Marco Costalba; +Cc: Junio C Hamano, Git Mailing List

On 7/16/07, Marco Costalba <mcostalba@gmail.com> wrote:
> Currently the patch just include the log message because the diff
> content (-p option) is not buffered but is written with a combination
> of printf, puts, fputs, fwrite and putchar directly to stdout.
>
> My question is, there is a way to get the quantity of bytes written to
> stdout before they are printed? I'm not an expert of C stdio library,
> so perhaps this is nonsense, but I was thinking of reading the size of
> stout buffer before to fflush() (I don't know if it is possible).

It is not possible. Buffers can be flushed at any time (i.e. they are
flushed when EOL reached and the output is a terminal for stdout).

It is also a bit unclear _why_ do you need the diff output. You don't
show it immediately anyway.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] Add --show-size to git log to print message size
  2007-07-16 12:31     ` Alex Riesen
@ 2007-07-16 17:50       ` Junio C Hamano
  2007-07-16 17:55         ` Marco Costalba
  0 siblings, 1 reply; 29+ messages in thread
From: Junio C Hamano @ 2007-07-16 17:50 UTC (permalink / raw)
  To: Alex Riesen; +Cc: Marco Costalba, Git Mailing List

"Alex Riesen" <raa.lkml@gmail.com> writes:

> On 7/16/07, Marco Costalba <mcostalba@gmail.com> wrote:
>> Currently the patch just include the log message because the diff
>> content (-p option) is not buffered but is written with a combination
>> of printf, puts, fputs, fwrite and putchar directly to stdout.
>>
>> My question is, there is a way to get the quantity of bytes written to
>> stdout before they are printed? I'm not an expert of C stdio library,
>> so perhaps this is nonsense, but I was thinking of reading the size of
>> stout buffer before to fflush() (I don't know if it is possible).
>
> It is not possible. Buffers can be flushed at any time (i.e. they are
> flushed when EOL reached and the output is a terminal for stdout).
>
> It is also a bit unclear _why_ do you need the diff output. You don't
> show it immediately anyway.

I'd say making --show-message-size option incompatible with diff output
would be good enough futureproofing for now.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] Add --show-size to git log to print message size
  2007-07-14 19:03 ` Junio C Hamano
  2007-07-14 20:46   ` Marco Costalba
  2007-07-16 12:04   ` Marco Costalba
@ 2007-07-16 17:50   ` Marco Costalba
  2 siblings, 0 replies; 29+ messages in thread
From: Marco Costalba @ 2007-07-16 17:50 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Git Mailing List

On 7/14/07, Junio C Hamano <gitster@pobox.com> wrote:
>
> The current code spits out a log message after formatting it in
> its entirety in core, so we happen to have its size upfront.
> Having to say the size upfront means we close the door for
> alternative implementations that stream the log formatting
> processing.
>

This is a better named patch that try also to easy your correct point
about alternative implementations, in particular  a value of zero is
reserved in case git is unable to satisfy the request, so that tools
can fallback on standard '\0' searching.

After inspecting diff.c code I made up my mind only log message can be
sized upfront, diff ocntent is printed by a bunch of printf, fwrite,
fputs, putchar and so on and it's impossible to know the size in
advance.

------------------- cut -------------------

Subject: [PATCH] Add --log-size to git log to print message size

Print message size just before the corresponding message.

Because git log output is normally read incrementally by
porcelain tools, if message size is ignored then an
expensive seek of a delimiting char, as example '\0'
must be done when parsing the output stream.

With this patch it is possible to avoid an otherwise
mandatory seek for '\0' starting from the beginning
of log body.

In case it is not possible to know the size upfront
size value is set to zero.

Signed-off-by: Marco Costalba <mcostalba@gmail.com>
---
 Documentation/git-log.txt |    5 +++++
 log-tree.c                |    3 +++
 revision.c                |    4 ++++
 revision.h                |    1 +
 4 files changed, 13 insertions(+), 0 deletions(-)

diff --git a/Documentation/git-log.txt b/Documentation/git-log.txt
index 63c1dbe..b539f50 100644
--- a/Documentation/git-log.txt
+++ b/Documentation/git-log.txt
@@ -64,6 +64,11 @@ include::pretty-options.txt[]
 --follow::
 	Continue listing the history of a file beyond renames.

+--log-size::
+	Before the log message print out its size in bytes. Intended
+	mainly for porcelain tools consumption. If git is unable to
+	produce a valid value size is set to zero.
+
 <paths>...::
 	Show only commits that affect the specified paths.

diff --git a/log-tree.c b/log-tree.c
index 8624d5a..2dc6b1b 100644
--- a/log-tree.c
+++ b/log-tree.c
@@ -295,6 +295,9 @@ void show_log(struct rev_info *opt,
 	if (opt->add_signoff)
 		len = append_signoff(&msgbuf, &msgbuf_len, len,
 				     opt->add_signoff);
+ 	if (opt->show_log_size)
+		printf("log size %i\n", len);
+
 	printf("%s%s%s", msgbuf, extra, sep);
 	free(msgbuf);
 }
diff --git a/revision.c b/revision.c
index 28b5f2e..f1cbb1f 100644
--- a/revision.c
+++ b/revision.c
@@ -1149,6 +1149,10 @@ int setup_revisions(int argc, const
 					die("unknown date format %s", arg);
 				continue;
 			}
+			if (!strcmp(arg, "--log-size")) {
+				revs->show_log_size = 1;
+				continue;
+			}

 			/*
 			 * Grepping the commit log
diff --git a/revision.h b/revision.h
index f46b4d5..98a0a8f 100644
--- a/revision.h
+++ b/revision.h
@@ -81,6 +81,7 @@ struct rev_info {
 	const char	*log_reencode;
 	const char	*subject_prefix;
 	int		no_inline;
+	int		show_log_size;

 	/* Filter by commit log message */
 	struct grep_opt	*grep_filter;
-- 
1.5.3.rc2-dirty

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [PATCH] Add --show-size to git log to print message size
  2007-07-16 17:50       ` Junio C Hamano
@ 2007-07-16 17:55         ` Marco Costalba
  2007-07-16 18:02           ` Marco Costalba
  0 siblings, 1 reply; 29+ messages in thread
From: Marco Costalba @ 2007-07-16 17:55 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Alex Riesen, Git Mailing List

On 7/16/07, Junio C Hamano <gitster@pobox.com> wrote:
>
> I'd say making --show-message-size option incompatible with diff output
> would be good enough futureproofing for now.
>
Oooops, I didn't see your post.

I agree 100%, please tell me if doc it's clear enough or it would be
better to clarify that "message log" it means only message and no diff
content.

Marco

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] Add --show-size to git log to print message size
  2007-07-16 17:55         ` Marco Costalba
@ 2007-07-16 18:02           ` Marco Costalba
  2007-07-16 22:37             ` Junio C Hamano
  0 siblings, 1 reply; 29+ messages in thread
From: Marco Costalba @ 2007-07-16 18:02 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Alex Riesen, Git Mailing List

On 7/16/07, Marco Costalba <mcostalba@gmail.com> wrote:
> On 7/16/07, Junio C Hamano <gitster@pobox.com> wrote:
> >
> > I'd say making --show-message-size option incompatible with diff output
> > would be good enough futureproofing for now.
> >
> Oooops, I didn't see your post.
>
> I agree 100%, please tell me if doc it's clear enough or it would be
> better to clarify that "message log" it means only message and no diff
> content.
>

Sorry to bother you again, but my English is very bad and I would like
to be clear.

When i say no diff content I mean that git log --log-size -p it's a
perfect valid command but --log-size will make git print the size of
_only_ the log part, it means from the line after "log size xxx\n"
until the end of log message that can be '\0' (if no diff) or the
beginning of diff part.

Sorry to be pedantic.

Marco

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] Add --show-size to git log to print message size
  2007-07-16 18:02           ` Marco Costalba
@ 2007-07-16 22:37             ` Junio C Hamano
  0 siblings, 0 replies; 29+ messages in thread
From: Junio C Hamano @ 2007-07-16 22:37 UTC (permalink / raw)
  To: Marco Costalba; +Cc: Alex Riesen, Git Mailing List

"Marco Costalba" <mcostalba@gmail.com> writes:

> On 7/16/07, Marco Costalba <mcostalba@gmail.com> wrote:
>> On 7/16/07, Junio C Hamano <gitster@pobox.com> wrote:
>> >
>> > I'd say making --show-message-size option incompatible with diff output
>> > would be good enough futureproofing for now.
>> >
>> Oooops, I didn't see your post.
>>
>> I agree 100%, please tell me if doc it's clear enough or it would be
>> better to clarify that "message log" it means only message and no diff
>> content.
>>
>
> Sorry to bother you again, but my English is very bad and I would like
> to be clear.
>
> When i say no diff content I mean that git log --log-size -p it's a
> perfect valid command but --log-size will make git print the size of
> _only_ the log part, it means from the line after "log size xxx\n"
> until the end of log message that can be '\0' (if no diff) or the
> beginning of diff part.

What I originally meant with my comment was to _error_ out if
the user says "git log --log-size -p" (or "git show --log-size"
without "-s").

But I guess it is probably fine, as long as it is crystal clear
to the users that we do not do size for non log part.  I still
have this nagging feeling that this is an ugly workaround for Qt
library's programming interface, though...

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] Add --show-size to git log to print message size
  2007-07-14 16:52 [PATCH] Add --show-size to git log to print message size Marco Costalba
  2007-07-14 19:03 ` Junio C Hamano
@ 2007-07-17  7:49 ` Andy Parkins
  2007-07-17 16:36   ` Marco Costalba
  2007-07-25  4:03 ` Junio C Hamano
  2 siblings, 1 reply; 29+ messages in thread
From: Andy Parkins @ 2007-07-17  7:49 UTC (permalink / raw)
  To: git; +Cc: Marco Costalba, Junio C Hamano

On Saturday 2007 July 14, Marco Costalba wrote:
> Print message size just before the corresponding message
> to speedup the parsing by scripts/porcelains tools.

Does this really give a speedup?

I'd be surprised, as long as the parse is being done during the output from 
git using the QProcess::readyRead() signal once and only once, then git is 
the bottle neck.  Parsing the stream is almost trivial in comparison to the 
work that git is doing.


Andy
-- 
Dr Andy Parkins, M Eng (hons), MIET
andyparkins@gmail.com

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] Add --show-size to git log to print message size
  2007-07-17  7:49 ` Andy Parkins
@ 2007-07-17 16:36   ` Marco Costalba
  0 siblings, 0 replies; 29+ messages in thread
From: Marco Costalba @ 2007-07-17 16:36 UTC (permalink / raw)
  To: Andy Parkins; +Cc: git, Junio C Hamano

On 7/17/07, Andy Parkins <andyparkins@gmail.com> wrote:
>
> Does this really give a speedup?
>
> I'd be surprised, as long as the parse is being done during the output from
> git using the QProcess::readyRead() signal once and only once, then git is
> the bottle neck.  Parsing the stream is almost trivial in comparison to the
> work that git is doing.
>

Well, it's not exactly from readyRead() but you are right, the bottle
neck it's git.

A bare:
time git log --parents --boundary --pretty=raw -z --log-size
--topo-order HEAD > /dev/null

gives on linux and git repositories
git 960ms
linux 6920ms

Instead qgit gives, with the same command:

WITHOUT --log-size patch
git  1170ms  (+21%)
linux 7942ms (+15%)


WITH --log-size patch
git 1125ms (+17%)
linux 7820ms (+13%)

I don't know why overhead on Linux is less, probably on small repos
the GUI part of qgit that you can consider more or less a constant
time weights more regarding parsing.


Marco

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] Add --show-size to git log to print message size
  2007-07-14 16:52 [PATCH] Add --show-size to git log to print message size Marco Costalba
  2007-07-14 19:03 ` Junio C Hamano
  2007-07-17  7:49 ` Andy Parkins
@ 2007-07-25  4:03 ` Junio C Hamano
  2007-07-25  9:38   ` Marco Costalba
  2 siblings, 1 reply; 29+ messages in thread
From: Junio C Hamano @ 2007-07-25  4:03 UTC (permalink / raw)
  To: Marco Costalba; +Cc: Git Mailing List

I've been re-reviewing recent patches, and this is one of them.
However, I am wondering if this is an intended behaviour...

: git.git master; ./git-log --log-size --abbrev-commit --pretty=oneline \
  ko/master..master
9d468ac... log size 47
Add --log-size to git log to print message size
ca193cf... log size 40
git am: skip pine's internal folder data
d1cc130... log size 48
Teach git-commit about commit message templates.
af66366... log size 41
Teach approxidate() to understand "never"
7b69b87... log size 64
git log -g: Complain, but do not fail, when no reflogs are there
2d8ae40... log size 49
send-email: Update regex parsing for pine aliases
f836f1a... log size 40
cvsexportcommit: avoid racy CVS problem.
1843d8d... log size 53
cleanup unpack-trees.c: shrink struct tree_entry_list
24d0063... log size 56
filter-branch: fix dash complaining about "Missing '))'"
3473e7d... log size 56
gitweb: More detailed error messages for snapshot format
93c22ee... log size 47
git.el: Support for incremental status updates.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] Add --show-size to git log to print message size
  2007-07-25  4:03 ` Junio C Hamano
@ 2007-07-25  9:38   ` Marco Costalba
  0 siblings, 0 replies; 29+ messages in thread
From: Marco Costalba @ 2007-07-25  9:38 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Git Mailing List, Paul Mackerras

On 7/25/07, Junio C Hamano <gitster@pobox.com> wrote:
> I've been re-reviewing recent patches, and this is one of them.

Thanks for pushing to pu. Please tell me if you want me to send you a
patch to update to what we have discussed, specifically using "length
unknown" and changing the name to --show-lengths.

> However, I am wondering if this is an intended behaviour...
>
> : git.git master; ./git-log --log-size --abbrev-commit --pretty=oneline \
>   ko/master..master
> 9d468ac... log size 47
> Add --log-size to git log to print message size
> ca193cf... log size 40

Well, the patch should be used to speedup the parsing by tools because
you can skip big part of the record and jump directly to the beginning
of the next one. So IMHO I don't see a lot of sense in using it
together with --pretty=oneline.

Anyway only default options should be guaranteed to behave correctly
with all the other options. In general, responsibility for what you
see on the screen it's on the tips of user's fingers. IMHO
responsibility of git is of not crashing and do not show incorrect
info, not that the info should be useful.

Paul, I don't know gitk and Tcl to being able to answer myself, but I
would like to know if this new option could be useful also for gitk.

This option, after the first line, gives the size of the following
part of the record. Does this allow you to delay the parsing of the
biggest part of the commit record?

Author name, date, log title, log message could be read only when it's
needed, so that after reading the first couple lines of a commit you
can point directly to the beginning of the next one skipping the rest.

Marco

^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2007-07-25 10:03 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2007-07-14 16:52 [PATCH] Add --show-size to git log to print message size Marco Costalba
2007-07-14 19:03 ` Junio C Hamano
2007-07-14 20:46   ` Marco Costalba
2007-07-15  9:35     ` Alex Riesen
2007-07-15 10:06       ` Marco Costalba
2007-07-15 10:48         ` Alex Riesen
2007-07-15 11:32           ` Marco Costalba
2007-07-15 12:29             ` Marco Costalba
2007-07-15 12:35               ` Sean
2007-07-15 14:58                 ` Marco Costalba
2007-07-15 15:04                   ` Sean
2007-07-15 15:58                     ` Marco Costalba
2007-07-15 16:16                       ` Sean
2007-07-15 16:27                         ` Marco Costalba
2007-07-15 16:34                           ` Sean
2007-07-15 16:54                             ` Marco Costalba
2007-07-15 18:14               ` Linus Torvalds
2007-07-15 18:45                 ` Marco Costalba
2007-07-16 12:04   ` Marco Costalba
2007-07-16 12:31     ` Alex Riesen
2007-07-16 17:50       ` Junio C Hamano
2007-07-16 17:55         ` Marco Costalba
2007-07-16 18:02           ` Marco Costalba
2007-07-16 22:37             ` Junio C Hamano
2007-07-16 17:50   ` Marco Costalba
2007-07-17  7:49 ` Andy Parkins
2007-07-17 16:36   ` Marco Costalba
2007-07-25  4:03 ` Junio C Hamano
2007-07-25  9:38   ` Marco Costalba

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.