* git: detect file creator
@ 2022-07-15 12:39 Sim Tov
  2022-07-15 13:22 ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 6+ messages in thread
From: Sim Tov @ 2022-07-15 12:39 UTC (permalink / raw)
  To: git

Hello,

I run a book digitizing project and pay people a certain rate per 10K
characters for the text files they upload to a git repo. Until now I
have been using the following command to detect files authored by
CertainEditor:

    git log --use-mailmap --no-merges --author="CertainEditor" \
        --name-only --pretty=format:""

Then I would pipe this output into `wc -m` to get the number of characters
authored by CertainEditor and pay him accordingly. Usually editors do
not touch each other's files and everything worked well. However
recently one editor spotted a typo in somebody else's file and
corrected it. This behavior is actually good and I would like to
encourage it. However, now the command above lists the corrected file
also as his, and so he gets paid for all the characters in the file
while he changed only one of them. This, obviously, is not good.

1. Do you have an idea how I can list all the files **created** (not
authored / committed) by a user, so I can implement a fair character
count?

2. Maybe some commit hooks can be used that will check whether the
Author of a new commit is different from the previous one and if true
- override it to the previous Author?

3. Those small changes by a non-creator may be left unpaid (as
this action is not so intensive and may be reciprocal), but if you
have a good idea how I can pay for the "diff" the non-creator provides
- it would be nice! Do you think this "diff" should be deducted from
the creator? And if yes - how?

Thank you!


* Re: git: detect file creator
  2022-07-15 12:39 git: detect file creator Sim Tov
@ 2022-07-15 13:22 ` Ævar Arnfjörð Bjarmason
  2022-07-16 21:38   ` Sim Tov
  0 siblings, 1 reply; 6+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-07-15 13:22 UTC (permalink / raw)
  To: Sim Tov; +Cc: git


On Fri, Jul 15 2022, Sim Tov wrote:

> Hello,
>
> I run a book digitizing project and pay people certain rate per 10K
> characters for the text files they upload to a git repo. Till now I

It's nice to see that someone still believes in The Mythical Man-Month
:)

> was using following command to detect files authored by CertainEditor:
>
>     git log --use-mailmap --no-merges --author="CertainEditor"
> --name-only --pretty=format:""
>
> Then I would pipe this output in `wc -m` and get amount of characters
> authored by CertainEditor and pay him accordingly. Usually editors do
> not touch each other's files and everything worked well. However
> recently one editor spotted a typo in somebody else's file and
> corrected it. This behavior is actually good and I would like to
> encourage it. However, now the command above lists the corrected file
> also as his, and so he gets paid for all the characters in the file
> while he changed only one of them. This, obviously, is not good.
>
> 1. Do you have an idea how can I list all the files **created** (not
> authored / committed) by a user, so I can implement a fair characters
> counting?

If you want to adapt your current script perhaps --diff-filter helps,
but...

> 2. Maybe some commit hooks can be used that will check whether the
> Author of a new commit is different from the previous one and if true
> - override it to the previous Author?

..it seems you should fundamentally stop using it, and instead iterate
over the commits, and pay for a "diff". Then you'd get the original
change, as well as the change-on-top.
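
A minimal sketch of what "pay for a diff" could look like, assuming
payment is based on the characters found on added lines of each
author's patches. The function names `added_chars` and
`added_chars_by_author` are hypothetical, and text that is added and
later deleted is still counted:

```python
import subprocess


def added_chars(patch: str) -> int:
    """Count characters on added lines of unified-diff text."""
    total = 0
    in_hunk = False
    for line in patch.split("\n"):
        if line.startswith("diff --git "):
            in_hunk = False          # file headers ("+++ b/...") follow
        elif line.startswith("@@"):
            in_hunk = True           # hunk header: content lines follow
        elif in_hunk and line.startswith("+"):
            total += len(line) - 1   # exclude the "+" marker itself
    return total


def added_chars_by_author(author: str, repo: str = ".") -> int:
    """Sum added characters over all non-merge commits by one author."""
    out = subprocess.run(
        ["git", "-C", repo, "log", "--use-mailmap", "--no-merges",
         f"--author={author}", "--format=", "-p"],
        capture_output=True, encoding="utf-8", errors="replace", check=True,
    ).stdout
    return added_chars(out)
```

Line terminators are not counted here, so the totals will differ
slightly from what `wc -m` reports for whole files.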

> 3. Those small changes by a non-creator may be left not paid for (as
> this action is not so intensive and may be reciprocal), but if you
> have a good idea how I can pay for the "diff" the non-creator provides
> - it would be nice!

Just wc -l on the changed file(s) before & after, and pay the abs()
difference.

> Do you think this "diff" should be deducted from the creator? And if
> yes - how?

You could walk it back with "git blame" I guess.

But you might want to consider the economic & social mis-incentives of
lifting money from your co-workers' coffers by pointing out a mistake to
them...



* Re: git: detect file creator
  2022-07-15 13:22 ` Ævar Arnfjörð Bjarmason
@ 2022-07-16 21:38   ` Sim Tov
  2022-07-18 20:12     ` Sim Tov
  0 siblings, 1 reply; 6+ messages in thread
From: Sim Tov @ 2022-07-16 21:38 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: git

Thank you very much, again, for the very useful suggestions!

> > 1. Do you have an idea how can I list all the files **created** (not
> > authored / committed) by a user, so I can implement a fair characters
> > counting?
>
> If you want to adapt your current script perhaps --diff-filter helps,
> but...

I added `--diff-filter=AR` to my original command like this:

    git log --use-mailmap --no-merges --diff-filter=AR \
        --author="CertainEditor" --name-only --pretty=format:""

and it seems to do the job! Have I missed or messed up something here?

> > 2. Maybe some commit hooks can be used that will check whether the
> > Author of a new commit is different from the previous one and if true
> > - override it to the previous Author?
>
> ..it seems you should fundamentally stop using it, and instead iterate
> over the commits, and pay for a "diff". Then you'd get the original
> change, as well as the change-on-top.

2.1. Won't such an iteration over history be more time-consuming than
my command?
2.2. Won't it also count useless "diff"s, such as adding some rubbish and
then deleting it, so that I'll have to pay for them?

> > 3. Those small changes by a non-creator may be left not paid for (as
> > this action is not so intensive and may be reciprocal), but if you
> > have a good idea how I can pay for the "diff" the non-creator provides
> > - it would be nice!
>
> Just wc -l on the changed files(s) before & after, and pay the abs()
> difference.

I pay per character, so you probably mean `wc -m`... but what do you
mean by changed files? The command above is part of a script I run
whenever I want to measure each editor's character count, possibly
even when there are no specific recent changes...

> > Do you think this "diff" should be deducted from the creator? And if
> > yes - how?
>
> You could walk it back with "git blame" I guess.
>
> But you might want to consider the economic & social mis-incentives of
> lifting money from your co-workers coffers by pointing out a mistake to
> them...

Good point :-) I will not do it.


* Re: git: detect file creator
  2022-07-16 21:38   ` Sim Tov
@ 2022-07-18 20:12     ` Sim Tov
  2022-07-18 20:40       ` Sim Tov
  0 siblings, 1 reply; 6+ messages in thread
From: Sim Tov @ 2022-07-18 20:12 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: git

> Thank you very much, again, for the very useful suggestions!
>
> > > 1. Do you have an idea how can I list all the files **created** (not
> > > authored / committed) by a user, so I can implement a fair characters
> > > counting?
> >
> > If you want to adapt your current script perhaps --diff-filter helps,
> > but...
>
> I added `--diff-filter=AR` to my original command like this:
>
> git log --use-mailmap --no-merges --diff-filter=AR
> --author="CertainEditor" --name-only --pretty=format:""
>
> and it seems to do the job! May I have missed/messed something here?

Now I see that if one editor renames (or moves) files created by
another editor, the former gets credit for all the characters inside
those renamed files.
This is bad. And it seemingly stems from the fact that
`--diff-filter=AR` means `A` OR `R`. Is there a way to do `A` OR (`A`
AND `R`), meaning that a renamed file is listed only if it was also
created by that same author?
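
One way to approximate the "`A` OR (`A` AND `R`)" rule is to walk the
history oldest-first and carry the original creator across renames.
The sketch below is only an illustration: the `@` prefix on author
lines and the names `creators` and `creator_log` are hypothetical
choices, and files first introduced by merge commits are missed
because of `--no-merges`:

```python
import subprocess


def creators(log_text: str) -> dict[str, str]:
    """Map each current path to the author who first added it.

    Expects the output of:
      git log --reverse --no-merges -M --name-status --format=@%aN
    """
    creator: dict[str, str] = {}
    author = ""
    for line in log_text.split("\n"):
        if line.startswith("@"):
            author = line[1:]                 # start of a new commit
            continue
        fields = line.split("\t")
        status = fields[0]
        if status == "A" and len(fields) == 2:
            creator.setdefault(fields[1], author)
        elif status.startswith("R") and len(fields) == 3:
            old, new = fields[1], fields[2]
            # A rename keeps the original creator; the renamer is used
            # only when the origin of the file is unknown.
            creator[new] = creator.pop(old, author)
        elif status == "D" and len(fields) == 2:
            creator.pop(fields[1], None)      # deleted files drop out
    return creator


def creator_log(repo: str = ".") -> str:
    """Run git and return the log text that creators() parses."""
    return subprocess.run(
        ["git", "-C", repo, "-c", "core.quotePath=false", "log",
         "--use-mailmap", "--no-merges", "--reverse", "-M",
         "--name-status", "--format=@%aN"],
        capture_output=True, encoding="utf-8", errors="replace", check=True,
    ).stdout
```

The files to pay CertainEditor for would then be the paths `p` in
`creators(creator_log())` whose value is "CertainEditor".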


* Re: git: detect file creator
  2022-07-18 20:12     ` Sim Tov
@ 2022-07-18 20:40       ` Sim Tov
  2022-07-22  6:11         ` Junio C Hamano
  0 siblings, 1 reply; 6+ messages in thread
From: Sim Tov @ 2022-07-18 20:40 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: git

> > Thank you very much, again, for the very useful suggestions!
> >
> > > > 1. Do you have an idea how can I list all the files **created** (not
> > > > authored / committed) by a user, so I can implement a fair characters
> > > > counting?
> > >
> > > If you want to adapt your current script perhaps --diff-filter helps,
> > > but...
> >
> > I added `--diff-filter=AR` to my original command like this:
> >
> > git log --use-mailmap --no-merges --diff-filter=AR
> > --author="CertainEditor" --name-only --pretty=format:""
> >
> > and it seems to do the job! May I have missed/messed something here?
>
> Now I see that if one editor renames(/moves) the files created by
> another editor - the former gets credits on all the characters inside
> those renamed files.
> This is bad. And it seemingly stems from the fact that in
> `--diff-filter=AR` means `A` OR `R`. Is there a way to do `A` OR (`A`
> AND `R`), meaning if a file was
> renamed then list it only if he also was created by that same author...

PS: the only way I see to approach it is to create a mechanism that
will prevent anyone except the file creator (the one who added the
file) from renaming files.
Can such a commit hook be created? If yes - how?


* Re: git: detect file creator
  2022-07-18 20:40       ` Sim Tov
@ 2022-07-22  6:11         ` Junio C Hamano
  0 siblings, 0 replies; 6+ messages in thread
From: Junio C Hamano @ 2022-07-22  6:11 UTC (permalink / raw)
  To: Sim Tov; +Cc: Ævar Arnfjörð Bjarmason, git

Sim Tov <smntov@gmail.com> writes:

> PS: the only way I see to approach it is to create a mechanism that
> will prevent renaming
> files by anyone except for the file creator (the one who Added file).
> Can such a commit hook be created? If yes - how?

As Git is distributed, local hooks cannot fundamentally be used as
an enforcement mechanism.

How about running "git blame" on the end-result, with -C/-M turned
on, instead of paying too much attention to file creation?  That
way, at least you can trace who contributed the words on each line.
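
A rough sketch of that idea, assuming the per-author tally is taken
from `git blame --line-porcelain` output for one file. The helper
names `chars_per_author` and `blame_file` are hypothetical, and line
terminators are not counted:

```python
import subprocess
from collections import Counter


def chars_per_author(porcelain: str) -> Counter:
    """Tally characters per author from `git blame --line-porcelain`."""
    counts: Counter = Counter()
    author = ""
    for line in porcelain.split("\n"):
        if line.startswith("\t"):
            # Content lines are prefixed with a single TAB.
            counts[author] += len(line) - 1
        elif line.startswith("author "):
            # "author-mail", "author-time", etc. do not match this prefix.
            author = line[len("author "):]
    return counts


def blame_file(path: str, repo: str = ".") -> Counter:
    """Blame one file with move/copy detection and tally its characters."""
    out = subprocess.run(
        ["git", "-C", repo, "blame", "--line-porcelain", "-M", "-C",
         "--", path],
        capture_output=True, encoding="utf-8", errors="replace", check=True,
    ).stdout
    return chars_per_author(out)
```

Summing `blame_file(p)` over every tracked text file would give one
total per editor for the end result, independent of who created or
renamed each file.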

