From: Michael Haggerty <mhagger@alum.mit.edu>
To: Junio C Hamano <gitster@pobox.com>
Cc: Stefan Beller <sbeller@google.com>, Jeff King <peff@peff.net>,
	git@vger.kernel.org
Subject: Re: [PATCH v3 12/19] initial_ref_transaction_commit(): check for duplicate refs
Date: Tue, 23 Jun 2015 09:11:32 +0200
Message-ID: <558906A4.8060106@alum.mit.edu>
In-Reply-To: <xmqqtwtzfo79.fsf@gitster.dls.corp.google.com>

On 06/22/2015 11:06 PM, Junio C Hamano wrote:
> Michael Haggerty <mhagger@alum.mit.edu> writes:
> 
>> Error out if the ref_transaction includes more than one update for any
>> refname.
>>
>> Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
>> ---
>>  refs.c | 11 +++++++++++
>>  1 file changed, 11 insertions(+)
> 
> This somehow feels like "ehh, I now know better and this function
> should have been like this from the beginning" to me.
> 
> But that is OK.
> 
> Is the initial creation logic too fragile to deserve its own
> function to force callers to think about it, by the way?
> 
> What I am wondering is if we could turn the safety logic that appears
> here (i.e. no existing refs must be assumed by the set of updates,
> etc.)  into an optimization cue and implement this as a special case
> helper to ref_transaction_commit(), i.e.
> 
> 	ref_transaction_commit(...)
> 	{
> 		if (updates are all initial creation &&
> 		    no existing refs in repository)
> 			return initial_ref_transaction_commit(...);
> 		/* otherwise we do the usual thing */
> 		...
> 	}
> 
> and have "clone" call ref_transaction_commit() as usual.
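
For illustration only, the dispatch Junio describes can be sketched in C as below. The types `struct sketch_update` and `struct sketch_transaction`, the enum, and the function `choose_commit_path()` are hypothetical stand-ins, not Git's refs API, and `repo_has_refs` is assumed to be supplied by the caller:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of a ref transaction: only the
 * fields the dispatch decision needs. Not Git's real types. */
struct sketch_update {
	const char *refname;
	bool is_creation;	/* true if the ref must not exist yet */
};

struct sketch_transaction {
	const struct sketch_update *updates;
	size_t nr;
};

enum commit_path {
	COMMIT_PATH_INITIAL,	/* optimized initial_ref_transaction_commit() */
	COMMIT_PATH_NORMAL	/* usual locking ref_transaction_commit() */
};

/*
 * Take the fast path only if the repository has no refs yet and
 * every update in the transaction is a pure creation.
 */
static enum commit_path choose_commit_path(const struct sketch_transaction *t,
					   bool repo_has_refs)
{
	size_t i;

	assert(t != NULL);
	if (repo_has_refs)
		return COMMIT_PATH_NORMAL;
	for (i = 0; i < t->nr; i++)
		if (!t->updates[i].is_creation)
			return COMMIT_PATH_NORMAL;
	return COMMIT_PATH_INITIAL;
}
```

Note that `repo_has_refs` is only a snapshot taken before the commit; nothing in this check stops another process from creating or locking refs afterwards, which is the gap the reply points out.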

The safety logic in this function is (approximately) necessary, but not
sufficient, to guarantee safety. One of the shortcuts that it takes is
not locking the references while they are being created. Therefore, it
would be unsafe for one process to call ref_transaction_commit() while
another is calling initial_ref_transaction_commit(). So the caller has
to "know" somehow that no other processes are working in the repository
for this optimization to be safe. It conveys that knowledge by calling
initial_ref_transaction_commit() rather than ref_transaction_commit().
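
To make the locking point concrete, here is a minimal POSIX sketch of a "<ref>.lock" protocol. It assumes lock files are created with O_CREAT | O_EXCL, in the style of Git's lockfile code; `sketch_lock()` and `sketch_unlock()` are hypothetical helpers, not Git functions. A writer that skips this step, as the initial-creation path does, never sees the EEXIST that would otherwise tell it another process is updating the same reference:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper: try to take "<path>.lock". O_CREAT | O_EXCL
 * makes open() fail with EEXIST if the lock file already exists,
 * i.e. if some other process currently holds the lock.
 * Returns the lock's file descriptor, or -1 on failure.
 */
static int sketch_lock(const char *path, char *lockpath, size_t len)
{
	int n;

	assert(path != NULL && lockpath != NULL);
	n = snprintf(lockpath, len, "%s.lock", path);
	if (n < 0 || (size_t)n >= len)
		return -1;	/* lock path would be truncated */
	return open(lockpath, O_WRONLY | O_CREAT | O_EXCL, 0666);
}

/* Hypothetical helper: release a lock taken by sketch_lock(). */
static void sketch_unlock(int fd, const char *lockpath)
{
	close(fd);
	unlink(lockpath);
}
```

Only one of two concurrent callers can hold the lock at a time; the other gets -1 with errno set to EEXIST and must retry or give up.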

Of course the next question is, "How does `git clone` know that no other
process is working in the new repository?" Actually, it doesn't. For
example, I just verified that I can run

    git clone $URL mygit &
    sleep 0.1
    cd mygit
    git commit --allow-empty -m "New root commit"

and thereby "overwrite" the upstream `master` without the usual
non-fast-forward protection. I guess we are just relying on the user's
common sense not to run Git commands in a new repository before its
creation is complete.

I suppose we *could* special-case `git clone` to not finish the
initialization of the repository (for example, not write the `config`
file) until *after* the packed-refs file is written. This would prevent
other git processes from recognizing the directory as a Git repository
and so prevent them from running before the clone is finished.

But if anything, I think it would make more sense to go in the other direction:

* Teach ref_transaction_commit() an option that asks it to write
  reference updates to packed-refs instead of loose refs (but
  locking the references as usual).

* Change clone to use ref_transaction_commit() like everybody
  else, passing it the new REFS_WRITE_TO_PACKED_REFS option.

Then clone would participate in the normal locking protocol, and it
wouldn't *matter* if another process runs before the clone is finished.
There would also be some consistency benefits. For example, if
core.logallrefupdates is set globally or on the command line, the
initial reference creations would be reflogged. And other operations
that write references in bulk could use the new
REFS_WRITE_TO_PACKED_REFS option to prevent loose reference proliferation.
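
The proposal can be sketched with a toy in-memory model. Everything here is hypothetical: REFS_WRITE_TO_PACKED_REFS is only the option name proposed in this message, and `struct sketch_repo` and `sketch_transaction_commit()` are illustrative stand-ins for the real refs code. The point is the ordering: every reference is locked first, exactly as in the normal path, and the flag only changes where the new values are written:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical flag name taken from the proposal; not a real Git flag. */
#define REFS_WRITE_TO_PACKED_REFS 0x01
#define SKETCH_MAX_REFS 16

/* Toy in-memory repository: a lock table plus two reference stores. */
struct sketch_repo {
	const char *locked[SKETCH_MAX_REFS];
	size_t nr_locked;
	const char *loose[SKETCH_MAX_REFS];
	size_t nr_loose;
	const char *packed[SKETCH_MAX_REFS];
	size_t nr_packed;
};

static bool sketch_is_locked(const struct sketch_repo *r, const char *refname)
{
	size_t i;

	for (i = 0; i < r->nr_locked; i++)
		if (!strcmp(r->locked[i], refname))
			return true;
	return false;
}

/*
 * Lock every ref, write them all to the store selected by `flags`,
 * then release the locks. Returns 0 on success, or -1 without
 * writing anything if any ref is already locked (including by an
 * earlier entry of the same transaction, i.e. a duplicate refname).
 */
static int sketch_transaction_commit(struct sketch_repo *r,
				     const char **refnames, size_t nr,
				     unsigned int flags)
{
	size_t i, first_lock = r->nr_locked;

	assert(r->nr_locked + nr <= SKETCH_MAX_REFS);
	assert(r->nr_loose + nr <= SKETCH_MAX_REFS);
	assert(r->nr_packed + nr <= SKETCH_MAX_REFS);

	/* Phase 1: take all locks, as the normal protocol does. */
	for (i = 0; i < nr; i++) {
		if (sketch_is_locked(r, refnames[i])) {
			r->nr_locked = first_lock;	/* drop our own locks */
			return -1;
		}
		r->locked[r->nr_locked++] = refnames[i];
	}

	/* Phase 2: the flag only chooses where the values land. */
	for (i = 0; i < nr; i++) {
		if (flags & REFS_WRITE_TO_PACKED_REFS)
			r->packed[r->nr_packed++] = refnames[i];
		else
			r->loose[r->nr_loose++] = refnames[i];
	}

	/* Phase 3: release the locks taken in phase 1. */
	r->nr_locked = first_lock;
	return 0;
}
```

In this model, a transaction that names the same refname twice fails at the lock step, so it gets the same protection that the duplicate-refname check adds to initial_ref_transaction_commit().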

But I don't think any of this is a problem in practice, and I think we
can live with using the optimized-but-not-100%-safe
initial_ref_transaction_commit() for cloning.

Michael

-- 
Michael Haggerty
mhagger@alum.mit.edu
