All of lore.kernel.org
 help / color / mirror / Atom feed
From: William Sprent <williams@unity3d.com>
To: Victoria Dye <vdye@github.com>,
	William Sprent via GitGitGadget <gitgitgadget@gmail.com>,
	git@vger.kernel.org
Cc: "Eric Sunshine" <sunshine@sunshineco.com>,
	"Ævar Arnfjörð Bjarmason" <avarab@gmail.com>,
	"Derrick Stolee" <derrickstolee@github.com>,
	"Elijah Newren" <newren@gmail.com>
Subject: Re: [PATCH v2] ls-tree: add --sparse-filter-oid argument
Date: Wed, 25 Jan 2023 14:47:06 +0100	[thread overview]
Message-ID: <569043fb-9766-037e-c587-1545c2978e7d@unity3d.com> (raw)
In-Reply-To: <7ccf7b17-4448-5ef4-63b1-9073a400e486@github.com>

On 24/01/2023 21.11, Victoria Dye wrote:
> William Sprent via GitGitGadget wrote:
>> From: William Sprent <williams@unity3d.com>
>>
>> There is currently no way to ask git the question "which files would be
>> part of a sparse checkout of commit X with sparse checkout patterns Y".
>> One use-case would be that tooling may want know whether sparse checkouts
>> of two commits contain the same content even if the full trees differ.
>> Another interesting use-case would be for tooling to use in conjunction
>> with 'git update-index --index-info'.
> 
> These types of use cases align nicely with "Behavior A" as described in
> 'Documentation/technical/sparse-checkout.txt' [1]. I think adding a
> '--scope=(sparse|all)' option to 'ls-trees' would be a good way to make
> progress on the goals of that design document as well as serve the needs
> described above.
> 
> [1] https://lore.kernel.org/git/pull.1367.v4.git.1667714666810.gitgitgadget@gmail.com/
> >>
>> 'rev-list --objects --filter=sparse:oid' comes close, but as rev-list is
>> concerned with objects rather than directory trees, it leaves files out
>> when the same blob occurs in at two different paths.
>>
>> It is possible to ask git about the sparse status of files currently in
>> the index with 'ls-files -t'. However, this does not work well when the
>> caller is interested in another commit, intererested in sparsity
>> patterns that aren't currently in '.git/info/sparse-checkout', or when
>> working in with bare repo.
> 
> I'm a bit confused by your described use cases here, though. Right now,
> 'sparse-checkout' is a local repo-only optimization (for performance and/or
> scoping user workspaces), but you seem to be implying a need for
> "sparse-checkout patterns" as a general-purpose data format (like a config
> file or 'rebase-todo') that can be used to manually configure command
> behavior.
> 
> If that is what you're getting at, it seems like a pretty substantial
> expansion to the scope of what we consider "sparse checkout". That's not to
> say it isn't a good idea - I can definitely imagine tooling where this type
> of functionality is useful - just that it warrants careful consideration so
> we don't over-complicate the typical sparse-checkout user experience.
> 

I guess it is sort of what I'm getting at. I want to enable tooling (in 
particular) to be able to interrogate git about sparse checkouts without 
having to checkout a commit and load a set of patterns. There's already 
a deterministic relationship between a (commit * sparse checkout 
patterns) and a list of included files, so I think it makes sense to be 
able to ask git about it.

I agree with your concerns about not over complicating things for the 
average user. I think the cone/non-cone distinctions are already a bit 
hard to grasp (at least it was for me). FWIW, it isn't my intention that 
an average user should use the "explicitly pass patterns to ls-tree" 
functionality. But of course we should still get it right.


>>
>> To fill this gap, add a new argument to ls-tree '--sparse-filter-oid'
>> which takes the object id of a blob containing sparse checkout patterns
>> that filters the output of 'ls-tree'. When filtering with given sparsity
>> patterns, 'ls-tree' only outputs blobs and commit objects that
>> match the given patterns.
> 
> I don't think a blob OID is the best way to specify an arbitrary pattern set
> in this case. OIDs will only be created for blobs that are tracked in Git;
> it's possible that your project tracks potential sparse-checkout patterns in
> Git, but it seems overly restrictive. You could instead specify the file
> with a path on-disk (like the '--file' options in various commands, e.g.
> 'git commit'); if you did need to get that file from the object store, you
> could first get its contents with 'git cat-file'.
> 

I'm fine with changing it to be a file. I picked it to align with

  'git rev-list --filter=sparse:<blob-oid>'

which I think is the only other place in the code base where you can 
query about arbitrary filters. But I agree, it is a bit awkward to use.

>>
>> While it may be valid in some situations to output a tree object -- e.g.
>> when a cone pattern matches all blobs recursively contained in a tree --
>> it is less unclear what should be output if a sparse pattern matches
>> parts of a tree.
>>
>> To allow for reusing the pattern matching logic found in
>> 'path_in_sparse_checkout_1()' in 'dir.c' with arbitrary patterns,
>> extract the pattern matching part of the function into its own new
>> function 'recursively_match_path_with_sparse_patterns()'.
>>
>> Signed-off-by: William Sprent <williams@unity3d.com>
>> ---
>>      ls-tree: add --sparse-filter-oid argument
>>      
>>      I'm resubmitting this change as rebased on top of 'master', as it
>>      conflicted with the topic 'ls-tree.c: clean-up works' 1
>>      [https://public-inbox.org/git/20230112091135.20050-1-tenglong.tl@alibaba-inc.com],
>>      which was merged to 'master' recently.
>>      
>>      This versions also incorporates changes based on the comments made in 2
>>      [https://public-inbox.org/git/CAPig+cRgZ0CrkqY7mufuWmhf6BC8yXjXXuOTEQjuz+Y0NA+N7Q@mail.gmail.com/].
>>      
>>      I'm also looping in contributors that have touched ls-tree and/or
>>      sparse-checkouts recently. I hope that's okay.
> 
> Definitely okay, thanks for CC-ing!
> 
> Overall, this is an interesting extension to 'sparse-checkout', and opens up
> some opportunities for expanded tooling. However, at an implementation
> level, I think it's addressing two separate needs ("constrain 'ls-files' to
> display files matching patterns" and "specify sparse patterns not in
> '.git/info/sparse-checkout'") in one option, which makes for a somewhat
> confusing and inflexible user experience. What about instead breaking this
> into two options:
> 
> * --scope=(sparse|all): limits the scope of the files output by ("Behavior
>    A" vs. "Behavior B" from the document linked earlier).
> * --sparse-patterns=<file>: specifies "override" patterns to use instead of
>    those in '.git/info/sparse-checkout' (whether 'sparse-checkout' is enabled
>    for the repository or not).
> 

That makes sense to me. I'll look into making those changes.

> I haven't looked at your implementation in detail yet, but I did want to
> offer a recommendation in case you hadn't considered it: if you want to
> allow the use of patterns from a user-specified specific file, it would be
> nice to do it in a way that fully replaces the "default" sparse-checkout
> settings at the lowest level (i.e., override the values of
> 'core_apply_sparse_checkout', 'core_sparse_checkout_cone', and
> 'get_sparse_checkout_filename()'). Doing it that way would both make it
> easier for other commands to add a '--sparse-patterns' option, and avoid the
> separate code path ('path_in_sparse_checkout_1()' vs.
> 'recursively_match_path_with_sparse_patterns()', for example) when dealing
> with '.git/info/sparse-checkout' patterns vs. manually-specified patterns.
> 

Thanks for the pointers. I'll see what I can do. Do you mean something 
along the line of the following?

    set_sparse_checkout_file(filename);
    init_sparse_checkout_patterns(istate);
    _ = path_in_sparse_checkout_1(some_path, istate, ...);


I do I think I like the explicitness of first loading the pattern_list 
and then passing it to the matching function, though. But maybe it is 
too cumbersome.

>>      
>>      William
>>
>> Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1459%2Fwilliams-unity%2Fls-tree-sparse-v2
>> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1459/williams-unity/ls-tree-sparse-v2
>> Pull-Request: https://github.com/gitgitgadget/git/pull/1459
> 


  reply	other threads:[~2023-01-25 13:47 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-01-11 17:01 [PATCH] ls-tree: add --sparse-filter-oid argument William Sprent via GitGitGadget
2023-01-13 14:17 ` Eric Sunshine
2023-01-13 20:01   ` Junio C Hamano
2023-01-16 15:13     ` William Sprent
2023-01-16 12:14   ` William Sprent
2023-01-23 11:46 ` [PATCH v2] " William Sprent via GitGitGadget
2023-01-23 13:00   ` Ævar Arnfjörð Bjarmason
2023-01-24 15:30     ` William Sprent
2023-01-23 13:06   ` Ævar Arnfjörð Bjarmason
2023-01-24 15:30     ` William Sprent
2023-01-24 20:11   ` Victoria Dye
2023-01-25 13:47     ` William Sprent [this message]
2023-01-25 18:32       ` Victoria Dye
2023-01-26 14:55         ` William Sprent
2023-01-25  5:11   ` Elijah Newren
2023-01-25 16:16     ` William Sprent
2023-01-26  3:25       ` Elijah Newren
2023-01-27 11:58         ` William Sprent
2023-01-28 16:45           ` Elijah Newren
2023-01-30 15:28             ` William Sprent

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=569043fb-9766-037e-c587-1545c2978e7d@unity3d.com \
    --to=williams@unity3d.com \
    --cc=avarab@gmail.com \
    --cc=derrickstolee@github.com \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=newren@gmail.com \
    --cc=sunshine@sunshineco.com \
    --cc=vdye@github.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.