* Exec upload-pack on remote with what parameters to get direntries.
From: Stef Bon @ 2021-08-28 12:56 UTC
To: Git Users

Hi,

I've got a custom ssh library which I use to make a connection to a
git server like www.github.com, user stefbon.

Now I want to get the direntries of a remote repo, and I know I have
to use upload-pack for that, but with what parameters?

I want to use the outcome to make a fuse fs, user can browse the
files. Possibly the user can also view the contents.

Stef
* Re: Exec upload-pack on remote with what parameters to get direntries.
From: Jeff King @ 2021-08-30 19:10 UTC
To: Stef Bon; +Cc: Git Users

On Sat, Aug 28, 2021 at 02:56:17PM +0200, Stef Bon wrote:

> I've got a custom ssh library which I use to make a connection to a
> git server like www.github.com, user stefbon.
>
> Now I want to get the direntries of a remote repo, and I know I have
> to use upload-pack for that, but with what parameters?
>
> I want to use the outcome to make a fuse fs, user can browse the
> files. Possibly the user can also view the contents.

The protocol used by upload-pack is described in
Documentation/technical/pack-protocol.txt, but in short: I don't think
it will do what you want.

There is no operation to list the tree contents, for example, nor really
even a good way to fetch a single object. The protocol is geared around
efficiently transferring slices of history, so it is looking at sets of
reachable objects (what the client is asking for, and what it claims to
have).

You might be able to cobble something together with shallow and partial
fetches. E.g., something like:

  git clone --depth 1 --filter=blob:none --single-branch -b $branch

is basically asking to send only a single commit, plus all of its trees,
but no blobs. From there you could parse the tree objects to assemble a
directory listing. Possibly with a tree:depth filter you could even do
it iteratively.

Some hosts offer a separate API that would give you a much nicer
interface. E.g., GitHub has:

  https://docs.github.com/en/rest/reference/git#trees

But of course that won't work with GitLab, etc, and you'd have to
implement against the API for each hosting provider.

-Peff
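For a client that writes the wire messages itself, the shallow, blob-less
fetch suggested above might be framed roughly as follows. This is only a
sketch, assuming protocol v2 and a server that advertises the "shallow" and
"filter" features of its fetch command; the helper names pkt_line and
shallow_blobless_fetch are hypothetical, and the argument names ("want",
"deepen", "filter", "done") are taken from the protocol-v2 documentation
rather than from this thread.

```python
FLUSH_PKT = b"0000"  # flush-pkt: ends the request
DELIM_PKT = b"0001"  # delim-pkt (protocol v2): separates capabilities from arguments


def pkt_line(payload: str) -> bytes:
    """Frame one pkt-line: 4 hex digits giving the total length
    (the 4 header bytes included), followed by the data and a newline."""
    data = payload.encode("ascii") + b"\n"
    return b"%04x" % (len(data) + 4) + data


def shallow_blobless_fetch(want_oid: str) -> bytes:
    """Build a v2 fetch request for one commit, its trees, and no blobs.

    Assumption: want_oid is the commit a branch points at, learned from an
    earlier ref listing; the server must advertise shallow and filter support.
    """
    return b"".join([
        pkt_line("command=fetch"),
        DELIM_PKT,
        pkt_line(f"want {want_oid}"),
        pkt_line("deepen 1"),          # rough equivalent of --depth 1
        pkt_line("filter blob:none"),  # rough equivalent of --filter=blob:none
        pkt_line("done"),              # no negotiation: the client has nothing
        FLUSH_PKT,
    ])


if __name__ == "__main__":
    print(shallow_blobless_fetch("0123456789abcdef0123456789abcdef01234567"))
```

The response to such a request still ends in a packfile, so the client would
need a packfile decoder before it can read any tree objects.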
* Re: Exec upload-pack on remote with what parameters to get direntries.
From: Junio C Hamano @ 2021-08-30 19:43 UTC
To: Jeff King; +Cc: Stef Bon, Git Users

Jeff King <peff@peff.net> writes:

> There is no operation to list the tree contents, for example, nor really
> even a good way to fetch a single object. The protocol is geared around
> efficiently transferring slices of history, so it is looking at sets of
> reachable objects (what the client is asking for, and what it claims to
> have).
>
> You might be able to cobble something together with shallow and partial
> fetches. E.g., something like:
>
>   git clone --depth 1 --filter=blob:none --single-branch -b $branch

I was hoping that our support for fetching a single object (not
necessarily a commit) at the protocol level was good enough, so that
Stef's fuse/nfs daemon can fetch the tree object it is interested
in.

There also is an effort, slowly moving to add verbs like object-info
to the protocol to help the vfs usecase, but primitives at too low a
level would be killed by latency, so it is somewhat unknown how
effective it would be.
* Re: Exec upload-pack on remote with what parameters to get direntries.
From: Jeff King @ 2021-08-30 20:46 UTC
To: Junio C Hamano; +Cc: Stef Bon, Git Users

On Mon, Aug 30, 2021 at 12:43:38PM -0700, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
>
> > There is no operation to list the tree contents, for example, nor really
> > even a good way to fetch a single object. The protocol is geared around
> > efficiently transferring slices of history, so it is looking at sets of
> > reachable objects (what the client is asking for, and what it claims to
> > have).
> >
> > You might be able to cobble something together with shallow and partial
> > fetches. E.g., something like:
> >
> >   git clone --depth 1 --filter=blob:none --single-branch -b $branch
>
> I was hoping that our support for fetching a single object (not
> necessarily a commit) at the protocol level was good enough, so that
> Stef's fuse/nfs daemon can fetch the tree object it is interested
> in.

I don't think there's a clean way to ask for a single object. But
thinking on it more, I suspect you could do something _really_ hacky
using the new object-type filters:

  git fetch --filter=object:type=commit --filter=object:type=blob

Because we AND the filters together, no object can satisfy both. But
because we also send any objects which were _explicitly_ requested by
the client, you can now fetch whatever single objects you want. And as
long as you tell the other side you don't have any objects, it won't
send any deltas.

> There also is an effort, slowly moving to add verbs like object-info
> to the protocol to help the vfs usecase, but primitives at too low a
> level would be killed by latency, so it is somewhat unknown how
> effective it would be.

Yes. At GitHub we actually have a custom endpoint which hooks up
"cat-file --batch" with a format of the client's choosing. That's what
(indirectly) feeds things like raw.github.com.

I've been tempted to send it upstream, but it's pretty ugly, and does
give the client a lot of power (for now, the placeholders you can use
with cat-file are not that powerful, but if we start to unify with
ref-filter, etc, then we run into situations like we had with
%(describe) recently). Likewise, the v2 object-info endpoint _could_
accept arbitrary format strings (it's the same idea, just with
--batch-check instead of --batch).

-Peff
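The two-filter trick above can be sketched at the argument level. The
"combine:<filter1>+<filter2>" spelling is the one documented for
git rev-list --filter; that a client sends the pair to upload-pack in
exactly this form is an assumption here, as is the server being willing to
accept wants for arbitrary object IDs. The helper names are hypothetical.

```python
from typing import Iterable, List

# Characters this sketch refuses to handle: the combine syntax expects them
# to be escaped inside sub-filters, and no escaping is attempted here.
_NEEDS_ESCAPING = set("+% ")


def combine_filters(*specs: str) -> str:
    """Join sub-filters into one 'combine:' spec (filters are ANDed)."""
    for spec in specs:
        if _NEEDS_ESCAPING & set(spec):
            raise ValueError(f"sub-filter needs escaping, not handled here: {spec!r}")
    return "combine:" + "+".join(specs)


def single_object_fetch_args(oids: Iterable[str]) -> List[str]:
    """Argument lines for a fetch that should return only the named objects.

    No object is both a commit and a blob, so the combined filter excludes
    everything; only explicitly wanted objects remain. Sending no 'have'
    lines tells the server the client has nothing to delta against.
    """
    spec = combine_filters("object:type=commit", "object:type=blob")
    return [f"want {oid}" for oid in oids] + [f"filter {spec}", "done"]


if __name__ == "__main__":
    for line in single_object_fetch_args(["0123456789abcdef0123456789abcdef01234567"]):
        print(line)
```

Each line would still have to be framed as a pkt-line and sent after the
usual command and capability lines.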
* Re: Exec upload-pack on remote with what parameters to get direntries.
From: Junio C Hamano @ 2021-08-30 21:21 UTC
To: Jeff King; +Cc: Stef Bon, Git Users

Jeff King <peff@peff.net> writes:

> Yes. At GitHub we actually have a custom endpoint which hooks up
> "cat-file --batch" with a format of the client's choosing. That's what
> (indirectly) feeds things like raw.github.com.
>
> I've been tempted to send it upstream, but it's pretty ugly, and does
> give the client a lot of power (for now, the placeholders you can use
> with cat-file are not that powerful, but if we start to unify with
> ref-filter, etc, then we run into situations like we had with
> %(describe) recently). Likewise, the v2 object-info endpoint _could_
> accept arbitrary format strings (it's the same idea, just with
> --batch-check instead of --batch).

Yeah, the object-info actually was from folks who are interested in
doing something similar, and it would be nice if we can share the
protocol endpoint that is more suitable for interactive tree and
history traversal to help those who want to do virtual filesystem.

Thanks.
* Re: Exec upload-pack on remote with what parameters to get direntries.
From: Ævar Arnfjörð Bjarmason @ 2021-08-31 14:23 UTC
To: Junio C Hamano; +Cc: Jeff King, Stef Bon, Git Users, Bruno Albuquerque

On Mon, Aug 30 2021, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
>
>> Yes. At GitHub we actually have a custom endpoint which hooks up
>> "cat-file --batch" with a format of the client's choosing. That's what
>> (indirectly) feeds things like raw.github.com.
>>
>> I've been tempted to send it upstream, but it's pretty ugly, and does
>> give the client a lot of power (for now, the placeholders you can use
>> with cat-file are not that powerful, but if we start to unify with
>> ref-filter, etc, then we run into situations like we had with
>> %(describe) recently). Likewise, the v2 object-info endpoint _could_
>> accept arbitrary format strings (it's the same idea, just with
>> --batch-check instead of --batch).
>
> Yeah, the object-info actually was from folks who are interested in
> doing something similar, and it would be nice if we can share the
> protocol endpoint that is more suitable for interactive tree and
> history traversal to help those who want to do virtual filesystem.

While this is all clever, I think this discussion really suggests that
the first thing we should do is make the relatively recent "object-info"
protocol verb not a default part of the supported v2 protocol we ship in
git.git.

I.e. someone setting up a git server probably isn't going to suspect
that one day their server load is going to go up by some big % because
some developer somewhere is using a local IDE whose every file click on
a directory is a new remote server request (i.e. the case where
"object-info"'s functionality is expanded like this).

I found myself wondering this when reading serve.c the other day,
i.e. why we have "always_advertise" for object-info, but it seemed
innocuous enough given how it's described in a2ba162cda2 (object-info:
support for retrieving object info, 2021-04-20).

But just as a general thing, while I'm very much in favor of git growing
*optional* support for more server<->client cooperation and CPU
offloading, even things like "git grep" or "git log" optimistically
running server-side, I think those sorts of features should definitely
be off by default for the reasons noted above.
* Re: Exec upload-pack on remote with what parameters to get direntries.
From: Bruno Albuquerque @ 2021-08-31 15:35 UTC
To: Ævar Arnfjörð Bjarmason; +Cc: Junio C Hamano, Jeff King, Stef Bon, Git Users

On Tue, Aug 31, 2021 at 7:28 AM Ævar Arnfjörð Bjarmason
<avarab@gmail.com> wrote:

[Replying again as I used HTML mail by mistake. Sorry.]

> I.e. someone setting up a git server probably isn't going to suspect
> that one day their server load is going to go up by some big % because
> some developer somewhere is using a local IDE whose every file click on
> a directory is a new remote server request (i.e. the case where
> "object-info"'s functionality is expanded like this).

Do you mean by someone directly sending object-info requests? I am
working on wiring object-info to some of the existing tools
(cat-file/ls-tree) so this is the general idea about how I see this
being used:

- object-info would be used when it made sense but only if the actual
  object being queried is not already fetched locally. If you think of a
  virtual filesystem that is backed by, say, partial clones, this mostly
  means retrieving metadata information to be displayed to the user.

- Still in the context of a virtual filesystem, metadata is usually
  cached locally independently of Git itself, further reducing the need
  to call object-info (but, of course, this is a brittle assumption as
  it is not controlled by Git).

- git cat-file, for example, would be changed to support real batching
  and then send a single request instead of the multiple requests it
  does currently.

My point is that I understand where your worry is coming from, and as
long as someone can send arbitrary requests it is possible that your
scenario of a heavier server load happens. But as far as the expected
canonical usage goes, I do not think this would be a problem and, in
fact, under some usage patterns it might make things better (mostly due
to batching support in object-info).

With all that being said, I don't think making it optional would be
an issue so I have no strong feelings about this. I am fine with
whatever is agreed to be the best approach.

> I found myself wondering this when reading serve.c the other day,
> i.e. why we have "always_advertise" for object-info, but it seemed
> innocuous enough given how it's described in a2ba162cda2 (object-info:
> support for retrieving object info, 2021-04-20).

For what it is worth, the same change is now being reviewed in JGit
and there the feature is conditionally enabled. But that was a
side-effect of needing to deploy it to multiple servers before making
the feature available to clients.

--
Bruno Albuquerque | Software Engineer | bga@google.com | +1 650-395-8242
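The batching point can be illustrated with a single object-info request
that covers several objects. This is a sketch based on the command as
described in the protocol-v2 documentation (a "size" attribute followed by
one "oid <oid>" line per object), assuming the server advertises
object-info; the helper names are hypothetical, and the response parser
takes pkt-line payloads that have already been unframed.

```python
from typing import Dict, Iterable, List


def pkt_line(payload: str) -> bytes:
    """Frame one pkt-line: 4 hex digits of total length, then data and newline."""
    data = payload.encode("ascii") + b"\n"
    return b"%04x" % (len(data) + 4) + data


def object_info_request(oids: Iterable[str]) -> bytes:
    """One request asking for the size of every listed object."""
    parts: List[bytes] = [
        pkt_line("command=object-info"),
        b"0001",           # delim-pkt between capabilities and arguments
        pkt_line("size"),  # the attribute being requested
    ]
    parts += [pkt_line(f"oid {oid}") for oid in oids]
    parts.append(b"0000")  # flush-pkt ends the request
    return b"".join(parts)


def parse_object_info(lines: Iterable[str]) -> Dict[str, int]:
    """Parse unframed response lines: an attribute line, then '<oid> <size>' lines."""
    it = iter(lines)
    if next(it).split() != ["size"]:
        raise ValueError("this sketch only understands the 'size' attribute")
    sizes: Dict[str, int] = {}
    for line in it:
        oid, size = line.split()
        sizes[oid] = int(size)
    return sizes
```

A virtual filesystem could use one such round trip to fill in the file
sizes for a whole directory listing instead of issuing one request per
entry.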
* Re: Exec upload-pack on remote with what parameters to get direntries.
From: Junio C Hamano @ 2021-08-31 16:23 UTC
To: Bruno Albuquerque; +Cc: Ævar Arnfjörð Bjarmason, Jeff King, Stef Bon, Git Users

Bruno Albuquerque <bga@google.com> writes:

> With all that being said, I don't think making it optional would be
> an issue so I have no strong feelings about this. I am fine with
> whatever is agreed to be the best approach.
>
>> I found myself wondering this when reading serve.c the other day,
>> i.e. why we have "always_advertise" for object-info, but it seemed
>> innocuous enough given how it's described in a2ba162cda2 (object-info:
>> support for retrieving object info, 2021-04-20).
>
> For what it is worth, the same change is now being reviewed in JGit
> and there the feature is conditionally enabled. But that was a
> side-effect of needing to deploy it to multiple servers before making
> the feature available to clients.

FWIW, I do not mind, and probably prefer if I think about it a bit
longer, to make it an opt-in feature, like all other capabilities
defined in the serve.c file.

Thanks.
* Re: Exec upload-pack on remote with what parameters to get direntries.
From: Stef Bon @ 2021-08-31 6:38 UTC
To: Jeff King; +Cc: Git Users

Hi,

thank you for the answer.
I understand that the core of git is to make people work together when
writing code. To get a tree of the source files is not directly part
of that, but purely informational. That is also the intent of my fuse fs:
provide the user information about the source files.

Now I have a working ssh connection to the server, and open a channel
for running the upload-pack on the server using the exec channel
request:

  https://datatracker.ietf.org/doc/html/rfc4254#section-6.5

So in my program I do not have to do something like:

  ssh -x git@server "git-upload-pack 'simplegit-progit.git'"

It is only the sending of an exec message with the right command.
Via the SSH_MSG_CHANNEL_DATA message the server will return the
output. In my program I have to write a parser to get the
tree/direntries.

Now you suggest the

  git clone --depth 1 --filter=blob:none --single-branch -b $branch

command. How does that look when writing it in low-level git messages
as described in

  https://git-scm.com/book/en/v2/Git-Internals-Transfer-Protocols

?
I'm programming at this low level, so I have to write the messages to
send to the server myself.

And you mention the api github has for a git tree object. But git2 has
already the git_tree object?

Stef

My project by the way is:

  https://github.com/stefbon/OSNS
* Re: Exec upload-pack on remote with what parameters to get direntries.
From: Jeff King @ 2021-08-31 7:07 UTC
To: Stef Bon; +Cc: Git Users

On Tue, Aug 31, 2021 at 08:38:39AM +0200, Stef Bon wrote:

> So in my program I do not have to do something like:
>
> ssh -x git@server "git-upload-pack 'simplegit-progit.git'"
>
> It is only the sending of an exec message with the right command.
> Via the SSH_MSG_CHANNEL_DATA message the server will return the
> output. In my program I have to write a parser to get the
> tree/direntries.
>
> Now you suggest the git clone --depth 1 --filter=blob:none
> --single-branch -b $branch
> command. How does that look when writing it in lowlevel git messages
> as described in
>
> https://git-scm.com/book/en/v2/Git-Internals-Transfer-Protocols
>
> ?
> I'm programming at this low level, so I have to write the messages to
> send to the server myself.

You'll have to read the documentation I pointed to earlier:

  https://github.com/git/git/blob/master/Documentation/technical/pack-protocol.txt

In short: the server tells you which refs it has and what they point to,
then the client says which objects it wants and which objects it has,
and then the server sends a packfile. The flow of the protocol and the
format of the messages is laid out there.

You might also set GIT_TRACE_PACKET=1 in your environment and try
running some Git commands. They will show you what's being said on the
wire, up until the packfile is sent (decoding the packfile itself is a
whole other story).

> And you mention the api github has for a git tree object. But git2 has
> already the git_tree object?

If you mean libgit2, then yes, it has a git_tree struct. Just like we
have internally within regular Git. But those are for accessing _local_
objects, that have already been fetched.

You could build a fuse filesystem around a local Git repository pretty
easily, either by using libgit2 or around tools like "git ls-tree" and
"git cat-file". But if your purpose is to access a remote one without
downloading all of the objects first, then no, Git does not expose any
of the endpoints you'd need remotely (but provider-specific APIs like
GitHub's do).

-Peff
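Once a tree object has been fetched and extracted from the packfile,
turning it into directory entries is the simpler part. The sketch below
assumes the caller already holds the decompressed body of a tree object
(without the "tree <size>" header) from a SHA-1 repository; the TreeEntry
and parse_tree names are hypothetical and are not libgit2 or Git API.

```python
from typing import Iterator, NamedTuple

TREE_MODE = 0o40000  # mode Git records for sub-directories


class TreeEntry(NamedTuple):
    mode: int  # file mode, parsed from the octal text
    name: str  # entry name (a single path component)
    oid: str   # hex object ID of the blob, tree, or submodule commit

    @property
    def is_dir(self) -> bool:
        return self.mode == TREE_MODE


def parse_tree(body: bytes, oid_len: int = 20) -> Iterator[TreeEntry]:
    """Yield entries from a raw tree body.

    Each entry is '<octal mode> <name>' followed by a NUL byte and then
    oid_len raw bytes of object ID (20 for SHA-1).
    """
    pos = 0
    while pos < len(body):
        space = body.index(b" ", pos)
        nul = body.index(b"\0", space)
        raw_oid = body[nul + 1:nul + 1 + oid_len]
        if len(raw_oid) != oid_len:
            raise ValueError("truncated tree entry")
        name = body[space + 1:nul].decode("utf-8", "surrogateescape")
        yield TreeEntry(int(body[pos:space], 8), name, raw_oid.hex())
        pos = nul + 1 + oid_len
```

A filesystem layer would fetch and parse the tree named by each directory
entry's object ID when the user descends into that directory.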
* Re: Exec upload-pack on remote with what parameters to get direntries.
From: Stef Bon @ 2021-08-31 9:44 UTC
To: Jeff King; +Cc: Git Users

On Tue, Aug 31, 2021 at 09:07, Jeff King <peff@peff.net> wrote:
>
> On Tue, Aug 31, 2021 at 08:38:39AM +0200, Stef Bon wrote:
>
> You might also set GIT_TRACE_PACKET=1 in your environment and try
> running some Git commands. They will show you what's being said on the
> wire, up until the packfile is sent (decoding the packfile itself is a
> whole other story).
>

Yes that will give me the insight I need.
I will come back when it comes to decoding the packfile.

Thanks,
Stef
* Re: Exec upload-pack on remote with what parameters to get direntries.
From: Ævar Arnfjörð Bjarmason @ 2021-08-31 14:01 UTC
To: Stef Bon; +Cc: Jeff King, Git Users

On Tue, Aug 31 2021, Stef Bon wrote:

> On Tue, Aug 31, 2021 at 09:07, Jeff King <peff@peff.net> wrote:
>>
>> On Tue, Aug 31, 2021 at 08:38:39AM +0200, Stef Bon wrote:
>>
>> You might also set GIT_TRACE_PACKET=1 in your environment and try
>> running some Git commands. They will show you what's being said on the
>> wire, up until the packfile is sent (decoding the packfile itself is a
>> whole other story).
>>
>
> Yes that will give me the insight I need.
> I will come back when it comes to decoding the packfile.

Aside from the "here's how you can do it", you haven't said why you'd
like to do such "online" browsing of the repository.

I'd think that even for something that e.g. implements a file browser
with magic git-remote support (think GNOME VFS-like), what you'd want to
do in the background would be to do a "clone", although a clone with
some combination of --single-branch, --no-tags, and perhaps --depth and
the filters discussed upthread.

It will take the same time to get the pack, but once you do you can use
libgit2, git's plumbing etc. to do really fast browsing/wildcarding
etc. of the entries locally.

So is there a real performance or other use-case for wanting to do this,
or does it just come down to the lack of a nice "one-shot" API for "list
remote files"?

In any case, on the topic of clever things you can (ab)use to do this,
some remotes support running "git archive" for you. Notably GitHub
doesn't, but GitLab does. Please don't take this as an endorsement to
run this command "in production":

    $ time (git archive --format=tar --remote=git@gitlab.com:git-vcs/git.git --prefix=t/t4018/ HEAD:t/t4018 | tar -tf- | head -n 3)
    t/t4018/
    t/t4018/README
    t/t4018/bash-arithmetic-function

    real    0m1.545s

I idly wonder, if there's a want/need for a file listing API, whether
doing so via the tar/zip format wouldn't be a more viable & widely
supported thing than expecting everyone to come up with their own git
packfile decoders.

I.e. if we just supported some option to create all-empty dummy files
via "git archive" this could be even better as a dummy file listing
API. Right now this (ab)use of it requires e.g. sending ~10MB of t/'s
content just to list everything in the t/ directory.
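For a client that would rather not shell out to tar, the listing step of
the "git archive" pipeline above can be done with a streaming tar reader.
This is a minimal sketch using Python's standard tarfile module; the
list_tar_entries name is hypothetical, and the stream is assumed to be the
uncompressed tar output of "git archive --format=tar", for example read
from a subprocess pipe or an SSH channel.

```python
import tarfile
from typing import BinaryIO, List


def list_tar_entries(stream: BinaryIO) -> List[str]:
    """Return member names from an uncompressed tar stream.

    Mode 'r|' reads the stream strictly forward, so it works on pipes and
    sockets that cannot seek. File contents are skipped, not kept, although
    they still have to be transferred and read past.
    """
    names: List[str] = []
    with tarfile.open(fileobj=stream, mode="r|") as archive:
        for member in archive:
            # tarfile drops the trailing slash on directories; restore it
            # so the output resembles 'tar -t'.
            names.append(member.name + "/" if member.isdir() else member.name)
    return names
```

Because the whole archive is still sent, this only changes how the listing
is parsed on the client side, not how much data crosses the wire.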