Git Mailing List Archive on lore.kernel.org
* Notes from Git Contributor Summit, Los Angeles (April 5, 2020)
@ 2020-03-12  3:55 James Ramsay
  2020-03-12  3:56 ` [TOPIC 1/17] Reftable James Ramsay
                   ` (19 more replies)
  0 siblings, 20 replies; 62+ messages in thread
From: James Ramsay @ 2020-03-12  3:55 UTC (permalink / raw)
  To: git

It was great to see everyone at the Contributor Summit last week, in 
person and virtually.

Particular thanks go to Peff for facilitating, and to GitHub for 
organizing the logistics of the meeting place and food. Thank you!

On the day, the topics below were discussed:

1. Ref table (8 votes)
2. Hooks in the future (7 votes)
3. Obliterate (6 votes)
4. Sparse checkout (5 votes)
5. Partial Clone (6 votes)
6. GC strategies (6 votes)
7. Background operations/maintenance (4 votes)
8. Push performance (4 votes)
9. Obsolescence markers and evolve (4 votes)
10. Expel ‘git shell’? (3 votes)
11. GPL enforcement (3 votes)
12. Test harness improvements (3 votes)
13. Cross implementation test suite (3 votes)
14. Aspects of merge-ort: cool, or crimes against humanity? (2 votes)
15. Reachability checks (2 votes)
16. “I want a reviewer” (2 votes)
17. Security (2 votes)

Notes were taken in the linked Google Doc, but for those who’d rather 
read the notes here, I’ll also send the notes as replies to this 
message.

https://docs.google.com/document/d/15a_MPnKaEPbC92a4jhprlHvkyirDh2CtTtgOxNbnIbA/edit#heading=h.vvhyp0oa4hhz

Regards,
James


* [TOPIC 1/17] Reftable
  2020-03-12  3:55 Notes from Git Contributor Summit, Los Angeles (April 5, 2020) James Ramsay
@ 2020-03-12  3:56 ` James Ramsay
  2020-03-12  3:56 ` [TOPIC 2/17] Hooks in the future James Ramsay
                   ` (18 subsequent siblings)
  19 siblings, 0 replies; 62+ messages in thread
From: James Ramsay @ 2020-03-12  3:56 UTC (permalink / raw)
  To: git

1. In case you’re not aware what it is: reftable is a binary ref storage 
format introduced in JGit. It stores refs in sorted blocks with prefix 
compression, instead of loose ref files and packed-refs.

2. The Gerrit team would like to get this into cgit (C Git).

3. From “Stump the Experts” yesterday, the question was “If you could go 
back and change anything, what would it be?” Loose refs can cause 
difficulties, so it would be nice to make reftable a first-class citizen. 
There are also issues on OSes with case-insensitive filesystems, which 
reftable can help with.

4. Stolee: contributing an entire copy of an externally maintained 
library’s source as one patch makes it hard to review, and doesn’t feel 
like a contribution to Git.

5. Brian: agree. Is it an external library that needs to be pulled in 
every time a new version lands in JGit?

6. Edward: having it as an external library moves the maintenance burden.

7. Jonathan N: take xdiff as an example: we have a copy, Mercurial has a 
copy, and they have been patched in different ways. Can we separate these 
concerns? One: patches that can be reviewed separately. Two: licensing. 
Three: the ongoing maintenance approach.

8. Peff: the benefits of an external library are clear. What is the 
maintenance burden of not keeping this in the core git tree? More 
concerned about niceties in Git that aren’t in other libraries, like 
strbufs and data structures. The lowest common denominator isn’t ideal. 
Can this cost be mitigated?

9. Ed: I have the same concerns. We also have strbufs, but they are not 
the same. We also might run into licensing issues.

10. Stolee: also cross-platform compatibility… it might not perform 
well on different platforms.
Peff: It feels to me that there are a lot of hairy filesystem details 
reftable needs to deal with.

11. Brian: Atomic renames have issues on Windows.

12. Jonathan N: Han-Wen wanted a more substantial review, and we just 
provided one.

13. Action item for Jonathan: write a summary email to Han-Wen.

14. Brian: (inaudible) Having a reftable library would be interesting to 
test SHA256 changes.

15. Stolee: would be nice to have tests regarding case-sensitivity & 
directory/file conflicts

16. Ed: wait, are we loosening the restriction?

17. Peff: no, for backwards-compatibility we cannot. Would love to get 
rid of that restriction, though.

18. Jonathan N: Immediate benefit wrt D/F conflicts is being able to 
keep reflogs for deleted branches
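
For context, the directory/file (D/F) conflict above comes from loose 
refs being plain files on disk, so nested branch names collide; a 
minimal illustration (error text approximate):

    git branch topic      # stored as the file .git/refs/heads/topic
    git branch topic/x    # would need .git/refs/heads/topic to be a directory
    # fatal: cannot lock ref 'refs/heads/topic/x': 'refs/heads/topic' exists;
    # cannot create 'refs/heads/topic/x'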


* [TOPIC 2/17] Hooks in the future
  2020-03-12  3:55 Notes from Git Contributor Summit, Los Angeles (April 5, 2020) James Ramsay
  2020-03-12  3:56 ` [TOPIC 1/17] Reftable James Ramsay
@ 2020-03-12  3:56 ` James Ramsay
  2020-03-12 14:16   ` Emily Shaffer
  2020-03-12  3:57 ` [TOPIC 3/17] Obliterate James Ramsay
                   ` (17 subsequent siblings)
  19 siblings, 1 reply; 62+ messages in thread
From: James Ramsay @ 2020-03-12  3:56 UTC (permalink / raw)
  To: git

1. Emily: hooks in the config file. Sent a read-only patch, but it 
didn’t get much traction. The idea: add a new section to the config 
file with a prefix number for ordering, so that a security scan could be 
configured at the system level to run last, while hooks could also be 
configured at the project level.

2. Peff: Having hooks in the config would be nice. But don’t do it at 
`hooks.prereceive`, but use a subconfig like `hooks.prereceive.command` 
so it’s possible to add options later on.
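
A rough sketch of the shape Peff suggests; these config keys come from 
the discussion and are not an existing Git feature:

    # one key per option, leaving room to grow later:
    git config hooks.prereceive.command /path/to/security-scan.sh
    git config hooks.prereceive.order 100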

3. Brian: sometimes hooks need to be overridden; ordering works for me. 
For Git LFS it would be helpful to have one pre-push hook for LFS, and a 
different one for something else. Want flexibility in finding and 
discovering hooks.

4. Emily: if you want to specify a hook that is in the work tree, then 
it has to be configured after cloning.

5. Jonathan: It’s better to start with something low complexity as 
long as it can be extended/changed later. If there's room to tweak it 
over time then I'm not too worried about the initial version being 
perfect — we can make mistakes and learn from them. A challenge will 
be how hooks interact. Analogy to the challenges of stacked union 
filesystems and security modules in Linux. Analogy to sequence number 
allocation for unit scripts

6. CB: Declare dependencies instead of a sequence number? In theory 
independent hooks can also run in parallel.

7. Peff: Maybe that’s something to not worry about from the start. 
Like, how many hooks do you expect to run anyway.

8. Christian: At booking.com they use a lot of hooks, and they also sent 
patches to the mailing list to improve that.

9. Emily: In-tree hooks?

10. Brian: You can do something like `git cat-file -p <rev>:<path> | sh` to run an in-tree hook.

11. Brandon: Is it possible to globally disable all hooks on a local 
machine? It might be a security concern. Or is it something we might 
want to add?

12. Peff: No, it’s not.


* [TOPIC 3/17] Obliterate
  2020-03-12  3:55 Notes from Git Contributor Summit, Los Angeles (April 5, 2020) James Ramsay
  2020-03-12  3:56 ` [TOPIC 1/17] Reftable James Ramsay
  2020-03-12  3:56 ` [TOPIC 2/17] Hooks in the future James Ramsay
@ 2020-03-12  3:57 ` James Ramsay
  2020-03-12 18:06   ` Konstantin Ryabitsev
  2020-03-15 22:19   ` Damien Robert
  2020-03-12  3:58 ` [TOPIC 4/17] Sparse checkout James Ramsay
                   ` (16 subsequent siblings)
  19 siblings, 2 replies; 62+ messages in thread
From: James Ramsay @ 2020-03-12  3:57 UTC (permalink / raw)
  To: git

1. Jonathan N: sometimes people accidentally add a big file they don’t 
need. They have to use BFG and it’s a pain. Next time, maybe you just 
deal with it and ignore it. This happened to Chrome: some huge blob that 
was in the repo should no longer be in the repo, but they don’t want to 
rewrite the history. Other use cases are confidential information, like 
passwords, credit card numbers etc. Initial reaction: it’s already out 
there, rotate it. Second reaction: if it’s a toxic blob it needs to be 
removed everywhere! What if someone taught the kernel repo to…

2. James: I’ve been in a lot of meetings with customers where they 
mentioned it’s not possible to rotate the information that was leaked 
into the repo

3. Demetr: How far back do we allow people to go when obliterating?

4. Jonathan N: there are indeed horrible real-world examples where 
things to be obliterated are from a long time ago.

5. James: real cost to changing object ids: Git and tools interacting 
with it really assume that history is immutable.

6. Elijah: replace refs help (see the sketch below), but they’re not 
supported by hosts like GitHub etc.

     a. Stolee: breaks the commit-graph because of generation numbers.
     b. Replace refs for blobs, then a special packfile; there were edge 
cases.
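
A minimal sketch of the replace-refs approach Elijah describes (object 
IDs are placeholders); as noted, hosts generally don’t honor these:

    # publish an innocuous stand-in for the toxic blob without rewriting history
    clean=$(printf 'removed' | git hash-object -w --stdin)
    git replace <toxic-blob-oid> "$clean"
    # replacements live under refs/replace/ and must be fetched explicitly:
    git fetch origin 'refs/replace/*:refs/replace/*'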

7. Demetr: Backward compatibility: wouldn’t custom handling be 
problematic for old clients?

8. Jeff H: can we introduce a new type of object -- a "revoked blob" if 
you will, that burns the original content but still holds the original 
SHA in the ODB?

9. Peff: what would this mean for signatures? New opportunity to forge 
signatures.

10. Jonathan N: if a new entity, this means you’ve changed the content 
which we want to avoid. Maybe a list of revoked blobs. If fsck notices 
missing, it should be happy. Protocol support, if someone tries to 
include a patch with it, just ignore it. Not great. Improvement would be 
to send a list of things I deliberately didn’t send. Could also 
communicate blobs to be deleted, but ignore that for v1. Learn from 
Mercurial who have a very complicated signed revocation mechanism.

11. Brian: the remote can’t be trusted; a la the left-pad incident, a 
maintainer could do something malicious causing the repo to become invalid.

12. Jonathan N: main scenario I’m considering is trusted company 
remote.

13. Terry: partial clone and solve large files. Maybe the server could 
handle it by converting normal clone into partial, and then handle the 
error if someone asks for that blob.

14. Jakub: one idea would simply be to treat this as a missing blob in a 
partial clone

15. Michael Haggerty: does this only apply to blobs? (Peff: no, commit 
messages can contain sensitive information; Johannes: trees contain file 
names which also can contain sensitive information)

16. Jonathan N: partial clone is not a solution for the desire to get 
rid of the blob on the server side.


* [TOPIC 4/17] Sparse checkout
  2020-03-12  3:55 Notes from Git Contributor Summit, Los Angeles (April 5, 2020) James Ramsay
                   ` (2 preceding siblings ...)
  2020-03-12  3:57 ` [TOPIC 3/17] Obliterate James Ramsay
@ 2020-03-12  3:58 ` James Ramsay
  2020-03-12  4:00 ` [TOPIC 5/17] Partial Clone James Ramsay
                   ` (15 subsequent siblings)
  19 siblings, 0 replies; 62+ messages in thread
From: James Ramsay @ 2020-03-12  3:58 UTC (permalink / raw)
  To: git

1. Stolee: built in! We’re making improvements! We’re looking at UX: 
add, remove, state. Harder things like grep need everything, but handling 
that isn’t expected.

2. Elijah: we do have a strongly worded warning, so we could just change 
it.

3. Jonathan N: I like both modes.

4. Terry: how is GC wired up? If I change a cone, will it be reclaimed?

5. Stolee: GC doesn’t remove reachable objects. Haven’t found people 
need to do this, unless they accidentally rehydrated something massive 
they didn’t really need. Day to day work doesn’t introduce too much.

6. Terry: Android devs have massive special machines. Constantly running 
out of disk space.

7. Stolee: more of a partial clone feature, than a sparse checkout 
feature. If I checkout three branches, go offline, I don’t want GC to 
clean things that I had downloaded.

8. Jonathan N: switching between Word and PowerPoint. Would it be useful 
to attach the cone to a branch rather than the repo?

9. Stolee: Office team is building some kind of magic to automatically 
detect from branch.

10. Brian: can use reflog maybe. Prune based on that? People who run out 
of disk space could have shorter reflog.

11. Elijah: biggest problem people run into doing a rebase/pull, hit 
conflicts, then they need to update sparsity patterns, which they 
can’t do because there are conflicts. Working on a patch.

12. Stolee: Office scoper tool would automatically recalculate 
dependencies and update sparsity config so that they can build.

13. During break, Minh brought up an idea that we could use in-tree data 
to manage the dependency chain: The tree could contain files that 
contain directory names, and users use config to specify the list of 
those files to use for the sparse-checkout definition. When Git updates 
the working directory and those files change, the sparse-checkout can be 
updated to include the union of those directories. Stolee will look into 
how this could work and whether this works for existing customers.
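
A hypothetical sketch of that idea (the file name and layout are invented 
for illustration):

    # a checked-in file, e.g. .sparse/word-deps, lists directory names:
    #   word
    #   shared/text-engine
    # Git could union such files into the sparse-checkout definition when
    # the working tree changes; the manual equivalent today is roughly:
    git sparse-checkout set $(cat .sparse/word-deps)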


* [TOPIC 5/17] Partial Clone
  2020-03-12  3:55 Notes from Git Contributor Summit, Los Angeles (April 5, 2020) James Ramsay
                   ` (3 preceding siblings ...)
  2020-03-12  3:58 ` [TOPIC 4/17] Sparse checkout James Ramsay
@ 2020-03-12  4:00 ` James Ramsay
  2020-03-17  7:38   ` Allowing only blob filtering was: " Christian Couder
  2020-03-12  4:01 ` [TOPIC 6/17] GC strategies James Ramsay
                   ` (14 subsequent siblings)
  19 siblings, 1 reply; 62+ messages in thread
From: James Ramsay @ 2020-03-12  4:00 UTC (permalink / raw)
  To: git

1. Stolee: what is the status, who is deploying it, what issues need to 
be handled? For example, downloading tags. It’s hard to highly recommend 
it yet.

2. Taylor: we deployed it. No activity except for internal testing. Some 
more activity since, but no crashes. We have been dragging our feet. 
Chicken and egg: we can’t deploy it because the client may not work, but 
we’re hoping to hear about problems.

3. ZJ: dark launched for a mission-critical repo. Internal questions 
from the CI team, not sure about performance. The build farm is hitting 
it hard with different filter specs.

4. Taylor: we have patches we are promising to the list. blob:none and 
blob:limit for now, but we’ll add them incrementally. Bitmap patches are 
on the list.

5. James: I’ve been talking to customers who have high interest in 
this. But they are hesitant. Do people have similar situations, like 
shallow clones?

6. Jonathan N: we’re not using it en masse with server farms (see 
Terry’s stats). Performance issues with catchup, long periods of 
downloading and no progress. Missing progress display means a user waits 
and gets worried. On server side, reachability check can be expensive, 
in part because enumerating refs is expensive.

7. Peff: the client experience sucks with N+1 situations. If the server 
operator side is tolerable, it’s easier to move the client side forward. 
By default, v2 just serves them up, no reachability check. Not sure if 
we’ll do that forever. Often have to inflate blobs that are stored as 
deltas, and then redo delta compression, which shouldn’t be needed.

8. Stolee: Jonathan built a batch download when changing trees. Possible 
to improve by sending haves.

9. Jonathan N: if you’re in the blob none filter, and say I have a 
commit, I might not actually have what the server expects.

10. Peff: could enumerate blobs

11. Demetr: Partial clones are dangerous for DoS attacks.

12. Jonathan: JGit forbids most filters that can't use bitmaps.

13. Peff: just blob filters? Yes, so far.

14. Jonathan: as far as the client experience goes, we’re not batching 
often enough and not showing progress on catch-up fetches. Any other UX 
issues?

15. Jeff: no, those two are what I meant.

16. James: another question for git service providers: Is it a 
replacement for LFS?

17. Brian: some files can compress, others can’t. Repacking can blow 
up if you try to compress something that can’t be compressed. How do 
we identify which objects we compress, and which we don’t?

18. Jonathan N: if you see something already compressed, tell zlib to do 
passthrough compression.

19. Taylor: two problems - which projects do you want to quarantine, 
where do you put them. CDN offloading would be nice.

20. Stolee: reachability bitmaps are tied to a single packfile. Becomes 
more and more expensive. Even just having them in another file requires 
a lot of work.

21. Taylor: we’re looking at some heuristics so that some parts of the 
pack can just be moved over verbatim.

22. Peff: I see three problems: multi-pack lookups, bitmaps, …

23. Jonathan N: we never generate on the fly deltas

24. Peff: there are pathological cases.

25. Terry: we are seeing 89k partial clones per day. Majority is clone. 
Shallow clone equivalent.

26. Peff: why? Is it better?

27. Jonathan N: initial clone is about the same as shallow. One reason 
we encourage, if you do a follow up, with shallow clone it is expensive 
for the server.

28. Stolee: if you persist the previous shallow clone, it is much much 
cheaper to do incremental fetch.

29. Terry: JGit has enough shallow clone bugs that we often just send 
everything. Make shallow clone obsolete

30. Jonathan N: Jenkins style CI, option for shallow clone. Want to run 
diff or git describe, have to turn it off. Partial clone is simpler.

31. Minh: could the server force the client to partial clone?

32. Brian: risks, working on an airplane. I don’t want to do any kind 
of fetch operation on poor connection. Could be good for CI, but don’t 
want to break things for humans.

33. Jonathan N: if I am going to get on an airplane, is there a way to 
fill it in the background. There are workarounds, like run `git show` 
which needs everything.

34. Elijah: I want to fetch a bunch more stuff, but don’t fetch 
anymore, throw an error rather than hanging.

35. Jonathan: filter blob:none is people’s first experience of the 
feature. Make it a first-class UI concept; present a user-oriented UI 
like git sparse-checkout? (See the sketch below.)
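
For reference, the filter in question as most users first meet it (the 
URL is a placeholder):

    git clone --filter=blob:none https://example.com/big-repo.git
    # blobs are then fetched lazily, e.g. during checkout, diff or blame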

36. Taylor: It looks like it’s simple to use, but there’s a lot to 
do to actually use it. And Scalar is doing that for you.

37. James: Some of our customers would be interested to have a feature 
that pushes down configuration to all the users. It would give them LFS 
by default, without the end-users doing something.

38. Jonathan: We considered enabling a global config at Google. For 
example for 1+GB files.


* [TOPIC 6/17] GC strategies
  2020-03-12  3:55 Notes from Git Contributor Summit, Los Angeles (April 5, 2020) James Ramsay
                   ` (4 preceding siblings ...)
  2020-03-12  4:00 ` [TOPIC 5/17] Partial Clone James Ramsay
@ 2020-03-12  4:01 ` James Ramsay
  2020-03-12  4:02 ` [TOPIC 7/17] Background operations/maintenance James Ramsay
                   ` (13 subsequent siblings)
  19 siblings, 0 replies; 62+ messages in thread
From: James Ramsay @ 2020-03-12  4:01 UTC (permalink / raw)
  To: git

1. Jonathan N: Git has a flexible packfile format, compared to CVS where 
things are stored as deltas against the next revision of the same file. 
GC can be a huge operation if it’s not done regularly. "git gc" makes 
one huge pack. It has better amortized behavior to keep multiple packs 
with exponentially increasing sizes and combine them when needed (Martin 
Fick's exproll).
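
A rough illustration of the exponential-rolling idea (not what `git gc` 
did at the time; later Git releases grew a geometric repack mode that 
behaves roughly this way):

    # keep pack sizes in a geometric progression and only combine the
    # small packs, instead of rewriting one huge pack on every GC:
    git repack -d --geometric=2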

2. Jonathan N: There are also unreachable objects to take care of. GC 
can/should delete them, but at the same time someone else might be 
creating history that still needs those objects. To give objects a grace 
period, we turn the unused objects into loose objects and look at the 
creation time. Alternatively, there’s the proposal to move these 
unreachable objects into a packfile of their own. But this can be a 
problem for older git clients, because they might not know the pack is 
garbage and might move objects across packs. See the hash function 
transition doc for details.

3. Terry: JGit has these unreachable garbage packs

4. Peff: You want to solve this loose objects explosion problem?

5. Peff: what if you reference an object in the garbage pack from an 
object in a non-garbage pack?

6. Jonathan N: At GC time the object from the garbage pack is copied to 
a non-garbage pack. Basically rescue it from the garbage. It only saves 
the referenced objects, not the whole garbage pack.

7. Jonathan N: It has been running in production for >2 years.

8. Peff: There are so many non-atomic operations that can happen. And 
races can happen.

9. Jonathan N: If you find races, please comment on the JGit change that 
describes the algorithm. Happens-before relation and grace period.

10. `git gc --prune=now` should no longer create loose objects first 
before just deleting them.


* [TOPIC 7/17] Background operations/maintenance
  2020-03-12  3:55 Notes from Git Contributor Summit, Los Angeles (April 5, 2020) James Ramsay
                   ` (5 preceding siblings ...)
  2020-03-12  4:01 ` [TOPIC 6/17] GC strategies James Ramsay
@ 2020-03-12  4:02 ` James Ramsay
  2020-03-12  4:03 ` [TOPIC 8/17] Push performance James Ramsay
                   ` (12 subsequent siblings)
  19 siblings, 0 replies; 62+ messages in thread
From: James Ramsay @ 2020-03-12  4:02 UTC (permalink / raw)
  To: git

1. Stolee: Are we interested in having a background process doing 
things?

2. Emily: There are a lot of different ways to do this. Even only 
looking at Linux there are different ways.

3. Stolee: Without looking at how? What background operations would we 
like to have?

4. Emily: Is it a good candidate for `git undo`? To keep track of what 
the user was doing and to make it possible to roll back?

5. Brian: It can run into scalability issues. Also there might be repos 
on my disk that never change and don’t need background processing. At 
GitHub we do maintenance based on the number of pushes.

6. Stolee: the kind of maintenance will differ between client and server; 
the interests are different. For Scalar we have this one process looking 
at all repos and doing operations on them.

7. Peff: On the server side you’ll have millions of repos, and even one 
process looking at all of them has an impact on the system. Most hosting 
providers already have services taking care of this, so I think this 
feature is only interesting for the client side.
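
For the client side, a minimal sketch of what scheduled maintenance could 
look like via cron (repo path and schedule are hypothetical; later Git 
releases wrap this kind of thing up as `git maintenance start`):

    # crontab entry: fetch and lightly GC a repo every 30 minutes
    */30 * * * * cd /path/to/repo && git fetch --prune --quiet && git gc --auto --quiet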

8. Brian: We should be careful. For example I’m constantly creating 
test repos in /tmp.

9. Stolee: Thanks for the input, we’ll do research and come back to 
this.


* [TOPIC 8/17] Push performance
  2020-03-12  3:55 Notes from Git Contributor Summit, Los Angeles (April 5, 2020) James Ramsay
                   ` (6 preceding siblings ...)
  2020-03-12  4:02 ` [TOPIC 7/17] Background operations/maintenance James Ramsay
@ 2020-03-12  4:03 ` James Ramsay
  2020-03-12  4:04 ` [TOPIC 9/17] Obsolescence markers and evolve James Ramsay
                   ` (11 subsequent siblings)
  19 siblings, 0 replies; 62+ messages in thread
From: James Ramsay @ 2020-03-12  4:03 UTC (permalink / raw)
  To: git

1. Terry: Chrome has 500MB file pushed up. Using Gerrit, feature work 
becomes stale over a few days, then push. For a few months pushes would 
push gigabytes of data.

2. Stolee: where we do the tree walk, we are doing it from the merge 
base.
Jonathan N: Minh rescued us by advertising more .have refs to avoid it 
being pushed. In protocol v2 for push there are 3 major changes 
proposed: one, abbreviating the ref advertisement; two, adding 
negotiation; three, pushing to a fast-moving ref if you don’t care 
whether it’s a fast-forward. Are there other cases? (See the sketch 
below for observing the current exchange.)
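
The packet trace shows the ref advertisement and negotiation being 
discussed (remote and branch names assumed):

    GIT_TRACE_PACKET=1 git push origin topic 2>&1 | grep packet: | less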

3. Minh: performance on reachability. Would help to know what branch you 
are pushing.

4. Peff: I might be pushing a random sha, without a branch.

5. Brian: I’ve seen cases with 80k refs where we tried to send minimal 
amounts of objects. We spend a lot of time negotiating, only to 
eventually send 4 objects. It’s not very efficient; you could just 
spend less time on that and send a few more objects.

6. Minh: can we invert the pattern? Just send the new thing, and then 
the server says give me more.

7. Peff: You’ll get N+1 issues.

8. Jonathan N: I like Jeff Hostetler’s idea in Zoom chat. You can look 
at the branch and see when the author changes and use that as a crude 
heuristic to ask the server if they have that commit.


* [TOPIC 9/17] Obsolescence markers and evolve
  2020-03-12  3:55 Notes from Git Contributor Summit, Los Angeles (April 5, 2020) James Ramsay
                   ` (7 preceding siblings ...)
  2020-03-12  4:03 ` [TOPIC 8/17] Push performance James Ramsay
@ 2020-03-12  4:04 ` James Ramsay
  2020-03-12  4:05 ` [TOPIC 10/17] Expel ‘git shell’? James Ramsay
                   ` (10 subsequent siblings)
  19 siblings, 0 replies; 62+ messages in thread
From: James Ramsay @ 2020-03-12  4:04 UTC (permalink / raw)
  To: git

1. Brandon: I thought it would be interesting to have a feature similar 
to what Mercurial has. Mercurial evolve will help you do a big rebase 
commit by commit, giving you more insight into how commits change over time.

2. Peff: This has been discussed a lot of time on the list already.

3. Jonathan N: It would help Googlers’ productivity, but the benefit is 
smaller compared to other performance fixes.

4. Brian: It’s a great feature and I would like to have it, but I’m 
not sure it gives enough value to someone to sit down and implement it.

5. Emily: Is it a good candidate for GSoC?

6. Brian: If we have a good design.

7. Stolee: It should be easier to use than interactive rebase.

8. Stolee: It would be nice to have: instead of fixup commits, I would 
send you new commits which mark your original commits as obsolete.


* [TOPIC 10/17] Expel ‘git shell’?
  2020-03-12  3:55 Notes from Git Contributor Summit, Los Angeles (April 5, 2020) James Ramsay
                   ` (8 preceding siblings ...)
  2020-03-12  4:04 ` [TOPIC 9/17] Obsolescence markers and evolve James Ramsay
@ 2020-03-12  4:05 ` James Ramsay
  2020-03-12  4:07 ` [TOPIC 11/17] GPL enforcement James Ramsay
                   ` (9 subsequent siblings)
  19 siblings, 0 replies; 62+ messages in thread
From: James Ramsay @ 2020-03-12  4:05 UTC (permalink / raw)
  To: git

1. Jonathan N: Cannot use safely on its own. So why do we still have it?

2. Jonathan N: It’s not an interactive shell, it’s a login shell. To 
give the user only access to a git repo.
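
For context, git-shell is typically wired up like this (user name and 
paths are the conventional ones, not from the discussion):

    # git-shell may need to be listed in /etc/shells first
    sudo chsh -s "$(command -v git-shell)" git
    # the "git" user can then only run git-upload-pack, git-receive-pack
    # and git-upload-archive over SSH, plus commands placed in
    # ~git/git-shell-commands/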

3. Jonathan N: Gitolite is the only sensible thing that uses git-shell. 
Is this the only good use case? If so, can we donate it to them?

4. Peff: It’s a tool for security, but if no one is using it, it’s 
dangerous to have it around. It’s mostly stand-alone, so moving it out 
should be possible.


* [TOPIC 11/17] GPL enforcement
  2020-03-12  3:55 Notes from Git Contributor Summit, Los Angeles (April 5, 2020) James Ramsay
                   ` (9 preceding siblings ...)
  2020-03-12  4:05 ` [TOPIC 10/17] Expel ‘git shell’? James Ramsay
@ 2020-03-12  4:07 ` James Ramsay
  2020-03-12  4:08 ` [TOPIC 12/17] Test harness improvements James Ramsay
                   ` (8 subsequent siblings)
  19 siblings, 0 replies; 62+ messages in thread
From: James Ramsay @ 2020-03-12  4:07 UTC (permalink / raw)
  To: git

1. Peff: Hypothetically, suppose a company does not distribute the 
sources of a modified version of Git that they ship. What should we do 
about it? Should we take legal action and make them aware that they are 
doing something they should not do? It would also help make sure they 
treat other projects better.

2. Brandon: Would we gather together with other projects affected by 
this company?

3. Peff: How hard do we want to take it on this company?

4. Ed: I’m also bothered by this. And they just send me a tar ball. At 
Microsoft we are aggressive about doing this right.

5. Jonathan N: Can we make it really easy to comply, e.g. by making a 
build target that contains everything?

6. ZJ: They could just push to GitHub. That would be fine.

7. Peff: I brought this up, because we were made aware of this by the 
Conservancy. So I wanted to hear how people are feeling about it.

8. Brian: A more aggressive approach would be appropriate if we have 
made them aware of the issue and they decided to not comply on purpose.

9. Peff: Code change doesn’t matter, whether it’s a security fix or 
feature. And I’m fine giving them a bit of lag time, like a day. Not a 
day.

10. Peff: You’re not obliged to ship the source code, but you should 
provide an offer to share the source. In this case, they sent us a 
tarball, but the sources are not on their open-source site, so they 
probably do not yet comply.

11. CB: But everyone on a Mac can request that they send the source 
code. We could release a form somewhere to give people an easy way to 
request this.


* [TOPIC 12/17] Test harness improvements
  2020-03-12  3:55 Notes from Git Contributor Summit, Los Angeles (April 5, 2020) James Ramsay
                   ` (10 preceding siblings ...)
  2020-03-12  4:07 ` [TOPIC 11/17] GPL enforcement James Ramsay
@ 2020-03-12  4:08 ` James Ramsay
  2020-03-12  4:09 ` [TOPIC 13/17] Cross implementation test suite James Ramsay
                   ` (7 subsequent siblings)
  19 siblings, 0 replies; 62+ messages in thread
From: James Ramsay @ 2020-03-12  4:08 UTC (permalink / raw)
  To: git

1. Jonathan N: Test harness is an important part of the development 
process, shapes what kinds of tests people write. What can we improve?

2. Peff: I love our test harness

3. Brian: It’s amazing for integration tests. For our C code it’s a 
lot harder to do unit tests. We sometimes have a portability issue, 
about POSIX shell vs others.

4. ZJ: I like how it also acts as a piece of documentation.
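
For context, a minimal test in the style of the shell harness under t/ 
(the test itself is hypothetical; the helpers are the real ones from 
test-lib):

    test_expect_success 'setup a repo with one commit' '
    	git init repo &&
    	test_commit -C repo first &&
    	git -C repo rev-parse --verify HEAD
    '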

5. Jonathan N: If we had more unit tests: if I am working on refs, I 
might like to run all the tests related to that. Right now we lack that 
kind of dependency graph.

6. Peff: I’m super nervous about that. Tools like code coverage could 
do this. But I’ve seen cases where all the new tests are green, and the 
tests in the area I expected succeed, but something in a far corner 
fails. So optimizing for speed might lose correctness. I’m biased 
because I can run all the tests on my computer in 1 minute, but on 
Windows that doesn’t seem to work.

7. Peff: We can spend time on speeding up things, making it better 
parallelized for example. I’ll send some patches out on this.

8. Jonathan N: Really nice contribution to Git by David Barr, whose 
background was as a Java developer and thus the code was written in a 
Java way with clear API boundaries and unit tests.

9. Brian: Yes if your function is doing too much, it should be split up 
making it possible to test the separate pieces and then have a function 
that calls those and tests the end result.


* [TOPIC 13/17] Cross implementation test suite
  2020-03-12  3:55 Notes from Git Contributor Summit, Los Angeles (April 5, 2020) James Ramsay
                   ` (11 preceding siblings ...)
  2020-03-12  4:08 ` [TOPIC 12/17] Test harness improvements James Ramsay
@ 2020-03-12  4:09 ` James Ramsay
  2020-03-12  4:11 ` [TOPIC 14/17] Aspects of merge-ort: cool, or crimes against humanity? James Ramsay
                   ` (6 subsequent siblings)
  19 siblings, 0 replies; 62+ messages in thread
From: James Ramsay @ 2020-03-12  4:09 UTC (permalink / raw)
  To: git

1. Carlos: some aspects are under-specified, or work in very specific 
ways, but need agreement on correct behaviour. For example, implementing 
a command-line tool that has expectations, or an expected repo state, so 
another tool can generate the right output. For example, libgit2 keeping 
up with ignore rules. How does JGit handle this?

2. Jonathan N: JGit has some tests of matching behaviour which I do not 
like: they invoke git-grep, generate patterns and compare output. Having 
non-deterministic tests is not great. I like the idea of table-driven 
tests: common data, but different manifestations of how you test those 
things.

3. Patrick: config formatted tests, need to write drivers for other 
projects. Stopped because writing all the tests in this format was not 
fun. Basics work though. Spoke to Peff 2 years ago, likely easy to write 
drivers for Git.

4. Peff: we’ve already replaced some tests with table-driven ones, and I 
prefer that. There are table-driven tests for attribute matching (see 
the sketch below).
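
A sketch of what a shared, table-driven corpus for pattern matching could 
look like (the column layout is invented for illustration):

    # pattern     path           expected
    *.c           foo.c          match
    *.c           sub/foo.c      match
    /*.c          sub/foo.c      no-match
    foo/          foo/bar        match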

5. Brian: valuable for LFS. Know attribute matching is not up to spec. 
Could benefit from the tests to help identify gaps. We are MIT licensed, 
so we can’t just drop them in, but we could import them in CI.

6. Peff: make whatever is in Git the authority, add tables, and then 
these can be used by other projects.

7. Jonathan N: example is diff tests


* [TOPIC 14/17] Aspects of merge-ort: cool, or crimes against humanity?
  2020-03-12  3:55 Notes from Git Contributor Summit, Los Angeles (April 5, 2020) James Ramsay
                   ` (12 preceding siblings ...)
  2020-03-12  4:09 ` [TOPIC 13/17] Cross implementation test suite James Ramsay
@ 2020-03-12  4:11 ` James Ramsay
  2020-03-12  4:13 ` [TOPIC 15/17] Reachability checks James Ramsay
                   ` (5 subsequent siblings)
  19 siblings, 0 replies; 62+ messages in thread
From: James Ramsay @ 2020-03-12  4:11 UTC (permalink / raw)
  To: git

1. Elijah: ORT stands for Ostensibly Recursive’s Twin. As a merge 
strategy, just like you can call ‘git merge -s recursive’ you can 
call ‘git merge -s ort’.  Git’s option parsing doesn’t require 
the space after the ‘-s’, so ‘git merge -sort’ also works.

2. The major question is about performance and possible layering 
violations. Merge-recursive calls unpack_trees() to walk the trees, then 
needs to get all file and directory names and so walks the trees on the 
right again, and then the trees on the left again. Then diff needs to 
walk the sets of trees twice, and then insert_stage_data() does a bunch 
more narrow tree walks for rename detection. Lots of tree walking. 
merge-ort replaces all that with two tree walks.

3. Using traverse_trees() instead of unpack_trees(), and avoid the index 
entirely (not even touching or creating cache_entry’s), and building 
up information as I need. I’m not calling diffcore_std(), but instead 
directly calling diffcore_rename(). Is this horrifying? Or is it 
justified by the performance gains?

4. Peff: both, some of it sounds like an improvement, but maybe there 
were hidden benefits previously.

5. Elijah: I write to a tree before I do anything.

6. Peff: I like that. Seems like a clean up to me. We have written 
libgit2-like code for merging server-side

7. Elijah: I’ve been adding tests for the past few years, more to add, 
feel good about it.

8. Jonathan N: If you are using a lower-layer thing, I would not say 
you’re doing anything you shouldn’t. But if the docs say you should 
not use diffcore_rename(), you can update the docs to say that it’s 
fine to use it.

9. Elijah: three places directly write tree objects. All have different 
data structures they are writing from. Should I pull them out? But then 
my data structure was also different, so I’d have a fourth.

10. Peff: not worried because trees are simple. Worried about policy 
logic. Can’t write a tree entry with a double slash. Want this to be 
enforced everywhere, but no idea how hard that would be to implement. 
Not about lines of code, but consistency of policy. Fearful that only 
one place does it.

11. Elijah: I know merge-ort checks this, but it’s not nearby, so it 
could change.

12. Peff: as bad as it is to round trip through the index, it may bypass 
quality checks, which you will need to manually implement.

13. Elijah: usability side, with the tree I’ve created, I could have 
.git/AUTOMERGE. I have an old tree, a new tree, and a checkout can get 
me there. Fixed a whole bunch of bugs for sparsity and submodules.

14. Elijah: If we use this to on-the-fly remerge as part of git-log in 
order to compare the merge commit to what the automatic merging would 
have done, where/how should we write objects as we go?
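
For concreteness, this on-the-fly remerge idea later appeared as a real 
option in newer Git; shown here only to make the proposal concrete 
(<merge-commit> is a placeholder):

    # compare a recorded merge against what the automatic merge would have done
    git log -1 --remerge-diff <merge-commit>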

15. Jonathan N: can end up with proliferation of packs, would be nice to 
have similar to fast import and have in memory store. Dream not to have 
loose files written ever.

16. Peff: I like your dream. But fast-import packs are bad. We assume 
that packs are good, and thus need to GC aggressively; this adds to that 
problem. If I know about objects that are not written to disk, there is 
a risk of writing objects that are broken without git noticing, because 
git thinks it has the object but it’s only in memory. Log is 
conceptually a read operation, but this would create the need for writes.

17. Elijah: you could write into a temporary directory. Worried about 
`gc --auto` in the middle of my operation. If I write to a temp pack I 
could potentially avoid it.

18. Elijah: large files. Rename detection might not work efficiently OR 
correctly for sufficiently large files (binary or not). The limited 
bucket size means that completely different files are treated as renames 
when both are over 8MB. Should big files just not be compared?

19. Peff: maybe we should fix the hash…

20. Elijah: present situation is broken, maybe we can cheat in the short 
term, and avoid fixing?

21. Peff: seems more correct for now, but we’d need to document

22. Elijah: checkout has an --overwrite-ignore flag. Should merge have 
the same flag?

23. Jonathan N: gitignore’s original use case was build outputs, which 
can be regenerated. But then some people want to ignore `.hg`, which is 
much more precious.

24. Peff: we can plumb it through later to other commands

25. Brian: CI doesn’t really care. Moving between branches it would 
complain. For checkout and merge it makes sense to support just 
destroying.


* [TOPIC 15/17] Reachability checks
  2020-03-12  3:55 Notes from Git Contributor Summit, Los Angeles (April 5, 2020) James Ramsay
                   ` (13 preceding siblings ...)
  2020-03-12  4:11 ` [TOPIC 14/17] Aspects of merge-ort: cool, or crimes against humanity? James Ramsay
@ 2020-03-12  4:13 ` James Ramsay
  2020-03-12  4:14 ` [TOPIC 16/17] “I want a reviewer” James Ramsay
                   ` (4 subsequent siblings)
  19 siblings, 0 replies; 62+ messages in thread
From: James Ramsay @ 2020-03-12  4:13 UTC (permalink / raw)
  To: git

1. Jonathan N: seed the idea that it would be nice to hint the ref that 
your commit might be reachable from to help the server avoid iterating 
over all refs. Also, any strategies for speeding up reachability checks?
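
As a point of reference, the check being discussed is essentially an 
ancestry test; done naively on the client it looks like this (names are 
placeholders):

    # is <commit> reachable from (an ancestor of) <branch>?
    git merge-base --is-ancestor <commit> <branch> && echo reachable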

2. Demetr: reachability by user, or would you consider everything open to everyone?

3. Stolee: we don’t do branch level security, but we do tailor ref 
list to default, favorites and those you’ve pushed. There is also a 
full endpoint.

4. Brian: the security model we have to have is that we assume everyone 
has read access to everything. There are too many ways to attack. Useful 
for performance reasons, but not sure reachability checks provide much 
benefit. Don’t think it’s difficult to automate.

5. Demetr: what about security issues?

6. Stolee: we’d say find another way.

7. Terry: we have a mono repo, easier to test everything. JGit goes down 
to object level.

8. Peff: Git doesn’t go down to that level, doesn’t validate haves.

9. Jonathan: two lessons: first, no one except Gerrit cares strongly 
about this; second, if we like the branch-permissions model, it’s worth 
making it work well in Git to prevent divergence between JGit and Git.

10. Terry: can remove a branch very quickly and prevent new people 
getting it

11. Peff: don’t deny its usefulness, but the performance implication 
is concerning. Trying to keep objects private from determined attackers. 
But pushing a malicious commit to Linux, a user can see it, and won’t 
understand reachability doesn’t imply endorsement.

12. Jonathan: if Git has an easy cheap way to do it, people would use 
it.

13. Peff: have flirted with it, but might have to open 50GB of 
packfiles, or bitmap has corner cases. There are some obvious ways to 
improve, but a lot of work. V2 spec says you’re not allowed to check 
reachability.

14. Jonathan N: nah, it says you don't advertise a capability describing 
whether it is checking reachability.

15. Peff: submodule, but then the commit disappears and becomes 
unreachable. How do you handle?

16. Jonathan N: encourage folks to do fast forward only updates. In 
hooks instead of the git layer

17. Peff: you might not know what ref has reachability to that commit. I 
like the hint thing, if it’s just a hint.


* [TOPIC 16/17] “I want a reviewer”
  2020-03-12  3:55 Notes from Git Contributor Summit, Los Angeles (April 5, 2020) James Ramsay
                   ` (14 preceding siblings ...)
  2020-03-12  4:13 ` [TOPIC 15/17] Reachability checks James Ramsay
@ 2020-03-12  4:14 ` James Ramsay
  2020-03-12 13:31   ` Emily Shaffer
  2020-03-13 21:25   ` Eric Wong
  2020-03-12  4:16 ` [TOPIC 17/17] Security James Ramsay
                   ` (3 subsequent siblings)
  19 siblings, 2 replies; 62+ messages in thread
From: James Ramsay @ 2020-03-12  4:14 UTC (permalink / raw)
  To: git

1. Jonathan N: An experience some folks have is sending a patch and 
hearing nothing. That must mean the patch is awesome! But then you 
realize you need to do something to get a review. In Git, people like 
Peff are good about responding to newcomers. As an author it can be hard 
to excite people enough to review your patch. Relatedly, you might get a 
review, but it doesn’t give you the feedback you wanted. As a reviewer, 
you want to help people grow and make progress quickly, but it might not 
be easy to identify patches where this will be possible.

2. Emily: A few months ago we started doing a code review book club. 
There are also the Git developer IRC channel and the mailing list; could 
we be more public about these? Maybe I add my patch to a queue of things 
that have been idle and need a review, then a bot pops something off the 
list to draw people’s attention to review it?

3. Jonathan Tan: during book club we discuss and review together. 
Everyone can benefit from review experience and expertise. Emily is 
hoping for similar knowledge transfer in the IRC channel.

4. Brian: the general concern is that patches don’t get lost. There is 
the git-contacts script, but I am now a reviewer for everything because 
I have touched everything for SHA-256. We are losing patches and bug 
reports because things get missed. What tool would we use? How would we 
do it?

5. Jonathan N: patchwork exists, need to learn how to use it :)

6. Peff: this is all possible on the mailing list. I see things that 
look interesting and have a to-do folder. If someone replies, I’ll 
take it off the list. Once a week I go through all the items. I like the 
book club idea: instead of it being ad hoc, or by me, a group of a few 
people reviews the list in the queue. You might want to use a separate 
tool, like IRC, but it would be good to bring it back to the mailing 
list as a summary. public-inbox could be better, but someone needs to 
write it. Maybe nerd-snipe Eric?

7. Stolee: not just about doing reviews, but training reviewers.


* [TOPIC 17/17] Security
  2020-03-12  3:55 Notes from Git Contributor Summit, Los Angeles (April 5, 2020) James Ramsay
                   ` (15 preceding siblings ...)
  2020-03-12  4:14 ` [TOPIC 16/17] “I want a reviewer” James Ramsay
@ 2020-03-12  4:16 ` James Ramsay
  2020-03-12 14:38 ` Notes from Git Contributor Summit, Los Angeles (April 5, 2020) Derrick Stolee
                   ` (2 subsequent siblings)
  19 siblings, 0 replies; 62+ messages in thread
From: James Ramsay @ 2020-03-12  4:16 UTC (permalink / raw)
  To: git

1. Demetr: what are people doing to prevent security issues? For example, 
not allowing things into trees that would be problematic for various 
filesystems.

2. Jonathan N: enable transfer.fsckObjects by default, to validate at 
the trust boundary (in case some code paths at use time are missing some 
validation). (See the config sketch below.)
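
The options being referenced (these config keys are real):

    git config --global transfer.fsckObjects true
    # or per direction:
    git config --global fetch.fsckObjects true
    git config --global receive.fsckObjects true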

3. Peff: we have had buffer overflows; most are logic errors, and mostly 
path-related. Recently we’ve tightened up which paths are allowed, 
forbidding things that might be valid on Linux but are problems on 
Windows. We can’t catch everything though, because Windows is so complex.

4. Stolee: I am fearful, and do not know all the rules.

5. Peff: I don’t think it is possible.

6. Demetr: allow only Latin characters, numbers and a few other 
characters. Do not allow any special symbols.

7. Brian: that’s going to break lots of existing projects. Some 
projects have never been on Windows, and therefore people have no 
concern about Windows. People check in strange files to deliberately 
test strange files in their own software. If Windows has an API to test 
a filepath, there is not much we can do to protect it. Compatibility is 
important.

8. Peff: probably some cleanup needed, maybe can’t clone git.git. Some 
paths that are innocuous, are a problem in strange situations.

9. Jonathan N: what in Git's design scares the crap out of you?

10. ZJ: GitLab shells out for everything. We had injections. Now we have 
a DSL to verify things. We’re looking at --end-of-options (example below).
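
The option ZJ mentions is used to keep attacker-controlled values from 
being parsed as options; for example (variable name hypothetical):

    git rev-parse --end-of-options "$user_supplied_ref"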

11. Peff: C is terrifying. Rust rewrite please. We still have integer 
overflow risks. Tried to deal with it a few years ago, and found some 
more a few months back. A happy story: the OID array used a signed 
integer, because no one has more than 2 billion objects. Someone had 
3 billion objects; just the SHA-1s are 60GB. We found it because it 
triggered the overflow check in st_add: as soon as it wrapped around, it 
crashed, preventing under-allocation.

12. Jeff H: communication between processes

13. <musical interlude>

14. Peff: I feel good about where we read and write strings to each 
other. Maybe if we were using JSON encode/decode it might be easier to 
handle obscure cases


* Re: [TOPIC 16/17] “I want a reviewer”
  2020-03-12  4:14 ` [TOPIC 16/17] “I want a reviewer” James Ramsay
@ 2020-03-12 13:31   ` Emily Shaffer
  2020-03-12 17:31     ` Konstantin Ryabitsev
  2020-03-17  0:43     ` Philippe Blain
  2020-03-13 21:25   ` Eric Wong
  1 sibling, 2 replies; 62+ messages in thread
From: Emily Shaffer @ 2020-03-12 13:31 UTC (permalink / raw)
  To: James Ramsay; +Cc: git

On Thu, Mar 12, 2020 at 03:14:25PM +1100, James Ramsay wrote:
> 5. Jonathan N: patchwork exists, need to learn how to use it :)

We've actually got a meeting with some Patchwork folks today - if
anybody has a burning need they want filled via Patchwork, just say so,
and we'll try to ask.

 - Emily


* Re: [TOPIC 2/17] Hooks in the future
  2020-03-12  3:56 ` [TOPIC 2/17] Hooks in the future James Ramsay
@ 2020-03-12 14:16   ` Emily Shaffer
  2020-03-13 17:56     ` Junio C Hamano
  0 siblings, 1 reply; 62+ messages in thread
From: Emily Shaffer @ 2020-03-12 14:16 UTC (permalink / raw)
  To: James Ramsay; +Cc: git

On Thu, Mar 12, 2020 at 02:56:53PM +1100, James Ramsay wrote:
> 1. Emily: hooks in the config file. Sent a read only patch, but didn’t get
> much traction. Add a new header in the config file, then have prefix number
> so that security scan could be configured at system level that would run
> last, and then hook could also be configured at the project level.
> 
> 2. Peff: Having hooks in the config would be nice. But don’t do it at
> `hooks.prereceive`, but use a subconfig like `hooks.prereceive.command` so
> it’s possible to add options later on.
> 
> 3. Brian: sometimes the need to overridden, ordering works for me. For Git
> LFS it would be helpful to have a pre-push hook for LFS, and a different one
> for something else. Want flexibility about finding and discovering hooks.
> 
> 4. Emily: if you want to specify a hook that is in the work tree, then it
> has to be configured after cloning.
> 
> 5. Jonathan: It’s better to start with something low complexity as long as
> it can be extended/changed later. If there's room to tweak it over time then
> I'm not too worried about the initial version being perfect — we can make
> mistakes and learn from them. A challenge will be how hooks interact.
> Analogy to the challenges of stacked union filesystems and security modules
> in Linux. Analogy to sequence number allocation for unit scripts
> 
> 6. CB: Declare dependencies instead of a sequence number? In theory
> independent hooks can also run in parallel.
> 
> 7. Peff: Maybe that’s something to not worry about from the start. Like, how
> many hooks do you expect to run anyway.
> 
> 8. Christian: At booking.com they use a lot of hooks, and they also sent
> patches to the mailing list to improve that.
> 
> 9. Emily: In-tree hooks?
> 
> 10. Brian: You can do `git cat-file <ref> | sh` to run a hook.
> 
> 11. Brandon: Is it possible to globally to disable all hooks locally? It
> might be a security concern. Or is it something we might want to add?
> 
> 12. Peff: No it’s not.

Thanks for the notes, James.

I came away with the understanding that we want the config hook to look
something like this (barring misunderstanding of config file syntax,
plus or minus naming quibbles):

[hook "/path/to/executable.sh"]
	event = pre-commit

The idea being that by using a subsection, we can extend the format
later much more easily, but by starting simply, we can start using it
and see what we need or don't want. We can use config order to begin
with.

This means that we could do something like this:

[hook "/path/to/executable.sh"]
	event = pre-commit
	order = 123
	mustSucceed = false
	parallelizable = true

etc, etc as needed.

But I wonder if we also want to be able to do something like this:

[hook "/etc/git-secrets/git-secrets"]
	event = pre-commit
	event = prepare-commit-msg
	...

I guess the point is that we can choose to allow this, or not. I could
see there being some trouble if you wanted the execution order to work
differently (e.g. run it first for pre-commit but last for
prepare-commit-msg)...

I think, though, that something like
hook.pre-commit."path/to/executable.sh" won't work. It doesn't seem like
multiple subsections are OK in config syntax, as far as I can see. I'd
be interested to know I'm wrong :)

Will try and get some work on this soon, but honestly my hope is to get
bugreport squared away first.

 - Emily


* Re: Notes from Git Contributor Summit, Los Angeles (April 5, 2020)
  2020-03-12  3:55 Notes from Git Contributor Summit, Los Angeles (April 5, 2020) James Ramsay
                   ` (16 preceding siblings ...)
  2020-03-12  4:16 ` [TOPIC 17/17] Security James Ramsay
@ 2020-03-12 14:38 ` Derrick Stolee
  2020-03-13 20:47 ` Jeff King
  2020-03-15 18:42 ` Jakub Narebski
  19 siblings, 0 replies; 62+ messages in thread
From: Derrick Stolee @ 2020-03-12 14:38 UTC (permalink / raw)
  To: James Ramsay, git

On 3/11/2020 11:55 PM, James Ramsay wrote:
> It was great to see everyone at the Contributor Summit last week, in person and virtually.
> 
> Particular thanks go to Peff for facilitating, and to GitHub for organizing the logistics of the meeting place and food. Thank you!

Thanks for taking excellent notes!

> On the day, the topics below were discussed:
> 
> 1. Ref table (8 votes)
> 2. Hooks in the future (7 votes)
> 3. Obliterate (6 votes)
> 4. Sparse checkout (5 votes)
> 5. Partial Clone (6 votes)
> 6. GC strategies (6 votes)
> 7. Background operations/maintenance (4 votes)
> 8. Push performance (4 votes)
> 9. Obsolescence markers and evolve (4 votes)
> 10. Expel ‘git shell’? (3 votes)
> 11. GPL enforcement (3 votes)
> 12. Test harness improvements (3 votes)
> 13. Cross implementation test suite (3 votes)
> 14. Aspects of merge-ort: cool, or crimes against humanity? (2 votes)
> 15. Reachability checks (2 votes)
> 16. “I want a reviewer” (2 votes)
> 17. Security (2 votes)

Wow, this split into separate emails was a fantastic idea to control the multi-threaded discussion. Kudos!

-Stolee


* Re: [TOPIC 16/17] “I want a reviewer”
  2020-03-12 13:31   ` Emily Shaffer
@ 2020-03-12 17:31     ` Konstantin Ryabitsev
  2020-03-12 17:42       ` Jonathan Nieder
  2020-03-17  0:43     ` Philippe Blain
  1 sibling, 1 reply; 62+ messages in thread
From: Konstantin Ryabitsev @ 2020-03-12 17:31 UTC (permalink / raw)
  To: Emily Shaffer; +Cc: James Ramsay, git

On Thu, Mar 12, 2020 at 06:31:27AM -0700, Emily Shaffer wrote:
> On Thu, Mar 12, 2020 at 03:14:25PM +1100, James Ramsay wrote:
> > 5. Jonathan N: patchwork exists, need to learn how to use it :)
> 
> We've actually got a meeting with some Patchwork folks today - if
> anybody has a burning need they want filled via Patchwork, just say so,
> and we'll try to ask.

Just to highlight this -- a long while ago someone asked me to set up a 
patchwork instance for Git, but I believe they never used it:

https://patchwork.kernel.org/project/git/list/

-K


* Re: [TOPIC 16/17] “I want a reviewer”
  2020-03-12 17:31     ` Konstantin Ryabitsev
@ 2020-03-12 17:42       ` Jonathan Nieder
  2020-03-12 18:00         ` Konstantin Ryabitsev
  0 siblings, 1 reply; 62+ messages in thread
From: Jonathan Nieder @ 2020-03-12 17:42 UTC (permalink / raw)
  To: Konstantin Ryabitsev; +Cc: Emily Shaffer, James Ramsay, git

Hi!

Konstantin Ryabitsev wrote:
> On Thu, Mar 12, 2020 at 06:31:27AM -0700, Emily Shaffer wrote:

>> We've actually got a meeting with some Patchwork folks today - if
>> anybody has a burning need they want filled via Patchwork, just say so,
>> and we'll try to ask.
>
> Just to highlight this -- a long while ago someone asked me to set up a
> patchwork instance for Git, but I believe they never used it:
>
> https://patchwork.kernel.org/project/git/list/

That was me.  In fact, we are using it, but mostly read-only (similar
to lore patchwork) so far.  I'm hoping we can learn more about how to
automatically close reviews when a patch has landed, assign delegates
to reviews, set up bundles, etc and write some docs so it becomes
useful to more people.

Thanks,
Jonathan


* Re: [TOPIC 16/17] “I want a reviewer”
  2020-03-12 17:42       ` Jonathan Nieder
@ 2020-03-12 18:00         ` Konstantin Ryabitsev
  0 siblings, 0 replies; 62+ messages in thread
From: Konstantin Ryabitsev @ 2020-03-12 18:00 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: Emily Shaffer, James Ramsay, git

On Thu, Mar 12, 2020 at 10:42:12AM -0700, Jonathan Nieder wrote:
> >> We've actually got a meeting with some Patchwork folks today - if
> >> anybody has a burning need they want filled via Patchwork, just say so,
> >> and we'll try to ask.
> >
> > Just to highlight this -- a long while ago someone asked me to set up a
> > patchwork instance for Git, but I believe they never used it:
> >
> > https://patchwork.kernel.org/project/git/list/
> 
> That was me.  In fact, we are using it, but mostly read-only (similar
> to lore patchwork) so far.  I'm hoping we can learn more about how to
> automatically close reviews when a patch has landed, assign delegates
> to reviews, set up bundles, etc and write some docs so it becomes
> useful to more people.

FYI, I can set it up with git-patchwork-bot, which does some of the 
above. You can read more here:

https://korg.wiki.kernel.org/userdoc/patchwork#adding_patchwork-bot_integration

If that's something you would like to see, please send a request per 
that doc.

-K


* Re: [TOPIC 3/17] Obliterate
  2020-03-12  3:57 ` [TOPIC 3/17] Obliterate James Ramsay
@ 2020-03-12 18:06   ` Konstantin Ryabitsev
  2020-03-15 22:19   ` Damien Robert
  1 sibling, 0 replies; 62+ messages in thread
From: Konstantin Ryabitsev @ 2020-03-12 18:06 UTC (permalink / raw)
  To: James Ramsay; +Cc: git

On Thu, Mar 12, 2020 at 02:57:24PM +1100, James Ramsay wrote:
> 8. Jeff H: can we introduce a new type of object -- a "revoked blob" if you
> will that burns the original one but also holds the original SHA in the ODB
> ??
> 
> 9. Peff: what would this mean for signatures? New opportunity to forge
> signatures.

Easy, you just quickly find a collision for that blob's sha1 and put 
that in place of the offending original. ;)

(Fully tongue-in-cheek.)

-K


* Re: [TOPIC 2/17] Hooks in the future
  2020-03-12 14:16   ` Emily Shaffer
@ 2020-03-13 17:56     ` Junio C Hamano
  0 siblings, 0 replies; 62+ messages in thread
From: Junio C Hamano @ 2020-03-13 17:56 UTC (permalink / raw)
  To: Emily Shaffer; +Cc: James Ramsay, git

Emily Shaffer <emilyshaffer@google.com> writes:

> This means that we could do something like this:
>
> [hook "/path/to/executable.sh"]
> 	event = pre-commit
> 	order = 123
> 	mustSucceed = false
> 	parallelizable = true
>
> etc, etc as needed.

You can do

    [hook "pre-commit"]
	order = 123
	path = "/path/to/executable.sh"

    [hook "pre-commit"]
	order = 234
	path = "/path/to/another-executable.sh"

as well, and using the second level for what hook the (sub)section
is about, instead of "we have this path that is used for a hook.
What hook is it?", feels (at least to me) more natural.

> But I wonder if we also want to be able to do something like this:
>
> [hook "/etc/git-secrets/git-secrets"]
> 	event = pre-commit
> 	event = prepare-commit-msg

Once you start going this route, it no longer makes sense to give
priority (you called it "order") to a path and have that same number
used in contexts of different hooks.  Your git-secrets script may
want to be called early among pre-commit hooks but late among the
prepare-commit-msg hooks, for example.
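
(Purely illustrative, since none of this configuration exists yet: with
per-hook subsections the same script can simply take a different order
for each event it participates in, e.g.

    [hook "pre-commit"]
	order = 10
	path = "/etc/git-secrets/git-secrets"

    [hook "prepare-commit-msg"]
	order = 900
	path = "/etc/git-secrets/git-secrets"

and no single number has to make sense for two different hooks at once.)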

> I think, though, that something like
> hook.pre-commit."path/to/executable.sh" won't work.

That is why Peff already suggested in the TOPIC notes to use
"command" in the message you are responding to (I used "path" in the
above description).

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: Notes from Git Contributor Summit, Los Angeles (April 5, 2020)
  2020-03-12  3:55 Notes from Git Contributor Summit, Los Angeles (April 5, 2020) James Ramsay
                   ` (17 preceding siblings ...)
  2020-03-12 14:38 ` Notes from Git Contributor Summit, Los Angeles (April 5, 2020) Derrick Stolee
@ 2020-03-13 20:47 ` Jeff King
  2020-03-15 18:42 ` Jakub Narebski
  19 siblings, 0 replies; 62+ messages in thread
From: Jeff King @ 2020-03-13 20:47 UTC (permalink / raw)
  To: James Ramsay; +Cc: git

On Thu, Mar 12, 2020 at 02:55:21PM +1100, James Ramsay wrote:

> It was great to see everyone at the Contributor Summit last week, in person
> and virtually.
> 
> Particular thanks go to Peff for facilitating, and to GitHub for organizing
> the logistics of the meeting place and food. Thank you!

Thanks very much to you and others who took these notes! It's nice to
have a more permanent record of these discussions.

-Peff

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [TOPIC 16/17] “I want a reviewer”
  2020-03-12  4:14 ` [TOPIC 16/17] “I want a reviewer” James Ramsay
  2020-03-12 13:31   ` Emily Shaffer
@ 2020-03-13 21:25   ` Eric Wong
  2020-03-14 17:27     ` Jeff King
  1 sibling, 1 reply; 62+ messages in thread
From: Eric Wong @ 2020-03-13 21:25 UTC (permalink / raw)
  To: James Ramsay, Jeff King; +Cc: git

James Ramsay <james@jramsay.com.au> wrote:

James: first off, thank you for these accessible summaries for
non-JS users and those who could not attend for various reasons(*)

<snip>

> 6. Peff: this is all possible on the mailing list. I see things that look
> interesting, and have a to do folder. If someone replies, I’ll take it off
> the list. Once a week go through all the items. I like the book club idea,
> instead of it being ad hoc, or by me, a group of a few people review the
> list in the queue. You might want to use a separate tool, like IRC, but it
> would be good to have it bring it back to the mailing list as a summary.
> Public inbox could be better, but someone needs to write it. Maybe nerd
> snipe Eric?

What now? :o

There's a lot of things it could be better at, but a more
concrete idea of what you want would help.

Right now I only have enough resources to do bugfixes along with scalability
and performance improvements so more people can run it and keep
it 100% reproducible and centralization resistant.

I'm also planning on some local tooling along the lines of
notmuch/mairix which is NNTP/HTTPS-aware, but I'm not sure when I'll
be able to do that...


(*) I stopped attending events over a decade ago for privacy reasons
    (facial recognition, invasive airport searches, etc.)

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [TOPIC 16/17] “I want a reviewer”
  2020-03-13 21:25   ` Eric Wong
@ 2020-03-14 17:27     ` Jeff King
  2020-03-15  0:36       ` inbox indexing wishlist [was: [TOPIC 16/17] “I want a reviewer”] Eric Wong
  0 siblings, 1 reply; 62+ messages in thread
From: Jeff King @ 2020-03-14 17:27 UTC (permalink / raw)
  To: Eric Wong; +Cc: James Ramsay, git

On Fri, Mar 13, 2020 at 09:25:31PM +0000, Eric Wong wrote:

> > 6. Peff: this is all possible on the mailing list. I see things that look
> > interesting, and have a to do folder. If someone replies, I’ll take it off
> > the list. Once a week go through all the items. I like the book club idea,
> > instead of it being ad hoc, or by me, a group of a few people review the
> > list in the queue. You might want to use a separate tool, like IRC, but it
> > would be good to have it bring it back to the mailing list as a summary.
> > Public inbox could be better, but someone needs to write it. Maybe nerd
> > snipe Eric?
> 
> What now? :o
> 
> There's a lot of things it could be better at, but a more
> concrete idea of what you want would help.

short answer: searching for threads that only one person participated in

The discussion here was around people finding useful things to do on the
list: triaging or fixing bugs, responding to questions, etc. And I said
my mechanism for doing that was to hold interesting-looking but
not-yet-responded-to mails in my git-list inbox, treating it like a todo
list, and then eventually:

  1. I sweep through and spend time on each one.

  2. I see that somebody else responded, and I drop it from my queue.

  3. It ages out and I figure that it must not have been that important
     (I do this less individually, and more by occasionally declaring
     bankruptcy).

That's easy for me because I use mutt, and I basically keep my own list
archive anyway. But it would probably be possible to use an existing
archive and just search for "threads with only one author from the last
7 days". And people could sweep through that[1].

You already allow date-based searches, so it would really just be adding
the "thread has only one author" search. It's conceptually simple, but
it might be hard to index (because of course it may change as messages
are added to the archive, though any updates are bounded to the set of
threads the new messages are in).

But to be clear, I don't think you have any obligation here. I just
wondered if it might be interesting enough that you would implement it
for fun. :) As far as I'm concerned, if you never implemented another
feature for public-inbox, what you've done already has been a great
service to the community.

-Peff

[1] The obvious thing this lacks compared to my workflow is a way to
    mark threads as "seen" or "not interesting". But that implies
    per-user storage.

^ permalink raw reply	[flat|nested] 62+ messages in thread

* inbox indexing wishlist [was: [TOPIC 16/17] “I want a reviewer”]
  2020-03-14 17:27     ` Jeff King
@ 2020-03-15  0:36       ` Eric Wong
  0 siblings, 0 replies; 62+ messages in thread
From: Eric Wong @ 2020-03-15  0:36 UTC (permalink / raw)
  To: Jeff King; +Cc: James Ramsay, git, meta

Jeff King <peff@peff.net> wrote:
> On Fri, Mar 13, 2020 at 09:25:31PM +0000, Eric Wong wrote:
> 
> > > 6. Peff: this is all possible on the mailing list. I see things that look
> > > interesting, and have a to do folder. If someone replies, I’ll take it off
> > > the list. Once a week go through all the items. I like the book club idea,
> > > instead of it being ad hoc, or by me, a group of a few people review the
> > > list in the queue. You might want to use a separate tool, like IRC, but it
> > > would be good to have it bring it back to the mailing list as a summary.
> > > Public inbox could be better, but someone needs to write it. Maybe nerd
> > > snipe Eric?
> > 
> > What now? :o
> > 
> > There's a lot of things it could be better at, but a more
> > concrete idea of what you want would help.
> 
> short answer: searching for threads that only one person participated in

+Cc meta@public-inbox.org

OK, something I've thought of doing anyways in the past...

> The discussion here was around people finding useful things to do on the
> list: triaging or fixing bugs, responding to questions, etc. And I said
> my mechanism for doing that was to hold interesting-looking but
> not-yet-responded-to mails in my git-list inbox, treating it like a todo
> list, and then eventually:
> 
>   1. I sweep through and spend time on each one.
> 
>   2. I see that somebody else responded, and I drop it from my queue.
> 
>   3. It ages out and I figure that it must not have been that important
>      (I do this less individually, and more by occasionally declaring
>      bankruptcy).
> 
> That's easy for me because I use mutt, and I basically keep my own list
> archive anyway. But it would probably be possible to use an existing
> archive and just search for "threads with only one author from the last
> 7 days". And people could sweep through that[1].
> 
> You already allow date-based searches, so it would really just be adding
> the "thread has only one author" search. It's conceptually simple, but
> it might be hard to index (because of course it may change as messages
> are added to the archive, though any updates are bounded to the set of
> threads the new messages are in).

Exactly on being conceptually simple but requiring some deeper
changes to the way indexing works.  I'll have to think about it
a bit, but it should be doable without being too intrusive,
invasive or expensive for existing users.

> But to be clear, I don't think you have any obligation here. I just
> wondered if it might be interesting enough that you would implement it
> for fun. :) As far as I'm concerned, if you never implemented another
> feature for public-inbox, what you've done already has been a great
> service to the community.

Thanks.  I'll keep that index change in mind and it should be
doable if I remain alive and society doesn't collapse...

> [1] The obvious thing this lacks compared to my workflow is a way to
>     mark threads as "seen" or "not interesting". But that implies
>     per-user storage.

Yeah, that would be part of the local tools bit I've been
thinking about (user labels such as "important", "seen",
"replied", "new", "ignore", ... flags).

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: Notes from Git Contributor Summit, Los Angeles (April 5, 2020)
  2020-03-12  3:55 Notes from Git Contributor Summit, Los Angeles (April 5, 2020) James Ramsay
                   ` (18 preceding siblings ...)
  2020-03-13 20:47 ` Jeff King
@ 2020-03-15 18:42 ` Jakub Narebski
  2020-03-16 19:31   ` Jeff King
  19 siblings, 1 reply; 62+ messages in thread
From: Jakub Narebski @ 2020-03-15 18:42 UTC (permalink / raw)
  To: James Ramsay; +Cc: git

"James Ramsay" <james@jramsay.com.au> writes:

> It was great to see everyone at the Contributor Summit last week, in
> person and virtually.
>
> Particular thanks go to Peff for facilitating, and to GitHub for
> organizing the logistics of the meeting place and food. Thank you!
>
> On the day, the topics below were discussed:
>
> 1. Ref table (8 votes)
> 2. Hooks in the future (7 votes)
> 3. Obliterate (6 votes)
> 4. Sparse checkout (5 votes)
> 5. Partial Clone (6 votes)
> 6. GC strategies (6 votes)
> 7. Background operations/maintenance (4 votes)
> 8. Push performance (4 votes)
> 9. Obsolescence markers and evolve (4 votes)
> 10. Expel ‘git shell’? (3 votes)
> 11. GPL enforcement (3 votes)
> 12. Test harness improvements (3 votes)
> 13. Cross implementation test suite (3 votes)
> 14. Aspects of merge-ort: cool, or crimes against humanity? (2 votes)
> 15. Reachability checks (2 votes)
> 16. “I want a reviewer” (2 votes)
> 17. Security (2 votes)

Thank you very much for sending split writeup to the mailing list.

One question to all who participated live (in person): how were those
topics proposed, and how were they voted on?  This was done before remote
access was turned on, I think.

Thanks in advance,
-- 
Jakub Narębski

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [TOPIC 3/17] Obliterate
  2020-03-12  3:57 ` [TOPIC 3/17] Obliterate James Ramsay
  2020-03-12 18:06   ` Konstantin Ryabitsev
@ 2020-03-15 22:19   ` Damien Robert
  2020-03-16 12:55     ` Konstantin Tokarev
                       ` (3 more replies)
  1 sibling, 4 replies; 62+ messages in thread
From: Damien Robert @ 2020-03-15 22:19 UTC (permalink / raw)
  To: James Ramsay; +Cc: git

From James Ramsay, Thu 12 Mar 2020 at 14:57:24 (+1100) :
> 6. Elijah: replace refs helps, but not supported by hosts like GitHub etc
>     a. Stolee: breaks commit graph because of generation numbers.
>     b. Replace refs for blobs, then special packfile, there were edge cases.

I am interested in more details on how to handle this using replace.

My situation: coworkers push big files by mistake, I don't want to rewrite
history because they are not too well versed with git, but I want to keep
*my* repo clean.

Partial solution:
- identify the large blobs (easy)
- write a replace ref (easy):
  $ git replace b5f74037bb91 $(git hash-object -w -t blob /dev/null)
  and replace the file (if it is still in the repo) by an empty file.

Now the pain points start:
- first, the index does not handle replace (I think), so the replaced file
  appears as changed in git status, even though e.g. git diff shows nothing.

=> Solution: configure .git/info/sparse-checkout
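
  A minimal sketch of that step (assuming non-cone mode, and that the
  offending blob is the Silverman-Height-NonTorsion.pdf from the error
  below):

  $ git config core.sparseCheckout true
  $ printf '%s\n' '/*' '!Silverman-Height-NonTorsion.pdf' \
        >.git/info/sparse-checkout
  $ git read-tree -mu HEAD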

- secondly, I want to remove the large blob from my repo.

Ideally I'd like to repack everything but filter out this blob, except that
repack does not understand --filter. So I need to use `git pack-objects`
directly and then manually do the naming and cleanup that repack usually
does, which is error-prone.
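
  A rough, untested sketch of that manual route (with a made-up size
  limit; I pipe --stdout through index-pack --stdin so the result lands
  in .git/objects/pack with a proper name):

  $ git rev-parse --all |
        git pack-objects --revs --filter=blob:limit=1m --stdout |
        git index-pack --stdin
  # ...and then delete the superseded pack-*.pack/.idx files and any
  # loose copies by hand, which is exactly the error-prone part.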

Furthermore, while `git pack-objects` accepts --filter, I can only filter on
blob size, not blob oid. (there is filter=sparse:oid where I could reuse my
sparse checkout file, but I would need to make a blob of it first). And if I
have one large file I want to keep, I cannot filter by blob size.
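
  (Making that blob is at least a one-liner -- e.g., untested:

  $ oid=$(git hash-object -w .git/info/sparse-checkout)

  and then the pack-objects invocation above could take
  --filter=sparse:oid=$oid instead of the size-based filter.)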

Another solution would be to use `git unpack-objects` to unpack all objects
(except I would need to do that in an empty git dir), remove the blob, and
then repack everything.

Am I missing a simpler solution?

- finally, checking out a ref that includes the replaced (now missing) blob
  gives error messages of the form:
error: invalid object 100644 b5f74037bb91c45606b233b0ad6aad86f8e3875e for 'Silverman-Height-NonTorsion.pdf'

On the one hand it is reassuring that git checks that the real object
(rather than only the replaced object) is still there; on the other hand it
would be nice to ask git to completely forget about the original object
(except in fsck, of course).

Thanks,
Damien

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [TOPIC 3/17] Obliterate
  2020-03-15 22:19   ` Damien Robert
@ 2020-03-16 12:55     ` Konstantin Tokarev
  2020-03-26 22:27       ` Damien Robert
  2020-03-16 16:32     ` Elijah Newren
                       ` (2 subsequent siblings)
  3 siblings, 1 reply; 62+ messages in thread
From: Konstantin Tokarev @ 2020-03-16 12:55 UTC (permalink / raw)
  To: Damien Robert, James Ramsay; +Cc: git



16.03.2020, 02:13, "Damien Robert" <damien.olivier.robert@gmail.com>:
> From James Ramsay, Thu 12 Mar 2020 at 14:57:24 (+1100) :
>>  6. Elijah: replace refs helps, but not supported by hosts like GitHub etc
>>      a. Stolee: breaks commit graph because of generation numbers.
>>      b. Replace refs for blobs, then special packfile, there were edge cases.
>
> I am interested in more details on how to handle this using replace.
>
> My situation: coworkers push big files by mistake, I don't want to rewrite
> history because they are not too well versed with git, but I want to keep
> *my* repo clean.

Wouldn't it be better to prevent *them* from making such mistakes, e.g. by
using a pre-push review system like Gerrit?

-- 
Regards,
Konstantin


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [TOPIC 3/17] Obliterate
  2020-03-15 22:19   ` Damien Robert
  2020-03-16 12:55     ` Konstantin Tokarev
@ 2020-03-16 16:32     ` Elijah Newren
  2020-03-26 22:30       ` Damien Robert
  2020-03-16 18:32     ` Phillip Susi
  2020-03-16 20:01     ` Philip Oakley
  3 siblings, 1 reply; 62+ messages in thread
From: Elijah Newren @ 2020-03-16 16:32 UTC (permalink / raw)
  To: Damien Robert; +Cc: James Ramsay, Git Mailing List

On Sun, Mar 15, 2020 at 4:16 PM Damien Robert
<damien.olivier.robert@gmail.com> wrote:
>
> From James Ramsay, Thu 12 Mar 2020 at 14:57:24 (+1100) :
> > 6. Elijah: replace refs helps, but not supported by hosts like GitHub etc
> >     a. Stolee: breaks commit graph because of generation numbers.
> >     b. Replace refs for blobs, then special packfile, there were edge cases.
>
> I am interested in more details on how to handle this using replace.

This comment at the conference was in reference to how people rewrite
history to remove the big blobs, but then run into issues because
there are many places outside of git that reference old commit IDs
(wiki pages, old emails, issues/tickets, etc.) that are now broken.

Replace refs can help in that situation, because they can be used not
only to replace existing objects with something else, but also to
replace non-existing objects, essentially setting up an alias for an
object.
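
(A tiny illustration with made-up ids: after a rewrite, something like

    git update-ref refs/replace/<old-full-oid> <new-full-oid>

lets "git show <old-full-oid>" resolve to the new commit even though no
object with the old id exists in the repository anymore.)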

git filter-repo uses this when it rewrites history to give you a way
to access NEW commit hashes using OLD commit hashes, despite the old
commit hashes not being stored in the repository.  The old commit
hashes are just replace refs that replace non-existing objects (at
least within the newly rewritten repo) that happen to match old commit
hashes and map to the new commit hashes.  Unfortunately this isn't
quite a perfect solution; there are still three known downsides:

  * replace refs cannot be abbreviated, unlike real object ids.  Thus,
if you have an abbreviated old commit hash, git won't recognize it in
such a setup.
  * commit-graph apparently assumes that the existence of replace refs
implies that commit objects in the repo have likely been replaced
(even though that is not the case for this situation), and thus is
disabled when such refs are present.
  * external GUI programs such as GitHub and Gerrit and likely others
do not honor replace refs, instead showing you some form of "Not
Found" error.


As for using replace refs to attempt to alleviate problems without
rewriting history, that's an even bigger can of worms and it doesn't
solve clone/fetch/gc/fsck nor the many other places you highlighted in
your email.

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [TOPIC 3/17] Obliterate
  2020-03-15 22:19   ` Damien Robert
  2020-03-16 12:55     ` Konstantin Tokarev
  2020-03-16 16:32     ` Elijah Newren
@ 2020-03-16 18:32     ` Phillip Susi
  2020-03-26 22:37       ` Damien Robert
  2020-03-16 20:01     ` Philip Oakley
  3 siblings, 1 reply; 62+ messages in thread
From: Phillip Susi @ 2020-03-16 18:32 UTC (permalink / raw)
  To: Damien Robert; +Cc: James Ramsay, git


Damien Robert writes:

> My situation: coworkers push big files by mistake, I don't want to rewrite
> history because they are not too well versed with git, but I want to keep
> *my* repo clean.
>
> Partial solution:
> - identify the large blobs (easy)
> - write a replace ref (easy):
>   $ git replace b5f74037bb91 $(git hash-object -w -t blob /dev/null)
>   and replace the file (if it is still in the repo) by an empty file.
>
> Now the pain points start:
> - first, the index does not handle replace (I think), so the replaced file
>   appears as changed in git status, even though e.g. git diff shows nothing.

Instead of replacing the blob with an empty file, why not replace the
tree that references it with one that does not?  That way you won't have
the file in your checkout at all, and the index won't list it so status
won't show it as changed.
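
Something along these lines, perhaps (a sketch with placeholder names,
assuming the file sits in the top-level tree of the commit in question):

    $ git ls-tree <commit>^{tree} |
          grep -v Silverman-Height-NonTorsion.pdf |
          git mktree
    <mktree prints the new tree's oid>
    $ git replace <commit>^{tree} <new-tree-oid>

You would have to repeat that for every tree that still references the
blob, of course.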


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: Notes from Git Contributor Summit, Los Angeles (April 5, 2020)
  2020-03-15 18:42 ` Jakub Narebski
@ 2020-03-16 19:31   ` Jeff King
  0 siblings, 0 replies; 62+ messages in thread
From: Jeff King @ 2020-03-16 19:31 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: James Ramsay, git

On Sun, Mar 15, 2020 at 07:42:19PM +0100, Jakub Narebski wrote:

> One question to all participating live (in person): how those topics
> were proposed, and how they were voted for?  This was done before remote
> access was turned on, I think.

During breakfast people wrote topics on a whiteboard and voted on them
by putting a tick-mark on the board. The topics and votes were
transferred to the Google Doc for notes. Next time I think we'll just go
straight to the online doc to save time and make things friendlier for
remote folks (the whiteboard is a holdover from when we didn't have any
remotes). And I'll make it clear that remote people are welcome to add
topics and vote via the doc.

-Peff

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [TOPIC 3/17] Obliterate
  2020-03-15 22:19   ` Damien Robert
                       ` (2 preceding siblings ...)
  2020-03-16 18:32     ` Phillip Susi
@ 2020-03-16 20:01     ` Philip Oakley
  3 siblings, 0 replies; 62+ messages in thread
From: Philip Oakley @ 2020-03-16 20:01 UTC (permalink / raw)
  To: Damien Robert, James Ramsay; +Cc: git

Hi Damien, James, (and the 4 others who voted for the topic)

I had been thinking about 'missing' blobs for a long while, as an earlier
'partial clone' concept (unpublished).

On 15/03/2020 22:19, Damien Robert wrote:
> From James Ramsay, Thu 12 Mar 2020 at 14:57:24 (+1100) :
>> 6. Elijah: replace refs helps, but not supported by hosts like GitHub etc
>>     a. Stolee: breaks commit graph because of generation numbers.
>>     b. Replace refs for blobs, then special packfile, there were edge cases.
> I am interested in more details on how to handle this using replace.
>
> My situation: coworkers push big files by mistake, I don't want to rewrite
> history because they are not too well versed with git, but I want to keep
> *my* repo clean.
>
> Partial solution:
> - identify the large blobs (easy)
> - write a replace ref (easy):
>   $ git replace b5f74037bb91 $(git hash-object -w -t blob /dev/null)
>   and replace the file (if it is still in the repo) by an empty file.

Here, my idea was to create a deliberately malformed blob object that
would allow a self-reference to say "this blob is deliberately missing"
(i.e. the same content would exist under two oids, one valid, one
invalid). The change would require extra code (more below).

Managing the verification of the replacement is a bigger problem,
especially if already pushed to a server.
> Now the pain points start:
> - first, the index does not handle replace (I think), so the replaced file
>   appears as changed in git status, even though e.g. git diff shows nothing.
>
> => Solution: configure .git/info/sparse-checkout
>
> - secondly, I want to remove the large blob from my repo.
>
> Ideally I'd like to repack everything but filter this blob, except that
> repack does not understand --filter. So I need to use `git pack-objects`
> directly and then do the naming and clean up that repack usually does
> manually, which is error prone.
>
> Furthermore, while `git pack-objects` accepts --filter, I can only filter on
> blob size, not blob oid. (there is filter=sparse:oid where I could reuse my
> sparse checkout file, but I would need to make a blob of it first). And if I
> have one large file I want to keep, I cannot filter by blob size.
>
> Another solution would be to use `git unpack-objects` to unpack all objects
> (except I would need to do that in an empty git dir), remove the blob, and
> then repack everything.
>
> Am I missing a simpler solution?
>
> - finally, checking out a ref that includes the replaced (now missing) blob
>   gives error messages of the form:
> error: invalid object 100644 b5f74037bb91c45606b233b0ad6aad86f8e3875e for 'Silverman-Height-NonTorsion.pdf'
>
> On the one hand it is reassuring that git checks that the real object
> (rather than only the replaced object) is still there, on the other hand it
> would be nice to ask git to completely forget about the original object
> (except fsck of course).
>
> Thanks,
> Damien
My notes on the "13. Obliterate" ideas.

1. If the object is in the wild & is dangerous: Stop: Failed: Damage
limitation.
2. If the object is external, but still tame: Seek and recapture;
either treat as internal, or treat as wild [1].
3. The object is in captivity, even if distributed around an enclosure.
Proceed to vaccination [4].
4. Create a new blob object with the exact content "Git revoke: <oid>" (or
similar). This object includes the embedded object type coding as part of
the object format. This object is/becomes part of the git signature/oid
commit hierarchy. This should (ultimately) be on the 'master' branch as it
is the verifier for the obliteration.
5. In the old revoked object <oid>, replace the object content (after
zlib etc.) with the same content as created in step 4. This deliberately
_malformed_ object would normally cause fsck to barf; see [6].
6. However, here we/fsck would detect the length and prefix of the
(barfed) object contents and so determine its oid (the oid of the
content). This results in an oid equal to that found in step 4, which can
be looked up and determined to be a self-referral to this obliterated
oid, so an fsck 'pass:obliterated' result is returned. This content could
actually be stored in any removed file if checked out!

Consequences:
Packs and other served object contents no longer contain the revoked oid.
Hygiene/vaccination needs to be applied to other distributed recipients
of the former defective object.

Possible attacks: an attacker removes other important commits/blobs/trees
by adding a 'revoke' which propagates to other users. Mitigation: separate
the hygiene cycle from the initial server revocation.

For trees(?) and commits the message is "revoke <oid> <use-oid>". But
where to 'hold' the commit & tree? (Maybe require that a tree revoke is
treated as a commit revoke, so the new tree comes for free.) We
still need the new commit to be walked by fsck/gc, and the old oid
contents to be gc'd.
For a 'commit' revocation it (the new msg/trees/revision) could maybe be
a 2nd (or third) parent after {0} so a 'normal' walk finds it, but
probably that's just a recipe for disaster.
Maybe a revocation reflog that doesn't expire, or can be rebuilt (fsck
would extend its lost/found to include a revoked list)?

The new (XY) problem is now one of tying the new revoked blob into the
'old' commit/tree hierarchy, which only handles tracked files! Maybe it's
a .revoked file (like a .gitignore) which has a list of the old oids and
has actual blobs attached under a .revoked tree.

Also need to make sure that re-packing is done if the blob/tree/commit
was a delta-base at the point of obliteration. Also need to prompt the
local user, just in case it's a spoof! Plus we need a way of 'sending' the
revocation (and a flag for what to do about a fetched pack containing a
revocation for which we have the original, esp. if we have it as a pack
that will take a long time to recreate). Need a way of writing the
'defective' object (more code).

Newhash transition: when histories are rewritten, the obliterated
artefacts are truly removed. For new repos using the newhash, the
revocation mechanism is essentially the same, other than extending the
nominal size of the revocation objects.

Perhaps use the 'submodule' commit object type (i.e. 'stuff held
elsewhere') for the holder of the revoked ID (for commits & trees).
This could be locked into the history (details not fully thought
through...).

If there is a design error within Git, it's the lack of an 'after the
fact' redaction mechanism (and how it is spread across branches and
distributed users/servers) - not easy.

Philip

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [TOPIC 16/17] “I want a reviewer”
  2020-03-12 13:31   ` Emily Shaffer
  2020-03-12 17:31     ` Konstantin Ryabitsev
@ 2020-03-17  0:43     ` Philippe Blain
  1 sibling, 0 replies; 62+ messages in thread
From: Philippe Blain @ 2020-03-17  0:43 UTC (permalink / raw)
  To: Emily Shaffer; +Cc: James Ramsay, git, Johannes Schindelin

Hi Emily,

> Le 12 mars 2020 à 09:31, Emily Shaffer <emilyshaffer@google.com> a écrit :
> 
> On Thu, Mar 12, 2020 at 03:14:25PM +1100, James Ramsay wrote:
>> 5. Jonathan N: patchwork exists, need to learn how to use it :)
> 
> We've actually got a meeting with some Patchwork folks today - if
> anybody has a burning need they want filled via Patchwork, just say so,
> and we'll try to ask.

I just read this so I don't know if it's too late, but patchwork does not
cope well with how Gitgitgadget uses the same email address for all
submissions. I reported that here:
https://lore.kernel.org/git/75987318-A9A7-4235-8B1D-315B29B644E8@gmail.com/,
but haven't opened an issue yet on patchwork's bug tracker. I'm also not
sure whether the best course of action is on the GGG or the patchwork
side, though, as per Junio's suggestion in the above thread...

Philippe.

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Allowing only blob filtering was: [TOPIC 5/17] Partial Clone
  2020-03-12  4:00 ` [TOPIC 5/17] Partial Clone James Ramsay
@ 2020-03-17  7:38   ` " Christian Couder
  2020-03-17 20:39     ` [RFC PATCH 0/2] upload-pack.c: limit allowed filter choices Taylor Blau
  0 siblings, 1 reply; 62+ messages in thread
From: Christian Couder @ 2020-03-17  7:38 UTC (permalink / raw)
  To: Taylor Blau, Jeff King; +Cc: git, James Ramsay

Hi Taylor and Peff,

On Thu, Mar 12, 2020 at 5:01 AM James Ramsay <james@jramsay.com.au> wrote:
>
> 1. Stolee: what is the status, who is deploying it, what issues need to
> be handled? Example, downloading tags. Hard to highly recommend it.
>
> 2. Taylor: we deployed it. No activity except for internal testing. Some
> more activity, but no crashes. Have been dragging our feet. Chicken egg,
> can’t deploy it because the client may not work, but hoping to hear
> about problems.
>
> 3. ZJ: dark launched for a mission critical repos. Internal questions
> from CI team, not sure about performance. Build farm hitting it hard
> with different filter specs.
>
> 4. Taylor: we have patches we are promising to the list. Blob none and
> limit, for now, but add them incrementally. Bitmap patches are on the
> list

We (GitLab) would be interested in seeing the patches you already have
that only allow blob filtering.

Thanks,
Christian.

^ permalink raw reply	[flat|nested] 62+ messages in thread

* [RFC PATCH 0/2] upload-pack.c: limit allowed filter choices
  2020-03-17  7:38   ` Allowing only blob filtering was: " Christian Couder
@ 2020-03-17 20:39     ` Taylor Blau
  2020-03-17 20:39       ` [RFC PATCH 1/2] list_objects_filter_options: introduce 'list_object_filter_config_name' Taylor Blau
                         ` (2 more replies)
  0 siblings, 3 replies; 62+ messages in thread
From: Taylor Blau @ 2020-03-17 20:39 UTC (permalink / raw)
  To: git; +Cc: christian.couder, peff, james

Hi Christian,

Of course, I would be happy to send along our patches. They are included
in the series below, and correspond roughly to what we are running at
GitHub. (For us, there have been a few more clean-ups and additional
patches, but I squashed them into 2/2 below).

The approach is roughly that we have:

  - 'uploadpack.filter.allow' -> specifying the default for unspecified
    filter choices, itself defaulting to true in order to maintain
    backwards compatibility, and

  - 'uploadpack.filter.<filter>.allow' -> specifying whether or not each
    filter kind is allowed. (Originally this was given as 'git config
    uploadpack.filter=blob:none.allow true', but this '=' is ambiguous
    when configuration is given over '-c', which itself uses an '=' to
    separate keys from values.)

I noted in the second patch that there is the unfortunate possibility of
encountering a SIGPIPE when trying to write the ERR sideband back to a
client who requested a non-supported filter. Peff and I have had some
discussion off-list about resurrecting SZEDZER's work which makes room
in the buffer by reading one packet back from the client when the server
encounters a SIGPIPE. It is for this reason that I am marking the series
as 'RFC'.

For reference, our configuration at GitHub looks something like:

  [uploadpack]
    allowAnySHA1InWant = true
    allowFilter = true
  [uploadpack "filter"]
    allow = false
  [uploadpack "filter.blob:limit"]
    allow = true
  [uploadpack "filter.blob:none"]
    allow = true

with a few irrelevant details elided for the purposes of the list :-).

I'd be happy to take in any comments that you or others might have
before dropping the 'RFC' status.

Taylor Blau (2):
  list_objects_filter_options: introduce 'list_object_filter_config_name'
  upload-pack.c: allow banning certain object filter(s)

 Documentation/config/uploadpack.txt | 12 ++++++
 list-objects-filter-options.c       | 25 +++++++++++
 list-objects-filter-options.h       |  6 +++
 t/t5616-partial-clone.sh            | 23 ++++++++++
 upload-pack.c                       | 67 +++++++++++++++++++++++++++++
 5 files changed, 133 insertions(+)

--
2.26.0.rc2.2.g888d9484cf

^ permalink raw reply	[flat|nested] 62+ messages in thread

* [RFC PATCH 1/2] list_objects_filter_options: introduce 'list_object_filter_config_name'
  2020-03-17 20:39     ` [RFC PATCH 0/2] upload-pack.c: limit allowed filter choices Taylor Blau
@ 2020-03-17 20:39       ` Taylor Blau
  2020-03-17 20:53         ` Eric Sunshine
  2020-03-17 20:39       ` [RFC PATCH 2/2] upload-pack.c: allow banning certain object filter(s) Taylor Blau
  2020-03-18 10:18       ` [RFC PATCH 0/2] upload-pack.c: limit allowed filter choices Jeff King
  2 siblings, 1 reply; 62+ messages in thread
From: Taylor Blau @ 2020-03-17 20:39 UTC (permalink / raw)
  To: git; +Cc: christian.couder, peff, james

In a subsequent commit, we will add configuration options that are
specific to each kind of object filter, in which case it is handy to
have a function that translates between 'enum
list_objects_filter_choice' and an appropriate configuration-friendly
string.
---
 list-objects-filter-options.c | 25 +++++++++++++++++++++++++
 list-objects-filter-options.h |  6 ++++++
 2 files changed, 31 insertions(+)

diff --git a/list-objects-filter-options.c b/list-objects-filter-options.c
index 256bcfbdfe..6b6aa0b3ec 100644
--- a/list-objects-filter-options.c
+++ b/list-objects-filter-options.c
@@ -15,6 +15,31 @@ static int parse_combine_filter(
 	const char *arg,
 	struct strbuf *errbuf);
 
+const char *list_object_filter_config_name(enum list_objects_filter_choice c)
+{
+	switch (c) {
+	case LOFC_BLOB_NONE:
+		return "blob:none";
+	case LOFC_BLOB_LIMIT:
+		return "blob:limit";
+	case LOFC_TREE_DEPTH:
+		return "tree:depth";
+	case LOFC_SPARSE_OID:
+		return "sparse:oid";
+	case LOFC_COMBINE:
+		return "combine";
+	case LOFC_DISABLED:
+	case LOFC__COUNT:
+		/*
+		 * Include these to catch all enumerated values, but
+		 * break to treat them as a bug. Any new values of this
+		 * enum will cause a compiler error, as desired.
+		 */
+		break;
+	}
+	BUG("list_object_filter_config_name: invalid argument '%d'", c);
+}
+
 /*
  * Parse value of the argument to the "filter" keyword.
  * On the command line this looks like:
diff --git a/list-objects-filter-options.h b/list-objects-filter-options.h
index 2ffb39222c..e5259e4ac6 100644
--- a/list-objects-filter-options.h
+++ b/list-objects-filter-options.h
@@ -17,6 +17,12 @@ enum list_objects_filter_choice {
 	LOFC__COUNT /* must be last */
 };
 
+/*
+ * Returns a configuration key suitable for describing the given object filter,
+ * e.g.: "blob:none", "combine", etc.
+ */
+const char *list_object_filter_config_name(enum list_objects_filter_choice c);
+
 struct list_objects_filter_options {
 	/*
 	 * 'filter_spec' is the raw argument value given on the command line
-- 
2.26.0.rc2.2.g888d9484cf


^ permalink raw reply	[flat|nested] 62+ messages in thread

* [RFC PATCH 2/2] upload-pack.c: allow banning certain object filter(s)
  2020-03-17 20:39     ` [RFC PATCH 0/2] upload-pack.c: limit allowed filter choices Taylor Blau
  2020-03-17 20:39       ` [RFC PATCH 1/2] list_objects_filter_options: introduce 'list_object_filter_config_name' Taylor Blau
@ 2020-03-17 20:39       ` Taylor Blau
  2020-03-17 21:11         ` Eric Sunshine
  2020-03-18 11:18         ` Philip Oakley
  2020-03-18 10:18       ` [RFC PATCH 0/2] upload-pack.c: limit allowed filter choices Jeff King
  2 siblings, 2 replies; 62+ messages in thread
From: Taylor Blau @ 2020-03-17 20:39 UTC (permalink / raw)
  To: git; +Cc: christian.couder, peff, james

Git clients may ask the server for a partial set of objects, where the
set of objects being requested is refined by one or more object filters.
Server administrators can configure 'git upload-pack' to allow or ban
these filters by setting the 'uploadpack.allowFilter' variable to
'true' or 'false', respectively.

However, administrators using bitmaps may wish to allow certain kinds of
object filters, but ban others. Specifically, they may wish to allow
object filters that can be optimized by the use of bitmaps, while
rejecting other object filters which aren't and represent a perceived
performance degradation (as well as an increased load factor on the
server).

Allow configuring 'git upload-pack' to support object filters on a
case-by-case basis by introducing a new configuration variable and
section:

  - 'uploadpack.filter.allow'

  - 'uploadpack.filter.<kind>.allow'

where '<kind>' may be one of 'blob:none', 'blob:limit', 'tree:depth',
and so on. The additional '.' between 'filter' and '<kind>' is part of
the sub-section.

Setting the second configuration variable for any valid value of
'<kind>' explicitly allows or disallows that kind of object filter.

If a client requests the object filter <kind> and the respective
configuration value is not set, 'git upload-pack' will default to the
value of 'uploadpack.filter.allow', which itself defaults to 'true' to
maintain backwards compatibility. Note that this differs from
'uploadpack.allowfilter', which controls whether or not the 'filter'
capability is advertised.

NB: this introduces an unfortunate possibility that an attempt to write the
ERR sideband will cause a SIGPIPE. This can be prevented by some of
SZEDZER's previous work, but it is silenced in 't' for now.
---
 Documentation/config/uploadpack.txt | 12 ++++++
 t/t5616-partial-clone.sh            | 23 ++++++++++
 upload-pack.c                       | 67 +++++++++++++++++++++++++++++
 3 files changed, 102 insertions(+)

diff --git a/Documentation/config/uploadpack.txt b/Documentation/config/uploadpack.txt
index ed1c835695..6213bd619c 100644
--- a/Documentation/config/uploadpack.txt
+++ b/Documentation/config/uploadpack.txt
@@ -57,6 +57,18 @@ uploadpack.allowFilter::
 	If this option is set, `upload-pack` will support partial
 	clone and partial fetch object filtering.
 
+uploadpack.filter.allow::
+	Provides a default value for unspecified object filters (see: the
+	below configuration variable).
+	Defaults to `true`.
+
+uploadpack.filter.<filter>.allow::
+	Explicitly allow or ban the object filter corresponding to `<filter>`,
+	where `<filter>` may be one of: `blob:none`, `blob:limit`, `tree:depth`,
+	`sparse:oid`, or `combine`. If using combined filters, both `combine`
+	and all of the nested filter kinds must be allowed.
+	Defaults to `uploadpack.filter.allow`.
+
 uploadpack.allowRefInWant::
 	If this option is set, `upload-pack` will support the `ref-in-want`
 	feature of the protocol version 2 `fetch` command.  This feature
diff --git a/t/t5616-partial-clone.sh b/t/t5616-partial-clone.sh
index 77bb91e976..ee1af9b682 100755
--- a/t/t5616-partial-clone.sh
+++ b/t/t5616-partial-clone.sh
@@ -235,6 +235,29 @@ test_expect_success 'implicitly construct combine: filter with repeated flags' '
 	test_cmp unique_types.expected unique_types.actual
 '
 
+test_expect_success 'upload-pack fails banned object filters' '
+	# Ensure that configuration keys are normalized by capitalizing
+	# "blob:None" below:
+	test_config -C srv.bare uploadpack.filter.blob:None.allow false &&
+	test_must_fail ok=sigpipe git clone --no-checkout --filter=blob:none \
+		"file://$(pwd)/srv.bare" pc3
+'
+
+test_expect_success 'upload-pack fails banned combine object filters' '
+	test_config -C srv.bare uploadpack.filter.allow false &&
+	test_config -C srv.bare uploadpack.filter.combine.allow true &&
+	test_config -C srv.bare uploadpack.filter.tree:depth.allow true &&
+	test_config -C srv.bare uploadpack.filter.blob:none.allow false &&
+	test_must_fail ok=sigpipe git clone --no-checkout --filter=tree:1 \
+		--filter=blob:none "file://$(pwd)/srv.bare" pc3
+'
+
+test_expect_success 'upload-pack fails banned object filters with fallback' '
+	test_config -C srv.bare uploadpack.filter.allow false &&
+	test_must_fail ok=sigpipe git clone --no-checkout --filter=blob:none \
+		"file://$(pwd)/srv.bare" pc3
+'
+
 test_expect_success 'partial clone fetches blobs pointed to by refs even if normally filtered out' '
 	rm -rf src dst &&
 	git init src &&
diff --git a/upload-pack.c b/upload-pack.c
index c53249cac1..81f2701f99 100644
--- a/upload-pack.c
+++ b/upload-pack.c
@@ -69,6 +69,8 @@ static int filter_capability_requested;
 static int allow_filter;
 static int allow_ref_in_want;
 static struct list_objects_filter_options filter_options;
+static struct string_list allowed_filters = STRING_LIST_INIT_DUP;
+static int allow_filter_fallback = 1;
 
 static int allow_sideband_all;
 
@@ -848,6 +850,45 @@ static int process_deepen_not(const char *line, struct string_list *deepen_not,
 	return 0;
 }
 
+static int allows_filter_choice(enum list_objects_filter_choice c)
+{
+	const char *key = list_object_filter_config_name(c);
+	struct string_list_item *item = string_list_lookup(&allowed_filters,
+							   key);
+	if (item)
+		return (intptr_t) item->util;
+	return allow_filter_fallback;
+}
+
+static struct list_objects_filter_options *banned_filter(
+	struct list_objects_filter_options *opts)
+{
+	size_t i;
+
+	if (!allows_filter_choice(opts->choice))
+		return opts;
+
+	if (opts->choice == LOFC_COMBINE)
+		for (i = 0; i < opts->sub_nr; i++) {
+			struct list_objects_filter_options *sub = &opts->sub[i];
+			if (banned_filter(sub))
+				return sub;
+		}
+	return NULL;
+}
+
+static void die_if_using_banned_filter(struct packet_writer *w,
+				       struct list_objects_filter_options *opts)
+{
+	struct list_objects_filter_options *banned = banned_filter(opts);
+	if (!banned)
+		return;
+
+	packet_writer_error(w, _("filter '%s' not supported\n"),
+			    list_object_filter_config_name(banned->choice));
+	die(_("git upload-pack: banned object filter requested"));
+}
+
 static void receive_needs(struct packet_reader *reader, struct object_array *want_obj)
 {
 	struct object_array shallows = OBJECT_ARRAY_INIT;
@@ -885,6 +926,7 @@ static void receive_needs(struct packet_reader *reader, struct object_array *wan
 				die("git upload-pack: filtering capability not negotiated");
 			list_objects_filter_die_if_populated(&filter_options);
 			parse_list_objects_filter(&filter_options, arg);
+			die_if_using_banned_filter(&writer, &filter_options);
 			continue;
 		}
 
@@ -1044,6 +1086,9 @@ static int find_symref(const char *refname, const struct object_id *oid,
 
 static int upload_pack_config(const char *var, const char *value, void *unused)
 {
+	const char *sub, *key;
+	int sub_len;
+
 	if (!strcmp("uploadpack.allowtipsha1inwant", var)) {
 		if (git_config_bool(var, value))
 			allow_unadvertised_object_request |= ALLOW_TIP_SHA1;
@@ -1065,6 +1110,26 @@ static int upload_pack_config(const char *var, const char *value, void *unused)
 			keepalive = -1;
 	} else if (!strcmp("uploadpack.allowfilter", var)) {
 		allow_filter = git_config_bool(var, value);
+	} else if (!parse_config_key(var, "uploadpack", &sub, &sub_len, &key) &&
+		   key && !strcmp(key, "allow")) {
+		if (sub && skip_prefix(sub, "filter.", &sub) && sub_len >= 7) {
+			struct string_list_item *item;
+			char *spec;
+
+			/*
+			 * normalize the filter, and chomp off '.allow' from the
+			 * end
+			 */
+			spec = xstrdup_tolower(sub);
+			spec[sub_len - 7] = 0;
+
+			item = string_list_insert(&allowed_filters, spec);
+			item->util = (void *) (intptr_t) git_config_bool(var, value);
+
+			free(spec);
+		} else if (!strcmp("uploadpack.filter.allow", var)) {
+			allow_filter_fallback = git_config_bool(var, value);
+		}
 	} else if (!strcmp("uploadpack.allowrefinwant", var)) {
 		allow_ref_in_want = git_config_bool(var, value);
 	} else if (!strcmp("uploadpack.allowsidebandall", var)) {
@@ -1308,6 +1373,8 @@ static void process_args(struct packet_reader *request,
 		if (allow_filter && skip_prefix(arg, "filter ", &p)) {
 			list_objects_filter_die_if_populated(&filter_options);
 			parse_list_objects_filter(&filter_options, p);
+			die_if_using_banned_filter(&data->writer,
+						   &filter_options);
 			continue;
 		}
 
-- 
2.26.0.rc2.2.g888d9484cf

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [RFC PATCH 1/2] list_objects_filter_options: introduce 'list_object_filter_config_name'
  2020-03-17 20:39       ` [RFC PATCH 1/2] list_objects_filter_options: introduce 'list_object_filter_config_name' Taylor Blau
@ 2020-03-17 20:53         ` Eric Sunshine
  2020-03-18 10:03           ` Jeff King
  2020-03-18 21:05           ` Taylor Blau
  0 siblings, 2 replies; 62+ messages in thread
From: Eric Sunshine @ 2020-03-17 20:53 UTC (permalink / raw)
  To: Taylor Blau; +Cc: Git List, Christian Couder, Jeff King, james

On Tue, Mar 17, 2020 at 4:40 PM Taylor Blau <me@ttaylorr.com> wrote:
> In a subsequent commit, we will add configuration options that are
> specific to each kind of object filter, in which case it is handy to
> have a function that translates between 'enum
> list_objects_filter_choice' and an appropriate configuration-friendly
> string.
> ---

Missing sign-off (but perhaps that's intentional since this is RFC).

> diff --git a/list-objects-filter-options.c b/list-objects-filter-options.c
> @@ -15,6 +15,31 @@ static int parse_combine_filter(
> +const char *list_object_filter_config_name(enum list_objects_filter_choice c)
> +{
> +       switch (c) {
> +       case LOFC_BLOB_NONE:
> +               return "blob:none";
> +       case LOFC_BLOB_LIMIT:
> +               return "blob:limit";
> +       case LOFC_TREE_DEPTH:
> +               return "tree:depth";
> +       case LOFC_SPARSE_OID:
> +               return "sparse:oid";
> +       case LOFC_COMBINE:
> +               return "combine";
> +       case LOFC_DISABLED:
> +       case LOFC__COUNT:
> +               /*
> +                * Include these to catch all enumerated values, but
> +                * break to treat them as a bug. Any new values of this
> +                * enum will cause a compiler error, as desired.
> +                */

In general, people will see a warning, not an error, unless they
specifically use -Werror (or such) to turn the warning into an error,
so this statement is misleading. Also, while some compilers may
complain, others may not. So, although the comment claims that we will
notice an unhandled enum constant at compile-time, that isn't
necessarily the case.

Moreover, the comment itself, in its present form, is rather
superfluous since it's merely repeating what the BUG() invocation just
below it already tells me. In fact, as a reader of this code, I would
be more interested in knowing why those two cases do not have string
equivalents which are returned (although perhaps even that would be
obvious to someone familiar with the code, hence the comment can
probably be dropped altogether).

> +               break;
> +       }
> +       BUG("list_object_filter_config_name: invalid argument '%d'", c);
> +}

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [RFC PATCH 2/2] upload-pack.c: allow banning certain object filter(s)
  2020-03-17 20:39       ` [RFC PATCH 2/2] upload-pack.c: allow banning certain object filter(s) Taylor Blau
@ 2020-03-17 21:11         ` Eric Sunshine
  2020-03-18 21:18           ` Taylor Blau
  2020-03-18 11:18         ` Philip Oakley
  1 sibling, 1 reply; 62+ messages in thread
From: Eric Sunshine @ 2020-03-17 21:11 UTC (permalink / raw)
  To: Taylor Blau; +Cc: Git List, Christian Couder, Jeff King, james

On Tue, Mar 17, 2020 at 4:40 PM Taylor Blau <me@ttaylorr.com> wrote:
> NB: this introduces an unfortunate possibility that an attempt to write the
> ERR sideband will cause a SIGPIPE. This can be prevented by some of
> SZEDZER's previous work, but it is silenced in 't' for now.

s/SZEDZER/SZEDER/

> diff --git a/t/t5616-partial-clone.sh b/t/t5616-partial-clone.sh
> @@ -235,6 +235,29 @@ test_expect_success 'implicitly construct combine: filter with repeated flags' '
> +test_expect_success 'upload-pack fails banned object filters' '
> +       # Ensure that configuration keys are normalized by capitalizing
> +       # "blob:None" below:
> +       test_config -C srv.bare uploadpack.filter.blob:None.allow false &&

I found the wording of the comment more confusing than clarifying.
Perhaps rewriting it like this could help:

    Test case-insensitivity by intentional use of "blob:None" rather than
    "blob:none".

or something.

> +       test_must_fail ok=sigpipe git clone --no-checkout --filter=blob:none \
> +               "file://$(pwd)/srv.bare" pc3
> +'

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [RFC PATCH 1/2] list_objects_filter_options: introduce 'list_object_filter_config_name'
  2020-03-17 20:53         ` Eric Sunshine
@ 2020-03-18 10:03           ` Jeff King
  2020-03-18 19:40             ` Junio C Hamano
  2020-03-18 22:38             ` Eric Sunshine
  2020-03-18 21:05           ` Taylor Blau
  1 sibling, 2 replies; 62+ messages in thread
From: Jeff King @ 2020-03-18 10:03 UTC (permalink / raw)
  To: Eric Sunshine; +Cc: Taylor Blau, Git List, Christian Couder, james

On Tue, Mar 17, 2020 at 04:53:44PM -0400, Eric Sunshine wrote:

> > +       case LOFC_DISABLED:
> > +       case LOFC__COUNT:
> > +               /*
> > +                * Include these to catch all enumerated values, but
> > +                * break to treat them as a bug. Any new values of this
> > +                * enum will cause a compiler error, as desired.
> > +                */
> 
> In general, people will see a warning, not an error, unless they
> specifically use -Werror (or such) to turn the warning into an error,
> so this statement is misleading. Also, while some compilers may
> complain, others may not. So, although the comment claims that we will
> notice an unhandled enum constant at compile-time, that isn't
> necessarily the case.

Yes, but that's the best we can do, isn't it?

There's sort of a meta-issue here which Taylor and I discussed off-list
and which led to this comment. We quite often write switch statements
over enums like this:

  switch (foo) {
  case FOO_ONE:
	...do something...
  case FOO_TWO:
        ...something else...
  default:
	BUG("I don't know what to do with %d", foo);
  }

That's reasonable and does the right thing at runtime if we ever hit
this case. But it has the unfortunate side effect that we lose any
-Wswitch warning that could tell us at compile time that we're missing a
case. Not everybody would see such a warning, as you note, but
developers on gcc and clang generally would (it's part of -Wall).

But we can't just remove the default case. Even though enums don't
generally take on other values, it's legal for them to do so. So we do
want to make sure we BUG() in that instance.

This is awkward to solve in the general case[1]. But because we're
returning in each case arm here, it's easy to just put the BUG() after
the switch. Anything that didn't return is unhandled, and we get the
best of both: -Wswitch warnings when we need to add a new filter type,
and a BUG() in the off chance that we see an unexpected value.

But the cost is that we have to enumerate the set of values that are
defined but not handled here (LOFC__COUNT, for instance, isn't a real
enum value but rather a placeholder to let other code know how many
filter types there are).

So...I dunno. Worth it as a general technique?

-Peff

[1] In the general case where you don't return, you have to somehow know
    whether the value was actually handled or not (and BUG() if it
    wasn't). Presumably by keeping a separate flag variable, which is
    pretty ugly. -Wswitch-enum is supposed to deal with this by
    requiring that you list all of the values even if you have a default
    case. But it triggers in a lot of other places in the code that I
    think would be made much harder to read by having to list out the
    enumerated possibilities.

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [RFC PATCH 0/2] upload-pack.c: limit allowed filter choices
  2020-03-17 20:39     ` [RFC PATCH 0/2] upload-pack.c: limit allowed filter choices Taylor Blau
  2020-03-17 20:39       ` [RFC PATCH 1/2] list_objects_filter_options: introduce 'list_object_filter_config_name' Taylor Blau
  2020-03-17 20:39       ` [RFC PATCH 2/2] upload-pack.c: allow banning certain object filter(s) Taylor Blau
@ 2020-03-18 10:18       ` Jeff King
  2020-03-18 18:26         ` Re*: " Junio C Hamano
  2020-03-18 21:28         ` Taylor Blau
  2 siblings, 2 replies; 62+ messages in thread
From: Jeff King @ 2020-03-18 10:18 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, christian.couder, james

On Tue, Mar 17, 2020 at 02:39:05PM -0600, Taylor Blau wrote:

> Hi Christian,
> 
> Of course, I would be happy to send along our patches. They are included
> in the series below, and correspond roughly to what we are running at
> GitHub. (For us, there have been a few more clean-ups and additional
> patches, but I squashed them into 2/2 below).
> 
> The approach is roughly that we have:
> 
>   - 'uploadpack.filter.allow' -> specifying the default for unspecified
>     filter choices, itself defaulting to true in order to maintain
>     backwards compatibility, and
> 
>   - 'uploadpack.filter.<filter>.allow' -> specifying whether or not each
>     filter kind is allowed. (Originally this was given as 'git config
>     uploadpack.filter=blob:none.allow true', but this '=' is ambiguous
>     when configuration is given over '-c', which itself uses an '=' to
>     separate keys from values.)

One thing that's a little ugly here is the embedded dot in the
subsection (i.e., "filter.<filter>"). It makes it look like a four-level
key, but really there is no such thing in Git.  But everything else we
tried was even uglier.

I think we want to declare a real subsection for each filter and not
just "uploadpack.filter.<filter>". That gives us room to expand to other
config options besides "allow" later on if we need to.

We don't want to claim "uploadpack.allow" and "uploadpack.<filter>.allow";
that's too generic.

Likewise "filter.allow" is too generic.

We could do "uploadpackfilter.allow" and "uploadpackfilter.<filter>.allow",
but that's both ugly _and_ separates these options from the rest of
uploadpack.*.

We could use a character besides ".", which would reduce confusion. But
what? Using colon is kind of ugly, because it's already syntactically
significant in filter names, and you get:

  uploadpack.filter:blob:none.allow

We tried equals, like:

  uploadpack.filter=blob:none.allow

but there's an interesting side effect. Doing:

  git -c uploadpack.filter=blob:none.allow=true upload-pack ...

doesn't work, because the "-c" parser ends the key at the first "=". As
it should, because otherwise we'd get confused by an "=" in a value.
This is a failing of the "-c" syntax; it can't represent keys with
"=". Fixing it would be awkward, and I've never seen it come up in
practice outside of this (you _could_ have a branch with a funny name
and try to do "git -c branch.my=funny=branch.remote=origin" or
something, but the lack of bug reports suggests nobody is that
masochistic).

So...maybe the extra dot is the least bad thing?

> I noted in the second patch that there is the unfortunate possibility of
> encountering a SIGPIPE when trying to write the ERR sideband back to a
> client who requested a non-supported filter. Peff and I have had some
> discussion off-list about resurrecting SZEDZER's work which makes room
> in the buffer by reading one packet back from the client when the server
> encounters a SIGPIPE. It is for this reason that I am marking the series
> as 'RFC'.

For reference, the patch I was thinking of was this:

  https://lore.kernel.org/git/20190830121005.GI8571@szeder.dev/

-Peff

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [RFC PATCH 2/2] upload-pack.c: allow banning certain object filter(s)
  2020-03-17 20:39       ` [RFC PATCH 2/2] upload-pack.c: allow banning certain object filter(s) Taylor Blau
  2020-03-17 21:11         ` Eric Sunshine
@ 2020-03-18 11:18         ` Philip Oakley
  2020-03-18 21:20           ` Taylor Blau
  1 sibling, 1 reply; 62+ messages in thread
From: Philip Oakley @ 2020-03-18 11:18 UTC (permalink / raw)
  To: Taylor Blau, git; +Cc: christian.couder, peff, james

Hi
On 17/03/2020 20:39, Taylor Blau wrote:
> Git clients may ask the server for a partial set of objects, where the
> set of objects being requested is refined by one or more object filters.
> Server administrators can configure 'git upload-pack' to allow or ban
> these filters by setting the 'uploadpack.allowFilter' variable to
> 'true' or 'false', respectively.
>
> However, administrators using bitmaps may wish to allow certain kinds of
> object filters, but ban others. Specifically, they may wish to allow
> object filters that can be optimized by the use of bitmaps, while
> rejecting other object filters which aren't and represent a perceived
> performance degradation (as well as an increased load factor on the
> server).
>
> Allow configuring 'git upload-pack' to support object filters on a
> case-by-case basis by introducing a new configuration variable and
> section:
>
>   - 'uploadpack.filter.allow'
>
>   - 'uploadpack.filter.<kind>.allow'
>
> where '<kind>' may be one of 'blob:none', 'blob:limit', 'tree:depth',
> and so on. The additional '.' between 'filter' and '<kind>' is part of
> the sub-section.
>
> Setting the second configuration variable for any valid value of
> '<kind>' explicitly allows or disallows restricting that kind of object
> filter.
>
> If a client requests the object filter <kind> and the respective
> configuration value is not set, 'git upload-pack' will default to the
> value of 'uploadpack.filter.allow', which itself defaults to 'true' to
> maintain backwards compatibility. Note that this differs from
> 'uploadpack.allowfilter', which controls whether or not the 'filter'
> capability is advertised.
>
> NB: this introduces an unfortunate possibility that attempt to write the
> ERR sideband will cause a SIGPIPE. This can be prevented by some of
> SZEDZER's previous work, but it is silenced in 't' for now.
> ---
>  Documentation/config/uploadpack.txt | 12 ++++++
>  t/t5616-partial-clone.sh            | 23 ++++++++++
>  upload-pack.c                       | 67 +++++++++++++++++++++++++++++
>  3 files changed, 102 insertions(+)
>
> diff --git a/Documentation/config/uploadpack.txt b/Documentation/config/uploadpack.txt
> index ed1c835695..6213bd619c 100644
> --- a/Documentation/config/uploadpack.txt
> +++ b/Documentation/config/uploadpack.txt
> @@ -57,6 +57,18 @@ uploadpack.allowFilter::
>  	If this option is set, `upload-pack` will support partial
>  	clone and partial fetch object filtering.
>  
> +uploadpack.filter.allow::
> +	Provides a default value for unspecified object filters (see: the
> +	below configuration variable).
> +	Defaults to `true`.
> +
> +uploadpack.filter.<filter>.allow::
> +	Explicitly allow or ban the object filter corresponding to `<filter>`,
> +	where `<filter>` may be one of: `blob:none`, `blob:limit`, `tree:depth`,
> +	`sparse:oid`, or `combine`. If using combined filters, both `combine`
> +	and all of the nested filter kinds must be allowed.

Doesn't the man page at least need the part from the commit message
("The additional '.' between 'filter' and '<kind>' is part of the
sub-section."), as it's not a common mechanism (other comments
notwithstanding)?

Philip
> +	Defaults to `uploadpack.filter.allow`.
> +
>  uploadpack.allowRefInWant::
>  	If this option is set, `upload-pack` will support the `ref-in-want`
>  	feature of the protocol version 2 `fetch` command.  This feature
> diff --git a/t/t5616-partial-clone.sh b/t/t5616-partial-clone.sh
> index 77bb91e976..ee1af9b682 100755
> --- a/t/t5616-partial-clone.sh
> +++ b/t/t5616-partial-clone.sh
> @@ -235,6 +235,29 @@ test_expect_success 'implicitly construct combine: filter with repeated flags' '
>  	test_cmp unique_types.expected unique_types.actual
>  '
>  
> +test_expect_success 'upload-pack fails banned object filters' '
> +	# Ensure that configuration keys are normalized by capitalizing
> +	# "blob:None" below:
> +	test_config -C srv.bare uploadpack.filter.blob:None.allow false &&
> +	test_must_fail ok=sigpipe git clone --no-checkout --filter.blob:none \
> +		"file://$(pwd)/srv.bare" pc3
> +'
> +
> +test_expect_success 'upload-pack fails banned combine object filters' '
> +	test_config -C srv.bare uploadpack.filter.allow false &&
> +	test_config -C srv.bare uploadpack.filter.combine.allow true &&
> +	test_config -C srv.bare uploadpack.filter.tree:depth.allow true &&
> +	test_config -C srv.bare uploadpack.filter.blob:none.allow false &&
> +	test_must_fail ok=sigpipe git clone --no-checkout --filter=tree:1 \
> +		--filter=blob:none "file://$(pwd)/srv.bare" pc3
> +'
> +
> +test_expect_success 'upload-pack fails banned object filters with fallback' '
> +	test_config -C srv.bare uploadpack.filter.allow false &&
> +	test_must_fail ok=sigpipe git clone --no-checkout --filter=blob:none \
> +		"file://$(pwd)/srv.bare" pc3
> +'
> +
>  test_expect_success 'partial clone fetches blobs pointed to by refs even if normally filtered out' '
>  	rm -rf src dst &&
>  	git init src &&
> diff --git a/upload-pack.c b/upload-pack.c
> index c53249cac1..81f2701f99 100644
> --- a/upload-pack.c
> +++ b/upload-pack.c
> @@ -69,6 +69,8 @@ static int filter_capability_requested;
>  static int allow_filter;
>  static int allow_ref_in_want;
>  static struct list_objects_filter_options filter_options;
> +static struct string_list allowed_filters = STRING_LIST_INIT_DUP;
> +static int allow_filter_fallback = 1;
>  
>  static int allow_sideband_all;
>  
> @@ -848,6 +850,45 @@ static int process_deepen_not(const char *line, struct string_list *deepen_not,
>  	return 0;
>  }
>  
> +static int allows_filter_choice(enum list_objects_filter_choice c)
> +{
> +	const char *key = list_object_filter_config_name(c);
> +	struct string_list_item *item = string_list_lookup(&allowed_filters,
> +							   key);
> +	if (item)
> +		return (intptr_t) item->util;
> +	return allow_filter_fallback;
> +}
> +
> +static struct list_objects_filter_options *banned_filter(
> +	struct list_objects_filter_options *opts)
> +{
> +	size_t i;
> +
> +	if (!allows_filter_choice(opts->choice))
> +		return opts;
> +
> +	if (opts->choice == LOFC_COMBINE)
> +		for (i = 0; i < opts->sub_nr; i++) {
> +			struct list_objects_filter_options *sub = &opts->sub[i];
> +			if (banned_filter(sub))
> +				return sub;
> +		}
> +	return NULL;
> +}
> +
> +static void die_if_using_banned_filter(struct packet_writer *w,
> +				       struct list_objects_filter_options *opts)
> +{
> +	struct list_objects_filter_options *banned = banned_filter(opts);
> +	if (!banned)
> +		return;
> +
> +	packet_writer_error(w, _("filter '%s' not supported\n"),
> +			    list_object_filter_config_name(banned->choice));
> +	die(_("git upload-pack: banned object filter requested"));
> +}
> +
>  static void receive_needs(struct packet_reader *reader, struct object_array *want_obj)
>  {
>  	struct object_array shallows = OBJECT_ARRAY_INIT;
> @@ -885,6 +926,7 @@ static void receive_needs(struct packet_reader *reader, struct object_array *wan
>  				die("git upload-pack: filtering capability not negotiated");
>  			list_objects_filter_die_if_populated(&filter_options);
>  			parse_list_objects_filter(&filter_options, arg);
> +			die_if_using_banned_filter(&writer, &filter_options);
>  			continue;
>  		}
>  
> @@ -1044,6 +1086,9 @@ static int find_symref(const char *refname, const struct object_id *oid,
>  
>  static int upload_pack_config(const char *var, const char *value, void *unused)
>  {
> +	const char *sub, *key;
> +	int sub_len;
> +
>  	if (!strcmp("uploadpack.allowtipsha1inwant", var)) {
>  		if (git_config_bool(var, value))
>  			allow_unadvertised_object_request |= ALLOW_TIP_SHA1;
> @@ -1065,6 +1110,26 @@ static int upload_pack_config(const char *var, const char *value, void *unused)
>  			keepalive = -1;
>  	} else if (!strcmp("uploadpack.allowfilter", var)) {
>  		allow_filter = git_config_bool(var, value);
> +	} else if (!parse_config_key(var, "uploadpack", &sub, &sub_len, &key) &&
> +		   key && !strcmp(key, "allow")) {
> +		if (sub && skip_prefix(sub, "filter.", &sub) && sub_len >= 7) {
> +			struct string_list_item *item;
> +			char *spec;
> +
> +			/*
> +			 * normalize the filter, and chomp off '.allow' from the
> +			 * end
> +			 */
> +			spec = xstrdup_tolower(sub);
> +			spec[sub_len - 7] = 0;
> +
> +			item = string_list_insert(&allowed_filters, spec);
> +			item->util = (void *) (intptr_t) git_config_bool(var, value);
> +
> +			free(spec);
> +		} else if (!strcmp("uploadpack.filter.allow", var)) {
> +			allow_filter_fallback = git_config_bool(var, value);
> +		}
>  	} else if (!strcmp("uploadpack.allowrefinwant", var)) {
>  		allow_ref_in_want = git_config_bool(var, value);
>  	} else if (!strcmp("uploadpack.allowsidebandall", var)) {
> @@ -1308,6 +1373,8 @@ static void process_args(struct packet_reader *request,
>  		if (allow_filter && skip_prefix(arg, "filter ", &p)) {
>  			list_objects_filter_die_if_populated(&filter_options);
>  			parse_list_objects_filter(&filter_options, p);
> +			die_if_using_banned_filter(&data->writer,
> +						   &filter_options);
>  			continue;
>  		}
>  


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re*: [RFC PATCH 0/2] upload-pack.c: limit allowed filter choices
  2020-03-18 10:18       ` [RFC PATCH 0/2] upload-pack.c: limit allowed filter choices Jeff King
@ 2020-03-18 18:26         ` " Junio C Hamano
  2020-03-19 17:03           ` Jeff King
  2020-03-18 21:28         ` Taylor Blau
  1 sibling, 1 reply; 62+ messages in thread
From: Junio C Hamano @ 2020-03-18 18:26 UTC (permalink / raw)
  To: Jeff King; +Cc: Taylor Blau, git, christian.couder, james

Jeff King <peff@peff.net> writes:

>>   - 'uploadpack.filter.<filter>.allow' -> specifying whether or not each
>>     filter kind is allowed or not. (Originally this was given as 'git
>>     config uploadpack.filter=blob:none.allow true', but this '=' is
>>     ambiguous to configuration given over '-c', which itself uses an '='
>>     to separate keys from values.)
>
> One thing that's a little ugly here is the embedded dot in the
> subsection (i.e., "filter.<filter>"). It makes it look like a four-level
> key, but really there is no such thing in Git.  But everything else we
> tried was even uglier.

I think this gives us the best arrangement, by forcing up front all
the configuration handlers for the "<subcommand>.*.<token>" namespace,
current and future, to use a "<group-prefix>" before the unbounded set
of user-specifiable values that affects the <subcommand> (which is
"uploadpack").

So far, the configuration variables we have supported that need to be
grouped by an unbounded set of user-specifiable values happened to
have only one sensible such set for each <subcommand>, so we could
get away without such a <group-prefix>, and it was perfectly OK to
have, say, "guitool.<name>.cmd".

Syntactically, the convention of always ending such a <group-prefix>
with a dot "." may look unusual, or, once readers' eyes get used to
it, may look natural.  One tiny sad thing about it is that it cannot
be mechanically enforced, but that is minor.

> We could do "uploadpackfilter.allow" and "uploadpackfilter.<filter>.allow",
> but that's both ugly _and_ separates these options from the rest of
> uploadpack.*.

There is an existing instance of a configuration that affects
<subcommand> that uses a different word after <subcommand>, which is
credentialCache.ignoreSIGHUP, and I tend to agree that it is ugly.

By the way, I noticed the following while I was studying the current
practice, so before I forget...

-- >8 --
Subject: [PATCH] separate tar.* config to its own source file

Even though there is only one configuration variable in the
namespace, it is not quite right to have tar.umask described
among the variables for the tag.* namespace.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 Documentation/config.txt     | 2 ++
 Documentation/config/tag.txt | 7 -------
 Documentation/config/tar.txt | 6 ++++++
 3 files changed, 8 insertions(+), 7 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 08b13ba72b..2450589a0e 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -447,6 +447,8 @@ include::config/submodule.txt[]
 
 include::config/tag.txt[]
 
+include::config/tar.txt[]
+
 include::config/trace2.txt[]
 
 include::config/transfer.txt[]
diff --git a/Documentation/config/tag.txt b/Documentation/config/tag.txt
index 6d9110d84c..5062a057ff 100644
--- a/Documentation/config/tag.txt
+++ b/Documentation/config/tag.txt
@@ -15,10 +15,3 @@ tag.gpgSign::
 	convenient to use an agent to avoid typing your gpg passphrase
 	several times. Note that this option doesn't affect tag signing
 	behavior enabled by "-u <keyid>" or "--local-user=<keyid>" options.
-
-tar.umask::
-	This variable can be used to restrict the permission bits of
-	tar archive entries.  The default is 0002, which turns off the
-	world write bit.  The special value "user" indicates that the
-	archiving user's umask will be used instead.  See umask(2) and
-	linkgit:git-archive[1].
diff --git a/Documentation/config/tar.txt b/Documentation/config/tar.txt
new file mode 100644
index 0000000000..de8ff48ea9
--- /dev/null
+++ b/Documentation/config/tar.txt
@@ -0,0 +1,6 @@
+tar.umask::
+	This variable can be used to restrict the permission bits of
+	tar archive entries.  The default is 0002, which turns off the
+	world write bit.  The special value "user" indicates that the
+	archiving user's umask will be used instead.  See umask(2) and
+	linkgit:git-archive[1].

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [RFC PATCH 1/2] list_objects_filter_options: introduce 'list_object_filter_config_name'
  2020-03-18 10:03           ` Jeff King
@ 2020-03-18 19:40             ` Junio C Hamano
  2020-03-18 22:38             ` Eric Sunshine
  1 sibling, 0 replies; 62+ messages in thread
From: Junio C Hamano @ 2020-03-18 19:40 UTC (permalink / raw)
  To: Jeff King; +Cc: Eric Sunshine, Taylor Blau, Git List, Christian Couder, james

Jeff King <peff@peff.net> writes:

> But the cost is that we have to enumerate the set of values that are
> defined but not handled here (LOFC__COUNT, for instance, isn't a real
> enum value but rather a placeholder to let other code know how many
> filter types there are).
>
> So...I dunno. Worth it as a general technique?

"This is a possible value in the enum we are switching on, so I
write a case arm for it, but we do nothing for it here" is OK, but
if it were "we do nothing for it here or anywhere" (i.e. the maximum
enum value defined as a sentinel), the resulting code would be ugly.

I am not sure if the tradeoff is good enough to justify forcing such
ugliness on readers' eyes just to squelch the -Wswitch warnings.

So, I dunno.

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [RFC PATCH 1/2] list_objects_filter_options: introduce 'list_object_filter_config_name'
  2020-03-17 20:53         ` Eric Sunshine
  2020-03-18 10:03           ` Jeff King
@ 2020-03-18 21:05           ` Taylor Blau
  1 sibling, 0 replies; 62+ messages in thread
From: Taylor Blau @ 2020-03-18 21:05 UTC (permalink / raw)
  To: Eric Sunshine; +Cc: Taylor Blau, Git List, Christian Couder, Jeff King, james

Hi Eric,

On Tue, Mar 17, 2020 at 04:53:44PM -0400, Eric Sunshine wrote:
> On Tue, Mar 17, 2020 at 4:40 PM Taylor Blau <me@ttaylorr.com> wrote:
> > In a subsequent commit, we will add configuration options that are
> > specific to each kind of object filter, in which case it is handy to
> > have a function that translates between 'enum
> > list_objects_filter_choice' and an appropriate configuration-friendly
> > string.
> > ---
>
> Missing sign-off (but perhaps that's intentional since this is RFC).

Yes, the missing sign-off (in this patch as well as 2/2) is intentional,
since this is an RFC. Sorry for not calling this out more clearly in my
cover.

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [RFC PATCH 2/2] upload-pack.c: allow banning certain object filter(s)
  2020-03-17 21:11         ` Eric Sunshine
@ 2020-03-18 21:18           ` Taylor Blau
  0 siblings, 0 replies; 62+ messages in thread
From: Taylor Blau @ 2020-03-18 21:18 UTC (permalink / raw)
  To: Eric Sunshine; +Cc: Taylor Blau, Git List, Christian Couder, Jeff King, james

On Tue, Mar 17, 2020 at 05:11:42PM -0400, Eric Sunshine wrote:
> On Tue, Mar 17, 2020 at 4:40 PM Taylor Blau <me@ttaylorr.com> wrote:
> > NB: this introduces an unfortunate possibility that attempt to write the
> > ERR sideband will cause a SIGPIPE. This can be prevented by some of
> > SZEDZER's previous work, but it is silenced in 't' for now.
>
> s/SZEDZER/SZEDER/

Thank you for pointing this out, and my apologies to SZEDER.

> > diff --git a/t/t5616-partial-clone.sh b/t/t5616-partial-clone.sh
> > @@ -235,6 +235,29 @@ test_expect_success 'implicitly construct combine: filter with repeated flags' '
> > +test_expect_success 'upload-pack fails banned object filters' '
> > +       # Ensure that configuration keys are normalized by capitalizing
> > +       # "blob:None" below:
> > +       test_config -C srv.bare uploadpack.filter.blob:None.allow false &&
>
> I found the wording of the comment more confusing than clarifying.
> Perhaps rewriting it like this could help:
>
>     Test case-insensitivity by intentional use of "blob:None" rather than
>     "blob:none".
>
> or something.

Sure, your suggestion does clarify things. I'll apply it to my fork.

> > +       test_must_fail ok=sigpipe git clone --no-checkout --filter.blob:none \
> > +               "file://$(pwd)/srv.bare" pc3
> > +'

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [RFC PATCH 2/2] upload-pack.c: allow banning certain object filter(s)
  2020-03-18 11:18         ` Philip Oakley
@ 2020-03-18 21:20           ` Taylor Blau
  0 siblings, 0 replies; 62+ messages in thread
From: Taylor Blau @ 2020-03-18 21:20 UTC (permalink / raw)
  To: Philip Oakley; +Cc: Taylor Blau, git, christian.couder, peff, james

Hi Philip,

On Wed, Mar 18, 2020 at 11:18:16AM +0000, Philip Oakley wrote:
> Hi
> On 17/03/2020 20:39, Taylor Blau wrote:
> > Git clients may ask the server for a partial set of objects, where the
> > set of objects being requested is refined by one or more object filters.
> > Server administrators can configure 'git upload-pack' to allow or ban
> > these filters by setting the 'uploadpack.allowFilter' variable to
> > 'true' or 'false', respectively.
> >
> > However, administrators using bitmaps may wish to allow certain kinds of
> > object filters, but ban others. Specifically, they may wish to allow
> > object filters that can be optimized by the use of bitmaps, while
> > rejecting other object filters which aren't and represent a perceived
> > performance degradation (as well as an increased load factor on the
> > server).
> >
> > Allow configuring 'git upload-pack' to support object filters on a
> > case-by-case basis by introducing a new configuration variable and
> > section:
> >
> >   - 'uploadpack.filter.allow'
> >
> >   - 'uploadpack.filter.<kind>.allow'
> >
> > where '<kind>' may be one of 'blob:none', 'blob:limit', 'tree:depth',
> > and so on. The additional '.' between 'filter' and '<kind>' is part of
> > the sub-section.
> >
> > Setting the second configuration variable for any valid value of
> > '<kind>' explicitly allows or disallows restricting that kind of object
> > filter.
> >
> > If a client requests the object filter <kind> and the respective
> > configuration value is not set, 'git upload-pack' will default to the
> > value of 'uploadpack.filter.allow', which itself defaults to 'true' to
> > maintain backwards compatibility. Note that this differs from
> > 'uploadpack.allowfilter', which controls whether or not the 'filter'
> > capability is advertised.
> >
> > NB: this introduces an unfortunate possibility that attempt to write the
> > ERR sideband will cause a SIGPIPE. This can be prevented by some of
> > SZEDZER's previous work, but it is silenced in 't' for now.
> > ---
> >  Documentation/config/uploadpack.txt | 12 ++++++
> >  t/t5616-partial-clone.sh            | 23 ++++++++++
> >  upload-pack.c                       | 67 +++++++++++++++++++++++++++++
> >  3 files changed, 102 insertions(+)
> >
> > diff --git a/Documentation/config/uploadpack.txt b/Documentation/config/uploadpack.txt
> > index ed1c835695..6213bd619c 100644
> > --- a/Documentation/config/uploadpack.txt
> > +++ b/Documentation/config/uploadpack.txt
> > @@ -57,6 +57,18 @@ uploadpack.allowFilter::
> >  	If this option is set, `upload-pack` will support partial
> >  	clone and partial fetch object filtering.
> >
> > +uploadpack.filter.allow::
> > +	Provides a default value for unspecified object filters (see: the
> > +	below configuration variable).
> > +	Defaults to `true`.
> > +
> > +uploadpack.filter.<filter>.allow::
> > +	Explicitly allow or ban the object filter corresponding to `<filter>`,
> > +	where `<filter>` may be one of: `blob:none`, `blob:limit`, `tree:depth`,
> > +	`sparse:oid`, or `combine`. If using combined filters, both `combine`
> > +	and all of the nested filter kinds must be allowed.
>
> Doesn't the man page at least need the part from the commit message
> ("The additional '.' between 'filter' and '<kind>' is part of the
> sub-section."), as it's not a common mechanism (other comments
> notwithstanding)?

Thanks, you're certainly right. I wrote the man pages back when the
configuration was spelled:

  $ git config uploadpack.filter=blob:none.allow true

But now that there is the extra '.', it's worth calling out here, too.
I'll make sure that this is addressed based on the outcome of the
discussion below when these patches hit non-RFC status.

> Philip

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [RFC PATCH 0/2] upload-pack.c: limit allowed filter choices
  2020-03-18 10:18       ` [RFC PATCH 0/2] upload-pack.c: limit allowed filter choices Jeff King
  2020-03-18 18:26         ` Re*: " Junio C Hamano
@ 2020-03-18 21:28         ` Taylor Blau
  2020-03-18 22:41           ` Junio C Hamano
  2020-03-19 17:09           ` Jeff King
  1 sibling, 2 replies; 62+ messages in thread
From: Taylor Blau @ 2020-03-18 21:28 UTC (permalink / raw)
  To: Jeff King; +Cc: Taylor Blau, git, christian.couder, james

On Wed, Mar 18, 2020 at 06:18:25AM -0400, Jeff King wrote:
> On Tue, Mar 17, 2020 at 02:39:05PM -0600, Taylor Blau wrote:
>
> > Hi Christian,
> >
> > Of course, I would be happy to send along our patches. They are included
> > in the series below, and correspond roughly to what we are running at
> > GitHub. (For us, there have been a few more clean-ups and additional
> > patches, but I squashed them into 2/2 below).
> >
> > The approach is roughly that we have:
> >
> >   - 'uploadpack.filter.allow' -> specifying the default for unspecified
> >     filter choices, itself defaulting to true in order to maintain
> >     backwards compatibility, and
> >
> >   - 'uploadpack.filter.<filter>.allow' -> specifying whether or not each
> >     filter kind is allowed or not. (Originally this was given as 'git
> >     config uploadpack.filter=blob:none.allow true', but this '=' is
> >     ambiguous to configuration given over '-c', which itself uses an '='
> >     to separate keys from values.)
>
> One thing that's a little ugly here is the embedded dot in the
> subsection (i.e., "filter.<filter>"). It makes it look like a four-level
> key, but really there is no such thing in Git.  But everything else we
> tried was even uglier.
>
> I think we want to declare a real subsection for each filter and not
> just "uploadpack.filter.<filter>". That gives us room to expand to other
> config options besides "allow" later on if we need to.
>
> We don't want to claim "uploadpack.allow" and "uploadpack.<filter>.allow";
> that's too generic.
>
> Likewise "filter.allow" is too generic.

I wonder. A multi-valued 'uploadpack.filter.allow' *might* solve some
problems, but the more I turn it over in my head, the more I think
it's creating more headaches for us than it's removing.

On the pro side, we could have this be a multi-valued key where each
value is the name of an allowed filter. I guess that would solve the
subsection-naming problem, but it is admittedly generic, not to
mention the fact that we already *use* this key to specify a default
value for missing 'uploadpack.filter.<filter>.allow' values. For that
reason, it seems like a non-starter to me.

> We could do "uploadpackfilter.allow" and "uploadpackfilter.<filter>.allow",
> but that's both ugly _and_ separates these options from the rest of
> uploadpack.*.
>
> We could use a character besides ".", which would reduce confusion. But
> what? Using colon is kind of ugly, because it's already syntactically
> significant in filter names, and you get:
>
>   uploadpack.filter:blob:none.allow
>
> We tried equals, like:
>
>   uploadpack.filter=blob:none.allow
>
> but there's an interesting side effect. Doing:
>
>   git -c uploadpack.filter=blob:none.allow=true upload-pack ...
>
> doesn't work, because the "-c" parser ends the key at the first "=". As
> it should, because otherwise we'd get confused by an "=" in a value.
> This is a failing of the "-c" syntax; it can't represent values with
> "=". Fixing it would be awkward, and I've never seen it come up in
> practice outside of this (you _could_ have a branch with a funny name
> and try to do "git -c branch.my=funny=branch.remote=origin" or
> something, but the lack of bug reports suggests nobody is that
> masochistic).

Thanks for adding some more detail to this decision.

Another thing we could do is simply use a different character. It
may be a little odd, but it keeps the filter-related variables in their
own sub-section, allowing us to add more configuration sub-variables in
the future. I guess that calling it something like:

  $ git config uploadpack.filter@blob:none.allow <true|false>

is a little strange (i.e., why '@' over '#'? There's certainly no
precedent here that I can think of...), but maybe it is slightly
less weird than a pseudo-four-level key.

> So...maybe the extra dot is the least bad thing?
>
> > I noted in the second patch that there is the unfortunate possibility of
> > encountering a SIGPIPE when trying to write the ERR sideband back to a
> > client who requested a non-supported filter. Peff and I have had some
> > discussion off-list about resurrecting SZEDZER's work which makes room
> > in the buffer by reading one packet back from the client when the server
> > encounters a SIGPIPE. It is for this reason that I am marking the series
> > as 'RFC'.
>
> For reference, the patch I was thinking of was this:
>
>   https://lore.kernel.org/git/20190830121005.GI8571@szeder.dev/

Thanks.

> -Peff

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [RFC PATCH 1/2] list_objects_filter_options: introduce 'list_object_filter_config_name'
  2020-03-18 10:03           ` Jeff King
  2020-03-18 19:40             ` Junio C Hamano
@ 2020-03-18 22:38             ` Eric Sunshine
  2020-03-19 17:15               ` Jeff King
  1 sibling, 1 reply; 62+ messages in thread
From: Eric Sunshine @ 2020-03-18 22:38 UTC (permalink / raw)
  To: Jeff King; +Cc: Taylor Blau, Git List, Christian Couder, james

On Wed, Mar 18, 2020 at 6:03 AM Jeff King <peff@peff.net> wrote:
> On Tue, Mar 17, 2020 at 04:53:44PM -0400, Eric Sunshine wrote:
> > > +       case LOFC__COUNT:
> > > +               /*
> > > +                * Include these to catch all enumerated values, but
> > > +                * break to treat them as a bug. Any new values of this
> > > +                * enum will cause a compiler error, as desired.
> > > +                */
> >
> > In general, people will see a warning, not an error, unless they
> > specifically use -Werror (or such) to turn the warning into an error,
> > so this statement is misleading. Also, while some compilers may
> > complain, others may not. So, although the comment claims that we will
> > notice an unhandled enum constant at compile-time, that isn't
> > necessarily the case.
>
> Yes, but that's the best we can do, isn't it?

To be clear, I wasn't questioning the code structure at all. I was
specifically referring to the comment talking about "error" when it
should say "warning" or "possible warning".

Moreover, normally, we use comments to highlight something in the code
which is not obvious or straightforward, so I was questioning whether
this comment is even helpful since the code seems reasonably clear.
And...

> But we can't just remove the default case. Even though enums don't
> generally take on other values, it's legal for them to do so. So we do
> want to make sure we BUG() in that instance.
>
> This is awkward to solve in the general case[1]. But because we're
> returning in each case arm here, it's easy to just put the BUG() after
> the switch. Anything that didn't return is unhandled, and we get the
> best of both: -Wswitch warnings when we need to add a new filter type,
> and a BUG() in the off chance that we see an unexpected value.
>
> So...I dunno. Worth it as a general technique?

...if this is or will become an idiom we want in this codebase, then
it would be silly to write an explanatory comment every place we
employ it. Instead, a document such as CodingGuidelines would likely
be a better fit for such knowledge.

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [RFC PATCH 0/2] upload-pack.c: limit allowed filter choices
  2020-03-18 21:28         ` Taylor Blau
@ 2020-03-18 22:41           ` Junio C Hamano
  2020-03-19 17:10             ` Jeff King
  2020-03-19 17:09           ` Jeff King
  1 sibling, 1 reply; 62+ messages in thread
From: Junio C Hamano @ 2020-03-18 22:41 UTC (permalink / raw)
  To: Taylor Blau; +Cc: Jeff King, git, christian.couder, james

Taylor Blau <me@ttaylorr.com> writes:

>> We tried equals, like:
>>
>>   uploadpack.filter=blob:none.allow
>>
>> but there's an interesting side effect. Doing:
>>
>>   git -c uploadpack.filter=blob:none.allow=true upload-pack ...
>>
>> doesn't work, because the "-c" parser ends the key at the first "=". As
>> it should, because otherwise we'd get confused by an "=" in a value.
>> This is a failing of the "-c" syntax; it can't represent values with
>> "=". 

s/value/key/ I presume ;-)

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: Re*: [RFC PATCH 0/2] upload-pack.c: limit allowed filter choices
  2020-03-18 18:26         ` Re*: " Junio C Hamano
@ 2020-03-19 17:03           ` Jeff King
  0 siblings, 0 replies; 62+ messages in thread
From: Jeff King @ 2020-03-19 17:03 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Taylor Blau, git, christian.couder, james

On Wed, Mar 18, 2020 at 11:26:00AM -0700, Junio C Hamano wrote:

> > One thing that's a little ugly here is the embedded dot in the
> > subsection (i.e., "filter.<filter>"). It makes it look like a four-level
> > key, but really there is no such thing in Git.  But everything else we
> > tried was even uglier.
> 
> I think this gives us the best arrangement by upfront forcing all
> the configuration handers for "<subcommand>.*.<token>" namespace,
> current and future, to use "<group-prefix>" before the unbounded set
> of user-specifiable values that affects the <subcommand> (which is
> "uploadpack").
> 
> So far, the configuration variables that needs to be grouped by
> unbounded set of user-specifiable values we supported happened to
> have only one sensible such set for each <subcommand>, so we could
> get away without such <group-prefix> and it was perfectly OK to
> have, say "guitool.<name>.cmd".

Yeah. We have often just split those out into a separate hierarchy
from <subcommand>, e.g. tar.<format>.command, which is really feeding
the git-archive command. We could do that here, too, but I wasn't sure
of a good name (this really is upload-pack specific, though I guess in
theory other commands could grow a need to look at or restrict "remote
object filters").

> Syntactically, the convention to always end such <group-prefix> with
> a dot "." may look unusual, or once readers' eyes get used to them,
> may look natural.  One tiny sad thing about it is that it cannot be
> mechanically enforced, but that is minor.

The biggest downside to implying a 4-level key is that the
case-sensitivity rules may be different. I.e., you can say:

  UploadPack.filter.blob:none.Allow

but not:

  UploadPack.Filter.blob:none.Allow

Since "filter" is part of the subsection, it's case sensitive. We could
match it case-insensitively in upload_pack_config(), but it would crop
up in other laces (e.g., "git config --unset" would still care).
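
For example (made-up values, just to illustrate the asymmetry):

  # same key as uploadpack.filter.blob:none.allow; section and key
  # names fold case
  $ git config UploadPack.filter.blob:none.Allow false

  # a different key entirely; the subsection "Filter.blob:none" is
  # compared case-sensitively
  $ git config UploadPack.Filter.blob:none.Allow false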

> > We could do "uploadpackfilter.allow" and "uploadpackfilter.<filter>.allow",
> > but that's both ugly _and_ separates these options from the rest of
> > uploadpack.*.
> 
> There is an existing instance of a configuration that affects
> <subcommand> that uses a different word after <subcommand>, which is
> credentialCache.ignoreSIGHUP, and I tend to agree that it is ugly.

I don't think that's what's going on here. It affects only the
credential-cache subcommand, but we avoid hyphens in our key names.
So it really is the subcommand; it's just that the name is a superset of
another command name. :)

> By the way, I noticed the following while I was studying the current
> practice, so before I forget...
> 
> -- >8 --
> Subject: [PATCH] separate tar.* config to its own source file
> 
> Even though there is only one configuration variable in the
> namespace, it is not quite right to have tar.umask described
> among the variables for tag.* namespace.

Yeah, this is definitely an improvement. But I was surprised that
tar.<format>.* wasn't covered here. It is documented in git-archive.
Probably worth moving or duplicating it in git-config.

-Peff

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [RFC PATCH 0/2] upload-pack.c: limit allowed filter choices
  2020-03-18 21:28         ` Taylor Blau
  2020-03-18 22:41           ` Junio C Hamano
@ 2020-03-19 17:09           ` Jeff King
  1 sibling, 0 replies; 62+ messages in thread
From: Jeff King @ 2020-03-19 17:09 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, christian.couder, james

On Wed, Mar 18, 2020 at 03:28:18PM -0600, Taylor Blau wrote:

> I wonder. A multi-valued 'uploadpack.filter.allow' *might* solve some
> problems, but the more I turn it over in my head, the more that I think
> that it's creating more headaches for us than it's removing.

IMHO we should avoid multi-valued keys when there's not a compelling
reason. There are a lot of corner cases they introduce (e.g., there's no
standard way to override them rather than adding to the list).
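
To make that concrete with a throwaway key (nothing to do with this
series, just an illustration of how multi-valued keys accumulate):

  $ git config --add example.list one
  $ git config --add example.list two
  $ git config --get-all example.list
  one
  two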

> Another thing we could do is just simply use a different character. It
> may be a little odd, but it keeps the filter-related variables in their
> own sub-section, allowing us to add more configuration sub-variables in
> the future. I guess that calling it something like:
> 
>   $ git config uploadpack.filter@blob:none.allow <true|false>
> 
> is a little strange (i.e., why '@' over '#'? There's certainly no
> precedent here that I can think of...), but maybe it is slightly
> less-weird than a pseudo-four-level key.

I guess it's subjective, but the "@" just feels odd because it's
associated with so many other meanings. Likewise "#".

-Peff

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [RFC PATCH 0/2] upload-pack.c: limit allowed filter choices
  2020-03-18 22:41           ` Junio C Hamano
@ 2020-03-19 17:10             ` Jeff King
  0 siblings, 0 replies; 62+ messages in thread
From: Jeff King @ 2020-03-19 17:10 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Taylor Blau, git, christian.couder, james

On Wed, Mar 18, 2020 at 03:41:51PM -0700, Junio C Hamano wrote:

> Taylor Blau <me@ttaylorr.com> writes:
> 
> >> We tried equals, like:
> >>
> >>   uploadpack.filter=blob:none.allow
> >>
> >> but there's an interesting side effect. Doing:
> >>
> >>   git -c uploadpack.filter=blob:none.allow=true upload-pack ...
> >>
> >> doesn't work, because the "-c" parser ends the key at the first "=". As
> >> it should, because otherwise we'd get confused by an "=" in a value.
> >> This is a failing of the "-c" syntax; it can't represent values with
> >> "=". 
> 
> s/value/key/ I presume ;-)

Yes. :)

-Peff

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [RFC PATCH 1/2] list_objects_filter_options: introduce 'list_object_filter_config_name'
  2020-03-18 22:38             ` Eric Sunshine
@ 2020-03-19 17:15               ` Jeff King
  0 siblings, 0 replies; 62+ messages in thread
From: Jeff King @ 2020-03-19 17:15 UTC (permalink / raw)
  To: Eric Sunshine; +Cc: Taylor Blau, Git List, Christian Couder, james

On Wed, Mar 18, 2020 at 06:38:49PM -0400, Eric Sunshine wrote:

> To be clear, I wasn't questioning the code structure at all. I was
> specifically referring to the comment talking about "error" when it
> should say "warning" or "possible warning".
> 
> Moreover, normally, we use comments to highlight something in the code
> which is not obvious or straightforward, so I was questioning whether
> this comment is even helpful since the code seems reasonably clear.
> And...

OK, I agree with all that. :)

> ...if this is or will become an idiom we want in this codebase, then
> it would be silly to write an explanatory comment every place we
> employ it. Instead, a document such as CodingGuidelines would likely
> be a better fit for such knowledge.

Yeah, that makes sense. If we do use this technique, though, we'll have
to explicitly list "case" lines for the enum values which are meant to
break out to the BUG(). And there it _is_ worth commenting on "yes, we
know about this value but it is not handled here because...". Which is
what you asked for in your original message. :)

Something like:

  switch (c) {
  case LOFC_BLOB_NONE:
	return "blob:none":
  ..etc...
  case LOFC__COUNT:
	/* not a real filter type; just a marker for counting the number */
	break;
  case LOFC_DISABLED:
	/* we have no name for "no filter at all" */
	break;
  }
  BUG(...);

-Peff

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [TOPIC 3/17] Obliterate
  2020-03-16 12:55     ` Konstantin Tokarev
@ 2020-03-26 22:27       ` Damien Robert
  0 siblings, 0 replies; 62+ messages in thread
From: Damien Robert @ 2020-03-26 22:27 UTC (permalink / raw)
  To: Konstantin Tokarev; +Cc: James Ramsay, git

From Konstantin Tokarev, Mon 16 Mar 2020 at 15:55:39 (+0300) :
> > My situation: coworkers push big files by mistake, I don't want to rewrite
> > history because they are not too well versed with git, but I want to keep
> > *my* repo clean.

> Wouldn't it be better to prevent *them* from such mistakes, e.g. by using
> pre-push review system like Gerrit?

So my coworkers are mathematicians, and not all of them are comfortable
with a DVCS, and I already have a hard time convincing them to use git
rather than Dropbox. I take it upon myself to make git as easy as
possible for them to use (by telling them to push to a different branch
when there is a conflict, so that I can handle the conflict myself).

I don't think Gerrit is a solution there...

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [TOPIC 3/17] Obliterate
  2020-03-16 16:32     ` Elijah Newren
@ 2020-03-26 22:30       ` Damien Robert
  0 siblings, 0 replies; 62+ messages in thread
From: Damien Robert @ 2020-03-26 22:30 UTC (permalink / raw)
  To: Elijah Newren; +Cc: James Ramsay, Git Mailing List

From Elijah Newren, Mon 16 Mar 2020 at 09:32:45 (-0700) :
> > I am interested in more details on how to handle this using replace.

> This comment at the conference was in reference to how people rewrite
> history to remove the big blobs, but then run into issues because
> there are many places outside of git that reference old commit IDs
> (wiki pages, old emails, issues/tickets, etc.) that are now broken.
[...]

Interesting, thanks for the context!

> As for using replace refs to attempt to alleviate problems without
> rewriting history, that's an even bigger can of worms and it doesn't
> solve clone/fetch/gc/fsck nor the many other places you highlighted in
> your email.

I agree, but one part that makes it easier in my context is that I don't
need to distribute the replaced references; I just need them for myself.
This alleviates a lot of problems already, and as I outlined in my email,
the combination of replace refs and sparse checkout is almost enough.

-- 
Damien Robert
http://www.normalesup.org/~robert/pro

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [TOPIC 3/17] Obliterate
  2020-03-16 18:32     ` Phillip Susi
@ 2020-03-26 22:37       ` Damien Robert
  0 siblings, 0 replies; 62+ messages in thread
From: Damien Robert @ 2020-03-26 22:37 UTC (permalink / raw)
  To: Phillip Susi; +Cc: James Ramsay, git

From Phillip Susi, Mon 16 Mar 2020 at 14:32:46 (-0400) :
> Instead of replacing the blob with an empty file, why not replace the
> tree that references it with one that does not?  That way you won't have
> the file in your checkout at all, and the index won't list it so status
> won't show it as changed.

That's an interesting solution, but it only works if the tree itself does
not change.

- This is the case when these large objects were uploaded and then removed
  in another commit [*].
  But in this case I won't (usually) check this tree out again anyway, so
  the error due to the missing blob is not a big problem.

- When my coauthors use git as a Dropbox alternative and upload big PDF
  files (rather than only source code or .tex files), they also want to
  keep them there. If they had uploaded these files to the special
  literature/ folder I had made for them, I could just replace the
  literature/ tree with an empty one (a rough sketch of this is at the
  end of this message), but they managed to upload them to the root
  folder, which unfortunately is subject to change, and it would be
  annoying to make a new replace ref each time.

[*] usually by myself, for instance when people commit spurious
TeX-generated files like e.g. *.synctex, despite my .gitignore (I don't
know how they manage this...)
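
For concreteness, here is the kind of one-off replacement I have in
mind for the literature/ case; it is a rough sketch, and the tree OID
is a made-up placeholder:

  # Build an empty tree object and use it to replace the subtree that
  # holds the big PDFs. Nothing is rewritten, and the replacement stays
  # local unless the refs/replace/* refs are pushed.
  $ empty=$(git mktree </dev/null)
  $ git replace <oid-of-literature-tree> "$empty"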

-- 
Damien Robert
http://www.normalesup.org/~robert/pro

^ permalink raw reply	[flat|nested] 62+ messages in thread

end of thread, back to index

Thread overview: 62+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-03-12  3:55 Notes from Git Contributor Summit, Los Angeles (April 5, 2020) James Ramsay
2020-03-12  3:56 ` [TOPIC 1/17] Reftable James Ramsay
2020-03-12  3:56 ` [TOPIC 2/17] Hooks in the future James Ramsay
2020-03-12 14:16   ` Emily Shaffer
2020-03-13 17:56     ` Junio C Hamano
2020-03-12  3:57 ` [TOPIC 3/17] Obliterate James Ramsay
2020-03-12 18:06   ` Konstantin Ryabitsev
2020-03-15 22:19   ` Damien Robert
2020-03-16 12:55     ` Konstantin Tokarev
2020-03-26 22:27       ` Damien Robert
2020-03-16 16:32     ` Elijah Newren
2020-03-26 22:30       ` Damien Robert
2020-03-16 18:32     ` Phillip Susi
2020-03-26 22:37       ` Damien Robert
2020-03-16 20:01     ` Philip Oakley
2020-03-12  3:58 ` [TOPIC 4/17] Sparse checkout James Ramsay
2020-03-12  4:00 ` [TOPIC 5/17] Partial Clone James Ramsay
2020-03-17  7:38   ` Allowing only blob filtering was: " Christian Couder
2020-03-17 20:39     ` [RFC PATCH 0/2] upload-pack.c: limit allowed filter choices Taylor Blau
2020-03-17 20:39       ` [RFC PATCH 1/2] list_objects_filter_options: introduce 'list_object_filter_config_name' Taylor Blau
2020-03-17 20:53         ` Eric Sunshine
2020-03-18 10:03           ` Jeff King
2020-03-18 19:40             ` Junio C Hamano
2020-03-18 22:38             ` Eric Sunshine
2020-03-19 17:15               ` Jeff King
2020-03-18 21:05           ` Taylor Blau
2020-03-17 20:39       ` [RFC PATCH 2/2] upload-pack.c: allow banning certain object filter(s) Taylor Blau
2020-03-17 21:11         ` Eric Sunshine
2020-03-18 21:18           ` Taylor Blau
2020-03-18 11:18         ` Philip Oakley
2020-03-18 21:20           ` Taylor Blau
2020-03-18 10:18       ` [RFC PATCH 0/2] upload-pack.c: limit allowed filter choices Jeff King
2020-03-18 18:26         ` Re*: " Junio C Hamano
2020-03-19 17:03           ` Jeff King
2020-03-18 21:28         ` Taylor Blau
2020-03-18 22:41           ` Junio C Hamano
2020-03-19 17:10             ` Jeff King
2020-03-19 17:09           ` Jeff King
2020-03-12  4:01 ` [TOPIC 6/17] GC strategies James Ramsay
2020-03-12  4:02 ` [TOPIC 7/17] Background operations/maintenance James Ramsay
2020-03-12  4:03 ` [TOPIC 8/17] Push performance James Ramsay
2020-03-12  4:04 ` [TOPIC 9/17] Obsolescence markers and evolve James Ramsay
2020-03-12  4:05 ` [TOPIC 10/17] Expel ‘git shell’? James Ramsay
2020-03-12  4:07 ` [TOPIC 11/17] GPL enforcement James Ramsay
2020-03-12  4:08 ` [TOPIC 12/17] Test harness improvements James Ramsay
2020-03-12  4:09 ` [TOPIC 13/17] Cross implementation test suite James Ramsay
2020-03-12  4:11 ` [TOPIC 14/17] Aspects of merge-ort: cool, or crimes against humanity? James Ramsay
2020-03-12  4:13 ` [TOPIC 15/17] Reachability checks James Ramsay
2020-03-12  4:14 ` [TOPIC 16/17] “I want a reviewer” James Ramsay
2020-03-12 13:31   ` Emily Shaffer
2020-03-12 17:31     ` Konstantin Ryabitsev
2020-03-12 17:42       ` Jonathan Nieder
2020-03-12 18:00         ` Konstantin Ryabitsev
2020-03-17  0:43     ` Philippe Blain
2020-03-13 21:25   ` Eric Wong
2020-03-14 17:27     ` Jeff King
2020-03-15  0:36       ` inbox indexing wishlist [was: [TOPIC 16/17] “I want a reviewer”] Eric Wong
2020-03-12  4:16 ` [TOPIC 17/17] Security James Ramsay
2020-03-12 14:38 ` Notes from Git Contributor Summit, Los Angeles (April 5, 2020) Derrick Stolee
2020-03-13 20:47 ` Jeff King
2020-03-15 18:42 ` Jakub Narebski
2020-03-16 19:31   ` Jeff King
