* Git in Outreachy December 2019? @ 2019-08-27 5:17 Jeff King 2019-08-31 7:58 ` Christian Couder ` (2 more replies) 0 siblings, 3 replies; 63+ messages in thread From: Jeff King @ 2019-08-27 5:17 UTC (permalink / raw) To: git; +Cc: Olga Telezhnaya, Christian Couder, Elijah Newren, Thomas Gummerer Do we have interested mentors for the next round of Outreachy? The deadline for Git to apply to the program is September 5th. The deadline for mentors to have submitted project descriptions is September 24th. Intern applications would start on October 1st. If there are mentors who want to participate, I can handle the project application and can start asking around for funding. -Peff ^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: Git in Outreachy December 2019? 2019-08-27 5:17 Git in Outreachy December 2019? Jeff King @ 2019-08-31 7:58 ` Christian Couder 2019-08-31 19:44 ` Olga Telezhnaya 2019-09-04 19:41 ` Jeff King 2019-09-13 20:03 ` Jonathan Tan 2 siblings, 1 reply; 63+ messages in thread From: Christian Couder @ 2019-08-31 7:58 UTC (permalink / raw) To: Jeff King; +Cc: git, Olga Telezhnaya, Elijah Newren, Thomas Gummerer On Tue, Aug 27, 2019 at 7:17 AM Jeff King <peff@peff.net> wrote: > > Do we have interested mentors for the next round of Outreachy? I am interested to co-mentor. > The deadline for Git to apply to the program is September 5th. The > deadline for mentors to have submitted project descriptions is September > 24th. Intern applications would start on October 1st. > > If there are mentors who want to participate, I can handle the project > application and can start asking around for funding. That would be really nice, thank you! ^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: Git in Outreachy December 2019? 2019-08-31 7:58 ` Christian Couder @ 2019-08-31 19:44 ` Olga Telezhnaya 0 siblings, 0 replies; 63+ messages in thread From: Olga Telezhnaya @ 2019-08-31 19:44 UTC (permalink / raw) To: Christian Couder; +Cc: Jeff King, git, Elijah Newren, Thomas Gummerer сб, 31 авг. 2019 г. в 10:58, Christian Couder <christian.couder@gmail.com>: > > On Tue, Aug 27, 2019 at 7:17 AM Jeff King <peff@peff.net> wrote: > > > > Do we have interested mentors for the next round of Outreachy? > > I am interested to co-mentor. I am not ready to give the answer right now, but I will definitely think about the project proposals, I want to help with that part. By the way, we can take some projects (with description) from this summer round of GSoC. 3 of 4 projects were not selected. > > > The deadline for Git to apply to the program is September 5th. The > > deadline for mentors to have submitted project descriptions is September > > 24th. Intern applications would start on October 1st. > > > > If there are mentors who want to participate, I can handle the project > > application and can start asking around for funding. > > That would be really nice, thank you! Thank you! ^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: Git in Outreachy December 2019? 2019-08-27 5:17 Git in Outreachy December 2019? Jeff King 2019-08-31 7:58 ` Christian Couder @ 2019-09-04 19:41 ` Jeff King 2019-09-05 7:24 ` Christian Couder ` (3 more replies) 2019-09-13 20:03 ` Jonathan Tan 2 siblings, 4 replies; 63+ messages in thread From: Jeff King @ 2019-09-04 19:41 UTC (permalink / raw) To: git Cc: Olga Telezhnaya, Christian Couder, Elijah Newren, Thomas Gummerer, Matheus Tavares Bernardino On Tue, Aug 27, 2019 at 01:17:57AM -0400, Jeff King wrote: > Do we have interested mentors for the next round of Outreachy? > > The deadline for Git to apply to the program is September 5th. The > deadline for mentors to have submitted project descriptions is September > 24th. Intern applications would start on October 1st. > > If there are mentors who want to participate, I can handle the project > application and can start asking around for funding. Funding is still up in the air, but in the meantime I've tentatively signed us up (we have until the 24th to have the funding committed). Next we need mentors to submit projects, as well as first-time contribution micro-projects. Project proposals can be made here: https://www.outreachy.org/communities/cfp/git/ If you want to know more about the program, there's a mentor FAQ here: https://www.outreachy.org/mentor/mentor-faq/ or just ask in this thread. The project page has a section to point people in the right direction for first-time contributions. I've left it blank for now, but I think it makes sense to point one (or both) of: - https://git-scm.com/docs/MyFirstContribution - https://matheustavares.gitlab.io/posts/first-steps-contributing-to-git as well as a list of micro-projects (or at least instructions on how to find #leftoverbits, though we'd definitely have to step up our labeling, as I do not recall having seen one for a while). -Peff ^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: Git in Outreachy December 2019? 2019-09-04 19:41 ` Jeff King @ 2019-09-05 7:24 ` Christian Couder 2019-09-05 19:39 ` Emily Shaffer ` (2 subsequent siblings) 3 siblings, 0 replies; 63+ messages in thread From: Christian Couder @ 2019-09-05 7:24 UTC (permalink / raw) To: Jeff King Cc: git, Olga Telezhnaya, Elijah Newren, Thomas Gummerer, Matheus Tavares Bernardino On Wed, Sep 4, 2019 at 9:41 PM Jeff King <peff@peff.net> wrote: > > Funding is still up in the air, but in the meantime I've tentatively > signed us up (we have until the 24th to have the funding committed). > Next we need mentors to submit projects, as well as first-time > contribution micro-projects. Great! Thanks for signing up and for all the information! ^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: Git in Outreachy December 2019? 2019-09-04 19:41 ` Jeff King 2019-09-05 7:24 ` Christian Couder @ 2019-09-05 19:39 ` Emily Shaffer 2019-09-06 11:55 ` Carlo Arenas 2019-09-07 6:36 ` Jeff King 2019-09-08 14:56 ` Pratyush Yadav 2019-09-23 18:07 ` SZEDER Gábor 3 siblings, 2 replies; 63+ messages in thread From: Emily Shaffer @ 2019-09-05 19:39 UTC (permalink / raw) To: Jeff King Cc: git, Olga Telezhnaya, Christian Couder, Elijah Newren, Thomas Gummerer, Matheus Tavares Bernardino On Wed, Sep 04, 2019 at 03:41:15PM -0400, Jeff King wrote: > On Tue, Aug 27, 2019 at 01:17:57AM -0400, Jeff King wrote: > > > Do we have interested mentors for the next round of Outreachy? > > > > The deadline for Git to apply to the program is September 5th. The > > deadline for mentors to have submitted project descriptions is September > > 24th. Intern applications would start on October 1st. > > > > If there are mentors who want to participate, I can handle the project > > application and can start asking around for funding. > > Funding is still up in the air, but in the meantime I've tentatively > signed us up (we have until the 24th to have the funding committed). > Next we need mentors to submit projects, as well as first-time > contribution micro-projects. I'm interested to mentor too, but I haven't done anything like this - official mentoring, intern hosting, anything - so I will need to learn :) - Emily ^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: Git in Outreachy December 2019? 2019-09-05 19:39 ` Emily Shaffer @ 2019-09-06 11:55 ` Carlo Arenas 2019-09-07 6:39 ` Jeff King 2019-09-07 6:36 ` Jeff King 1 sibling, 1 reply; 63+ messages in thread From: Carlo Arenas @ 2019-09-06 11:55 UTC (permalink / raw) To: Emily Shaffer Cc: Jeff King, git, Olga Telezhnaya, Christian Couder, Elijah Newren, Thomas Gummerer, Matheus Tavares Bernardino I'm interested to mentor/help too, but I am definitely not a (some people would even argue against "reliable") contributor but I might be better than nothing and could pass my "lessons learned" along, so hopefully next contributors are less of a pain to deal with than I am Carlo ^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: Git in Outreachy December 2019? 2019-09-06 11:55 ` Carlo Arenas @ 2019-09-07 6:39 ` Jeff King 2019-09-07 10:13 ` Carlo Arenas 0 siblings, 1 reply; 63+ messages in thread From: Jeff King @ 2019-09-07 6:39 UTC (permalink / raw) To: Carlo Arenas Cc: Emily Shaffer, git, Olga Telezhnaya, Christian Couder, Elijah Newren, Thomas Gummerer, Matheus Tavares Bernardino On Fri, Sep 06, 2019 at 04:55:46AM -0700, Carlo Arenas wrote: > I'm interested to mentor/help too, but I am definitely not a (some > people would even argue against "reliable") contributor but I might be > better than nothing and could pass my "lessons learned" along, so > hopefully next contributors are less of a pain to deal with than I am I just wrote a response to Emily, but I think a lot of it applies to you, as well. In particular, I think both of you are a bit newer to the project than most of the other people who have mentored in the past. In some ways that may be a good thing, as it likely makes it easier to see things from the intern's perspective. :) But it may also introduce some complications if you're working in an area of the code you're not familiar with. Co-mentoring may help with that. -Peff ^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: Git in Outreachy December 2019? 2019-09-07 6:39 ` Jeff King @ 2019-09-07 10:13 ` Carlo Arenas 0 siblings, 0 replies; 63+ messages in thread From: Carlo Arenas @ 2019-09-07 10:13 UTC (permalink / raw) To: Jeff King Cc: Emily Shaffer, git, Olga Telezhnaya, Christian Couder, Elijah Newren, Thomas Gummerer, Matheus Tavares Bernardino On Fri, Sep 6, 2019 at 11:40 PM Jeff King <peff@peff.net> wrote: > > I just wrote a response to Emily, but I think a lot of it applies to > you, as well. With the exception of course that Emily can definitely write better code than my attempted hacks > In particular, I think both of you are a bit newer to the project than > most of the other people who have mentored in the past. In some ways > that may be a good thing, as it likely makes it easier to see things > from the intern's perspective. :) But it may also introduce some > complications if you're working in an area of the code you're not > familiar with. Co-mentoring may help with that. agree, and glad to help any way I can, specially if I can help offset some of the boring work required that would free resources somewhere else; then again by replying to Emily, I didn't meant we were both in the same level both technically and resource wise (after all she wrote the documentation on how to contribute and I only found a silly bug on her code that will prevent really obsolete systems to compile and didn't even wrote the patch) Carlo ^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: Git in Outreachy December 2019? 2019-09-05 19:39 ` Emily Shaffer 2019-09-06 11:55 ` Carlo Arenas @ 2019-09-07 6:36 ` Jeff King 1 sibling, 0 replies; 63+ messages in thread From: Jeff King @ 2019-09-07 6:36 UTC (permalink / raw) To: Emily Shaffer Cc: git, Olga Telezhnaya, Christian Couder, Elijah Newren, Thomas Gummerer, Matheus Tavares Bernardino On Thu, Sep 05, 2019 at 12:39:59PM -0700, Emily Shaffer wrote: > > Funding is still up in the air, but in the meantime I've tentatively > > signed us up (we have until the 24th to have the funding committed). > > Next we need mentors to submit projects, as well as first-time > > contribution micro-projects. > > I'm interested to mentor too, but I haven't done anything like this - > official mentoring, intern hosting, anything - so I will need to learn > :) Great! I think a big challenge is coming up with an appropriately scoped project for the intern. You can take a look at previous Outreachy and GSoC proposals[1] to get the general idea, and then think about some area you feel comfortable working in. You're welcome to send ideas or drafts to the list to get feedback. If you're feeling overwhelmed (or even if you're not), it might make sense to try to partner with somebody else as a co-mentor, especially if they've participated in the past. (It might also be that you can offer to co-mentor with somebody else's project proposal). -Peff [1] Older materials can be found at https://git.github.io/, though these days the proposed projects go directly into the Outreachy system. ^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: Git in Outreachy December 2019? 2019-09-04 19:41 ` Jeff King 2019-09-05 7:24 ` Christian Couder 2019-09-05 19:39 ` Emily Shaffer @ 2019-09-08 14:56 ` Pratyush Yadav 2019-09-09 17:00 ` Jeff King 2019-09-23 18:07 ` SZEDER Gábor 3 siblings, 1 reply; 63+ messages in thread From: Pratyush Yadav @ 2019-09-08 14:56 UTC (permalink / raw) To: Jeff King; +Cc: git Hi Jeff, On 04/09/19 03:41PM, Jeff King wrote: [snip] > The project page has a section to point people in the right direction > for first-time contributions. I've left it blank for now, but I think it > makes sense to point one (or both) of: > > - https://git-scm.com/docs/MyFirstContribution > > - https://matheustavares.gitlab.io/posts/first-steps-contributing-to-git > > as well as a list of micro-projects (or at least instructions on how to > find #leftoverbits, though we'd definitely have to step up our labeling, > as I do not recall having seen one for a while). I'd like to put out a proposal regarding first contributions and micro projects. I have a small list of small isolated features and bug fixes that _I think_ git-gui would benefit with. And other people using it can probably add their pet peeves and issues as well. My question is, are these something new contributors should try to work on as an introduction to the community? Since most of these features and fixes are small and isolated, they should be pretty easy to work on. And I think people generally find UI apps a little easier to work on. But I'll play the devil's advocate on my proposal and point out some problems/flaws: - Git-gui is written in Tcl, and git in C (and other languages too, but not Tcl). That means while people do get a feel of the community and general workflow, they don't necessarily get a feel of the actual git internal codebase. - Since I don't see a git-gui related project worth being into the Outreachy program, it essentially means they will likely not work on anything related to their project. - Git-gui is essentially a wrapper on top of git, so people won't get exposure to the git internals. I'd like to hear your and the rest of the community's thoughts about this proposal, and whether it will be a good idea or not. If people do like this idea, I can do a write up on "things to fix in git-gui" that people can add to (and they get a chance to call me stupid for even thinking feature X is a good idea ;)). -- Regards, Pratyush Yadav ^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: Git in Outreachy December 2019? 2019-09-08 14:56 ` Pratyush Yadav @ 2019-09-09 17:00 ` Jeff King 0 siblings, 0 replies; 63+ messages in thread From: Jeff King @ 2019-09-09 17:00 UTC (permalink / raw) To: Pratyush Yadav; +Cc: git On Sun, Sep 08, 2019 at 08:26:10PM +0530, Pratyush Yadav wrote: > I'd like to put out a proposal regarding first contributions and micro > projects. > > I have a small list of small isolated features and bug fixes that > _I think_ git-gui would benefit with. And other people using it can > probably add their pet peeves and issues as well. My question is, are > these something new contributors should try to work on as an > introduction to the community? Since most of these features and fixes > are small and isolated, they should be pretty easy to work on. And I > think people generally find UI apps a little easier to work on. > > But I'll play the devil's advocate on my proposal and point out some > problems/flaws: > - Git-gui is written in Tcl, and git in C (and other languages too, but > not Tcl). That means while people do get a feel of the community and > general workflow, they don't necessarily get a feel of the actual git > internal codebase. > - Since I don't see a git-gui related project worth being into the > Outreachy program, it essentially means they will likely not work on > anything related to their project. > - Git-gui is essentially a wrapper on top of git, so people won't get > exposure to the git internals. > > I'd like to hear your and the rest of the community's thoughts about > this proposal, and whether it will be a good idea or not. Right, I came up with similar devil's advocate arguments. :) I'm not totally opposed, because part of the point of these microprojects just getting people familiar with interacting with the community and submitting a patch. They're not always in the same area the intern intends to work, just because there's not always a trivial problem to be solved there. So we do look at it mostly as a "can you do this basic test" test, and not necessarily as a prelude to the project. But it would be nice if it were at least in the same _language_ that the ultimate project will be done in. Because we're evaluating the applicant's ability to write code in that language, too. So I dunno. I am on the fence. -Peff ^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: Git in Outreachy December 2019? 2019-09-04 19:41 ` Jeff King ` (2 preceding siblings ...) 2019-09-08 14:56 ` Pratyush Yadav @ 2019-09-23 18:07 ` SZEDER Gábor 2019-09-26 9:47 ` SZEDER Gábor 2019-09-26 11:42 ` Johannes Schindelin 3 siblings, 2 replies; 63+ messages in thread From: SZEDER Gábor @ 2019-09-23 18:07 UTC (permalink / raw) To: Jeff King Cc: git, Olga Telezhnaya, Christian Couder, Elijah Newren, Thomas Gummerer, Matheus Tavares Bernardino On Wed, Sep 04, 2019 at 03:41:15PM -0400, Jeff King wrote: > The project page has a section to point people in the right direction > for first-time contributions. I've left it blank for now, but I think it > makes sense to point one (or both) of: > > - https://git-scm.com/docs/MyFirstContribution > > - https://matheustavares.gitlab.io/posts/first-steps-contributing-to-git > > as well as a list of micro-projects (or at least instructions on how to > find #leftoverbits, though we'd definitely have to step up our labeling, > as I do not recall having seen one for a while). And we should make sure that all microprojects are indeed micro in size. Matheus sent v8 of a 10 patch series in July that started out as a microproject back in February... Here is one more idea for microprojects: Find a group of related preprocessor constants and turn them into an enum. Also find where those constants are stored in variables and in structs and passed around as function parameters, and change the type of those variables, fields and parameters to the new enum. ^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: Git in Outreachy December 2019? 2019-09-23 18:07 ` SZEDER Gábor @ 2019-09-26 9:47 ` SZEDER Gábor 2019-09-26 19:32 ` Johannes Schindelin 2019-09-26 11:42 ` Johannes Schindelin 1 sibling, 1 reply; 63+ messages in thread From: SZEDER Gábor @ 2019-09-26 9:47 UTC (permalink / raw) To: Jeff King Cc: git, Olga Telezhnaya, Christian Couder, Elijah Newren, Thomas Gummerer, Matheus Tavares Bernardino On Mon, Sep 23, 2019 at 08:07:09PM +0200, SZEDER Gábor wrote: > Here is one more idea for microprojects: > > Find a group of related preprocessor constants and turn them into an > enum. Also find where those constants are stored in variables and > in structs and passed around as function parameters, and change the > type of those variables, fields and parameters to the new enum. Peff thought elsewhere in the thread that this is a good idea, so I wanted to try out how this microproject would work in practice, and to add a commit that we can show as a good example, and therefore set out to convert 'cache_entry->ce_flags' to an enum... and will soon send out a RFH patch, because I hit a snag, and am not sure what to do about it :) Anyway: - Finding a group of related preprocessor constants is trivial: the common prefixes and vertically aligned values of related constants stand out in output of 'git grep #define'. Converting them to an enum is fairly trivial as well. - Converting various integer types of variables, struct fields, and function parameters to the new enum is... well, I wouldn't say that it's hard, but it's tedious (but 'ce_flags' with about 20 related constants is perhaps the biggest we have). OTOH, it's all fairly mechanical, and doesn't require any understanding of Git internals. Overall I think that this is indeed a micro-sized microproject, but... - The bad news is that I expect that reviewing the variable, etc. type conversions will be just as tedious, and it's quite easy to miss a conversion or three, so I'm afraid that several rerolls will be necessary. ^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: Git in Outreachy December 2019? 2019-09-26 9:47 ` SZEDER Gábor @ 2019-09-26 19:32 ` Johannes Schindelin 2019-09-26 21:54 ` SZEDER Gábor 0 siblings, 1 reply; 63+ messages in thread From: Johannes Schindelin @ 2019-09-26 19:32 UTC (permalink / raw) To: SZEDER Gábor Cc: Jeff King, git, Olga Telezhnaya, Christian Couder, Elijah Newren, Thomas Gummerer, Matheus Tavares Bernardino [-- Attachment #1: Type: text/plain, Size: 1876 bytes --] Hi, On Thu, 26 Sep 2019, SZEDER Gábor wrote: > On Mon, Sep 23, 2019 at 08:07:09PM +0200, SZEDER Gábor wrote: > > Here is one more idea for microprojects: > > > > Find a group of related preprocessor constants and turn them into an > > enum. Also find where those constants are stored in variables and > > in structs and passed around as function parameters, and change the > > type of those variables, fields and parameters to the new enum. > > Peff thought elsewhere in the thread that this is a good idea, so I > wanted to try out how this microproject would work in practice, and to > add a commit that we can show as a good example, and therefore set out > to convert 'cache_entry->ce_flags' to an enum... and will soon send > out a RFH patch, because I hit a snag, and am not sure what to do > about it :) Anyway: > > - Finding a group of related preprocessor constants is trivial: the > common prefixes and vertically aligned values of related constants > stand out in output of 'git grep #define'. Converting them to an > enum is fairly trivial as well. > > - Converting various integer types of variables, struct fields, and > function parameters to the new enum is... well, I wouldn't say > that it's hard, but it's tedious (but 'ce_flags' with about 20 > related constants is perhaps the biggest we have). OTOH, it's all > fairly mechanical, and doesn't require any understanding of Git > internals. Overall I think that this is indeed a micro-sized > microproject, but... > > - The bad news is that I expect that reviewing the variable, etc. > type conversions will be just as tedious, and it's quite easy to > miss a conversion or three, so I'm afraid that several rerolls > will be necessary. I thought Coccinelle could help with that? Ciao, Dscho ^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: Git in Outreachy December 2019? 2019-09-26 19:32 ` Johannes Schindelin @ 2019-09-26 21:54 ` SZEDER Gábor 0 siblings, 0 replies; 63+ messages in thread From: SZEDER Gábor @ 2019-09-26 21:54 UTC (permalink / raw) To: Johannes Schindelin Cc: Jeff King, git, Olga Telezhnaya, Christian Couder, Elijah Newren, Thomas Gummerer, Matheus Tavares Bernardino On Thu, Sep 26, 2019 at 09:32:35PM +0200, Johannes Schindelin wrote: > Hi, > > On Thu, 26 Sep 2019, SZEDER Gábor wrote: > > > On Mon, Sep 23, 2019 at 08:07:09PM +0200, SZEDER Gábor wrote: > > > Here is one more idea for microprojects: > > > > > > Find a group of related preprocessor constants and turn them into an > > > enum. Also find where those constants are stored in variables and > > > in structs and passed around as function parameters, and change the > > > type of those variables, fields and parameters to the new enum. > > > > Peff thought elsewhere in the thread that this is a good idea, so I > > wanted to try out how this microproject would work in practice, and to > > add a commit that we can show as a good example, and therefore set out > > to convert 'cache_entry->ce_flags' to an enum... and will soon send > > out a RFH patch, because I hit a snag, and am not sure what to do > > about it :) Anyway: > > > > - Finding a group of related preprocessor constants is trivial: the > > common prefixes and vertically aligned values of related constants > > stand out in output of 'git grep #define'. Converting them to an > > enum is fairly trivial as well. > > > > - Converting various integer types of variables, struct fields, and > > function parameters to the new enum is... well, I wouldn't say > > that it's hard, but it's tedious (but 'ce_flags' with about 20 > > related constants is perhaps the biggest we have). OTOH, it's all > > fairly mechanical, and doesn't require any understanding of Git > > internals. Overall I think that this is indeed a micro-sized > > microproject, but... > > > > - The bad news is that I expect that reviewing the variable, etc. > > type conversions will be just as tedious, and it's quite easy to > > miss a conversion or three, so I'm afraid that several rerolls > > will be necessary. > > I thought Coccinelle could help with that? Maybe it could, I don't know. I mean, it should be able to e.g. change the data type of the function parameter 'param' if the function body contains a param & <any named value of the new enum or their combinarion> statement, or similar statements with operators '|=', '&=', or '='. I have no idea how to tell Coccinelle to do that. ^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: Git in Outreachy December 2019? 2019-09-23 18:07 ` SZEDER Gábor 2019-09-26 9:47 ` SZEDER Gábor @ 2019-09-26 11:42 ` Johannes Schindelin 1 sibling, 0 replies; 63+ messages in thread From: Johannes Schindelin @ 2019-09-26 11:42 UTC (permalink / raw) To: SZEDER Gábor Cc: Jeff King, git, Olga Telezhnaya, Christian Couder, Elijah Newren, Thomas Gummerer, Matheus Tavares Bernardino [-- Attachment #1: Type: text/plain, Size: 1346 bytes --] Hi, On Mon, 23 Sep 2019, SZEDER Gábor wrote: > On Wed, Sep 04, 2019 at 03:41:15PM -0400, Jeff King wrote: > > The project page has a section to point people in the right direction > > for first-time contributions. I've left it blank for now, but I think it > > makes sense to point one (or both) of: > > > > - https://git-scm.com/docs/MyFirstContribution > > > > - https://matheustavares.gitlab.io/posts/first-steps-contributing-to-git > > > > as well as a list of micro-projects (or at least instructions on how to > > find #leftoverbits, though we'd definitely have to step up our labeling, > > as I do not recall having seen one for a while). > > And we should make sure that all microprojects are indeed micro in > size. Matheus sent v8 of a 10 patch series in July that started out > as a microproject back in February... Indeed. > Here is one more idea for microprojects: > > Find a group of related preprocessor constants and turn them into an > enum. Also find where those constants are stored in variables and > in structs and passed around as function parameters, and change the > type of those variables, fields and parameters to the new enum. I agree that this is a good suggestion, and turned this #leftoverbits into https://github.com/gitgitgadget/git/issues/357. Ciao, Dscho ^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: Git in Outreachy December 2019? 2019-08-27 5:17 Git in Outreachy December 2019? Jeff King 2019-08-31 7:58 ` Christian Couder 2019-09-04 19:41 ` Jeff King @ 2019-09-13 20:03 ` Jonathan Tan 2019-09-13 20:51 ` Jeff King 2 siblings, 1 reply; 63+ messages in thread From: Jonathan Tan @ 2019-09-13 20:03 UTC (permalink / raw) To: peff; +Cc: git, Jonathan Tan > Do we have interested mentors for the next round of Outreachy? > > The deadline for Git to apply to the program is September 5th. The > deadline for mentors to have submitted project descriptions is September > 24th. Intern applications would start on October 1st. > > If there are mentors who want to participate, I can handle the project > application and can start asking around for funding. I probably should have replied earlier, but if Git has applied to the program, feel free to include me as a mentor. There was a discussion about mentors/co-mentors possibly working in a part of a codebase that they are not familiar with [1] - firstly, I think that's possible and even likely for most of us. :-) If any question arises, maybe it would be sufficient for the mentors to just help formulate the question (or pose the question themselves) to the mailing list. If "[Outreachy]" appears in the subject, I'll make it a higher priority for myself to answer those. [1] https://public-inbox.org/git/20190907063958.GB28860@sigill.intra.peff.net/ ^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: Git in Outreachy December 2019? 2019-09-13 20:03 ` Jonathan Tan @ 2019-09-13 20:51 ` Jeff King 2019-09-16 18:42 ` Emily Shaffer 2019-09-20 17:04 ` Jonathan Tan 0 siblings, 2 replies; 63+ messages in thread From: Jeff King @ 2019-09-13 20:51 UTC (permalink / raw) To: Jonathan Tan; +Cc: git On Fri, Sep 13, 2019 at 01:03:17PM -0700, Jonathan Tan wrote: > > Do we have interested mentors for the next round of Outreachy? > > > > The deadline for Git to apply to the program is September 5th. The > > deadline for mentors to have submitted project descriptions is September > > 24th. Intern applications would start on October 1st. > > > > If there are mentors who want to participate, I can handle the project > > application and can start asking around for funding. > > I probably should have replied earlier, but if Git has applied to the > program, feel free to include me as a mentor. Great! See my followup here: https://public-inbox.org/git/20190904194114.GA31398@sigill.intra.peff.net/ Prospective mentors need to sign up on that site, and should propose a project they'd be willing to mentor. > There was a discussion about mentors/co-mentors possibly working in a > part of a codebase that they are not familiar with [1] - firstly, I > think that's possible and even likely for most of us. :-) If any > question arises, maybe it would be sufficient for the mentors to just > help formulate the question (or pose the question themselves) to the > mailing list. If "[Outreachy]" appears in the subject, I'll make it a > higher priority for myself to answer those. I do think it's OK for mentors to not be intimately familiar with the part of the code that is being touched, as long as the project is simple enough that they can pick up the technical details easily as-needed. A lot of what mentors will help mentees with is the overall process (both Git-specific parts, but also more general development issues). But I think the proposed projects do need to be feasible. I'm happy to discuss possible projects if anybody has an idea but isn't sure how to develop it into a proposal. -Peff ^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: Git in Outreachy December 2019? 2019-09-13 20:51 ` Jeff King @ 2019-09-16 18:42 ` Emily Shaffer 2019-09-16 21:33 ` Eric Wong ` (2 more replies) 2019-09-20 17:04 ` Jonathan Tan 1 sibling, 3 replies; 63+ messages in thread From: Emily Shaffer @ 2019-09-16 18:42 UTC (permalink / raw) To: Jeff King; +Cc: Jonathan Tan, git On Fri, Sep 13, 2019 at 04:51:49PM -0400, Jeff King wrote: > On Fri, Sep 13, 2019 at 01:03:17PM -0700, Jonathan Tan wrote: > > > > Do we have interested mentors for the next round of Outreachy? > > > > > > The deadline for Git to apply to the program is September 5th. The > > > deadline for mentors to have submitted project descriptions is September > > > 24th. Intern applications would start on October 1st. > > > > > > If there are mentors who want to participate, I can handle the project > > > application and can start asking around for funding. > > > > I probably should have replied earlier, but if Git has applied to the > > program, feel free to include me as a mentor. > > Great! See my followup here: > > https://public-inbox.org/git/20190904194114.GA31398@sigill.intra.peff.net/ > > Prospective mentors need to sign up on that site, and should propose a > project they'd be willing to mentor. > > > There was a discussion about mentors/co-mentors possibly working in a > > part of a codebase that they are not familiar with [1] - firstly, I > > think that's possible and even likely for most of us. :-) If any > > question arises, maybe it would be sufficient for the mentors to just > > help formulate the question (or pose the question themselves) to the > > mailing list. If "[Outreachy]" appears in the subject, I'll make it a > > higher priority for myself to answer those. > > I do think it's OK for mentors to not be intimately familiar with the > part of the code that is being touched, as long as the project is simple > enough that they can pick up the technical details easily as-needed. A > lot of what mentors will help mentees with is the overall process (both > Git-specific parts, but also more general development issues). But I > think the proposed projects do need to be feasible. > > I'm happy to discuss possible projects if anybody has an idea but isn't > sure how to develop it into a proposal. Hi Peff, Jonathan Tan, Jonathan Nieder, Josh Steadmon and I met on Friday to talk about projects and we came up with a trimmed list; not sure what more needs to be done to make them into fully-fledged proposals. For starter microprojects, we came up with: - cleanup a test script (although we need to identify particularly which ones and what counts as "clean") - moving doc from documentation/technical/api-* to comments in the appropriate header instead - teach a command which currently handles its own argv how to use parse-options instead - add a user.timezone option which Git can use if present rather than checking system local time For the longer projects, we came up with a few more: - find places where we can pass in the_repository as arg instead of using global the_repository - convert sh/pl commands to C, including: - git-submodules.sh - git-bisect.sh - rebase --preserve-merges - add -i (We were afraid this might be too boring, though.) - reduce/eliminate use of fetch_if_missing global - create a better difftool/mergetool for format of choice (this one ends up existing outside of the Git codebase, but still may be pretty adjacent and big impact) - training wheels/intro/tutorial mode? (We thought it may be useful to make available a very basic "I just want to make a single PR and not learn graph theory" mode, toggled by config switch) - "did you mean?" for common use cases, e.g. commit with a dirty working tree and no staged files - either offer a hint or offer a prompt to continue ("Stage changed files and commit? [Y/n]") - new `git partial-clone` command to interactively set a filter, configure other partial clone settings - add progress bars in various situations - add a TUI to deal more easily with the mailing list. Jonathan Tan has a strong idea of what this TUI would do... This one would also end up external but adjacent to the Git codebase. - try and make progress towards running many tests from a single test file in parallel - maybe this is too big, I'm not sure if we know how many of our tests are order-dependent within a file for now... It might make sense to only focus on scoping the ones we feel most interested in. We came up with a pretty big list because we had some other programs in mind, so I suppose it's not necessary to develop all of them for this program. - Emily ^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: Git in Outreachy December 2019? 2019-09-16 18:42 ` Emily Shaffer @ 2019-09-16 21:33 ` Eric Wong 2019-09-16 21:44 ` SZEDER Gábor 2019-09-17 11:23 ` Johannes Schindelin 2 siblings, 0 replies; 63+ messages in thread From: Eric Wong @ 2019-09-16 21:33 UTC (permalink / raw) To: Emily Shaffer; +Cc: Jeff King, Jonathan Tan, git, Konstantin Ryabitsev Emily Shaffer <emilyshaffer@google.com> wrote: > Jonathan Tan, Jonathan Nieder, Josh Steadmon and I met on Friday to talk > about projects and we came up with a trimmed list; not sure what more > needs to be done to make them into fully-fledged proposals. <snip> > For the longer projects, we came up with a few more: <snip> > - add a TUI to deal more easily with the mailing list. Jonathan Tan has > a strong idea of what this TUI would do... This one would also end up > external but adjacent to the Git codebase. AFAIK, Konstantin is/was interested in exploring some of these ideas with Linux Foundation, too (but he's on vacation atm) <snip> > It might make sense to only focus on scoping the ones we feel most > interested in. We came up with a pretty big list because we had some > other programs in mind, so I suppose it's not necessary to develop all > of them for this program. ^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: Git in Outreachy December 2019? 2019-09-16 18:42 ` Emily Shaffer 2019-09-16 21:33 ` Eric Wong @ 2019-09-16 21:44 ` SZEDER Gábor 2019-09-16 23:13 ` Jonathan Nieder 2019-09-17 11:23 ` Johannes Schindelin 2 siblings, 1 reply; 63+ messages in thread From: SZEDER Gábor @ 2019-09-16 21:44 UTC (permalink / raw) To: Emily Shaffer; +Cc: Jeff King, Jonathan Tan, git On Mon, Sep 16, 2019 at 11:42:08AM -0700, Emily Shaffer wrote: > - try and make progress towards running many tests from a single test > file in parallel - maybe this is too big, I'm not sure if we know how > many of our tests are order-dependent within a file for now... Forget it, too many (most?) of them are order-dependent. ^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: Git in Outreachy December 2019? 2019-09-16 21:44 ` SZEDER Gábor @ 2019-09-16 23:13 ` Jonathan Nieder 2019-09-17 0:59 ` Jeff King 0 siblings, 1 reply; 63+ messages in thread From: Jonathan Nieder @ 2019-09-16 23:13 UTC (permalink / raw) To: SZEDER Gábor; +Cc: Emily Shaffer, Jeff King, Jonathan Tan, git SZEDER Gábor wrote: > On Mon, Sep 16, 2019 at 11:42:08AM -0700, Emily Shaffer wrote: >> - try and make progress towards running many tests from a single test >> file in parallel - maybe this is too big, I'm not sure if we know how >> many of our tests are order-dependent within a file for now... > > Forget it, too many (most?) of them are order-dependent. Hm, I remember a conversation about this with Thomas Rast a while ago. It seemed possible at the time. Most tests use "setup" or "set up" in the names of test assertions that are required by later tests. It's very helpful for debugging and maintenance to be able to skip or reorder some tests, so I've been able to rely on this a bit. Of course there's no automated checking in place for that, so there are plenty of test scripts that are exceptions to it. If we introduce a test_setup helper, then we would not have to rely on convention any more. A test_setup test assertion would represent a "barrier" that all later tests in the file can rely on. We could introduce some automated checking that these semantics are respected, and then we get a maintainability improvement in every test script that uses test_setup. (In scripts without any test_setup, treat all test assertions as barriers since they haven't been vetted.) With such automated tests in place, we can then try updating all tests that say "setup" or "set up" to use test_setup and see what fails. Some other tests cannot run in parallel for other reasons (e.g. HTTP tests). These can be declared as such, and then we have the ability to run arbitrary individual tests in parallel. Most of the time in a test run involves multiple test scripts running in parallel already, so this isn't a huge win for the time to complete a normal test run. It helps more with expensive runs like --valgrind. Thanks, Jonathan ^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: Git in Outreachy December 2019? 2019-09-16 23:13 ` Jonathan Nieder @ 2019-09-17 0:59 ` Jeff King 0 siblings, 0 replies; 63+ messages in thread From: Jeff King @ 2019-09-17 0:59 UTC (permalink / raw) To: Jonathan Nieder; +Cc: SZEDER Gábor, Emily Shaffer, Jonathan Tan, git On Mon, Sep 16, 2019 at 04:13:49PM -0700, Jonathan Nieder wrote: > Most tests use "setup" or "set up" in the names of test assertions > that are required by later tests. It's very helpful for debugging and > maintenance to be able to skip or reorder some tests, so I've been > able to rely on this a bit. Of course there's no automated checking > in place for that, so there are plenty of test scripts that are > exceptions to it. > > If we introduce a test_setup helper, then we would not have to rely on > convention any more. A test_setup test assertion would represent a > "barrier" that all later tests in the file can rely on. We could > introduce some automated checking that these semantics are respected, > and then we get a maintainability improvement in every test script > that uses test_setup. (In scripts without any test_setup, treat all > test assertions as barriers since they haven't been vetted.) > > With such automated tests in place, we can then try updating all tests > that say "setup" or "set up" to use test_setup and see what fails. > > Some other tests cannot run in parallel for other reasons (e.g. HTTP > tests). These can be declared as such, and then we have the ability > to run arbitrary individual tests in parallel. This isn't quite the same, but couldn't we get most of the gain just by splitting the tests into more scripts? As you note, we already run those in parallel, so it increases the granularity of our parallelism. And you don't have to worry about skipping tests 1 through 18 if they're in another file; you just don't consider them at all. It also Just Works with things like HTTP, which choose ports under the assumption that the other tests are running simultaneously. It doesn't help with the case that test 1 does setup, and then tests 2, 3, and 4 are logically independent (and some could be skipped or not). If anybody is interested in splitting up scripts, the obvious ones to look at are the ones that take the longest (t9001 takes 55s on my system, though the whole suite runs in only 95s). Of course you can get most of the parallelism benefit by using "prove --state=slow,save", which ends up with lots of short scripts at the end (rather than one slow one chewing one CPU while the rest sit idle). > Most of the time in a test run involves multiple test scripts running > in parallel already, so this isn't a huge win for the time to complete > a normal test run. It helps more with expensive runs like --valgrind. Two easier suggestions than trying to make --valgrind faster: - use SANITIZE=address, which is way cheaper than valgrind (and catches more things!) - use --valgrind-only=17 to run everything else in "fast" mode, but check the test you care about -Peff ^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: Git in Outreachy December 2019? 2019-09-16 18:42 ` Emily Shaffer 2019-09-16 21:33 ` Eric Wong 2019-09-16 21:44 ` SZEDER Gábor @ 2019-09-17 11:23 ` Johannes Schindelin 2019-09-17 12:02 ` SZEDER Gábor ` (3 more replies) 2 siblings, 4 replies; 63+ messages in thread From: Johannes Schindelin @ 2019-09-17 11:23 UTC (permalink / raw) To: Emily Shaffer; +Cc: Jeff King, Jonathan Tan, git Hi Emily, On Mon, 16 Sep 2019, Emily Shaffer wrote: > Jonathan Tan, Jonathan Nieder, Josh Steadmon and I met on Friday to > talk about projects and we came up with a trimmed list; not sure what > more needs to be done to make them into fully-fledged proposals. Thank you for doing this! > For starter microprojects, we came up with: > > - cleanup a test script (although we need to identify particularly > which ones and what counts as "clean") > - moving doc from documentation/technical/api-* to comments in the > appropriate header instead > - teach a command which currently handles its own argv how to use > parse-options instead > - add a user.timezone option which Git can use if present rather than > checking system local time Nice projects, all. There are a couple more ideas on https://github.com/gitgitgadget/git/issues, they could probably use some tagging. > For the longer projects, we came up with a few more: > > - find places where we can pass in the_repository as arg instead of > using global the_repository Good project, if a bit boring ;-) Also, `the_index` is used a lot in `builtin/*.c`, still. > - convert sh/pl commands to C, including: > - git-submodules.sh I am of two minds there. But mostly, I am of the "friends don't let friends use submodules" camp, so I would not even want to mentor for this project: it just makes me shudder too much every time I have to work with/on submodules. Of course, if others are interested, I'd hardly object to turn this into a built-in. > - git-bisect.sh That would be my top recommendation, especially given how much effort Tanushree put in last winter to make this conversion to C so much more achievable than before. > - rebase --preserve-merges No. `rebase -p` is already deprecated in favor of `rebase -r` (which _is_ already built-in). I already have patches lined up to drop that rebase backend. Let's not waste effort on converting this script to C. > - add -i Please see PRs #170-#175 on https://github.com/gitgitgadget/git/pulls, and please do help by adding your review of #170 (which was already submitted as v4: https://public-inbox.org/git/pull.170.v4.git.gitgitgadget@gmail.com/ In other words: this project is well under way. In fact, Git for Windows users enjoy this as an opt-in already. > (We were afraid this might be too boring, though.) Converting shell/Perl scripts into built-in C never looks as much fun as open-ended projects with lots of playing around, but the advantage of the former is that they can be easily structured, offer a lot of opportunity for learning, and they are ultimately more rewarding because the goals are much better defined than many other projects'. Another script that would _really_ benefit from being converted to C: `mergetool`. Especially on Windows, where the over-use of spawned processes really hurts, it is awfully slow a command. To complete the list of sh/pl commands: - git-merge-octopus.sh - git-merge-one-file.sh - git-merge-resolve.sh These seem to be good candidates for conversion to built-ins. Their functionality is well-exercised in the test suite, their complexity is quite manageable, and there is no good reason that these should be scripted. The only slightly challenging aspect might be that `merge-one-file` is actually not a merge strategy, but it is used as helper to be passed to `git merge-index` via the `-o <helper>` option, which makes it slightly awkward to be implemented as a built-in. A better approach would therefore be to special-case this value in `merge-index` and execute the C code directly, without the detour of spawning a built-in. - git-difftool--helper.sh - git-mergetool--lib.sh These would be converted as part of making `mergetool` a built-in, I believe. - git-filter-branch.sh This one is in the process of being deprecated in favor of `git filter-repo` (which is an external tool), so I don't think there would be much use in wasting energy on trying to convert it to C. Especially given that it wants to call shell script snippets all over the place, and those shell script snippets are supposed to run in the same context, which might actually make it completely impossible to convert this to C at all. - git-legacy-stash.sh This will go away once the built-in stash is considered good enough. - git-instaweb.sh - git-request-pull.sh - git-send-email.perl - git-web--browse.sh I don't think that any of these should be converted. They are just too unimportant from a performance point of view, and obscure enough that even their portability issues don't matter too much. As to `send-email` in particular: I would not want anybody to drag in all the dependencies required to convert `send-email` to a built-in to begin with. - git-archimport.perl - git-cvsexportcommit.perl - git-cvsimport.perl - git-cvsserver.perl - git-quiltimport.sh - git-svn.perl These are all connectors of some sort to other version control software. It also feels like they become less and less important, as Git really takes over the world. At some stage, I think, it would make sense to push those scripts out into their own repositories, looking for new maintainers (and if none can be found, then there really is not enough need for them to begin with, and they can be archived). > - reduce/eliminate use of fetch_if_missing global > - create a better difftool/mergetool for format of choice (this one > ends up existing outside of the Git codebase, but still may be pretty > adjacent and big impact) > - training wheels/intro/tutorial mode? (We thought it may be useful to > make available a very basic "I just want to make a single PR and not > learn graph theory" mode, toggled by config switch) > - "did you mean?" for common use cases, e.g. commit with a dirty > working tree and no staged files - either offer a hint or offer a > prompt to continue ("Stage changed files and commit? [Y/n]") > - new `git partial-clone` command to interactively set a filter, > configure other partial clone settings > - add progress bars in various situations > - add a TUI to deal more easily with the mailing list. Jonathan Tan has > a strong idea of what this TUI would do... This one would also end up > external but adjacent to the Git codebase. I don't think that this would be a good project for anybody except people who are already really, really familiar with our mailing list-centric workflow. > - try and make progress towards running many tests from a single test > file in parallel - maybe this is too big, I'm not sure if we know how > many of our tests are order-dependent within a file for now... Another, potentially more rewarding, project would be to modernize our test suite framework, so that it is not based on Unix shell scripting, but on C instead. The fact that it is based on Unix shell scripting not only costs a lot of speed, especially on Windows, it also limits us quite a bit, and I am talking about a lot more than just the awkwardness of having to think about options of BSD vs GNU variants of common command-line tools. For example, many, many, if not all, test cases, spend the majority of their code on setting up specific scenarios. I don't know about you, but personally I have to dive into many of them when things fail (and I _dread_ the numbers 0021, 0025 and 3070, let me tell you) and I really have to say that most of that code is hard to follow and does not make it easy to form a mental model of what the code tries to accomplish. To address this, a while ago Thomas Rast started to use `fast-export`ed commit histories in test scripts (see e.g. `t/t3206/history.export`). I still find that this fails to make it easier for occasional readers to understand the ideas underlying the test cases. Another approach is to document heavily the ideas first, then use code to implement them. For example, t3430 starts with this: [...] Initial setup: -- B -- (first) / \ A - C - D - E - H (master) \ \ / \ F - G (second) \ Conflicting-G [...] test_commit A && git checkout -b first && test_commit B && git checkout master && test_commit C && test_commit D && git merge --no-commit B && test_tick && git commit -m E && git tag -m E E && git checkout -b second C && test_commit F && test_commit G && git checkout master && git merge --no-commit G && test_tick && git commit -m H && git tag -m H H && git checkout A && test_commit conflicting-G G.t [...] While this is _somewhat_ better than having only the code, I am still unhappy about it: this wall of `test_commit` lines interspersed with other commands is very hard to follow. If we were to (slowly) convert our test suite framework to C, we could change that. One idea would be to allow recreating commit history from something that looks like the output of `git log`, or even `git log --graph --oneline`, much like `git mktree` (which really should have been a test helper instead of a Git command, but I digress) takes something that looks like the output of `git ls-tree` and creates a tree object from it. Another thing that would be much easier if we moved more and more parts of the test suite framework to C: we could implement more powerful assertions, a lot more easily. For example, the trace output of a failed `test_i18ngrep` (or `mingw_test_cmp`!!!) could be made a lot more focused on what is going wrong than on cluttering the terminal window with almost useless lines which are tedious to sift through. Likewise, having a framework in C would make it a lot easier to improve debugging, e.g. by making test scripts "resumable" (guarded by an option, it could store a complete state, including a copy of the trash directory, before executing commands, which would allow "going back in time" and calling a failing command with a debugger, or with valgrind, or just seeing whether the command would still fail, i.e. whether the test case is flaky). Also, things like the code tracing via `-x` (which relies on Bash functionality in order to work properly, and which _still_ does not work as intended if your test case evaluates a lazy prereq that has not been evaluated before) could be "done right". In many ways, our current test suite seems to test Git's functionality as much as (core) contributors' abilities to implement test cases in Unix shell script, _correctly_, and maybe also contributors' patience. You could say that it tests for the wrong thing at least half of the time, by design. It might look like a somewhat less important project, but given that we exercise almost 150,000 test cases with every CI build, I think it does make sense to grind our axe for a while, so to say. Therefore, it might be a really good project to modernize our test suite. To take ideas from modern test frameworks such as Jest and try to bring them to C. Which means that new contributors would probably be better suited to work on this project than Git old-timers! And the really neat thing about this project is that it could be done incrementally. > It might make sense to only focus on scoping the ones we feel most > interested in. We came up with a pretty big list because we had some > other programs in mind, so I suppose it's not necessary to develop all > of them for this program. I don't find that list particularly big, to be honest ;-) Ciao, Dscho ^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: Git in Outreachy December 2019? 2019-09-17 11:23 ` Johannes Schindelin @ 2019-09-17 12:02 ` SZEDER Gábor 2019-09-23 12:47 ` Johannes Schindelin 2019-09-17 15:10 ` Christian Couder ` (2 subsequent siblings) 3 siblings, 1 reply; 63+ messages in thread From: SZEDER Gábor @ 2019-09-17 12:02 UTC (permalink / raw) To: Johannes Schindelin; +Cc: Emily Shaffer, Jeff King, Jonathan Tan, git On Tue, Sep 17, 2019 at 01:23:18PM +0200, Johannes Schindelin wrote: > Also, things like the code tracing via `-x` (which relies on Bash > functionality in order to work properly, Not really. > and which _still_ does not work > as intended if your test case evaluates a lazy prereq that has not been > evaluated before I don't see any striking differences between the trace output of a test involving a lazy prereq from Bash or dash: $ cat t9999-test.sh #!/bin/sh test_description='test' . ./test-lib.sh test_lazy_prereq DUMMY_PREREQ ' : lazily evaluating a dummy prereq ' test_expect_success DUMMY_PREREQ 'test' ' true ' test_done $ ./t9999-test.sh -x Initialized empty Git repository in /home/szeder/src/git/t/trash directory.t9999-test/.git/ checking prerequisite: DUMMY_PREREQ mkdir -p "$TRASH_DIRECTORY/prereq-test-dir" && ( cd "$TRASH_DIRECTORY/prereq-test-dir" && : lazily evaluating a dummy prereq ) + mkdir -p /home/szeder/src/git/t/trash directory.t9999-test/prereq-test-dir + cd /home/szeder/src/git/t/trash directory.t9999-test/prereq-test-dir + : lazily evaluating a dummy prereq prerequisite DUMMY_PREREQ ok expecting success of 9999.1 'test': true + true ok 1 - test # passed all 1 test(s) 1..1 $ bash ./t9999-test.sh -x Initialized empty Git repository in /home/szeder/src/git/t/trash directory.t9999-test/.git/ checking prerequisite: DUMMY_PREREQ mkdir -p "$TRASH_DIRECTORY/prereq-test-dir" && ( cd "$TRASH_DIRECTORY/prereq-test-dir" && : lazily evaluating a dummy prereq ) ++ mkdir -p '/home/szeder/src/git/t/trash directory.t9999-test/prereq-test-dir' ++ cd '/home/szeder/src/git/t/trash directory.t9999-test/prereq-test-dir' ++ : lazily evaluating a dummy prereq prerequisite DUMMY_PREREQ ok expecting success of 9999.1 'test': true ++ true ok 1 - test # passed all 1 test(s) 1..1 ^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: Git in Outreachy December 2019? 2019-09-17 12:02 ` SZEDER Gábor @ 2019-09-23 12:47 ` Johannes Schindelin 2019-09-23 16:58 ` SZEDER Gábor 2019-09-23 18:19 ` Jeff King 0 siblings, 2 replies; 63+ messages in thread From: Johannes Schindelin @ 2019-09-23 12:47 UTC (permalink / raw) To: SZEDER Gábor; +Cc: Emily Shaffer, Jeff King, Jonathan Tan, git [-- Attachment #1: Type: text/plain, Size: 883 bytes --] Hi, On Tue, 17 Sep 2019, SZEDER Gábor wrote: > On Tue, Sep 17, 2019 at 01:23:18PM +0200, Johannes Schindelin wrote: > > Also, things like the code tracing via `-x` (which relies on Bash > > functionality in order to work properly, > > Not really. To work properly. What I meant was the trick we need to play with `BASH_XTRACEFD`. > > and which _still_ does not work as intended if your test case > > evaluates a lazy prereq that has not been evaluated before > > I don't see any striking differences between the trace output of a test > involving a lazy prereq from Bash or dash: > > [...] The evaluation of the lazy prereq is indeed not different between Bash or dash. It is nevertheless quite disruptive in the trace of a test script, especially when it is evaluated for a test case that is skipped explicitly via the `--run` option. Ciao, Dscho ^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: Git in Outreachy December 2019? 2019-09-23 12:47 ` Johannes Schindelin @ 2019-09-23 16:58 ` SZEDER Gábor 2019-09-26 11:04 ` Johannes Schindelin 2019-09-23 18:19 ` Jeff King 1 sibling, 1 reply; 63+ messages in thread From: SZEDER Gábor @ 2019-09-23 16:58 UTC (permalink / raw) To: Johannes Schindelin; +Cc: Emily Shaffer, Jeff King, Jonathan Tan, git On Mon, Sep 23, 2019 at 02:47:23PM +0200, Johannes Schindelin wrote: > Hi, > > On Tue, 17 Sep 2019, SZEDER Gábor wrote: > > > On Tue, Sep 17, 2019 at 01:23:18PM +0200, Johannes Schindelin wrote: > > > Also, things like the code tracing via `-x` (which relies on Bash > > > functionality in order to work properly, > > > > Not really. > > To work properly. What I meant was the trick we need to play with > `BASH_XTRACEFD`. I'm still unsure what BASH_XTRACEFD trick you mean. AFAICT we don't play any tricks with it to make '-x' work properly, and indeed '-x' tracing works properly even without BASH_XTRACEFD (and to achive that we did have to play some tricks, but not any with BASH_XTRACEFD; perhaps these tricks are what you meant?). > > > and which _still_ does not work as intended if your test case > > > evaluates a lazy prereq that has not been evaluated before > > > > I don't see any striking differences between the trace output of a test > > involving a lazy prereq from Bash or dash: > > > > [...] > > The evaluation of the lazy prereq is indeed not different between Bash > or dash. It is nevertheless quite disruptive in the trace of a test > script, especially when it is evaluated for a test case that is skipped > explicitly via the `--run` option. But then the actual issue is the unnecessary evaluation of the prereq even when the test framework could know in advance that the test case should be skipped anyway, and the trace from it is a mere side effect, no? ^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: Git in Outreachy December 2019? 2019-09-23 16:58 ` SZEDER Gábor @ 2019-09-26 11:04 ` Johannes Schindelin 2019-09-26 13:28 ` SZEDER Gábor 0 siblings, 1 reply; 63+ messages in thread From: Johannes Schindelin @ 2019-09-26 11:04 UTC (permalink / raw) To: SZEDER Gábor; +Cc: Emily Shaffer, Jeff King, Jonathan Tan, git [-- Attachment #1: Type: text/plain, Size: 2681 bytes --] Hi, On Mon, 23 Sep 2019, SZEDER Gábor wrote: > On Mon, Sep 23, 2019 at 02:47:23PM +0200, Johannes Schindelin wrote: > > > > On Tue, 17 Sep 2019, SZEDER Gábor wrote: > > > > > On Tue, Sep 17, 2019 at 01:23:18PM +0200, Johannes Schindelin wrote: > > > > Also, things like the code tracing via `-x` (which relies on Bash > > > > functionality in order to work properly, > > > > > > Not really. > > > > To work properly. What I meant was the trick we need to play with > > `BASH_XTRACEFD`. > > I'm still unsure what BASH_XTRACEFD trick you mean. AFAICT we don't > play any tricks with it to make '-x' work properly, and indeed '-x' > tracing works properly even without BASH_XTRACEFD (and to achive that > we did have to play some tricks, but not any with BASH_XTRACEFD; > perhaps these tricks are what you meant?). It works okay some of the time. But IIRC `-x -V` requires the `BASH_XTRACEFD` trick. However, I start to feel like I am distracted deliberately from my main argument: that shell scripting is simply an awful language to implement a highly reliable test framework. That we need to rely on Bash, at least some of the time, is just _one_ of the many shortcomings. > > > > and which _still_ does not work as intended if your test case > > > > evaluates a lazy prereq that has not been evaluated before > > > > > > I don't see any striking differences between the trace output of a test > > > involving a lazy prereq from Bash or dash: > > > > > > [...] > > > > The evaluation of the lazy prereq is indeed not different between Bash > > or dash. It is nevertheless quite disruptive in the trace of a test > > script, especially when it is evaluated for a test case that is skipped > > explicitly via the `--run` option. > > But then the actual issue is the unnecessary evaluation of the prereq > even when the test framework could know in advance that the test case > should be skipped anyway, and the trace from it is a mere side effect, > no? I forgot a crucial tidbit: if you run with `-x` and a lazy prereq is evaluated, not only is the output disruptive, the trace is also turned off after the lazy prereq, _before_ the actual test case is run. So you don't see any trace of the actual test case. In any case, I really do not want to see this thread derailed into specifics of Bashisms and bugs in our test framework. My main point should not be diluted: a test framework should be implemented in a language that offers speedy execution of even complicated logic, proper error checking, and higher data types (i.e. other than "everything is a string"). Unix shell script is not it. Ciao, Dscho ^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: Git in Outreachy December 2019? 2019-09-26 11:04 ` Johannes Schindelin @ 2019-09-26 13:28 ` SZEDER Gábor 2019-09-26 19:39 ` Johannes Schindelin 0 siblings, 1 reply; 63+ messages in thread From: SZEDER Gábor @ 2019-09-26 13:28 UTC (permalink / raw) To: Johannes Schindelin; +Cc: Emily Shaffer, Jeff King, Jonathan Tan, git On Thu, Sep 26, 2019 at 01:04:48PM +0200, Johannes Schindelin wrote: > > > > > Also, things like the code tracing via `-x` (which relies on Bash > > > > > functionality in order to work properly, > > > > > > > > Not really. > > > > > > To work properly. What I meant was the trick we need to play with > > > `BASH_XTRACEFD`. > > > > I'm still unsure what BASH_XTRACEFD trick you mean. AFAICT we don't > > play any tricks with it to make '-x' work properly, and indeed '-x' > > tracing works properly even without BASH_XTRACEFD (and to achive that > > we did have to play some tricks, but not any with BASH_XTRACEFD; > > perhaps these tricks are what you meant?). > > It works okay some of the time. As far as I can tell it works all the time. (Well, Ok, with the exception of t1510, but only because back then I couldn't be bothered to figure out how that test script works. But even that script handles '-x' without BASH_XTRACEFD gracefully, and it's safe to run the whole test suite with '-x'.) > But IIRC `-x -V` requires the `BASH_XTRACEFD` trick. No, it doesn't; '-V' should have no effect on the '-x' trace whatsoever. As soon as I fixed running the test suite with '-x' and /bin/sh I added GIT_TEST_OPTS="--verbose-log -x" to my 'config.mak' and to our CI scripts. The default shell running the test suite in our Linux CI jobs is dash, and in our macOS jobs it's an ancient Bash version that doesn't yet have BASH_XTRACEFD. As far as I know they all work as they should. > However, I start to feel like I am distracted deliberately from my main > argument That was definitely not my intention. However, if there are any open issues with '-x', then I do want to know about it and fix it sooner rather than later. Alas, I still don't have the slightest clue about what your issue actually is. > I forgot a crucial tidbit: if you run with `-x` and a lazy prereq is > evaluated, not only is the output disruptive, the trace is also turned > off after the lazy prereq, _before_ the actual test case is run. So you > don't see any trace of the actual test case. Tracing is always turned on before running the test case, independently from whether a lazy prereq was evaluated or not, so we do always see the trace of the actual test case. Notice the '+ true' and '++ true' lines in my earlier reply including the test traces: those lines are the trace of the actual test case. https://public-inbox.org/git/20190917120230.GA27531@szeder.dev/ ^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: Git in Outreachy December 2019? 2019-09-26 13:28 ` SZEDER Gábor @ 2019-09-26 19:39 ` Johannes Schindelin 2019-09-26 21:44 ` SZEDER Gábor 0 siblings, 1 reply; 63+ messages in thread From: Johannes Schindelin @ 2019-09-26 19:39 UTC (permalink / raw) To: SZEDER Gábor; +Cc: Emily Shaffer, Jeff King, Jonathan Tan, git [-- Attachment #1: Type: text/plain, Size: 1740 bytes --] Hi, On Thu, 26 Sep 2019, SZEDER Gábor wrote: > On Thu, Sep 26, 2019 at 01:04:48PM +0200, Johannes Schindelin wrote: > > > > > > Also, things like the code tracing via `-x` (which relies on Bash > > > > > > functionality in order to work properly, > > > > > > > > > > Not really. > > > > > > > > To work properly. What I meant was the trick we need to play with > > > > `BASH_XTRACEFD`. > > > > > > I'm still unsure what BASH_XTRACEFD trick you mean. AFAICT we don't > > > play any tricks with it to make '-x' work properly, and indeed '-x' > > > tracing works properly even without BASH_XTRACEFD (and to achive that > > > we did have to play some tricks, but not any with BASH_XTRACEFD; > > > perhaps these tricks are what you meant?). > > > > It works okay some of the time. > > As far as I can tell it works all the time. I must be misinterpreting this part of `t/test-lib.sh`, then: -- snipsnap -- if test -n "$trace" && test -n "$test_untraceable" then # '-x' tracing requested, but this test script can't be reliably # traced, unless it is run with a Bash version supporting # BASH_XTRACEFD (introduced in Bash v4.1). # # Perform this version check _after_ the test script was # potentially re-executed with $TEST_SHELL_PATH for '--tee' or # '--verbose-log', so the right shell is checked and the # warning is issued only once. if test -n "$BASH_VERSION" && eval ' test ${BASH_VERSINFO[0]} -gt 4 || { test ${BASH_VERSINFO[0]} -eq 4 && test ${BASH_VERSINFO[1]} -ge 1 } ' then : Executed by a Bash version supporting BASH_XTRACEFD. Good. else echo >&2 "warning: ignoring -x; '$0' is untraceable without BASH_XTRACEFD" trace= fi fi ^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: Git in Outreachy December 2019? 2019-09-26 19:39 ` Johannes Schindelin @ 2019-09-26 21:44 ` SZEDER Gábor 2019-09-27 22:18 ` Jeff King 0 siblings, 1 reply; 63+ messages in thread From: SZEDER Gábor @ 2019-09-26 21:44 UTC (permalink / raw) To: Johannes Schindelin; +Cc: Emily Shaffer, Jeff King, Jonathan Tan, git On Thu, Sep 26, 2019 at 09:39:58PM +0200, Johannes Schindelin wrote: > Hi, > > On Thu, 26 Sep 2019, SZEDER Gábor wrote: > > > On Thu, Sep 26, 2019 at 01:04:48PM +0200, Johannes Schindelin wrote: > > > > > > > Also, things like the code tracing via `-x` (which relies on Bash > > > > > > > functionality in order to work properly, > > > > > > > > > > > > Not really. > > > > > > > > > > To work properly. What I meant was the trick we need to play with > > > > > `BASH_XTRACEFD`. > > > > > > > > I'm still unsure what BASH_XTRACEFD trick you mean. AFAICT we don't > > > > play any tricks with it to make '-x' work properly, and indeed '-x' > > > > tracing works properly even without BASH_XTRACEFD (and to achive that > > > > we did have to play some tricks, but not any with BASH_XTRACEFD; > > > > perhaps these tricks are what you meant?). > > > > > > It works okay some of the time. > > > > As far as I can tell it works all the time. > > I must be misinterpreting this part of `t/test-lib.sh`, then: Ok, let me try to clarify. There are a couple of things that we can't do in our tests without BASH_XTRACEFD, e.g. redirecting the standard error of a subshell or a loop to a file and then check that file with 'test_cmp' or 'test_must_be_empty'. With tracing enabled but without BASH_XTRACEFD, the trace of the commands executed within the subshell or loop end up in that file as well, and cause failure (grepping through that file is mostly ok, though). Back then we had 23 test cases failing because they were doing things like this and needed to be fixed, so considering the total number of test cases we only rarely used such problematic constructs. Still, as I recall, Peff was concerned that these limitations might lead to maintenance burden on the long run, so I decided to add an escape hatch, just in case someone constructs such an elaborate test script, where redirecting the stderr of a compound command could considerably simplify the tests. That snippet of code that you copied is this escape hatch: if $test_untraceable is set to a non-empty value before sourcing 'test-lib.sh', then tracing will only be enabled if BASH_XTRACEFD is available. All that was over a year and a half ago, and these limitations weren't a maintenance burden at all so far, and nobody needed that escape hatch. Well, nobody except me, that is :) When I saw back then that t1510 saves the stderr of nested function calls with 7 parameters, I shrugged in disgust, admitted defeat, and simply reached for that escape hatch: partly because I couldn't be bothered to figure out how that test script works, but more importantly because I didn't want to risk that any cleanup inadvertently hides a bug in the future. So that's the only user that piece of code ever had, and I certainly hope that no other test script will ever grow so complicated that it will need this escape hatch. I would actually prefer to remove it, but t1510 must be cleaned up first... so I'm afraid it will be with us for a while. > -- snipsnap -- > if test -n "$trace" && test -n "$test_untraceable" > then > # '-x' tracing requested, but this test script can't be reliably > # traced, unless it is run with a Bash version supporting > # BASH_XTRACEFD (introduced in Bash v4.1). > # > # Perform this version check _after_ the test script was > # potentially re-executed with $TEST_SHELL_PATH for '--tee' or > # '--verbose-log', so the right shell is checked and the > # warning is issued only once. > if test -n "$BASH_VERSION" && eval ' > test ${BASH_VERSINFO[0]} -gt 4 || { > test ${BASH_VERSINFO[0]} -eq 4 && > test ${BASH_VERSINFO[1]} -ge 1 > } > ' > then > : Executed by a Bash version supporting BASH_XTRACEFD. Good. > else > echo >&2 "warning: ignoring -x; '$0' is untraceable without BASH_XTRACEFD" > trace= > fi > fi ^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: Git in Outreachy December 2019? 2019-09-26 21:44 ` SZEDER Gábor @ 2019-09-27 22:18 ` Jeff King 2019-10-09 17:25 ` SZEDER Gábor 0 siblings, 1 reply; 63+ messages in thread From: Jeff King @ 2019-09-27 22:18 UTC (permalink / raw) To: SZEDER Gábor; +Cc: Johannes Schindelin, Emily Shaffer, Jonathan Tan, git On Thu, Sep 26, 2019 at 11:44:48PM +0200, SZEDER Gábor wrote: > All that was over a year and a half ago, and these limitations weren't > a maintenance burden at all so far, and nobody needed that escape > hatch. > > Well, nobody except me, that is :) When I saw back then that t1510 > saves the stderr of nested function calls with 7 parameters, I > shrugged in disgust, admitted defeat, and simply reached for that > escape hatch: partly because I couldn't be bothered to figure out how > that test script works, but more importantly because I didn't want to > risk that any cleanup inadvertently hides a bug in the future. > > So that's the only user that piece of code ever had, and I certainly > hope that no other test script will ever grow so complicated that it > will need this escape hatch. I would actually prefer to remove it, > but t1510 must be cleaned up first... so I'm afraid it will be with > us for a while. I'm actually surprised we haven't run into it more. We have some custom test scripts in our fork of Git at GitHub. We usually just use TEST_SHELL_PATH=bash, but curious, I tried running with dash and "-x", and three of them failed. Probably they'd be easy enough to fix (and they're out of tree anyway), so I'm not really arguing against the escape hatch exactly. Mostly I'm just surprised that if I introduced 3 cases (out of probably a dozen scripts), I'm surprised that more contributors aren't accidentally doing so upstream. -Peff ^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: Git in Outreachy December 2019? 2019-09-27 22:18 ` Jeff King @ 2019-10-09 17:25 ` SZEDER Gábor 2019-10-11 6:34 ` Jeff King 0 siblings, 1 reply; 63+ messages in thread From: SZEDER Gábor @ 2019-10-09 17:25 UTC (permalink / raw) To: Jeff King; +Cc: Johannes Schindelin, Emily Shaffer, Jonathan Tan, git On Fri, Sep 27, 2019 at 06:18:58PM -0400, Jeff King wrote: > On Thu, Sep 26, 2019 at 11:44:48PM +0200, SZEDER Gábor wrote: > > > All that was over a year and a half ago, and these limitations weren't > > a maintenance burden at all so far, and nobody needed that escape > > hatch. > I'm actually surprised we haven't run into it more. We have some custom > test scripts in our fork of Git at GitHub. We usually just use > TEST_SHELL_PATH=bash, but curious, I tried running with dash and "-x", > and three of them failed. I try to avoid using TEST_SHELL_PATH at all costs, it is far too keen to not do what I naively expect. > Probably they'd be easy enough to fix (and they're out of tree anyway), > so I'm not really arguing against the escape hatch exactly. Mostly I'm > just surprised that if I introduced 3 cases (out of probably a dozen > scripts), I'm surprised that more contributors aren't accidentally doing > so upstream. I see it a bit differently. Over a decade we gathered about twenty-something such tests cases: that's about two cases per year. You added three such cases in about a year and a half: that's two cases per year. The numbers add up perfectly, you singlehandedly took care of everything ;) Anyway, I did some more digging, and, unfortunately, it turned out that Dscho is somewhat right. While the situation is not as bad as he made it look like ("We need Bash!"), it's not as good as I thought it is ("But it Just Works!!") either. - Some shells do include file descriptor redirections in the trace output, and it varies between implementations to which fd the trace of the redirection goes. - 'ksh/ksh93' and NetBSD's /bin/sh send the trace of redirections to the "wrong" fd, in the sense that e.g. the trace of commands invoked in 'test_must_fail' goes to the function's standard error, and checking its stderr with 'test_cmp' would then fail. (But 'ksh/ksh93' doesn't really matter, because they don't support the 'local' keyword, so they fail a bunch of tests even without '-x' anyway.) I don't think we can do anything about these shells. - 'mksh/lksh' send the trace of redirections to the "right" fd, so they won't pollute the stderr of test helper functions. And indeed the test suite passes when run with 'mksh' (well, at least the subset of the test suite that I usually run). - We do call 'test_have_prereq' from within test cases as well, notably from the 'test_i18ngrep', 'test_i18ncmp' and 'test_ln_s_add' helper functions. In those cases all trace output from 'test_have_prereq' is included in the test case's trace output, which means that during the first invocation: - there is lots of distracting and confusing trace output, as the script evaluating the prereq is passed around to a bunch of functions. - after running the script evaluating the prereq 'test_eval_' does indeed turn off tracing, so there will be no trace from the remainder of that test case (except with 'mksh': while it does run 'set +x' in 'test_eval_', that somehow doesn't turn off tracing... I have no idea whether that's a bug or a feature). As far as 'test_i18ngrep' is concerned, which accounts for the majority of 'test_have_prereq' invocations within test cases, I don't understand why it uses 'test_have_prereq' in the first place instead of checking the GIT_TEST_GETTEXT_POISON environment variable; and 6cdccfce1e (i18n: make GETTEXT_POISON a runtime option, 2018-11-08) doesn't give me any insight. I recall that some months ago we discussed the idea of how to disable trace output from within test helper functions; that would help with this 'test_have_prereq' issue as well, at least in case of the more "common" shells. ^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: Git in Outreachy December 2019? 2019-10-09 17:25 ` SZEDER Gábor @ 2019-10-11 6:34 ` Jeff King 0 siblings, 0 replies; 63+ messages in thread From: Jeff King @ 2019-10-11 6:34 UTC (permalink / raw) To: SZEDER Gábor; +Cc: Johannes Schindelin, Emily Shaffer, Jonathan Tan, git On Wed, Oct 09, 2019 at 07:25:51PM +0200, SZEDER Gábor wrote: > > Probably they'd be easy enough to fix (and they're out of tree anyway), > > so I'm not really arguing against the escape hatch exactly. Mostly I'm > > just surprised that if I introduced 3 cases (out of probably a dozen > > scripts), I'm surprised that more contributors aren't accidentally doing > > so upstream. > > I see it a bit differently. Over a decade we gathered about > twenty-something such tests cases: that's about two cases per year. > You added three such cases in about a year and a half: that's two > cases per year. The numbers add up perfectly, you singlehandedly took > care of everything ;) Those cases are actually much older than that. I just didn't bother to clean them up until recently. So my rate is even lower :) > - Some shells do include file descriptor redirections in the trace > output, and it varies between implementations to which fd the > trace of the redirection goes. > > - 'ksh/ksh93' and NetBSD's /bin/sh send the trace of > redirections to the "wrong" fd, in the sense that e.g. the > trace of commands invoked in 'test_must_fail' goes to the > function's standard error, and checking its stderr with > 'test_cmp' would then fail. > > (But 'ksh/ksh93' doesn't really matter, because they don't > support the 'local' keyword, so they fail a bunch of tests > even without '-x' anyway.) > > I don't think we can do anything about these shells. Yeah, unless somebody is complaining, I don't know that it's worth worrying about too much. The test suite is certainly useful without being able to use "-x" on every single test run (you can still run it without "-x" obviously, or selectively use "-x" to debug a single test or script). So if it is only unreliable on a few tests on a few obscure shells, we can probably live with it until somebody demonstrates a real-world problem (e.g., that they're running automated CI on an obscure platform that is stuck with an old shell, and really want "-x --verbose-log" to get more verbose failures). > - We do call 'test_have_prereq' from within test cases as well, > notably from the 'test_i18ngrep', 'test_i18ncmp' and > 'test_ln_s_add' helper functions. In those cases all trace output > from 'test_have_prereq' is included in the test case's trace > output, which means that during the first invocation: > > - there is lots of distracting and confusing trace output, as > the script evaluating the prereq is passed around to a bunch > of functions. Yeah, I think this is probably an issue even with bash. > As far as 'test_i18ngrep' is concerned, which accounts for the > majority of 'test_have_prereq' invocations within test cases, I > don't understand why it uses 'test_have_prereq' in the first place > instead of checking the GIT_TEST_GETTEXT_POISON environment > variable; and 6cdccfce1e (i18n: make GETTEXT_POISON a runtime > option, 2018-11-08) doesn't give me any insight. I think it's just that checking the environment variable is non-trivial: we invoke env--helper to handle bool interpretation. So we'd prefer to cache the result (and not to run it at all if a test script doesn't use i18ngrep, though it's perhaps ubiquitous enough that we should just run it up front for every script). > I recall that some months ago we discussed the idea of how to > disable trace output from within test helper functions; that would > help with this 'test_have_prereq' issue as well, at least in case > of the more "common" shells. That might be worth doing, though IIRC it got kind of hairy. :) -Peff ^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: Git in Outreachy December 2019? 2019-09-23 12:47 ` Johannes Schindelin 2019-09-23 16:58 ` SZEDER Gábor @ 2019-09-23 18:19 ` Jeff King 2019-09-24 14:30 ` Johannes Schindelin 1 sibling, 1 reply; 63+ messages in thread From: Jeff King @ 2019-09-23 18:19 UTC (permalink / raw) To: Johannes Schindelin; +Cc: SZEDER Gábor, Emily Shaffer, Jonathan Tan, git On Mon, Sep 23, 2019 at 02:47:23PM +0200, Johannes Schindelin wrote: > The evaluation of the lazy prereq is indeed not different between Bash > or dash. It is nevertheless quite disruptive in the trace of a test > script, especially when it is evaluated for a test case that is skipped > explicitly via the `--run` option. That sounds like a bug: if we know we are not going to run the test anyway due to --run or GIT_TEST_SKIP, we should probably avoid checking the prereq at all. -Peff ^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: Git in Outreachy December 2019? 2019-09-23 18:19 ` Jeff King @ 2019-09-24 14:30 ` Johannes Schindelin 0 siblings, 0 replies; 63+ messages in thread From: Johannes Schindelin @ 2019-09-24 14:30 UTC (permalink / raw) To: Jeff King; +Cc: SZEDER Gábor, Emily Shaffer, Jonathan Tan, git Hi Peff, On Mon, 23 Sep 2019, Jeff King wrote: > On Mon, Sep 23, 2019 at 02:47:23PM +0200, Johannes Schindelin wrote: > > > The evaluation of the lazy prereq is indeed not different between Bash > > or dash. It is nevertheless quite disruptive in the trace of a test > > script, especially when it is evaluated for a test case that is skipped > > explicitly via the `--run` option. > > That sounds like a bug: if we know we are not going to run the test > anyway due to --run or GIT_TEST_SKIP, we should probably avoid checking > the prereq at all. Yep. Likewise, the `&&` chain checker seems to run even if the test case is skipped. Ciao, Dscho ^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: Git in Outreachy December 2019? 2019-09-17 11:23 ` Johannes Schindelin 2019-09-17 12:02 ` SZEDER Gábor @ 2019-09-17 15:10 ` Christian Couder 2019-09-23 12:50 ` Johannes Schindelin 2019-09-23 19:30 ` Jeff King 2019-09-23 18:07 ` Jeff King 2019-09-24 0:55 ` Eric Wong 3 siblings, 2 replies; 63+ messages in thread From: Christian Couder @ 2019-09-17 15:10 UTC (permalink / raw) To: Emily Shaffer, Johannes Schindelin; +Cc: Jeff King, Jonathan Tan, git Hi Emily and Dscho, On Tue, Sep 17, 2019 at 1:28 PM Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote: > On Mon, 16 Sep 2019, Emily Shaffer wrote: > > > Jonathan Tan, Jonathan Nieder, Josh Steadmon and I met on Friday to > > talk about projects and we came up with a trimmed list; not sure what > > more needs to be done to make them into fully-fledged proposals. > > Thank you for doing this! Yeah, great! > > For starter microprojects, we came up with: > > > > - cleanup a test script (although we need to identify particularly > > which ones and what counts as "clean") > > - moving doc from documentation/technical/api-* to comments in the > > appropriate header instead > > - teach a command which currently handles its own argv how to use > > parse-options instead > > - add a user.timezone option which Git can use if present rather than > > checking system local time > > Nice projects, all. There are a couple more ideas on > https://github.com/gitgitgadget/git/issues, they could probably use some > tagging. Thanks! Maybe we should have a page with Outreachy microprojects on https://git.github.io/ I will see if I find the time to create one soon with the above information. > > For the longer projects, we came up with a few more: [...] > > - git-bisect.sh > > That would be my top recommendation, especially given how much effort > Tanushree put in last winter to make this conversion to C so much more > achievable than before. I just added a project in the Outreachy system about it. I would have added the link but Outreachy asks to not share the link publicly. I am willing to co-mentor (or maybe mentor alone if no one else wants to co-mentor) it. Anyone willing to co-mentor can register on the outreachy website. Thanks for the other suggestions by the way. > Converting shell/Perl scripts into built-in C never looks as much fun as > open-ended projects with lots of playing around, but the advantage of > the former is that they can be easily structured, offer a lot of > opportunity for learning, and they are ultimately more rewarding because > the goals are much better defined than many other projects'. I agree. Outreachy also suggest avoiding projects that have to be discussed a lot or are not clear enough. > > - reduce/eliminate use of fetch_if_missing global I like this one! > > It might make sense to only focus on scoping the ones we feel most > > interested in. We came up with a pretty big list because we had some > > other programs in mind, so I suppose it's not necessary to develop all > > of them for this program. I agree as I don't think we will have enough mentors or co-mentors for a big number of projects. Best, Christian. ^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: Git in Outreachy December 2019? 2019-09-17 15:10 ` Christian Couder @ 2019-09-23 12:50 ` Johannes Schindelin 2019-09-23 19:30 ` Jeff King 1 sibling, 0 replies; 63+ messages in thread From: Johannes Schindelin @ 2019-09-23 12:50 UTC (permalink / raw) To: Christian Couder; +Cc: Emily Shaffer, Jeff King, Jonathan Tan, git Hi, On Tue, 17 Sep 2019, Christian Couder wrote: > On Tue, Sep 17, 2019 at 1:28 PM Johannes Schindelin > <Johannes.Schindelin@gmx.de> wrote: > > On Mon, 16 Sep 2019, Emily Shaffer wrote: > > > > > - reduce/eliminate use of fetch_if_missing global > > I like this one! It looks as if a (non-Outreachy) contributor also does like this one already: https://github.com/git/git/pull/650 Ciao, Dscho ^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: Git in Outreachy December 2019? 2019-09-17 15:10 ` Christian Couder 2019-09-23 12:50 ` Johannes Schindelin @ 2019-09-23 19:30 ` Jeff King 1 sibling, 0 replies; 63+ messages in thread From: Jeff King @ 2019-09-23 19:30 UTC (permalink / raw) To: Christian Couder; +Cc: Emily Shaffer, Johannes Schindelin, Jonathan Tan, git On Tue, Sep 17, 2019 at 05:10:43PM +0200, Christian Couder wrote: > > Nice projects, all. There are a couple more ideas on > > https://github.com/gitgitgadget/git/issues, they could probably use some > > tagging. > > Thanks! Maybe we should have a page with Outreachy microprojects on > https://git.github.io/ > > I will see if I find the time to create one soon with the above information. Please let me know if you do; there's a spot in the project page for: Description of your first time contribution tutorial: If your applicants need to complete a tutorial before working on contributions with mentors, please provide a description and the URL for the tutorial. For example, the Linux kernel asks applicants to complete a tutorial for compiling and installing a custom kernel, and sending in a simple whitespace change patch. Once applicants complete this tutorial, they can start to work with mentors on more complex contributions. which could link to that list (and the list should probably discuss how we consider micro-projects, etc, and link to MyFirstContribution or similar). I think this suggestion is a good one to add, as well: https://public-inbox.org/git/20190923180649.GA2886@szeder.dev/ -Peff ^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: Git in Outreachy December 2019? 2019-09-17 11:23 ` Johannes Schindelin 2019-09-17 12:02 ` SZEDER Gábor 2019-09-17 15:10 ` Christian Couder @ 2019-09-23 18:07 ` Jeff King 2019-09-24 14:25 ` Johannes Schindelin 2019-09-24 0:55 ` Eric Wong 3 siblings, 1 reply; 63+ messages in thread From: Jeff King @ 2019-09-23 18:07 UTC (permalink / raw) To: Johannes Schindelin; +Cc: Emily Shaffer, Jonathan Tan, git On Tue, Sep 17, 2019 at 01:23:18PM +0200, Johannes Schindelin wrote: > The only slightly challenging aspect might be that `merge-one-file` is > actually not a merge strategy, but it is used as helper to be passed to > `git merge-index` via the `-o <helper>` option, which makes it slightly > awkward to be implemented as a built-in. A better approach would > therefore be to special-case this value in `merge-index` and execute the > C code directly, without the detour of spawning a built-in. I think it could make sense for merge-index to be able to directly run the merge-one-file code[1]. But I think we'd want to keep its ability to run an arbitrary script, and for people to call merge-one-file separately, since right now you can do: git merge-index my-script and have "my-script" do some processing of its own, then hand off more work to merge-one-file. So the weird calling convention is actually a user-visible and potentially useful interface. So it would need a deprecation period before being removed. -Peff [1] Certainly doing it in-process would be faster for the common case of "git merge-index git-merge-one-file", but I wonder if anybody really does that. These days most people would just merge-recursive anyway. ^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: Git in Outreachy December 2019? 2019-09-23 18:07 ` Jeff King @ 2019-09-24 14:25 ` Johannes Schindelin 2019-09-24 15:33 ` Jeff King 0 siblings, 1 reply; 63+ messages in thread From: Johannes Schindelin @ 2019-09-24 14:25 UTC (permalink / raw) To: Jeff King; +Cc: Emily Shaffer, Jonathan Tan, git Hi Peff, On Mon, 23 Sep 2019, Jeff King wrote: > On Tue, Sep 17, 2019 at 01:23:18PM +0200, Johannes Schindelin wrote: > > > The only slightly challenging aspect might be that `merge-one-file` is > > actually not a merge strategy, but it is used as helper to be passed to > > `git merge-index` via the `-o <helper>` option, which makes it slightly > > awkward to be implemented as a built-in. A better approach would > > therefore be to special-case this value in `merge-index` and execute the > > C code directly, without the detour of spawning a built-in. > > I think it could make sense for merge-index to be able to directly run > the merge-one-file code[1]. But I think we'd want to keep its ability to > run an arbitrary script, and for people to call merge-one-file > separately, since right now you can do: > > git merge-index my-script > > and have "my-script" do some processing of its own, then hand off more > work to merge-one-file. Oh, sorry, I did not mean to say that we should do away with this at all! Rather, I meant to say that `merge-index` could detect when it was asked to run `git-merge-one-file` and re-route to internal code instead of spawning a process. If any other script/program was specified, it should be spawned off, just like it is done today. Thanks, Dscho > So the weird calling convention is actually a user-visible and > potentially useful interface. So it would need a deprecation period > before being removed. > > -Peff > > [1] Certainly doing it in-process would be faster for the common case of > "git merge-index git-merge-one-file", but I wonder if anybody really > does that. These days most people would just merge-recursive anyway. > ^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: Git in Outreachy December 2019? 2019-09-24 14:25 ` Johannes Schindelin @ 2019-09-24 15:33 ` Jeff King 2019-09-28 3:56 ` Junio C Hamano 0 siblings, 1 reply; 63+ messages in thread From: Jeff King @ 2019-09-24 15:33 UTC (permalink / raw) To: Johannes Schindelin; +Cc: Emily Shaffer, Jonathan Tan, git On Tue, Sep 24, 2019 at 04:25:45PM +0200, Johannes Schindelin wrote: > > I think it could make sense for merge-index to be able to directly run > > the merge-one-file code[1]. But I think we'd want to keep its ability to > > run an arbitrary script, and for people to call merge-one-file > > separately, since right now you can do: > > > > git merge-index my-script > > > > and have "my-script" do some processing of its own, then hand off more > > work to merge-one-file. > > Oh, sorry, I did not mean to say that we should do away with this at > all! Rather, I meant to say that `merge-index` could detect when it was > asked to run `git-merge-one-file` and re-route to internal code instead > of spawning a process. If any other script/program was specified, it > should be spawned off, just like it is done today. OK, great, then we are completely on the same page. :) -Peff ^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: Git in Outreachy December 2019? 2019-09-24 15:33 ` Jeff King @ 2019-09-28 3:56 ` Junio C Hamano 0 siblings, 0 replies; 63+ messages in thread From: Junio C Hamano @ 2019-09-28 3:56 UTC (permalink / raw) To: Jeff King; +Cc: Johannes Schindelin, Emily Shaffer, Jonathan Tan, git Jeff King <peff@peff.net> writes: > On Tue, Sep 24, 2019 at 04:25:45PM +0200, Johannes Schindelin wrote: > >> > I think it could make sense for merge-index to be able to directly run >> > the merge-one-file code[1]. But I think we'd want to keep its ability to >> > run an arbitrary script, and for people to call merge-one-file >> > separately, since right now you can do: >> > >> > git merge-index my-script >> > >> > and have "my-script" do some processing of its own, then hand off more >> > work to merge-one-file. >> >> Oh, sorry, I did not mean to say that we should do away with this at >> all! Rather, I meant to say that `merge-index` could detect when it was >> asked to run `git-merge-one-file` and re-route to internal code instead >> of spawning a process. If any other script/program was specified, it >> should be spawned off, just like it is done today. > > OK, great, then we are completely on the same page. :) I wondered briefly if we want tospecial case the string 'git-merge-one-file' (iow, teaching merge-index a new option "--use-builtin-merge-one-file" and update our own use in git-merge-octopus.sh, git-merge-resolve.sh, etc. would be safer). But "git merge-index git-merge-one-file" called in the context of scrpted Porcelain has the directory that has *OUR* git-merge-one-file as the first element on $PATH, I think, so it may be perfectly safe to use the "ah, the command name we are asked to run happens to be git-merge-one-file, so let's use the internal version instead without spawning" short-cut. ^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: Git in Outreachy December 2019? 2019-09-17 11:23 ` Johannes Schindelin ` (2 preceding siblings ...) 2019-09-23 18:07 ` Jeff King @ 2019-09-24 0:55 ` Eric Wong 2019-09-26 12:45 ` Johannes Schindelin 2019-09-28 4:01 ` Junio C Hamano 3 siblings, 2 replies; 63+ messages in thread From: Eric Wong @ 2019-09-24 0:55 UTC (permalink / raw) To: Johannes Schindelin; +Cc: Emily Shaffer, Jeff King, Jonathan Tan, git Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote: > On Mon, 16 Sep 2019, Emily Shaffer wrote: > > - try and make progress towards running many tests from a single test > > file in parallel - maybe this is too big, I'm not sure if we know how > > many of our tests are order-dependent within a file for now... > > Another, potentially more rewarding, project would be to modernize our > test suite framework, so that it is not based on Unix shell scripting, > but on C instead. I worry more C would reduce the amount of contributors (some of the C rewrites already scared me off hacking years ago). I figure more users are familiar with sh than C. It would also increase the disparity between tests and use of actual users from the command-line. > The fact that it is based on Unix shell scripting not only costs a lot > of speed, especially on Windows, it also limits us quite a bit, and I am > talking about a lot more than just the awkwardness of having to think > about options of BSD vs GNU variants of common command-line tools. I agree that it costs a lot of time, and I'm even on Linux using dash as /bin/sh + eatmydata (but ancient laptop) > For example, many, many, if not all, test cases, spend the majority of > their code on setting up specific scenarios. I don't know about you, > but personally I have to dive into many of them when things fail (and I > _dread_ the numbers 0021, 0025 and 3070, let me tell you) and I really > have to say that most of that code is hard to follow and does not make > it easy to form a mental model of what the code tries to accomplish. > > To address this, a while ago Thomas Rast started to use `fast-export`ed > commit histories in test scripts (see e.g. `t/t3206/history.export`). I > still find that this fails to make it easier for occasional readers to > understand the ideas underlying the test cases. > > Another approach is to document heavily the ideas first, then use code > to implement them. For example, t3430 starts with this: > > [...] > > Initial setup: > > -- B -- (first) > / \ > A - C - D - E - H (master) > \ \ / > \ F - G (second) > \ > Conflicting-G > > [...] > > test_commit A && > git checkout -b first && > test_commit B && > git checkout master && > test_commit C && > test_commit D && > git merge --no-commit B && > test_tick && > git commit -m E && > git tag -m E E && > git checkout -b second C && > test_commit F && > test_commit G && > git checkout master && > git merge --no-commit G && > test_tick && > git commit -m H && > git tag -m H H && > git checkout A && > test_commit conflicting-G G.t > > [...] > > While this is _somewhat_ better than having only the code, I am still > unhappy about it: this wall of `test_commit` lines interspersed with > other commands is very hard to follow. Agreed. More on the readability part below... As far as speeding that up, I think moving some parts of test setup to Makefiles + fast-import/fast-export would give us a nice balance of speed + maintainability: 1. initial setup is done using normal commands (or graph drawing tool) 2. the result of setup is "built" with fast-export 3. test uses fast-import Makefile rules would prevent subsequent test runs from repeating 1. and 2. > If we were to (slowly) convert our test suite framework to C, we could > change that. > > One idea would be to allow recreating commit history from something that > looks like the output of `git log`, or even `git log --graph --oneline`, > much like `git mktree` (which really should have been a test helper > instead of a Git command, but I digress) takes something that looks like > the output of `git ls-tree` and creates a tree object from it. I've been playing with Graph::Easy (Perl5 module) in other projects, and I also think the setup could be more easily expressed with a declarative language (e.g. GNU make) > Another thing that would be much easier if we moved more and more parts > of the test suite framework to C: we could implement more powerful > assertions, a lot more easily. For example, the trace output of a failed > `test_i18ngrep` (or `mingw_test_cmp`!!!) could be made a lot more > focused on what is going wrong than on cluttering the terminal window > with almost useless lines which are tedious to sift through. I fail to see how language choice here matters. But then again, I have plenty of experience writing bad code in ALL languages I know :> > Likewise, having a framework in C would make it a lot easier to improve > debugging, e.g. by making test scripts "resumable" (guarded by an > option, it could store a complete state, including a copy of the trash > directory, before executing commands, which would allow "going back in > time" and calling a failing command with a debugger, or with valgrind, or > just seeing whether the command would still fail, i.e. whether the test > case is flaky). Resumability sounds like a perfect job for GNU make. (that said, I don't know if you use make or something else to build gfw) > In many ways, our current test suite seems to test Git's functionality > as much as (core) contributors' abilities to implement test cases in > Unix shell script, _correctly_, and maybe also contributors' patience. > You could say that it tests for the wrong thing at least half of the > time, by design. Basic (not advanced) sh is already a prerequisite for using git. Writing correct code and tests in ANY language is still a challenge for me; but I'm least convinced a low-level language such as C is the right language for writing integration tests in. C is fine for unit tests, and maybe we can use more unit tests and less integration tests. > It might look like a somewhat less important project, but given that we > exercise almost 150,000 test cases with every CI build, I think it does > make sense to grind our axe for a while, so to say. Something that would benefit both users and regular contributors is the use and adoption of more batch and eval-friendly interfaces. e.g. fast-import/export, cat-file --batch, for-each-ref --perl... I haven't used hg since 2005, but I know "hg server" exists nowadays to get rid of a lot of startup overhead in Mercurial, and maybe git could steal that idea, too... > Therefore, it might be a really good project to modernize our test > suite. To take ideas from modern test frameworks such as Jest and try to > bring them to C. Which means that new contributors would probably be > better suited to work on this project than Git old-timers! > > And the really neat thing about this project is that it could be done > incrementally. I hope to find time to hack some more batch/eval-friendly stuff that can make scripting git more performant; but no idea on my availability :< ^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: Git in Outreachy December 2019? 2019-09-24 0:55 ` Eric Wong @ 2019-09-26 12:45 ` Johannes Schindelin 2019-09-30 8:55 ` Eric Wong 2019-09-28 4:01 ` Junio C Hamano 1 sibling, 1 reply; 63+ messages in thread From: Johannes Schindelin @ 2019-09-26 12:45 UTC (permalink / raw) To: Eric Wong; +Cc: Emily Shaffer, Jeff King, Jonathan Tan, git Hi Eric, On Tue, 24 Sep 2019, Eric Wong wrote: > Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote: > > On Mon, 16 Sep 2019, Emily Shaffer wrote: > > > - try and make progress towards running many tests from a single test > > > file in parallel - maybe this is too big, I'm not sure if we know how > > > many of our tests are order-dependent within a file for now... > > > > Another, potentially more rewarding, project would be to modernize our > > test suite framework, so that it is not based on Unix shell scripting, > > but on C instead. > > I worry more C would reduce the amount of contributors (some of > the C rewrites already scared me off hacking years ago). I > figure more users are familiar with sh than C. Seeing as most of the patches/patch series require contributors not only to write test cases in Unix shell script, but also to patch or implement code in C, I fail to be concerned about that. > It would also increase the disparity between tests and use of > actual users from the command-line. I find it really endearing whenever I hear anybody talking about Git users as if every single one of them crafted extensive shell scripts around Git. In my experience, most of the people who do that gather on this here mailing list, or on a nearby one, and that is but a tiny fraction of Git users. Besides, it was my understanding that Git's test suite tries to prevent regressions in Git's code, whether it is called in scripts or not. As such, it would not matter _how_ that functionality is tested. Does that not match your understanding? > > The fact that it is based on Unix shell scripting not only costs a lot > > of speed, especially on Windows, it also limits us quite a bit, and I am > > talking about a lot more than just the awkwardness of having to think > > about options of BSD vs GNU variants of common command-line tools. > > I agree that it costs a lot of time, and I'm even on Linux using > dash as /bin/sh + eatmydata (but ancient laptop) One thing that I meant to play with (but which is hampered by too many parts of Git being implemented in Unix shell/Perl, still, therefore making code coverage analysis hard) is Test Impact Analysis (https://docs.microsoft.com/en-us/azure/devops/pipelines/test/test-impact-analysis). In short, a way to avoid running tests the code touched by them was already tested before. Example: if I change the `README.md`, no regression test needs to be run at all. If I change `git-p4.py`, the majority of test scripts can be skipped, only t98*.sh need to be run (and maybe not even all of them). There is a lot of work to be done on the built-in'ification, still, before that becomes feasible, of course. > > For example, many, many, if not all, test cases, spend the majority of > > their code on setting up specific scenarios. I don't know about you, > > but personally I have to dive into many of them when things fail (and I > > _dread_ the numbers 0021, 0025 and 3070, let me tell you) and I really > > have to say that most of that code is hard to follow and does not make > > it easy to form a mental model of what the code tries to accomplish. > > > > To address this, a while ago Thomas Rast started to use `fast-export`ed > > commit histories in test scripts (see e.g. `t/t3206/history.export`). I > > still find that this fails to make it easier for occasional readers to > > understand the ideas underlying the test cases. > > > > Another approach is to document heavily the ideas first, then use code > > to implement them. For example, t3430 starts with this: > > > > [...] > > > > Initial setup: > > > > -- B -- (first) > > / \ > > A - C - D - E - H (master) > > \ \ / > > \ F - G (second) > > \ > > Conflicting-G > > > > [...] > > > > test_commit A && > > git checkout -b first && > > test_commit B && > > git checkout master && > > test_commit C && > > test_commit D && > > git merge --no-commit B && > > test_tick && > > git commit -m E && > > git tag -m E E && > > git checkout -b second C && > > test_commit F && > > test_commit G && > > git checkout master && > > git merge --no-commit G && > > test_tick && > > git commit -m H && > > git tag -m H H && > > git checkout A && > > test_commit conflicting-G G.t > > > > [...] > > > > While this is _somewhat_ better than having only the code, I am still > > unhappy about it: this wall of `test_commit` lines interspersed with > > other commands is very hard to follow. > > Agreed. More on the readability part below... > > As far as speeding that up, I think moving some parts > of test setup to Makefiles + fast-import/fast-export would give > us a nice balance of speed + maintainability: > > 1. initial setup is done using normal commands (or graph drawing tool) > 2. the result of setup is "built" with fast-export > 3. test uses fast-import I actually talked about this in my mail. If you find it easy to deduce the intent behind a commit history that was exported via fast-export, more power to you. (Was the committer name crucial? The file name? Or the temporal order of the commits?) In contrast, I find it very challenging, myself. And please keep in mind that the first thing any contributor needs to do who sees a failing regression test (where the failure is most likely caused by the patch they plan on contributing): understand what the heck the regression test case is trying to ensure. The harder the code makes that, the worse it does its primary job: to (help) prevent regressions. So no, I am not at all on board with moving to fast-imported commit histories in Git's test suite. They provide some convenience to the authors of those regression tests, which is not the audience you need to cater for in this case: instead, regression tests should make it not only easy to catch, but _especially_ easy to fix, regressions. And that audience would pay dearly for that erstwhile convenience. > Makefile rules would prevent subsequent test runs from repeating > 1. and 2. That is a cute idea, until you realize that the number of developers fluent in `make` is even smaller than the number of developers fluent in `C`. In other words, you would again _increase_ the the number of prerequisites instead of reducing it. > > If we were to (slowly) convert our test suite framework to C, we could > > change that. > > > > One idea would be to allow recreating commit history from something that > > looks like the output of `git log`, or even `git log --graph --oneline`, > > much like `git mktree` (which really should have been a test helper > > instead of a Git command, but I digress) takes something that looks like > > the output of `git ls-tree` and creates a tree object from it. > > I've been playing with Graph::Easy (Perl5 module) in other > projects, and I also think the setup could be more easily > expressed with a declarative language (e.g. GNU make) I am dubious. But hey, if you show me something that looks _dead_ easy to understand, and even easier to write for new contributors, who am I to object? But mind, I am not fluent in Perl. I can probably hack my way through `git-svn`, but that's a far cry from knowing what I am doing there. And wouldn't that _again_ increase the number of prerequisites on contributors? I mean, you sounded genuinely concerned about that at the beginning of the mail. And I share that concern. > > Another thing that would be much easier if we moved more and more parts > > of the test suite framework to C: we could implement more powerful > > assertions, a lot more easily. For example, the trace output of a failed > > `test_i18ngrep` (or `mingw_test_cmp`!!!) could be made a lot more > > focused on what is going wrong than on cluttering the terminal window > > with almost useless lines which are tedious to sift through. > > I fail to see how language choice here matters. In Unix shell script, we either trace (everything) or we don't. In C, you can be a lot more fine-grained with log messages. A *lot*. We even already have such a fine-grained log machinery in place, see `trace.h`. Also, because of the ability to perform more sophisticated locking, lazy prerequisites can easily be cached in C, whereas that is not so easy in Unix shell (and hence it is not done). > > Likewise, having a framework in C would make it a lot easier to improve > > debugging, e.g. by making test scripts "resumable" (guarded by an > > option, it could store a complete state, including a copy of the trash > > directory, before executing commands, which would allow "going back in > > time" and calling a failing command with a debugger, or with valgrind, or > > just seeing whether the command would still fail, i.e. whether the test > > case is flaky). > > Resumability sounds like a perfect job for GNU make. Umm. So let me give you an example of something I had to debug recently. A `git stash apply` marked files outside the sparse checkout as deleted, when they actually had staged changes during the `git stash` call. If this had been a regression test case, it would have looked like this: test_expect_success 'stash handles skip-worktree entries nicely' ' test_commit A && echo changed >A.t && git add A.t && git update-index --skip-worktree A.t && rm A.t && git stash && : this should not mark A.t as deleted && git stash apply && test -n "$(git ls-files A.t)" ' Now, the problem would have occurred in the very last command: `A.t` would have been missing from the index. In order to debug this, you would have had to put in "breakpoints" (inserting `false &&` at strategic places), or prefix commands with `debug` to start them in GDB, then re-run the test case. Lather, rinse and repeat, until you figured out that `git stash` was failing to record `A.t` properly. Then dig into that, recreating the same worktree situation every time you run `git stash`, until you find out that there is a call to `git update-index -z --add --remove --stdin` that removes that file. Further investigation would show you that this command pretty much does what it is told to, because it is fed `A.t` in its `stdin`. The next step would probably be to go back even one more step and see that `diff_index()` reported this file as modified between the index and `HEAD`. Slowly, you would form an understanding of what is going wrong, and you would have to go back and forth between blaming `diff_index()` and `update-index`, or the options `git stash` passes to them. You would have to recreate the worktree many, many times, in order to dig in deep, and of course you would need to understand the intention not only of the regression test, but also of the code it calls. In this instance, it is merely tedious, but possible. I know, because I did it. For flaky tests, not so much. *That* is the scenario I tried to get at. Writing the test cases in `make` would not help that. Not one bit. It would actually make things a lot worse. > (that said, I don't know if you use make or something else to build > gfw) We are talking about the test suite, yes? Not about building Git? Because it does not matter whether we use `make` or not (we do by default, although we also have an option to build in Visual Studio and/or via MSBuild). On Windows, just like on Linux, we use a Unix shell interpreter (Bash). Sure, to run the entire test suite, we use `make` (sometimes in conjunction with `prove`), but the tests themselves are written in Unix shell script, so I have a hard time imagining a different method to run them -- whether on Windows or not -- than to use a Unix shell interpreter such as Bash. > > In many ways, our current test suite seems to test Git's > > functionality as much as (core) contributors' abilities to implement > > test cases in Unix shell script, _correctly_, and maybe also > > contributors' patience. You could say that it tests for the wrong > > thing at least half of the time, by design. > > Basic (not advanced) sh is already a prerequisite for using git. Well, if you are happy with that prerequisite to be set in stone, I am not. Why should any Git user *need* to know sh? > Writing correct code and tests in ANY language is still a > challenge for me; but I'm least convinced a low-level language > such as C is the right language for writing integration tests in. I would be delighted to go for a proper, well-maintained test framework such as Jest. But of course, that would not be accepted in this project. So I won't even think about it. > C is fine for unit tests, and maybe we can use more unit tests and > less integration tests. We don't have integration tests. Unless your concept of what constitutes an "integration test" is very different from mine. For me, an integration test would set up an environment that involves multiple points of failure, e.g. setting up an SSH server on Linux and accessing that via Git from Ubuntu. Or setting up a web server with a self-signed certificate, import the public key into the Windows Certificate Store and then accessing the server both using OpenSSL and Secure Channel (the native Windows way to communicate via TLS). There is nothing even close to that in Git's test suite. > > It might look like a somewhat less important project, but given that we > > exercise almost 150,000 test cases with every CI build, I think it does > > make sense to grind our axe for a while, so to say. > > Something that would benefit both users and regular contributors > is the use and adoption of more batch and eval-friendly interfaces. > e.g. fast-import/export, cat-file --batch, for-each-ref --perl... Given how hard it is to deduce the intention behind such invocations, I am rather doubtful that this would improve our test suite. > I haven't used hg since 2005, but I know "hg server" exists > nowadays to get rid of a lot of startup overhead in Mercurial, > and maybe git could steal that idea, too... I have no idea what `hg server` does. Care to enlighten me? > > Therefore, it might be a really good project to modernize our test > > suite. To take ideas from modern test frameworks such as Jest and try to > > bring them to C. Which means that new contributors would probably be > > better suited to work on this project than Git old-timers! > > > > And the really neat thing about this project is that it could be done > > incrementally. > > I hope to find time to hack some more batch/eval-friendly stuff > that can make scripting git more performant; but no idea on my > availability :< Knock yourself out, if you enjoy that type of project. And who knows, maybe you will convince me yet that it benefits the tests... Ciao, Dscho ^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: Git in Outreachy December 2019? 2019-09-26 12:45 ` Johannes Schindelin @ 2019-09-30 8:55 ` Eric Wong 0 siblings, 0 replies; 63+ messages in thread From: Eric Wong @ 2019-09-30 8:55 UTC (permalink / raw) To: Johannes Schindelin; +Cc: Emily Shaffer, Jeff King, Jonathan Tan, git Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote: > On Tue, 24 Sep 2019, Eric Wong wrote: > > Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote: > > > On Mon, 16 Sep 2019, Emily Shaffer wrote: > > > > - try and make progress towards running many tests from a single test > > > > file in parallel - maybe this is too big, I'm not sure if we know how > > > > many of our tests are order-dependent within a file for now... > > > > > > Another, potentially more rewarding, project would be to modernize our > > > test suite framework, so that it is not based on Unix shell scripting, > > > but on C instead. > > > > I worry more C would reduce the amount of contributors (some of > > the C rewrites already scared me off hacking years ago). I > > figure more users are familiar with sh than C. > > Seeing as most of the patches/patch series require contributors not only > to write test cases in Unix shell script, but also to patch or implement > code in C, I fail to be concerned about that. My point was that it was easier to experiment using a less tedious and verbose language than C. git already has a lot of commands and options, but they're mostly documented. Having to learn our internal C APIs on top of that is a lot of cognitive overhead (and I'm still learning our APIs, myself). > > It would also increase the disparity between tests and use of > > actual users from the command-line. > > I find it really endearing whenever I hear anybody talking about Git > users as if every single one of them crafted extensive shell scripts > around Git. In my experience, most of the people who do that gather on > this here mailing list, or on a nearby one, and that is but a tiny > fraction of Git users. I only mean stuff like: "git add foo && git commit -m ..." in tests, which is standard command-line usage and using our documented UI. Fwiw, I don't have extensive scripts or customizations around git or any of the tools I use, either. > Besides, it was my understanding that Git's test suite tries to prevent > regressions in Git's code, whether it is called in scripts or not. As > such, it would not matter _how_ that functionality is tested. Does that > not match your understanding? That matches my understanding. My concern that most of our functionality is exposed to users as commands; so new developers would feel more comfortable recreating tests off failures they encounter under normal use, instead of having to learn C API internals. > > > The fact that it is based on Unix shell scripting not only costs a lot > > > of speed, especially on Windows, it also limits us quite a bit, and I am > > > talking about a lot more than just the awkwardness of having to think > > > about options of BSD vs GNU variants of common command-line tools. > > > > I agree that it costs a lot of time, and I'm even on Linux using > > dash as /bin/sh + eatmydata (but ancient laptop) > > One thing that I meant to play with (but which is hampered by too many > parts of Git being implemented in Unix shell/Perl, still, therefore > making code coverage analysis hard) is Test Impact Analysis > (https://docs.microsoft.com/en-us/azure/devops/pipelines/test/test-impact-analysis). > In short, a way to avoid running tests the code touched by them was > already tested before. Example: if I change the `README.md`, no > regression test needs to be run at all. If I change `git-p4.py`, the > majority of test scripts can be skipped, only t98*.sh need to be run > (and maybe not even all of them). Cool. I wonder how much effort it would take to do with gcov + Devel::Cover. > There is a lot of work to be done on the built-in'ification, still, > before that becomes feasible, of course. > > > > For example, many, many, if not all, test cases, spend the majority of > > > their code on setting up specific scenarios. I don't know about you, > > > but personally I have to dive into many of them when things fail (and I > > > _dread_ the numbers 0021, 0025 and 3070, let me tell you) and I really > > > have to say that most of that code is hard to follow and does not make > > > it easy to form a mental model of what the code tries to accomplish. > > > > > > To address this, a while ago Thomas Rast started to use `fast-export`ed > > > commit histories in test scripts (see e.g. `t/t3206/history.export`). I > > > still find that this fails to make it easier for occasional readers to > > > understand the ideas underlying the test cases. > > > > > > Another approach is to document heavily the ideas first, then use code > > > to implement them. For example, t3430 starts with this: > > > > > > [...] > > > > > > Initial setup: > > > > > > -- B -- (first) > > > / \ > > > A - C - D - E - H (master) > > > \ \ / > > > \ F - G (second) > > > \ > > > Conflicting-G > > > > > > [...] > > > > > > test_commit A && > > > git checkout -b first && > > > test_commit B && > > > git checkout master && > > > test_commit C && > > > test_commit D && > > > git merge --no-commit B && > > > test_tick && > > > git commit -m E && > > > git tag -m E E && > > > git checkout -b second C && > > > test_commit F && > > > test_commit G && > > > git checkout master && > > > git merge --no-commit G && > > > test_tick && > > > git commit -m H && > > > git tag -m H H && > > > git checkout A && > > > test_commit conflicting-G G.t > > > > > > [...] > > > > > > While this is _somewhat_ better than having only the code, I am still > > > unhappy about it: this wall of `test_commit` lines interspersed with > > > other commands is very hard to follow. > > > > Agreed. More on the readability part below... > > > > As far as speeding that up, I think moving some parts > > of test setup to Makefiles + fast-import/fast-export would give > > us a nice balance of speed + maintainability: > > > > 1. initial setup is done using normal commands (or graph drawing tool) > > 2. the result of setup is "built" with fast-export > > 3. test uses fast-import > > I actually talked about this in my mail. If you find it easy to deduce > the intent behind a commit history that was exported via fast-export, > more power to you. (Was the committer name crucial? The file name? Or > the temporal order of the commits?) > > In contrast, I find it very challenging, myself. And please keep in mind > that the first thing any contributor needs to do who sees a failing > regression test (where the failure is most likely caused by the patch > they plan on contributing): understand what the heck the regression test > case is trying to ensure. The harder the code makes that, the worse it > does its primary job: to (help) prevent regressions. > > So no, I am not at all on board with moving to fast-imported commit > histories in Git's test suite. They provide some convenience to the > authors of those regression tests, which is not the audience you need to > cater for in this case: instead, regression tests should make it not > only easy to catch, but _especially_ easy to fix, regressions. And that > audience would pay dearly for that erstwhile convenience. Sorry I wasn't clear. I only want fast-export/import to be used as a local cache mechanism for tests. I don't want to be writing or maintaining fast-import histories by hand, either :> > > Makefile rules would prevent subsequent test runs from repeating > > 1. and 2. > > That is a cute idea, until you realize that the number of developers > fluent in `make` is even smaller than the number of developers fluent in > `C`. In other words, you would again _increase_ the the number of > prerequisites instead of reducing it. Wouldn't any C developer need to know SOME build system? We already use make, and make uses sh. I'm not especially fluent in make or sh, either; but I can hack my way through them. > > > If we were to (slowly) convert our test suite framework to C, we could > > > change that. > > > > > > One idea would be to allow recreating commit history from something that > > > looks like the output of `git log`, or even `git log --graph --oneline`, > > > much like `git mktree` (which really should have been a test helper > > > instead of a Git command, but I digress) takes something that looks like > > > the output of `git ls-tree` and creates a tree object from it. > > > > I've been playing with Graph::Easy (Perl5 module) in other > > projects, and I also think the setup could be more easily > > expressed with a declarative language (e.g. GNU make) > > I am dubious. But hey, if you show me something that looks _dead_ easy > to understand, and even easier to write for new contributors, who am I > to object? > > But mind, I am not fluent in Perl. I can probably hack my way through > `git-svn`, but that's a far cry from knowing what I am doing there. > > And wouldn't that _again_ increase the number of prerequisites on > contributors? I mean, you sounded genuinely concerned about that at the > beginning of the mail. And I share that concern. I'm not saying we use Graph::Easy, but we could probably take inspiration from it for creating something that makes creating histories easy and maintainable. But it could also be writing better comments, using "git commit" as normal, and letting fast-export cache to speed up subsequent runs. > > > Another thing that would be much easier if we moved more and more parts > > > of the test suite framework to C: we could implement more powerful > > > assertions, a lot more easily. For example, the trace output of a failed > > > `test_i18ngrep` (or `mingw_test_cmp`!!!) could be made a lot more > > > focused on what is going wrong than on cluttering the terminal window > > > with almost useless lines which are tedious to sift through. > > > > I fail to see how language choice here matters. > > In Unix shell script, we either trace (everything) or we don't. In C, > you can be a lot more fine-grained with log messages. A *lot*. > > We even already have such a fine-grained log machinery in place, see > `trace.h`. > > Also, because of the ability to perform more sophisticated locking, lazy > prerequisites can easily be cached in C, whereas that is not so easy in > Unix shell (and hence it is not done). `make` seems good at driving stuff to cache. > > > Likewise, having a framework in C would make it a lot easier to improve > > > debugging, e.g. by making test scripts "resumable" (guarded by an > > > option, it could store a complete state, including a copy of the trash > > > directory, before executing commands, which would allow "going back in > > > time" and calling a failing command with a debugger, or with valgrind, or > > > just seeing whether the command would still fail, i.e. whether the test > > > case is flaky). > > > > Resumability sounds like a perfect job for GNU make. > > Umm. > > So let me give you an example of something I had to debug recently. A > `git stash apply` marked files outside the sparse checkout as deleted, > when they actually had staged changes during the `git stash` call. > > If this had been a regression test case, it would have looked like this: > > test_expect_success 'stash handles skip-worktree entries nicely' ' > test_commit A && > echo changed >A.t && > git add A.t && > git update-index --skip-worktree A.t && > rm A.t && > git stash && > > : this should not mark A.t as deleted && > git stash apply && > test -n "$(git ls-files A.t)" > ' > > Now, the problem would have occurred in the very last command: `A.t` > would have been missing from the index. > > In order to debug this, you would have had to put in "breakpoints" > (inserting `false &&` at strategic places), or prefix commands with > `debug` to start them in GDB, then re-run the test case. > > Lather, rinse and repeat, until you figured out that `git stash` was > failing to record `A.t` properly. > > Then dig into that, recreating the same worktree situation every time > you run `git stash`, until you find out that there is a call to `git > update-index -z --add --remove --stdin` that removes that file. > > Further investigation would show you that this command pretty much does > what it is told to, because it is fed `A.t` in its `stdin`. > > The next step would probably be to go back even one more step and see > that `diff_index()` reported this file as modified between the index and > `HEAD`. > > Slowly, you would form an understanding of what is going wrong, and you > would have to go back and forth between blaming `diff_index()` and > `update-index`, or the options `git stash` passes to them. > > You would have to recreate the worktree many, many times, in order to > dig in deep, and of course you would need to understand the intention > not only of the regression test, but also of the code it calls. > > In this instance, it is merely tedious, but possible. I know, because I > did it. For flaky tests, not so much. > > *That* is the scenario I tried to get at. > > Writing the test cases in `make` would not help that. Not one bit. It > would actually make things a lot worse. I agree debugging can be tedious, but I'm not sure how to get around that besides making test cases smaller, more isolated (which can also lead to missing coverage), and better documented. How would your ideal test framework approach it? Was using git tracing in the above scenarios not enough? I've never cared much about project-specific tracers; but instead prefer to rely on project-agnostic (but OS-specific) facilities such as truss/strace since I can reuse that knowledge across languages/projects. > > (that said, I don't know if you use make or something else to build > > gfw) > > We are talking about the test suite, yes? > > Not about building Git? Because it does not matter whether we use `make` > or not (we do by default, although we also have an option to build in > Visual Studio and/or via MSBuild). > > On Windows, just like on Linux, we use a Unix shell interpreter (Bash). > Sure, to run the entire test suite, we use `make` (sometimes in > conjunction with `prove`), but the tests themselves are written in Unix > shell script, so I have a hard time imagining a different method to run > them -- whether on Windows or not -- than to use a Unix shell > interpreter such as Bash. > > > > In many ways, our current test suite seems to test Git's > > > functionality as much as (core) contributors' abilities to implement > > > test cases in Unix shell script, _correctly_, and maybe also > > > contributors' patience. You could say that it tests for the wrong > > > thing at least half of the time, by design. > > > > Basic (not advanced) sh is already a prerequisite for using git. > > Well, if you are happy with that prerequisite to be set in stone, I am > not. Why should any Git user *need* to know sh? For chaining commands together, why not? I expect anybody who's learned C to also be able to figure out some basic scripting to tie a series of commands together in the same way you can poke around in Perl without being fluent. And there's plenty of overlap between C and sh when it comes to with control flow (if/else/while/&&/for). > > Writing correct code and tests in ANY language is still a > > challenge for me; but I'm least convinced a low-level language > > such as C is the right language for writing integration tests in. > > I would be delighted to go for a proper, well-maintained test framework > such as Jest. But of course, that would not be accepted in this project. > So I won't even think about it. > > > C is fine for unit tests, and maybe we can use more unit tests and > > less integration tests. > > We don't have integration tests. > > Unless your concept of what constitutes an "integration test" is very > different from mine. For me, an integration test would set up an > environment that involves multiple points of failure, e.g. setting up an > SSH server on Linux and accessing that via Git from Ubuntu. Or setting > up a web server with a self-signed certificate, import the public key > into the Windows Certificate Store and then accessing the server both > using OpenSSL and Secure Channel (the native Windows way to communicate > via TLS). I think of integration tests as anything which covers real-world use cases (e.g. running documented commands as a normal user would). Unit tests would be t/helper/*.c in my mind... But maybe my terminology is all wrong *shrug* > There is nothing even close to that in Git's test suite. > > > > It might look like a somewhat less important project, but given that we > > > exercise almost 150,000 test cases with every CI build, I think it does > > > make sense to grind our axe for a while, so to say. > > > > Something that would benefit both users and regular contributors > > is the use and adoption of more batch and eval-friendly interfaces. > > e.g. fast-import/export, cat-file --batch, for-each-ref --perl... > > Given how hard it is to deduce the intention behind such invocations, I > am rather doubtful that this would improve our test suite. > > > I haven't used hg since 2005, but I know "hg server" exists > > nowadays to get rid of a lot of startup overhead in Mercurial, > > and maybe git could steal that idea, too... > > I have no idea what `hg server` does. Care to enlighten me? Sorry, "hg serve" (no 'r'). It starts a long-lived server to expose the API over a pipe or socket: https://www.mercurial-scm.org/wiki/CommandServer Closest we have is fast-import and "cat-file --batch*" ^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: Git in Outreachy December 2019? 2019-09-24 0:55 ` Eric Wong 2019-09-26 12:45 ` Johannes Schindelin @ 2019-09-28 4:01 ` Junio C Hamano 1 sibling, 0 replies; 63+ messages in thread From: Junio C Hamano @ 2019-09-28 4:01 UTC (permalink / raw) To: Eric Wong Cc: Johannes Schindelin, Emily Shaffer, Jeff King, Jonathan Tan, git Eric Wong <e@80x24.org> writes: > C is fine for unit tests, and maybe we can use more unit tests > and less integration tests. Nicely put. I often find it somewhat disturbing that what some of the t/helper/ tests are trying to exercise is at too low a level that the distance from the real-world observable effect is too many hops detached. For unit tests (of an API, for example), that is exactly what we want. For a test of an entire command, it feels like scratching foot from outside while still wearing a shoe. > I hope to find time to hack some more batch/eval-friendly stuff > that can make scripting git more performant; but no idea on my > availability :< ^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: Git in Outreachy December 2019? 2019-09-13 20:51 ` Jeff King 2019-09-16 18:42 ` Emily Shaffer @ 2019-09-20 17:04 ` Jonathan Tan 2019-09-21 1:47 ` Emily Shaffer ` (2 more replies) 1 sibling, 3 replies; 63+ messages in thread From: Jonathan Tan @ 2019-09-20 17:04 UTC (permalink / raw) To: peff; +Cc: jonathantanmy, git > Prospective mentors need to sign up on that site, and should propose a > project they'd be willing to mentor. [snip] > I'm happy to discuss possible projects if anybody has an idea but isn't > sure how to develop it into a proposal. I'm new to Outreachy and programs like this, so does anyone have an opinion on my draft proposal below? It does not have any immediate user-facing benefit, but it does have a definite end point. Also let me know if an Outreachy proposal should have more detail, etc. Refactor "git index-pack" logic into library code Currently, whenever any Git code needs a pack to be indexed, it needs to spawn a new "git index-pack" process, passing command-line arguments and communicating with it using file descriptors (standard input and output), much like an end-user would if invoking "git index-pack" directly. Refactor the pack indexing logic into library code callable from other Git code, make "git index-pack" a thin wrapper around that library code, and (to demonstrate that the refactoring works) change fetch-pack.c to use the library code instead of spawning the "git index-pack" process. This allows the pack indexing code to communicate with its callers with the full power of C (structs, callbacks, etc.) instead of being restricted to command-line arguments and file descriptors. It also simplifies debugging in that there will no longer be 2 inter-communicating processes to deal with, only 1. ^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: Git in Outreachy December 2019? 2019-09-20 17:04 ` Jonathan Tan @ 2019-09-21 1:47 ` Emily Shaffer 2019-09-23 14:23 ` Christian Couder ` (2 more replies) 2019-09-23 11:49 ` Christian Couder 2019-09-23 19:15 ` Jeff King 2 siblings, 3 replies; 63+ messages in thread From: Emily Shaffer @ 2019-09-21 1:47 UTC (permalink / raw) To: Jonathan Tan; +Cc: peff, git On Fri, Sep 20, 2019 at 10:04:48AM -0700, Jonathan Tan wrote: > > Prospective mentors need to sign up on that site, and should propose a > > project they'd be willing to mentor. > > [snip] > > > I'm happy to discuss possible projects if anybody has an idea but isn't > > sure how to develop it into a proposal. > > I'm new to Outreachy and programs like this, so does anyone have an > opinion on my draft proposal below? It does not have any immediate > user-facing benefit, but it does have a definite end point. I'd appreciate similar opinion if anybody has it - and I'd also really feel more comfortable with a co-mentor. """ "Did You Mean..?" There are some situations where it's fairly clear what a user meant to do, even though they did not do that thing correctly. For example, if a user runs `git commit` with tracked, modified, unstaged files in their worktree, but no staged files at all, it's fairly likely that they simply forgot to add the files they wanted. In this case, the error message is slightly obtuse: $ git commit On branch master Changes not staged for commit: modified: foo.txt no changes added to commit Since we have an idea of what the user _meant_ to do, we can offer something more like: $ git commit On branch master Changes not staged for commit: modified: foo.txt Stage listed changes and continue? [Y/n] While the above case is a good starting place, other similar cases can be added afterwards if time permits. These helper prompts should be enabled/disabled via a config option so that people who are used to their current workflow won't be impacted. """ Thanks in advance for feedback. - Emily ^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: Git in Outreachy December 2019? 2019-09-21 1:47 ` Emily Shaffer @ 2019-09-23 14:23 ` Christian Couder 2019-09-23 19:40 ` Jeff King 2019-10-22 21:16 ` Emily Shaffer 2 siblings, 0 replies; 63+ messages in thread From: Christian Couder @ 2019-09-23 14:23 UTC (permalink / raw) To: Emily Shaffer; +Cc: Jonathan Tan, Jeff King, git On Mon, Sep 23, 2019 at 3:35 PM Emily Shaffer <emilyshaffer@google.com> wrote: > > On Fri, Sep 20, 2019 at 10:04:48AM -0700, Jonathan Tan wrote: > > > I'm new to Outreachy and programs like this, so does anyone have an > > opinion on my draft proposal below? It does not have any immediate > > user-facing benefit, but it does have a definite end point. > > I'd appreciate similar opinion if anybody has it - and I'd also really > feel more comfortable with a co-mentor. First as the deadline is tomorrow, I think it is important to submit projects to Outreachy even if they are not perfect and even if we would like a co-mentor (which is also my case by the way). We can hopefully improve the projects after the deadline or perhaps drop them if we cannot find enough co-mentors or if we don't agree with the goal or find it too difficult. > """ > "Did You Mean..?" > > There are some situations where it's fairly clear what a user meant to > do, even though they did not do that thing correctly. For example, if a > user runs `git commit` with tracked, modified, unstaged files in their > worktree, but no staged files at all, it's fairly likely that they > simply forgot to add the files they wanted. In this case, the error > message is slightly obtuse: > > $ git commit > On branch master > Changes not staged for commit: > modified: foo.txt > > no changes added to commit > > > Since we have an idea of what the user _meant_ to do, we can offer > something more like: > > $ git commit > On branch master > Changes not staged for commit: > modified: foo.txt > > Stage listed changes and continue? [Y/n] > > While the above case is a good starting place, other similar cases can > be added afterwards if time permits. These helper prompts should be > enabled/disabled via a config option so that people who are used to > their current workflow won't be impacted. > """ I agree that it might help. There could be significant discussion about what the UI should be though. For example maybe we could just promote `git commit -p` in the tutorials instead of doing the above. Or have a commit.patch config option if we haven't one already. Thanks, Christian. ^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: Git in Outreachy December 2019? 2019-09-21 1:47 ` Emily Shaffer 2019-09-23 14:23 ` Christian Couder @ 2019-09-23 19:40 ` Jeff King 2019-09-23 22:29 ` Philip Oakley 2019-10-22 21:16 ` Emily Shaffer 2 siblings, 1 reply; 63+ messages in thread From: Jeff King @ 2019-09-23 19:40 UTC (permalink / raw) To: Emily Shaffer; +Cc: Jonathan Tan, git On Fri, Sep 20, 2019 at 06:47:01PM -0700, Emily Shaffer wrote: > """ > "Did You Mean..?" > > There are some situations where it's fairly clear what a user meant to > do, even though they did not do that thing correctly. For example, if a > user runs `git commit` with tracked, modified, unstaged files in their > worktree, but no staged files at all, it's fairly likely that they > simply forgot to add the files they wanted. In this case, the error > message is slightly obtuse: > > $ git commit > On branch master > Changes not staged for commit: > modified: foo.txt > > no changes added to commit > > > Since we have an idea of what the user _meant_ to do, we can offer > something more like: > > $ git commit > On branch master > Changes not staged for commit: > modified: foo.txt > > Stage listed changes and continue? [Y/n] > > While the above case is a good starting place, other similar cases can > be added afterwards if time permits. These helper prompts should be > enabled/disabled via a config option so that people who are used to > their current workflow won't be impacted. > """ This is an interesting idea. At first I thought it might be too small for a project, but I think it could be expanded or contracted as much as the time allows by just looking for more "did you mean" spots. I have mixed feelings on making things interactive. For one, it gets awkward when Git commands are called as part of a script or other program (and a lot of programs like git-commit ride the line of plumbing and porcelain). I know this would kick in only when a config option is set, but I think that might things even _more_ confusing, as something that works for one user (without the config) would start behaving weirdly for another. I also think it might be an opportunity to educate. Instead of giving a yes/no prompt, we can actually recommend one (or more!) sets of commands to get the desired effect. I _thought_ we already did for this case by default (triggered by advice.statusHints, which is true by default). But it looks like those don't get printed for git-commit? -Peff ^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: Git in Outreachy December 2019? 2019-09-23 19:40 ` Jeff King @ 2019-09-23 22:29 ` Philip Oakley 0 siblings, 0 replies; 63+ messages in thread From: Philip Oakley @ 2019-09-23 22:29 UTC (permalink / raw) To: Jeff King, Emily Shaffer; +Cc: Jonathan Tan, git On 23/09/2019 20:40, Jeff King wrote: > On Fri, Sep 20, 2019 at 06:47:01PM -0700, Emily Shaffer wrote: > >> """ >> "Did You Mean..?" >> >> There are some situations where it's fairly clear what a user meant to >> do, even though they did not do that thing correctly. For example, if a >> user runs `git commit` with tracked, modified, unstaged files in their >> worktree, but no staged files at all, it's fairly likely that they >> simply forgot to add the files they wanted. In this case, the error >> message is slightly obtuse: >> >> $ git commit >> On branch master >> Changes not staged for commit: >> modified: foo.txt >> >> no changes added to commit >> >> >> Since we have an idea of what the user _meant_ to do, we can offer >> something more like: >> >> $ git commit >> On branch master >> Changes not staged for commit: >> modified: foo.txt >> >> Stage listed changes and continue? [Y/n] >> >> While the above case is a good starting place, other similar cases can >> be added afterwards if time permits. These helper prompts should be >> enabled/disabled via a config option so that people who are used to >> their current workflow won't be impacted. >> """ > This is an interesting idea. At first I thought it might be too small > for a project, but I think it could be expanded or contracted as much as > the time allows by just looking for more "did you mean" spots. > > I have mixed feelings on making things interactive. For one, it gets > awkward when Git commands are called as part of a script or other > program (and a lot of programs like git-commit ride the line of plumbing > and porcelain). I know this would kick in only when a config option is > set, but I think that might things even _more_ confusing, as something > that works for one user (without the config) would start behaving > weirdly for another. > > I also think it might be an opportunity to educate. Instead of giving a > yes/no prompt, we can actually recommend one (or more!) sets of commands > to get the desired effect. I _thought_ we already did for this case by > default (triggered by advice.statusHints, which is true by default). But > it looks like those don't get printed for git-commit? > > -Peff Also there is a lot of common problems and issues that can be mined from StackOverflow for similar "Did You Mean..?"user problems. Philip ^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: Git in Outreachy December 2019? 2019-09-21 1:47 ` Emily Shaffer 2019-09-23 14:23 ` Christian Couder 2019-09-23 19:40 ` Jeff King @ 2019-10-22 21:16 ` Emily Shaffer 2 siblings, 0 replies; 63+ messages in thread From: Emily Shaffer @ 2019-10-22 21:16 UTC (permalink / raw) To: Jonathan Tan; +Cc: peff, git On Fri, Sep 20, 2019 at 06:47:01PM -0700, Emily Shaffer wrote: > On Fri, Sep 20, 2019 at 10:04:48AM -0700, Jonathan Tan wrote: > > > Prospective mentors need to sign up on that site, and should propose a > > > project they'd be willing to mentor. > > > > [snip] > > > > > I'm happy to discuss possible projects if anybody has an idea but isn't > > > sure how to develop it into a proposal. > > > > I'm new to Outreachy and programs like this, so does anyone have an > > opinion on my draft proposal below? It does not have any immediate > > user-facing benefit, but it does have a definite end point. > > I'd appreciate similar opinion if anybody has it - and I'd also really > feel more comfortable with a co-mentor. I know early on in this thread about Outreachy projects some folks expressed interest in comentoring. Is anybody still interested in doing so? For context, I've been in contact with 3 applicants who have either sent their first patch already or are getting ready to (and have needed some involved discussion or help) plus another few applicants who have inquired and may or may not send patches in the future. I've also received quite a few mails outside of my timezone working hours (I usually am awake/working 18:00GMT-02:00GMT), which I feel badly about not being able to respond to in a timely fashion. If anybody wants to comentor I would be so excited to have the help :) - Emily ^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: Git in Outreachy December 2019? 2019-09-20 17:04 ` Jonathan Tan 2019-09-21 1:47 ` Emily Shaffer @ 2019-09-23 11:49 ` Christian Couder 2019-09-23 17:58 ` Jonathan Tan 2019-09-23 19:15 ` Jeff King 2 siblings, 1 reply; 63+ messages in thread From: Christian Couder @ 2019-09-23 11:49 UTC (permalink / raw) To: Jonathan Tan; +Cc: Jeff King, git On Mon, Sep 23, 2019 at 10:53 AM Jonathan Tan <jonathantanmy@google.com> wrote: > > > Prospective mentors need to sign up on that site, and should propose a > > project they'd be willing to mentor. Yeah, you are very welcome to sign up soon if you haven't already done so as I think the deadline is really soon. > > I'm happy to discuss possible projects if anybody has an idea but isn't > > sure how to develop it into a proposal. > > I'm new to Outreachy and programs like this, so does anyone have an > opinion on my draft proposal below? It does not have any immediate > user-facing benefit, but it does have a definite end point. No need for user-facing benefits. Refactoring or improving the code in other useful ways are very good subjects (as I already said in my reply to Emily and Dscho). > Also let me know if an Outreachy proposal should have more detail, etc. > > Refactor "git index-pack" logic into library code > > Currently, whenever any Git code needs a pack to be indexed, it > needs to spawn a new "git index-pack" process, passing command-line > arguments and communicating with it using file descriptors (standard > input and output), much like an end-user would if invoking "git > index-pack" directly. Refactor the pack indexing logic into library > code callable from other Git code, make "git index-pack" a thin > wrapper around that library code, and (to demonstrate that the > refactoring works) change fetch-pack.c to use the library code > instead of spawning the "git index-pack" process. > > This allows the pack indexing code to communicate with its callers > with the full power of C (structs, callbacks, etc.) instead of being > restricted to command-line arguments and file descriptors. It also > simplifies debugging in that there will no longer be 2 > inter-communicating processes to deal with, only 1. I think this is really great, both the idea and the description! No need for more details. Thanks, Christian. ^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: Git in Outreachy December 2019? 2019-09-23 11:49 ` Christian Couder @ 2019-09-23 17:58 ` Jonathan Tan 2019-09-23 19:27 ` Jeff King 0 siblings, 1 reply; 63+ messages in thread From: Jonathan Tan @ 2019-09-23 17:58 UTC (permalink / raw) To: christian.couder; +Cc: jonathantanmy, peff, git > No need for user-facing benefits. Refactoring or improving the code in > other useful ways are very good subjects (as I already said in my > reply to Emily and Dscho). Thanks! > I think this is really great, both the idea and the description! No > need for more details. Thanks! I've just submitted the project proposal - hopefully it will be approved soon. In any case, it seems that project information can be edited after submission. There was a "How can applicants make a contribution to your project?" question and a few questions about communication channels. I answered them as best I could but if anyone has already answered them, it would be great to just use the same answer everywhere. (I can't see all project information of other projects since I haven't filled out a "short initial application", but I don't think that applies to me.) ^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: Git in Outreachy December 2019? 2019-09-23 17:58 ` Jonathan Tan @ 2019-09-23 19:27 ` Jeff King 2019-09-23 20:48 ` Jonathan Tan 0 siblings, 1 reply; 63+ messages in thread From: Jeff King @ 2019-09-23 19:27 UTC (permalink / raw) To: Jonathan Tan; +Cc: christian.couder, git On Mon, Sep 23, 2019 at 10:58:05AM -0700, Jonathan Tan wrote: > > I think this is really great, both the idea and the description! No > > need for more details. > > Thanks! I've just submitted the project proposal - hopefully it will be > approved soon. In any case, it seems that project information can be > edited after submission. I approved this. I did leave some comments elsewhere in the thread, but I think we can continue to iterate on the idea. > There was a "How can applicants make a contribution to your project?" > question and a few questions about communication channels. I answered > them as best I could but if anyone has already answered them, it would > be great to just use the same answer everywhere. (I can't see all > project information of other projects since I haven't filled out a > "short initial application", but I don't think that applies to me.) What you wrote there looks pretty good to me. Copying it here for purposes of discussion: > Please introduce yourself on the public project chat: > > IRC - Follow this link to join this project's public chat. If you are > asked for username, pick any username! You will not need a password to > join this chat. > > Git mailing list - Follow this link to join this project's public > chat. where the IRC link goes to Freenode's webchat, and the mailing list link goes to https://git-scm.com/community. The other proposal is from Christian, who wrote: > Please introduce yourself on the public project chat: > > a mailing list - Once you join the project's communication > channel, the mentors have some additional instructions for you to > follow: > > Start the subject of your emails to the mailing list with "[Outreachy]" so that mentors and people interested in Outreachy can easily notice your emails. > Follow this link to join this project's public chat. > > Send an email to majordomo@vger.kernel.org with "subscribe git" in > the body of the email. I think the "[Outreachy]" advice is good, and worth adding to yours. I think linking to the "community" page is a good idea for Christian's, as it has more tips on using the mailing list. I'd leave it up to individual mentors whether they want to mention IRC (based on whether or not they actually use IRC themselves). -Peff ^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: Git in Outreachy December 2019? 2019-09-23 19:27 ` Jeff King @ 2019-09-23 20:48 ` Jonathan Tan 0 siblings, 0 replies; 63+ messages in thread From: Jonathan Tan @ 2019-09-23 20:48 UTC (permalink / raw) To: peff; +Cc: jonathantanmy, christian.couder, git > I approved this. I did leave some comments elsewhere in the thread, but > I think we can continue to iterate on the idea. Thanks. > > There was a "How can applicants make a contribution to your project?" > > question and a few questions about communication channels. I answered > > them as best I could but if anyone has already answered them, it would > > be great to just use the same answer everywhere. (I can't see all > > project information of other projects since I haven't filled out a > > "short initial application", but I don't think that applies to me.) > > What you wrote there looks pretty good to me. Copying it here for > purposes of discussion: > > > Please introduce yourself on the public project chat: > > > > IRC - Follow this link to join this project's public chat. If you are > > asked for username, pick any username! You will not need a password to > > join this chat. > > > > Git mailing list - Follow this link to join this project's public > > chat. > > where the IRC link goes to Freenode's webchat, and the mailing list link > goes to https://git-scm.com/community. Thanks. For the record, most of the text is from Outreachy - I just supplied the links. > The other proposal is from Christian, who wrote: > > > Please introduce yourself on the public project chat: > > > > a mailing list - Once you join the project's communication > > channel, the mentors have some additional instructions for you to > > follow: > > > > Start the subject of your emails to the mailing list with "[Outreachy]" so that mentors and people interested in Outreachy can easily notice your emails. > > Follow this link to join this project's public chat. > > > > Send an email to majordomo@vger.kernel.org with "subscribe git" in > > the body of the email. > > I think the "[Outreachy]" advice is good, and worth adding to yours. I > think linking to the "community" page is a good idea for Christian's, as > it has more tips on using the mailing list. Good idea on the "[Outreachy]" - I've added it to my project as well. ^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: Git in Outreachy December 2019? 2019-09-20 17:04 ` Jonathan Tan 2019-09-21 1:47 ` Emily Shaffer 2019-09-23 11:49 ` Christian Couder @ 2019-09-23 19:15 ` Jeff King 2019-09-23 20:38 ` Jonathan Tan 2 siblings, 1 reply; 63+ messages in thread From: Jeff King @ 2019-09-23 19:15 UTC (permalink / raw) To: Jonathan Tan; +Cc: git On Fri, Sep 20, 2019 at 10:04:48AM -0700, Jonathan Tan wrote: > > I'm happy to discuss possible projects if anybody has an idea but isn't > > sure how to develop it into a proposal. > > I'm new to Outreachy and programs like this, so does anyone have an > opinion on my draft proposal below? It does not have any immediate > user-facing benefit, but it does have a definite end point. > > Also let me know if an Outreachy proposal should have more detail, etc. > > Refactor "git index-pack" logic into library code > > Currently, whenever any Git code needs a pack to be indexed, it > needs to spawn a new "git index-pack" process, passing command-line > arguments and communicating with it using file descriptors (standard > input and output), much like an end-user would if invoking "git > index-pack" directly. Refactor the pack indexing logic into library > code callable from other Git code, make "git index-pack" a thin > wrapper around that library code, and (to demonstrate that the > refactoring works) change fetch-pack.c to use the library code > instead of spawning the "git index-pack" process. > > This allows the pack indexing code to communicate with its callers > with the full power of C (structs, callbacks, etc.) instead of being > restricted to command-line arguments and file descriptors. It also > simplifies debugging in that there will no longer be 2 > inter-communicating processes to deal with, only 1. I think this is an OK level of detail. I'm not sure quite sure about the goal of the project, though. In particular: - I'm not clear what we'd hope to gain. I.e., what richer information would we want to pass back and forth between index-pack and the other processes? It might also be more efficient, but I'm not sure it's measurably so (we save a single process, and we save some pipe traffic, but the sideband demuxer would probably end up passing it over a self-pipe anyway). - index-pack is prone to dying on bad input, and we wouldn't want it to take down the outer fetch-pack or receive-pack, which are what produce useful messages to the user. That's something that could be fixed as part of the libification, but I suspect the control flow might be a little tricky. - we don't always call index-pack, but sometimes call unpack-objects. I suppose we could continue to call an external unpack-objects in that path, but that eliminates the utility of having richer communication if we sometimes have to take the "dumb" path. A while ago I took a stab at teaching index-pack to unpack. It works, but there are a few ugly bits, as discussed in: https://github.com/peff/git/commit/7df82454a855281e9c147f3023225f8a6f72e303 Maybe that would be worth making part of the project? -Peff ^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: Git in Outreachy December 2019? 2019-09-23 19:15 ` Jeff King @ 2019-09-23 20:38 ` Jonathan Tan 2019-09-23 21:28 ` Jeff King 0 siblings, 1 reply; 63+ messages in thread From: Jonathan Tan @ 2019-09-23 20:38 UTC (permalink / raw) To: peff; +Cc: jonathantanmy, git > I think this is an OK level of detail. I'm not sure quite sure about the > goal of the project, though. In particular: > > - I'm not clear what we'd hope to gain. I.e., what richer information > would we want to pass back and forth between index-pack and the > other processes? It might also be more efficient, but I'm not sure > it's measurably so (we save a single process, and we save some pipe > traffic, but the sideband demuxer would probably end up passing it > over a self-pipe anyway). I didn't have any concrete ideas so I didn't include those, but some unrefined ideas: - index-pack has the CLI option to specify a message to be written into the .promisor file, but in my patch to write fetched refs to .promisor [1], I ended up making fetch-pack.c write the information because I didn't know how many refs were going to be written (and I didn't want to bump into CLI argument length limits). If we had this feature, I might have been able to pass a callback to index-pack that writes the list of refs once we have the fd into .promisor, eliminating some code duplication (but I haven't verified this). - In your reply [2] to the above [1], you mentioned the possibility of keeping a list of cutoff points. One way of doing this, as I state in [3], is my original suggestion back in 2017 of one such repository-wide list. If we do this, it would be better for fetch-pack to handle this instead of index-pack, and it seems more efficient to me to have index-pack be able to pass objects to fetch-pack as they are inflated instead of fetch-pack rereading the compressed forms on disk (but again, I haven't verified this). [1] https://public-inbox.org/git/20190826214737.164132-1-jonathantanmy@google.com/ [2] https://public-inbox.org/git/20190905070153.GE21450@sigill.intra.peff.net/ [3] https://public-inbox.org/git/20190905183926.137490-1-jonathantanmy@google.com/ There are also the debuggability improvements of not having to deal with 2 processes. > - index-pack is prone to dying on bad input, and we wouldn't want it > to take down the outer fetch-pack or receive-pack, which are what > produce useful messages to the user. That's something that could be > fixed as part of the libification, but I suspect the control flow > might be a little tricky. Good point. > - we don't always call index-pack, but sometimes call unpack-objects. > I suppose we could continue to call an external unpack-objects in > that path, but that eliminates the utility of having richer > communication if we sometimes have to take the "dumb" path. A while > ago I took a stab at teaching index-pack to unpack. It works, but > there are a few ugly bits, as discussed in: > > https://github.com/peff/git/commit/7df82454a855281e9c147f3023225f8a6f72e303 > > Maybe that would be worth making part of the project? I'm reluctant to do so because I don't want to increase the scope too much - although if my project has relatively narrow scope for an Outreachy project, we can do so. As for eliminating the utility of having richer communication, I don't think so, because in the situations where we require richer communication (right now, situations to do with partial clone), we specifically run index-pack anyway. ^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: Git in Outreachy December 2019? 2019-09-23 20:38 ` Jonathan Tan @ 2019-09-23 21:28 ` Jeff King 2019-09-24 17:07 ` Jonathan Tan 0 siblings, 1 reply; 63+ messages in thread From: Jeff King @ 2019-09-23 21:28 UTC (permalink / raw) To: Jonathan Tan; +Cc: git On Mon, Sep 23, 2019 at 01:38:54PM -0700, Jonathan Tan wrote: > I didn't have any concrete ideas so I didn't include those, but some > unrefined ideas: One risk to a mentoring project like this is that the intern does a good job of steps 1-5, and then in step 6 we realize that the whole thing is not useful, and upstream doesn't want it. Which isn't to say the intern didn't learn something, and the project didn't benefit. Negative results can be useful; but it can also be demoralizing. I'm not arguing that's going to be the case here. But I do think it's worth talking through these things a bit as part of thinking about proposals. > - index-pack has the CLI option to specify a message to be written into > the .promisor file, but in my patch to write fetched refs to > .promisor [1], I ended up making fetch-pack.c write the information > because I didn't know how many refs were going to be written (and I > didn't want to bump into CLI argument length limits). If we had this > feature, I might have been able to pass a callback to index-pack that > writes the list of refs once we have the fd into .promisor, > eliminating some code duplication (but I haven't verified this). That makes some sense. We could pass the data over a pipe, but obviously stdin is already in use to receive the pack here. Ideally we'd be able to pass multiple streams between the programs, but I think due to Windows support, we can't assume that arbitrary pipe descriptors will make it across the run-command boundary. So I think we'd be left with communicating via temporary files (which really isn't the worst thing in the world, but has its own complications). > - In your reply [2] to the above [1], you mentioned the possibility of > keeping a list of cutoff points. One way of doing this, as I state in > [3], is my original suggestion back in 2017 of one such > repository-wide list. If we do this, it would be better for > fetch-pack to handle this instead of index-pack, and it seems more > efficient to me to have index-pack be able to pass objects to > fetch-pack as they are inflated instead of fetch-pack rereading the > compressed forms on disk (but again, I haven't verified this). And this is the flip-side problem: we need to get data back, but we have only stdout, which is already in use (so we need some kind of protocol). That leads to things like the horrible NUL-byte added by 83558686ce (receive-pack: send keepalives during quiet periods, 2016-07-15). > There are also the debuggability improvements of not having to deal with > 2 processes. I think it can sometimes be easier to debug with two separate processes, because the input to index-pack is well-defined and can be repeated without hitting the network (though you do have to figure out how to record the network response, which can be non-trivial). I've also done similar things for running performance simulations. We'll still have the stand-alone index-pack command, so it can be used for those cases. But as we add more features that utilize the in-process interface, that may eventually stop being feasible. > > [dropping unpack-objects] > > Maybe that would be worth making part of the project? > > I'm reluctant to do so because I don't want to increase the scope too > much - although if my project has relatively narrow scope for an > Outreachy project, we can do so. As for eliminating the utility of > having richer communication, I don't think so, because in the situations > where we require richer communication (right now, situations to do with > partial clone), we specifically run index-pack anyway. Yeah, we're in kind of a weird situation there, where unpack-objects is used less and less. I wonder how many surprises are lurking where somebody reasoned about index-pack behavior, but unpack-objects may do something slightly differently (I know this came up when we looked at fsck-ing incoming objects for submodule vulnerabilities). I kind of wonder if it would be reasonable to just always use index-pack for the sake of simplicity, even if it never learns to actually unpack objects. We've been doing that for years on the server side at GitHub without ill effects (I think the unpack route is slightly more efficient for a thin pack, but since it only kicks in when there are few objects anyway, I wonder how big an advantage it is in general). -Peff ^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: Git in Outreachy December 2019? 2019-09-23 21:28 ` Jeff King @ 2019-09-24 17:07 ` Jonathan Tan 2019-09-26 7:09 ` Jeff King 0 siblings, 1 reply; 63+ messages in thread From: Jonathan Tan @ 2019-09-24 17:07 UTC (permalink / raw) To: peff; +Cc: jonathantanmy, git > On Mon, Sep 23, 2019 at 01:38:54PM -0700, Jonathan Tan wrote: > > > I didn't have any concrete ideas so I didn't include those, but some > > unrefined ideas: > > One risk to a mentoring project like this is that the intern does a good > job of steps 1-5, and then in step 6 we realize that the whole thing is > not useful, and upstream doesn't want it. Which isn't to say the intern > didn't learn something, and the project didn't benefit. Negative results > can be useful; but it can also be demoralizing. That's true. I think that libification is in itself a useful and non-controversial goal. > I'm not arguing that's going to be the case here. But I do think it's > worth talking through these things a bit as part of thinking about > proposals. [snip] > > - index-pack has the CLI option to specify a message to be written into > > the .promisor file, but in my patch to write fetched refs to > > .promisor [1], I ended up making fetch-pack.c write the information > > because I didn't know how many refs were going to be written (and I > > didn't want to bump into CLI argument length limits). If we had this > > feature, I might have been able to pass a callback to index-pack that > > writes the list of refs once we have the fd into .promisor, > > eliminating some code duplication (but I haven't verified this). > > That makes some sense. We could pass the data over a pipe, but obviously > stdin is already in use to receive the pack here. Ideally we'd be able > to pass multiple streams between the programs, but I think due to > Windows support, we can't assume that arbitrary pipe descriptors will > make it across the run-command boundary. So I think we'd be left with > communicating via temporary files (which really isn't the worst thing in > the world, but has its own complications). > > > - In your reply [2] to the above [1], you mentioned the possibility of > > keeping a list of cutoff points. One way of doing this, as I state in > > [3], is my original suggestion back in 2017 of one such > > repository-wide list. If we do this, it would be better for > > fetch-pack to handle this instead of index-pack, and it seems more > > efficient to me to have index-pack be able to pass objects to > > fetch-pack as they are inflated instead of fetch-pack rereading the > > compressed forms on disk (but again, I haven't verified this). > > And this is the flip-side problem: we need to get data back, but we have > only stdout, which is already in use (so we need some kind of protocol). > That leads to things like the horrible NUL-byte added by 83558686ce > (receive-pack: send keepalives during quiet periods, 2016-07-15). Sounds good. With this, do you think that there is enough likelihood of acceptance that we can move ahead with my proposed project? Besides discussing the likelihood of patches being accepted/rejected, should we record the result of discussion somewhere (or, if only the mentor should give their ideas, for me to write in more detail)? I don't recall a place in the Outreachy form to write this, so I just mentioned the benefits in outline, but maybe I can just include it somewhere anyway. > > There are also the debuggability improvements of not having to deal with > > 2 processes. > > I think it can sometimes be easier to debug with two separate processes, > because the input to index-pack is well-defined and can be repeated > without hitting the network (though you do have to figure out how to > record the network response, which can be non-trivial). I've also done > similar things for running performance simulations. Hmm...that's true, but I think this is a matter of degree. The input to a lib function for index-pack can be similarly simple and well-defined (a C interface that we can exercise using a throwaway patch to test-tool, for example), but I agree that it usually won't be as simple as input to CLI (but this is because of limitations that the CLI imposes). > > > [dropping unpack-objects] > > > Maybe that would be worth making part of the project? > > > > I'm reluctant to do so because I don't want to increase the scope too > > much - although if my project has relatively narrow scope for an > > Outreachy project, we can do so. As for eliminating the utility of > > having richer communication, I don't think so, because in the situations > > where we require richer communication (right now, situations to do with > > partial clone), we specifically run index-pack anyway. > > Yeah, we're in kind of a weird situation there, where unpack-objects is > used less and less. I wonder how many surprises are lurking where > somebody reasoned about index-pack behavior, but unpack-objects may do > something slightly differently (I know this came up when we looked at > fsck-ing incoming objects for submodule vulnerabilities). > > I kind of wonder if it would be reasonable to just always use index-pack > for the sake of simplicity, even if it never learns to actually unpack > objects. We've been doing that for years on the server side at GitHub > without ill effects (I think the unpack route is slightly more efficient > for a thin pack, but since it only kicks in when there are few objects > anyway, I wonder how big an advantage it is in general). This sounds reasonable to me. ^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: Git in Outreachy December 2019? 2019-09-24 17:07 ` Jonathan Tan @ 2019-09-26 7:09 ` Jeff King 0 siblings, 0 replies; 63+ messages in thread From: Jeff King @ 2019-09-26 7:09 UTC (permalink / raw) To: Jonathan Tan; +Cc: git On Tue, Sep 24, 2019 at 10:07:46AM -0700, Jonathan Tan wrote: > > > - In your reply [2] to the above [1], you mentioned the possibility of > > > keeping a list of cutoff points. One way of doing this, as I state in > > > [3], is my original suggestion back in 2017 of one such > > > repository-wide list. If we do this, it would be better for > > > fetch-pack to handle this instead of index-pack, and it seems more > > > efficient to me to have index-pack be able to pass objects to > > > fetch-pack as they are inflated instead of fetch-pack rereading the > > > compressed forms on disk (but again, I haven't verified this). > > > > And this is the flip-side problem: we need to get data back, but we have > > only stdout, which is already in use (so we need some kind of protocol). > > That leads to things like the horrible NUL-byte added by 83558686ce > > (receive-pack: send keepalives during quiet periods, 2016-07-15). > > Sounds good. With this, do you think that there is enough likelihood of > acceptance that we can move ahead with my proposed project? > > Besides discussing the likelihood of patches being accepted/rejected, > should we record the result of discussion somewhere (or, if only the > mentor should give their ideas, for me to write in more detail)? I don't > recall a place in the Outreachy form to write this, so I just mentioned > the benefits in outline, but maybe I can just include it somewhere > anyway. Yeah, I think it's OK to go ahead. I think an intern who is interested in the project would get in touch with you either directly, or via the list. So that gives some opportunity to discuss the ideas with them, and to go into more detail on the proposal in an interactive way (it would also be fine to point at this thread, too, of course). -Peff ^ permalink raw reply [flat|nested] 63+ messages in thread
end of thread, other threads:[~2019-10-22 21:16 UTC | newest] Thread overview: 63+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2019-08-27 5:17 Git in Outreachy December 2019? Jeff King 2019-08-31 7:58 ` Christian Couder 2019-08-31 19:44 ` Olga Telezhnaya 2019-09-04 19:41 ` Jeff King 2019-09-05 7:24 ` Christian Couder 2019-09-05 19:39 ` Emily Shaffer 2019-09-06 11:55 ` Carlo Arenas 2019-09-07 6:39 ` Jeff King 2019-09-07 10:13 ` Carlo Arenas 2019-09-07 6:36 ` Jeff King 2019-09-08 14:56 ` Pratyush Yadav 2019-09-09 17:00 ` Jeff King 2019-09-23 18:07 ` SZEDER Gábor 2019-09-26 9:47 ` SZEDER Gábor 2019-09-26 19:32 ` Johannes Schindelin 2019-09-26 21:54 ` SZEDER Gábor 2019-09-26 11:42 ` Johannes Schindelin 2019-09-13 20:03 ` Jonathan Tan 2019-09-13 20:51 ` Jeff King 2019-09-16 18:42 ` Emily Shaffer 2019-09-16 21:33 ` Eric Wong 2019-09-16 21:44 ` SZEDER Gábor 2019-09-16 23:13 ` Jonathan Nieder 2019-09-17 0:59 ` Jeff King 2019-09-17 11:23 ` Johannes Schindelin 2019-09-17 12:02 ` SZEDER Gábor 2019-09-23 12:47 ` Johannes Schindelin 2019-09-23 16:58 ` SZEDER Gábor 2019-09-26 11:04 ` Johannes Schindelin 2019-09-26 13:28 ` SZEDER Gábor 2019-09-26 19:39 ` Johannes Schindelin 2019-09-26 21:44 ` SZEDER Gábor 2019-09-27 22:18 ` Jeff King 2019-10-09 17:25 ` SZEDER Gábor 2019-10-11 6:34 ` Jeff King 2019-09-23 18:19 ` Jeff King 2019-09-24 14:30 ` Johannes Schindelin 2019-09-17 15:10 ` Christian Couder 2019-09-23 12:50 ` Johannes Schindelin 2019-09-23 19:30 ` Jeff King 2019-09-23 18:07 ` Jeff King 2019-09-24 14:25 ` Johannes Schindelin 2019-09-24 15:33 ` Jeff King 2019-09-28 3:56 ` Junio C Hamano 2019-09-24 0:55 ` Eric Wong 2019-09-26 12:45 ` Johannes Schindelin 2019-09-30 8:55 ` Eric Wong 2019-09-28 4:01 ` Junio C Hamano 2019-09-20 17:04 ` Jonathan Tan 2019-09-21 1:47 ` Emily Shaffer 2019-09-23 14:23 ` Christian Couder 2019-09-23 19:40 ` Jeff King 2019-09-23 22:29 ` Philip Oakley 2019-10-22 21:16 ` Emily Shaffer 2019-09-23 11:49 ` Christian Couder 2019-09-23 17:58 ` Jonathan Tan 2019-09-23 19:27 ` Jeff King 2019-09-23 20:48 ` Jonathan Tan 2019-09-23 19:15 ` Jeff King 2019-09-23 20:38 ` Jonathan Tan 2019-09-23 21:28 ` Jeff King 2019-09-24 17:07 ` Jonathan Tan 2019-09-26 7:09 ` Jeff King
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).