All of lore.kernel.org
 help / color / mirror / Atom feed
* [GSoC][Draft Proposal] Finish converting git submodule to builtin
@ 2021-04-03 14:08 Atharva Raykar
  2021-04-05 16:02 ` Christian Couder
  2021-04-08 10:19 ` [GSoC][Draft Proposal v2] " Atharva Raykar
  0 siblings, 2 replies; 11+ messages in thread
From: Atharva Raykar @ 2021-04-03 14:08 UTC (permalink / raw)
  To: git; +Cc: christian.couder, shouryashukla.oo, periperidip

Hi all,

Below is my draft of my GSoC proposal. I have noticed that Chinmoy has already
submitted a proposal for the same idea before me, so would that be considered
"taken"? (I don't think I can submit another proposal for the other idea either,
because someone has already sent one for that as well)

Since I have already put my effort into this for a while, I thought I might as
well send it, but I'll accept whatever the mentors say about the eligibility of
this proposal.

Here is a prettier markdown version:
https://gist.github.com/tfidfwastaken/0c6ca9ef2a452f110a416351541e0f19


--8<-----8<-----8<-----8<-----8<-----8<-----8<-----8<-----8<-----8<-----8<--

                          ___________________

                           GSOC GIT PROPOSAL

                             Atharva Raykar
                          ___________________


Table of Contents
_________________

1. Personal Details
2. Background
3. Me and Git
.. 1. Current knowledge of git
4. The Project: Finish converting `git submodule' to builtin
5. Prior work
6. General implementation strategy
7. Timeline (using the format dd/mm)
8. Beyond GSoC
9. Blogging
10. Final Remarks: A little more about me


1 Personal Details
==================

  Name        : Atharva Raykar
  Major       : Computer Science and Engineering
  Email       : raykar.ath@gmail.com
  IRC nick    : atharvaraykar on #git and #git-devel
  Address     : RB 103, Purva Riviera, Marathahalli, Bangalore
  Postal Code : 560037
  Time Zone   : IST (UTC+5:30)
  GitHub      : http://github.com/tfidfwastaken


2 Background
============

  I am Atharva Raykar, currently in my third year of studying Computer
  Science and Engineering at PES University, Bangalore. I have always
  enjoyed programming since a young age, but my deep appreciation for
  good program design and creating the right abstractions came during my
  exploration of the various rabbitholes of knowledge originating from
  communities around the internet. I have personally enjoyed learning
  about Functional Programming, Database Architecture and Operating
  Systems, and my interests keep expanding as I explore more in this
  field.

  I owe my appreciation of this rich field to these communities, and I
  always wanted to give back. With that goal, I restarted the [PES Open
  Source] community in our campus, with the goal of creating spaces
  where members could share knowledge, much in the same spirit as the
  communities that kickstarted my journey in Computer Science. I learnt
  a lot about collaborating in the open, maintainership, and reviewing
  code. While I have made many small contributions to projects in the
  past, I am hoping GSoC will help me make the leap to a larger and more
  substantial contribution to one of my favourite projects that made it
  all possible in my journey with Open Source.


[PES Open Source] <https://pesos.github.io>


3 Me and Git
============

  Here are the various forms of contributions that I have made to Git:

  - [Microproject] userdiff: userdiff: add support for Scheme Status: In
    progress, patch v2 pending List:
    <https://public-inbox.org/git/20210327173938.59391-1-raykar.ath@gmail.com/>

  - [Git Education] Conducted a workshop with attendance of hundreds of
    students new to git, and increased the prevalence of of git's usage
    in my campus.
    Photos: <https://photos.app.goo.gl/T7CPk1zkHdK7mx6v7> and
    <https://photos.app.goo.gl/bzTgdHMttxDen6z9A>

  I intend to continue helping people out on the mailing list and IRC
  and tending to patches wherever possible in the meantime.


3.1 Current knowledge of git
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

  I use git almost daily in some form, and I am fairly comfortable with
  it. I have already read and understood the chapters from the Git
  Book about submodules along with the one on objects, references,
  packfiles and the refspec.


4 The Project: Finish converting `git submodule' to builtin
===========================================================

  Git has historically had many components implemented in the form of
  shell scripts. This was less than ideal for several reasons:
  - Portability: Non-POSIX systems like Windows don't play nice with
    shell script commands like grep, cd and printf, to name a few, and
    these commands have to be reimplemented for the system. There are
    also POSIX to Windows path conversion issues.
  - No direct access to plumbing: Shell commands do not have direct
    access to the low level git API, and a separate shell is spawned to
    just to carry out their operations.
  - Performance: Shell scripts tend to create a lot of child processes
    which slows down the functioning of these commands, especially with
    large repositories.
  Over the years, many GSoC students have converted the shell versions
  of these commands to C. Git `submodule' is the last of these to be
  converted.


5 Prior work
============

  I will be taking advantage of the knowledge that was gained in the
  process of the converting the previous scripts and avoiding all the
  gotchas that may be present in the process. There may be a bunch of
  useful helper functions in the previous patches that can be reused as
  well (more investigation needed to determine what exactly is
  reusable).

  Currently the only other commands left to be completed for `submodule'
  are `add' and `update'. Work for `add' has already been started by a
  previous GSoCer, Shourya Shukla, and needs to picked up from there.

  Reference:
  <https://github.com/gitgitgadget/git/issues/541#issuecomment-769245064>

  I'll have these as my references when I am working on the project:
  His blog about his progress:
  <https://shouryashukla.blogspot.com/2020/08/the-final-report.html>
  (more has been implemented since)
  Shourya's latest patch for `submodule add':
  <https://lore.kernel.org/git/20201007074538.25891-1-shouryashukla.oo@gmail.com/>

  For the most part, the implementation looks fairly complete, but there
  seems to be a segfault occurring, along with a few changes suggested
  by the reviewers. It will be helpful to contact Shourya to fully
  understand what needs to be done.

  Prathamesh's previous conversion work:
  <https://lore.kernel.org/git/20170724203454.13947-1-pc44800@gmail.com/#t>


6 General implementation strategy
=================================

  The way to port the shell to C code for `submodule' will largely
  remain the same. There already exists the builtin
  `submodule--helper.c' which contains most of the previous commands'
  ports. All that the shell script for `git-submodule.sh' is doing for
  the previously completed ports is parsing the flags and then calling
  the helper, which does all the business logic.

  So I will be moving out all the business logic that the shell script
  is performing to `submodule--helper.c'. Any reusable functionality
  that is introduced during the port will be added to `submodule.c' in
  the top level.

  For example: The general strategy for converting `cmd_update()' would
  be to have a call to `submodule--helper' in the shell script to a
  function which would resemble something like `module_update()' which
  would perform the work being done by the shell script past the flags
  being parsed and make the necessary calls to `update_clone()', and the
  git interface in C for performing the merging, checkout and rebase
  where necessary.

  After this process, the builtin is added to the commands array in
  `submodule--helper.c'. And since these two functions are the last bit
  of functionality left to convert in submodules, an extended goal can
  be to get rid of the shell script altogether, and make the helper into
  the actual builtin [1].

  [1]
  <https://lore.kernel.org/git/nycvar.QRO.7.76.6.2011191327320.56@tvgsbejvaqbjf.bet/>


7 Timeline (using the format dd/mm)
===================================

  Periods of limited availability (read: hectic chaos):
  - From 13/04 to 20/04 I will be having project evaluations and lab
    assessments for five of my courses.
  - From 20/04 to 01/05 I have my in-semester exams.
  - For a period of two weeks in the range of 08/05 to 29/05 I will be
    having my end-semester exams.
  My commitment: I will still have time during my finals to help people
  out on the mailing list, get acquainted with the community and its
  processes, and even review patches if I can. This is because we get
  holidays between each exam, and my grades are good enough to that I
  can prioritise git over my studies ;-)

  And on the safe side, I will still engage with the community from now
  till 07/06 so that the community bonding period is not compromised in
  any way.

  Periods of abundant availability: After 29/05 all the way to the first
  week of August, I will be having my summer break, so I can dedicate
  myself to git full-time :-)

  I would have also finished all my core courses, so even after that, I
  will have enough of time to give back to git past my GSoC period.

  Phase 1: 07/06 to 14/06 -- Investigate and devise a strategy to port
  the submodule functions
  - This phase will be more diagrams in my notebook than code in my
    editor -- I will go through all the methods used to port the other
    submodule functions and see how to do the same for what is left.
  - I will find the C equivalents of all the shell invocations in
    `git-submodule.sh', and see what invocations have /no/ equivalent
    and need to be created as helpers in C (Eg: What is the equivalent
    to the `ensure-core-worktree' invocation in C?). For all the helpers
    and new functionality that I do introduce, I will need to create the
    testing strategy for the same.
  - I will go through all the work done by Shourya in his patch, and try
    to understand it properly. I will also see the mistakes that were
    caught in all the reviews for previous submodule conversion patches
    and try to learn from them before I jump to the code.
  - Deliverable: I will create a checklist for all the work that needs
    to be done with as much detail as I can with the help of inputs from
    my mentor and all the knowledge I have gained in the process.

  Phase 2: 14/06 to 28/06 -- Convert `add' to builtin in C
  - I will work on completing `git submodule add'. One strategy would be
    to either reimplement the whole thing using what was learnt in
    Shourya's attempt, but it is probably wiser to just take his patch
    and modify it. I would know what to do by the time I reach this
    phase.
  - I will also add tests for this functionality. I will also document
    my changes when required. These would be unit tests for the helpers
    introduced, and integration of `add' with the other commands.
  - Deliverable: Completely port `add' to C!

  Phase 3: 28/06 to 16/08 -- Convert `update' to builtin
  - Some work has already been done by Stephan Beller that moves the
    functionality of `update' to `submodule--helper.c':
    <https://github.com/git/git/commit/48308681b072a1d32e1361c255347324a8ad151e>,
    but a lot of the business logic of going into the submodule and
    checking out or merging or rebasing needs to still be converted.
    Plenty to do here.
  - As with `add', all of the appropriate tests need to be written and
    the changes documented. As I have learnt from the Pro Git Book,
    there are a lot of subtleties with how update does its work that I
    need to watch out for.
  - Deliverable: Completely port `update' to C!

  Bonus Phase: If I am ahead of time -- Remove the need for a
  `submodule--helper', and make it a proper C builtin.
  - Once all the submodule functionality is ported, the shell script is
    not really doing much more than parsing the arguments and passing it
    to the helper. We won't need this anymore if it is implemented.


8 Beyond GSoC
=============

  I love the process of working as a community more than anything else,
  and I already felt very welcomed by the git community the moment I
  started sending in my microproject patch series. Whether I am selected
  or not, I will continue giving back to git wherever I can. Since my
  final year is light on coursework, I will be able to mentor people and
  help expand the git developer community through all the ways I can (be
  it code review, helping people find the right resources or evangelism
  of git).


9 Blogging
==========

  I will be blogging about my progress on a weekly basis and either post
  it on my website at <https://atharvaraykar.me> (probably will tuck it
  away in a /gsoc path). Technical blogging is not particularly new to
  me, and I hope my posts can help future contributors of git.


10 Final Remarks: A little more about me
========================================

  These are some of my core values that I believe will be important to
  pull off this project and make the most of my time in GSoC:
  - Hard problems don't frustrate me, rather they excite me. Bugs make
    my brain perk up (this sentence best left with context). I love
    learning.
  - I am pro-transparency. If I am having some trouble, I will be open
    about it. I don't hesitate to ask questions and dig deep if I need
    to.
  - At the same time, when I ask a question, I only do so after I have
    struggled with the problem for enough time and done my due diligence
    in trying to solve it. Clear communication is very important to make
    this work.
  - I am also very comfortable with learning things all on my own (I
    have barely known any other way), and working in a remote,
    asynchronous setting.
  I hope to make the world better in my own small way by contributing to
  a tool that everyone uses and I like. It's more rewarding than any
  internship that my peers are doing this year. I look forward to
  learning more.


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [GSoC][Draft Proposal] Finish converting git submodule to builtin
  2021-04-03 14:08 [GSoC][Draft Proposal] Finish converting git submodule to builtin Atharva Raykar
@ 2021-04-05 16:02 ` Christian Couder
  2021-04-08 10:19 ` [GSoC][Draft Proposal v2] " Atharva Raykar
  1 sibling, 0 replies; 11+ messages in thread
From: Christian Couder @ 2021-04-05 16:02 UTC (permalink / raw)
  To: Atharva Raykar; +Cc: git, Shourya Shukla, Shourya Shukla

Hi,

On Sat, Apr 3, 2021 at 4:08 PM Atharva Raykar <raykar.ath@gmail.com> wrote:
>
> Hi all,
>
> Below is my draft of my GSoC proposal. I have noticed that Chinmoy has already
> submitted a proposal for the same idea before me, so would that be considered
> "taken"? (I don't think I can submit another proposal for the other idea either,
> because someone has already sent one for that as well)

Unfortunately, it looks like we will mentor only 2 students on the 2
projects listed on https://git.github.io/SoC-2021-Ideas/, so we might
have to make tough choices.

> Since I have already put my effort into this for a while, I thought I might as
> well send it, but I'll accept whatever the mentors say about the eligibility of
> this proposal.

Thanks for sending it anyway!

> Here is a prettier markdown version:
> https://gist.github.com/tfidfwastaken/0c6ca9ef2a452f110a416351541e0f19
>
>
> --8<-----8<-----8<-----8<-----8<-----8<-----8<-----8<-----8<-----8<-----8<--
>
>                           ___________________
>
>                            GSOC GIT PROPOSAL
>
>                              Atharva Raykar
>                           ___________________
>
>
> Table of Contents
> _________________
>
> 1. Personal Details
> 2. Background
> 3. Me and Git
> .. 1. Current knowledge of git
> 4. The Project: Finish converting `git submodule' to builtin
> 5. Prior work
> 6. General implementation strategy
> 7. Timeline (using the format dd/mm)
> 8. Beyond GSoC
> 9. Blogging
> 10. Final Remarks: A little more about me
>
>
> 1 Personal Details
> ==================
>
>   Name        : Atharva Raykar
>   Major       : Computer Science and Engineering
>   Email       : raykar.ath@gmail.com
>   IRC nick    : atharvaraykar on #git and #git-devel
>   Address     : RB 103, Purva Riviera, Marathahalli, Bangalore
>   Postal Code : 560037
>   Time Zone   : IST (UTC+5:30)
>   GitHub      : http://github.com/tfidfwastaken
>
>
> 2 Background
> ============
>
>   I am Atharva Raykar, currently in my third year of studying Computer
>   Science and Engineering at PES University, Bangalore. I have always
>   enjoyed programming since a young age, but my deep appreciation for
>   good program design and creating the right abstractions came during my
>   exploration of the various rabbitholes of knowledge originating from
>   communities around the internet. I have personally enjoyed learning
>   about Functional Programming, Database Architecture and Operating
>   Systems, and my interests keep expanding as I explore more in this
>   field.
>
>   I owe my appreciation of this rich field to these communities, and I
>   always wanted to give back. With that goal, I restarted the [PES Open
>   Source] community in our campus, with the goal of creating spaces
>   where members could share knowledge, much in the same spirit as the
>   communities that kickstarted my journey in Computer Science. I learnt
>   a lot about collaborating in the open, maintainership, and reviewing
>   code. While I have made many small contributions to projects in the
>   past, I am hoping GSoC will help me make the leap to a larger and more
>   substantial contribution to one of my favourite projects that made it
>   all possible in my journey with Open Source.
>
>
> [PES Open Source] <https://pesos.github.io>
>
>
> 3 Me and Git
> ============
>
>   Here are the various forms of contributions that I have made to Git:
>
>   - [Microproject] userdiff: userdiff: add support for Scheme Status: In
>     progress, patch v2 pending List:
>     <https://public-inbox.org/git/20210327173938.59391-1-raykar.ath@gmail.com/>
>
>   - [Git Education] Conducted a workshop with attendance of hundreds of
>     students new to git, and increased the prevalence of of git's usage
>     in my campus.
>     Photos: <https://photos.app.goo.gl/T7CPk1zkHdK7mx6v7> and
>     <https://photos.app.goo.gl/bzTgdHMttxDen6z9A>
>
>   I intend to continue helping people out on the mailing list and IRC
>   and tending to patches wherever possible in the meantime.

Nice!

> 3.1 Current knowledge of git

s/git/Git/

> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
>   I use git almost daily in some form, and I am fairly comfortable with
>   it. I have already read and understood the chapters from the Git
>   Book about submodules along with the one on objects, references,
>   packfiles and the refspec.
>
>
> 4 The Project: Finish converting `git submodule' to builtin
> ===========================================================
>
>   Git has historically had many components implemented in the form of
>   shell scripts. This was less than ideal for several reasons:
>   - Portability: Non-POSIX systems like Windows don't play nice with
>     shell script commands like grep, cd and printf, to name a few, and
>     these commands have to be reimplemented for the system. There are
>     also POSIX to Windows path conversion issues.
>   - No direct access to plumbing: Shell commands do not have direct
>     access to the low level git API, and a separate shell is spawned to
>     just to carry out their operations.
>   - Performance: Shell scripts tend to create a lot of child processes
>     which slows down the functioning of these commands, especially with
>     large repositories.
>   Over the years, many GSoC students have converted the shell versions
>   of these commands to C. Git `submodule' is the last of these to be
>   converted.
>
>
> 5 Prior work
> ============
>
>   I will be taking advantage of the knowledge that was gained in the
>   process of the converting the previous scripts and avoiding all the
>   gotchas that may be present in the process. There may be a bunch of
>   useful helper functions in the previous patches that can be reused as
>   well (more investigation needed to determine what exactly is
>   reusable).
>
>   Currently the only other commands left to be completed for `submodule'
>   are `add' and `update'. Work for `add' has already been started by a
>   previous GSoCer, Shourya Shukla, and needs to picked up from there.

Yeah, 'update' uses  ̀git submodule--helper update-clone`, `git
submodule--helper update-module-mode` and other `git
submodule--helper` sub-commands, but is not fully ported.

>   Reference:
>   <https://github.com/gitgitgadget/git/issues/541#issuecomment-769245064>
>
>   I'll have these as my references when I am working on the project:
>   His blog about his progress:
>   <https://shouryashukla.blogspot.com/2020/08/the-final-report.html>
>   (more has been implemented since)
>   Shourya's latest patch for `submodule add':
>   <https://lore.kernel.org/git/20201007074538.25891-1-shouryashukla.oo@gmail.com/>
>
>   For the most part, the implementation looks fairly complete, but there
>   seems to be a segfault occurring, along with a few changes suggested
>   by the reviewers. It will be helpful to contact Shourya to fully
>   understand what needs to be done.
>
>   Prathamesh's previous conversion work:
>   <https://lore.kernel.org/git/20170724203454.13947-1-pc44800@gmail.com/#t>

It would be nice if, after finishing 'add' and 'update', you could
also completely get rid of git-submodule.sh and instead use `git
submodule-helper` as `git submodule`.

> 6 General implementation strategy
> =================================
>
>   The way to port the shell to C code for `submodule' will largely
>   remain the same. There already exists the builtin
>   `submodule--helper.c' which contains most of the previous commands'
>   ports. All that the shell script for `git-submodule.sh' is doing for
>   the previously completed ports is parsing the flags and then calling
>   the helper, which does all the business logic.
>
>   So I will be moving out all the business logic that the shell script
>   is performing to `submodule--helper.c'. Any reusable functionality
>   that is introduced during the port will be added to `submodule.c' in
>   the top level.

Ok.

>   For example: The general strategy for converting `cmd_update()' would
>   be to have a call to `submodule--helper' in the shell script to a
>   function which would resemble something like `module_update()' which
>   would perform the work being done by the shell script past the flags
>   being parsed and make the necessary calls to `update_clone()', and the
>   git interface in C for performing the merging, checkout and rebase
>   where necessary.

It would be nice if you could go into more details about what
`module_update()' would look like. Do you see steps that you could
take to not have to do everything related to `module_update()' in only
one patch?

>   After this process, the builtin is added to the commands array in
>   `submodule--helper.c'. And since these two functions are the last bit

It's not very clear here that by "these two functions" you reference
the 'add' and 'update' sub-commands.

>   of functionality left to convert in submodules, an extended goal can
>   be to get rid of the shell script altogether, and make the helper into
>   the actual builtin [1].

Nice that you are talking about this!

>   [1]
>   <https://lore.kernel.org/git/nycvar.QRO.7.76.6.2011191327320.56@tvgsbejvaqbjf.bet/>
>
>
> 7 Timeline (using the format dd/mm)
> ===================================
>
>   Periods of limited availability (read: hectic chaos):
>   - From 13/04 to 20/04 I will be having project evaluations and lab
>     assessments for five of my courses.
>   - From 20/04 to 01/05 I have my in-semester exams.
>   - For a period of two weeks in the range of 08/05 to 29/05 I will be
>     having my end-semester exams.
>   My commitment: I will still have time during my finals to help people
>   out on the mailing list, get acquainted with the community and its
>   processes, and even review patches if I can. This is because we get
>   holidays between each exam, and my grades are good enough to that I
>   can prioritise git over my studies ;-)

s/git/Git/

>   And on the safe side, I will still engage with the community from now
>   till 07/06 so that the community bonding period is not compromised in
>   any way.
>
>   Periods of abundant availability: After 29/05 all the way to the first
>   week of August, I will be having my summer break, so I can dedicate
>   myself to git full-time :-)
>
>   I would have also finished all my core courses, so even after that, I
>   will have enough of time to give back to git past my GSoC period.

Ok.

Also: s/git/Git/

>   Phase 1: 07/06 to 14/06 -- Investigate and devise a strategy to port
>   the submodule functions
>   - This phase will be more diagrams in my notebook than code in my
>     editor -- I will go through all the methods used to port the other
>     submodule functions and see how to do the same for what is left.
>   - I will find the C equivalents of all the shell invocations in
>     `git-submodule.sh', and see what invocations have /no/ equivalent
>     and need to be created as helpers in C (Eg: What is the equivalent
>     to the `ensure-core-worktree' invocation in C?). For all the helpers
>     and new functionality that I do introduce, I will need to create the
>     testing strategy for the same.
>   - I will go through all the work done by Shourya in his patch, and try
>     to understand it properly. I will also see the mistakes that were
>     caught in all the reviews for previous submodule conversion patches
>     and try to learn from them before I jump to the code.
>   - Deliverable: I will create a checklist for all the work that needs
>     to be done with as much detail as I can with the help of inputs from
>     my mentor and all the knowledge I have gained in the process.
>
>   Phase 2: 14/06 to 28/06 -- Convert `add' to builtin in C
>   - I will work on completing `git submodule add'. One strategy would be
>     to either reimplement the whole thing using what was learnt in
>     Shourya's attempt, but it is probably wiser to just take his patch
>     and modify it. I would know what to do by the time I reach this
>     phase.
>   - I will also add tests for this functionality. I will also document
>     my changes when required. These would be unit tests for the helpers
>     introduced, and integration of `add' with the other commands.
>   - Deliverable: Completely port `add' to C!
>
>   Bonus Phase: If I am ahead of time -- Remove the need for a
>   `submodule--helper', and make it a proper C builtin.
>   - Once all the submodule functionality is ported, the shell script is
>     not really doing much more than parsing the arguments and passing it
>     to the helper. We won't need this anymore if it is implemented.

Ok, great!

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [GSoC][Draft Proposal v2] Finish converting git submodule to builtin
  2021-04-03 14:08 [GSoC][Draft Proposal] Finish converting git submodule to builtin Atharva Raykar
  2021-04-05 16:02 ` Christian Couder
@ 2021-04-08 10:19 ` Atharva Raykar
  2021-04-10 12:59   ` Christian Couder
                     ` (2 more replies)
  1 sibling, 3 replies; 11+ messages in thread
From: Atharva Raykar @ 2021-04-08 10:19 UTC (permalink / raw)
  To: git; +Cc: christian.couder, shouryashukla.oo, periperidip

Here's my updated draft. Changes since v1:

- Elaborated more on example porting strategy, stating how the patches
   could be broken up.
- Made language at the end of section 6 less ambiguous.
- Updated status of microproject.
- s/git/Git in several places.

Markdown version: https://gist.github.com/tfidfwastaken/0c6ca9ef2a452f110a416351541e0f19

--8<-----8<-----8<-----8<-----8<-----8<-----8<-----8<-----8<-----8<-----8<--
                          ___________________

                           GSOC GIT PROPOSAL

                             Atharva Raykar
                          ___________________


Table of Contents
_________________

1. Personal Details
2. Background
3. Me and Git
.. 1. Current knowledge of Git
4. The Project: Finish converting `git submodule' to builtin
5. Prior work
6. General implementation strategy
7. Timeline (using the format dd/mm)
8. Beyond GSoC
9. Blogging
10. Final Remarks: A little more about me


1 Personal Details
==================

  Name : Atharva Raykar
  Major : Computer Science and Engineering
  Email : raykar.ath@gmail.com
  IRC nick : atharvaraykar on #git and #git-devel
  Address : RB 103, Purva Riviera, Marathahalli, Bangalore
  Postal Code : 560037
  Time Zone : IST (UTC+5:30)
  GitHub : github.com/tfidfwastaken


2 Background
============

  I am Atharva Raykar, currently in my third year of studying Computer
  Science and Engineering at PES University, Bangalore. I have always
  enjoyed programming since a young age, but my deep appreciation for
  good program design and creating the right abstractions came during my
  exploration of the various rabbitholes of knowledge originating from
  communities around the internet. I have personally enjoyed learning
  about Functional Programming, Database Architecture and Operating
  Systems, and my interests keep expanding as I explore more in this
  field.

  I owe my appreciation of this rich field to these communities, and I
  always wanted to give back. With that goal, I restarted the [PES Open
  Source] community in our campus, with the goal of creating spaces
  where members could share knowledge, much in the same spirit as the
  communities that kickstarted my journey in Computer Science. I learnt
  a lot about collaborating in the open, maintainership, and reviewing
  code. While I have made many small contributions to projects in the
  past, I am hoping GSoC will help me make the leap to a larger and more
  substantial contribution to one of my favourite projects that made it
  all possible in my journey with Open Source.


[PES Open Source] <https://pesos.github.io>


3 Me and Git
============

  Here are the various forms of contributions that I have made to Git:

  - [Microproject] userdiff: userdiff: add support for Scheme Status: In
    progress, patch v3 requiring a review List:
    <https://lore.kernel.org/git/20210408091442.22740-1-raykar.ath@gmail.com/>

  - [Git Education] Conducted a workshop with attendance of hundreds of
    students new to git, and increased the prevalence of of git's usage
    in my campus.
    Photos: <https://photos.app.goo.gl/T7CPk1zkHdK7mx6v7> and
    <https://photos.app.goo.gl/bzTgdHMttxDen6z9A>

  I intend to continue helping people out on the mailing list and IRC
  and tending to patches wherever possible in the meantime.


3.1 Current knowledge of Git
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

  I use Git almost daily in some form, and I am fairly comfortable with
  it. I have already read and understood the chapters from the Git Book
  about submodules along with the one on objects, references, packfiles
  and the refspec.


4 The Project: Finish converting `git submodule' to builtin
===========================================================

  Git has historically had many components implemented in the form of
  shell scripts. This was less than ideal for several reasons:
  - Portability: Non-POSIX systems like Windows don't play nice with
    shell script commands like grep, cd and printf, to name a few, and
    these commands have to be reimplemented for the system. There are
    also POSIX to Windows path conversion issues.
  - No direct access to plumbing: Shell commands do not have direct
    access to the low level Git API, and a separate shell is spawned to
    just to carry out their operations.
  - Performance: Shell scripts tend to create a lot of child processes
    which slows down the functioning of these commands, especially with
    large repositories.
  Over the years, many GSoC students have converted the shell versions
  of these commands to C. Git `submodule' is the last of these to be
  converted.


5 Prior work
============

  I will be taking advantage of the knowledge that was gained in the
  process of the converting the previous scripts and avoiding all the
  gotchas that may be present in the process. There may be a bunch of
  useful helper functions in the previous patches that can be reused as
  well (more investigation needed to determine what exactly is
  reusable).

  Currently the only other commands left to be completed for `submodule'
  are `add' and `update'. Work for `add' has already been started by a
  previous GSoCer, Shourya Shukla, and needs to picked up from there.
  `update' has had some of its functionality moved over to
  `submodule--helper.c' where Stefan Beller added the helper functions
  `update-clone', `update-module-mode', `remote-branch' and more.

  References:
  <https://github.com/gitgitgadget/git/issues/541#issuecomment-769245064>
  <https://github.com/git/git/commit/4d6d6ef1fc>
  <https://github.com/git/git/commit/48308681b072a1d32e1361c255347324a8ad151e>
  <https://github.com/git/git/commit/ee69b2a90c5031bffb3341c5e50653a6ecca89ac>
  <https://github.com/git/git/commit/92bbe7ccf1fedac825f2c6ab4c8de91dc5370fd2>

  I'll have these as my references when I am working on the project:
  His blog about his progress:
  <https://shouryashukla.blogspot.com/2020/08/the-final-report.html>
  (more has been implemented since)
  Shourya's latest patch for `submodule add':
  <https://lore.kernel.org/git/20201007074538.25891-1-shouryashukla.oo@gmail.com/>

  For the most part, the implementation looks fairly complete, but there
  seems to be a segfault occurring, along with a few changes suggested
  by the reviewers. It will be helpful to contact Shourya to fully
  understand what needs to be done.

  Prathamesh's previous conversion work:
  <https://lore.kernel.org/git/20170724203454.13947-1-pc44800@gmail.com/#t>

  The ultimate goal would be to get rid of `git-submodules.sh'
  altogether -- which will complete the porting efforts of `submodule'
  to C.


6 General implementation strategy
=================================

  The way to port the shell to C code for `submodule' will largely
  remain the same. There already exists the builtin
  `submodule--helper.c' which contains most of the previous commands'
  ports. All that the shell script for `git-submodule.sh' is doing for
  the previously completed ports is parsing the flags and then calling
  the helper, which does all the business logic.

  So I will be moving out all the business logic that the shell script
  is performing to `submodule--helper.c'. Any reusable functionality
  that is introduced during the port will be added to `submodule.c' in
  the top level.

      For example: The general strategy for converting `cmd_update()' would
      be to have a call to `submodule--helper' in the shell script to a
      function which would resemble something like `module_update()'. This
      would perform the work being done by the shell script past the flags
      being parsed and make the necessary call to `update_clone()', which
      returns information about the cloned modules. For each cloned module,
      it will find out the update mode through `module_update_mode()', and
      run the appropriate operation according to that mode (like a rebase,
      if that was the update mode).

      One possible way this work can be broken up into multiple patches, is
      by moving over the shell code into C in a bottom-up manner.
      For example: The shell part which retrieves the latest revision in the
      remote (if --remote is specified) can be wrapped into a command of
      `submodule--helper.c'. Then we can move the part where we run the
      update method (ie the `case' on line 611 onwards) into a C function.
      Eventually, the shell part will just look like a bunch of invocations
      to `submodule--helper', at which point, the whole thing can be
      encapsulated in a single command called `git submodule--helper update'
      (Bonus: Move the whole functionality to C, including the parsing of
      flags, to work towards getting rid of `git-submodule.sh'). I believe
      this is a fairly non-destructive and incremental way to work, and the
      porting efforts by Stefan seem to follow this same kind of philosophy.
      I will most likely end up tuning the size of these increments when I
      get around to planning in my first phase of the project.

  After this process, I will be adding the `add' and `update' command to
  the commands array in `submodule--helper.c'. And since these two
  functions are the last bit of functionality left to convert in
  submodules, an extended goal can be to get rid of the shell script
  altogether, and make the helper into the actual builtin [1].

  [1]
  <https://lore.kernel.org/git/nycvar.QRO.7.76.6.2011191327320.56@tvgsbejvaqbjf.bet/>


7 Timeline (using the format dd/mm)
===================================

  Periods of limited availability (read: hectic chaos):
  - From 13/04 to 20/04 I will be having project evaluations and lab
    assessments for five of my courses.
  - From 20/04 to 01/05 I have my in-semester exams.
  - For a period of two weeks in the range of 08/05 to 29/05 I will be
    having my end-semester exams.
  My commitment: I will still have time during my finals to help people
  out on the mailing list, get acquainted with the community and its
  processes, and even review patches if I can. This is because we get
  holidays between each exam, and my grades are good enough to that I
  can prioritise Git over my studies ;-)

  And on the safe side, I will still engage with the community from now
  till 07/06 so that the community bonding period is not compromised in
  any way.

  Periods of abundant availability: After 29/05 all the way to the first
  week of August, I will be having my summer break, so I can dedicate
  myself to Git full-time :-)

  I would have also finished all my core courses, so even after that, I
  will have enough of time to give back to Git past my GSoC period.

  Phase 1: 07/06 to 14/06 -- Investigate and devise a strategy to port
  the submodule functions
  - This phase will be more diagrams in my notebook than code in my
    editor -- I will go through all the methods used to port the other
    submodule functions and see how to do the same for what is left.
  - I will find the C equivalents of all the shell invocations in
    `git-submodule.sh', and see what invocations have /no/ equivalent
    and need to be created as helpers in C (Eg: What is the equivalent
    to the `ensure-core-worktree' invocation in C?). For all the helpers
    and new functionality that I do introduce, I will need to create the
    testing strategy for the same.
  - I will go through all the work done by Shourya in his patch, and try
    to understand it properly. I will also see the mistakes that were
    caught in all the reviews for previous submodule conversion patches
    and try to learn from them before I jump to the code.
  - Deliverable: I will create a checklist for all the work that needs
    to be done with as much detail as I can with the help of inputs from
    my mentor and all the knowledge I have gained in the process.

  Phase 2: 14/06 to 28/06 -- Convert `add' to builtin in C
  - I will work on completing `git submodule add'. One strategy would be
    to either reimplement the whole thing using what was learnt in
    Shourya's attempt, but it is probably wiser to just take his patch
    and modify it. I would know what to do by the time I reach this
    phase.
  - I will also add tests for this functionality. I will also document
    my changes when required. These would be unit tests for the helpers
    introduced, and integration of `add' with the other commands.
  - Deliverable: Completely port `add' to C!

  Phase 3: 28/06 to 16/08 -- Convert `update' to builtin
  - Some work has already been done by Stephan Beller that moves the
    functionality of `update' to `submodule--helper.c':
    <https://github.com/git/git/commit/48308681b072a1d32e1361c255347324a8ad151e>,
    but a lot of the business logic of going into the submodule and
    checking out or merging or rebasing needs to still be converted.
    Plenty to do here.
  - As with `add', all of the appropriate tests need to be written and
    the changes documented. As I have learnt from the Pro Git Book,
    there are a lot of subtleties with how update does its work that I
    need to watch out for.
  - Deliverable: Completely port `update' to C!

  Bonus Phase: If I am ahead of time -- Remove the need for a
  `submodule--helper', and make it a proper C builtin.
  - Once all the submodule functionality is ported, the shell script is
    not really doing much more than parsing the arguments and passing it
    to the helper. We won't need this anymore if it is implemented.


8 Beyond GSoC
=============

  I love the process of working as a community more than anything else,
  and I already felt very welcomed by the Git community the moment I
  started sending in my microproject patch series. Whether I am selected
  or not, I will continue giving back to Git wherever I can. Since my
  final year is light on coursework, I will be able to mentor people and
  help expand the Git developer community through all the ways I can (be
  it code review, helping people find the right resources or evangelism
  of Git).


9 Blogging
==========

  I will be blogging about my progress on a weekly basis and either post
  it on my website at <https://atharvaraykar.me> (probably will tuck it
  away in a /gsoc path). Technical blogging is not particularly new to
  me, and I hope my posts can help future contributors of Git.


10 Final Remarks: A little more about me
========================================

  These are some of my core values that I believe will be important to
  pull off this project and make the most of my time in GSoC:
  - Hard problems don't frustrate me, rather they excite me. Bugs make
    my brain perk up. I love the process of learning.
  - I am pro-transparency. If I am having some trouble, I will be open
    about it. I don't hesitate to ask questions and dig deep if I need
    to.
  - At the same time, when I ask a question, I only do so after I have
    struggled with the problem for enough time and done my due diligence
    in trying to solve it. Clear communication is very important to make
    this work.
  - I am also very comfortable with learning things all on my own (I
    have barely known any other way), and working in a remote,
    asynchronous setting.
  I hope to make the world better in my own small way by contributing to
  a tool that everyone uses and I like. It's more rewarding than any
  internship that my peers are doing this year. I look forward to
  learning more.


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [GSoC][Draft Proposal v2] Finish converting git submodule to builtin
  2021-04-08 10:19 ` [GSoC][Draft Proposal v2] " Atharva Raykar
@ 2021-04-10 12:59   ` Christian Couder
  2021-04-11  9:40     ` Atharva Raykar
  2021-04-11 10:17   ` [GSoC][Draft Proposal v3] " Atharva Raykar
  2021-05-14 16:00   ` [GSoC][Draft Proposal v2] " Atharva Raykar
  2 siblings, 1 reply; 11+ messages in thread
From: Christian Couder @ 2021-04-10 12:59 UTC (permalink / raw)
  To: Atharva Raykar; +Cc: git, Shourya Shukla, Shourya Shukla

On Thu, Apr 8, 2021 at 12:19 PM Atharva Raykar <raykar.ath@gmail.com> wrote:
>
> Here's my updated draft. Changes since v1:
>
> - Elaborated more on example porting strategy, stating how the patches
>    could be broken up.
> - Made language at the end of section 6 less ambiguous.
> - Updated status of microproject.
> - s/git/Git in several places.

Thanks for this summary of the changes since the previous version!

> 3 Me and Git
> ============
>
>   Here are the various forms of contributions that I have made to Git:
>
>   - [Microproject] userdiff: userdiff: add support for Scheme Status: In

s/userdiff: userdiff/userdiff/

>     progress, patch v3 requiring a review List:
>     <https://lore.kernel.org/git/20210408091442.22740-1-raykar.ath@gmail.com/>
>
>   - [Git Education] Conducted a workshop with attendance of hundreds of
>     students new to git, and increased the prevalence of of git's usage

s/git/Git/
s/of of git/of Git/

>     in my campus.
>     Photos: <https://photos.app.goo.gl/T7CPk1zkHdK7mx6v7> and
>     <https://photos.app.goo.gl/bzTgdHMttxDen6z9A>

[...]

> 6 General implementation strategy
> =================================
>
>   The way to port the shell to C code for `submodule' will largely
>   remain the same. There already exists the builtin
>   `submodule--helper.c' which contains most of the previous commands'
>   ports. All that the shell script for `git-submodule.sh' is doing for
>   the previously completed ports is parsing the flags and then calling
>   the helper, which does all the business logic.
>
>   So I will be moving out all the business logic that the shell script
>   is performing to `submodule--helper.c'. Any reusable functionality
>   that is introduced during the port will be added to `submodule.c' in
>   the top level.
>
>       For example: The general strategy for converting `cmd_update()' would
>       be to have a call to `submodule--helper' in the shell script to a
>       function which would resemble something like `module_update()'.

Does module_update() already exists? It's hard to understand if you
are referring to something that already exists (where?) or that you
would create (how?) here. More details about this would be nice.

> This
>       would perform the work being done by the shell script past the flags
>       being parsed and make the necessary call to `update_clone()', which
>       returns information about the cloned modules.

How does it return information?

> For each cloned module,
>       it will find out the update mode through `module_update_mode()', and
>       run the appropriate operation according to that mode (like a rebase,
>       if that was the update mode).
>
>       One possible way this work can be broken up into multiple patches, is
>       by moving over the shell code into C in a bottom-up manner.
>       For example: The shell part which retrieves the latest revision in the
>       remote (if --remote is specified) can be wrapped into a command of
>       `submodule--helper.c'.

Could you give an example of how the command would be named, what
arguments it would take and how it could be used?

> Then we can move the part where we run the
>       update method (ie the `case' on line 611 onwards) into a C function.

Do you mean the code that does something like:

                       case "$update_module" in
                       checkout)
                               ...
                       rebase)
                               ...
                       merge)
                               ...
                       !*)
                               ...
                       *)
                               ...
                       esac

                       if (sanitize_submodule_env; cd "$sm_path" &&
$command "$sha1")
                       then
                               say "$say_msg"
                       elif test -n "$must_die_on_failure"
                       then
                               die_with_status 2 "$die_msg"
                       else
                               err="${err};$die_msg"
                               continue
                       fi

?

Could you also give an example of how the command would be named, what
arguments it would take and how it could be used?

>       Eventually, the shell part will just look like a bunch of invocations
>       to `submodule--helper', at which point, the whole thing can be
>       encapsulated in a single command called `git submodule--helper update'
>       (Bonus: Move the whole functionality to C, including the parsing of
>       flags, to work towards getting rid of `git-submodule.sh'). I believe
>       this is a fairly non-destructive and incremental way to work, and the
>       porting efforts by Stefan seem to follow this same kind of philosophy.
>       I will most likely end up tuning the size of these increments when I
>       get around to planning in my first phase of the project.
>
>   After this process, I will be adding the `add' and `update' command to
>   the commands array in `submodule--helper.c'. And since these two
>   functions are the last bit of functionality left to convert in
>   submodules, an extended goal can be to get rid of the shell script
>   altogether, and make the helper into the actual builtin [1].
>
>   [1]
>   <https://lore.kernel.org/git/nycvar.QRO.7.76.6.2011191327320.56@tvgsbejvaqbjf.bet/>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [GSoC][Draft Proposal v2] Finish converting git submodule to builtin
  2021-04-10 12:59   ` Christian Couder
@ 2021-04-11  9:40     ` Atharva Raykar
  2021-04-11 19:32       ` Kaartic Sivaraam
  0 siblings, 1 reply; 11+ messages in thread
From: Atharva Raykar @ 2021-04-11  9:40 UTC (permalink / raw)
  To: Christian Couder; +Cc: git, Shourya Shukla, Shourya Shukla



> On 10-Apr-2021, at 18:29, Christian Couder <christian.couder@gmail.com> wrote:
> 
> On Thu, Apr 8, 2021 at 12:19 PM Atharva Raykar <raykar.ath@gmail.com> wrote:
>> 
>> Here's my updated draft. Changes since v1:
>> 
>> - Elaborated more on example porting strategy, stating how the patches
>>   could be broken up.
>> - Made language at the end of section 6 less ambiguous.
>> - Updated status of microproject.
>> - s/git/Git in several places.
> 
> Thanks for this summary of the changes since the previous version!
> 
>> 3 Me and Git
>> ============
>> 
>>  Here are the various forms of contributions that I have made to Git:
>> 
>>  - [Microproject] userdiff: userdiff: add support for Scheme Status: In
> 
> s/userdiff: userdiff/userdiff/
> 
>>    progress, patch v3 requiring a review List:
>>    <https://lore.kernel.org/git/20210408091442.22740-1-raykar.ath@gmail.com/>
>> 
>>  - [Git Education] Conducted a workshop with attendance of hundreds of
>>    students new to git, and increased the prevalence of of git's usage
> 
> s/git/Git/
> s/of of git/of Git/
> 

Thanks, will fix these.

>>    in my campus.
>>    Photos: <https://photos.app.goo.gl/T7CPk1zkHdK7mx6v7> and
>>    <https://photos.app.goo.gl/bzTgdHMttxDen6z9A>
> 
> [...]
> 
>> 6 General implementation strategy
>> =================================
>> 
>>  The way to port the shell to C code for `submodule' will largely
>>  remain the same. There already exists the builtin
>>  `submodule--helper.c' which contains most of the previous commands'
>>  ports. All that the shell script for `git-submodule.sh' is doing for
>>  the previously completed ports is parsing the flags and then calling
>>  the helper, which does all the business logic.
>> 
>>  So I will be moving out all the business logic that the shell script
>>  is performing to `submodule--helper.c'. Any reusable functionality
>>  that is introduced during the port will be added to `submodule.c' in
>>  the top level.
>> 
>>      For example: The general strategy for converting `cmd_update()' would
>>      be to have a call to `submodule--helper' in the shell script to a
>>      function which would resemble something like `module_update()'.
> 
> Does module_update() already exists? It's hard to understand if you
> are referring to something that already exists (where?) or that you
> would create (how?) here. More details about this would be nice.

It is a function that I intend to write, will make that more clear.

>> This
>>      would perform the work being done by the shell script past the flags
>>      being parsed and make the necessary call to `update_clone()', which
>>      returns information about the cloned modules.
> 
> How does it return information?
> 
>> For each cloned module,
>>      it will find out the update mode through `module_update_mode()', and
>>      run the appropriate operation according to that mode (like a rebase,
>>      if that was the update mode).
>> 
>>      One possible way this work can be broken up into multiple patches, is
>>      by moving over the shell code into C in a bottom-up manner.
>>      For example: The shell part which retrieves the latest revision in the
>>      remote (if --remote is specified) can be wrapped into a command of
>>      `submodule--helper.c'.
> 
> Could you give an example of how the command would be named, what
> arguments it would take and how it could be used?
> 
>> Then we can move the part where we run the
>>      update method (ie the `case' on line 611 onwards) into a C function.
> 
> Do you mean the code that does something like:
> 
>                       case "$update_module" in
>                       checkout)
>                               ...
>                       rebase)
>                               ...
>                       merge)
>                               ...
>                       !*)
>                               ...
>                       *)
>                               ...
>                       esac
> 
>                       if (sanitize_submodule_env; cd "$sm_path" &&
> $command "$sha1")
>                       then
>                               say "$say_msg"
>                       elif test -n "$must_die_on_failure"
>                       then
>                               die_with_status 2 "$die_msg"
>                       else
>                               err="${err};$die_msg"
>                               continue
>                       fi
> 
> ?
> 
> Could you also give an example of how the command would be named, what
> arguments it would take and how it could be used?

I could add more detail about the exact arguments each converted part
would take, but I feel a little hesitant because I will most likely
change my mind on a lot of those kind of lower-level decisions as I
understand the codebase better. The point I was trying to convey is
that the high-level workflow I would follow while converting would look
like this:

1. Identify parts in git-submodule.sh that have cohesive functionality
2. Rewrite that functionality in C, which can be invoked from
    `git submodule--helper <function name> <args>`
3. Remove the shell code and replace it with the above invocation
4. Once the shell code is reduced to only a bunch of calls to
    submodule--helper, wrap all of that into one call that looks like
    `git submodule--helper update <flags>` that encapsulates all the
    functionality done by the other helper function calls.

(In other words: I will cluster the functionality in a bottom-up way.
Maybe I should mention the above four points in my proposal?)

The example I gave for how to handle the presence of the remote flag
and the function that performs the module updation method (ie, the `case`
on line 611) was just to illustrate the above workflow, rather than say
that this is how I will exactly do it.

I also would like to know what level of granularity is ideal for the
proposal. For now I have tried to keep it at "whatever I will surely
follow through when I work on the project", which at the moment is the
covered by the four points I mentioned above.

If I go too much into detail about the functions and arguments
of every helper in my example, I will feel compelled to do the same for
the `git submodule add` example. I also will have to reason more carefully
because I do not want to end up in a situation where I do not actually
stick to my proposal all that much, because I realise in my investigation
phase that there is a different, much better way.

Do let me know what is preferred.

>>      Eventually, the shell part will just look like a bunch of invocations
>>      to `submodule--helper', at which point, the whole thing can be
>>      encapsulated in a single command called `git submodule--helper update'
>>      (Bonus: Move the whole functionality to C, including the parsing of
>>      flags, to work towards getting rid of `git-submodule.sh'). I believe
>>      this is a fairly non-destructive and incremental way to work, and the
>>      porting efforts by Stefan seem to follow this same kind of philosophy.
>>      I will most likely end up tuning the size of these increments when I
>>      get around to planning in my first phase of the project.
>> 
>>  After this process, I will be adding the `add' and `update' command to
>>  the commands array in `submodule--helper.c'. And since these two
>>  functions are the last bit of functionality left to convert in
>>  submodules, an extended goal can be to get rid of the shell script
>>  altogether, and make the helper into the actual builtin [1].
>> 
>>  [1]
>>  <https://lore.kernel.org/git/nycvar.QRO.7.76.6.2011191327320.56@tvgsbejvaqbjf.bet/>


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [GSoC][Draft Proposal v3] Finish converting git submodule to builtin
  2021-04-08 10:19 ` [GSoC][Draft Proposal v2] " Atharva Raykar
  2021-04-10 12:59   ` Christian Couder
@ 2021-04-11 10:17   ` Atharva Raykar
  2021-05-14 16:00   ` [GSoC][Draft Proposal v2] " Atharva Raykar
  2 siblings, 0 replies; 11+ messages in thread
From: Atharva Raykar @ 2021-04-11 10:17 UTC (permalink / raw)
  To: git; +Cc: christian.couder, shouryashukla.oo, periperidip

Changes since v2:

- Add more detail in my example of how I would convert `submodule update`
  -- mainly I showed possible names for the invocations, the arguments it
  takes, and operations it performs.

- Clarify that `module_update()` is a function that I will write, and
  does not currently exist in the codebase.

- Add stepwise high-level workflow in section 6

- Exorcise the last remaining gits and bring back Gits

Markdown version: https://gist.github.com/tfidfwastaken/0c6ca9ef2a452f110a416351541e0f19

--8<-----8<-----8<-----8<-----8<-----8<-----8<-----8<-----8<-----8<-----8<--

                          ___________________

                           GSOC GIT PROPOSAL

                             Atharva Raykar
                          ___________________


Table of Contents
_________________

1. Personal Details
2. Background
3. Me and Git
.. 1. Current knowledge of Git
4. The Project: Finish converting `git submodule' to builtin
5. Prior work
6. General implementation strategy
7. Timeline (using the format dd/mm)
8. Beyond GSoC
9. Blogging
10. Final Remarks: A little more about me


1 Personal Details
==================

  Name : Atharva Raykar
  Major : Computer Science and Engineering
  Email : raykar.ath@gmail.com
  IRC nick : atharvaraykar on #git and #git-devel
  Address : RB 103, Purva Riviera, Marathahalli, Bangalore
  Postal Code : 560037
  Time Zone : IST (UTC+5:30)
  GitHub : github.com/tfidfwastaken


2 Background
============

  I am Atharva Raykar, currently in my third year of studying Computer
  Science and Engineering at PES University, Bangalore. I have always
  enjoyed programming since a young age, but my deep appreciation for
  good program design and creating the right abstractions came during my
  exploration of the various rabbitholes of knowledge originating from
  communities around the internet. I have personally enjoyed learning
  about Functional Programming, Database Architecture and Operating
  Systems, and my interests keep expanding as I explore more in this
  field.

  I owe my appreciation of this rich field to these communities, and I
  always wanted to give back. With that goal, I restarted the [PES Open
  Source] community in our campus, with the goal of creating spaces
  where members could share knowledge, much in the same spirit as the
  communities that kickstarted my journey in Computer Science. I learnt
  a lot about collaborating in the open, maintainership, and reviewing
  code. While I have made many small contributions to projects in the
  past, I am hoping GSoC will help me make the leap to a larger and more
  substantial contribution to one of my favourite projects that made it
  all possible in my journey with Open Source.


[PES Open Source] <https://pesos.github.io>


3 Me and Git
============

  Here are the various forms of contributions that I have made to Git:

  - [Microproject] userdiff: add support for Scheme
    Status: In progress, patch v3 requiring a review
    List: <https://lore.kernel.org/git/20210408091442.22740-1-raykar.ath@gmail.com/>

  - [Git Education] Conducted a workshop with attendance of hundreds of
    students new to Git, and increased the prevalence of of Git's usage
    in my campus.
    Photos: <https://photos.app.goo.gl/T7CPk1zkHdK7mx6v7> and
    <https://photos.app.goo.gl/bzTgdHMttxDen6z9A>

  I intend to continue helping people out on the mailing list and IRC
  and tending to patches wherever possible in the meantime.


3.1 Current knowledge of Git
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

  I use Git almost daily in some form, and I am fairly comfortable with
  it. I have already read and understood the chapters from the Git Book
  about submodules along with the one on objects, references, packfiles
  and the refspec.


4 The Project: Finish converting `git submodule' to builtin
===========================================================

  Git has historically had many components implemented in the form of
  shell scripts. This was less than ideal for several reasons:
  - Portability: Non-POSIX systems like Windows don't play nice with
    shell script commands like grep, cd and printf, to name a few, and
    these commands have to be reimplemented for the system. There are
    also POSIX to Windows path conversion issues.
  - No direct access to plumbing: Shell commands do not have direct
    access to the low level Git API, and a separate shell is spawned to
    just to carry out their operations.
  - Performance: Shell scripts tend to create a lot of child processes
    which slows down the functioning of these commands, especially with
    large repositories.
  Over the years, many GSoC students have converted the shell versions
  of these commands to C. Git `submodule' is the last of these to be
  converted.


5 Prior work
============

  I will be taking advantage of the knowledge that was gained in the
  process of the converting the previous scripts and avoiding all the
  gotchas that may be present in the process. There may be a bunch of
  useful helper functions in the previous patches that can be reused as
  well (more investigation needed to determine what exactly is
  reusable).

  Currently the only other commands left to be completed for `submodule'
  are `add' and `update'. Work for `add' has already been started by a
  previous GSoCer, Shourya Shukla, and needs to picked up from there.
  `update' has had some of its functionality moved over to
  `submodule--helper.c' where Stefan Beller added the helper functions
  `update-clone', `update-module-mode', `remote-branch' and more.

  References:
  <https://github.com/gitgitgadget/git/issues/541#issuecomment-769245064>
  <https://github.com/git/git/commit/4d6d6ef1fc>
  <https://github.com/git/git/commit/48308681b072a1d32e1361c255347324a8ad151e>
  <https://github.com/git/git/commit/ee69b2a90c5031bffb3341c5e50653a6ecca89ac>
  <https://github.com/git/git/commit/92bbe7ccf1fedac825f2c6ab4c8de91dc5370fd2>

  I'll have these as my references when I am working on the project:
  His blog about his progress:
  <https://shouryashukla.blogspot.com/2020/08/the-final-report.html>
  (more has been implemented since)
  Shourya's latest patch for `submodule add':
  <https://lore.kernel.org/git/20201007074538.25891-1-shouryashukla.oo@gmail.com/>

  For the most part, the implementation looks fairly complete, but there
  seems to be a segfault occurring, along with a few changes suggested
  by the reviewers. It will be helpful to contact Shourya to fully
  understand what needs to be done.

  Prathamesh's previous conversion work:
  <https://lore.kernel.org/git/20170724203454.13947-1-pc44800@gmail.com/#t>

  The ultimate goal would be to get rid of `git-submodules.sh'
  altogether -- which will complete the porting efforts of `submodule'
  to C.


6 General implementation strategy
=================================

  The way to port the shell to C code for `submodule' will largely
  remain the same. There already exists the builtin
  `submodule--helper.c' which contains most of the previous commands'
  ports. All that the shell script for `git-submodule.sh' is doing for
  the previously completed ports is parsing the flags and then calling
  the helper, which does all the business logic.

  So I will be moving out all the business logic that the shell script
  is performing to `submodule--helper.c'. Any reusable functionality
  that is introduced during the port will be added to `submodule.c' in
  the top level.

      For example: The general strategy for converting `cmd_update()' would
      be to have an invocation of to `submodule--helper update <flags>' in
      the shell script which maps to a C function which I would create,
      named `module_update()'. This would perform the work being done by the
      shell script past the flags being parsed and make the necessary call
      to `update_clone()'.
      `update_clone()' takes care of cloning all the submodules and returns
      their SHA1, whether the module was just cloned, and the path to the
      submodule. For each cloned module, it uses the information in those
      entries to find out the update mode through `module_update_mode()',
      and run the appropriate operation according to that mode (like a
      rebase, if that was the update mode). The SHA1 from `update_clone()'
      helps us determine whether we need to update the submodules to match
      what the superproject expects.

      One possible way this work can be broken up into multiple patches is
      by moving over the shell code into C in a bottom-up manner.

      For example: The shell part which retrieves the latest revision in the
      remote (if --remote is specified) can be wrapped into an invocation
      like `git submodule--helper update-remote ${nofetch:+--nofetch}
      <sm_path>'. This would return the remote name and SHA1 for the remote
      tracked by the submodule. Then we can move the part where we run the
      update method (ie the `case' on line 611 onwards) into a C function
      that is invoked by something that looks like `git submodule--helper
      run-update-operation $update-module'. This will run the update
      function, ie, either checkout, merge or rebase depending on the flags
      passed, or configuration setup by the end user. Eventually, the shell
      part will just look like a bunch of invocations to
      `submodule--helper', at which point, the whole thing can be
      encapsulated in a single command called `git submodule--helper update
      <flags>' (Bonus: Move the whole functionality to C, including the
      parsing of flags, to work towards getting rid of `git-submodule.sh').
      I believe this is a fairly non-destructive and incremental way to
      work, and the porting efforts by Stefan seem to follow this same kind
      of philosophy. I will most likely end up tuning the size of these
      increments when I get around to planning in my first phase of the
      project.

  What I have mentioned above is just illustrating what my workflow
  might look like, and the details are subject to change as I will
  probably discover nicer ways to get to the end goal of moving
  everything to `submodule--helper'. What will remain unchanged though,
  is my high level workflow, which can be summarized to these four
  steps:

  1. Identify parts in git-submodule.sh that have cohesive functionality
  2. Rewrite that functionality in C, which can be invoked from `git
     submodule--helper <function name> <args>`
  3. Remove the shell code and replace it with the above invocation.
     This could be sent as one patch, making it easier to review. Steps
     1 to 3 are repeated until the shell code is reduced to a bunch of
     calls to `submodule--helper'
  4. Once the shell code is reduced to only a bunch of calls to
     `submodule--helper', wrap all of that into one call that looks like
     `git submodule--helper update <flags>' that encapsulates all the
     functionality done by the other helper function calls.

  After this process, I will be adding the `add' and `update' command to
  the commands array in `submodule--helper.c'. And since these two
  functions are the last bit of functionality left to convert in
  submodules, an extended goal can be to get rid of the shell script
  altogether, and make the helper into the actual builtin [1].

  [1]
  <https://lore.kernel.org/git/nycvar.QRO.7.76.6.2011191327320.56@tvgsbejvaqbjf.bet/>


7 Timeline (using the format dd/mm)
===================================

  Periods of limited availability (read: hectic chaos):
  - From 13/04 to 20/04 I will be having project evaluations and lab
    assessments for five of my courses.
  - From 20/04 to 01/05 I have my in-semester exams.
  - For a period of two weeks in the range of 08/05 to 29/05 I will be
    having my end-semester exams.
  My commitment: I will still have time during my finals to help people
  out on the mailing list, get acquainted with the community and its
  processes, and even review patches if I can. This is because we get
  holidays between each exam, and my grades are good enough to that I
  can prioritise Git over my studies ;-)

  And on the safe side, I will still engage with the community from now
  till 07/06 so that the community bonding period is not compromised in
  any way.

  Periods of abundant availability: After 29/05 all the way to the first
  week of August, I will be having my summer break, so I can dedicate
  myself to Git full-time :-)

  I would have also finished all my core courses, so even after that, I
  will have enough of time to give back to Git past my GSoC period.

  Phase 1: 07/06 to 14/06 -- Investigate and devise a strategy to port
  the submodule functions
  - This phase will be more diagrams in my notebook than code in my
    editor -- I will go through all the methods used to port the other
    submodule functions and see how to do the same for what is left.
  - I will find the C equivalents of all the shell invocations in
    `git-submodule.sh', and see what invocations have /no/ equivalent
    and need to be created as helpers in C (Eg: What is the equivalent
    to the `ensure-core-worktree' invocation in C?). For all the helpers
    and new functionality that I do introduce, I will need to create the
    testing strategy for the same.
  - I will go through all the work done by Shourya in his patch, and try
    to understand it properly. I will also see the mistakes that were
    caught in all the reviews for previous submodule conversion patches
    and try to learn from them before I jump to the code.
  - Deliverable: I will create a checklist for all the work that needs
    to be done with as much detail as I can with the help of inputs from
    my mentor and all the knowledge I have gained in the process.

  Phase 2: 14/06 to 28/06 -- Convert `add' to builtin in C
  - I will work on completing `git submodule add'. One strategy would be
    to either reimplement the whole thing using what was learnt in
    Shourya's attempt, but it is probably wiser to just take his patch
    and modify it. I would know what to do by the time I reach this
    phase.
  - I will also add tests for this functionality. I will also document
    my changes when required. These would be unit tests for the helpers
    introduced, and integration of `add' with the other commands.
  - Deliverable: Completely port `add' to C!

  Phase 3: 28/06 to 16/08 -- Convert `update' to builtin
  - Some work has already been done by Stephan Beller that moves the
    functionality of `update' to `submodule--helper.c':
    <https://github.com/git/git/commit/48308681b072a1d32e1361c255347324a8ad151e>,
    but a lot of the business logic of going into the submodule and
    checking out or merging or rebasing needs to still be converted.
    Plenty to do here.
  - As with `add', all of the appropriate tests need to be written and
    the changes documented. As I have learnt from the Pro Git Book,
    there are a lot of subtleties with how update does its work that I
    need to watch out for.
  - Deliverable: Completely port `update' to C!

  Bonus Phase: If I am ahead of time -- Remove the need for a
  `submodule--helper', and make it a proper C builtin.
  - Once all the submodule functionality is ported, the shell script is
    not really doing much more than parsing the arguments and passing it
    to the helper. We won't need this anymore if it is implemented.


8 Beyond GSoC
=============

  I love the process of working as a community more than anything else,
  and I already felt very welcomed by the Git community the moment I
  started sending in my microproject patch series. Whether I am selected
  or not, I will continue giving back to Git wherever I can. Since my
  final year is light on coursework, I will be able to mentor people and
  help expand the Git developer community through all the ways I can (be
  it code review, helping people find the right resources or evangelism
  of Git).


9 Blogging
==========

  I will be blogging about my progress on a weekly basis and either post
  it on my website at <https://atharvaraykar.me> (probably will tuck it
  away in a /gsoc path). Technical blogging is not particularly new to
  me, and I hope my posts can help future contributors of Git.


10 Final Remarks: A little more about me
========================================

  These are some of my core values that I believe will be important to
  pull off this project and make the most of my time in GSoC:
  - Hard problems don't frustrate me, rather they excite me. Bugs make
    my brain perk up. I love the process of learning.
  - I am pro-transparency. If I am having some trouble, I will be open
    about it. I don't hesitate to ask questions and dig deep if I need
    to.
  - At the same time, when I ask a question, I only do so after I have
    struggled with the problem for enough time and done my due diligence
    in trying to solve it. Clear communication is very important to make
    this work.
  - I am also very comfortable with learning things all on my own (I
    have barely known any other way), and working in a remote,
    asynchronous setting.
  I hope to make the world better in my own small way by contributing to
  a tool that everyone uses and I like. It's more rewarding than any
  internship that my peers are doing this year. I look forward to
  learning more.


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [GSoC][Draft Proposal v2] Finish converting git submodule to builtin
  2021-04-11  9:40     ` Atharva Raykar
@ 2021-04-11 19:32       ` Kaartic Sivaraam
  2021-04-12  5:56         ` Atharva Raykar
  0 siblings, 1 reply; 11+ messages in thread
From: Kaartic Sivaraam @ 2021-04-11 19:32 UTC (permalink / raw)
  To: Atharva Raykar; +Cc: Christian Couder, git, Shourya Shukla, Shourya Shukla

Hi Atharva,

On 11/04/21 3:10 pm, Atharva Raykar wrote:
> 

>> On 10-Apr-2021, at 18:29, Christian Couder <christian.couder@gmail.com> wrote:
>>
>> On Thu, Apr 8, 2021 at 12:19 PM Atharva Raykar <raykar.ath@gmail.com> wrote:
>>>
>>> Here's my updated draft. Changes since v1:
>>>
>>> - Elaborated more on example porting strategy, stating how the patches
>>>    could be broken up.
>>> - Made language at the end of section 6 less ambiguous.
>>> - Updated status of microproject.
>>> - s/git/Git in several places.
>>
>> Thanks for this summary of the changes since the previous version!
>>

Yeah. Summaries are really helpful :)

[ ... ]

>>> This
>>>       would perform the work being done by the shell script past the flags
>>>       being parsed and make the necessary call to `update_clone()', which
>>>       returns information about the cloned modules.
>>
>> How does it return information?
>>
>>> For each cloned module,
>>>       it will find out the update mode through `module_update_mode()', and
>>>       run the appropriate operation according to that mode (like a rebase,
>>>       if that was the update mode).
>>>
>>>       One possible way this work can be broken up into multiple patches, is
>>>       by moving over the shell code into C in a bottom-up manner.
>>>       For example: The shell part which retrieves the latest revision in the
>>>       remote (if --remote is specified) can be wrapped into a command of
>>>       `submodule--helper.c'.
>>
>> Could you give an example of how the command would be named, what
>> arguments it would take and how it could be used?
>>
>>> Then we can move the part where we run the
>>>       update method (ie the `case' on line 611 onwards) into a C function.
>>
>> Do you mean the code that does something like:
>>
>>                        case "$update_module" in
>>                        checkout)
>>                                ...
>>                        rebase)
>>                                ...
>>                        merge)
>>                                ...
>>                        !*)
>>                                ...
>>                        *)
>>                                ...
>>                        esac
>>
>>                        if (sanitize_submodule_env; cd "$sm_path" &&
>> $command "$sha1")
>>                        then
>>                                say "$say_msg"
>>                        elif test -n "$must_die_on_failure"
>>                        then
>>                                die_with_status 2 "$die_msg"
>>                        else
>>                                err="${err};$die_msg"
>>                                continue
>>                        fi
>>
>> ?
>>
>> Could you also give an example of how the command would be named, what
>> arguments it would take and how it could be used?
> 
> I could add more detail about the exact arguments each converted part
> would take, but I feel a little hesitant because I will most likely
> change my mind on a lot of those kind of lower-level decisions as I
> understand the codebase better. The point I was trying to convey is
> that the high-level workflow I would follow while converting would look
> like this:
> 
> 1. Identify parts in git-submodule.sh that have cohesive functionality
> 2. Rewrite that functionality in C, which can be invoked from
>      `git submodule--helper <function name> <args>`
> 3. Remove the shell code and replace it with the above invocation
> 4. Once the shell code is reduced to only a bunch of calls to
>      submodule--helper, wrap all of that into one call that looks like
>      `git submodule--helper update <flags>` that encapsulates all the
>      functionality done by the other helper function calls.
> 
> (In other words: I will cluster the functionality in a bottom-up way.
> Maybe I should mention the above four points in my proposal?)
> 

That sounds like a good idea which wouldn't result in one huge patch and
thus avoids reviewer fatigue.

> The example I gave for how to handle the presence of the remote flag
> and the function that performs the module updation method (ie, the `case`
> on line 611) was just to illustrate the above workflow, rather than say
> that this is how I will exactly do it.
> 
> I also would like to know what level of granularity is ideal for the
> proposal. For now I have tried to keep it at "whatever I will surely
> follow through when I work on the project", which at the moment is the
> covered by the four points I mentioned above.
> 
> If I go too much into detail about the functions and arguments
> of every helper in my example, I will feel compelled to do the same for
> the `git submodule add` example. I also will have to reason more carefully
> because I do not want to end up in a situation where I do not actually
> stick to my proposal all that much, because I realise in my investigation
> phase that there is a different, much better way.
> 
> Do let me know what is preferred.
> 

It makes sense that you don't want to go into too much detail in your
proposal. I think Christian wasn't expecting it either. As far as I
understand, he was just trying to make your proposal clear to the person
who reads it. Just mentioning something like,

   This would perform the work being done by the shell script past the
   flags being parsed and make the necessary call to `update_clone()',
   which returns information about the cloned modules.

is not clear as it doesn't say how you're "thinking" the function would
return information. Mention this would be helpful for the reader to know
what your expectations are and if they need any correction. So, it is
better to mention such related information to make your proposal
complete. The high-level flow looks good to me.

Also, I believe Christian would correct me in case I got anything
wrong :)

-- 
Sivaraam

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [GSoC][Draft Proposal v2] Finish converting git submodule to builtin
  2021-04-11 19:32       ` Kaartic Sivaraam
@ 2021-04-12  5:56         ` Atharva Raykar
  2021-04-12 13:29           ` Christian Couder
  0 siblings, 1 reply; 11+ messages in thread
From: Atharva Raykar @ 2021-04-12  5:56 UTC (permalink / raw)
  To: Kaartic Sivaraam; +Cc: Christian Couder, git, Shourya Shukla, Shourya Shukla

On 12-Apr-2021, at 01:02, Kaartic Sivaraam <kaartic.sivaraam@gmail.com> wrote:
> 
> Hi Atharva,
> 
> On 11/04/21 3:10 pm, Atharva Raykar wrote:
> 
>>> On 10-Apr-2021, at 18:29, Christian Couder <christian.couder@gmail.com> wrote:
>>> 
>>> On Thu, Apr 8, 2021 at 12:19 PM Atharva Raykar <raykar.ath@gmail.com> wrote:
>>>> 
>>>> Here's my updated draft. Changes since v1:
>>>> 
>>>> - Elaborated more on example porting strategy, stating how the patches
>>>>   could be broken up.
>>>> - Made language at the end of section 6 less ambiguous.
>>>> - Updated status of microproject.
>>>> - s/git/Git in several places.
>>> 
>>> Thanks for this summary of the changes since the previous version!
>>> 
> 
> Yeah. Summaries are really helpful :)
> 
> [ ... ]
> 
>>>> This
>>>>      would perform the work being done by the shell script past the flags
>>>>      being parsed and make the necessary call to `update_clone()', which
>>>>      returns information about the cloned modules.
>>> 
>>> How does it return information?
>>> 
>>>> For each cloned module,
>>>>      it will find out the update mode through `module_update_mode()', and
>>>>      run the appropriate operation according to that mode (like a rebase,
>>>>      if that was the update mode).
>>>> 
>>>>      One possible way this work can be broken up into multiple patches, is
>>>>      by moving over the shell code into C in a bottom-up manner.
>>>>      For example: The shell part which retrieves the latest revision in the
>>>>      remote (if --remote is specified) can be wrapped into a command of
>>>>      `submodule--helper.c'.
>>> 
>>> Could you give an example of how the command would be named, what
>>> arguments it would take and how it could be used?
>>> 
>>>> Then we can move the part where we run the
>>>>      update method (ie the `case' on line 611 onwards) into a C function.
>>> 
>>> Do you mean the code that does something like:
>>> 
>>>                       case "$update_module" in
>>>                       checkout)
>>>                               ...
>>>                       rebase)
>>>                               ...
>>>                       merge)
>>>                               ...
>>>                       !*)
>>>                               ...
>>>                       *)
>>>                               ...
>>>                       esac
>>> 
>>>                       if (sanitize_submodule_env; cd "$sm_path" &&
>>> $command "$sha1")
>>>                       then
>>>                               say "$say_msg"
>>>                       elif test -n "$must_die_on_failure"
>>>                       then
>>>                               die_with_status 2 "$die_msg"
>>>                       else
>>>                               err="${err};$die_msg"
>>>                               continue
>>>                       fi
>>> 
>>> ?
>>> 
>>> Could you also give an example of how the command would be named, what
>>> arguments it would take and how it could be used?
>> I could add more detail about the exact arguments each converted part
>> would take, but I feel a little hesitant because I will most likely
>> change my mind on a lot of those kind of lower-level decisions as I
>> understand the codebase better. The point I was trying to convey is
>> that the high-level workflow I would follow while converting would look
>> like this:
>> 1. Identify parts in git-submodule.sh that have cohesive functionality
>> 2. Rewrite that functionality in C, which can be invoked from
>>     `git submodule--helper <function name> <args>`
>> 3. Remove the shell code and replace it with the above invocation
>> 4. Once the shell code is reduced to only a bunch of calls to
>>     submodule--helper, wrap all of that into one call that looks like
>>     `git submodule--helper update <flags>` that encapsulates all the
>>     functionality done by the other helper function calls.
>> (In other words: I will cluster the functionality in a bottom-up way.
>> Maybe I should mention the above four points in my proposal?)
> 
> That sounds like a good idea which wouldn't result in one huge patch and
> thus avoids reviewer fatigue.
> 
>> The example I gave for how to handle the presence of the remote flag
>> and the function that performs the module updation method (ie, the `case`
>> on line 611) was just to illustrate the above workflow, rather than say
>> that this is how I will exactly do it.
>> I also would like to know what level of granularity is ideal for the
>> proposal. For now I have tried to keep it at "whatever I will surely
>> follow through when I work on the project", which at the moment is the
>> covered by the four points I mentioned above.
>> If I go too much into detail about the functions and arguments
>> of every helper in my example, I will feel compelled to do the same for
>> the `git submodule add` example. I also will have to reason more carefully
>> because I do not want to end up in a situation where I do not actually
>> stick to my proposal all that much, because I realise in my investigation
>> phase that there is a different, much better way.
>> Do let me know what is preferred.
> 
> It makes sense that you don't want to go into too much detail in your
> proposal. I think Christian wasn't expecting it either. As far as I
> understand, he was just trying to make your proposal clear to the person
> who reads it. Just mentioning something like,
> 
>  This would perform the work being done by the shell script past the
>  flags being parsed and make the necessary call to `update_clone()',
>  which returns information about the cloned modules.
> 
> is not clear as it doesn't say how you're "thinking" the function would
> return information. Mention this would be helpful for the reader to know
> what your expectations are and if they need any correction. So, it is
> better to mention such related information to make your proposal
> complete. The high-level flow looks good to me.

Alright, I get what you mean. I hope my v3 communicated my intention
more clearly. Translating my thoughts to text is hard work, and the
good part of revisiting my proposal and fleshing out the details is
it is forcing me to understand the problem better :)

> Also, I believe Christian would correct me in case I got anything
> wrong :)
> 
> -- 
> Sivaraam


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [GSoC][Draft Proposal v2] Finish converting git submodule to builtin
  2021-04-12  5:56         ` Atharva Raykar
@ 2021-04-12 13:29           ` Christian Couder
  0 siblings, 0 replies; 11+ messages in thread
From: Christian Couder @ 2021-04-12 13:29 UTC (permalink / raw)
  To: Atharva Raykar; +Cc: Kaartic Sivaraam, git, Shourya Shukla, Shourya Shukla

On Mon, Apr 12, 2021 at 7:56 AM Atharva Raykar <raykar.ath@gmail.com> wrote:
> On 12-Apr-2021, at 01:02, Kaartic Sivaraam <kaartic.sivaraam@gmail.com> wrote:
> > On 11/04/21 3:10 pm, Atharva Raykar wrote:

> >> The example I gave for how to handle the presence of the remote flag
> >> and the function that performs the module updation method (ie, the `case`
> >> on line 611) was just to illustrate the above workflow, rather than say
> >> that this is how I will exactly do it.
> >> I also would like to know what level of granularity is ideal for the
> >> proposal. For now I have tried to keep it at "whatever I will surely
> >> follow through when I work on the project", which at the moment is the
> >> covered by the four points I mentioned above.
> >> If I go too much into detail about the functions and arguments
> >> of every helper in my example, I will feel compelled to do the same for
> >> the `git submodule add` example. I also will have to reason more carefully
> >> because I do not want to end up in a situation where I do not actually
> >> stick to my proposal all that much, because I realise in my investigation
> >> phase that there is a different, much better way.
> >> Do let me know what is preferred.
> >
> > It makes sense that you don't want to go into too much detail in your
> > proposal. I think Christian wasn't expecting it either. As far as I
> > understand, he was just trying to make your proposal clear to the person
> > who reads it. Just mentioning something like,
> >
> >  This would perform the work being done by the shell script past the
> >  flags being parsed and make the necessary call to `update_clone()',
> >  which returns information about the cloned modules.
> >
> > is not clear as it doesn't say how you're "thinking" the function would
> > return information. Mention this would be helpful for the reader to know
> > what your expectations are and if they need any correction. So, it is
> > better to mention such related information to make your proposal
> > complete. The high-level flow looks good to me.
>
> Alright, I get what you mean. I hope my v3 communicated my intention
> more clearly. Translating my thoughts to text is hard work, and the
> good part of revisiting my proposal and fleshing out the details is
> it is forcing me to understand the problem better :)

Yeah, the idea is that you should try to show in your proposal that
you have understood some of the problems well enough. If there are
things that are not clear or not very detailed, they are not very
useful as they won't show us that you have understood much. It's
better to focus on a few things or examples and explain them clearly
and with enough detail, than to try to cover a lot of ground in a
vague way.

In other words if you can explain well a sensible plan to convert a
small part of the code, and give sensible details about that small
part, we can have trust that you will manage to do it for the whole
project even if some of the details change.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [GSoC][Draft Proposal v2] Finish converting git submodule to builtin
  2021-04-08 10:19 ` [GSoC][Draft Proposal v2] " Atharva Raykar
  2021-04-10 12:59   ` Christian Couder
  2021-04-11 10:17   ` [GSoC][Draft Proposal v3] " Atharva Raykar
@ 2021-05-14 16:00   ` Atharva Raykar
  2021-05-16 18:40     ` Kaartic Sivaraam
  2 siblings, 1 reply; 11+ messages in thread
From: Atharva Raykar @ 2021-05-14 16:00 UTC (permalink / raw)
  To: git; +Cc: Christian Couder, Shourya Shukla, Shourya Shukla, Kaartic Sivaraam

On 08-Apr-2021, at 15:49, Atharva Raykar <raykar.ath@gmail.com> wrote:
> 
> Here's my updated draft. Changes since v1:
> 
> - Elaborated more on example porting strategy, stating how the patches
>   could be broken up.
> - Made language at the end of section 6 less ambiguous.
> - Updated status of microproject.
> - s/git/Git in several places.
> 
> Markdown version: https://gist.github.com/tfidfwastaken/0c6ca9ef2a452f110a416351541e0f19
> 
> --8<-----8<-----8<-----8<-----8<-----8<-----8<-----8<-----8<-----8<-----8<--
>                          ___________________
> 
>                           GSOC GIT PROPOSAL
> 
>                             Atharva Raykar
>                          ___________________
> 
> 
> Table of Contents
> _________________
> 
> 1. Personal Details
> 2. Background
> 3. Me and Git
> .. 1. Current knowledge of Git
> 4. The Project: Finish converting `git submodule' to builtin
> 5. Prior work
> 6. General implementation strategy
> 7. Timeline (using the format dd/mm)
> 8. Beyond GSoC
> 9. Blogging
> 10. Final Remarks: A little more about me
> 
> 
> 1 Personal Details
> ==================
> 
>  Name : Atharva Raykar
>  Major : Computer Science and Engineering
>  Email : raykar.ath@gmail.com
>  IRC nick : atharvaraykar on #git and #git-devel
>  Address : RB 103, Purva Riviera, Marathahalli, Bangalore
>  Postal Code : 560037
>  Time Zone : IST (UTC+5:30)
>  GitHub : github.com/tfidfwastaken
> 
> 
> 2 Background
> ============
> 
>  I am Atharva Raykar, currently in my third year of studying Computer
>  Science and Engineering at PES University, Bangalore. I have always
>  enjoyed programming since a young age, but my deep appreciation for
>  good program design and creating the right abstractions came during my
>  exploration of the various rabbitholes of knowledge originating from
>  communities around the internet. I have personally enjoyed learning
>  about Functional Programming, Database Architecture and Operating
>  Systems, and my interests keep expanding as I explore more in this
>  field.
> 
>  I owe my appreciation of this rich field to these communities, and I
>  always wanted to give back. With that goal, I restarted the [PES Open
>  Source] community in our campus, with the goal of creating spaces
>  where members could share knowledge, much in the same spirit as the
>  communities that kickstarted my journey in Computer Science. I learnt
>  a lot about collaborating in the open, maintainership, and reviewing
>  code. While I have made many small contributions to projects in the
>  past, I am hoping GSoC will help me make the leap to a larger and more
>  substantial contribution to one of my favourite projects that made it
>  all possible in my journey with Open Source.
> 
> 
> [PES Open Source] <https://pesos.github.io>
> 
> 
> 3 Me and Git
> ============
> 
>  Here are the various forms of contributions that I have made to Git:
> 
>  - [Microproject] userdiff: userdiff: add support for Scheme Status: In
>    progress, patch v3 requiring a review List:
>    <https://lore.kernel.org/git/20210408091442.22740-1-raykar.ath@gmail.com/>
> 
>  - [Git Education] Conducted a workshop with attendance of hundreds of
>    students new to git, and increased the prevalence of of git's usage
>    in my campus.
>    Photos: <https://photos.app.goo.gl/T7CPk1zkHdK7mx6v7> and
>    <https://photos.app.goo.gl/bzTgdHMttxDen6z9A>
> 
>  I intend to continue helping people out on the mailing list and IRC
>  and tending to patches wherever possible in the meantime.
> 
> 
> 3.1 Current knowledge of Git
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> 
>  I use Git almost daily in some form, and I am fairly comfortable with
>  it. I have already read and understood the chapters from the Git Book
>  about submodules along with the one on objects, references, packfiles
>  and the refspec.
> 
> 
> 4 The Project: Finish converting `git submodule' to builtin
> ===========================================================
> 
>  Git has historically had many components implemented in the form of
>  shell scripts. This was less than ideal for several reasons:
>  - Portability: Non-POSIX systems like Windows don't play nice with
>    shell script commands like grep, cd and printf, to name a few, and
>    these commands have to be reimplemented for the system. There are
>    also POSIX to Windows path conversion issues.
>  - No direct access to plumbing: Shell commands do not have direct
>    access to the low level Git API, and a separate shell is spawned to
>    just to carry out their operations.
>  - Performance: Shell scripts tend to create a lot of child processes
>    which slows down the functioning of these commands, especially with
>    large repositories.
>  Over the years, many GSoC students have converted the shell versions
>  of these commands to C. Git `submodule' is the last of these to be
>  converted.
> 
> 
> 5 Prior work
> ============
> 
>  I will be taking advantage of the knowledge that was gained in the
>  process of the converting the previous scripts and avoiding all the
>  gotchas that may be present in the process. There may be a bunch of
>  useful helper functions in the previous patches that can be reused as
>  well (more investigation needed to determine what exactly is
>  reusable).
> 
>  Currently the only other commands left to be completed for `submodule'
>  are `add' and `update'. Work for `add' has already been started by a
>  previous GSoCer, Shourya Shukla, and needs to picked up from there.
>  `update' has had some of its functionality moved over to
>  `submodule--helper.c' where Stefan Beller added the helper functions
>  `update-clone', `update-module-mode', `remote-branch' and more.
> 
>  References:
>  <https://github.com/gitgitgadget/git/issues/541#issuecomment-769245064>
>  <https://github.com/git/git/commit/4d6d6ef1fc>
>  <https://github.com/git/git/commit/48308681b072a1d32e1361c255347324a8ad151e>
>  <https://github.com/git/git/commit/ee69b2a90c5031bffb3341c5e50653a6ecca89ac>
>  <https://github.com/git/git/commit/92bbe7ccf1fedac825f2c6ab4c8de91dc5370fd2>
> 
>  I'll have these as my references when I am working on the project:
>  His blog about his progress:
>  <https://shouryashukla.blogspot.com/2020/08/the-final-report.html>
>  (more has been implemented since)
>  Shourya's latest patch for `submodule add':
>  <https://lore.kernel.org/git/20201007074538.25891-1-shouryashukla.oo@gmail.com/>
> 
>  For the most part, the implementation looks fairly complete, but there
>  seems to be a segfault occurring, along with a few changes suggested
>  by the reviewers. It will be helpful to contact Shourya to fully
>  understand what needs to be done.
> 
>  Prathamesh's previous conversion work:
>  <https://lore.kernel.org/git/20170724203454.13947-1-pc44800@gmail.com/#t>
> 
>  The ultimate goal would be to get rid of `git-submodules.sh'
>  altogether -- which will complete the porting efforts of `submodule'
>  to C.
> 
> 
> 6 General implementation strategy
> =================================
> 
>  The way to port the shell to C code for `submodule' will largely
>  remain the same. There already exists the builtin
>  `submodule--helper.c' which contains most of the previous commands'
>  ports. All that the shell script for `git-submodule.sh' is doing for
>  the previously completed ports is parsing the flags and then calling
>  the helper, which does all the business logic.
> 
>  So I will be moving out all the business logic that the shell script
>  is performing to `submodule--helper.c'. Any reusable functionality
>  that is introduced during the port will be added to `submodule.c' in
>  the top level.
> 
>      For example: The general strategy for converting `cmd_update()' would
>      be to have a call to `submodule--helper' in the shell script to a
>      function which would resemble something like `module_update()'. This
>      would perform the work being done by the shell script past the flags
>      being parsed and make the necessary call to `update_clone()', which
>      returns information about the cloned modules. For each cloned module,
>      it will find out the update mode through `module_update_mode()', and
>      run the appropriate operation according to that mode (like a rebase,
>      if that was the update mode).
> 
>      One possible way this work can be broken up into multiple patches, is
>      by moving over the shell code into C in a bottom-up manner.
>      For example: The shell part which retrieves the latest revision in the
>      remote (if --remote is specified) can be wrapped into a command of
>      `submodule--helper.c'. Then we can move the part where we run the
>      update method (ie the `case' on line 611 onwards) into a C function.
>      Eventually, the shell part will just look like a bunch of invocations
>      to `submodule--helper', at which point, the whole thing can be
>      encapsulated in a single command called `git submodule--helper update'
>      (Bonus: Move the whole functionality to C, including the parsing of
>      flags, to work towards getting rid of `git-submodule.sh'). I believe
>      this is a fairly non-destructive and incremental way to work, and the
>      porting efforts by Stefan seem to follow this same kind of philosophy.
>      I will most likely end up tuning the size of these increments when I
>      get around to planning in my first phase of the project.
> 
>  After this process, I will be adding the `add' and `update' command to
>  the commands array in `submodule--helper.c'. And since these two
>  functions are the last bit of functionality left to convert in
>  submodules, an extended goal can be to get rid of the shell script
>  altogether, and make the helper into the actual builtin [1].
> 
>  [1]
>  <https://lore.kernel.org/git/nycvar.QRO.7.76.6.2011191327320.56@tvgsbejvaqbjf.bet/>

Hi all. I wanted to keep you informed, there have been some changes
in my personal schedule.

> 7 Timeline (using the format dd/mm)
> ===================================
> 
>  Periods of limited availability (read: hectic chaos):
>  - From 13/04 to 20/04 I will be having project evaluations and lab
>    assessments for five of my courses.
>  - From 20/04 to 01/05 I have my in-semester exams.
>  - For a period of two weeks in the range of 08/05 to 29/05 I will be
>    having my end-semester exams.
>  My commitment: I will still have time during my finals to help people
>  out on the mailing list, get acquainted with the community and its
>  processes, and even review patches if I can. This is because we get
>  holidays between each exam, and my grades are good enough to that I
>  can prioritise Git over my studies ;-)

Because of how hard COVID's second wave has hit my place, my exams
(which happen offline) had been indefinitely postponed. My university
has since given me the new dates for the finals -- either I give it on
June 2, or give it on an unannounced date in July. Either way it will
be happening during the GSoC coding period.

I just want to reiterate that I will be less available in the two weeks
during which the exams take place. I intend to work half-time on the
project only for those two weeks, and I don't mind working a little
extra on the other days to make up for it.

I will send another update for which slot I will be choosing *if* I get
selected for GSoC. Since the dates are shifting around a lot, according
to the situation, I will send updates on those as well.

>  And on the safe side, I will still engage with the community from now
>  till 07/06 so that the community bonding period is not compromised in
>  any way.
> 
>  Periods of abundant availability: After 29/05 all the way to the first
>  week of August, I will be having my summer break, so I can dedicate
>  myself to Git full-time :-)
> 
>  I would have also finished all my core courses, so even after that, I
>  will have enough of time to give back to Git past my GSoC period.
> 
>  Phase 1: 07/06 to 14/06 -- Investigate and devise a strategy to port
>  the submodule functions
>  - This phase will be more diagrams in my notebook than code in my
>    editor -- I will go through all the methods used to port the other
>    submodule functions and see how to do the same for what is left.
>  - I will find the C equivalents of all the shell invocations in
>    `git-submodule.sh', and see what invocations have /no/ equivalent
>    and need to be created as helpers in C (Eg: What is the equivalent
>    to the `ensure-core-worktree' invocation in C?). For all the helpers
>    and new functionality that I do introduce, I will need to create the
>    testing strategy for the same.
>  - I will go through all the work done by Shourya in his patch, and try
>    to understand it properly. I will also see the mistakes that were
>    caught in all the reviews for previous submodule conversion patches
>    and try to learn from them before I jump to the code.
>  - Deliverable: I will create a checklist for all the work that needs
>    to be done with as much detail as I can with the help of inputs from
>    my mentor and all the knowledge I have gained in the process.
> 
>  Phase 2: 14/06 to 28/06 -- Convert `add' to builtin in C
>  - I will work on completing `git submodule add'. One strategy would be
>    to either reimplement the whole thing using what was learnt in
>    Shourya's attempt, but it is probably wiser to just take his patch
>    and modify it. I would know what to do by the time I reach this
>    phase.
>  - I will also add tests for this functionality. I will also document
>    my changes when required. These would be unit tests for the helpers
>    introduced, and integration of `add' with the other commands.
>  - Deliverable: Completely port `add' to C!
> 
>  Phase 3: 28/06 to 16/08 -- Convert `update' to builtin
>  - Some work has already been done by Stephan Beller that moves the
>    functionality of `update' to `submodule--helper.c':
>    <https://github.com/git/git/commit/48308681b072a1d32e1361c255347324a8ad151e>,
>    but a lot of the business logic of going into the submodule and
>    checking out or merging or rebasing needs to still be converted.
>    Plenty to do here.
>  - As with `add', all of the appropriate tests need to be written and
>    the changes documented. As I have learnt from the Pro Git Book,
>    there are a lot of subtleties with how update does its work that I
>    need to watch out for.
>  - Deliverable: Completely port `update' to C!
> 
>  Bonus Phase: If I am ahead of time -- Remove the need for a
>  `submodule--helper', and make it a proper C builtin.
>  - Once all the submodule functionality is ported, the shell script is
>    not really doing much more than parsing the arguments and passing it
>    to the helper. We won't need this anymore if it is implemented.
> 
> 
> 8 Beyond GSoC
> =============
> 
>  I love the process of working as a community more than anything else,
>  and I already felt very welcomed by the Git community the moment I
>  started sending in my microproject patch series. Whether I am selected
>  or not, I will continue giving back to Git wherever I can. Since my
>  final year is light on coursework, I will be able to mentor people and
>  help expand the Git developer community through all the ways I can (be
>  it code review, helping people find the right resources or evangelism
>  of Git).
> 
> 
> 9 Blogging
> ==========
> 
>  I will be blogging about my progress on a weekly basis and either post
>  it on my website at <https://atharvaraykar.me> (probably will tuck it
>  away in a /gsoc path). Technical blogging is not particularly new to
>  me, and I hope my posts can help future contributors of Git.
> 
> 
> 10 Final Remarks: A little more about me
> ========================================
> 
>  These are some of my core values that I believe will be important to
>  pull off this project and make the most of my time in GSoC:
>  - Hard problems don't frustrate me, rather they excite me. Bugs make
>    my brain perk up. I love the process of learning.
>  - I am pro-transparency. If I am having some trouble, I will be open
>    about it. I don't hesitate to ask questions and dig deep if I need
>    to.
>  - At the same time, when I ask a question, I only do so after I have
>    struggled with the problem for enough time and done my due diligence
>    in trying to solve it. Clear communication is very important to make
>    this work.
>  - I am also very comfortable with learning things all on my own (I
>    have barely known any other way), and working in a remote,
>    asynchronous setting.
>  I hope to make the world better in my own small way by contributing to
>  a tool that everyone uses and I like. It's more rewarding than any
>  internship that my peers are doing this year. I look forward to
>  learning more.
> 

--
Atharva Raykar


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [GSoC][Draft Proposal v2] Finish converting git submodule to builtin
  2021-05-14 16:00   ` [GSoC][Draft Proposal v2] " Atharva Raykar
@ 2021-05-16 18:40     ` Kaartic Sivaraam
  0 siblings, 0 replies; 11+ messages in thread
From: Kaartic Sivaraam @ 2021-05-16 18:40 UTC (permalink / raw)
  To: Atharva Raykar; +Cc: git, Christian Couder, Shourya Shukla, Shourya Shukla

Hi,

When quoting, only retain the most relevant portions of the e-mail. That
would help the readers of the e-mail a lot :)

On Fri, May 14, 2021 at 9:30 PM Atharva Raykar <raykar.ath@gmail.com> wrote:
>
> Hi all. I wanted to keep you informed, there have been some changes
> in my personal schedule.
>
> > 7 Timeline (using the format dd/mm)
> > ===================================
> >
> >  Periods of limited availability (read: hectic chaos):
> >  - From 13/04 to 20/04 I will be having project evaluations and lab
> >    assessments for five of my courses.
> >  - From 20/04 to 01/05 I have my in-semester exams.
> >  - For a period of two weeks in the range of 08/05 to 29/05 I will be
> >    having my end-semester exams.
> >  My commitment: I will still have time during my finals to help people
> >  out on the mailing list, get acquainted with the community and its
> >  processes, and even review patches if I can. This is because we get
> >  holidays between each exam, and my grades are good enough to that I
> >  can prioritise Git over my studies ;-)
>
> Because of how hard COVID's second wave has hit my place, my exams
> (which happen offline) had been indefinitely postponed. My university
> has since given me the new dates for the finals -- either I give it on
> June 2, or give it on an unannounced date in July. Either way it will
> be happening during the GSoC coding period.
>
> I just want to reiterate that I will be less available in the two weeks
> during which the exams take place. I intend to work half-time on the
> project only for those two weeks, and I don't mind working a little
> extra on the other days to make up for it.
>
> I will send another update for which slot I will be choosing *if* I get
> selected for GSoC. Since the dates are shifting around a lot, according
> to the situation, I will send updates on those as well.
>

Thanks for letting us know.

On a general note, this year's GSoC is unlike previous years. There's
flexibility in the program that should help in situations like these. Some
information about it could be found here:

https://opensource.googleblog.com/2020/10/google-summer-of-code-2021-is-bringing.html

-- 
Sivaraam

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2021-05-16 18:40 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-04-03 14:08 [GSoC][Draft Proposal] Finish converting git submodule to builtin Atharva Raykar
2021-04-05 16:02 ` Christian Couder
2021-04-08 10:19 ` [GSoC][Draft Proposal v2] " Atharva Raykar
2021-04-10 12:59   ` Christian Couder
2021-04-11  9:40     ` Atharva Raykar
2021-04-11 19:32       ` Kaartic Sivaraam
2021-04-12  5:56         ` Atharva Raykar
2021-04-12 13:29           ` Christian Couder
2021-04-11 10:17   ` [GSoC][Draft Proposal v3] " Atharva Raykar
2021-05-14 16:00   ` [GSoC][Draft Proposal v2] " Atharva Raykar
2021-05-16 18:40     ` Kaartic Sivaraam

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.