Git Mailing List Archive on
* [GSoC][RFC][Proposal v5] Convert submodule to builtin
@ 2020-03-25 18:50 Shourya Shukla
From: Shourya Shukla @ 2020-03-25 18:50 UTC
  To: git
  Cc: christian.couder, peff, heba.waly, newren, Johannes.Schindelin,
	Shourya Shukla

Hello everyone,

Thank you so much for reviewing my proposal v3 and v4, Christian! :)
This is the fifth draft of my GSoC proposal. After discussions with Christian, I improved the timeline and made the changes listed below.

Also, could you please tell me which contributions are worth keeping in the 'Contributions to Git' section? I have listed almost everything here. I have
also answered multiple questions on StackOverflow. There is no need to link them separately, right? My StackOverflow profile is mentioned below, by the way.

Changes made:
	1. Added more entries in the 'Contributions to Git' section
	2. Improved the 'Outline' section
	3. Added the 'Submodules and git submodule' section
	4. Added more links in the 'Contribution process and interaction with the mentors' section
	5. Improved the 'Project Timeline' section. Made some changes to the timeline
	6. Made some additions in the 'Workflow' section
	7. Corrected grammatical errors and spelling mistakes

PS: A prettier version of this proposal is on Google Docs; it is more readable than the plain-text version :)
Google Docs:



# Convert submodule to builtin

## Contact Information

Name          : Shourya Shukla
Major         : Computer Science and Engineering
E-mail        :
IRC           : rasengan_chidori on #git & #git-devel
Mobile no     : <<mobile no>>
GitHub        : periperidip
Linkedin      : shuklashourya
StackOverflow : rasengan__
Website       :
Address       : <<address>>
Time Zone     : IST (UTC +0530)

## Background

I am Shourya Shukla, a sophomore in Computer Science and Engineering at the Indian Institute of Technology Roorkee.
I was introduced to programming at a young age and have been trying to learn new concepts every day since. My
interests include modern mobile networks, the Internet of Things, system software development and cryptography. I have been working
on a research project which involves providing cellular network
access to users in a disaster-struck area via drones. I love low-level coding and FLOSS as well. I have been an active member of
the Git community since January of this year.

## Work Environment

I am fluent in C/C++, Java and shell scripting, and have an understanding of Python
as well. I use Git as my VCS, Visual Studio Code with integrated GDB as my
primary code editor, and Ubuntu 19.10 as my primary operating system unless the
work specifically needs Windows.

## Contributions to Git

Contributing to Git has helped me understand a lot about how modern, robust software
works as well as how real-world development takes place. I plan on contributing even
more to Git and making my contributions count. As of now, my contributions to Git are:

status: merged
[Microproject]: Modernise tests and use helper functions in test script.

[Solved doubt]: fatal: cannot rebase with locally recorded submodule modifications

[Aided a new contributor]: Need help to start contributing

[Aided a potential GSoC student]: [GSoC] Microproject for git

[Reviewed a Microproject]: [GSoC][PATCH 1/2] t4131: modernize style

## The Project: Convert submodule to builtin

#### Outline

Some Git commands were initially implemented in shell script, with some instances of Perl as well. As time progressed, Git came to run on more platforms and projects grew large (spanning millions of lines of code), which exposed problems in production-level code:

- There were issues with portability of code. The submodule shell script uses commands such as echo, grep, cd, test and printf, to name a few. On systems that are not POSIX-compliant, one
has to re-implement these commands for the specific system. There are also POSIX-to-Windows path conversion issues. To fix these issues, it was decided to convert these scripts into
portable C code (portability being one of the original goals C was developed for).

- There is a large overhead involved in calling the command. As the commands implemented in shell script are not builtins, they go through multiple fork() and exec() syscalls to create child processes, which includes starting another shell. This overhead takes a heavy toll on big repositories, both in the time taken to run a command and in the extra memory consumed.

- If a command uses other commands inside it (such as git submodule using git rev-parse, git ls-files and git add, to name a few), the overhead mentioned in the point above is compounded, which again contributes to slowing down the whole Git suite.
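
To make the process-creation cost above concrete, here is a minimal sketch of what running a single external command involves on a POSIX system. The helper name run_in_child() is hypothetical and is not part of Git; a shell script pays roughly this price for each external command it runs, whereas a builtin can call a C function in the same process.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of Git): run argv[0] in a child
 * process and return its exit status, or -1 on failure.
 */
int run_in_child(char *const argv[])
{
	pid_t pid;
	int status;

	assert(argv && argv[0]);

	pid = fork();			/* duplicate the current process */
	if (pid < 0)
		return -1;
	if (pid == 0) {
		execvp(argv[0], argv);	/* replace the child with the command */
		_exit(127);		/* only reached if exec failed */
	}
	if (waitpid(pid, &status, 0) < 0)	/* parent blocks until the child exits */
		return -1;
	return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Every call performs one fork(), one exec() and one wait; a script that chains several such commands repeats this sequence for each of them.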

Various commands have been converted so far for the reasons mentioned above, such as add, blame, bisect (work in progress), etc. In my project, I intend to convert submodule fully into C, hence making it a ‘builtin’.

#### Submodules and git submodule

A submodule, as defined in the gitglossary, is “A repository that holds the history of a separate project inside another repository (the latter of which is called superproject).” In other words, it is an independent Git repository inside another Git repository.

Submodules are used when we need some work from an external repository (say, a particular library such as boost) but want to keep it “independent” from our own repository. The submodule does not interfere with the superproject’s tree, because the submodule’s commits are not put on top of, or in fact anywhere into, the superproject’s tree. From the superproject’s point of view, a change in the submodule is reflected like a change in any other directory. In a nutshell, the submodule’s tree is independent of the superproject’s tree.

Git, for instance, uses the sha1collisiondetection repository as a submodule.
'git submodule' is the command used to manipulate and deal with submodules. Our aim is to convert this command from its shell form into its C form.

#### Previous Work

There has been ongoing work on converting various Git commands such as 'add', 'commit', 'blame', etc. from their shell form into their C form. 'git submodule' is one of the commands that has yet to be fully converted. Stefan Beller converted a large part of this command up until 2019. Prathamesh Chavan also helped convert the command during his GSoC project in 2017. In its current state, four git submodule subcommands are due for conversion, namely 'add', 'set-branch', 'set-url' and 'summary'. The command-line parsing interface also needs improvements, such as better error messages and support for more subcommands.
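
As a rough illustration of the command-line parsing point, a table-driven dispatcher can report unknown subcommands with a clear message. This is only a sketch under assumptions: struct subcmd, stub_handler() and dispatch_subcmd() are hypothetical names chosen for this example, the handlers are empty stubs, and none of it is Git's actual code.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

typedef int (*subcmd_fn)(int argc, const char **argv);

/* Hypothetical table entry: a subcommand name and its handler. */
struct subcmd {
	const char *name;
	subcmd_fn fn;
};

/* Stub standing in for a real frontend function. */
static int stub_handler(int argc, const char **argv)
{
	(void)argc;
	(void)argv;
	return 0;
}

static const struct subcmd subcmds[] = {
	{ "add", stub_handler },
	{ "set-branch", stub_handler },
	{ "set-url", stub_handler },
	{ "summary", stub_handler },
};

/* Look up argv[0] in the table; return -1 with a message if it is unknown. */
int dispatch_subcmd(int argc, const char **argv)
{
	size_t i;

	assert(argv);
	if (argc < 1) {
		fprintf(stderr, "error: a subcommand is required\n");
		return -1;
	}
	for (i = 0; i < sizeof(subcmds) / sizeof(subcmds[0]); i++)
		if (!strcmp(argv[0], subcmds[i].name))
			return subcmds[i].fn(argc, argv);
	fprintf(stderr, "error: '%s' is not a valid subcommand\n", argv[0]);
	return -1;
}
```

With this layout, supporting another subcommand means adding one line to the table.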

Prathamesh implemented and improved the subcommands status, sync, deinit and some more. This is relevant to my project because some helper functions (located in submodule.c), such as print_submodule_summary(), prepare_submodule_summary(), etc., have already been implemented. For the summary subcommand, I will use these functions, integrate them with the basic scaffolding (mentioned in the status overview below) and implement the module_summary() frontend function. He also ported various helper functions such as set_name_rev(). He kept offering improvements to his conversions until around January 2018.

Stefan Beller laid the foundation for the subcommand init and finished its implementation. He implemented foreach and improved deinit and update as well. He also ported various helper functions such as resolve_relative_url().

#### Current Status of the subcommand and future vision

The current status of the conversion as well as the direction I will take for the conversion of the subcommands are as follows:

add: pending conversion; the full code still needs to be written. I need to implement the callback structure and macro, i.e. struct add_cb and
ADD_CB_INIT, as well as the frontend function module_add(). Other helper functions may be needed in the process as well. I will compare with the shell
script and “translate” it into C. I estimate around 400-500 lines of code for this (including helper functions).

set-branch: pending conversion; the full code still needs to be written. I need to implement the structure and macro, i.e. struct setbranch and
SETBRANCH_CB_INIT, as well as the frontend function module_setbranch(). Other helper functions may be needed in the process as well; already
implemented ones such as remote_submodule_branch() and get_default_remote() may prove helpful. I will compare with the shell
script and “translate” it into C. This subcommand may take about 200 lines of C code to implement (including helper functions).

set-url: pending conversion; the full code still needs to be written. I need to implement the structure and macro, i.e. struct seturl and
SETURL_CB_INIT, as well as the frontend function module_seturl(). Other helper functions may be needed in the process as well; already
implemented ones such as remote_url() and resolve_relative_url() may prove helpful. I will compare with the shell script and “translate” it
into C. It will have a similar implementation to set-branch because both are “setter” functions. This subcommand may take about 200 lines
of C code to implement (including helper functions).

summary: pending conversion, work in progress. The callback structures, functions and macros have been created, and the basic scaffolding of the command
is done, i.e. the functions module_summary(), summary_submodule() and summary_submodule_cb(). As this is a prototype, some functions may be scrapped
or added later. Other functions that complement the subcommand already exist; I will learn from the mistakes in Prathamesh's earlier work and implement improved code.
After discussions with Junio C Hamano,
I also intend to add a “--recursive” option to summary so as to obtain summaries of nested submodules.
I estimate about 400 lines of code for this subcommand (excluding the “--recursive” option, but including the helper functions).

status: conversion complete, currently in a functional state.

init: conversion complete, currently in a functional state.

deinit: conversion complete, currently in a functional state.

update: conversion complete, currently in a functional state.

foreach: conversion complete, currently in a functional state.

sync: conversion complete, currently in a functional state.

absorbgitdirs: conversion complete, currently in a functional state.

I aim to follow the same approach as Stefan and Prathamesh, that is, I will create a scaffolding first (based on the already implemented subcommands). This will be followed by a comparison with the shell script to pick out which helper functions are needed, reusing functions already implemented in 'submodule.c' and 'submodule--helper.c' wherever possible.
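
To give an idea of what such a scaffolding might look like for set-url, here is a minimal, self-contained sketch. Only the names struct seturl, SETURL_CB_INIT and module_seturl() come from this proposal; the struct fields, the function signatures, the usage string and the helper seturl_config_key() are assumptions made for illustration, and the sketch deliberately uses none of Git's internal APIs.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Callback data for set-url; the fields are assumed for this sketch. */
struct seturl {
	const char *path;
	const char *newurl;
};
#define SETURL_CB_INIT { NULL, NULL }

/*
 * Hypothetical helper: write the .gitmodules key "submodule.<name>.url"
 * into buf. Returns 0 on success, -1 if buf is too small.
 */
int seturl_config_key(const char *name, char *buf, size_t len)
{
	int n = snprintf(buf, len, "submodule.%s.url", name);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Frontend sketch: parse "set-url [--] <path> <newurl>" into info.
 * Returns 0 on success and 1 on a usage error.
 */
int module_seturl(int argc, const char **argv, struct seturl *info)
{
	int i = 1;	/* argv[0] is the subcommand name */

	assert(info);
	if (i < argc && !strcmp(argv[i], "--"))
		i++;
	if (argc - i != 2) {
		fprintf(stderr, "usage: git submodule set-url [--] <path> <newurl>\n");
		return 1;
	}
	info->path = argv[i];
	info->newurl = argv[i + 1];
	return 0;
}
```

A real implementation would go on to write the new URL to the configuration and sync the submodule, which this sketch leaves out.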

Though there is a gap of about three years between their work and mine, the model for porting seems to be consistent even if the coding style varies, and following it might even yield improvements over previous implementations.

#### Contribution process and interaction with the mentors

I will keep committing changes to my GitHub fork and finally post a patch series on the Mailing List. I will make sure to keep interacting with the community as well as the mentors regularly. I aim to write weekly “progress report” blog posts, which I will publish on my website as well as on the List. Apart from that, I will document anything new I learn, as well as my journey in the GSoC program, on my blog and maybe as self-answered questions on StackOverflow, so that they can serve as a reference for me as well as for others.

#### Project Timeline

I have been studying the code of 'submodule.c', 'submodule--helper.c' and ''
since the submission of my microproject. After studying the code, I tried to devise an effective
conversion strategy for 'submodule'. I noticed that 'submodule.c' contains various helper functions
for 'submodule--helper.c', whereas the latter houses the main "converted" command as of now.
The subcommands ‘set-branch’ and ‘set-url’ should be easy to convert due to the vast array of helper functions already available for them. Hence, I intend to implement them before the other subcommands, both because they are simple to implement and because completing them will motivate me to do more.

After considering a lot of things, and following important advice from Christian Couder, I have decided that I will first implement ‘set-url’ and ‘set-branch’, followed by ‘summary’ and finally ‘add’. Integration testing and documentation updates will follow each implementation. In addition, the conversion of summary might become a bit easier due to the existence of a patch to convert it, which will help me learn from the mistakes made before and thus offer an improved version of the subcommand.

Therefore, after all these considerations, the timeline looks like:

- Empty Period (Present - April 26)
--> I am writing a paper (on the project I have been working on) for a conference, which I have to finalise and submit by the first week of April. Hence, I might be inactive in that period
--> My end-semester exams begin on April 23 (tentative, may change due to the Corona pandemic), hence I might be a bit busy for a week or so before they commence as well as during the 14 days in which the exams take place
--> I plan on starting the conversion of ‘set-url’ and ‘set-branch’ in this period. Although I will be busy, I will try my best to implement a basic scaffolding and maybe even complete a good portion of the subcommands, and will keep my mentors posted regarding the same

- Community Bonding Period (April 27 - May 18)
--> Get familiar with the community
--> Improve the project workflow: make some timeline changes if necessary
--> Finish implementation of ‘set-url’ and ‘set-branch’ subcommands
--> Update the Documentation

- Phase 1 (May 19 - June 6)
--> Convert ‘summary’ subcommand
--> Improve CLI parsing(give out better error messages)
--> Update the Documentation
--> Add appropriate tests for integration testing of ‘set-url’, ‘set-branch’ and ‘summary’

- Phase 2 (June 7 - August 8)
--> Convert 'add' subcommand
--> Improve the remaining bits of the CLI parsing
--> Update the Documentation
--> Add appropriate tests for integration testing of ‘add’ with the whole system

- Final Phase (August 9 - August 17)
--> Improve and add Documentation(if there is any still left)
--> Apply final touch-ups to code

If there is some extra time left, I will try to implement some BONUS features.

**BONUS features:** These consist of command touch-ups and fixing some bugs, such as code sections with 'NEEDSWORK' tags, improving the test files and maybe improving some previous implementations of helper functions. Also, there are some incomplete bits of the ‘update’ subcommand in the shell file, as pointed out by Dscho, which may need to be corrected.

## Workflow

I have divided the project into 3 subprojects(SP).

1. **SP 1:** Convert ‘set-branch’ and ‘set-url’
2. **SP 2:** Convert ‘summary’ and improve CLI (Command Line Interface) parsing
3. **SP 3:** Convert ‘add’ and improve CLI parsing

After discussions with Christian Couder, I plan to start SP1 before GSoC itself. Currently,
I am studying the code in detail and constructing a scaffolding for this implementation.
I aim to complete the leftover bits of SP1 during Phase 1 and SP2 & SP3 during Phase 2 of

As Derrick Stolee advised,
the conversion may not be possible in one whole summer. Hence, I think an early start might be needed to finish things in time, if possible.

As of now (March 21, UTC), my progress is described by the following commit.
I have almost implemented the frontend function module_summary(). I hope to increase my work speed once I get the hang of the inner workings and coding style
of the command.

## Availability

The official GSoC period starts from April 27 and ends on August 17. My vacations start
from May 10 and will be over by July 13. I can easily devote 45-50 hours per week until
the commencement of my Semester. Other than this project, I have no commitments planned
for my vacations. I shall keep the community posted in case of any change in plans.

## Post GSoC

Even after the completion of Google Summer of Code, I plan on continuing my contributions
to Git, on the technical front(in terms of code and documentation contributions) as well
as on the social front(solving people's doubts/problems on the List as well as on StackOverflow).
I envision converting the remaining commands, as pointed out by Dscho,
as well as improving the test files.

I aim to develop mentorship skills as well as the ability to guide others, and to give back to
the community by mentoring and guiding others (by reviewing their code, helping them out, etc.).

## Final Remarks

I have a habit of not giving up. I will keep trying things until I succeed at them. The same was the case with learning to use Git in my freshman year. I was so scared of it for some reason that I refrained from using ‘git bash’. But I knew that I had to master this tool (or at least learn it to a satisfactory extent) because of the utility it has in a programmer’s life. I kept going, watching tons of tutorials, reading the documentation and articles, and lo, here I am writing code for Git.

I hope that you give me the chance to showcase my abilities by considering my proposal for working with you during the summer of 2020. Really looking forward to learning from you :)
