All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/6] Minimal patch series to fix the CI runs of hn/reftable
@ 2020-05-04 13:31 Johannes Schindelin via GitGitGadget
  2020-05-04 13:31 ` [PATCH 1/6] fixup! Add reftable library Johannes Schindelin via GitGitGadget
                   ` (7 more replies)
  0 siblings, 8 replies; 12+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2020-05-04 13:31 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Johannes Schindelin

These patches are not intended to be complete, not by any stretch of
imagination. They are just enough to get the CI run to pass, even in the
Windows-specific parts.

As I mentioned elsewhere, I would much prefer for hn/reftable to not 
re-invent get_be*(), struct strbuf, struct string_list, struct lock_file 
etc.

However, in the context of the test failures, I do not think that this would
have made a big difference. Apart from the unportable constructs, and from
the "delete/rename while there is still a handle on the file" issues, it
would appear that one big reason why it was so hard to debug and fix the
test is the recursive complexity of the data structures.

To elaborate on that: struct reftable_stack has an attribute called merged 
that has an array of struct reftable_reader * (confusingly called "stack").
Each of these readers has an attribute of type struct reftable_block_source 
that uses a vtable and an opaque pointer to abstract three types of block
sources: file, slice (which is apparently unused, i.e. it is apparently just
dead weight for now) and malloc.

I am not sure that this abstraction fest serves us well here.

Quite honestly, I would have expected the packed_git data structure to be
imitated more closely, as it strikes me as similar in purpose, and it has
seen a ton of testing over the past decade and a half. I could not recognize
that design in the reftable, though.

It is quite obvious, of course, that the code tries to imitate the
object-oriented nature of the Go code, but it seems quite obvious from my
difficulties to address the problem where stack_compact_range() would try to
delete stale reftable files (without even so much as a warning when the
files could not be deleted!) without releasing all file handles to all
reftable files, even the ones that do not need to be deleted. To be smarter
about this, the code has to know more about the nature of the block source
than the abstraction layer suggests. It has to know that a block source
refers to a file, and that that file is marked for deletion. My heavy-handed
work-around, even if it works, is not really a good solution, but it is a
testament that there is a lot of opportunity to simplify the code
drastically while at the same time simplifying the design, too.

I know you have been putting a lot of effort into this library, so I feel a
bit bad about saying the following: The hn/reftable patches will need
substantial work before we can merge it with confidence. Part of the reason
why it is so hard to review the reftable patches is that they intentionally 
refuse to integrate well within Git's source code, such as (re-)implementing
its own fundamental data structures, intentionally using a totally different
coding style, and it concerns me that the stated requirement for bug fixes
is to treat Git's source code as a downstream of the actual project. I am
not too bad a C developer and would consider myself competent in debugging
issues, even hard ones, in Git, and yet... it was really hard to wade
through the layers of abstraction to figure out where the file handles
should be closed that were opened and prevented deleting/renaming files.

At this point, I don't feel that it makes sense to keep insisting on having
this in a separate library. The only other user of that library will be
libgit2, and I really think that given libgit2's heritage, it won't be a
problem to adapt the code after it stabilized in git.git (and since libgit2
treats git.git as upstream for the libxdiff code, it won't be a problem to
do the same for the reftable code, too). I believe that the best course of
action is to reuse the data structures libgit.a provides, and to delete the
re-implementations in reftable/. Only then can we start working effectively
on refactoring the code to simplify the data structures in order to clarify
resource usage (which was the root cause for the bugs I fixed, although I am
sure that there are way more lurking in there, hidden by the fact that the
code is not covered thoroughly by our tests).

Johannes Schindelin (6):
  fixup! Add reftable library
  fixup! Add reftable library
  fixup! Add reftable library
  fixup! Add reftable library
  fixup! Add reftable library
  vcxproj: adjust for the reftable changes

 config.mak.uname                           |  2 +-
 contrib/buildsystems/Generators/Vcxproj.pm | 11 ++++++++++-
 reftable/record.c                          |  2 +-
 reftable/stack.c                           | 17 +++++++++++++++--
 4 files changed, 27 insertions(+), 5 deletions(-)


base-commit: 1b749f9d62f2c8b5a8e91f382d2be14902459cba
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-623%2Fdscho%2Freftable-on-windows-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-623/dscho/reftable-on-windows-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/623
-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH 1/6] fixup! Add reftable library
  2020-05-04 13:31 [PATCH 0/6] Minimal patch series to fix the CI runs of hn/reftable Johannes Schindelin via GitGitGadget
@ 2020-05-04 13:31 ` Johannes Schindelin via GitGitGadget
  2020-05-04 13:31 ` [PATCH 2/6] " Johannes Schindelin via GitGitGadget
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 12+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2020-05-04 13:31 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Johannes Schindelin, Johannes Schindelin

From: Johannes Schindelin <johannes.schindelin@gmx.de>

Close files before trying to unlink them. This is actually required on
Windows.

Note: this is probably not the only part of the code that requires this
type of fix, but it seems to be enough to let the _current_ version of
t0031 pass.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 reftable/stack.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/reftable/stack.c b/reftable/stack.c
index e7b625d924a..2e32f7671c2 100644
--- a/reftable/stack.c
+++ b/reftable/stack.c
@@ -852,6 +852,10 @@ static int stack_compact_range(struct reftable_stack *st, int first, int last,
 		}
 	}
 
+	if (lock_file_fd > 0) {
+		close(lock_file_fd);
+		lock_file_fd = 0;
+	}
 	err = unlink(slice_as_string(&lock_file_name));
 	if (err < 0) {
 		goto exit;
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 2/6] fixup! Add reftable library
  2020-05-04 13:31 [PATCH 0/6] Minimal patch series to fix the CI runs of hn/reftable Johannes Schindelin via GitGitGadget
  2020-05-04 13:31 ` [PATCH 1/6] fixup! Add reftable library Johannes Schindelin via GitGitGadget
@ 2020-05-04 13:31 ` Johannes Schindelin via GitGitGadget
  2020-05-04 13:31 ` [PATCH 3/6] " Johannes Schindelin via GitGitGadget
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 12+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2020-05-04 13:31 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Johannes Schindelin, Johannes Schindelin

From: Johannes Schindelin <johannes.schindelin@gmx.de>

Do close the file descriptors to obsolete files before trying to delete
them. This is actually required on Windows.

Note: this patch is probably a bit heavy-handed, releasing more than
necessary (which will then have to be re-read). But it seems to be
enough to let t0031 pass on Windows, so it is at least a clue as to what
the fix will look like eventually.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 reftable/stack.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/reftable/stack.c b/reftable/stack.c
index 2e32f7671c2..3c5f4a43130 100644
--- a/reftable/stack.c
+++ b/reftable/stack.c
@@ -941,6 +941,10 @@ static int stack_compact_range(struct reftable_stack *st, int first, int last,
 	}
 	have_lock = false;
 
+	reftable_merged_table_close(st->merged);
+	merged_table_clear(st->merged);
+	reftable_merged_table_free(st->merged);
+	st->merged = NULL;
 	{
 		char **p = delete_on_success;
 		while (*p) {
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 3/6] fixup! Add reftable library
  2020-05-04 13:31 [PATCH 0/6] Minimal patch series to fix the CI runs of hn/reftable Johannes Schindelin via GitGitGadget
  2020-05-04 13:31 ` [PATCH 1/6] fixup! Add reftable library Johannes Schindelin via GitGitGadget
  2020-05-04 13:31 ` [PATCH 2/6] " Johannes Schindelin via GitGitGadget
@ 2020-05-04 13:31 ` Johannes Schindelin via GitGitGadget
  2020-05-04 13:31 ` [PATCH 4/6] " Johannes Schindelin via GitGitGadget
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 12+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2020-05-04 13:31 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Johannes Schindelin, Johannes Schindelin

From: Johannes Schindelin <johannes.schindelin@gmx.de>

A `void` function cannot return a value.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 reftable/record.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/reftable/record.c b/reftable/record.c
index b0f18f26c55..5f48be1639e 100644
--- a/reftable/record.c
+++ b/reftable/record.c
@@ -1005,7 +1005,7 @@ int record_decode(struct record rec, struct slice key, byte extra,
 
 void record_clear(struct record rec)
 {
-	return rec.ops->clear(rec.data);
+	rec.ops->clear(rec.data);
 }
 
 bool record_is_deletion(struct record rec)
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 4/6] fixup! Add reftable library
  2020-05-04 13:31 [PATCH 0/6] Minimal patch series to fix the CI runs of hn/reftable Johannes Schindelin via GitGitGadget
                   ` (2 preceding siblings ...)
  2020-05-04 13:31 ` [PATCH 3/6] " Johannes Schindelin via GitGitGadget
@ 2020-05-04 13:31 ` Johannes Schindelin via GitGitGadget
  2020-05-04 13:31 ` [PATCH 5/6] " Johannes Schindelin via GitGitGadget
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 12+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2020-05-04 13:31 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Johannes Schindelin, Johannes Schindelin

From: Johannes Schindelin <johannes.schindelin@gmx.de>

`usleep()` is unportable. We need to find a way _not_ to use it.

For Visual C, we can use `sleep_millisec()`, but the current design of
libreftable seems to be _really_ keen _not_ to depend on anything in
libgit.a.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 reftable/stack.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/reftable/stack.c b/reftable/stack.c
index 3c5f4a43130..2d0b831dda1 100644
--- a/reftable/stack.c
+++ b/reftable/stack.c
@@ -289,7 +289,12 @@ static int reftable_stack_reload_maybe_reuse(struct reftable_stack *st,
 		free_names(names_after);
 
 		delay = delay + (delay * rand()) / RAND_MAX + 100;
+#ifdef _MSC_VER
+		sleep_millisec(delay/1000);
+#else
+		/* TODO! fix this, `usleep()` is not portable enough for us to use */
 		usleep(delay);
+#endif
 	}
 
 	return 0;
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 5/6] fixup! Add reftable library
  2020-05-04 13:31 [PATCH 0/6] Minimal patch series to fix the CI runs of hn/reftable Johannes Schindelin via GitGitGadget
                   ` (3 preceding siblings ...)
  2020-05-04 13:31 ` [PATCH 4/6] " Johannes Schindelin via GitGitGadget
@ 2020-05-04 13:31 ` Johannes Schindelin via GitGitGadget
  2020-05-04 13:31 ` [PATCH 6/6] vcxproj: adjust for the reftable changes Johannes Schindelin via GitGitGadget
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 12+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2020-05-04 13:31 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Johannes Schindelin, Johannes Schindelin

From: Johannes Schindelin <johannes.schindelin@gmx.de>

Yet another instance of `= {}` initialization.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 reftable/stack.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/reftable/stack.c b/reftable/stack.c
index 2d0b831dda1..2f3dfd51861 100644
--- a/reftable/stack.c
+++ b/reftable/stack.c
@@ -19,7 +19,7 @@ int reftable_new_stack(struct reftable_stack **dest, const char *dir,
 {
 	struct reftable_stack *p =
 		reftable_calloc(sizeof(struct reftable_stack));
-	struct slice list_file_name = {};
+	struct slice list_file_name = { 0 };
 	int err = 0;
 
 	if (config.hash_id == 0) {
@@ -417,7 +417,7 @@ static int reftable_stack_init_addition(struct reftable_addition *add,
 void reftable_addition_close(struct reftable_addition *add)
 {
 	int i = 0;
-	struct slice nm = {};
+	struct slice nm = { 0 };
 	for (i = 0; i < add->new_tables_len; i++) {
 		slice_set_string(&nm, add->stack->list_file);
 		slice_append_string(&nm, "/");
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 6/6] vcxproj: adjust for the reftable changes
  2020-05-04 13:31 [PATCH 0/6] Minimal patch series to fix the CI runs of hn/reftable Johannes Schindelin via GitGitGadget
                   ` (4 preceding siblings ...)
  2020-05-04 13:31 ` [PATCH 5/6] " Johannes Schindelin via GitGitGadget
@ 2020-05-04 13:31 ` Johannes Schindelin via GitGitGadget
  2020-05-04 17:12   ` Junio C Hamano
  2020-05-04 15:56 ` [PATCH 0/6] Minimal patch series to fix the CI runs of hn/reftable Junio C Hamano
  2020-05-04 17:04 ` Han-Wen Nienhuys
  7 siblings, 1 reply; 12+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2020-05-04 13:31 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Johannes Schindelin, Johannes Schindelin

From: Johannes Schindelin <johannes.schindelin@gmx.de>

This allows Git to be compiled via Visual Studio again after integrating
the `hn/reftable` branch.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 config.mak.uname                           |  2 +-
 contrib/buildsystems/Generators/Vcxproj.pm | 11 ++++++++++-
 2 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/config.mak.uname b/config.mak.uname
index 0ab8e009383..8a01a0da3f1 100644
--- a/config.mak.uname
+++ b/config.mak.uname
@@ -700,7 +700,7 @@ vcxproj:
 	# Make .vcxproj files and add them
 	unset QUIET_GEN QUIET_BUILT_IN; \
 	perl contrib/buildsystems/generate -g Vcxproj
-	git add -f git.sln {*,*/lib,t/helper/*}/*.vcxproj
+	git add -f git.sln {*,*/lib,*/libreftable,t/helper/*}/*.vcxproj
 
 	# Generate the LinkOrCopyBuiltins.targets and LinkOrCopyRemoteHttp.targets file
 	(echo '<Project xmlns="http://schemas.microsoft.com/developer/msbuild/2003">' && \
diff --git a/contrib/buildsystems/Generators/Vcxproj.pm b/contrib/buildsystems/Generators/Vcxproj.pm
index 5c666f9ac03..33a08d31652 100644
--- a/contrib/buildsystems/Generators/Vcxproj.pm
+++ b/contrib/buildsystems/Generators/Vcxproj.pm
@@ -77,7 +77,7 @@ sub createProject {
     my $libs_release = "\n    ";
     my $libs_debug = "\n    ";
     if (!$static_library) {
-      $libs_release = join(";", sort(grep /^(?!libgit\.lib|xdiff\/lib\.lib|vcs-svn\/lib\.lib)/, @{$$build_structure{"$prefix${name}_LIBS"}}));
+      $libs_release = join(";", sort(grep /^(?!libgit\.lib|xdiff\/lib\.lib|vcs-svn\/lib\.lib|reftable\/libreftable\.lib)/, @{$$build_structure{"$prefix${name}_LIBS"}}));
       $libs_debug = $libs_release;
       $libs_debug =~ s/zlib\.lib/zlibd\.lib/g;
       $libs_debug =~ s/libcurl\.lib/libcurl-d\.lib/g;
@@ -231,6 +231,7 @@ sub createProject {
 EOM
     if (!$static_library || $target =~ 'vcs-svn' || $target =~ 'xdiff') {
       my $uuid_libgit = $$build_structure{"LIBS_libgit_GUID"};
+      my $uuid_libreftable = $$build_structure{"LIBS_reftable/libreftable_GUID"};
       my $uuid_xdiff_lib = $$build_structure{"LIBS_xdiff/lib_GUID"};
 
       print F << "EOM";
@@ -240,6 +241,14 @@ sub createProject {
       <ReferenceOutputAssembly>false</ReferenceOutputAssembly>
     </ProjectReference>
 EOM
+      if (!($name =~ /xdiff|libreftable/)) {
+        print F << "EOM";
+    <ProjectReference Include="$cdup\\reftable\\libreftable\\libreftable.vcxproj">
+      <Project>$uuid_libreftable</Project>
+      <ReferenceOutputAssembly>false</ReferenceOutputAssembly>
+    </ProjectReference>
+EOM
+      }
       if (!($name =~ 'xdiff')) {
         print F << "EOM";
     <ProjectReference Include="$cdup\\xdiff\\lib\\xdiff_lib.vcxproj">
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH 0/6] Minimal patch series to fix the CI runs of hn/reftable
  2020-05-04 13:31 [PATCH 0/6] Minimal patch series to fix the CI runs of hn/reftable Johannes Schindelin via GitGitGadget
                   ` (5 preceding siblings ...)
  2020-05-04 13:31 ` [PATCH 6/6] vcxproj: adjust for the reftable changes Johannes Schindelin via GitGitGadget
@ 2020-05-04 15:56 ` Junio C Hamano
  2020-05-05  8:02   ` Johannes Schindelin
  2020-05-04 17:04 ` Han-Wen Nienhuys
  7 siblings, 1 reply; 12+ messages in thread
From: Junio C Hamano @ 2020-05-04 15:56 UTC (permalink / raw)
  To: Johannes Schindelin via GitGitGadget
  Cc: git, Han-Wen Nienhuys, Johannes Schindelin

"Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
writes:

> These patches are not intended to be complete, not by any stretch of
> imagination. They are just enough to get the CI run to pass, even in the
> Windows-specific parts.

Thanks.  Now I know at least somebody has been paying attention to
call for help in the past few issues of the "What's cooking" report ;-) 


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 0/6] Minimal patch series to fix the CI runs of hn/reftable
  2020-05-04 13:31 [PATCH 0/6] Minimal patch series to fix the CI runs of hn/reftable Johannes Schindelin via GitGitGadget
                   ` (6 preceding siblings ...)
  2020-05-04 15:56 ` [PATCH 0/6] Minimal patch series to fix the CI runs of hn/reftable Junio C Hamano
@ 2020-05-04 17:04 ` Han-Wen Nienhuys
  2020-05-05  8:00   ` Johannes Schindelin
  7 siblings, 1 reply; 12+ messages in thread
From: Han-Wen Nienhuys @ 2020-05-04 17:04 UTC (permalink / raw)
  To: Johannes Schindelin via GitGitGadget; +Cc: git, Johannes Schindelin

On Mon, May 4, 2020 at 3:31 PM Johannes Schindelin via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> These patches are not intended to be complete, not by any stretch of
> imagination. They are just enough to get the CI run to pass, even in the
> Windows-specific parts.

Thanks for the investigation. I had begun to setup a Windows
installation, but you beat me to it.

Here are the two fixes as I applied them.

 https://github.com/google/reftable/commit/201f21aa39f993c970ef4d19122fedd5b91e716d
 https://github.com/google/reftable/commit/18b041472306b31813dbd6408f6a61daaba60013

> As I mentioned elsewhere, I would much prefer for hn/reftable to not
> re-invent get_be*(), struct strbuf, struct string_list, struct lock_file
> etc.
>
> However, in the context of the test failures, I do not think that this would
> have made a big difference. Apart from the unportable constructs, and from
> the "delete/rename while there is still a handle on the file" issues, it
> would appear that one big reason why it was so hard to debug and fix the
> test is the recursive complexity of the data structures.
>
> To elaborate on that: struct reftable_stack has an attribute called merged
> that has an array of struct reftable_reader * (confusingly called "stack").
> Each of these readers has an attribute of type struct reftable_block_source
> that uses a vtable and an opaque pointer to abstract three types of block
> sources: file, slice (which is apparently unused, i.e. it is apparently just
> dead weight for now) and malloc.

All of the unittests use slices to read and write single reftables in-memory.

The malloc block source is used to swap out a block coming from the
file directly, with zlib  uncompressed data (log blocks are zlib
compressed). I think there should be a mmap block source too, at some
point.

> I am not sure that this abstraction fest serves us well here.
>
> Quite honestly, I would have expected the packed_git data structure to be
> imitated more closely, as it strikes me as similar in purpose, and it has
> seen a ton of testing over the past decade and a half. I could not recognize
> that design in the reftable, though.
>
> It is quite obvious, of course, that the code tries to imitate the
> object-oriented nature of the Go code, but it seems quite obvious from my
> difficulties to address the problem where stack_compact_range() would try to
> delete stale reftable files (without even so much as a warning when the
> files could not be deleted!) without releasing all file handles to all
> reftable files, even the ones that do not need to be deleted. To be smarter
> about this, the code has to know more about the nature of the block source
> than the abstraction layer suggests. It has to know that a block source
> refers to a file, and that that file is marked for deletion. My heavy-handed
> work-around, even if it works, is not really a good solution, but it is a
> testament that there is a lot of opportunity to simplify the code
> drastically while at the same time simplifying the design, too.

If the code tries to delete a file while it is open, that is a
testament to the fact that I haven't written code for Windows for many
years now. It is not a testament to any fundamental problems with the
library design.

The library which you are complaining of weighs in at about 7500 lines
of code (excluding tests), which compares pretty well to the original
JGit code which is now ~6500 lines of code. I don't think there is
room to simplify it further, and I say this as a person who has
significant experience with this format.

If you wish to prove me wrong, you can send me patch. Until then, I
don't buy your arguments.

> I know you have been putting a lot of effort into this library, so I feel a
> bit bad about saying the following: The hn/reftable patches will need
> substantial work before we can merge it with confidence. Part of the reason
> why it is so hard to review the reftable patches is that they intentionally
> refuse to integrate well within Git's source code, such as (re-)implementing
> its own fundamental data structures, intentionally using a totally different
> coding style, and it concerns me that the stated requirement for bug fixes
> is to treat Git's source code as a downstream of the actual project. I am
> not too bad a C developer and would consider myself competent in debugging
> issues, even hard ones, in Git, and yet... it was really hard to wade
> through the layers of abstraction to figure out where the file handles
> should be closed that were opened and prevented deleting/renaming files.
>
> At this point, I don't feel that it makes sense to keep insisting on having
> this in a separate library. The only other user of that library will be
> libgit2, and I really think that given libgit2's heritage, it won't be a
> problem to adapt the code after it stabilized in git.git (and since libgit2
> treats git.git as upstream for the libxdiff code, it won't be a problem to
> do the same for the reftable code, too). I believe that the best course of
> action is to reuse the data structures libgit.a provides, and to delete the
> re-implementations in reftable/. Only then can we start working effectively
> on refactoring the code to simplify the data structures in order to clarify
> resource usage (which was the root cause for the bugs I fixed, although I am
> sure that there are way more lurking in there, hidden by the fact that the
> code is not covered thoroughly by our tests).

I'm happy to change the library to use more common primitives that are
shared between libgit2 and git. Could you point them out to me? Note
that the basics that you are complaining of (put_be*, strbuf vs slice
routines, etc.) constitute around 700 lines of code. It's not going to
make an appreciable difference in complexity.

Here is my counterpoint to your proposal:

Reftable was developed outside of git-core. If you feel this series
with its giant patch is too much to review, you can have a look at
the incremental story of how it came to be, both in the history and
code reviews of the JGit project, and in the commit history of
https://github.com/google/reftable.

The idea to put the code into git-core, and especially your proposed
(but unsubstantiated) plans to "simplify" its design, make me very
worried about interoperability with the JGit reference implementation.

Could you explain to me how you would qualify a reftable-in-git
library against the JGit implementation?

-- 
Han-Wen Nienhuys - Google Munich
I work 80%. Don't expect answers from me on Fridays.
--
Google Germany GmbH, Erika-Mann-Strasse 33, 80636 Munich
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg
Geschäftsführer: Paul Manicle, Halimah DeLaine Prado

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 6/6] vcxproj: adjust for the reftable changes
  2020-05-04 13:31 ` [PATCH 6/6] vcxproj: adjust for the reftable changes Johannes Schindelin via GitGitGadget
@ 2020-05-04 17:12   ` Junio C Hamano
  0 siblings, 0 replies; 12+ messages in thread
From: Junio C Hamano @ 2020-05-04 17:12 UTC (permalink / raw)
  To: Johannes Schindelin via GitGitGadget
  Cc: git, Han-Wen Nienhuys, Johannes Schindelin

"Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
writes:

> From: Johannes Schindelin <johannes.schindelin@gmx.de>
>
> This allows Git to be compiled via Visual Studio again after integrating
> the `hn/reftable` branch.
>
> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
> ---
>  config.mak.uname                           |  2 +-
>  contrib/buildsystems/Generators/Vcxproj.pm | 11 ++++++++++-
>  2 files changed, 11 insertions(+), 2 deletions(-)
>
> diff --git a/config.mak.uname b/config.mak.uname
> index 0ab8e009383..8a01a0da3f1 100644
> --- a/config.mak.uname
> +++ b/config.mak.uname
> @@ -700,7 +700,7 @@ vcxproj:
>  	# Make .vcxproj files and add them
>  	unset QUIET_GEN QUIET_BUILT_IN; \
>  	perl contrib/buildsystems/generate -g Vcxproj
> -	git add -f git.sln {*,*/lib,t/helper/*}/*.vcxproj
> +	git add -f git.sln {*,*/lib,*/libreftable,t/helper/*}/*.vcxproj
>  
>  	# Generate the LinkOrCopyBuiltins.targets and LinkOrCopyRemoteHttp.targets file
>  	(echo '<Project xmlns="http://schemas.microsoft.com/developer/msbuild/2003">' && \
> diff --git a/contrib/buildsystems/Generators/Vcxproj.pm b/contrib/buildsystems/Generators/Vcxproj.pm
> index 5c666f9ac03..33a08d31652 100644
> --- a/contrib/buildsystems/Generators/Vcxproj.pm
> +++ b/contrib/buildsystems/Generators/Vcxproj.pm
> @@ -77,7 +77,7 @@ sub createProject {
>      my $libs_release = "\n    ";
>      my $libs_debug = "\n    ";
>      if (!$static_library) {
> -      $libs_release = join(";", sort(grep /^(?!libgit\.lib|xdiff\/lib\.lib|vcs-svn\/lib\.lib)/, @{$$build_structure{"$prefix${name}_LIBS"}}));
> +      $libs_release = join(";", sort(grep /^(?!libgit\.lib|xdiff\/lib\.lib|vcs-svn\/lib\.lib|reftable\/libreftable\.lib)/, @{$$build_structure{"$prefix${name}_LIBS"}}));
>        $libs_debug = $libs_release;
>        $libs_debug =~ s/zlib\.lib/zlibd\.lib/g;
>        $libs_debug =~ s/libcurl\.lib/libcurl-d\.lib/g;
> @@ -231,6 +231,7 @@ sub createProject {
>  EOM
>      if (!$static_library || $target =~ 'vcs-svn' || $target =~ 'xdiff') {
>        my $uuid_libgit = $$build_structure{"LIBS_libgit_GUID"};
> +      my $uuid_libreftable = $$build_structure{"LIBS_reftable/libreftable_GUID"};
>        my $uuid_xdiff_lib = $$build_structure{"LIBS_xdiff/lib_GUID"};
>  
>        print F << "EOM";
> @@ -240,6 +241,14 @@ sub createProject {
>        <ReferenceOutputAssembly>false</ReferenceOutputAssembly>
>      </ProjectReference>
>  EOM
> +      if (!($name =~ /xdiff|libreftable/)) {

Wow.  That is messy.  I find it somewhat funny that this is not just
libreftable, i.e. "don't include/refer to libreftable in itself",
but as long as it works, I won't complain ;-)

Thanks for helping the tip of 'pu' moving forward.

> +        print F << "EOM";
> +    <ProjectReference Include="$cdup\\reftable\\libreftable\\libreftable.vcxproj">
> +      <Project>$uuid_libreftable</Project>
> +      <ReferenceOutputAssembly>false</ReferenceOutputAssembly>
> +    </ProjectReference>
> +EOM
> +      }
>        if (!($name =~ 'xdiff')) {
>          print F << "EOM";
>      <ProjectReference Include="$cdup\\xdiff\\lib\\xdiff_lib.vcxproj">

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 0/6] Minimal patch series to fix the CI runs of hn/reftable
  2020-05-04 17:04 ` Han-Wen Nienhuys
@ 2020-05-05  8:00   ` Johannes Schindelin
  0 siblings, 0 replies; 12+ messages in thread
From: Johannes Schindelin @ 2020-05-05  8:00 UTC (permalink / raw)
  To: Han-Wen Nienhuys; +Cc: Johannes Schindelin via GitGitGadget, git

Hi Han-Wen,

On Mon, 4 May 2020, Han-Wen Nienhuys wrote:

> On Mon, May 4, 2020 at 3:31 PM Johannes Schindelin via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
> >
> > These patches are not intended to be complete, not by any stretch of
> > imagination. They are just enough to get the CI run to pass, even in the
> > Windows-specific parts.
>
> Thanks for the investigation. I had begun to setup a Windows
> installation, but you beat me to it.
>
> Here are the two fixes as I applied them.
>
>  https://github.com/google/reftable/commit/201f21aa39f993c970ef4d19122fedd5b91e716d
>  https://github.com/google/reftable/commit/18b041472306b31813dbd6408f6a61daaba60013

If you ever needed _any_ kind of proof that the current setup with that
monster patch and a totally separate "upstream" does simply _not_ work, it
is these two commits.

_Every_ bug fix originating from git.git would have to manually port the
fixes because of the vendor copy, just like you did, losing all provenance
apart from a meager "Thanks to" (which mentions just the bug report
forgetting about the actually-tested patch).

But I don't get the impression that my objections are heard, so I might
just as well stop talking about it. Please do not misinterpret my silence
about your patches as any indication that I am happy with `hn/reftable`: I
am not. Should Junio accept them without drastic changes (which I hope he
will not), I expect this to turn into a complete maintenance nightmare.

Besides, it is a bit disappointing to see only those two patches. I made
very clear in my patch series that I got the _distinct_ impression that
those two bug fixes were just the tip of the iceberg: there are most
likely a lot more bugs waiting to be discovered where a file is not closed
when it is no longer needed (or related: memory not being released as soon
as it can be released). I expect _many_ more similar issues to lurk in the
code base.

> > As I mentioned elsewhere, I would much prefer for hn/reftable to not
> > re-invent get_be*(), struct strbuf, struct string_list, struct lock_file
> > etc.
> >
> > However, in the context of the test failures, I do not think that this would
> > have made a big difference. Apart from the unportable constructs, and from
> > the "delete/rename while there is still a handle on the file" issues, it
> > would appear that one big reason why it was so hard to debug and fix the
> > test is the recursive complexity of the data structures.
> >
> > To elaborate on that: struct reftable_stack has an attribute called merged
> > that has an array of struct reftable_reader * (confusingly called "stack").
> > Each of these readers has an attribute of type struct reftable_block_source
> > that uses a vtable and an opaque pointer to abstract three types of block
> > sources: file, slice (which is apparently unused, i.e. it is apparently just
> > dead weight for now) and malloc.
>
> All of the unittests use slices to read and write single reftables
> in-memory.
>
>
> The malloc block source is used to swap out a block coming from the
> file directly, with zlib  uncompressed data (log blocks are zlib
> compressed). I think there should be a mmap block source too, at some
> point.

If that was intended to convince me that I am wrong in my assessment of
the unnecessary complexity of libreftable, I am afraid it did not.

Only code should be included in the patch adding libreftable to git.git
which is actually needed in git.git. Everything above that is just
unnecessary complexity.

> > I am not sure that this abstraction fest serves us well here.
> >
> > Quite honestly, I would have expected the packed_git data structure to be
> > imitated more closely, as it strikes me as similar in purpose, and it has
> > seen a ton of testing over the past decade and a half. I could not recognize
> > that design in the reftable, though.
> >
> > It is quite obvious, of course, that the code tries to imitate the
> > object-oriented nature of the Go code, but it seems quite obvious from my
> > difficulties to address the problem where stack_compact_range() would try to
> > delete stale reftable files (without even so much as a warning when the
> > files could not be deleted!) without releasing all file handles to all
> > reftable files, even the ones that do not need to be deleted. To be smarter
> > about this, the code has to know more about the nature of the block source
> > than the abstraction layer suggests. It has to know that a block source
> > refers to a file, and that that file is marked for deletion. My heavy-handed
> > work-around, even if it works, is not really a good solution, but it is a
> > testament that there is a lot of opportunity to simplify the code
> > drastically while at the same time simplifying the design, too.
>
> If the code tries to delete a file while it is open, that is a
> testament to the fact that I haven't written code for Windows for many
> years now. It is not a testament to any fundamental problems with the
> library design.

To the contrary. The library design neglects taking good care of
resources (such as closing file descriptors when they are clearly no
longer needed). C code _has_ to be careful with resources.

By the way, I am not really trying to convince you of this. That would be
the wrong direction: you have to convince _me_, the reviewer, that your
patches do the right thing, in the right way. So far, I am unconvinced.

Even worse: my debugging session two days ago convinced me that there _is_
unnecessary complexity, and unclear resource management. Based on how much
experience I have with debugging and developing in unfamiliar code bases,
this means quite a bit. I am not a junior intern who gets easily
overwhelmed by code they haven't seen before.

When I point out the unnecessary complexity of having a vtable for a mere
two "subclasses", of course, as an open source contributor you are free to
dismiss this without even so much as trying to refute it. Just like the
Git project is free to reject contributions when the reviewer comments are
not taken seriously.

> The library which you are complaining of weighs in at about 7500 lines
> of code (excluding tests), which compares pretty well to the original
> JGit code which is now ~6500 lines of code. I don't think there is
> room to simplify it further, and I say this as a person who has
> significant experience with this format.
>
> If you wish to prove me wrong, you can send me patch. Until then, I
> don't buy your arguments.

Again, it is not so much a question whether _you_ buy _my_ arguments (and
at the same time, it is somewhat shocking that you dismiss that it took me
two hours to figure out a single bug, just because the code and the data
structures are unnecessarily complex/abstract).

It is more a question whether you can convince the reviewers that your
code is correct.

In short: I am not selling anything here that you need to buy. You are
trying to sell me something, and I do not like the shape, and I like even
less how easily you seem to brush aside my explanations how this would need
to be improved to make it viable in Git's source code.

> > I know you have been putting a lot of effort into this library, so I feel a
> > bit bad about saying the following: The hn/reftable patches will need
> > substantial work before we can merge it with confidence. Part of the reason
> > why it is so hard to review the reftable patches is that they intentionally
> > refuse to integrate well within Git's source code, such as (re-)implementing
> > its own fundamental data structures, intentionally using a totally different
> > coding style, and it concerns me that the stated requirement for bug fixes
> > is to treat Git's source code as a downstream of the actual project. I am
> > not too bad a C developer and would consider myself competent in debugging
> > issues, even hard ones, in Git, and yet... it was really hard to wade
> > through the layers of abstraction to figure out where the file handles
> > should be closed that were opened and prevented deleting/renaming files.
> >
> > At this point, I don't feel that it makes sense to keep insisting on having
> > this in a separate library. The only other user of that library will be
> > libgit2, and I really think that given libgit2's heritage, it won't be a
> > problem to adapt the code after it stabilized in git.git (and since libgit2
> > treats git.git as upstream for the libxdiff code, it won't be a problem to
> > do the same for the reftable code, too). I believe that the best course of
> > action is to reuse the data structures libgit.a provides, and to delete the
> > re-implementations in reftable/. Only then can we start working effectively
> > on refactoring the code to simplify the data structures in order to clarify
> > resource usage (which was the root cause for the bugs I fixed, although I am
> > sure that there are way more lurking in there, hidden by the fact that the
> > code is not covered thoroughly by our tests).
>
> I'm happy to change the library to use more common primitives that are
> shared between libgit2 and git.

I never talked about primitives that are shared between libgit2. I talked
about data structures and helpers that already exist in Git's source code.

And I said that I am fairly convinced that it won't be hard to adapt the
code to libgit2 after it enters git.git.

I am sorry if I did not make myself clear.

It is a bit unreasonable to reject a review on the basis that git.git
should accept cruft that _might_ be required to support libgit2 better. It
would be very unreasonable to accept a patch series that demands to
include code in git.git that it does not need just because a separate
project might need that code.

And if you truly care about libgit2 support, you will include the libgit2
developers in the development of the reftable support, not throw a
nominally finalized library at them that might, or might not integrate
well into libgit2.

> Could you point them out to me? Note that the basics that you are
> complaining of (put_be*, strbuf vs slice routines, etc.) constitute
> around 700 lines of code. It's not going to make an appreciable
> difference in complexity.

Was I unclear? This additional code _makes it harder to review the monster
patch_.

The complexity _will_ need to be reduced. In addition.

But that cannot be the first step: in order to integrate reasonably well
with Git's source code, it will _have_ to stop duplicating code. You
cannot possibly deny that you reimplemented your own string manipulation
library in libreftable. Such code adds to the maintenance burden, and in
this instance (and in other, similar instances) for no good reason at all
because libgit.a already has code for that.

Without integrating the code better in git.git, there is no sense to even
start reducing the complexity. Both will be necessary.

> Here is my counterpoint to your proposal:
>
> Reftable was developed outside of git-core.

That was your decision, not mine. I merely pointed out that it would have
been wiser to develop it within.

And I offered that it is undesirable to take `hn/reftable` as long as it
looks very much like it _wants_ to stay out of core Git. I mean, what's
the point of accepting code into our code base that does not really want
to live there?

> If you feel this series with its giant patch is too much to review, you
> can have a look at the incremental story of how it came to be, both in
> the history and code reviews of the JGit project, and in the commit
> history of
> https://github.com/google/reftable.

So now you want to place _even more_ of a burden on reviewers? Seriously?

> The idea to put the code into git-core, and especially your proposed
> (but unsubstantiated) plans to "simplify" its design, make me very
> worried about interoperability with the JGit reference implementation.

The JGit reference implementation is written in Java. In an
object-oriented language with its own idiosyncracies demanded by the
design of said language (the lengths to which Shawn had to go to make it
performant are recorded in the commit history).

It will be better to have a clean-room implementation in C that does not
suffer from transpiling artifacts.

About ways to simplify its design: I mentioned the layers of abstraction.
I mentioned the design of `struct packed_git`, which could (and should)
serve as an example to follow.

If those points weren't ignored, I would mention other suggestions, such
as how the code builds file contents via `slice_append_string()` only to
write them out in one go, when the contents could have been written much
more simply via `fprintf()` instead. And that is just _one_ more example
how complexity could be reduced, very, very easily. There's much more.
Would there be a point of me writing these up? So far, I don't get that
impression. All I get is "I don't buy this". That is unnecessarily
frustrating: code review is already thankless (as much as it is
necessary!) as it is.

> Could you explain to me how you would qualify a reftable-in-git
> library against the JGit implementation?

Of course: by a comprehensive set of unit/regression tests, including
pre-generated data that the implementation must read and parse correctly.

The fact that keeping the code close to its non-C origins resulted in
long-undetected segmentation faults makes me rather certain the idea of
keeping the code close to said reference implementation is overrated: it
does not build the confidence in the code that you think it does. Not
after discovering those segmentation faults, it doesn't.

At the end of the day, a comprehensive set of unit/regression tests will
be required for implementing reftable support in other projects, anyway,
in particular for projects that cannot, or do not want to, rely on your
libreftable implementation, such as isomorphic-git or Dulwich.

Ciao,
Johannes

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 0/6] Minimal patch series to fix the CI runs of hn/reftable
  2020-05-04 15:56 ` [PATCH 0/6] Minimal patch series to fix the CI runs of hn/reftable Junio C Hamano
@ 2020-05-05  8:02   ` Johannes Schindelin
  0 siblings, 0 replies; 12+ messages in thread
From: Johannes Schindelin @ 2020-05-05  8:02 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Johannes Schindelin via GitGitGadget, git, Han-Wen Nienhuys

Hi Junio,

On Mon, 4 May 2020, Junio C Hamano wrote:

> "Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
> writes:
>
> > These patches are not intended to be complete, not by any stretch of
> > imagination. They are just enough to get the CI run to pass, even in
> > the Windows-specific parts.
>
> Thanks.

You're welcome!

> Now I know at least somebody has been paying attention to call for help
> in the past few issues of the "What's cooking" report ;-)

Yes, I saw those requests for assistance, and I would have gladly jumped
in earlier, if my time had allowed.

Let's see whether my efforts bear fruit...

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2020-05-05 15:03 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-05-04 13:31 [PATCH 0/6] Minimal patch series to fix the CI runs of hn/reftable Johannes Schindelin via GitGitGadget
2020-05-04 13:31 ` [PATCH 1/6] fixup! Add reftable library Johannes Schindelin via GitGitGadget
2020-05-04 13:31 ` [PATCH 2/6] " Johannes Schindelin via GitGitGadget
2020-05-04 13:31 ` [PATCH 3/6] " Johannes Schindelin via GitGitGadget
2020-05-04 13:31 ` [PATCH 4/6] " Johannes Schindelin via GitGitGadget
2020-05-04 13:31 ` [PATCH 5/6] " Johannes Schindelin via GitGitGadget
2020-05-04 13:31 ` [PATCH 6/6] vcxproj: adjust for the reftable changes Johannes Schindelin via GitGitGadget
2020-05-04 17:12   ` Junio C Hamano
2020-05-04 15:56 ` [PATCH 0/6] Minimal patch series to fix the CI runs of hn/reftable Junio C Hamano
2020-05-05  8:02   ` Johannes Schindelin
2020-05-04 17:04 ` Han-Wen Nienhuys
2020-05-05  8:00   ` Johannes Schindelin

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.