git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Solaris cloning woes partly diagnosed
@ 2006-04-02 10:41 Junio C Hamano
  2006-04-02 18:33 ` Linus Torvalds
  2006-04-04  8:47 ` [RFH] Solaris cloning woes Junio C Hamano
  0 siblings, 2 replies; 20+ messages in thread
From: Junio C Hamano @ 2006-04-02 10:41 UTC (permalink / raw)
  To: git

This is just an interim report, but people might have heard that
OpenSolaris team are in the process of choosing a free DSCM and
we are one of the candidates.  They initially wanted to try
1.2.4 but had trouble using it for local cloning, and the
evaluation is being done with 1.2.2.

I was on #git tonight with Oejet, and managed to reproduce this
problem. The local clone problem seems to disappear if we
disable the progress bar in pack-objects.

We do two funky things when we have progress bar.  We play games
with timer signal (setitimer(ITIMER_REAL) and signal(SIGALRM)),
and we spit out messages to stderr.

It's too late tonight for me to continue digging this, but if
somebody with access to a Solaris box is so inclined, I'd really
appreciate help on this one.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Solaris cloning woes partly diagnosed
  2006-04-02 10:41 Solaris cloning woes partly diagnosed Junio C Hamano
@ 2006-04-02 18:33 ` Linus Torvalds
  2006-04-02 19:10   ` Jason Riedy
                     ` (2 more replies)
  2006-04-04  8:47 ` [RFH] Solaris cloning woes Junio C Hamano
  1 sibling, 3 replies; 20+ messages in thread
From: Linus Torvalds @ 2006-04-02 18:33 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git



On Sun, 2 Apr 2006, Junio C Hamano wrote:
>
> We do two funky things when we have progress bar.  We play games
> with timer signal (setitimer(ITIMER_REAL) and signal(SIGALRM)),
> and we spit out messages to stderr.

I'd be willing to bet that it's the fact that we take signals.

Suddenly, some system calls will either return -1/EINTR, or they'll return 
partial reads or writes. 

We should be pretty good at handling that, but maybe some place forgets.

One thing to do might be to make the itimer use a much higher frequency, 
to trigger the problem more easily.

We do, for example, expect that regular file writing not do that. At least 
"write_sha1_from_fd()" will just do a "write()" without testing the error 
return, which is bad (it would silently create a truncated object if the 
/tmp filesystem filled up). If somebody has their filesystem over NFS 
mounted interruptible, partial writes could also happen.

Ho humm.

		Linus

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Solaris cloning woes partly diagnosed
  2006-04-02 18:33 ` Linus Torvalds
@ 2006-04-02 19:10   ` Jason Riedy
  2006-04-02 19:22     ` Linus Torvalds
  2006-04-02 19:18   ` Linus Torvalds
  2006-04-04 18:21   ` H. Peter Anvin
  2 siblings, 1 reply; 20+ messages in thread
From: Jason Riedy @ 2006-04-02 19:10 UTC (permalink / raw)
  To: git

And Linus Torvalds writes:
 - 
 - I'd be willing to bet that it's the fact that we take signals.

Unfortunately, I'm too busy to check into this, but I've
run into similar problems in the past.  Just takes a busy 
file server.

 - We do, for example, expect that regular file writing not do that. At least 
 - "write_sha1_from_fd()" will just do a "write()" without testing the error 
 - return, [...]

There is an xwrite in git-compat-util.h...

Jason

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Solaris cloning woes partly diagnosed
  2006-04-02 18:33 ` Linus Torvalds
  2006-04-02 19:10   ` Jason Riedy
@ 2006-04-02 19:18   ` Linus Torvalds
  2006-04-02 19:52     ` Jason Riedy
  2006-04-03  3:06     ` Solaris cloning woes partly diagnosed Linus Torvalds
  2006-04-04 18:21   ` H. Peter Anvin
  2 siblings, 2 replies; 20+ messages in thread
From: Linus Torvalds @ 2006-04-02 19:18 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git



On Sun, 2 Apr 2006, Linus Torvalds wrote:
> 
> Suddenly, some system calls will either return -1/EINTR, or they'll return 
> partial reads or writes. 

Hmm. If I read the IRC logs right, the bad pack is still a _valid_ pack, 
and passes git-verify-pack with flying colors.

That certainly implies that we had no problems with write-out: not only 
must the SHA1 of the resulting file match itself, but it must match the 
index too, and the number of objects there must match the index.

So the only way I see the pack being bad (if it does indeed pass 
git-verify-pack) is if the object list we generated was bad.

However, "-q" only affects git-pack-file itself, not the generation of the 
list. Which would imply that we have trouble _reading_ the list as it 
comes in through a pipe. Which is just insane, because we use just a 
bog-standard "fgets(... stdin)" for that. And no _way_ can stdio have 
problems with a few SIGALRM's, that would break a lot of other problems.

But Oeje1 seems to be saying (in http://pastebin.com/635566):

	git rev-list --objects --all | git pack-objects pack
	Generating pack...
	Done counting 15 objects.
	Deltifying 15 objects.
	 100% (15/15) done
	Writing 15 objects.
	 100% (15/15) done
	806439fdfa5e9990b03f9301bd68e243795fff50

where the result _should_ be 16385 objects, not 15.

And the thing is, the _only_ thing we do there is that

	while (fgets(line, sizeof(line), stdin) != NULL) {
		...
		add_object_entry(sha1, name_hash(NULL, line+41), 0);

so it really really looks like fgets() would have problems with a SIGALRM 
coming in and doesn't just re-try on EINTR. Can Solaris stdio _really_ be 
that broken? (Yeah, yeah, it may be "conforming". It's also so incredibly 
programmer-unfriendly that it's not even funny)

That would be truly insane. Can somebody with Solaris check what the 
following patch results in...

		Linus

----
diff --git a/pack-objects.c b/pack-objects.c
index ccfaa5f..daba5de 100644
--- a/pack-objects.c
+++ b/pack-objects.c
@@ -1099,8 +1099,18 @@ int main(int argc, char **argv)
 		fprintf(stderr, "Generating pack...\n");
 	}
 
-	while (fgets(line, sizeof(line), stdin) != NULL) {
+	for (;;) {
 		unsigned char sha1[20];
+
+		if (!fgets(line, sizeof(line), stdin)) {
+			if (feof(stdin))
+				break;
+			if (!ferror(stdin))
+				die("fgets returned NULL, not EOF, not error!");
+			if (errno == EINTR)
+				continue;
+			die("fgets: %s", strerror(errno));
+		}
 
 		if (line[0] == '-') {
 			if (get_sha1_hex(line+1, sha1))

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: Solaris cloning woes partly diagnosed
  2006-04-02 19:10   ` Jason Riedy
@ 2006-04-02 19:22     ` Linus Torvalds
  0 siblings, 0 replies; 20+ messages in thread
From: Linus Torvalds @ 2006-04-02 19:22 UTC (permalink / raw)
  To: Jason Riedy; +Cc: git



On Sun, 2 Apr 2006, Jason Riedy wrote:

> And Linus Torvalds writes:
>  - 
>  - I'd be willing to bet that it's the fact that we take signals.
> 
> Unfortunately, I'm too busy to check into this, but I've
> run into similar problems in the past.  Just takes a busy 
> file server.
> 
>  - We do, for example, expect that regular file writing not do that. At least 
>  - "write_sha1_from_fd()" will just do a "write()" without testing the error 
>  - return, [...]
> 
> There is an xwrite in git-compat-util.h...

Well, git itself is actually fairly good about these things. Right now I'm 
seriously suspecting Solaris stdio as being just horribly impolite.

git tends to not just use xwrite() in most places, but check the return 
value for partial sizes etc. I tried to grep for places where we were 
lazy, and there really seems to be just a very small handful, and they 
shouldn't impact this case at all (you have to have a seriously broken 
setup for them to matter, but we should fix them nonetheless.

		Linus

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Solaris cloning woes partly diagnosed
  2006-04-02 19:18   ` Linus Torvalds
@ 2006-04-02 19:52     ` Jason Riedy
  2006-04-02 20:28       ` Linus Torvalds
  2006-04-03  3:06     ` Solaris cloning woes partly diagnosed Linus Torvalds
  1 sibling, 1 reply; 20+ messages in thread
From: Jason Riedy @ 2006-04-02 19:52 UTC (permalink / raw)
  To: git

And Linus Torvalds writes:
 - 
 - so it really really looks like fgets() would have problems with a SIGALRM 
 - coming in and doesn't just re-try on EINTR. Can Solaris stdio _really_ be 
 - that broken? (Yeah, yeah, it may be "conforming". It's also so incredibly 
 - programmer-unfriendly that it's not even funny)

Yes, it is that broken.  I haven't encountered the problem 
consistently in git myself, so I can't tell you if the patch 
works.  Google finds similar reports and patches for BOINC, ruby,
and a few other projects.

Solaris folks will say you should be using sigaction with
SA_RESTART.  IIRC, SA_RESTART isn't guaranteed to be there 
or work, but all the systems I deal with right now have it.
So an alternate patch for this one use is appended...  Other
uses of signal could be changed to sigaction, too.  And
progress_update "should" be sig_atomic_t.

Passes the pack-objects tests, but I can't make the problem 
happen on demand.  (I have seen it occur before, but never
during make test, and I'd not tracked it down...)

Jason
----
diff --git a/pack-objects.c b/pack-objects.c
index ccfaa5f..1faa0bb 100644
--- a/pack-objects.c
+++ b/pack-objects.c
@@ -877,10 +877,21 @@ static int try_delta(struct unpacked *cu
 	return 0;
 }
 
-static void progress_interval(int signum)
+static void progress_interval(int);
+
+static void setup_progress_signal(void)
+{
+	struct sigaction sa;
+	sa.sa_handler = progress_interval;
+	sigemptyset(&sa.sa_mask);
+	sa.sa_flags = SA_RESTART;
+	sigaction(SIGALRM, &sa, NULL);
+}
+
+void progress_interval(int signum)
 {
-	signal(SIGALRM, progress_interval);
 	progress_update = 1;
+	setup_progress_signal();
 }
 
 static void find_deltas(struct object_entry **list, int window, int depth)
@@ -1094,7 +1105,7 @@ int main(int argc, char **argv)
 		v.it_interval.tv_sec = 1;
 		v.it_interval.tv_usec = 0;
 		v.it_value = v.it_interval;
-		signal(SIGALRM, progress_interval);
+		setup_progress_signal();
 		setitimer(ITIMER_REAL, &v, NULL);
 		fprintf(stderr, "Generating pack...\n");
 	}

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: Solaris cloning woes partly diagnosed
  2006-04-02 19:52     ` Jason Riedy
@ 2006-04-02 20:28       ` Linus Torvalds
  2006-04-02 20:31         ` [PATCH 2/2] pack-objects: be incredibly anal about stdio semantics Linus Torvalds
  2006-04-02 22:29         ` [PATCH] Use sigaction and SA_RESTART in read-tree.c; add option in Makefile Jason Riedy
  0 siblings, 2 replies; 20+ messages in thread
From: Linus Torvalds @ 2006-04-02 20:28 UTC (permalink / raw)
  To: Jason Riedy; +Cc: git



On Sun, 2 Apr 2006, Jason Riedy wrote:
> 
> Solaris folks will say you should be using sigaction with
> SA_RESTART.  IIRC, SA_RESTART isn't guaranteed to be there 
> or work, but all the systems I deal with right now have it.

I think we might as well do that _too_.

However, once you use "sigaction()", you don't need to re-arm the signal 
handler any more, so I'd suggest a simpler patch like this instead..

Junio, I think this confirms/explains the Solaris breakage.

I'll re-send the "anal stdio semantics" version of the patch on top of 
this in the next email.

			Linus
----
Subject: Fix Solaris stdio signal handling stupidities

This uses sigaction() to install the SIGALRM handler with SA_RESTART, so
that Solaris stdio doesn't break completely when a signal interrupts a
read.

Thanks to Jason Riedy for confirming the silly Solaris signal behaviour.

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
----

diff --git a/pack-objects.c b/pack-objects.c
index ccfaa5f..1817b58 100644
--- a/pack-objects.c
+++ b/pack-objects.c
@@ -58,7 +58,7 @@ static int nr_objects = 0, nr_alloc = 0,
 static const char *base_name;
 static unsigned char pack_file_sha1[20];
 static int progress = 1;
-static volatile int progress_update = 0;
+static volatile sig_atomic_t progress_update = 0;
 
 /*
  * The object names in objects array are hashed with this hashtable,
@@ -879,7 +879,6 @@ static int try_delta(struct unpacked *cu
 
 static void progress_interval(int signum)
 {
-	signal(SIGALRM, progress_interval);
 	progress_update = 1;
 }
 
@@ -1025,6 +1024,23 @@ static int reuse_cached_pack(unsigned ch
 	return 1;
 }
 
+static void setup_progress_signal(void)
+{
+	struct sigaction sa;
+	struct itimerval v;
+
+	memset(&sa, 0, sizeof(sa));
+	sa.sa_handler = progress_interval;
+	sigemptyset(&sa.sa_mask);
+	sa.sa_flags = SA_RESTART;
+	sigaction(SIGALRM, &sa, NULL);
+
+	v.it_interval.tv_sec = 1;
+	v.it_interval.tv_usec = 0;
+	v.it_value = v.it_interval;
+	setitimer(ITIMER_REAL, &v, NULL);
+}
+
 int main(int argc, char **argv)
 {
 	SHA_CTX ctx;
@@ -1090,13 +1106,8 @@ int main(int argc, char **argv)
 	prepare_packed_git();
 
 	if (progress) {
-		struct itimerval v;
-		v.it_interval.tv_sec = 1;
-		v.it_interval.tv_usec = 0;
-		v.it_value = v.it_interval;
-		signal(SIGALRM, progress_interval);
-		setitimer(ITIMER_REAL, &v, NULL);
 		fprintf(stderr, "Generating pack...\n");
+		setup_progress_signal();
 	}
 
 	while (fgets(line, sizeof(line), stdin) != NULL) {

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 2/2] pack-objects: be incredibly anal about stdio semantics
  2006-04-02 20:28       ` Linus Torvalds
@ 2006-04-02 20:31         ` Linus Torvalds
  2006-04-02 21:09           ` Junio C Hamano
  2006-04-02 22:29         ` [PATCH] Use sigaction and SA_RESTART in read-tree.c; add option in Makefile Jason Riedy
  1 sibling, 1 reply; 20+ messages in thread
From: Linus Torvalds @ 2006-04-02 20:31 UTC (permalink / raw)
  To: Jason Riedy, Junio C Hamano; +Cc: Git Mailing List


This is the "letter of the law" version of using fgets() properly in the
face of incredibly broken stdio implementations.  We can work around the
Solaris breakage with SA_RESTART, but in case anybody else is ever that
stupid, here's the "safe" (read: "insanely anal") way to use fgets.

It probably goes without saying that I'm not terribly impressed by
Solaris libc.

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
---

This is the same one that I already sent out, but re-diffed, and with a 
proper commit message.

Not tested on Solaris.

Junio - I think that I forgot to Cc: you on the 1/2 patch, but you'll see 
it on the git list.

diff --git a/pack-objects.c b/pack-objects.c
index 1817b58..0ea16ad 100644
--- a/pack-objects.c
+++ b/pack-objects.c
@@ -1110,8 +1110,18 @@ int main(int argc, char **argv)
 		setup_progress_signal();
 	}
 
-	while (fgets(line, sizeof(line), stdin) != NULL) {
+	for (;;) {
 		unsigned char sha1[20];
+
+		if (!fgets(line, sizeof(line), stdin)) {
+			if (feof(stdin))
+				break;
+			if (!ferror(stdin))
+				die("fgets returned NULL, not EOF, not error!");
+			if (errno == EINTR)
+				continue;
+			die("fgets: %s", strerror(errno));
+		}
 
 		if (line[0] == '-') {
 			if (get_sha1_hex(line+1, sha1))

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH 2/2] pack-objects: be incredibly anal about stdio semantics
  2006-04-02 20:31         ` [PATCH 2/2] pack-objects: be incredibly anal about stdio semantics Linus Torvalds
@ 2006-04-02 21:09           ` Junio C Hamano
  2006-04-02 21:21             ` Linus Torvalds
  0 siblings, 1 reply; 20+ messages in thread
From: Junio C Hamano @ 2006-04-02 21:09 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git

Linus Torvalds <torvalds@osdl.org> writes:

> This is the "letter of the law" version of using fgets() properly in the
> face of incredibly broken stdio implementations.  We can work around the
> Solaris breakage with SA_RESTART, but in case anybody else is ever that
> stupid, here's the "safe" (read: "insanely anal") way to use fgets.

Thanks.

It's good that I can say "Oh, I think this is the part that is
broken, but I am going to bed" to find the problem solved by
capable others when I wake up the next day.  Global distributed
development process at the finest, although I suspect Jason,
Linus and myself are all in the same timezone ;-)

Did you mean this as a real change or a demonstration?  The
sigaction change is a real fix, but somehow I find this one
similar to the "(void*) NULL" thing you objected earlier (which
was not merged because I agreed with your argument)...

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 2/2] pack-objects: be incredibly anal about stdio semantics
  2006-04-02 21:09           ` Junio C Hamano
@ 2006-04-02 21:21             ` Linus Torvalds
  2006-04-02 22:12               ` Jason Riedy
  0 siblings, 1 reply; 20+ messages in thread
From: Linus Torvalds @ 2006-04-02 21:21 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git



On Sun, 2 Apr 2006, Junio C Hamano wrote:

> Linus Torvalds <torvalds@osdl.org> writes:
> 
> > This is the "letter of the law" version of using fgets() properly in the
> > face of incredibly broken stdio implementations.  We can work around the
> > Solaris breakage with SA_RESTART, but in case anybody else is ever that
> > stupid, here's the "safe" (read: "insanely anal") way to use fgets.
> 
> Did you mean this as a real change or a demonstration?  The
> sigaction change is a real fix, but somehow I find this one
> similar to the "(void*) NULL" thing you objected earlier (which
> was not merged because I agreed with your argument)...

I don't have any really strong opinions on it. I think that any libc that 
needs the "ferror()" test + EINTR loopback is totally broken. I would 
happily say that people should just not use a development platform that is 
that horrible.

But the fact that Solaris actually had that as a real problem (never mind 
that we could work around it another way) just makes me go "Hmm..".

So I _think_ we're safe with just the "sigaction()" diff.  Neither of the 
patches _should_ make any difference at all on a sane platform. 

		Linus

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 2/2] pack-objects: be incredibly anal about stdio semantics
  2006-04-02 21:21             ` Linus Torvalds
@ 2006-04-02 22:12               ` Jason Riedy
  2006-04-02 22:58                 ` Linus Torvalds
  0 siblings, 1 reply; 20+ messages in thread
From: Jason Riedy @ 2006-04-02 22:12 UTC (permalink / raw)
  To: git

And Linus Torvalds writes:
 - 
 - I don't have any really strong opinions on it. I think that any libc that 
 - needs the "ferror()" test + EINTR loopback is totally broken. I would 
 - happily say that people should just not use a development platform that is 
 - that horrible.

If you consider stdio to be a low-level wrapper over syscalls
that only adds buffering and simple parsing, then passing EINTR
back to the application is a sensible choice.  I wouldn't be
too surprised if L4, VxWorks, etc. do something similar.

 - So I _think_ we're safe with just the "sigaction()" diff.  Neither of the 
 - patches _should_ make any difference at all on a sane platform. 

Anyone with an older HP/UX care to try these patches?  HP/UX 
may not be sane, but I think it may lack SA_RESTART.  I don't 
know if stdio calls need restarted, though.

Jason

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH] Use sigaction and SA_RESTART in read-tree.c; add option in Makefile.
  2006-04-02 20:28       ` Linus Torvalds
  2006-04-02 20:31         ` [PATCH 2/2] pack-objects: be incredibly anal about stdio semantics Linus Torvalds
@ 2006-04-02 22:29         ` Jason Riedy
  2006-04-03  1:02           ` Linus Torvalds
  2006-04-03  4:20           ` Junio C Hamano
  1 sibling, 2 replies; 20+ messages in thread
From: Jason Riedy @ 2006-04-02 22:29 UTC (permalink / raw)
  To: Linus Torvalds, Junio C Hamano; +Cc: git

Might as well ape the sigaction change in read-tree.c to avoid
the same potential problems.  The fprintf status output will
be overwritten in a second, so don't bother guarding it.  Do
move the fputc after disabling SIGALRM to ensure we go to the
next line, though.

Also add a NO_SA_RESTART option in the Makefile in case someone
doesn't have SA_RESTART but does restart (maybe older HP/UX?).
We want the builder to chose this specifically in case the
system both lacks SA_RESTART and does not restart stdio calls;
a compat #define in git-compat-utils.h would silently allow
broken systems.

Signed-off-by: Jason Riedy <ejr@cs.berkeley.edu>

---

 Makefile    |    7 +++++++
 read-tree.c |   28 +++++++++++++++++++---------
 2 files changed, 26 insertions(+), 9 deletions(-)

521dc260ea90a23d58a4e920203af5035ca1a673
diff --git a/Makefile b/Makefile
index c79d646..f3b1d49 100644
--- a/Makefile
+++ b/Makefile
@@ -63,6 +63,9 @@
 # Define COLLISION_CHECK below if you believe that SHA1's
 # 1461501637330902918203684832716283019655932542976 hashes do not give you
 # sufficient guarantee that no collisions between objects will ever happen.
+#
+# Define NO_SA_RESTART if your platform does not have SA_RESTART, _AND_ if
+# it automatically restarts interrupted stdio calls.
 
 # Define USE_NSEC below if you want git to care about sub-second file mtimes
 # and ctimes. Note that you need recent glibc (at least 2.2.4) for this, and
@@ -425,6 +428,10 @@
 endif
 ifdef NO_ACCURATE_DIFF
 	ALL_CFLAGS += -DNO_ACCURATE_DIFF
+endif
+
+ifdef NO_SA_RESTART
+	ALL_CFLAGS += -DSA_RESTART=0
 endif
 
 # Shell quote (do not use $(call) to accomodate ancient setups);
diff --git a/read-tree.c b/read-tree.c
index eaff444..4422dbf 100644
--- a/read-tree.c
+++ b/read-tree.c
@@ -273,10 +273,26 @@
 
 static void progress_interval(int signum)
 {
-	signal(SIGALRM, progress_interval);
 	progress_update = 1;
 }
 
+static void setup_progress_signal(void)
+{
+	struct sigaction sa;
+	struct itimerval v;
+
+	memset(&sa, 0, sizeof(sa));
+	sa.sa_handler = progress_interval;
+	sigemptyset(&sa.sa_mask);
+	sa.sa_flags = SA_RESTART;
+	sigaction(SIGALRM, &sa, NULL);
+
+	v.it_interval.tv_sec = 1;
+	v.it_interval.tv_usec = 0;
+	v.it_value = v.it_interval;
+	setitimer(ITIMER_REAL, &v, NULL);
+}
+
 static void check_updates(struct cache_entry **src, int nr)
 {
 	static struct checkout state = {
@@ -289,8 +305,6 @@
 	unsigned last_percent = 200, cnt = 0, total = 0;
 
 	if (update && verbose_update) {
-		struct itimerval v;
-
 		for (total = cnt = 0; cnt < nr; cnt++) {
 			struct cache_entry *ce = src[cnt];
 			if (!ce->ce_mode || ce->ce_flags & mask)
@@ -302,12 +316,8 @@
 			total = 0;
 
 		if (total) {
-			v.it_interval.tv_sec = 1;
-			v.it_interval.tv_usec = 0;
-			v.it_value = v.it_interval;
-			signal(SIGALRM, progress_interval);
-			setitimer(ITIMER_REAL, &v, NULL);
 			fprintf(stderr, "Checking files out...\n");
+			setup_progress_signal();
 			progress_update = 1;
 		}
 		cnt = 0;
@@ -341,8 +351,8 @@
 		}
 	}
 	if (total) {
-		fputc('\n', stderr);
 		signal(SIGALRM, SIG_IGN);
+		fputc('\n', stderr);
 	}
 }
 
-- 
1.3.0.rc1

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH 2/2] pack-objects: be incredibly anal about stdio semantics
  2006-04-02 22:12               ` Jason Riedy
@ 2006-04-02 22:58                 ` Linus Torvalds
  0 siblings, 0 replies; 20+ messages in thread
From: Linus Torvalds @ 2006-04-02 22:58 UTC (permalink / raw)
  To: Jason Riedy; +Cc: git



On Sun, 2 Apr 2006, Jason Riedy wrote:
> 
> If you consider stdio to be a low-level wrapper over syscalls
> that only adds buffering and simple parsing, then passing EINTR
> back to the application is a sensible choice.  I wouldn't be
> too surprised if L4, VxWorks, etc. do something similar.

No, returning EINTR is insane, because stdio has to do re-starting of 
system calls by hand _anyway_ for the "short read" case. 

EINTR really _is_ 100% equivalent to "short read of zero bytes" (that 
literally is what it means for a read/write system call - anybody who 
tells you otherwise is just crazy). 

So any library that handles short reads, but doesn't handle EINTR is by 
definition doing something totally idiotic and broken.

Now, I agree that somebody else might be broken too. I would not agree 
that it's "acceptable". It's craptastically bad library code.

> Anyone with an older HP/UX care to try these patches?  HP/UX 
> may not be sane, but I think it may lack SA_RESTART.  I don't 
> know if stdio calls need restarted, though.

I'd assume that older HPUX is so BSD-based that all signals end up 
restarting. That was the BSD signal() semantics, after all.

			Linus

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] Use sigaction and SA_RESTART in read-tree.c; add option in Makefile.
  2006-04-02 22:29         ` [PATCH] Use sigaction and SA_RESTART in read-tree.c; add option in Makefile Jason Riedy
@ 2006-04-03  1:02           ` Linus Torvalds
  2006-04-03  4:20           ` Junio C Hamano
  1 sibling, 0 replies; 20+ messages in thread
From: Linus Torvalds @ 2006-04-03  1:02 UTC (permalink / raw)
  To: Jason Riedy; +Cc: Junio C Hamano, git



On Sun, 2 Apr 2006, Jason Riedy wrote:
>
> Might as well ape the sigaction change in read-tree.c to avoid
> the same potential problems.

Looks good. I didn't realize we had the exact same code duplicated. I 
guess it's small enough that there isn't a huge win in moving it to some 
common file..

Does somebody have access to Solaris to verify that this all actually does 
fix it? I obviously believe it will, since this just explains the symptoms 
to a tee, but it would still be good to have an actual confirmation by 
somebody who has access to a Solaris environment.

> Also add a NO_SA_RESTART option in the Makefile in case someone
> doesn't have SA_RESTART but does restart (maybe older HP/UX?).
> We want the builder to chose this specifically in case the
> system both lacks SA_RESTART and does not restart stdio calls;
> a compat #define in git-compat-utils.h would silently allow
> broken systems.

I do believe that we already require POSIX.2 functionality (regex, 
fnmatch, C90 compiler), which implies that git probably wouldn't compile 
anyway on things that are _really_ ancient. 

I think SA_RESTART was part of the original POSIX.1 specs, so anybody that 
doesn't have it is likely to not have a lot of other things we rely on 
too. There are other SA_* flags that aren't as standard, but I'd expect 
SA_RESTART to be everywhere (or it likely doesn't have sigaction() at 
all..).

But hey, I certainly don't have really old HP-UX to test either.

		Linus

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Solaris cloning woes partly diagnosed
  2006-04-02 19:18   ` Linus Torvalds
  2006-04-02 19:52     ` Jason Riedy
@ 2006-04-03  3:06     ` Linus Torvalds
  1 sibling, 0 replies; 20+ messages in thread
From: Linus Torvalds @ 2006-04-03  3:06 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git



On Sun, 2 Apr 2006, Linus Torvalds wrote:
>  
> -	while (fgets(line, sizeof(line), stdin) != NULL) {
> +	for (;;) {
>  		unsigned char sha1[20];
> +
> +		if (!fgets(line, sizeof(line), stdin)) {
> +			if (feof(stdin))
> +				break;
> +			if (!ferror(stdin))
> +				die("fgets returned NULL, not EOF, not error!");
> +			if (errno == EINTR)
> +				continue;
> +			die("fgets: %s", strerror(errno));

While it shouldn't actually matter, I just realized that this incredibly 
anal sequence isn't actually quite anal enough, sinceit really should have 
a "clearerr(stdin)" for the continue case when we ignore the EINTR thing.

Otherwise, some stdio implementation might just decide to continue to 
return NULL from fgets(), since the error indicator is technically sticky. 
I don't know if they do so, but it's entirely possible.

So I'd almost suggest something like

	char *safe_fgets(char *s, int size, FILE *stream)
	{
		for (;;) {
			if (fgets(s, size, stream))
				return s;
			if (feof(stream))
				return NULL;
			if (!ferror(stream))
				die("fgets returned NULL, not EOF, not error!");
			if (errno != EINTR)
				die("fgets: %s", strerror(errno));
			clearerr(stream);
		}
	}

which sbould then hopefully work as a sane fgets()-replacement for broken 
environments.

(And then we can just do

	while (safe_fgets(..))

like normal people again, and not be afraid of strange stdio 
implementations).

		Linus

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] Use sigaction and SA_RESTART in read-tree.c; add option in Makefile.
  2006-04-02 22:29         ` [PATCH] Use sigaction and SA_RESTART in read-tree.c; add option in Makefile Jason Riedy
  2006-04-03  1:02           ` Linus Torvalds
@ 2006-04-03  4:20           ` Junio C Hamano
  2006-04-03  4:40             ` Linus Torvalds
  1 sibling, 1 reply; 20+ messages in thread
From: Junio C Hamano @ 2006-04-03  4:20 UTC (permalink / raw)
  To: Jason Riedy; +Cc: Linus Torvalds, git

Jason Riedy <ejr@EECS.Berkeley.EDU> writes:

> Also add a NO_SA_RESTART option in the Makefile in case someone
> doesn't have SA_RESTART but does restart (maybe older HP/UX?).
> We want the builder to chose this specifically in case the
> system both lacks SA_RESTART and does not restart stdio calls;
> a compat #define in git-compat-utils.h would silently allow
> broken systems.

What am I missing...?

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] Use sigaction and SA_RESTART in read-tree.c; add option in Makefile.
  2006-04-03  4:20           ` Junio C Hamano
@ 2006-04-03  4:40             ` Linus Torvalds
  0 siblings, 0 replies; 20+ messages in thread
From: Linus Torvalds @ 2006-04-03  4:40 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jason Riedy, git



On Sun, 2 Apr 2006, Junio C Hamano wrote:

> Jason Riedy <ejr@EECS.Berkeley.EDU> writes:
> 
> > Also add a NO_SA_RESTART option in the Makefile in case someone
> > doesn't have SA_RESTART but does restart (maybe older HP/UX?).
> > We want the builder to chose this specifically in case the
> > system both lacks SA_RESTART and does not restart stdio calls;
> > a compat #define in git-compat-utils.h would silently allow
> > broken systems.
> 
> What am I missing...?

I don't think this part is worth it at least for now.

If there really are systems without SA_RESTART, they probably don't have 
sigaction() either, so #defining SA_RESTART to zero likely won't help.

But somebody can prove me wrong..

		Linus

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [RFH] Solaris cloning woes...
  2006-04-02 10:41 Solaris cloning woes partly diagnosed Junio C Hamano
  2006-04-02 18:33 ` Linus Torvalds
@ 2006-04-04  8:47 ` Junio C Hamano
  2006-04-04 18:53   ` Jason Riedy
  1 sibling, 1 reply; 20+ messages in thread
From: Junio C Hamano @ 2006-04-04  8:47 UTC (permalink / raw)
  To: git; +Cc: Peter Eriksen

The sigaction() patch cooked by Linus and Jason are in "next",
to fix pack-objects breakage started between 1.2.2 and 1.2.3.

Once this is confirmed to fix the problem on Solaris boxes, I'd
like to include and do an 1.2.5, just in case OpenSolaris folks
would want to play with it, and also put them in the "master"
branch.

I have an access to a not-so-well-maintained Solaris box at
work, and built the relevant parts with somewhat stripped down
configuration to run the test.  The "master" version without the
sigaction() patches exhibits the symptom Oejet and I observed
the other night, and "next" seems to fix it.  I am reasonably
happy.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Solaris cloning woes partly diagnosed
  2006-04-02 18:33 ` Linus Torvalds
  2006-04-02 19:10   ` Jason Riedy
  2006-04-02 19:18   ` Linus Torvalds
@ 2006-04-04 18:21   ` H. Peter Anvin
  2 siblings, 0 replies; 20+ messages in thread
From: H. Peter Anvin @ 2006-04-04 18:21 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Junio C Hamano, git

Linus Torvalds wrote:
> 
> One thing to do might be to make the itimer use a much higher frequency, 
> to trigger the problem more easily.
> 
> We do, for example, expect that regular file writing not do that. At least 
> "write_sha1_from_fd()" will just do a "write()" without testing the error 
> return, which is bad (it would silently create a truncated object if the 
> /tmp filesystem filled up). If somebody has their filesystem over NFS 
> mounted interruptible, partial writes could also happen.
> 

There seems to be a whole bunch of places where we use naked write()s 
where xwrite or fwrite would be a lot more appropriate.  The ssh-* files 
seem to be particularly offensive in that way.

There are also a number of places which call xwrite with the apparent 
belief that returning short is an error (e.g. blame.c).  This as far as 
I know the more common definition of xwrite(), but is *not* the one used 
in git -- the one in git only guarantees that at least one character is 
written.

	-hpa

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFH] Solaris cloning woes...
  2006-04-04  8:47 ` [RFH] Solaris cloning woes Junio C Hamano
@ 2006-04-04 18:53   ` Jason Riedy
  0 siblings, 0 replies; 20+ messages in thread
From: Jason Riedy @ 2006-04-04 18:53 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Peter Eriksen

And Junio C Hamano writes:
 - Once this is confirmed to fix the problem on Solaris boxes, I'd
 - like to include and do an 1.2.5, just in case OpenSolaris folks
 - would want to play with it, and also put them in the "master"
 - branch.

I haven't run into the problem since, but I almost never saw
it in the first place.  Have you been able to trigger it
reliably?  Maybe I'm "lucky" that my Sun's so slow...

Jason

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2006-04-04 18:53 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2006-04-02 10:41 Solaris cloning woes partly diagnosed Junio C Hamano
2006-04-02 18:33 ` Linus Torvalds
2006-04-02 19:10   ` Jason Riedy
2006-04-02 19:22     ` Linus Torvalds
2006-04-02 19:18   ` Linus Torvalds
2006-04-02 19:52     ` Jason Riedy
2006-04-02 20:28       ` Linus Torvalds
2006-04-02 20:31         ` [PATCH 2/2] pack-objects: be incredibly anal about stdio semantics Linus Torvalds
2006-04-02 21:09           ` Junio C Hamano
2006-04-02 21:21             ` Linus Torvalds
2006-04-02 22:12               ` Jason Riedy
2006-04-02 22:58                 ` Linus Torvalds
2006-04-02 22:29         ` [PATCH] Use sigaction and SA_RESTART in read-tree.c; add option in Makefile Jason Riedy
2006-04-03  1:02           ` Linus Torvalds
2006-04-03  4:20           ` Junio C Hamano
2006-04-03  4:40             ` Linus Torvalds
2006-04-03  3:06     ` Solaris cloning woes partly diagnosed Linus Torvalds
2006-04-04 18:21   ` H. Peter Anvin
2006-04-04  8:47 ` [RFH] Solaris cloning woes Junio C Hamano
2006-04-04 18:53   ` Jason Riedy

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).