* Solaris cloning woes partly diagnosed @ 2006-04-02 10:41 Junio C Hamano 2006-04-02 18:33 ` Linus Torvalds 2006-04-04 8:47 ` [RFH] Solaris cloning woes Junio C Hamano 0 siblings, 2 replies; 20+ messages in thread From: Junio C Hamano @ 2006-04-02 10:41 UTC (permalink / raw) To: git This is just an interim report, but people might have heard that OpenSolaris team are in the process of choosing a free DSCM and we are one of the candidates. They initially wanted to try 1.2.4 but had trouble using it for local cloning, and the evaluation is being done with 1.2.2. I was on #git tonight with Oejet, and managed to reproduce this problem. The local clone problem seems to disappear if we disable the progress bar in pack-objects. We do two funky things when we have progress bar. We play games with timer signal (setitimer(ITIMER_REAL) and signal(SIGALRM)), and we spit out messages to stderr. It's too late tonight for me to continue digging this, but if somebody with access to a Solaris box is so inclined, I'd really appreciate help on this one. ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Solaris cloning woes partly diagnosed 2006-04-02 10:41 Solaris cloning woes partly diagnosed Junio C Hamano @ 2006-04-02 18:33 ` Linus Torvalds 2006-04-02 19:10 ` Jason Riedy ` (2 more replies) 2006-04-04 8:47 ` [RFH] Solaris cloning woes Junio C Hamano 1 sibling, 3 replies; 20+ messages in thread From: Linus Torvalds @ 2006-04-02 18:33 UTC (permalink / raw) To: Junio C Hamano; +Cc: git On Sun, 2 Apr 2006, Junio C Hamano wrote: > > We do two funky things when we have progress bar. We play games > with timer signal (setitimer(ITIMER_REAL) and signal(SIGALRM)), > and we spit out messages to stderr. I'd be willing to bet that it's the fact that we take signals. Suddenly, some system calls will either return -1/EINTR, or they'll return partial reads or writes. We should be pretty good at handling that, but maybe some place forgets. One thing to do might be to make the itimer use a much higher frequency, to trigger the problem more easily. We do, for example, expect that regular file writing not do that. At least "write_sha1_from_fd()" will just do a "write()" without testing the error return, which is bad (it would silently create a truncated object if the /tmp filesystem filled up). If somebody has their filesystem over NFS mounted interruptible, partial writes could also happen. Ho humm. Linus ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Solaris cloning woes partly diagnosed 2006-04-02 18:33 ` Linus Torvalds @ 2006-04-02 19:10 ` Jason Riedy 2006-04-02 19:22 ` Linus Torvalds 2006-04-02 19:18 ` Linus Torvalds 2006-04-04 18:21 ` H. Peter Anvin 2 siblings, 1 reply; 20+ messages in thread From: Jason Riedy @ 2006-04-02 19:10 UTC (permalink / raw) To: git And Linus Torvalds writes: - - I'd be willing to bet that it's the fact that we take signals. Unfortunately, I'm too busy to check into this, but I've run into similar problems in the past. Just takes a busy file server. - We do, for example, expect that regular file writing not do that. At least - "write_sha1_from_fd()" will just do a "write()" without testing the error - return, [...] There is an xwrite in git-compat-util.h... Jason ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Solaris cloning woes partly diagnosed 2006-04-02 19:10 ` Jason Riedy @ 2006-04-02 19:22 ` Linus Torvalds 0 siblings, 0 replies; 20+ messages in thread From: Linus Torvalds @ 2006-04-02 19:22 UTC (permalink / raw) To: Jason Riedy; +Cc: git On Sun, 2 Apr 2006, Jason Riedy wrote: > And Linus Torvalds writes: > - > - I'd be willing to bet that it's the fact that we take signals. > > Unfortunately, I'm too busy to check into this, but I've > run into similar problems in the past. Just takes a busy > file server. > > - We do, for example, expect that regular file writing not do that. At least > - "write_sha1_from_fd()" will just do a "write()" without testing the error > - return, [...] > > There is an xwrite in git-compat-util.h... Well, git itself is actually fairly good about these things. Right now I'm seriously suspecting Solaris stdio as being just horribly impolite. git tends to not just use xwrite() in most places, but check the return value for partial sizes etc. I tried to grep for places where we were lazy, and there really seems to be just a very small handful, and they shouldn't impact this case at all (you have to have a seriously broken setup for them to matter, but we should fix them nonetheless. Linus ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Solaris cloning woes partly diagnosed 2006-04-02 18:33 ` Linus Torvalds 2006-04-02 19:10 ` Jason Riedy @ 2006-04-02 19:18 ` Linus Torvalds 2006-04-02 19:52 ` Jason Riedy 2006-04-03 3:06 ` Solaris cloning woes partly diagnosed Linus Torvalds 2006-04-04 18:21 ` H. Peter Anvin 2 siblings, 2 replies; 20+ messages in thread From: Linus Torvalds @ 2006-04-02 19:18 UTC (permalink / raw) To: Junio C Hamano; +Cc: git On Sun, 2 Apr 2006, Linus Torvalds wrote: > > Suddenly, some system calls will either return -1/EINTR, or they'll return > partial reads or writes. Hmm. If I read the IRC logs right, the bad pack is still a _valid_ pack, and passes git-verify-pack with flying colors. That certainly implies that we had no problems with write-out: not only must the SHA1 of the resulting file match itself, but it must match the index too, and the number of objects there must match the index. So the only way I see the pack being bad (if it does indeed pass git-verify-pack) is if the object list we generated was bad. However, "-q" only affects git-pack-file itself, not the generation of the list. Which would imply that we have trouble _reading_ the list as it comes in through a pipe. Which is just insane, because we use just a bog-standard "fgets(... stdin)" for that. And no _way_ can stdio have problems with a few SIGALRM's, that would break a lot of other problems. But Oeje1 seems to be saying (in http://pastebin.com/635566): git rev-list --objects --all | git pack-objects pack Generating pack... Done counting 15 objects. Deltifying 15 objects. 100% (15/15) done Writing 15 objects. 100% (15/15) done 806439fdfa5e9990b03f9301bd68e243795fff50 where the result _should_ be 16385 objects, not 15. And the thing is, the _only_ thing we do there is that while (fgets(line, sizeof(line), stdin) != NULL) { ... add_object_entry(sha1, name_hash(NULL, line+41), 0); so it really really looks like fgets() would have problems with a SIGALRM coming in and doesn't just re-try on EINTR. Can Solaris stdio _really_ be that broken? (Yeah, yeah, it may be "conforming". It's also so incredibly programmer-unfriendly that it's not even funny) That would be truly insane. Can somebody with Solaris check what the following patch results in... Linus ---- diff --git a/pack-objects.c b/pack-objects.c index ccfaa5f..daba5de 100644 --- a/pack-objects.c +++ b/pack-objects.c @@ -1099,8 +1099,18 @@ int main(int argc, char **argv) fprintf(stderr, "Generating pack...\n"); } - while (fgets(line, sizeof(line), stdin) != NULL) { + for (;;) { unsigned char sha1[20]; + + if (!fgets(line, sizeof(line), stdin)) { + if (feof(stdin)) + break; + if (!ferror(stdin)) + die("fgets returned NULL, not EOF, not error!"); + if (errno == EINTR) + continue; + die("fgets: %s", strerror(errno)); + } if (line[0] == '-') { if (get_sha1_hex(line+1, sha1)) ^ permalink raw reply related [flat|nested] 20+ messages in thread
* Re: Solaris cloning woes partly diagnosed 2006-04-02 19:18 ` Linus Torvalds @ 2006-04-02 19:52 ` Jason Riedy 2006-04-02 20:28 ` Linus Torvalds 2006-04-03 3:06 ` Solaris cloning woes partly diagnosed Linus Torvalds 1 sibling, 1 reply; 20+ messages in thread From: Jason Riedy @ 2006-04-02 19:52 UTC (permalink / raw) To: git And Linus Torvalds writes: - - so it really really looks like fgets() would have problems with a SIGALRM - coming in and doesn't just re-try on EINTR. Can Solaris stdio _really_ be - that broken? (Yeah, yeah, it may be "conforming". It's also so incredibly - programmer-unfriendly that it's not even funny) Yes, it is that broken. I haven't encountered the problem consistently in git myself, so I can't tell you if the patch works. Google finds similar reports and patches for BOINC, ruby, and a few other projects. Solaris folks will say you should be using sigaction with SA_RESTART. IIRC, SA_RESTART isn't guaranteed to be there or work, but all the systems I deal with right now have it. So an alternate patch for this one use is appended... Other uses of signal could be changed to sigaction, too. And progress_update "should" be sig_atomic_t. Passes the pack-objects tests, but I can't make the problem happen on demand. (I have seen it occur before, but never during make test, and I'd not tracked it down...) Jason ---- diff --git a/pack-objects.c b/pack-objects.c index ccfaa5f..1faa0bb 100644 --- a/pack-objects.c +++ b/pack-objects.c @@ -877,10 +877,21 @@ static int try_delta(struct unpacked *cu return 0; } -static void progress_interval(int signum) +static void progress_interval(int); + +static void setup_progress_signal(void) +{ + struct sigaction sa; + sa.sa_handler = progress_interval; + sigemptyset(&sa.sa_mask); + sa.sa_flags = SA_RESTART; + sigaction(SIGALRM, &sa, NULL); +} + +void progress_interval(int signum) { - signal(SIGALRM, progress_interval); progress_update = 1; + setup_progress_signal(); } static void find_deltas(struct object_entry **list, int window, int depth) @@ -1094,7 +1105,7 @@ int main(int argc, char **argv) v.it_interval.tv_sec = 1; v.it_interval.tv_usec = 0; v.it_value = v.it_interval; - signal(SIGALRM, progress_interval); + setup_progress_signal(); setitimer(ITIMER_REAL, &v, NULL); fprintf(stderr, "Generating pack...\n"); } ^ permalink raw reply related [flat|nested] 20+ messages in thread
* Re: Solaris cloning woes partly diagnosed 2006-04-02 19:52 ` Jason Riedy @ 2006-04-02 20:28 ` Linus Torvalds 2006-04-02 20:31 ` [PATCH 2/2] pack-objects: be incredibly anal about stdio semantics Linus Torvalds 2006-04-02 22:29 ` [PATCH] Use sigaction and SA_RESTART in read-tree.c; add option in Makefile Jason Riedy 0 siblings, 2 replies; 20+ messages in thread From: Linus Torvalds @ 2006-04-02 20:28 UTC (permalink / raw) To: Jason Riedy; +Cc: git On Sun, 2 Apr 2006, Jason Riedy wrote: > > Solaris folks will say you should be using sigaction with > SA_RESTART. IIRC, SA_RESTART isn't guaranteed to be there > or work, but all the systems I deal with right now have it. I think we might as well do that _too_. However, once you use "sigaction()", you don't need to re-arm the signal handler any more, so I'd suggest a simpler patch like this instead.. Junio, I think this confirms/explains the Solaris breakage. I'll re-send the "anal stdio semantics" version of the patch on top of this in the next email. Linus ---- Subject: Fix Solaris stdio signal handling stupidities This uses sigaction() to install the SIGALRM handler with SA_RESTART, so that Solaris stdio doesn't break completely when a signal interrupts a read. Thanks to Jason Riedy for confirming the silly Solaris signal behaviour. Signed-off-by: Linus Torvalds <torvalds@osdl.org> ---- diff --git a/pack-objects.c b/pack-objects.c index ccfaa5f..1817b58 100644 --- a/pack-objects.c +++ b/pack-objects.c @@ -58,7 +58,7 @@ static int nr_objects = 0, nr_alloc = 0, static const char *base_name; static unsigned char pack_file_sha1[20]; static int progress = 1; -static volatile int progress_update = 0; +static volatile sig_atomic_t progress_update = 0; /* * The object names in objects array are hashed with this hashtable, @@ -879,7 +879,6 @@ static int try_delta(struct unpacked *cu static void progress_interval(int signum) { - signal(SIGALRM, progress_interval); progress_update = 1; } @@ -1025,6 +1024,23 @@ static int reuse_cached_pack(unsigned ch return 1; } +static void setup_progress_signal(void) +{ + struct sigaction sa; + struct itimerval v; + + memset(&sa, 0, sizeof(sa)); + sa.sa_handler = progress_interval; + sigemptyset(&sa.sa_mask); + sa.sa_flags = SA_RESTART; + sigaction(SIGALRM, &sa, NULL); + + v.it_interval.tv_sec = 1; + v.it_interval.tv_usec = 0; + v.it_value = v.it_interval; + setitimer(ITIMER_REAL, &v, NULL); +} + int main(int argc, char **argv) { SHA_CTX ctx; @@ -1090,13 +1106,8 @@ int main(int argc, char **argv) prepare_packed_git(); if (progress) { - struct itimerval v; - v.it_interval.tv_sec = 1; - v.it_interval.tv_usec = 0; - v.it_value = v.it_interval; - signal(SIGALRM, progress_interval); - setitimer(ITIMER_REAL, &v, NULL); fprintf(stderr, "Generating pack...\n"); + setup_progress_signal(); } while (fgets(line, sizeof(line), stdin) != NULL) { ^ permalink raw reply related [flat|nested] 20+ messages in thread
* [PATCH 2/2] pack-objects: be incredibly anal about stdio semantics 2006-04-02 20:28 ` Linus Torvalds @ 2006-04-02 20:31 ` Linus Torvalds 2006-04-02 21:09 ` Junio C Hamano 2006-04-02 22:29 ` [PATCH] Use sigaction and SA_RESTART in read-tree.c; add option in Makefile Jason Riedy 1 sibling, 1 reply; 20+ messages in thread From: Linus Torvalds @ 2006-04-02 20:31 UTC (permalink / raw) To: Jason Riedy, Junio C Hamano; +Cc: Git Mailing List This is the "letter of the law" version of using fgets() properly in the face of incredibly broken stdio implementations. We can work around the Solaris breakage with SA_RESTART, but in case anybody else is ever that stupid, here's the "safe" (read: "insanely anal") way to use fgets. It probably goes without saying that I'm not terribly impressed by Solaris libc. Signed-off-by: Linus Torvalds <torvalds@osdl.org> --- This is the same one that I already sent out, but re-diffed, and with a proper commit message. Not tested on Solaris. Junio - I think that I forgot to Cc: you on the 1/2 patch, but you'll see it on the git list. diff --git a/pack-objects.c b/pack-objects.c index 1817b58..0ea16ad 100644 --- a/pack-objects.c +++ b/pack-objects.c @@ -1110,8 +1110,18 @@ int main(int argc, char **argv) setup_progress_signal(); } - while (fgets(line, sizeof(line), stdin) != NULL) { + for (;;) { unsigned char sha1[20]; + + if (!fgets(line, sizeof(line), stdin)) { + if (feof(stdin)) + break; + if (!ferror(stdin)) + die("fgets returned NULL, not EOF, not error!"); + if (errno == EINTR) + continue; + die("fgets: %s", strerror(errno)); + } if (line[0] == '-') { if (get_sha1_hex(line+1, sha1)) ^ permalink raw reply related [flat|nested] 20+ messages in thread
* Re: [PATCH 2/2] pack-objects: be incredibly anal about stdio semantics 2006-04-02 20:31 ` [PATCH 2/2] pack-objects: be incredibly anal about stdio semantics Linus Torvalds @ 2006-04-02 21:09 ` Junio C Hamano 2006-04-02 21:21 ` Linus Torvalds 0 siblings, 1 reply; 20+ messages in thread From: Junio C Hamano @ 2006-04-02 21:09 UTC (permalink / raw) To: Linus Torvalds; +Cc: git Linus Torvalds <torvalds@osdl.org> writes: > This is the "letter of the law" version of using fgets() properly in the > face of incredibly broken stdio implementations. We can work around the > Solaris breakage with SA_RESTART, but in case anybody else is ever that > stupid, here's the "safe" (read: "insanely anal") way to use fgets. Thanks. It's good that I can say "Oh, I think this is the part that is broken, but I am going to bed" to find the problem solved by capable others when I wake up the next day. Global distributed development process at the finest, although I suspect Jason, Linus and myself are all in the same timezone ;-) Did you mean this as a real change or a demonstration? The sigaction change is a real fix, but somehow I find this one similar to the "(void*) NULL" thing you objected earlier (which was not merged because I agreed with your argument)... ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 2/2] pack-objects: be incredibly anal about stdio semantics 2006-04-02 21:09 ` Junio C Hamano @ 2006-04-02 21:21 ` Linus Torvalds 2006-04-02 22:12 ` Jason Riedy 0 siblings, 1 reply; 20+ messages in thread From: Linus Torvalds @ 2006-04-02 21:21 UTC (permalink / raw) To: Junio C Hamano; +Cc: git On Sun, 2 Apr 2006, Junio C Hamano wrote: > Linus Torvalds <torvalds@osdl.org> writes: > > > This is the "letter of the law" version of using fgets() properly in the > > face of incredibly broken stdio implementations. We can work around the > > Solaris breakage with SA_RESTART, but in case anybody else is ever that > > stupid, here's the "safe" (read: "insanely anal") way to use fgets. > > Did you mean this as a real change or a demonstration? The > sigaction change is a real fix, but somehow I find this one > similar to the "(void*) NULL" thing you objected earlier (which > was not merged because I agreed with your argument)... I don't have any really strong opinions on it. I think that any libc that needs the "ferror()" test + EINTR loopback is totally broken. I would happily say that people should just not use a development platform that is that horrible. But the fact that Solaris actually had that as a real problem (never mind that we could work around it another way) just makes me go "Hmm..". So I _think_ we're safe with just the "sigaction()" diff. Neither of the patches _should_ make any difference at all on a sane platform. Linus ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 2/2] pack-objects: be incredibly anal about stdio semantics 2006-04-02 21:21 ` Linus Torvalds @ 2006-04-02 22:12 ` Jason Riedy 2006-04-02 22:58 ` Linus Torvalds 0 siblings, 1 reply; 20+ messages in thread From: Jason Riedy @ 2006-04-02 22:12 UTC (permalink / raw) To: git And Linus Torvalds writes: - - I don't have any really strong opinions on it. I think that any libc that - needs the "ferror()" test + EINTR loopback is totally broken. I would - happily say that people should just not use a development platform that is - that horrible. If you consider stdio to be a low-level wrapper over syscalls that only adds buffering and simple parsing, then passing EINTR back to the application is a sensible choice. I wouldn't be too surprised if L4, VxWorks, etc. do something similar. - So I _think_ we're safe with just the "sigaction()" diff. Neither of the - patches _should_ make any difference at all on a sane platform. Anyone with an older HP/UX care to try these patches? HP/UX may not be sane, but I think it may lack SA_RESTART. I don't know if stdio calls need restarted, though. Jason ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 2/2] pack-objects: be incredibly anal about stdio semantics 2006-04-02 22:12 ` Jason Riedy @ 2006-04-02 22:58 ` Linus Torvalds 0 siblings, 0 replies; 20+ messages in thread From: Linus Torvalds @ 2006-04-02 22:58 UTC (permalink / raw) To: Jason Riedy; +Cc: git On Sun, 2 Apr 2006, Jason Riedy wrote: > > If you consider stdio to be a low-level wrapper over syscalls > that only adds buffering and simple parsing, then passing EINTR > back to the application is a sensible choice. I wouldn't be > too surprised if L4, VxWorks, etc. do something similar. No, returning EINTR is insane, because stdio has to do re-starting of system calls by hand _anyway_ for the "short read" case. EINTR really _is_ 100% equivalent to "short read of zero bytes" (that literally is what it means for a read/write system call - anybody who tells you otherwise is just crazy). So any library that handles short reads, but doesn't handle EINTR is by definition doing something totally idiotic and broken. Now, I agree that somebody else might be broken too. I would not agree that it's "acceptable". It's craptastically bad library code. > Anyone with an older HP/UX care to try these patches? HP/UX > may not be sane, but I think it may lack SA_RESTART. I don't > know if stdio calls need restarted, though. I'd assume that older HPUX is so BSD-based that all signals end up restarting. That was the BSD signal() semantics, after all. Linus ^ permalink raw reply [flat|nested] 20+ messages in thread
* [PATCH] Use sigaction and SA_RESTART in read-tree.c; add option in Makefile. 2006-04-02 20:28 ` Linus Torvalds 2006-04-02 20:31 ` [PATCH 2/2] pack-objects: be incredibly anal about stdio semantics Linus Torvalds @ 2006-04-02 22:29 ` Jason Riedy 2006-04-03 1:02 ` Linus Torvalds 2006-04-03 4:20 ` Junio C Hamano 1 sibling, 2 replies; 20+ messages in thread From: Jason Riedy @ 2006-04-02 22:29 UTC (permalink / raw) To: Linus Torvalds, Junio C Hamano; +Cc: git Might as well ape the sigaction change in read-tree.c to avoid the same potential problems. The fprintf status output will be overwritten in a second, so don't bother guarding it. Do move the fputc after disabling SIGALRM to ensure we go to the next line, though. Also add a NO_SA_RESTART option in the Makefile in case someone doesn't have SA_RESTART but does restart (maybe older HP/UX?). We want the builder to chose this specifically in case the system both lacks SA_RESTART and does not restart stdio calls; a compat #define in git-compat-utils.h would silently allow broken systems. Signed-off-by: Jason Riedy <ejr@cs.berkeley.edu> --- Makefile | 7 +++++++ read-tree.c | 28 +++++++++++++++++++--------- 2 files changed, 26 insertions(+), 9 deletions(-) 521dc260ea90a23d58a4e920203af5035ca1a673 diff --git a/Makefile b/Makefile index c79d646..f3b1d49 100644 --- a/Makefile +++ b/Makefile @@ -63,6 +63,9 @@ # Define COLLISION_CHECK below if you believe that SHA1's # 1461501637330902918203684832716283019655932542976 hashes do not give you # sufficient guarantee that no collisions between objects will ever happen. +# +# Define NO_SA_RESTART if your platform does not have SA_RESTART, _AND_ if +# it automatically restarts interrupted stdio calls. # Define USE_NSEC below if you want git to care about sub-second file mtimes # and ctimes. Note that you need recent glibc (at least 2.2.4) for this, and @@ -425,6 +428,10 @@ endif ifdef NO_ACCURATE_DIFF ALL_CFLAGS += -DNO_ACCURATE_DIFF +endif + +ifdef NO_SA_RESTART + ALL_CFLAGS += -DSA_RESTART=0 endif # Shell quote (do not use $(call) to accomodate ancient setups); diff --git a/read-tree.c b/read-tree.c index eaff444..4422dbf 100644 --- a/read-tree.c +++ b/read-tree.c @@ -273,10 +273,26 @@ static void progress_interval(int signum) { - signal(SIGALRM, progress_interval); progress_update = 1; } +static void setup_progress_signal(void) +{ + struct sigaction sa; + struct itimerval v; + + memset(&sa, 0, sizeof(sa)); + sa.sa_handler = progress_interval; + sigemptyset(&sa.sa_mask); + sa.sa_flags = SA_RESTART; + sigaction(SIGALRM, &sa, NULL); + + v.it_interval.tv_sec = 1; + v.it_interval.tv_usec = 0; + v.it_value = v.it_interval; + setitimer(ITIMER_REAL, &v, NULL); +} + static void check_updates(struct cache_entry **src, int nr) { static struct checkout state = { @@ -289,8 +305,6 @@ unsigned last_percent = 200, cnt = 0, total = 0; if (update && verbose_update) { - struct itimerval v; - for (total = cnt = 0; cnt < nr; cnt++) { struct cache_entry *ce = src[cnt]; if (!ce->ce_mode || ce->ce_flags & mask) @@ -302,12 +316,8 @@ total = 0; if (total) { - v.it_interval.tv_sec = 1; - v.it_interval.tv_usec = 0; - v.it_value = v.it_interval; - signal(SIGALRM, progress_interval); - setitimer(ITIMER_REAL, &v, NULL); fprintf(stderr, "Checking files out...\n"); + setup_progress_signal(); progress_update = 1; } cnt = 0; @@ -341,8 +351,8 @@ } } if (total) { - fputc('\n', stderr); signal(SIGALRM, SIG_IGN); + fputc('\n', stderr); } } -- 1.3.0.rc1 ^ permalink raw reply related [flat|nested] 20+ messages in thread
* Re: [PATCH] Use sigaction and SA_RESTART in read-tree.c; add option in Makefile. 2006-04-02 22:29 ` [PATCH] Use sigaction and SA_RESTART in read-tree.c; add option in Makefile Jason Riedy @ 2006-04-03 1:02 ` Linus Torvalds 2006-04-03 4:20 ` Junio C Hamano 1 sibling, 0 replies; 20+ messages in thread From: Linus Torvalds @ 2006-04-03 1:02 UTC (permalink / raw) To: Jason Riedy; +Cc: Junio C Hamano, git On Sun, 2 Apr 2006, Jason Riedy wrote: > > Might as well ape the sigaction change in read-tree.c to avoid > the same potential problems. Looks good. I didn't realize we had the exact same code duplicated. I guess it's small enough that there isn't a huge win in moving it to some common file.. Does somebody have access to Solaris to verify that this all actually does fix it? I obviously believe it will, since this just explains the symptoms to a tee, but it would still be good to have an actual confirmation by somebody who has access to a Solaris environment. > Also add a NO_SA_RESTART option in the Makefile in case someone > doesn't have SA_RESTART but does restart (maybe older HP/UX?). > We want the builder to chose this specifically in case the > system both lacks SA_RESTART and does not restart stdio calls; > a compat #define in git-compat-utils.h would silently allow > broken systems. I do believe that we already require POSIX.2 functionality (regex, fnmatch, C90 compiler), which implies that git probably wouldn't compile anyway on things that are _really_ ancient. I think SA_RESTART was part of the original POSIX.1 specs, so anybody that doesn't have it is likely to not have a lot of other things we rely on too. There are other SA_* flags that aren't as standard, but I'd expect SA_RESTART to be everywhere (or it likely doesn't have sigaction() at all..). But hey, I certainly don't have really old HP-UX to test either. Linus ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] Use sigaction and SA_RESTART in read-tree.c; add option in Makefile. 2006-04-02 22:29 ` [PATCH] Use sigaction and SA_RESTART in read-tree.c; add option in Makefile Jason Riedy 2006-04-03 1:02 ` Linus Torvalds @ 2006-04-03 4:20 ` Junio C Hamano 2006-04-03 4:40 ` Linus Torvalds 1 sibling, 1 reply; 20+ messages in thread From: Junio C Hamano @ 2006-04-03 4:20 UTC (permalink / raw) To: Jason Riedy; +Cc: Linus Torvalds, git Jason Riedy <ejr@EECS.Berkeley.EDU> writes: > Also add a NO_SA_RESTART option in the Makefile in case someone > doesn't have SA_RESTART but does restart (maybe older HP/UX?). > We want the builder to chose this specifically in case the > system both lacks SA_RESTART and does not restart stdio calls; > a compat #define in git-compat-utils.h would silently allow > broken systems. What am I missing...? ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] Use sigaction and SA_RESTART in read-tree.c; add option in Makefile. 2006-04-03 4:20 ` Junio C Hamano @ 2006-04-03 4:40 ` Linus Torvalds 0 siblings, 0 replies; 20+ messages in thread From: Linus Torvalds @ 2006-04-03 4:40 UTC (permalink / raw) To: Junio C Hamano; +Cc: Jason Riedy, git On Sun, 2 Apr 2006, Junio C Hamano wrote: > Jason Riedy <ejr@EECS.Berkeley.EDU> writes: > > > Also add a NO_SA_RESTART option in the Makefile in case someone > > doesn't have SA_RESTART but does restart (maybe older HP/UX?). > > We want the builder to chose this specifically in case the > > system both lacks SA_RESTART and does not restart stdio calls; > > a compat #define in git-compat-utils.h would silently allow > > broken systems. > > What am I missing...? I don't think this part is worth it at least for now. If there really are systems without SA_RESTART, they probably don't have sigaction() either, so #defining SA_RESTART to zero likely won't help. But somebody can prove me wrong.. Linus ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Solaris cloning woes partly diagnosed 2006-04-02 19:18 ` Linus Torvalds 2006-04-02 19:52 ` Jason Riedy @ 2006-04-03 3:06 ` Linus Torvalds 1 sibling, 0 replies; 20+ messages in thread From: Linus Torvalds @ 2006-04-03 3:06 UTC (permalink / raw) To: Junio C Hamano; +Cc: git On Sun, 2 Apr 2006, Linus Torvalds wrote: > > - while (fgets(line, sizeof(line), stdin) != NULL) { > + for (;;) { > unsigned char sha1[20]; > + > + if (!fgets(line, sizeof(line), stdin)) { > + if (feof(stdin)) > + break; > + if (!ferror(stdin)) > + die("fgets returned NULL, not EOF, not error!"); > + if (errno == EINTR) > + continue; > + die("fgets: %s", strerror(errno)); While it shouldn't actually matter, I just realized that this incredibly anal sequence isn't actually quite anal enough, sinceit really should have a "clearerr(stdin)" for the continue case when we ignore the EINTR thing. Otherwise, some stdio implementation might just decide to continue to return NULL from fgets(), since the error indicator is technically sticky. I don't know if they do so, but it's entirely possible. So I'd almost suggest something like char *safe_fgets(char *s, int size, FILE *stream) { for (;;) { if (fgets(s, size, stream)) return s; if (feof(stream)) return NULL; if (!ferror(stream)) die("fgets returned NULL, not EOF, not error!"); if (errno != EINTR) die("fgets: %s", strerror(errno)); clearerr(stream); } } which sbould then hopefully work as a sane fgets()-replacement for broken environments. (And then we can just do while (safe_fgets(..)) like normal people again, and not be afraid of strange stdio implementations). Linus ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Solaris cloning woes partly diagnosed 2006-04-02 18:33 ` Linus Torvalds 2006-04-02 19:10 ` Jason Riedy 2006-04-02 19:18 ` Linus Torvalds @ 2006-04-04 18:21 ` H. Peter Anvin 2 siblings, 0 replies; 20+ messages in thread From: H. Peter Anvin @ 2006-04-04 18:21 UTC (permalink / raw) To: Linus Torvalds; +Cc: Junio C Hamano, git Linus Torvalds wrote: > > One thing to do might be to make the itimer use a much higher frequency, > to trigger the problem more easily. > > We do, for example, expect that regular file writing not do that. At least > "write_sha1_from_fd()" will just do a "write()" without testing the error > return, which is bad (it would silently create a truncated object if the > /tmp filesystem filled up). If somebody has their filesystem over NFS > mounted interruptible, partial writes could also happen. > There seems to be a whole bunch of places where we use naked write()s where xwrite or fwrite would be a lot more appropriate. The ssh-* files seem to be particularly offensive in that way. There are also a number of places which call xwrite with the apparent belief that returning short is an error (e.g. blame.c). This as far as I know the more common definition of xwrite(), but is *not* the one used in git -- the one in git only guarantees that at least one character is written. -hpa ^ permalink raw reply [flat|nested] 20+ messages in thread
* [RFH] Solaris cloning woes... 2006-04-02 10:41 Solaris cloning woes partly diagnosed Junio C Hamano 2006-04-02 18:33 ` Linus Torvalds @ 2006-04-04 8:47 ` Junio C Hamano 2006-04-04 18:53 ` Jason Riedy 1 sibling, 1 reply; 20+ messages in thread From: Junio C Hamano @ 2006-04-04 8:47 UTC (permalink / raw) To: git; +Cc: Peter Eriksen The sigaction() patch cooked by Linus and Jason are in "next", to fix pack-objects breakage started between 1.2.2 and 1.2.3. Once this is confirmed to fix the problem on Solaris boxes, I'd like to include and do an 1.2.5, just in case OpenSolaris folks would want to play with it, and also put them in the "master" branch. I have an access to a not-so-well-maintained Solaris box at work, and built the relevant parts with somewhat stripped down configuration to run the test. The "master" version without the sigaction() patches exhibits the symptom Oejet and I observed the other night, and "next" seems to fix it. I am reasonably happy. ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [RFH] Solaris cloning woes... 2006-04-04 8:47 ` [RFH] Solaris cloning woes Junio C Hamano @ 2006-04-04 18:53 ` Jason Riedy 0 siblings, 0 replies; 20+ messages in thread From: Jason Riedy @ 2006-04-04 18:53 UTC (permalink / raw) To: Junio C Hamano; +Cc: git, Peter Eriksen And Junio C Hamano writes: - Once this is confirmed to fix the problem on Solaris boxes, I'd - like to include and do an 1.2.5, just in case OpenSolaris folks - would want to play with it, and also put them in the "master" - branch. I haven't run into the problem since, but I almost never saw it in the first place. Have you been able to trigger it reliably? Maybe I'm "lucky" that my Sun's so slow... Jason ^ permalink raw reply [flat|nested] 20+ messages in thread
end of thread, other threads:[~2006-04-04 18:53 UTC | newest] Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2006-04-02 10:41 Solaris cloning woes partly diagnosed Junio C Hamano 2006-04-02 18:33 ` Linus Torvalds 2006-04-02 19:10 ` Jason Riedy 2006-04-02 19:22 ` Linus Torvalds 2006-04-02 19:18 ` Linus Torvalds 2006-04-02 19:52 ` Jason Riedy 2006-04-02 20:28 ` Linus Torvalds 2006-04-02 20:31 ` [PATCH 2/2] pack-objects: be incredibly anal about stdio semantics Linus Torvalds 2006-04-02 21:09 ` Junio C Hamano 2006-04-02 21:21 ` Linus Torvalds 2006-04-02 22:12 ` Jason Riedy 2006-04-02 22:58 ` Linus Torvalds 2006-04-02 22:29 ` [PATCH] Use sigaction and SA_RESTART in read-tree.c; add option in Makefile Jason Riedy 2006-04-03 1:02 ` Linus Torvalds 2006-04-03 4:20 ` Junio C Hamano 2006-04-03 4:40 ` Linus Torvalds 2006-04-03 3:06 ` Solaris cloning woes partly diagnosed Linus Torvalds 2006-04-04 18:21 ` H. Peter Anvin 2006-04-04 8:47 ` [RFH] Solaris cloning woes Junio C Hamano 2006-04-04 18:53 ` Jason Riedy
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).