* Interrupted system call
From: R. Diez @ 2020-07-01 9:43 UTC
To: git

Hi all:

First of all, many thanks for Git.

After a 3-month pause, I recently updated my Ubuntu 18.04.4. I am using a PPA to keep Git more up to date, so I have now "git version 2.27.0".

I am now getting this kind of errors:

fatal: failed to read object cf965547a433493caa80e84d7a2b78b32a26ee35: Interrupted system call

error: unable to mmap /home/rdiez/[blah blah]/SrcRepo.git/objects/2e/f96ffba4c0d60f36c8779758f82752be380689: Interrupted system call

I am using a mount point for a network share. Keep in mind that Git thinks it is working on a local directory, so there should be no sockets or non-blocking I/O involved.

The problem is probably caused by using SMB to connect to an outdated Windows server. It has been working for years, but at some point in time it is bound to fail. The Linux kernel itself seems to introduce bugs in the SMB/CIFS code every now and then.

Nevertheless, I am surprised to get such an "Interrupted system call" from Git. A long time ago I learnt that it is OK for many syscalls to get interrupted, so you have to loop around them. See here for more information:

http://250bpm.com/blog:12

As a result, users should never actually get an "Interrupted system call" error from any software, at least when no sockets or non-blocking I/O is involved.

How can I pin-point this problem? I would like to know where Git is encountering this error, so that I can troubleshoot it, and maybe report yet another bug to the Linux SMB/CIFS maintainer.

Thanks in advance,
  rdiez
* Re: Interrupted system call
From: Santiago Torres Arias @ 2020-07-01 14:22 UTC
To: R. Diez; +Cc: git

Hi,

> Nevertheless, I am surprised to get such an "Interrupted system call" from
> Git. A long time ago I learnt that it is OK for many syscalls to get
> interrupted, so you have to loop around them. See here for more information:
>
> http://250bpm.com/blog:12
>
> As a result, users should never actually get an "Interrupted system call"
> error from any software, at least when no sockets or non-blocking I/O is
> involved.

I'm not sure if you can blame git right away (it could be an underlying library), and I'm also not convinced that "interrupted system call" is an error that should never exist for users (error handling is generally very nuanced).

I'd advise using GIT_TRACE_FSMONITOR or just GIT_TRACE to figure out which component is the last one in place before things failed. You can read about these in the manpage, in the "other" subsection of the "ENVIRONMENT VARIABLES" section.

I hope this helps!
-Santiago
* Re: Interrupted system call
From: Jeff King @ 2020-07-01 16:21 UTC
To: R. Diez; +Cc: git

On Wed, Jul 01, 2020 at 11:43:15AM +0200, R. Diez wrote:

> After a 3-month pause, I recently updated my Ubuntu 18.04.4. I am
> using a PPA to keep Git more up to date, so I have now "git version
> 2.27.0".
>
> I am now getting this kind of errors:
>
> fatal: failed to read object cf965547a433493caa80e84d7a2b78b32a26ee35: Interrupted system call
>
> error: unable to mmap /home/rdiez/[blah blah]/SrcRepo.git/objects/2e/f96ffba4c0d60f36c8779758f82752be380689: Interrupted system call
>
> I am using a mount point for a network share. Keep in mind that Git thinks
> it is working on a local directory, so there should be no sockets or
> non-blocking I/O involved.

Looking at the code, that message is slightly deceptive. It's reporting a failure from map_loose_object_1(), which calls both open() and mmap(), as well as fstat(). It would be interesting to know which syscall is actually failing. Running the failure case under "strace" would be interesting (likewise to see which signal is causing the interruption).

> The problem is probably caused by using SMB to connect to an outdated
> Windows server. It has been working for years, but at some point in time it
> is bound to fail. The Linux kernel itself seems to introduce bugs in the
> SMB/CIFS code every now and then.
>
> Nevertheless, I am surprised to get such an "Interrupted system call" from
> Git. A long time ago I learnt that it is OK for many syscalls to get
> interrupted, so you have to loop around them. See here for more information:

We do check for signals and re-start read() and write() calls as appropriate. We don't for open(), and nobody has ever complained (though it definitely is documented to result in EINTR, I'd imagine it's relatively rare).

I'm not excited about the prospect of adding retry code to every open(), though perhaps doing it with our git_open() wrapper would be sufficient (it's unclear how stdio fopen() behaves).

> How can I pin-point this problem? I would like to know where Git is
> encountering this error, so that I can troubleshoot it, and maybe report yet
> another bug to the Linux SMB/CIFS maintainer.

I think the first step is using strace to record the system call returning EINTR (and the signal that interrupted it). I suspect it's in open(), though, and probably not a bug: opening network files may take a while and need to be interruptible.

-Peff
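The retry idea mentioned above for open() could look roughly like the following. This is only a minimal sketch: open_retry_eintr() is a hypothetical name, not Git's git_open() or any existing Git function, and it assumes the flags do not include O_CREAT (which would need a mode argument).

```c
#include <errno.h>
#include <fcntl.h>

/*
 * Hypothetical helper, not part of Git: call open() again for as long
 * as it fails with EINTR, and return the first other result.
 * Assumes `flags` does not contain O_CREAT, so no mode is passed.
 */
static int open_retry_eintr(const char *path, int flags)
{
	int fd;

	do {
		fd = open(path, flags);
	} while (fd < 0 && errno == EINTR);

	return fd;
}
```

Any other failure (ENOENT, EACCES, and so on) is returned to the caller unchanged, so existing error reporting would keep working; only the EINTR case is absorbed.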
* Re: Interrupted system call
From: R. Diez @ 2020-07-02 7:07 UTC
To: Jeff King; +Cc: git, santiago

> [...]
> It would be interesting to know which syscall is
> actually failing. Running the failure case under "strace" would be
> interesting (likewise to see which signal is causing the interruption).
> [...]

First of all, thanks for your help.

GIT_TRACE alone does not tell me anything useful:

$ GIT_TRACE=true git fsck
07:58:47.229138 git.c:442 trace: built-in: git fsck
error: unable to mmap ./objects/cb/fec04963c1090535d2670b741912e17fd27b27: Interrupted system call
error: cbfec04963c1090535d2670b741912e17fd27b27: object corrupt or missing: ./objects/cb/fec04963c1090535d2670b741912e17fd27b27
Checking object directories: 100% (256/256), done.
Checking objects: 100% (70229/70229), done.
Checking connectivity: 75316, done.
missing commit cbfec04963c1090535d2670b741912e17fd27b27
dangling commit 6835e962b227e957520addbc5c28aedc97b253f3
dangling tree a9d1a1321066d8a8402f1c9e584675146d250952

GIT_TRACE_FSMONITOR does not either:

$ GIT_TRACE_FSMONITOR=true git fsck
error: unable to mmap ./objects/56/af267465e7cdb7ccebe8242e55c03d4b675684: Interrupted system call
error: 56af267465e7cdb7ccebe8242e55c03d4b675684: object corrupt or missing: ./objects/56/af267465e7cdb7ccebe8242e55c03d4b675684
Checking object directories: 100% (256/256), done.
Checking objects: 100% (70229/70229), done.
Checking connectivity: 75666, done.
missing tree 56af267465e7cdb7ccebe8242e55c03d4b675684

It is the same Git repository, so it looks like every time a different, random file fails.

I managed to make it fail once with:

strace -f -- git fsck --progress

The signal involved is SIGALRM. I am guessing that Git is setting it up in order to display its progress messages. This is one of the few calls to rt_sigaction(SIGALRM):

rt_sigaction(SIGALRM, {sa_handler=0x556c8ac0fe80, sa_mask=[], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x7fbdca7da890}, NULL, 8) = 0

This is the first failure:

openat(AT_FDCWD, "./objects/11/a327f469cc40015d6d873f6eed328e977c4234", O_RDONLY|O_CLOEXEC) = -1 EINTR (Interrupted system call)
--- SIGALRM {si_signo=SIGALRM, si_code=SI_KERNEL} ---
rt_sigreturn({mask=[]}) = -1 EINTR (Interrupted system call)
openat(AT_FDCWD, "/usr/share/locale/en_US/LC_MESSAGES/libc.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/share/locale/en/LC_MESSAGES/libc.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/share/locale-langpack/en_US/LC_MESSAGES/libc.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/share/locale-langpack/en/LC_MESSAGES/libc.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
write(2, "error: unable to mmap ./objects/"..., 99error: unable to mmap ./objects/11/a327f469cc40015d6d873f6eed328e977c4234: Interrupted system call
) = 99
write(2, "error: 11a327f469cc40015d6d873f6"..., 128error: 11a327f469cc40015d6d873f6eed328e977c4234: object corrupt or missing: ./objects/11/a327f469cc40015d6d873f6eed328e977c4234
) = 128

This is the second one:

openat(AT_FDCWD, "./objects/18/5b82729943708795b635899348ecca97aa7804", O_RDONLY|O_CLOEXEC) = -1 EINTR (Interrupted system call)
--- SIGALRM {si_signo=SIGALRM, si_code=SI_KERNEL} ---
rt_sigreturn({mask=[]}) = -1 EINTR (Interrupted system call)
write(2, "error: unable to mmap ./objects/"..., 99error: unable to mmap ./objects/18/5b82729943708795b635899348ecca97aa7804: Interrupted system call
) = 99
write(2, "error: 185b82729943708795b635899"..., 128error: 185b82729943708795b635899348ecca97aa7804: object corrupt or missing: ./objects/18/5b82729943708795b635899348ecca97aa7804
) = 128

There are a few more failures. This is the last one. Afterwards, Git exited:

openat(AT_FDCWD, "./objects/f4/56439700761946c57ef467a8a125a80f0304bd", O_RDONLY|O_CLOEXEC) = -1 EINTR (Interrupted system call)
--- SIGALRM {si_signo=SIGALRM, si_code=SI_KERNEL} ---
rt_sigreturn({mask=[]}) = -1 EINTR (Interrupted system call)
openat(AT_FDCWD, "./objects/pack", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 3
fstat(3, {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
brk(0x556c934af000) = 0x556c934af000
getdents(3, /* 19 entries */, 1048576) = 1272
getdents(3, /* 0 entries */, 1048576) = 0
close(3) = 0
write(2, "fatal: failed to read object f45"..., 95fatal: failed to read object f456439700761946c57ef467a8a125a80f0304bd: Interrupted system call
) = 95
exit_group(128) = ?
+++ exited with 128 +++

I am not an expert in Unix signals, but I'll do my best here.

I do not understand why Git is getting these interruptions due to SIGALRM, because SA_RESTART is in place.

Interestingly, the man page signal(7) does list open() under that flag, but not openat().

The description for open() under SA_RESTART is also interesting:

* open(2), if it can block (e.g., when opening a FIFO; see fifo(7)).

I am not sure that opening a normal disk file may qualify as "can block" with that definition though.

Best regards,
  rdiez
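For readers who want to experiment with the signal setup visible in the strace output, the sketch below installs a SIGALRM handler with SA_RESTART and arms a repeating timer. It is an illustration under assumptions, not Git's progress code: the names on_alarm(), install_restartable_alarm() and start_alarm_timer() are hypothetical, and the interval is whatever the caller passes in.

```c
/* Request POSIX/XSI declarations (sigaction, setitimer) on strict compilers. */
#define _XOPEN_SOURCE 700

#include <signal.h>
#include <string.h>
#include <sys/time.h>

/* Set by the handler so the main program can notice that the timer fired. */
static volatile sig_atomic_t alarm_fired;

static void on_alarm(int signo)
{
	(void)signo;
	alarm_fired = 1;
}

/* Install a SIGALRM handler with SA_RESTART; returns 0 on success, -1 on error. */
static int install_restartable_alarm(void)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = on_alarm;
	sigemptyset(&sa.sa_mask);
	sa.sa_flags = SA_RESTART; /* ask the kernel to restart interrupted syscalls */
	return sigaction(SIGALRM, &sa, NULL);
}

/*
 * Arm a repeating real-time timer that raises SIGALRM every
 * interval_usec microseconds. An interval of 0 disarms the timer.
 */
static int start_alarm_timer(long interval_usec)
{
	struct itimerval it;

	it.it_interval.tv_sec = interval_usec / 1000000;
	it.it_interval.tv_usec = interval_usec % 1000000;
	it.it_value = it.it_interval;
	return setitimer(ITIMER_REAL, &it, NULL);
}
```

With this in place, a loop that repeatedly calls open() on files on the affected mount and checks for errno == EINTR might help show whether the interruption can be reproduced outside Git.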
* Re: Interrupted system call
From: Jeff King @ 2020-07-15 9:38 UTC
To: R. Diez; +Cc: git, santiago

On Thu, Jul 02, 2020 at 09:07:46AM +0200, R. Diez wrote:

> I managed to make it fail once with:
>
> strace -f -- git fsck --progress
>
> The signal involved is SIGALRM. I am guessing that Git is setting it up in
> order to display its progress messages. This is one of the few calls to
> rt_sigaction(SIGALRM):
>
> rt_sigaction(SIGALRM, {sa_handler=0x556c8ac0fe80, sa_mask=[], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x7fbdca7da890}, NULL, 8) = 0

That makes sense (and likewise your "--quiet" workaround seems reasonable).

> I am not an expert in Unix signals, but I'll do my best here.
>
> I do not understand why Git is getting these interruptions due to SIGALRM, because SA_RESTART is in place.
>
> Interestingly, the man page signal(7) does list open() under that flag, but not openat().

Yes, though since open(2) says:

  The openat() system call operates in exactly the same way as open(), except for the differences described here.

I'd expect that would include any SA_RESTART handling. Peeking at the Linux implementation in fs/open.c, it looks like both syscalls quickly end up in the same do_sys_open().

> The description for open() under SA_RESTART is also interesting:
>
> * open(2), if it can block (e.g., when opening a FIFO; see fifo(7)).
>
> I am not sure that opening a normal disk file may qualify as "can block" with that definition though.

Delivering EINTR on a non-blocking call seems even more confusing, though. I think the "if it can block" is just "you won't even get a signal if it's not blocking".

This really _seems_ like a kernel bug, either:

  - openat() does not get the same SA_RESTART treatment as open(); or

  - open() on a network file can get EINTR even with SA_RESTART

But it's quite possible that I'm missing some corner case or historical reason that it would need to behave the way you're seeing. It might be worth reporting to kernel folks.

-Peff
* Re: Interrupted system call
From: Chris Torek @ 2020-07-15 16:06 UTC
To: Jeff King; +Cc: R. Diez, Git List, santiago

On Wed, Jul 15, 2020 at 2:45 AM Jeff King <peff@peff.net> wrote:

> On Thu, Jul 02, 2020 at 09:07:46AM +0200, R. Diez wrote:
> > I do not understand why Git is getting these interruptions due to SIGALRM, because SA_RESTART is in place.

It really shouldn't -- that's the whole point of SA_RESTART.

> Delivering EINTR on a non-blocking call seems even more confusing,
> though. I think the "if it can block" is just "you won't even get a
> signal if it's not blocking".
>
> This really _seems_ like a kernel bug, either:
>
>   - openat() does not get the same SA_RESTART treatment as open(); or
>
>   - open() on a network file can get EINTR even with SA_RESTART
>
> But it's quite possible that I'm missing some corner case or historical
> reason that it would need to behave the way you're seeing. It might be
> worth reporting to kernel folks.
>
> -Peff

Right. This goes way back to pre-v7-Unix signals, as a sort of a side effect of the implementation. In ancient times, the kernel code for the internal wait-for-some-event took a priority number, and anything below a cutoff value meant "not interrupted by signals" while anything above it meant "interrupted by signals". Disk operations were all at PRIBIO which was never interrupted.

This is all quite different in modern systems and hence it's all adjustable, but in general we like to distinguish between "operations that will definitely complete fairly quickly" (normally not interrupted) and "operations that might take significant amounts of time" (normally interrupted with the option of restarting the system call).

*Restarting*, though, means exactly that: not *resuming*, but *restarting*. So whatever system call is to be interrupted by the signal *must* be one that can simply be started over from the beginning. That means, for instance, that read() or write() can only be restarted if no data have yet moved. So if you're in a read() on a device (e.g., serial port, or tape drive, or whatever) and have gotten a few bytes, but not yet all you wanted, and then the system call is to be interrupted by a signal, the read() must return with a short count.

An open() can be restarted on the assumption that no path names have been changed. That's not necessarily a good assumption, but it's traditional. The openat() can be restarted for the same reason (and in fact correct use of openat() can protect against some pathname issues). It's up to the programmer to decide whether to use SA_RESTART, and hence allow this, or not.

Chris
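The distinction between a restartable read() (no data moved yet) and a short count (some data already moved) is why callers usually wrap read() in a loop. The following is a minimal sketch of such a loop; read_fully() is a hypothetical name chosen for illustration and is not claimed to match any function in Git.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: read up to `len` bytes from `fd`, continuing
 * after short counts and retrying when read() fails with EINTR.
 * Returns the number of bytes read (less than `len` only at end of
 * file), or -1 on any other error, in which case bytes already read
 * are not reported.
 */
static ssize_t read_fully(int fd, void *buf, size_t len)
{
	char *p = buf;
	size_t total = 0;

	while (total < len) {
		ssize_t n = read(fd, p + total, len - total);

		if (n < 0) {
			if (errno == EINTR)
				continue; /* this call moved no data; start it over */
			return -1;
		}
		if (n == 0)
			break; /* end of file */
		total += (size_t)n; /* short count: keep what arrived, ask for the rest */
	}
	return (ssize_t)total;
}
```

Note that the EINTR branch only ever repeats a call that transferred nothing, which matches the "restart, not resume" rule described above; partial transfers are handled by the short-count branch instead.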
* Re: Interrupted system call
From: R. Diez @ 2020-07-12 8:41 UTC
To: Jeff King; +Cc: git

> fatal: failed to read object cf965547a433493caa80e84d7a2b78b32a26ee35: Interrupted system call
> [...]

In case anybody else has the same problem and finds this thread in the future, the workaround I am using is to disable progress messages.

For "git pull", "git gc" and "git push" the appropriate option is "--quiet", but for "git fsck" it is "--no-progress".

Best regards,
  rdiez