* git failure on HP-UX
@ 2011-05-05 18:04 Kibler, Bill
  2011-05-05 19:06 ` Junio C Hamano
  0 siblings, 1 reply; 6+ messages in thread
From: Kibler, Bill @ 2011-05-05 18:04 UTC (permalink / raw)
  To: git; +Cc: Kibler, Bill, Richard Lloyd

An outside HP service provides open source packages compiled to run on HP-UX systems. Our group within HP is currently moving to git, with the majority of git usage on Linux systems. However, some older work still needs to be done on HP-UX systems and will require a working set of git tools. The current set of git tools works properly except when accessing gitolite, which relies on ssh access. When trying to "clone" a gitolite-hosted repo over ssh, the HP-UX git session fails with a core dump and a "SIGBUS" error.

I went through updating the HP-UX system with all the latest patches, which did not change the behavior in any way. After trying many options it became clear that the problem was inside git. Using wdb/gdb and tusc, I tracked the failure down to a call to "recv_sideband" in sideband.c; it cores when the call is made. The caller is "sideband_demux" in builtin/fetch-pack.c. The "tusc" output showed a "pipe" of size 8192 being used prior to the core. I believe that git "forks" off an "ssh user@hostname upload-pack 'repo_name'" that is "piped" back to the git stream process.

In looking at the code, "sideband.h" defines "LARGE_PACKET_MAX 65520" and is related to the passed flag "side-band-64k" as discussed in git document pack-protocol.txt. The current default usage seems to be 64K transfers, yet if we check the "include/limits.h" of HP-UX we see a "PIPE_BUF" set to 8192. Along with the tusc indication of 8K pipe size, I suspect that HP-UX is coring due to git trying to use a 64K pipe when 8K is max.

I solved the problem for now by changing sideband.h to use "LARGE_PACKET_MAX 8208". If you use 8192 or less, you get a too-small-size failure in "packet_read_line", so I added 16 bytes to 8192 to get 8208. I noticed the previous value and some comments indicating that 8 to 10 bytes of overhead are needed, hence the few extra bytes in this define.

I suspect the correct approach would be to make the HP-UX git use/send "side-band" and not "side-band-64k" in the git packet protocol, but I was unable to find out how to do that. The pack-protocol.txt discussion completely ignores this topic and how to handle clients with smaller abilities than servers. It appeared to me that the server sets the transfer size and the client is just supposed to accept it. Under HP-UX as a client, that is not an option, as it has an 8K max limit.


* Re: git failure on HP-UX
  2011-05-05 18:04 git failure on HP-UX Kibler, Bill
@ 2011-05-05 19:06 ` Junio C Hamano
  2011-05-05 20:51   ` Kibler, Bill
  2011-05-06 19:02   ` git failure on HP-UX - more data Kibler, Bill
  0 siblings, 2 replies; 6+ messages in thread
From: Junio C Hamano @ 2011-05-05 19:06 UTC (permalink / raw)
  To: Kibler, Bill; +Cc: git, Richard Lloyd

"Kibler, Bill" <bill.kibler@hp.com> writes:

> In looking at the code, "sideband.h" defines "LARGE_PACKET_MAX 65520"
> and is related to the passed flag "side-band-64k" as discussed in git
> document pack-protocol.txt. The current default usage seems to be 64K
> transfers, yet if we check the "include/limits.h" of HP-UX we see a
> "PIPE_BUF" set to 8192. Along with the tusc indication of 8K pipe size,
> I suspect that HP-UX is coring due to git trying to use a 64K pipe when
> 8K is max.
>
> I solved the problem for now, by changing the file sideband.h to use
> "LARGE_PACKET_MAX 8208".

This does not make any sense.  We may make write(2) and read(2) system
calls with 64k (or maybe a bit more) chunks, but that does not mean the
implementation of these system calls must take that as a whole.  Your
write(2) is allowed to write only whatever fits your pipe buffer, and tell
the caller "I wrote only 8192 bytes", and the code is supposed to loop,
advancing the write pointer by 8k and calling write(2) again, until you
write everything to whoever is reading the other end of the pipe.  The
same thing for the read(2).

If you can find a place where we call write(2)/read(2) and blindly assume
that a non-negative return means everything was written/read successfully,
then you have found a bug.
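
The loop described above can be sketched roughly as follows. This is only an illustration of the technique, not git's actual code; the helper name write_all and the choice to treat a zero-byte write as an error are assumptions made for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical helper (not git's implementation): keep calling
 * write(2) until all `len` bytes have been accepted, or a real error
 * occurs.  Returns 0 on success, -1 on error with errno set.
 */
static int write_all(int fd, const void *buf, size_t len)
{
	const char *p = buf;

	assert(buf != NULL || len == 0);
	while (len > 0) {
		ssize_t n = write(fd, p, len);

		if (n < 0) {
			if (errno == EINTR)
				continue;	/* interrupted before any byte was written; retry */
			return -1;
		}
		if (n == 0) {
			errno = ENOSPC;	/* assumption: report "no progress" as an error */
			return -1;
		}
		p += n;			/* skip the bytes the kernel accepted */
		len -= (size_t)n;	/* and ask only for the remainder */
	}
	return 0;
}
```

With a loop of this shape, a caller that hands over a 64k buffer keeps working when each write(2) accepts only 8192 bytes; the same pattern applies to read(2) when a fixed number of bytes is expected.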

If the symptom _were_ a deadlock where the writer of one pipe expected to
be able to send 64k to the other end of the pipe and then hear back from
the other side with a separate read, I would understand that could happen
(actually we know a local pipe transfer without ssh has that kind of
potential deadlock but I think the size we assume that can fit in the pipe
buffer is far smaller than 8k).  But I do not understand where a SIGBUS
can come from.


* RE: git failure on HP-UX
  2011-05-05 19:06 ` Junio C Hamano
@ 2011-05-05 20:51   ` Kibler, Bill
  2011-05-06 19:02   ` git failure on HP-UX - more data Kibler, Bill
  1 sibling, 0 replies; 6+ messages in thread
From: Kibler, Bill @ 2011-05-05 20:51 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Richard Lloyd

Let me say first off that my change seems to work, but clearly I feel it
was more a lucky guess on my part than hunting down the actual cause since
I feel it is a combination of git actions and HP-UX libraries. I can't really
debug all of HP-UX libc, so I have to make an educated guess as to what might
be happening. For me, I felt that some mechanism inside of git should be
possible to set on the client side to limit transfer buffer sizes without
a recompile.

Now having said that and had a chance to consider my explanation as stated,
it might be more accurate to say that what I think is happening is closer to
this - as I understand the SIGBUS and other messages around the action,
I think the library call is setting up the pipe buffer as 8k of memory,
while git is assuming(?) a 64K space and returns a pointer to the libc
function that is well beyond - 48K beyond - the size of the buffer.
As I understand what is happening, one of these processes is returning a pointer
that points outside approved space and causing the system to fault.
The debug steps all showed the values from git as being reasonable or
what I thought them to be, yet HP-UX faulted when entering the called
function. 

Since I have so far been unable to clearly understand all the code associated with
the fetch-pack process, and am running out of time on the project, I was hoping
to get more data from the git email group that might highlight something I missed
while debugging the problem. I was unable to find enough text to explain both
sides of the side-band handshake and how sideband values are used - maybe I could if
I had more time to understand the code fully, but I don't.

Bill. 


* RE: git failure on HP-UX - more data
  2011-05-05 19:06 ` Junio C Hamano
  2011-05-05 20:51   ` Kibler, Bill
@ 2011-05-06 19:02   ` Kibler, Bill
  2011-05-06 20:11     ` Junio C Hamano
  1 sibling, 1 reply; 6+ messages in thread
From: Kibler, Bill @ 2011-05-06 19:02 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Richard Lloyd, kibler, Kibler, Bill

I spent some time thinking about this problem and realized that my guess that the
pipe size was the issue was probably wrong, and that it is likely something more
generic in nature. I did some more testing, and remembered a similar problem
when trying to do a git clone of the arm kernel for my $99 netbook article
(see www.kiblerelectronics./corner/ccii.shtml), where it failed much
the same as on HP-UX. My thinking is that this is a variable that is controlled
by the OS/Libc settings and not something that can be selected as a
good enough value. My real problem and reason for putting in a bug report,
is wanting to know how this value was intended to be used.

I just ran several tests on HP-UX using various values for the
"LARGE_PACKET_MAX", all trying to see what the actual failure value is.
I thought it might be related to ssize_t, which in HP-UX is 32K, but
values around 32K all worked fine. 65000 failed, while 48000 worked.
I went through the normal trial and error process and found the value
to vary over a number of tests - in short - not fixed. That suggested
it is partly controlled by some amount of free space(?).

I will test out my arm netbook and see if changing this value helps git
on my system, but for now I feel this value was selected somewhat
arbitrarily when it should be selected as the smallest value that can
work for your OS of choice. I thought of using 32K on HP-UX, but decided
that if I don't really know what the mechanism is controlling this value
then the smallest working value is the safest to use (8208).

Can anyone in this group explain what is going on for me? Why was 64K
selected in the first place?
Thanks.
Bill. 


* Re: git failure on HP-UX - more data
  2011-05-06 19:02   ` git failure on HP-UX - more data Kibler, Bill
@ 2011-05-06 20:11     ` Junio C Hamano
  2011-05-06 21:34       ` Kibler, Bill
  0 siblings, 1 reply; 6+ messages in thread
From: Junio C Hamano @ 2011-05-06 20:11 UTC (permalink / raw)
  To: Kibler, Bill; +Cc: git, Richard Lloyd, kibler

"Kibler, Bill" <bill.kibler@hp.com> writes:

> I just ran several tests on HP-UX using various values for the
> "LARGE_PACKET_MAX", ...

The original sideband protocol had a fixed receiver side buffer that was
only 1000 bytes long.  When we updated the protocol so that we can carry
more payload with a single logical pkt-line format, which has the maximum
packet length of a bit less than 64k (it has a fixed 4 hexadecimal digits
field at the front that indicates its size, so the maximum payload size is
64k minus 4 or something like that), we added a protocol extension that is
negotiated between the server and the client for both sides to make sure
that they have the updated implementation in which the receiver is
prepared to accept a 64k packet, not just a small 1000-byte static buffer.
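
As a rough illustration of the framing described above: the sketch below encodes and decodes the four-hex-digit length field. It assumes, as I understand the pkt-line format, that the length counts the four header bytes themselves, and it reuses the 65520 figure quoted earlier in this thread. The function names are hypothetical and are not git's.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define PKT_HEADER_LEN 4
#define PKT_MAX_LEN 65520	/* the LARGE_PACKET_MAX value quoted in this thread */

/*
 * Hypothetical helper: format the length header for a payload of
 * `payload_len` bytes into hdr (4 hex digits plus a NUL).
 * Returns 0 on success, -1 if the packet would exceed PKT_MAX_LEN.
 */
static int pkt_encode_len(size_t payload_len, char hdr[5])
{
	size_t total = payload_len + PKT_HEADER_LEN;

	assert(hdr != NULL);
	if (total > PKT_MAX_LEN)
		return -1;
	snprintf(hdr, 5, "%04zx", total);
	return 0;
}

/*
 * Hypothetical helper: parse a 4-hex-digit header.  Returns the total
 * packet length (header included), or -1 on a malformed header.
 */
static long pkt_decode_len(const char hdr[4])
{
	long len = 0;

	for (int i = 0; i < 4; i++) {
		char c = hdr[i];
		int v;

		if (c >= '0' && c <= '9')
			v = c - '0';
		else if (c >= 'a' && c <= 'f')
			v = c - 'a' + 10;
		else if (c >= 'A' && c <= 'F')
			v = c - 'A' + 10;
		else
			return -1;
		len = len * 16 + v;
	}
	return len;
}
```

Under these assumptions the largest payload is 65520 - 4 = 65516 bytes, which matches the "64k minus 4 or something like that" above only approximately.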

But all of that is at the logical protocol level.

Even if the transfer goes over the Ethernet, this size is in no way
limited by its MTU of 1500 bytes, because the kernel will take care of
buffering and reassembling for us.

It is the same deal for the pkt-line protocol, where we issue a write(2)
and expect that the system may write less than what we asked it to write,
and return us how many bytes it has actually written. As long as write(2)
correctly returns the number of bytes it wrote, and our code that calls
write(2) correctly expects a short-write and loops until writing
everything out, there is no need to worry about LARGE_PACKET_MAX.

At least, that is the theory.

I think I already said this in my previous message to you, but it is
possible that we have a bug in our code that fails to expect write(2) to
result in short-write and loop until we write everything out.  My gut
feeling is that it is slightly more plausible that we have such a bug than
that your libc has a buggy implementation of write(2) that returns a bogus
value (say 64k) when in fact it wrote only what would fit in your pipe
buffer (you said 8k, I think) when asked to write 64k.

And the right thing to do is to find and fix such a bug. I am afraid you
are wasting your time by futzing LARGE_PACKET_MAX. Even if you find a good
small value that happens to work on _your_ machine, it would not be a real
fix for the problem.


* RE: git failure on HP-UX - more data
  2011-05-06 20:11     ` Junio C Hamano
@ 2011-05-06 21:34       ` Kibler, Bill
  0 siblings, 0 replies; 6+ messages in thread
From: Kibler, Bill @ 2011-05-06 21:34 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Richard Lloyd, kibler

Thanks for the response, it was very informative. I understand now
that the "side-band" values really are more of "what version" of
the protocol can you run, than an actual "use this size" handshake.
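
To make that reading concrete, here is a small sketch of how a client might pick a sideband mode from a space-separated capability list advertised by the server. Only the capability names "side-band" and "side-band-64k" come from this thread; the enum, the function names and the list format are assumptions for illustration and are not git's actual code.

```c
#include <assert.h>
#include <string.h>

enum sideband_mode { SIDEBAND_NONE, SIDEBAND_SMALL, SIDEBAND_64K };

/*
 * Hypothetical helper: return 1 if `name` appears as a whole
 * space-separated token in `list`, so that "side-band" does not
 * match inside "side-band-64k".
 */
static int has_capability(const char *list, const char *name)
{
	size_t n = strlen(name);
	const char *p = list;

	assert(n > 0);
	while ((p = strstr(p, name)) != NULL) {
		int starts = (p == list) || (p[-1] == ' ');
		int ends = (p[n] == '\0') || (p[n] == ' ');

		if (starts && ends)
			return 1;
		p += n;
	}
	return 0;
}

/* Prefer the larger-packet variant when the server offers it. */
static enum sideband_mode choose_sideband(const char *server_caps)
{
	if (has_capability(server_caps, "side-band-64k"))
		return SIDEBAND_64K;
	if (has_capability(server_caps, "side-band"))
		return SIDEBAND_SMALL;
	return SIDEBAND_NONE;
}
```

The point of the sketch is that the choice names which framing both sides understand; it does not set any operating-system buffer size.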

I think it might help to say again, that on HP-UX I ran several tests
of cloning using other processes, and they always worked correctly.
I even did some tusc/traces which showed multiple pipe transfers of
8K, much as you would expect from your analysis of what is supposed
to happen. It is only when doing this under the ssh pipe process
that the problem occurs.

I did look, but not closely at the code, to see what might be handled
differently between ssh and, say, an http transfer as it relates to
pack or sideband routines. I feel if we can just determine what really 
is different using ssh we would be pretty close to the problem.

I am making it clear in my notes, that this fix is hopefully only a
temporary fix for what is still really an unknown problem.

