git.vger.kernel.org archive mirror
* git smart protocol via WebSockets - feedback wanted
@ 2012-06-05 18:20 Stephan Peijnik
  2012-06-05 18:31 ` Junio C Hamano
  2012-06-05 18:36 ` Shawn Pearce
  0 siblings, 2 replies; 7+ messages in thread
From: Stephan Peijnik @ 2012-06-05 18:20 UTC (permalink / raw)
  To: git

Dear list,

Since I have been working on a proof of concept showing that git's smart 
protocol can be tunneled via WebSocket connections quite easily [0] I 
wanted to ask for some feedback on the idea in general and on my 
implementation [1].

So, basically, what do you think about tunneling git's smart protocol 
via WebSockets (and thus HTTP)?
Do you believe that a full implementation of this, as opposed to the 
current proof of concept, is worthwhile?
Are there any issues with this approach I missed (apart from the missing 
authentication/authorization in the server component of course)?

Thanks in advance for your input.

Regards,

Stephan

[0] 
http://blog.sp.or.at/2012/06/git-smart-protocol-via-websockets-proof.html
[1] https://github.com/speijnik/gitws

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: git smart protocol via WebSockets - feedback wanted
  2012-06-05 18:20 git smart protocol via WebSockets - feedback wanted Stephan Peijnik
@ 2012-06-05 18:31 ` Junio C Hamano
  2012-06-05 18:41   ` Stephan Peijnik
  2012-06-05 18:36 ` Shawn Pearce
  1 sibling, 1 reply; 7+ messages in thread
From: Junio C Hamano @ 2012-06-05 18:31 UTC (permalink / raw)
  To: Stephan Peijnik; +Cc: git

How does this compare with the smart-http support that tunnels the
git protocol over http (with some butchering)?


* Re: git smart protocol via WebSockets - feedback wanted
  2012-06-05 18:20 git smart protocol via WebSockets - feedback wanted Stephan Peijnik
  2012-06-05 18:31 ` Junio C Hamano
@ 2012-06-05 18:36 ` Shawn Pearce
  1 sibling, 0 replies; 7+ messages in thread
From: Shawn Pearce @ 2012-06-05 18:36 UTC (permalink / raw)
  To: Stephan Peijnik; +Cc: git

On Tue, Jun 5, 2012 at 11:20 AM, Stephan Peijnik <stephan@peijnik.at> wrote:
> Since I have been working on a proof of concept showing that git's smart
> protocol can be tunneled via WebSocket connections quite easily [0] I wanted
> to ask for some feedback on the idea in general and on my implementation
> [1].
>
> So, basically, what do you think about tunneling git's smart protocol via
> WebSockets (and thus HTTP)?
...
> [0]
> http://blog.sp.or.at/2012/06/git-smart-protocol-via-websockets-proof.html
> [1] https://github.com/speijnik/gitws

How does this compare with the smart HTTP protocol that has been
supported since Git 1.6.6, and uses the git-http-backend CGI at the
server side?
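
(For anyone unfamiliar with that setup: per the git-http-backend documentation, wiring the CGI into Apache takes only a few lines. The paths below are guesses and vary by distribution; treat this as a sketch, not a verified config.)

```apache
# Root directory containing the bare repositories to serve
SetEnv GIT_PROJECT_ROOT /srv/git
# Export every repository under that root, not just those
# with a git-daemon-export-ok file
SetEnv GIT_HTTP_EXPORT_ALL
# Hand all /git/ URLs to the smart HTTP CGI
ScriptAlias /git/ /usr/libexec/git-core/git-http-backend/
```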


* Re: git smart protocol via WebSockets - feedback wanted
  2012-06-05 18:31 ` Junio C Hamano
@ 2012-06-05 18:41   ` Stephan Peijnik
  2012-06-05 18:54     ` Shawn Pearce
  0 siblings, 1 reply; 7+ messages in thread
From: Stephan Peijnik @ 2012-06-05 18:41 UTC (permalink / raw)
  To: git

On 06/05/2012 08:31 PM, Junio C Hamano wrote:
> How does this compare with the smart-http support that tunnels the
> git protocol over http (with some butchering)?

To be honest, I didn't know about smart-http support yet. Is that the 
approach introduced with git 1.6.6?

If so, that approach uses multiple POST requests, meaning multiple TCP 
and HTTP connections need to be established, multiple requests 
processed, etc.

The WebSocket approach uses a single HTTP connection which gets upgraded 
to a WebSocket. This WebSocket then allows the same communication to 
happen as with the ssh implementation.

So in comparison there is possibly a lot less overhead and, in theory, 
the performance should be comparable to running the smart protocol over 
ssh. Personally I'd say the WebSocket approach is cleaner than the 
HTTP-POST approach.
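To make the handshake concrete: the entire "upgrade" is a single HTTP request/response pair, after which the socket carries framed bidirectional traffic. A rough Python sketch of the client side of that handshake (illustrative only, not code from the gitws prototype; the header names and the accept-key computation are those of RFC 6455):

```python
import base64
import hashlib

# Magic GUID from RFC 6455; the server hashes client key + GUID to prove
# it actually understood the WebSocket upgrade rather than echoing blindly.
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"

def accept_key(client_key: str) -> str:
    """Compute the Sec-WebSocket-Accept value a server must return
    for a given client Sec-WebSocket-Key."""
    digest = hashlib.sha1((client_key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")

def upgrade_request(host: str, path: str, client_key: str) -> bytes:
    """Build the one HTTP request that turns the connection into a WebSocket."""
    return (
        "GET {path} HTTP/1.1\r\n"
        "Host: {host}\r\n"
        "Upgrade: websocket\r\n"
        "Connection: Upgrade\r\n"
        "Sec-WebSocket-Key: {key}\r\n"
        "Sec-WebSocket-Version: 13\r\n"
        "\r\n"
    ).format(path=path, host=host, key=client_key).encode("ascii")
```

Once the server answers 101 Switching Protocols with the matching accept key, the connection behaves like the bidirectional stream the ssh transport provides.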

-- Stephan


* Re: git smart protocol via WebSockets - feedback wanted
  2012-06-05 18:41   ` Stephan Peijnik
@ 2012-06-05 18:54     ` Shawn Pearce
  2012-06-05 19:28       ` Stephan Peijnik
  0 siblings, 1 reply; 7+ messages in thread
From: Shawn Pearce @ 2012-06-05 18:54 UTC (permalink / raw)
  To: Stephan Peijnik; +Cc: git

On Tue, Jun 5, 2012 at 11:41 AM, Stephan Peijnik <stephan@peijnik.at> wrote:
> On 06/05/2012 08:31 PM, Junio C Hamano wrote:
>>
>> How does this compare with the smart-http support that tunnels the
>> git protocol over http (with some butchering)?
>
>
> To be honest, I didn't know about smart-http support yet. Is that the approach
> introduced with git 1.6.6?

Yes. So it's been around for a while now. Like 2 years.

> If so, that approach uses multiple POST requests, meaning multiple TCP and
> HTTP connections need to be established, multiple requests processed, etc.

It's actually only one TCP connection... assuming the servers in
between the client and the Git endpoint correctly support HTTP
keep-alive semantics.
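(Illustrative sketch, not Git code: the claim is easy to check with the Python standard library alone. A tiny local server plus a client issuing two requests shows both going over a single TCP socket when keep-alive works. Names like `run_keepalive_demo` are made up for the example.)

```python
import threading
import http.client
from http.server import BaseHTTPRequestHandler, HTTPServer

class Handler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # HTTP/1.1 keeps connections alive by default

    def do_GET(self):
        body = b"ok"
        self.send_response(200)
        # An explicit Content-Length lets the connection stay open
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # keep the demo quiet
        pass

def run_keepalive_demo() -> bool:
    """Issue two requests on one connection; return True if the socket was reused."""
    server = HTTPServer(("127.0.0.1", 0), Handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        conn = http.client.HTTPConnection("127.0.0.1", server.server_port)
        conn.request("GET", "/first")
        conn.getresponse().read()
        first_sock = conn.sock          # the socket after request one
        conn.request("GET", "/second")  # reuses the connection if kept alive
        conn.getresponse().read()
        return conn.sock is first_sock  # same object => no reconnect happened
    finally:
        server.shutdown()
        server.server_close()
```

The failure mode Shawn describes is precisely a middlebox breaking this reuse, forcing a fresh TCP (and possibly SSL) handshake per request.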

> The WebSocket approach uses a single HTTP connection which gets upgraded to
> a WebSocket. This WebSocket then allows the same communication to happen as
> with the ssh implementation.

How does this fare going through crappy proxy servers that perform
man-in-the-middle attacks on SSL connections? Just last week I was
trying to help someone whose local proxy server was MITMing the SSL
session behind Git's back, and their IT department forgot to install
the proxy server's certificate into the system certificate directory.
They only installed it into the browser. That proxy also doesn't
correctly grok HTTP 1.1 keep-alive with chunked transfer encodings,
let alone something as new as WebSockets.

> So in comparison there is possibly a lot less overhead and, in theory, the
> performance should be comparable to running the smart protocol over ssh.
> Personally I'd say the WebSocket approach is cleaner than the HTTP-POST
> approach.

This may be true. But it's also a lot more complex to implement. I
noticed you reused Python code to help make this work. Let me know
when there is a GPLv2 client library that implements sufficient
semantics for WebSockets that Git can bundle it out of the box. And
let me know when most corporate IT proxy servers correctly grok
WebSockets. I suspect it will be many more years given that they still
can't even grok chunked transfer encoding.


* Re: git smart protocol via WebSockets - feedback wanted
  2012-06-05 18:54     ` Shawn Pearce
@ 2012-06-05 19:28       ` Stephan Peijnik
  2012-06-05 21:11         ` Shawn Pearce
  0 siblings, 1 reply; 7+ messages in thread
From: Stephan Peijnik @ 2012-06-05 19:28 UTC (permalink / raw)
  To: git

On 06/05/2012 08:54 PM, Shawn Pearce wrote:
> On Tue, Jun 5, 2012 at 11:41 AM, Stephan Peijnik<stephan@peijnik.at>  wrote:
>> To be honest, I didn't know smart-http support yet. Is that the approach
>> introduced with git 1.6.6?
>
> Yes. So its been around for a while now. Like 2 years.

I have just read up on that. My fault.

>> If so, that approach uses multiple POST requests, meaning multiple TCP and
>> HTTP connections need to be established, multiple requests processed, etc.
>
> It's actually only one TCP connection... assuming the servers in
> between the client and the Git endpoint correctly support HTTP
> keep-alive semantics.

With keep-alive that is true, but a quick check on the actual data 
exchange tells me that multiple HTTP requests are still needed. But I 
guess the overhead caused by a second HTTP request can be ignored.

> How does this fare going through crappy proxy servers that perform
> man-in-the-middle attacks on SSL connections? Just last week I was
> trying to help someone whose local proxy server was MITMing the SSL
> session behind Git's back, and their IT department forgot to install
> the proxy server's certificate into the system certificate directory.
> They only installed it into the browser. That proxy also doesn't
> correctly grok HTTP 1.1 keep-alive with chunked transfer encodings.
> Let alone something as new as web sockets.

Proxy servers could be an issue, yes. For proxy servers that do not act 
as MITM and that support CONNECT, this shouldn't be an issue though.
Also, given the current HTML5 hype things should get better in the 
future, but you are correct about potential current issues with the 
approach.

>> So in comparison there is possibly a lot less overhead and, in theory, the
>> performance should be comparable to running the smart protocol over ssh.
>> Personally I'd say the WebSocket approach is cleaner than the HTTP-POST
>> approach.
>
> This may be true. But its also a lot more complex to implement. I
> noticed you reused Python code to help make this work.

The only reason I used Python is that I wanted to come up with a 
prototype quickly. I am also aware that a proper implementation 
should probably be done in C.

> Let me know when there is a GPLv2 client library that implements sufficient
> semantics for WebSockets that Git can bundle it out of the box.

As for a WebSocket client library that is GPLv2 compatible: there is 
at least libwebsockets [0], which is licensed under the terms of the 
LGPL v2.1 and as such compatible with GPLv2.
What do you think about using it as the basis for a proper implementation?

> And let me know when most corporate IT proxy servers correctly grok
> WebSockets. I suspect it will be many more years given that they still
> can't even grok chunked transfer encoding.

As stated above, this could be a problem, yes.
The question is whether one wants to provide an alternative approach 
only when it is usable for everyone.
It was never my intention to replace the current http implementation, 
be it the dumb one or http-backend. The idea here was to provide an 
additional option that makes use of a fairly new technology, with all 
the benefits and drawbacks of using something new.

Thanks for your feedback.

-- Stephan

[0] http://git.warmcat.com/cgi-bin/cgit/libwebsockets/


* Re: git smart protocol via WebSockets - feedback wanted
  2012-06-05 19:28       ` Stephan Peijnik
@ 2012-06-05 21:11         ` Shawn Pearce
  0 siblings, 0 replies; 7+ messages in thread
From: Shawn Pearce @ 2012-06-05 21:11 UTC (permalink / raw)
  To: Stephan Peijnik; +Cc: git

On Tue, Jun 5, 2012 at 12:28 PM, Stephan Peijnik <stephan@peijnik.at> wrote:
> On 06/05/2012 08:54 PM, Shawn Pearce wrote:
>>
>> It's actually only one TCP connection... assuming the servers in
>> between the client and the Git endpoint correctly support HTTP
>> keep-alive semantics.
>
> With keep-alive that is true, but a quick check on the actual data exchange
> tells me that multiple HTTP requests are still needed. But I guess the
> overhead caused by a second HTTP request can be ignored.

There is extra overhead from the HTTP request headers, it's true.
Fortunately it's relatively small and bounded.

There isn't that much additional latency in the smart HTTP protocol.
The points where the client is waiting on data from the server are
where we end an HTTP request and start a new one. The multi_ack
capability on normal TCP or SSH connections does get to interleave a
bit more in the native protocol to try and hide the RTT latency. I
don't know that anyone has done extensive testing to determine how
effective that is vs. the batch sizes we run in the HTTP POST format,
with the key part being how quickly the overall negotiation exchange
goes for the end-user.

Colby Ranger's recent contribution to contrib/persistent-https
provides a local proxy server for Git over HTTP that tries to reuse
HTTP connections across Git command invocations. This can go further
than even git:// does with TCP connection reuse, cutting latency. Of
course a user can do the same thing with their own local HTTP proxy,
but persistent-https can be easier to install and configure.

>> How does this fare going through crappy proxy servers that perform
>> man-in-the-middle attacks on SSL connections? Just last week I was
>> trying to help someone whose local proxy server was MITMing the SSL
>> session behind Git's back, and their IT department forgot to install
>> the proxy server's certificate into the system certificate directory.
>> They only installed it into the browser. That proxy also doesn't
>> correctly grok HTTP 1.1 keep-alive with chunked transfer encodings.
>> Let alone something as new as web sockets.
>
> Proxy servers could be an issue, yes. For proxy servers not acting as MITM
> and which are supporting CONNECT this shouldn't be an issue though.

I am still annoyed by the failure of "Expect: 100-continue". The
original smart HTTP protocol used this during push to try and avoid
sending a 100 MiB POST payload before finding out authentication is
required and failing. It turns out far too many HTTP servers and proxy
servers do not correctly implement 100-continue to rely on it in the
protocol, so we had to back that out and use a special POST body with
4 bytes to "probe" the remote server before sending the full payload.
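(For the curious: the framing involved here is Git's pkt-line format, and, if I recall the remote-curl code correctly, the 4-byte probe body is just the flush packet. An illustrative sketch, not Git's actual implementation:)

```python
def pkt_line(payload: bytes) -> bytes:
    """Frame a payload in Git's pkt-line format: 4 hex digits giving the
    total length (including the 4-byte length prefix itself), then the payload."""
    total = len(payload) + 4
    if total > 0xFFFF:
        raise ValueError("pkt-line payload too large")
    return b"%04x" % total + payload

# The special flush packet is the literal length "0000" with no payload.
# A POST body consisting of nothing but this packet lets the client trigger
# the server's authentication challenge without shipping the real payload.
FLUSH_PKT = b"0000"
```

So instead of trusting 100-continue, the client pays one tiny extra round trip up front and only then streams the potentially huge pack data.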

100-continue is in RFC 2616, dated June 1999. My calendar says June
2012. So 13 years later and we still cannot rely on 100-continue as
specified by RFC 2616 working correctly on the public Internet.

Chunked Transfer-Encoding is described in RFC 2068, dated Jan 1997,
which is the RFC that RFC 2616 made obsolete. This is still not working
reliably everywhere... 15 years after being specified, proxy servers
are still converting chunked transfer encoding to "Connection: close"
and destroying any HTTP keep-alive that might have been possible.
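(The framing being mangled is trivial, which makes the breakage all the sadder. An illustrative sketch of the chunked body format, not tied to any particular proxy or server implementation:)

```python
def encode_chunked(chunks) -> bytes:
    """Encode an iterable of byte chunks as an HTTP/1.1 chunked body:
    each chunk is prefixed by its size in hex plus CRLF, and a
    zero-size chunk terminates the body."""
    out = bytearray()
    for chunk in chunks:
        if not chunk:
            continue  # a zero-length chunk would terminate the body early
        out += b"%x\r\n" % len(chunk)   # chunk size in lowercase hex
        out += chunk + b"\r\n"
    out += b"0\r\n\r\n"  # terminating chunk, no trailers
    return bytes(out)
```

The point of the encoding is exactly what Git needs: streaming a body whose total length isn't known up front, without closing the connection to mark the end.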

Basically I learned a lot in the past 2 years deploying Git on a
rather broad scale with HTTP. The public Internet doesn't resemble the
standards enough, and you really have to code to the lowest common
denominator, because there is some user out there who matters to you
who is stuck behind some HTTP proxy that implements the HTTP standard
as it existed in 1995 and whose managers/suppliers refuse to bring it
forward to the current century.

> Also, given the current HTML5 hype things should get better in the future,
> but you are correct about potential current issues with the approach.

WebSockets seems pretty full of fail to me.

The protocol specification is really complex. It's reimplementing TCP
on top of HTTP to work around an artificial, browser-imposed limitation
on the number of suggested HTTP connections the browser opens to the
server. Meanwhile SPDY goes the other direction and tries to support
multiplexing a larger number of HTTP requests into a single TCP
connection, while reusing a lot of header data across those requests.
I have higher hopes for SPDY adoption than for WebSockets. SPDY solves
a lot of common problems on the Internet that social networking sites
care about, like time to load assets for a game, or that publishers
care about, like time to load all assets for a site on initial visit,
increasing the chances the user doesn't immediately jump away due to
perceived high loading time.

WebSockets is a large amount of wanking to make playing a game written
in JavaScript easier, using a very ugly protocol, and a much more
complex software stack. I think the WebSockets authors saw the problem
of HTTP connections in a browser and solved it the wrong way. They saw
the bidirectional stream problem... and solved it for a very narrow
use case. SPDY also relies on a bidirectional stream, but it lets the
server do more, like suggest pushing assets down ahead of the browser
realizing it needs them.

>>> So in comparison there is possibly a lot less overhead and, in theory,
>>> the
>>> performance should be comparable to running the smart protocol over ssh.
>>> Personally I'd say the WebSocket approach is cleaner than the HTTP-POST
>>> approach.
>>
>> This may be true. But it's also a lot more complex to implement. I
>> noticed you reused Python code to help make this work.
>
> The only reason I used Python is that I wanted to quickly come up with a
> prototype. I am also aware of the fact that a proper implementation should
> possibly be done in C.

Any implementation of a new embedding of the Git protocol into e.g.
WebSockets also requires implementing it in Java for JGit, both client
and server, and probably also in Python for Dulwich, again client and
server. Given that WebSockets is all about cramming TCP into HTTP in
SSL in TCP, and doesn't always work on the public Internet given the
current state of proxy servers, I just don't see the value in this.

We have to provide complete support across the major Git
implementations, otherwise users will come to this mailing list and
complain about how implementation $X cannot talk to server running $Y
because server owner $Z only configured the newfangled WebSockets
protocol. And I am simply too lazy to write a procmail script to
direct all such inquiries to your address.

>> Let me know when there is a GPLv2 client library that implements
>> sufficient
>> semantics for WebSockets that Git can bundle it out of the box.
>
> As for the WebSocket client library that is GPLv2 compatible: there is at
> least libwebsockets [0], which is licensed under the terms of the LGPL v2.1,
> and as such GPLv2 only compatible.

OK, yay. Someone actually bothered to implement this?

>> And let me know when most corporate IT proxy servers correctly grok
>> WebSockets. I suspect it will be many more years given that they still
>> can't even grok chunked transfer encoding.
>
> As stated above, this could be a problem, yes.
> The question is whether one only wants to provide an alternative approach
> when it is usable for everyone.

I predict WebSockets will be usable by everyone about... never. It's
too complex a standard, and too narrow a corner case. We are
talking about proxy servers that can't do 100-continue correctly,
because async network IO was too hard for them to code. Those authors
and their software will never support WebSockets' bidirectional
requirement. WebSockets isn't critical to browsing the web. It never
will be. 100-continue might be useful with form-based file uploads,
which are at least 100x more common on the web than a WebSocket
powered thing. Seriously. Call me when WebSockets is actually working.
I'd like to come back and see what the world looks like in 2152.

> My intention never was to have the current http implementation, be it the
> dumb or http-backend one, replaced. The idea here was to provide an
> additional option that makes use of a fairly new technology, with all
> benefits and drawbacks of using something new.

You are free to develop your own remote helper that does this. But I
don't expect any Git distribution or implementation to be supporting
it.

