linux-kernel.vger.kernel.org archive mirror
From: "David S. Miller" <davem@redhat.com>
To: Davide Libenzi <davidel@xmailserver.org>
Cc: e0206@foo21.com, linux-kernel@vger.kernel.org, kuznet@ms2.inr.ac.ru
Subject: Re: [Patch][RFC] epoll and half closed TCP connections
Date: Sat, 12 Jul 2003 22:24:57 -0700
Message-ID: <20030712222457.3d132897.davem@redhat.com>
In-Reply-To: <Pine.LNX.4.55.0307121256200.4720@bigblue.dev.mcafeelabs.com>

On Sat, 12 Jul 2003 13:01:21 -0700 (PDT)
Davide Libenzi <davidel@xmailserver.org> wrote:

> 
> [Cc:ing DaveM ]

[Cc:ing Alexey :-) ]

Alexey, they seem to want to add some kind of POLLRDHUP thing,
comments wrt. TCP and elsewhere in the networking?  See below...

> On Sat, 12 Jul 2003, Eric Varsanyi wrote:
> 
> > I'm proposing adding a new POLL event type (POLLRDHUP) as a way to solve
> > a new race introduced by having an edge triggered event mechanism
> > (epoll). The problem occurs when a client writes data and then does a
> > write side shutdown(). The server (using epoll) sees only one event for
> > the read data ready and the read EOF condition and has no way to tell
> > that an EOF occurred.
> >
> > -Eric Varsanyi
> >
> > Details
> > -----------
> > 	- remote sends data and does a shutdown
> > 	   [ we see a data bearing packet and FIN from client on the wire ]
> >
> > 	- user mode server gets around to doing accept() and registers
> > 	  for EPOLLIN events (along with HUP and ERR which are forced on)
> >
> > 	- epoll_wait() returns a single EPOLLIN event on the FD representing
> > 	  both the 1/2 shutdown state and data available
> >
> > At this point there is no way the app can tell if there is a half closed
> > connection so it may issue a close() back to the client after writing
> > results. Normally the server would distinguish these events by assuming
> > EOF if it got a read ready indication and the first read returned 0 bytes,
> > or would issue read calls until less data was returned than was asked for.
> >
> > In a level triggered world this all just works because the read ready
> > indication is driven back to the app as long as the socket state is half
> > closed. The event driven epoll mechanism folds these two indications
> > together and thus loses one 'edge'.
> >
> > One would be tempted to issue an extra read() after getting back less than
> > expected, but this is an extra system call on every read event and you get
> > back the same '0' bytes that you get if the buffer is just empty. The only
> > sure bet seems to be CTL_MODding the FD to force a re-poll (which would
> > cost a syscall and hash-lookup in eventpoll for every read event).
> >
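
For reference, the "extra read()" approach mentioned above might look
roughly like the sketch below. It assumes the socket is non-blocking, in
which case read() returns 0 only at EOF and fails with EAGAIN when the
buffer is merely empty. The names drain_fd and demo_drain are hypothetical,
and the demo uses an AF_UNIX socketpair in place of a TCP connection only
to stay self-contained. Note that it always pays for one more read() per
event, which is exactly the overhead being objected to.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Outcome of draining a non-blocking socket after one read-ready event. */
enum drain_result { DRAIN_EMPTY, DRAIN_EOF, DRAIN_ERROR };

/* Read until the kernel reports either "nothing left" (EAGAIN) or EOF
 * (read() == 0). *total receives the number of payload bytes consumed. */
static enum drain_result drain_fd(int fd, size_t *total)
{
	char buf[4096];

	*total = 0;
	for (;;) {
		ssize_t n = read(fd, buf, sizeof buf);

		if (n > 0) {
			*total += (size_t)n;
			continue;
		}
		if (n == 0)
			return DRAIN_EOF;	/* peer shut down its write side */
		if (errno == EINTR)
			continue;
		if (errno == EAGAIN || errno == EWOULDBLOCK)
			return DRAIN_EMPTY;	/* buffer empty, connection still open */
		return DRAIN_ERROR;
	}
}

/* Hypothetical demo: the peer writes 5 bytes and optionally half-closes. */
static enum drain_result demo_drain(int half_close, size_t *total)
{
	int sv[2];
	enum drain_result r = DRAIN_ERROR;

	*total = 0;
	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
		return DRAIN_ERROR;
	/* Assumption: the server side runs its sockets in non-blocking mode. */
	if (fcntl(sv[0], F_SETFL, fcntl(sv[0], F_GETFL) | O_NONBLOCK) == 0 &&
	    write(sv[1], "hello", 5) == 5) {
		if (half_close)
			shutdown(sv[1], SHUT_WR);
		r = drain_fd(sv[0], total);
	}
	close(sv[0]);
	close(sv[1]);
	return r;
}
```
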
> 
> Yes, this is overhead that should be avoided. It is true that you could
> use Level Triggered events, but if you structured your app on edge you
> should be able to solve this w/out overhead.
> 
> 
> 
> > 	2) add a new 1/2 closed event type that a poll routine can return
> >
> > The implementation is trivial, a patch is included below. If this idea sees
> > favor I'll fix the other architectures, ipv6, epoll.h, and make a 'real'
> > patch. I do not believe any drivers deserve to be modified to return this
> > new event.
> 
> This looks good to me. David what do you think ?
> 
> 
> 
> > diff -Naur linux-2.4.20/include/asm-i386/poll.h linux-2.4.20_ev/include/asm-i386/poll.h
> > --- linux-2.4.20/include/asm-i386/poll.h	Thu Jan 23 13:01:28 1997
> > +++ linux-2.4.20_ev/include/asm-i386/poll.h	Sat Jul 12 12:29:11 2003
> > @@ -15,6 +15,7 @@
> >  #define POLLWRNORM	0x0100
> >  #define POLLWRBAND	0x0200
> >  #define POLLMSG		0x0400
> > +#define POLLRDHUP	0x0800
> >
> >  struct pollfd {
> >  	int fd;
> > diff -Naur linux-2.4.20/net/ipv4/tcp.c linux-2.4.20_ev/net/ipv4/tcp.c
> > --- linux-2.4.20/net/ipv4/tcp.c	Tue Jul  8 09:40:42 2003
> > +++ linux-2.4.20_ev/net/ipv4/tcp.c	Sat Jul 12 12:29:56 2003
> > @@ -424,7 +424,7 @@
> >  	if (sk->shutdown == SHUTDOWN_MASK || sk->state == TCP_CLOSE)
> >  		mask |= POLLHUP;
> >  	if (sk->shutdown & RCV_SHUTDOWN)
> > -		mask |= POLLIN | POLLRDNORM;
> > +		mask |= POLLIN | POLLRDNORM | POLLRDHUP;
> >
> >  	/* Connected? */
> >  	if ((1 << sk->state) & ~(TCPF_SYN_SENT|TCPF_SYN_RECV)) {
> >
> 
> 
> 
> - Davide
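
To illustrate how an application could consume an event bit like the one
proposed above, here is a minimal user-space sketch. It is not part of the
patch. It assumes a kernel and libc that export such a bit through epoll
under the name EPOLLRDHUP (mainline Linux does from 2.6.17 onward; the
0x0800 value in the diff above is only the proposal's). The helper name
demo_rdhup_events is hypothetical, and an AF_UNIX socketpair stands in for
the TCP connection on the assumption that it reports the half-close the
same way.

```c
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical demo: the peer writes data and optionally half-closes
 * before the fd is registered, mirroring the sequence in the report.
 * Returns the mask of the single edge-triggered event, or 0 on failure. */
static unsigned int demo_rdhup_events(int half_close)
{
	int sv[2];
	unsigned int events = 0;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
		return 0;

	if (write(sv[1], "hello", 5) == 5) {
		int ep;

		if (half_close)
			shutdown(sv[1], SHUT_WR);	/* the client's write-side shutdown */

		ep = epoll_create1(0);
		if (ep >= 0) {
			struct epoll_event ev = {
				.events = EPOLLIN | EPOLLRDHUP | EPOLLET,
				.data.fd = sv[0],
			};
			struct epoll_event out;

			/* One event is delivered; the RDHUP bit tells the two cases apart. */
			if (epoll_ctl(ep, EPOLL_CTL_ADD, sv[0], &ev) == 0 &&
			    epoll_wait(ep, &out, 1, 1000) == 1)
				events = out.events;
			close(ep);
		}
	}
	close(sv[0]);
	close(sv[1]);
	return events;
}
```

A server loop would then test (events & EPOLLRDHUP) after reading the
pending data and treat the connection as half closed without issuing a
further read() or an EPOLL_CTL_MOD.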
