* SO_TIMESTAMP on tcp sockets?
@ 2012-05-08  1:39 Andy Lutomirski
  2012-05-08  4:37 ` Eric Dumazet
  0 siblings, 1 reply; 3+ messages in thread
From: Andy Lutomirski @ 2012-05-08  1:39 UTC (permalink / raw)
  To: Network Development

I've been using SO_TIMESTAMPNS to good effect on udp sockets.  I'd
like to do the same thing for tcp.  I realize that this is
semantically strange [1], but I don't think there's a real issue for
my use case.  We have very thin streams -- we are likely to process
each incoming segment as it is received, and I want the most precise
timestamp possible on each segment.

A simple approach (I think) would be for a recvmsg on a tcp socket
with SO_TIMESTAMP(NS) to return at most one skb worth of data along
with the timestamp associated with that skb.  This could be a little
strange if multiple segments overlap or if lro is involved, but
neither of those cases seems like a major problem.

Is there any interest in something like this?

Thanks,
Andy

[1] http://comments.gmane.org/gmane.linux.network/159574
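
For reference, the UDP usage described above can be sketched roughly as follows. This is a minimal illustration, not code from the thread: it assumes Linux with glibc, and the function names (enable_rx_timestampns, recv_with_timestamp, udp_bind_loopback) are hypothetical helpers invented for this example. The socket option SO_TIMESTAMPNS and the control message type SCM_TIMESTAMPNS are the real interface.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <netinet/in.h>
#include <time.h>
#include <unistd.h>

/* Ask the kernel to attach a nanosecond receive timestamp to each packet. */
static int enable_rx_timestampns(int fd)
{
    int on = 1;
    return setsockopt(fd, SOL_SOCKET, SO_TIMESTAMPNS, &on, sizeof(on));
}

/* Receive one datagram; copy its SCM_TIMESTAMPNS control message into *ts.
 * Returns the byte count, or -1 on error. *ts is zeroed if no timestamp
 * control message was delivered. */
static ssize_t recv_with_timestamp(int fd, void *buf, size_t len,
                                   struct timespec *ts)
{
    /* The union keeps the control buffer aligned for struct cmsghdr. */
    union {
        char buf[CMSG_SPACE(sizeof(struct timespec))];
        struct cmsghdr align;
    } ctrl;
    struct iovec iov = { .iov_base = buf, .iov_len = len };
    struct msghdr msg = {
        .msg_iov = &iov, .msg_iovlen = 1,
        .msg_control = ctrl.buf, .msg_controllen = sizeof(ctrl.buf),
    };
    ssize_t n;

    assert(ts != NULL);
    memset(ts, 0, sizeof(*ts));
    n = recvmsg(fd, &msg, 0);
    if (n < 0)
        return -1;
    for (struct cmsghdr *c = CMSG_FIRSTHDR(&msg); c; c = CMSG_NXTHDR(&msg, c)) {
        if (c->cmsg_level == SOL_SOCKET && c->cmsg_type == SCM_TIMESTAMPNS)
            memcpy(ts, CMSG_DATA(c), sizeof(*ts));
    }
    return n;
}

/* Hypothetical helper for local experiments: bind a UDP socket to an
 * ephemeral loopback port and report the chosen address in *addr. */
static int udp_bind_loopback(struct sockaddr_in *addr)
{
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    socklen_t alen = sizeof(*addr);

    if (fd < 0)
        return -1;
    memset(addr, 0, sizeof(*addr));
    addr->sin_family = AF_INET;
    addr->sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    if (bind(fd, (struct sockaddr *)addr, sizeof(*addr)) != 0 ||
        getsockname(fd, (struct sockaddr *)addr, &alen) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

On a datagram socket each recvmsg() returns exactly one packet, so the timestamp is unambiguous. The question in this message is what the equivalent would mean on a byte stream.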


* Re: SO_TIMESTAMP on tcp sockets?
  2012-05-08  1:39 SO_TIMESTAMP on tcp sockets? Andy Lutomirski
@ 2012-05-08  4:37 ` Eric Dumazet
  2012-05-08 21:35   ` Andy Lutomirski
  0 siblings, 1 reply; 3+ messages in thread
From: Eric Dumazet @ 2012-05-08  4:37 UTC (permalink / raw)
  To: Andy Lutomirski; +Cc: Network Development

On Mon, 2012-05-07 at 18:39 -0700, Andy Lutomirski wrote:
> I've been using SO_TIMESTAMPNS to good effect on udp sockets.  I'd
> like to do the same thing for tcp.  I realize that this is
> semantically strange [1], but I don't think there's a real issue for
> my use case.  We have very thin streams -- we are likely to process
> each incoming segment as it is received, and I want the most precise
> timestamp possible on each segment.
> 
> A simple approach (I think) would be for a recvmsg on a tcp socket
> with SO_TIMESTAMP(NS) to return at most one skb worth of data along
> with the timestamp associated with that skb.  This could be a little
> strange if multiple segments overlap or if lro is involved, but
> neither of those cases seems like a major problem.
> 
> Is there any interest in something like this?
> 

LRO/GRO is not really a problem, buffers are merged because they are
received in a very short time period. If you want nanosec timestamping
on TCP, just cancel the whole idea.

TCP can 'collapse' several buffers into a single one (to reduce memory
overhead). Which timestamp would be chosen at collapse time?

net-next also has tcp coalescing, which also merges buffers as soon as
they enter the receive or ofo queue.
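
The ambiguity at collapse time can be shown with a toy model. This is only a sketch: struct seg, enum ts_policy and collapse() are hypothetical names invented here and do not correspond to kernel structures or functions. Whatever policy is picked, the merged buffer carries one timestamp for bytes that arrived at two different times.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a queued receive buffer; not a kernel structure. */
struct seg {
    size_t  len;    /* payload bytes held by this buffer */
    int64_t rx_ns;  /* arrival timestamp in nanoseconds */
};

/* Two possible answers to "which timestamp survives a merge?" */
enum ts_policy { TS_EARLIEST, TS_LATEST };

/* Merge two buffers into one. The payload lengths add up, but only one
 * of the two arrival times can be kept. */
static struct seg collapse(struct seg a, struct seg b, enum ts_policy policy)
{
    struct seg out = { .len = a.len + b.len, .rx_ns = 0 };
    int64_t lo = a.rx_ns < b.rx_ns ? a.rx_ns : b.rx_ns;
    int64_t hi = a.rx_ns < b.rx_ns ? b.rx_ns : a.rx_ns;

    out.rx_ns = (policy == TS_EARLIEST) ? lo : hi;
    return out;
}
```

With TS_EARLIEST the later bytes appear to have arrived too early; with TS_LATEST the earlier bytes appear too late. Neither gives a per-segment timestamp once the buffers are merged.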

Another problem with SO_TIMESTAMPNS is that it globally enables timestamping
on all skbs on the host, adding some latency. (ktime_get() can be
slowed down when timekeeping triggers and holds the xtime seqlock.)


* Re: SO_TIMESTAMP on tcp sockets?
  2012-05-08  4:37 ` Eric Dumazet
@ 2012-05-08 21:35   ` Andy Lutomirski
  0 siblings, 0 replies; 3+ messages in thread
From: Andy Lutomirski @ 2012-05-08 21:35 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Network Development

On Mon, May 7, 2012 at 9:37 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Mon, 2012-05-07 at 18:39 -0700, Andy Lutomirski wrote:
>> I've been using SO_TIMESTAMPNS to good effect on udp sockets.  I'd
>> like to do the same thing for tcp.  I realize that this is
>> semantically strange [1], but I don't think there's a real issue for
>> my use case.  We have very thin streams -- we are likely to process
>> each incoming segment as it is received, and I want the most precise
>> timestamp possible on each segment.
>>
>> A simple approach (I think) would be for a recvmsg on a tcp socket
>> with SO_TIMESTAMP(NS) to return at most one skb worth of data along
>> with the timestamp associated with that skb.  This could be a little
>> strange if multiple segments overlap or if lro is involved, but
>> neither of those cases seems like a major problem.
>>
>> Is there any interest in something like this?
>>
>
> LRO/GRO is not really a problem, buffers are merged because they are
> received in a very short time period. If you want nanosec timestamping
> on TCP, just cancel the whole idea.
>
> TCP can 'collapse' several buffers into a single one (to reduce memory
> overhead). Which timestamp would be chosen at collapse time?
>
> net-next also has tcp coalescing, which also merges buffers as soon as
> they enter the receive or ofo queue.

Hmm.  Here are two possibilities:

1. When timestamping is on, turn off all coalescing on that socket.
Throughput starts to suck, but at least for my use case this is
irrelevant.

2. Instead of timestamping when a given piece of data arrived,
timestamp when the socket last became readable in the POLLIN sense.
Return the answer as ancillary data on the first recvmsg after the
socket becomes readable.  This would be enough for my purposes.
(Basically, I want to be able to correlate my receives with pcap data,
at least in the common case, and I also want to be able to estimate
latency between the network interrupt and my app handling the data.
The phy timestamp would be even better, but that's not supported on my
hardware.)
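
For comparison, the closest thing available from userspace today is to take a timestamp right after poll() reports POLLIN. The sketch below shows that approximation; wait_readable_stamped is a hypothetical name, and the approach is an assumption for illustration rather than the kernel feature proposed in option 2. The timestamp it records includes the scheduler wakeup delay, which is exactly the latency the proposal wants to measure, so it is not a substitute.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <poll.h>
#include <time.h>
#include <unistd.h>

/* Wait until fd is readable, then record CLOCK_REALTIME as soon as
 * poll() returns. Returns 1 if readable (and *ts is filled in),
 * 0 on timeout, -1 on error or if only an error/hangup was reported. */
static int wait_readable_stamped(int fd, int timeout_ms, struct timespec *ts)
{
    struct pollfd p = { .fd = fd, .events = POLLIN };
    int r;

    assert(ts != NULL);
    r = poll(&p, 1, timeout_ms);
    if (r <= 0)
        return r;                 /* 0 = timeout, -1 = poll() failed */
    if (!(p.revents & POLLIN))
        return -1;                /* POLLERR / POLLHUP without data */
    if (clock_gettime(CLOCK_REALTIME, ts) != 0)
        return -1;
    return 1;
}
```

A kernel-side version would instead record the time at which the socket became readable and hand it back as ancillary data, so the difference between that value and a clock_gettime() call in the application would give the interrupt-to-app latency.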

>
> Another problem with SO_TIMESTAMPNS is that it globally enables timestamping
> on all skbs on the host, adding some latency. (ktime_get() can be
> slowed down when timekeeping triggers and holds the xtime seqlock.)
>
>

This doesn't bother me too much -- I'm already paying that cost.  In
any case, it should be mostly fixable by taking the xtime lock for
write a lot less often than we do now.  Getting the time (via vdso,
which is probably much better optimized than ktime_get) takes about
15ns on my machine.
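
A figure like this can be estimated with a simple loop. The sketch below is one way to do it and makes assumptions: avg_clock_gettime_ns and ts_to_ns are hypothetical helper names, and whether clock_gettime() is actually served from the vdso depends on the architecture, libc, and clock source. The result will vary by machine.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Convert a timespec to a single nanosecond count. */
static int64_t ts_to_ns(const struct timespec *ts)
{
    return (int64_t)ts->tv_sec * 1000000000LL + ts->tv_nsec;
}

/* Average wall-clock cost, in nanoseconds, of one
 * clock_gettime(CLOCK_REALTIME) call, measured over iters calls. */
static double avg_clock_gettime_ns(long iters)
{
    struct timespec start, end, scratch;

    assert(iters > 0);
    clock_gettime(CLOCK_MONOTONIC, &start);
    for (long i = 0; i < iters; i++)
        clock_gettime(CLOCK_REALTIME, &scratch);
    clock_gettime(CLOCK_MONOTONIC, &end);
    return (double)(ts_to_ns(&end) - ts_to_ns(&start)) / (double)iters;
}
```

Calling avg_clock_gettime_ns(1000000) and printing the result gives a rough per-call cost. It measures the userspace path only and says nothing directly about ktime_get() inside the kernel.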

--Andy
