* TX time stamping
@ 2009-03-03 18:31 Patrick Ohly
  2009-03-19 21:05 ` David Miller
  0 siblings, 1 reply; 8+ messages in thread
From: Patrick Ohly @ 2009-03-03 18:31 UTC (permalink / raw)
  To: David Miller, Herbert Xu; +Cc: netdev, Kirsher, Jeffrey T

Hello!

Based on previous discussions, there are some areas of time stamping
which could be improved or need fixing.

The first is the use of skb->sk to send back TX information. The goal is
to notify the sender, and only the sender, of the packet about the time
when the packet was transmitted.

I cannot think of how this can be done without skb->sk, but I am open to
suggestions. Without it, one would have to fall back to some kind of
broadcasting scheme where all TX-stamped packets are received by all processes
potentially interested in them. This has security issues (packet data
needs to be included to identify packets -> data sent by one user would
be visible to other users; TX stamping feature would have to be
restricted to root, which is not the case now), and potentially impacts
performance (every process has to analyze every packet) and might not be
reliable (depending on the packet payload, processes might not be able
to identify their own packets when they get mixed with those of other
processes).

When an early call to skb_orphan() gets implemented, would it be
acceptable to only do that if no TX time stamping is requested for the
packet? This can be tested relatively quickly (check one field in shinfo).

Second, loopback. I cannot see any issues with loopback, but please let
me know if I have missed something. If I remember (and understand)
correctly, loopback is implemented as writing to a special network
device. Once the TX software fallback is implemented again (see third
point below), then this case can be tested with the timestamping test
program by selecting "lo" as device and asking for TX software time
stamps.

Third, TX software time stamping. I believe a clean solution could work
like this:
      * if TX time stamping is requested, then clone the skb and get a
        reference to the socket
      * ask driver to send packet
      * check status of cloned skb: if TX software fallback is
        necessary, do it using the socket reference
      * drop the references to the cloned skb and the socket

In this scenario the extra work is only done for those packets which
need it. It will work for drivers which call skb_orphan().
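
The four steps above can be sketched as a small user-space model. This is
only an illustration under stated assumptions: toy_sock, toy_shinfo,
toy_skb, toy_driver_xmit() and toy_send() are hypothetical stand-ins, not
kernel structures or functions, and the "clone" is reduced to a copy that
shares the same shared-info block as the original.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the socket, the skb shared area and the skb. */
struct toy_sock   { int refcnt; int sw_stamps_delivered; };
struct toy_shinfo { bool want_tx_stamp; bool hw_stamp_done; };
struct toy_skb    { struct toy_sock *sk; struct toy_shinfo *shinfo; };

/* Hypothetical driver: orphans the skb and optionally stamps in hardware. */
static void toy_driver_xmit(struct toy_skb *skb, bool hw_capable)
{
    skb->sk = NULL;                         /* models an early skb_orphan() */
    if (hw_capable && skb->shinfo->want_tx_stamp)
        skb->shinfo->hw_stamp_done = true;
}

/* Returns true if the software fallback had to produce the TX stamp. */
static bool toy_send(struct toy_skb *skb, bool hw_capable)
{
    struct toy_skb clone = {0};
    struct toy_sock *sk = NULL;
    bool stamped = skb->shinfo->want_tx_stamp;
    bool fallback = false;

    if (stamped) {
        clone = *skb;                       /* step 1: clone shares shinfo */
        sk = skb->sk;
        sk->refcnt++;                       /* step 1: hold the socket */
    }

    toy_driver_xmit(skb, hw_capable);       /* step 2: driver sends packet */

    if (stamped) {
        if (!clone.shinfo->hw_stamp_done) { /* step 3: check the clone */
            sk->sw_stamps_delivered++;      /* software fallback via sk ref */
            fallback = true;
        }
        assert(sk->refcnt > 0);
        sk->refcnt--;                       /* step 4: drop the references */
    }
    return fallback;
}
```

Packets that did not request a TX time stamp skip both the clone and the
reference count, which is the "extra work only for those packets which
need it" property; the fallback still reaches the socket even though the
model driver cleared skb->sk.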

As an extra refinement one can later add a driver flag which tells the
kernel that the driver will do the software TX fallback itself. In that
case no extra work needs to be done before calling the driver.

Of course, this all depends on still having the socket before calling
the driver - see above  :-/

-- 
Best Regards, Patrick Ohly

The content of this message is my personal opinion only and although
I am an employee of Intel, the statements I make here in no way
represent Intel's position on the issue, nor am I authorized to speak
on behalf of Intel on this matter.




* Re: TX time stamping
  2009-03-03 18:31 TX time stamping Patrick Ohly
@ 2009-03-19 21:05 ` David Miller
  2009-03-20  2:10   ` Herbert Xu
  0 siblings, 1 reply; 8+ messages in thread
From: David Miller @ 2009-03-19 21:05 UTC (permalink / raw)
  To: patrick.ohly; +Cc: herbert, netdev, jeffrey.t.kirsher


You still haven't done anything to address this fundamental issue with
your timestamping changes.  I've been more than patient enough.

It is also not anyone's obligation to help you fix this problem.
You want to add this new feature, whereas I do not consider it
essential in any way whatsoever.

So I am going to have to revert this work, since it appears that being
able to get at skb->sk after the transmit function runs was an
essential requirement to how your code works.

And that will simply never ever work.

This is why I tried to wait a long time to integrate these patches in
the first place; I had hoped that you had worked out all of these
kinds of issues during that time.

Instead, we have a half-working turd in a tree, and I'm removing it.


* Re: TX time stamping
  2009-03-19 21:05 ` David Miller
@ 2009-03-20  2:10   ` Herbert Xu
  2009-03-25  7:08     ` Patrick Ohly
  0 siblings, 1 reply; 8+ messages in thread
From: Herbert Xu @ 2009-03-20  2:10 UTC (permalink / raw)
  To: David Miller; +Cc: patrick.ohly, netdev, jeffrey.t.kirsher

On Thu, Mar 19, 2009 at 02:05:09PM -0700, David Miller wrote:
>
> Instead, we have a half-working turd in a tree, and I'm removing it.

Yes, we need to spend a bit more time pondering the semantics
of all this.

First of all, if a packet bifurcates and is transmitted through
two interfaces both capable of timestamping, which event do we
take as the timestamp of the original packet?

As to the problem of skb->sk, I don't think that's even needed
as we can simply use the skb shared area as the communication
medium.  However, for that to work we need to figure out what
we do when the packet becomes two independent entities, hence
we need to resolve the above question first.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


* Re: TX time stamping
  2009-03-20  2:10   ` Herbert Xu
@ 2009-03-25  7:08     ` Patrick Ohly
  2009-03-26 14:48       ` Herbert Xu
  0 siblings, 1 reply; 8+ messages in thread
From: Patrick Ohly @ 2009-03-25  7:08 UTC (permalink / raw)
  To: David Miller, Herbert Xu; +Cc: netdev, Kirsher, Jeffrey T

Hello Dave, Herbert!

On Fri, 2009-03-20 at 02:10 +0000, Herbert Xu wrote: 
> On Thu, Mar 19, 2009 at 02:05:09PM -0700, David Miller wrote:
> >
> > Instead, we have a half-working turd in a tree, and I'm removing it.
> 
> Yes, we need to spend a bit more time pondering the semantics
> of all this.

Agreed. It never was the goal to somehow force this into the kernel
unless you are happy with it - not that this would have worked
anyway ;-) I'd be happy to discuss better ways of solving these issues;
the current patches work, but they have their shortcomings. There's no
point in including them when you don't deem them sufficient.

As I said in my initial email in this thread, I don't know a better
solution and depend on some guidance by experts in this area. When I
didn't get a reply to that email I thought that the current solution had
been accepted, but clearly that wasn't the case. I have had a patch
ready for the TX software time stamping for a while (the third point of
my email), but that doesn't address the main reason why you are unhappy
about the patches.

> First of all, if a packet bifurcates and is transmitted through
> two interfaces both capable of timestamping, which event do we
> take as the timestamp of the original packet?

I suggest to make it so that the sender gets the packet back once per
interface, with different time stamps and information about the
interface.

> As to the problem of skb->sk, I don't think that's even needed
> as we can simply use the skb shared area as the communication
> medium.

Can you elaborate on that?

I still think we need to ensure that only the sender is told about the
send time stamp and the associated packet data. Is there perhaps a
unique integer ID for each socket, or is adding it acceptable (a running
count basically)?

In that case we could do something like this: 
      * when sending a packet with TX request, add the socket ID 
      * clear skb->sk 
      * after generating the TX time stamp, try to find the socket by
        ID 
              * if it is found, send packet back with additional info
                like it is done now
              * if not, discard information because the sender is gone 

The drawback is the more costly socket lookup. For PTP this isn't an
issue due to the low packet rate, so a very simple solution would be
good enough. But for other use cases it might be problematic. I also
have no idea how the locking for the socket lookup can be done safely.
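
A minimal user-space sketch of the "socket ID" idea described above,
assuming a running counter as the ID and a small fixed table instead of a
real socket lookup. All names (id_sock, id_sock_open(), id_sock_close(),
id_sock_lookup(), id_deliver_tx_stamp()) are hypothetical, and locking and
counter wrap-around are deliberately left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ID_TABLE_SIZE 8

/* Hypothetical socket record; the ID is a running count, never reused. */
struct id_sock { uint32_t id; int in_use; int stamps_received; };

static struct id_sock id_table[ID_TABLE_SIZE];
static uint32_t id_next = 1;            /* 0 is reserved for "no socket" */

static struct id_sock *id_sock_open(void)
{
    for (size_t i = 0; i < ID_TABLE_SIZE; i++) {
        if (!id_table[i].in_use) {
            id_table[i].in_use = 1;
            id_table[i].id = id_next++;
            id_table[i].stamps_received = 0;
            return &id_table[i];
        }
    }
    return NULL;                        /* table full */
}

static void id_sock_close(struct id_sock *sk)
{
    sk->in_use = 0;
}

/* The "more costly socket lookup": a linear scan in this toy model. */
static struct id_sock *id_sock_lookup(uint32_t id)
{
    for (size_t i = 0; i < ID_TABLE_SIZE; i++)
        if (id_table[i].in_use && id_table[i].id == id)
            return &id_table[i];
    return NULL;
}

/* Called after the TX stamp exists; the packet carries only the ID.
 * Returns 1 if the stamp was delivered, 0 if the sender is gone. */
static int id_deliver_tx_stamp(uint32_t sender_id)
{
    struct id_sock *sk = id_sock_lookup(sender_id);

    if (sk == NULL)
        return 0;                       /* discard: sender no longer exists */
    assert(sk->in_use);
    sk->stamps_received++;
    return 1;
}
```

Because the ID rather than a pointer travels with the packet, a socket
that is closed and whose slot is reused by another process does not
receive the stale time stamp in this model.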

-- 
Best Regards, Patrick Ohly


* Re: TX time stamping
  2009-03-25  7:08     ` Patrick Ohly
@ 2009-03-26 14:48       ` Herbert Xu
  2009-03-26 15:30         ` Patrick Ohly
  0 siblings, 1 reply; 8+ messages in thread
From: Herbert Xu @ 2009-03-26 14:48 UTC (permalink / raw)
  To: Patrick Ohly; +Cc: David Miller, netdev, Kirsher, Jeffrey T

On Wed, Mar 25, 2009 at 08:08:44AM +0100, Patrick Ohly wrote:
>
> I suggest to make it so that the sender gets the packet back once per
> interface, with different time stamps and information about the
> interface.

That could be awkward.  What if the second or subsequent instance
of the packet is held up indefinitely?

> > As to the problem of skb->sk, I don't think that's even needed
> > as we can simply use the skb shared area as the communication
> > medium.
> 
> Can you elaborate on that?

I was thinking of the sender keeping hold of the sent skb and
using its shared area to retrieve the time stamp(s).

> The drawback is the more costly socket lookup. For PTP this isn't an
> issue due to the low packet rate, so a very simple solution would be
> good enough. But for other use cases it might be problematic. I also
> have no idea how the locking for the socket lookup can be done safely.

Could you give the main usage scenarios for timestamps, including
what the user-space app does with the info?

Thanks,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


* Re: TX time stamping
  2009-03-26 14:48       ` Herbert Xu
@ 2009-03-26 15:30         ` Patrick Ohly
  2009-03-30 18:09           ` Oliver Hartkopp
  0 siblings, 1 reply; 8+ messages in thread
From: Patrick Ohly @ 2009-03-26 15:30 UTC (permalink / raw)
  To: Herbert Xu; +Cc: David Miller, netdev, Kirsher, Jeffrey T

On Thu, 2009-03-26 at 14:48 +0000, Herbert Xu wrote:
> On Wed, Mar 25, 2009 at 08:08:44AM +0100, Patrick Ohly wrote:
> >
> > I suggest to make it so that the sender gets the packet back once per
> > interface, with different time stamps and information about the
> > interface.
> 
> That could be awkward.  What if the second or subsequent instance
> of the packet is held up indefinitely?

The sender must be prepared for packet loss anyway (for example, during
stress tests I have seen that packets were dropped between socket layer
and device driver).

Getting a TX time stamp only for some, but not all expected interfaces,
or getting some results much later isn't that different.

After a certain timeout the sender must assume that the packet was lost
and resend. An unexpected response for a packet that was supposed to be
lost must be ignored.

> > > As to the problem of skb->sk, I don't think that's even needed
> > > as we can simply use the skb shared area as the communication
> > > medium.
> > 
> > Can you elaborate on that?
> 
> I was thinking of the sender keeping hold of the sent skb and
> using its shared area to retrieve the time stamp(s).

The sender is in user space, at least in the case of PTP. Given the
current user space APIs it seems complicated to tie the sender to a
particular skb :-/

> > The drawback is the more costly socket lookup. For PTP this isn't an
> > issue due to the low packet rate, so a very simple solution would be
> > good enough. But for other use cases it might be problematic. I also
> > have no idea how the locking for the socket lookup can be done safely.
> 
> Could you give the main usage scenarios for timestamps, including
> what the user-space app does with the info?

The main usage scenario right now is a hardware assisted implementation
of the Precision Time Protocol (PTP, IEEE 1588). "main usage" as in "the
one I worked on"; others have expressed interest in this hardware
assisted time stamping for other purposes, including detailed network
monitoring.

With PTP, the user space daemons exchange multicast messages with a rate
of less than one per second. Sender and receiver record send and receive
time stamps of each packet. This is done in both directions. These two
delays are used to determine clock drift between two nodes. Filters are
used to remove noise in the input data as much as possible, but it is
much better to avoid the noise in the first place.

Noise in this context means varying packet delays. It's okay if
the delay is large, as long as it is constant. But when the delay
increases by a value of x compared to the previous measurement, PTP
cannot tell whether this particular packet was just transmitted more
slowly or the clocks have drifted apart.
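
To make the effect of such noise concrete, here is a small sketch of the
usual two-way exchange arithmetic, assuming the standard delay
request-response calculation with a symmetric path. The names
ptp_estimate and ptp_compute() and the labels t1..t4 are illustrative
choices, not taken from any PTP implementation: t1/t4 are read from one
node's clock (send and receive), t2/t3 from the other node's clock
(receive and send), all in nanoseconds.

```c
#include <stdint.h>

/* Result of one two-way exchange, in nanoseconds. */
struct ptp_estimate {
    int64_t offset_ns;  /* estimated (remote clock - local clock) */
    int64_t delay_ns;   /* estimated one-way delay */
};

/* t1: sent by node A (A's clock)   t2: received by node B (B's clock)
 * t3: sent by node B (B's clock)   t4: received by node A (A's clock)
 * Assumes the delay is the same in both directions. */
static struct ptp_estimate ptp_compute(int64_t t1, int64_t t2,
                                       int64_t t3, int64_t t4)
{
    int64_t a_to_b = t2 - t1;   /* true delay + clock offset */
    int64_t b_to_a = t4 - t3;   /* true delay - clock offset */
    struct ptp_estimate e;

    e.offset_ns = (a_to_b - b_to_a) / 2;
    e.delay_ns  = (a_to_b + b_to_a) / 2;
    return e;
}
```

With a true offset of 500 ns and 1000 ns delay each way, the time stamps
(0, 1500, 2000, 2500) give offset 500 and delay 1000. If only the first
packet is held up by an extra x = 200 ns, so that t2 becomes 1700, the
same formula reports offset 600: half of the extra delay is
indistinguishable from clock offset.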

Hardware time stamping removes the varying packet delays inside the
kernel's IP stack because the send/receive time stamps are generated
directly when the packet hits the wire. I found that hardware assistance
is essential for good accuracy [1].

It's also important to remove the varying delays inside the network
switches from the equation. PTP v2 has support for that and equipment
manufacturers are starting to support it.

[1] http://www.linuxclustersinstitute.org/conferences/archive/2008/PDF/Ohly_92221.pdf

-- 
Best Regards, Patrick Ohly


* Re: TX time stamping
  2009-03-26 15:30         ` Patrick Ohly
@ 2009-03-30 18:09           ` Oliver Hartkopp
  2009-03-31  6:53             ` Patrick Ohly
  0 siblings, 1 reply; 8+ messages in thread
From: Oliver Hartkopp @ 2009-03-30 18:09 UTC (permalink / raw)
  To: Patrick Ohly; +Cc: Herbert Xu, David Miller, netdev, Kirsher, Jeffrey T

Patrick Ohly wrote:
> On Thu, 2009-03-26 at 14:48 +0000, Herbert Xu wrote:
>> On Wed, Mar 25, 2009 at 08:08:44AM +0100, Patrick Ohly wrote:
>>> I suggest to make it so that the sender gets the packet back once per
>>> interface, with different time stamps and information about the
>>> interface.
>> That could be awkward.  What if the second or subsequent instance
>> of the packet is held up indefinitely?
> 
> The sender must be prepared for packet loss anyway (for example, during
> stress tests I have seen that packets were dropped between socket layer
> and device driver).
> 
> Getting a TX time stamp only for some, but not all expected interfaces,
> or getting some results much later isn't that different.
> 
> After a certain timeout the sender must assume that the packet was lost
> and resend. An unexpected response for a packet that was supposed to be
> lost must be ignored.
> 

Hello Patrick,

I wonder if using the IP stack for PTP with the possibility to send TX-stamped
PDUs on various interfaces is the best solution.

I'm not that familiar with all the routing, packet scheduling, etc. -
but would it perhaps make sense to use AF_PACKET for PTP, where you can
specify the interface and build a PTP IP PDU directly? I assume this would not
make that big a difference to the ptpd in userspace.

Any ideas? Or am I completely wrong here?

Regards,
Oliver


* Re: TX time stamping
  2009-03-30 18:09           ` Oliver Hartkopp
@ 2009-03-31  6:53             ` Patrick Ohly
  0 siblings, 0 replies; 8+ messages in thread
From: Patrick Ohly @ 2009-03-31  6:53 UTC (permalink / raw)
  To: Oliver Hartkopp; +Cc: Herbert Xu, David Miller, netdev, Kirsher, Jeffrey T

On Mon, 2009-03-30 at 21:09 +0300, Oliver Hartkopp wrote:
> I wonder if using the IP stack for PTP with the possibility to send TX-stamped
> PDUs on various interfaces is the best solution.

PTPd already only sends on one interface. How to deal with packets that
go out via multiple interfaces becomes relevant when generalizing the
hardware time stamping concept.

> I'm not that familiar with all the routing, packet scheduling, etc. -
> but would it perhaps make sense to use AF_PACKET for PTP, where you can
> specify the interface and build a PTP IP PDU directly? I assume this would not
> make that big a difference to the ptpd in userspace.

I'm not familiar with AF_PACKET. It lets user space assemble the
complete packet (including Ethernet header) and send directly via a
specific interface, right?

The drawback is that the user space daemon would have to reimplement the
joining/leaving of a multicast group. When using the IP stack, it can
let the kernel do that.

It also still uses write() or sendmsg(), doesn't it? In that case
there's no advantage over the current approach because the only link
back to the sender is still only the socket.

But perhaps I am simply unaware of some aspects of the socket API for
AF_PACKET. Is there something which would allow implementing Herbert's
approach with communication via sh_info when the sender is in user
space?

Herbert, what do you think about the "identify socket via unique ID"
idea? Is that possible/doable/acceptable/stupid/all of these?

-- 
Best Regards

Patrick Ohly
Senior Software Engineer

Intel GmbH
Software & Solutions Group                
Hermuelheimer Strasse 8a                  Phone: +49-2232-2090-30
50321 Bruehl                              Fax: +49-2232-2090-29
Germany



