Re: [PATCH net V4 2/2] igb: offer a PTP Hardware Clock instead of the timecompare method

All of lore.kernel.org
 help / color / mirror / Atom feed

* Re: [PATCH net V4 2/2] igb: offer a PTP Hardware Clock instead of the timecompare method
       [not found] <D3F292ADF945FB49B35E96C94C2061B916484EC4@nsmail.netscout.com>
@ 2012-03-15 17:18 ` chetan loke
  2012-03-16  6:55   ` Richard Cochran
  0 siblings, 1 reply; 38+ messages in thread
From: chetan loke @ 2012-03-15 17:18 UTC (permalink / raw)
  To: richardcochran, netdev
  Cc: e1000-devel, jacob.e.keller, jeffrey.t.kirsher, john.ronciak,
	john.stultz, tglx

> Subject: [PATCH net V4 2/2] igb: offer a PTP Hardware Clock instead of
> the timecompare method
>
> This commit removes the legacy timecompare code from the igb driver and
> offers a tunable PHC instead.

Richard (and others who contributed),

This ptp stuff is great work(time-sync sources, pkt-time,
system-time)! Now, I've a few questions :) :

1) how can I use igb->PHC to set/discipline my system time?
 a) clock_gettime(IGB_CLK_ID...);
 b) clock_settime(REAL_TIME,...) ? Good/bad idea?

2) what is this deadlock from intel's datasheet
(http://download.intel.com/design/network/datashts/82576_Datasheet.pdf)
- goto pg 426
  2.1) Is this applicable? If not please ignore this question.
  2.2) Do we need to worry about it. Just asking because a quick
visual scan didn't pull up - TIMADJL/H.

3) Testing IGB PHC (couple of months from now because I'm busy with
something else):

   3.1) CardX(It's a different non-intel NIC) and CardY(intel) are in
hostA. Both cards will use the same time-source Z.
   3.2) CardX is time-sync'd using PTP.
   3.3) CardY - is a dual port card. Port 0 will receive PTP packets
via time-srcZ.
   3.4) Wait for some time for both the cards to get in-sync.
   3.5) feed same traffic to Card X and CardY-Port1.
   3.6) measure time-difference across packets from different NICs on hostA.

   Is this a good test configuration? Now, ofcourse there will be some
delta but the intention is to get a feel. See how far off are both the
adapters.

4) possible contention on tmreg_lock?
I will read previous emails/patches shortly but I was thinking - what
if there are many readers who would like to use PHC as reference time?
Then do we want them to block the driver who wants to time-stamp the
packet?
 igb_gettime->spin_lock::tmreg_lock         <---contention--->  driver
->igb_systim_to_hwtstamp->spin_lock::tmreg_lock

thanks
Chetan

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH net V4 2/2] igb: offer a PTP Hardware Clock instead of the timecompare method
  2012-03-15 17:18 ` [PATCH net V4 2/2] igb: offer a PTP Hardware Clock instead of the timecompare method chetan loke
@ 2012-03-16  6:55   ` Richard Cochran
  2012-03-21 15:00     ` chetan loke
  0 siblings, 1 reply; 38+ messages in thread
From: Richard Cochran @ 2012-03-16  6:55 UTC (permalink / raw)
  To: chetan loke
  Cc: e1000-devel, netdev, john.ronciak, john.stultz, jacob.e.keller, tglx

On Thu, Mar 15, 2012 at 01:18:26PM -0400, chetan loke wrote:
> 
> 1) how can I use igb->PHC to set/discipline my system time?
>  a) clock_gettime(IGB_CLK_ID...);
>  b) clock_settime(REAL_TIME,...) ? Good/bad idea?

This is a very crude method, and it will cause frequent jumps in your
system clock.

Better would be:

    a = clock_gettime(igb_clk_id);
    clock_settime(CLOCK_REALTIME, a);

    once_every_second()
    {
        a = clock_gettime(igb_clk_id);
        b = clock_gettime(CLOCK_REALTIME);
        offset = a - b;
        freq_adj = pi_servo(offset);
        clock_adjtime(CLOCK_REALTIME, freq_adj);
    }

There is a program, phc2sys, that does something very similar to this
(but using a kernel PPS time stamp instead of clock_gettime) in the
linuxptp code. You could modify that program to do the above fairly
easily. With a heavy enough filter, it might even be acceptable.

    http://linuxptp.sourceforge.net/

Even better would be to use the internal PPS provided by some PHC
hardware to get a better time stamp from the system clock. The IGB
could provide this (I think), but it is not implemented yet.

> 2) what is this deadlock from intel's datasheet
> (http://download.intel.com/design/network/datashts/82576_Datasheet.pdf)
> - goto pg 426
>   2.1) Is this applicable? If not please ignore this question.
>   2.2) Do we need to worry about it. Just asking because a quick
> visual scan didn't pull up - TIMADJL/H.

Do you mean RXSTMPL/H?

Looking at igb_rx_hwtstamp() in igb_main.c, it appears that the 82576
might deliver wrong time stamps with a received packet, if ever a
packet gets time stamped in hardware but does not make it into the Rx
ring buffer.

The 82580 delivers the time stamp with the frame and does not have this
problem.

> 3) Testing IGB PHC (couple of months from now because I'm busy with
> something else):
> 
>    3.1) CardX(It's a different non-intel NIC) and CardY(intel) are in
> hostA. Both cards will use the same time-source Z.
>    3.2) CardX is time-sync'd using PTP.
>    3.3) CardY - is a dual port card. Port 0 will receive PTP packets
> via time-srcZ.
>    3.4) Wait for some time for both the cards to get in-sync.
>    3.5) feed same traffic to Card X and CardY-Port1.
>    3.6) measure time-difference across packets from different NICs on hostA.
> 
>    Is this a good test configuration? Now, ofcourse there will be some
> delta but the intention is to get a feel. See how far off are both the
> adapters.

The best way to do this is to compare a 1 PPS output from each card. I
think the IGB hardware can provide a PPS output, but driver support is
not yet there. Also, the pin providing the signal may or may not be
accessible on your card. Talk to Intel about this.
 
> 4) possible contention on tmreg_lock?
> I will read previous emails/patches shortly but I was thinking - what
> if there are many readers who would like to use PHC as reference time?
> Then do we want them to block the driver who wants to time-stamp the
> packet?
>  igb_gettime->spin_lock::tmreg_lock         <---contention--->  driver
> ->igb_systim_to_hwtstamp->spin_lock::tmreg_lock

Under normal use cases, I would not expect much contention at all.

HTH,

Richard

------------------------------------------------------------------------------
This SF email is sponsosred by:
Try Windows Azure free for 90 days Click Here 
http://p.sf.net/sfu/sfd2d-msazure
_______________________________________________
E1000-devel mailing list
E1000-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/e1000-devel
To learn more about Intel&#174; Ethernet, visit http://communities.intel.com/community/wired

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH net V4 2/2] igb: offer a PTP Hardware Clock instead of the timecompare method
  2012-03-16  6:55   ` Richard Cochran
@ 2012-03-21 15:00     ` chetan loke
  2012-03-21 16:08       ` Richard Cochran
  2012-03-21 17:00       ` Richard Cochran
  0 siblings, 2 replies; 38+ messages in thread
From: chetan loke @ 2012-03-21 15:00 UTC (permalink / raw)
  To: Richard Cochran
  Cc: netdev, e1000-devel, jacob.e.keller, jeffrey.t.kirsher,
	john.ronciak, john.stultz, tglx

On Fri, Mar 16, 2012 at 2:55 AM, Richard Cochran
<richardcochran@gmail.com> wrote:
> On Thu, Mar 15, 2012 at 01:18:26PM -0400, chetan loke wrote:
>>
>> 1) how can I use igb->PHC to set/discipline my system time?
>>  a) clock_gettime(IGB_CLK_ID...);
>>  b) clock_settime(REAL_TIME,...) ? Good/bad idea?
>
> This is a very crude method, and it will cause frequent jumps in your
> system clock.

Of course it's crude :). I wanted to make sure if my understanding was correct.

>> 4) possible contention on tmreg_lock?
>> I will read previous emails/patches shortly but I was thinking - what
>> if there are many readers who would like to use PHC as reference time?
>> Then do we want them to block the driver who wants to time-stamp the
>> packet?
>>  igb_gettime->spin_lock::tmreg_lock         <---contention--->  driver
>> ->igb_systim_to_hwtstamp->spin_lock::tmreg_lock
>
> Under normal use cases, I would not expect much contention at all.
>

Once PHC->gettime goes live(aka exported to user space), we can't
really control how users will use it in their applications. There
could be 100+ apps all trying to get real-time from the network to do
some time-keeping stuff. They might pound the ioctls at high rate. The
last thing we would want is to self-induce a light weight DOS. What
Eric Dumazet mentioned in the very first patch set seems like a good
comment. seqlock or whatever it is we use for jiffies.
We shouldn't let get_time block network traffic under cases when there
is high load and large number of threads are pounding get-time.
Reader's *must* wait.

reminder to myself: once this patch is applied, tpacket_v3 will
break(not because of spinlock but in general) under specific
conditions. Fix tpacket_v3.

> HTH,
>
> Richard

Chetan

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH net V4 2/2] igb: offer a PTP Hardware Clock instead of the timecompare method
  2012-03-21 15:00     ` chetan loke
@ 2012-03-21 16:08       ` Richard Cochran
  2012-03-21 17:02         ` chetan loke
  2012-03-21 17:06         ` Keller, Jacob E
  2012-03-21 17:00       ` Richard Cochran
  1 sibling, 2 replies; 38+ messages in thread
From: Richard Cochran @ 2012-03-21 16:08 UTC (permalink / raw)
  To: chetan loke
  Cc: netdev, e1000-devel, jacob.e.keller, jeffrey.t.kirsher,
	john.ronciak, john.stultz, tglx

On Wed, Mar 21, 2012 at 11:00:59AM -0400, chetan loke wrote:
> 
> Once PHC->gettime goes live(aka exported to user space), we can't
> really control how users will use it in their applications. There
> could be 100+ apps all trying to get real-time from the network to do
> some time-keeping stuff. They might pound the ioctls at high rate. The
> last thing we would want is to self-induce a light weight DOS.

Well, if people want to write programs that make no sense at all,
then I can cannot stop them. There really isn't any point in general
applications using the PHC directly. You can easily synchronize the
system time to the PHC to within a few microseconds. Just the error
from reading the clock (due to various kinds of latency) is much
larger than that.

Also, although you might be able to optimize clock_gettime performance
for a PCI card, this will never work for PHY based devices, which, by
the way, offer superior PTP time stamping. So, in general, applications
should stick to reading the system time.

We still need to get some more tools out there in order to support the
PHC->system synchronization, but that is not too far off. That is what
I am working on now, and I don't think optimizing the igb is worth the
effort. If you want to try it, please go right ahead, but adding the
PPS would be much more useful IMHO.

Thanks,
Richard

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH net V4 2/2] igb: offer a PTP Hardware Clock instead of the timecompare method
  2012-03-21 15:00     ` chetan loke
  2012-03-21 16:08       ` Richard Cochran
@ 2012-03-21 17:00       ` Richard Cochran
  2012-03-21 17:09         ` Keller, Jacob E
  1 sibling, 1 reply; 38+ messages in thread
From: Richard Cochran @ 2012-03-21 17:00 UTC (permalink / raw)
  To: chetan loke
  Cc: netdev, e1000-devel, jacob.e.keller, jeffrey.t.kirsher,
	john.ronciak, john.stultz, tglx

On Wed, Mar 21, 2012 at 11:00:59AM -0400, chetan loke wrote:
> Once PHC->gettime goes live(aka exported to user space), we can't
> really control how users will use it in their applications. There
> could be 100+ apps all trying to get real-time from the network to do
> some time-keeping stuff. They might pound the ioctls at high rate. The
> last thing we would want is to self-induce a light weight DOS. What
> Eric Dumazet mentioned in the very first patch set seems like a good
> comment. seqlock or whatever it is we use for jiffies.

Can you please explain how using a seqlock could help here?

Thanks,
Richard

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH net V4 2/2] igb: offer a PTP Hardware Clock instead of the timecompare method
  2012-03-21 16:08       ` Richard Cochran
@ 2012-03-21 17:02         ` chetan loke
  2012-03-22  7:00           ` Richard Cochran
  2012-03-21 17:06         ` Keller, Jacob E
  1 sibling, 1 reply; 38+ messages in thread
From: chetan loke @ 2012-03-21 17:02 UTC (permalink / raw)
  To: Richard Cochran
  Cc: e1000-devel, netdev, john.ronciak, john.stultz, jacob.e.keller, tglx

On Wed, Mar 21, 2012 at 12:08 PM, Richard Cochran
<richardcochran@gmail.com> wrote:
> On Wed, Mar 21, 2012 at 11:00:59AM -0400, chetan loke wrote:
>>
>> Once PHC->gettime goes live(aka exported to user space), we can't
>> really control how users will use it in their applications. There
>> could be 100+ apps all trying to get real-time from the network to do
>> some time-keeping stuff. They might pound the ioctls at high rate. The
>> last thing we would want is to self-induce a light weight DOS.
>
> Well, if people want to write programs that make no sense at all,
> then I can cannot stop them. There really isn't any point in general
> applications using the PHC directly. You can easily synchronize the

I thought the core patches enables using PHC as a reference time, no?
So that seems to be contradicting the PHC API. If there's no point
then why are we exporting PHC->get_time as a generic interface? May be
just limit get_time interface to something like ethtool?

> system time to the PHC to within a few microseconds. Just the error
> from reading the clock (due to various kinds of latency) is much
> larger than that.
>

Once, the clocks are in-sync, it's not an error - but  *somewhat
fixed* latency. There are users who don't want to spend extra money
for expensive GPS time-sync stuff but yet have enough CPUs such that
they can dedicate 1 CPU for book-keeping. For such users, this
*somewhat fixed* latency should be constant over a period of time even
when other CPUs are processing traffic at line rate.

> Also, although you might be able to optimize clock_gettime performance
> for a PCI card, this will never work for PHY based devices, which, by
> the way, offer superior PTP time stamping. So, in general, applications
> should stick to reading the system time. We still need to get some more tools out there in order to support the
> PHC->system synchronization, but that is not too far off. That is what
> I am working on now,

Sounds good on the tools part. But if applications should stick to
reading sys-time then either we shouldn't export the API or export it
selectively.If there's an API then user-space guys will use it. For
this discussion, lets focus on PCI cards because that is what this
patch talks about. Also the majority of the deployment will be LOM or
PCI cards.

>
> Thanks,
> Richard
>
thanks
Chetan

------------------------------------------------------------------------------
This SF email is sponsosred by:
Try Windows Azure free for 90 days Click Here 
http://p.sf.net/sfu/sfd2d-msazure
_______________________________________________
E1000-devel mailing list
E1000-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/e1000-devel
To learn more about Intel&#174; Ethernet, visit http://communities.intel.com/community/wired

^ permalink raw reply	[flat|nested] 38+ messages in thread

* RE: [PATCH net V4 2/2] igb: offer a PTP Hardware Clock instead of the timecompare method
  2012-03-21 16:08       ` Richard Cochran
  2012-03-21 17:02         ` chetan loke
@ 2012-03-21 17:06         ` Keller, Jacob E
  2012-03-22  7:09           ` Richard Cochran
  1 sibling, 1 reply; 38+ messages in thread
From: Keller, Jacob E @ 2012-03-21 17:06 UTC (permalink / raw)
  To: Richard Cochran, chetan loke
  Cc: netdev, e1000-devel, Kirsher, Jeffrey T, Ronciak, John,
	john.stultz, tglx



> -----Original Message-----
> From: Richard Cochran [mailto:richardcochran@gmail.com]
> Sent: Wednesday, March 21, 2012 9:08 AM
> To: chetan loke
> Cc: netdev@vger.kernel.org; e1000-devel@lists.sourceforge.net; Keller, Jacob
> E; Kirsher, Jeffrey T; Ronciak, John; john.stultz@linaro.org;
> tglx@linutronix.de
> Subject: Re: [PATCH net V4 2/2] igb: offer a PTP Hardware Clock instead of the
> timecompare method
> 
> On Wed, Mar 21, 2012 at 11:00:59AM -0400, chetan loke wrote:
> >
> > Once PHC->gettime goes live(aka exported to user space), we can't
> > really control how users will use it in their applications. There
> > could be 100+ apps all trying to get real-time from the network to do
> > some time-keeping stuff. They might pound the ioctls at high rate. The
> > last thing we would want is to self-induce a light weight DOS.
> 
> Well, if people want to write programs that make no sense at all, then I can
> cannot stop them. There really isn't any point in general applications using
> the PHC directly. You can easily synchronize the system time to the PHC to
> within a few microseconds. Just the error from reading the clock (due to
> various kinds of latency) is much larger than that.
> 
> Also, although you might be able to optimize clock_gettime performance for a
> PCI card, this will never work for PHY based devices, which, by the way, offer
> superior PTP time stamping. So, in general, applications should stick to
> reading the system time.
> 
> We still need to get some more tools out there in order to support the
> PHC->system synchronization, but that is not too far off. That is what
> I am working on now, and I don't think optimizing the igb is worth the effort.
> If you want to try it, please go right ahead, but adding the PPS would be much
> more useful IMHO.
> 
> Thanks,
> Richard
> 

I agree with Chetan. I think it would be best to make sure the correct form of locking is done, as we are providing an interface to the user. Using a seqlock would allow for preventing the ioctls from blocking the hardware timestamp code.

It's a fairly simple change for the gettime function (the most likely culprit to be hammered) by changing it to use timecounter_cyc2time function instead of timecounter_read. (as long as timecounter_read is called at least every 1/2 the system time overflow, which it should be due to the work task.)

With that change, then the section use a seqlock (along with the section for checking hardware timestamps). Other places would do the full write lock.

I do agree with Richard about supporting the PPS if possible, as that is much more useful as a general clock synchronization tool.

- Jake

^ permalink raw reply	[flat|nested] 38+ messages in thread

* RE: [PATCH net V4 2/2] igb: offer a PTP Hardware Clock instead of the timecompare method
  2012-03-21 17:00       ` Richard Cochran
@ 2012-03-21 17:09         ` Keller, Jacob E
  2012-03-21 21:50           ` chetan loke
  0 siblings, 1 reply; 38+ messages in thread
From: Keller, Jacob E @ 2012-03-21 17:09 UTC (permalink / raw)
  To: Richard Cochran, chetan loke
  Cc: netdev, e1000-devel, Kirsher, Jeffrey T, Ronciak, John,
	john.stultz, tglx

> -----Original Message-----
> From: Richard Cochran [mailto:richardcochran@gmail.com]
> Sent: Wednesday, March 21, 2012 10:00 AM
> To: chetan loke
> Cc: netdev@vger.kernel.org; e1000-devel@lists.sourceforge.net; Keller, Jacob
> E; Kirsher, Jeffrey T; Ronciak, John; john.stultz@linaro.org;
> tglx@linutronix.de
> Subject: Re: [PATCH net V4 2/2] igb: offer a PTP Hardware Clock instead of the
> timecompare method
> 
> Can you please explain how using a seqlock could help here?
> 
> Thanks,
> Richard

My understanding of the seqlock, is that it prevents starvation of the hwtstamp calls in the rx and tx routines if/when a user hammers the gettime ioctl due to bad software design where 100+ apps are wanting direct access to the PHC.

- Jake

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH net V4 2/2] igb: offer a PTP Hardware Clock instead of the timecompare method
  2012-03-21 17:09         ` Keller, Jacob E
@ 2012-03-21 21:50           ` chetan loke
  2012-03-22  6:41             ` Richard Cochran
  0 siblings, 1 reply; 38+ messages in thread
From: chetan loke @ 2012-03-21 21:50 UTC (permalink / raw)
  To: Keller, Jacob E; +Cc: e1000-devel, netdev, Ronciak, John, john.stultz, tglx

On Wed, Mar 21, 2012 at 1:09 PM, Keller, Jacob E
<jacob.e.keller@intel.com> wrote:
>> -----Original Message-----
>> From: Richard Cochran [mailto:richardcochran@gmail.com]
>> Sent: Wednesday, March 21, 2012 10:00 AM
>> To: chetan loke
>> Cc: netdev@vger.kernel.org; e1000-devel@lists.sourceforge.net; Keller, Jacob
>> E; Kirsher, Jeffrey T; Ronciak, John; john.stultz@linaro.org;
>> tglx@linutronix.de
>> Subject: Re: [PATCH net V4 2/2] igb: offer a PTP Hardware Clock instead of the
>> timecompare method
>>
>> Can you please explain how using a seqlock could help here?
>>
>> Thanks,
>> Richard
>
> My understanding of the seqlock, is that it prevents starvation of the hwtstamp calls in the rx and tx routines if/when a user hammers the gettime ioctl due to bad software design where 100+ apps are wanting direct access to the PHC.
>

Richard - Intent is to make the readers(get_time) wait (or return last
read value if the seq_counter tripped because you know that this value
was recent) and let the tx/rx path continue. I haven't looked in more
details but as Jake mentioned you will also need to change the way you
read the values(by not using timecounter_read in get_time).

thanks
Chetan

------------------------------------------------------------------------------
This SF email is sponsosred by:
Try Windows Azure free for 90 days Click Here 
http://p.sf.net/sfu/sfd2d-msazure
_______________________________________________
E1000-devel mailing list
E1000-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/e1000-devel
To learn more about Intel&#174; Ethernet, visit http://communities.intel.com/community/wired

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH net V4 2/2] igb: offer a PTP Hardware Clock instead of the timecompare method
  2012-03-21 21:50           ` chetan loke
@ 2012-03-22  6:41             ` Richard Cochran
  2012-03-22 23:13               ` chetan loke
  0 siblings, 1 reply; 38+ messages in thread
From: Richard Cochran @ 2012-03-22  6:41 UTC (permalink / raw)
  To: chetan loke
  Cc: e1000-devel, netdev, Ronciak, John, john.stultz, Keller, Jacob E, tglx

On Wed, Mar 21, 2012 at 05:50:34PM -0400, chetan loke wrote:
> 
> Richard - Intent is to make the readers(get_time) wait (or return last
> read value if the seq_counter tripped because you know that this value
> was recent) and let the tx/rx path continue. I haven't looked in more
> details but as Jake mentioned you will also need to change the way you
> read the values(by not using timecounter_read in get_time).

I don't get what you guys are saying. How can you avoid the spin lock
around the two time register reads? How about a patch or some pseudo
code?

Thanks,
Richard

------------------------------------------------------------------------------
This SF email is sponsosred by:
Try Windows Azure free for 90 days Click Here 
http://p.sf.net/sfu/sfd2d-msazure
_______________________________________________
E1000-devel mailing list
E1000-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/e1000-devel
To learn more about Intel&#174; Ethernet, visit http://communities.intel.com/community/wired

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH net V4 2/2] igb: offer a PTP Hardware Clock instead of the timecompare method
  2012-03-21 17:02         ` chetan loke
@ 2012-03-22  7:00           ` Richard Cochran
  0 siblings, 0 replies; 38+ messages in thread
From: Richard Cochran @ 2012-03-22  7:00 UTC (permalink / raw)
  To: chetan loke
  Cc: netdev, e1000-devel, jacob.e.keller, jeffrey.t.kirsher,
	john.ronciak, john.stultz, tglx

On Wed, Mar 21, 2012 at 01:02:38PM -0400, chetan loke wrote:
> > Well, if people want to write programs that make no sense at all,
> > then I can cannot stop them. There really isn't any point in general
> > applications using the PHC directly. You can easily synchronize the
> 
> I thought the core patches enables using PHC as a reference time, no?
> So that seems to be contradicting the PHC API. If there's no point
> then why are we exporting PHC->get_time as a generic interface? May be
> just limit get_time interface to something like ethtool?

There was a long discussion on this list and the lkml about the API,
and I think what we came up is a good solution. There is nothing to
prevent you from rewriting all of user space to use a dynamic clock ID
instead of CLOCK_REALTIME, but I seriously doubt anyone is going to do
this.

It does make sense for the time synchronization services to use
the clock_gettime calls to do their work.

Calls to read the system clock are highly optimized (at least on x86),
and thus applications wanting quick time stamps are wise to use that
clock. Calling clock_gettime on another clock, a PHY based clock for
example, might block for a long, long time. There is no way to
optimize around that, since it is a MDIO bus transaction.

> Once, the clocks are in-sync, it's not an error - but  *somewhat
> fixed* latency. There are users who don't want to spend extra money
> for expensive GPS time-sync stuff but yet have enough CPUs such that
> they can dedicate 1 CPU for book-keeping. For such users, this
> *somewhat fixed* latency should be constant over a period of time even
> when other CPUs are processing traffic at line rate.

Consider the following program:

	if (interesting_event()) {
		/* what happens in this empty space? preemption? interrupt? */
		clock_gettime();
	}

Run cyclictest on your system, and let's then talk about your 'fixed'
latency numbers.

> Sounds good on the tools part. But if applications should stick to
> reading sys-time then either we shouldn't export the API or export it
> selectively.If there's an API then user-space guys will use it. For
> this discussion, lets focus on PCI cards because that is what this
> patch talks about. Also the majority of the deployment will be LOM or
> PCI cards.

Like I said, I personally don't see much utility in optimizing igb. I
only have so much time to spend, but I am not against you or Jacob or
anyone else doing that work.

Thanks,
Richard

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH net V4 2/2] igb: offer a PTP Hardware Clock instead of the timecompare method
  2012-03-21 17:06         ` Keller, Jacob E
@ 2012-03-22  7:09           ` Richard Cochran
  2012-03-22 21:59             ` Keller, Jacob E
  0 siblings, 1 reply; 38+ messages in thread
From: Richard Cochran @ 2012-03-22  7:09 UTC (permalink / raw)
  To: Keller, Jacob E
  Cc: e1000-devel, netdev, Ronciak, John, john.stultz, tglx, chetan loke

On Wed, Mar 21, 2012 at 05:06:09PM +0000, Keller, Jacob E wrote:
> 
> I agree with Chetan. I think it would be best to make sure the
> correct form of locking is done, as we are providing an interface to
> the user. Using a seqlock would allow for preventing the ioctls from
> blocking the hardware timestamp code.

Okay, you can improve the time stamping path in the driver to avoid
contending with callers to clock_gettime. But that will not help the
hundreds of clock_gettime callers from contending with each other.

> It's a fairly simple change for the gettime function (the most
> likely culprit to be hammered) by changing it to use
> timecounter_cyc2time function instead of timecounter_read. (as long
> as timecounter_read is called at least every 1/2 the system time
> overflow, which it should be due to the work task.)
>
> With that change, then the section use a seqlock (along with the
> section for checking hardware timestamps). Other places would do the
> full write lock.

So, you think that clock_gettime should not read the card's time
registers?

Richard



------------------------------------------------------------------------------
This SF email is sponsosred by:
Try Windows Azure free for 90 days Click Here 
http://p.sf.net/sfu/sfd2d-msazure
_______________________________________________
E1000-devel mailing list
E1000-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/e1000-devel
To learn more about Intel&#174; Ethernet, visit http://communities.intel.com/community/wired

^ permalink raw reply	[flat|nested] 38+ messages in thread

* RE: [PATCH net V4 2/2] igb: offer a PTP Hardware Clock instead of the timecompare method
  2012-03-22  7:09           ` Richard Cochran
@ 2012-03-22 21:59             ` Keller, Jacob E
  2012-03-23  6:05               ` Richard Cochran
  0 siblings, 1 reply; 38+ messages in thread
From: Keller, Jacob E @ 2012-03-22 21:59 UTC (permalink / raw)
  To: Richard Cochran
  Cc: chetan loke, netdev, e1000-devel, Kirsher, Jeffrey T, Ronciak,
	John, john.stultz, tglx

> -----Original Message-----
> From: Richard Cochran [mailto:richardcochran@gmail.com]
> Sent: Thursday, March 22, 2012 12:09 AM
> To: Keller, Jacob E
> Cc: chetan loke; netdev@vger.kernel.org; e1000-devel@lists.sourceforge.net;
> Kirsher, Jeffrey T; Ronciak, John; john.stultz@linaro.org; tglx@linutronix.de
> Subject: Re: [PATCH net V4 2/2] igb: offer a PTP Hardware Clock instead of the
> timecompare method
> 
> On Wed, Mar 21, 2012 at 05:06:09PM +0000, Keller, Jacob E wrote:
> >
> > I agree with Chetan. I think it would be best to make sure the correct
> > form of locking is done, as we are providing an interface to the user.
> > Using a seqlock would allow for preventing the ioctls from blocking
> > the hardware timestamp code.
> 
> Okay, you can improve the time stamping path in the driver to avoid contending
> with callers to clock_gettime. But that will not help the hundreds of
> clock_gettime callers from contending with each other.
>

Seqlocks don't block when reading. The way they work is with a sequence number. When a reader wants to perform a read, it atomically records the sequence number. Then it performs the operation it wants (which can't have *any* side effects. It needs to be read only on the data). After finishing, it checks to see if the sequence number has increased. If so, it repeats the process until the sequence number stays the same.

Writers work like normal spin locks but increase a sequence number whenever they start work.

This means that readers don't block at all, and as long as the readers don't conflict with each other, there is no contention or writer starvation. It is possible for readers to 'live' lock, due to a large number of write operations. However, the lock is designed for few-writers, many readers. Which is what we have.

> > It's a fairly simple change for the gettime function (the most likely
> > culprit to be hammered) by changing it to use timecounter_cyc2time
> > function instead of timecounter_read. (as long as timecounter_read is
> > called at least every 1/2 the system time overflow, which it should be
> > due to the work task.)
> >
> > With that change, then the section use a seqlock (along with the
> > section for checking hardware timestamps). Other places would do the
> > full write lock.
> 
> So, you think that clock_gettime should not read the card's time registers?
> 
You can have it read the register value directly and use timecounter_cyc2time to convert that value into ns since the epoch. This works because you are already using timecounter_read during the worktask which should be running at least as often as half the counter wraparound time. Timecounter_read has a side effect of updating the last-read value in the timecounter structure. Timecounter_cyc2time does not. The MAC timestamps will be correct as long as the timecounter_read is updated at most half a wrap-around time before the systime is calculated.

This means that gettime, and tx/rx hwtstamp functions would be read operations, while settime, adjtime, cyclecounter_update and (if igb needs it?) the function which adjusts for link speed would be writes. Since these operations only occur occasionally (relative to tx/rx hwtstamp and gettime) we have a many readers/few writers scenario.

I can show you the changes on the ixgbe driver.

> Richard
> 

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH net V4 2/2] igb: offer a PTP Hardware Clock instead of the timecompare method
  2012-03-22  6:41             ` Richard Cochran
@ 2012-03-22 23:13               ` chetan loke
  2012-03-23  5:59                 ` Richard Cochran
  2012-03-23 18:27                 ` Keller, Jacob E
  0 siblings, 2 replies; 38+ messages in thread
From: chetan loke @ 2012-03-22 23:13 UTC (permalink / raw)
  To: Richard Cochran
  Cc: Keller, Jacob E, netdev, e1000-devel, Kirsher, Jeffrey T,
	Ronciak, John, john.stultz, tglx

On Thu, Mar 22, 2012 at 2:41 AM, Richard Cochran
<richardcochran@gmail.com> wrote:
> On Wed, Mar 21, 2012 at 05:50:34PM -0400, chetan loke wrote:
>>
>> Richard - Intent is to make the readers(get_time) wait (or return last
>> read value if the seq_counter tripped because you know that this value
>> was recent) and let the tx/rx path continue. I haven't looked in more
>> details but as Jake mentioned you will also need to change the way you
>> read the values(by not using timecounter_read in get_time).
>
> I don't get what you guys are saying. How can you avoid the spin lock
> around the two time register reads? How about a patch or some pseudo
> code?
>

tmreg_lock now becomes seqlock_t instead of spinlock_t.

/* users can keep re-trying - dont really care */
igb_gettime_locking (...) {
     unsigned int seq;
     u64 ns;
     do {
     seq = read_seqbegin( &pigb->tmreg_seq_lock);
     ns = timecounter_read(&pigb->tc);
     } while (read_seqretry(&pigb->tmreg_seq_lock, seq));
     // process ns
}


copyright 2012 - Chetan Loke <lokechetan@gmail.com>

// trip cnt will ensure/enforce - evil adjtime user-space code can
still not block us.
// called from igb_tx[rx]_hwtstamp
driver_rx_tx_path_locking( ... ) {
     unsigned int seq, trip_cnt = 0;
     u64 ns;
     do {
     seq = read_seqbegin( &pigb->tmreg_seq_lock);
     trip_cnt++;
     ns = timecounter_read(&pigb->tc);
     } while (read_seqretry(&pigb->tmreg_seq_lock, seq) && trip_cnt < 2));
     // process ns
}


igb_adjtime () {
    /* just let first user of adjtime or settime succeed? */
    if (write_try_seqlock(&pigb->tmreg_seq_lock)) {
         // update NIC counter - here ...
         write_sequnlock(&pigb->tmreg_seq_lock);
    }
}

igb_settime () {
    // let all of them do their thing ... or you can use trylock here
too if you like...
    write_seqlock(&pigb->tmreg_seq_lock);
         // update NIC counter - here ...
    write_sequnlock(&pigb->tmreg_seq_lock);
 }

write_seq_xxx will spinlock as usual but read_seq reads the atomically
incremented counter. Also notice the use of trip_cnt to further ensure
the driver never keeps re-trying.

The regular __irqsave flavors are also available for use for the
write_seq calls.

> Thanks,
> Richard

Chetan

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH net V4 2/2] igb: offer a PTP Hardware Clock instead of the timecompare method
  2012-03-22 23:13               ` chetan loke
@ 2012-03-23  5:59                 ` Richard Cochran
  2012-03-23 18:27                 ` Keller, Jacob E
  1 sibling, 0 replies; 38+ messages in thread
From: Richard Cochran @ 2012-03-23  5:59 UTC (permalink / raw)
  To: chetan loke
  Cc: Keller, Jacob E, netdev, e1000-devel, Kirsher, Jeffrey T,
	Ronciak, John, john.stultz, tglx

On Thu, Mar 22, 2012 at 07:13:42PM -0400, chetan loke wrote:

> /* users can keep re-trying - dont really care */

[Sigh] No, they do in fact care...

> igb_gettime_locking (...) {
>      unsigned int seq;
>      u64 ns;
>      do {
>      seq = read_seqbegin( &pigb->tmreg_seq_lock);
>      ns = timecounter_read(&pigb->tc);

And what happens here?

timecounter_read()
 timecounter_read_delta()
  tc->cc->read() == igb_825xx_systim_read()

   lo = rd32(E1000_SYSTIML); /* this read latches the time value */
   hi = rd32(E1000_SYSTIMH); /* and here is your race */

If two readers enter this code at nearly the same time, then they can
corrupt each other's hi/lo values.

>      } while (read_seqretry(&pigb->tmreg_seq_lock, seq));
>      // process ns
> }
> 
> 
> copyright 2012 - Chetan Loke <lokechetan@gmail.com>
> 
> // trip cnt will ensure/enforce - evil adjtime user-space code can
> still not block us.
> // called from igb_tx[rx]_hwtstamp
> driver_rx_tx_path_locking( ... ) {
>      unsigned int seq, trip_cnt = 0;
>      u64 ns;
>      do {
>      seq = read_seqbegin( &pigb->tmreg_seq_lock);
>      trip_cnt++;
>      ns = timecounter_read(&pigb->tc);

Same here.

Thanks,
Richard

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH net V4 2/2] igb: offer a PTP Hardware Clock instead of the timecompare method
  2012-03-22 21:59             ` Keller, Jacob E
@ 2012-03-23  6:05               ` Richard Cochran
  0 siblings, 0 replies; 38+ messages in thread
From: Richard Cochran @ 2012-03-23  6:05 UTC (permalink / raw)
  To: Keller, Jacob E
  Cc: chetan loke, netdev, e1000-devel, Kirsher, Jeffrey T, Ronciak,
	John, john.stultz, tglx

On Thu, Mar 22, 2012 at 09:59:01PM +0000, Keller, Jacob E wrote:
> 
> This means that readers don't block at all, and as long as the
> readers don't conflict with each other, there is no contention or
> writer starvation. It is possible for readers to 'live' lock, due to
> a large number of write operations. However, the lock is designed
> for few-writers, many readers. Which is what we have.

The readers do conflict with each other simply by reading the
registers E1000_SYSTIML and E1000_SYSTIMH. The locking is needed in
order to make these two reads "atomic."

Thanks,
Richard

^ permalink raw reply	[flat|nested] 38+ messages in thread

* RE: [PATCH net V4 2/2] igb: offer a PTP Hardware Clock instead of the timecompare method
  2012-03-22 23:13               ` chetan loke
  2012-03-23  5:59                 ` Richard Cochran
@ 2012-03-23 18:27                 ` Keller, Jacob E
  2012-03-23 19:39                   ` chetan loke
  1 sibling, 1 reply; 38+ messages in thread
From: Keller, Jacob E @ 2012-03-23 18:27 UTC (permalink / raw)
  To: chetan loke, Richard Cochran
  Cc: netdev, e1000-devel, Kirsher, Jeffrey T, Ronciak, John,
	john.stultz, tglx

> -----Original Message-----
> From: chetan loke [mailto:loke.chetan@gmail.com]
> Sent: Thursday, March 22, 2012 4:14 PM
> To: Richard Cochran
> Cc: Keller, Jacob E; netdev@vger.kernel.org; e1000-
> devel@lists.sourceforge.net; Kirsher, Jeffrey T; Ronciak, John;
> john.stultz@linaro.org; tglx@linutronix.de
> Subject: Re: [PATCH net V4 2/2] igb: offer a PTP Hardware Clock instead of the
> timecompare method
> 
> // trip cnt will ensure/enforce - evil adjtime user-space code can still not
> block us.
> // called from igb_tx[rx]_hwtstamp
> driver_rx_tx_path_locking( ... ) {
>      unsigned int seq, trip_cnt = 0;
>      u64 ns;
>      do {
>      seq = read_seqbegin( &pigb->tmreg_seq_lock);
>      trip_cnt++;
>      ns = timecounter_read(&pigb->tc);
These functions would use timecounter_cyc2time, instead, and convert the RXTSTAMPL/H registers. That prevents the issue that Richard mentions (here), because two packets can't be timestamped at the same time anyways. (and on the devices where they can, the timestamp isn't returned via registers

However the try_cnt would not work, because of a case where the timecounter is just about to roll over. (So last check was 25 seconds ago.) Then the cyc2time function is called, determines that some RX timestamp is in the 'future'. At this point on another CPU the timecounter_read is run, and updates the nsec value in the timecounter by 25 seconds, and stores the last read timestamp value. Now the cyc2time function uses the 'new' values, but doesn't know to correct for it by using the "in the past" subtraction. Thus, the resulting ns returned by cyc2time would be 25 seconds in the future. This isn't just not accurate, it is very wrong and could throw of PTP calculations.

The solution here is when the try_cnt runs out, drop the timestamp or return 0 as the value. (So that the PTP app knows the timestamp was bad.) However this is introducing more potentially dropped timestamps,  which could be an issue.

I think a better way to solve this problem is throttle the ioctls. Would it be possible to detect users spamming the ioctls and return some sort of 'EBUSY' error? Maybe by using spin_trylock? That would prevent the starvation issues mentioned if a user were to spam set/get/adj in some weird loop, without a potentially dangerous locking scheme.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH net V4 2/2] igb: offer a PTP Hardware Clock instead of the timecompare method
  2012-03-23 18:27                 ` Keller, Jacob E
@ 2012-03-23 19:39                   ` chetan loke
  2012-03-24  6:51                     ` Richard Cochran
  0 siblings, 1 reply; 38+ messages in thread
From: chetan loke @ 2012-03-23 19:39 UTC (permalink / raw)
  To: Keller, Jacob E
  Cc: Richard Cochran, netdev, e1000-devel, Kirsher, Jeffrey T,
	Ronciak, John, john.stultz, tglx

On Fri, Mar 23, 2012 at 2:27 PM, Keller, Jacob E
<jacob.e.keller@intel.com> wrote:

> However the try_cnt would not work, because of a case where the timecounter is just about to roll over. (So last check was 25 seconds ago.) Then the cyc2time function is called, determines that some RX timestamp is in the 'future'. At this point on another CPU the timecounter_read is run, and updates the nsec value in the timecounter by 25 seconds, and stores the last read timestamp value. Now the cyc2time function uses the 'new' values, but doesn't know to correct for it by using the "in the past" subtraction. Thus, the resulting ns returned by cyc2time would be 25 seconds in the future. This isn't just not accurate, it is very wrong and could throw of PTP calculations.
>

PS: for some reason the line endings were messed up already when I hit
reply to this msg.

So, how is it working today? Because we could have tx and rx
completions on different CPUs. Is it not possible to have the
following race today - between timecompare_update->timecompare_offset
-> timecounter_readdelta of say Rx and timecounter_cyc2time from Tx?



> The solution here is when the try_cnt runs out, drop the timestamp or return 0 as the value. (So that the PTP app knows the timestamp was bad.) However this is introducing more potentially dropped timestamps,  which could be an issue.
>
> I think a better way to solve this problem is throttle the ioctls. Would it be possible to detect users spamming the ioctls and return some sort of 'EBUSY' error? Maybe by using spin_trylock? That would prevent the starvation issues mentioned if a user were to spam set/get/adj in some weird loop, without a potentially dangerous locking scheme.
>
>

I was thinking of caching the tstamp values from the Tx/Rx path () and
also firing a kernel-thread per-driver to update the cached value. So
gettime (ab)users could be taken care off that way by returning the
cached value. But settime/adjtime would still be left.

So yes, rate limiting ioctls seems like a good idea.

How about rate limiting at the PHC class driver level? And then it
will work across the board for all the adapters at the device level.

Chetan

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH net V4 2/2] igb: offer a PTP Hardware Clock instead of the timecompare method
  2012-03-23 19:39                   ` chetan loke
@ 2012-03-24  6:51                     ` Richard Cochran
  2012-03-26 15:07                       ` chetan loke
  2012-03-26 20:51                       ` Keller, Jacob E
  0 siblings, 2 replies; 38+ messages in thread
From: Richard Cochran @ 2012-03-24  6:51 UTC (permalink / raw)
  To: chetan loke
  Cc: e1000-devel, netdev, Ronciak, John, john.stultz, Keller, Jacob E, tglx

On Fri, Mar 23, 2012 at 03:39:08PM -0400, chetan loke wrote:
> 
> So, how is it working today? Because we could have tx and rx
> completions on different CPUs. Is it not possible to have the
> following race today - between timecompare_update->timecompare_offset
> -> timecounter_readdelta of say Rx and timecounter_cyc2time from Tx?

I works (in the igb) because of the spinlock. You know, that thing
that you are so against using.

> So yes, rate limiting ioctls seems like a good idea.

No, that is a terrible idea.
 
> How about rate limiting at the PHC class driver level? And then it
> will work across the board for all the adapters at the device level.

No, don't go there. Enough bikeshedding already. If you have a serious
performance issue, please post a test case, and we will look for a
solution.

Thanks,
Richard



------------------------------------------------------------------------------
This SF email is sponsosred by:
Try Windows Azure free for 90 days Click Here 
http://p.sf.net/sfu/sfd2d-msazure
_______________________________________________
E1000-devel mailing list
E1000-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/e1000-devel
To learn more about Intel&#174; Ethernet, visit http://communities.intel.com/community/wired

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH net V4 2/2] igb: offer a PTP Hardware Clock instead of the timecompare method
  2012-03-24  6:51                     ` Richard Cochran
@ 2012-03-26 15:07                       ` chetan loke
  2012-03-26 15:27                         ` Richard Cochran
  2012-03-26 20:51                       ` Keller, Jacob E
  1 sibling, 1 reply; 38+ messages in thread
From: chetan loke @ 2012-03-26 15:07 UTC (permalink / raw)
  To: Richard Cochran
  Cc: Keller, Jacob E, netdev, e1000-devel, Kirsher, Jeffrey T,
	Ronciak, John, john.stultz, tglx

On Sat, Mar 24, 2012 at 2:51 AM, Richard Cochran
<richardcochran@gmail.com> wrote:
> On Fri, Mar 23, 2012 at 03:39:08PM -0400, chetan loke wrote:
>>
>> So, how is it working today? Because we could have tx and rx
>> completions on different CPUs. Is it not possible to have the
>> following race today - between timecompare_update->timecompare_offset
>> -> timecounter_readdelta of say Rx and timecounter_cyc2time from Tx?
>
> I works (in the igb) because of the spinlock. You know, that thing
> that you are so against using.
>

I meant, was there a lock before the PHC functionality in igb?

>
>> How about rate limiting at the PHC class driver level? And then it
>> will work across the board for all the adapters at the device level.
>
> No, don't go there. Enough bikeshedding already. If you have a serious

can a user without root privileges use get/adj/set time ioctls for the
PHC functionality?


> Thanks,
> Richard

Chetan

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH net V4 2/2] igb: offer a PTP Hardware Clock instead of the timecompare method
  2012-03-26 15:07                       ` chetan loke
@ 2012-03-26 15:27                         ` Richard Cochran
  2012-03-26 17:11                           ` chetan loke
  0 siblings, 1 reply; 38+ messages in thread
From: Richard Cochran @ 2012-03-26 15:27 UTC (permalink / raw)
  To: chetan loke
  Cc: Keller, Jacob E, netdev, e1000-devel, Kirsher, Jeffrey T,
	Ronciak, John, john.stultz, tglx

On Mon, Mar 26, 2012 at 11:07:40AM -0400, chetan loke wrote:
> On Sat, Mar 24, 2012 at 2:51 AM, Richard Cochran
> <richardcochran@gmail.com> wrote:
> > On Fri, Mar 23, 2012 at 03:39:08PM -0400, chetan loke wrote:
> >>
> >> So, how is it working today? Because we could have tx and rx
> >> completions on different CPUs. Is it not possible to have the
> >> following race today - between timecompare_update->timecompare_offset
> >> -> timecounter_readdelta of say Rx and timecounter_cyc2time from Tx?
> >
> > I works (in the igb) because of the spinlock. You know, that thing
> > that you are so against using.
> >
> 
> I meant, was there a lock before the PHC functionality in igb?

There was no lock, and yes, it was a bug.

> >> How about rate limiting at the PHC class driver level? And then it
> >> will work across the board for all the adapters at the device level.
> >
> > No, don't go there. Enough bikeshedding already. If you have a serious
> 
> can a user without root privileges use get/adj/set time ioctls for the
> PHC functionality?

Depends on how you set the character device node permissions.

Thanks,
Richard

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH net V4 2/2] igb: offer a PTP Hardware Clock instead of the timecompare method
  2012-03-26 15:27                         ` Richard Cochran
@ 2012-03-26 17:11                           ` chetan loke
  2012-03-26 18:56                             ` Keller, Jacob E
  2012-03-27 15:33                             ` Richard Cochran
  0 siblings, 2 replies; 38+ messages in thread
From: chetan loke @ 2012-03-26 17:11 UTC (permalink / raw)
  To: Richard Cochran
  Cc: Keller, Jacob E, netdev, e1000-devel, Kirsher, Jeffrey T,
	Ronciak, John, john.stultz, tglx

On Mon, Mar 26, 2012 at 11:27 AM, Richard Cochran
<richardcochran@gmail.com> wrote:
> On Mon, Mar 26, 2012 at 11:07:40AM -0400, chetan loke wrote:
>> On Sat, Mar 24, 2012 at 2:51 AM, Richard Cochran
>> <richardcochran@gmail.com> wrote:
>> > On Fri, Mar 23, 2012 at 03:39:08PM -0400, chetan loke wrote:
>> >>
>> >> So, how is it working today? Because we could have tx and rx
>> >> completions on different CPUs. Is it not possible to have the
>> >> following race today - between timecompare_update->timecompare_offset
>> >> -> timecounter_readdelta of say Rx and timecounter_cyc2time from Tx?
>> >
>> > I works (in the igb) because of the spinlock. You know, that thing
>> > that you are so against using.
>> >
>>
>> I meant, was there a lock before the PHC functionality in igb?
>
> There was no lock, and yes, it was a bug.

Ok, so this needs to be fixed irrespective of the PHC code.

>
>> >> How about rate limiting at the PHC class driver level? And then it
>> >> will work across the board for all the adapters at the device level.
>> >
>> > No, don't go there. Enough bikeshedding already. If you have a serious

Did I ever tell you that your patch is costing us 'N' clock-cycles in
the fast path? We all understand that to gain some features we may
have to sacrifice something. Folks who want time-stamping might have
to take a small performance hit (may be to work around hardware issues
and so on).

You are confusing 'blocking the driver's fast path' with
performance/optimization etc. We cannot let user-space code jam the
system. Kernel code should be designed such that
bugs(intentional/unintentional) in user-space code cannot cause system
wide adverse affects. Period.

Does your existing design limit (ab)users from pounding the ioctls?

As I mentioned earlier, it could be possible to take care of gettime
and the driver's Rx/Tx path by using a mixture of locks/kernel-thread.
But settime/adjtime still needs to be curbed.

Why isn't ioctl-rate limiting acceptable? Let's say an app that is
trying to set/adj NIC counter is running on host side then how often
would it need to read and correct/set/adj? once every msec(so 1000
times a second), once every 10 msec(100 times a second) etc?

So, will pounding the ioctl 1000 times, while processing ~820K
frames(1500 byte payload on 10G link) still cause a problem for the
driver is what we would need to see. rate can also be a factor of
link-speed(?).

And we don't need 100 such apps. Only 1 app should be working in
tandem with the NIC. If other apps fail then atleast the sysadmin or
users would know someone else is (ab)using it.

> Thanks,
> Richard

Chetan

^ permalink raw reply	[flat|nested] 38+ messages in thread

* RE: [PATCH net V4 2/2] igb: offer a PTP Hardware Clock instead of the timecompare method
  2012-03-26 17:11                           ` chetan loke
@ 2012-03-26 18:56                             ` Keller, Jacob E
  2012-03-26 19:32                               ` chetan loke
  2012-03-27 15:33                             ` Richard Cochran
  1 sibling, 1 reply; 38+ messages in thread
From: Keller, Jacob E @ 2012-03-26 18:56 UTC (permalink / raw)
  To: chetan loke, Richard Cochran
  Cc: netdev, e1000-devel, Kirsher, Jeffrey T, Ronciak, John,
	john.stultz, tglx

> -----Original Message-----
> From: chetan loke [mailto:loke.chetan@gmail.com]
> Sent: Monday, March 26, 2012 10:12 AM
> To: Richard Cochran
> Cc: Keller, Jacob E; netdev@vger.kernel.org; e1000-
> devel@lists.sourceforge.net; Kirsher, Jeffrey T; Ronciak, John;
> john.stultz@linaro.org; tglx@linutronix.de
> Subject: Re: [PATCH net V4 2/2] igb: offer a PTP Hardware Clock instead of the
> timecompare method
> 
> You are confusing 'blocking the driver's fast path' with
> performance/optimization etc. We cannot let user-space code jam the system.
> Kernel code should be designed such that
> bugs(intentional/unintentional) in user-space code cannot cause system wide
> adverse affects. Period.
>
I agree. But I also think a test case should be in order here. What actually
happens with a badly designed user app cramming ioctls. (Note that under most
system designs the ioctls require root access unless you change the default
permissions.)

> Does your existing design limit (ab)users from pounding the ioctls?
> 
> As I mentioned earlier, it could be possible to take care of gettime and the
> driver's Rx/Tx path by using a mixture of locks/kernel-thread.
> But settime/adjtime still needs to be curbed.
> 
I am curious how you see a kernel thread resolving the Tx/Rx issue? or is the
kernel thread being used by gettime? I don't believe we can wait for the Tx/Rx
path, because if we take too long the software sees it as a dropped timestamp.
Also we have a pointer to the skb which is free'd right after the call for
(rx/tx)hwtstamp so we can't just copy that pointer.

> Why isn't ioctl-rate limiting acceptable? Let's say an app that is trying to
> set/adj NIC counter is running on host side then how often would it need to
> read and correct/set/adj? once every msec(so 1000 times a second), once every
> 10 msec(100 times a second) etc?
> 
> So, will pounding the ioctl 1000 times, while processing ~820K
> frames(1500 byte payload on 10G link) still cause a problem for the driver is
> what we would need to see. rate can also be a factor of link-speed(?).
> 
> And we don't need 100 such apps. Only 1 app should be working in tandem with
> the NIC. If other apps fail then atleast the sysadmin or users would know
> someone else is (ab)using it.

How would we only allow one app? Any app with permissions could call the ioctls.
I do agree that having too many ioctls is a problem. Even in cases where PTP is
only enabled on a "trusted" system. Allowing the kernel to assume the system
is trusted means we introduce security holes. I am not sure of the right solution
or whether we need a different one, though. I am going to try and provide a test case
where someone hammers the ioctls in a loop, and see what happens.

> 
> 
> > Thanks,
> > Richard
> 
> Chetan

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH net V4 2/2] igb: offer a PTP Hardware Clock instead of the timecompare method
  2012-03-26 18:56                             ` Keller, Jacob E
@ 2012-03-26 19:32                               ` chetan loke
  2012-03-26 20:46                                 ` Keller, Jacob E
  0 siblings, 1 reply; 38+ messages in thread
From: chetan loke @ 2012-03-26 19:32 UTC (permalink / raw)
  To: Keller, Jacob E; +Cc: e1000-devel, netdev, Ronciak, John, john.stultz, tglx

On Mon, Mar 26, 2012 at 2:56 PM, Keller, Jacob E
<jacob.e.keller@intel.com> wrote:

> I am curious how you see a kernel thread resolving the Tx/Rx issue? or is the
> kernel thread being used by gettime? I don't believe we can wait for the Tx/Rx

tx, rx path and kernel thread will use seq_lock_isave(tmreg_lock).

A new 'u64 cached_ns' will be introduced.

tx and rx path can update 'cached_ns' when they read the NIC counter.

A kernel-thread should be scheduled periodically, will read the NIC
counter and update cached_ns. It will use a _trylock variant. If the
lock fails then its a hint that 'cached_ns' is getting updated
somewhere, so just refresh the timer. Periodic update is needed to
handle idle/bursty link conditions because the Rx/tx path may not run
that often.

gettime uses read_seq_lock and reads 'cached_ns'. I mean we could also
do a atomic_read(?).

> path, because if we take too long the software sees it as a dropped timestamp.

What code-path is this?

> How would we only allow one app? Any app with permissions could call the ioctls.
> I do agree that having too many ioctls is a problem. Even in cases where PTP is

We don't. So this is what I'm thinking. Ideally speaking only 1 app
should be adjusting the host-clock and 1 app per NIC if you adjust
NIC's clock. Ignore the host-clock app. Once you enforce the
rate-limit, driver will return -EBUSY if you exceed it. Too many
EBUSYs will provide a hint on user-side.

Chetan

------------------------------------------------------------------------------
This SF email is sponsosred by:
Try Windows Azure free for 90 days Click Here 
http://p.sf.net/sfu/sfd2d-msazure
_______________________________________________
E1000-devel mailing list
E1000-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/e1000-devel
To learn more about Intel&#174; Ethernet, visit http://communities.intel.com/community/wired

^ permalink raw reply	[flat|nested] 38+ messages in thread

* RE: [PATCH net V4 2/2] igb: offer a PTP Hardware Clock instead of the timecompare method
  2012-03-26 19:32                               ` chetan loke
@ 2012-03-26 20:46                                 ` Keller, Jacob E
  2012-03-27 18:05                                   ` Richard Cochran
  0 siblings, 1 reply; 38+ messages in thread
From: Keller, Jacob E @ 2012-03-26 20:46 UTC (permalink / raw)
  To: chetan loke
  Cc: Richard Cochran, netdev, e1000-devel, Kirsher, Jeffrey T,
	Ronciak, John, john.stultz, tglx

> -----Original Message-----
> From: chetan loke [mailto:loke.chetan@gmail.com]
> Sent: Monday, March 26, 2012 12:32 PM
> To: Keller, Jacob E
> Cc: Richard Cochran; netdev@vger.kernel.org; e1000-
> devel@lists.sourceforge.net; Kirsher, Jeffrey T; Ronciak, John;
> john.stultz@linaro.org; tglx@linutronix.de
> Subject: Re: [PATCH net V4 2/2] igb: offer a PTP Hardware Clock instead of the
> timecompare method
> 
> On Mon, Mar 26, 2012 at 2:56 PM, Keller, Jacob E <jacob.e.keller@intel.com>
> wrote:
> 
> > I am curious how you see a kernel thread resolving the Tx/Rx issue? or
> > is the kernel thread being used by gettime? I don't believe we can
> > wait for the Tx/Rx
> 
> tx, rx path and kernel thread will use seq_lock_isave(tmreg_lock).
> 
> A new 'u64 cached_ns' will be introduced.
> 
> tx and rx path can update 'cached_ns' when they read the NIC counter.
>
Tx and rx paths don't read the SYSTIM registers. When hardware (other
than the 82580 which could use a different software implementation)
receives a PTP packet, it is time stamped *by the hardware* and the
result is stored in the RXSTAMP registers (analog for rx/tx). The
hardware will not timestamp again until after the register is read.
(ignore 82580, it is different). Then the driver will read this register.

The clincher for why we have so much issues here: the SYSTIM registers
are *not* 64bit nanosecond values. They are 64 bits wide (82580 is 40
bits + 32 bits of sub precision or 72 bits wide) but with a fixed point
system. This system allows for defining how much room for adjustment
you want but sacrifices the total amount that can be stored as nanoseconds.

The timecounter structure keeps track of a 64bit field of nanoseconds,
and is based on the cyclecounter. (By knowing how fast the cyclecounter
should be running at 100% optimum circumstances.)

In the tx/rx routine, we convert the cycle timestamps which has an
overflow rate anywhere from once a minute to a few hours. This
conversion uses last updated value of the timecounter, but does not
modify/update the counter. However, it has to lock the structure in
case an update occurs during the cyc2time function call, otherwise you
can corrupt the values. There is no problem using seqlock here. (as
long as we retry). I would take the write lock here, so that it doesn't
have to retry, although you could take the read version. Retrying is
necessary because otherwise there is a window where wildly inaccurate
nanosecond values could be obtained.

The issue with gettime is in the SYSTIM registers. Gettime could also
use cyc2time, and allow the background worktask to be the only one updating
the timecounter's last read value. (this only has to happen at least twice
every wraparound value, so for the 10gig 82599 about once every 30 seconds).
But the lock is also protecting against corruption of the SYSTIM register
reads. (The lock does 2 different things. We could add a separate lock,
but I am not convinced it is worth it).

We can't use timecounter_read inside a read portion of the seqlock because
it has side effects. Reading SYSTIM registers has side effects also
because we have to read them sequentially (lo, then hi, cannot intersperse
two sets of lo and hi reads).

I wrote a patch which shows how you could use just seqlocks, but it has
the issue mentioned above with the SYSTIM registers. These only get used
in the gettime code and the watchdog task that updates the timecounter.
We could implement a basic spinlock around that also, and that would mean
the seqlock would only be around the timecounter structure. This does add
the complication of two locks. :(

I feel like this approach is best, considering that each lock does precisely
one thing, and rarely need to be acquired at the same time. (the only place
being inside timecounter_read, and possibly in cyclecounter_update for the ixgbe)

> A kernel-thread should be scheduled periodically, will read the NIC counter
> and update cached_ns. It will use a _trylock variant. If the lock fails then
> its a hint that 'cached_ns' is getting updated somewhere, so just refresh the
> timer. Periodic update is needed to handle idle/bursty link conditions because
> the Rx/tx path may not run that often.
> 
> gettime uses read_seq_lock and reads 'cached_ns'. I mean we could also do a
> atomic_read(?)

As above cached_ns would not work. Timecounter structure isn't ticking on its
own. It converts the SYSTIM values to nanoseconds and keeps track of overflows.
(thereby allowing it to create nanoseconds since the epoch as a 64bit value,
which is what the user software expects). Atomic reads don't work because the
atomic values only have around 24bits of actual data. (we would need at least 64bits)

>
> 
> > path, because if we take too long the software sees it as a dropped
> timestamp.
> 
> What code-path is this?
>
For transmit only, inside the user software, it sends a packet then waits
for a copy of that packet on the receive errqueue which is copied by the kernel
and put there with the TX timestamp. If the driver takes too long to report the
timestamp then the user software times out and reports a time stamp as missing
even though it would later be returned by the driver. It is not an issue with
the RX timestamps.

> 
> > How would we only allow one app? Any app with permissions could call the
> ioctls.
> > I do agree that having too many ioctls is a problem. Even in cases
> > where PTP is
> 
> We don't. So this is what I'm thinking. Ideally speaking only 1 app should be
> adjusting the host-clock and 1 app per NIC if you adjust NIC's clock. Ignore
> the host-clock app. Once you enforce the rate-limit, driver will return -EBUSY
> if you exceed it. Too many EBUSYs will provide a hint on user-side.
>

This might be possible, but how would you code the solution? Software wouldn't
have any idea how often things are being run. Maybe have a timeout value that you
check for determining whether the last call came too soon?

> 
> Chetan

^ permalink raw reply	[flat|nested] 38+ messages in thread

* RE: [PATCH net V4 2/2] igb: offer a PTP Hardware Clock instead of the timecompare method
  2012-03-24  6:51                     ` Richard Cochran
  2012-03-26 15:07                       ` chetan loke
@ 2012-03-26 20:51                       ` Keller, Jacob E
  2012-03-27 18:33                         ` Richard Cochran
  1 sibling, 1 reply; 38+ messages in thread
From: Keller, Jacob E @ 2012-03-26 20:51 UTC (permalink / raw)
  To: Richard Cochran, chetan loke
  Cc: netdev, e1000-devel, Kirsher, Jeffrey T, Ronciak, John,
	john.stultz, tglx

> -----Original Message-----
> From: Richard Cochran [mailto:richardcochran@gmail.com]
> Sent: Friday, March 23, 2012 11:52 PM
> To: chetan loke
> Cc: Keller, Jacob E; netdev@vger.kernel.org; e1000-
> devel@lists.sourceforge.net; Kirsher, Jeffrey T; Ronciak, John;
> john.stultz@linaro.org; tglx@linutronix.de
> Subject: Re: [PATCH net V4 2/2] igb: offer a PTP Hardware Clock instead of the
> timecompare method
> 
> > How about rate limiting at the PHC class driver level? And then it
> > will work across the board for all the adapters at the device level.
> 
> No, don't go there. Enough bikeshedding already. If you have a serious
> performance issue, please post a test case, and we will look for a solution.
> 

This is about a case where the user does something stupid. (runs the ioctls too
fast). It appears that your answer is something like "The user did something
stupid, we shouldn't care". My answer is, "we should do what we can to prevent
this." I don't think modifying the ptp framework is necessary or even a good
idea. (Because it is a particular issue with intel parts design, not with general
PTP design.) I do think if possible we should modify igb/ixgbe patches properly
to prevent the breaking. I am currently working on a test case where the user
*does* hammer the ioctls and seeing whether that actually crashes the driver.

> Thanks,
> Richard
> 

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH net V4 2/2] igb: offer a PTP Hardware Clock instead of the timecompare method
  2012-03-26 17:11                           ` chetan loke
  2012-03-26 18:56                             ` Keller, Jacob E
@ 2012-03-27 15:33                             ` Richard Cochran
  2012-03-27 18:39                               ` chetan loke
  1 sibling, 1 reply; 38+ messages in thread
From: Richard Cochran @ 2012-03-27 15:33 UTC (permalink / raw)
  To: chetan loke
  Cc: Keller, Jacob E, netdev, e1000-devel, Kirsher, Jeffrey T,
	Ronciak, John, john.stultz, tglx

On Mon, Mar 26, 2012 at 01:11:32PM -0400, chetan loke wrote:
> 
> Why isn't ioctl-rate limiting acceptable?

Let me turn the question around.

What other kernel subsystem rate limits ioctls?

Richard

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH net V4 2/2] igb: offer a PTP Hardware Clock instead of the timecompare method
  2012-03-26 20:46                                 ` Keller, Jacob E
@ 2012-03-27 18:05                                   ` Richard Cochran
  2012-03-27 18:29                                     ` Keller, Jacob E
  0 siblings, 1 reply; 38+ messages in thread
From: Richard Cochran @ 2012-03-27 18:05 UTC (permalink / raw)
  To: Keller, Jacob E
  Cc: chetan loke, netdev, e1000-devel, Kirsher, Jeffrey T, Ronciak,
	John, john.stultz, tglx

Jacob,

Thanks for your detailed explanation...

On Mon, Mar 26, 2012 at 08:46:13PM +0000, Keller, Jacob E wrote:
>
> The issue with gettime is in the SYSTIM registers. Gettime could also
> use cyc2time, and allow the background worktask to be the only one updating
> the timecounter's last read value. (this only has to happen at least twice
> every wraparound value, so for the 10gig 82599 about once every 30 seconds).
> But the lock is also protecting against corruption of the SYSTIM register
> reads. (The lock does 2 different things. We could add a separate lock,
> but I am not convinced it is worth it).

Right, the spinlock in my patch protects both the SYSTIM registers and
the struct timecounter. For the SYSTIM pair, you must use a
spinlock. There is no way around it.

The struct timecounter fields .cycle_last and .nsec are used as follows

   WRITE:   timercoutner_read()
            timercoutner_init()

   READ:    timercoutner_cyc2time()

Here you could conceivably use a reader/writer semaphore or a
seqlock. That would in turn enlarge the data structure and would
require nesting the locks.

I really doubt you will see any performance gain from such a
change. It increases code complexity and size for some dubious,
theoretical performance gain, for some really whacked use case.

Richard

^ permalink raw reply	[flat|nested] 38+ messages in thread

* RE: [PATCH net V4 2/2] igb: offer a PTP Hardware Clock instead of the timecompare method
  2012-03-27 18:05                                   ` Richard Cochran
@ 2012-03-27 18:29                                     ` Keller, Jacob E
  2012-03-27 18:43                                       ` chetan loke
  0 siblings, 1 reply; 38+ messages in thread
From: Keller, Jacob E @ 2012-03-27 18:29 UTC (permalink / raw)
  To: Richard Cochran
  Cc: chetan loke, netdev, e1000-devel, Kirsher, Jeffrey T, Ronciak,
	John, john.stultz, tglx

> -----Original Message-----
> From: Richard Cochran [mailto:richardcochran@gmail.com]
> Sent: Tuesday, March 27, 2012 11:05 AM
> To: Keller, Jacob E
> Cc: chetan loke; netdev@vger.kernel.org; e1000-devel@lists.sourceforge.net;
> Kirsher, Jeffrey T; Ronciak, John; john.stultz@linaro.org; tglx@linutronix.de
> Subject: Re: [PATCH net V4 2/2] igb: offer a PTP Hardware Clock instead of the
> timecompare method
> 
> I really doubt you will see any performance gain from such a change. It
> increases code complexity and size for some dubious, theoretical performance
> gain, for some really whacked use case.
> 
> Richard
> 
> 

It isn't so much about performance gains as it is about preventing poorly written user apps from stalling the clean descriptor routines. I am working on a test case that should prove whether this is even an issue (at least with ixgbe). Once I have data on that it can be determined if the extra lock would alleviate it. The conceptual issue is that spamming get-time could cause the clean tx/rx irq routines to stall inside the interrupt for too long. (thereby not freeing up descriptors on the ring to allow more packets). While performance is an issue, I don't feel that it can be brushed off as "performance loss due to feature" unless I can show that it isn't very easy to trip, or doesn't cause a major issue. Once I know the data, it can be determined what the right approach is.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH net V4 2/2] igb: offer a PTP Hardware Clock instead of the timecompare method
  2012-03-26 20:51                       ` Keller, Jacob E
@ 2012-03-27 18:33                         ` Richard Cochran
  0 siblings, 0 replies; 38+ messages in thread
From: Richard Cochran @ 2012-03-27 18:33 UTC (permalink / raw)
  To: Keller, Jacob E
  Cc: chetan loke, netdev, e1000-devel, Kirsher, Jeffrey T, Ronciak,
	John, john.stultz, tglx

On Mon, Mar 26, 2012 at 08:51:28PM +0000, Keller, Jacob E wrote:
> 
> This is about a case where the user does something stupid. (runs the ioctls too
> fast). It appears that your answer is something like "The user did something
> stupid, we shouldn't care". My answer is, "we should do what we can to prevent
> this."

No matter how much you optimize your driver (or library or whatever),
there is always some user load that will kill your performance.

The locking is need for data integrity. The code path in the lock is
short. I doubt there is any serious performance issue here.

Richard

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH net V4 2/2] igb: offer a PTP Hardware Clock instead of the timecompare method
  2012-03-27 15:33                             ` Richard Cochran
@ 2012-03-27 18:39                               ` chetan loke
  0 siblings, 0 replies; 38+ messages in thread
From: chetan loke @ 2012-03-27 18:39 UTC (permalink / raw)
  To: Richard Cochran
  Cc: e1000-devel, netdev, Ronciak, John, john.stultz, Keller, Jacob E, tglx

On Tue, Mar 27, 2012 at 11:33 AM, Richard Cochran
<richardcochran@gmail.com> wrote:
> On Mon, Mar 26, 2012 at 01:11:32PM -0400, chetan loke wrote:
>>
>> Why isn't ioctl-rate limiting acceptable?
>
> Let me turn the question around.
>
> What other kernel subsystem rate limits ioctls?
>

netdev needs to worry about reaching the host via ethernet. If you
lose SCSI LUNs because you are pounding those mailbox cmds to get that
temperature status from the HBA then who the heck cares? I mean we do
but we can reach the host via it's mgmt-IP and look at it. From what I
can see in SCSI for example, control commands are normally issued in
separate paths by the driver and don't choke the fast path. Those
completions are asynchronous like everything else and do NOT contend
with the Rx/Tx path.

May be I'm paranoid but let's wait for Intel(Jake) to run the ioctl test.

Chetan

------------------------------------------------------------------------------
This SF email is sponsosred by:
Try Windows Azure free for 90 days Click Here 
http://p.sf.net/sfu/sfd2d-msazure
_______________________________________________
E1000-devel mailing list
E1000-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/e1000-devel
To learn more about Intel&#174; Ethernet, visit http://communities.intel.com/community/wired

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH net V4 2/2] igb: offer a PTP Hardware Clock instead of the timecompare method
  2012-03-27 18:29                                     ` Keller, Jacob E
@ 2012-03-27 18:43                                       ` chetan loke
  2012-03-27 18:51                                         ` chetan loke
  0 siblings, 1 reply; 38+ messages in thread
From: chetan loke @ 2012-03-27 18:43 UTC (permalink / raw)
  To: Keller, Jacob E; +Cc: e1000-devel, netdev, Ronciak, John, john.stultz, tglx

On Tue, Mar 27, 2012 at 2:29 PM, Keller, Jacob E
<jacob.e.keller@intel.com> wrote:

>>
>
> It isn't so much about performance gains as it is about preventing poorly written user apps from stalling the clean descriptor routines. I am working on a test case that should prove whether this is even an issue (at least with ixgbe). Once I have data on that it can be determined if the extra lock would alleviate it. The conceptual issue is that spamming get-time could cause the clean tx/rx irq routines to stall inside the interrupt for too long. (thereby not freeing up descriptors on the ring to

What would be interesting to see is something like:

1) App1(getter/setter calls), executed '1000' ioctls per sec on eth0.
Other 'M' apps(execute just get calls), where M = 1 .. 10(?). And one
buggy app executed - 10K getter calls per sec.

2) eth0 receives PTP traffic and also around 80-90%(link rate)
regular/mixed traffic.

And BTW, eth0 is the management IP.

While the test is running, if we are still able to reach the host and
manage it via SSH then we can stop being paranoid.


Chetan

------------------------------------------------------------------------------
This SF email is sponsosred by:
Try Windows Azure free for 90 days Click Here 
http://p.sf.net/sfu/sfd2d-msazure
_______________________________________________
E1000-devel mailing list
E1000-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/e1000-devel
To learn more about Intel&#174; Ethernet, visit http://communities.intel.com/community/wired

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH net V4 2/2] igb: offer a PTP Hardware Clock instead of the timecompare method
  2012-03-27 18:43                                       ` chetan loke
@ 2012-03-27 18:51                                         ` chetan loke
  2012-03-27 20:58                                           ` Keller, Jacob E
  0 siblings, 1 reply; 38+ messages in thread
From: chetan loke @ 2012-03-27 18:51 UTC (permalink / raw)
  To: Keller, Jacob E; +Cc: e1000-devel, netdev, Ronciak, John, john.stultz, tglx

On Tue, Mar 27, 2012 at 2:43 PM, chetan loke <loke.chetan@gmail.com> wrote:
> On Tue, Mar 27, 2012 at 2:29 PM, Keller, Jacob E
> <jacob.e.keller@intel.com> wrote:
>
>>>
>>
>> It isn't so much about performance gains as it is about preventing poorly written user apps from stalling the clean descriptor routines. I am working on a test case that should prove whether this is even an issue (at least with ixgbe). Once I have data on that it can be determined if the extra lock would alleviate it. The conceptual issue is that spamming get-time could cause the clean tx/rx irq routines to stall inside the interrupt for too long. (thereby not freeing up descriptors on the ring to
>
> What would be interesting to see is something like:
>
> 1) App1(getter/setter calls), executed '1000' ioctls per sec on eth0.
> Other 'M' apps(execute just get calls), where M = 1 .. 10(?). And one
> buggy app executed - 10K getter calls per sec.
>
> 2) eth0 receives PTP traffic and also around 80-90%(link rate)
> regular/mixed traffic.
>
> And BTW, eth0 is the management IP.
>
> While the test is running, if we are still able to reach the host and
> manage it via SSH then we can stop being paranoid.
>
>

Guys please correct me if I'm wrong but if RPS/RFS is enabled then at
a minimum we would need "N * 2" apps (where N == num_cpus). That way
there will be contention w/ the completions and we will be able to see
the real effect?

Chetan

------------------------------------------------------------------------------
This SF email is sponsosred by:
Try Windows Azure free for 90 days Click Here 
http://p.sf.net/sfu/sfd2d-msazure
_______________________________________________
E1000-devel mailing list
E1000-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/e1000-devel
To learn more about Intel&#174; Ethernet, visit http://communities.intel.com/community/wired

^ permalink raw reply	[flat|nested] 38+ messages in thread

* RE: [PATCH net V4 2/2] igb: offer a PTP Hardware Clock instead of the timecompare method
  2012-03-27 18:51                                         ` chetan loke
@ 2012-03-27 20:58                                           ` Keller, Jacob E
  2012-03-27 21:55                                             ` chetan loke
  0 siblings, 1 reply; 38+ messages in thread
From: Keller, Jacob E @ 2012-03-27 20:58 UTC (permalink / raw)
  To: chetan loke
  Cc: Richard Cochran, netdev, e1000-devel, Kirsher, Jeffrey T,
	Ronciak, John, john.stultz, tglx

> -----Original Message-----
> From: chetan loke [mailto:loke.chetan@gmail.com]
> Sent: Tuesday, March 27, 2012 11:52 AM
> To: Keller, Jacob E
> Cc: Richard Cochran; netdev@vger.kernel.org; e1000-
> devel@lists.sourceforge.net; Kirsher, Jeffrey T; Ronciak, John;
> john.stultz@linaro.org; tglx@linutronix.de
> Subject: Re: [PATCH net V4 2/2] igb: offer a PTP Hardware Clock instead of the
> timecompare method
> 
> On Tue, Mar 27, 2012 at 2:43 PM, chetan loke <loke.chetan@gmail.com> wrote:
> > On Tue, Mar 27, 2012 at 2:29 PM, Keller, Jacob E
> > <jacob.e.keller@intel.com> wrote:
> >
> >>>
> >>
> >> It isn't so much about performance gains as it is about preventing
> >> poorly written user apps from stalling the clean descriptor routines.
> >> I am working on a test case that should prove whether this is even an
> >> issue (at least with ixgbe). Once I have data on that it can be
> >> determined if the extra lock would alleviate it. The conceptual issue
> >> is that spamming get-time could cause the clean tx/rx irq routines to
> >> stall inside the interrupt for too long. (thereby not freeing up
> >> descriptors on the ring to
> >
> > What would be interesting to see is something like:
> >
> > 1) App1(getter/setter calls), executed '1000' ioctls per sec on eth0.
> > Other 'M' apps(execute just get calls), where M = 1 .. 10(?). And one
> > buggy app executed - 10K getter calls per sec.
> >
> > 2) eth0 receives PTP traffic and also around 80-90%(link rate)
> > regular/mixed traffic.
> >
> > And BTW, eth0 is the management IP.
> >
> > While the test is running, if we are still able to reach the host and
> > manage it via SSH then we can stop being paranoid.
> >
> >
> 
> Guys please correct me if I'm wrong but if RPS/RFS is enabled then at a
> minimum we would need "N * 2" apps (where N == num_cpus). That way there will
> be contention w/ the completions and we will be able to see the real effect?
> 
> Chetan

I think we could see contention regardless because the spinlock doesn't guarantee the ordering of who gets it next.

I am not sure. But I will try and set something like this up. However, I do think that many get-set calls is pretty high for even a 'highly' loaded system. Though the buggy app for sure is possible.

Here is what I am thinking as a test case. Linuxptp running normally with a higher sync rate than once per second, plus a 'buggy' app which will try to infinitely thread the gettime calls. I hope to have something like this working soon.

- Jake

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH net V4 2/2] igb: offer a PTP Hardware Clock instead of the timecompare method
  2012-03-27 20:58                                           ` Keller, Jacob E
@ 2012-03-27 21:55                                             ` chetan loke
  2012-03-29 23:08                                               ` Keller, Jacob E
  0 siblings, 1 reply; 38+ messages in thread
From: chetan loke @ 2012-03-27 21:55 UTC (permalink / raw)
  To: Keller, Jacob E
  Cc: Richard Cochran, netdev, e1000-devel, Kirsher, Jeffrey T,
	Ronciak, John, john.stultz, tglx

On Tue, Mar 27, 2012 at 4:58 PM, Keller, Jacob E
<jacob.e.keller@intel.com> wrote:

>
> I think we could see contention regardless because the spinlock doesn't guarantee the ordering of who gets it next.
>
> I am not sure. But I will try and set something like this up. However, I do think that many get-set calls is pretty high for even a 'highly' loaded system. Though the buggy app for sure is possible.
>
> Here is what I am thinking as a test case. Linuxptp running normally with a higher sync rate than once per second, plus a 'buggy' app which will try to infinitely thread the gettime calls. I hope to have something like this working soon.
>

Agreed, we don't know who would grab the lock next. But with just one
app/process we may not be able to induce the contention because
process scheduling would come into play. A single process would only
get that much time slice. With multiple processes, you will be able to
schedule them on multiple CPUs and hence contend with the driver's
completion path because that is what a real-exploit would do.

make sure numactl is installed on your system. Within a shell script,
launch multiple instances of the process as follows:

#!/bin/bash

num_cpus=`cat /proc/cpuinfo |grep -i processor |wc -l`

for ((i=0; i<$num_cpus; i++))
do
  echo "Launching instance:$(($i+1))"
  numa_cmd="numactl --physcpubind=$i /path/to/buggy-app &"
  echo "executing numa-cmd:$numa_cmd"
  eval $numa_cmd
done


> - Jake

Chetan

^ permalink raw reply	[flat|nested] 38+ messages in thread

* RE: [PATCH net V4 2/2] igb: offer a PTP Hardware Clock instead of the timecompare method
  2012-03-27 21:55                                             ` chetan loke
@ 2012-03-29 23:08                                               ` Keller, Jacob E
  2012-03-30 15:17                                                 ` chetan loke
  0 siblings, 1 reply; 38+ messages in thread
From: Keller, Jacob E @ 2012-03-29 23:08 UTC (permalink / raw)
  To: chetan loke
  Cc: Richard Cochran, netdev, e1000-devel, Kirsher, Jeffrey T,
	Ronciak, John, john.stultz, tglx

> -----Original Message-----
> From: chetan loke [mailto:loke.chetan@gmail.com]
> Sent: Tuesday, March 27, 2012 2:55 PM
> To: Keller, Jacob E
> Cc: Richard Cochran; netdev@vger.kernel.org; e1000-
> devel@lists.sourceforge.net; Kirsher, Jeffrey T; Ronciak, John;
> john.stultz@linaro.org; tglx@linutronix.de
> Subject: Re: [PATCH net V4 2/2] igb: offer a PTP Hardware Clock instead of the
> timecompare method
> 
> On Tue, Mar 27, 2012 at 4:58 PM, Keller, Jacob E <jacob.e.keller@intel.com>
> wrote:
> 
> >
> > I think we could see contention regardless because the spinlock doesn't
> guarantee the ordering of who gets it next.
> >
> > I am not sure. But I will try and set something like this up. However, I do
> think that many get-set calls is pretty high for even a 'highly' loaded
> system. Though the buggy app for sure is possible.
> >
> > Here is what I am thinking as a test case. Linuxptp running normally with a
> higher sync rate than once per second, plus a 'buggy' app which will try to
> infinitely thread the gettime calls. I hope to have something like this
> working soon.
> >
> 
> Agreed, we don't know who would grab the lock next. But with just one
> app/process we may not be able to induce the contention because process
> scheduling would come into play. A single process would only get that much
> time slice. With multiple processes, you will be able to schedule them on
> multiple CPUs and hence contend with the driver's completion path because that
> is what a real-exploit would do.
> 
> make sure numactl is installed on your system. Within a shell script, launch
> multiple instances of the process as follows:
> 
> #!/bin/bash
> 
> num_cpus=`cat /proc/cpuinfo |grep -i processor |wc -l`
> 
> for ((i=0; i<$num_cpus; i++))
> do
>   echo "Launching instance:$(($i+1))"
>   numa_cmd="numactl --physcpubind=$i /path/to/buggy-app &"
>   echo "executing numa-cmd:$numa_cmd"
>   eval $numa_cmd
> done
> 
> 
> > - Jake
> 
> Chetan

I performed this test on my machine. Even with a buggy app that calls clock_gettime
inside a while loop, this does not produce any contention whatsoever with
time stamping. I checked the timestamps returned also, and it appears to be running
more than 20k times a second.

Based on the following factors I do not believe this is an issue:

1) A user requires root to use the ioctl. (Or root user has to grant the user access, which isn't necessary for normal ptp functionality).

2) A root user can already trash the system, therefore we cannot consider this an
   exploit as it isn't a method to gain root access or halt a system under a normal
   user.

3) A program operating at such a high ioctl rate is flawed design which is not
   expected. PTP users should already understand the ioctl has more latency than a
   PPS setup.

4) Even running 48 processes (2 per CPU) with a while loop repeatedly calling
   clock_gettime, zero contention was observed. The times returned have about a
   50us delay between them. This translates to somewhere in the ballpark of 20k
   calls per second. This has no effect on network traffic (I can still ssh/ping
   over that interface) and has no impact on dropping timestamps of packets. Using
   large amounts of traffic is not valid because that is already a known constraint
   on ptp. (Too much traffic, especially ptp traffic) results in dropped timestamps
   because of the way the hardware stores timestamps for packets in the registers)

If you are not satisfied with this, please provide evidence of a failed test case.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH net V4 2/2] igb: offer a PTP Hardware Clock instead of the timecompare method
  2012-03-29 23:08                                               ` Keller, Jacob E
@ 2012-03-30 15:17                                                 ` chetan loke
  0 siblings, 0 replies; 38+ messages in thread
From: chetan loke @ 2012-03-30 15:17 UTC (permalink / raw)
  To: Keller, Jacob E
  Cc: Richard Cochran, netdev, e1000-devel, Kirsher, Jeffrey T,
	Ronciak, John, john.stultz, tglx

On Thu, Mar 29, 2012 at 7:08 PM, Keller, Jacob E
<jacob.e.keller@intel.com> wrote:

>
> I performed this test on my machine. Even with a buggy app that calls clock_gettime
> inside a while loop, this does not produce any contention whatsoever with
> time stamping. I checked the timestamps returned also, and it appears to be running
> more than 20k times a second.
>

Great. Thanks for running the tests. I'm all set.

Intel/Richard - thanks for being patient.

Chetan

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH net V4 2/2] igb: offer a PTP Hardware Clock instead of the timecompare method
  2012-01-21 16:03 [PATCH net V4 0/2] igb: ptp hardware clock Richard Cochran
@ 2012-01-21 16:03 ` Richard Cochran
  0 siblings, 0 replies; 38+ messages in thread
From: Richard Cochran @ 2012-01-21 16:03 UTC (permalink / raw)
  To: netdev
  Cc: e1000-devel, John Ronciak, John Stultz, Jacob Keller, Thomas Gleixner

This commit removes the legacy timecompare code from the igb driver and
offers a tunable PHC instead.

Signed-off-by: Richard Cochran <richardcochran@gmail.com>
---
 drivers/net/ethernet/intel/igb/Makefile   |    2 +-
 drivers/net/ethernet/intel/igb/igb.h      |   12 ++-
 drivers/net/ethernet/intel/igb/igb_main.c |  167 +----------------------------
 drivers/net/ethernet/intel/igb/igb_ptp.c  |   59 ++++++++++
 4 files changed, 70 insertions(+), 170 deletions(-)

diff --git a/drivers/net/ethernet/intel/igb/Makefile b/drivers/net/ethernet/intel/igb/Makefile
index c6e4621..42f0868 100644
--- a/drivers/net/ethernet/intel/igb/Makefile
+++ b/drivers/net/ethernet/intel/igb/Makefile
@@ -32,6 +32,6 @@
 
 obj-$(CONFIG_IGB) += igb.o
 
-igb-objs := igb_main.o igb_ethtool.o e1000_82575.o \
+igb-objs := igb_main.o igb_ethtool.o igb_ptp.o e1000_82575.o \
 	    e1000_mac.o e1000_nvm.o e1000_phy.o e1000_mbx.o
 
diff --git a/drivers/net/ethernet/intel/igb/igb.h b/drivers/net/ethernet/intel/igb/igb.h
index 79f354b..cd08f73 100644
--- a/drivers/net/ethernet/intel/igb/igb.h
+++ b/drivers/net/ethernet/intel/igb/igb.h
@@ -35,7 +35,6 @@
 #include "e1000_82575.h"
 
 #include <linux/clocksource.h>
-#include <linux/timecompare.h>
 #include <linux/net_tstamp.h>
 #include <linux/ptp_clock_kernel.h>
 #include <linux/bitops.h>
@@ -329,9 +328,6 @@ struct igb_adapter {
 
 	/* OS defined structs */
 	struct pci_dev *pdev;
-	struct cyclecounter cycles;
-	struct timecounter clock;
-	struct timecompare compare;
 	struct hwtstamp_config hwtstamp_config;
 
 	spinlock_t stats64_lock;
@@ -386,7 +382,6 @@ struct igb_adapter {
 #define IGB_DMCTLX_DCFLUSH_DIS     0x80000000  /* Disable DMA Coal Flush */
 
 #define IGB_82576_TSYNC_SHIFT 19
-#define IGB_82580_TSYNC_SHIFT 24
 #define IGB_TS_HDR_LEN        16
 enum e1000_state_t {
 	__IGB_TESTING,
@@ -423,6 +418,13 @@ extern bool igb_has_link(struct igb_adapter *adapter);
 extern void igb_set_ethtool_ops(struct net_device *);
 extern void igb_power_up_link(struct igb_adapter *);
 
+extern void igb_ptp_init(struct igb_adapter *adapter);
+extern void igb_ptp_remove(struct igb_adapter *adapter);
+
+extern void igb_systim_to_hwtstamp(struct igb_adapter *adapter,
+				   struct skb_shared_hwtstamps *hwtstamps,
+				   u64 systim);
+
 static inline s32 igb_reset_phy(struct e1000_hw *hw)
 {
 	if (hw->phy.ops.reset)
diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index 01e5e89..4cae135 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -114,7 +114,6 @@ static void igb_free_all_rx_resources(struct igb_adapter *);
 static void igb_setup_mrqc(struct igb_adapter *);
 static int igb_probe(struct pci_dev *, const struct pci_device_id *);
 static void __devexit igb_remove(struct pci_dev *pdev);
-static void igb_init_hw_timer(struct igb_adapter *adapter);
 static int igb_sw_init(struct igb_adapter *);
 static int igb_open(struct net_device *);
 static int igb_close(struct net_device *);
@@ -558,33 +557,6 @@ exit:
 	return;
 }
 
-
-/**
- * igb_read_clock - read raw cycle counter (to be used by time counter)
- */
-static cycle_t igb_read_clock(const struct cyclecounter *tc)
-{
-	struct igb_adapter *adapter =
-		container_of(tc, struct igb_adapter, cycles);
-	struct e1000_hw *hw = &adapter->hw;
-	u64 stamp = 0;
-	int shift = 0;
-
-	/*
-	 * The timestamp latches on lowest register read. For the 82580
-	 * the lowest register is SYSTIMR instead of SYSTIML.  However we never
-	 * adjusted TIMINCA so SYSTIMR will just read as all 0s so ignore it.
-	 */
-	if (hw->mac.type >= e1000_82580) {
-		stamp = rd32(E1000_SYSTIMR) >> 8;
-		shift = IGB_82580_TSYNC_SHIFT;
-	}
-
-	stamp |= (u64)rd32(E1000_SYSTIML) << shift;
-	stamp |= (u64)rd32(E1000_SYSTIMH) << (shift + 32);
-	return stamp;
-}
-
 /**
  * igb_get_hw_dev - return device
  * used by hardware layer to print debugging information
@@ -2090,7 +2062,7 @@ static int __devinit igb_probe(struct pci_dev *pdev,
 
 #endif
 	/* do hw tstamp init after resetting */
-	igb_init_hw_timer(adapter);
+	igb_ptp_init(adapter);
 
 	dev_info(&pdev->dev, "Intel(R) Gigabit Ethernet Network Connection\n");
 	/* print bus type/speed/width info */
@@ -2164,6 +2136,8 @@ static void __devexit igb_remove(struct pci_dev *pdev)
 
 	pm_runtime_get_noresume(&pdev->dev);
 
+	igb_ptp_remove(adapter);
+
 	/*
 	 * The watchdog timer may be rescheduled, so explicitly
 	 * disable watchdog from being rescheduled.
@@ -2283,112 +2257,6 @@ out:
 }
 
 /**
- * igb_init_hw_timer - Initialize hardware timer used with IEEE 1588 timestamp
- * @adapter: board private structure to initialize
- *
- * igb_init_hw_timer initializes the function pointer and values for the hw
- * timer found in hardware.
- **/
-static void igb_init_hw_timer(struct igb_adapter *adapter)
-{
-	struct e1000_hw *hw = &adapter->hw;
-
-	switch (hw->mac.type) {
-	case e1000_i350:
-	case e1000_82580:
-		memset(&adapter->cycles, 0, sizeof(adapter->cycles));
-		adapter->cycles.read = igb_read_clock;
-		adapter->cycles.mask = CLOCKSOURCE_MASK(64);
-		adapter->cycles.mult = 1;
-		/*
-		 * The 82580 timesync updates the system timer every 8ns by 8ns
-		 * and the value cannot be shifted.  Instead we need to shift
-		 * the registers to generate a 64bit timer value.  As a result
-		 * SYSTIMR/L/H, TXSTMPL/H, RXSTMPL/H all have to be shifted by
-		 * 24 in order to generate a larger value for synchronization.
-		 */
-		adapter->cycles.shift = IGB_82580_TSYNC_SHIFT;
-		/* disable system timer temporarily by setting bit 31 */
-		wr32(E1000_TSAUXC, 0x80000000);
-		wrfl();
-
-		/* Set registers so that rollover occurs soon to test this. */
-		wr32(E1000_SYSTIMR, 0x00000000);
-		wr32(E1000_SYSTIML, 0x80000000);
-		wr32(E1000_SYSTIMH, 0x000000FF);
-		wrfl();
-
-		/* enable system timer by clearing bit 31 */
-		wr32(E1000_TSAUXC, 0x0);
-		wrfl();
-
-		timecounter_init(&adapter->clock,
-				 &adapter->cycles,
-				 ktime_to_ns(ktime_get_real()));
-		/*
-		 * Synchronize our NIC clock against system wall clock. NIC
-		 * time stamp reading requires ~3us per sample, each sample
-		 * was pretty stable even under load => only require 10
-		 * samples for each offset comparison.
-		 */
-		memset(&adapter->compare, 0, sizeof(adapter->compare));
-		adapter->compare.source = &adapter->clock;
-		adapter->compare.target = ktime_get_real;
-		adapter->compare.num_samples = 10;
-		timecompare_update(&adapter->compare, 0);
-		break;
-	case e1000_82576:
-		/*
-		 * Initialize hardware timer: we keep it running just in case
-		 * that some program needs it later on.
-		 */
-		memset(&adapter->cycles, 0, sizeof(adapter->cycles));
-		adapter->cycles.read = igb_read_clock;
-		adapter->cycles.mask = CLOCKSOURCE_MASK(64);
-		adapter->cycles.mult = 1;
-		/**
-		 * Scale the NIC clock cycle by a large factor so that
-		 * relatively small clock corrections can be added or
-		 * subtracted at each clock tick. The drawbacks of a large
-		 * factor are a) that the clock register overflows more quickly
-		 * (not such a big deal) and b) that the increment per tick has
-		 * to fit into 24 bits.  As a result we need to use a shift of
-		 * 19 so we can fit a value of 16 into the TIMINCA register.
-		 */
-		adapter->cycles.shift = IGB_82576_TSYNC_SHIFT;
-		wr32(E1000_TIMINCA,
-		                (1 << E1000_TIMINCA_16NS_SHIFT) |
-		                (16 << IGB_82576_TSYNC_SHIFT));
-
-		/* Set registers so that rollover occurs soon to test this. */
-		wr32(E1000_SYSTIML, 0x00000000);
-		wr32(E1000_SYSTIMH, 0xFF800000);
-		wrfl();
-
-		timecounter_init(&adapter->clock,
-				 &adapter->cycles,
-				 ktime_to_ns(ktime_get_real()));
-		/*
-		 * Synchronize our NIC clock against system wall clock. NIC
-		 * time stamp reading requires ~3us per sample, each sample
-		 * was pretty stable even under load => only require 10
-		 * samples for each offset comparison.
-		 */
-		memset(&adapter->compare, 0, sizeof(adapter->compare));
-		adapter->compare.source = &adapter->clock;
-		adapter->compare.target = ktime_get_real;
-		adapter->compare.num_samples = 10;
-		timecompare_update(&adapter->compare, 0);
-		break;
-	case e1000_82575:
-		/* 82575 does not support timesync */
-	default:
-		break;
-	}
-
-}
-
-/**
  * igb_sw_init - Initialize general software structures (struct igb_adapter)
  * @adapter: board private structure to initialize
  *
@@ -5678,35 +5546,6 @@ static int igb_poll(struct napi_struct *napi, int budget)
 }
 
 /**
- * igb_systim_to_hwtstamp - convert system time value to hw timestamp
- * @adapter: board private structure
- * @shhwtstamps: timestamp structure to update
- * @regval: unsigned 64bit system time value.
- *
- * We need to convert the system time value stored in the RX/TXSTMP registers
- * into a hwtstamp which can be used by the upper level timestamping functions
- */
-static void igb_systim_to_hwtstamp(struct igb_adapter *adapter,
-                                   struct skb_shared_hwtstamps *shhwtstamps,
-                                   u64 regval)
-{
-	u64 ns;
-
-	/*
-	 * The 82580 starts with 1ns at bit 0 in RX/TXSTMPL, shift this up to
-	 * 24 to match clock shift we setup earlier.
-	 */
-	if (adapter->hw.mac.type >= e1000_82580)
-		regval <<= IGB_82580_TSYNC_SHIFT;
-
-	ns = timecounter_cyc2time(&adapter->clock, regval);
-	timecompare_update(&adapter->compare, ns);
-	memset(shhwtstamps, 0, sizeof(struct skb_shared_hwtstamps));
-	shhwtstamps->hwtstamp = ns_to_ktime(ns);
-	shhwtstamps->syststamp = timecompare_transform(&adapter->compare, ns);
-}
-
-/**
  * igb_tx_hwtstamp - utility function which checks for TX time stamp
  * @q_vector: pointer to q_vector containing needed info
  * @buffer: pointer to igb_tx_buffer structure
diff --git a/drivers/net/ethernet/intel/igb/igb_ptp.c b/drivers/net/ethernet/intel/igb/igb_ptp.c
index a8dc3e5..061a897 100644
--- a/drivers/net/ethernet/intel/igb/igb_ptp.c
+++ b/drivers/net/ethernet/intel/igb/igb_ptp.c
@@ -26,6 +26,9 @@
 #define ISGN			0x80000000
 
 /*
+ * The 82580 timesync updates the system timer every 8ns by 8ns,
+ * and this update value cannot be reprogrammed.
+ *
  * Neither the 82576 nor the 82580 offer registers wide enough to hold
  * nanoseconds time values for very long. For the 82580, SYSTIM always
  * counts nanoseconds, but the upper 24 bits are not availible. The
@@ -37,6 +40,14 @@
  * field are needed to provide the nominal 16 nanosecond period,
  * leaving 19 bits for fractional nanoseconds.
  *
+ * We scale the NIC clock cycle by a large factor so that relatively
+ * small clock corrections can be added or subtracted at each clock
+ * tick. The drawbacks of a large factor are a) that the clock
+ * register overflows more quickly (not such a big deal) and b) that
+ * the increment per tick has to fit into 24 bits.  As a result we
+ * need to use a shift of 19 so we can fit a value of 16 into the
+ * TIMINCA register.
+ *
  *
  *             SYSTIMH            SYSTIML
  *        +--------------+   +---+---+------+
@@ -94,6 +105,11 @@ static cycle_t igb_82580_systim_read(const struct cyclecounter *cc)
 	struct igb_adapter *igb = container_of(cc, struct igb_adapter, cc);
 	struct e1000_hw *hw = &igb->hw;
 
+	/*
+	 * The timestamp latches on lowest register read. For the 82580
+	 * the lowest register is SYSTIMR instead of SYSTIML.  However we only
+	 * need to provide nanosecond resolution, so we just ignore it.
+	 */
 	jk = rd32(E1000_SYSTIMR);
 	lo = rd32(E1000_SYSTIML);
 	hi = rd32(E1000_SYSTIMH);
@@ -327,3 +343,46 @@ void igb_ptp_remove(struct igb_adapter *adapter)
 
 #endif /*CONFIG_PTP_1588_CLOCK*/
 }
+
+/**
+ * igb_systim_to_hwtstamp - convert system time value to hw timestamp
+ * @adapter: board private structure
+ * @hwtstamps: timestamp structure to update
+ * @systim: unsigned 64bit system time value.
+ *
+ * We need to convert the system time value stored in the RX/TXSTMP registers
+ * into a hwtstamp which can be used by the upper level timestamping functions.
+ *
+ * The 'tmreg_lock' spinlock is used to protect the consistency of the
+ * system time value. This is needed because reading the 64 bit time
+ * value involves reading two (or three) 32 bit registers. The first
+ * read latches the value. Ditto for writing.
+ *
+ * In addition, here have extended the system time with an overflow
+ * counter in software.
+ */
+void igb_systim_to_hwtstamp(struct igb_adapter *adapter,
+			    struct skb_shared_hwtstamps *hwtstamps,
+			    u64 systim)
+{
+	u64 ns;
+	unsigned long flags;
+
+	switch (adapter->hw.mac.type) {
+	case e1000_i350:
+	case e1000_82580:
+	case e1000_82576:
+		break;
+	default:
+		return;
+	}
+
+	spin_lock_irqsave(&adapter->tmreg_lock, flags);
+
+	ns = timecounter_cyc2time(&adapter->tc, systim);
+
+	spin_unlock_irqrestore(&adapter->tmreg_lock, flags);
+
+	memset(hwtstamps, 0, sizeof(*hwtstamps));
+	hwtstamps->hwtstamp = ns_to_ktime(ns);
+}
-- 
1.7.2.5


------------------------------------------------------------------------------
Try before you buy = See our experts in action!
The most comprehensive online learning library for Microsoft developers
is just $99.99! Visual Studio, SharePoint, SQL - plus HTML5, CSS3, MVC3,
Metro Style Apps, more. Free future releases when you subscribe now!
http://p.sf.net/sfu/learndevnow-dev2
_______________________________________________
E1000-devel mailing list
E1000-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/e1000-devel
To learn more about Intel&#174; Ethernet, visit http://communities.intel.com/community/wired

^ permalink raw reply related	[flat|nested] 38+ messages in thread

end of thread, other threads:[~2012-03-30 15:17 UTC | newest]

Thread overview: 38+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <D3F292ADF945FB49B35E96C94C2061B916484EC4@nsmail.netscout.com>
2012-03-15 17:18 ` [PATCH net V4 2/2] igb: offer a PTP Hardware Clock instead of the timecompare method chetan loke
2012-03-16  6:55   ` Richard Cochran
2012-03-21 15:00     ` chetan loke
2012-03-21 16:08       ` Richard Cochran
2012-03-21 17:02         ` chetan loke
2012-03-22  7:00           ` Richard Cochran
2012-03-21 17:06         ` Keller, Jacob E
2012-03-22  7:09           ` Richard Cochran
2012-03-22 21:59             ` Keller, Jacob E
2012-03-23  6:05               ` Richard Cochran
2012-03-21 17:00       ` Richard Cochran
2012-03-21 17:09         ` Keller, Jacob E
2012-03-21 21:50           ` chetan loke
2012-03-22  6:41             ` Richard Cochran
2012-03-22 23:13               ` chetan loke
2012-03-23  5:59                 ` Richard Cochran
2012-03-23 18:27                 ` Keller, Jacob E
2012-03-23 19:39                   ` chetan loke
2012-03-24  6:51                     ` Richard Cochran
2012-03-26 15:07                       ` chetan loke
2012-03-26 15:27                         ` Richard Cochran
2012-03-26 17:11                           ` chetan loke
2012-03-26 18:56                             ` Keller, Jacob E
2012-03-26 19:32                               ` chetan loke
2012-03-26 20:46                                 ` Keller, Jacob E
2012-03-27 18:05                                   ` Richard Cochran
2012-03-27 18:29                                     ` Keller, Jacob E
2012-03-27 18:43                                       ` chetan loke
2012-03-27 18:51                                         ` chetan loke
2012-03-27 20:58                                           ` Keller, Jacob E
2012-03-27 21:55                                             ` chetan loke
2012-03-29 23:08                                               ` Keller, Jacob E
2012-03-30 15:17                                                 ` chetan loke
2012-03-27 15:33                             ` Richard Cochran
2012-03-27 18:39                               ` chetan loke
2012-03-26 20:51                       ` Keller, Jacob E
2012-03-27 18:33                         ` Richard Cochran
2012-01-21 16:03 [PATCH net V4 0/2] igb: ptp hardware clock Richard Cochran
2012-01-21 16:03 ` [PATCH net V4 2/2] igb: offer a PTP Hardware Clock instead of the timecompare method Richard Cochran

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.