All of lore.kernel.org
 help / color / mirror / Atom feed
* e1000e I219 timestamping oops related to TSYNCRXCTL read
@ 2018-04-25  6:52 ` Benjamin Poirier
  0 siblings, 0 replies; 12+ messages in thread
From: Benjamin Poirier @ 2018-04-25  6:52 UTC (permalink / raw)
  To: Bruce Allan, Yanir Lubetkin, Jacob Keller, Neftin, Sasha
  Cc: Alexander Duyck, Jeff Kirsher, Achim Mildenberger, olouvignes,
	jayanth, ehabkost, postmodern.mod3, Bart.VanAssche,
	intel-wired-lan, netdev

In the following openSUSE bug report
https://bugzilla.suse.com/show_bug.cgi?id=1075876
Achim reported an oops related to e1000e timestamping:
kernel: RIP: 0010:[<ffffffff8110303f>] timecounter_read+0xf/0x50
[...]
kernel: Call Trace:
kernel:  [<ffffffffa0806b0f>] e1000e_phc_gettime+0x2f/0x60 [e1000e]
kernel:  [<ffffffffa0806c5d>] e1000e_systim_overflow_work+0x1d/0x80 [e1000e]
kernel:  [<ffffffff810992c5>] process_one_work+0x155/0x440
kernel:  [<ffffffff81099e16>] worker_thread+0x116/0x4b0
kernel:  [<ffffffff8109f422>] kthread+0xd2/0xf0
kernel:  [<ffffffff8163184f>] ret_from_fork+0x3f/0x70

It always occurs 4 hours after boot but not on every boot. It is most
likely the same problem reported here:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1668356
http://lkml.iu.edu/hypermail/linux/kernel/1506.2/index.html#02530
https://bugzilla.redhat.com/show_bug.cgi?id=1463882
https://bugzilla.redhat.com/show_bug.cgi?id=1431863

This occurs with MAC: 12, e1000_pch_spt/I219. The reporter has
reproduced it on a v4.16 derivative.

We've traced it to the fact that e1000e_systim_reset() skips the
timecounter_init() call if e1000e_get_base_timinca() returns -EINVAL,
which leads to a null deref in timecounter_read() (see comment 8 of the
suse bugzilla for more details.)

In commit 83129b37ef35 ("e1000e: fix systim issues", v4.2-rc1) Yanir
reworked e1000e_get_base_timinca() in such a way that it can return
-EINVAL for e1000_pch_spt if the SYSCFI bit is not set in TSYNCRXCTL.
This is also the commit that was identified by bisection in the second
link above (lkml).

What we've observed (in comment 14) is that TSYNCRXCTL reads sometimes
don't have the SYSCFI bit set. Retrying the read shortly after finds the
bit to be set. This was observed at boot (probe) but also link up and
link down.

I have a few questions:

What's the purpose of the SYSCFI bit in TSYNCRXCTL ("Reserved" in the
datasheet)?

Why does it look like subsequent reads of TSYNCRXCTL sometimes have the
SYSCFI bit set/not set on I219?

Is it right to check the SYSCFI bit in e1000e_get_base_timinca() for
_spt and return -EINVAL if it's not set? Could we just remove that
check?

The patch in comment 13 of the suse bugzilla works around the problem by
retrying TSYNCRXCTL reads, maybe we could instead remove that read
altogether or move the timecounter_init() call to at least avoid the
oops. The best approach to take seems to depend on the behavior expected
of TSYNCRXCTL reads on I219 so I'm hoping that you could provide more
info on that.

Thanks,
-Benjamin

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [Intel-wired-lan] e1000e I219 timestamping oops related to TSYNCRXCTL read
@ 2018-04-25  6:52 ` Benjamin Poirier
  0 siblings, 0 replies; 12+ messages in thread
From: Benjamin Poirier @ 2018-04-25  6:52 UTC (permalink / raw)
  To: intel-wired-lan

In the following openSUSE bug report
https://bugzilla.suse.com/show_bug.cgi?id=1075876
Achim reported an oops related to e1000e timestamping:
kernel: RIP: 0010:[<ffffffff8110303f>] timecounter_read+0xf/0x50
[...]
kernel: Call Trace:
kernel:  [<ffffffffa0806b0f>] e1000e_phc_gettime+0x2f/0x60 [e1000e]
kernel:  [<ffffffffa0806c5d>] e1000e_systim_overflow_work+0x1d/0x80 [e1000e]
kernel:  [<ffffffff810992c5>] process_one_work+0x155/0x440
kernel:  [<ffffffff81099e16>] worker_thread+0x116/0x4b0
kernel:  [<ffffffff8109f422>] kthread+0xd2/0xf0
kernel:  [<ffffffff8163184f>] ret_from_fork+0x3f/0x70

It always occurs 4 hours after boot but not on every boot. It is most
likely the same problem reported here:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1668356
http://lkml.iu.edu/hypermail/linux/kernel/1506.2/index.html#02530
https://bugzilla.redhat.com/show_bug.cgi?id=1463882
https://bugzilla.redhat.com/show_bug.cgi?id=1431863

This occurs with MAC: 12, e1000_pch_spt/I219. The reporter has
reproduced it on a v4.16 derivative.

We've traced it to the fact that e1000e_systim_reset() skips the
timecounter_init() call if e1000e_get_base_timinca() returns -EINVAL,
which leads to a null deref in timecounter_read() (see comment 8 of the
suse bugzilla for more details.)

In commit 83129b37ef35 ("e1000e: fix systim issues", v4.2-rc1) Yanir
reworked e1000e_get_base_timinca() in such a way that it can return
-EINVAL for e1000_pch_spt if the SYSCFI bit is not set in TSYNCRXCTL.
This is also the commit that was identified by bisection in the second
link above (lkml).

What we've observed (in comment 14) is that TSYNCRXCTL reads sometimes
don't have the SYSCFI bit set. Retrying the read shortly after finds the
bit to be set. This was observed at boot (probe) but also link up and
link down.

I have a few questions:

What's the purpose of the SYSCFI bit in TSYNCRXCTL ("Reserved" in the
datasheet)?

Why does it look like subsequent reads of TSYNCRXCTL sometimes have the
SYSCFI bit set/not set on I219?

Is it right to check the SYSCFI bit in e1000e_get_base_timinca() for
_spt and return -EINVAL if it's not set? Could we just remove that
check?

The patch in comment 13 of the suse bugzilla works around the problem by
retrying TSYNCRXCTL reads, maybe we could instead remove that read
altogether or move the timecounter_init() call to at least avoid the
oops. The best approach to take seems to depend on the behavior expected
of TSYNCRXCTL reads on I219 so I'm hoping that you could provide more
info on that.

Thanks,
-Benjamin

^ permalink raw reply	[flat|nested] 12+ messages in thread

* RE: e1000e I219 timestamping oops related to TSYNCRXCTL read
  2018-04-25  6:52 ` [Intel-wired-lan] " Benjamin Poirier
@ 2018-04-25 19:59   ` Keller, Jacob E
  -1 siblings, 0 replies; 12+ messages in thread
From: Keller, Jacob E @ 2018-04-25 19:59 UTC (permalink / raw)
  To: Benjamin Poirier, Allan, Bruce W, Yanir Lubetkin, Neftin, Sasha
  Cc: Alexander Duyck, Kirsher, Jeffrey T, Achim Mildenberger,
	olouvignes, jayanth, ehabkost, postmodern.mod3, Bart.VanAssche,
	intel-wired-lan, netdev

Hi Benjamin,

> -----Original Message-----
> From: netdev-owner@vger.kernel.org [mailto:netdev-owner@vger.kernel.org]
> On Behalf Of Benjamin Poirier
> Sent: Tuesday, April 24, 2018 11:53 PM
> To: Allan, Bruce W <bruce.w.allan@intel.com>; Yanir Lubetkin
> <yanirx.lubetkin@intel.com>; Keller, Jacob E <jacob.e.keller@intel.com>; Neftin,
> Sasha <sasha.neftin@intel.com>
> Cc: Alexander Duyck <alexander.duyck@gmail.com>; Kirsher, Jeffrey T
> <jeffrey.t.kirsher@intel.com>; Achim Mildenberger <admin@fph.physik.uni-
> karlsruhe.de>; olouvignes@gmail.com; jayanth@goubiq.com;
> ehabkost@redhat.com; postmodern.mod3@gmail.com;
> Bart.VanAssche@wdc.com; intel-wired-lan@lists.osuosl.org;
> netdev@vger.kernel.org
> Subject: e1000e I219 timestamping oops related to TSYNCRXCTL read
> 
> In the following openSUSE bug report
> https://bugzilla.suse.com/show_bug.cgi?id=1075876
> Achim reported an oops related to e1000e timestamping:
> kernel: RIP: 0010:[<ffffffff8110303f>] timecounter_read+0xf/0x50
> [...]
> kernel: Call Trace:
> kernel:  [<ffffffffa0806b0f>] e1000e_phc_gettime+0x2f/0x60 [e1000e]
> kernel:  [<ffffffffa0806c5d>] e1000e_systim_overflow_work+0x1d/0x80 [e1000e]
> kernel:  [<ffffffff810992c5>] process_one_work+0x155/0x440
> kernel:  [<ffffffff81099e16>] worker_thread+0x116/0x4b0
> kernel:  [<ffffffff8109f422>] kthread+0xd2/0xf0
> kernel:  [<ffffffff8163184f>] ret_from_fork+0x3f/0x70
> 
> It always occurs 4 hours after boot but not on every boot. It is most
> likely the same problem reported here:
> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1668356
> http://lkml.iu.edu/hypermail/linux/kernel/1506.2/index.html#02530
> https://bugzilla.redhat.com/show_bug.cgi?id=1463882
> https://bugzilla.redhat.com/show_bug.cgi?id=1431863
> 

It probably occurs due to the systim overflow check, yes.

> This occurs with MAC: 12, e1000_pch_spt/I219. The reporter has
> reproduced it on a v4.16 derivative.
> 
> We've traced it to the fact that e1000e_systim_reset() skips the
> timecounter_init() call if e1000e_get_base_timinca() returns -EINVAL,
> which leads to a null deref in timecounter_read() (see comment 8 of the
> suse bugzilla for more details.)
> 
> In commit 83129b37ef35 ("e1000e: fix systim issues", v4.2-rc1) Yanir
> reworked e1000e_get_base_timinca() in such a way that it can return
> -EINVAL for e1000_pch_spt if the SYSCFI bit is not set in TSYNCRXCTL.
> This is also the commit that was identified by bisection in the second
> link above (lkml).
> 
> What we've observed (in comment 14) is that TSYNCRXCTL reads sometimes
> don't have the SYSCFI bit set. Retrying the read shortly after finds the
> bit to be set. This was observed at boot (probe) but also link up and
> link down.
> 

I don't know offhand what the SYSCFI bit is for yet still digging into it.

> I have a few questions:
> 
> What's the purpose of the SYSCFI bit in TSYNCRXCTL ("Reserved" in the
> datasheet)?
> 
> Why does it look like subsequent reads of TSYNCRXCTL sometimes have the
> SYSCFI bit set/not set on I219?
> 
> Is it right to check the SYSCFI bit in e1000e_get_base_timinca() for
> _spt and return -EINVAL if it's not set? Could we just remove that
> check?
> 

I think the right approach might be proper cleanup when we fail to reset. I think the problem is that when e1000e_systim_reset is called and fails, we don't properly cleanup the work items. I think we need to actually stop and kill the work task so that it won't run.

> The patch in comment 13 of the suse bugzilla works around the problem by
> retrying TSYNCRXCTL reads, maybe we could instead remove that read
> altogether or move the timecounter_init() call to at least avoid the
> oops. The best approach to take seems to depend on the behavior expected
> of TSYNCRXCTL reads on I219 so I'm hoping that you could provide more
> info on that.
> 

Yea, we need to do something here, I'm still investigating why we need the SYSCFI check, but at a minimum, we should disable the overflow check task if we fail here, I think.

It looks like the SYSCFI is the System Clock Frequency Indicator bit, and it should be used to tell which of two clock frequencies to choose. I do not understand why that would be changing at different reads. We do need to make sure the clock is already enabled, but we do that prior to the switch case... Something is really weird here...

Thanks,
Jake

> Thanks,
> -Benjamin

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [Intel-wired-lan] e1000e I219 timestamping oops related to TSYNCRXCTL read
@ 2018-04-25 19:59   ` Keller, Jacob E
  0 siblings, 0 replies; 12+ messages in thread
From: Keller, Jacob E @ 2018-04-25 19:59 UTC (permalink / raw)
  To: intel-wired-lan

Hi Benjamin,

> -----Original Message-----
> From: netdev-owner at vger.kernel.org [mailto:netdev-owner at vger.kernel.org]
> On Behalf Of Benjamin Poirier
> Sent: Tuesday, April 24, 2018 11:53 PM
> To: Allan, Bruce W <bruce.w.allan@intel.com>; Yanir Lubetkin
> <yanirx.lubetkin@intel.com>; Keller, Jacob E <jacob.e.keller@intel.com>; Neftin,
> Sasha <sasha.neftin@intel.com>
> Cc: Alexander Duyck <alexander.duyck@gmail.com>; Kirsher, Jeffrey T
> <jeffrey.t.kirsher@intel.com>; Achim Mildenberger <admin@fph.physik.uni-
> karlsruhe.de>; olouvignes at gmail.com; jayanth at goubiq.com;
> ehabkost at redhat.com; postmodern.mod3 at gmail.com;
> Bart.VanAssche at wdc.com; intel-wired-lan at lists.osuosl.org;
> netdev at vger.kernel.org
> Subject: e1000e I219 timestamping oops related to TSYNCRXCTL read
> 
> In the following openSUSE bug report
> https://bugzilla.suse.com/show_bug.cgi?id=1075876
> Achim reported an oops related to e1000e timestamping:
> kernel: RIP: 0010:[<ffffffff8110303f>] timecounter_read+0xf/0x50
> [...]
> kernel: Call Trace:
> kernel:  [<ffffffffa0806b0f>] e1000e_phc_gettime+0x2f/0x60 [e1000e]
> kernel:  [<ffffffffa0806c5d>] e1000e_systim_overflow_work+0x1d/0x80 [e1000e]
> kernel:  [<ffffffff810992c5>] process_one_work+0x155/0x440
> kernel:  [<ffffffff81099e16>] worker_thread+0x116/0x4b0
> kernel:  [<ffffffff8109f422>] kthread+0xd2/0xf0
> kernel:  [<ffffffff8163184f>] ret_from_fork+0x3f/0x70
> 
> It always occurs 4 hours after boot but not on every boot. It is most
> likely the same problem reported here:
> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1668356
> http://lkml.iu.edu/hypermail/linux/kernel/1506.2/index.html#02530
> https://bugzilla.redhat.com/show_bug.cgi?id=1463882
> https://bugzilla.redhat.com/show_bug.cgi?id=1431863
> 

It probably occurs due to the systim overflow check, yes.

> This occurs with MAC: 12, e1000_pch_spt/I219. The reporter has
> reproduced it on a v4.16 derivative.
> 
> We've traced it to the fact that e1000e_systim_reset() skips the
> timecounter_init() call if e1000e_get_base_timinca() returns -EINVAL,
> which leads to a null deref in timecounter_read() (see comment 8 of the
> suse bugzilla for more details.)
> 
> In commit 83129b37ef35 ("e1000e: fix systim issues", v4.2-rc1) Yanir
> reworked e1000e_get_base_timinca() in such a way that it can return
> -EINVAL for e1000_pch_spt if the SYSCFI bit is not set in TSYNCRXCTL.
> This is also the commit that was identified by bisection in the second
> link above (lkml).
> 
> What we've observed (in comment 14) is that TSYNCRXCTL reads sometimes
> don't have the SYSCFI bit set. Retrying the read shortly after finds the
> bit to be set. This was observed at boot (probe) but also link up and
> link down.
> 

I don't know offhand what the SYSCFI bit is for yet still digging into it.

> I have a few questions:
> 
> What's the purpose of the SYSCFI bit in TSYNCRXCTL ("Reserved" in the
> datasheet)?
> 
> Why does it look like subsequent reads of TSYNCRXCTL sometimes have the
> SYSCFI bit set/not set on I219?
> 
> Is it right to check the SYSCFI bit in e1000e_get_base_timinca() for
> _spt and return -EINVAL if it's not set? Could we just remove that
> check?
> 

I think the right approach might be proper cleanup when we fail to reset. I think the problem is that when e1000e_systim_reset is called and fails, we don't properly cleanup the work items. I think we need to actually stop and kill the work task so that it won't run.

> The patch in comment 13 of the suse bugzilla works around the problem by
> retrying TSYNCRXCTL reads, maybe we could instead remove that read
> altogether or move the timecounter_init() call to at least avoid the
> oops. The best approach to take seems to depend on the behavior expected
> of TSYNCRXCTL reads on I219 so I'm hoping that you could provide more
> info on that.
> 

Yea, we need to do something here, I'm still investigating why we need the SYSCFI check, but at a minimum, we should disable the overflow check task if we fail here, I think.

It looks like the SYSCFI is the System Clock Frequency Indicator bit, and it should be used to tell which of two clock frequencies to choose. I do not understand why that would be changing at different reads. We do need to make sure the clock is already enabled, but we do that prior to the switch case... Something is really weird here...

Thanks,
Jake

> Thanks,
> -Benjamin

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH] e1000e: Ignore TSYNCRXCTL when getting I219 clock attributes
  2018-04-25 19:59   ` [Intel-wired-lan] " Keller, Jacob E
@ 2018-05-10  7:28     ` Benjamin Poirier
  -1 siblings, 0 replies; 12+ messages in thread
From: Benjamin Poirier @ 2018-05-10  7:28 UTC (permalink / raw)
  To: Jeff Kirsher
  Cc: Keller, Jacob E, Achim Mildenberger, olouvignes, jayanth,
	ehabkost, postmodern.mod3, Bart.VanAssche, intel-wired-lan,
	netdev, linux-kernel

There have been multiple reports of crashes that look like
kernel: RIP: 0010:[<ffffffff8110303f>] timecounter_read+0xf/0x50
[...]
kernel: Call Trace:
kernel:  [<ffffffffa0806b0f>] e1000e_phc_gettime+0x2f/0x60 [e1000e]
kernel:  [<ffffffffa0806c5d>] e1000e_systim_overflow_work+0x1d/0x80 [e1000e]
kernel:  [<ffffffff810992c5>] process_one_work+0x155/0x440
kernel:  [<ffffffff81099e16>] worker_thread+0x116/0x4b0
kernel:  [<ffffffff8109f422>] kthread+0xd2/0xf0
kernel:  [<ffffffff8163184f>] ret_from_fork+0x3f/0x70

These can be traced back to the fact that e1000e_systim_reset() skips the
timecounter_init() call if e1000e_get_base_timinca() returns -EINVAL, which
leads to a null deref in timecounter_read().

Commit 83129b37ef35 ("e1000e: fix systim issues", v4.2-rc1) reworked
e1000e_get_base_timinca() in such a way that it can return -EINVAL for
e1000_pch_spt if the SYSCFI bit is not set in TSYNCRXCTL.

Some experimentation has shown that on I219 (e1000_pch_spt, "MAC: 12")
adapters, the E1000_TSYNCRXCTL_SYSCFI flag is unstable; TSYNCRXCTL reads
sometimes don't have the SYSCFI bit set. Retrying the read shortly after
finds the bit to be set. This was observed at boot (probe) but also link up
and link down.

Moreover, the phc (PTP Hardware Clock) seems to operate normally even after
reads where SYSCFI=0. Therefore, remove this register read and
unconditionally set the clock parameters.

Reported-by: Achim Mildenberger <admin@fph.physik.uni-karlsruhe.de>
Message-Id: <20180425065243.g5mqewg5irkwgwgv@f2>
Bugzilla: https://bugzilla.suse.com/show_bug.cgi?id=1075876
Fixes: 83129b37ef35 ("e1000e: fix systim issues")
Signed-off-by: Benjamin Poirier <bpoirier@suse.com>
---
 drivers/net/ethernet/intel/e1000e/netdev.c | 15 ++++++---------
 1 file changed, 6 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c
index ec4a9759a6f2..3afb1f3b6f91 100644
--- a/drivers/net/ethernet/intel/e1000e/netdev.c
+++ b/drivers/net/ethernet/intel/e1000e/netdev.c
@@ -3546,15 +3546,12 @@ s32 e1000e_get_base_timinca(struct e1000_adapter *adapter, u32 *timinca)
 		}
 		break;
 	case e1000_pch_spt:
-		if (er32(TSYNCRXCTL) & E1000_TSYNCRXCTL_SYSCFI) {
-			/* Stable 24MHz frequency */
-			incperiod = INCPERIOD_24MHZ;
-			incvalue = INCVALUE_24MHZ;
-			shift = INCVALUE_SHIFT_24MHZ;
-			adapter->cc.shift = shift;
-			break;
-		}
-		return -EINVAL;
+		/* Stable 24MHz frequency */
+		incperiod = INCPERIOD_24MHZ;
+		incvalue = INCVALUE_24MHZ;
+		shift = INCVALUE_SHIFT_24MHZ;
+		adapter->cc.shift = shift;
+		break;
 	case e1000_pch_cnp:
 		if (er32(TSYNCRXCTL) & E1000_TSYNCRXCTL_SYSCFI) {
 			/* Stable 24MHz frequency */
-- 
2.16.3

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [Intel-wired-lan] [PATCH] e1000e: Ignore TSYNCRXCTL when getting I219 clock attributes
@ 2018-05-10  7:28     ` Benjamin Poirier
  0 siblings, 0 replies; 12+ messages in thread
From: Benjamin Poirier @ 2018-05-10  7:28 UTC (permalink / raw)
  To: intel-wired-lan

There have been multiple reports of crashes that look like
kernel: RIP: 0010:[<ffffffff8110303f>] timecounter_read+0xf/0x50
[...]
kernel: Call Trace:
kernel:  [<ffffffffa0806b0f>] e1000e_phc_gettime+0x2f/0x60 [e1000e]
kernel:  [<ffffffffa0806c5d>] e1000e_systim_overflow_work+0x1d/0x80 [e1000e]
kernel:  [<ffffffff810992c5>] process_one_work+0x155/0x440
kernel:  [<ffffffff81099e16>] worker_thread+0x116/0x4b0
kernel:  [<ffffffff8109f422>] kthread+0xd2/0xf0
kernel:  [<ffffffff8163184f>] ret_from_fork+0x3f/0x70

These can be traced back to the fact that e1000e_systim_reset() skips the
timecounter_init() call if e1000e_get_base_timinca() returns -EINVAL, which
leads to a null deref in timecounter_read().

Commit 83129b37ef35 ("e1000e: fix systim issues", v4.2-rc1) reworked
e1000e_get_base_timinca() in such a way that it can return -EINVAL for
e1000_pch_spt if the SYSCFI bit is not set in TSYNCRXCTL.

Some experimentation has shown that on I219 (e1000_pch_spt, "MAC: 12")
adapters, the E1000_TSYNCRXCTL_SYSCFI flag is unstable; TSYNCRXCTL reads
sometimes don't have the SYSCFI bit set. Retrying the read shortly after
finds the bit to be set. This was observed at boot (probe) but also link up
and link down.

Moreover, the phc (PTP Hardware Clock) seems to operate normally even after
reads where SYSCFI=0. Therefore, remove this register read and
unconditionally set the clock parameters.

Reported-by: Achim Mildenberger <admin@fph.physik.uni-karlsruhe.de>
Message-Id: <20180425065243.g5mqewg5irkwgwgv@f2>
Bugzilla: https://bugzilla.suse.com/show_bug.cgi?id=1075876
Fixes: 83129b37ef35 ("e1000e: fix systim issues")
Signed-off-by: Benjamin Poirier <bpoirier@suse.com>
---
 drivers/net/ethernet/intel/e1000e/netdev.c | 15 ++++++---------
 1 file changed, 6 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c
index ec4a9759a6f2..3afb1f3b6f91 100644
--- a/drivers/net/ethernet/intel/e1000e/netdev.c
+++ b/drivers/net/ethernet/intel/e1000e/netdev.c
@@ -3546,15 +3546,12 @@ s32 e1000e_get_base_timinca(struct e1000_adapter *adapter, u32 *timinca)
 		}
 		break;
 	case e1000_pch_spt:
-		if (er32(TSYNCRXCTL) & E1000_TSYNCRXCTL_SYSCFI) {
-			/* Stable 24MHz frequency */
-			incperiod = INCPERIOD_24MHZ;
-			incvalue = INCVALUE_24MHZ;
-			shift = INCVALUE_SHIFT_24MHZ;
-			adapter->cc.shift = shift;
-			break;
-		}
-		return -EINVAL;
+		/* Stable 24MHz frequency */
+		incperiod = INCPERIOD_24MHZ;
+		incvalue = INCVALUE_24MHZ;
+		shift = INCVALUE_SHIFT_24MHZ;
+		adapter->cc.shift = shift;
+		break;
 	case e1000_pch_cnp:
 		if (er32(TSYNCRXCTL) & E1000_TSYNCRXCTL_SYSCFI) {
 			/* Stable 24MHz frequency */
-- 
2.16.3


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* RE: [PATCH] e1000e: Ignore TSYNCRXCTL when getting I219 clock attributes
  2018-05-10  7:28     ` [Intel-wired-lan] " Benjamin Poirier
@ 2018-05-10 18:42       ` Keller, Jacob E
  -1 siblings, 0 replies; 12+ messages in thread
From: Keller, Jacob E @ 2018-05-10 18:42 UTC (permalink / raw)
  To: Benjamin Poirier, Kirsher, Jeffrey T
  Cc: Achim Mildenberger, olouvignes, jayanth, ehabkost,
	postmodern.mod3, Bart.VanAssche, intel-wired-lan, netdev,
	linux-kernel

> -----Original Message-----
> From: Benjamin Poirier [mailto:bpoirier@suse.com]
> Sent: Thursday, May 10, 2018 12:29 AM
> To: Kirsher, Jeffrey T <jeffrey.t.kirsher@intel.com>
> Cc: Keller, Jacob E <jacob.e.keller@intel.com>; Achim Mildenberger
> <admin@fph.physik.uni-karlsruhe.de>; olouvignes@gmail.com;
> jayanth@goubiq.com; ehabkost@redhat.com; postmodern.mod3@gmail.com;
> Bart.VanAssche@wdc.com; intel-wired-lan@lists.osuosl.org;
> netdev@vger.kernel.org; linux-kernel@vger.kernel.org
> Subject: [PATCH] e1000e: Ignore TSYNCRXCTL when getting I219 clock attributes
> 
> There have been multiple reports of crashes that look like
> kernel: RIP: 0010:[<ffffffff8110303f>] timecounter_read+0xf/0x50
> [...]
> kernel: Call Trace:
> kernel:  [<ffffffffa0806b0f>] e1000e_phc_gettime+0x2f/0x60 [e1000e]
> kernel:  [<ffffffffa0806c5d>] e1000e_systim_overflow_work+0x1d/0x80 [e1000e]
> kernel:  [<ffffffff810992c5>] process_one_work+0x155/0x440
> kernel:  [<ffffffff81099e16>] worker_thread+0x116/0x4b0
> kernel:  [<ffffffff8109f422>] kthread+0xd2/0xf0
> kernel:  [<ffffffff8163184f>] ret_from_fork+0x3f/0x70
> 
> These can be traced back to the fact that e1000e_systim_reset() skips the
> timecounter_init() call if e1000e_get_base_timinca() returns -EINVAL, which
> leads to a null deref in timecounter_read().
> 
> Commit 83129b37ef35 ("e1000e: fix systim issues", v4.2-rc1) reworked
> e1000e_get_base_timinca() in such a way that it can return -EINVAL for
> e1000_pch_spt if the SYSCFI bit is not set in TSYNCRXCTL.
> 
> Some experimentation has shown that on I219 (e1000_pch_spt, "MAC: 12")
> adapters, the E1000_TSYNCRXCTL_SYSCFI flag is unstable; TSYNCRXCTL reads
> sometimes don't have the SYSCFI bit set. Retrying the read shortly after
> finds the bit to be set. This was observed at boot (probe) but also link up
> and link down.
> 
> Moreover, the phc (PTP Hardware Clock) seems to operate normally even after
> reads where SYSCFI=0. Therefore, remove this register read and
> unconditionally set the clock parameters.
> 
> Reported-by: Achim Mildenberger <admin@fph.physik.uni-karlsruhe.de>
> Message-Id: <20180425065243.g5mqewg5irkwgwgv@f2>
> Bugzilla: https://bugzilla.suse.com/show_bug.cgi?id=1075876
> Fixes: 83129b37ef35 ("e1000e: fix systim issues")
> Signed-off-by: Benjamin Poirier <bpoirier@suse.com>
> ---
>  drivers/net/ethernet/intel/e1000e/netdev.c | 15 ++++++---------
>  1 file changed, 6 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c
> b/drivers/net/ethernet/intel/e1000e/netdev.c
> index ec4a9759a6f2..3afb1f3b6f91 100644
> --- a/drivers/net/ethernet/intel/e1000e/netdev.c
> +++ b/drivers/net/ethernet/intel/e1000e/netdev.c
> @@ -3546,15 +3546,12 @@ s32 e1000e_get_base_timinca(struct e1000_adapter
> *adapter, u32 *timinca)
>  		}
>  		break;
>  	case e1000_pch_spt:
> -		if (er32(TSYNCRXCTL) & E1000_TSYNCRXCTL_SYSCFI) {
> -			/* Stable 24MHz frequency */
> -			incperiod = INCPERIOD_24MHZ;
> -			incvalue = INCVALUE_24MHZ;
> -			shift = INCVALUE_SHIFT_24MHZ;
> -			adapter->cc.shift = shift;
> -			break;
> -		}
> -		return -EINVAL;
> +		/* Stable 24MHz frequency */
> +		incperiod = INCPERIOD_24MHZ;
> +		incvalue = INCVALUE_24MHZ;
> +		shift = INCVALUE_SHIFT_24MHZ;
> +		adapter->cc.shift = shift;
> +		break;
>  	case e1000_pch_cnp:
>  		if (er32(TSYNCRXCTL) & E1000_TSYNCRXCTL_SYSCFI) {
>  			/* Stable 24MHz frequency */
> --
> 2.16.3

Given testing showing that the clock operates fine regardless of the register read, I think this is probably fine. Normally I believe the register was used to check which frequency was in use, but it doesn't seem to serve that purpose here.

Thanks,
Jake

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [Intel-wired-lan] [PATCH] e1000e: Ignore TSYNCRXCTL when getting I219 clock attributes
@ 2018-05-10 18:42       ` Keller, Jacob E
  0 siblings, 0 replies; 12+ messages in thread
From: Keller, Jacob E @ 2018-05-10 18:42 UTC (permalink / raw)
  To: intel-wired-lan

> -----Original Message-----
> From: Benjamin Poirier [mailto:bpoirier at suse.com]
> Sent: Thursday, May 10, 2018 12:29 AM
> To: Kirsher, Jeffrey T <jeffrey.t.kirsher@intel.com>
> Cc: Keller, Jacob E <jacob.e.keller@intel.com>; Achim Mildenberger
> <admin@fph.physik.uni-karlsruhe.de>; olouvignes at gmail.com;
> jayanth at goubiq.com; ehabkost at redhat.com; postmodern.mod3 at gmail.com;
> Bart.VanAssche at wdc.com; intel-wired-lan at lists.osuosl.org;
> netdev at vger.kernel.org; linux-kernel at vger.kernel.org
> Subject: [PATCH] e1000e: Ignore TSYNCRXCTL when getting I219 clock attributes
> 
> There have been multiple reports of crashes that look like
> kernel: RIP: 0010:[<ffffffff8110303f>] timecounter_read+0xf/0x50
> [...]
> kernel: Call Trace:
> kernel:  [<ffffffffa0806b0f>] e1000e_phc_gettime+0x2f/0x60 [e1000e]
> kernel:  [<ffffffffa0806c5d>] e1000e_systim_overflow_work+0x1d/0x80 [e1000e]
> kernel:  [<ffffffff810992c5>] process_one_work+0x155/0x440
> kernel:  [<ffffffff81099e16>] worker_thread+0x116/0x4b0
> kernel:  [<ffffffff8109f422>] kthread+0xd2/0xf0
> kernel:  [<ffffffff8163184f>] ret_from_fork+0x3f/0x70
> 
> These can be traced back to the fact that e1000e_systim_reset() skips the
> timecounter_init() call if e1000e_get_base_timinca() returns -EINVAL, which
> leads to a null deref in timecounter_read().
> 
> Commit 83129b37ef35 ("e1000e: fix systim issues", v4.2-rc1) reworked
> e1000e_get_base_timinca() in such a way that it can return -EINVAL for
> e1000_pch_spt if the SYSCFI bit is not set in TSYNCRXCTL.
> 
> Some experimentation has shown that on I219 (e1000_pch_spt, "MAC: 12")
> adapters, the E1000_TSYNCRXCTL_SYSCFI flag is unstable; TSYNCRXCTL reads
> sometimes don't have the SYSCFI bit set. Retrying the read shortly after
> finds the bit to be set. This was observed at boot (probe) but also link up
> and link down.
> 
> Moreover, the phc (PTP Hardware Clock) seems to operate normally even after
> reads where SYSCFI=0. Therefore, remove this register read and
> unconditionally set the clock parameters.
> 
> Reported-by: Achim Mildenberger <admin@fph.physik.uni-karlsruhe.de>
> Message-Id: <20180425065243.g5mqewg5irkwgwgv@f2>
> Bugzilla: https://bugzilla.suse.com/show_bug.cgi?id=1075876
> Fixes: 83129b37ef35 ("e1000e: fix systim issues")
> Signed-off-by: Benjamin Poirier <bpoirier@suse.com>
> ---
>  drivers/net/ethernet/intel/e1000e/netdev.c | 15 ++++++---------
>  1 file changed, 6 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c
> b/drivers/net/ethernet/intel/e1000e/netdev.c
> index ec4a9759a6f2..3afb1f3b6f91 100644
> --- a/drivers/net/ethernet/intel/e1000e/netdev.c
> +++ b/drivers/net/ethernet/intel/e1000e/netdev.c
> @@ -3546,15 +3546,12 @@ s32 e1000e_get_base_timinca(struct e1000_adapter
> *adapter, u32 *timinca)
>  		}
>  		break;
>  	case e1000_pch_spt:
> -		if (er32(TSYNCRXCTL) & E1000_TSYNCRXCTL_SYSCFI) {
> -			/* Stable 24MHz frequency */
> -			incperiod = INCPERIOD_24MHZ;
> -			incvalue = INCVALUE_24MHZ;
> -			shift = INCVALUE_SHIFT_24MHZ;
> -			adapter->cc.shift = shift;
> -			break;
> -		}
> -		return -EINVAL;
> +		/* Stable 24MHz frequency */
> +		incperiod = INCPERIOD_24MHZ;
> +		incvalue = INCVALUE_24MHZ;
> +		shift = INCVALUE_SHIFT_24MHZ;
> +		adapter->cc.shift = shift;
> +		break;
>  	case e1000_pch_cnp:
>  		if (er32(TSYNCRXCTL) & E1000_TSYNCRXCTL_SYSCFI) {
>  			/* Stable 24MHz frequency */
> --
> 2.16.3

Given testing showing that the clock operates fine regardless of the register read, I think this is probably fine. Normally I believe the register was used to check which frequency was in use, but it doesn't seem to serve that purpose here.

Thanks,
Jake

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Intel-wired-lan] [PATCH] e1000e: Ignore TSYNCRXCTL when getting I219 clock attributes
  2018-05-10 18:42       ` [Intel-wired-lan] " Keller, Jacob E
@ 2018-05-13  6:55         ` Neftin, Sasha
  -1 siblings, 0 replies; 12+ messages in thread
From: Neftin, Sasha @ 2018-05-13  6:55 UTC (permalink / raw)
  To: Keller, Jacob E, Benjamin Poirier, Kirsher, Jeffrey T
  Cc: ehabkost, netdev, jayanth, linux-kernel, postmodern.mod3,
	Achim Mildenberger, intel-wired-lan, Bart.VanAssche, olouvignes

On 5/10/2018 21:42, Keller, Jacob E wrote:
>> -----Original Message-----
>> From: Benjamin Poirier [mailto:bpoirier@suse.com]
>> Sent: Thursday, May 10, 2018 12:29 AM
>> To: Kirsher, Jeffrey T <jeffrey.t.kirsher@intel.com>
>> Cc: Keller, Jacob E <jacob.e.keller@intel.com>; Achim Mildenberger
>> <admin@fph.physik.uni-karlsruhe.de>; olouvignes@gmail.com;
>> jayanth@goubiq.com; ehabkost@redhat.com; postmodern.mod3@gmail.com;
>> Bart.VanAssche@wdc.com; intel-wired-lan@lists.osuosl.org;
>> netdev@vger.kernel.org; linux-kernel@vger.kernel.org
>> Subject: [PATCH] e1000e: Ignore TSYNCRXCTL when getting I219 clock attributes
>>
>> There have been multiple reports of crashes that look like
>> kernel: RIP: 0010:[<ffffffff8110303f>] timecounter_read+0xf/0x50
>> [...]
>> kernel: Call Trace:
>> kernel:  [<ffffffffa0806b0f>] e1000e_phc_gettime+0x2f/0x60 [e1000e]
>> kernel:  [<ffffffffa0806c5d>] e1000e_systim_overflow_work+0x1d/0x80 [e1000e]
>> kernel:  [<ffffffff810992c5>] process_one_work+0x155/0x440
>> kernel:  [<ffffffff81099e16>] worker_thread+0x116/0x4b0
>> kernel:  [<ffffffff8109f422>] kthread+0xd2/0xf0
>> kernel:  [<ffffffff8163184f>] ret_from_fork+0x3f/0x70
>>
>> These can be traced back to the fact that e1000e_systim_reset() skips the
>> timecounter_init() call if e1000e_get_base_timinca() returns -EINVAL, which
>> leads to a null deref in timecounter_read().
>>
>> Commit 83129b37ef35 ("e1000e: fix systim issues", v4.2-rc1) reworked
>> e1000e_get_base_timinca() in such a way that it can return -EINVAL for
>> e1000_pch_spt if the SYSCFI bit is not set in TSYNCRXCTL.
>>
>> Some experimentation has shown that on I219 (e1000_pch_spt, "MAC: 12")
>> adapters, the E1000_TSYNCRXCTL_SYSCFI flag is unstable; TSYNCRXCTL reads
>> sometimes don't have the SYSCFI bit set. Retrying the read shortly after
>> finds the bit to be set. This was observed at boot (probe) but also link up
>> and link down.
>>
>> Moreover, the phc (PTP Hardware Clock) seems to operate normally even after
>> reads where SYSCFI=0. Therefore, remove this register read and
>> unconditionally set the clock parameters.
>>
>> Reported-by: Achim Mildenberger <admin@fph.physik.uni-karlsruhe.de>
>> Message-Id: <20180425065243.g5mqewg5irkwgwgv@f2>
>> Bugzilla: https://bugzilla.suse.com/show_bug.cgi?id=1075876
>> Fixes: 83129b37ef35 ("e1000e: fix systim issues")
>> Signed-off-by: Benjamin Poirier <bpoirier@suse.com>
>> ---
>>   drivers/net/ethernet/intel/e1000e/netdev.c | 15 ++++++---------
>>   1 file changed, 6 insertions(+), 9 deletions(-)
>>
>> diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c
>> b/drivers/net/ethernet/intel/e1000e/netdev.c
>> index ec4a9759a6f2..3afb1f3b6f91 100644
>> --- a/drivers/net/ethernet/intel/e1000e/netdev.c
>> +++ b/drivers/net/ethernet/intel/e1000e/netdev.c
>> @@ -3546,15 +3546,12 @@ s32 e1000e_get_base_timinca(struct e1000_adapter
>> *adapter, u32 *timinca)
>>   		}
>>   		break;
>>   	case e1000_pch_spt:
>> -		if (er32(TSYNCRXCTL) & E1000_TSYNCRXCTL_SYSCFI) {
>> -			/* Stable 24MHz frequency */
>> -			incperiod = INCPERIOD_24MHZ;
>> -			incvalue = INCVALUE_24MHZ;
>> -			shift = INCVALUE_SHIFT_24MHZ;
>> -			adapter->cc.shift = shift;
>> -			break;
>> -		}
>> -		return -EINVAL;
>> +		/* Stable 24MHz frequency */
>> +		incperiod = INCPERIOD_24MHZ;
>> +		incvalue = INCVALUE_24MHZ;
>> +		shift = INCVALUE_SHIFT_24MHZ;
>> +		adapter->cc.shift = shift;
>> +		break;
>>   	case e1000_pch_cnp:
>>   		if (er32(TSYNCRXCTL) & E1000_TSYNCRXCTL_SYSCFI) {
>>   			/* Stable 24MHz frequency */
>> --
>> 2.16.3
> 
> Given testing showing that the clock operates fine regardless of the register read, I think this is probably fine. Normally I believe the register was used to check which frequency was in use, but it doesn't seem to serve that purpose here.
> 
> Thanks,
> Jake
> _______________________________________________
> Intel-wired-lan mailing list
> Intel-wired-lan@osuosl.org
> https://lists.osuosl.org/mailman/listinfo/intel-wired-lan
> 
I've checked our specification, looks only 24MHz used for this product. 
Hope no different platform with another clock support has been 
distributed. So, let's pick up this change.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [Intel-wired-lan] [PATCH] e1000e: Ignore TSYNCRXCTL when getting I219 clock attributes
@ 2018-05-13  6:55         ` Neftin, Sasha
  0 siblings, 0 replies; 12+ messages in thread
From: Neftin, Sasha @ 2018-05-13  6:55 UTC (permalink / raw)
  To: intel-wired-lan

On 5/10/2018 21:42, Keller, Jacob E wrote:
>> -----Original Message-----
>> From: Benjamin Poirier [mailto:bpoirier at suse.com]
>> Sent: Thursday, May 10, 2018 12:29 AM
>> To: Kirsher, Jeffrey T <jeffrey.t.kirsher@intel.com>
>> Cc: Keller, Jacob E <jacob.e.keller@intel.com>; Achim Mildenberger
>> <admin@fph.physik.uni-karlsruhe.de>; olouvignes at gmail.com;
>> jayanth at goubiq.com; ehabkost at redhat.com; postmodern.mod3 at gmail.com;
>> Bart.VanAssche at wdc.com; intel-wired-lan at lists.osuosl.org;
>> netdev at vger.kernel.org; linux-kernel at vger.kernel.org
>> Subject: [PATCH] e1000e: Ignore TSYNCRXCTL when getting I219 clock attributes
>>
>> There have been multiple reports of crashes that look like
>> kernel: RIP: 0010:[<ffffffff8110303f>] timecounter_read+0xf/0x50
>> [...]
>> kernel: Call Trace:
>> kernel:  [<ffffffffa0806b0f>] e1000e_phc_gettime+0x2f/0x60 [e1000e]
>> kernel:  [<ffffffffa0806c5d>] e1000e_systim_overflow_work+0x1d/0x80 [e1000e]
>> kernel:  [<ffffffff810992c5>] process_one_work+0x155/0x440
>> kernel:  [<ffffffff81099e16>] worker_thread+0x116/0x4b0
>> kernel:  [<ffffffff8109f422>] kthread+0xd2/0xf0
>> kernel:  [<ffffffff8163184f>] ret_from_fork+0x3f/0x70
>>
>> These can be traced back to the fact that e1000e_systim_reset() skips the
>> timecounter_init() call if e1000e_get_base_timinca() returns -EINVAL, which
>> leads to a null deref in timecounter_read().
>>
>> Commit 83129b37ef35 ("e1000e: fix systim issues", v4.2-rc1) reworked
>> e1000e_get_base_timinca() in such a way that it can return -EINVAL for
>> e1000_pch_spt if the SYSCFI bit is not set in TSYNCRXCTL.
>>
>> Some experimentation has shown that on I219 (e1000_pch_spt, "MAC: 12")
>> adapters, the E1000_TSYNCRXCTL_SYSCFI flag is unstable; TSYNCRXCTL reads
>> sometimes don't have the SYSCFI bit set. Retrying the read shortly after
>> finds the bit to be set. This was observed at boot (probe) but also link up
>> and link down.
>>
>> Moreover, the phc (PTP Hardware Clock) seems to operate normally even after
>> reads where SYSCFI=0. Therefore, remove this register read and
>> unconditionally set the clock parameters.
>>
>> Reported-by: Achim Mildenberger <admin@fph.physik.uni-karlsruhe.de>
>> Message-Id: <20180425065243.g5mqewg5irkwgwgv@f2>
>> Bugzilla: https://bugzilla.suse.com/show_bug.cgi?id=1075876
>> Fixes: 83129b37ef35 ("e1000e: fix systim issues")
>> Signed-off-by: Benjamin Poirier <bpoirier@suse.com>
>> ---
>>   drivers/net/ethernet/intel/e1000e/netdev.c | 15 ++++++---------
>>   1 file changed, 6 insertions(+), 9 deletions(-)
>>
>> diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c
>> b/drivers/net/ethernet/intel/e1000e/netdev.c
>> index ec4a9759a6f2..3afb1f3b6f91 100644
>> --- a/drivers/net/ethernet/intel/e1000e/netdev.c
>> +++ b/drivers/net/ethernet/intel/e1000e/netdev.c
>> @@ -3546,15 +3546,12 @@ s32 e1000e_get_base_timinca(struct e1000_adapter
>> *adapter, u32 *timinca)
>>   		}
>>   		break;
>>   	case e1000_pch_spt:
>> -		if (er32(TSYNCRXCTL) & E1000_TSYNCRXCTL_SYSCFI) {
>> -			/* Stable 24MHz frequency */
>> -			incperiod = INCPERIOD_24MHZ;
>> -			incvalue = INCVALUE_24MHZ;
>> -			shift = INCVALUE_SHIFT_24MHZ;
>> -			adapter->cc.shift = shift;
>> -			break;
>> -		}
>> -		return -EINVAL;
>> +		/* Stable 24MHz frequency */
>> +		incperiod = INCPERIOD_24MHZ;
>> +		incvalue = INCVALUE_24MHZ;
>> +		shift = INCVALUE_SHIFT_24MHZ;
>> +		adapter->cc.shift = shift;
>> +		break;
>>   	case e1000_pch_cnp:
>>   		if (er32(TSYNCRXCTL) & E1000_TSYNCRXCTL_SYSCFI) {
>>   			/* Stable 24MHz frequency */
>> --
>> 2.16.3
> 
> Given testing showing that the clock operates fine regardless of the register read, I think this is probably fine. Normally I believe the register was used to check which frequency was in use, but it doesn't seem to serve that purpose here.
> 
> Thanks,
> Jake
> _______________________________________________
> Intel-wired-lan mailing list
> Intel-wired-lan at osuosl.org
> https://lists.osuosl.org/mailman/listinfo/intel-wired-lan
> 
I've checked our specification, looks only 24MHz used for this product. 
Hope no different platform with another clock support has been 
distributed. So, let's pick up this change.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* RE: [Intel-wired-lan] [PATCH] e1000e: Ignore TSYNCRXCTL when getting I219 clock attributes
  2018-05-10  7:28     ` [Intel-wired-lan] " Benjamin Poirier
@ 2018-05-23  0:44       ` Brown, Aaron F
  -1 siblings, 0 replies; 12+ messages in thread
From: Brown, Aaron F @ 2018-05-23  0:44 UTC (permalink / raw)
  To: Benjamin Poirier, Kirsher, Jeffrey T
  Cc: ehabkost, netdev, jayanth, linux-kernel, Bart.VanAssche,
	postmodern.mod3, Achim Mildenberger, intel-wired-lan, olouvignes

> From: Intel-wired-lan [mailto:intel-wired-lan-bounces@osuosl.org] On
> Behalf Of Benjamin Poirier
> Sent: Thursday, May 10, 2018 12:29 AM
> To: Kirsher, Jeffrey T <jeffrey.t.kirsher@intel.com>
> Cc: ehabkost@redhat.com; netdev@vger.kernel.org; jayanth@goubiq.com;
> linux-kernel@vger.kernel.org; Bart.VanAssche@wdc.com;
> postmodern.mod3@gmail.com; Achim Mildenberger
> <admin@fph.physik.uni-karlsruhe.de>; intel-wired-lan@lists.osuosl.org;
> olouvignes@gmail.com
> Subject: [Intel-wired-lan] [PATCH] e1000e: Ignore TSYNCRXCTL when getting
> I219 clock attributes
> 
> There have been multiple reports of crashes that look like
> kernel: RIP: 0010:[<ffffffff8110303f>] timecounter_read+0xf/0x50
> [...]
> kernel: Call Trace:
> kernel:  [<ffffffffa0806b0f>] e1000e_phc_gettime+0x2f/0x60 [e1000e]
> kernel:  [<ffffffffa0806c5d>] e1000e_systim_overflow_work+0x1d/0x80
> [e1000e]
> kernel:  [<ffffffff810992c5>] process_one_work+0x155/0x440
> kernel:  [<ffffffff81099e16>] worker_thread+0x116/0x4b0
> kernel:  [<ffffffff8109f422>] kthread+0xd2/0xf0
> kernel:  [<ffffffff8163184f>] ret_from_fork+0x3f/0x70
> 
> These can be traced back to the fact that e1000e_systim_reset() skips the
> timecounter_init() call if e1000e_get_base_timinca() returns -EINVAL, which
> leads to a null deref in timecounter_read().
> 
> Commit 83129b37ef35 ("e1000e: fix systim issues", v4.2-rc1) reworked
> e1000e_get_base_timinca() in such a way that it can return -EINVAL for
> e1000_pch_spt if the SYSCFI bit is not set in TSYNCRXCTL.
> 
> Some experimentation has shown that on I219 (e1000_pch_spt, "MAC: 12")
> adapters, the E1000_TSYNCRXCTL_SYSCFI flag is unstable; TSYNCRXCTL reads
> sometimes don't have the SYSCFI bit set. Retrying the read shortly after
> finds the bit to be set. This was observed at boot (probe) but also link up
> and link down.
> 
> Moreover, the phc (PTP Hardware Clock) seems to operate normally even
> after
> reads where SYSCFI=0. Therefore, remove this register read and
> unconditionally set the clock parameters.
> 
> Reported-by: Achim Mildenberger <admin@fph.physik.uni-karlsruhe.de>
> Message-Id: <20180425065243.g5mqewg5irkwgwgv@f2>
> Bugzilla: https://bugzilla.suse.com/show_bug.cgi?id=1075876
> Fixes: 83129b37ef35 ("e1000e: fix systim issues")
> Signed-off-by: Benjamin Poirier <bpoirier@suse.com>
> ---
>  drivers/net/ethernet/intel/e1000e/netdev.c | 15 ++++++---------
>  1 file changed, 6 insertions(+), 9 deletions(-)

Tested-by: Aaron Brown <aaron.f.brown@intel.com>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [Intel-wired-lan] [PATCH] e1000e: Ignore TSYNCRXCTL when getting I219 clock attributes
@ 2018-05-23  0:44       ` Brown, Aaron F
  0 siblings, 0 replies; 12+ messages in thread
From: Brown, Aaron F @ 2018-05-23  0:44 UTC (permalink / raw)
  To: intel-wired-lan

> From: Intel-wired-lan [mailto:intel-wired-lan-bounces at osuosl.org] On
> Behalf Of Benjamin Poirier
> Sent: Thursday, May 10, 2018 12:29 AM
> To: Kirsher, Jeffrey T <jeffrey.t.kirsher@intel.com>
> Cc: ehabkost at redhat.com; netdev at vger.kernel.org; jayanth at goubiq.com;
> linux-kernel at vger.kernel.org; Bart.VanAssche at wdc.com;
> postmodern.mod3 at gmail.com; Achim Mildenberger
> <admin@fph.physik.uni-karlsruhe.de>; intel-wired-lan at lists.osuosl.org;
> olouvignes at gmail.com
> Subject: [Intel-wired-lan] [PATCH] e1000e: Ignore TSYNCRXCTL when getting
> I219 clock attributes
> 
> There have been multiple reports of crashes that look like
> kernel: RIP: 0010:[<ffffffff8110303f>] timecounter_read+0xf/0x50
> [...]
> kernel: Call Trace:
> kernel:  [<ffffffffa0806b0f>] e1000e_phc_gettime+0x2f/0x60 [e1000e]
> kernel:  [<ffffffffa0806c5d>] e1000e_systim_overflow_work+0x1d/0x80
> [e1000e]
> kernel:  [<ffffffff810992c5>] process_one_work+0x155/0x440
> kernel:  [<ffffffff81099e16>] worker_thread+0x116/0x4b0
> kernel:  [<ffffffff8109f422>] kthread+0xd2/0xf0
> kernel:  [<ffffffff8163184f>] ret_from_fork+0x3f/0x70
> 
> These can be traced back to the fact that e1000e_systim_reset() skips the
> timecounter_init() call if e1000e_get_base_timinca() returns -EINVAL, which
> leads to a null deref in timecounter_read().
> 
> Commit 83129b37ef35 ("e1000e: fix systim issues", v4.2-rc1) reworked
> e1000e_get_base_timinca() in such a way that it can return -EINVAL for
> e1000_pch_spt if the SYSCFI bit is not set in TSYNCRXCTL.
> 
> Some experimentation has shown that on I219 (e1000_pch_spt, "MAC: 12")
> adapters, the E1000_TSYNCRXCTL_SYSCFI flag is unstable; TSYNCRXCTL reads
> sometimes don't have the SYSCFI bit set. Retrying the read shortly after
> finds the bit to be set. This was observed at boot (probe) but also link up
> and link down.
> 
> Moreover, the phc (PTP Hardware Clock) seems to operate normally even
> after
> reads where SYSCFI=0. Therefore, remove this register read and
> unconditionally set the clock parameters.
> 
> Reported-by: Achim Mildenberger <admin@fph.physik.uni-karlsruhe.de>
> Message-Id: <20180425065243.g5mqewg5irkwgwgv@f2>
> Bugzilla: https://bugzilla.suse.com/show_bug.cgi?id=1075876
> Fixes: 83129b37ef35 ("e1000e: fix systim issues")
> Signed-off-by: Benjamin Poirier <bpoirier@suse.com>
> ---
>  drivers/net/ethernet/intel/e1000e/netdev.c | 15 ++++++---------
>  1 file changed, 6 insertions(+), 9 deletions(-)

Tested-by: Aaron Brown <aaron.f.brown@intel.com>

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2018-05-23  0:44 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-04-25  6:52 e1000e I219 timestamping oops related to TSYNCRXCTL read Benjamin Poirier
2018-04-25  6:52 ` [Intel-wired-lan] " Benjamin Poirier
2018-04-25 19:59 ` Keller, Jacob E
2018-04-25 19:59   ` [Intel-wired-lan] " Keller, Jacob E
2018-05-10  7:28   ` [PATCH] e1000e: Ignore TSYNCRXCTL when getting I219 clock attributes Benjamin Poirier
2018-05-10  7:28     ` [Intel-wired-lan] " Benjamin Poirier
2018-05-10 18:42     ` Keller, Jacob E
2018-05-10 18:42       ` [Intel-wired-lan] " Keller, Jacob E
2018-05-13  6:55       ` Neftin, Sasha
2018-05-13  6:55         ` Neftin, Sasha
2018-05-23  0:44     ` Brown, Aaron F
2018-05-23  0:44       ` Brown, Aaron F

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.