PTP vclock: BUG: scheduling while atomic

* PTP vclock: BUG: scheduling while atomic
@ 2023-02-02 16:02 Íñigo Huguet
  2023-02-02 16:33 ` Miroslav Lichvar
  2023-02-03  9:09 ` Martin Habets
  0 siblings, 2 replies; 6+ messages in thread
From: Íñigo Huguet @ 2023-02-02 16:02 UTC (permalink / raw)
  To: netdev, richardcochran, yangbo.lu, mlichvar, gerhard
  Cc: habetsm.xilinx, ecree.xilinx, davem, edumazet, kuba, pabeni, alex.maftei

Hello,

Our QA team was testing PTP vclocks, and they've found this error with sfc NIC/driver:
  BUG: scheduling while atomic: ptp5/25223/0x00000002

The reason seems to be that vclocks disable interrupts with `spin_lock_irqsave` in
`ptp_vclock_gettime`, and then read the timecounter, which in turns ends calling to
the driver's `gettime64` callback.

Vclock framework was added in commit 5d43f951b1ac ("ptp: add ptp virtual clock driver
framework").

At first glance, it seems that vclock framework is reusing the already existing callbacks
of the drivers' ptp clocks, but it's imposing a new limitation that didn't exist before:
now they can't sleep (due the spin_lock_irqsave). Sfc driver might sleep waiting for the
fw response.

Sfc driver can be fixed to avoid this issue, but I wonder if something might not be
correct in the vclock framework. I don't have enough knowledge about how clocks
synchronization should work regarding this, so I leave it to your consideration.

These are the logs with stack traces:
 BUG: scheduling while atomic: ptp5/25223/0x00000002
 [...skip...]
 Call Trace:
  dump_stack_lvl+0x34/0x48
  __schedule_bug.cold+0x47/0x53
  __schedule+0x40e/0x580
  schedule+0x43/0xa0
  schedule_timeout+0x88/0x160
  ? __bpf_trace_tick_stop+0x10/0x10
  _efx_mcdi_rpc_finish+0x2a9/0x480 [sfc]
  ? efx_mcdi_send_request+0x1d5/0x260 [sfc]
  ? dequeue_task_stop+0x70/0x70
  _efx_mcdi_rpc.constprop.0+0xcd/0x3d0 [sfc]
  ? update_load_avg+0x7e/0x730
  _efx_mcdi_rpc_evb_retry+0x5d/0x1d0 [sfc]
  efx_mcdi_rpc+0x10/0x20 [sfc]
  efx_phc_gettime+0x5f/0xc0 [sfc]
  ptp_vclock_read+0xa3/0xc0
  timecounter_read+0x11/0x60
  ptp_vclock_refresh+0x31/0x60
  ? ptp_clock_release+0x50/0x50
  ptp_aux_kworker+0x19/0x40
  kthread_worker_fn+0xa9/0x250
  ? kthread_should_park+0x30/0x30
  kthread+0x146/0x170
  ? set_kthread_struct+0x50/0x50
  ret_from_fork+0x1f/0x30
 BUG: scheduling while atomic: ptp5/25223/0x00000000
 [...skip...]
 Call Trace:
  dump_stack_lvl+0x34/0x48
  __schedule_bug.cold+0x47/0x53
  __schedule+0x40e/0x580
  ? ptp_clock_release+0x50/0x50
  schedule+0x43/0xa0
  kthread_worker_fn+0x128/0x250
  ? kthread_should_park+0x30/0x30
  kthread+0x146/0x170
  ? set_kthread_struct+0x50/0x50
  ret_from_fork+0x1f/0x30

^ permalink raw reply	[flat|nested] 6+ messages in thread