All of lore.kernel.org
 help / color / mirror / Atom feed
From: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
To: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>,
	"Rafael J. Wysocki" <rafael.j.wysocki@intel.com>,
	Daniel Lezcano <daniel.lezcano@linaro.org>,
	Linux PM <linux-pm@vger.kernel.org>,
	the arch/x86 maintainers <x86@kernel.org>,
	"open list:DOCUMENTATION" <linux-doc@vger.kernel.org>,
	Len Brown <len.brown@intel.com>,
	Aubrey Li <aubrey.li@linux.intel.com>,
	Amit Kucheria <amitk@kernel.org>, Andi Kleen <ak@linux.intel.com>,
	Tim Chen <tim.c.chen@linux.intel.com>,
	Lukasz Luba <lukasz.luba@arm.com>,
	"Ravi V. Shankar" <ravi.v.shankar@intel.com>,
	Ricardo Neri <ricardo.neri@intel.com>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH v4 5/7] thermal: intel: hfi: Enable notification interrupt
Date: Thu, 13 Jan 2022 05:48:58 -0800	[thread overview]
Message-ID: <20220113134858.GA15346@ranerica-svr.sc.intel.com> (raw)
In-Reply-To: <0d50595854a9d9cd25eb3ac179bd39cf1cc30bcd.camel@linux.intel.com>

On Wed, Jan 12, 2022 at 03:26:41PM -0800, Srinivas Pandruvada wrote:
> On Wed, 2022-01-12 at 20:47 +0100, Rafael J. Wysocki wrote:
> > On Sat, Jan 8, 2022 at 4:46 AM Ricardo Neri
> > <ricardo.neri-calderon@linux.intel.com> wrote:
> > > When hardware wants to inform the operating system about updates in
> > > the HFI
> > > table, it issues a package-level thermal event interrupt. For this,
> > > hardware has new interrupt and status bits in the
> > > IA32_PACKAGE_THERM_
> > > INTERRUPT and IA32_PACKAGE_THERM_STATUS registers. The existing
> > > thermal
> > > throttle driver already handles thermal event interrupts: it
> > > initializes
> > > the thermal vector of the local APIC as well as per-CPU and
> > > package-level
> > > interrupt reporting. It also provides routines to service such
> > > interrupts.
> > > Extend its functionality to also handle HFI interrupts.
> > > 
> > > The frequency of the thermal HFI interrupt is specific to each
> > > processor
> > > model. On some processors, a single interrupt happens as soon as
> > > the HFI is
> > > enabled and hardware will never update HFI capabilities afterwards.
> > > On
> > > other processors, thermal and power constraints may cause thermal
> > > HFI
> > > interrupts every tens of milliseconds.
> > > 
> > > To not overwhelm consumers of the HFI data, use delayed work to
> > > throttle
> > > the rate at which HFI updates are processed.
> > > 
> > > 
> 
> [...]
> 
> > > +void intel_hfi_process_event(__u64 pkg_therm_status_msr_val)
> > > +{
> > > +       struct hfi_instance *hfi_instance;
> > > +       int cpu = smp_processor_id();
> > > +       struct hfi_cpu_info *info;
> > > +       u64 new_timestamp;
> > > +
> > > +       if (!pkg_therm_status_msr_val)
> > > +               return;
> > > +
> > > +       info = &per_cpu(hfi_cpu_info, cpu);
> > > +       if (!info)
> > > +               return;
> > > +
> > > +       /*
> > > +        * It is possible that we get an HFI thermal interrupt on
> > > this CPU
> > > +        * before its HFI instance is initialized.
> Although this code can handle this situation, you can avoid this.
> 
> You can call intel_hfi_online(cpu) before 
> "
> l = apic_read(APIC_LVTTHMR);
> 	apic_write(APIC_LVTTHMR, l & ~APIC_LVT_MASKED);
> "

Indeed this could work.

intel_hfi_online() also enables the HFI. It is possible to get an HFI
interrupt as soon as we enable it. I was concerned about getting the
interrupt with APIC_LVTTHMR masked and miss it. This should not be
a problem since we will get it as soon as we unmask it.

> in thermal_throttle_online()
> 
> In the same way call intel_hfi_offline(cpu)
> after
> 
> /* Mask the thermal vector before draining evtl. pending work */
> 	l = apic_read(APIC_LVTTHMR);
> 	apic_write(APIC_LVTTHMR, l | APIC_LVT_MASKED);
> 
> in thermal_throttle_offline()

Sure, I can do this. 

> 
> 
> > >  This is not a problem. The
> > > +        * CPU that enabled the interrupt for this package will
> > > also get the
> > > +        * interrupt and is fully initialized.
> > > +        */
> > > +       hfi_instance = info->hfi_instance;
> > > +       if (!hfi_instance)
> > > +               return;
> > 
> > Generally, a CPU whose info has been initialized can be offline, so
> > this code may run on an offline CPU.
> > 
> > I'm not actually sure if this is a concern, but just mentioning it in
> > case it is.
> > 
> It will not matter as the handler of the message should be handle case
> as CPU can go offline later after the message even if the CPU was
> offline.
> But I think we can avoid this situation.
> 
> > > +
> > > +       /*
> > > +        * On most systems, all CPUs in the package receive a
> > > package-level
> > > +        * thermal interrupt when there is an HFI update. It is
> > > sufficient to
> > > +        * let a single CPU to acknowledge the update and schedule
> > > work to
> > > +        * process it. The remaining CPUs can resume their work.
> > > +        */
> > > +       if (!raw_spin_trylock(&hfi_instance->event_lock))
> > > +               return;
> > > +
> > > +       /* Skip duplicated updates. */
> > > +       new_timestamp = *(u64 *)hfi_instance->hw_table;
> > > +       if (*hfi_instance->timestamp == new_timestamp) {
> > > +               raw_spin_unlock(&hfi_instance->event_lock);
> > > +               return;
> > > +       }
> > > +
> > > +       raw_spin_lock(&hfi_instance->table_lock);
> > > +
> > > +       /*
> > > +        * Copy the updated table into our local copy. This
> > > includes the new
> > > +        * timestamp.
> > > +        */
> > > +       memcpy(hfi_instance->local_table, hfi_instance->hw_table,
> > > +              hfi_features.nr_table_pages << PAGE_SHIFT);
> > > +
> > > +       raw_spin_unlock(&hfi_instance->table_lock);
> > > +       raw_spin_unlock(&hfi_instance->event_lock);
> > > +
> > > +       /*
> > > +        * Let hardware know that we are done reading the HFI table
> > > and it is
> > > +        * free to update it again.
> > > +        */
> > > +       pkg_therm_status_msr_val &= THERM_STATUS_CLEAR_PKG_MASK &
> > > +                                   ~PACKAGE_THERM_STATUS_HFI_UPDAT
> > > ED;
> > > +       wrmsrl(MSR_IA32_PACKAGE_THERM_STATUS,
> > > pkg_therm_status_msr_val);
> > > +
> > > +       schedule_delayed_work(&hfi_instance->update_work,
> > > HFI_UPDATE_INTERVAL);
> > 
> > AFAICS, 
> For my understanding:
> 
> > if update_work has been scheduled already,
> queue_delayed_work_on is called
> 
> >  but is not pending
> > yet, the delay will be set to the current time plus
> > HFI_UPDATE_INTERVAL, but shouldn't it actually run earlier in that
> > case?
> 
> 
> "
> 	if (!test_and_set_bit(WORK_STRUCT_PENDING_BIT,
> work_data_bits(work))) {
> 		__queue_delayed_work(cpu, wq, dwork, delay);
> 		ret = true;
> 	}
> "
> Pending bit will be set only one time, so delay will be from 
> the first call of queue_delayed_work_on() + HFI_UPDATE_INTERVAL.
> 
> So on subsequent calls of schedule_delayed_work() the delay is always
> with reference to first call.
> 
> > 
> > Also it looks like the processing introduced in the next patch can
> > take quite a bit of time if there is a sufficiently large number of
> > CPUs in the package, so is it suitable for system_wq in all cases?
> > 
> Good question. What is the threshold of not using system_wq?
> With current set of max cpus/package, I did experiments with busy
> systems with a test functions with several workqueues and check if they
> drift in expiry. But I think we can move away from system_wq. Can this
> be add on patch?

Sure, I can do this.

Thanks and BR,
Ricardo

  reply	other threads:[~2022-01-13 13:47 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-01-08  3:47 [PATCH v4 0/7] Thermal: Introduce the Hardware Feedback Interface for thermal and performance management Ricardo Neri
2022-01-08  3:47 ` [PATCH v4 1/7] x86/Documentation: Describe the Intel Hardware Feedback Interface Ricardo Neri
2022-01-08  3:47 ` [PATCH v4 2/7] x86/cpu: Add definitions for " Ricardo Neri
2022-01-08  3:47 ` [PATCH v4 3/7] thermal: intel: hfi: Minimally initialize the " Ricardo Neri
2022-01-08  3:47 ` [PATCH v4 4/7] thermal: intel: hfi: Handle CPU hotplug events Ricardo Neri
2022-01-08  3:47 ` [PATCH v4 5/7] thermal: intel: hfi: Enable notification interrupt Ricardo Neri
2022-01-12 19:47   ` Rafael J. Wysocki
2022-01-12 23:26     ` Srinivas Pandruvada
2022-01-13 13:48       ` Ricardo Neri [this message]
2022-01-08  3:47 ` [PATCH v4 6/7] thermal: netlink: Add a new event to notify CPU capabilities change Ricardo Neri
2022-01-08  3:47 ` [PATCH v4 7/7] thermal: intel: hfi: Notify user space for HFI events Ricardo Neri
2022-01-12 19:53   ` Rafael J. Wysocki
2022-01-12 23:54     ` Srinivas Pandruvada
2022-01-13  5:50       ` Srinivas Pandruvada

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20220113134858.GA15346@ranerica-svr.sc.intel.com \
    --to=ricardo.neri-calderon@linux.intel.com \
    --cc=ak@linux.intel.com \
    --cc=amitk@kernel.org \
    --cc=aubrey.li@linux.intel.com \
    --cc=daniel.lezcano@linaro.org \
    --cc=len.brown@intel.com \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pm@vger.kernel.org \
    --cc=lukasz.luba@arm.com \
    --cc=rafael.j.wysocki@intel.com \
    --cc=rafael@kernel.org \
    --cc=ravi.v.shankar@intel.com \
    --cc=ricardo.neri@intel.com \
    --cc=srinivas.pandruvada@linux.intel.com \
    --cc=tim.c.chen@linux.intel.com \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.