All of lore.kernel.org
 help / color / mirror / Atom feed
From: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
To: John Harrison <john.c.harrison@intel.com>,
	"Ceraolo Spurio, Daniele" <daniele.ceraolospurio@intel.com>,
	Intel-GFX@Lists.FreeDesktop.Org
Cc: DRI-Devel@Lists.FreeDesktop.Org, Rodrigo Vivi <rodrigo.vivi@intel.com>
Subject: Re: [Intel-gfx] [PATCH 1/2] drm/i915/gt: Add GT oriented dmesg output
Date: Wed, 9 Nov 2022 11:05:36 +0000	[thread overview]
Message-ID: <ad19d7ce-4102-4f8f-903d-7390b004b2e9@linux.intel.com> (raw)
In-Reply-To: <c9742b0f-546f-cccc-021a-7bad68410838@intel.com>


On 08/11/2022 20:15, John Harrison wrote:
> On 11/8/2022 01:01, Tvrtko Ursulin wrote:
>> On 07/11/2022 19:14, John Harrison wrote:
>>> On 11/7/2022 08:17, Tvrtko Ursulin wrote:
>>>> On 07/11/2022 09:33, Tvrtko Ursulin wrote:
>>>>> On 05/11/2022 01:03, Ceraolo Spurio, Daniele wrote:
>>>>>> On 11/4/2022 10:25 AM, John.C.Harrison@Intel.com wrote:
>>>>>>> From: John Harrison <John.C.Harrison@Intel.com>
>>>>>>>
>>>>>>> When trying to analyse bug reports from CI, customers, etc. it 
>>>>>>> can be
>>>>>>> difficult to work out exactly what is happening on which GT in a
>>>>>>> multi-GT system. So add GT oriented debug/error message wrappers. If
>>>>>>> used instead of the drm_ equivalents, you get the same output but 
>>>>>>> with
>>>>>>> a GT# prefix on it.
>>>>>>>
>>>>>>> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
>>>>>>
>>>>>> The only downside to this is that we'll print "GT0: " even on 
>>>>>> single-GT devices. We could introduce a gt->info.name and print 
>>>>>> that, so we could have it different per-platform, but IMO it's not 
>>>>>> worth the effort.
>>>>>>
>>>>>> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
>>>>>>
>>>>>> I think it might be worth getting an ack from one of the 
>>>>>> maintainers to make sure we're all aligned on transitioning to 
>>>>>> these new logging macro for gt code.
>>>>>
>>>>> Idea is I think a very good one. First I would suggest 
>>>>> standardising to lowercase GT in logs because:
>>>>>
>>>>> $ grep "GT%" i915/ -r
>>>>> $ grep "gt%" i915/ -r
>>>>> i915/gt/intel_gt_sysfs.c: gt->i915->sysfs_gt, "gt%d", gt->info.id))
>>>>> i915/gt/intel_gt_sysfs.c:                "failed to initialize gt%d 
>>>>> sysfs root\n", gt->info.id);
>>>>> i915/gt/intel_gt_sysfs_pm.c:                     "failed to create 
>>>>> gt%u RC6 sysfs files (%pe)\n",
>>>>> i915/gt/intel_gt_sysfs_pm.c: "failed to create gt%u RC6p sysfs 
>>>>> files (%pe)\n",
>>>>> i915/gt/intel_gt_sysfs_pm.c:                     "failed to create 
>>>>> gt%u RPS sysfs files (%pe)",
>>>>> i915/gt/intel_gt_sysfs_pm.c:                     "failed to create 
>>>>> gt%u punit_req_freq_mhz sysfs (%pe)",
>>>>> i915/gt/intel_gt_sysfs_pm.c: "failed to create gt%u throttle sysfs 
>>>>> files (%pe)",
>>>>> i915/gt/intel_gt_sysfs_pm.c: "failed to create gt%u 
>>>>> media_perf_power_attrs sysfs (%pe)\n",
>>>>> i915/gt/intel_gt_sysfs_pm.c:                     "failed to add 
>>>>> gt%u rps defaults (%pe)\n",
>>>>> i915/i915_driver.c: drm_err(&gt->i915->drm, "gt%d: intel_pcode_init 
>>>>> failed %d\n", id, ret);
>>>>> i915/i915_hwmon.c:              snprintf(ddat_gt->name, 
>>>>> sizeof(ddat_gt->name), "i915_gt%u", i);
>>>>>
>>>
>>> Just because there are 11 existing instances of one form doesn't mean 
>>> that the 275 instances that are waiting to be converted should be 
>>> done incorrectly. GT is an acronym and should be capitalised.
>>
>> Okay just make it consistent then.
>>
>>> Besides:
>>> grep -r "GT " i915 | grep '"'
>>> i915/vlv_suspend.c:             drm_err(&i915->drm, "timeout 
>>> disabling GT waking\n");
>>> i915/vlv_suspend.c:                     "timeout waiting for GT wells 
>>> to go %s\n",
>>> i915/vlv_suspend.c:     drm_dbg(&i915->drm, "GT register access while 
>>> GT waking disabled\n");
>>> i915/i915_gpu_error.c:  err_printf(m, "GT awake: %s\n", 
>>> str_yes_no(gt->awake));
>>> i915/i915_debugfs.c:    seq_printf(m, "GT awake? %s [%d], %llums\n",
>>> i915/selftests/i915_gem_evict.c: pr_err("Failed to idle GT (on %s)", 
>>> engine->name);
>>> i915/intel_uncore.c:                  "GT thread status wait timed 
>>> out\n");
>>> i915/gt/uc/selftest_guc_multi_lrc.c: drm_err(&gt->i915->drm, "GT 
>>> failed to idle: %d\n", ret);
>>> i915/gt/uc/selftest_guc.c: drm_err(&gt->i915->drm, "GT failed to 
>>> idle: %d\n", ret);
>>> i915/gt/uc/selftest_guc.c: drm_err(&gt->i915->drm, "GT failed to 
>>> idle: %d\n", ret);
>>> i915/gt/intel_gt_mcr.c: * Some GT registers are designed as 
>>> "multicast" or "replicated" registers:
>>> i915/gt/selftest_rps.c:                 pr_info("%s: rps counted %d 
>>> C0 cycles [%lldns] in %lldns [%d cycles], using GT clock frequency of 
>>> %uKHz\n",
>>> i915/gt/selftest_hangcheck.c:                   pr_err("[%s] GT is 
>>> wedged!\n", engine->name);
>>> i915/gt/selftest_hangcheck.c:           pr_err("GT is wedged!\n");
>>> i915/gt/intel_gt_clock_utils.c:                 "GT clock frequency 
>>> changed, was %uHz, now %uHz!\n",
>>> i915/gt/selftest_engine_pm.c:           pr_err("Unable to flush GT pm 
>>> before test\n");
>>> i915/gt/selftest_engine_pm.c: pr_err("GT failed to idle\n");
>>> i915/i915_sysfs.c:                       "failed to register GT sysfs 
>>> directory\n");
>>> i915/intel_uncore.h:     * of the basic non-engine GT registers 
>>> (referred to as "GSI" on
>>> i915/intel_uncore.h:     * newer platforms, or "GT block" on older 
>>> platforms)?  If so, we'll
>>>
>>>
>>>
>>>>> Then there is a question of naming. Are we okay with GT_XXX or, do 
>>>>> we want intel_gt_, or something completely different. I don't have 
>>>>> a strong opinion at the moment so I'll add some more folks to Cc.
>>>>
>>> You mean GT_ERR("msg") vs intel_gt_err("msg")? Personally, I would 
>>> prefer just gt_err("msg") to keep it as close to the official drm_* 
>>> versions as possible. Print lines tend to be excessively long 
>>> already. Taking a 'gt' parameter instead of a '&gt->i915->drm' 
>>> parameter does help with that but it seems like calling the wrapper 
>>> intel_gt_* is shooting ourselves in the foot on that one. And GT_ERR 
>>> vs gt_err just comes down to the fact that it is a macro wrapper and 
>>> therefore is required to be in upper case.
>>>
>>>> There was a maintainer level mini-discussion on this topic which I 
>>>> will try to summarise.
>>>>
>>>> Main contention point was the maintenance cost and generally an 
>>>> undesirable pattern of needing to add many 
>>>> subsystem/component/directory specific macros. Which then typically 
>>>> need extra flavours and so on. But over verbosity of the 
>>> How many versions are you expecting to add? Beyond the tile instance, 
>>> what further addressing requirements are there? The card instance is 
>>> already printed as part of the PCI address. The only other reason to 
>>> add per component wrappers would be to wrap the mechanism for getting 
>>> from some random per component object back to the intel_gt structure. 
>>> But that is hardware a new issue being added by this wrapper. It is 
>>> also not a requirement. Much of the code has a gt pointer already. 
>>> For the parts that don't, some of it would be a trivial engine->gt 
>>> type dereference, some of it is a more complex container_of type 
>>> construction. But for those, the given file will already have 
>>> multiple instances of that already (usually as the first or second 
>>> line of the function - 'intel_gt *gt = fancy_access_method(my_obj)' 
>>> so adding one or two more of those as necessary is not making the 
>>> code harder to read.
>>>
>>>> code is obviously also bad, so one compromise idea was to add a 
>>>> macro which builds the GT string and use drm logging helpers 
>>>> directly. This would be something like:
>>>>
>>>>  drm_err(GT_LOG("something went wrong ret=%d\n", gt), ret);
>>>>  drm_info(GT_LOG(...same...));
>>> Seriously? As above, some of these lines are already way too long, 
>>> this version makes them even longer with no obvious benefit. Worse, 
>>> it makes it harder to read what is going on. It is much less 
>>> intuitive to read than just replacing the drm_err itself. And having 
>>> two sets of parenthesis with some parameters inside the first and 
>>> some only inside the second is really horrid! Also, putting the 'gt' 
>>> parameter in the middle just confuses it with the rest of the printf 
>>> arguments even though there is no %d in the string for it. So now a 
>>> quick glances tells you that your code is wrong because you have 
>>> three format specifiers but four parameters.
>>>
>>> Whereas, just replacing drm_err with gt_err (or GT_ERR or 
>>> intel_gt_err) keeps everything else consistent. The first parameter 
>>> changes from 'drm' to 'gt' but is still the master object parameter 
>>> and it matches the function/macro prefix so inherently looks correct. 
>>> Then you have your message plus parameters. No confusing orders, no 
>>> confusing parenthesis, no excessive macro levels, no confusion at 
>>> all. Just nice simple, easy to read, easy to maintain code.
>>
>> I am personally okay with gt_err/GT_ERR some other folks might object 
>> though. And I can also understand the argument why it is better to not 
>> have to define gt_err, gt_warn, gt_info, gt_notice, gt_debug, 
>> gt_err_ratelimited, gt_warn_once.. and instead have only one macro.
> A small set of trivial macro definitions vs a complicated and unreadable 
> construct on every single print? Erm, isn't that the very definition of 
> abstracting to helpers as generally required by every code review ever?
> 
> And what 'other folks might object'? People already CC'd? People outside 
> of i915?
> 
> 
>>
>> Because of that I was passing on to you the compromise option.
>>
>> It maybe still has net space savings since we wouldn't have to be 
>> repeating the gt->i915->drm whatever and gt->info.id on every line.
>>
>> You are free to try the most compact one and see how hard those 
>> objections will be.
> Um. I already did. This patch. And you are the only person to have 
> objected in any manner at all.

Where I have objected?

I was a) asking to convert all gt/ within one kernel release, b) 
transferring the maintainer discussion from IRC to this email chain to 
outlay one alternative, for which I said I could see the pros and cons 
of both, and c) raised the naming question early since that can usually 
become a churn point later on when we have large scale code transformations.

As said, FWIW you have my ack for GT_XXX naming and approach, but please 
do convert the whole of gt/ so we don't ship with a mish-mash of log 
messages.

Regards,

Tvrtko

  reply	other threads:[~2022-11-09 11:05 UTC|newest]

Thread overview: 28+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-11-04 17:25 [PATCH 0/2] Add GT oriented dmesg output John.C.Harrison
2022-11-04 17:25 ` [Intel-gfx] " John.C.Harrison
2022-11-04 17:25 ` [PATCH 1/2] drm/i915/gt: " John.C.Harrison
2022-11-04 17:25   ` [Intel-gfx] " John.C.Harrison
2022-11-05  1:03   ` Ceraolo Spurio, Daniele
2022-11-07  9:33     ` Tvrtko Ursulin
2022-11-07 16:17       ` Tvrtko Ursulin
2022-11-07 19:14         ` John Harrison
2022-11-08  9:01           ` Tvrtko Ursulin
2022-11-08 20:15             ` John Harrison
2022-11-09 11:05               ` Tvrtko Ursulin [this message]
2022-11-09 17:46                 ` John Harrison
2022-11-09 17:46                   ` John Harrison
2022-11-09 19:57                   ` Michal Wajdeczko
2022-11-10  9:55                     ` Tvrtko Ursulin
2022-11-10 10:35                       ` Michal Wajdeczko
2022-11-15 10:23                         ` Tvrtko Ursulin
2022-11-10 10:33                     ` Jani Nikula
2022-11-14 22:10                       ` John Harrison
2022-11-10  9:43                   ` Tvrtko Ursulin
2022-11-10  9:43                     ` Tvrtko Ursulin
2022-11-14 22:14                     ` John Harrison
2022-11-14 22:14                       ` John Harrison
2022-11-04 17:25 ` [PATCH 2/2] drm/i915/uc: Update the gt/uc code to use GT_ERR and friends John.C.Harrison
2022-11-04 17:25   ` [Intel-gfx] " John.C.Harrison
2022-11-04 17:47 ` [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for Add GT oriented dmesg output Patchwork
2022-11-04 18:04 ` [Intel-gfx] ✓ Fi.CI.BAT: success " Patchwork
2022-11-05  9:24 ` [Intel-gfx] ✗ Fi.CI.IGT: failure " Patchwork

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ad19d7ce-4102-4f8f-903d-7390b004b2e9@linux.intel.com \
    --to=tvrtko.ursulin@linux.intel.com \
    --cc=DRI-Devel@Lists.FreeDesktop.Org \
    --cc=Intel-GFX@Lists.FreeDesktop.Org \
    --cc=daniele.ceraolospurio@intel.com \
    --cc=john.c.harrison@intel.com \
    --cc=rodrigo.vivi@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.