* [Intel-gfx] [bug report] drm/i915/guc: Implement multi-lrc submission
@ 2022-09-22 14:26 Dan Carpenter
2022-09-22 20:01 ` John Harrison
0 siblings, 1 reply; 3+ messages in thread
From: Dan Carpenter @ 2022-09-22 14:26 UTC (permalink / raw)
To: matthew.brost; +Cc: intel-gfx
Hello Matthew Brost,
The patch 6b540bf6f143: "drm/i915/guc: Implement multi-lrc
submission" from Oct 14, 2021, leads to the following Smatch static
checker warning:
drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c:752 __guc_add_request()
warn: refcount leak 'ce->ref.refcount.refs.counter': lines='752'
drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
672 static int __guc_add_request(struct intel_guc *guc, struct i915_request *rq)
673 {
674 int err = 0;
675 struct intel_context *ce = request_to_scheduling_context(rq);
676 u32 action[3];
677 int len = 0;
678 u32 g2h_len_dw = 0;
679 bool enabled;
680
681 lockdep_assert_held(&rq->engine->sched_engine->lock);
682
683 /*
684 * Corner case where requests were sitting in the priority list or a
685 * request resubmitted after the context was banned.
686 */
687 if (unlikely(intel_context_is_banned(ce))) {
688 i915_request_put(i915_request_mark_eio(rq));
689 intel_engine_signal_breadcrumbs(ce->engine);
690 return 0;
691 }
692
693 GEM_BUG_ON(!atomic_read(&ce->guc_id.ref));
694 GEM_BUG_ON(context_guc_id_invalid(ce));
695
696 if (context_policy_required(ce)) {
697 err = guc_context_policy_init_v70(ce, false);
698 if (err)
699 return err;
700 }
701
702 spin_lock(&ce->guc_state.lock);
703
704 /*
705 * The request / context will be run on the hardware when scheduling
706 * gets enabled in the unblock. For multi-lrc we still submit the
707 * context to move the LRC tails.
708 */
709 if (unlikely(context_blocked(ce) && !intel_context_is_parent(ce)))
710 goto out;
711
712 enabled = context_enabled(ce) || context_blocked(ce);
713
714 if (!enabled) {
715 action[len++] = INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_SET;
716 action[len++] = ce->guc_id.id;
717 action[len++] = GUC_CONTEXT_ENABLE;
718 set_context_pending_enable(ce);
719 intel_context_get(ce);
720 g2h_len_dw = G2H_LEN_DW_SCHED_CONTEXT_MODE_SET;
721 } else {
722 action[len++] = INTEL_GUC_ACTION_SCHED_CONTEXT;
723 action[len++] = ce->guc_id.id;
724 }
725
726 err = intel_guc_send_nb(guc, action, len, g2h_len_dw);
727 if (!enabled && !err) {
728 trace_intel_context_sched_enable(ce);
729 atomic_inc(&guc->outstanding_submission_g2h);
730 set_context_enabled(ce);
731
732 /*
733 * Without multi-lrc KMD does the submission step (moving the
734 * lrc tail) so enabling scheduling is sufficient to submit the
735 * context. This isn't the case in multi-lrc submission as the
736 * GuC needs to move the tails, hence the need for another H2G
737 * to submit a multi-lrc context after enabling scheduling.
738 */
739 if (intel_context_is_parent(ce)) {
740 action[0] = INTEL_GUC_ACTION_SCHED_CONTEXT;
741 err = intel_guc_send_nb(guc, action, len - 1, 0);
Should this have a something like:
if (err)
intel_context_put(ce);
742 }
743 } else if (!enabled) {
744 clr_context_pending_enable(ce);
745 intel_context_put(ce);
746 }
747 if (likely(!err))
748 trace_i915_request_guc_submit(rq);
749
750 out:
751 spin_unlock(&ce->guc_state.lock);
--> 752 return err;
753 }
regards,
dan carpenter
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: [Intel-gfx] [bug report] drm/i915/guc: Implement multi-lrc submission
2022-09-22 14:26 [Intel-gfx] [bug report] drm/i915/guc: Implement multi-lrc submission Dan Carpenter
@ 2022-09-22 20:01 ` John Harrison
2022-09-23 9:32 ` Dan Carpenter
0 siblings, 1 reply; 3+ messages in thread
From: John Harrison @ 2022-09-22 20:01 UTC (permalink / raw)
To: Dan Carpenter, matthew.brost; +Cc: intel-gfx
On 9/22/2022 07:26, Dan Carpenter wrote:
> Hello Matthew Brost,
>
> The patch 6b540bf6f143: "drm/i915/guc: Implement multi-lrc
> submission" from Oct 14, 2021, leads to the following Smatch static
> checker warning:
>
> drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c:752 __guc_add_request()
> warn: refcount leak 'ce->ref.refcount.refs.counter': lines='752'
>
> drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> 672 static int __guc_add_request(struct intel_guc *guc, struct i915_request *rq)
> 673 {
> 674 int err = 0;
> 675 struct intel_context *ce = request_to_scheduling_context(rq);
> 676 u32 action[3];
> 677 int len = 0;
> 678 u32 g2h_len_dw = 0;
> 679 bool enabled;
> 680
> 681 lockdep_assert_held(&rq->engine->sched_engine->lock);
> 682
> 683 /*
> 684 * Corner case where requests were sitting in the priority list or a
> 685 * request resubmitted after the context was banned.
> 686 */
> 687 if (unlikely(intel_context_is_banned(ce))) {
> 688 i915_request_put(i915_request_mark_eio(rq));
> 689 intel_engine_signal_breadcrumbs(ce->engine);
> 690 return 0;
> 691 }
> 692
> 693 GEM_BUG_ON(!atomic_read(&ce->guc_id.ref));
> 694 GEM_BUG_ON(context_guc_id_invalid(ce));
> 695
> 696 if (context_policy_required(ce)) {
> 697 err = guc_context_policy_init_v70(ce, false);
> 698 if (err)
> 699 return err;
> 700 }
> 701
> 702 spin_lock(&ce->guc_state.lock);
> 703
> 704 /*
> 705 * The request / context will be run on the hardware when scheduling
> 706 * gets enabled in the unblock. For multi-lrc we still submit the
> 707 * context to move the LRC tails.
> 708 */
> 709 if (unlikely(context_blocked(ce) && !intel_context_is_parent(ce)))
> 710 goto out;
> 711
> 712 enabled = context_enabled(ce) || context_blocked(ce);
> 713
> 714 if (!enabled) {
> 715 action[len++] = INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_SET;
> 716 action[len++] = ce->guc_id.id;
> 717 action[len++] = GUC_CONTEXT_ENABLE;
> 718 set_context_pending_enable(ce);
> 719 intel_context_get(ce);
> 720 g2h_len_dw = G2H_LEN_DW_SCHED_CONTEXT_MODE_SET;
> 721 } else {
> 722 action[len++] = INTEL_GUC_ACTION_SCHED_CONTEXT;
> 723 action[len++] = ce->guc_id.id;
> 724 }
> 725
> 726 err = intel_guc_send_nb(guc, action, len, g2h_len_dw);
> 727 if (!enabled && !err) {
> 728 trace_intel_context_sched_enable(ce);
> 729 atomic_inc(&guc->outstanding_submission_g2h);
> 730 set_context_enabled(ce);
> 731
> 732 /*
> 733 * Without multi-lrc KMD does the submission step (moving the
> 734 * lrc tail) so enabling scheduling is sufficient to submit the
> 735 * context. This isn't the case in multi-lrc submission as the
> 736 * GuC needs to move the tails, hence the need for another H2G
> 737 * to submit a multi-lrc context after enabling scheduling.
> 738 */
> 739 if (intel_context_is_parent(ce)) {
> 740 action[0] = INTEL_GUC_ACTION_SCHED_CONTEXT;
> 741 err = intel_guc_send_nb(guc, action, len - 1, 0);
>
> Should this have a something like:
>
> if (err)
> intel_context_put(ce);
No.
The ce has been marked as enabled above because the actual enable call
succeeded. This is a secondary call to 'poke' the context. If this
fails, the context might not get scheduled in a timely manner but the
tracking state is still correct. The context is now in use by GuC and
therefore must be reference locked. And the 'context_enabled' flag is
set on the i915 side to note that the reference count is held.
If you want to unwind at this point, you would need to send a
CONTEXT_MODE_SET(DISABLE) H2G message (amongst other things) and only
once that call has completed, can you call intel_context_put(ce).
I don't get why the analyser would be claiming a leak here. I'm not sure
if it is just that the analyser is confused or if there is some other
potential route to a leak. Is it possible to get more details as to how
it thinks the leak can occur?
John.
>
> 742 }
> 743 } else if (!enabled) {
> 744 clr_context_pending_enable(ce);
> 745 intel_context_put(ce);
> 746 }
> 747 if (likely(!err))
> 748 trace_i915_request_guc_submit(rq);
> 749
> 750 out:
> 751 spin_unlock(&ce->guc_state.lock);
> --> 752 return err;
> 753 }
>
> regards,
> dan carpenter
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: [Intel-gfx] [bug report] drm/i915/guc: Implement multi-lrc submission
2022-09-22 20:01 ` John Harrison
@ 2022-09-23 9:32 ` Dan Carpenter
0 siblings, 0 replies; 3+ messages in thread
From: Dan Carpenter @ 2022-09-23 9:32 UTC (permalink / raw)
To: John Harrison; +Cc: intel-gfx
On Thu, Sep 22, 2022 at 01:01:48PM -0700, John Harrison wrote:
> On 9/22/2022 07:26, Dan Carpenter wrote:
> > Hello Matthew Brost,
> >
> > The patch 6b540bf6f143: "drm/i915/guc: Implement multi-lrc
> > submission" from Oct 14, 2021, leads to the following Smatch static
> > checker warning:
> >
> > drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c:752 __guc_add_request()
> > warn: refcount leak 'ce->ref.refcount.refs.counter': lines='752'
> >
> > drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > 672 static int __guc_add_request(struct intel_guc *guc, struct i915_request *rq)
> > 673 {
> > 674 int err = 0;
> > 675 struct intel_context *ce = request_to_scheduling_context(rq);
> > 676 u32 action[3];
> > 677 int len = 0;
> > 678 u32 g2h_len_dw = 0;
> > 679 bool enabled;
> > 680
> > 681 lockdep_assert_held(&rq->engine->sched_engine->lock);
> > 682
> > 683 /*
> > 684 * Corner case where requests were sitting in the priority list or a
> > 685 * request resubmitted after the context was banned.
> > 686 */
> > 687 if (unlikely(intel_context_is_banned(ce))) {
> > 688 i915_request_put(i915_request_mark_eio(rq));
> > 689 intel_engine_signal_breadcrumbs(ce->engine);
> > 690 return 0;
> > 691 }
> > 692
> > 693 GEM_BUG_ON(!atomic_read(&ce->guc_id.ref));
> > 694 GEM_BUG_ON(context_guc_id_invalid(ce));
> > 695
> > 696 if (context_policy_required(ce)) {
> > 697 err = guc_context_policy_init_v70(ce, false);
> > 698 if (err)
> > 699 return err;
> > 700 }
> > 701
> > 702 spin_lock(&ce->guc_state.lock);
> > 703
> > 704 /*
> > 705 * The request / context will be run on the hardware when scheduling
> > 706 * gets enabled in the unblock. For multi-lrc we still submit the
> > 707 * context to move the LRC tails.
> > 708 */
> > 709 if (unlikely(context_blocked(ce) && !intel_context_is_parent(ce)))
> > 710 goto out;
> > 711
> > 712 enabled = context_enabled(ce) || context_blocked(ce);
> > 713
> > 714 if (!enabled) {
> > 715 action[len++] = INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_SET;
> > 716 action[len++] = ce->guc_id.id;
> > 717 action[len++] = GUC_CONTEXT_ENABLE;
> > 718 set_context_pending_enable(ce);
> > 719 intel_context_get(ce);
> > 720 g2h_len_dw = G2H_LEN_DW_SCHED_CONTEXT_MODE_SET;
> > 721 } else {
> > 722 action[len++] = INTEL_GUC_ACTION_SCHED_CONTEXT;
> > 723 action[len++] = ce->guc_id.id;
> > 724 }
> > 725
> > 726 err = intel_guc_send_nb(guc, action, len, g2h_len_dw);
> > 727 if (!enabled && !err) {
> > 728 trace_intel_context_sched_enable(ce);
> > 729 atomic_inc(&guc->outstanding_submission_g2h);
> > 730 set_context_enabled(ce);
> > 731
> > 732 /*
> > 733 * Without multi-lrc KMD does the submission step (moving the
> > 734 * lrc tail) so enabling scheduling is sufficient to submit the
> > 735 * context. This isn't the case in multi-lrc submission as the
> > 736 * GuC needs to move the tails, hence the need for another H2G
> > 737 * to submit a multi-lrc context after enabling scheduling.
> > 738 */
> > 739 if (intel_context_is_parent(ce)) {
> > 740 action[0] = INTEL_GUC_ACTION_SCHED_CONTEXT;
> > 741 err = intel_guc_send_nb(guc, action, len - 1, 0);
> >
> > Should this have a something like:
> >
> > if (err)
> > intel_context_put(ce);
> No.
>
> The ce has been marked as enabled above because the actual enable call
> succeeded.? This is a secondary call to 'poke' the context. If this fails,
> the context might not get scheduled in a timely manner but the tracking
> state is still correct. The context is now in use by GuC and therefore must
> be reference locked. And the 'context_enabled' flag is set on the i915 side
> to note that the reference count is held.
>
> If you want to unwind at this point, you would need to send a
> CONTEXT_MODE_SET(DISABLE) H2G message (amongst other things) and only once
> that call has completed, can you call intel_context_put(ce).
>
> I don't get why the analyser would be claiming a leak here. I'm not sure if
> it is just that the analyser is confused or if there is some other potential
> route to a leak. Is it possible to get more details as to how it thinks the
> leak can occur?
Thanks, this helps me a lot!
This is a Smatch static analysis thing I'm working on. It simple enough
to silence the false positive. I add this line which says that after
set_context_enabled() then ignore the reference counting.
add_function_param_key_hook("set_context_enabled", &match_ignore,
0, "$->ref.refcount.refs.counter", NULL);
The heuristic that the check is using is that if some error paths drop
the refcount then all error paths should do it. So adding the if
statement I suggested would silence the warning (but introduce a bug in
the kernel).
regards,
dan carpenter
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2022-09-23 9:33 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-09-22 14:26 [Intel-gfx] [bug report] drm/i915/guc: Implement multi-lrc submission Dan Carpenter
2022-09-22 20:01 ` John Harrison
2022-09-23 9:32 ` Dan Carpenter
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.