On 16/01/2019 18:04, Lionel Landwerlin wrote:

On 16/01/2019 16:31, Chris Wilson wrote:

Quoting Lionel Landwerlin (2019-01-16 16:25:26)

On 16/01/2019 16:05, Chris Wilson wrote:

Quoting Lionel Landwerlin (2019-01-16 15:58:00)

On 16/01/2019 15:52, Chris Wilson wrote:

Quoting Lionel Landwerlin (2019-01-16 15:36:20)

@@ -1877,6 +1883,21 @@ struct drm_i915_private {
                           wait_queue_head_t poll_wq;
                           bool pollin;
+                       /**
+                        * Atomic counter incremented by the interrupt
+                        * handling code for each OA half full interrupt
+                        * received.
+                        */
+                       atomic64_t half_full_count;
+
+                       /**
+                        * Copy of the atomic half_full_count that was last
+                        * processed in the i915-perf driver. If both counters
+                        * differ, there is data available to read in the OA
+                        * buffer.
+                        */
+                       u64 half_full_count_last;

Eh? But why a relatively expensive atomic64. You only need one bit, and
reading the tail pointer from WB memory should just be cheap. You should
be able to sample the perf ringbuffer pointers very cheaply... What am I
missing?
-Chris

Fair comment.

The thing is, with this series there are two mechanisms that notify the poll_wq.

One is the hrtimer that kicks in at regular intervals and polls the
register with the workaround.

The other is the interrupt, which doesn't read the registers or apply the workaround.

What's the complication with the workaround?

It's a bit more than just looking at registers: we actually have to look
at the content of the buffer to figure out what landed in memory.

The register values are not synchronized with the memory writes...

I don't want to look at registers at all for polling, and you shouldn't
need to since communication is via a ringbuf.

There is a comment in the code (i915_perf_poll_locked) about not
checking the register after each wakeup because that may be a very hot path.

The atomic64 sounded like a lesser evil.
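For reference, the counter scheme from the patch can be modelled in userspace roughly as below. This is only a sketch using C11 atomics in place of the kernel's atomic64_t. The struct and function names are hypothetical, and where half_full_count_last gets updated is an assumption, since the quoted hunk only shows the two fields.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace model of the two fields added by the patch. */
struct oa_poll_state {
	_Atomic uint64_t half_full_count; /* bumped by the interrupt handler */
	uint64_t half_full_count_last;    /* last value processed by the driver */
};

/* Interrupt side: one increment per OA half-full interrupt. */
static void oa_half_full_irq(struct oa_poll_state *s)
{
	atomic_fetch_add_explicit(&s->half_full_count, 1, memory_order_release);
}

/* Poll side: data is pending if the two counters differ. No register read. */
static bool oa_poll_has_data(struct oa_poll_state *s)
{
	uint64_t now = atomic_load_explicit(&s->half_full_count,
					    memory_order_acquire);
	return now != s->half_full_count_last;
}

/* Assumed: called by the read path once the pending data was consumed. */
static void oa_mark_processed(struct oa_poll_state *s)
{
	s->half_full_count_last = atomic_load_explicit(&s->half_full_count,
							memory_order_acquire);
}
```

The poll path then only costs one atomic load and a compare, which is the point of not touching the OA registers there.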

I'm clearly not understanding something here...

Does the hardware not do:
    update ringbuf data;
    wmb() (or post to global observation point in their parlance)
    update ringbuf tail
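The ordering asked about above is the usual publish pattern for a single-producer ring buffer. A minimal software analogy in C11 atomics follows. All names and the ring size are hypothetical, the release store stands in for the wmb(), and this is not a statement about what the OA hardware actually does.

```c
#include <stdatomic.h>
#include <stdint.h>

#define RING_SIZE 4096u /* hypothetical; must be a power of two */

struct ring {
	uint8_t data[RING_SIZE];
	_Atomic uint32_t tail; /* free-running offset, written by the producer */
	uint32_t head;         /* free-running offset, private to the consumer */
};

/* Producer: write the payload first, then publish the new tail.
 * The caller is assumed to have checked that len bytes are free. */
static void ring_publish(struct ring *r, const void *src, uint32_t len)
{
	uint32_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
	const uint8_t *p = src;

	for (uint32_t i = 0; i < len; i++)
		r->data[(tail + i) % RING_SIZE] = p[i];

	/* Release: the data writes above are visible before the tail update. */
	atomic_store_explicit(&r->tail, tail + len, memory_order_release);
}

/* Consumer: after the acquire load, everything in [head, tail) is valid. */
static uint32_t ring_consume(struct ring *r, void *dst, uint32_t max)
{
	uint32_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);
	uint32_t avail = tail - r->head;
	uint32_t n = avail < max ? avail : max;
	uint8_t *p = dst;

	for (uint32_t i = 0; i < n; i++)
		p[i] = r->data[(r->head + i) % RING_SIZE];

	r->head += n;
	return n;
}
```

With that ordering the consumer never needs to inspect the buffer contents to decide what has landed; the tail alone is authoritative.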

As far as I understand, the OA unit:

    sends its memory write requests to the memory controller

    immediately updates the ringbuf tail, without waiting for the previous requests to complete

By experimentation, I haven't seen the delta between what is available in memory and what the OA tail register indicates exceed 768 bytes, which is about 3 OA reports at the largest size.
There is probably a maximum number of write requests the OA unit can queue before blocking.

So again, maybe you would prefer a 2-stage mechanism:

OA interrupt -----> head/tail pointer worker  -----> wake up userspace

                    hrtimer head/tail pointer --|

Then we just need to sample the ringbuf tail and compare against how far
we read last time?
-Chris
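The check suggested above (sample the tail, compare against the last read position) could look roughly like this. It is a sketch only: the names are hypothetical, the buffer size is an assumed power of two, and it presumes the sampled tail can be trusted, which is the open question in this thread.

```c
#include <stdbool.h>
#include <stdint.h>

#define OA_BUF_SIZE (16u * 1024u * 1024u) /* assumed power-of-two size */

/* Hypothetical reader state: offset up to which data has been consumed. */
struct oa_reader {
	uint32_t last_read;
};

/* Bytes between the last read position and the sampled tail, with wrap. */
static uint32_t oa_bytes_available(const struct oa_reader *rd, uint32_t tail)
{
	return (tail - rd->last_read) & (OA_BUF_SIZE - 1);
}

/* Wake pollers only if the tail moved past what was already read. */
static bool oa_should_wake(const struct oa_reader *rd, uint32_t tail)
{
	return oa_bytes_available(rd, tail) != 0;
}
```

No counter is needed in this form: both the interrupt and the hrtimer would feed the same sampled tail into oa_should_wake().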

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx