* (no subject)
@ 2012-07-19 20:00 Olivier Galibert
2012-07-19 20:00 ` [PATCH 1/9] intel gen4-5: fix the vue view in the fs Olivier Galibert
` (8 more replies)
0 siblings, 9 replies; 41+ messages in thread
From: Olivier Galibert @ 2012-07-19 20:00 UTC (permalink / raw)
To: intel-gfx, mesa-dev
Hi,
This is the second version of the clipping/interpolation patches.
Main differences:
- I tried to take all of Paul's remarks into account
- I split the first patch into 4 independent ones
- I've added a patch to ensure that integers pass through unscathed
Patch 4/9 is (slightly) controversial. There may be better ways to do
it, or at least more general ones. But it's simple, it works, and it
allows validating the other 8. It's an easy one to revert if we
build an alternative.
Best,
OG.
[PATCH 1/9] intel gen4-5: fix the vue view in the fs.
[PATCH 2/9] intel gen4-5: simplify the bfc copy in the sf.
[PATCH 3/9] intel gen4-5: fix GL_VERTEX_PROGRAM_TWO_SIDE selection.
[PATCH 4/9] intel gen4-5: Fix backface/frontface selection when one
[PATCH 5/9] intel gen4-5: Compute the interpolation status for every
[PATCH 6/9] intel gen4-5: Correctly setup the parameters in the sf.
[PATCH 7/9] intel gen4-5: Correctly handle flat vs. non-flat in the
[PATCH 8/9] intel gen4-5: Make noperspective clipping work.
[PATCH 9/9] intel gen4-5: Don't touch flatshaded values when
^ permalink raw reply [flat|nested] 41+ messages in thread
* [PATCH 1/9] intel gen4-5: fix the vue view in the fs.
2012-07-19 20:00 (no subject) Olivier Galibert
@ 2012-07-19 20:00 ` Olivier Galibert
2012-07-26 17:18 ` [Mesa-dev] " Eric Anholt
2012-07-19 20:00 ` [PATCH 2/9] intel gen4-5: simplify the bfc copy in the sf Olivier Galibert
` (7 subsequent siblings)
8 siblings, 1 reply; 41+ messages in thread
From: Olivier Galibert @ 2012-07-19 20:00 UTC (permalink / raw)
To: intel-gfx, mesa-dev; +Cc: Olivier Galibert
In some cases the fragment shader view of the vue registers was out of
sync with the builder. This fixes it.
Signed-off-by: Olivier Galibert <galibert@pobox.com>
---
src/mesa/drivers/dri/i965/brw_fs.cpp | 9 ++++++++-
src/mesa/drivers/dri/i965/brw_wm_pass2.c | 10 +++++++++-
2 files changed, 17 insertions(+), 2 deletions(-)
diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp
index b3b25cc..3f98137 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -972,8 +972,15 @@ fs_visitor::calculate_urb_setup()
if (c->key.vp_outputs_written & BITFIELD64_BIT(i)) {
int fp_index = _mesa_vert_result_to_frag_attrib((gl_vert_result) i);
+ /* The back color slot is skipped when the front color is
+ * also written to. In addition, some slots can be
+ * written in the vertex shader and not read in the
+ * fragment shader. So the register number must always be
+ * incremented, mapped or not.
+ */
if (fp_index >= 0)
- urb_setup[fp_index] = urb_next++;
+ urb_setup[fp_index] = urb_next;
+ urb_next++;
}
}
diff --git a/src/mesa/drivers/dri/i965/brw_wm_pass2.c b/src/mesa/drivers/dri/i965/brw_wm_pass2.c
index 27c0a94..eacf7c0 100644
--- a/src/mesa/drivers/dri/i965/brw_wm_pass2.c
+++ b/src/mesa/drivers/dri/i965/brw_wm_pass2.c
@@ -97,8 +97,16 @@ static void init_registers( struct brw_wm_compile *c )
int fp_index = _mesa_vert_result_to_frag_attrib(j);
nr_interp_regs++;
+
+ /* The back color slot is skipped when the front color is
+ * also written to. In addition, some slots can be
+ * written in the vertex shader and not read in the
+ * fragment shader. So the register number must always be
+ * incremented, mapped or not.
+ */
if (fp_index >= 0)
- prealloc_reg(c, &c->payload.input_interp[fp_index], i++);
+ prealloc_reg(c, &c->payload.input_interp[fp_index], i);
+ i++;
}
}
assert(nr_interp_regs >= 1);
--
1.7.10.280.gaa39
^ permalink raw reply related [flat|nested] 41+ messages in thread
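The numbering rule fixed by this patch can be illustrated outside the driver. What follows is a standalone C model, not the Mesa code: assign_urb_slots, slot_to_frag, NUM_SLOTS and the mapping table are names and values made up for the example. It only shows that the register counter advances for every written vertex output, whether or not that output maps to a fragment input.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_SLOTS 8   /* hypothetical number of vertex outputs */

/* Hypothetical vertex-output -> fragment-input table; -1 means the
 * output has no fragment-shader counterpart. */
static const int slot_to_frag[NUM_SLOTS] = { -1, 0, 1, -1, -1, 2, 3, 4 };

/* Fill urb_setup[] (indexed by fragment input) with the register index
 * of each mapped input, -1 for inputs that are not fed.  Returns the
 * number of registers consumed. */
static int assign_urb_slots(uint64_t outputs_written, int urb_setup[], int n_frag)
{
    int urb_next = 0;

    for (int i = 0; i < n_frag; i++)
        urb_setup[i] = -1;

    for (int i = 0; i < NUM_SLOTS; i++) {
        if (!(outputs_written & (UINT64_C(1) << i)))
            continue;

        int fp_index = slot_to_frag[i];
        if (fp_index >= 0) {
            assert(fp_index < n_frag);
            urb_setup[fp_index] = urb_next;
        }
        /* Always advance: an unmapped output still occupies a register. */
        urb_next++;
    }
    return urb_next;
}
```

With outputs 0, 1, 3 and 5 written, fragment input 0 lands in register 1 and fragment input 2 in register 3, because the two unmapped outputs each consume a register.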
* Re: [Mesa-dev] [PATCH 1/9] intel gen4-5: fix the vue view in the fs.
2012-07-19 20:00 ` [PATCH 1/9] intel gen4-5: fix the vue view in the fs Olivier Galibert
@ 2012-07-26 17:18 ` Eric Anholt
2012-07-27 9:21 ` Olivier Galibert
0 siblings, 1 reply; 41+ messages in thread
From: Eric Anholt @ 2012-07-26 17:18 UTC (permalink / raw)
To: intel-gfx, mesa-dev; +Cc: Olivier Galibert
[-- Attachment #1.1: Type: text/plain, Size: 408 bytes --]
Olivier Galibert <galibert@pobox.com> writes:
> In some cases the fragment shader view of the vue registers was out of
> sync with the builder. This fixes it.
s/builder/SF outputs/ ?
I'd love to see the pre-gen6 code get rearranged so the FS walked the
bitfield of FS inputs from SF and chose the urb offset for each. But
this does look like the minimal fix.
Reviewed-by: Eric Anholt <eric@anholt.net>
[-- Attachment #1.2: Type: application/pgp-signature, Size: 197 bytes --]
[-- Attachment #2: Type: text/plain, Size: 159 bytes --]
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [Mesa-dev] [PATCH 1/9] intel gen4-5: fix the vue view in the fs.
2012-07-26 17:18 ` [Mesa-dev] " Eric Anholt
@ 2012-07-27 9:21 ` Olivier Galibert
0 siblings, 0 replies; 41+ messages in thread
From: Olivier Galibert @ 2012-07-27 9:21 UTC (permalink / raw)
To: Eric Anholt; +Cc: mesa-dev, intel-gfx
On Thu, Jul 26, 2012 at 10:18:01AM -0700, Eric Anholt wrote:
> Olivier Galibert <galibert@pobox.com> writes:
>
> > In some cases the fragment shader view of the vue registers was out of
> > sync with the builder. This fixes it.
>
> s/builder/SF outputs/ ?
>
> I'd love to see the pre-gen6 code get rearranged so the FS walked the
> bitfield of FS inputs from SF and chose the urb offset for each. But
> this does look like the minimal fix.
In other words, an explicit linking pass? That could be useful with
geometry shaders, too.
OG.
^ permalink raw reply [flat|nested] 41+ messages in thread
* [PATCH 2/9] intel gen4-5: simplify the bfc copy in the sf.
2012-07-19 20:00 (no subject) Olivier Galibert
2012-07-19 20:00 ` [PATCH 1/9] intel gen4-5: fix the vue view in the fs Olivier Galibert
@ 2012-07-19 20:00 ` Olivier Galibert
2012-07-26 17:20 ` Eric Anholt
2012-07-19 20:00 ` [PATCH 3/9] intel gen4-5: fix GL_VERTEX_PROGRAM_TWO_SIDE selection Olivier Galibert
` (6 subsequent siblings)
8 siblings, 1 reply; 41+ messages in thread
From: Olivier Galibert @ 2012-07-19 20:00 UTC (permalink / raw)
To: intel-gfx, mesa-dev; +Cc: Olivier Galibert
This patch is mostly designed to make followup patches simpler, but
it's a simplification by itself.
Signed-off-by: Olivier Galibert <galibert@pobox.com>
---
src/mesa/drivers/dri/i965/brw_sf_emit.c | 93 +++++++++++++++++--------------
1 file changed, 52 insertions(+), 41 deletions(-)
diff --git a/src/mesa/drivers/dri/i965/brw_sf_emit.c b/src/mesa/drivers/dri/i965/brw_sf_emit.c
index ff6383b..9d8aa38 100644
--- a/src/mesa/drivers/dri/i965/brw_sf_emit.c
+++ b/src/mesa/drivers/dri/i965/brw_sf_emit.c
@@ -79,24 +79,9 @@ have_attr(struct brw_sf_compile *c, GLuint attr)
/***********************************************************************
* Twoside lighting
*/
-static void copy_bfc( struct brw_sf_compile *c,
- struct brw_reg vert )
-{
- struct brw_compile *p = &c->func;
- GLuint i;
-
- for (i = 0; i < 2; i++) {
- if (have_attr(c, VERT_RESULT_COL0+i) &&
- have_attr(c, VERT_RESULT_BFC0+i))
- brw_MOV(p,
- get_vert_result(c, vert, VERT_RESULT_COL0+i),
- get_vert_result(c, vert, VERT_RESULT_BFC0+i));
- }
-}
-
-
static void do_twoside_color( struct brw_sf_compile *c )
{
+ GLuint i, need_0, need_1;
struct brw_compile *p = &c->func;
GLuint backface_conditional = c->key.frontface_ccw ? BRW_CONDITIONAL_G : BRW_CONDITIONAL_L;
@@ -105,12 +90,14 @@ static void do_twoside_color( struct brw_sf_compile *c )
if (c->key.primitive == SF_UNFILLED_TRIS)
return;
- /* XXX: What happens if BFC isn't present? This could only happen
- * for user-supplied vertex programs, as t_vp_build.c always does
- * the right thing.
+ /* If the vertex shader provides both front and backface color, do
+ * the selection. Otherwise the generated code will pick up
+ * whichever there is.
*/
- if (!(have_attr(c, VERT_RESULT_COL0) && have_attr(c, VERT_RESULT_BFC0)) &&
- !(have_attr(c, VERT_RESULT_COL1) && have_attr(c, VERT_RESULT_BFC1)))
+ need_0 = have_attr(c, VERT_RESULT_COL0) && have_attr(c, VERT_RESULT_BFC0);
+ need_1 = have_attr(c, VERT_RESULT_COL1) && have_attr(c, VERT_RESULT_BFC1);
+
+ if (!need_0 && !need_1)
return;
/* Need to use BRW_EXECUTE_4 and also do an 4-wide compare in order
@@ -121,12 +108,15 @@ static void do_twoside_color( struct brw_sf_compile *c )
brw_push_insn_state(p);
brw_CMP(p, vec4(brw_null_reg()), backface_conditional, c->det, brw_imm_f(0));
brw_IF(p, BRW_EXECUTE_4);
- {
- switch (c->nr_verts) {
- case 3: copy_bfc(c, c->vert[2]);
- case 2: copy_bfc(c, c->vert[1]);
- case 1: copy_bfc(c, c->vert[0]);
- }
+ for (i=0; i<c->nr_verts; i++) {
+ if (need_0)
+ brw_MOV(p,
+ get_vert_result(c, c->vert[i], VERT_RESULT_COL0),
+ get_vert_result(c, c->vert[i], VERT_RESULT_BFC0));
+ if (need_1)
+ brw_MOV(p,
+ get_vert_result(c, c->vert[i], VERT_RESULT_COL1),
+ get_vert_result(c, c->vert[i], VERT_RESULT_BFC1));
}
brw_ENDIF(p);
brw_pop_insn_state(p);
@@ -139,20 +129,27 @@ static void do_twoside_color( struct brw_sf_compile *c )
*/
#define VERT_RESULT_COLOR_BITS (BITFIELD64_BIT(VERT_RESULT_COL0) | \
- BITFIELD64_BIT(VERT_RESULT_COL1))
+ BITFIELD64_BIT(VERT_RESULT_COL1))
static void copy_colors( struct brw_sf_compile *c,
struct brw_reg dst,
- struct brw_reg src)
+ struct brw_reg src,
+ int allow_twoside)
{
struct brw_compile *p = &c->func;
GLuint i;
for (i = VERT_RESULT_COL0; i <= VERT_RESULT_COL1; i++) {
- if (have_attr(c,i))
+ if (have_attr(c,i)) {
brw_MOV(p,
get_vert_result(c, dst, i),
get_vert_result(c, src, i));
+
+ } else if(allow_twoside && have_attr(c, i - VERT_RESULT_COL0 + VERT_RESULT_BFC0)) {
+ brw_MOV(p,
+ get_vert_result(c, dst, i - VERT_RESULT_COL0 + VERT_RESULT_BFC0),
+ get_vert_result(c, src, i - VERT_RESULT_COL0 + VERT_RESULT_BFC0));
+ }
}
}
@@ -167,9 +164,19 @@ static void do_flatshade_triangle( struct brw_sf_compile *c )
struct brw_compile *p = &c->func;
struct intel_context *intel = &p->brw->intel;
struct brw_reg ip = brw_ip_reg();
- GLuint nr = _mesa_bitcount_64(c->key.attrs & VERT_RESULT_COLOR_BITS);
GLuint jmpi = 1;
+ GLuint nr;
+
+ if (c->key.do_twoside_color) {
+ nr = ((c->key.attrs & (BITFIELD64_BIT(VERT_RESULT_COL0) | BITFIELD64_BIT(VERT_RESULT_BFC0))) != 0) +
+ ((c->key.attrs & (BITFIELD64_BIT(VERT_RESULT_COL1) | BITFIELD64_BIT(VERT_RESULT_BFC1))) != 0);
+
+ } else {
+ nr = ((c->key.attrs & BITFIELD64_BIT(VERT_RESULT_COL0)) != 0) +
+ ((c->key.attrs & BITFIELD64_BIT(VERT_RESULT_COL1)) != 0);
+ }
+
if (!nr)
return;
@@ -186,16 +193,16 @@ static void do_flatshade_triangle( struct brw_sf_compile *c )
brw_MUL(p, c->pv, c->pv, brw_imm_d(jmpi*(nr*2+1)));
brw_JMPI(p, ip, ip, c->pv);
- copy_colors(c, c->vert[1], c->vert[0]);
- copy_colors(c, c->vert[2], c->vert[0]);
+ copy_colors(c, c->vert[1], c->vert[0], c->key.do_twoside_color);
+ copy_colors(c, c->vert[2], c->vert[0], c->key.do_twoside_color);
brw_JMPI(p, ip, ip, brw_imm_d(jmpi*(nr*4+1)));
- copy_colors(c, c->vert[0], c->vert[1]);
- copy_colors(c, c->vert[2], c->vert[1]);
+ copy_colors(c, c->vert[0], c->vert[1], c->key.do_twoside_color);
+ copy_colors(c, c->vert[2], c->vert[1], c->key.do_twoside_color);
brw_JMPI(p, ip, ip, brw_imm_d(jmpi*nr*2));
- copy_colors(c, c->vert[0], c->vert[2]);
- copy_colors(c, c->vert[1], c->vert[2]);
+ copy_colors(c, c->vert[0], c->vert[2], c->key.do_twoside_color);
+ copy_colors(c, c->vert[1], c->vert[2], c->key.do_twoside_color);
brw_pop_insn_state(p);
}
@@ -224,10 +231,10 @@ static void do_flatshade_line( struct brw_sf_compile *c )
brw_MUL(p, c->pv, c->pv, brw_imm_d(jmpi*(nr+1)));
brw_JMPI(p, ip, ip, c->pv);
- copy_colors(c, c->vert[1], c->vert[0]);
+ copy_colors(c, c->vert[1], c->vert[0], 0);
brw_JMPI(p, ip, ip, brw_imm_ud(jmpi*nr));
- copy_colors(c, c->vert[0], c->vert[1]);
+ copy_colors(c, c->vert[0], c->vert[1], 0);
brw_pop_insn_state(p);
}
@@ -337,13 +344,17 @@ calculate_masks(struct brw_sf_compile *c,
if (c->key.do_flat_shading)
persp_mask = c->key.attrs & ~(BITFIELD64_BIT(VERT_RESULT_HPOS) |
BITFIELD64_BIT(VERT_RESULT_COL0) |
- BITFIELD64_BIT(VERT_RESULT_COL1));
+ BITFIELD64_BIT(VERT_RESULT_COL1) |
+ BITFIELD64_BIT(VERT_RESULT_BFC0) |
+ BITFIELD64_BIT(VERT_RESULT_BFC1));
else
persp_mask = c->key.attrs & ~(BITFIELD64_BIT(VERT_RESULT_HPOS));
if (c->key.do_flat_shading)
linear_mask = c->key.attrs & ~(BITFIELD64_BIT(VERT_RESULT_COL0) |
- BITFIELD64_BIT(VERT_RESULT_COL1));
+ BITFIELD64_BIT(VERT_RESULT_COL1) |
+ BITFIELD64_BIT(VERT_RESULT_BFC0) |
+ BITFIELD64_BIT(VERT_RESULT_BFC1));
else
linear_mask = c->key.attrs;
--
1.7.10.280.gaa39
^ permalink raw reply related [flat|nested] 41+ messages in thread
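The way do_flatshade_triangle() counts color channels after this patch can be modelled in isolation. The sketch below is a simplified stand-in under stated assumptions: the bit positions and the name count_flat_colors are invented for the example, and only the counting logic mirrors the diff. With two-sided color enabled, a channel counts if either its front or its back color is present; otherwise only the front color counts.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit positions standing in for the VERT_RESULT_* values. */
enum { COL0 = 1, COL1 = 2, BFC0 = 3, BFC1 = 4 };
#define BIT(b) (UINT64_C(1) << (b))

/* Number of color channels (0, 1 or 2) that flat shading has to copy. */
static unsigned count_flat_colors(uint64_t attrs, bool twoside)
{
    uint64_t chan0 = BIT(COL0);
    uint64_t chan1 = BIT(COL1);

    if (twoside) {
        /* A back color alone is enough to occupy the channel. */
        chan0 |= BIT(BFC0);
        chan1 |= BIT(BFC1);
    }
    return ((attrs & chan0) != 0) + ((attrs & chan1) != 0);
}
```

The count matters because the generated JMPI offsets are computed from nr, so it has to match the number of MOVs that copy_colors() emits.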
* Re: [PATCH 2/9] intel gen4-5: simplify the bfc copy in the sf.
2012-07-19 20:00 ` [PATCH 2/9] intel gen4-5: simplify the bfc copy in the sf Olivier Galibert
@ 2012-07-26 17:20 ` Eric Anholt
0 siblings, 0 replies; 41+ messages in thread
From: Eric Anholt @ 2012-07-26 17:20 UTC (permalink / raw)
To: intel-gfx, mesa-dev; +Cc: Olivier Galibert
[-- Attachment #1.1: Type: text/plain, Size: 961 bytes --]
Olivier Galibert <galibert@pobox.com> writes:
> @@ -121,12 +108,15 @@ static void do_twoside_color( struct brw_sf_compile *c )
> brw_push_insn_state(p);
> brw_CMP(p, vec4(brw_null_reg()), backface_conditional, c->det, brw_imm_f(0));
> brw_IF(p, BRW_EXECUTE_4);
> - {
> - switch (c->nr_verts) {
> - case 3: copy_bfc(c, c->vert[2]);
> - case 2: copy_bfc(c, c->vert[1]);
> - case 1: copy_bfc(c, c->vert[0]);
> - }
> + for (i=0; i<c->nr_verts; i++) {
We tend to put spaces around our binary operators.
> + if (need_0)
> + brw_MOV(p,
> + get_vert_result(c, c->vert[i], VERT_RESULT_COL0),
> + get_vert_result(c, c->vert[i], VERT_RESULT_BFC0));
> + if (need_1)
> + brw_MOV(p,
> + get_vert_result(c, c->vert[i], VERT_RESULT_COL1),
> + get_vert_result(c, c->vert[i], VERT_RESULT_BFC1));
trim trailing whitespace.
Other than that,
Reviewed-by: Eric Anholt <eric@anholt.net>
[-- Attachment #1.2: Type: application/pgp-signature, Size: 197 bytes --]
^ permalink raw reply [flat|nested] 41+ messages in thread
* [PATCH 3/9] intel gen4-5: fix GL_VERTEX_PROGRAM_TWO_SIDE selection.
2012-07-19 20:00 (no subject) Olivier Galibert
2012-07-19 20:00 ` [PATCH 1/9] intel gen4-5: fix the vue view in the fs Olivier Galibert
2012-07-19 20:00 ` [PATCH 2/9] intel gen4-5: simplify the bfc copy in the sf Olivier Galibert
@ 2012-07-19 20:00 ` Olivier Galibert
2012-07-26 17:19 ` Eric Anholt
2012-07-19 20:00 ` [PATCH 4/9] intel gen4-5: Fix backface/frontface selection when only one color is written to Olivier Galibert
` (5 subsequent siblings)
8 siblings, 1 reply; 41+ messages in thread
From: Olivier Galibert @ 2012-07-19 20:00 UTC (permalink / raw)
To: intel-gfx, mesa-dev; +Cc: Olivier Galibert
The previous code only enabled two-sided color in pure fixed-function setups.
This version also enables it when needed with shader programs.
Signed-off-by: Olivier Galibert <galibert@pobox.com>
---
src/mesa/drivers/dri/i965/brw_sf.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/src/mesa/drivers/dri/i965/brw_sf.c b/src/mesa/drivers/dri/i965/brw_sf.c
index 23a874a..791210f 100644
--- a/src/mesa/drivers/dri/i965/brw_sf.c
+++ b/src/mesa/drivers/dri/i965/brw_sf.c
@@ -192,7 +192,7 @@ brw_upload_sf_prog(struct brw_context *brw)
/* _NEW_LIGHT */
key.do_flat_shading = (ctx->Light.ShadeModel == GL_FLAT);
- key.do_twoside_color = (ctx->Light.Enabled && ctx->Light.Model.TwoSide);
+ key.do_twoside_color = ctx->VertexProgram._TwoSideEnabled;
/* _NEW_POLYGON */
if (key.do_twoside_color) {
--
1.7.10.280.gaa39
^ permalink raw reply related [flat|nested] 41+ messages in thread
* Re: [PATCH 3/9] intel gen4-5: fix GL_VERTEX_PROGRAM_TWO_SIDE selection.
2012-07-19 20:00 ` [PATCH 3/9] intel gen4-5: fix GL_VERTEX_PROGRAM_TWO_SIDE selection Olivier Galibert
@ 2012-07-26 17:19 ` Eric Anholt
0 siblings, 0 replies; 41+ messages in thread
From: Eric Anholt @ 2012-07-26 17:19 UTC (permalink / raw)
To: intel-gfx, mesa-dev; +Cc: Olivier Galibert
[-- Attachment #1.1: Type: text/plain, Size: 1051 bytes --]
Olivier Galibert <galibert@pobox.com> writes:
> The previous code only enabled two-sided color in pure fixed-function setups.
> This version also enables it when needed with shader programs.
>
> Signed-off-by: Olivier Galibert <galibert@pobox.com>
> ---
> src/mesa/drivers/dri/i965/brw_sf.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_sf.c b/src/mesa/drivers/dri/i965/brw_sf.c
> index 23a874a..791210f 100644
> --- a/src/mesa/drivers/dri/i965/brw_sf.c
> +++ b/src/mesa/drivers/dri/i965/brw_sf.c
> @@ -192,7 +192,7 @@ brw_upload_sf_prog(struct brw_context *brw)
>
> /* _NEW_LIGHT */
> key.do_flat_shading = (ctx->Light.ShadeModel == GL_FLAT);
> - key.do_twoside_color = (ctx->Light.Enabled && ctx->Light.Model.TwoSide);
> + key.do_twoside_color = ctx->VertexProgram._TwoSideEnabled;
ctx->VertexProgram._TwoSideEnabled is also changed when _NEW_PROGRAM is
set, so that should be noted in the _NEW_LIGHT comment above and
included in brw_sf_prog.dirty.mesa.
[-- Attachment #1.2: Type: application/pgp-signature, Size: 197 bytes --]
^ permalink raw reply [flat|nested] 41+ messages in thread
* [PATCH 4/9] intel gen4-5: Fix backface/frontface selection when only one color is written to.
2012-07-19 20:00 (no subject) Olivier Galibert
` (2 preceding siblings ...)
2012-07-19 20:00 ` [PATCH 3/9] intel gen4-5: fix GL_VERTEX_PROGRAM_TWO_SIDE selection Olivier Galibert
@ 2012-07-19 20:00 ` Olivier Galibert
2012-07-20 17:01 ` Eric Anholt
2012-07-19 20:00 ` [PATCH 5/9] intel gen4-5: Compute the interpolation status for every variable in one place Olivier Galibert
` (4 subsequent siblings)
8 siblings, 1 reply; 41+ messages in thread
From: Olivier Galibert @ 2012-07-19 20:00 UTC (permalink / raw)
To: intel-gfx, mesa-dev; +Cc: Olivier Galibert
Shaders, piglit test ones in particular, may write only to one of
gl_FrontColor/gl_BackColor. The standard is unclear on whether the
behaviour is defined in that case, but it seems reasonable to support
it.
The choice made here is to pick up whichever color was actually written
to. That makes most of the generated piglit tests useless to test the
backface selection, but it's simple and it works.
Signed-off-by: Olivier Galibert <galibert@pobox.com>
---
src/mesa/drivers/dri/i965/brw_fs.cpp | 9 +++++++++
src/mesa/drivers/dri/i965/brw_wm_pass2.c | 9 +++++++++
2 files changed, 18 insertions(+)
diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp
index 3f98137..3b62952 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -972,6 +972,15 @@ fs_visitor::calculate_urb_setup()
if (c->key.vp_outputs_written & BITFIELD64_BIT(i)) {
int fp_index = _mesa_vert_result_to_frag_attrib((gl_vert_result) i);
+ /* Special case: two-sided vertex option, vertex program
+ * only writes to the back color. Map it to the
+ * associated front color location.
+ */
+ if (i >= VERT_RESULT_BFC0 && i <= VERT_RESULT_BFC1 &&
+ ctx->VertexProgram._TwoSideEnabled &&
+ urb_setup[i - VERT_RESULT_BFC0 + FRAG_ATTRIB_COL0] == -1)
+ fp_index = i - VERT_RESULT_BFC0 + FRAG_ATTRIB_COL0;
+
/* The back color slot is skipped when the front color is
* also written to. In addition, some slots can be
* written in the vertex shader and not read in the
diff --git a/src/mesa/drivers/dri/i965/brw_wm_pass2.c b/src/mesa/drivers/dri/i965/brw_wm_pass2.c
index eacf7c0..48143f3 100644
--- a/src/mesa/drivers/dri/i965/brw_wm_pass2.c
+++ b/src/mesa/drivers/dri/i965/brw_wm_pass2.c
@@ -96,6 +96,15 @@ static void init_registers( struct brw_wm_compile *c )
if (c->key.vp_outputs_written & BITFIELD64_BIT(j)) {
int fp_index = _mesa_vert_result_to_frag_attrib(j);
+ /* Special case: two-sided vertex option, vertex program
+ * only writes to the back color. Map it to the
+ * associated front color location.
+ */
+ if (j >= VERT_RESULT_BFC0 && j <= VERT_RESULT_BFC1 &&
+ intel->ctx.VertexProgram._TwoSideEnabled &&
+ !(c->key.vp_outputs_written & BITFIELD64_BIT(j - VERT_RESULT_BFC0 + VERT_RESULT_COL0)))
+ fp_index = j - VERT_RESULT_BFC0 + FRAG_ATTRIB_COL0;
+
nr_interp_regs++;
/* The back color slot is skipped when the front color is
--
1.7.10.280.gaa39
^ permalink raw reply related [flat|nested] 41+ messages in thread
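The special case added by this patch can be sketched as a small standalone function. This is a model under stated assumptions, not the driver code: the VR_* and FA_* enumerators, default_frag_index and frag_index are made up for the example, and the two_side flag is passed in as a plain parameter. It follows the brw_wm_pass2.c variant, which tests whether the matching front color was written.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for VERT_RESULT_* and FRAG_ATTRIB_*. */
enum { VR_COL0 = 1, VR_COL1 = 2, VR_BFC0 = 3, VR_BFC1 = 4, VR_TEX0 = 5 };
enum { FA_COL0 = 0, FA_COL1 = 1, FA_TEX0 = 2 };
#define BIT(b) (UINT64_C(1) << (b))

/* Default mapping: back colors have no fragment attribute of their own. */
static int default_frag_index(int vr)
{
    switch (vr) {
    case VR_COL0: return FA_COL0;
    case VR_COL1: return FA_COL1;
    case VR_TEX0: return FA_TEX0;
    default:      return -1;
    }
}

/* Fragment attribute fed by vertex output vr, or -1 if none. */
static int frag_index(int vr, uint64_t outputs_written, bool two_side)
{
    assert(vr >= 0 && vr < 64);
    int fp_index = default_frag_index(vr);

    if (two_side && vr >= VR_BFC0 && vr <= VR_BFC1) {
        int front = vr - VR_BFC0 + VR_COL0;
        /* Only the back color was written: let it feed the front slot. */
        if (!(outputs_written & BIT(front)))
            fp_index = vr - VR_BFC0 + FA_COL0;
    }
    return fp_index;
}
```

When both colors of a channel are written, the back color keeps an index of -1 and the front/back selection is left to the SF stage.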
* Re: [PATCH 4/9] intel gen4-5: Fix backface/frontface selection when only one color is written to.
2012-07-19 20:00 ` [PATCH 4/9] intel gen4-5: Fix backface/frontface selection when only one color is written to Olivier Galibert
@ 2012-07-20 17:01 ` Eric Anholt
2012-07-20 18:03 ` Olivier Galibert
0 siblings, 1 reply; 41+ messages in thread
From: Eric Anholt @ 2012-07-20 17:01 UTC (permalink / raw)
To: intel-gfx, mesa-dev; +Cc: Olivier Galibert
[-- Attachment #1.1: Type: text/plain, Size: 1951 bytes --]
Olivier Galibert <galibert@pobox.com> writes:
> Shaders, piglit test ones in particular, may write only to one of
> gl_FrontColor/gl_BackColor. The standard is unclear on whether the
> behaviour is defined in that case, but it seems reasonable to support
> it.
>
> The choice made here is to pick up whichever color was actually written
> to. That makes most of the generated piglit tests useless to test the
> backface selection, but it's simple and it works.
>
> Signed-off-by: Olivier Galibert <galibert@pobox.com>
> ---
> src/mesa/drivers/dri/i965/brw_fs.cpp | 9 +++++++++
> src/mesa/drivers/dri/i965/brw_wm_pass2.c | 9 +++++++++
> 2 files changed, 18 insertions(+)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp
> index 3f98137..3b62952 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
> @@ -972,6 +972,15 @@ fs_visitor::calculate_urb_setup()
> if (c->key.vp_outputs_written & BITFIELD64_BIT(i)) {
> int fp_index = _mesa_vert_result_to_frag_attrib((gl_vert_result) i);
>
> + /* Special case: two-sided vertex option, vertex program
> + * only writes to the back color. Map it to the
> + * associated front color location.
> + */
> + if (i >= VERT_RESULT_BFC0 && i <= VERT_RESULT_BFC1 &&
> + ctx->VertexProgram._TwoSideEnabled &&
> + urb_setup[i - VERT_RESULT_BFC0 + FRAG_ATTRIB_COL0] == -1)
> + fp_index = i - VERT_RESULT_BFC0 + FRAG_ATTRIB_COL0;
In the fs_visitor (and brw_wm_pass*), you don't get to look at ctx->
state like that -- you're getting called once with some set of ctx
state, but the program will get reused even if the ctx state changes.
You'd have to get that state into the wm prog key, and use that, which
would guarantee that you have the appropriate program code.
[-- Attachment #1.2: Type: application/pgp-signature, Size: 197 bytes --]
^ permalink raw reply [flat|nested] 41+ messages in thread
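The point about program keys can be illustrated with a toy cache. This is a generic sketch of the idea, not the i965 program cache: prog_key, get_program and the linear search are invented for the example, and "compiling" is reduced to handing out a new id. The only thing it demonstrates is that any state the generated code depends on has to be part of the key, otherwise a program compiled under one state gets reused under another.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical key: everything the generated code depends on. */
struct prog_key {
    uint64_t vp_outputs_written;
    bool     two_side;
};

struct cache_entry {
    struct prog_key key;
    int program_id;
};

#define CACHE_SIZE 16
static struct cache_entry cache[CACHE_SIZE];
static int cache_count;
static int compile_count;   /* number of "compilations" so far */

/* Field-wise comparison, so struct padding never matters. */
static bool key_equal(const struct prog_key *a, const struct prog_key *b)
{
    return a->vp_outputs_written == b->vp_outputs_written &&
           a->two_side == b->two_side;
}

/* Return the program for this key, compiling it on a cache miss. */
static int get_program(const struct prog_key *key)
{
    for (int i = 0; i < cache_count; i++)
        if (key_equal(&cache[i].key, key))
            return cache[i].program_id;

    assert(cache_count < CACHE_SIZE);
    cache[cache_count].key = *key;
    cache[cache_count].program_id = compile_count++;
    return cache[cache_count++].program_id;
}
```

If two_side were read from global state during compilation instead of from the key, both lookups in a scenario that toggles it would return the same, possibly wrong, program.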
* Re: [PATCH 4/9] intel gen4-5: Fix backface/frontface selection when only one color is written to.
2012-07-20 17:01 ` Eric Anholt
@ 2012-07-20 18:03 ` Olivier Galibert
0 siblings, 0 replies; 41+ messages in thread
From: Olivier Galibert @ 2012-07-20 18:03 UTC (permalink / raw)
To: Eric Anholt; +Cc: mesa-dev, intel-gfx
On Fri, Jul 20, 2012 at 10:01:03AM -0700, Eric Anholt wrote:
> > diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp
> > index 3f98137..3b62952 100644
> > --- a/src/mesa/drivers/dri/i965/brw_fs.cpp
> > +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
> > @@ -972,6 +972,15 @@ fs_visitor::calculate_urb_setup()
> > if (c->key.vp_outputs_written & BITFIELD64_BIT(i)) {
> > int fp_index = _mesa_vert_result_to_frag_attrib((gl_vert_result) i);
> >
> > + /* Special case: two-sided vertex option, vertex program
> > + * only writes to the back color. Map it to the
> > + * associated front color location.
> > + */
> > + if (i >= VERT_RESULT_BFC0 && i <= VERT_RESULT_BFC1 &&
> > + ctx->VertexProgram._TwoSideEnabled &&
> > + urb_setup[i - VERT_RESULT_BFC0 + FRAG_ATTRIB_COL0] == -1)
> > + fp_index = i - VERT_RESULT_BFC0 + FRAG_ATTRIB_COL0;
>
> In the fs_visitor (and brw_wm_pass*), you don't get to look at ctx->
> state like that -- you're getting called once with some set of ctx
> state, but the program will get reused even if the ctx state changes.
> You'd have to get that state into the wm prog key, and use that, which
> would guarantee that you have the appropriate program code.
Ok. OTOH, we don't actually *need* to look at TwoSideEnabled. If the
rest of the condition triggers it's either correct or undefined
behaviour. So we can do it systematically.
OG.
^ permalink raw reply [flat|nested] 41+ messages in thread
* [PATCH 5/9] intel gen4-5: Compute the interpolation status for every variable in one place.
2012-07-19 20:00 (no subject) Olivier Galibert
` (3 preceding siblings ...)
2012-07-19 20:00 ` [PATCH 4/9] intel gen4-5: Fix backface/frontface selection when only one color is written to Olivier Galibert
@ 2012-07-19 20:00 ` Olivier Galibert
2012-07-26 17:22 ` [Mesa-dev] " Eric Anholt
2012-07-19 20:00 ` [PATCH 6/9] intel gen4-5: Correctly setup the parameters in the sf Olivier Galibert
` (3 subsequent siblings)
8 siblings, 1 reply; 41+ messages in thread
From: Olivier Galibert @ 2012-07-19 20:00 UTC (permalink / raw)
To: intel-gfx, mesa-dev; +Cc: Olivier Galibert
The program keys are updated accordingly, but the values are not used
yet.
Signed-off-by: Olivier Galibert <galibert@pobox.com>
---
src/mesa/drivers/dri/i965/brw_clip.c | 90 ++++++++++++++++++++++++++++++-
src/mesa/drivers/dri/i965/brw_clip.h | 1 +
src/mesa/drivers/dri/i965/brw_context.h | 11 ++++
src/mesa/drivers/dri/i965/brw_sf.c | 5 +-
src/mesa/drivers/dri/i965/brw_sf.h | 1 +
src/mesa/drivers/dri/i965/brw_wm.c | 2 +
src/mesa/drivers/dri/i965/brw_wm.h | 1 +
7 files changed, 109 insertions(+), 2 deletions(-)
diff --git a/src/mesa/drivers/dri/i965/brw_clip.c b/src/mesa/drivers/dri/i965/brw_clip.c
index d411208..b4a2e0a 100644
--- a/src/mesa/drivers/dri/i965/brw_clip.c
+++ b/src/mesa/drivers/dri/i965/brw_clip.c
@@ -47,6 +47,86 @@
#define FRONT_UNFILLED_BIT 0x1
#define BACK_UNFILLED_BIT 0x2
+/**
+ * Lookup the interpolation mode information for every element in the
+ * vue.
+ */
+static void
+brw_lookup_interpolation(struct brw_context *brw)
+{
+ /* pprog means "previous program", i.e. the last program before the
+ * fragment shader. It can only be the vertex shader for now, but
+ * it may be a geometry shader in the future.
+ */
+ const struct gl_program *pprog = &brw->vertex_program->Base;
+ const struct gl_fragment_program *fprog = brw->fragment_program;
+ struct brw_vue_map *vue_map = &brw->vs.prog_data->vue_map;
+
+ /* Default everything to INTERP_QUALIFIER_NONE */
+ memset(brw->interpolation_mode, INTERP_QUALIFIER_NONE, BRW_VERT_RESULT_MAX);
+
+ /* If there is no fragment shader, interpolation won't be needed,
+ * so defaulting to none is good.
+ */
+ if (!fprog)
+ return;
+
+ for (int i = 0; i < vue_map->num_slots; i++) {
+ /* First lookup the vert result, skip if there isn't one */
+ int vert_result = vue_map->slot_to_vert_result[i];
+ if (vert_result == BRW_VERT_RESULT_MAX)
+ continue;
+
+ /* HPOS is special. In the clipper, it is handled specifically,
+ * so its value is irrelevant. In the sf, it's forced to
+ * linear. In the wm, it's special cased, irrelevant again. So
+ * force linear to remove the sf special case.
+ */
+ if (vert_result == VERT_RESULT_HPOS) {
+ brw->interpolation_mode[i] = INTERP_QUALIFIER_NOPERSPECTIVE;
+ continue;
+ }
+
+ /* There is a 1-1 mapping of vert result to frag attrib except
+ * for BackColor and vars
+ */
+ int frag_attrib = vert_result;
+ if (vert_result >= VERT_RESULT_BFC0 && vert_result <= VERT_RESULT_BFC1)
+ frag_attrib = vert_result - VERT_RESULT_BFC0 + FRAG_ATTRIB_COL0;
+ else if(vert_result >= VERT_RESULT_VAR0)
+ frag_attrib = vert_result - VERT_RESULT_VAR0 + FRAG_ATTRIB_VAR0;
+
+ /* If the output is not used by the fragment shader, skip it. */
+ if (!(fprog->Base.InputsRead & BITFIELD64_BIT(frag_attrib)))
+ continue;
+
+ /* Lookup the interpolation mode */
+ enum glsl_interp_qualifier interpolation_mode = fprog->InterpQualifier[frag_attrib];
+
+ /* If the mode is not specified, then the default varies. Color
+ * values follow the shader model, while all the rest uses
+ * smooth.
+ */
+ if (interpolation_mode == INTERP_QUALIFIER_NONE) {
+ if (frag_attrib >= FRAG_ATTRIB_COL0 && frag_attrib <= FRAG_ATTRIB_COL1)
+ interpolation_mode = brw->intel.ctx.Light.ShadeModel == GL_FLAT ? INTERP_QUALIFIER_FLAT : INTERP_QUALIFIER_SMOOTH;
+ else
+ interpolation_mode = INTERP_QUALIFIER_SMOOTH;
+ }
+
+ /* Finally, if we have both a front color and a back color for
+ * the same channel, the selection will be done before
+ * interpolation and the back color copied over the front color
+ * if necessary. So interpolating the back color is
+ * unnecessary.
+ */
+ if (vert_result >= VERT_RESULT_BFC0 && vert_result <= VERT_RESULT_BFC1)
+ if (pprog->OutputsWritten & BITFIELD64_BIT(vert_result - VERT_RESULT_BFC0 + VERT_RESULT_COL0))
+ interpolation_mode = INTERP_QUALIFIER_NONE;
+
+ brw->interpolation_mode[i] = interpolation_mode;
+ }
+}
static void compile_clip_prog( struct brw_context *brw,
struct brw_clip_prog_key *key )
@@ -143,6 +223,10 @@ brw_upload_clip_prog(struct brw_context *brw)
/* Populate the key:
*/
+
+ /* BRW_NEW_FRAGMENT_PROGRAM, _NEW_LIGHT */
+ brw_lookup_interpolation(brw);
+
/* BRW_NEW_REDUCED_PRIMITIVE */
key.primitive = brw->intel.reduced_primitive;
/* CACHE_NEW_VS_PROG (also part of VUE map) */
@@ -150,6 +234,10 @@ brw_upload_clip_prog(struct brw_context *brw)
/* _NEW_LIGHT */
key.do_flat_shading = (ctx->Light.ShadeModel == GL_FLAT);
key.pv_first = (ctx->Light.ProvokingVertex == GL_FIRST_VERTEX_CONVENTION);
+
+ /* BRW_NEW_FRAGMENT_PROGRAM, _NEW_LIGHT */
+ memcpy(key.interpolation_mode, brw->interpolation_mode, BRW_VERT_RESULT_MAX);
+
/* _NEW_TRANSFORM (also part of VUE map)*/
key.nr_userclip = _mesa_bitcount_64(ctx->Transform.ClipPlanesEnabled);
@@ -258,7 +346,7 @@ const struct brw_tracked_state brw_clip_prog = {
_NEW_TRANSFORM |
_NEW_POLYGON |
_NEW_BUFFERS),
- .brw = (BRW_NEW_REDUCED_PRIMITIVE),
+ .brw = (BRW_NEW_FRAGMENT_PROGRAM|BRW_NEW_REDUCED_PRIMITIVE),
.cache = CACHE_NEW_VS_PROG
},
.emit = brw_upload_clip_prog
diff --git a/src/mesa/drivers/dri/i965/brw_clip.h b/src/mesa/drivers/dri/i965/brw_clip.h
index 9185651..e78d074 100644
--- a/src/mesa/drivers/dri/i965/brw_clip.h
+++ b/src/mesa/drivers/dri/i965/brw_clip.h
@@ -43,6 +43,7 @@
*/
struct brw_clip_prog_key {
GLbitfield64 attrs;
+ unsigned char interpolation_mode[BRW_VERT_RESULT_MAX]; /* copy of the main context */
GLuint primitive:4;
GLuint nr_userclip:4;
GLuint do_flat_shading:1;
diff --git a/src/mesa/drivers/dri/i965/brw_context.h b/src/mesa/drivers/dri/i965/brw_context.h
index b4868fe..afafa47 100644
--- a/src/mesa/drivers/dri/i965/brw_context.h
+++ b/src/mesa/drivers/dri/i965/brw_context.h
@@ -1054,6 +1054,17 @@ struct brw_context
uint32_t render_target_format[MESA_FORMAT_COUNT];
bool format_supported_as_render_target[MESA_FORMAT_COUNT];
+ /* Interpolation modes, one byte per vue slot, values equal to
+ * glsl_interp_qualifier.
+ *
+ * Used on gen4/5 by the clipper, sf and wm stages. Given the
+ * update order, the clipper is responsible to update it.
+ *
+ * Ignored on gen 6+
+ */
+
+ unsigned char interpolation_mode[BRW_VERT_RESULT_MAX];
+
/* PrimitiveRestart */
struct {
bool in_progress;
diff --git a/src/mesa/drivers/dri/i965/brw_sf.c b/src/mesa/drivers/dri/i965/brw_sf.c
index 791210f..26cbaf7 100644
--- a/src/mesa/drivers/dri/i965/brw_sf.c
+++ b/src/mesa/drivers/dri/i965/brw_sf.c
@@ -194,6 +194,9 @@ brw_upload_sf_prog(struct brw_context *brw)
key.do_flat_shading = (ctx->Light.ShadeModel == GL_FLAT);
key.do_twoside_color = ctx->VertexProgram._TwoSideEnabled;
+ /* BRW_NEW_FRAGMENT_PROGRAM, _NEW_LIGHT */
+ memcpy(key.interpolation_mode, brw->interpolation_mode, BRW_VERT_RESULT_MAX);
+
/* _NEW_POLYGON */
if (key.do_twoside_color) {
/* If we're rendering to a FBO, we have to invert the polygon
@@ -215,7 +218,7 @@ const struct brw_tracked_state brw_sf_prog = {
.dirty = {
.mesa = (_NEW_HINT | _NEW_LIGHT | _NEW_POLYGON | _NEW_POINT |
_NEW_TRANSFORM | _NEW_BUFFERS),
- .brw = (BRW_NEW_REDUCED_PRIMITIVE),
+ .brw = (BRW_NEW_FRAGMENT_PROGRAM|BRW_NEW_REDUCED_PRIMITIVE),
.cache = CACHE_NEW_VS_PROG
},
.emit = brw_upload_sf_prog
diff --git a/src/mesa/drivers/dri/i965/brw_sf.h b/src/mesa/drivers/dri/i965/brw_sf.h
index f908fc0..5e261fb 100644
--- a/src/mesa/drivers/dri/i965/brw_sf.h
+++ b/src/mesa/drivers/dri/i965/brw_sf.h
@@ -46,6 +46,7 @@
struct brw_sf_prog_key {
GLbitfield64 attrs;
+ unsigned char interpolation_mode[BRW_VERT_RESULT_MAX]; /* copy of the main context */
uint8_t point_sprite_coord_replace;
GLuint primitive:2;
GLuint do_twoside_color:1;
diff --git a/src/mesa/drivers/dri/i965/brw_wm.c b/src/mesa/drivers/dri/i965/brw_wm.c
index 587cc35..b54f4b1 100644
--- a/src/mesa/drivers/dri/i965/brw_wm.c
+++ b/src/mesa/drivers/dri/i965/brw_wm.c
@@ -513,6 +513,8 @@ static void brw_wm_populate_key( struct brw_context *brw,
/* _NEW_LIGHT */
key->flat_shade = (ctx->Light.ShadeModel == GL_FLAT);
+ if (intel->gen < 6)
+ memcpy(key->interpolation_mode, brw->interpolation_mode, BRW_VERT_RESULT_MAX);
/* _NEW_FRAG_CLAMP | _NEW_BUFFERS */
key->clamp_fragment_color = ctx->Color._ClampFragmentColor;
diff --git a/src/mesa/drivers/dri/i965/brw_wm.h b/src/mesa/drivers/dri/i965/brw_wm.h
index b976a60..add9dd6 100644
--- a/src/mesa/drivers/dri/i965/brw_wm.h
+++ b/src/mesa/drivers/dri/i965/brw_wm.h
@@ -60,6 +60,7 @@
#define AA_ALWAYS 2
struct brw_wm_prog_key {
+ unsigned char interpolation_mode[BRW_VERT_RESULT_MAX]; /* copy of the main context */
uint8_t iz_lookup;
GLuint stats_wm:1;
GLuint flat_shade:1;
--
1.7.10.280.gaa39
^ permalink raw reply related [flat|nested] 41+ messages in thread
* Re: [Mesa-dev] [PATCH 5/9] intel gen4-5: Compute the interpolation status for every variable in one place.
2012-07-19 20:00 ` [PATCH 5/9] intel gen4-5: Compute the interpolation status for every variable in one place Olivier Galibert
@ 2012-07-26 17:22 ` Eric Anholt
2012-07-27 9:12 ` Olivier Galibert
0 siblings, 1 reply; 41+ messages in thread
From: Eric Anholt @ 2012-07-26 17:22 UTC (permalink / raw)
To: intel-gfx, mesa-dev; +Cc: Olivier Galibert
Olivier Galibert <galibert@pobox.com> writes:
> The program keys are updated accordingly, but the values are not used
> yet.
>
> Signed-off-by: Olivier Galibert <galibert@pobox.com>
> ---
> src/mesa/drivers/dri/i965/brw_clip.c | 90 ++++++++++++++++++++++++++++++-
> src/mesa/drivers/dri/i965/brw_clip.h | 1 +
> src/mesa/drivers/dri/i965/brw_context.h | 11 ++++
> src/mesa/drivers/dri/i965/brw_sf.c | 5 +-
> src/mesa/drivers/dri/i965/brw_sf.h | 1 +
> src/mesa/drivers/dri/i965/brw_wm.c | 2 +
> src/mesa/drivers/dri/i965/brw_wm.h | 1 +
> 7 files changed, 109 insertions(+), 2 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_clip.c b/src/mesa/drivers/dri/i965/brw_clip.c
> index d411208..b4a2e0a 100644
> --- a/src/mesa/drivers/dri/i965/brw_clip.c
> +++ b/src/mesa/drivers/dri/i965/brw_clip.c
> @@ -47,6 +47,86 @@
> #define FRONT_UNFILLED_BIT 0x1
> #define BACK_UNFILLED_BIT 0x2
>
> +/**
> + * Lookup the interpolation mode information for every element in the
> + * vue.
> + */
> +static void
> +brw_lookup_interpolation(struct brw_context *brw)
> +{
> + /* pprog means "previous program", i.e. the last program before the
> + * fragment shader. It can only be the vertex shader for now, but
> + * it may be a geometry shader in the future.
> + */
> + const struct gl_program *pprog = &brw->vertex_program->Base;
> + const struct gl_fragment_program *fprog = brw->fragment_program;
> + struct brw_vue_map *vue_map = &brw->vs.prog_data->vue_map;
> +
> + /* Default everything to INTERP_QUALIFIER_NONE */
> + memset(brw->interpolation_mode, INTERP_QUALIFIER_NONE, BRW_VERT_RESULT_MAX);
I don't like seeing this data that should be referenced out of the
program cache key being communicated through brw->.
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [Mesa-dev] [PATCH 5/9] intel gen4-5: Compute the interpolation status for every variable in one place.
2012-07-26 17:22 ` [Mesa-dev] " Eric Anholt
@ 2012-07-27 9:12 ` Olivier Galibert
0 siblings, 0 replies; 41+ messages in thread
From: Olivier Galibert @ 2012-07-27 9:12 UTC (permalink / raw)
To: Eric Anholt; +Cc: mesa-dev, intel-gfx
On Thu, Jul 26, 2012 at 10:22:26AM -0700, Eric Anholt wrote:
> I don't like seeing this data that should be referenced out of the
> program cache key being communicated through brw->.
What would you like it to be communicated through?
OG.
^ permalink raw reply [flat|nested] 41+ messages in thread
* [PATCH 6/9] intel gen4-5: Correctly setup the parameters in the sf.
2012-07-19 20:00 (no subject) Olivier Galibert
` (4 preceding siblings ...)
2012-07-19 20:00 ` [PATCH 5/9] intel gen4-5: Compute the interpolation status for every variable in one place Olivier Galibert
@ 2012-07-19 20:00 ` Olivier Galibert
2012-07-19 20:00 ` [PATCH 7/9] intel gen4-5: Correctly handle flat vs. non-flat in the clipper Olivier Galibert
` (2 subsequent siblings)
8 siblings, 0 replies; 41+ messages in thread
From: Olivier Galibert @ 2012-07-19 20:00 UTC (permalink / raw)
To: intel-gfx, mesa-dev; +Cc: Olivier Galibert
This patch also corrects a couple of problems with noperspective
interpolation.
At this point, all the GLSL 1.1/1.3 interpolation tests that do not
clip (the -none ones) pass.
The fs code does not use the pre-resolved interpolation modes in order
not to mess with gen6+. Sharing the resolution would require putting
brw_wm_prog before brw_clip_prog and brw_sf_prog. This may be a good
thing, but it could have unexpected consequences, so it is better
done independently in any case.
Signed-off-by: Olivier Galibert <galibert@pobox.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
---
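As a rough illustration of the scheme described in the
emit_interpolation_setup_gen4() comment in this patch (divide by w in the
sf, interpolate linearly in screen space, multiply by w in the fs), here
is a minimal scalar sketch in C. The names interp_mode, lerp and
interpolate_attr are made up for the example and are not driver code. It
assumes both vertices have a positive w and models only the arithmetic,
not the register layout.

```c
#include <assert.h>

/* Hypothetical stand-in for the two non-flat interpolation qualifiers. */
enum interp_mode { INTERP_SMOOTH, INTERP_NOPERSPECTIVE };

/* Plain linear interpolation, s = 0 at vertex 0 and s = 1 at vertex 1. */
static float lerp(float a, float b, float s)
{
   return a + (b - a) * s;
}

/* Value of an attribute at screen-space parameter s, given its values
 * a0/a1 and the clip-space w0/w1 at the two vertices.
 */
static float interpolate_attr(enum interp_mode mode,
                              float a0, float w0,
                              float a1, float w1,
                              float s)
{
   /* Assumption of this sketch: both vertices are in front of the eye. */
   assert(w0 > 0.0f && w1 > 0.0f);

   /* noperspective: no divide in the sf, no multiply in the fs. */
   if (mode == INTERP_NOPERSPECTIVE)
      return lerp(a0, a1, s);

   /* smooth, sf side: pre-divide the attribute by w. */
   float a0_over_w = a0 / w0;
   float a1_over_w = a1 / w1;

   /* The interpolation step itself is linear in screen space for both
    * a/w and 1/w, which is why the same deltas serve both modes.
    */
   float a_over_w = lerp(a0_over_w, a1_over_w, s);
   float inv_w = lerp(1.0f / w0, 1.0f / w1, s);

   /* smooth, fs side: multiply by the per-pixel w. */
   float pixel_w = 1.0f / inv_w;
   return a_over_w * pixel_w;
}
```

Halfway across the screen between a vertex with w=1 and one with w=3,
the two modes give visibly different results, which is the difference
the -noperspective tests exercise.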
src/mesa/drivers/dri/i965/brw_fs.cpp | 2 +-
src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 15 +++
src/mesa/drivers/dri/i965/brw_sf.c | 12 +-
src/mesa/drivers/dri/i965/brw_sf.h | 2 +-
src/mesa/drivers/dri/i965/brw_sf_emit.c | 164 +++++++++++++-------------
5 files changed, 106 insertions(+), 89 deletions(-)
diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp
index 3b62952..4734a5d 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -757,7 +757,7 @@ fs_visitor::emit_general_interpolation(ir_variable *ir)
inst->predicated = true;
inst->predicate_inverse = true;
}
- if (intel->gen < 6) {
+ if (intel->gen < 6 && interpolation_mode == INTERP_QUALIFIER_SMOOTH) {
emit(BRW_OPCODE_MUL, attr, attr, this->pixel_w);
}
}
diff --git a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
index 08c0130..c6dc265 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
@@ -1872,6 +1872,21 @@ fs_visitor::emit_interpolation_setup_gen4()
emit(BRW_OPCODE_ADD, this->delta_y[BRW_WM_PERSPECTIVE_PIXEL_BARYCENTRIC],
this->pixel_y, fs_reg(negate(brw_vec1_grf(1, 1))));
+ /*
+ * On Gen4-5, we accomplish perspective-correct interpolation by
+ * dividing the attribute values by w in the sf shader,
+ * interpolating the result linearly in screen space, and then
+ * multiplying by w in the fragment shader. So the interpolation
+ * step is always linear in screen space, regardless of whether the
+ * attribute is perspective or non-perspective. Accordingly, we
+ * use the same delta_x and delta_y values for both kinds of
+ * interpolation.
+ */
+ this->delta_x[BRW_WM_NONPERSPECTIVE_PIXEL_BARYCENTRIC] =
+ this->delta_x[BRW_WM_PERSPECTIVE_PIXEL_BARYCENTRIC];
+ this->delta_y[BRW_WM_NONPERSPECTIVE_PIXEL_BARYCENTRIC] =
+ this->delta_y[BRW_WM_PERSPECTIVE_PIXEL_BARYCENTRIC];
+
this->current_annotation = "compute pos.w and 1/pos.w";
/* Compute wpos.w. It's always in our setup, since it's needed to
* interpolate the other attributes.
diff --git a/src/mesa/drivers/dri/i965/brw_sf.c b/src/mesa/drivers/dri/i965/brw_sf.c
index 26cbaf7..c00e85a 100644
--- a/src/mesa/drivers/dri/i965/brw_sf.c
+++ b/src/mesa/drivers/dri/i965/brw_sf.c
@@ -139,6 +139,7 @@ brw_upload_sf_prog(struct brw_context *brw)
struct brw_sf_prog_key key;
/* _NEW_BUFFERS */
bool render_to_fbo = _mesa_is_user_fbo(ctx->DrawBuffer);
+ int i;
memset(&key, 0, sizeof(key));
@@ -190,11 +191,16 @@ brw_upload_sf_prog(struct brw_context *brw)
if ((ctx->Point.SpriteOrigin == GL_LOWER_LEFT) != render_to_fbo)
key.sprite_origin_lower_left = true;
- /* _NEW_LIGHT */
- key.do_flat_shading = (ctx->Light.ShadeModel == GL_FLAT);
+ /* BRW_NEW_FRAGMENT_PROGRAM, _NEW_LIGHT */
+ key.has_flat_shading = 0;
+ for (i = 0; i < BRW_VERT_RESULT_MAX; i++) {
+ if (brw->interpolation_mode[i] == INTERP_QUALIFIER_FLAT) {
+ key.has_flat_shading = 1;
+ break;
+ }
+ }
key.do_twoside_color = ctx->VertexProgram._TwoSideEnabled;
- /* BRW_NEW_FRAGMENT_PROGRAM, _NEW_LIGHT */
memcpy(key.interpolation_mode, brw->interpolation_mode, BRW_VERT_RESULT_MAX);
/* _NEW_POLYGON */
diff --git a/src/mesa/drivers/dri/i965/brw_sf.h b/src/mesa/drivers/dri/i965/brw_sf.h
index 5e261fb..47fdb3e 100644
--- a/src/mesa/drivers/dri/i965/brw_sf.h
+++ b/src/mesa/drivers/dri/i965/brw_sf.h
@@ -50,7 +50,7 @@ struct brw_sf_prog_key {
uint8_t point_sprite_coord_replace;
GLuint primitive:2;
GLuint do_twoside_color:1;
- GLuint do_flat_shading:1;
+ GLuint has_flat_shading:1;
GLuint frontface_ccw:1;
GLuint do_point_sprite:1;
GLuint do_point_coord:1;
diff --git a/src/mesa/drivers/dri/i965/brw_sf_emit.c b/src/mesa/drivers/dri/i965/brw_sf_emit.c
index 9d8aa38..c99578a 100644
--- a/src/mesa/drivers/dri/i965/brw_sf_emit.c
+++ b/src/mesa/drivers/dri/i965/brw_sf_emit.c
@@ -44,6 +44,17 @@
/**
+ * Determine the vue slot corresponding to the given half of the given
+ * register. half=0 means the first half of a register, half=1 means the
+ * second half.
+ */
+static inline int vert_reg_to_vue_slot(struct brw_sf_compile *c, GLuint reg,
+ int half)
+{
+ return (reg + c->urb_entry_read_offset) * 2 + half;
+}
+
+/**
* Determine the vert_result corresponding to the given half of the given
* register. half=0 means the first half of a register, half=1 means the
* second half.
@@ -51,11 +62,24 @@
static inline int vert_reg_to_vert_result(struct brw_sf_compile *c, GLuint reg,
int half)
{
- int vue_slot = (reg + c->urb_entry_read_offset) * 2 + half;
+ int vue_slot = vert_reg_to_vue_slot(c, reg, half);
return c->vue_map.slot_to_vert_result[vue_slot];
}
/**
+ * Determine the register corresponding to the given vue slot.
+ */
+static struct brw_reg get_vue_slot(struct brw_sf_compile *c,
+ struct brw_reg vert,
+ int vue_slot)
+{
+ GLuint off = vue_slot / 2 - c->urb_entry_read_offset;
+ GLuint sub = vue_slot % 2;
+
+ return brw_vec4_grf(vert.nr + off, sub * 4);
+}
+
+/**
* Determine the register corresponding to the given vert_result.
*/
static struct brw_reg get_vert_result(struct brw_sf_compile *c,
@@ -64,10 +88,7 @@ static struct brw_reg get_vert_result(struct brw_sf_compile *c,
{
int vue_slot = c->vue_map.vert_result_to_slot[vert_result];
assert (vue_slot >= c->urb_entry_read_offset);
- GLuint off = vue_slot / 2 - c->urb_entry_read_offset;
- GLuint sub = vue_slot % 2;
-
- return brw_vec4_grf(vert.nr + off, sub * 4);
+ return get_vue_slot(c, vert, vue_slot);
}
static bool
@@ -128,31 +149,37 @@ static void do_twoside_color( struct brw_sf_compile *c )
* Flat shading
*/
-#define VERT_RESULT_COLOR_BITS (BITFIELD64_BIT(VERT_RESULT_COL0) | \
- BITFIELD64_BIT(VERT_RESULT_COL1))
-
-static void copy_colors( struct brw_sf_compile *c,
- struct brw_reg dst,
- struct brw_reg src,
- int allow_twoside)
+static void copy_flatshaded_attributes( struct brw_sf_compile *c,
+ struct brw_reg dst,
+ struct brw_reg src)
{
struct brw_compile *p = &c->func;
+ struct brw_context *brw = p->brw;
GLuint i;
- for (i = VERT_RESULT_COL0; i <= VERT_RESULT_COL1; i++) {
- if (have_attr(c,i)) {
+ for (i = 0; i < BRW_VERT_RESULT_MAX; i++) {
+ if (brw->interpolation_mode[i] == INTERP_QUALIFIER_FLAT) {
brw_MOV(p,
- get_vert_result(c, dst, i),
- get_vert_result(c, src, i));
+ get_vue_slot(c, dst, i),
+ get_vue_slot(c, src, i));
- } else if(allow_twoside && have_attr(c, i - VERT_RESULT_COL0 + VERT_RESULT_BFC0)) {
- brw_MOV(p,
- get_vert_result(c, dst, i - VERT_RESULT_COL0 + VERT_RESULT_BFC0),
- get_vert_result(c, src, i - VERT_RESULT_COL0 + VERT_RESULT_BFC0));
}
}
}
+static GLuint count_flatshaded_attributes(struct brw_sf_compile *c )
+{
+ struct brw_compile *p = &c->func;
+ struct brw_context *brw = p->brw;
+ GLuint count = 0;
+ GLuint i;
+ for (i = 0; i < BRW_VERT_RESULT_MAX; i++) {
+ if (brw->interpolation_mode[i] == INTERP_QUALIFIER_FLAT)
+ count++;
+ }
+ return count;
+}
+
/* Need to use a computed jump to copy flatshaded attributes as the
@@ -168,18 +195,6 @@ static void do_flatshade_triangle( struct brw_sf_compile *c )
GLuint nr;
- if (c->key.do_twoside_color) {
- nr = ((c->key.attrs & (BITFIELD64_BIT(VERT_RESULT_COL0) | BITFIELD64_BIT(VERT_RESULT_BFC0))) != 0) +
- ((c->key.attrs & (BITFIELD64_BIT(VERT_RESULT_COL1) | BITFIELD64_BIT(VERT_RESULT_BFC1))) != 0);
-
- } else {
- nr = ((c->key.attrs & BITFIELD64_BIT(VERT_RESULT_COL0)) != 0) +
- ((c->key.attrs & BITFIELD64_BIT(VERT_RESULT_COL1)) != 0);
- }
-
- if (!nr)
- return;
-
/* Already done in clip program:
*/
if (c->key.primitive == SF_UNFILLED_TRIS)
@@ -188,21 +203,23 @@ static void do_flatshade_triangle( struct brw_sf_compile *c )
if (intel->gen == 5)
jmpi = 2;
+ nr = count_flatshaded_attributes(c);
+
brw_push_insn_state(p);
brw_MUL(p, c->pv, c->pv, brw_imm_d(jmpi*(nr*2+1)));
brw_JMPI(p, ip, ip, c->pv);
- copy_colors(c, c->vert[1], c->vert[0], c->key.do_twoside_color);
- copy_colors(c, c->vert[2], c->vert[0], c->key.do_twoside_color);
+ copy_flatshaded_attributes(c, c->vert[1], c->vert[0]);
+ copy_flatshaded_attributes(c, c->vert[2], c->vert[0]);
brw_JMPI(p, ip, ip, brw_imm_d(jmpi*(nr*4+1)));
- copy_colors(c, c->vert[0], c->vert[1], c->key.do_twoside_color);
- copy_colors(c, c->vert[2], c->vert[1], c->key.do_twoside_color);
+ copy_flatshaded_attributes(c, c->vert[0], c->vert[1]);
+ copy_flatshaded_attributes(c, c->vert[2], c->vert[1]);
brw_JMPI(p, ip, ip, brw_imm_d(jmpi*nr*2));
- copy_colors(c, c->vert[0], c->vert[2], c->key.do_twoside_color);
- copy_colors(c, c->vert[1], c->vert[2], c->key.do_twoside_color);
+ copy_flatshaded_attributes(c, c->vert[0], c->vert[2]);
+ copy_flatshaded_attributes(c, c->vert[1], c->vert[2]);
brw_pop_insn_state(p);
}
@@ -213,12 +230,9 @@ static void do_flatshade_line( struct brw_sf_compile *c )
struct brw_compile *p = &c->func;
struct intel_context *intel = &p->brw->intel;
struct brw_reg ip = brw_ip_reg();
- GLuint nr = _mesa_bitcount_64(c->key.attrs & VERT_RESULT_COLOR_BITS);
+ GLuint nr;
GLuint jmpi = 1;
- if (!nr)
- return;
-
/* Already done in clip program:
*/
if (c->key.primitive == SF_UNFILLED_TRIS)
@@ -227,14 +241,16 @@ static void do_flatshade_line( struct brw_sf_compile *c )
if (intel->gen == 5)
jmpi = 2;
+ nr = count_flatshaded_attributes(c);
+
brw_push_insn_state(p);
brw_MUL(p, c->pv, c->pv, brw_imm_d(jmpi*(nr+1)));
brw_JMPI(p, ip, ip, c->pv);
- copy_colors(c, c->vert[1], c->vert[0], 0);
+ copy_flatshaded_attributes(c, c->vert[1], c->vert[0]);
brw_JMPI(p, ip, ip, brw_imm_ud(jmpi*nr));
- copy_colors(c, c->vert[0], c->vert[1], 0);
+ copy_flatshaded_attributes(c, c->vert[0], c->vert[1]);
brw_pop_insn_state(p);
}
@@ -332,40 +348,25 @@ static void invert_det( struct brw_sf_compile *c)
static bool
calculate_masks(struct brw_sf_compile *c,
- GLuint reg,
- GLushort *pc,
- GLushort *pc_persp,
- GLushort *pc_linear)
+ GLuint reg,
+ GLushort *pc,
+ GLushort *pc_persp,
+ GLushort *pc_linear)
{
+ struct brw_compile *p = &c->func;
+ struct brw_context *brw = p->brw;
+ enum glsl_interp_qualifier interp;
bool is_last_attr = (reg == c->nr_setup_regs - 1);
- GLbitfield64 persp_mask;
- GLbitfield64 linear_mask;
-
- if (c->key.do_flat_shading)
- persp_mask = c->key.attrs & ~(BITFIELD64_BIT(VERT_RESULT_HPOS) |
- BITFIELD64_BIT(VERT_RESULT_COL0) |
- BITFIELD64_BIT(VERT_RESULT_COL1) |
- BITFIELD64_BIT(VERT_RESULT_BFC0) |
- BITFIELD64_BIT(VERT_RESULT_BFC1));
- else
- persp_mask = c->key.attrs & ~(BITFIELD64_BIT(VERT_RESULT_HPOS));
-
- if (c->key.do_flat_shading)
- linear_mask = c->key.attrs & ~(BITFIELD64_BIT(VERT_RESULT_COL0) |
- BITFIELD64_BIT(VERT_RESULT_COL1) |
- BITFIELD64_BIT(VERT_RESULT_BFC0) |
- BITFIELD64_BIT(VERT_RESULT_BFC1));
- else
- linear_mask = c->key.attrs;
*pc_persp = 0;
*pc_linear = 0;
*pc = 0xf;
-
- if (persp_mask & BITFIELD64_BIT(vert_reg_to_vert_result(c, reg, 0)))
- *pc_persp = 0xf;
- if (linear_mask & BITFIELD64_BIT(vert_reg_to_vert_result(c, reg, 0)))
+ interp = brw->interpolation_mode[vert_reg_to_vue_slot(c, reg, 0)];
+ if (interp == INTERP_QUALIFIER_SMOOTH) {
+ *pc_linear = 0xf;
+ *pc_persp = 0xf;
+ } else if(interp == INTERP_QUALIFIER_NOPERSPECTIVE)
*pc_linear = 0xf;
/* Maybe only processs one attribute on the final round:
@@ -373,11 +374,12 @@ calculate_masks(struct brw_sf_compile *c,
if (vert_reg_to_vert_result(c, reg, 1) != BRW_VERT_RESULT_MAX) {
*pc |= 0xf0;
- if (persp_mask & BITFIELD64_BIT(vert_reg_to_vert_result(c, reg, 1)))
- *pc_persp |= 0xf0;
-
- if (linear_mask & BITFIELD64_BIT(vert_reg_to_vert_result(c, reg, 1)))
- *pc_linear |= 0xf0;
+ interp = brw->interpolation_mode[vert_reg_to_vue_slot(c, reg, 1)];
+ if (interp == INTERP_QUALIFIER_SMOOTH) {
+ *pc_linear |= 0xf0;
+ *pc_persp |= 0xf0;
+ } else if(interp == INTERP_QUALIFIER_NOPERSPECTIVE)
+ *pc_linear |= 0xf0;
}
return is_last_attr;
@@ -430,7 +432,7 @@ void brw_emit_tri_setup(struct brw_sf_compile *c, bool allocate)
if (c->key.do_twoside_color)
do_twoside_color(c);
- if (c->key.do_flat_shading)
+ if (c->key.has_flat_shading)
do_flatshade_triangle(c);
@@ -443,7 +445,6 @@ void brw_emit_tri_setup(struct brw_sf_compile *c, bool allocate)
struct brw_reg a2 = offset(c->vert[2], i);
GLushort pc, pc_persp, pc_linear;
bool last = calculate_masks(c, i, &pc, &pc_persp, &pc_linear);
-
if (pc_persp)
{
brw_set_predicate_control_flag_value(p, pc_persp);
@@ -507,7 +508,6 @@ void brw_emit_line_setup(struct brw_sf_compile *c, bool allocate)
struct brw_compile *p = &c->func;
GLuint i;
-
c->nr_verts = 2;
if (allocate)
@@ -516,7 +516,7 @@ void brw_emit_line_setup(struct brw_sf_compile *c, bool allocate)
invert_det(c);
copy_z_inv_w(c);
- if (c->key.do_flat_shading)
+ if (c->key.has_flat_shading)
do_flatshade_line(c);
for (i = 0; i < c->nr_setup_regs; i++)
@@ -799,7 +799,3 @@ void brw_emit_anyprim_setup( struct brw_sf_compile *c )
brw_emit_point_setup( c, false );
}
-
-
-
-
--
1.7.10.280.gaa39
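The computed-jump offsets used in do_flatshade_triangle() in this patch
can be sanity-checked with a small model. The sketch below is not driver
code: block_start, sequence_end, jump_target and initial_target are
hypothetical helpers. It assumes that a JMPI offset is counted from the
instruction following the JMPI, and that jmpi (1, or 2 on gen5) is the
number of offset units per instruction.

```c
#include <assert.h>

/* Model of the emitted sequence, with index 0 being the first
 * instruction after the initial computed JMPI:
 *
 *   block 0: 2*nr MOVs, then one JMPI to the end
 *   block 1: 2*nr MOVs, then one JMPI to the end
 *   block 2: 2*nr MOVs
 */
static unsigned block_start(unsigned nr, unsigned pv)
{
   return pv * (2 * nr + 1);
}

/* Index of the first instruction after the whole sequence. */
static unsigned sequence_end(unsigned nr)
{
   return 3 * (2 * nr) + 2;
}

/* Where a JMPI located at jmpi_index lands, under the assumption that
 * the offset is relative to the next instruction and expressed in
 * units of 1/jmpi instruction.
 */
static unsigned jump_target(unsigned jmpi_index, unsigned offset,
                            unsigned jmpi)
{
   assert(jmpi != 0 && offset % jmpi == 0);
   return jmpi_index + 1 + offset / jmpi;
}

/* Landing point of the initial jump, whose offset is
 * pv * jmpi * (nr*2 + 1) as computed by the brw_MUL in the patch.
 */
static unsigned initial_target(unsigned nr, unsigned jmpi, unsigned pv)
{
   assert(jmpi != 0);
   return pv * (jmpi * (nr * 2 + 1)) / jmpi;
}
```

Under these assumptions the three offsets in the patch, jmpi*(nr*2+1),
jmpi*(nr*4+1) and jmpi*nr*2, line up with the block boundaries for any
nonzero nr.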
^ permalink raw reply related [flat|nested] 41+ messages in thread
* [PATCH 7/9] intel gen4-5: Correctly handle flat vs. non-flat in the clipper.
2012-07-19 20:00 (no subject) Olivier Galibert
` (5 preceding siblings ...)
2012-07-19 20:00 ` [PATCH 6/9] intel gen4-5: Correctly setup the parameters in the sf Olivier Galibert
@ 2012-07-19 20:00 ` Olivier Galibert
2012-07-19 20:00 ` [PATCH 8/9] intel gen4-5: Make noperspective clipping work Olivier Galibert
2012-07-19 20:00 ` [PATCH 9/9] intel gen4-5: Don't touch flatshaded values when clipping, only copy them Olivier Galibert
8 siblings, 0 replies; 41+ messages in thread
From: Olivier Galibert @ 2012-07-19 20:00 UTC (permalink / raw)
To: intel-gfx, mesa-dev; +Cc: Olivier Galibert
At this point, all piglit interpolation tests involving fixed clipping
pass as long as no noperspective attribute is involved.
Signed-off-by: Olivier Galibert <galibert@pobox.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
---
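The has_flat_shading loop now appears in both brw_sf.c and brw_clip.c.
Purely as a sketch of how that scan could be shared, here is a
standalone C helper. The name any_slot_has_mode and the enum
interp_qualifier are hypothetical stand-ins for this example (the real
code uses glsl_interp_qualifier and BRW_VERT_RESULT_MAX), so treat it as
an illustration rather than a proposed change.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative stand-in for the interpolation qualifiers stored one
 * byte per vue slot; the numeric values are assumptions of this sketch.
 */
enum interp_qualifier {
   QUAL_NONE = 0,
   QUAL_SMOOTH,
   QUAL_FLAT,
   QUAL_NOPERSPECTIVE
};

/* Return true if at least one of the first count slots uses the wanted
 * interpolation mode.  Stops at the first match, like the loops in the
 * patch.
 */
static bool any_slot_has_mode(const unsigned char *modes, size_t count,
                              enum interp_qualifier wanted)
{
   size_t i;

   assert(modes != NULL || count == 0);

   for (i = 0; i < count; i++) {
      if (modes[i] == wanted)
         return true;
   }
   return false;
}
```

With such a helper, a key field like has_flat_shading would reduce to a
single call on the interpolation mode array.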
src/mesa/drivers/dri/i965/brw_clip.c | 13 ++++--
src/mesa/drivers/dri/i965/brw_clip.h | 6 +--
src/mesa/drivers/dri/i965/brw_clip_line.c | 6 +--
src/mesa/drivers/dri/i965/brw_clip_tri.c | 20 ++++-----
src/mesa/drivers/dri/i965/brw_clip_unfilled.c | 2 +-
src/mesa/drivers/dri/i965/brw_clip_util.c | 56 +++++++------------------
src/mesa/drivers/dri/i965/brw_sf_emit.c | 8 ++++
7 files changed, 50 insertions(+), 61 deletions(-)
diff --git a/src/mesa/drivers/dri/i965/brw_clip.c b/src/mesa/drivers/dri/i965/brw_clip.c
index b4a2e0a..8512172 100644
--- a/src/mesa/drivers/dri/i965/brw_clip.c
+++ b/src/mesa/drivers/dri/i965/brw_clip.c
@@ -218,7 +218,7 @@ brw_upload_clip_prog(struct brw_context *brw)
struct intel_context *intel = &brw->intel;
struct gl_context *ctx = &intel->ctx;
struct brw_clip_prog_key key;
-
+ int i;
memset(&key, 0, sizeof(key));
/* Populate the key:
@@ -231,11 +231,16 @@ brw_upload_clip_prog(struct brw_context *brw)
key.primitive = brw->intel.reduced_primitive;
/* CACHE_NEW_VS_PROG (also part of VUE map) */
key.attrs = brw->vs.prog_data->outputs_written;
- /* _NEW_LIGHT */
- key.do_flat_shading = (ctx->Light.ShadeModel == GL_FLAT);
+ /* BRW_NEW_FRAGMENT_PROGRAM, _NEW_LIGHT */
+ key.has_flat_shading = 0;
+ for (i = 0; i < BRW_VERT_RESULT_MAX; i++) {
+ if (brw->interpolation_mode[i] == INTERP_QUALIFIER_FLAT) {
+ key.has_flat_shading = 1;
+ break;
+ }
+ }
key.pv_first = (ctx->Light.ProvokingVertex == GL_FIRST_VERTEX_CONVENTION);
- /* BRW_NEW_FRAGMENT_PROGRAM, _NEW_LIGHT */
memcpy(key.interpolation_mode, brw->interpolation_mode, BRW_VERT_RESULT_MAX);
/* _NEW_TRANSFORM (also part of VUE map)*/
diff --git a/src/mesa/drivers/dri/i965/brw_clip.h b/src/mesa/drivers/dri/i965/brw_clip.h
index e78d074..3ad2e13 100644
--- a/src/mesa/drivers/dri/i965/brw_clip.h
+++ b/src/mesa/drivers/dri/i965/brw_clip.h
@@ -46,7 +46,7 @@ struct brw_clip_prog_key {
unsigned char interpolation_mode[BRW_VERT_RESULT_MAX]; /* copy of the main context */
GLuint primitive:4;
GLuint nr_userclip:4;
- GLuint do_flat_shading:1;
+ GLuint has_flat_shading:1;
GLuint pv_first:1;
GLuint do_unfilled:1;
GLuint fill_cw:2; /* includes cull information */
@@ -166,8 +166,8 @@ void brw_clip_kill_thread(struct brw_clip_compile *c);
struct brw_reg brw_clip_plane_stride( struct brw_clip_compile *c );
struct brw_reg brw_clip_plane0_address( struct brw_clip_compile *c );
-void brw_clip_copy_colors( struct brw_clip_compile *c,
- GLuint to, GLuint from );
+void brw_clip_copy_flatshaded_attributes( struct brw_clip_compile *c,
+ GLuint to, GLuint from );
void brw_clip_init_clipmask( struct brw_clip_compile *c );
diff --git a/src/mesa/drivers/dri/i965/brw_clip_line.c b/src/mesa/drivers/dri/i965/brw_clip_line.c
index 6cf2bd2..729d8c0 100644
--- a/src/mesa/drivers/dri/i965/brw_clip_line.c
+++ b/src/mesa/drivers/dri/i965/brw_clip_line.c
@@ -271,11 +271,11 @@ void brw_emit_line_clip( struct brw_clip_compile *c )
brw_clip_line_alloc_regs(c);
brw_clip_init_ff_sync(c);
- if (c->key.do_flat_shading) {
+ if (c->key.has_flat_shading) {
if (c->key.pv_first)
- brw_clip_copy_colors(c, 1, 0);
+ brw_clip_copy_flatshaded_attributes(c, 1, 0);
else
- brw_clip_copy_colors(c, 0, 1);
+ brw_clip_copy_flatshaded_attributes(c, 0, 1);
}
clip_and_emit_line(c);
diff --git a/src/mesa/drivers/dri/i965/brw_clip_tri.c b/src/mesa/drivers/dri/i965/brw_clip_tri.c
index a29f8e0..71225f5 100644
--- a/src/mesa/drivers/dri/i965/brw_clip_tri.c
+++ b/src/mesa/drivers/dri/i965/brw_clip_tri.c
@@ -187,8 +187,8 @@ void brw_clip_tri_flat_shade( struct brw_clip_compile *c )
brw_IF(p, BRW_EXECUTE_1);
{
- brw_clip_copy_colors(c, 1, 0);
- brw_clip_copy_colors(c, 2, 0);
+ brw_clip_copy_flatshaded_attributes(c, 1, 0);
+ brw_clip_copy_flatshaded_attributes(c, 2, 0);
}
brw_ELSE(p);
{
@@ -200,19 +200,19 @@ void brw_clip_tri_flat_shade( struct brw_clip_compile *c )
brw_imm_ud(_3DPRIM_TRIFAN));
brw_IF(p, BRW_EXECUTE_1);
{
- brw_clip_copy_colors(c, 0, 1);
- brw_clip_copy_colors(c, 2, 1);
+ brw_clip_copy_flatshaded_attributes(c, 0, 1);
+ brw_clip_copy_flatshaded_attributes(c, 2, 1);
}
brw_ELSE(p);
{
- brw_clip_copy_colors(c, 1, 0);
- brw_clip_copy_colors(c, 2, 0);
+ brw_clip_copy_flatshaded_attributes(c, 1, 0);
+ brw_clip_copy_flatshaded_attributes(c, 2, 0);
}
brw_ENDIF(p);
}
else {
- brw_clip_copy_colors(c, 0, 2);
- brw_clip_copy_colors(c, 1, 2);
+ brw_clip_copy_flatshaded_attributes(c, 0, 2);
+ brw_clip_copy_flatshaded_attributes(c, 1, 2);
}
}
brw_ENDIF(p);
@@ -606,8 +606,8 @@ void brw_emit_tri_clip( struct brw_clip_compile *c )
* flatshading, need to apply the flatshade here because we don't
* respect the PV when converting to trifan for emit:
*/
- if (c->key.do_flat_shading)
- brw_clip_tri_flat_shade(c);
+ if (c->key.has_flat_shading)
+ brw_clip_tri_flat_shade(c);
if ((c->key.clip_mode == BRW_CLIPMODE_NORMAL) ||
(c->key.clip_mode == BRW_CLIPMODE_KERNEL_CLIP))
diff --git a/src/mesa/drivers/dri/i965/brw_clip_unfilled.c b/src/mesa/drivers/dri/i965/brw_clip_unfilled.c
index 03c7d42..96f9a84 100644
--- a/src/mesa/drivers/dri/i965/brw_clip_unfilled.c
+++ b/src/mesa/drivers/dri/i965/brw_clip_unfilled.c
@@ -502,7 +502,7 @@ void brw_emit_unfilled_clip( struct brw_clip_compile *c )
/* Need to do this whether we clip or not:
*/
- if (c->key.do_flat_shading)
+ if (c->key.has_flat_shading)
brw_clip_tri_flat_shade(c);
brw_clip_init_clipmask(c);
diff --git a/src/mesa/drivers/dri/i965/brw_clip_util.c b/src/mesa/drivers/dri/i965/brw_clip_util.c
index bf8cc3a..692573e 100644
--- a/src/mesa/drivers/dri/i965/brw_clip_util.c
+++ b/src/mesa/drivers/dri/i965/brw_clip_util.c
@@ -165,7 +165,7 @@ void brw_clip_interp_vertex( struct brw_clip_compile *c,
vert_result == VERT_RESULT_CLIP_DIST1) {
/* PSIZ doesn't need interpolation because it isn't used by the
* fragment shader. CLIP_DIST0 and CLIP_DIST1 don't need
- * intepolation because on pre-GEN6, these are just placeholder VUE
+ * interpolation because on pre-GEN6, these are just placeholder VUE
* slots that don't perform any action.
*/
} else if (vert_result < VERT_RESULT_MAX) {
@@ -291,49 +291,25 @@ struct brw_reg brw_clip_plane_stride( struct brw_clip_compile *c )
}
-/* If flatshading, distribute color from provoking vertex prior to
+/* Distribute flatshaded attributes from provoking vertex prior to
* clipping.
*/
-void brw_clip_copy_colors( struct brw_clip_compile *c,
- GLuint to, GLuint from )
+void brw_clip_copy_flatshaded_attributes( struct brw_clip_compile *c,
+ GLuint to, GLuint from )
{
struct brw_compile *p = &c->func;
-
- if (brw_clip_have_vert_result(c, VERT_RESULT_COL0))
- brw_MOV(p,
- byte_offset(c->reg.vertex[to],
- brw_vert_result_to_offset(&c->vue_map,
- VERT_RESULT_COL0)),
- byte_offset(c->reg.vertex[from],
- brw_vert_result_to_offset(&c->vue_map,
- VERT_RESULT_COL0)));
-
- if (brw_clip_have_vert_result(c, VERT_RESULT_COL1))
- brw_MOV(p,
- byte_offset(c->reg.vertex[to],
- brw_vert_result_to_offset(&c->vue_map,
- VERT_RESULT_COL1)),
- byte_offset(c->reg.vertex[from],
- brw_vert_result_to_offset(&c->vue_map,
- VERT_RESULT_COL1)));
-
- if (brw_clip_have_vert_result(c, VERT_RESULT_BFC0))
- brw_MOV(p,
- byte_offset(c->reg.vertex[to],
- brw_vert_result_to_offset(&c->vue_map,
- VERT_RESULT_BFC0)),
- byte_offset(c->reg.vertex[from],
- brw_vert_result_to_offset(&c->vue_map,
- VERT_RESULT_BFC0)));
-
- if (brw_clip_have_vert_result(c, VERT_RESULT_BFC1))
- brw_MOV(p,
- byte_offset(c->reg.vertex[to],
- brw_vert_result_to_offset(&c->vue_map,
- VERT_RESULT_BFC1)),
- byte_offset(c->reg.vertex[from],
- brw_vert_result_to_offset(&c->vue_map,
- VERT_RESULT_BFC1)));
+ struct brw_context *brw = p->brw;
+ GLuint i;
+
+ for (i = 0; i < BRW_VERT_RESULT_MAX; i++) {
+ if (brw->interpolation_mode[i] == INTERP_QUALIFIER_FLAT) {
+ brw_MOV(p,
+ byte_offset(c->reg.vertex[to],
+ brw_vue_slot_to_offset(i)),
+ byte_offset(c->reg.vertex[from],
+ brw_vue_slot_to_offset(i)));
+ }
+ }
}
diff --git a/src/mesa/drivers/dri/i965/brw_sf_emit.c b/src/mesa/drivers/dri/i965/brw_sf_emit.c
index c99578a..2e9beed 100644
--- a/src/mesa/drivers/dri/i965/brw_sf_emit.c
+++ b/src/mesa/drivers/dri/i965/brw_sf_emit.c
@@ -177,6 +177,14 @@ static GLuint count_flatshaded_attributes(struct brw_sf_compile *c )
if (brw->interpolation_mode[i] == INTERP_QUALIFIER_FLAT)
count++;
}
+
+ /* This should only be called if there is at least one flatshaded
+ * attribute. While nothing should break if there isn't any, the
+ * generated code would be heavily pessimized. So check that all
+ * is well.
+ */
+ assert(count != 0);
+
return count;
}
--
1.7.10.280.gaa39
^ permalink raw reply related [flat|nested] 41+ messages in thread
* [PATCH 8/9] intel gen4-5: Make noperspective clipping work.
2012-07-19 20:00 (no subject) Olivier Galibert
` (6 preceding siblings ...)
2012-07-19 20:00 ` [PATCH 7/9] intel gen4-5: Correctly handle flat vs. non-flat in the clipper Olivier Galibert
@ 2012-07-19 20:00 ` Olivier Galibert
2012-07-19 20:00 ` [PATCH 9/9] intel gen4-5: Don't touch flatshaded values when clipping, only copy them Olivier Galibert
8 siblings, 0 replies; 41+ messages in thread
From: Olivier Galibert @ 2012-07-19 20:00 UTC (permalink / raw)
To: intel-gfx, mesa-dev; +Cc: Olivier Galibert
At this point all interpolation tests with fixed clipping work.
Signed-off-by: Olivier Galibert <galibert@pobox.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
---
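To make the "screen-space t" idea concrete, here is a scalar C sketch of
the math. It is not the instruction sequence the patch emits:
struct clip_pos and screen_space_t are hypothetical names, the sketch
assumes both endpoints have a positive w, and the choice of the dominant
axis and of the fallback for coincident endpoints are simplifications
made for this example.

```c
#include <assert.h>

/* Hypothetical 2D clip-space position: x and y before the divide by w. */
struct clip_pos {
   float x, y, w;
};

static float absf(float v)
{
   return v < 0.0f ? -v : v;
}

/* Given the clip-space interpolation parameter t between v0 and v1,
 * return the parameter that places the same point when interpolating
 * linearly between the projected (NDC) endpoints.
 */
static float screen_space_t(struct clip_pos v0, struct clip_pos v1, float t)
{
   /* Assumption of this sketch: both endpoints project sanely. */
   assert(v0.w > 0.0f && v1.w > 0.0f);

   /* New vertex in clip space: dest = v0 * (1 - t) + v1 * t. */
   struct clip_pos dest;
   dest.x = v0.x + (v1.x - v0.x) * t;
   dest.y = v0.y + (v1.y - v0.y) * t;
   dest.w = v0.w + (v1.w - v0.w) * t;

   /* Project the three points. */
   float n0x = v0.x / v0.w, n0y = v0.y / v0.w;
   float n1x = v1.x / v1.w, n1y = v1.y / v1.w;
   float ndx = dest.x / dest.w, ndy = dest.y / dest.w;

   float dx = n1x - n0x;
   float dy = n1y - n0y;

   /* Endpoints coincide on screen: fall back to t (sketch-only choice). */
   if (dx == 0.0f && dy == 0.0f)
      return t;

   /* Use the axis with the larger extent to avoid a tiny divisor. */
   if (absf(dx) >= absf(dy))
      return (ndx - n0x) / dx;
   return (ndy - n0y) / dy;
}
```

When w is the same at both endpoints the result equals t, so smooth and
noperspective attributes agree. When w differs, the screen-space
parameter is pulled toward the endpoint with the larger w.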
src/mesa/drivers/dri/i965/brw_clip.c | 9 ++
src/mesa/drivers/dri/i965/brw_clip.h | 1 +
src/mesa/drivers/dri/i965/brw_clip_util.c | 147 ++++++++++++++++++++++++++---
3 files changed, 146 insertions(+), 11 deletions(-)
diff --git a/src/mesa/drivers/dri/i965/brw_clip.c b/src/mesa/drivers/dri/i965/brw_clip.c
index 8512172..eca2844 100644
--- a/src/mesa/drivers/dri/i965/brw_clip.c
+++ b/src/mesa/drivers/dri/i965/brw_clip.c
@@ -239,6 +239,15 @@ brw_upload_clip_prog(struct brw_context *brw)
break;
}
}
+ key.has_noperspective_shading = 0;
+ for (i = 0; i < BRW_VERT_RESULT_MAX; i++) {
+ if (brw->interpolation_mode[i] == INTERP_QUALIFIER_NOPERSPECTIVE &&
+ brw->vs.prog_data->vue_map.slot_to_vert_result[i] != VERT_RESULT_HPOS) {
+ key.has_noperspective_shading = 1;
+ break;
+ }
+ }
+
key.pv_first = (ctx->Light.ProvokingVertex == GL_FIRST_VERTEX_CONVENTION);
memcpy(key.interpolation_mode, brw->interpolation_mode, BRW_VERT_RESULT_MAX);
diff --git a/src/mesa/drivers/dri/i965/brw_clip.h b/src/mesa/drivers/dri/i965/brw_clip.h
index 3ad2e13..66dd928 100644
--- a/src/mesa/drivers/dri/i965/brw_clip.h
+++ b/src/mesa/drivers/dri/i965/brw_clip.h
@@ -47,6 +47,7 @@ struct brw_clip_prog_key {
GLuint primitive:4;
GLuint nr_userclip:4;
GLuint has_flat_shading:1;
+ GLuint has_noperspective_shading:1;
GLuint pv_first:1;
GLuint do_unfilled:1;
GLuint fill_cw:2; /* includes cull information */
diff --git a/src/mesa/drivers/dri/i965/brw_clip_util.c b/src/mesa/drivers/dri/i965/brw_clip_util.c
index 692573e..b06ad1d 100644
--- a/src/mesa/drivers/dri/i965/brw_clip_util.c
+++ b/src/mesa/drivers/dri/i965/brw_clip_util.c
@@ -129,6 +129,8 @@ static void brw_clip_project_vertex( struct brw_clip_compile *c,
/* Interpolate between two vertices and put the result into a0.0.
* Increment a0.0 accordingly.
+ *
+ * Beware that dest_ptr can be equal to v0_ptr.
*/
void brw_clip_interp_vertex( struct brw_clip_compile *c,
struct brw_indirect dest_ptr,
@@ -138,7 +140,8 @@ void brw_clip_interp_vertex( struct brw_clip_compile *c,
bool force_edgeflag)
{
struct brw_compile *p = &c->func;
- struct brw_reg tmp = get_tmp(c);
+ struct brw_context *brw = p->brw;
+ struct brw_reg t_nopersp, v0_ndc_copy;
GLuint slot;
/* Just copy the vertex header:
@@ -148,13 +151,130 @@ void brw_clip_interp_vertex( struct brw_clip_compile *c,
* back on Ironlake, so needn't change it
*/
brw_copy_indirect_to_indirect(p, dest_ptr, v0_ptr, 1);
-
- /* Iterate over each attribute (could be done in pairs?)
+
+ /*
+ * First handle the 3D and NDC positioning, in case we need
+ * noperspective interpolation. Doing it early has no performance
+ * impact in any case.
+ */
+
+ /* Start by picking up the v0 NDC coordinates, because that vertex
+ * may be shared with the destination.
+ */
+ if (c->key.has_noperspective_shading) {
+ GLuint offset = brw_vert_result_to_offset(&c->vue_map,
+ BRW_VERT_RESULT_NDC);
+ v0_ndc_copy = get_tmp(c);
+ brw_MOV(p, v0_ndc_copy, deref_4f(v0_ptr, offset));
+ }
+
+ /*
+ * Compute the new 3D position
+ *
+ * dest_hpos = v0_hpos * (1 - t0) + v1_hpos * t0
+ */
+ {
+ GLuint delta = brw_vert_result_to_offset(&c->vue_map, VERT_RESULT_HPOS);
+ struct brw_reg tmp = get_tmp(c);
+ brw_MUL(p,
+ vec4(brw_null_reg()),
+ deref_4f(v1_ptr, delta),
+ t0);
+
+ brw_MAC(p,
+ tmp,
+ negate(deref_4f(v0_ptr, delta)),
+ t0);
+
+ brw_ADD(p,
+ deref_4f(dest_ptr, delta),
+ deref_4f(v0_ptr, delta),
+ tmp);
+ release_tmp(c, tmp);
+ }
+
+ /* Then recreate the projected (NDC) coordinate in the new vertex
+ * header
+ */
+ brw_clip_project_vertex(c, dest_ptr);
+
+ /*
+ * If we have noperspective attributes, we now need to compute the
+ * screen-space t.
+ */
+ if (c->key.has_noperspective_shading) {
+ GLuint delta = brw_vert_result_to_offset(&c->vue_map, BRW_VERT_RESULT_NDC);
+ struct brw_reg tmp = get_tmp(c);
+ t_nopersp = get_tmp(c);
+
+ /* Build a register with coordinates from the second and new vertices
+ *
+ * t_nopersp = vec4(v1.xy, dest.xy)
+ */
+ brw_MOV(p, t_nopersp, deref_4f(v1_ptr, delta));
+ brw_MOV(p, tmp, deref_4f(dest_ptr, delta));
+ brw_set_access_mode(p, BRW_ALIGN_16);
+ brw_MOV(p,
+ brw_writemask(t_nopersp, WRITEMASK_ZW),
+ brw_swizzle(tmp, 0,1,0,1));
+
+ /* Subtract the coordinates of the first vertex
+ *
+ * t_nopersp = vec4(v1.xy, dest.xy) - v0.xyxy
+ */
+ brw_ADD(p, t_nopersp, t_nopersp, negate(brw_swizzle(v0_ndc_copy, 0,1,0,1)));
+
+ /* Add the absolute value of the X and Y deltas so that if the
+ * points aren't in the same place on the screen we get non-zero
+ * values to divide.
+ *
+ * After that we have vert1-vert0 in t_nopersp.x and vertnew-vert0 in t_nopersp.y.
+ *
+ * t_nopersp = vec2(|v1.x -v0.x| + |v1.y -v0.y|,
+ * |dest.x-v0.x| + |dest.y-v0.y|)
+ */
+ brw_ADD(p,
+ brw_writemask(t_nopersp, WRITEMASK_XY),
+ brw_abs(brw_swizzle(t_nopersp, 0,2,0,0)),
+ brw_abs(brw_swizzle(t_nopersp, 1,3,0,0)));
+ brw_set_access_mode(p, BRW_ALIGN_1);
+
+ /* If the points are in the same place (vert1-vert0 == 0), just
+ * substitute a value that will ensure that we don't divide by
+ * 0.
+ */
+ brw_CMP(p, vec1(brw_null_reg()), BRW_CONDITIONAL_EQ,
+ vec1(t_nopersp),
+ brw_imm_f(0));
+ brw_IF(p, BRW_EXECUTE_1);
+ brw_MOV(p, t_nopersp, brw_imm_vf4(VF_ONE, VF_ZERO, VF_ZERO, VF_ZERO));
+ brw_ENDIF(p);
+
+ /* Now compute t_nopersp = t_nopersp.y/t_nopersp.x and broadcast it */
+ brw_math_invert(p, get_element(t_nopersp, 0), get_element(t_nopersp, 0));
+ brw_MUL(p,
+ vec1(t_nopersp),
+ vec1(t_nopersp),
+ vec1(suboffset(t_nopersp, 1)));
+ brw_set_access_mode(p, BRW_ALIGN_16);
+ brw_MOV(p, t_nopersp, brw_swizzle(t_nopersp, 0,0,0,0));
+ brw_set_access_mode(p, BRW_ALIGN_1);
+
+ release_tmp(c, tmp);
+ release_tmp(c, v0_ndc_copy);
+ }
+
+ /* Now we can iterate over each attribute
+ * (could be done in pairs?)
*/
for (slot = 0; slot < c->vue_map.num_slots; slot++) {
int vert_result = c->vue_map.slot_to_vert_result[slot];
GLuint delta = brw_vue_slot_to_offset(slot);
+ /* HPOS is already handled */
+ if(vert_result == VERT_RESULT_HPOS)
+ continue;
+
if (vert_result == VERT_RESULT_EDGE) {
if (force_edgeflag)
brw_MOV(p, deref_4f(dest_ptr, delta), brw_imm_f(1));
@@ -174,20 +294,29 @@ void brw_clip_interp_vertex( struct brw_clip_compile *c,
*
* New = attr0 + t*attr1 - t*attr0
*/
+
+ struct brw_reg tmp = get_tmp(c);
+
+ struct brw_reg t =
+ brw->interpolation_mode[slot] == INTERP_QUALIFIER_NOPERSPECTIVE ?
+ t_nopersp : t0;
+
brw_MUL(p,
vec4(brw_null_reg()),
deref_4f(v1_ptr, delta),
- t0);
+ t);
brw_MAC(p,
tmp,
negate(deref_4f(v0_ptr, delta)),
- t0);
+ t);
brw_ADD(p,
deref_4f(dest_ptr, delta),
deref_4f(v0_ptr, delta),
tmp);
+
+ release_tmp(c, tmp);
}
}
@@ -197,12 +326,8 @@ void brw_clip_interp_vertex( struct brw_clip_compile *c,
brw_MOV(p, deref_4f(dest_ptr, delta), brw_imm_f(0));
}
- release_tmp(c, tmp);
-
- /* Recreate the projected (NDC) coordinate in the new vertex
- * header:
- */
- brw_clip_project_vertex(c, dest_ptr );
+ if (c->key.has_noperspective_shading)
+ release_tmp(c, t_nopersp);
}
void brw_clip_emit_vue(struct brw_clip_compile *c,
--
1.7.10.280.gaa39
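
The screen-space factor that the brw_clip_interp_vertex() changes above compute on the GPU can be sketched on the CPU as follows. This is only an illustration under stated assumptions: `struct ndc_xy`, `nopersp_t()` and `interp_attr()` are hypothetical names that do not exist in Mesa, and the sketch works on scalar floats rather than on the vec4 registers the clip program uses.

```c
/* Hypothetical 2D NDC position; not a Mesa type. */
struct ndc_xy {
    float x, y;
};

/* Small local helper so the sketch does not depend on libm. */
float absf(float v)
{
    return v < 0.0f ? -v : v;
}

/*
 * Screen-space interpolation factor for a clipped vertex `dest` that lies on
 * the edge v0 -> v1. It mirrors the ratio built by the patch:
 *
 *   t = (|dest.x - v0.x| + |dest.y - v0.y|) / (|v1.x - v0.x| + |v1.y - v0.y|)
 */
float nopersp_t(struct ndc_xy v0, struct ndc_xy v1, struct ndc_xy dest)
{
    float edge = absf(v1.x - v0.x) + absf(v1.y - v0.y);
    float part = absf(dest.x - v0.x) + absf(dest.y - v0.y);

    /*
     * Coincident endpoints: the patch substitutes (1, 0, 0, 0) for the
     * register, which yields t = 0 * (1/1) = 0 and avoids dividing by 0.
     */
    if (edge == 0.0f)
        return 0.0f;

    return part / edge;
}

/* New = attr0 + t*attr1 - t*attr0, the same form the clip shader uses. */
float interp_attr(float attr0, float attr1, float t)
{
    return attr0 + t * attr1 - t * attr0;
}
```

For example, with v0 = (-1, -1), v1 = (1, 1) and a clipped vertex at (0, 0), the factor is 0.5 regardless of the w values of the two endpoints, which is the property noperspective attributes need.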
^ permalink raw reply related [flat|nested] 41+ messages in thread
* [PATCH 9/9] intel gen4-5: Don't touch flatshaded values when clipping, only copy them.
2012-07-19 20:00 (no subject) Olivier Galibert
` (7 preceding siblings ...)
2012-07-19 20:00 ` [PATCH 8/9] intel gen4-5: Make noperspective clipping work Olivier Galibert
@ 2012-07-19 20:00 ` Olivier Galibert
8 siblings, 0 replies; 41+ messages in thread
From: Olivier Galibert @ 2012-07-19 20:00 UTC (permalink / raw)
To: intel-gfx, mesa-dev; +Cc: Olivier Galibert
This patch ensures that integers will pass through unscathed. Doing
(useless) computations on them is risky, especially when their bit
patterns correspond to values like inf or nan.
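
To illustrate the risk, here is a minimal CPU-side sketch, assuming IEEE 754 single precision and default (non fast-math) compilation. The helper names are hypothetical and are not part of the driver. It runs the clipper's interpolation formula on an integer whose bit pattern happens to be +inf, and compares that with a plain copy.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret 32 bits as a float and back without changing them. */
float bits_to_float(uint32_t bits)
{
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

uint32_t float_to_bits(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}

/* The clipper's formula, New = attr0 + t*attr1 - t*attr0, applied to raw bits. */
uint32_t interp_as_float(uint32_t a0, uint32_t a1, float t)
{
    float f0 = bits_to_float(a0);
    float f1 = bits_to_float(a1);

    return float_to_bits(f0 + t * f1 - t * f0);
}

/* What a flat-shaded attribute gets with this patch: a plain copy from v0. */
uint32_t copy_flat(uint32_t a0)
{
    return a0;
}
```

With a0 = a1 = 0x7f800000 (the bit pattern of +inf) and t = 0.5, the interpolated form evaluates inf + inf - inf, which is NaN, so the integer does not survive even though both vertices carry the same value. The copy returns it unchanged.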
Signed-off-by: Olivier Galibert <galibert@pobox.com>
---
src/mesa/drivers/dri/i965/brw_clip_util.c | 48 ++++++++++++++++++-----------
1 file changed, 30 insertions(+), 18 deletions(-)
diff --git a/src/mesa/drivers/dri/i965/brw_clip_util.c b/src/mesa/drivers/dri/i965/brw_clip_util.c
index b06ad1d..998c304 100644
--- a/src/mesa/drivers/dri/i965/brw_clip_util.c
+++ b/src/mesa/drivers/dri/i965/brw_clip_util.c
@@ -293,30 +293,42 @@ void brw_clip_interp_vertex( struct brw_clip_compile *c,
* header), so interpolate:
*
* New = attr0 + t*attr1 - t*attr0
+ *
+ * unless it's flat shaded, then just copy the value from a
+ * source vertex.
*/
- struct brw_reg tmp = get_tmp(c);
+ GLuint interp = brw->interpolation_mode[slot];
- struct brw_reg t =
- brw->interpolation_mode[slot] == INTERP_QUALIFIER_NOPERSPECTIVE ?
- t_nopersp : t0;
+ if(interp == INTERP_QUALIFIER_SMOOTH ||
+ interp == INTERP_QUALIFIER_NOPERSPECTIVE) {
+ struct brw_reg tmp = get_tmp(c);
+ struct brw_reg t =
+ interp == INTERP_QUALIFIER_NOPERSPECTIVE ?
+ t_nopersp : t0;
- brw_MUL(p,
- vec4(brw_null_reg()),
- deref_4f(v1_ptr, delta),
- t);
+ brw_MUL(p,
+ vec4(brw_null_reg()),
+ deref_4f(v1_ptr, delta),
+ t);
- brw_MAC(p,
- tmp,
- negate(deref_4f(v0_ptr, delta)),
- t);
+ brw_MAC(p,
+ tmp,
+ negate(deref_4f(v0_ptr, delta)),
+ t);
- brw_ADD(p,
- deref_4f(dest_ptr, delta),
- deref_4f(v0_ptr, delta),
- tmp);
-
- release_tmp(c, tmp);
+ brw_ADD(p,
+ deref_4f(dest_ptr, delta),
+ deref_4f(v0_ptr, delta),
+ tmp);
+
+ release_tmp(c, tmp);
+
+ } else if(interp == INTERP_QUALIFIER_FLAT) {
+ brw_MOV(p,
+ deref_4f(dest_ptr, delta),
+ deref_4f(v0_ptr, delta));
+ }
}
}
--
1.7.10.280.gaa39
^ permalink raw reply related [flat|nested] 41+ messages in thread
* (no subject)
@ 2018-07-06 14:42 Christian König
0 siblings, 0 replies; 41+ messages in thread
From: Christian König @ 2018-07-06 14:42 UTC (permalink / raw)
To: intel-gfx
Next try of prework for unpinned DMA-buf operation.
Only sent to intel-gfx to trigger unit tests on the following patches.
Christian.
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply [flat|nested] 41+ messages in thread
* (no subject)
@ 2018-07-05 10:38 rosdi ablatiff
0 siblings, 0 replies; 41+ messages in thread
From: rosdi ablatiff @ 2018-07-05 10:38 UTC (permalink / raw)
To: intel-gfx
^ permalink raw reply [flat|nested] 41+ messages in thread
* (no subject)
@ 2017-01-16 16:28 Tony Whittam
0 siblings, 0 replies; 41+ messages in thread
From: Tony Whittam @ 2017-01-16 16:28 UTC (permalink / raw)
To: intel-gfx
Hi everyone,
I don't know if this is too specialised for this list. Anyway, no harm in
asking the question :-)
*Preamble*
Build: Yocto from the Apollo Lake BSP release *gold*,
Hardware: Oxbow Hill Rev B CRB with Intel Atom E3950 and 4GB DDR3 RAM (one
SODIMM)
Build: core-image-sato-sdk
Installed on the onboard eMMC.
OpenCL: installed user space drivers from SRB4
https://software.intel.com/file/533571/download
I'm currently evaluating the Apollo Lake platform as a candidate to run our
embedded application. We already have this application running on less
powerful ARM based Linux systems with Mali GPU using OpenCL 1.2. We're now
evaluating the E3950 as a faster alternative. To evaluate the application I
need OpenCL 1.2 or later.
To verify the OpenCL installation I have built and run the Intel demo apps:
CapsBasic and Bitonic Sort. CapsBasic sees two devices: CPU and GPU and
Bitonic sort can run its kernels correctly on both the CPU and the GPU.
*The issue*
Simply put, the application has
- thread 1 (feeder): has a loop that feeds data into OpenCL and queues
kernels
- thread 2 (consumer): waits for results and reads output data.
- an OpenCL Host command queue with out-of-order execution enabled
When I run my app and select the GPU OpenCL device, the feeder thread *stalls
inside a blocking call to clEnqueueMapBuffer()*. At this point only one
thing has been queued on the command queue: a buffer unmap command for a
different buffer. This unmap is waiting for an OpenCL event that will
indicate data ready to be processed.
In contrast, when I run my app and select the *CPU OpenCL* device, it works
perfectly.
Does anyone have any ideas on
1. what might be causing this problem running with the GPU?
2. how to debug this on the Yocto platform?
Best regards,
Tony
--
Tony Whittam
Rapt Touch
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH i-g-t v5 1/4] lib: add igt_dummyload
@ 2016-11-11 16:16 Daniel Vetter
2016-11-14 18:24 ` (no subject) Abdiel Janulgue
0 siblings, 1 reply; 41+ messages in thread
From: Daniel Vetter @ 2016-11-11 16:16 UTC (permalink / raw)
To: Abdiel Janulgue; +Cc: Daniel Vetter, intel-gfx
On Fri, Nov 11, 2016 at 07:41:10PM +0200, Abdiel Janulgue wrote:
> A lot of igt testcases need some GPU workload to make sure a race
> window is big enough. Unfortunately having a fixed amount of
> workload leads to spurious test failures or overly long runtimes
> on some fast/slow platforms. This library contains functionality
> to submit GPU workloads that should consume exactly a specific
> amount of time.
>
> v2 : Add recursive batch feature from Chris
> v3 : Drop auto-tuned stuff. Add bo dependency to recursive batch
> by adding a dummy reloc to the bo as suggested by Ville.
> v4: Fix dependency reloc as write instead of read (Ville).
> Fix wrong handling of batchbuffer start on ILK causing
> test failure
>
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
> Cc: Chris Wilson <chris@chris-wilson.co.uk>
> Signed-off-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
> ---
> lib/Makefile.sources | 2 +
> lib/igt.h | 1 +
> lib/igt_dummyload.c | 276 +++++++++++++++++++++++++++++++++++++++++++++++++++
> lib/igt_dummyload.h | 42 ++++++++
Did you check that your new docs do show up in the generated
documentation? Iirc you need to edit some xml under docs/.
-Daniel
> 4 files changed, 321 insertions(+)
> create mode 100644 lib/igt_dummyload.c
> create mode 100644 lib/igt_dummyload.h
>
> diff --git a/lib/Makefile.sources b/lib/Makefile.sources
> index e8e277b..7fc5ec2 100644
> --- a/lib/Makefile.sources
> +++ b/lib/Makefile.sources
> @@ -75,6 +75,8 @@ lib_source_list = \
> igt_draw.h \
> igt_pm.c \
> igt_pm.h \
> + igt_dummyload.c \
> + igt_dummyload.h \
> uwildmat/uwildmat.h \
> uwildmat/uwildmat.c \
> $(NULL)
> diff --git a/lib/igt.h b/lib/igt.h
> index d751f24..a0028d5 100644
> --- a/lib/igt.h
> +++ b/lib/igt.h
> @@ -32,6 +32,7 @@
> #include "igt_core.h"
> #include "igt_debugfs.h"
> #include "igt_draw.h"
> +#include "igt_dummyload.h"
> #include "igt_fb.h"
> #include "igt_gt.h"
> #include "igt_kms.h"
> diff --git a/lib/igt_dummyload.c b/lib/igt_dummyload.c
> new file mode 100644
> index 0000000..b934fd5
> --- /dev/null
> +++ b/lib/igt_dummyload.c
> @@ -0,0 +1,276 @@
> +/*
> + * Copyright © 2016 Intel Corporation
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice (including the next
> + * paragraph) shall be included in all copies or substantial portions of the
> + * Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
> + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
> + * IN THE SOFTWARE.
> + *
> + */
> +
> +#include "igt.h"
> +#include "igt_dummyload.h"
> +#include <time.h>
> +#include <signal.h>
> +#include <sys/syscall.h>
> +
> +/**
> + * SECTION:igt_dummyload
> + * @short_description: Library for submitting GPU workloads
> + * @title: Dummyload
> + * @include: igt.h
> + *
> + * A lot of igt testcases need some GPU workload to make sure a race window is
> + * big enough. Unfortunately having a fixed amount of workload leads to
> + * spurious test failures or overly long runtimes on some fast/slow platforms.
> + * This library contains functionality to submit GPU workloads that should
> + * consume exactly a specific amount of time.
> + */
> +
> +#define NSEC_PER_SEC 1000000000L
> +
> +#define gettid() syscall(__NR_gettid)
> +#define sigev_notify_thread_id _sigev_un._tid
> +
> +#define LOCAL_I915_EXEC_BSD_SHIFT (13)
> +#define LOCAL_I915_EXEC_BSD_MASK (3 << LOCAL_I915_EXEC_BSD_SHIFT)
> +
> +#define ENGINE_MASK (I915_EXEC_RING_MASK | LOCAL_I915_EXEC_BSD_MASK)
> +
> +static void
> +fill_object(struct drm_i915_gem_exec_object2 *obj, uint32_t gem_handle,
> + struct drm_i915_gem_relocation_entry *relocs, uint32_t count)
> +{
> + memset(obj, 0, sizeof(*obj));
> + obj->handle = gem_handle;
> + obj->relocation_count = count;
> + obj->relocs_ptr = (uintptr_t)relocs;
> +}
> +
> +static void
> +fill_reloc(struct drm_i915_gem_relocation_entry *reloc,
> + uint32_t gem_handle, uint32_t offset,
> + uint32_t read_domains, uint32_t write_domains)
> +{
> + reloc->target_handle = gem_handle;
> + reloc->delta = 0;
> + reloc->offset = offset * sizeof(uint32_t);
> + reloc->presumed_offset = 0;
> + reloc->read_domains = read_domains;
> + reloc->write_domain = write_domains;
> +}
> +
> +
> +static uint32_t *batch;
> +
> +static uint32_t emit_recursive_batch(int fd, int engine, unsigned dep_handle)
> +{
> + const int gen = intel_gen(intel_get_drm_devid(fd));
> + struct drm_i915_gem_exec_object2 obj[2];
> + struct drm_i915_gem_relocation_entry relocs[2];
> + struct drm_i915_gem_execbuffer2 execbuf;
> + unsigned engines[16];
> + unsigned nengine, handle;
> + int i = 0, reloc_count = 0, buf_count = 0;
> +
> + buf_count = 0;
> + nengine = 0;
> + if (engine < 0) {
> + for_each_engine(fd, engine)
> + if (engine)
> + engines[nengine++] = engine;
> + } else {
> + igt_require(gem_has_ring(fd, engine));
> + engines[nengine++] = engine;
> + }
> + igt_require(nengine);
> +
> + memset(&execbuf, 0, sizeof(execbuf));
> + memset(obj, 0, sizeof(obj));
> + memset(relocs, 0, sizeof(relocs));
> +
> + execbuf.buffers_ptr = (uintptr_t) obj;
> + handle = gem_create(fd, 4096);
> + batch = gem_mmap__gtt(fd, handle, 4096, PROT_WRITE);
> + gem_set_domain(fd, handle,
> + I915_GEM_DOMAIN_GTT, I915_GEM_DOMAIN_GTT);
> +
> + if (nengine == 1 && dep_handle > 0) {
> + /* dummy write to dependency */
> + fill_object(&obj[buf_count], dep_handle, NULL, 0);
> + buf_count++;
> +
> + fill_reloc(&relocs[reloc_count], dep_handle, i,
> + I915_GEM_DOMAIN_RENDER,
> + I915_GEM_DOMAIN_RENDER);
> + batch[i++] = 0; /* reloc */
> + reloc_count++;
> + batch[i++] = MI_NOOP;
> + }
> +
> + if (gen >= 8) {
> + batch[i++] = MI_BATCH_BUFFER_START | 1 << 8 | 1;
> + /* recurse */
> + fill_reloc(&relocs[reloc_count], handle, i,
> + I915_GEM_DOMAIN_COMMAND, 0);
> + batch[i++] = 0;
> + batch[i++] = 0;
> + } else if (gen >= 6) {
> + batch[i++] = MI_BATCH_BUFFER_START | 1 << 8;
> + /* recurse */
> + fill_reloc(&relocs[reloc_count], handle, i,
> + I915_GEM_DOMAIN_COMMAND, 0);
> + batch[i++] = 0;
> + } else {
> + batch[i++] = MI_BATCH_BUFFER_START | 2 << 6 |
> + ((gen < 4) ? 1 : 0);
> + /* recurse */
> + fill_reloc(&relocs[reloc_count], handle, i,
> + I915_GEM_DOMAIN_COMMAND, 0);
> + batch[i++] = 0;
> + if (gen < 4)
> + relocs[reloc_count].delta = 1;
> + }
> + reloc_count++;
> +
> + fill_object(&obj[buf_count], handle, relocs, reloc_count);
> + buf_count++;
> +
> + for (i = 0; i < nengine; i++) {
> + execbuf.flags &= ~ENGINE_MASK;
> + execbuf.flags = engines[i];
> + execbuf.buffer_count = buf_count;
> + gem_execbuf(fd, &execbuf);
> + }
> +
> + return handle;
> +}
> +
> +static void sigiter(int sig, siginfo_t *info, void *arg)
> +{
> + *batch = MI_BATCH_BUFFER_END;
> + __sync_synchronize();
> +}
> +
> +static timer_t setup_batch_exit_timer(int64_t ns)
> +{
> + timer_t timer;
> + struct sigevent sev;
> + struct sigaction act;
> + struct itimerspec its;
> +
> + memset(&sev, 0, sizeof(sev));
> + sev.sigev_notify = SIGEV_SIGNAL | SIGEV_THREAD_ID;
> + sev.sigev_notify_thread_id = gettid();
> + sev.sigev_signo = SIGRTMIN + 1;
> + igt_assert(timer_create(CLOCK_MONOTONIC, &sev, &timer) == 0);
> + igt_assert(timer > 0);
> +
> + memset(&act, 0, sizeof(act));
> + act.sa_sigaction = sigiter;
> + act.sa_flags = SA_SIGINFO;
> + igt_assert(sigaction(SIGRTMIN + 1, &act, NULL) == 0);
> +
> + memset(&its, 0, sizeof(its));
> + its.it_value.tv_sec = ns / NSEC_PER_SEC;
> + its.it_value.tv_nsec = ns % NSEC_PER_SEC;
> + igt_assert(timer_settime(timer, 0, &its, NULL) == 0);
> +
> + return timer;
> +}
> +
> +/**
> + * igt_spin_batch:
> + * @fd: open i915 drm file descriptor
> + * @ns: amount of time in nanoseconds the batch executes before terminating.
> + * If value is less than 0, execute batch forever.
> + * @engine: Ring to execute batch OR'd with execbuf flags. If value is less
> + * than 0, execute on all available rings.
> + * @dep_handle: handle to a buffer object dependency. If greater than 0, add a
> + * relocation entry to this buffer within the batch.
> + *
> + * Start a recursive batch on a ring that terminates after an exact amount
> + * of time has elapsed. Immediately returns a #igt_spin_t that contains the
> + * batch's handle that can be waited upon. The returned structure must be passed to
> + * igt_post_spin_batch() for post-processing.
> + *
> + * Returns:
> + * Structure with helper internal state for igt_post_spin_batch().
> + */
> +igt_spin_t igt_spin_batch(int fd, int64_t ns, int engine, unsigned dep_handle)
> +{
> + timer_t timer;
> + uint32_t handle = emit_recursive_batch(fd, engine, dep_handle);
> + int64_t wait_timeout = 0;
> + igt_assert_eq(gem_wait(fd, handle, &wait_timeout), -ETIME);
> +
> + if (ns < 1) {
> + if (ns == 0) {
> + *batch = MI_BATCH_BUFFER_END;
> + __sync_synchronize();
> + return (igt_spin_t){ handle, batch, 0};
> + }
> + return (igt_spin_t){ handle, batch, 0 };
> + }
> + timer = setup_batch_exit_timer(ns);
> +
> + return (igt_spin_t){ handle, batch, timer };
> +}
> +
> +/**
> + * igt_post_spin_batch:
> + * @fd: open i915 drm file descriptor
> + * @arg: spin batch state from igt_spin_batch()
> + *
> + * This function does the necessary post-processing after starting a recursive
> + * batch with igt_spin_batch().
> + */
> +void igt_post_spin_batch(int fd, igt_spin_t arg)
> +{
> + if (arg.handle == 0)
> + return;
> +
> + if (arg.timer > 0)
> + timer_delete(arg.timer);
> +
> + gem_close(fd, arg.handle);
> + munmap(arg.batch, 4096);
> +}
> +
> +
> +/**
> + * igt_spin_batch_wait:
> + * @fd: open i915 drm file descriptor
> + * @ns: amount of time in nanoseconds the batch executes before terminating.
> + * If value is less than 0, execute batch forever.
> + * @engine: ring to execute batch OR'd with execbuf flags. If value is less
> + * than 0, execute on all available rings.
> + * @dep_handle: handle to a buffer object dependency. If greater than 0, include
> + * this buffer on the wait dependency
> + *
> + * This is similar to igt_spin_batch(), but waits on the recursive batch to finish
> + * instead of returning right away. The function also does the necessary
> + * post-processing automatically if set to timeout.
> + */
> +void igt_spin_batch_wait(int fd, int64_t ns, int engine, unsigned dep_handle)
> +{
> + igt_spin_t spin = igt_spin_batch(fd, ns, engine, dep_handle);
> + int64_t wait_timeout = ns + (0.5 * NSEC_PER_SEC);
> + igt_assert_eq(gem_wait(fd, spin.handle, &wait_timeout), 0);
> +
> + igt_post_spin_batch(fd, spin);
> +}
> diff --git a/lib/igt_dummyload.h b/lib/igt_dummyload.h
> new file mode 100644
> index 0000000..79ead2c
> --- /dev/null
> +++ b/lib/igt_dummyload.h
> @@ -0,0 +1,42 @@
> +/*
> + * Copyright © 2016 Intel Corporation
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice (including the next
> + * paragraph) shall be included in all copies or substantial portions of the
> + * Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
> + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
> + * IN THE SOFTWARE.
> + *
> + */
> +
> +#ifndef __IGT_DUMMYLOAD_H__
> +#define __IGT_DUMMYLOAD_H__
> +
> +typedef struct igt_spin {
> + unsigned handle;
> + uint32_t *batch;
> + timer_t timer;
> +} igt_spin_t;
> +
> +
> +igt_spin_t igt_spin_batch(int fd, int64_t ns, int engine, unsigned dep_handle);
> +
> +void igt_post_spin_batch(int fd, igt_spin_t arg);
> +
> +void igt_spin_batch_wait(int fd, int64_t ns, int engine, unsigned dep_handle);
> +
> +
> +#endif /* __IGT_DUMMYLOAD_H__ */
> --
> 2.7.0
>
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
^ permalink raw reply [flat|nested] 41+ messages in thread
* (no subject)
2016-11-11 16:16 [PATCH i-g-t v5 1/4] lib: add igt_dummyload Daniel Vetter
@ 2016-11-14 18:24 ` Abdiel Janulgue
0 siblings, 0 replies; 41+ messages in thread
From: Abdiel Janulgue @ 2016-11-14 18:24 UTC (permalink / raw)
To: intel-gfx
On 11.11.2016 18:16, Daniel Vetter wrote:
> On Fri, Nov 11, 2016 at 07:41:10PM +0200, Abdiel Janulgue wrote:
>> A lot of igt testcases need some GPU workload to make sure a race
>> window is big enough. Unfortunately having a fixed amount of
>> workload leads to spurious test failures or overly long runtimes
>> on some fast/slow platforms. This library contains functionality
>> to submit GPU workloads that should consume exactly a specific
>> amount of time.
>>
>> v2 : Add recursive batch feature from Chris
>> v3 : Drop auto-tuned stuff. Add bo dependency to recursive batch
>> by adding a dummy reloc to the bo as suggested by Ville.
>> v4: Fix dependency reloc as write instead of read (Ville).
>> Fix wrong handling of batchbuffer start on ILK causing
>> test failure
>>
>> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
>> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
>> Cc: Chris Wilson <chris@chris-wilson.co.uk>
>> Signed-off-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
>> ---
>> lib/Makefile.sources | 2 +
>> lib/igt.h | 1 +
>> lib/igt_dummyload.c | 276 +++++++++++++++++++++++++++++++++++++++++++++++++++
>> lib/igt_dummyload.h | 42 ++++++++
>
> Did you check that your new docs do show up in the generated
> documentation? Iirc you need to edit some xml under docs/.
> -Daniel
>
Yeah I missed that. Updated now to include the docs in generated
documentation.
^ permalink raw reply [flat|nested] 41+ messages in thread
* [PATCH 2/2] drm/i915: Rework order of operations in {__intel, logical}_ring_prepare()
2015-06-12 17:09 ` [PATCH v2] Resolve issues with ringbuffer space management Dave Gordon
@ 2015-06-12 17:09 Dave Gordon
[not found] ` <1433789441-8295-1-git-send-email-david.s.gordon@intel.com>
0 siblings, 1 reply; 41+ messages in thread
From: Dave Gordon @ 2015-06-12 17:09 UTC (permalink / raw)
To: intel-gfx
The original idea of preallocating the OLR was implemented in
> 9d773091 drm/i915: Preallocate next seqno before touching the ring
and the sequence of operations was to allocate the OLR, then wrap past
the end of the ring if necessary, then wait for space if necessary.
But subsequently intel_ring_begin() was refactored, in
> 304d695 drm/i915: Flush outstanding requests before allocating new seqno
to ensure that pending work that might need to be flushed used the old
and not the newly-allocated request. This changed the sequence to wrap
and/or wait, then allocate, although the comment still said
/* Preallocate the olr before touching the ring */
which was no longer true as intel_wrap_ring_buffer() touches the ring.
However, with the introduction of dynamic pinning, in
> 7ba717c drm/i915/bdw: Pin the ringbuffer backing object to GGTT on-demand
came the possibility that the ringbuffer might not be pinned to the GTT
or mapped into CPU address space when intel_ring_begin() is called. It
gets pinned when the request is allocated, so it's now important that
this comes *before* anything that can write into the ringbuffer, in this
case intel_wrap_ring_buffer(), as this will fault if (a) the ringbuffer
happens not to be mapped, and (b) tail happens to be sufficiently close
to the end of the ring to trigger wrapping.
So the correct order is neither the original allocate-wait-pad-wait, nor
the subsequent wait-pad-wait-allocate, but wait-allocate-pad, avoiding
both the problems described in the two commits mentioned above.
As a bonus, we eliminate the special case where a single ring_begin()
might end up waiting twice (once to be able to wrap, and then again
if that still hadn't actually freed enough space for the request). We
just precalculate the total amount of space we'll need *including* any
for padding the end of the ring and wait for that much in one go :)
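
The wait-allocate-pad order described above can be sketched with a toy model. Everything here is an assumption made for illustration: `struct toy_ring`, `toy_wait_for_space()` and `toy_ring_prepare()` are hypothetical stand-ins, not the i915 structures or functions, and the wait simply pretends that exactly the requested amount of space becomes free.

```c
#include <stdbool.h>

/* Hypothetical, simplified ring state; not the i915 intel_ringbuffer. */
struct toy_ring {
    int size;           /* total bytes in the ring */
    int effective_size; /* usable bytes before a wrap is forced */
    int tail;           /* next write offset */
    int space;          /* bytes currently free */
    bool has_request;   /* stands in for outstanding_lazy_request */
    int waits;          /* how many times we had to wait for space */
};

/* Stand-in for the real wait: pretend exactly `bytes` become free. */
int toy_wait_for_space(struct toy_ring *r, int bytes)
{
    r->waits++;
    r->space = bytes;
    return 0;
}

int toy_ring_prepare(struct toy_ring *r, int bytes)
{
    int fill = 0;

    /* 1. Work out the total need, including padding up to the end of the ring. */
    if (r->tail + bytes > r->effective_size) {
        fill = r->size - r->tail;
        bytes += fill;
    }

    /* 2. Wait at most once, for the whole amount. */
    if (r->space < bytes) {
        int ret = toy_wait_for_space(r, bytes);
        if (ret)
            return ret;
    }

    /* 3. Make sure a request exists before anything writes to the ring. */
    if (!r->has_request)
        r->has_request = true;

    /* 4. Only now touch the ring: pad the remainder and wrap to the top. */
    if (fill) {
        r->tail = 0;
        r->space -= fill;
    }

    return 0;
}
```

With a 4096-byte ring, tail at 4000, 50 bytes free and a 200-byte request, the sketch computes 96 bytes of padding, waits once for 296 bytes, marks the request as allocated, and only then wraps the tail to 0, leaving 200 bytes for the request itself.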
In the time since this code was written, it has all been cloned from
the original ringbuffer model to become the execbuffer code, in
> 82e104c drm/i915/bdw: New logical ring submission mechanism
So now we have to fix it in both paths ...
Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
---
drivers/gpu/drm/i915/intel_lrc.c | 64 +++++++++++++++----------------
drivers/gpu/drm/i915/intel_ringbuffer.c | 63 +++++++++++++++---------------
2 files changed, 64 insertions(+), 63 deletions(-)
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 454e836..3ef5fb6 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -740,39 +740,22 @@ intel_logical_ring_advance_and_submit(struct intel_ringbuffer *ringbuf,
execlists_context_queue(ring, ctx, ringbuf->tail, request);
}
-static int logical_ring_wrap_buffer(struct intel_ringbuffer *ringbuf,
- struct intel_context *ctx)
-{
- uint32_t __iomem *virt;
- int rem = ringbuf->size - ringbuf->tail;
-
- if (ringbuf->space < rem) {
- int ret = logical_ring_wait_for_space(ringbuf, ctx, rem);
-
- if (ret)
- return ret;
- }
-
- virt = ringbuf->virtual_start + ringbuf->tail;
- rem /= 4;
- while (rem--)
- iowrite32(MI_NOOP, virt++);
-
- ringbuf->tail = 0;
- intel_ring_update_space(ringbuf);
-
- return 0;
-}
-
static int logical_ring_prepare(struct intel_ringbuffer *ringbuf,
struct intel_context *ctx, int bytes)
{
+ int fill = 0;
int ret;
+ /*
+ * If the request will not fit between 'tail' and the effective
+ * size of the ringbuffer, then we need to pad the end of the
+ * ringbuffer with NOOPs, then start the request at the top of
+ * the ring. This increases the total size that we need to check
+ * for by however much is left at the end of the ring ...
+ */
if (unlikely(ringbuf->tail + bytes > ringbuf->effective_size)) {
- ret = logical_ring_wrap_buffer(ringbuf, ctx);
- if (unlikely(ret))
- return ret;
+ fill = ringbuf->size - ringbuf->tail;
+ bytes += fill;
}
if (unlikely(ringbuf->space < bytes)) {
@@ -781,6 +764,28 @@ static int logical_ring_prepare(struct intel_ringbuffer *ringbuf,
return ret;
}
+ /* Ensure we have a request before touching the ring */
+ if (!ringbuf->ring->outstanding_lazy_request) {
+ ret = i915_gem_request_alloc(ringbuf->ring, ctx);
+ if (ret)
+ return ret;
+ }
+
+ if (unlikely(fill)) {
+ uint32_t __iomem *virt = ringbuf->virtual_start + ringbuf->tail;
+
+ /* tail should not have moved */
+ if (WARN_ON(fill != ringbuf->size - ringbuf->tail))
+ fill = ringbuf->size - ringbuf->tail;
+
+ do
+ iowrite32(MI_NOOP, virt++);
+ while ((fill -= 4) > 0);
+
+ ringbuf->tail = 0;
+ intel_ring_update_space(ringbuf);
+ }
+
return 0;
}
@@ -814,11 +819,6 @@ static int intel_logical_ring_begin(struct intel_ringbuffer *ringbuf,
if (ret)
return ret;
- /* Preallocate the olr before touching the ring */
- ret = i915_gem_request_alloc(ring, ctx);
- if (ret)
- return ret;
-
ringbuf->space -= num_dwords * sizeof(uint32_t);
return 0;
}
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index a3406b2..4c0bc29 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -2137,29 +2137,6 @@ static int ring_wait_for_space(struct intel_engine_cs *ring, int n)
return 0;
}
-static int intel_wrap_ring_buffer(struct intel_engine_cs *ring)
-{
- uint32_t __iomem *virt;
- struct intel_ringbuffer *ringbuf = ring->buffer;
- int rem = ringbuf->size - ringbuf->tail;
-
- if (ringbuf->space < rem) {
- int ret = ring_wait_for_space(ring, rem);
- if (ret)
- return ret;
- }
-
- virt = ringbuf->virtual_start + ringbuf->tail;
- rem /= 4;
- while (rem--)
- iowrite32(MI_NOOP, virt++);
-
- ringbuf->tail = 0;
- intel_ring_update_space(ringbuf);
-
- return 0;
-}
-
int intel_ring_idle(struct intel_engine_cs *ring)
{
struct drm_i915_gem_request *req;
@@ -2197,12 +2174,19 @@ static int __intel_ring_prepare(struct intel_engine_cs *ring,
int bytes)
{
struct intel_ringbuffer *ringbuf = ring->buffer;
+ int fill = 0;
int ret;
+ /*
+ * If the request will not fit between 'tail' and the effective
+ * size of the ringbuffer, then we need to pad the end of the
+ * ringbuffer with NOOPs, then start the request at the top of
+ * the ring. This increases the total size that we need to check
+ * for by however much is left at the end of the ring ...
+ */
if (unlikely(ringbuf->tail + bytes > ringbuf->effective_size)) {
- ret = intel_wrap_ring_buffer(ring);
- if (unlikely(ret))
- return ret;
+ fill = ringbuf->size - ringbuf->tail;
+ bytes += fill;
}
if (unlikely(ringbuf->space < bytes)) {
@@ -2211,6 +2195,28 @@ static int __intel_ring_prepare(struct intel_engine_cs *ring,
return ret;
}
+ /* Ensure we have a request before touching the ring */
+ if (!ringbuf->ring->outstanding_lazy_request) {
+ ret = i915_gem_request_alloc(ring, ring->default_context);
+ if (ret)
+ return ret;
+ }
+
+ if (unlikely(fill)) {
+ uint32_t __iomem *virt = ringbuf->virtual_start + ringbuf->tail;
+
+ /* tail should not have moved */
+ if (WARN_ON(fill != ringbuf->size - ringbuf->tail))
+ fill = ringbuf->size - ringbuf->tail;
+
+ do
+ iowrite32(MI_NOOP, virt++);
+ while ((fill -= 4) > 0);
+
+ ringbuf->tail = 0;
+ intel_ring_update_space(ringbuf);
+ }
+
return 0;
}
@@ -2229,11 +2235,6 @@ int intel_ring_begin(struct intel_engine_cs *ring,
if (ret)
return ret;
- /* Preallocate the olr before touching the ring */
- ret = i915_gem_request_alloc(ring, ring->default_context);
- if (ret)
- return ret;
-
ring->buffer->space -= num_dwords * sizeof(uint32_t);
return 0;
}
--
1.7.9.5
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply related [flat|nested] 41+ messages in thread
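For readers following the ring-wrap change in the patch above, here is a minimal user-space sketch of the "compute the padding first, check space once, then write the NOOPs" idea. Everything named toy_* is hypothetical and not part of the driver; the space accounting is simplified (the driver recomputes space from head and tail), and MI_NOOP is given the value zero here purely for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define MI_NOOP 0u	/* stand-in value for this sketch */

/* Hypothetical, simplified ring: byte offsets, dword storage. */
struct toy_ring {
	uint32_t *start;	/* backing storage, 'size' bytes long */
	int size;		/* total size in bytes */
	int effective_size;	/* usable size in bytes, <= size */
	int tail;		/* next write offset in bytes */
	int space;		/* free bytes currently available */
};

/* Bytes of padding needed if 'bytes' would not fit before the end. */
static int toy_ring_fill(const struct toy_ring *r, int bytes)
{
	if (r->tail + bytes > r->effective_size)
		return r->size - r->tail;
	return 0;
}

/*
 * Returns 0 on success, -1 if there is not enough room for the
 * request plus any padding. Nothing is written on failure.
 */
static int toy_ring_prepare(struct toy_ring *r, int bytes)
{
	int fill = toy_ring_fill(r, bytes);
	int i;

	assert(bytes > 0 && r->tail % 4 == 0);

	/* One space check covers both the padding and the request. */
	if (r->space < bytes + fill)
		return -1;	/* the driver would wait for space here */

	if (fill) {
		/* Pad from tail to the end of the ring, then wrap. */
		for (i = r->tail / 4; i < r->size / 4; i++)
			r->start[i] = MI_NOOP;
		r->tail = 0;
		r->space -= fill;	/* simplified accounting */
	}
	return 0;
}
```

The point of the ordering is that a failure leaves tail untouched, so nothing is written to the ring until both the padding and the request are known to fit.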
* (no subject)
@ 2015-04-14 10:10 Mika Kahola
0 siblings, 0 replies; 41+ messages in thread
From: Mika Kahola @ 2015-04-14 10:10 UTC (permalink / raw)
To: intel-gfx
This series has been revised based on Jani's good comments.
The patch that read out DP link training parameters from VBT
has been dropped from this series, based on the comments
that I received.
Files changed:
drivers/gpu/drm/i915/intel_dp.c
drivers/gpu/drm/i915/intel_drv.h
^ permalink raw reply [flat|nested] 41+ messages in thread
* [PATCH] drm/i915: (VLV2) Fix the hotplug detection bits
@ 2014-01-21 16:38 Todd Previte
2014-01-23 4:22 ` (no subject) Todd Previte
0 siblings, 1 reply; 41+ messages in thread
From: Todd Previte @ 2014-01-21 16:38 UTC (permalink / raw)
To: intel-gfx
These bits are in reverse order in the header from those defined in
the specification. Change the bit positions for ports B and D to
correctly match the spec.
---
drivers/gpu/drm/i915/i915_reg.h | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index 10ecf90..2d77b51 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -2083,9 +2083,9 @@
* Please check the detailed lore in the commit message for for experimental
* evidence.
*/
-#define PORTD_HOTPLUG_LIVE_STATUS (1 << 29)
+#define PORTD_HOTPLUG_LIVE_STATUS (1 << 27)
#define PORTC_HOTPLUG_LIVE_STATUS (1 << 28)
-#define PORTB_HOTPLUG_LIVE_STATUS (1 << 27)
+#define PORTB_HOTPLUG_LIVE_STATUS (1 << 29)
#define PORTD_HOTPLUG_INT_STATUS (3 << 21)
#define PORTC_HOTPLUG_INT_STATUS (3 << 19)
#define PORTB_HOTPLUG_INT_STATUS (3 << 17)
--
1.8.3.2
^ permalink raw reply related [flat|nested] 41+ messages in thread
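As a small illustration of the hotplug patch above, the sketch below decodes a live-status value using the bit positions the patch proposes (B at 29, C at 28, D at 27). The helper first_live_port() is hypothetical and exists only for this example; it is not a driver function.

```c
#include <stdint.h>

/* Bit positions as proposed by the patch above. */
#define PORTB_HOTPLUG_LIVE_STATUS (1u << 29)
#define PORTC_HOTPLUG_LIVE_STATUS (1u << 28)
#define PORTD_HOTPLUG_LIVE_STATUS (1u << 27)

/* Hypothetical helper: returns 'B', 'C', 'D', or 0 if no port is live. */
static char first_live_port(uint32_t status)
{
	if (status & PORTB_HOTPLUG_LIVE_STATUS)
		return 'B';
	if (status & PORTC_HOTPLUG_LIVE_STATUS)
		return 'C';
	if (status & PORTD_HOTPLUG_LIVE_STATUS)
		return 'D';
	return 0;
}
```

With the old definitions, a status word with only bit 29 set would have been reported as port D rather than port B.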
* (no subject)
@ 2013-12-30 2:29 Oravil Nair
2014-01-07 7:32 ` Daniel Vetter
0 siblings, 1 reply; 41+ messages in thread
From: Oravil Nair @ 2013-12-30 2:29 UTC (permalink / raw)
To: intel-gfx
[-- Attachment #1.1: Type: text/plain, Size: 242 bytes --]
Hi,
i915_gem_object_pin(), during i915 driver create, seems to write to the
memory written by BIOS. Where can the start address be specified to
allocate memory so that the memory written by BIOS is not overwritten at
initialization?
Thanks
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: (no subject)
2013-12-30 2:29 Oravil Nair
@ 2014-01-07 7:32 ` Daniel Vetter
0 siblings, 0 replies; 41+ messages in thread
From: Daniel Vetter @ 2014-01-07 7:32 UTC (permalink / raw)
To: Oravil Nair; +Cc: intel-gfx
On Mon, Dec 30, 2013 at 07:59:49AM +0530, Oravil Nair wrote:
> Hi,
>
> i915_gem_object_pin(), during i915 driver create, seems to write to the
> memory written by BIOS. Where can the start address be specified to
> allocate memory so that the memory written by BIOS is not overwritten at
> initialization?
I guess you want Jesse's patches to save the screen contents from the BIOS
modeset setup? But tbh I'm not clear at all what exactly you're talking
about.
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
^ permalink raw reply [flat|nested] 41+ messages in thread
[parent not found: <CAEyVMbDjLwcDFrQ7y4UtGp7HOT1wi5MB2EWLGTuOdJCKDWsUew@mail.gmail.com>]
* Re: (no subject)
[not found] <CAEyVMbDjLwcDFrQ7y4UtGp7HOT1wi5MB2EWLGTuOdJCKDWsUew@mail.gmail.com>
@ 2013-04-03 15:46 ` Daniel Vetter
0 siblings, 0 replies; 41+ messages in thread
From: Daniel Vetter @ 2013-04-03 15:46 UTC (permalink / raw)
To: Dihan Wickremasuriya; +Cc: Michael Siracusa, intel-gfx, Jeff Faneuff
[-- Attachment #1.1: Type: text/plain, Size: 1354 bytes --]
Hi all,
Two things:
- Please _always_ include a public mailing list when reporting bugs, your
dear maintainer sometimes slacks off.
- We need to see the error_state before we can assess what kind of hang you
have (it's like getting a SIGSEGV for a normal program, no two GPU hangs
are the same ...).
Cheers, Daniel
On Wed, Apr 3, 2013 at 5:42 PM, Dihan Wickremasuriya <
dwickremasuriya@rethinkrobotics.com> wrote:
> Hi Chris/Daniel,
>
> This is Dihan from Rethink Robotics and we were hoping you could help with
> the GPU hang problem in the i915 driver mentioned in bug #26345:
> https://bugs.freedesktop.org/show_bug.cgi?id=26345
>
> We are running into the same problem with the 3.8.5 kernel (which has the
> fix mentioned in comment #153 of the bug report) when running a Qt 5
> application in Gentoo. At times the entire X session would freeze. The
> x11-perf tests described in the bug report run without any issues though.
>
> Would you happen to know whether this is because of an issue in the driver
> that is not currently being addressed by the fix? I have attached the Xorg
> log, the dmesg output and i915_error_state from a hung session. Please let
> me know if you need any more info. Thanks in advance!
>
> Best regards,
> Dihan
>
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
^ permalink raw reply [flat|nested] 41+ messages in thread
* (no subject)
@ 2012-04-12 0:55 Rodrigo Vivi
0 siblings, 0 replies; 41+ messages in thread
From: Rodrigo Vivi @ 2012-04-12 0:55 UTC (permalink / raw)
To: DRI Development; +Cc: Intel Graphics Development, Rodrigo Vivi
There are many bugs open on fd.o regarding missing modes that are supported on Windows and other closed source drivers.
From the EDID spec we can (might?) infer modes using GTF and CVT when the monitor allows it through the range-limits flag... obviously limited by the range.
From our code:
* EDID spec says modes should be preferred in this order:
* - preferred detailed mode
* - other detailed modes from base block
* - detailed modes from extension blocks
* - CVT 3-byte code modes
* - standard timing codes
* - established timing codes
* - modes inferred from GTF or CVT range information
*
* We get this pretty much right.
Not actually so right... We were inferring just using GTF... not CVT or even GTF2.
This patch not just add some common cvt modes but also allows some modes been inferred when using gtf2 as well.
Cheers,
Rodrigo.
From 4b7a88d0d812583d850ca691d1ac491355230d11 Mon Sep 17 00:00:00 2001
From: Rodrigo Vivi <rodrigo.vivi@intel.com>
Date: Wed, 11 Apr 2012 15:36:31 -0300
Subject: [PATCH] drm/edid: Adding common CVT inferred modes when monitor
allows range limited ones through EDID.
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
---
drivers/gpu/drm/drm_edid.c | 37 +++++++++++++-
drivers/gpu/drm/drm_edid_modes.h | 101 ++++++++++++++++++++++++++++++++++++++
2 files changed, 136 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/drm_edid.c b/drivers/gpu/drm/drm_edid.c
index 7ee7be1..3179572 100644
--- a/drivers/gpu/drm/drm_edid.c
+++ b/drivers/gpu/drm/drm_edid.c
@@ -1020,17 +1020,50 @@ drm_gtf_modes_for_range(struct drm_connector *connector, struct edid *edid,
return modes;
}
+static int
+drm_cvt_modes_for_range(struct drm_connector *connector, struct edid *edid,
+ struct detailed_timing *timing)
+{
+ int i, modes = 0;
+ struct drm_display_mode *newmode;
+ struct drm_device *dev = connector->dev;
+
+ for (i = 0; i < drm_num_cvt_inferred_modes; i++) {
+ if (mode_in_range(drm_cvt_inferred_modes + i, edid, timing)) {
+ newmode = drm_mode_duplicate(dev, &drm_cvt_inferred_modes[i]);
+ if (newmode) {
+ drm_mode_probed_add(connector, newmode);
+ modes++;
+ }
+ }
+ }
+
+ return modes;
+}
+
static void
do_inferred_modes(struct detailed_timing *timing, void *c)
{
struct detailed_mode_closure *closure = c;
struct detailed_non_pixel *data = &timing->data.other_data;
- int gtf = (closure->edid->features & DRM_EDID_FEATURE_DEFAULT_GTF);
+ int timing_level = standard_timing_level(closure->edid);
- if (gtf && data->type == EDID_DETAIL_MONITOR_RANGE)
+ if (data->type == EDID_DETAIL_MONITOR_RANGE)
+ switch (timing_level) {
+ case LEVEL_DMT:
+ break;
+ case LEVEL_GTF:
+ case LEVEL_GTF2:
closure->modes += drm_gtf_modes_for_range(closure->connector,
closure->edid,
timing);
+ break;
+ case LEVEL_CVT:
+ closure->modes += drm_cvt_modes_for_range(closure->connector,
+ closure->edid,
+ timing);
+ break;
+ }
}
static int
diff --git a/drivers/gpu/drm/drm_edid_modes.h b/drivers/gpu/drm/drm_edid_modes.h
index a91ffb1..7e14a32 100644
--- a/drivers/gpu/drm/drm_edid_modes.h
+++ b/drivers/gpu/drm/drm_edid_modes.h
@@ -266,6 +266,107 @@ static const struct drm_display_mode drm_dmt_modes[] = {
static const int drm_num_dmt_modes =
sizeof(drm_dmt_modes) / sizeof(struct drm_display_mode);
+static const struct drm_display_mode drm_cvt_inferred_modes[] = {
+ /* 640x480@60Hz */
+ { DRM_MODE("640x480", DRM_MODE_TYPE_DRIVER, 23750, 640, 664,
+ 720, 800, 0, 480, 483, 487, 500, 0,
+ DRM_MODE_FLAG_PHSYNC | DRM_MODE_FLAG_NVSYNC) },
+ /* 800x600@60Hz */
+ { DRM_MODE("800x600", DRM_MODE_TYPE_DRIVER, 38250, 800, 832,
+ 912, 1024, 0, 600, 603, 607, 624, 0,
+ DRM_MODE_FLAG_PHSYNC | DRM_MODE_FLAG_NVSYNC) },
+ /* 960x600@60Hz */
+ { DRM_MODE("960x600", DRM_MODE_TYPE_DRIVER, 45250, 960, 992,
+ 1088, 1216, 0, 600, 603, 609, 624, 0,
+ DRM_MODE_FLAG_PHSYNC | DRM_MODE_FLAG_NVSYNC) },
+ /* 1024x576@60Hz */
+ { DRM_MODE("1024x576", DRM_MODE_TYPE_DRIVER, 46500, 1024, 1064,
+ 1160, 1296, 0, 576, 579, 584, 599, 0,
+ DRM_MODE_FLAG_PHSYNC | DRM_MODE_FLAG_NVSYNC) },
+ /* 1024x768@60Hz */
+ { DRM_MODE("1024x768", DRM_MODE_TYPE_DRIVER, 63500, 1024, 1072,
+ 1176, 1328, 0, 768, 771, 775, 798, 0,
+ DRM_MODE_FLAG_PHSYNC | DRM_MODE_FLAG_NVSYNC) },
+ /* 1152x864@60Hz */
+ { DRM_MODE("1152x864", DRM_MODE_TYPE_DRIVER, 81750, 1152, 1216,
+ 1336, 1520, 0, 864, 867, 871, 897, 0,
+ DRM_MODE_FLAG_PHSYNC | DRM_MODE_FLAG_NVSYNC) },
+ /* 1280x720@60Hz */
+ { DRM_MODE("1280x720", DRM_MODE_TYPE_DRIVER, 74500, 1280, 1344,
+ 1472, 1664, 0, 720, 723, 728, 748, 0,
+ DRM_MODE_FLAG_PHSYNC | DRM_MODE_FLAG_NVSYNC) },
+ /* 1280x768@60Hz */
+ { DRM_MODE("1280x768", DRM_MODE_TYPE_DRIVER, 79500, 1280, 1344,
+ 1472, 1664, 0, 768, 771, 781, 798, 0,
+ DRM_MODE_FLAG_PHSYNC | DRM_MODE_FLAG_NVSYNC) },
+ /* 1280x800@60Hz */
+ { DRM_MODE("1280x800", DRM_MODE_TYPE_DRIVER, 83500, 1280, 1352,
+ 1480, 1680, 0, 800, 803, 809, 831, 0,
+ DRM_MODE_FLAG_PHSYNC | DRM_MODE_FLAG_NVSYNC) },
+ /* 1280x1024@60Hz */
+ { DRM_MODE("1280x1024", DRM_MODE_TYPE_DRIVER, 109000, 1280, 1368,
+ 1496, 1712, 0, 1024, 1027, 1034, 1063, 0,
+ DRM_MODE_FLAG_PHSYNC | DRM_MODE_FLAG_NVSYNC) },
+ /* 1360x768@60Hz */
+ { DRM_MODE("1360x768", DRM_MODE_TYPE_DRIVER, 84750, 1360, 1432,
+ 1568, 1776, 0, 768, 771, 781, 798, 0,
+ DRM_MODE_FLAG_PHSYNC | DRM_MODE_FLAG_NVSYNC) },
+ /* 1366x768@60Hz */
+ { DRM_MODE("1366x768", DRM_MODE_TYPE_DRIVER, 85250, 1368, 1440,
+ 1576, 1784, 0, 768, 771, 781, 798, 0,
+ DRM_MODE_FLAG_PHSYNC | DRM_MODE_FLAG_NVSYNC) },
+ /* 1440x900@60Hz */
+ { DRM_MODE("1440x900", DRM_MODE_TYPE_DRIVER, 106500, 1440, 1528,
+ 1672, 1904, 0, 900, 903, 909, 934, 0,
+ DRM_MODE_FLAG_PHSYNC | DRM_MODE_FLAG_NVSYNC) },
+ /* 1400x1050@60Hz */
+ { DRM_MODE("1400x1050", DRM_MODE_TYPE_DRIVER, 121750, 1400, 1488,
+ 1632, 1864, 0, 1050, 1053, 1057, 1089, 0,
+ DRM_MODE_FLAG_PHSYNC | DRM_MODE_FLAG_NVSYNC) },
+ /* 1600x900@60Hz */
+ { DRM_MODE("1600x900", DRM_MODE_TYPE_DRIVER, 118250, 1600, 1696,
+ 1856, 2112, 0, 900, 903, 908, 934, 0,
+ DRM_MODE_FLAG_PHSYNC | DRM_MODE_FLAG_NVSYNC) },
+ /* 1600x1200@60Hz */
+ { DRM_MODE("1600x1200", DRM_MODE_TYPE_DRIVER, 161000, 1600, 1712,
+ 1880, 2160, 0, 1200, 1203, 1207, 1245, 0,
+ DRM_MODE_FLAG_PHSYNC | DRM_MODE_FLAG_NVSYNC) },
+ /* 1680x945@60Hz */
+ { DRM_MODE("1680x945", DRM_MODE_TYPE_DRIVER, 130750, 1680, 1776,
+ 1952, 2224, 0, 945, 948, 953, 981, 0,
+ DRM_MODE_FLAG_PHSYNC | DRM_MODE_FLAG_NVSYNC) },
+ /* 1680x1050@60Hz */
+ { DRM_MODE("1680x1050", DRM_MODE_TYPE_DRIVER, 146250, 1680, 1784,
+ 1960, 2240, 0, 1050, 1053, 1059, 1089, 0,
+ DRM_MODE_FLAG_PHSYNC | DRM_MODE_FLAG_NVSYNC) },
+ /* 1920x1080@60Hz */
+ { DRM_MODE("1920x1080", DRM_MODE_TYPE_DRIVER, 173000, 1920, 2048,
+ 2248, 2576, 0, 1080, 1083, 1088, 1120, 0,
+ DRM_MODE_FLAG_PHSYNC | DRM_MODE_FLAG_NVSYNC) },
+ /* 1920x1200@60Hz */
+ { DRM_MODE("1920x1200", DRM_MODE_TYPE_DRIVER, 193250, 1920, 2056,
+ 2256, 2592, 0, 1200, 1203, 1209, 1245, 0,
+ DRM_MODE_FLAG_PHSYNC | DRM_MODE_FLAG_NVSYNC) },
+ /* 1920x1440@60Hz */
+ { DRM_MODE("1920x1440", DRM_MODE_TYPE_DRIVER, 233500, 1920, 2064,
+ 2264, 2608, 0, 1440, 1443, 1447, 1493, 0,
+ DRM_MODE_FLAG_PHSYNC | DRM_MODE_FLAG_NVSYNC) },
+ /* 2048x1152@60Hz */
+ { DRM_MODE("2048x1152", DRM_MODE_TYPE_DRIVER, 197000, 2048, 2184,
+ 2400, 2752, 0, 1152, 1155, 1160, 1195, 0,
+ DRM_MODE_FLAG_PHSYNC | DRM_MODE_FLAG_NVSYNC) },
+ /* 2048x1536@60Hz */
+ { DRM_MODE("2048x1536", DRM_MODE_TYPE_DRIVER, 267250, 2048, 2208,
+ 2424, 2800, 0, 1536, 1539, 1543, 1592, 0,
+ DRM_MODE_FLAG_PHSYNC | DRM_MODE_FLAG_NVSYNC) },
+ /* 2560x1600@60Hz */
+ { DRM_MODE("2560x1600", DRM_MODE_TYPE_DRIVER, 348500, 2560, 2760,
+ 3032, 3504, 0, 1600, 1603, 1609, 1658, 0,
+ DRM_MODE_FLAG_PHSYNC | DRM_MODE_FLAG_NVSYNC) },
+};
+static const int drm_num_cvt_inferred_modes =
+ sizeof(drm_cvt_inferred_modes) / sizeof(struct drm_display_mode);
+
static const struct drm_display_mode edid_est_modes[] = {
{ DRM_MODE("800x600", DRM_MODE_TYPE_DRIVER, 40000, 800, 840,
968, 1056, 0, 600, 601, 605, 628, 0,
--
1.7.7.6
^ permalink raw reply related [flat|nested] 41+ messages in thread
* forcewake v4 (now with spinlock)
@ 2011-04-14 18:13 Ben Widawsky
2011-04-14 18:13 ` (no subject) Ben Widawsky
0 siblings, 1 reply; 41+ messages in thread
From: Ben Widawsky @ 2011-04-14 18:13 UTC (permalink / raw)
To: intel-gfx
As the patches say, I don't think adding this spinlock will have too
much of a performance impact (I couldn't notice anything in my limited
testing), because the serializing locks are already held when acquiring
this lock. I suppose it now serializes access between struct_mutex and
config.lock, but in most cases I don't think that's big.
Perhaps the ugliest change is spin_lock() in debugfs. But as I argue in
the comments, if you can't get the lock there, register reads are also
failing, and hung somewhere else.
Ben
^ permalink raw reply [flat|nested] 41+ messages in thread
* forcewake junk, RFC, RFT(test)
@ 2011-04-08 17:47 Ben Widawsky
2011-04-09 20:26 ` forcewake junk, part2 Ben Widawsky
0 siblings, 1 reply; 41+ messages in thread
From: Ben Widawsky @ 2011-04-08 17:47 UTC (permalink / raw)
To: intel-gfx
I am requesting 3 things:
1. code review/request for comments
2. testing on SNB
3. performance regression on < SNB
review
------
The first two patches have nothing to do with the user's ability to
interact with registers. Those patches are about enforcing correctness
within our driver for newer generation products. In other words, if
patch 3 doesn't make sense, don't automatically drop 1, and 2.
review patch 1/2
----------------
The first change is straightforward. It attempts to fold the forcewake stuff
into our standard register read and write functions. Some overhead is added as
a result, but I'd guess it is nothing compared to the UC read about to happen,
and so will not be noticeable.
The existing method for doing the forcewake_get and put requires some
synchronization. For the most part it is protected by struct_mutex and life is
good. Adding a WARN to the get()/put() function is there more as documentation
to future developers, and reminders to the current ones.
To provide an interface to allow user space to use forcewake, I've decided to
change the mechanism that get()/put() operate by introducing a reference count.
The reference count itself must be protected by struct_mutex (since we need
synchronization between the initial condition and the destructor). Imagine for
instance: Thread A does a get() and is in the middle of waking the GT (ref has
already been incremented), thread B comes in, thinks the GT is awake,
and incorrectly goes about its business. This does allow users of the
interface to only have to hold struct_mutex while doing the get() and
not for every read and write.
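As a rough illustration of the reference-counting scheme described above (not the actual patch), here is a user-space sketch. All toy_* names are hypothetical, the 'locked' flag merely stands in for holding struct_mutex, and the assert() calls stand in for the WARN mentioned earlier.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the GT power state and its forcewake users. */
struct toy_gt {
	bool locked;		/* stands in for "struct_mutex is held" */
	int forcewake_count;	/* number of outstanding get() calls */
	bool awake;		/* stands in for the hardware state */
};

static void toy_forcewake_get(struct toy_gt *gt)
{
	/* The count is only safe to touch under the lock. */
	assert(gt->locked);

	/* Only the first user actually wakes the hardware. */
	if (gt->forcewake_count++ == 0)
		gt->awake = true;	/* driver: write FORCEWAKE, wait for ack */
}

static void toy_forcewake_put(struct toy_gt *gt)
{
	assert(gt->locked);
	assert(gt->forcewake_count > 0);

	/* Only the last user lets the hardware go back to sleep. */
	if (--gt->forcewake_count == 0)
		gt->awake = false;
}
```

Because the wake-up happens under the same lock as the increment, a second caller cannot observe a non-zero count while the hardware is still asleep, which is the race described in the thread A / thread B example.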
review patch 3
--------------
The user space interface is mostly what you'd expect, except in the case
of trying to get lockless access. This code is a bit meh, but to
remind everyone, it is root-only debug code.
testing
-------
The assertion that forcewake is, for the most part, already properly
protected may not be true. Anyone interested should run these patches
on SNB systems with their favorite graphics applications and report any
warnings that occur; they will be in the kernel log, i.e. dmesg.
performance
-----------
Looking mostly for regressions on older systems. There is a slight
overhead added to all register reads and writes, which I think shouldn't
be noticeable, but who knows.
Thanks, and let the flames begin!
Ben
^ permalink raw reply [flat|nested] 41+ messages in thread
* forcewake junk, part2
2011-04-08 17:47 forcewake junk, RFC, RFT(test) Ben Widawsky
@ 2011-04-09 20:26 ` Ben Widawsky
2011-04-09 20:26 ` (no subject) Ben Widawsky
0 siblings, 1 reply; 41+ messages in thread
From: Ben Widawsky @ 2011-04-09 20:26 UTC (permalink / raw)
To: intel-gfx
The earlier request to test and report any warnings still stands.
These address all of Chris' comments from IRC (I think). I tried to put
the reasoning in the code comments, as well as commits; but I'll add it
here for completeness.
Simplify the acquisition of the forcewake lock by removing the forced
entry. The motivation behind it is that the kernel offers sufficient
tools to get relevant data before/during/after a hang, and adding a
complex mechanism for userspace is not such a great idea. If a need to
do this comes up, we can always add it back on a situational basis.
Similarly, on releasing of the debugfs file, no more forced entry mechanism.
This does leave a potential for release to hang, but AFAICS if release hangs,
the system is going to need a reboot before i915 becomes usable again because
module unload would also hang.
^ permalink raw reply [flat|nested] 41+ messages in thread
* (no subject)
@ 2011-04-07 21:32 Jesse Barnes
0 siblings, 0 replies; 41+ messages in thread
From: Jesse Barnes @ 2011-04-07 21:32 UTC (permalink / raw)
To: intel-gfx
These are some prep patches I'd like to get feedback on. I've only
compile tested them so far (the actual hw support code this is for was
tested before the split), so testing would be appreciated as well.
Thanks,
Jesse
^ permalink raw reply [flat|nested] 41+ messages in thread
* [PATCH] drm/i915/ringbuffer: set force wake bit before reading ring register
@ 2010-11-09 9:17 Zou Nan hai
2010-11-09 9:17 ` Zou, Nanhai
0 siblings, 1 reply; 41+ messages in thread
From: Zou Nan hai @ 2010-11-09 9:17 UTC (permalink / raw)
To: intel-gfx, Chris Wilson
Before reading a ring register, set the force wake bit to prevent the GT
core from powering down to a low power state; otherwise we may read a
stale value.
Signed-off-by: Zou Nan hai <nanhai.zou@intel.com>
---
drivers/gpu/drm/i915/i915_drv.h | 14 ++++++++++++++
drivers/gpu/drm/i915/i915_reg.h | 1 +
drivers/gpu/drm/i915/intel_ringbuffer.c | 3 ---
drivers/gpu/drm/i915/intel_ringbuffer.h | 11 +++++++----
4 files changed, 22 insertions(+), 7 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 90414ae..53c0239 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1325,4 +1325,18 @@ static inline void i915_write(struct drm_i915_private *dev_priv, u32 reg,
#define PRIMARY_RINGBUFFER_SIZE (128*1024)
+/* on SNB platform,
+ before reading ring registers forcewake bit
+ must be set to prevent GT core from power down
+*/
+
+static inline u32 i915_safe_read(struct intel_ring_buffer *ring,
+ unsigned int offset)
+{
+ u32 ret;
+ drm_i915_private_t *dev_priv = ring->dev->dev_private;
+ if (IS_GEN6(ring->dev)) I915_WRITE(FORCEWAKE, 1);
+ ret = I915_READ(offset);
+ return ret;
+}
#endif
diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index 25ed911..4d994d2 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -3052,4 +3052,5 @@
#define EDP_LINK_TRAIN_800MV_0DB_SNB_B (0x38<<22)
#define EDP_LINK_TRAIN_VOL_EMP_MASK_SNB (0x3f<<22)
+#define FORCEWAKE 0xA18C
#endif /* _I915_REG_H_ */
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 7c1f3ff..2820235 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -572,7 +572,6 @@ err:
int intel_init_ring_buffer(struct drm_device *dev,
struct intel_ring_buffer *ring)
{
- struct drm_i915_private *dev_priv = dev->dev_private;
struct drm_i915_gem_object *obj_priv;
struct drm_gem_object *obj;
int ret;
@@ -691,8 +690,6 @@ int intel_wait_ring_buffer(struct drm_device *dev,
struct intel_ring_buffer *ring, int n)
{
unsigned long end;
- drm_i915_private_t *dev_priv = dev->dev_private;
-
trace_i915_ring_wait_begin (dev);
end = jiffies + 3 * HZ;
do {
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 3126c26..cde1cdd 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -7,13 +7,16 @@ struct intel_hw_status_page {
struct drm_gem_object *obj;
};
-#define I915_READ_TAIL(ring) I915_READ(RING_TAIL(ring->mmio_base))
+#define I915_READ_TAIL(ring) i915_safe_read(ring, RING_TAIL(ring->mmio_base))
#define I915_WRITE_TAIL(ring, val) I915_WRITE(RING_TAIL(ring->mmio_base), val)
-#define I915_READ_START(ring) I915_READ(RING_START(ring->mmio_base))
+
+#define I915_READ_START(ring) i915_safe_read(ring, RING_START(ring->mmio_base))
#define I915_WRITE_START(ring, val) I915_WRITE(RING_START(ring->mmio_base), val)
-#define I915_READ_HEAD(ring) I915_READ(RING_HEAD(ring->mmio_base))
+
+#define I915_READ_HEAD(ring) i915_safe_read(ring, RING_HEAD(ring->mmio_base))
#define I915_WRITE_HEAD(ring, val) I915_WRITE(RING_HEAD(ring->mmio_base), val)
-#define I915_READ_CTL(ring) I915_READ(RING_CTL(ring->mmio_base))
+
+#define I915_READ_CTL(ring) i915_safe_read(ring, RING_CTL(ring->mmio_base))
#define I915_WRITE_CTL(ring, val) I915_WRITE(RING_CTL(ring->mmio_base), val)
struct drm_i915_gem_execbuffer2;
--
1.7.1
^ permalink raw reply related [flat|nested] 41+ messages in thread
* Re: [PATCH] drm/i915/ringbuffer: set force wake bit before reading ring register
2010-11-09 9:17 [PATCH] drm/i915/ringbuffer: set force wake bit before reading ring register Zou Nan hai
@ 2010-11-09 9:17 ` Zou, Nanhai
2010-11-09 10:50 ` Chris Wilson
0 siblings, 1 reply; 41+ messages in thread
From: Zou, Nanhai @ 2010-11-09 9:17 UTC (permalink / raw)
To: intel-gfx, Chris Wilson, Zhao, Jian J
>>-----Original Message-----
>>From: Zou, Nanhai
>>Sent: November 9, 2010 17:18
>>To: intel-gfx@lists.freedesktop.org; Chris Wilson
>>Cc: Zou, Nanhai
>>Subject: [PATCH] drm/i915/ringbuffer: set force wake bit before reading ring
>>register
>>
>>before reading ring register, set force wake bit to prevent GT core
>>power down to low power state. otherwise we may read stale value.
>>
>>Signed-off-by: Zou Nan hai <nanhai.zou@intel.com>
>>---
>> drivers/gpu/drm/i915/i915_drv.h | 14 ++++++++++++++
>> drivers/gpu/drm/i915/i915_reg.h | 1 +
>> drivers/gpu/drm/i915/intel_ringbuffer.c | 3 ---
>> drivers/gpu/drm/i915/intel_ringbuffer.h | 11 +++++++----
>> 4 files changed, 22 insertions(+), 7 deletions(-)
>>
>>diff --git a/drivers/gpu/drm/i915/i915_drv.h
>>b/drivers/gpu/drm/i915/i915_drv.h
>>index 90414ae..53c0239 100644
>>--- a/drivers/gpu/drm/i915/i915_drv.h
>>+++ b/drivers/gpu/drm/i915/i915_drv.h
>>@@ -1325,4 +1325,18 @@ static inline void i915_write(struct drm_i915_private
>>*dev_priv, u32 reg,
>>
>> #define PRIMARY_RINGBUFFER_SIZE (128*1024)
>>
>>+/* on SNB platform,
>>+ before reading ring registers forcewake bit
>>+ must be set to prevent GT core from power down
>>+*/
>>+
>>+static inline u32 i915_safe_read(struct intel_ring_buffer *ring,
>>+ unsigned int offset)
>>+{
>>+ u32 ret;
>>+ drm_i915_private_t *dev_priv = ring->dev->dev_private;
>>+ if (IS_GEN6(ring->dev)) I915_WRITE(FORCEWAKE, 1);
>>+ ret = I915_READ(offset);
>>+ return ret;
>>+}
>> #endif
>>diff --git a/drivers/gpu/drm/i915/i915_reg.h
>>b/drivers/gpu/drm/i915/i915_reg.h
>>index 25ed911..4d994d2 100644
>>--- a/drivers/gpu/drm/i915/i915_reg.h
>>+++ b/drivers/gpu/drm/i915/i915_reg.h
>>@@ -3052,4 +3052,5 @@
>> #define EDP_LINK_TRAIN_800MV_0DB_SNB_B (0x38<<22)
>> #define EDP_LINK_TRAIN_VOL_EMP_MASK_SNB (0x3f<<22)
>>
>>+#define FORCEWAKE 0xA18C
>> #endif /* _I915_REG_H_ */
>>diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c
>>b/drivers/gpu/drm/i915/intel_ringbuffer.c
>>index 7c1f3ff..2820235 100644
>>--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
>>+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
>>@@ -572,7 +572,6 @@ err:
>> int intel_init_ring_buffer(struct drm_device *dev,
>> struct intel_ring_buffer *ring)
>> {
>>- struct drm_i915_private *dev_priv = dev->dev_private;
>> struct drm_i915_gem_object *obj_priv;
>> struct drm_gem_object *obj;
>> int ret;
>>@@ -691,8 +690,6 @@ int intel_wait_ring_buffer(struct drm_device *dev,
>> struct intel_ring_buffer *ring, int n)
>> {
>> unsigned long end;
>>- drm_i915_private_t *dev_priv = dev->dev_private;
>>-
>> trace_i915_ring_wait_begin (dev);
>> end = jiffies + 3 * HZ;
>> do {
>>diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h
>>b/drivers/gpu/drm/i915/intel_ringbuffer.h
>>index 3126c26..cde1cdd 100644
>>--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
>>+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
>>@@ -7,13 +7,16 @@ struct intel_hw_status_page {
>> struct drm_gem_object *obj;
>> };
>>
>>-#define I915_READ_TAIL(ring) I915_READ(RING_TAIL(ring->mmio_base))
>>+#define I915_READ_TAIL(ring) i915_safe_read(ring,
>>RING_TAIL(ring->mmio_base))
>> #define I915_WRITE_TAIL(ring, val) I915_WRITE(RING_TAIL(ring->mmio_base),
>>val)
>>-#define I915_READ_START(ring) I915_READ(RING_START(ring->mmio_base))
>>+
>>+#define I915_READ_START(ring) i915_safe_read(ring,
>>RING_START(ring->mmio_base))
>> #define I915_WRITE_START(ring, val) I915_WRITE(RING_START(ring->mmio_base),
>>val)
>>-#define I915_READ_HEAD(ring) I915_READ(RING_HEAD(ring->mmio_base))
>>+
>>+#define I915_READ_HEAD(ring) i915_safe_read(ring,
>>RING_HEAD(ring->mmio_base))
>> #define I915_WRITE_HEAD(ring, val) I915_WRITE(RING_HEAD(ring->mmio_base),
>>val)
>>-#define I915_READ_CTL(ring) I915_READ(RING_CTL(ring->mmio_base))
>>+
>>+#define I915_READ_CTL(ring) i915_safe_read(ring,
>>RING_CTL(ring->mmio_base))
>> #define I915_WRITE_CTL(ring, val) I915_WRITE(RING_CTL(ring->mmio_base),
>>val)
>>
>> struct drm_i915_gem_execbuffer2;
>>--
>>1.7.1
I have tested this patch with the "read ring head from status page" workaround patch reverted.
It seems to work on my SNB box.
Thanks
Zou Nanhai
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH] drm/i915/ringbuffer: set force wake bit before reading ring register
2010-11-09 9:17 ` Zou, Nanhai
@ 2010-11-09 10:50 ` Chris Wilson
2010-11-10 0:36 ` Zou, Nanhai
0 siblings, 1 reply; 41+ messages in thread
From: Chris Wilson @ 2010-11-09 10:50 UTC (permalink / raw)
To: Zou, Nanhai, intel-gfx, Zhao, Jian J
On Tue, 9 Nov 2010 17:17:07 +0800, "Zou, Nanhai" <nanhai.zou@intel.com> wrote:
> I have tested this patch with the read ring head from status page workaround patch reverted.
> Seems it works on my SNB box.
I needed to add a udelay(100) to i915_safe_read for my rev 8. Can you
check if there is a recommended delay for FORCEWAKE?
-Chris
--
Chris Wilson, Intel Open Source Technology Centre
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH] drm/i915/ringbuffer: set force wake bit before reading ring register
2010-11-09 10:50 ` Chris Wilson
@ 2010-11-10 0:36 ` Zou, Nanhai
2010-11-10 7:54 ` Chris Wilson
0 siblings, 1 reply; 41+ messages in thread
From: Zou, Nanhai @ 2010-11-10 0:36 UTC (permalink / raw)
To: Chris Wilson, intel-gfx, Zhao, Jian J
>>-----Original Message-----
>>From: Chris Wilson [mailto:chris@chris-wilson.co.uk]
>>Sent: November 9, 2010 18:50
>>To: Zou, Nanhai; intel-gfx@lists.freedesktop.org; Zhao, Jian J
>>Subject: RE: [PATCH] drm/i915/ringbuffer: set force wake bit before reading
>>ring register
>>
>>On Tue, 9 Nov 2010 17:17:07 +0800, "Zou, Nanhai" <nanhai.zou@intel.com> wrote:
>>
>>> I have tested this patch with the read ring head from status page workaround
>>patch reverted.
>>> Seems it works on my SNB box.
>>
>>I needed to add a udelay(100) to i915_safe_read for my rev 8. Can you
>>check if there is a recommended delay for FORCEWAKE?
>>-Chris
>>
Does a posting read of FORCEWAKE help?
Thanks
Zou Nanhai
>>--
>>Chris Wilson, Intel Open Source Technology Centre
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH] drm/i915/ringbuffer: set force wake bit before reading ring register
2010-11-10 0:36 ` Zou, Nanhai
@ 2010-11-10 7:54 ` Chris Wilson
2010-11-10 18:47 ` Jesse Barnes
0 siblings, 1 reply; 41+ messages in thread
From: Chris Wilson @ 2010-11-10 7:54 UTC (permalink / raw)
To: Zou, Nanhai, intel-gfx, Zhao, Jian J
On Wed, 10 Nov 2010 08:36:20 +0800, "Zou, Nanhai" <nanhai.zou@intel.com> wrote:
> >>-----Original Message-----
> >>From: Chris Wilson [mailto:chris@chris-wilson.co.uk]
> >>Sent: November 9, 2010 18:50
> >>To: Zou, Nanhai; intel-gfx@lists.freedesktop.org; Zhao, Jian J
> >>Subject: RE: [PATCH] drm/i915/ringbuffer: set force wake bit before reading
> >>ring register
> >>
> >>On Tue, 9 Nov 2010 17:17:07 +0800, "Zou, Nanhai" <nanhai.zou@intel.com> wrote:
> >>
> >>> I have tested this patch with the read ring head from status page workaround
> >>patch reverted.
> >>> Seems it works on my SNB box.
> >>
> >>I needed to add a udelay(100) to i915_safe_read for my rev 8. Can you
> >>check if there is a recommended delay for FORCEWAKE?
> >>-Chris
> >>
> Does a posting read of FORCEWAKE help?
No, tried a POSTING_READ(FORCEWAKE) first and it wasn't until I added the
udelay() between the READ(FORCEWAKE) and the READ(reg) that it returned
the correct results in a single call.
-Chris
--
Chris Wilson, Intel Open Source Technology Centre
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH] drm/i915/ringbuffer: set force wake bit before reading ring register
2010-11-10 7:54 ` Chris Wilson
@ 2010-11-10 18:47 ` Jesse Barnes
2010-11-17 22:52 ` (no subject) Thantry, Hariharan L
0 siblings, 1 reply; 41+ messages in thread
From: Jesse Barnes @ 2010-11-10 18:47 UTC (permalink / raw)
To: Chris Wilson; +Cc: intel-gfx
On Wed, 10 Nov 2010 07:54:12 +0000
Chris Wilson <chris@chris-wilson.co.uk> wrote:
> On Wed, 10 Nov 2010 08:36:20 +0800, "Zou, Nanhai"
> <nanhai.zou@intel.com> wrote:
> > >>-----Original Message-----
> > >>From: Chris Wilson [mailto:chris@chris-wilson.co.uk]
> > >>Sent: November 9, 2010 18:50
> > >>To: Zou, Nanhai; intel-gfx@lists.freedesktop.org; Zhao, Jian J
> > >>Subject: RE: [PATCH] drm/i915/ringbuffer: set force wake bit
> > >>before reading ring register
> > >>
> > >>On Tue, 9 Nov 2010 17:17:07 +0800, "Zou, Nanhai"
> > >><nanhai.zou@intel.com> wrote:
> > >>
> > >>> I have tested this patch with the read ring head from status
> > >>> page workaround
> > >>patch reverted.
> > >>> Seems it works on my SNB box.
> > >>
> > >>I needed to add a udelay(100) to i915_safe_read for my rev 8. Can
> > >>you check if there is a recommended delay for FORCEWAKE?
> > >>-Chris
> > >>
> > Does a posting read of FORCEWAKE help?
>
> No, tried a POSTING_READ(FORCEWAKE) first and it wasn't until I added
> the udelay() between the READ(FORCEWAKE) and the READ(reg) that it
> returned the correct results in a single call.
I think these regs will be affected by the ucode on the GPU; we may
need specific delays after some operations. Have you tried asking the
hw guys what the best practices are here?
--
Jesse Barnes, Intel Open Source Technology Center
^ permalink raw reply [flat|nested] 41+ messages in thread
* (no subject)
2010-11-10 18:47 ` Jesse Barnes
@ 2010-11-17 22:52 ` Thantry, Hariharan L
0 siblings, 0 replies; 41+ messages in thread
From: Thantry, Hariharan L @ 2010-11-17 22:52 UTC (permalink / raw)
To: intel-gfx
Hi folks,
I am a bit new to graphics, but had a few questions that I was hoping that someone could answer for me. I hope this is the right forum to ask these questions.
My interest is in seeing whether I can use the Intel integrated graphics part for non-graphics (GPGPU) work, while driving the display through another discrete card.
I have an Ironlake system (core setup with base Debian, no X-related packages), a basic PCI-E graphics card (NVIDIA NV37GL), and a 2.6.36 kernel with the following relevant config entries:
CONFIG_AGP=y
CONFIG_AGP_AMD64=y
CONFIG_AGP_INTEL=y
CONFIG_VGA_ARB=y
CONFIG_VGA_ARB_MAX_GPUS=16
# CONFIG_VGA_SWITCHEROO is not set
CONFIG_DRM=m
CONFIG_DRM_KMS_HELPER=m
CONFIG_DRM_TTM=m
CONFIG_DRM_R128=m
CONFIG_DRM_I810=m
CONFIG_DRM_I830=m
CONFIG_DRM_I915=m
CONFIG_DRM_I915_KMS=y
I have libdrm & libva installed, and was hoping to use libdrm APIs to do some basic operations on the integrated graphics.
I can insmod the DRM and DRM_KMS_HELPER modules fine, but when I try to insert the i915 driver I get a "No such device" error, even though the module object exists.
lspci does not list the Intel integrated graphics PCI device either:
00:00.0 Host bridge: Intel Corporation Auburndale/Havendale DRAM Controller (rev 02)
00:01.0 PCI bridge: Intel Corporation Auburndale/Havendale PCI Express x16 Root Port (rev 02)
00:16.0 Communication controller: Intel Corporation Ibex Peak HECI Controller (rev 06)
00:16.2 IDE interface: Intel Corporation Ibex Peak PT IDER Controller (rev 06)
00:16.3 Serial controller: Intel Corporation Ibex Peak KT Controller (rev 06)
00:19.0 Ethernet controller: Intel Corporation Device 10f0 (rev 06)
00:1a.0 USB Controller: Intel Corporation Ibex Peak USB2 Enhanced Host Controller (rev 06)
00:1b.0 Audio device: Intel Corporation Ibex Peak High Definition Audio (rev 06)
00:1d.0 USB Controller: Intel Corporation Ibex Peak USB2 Enhanced Host Controller (rev 06)
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev a6)
00:1f.0 ISA bridge: Intel Corporation Ibex Peak LPC Interface Controller (rev 06)
00:1f.2 SATA controller: Intel Corporation Ibex Peak 6 port SATA AHCI Controller (rev 06)
00:1f.3 SMBus: Intel Corporation Ibex Peak SMBus Controller (rev 06)
01:00.0 VGA compatible controller: nVidia Corporation NV37GL [Quadro PCI-E Series] (rev a2)
02:02.0 Ethernet controller: Intel Corporation 82557/8/9/0/1 Ethernet Pro 100 (rev 08)
First off, is there a way for the Intel integrated graphics to appear in the list of PCI devices when it's not being used for driving the display?
Secondly, can I simply use the libdrm APIs to directly perform operations on the Intel integrated part? Does there exist any documentation describing the DRM APIs?
Finally, can I use the DRM APIs for using the GPU "media pipe" (architecturally different from the 3D graphics pipe)?
Thanks,
Hari
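As a way to double-check the lspci listing above from a program, a small C helper can walk sysfs and look for an Intel function whose base class is "display controller". This is a minimal sketch, assuming a Linux system with sysfs mounted at /sys; the function names are made up for illustration. If it finds nothing, the integrated GPU is not exposed on the bus at all, and no driver can bind to it.

```c
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>

#define PCI_VENDOR_INTEL       0x8086
#define PCI_BASE_CLASS_DISPLAY 0x03

/* class_code is the 24-bit value from sysfs: base class, subclass, prog-if. */
static int is_intel_display(unsigned long vendor, unsigned long class_code)
{
	return vendor == PCI_VENDOR_INTEL &&
	       ((class_code >> 16) & 0xff) == PCI_BASE_CLASS_DISPLAY;
}

/* Read a hex attribute such as "vendor" or "class" for one PCI function. */
static int read_hex_attr(const char *dev, const char *attr, unsigned long *out)
{
	char path[512];
	char buf[32];
	FILE *f;

	assert(out != NULL);
	snprintf(path, sizeof(path), "/sys/bus/pci/devices/%s/%s", dev, attr);
	f = fopen(path, "r");
	if (!f)
		return -1;
	if (!fgets(buf, sizeof(buf), f)) {
		fclose(f);
		return -1;
	}
	fclose(f);
	*out = strtoul(buf, NULL, 16); /* base 16 accepts the "0x" prefix */
	return 0;
}

/* Returns the number of Intel display functions, or -1 without sysfs. */
static int count_intel_display(void)
{
	DIR *d = opendir("/sys/bus/pci/devices");
	struct dirent *e;
	int n = 0;

	if (!d)
		return -1;
	while ((e = readdir(d)) != NULL) {
		unsigned long vendor, class_code;

		if (e->d_name[0] == '.')
			continue;
		if (read_hex_attr(e->d_name, "vendor", &vendor) == 0 &&
		    read_hex_attr(e->d_name, "class", &class_code) == 0 &&
		    is_intel_display(vendor, class_code)) {
			printf("%s\n", e->d_name); /* e.g. 0000:00:02.0 */
			n++;
		}
	}
	closedir(d);
	return n;
}
```

On a machine where the integrated graphics is enabled, it normally shows up as function 00:02.0, which is absent from the listing above.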
^ permalink raw reply [flat|nested] 41+ messages in thread
Thread overview: 41+ messages
2012-07-19 20:00 (no subject) Olivier Galibert
2012-07-19 20:00 ` [PATCH 1/9] intel gen4-5: fix the vue view in the fs Olivier Galibert
2012-07-26 17:18 ` [Mesa-dev] " Eric Anholt
2012-07-27 9:21 ` Olivier Galibert
2012-07-19 20:00 ` [PATCH 2/9] intel gen4-5: simplify the bfc copy in the sf Olivier Galibert
2012-07-26 17:20 ` Eric Anholt
2012-07-19 20:00 ` [PATCH 3/9] intel gen4-5: fix GL_VERTEX_PROGRAM_TWO_SIDE selection Olivier Galibert
2012-07-26 17:19 ` Eric Anholt
2012-07-19 20:00 ` [PATCH 4/9] intel gen4-5: Fix backface/frontface selection when one one color is written to Olivier Galibert
2012-07-20 17:01 ` Eric Anholt
2012-07-20 18:03 ` Olivier Galibert
2012-07-19 20:00 ` [PATCH 5/9] intel gen4-5: Compute the interpolation status for every variable in one place Olivier Galibert
2012-07-26 17:22 ` [Mesa-dev] " Eric Anholt
2012-07-27 9:12 ` Olivier Galibert
2012-07-19 20:00 ` [PATCH 6/9] intel gen4-5: Correctly setup the parameters in the sf Olivier Galibert
2012-07-19 20:00 ` [PATCH 7/9] intel gen4-5: Correctly handle flat vs. non-flat in the clipper Olivier Galibert
2012-07-19 20:00 ` [PATCH 8/9] intel gen4-5: Make noperspective clipping work Olivier Galibert
2012-07-19 20:00 ` [PATCH 9/9] intel gen4-5: Don't touch flatshaded values when clipping, only copy them Olivier Galibert
-- strict thread matches above, loose matches on Subject: below --
2018-07-06 14:42 (no subject) Christian König
2018-07-05 10:38 rosdi ablatiff
2017-01-16 16:28 Tony Whittam
2016-11-11 16:16 [PATCH i-g-t v5 1/4] lib: add igt_dummyload Daniel Vetter
2016-11-14 18:24 ` (no subject) Abdiel Janulgue
2015-06-12 17:09 [PATCH 2/2] drm/i915: Rework order of operations in {__intel, logical}_ring_prepare() Dave Gordon
[not found] ` <1433789441-8295-1-git-send-email-david.s.gordon@intel.com>
2015-06-12 17:09 ` [PATCH v2] Resolve issues with ringbuffer space management Dave Gordon
2015-06-12 20:25 ` (no subject) Dave Gordon
2015-06-17 11:04 ` Daniel Vetter
2015-06-17 12:41 ` Jani Nikula
2015-06-18 10:30 ` Dave Gordon
2015-04-14 10:10 Mika Kahola
2014-01-21 16:38 [PATCH] drm/i915: (VLV2) Fix the hotplug detection bits Todd Previte
2014-01-23 4:22 ` (no subject) Todd Previte
2013-12-30 2:29 Oravil Nair
2014-01-07 7:32 ` Daniel Vetter
[not found] <CAEyVMbDjLwcDFrQ7y4UtGp7HOT1wi5MB2EWLGTuOdJCKDWsUew@mail.gmail.com>
2013-04-03 15:46 ` Daniel Vetter
2012-05-31 18:00 Muhammad Jamil
2012-04-12 0:55 Rodrigo Vivi
2012-04-08 2:26 Muhammad Jamil
2012-04-05 6:44 Muhammad Jamil
2012-04-03 18:25 Muhammad Jamil
2012-04-03 12:42 Muhammad Jamil
2011-04-14 18:13 forcewake v4 (now with spinlock) Ben Widawsky
2011-04-14 18:13 ` (no subject) Ben Widawsky
2011-04-08 17:47 forcewake junk, RFC, RFT(test) Ben Widawsky
2011-04-09 20:26 ` forcewake junk, part2 Ben Widawsky
2011-04-09 20:26 ` (no subject) Ben Widawsky
2011-04-07 21:32 Jesse Barnes
2010-11-09 9:17 [PATCH] drm/i915/ringbuffer: set force wake bit before reading ring register Zou Nan hai
2010-11-09 9:17 ` Zou, Nanhai
2010-11-09 10:50 ` Chris Wilson
2010-11-10 0:36 ` Zou, Nanhai
2010-11-10 7:54 ` Chris Wilson
2010-11-10 18:47 ` Jesse Barnes
2010-11-17 22:52 ` (no subject) Thantry, Hariharan L