From: "Hellstrom, Thomas" <thomas.hellstrom@intel.com> To: "dri-devel@lists.freedesktop.org" <dri-devel@lists.freedesktop.org>, "C, Ramalingam" <ramalingam.c@intel.com>, "intel-gfx@lists.freedesktop.org" <intel-gfx@lists.freedesktop.org> Cc: "Auld, Matthew" <matthew.auld@intel.com> Subject: Re: [PATCH v2 4/4] drm/i915/migrate: Evict and restore the flatccs capable lmem obj Date: Thu, 3 Mar 2022 08:04:29 +0000 [thread overview] Message-ID: <2f9fef4f88fd088735e3b7413f5727c1b1db23f1.camel@intel.com> (raw) In-Reply-To: <20220301215334.20543-5-ramalingam.c@intel.com> On Wed, 2022-03-02 at 03:23 +0530, Ramalingam C wrote: > When we are swapping out the local memory obj on flat-ccs capable > platform, > we need to capture the ccs data too along with main meory and we need > to > restore it when we are swapping in the content. > > When lmem object is swapped into a smem obj, smem obj will > have the extra pages required to hold the ccs data corresponding to > the > lmem main memory. So main memory of lmem will be copied into the > initial > pages of the smem and then ccs data corresponding to the main memory > will be copied to the subsequent pages of smem. ccs data is 1/256 of > lmem size. > > Swapin happens exactly in reverse order. First main memory of lmem is > restored from the smem's initial pages and the ccs data will be > restored > from the subsequent pages of smem. > > Extracting and restoring the CCS data is done through a special cmd > called > XY_CTRL_SURF_COPY_BLT > > v2: Fixing the ccs handling > > Signed-off-by: Ramalingam C <ramalingam.c@intel.com> > --- > drivers/gpu/drm/i915/gt/intel_migrate.c | 184 +++++++++++++++++++++- > -- > 1 file changed, 167 insertions(+), 17 deletions(-) > > diff --git a/drivers/gpu/drm/i915/gt/intel_migrate.c > b/drivers/gpu/drm/i915/gt/intel_migrate.c > index 330fcdc3e0cf..73ac7382aeb6 100644 > --- a/drivers/gpu/drm/i915/gt/intel_migrate.c > +++ b/drivers/gpu/drm/i915/gt/intel_migrate.c > @@ -341,12 +341,9 @@ static int emit_no_arbitration(struct > i915_request *rq) > return 0; > } > > -static int emit_pte(struct i915_request *rq, > - struct sgt_dma *it, > +static int emit_pte(struct i915_request *rq, struct sgt_dma *it, > enum i915_cache_level cache_level, > - bool is_lmem, > - u64 offset, > - int length) > + bool is_lmem, u64 offset, int length) Above change seems unrelated? > { > bool has_64K_pages = HAS_64K_PAGES(rq->engine->i915); > const u64 encode = rq->context->vm->pte_encode(0, > cache_level, > @@ -573,14 +570,54 @@ static u32 *_i915_ctrl_surf_copy_blt(u32 *cmd, > u64 src_addr, u64 dst_addr, > return cmd; > } > > +static int emit_ccs_copy(struct i915_request *rq, > + bool dst_is_lmem, u32 dst_offset, > + bool src_is_lmem, u32 src_offset, int size) > +{ > + struct drm_i915_private *i915 = rq->engine->i915; > + int mocs = rq->engine->gt->mocs.uc_index << 1; > + u32 num_ccs_blks, ccs_ring_size; > + u8 src_access, dst_access; > + u32 *cs; > + > + GEM_BUG_ON(!(src_is_lmem ^ dst_is_lmem) || > !HAS_FLAT_CCS(i915)); > + > + ccs_ring_size = calc_ctrl_surf_instr_size(i915, size); > + WARN_ON(!ccs_ring_size); > + > + cs = intel_ring_begin(rq, round_up(ccs_ring_size, 2)); > + if (IS_ERR(cs)) > + return PTR_ERR(cs); > + > + num_ccs_blks = DIV_ROUND_UP(GET_CCS_BYTES(i915, size), > + NUM_CCS_BYTES_PER_BLOCK); > + > + src_access = !src_is_lmem && dst_is_lmem; > + dst_access = !src_access; > + > + cs = i915_flush_dw(cs, MI_FLUSH_LLC | MI_FLUSH_CCS); > + cs = _i915_ctrl_surf_copy_blt(cs, src_offset, dst_offset, > + src_access, dst_access, > + mocs, mocs, num_ccs_blks); > + cs = i915_flush_dw(cs, MI_FLUSH_LLC | MI_FLUSH_CCS); > + if (ccs_ring_size & 1) > + *cs++ = MI_NOOP; > + > + intel_ring_advance(rq, cs); > + > + return 0; > +} > + > static int emit_copy(struct i915_request *rq, > - u32 dst_offset, u32 src_offset, int size) > + bool dst_is_lmem, u32 dst_offset, > + bool src_is_lmem, u32 src_offset, int size) > { > const int ver = GRAPHICS_VER(rq->engine->i915); > u32 instance = rq->engine->instance; > u32 *cs; > > cs = intel_ring_begin(rq, ver >= 8 ? 10 : 6); > + > if (IS_ERR(cs)) > return PTR_ERR(cs); Changes to emit_copy() above seem unrelated? Also for the verbatim copy we need to adjust the compression flags in the main copy blit. > > @@ -620,6 +657,18 @@ static int emit_copy(struct i915_request *rq, > return 0; > } > > +static int scatter_list_length(struct scatterlist *sg) > +{ > + int len = 0; > + > + while (sg) { Terminate loop if (sg_dma_len() == 0) ? > + len += sg_dma_len(sg); > + sg = sg_next(sg); > + }; > + > + return len; > +} > + > int > intel_context_migrate_copy(struct intel_context *ce, > const struct i915_deps *deps, > @@ -632,7 +681,10 @@ intel_context_migrate_copy(struct intel_context > *ce, > struct i915_request **out) Perhaps add a parameter "verbatim" to indicate whether we want to do a verbatim copy or not. That way we can differentiate between eviction (verbatim) and migration (ordinary blit)? > { > struct sgt_dma it_src = sg_sgt(src), it_dst = sg_sgt(dst); > + struct drm_i915_private *i915 = ce->engine->i915; > + u32 src_sz, dst_sz, ccs_bytes = 0, bytes_to_cpy; > struct i915_request *rq; > + bool ccs_copy = false; > int err; > > GEM_BUG_ON(ce->vm != ce->engine->gt->migrate.context->vm); > @@ -640,9 +692,28 @@ intel_context_migrate_copy(struct intel_context > *ce, > > GEM_BUG_ON(ce->ring->size < SZ_64K); > > + if (HAS_FLAT_CCS(i915) && src_is_lmem ^ dst_is_lmem) { > + src_sz = scatter_list_length(src); > + dst_sz = scatter_list_length(dst); > + > + if (src_is_lmem) > + bytes_to_cpy = src_sz; > + else if (dst_is_lmem) > + bytes_to_cpy = dst_sz; > + > + /* > + * When there is a eviction of ccs needed smem will > have the > + * extra pages for the ccs data > + * > + * TO-DO: Want to move the size mismatch check to a > WARN_ON, > + * but still we have some requests of smem->lmem with > same size. > + * Need to fix it. > + */ > + ccs_bytes = src_sz != dst_sz ? GET_CCS_BYTES(i915, > bytes_to_cpy) : 0; > + } > + > do { > - u32 src_offset, dst_offset; > - int len; > + u32 src_offset, dst_offset, copy_sz; > > rq = i915_request_create(ce); > if (IS_ERR(rq)) { > @@ -682,27 +753,82 @@ intel_context_migrate_copy(struct intel_context > *ce, > dst_offset = 2 * CHUNK_SZ; > } > > - len = emit_pte(rq, &it_src, src_cache_level, > src_is_lmem, > - src_offset, CHUNK_SZ); > - if (len <= 0) { > - err = len; > + if (ccs_copy) { This loop was hard to understand already before this patch. Could we try to break out some loop functionality into separate functions? Also if I understand the flow correctly, We're first blitting all the chunks of the main surface, and after that the CCS data? However for the control surface blit indirect addressing of LMEM to work, I figure *all* main surface LMEM pages for which we blit control data need to be present in the CHUNK_SZ window VMA, which is only true for small buffers. Hence we need to interleave main surface and CCS copies when we need to split the main surface into chunks, perhaps something like for_each_chunk() { disable_preemption(); emit_pte(lmem); emit_pte(system); xy_fast_copy_blt(); emit_pte(system_ccs_region); // Still use the system window for this tlb_flush(); // Flush the updated system ptes xy_ctrl_surf_copy_blt(); enable_preemption(); } And also check whether we need to do the ctrl_surface blit first depending on blit direction (according to the docs). Thanks, Thomas ---------------------------------------------------------------------- Intel Sweden AB Registered Office: Isafjordsgatan 30B, 164 40 Kista, Stockholm, Sweden Registration Number: 556189-6027 This e-mail and any attachments may contain confidential material for the sole use of the intended recipient(s). Any review or distribution by others is strictly prohibited. If you are not the intended recipient, please contact the sender and delete all copies.
WARNING: multiple messages have this Message-ID (diff)
From: "Hellstrom, Thomas" <thomas.hellstrom@intel.com> To: "dri-devel@lists.freedesktop.org" <dri-devel@lists.freedesktop.org>, "C, Ramalingam" <ramalingam.c@intel.com>, "intel-gfx@lists.freedesktop.org" <intel-gfx@lists.freedesktop.org> Cc: "Auld, Matthew" <matthew.auld@intel.com> Subject: Re: [Intel-gfx] [PATCH v2 4/4] drm/i915/migrate: Evict and restore the flatccs capable lmem obj Date: Thu, 3 Mar 2022 08:04:29 +0000 [thread overview] Message-ID: <2f9fef4f88fd088735e3b7413f5727c1b1db23f1.camel@intel.com> (raw) In-Reply-To: <20220301215334.20543-5-ramalingam.c@intel.com> On Wed, 2022-03-02 at 03:23 +0530, Ramalingam C wrote: > When we are swapping out the local memory obj on flat-ccs capable > platform, > we need to capture the ccs data too along with main meory and we need > to > restore it when we are swapping in the content. > > When lmem object is swapped into a smem obj, smem obj will > have the extra pages required to hold the ccs data corresponding to > the > lmem main memory. So main memory of lmem will be copied into the > initial > pages of the smem and then ccs data corresponding to the main memory > will be copied to the subsequent pages of smem. ccs data is 1/256 of > lmem size. > > Swapin happens exactly in reverse order. First main memory of lmem is > restored from the smem's initial pages and the ccs data will be > restored > from the subsequent pages of smem. > > Extracting and restoring the CCS data is done through a special cmd > called > XY_CTRL_SURF_COPY_BLT > > v2: Fixing the ccs handling > > Signed-off-by: Ramalingam C <ramalingam.c@intel.com> > --- > drivers/gpu/drm/i915/gt/intel_migrate.c | 184 +++++++++++++++++++++- > -- > 1 file changed, 167 insertions(+), 17 deletions(-) > > diff --git a/drivers/gpu/drm/i915/gt/intel_migrate.c > b/drivers/gpu/drm/i915/gt/intel_migrate.c > index 330fcdc3e0cf..73ac7382aeb6 100644 > --- a/drivers/gpu/drm/i915/gt/intel_migrate.c > +++ b/drivers/gpu/drm/i915/gt/intel_migrate.c > @@ -341,12 +341,9 @@ static int emit_no_arbitration(struct > i915_request *rq) > return 0; > } > > -static int emit_pte(struct i915_request *rq, > - struct sgt_dma *it, > +static int emit_pte(struct i915_request *rq, struct sgt_dma *it, > enum i915_cache_level cache_level, > - bool is_lmem, > - u64 offset, > - int length) > + bool is_lmem, u64 offset, int length) Above change seems unrelated? > { > bool has_64K_pages = HAS_64K_PAGES(rq->engine->i915); > const u64 encode = rq->context->vm->pte_encode(0, > cache_level, > @@ -573,14 +570,54 @@ static u32 *_i915_ctrl_surf_copy_blt(u32 *cmd, > u64 src_addr, u64 dst_addr, > return cmd; > } > > +static int emit_ccs_copy(struct i915_request *rq, > + bool dst_is_lmem, u32 dst_offset, > + bool src_is_lmem, u32 src_offset, int size) > +{ > + struct drm_i915_private *i915 = rq->engine->i915; > + int mocs = rq->engine->gt->mocs.uc_index << 1; > + u32 num_ccs_blks, ccs_ring_size; > + u8 src_access, dst_access; > + u32 *cs; > + > + GEM_BUG_ON(!(src_is_lmem ^ dst_is_lmem) || > !HAS_FLAT_CCS(i915)); > + > + ccs_ring_size = calc_ctrl_surf_instr_size(i915, size); > + WARN_ON(!ccs_ring_size); > + > + cs = intel_ring_begin(rq, round_up(ccs_ring_size, 2)); > + if (IS_ERR(cs)) > + return PTR_ERR(cs); > + > + num_ccs_blks = DIV_ROUND_UP(GET_CCS_BYTES(i915, size), > + NUM_CCS_BYTES_PER_BLOCK); > + > + src_access = !src_is_lmem && dst_is_lmem; > + dst_access = !src_access; > + > + cs = i915_flush_dw(cs, MI_FLUSH_LLC | MI_FLUSH_CCS); > + cs = _i915_ctrl_surf_copy_blt(cs, src_offset, dst_offset, > + src_access, dst_access, > + mocs, mocs, num_ccs_blks); > + cs = i915_flush_dw(cs, MI_FLUSH_LLC | MI_FLUSH_CCS); > + if (ccs_ring_size & 1) > + *cs++ = MI_NOOP; > + > + intel_ring_advance(rq, cs); > + > + return 0; > +} > + > static int emit_copy(struct i915_request *rq, > - u32 dst_offset, u32 src_offset, int size) > + bool dst_is_lmem, u32 dst_offset, > + bool src_is_lmem, u32 src_offset, int size) > { > const int ver = GRAPHICS_VER(rq->engine->i915); > u32 instance = rq->engine->instance; > u32 *cs; > > cs = intel_ring_begin(rq, ver >= 8 ? 10 : 6); > + > if (IS_ERR(cs)) > return PTR_ERR(cs); Changes to emit_copy() above seem unrelated? Also for the verbatim copy we need to adjust the compression flags in the main copy blit. > > @@ -620,6 +657,18 @@ static int emit_copy(struct i915_request *rq, > return 0; > } > > +static int scatter_list_length(struct scatterlist *sg) > +{ > + int len = 0; > + > + while (sg) { Terminate loop if (sg_dma_len() == 0) ? > + len += sg_dma_len(sg); > + sg = sg_next(sg); > + }; > + > + return len; > +} > + > int > intel_context_migrate_copy(struct intel_context *ce, > const struct i915_deps *deps, > @@ -632,7 +681,10 @@ intel_context_migrate_copy(struct intel_context > *ce, > struct i915_request **out) Perhaps add a parameter "verbatim" to indicate whether we want to do a verbatim copy or not. That way we can differentiate between eviction (verbatim) and migration (ordinary blit)? > { > struct sgt_dma it_src = sg_sgt(src), it_dst = sg_sgt(dst); > + struct drm_i915_private *i915 = ce->engine->i915; > + u32 src_sz, dst_sz, ccs_bytes = 0, bytes_to_cpy; > struct i915_request *rq; > + bool ccs_copy = false; > int err; > > GEM_BUG_ON(ce->vm != ce->engine->gt->migrate.context->vm); > @@ -640,9 +692,28 @@ intel_context_migrate_copy(struct intel_context > *ce, > > GEM_BUG_ON(ce->ring->size < SZ_64K); > > + if (HAS_FLAT_CCS(i915) && src_is_lmem ^ dst_is_lmem) { > + src_sz = scatter_list_length(src); > + dst_sz = scatter_list_length(dst); > + > + if (src_is_lmem) > + bytes_to_cpy = src_sz; > + else if (dst_is_lmem) > + bytes_to_cpy = dst_sz; > + > + /* > + * When there is a eviction of ccs needed smem will > have the > + * extra pages for the ccs data > + * > + * TO-DO: Want to move the size mismatch check to a > WARN_ON, > + * but still we have some requests of smem->lmem with > same size. > + * Need to fix it. > + */ > + ccs_bytes = src_sz != dst_sz ? GET_CCS_BYTES(i915, > bytes_to_cpy) : 0; > + } > + > do { > - u32 src_offset, dst_offset; > - int len; > + u32 src_offset, dst_offset, copy_sz; > > rq = i915_request_create(ce); > if (IS_ERR(rq)) { > @@ -682,27 +753,82 @@ intel_context_migrate_copy(struct intel_context > *ce, > dst_offset = 2 * CHUNK_SZ; > } > > - len = emit_pte(rq, &it_src, src_cache_level, > src_is_lmem, > - src_offset, CHUNK_SZ); > - if (len <= 0) { > - err = len; > + if (ccs_copy) { This loop was hard to understand already before this patch. Could we try to break out some loop functionality into separate functions? Also if I understand the flow correctly, We're first blitting all the chunks of the main surface, and after that the CCS data? However for the control surface blit indirect addressing of LMEM to work, I figure *all* main surface LMEM pages for which we blit control data need to be present in the CHUNK_SZ window VMA, which is only true for small buffers. Hence we need to interleave main surface and CCS copies when we need to split the main surface into chunks, perhaps something like for_each_chunk() { disable_preemption(); emit_pte(lmem); emit_pte(system); xy_fast_copy_blt(); emit_pte(system_ccs_region); // Still use the system window for this tlb_flush(); // Flush the updated system ptes xy_ctrl_surf_copy_blt(); enable_preemption(); } And also check whether we need to do the ctrl_surface blit first depending on blit direction (according to the docs). Thanks, Thomas ---------------------------------------------------------------------- Intel Sweden AB Registered Office: Isafjordsgatan 30B, 164 40 Kista, Stockholm, Sweden Registration Number: 556189-6027 This e-mail and any attachments may contain confidential material for the sole use of the intended recipient(s). Any review or distribution by others is strictly prohibited. If you are not the intended recipient, please contact the sender and delete all copies.
next prev parent reply other threads:[~2022-03-03 8:04 UTC|newest] Thread overview: 25+ messages / expand[flat|nested] mbox.gz Atom feed top 2022-03-01 21:53 [PATCH v2 0/4] drm/i915/ttm: Evict and store of compressed object Ramalingam C 2022-03-01 21:53 ` [Intel-gfx] " Ramalingam C 2022-03-01 21:53 ` [PATCH v2 1/4] drm/i915/gt: Clear compress metadata for Xe_HP platforms Ramalingam C 2022-03-01 21:53 ` [Intel-gfx] " Ramalingam C 2022-03-03 7:39 ` Hellstrom, Thomas 2022-03-03 7:39 ` [Intel-gfx] " Hellstrom, Thomas 2022-03-01 21:53 ` [PATCH v2 2/4] drm/ttm: parameter to add extra pages into ttm_tt Ramalingam C 2022-03-01 21:53 ` [Intel-gfx] " Ramalingam C 2022-03-02 12:54 ` Thomas Hellström 2022-03-02 12:54 ` [Intel-gfx] " Thomas Hellström 2022-03-02 13:24 ` Christian König 2022-03-02 13:24 ` [Intel-gfx] " Christian König 2022-03-01 21:53 ` [PATCH v2 3/4] drm/i915/gem: Extra pages in ttm_tt for ccs data Ramalingam C 2022-03-01 21:53 ` [Intel-gfx] " Ramalingam C 2022-03-02 12:58 ` Thomas Hellström 2022-03-02 12:58 ` [Intel-gfx] " Thomas Hellström 2022-03-01 21:53 ` [PATCH v2 4/4] drm/i915/migrate: Evict and restore the flatccs capable lmem obj Ramalingam C 2022-03-01 21:53 ` [Intel-gfx] " Ramalingam C 2022-03-03 8:04 ` Hellstrom, Thomas [this message] 2022-03-03 8:04 ` Hellstrom, Thomas 2022-03-02 1:51 ` [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for drm/i915/ttm: Evict and store of compressed object (rev2) Patchwork 2022-03-02 1:53 ` [Intel-gfx] ✗ Fi.CI.SPARSE: " Patchwork 2022-03-02 2:23 ` [Intel-gfx] ✓ Fi.CI.BAT: success " Patchwork 2022-03-02 8:28 ` [Intel-gfx] ✗ Fi.CI.IGT: failure " Patchwork 2022-03-02 15:31 ` [Intel-gfx] [PATCH v2 0/4] drm/i915/ttm: Evict and store of compressed object Das, Nirmoy
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=2f9fef4f88fd088735e3b7413f5727c1b1db23f1.camel@intel.com \ --to=thomas.hellstrom@intel.com \ --cc=dri-devel@lists.freedesktop.org \ --cc=intel-gfx@lists.freedesktop.org \ --cc=matthew.auld@intel.com \ --cc=ramalingam.c@intel.com \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: linkBe sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.