dri-devel.lists.freedesktop.org archive mirror
* [PATCH 0/3] Fixed-width mask/bit helpers
@ 2023-05-09  5:14 Lucas De Marchi
  2023-05-09  5:14 ` [PATCH 1/3] drm/amd: Remove wrapper macros over get_u{32,16,8} Lucas De Marchi
                   ` (2 more replies)
  0 siblings, 3 replies; 30+ messages in thread
From: Lucas De Marchi @ 2023-05-09  5:14 UTC (permalink / raw)
  To: intel-gfx, intel-xe, dri-devel
  Cc: Masahiro Yamada, Kevin Brodsky, Lucas De Marchi, linux-kernel,
	Christian König, Alex Deucher, Thomas Gleixner,
	Andy Shevchenko, Andrew Morton

Generalize the REG_GENMASK*() and REG_BIT*() macros so they can be used
by other drivers. The intention is to migrate i915 to the generic
helpers and also make use of them on the upcoming xe driver. There are
possibly other users in the kernel that need u32/u16/u8 bit handling.
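
As a rough illustration of what fixed-width helpers of this kind could
look like, here is a minimal standalone sketch. The SKETCH_* names and
the sketch_field_get_u32() helper are hypothetical and are not the
macros added by this series; the sketch only assumes that a fixed-width
mask is the usual GENMASK computation cast down to the target type.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical names, for illustration only: these are not the macros
 * added by this series. h and l are bit positions, with h >= l.
 */
#define SKETCH_BIT_U32(n)	((uint32_t)(UINT32_C(1) << (n)))
#define SKETCH_GENMASK_U32(h, l) \
	((uint32_t)((UINT32_MAX << (l)) & (UINT32_MAX >> (31 - (h)))))

#define SKETCH_BIT_U8(n)	((uint8_t)(1u << (n)))
#define SKETCH_GENMASK_U8(h, l) \
	((uint8_t)((0xffu << (l)) & (0xffu >> (7 - (h)))))

/* With constant arguments the results can be checked at build time. */
_Static_assert(SKETCH_GENMASK_U32(7, 4) == 0xf0u, "bits 7:4 of a u32");
_Static_assert(SKETCH_GENMASK_U8(3, 1) == 0x0eu, "bits 3:1 of a u8");

/* Extract the field covered by bits h:l of a 32-bit register value. */
static inline uint32_t sketch_field_get_u32(uint32_t reg, unsigned int h,
					    unsigned int l)
{
	assert(h < 32 && l <= h);
	return (reg & SKETCH_GENMASK_U32(h, l)) >> l;
}
```

The cast is what fixes the width: the mask has the named type rather
than unsigned long, although the usual integer promotions still apply
once the u8 variant is used in a larger expression.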

The first patch is one possible way to free up the U32() name in the
radeon/amdgpu drivers, since U32() is planned to be used here. Other
alternatives would be to give their macros an amd/radeon prefix or to
use _Generic().

The last patch is a temporary one to demonstrate what would change on
the i915 side. However, rather than only replacing the implementation of
the REG_* macros, the goal is to replace the callers as well.

The patches here are currently based on the drm-tip branch.

Lucas De Marchi (3):
  drm/amd: Remove wrapper macros over get_u{32,16,8}
  linux/bits.h: Add fixed-width GENMASK and BIT macros
  drm/i915: Temporary conversion to new GENMASK/BIT macros

 drivers/gpu/drm/amd/amdgpu/atom.c       | 212 ++++++++++++------------
 drivers/gpu/drm/amd/include/atom-bits.h |   9 +-
 drivers/gpu/drm/i915/i915_reg_defs.h    |  28 +---
 drivers/gpu/drm/radeon/atom-bits.h      |   9 +-
 drivers/gpu/drm/radeon/atom.c           | 209 +++++++++++------------
 include/linux/bits.h                    |  22 +++
 include/uapi/linux/const.h              |   2 +
 include/vdso/const.h                    |   1 +
 8 files changed, 249 insertions(+), 243 deletions(-)

-- 
2.40.1



* [PATCH 1/3] drm/amd: Remove wrapper macros over get_u{32,16,8}
  2023-05-09  5:14 [PATCH 0/3] Fixed-width mask/bit helpers Lucas De Marchi
@ 2023-05-09  5:14 ` Lucas De Marchi
  2023-05-09  5:14 ` [PATCH 2/3] linux/bits.h: Add fixed-width GENMASK and BIT macros Lucas De Marchi
  2023-05-09  5:14 ` [PATCH 3/3] drm/i915: Temporary conversion to new GENMASK/BIT macros Lucas De Marchi
  2 siblings, 0 replies; 30+ messages in thread
From: Lucas De Marchi @ 2023-05-09  5:14 UTC (permalink / raw)
  To: intel-gfx, intel-xe, dri-devel
  Cc: Masahiro Yamada, Kevin Brodsky, Lucas De Marchi, linux-kernel,
	Christian König, Alex Deucher, Thomas Gleixner,
	Andy Shevchenko, Andrew Morton

Both amdgpu and radeon use wrapper macros over the get_u{32,16,8}()
functions that add an implicit argument. Instead of using the macros,
just call the functions directly without hiding the context that is
being passed. This will allow the U32()/U16()/U8() macro names to be
used in a more global context, like ULL() and UL() currently are.
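
To make the implicit argument concrete, here is a minimal standalone
sketch. The get_u*() readers mirror the helpers in atom-bits.h (with
const added); struct sketch_ctx, SKETCH_CU16() and the two read_*()
functions are hypothetical stand-ins for illustration, not driver code.

```c
#include <assert.h>
#include <stdint.h>

/* Little-endian readers, mirroring the helpers in atom-bits.h. */
static inline uint8_t get_u8(const void *bios, int ptr)
{
	return ((const unsigned char *)bios)[ptr];
}

static inline uint16_t get_u16(const void *bios, int ptr)
{
	return (uint16_t)(get_u8(bios, ptr) |
			  ((uint16_t)get_u8(bios, ptr + 1) << 8));
}

static inline uint32_t get_u32(const void *bios, int ptr)
{
	return get_u16(bios, ptr) | ((uint32_t)get_u16(bios, ptr + 2) << 16);
}

/* Hypothetical stand-in for the driver context; only the field used here. */
struct sketch_ctx {
	const void *bios;
};

/* Old style: only compiles where a variable named "ctx" is in scope. */
#define SKETCH_CU16(ptr) get_u16(ctx->bios, (ptr))

static inline uint16_t read_old_style(const struct sketch_ctx *ctx, int ptr)
{
	return SKETCH_CU16(ptr);	/* ctx is picked up implicitly */
}

/* New style: the context being read from is visible at the call site. */
static inline uint16_t read_new_style(const struct sketch_ctx *ctx, int ptr)
{
	return get_u16(ctx->bios, ptr);
}
```

Both functions return the same value; the difference is only that the
second no longer depends on the name of a local variable.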

Callers are automatically converted with the following coccinelle
script:

	$ cat utype.cocci
	virtual patch

	@@
	expression e;
	@@
	(
	- U32(e)
	+ get_u32(ctx->ctx->bios, e)
	|
	- U16(e)
	+ get_u16(ctx->ctx->bios, e)
	|
	- U8(e)
	+ get_u8(ctx->ctx->bios, e)
	|
	- CU32(e)
	+ get_u32(ctx->bios, e)
	|
	- CU16(e)
	+ get_u16(ctx->bios, e)
	|
	- CU8(e)
	+ get_u8(ctx->bios, e)
	)

	$ make coccicheck SPFLAGS=--in-place MODE=patch \
		COCCI=utype.cocci \
		M=./drivers/gpu/drm/

Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
---
 drivers/gpu/drm/amd/amdgpu/atom.c       | 212 ++++++++++++------------
 drivers/gpu/drm/amd/include/atom-bits.h |   9 +-
 drivers/gpu/drm/radeon/atom-bits.h      |   9 +-
 drivers/gpu/drm/radeon/atom.c           | 209 +++++++++++------------
 4 files changed, 219 insertions(+), 220 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/atom.c b/drivers/gpu/drm/amd/amdgpu/atom.c
index 1c5d9388ad0b..eea49bfb403f 100644
--- a/drivers/gpu/drm/amd/amdgpu/atom.c
+++ b/drivers/gpu/drm/amd/amdgpu/atom.c
@@ -112,62 +112,62 @@ static uint32_t atom_iio_execute(struct atom_context *ctx, int base,
 	uint32_t temp = 0xCDCDCDCD;
 
 	while (1)
-		switch (CU8(base)) {
+		switch (get_u8(ctx->bios, base)) {
 		case ATOM_IIO_NOP:
 			base++;
 			break;
 		case ATOM_IIO_READ:
-			temp = ctx->card->reg_read(ctx->card, CU16(base + 1));
+			temp = ctx->card->reg_read(ctx->card,
+						   get_u16(ctx->bios, base + 1));
 			base += 3;
 			break;
 		case ATOM_IIO_WRITE:
-			ctx->card->reg_write(ctx->card, CU16(base + 1), temp);
+			ctx->card->reg_write(ctx->card,
+				             get_u16(ctx->bios, base + 1),
+				             temp);
 			base += 3;
 			break;
 		case ATOM_IIO_CLEAR:
 			temp &=
-			    ~((0xFFFFFFFF >> (32 - CU8(base + 1))) <<
-			      CU8(base + 2));
+			    ~((0xFFFFFFFF >> (32 - get_u8(ctx->bios, base + 1))) <<
+			      get_u8(ctx->bios, base + 2));
 			base += 3;
 			break;
 		case ATOM_IIO_SET:
 			temp |=
-			    (0xFFFFFFFF >> (32 - CU8(base + 1))) << CU8(base +
-									2);
+			    (0xFFFFFFFF >> (32 - get_u8(ctx->bios, base + 1))) << get_u8(ctx->bios,
+										         base + 2);
 			base += 3;
 			break;
 		case ATOM_IIO_MOVE_INDEX:
 			temp &=
-			    ~((0xFFFFFFFF >> (32 - CU8(base + 1))) <<
-			      CU8(base + 3));
+			    ~((0xFFFFFFFF >> (32 - get_u8(ctx->bios, base + 1))) <<
+			      get_u8(ctx->bios, base + 3));
 			temp |=
-			    ((index >> CU8(base + 2)) &
-			     (0xFFFFFFFF >> (32 - CU8(base + 1)))) << CU8(base +
-									  3);
+			    ((index >> get_u8(ctx->bios, base + 2)) &
+			     (0xFFFFFFFF >> (32 - get_u8(ctx->bios, base + 1)))) << get_u8(ctx->bios,
+										           base + 3);
 			base += 4;
 			break;
 		case ATOM_IIO_MOVE_DATA:
 			temp &=
-			    ~((0xFFFFFFFF >> (32 - CU8(base + 1))) <<
-			      CU8(base + 3));
+			    ~((0xFFFFFFFF >> (32 - get_u8(ctx->bios, base + 1))) <<
+			      get_u8(ctx->bios, base + 3));
 			temp |=
-			    ((data >> CU8(base + 2)) &
-			     (0xFFFFFFFF >> (32 - CU8(base + 1)))) << CU8(base +
-									  3);
+			    ((data >> get_u8(ctx->bios, base + 2)) &
+			     (0xFFFFFFFF >> (32 - get_u8(ctx->bios, base + 1)))) << get_u8(ctx->bios,
+										           base + 3);
 			base += 4;
 			break;
 		case ATOM_IIO_MOVE_ATTR:
 			temp &=
-			    ~((0xFFFFFFFF >> (32 - CU8(base + 1))) <<
-			      CU8(base + 3));
+			    ~((0xFFFFFFFF >> (32 - get_u8(ctx->bios, base + 1))) <<
+			      get_u8(ctx->bios, base + 3));
 			temp |=
 			    ((ctx->
-			      io_attr >> CU8(base + 2)) & (0xFFFFFFFF >> (32 -
-									  CU8
-									  (base
-									   +
-									   1))))
-			    << CU8(base + 3);
+			      io_attr >> get_u8(ctx->bios, base + 2)) & (0xFFFFFFFF >> (32 -
+										        get_u8(ctx->bios, base + 1))))
+			    << get_u8(ctx->bios, base + 3);
 			base += 4;
 			break;
 		case ATOM_IIO_END:
@@ -187,7 +187,7 @@ static uint32_t atom_get_src_int(atom_exec_context *ctx, uint8_t attr,
 	align = (attr >> 3) & 7;
 	switch (arg) {
 	case ATOM_ARG_REG:
-		idx = U16(*ptr);
+		idx = get_u16(ctx->ctx->bios, *ptr);
 		(*ptr) += 2;
 		if (print)
 			DEBUG("REG[0x%04X]", idx);
@@ -219,7 +219,7 @@ static uint32_t atom_get_src_int(atom_exec_context *ctx, uint8_t attr,
 		}
 		break;
 	case ATOM_ARG_PS:
-		idx = U8(*ptr);
+		idx = get_u8(ctx->ctx->bios, *ptr);
 		(*ptr)++;
 		/* get_unaligned_le32 avoids unaligned accesses from atombios
 		 * tables, noticed on a DEC Alpha. */
@@ -228,7 +228,7 @@ static uint32_t atom_get_src_int(atom_exec_context *ctx, uint8_t attr,
 			DEBUG("PS[0x%02X,0x%04X]", idx, val);
 		break;
 	case ATOM_ARG_WS:
-		idx = U8(*ptr);
+		idx = get_u8(ctx->ctx->bios, *ptr);
 		(*ptr)++;
 		if (print)
 			DEBUG("WS[0x%02X]", idx);
@@ -265,7 +265,7 @@ static uint32_t atom_get_src_int(atom_exec_context *ctx, uint8_t attr,
 		}
 		break;
 	case ATOM_ARG_ID:
-		idx = U16(*ptr);
+		idx = get_u16(ctx->ctx->bios, *ptr);
 		(*ptr) += 2;
 		if (print) {
 			if (gctx->data_block)
@@ -273,10 +273,10 @@ static uint32_t atom_get_src_int(atom_exec_context *ctx, uint8_t attr,
 			else
 				DEBUG("ID[0x%04X]", idx);
 		}
-		val = U32(idx + gctx->data_block);
+		val = get_u32(ctx->ctx->bios, idx + gctx->data_block);
 		break;
 	case ATOM_ARG_FB:
-		idx = U8(*ptr);
+		idx = get_u8(ctx->ctx->bios, *ptr);
 		(*ptr)++;
 		if ((gctx->fb_base + (idx * 4)) > gctx->scratch_size_bytes) {
 			DRM_ERROR("ATOM: fb read beyond scratch region: %d vs. %d\n",
@@ -290,7 +290,7 @@ static uint32_t atom_get_src_int(atom_exec_context *ctx, uint8_t attr,
 	case ATOM_ARG_IMM:
 		switch (align) {
 		case ATOM_SRC_DWORD:
-			val = U32(*ptr);
+			val = get_u32(ctx->ctx->bios, *ptr);
 			(*ptr) += 4;
 			if (print)
 				DEBUG("IMM 0x%08X\n", val);
@@ -298,7 +298,7 @@ static uint32_t atom_get_src_int(atom_exec_context *ctx, uint8_t attr,
 		case ATOM_SRC_WORD0:
 		case ATOM_SRC_WORD8:
 		case ATOM_SRC_WORD16:
-			val = U16(*ptr);
+			val = get_u16(ctx->ctx->bios, *ptr);
 			(*ptr) += 2;
 			if (print)
 				DEBUG("IMM 0x%04X\n", val);
@@ -307,7 +307,7 @@ static uint32_t atom_get_src_int(atom_exec_context *ctx, uint8_t attr,
 		case ATOM_SRC_BYTE8:
 		case ATOM_SRC_BYTE16:
 		case ATOM_SRC_BYTE24:
-			val = U8(*ptr);
+			val = get_u8(ctx->ctx->bios, *ptr);
 			(*ptr)++;
 			if (print)
 				DEBUG("IMM 0x%02X\n", val);
@@ -315,14 +315,14 @@ static uint32_t atom_get_src_int(atom_exec_context *ctx, uint8_t attr,
 		}
 		return 0;
 	case ATOM_ARG_PLL:
-		idx = U8(*ptr);
+		idx = get_u8(ctx->ctx->bios, *ptr);
 		(*ptr)++;
 		if (print)
 			DEBUG("PLL[0x%02X]", idx);
 		val = gctx->card->pll_read(gctx->card, idx);
 		break;
 	case ATOM_ARG_MC:
-		idx = U8(*ptr);
+		idx = get_u8(ctx->ctx->bios, *ptr);
 		(*ptr)++;
 		if (print)
 			DEBUG("MC[0x%02X]", idx);
@@ -410,20 +410,20 @@ static uint32_t atom_get_src_direct(atom_exec_context *ctx, uint8_t align, int *
 
 	switch (align) {
 	case ATOM_SRC_DWORD:
-		val = U32(*ptr);
+		val = get_u32(ctx->ctx->bios, *ptr);
 		(*ptr) += 4;
 		break;
 	case ATOM_SRC_WORD0:
 	case ATOM_SRC_WORD8:
 	case ATOM_SRC_WORD16:
-		val = U16(*ptr);
+		val = get_u16(ctx->ctx->bios, *ptr);
 		(*ptr) += 2;
 		break;
 	case ATOM_SRC_BYTE0:
 	case ATOM_SRC_BYTE8:
 	case ATOM_SRC_BYTE16:
 	case ATOM_SRC_BYTE24:
-		val = U8(*ptr);
+		val = get_u8(ctx->ctx->bios, *ptr);
 		(*ptr)++;
 		break;
 	}
@@ -460,7 +460,7 @@ static void atom_put_dst(atom_exec_context *ctx, int arg, uint8_t attr,
 	val |= saved;
 	switch (arg) {
 	case ATOM_ARG_REG:
-		idx = U16(*ptr);
+		idx = get_u16(ctx->ctx->bios, *ptr);
 		(*ptr) += 2;
 		DEBUG("REG[0x%04X]", idx);
 		idx += gctx->reg_block;
@@ -493,13 +493,13 @@ static void atom_put_dst(atom_exec_context *ctx, int arg, uint8_t attr,
 		}
 		break;
 	case ATOM_ARG_PS:
-		idx = U8(*ptr);
+		idx = get_u8(ctx->ctx->bios, *ptr);
 		(*ptr)++;
 		DEBUG("PS[0x%02X]", idx);
 		ctx->ps[idx] = cpu_to_le32(val);
 		break;
 	case ATOM_ARG_WS:
-		idx = U8(*ptr);
+		idx = get_u8(ctx->ctx->bios, *ptr);
 		(*ptr)++;
 		DEBUG("WS[0x%02X]", idx);
 		switch (idx) {
@@ -532,7 +532,7 @@ static void atom_put_dst(atom_exec_context *ctx, int arg, uint8_t attr,
 		}
 		break;
 	case ATOM_ARG_FB:
-		idx = U8(*ptr);
+		idx = get_u8(ctx->ctx->bios, *ptr);
 		(*ptr)++;
 		if ((gctx->fb_base + (idx * 4)) > gctx->scratch_size_bytes) {
 			DRM_ERROR("ATOM: fb write beyond scratch region: %d vs. %d\n",
@@ -542,13 +542,13 @@ static void atom_put_dst(atom_exec_context *ctx, int arg, uint8_t attr,
 		DEBUG("FB[0x%02X]", idx);
 		break;
 	case ATOM_ARG_PLL:
-		idx = U8(*ptr);
+		idx = get_u8(ctx->ctx->bios, *ptr);
 		(*ptr)++;
 		DEBUG("PLL[0x%02X]", idx);
 		gctx->card->pll_write(gctx->card, idx, val);
 		break;
 	case ATOM_ARG_MC:
-		idx = U8(*ptr);
+		idx = get_u8(ctx->ctx->bios, *ptr);
 		(*ptr)++;
 		DEBUG("MC[0x%02X]", idx);
 		gctx->card->mc_write(gctx->card, idx, val);
@@ -584,7 +584,7 @@ static void atom_put_dst(atom_exec_context *ctx, int arg, uint8_t attr,
 
 static void atom_op_add(atom_exec_context *ctx, int *ptr, int arg)
 {
-	uint8_t attr = U8((*ptr)++);
+	uint8_t attr = get_u8(ctx->ctx->bios, (*ptr)++);
 	uint32_t dst, src, saved;
 	int dptr = *ptr;
 	SDEBUG("   dst: ");
@@ -598,7 +598,7 @@ static void atom_op_add(atom_exec_context *ctx, int *ptr, int arg)
 
 static void atom_op_and(atom_exec_context *ctx, int *ptr, int arg)
 {
-	uint8_t attr = U8((*ptr)++);
+	uint8_t attr = get_u8(ctx->ctx->bios, (*ptr)++);
 	uint32_t dst, src, saved;
 	int dptr = *ptr;
 	SDEBUG("   dst: ");
@@ -617,14 +617,14 @@ static void atom_op_beep(atom_exec_context *ctx, int *ptr, int arg)
 
 static void atom_op_calltable(atom_exec_context *ctx, int *ptr, int arg)
 {
-	int idx = U8((*ptr)++);
+	int idx = get_u8(ctx->ctx->bios, (*ptr)++);
 	int r = 0;
 
 	if (idx < ATOM_TABLE_NAMES_CNT)
 		SDEBUG("   table: %d (%s)\n", idx, atom_table_names[idx]);
 	else
 		SDEBUG("   table: %d\n", idx);
-	if (U16(ctx->ctx->cmd_table + 4 + 2 * idx))
+	if (get_u16(ctx->ctx->bios, ctx->ctx->cmd_table + 4 + 2 * idx))
 		r = amdgpu_atom_execute_table_locked(ctx->ctx, idx, ctx->ps + ctx->ps_shift);
 	if (r) {
 		ctx->abort = true;
@@ -633,7 +633,7 @@ static void atom_op_calltable(atom_exec_context *ctx, int *ptr, int arg)
 
 static void atom_op_clear(atom_exec_context *ctx, int *ptr, int arg)
 {
-	uint8_t attr = U8((*ptr)++);
+	uint8_t attr = get_u8(ctx->ctx->bios, (*ptr)++);
 	uint32_t saved;
 	int dptr = *ptr;
 	attr &= 0x38;
@@ -645,7 +645,7 @@ static void atom_op_clear(atom_exec_context *ctx, int *ptr, int arg)
 
 static void atom_op_compare(atom_exec_context *ctx, int *ptr, int arg)
 {
-	uint8_t attr = U8((*ptr)++);
+	uint8_t attr = get_u8(ctx->ctx->bios, (*ptr)++);
 	uint32_t dst, src;
 	SDEBUG("   src1: ");
 	dst = atom_get_dst(ctx, arg, attr, ptr, NULL, 1);
@@ -659,7 +659,7 @@ static void atom_op_compare(atom_exec_context *ctx, int *ptr, int arg)
 
 static void atom_op_delay(atom_exec_context *ctx, int *ptr, int arg)
 {
-	unsigned count = U8((*ptr)++);
+	unsigned count = get_u8(ctx->ctx->bios, (*ptr)++);
 	SDEBUG("   count: %d\n", count);
 	if (arg == ATOM_UNIT_MICROSEC)
 		udelay(count);
@@ -671,7 +671,7 @@ static void atom_op_delay(atom_exec_context *ctx, int *ptr, int arg)
 
 static void atom_op_div(atom_exec_context *ctx, int *ptr, int arg)
 {
-	uint8_t attr = U8((*ptr)++);
+	uint8_t attr = get_u8(ctx->ctx->bios, (*ptr)++);
 	uint32_t dst, src;
 	SDEBUG("   src1: ");
 	dst = atom_get_dst(ctx, arg, attr, ptr, NULL, 1);
@@ -689,7 +689,7 @@ static void atom_op_div(atom_exec_context *ctx, int *ptr, int arg)
 static void atom_op_div32(atom_exec_context *ctx, int *ptr, int arg)
 {
 	uint64_t val64;
-	uint8_t attr = U8((*ptr)++);
+	uint8_t attr = get_u8(ctx->ctx->bios, (*ptr)++);
 	uint32_t dst, src;
 	SDEBUG("   src1: ");
 	dst = atom_get_dst(ctx, arg, attr, ptr, NULL, 1);
@@ -714,7 +714,7 @@ static void atom_op_eot(atom_exec_context *ctx, int *ptr, int arg)
 
 static void atom_op_jump(atom_exec_context *ctx, int *ptr, int arg)
 {
-	int execute = 0, target = U16(*ptr);
+	int execute = 0, target = get_u16(ctx->ctx->bios, *ptr);
 	unsigned long cjiffies;
 
 	(*ptr) += 2;
@@ -768,7 +768,7 @@ static void atom_op_jump(atom_exec_context *ctx, int *ptr, int arg)
 
 static void atom_op_mask(atom_exec_context *ctx, int *ptr, int arg)
 {
-	uint8_t attr = U8((*ptr)++);
+	uint8_t attr = get_u8(ctx->ctx->bios, (*ptr)++);
 	uint32_t dst, mask, src, saved;
 	int dptr = *ptr;
 	SDEBUG("   dst: ");
@@ -785,7 +785,7 @@ static void atom_op_mask(atom_exec_context *ctx, int *ptr, int arg)
 
 static void atom_op_move(atom_exec_context *ctx, int *ptr, int arg)
 {
-	uint8_t attr = U8((*ptr)++);
+	uint8_t attr = get_u8(ctx->ctx->bios, (*ptr)++);
 	uint32_t src, saved;
 	int dptr = *ptr;
 	if (((attr >> 3) & 7) != ATOM_SRC_DWORD)
@@ -802,7 +802,7 @@ static void atom_op_move(atom_exec_context *ctx, int *ptr, int arg)
 
 static void atom_op_mul(atom_exec_context *ctx, int *ptr, int arg)
 {
-	uint8_t attr = U8((*ptr)++);
+	uint8_t attr = get_u8(ctx->ctx->bios, (*ptr)++);
 	uint32_t dst, src;
 	SDEBUG("   src1: ");
 	dst = atom_get_dst(ctx, arg, attr, ptr, NULL, 1);
@@ -814,7 +814,7 @@ static void atom_op_mul(atom_exec_context *ctx, int *ptr, int arg)
 static void atom_op_mul32(atom_exec_context *ctx, int *ptr, int arg)
 {
 	uint64_t val64;
-	uint8_t attr = U8((*ptr)++);
+	uint8_t attr = get_u8(ctx->ctx->bios, (*ptr)++);
 	uint32_t dst, src;
 	SDEBUG("   src1: ");
 	dst = atom_get_dst(ctx, arg, attr, ptr, NULL, 1);
@@ -832,7 +832,7 @@ static void atom_op_nop(atom_exec_context *ctx, int *ptr, int arg)
 
 static void atom_op_or(atom_exec_context *ctx, int *ptr, int arg)
 {
-	uint8_t attr = U8((*ptr)++);
+	uint8_t attr = get_u8(ctx->ctx->bios, (*ptr)++);
 	uint32_t dst, src, saved;
 	int dptr = *ptr;
 	SDEBUG("   dst: ");
@@ -846,7 +846,7 @@ static void atom_op_or(atom_exec_context *ctx, int *ptr, int arg)
 
 static void atom_op_postcard(atom_exec_context *ctx, int *ptr, int arg)
 {
-	uint8_t val = U8((*ptr)++);
+	uint8_t val = get_u8(ctx->ctx->bios, (*ptr)++);
 	SDEBUG("POST card output: 0x%02X\n", val);
 }
 
@@ -867,7 +867,7 @@ static void atom_op_savereg(atom_exec_context *ctx, int *ptr, int arg)
 
 static void atom_op_setdatablock(atom_exec_context *ctx, int *ptr, int arg)
 {
-	int idx = U8(*ptr);
+	int idx = get_u8(ctx->ctx->bios, *ptr);
 	(*ptr)++;
 	SDEBUG("   block: %d\n", idx);
 	if (!idx)
@@ -875,13 +875,14 @@ static void atom_op_setdatablock(atom_exec_context *ctx, int *ptr, int arg)
 	else if (idx == 255)
 		ctx->ctx->data_block = ctx->start;
 	else
-		ctx->ctx->data_block = U16(ctx->ctx->data_table + 4 + 2 * idx);
+		ctx->ctx->data_block = get_u16(ctx->ctx->bios,
+					       ctx->ctx->data_table + 4 + 2 * idx);
 	SDEBUG("   base: 0x%04X\n", ctx->ctx->data_block);
 }
 
 static void atom_op_setfbbase(atom_exec_context *ctx, int *ptr, int arg)
 {
-	uint8_t attr = U8((*ptr)++);
+	uint8_t attr = get_u8(ctx->ctx->bios, (*ptr)++);
 	SDEBUG("   fb_base: ");
 	ctx->ctx->fb_base = atom_get_src(ctx, attr, ptr);
 }
@@ -891,7 +892,7 @@ static void atom_op_setport(atom_exec_context *ctx, int *ptr, int arg)
 	int port;
 	switch (arg) {
 	case ATOM_PORT_ATI:
-		port = U16(*ptr);
+		port = get_u16(ctx->ctx->bios, *ptr);
 		if (port < ATOM_IO_NAMES_CNT)
 			SDEBUG("   port: %d (%s)\n", port, atom_io_names[port]);
 		else
@@ -915,14 +916,14 @@ static void atom_op_setport(atom_exec_context *ctx, int *ptr, int arg)
 
 static void atom_op_setregblock(atom_exec_context *ctx, int *ptr, int arg)
 {
-	ctx->ctx->reg_block = U16(*ptr);
+	ctx->ctx->reg_block = get_u16(ctx->ctx->bios, *ptr);
 	(*ptr) += 2;
 	SDEBUG("   base: 0x%04X\n", ctx->ctx->reg_block);
 }
 
 static void atom_op_shift_left(atom_exec_context *ctx, int *ptr, int arg)
 {
-	uint8_t attr = U8((*ptr)++), shift;
+	uint8_t attr = get_u8(ctx->ctx->bios, (*ptr)++), shift;
 	uint32_t saved, dst;
 	int dptr = *ptr;
 	attr &= 0x38;
@@ -938,7 +939,7 @@ static void atom_op_shift_left(atom_exec_context *ctx, int *ptr, int arg)
 
 static void atom_op_shift_right(atom_exec_context *ctx, int *ptr, int arg)
 {
-	uint8_t attr = U8((*ptr)++), shift;
+	uint8_t attr = get_u8(ctx->ctx->bios, (*ptr)++), shift;
 	uint32_t saved, dst;
 	int dptr = *ptr;
 	attr &= 0x38;
@@ -954,7 +955,7 @@ static void atom_op_shift_right(atom_exec_context *ctx, int *ptr, int arg)
 
 static void atom_op_shl(atom_exec_context *ctx, int *ptr, int arg)
 {
-	uint8_t attr = U8((*ptr)++), shift;
+	uint8_t attr = get_u8(ctx->ctx->bios, (*ptr)++), shift;
 	uint32_t saved, dst;
 	int dptr = *ptr;
 	uint32_t dst_align = atom_dst_to_src[(attr >> 3) & 7][(attr >> 6) & 3];
@@ -973,7 +974,7 @@ static void atom_op_shl(atom_exec_context *ctx, int *ptr, int arg)
 
 static void atom_op_shr(atom_exec_context *ctx, int *ptr, int arg)
 {
-	uint8_t attr = U8((*ptr)++), shift;
+	uint8_t attr = get_u8(ctx->ctx->bios, (*ptr)++), shift;
 	uint32_t saved, dst;
 	int dptr = *ptr;
 	uint32_t dst_align = atom_dst_to_src[(attr >> 3) & 7][(attr >> 6) & 3];
@@ -992,7 +993,7 @@ static void atom_op_shr(atom_exec_context *ctx, int *ptr, int arg)
 
 static void atom_op_sub(atom_exec_context *ctx, int *ptr, int arg)
 {
-	uint8_t attr = U8((*ptr)++);
+	uint8_t attr = get_u8(ctx->ctx->bios, (*ptr)++);
 	uint32_t dst, src, saved;
 	int dptr = *ptr;
 	SDEBUG("   dst: ");
@@ -1006,18 +1007,18 @@ static void atom_op_sub(atom_exec_context *ctx, int *ptr, int arg)
 
 static void atom_op_switch(atom_exec_context *ctx, int *ptr, int arg)
 {
-	uint8_t attr = U8((*ptr)++);
+	uint8_t attr = get_u8(ctx->ctx->bios, (*ptr)++);
 	uint32_t src, val, target;
 	SDEBUG("   switch: ");
 	src = atom_get_src(ctx, attr, ptr);
-	while (U16(*ptr) != ATOM_CASE_END)
-		if (U8(*ptr) == ATOM_CASE_MAGIC) {
+	while (get_u16(ctx->ctx->bios, *ptr) != ATOM_CASE_END)
+		if (get_u8(ctx->ctx->bios, *ptr) == ATOM_CASE_MAGIC) {
 			(*ptr)++;
 			SDEBUG("   case: ");
 			val =
 			    atom_get_src(ctx, (attr & 0x38) | ATOM_ARG_IMM,
 					 ptr);
-			target = U16(*ptr);
+			target = get_u16(ctx->ctx->bios, *ptr);
 			if (val == src) {
 				SDEBUG("   target: %04X\n", target);
 				*ptr = ctx->start + target;
@@ -1033,7 +1034,7 @@ static void atom_op_switch(atom_exec_context *ctx, int *ptr, int arg)
 
 static void atom_op_test(atom_exec_context *ctx, int *ptr, int arg)
 {
-	uint8_t attr = U8((*ptr)++);
+	uint8_t attr = get_u8(ctx->ctx->bios, (*ptr)++);
 	uint32_t dst, src;
 	SDEBUG("   src1: ");
 	dst = atom_get_dst(ctx, arg, attr, ptr, NULL, 1);
@@ -1045,7 +1046,7 @@ static void atom_op_test(atom_exec_context *ctx, int *ptr, int arg)
 
 static void atom_op_xor(atom_exec_context *ctx, int *ptr, int arg)
 {
-	uint8_t attr = U8((*ptr)++);
+	uint8_t attr = get_u8(ctx->ctx->bios, (*ptr)++);
 	uint32_t dst, src, saved;
 	int dptr = *ptr;
 	SDEBUG("   dst: ");
@@ -1059,13 +1060,13 @@ static void atom_op_xor(atom_exec_context *ctx, int *ptr, int arg)
 
 static void atom_op_debug(atom_exec_context *ctx, int *ptr, int arg)
 {
-	uint8_t val = U8((*ptr)++);
+	uint8_t val = get_u8(ctx->ctx->bios, (*ptr)++);
 	SDEBUG("DEBUG output: 0x%02X\n", val);
 }
 
 static void atom_op_processds(atom_exec_context *ctx, int *ptr, int arg)
 {
-	uint16_t val = U16(*ptr);
+	uint16_t val = get_u16(ctx->ctx->bios, *ptr);
 	(*ptr) += val + 2;
 	SDEBUG("PROCESSDS output: 0x%02X\n", val);
 }
@@ -1206,7 +1207,7 @@ static struct {
 
 static int amdgpu_atom_execute_table_locked(struct atom_context *ctx, int index, uint32_t *params)
 {
-	int base = CU16(ctx->cmd_table + 4 + 2 * index);
+	int base = get_u16(ctx->bios, ctx->cmd_table + 4 + 2 * index);
 	int len, ws, ps, ptr;
 	unsigned char op;
 	atom_exec_context ectx;
@@ -1215,9 +1216,9 @@ static int amdgpu_atom_execute_table_locked(struct atom_context *ctx, int index,
 	if (!base)
 		return -EINVAL;
 
-	len = CU16(base + ATOM_CT_SIZE_PTR);
-	ws = CU8(base + ATOM_CT_WS_PTR);
-	ps = CU8(base + ATOM_CT_PS_PTR) & ATOM_CT_PS_MASK;
+	len = get_u16(ctx->bios, base + ATOM_CT_SIZE_PTR);
+	ws = get_u8(ctx->bios, base + ATOM_CT_WS_PTR);
+	ps = get_u8(ctx->bios, base + ATOM_CT_PS_PTR) & ATOM_CT_PS_MASK;
 	ptr = base + ATOM_CT_CODE_PTR;
 
 	SDEBUG(">> execute %04X (len %d, WS %d, PS %d)\n", base, len, ws, ps);
@@ -1235,7 +1236,7 @@ static int amdgpu_atom_execute_table_locked(struct atom_context *ctx, int index,
 
 	debug_depth++;
 	while (1) {
-		op = CU8(ptr++);
+		op = get_u8(ctx->bios, ptr++);
 		if (op < ATOM_OP_NAMES_CNT)
 			SDEBUG("%s @ 0x%04X\n", atom_op_names[op], ptr - 1);
 		else
@@ -1293,11 +1294,11 @@ static void atom_index_iio(struct atom_context *ctx, int base)
 	ctx->iio = kzalloc(2 * 256, GFP_KERNEL);
 	if (!ctx->iio)
 		return;
-	while (CU8(base) == ATOM_IIO_START) {
-		ctx->iio[CU8(base + 1)] = base + 2;
+	while (get_u8(ctx->bios, base) == ATOM_IIO_START) {
+		ctx->iio[get_u8(ctx->bios, base + 1)] = base + 2;
 		base += 2;
-		while (CU8(base) != ATOM_IIO_END)
-			base += atom_iio_len[CU8(base)];
+		while (get_u8(ctx->bios, base) != ATOM_IIO_END)
+			base += atom_iio_len[get_u8(ctx->bios, base)];
 		base += 3;
 	}
 }
@@ -1472,7 +1473,7 @@ struct atom_context *amdgpu_atom_parse(struct card_info *card, void *bios)
 	ctx->card = card;
 	ctx->bios = bios;
 
-	if (CU16(0) != ATOM_BIOS_MAGIC) {
+	if (get_u16(ctx->bios, 0) != ATOM_BIOS_MAGIC) {
 		pr_info("Invalid BIOS magic\n");
 		kfree(ctx);
 		return NULL;
@@ -1485,7 +1486,7 @@ struct atom_context *amdgpu_atom_parse(struct card_info *card, void *bios)
 		return NULL;
 	}
 
-	base = CU16(ATOM_ROM_TABLE_PTR);
+	base = get_u16(ctx->bios, ATOM_ROM_TABLE_PTR);
 	if (strncmp
 	    (CSTR(base + ATOM_ROM_MAGIC_PTR), ATOM_ROM_MAGIC,
 	     strlen(ATOM_ROM_MAGIC))) {
@@ -1494,15 +1495,16 @@ struct atom_context *amdgpu_atom_parse(struct card_info *card, void *bios)
 		return NULL;
 	}
 
-	ctx->cmd_table = CU16(base + ATOM_ROM_CMD_PTR);
-	ctx->data_table = CU16(base + ATOM_ROM_DATA_PTR);
-	atom_index_iio(ctx, CU16(ctx->data_table + ATOM_DATA_IIO_PTR) + 4);
+	ctx->cmd_table = get_u16(ctx->bios, base + ATOM_ROM_CMD_PTR);
+	ctx->data_table = get_u16(ctx->bios, base + ATOM_ROM_DATA_PTR);
+	atom_index_iio(ctx,
+		       get_u16(ctx->bios, ctx->data_table + ATOM_DATA_IIO_PTR) + 4);
 	if (!ctx->iio) {
 		amdgpu_atom_destroy(ctx);
 		return NULL;
 	}
 
-	idx = CU16(ATOM_ROM_PART_NUMBER_PTR);
+	idx = get_u16(ctx->bios, ATOM_ROM_PART_NUMBER_PTR);
 	if (idx == 0)
 		idx = 0x80;
 
@@ -1533,18 +1535,18 @@ struct atom_context *amdgpu_atom_parse(struct card_info *card, void *bios)
 
 int amdgpu_atom_asic_init(struct atom_context *ctx)
 {
-	int hwi = CU16(ctx->data_table + ATOM_DATA_FWI_PTR);
+	int hwi = get_u16(ctx->bios, ctx->data_table + ATOM_DATA_FWI_PTR);
 	uint32_t ps[16];
 	int ret;
 
 	memset(ps, 0, 64);
 
-	ps[0] = cpu_to_le32(CU32(hwi + ATOM_FWI_DEFSCLK_PTR));
-	ps[1] = cpu_to_le32(CU32(hwi + ATOM_FWI_DEFMCLK_PTR));
+	ps[0] = cpu_to_le32(get_u32(ctx->bios, hwi + ATOM_FWI_DEFSCLK_PTR));
+	ps[1] = cpu_to_le32(get_u32(ctx->bios, hwi + ATOM_FWI_DEFMCLK_PTR));
 	if (!ps[0] || !ps[1])
 		return 1;
 
-	if (!CU16(ctx->cmd_table + 4 + 2 * ATOM_CMD_INIT))
+	if (!get_u16(ctx->bios, ctx->cmd_table + 4 + 2 * ATOM_CMD_INIT))
 		return 1;
 	ret = amdgpu_atom_execute_table(ctx, ATOM_CMD_INIT, ps);
 	if (ret)
@@ -1566,18 +1568,18 @@ bool amdgpu_atom_parse_data_header(struct atom_context *ctx, int index,
 			    uint16_t *data_start)
 {
 	int offset = index * 2 + 4;
-	int idx = CU16(ctx->data_table + offset);
+	int idx = get_u16(ctx->bios, ctx->data_table + offset);
 	u16 *mdt = (u16 *)(ctx->bios + ctx->data_table + 4);
 
 	if (!mdt[index])
 		return false;
 
 	if (size)
-		*size = CU16(idx);
+		*size = get_u16(ctx->bios, idx);
 	if (frev)
-		*frev = CU8(idx + 2);
+		*frev = get_u8(ctx->bios, idx + 2);
 	if (crev)
-		*crev = CU8(idx + 3);
+		*crev = get_u8(ctx->bios, idx + 3);
 	*data_start = idx;
 	return true;
 }
@@ -1586,16 +1588,16 @@ bool amdgpu_atom_parse_cmd_header(struct atom_context *ctx, int index, uint8_t *
 			   uint8_t *crev)
 {
 	int offset = index * 2 + 4;
-	int idx = CU16(ctx->cmd_table + offset);
+	int idx = get_u16(ctx->bios, ctx->cmd_table + offset);
 	u16 *mct = (u16 *)(ctx->bios + ctx->cmd_table + 4);
 
 	if (!mct[index])
 		return false;
 
 	if (frev)
-		*frev = CU8(idx + 2);
+		*frev = get_u8(ctx->bios, idx + 2);
 	if (crev)
-		*crev = CU8(idx + 3);
+		*crev = get_u8(ctx->bios, idx + 3);
 	return true;
 }
 
diff --git a/drivers/gpu/drm/amd/include/atom-bits.h b/drivers/gpu/drm/amd/include/atom-bits.h
index e8fae5c77514..28c196a91221 100644
--- a/drivers/gpu/drm/amd/include/atom-bits.h
+++ b/drivers/gpu/drm/amd/include/atom-bits.h
@@ -29,20 +29,17 @@ static inline uint8_t get_u8(void *bios, int ptr)
 {
     return ((unsigned char *)bios)[ptr];
 }
-#define U8(ptr) get_u8(ctx->ctx->bios, (ptr))
-#define CU8(ptr) get_u8(ctx->bios, (ptr))
+
 static inline uint16_t get_u16(void *bios, int ptr)
 {
     return get_u8(bios ,ptr)|(((uint16_t)get_u8(bios, ptr+1))<<8);
 }
-#define U16(ptr) get_u16(ctx->ctx->bios, (ptr))
-#define CU16(ptr) get_u16(ctx->bios, (ptr))
+
 static inline uint32_t get_u32(void *bios, int ptr)
 {
     return get_u16(bios, ptr)|(((uint32_t)get_u16(bios, ptr+2))<<16);
 }
-#define U32(ptr) get_u32(ctx->ctx->bios, (ptr))
-#define CU32(ptr) get_u32(ctx->bios, (ptr))
+
 #define CSTR(ptr) (((char *)(ctx->bios))+(ptr))
 
 #endif
diff --git a/drivers/gpu/drm/radeon/atom-bits.h b/drivers/gpu/drm/radeon/atom-bits.h
index e8fae5c77514..28c196a91221 100644
--- a/drivers/gpu/drm/radeon/atom-bits.h
+++ b/drivers/gpu/drm/radeon/atom-bits.h
@@ -29,20 +29,17 @@ static inline uint8_t get_u8(void *bios, int ptr)
 {
     return ((unsigned char *)bios)[ptr];
 }
-#define U8(ptr) get_u8(ctx->ctx->bios, (ptr))
-#define CU8(ptr) get_u8(ctx->bios, (ptr))
+
 static inline uint16_t get_u16(void *bios, int ptr)
 {
     return get_u8(bios ,ptr)|(((uint16_t)get_u8(bios, ptr+1))<<8);
 }
-#define U16(ptr) get_u16(ctx->ctx->bios, (ptr))
-#define CU16(ptr) get_u16(ctx->bios, (ptr))
+
 static inline uint32_t get_u32(void *bios, int ptr)
 {
     return get_u16(bios, ptr)|(((uint32_t)get_u16(bios, ptr+2))<<16);
 }
-#define U32(ptr) get_u32(ctx->ctx->bios, (ptr))
-#define CU32(ptr) get_u32(ctx->bios, (ptr))
+
 #define CSTR(ptr) (((char *)(ctx->bios))+(ptr))
 
 #endif
diff --git a/drivers/gpu/drm/radeon/atom.c b/drivers/gpu/drm/radeon/atom.c
index c1bbfbe28bda..1c54d52c4cb0 100644
--- a/drivers/gpu/drm/radeon/atom.c
+++ b/drivers/gpu/drm/radeon/atom.c
@@ -112,64 +112,65 @@ static uint32_t atom_iio_execute(struct atom_context *ctx, int base,
 	uint32_t temp = 0xCDCDCDCD;
 
 	while (1)
-		switch (CU8(base)) {
+		switch (get_u8(ctx->bios, base)) {
 		case ATOM_IIO_NOP:
 			base++;
 			break;
 		case ATOM_IIO_READ:
-			temp = ctx->card->ioreg_read(ctx->card, CU16(base + 1));
+			temp = ctx->card->ioreg_read(ctx->card,
+						     get_u16(ctx->bios, base + 1));
 			base += 3;
 			break;
 		case ATOM_IIO_WRITE:
 			if (rdev->family == CHIP_RV515)
-				(void)ctx->card->ioreg_read(ctx->card, CU16(base + 1));
-			ctx->card->ioreg_write(ctx->card, CU16(base + 1), temp);
+				(void)ctx->card->ioreg_read(ctx->card,
+						            get_u16(ctx->bios, base + 1));
+			ctx->card->ioreg_write(ctx->card,
+				               get_u16(ctx->bios, base + 1),
+				               temp);
 			base += 3;
 			break;
 		case ATOM_IIO_CLEAR:
 			temp &=
-			    ~((0xFFFFFFFF >> (32 - CU8(base + 1))) <<
-			      CU8(base + 2));
+			    ~((0xFFFFFFFF >> (32 - get_u8(ctx->bios, base + 1))) <<
+			      get_u8(ctx->bios, base + 2));
 			base += 3;
 			break;
 		case ATOM_IIO_SET:
 			temp |=
-			    (0xFFFFFFFF >> (32 - CU8(base + 1))) << CU8(base +
-									2);
+			    (0xFFFFFFFF >> (32 - get_u8(ctx->bios, base + 1))) << get_u8(ctx->bios,
+										         base + 2);
 			base += 3;
 			break;
 		case ATOM_IIO_MOVE_INDEX:
 			temp &=
-			    ~((0xFFFFFFFF >> (32 - CU8(base + 1))) <<
-			      CU8(base + 3));
+			    ~((0xFFFFFFFF >> (32 - get_u8(ctx->bios, base + 1))) <<
+			      get_u8(ctx->bios, base + 3));
 			temp |=
-			    ((index >> CU8(base + 2)) &
-			     (0xFFFFFFFF >> (32 - CU8(base + 1)))) << CU8(base +
-									  3);
+			    ((index >> get_u8(ctx->bios, base + 2)) &
+			     (0xFFFFFFFF >> (32 - get_u8(ctx->bios, base + 1)))) << get_u8(ctx->bios,
+										           base + 3);
 			base += 4;
 			break;
 		case ATOM_IIO_MOVE_DATA:
 			temp &=
-			    ~((0xFFFFFFFF >> (32 - CU8(base + 1))) <<
-			      CU8(base + 3));
+			    ~((0xFFFFFFFF >> (32 - get_u8(ctx->bios, base + 1))) <<
+			      get_u8(ctx->bios, base + 3));
 			temp |=
-			    ((data >> CU8(base + 2)) &
-			     (0xFFFFFFFF >> (32 - CU8(base + 1)))) << CU8(base +
-									  3);
+			    ((data >> get_u8(ctx->bios, base + 2)) &
+			     (0xFFFFFFFF >> (32 - get_u8(ctx->bios, base + 1)))) << get_u8(ctx->bios,
+										           base + 3);
 			base += 4;
 			break;
 		case ATOM_IIO_MOVE_ATTR:
 			temp &=
-			    ~((0xFFFFFFFF >> (32 - CU8(base + 1))) <<
-			      CU8(base + 3));
+			    ~((0xFFFFFFFF >> (32 - get_u8(ctx->bios, base + 1))) <<
+			      get_u8(ctx->bios, base + 3));
 			temp |=
 			    ((ctx->
-			      io_attr >> CU8(base + 2)) & (0xFFFFFFFF >> (32 -
-									  CU8
-									  (base
-									   +
-									   1))))
-			    << CU8(base + 3);
+			      io_attr >> get_u8(ctx->bios, base + 2)) & (0xFFFFFFFF >> (32 -
+										        get_u8(ctx->bios, base + 1))))
+			    << get_u8(ctx->bios, base + 3);
 			base += 4;
 			break;
 		case ATOM_IIO_END:
@@ -189,7 +190,7 @@ static uint32_t atom_get_src_int(atom_exec_context *ctx, uint8_t attr,
 	align = (attr >> 3) & 7;
 	switch (arg) {
 	case ATOM_ARG_REG:
-		idx = U16(*ptr);
+		idx = get_u16(ctx->ctx->bios, *ptr);
 		(*ptr) += 2;
 		if (print)
 			DEBUG("REG[0x%04X]", idx);
@@ -221,7 +222,7 @@ static uint32_t atom_get_src_int(atom_exec_context *ctx, uint8_t attr,
 		}
 		break;
 	case ATOM_ARG_PS:
-		idx = U8(*ptr);
+		idx = get_u8(ctx->ctx->bios, *ptr);
 		(*ptr)++;
 		/* get_unaligned_le32 avoids unaligned accesses from atombios
 		 * tables, noticed on a DEC Alpha. */
@@ -230,7 +231,7 @@ static uint32_t atom_get_src_int(atom_exec_context *ctx, uint8_t attr,
 			DEBUG("PS[0x%02X,0x%04X]", idx, val);
 		break;
 	case ATOM_ARG_WS:
-		idx = U8(*ptr);
+		idx = get_u8(ctx->ctx->bios, *ptr);
 		(*ptr)++;
 		if (print)
 			DEBUG("WS[0x%02X]", idx);
@@ -267,7 +268,7 @@ static uint32_t atom_get_src_int(atom_exec_context *ctx, uint8_t attr,
 		}
 		break;
 	case ATOM_ARG_ID:
-		idx = U16(*ptr);
+		idx = get_u16(ctx->ctx->bios, *ptr);
 		(*ptr) += 2;
 		if (print) {
 			if (gctx->data_block)
@@ -275,10 +276,10 @@ static uint32_t atom_get_src_int(atom_exec_context *ctx, uint8_t attr,
 			else
 				DEBUG("ID[0x%04X]", idx);
 		}
-		val = U32(idx + gctx->data_block);
+		val = get_u32(ctx->ctx->bios, idx + gctx->data_block);
 		break;
 	case ATOM_ARG_FB:
-		idx = U8(*ptr);
+		idx = get_u8(ctx->ctx->bios, *ptr);
 		(*ptr)++;
 		if ((gctx->fb_base + (idx * 4)) > gctx->scratch_size_bytes) {
 			DRM_ERROR("ATOM: fb read beyond scratch region: %d vs. %d\n",
@@ -292,7 +293,7 @@ static uint32_t atom_get_src_int(atom_exec_context *ctx, uint8_t attr,
 	case ATOM_ARG_IMM:
 		switch (align) {
 		case ATOM_SRC_DWORD:
-			val = U32(*ptr);
+			val = get_u32(ctx->ctx->bios, *ptr);
 			(*ptr) += 4;
 			if (print)
 				DEBUG("IMM 0x%08X\n", val);
@@ -300,7 +301,7 @@ static uint32_t atom_get_src_int(atom_exec_context *ctx, uint8_t attr,
 		case ATOM_SRC_WORD0:
 		case ATOM_SRC_WORD8:
 		case ATOM_SRC_WORD16:
-			val = U16(*ptr);
+			val = get_u16(ctx->ctx->bios, *ptr);
 			(*ptr) += 2;
 			if (print)
 				DEBUG("IMM 0x%04X\n", val);
@@ -309,7 +310,7 @@ static uint32_t atom_get_src_int(atom_exec_context *ctx, uint8_t attr,
 		case ATOM_SRC_BYTE8:
 		case ATOM_SRC_BYTE16:
 		case ATOM_SRC_BYTE24:
-			val = U8(*ptr);
+			val = get_u8(ctx->ctx->bios, *ptr);
 			(*ptr)++;
 			if (print)
 				DEBUG("IMM 0x%02X\n", val);
@@ -317,14 +318,14 @@ static uint32_t atom_get_src_int(atom_exec_context *ctx, uint8_t attr,
 		}
 		return 0;
 	case ATOM_ARG_PLL:
-		idx = U8(*ptr);
+		idx = get_u8(ctx->ctx->bios, *ptr);
 		(*ptr)++;
 		if (print)
 			DEBUG("PLL[0x%02X]", idx);
 		val = gctx->card->pll_read(gctx->card, idx);
 		break;
 	case ATOM_ARG_MC:
-		idx = U8(*ptr);
+		idx = get_u8(ctx->ctx->bios, *ptr);
 		(*ptr)++;
 		if (print)
 			DEBUG("MC[0x%02X]", idx);
@@ -412,20 +413,20 @@ static uint32_t atom_get_src_direct(atom_exec_context *ctx, uint8_t align, int *
 
 	switch (align) {
 	case ATOM_SRC_DWORD:
-		val = U32(*ptr);
+		val = get_u32(ctx->ctx->bios, *ptr);
 		(*ptr) += 4;
 		break;
 	case ATOM_SRC_WORD0:
 	case ATOM_SRC_WORD8:
 	case ATOM_SRC_WORD16:
-		val = U16(*ptr);
+		val = get_u16(ctx->ctx->bios, *ptr);
 		(*ptr) += 2;
 		break;
 	case ATOM_SRC_BYTE0:
 	case ATOM_SRC_BYTE8:
 	case ATOM_SRC_BYTE16:
 	case ATOM_SRC_BYTE24:
-		val = U8(*ptr);
+		val = get_u8(ctx->ctx->bios, *ptr);
 		(*ptr)++;
 		break;
 	}
@@ -462,7 +463,7 @@ static void atom_put_dst(atom_exec_context *ctx, int arg, uint8_t attr,
 	val |= saved;
 	switch (arg) {
 	case ATOM_ARG_REG:
-		idx = U16(*ptr);
+		idx = get_u16(ctx->ctx->bios, *ptr);
 		(*ptr) += 2;
 		DEBUG("REG[0x%04X]", idx);
 		idx += gctx->reg_block;
@@ -495,13 +496,13 @@ static void atom_put_dst(atom_exec_context *ctx, int arg, uint8_t attr,
 		}
 		break;
 	case ATOM_ARG_PS:
-		idx = U8(*ptr);
+		idx = get_u8(ctx->ctx->bios, *ptr);
 		(*ptr)++;
 		DEBUG("PS[0x%02X]", idx);
 		ctx->ps[idx] = cpu_to_le32(val);
 		break;
 	case ATOM_ARG_WS:
-		idx = U8(*ptr);
+		idx = get_u8(ctx->ctx->bios, *ptr);
 		(*ptr)++;
 		DEBUG("WS[0x%02X]", idx);
 		switch (idx) {
@@ -534,7 +535,7 @@ static void atom_put_dst(atom_exec_context *ctx, int arg, uint8_t attr,
 		}
 		break;
 	case ATOM_ARG_FB:
-		idx = U8(*ptr);
+		idx = get_u8(ctx->ctx->bios, *ptr);
 		(*ptr)++;
 		if ((gctx->fb_base + (idx * 4)) > gctx->scratch_size_bytes) {
 			DRM_ERROR("ATOM: fb write beyond scratch region: %d vs. %d\n",
@@ -544,13 +545,13 @@ static void atom_put_dst(atom_exec_context *ctx, int arg, uint8_t attr,
 		DEBUG("FB[0x%02X]", idx);
 		break;
 	case ATOM_ARG_PLL:
-		idx = U8(*ptr);
+		idx = get_u8(ctx->ctx->bios, *ptr);
 		(*ptr)++;
 		DEBUG("PLL[0x%02X]", idx);
 		gctx->card->pll_write(gctx->card, idx, val);
 		break;
 	case ATOM_ARG_MC:
-		idx = U8(*ptr);
+		idx = get_u8(ctx->ctx->bios, *ptr);
 		(*ptr)++;
 		DEBUG("MC[0x%02X]", idx);
 		gctx->card->mc_write(gctx->card, idx, val);
@@ -586,7 +587,7 @@ static void atom_put_dst(atom_exec_context *ctx, int arg, uint8_t attr,
 
 static void atom_op_add(atom_exec_context *ctx, int *ptr, int arg)
 {
-	uint8_t attr = U8((*ptr)++);
+	uint8_t attr = get_u8(ctx->ctx->bios, (*ptr)++);
 	uint32_t dst, src, saved;
 	int dptr = *ptr;
 	SDEBUG("   dst: ");
@@ -600,7 +601,7 @@ static void atom_op_add(atom_exec_context *ctx, int *ptr, int arg)
 
 static void atom_op_and(atom_exec_context *ctx, int *ptr, int arg)
 {
-	uint8_t attr = U8((*ptr)++);
+	uint8_t attr = get_u8(ctx->ctx->bios, (*ptr)++);
 	uint32_t dst, src, saved;
 	int dptr = *ptr;
 	SDEBUG("   dst: ");
@@ -619,14 +620,14 @@ static void atom_op_beep(atom_exec_context *ctx, int *ptr, int arg)
 
 static void atom_op_calltable(atom_exec_context *ctx, int *ptr, int arg)
 {
-	int idx = U8((*ptr)++);
+	int idx = get_u8(ctx->ctx->bios, (*ptr)++);
 	int r = 0;
 
 	if (idx < ATOM_TABLE_NAMES_CNT)
 		SDEBUG("   table: %d (%s)\n", idx, atom_table_names[idx]);
 	else
 		SDEBUG("   table: %d\n", idx);
-	if (U16(ctx->ctx->cmd_table + 4 + 2 * idx))
+	if (get_u16(ctx->ctx->bios, ctx->ctx->cmd_table + 4 + 2 * idx))
 		r = atom_execute_table_locked(ctx->ctx, idx, ctx->ps + ctx->ps_shift);
 	if (r) {
 		ctx->abort = true;
@@ -635,7 +636,7 @@ static void atom_op_calltable(atom_exec_context *ctx, int *ptr, int arg)
 
 static void atom_op_clear(atom_exec_context *ctx, int *ptr, int arg)
 {
-	uint8_t attr = U8((*ptr)++);
+	uint8_t attr = get_u8(ctx->ctx->bios, (*ptr)++);
 	uint32_t saved;
 	int dptr = *ptr;
 	attr &= 0x38;
@@ -647,7 +648,7 @@ static void atom_op_clear(atom_exec_context *ctx, int *ptr, int arg)
 
 static void atom_op_compare(atom_exec_context *ctx, int *ptr, int arg)
 {
-	uint8_t attr = U8((*ptr)++);
+	uint8_t attr = get_u8(ctx->ctx->bios, (*ptr)++);
 	uint32_t dst, src;
 	SDEBUG("   src1: ");
 	dst = atom_get_dst(ctx, arg, attr, ptr, NULL, 1);
@@ -661,7 +662,7 @@ static void atom_op_compare(atom_exec_context *ctx, int *ptr, int arg)
 
 static void atom_op_delay(atom_exec_context *ctx, int *ptr, int arg)
 {
-	unsigned count = U8((*ptr)++);
+	unsigned count = get_u8(ctx->ctx->bios, (*ptr)++);
 	SDEBUG("   count: %d\n", count);
 	if (arg == ATOM_UNIT_MICROSEC)
 		udelay(count);
@@ -673,7 +674,7 @@ static void atom_op_delay(atom_exec_context *ctx, int *ptr, int arg)
 
 static void atom_op_div(atom_exec_context *ctx, int *ptr, int arg)
 {
-	uint8_t attr = U8((*ptr)++);
+	uint8_t attr = get_u8(ctx->ctx->bios, (*ptr)++);
 	uint32_t dst, src;
 	SDEBUG("   src1: ");
 	dst = atom_get_dst(ctx, arg, attr, ptr, NULL, 1);
@@ -695,7 +696,7 @@ static void atom_op_eot(atom_exec_context *ctx, int *ptr, int arg)
 
 static void atom_op_jump(atom_exec_context *ctx, int *ptr, int arg)
 {
-	int execute = 0, target = U16(*ptr);
+	int execute = 0, target = get_u16(ctx->ctx->bios, *ptr);
 	unsigned long cjiffies;
 
 	(*ptr) += 2;
@@ -748,7 +749,7 @@ static void atom_op_jump(atom_exec_context *ctx, int *ptr, int arg)
 
 static void atom_op_mask(atom_exec_context *ctx, int *ptr, int arg)
 {
-	uint8_t attr = U8((*ptr)++);
+	uint8_t attr = get_u8(ctx->ctx->bios, (*ptr)++);
 	uint32_t dst, mask, src, saved;
 	int dptr = *ptr;
 	SDEBUG("   dst: ");
@@ -765,7 +766,7 @@ static void atom_op_mask(atom_exec_context *ctx, int *ptr, int arg)
 
 static void atom_op_move(atom_exec_context *ctx, int *ptr, int arg)
 {
-	uint8_t attr = U8((*ptr)++);
+	uint8_t attr = get_u8(ctx->ctx->bios, (*ptr)++);
 	uint32_t src, saved;
 	int dptr = *ptr;
 	if (((attr >> 3) & 7) != ATOM_SRC_DWORD)
@@ -782,7 +783,7 @@ static void atom_op_move(atom_exec_context *ctx, int *ptr, int arg)
 
 static void atom_op_mul(atom_exec_context *ctx, int *ptr, int arg)
 {
-	uint8_t attr = U8((*ptr)++);
+	uint8_t attr = get_u8(ctx->ctx->bios, (*ptr)++);
 	uint32_t dst, src;
 	SDEBUG("   src1: ");
 	dst = atom_get_dst(ctx, arg, attr, ptr, NULL, 1);
@@ -798,7 +799,7 @@ static void atom_op_nop(atom_exec_context *ctx, int *ptr, int arg)
 
 static void atom_op_or(atom_exec_context *ctx, int *ptr, int arg)
 {
-	uint8_t attr = U8((*ptr)++);
+	uint8_t attr = get_u8(ctx->ctx->bios, (*ptr)++);
 	uint32_t dst, src, saved;
 	int dptr = *ptr;
 	SDEBUG("   dst: ");
@@ -812,7 +813,7 @@ static void atom_op_or(atom_exec_context *ctx, int *ptr, int arg)
 
 static void atom_op_postcard(atom_exec_context *ctx, int *ptr, int arg)
 {
-	uint8_t val = U8((*ptr)++);
+	uint8_t val = get_u8(ctx->ctx->bios, (*ptr)++);
 	SDEBUG("POST card output: 0x%02X\n", val);
 }
 
@@ -833,7 +834,7 @@ static void atom_op_savereg(atom_exec_context *ctx, int *ptr, int arg)
 
 static void atom_op_setdatablock(atom_exec_context *ctx, int *ptr, int arg)
 {
-	int idx = U8(*ptr);
+	int idx = get_u8(ctx->ctx->bios, *ptr);
 	(*ptr)++;
 	SDEBUG("   block: %d\n", idx);
 	if (!idx)
@@ -841,13 +842,14 @@ static void atom_op_setdatablock(atom_exec_context *ctx, int *ptr, int arg)
 	else if (idx == 255)
 		ctx->ctx->data_block = ctx->start;
 	else
-		ctx->ctx->data_block = U16(ctx->ctx->data_table + 4 + 2 * idx);
+		ctx->ctx->data_block = get_u16(ctx->ctx->bios,
+					       ctx->ctx->data_table + 4 + 2 * idx);
 	SDEBUG("   base: 0x%04X\n", ctx->ctx->data_block);
 }
 
 static void atom_op_setfbbase(atom_exec_context *ctx, int *ptr, int arg)
 {
-	uint8_t attr = U8((*ptr)++);
+	uint8_t attr = get_u8(ctx->ctx->bios, (*ptr)++);
 	SDEBUG("   fb_base: ");
 	ctx->ctx->fb_base = atom_get_src(ctx, attr, ptr);
 }
@@ -857,7 +859,7 @@ static void atom_op_setport(atom_exec_context *ctx, int *ptr, int arg)
 	int port;
 	switch (arg) {
 	case ATOM_PORT_ATI:
-		port = U16(*ptr);
+		port = get_u16(ctx->ctx->bios, *ptr);
 		if (port < ATOM_IO_NAMES_CNT)
 			SDEBUG("   port: %d (%s)\n", port, atom_io_names[port]);
 		else
@@ -881,14 +883,14 @@ static void atom_op_setport(atom_exec_context *ctx, int *ptr, int arg)
 
 static void atom_op_setregblock(atom_exec_context *ctx, int *ptr, int arg)
 {
-	ctx->ctx->reg_block = U16(*ptr);
+	ctx->ctx->reg_block = get_u16(ctx->ctx->bios, *ptr);
 	(*ptr) += 2;
 	SDEBUG("   base: 0x%04X\n", ctx->ctx->reg_block);
 }
 
 static void atom_op_shift_left(atom_exec_context *ctx, int *ptr, int arg)
 {
-	uint8_t attr = U8((*ptr)++), shift;
+	uint8_t attr = get_u8(ctx->ctx->bios, (*ptr)++), shift;
 	uint32_t saved, dst;
 	int dptr = *ptr;
 	attr &= 0x38;
@@ -904,7 +906,7 @@ static void atom_op_shift_left(atom_exec_context *ctx, int *ptr, int arg)
 
 static void atom_op_shift_right(atom_exec_context *ctx, int *ptr, int arg)
 {
-	uint8_t attr = U8((*ptr)++), shift;
+	uint8_t attr = get_u8(ctx->ctx->bios, (*ptr)++), shift;
 	uint32_t saved, dst;
 	int dptr = *ptr;
 	attr &= 0x38;
@@ -920,7 +922,7 @@ static void atom_op_shift_right(atom_exec_context *ctx, int *ptr, int arg)
 
 static void atom_op_shl(atom_exec_context *ctx, int *ptr, int arg)
 {
-	uint8_t attr = U8((*ptr)++), shift;
+	uint8_t attr = get_u8(ctx->ctx->bios, (*ptr)++), shift;
 	uint32_t saved, dst;
 	int dptr = *ptr;
 	uint32_t dst_align = atom_dst_to_src[(attr >> 3) & 7][(attr >> 6) & 3];
@@ -939,7 +941,7 @@ static void atom_op_shl(atom_exec_context *ctx, int *ptr, int arg)
 
 static void atom_op_shr(atom_exec_context *ctx, int *ptr, int arg)
 {
-	uint8_t attr = U8((*ptr)++), shift;
+	uint8_t attr = get_u8(ctx->ctx->bios, (*ptr)++), shift;
 	uint32_t saved, dst;
 	int dptr = *ptr;
 	uint32_t dst_align = atom_dst_to_src[(attr >> 3) & 7][(attr >> 6) & 3];
@@ -958,7 +960,7 @@ static void atom_op_shr(atom_exec_context *ctx, int *ptr, int arg)
 
 static void atom_op_sub(atom_exec_context *ctx, int *ptr, int arg)
 {
-	uint8_t attr = U8((*ptr)++);
+	uint8_t attr = get_u8(ctx->ctx->bios, (*ptr)++);
 	uint32_t dst, src, saved;
 	int dptr = *ptr;
 	SDEBUG("   dst: ");
@@ -972,18 +974,18 @@ static void atom_op_sub(atom_exec_context *ctx, int *ptr, int arg)
 
 static void atom_op_switch(atom_exec_context *ctx, int *ptr, int arg)
 {
-	uint8_t attr = U8((*ptr)++);
+	uint8_t attr = get_u8(ctx->ctx->bios, (*ptr)++);
 	uint32_t src, val, target;
 	SDEBUG("   switch: ");
 	src = atom_get_src(ctx, attr, ptr);
-	while (U16(*ptr) != ATOM_CASE_END)
-		if (U8(*ptr) == ATOM_CASE_MAGIC) {
+	while (get_u16(ctx->ctx->bios, *ptr) != ATOM_CASE_END)
+		if (get_u8(ctx->ctx->bios, *ptr) == ATOM_CASE_MAGIC) {
 			(*ptr)++;
 			SDEBUG("   case: ");
 			val =
 			    atom_get_src(ctx, (attr & 0x38) | ATOM_ARG_IMM,
 					 ptr);
-			target = U16(*ptr);
+			target = get_u16(ctx->ctx->bios, *ptr);
 			if (val == src) {
 				SDEBUG("   target: %04X\n", target);
 				*ptr = ctx->start + target;
@@ -999,7 +1001,7 @@ static void atom_op_switch(atom_exec_context *ctx, int *ptr, int arg)
 
 static void atom_op_test(atom_exec_context *ctx, int *ptr, int arg)
 {
-	uint8_t attr = U8((*ptr)++);
+	uint8_t attr = get_u8(ctx->ctx->bios, (*ptr)++);
 	uint32_t dst, src;
 	SDEBUG("   src1: ");
 	dst = atom_get_dst(ctx, arg, attr, ptr, NULL, 1);
@@ -1011,7 +1013,7 @@ static void atom_op_test(atom_exec_context *ctx, int *ptr, int arg)
 
 static void atom_op_xor(atom_exec_context *ctx, int *ptr, int arg)
 {
-	uint8_t attr = U8((*ptr)++);
+	uint8_t attr = get_u8(ctx->ctx->bios, (*ptr)++);
 	uint32_t dst, src, saved;
 	int dptr = *ptr;
 	SDEBUG("   dst: ");
@@ -1158,7 +1160,7 @@ atom_op_debug, 0},};
 
 static int atom_execute_table_locked(struct atom_context *ctx, int index, uint32_t * params)
 {
-	int base = CU16(ctx->cmd_table + 4 + 2 * index);
+	int base = get_u16(ctx->bios, ctx->cmd_table + 4 + 2 * index);
 	int len, ws, ps, ptr;
 	unsigned char op;
 	atom_exec_context ectx;
@@ -1167,9 +1169,9 @@ static int atom_execute_table_locked(struct atom_context *ctx, int index, uint32
 	if (!base)
 		return -EINVAL;
 
-	len = CU16(base + ATOM_CT_SIZE_PTR);
-	ws = CU8(base + ATOM_CT_WS_PTR);
-	ps = CU8(base + ATOM_CT_PS_PTR) & ATOM_CT_PS_MASK;
+	len = get_u16(ctx->bios, base + ATOM_CT_SIZE_PTR);
+	ws = get_u8(ctx->bios, base + ATOM_CT_WS_PTR);
+	ps = get_u8(ctx->bios, base + ATOM_CT_PS_PTR) & ATOM_CT_PS_MASK;
 	ptr = base + ATOM_CT_CODE_PTR;
 
 	SDEBUG(">> execute %04X (len %d, WS %d, PS %d)\n", base, len, ws, ps);
@@ -1187,7 +1189,7 @@ static int atom_execute_table_locked(struct atom_context *ctx, int index, uint32
 
 	debug_depth++;
 	while (1) {
-		op = CU8(ptr++);
+		op = get_u8(ctx->bios, ptr++);
 		if (op < ATOM_OP_NAMES_CNT)
 			SDEBUG("%s @ 0x%04X\n", atom_op_names[op], ptr - 1);
 		else
@@ -1253,11 +1255,11 @@ static void atom_index_iio(struct atom_context *ctx, int base)
 	ctx->iio = kzalloc(2 * 256, GFP_KERNEL);
 	if (!ctx->iio)
 		return;
-	while (CU8(base) == ATOM_IIO_START) {
-		ctx->iio[CU8(base + 1)] = base + 2;
+	while (get_u8(ctx->bios, base) == ATOM_IIO_START) {
+		ctx->iio[get_u8(ctx->bios, base + 1)] = base + 2;
 		base += 2;
-		while (CU8(base) != ATOM_IIO_END)
-			base += atom_iio_len[CU8(base)];
+		while (get_u8(ctx->bios, base) != ATOM_IIO_END)
+			base += atom_iio_len[get_u8(ctx->bios, base)];
 		base += 3;
 	}
 }
@@ -1277,7 +1279,7 @@ struct atom_context *atom_parse(struct card_info *card, void *bios)
 	ctx->card = card;
 	ctx->bios = bios;
 
-	if (CU16(0) != ATOM_BIOS_MAGIC) {
+	if (get_u16(ctx->bios, 0) != ATOM_BIOS_MAGIC) {
 		pr_info("Invalid BIOS magic\n");
 		kfree(ctx);
 		return NULL;
@@ -1290,7 +1292,7 @@ struct atom_context *atom_parse(struct card_info *card, void *bios)
 		return NULL;
 	}
 
-	base = CU16(ATOM_ROM_TABLE_PTR);
+	base = get_u16(ctx->bios, ATOM_ROM_TABLE_PTR);
 	if (strncmp
 	    (CSTR(base + ATOM_ROM_MAGIC_PTR), ATOM_ROM_MAGIC,
 	     strlen(ATOM_ROM_MAGIC))) {
@@ -1299,15 +1301,16 @@ struct atom_context *atom_parse(struct card_info *card, void *bios)
 		return NULL;
 	}
 
-	ctx->cmd_table = CU16(base + ATOM_ROM_CMD_PTR);
-	ctx->data_table = CU16(base + ATOM_ROM_DATA_PTR);
-	atom_index_iio(ctx, CU16(ctx->data_table + ATOM_DATA_IIO_PTR) + 4);
+	ctx->cmd_table = get_u16(ctx->bios, base + ATOM_ROM_CMD_PTR);
+	ctx->data_table = get_u16(ctx->bios, base + ATOM_ROM_DATA_PTR);
+	atom_index_iio(ctx,
+		       get_u16(ctx->bios, ctx->data_table + ATOM_DATA_IIO_PTR) + 4);
 	if (!ctx->iio) {
 		atom_destroy(ctx);
 		return NULL;
 	}
 
-	str = CSTR(CU16(base + ATOM_ROM_MSG_PTR));
+	str = CSTR(get_u16(ctx->bios, base + ATOM_ROM_MSG_PTR));
 	while (*str && ((*str == '\n') || (*str == '\r')))
 		str++;
 	/* name string isn't always 0 terminated */
@@ -1326,18 +1329,18 @@ struct atom_context *atom_parse(struct card_info *card, void *bios)
 int atom_asic_init(struct atom_context *ctx)
 {
 	struct radeon_device *rdev = ctx->card->dev->dev_private;
-	int hwi = CU16(ctx->data_table + ATOM_DATA_FWI_PTR);
+	int hwi = get_u16(ctx->bios, ctx->data_table + ATOM_DATA_FWI_PTR);
 	uint32_t ps[16];
 	int ret;
 
 	memset(ps, 0, 64);
 
-	ps[0] = cpu_to_le32(CU32(hwi + ATOM_FWI_DEFSCLK_PTR));
-	ps[1] = cpu_to_le32(CU32(hwi + ATOM_FWI_DEFMCLK_PTR));
+	ps[0] = cpu_to_le32(get_u32(ctx->bios, hwi + ATOM_FWI_DEFSCLK_PTR));
+	ps[1] = cpu_to_le32(get_u32(ctx->bios, hwi + ATOM_FWI_DEFMCLK_PTR));
 	if (!ps[0] || !ps[1])
 		return 1;
 
-	if (!CU16(ctx->cmd_table + 4 + 2 * ATOM_CMD_INIT))
+	if (!get_u16(ctx->bios, ctx->cmd_table + 4 + 2 * ATOM_CMD_INIT))
 		return 1;
 	ret = atom_execute_table(ctx, ATOM_CMD_INIT, ps);
 	if (ret)
@@ -1346,7 +1349,7 @@ int atom_asic_init(struct atom_context *ctx)
 	memset(ps, 0, 64);
 
 	if (rdev->family < CHIP_R600) {
-		if (CU16(ctx->cmd_table + 4 + 2 * ATOM_CMD_SPDFANCNTL))
+		if (get_u16(ctx->bios, ctx->cmd_table + 4 + 2 * ATOM_CMD_SPDFANCNTL))
 			atom_execute_table(ctx, ATOM_CMD_SPDFANCNTL, ps);
 	}
 	return ret;
@@ -1363,18 +1366,18 @@ bool atom_parse_data_header(struct atom_context *ctx, int index,
 			    uint16_t * data_start)
 {
 	int offset = index * 2 + 4;
-	int idx = CU16(ctx->data_table + offset);
+	int idx = get_u16(ctx->bios, ctx->data_table + offset);
 	u16 *mdt = (u16 *)(ctx->bios + ctx->data_table + 4);
 
 	if (!mdt[index])
 		return false;
 
 	if (size)
-		*size = CU16(idx);
+		*size = get_u16(ctx->bios, idx);
 	if (frev)
-		*frev = CU8(idx + 2);
+		*frev = get_u8(ctx->bios, idx + 2);
 	if (crev)
-		*crev = CU8(idx + 3);
+		*crev = get_u8(ctx->bios, idx + 3);
 	*data_start = idx;
 	return true;
 }
@@ -1383,16 +1386,16 @@ bool atom_parse_cmd_header(struct atom_context *ctx, int index, uint8_t * frev,
 			   uint8_t * crev)
 {
 	int offset = index * 2 + 4;
-	int idx = CU16(ctx->cmd_table + offset);
+	int idx = get_u16(ctx->bios, ctx->cmd_table + offset);
 	u16 *mct = (u16 *)(ctx->bios + ctx->cmd_table + 4);
 
 	if (!mct[index])
 		return false;
 
 	if (frev)
-		*frev = CU8(idx + 2);
+		*frev = get_u8(ctx->bios, idx + 2);
 	if (crev)
-		*crev = CU8(idx + 3);
+		*crev = get_u8(ctx->bios, idx + 3);
 	return true;
 }
 
-- 
2.40.1
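
For readers without atom-bits.h at hand: the get_u8()/get_u16()/get_u32()
helpers that the removed U8()/U16()/U32() and CU8()/CU16()/CU32() wrappers
expanded to read the BIOS image one byte at a time, little-endian. Below is a
minimal userspace sketch, assuming that behaviour. The names mirror the patch,
but the bodies are an illustration and not a copy of the header.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Illustrative stand-ins for get_u8()/get_u16()/get_u32().
 * Assumption: values are assembled little-endian from single bytes,
 * so the offset into the image does not have to be aligned.
 */
static inline uint8_t get_u8(const void *bios, int ptr)
{
	return ((const unsigned char *)bios)[ptr];
}

static inline uint16_t get_u16(const void *bios, int ptr)
{
	/* low byte first, high byte shifted into bits 15:8 */
	return get_u8(bios, ptr) | ((uint16_t)get_u8(bios, ptr + 1) << 8);
}

static inline uint32_t get_u32(const void *bios, int ptr)
{
	/* low half-word first, high half-word shifted into bits 31:16 */
	return get_u16(bios, ptr) | ((uint32_t)get_u16(bios, ptr + 2) << 16);
}
```

With an image that starts with the bytes 0x55 0xaa, get_u16(image, 0) in this
sketch returns 0xaa55. Passing the image explicitly, as the patch does, is what
lets the macro names U8/U16/U32 go away.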


* [PATCH 2/3] linux/bits.h: Add fixed-width GENMASK and BIT macros
  2023-05-09  5:14 [PATCH 0/3] Fixed-width mask/bit helpers Lucas De Marchi
  2023-05-09  5:14 ` [PATCH 1/3] drm/amd: Remove wrapper macros over get_u{32,16,8} Lucas De Marchi
@ 2023-05-09  5:14 ` Lucas De Marchi
  2023-05-09 14:00   ` [Intel-xe] " Gustavo Sousa
                     ` (3 more replies)
  2023-05-09  5:14 ` [PATCH 3/3] drm/i915: Temporary conversion to new GENMASK/BIT macros Lucas De Marchi
  2 siblings, 4 replies; 30+ messages in thread
From: Lucas De Marchi @ 2023-05-09  5:14 UTC (permalink / raw)
  To: intel-gfx, intel-xe, dri-devel
  Cc: Masahiro Yamada, Kevin Brodsky, Lucas De Marchi, linux-kernel,
	Christian König, Alex Deucher, Thomas Gleixner,
	Andy Shevchenko, Andrew Morton

Add GENMASK_U32(), GENMASK_U16() and GENMASK_U8() macros to create
masks for fixed-width types and also the corresponding BIT_U32(),
BIT_U16() and BIT_U8().

All of those depend on a new "U" suffix added to the integer constant.
Due to naming clashes it's better to call the macro U32. Since C doesn't
have a proper suffix for short and char types, the U16 and U8 variants
just use U32 with one additional check in the BIT_* macros to make
sure the compiler gives an error when those types overflow.
BIT_U16() and BIT_U8() need the help of GENMASK_INPUT_CHECK(),
as otherwise they would allow an invalid bit to be passed. Hence
implement them in include/linux/bits.h rather than together with
the other BIT* variants.

The following test file is used to test this:

	$ cat mask.c
	#include <linux/types.h>
	#include <linux/bits.h>

	static const u32 a = GENMASK_U32(31, 0);
	static const u16 b = GENMASK_U16(15, 0);
	static const u8 c = GENMASK_U8(7, 0);
	static const u32 x = BIT_U32(31);
	static const u16 y = BIT_U16(15);
	static const u8 z = BIT_U8(7);

	#if FAIL
	static const u32 a2 = GENMASK_U32(32, 0);
	static const u16 b2 = GENMASK_U16(16, 0);
	static const u8 c2 = GENMASK_U8(8, 0);
	static const u32 x2 = BIT_U32(32);
	static const u16 y2 = BIT_U16(16);
	static const u8 z2 = BIT_U8(8);
	#endif

Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
---
 include/linux/bits.h       | 22 ++++++++++++++++++++++
 include/uapi/linux/const.h |  2 ++
 include/vdso/const.h       |  1 +
 3 files changed, 25 insertions(+)
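
The mask arithmetic described in the commit message can be tried outside the
kernel tree. What follows is a sketch under stated assumptions: U32() is
replaced by a local macro that pastes a plain "U" suffix, the MASK_* names are
hypothetical, and GENMASK_INPUT_CHECK() is left out because it needs kernel
headers. Only the expressions themselves are taken from the diff.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-in for the kernel's U32(); assumption: plain "U" suffix. */
#define U32(x)	(x##U)

/*
 * Same arithmetic as __GENMASK_U32/U16/U8 in the patch: set every bit
 * from l upwards, then AND with every bit from h downwards.
 */
#define MASK_U32(h, l) \
	(((~U32(0)) - (U32(1) << (l)) + 1) & (~U32(0) >> (32 - 1 - (h))))
#define MASK_U16(h, l) \
	((U32(0xffff) - (U32(1) << (l)) + 1) & (U32(0xffff) >> (16 - 1 - (h))))
#define MASK_U8(h, l) \
	((U32(0xff) - (U32(1) << (l)) + 1) & (U32(0xff) >> (8 - 1 - (h))))

/* In-range inputs are integer constant expressions, so they can be
 * checked at build time. */
_Static_assert(MASK_U32(31, 0) == 0xffffffffU, "full u32 mask");
_Static_assert(MASK_U32(7, 4) == 0xf0U, "u32 nibble");
_Static_assert(MASK_U16(15, 0) == 0xffffU, "full u16 mask");
_Static_assert(MASK_U16(11, 8) == 0x0f00U, "u16 nibble");
_Static_assert(MASK_U8(7, 0) == 0xffU, "full u8 mask");
_Static_assert(MASK_U8(5, 2) == 0x3cU, "u8 middle bits");
```

A high bit beyond the width, such as MASK_U16(16, 0), makes the right-shift
count negative, which is the overflow case the commit message relies on the
compiler to flag.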

diff --git a/include/linux/bits.h b/include/linux/bits.h
index 7c0cf5031abe..ff4786c99b8c 100644
--- a/include/linux/bits.h
+++ b/include/linux/bits.h
@@ -42,4 +42,26 @@
 #define GENMASK_ULL(h, l) \
 	(GENMASK_INPUT_CHECK(h, l) + __GENMASK_ULL(h, l))
 
+#define __GENMASK_U32(h, l) \
+	(((~U32(0)) - (U32(1) << (l)) + 1) & \
+	 (~U32(0) >> (32 - 1 - (h))))
+#define GENMASK_U32(h, l) \
+	(GENMASK_INPUT_CHECK(h, l) + __GENMASK_U32(h, l))
+
+#define __GENMASK_U16(h, l) \
+	((U32(0xffff) - (U32(1) << (l)) + 1) & \
+	 (U32(0xffff) >> (16 - 1 - (h))))
+#define GENMASK_U16(h, l) \
+	(GENMASK_INPUT_CHECK(h, l) + __GENMASK_U16(h, l))
+
+#define __GENMASK_U8(h, l) \
+	(((U32(0xff)) - (U32(1) << (l)) + 1) & \
+	 (U32(0xff) >> (8 - 1 - (h))))
+#define GENMASK_U8(h, l) \
+	(GENMASK_INPUT_CHECK(h, l) + __GENMASK_U8(h, l))
+
+#define BIT_U32(nr)	_BITU32(nr)
+#define BIT_U16(nr)	(GENMASK_INPUT_CHECK(16 - 1, nr) + (U32(1) << (nr)))
+#define BIT_U8(nr)	(GENMASK_INPUT_CHECK(8 - 1, nr) + (U32(1) << (nr)))
+
 #endif	/* __LINUX_BITS_H */
diff --git a/include/uapi/linux/const.h b/include/uapi/linux/const.h
index a429381e7ca5..3a4e152520f4 100644
--- a/include/uapi/linux/const.h
+++ b/include/uapi/linux/const.h
@@ -22,9 +22,11 @@
 #define _AT(T,X)	((T)(X))
 #endif
 
+#define _U32(x)		(_AC(x, U))
 #define _UL(x)		(_AC(x, UL))
 #define _ULL(x)		(_AC(x, ULL))
 
+#define _BITU32(x)	(_U32(1) << (x))
 #define _BITUL(x)	(_UL(1) << (x))
 #define _BITULL(x)	(_ULL(1) << (x))
 
diff --git a/include/vdso/const.h b/include/vdso/const.h
index 94b385ad438d..417384a9795b 100644
--- a/include/vdso/const.h
+++ b/include/vdso/const.h
@@ -4,6 +4,7 @@
 
 #include <uapi/linux/const.h>
 
+#define U32(x)		(_U32(x))
 #define UL(x)		(_UL(x))
 #define ULL(x)		(_ULL(x))
 
-- 
2.40.1
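
The U32() added to include/vdso/const.h above bottoms out in _AC(), which
pastes the suffix onto the constant. A short userspace sketch of that chain
follows. The __AC()/_AC() definitions are reproduced here as an assumption
about the C (non-assembly) branch of include/uapi/linux/const.h. The other
three macros are the ones from the diff.

```c
#include <assert.h>

/*
 * Assumed C-side definitions of __AC()/_AC(): two levels, so that the
 * argument is macro-expanded before the suffix is pasted on.
 */
#define __AC(X, Y)	(X##Y)
#define _AC(X, Y)	__AC(X, Y)

/* From the patch. */
#define _U32(x)		(_AC(x, U))
#define _BITU32(x)	(_U32(1) << (x))
#define U32(x)		(_U32(x))

/* U32(1) ends up as 1U, so the result has type unsigned int. */
_Static_assert(sizeof(U32(1)) == sizeof(unsigned int), "U32() is unsigned int");
_Static_assert(_BITU32(31) == 0x80000000U, "bit 31 fits without overflow");
```

Because the constant is unsigned, shifting into bit 31 is well defined, which
would not hold for a plain signed 1.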


* [PATCH 3/3] drm/i915: Temporary conversion to new GENMASK/BIT macros
  2023-05-09  5:14 [PATCH 0/3] Fixed-width mask/bit helpers Lucas De Marchi
  2023-05-09  5:14 ` [PATCH 1/3] drm/amd: Remove wrapper macros over get_u{32,16,8} Lucas De Marchi
  2023-05-09  5:14 ` [PATCH 2/3] linux/bits.h: Add fixed-width GENMASK and BIT macros Lucas De Marchi
@ 2023-05-09  5:14 ` Lucas De Marchi
  2023-05-09  7:57   ` Jani Nikula
  2 siblings, 1 reply; 30+ messages in thread
From: Lucas De Marchi @ 2023-05-09  5:14 UTC (permalink / raw)
  To: intel-gfx, intel-xe, dri-devel
  Cc: Masahiro Yamada, Kevin Brodsky, Lucas De Marchi, linux-kernel,
	Christian König, Alex Deucher, Thomas Gleixner,
	Andy Shevchenko, Andrew Morton

Convert the REG_* macros from i915_reg_defs.h to use the new macros
defined in linux/bits.h. This is just to help on the implementation
of the new macros and not intended to be applied.

Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
---
 drivers/gpu/drm/i915/i915_reg_defs.h | 28 +++++-----------------------
 1 file changed, 5 insertions(+), 23 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_reg_defs.h b/drivers/gpu/drm/i915/i915_reg_defs.h
index 622d603080f9..61fbb8d62b25 100644
--- a/drivers/gpu/drm/i915/i915_reg_defs.h
+++ b/drivers/gpu/drm/i915/i915_reg_defs.h
@@ -17,10 +17,7 @@
  *
  * @return: Value with bit @__n set.
  */
-#define REG_BIT(__n)							\
-	((u32)(BIT(__n) +						\
-	       BUILD_BUG_ON_ZERO(__is_constexpr(__n) &&		\
-				 ((__n) < 0 || (__n) > 31))))
+#define REG_BIT(__n) BIT_U32(__n)
 
 /**
  * REG_BIT8() - Prepare a u8 bit value
@@ -30,10 +27,7 @@
  *
  * @return: Value with bit @__n set.
  */
-#define REG_BIT8(__n)                                                   \
-	((u8)(BIT(__n) +                                                \
-	       BUILD_BUG_ON_ZERO(__is_constexpr(__n) &&         \
-				 ((__n) < 0 || (__n) > 7))))
+#define REG_BIT8(__n) BIT_U8(__n)
 
 /**
  * REG_GENMASK() - Prepare a continuous u32 bitmask
@@ -44,11 +38,7 @@
  *
  * @return: Continuous bitmask from @__high to @__low, inclusive.
  */
-#define REG_GENMASK(__high, __low)					\
-	((u32)(GENMASK(__high, __low) +					\
-	       BUILD_BUG_ON_ZERO(__is_constexpr(__high) &&	\
-				 __is_constexpr(__low) &&		\
-				 ((__low) < 0 || (__high) > 31 || (__low) > (__high)))))
+#define REG_GENMASK(__high, __low) GENMASK_U32(__high, __low)
 
 /**
  * REG_GENMASK64() - Prepare a continuous u64 bitmask
@@ -59,11 +49,7 @@
  *
  * @return: Continuous bitmask from @__high to @__low, inclusive.
  */
-#define REG_GENMASK64(__high, __low)					\
-	((u64)(GENMASK_ULL(__high, __low) +				\
-	       BUILD_BUG_ON_ZERO(__is_constexpr(__high) &&		\
-				 __is_constexpr(__low) &&		\
-				 ((__low) < 0 || (__high) > 63 || (__low) > (__high)))))
+#define REG_GENMASK64(__high, __low) GENMASK_ULL(__high, __low)
 
 /**
  * REG_GENMASK8() - Prepare a continuous u8 bitmask
@@ -74,11 +60,7 @@
  *
  * @return: Continuous bitmask from @__high to @__low, inclusive.
  */
-#define REG_GENMASK8(__high, __low)                                     \
-	((u8)(GENMASK(__high, __low) +                                  \
-	       BUILD_BUG_ON_ZERO(__is_constexpr(__high) &&      \
-				 __is_constexpr(__low) &&               \
-				 ((__low) < 0 || (__high) > 7 || (__low) > (__high)))))
+#define REG_GENMASK8(__high, __low) GENMASK_U8(__high, __low)
 
 /*
  * Local integer constant expression version of is_power_of_2().
-- 
2.40.1


* Re: [PATCH 3/3] drm/i915: Temporary conversion to new GENMASK/BIT macros
  2023-05-09  5:14 ` [PATCH 3/3] drm/i915: Temporary conversion to new GENMASK/BIT macros Lucas De Marchi
@ 2023-05-09  7:57   ` Jani Nikula
  2023-05-09  8:15     ` Lucas De Marchi
  0 siblings, 1 reply; 30+ messages in thread
From: Jani Nikula @ 2023-05-09  7:57 UTC (permalink / raw)
  To: Lucas De Marchi, intel-gfx, intel-xe, dri-devel
  Cc: Masahiro Yamada, Kevin Brodsky, Lucas De Marchi, linux-kernel,
	Christian König, Alex Deucher, Thomas Gleixner,
	Andy Shevchenko, Andrew Morton

On Mon, 08 May 2023, Lucas De Marchi <lucas.demarchi@intel.com> wrote:
> Convert the REG_* macros from i915_reg_defs.h to use the new macros
> defined in linux/bits.h. This is just to help on the implementation
> of the new macros and not intended to be applied.

This drops a number of build time input checks as well as casts to the
specified types.

BR,
Jani.

>
> Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_reg_defs.h | 28 +++++-----------------------
>  1 file changed, 5 insertions(+), 23 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_reg_defs.h b/drivers/gpu/drm/i915/i915_reg_defs.h
> index 622d603080f9..61fbb8d62b25 100644
> --- a/drivers/gpu/drm/i915/i915_reg_defs.h
> +++ b/drivers/gpu/drm/i915/i915_reg_defs.h
> @@ -17,10 +17,7 @@
>   *
>   * @return: Value with bit @__n set.
>   */
> -#define REG_BIT(__n)							\
> -	((u32)(BIT(__n) +						\
> -	       BUILD_BUG_ON_ZERO(__is_constexpr(__n) &&		\
> -				 ((__n) < 0 || (__n) > 31))))
> +#define REG_BIT(__n) BIT_U32(__n)
>  
>  /**
>   * REG_BIT8() - Prepare a u8 bit value
> @@ -30,10 +27,7 @@
>   *
>   * @return: Value with bit @__n set.
>   */
> -#define REG_BIT8(__n)                                                   \
> -	((u8)(BIT(__n) +                                                \
> -	       BUILD_BUG_ON_ZERO(__is_constexpr(__n) &&         \
> -				 ((__n) < 0 || (__n) > 7))))
> +#define REG_BIT8(__n) BIT_U8(__n)
>  
>  /**
>   * REG_GENMASK() - Prepare a continuous u32 bitmask
> @@ -44,11 +38,7 @@
>   *
>   * @return: Continuous bitmask from @__high to @__low, inclusive.
>   */
> -#define REG_GENMASK(__high, __low)					\
> -	((u32)(GENMASK(__high, __low) +					\
> -	       BUILD_BUG_ON_ZERO(__is_constexpr(__high) &&	\
> -				 __is_constexpr(__low) &&		\
> -				 ((__low) < 0 || (__high) > 31 || (__low) > (__high)))))
> +#define REG_GENMASK(__high, __low) GENMASK_U32(__high, __low)
>  
>  /**
>   * REG_GENMASK64() - Prepare a continuous u64 bitmask
> @@ -59,11 +49,7 @@
>   *
>   * @return: Continuous bitmask from @__high to @__low, inclusive.
>   */
> -#define REG_GENMASK64(__high, __low)					\
> -	((u64)(GENMASK_ULL(__high, __low) +				\
> -	       BUILD_BUG_ON_ZERO(__is_constexpr(__high) &&		\
> -				 __is_constexpr(__low) &&		\
> -				 ((__low) < 0 || (__high) > 63 || (__low) > (__high)))))
> +#define REG_GENMASK64(__high, __low) GENMASK_ULL(__high, __low)
>  
>  /**
>   * REG_GENMASK8() - Prepare a continuous u8 bitmask
> @@ -74,11 +60,7 @@
>   *
>   * @return: Continuous bitmask from @__high to @__low, inclusive.
>   */
> -#define REG_GENMASK8(__high, __low)                                     \
> -	((u8)(GENMASK(__high, __low) +                                  \
> -	       BUILD_BUG_ON_ZERO(__is_constexpr(__high) &&      \
> -				 __is_constexpr(__low) &&               \
> -				 ((__low) < 0 || (__high) > 7 || (__low) > (__high)))))
> +#define REG_GENMASK8(__high, __low) GENMASK_U8(__high, __low)
>  
>  /*
>   * Local integer constant expression version of is_power_of_2().

-- 
Jani Nikula, Intel Open Source Graphics Center

* Re: [PATCH 3/3] drm/i915: Temporary conversion to new GENMASK/BIT macros
  2023-05-09  7:57   ` Jani Nikula
@ 2023-05-09  8:15     ` Lucas De Marchi
  0 siblings, 0 replies; 30+ messages in thread
From: Lucas De Marchi @ 2023-05-09  8:15 UTC (permalink / raw)
  To: Jani Nikula
  Cc: Andrew Morton, intel-gfx, Kevin Brodsky, linux-kernel, dri-devel,
	Christian König, Masahiro Yamada, Alex Deucher,
	Thomas Gleixner, Andy Shevchenko, intel-xe

On Tue, May 09, 2023 at 10:57:19AM +0300, Jani Nikula wrote:
>On Mon, 08 May 2023, Lucas De Marchi <lucas.demarchi@intel.com> wrote:
>> Convert the REG_* macros from i915_reg_defs.h to use the new macros
>> defined in linux/bits.h. This is just to help on the implementation
>> of the new macros and not intended to be applied.
>
>This drops a number of build time input checks as well as casts to the
>specified types.

The explicit checks, yes, but checks are still there: the compiler still
gives me a warning or an error when invalid values are used. See the test
program in the second patch. Example:

	static const u32 a2 = GENMASK_U32(32, 0);

	In file included from /tmp/genmask.c:2:
	include/linux/bits.h:47:19: warning: right shift count is negative [-Wshift-count-negative]
	   47 |          (~U32(0) >> (32 - 1 - (h))))
	      |                   ^~

It's a warning, not an error though.

The other fixed-width variants give the same warning for bit numbers above the
supported width. For negative values:

	In file included from include/linux/bits.h:21,
			 from /tmp/genmask.c:2:
	include/linux/build_bug.h:16:51: error: negative width in bit-field ‘<anonymous>’
	   16 | #define BUILD_BUG_ON_ZERO(e) ((int)(sizeof(struct { int:(-!!(e)); })))
	      |                                                   ^


We do indeed lose the cast to the specified type. Could you give an example
where the casts are useful in the context in which they are used? I debated
adding them, but couldn't find a justified use for them.
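
As an illustration of what the cast changes, here is a minimal user-space
sketch that is not taken from the patch. The SKETCH_* names are hypothetical,
and SKETCH_BUG_ON_ZERO() is a portable stand-in for BUILD_BUG_ON_ZERO() that
uses a negative array size instead of a negative bit-field width. It only shows
that, without the cast, the expression has type unsigned int rather than u8;
whether that matters for the i915 callers is the open question here.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for BUILD_BUG_ON_ZERO(): evaluates to 0, or breaks
 * the build (negative array size) when the constant expression e is true.
 */
#define SKETCH_BUG_ON_ZERO(e) ((int)(sizeof(char[1 - 2 * !!(e)]) - 1))

/* Explicit range check plus cast, in the style of the old REG_BIT8(). */
#define SKETCH_BIT8_CAST(n) \
	((uint8_t)((1u << (n)) + SKETCH_BUG_ON_ZERO((n) < 0 || (n) > 7)))

/* Same check, but without the cast: the result stays an unsigned int. */
#define SKETCH_BIT8_NOCAST(n) \
	((1u << (n)) + SKETCH_BUG_ON_ZERO((n) < 0 || (n) > 7))

/* Both forms produce the same value for a valid bit ... */
static_assert(SKETCH_BIT8_CAST(7) == 0x80, "value with cast");
static_assert(SKETCH_BIT8_NOCAST(7) == 0x80, "value without cast");

/* ... but only the cast form has the size of the fixed-width type. */
static_assert(sizeof(SKETCH_BIT8_CAST(7)) == sizeof(uint8_t), "u8 sized");
static_assert(sizeof(SKETCH_BIT8_NOCAST(7)) == sizeof(unsigned int), "int sized");
```

The difference is visible to sizeof(), typeof() and _Generic(), but not to
plain arithmetic, where both forms are promoted before use.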

Lucas De Marchi

>
>BR,
>Jani.
>
>>
>> Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
>> ---
>>  drivers/gpu/drm/i915/i915_reg_defs.h | 28 +++++-----------------------
>>  1 file changed, 5 insertions(+), 23 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_reg_defs.h b/drivers/gpu/drm/i915/i915_reg_defs.h
>> index 622d603080f9..61fbb8d62b25 100644
>> --- a/drivers/gpu/drm/i915/i915_reg_defs.h
>> +++ b/drivers/gpu/drm/i915/i915_reg_defs.h
>> @@ -17,10 +17,7 @@
>>   *
>>   * @return: Value with bit @__n set.
>>   */
>> -#define REG_BIT(__n)							\
>> -	((u32)(BIT(__n) +						\
>> -	       BUILD_BUG_ON_ZERO(__is_constexpr(__n) &&		\
>> -				 ((__n) < 0 || (__n) > 31))))
>> +#define REG_BIT(__n) BIT_U32(__n)
>>
>>  /**
>>   * REG_BIT8() - Prepare a u8 bit value
>> @@ -30,10 +27,7 @@
>>   *
>>   * @return: Value with bit @__n set.
>>   */
>> -#define REG_BIT8(__n)                                                   \
>> -	((u8)(BIT(__n) +                                                \
>> -	       BUILD_BUG_ON_ZERO(__is_constexpr(__n) &&         \
>> -				 ((__n) < 0 || (__n) > 7))))
>> +#define REG_BIT8(__n) BIT_U8(__n)
>>
>>  /**
>>   * REG_GENMASK() - Prepare a continuous u32 bitmask
>> @@ -44,11 +38,7 @@
>>   *
>>   * @return: Continuous bitmask from @__high to @__low, inclusive.
>>   */
>> -#define REG_GENMASK(__high, __low)					\
>> -	((u32)(GENMASK(__high, __low) +					\
>> -	       BUILD_BUG_ON_ZERO(__is_constexpr(__high) &&	\
>> -				 __is_constexpr(__low) &&		\
>> -				 ((__low) < 0 || (__high) > 31 || (__low) > (__high)))))
>> +#define REG_GENMASK(__high, __low) GENMASK_U32(__high, __low)
>>
>>  /**
>>   * REG_GENMASK64() - Prepare a continuous u64 bitmask
>> @@ -59,11 +49,7 @@
>>   *
>>   * @return: Continuous bitmask from @__high to @__low, inclusive.
>>   */
>> -#define REG_GENMASK64(__high, __low)					\
>> -	((u64)(GENMASK_ULL(__high, __low) +				\
>> -	       BUILD_BUG_ON_ZERO(__is_constexpr(__high) &&		\
>> -				 __is_constexpr(__low) &&		\
>> -				 ((__low) < 0 || (__high) > 63 || (__low) > (__high)))))
>> +#define REG_GENMASK64(__high, __low) GENMASK_ULL(__high, __low)
>>
>>  /**
>>   * REG_GENMASK8() - Prepare a continuous u8 bitmask
>> @@ -74,11 +60,7 @@
>>   *
>>   * @return: Continuous bitmask from @__high to @__low, inclusive.
>>   */
>> -#define REG_GENMASK8(__high, __low)                                     \
>> -	((u8)(GENMASK(__high, __low) +                                  \
>> -	       BUILD_BUG_ON_ZERO(__is_constexpr(__high) &&      \
>> -				 __is_constexpr(__low) &&               \
>> -				 ((__low) < 0 || (__high) > 7 || (__low) > (__high)))))
>> +#define REG_GENMASK8(__high, __low) GENMASK_U8(__high, __low)
>>
>>  /*
>>   * Local integer constant expression version of is_power_of_2().
>
>-- 
>Jani Nikula, Intel Open Source Graphics Center

* Re: [Intel-xe] [PATCH 2/3] linux/bits.h: Add fixed-width GENMASK and BIT macros
  2023-05-09  5:14 ` [PATCH 2/3] linux/bits.h: Add fixed-width GENMASK and BIT macros Lucas De Marchi
@ 2023-05-09 14:00   ` Gustavo Sousa
  2023-05-09 21:34     ` Lucas De Marchi
  2023-05-10 12:18   ` kernel test robot
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 30+ messages in thread
From: Gustavo Sousa @ 2023-05-09 14:00 UTC (permalink / raw)
  To: Lucas De Marchi, dri-devel, intel-gfx, intel-xe
  Cc: Andrew Morton, Masahiro Yamada, Kevin Brodsky, Lucas De Marchi,
	linux-kernel, Alex Deucher, Thomas Gleixner, Andy Shevchenko,
	Christian König

Quoting Lucas De Marchi (2023-05-09 02:14:02)
>Add GENMASK_U32(), GENMASK_U16() and GENMASK_U8()  macros to create
>masks for fixed-width types and also the corresponding BIT_U32(),
>BIT_U16() and BIT_U8().
>
>All of those depend on a new "U" suffix added to the integer constant.
>Due to naming clashes it's better to call the macro U32. Since C doesn't
>have a proper suffix for short and char types, the U16 and U8 variants
>just use U32 with one additional check in the BIT_* macros to make
>sure the compiler gives an error when those types overflow.
>The BIT_U16() and BIT_U8() need the help of GENMASK_INPUT_CHECK(),
>as otherwise they would allow an invalid bit to be passed. Hence
>implement them in include/linux/bits.h rather than together with
>the other BIT* variants.
>
>The following test file is used to test this:
>
>        $ cat mask.c
>        #include <linux/types.h>
>        #include <linux/bits.h>
>
>        static const u32 a = GENMASK_U32(31, 0);
>        static const u16 b = GENMASK_U16(15, 0);
>        static const u8 c = GENMASK_U8(7, 0);
>        static const u32 x = BIT_U32(31);
>        static const u16 y = BIT_U16(15);
>        static const u8 z = BIT_U8(7);
>
>        #if FAIL
>        static const u32 a2 = GENMASK_U32(32, 0);
>        static const u16 b2 = GENMASK_U16(16, 0);
>        static const u8 c2 = GENMASK_U8(8, 0);
>        static const u32 x2 = BIT_U32(32);
>        static const u16 y2 = BIT_U16(16);
>        static const u8 z2 = BIT_U8(8);
>        #endif
>
>Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
>---
> include/linux/bits.h       | 22 ++++++++++++++++++++++
> include/uapi/linux/const.h |  2 ++
> include/vdso/const.h       |  1 +
> 3 files changed, 25 insertions(+)
>
>diff --git a/include/linux/bits.h b/include/linux/bits.h
>index 7c0cf5031abe..ff4786c99b8c 100644
>--- a/include/linux/bits.h
>+++ b/include/linux/bits.h
>@@ -42,4 +42,26 @@
> #define GENMASK_ULL(h, l) \
>        (GENMASK_INPUT_CHECK(h, l) + __GENMASK_ULL(h, l))
> 
>+#define __GENMASK_U32(h, l) \
>+  (((~U32(0)) - (U32(1) << (l)) + 1) & \
>+   (~U32(0) >> (32 - 1 - (h))))
>+#define GENMASK_U32(h, l) \
>+  (GENMASK_INPUT_CHECK(h, l) + __GENMASK_U32(h, l))
>+
>+#define __GENMASK_U16(h, l) \
>+  ((U32(0xffff) - (U32(1) << (l)) + 1) & \
>+   (U32(0xffff) >> (16 - 1 - (h))))
>+#define GENMASK_U16(h, l) \
>+  (GENMASK_INPUT_CHECK(h, l) + __GENMASK_U16(h, l))
>+
>+#define __GENMASK_U8(h, l) \
>+  (((U32(0xff)) - (U32(1) << (l)) + 1) & \
>+   (U32(0xff) >> (8 - 1 - (h))))
>+#define GENMASK_U8(h, l) \
>+  (GENMASK_INPUT_CHECK(h, l) + __GENMASK_U8(h, l))

I wonder if we should use BIT_U* variants in the above to ensure the values are
valid. If so, we get a nice boundary check and we also can use a single
definition for the mask generation:

  #define __GENMASK_U32(h, l) \
          (((~U32(0)) - (U32(1) << (l)) + 1) & \
           (~U32(0) >> (32 - 1 - (h))))
  #define GENMASK_U32(h, l) \
          (GENMASK_INPUT_CHECK(h, l) + __GENMASK_U32(BIT_U32(h), BIT_U32(l)))
  #define GENMASK_U16(h, l) \
          (GENMASK_INPUT_CHECK(h, l) + __GENMASK_U32(BIT_U16(h), BIT_U16(l)))
  #define GENMASK_U8(h, l) \
          (GENMASK_INPUT_CHECK(h, l) + __GENMASK_U32(BIT_U8(h), BIT_U8(l)))

>+
>+#define BIT_U32(nr)       _BITU32(nr)
>+#define BIT_U16(nr)       (GENMASK_INPUT_CHECK(16 - 1, nr) + (U32(1) << (nr)))
>+#define BIT_U8(nr)        (GENMASK_INPUT_CHECK(32 - 1, nr) + (U32(1) << (nr)))

Shouldn't this be GENMASK_INPUT_CHECK(8 - 1, nr)?
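
To spell out the difference, a small user-space sketch. It assumes that
GENMASK_INPUT_CHECK(h, l) breaks the build when l > h for constant arguments;
SKETCH_INPUT_REJECTED() is a hypothetical predicate that mirrors only that
condition, so the two bounds can be compared without failing the build.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical predicate mirroring the condition that
 * GENMASK_INPUT_CHECK(h, l) is assumed to turn into a build error: l > h.
 */
#define SKETCH_INPUT_REJECTED(h, l) ((l) > (h))

/* Bound as posted for BIT_U8(): bit 8 is not rejected. */
static_assert(!SKETCH_INPUT_REJECTED(32 - 1, 8), "32 - 1 accepts bit 8");

/* Suggested bound: bit 8 is rejected, bit 7 is still accepted. */
static_assert(SKETCH_INPUT_REJECTED(8 - 1, 8), "8 - 1 rejects bit 8");
static_assert(!SKETCH_INPUT_REJECTED(8 - 1, 7), "8 - 1 accepts bit 7");

/* Why it matters: bit 8 is truncated away when stored in a u8. */
static_assert((uint8_t)(UINT32_C(1) << 8) == 0, "bit 8 does not fit in u8");
```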

--
Gustavo Sousa

>+
> #endif /* __LINUX_BITS_H */
>diff --git a/include/uapi/linux/const.h b/include/uapi/linux/const.h
>index a429381e7ca5..3a4e152520f4 100644
>--- a/include/uapi/linux/const.h
>+++ b/include/uapi/linux/const.h
>@@ -22,9 +22,11 @@
> #define _AT(T,X)       ((T)(X))
> #endif
> 
>+#define _U32(x)           (_AC(x, U))
> #define _UL(x)         (_AC(x, UL))
> #define _ULL(x)                (_AC(x, ULL))
> 
>+#define _BITU32(x)        (_U32(1) << (x))
> #define _BITUL(x)      (_UL(1) << (x))
> #define _BITULL(x)     (_ULL(1) << (x))
> 
>diff --git a/include/vdso/const.h b/include/vdso/const.h
>index 94b385ad438d..417384a9795b 100644
>--- a/include/vdso/const.h
>+++ b/include/vdso/const.h
>@@ -4,6 +4,7 @@
> 
> #include <uapi/linux/const.h>
> 
>+#define U32(x)            (_U32(x))
> #define UL(x)          (_UL(x))
> #define ULL(x)         (_ULL(x))
> 
>-- 
>2.40.1
>

* Re: [Intel-xe] [PATCH 2/3] linux/bits.h: Add fixed-width GENMASK and BIT macros
  2023-05-09 14:00   ` [Intel-xe] " Gustavo Sousa
@ 2023-05-09 21:34     ` Lucas De Marchi
  0 siblings, 0 replies; 30+ messages in thread
From: Lucas De Marchi @ 2023-05-09 21:34 UTC (permalink / raw)
  To: Gustavo Sousa
  Cc: Christian König, intel-gfx, Kevin Brodsky, linux-kernel,
	dri-devel, intel-xe, Thomas Gleixner, Alex Deucher,
	Andrew Morton, Andy Shevchenko, Masahiro Yamada

On Tue, May 09, 2023 at 11:00:36AM -0300, Gustavo Sousa wrote:
>Quoting Lucas De Marchi (2023-05-09 02:14:02)
>>Add GENMASK_U32(), GENMASK_U16() and GENMASK_U8()  macros to create
>>masks for fixed-width types and also the corresponding BIT_U32(),
>>BIT_U16() and BIT_U8().
>>
>>All of those depend on a new "U" suffix added to the integer constant.
>>Due to naming clashes it's better to call the macro U32. Since C doesn't
>>have a proper suffix for short and char types, the U16 and U8 variants
>>just use U32 with one additional check in the BIT_* macros to make
>>sure the compiler gives an error when those types overflow.
>>The BIT_U16() and BIT_U8() need the help of GENMASK_INPUT_CHECK(),
>>as otherwise they would allow an invalid bit to be passed. Hence
>>implement them in include/linux/bits.h rather than together with
>>the other BIT* variants.
>>
>>The following test file is used to test this:
>>
>>        $ cat mask.c
>>        #include <linux/types.h>
>>        #include <linux/bits.h>
>>
>>        static const u32 a = GENMASK_U32(31, 0);
>>        static const u16 b = GENMASK_U16(15, 0);
>>        static const u8 c = GENMASK_U8(7, 0);
>>        static const u32 x = BIT_U32(31);
>>        static const u16 y = BIT_U16(15);
>>        static const u8 z = BIT_U8(7);
>>
>>        #if FAIL
>>        static const u32 a2 = GENMASK_U32(32, 0);
>>        static const u16 b2 = GENMASK_U16(16, 0);
>>        static const u8 c2 = GENMASK_U8(8, 0);
>>        static const u32 x2 = BIT_U32(32);
>>        static const u16 y2 = BIT_U16(16);
>>        static const u8 z2 = BIT_U8(8);
>>        #endif
>>
>>Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
>>---
>> include/linux/bits.h       | 22 ++++++++++++++++++++++
>> include/uapi/linux/const.h |  2 ++
>> include/vdso/const.h       |  1 +
>> 3 files changed, 25 insertions(+)
>>
>>diff --git a/include/linux/bits.h b/include/linux/bits.h
>>index 7c0cf5031abe..ff4786c99b8c 100644
>>--- a/include/linux/bits.h
>>+++ b/include/linux/bits.h
>>@@ -42,4 +42,26 @@
>> #define GENMASK_ULL(h, l) \
>>        (GENMASK_INPUT_CHECK(h, l) + __GENMASK_ULL(h, l))
>>
>>+#define __GENMASK_U32(h, l) \
>>+  (((~U32(0)) - (U32(1) << (l)) + 1) & \
>>+   (~U32(0) >> (32 - 1 - (h))))
>>+#define GENMASK_U32(h, l) \
>>+  (GENMASK_INPUT_CHECK(h, l) + __GENMASK_U32(h, l))
>>+
>>+#define __GENMASK_U16(h, l) \
>>+  ((U32(0xffff) - (U32(1) << (l)) + 1) & \
>>+   (U32(0xffff) >> (16 - 1 - (h))))
>>+#define GENMASK_U16(h, l) \
>>+  (GENMASK_INPUT_CHECK(h, l) + __GENMASK_U16(h, l))
>>+
>>+#define __GENMASK_U8(h, l) \
>>+  (((U32(0xff)) - (U32(1) << (l)) + 1) & \
>>+   (U32(0xff) >> (8 - 1 - (h))))
>>+#define GENMASK_U8(h, l) \
>>+  (GENMASK_INPUT_CHECK(h, l) + __GENMASK_U8(h, l))
>
>I wonder if we should use BIT_U* variants in the above to ensure the values are
>valid. If so, we get a nice boundary check and we also can use a single
>definition for the mask generation:
>
>  #define __GENMASK_U32(h, l) \
>          (((~U32(0)) - (U32(1) << (l)) + 1) & \
>           (~U32(0) >> (32 - 1 - (h))))

The bounds for h and l are already covered here, because (32 - 1 - (h))
would produce a negative shift count if h >= 32. A similar argument applies to l.

Doing ~U32(0) didn't work for me, as it wouldn't catch the invalid values
due to expanding to U32_MAX.
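
A user-space sketch of that argument, assuming U32(x) can be modelled with
UINT32_C(x); the SKETCH_* names are hypothetical. The right-shift count in the
second half of each mask is (width - 1 - h), so it goes negative exactly when h
reaches the width, which is what triggers -Wshift-count-negative.

```c
#include <assert.h>
#include <stdint.h>

/* The three mask bodies from the patch, with UINT32_C() standing in for U32(). */
#define SKETCH_GENMASK_U32(h, l) \
	(((~UINT32_C(0)) - (UINT32_C(1) << (l)) + 1) & \
	 (~UINT32_C(0) >> (32 - 1 - (h))))
#define SKETCH_GENMASK_U16(h, l) \
	((UINT32_C(0xffff) - (UINT32_C(1) << (l)) + 1) & \
	 (UINT32_C(0xffff) >> (16 - 1 - (h))))
#define SKETCH_GENMASK_U8(h, l) \
	((UINT32_C(0xff) - (UINT32_C(1) << (l)) + 1) & \
	 (UINT32_C(0xff) >> (8 - 1 - (h))))

/* Right-shift count used for a given width and high bit. */
#define SKETCH_SHIFT_COUNT(width, h) ((width) - 1 - (h))

/* Valid inputs give the expected masks. */
static_assert(SKETCH_GENMASK_U32(31, 0) == 0xffffffffu, "full u32 mask");
static_assert(SKETCH_GENMASK_U16(15, 0) == 0xffffu, "full u16 mask");
static_assert(SKETCH_GENMASK_U8(7, 0) == 0xffu, "full u8 mask");
static_assert(SKETCH_GENMASK_U16(11, 4) == 0x0ff0u, "partial u16 mask");

/* The count is non-negative up to width - 1 and negative from width on. */
static_assert(SKETCH_SHIFT_COUNT(32, 31) == 0, "highest valid u32 bit");
static_assert(SKETCH_SHIFT_COUNT(32, 32) < 0, "u32 overflow");
static_assert(SKETCH_SHIFT_COUNT(16, 16) < 0, "u16 overflow");
```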


>  #define GENMASK_U32(h, l) \
>          (GENMASK_INPUT_CHECK(h, l) + __GENMASK_U32(BIT_U32(h), BIT_U32(l)))

							^^^^
that doesn't really work as BIT_U32(h) would expand here,
creating the equivalent of

	~U32(0) >> (32 - 1 - (BIT_U32(h))),

which is not what we want
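
To make the problem concrete, a small sketch with hypothetical SKETCH_* names
(UINT32_C() again stands in for U32()): if BIT_U32(h) is passed where the bit
index is expected, the shift count is computed from the bit value rather than
from the index.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_BIT_U32(nr) (UINT32_C(1) << (nr))

/*
 * Shift count as __GENMASK_U32() derives it from its first argument.
 * The int64_t cast is only here to keep the arithmetic signed and readable;
 * in the real macro an unsigned argument would wrap around instead.
 */
#define SKETCH_COUNT_FROM(arg) (32 - 1 - (int64_t)(arg))

/* Passing the index: a mask with high bit 7 shifts ~0 right by 24. */
static_assert(SKETCH_COUNT_FROM(7) == 24, "index gives a valid count");

/* Passing BIT_U32(7) == 128 instead: the count is far out of range. */
static_assert(SKETCH_COUNT_FROM(SKETCH_BIT_U32(7)) == -97, "bit value does not");
```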

>  #define GENMASK_U16(h, l) \
>          (GENMASK_INPUT_CHECK(h, l) + __GENMASK_U32(BIT_U16(h), BIT_U16(l)))
>  #define GENMASK_U8(h, l) \
>          (GENMASK_INPUT_CHECK(h, l) + __GENMASK_U32(BIT_U8(h), BIT_U8(l)))
>
>>+
>>+#define BIT_U32(nr)       _BITU32(nr)
>>+#define BIT_U16(nr)       (GENMASK_INPUT_CHECK(16 - 1, nr) + (U32(1) << (nr)))
>>+#define BIT_U8(nr)        (GENMASK_INPUT_CHECK(32 - 1, nr) + (U32(1) << (nr)))
>
>Shouldn't this be GENMASK_INPUT_CHECK(8 - 1, nr)?

ugh, good catch. Thanks

I will think about whether I can come up with something that reuses a single
__GENMASK_U(). Meanwhile I improved my negative tests to cover more
cases.

Lucas De Marchi



>
>--
>Gustavo Sousa
>
>>+
>> #endif /* __LINUX_BITS_H */
>>diff --git a/include/uapi/linux/const.h b/include/uapi/linux/const.h
>>index a429381e7ca5..3a4e152520f4 100644
>>--- a/include/uapi/linux/const.h
>>+++ b/include/uapi/linux/const.h
>>@@ -22,9 +22,11 @@
>> #define _AT(T,X)       ((T)(X))
>> #endif
>>
>>+#define _U32(x)           (_AC(x, U))
>> #define _UL(x)         (_AC(x, UL))
>> #define _ULL(x)                (_AC(x, ULL))
>>
>>+#define _BITU32(x)        (_U32(1) << (x))
>> #define _BITUL(x)      (_UL(1) << (x))
>> #define _BITULL(x)     (_ULL(1) << (x))
>>
>>diff --git a/include/vdso/const.h b/include/vdso/const.h
>>index 94b385ad438d..417384a9795b 100644
>>--- a/include/vdso/const.h
>>+++ b/include/vdso/const.h
>>@@ -4,6 +4,7 @@
>>
>> #include <uapi/linux/const.h>
>>
>>+#define U32(x)            (_U32(x))
>> #define UL(x)          (_UL(x))
>> #define ULL(x)         (_ULL(x))
>>
>>--
>>2.40.1
>>

* Re: [PATCH 2/3] linux/bits.h: Add fixed-width GENMASK and BIT macros
  2023-05-09  5:14 ` [PATCH 2/3] linux/bits.h: Add fixed-width GENMASK and BIT macros Lucas De Marchi
  2023-05-09 14:00   ` [Intel-xe] " Gustavo Sousa
@ 2023-05-10 12:18   ` kernel test robot
  2023-05-12 11:14   ` Andy Shevchenko
  2023-06-22  2:20   ` Yury Norov
  3 siblings, 0 replies; 30+ messages in thread
From: kernel test robot @ 2023-05-10 12:18 UTC (permalink / raw)
  To: Lucas De Marchi, intel-gfx, intel-xe, dri-devel
  Cc: Kevin Brodsky, Masahiro Yamada, llvm, Lucas De Marchi,
	linux-kernel, Christian König, Linux Memory Management List,
	oe-kbuild-all, Alex Deucher, Thomas Gleixner, Andy Shevchenko,
	Andrew Morton

Hi Lucas,

kernel test robot noticed the following build errors:

[auto build test ERROR on drm-intel/for-linux-next]
[also build test ERROR on drm-intel/for-linux-next-fixes drm-tip/drm-tip linus/master v6.4-rc1 next-20230510]
[cannot apply to drm-misc/drm-misc-next]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Lucas-De-Marchi/drm-amd-Remove-wrapper-macros-over-get_u-32-16-8/20230509-131544
base:   git://anongit.freedesktop.org/drm-intel for-linux-next
patch link:    https://lore.kernel.org/r/20230509051403.2748545-3-lucas.demarchi%40intel.com
patch subject: [PATCH 2/3] linux/bits.h: Add fixed-width GENMASK and BIT macros
config: arm64-randconfig-r021-20230509 (https://download.01.org/0day-ci/archive/20230510/202305102048.2O5u4Wia-lkp@intel.com/config)
compiler: clang version 17.0.0 (https://github.com/llvm/llvm-project b0fb98227c90adf2536c9ad644a74d5e92961111)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # install arm64 cross compiling tool for clang build
        # apt-get install binutils-aarch64-linux-gnu
        # https://github.com/intel-lab-lkp/linux/commit/dc308f14f76fa2d6c1698a701dfbe0f1b247e6bd
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Lucas-De-Marchi/drm-amd-Remove-wrapper-macros-over-get_u-32-16-8/20230509-131544
        git checkout dc308f14f76fa2d6c1698a701dfbe0f1b247e6bd
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=arm64 olddefconfig
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=arm64 SHELL=/bin/bash lib/

If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot <lkp@intel.com>
| Link: https://lore.kernel.org/oe-kbuild-all/202305102048.2O5u4Wia-lkp@intel.com/

All errors (new ones prefixed by >>):

>> lib/zstd/compress/zstd_opt.c:785:9: error: type specifier missing, defaults to 'int'; ISO C99 and later do not support implicit int [-Wimplicit-int]
   typedef U32 (*ZSTD_getAllMatchesFn)(
   ~~~~~~~ ^
   int
   include/vdso/const.h:7:18: note: expanded from macro 'U32'
   #define U32(x)          (_U32(x))
                            ^
   include/uapi/linux/const.h:25:19: note: expanded from macro '_U32'
   #define _U32(x)         (_AC(x, U))
                            ^
   include/uapi/linux/const.h:21:18: note: expanded from macro '_AC'
   #define _AC(X,Y)        __AC(X,Y)
                           ^
   include/uapi/linux/const.h:20:20: note: expanded from macro '__AC'
   #define __AC(X,Y)       (X##Y)
                            ^
   <scratch space>:178:1: note: expanded from here
   ZSTD_getAllMatchesFnU
   ^
>> lib/zstd/compress/zstd_opt.c:851:8: error: unknown type name 'ZSTD_getAllMatchesFn'; did you mean 'ZSTD_getAllMatchesFnU'?
   static ZSTD_getAllMatchesFn
          ^~~~~~~~~~~~~~~~~~~~
          ZSTD_getAllMatchesFnU
   lib/zstd/compress/zstd_opt.c:785:9: note: 'ZSTD_getAllMatchesFnU' declared here
   typedef U32 (*ZSTD_getAllMatchesFn)(
           ^
   include/vdso/const.h:7:18: note: expanded from macro 'U32'
   #define U32(x)          (_U32(x))
                            ^
   include/uapi/linux/const.h:25:19: note: expanded from macro '_U32'
   #define _U32(x)         (_AC(x, U))
                            ^
   include/uapi/linux/const.h:21:18: note: expanded from macro '_AC'
   #define _AC(X,Y)        __AC(X,Y)
                           ^
   include/uapi/linux/const.h:20:20: note: expanded from macro '__AC'
   #define __AC(X,Y)       (X##Y)
                            ^
   <scratch space>:178:1: note: expanded from here
   ZSTD_getAllMatchesFnU
   ^
>> lib/zstd/compress/zstd_opt.c:854:5: error: use of undeclared identifier 'ZSTD_getAllMatchesFn'
       ZSTD_getAllMatchesFn const getAllMatchesFns[3][4] = {
       ^
>> lib/zstd/compress/zstd_opt.c:862:12: error: use of undeclared identifier 'getAllMatchesFns'
       return getAllMatchesFns[(int)dictMode][mls - 3];
              ^
   lib/zstd/compress/zstd_opt.c:1054:5: error: unknown type name 'ZSTD_getAllMatchesFn'; did you mean 'ZSTD_getAllMatchesFnU'?
       ZSTD_getAllMatchesFn getAllMatches = ZSTD_selectBtGetAllMatches(ms, dictMode);
       ^~~~~~~~~~~~~~~~~~~~
       ZSTD_getAllMatchesFnU
   lib/zstd/compress/zstd_opt.c:785:9: note: 'ZSTD_getAllMatchesFnU' declared here
   typedef U32 (*ZSTD_getAllMatchesFn)(
           ^
   include/vdso/const.h:7:18: note: expanded from macro 'U32'
   #define U32(x)          (_U32(x))
                            ^
   include/uapi/linux/const.h:25:19: note: expanded from macro '_U32'
   #define _U32(x)         (_AC(x, U))
                            ^
   include/uapi/linux/const.h:21:18: note: expanded from macro '_AC'
   #define _AC(X,Y)        __AC(X,Y)
                           ^
   include/uapi/linux/const.h:20:20: note: expanded from macro '__AC'
   #define __AC(X,Y)       (X##Y)
                            ^
   <scratch space>:178:1: note: expanded from here
   ZSTD_getAllMatchesFnU
   ^
   5 errors generated.
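
The failure above can be reproduced outside the kernel with a few lines. This
is a sketch with hypothetical SK_* names standing in for __AC(), _AC() and
U32(): zstd uses U32 as a type name, so "U32 (*name)(" is parsed as an
invocation of the new function-like macro, and the token paste glues the U
suffix onto the identifier.

```c
#include <assert.h>
#include <string.h>

/* Stand-ins for __AC(), _AC() and the U32() macro added by the patch. */
#define SK_PASTE(X, Y)	(X##Y)
#define SK_AC(X, Y)	SK_PASTE(X, Y)
#define SK_U32(x)	(SK_AC(x, U))

/* Two-level stringification so the argument is macro-expanded first. */
#define SK_STR_(x)	#x
#define SK_STR(x)	SK_STR_(x)

/* Text the preprocessor produces for "U32 (*ZSTD_getAllMatchesFn)". */
static const char sk_expanded[] = SK_STR(SK_U32 (*ZSTD_getAllMatchesFn));

/* The expansion is longer than the bare name: something was appended. */
static_assert(sizeof(sk_expanded) > sizeof("ZSTD_getAllMatchesFn"),
	      "expansion grew");

/* Returns non-zero when the pasted identifier appears in the expansion. */
static inline int sk_has_pasted_name(void)
{
	return strstr(sk_expanded, "ZSTD_getAllMatchesFnU") != NULL;
}
```

This matches the ZSTD_getAllMatchesFnU identifier that clang reports in the
"expanded from here" notes.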


vim +/int +785 lib/zstd/compress/zstd_opt.c

e0c1b49f5b674c Nick Terrell 2020-09-11  784  
2aa14b1ab2c41a Nick Terrell 2022-10-17 @785  typedef U32 (*ZSTD_getAllMatchesFn)(
2aa14b1ab2c41a Nick Terrell 2022-10-17  786      ZSTD_match_t*,
2aa14b1ab2c41a Nick Terrell 2022-10-17  787      ZSTD_matchState_t*,
2aa14b1ab2c41a Nick Terrell 2022-10-17  788      U32*,
2aa14b1ab2c41a Nick Terrell 2022-10-17  789      const BYTE*,
2aa14b1ab2c41a Nick Terrell 2022-10-17  790      const BYTE*,
2aa14b1ab2c41a Nick Terrell 2022-10-17  791      const U32 rep[ZSTD_REP_NUM],
2aa14b1ab2c41a Nick Terrell 2022-10-17  792      U32 const ll0,
2aa14b1ab2c41a Nick Terrell 2022-10-17  793      U32 const lengthToBeat);
e0c1b49f5b674c Nick Terrell 2020-09-11  794  
2aa14b1ab2c41a Nick Terrell 2022-10-17  795  FORCE_INLINE_TEMPLATE U32 ZSTD_btGetAllMatches_internal(
2aa14b1ab2c41a Nick Terrell 2022-10-17  796          ZSTD_match_t* matches,
e0c1b49f5b674c Nick Terrell 2020-09-11  797          ZSTD_matchState_t* ms,
e0c1b49f5b674c Nick Terrell 2020-09-11  798          U32* nextToUpdate3,
2aa14b1ab2c41a Nick Terrell 2022-10-17  799          const BYTE* ip,
2aa14b1ab2c41a Nick Terrell 2022-10-17  800          const BYTE* const iHighLimit,
e0c1b49f5b674c Nick Terrell 2020-09-11  801          const U32 rep[ZSTD_REP_NUM],
e0c1b49f5b674c Nick Terrell 2020-09-11  802          U32 const ll0,
2aa14b1ab2c41a Nick Terrell 2022-10-17  803          U32 const lengthToBeat,
2aa14b1ab2c41a Nick Terrell 2022-10-17  804          const ZSTD_dictMode_e dictMode,
2aa14b1ab2c41a Nick Terrell 2022-10-17  805          const U32 mls)
e0c1b49f5b674c Nick Terrell 2020-09-11  806  {
2aa14b1ab2c41a Nick Terrell 2022-10-17  807      assert(BOUNDED(3, ms->cParams.minMatch, 6) == mls);
2aa14b1ab2c41a Nick Terrell 2022-10-17  808      DEBUGLOG(8, "ZSTD_BtGetAllMatches(dictMode=%d, mls=%u)", (int)dictMode, mls);
2aa14b1ab2c41a Nick Terrell 2022-10-17  809      if (ip < ms->window.base + ms->nextToUpdate)
2aa14b1ab2c41a Nick Terrell 2022-10-17  810          return 0;   /* skipped area */
2aa14b1ab2c41a Nick Terrell 2022-10-17  811      ZSTD_updateTree_internal(ms, ip, iHighLimit, mls, dictMode);
2aa14b1ab2c41a Nick Terrell 2022-10-17  812      return ZSTD_insertBtAndGetAllMatches(matches, ms, nextToUpdate3, ip, iHighLimit, dictMode, rep, ll0, lengthToBeat, mls);
2aa14b1ab2c41a Nick Terrell 2022-10-17  813  }
2aa14b1ab2c41a Nick Terrell 2022-10-17  814  
2aa14b1ab2c41a Nick Terrell 2022-10-17  815  #define ZSTD_BT_GET_ALL_MATCHES_FN(dictMode, mls) ZSTD_btGetAllMatches_##dictMode##_##mls
2aa14b1ab2c41a Nick Terrell 2022-10-17  816  
2aa14b1ab2c41a Nick Terrell 2022-10-17  817  #define GEN_ZSTD_BT_GET_ALL_MATCHES_(dictMode, mls)            \
2aa14b1ab2c41a Nick Terrell 2022-10-17  818      static U32 ZSTD_BT_GET_ALL_MATCHES_FN(dictMode, mls)(      \
2aa14b1ab2c41a Nick Terrell 2022-10-17  819              ZSTD_match_t* matches,                             \
2aa14b1ab2c41a Nick Terrell 2022-10-17  820              ZSTD_matchState_t* ms,                             \
2aa14b1ab2c41a Nick Terrell 2022-10-17  821              U32* nextToUpdate3,                                \
2aa14b1ab2c41a Nick Terrell 2022-10-17  822              const BYTE* ip,                                    \
2aa14b1ab2c41a Nick Terrell 2022-10-17  823              const BYTE* const iHighLimit,                      \
2aa14b1ab2c41a Nick Terrell 2022-10-17  824              const U32 rep[ZSTD_REP_NUM],                       \
2aa14b1ab2c41a Nick Terrell 2022-10-17  825              U32 const ll0,                                     \
2aa14b1ab2c41a Nick Terrell 2022-10-17  826              U32 const lengthToBeat)                            \
2aa14b1ab2c41a Nick Terrell 2022-10-17  827      {                                                          \
2aa14b1ab2c41a Nick Terrell 2022-10-17  828          return ZSTD_btGetAllMatches_internal(                  \
2aa14b1ab2c41a Nick Terrell 2022-10-17  829                  matches, ms, nextToUpdate3, ip, iHighLimit,    \
2aa14b1ab2c41a Nick Terrell 2022-10-17  830                  rep, ll0, lengthToBeat, ZSTD_##dictMode, mls); \
2aa14b1ab2c41a Nick Terrell 2022-10-17  831      }
2aa14b1ab2c41a Nick Terrell 2022-10-17  832  
2aa14b1ab2c41a Nick Terrell 2022-10-17  833  #define GEN_ZSTD_BT_GET_ALL_MATCHES(dictMode)  \
2aa14b1ab2c41a Nick Terrell 2022-10-17  834      GEN_ZSTD_BT_GET_ALL_MATCHES_(dictMode, 3)  \
2aa14b1ab2c41a Nick Terrell 2022-10-17  835      GEN_ZSTD_BT_GET_ALL_MATCHES_(dictMode, 4)  \
2aa14b1ab2c41a Nick Terrell 2022-10-17  836      GEN_ZSTD_BT_GET_ALL_MATCHES_(dictMode, 5)  \
2aa14b1ab2c41a Nick Terrell 2022-10-17  837      GEN_ZSTD_BT_GET_ALL_MATCHES_(dictMode, 6)
2aa14b1ab2c41a Nick Terrell 2022-10-17  838  
2aa14b1ab2c41a Nick Terrell 2022-10-17  839  GEN_ZSTD_BT_GET_ALL_MATCHES(noDict)
2aa14b1ab2c41a Nick Terrell 2022-10-17  840  GEN_ZSTD_BT_GET_ALL_MATCHES(extDict)
2aa14b1ab2c41a Nick Terrell 2022-10-17  841  GEN_ZSTD_BT_GET_ALL_MATCHES(dictMatchState)
2aa14b1ab2c41a Nick Terrell 2022-10-17  842  
2aa14b1ab2c41a Nick Terrell 2022-10-17  843  #define ZSTD_BT_GET_ALL_MATCHES_ARRAY(dictMode)  \
2aa14b1ab2c41a Nick Terrell 2022-10-17  844      {                                            \
2aa14b1ab2c41a Nick Terrell 2022-10-17  845          ZSTD_BT_GET_ALL_MATCHES_FN(dictMode, 3), \
2aa14b1ab2c41a Nick Terrell 2022-10-17  846          ZSTD_BT_GET_ALL_MATCHES_FN(dictMode, 4), \
2aa14b1ab2c41a Nick Terrell 2022-10-17  847          ZSTD_BT_GET_ALL_MATCHES_FN(dictMode, 5), \
2aa14b1ab2c41a Nick Terrell 2022-10-17  848          ZSTD_BT_GET_ALL_MATCHES_FN(dictMode, 6)  \
2aa14b1ab2c41a Nick Terrell 2022-10-17  849      }
2aa14b1ab2c41a Nick Terrell 2022-10-17  850  
2aa14b1ab2c41a Nick Terrell 2022-10-17 @851  static ZSTD_getAllMatchesFn
2aa14b1ab2c41a Nick Terrell 2022-10-17  852  ZSTD_selectBtGetAllMatches(ZSTD_matchState_t const* ms, ZSTD_dictMode_e const dictMode)
e0c1b49f5b674c Nick Terrell 2020-09-11  853  {
2aa14b1ab2c41a Nick Terrell 2022-10-17 @854      ZSTD_getAllMatchesFn const getAllMatchesFns[3][4] = {
2aa14b1ab2c41a Nick Terrell 2022-10-17  855          ZSTD_BT_GET_ALL_MATCHES_ARRAY(noDict),
2aa14b1ab2c41a Nick Terrell 2022-10-17  856          ZSTD_BT_GET_ALL_MATCHES_ARRAY(extDict),
2aa14b1ab2c41a Nick Terrell 2022-10-17  857          ZSTD_BT_GET_ALL_MATCHES_ARRAY(dictMatchState)
2aa14b1ab2c41a Nick Terrell 2022-10-17  858      };
2aa14b1ab2c41a Nick Terrell 2022-10-17  859      U32 const mls = BOUNDED(3, ms->cParams.minMatch, 6);
2aa14b1ab2c41a Nick Terrell 2022-10-17  860      assert((U32)dictMode < 3);
2aa14b1ab2c41a Nick Terrell 2022-10-17  861      assert(mls - 3 < 4);
2aa14b1ab2c41a Nick Terrell 2022-10-17 @862      return getAllMatchesFns[(int)dictMode][mls - 3];
e0c1b49f5b674c Nick Terrell 2020-09-11  863  }
e0c1b49f5b674c Nick Terrell 2020-09-11  864  

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests

* Re: [PATCH 2/3] linux/bits.h: Add fixed-width GENMASK and BIT macros
  2023-05-09  5:14 ` [PATCH 2/3] linux/bits.h: Add fixed-width GENMASK and BIT macros Lucas De Marchi
  2023-05-09 14:00   ` [Intel-xe] " Gustavo Sousa
  2023-05-10 12:18   ` kernel test robot
@ 2023-05-12 11:14   ` Andy Shevchenko
  2023-05-12 11:25     ` Jani Nikula
  2023-05-12 16:29     ` Lucas De Marchi
  2023-06-22  2:20   ` Yury Norov
  3 siblings, 2 replies; 30+ messages in thread
From: Andy Shevchenko @ 2023-05-12 11:14 UTC (permalink / raw)
  To: Lucas De Marchi
  Cc: Andrew Morton, Christian König, intel-gfx, Kevin Brodsky,
	linux-kernel, dri-devel, intel-xe, Alex Deucher, Thomas Gleixner,
	Masahiro Yamada

On Mon, May 08, 2023 at 10:14:02PM -0700, Lucas De Marchi wrote:
> Add GENMASK_U32(), GENMASK_U16() and GENMASK_U8()  macros to create
> masks for fixed-width types and also the corresponding BIT_U32(),
> BIT_U16() and BIT_U8().

Why?

> All of those depend on a new "U" suffix added to the integer constant.
> Due to naming clashes it's better to call the macro U32. Since C doesn't
> have a proper suffix for short and char types, the U16 and U8 variants
> just use U32 with one additional check in the BIT_* macros to make
> sure the compiler gives an error when those types overflow.
> The BIT_U16() and BIT_U8() need the help of GENMASK_INPUT_CHECK(),
> as otherwise they would allow an invalid bit to be passed. Hence
> implement them in include/linux/bits.h rather than together with
> the other BIT* variants.

So, we have _Generic() in case you still wish to implement this.

-- 
With Best Regards,
Andy Shevchenko



* Re: [PATCH 2/3] linux/bits.h: Add fixed-width GENMASK and BIT macros
  2023-05-12 11:14   ` Andy Shevchenko
@ 2023-05-12 11:25     ` Jani Nikula
  2023-05-12 11:32       ` Andy Shevchenko
  2023-05-12 16:29     ` Lucas De Marchi
  1 sibling, 1 reply; 30+ messages in thread
From: Jani Nikula @ 2023-05-12 11:25 UTC (permalink / raw)
  To: Andy Shevchenko, Lucas De Marchi
  Cc: Andrew Morton, Christian König, intel-gfx, Kevin Brodsky,
	linux-kernel, dri-devel, intel-xe, Alex Deucher, Thomas Gleixner,
	Masahiro Yamada

On Fri, 12 May 2023, Andy Shevchenko <andriy.shevchenko@linux.intel.com> wrote:
> On Mon, May 08, 2023 at 10:14:02PM -0700, Lucas De Marchi wrote:
>> Add GENMASK_U32(), GENMASK_U16() and GENMASK_U8()  macros to create
>> masks for fixed-width types and also the corresponding BIT_U32(),
>> BIT_U16() and BIT_U8().
>
> Why?

The main reason is that the size of GENMASK() and BIT() varies between
32-bit and 64-bit builds.
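
A sketch of what that means in practice, assuming BIT() is built on an
unsigned long constant (UL(1) << nr); the SKETCH_* names are hypothetical and
this is user-space code, not kernel code.

```c
#include <assert.h>
#include <stdint.h>

/* BIT()-like macro: the width follows unsigned long, so it depends on the ABI. */
#define SKETCH_BIT(nr)		(1UL << (nr))

/* Fixed-width variant in the spirit of BIT_U32(): always a 32-bit expression. */
#define SKETCH_BIT_U32(nr)	(UINT32_C(1) << (nr))

static_assert(sizeof(SKETCH_BIT(3)) == sizeof(unsigned long),
	      "follows the width of unsigned long");
static_assert(sizeof(SKETCH_BIT_U32(3)) == sizeof(uint32_t),
	      "32 bits regardless of the ABI");

/*
 * Returns 1 only where unsigned long is wider than 32 bits: there the
 * inverted BIT()-style value has the upper 32 bits set, the u32 one has not.
 */
static inline int sketch_inverted_differs(void)
{
	return (uint64_t)~SKETCH_BIT(0) != (uint64_t)~SKETCH_BIT_U32(0);
}
```

Masks for 32-bit registers built with the unsigned long form therefore change
type, and their inverted value, depending on the build.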


BR,
Jani.

>
>> All of those depend on a new "U" suffix added to the integer constant.
>> Due to naming clashes it's better to call the macro U32. Since C doesn't
>> have a proper suffix for short and char types, the U16 and U8 variants
>> just use U32 with one additional check in the BIT_* macros to make
>> sure the compiler gives an error when those types overflow.
>> The BIT_U16() and BIT_U8() need the help of GENMASK_INPUT_CHECK(),
>> as otherwise they would allow an invalid bit to be passed. Hence
>> implement them in include/linux/bits.h rather than together with
>> the other BIT* variants.
>
> So, we have _Generic() in case you still wish to implement this.

-- 
Jani Nikula, Intel Open Source Graphics Center

* Re: [PATCH 2/3] linux/bits.h: Add fixed-width GENMASK and BIT macros
  2023-05-12 11:25     ` Jani Nikula
@ 2023-05-12 11:32       ` Andy Shevchenko
  2023-05-12 11:45         ` Jani Nikula
  0 siblings, 1 reply; 30+ messages in thread
From: Andy Shevchenko @ 2023-05-12 11:32 UTC (permalink / raw)
  To: Jani Nikula
  Cc: Andrew Morton, intel-gfx, Kevin Brodsky, Lucas De Marchi,
	linux-kernel, dri-devel, Christian König, Alex Deucher,
	Thomas Gleixner, Masahiro Yamada, intel-xe

On Fri, May 12, 2023 at 02:25:18PM +0300, Jani Nikula wrote:
> On Fri, 12 May 2023, Andy Shevchenko <andriy.shevchenko@linux.intel.com> wrote:
> > On Mon, May 08, 2023 at 10:14:02PM -0700, Lucas De Marchi wrote:
> >> Add GENMASK_U32(), GENMASK_U16() and GENMASK_U8()  macros to create
> >> masks for fixed-width types and also the corresponding BIT_U32(),
> >> BIT_U16() and BIT_U8().
> >
> > Why?
> 
> The main reason is that GENMASK() and BIT() size varies for 32/64 bit
> builds.

When needed GENMASK_ULL() can be used (with respective castings perhaps)
and BIT_ULL(), no?

-- 
With Best Regards,
Andy Shevchenko



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/3] linux/bits.h: Add fixed-width GENMASK and BIT macros
  2023-05-12 11:32       ` Andy Shevchenko
@ 2023-05-12 11:45         ` Jani Nikula
  2023-06-15 15:53           ` Andy Shevchenko
  0 siblings, 1 reply; 30+ messages in thread
From: Jani Nikula @ 2023-05-12 11:45 UTC (permalink / raw)
  To: Andy Shevchenko
  Cc: Andrew Morton, intel-gfx, Kevin Brodsky, Lucas De Marchi,
	linux-kernel, dri-devel, Christian König, Alex Deucher,
	Thomas Gleixner, Masahiro Yamada, intel-xe

On Fri, 12 May 2023, Andy Shevchenko <andriy.shevchenko@linux.intel.com> wrote:
> On Fri, May 12, 2023 at 02:25:18PM +0300, Jani Nikula wrote:
>> On Fri, 12 May 2023, Andy Shevchenko <andriy.shevchenko@linux.intel.com> wrote:
>> > On Mon, May 08, 2023 at 10:14:02PM -0700, Lucas De Marchi wrote:
>> >> Add GENMASK_U32(), GENMASK_U16() and GENMASK_U8()  macros to create
>> >> masks for fixed-width types and also the corresponding BIT_U32(),
>> >> BIT_U16() and BIT_U8().
>> >
>> > Why?
>> 
>> The main reason is that GENMASK() and BIT() size varies for 32/64 bit
>> builds.
>
> When needed GENMASK_ULL() can be used (with respective castings perhaps)
> and BIT_ULL(), no?

How does that help with making them the same 32-bit size on both 32 and
64 bit builds?

BR,
Jani.


-- 
Jani Nikula, Intel Open Source Graphics Center

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/3] linux/bits.h: Add fixed-width GENMASK and BIT macros
  2023-05-12 11:14   ` Andy Shevchenko
  2023-05-12 11:25     ` Jani Nikula
@ 2023-05-12 16:29     ` Lucas De Marchi
  2023-06-15 15:58       ` Andy Shevchenko
  1 sibling, 1 reply; 30+ messages in thread
From: Lucas De Marchi @ 2023-05-12 16:29 UTC (permalink / raw)
  To: Andy Shevchenko
  Cc: Andrew Morton, Christian König, intel-gfx, Kevin Brodsky,
	linux-kernel, dri-devel, intel-xe, Alex Deucher, Thomas Gleixner,
	Masahiro Yamada

On Fri, May 12, 2023 at 02:14:19PM +0300, Andy Shevchenko wrote:
>On Mon, May 08, 2023 at 10:14:02PM -0700, Lucas De Marchi wrote:
>> Add GENMASK_U32(), GENMASK_U16() and GENMASK_U8()  macros to create
>> masks for fixed-width types and also the corresponding BIT_U32(),
>> BIT_U16() and BIT_U8().
>
>Why?

to create the masks/values for device registers that are
of a certain width, preventing mistakes like:

	#define REG1		0x10
	#define REG1_ENABLE	BIT(17)
	#define REG1_FOO	GENMASK(16, 15)

	register_write(REG1_ENABLE, REG1);


... if REG1 is a 16-bit register, for example. There were mistakes in the
past in the i915 source leading to the creation of the REG_* variants on
top of normal GENMASK/BIT (see last patch and commit 09b434d4f6d2
("drm/i915: introduce REG_BIT() and REG_GENMASK() to define register
contents")).
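
As a rough userspace sketch of the kind of check meant here (all SKETCH_*
names are hypothetical stand-ins, not the macros from the patch), a
16-bit variant can refuse an out-of-range bit at build time:

```c
#include <stdint.h>

/* Userspace sketch only; the SKETCH_* names are hypothetical, not kernel API. */

/* Evaluates to 0, or breaks the build when the constant expression e is true. */
#define SKETCH_BUILD_BUG_ON_ZERO(e) ((int)(0 * sizeof(char[1 - 2 * !!(e)])))

/* Bit n of a 16-bit register; n outside 0..15 is rejected at compile time. */
#define SKETCH_BIT_U16(n) \
	((uint16_t)(SKETCH_BUILD_BUG_ON_ZERO((n) < 0 || (n) > 15) + \
		    (UINT32_C(1) << (n))))

#define REG1_ENABLE	SKETCH_BIT_U16(15)	/* accepted: fits in 16 bits */
/* SKETCH_BIT_U16(17) would fail to build: negative array size */

static inline uint16_t reg1_enable_value(void)
{
	return REG1_ENABLE;
}
```

With plain BIT(17) the same definition would build and the bit would
simply be lost when written to the 16-bit register.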

We are preparing another driver (xe), still to be merged but already
open (https://gitlab.freedesktop.org/drm/xe/kernel), that has
similar requirements.


>
>> All of those depend on a new "U" suffix added to the integer constant.
>> Due to naming clashes it's better to call the macro U32. Since C doesn't
>> have a proper suffix for short and char types, the U16 and U8 variants
>> just use U32 with one additional check in the BIT_* macros to make
>> sure the compiler gives an error when those types overflow.
>> The BIT_U16() and BIT_U8() need the help of GENMASK_INPUT_CHECK(),
>> as otherwise they would allow an invalid bit to be passed. Hence
>> implement them in include/linux/bits.h rather than together with
>> the other BIT* variants.
>
>So, we have _Generic() in case you still wish to implement this.

Hmm... how would _Generic() help here? The input is 1 or 2 integer
literals (h and l), so the compiler can check it is correct at build
time.  See example above.

Lucas De Marchi

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/3] linux/bits.h: Add fixed-width GENMASK and BIT macros
  2023-05-12 11:45         ` Jani Nikula
@ 2023-06-15 15:53           ` Andy Shevchenko
  2023-06-20 14:47             ` Jani Nikula
  0 siblings, 1 reply; 30+ messages in thread
From: Andy Shevchenko @ 2023-06-15 15:53 UTC (permalink / raw)
  To: Jani Nikula
  Cc: Andrew Morton, intel-gfx, Kevin Brodsky, Lucas De Marchi,
	linux-kernel, dri-devel, Christian König, Alex Deucher,
	Thomas Gleixner, Masahiro Yamada, intel-xe

On Fri, May 12, 2023 at 02:45:19PM +0300, Jani Nikula wrote:
> On Fri, 12 May 2023, Andy Shevchenko <andriy.shevchenko@linux.intel.com> wrote:
> > On Fri, May 12, 2023 at 02:25:18PM +0300, Jani Nikula wrote:
> >> On Fri, 12 May 2023, Andy Shevchenko <andriy.shevchenko@linux.intel.com> wrote:
> >> > On Mon, May 08, 2023 at 10:14:02PM -0700, Lucas De Marchi wrote:
> >> >> Add GENMASK_U32(), GENMASK_U16() and GENMASK_U8()  macros to create
> >> >> masks for fixed-width types and also the corresponding BIT_U32(),
> >> >> BIT_U16() and BIT_U8().
> >> >
> >> > Why?
> >> 
> >> The main reason is that GENMASK() and BIT() size varies for 32/64 bit
> >> builds.
> >
> > When needed GENMASK_ULL() can be used (with respective castings perhaps)
> > and BIT_ULL(), no?
> 
> How does that help with making them the same 32-bit size on both 32 and
> 64 bit builds?

	u32 x = GENMASK();
	u64 y = GENMASK_ULL();

No? Then use in your code either x or y. Note that I assume that the parameters
to GENMASK*() are built-time constants. Is it the case for you?

-- 
With Best Regards,
Andy Shevchenko



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/3] linux/bits.h: Add fixed-width GENMASK and BIT macros
  2023-05-12 16:29     ` Lucas De Marchi
@ 2023-06-15 15:58       ` Andy Shevchenko
  0 siblings, 0 replies; 30+ messages in thread
From: Andy Shevchenko @ 2023-06-15 15:58 UTC (permalink / raw)
  To: Lucas De Marchi
  Cc: Andrew Morton, Christian König, intel-gfx, Kevin Brodsky,
	linux-kernel, dri-devel, intel-xe, Alex Deucher, Thomas Gleixner,
	Masahiro Yamada

On Fri, May 12, 2023 at 09:29:23AM -0700, Lucas De Marchi wrote:
> On Fri, May 12, 2023 at 02:14:19PM +0300, Andy Shevchenko wrote:
> > On Mon, May 08, 2023 at 10:14:02PM -0700, Lucas De Marchi wrote:
> > > Add GENMASK_U32(), GENMASK_U16() and GENMASK_U8()  macros to create
> > > masks for fixed-width types and also the corresponding BIT_U32(),
> > > BIT_U16() and BIT_U8().
> > 
> > Why?
> 
> to create the masks/values for device registers that are
> of a certain width, preventing mistakes like:
> 
> 	#define REG1		0x10
> 	#define REG1_ENABLE	BIT(17)
> 	#define REG1_FOO	GENMASK(16, 15);
> 
> 	register_write(REG1_ENABLE, REG1);
> 
> 
> ... if REG1 is a 16bit register for example. There were mistakes in the
> past in the i915 source leading to the creation of the REG_* variants on
> top of normal GENMASK/BIT (see last patch and commit 09b434d4f6d2
> ("drm/i915: introduce REG_BIT() and REG_GENMASK() to define register
> contents")

Doesn't it look like something for bitfield.h candidate?
If your definition doesn't fit the given mask, bail out.

-- 
With Best Regards,
Andy Shevchenko



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/3] linux/bits.h: Add fixed-width GENMASK and BIT macros
  2023-06-15 15:53           ` Andy Shevchenko
@ 2023-06-20 14:47             ` Jani Nikula
  2023-06-20 14:55               ` Andy Shevchenko
  0 siblings, 1 reply; 30+ messages in thread
From: Jani Nikula @ 2023-06-20 14:47 UTC (permalink / raw)
  To: Andy Shevchenko
  Cc: Andrew Morton, intel-gfx, Kevin Brodsky, Lucas De Marchi,
	linux-kernel, dri-devel, Christian König, Alex Deucher,
	Thomas Gleixner, Masahiro Yamada, intel-xe

On Thu, 15 Jun 2023, Andy Shevchenko <andriy.shevchenko@linux.intel.com> wrote:
> On Fri, May 12, 2023 at 02:45:19PM +0300, Jani Nikula wrote:
>> On Fri, 12 May 2023, Andy Shevchenko <andriy.shevchenko@linux.intel.com> wrote:
>> > On Fri, May 12, 2023 at 02:25:18PM +0300, Jani Nikula wrote:
>> >> On Fri, 12 May 2023, Andy Shevchenko <andriy.shevchenko@linux.intel.com> wrote:
>> >> > On Mon, May 08, 2023 at 10:14:02PM -0700, Lucas De Marchi wrote:
>> >> >> Add GENMASK_U32(), GENMASK_U16() and GENMASK_U8()  macros to create
>> >> >> masks for fixed-width types and also the corresponding BIT_U32(),
>> >> >> BIT_U16() and BIT_U8().
>> >> >
>> >> > Why?
>> >> 
>> >> The main reason is that GENMASK() and BIT() size varies for 32/64 bit
>> >> builds.
>> >
>> > When needed GENMASK_ULL() can be used (with respective castings perhaps)
>> > and BIT_ULL(), no?
>> 
>> How does that help with making them the same 32-bit size on both 32 and
>> 64 bit builds?
>
> 	u32 x = GENMASK();
> 	u64 y = GENMASK_ULL();
>
> No? Then use in your code either x or y. Note that I assume that the parameters
> to GENMASK*() are built-time constants. Is it the case for you?

What's wrong with wanting to define macros with specific size, depending
on e.g. hardware registers instead of build size?

What would you use for printk format if you wanted to print
GENMASK()?


BR,
Jani.


-- 
Jani Nikula, Intel Open Source Graphics Center

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/3] linux/bits.h: Add fixed-width GENMASK and BIT macros
  2023-06-20 14:47             ` Jani Nikula
@ 2023-06-20 14:55               ` Andy Shevchenko
  2023-06-20 17:25                 ` [Intel-xe] " Lucas De Marchi
  0 siblings, 1 reply; 30+ messages in thread
From: Andy Shevchenko @ 2023-06-20 14:55 UTC (permalink / raw)
  To: Jani Nikula
  Cc: Andrew Morton, intel-gfx, Kevin Brodsky, Lucas De Marchi,
	linux-kernel, dri-devel, Christian König, Alex Deucher,
	Thomas Gleixner, Masahiro Yamada, intel-xe

On Tue, Jun 20, 2023 at 05:47:34PM +0300, Jani Nikula wrote:
> On Thu, 15 Jun 2023, Andy Shevchenko <andriy.shevchenko@linux.intel.com> wrote:
> > On Fri, May 12, 2023 at 02:45:19PM +0300, Jani Nikula wrote:
> >> On Fri, 12 May 2023, Andy Shevchenko <andriy.shevchenko@linux.intel.com> wrote:
> >> > On Fri, May 12, 2023 at 02:25:18PM +0300, Jani Nikula wrote:
> >> >> On Fri, 12 May 2023, Andy Shevchenko <andriy.shevchenko@linux.intel.com> wrote:
> >> >> > On Mon, May 08, 2023 at 10:14:02PM -0700, Lucas De Marchi wrote:
> >> >> >> Add GENMASK_U32(), GENMASK_U16() and GENMASK_U8()  macros to create
> >> >> >> masks for fixed-width types and also the corresponding BIT_U32(),
> >> >> >> BIT_U16() and BIT_U8().
> >> >> >
> >> >> > Why?
> >> >> 
> >> >> The main reason is that GENMASK() and BIT() size varies for 32/64 bit
> >> >> builds.
> >> >
> >> > When needed GENMASK_ULL() can be used (with respective castings perhaps)
> >> > and BIT_ULL(), no?
> >> 
> >> How does that help with making them the same 32-bit size on both 32 and
> >> 64 bit builds?
> >
> > 	u32 x = GENMASK();
> > 	u64 y = GENMASK_ULL();
> >
> > No? Then use in your code either x or y. Note that I assume that the parameters
> > to GENMASK*() are built-time constants. Is it the case for you?
> 
> What's wrong with wanting to define macros with specific size, depending
> on e.g. hardware registers instead of build size?

Nothing, but I think the problem is smaller than it's presented.
And there is already a header for bitfields with a lot of helpers
for similar cases, if not yours.

> What would you use for printk format if you wanted to to print
> GENMASK()?

%lu, no?
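
For illustration only, a userspace sketch of the two cases (function
names are hypothetical; in the kernel the specifiers would be %lx for
unsigned long and %x for u32): the specifier follows the type of the
mask, so a mask that is unsigned long needs the 'l' length modifier
while a fixed 32-bit one does not.

```c
#include <inttypes.h>
#include <stdio.h>

/* Hypothetical helpers; they only show which specifier matches which type. */

/* A GENMASK()-style value is unsigned long, so it needs the 'l' modifier. */
static inline int sketch_format_long_mask(char *buf, size_t len,
					  unsigned long mask)
{
	return snprintf(buf, len, "%#lx", mask);
}

/* A fixed-width 32-bit mask uses the same specifier on every build. */
static inline int sketch_format_u32_mask(char *buf, size_t len, uint32_t mask)
{
	return snprintf(buf, len, "%#" PRIx32, mask);
}
```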

-- 
With Best Regards,
Andy Shevchenko



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Intel-xe] [PATCH 2/3] linux/bits.h: Add fixed-width GENMASK and BIT macros
  2023-06-20 14:55               ` Andy Shevchenko
@ 2023-06-20 17:25                 ` Lucas De Marchi
  2023-06-20 17:41                   ` Andy Shevchenko
  0 siblings, 1 reply; 30+ messages in thread
From: Lucas De Marchi @ 2023-06-20 17:25 UTC (permalink / raw)
  To: Andy Shevchenko
  Cc: intel-gfx, Kevin Brodsky, linux-kernel, dri-devel, intel-xe,
	Thomas Gleixner, Alex Deucher, Andrew Morton, Masahiro Yamada,
	Christian König

On Tue, Jun 20, 2023 at 05:55:19PM +0300, Andy Shevchenko wrote:
>On Tue, Jun 20, 2023 at 05:47:34PM +0300, Jani Nikula wrote:
>> On Thu, 15 Jun 2023, Andy Shevchenko <andriy.shevchenko@linux.intel.com> wrote:
>> > On Fri, May 12, 2023 at 02:45:19PM +0300, Jani Nikula wrote:
>> >> On Fri, 12 May 2023, Andy Shevchenko <andriy.shevchenko@linux.intel.com> wrote:
>> >> > On Fri, May 12, 2023 at 02:25:18PM +0300, Jani Nikula wrote:
>> >> >> On Fri, 12 May 2023, Andy Shevchenko <andriy.shevchenko@linux.intel.com> wrote:
>> >> >> > On Mon, May 08, 2023 at 10:14:02PM -0700, Lucas De Marchi wrote:
>> >> >> >> Add GENMASK_U32(), GENMASK_U16() and GENMASK_U8()  macros to create
>> >> >> >> masks for fixed-width types and also the corresponding BIT_U32(),
>> >> >> >> BIT_U16() and BIT_U8().
>> >> >> >
>> >> >> > Why?
>> >> >>
>> >> >> The main reason is that GENMASK() and BIT() size varies for 32/64 bit
>> >> >> builds.
>> >> >
>> >> > When needed GENMASK_ULL() can be used (with respective castings perhaps)
>> >> > and BIT_ULL(), no?
>> >>
>> >> How does that help with making them the same 32-bit size on both 32 and
>> >> 64 bit builds?
>> >
>> > 	u32 x = GENMASK();
>> > 	u64 y = GENMASK_ULL();
>> >
>> > No? Then use in your code either x or y. Note that I assume that the parameters
>> > to GENMASK*() are built-time constants. Is it the case for you?
>>
>> What's wrong with wanting to define macros with specific size, depending
>> on e.g. hardware registers instead of build size?
>
>Nothing, but I think the problem is smaller than it's presented.

Not sure which big/small problem you are talking about. It's a problem
when the *device* register has a fixed 32b width, which is
independent of the CPU you are running on. We also have registers that
are u16 and u64. Having fixed-width GENMASK and BIT helps avoid
mistakes like the one below. Just to use one example, the diff below builds
fine on my 64b machine, yet it's obviously wrong:

	$ git diff 
	diff --git a/drivers/gpu/drm/i915/gt/intel_gt_mcr.c b/drivers/gpu/drm/i915/gt/intel_gt_mcr.c
	index 0b414eae1683..692a0ad9a768 100644
	--- a/drivers/gpu/drm/i915/gt/intel_gt_mcr.c
	+++ b/drivers/gpu/drm/i915/gt/intel_gt_mcr.c
	@@ -261,8 +261,8 @@ static u32 rw_with_mcr_steering_fw(struct intel_gt *gt,
			 * No need to save old steering reg value.
			 */
			intel_uncore_write_fw(uncore, MTL_MCR_SELECTOR,
	-                                     REG_FIELD_PREP(MTL_MCR_GROUPID, group) |
	-                                     REG_FIELD_PREP(MTL_MCR_INSTANCEID, instance) |
	+                                     FIELD_PREP(MTL_MCR_GROUPID, group) |
	+                                     FIELD_PREP(MTL_MCR_INSTANCEID, instance) |
					      (rw_flag == FW_REG_READ ? GEN11_MCR_MULTICAST : 0));
		} else if (GRAPHICS_VER(uncore->i915) >= 11) {
			mcr_mask = GEN11_MCR_SLICE_MASK | GEN11_MCR_SUBSLICE_MASK;
	diff --git a/drivers/gpu/drm/i915/gt/intel_gt_regs.h b/drivers/gpu/drm/i915/gt/intel_gt_regs.h
	index 718cb2c80f79..c42bc2900c6a 100644
	--- a/drivers/gpu/drm/i915/gt/intel_gt_regs.h
	+++ b/drivers/gpu/drm/i915/gt/intel_gt_regs.h
	@@ -80,8 +80,8 @@
	 #define   GEN11_MCR_SLICE_MASK                 GEN11_MCR_SLICE(0xf)
	 #define   GEN11_MCR_SUBSLICE(subslice)         (((subslice) & 0x7) << 24)
	 #define   GEN11_MCR_SUBSLICE_MASK              GEN11_MCR_SUBSLICE(0x7)
	-#define   MTL_MCR_GROUPID                      REG_GENMASK(11, 8)
	-#define   MTL_MCR_INSTANCEID                   REG_GENMASK(3, 0)
	+#define   MTL_MCR_GROUPID                      GENMASK(32, 8)
	+#define   MTL_MCR_INSTANCEID                   GENMASK(3, 0)
	 
	 #define IPEIR_I965                             _MMIO(0x2064)
	 #define IPEHR_I965                             _MMIO(0x2068)

If the driver didn't support 32b CPUs, this would even go unnoticed.
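
As a rough userspace illustration of what goes wrong (helper names are
hypothetical, and uint64_t stands in for a 64-bit unsigned long), the
mask for bits 32:8 is perfectly valid as a 64-bit value and only loses
its top bit when it reaches the 32-bit register:

```c
#include <stdint.h>

/* Hypothetical helpers, spelled with uint64_t so they behave the same everywhere. */

/* Bits h..l set; valid for 0 <= l <= h <= 63. */
static inline uint64_t sketch_genmask64(unsigned int h, unsigned int l)
{
	return (~UINT64_C(0) >> (63 - h)) & (~UINT64_C(0) << l);
}

/* What a 32-bit register write would actually receive. */
static inline uint32_t sketch_to_reg32(uint64_t val)
{
	return (uint32_t)val;	/* bits 32 and above are silently dropped */
}
```

sketch_genmask64(32, 8) is 0x1ffffff00, and sketch_to_reg32() turns it
into 0xffffff00 without any diagnostic.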

Lucas De Marchi

>And there are already header for bitfields with a lot of helpers
>for (similar) cases if not yours.
>
>> What would you use for printk format if you wanted to to print
>> GENMASK()?
>
>%lu, no?
>
>-- 
>With Best Regards,
>Andy Shevchenko
>
>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Intel-xe] [PATCH 2/3] linux/bits.h: Add fixed-width GENMASK and BIT macros
  2023-06-20 17:25                 ` [Intel-xe] " Lucas De Marchi
@ 2023-06-20 17:41                   ` Andy Shevchenko
  2023-06-20 18:02                     ` Lucas De Marchi
  2023-06-20 18:19                     ` Jani Nikula
  0 siblings, 2 replies; 30+ messages in thread
From: Andy Shevchenko @ 2023-06-20 17:41 UTC (permalink / raw)
  To: Lucas De Marchi
  Cc: intel-gfx, Kevin Brodsky, linux-kernel, dri-devel, intel-xe,
	Thomas Gleixner, Alex Deucher, Andrew Morton, Masahiro Yamada,
	Christian König

On Tue, Jun 20, 2023 at 10:25:21AM -0700, Lucas De Marchi wrote:
> On Tue, Jun 20, 2023 at 05:55:19PM +0300, Andy Shevchenko wrote:
> > On Tue, Jun 20, 2023 at 05:47:34PM +0300, Jani Nikula wrote:
> > > On Thu, 15 Jun 2023, Andy Shevchenko <andriy.shevchenko@linux.intel.com> wrote:
> > > > On Fri, May 12, 2023 at 02:45:19PM +0300, Jani Nikula wrote:
> > > >> On Fri, 12 May 2023, Andy Shevchenko <andriy.shevchenko@linux.intel.com> wrote:
> > > >> > On Fri, May 12, 2023 at 02:25:18PM +0300, Jani Nikula wrote:
> > > >> >> On Fri, 12 May 2023, Andy Shevchenko <andriy.shevchenko@linux.intel.com> wrote:
> > > >> >> > On Mon, May 08, 2023 at 10:14:02PM -0700, Lucas De Marchi wrote:
> > > >> >> >> Add GENMASK_U32(), GENMASK_U16() and GENMASK_U8()  macros to create
> > > >> >> >> masks for fixed-width types and also the corresponding BIT_U32(),
> > > >> >> >> BIT_U16() and BIT_U8().

> > > >> >> > Why?
> > > >> >>
> > > >> >> The main reason is that GENMASK() and BIT() size varies for 32/64 bit
> > > >> >> builds.
> > > >> >
> > > >> > When needed GENMASK_ULL() can be used (with respective castings perhaps)
> > > >> > and BIT_ULL(), no?
> > > >>
> > > >> How does that help with making them the same 32-bit size on both 32 and
> > > >> 64 bit builds?
> > > >
> > > > 	u32 x = GENMASK();
> > > > 	u64 y = GENMASK_ULL();
> > > >
> > > > No? Then use in your code either x or y. Note that I assume that the parameters
> > > > to GENMASK*() are built-time constants. Is it the case for you?
> > > 
> > > What's wrong with wanting to define macros with specific size, depending
> > > on e.g. hardware registers instead of build size?
> > 
> > Nothing, but I think the problem is smaller than it's presented.
> 
> not sure about big/small problem you are talking about. It's a problem
> for when the *device* register is a 32b fixed width, which is
> independent from the CPU you are running on. We also have registers that
> are u16 and u64. Having fixed-width GENMASK and BIT helps avoiding
> mistakes like below. Just to use one example, the diff below builds
> fine on my 64b machine, yet it's obviously wrong:
> 
> 	$ git diff
> 	diff --git a/drivers/gpu/drm/i915/gt/intel_gt_mcr.c b/drivers/gpu/drm/i915/gt/intel_gt_mcr.c
> 	index 0b414eae1683..692a0ad9a768 100644
> 	--- a/drivers/gpu/drm/i915/gt/intel_gt_mcr.c
> 	+++ b/drivers/gpu/drm/i915/gt/intel_gt_mcr.c
> 	@@ -261,8 +261,8 @@ static u32 rw_with_mcr_steering_fw(struct intel_gt *gt,
> 			 * No need to save old steering reg value.
> 			 */
> 			intel_uncore_write_fw(uncore, MTL_MCR_SELECTOR,
> 	-                                     REG_FIELD_PREP(MTL_MCR_GROUPID, group) |
> 	-                                     REG_FIELD_PREP(MTL_MCR_INSTANCEID, instance) |
> 	+                                     FIELD_PREP(MTL_MCR_GROUPID, group) |
> 	+                                     FIELD_PREP(MTL_MCR_INSTANCEID, instance) |
> 					      (rw_flag == FW_REG_READ ? GEN11_MCR_MULTICAST : 0));
> 		} else if (GRAPHICS_VER(uncore->i915) >= 11) {
> 			mcr_mask = GEN11_MCR_SLICE_MASK | GEN11_MCR_SUBSLICE_MASK;
> 	diff --git a/drivers/gpu/drm/i915/gt/intel_gt_regs.h b/drivers/gpu/drm/i915/gt/intel_gt_regs.h
> 	index 718cb2c80f79..c42bc2900c6a 100644
> 	--- a/drivers/gpu/drm/i915/gt/intel_gt_regs.h
> 	+++ b/drivers/gpu/drm/i915/gt/intel_gt_regs.h
> 	@@ -80,8 +80,8 @@
> 	 #define   GEN11_MCR_SLICE_MASK                 GEN11_MCR_SLICE(0xf)
> 	 #define   GEN11_MCR_SUBSLICE(subslice)         (((subslice) & 0x7) << 24)
> 	 #define   GEN11_MCR_SUBSLICE_MASK              GEN11_MCR_SUBSLICE(0x7)
> 	-#define   MTL_MCR_GROUPID                      REG_GENMASK(11, 8)
> 	-#define   MTL_MCR_INSTANCEID                   REG_GENMASK(3, 0)
> 	+#define   MTL_MCR_GROUPID                      GENMASK(32, 8)
> 	+#define   MTL_MCR_INSTANCEID                   GENMASK(3, 0)
> 	 
> 	 #define IPEIR_I965                             _MMIO(0x2064)
> 	 #define IPEHR_I965                             _MMIO(0x2068)
> 
> If the driver didn't support 32b CPUs, this would even go unnoticed.

So, what does prevent you from using GENMASK_ULL()?

Another point, you may teach GENMASK() to issue a warning if hi and/or lo
bigger than BITS_PER_LONG.

I still don't see the usefulness of that churn.

> Lucas De Marchi
> 
> > And there are already header for bitfields with a lot of helpers
> > for (similar) cases if not yours.
> > 
> > > What would you use for printk format if you wanted to to print
> > > GENMASK()?
> > 
> > %lu, no?

-- 
With Best Regards,
Andy Shevchenko



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Intel-xe] [PATCH 2/3] linux/bits.h: Add fixed-width GENMASK and BIT macros
  2023-06-20 17:41                   ` Andy Shevchenko
@ 2023-06-20 18:02                     ` Lucas De Marchi
  2023-06-20 18:19                     ` Jani Nikula
  1 sibling, 0 replies; 30+ messages in thread
From: Lucas De Marchi @ 2023-06-20 18:02 UTC (permalink / raw)
  To: Andy Shevchenko
  Cc: intel-gfx, Kevin Brodsky, linux-kernel, dri-devel, intel-xe,
	Thomas Gleixner, Alex Deucher, Andrew Morton, Masahiro Yamada,
	Christian König

On Tue, Jun 20, 2023 at 08:41:10PM +0300, Andy Shevchenko wrote:
>On Tue, Jun 20, 2023 at 10:25:21AM -0700, Lucas De Marchi wrote:
>> On Tue, Jun 20, 2023 at 05:55:19PM +0300, Andy Shevchenko wrote:
>> > On Tue, Jun 20, 2023 at 05:47:34PM +0300, Jani Nikula wrote:
>> > > On Thu, 15 Jun 2023, Andy Shevchenko <andriy.shevchenko@linux.intel.com> wrote:
>> > > > On Fri, May 12, 2023 at 02:45:19PM +0300, Jani Nikula wrote:
>> > > >> On Fri, 12 May 2023, Andy Shevchenko <andriy.shevchenko@linux.intel.com> wrote:
>> > > >> > On Fri, May 12, 2023 at 02:25:18PM +0300, Jani Nikula wrote:
>> > > >> >> On Fri, 12 May 2023, Andy Shevchenko <andriy.shevchenko@linux.intel.com> wrote:
>> > > >> >> > On Mon, May 08, 2023 at 10:14:02PM -0700, Lucas De Marchi wrote:
>> > > >> >> >> Add GENMASK_U32(), GENMASK_U16() and GENMASK_U8()  macros to create
>> > > >> >> >> masks for fixed-width types and also the corresponding BIT_U32(),
>> > > >> >> >> BIT_U16() and BIT_U8().
>
>> > > >> >> > Why?
>> > > >> >>
>> > > >> >> The main reason is that GENMASK() and BIT() size varies for 32/64 bit
>> > > >> >> builds.
>> > > >> >
>> > > >> > When needed GENMASK_ULL() can be used (with respective castings perhaps)
>> > > >> > and BIT_ULL(), no?
>> > > >>
>> > > >> How does that help with making them the same 32-bit size on both 32 and
>> > > >> 64 bit builds?
>> > > >
>> > > > 	u32 x = GENMASK();
>> > > > 	u64 y = GENMASK_ULL();
>> > > >
>> > > > No? Then use in your code either x or y. Note that I assume that the parameters
>> > > > to GENMASK*() are built-time constants. Is it the case for you?
>> > >
>> > > What's wrong with wanting to define macros with specific size, depending
>> > > on e.g. hardware registers instead of build size?
>> >
>> > Nothing, but I think the problem is smaller than it's presented.
>>
>> not sure about big/small problem you are talking about. It's a problem
>> for when the *device* register is a 32b fixed width, which is
>> independent from the CPU you are running on. We also have registers that
>> are u16 and u64. Having fixed-width GENMASK and BIT helps avoiding
>> mistakes like below. Just to use one example, the diff below builds
>> fine on my 64b machine, yet it's obviously wrong:
>>
>> 	$ git diff
>> 	diff --git a/drivers/gpu/drm/i915/gt/intel_gt_mcr.c b/drivers/gpu/drm/i915/gt/intel_gt_mcr.c
>> 	index 0b414eae1683..692a0ad9a768 100644
>> 	--- a/drivers/gpu/drm/i915/gt/intel_gt_mcr.c
>> 	+++ b/drivers/gpu/drm/i915/gt/intel_gt_mcr.c
>> 	@@ -261,8 +261,8 @@ static u32 rw_with_mcr_steering_fw(struct intel_gt *gt,
>> 			 * No need to save old steering reg value.
>> 			 */
>> 			intel_uncore_write_fw(uncore, MTL_MCR_SELECTOR,
>> 	-                                     REG_FIELD_PREP(MTL_MCR_GROUPID, group) |
>> 	-                                     REG_FIELD_PREP(MTL_MCR_INSTANCEID, instance) |
>> 	+                                     FIELD_PREP(MTL_MCR_GROUPID, group) |
>> 	+                                     FIELD_PREP(MTL_MCR_INSTANCEID, instance) |
>> 					      (rw_flag == FW_REG_READ ? GEN11_MCR_MULTICAST : 0));
>> 		} else if (GRAPHICS_VER(uncore->i915) >= 11) {
>> 			mcr_mask = GEN11_MCR_SLICE_MASK | GEN11_MCR_SUBSLICE_MASK;
>> 	diff --git a/drivers/gpu/drm/i915/gt/intel_gt_regs.h b/drivers/gpu/drm/i915/gt/intel_gt_regs.h
>> 	index 718cb2c80f79..c42bc2900c6a 100644
>> 	--- a/drivers/gpu/drm/i915/gt/intel_gt_regs.h
>> 	+++ b/drivers/gpu/drm/i915/gt/intel_gt_regs.h
>> 	@@ -80,8 +80,8 @@
>> 	 #define   GEN11_MCR_SLICE_MASK                 GEN11_MCR_SLICE(0xf)
>> 	 #define   GEN11_MCR_SUBSLICE(subslice)         (((subslice) & 0x7) << 24)
>> 	 #define   GEN11_MCR_SUBSLICE_MASK              GEN11_MCR_SUBSLICE(0x7)
>> 	-#define   MTL_MCR_GROUPID                      REG_GENMASK(11, 8)
>> 	-#define   MTL_MCR_INSTANCEID                   REG_GENMASK(3, 0)
>> 	+#define   MTL_MCR_GROUPID                      GENMASK(32, 8)
>> 	+#define   MTL_MCR_INSTANCEID                   GENMASK(3, 0)
>> 	 
>> 	 #define IPEIR_I965                             _MMIO(0x2064)
>> 	 #define IPEHR_I965                             _MMIO(0x2068)
>>
>> If the driver didn't support 32b CPUs, this would even go unnoticed.
>
>So, what does prevent you from using GENMASK_ULL()?

Nothing is preventing me from writing the wrong code, which is what we are
trying to solve. GENMASK_ULL() would generate the wrong code as that
particular register is 32b, not 64b, on the GPU.

>
>Another point, you may teach GENMASK() to issue a warning if hi and/or lo
>bigger than BITS_PER_LONG.

Which varies depending on the CPU you are building for, so it misses the
point.  GENMASK_U32/GENMASK_U16/GENMASK_U8 and BIT counterparts would
fail the build if hi does not fit in _exactly_ 32, 16 or 8 bits,
regardless of the CPU you built the code for.
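
A minimal userspace sketch of that idea, assuming a width-parameterized
helper (the SKETCH_* names are made up for illustration and this is not
the implementation from the patch):

```c
#include <stdint.h>

/* Userspace sketch only; the SKETCH_* names are hypothetical, not kernel API. */

/* Evaluates to 0, or breaks the build when the constant expression e is true. */
#define SKETCH_BUILD_BUG_ON_ZERO(e) ((int)(0 * sizeof(char[1 - 2 * !!(e)])))

/*
 * Mask of bits h..l for a register of exactly `width` bits (width <= 32).
 * The bound is the register width, never BITS_PER_LONG.
 */
#define SKETCH_GENMASK_W(width, h, l) \
	(SKETCH_BUILD_BUG_ON_ZERO((l) < 0 || (l) > (h) || (h) >= (width)) + \
	 ((UINT32_MAX >> (31 - (h))) & (UINT32_MAX << (l))))

#define SKETCH_GENMASK_U32(h, l)	((uint32_t)SKETCH_GENMASK_W(32, h, l))
#define SKETCH_GENMASK_U16(h, l)	((uint16_t)SKETCH_GENMASK_W(16, h, l))
#define SKETCH_GENMASK_U8(h, l)		((uint8_t)SKETCH_GENMASK_W(8, h, l))
```

Here SKETCH_GENMASK_U32(32, 8) breaks the build on 32b and 64b hosts
alike, which is the behavior a BITS_PER_LONG-based check cannot give.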

Lucas De Marchi

>
>I still don't see the usefulness of that churn.
>
>> Lucas De Marchi
>>
>> > And there are already header for bitfields with a lot of helpers
>> > for (similar) cases if not yours.
>> >
>> > > What would you use for printk format if you wanted to to print
>> > > GENMASK()?
>> >
>> > %lu, no?
>
>-- 
>With Best Regards,
>Andy Shevchenko
>
>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Intel-xe] [PATCH 2/3] linux/bits.h: Add fixed-width GENMASK and BIT macros
  2023-06-20 17:41                   ` Andy Shevchenko
  2023-06-20 18:02                     ` Lucas De Marchi
@ 2023-06-20 18:19                     ` Jani Nikula
  1 sibling, 0 replies; 30+ messages in thread
From: Jani Nikula @ 2023-06-20 18:19 UTC (permalink / raw)
  To: Andy Shevchenko, Lucas De Marchi
  Cc: intel-gfx, Kevin Brodsky, linux-kernel, dri-devel, intel-xe,
	Thomas Gleixner, Alex Deucher, Andrew Morton, Masahiro Yamada,
	Christian König

On Tue, 20 Jun 2023, Andy Shevchenko <andriy.shevchenko@linux.intel.com> wrote:
> So, what does prevent you from using GENMASK_ULL()?
>
> Another point, you may teach GENMASK() to issue a warning if hi and/or lo
> bigger than BITS_PER_LONG.

What good does that do if you want the warning for a fixed size
different from unsigned long or long long? Worse, sizeof(long) depends
on arch, while the GENMASK you want depends on the use case.

> I still don't see the usefulness of that churn.

This thread is turning into a prime example of why drivers and
subsystems reinvent their own wheels instead of trying to get generally
useful stuff merged in kernel headers. :p


BR,
Jani.


-- 
Jani Nikula, Intel Open Source Graphics Center

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/3] linux/bits.h: Add fixed-width GENMASK and BIT macros
  2023-05-09  5:14 ` [PATCH 2/3] linux/bits.h: Add fixed-width GENMASK and BIT macros Lucas De Marchi
                     ` (2 preceding siblings ...)
  2023-05-12 11:14   ` Andy Shevchenko
@ 2023-06-22  2:20   ` Yury Norov
  2023-06-22  6:15     ` Lucas De Marchi
  2024-01-18 20:42     ` Re: [Intel-xe] " Lucas De Marchi
  3 siblings, 2 replies; 30+ messages in thread
From: Yury Norov @ 2023-06-22  2:20 UTC (permalink / raw)
  To: Lucas De Marchi
  Cc: Andrew Morton, intel-gfx, Kevin Brodsky, linux-kernel, dri-devel,
	Christian König, Masahiro Yamada, Alex Deucher,
	Thomas Gleixner, Andy Shevchenko, intel-xe

Hi Lucas, all!

(Thanks, Andy, for pointing to this thread.)

On Mon, May 08, 2023 at 10:14:02PM -0700, Lucas De Marchi wrote:
> Add GENMASK_U32(), GENMASK_U16() and GENMASK_U8()  macros to create
> masks for fixed-width types and also the corresponding BIT_U32(),
> BIT_U16() and BIT_U8().

Can you split BIT() and GENMASK() material to separate patches?

> All of those depend on a new "U" suffix added to the integer constant.
> Due to naming clashes it's better to call the macro U32. Since C doesn't
> have a proper suffix for short and char types, the U16 and U8 variants
> just use U32 with one additional check in the BIT_* macros to make
> sure the compiler gives an error when those types overflow.

I feel like I don't understand the sentence...

> The BIT_U16() and BIT_U8() need the help of GENMASK_INPUT_CHECK(),
> as otherwise they would allow an invalid bit to be passed. Hence
> implement them in include/linux/bits.h rather than together with
> the other BIT* variants.

I don't think it's a good way to go because BIT() belongs to a more basic
level than GENMASK(). Not to mention possible header dependency issues.
If you need to test against a tighter numeric range, I'd suggest doing
the same trick as GENMASK_INPUT_CHECK() does, but in uapi/linux/const.h
directly. Something like:
        #define _U8(x)		(CONST_GT(U8_MAX, x) + _AC(x, U))
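
To make the idea concrete, here is a userspace sketch under the
assumption that CONST_GT() evaluates to 0 when the constant fits and
breaks the build otherwise (CONST_GT() does not exist yet, and all
SKETCH_* names below are hypothetical):

```c
#include <stdint.h>

/* Userspace sketch only; the SKETCH_* names are hypothetical, not kernel API. */

/* Evaluates to 0, or breaks the build when the constant expression e is true. */
#define SKETCH_BUILD_BUG_ON_ZERO(e) ((int)(0 * sizeof(char[1 - 2 * !!(e)])))

/* Assumed semantics for CONST_GT(): 0 if x <= max, build error otherwise. */
#define SKETCH_CONST_GT(max, x)	SKETCH_BUILD_BUG_ON_ZERO((x) > (max))

/* Paste the U suffix onto the literal, as _AC(x, U) does, after the range check. */
#define SKETCH_U8(x)	(SKETCH_CONST_GT(UINT8_MAX, x) + x##U)
```

SKETCH_U8(255) evaluates to 255U, while SKETCH_U8(256) would not build.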

> The following test file is used to test this:
> 
> 	$ cat mask.c
> 	#include <linux/types.h>
> 	#include <linux/bits.h>
> 
> 	static const u32 a = GENMASK_U32(31, 0);
> 	static const u16 b = GENMASK_U16(15, 0);
> 	static const u8 c = GENMASK_U8(7, 0);
> 	static const u32 x = BIT_U32(31);
> 	static const u16 y = BIT_U16(15);
> 	static const u8 z = BIT_U8(7);
> 
> 	#if FAIL
> 	static const u32 a2 = GENMASK_U32(32, 0);
> 	static const u16 b2 = GENMASK_U16(16, 0);
> 	static const u8 c2 = GENMASK_U8(8, 0);
> 	static const u32 x2 = BIT_U32(32);
> 	static const u16 y2 = BIT_U16(16);
> 	static const u8 z2 = BIT_U8(8);
> 	#endif
> 
> Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
> ---
>  include/linux/bits.h       | 22 ++++++++++++++++++++++
>  include/uapi/linux/const.h |  2 ++
>  include/vdso/const.h       |  1 +
>  3 files changed, 25 insertions(+)
> 
> diff --git a/include/linux/bits.h b/include/linux/bits.h
> index 7c0cf5031abe..ff4786c99b8c 100644
> --- a/include/linux/bits.h
> +++ b/include/linux/bits.h
> @@ -42,4 +42,26 @@
>  #define GENMASK_ULL(h, l) \
>  	(GENMASK_INPUT_CHECK(h, l) + __GENMASK_ULL(h, l))
>  
> +#define __GENMASK_U32(h, l) \
> +	(((~U32(0)) - (U32(1) << (l)) + 1) & \
> +	 (~U32(0) >> (32 - 1 - (h))))
> +#define GENMASK_U32(h, l) \
> +	(GENMASK_INPUT_CHECK(h, l) + __GENMASK_U32(h, l))
> +
> +#define __GENMASK_U16(h, l) \
> +	((U32(0xffff) - (U32(1) << (l)) + 1) & \
> +	 (U32(0xffff) >> (16 - 1 - (h))))
> +#define GENMASK_U16(h, l) \
> +	(GENMASK_INPUT_CHECK(h, l) + __GENMASK_U16(h, l))
> +
> +#define __GENMASK_U8(h, l) \
> +	(((U32(0xff)) - (U32(1) << (l)) + 1) & \
> +	 (U32(0xff) >> (8 - 1 - (h))))
> +#define GENMASK_U8(h, l) \
> +	(GENMASK_INPUT_CHECK(h, l) + __GENMASK_U8(h, l))

[...]

I see nothing wrong with fixed-width versions of GENMASK if it helps
people to write safer code. Can you please mention in the commit message
the exact patch(es) that added a bug related to GENMASK() misuse? It
would be easier to advocate the purpose of new API with that in mind.

Regarding implementation - we should avoid copy-pasting in cases
like this. Below is the patch that I boot-tested for x86_64 and
compile-tested for arm64.

It looks less opencoded, and maybe Andy will be less skeptical about
this approach because of less maintenance burden. Please take it if
you like for v2.

Thanks,
Yury

From 39c5b35075df67e7d88644470ca78a3486367c02 Mon Sep 17 00:00:00 2001
From: Yury Norov <yury.norov@gmail.com>
Date: Wed, 21 Jun 2023 15:27:29 -0700
Subject: [PATCH] bits: introduce fixed-type genmasks

Generalize __GENMASK() to support different types, and implement
fixed-types versions of GENMASK() based on it.

Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
 include/linux/bitops.h |  1 -
 include/linux/bits.h   | 22 ++++++++++++----------
 2 files changed, 12 insertions(+), 11 deletions(-)

diff --git a/include/linux/bitops.h b/include/linux/bitops.h
index 2ba557e067fe..1db50c69cfdb 100644
--- a/include/linux/bitops.h
+++ b/include/linux/bitops.h
@@ -15,7 +15,6 @@
 #  define aligned_byte_mask(n) (~0xffUL << (BITS_PER_LONG - 8 - 8*(n)))
 #endif
 
-#define BITS_PER_TYPE(type)	(sizeof(type) * BITS_PER_BYTE)
 #define BITS_TO_LONGS(nr)	__KERNEL_DIV_ROUND_UP(nr, BITS_PER_TYPE(long))
 #define BITS_TO_U64(nr)		__KERNEL_DIV_ROUND_UP(nr, BITS_PER_TYPE(u64))
 #define BITS_TO_U32(nr)		__KERNEL_DIV_ROUND_UP(nr, BITS_PER_TYPE(u32))
diff --git a/include/linux/bits.h b/include/linux/bits.h
index 7c0cf5031abe..cb94128171b2 100644
--- a/include/linux/bits.h
+++ b/include/linux/bits.h
@@ -6,6 +6,8 @@
 #include <vdso/bits.h>
 #include <asm/bitsperlong.h>
 
+#define BITS_PER_TYPE(type)	(sizeof(type) * BITS_PER_BYTE)
+
 #define BIT_MASK(nr)		(UL(1) << ((nr) % BITS_PER_LONG))
 #define BIT_WORD(nr)		((nr) / BITS_PER_LONG)
 #define BIT_ULL_MASK(nr)	(ULL(1) << ((nr) % BITS_PER_LONG_LONG))
@@ -30,16 +32,16 @@
 #define GENMASK_INPUT_CHECK(h, l) 0
 #endif
 
-#define __GENMASK(h, l) \
-	(((~UL(0)) - (UL(1) << (l)) + 1) & \
-	 (~UL(0) >> (BITS_PER_LONG - 1 - (h))))
-#define GENMASK(h, l) \
-	(GENMASK_INPUT_CHECK(h, l) + __GENMASK(h, l))
+#define __GENMASK(t, h, l) \
+	(GENMASK_INPUT_CHECK(h, l) + \
+	 (((t)~0ULL - ((t)(1) << (l)) + 1) & \
+	 ((t)~0ULL >> (BITS_PER_TYPE(t) - 1 - (h)))))
 
-#define __GENMASK_ULL(h, l) \
-	(((~ULL(0)) - (ULL(1) << (l)) + 1) & \
-	 (~ULL(0) >> (BITS_PER_LONG_LONG - 1 - (h))))
-#define GENMASK_ULL(h, l) \
-	(GENMASK_INPUT_CHECK(h, l) + __GENMASK_ULL(h, l))
+#define GENMASK(h, l)		__GENMASK(unsigned long,  h, l)
+#define GENMASK_ULL(h, l)	__GENMASK(unsigned long long, h, l)
+#define GENMASK_U8(h, l)	__GENMASK(u8,  h, l)
+#define GENMASK_U16(h, l)	__GENMASK(u16, h, l)
+#define GENMASK_U32(h, l)	__GENMASK(u32, h, l)
+#define GENMASK_U64(h, l)	__GENMASK(u64, h, l)
 
 #endif	/* __LINUX_BITS_H */
-- 
2.39.2
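
For readers who want to try the type-generic mask outside the kernel tree,
here is a minimal user-space sketch of the same expression. SKETCH_GENMASK is
a hypothetical name, stdint types stand in for u8/u16/u32/u64, the
GENMASK_INPUT_CHECK() term is left out, and the outer cast back to the type
is an addition of this sketch, not part of the patch above.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Number of bits in a type; mirrors the kernel's BITS_PER_TYPE(). */
#define BITS_PER_TYPE(type)	(sizeof(type) * CHAR_BIT)

/*
 * Hypothetical user-space stand-in for __GENMASK(t, h, l): bits l..h set.
 * Start from all-ones in type t, clear everything below bit l, then clear
 * everything above bit h. The outer (t) cast is an assumption of this
 * sketch so that narrow types do not stay promoted to int.
 */
#define SKETCH_GENMASK(t, h, l) \
	((t)(((t)~0ULL - ((t)1 << (l)) + 1) & \
	     ((t)~0ULL >> (BITS_PER_TYPE(t) - 1 - (h)))))

/* Compile-time spot checks of the arithmetic. */
static_assert(SKETCH_GENMASK(uint8_t, 7, 0) == 0xff, "full u8 mask");
static_assert(SKETCH_GENMASK(uint8_t, 3, 2) == 0x0c, "bits 3..2 of a u8");
static_assert(SKETCH_GENMASK(uint32_t, 31, 31) == 0x80000000u, "top u32 bit");
```

With h larger than the type allows, the shift count in the second term wraps
to a huge value, so a compiler with shift-count warnings enabled should flag
it. Whether that becomes a hard error depends on the build flags.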


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/3] linux/bits.h: Add fixed-width GENMASK and BIT macros
  2023-06-22  2:20   ` Yury Norov
@ 2023-06-22  6:15     ` Lucas De Marchi
  2023-06-22 14:59       ` Yury Norov
  2024-01-18 20:42     ` Re: [Intel-xe] " Lucas De Marchi
  1 sibling, 1 reply; 30+ messages in thread
From: Lucas De Marchi @ 2023-06-22  6:15 UTC (permalink / raw)
  To: Yury Norov
  Cc: Andy Shevchenko, intel-gfx, Kevin Brodsky, linux-kernel,
	dri-devel, intel-xe, Thomas Gleixner, Alex Deucher,
	Andrew Morton, Masahiro Yamada, Christian König

On Wed, Jun 21, 2023 at 07:20:59PM -0700, Yury Norov wrote:
>Hi Lucas, all!
>
>(Thanks, Andy, for pointing to this thread.)
>
>On Mon, May 08, 2023 at 10:14:02PM -0700, Lucas De Marchi wrote:
>> Add GENMASK_U32(), GENMASK_U16() and GENMASK_U8()  macros to create
>> masks for fixed-width types and also the corresponding BIT_U32(),
>> BIT_U16() and BIT_U8().
>
>Can you split BIT() and GENMASK() material to separate patches?
>
>> All of those depend on a new "U" suffix added to the integer constant.
>> Due to naming clashes it's better to call the macro U32. Since C doesn't
>> have a proper suffix for short and char types, the U16 and U8 variants
>> just use U32 with one additional check in the BIT_* macros to make
>> sure the compiler gives an error when those types overflow.
>
>I feel like I don't understand the sentence...

maybe it was a digression about the integer constants

>
>> The BIT_U16() and BIT_U8() need the help of GENMASK_INPUT_CHECK(),
>> as otherwise they would allow an invalid bit to be passed. Hence
>> implement them in include/linux/bits.h rather than together with
>> the other BIT* variants.
>
>I don't think it's a good way to go because BIT() belongs to a more basic
>level than GENMASK(). Not mentioning possible header dependency issues.
>If you need to test against tighter numeric region, I'd suggest to
>do the same trick as  GENMASK_INPUT_CHECK() does, but in uapi/linux/const.h
>directly. Something like:
>        #define _U8(x)		(CONST_GT(U8_MAX, x) + _AC(x, U))
>
>> The following test file is used to test this:
>>
>> 	$ cat mask.c
>> 	#include <linux/types.h>
>> 	#include <linux/bits.h>
>>
>> 	static const u32 a = GENMASK_U32(31, 0);
>> 	static const u16 b = GENMASK_U16(15, 0);
>> 	static const u8 c = GENMASK_U8(7, 0);
>> 	static const u32 x = BIT_U32(31);
>> 	static const u16 y = BIT_U16(15);
>> 	static const u8 z = BIT_U8(7);
>>
>> 	#if FAIL
>> 	static const u32 a2 = GENMASK_U32(32, 0);
>> 	static const u16 b2 = GENMASK_U16(16, 0);
>> 	static const u8 c2 = GENMASK_U8(8, 0);
>> 	static const u32 x2 = BIT_U32(32);
>> 	static const u16 y2 = BIT_U16(16);
>> 	static const u8 z2 = BIT_U8(8);
>> 	#endif
>>
>> Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
>> ---
>>  include/linux/bits.h       | 22 ++++++++++++++++++++++
>>  include/uapi/linux/const.h |  2 ++
>>  include/vdso/const.h       |  1 +
>>  3 files changed, 25 insertions(+)
>>
>> diff --git a/include/linux/bits.h b/include/linux/bits.h
>> index 7c0cf5031abe..ff4786c99b8c 100644
>> --- a/include/linux/bits.h
>> +++ b/include/linux/bits.h
>> @@ -42,4 +42,26 @@
>>  #define GENMASK_ULL(h, l) \
>>  	(GENMASK_INPUT_CHECK(h, l) + __GENMASK_ULL(h, l))
>>
>> +#define __GENMASK_U32(h, l) \
>> +	(((~U32(0)) - (U32(1) << (l)) + 1) & \
>> +	 (~U32(0) >> (32 - 1 - (h))))
>> +#define GENMASK_U32(h, l) \
>> +	(GENMASK_INPUT_CHECK(h, l) + __GENMASK_U32(h, l))
>> +
>> +#define __GENMASK_U16(h, l) \
>> +	((U32(0xffff) - (U32(1) << (l)) + 1) & \
>> +	 (U32(0xffff) >> (16 - 1 - (h))))
>> +#define GENMASK_U16(h, l) \
>> +	(GENMASK_INPUT_CHECK(h, l) + __GENMASK_U16(h, l))
>> +
>> +#define __GENMASK_U8(h, l) \
>> +	(((U32(0xff)) - (U32(1) << (l)) + 1) & \
>> +	 (U32(0xff) >> (8 - 1 - (h))))
>> +#define GENMASK_U8(h, l) \
>> +	(GENMASK_INPUT_CHECK(h, l) + __GENMASK_U8(h, l))
>
>[...]
>
>I see nothing wrong with fixed-width versions of GENMASK if it helps
>people to write safer code. Can you please mention in the commit message
>the exact patch(es) that added a bug related to GENMASK() misuse? It
>would be easier to advocate the purpose of new API with that in mind.
>
>Regarding implementation - we should avoid copy-pasting in cases
>like this. Below is the patch that I boot-tested for x86_64 and
>compile-tested for arm64.
>
>It looks less opencoded, and maybe Andy will be less skeptical about
>this approach because of less maintenance burden. Please take it if
>you like for v2.
>
>Thanks,
>Yury
>
>From 39c5b35075df67e7d88644470ca78a3486367c02 Mon Sep 17 00:00:00 2001
>From: Yury Norov <yury.norov@gmail.com>
>Date: Wed, 21 Jun 2023 15:27:29 -0700
>Subject: [PATCH] bits: introduce fixed-type genmasks
>
>Generalize __GENMASK() to support different types, and implement
>fixed-types versions of GENMASK() based on it.
>
>Signed-off-by: Yury Norov <yury.norov@gmail.com>
>---
> include/linux/bitops.h |  1 -
> include/linux/bits.h   | 22 ++++++++++++----------
> 2 files changed, 12 insertions(+), 11 deletions(-)
>
>diff --git a/include/linux/bitops.h b/include/linux/bitops.h
>index 2ba557e067fe..1db50c69cfdb 100644
>--- a/include/linux/bitops.h
>+++ b/include/linux/bitops.h
>@@ -15,7 +15,6 @@
> #  define aligned_byte_mask(n) (~0xffUL << (BITS_PER_LONG - 8 - 8*(n)))
> #endif
>
>-#define BITS_PER_TYPE(type)	(sizeof(type) * BITS_PER_BYTE)
> #define BITS_TO_LONGS(nr)	__KERNEL_DIV_ROUND_UP(nr, BITS_PER_TYPE(long))
> #define BITS_TO_U64(nr)		__KERNEL_DIV_ROUND_UP(nr, BITS_PER_TYPE(u64))
> #define BITS_TO_U32(nr)		__KERNEL_DIV_ROUND_UP(nr, BITS_PER_TYPE(u32))
>diff --git a/include/linux/bits.h b/include/linux/bits.h
>index 7c0cf5031abe..cb94128171b2 100644
>--- a/include/linux/bits.h
>+++ b/include/linux/bits.h
>@@ -6,6 +6,8 @@
> #include <vdso/bits.h>
> #include <asm/bitsperlong.h>
>
>+#define BITS_PER_TYPE(type)	(sizeof(type) * BITS_PER_BYTE)
>+
> #define BIT_MASK(nr)		(UL(1) << ((nr) % BITS_PER_LONG))
> #define BIT_WORD(nr)		((nr) / BITS_PER_LONG)
> #define BIT_ULL_MASK(nr)	(ULL(1) << ((nr) % BITS_PER_LONG_LONG))
>@@ -30,16 +32,16 @@
> #define GENMASK_INPUT_CHECK(h, l) 0
> #endif
>
>-#define __GENMASK(h, l) \
>-	(((~UL(0)) - (UL(1) << (l)) + 1) & \
>-	 (~UL(0) >> (BITS_PER_LONG - 1 - (h))))
>-#define GENMASK(h, l) \
>-	(GENMASK_INPUT_CHECK(h, l) + __GENMASK(h, l))
>+#define __GENMASK(t, h, l) \
>+	(GENMASK_INPUT_CHECK(h, l) + \
>+	 (((t)~0ULL - ((t)(1) << (l)) + 1) & \
>+	 ((t)~0ULL >> (BITS_PER_TYPE(t) - 1 - (h)))))

yeah... forcing the use of ull and then casting to the type is simpler
and does the job. Checked that it breaks the build if h is greater than
the width of the type, and it works:

../include/linux/bits.h:40:20: error: right shift count >= width of type [-Werror=shift-count-overflow]
    40 |          ((t)~0ULL >> (BITS_PER_TYPE(t) - 1 - (h)))))
       |                    ^~

However this new version does increase the size. Using i915 module
to test:

$ size build64/drivers/gpu/drm/i915/i915.ko*
    text    data     bss     dec     hex filename
4355676  213473    7048 4576197  45d3c5 build64/drivers/gpu/drm/i915/i915.ko
4361052  213505    7048 4581605  45e8e5 build64/drivers/gpu/drm/i915/i915.ko.new

Lucas De Marchi

>
>-#define __GENMASK_ULL(h, l) \
>-	(((~ULL(0)) - (ULL(1) << (l)) + 1) & \
>-	 (~ULL(0) >> (BITS_PER_LONG_LONG - 1 - (h))))
>-#define GENMASK_ULL(h, l) \
>-	(GENMASK_INPUT_CHECK(h, l) + __GENMASK_ULL(h, l))
>+#define GENMASK(h, l)		__GENMASK(unsigned long,  h, l)
>+#define GENMASK_ULL(h, l)	__GENMASK(unsigned long long, h, l)
>+#define GENMASK_U8(h, l)	__GENMASK(u8,  h, l)
>+#define GENMASK_U16(h, l)	__GENMASK(u16, h, l)
>+#define GENMASK_U32(h, l)	__GENMASK(u32, h, l)
>+#define GENMASK_U64(h, l)	__GENMASK(u64, h, l)
>
> #endif	/* __LINUX_BITS_H */
>-- 
>2.39.2
>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/3] linux/bits.h: Add fixed-width GENMASK and BIT macros
  2023-06-22  6:15     ` Lucas De Marchi
@ 2023-06-22 14:59       ` Yury Norov
  0 siblings, 0 replies; 30+ messages in thread
From: Yury Norov @ 2023-06-22 14:59 UTC (permalink / raw)
  To: Lucas De Marchi
  Cc: Andy Shevchenko, intel-gfx, Kevin Brodsky, Rasmus Villemoes,
	linux-kernel, dri-devel, intel-xe, Thomas Gleixner, Alex Deucher,
	Andrew Morton, Masahiro Yamada, Christian König

+ Rasmus Villemoes <linux@rasmusvillemoes.dk>

> > -#define __GENMASK(h, l) \
> > -	(((~UL(0)) - (UL(1) << (l)) + 1) & \
> > -	 (~UL(0) >> (BITS_PER_LONG - 1 - (h))))
> > -#define GENMASK(h, l) \
> > -	(GENMASK_INPUT_CHECK(h, l) + __GENMASK(h, l))
> > +#define __GENMASK(t, h, l) \
> > +	(GENMASK_INPUT_CHECK(h, l) + \
> > +	 (((t)~0ULL - ((t)(1) << (l)) + 1) & \
> > +	 ((t)~0ULL >> (BITS_PER_TYPE(t) - 1 - (h)))))
> 
> yeah... forcing the use of ull and then casting to the type is simpler
> and does the job. Checked that it breaks the build if h is greater than
> the width of the type, and it works:
> 
> ../include/linux/bits.h:40:20: error: right shift count >= width of type [-Werror=shift-count-overflow]
>    40 |          ((t)~0ULL >> (BITS_PER_TYPE(t) - 1 - (h)))))
>       |                    ^~
> 
> However this new version does increase the size. Using i915 module
> to test:
> 
> $ size build64/drivers/gpu/drm/i915/i915.ko*
>    text    data     bss     dec     hex filename
> 4355676  213473    7048 4576197  45d3c5 build64/drivers/gpu/drm/i915/i915.ko
> 4361052  213505    7048 4581605  45e8e5 build64/drivers/gpu/drm/i915/i915.ko.new

It sounds weird because all that should anyways boil down at compile
time...

I enabled DRM_I915 in config and ran bloat-o-meter against today's
master, and I don't see that much difference.

  $ size vmlinux vmlinux.new
     text	   data	    bss	    dec	    hex	filename
  44978613	23962202	3026948	71967763	44a2413	vmlinux
  44978653	23966298	3026948	71971899	44a343b	vmlinux.new
  $ scripts/bloat-o-meter vmlinux vmlinux.new 
  add/remove: 0/0 grow/shrink: 3/2 up/down: 28/-5 (23)
  Function                                     old     new   delta
  kvm_mmu_reset_all_pte_masks                  623     639     +16
  intel_psr_invalidate                        1112    1119      +7
  intel_drrs_activate                          624     629      +5
  intel_psr_flush                             1410    1409      -1
  clk_fractional_divider_general_approximation     207     203      -4
  Total: Before=35398799, After=35398822, chg +0.00%

Can you please check your numbers?

Interestingly, kvm_mmu_reset_all_pte_masks() uses GENMASK_ULL(),
which should generate the same code across versions. Maybe it's just
noise? Rasmus, can you please take a look?

Thanks,
Yury


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Re: [Intel-xe] [PATCH 2/3] linux/bits.h: Add fixed-width GENMASK and BIT macros
  2023-06-22  2:20   ` Yury Norov
  2023-06-22  6:15     ` Lucas De Marchi
@ 2024-01-18 20:42     ` Lucas De Marchi
  2024-01-18 21:48       ` Yury Norov
  1 sibling, 1 reply; 30+ messages in thread
From: Lucas De Marchi @ 2024-01-18 20:42 UTC (permalink / raw)
  To: Yury Norov
  Cc: Andy Shevchenko, intel-gfx, Kevin Brodsky, linux-kernel,
	dri-devel, intel-xe, Thomas Gleixner, Alex Deucher,
	Andrew Morton, Masahiro Yamada, Christian König

Hi,

Reviving this thread as now with xe driver merged we have 2 users for
a fixed-width BIT/GENMASK.

On Wed, Jun 21, 2023 at 07:20:59PM -0700, Yury Norov wrote:
>Hi Lucas, all!
>
>(Thanks, Andy, for pointing to this thread.)
>
>On Mon, May 08, 2023 at 10:14:02PM -0700, Lucas De Marchi wrote:
>> Add GENMASK_U32(), GENMASK_U16() and GENMASK_U8()  macros to create
>> masks for fixed-width types and also the corresponding BIT_U32(),
>> BIT_U16() and BIT_U8().
>
>Can you split BIT() and GENMASK() material to separate patches?
>
>> All of those depend on a new "U" suffix added to the integer constant.
>> Due to naming clashes it's better to call the macro U32. Since C doesn't
>> have a proper suffix for short and char types, the U16 and U8 variants
>> just use U32 with one additional check in the BIT_* macros to make
>> sure the compiler gives an error when those types overflow.
>
>I feel like I don't understand the sentence...
>
>> The BIT_U16() and BIT_U8() need the help of GENMASK_INPUT_CHECK(),
>> as otherwise they would allow an invalid bit to be passed. Hence
>> implement them in include/linux/bits.h rather than together with
>> the other BIT* variants.
>
>I don't think it's a good way to go because BIT() belongs to a more basic
>level than GENMASK(). Not mentioning possible header dependency issues.
>If you need to test against tighter numeric region, I'd suggest to
>do the same trick as  GENMASK_INPUT_CHECK() does, but in uapi/linux/const.h
>directly. Something like:
>        #define _U8(x)		(CONST_GT(U8_MAX, x) + _AC(x, U))

but then make uapi/linux/const.h include linux/build_bug.h?
I was thinking about leaving BIT() define where it is, and add the
fixed-width versions in this header. I was thinking uapi/linux/const.h
was more about allowing the U/ULL suffixes for things shared with asm.

Lucas De Marchi

>
>> The following test file is used to test this:
>>
>> 	$ cat mask.c
>> 	#include <linux/types.h>
>> 	#include <linux/bits.h>
>>
>> 	static const u32 a = GENMASK_U32(31, 0);
>> 	static const u16 b = GENMASK_U16(15, 0);
>> 	static const u8 c = GENMASK_U8(7, 0);
>> 	static const u32 x = BIT_U32(31);
>> 	static const u16 y = BIT_U16(15);
>> 	static const u8 z = BIT_U8(7);
>>
>> 	#if FAIL
>> 	static const u32 a2 = GENMASK_U32(32, 0);
>> 	static const u16 b2 = GENMASK_U16(16, 0);
>> 	static const u8 c2 = GENMASK_U8(8, 0);
>> 	static const u32 x2 = BIT_U32(32);
>> 	static const u16 y2 = BIT_U16(16);
>> 	static const u8 z2 = BIT_U8(8);
>> 	#endif
>>
>> Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
>> ---
>>  include/linux/bits.h       | 22 ++++++++++++++++++++++
>>  include/uapi/linux/const.h |  2 ++
>>  include/vdso/const.h       |  1 +
>>  3 files changed, 25 insertions(+)
>>
>> diff --git a/include/linux/bits.h b/include/linux/bits.h
>> index 7c0cf5031abe..ff4786c99b8c 100644
>> --- a/include/linux/bits.h
>> +++ b/include/linux/bits.h
>> @@ -42,4 +42,26 @@
>>  #define GENMASK_ULL(h, l) \
>>  	(GENMASK_INPUT_CHECK(h, l) + __GENMASK_ULL(h, l))
>>
>> +#define __GENMASK_U32(h, l) \
>> +	(((~U32(0)) - (U32(1) << (l)) + 1) & \
>> +	 (~U32(0) >> (32 - 1 - (h))))
>> +#define GENMASK_U32(h, l) \
>> +	(GENMASK_INPUT_CHECK(h, l) + __GENMASK_U32(h, l))
>> +
>> +#define __GENMASK_U16(h, l) \
>> +	((U32(0xffff) - (U32(1) << (l)) + 1) & \
>> +	 (U32(0xffff) >> (16 - 1 - (h))))
>> +#define GENMASK_U16(h, l) \
>> +	(GENMASK_INPUT_CHECK(h, l) + __GENMASK_U16(h, l))
>> +
>> +#define __GENMASK_U8(h, l) \
>> +	(((U32(0xff)) - (U32(1) << (l)) + 1) & \
>> +	 (U32(0xff) >> (8 - 1 - (h))))
>> +#define GENMASK_U8(h, l) \
>> +	(GENMASK_INPUT_CHECK(h, l) + __GENMASK_U8(h, l))
>
>[...]
>
>I see nothing wrong with fixed-width versions of GENMASK if it helps
>people to write safer code. Can you please mention in the commit message
>the exact patch(es) that added a bug related to GENMASK() misuse? It
>would be easier to advocate the purpose of new API with that in mind.
>
>Regarding implementation - we should avoid copy-pasting in cases
>like this. Below is the patch that I boot-tested for x86_64 and
>compile-tested for arm64.
>
>It looks less opencoded, and maybe Andy will be less skeptical about
>this approach because of less maintenance burden. Please take it if
>you like for v2.
>
>Thanks,
>Yury
>
>From 39c5b35075df67e7d88644470ca78a3486367c02 Mon Sep 17 00:00:00 2001
>From: Yury Norov <yury.norov@gmail.com>
>Date: Wed, 21 Jun 2023 15:27:29 -0700
>Subject: [PATCH] bits: introduce fixed-type genmasks
>
>Generalize __GENMASK() to support different types, and implement
>fixed-types versions of GENMASK() based on it.
>
>Signed-off-by: Yury Norov <yury.norov@gmail.com>
>---
> include/linux/bitops.h |  1 -
> include/linux/bits.h   | 22 ++++++++++++----------
> 2 files changed, 12 insertions(+), 11 deletions(-)
>
>diff --git a/include/linux/bitops.h b/include/linux/bitops.h
>index 2ba557e067fe..1db50c69cfdb 100644
>--- a/include/linux/bitops.h
>+++ b/include/linux/bitops.h
>@@ -15,7 +15,6 @@
> #  define aligned_byte_mask(n) (~0xffUL << (BITS_PER_LONG - 8 - 8*(n)))
> #endif
>
>-#define BITS_PER_TYPE(type)	(sizeof(type) * BITS_PER_BYTE)
> #define BITS_TO_LONGS(nr)	__KERNEL_DIV_ROUND_UP(nr, BITS_PER_TYPE(long))
> #define BITS_TO_U64(nr)		__KERNEL_DIV_ROUND_UP(nr, BITS_PER_TYPE(u64))
> #define BITS_TO_U32(nr)		__KERNEL_DIV_ROUND_UP(nr, BITS_PER_TYPE(u32))
>diff --git a/include/linux/bits.h b/include/linux/bits.h
>index 7c0cf5031abe..cb94128171b2 100644
>--- a/include/linux/bits.h
>+++ b/include/linux/bits.h
>@@ -6,6 +6,8 @@
> #include <vdso/bits.h>
> #include <asm/bitsperlong.h>
>
>+#define BITS_PER_TYPE(type)	(sizeof(type) * BITS_PER_BYTE)
>+
> #define BIT_MASK(nr)		(UL(1) << ((nr) % BITS_PER_LONG))
> #define BIT_WORD(nr)		((nr) / BITS_PER_LONG)
> #define BIT_ULL_MASK(nr)	(ULL(1) << ((nr) % BITS_PER_LONG_LONG))
>@@ -30,16 +32,16 @@
> #define GENMASK_INPUT_CHECK(h, l) 0
> #endif
>
>-#define __GENMASK(h, l) \
>-	(((~UL(0)) - (UL(1) << (l)) + 1) & \
>-	 (~UL(0) >> (BITS_PER_LONG - 1 - (h))))
>-#define GENMASK(h, l) \
>-	(GENMASK_INPUT_CHECK(h, l) + __GENMASK(h, l))
>+#define __GENMASK(t, h, l) \
>+	(GENMASK_INPUT_CHECK(h, l) + \
>+	 (((t)~0ULL - ((t)(1) << (l)) + 1) & \
>+	 ((t)~0ULL >> (BITS_PER_TYPE(t) - 1 - (h)))))
>
>-#define __GENMASK_ULL(h, l) \
>-	(((~ULL(0)) - (ULL(1) << (l)) + 1) & \
>-	 (~ULL(0) >> (BITS_PER_LONG_LONG - 1 - (h))))
>-#define GENMASK_ULL(h, l) \
>-	(GENMASK_INPUT_CHECK(h, l) + __GENMASK_ULL(h, l))
>+#define GENMASK(h, l)		__GENMASK(unsigned long,  h, l)
>+#define GENMASK_ULL(h, l)	__GENMASK(unsigned long long, h, l)
>+#define GENMASK_U8(h, l)	__GENMASK(u8,  h, l)
>+#define GENMASK_U16(h, l)	__GENMASK(u16, h, l)
>+#define GENMASK_U32(h, l)	__GENMASK(u32, h, l)
>+#define GENMASK_U64(h, l)	__GENMASK(u64, h, l)
>
> #endif	/* __LINUX_BITS_H */
>-- 
>2.39.2
>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Re: [Intel-xe] [PATCH 2/3] linux/bits.h: Add fixed-width GENMASK and BIT macros
  2024-01-18 20:42     ` Re: [Intel-xe] " Lucas De Marchi
@ 2024-01-18 21:48       ` Yury Norov
  2024-01-18 23:25         ` Lucas De Marchi
  0 siblings, 1 reply; 30+ messages in thread
From: Yury Norov @ 2024-01-18 21:48 UTC (permalink / raw)
  To: Lucas De Marchi
  Cc: Andy Shevchenko, intel-gfx, Kevin Brodsky, linux-kernel,
	dri-devel, intel-xe, Thomas Gleixner, Alex Deucher,
	Andrew Morton, Masahiro Yamada, Christian König

On Thu, Jan 18, 2024 at 02:42:12PM -0600, Lucas De Marchi wrote:
> Hi,
> 
> Reviving this thread as now with xe driver merged we have 2 users for
> a fixed-width BIT/GENMASK.

Can you point where and why?
 
> On Wed, Jun 21, 2023 at 07:20:59PM -0700, Yury Norov wrote:
> > Hi Lucas, all!
> > 
> > (Thanks, Andy, for pointing to this thread.)
> > 
> > On Mon, May 08, 2023 at 10:14:02PM -0700, Lucas De Marchi wrote:
> > > Add GENMASK_U32(), GENMASK_U16() and GENMASK_U8()  macros to create
> > > masks for fixed-width types and also the corresponding BIT_U32(),
> > > BIT_U16() and BIT_U8().
> > 
> > Can you split BIT() and GENMASK() material to separate patches?
> > 
> > > All of those depend on a new "U" suffix added to the integer constant.
> > > Due to naming clashes it's better to call the macro U32. Since C doesn't
> > > have a proper suffix for short and char types, the U16 and U8 variants
> > > just use U32 with one additional check in the BIT_* macros to make
> > > sure the compiler gives an error when those types overflow.
> > 
> > I feel like I don't understand the sentence...
> > 
> > > The BIT_U16() and BIT_U8() need the help of GENMASK_INPUT_CHECK(),
> > > as otherwise they would allow an invalid bit to be passed. Hence
> > > implement them in include/linux/bits.h rather than together with
> > > the other BIT* variants.
> > 
> > I don't think it's a good way to go because BIT() belongs to a more basic
> > level than GENMASK(). Not mentioning possible header dependency issues.
> > If you need to test against tighter numeric region, I'd suggest to
> > do the same trick as  GENMASK_INPUT_CHECK() does, but in uapi/linux/const.h
> > directly. Something like:
> >        #define _U8(x)		(CONST_GT(U8_MAX, x) + _AC(x, U))
> 
> but then make uapi/linux/const.h include linux/build_bug.h?
> I was thinking about leaving BIT() define where it is, and add the
> fixed-width versions in this header. I was thinking uapi/linux/const.h
> was more about allowing the U/ULL suffixes for things shared with asm.

You can't include kernel headers in uapi code. But you can try doing
vice-versa: implement or move the pieces you need to share to the
uapi/linux/const.h, and use them in the kernel code.

In the worst case, you can just implement the macro you need in the
uapi header, and make it work that way.

Can you confirm that my proposal increases the kernel size? If so, is
there any way to fix it? If it doesn't, I'd prefer to use the
__GENMASK() approach.

Thanks,
Yury

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Re: Re: [Intel-xe] [PATCH 2/3] linux/bits.h: Add fixed-width GENMASK and BIT macros
  2024-01-18 21:48       ` Yury Norov
@ 2024-01-18 23:25         ` Lucas De Marchi
  2024-01-19  2:01           ` Yury Norov
  0 siblings, 1 reply; 30+ messages in thread
From: Lucas De Marchi @ 2024-01-18 23:25 UTC (permalink / raw)
  To: Yury Norov
  Cc: Andy Shevchenko, intel-gfx, Kevin Brodsky, linux-kernel,
	dri-devel, intel-xe, Thomas Gleixner, Alex Deucher,
	Andrew Morton, Masahiro Yamada, Christian König

On Thu, Jan 18, 2024 at 01:48:43PM -0800, Yury Norov wrote:
>On Thu, Jan 18, 2024 at 02:42:12PM -0600, Lucas De Marchi wrote:
>> Hi,
>>
>> Reviving this thread as now with xe driver merged we have 2 users for
>> a fixed-width BIT/GENMASK.
>
>Can you point where and why?

See users of REG_GENMASK and REG_BIT in drivers/gpu/drm/i915 and
drivers/gpu/drm/xe. I think the register definitions in xe show it
well:

	drivers/gpu/drm/xe/regs/xe_gt_regs.h

The GPU registers are mostly 32 bits wide. We don't want to accidentally do
something like the below (s/30/33/ added for illustration purposes):

#define LSC_CHICKEN_BIT_0                       XE_REG_MCR(0xe7c8)
#define   DISABLE_D8_D16_COASLESCE              REG_BIT(33)

Same thing for GENMASK family of macros and for registers that are 16 or
8 bits. See e.g. drivers/gpu/drm/i915/display/intel_cx0_phy_regs.h
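
To make the failure mode concrete, here is a small user-space sketch of what
happens when an out-of-range bit from a 64-bit-wide, unchecked bit macro ends
up in a 32-bit register value. SKETCH_BIT64 and sketch_reg_value() are
hypothetical names used only for illustration; they are not the i915/xe
REG_BIT() implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for an unchecked, 64-bit-wide bit macro. */
#define SKETCH_BIT64(nr)	(UINT64_C(1) << (nr))

/*
 * Hypothetical helper modelling a write to a 32-bit register: the
 * conversion keeps only the low 32 bits, and nothing warns about it.
 */
static inline uint32_t sketch_reg_value(uint64_t mask)
{
	return (uint32_t)mask;
}

/* Bit 33 does not exist in a 32-bit register, so the value becomes 0. */
static_assert((uint32_t)SKETCH_BIT64(33) == 0, "bit 33 is silently lost");
/* Bit 30, the intended one in the example above, survives. */
static_assert((uint32_t)SKETCH_BIT64(30) == 0x40000000u, "bit 30 is kept");
```

The typo compiles cleanly and simply programs nothing, which is why a
width-aware macro that rejects the index at build time is attractive here.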


>
>> On Wed, Jun 21, 2023 at 07:20:59PM -0700, Yury Norov wrote:
>> > Hi Lucas, all!
>> >
>> > (Thanks, Andy, for pointing to this thread.)
>> >
>> > On Mon, May 08, 2023 at 10:14:02PM -0700, Lucas De Marchi wrote:
>> > > Add GENMASK_U32(), GENMASK_U16() and GENMASK_U8()  macros to create
>> > > masks for fixed-width types and also the corresponding BIT_U32(),
>> > > BIT_U16() and BIT_U8().
>> >
>> > Can you split BIT() and GENMASK() material to separate patches?
>> >
>> > > All of those depend on a new "U" suffix added to the integer constant.
>> > > Due to naming clashes it's better to call the macro U32. Since C doesn't
>> > > have a proper suffix for short and char types, the U16 and U8 variants
>> > > just use U32 with one additional check in the BIT_* macros to make
>> > > sure the compiler gives an error when those types overflow.
>> >
>> > I feel like I don't understand the sentence...
>> >
>> > > The BIT_U16() and BIT_U8() need the help of GENMASK_INPUT_CHECK(),
>> > > as otherwise they would allow an invalid bit to be passed. Hence
>> > > implement them in include/linux/bits.h rather than together with
>> > > the other BIT* variants.
>> >
>> > I don't think it's a good way to go because BIT() belongs to a more basic
>> > level than GENMASK(). Not mentioning possible header dependency issues.
>> > If you need to test against tighter numeric region, I'd suggest to
>> > do the same trick as  GENMASK_INPUT_CHECK() does, but in uapi/linux/const.h
>> > directly. Something like:
>> >        #define _U8(x)		(CONST_GT(U8_MAX, x) + _AC(x, U))
>>
>> but then make uapi/linux/const.h include linux/build_bug.h?
>> I was thinking about leaving BIT() define where it is, and add the
>> fixed-width versions in this header. I was thinking uapi/linux/const.h
>> was more about allowing the U/ULL suffixes for things shared with asm.
>
>You can't include kernel headers in uapi code. But you can try doing
>vice-versa: implement or move the pieces you need to share to the
>uapi/linux/const.h, and use them in the kernel code.

but in this case CONST_GE() should trigger a BUG/static_assert
on U8_MAX < x. AFAICS that check can't be on the uapi/ side,
so there's nothing much left to change in uapi/linux/const.h.

I'd expect drivers to be the primary user of these fixed-width BIT
variants, hence the proposal to do it in include/linux/bits.h.
Something like this WIP/untested diff (on top of your previous patch):


diff --git a/include/linux/bits.h b/include/linux/bits.h
index cb94128171b2..409cd10f7597 100644
--- a/include/linux/bits.h
+++ b/include/linux/bits.h
@@ -24,12 +24,16 @@
  #define GENMASK_INPUT_CHECK(h, l) \
  	(BUILD_BUG_ON_ZERO(__builtin_choose_expr( \
  		__is_constexpr((l) > (h)), (l) > (h), 0)))
+#define BIT_INPUT_CHECK(type, b) \
+	((BUILD_BUG_ON_ZERO(__builtin_choose_expr( \
+		__is_constexpr(b), (b) >= BITS_PER_TYPE(type), 0))))
  #else
  /*
   * BUILD_BUG_ON_ZERO is not available in h files included from asm files,
   * disable the input check if that is the case.
   */
  #define GENMASK_INPUT_CHECK(h, l) 0
+#define BIT_INPUT_CHECK(type, b) 0
  #endif
  
  #define __GENMASK(t, h, l) \
@@ -44,4 +48,9 @@
  #define GENMASK_U32(h, l)	__GENMASK(u32, h, l)
  #define GENMASK_U64(h, l)	__GENMASK(u64, h, l)
  
+#define BIT_U8(b)		(u8)(BIT_INPUT_CHECK(u8, b) + BIT(b))
+#define BIT_U16(b)		(u16)(BIT_INPUT_CHECK(u16, b) + BIT(b))
+#define BIT_U32(b)		(u32)(BIT_INPUT_CHECK(u32, b) + BIT(b))
+#define BIT_U64(b)		(u64)(BIT_INPUT_CHECK(u64, b) + BIT(b))
+
  #endif	/* __LINUX_BITS_H */
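
The idea in the diff above can be tried in user space with a rough C11
sketch. Everything prefixed SKETCH_ is a hypothetical name. The build-time
check is written with static_assert inside a sizeof'd struct instead of the
kernel's BUILD_BUG_ON_ZERO()/__is_constexpr() pair, so it only accepts
constant bit indexes. 1ULL << (b) replaces BIT(b), and the extra outer
parentheses are an addition of this sketch.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Number of bits in a type; mirrors the kernel's BITS_PER_TYPE(). */
#define BITS_PER_TYPE(type)	(sizeof(type) * CHAR_BIT)

/*
 * Evaluates to 0 when cond is false and fails the build when cond is
 * true. cond must be an integer constant expression in this sketch.
 */
#define SKETCH_BUILD_BUG_ON_ZERO(cond) \
	(0 * sizeof(struct { \
		static_assert(!(cond), "bit index out of range"); \
		int unused; \
	}))

/* Reject bit indexes that do not fit in the given type. */
#define SKETCH_BIT_INPUT_CHECK(type, b) \
	SKETCH_BUILD_BUG_ON_ZERO((b) >= BITS_PER_TYPE(type))

#define SKETCH_BIT_U8(b) \
	((uint8_t)(SKETCH_BIT_INPUT_CHECK(uint8_t, b) + (1ULL << (b))))
#define SKETCH_BIT_U16(b) \
	((uint16_t)(SKETCH_BIT_INPUT_CHECK(uint16_t, b) + (1ULL << (b))))
#define SKETCH_BIT_U32(b) \
	((uint32_t)(SKETCH_BIT_INPUT_CHECK(uint32_t, b) + (1ULL << (b))))

/* In range, so this compiles; SKETCH_BIT_U8(8) would fail the build. */
static const uint8_t sketch_top_bit_u8 = SKETCH_BIT_U8(7);
```

Because the check contributes a constant zero, the macros remain usable in
static initializers, as the last line shows.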

>
>In the worst case, you can just implement the macro you need in the
>uapi header, and make it working that way.
>
>Can you confirm that my proposal increases the kernel size? If so, is
>there any way to fix it? If it doesn't, I'd prefer to use the
>__GENMASK() approach.

I agree on continuing with your approach. The bloat-o-meter indeed
showed almost no difference. `size ....i915.o`  on the other hand
increased, but then decreased when I replaced our current REG_GENMASK()
implementation to reuse the new GENMASK_U*()

	$ # test-genmask.00: before any change
	$ # test-genmask.01: after your patch to GENMASK
	$ # test-genmask.02: after converting drivers/gpu/drm/i915/i915_reg_defs.h
	    to use the new macros
	$ size build64/drivers/gpu/drm/i915/i915.o-test-genmask.*
	   text    data     bss     dec     hex filename
	4506628  215083    7168 4728879  48282f build64/drivers/gpu/drm/i915/i915.o-test-genmask.00
	4511084  215083    7168 4733335  483997 build64/drivers/gpu/drm/i915/i915.o-test-genmask.01
	4493292  215083    7168 4715543  47f417 build64/drivers/gpu/drm/i915/i915.o-test-genmask.02

	$ ./scripts/bloat-o-meter  build64/drivers/gpu/drm/i915/i915.o-test-genmask.0[01]
	add/remove: 0/0 grow/shrink: 2/1 up/down: 4/-5 (-1)
	Function                                     old     new   delta
	intel_drrs_activate                          399     402      +3
	intel_psr_invalidate                         546     547      +1
	intel_psr_flush                              880     875      -5
	Total: Before=2980530, After=2980529, chg -0.00%

	$ ./scripts/bloat-o-meter  build64/drivers/gpu/drm/i915/i915.o-test-genmask.0[12]
	add/remove: 0/0 grow/shrink: 0/0 up/down: 0/0 (0)
	Function                                     old     new   delta
	Total: Before=2980529, After=2980529, chg +0.00%

thanks
Lucas De Marchi

>
>Thanks,
>Yury


* Re: Re: Re: [Intel-xe] [PATCH 2/3] linux/bits.h: Add fixed-width GENMASK and BIT macros
  2024-01-18 23:25         ` Lucas De Marchi
@ 2024-01-19  2:01           ` Yury Norov
  2024-01-19 15:07             ` Lucas De Marchi
  0 siblings, 1 reply; 30+ messages in thread
From: Yury Norov @ 2024-01-19  2:01 UTC (permalink / raw)
  To: Lucas De Marchi
  Cc: Andy Shevchenko, intel-gfx, Kevin Brodsky, linux-kernel,
	dri-devel, intel-xe, Thomas Gleixner, Alex Deucher,
	Andrew Morton, Masahiro Yamada, Christian König

On Thu, Jan 18, 2024 at 05:25:00PM -0600, Lucas De Marchi wrote:
> 
> On Thu, Jan 18, 2024 at 01:48:43PM -0800, Yury Norov wrote:
> > On Thu, Jan 18, 2024 at 02:42:12PM -0600, Lucas De Marchi wrote:
> > > Hi,
> > > 
> > > Reviving this thread as now with xe driver merged we have 2 users for
> > > a fixed-width BIT/GENMASK.
> > 
> > Can you point where and why?
> 
> See users of REG_GENMASK and REG_BIT in drivers/gpu/drm/i915 and
> drivers/gpu/drm/xe. I think the register definitions in xe show it
> well:
> 
> 	drivers/gpu/drm/xe/regs/xe_gt_regs.h
> 
> The GPU registers are mostly 32 bits wide. We don't want to accidentally do
> something like the following (s/30/33/ added for illustration purposes):
> 
> #define LSC_CHICKEN_BIT_0                       XE_REG_MCR(0xe7c8)
> #define   DISABLE_D8_D16_COASLESCE              REG_BIT(33)
> 
> Same thing for GENMASK family of macros and for registers that are 16 or
> 8 bits. See e.g. drivers/gpu/drm/i915/display/intel_cx0_phy_regs.h
> 
> 
> > 
> > > On Wed, Jun 21, 2023 at 07:20:59PM -0700, Yury Norov wrote:
> > > > Hi Lucas, all!
> > > >
> > > > (Thanks, Andy, for pointing to this thread.)
> > > >
> > > > On Mon, May 08, 2023 at 10:14:02PM -0700, Lucas De Marchi wrote:
> > > > > Add GENMASK_U32(), GENMASK_U16() and GENMASK_U8()  macros to create
> > > > > masks for fixed-width types and also the corresponding BIT_U32(),
> > > > > BIT_U16() and BIT_U8().
> > > >
> > > > Can you split BIT() and GENMASK() material to separate patches?
> > > >
> > > > > All of those depend on a new "U" suffix added to the integer constant.
> > > > > Due to naming clashes it's better to call the macro U32. Since C doesn't
> > > > > > have a proper suffix for short and char types, the U16 and U8 variants
> > > > > > just use U32 with one additional check in the BIT_* macros to make
> > > > > > sure the compiler gives an error when those types overflow.
> > > >
> > > > I feel like I don't understand the sentence...
> > > >
> > > > > The BIT_U16() and BIT_U8() need the help of GENMASK_INPUT_CHECK(),
> > > > > as otherwise they would allow an invalid bit to be passed. Hence
> > > > > implement them in include/linux/bits.h rather than together with
> > > > > the other BIT* variants.
> > > >
> > > > I don't think it's a good way to go because BIT() belongs to a more basic
> > > > level than GENMASK(). Not mentioning possible header dependency issues.
> > > > If you need to test against tighter numeric region, I'd suggest to
> > > > do the same trick as  GENMASK_INPUT_CHECK() does, but in uapi/linux/const.h
> > > > directly. Something like:
> > > >        #define _U8(x)		(CONST_GT(U8_MAX, x) + _AC(x, U))
> > > 
> > > but then make uapi/linux/const.h include linux/build_bug.h?
> > > I was thinking about leaving BIT() define where it is, and add the
> > > fixed-width versions in this header. I was thinking uapi/linux/const.h
> > > was more about allowing the U/ULL suffixes for things shared with asm.
> > 
> > You can't include kernel headers in uapi code. But you can try doing
> > vice-versa: implement or move the pieces you need to share to the
> > uapi/linux/const.h, and use them in the kernel code.
> 
> but in this case CONST_GE() should trigger a BUG/static_assert
> on U8_MAX < x. AFAICS that check can't be on the uapi/ side,
> so there's nothing much left to change in uapi/linux/const.h.
> 
> I'd expect drivers to be the primary user of these fixed-width BIT
> variants, hence the proposal to do it in include/linux/bits.h.
> Something like this WIP/untested diff (on top of your previous patch):
> 
> 
> diff --git a/include/linux/bits.h b/include/linux/bits.h
> index cb94128171b2..409cd10f7597 100644
> --- a/include/linux/bits.h
> +++ b/include/linux/bits.h
> @@ -24,12 +24,16 @@
>  #define GENMASK_INPUT_CHECK(h, l) \
>  	(BUILD_BUG_ON_ZERO(__builtin_choose_expr( \
>  		__is_constexpr((l) > (h)), (l) > (h), 0)))
> +#define BIT_INPUT_CHECK(type, b) \
> +	((BUILD_BUG_ON_ZERO(__builtin_choose_expr( \
> +		__is_constexpr(b), (b) >= BITS_PER_TYPE(type), 0))))
>  #else
>  /*
>   * BUILD_BUG_ON_ZERO is not available in h files included from asm files,
>   * disable the input check if that is the case.
>   */
>  #define GENMASK_INPUT_CHECK(h, l) 0
> +#define BIT_INPUT_CHECK(type, b) 0
>  #endif
>  #define __GENMASK(t, h, l) \
> @@ -44,4 +48,9 @@
>  #define GENMASK_U32(h, l)	__GENMASK(u32, h, l)
>  #define GENMASK_U64(h, l)	__GENMASK(u64, h, l)
> +#define BIT_U8(b)		(u8)(BIT_INPUT_CHECK(u8, b) + BIT(b))
> +#define BIT_U16(b)		(u16)(BIT_INPUT_CHECK(u16, b) + BIT(b))
> +#define BIT_U32(b)		(u32)(BIT_INPUT_CHECK(u32, b) + BIT(b))
> +#define BIT_U64(b)		(u64)(BIT_INPUT_CHECK(u64, b) + BIT(b))

Can you add some vertical spacing here, like between GENMASK and BIT
blocks?

> +
>  #endif	/* __LINUX_BITS_H */
> 
> > 
> > In the worst case, you can just implement the macro you need in the
> > uapi header, and make it working that way.
> > 
> > Can you confirm that my proposal increases the kernel size? If so, is
> > there any way to fix it? If it doesn't, I'd prefer to use the
> > __GENMASK() approach.
> 
> I agree on continuing with your approach. The bloat-o-meter indeed
> showed almost no difference. `size ....i915.o`  on the other hand
> increased, but then decreased when I replaced our current REG_GENMASK()
> implementation to reuse the new GENMASK_U*()
> 
> 	$ # test-genmask.00: before any change
> 	$ # test-genmask.01: after your patch to GENMASK
> 	$ # test-genmask.02: after converting drivers/gpu/drm/i915/i915_reg_defs.h
> 	    to use the new macros
> 	$ size build64/drivers/gpu/drm/i915/i915.o-test-genmask.*
> 	   text    data     bss     dec     hex filename
> 	4506628  215083    7168 4728879  48282f build64/drivers/gpu/drm/i915/i915.o-test-genmask.00
> 	4511084  215083    7168 4733335  483997 build64/drivers/gpu/drm/i915/i915.o-test-genmask.01
> 	4493292  215083    7168 4715543  47f417 build64/drivers/gpu/drm/i915/i915.o-test-genmask.02
> 
> 	$ ./scripts/bloat-o-meter  build64/drivers/gpu/drm/i915/i915.o-test-genmask.0[01]
> 	add/remove: 0/0 grow/shrink: 2/1 up/down: 4/-5 (-1)
> 	Function                                     old     new   delta
> 	intel_drrs_activate                          399     402      +3
> 	intel_psr_invalidate                         546     547      +1
> 	intel_psr_flush                              880     875      -5
> 	Total: Before=2980530, After=2980529, chg -0.00%
> 
> 	$ ./scripts/bloat-o-meter  build64/drivers/gpu/drm/i915/i915.o-test-genmask.0[12]
> 	add/remove: 0/0 grow/shrink: 0/0 up/down: 0/0 (0)
> 	Function                                     old     new   delta
> 	Total

OK then. With the above approach, the fixed-type BIT() macros look like
wrappers around the plain BIT(), and I think we can live with that.
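
To make the "wrapper around the plain BIT()" shape concrete, here is a
minimal userspace sketch of the idea, not the kernel implementation. It
assumes a C11 compiler and constant bit indexes only: BUILD_BUG_ON_ZERO()
is approximated with _Static_assert(), the __is_constexpr() and
__builtin_choose_expr() handling from the diff is left out, BIT() is
widened to 1ULL so the example behaves the same on 32-bit hosts, and
bit_helpers_selfcheck() is a hypothetical helper added for illustration.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Userspace stand-ins; simplified assumptions, not the kernel definitions. */
#define BITS_PER_TYPE(type)	(sizeof(type) * CHAR_BIT)
#define BIT(nr)			(1ULL << (nr))

/* Evaluates to 0, or breaks the build when the constant expression e is true. */
#define BUILD_BUG_ON_ZERO(e) \
	((int)(0 * sizeof(struct { \
		_Static_assert(!(e), "bit check failed: " #e); \
		int unused_; \
	})))

/* Reject a bit index that does not fit in the target type. */
#define BIT_INPUT_CHECK(type, b) \
	BUILD_BUG_ON_ZERO((b) >= BITS_PER_TYPE(type))

/* Fixed-width variants: the check contributes 0, BIT() does the real work. */
#define BIT_U8(b)	((uint8_t)(BIT_INPUT_CHECK(uint8_t, b) + BIT(b)))
#define BIT_U16(b)	((uint16_t)(BIT_INPUT_CHECK(uint16_t, b) + BIT(b)))
#define BIT_U32(b)	((uint32_t)(BIT_INPUT_CHECK(uint32_t, b) + BIT(b)))
#define BIT_U64(b)	((uint64_t)(BIT_INPUT_CHECK(uint64_t, b) + BIT(b)))

/* Hypothetical helper: exercises the highest valid bit of each width. */
static inline void bit_helpers_selfcheck(void)
{
	assert(BIT_U8(7) == 0x80);
	assert(BIT_U16(15) == 0x8000);
	assert(BIT_U32(31) == 0x80000000u);
	assert(BIT_U64(63) == 0x8000000000000000ull);

	/* The plain macro truncates silently when stored in 32 bits. */
	assert((uint32_t)BIT(33) == 0);
}
```

Changing any index to one past the type width, for example BIT_U8(8) or
BIT_U32(33), should turn into a build error in this sketch instead of a
silently truncated value.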

Can you send all the material as a proper series, including my
GENMASK patch, your patch above and a patch that switches your driver
to the new API? I'll then take it into bitmap-for-next once the
merge window closes.

Thanks,
Yury


* Re: Re: Re: Re: [Intel-xe] [PATCH 2/3] linux/bits.h: Add fixed-width GENMASK and BIT macros
  2024-01-19  2:01           ` Yury Norov
@ 2024-01-19 15:07             ` Lucas De Marchi
  0 siblings, 0 replies; 30+ messages in thread
From: Lucas De Marchi @ 2024-01-19 15:07 UTC (permalink / raw)
  To: Yury Norov
  Cc: Andrew Morton, intel-gfx, Kevin Brodsky, linux-kernel, dri-devel,
	Christian König, Masahiro Yamada, Alex Deucher,
	Thomas Gleixner, Andy Shevchenko, intel-xe

On Thu, Jan 18, 2024 at 06:01:58PM -0800, Yury Norov wrote:
>On Thu, Jan 18, 2024 at 05:25:00PM -0600, Lucas De Marchi wrote:
>>
>> On Thu, Jan 18, 2024 at 01:48:43PM -0800, Yury Norov wrote:
>> > On Thu, Jan 18, 2024 at 02:42:12PM -0600, Lucas De Marchi wrote:
>> > > Hi,
>> > >
>> > > Reviving this thread as now with xe driver merged we have 2 users for
>> > > a fixed-width BIT/GENMASK.
>> >
>> > Can you point where and why?
>>
>> See users of REG_GENMASK and REG_BIT in drivers/gpu/drm/i915 and
>> drivers/gpu/drm/xe. I think the register definitions in xe show it
>> well:
>>
>> 	drivers/gpu/drm/xe/regs/xe_gt_regs.h
>>
>> The GPU registers are mostly 32 bits wide. We don't want to accidentally do
>> something like the following (s/30/33/ added for illustration purposes):
>>
>> #define LSC_CHICKEN_BIT_0                       XE_REG_MCR(0xe7c8)
>> #define   DISABLE_D8_D16_COASLESCE              REG_BIT(33)
>>
>> Same thing for GENMASK family of macros and for registers that are 16 or
>> 8 bits. See e.g. drivers/gpu/drm/i915/display/intel_cx0_phy_regs.h
>>
>>
>> >
>> > > On Wed, Jun 21, 2023 at 07:20:59PM -0700, Yury Norov wrote:
>> > > > Hi Lucas, all!
>> > > >
>> > > > (Thanks, Andy, for pointing to this thread.)
>> > > >
>> > > > On Mon, May 08, 2023 at 10:14:02PM -0700, Lucas De Marchi wrote:
>> > > > > Add GENMASK_U32(), GENMASK_U16() and GENMASK_U8()  macros to create
>> > > > > masks for fixed-width types and also the corresponding BIT_U32(),
>> > > > > BIT_U16() and BIT_U8().
>> > > >
>> > > > Can you split BIT() and GENMASK() material to separate patches?
>> > > >
>> > > > > All of those depend on a new "U" suffix added to the integer constant.
>> > > > > Due to naming clashes it's better to call the macro U32. Since C doesn't
>> > > > > have a proper suffix for short and char types, the U16 and U8 variants
>> > > > > just use U32 with one additional check in the BIT_* macros to make
>> > > > > sure the compiler gives an error when those types overflow.
>> > > >
>> > > > I feel like I don't understand the sentence...
>> > > >
>> > > > > The BIT_U16() and BIT_U8() need the help of GENMASK_INPUT_CHECK(),
>> > > > > as otherwise they would allow an invalid bit to be passed. Hence
>> > > > > implement them in include/linux/bits.h rather than together with
>> > > > > the other BIT* variants.
>> > > >
>> > > > I don't think it's a good way to go because BIT() belongs to a more basic
>> > > > level than GENMASK(). Not mentioning possible header dependency issues.
>> > > > If you need to test against tighter numeric region, I'd suggest to
>> > > > do the same trick as  GENMASK_INPUT_CHECK() does, but in uapi/linux/const.h
>> > > > directly. Something like:
>> > > >        #define _U8(x)		(CONST_GT(U8_MAX, x) + _AC(x, U))
>> > >
>> > > but then make uapi/linux/const.h include linux/build_bug.h?
>> > > I was thinking about leaving BIT() define where it is, and add the
>> > > fixed-width versions in this header. I was thinking uapi/linux/const.h
>> > > was more about allowing the U/ULL suffixes for things shared with asm.
>> >
>> > You can't include kernel headers in uapi code. But you can try doing
>> > vice-versa: implement or move the pieces you need to share to the
>> > uapi/linux/const.h, and use them in the kernel code.
>>
>> but in this case CONST_GE() should trigger a BUG/static_assert
>> on U8_MAX < x. AFAICS that check can't be on the uapi/ side,
>> so there's nothing much left to change in uapi/linux/const.h.
>>
>> I'd expect drivers to be the primary user of these fixed-width BIT
>> variants, hence the proposal to do it in include/linux/bits.h.
>> Something like this WIP/untested diff (on top of your previous patch):
>>
>>
>> diff --git a/include/linux/bits.h b/include/linux/bits.h
>> index cb94128171b2..409cd10f7597 100644
>> --- a/include/linux/bits.h
>> +++ b/include/linux/bits.h
>> @@ -24,12 +24,16 @@
>>  #define GENMASK_INPUT_CHECK(h, l) \
>>  	(BUILD_BUG_ON_ZERO(__builtin_choose_expr( \
>>  		__is_constexpr((l) > (h)), (l) > (h), 0)))
>> +#define BIT_INPUT_CHECK(type, b) \
>> +	((BUILD_BUG_ON_ZERO(__builtin_choose_expr( \
>> +		__is_constexpr(b), (b) >= BITS_PER_TYPE(type), 0))))
>>  #else
>>  /*
>>   * BUILD_BUG_ON_ZERO is not available in h files included from asm files,
>>   * disable the input check if that is the case.
>>   */
>>  #define GENMASK_INPUT_CHECK(h, l) 0
>> +#define BIT_INPUT_CHECK(type, b) 0
>>  #endif
>>  #define __GENMASK(t, h, l) \
>> @@ -44,4 +48,9 @@
>>  #define GENMASK_U32(h, l)	__GENMASK(u32, h, l)
>>  #define GENMASK_U64(h, l)	__GENMASK(u64, h, l)
>> +#define BIT_U8(b)		(u8)(BIT_INPUT_CHECK(u8, b) + BIT(b))
>> +#define BIT_U16(b)		(u16)(BIT_INPUT_CHECK(u16, b) + BIT(b))
>> +#define BIT_U32(b)		(u32)(BIT_INPUT_CHECK(u32, b) + BIT(b))
>> +#define BIT_U64(b)		(u64)(BIT_INPUT_CHECK(u64, b) + BIT(b))
>
>Can you add some vertical spacing here, like between GENMASK and BIT
>blocks?

I think Gmail mangled this, because it does show up with more vertical
space in the email I sent:
https://lore.kernel.org/all/clamvpymzwiehjqd6jhuigymyg5ikxewxyeee2eae4tgzmaz7u@6rposizee3t6/

Anyway, I will clean this up and probably add some docs about its usage.
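
As a rough idea of what such usage notes could show, the sketch below
defines fields of a made-up 32-bit register with typed masks. Everything
here is an assumption for illustration: GENMASK_T() is a simplified
userspace stand-in and not the __GENMASK() from the patch, the DEMO_*
names and demo_ctl_set_mode() are hypothetical, and only constant
arguments are supported.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Simplified userspace stand-ins; the kernel's real definitions differ. */
#define BITS_PER_TYPE(t)	(sizeof(t) * CHAR_BIT)
#define BUILD_BUG_ON_ZERO(e) \
	((int)(0 * sizeof(struct { _Static_assert(!(e), #e); int unused_; })))

/*
 * Assumed shape of a typed mask helper: bits h..l set in type t.
 * The build fails when l > h or when h does not fit in t.
 */
#define GENMASK_T(t, h, l) \
	((t)(BUILD_BUG_ON_ZERO((l) > (h) || (h) >= BITS_PER_TYPE(t)) + \
	     (((t)~0ULL << (l)) & ((t)~0ULL >> (BITS_PER_TYPE(t) - 1 - (h))))))

#define GENMASK_U8(h, l)	GENMASK_T(uint8_t, h, l)
#define GENMASK_U16(h, l)	GENMASK_T(uint16_t, h, l)
#define GENMASK_U32(h, l)	GENMASK_T(uint32_t, h, l)
#define GENMASK_U64(h, l)	GENMASK_T(uint64_t, h, l)

/* Hypothetical 32-bit control register with a 4-bit mode field. */
#define DEMO_CTL_MODE_SHIFT	4
#define DEMO_CTL_MODE_MASK	GENMASK_U32(7, 4)

/* Replace the mode field of a register value, leaving other bits alone. */
static inline uint32_t demo_ctl_set_mode(uint32_t val, uint32_t mode)
{
	val &= ~DEMO_CTL_MODE_MASK;
	val |= (mode << DEMO_CTL_MODE_SHIFT) & DEMO_CTL_MODE_MASK;
	return val;
}
```

With these definitions, a typo such as GENMASK_U32(33, 30) in a register
header would be expected to fail the build rather than produce a mask
that silently loses its upper bits.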

>
>> +
>>  #endif	/* __LINUX_BITS_H */
>>
>> >
>> > In the worst case, you can just implement the macro you need in the
>> > uapi header, and make it working that way.
>> >
>> > Can you confirm that my proposal increases the kernel size? If so, is
>> > there any way to fix it? If it doesn't, I'd prefer to use the
>> > __GENMASK() approach.
>>
>> I agree on continuing with your approach. The bloat-o-meter indeed
>> showed almost no difference. `size ....i915.o`  on the other hand
>> increased, but then decreased when I replaced our current REG_GENMASK()
>> implementation to reuse the new GENMASK_U*()
>>
>> 	$ # test-genmask.00: before any change
>> 	$ # test-genmask.01: after your patch to GENMASK
>> 	$ # test-genmask.02: after converting drivers/gpu/drm/i915/i915_reg_defs.h
>> 	    to use the new macros
>> 	$ size build64/drivers/gpu/drm/i915/i915.o-test-genmask.*
>> 	   text    data     bss     dec     hex filename
>> 	4506628  215083    7168 4728879  48282f build64/drivers/gpu/drm/i915/i915.o-test-genmask.00
>> 	4511084  215083    7168 4733335  483997 build64/drivers/gpu/drm/i915/i915.o-test-genmask.01
>> 	4493292  215083    7168 4715543  47f417 build64/drivers/gpu/drm/i915/i915.o-test-genmask.02
>>
>> 	$ ./scripts/bloat-o-meter  build64/drivers/gpu/drm/i915/i915.o-test-genmask.0[01]
>> 	add/remove: 0/0 grow/shrink: 2/1 up/down: 4/-5 (-1)
>> 	Function                                     old     new   delta
>> 	intel_drrs_activate                          399     402      +3
>> 	intel_psr_invalidate                         546     547      +1
>> 	intel_psr_flush                              880     875      -5
>> 	Total: Before=2980530, After=2980529, chg -0.00%
>>
>> 	$ ./scripts/bloat-o-meter  build64/drivers/gpu/drm/i915/i915.o-test-genmask.0[12]
>> 	add/remove: 0/0 grow/shrink: 0/0 up/down: 0/0 (0)
>> 	Function                                     old     new   delta
>> 	Total
>
>OK then. With the above approach, fixed-type BIT() macros look like wrappers
>around the plain BIT(), and I think, we can live with that.
>
>Can you  send all the material as a proper series, including my
>GENMASK patch, your patch above and a patch that switches your driver
>to using the new API? I'll take it then in bitmap-for-next when the
>merge window will get closed.

sure, thanks


Lucas De Marchi

>
>Thanks,
>Yury


Thread overview: 30+ messages
2023-05-09  5:14 [PATCH 0/3] Fixed-width mask/bit helpers Lucas De Marchi
2023-05-09  5:14 ` [PATCH 1/3] drm/amd: Remove wrapper macros over get_u{32,16,8} Lucas De Marchi
2023-05-09  5:14 ` [PATCH 2/3] linux/bits.h: Add fixed-width GENMASK and BIT macros Lucas De Marchi
2023-05-09 14:00   ` [Intel-xe] " Gustavo Sousa
2023-05-09 21:34     ` Lucas De Marchi
2023-05-10 12:18   ` kernel test robot
2023-05-12 11:14   ` Andy Shevchenko
2023-05-12 11:25     ` Jani Nikula
2023-05-12 11:32       ` Andy Shevchenko
2023-05-12 11:45         ` Jani Nikula
2023-06-15 15:53           ` Andy Shevchenko
2023-06-20 14:47             ` Jani Nikula
2023-06-20 14:55               ` Andy Shevchenko
2023-06-20 17:25                 ` [Intel-xe] " Lucas De Marchi
2023-06-20 17:41                   ` Andy Shevchenko
2023-06-20 18:02                     ` Lucas De Marchi
2023-06-20 18:19                     ` Jani Nikula
2023-05-12 16:29     ` Lucas De Marchi
2023-06-15 15:58       ` Andy Shevchenko
2023-06-22  2:20   ` Yury Norov
2023-06-22  6:15     ` Lucas De Marchi
2023-06-22 14:59       ` Yury Norov
2024-01-18 20:42     ` Re: [Intel-xe] " Lucas De Marchi
2024-01-18 21:48       ` Yury Norov
2024-01-18 23:25         ` Lucas De Marchi
2024-01-19  2:01           ` Yury Norov
2024-01-19 15:07             ` Lucas De Marchi
2023-05-09  5:14 ` [PATCH 3/3] drm/i915: Temporary conversion to new GENMASK/BIT macros Lucas De Marchi
2023-05-09  7:57   ` Jani Nikula
2023-05-09  8:15     ` Lucas De Marchi
