linux-rdma.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v4 0/8] Raspberry Pi 4 PCIe support
@ 2019-12-03 11:47 Nicolas Saenz Julienne
  2019-12-03 11:47 ` [PATCH v4 7/8] linux/log2.h: Fix 64bit calculations in roundup/down_pow_two() Nicolas Saenz Julienne
  2019-12-03 11:47 ` [PATCH v4 8/8] linux/log2.h: Use roundup/dow_pow_two() on 64bit calculations Nicolas Saenz Julienne
  0 siblings, 2 replies; 15+ messages in thread
From: Nicolas Saenz Julienne @ 2019-12-03 11:47 UTC (permalink / raw)
  To: andrew.murray, maz, linux-kernel
  Cc: james.quinlan, mbrugger, f.fainelli, phil, wahrenst,
	jeremy.linton, linux-pci, linux-rpi-kernel,
	Nicolas Saenz Julienne, Robin Murphy, linux-arm-kernel,
	bcm-kernel-feedback-list, devicetree, linux-acpi, linux-clk,
	linux-rdma, iommu, netdev, linux-rockchip, kexec, linux-nfs

This series aims at providing support for Raspberry Pi 4's PCIe
controller, which is also shared with the Broadcom STB family of
devices.

There was a previous attempt to upstream this some years ago[1] but was
blocked as most STB PCIe integrations have a sparse DMA mapping[2] which
is something currently not supported by the kernel.  Luckily this is not
the case for the Raspberry Pi 4.

Note that the driver code is to be based on top of Rob Herring's series
simplifying inbound and outbound range parsing.

[1] https://patchwork.kernel.org/cover/10605933/
[2] https://patchwork.kernel.org/patch/10605957/

---

Changes since v3:
  - Moved all the log2.h related changes at the end of the series, as I
    presume they will be contentious and I don't want the PCIe patches
    to depend on them. Ultimately I think I'll respin them on their own
    series but wanted to keep them in for this submission just for the
    sake of continuity.
  - Addressed small nits here and there.

Changes since v2:
  - Redo register access in driver avoiding indirection while keeping
    the naming intact
  - Add patch editing ARM64's config
  - Last MSI cleanups, notably removing MSIX flag
  - Got rid of all _RB writes
  - Got rid of all of_data
  - Overall churn removal
  - Address the rest of Andrew's comments

Changes since v1:
  - add generic rounddown/roundup_pow_two64() patch
  - Add MAINTAINERS patch
  - Fix Kconfig
  - Cleanup probe, use up to date APIs, exit on MSI failure
  - Get rid of linux,pci-domain and other unused constructs
  - Use edge triggered setup for MSI
  - Cleanup MSI implementation
  - Fix multiple cosmetic issues
  - Remove supend/resume code

Jim Quinlan (3):
  dt-bindings: PCI: Add bindings for brcmstb's PCIe device
  PCI: brcmstb: Add Broadcom STB PCIe host controller driver
  PCI: brcmstb: Add MSI support

Nicolas Saenz Julienne (5):
  ARM: dts: bcm2711: Enable PCIe controller
  MAINTAINERS: Add brcmstb PCIe controller
  arm64: defconfig: Enable Broadcom's STB PCIe controller
  linux/log2.h: Fix 64bit calculations in roundup/down_pow_two()
  linux/log2.h: Use roundup/dow_pow_two() on 64bit calculations

 .../bindings/pci/brcm,stb-pcie.yaml           |   97 ++
 MAINTAINERS                                   |    4 +
 arch/arm/boot/dts/bcm2711.dtsi                |   37 +
 arch/arm64/configs/defconfig                  |    1 +
 drivers/acpi/arm64/iort.c                     |    2 +-
 drivers/clk/clk-divider.c                     |    8 +-
 drivers/clk/sunxi/clk-sunxi.c                 |    2 +-
 drivers/infiniband/hw/hfi1/chip.c             |    4 +-
 drivers/infiniband/hw/hfi1/init.c             |    4 +-
 drivers/infiniband/hw/mlx4/srq.c              |    2 +-
 drivers/infiniband/hw/mthca/mthca_srq.c       |    2 +-
 drivers/infiniband/sw/rxe/rxe_qp.c            |    4 +-
 drivers/iommu/intel-iommu.c                   |    4 +-
 drivers/iommu/intel-svm.c                     |    4 +-
 drivers/iommu/intel_irq_remapping.c           |    2 +-
 drivers/net/ethernet/amd/xgbe/xgbe-ethtool.c  |    4 +-
 drivers/net/ethernet/marvell/sky2.c           |    2 +-
 drivers/net/ethernet/mellanox/mlx4/en_clock.c |    3 +-
 drivers/net/ethernet/rocker/rocker_hw.h       |    4 +-
 drivers/net/ethernet/sfc/ef10.c               |    2 +-
 drivers/net/ethernet/sfc/efx.h                |    2 +-
 drivers/net/ethernet/sfc/falcon/efx.h         |    2 +-
 drivers/of/device.c                           |    3 +-
 drivers/pci/controller/Kconfig                |    9 +
 drivers/pci/controller/Makefile               |    1 +
 .../pci/controller/cadence/pcie-cadence-ep.c  |    3 +-
 drivers/pci/controller/cadence/pcie-cadence.c |    3 +-
 drivers/pci/controller/pcie-brcmstb.c         | 1008 +++++++++++++++++
 drivers/pci/controller/pcie-rockchip-ep.c     |    5 +-
 drivers/pci/msi.c                             |    2 +-
 include/linux/log2.h                          |   44 +-
 kernel/dma/direct.c                           |    2 +-
 kernel/kexec_core.c                           |    3 +-
 lib/rhashtable.c                              |    2 +-
 net/sunrpc/xprtrdma/verbs.c                   |    2 +-
 35 files changed, 1211 insertions(+), 72 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/pci/brcm,stb-pcie.yaml
 create mode 100644 drivers/pci/controller/pcie-brcmstb.c

-- 
2.24.0


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH v4 7/8] linux/log2.h: Fix 64bit calculations in roundup/down_pow_two()
  2019-12-03 11:47 [PATCH v4 0/8] Raspberry Pi 4 PCIe support Nicolas Saenz Julienne
@ 2019-12-03 11:47 ` Nicolas Saenz Julienne
  2019-12-03 16:39   ` Chuck Lever
                     ` (5 more replies)
  2019-12-03 11:47 ` [PATCH v4 8/8] linux/log2.h: Use roundup/dow_pow_two() on 64bit calculations Nicolas Saenz Julienne
  1 sibling, 6 replies; 15+ messages in thread
From: Nicolas Saenz Julienne @ 2019-12-03 11:47 UTC (permalink / raw)
  To: andrew.murray, maz, linux-kernel, Michael Turquette,
	Stephen Boyd, Emilio López, Maxime Ripard, Chen-Yu Tsai,
	Mike Marciniszyn, Dennis Dalessandro, Yishai Hadas, Moni Shoua,
	David Woodhouse, Lu Baolu, Joerg Roedel, Tom Lendacky,
	Mirko Lindner, Stephen Hemminger, Jiri Pirko,
	Solarflare linux maintainers, Edward Cree, Martin Habets,
	Bjorn Helgaas, Eric Biederman, Thomas Graf, Herbert Xu
  Cc: james.quinlan, mbrugger, f.fainelli, phil, wahrenst,
	jeremy.linton, linux-pci, linux-rpi-kernel,
	Nicolas Saenz Julienne, Robin Murphy, Doug Ledford,
	Jason Gunthorpe, David S. Miller, Trond Myklebust,
	Anna Schumaker, J. Bruce Fields, Chuck Lever, linux-clk,
	linux-arm-kernel, linux-rdma, iommu, netdev, kexec, linux-nfs

Some users need to make sure their rounding function accepts and returns
64bit long variables regardless of the architecture. Sadly
roundup/rounddown_pow_two() takes and returns unsigned longs. It turns
out ilog2() already handles 32/64bit calculations properly, and being
the building block to the round functions we can rework them as a
wrapper around it.

Suggested-by: Robin Murphy <robin.murphy@arm.con>
Signed-off-by: Nicolas Saenz Julienne <nsaenzjulienne@suse.de>
---
 drivers/clk/clk-divider.c                    |  8 ++--
 drivers/clk/sunxi/clk-sunxi.c                |  2 +-
 drivers/infiniband/hw/hfi1/chip.c            |  4 +-
 drivers/infiniband/hw/hfi1/init.c            |  4 +-
 drivers/infiniband/hw/mlx4/srq.c             |  2 +-
 drivers/infiniband/hw/mthca/mthca_srq.c      |  2 +-
 drivers/infiniband/sw/rxe/rxe_qp.c           |  4 +-
 drivers/iommu/intel-iommu.c                  |  4 +-
 drivers/iommu/intel-svm.c                    |  4 +-
 drivers/iommu/intel_irq_remapping.c          |  2 +-
 drivers/net/ethernet/amd/xgbe/xgbe-ethtool.c |  4 +-
 drivers/net/ethernet/marvell/sky2.c          |  2 +-
 drivers/net/ethernet/rocker/rocker_hw.h      |  4 +-
 drivers/net/ethernet/sfc/ef10.c              |  2 +-
 drivers/net/ethernet/sfc/efx.h               |  2 +-
 drivers/net/ethernet/sfc/falcon/efx.h        |  2 +-
 drivers/pci/msi.c                            |  2 +-
 include/linux/log2.h                         | 44 +++++---------------
 kernel/kexec_core.c                          |  3 +-
 lib/rhashtable.c                             |  2 +-
 net/sunrpc/xprtrdma/verbs.c                  |  2 +-
 21 files changed, 41 insertions(+), 64 deletions(-)

diff --git a/drivers/clk/clk-divider.c b/drivers/clk/clk-divider.c
index 098b2b01f0af..ba947e4c8193 100644
--- a/drivers/clk/clk-divider.c
+++ b/drivers/clk/clk-divider.c
@@ -222,7 +222,7 @@ static int _div_round_up(const struct clk_div_table *table,
 	int div = DIV_ROUND_UP_ULL((u64)parent_rate, rate);
 
 	if (flags & CLK_DIVIDER_POWER_OF_TWO)
-		div = __roundup_pow_of_two(div);
+		div = roundup_pow_of_two(div);
 	if (table)
 		div = _round_up_table(table, div);
 
@@ -240,8 +240,8 @@ static int _div_round_closest(const struct clk_div_table *table,
 	down = parent_rate / rate;
 
 	if (flags & CLK_DIVIDER_POWER_OF_TWO) {
-		up = __roundup_pow_of_two(up);
-		down = __rounddown_pow_of_two(down);
+		up = roundup_pow_of_two(up);
+		down = rounddown_pow_of_two(down);
 	} else if (table) {
 		up = _round_up_table(table, up);
 		down = _round_down_table(table, down);
@@ -278,7 +278,7 @@ static int _next_div(const struct clk_div_table *table, int div,
 	div++;
 
 	if (flags & CLK_DIVIDER_POWER_OF_TWO)
-		return __roundup_pow_of_two(div);
+		return roundup_pow_of_two(div);
 	if (table)
 		return _round_up_table(table, div);
 
diff --git a/drivers/clk/sunxi/clk-sunxi.c b/drivers/clk/sunxi/clk-sunxi.c
index 27201fd26e44..faec99dc09c0 100644
--- a/drivers/clk/sunxi/clk-sunxi.c
+++ b/drivers/clk/sunxi/clk-sunxi.c
@@ -311,7 +311,7 @@ static void sun6i_get_ahb1_factors(struct factors_request *req)
 
 		calcm = DIV_ROUND_UP(div, 1 << calcp);
 	} else {
-		calcp = __roundup_pow_of_two(div);
+		calcp = roundup_pow_of_two(div);
 		calcp = calcp > 3 ? 3 : calcp;
 	}
 
diff --git a/drivers/infiniband/hw/hfi1/chip.c b/drivers/infiniband/hw/hfi1/chip.c
index 9b1fb84a3d45..96b1d343c32f 100644
--- a/drivers/infiniband/hw/hfi1/chip.c
+++ b/drivers/infiniband/hw/hfi1/chip.c
@@ -14199,10 +14199,10 @@ static int qos_rmt_entries(struct hfi1_devdata *dd, unsigned int *mp,
 			max_by_vl = krcvqs[i];
 	if (max_by_vl > 32)
 		goto no_qos;
-	m = ilog2(__roundup_pow_of_two(max_by_vl));
+	m = ilog2(roundup_pow_of_two(max_by_vl));
 
 	/* determine bits for vl */
-	n = ilog2(__roundup_pow_of_two(num_vls));
+	n = ilog2(roundup_pow_of_two(num_vls));
 
 	/* reject if too much is used */
 	if ((m + n) > 7)
diff --git a/drivers/infiniband/hw/hfi1/init.c b/drivers/infiniband/hw/hfi1/init.c
index 26b792bb1027..838c789c7cce 100644
--- a/drivers/infiniband/hw/hfi1/init.c
+++ b/drivers/infiniband/hw/hfi1/init.c
@@ -467,7 +467,7 @@ int hfi1_create_ctxtdata(struct hfi1_pportdata *ppd, int numa,
 		 * MTU supported.
 		 */
 		if (rcd->egrbufs.size < hfi1_max_mtu) {
-			rcd->egrbufs.size = __roundup_pow_of_two(hfi1_max_mtu);
+			rcd->egrbufs.size = roundup_pow_of_two(hfi1_max_mtu);
 			hfi1_cdbg(PROC,
 				  "ctxt%u: eager bufs size too small. Adjusting to %u\n",
 				    rcd->ctxt, rcd->egrbufs.size);
@@ -1959,7 +1959,7 @@ int hfi1_setup_eagerbufs(struct hfi1_ctxtdata *rcd)
 	 * to satisfy the "multiple of 8 RcvArray entries" requirement.
 	 */
 	if (rcd->egrbufs.size <= (1 << 20))
-		rcd->egrbufs.rcvtid_size = max((unsigned long)round_mtu,
+		rcd->egrbufs.rcvtid_size = max((unsigned long long)round_mtu,
 			rounddown_pow_of_two(rcd->egrbufs.size / 8));
 
 	while (alloced_bytes < rcd->egrbufs.size &&
diff --git a/drivers/infiniband/hw/mlx4/srq.c b/drivers/infiniband/hw/mlx4/srq.c
index 8dcf6e3d9ae2..7e685600a7b3 100644
--- a/drivers/infiniband/hw/mlx4/srq.c
+++ b/drivers/infiniband/hw/mlx4/srq.c
@@ -96,7 +96,7 @@ int mlx4_ib_create_srq(struct ib_srq *ib_srq,
 	srq->msrq.max    = roundup_pow_of_two(init_attr->attr.max_wr + 1);
 	srq->msrq.max_gs = init_attr->attr.max_sge;
 
-	desc_size = max(32UL,
+	desc_size = max(32ULL,
 			roundup_pow_of_two(sizeof (struct mlx4_wqe_srq_next_seg) +
 					   srq->msrq.max_gs *
 					   sizeof (struct mlx4_wqe_data_seg)));
diff --git a/drivers/infiniband/hw/mthca/mthca_srq.c b/drivers/infiniband/hw/mthca/mthca_srq.c
index a85935ccce88..0c2e14b4142a 100644
--- a/drivers/infiniband/hw/mthca/mthca_srq.c
+++ b/drivers/infiniband/hw/mthca/mthca_srq.c
@@ -225,7 +225,7 @@ int mthca_alloc_srq(struct mthca_dev *dev, struct mthca_pd *pd,
 	else
 		srq->max = srq->max + 1;
 
-	ds = max(64UL,
+	ds = max(64ULL,
 		 roundup_pow_of_two(sizeof (struct mthca_next_seg) +
 				    srq->max_gs * sizeof (struct mthca_data_seg)));
 
diff --git a/drivers/infiniband/sw/rxe/rxe_qp.c b/drivers/infiniband/sw/rxe/rxe_qp.c
index e2c6d1cedf41..040b707b0877 100644
--- a/drivers/infiniband/sw/rxe/rxe_qp.c
+++ b/drivers/infiniband/sw/rxe/rxe_qp.c
@@ -592,7 +592,7 @@ int rxe_qp_from_attr(struct rxe_qp *qp, struct ib_qp_attr *attr, int mask,
 	int err;
 
 	if (mask & IB_QP_MAX_QP_RD_ATOMIC) {
-		int max_rd_atomic = __roundup_pow_of_two(attr->max_rd_atomic);
+		int max_rd_atomic = roundup_pow_of_two(attr->max_rd_atomic);
 
 		qp->attr.max_rd_atomic = max_rd_atomic;
 		atomic_set(&qp->req.rd_atomic, max_rd_atomic);
@@ -600,7 +600,7 @@ int rxe_qp_from_attr(struct rxe_qp *qp, struct ib_qp_attr *attr, int mask,
 
 	if (mask & IB_QP_MAX_DEST_RD_ATOMIC) {
 		int max_dest_rd_atomic =
-			__roundup_pow_of_two(attr->max_dest_rd_atomic);
+			roundup_pow_of_two(attr->max_dest_rd_atomic);
 
 		qp->attr.max_dest_rd_atomic = max_dest_rd_atomic;
 
diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 0c8d81f56a30..ce7c900bd666 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -1488,7 +1488,7 @@ static void iommu_flush_iotlb_psi(struct intel_iommu *iommu,
 				  unsigned long pfn, unsigned int pages,
 				  int ih, int map)
 {
-	unsigned int mask = ilog2(__roundup_pow_of_two(pages));
+	unsigned int mask = ilog2(roundup_pow_of_two(pages));
 	uint64_t addr = (uint64_t)pfn << VTD_PAGE_SHIFT;
 	u16 did = domain->iommu_did[iommu->seq_id];
 
@@ -3390,7 +3390,7 @@ static unsigned long intel_alloc_iova(struct device *dev,
 	/* Restrict dma_mask to the width that the iommu can handle */
 	dma_mask = min_t(uint64_t, DOMAIN_MAX_ADDR(domain->gaw), dma_mask);
 	/* Ensure we reserve the whole size-aligned region */
-	nrpages = __roundup_pow_of_two(nrpages);
+	nrpages = roundup_pow_of_two(nrpages);
 
 	if (!dmar_forcedac && dma_mask > DMA_BIT_MASK(32)) {
 		/*
diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c
index 9b159132405d..602caca3cd1a 100644
--- a/drivers/iommu/intel-svm.c
+++ b/drivers/iommu/intel-svm.c
@@ -115,7 +115,7 @@ static void intel_flush_svm_range_dev (struct intel_svm *svm, struct intel_svm_d
 			QI_EIOTLB_TYPE;
 		desc.qw1 = 0;
 	} else {
-		int mask = ilog2(__roundup_pow_of_two(pages));
+		int mask = ilog2(roundup_pow_of_two(pages));
 
 		desc.qw0 = QI_EIOTLB_PASID(svm->pasid) |
 				QI_EIOTLB_DID(sdev->did) |
@@ -142,7 +142,7 @@ static void intel_flush_svm_range_dev (struct intel_svm *svm, struct intel_svm_d
 			 * for example, an "address" value of 0x12345f000 will
 			 * flush from 0x123440000 to 0x12347ffff (256KiB). */
 			unsigned long last = address + ((unsigned long)(pages - 1) << VTD_PAGE_SHIFT);
-			unsigned long mask = __rounddown_pow_of_two(address ^ last);
+			unsigned long mask = rounddown_pow_of_two(address ^ last);
 
 			desc.qw1 = QI_DEV_EIOTLB_ADDR((address & ~mask) |
 					(mask - 1)) | QI_DEV_EIOTLB_SIZE;
diff --git a/drivers/iommu/intel_irq_remapping.c b/drivers/iommu/intel_irq_remapping.c
index 81e43c1df7ec..935657b2c661 100644
--- a/drivers/iommu/intel_irq_remapping.c
+++ b/drivers/iommu/intel_irq_remapping.c
@@ -113,7 +113,7 @@ static int alloc_irte(struct intel_iommu *iommu,
 		return -1;
 
 	if (count > 1) {
-		count = __roundup_pow_of_two(count);
+		count = roundup_pow_of_two(count);
 		mask = ilog2(count);
 	}
 
diff --git a/drivers/net/ethernet/amd/xgbe/xgbe-ethtool.c b/drivers/net/ethernet/amd/xgbe/xgbe-ethtool.c
index 6a757dadb5f1..fd5b12c23eaa 100644
--- a/drivers/net/ethernet/amd/xgbe/xgbe-ethtool.c
+++ b/drivers/net/ethernet/amd/xgbe/xgbe-ethtool.c
@@ -680,13 +680,13 @@ static int xgbe_set_ringparam(struct net_device *netdev,
 		return -EINVAL;
 	}
 
-	rx = __rounddown_pow_of_two(ringparam->rx_pending);
+	rx = rounddown_pow_of_two(ringparam->rx_pending);
 	if (rx != ringparam->rx_pending)
 		netdev_notice(netdev,
 			      "rx ring parameter rounded to power of two: %u\n",
 			      rx);
 
-	tx = __rounddown_pow_of_two(ringparam->tx_pending);
+	tx = rounddown_pow_of_two(ringparam->tx_pending);
 	if (tx != ringparam->tx_pending)
 		netdev_notice(netdev,
 			      "tx ring parameter rounded to power of two: %u\n",
diff --git a/drivers/net/ethernet/marvell/sky2.c b/drivers/net/ethernet/marvell/sky2.c
index 5f56ee83e3b1..cc3a03b4a611 100644
--- a/drivers/net/ethernet/marvell/sky2.c
+++ b/drivers/net/ethernet/marvell/sky2.c
@@ -4139,7 +4139,7 @@ static int sky2_set_coalesce(struct net_device *dev,
  */
 static unsigned long roundup_ring_size(unsigned long pending)
 {
-	return max(128ul, roundup_pow_of_two(pending+1));
+	return max(128ull, roundup_pow_of_two(pending+1));
 }
 
 static void sky2_get_ringparam(struct net_device *dev,
diff --git a/drivers/net/ethernet/rocker/rocker_hw.h b/drivers/net/ethernet/rocker/rocker_hw.h
index 59f1f8b690d2..d8de15509e2c 100644
--- a/drivers/net/ethernet/rocker/rocker_hw.h
+++ b/drivers/net/ethernet/rocker/rocker_hw.h
@@ -88,8 +88,8 @@ enum rocker_dma_type {
 };
 
 /* Rocker DMA ring size limits and default sizes */
-#define ROCKER_DMA_SIZE_MIN		2ul
-#define ROCKER_DMA_SIZE_MAX		65536ul
+#define ROCKER_DMA_SIZE_MIN		2ull
+#define ROCKER_DMA_SIZE_MAX		65536ull
 #define ROCKER_DMA_CMD_DEFAULT_SIZE	32ul
 #define ROCKER_DMA_EVENT_DEFAULT_SIZE	32ul
 #define ROCKER_DMA_TX_DEFAULT_SIZE	64ul
diff --git a/drivers/net/ethernet/sfc/ef10.c b/drivers/net/ethernet/sfc/ef10.c
index 4d9bbccc6f89..4f4d9a5b3b75 100644
--- a/drivers/net/ethernet/sfc/ef10.c
+++ b/drivers/net/ethernet/sfc/ef10.c
@@ -27,7 +27,7 @@ enum {
 };
 /* The maximum size of a shared RSS context */
 /* TODO: this should really be from the mcdi protocol export */
-#define EFX_EF10_MAX_SHARED_RSS_CONTEXT_SIZE 64UL
+#define EFX_EF10_MAX_SHARED_RSS_CONTEXT_SIZE 64ULL
 
 /* The filter table(s) are managed by firmware and we have write-only
  * access.  When removing filters we must identify them to the
diff --git a/drivers/net/ethernet/sfc/efx.h b/drivers/net/ethernet/sfc/efx.h
index 2dd8d5002315..fea2add5860e 100644
--- a/drivers/net/ethernet/sfc/efx.h
+++ b/drivers/net/ethernet/sfc/efx.h
@@ -52,7 +52,7 @@ void efx_schedule_slow_fill(struct efx_rx_queue *rx_queue);
 
 #define EFX_MAX_DMAQ_SIZE 4096UL
 #define EFX_DEFAULT_DMAQ_SIZE 1024UL
-#define EFX_MIN_DMAQ_SIZE 512UL
+#define EFX_MIN_DMAQ_SIZE 512ULL
 
 #define EFX_MAX_EVQ_SIZE 16384UL
 #define EFX_MIN_EVQ_SIZE 512UL
diff --git a/drivers/net/ethernet/sfc/falcon/efx.h b/drivers/net/ethernet/sfc/falcon/efx.h
index d3b4646545fa..0d16257156d6 100644
--- a/drivers/net/ethernet/sfc/falcon/efx.h
+++ b/drivers/net/ethernet/sfc/falcon/efx.h
@@ -55,7 +55,7 @@ void ef4_schedule_slow_fill(struct ef4_rx_queue *rx_queue);
 
 #define EF4_MAX_DMAQ_SIZE 4096UL
 #define EF4_DEFAULT_DMAQ_SIZE 1024UL
-#define EF4_MIN_DMAQ_SIZE 512UL
+#define EF4_MIN_DMAQ_SIZE 512ULL
 
 #define EF4_MAX_EVQ_SIZE 16384UL
 #define EF4_MIN_EVQ_SIZE 512UL
diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
index c7709e49f0e4..f0391e88bc42 100644
--- a/drivers/pci/msi.c
+++ b/drivers/pci/msi.c
@@ -578,7 +578,7 @@ msi_setup_entry(struct pci_dev *dev, int nvec, struct irq_affinity *affd)
 	entry->msi_attrib.maskbit	= !!(control & PCI_MSI_FLAGS_MASKBIT);
 	entry->msi_attrib.default_irq	= dev->irq;	/* Save IOAPIC IRQ */
 	entry->msi_attrib.multi_cap	= (control & PCI_MSI_FLAGS_QMASK) >> 1;
-	entry->msi_attrib.multiple	= ilog2(__roundup_pow_of_two(nvec));
+	entry->msi_attrib.multiple	= ilog2(roundup_pow_of_two(nvec));
 
 	if (control & PCI_MSI_FLAGS_64BIT)
 		entry->mask_pos = dev->msi_cap + PCI_MSI_MASK_64;
diff --git a/include/linux/log2.h b/include/linux/log2.h
index 83a4a3ca3e8a..53a727303dac 100644
--- a/include/linux/log2.h
+++ b/include/linux/log2.h
@@ -47,26 +47,6 @@ bool is_power_of_2(unsigned long n)
 	return (n != 0 && ((n & (n - 1)) == 0));
 }
 
-/**
- * __roundup_pow_of_two() - round up to nearest power of two
- * @n: value to round up
- */
-static inline __attribute__((const))
-unsigned long __roundup_pow_of_two(unsigned long n)
-{
-	return 1UL << fls_long(n - 1);
-}
-
-/**
- * __rounddown_pow_of_two() - round down to nearest power of two
- * @n: value to round down
- */
-static inline __attribute__((const))
-unsigned long __rounddown_pow_of_two(unsigned long n)
-{
-	return 1UL << (fls_long(n) - 1);
-}
-
 /**
  * const_ilog2 - log base 2 of 32-bit or a 64-bit constant unsigned value
  * @n: parameter
@@ -170,14 +150,11 @@ unsigned long __rounddown_pow_of_two(unsigned long n)
  * - the result is undefined when n == 0
  * - this can be used to initialise global variables from constant data
  */
-#define roundup_pow_of_two(n)			\
-(						\
-	__builtin_constant_p(n) ? (		\
-		(n == 1) ? 1 :			\
-		(1UL << (ilog2((n) - 1) + 1))	\
-				   ) :		\
-	__roundup_pow_of_two(n)			\
- )
+#define roundup_pow_of_two(n)			  \
+(						  \
+	(__builtin_constant_p(n) && ((n) == 1)) ? \
+	1 : (1ULL << (ilog2((n) - 1) + 1))        \
+)
 
 /**
  * rounddown_pow_of_two - round the given value down to nearest power of two
@@ -187,12 +164,11 @@ unsigned long __rounddown_pow_of_two(unsigned long n)
  * - the result is undefined when n == 0
  * - this can be used to initialise global variables from constant data
  */
-#define rounddown_pow_of_two(n)			\
-(						\
-	__builtin_constant_p(n) ? (		\
-		(1UL << ilog2(n))) :		\
-	__rounddown_pow_of_two(n)		\
- )
+#define rounddown_pow_of_two(n)			  \
+(						  \
+	(__builtin_constant_p(n) && ((n) == 1)) ? \
+	1 : (1ULL << (ilog2(n)))		  \
+)
 
 static inline __attribute_const__
 int __order_base_2(unsigned long n)
diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
index 15d70a90b50d..bb9efc6944a4 100644
--- a/kernel/kexec_core.c
+++ b/kernel/kexec_core.c
@@ -1094,7 +1094,8 @@ static int __init crash_notes_memory_init(void)
 	 * crash_notes is allocated inside one physical page.
 	 */
 	size = sizeof(note_buf_t);
-	align = min(roundup_pow_of_two(sizeof(note_buf_t)), PAGE_SIZE);
+	align = min(roundup_pow_of_two(sizeof(note_buf_t)),
+		    (unsigned long long)PAGE_SIZE);
 
 	/*
 	 * Break compile if size is bigger than PAGE_SIZE since crash_notes
diff --git a/lib/rhashtable.c b/lib/rhashtable.c
index bdb7e4cadf05..70908678c7a8 100644
--- a/lib/rhashtable.c
+++ b/lib/rhashtable.c
@@ -950,7 +950,7 @@ static size_t rounded_hashtable_size(const struct rhashtable_params *params)
 
 	if (params->nelem_hint)
 		retsize = max(roundup_pow_of_two(params->nelem_hint * 4 / 3),
-			      (unsigned long)params->min_size);
+			      (unsigned long long)params->min_size);
 	else
 		retsize = max(HASH_DEFAULT_SIZE,
 			      (unsigned long)params->min_size);
diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
index 77c7dd7f05e8..78fb8ccabddd 100644
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@ -1015,7 +1015,7 @@ struct rpcrdma_req *rpcrdma_req_create(struct rpcrdma_xprt *r_xprt, size_t size,
 	maxhdrsize = rpcrdma_fixed_maxsz + 3 +
 		     r_xprt->rx_ia.ri_max_segs * rpcrdma_readchunk_maxsz;
 	maxhdrsize *= sizeof(__be32);
-	rb = rpcrdma_regbuf_alloc(__roundup_pow_of_two(maxhdrsize),
+	rb = rpcrdma_regbuf_alloc(roundup_pow_of_two(maxhdrsize),
 				  DMA_TO_DEVICE, flags);
 	if (!rb)
 		goto out2;
-- 
2.24.0


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v4 8/8] linux/log2.h: Use roundup/dow_pow_two() on 64bit calculations
  2019-12-03 11:47 [PATCH v4 0/8] Raspberry Pi 4 PCIe support Nicolas Saenz Julienne
  2019-12-03 11:47 ` [PATCH v4 7/8] linux/log2.h: Fix 64bit calculations in roundup/down_pow_two() Nicolas Saenz Julienne
@ 2019-12-03 11:47 ` Nicolas Saenz Julienne
  2019-12-03 15:53   ` Rob Herring
  2019-12-05 20:38   ` Bjorn Helgaas
  1 sibling, 2 replies; 15+ messages in thread
From: Nicolas Saenz Julienne @ 2019-12-03 11:47 UTC (permalink / raw)
  To: andrew.murray, maz, linux-kernel, Lorenzo Pieralisi, Hanjun Guo,
	Sudeep Holla, Tariq Toukan, Rob Herring, Frank Rowand,
	Florian Fainelli, bcm-kernel-feedback-list, Eric Anholt,
	Stefan Wahren, Shawn Lin, Heiko Stuebner, Christoph Hellwig,
	Marek Szyprowski, Robin Murphy
  Cc: james.quinlan, mbrugger, phil, jeremy.linton, linux-pci,
	linux-rpi-kernel, Nicolas Saenz Julienne, Rafael J. Wysocki,
	Len Brown, David S. Miller, Bjorn Helgaas, linux-acpi,
	linux-arm-kernel, netdev, linux-rdma, devicetree, linux-rockchip,
	iommu

The function now is safe to use while expecting a 64bit value. Use it
where relevant.

Signed-off-by: Nicolas Saenz Julienne <nsaenzjulienne@suse.de>
---
 drivers/acpi/arm64/iort.c                        | 2 +-
 drivers/net/ethernet/mellanox/mlx4/en_clock.c    | 3 ++-
 drivers/of/device.c                              | 3 ++-
 drivers/pci/controller/cadence/pcie-cadence-ep.c | 3 ++-
 drivers/pci/controller/cadence/pcie-cadence.c    | 3 ++-
 drivers/pci/controller/pcie-brcmstb.c            | 3 ++-
 drivers/pci/controller/pcie-rockchip-ep.c        | 5 +++--
 kernel/dma/direct.c                              | 2 +-
 8 files changed, 15 insertions(+), 9 deletions(-)

diff --git a/drivers/acpi/arm64/iort.c b/drivers/acpi/arm64/iort.c
index 33f71983e001..9950c9757092 100644
--- a/drivers/acpi/arm64/iort.c
+++ b/drivers/acpi/arm64/iort.c
@@ -1090,7 +1090,7 @@ void iort_dma_setup(struct device *dev, u64 *dma_addr, u64 *dma_size)
 		 * firmware.
 		 */
 		end = dmaaddr + size - 1;
-		mask = DMA_BIT_MASK(ilog2(end) + 1);
+		mask = roundup_pow_of_two(end) - 1;
 		dev->bus_dma_limit = end;
 		dev->coherent_dma_mask = mask;
 		*dev->dma_mask = mask;
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_clock.c b/drivers/net/ethernet/mellanox/mlx4/en_clock.c
index 024788549c25..23dcb18224d4 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_clock.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_clock.c
@@ -33,6 +33,7 @@
 
 #include <linux/mlx4/device.h>
 #include <linux/clocksource.h>
+#include <linux/log2.h>
 
 #include "mlx4_en.h"
 
@@ -252,7 +253,7 @@ static u32 freq_to_shift(u16 freq)
 {
 	u32 freq_khz = freq * 1000;
 	u64 max_val_cycles = freq_khz * 1000 * MLX4_EN_WRAP_AROUND_SEC;
-	u64 max_val_cycles_rounded = 1ULL << fls64(max_val_cycles - 1);
+	u64 max_val_cycles_rounded = roundup_pow_of_two(max_val_cycles);
 	/* calculate max possible multiplier in order to fit in 64bit */
 	u64 max_mul = div64_u64(ULLONG_MAX, max_val_cycles_rounded);
 
diff --git a/drivers/of/device.c b/drivers/of/device.c
index e9127db7b067..7259922d2078 100644
--- a/drivers/of/device.c
+++ b/drivers/of/device.c
@@ -11,6 +11,7 @@
 #include <linux/mod_devicetable.h>
 #include <linux/slab.h>
 #include <linux/platform_device.h>
+#include <linux/log2.h>
 
 #include <asm/errno.h>
 #include "of_private.h"
@@ -149,7 +150,7 @@ int of_dma_configure(struct device *dev, struct device_node *np, bool force_dma)
 	 * set by the driver.
 	 */
 	end = dma_addr + size - 1;
-	mask = DMA_BIT_MASK(ilog2(end) + 1);
+	mask = roundup_pow_of_two(end) - 1;
 	dev->coherent_dma_mask &= mask;
 	*dev->dma_mask &= mask;
 	/* ...but only set bus limit if we found valid dma-ranges earlier */
diff --git a/drivers/pci/controller/cadence/pcie-cadence-ep.c b/drivers/pci/controller/cadence/pcie-cadence-ep.c
index 1c173dad67d1..72eda0b2f939 100644
--- a/drivers/pci/controller/cadence/pcie-cadence-ep.c
+++ b/drivers/pci/controller/cadence/pcie-cadence-ep.c
@@ -10,6 +10,7 @@
 #include <linux/platform_device.h>
 #include <linux/pm_runtime.h>
 #include <linux/sizes.h>
+#include <linux/log2.h>
 
 #include "pcie-cadence.h"
 
@@ -65,7 +66,7 @@ static int cdns_pcie_ep_set_bar(struct pci_epc *epc, u8 fn,
 	 * roundup_pow_of_two() returns an unsigned long, which is not suited
 	 * for 64bit values.
 	 */
-	sz = 1ULL << fls64(sz - 1);
+	sz = roundup_pow_of_two(sz);
 	aperture = ilog2(sz) - 7; /* 128B -> 0, 256B -> 1, 512B -> 2, ... */
 
 	if ((flags & PCI_BASE_ADDRESS_SPACE) == PCI_BASE_ADDRESS_SPACE_IO) {
diff --git a/drivers/pci/controller/cadence/pcie-cadence.c b/drivers/pci/controller/cadence/pcie-cadence.c
index cd795f6fc1e2..b1689f725b41 100644
--- a/drivers/pci/controller/cadence/pcie-cadence.c
+++ b/drivers/pci/controller/cadence/pcie-cadence.c
@@ -4,6 +4,7 @@
 // Author: Cyrille Pitchen <cyrille.pitchen@free-electrons.com>
 
 #include <linux/kernel.h>
+#include <linux/log2.h>
 
 #include "pcie-cadence.h"
 
@@ -15,7 +16,7 @@ void cdns_pcie_set_outbound_region(struct cdns_pcie *pcie, u8 fn,
 	 * roundup_pow_of_two() returns an unsigned long, which is not suited
 	 * for 64bit values.
 	 */
-	u64 sz = 1ULL << fls64(size - 1);
+	u64 sz = roundup_pow_of_two(size);
 	int nbits = ilog2(sz);
 	u32 addr0, addr1, desc0, desc1;
 
diff --git a/drivers/pci/controller/pcie-brcmstb.c b/drivers/pci/controller/pcie-brcmstb.c
index 7ba06a0e1a71..e705d9d73030 100644
--- a/drivers/pci/controller/pcie-brcmstb.c
+++ b/drivers/pci/controller/pcie-brcmstb.c
@@ -627,7 +627,8 @@ static inline int brcm_pcie_get_rc_bar2_size_and_offset(struct brcm_pcie *pcie,
 		return -ENODEV;
 
 	*rc_bar2_offset = -entry->offset;
-	*rc_bar2_size = 1ULL << fls64(entry->res->end - entry->res->start);
+	*rc_bar2_size = roundup_pow_of_two(entry->res->end -
+					   entry->res->start + 1);
 
 	/*
 	 * We validate the inbound memory view even though we should trust
diff --git a/drivers/pci/controller/pcie-rockchip-ep.c b/drivers/pci/controller/pcie-rockchip-ep.c
index d743b0a48988..83665f5f804a 100644
--- a/drivers/pci/controller/pcie-rockchip-ep.c
+++ b/drivers/pci/controller/pcie-rockchip-ep.c
@@ -16,6 +16,7 @@
 #include <linux/platform_device.h>
 #include <linux/pci-epf.h>
 #include <linux/sizes.h>
+#include <linux/log2.h>
 
 #include "pcie-rockchip.h"
 
@@ -70,7 +71,7 @@ static void rockchip_pcie_prog_ep_ob_atu(struct rockchip_pcie *rockchip, u8 fn,
 					 u32 r, u32 type, u64 cpu_addr,
 					 u64 pci_addr, size_t size)
 {
-	u64 sz = 1ULL << fls64(size - 1);
+	u64 sz = roundup_pow_of_two(size);
 	int num_pass_bits = ilog2(sz);
 	u32 addr0, addr1, desc0, desc1;
 	bool is_nor_msg = (type == AXI_WRAPPER_NOR_MSG);
@@ -176,7 +177,7 @@ static int rockchip_pcie_ep_set_bar(struct pci_epc *epc, u8 fn,
 	 * roundup_pow_of_two() returns an unsigned long, which is not suited
 	 * for 64bit values.
 	 */
-	sz = 1ULL << fls64(sz - 1);
+	sz = roundup_pow_of_two(sz);
 	aperture = ilog2(sz) - 7; /* 128B -> 0, 256B -> 1, 512B -> 2, ... */
 
 	if ((flags & PCI_BASE_ADDRESS_SPACE) == PCI_BASE_ADDRESS_SPACE_IO) {
diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index 6af7ae83c4ad..056886c4efec 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -53,7 +53,7 @@ u64 dma_direct_get_required_mask(struct device *dev)
 {
 	u64 max_dma = phys_to_dma_direct(dev, (max_pfn - 1) << PAGE_SHIFT);
 
-	return (1ULL << (fls64(max_dma) - 1)) * 2 - 1;
+	return rounddown_pow_of_two(max_dma) * 2 - 1;
 }
 
 static gfp_t __dma_direct_optimal_gfp_mask(struct device *dev, u64 dma_mask,
-- 
2.24.0


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH v4 8/8] linux/log2.h: Use roundup/dow_pow_two() on 64bit calculations
  2019-12-03 11:47 ` [PATCH v4 8/8] linux/log2.h: Use roundup/dow_pow_two() on 64bit calculations Nicolas Saenz Julienne
@ 2019-12-03 15:53   ` Rob Herring
  2019-12-03 16:06     ` Nicolas Saenz Julienne
  2019-12-05 20:38   ` Bjorn Helgaas
  1 sibling, 1 reply; 15+ messages in thread
From: Rob Herring @ 2019-12-03 15:53 UTC (permalink / raw)
  To: Nicolas Saenz Julienne
  Cc: Andrew Murray, Marc Zyngier, linux-kernel, Lorenzo Pieralisi,
	Hanjun Guo, Sudeep Holla, Tariq Toukan, Frank Rowand,
	Florian Fainelli, maintainer:BROADCOM BCM7XXX ARM ARCHITECTURE,
	Eric Anholt, Stefan Wahren, Shawn Lin, Heiko Stuebner,
	Christoph Hellwig, Marek Szyprowski, Robin Murphy, james.quinlan,
	Matthias Brugger, Phil Elwell, Jeremy Linton, PCI,
	moderated list:BROADCOM BCM2835 ARM ARCHITECTURE,
	Rafael J. Wysocki, Len Brown, David S. Miller, Bjorn Helgaas,
	linux-acpi,
	moderated list:ARM/FREESCALE IMX / MXC ARM ARCHITECTURE, netdev,
	linux-rdma, devicetree, open list:ARM/Rockchip SoC...,
	Linux IOMMU

On Tue, Dec 3, 2019 at 5:48 AM Nicolas Saenz Julienne
<nsaenzjulienne@suse.de> wrote:
>
> The function now is safe to use while expecting a 64bit value. Use it
> where relevant.

What was wrong with the existing code? This is missing some context.

> Signed-off-by: Nicolas Saenz Julienne <nsaenzjulienne@suse.de>
> ---
>  drivers/acpi/arm64/iort.c                        | 2 +-
>  drivers/net/ethernet/mellanox/mlx4/en_clock.c    | 3 ++-
>  drivers/of/device.c                              | 3 ++-

In any case,

Acked-by: Rob Herring <robh@kernel.org>

>  drivers/pci/controller/cadence/pcie-cadence-ep.c | 3 ++-
>  drivers/pci/controller/cadence/pcie-cadence.c    | 3 ++-
>  drivers/pci/controller/pcie-brcmstb.c            | 3 ++-
>  drivers/pci/controller/pcie-rockchip-ep.c        | 5 +++--
>  kernel/dma/direct.c                              | 2 +-
>  8 files changed, 15 insertions(+), 9 deletions(-)

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v4 8/8] linux/log2.h: Use roundup/dow_pow_two() on 64bit calculations
  2019-12-03 15:53   ` Rob Herring
@ 2019-12-03 16:06     ` Nicolas Saenz Julienne
  0 siblings, 0 replies; 15+ messages in thread
From: Nicolas Saenz Julienne @ 2019-12-03 16:06 UTC (permalink / raw)
  To: Rob Herring
  Cc: Andrew Murray, Marc Zyngier, linux-kernel, Lorenzo Pieralisi,
	Hanjun Guo, Sudeep Holla, Tariq Toukan, Frank Rowand,
	Florian Fainelli, maintainer:BROADCOM BCM7XXX ARM ARCHITECTURE,
	Eric Anholt, Stefan Wahren, Shawn Lin, Heiko Stuebner,
	Christoph Hellwig, Marek Szyprowski, Robin Murphy, james.quinlan,
	Matthias Brugger, Phil Elwell, Jeremy Linton, PCI,
	moderated list:BROADCOM BCM2835 ARM ARCHITECTURE,
	Rafael J. Wysocki, Len Brown, David S. Miller, Bjorn Helgaas,
	linux-acpi,
	moderated list:ARM/FREESCALE IMX / MXC ARM ARCHITECTURE, netdev,
	linux-rdma, devicetree, open list:ARM/Rockchip SoC...,
	Linux IOMMU

[-- Attachment #1: Type: text/plain, Size: 1083 bytes --]

Hi Rob,

On Tue, 2019-12-03 at 09:53 -0600, Rob Herring wrote:
> On Tue, Dec 3, 2019 at 5:48 AM Nicolas Saenz Julienne
> <nsaenzjulienne@suse.de> wrote:
> > The function now is safe to use while expecting a 64bit value. Use it
> > where relevant.
> 
> What was wrong with the existing code? This is missing some context.

You're right, I'll update it.

For most of files changed the benefit here is factoring out a common pattern
using the standard function roundup/down_pow_two() which now provides correct
64bit results.

As for of/device.c and arm64/iort.c it's more of a readability enhancement. I
consider it's easier to understand than the current calculation as it abstracts
the math.

> > Signed-off-by: Nicolas Saenz Julienne <nsaenzjulienne@suse.de>
> > ---
> >  drivers/acpi/arm64/iort.c                        | 2 +-
> >  drivers/net/ethernet/mellanox/mlx4/en_clock.c    | 3 ++-
> >  drivers/of/device.c                              | 3 ++-
> 
> In any case,
> 
> Acked-by: Rob Herring <robh@kernel.org>
> 

Thanks!

Regards,
Nicolas


[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v4 7/8] linux/log2.h: Fix 64bit calculations in roundup/down_pow_two()
  2019-12-03 11:47 ` [PATCH v4 7/8] linux/log2.h: Fix 64bit calculations in roundup/down_pow_two() Nicolas Saenz Julienne
@ 2019-12-03 16:39   ` Chuck Lever
  2019-12-04 10:56   ` Martin Habets
                     ` (4 subsequent siblings)
  5 siblings, 0 replies; 15+ messages in thread
From: Chuck Lever @ 2019-12-03 16:39 UTC (permalink / raw)
  To: Nicolas Saenz Julienne
  Cc: andrew.murray, maz, Linux Kernel Mailing List, Michael Turquette,
	Stephen Boyd, Emilio López, Maxime Ripard, Chen-Yu Tsai,
	Mike Marciniszyn, Dennis Dalessandro, Yishai Hadas, Moni Shoua,
	David Woodhouse, Lu Baolu, Joerg Roedel, Tom Lendacky,
	Mirko Lindner, Stephen Hemminger, Jiri Pirko,
	Solarflare linux maintainers, Edward Cree, Martin Habets,
	Bjorn Helgaas, Eric W. Biederman, Thomas Graf, Herbert Xu,
	james.quinlan, mbrugger, f.fainelli, phil, wahrenst,
	jeremy.linton, linux-pci, linux-rpi-kernel, Robin Murphy,
	Doug Ledford, Jason Gunthorpe, David S. Miller, Trond Myklebust,
	Anna Schumaker, Bruce Fields, linux-clk, linux-arm-kernel,
	linux-rdma, iommu, netdev, kexec, Linux NFS Mailing List



> On Dec 3, 2019, at 6:47 AM, Nicolas Saenz Julienne <nsaenzjulienne@suse.de> wrote:
> 
> Some users need to make sure their rounding function accepts and returns
> 64bit long variables regardless of the architecture. Sadly
> roundup/rounddown_pow_two() takes and returns unsigned longs. It turns
> out ilog2() already handles 32/64bit calculations properly, and being
> the building block to the round functions we can rework them as a
> wrapper around it.
> 
> Suggested-by: Robin Murphy <robin.murphy@arm.con>
> Signed-off-by: Nicolas Saenz Julienne <nsaenzjulienne@suse.de>
> ---
> drivers/clk/clk-divider.c                    |  8 ++--
> drivers/clk/sunxi/clk-sunxi.c                |  2 +-
> drivers/infiniband/hw/hfi1/chip.c            |  4 +-
> drivers/infiniband/hw/hfi1/init.c            |  4 +-
> drivers/infiniband/hw/mlx4/srq.c             |  2 +-
> drivers/infiniband/hw/mthca/mthca_srq.c      |  2 +-
> drivers/infiniband/sw/rxe/rxe_qp.c           |  4 +-
> drivers/iommu/intel-iommu.c                  |  4 +-
> drivers/iommu/intel-svm.c                    |  4 +-
> drivers/iommu/intel_irq_remapping.c          |  2 +-
> drivers/net/ethernet/amd/xgbe/xgbe-ethtool.c |  4 +-
> drivers/net/ethernet/marvell/sky2.c          |  2 +-
> drivers/net/ethernet/rocker/rocker_hw.h      |  4 +-
> drivers/net/ethernet/sfc/ef10.c              |  2 +-
> drivers/net/ethernet/sfc/efx.h               |  2 +-
> drivers/net/ethernet/sfc/falcon/efx.h        |  2 +-
> drivers/pci/msi.c                            |  2 +-
> include/linux/log2.h                         | 44 +++++---------------
> kernel/kexec_core.c                          |  3 +-
> lib/rhashtable.c                             |  2 +-
> net/sunrpc/xprtrdma/verbs.c                  |  2 +-
> 21 files changed, 41 insertions(+), 64 deletions(-)
> 
> diff --git a/drivers/clk/clk-divider.c b/drivers/clk/clk-divider.c
> index 098b2b01f0af..ba947e4c8193 100644
> --- a/drivers/clk/clk-divider.c
> +++ b/drivers/clk/clk-divider.c
> @@ -222,7 +222,7 @@ static int _div_round_up(const struct clk_div_table *table,
> 	int div = DIV_ROUND_UP_ULL((u64)parent_rate, rate);
> 
> 	if (flags & CLK_DIVIDER_POWER_OF_TWO)
> -		div = __roundup_pow_of_two(div);
> +		div = roundup_pow_of_two(div);
> 	if (table)
> 		div = _round_up_table(table, div);
> 
> @@ -240,8 +240,8 @@ static int _div_round_closest(const struct clk_div_table *table,
> 	down = parent_rate / rate;
> 
> 	if (flags & CLK_DIVIDER_POWER_OF_TWO) {
> -		up = __roundup_pow_of_two(up);
> -		down = __rounddown_pow_of_two(down);
> +		up = roundup_pow_of_two(up);
> +		down = rounddown_pow_of_two(down);
> 	} else if (table) {
> 		up = _round_up_table(table, up);
> 		down = _round_down_table(table, down);
> @@ -278,7 +278,7 @@ static int _next_div(const struct clk_div_table *table, int div,
> 	div++;
> 
> 	if (flags & CLK_DIVIDER_POWER_OF_TWO)
> -		return __roundup_pow_of_two(div);
> +		return roundup_pow_of_two(div);
> 	if (table)
> 		return _round_up_table(table, div);
> 
> diff --git a/drivers/clk/sunxi/clk-sunxi.c b/drivers/clk/sunxi/clk-sunxi.c
> index 27201fd26e44..faec99dc09c0 100644
> --- a/drivers/clk/sunxi/clk-sunxi.c
> +++ b/drivers/clk/sunxi/clk-sunxi.c
> @@ -311,7 +311,7 @@ static void sun6i_get_ahb1_factors(struct factors_request *req)
> 
> 		calcm = DIV_ROUND_UP(div, 1 << calcp);
> 	} else {
> -		calcp = __roundup_pow_of_two(div);
> +		calcp = roundup_pow_of_two(div);
> 		calcp = calcp > 3 ? 3 : calcp;
> 	}
> 
> diff --git a/drivers/infiniband/hw/hfi1/chip.c b/drivers/infiniband/hw/hfi1/chip.c
> index 9b1fb84a3d45..96b1d343c32f 100644
> --- a/drivers/infiniband/hw/hfi1/chip.c
> +++ b/drivers/infiniband/hw/hfi1/chip.c
> @@ -14199,10 +14199,10 @@ static int qos_rmt_entries(struct hfi1_devdata *dd, unsigned int *mp,
> 			max_by_vl = krcvqs[i];
> 	if (max_by_vl > 32)
> 		goto no_qos;
> -	m = ilog2(__roundup_pow_of_two(max_by_vl));
> +	m = ilog2(roundup_pow_of_two(max_by_vl));
> 
> 	/* determine bits for vl */
> -	n = ilog2(__roundup_pow_of_two(num_vls));
> +	n = ilog2(roundup_pow_of_two(num_vls));
> 
> 	/* reject if too much is used */
> 	if ((m + n) > 7)
> diff --git a/drivers/infiniband/hw/hfi1/init.c b/drivers/infiniband/hw/hfi1/init.c
> index 26b792bb1027..838c789c7cce 100644
> --- a/drivers/infiniband/hw/hfi1/init.c
> +++ b/drivers/infiniband/hw/hfi1/init.c
> @@ -467,7 +467,7 @@ int hfi1_create_ctxtdata(struct hfi1_pportdata *ppd, int numa,
> 		 * MTU supported.
> 		 */
> 		if (rcd->egrbufs.size < hfi1_max_mtu) {
> -			rcd->egrbufs.size = __roundup_pow_of_two(hfi1_max_mtu);
> +			rcd->egrbufs.size = roundup_pow_of_two(hfi1_max_mtu);
> 			hfi1_cdbg(PROC,
> 				  "ctxt%u: eager bufs size too small. Adjusting to %u\n",
> 				    rcd->ctxt, rcd->egrbufs.size);
> @@ -1959,7 +1959,7 @@ int hfi1_setup_eagerbufs(struct hfi1_ctxtdata *rcd)
> 	 * to satisfy the "multiple of 8 RcvArray entries" requirement.
> 	 */
> 	if (rcd->egrbufs.size <= (1 << 20))
> -		rcd->egrbufs.rcvtid_size = max((unsigned long)round_mtu,
> +		rcd->egrbufs.rcvtid_size = max((unsigned long long)round_mtu,
> 			rounddown_pow_of_two(rcd->egrbufs.size / 8));
> 
> 	while (alloced_bytes < rcd->egrbufs.size &&
> diff --git a/drivers/infiniband/hw/mlx4/srq.c b/drivers/infiniband/hw/mlx4/srq.c
> index 8dcf6e3d9ae2..7e685600a7b3 100644
> --- a/drivers/infiniband/hw/mlx4/srq.c
> +++ b/drivers/infiniband/hw/mlx4/srq.c
> @@ -96,7 +96,7 @@ int mlx4_ib_create_srq(struct ib_srq *ib_srq,
> 	srq->msrq.max    = roundup_pow_of_two(init_attr->attr.max_wr + 1);
> 	srq->msrq.max_gs = init_attr->attr.max_sge;
> 
> -	desc_size = max(32UL,
> +	desc_size = max(32ULL,
> 			roundup_pow_of_two(sizeof (struct mlx4_wqe_srq_next_seg) +
> 					   srq->msrq.max_gs *
> 					   sizeof (struct mlx4_wqe_data_seg)));
> diff --git a/drivers/infiniband/hw/mthca/mthca_srq.c b/drivers/infiniband/hw/mthca/mthca_srq.c
> index a85935ccce88..0c2e14b4142a 100644
> --- a/drivers/infiniband/hw/mthca/mthca_srq.c
> +++ b/drivers/infiniband/hw/mthca/mthca_srq.c
> @@ -225,7 +225,7 @@ int mthca_alloc_srq(struct mthca_dev *dev, struct mthca_pd *pd,
> 	else
> 		srq->max = srq->max + 1;
> 
> -	ds = max(64UL,
> +	ds = max(64ULL,
> 		 roundup_pow_of_two(sizeof (struct mthca_next_seg) +
> 				    srq->max_gs * sizeof (struct mthca_data_seg)));
> 
> diff --git a/drivers/infiniband/sw/rxe/rxe_qp.c b/drivers/infiniband/sw/rxe/rxe_qp.c
> index e2c6d1cedf41..040b707b0877 100644
> --- a/drivers/infiniband/sw/rxe/rxe_qp.c
> +++ b/drivers/infiniband/sw/rxe/rxe_qp.c
> @@ -592,7 +592,7 @@ int rxe_qp_from_attr(struct rxe_qp *qp, struct ib_qp_attr *attr, int mask,
> 	int err;
> 
> 	if (mask & IB_QP_MAX_QP_RD_ATOMIC) {
> -		int max_rd_atomic = __roundup_pow_of_two(attr->max_rd_atomic);
> +		int max_rd_atomic = roundup_pow_of_two(attr->max_rd_atomic);
> 
> 		qp->attr.max_rd_atomic = max_rd_atomic;
> 		atomic_set(&qp->req.rd_atomic, max_rd_atomic);
> @@ -600,7 +600,7 @@ int rxe_qp_from_attr(struct rxe_qp *qp, struct ib_qp_attr *attr, int mask,
> 
> 	if (mask & IB_QP_MAX_DEST_RD_ATOMIC) {
> 		int max_dest_rd_atomic =
> -			__roundup_pow_of_two(attr->max_dest_rd_atomic);
> +			roundup_pow_of_two(attr->max_dest_rd_atomic);
> 
> 		qp->attr.max_dest_rd_atomic = max_dest_rd_atomic;
> 
> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> index 0c8d81f56a30..ce7c900bd666 100644
> --- a/drivers/iommu/intel-iommu.c
> +++ b/drivers/iommu/intel-iommu.c
> @@ -1488,7 +1488,7 @@ static void iommu_flush_iotlb_psi(struct intel_iommu *iommu,
> 				  unsigned long pfn, unsigned int pages,
> 				  int ih, int map)
> {
> -	unsigned int mask = ilog2(__roundup_pow_of_two(pages));
> +	unsigned int mask = ilog2(roundup_pow_of_two(pages));
> 	uint64_t addr = (uint64_t)pfn << VTD_PAGE_SHIFT;
> 	u16 did = domain->iommu_did[iommu->seq_id];
> 
> @@ -3390,7 +3390,7 @@ static unsigned long intel_alloc_iova(struct device *dev,
> 	/* Restrict dma_mask to the width that the iommu can handle */
> 	dma_mask = min_t(uint64_t, DOMAIN_MAX_ADDR(domain->gaw), dma_mask);
> 	/* Ensure we reserve the whole size-aligned region */
> -	nrpages = __roundup_pow_of_two(nrpages);
> +	nrpages = roundup_pow_of_two(nrpages);
> 
> 	if (!dmar_forcedac && dma_mask > DMA_BIT_MASK(32)) {
> 		/*
> diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c
> index 9b159132405d..602caca3cd1a 100644
> --- a/drivers/iommu/intel-svm.c
> +++ b/drivers/iommu/intel-svm.c
> @@ -115,7 +115,7 @@ static void intel_flush_svm_range_dev (struct intel_svm *svm, struct intel_svm_d
> 			QI_EIOTLB_TYPE;
> 		desc.qw1 = 0;
> 	} else {
> -		int mask = ilog2(__roundup_pow_of_two(pages));
> +		int mask = ilog2(roundup_pow_of_two(pages));
> 
> 		desc.qw0 = QI_EIOTLB_PASID(svm->pasid) |
> 				QI_EIOTLB_DID(sdev->did) |
> @@ -142,7 +142,7 @@ static void intel_flush_svm_range_dev (struct intel_svm *svm, struct intel_svm_d
> 			 * for example, an "address" value of 0x12345f000 will
> 			 * flush from 0x123440000 to 0x12347ffff (256KiB). */
> 			unsigned long last = address + ((unsigned long)(pages - 1) << VTD_PAGE_SHIFT);
> -			unsigned long mask = __rounddown_pow_of_two(address ^ last);
> +			unsigned long mask = rounddown_pow_of_two(address ^ last);
> 
> 			desc.qw1 = QI_DEV_EIOTLB_ADDR((address & ~mask) |
> 					(mask - 1)) | QI_DEV_EIOTLB_SIZE;
> diff --git a/drivers/iommu/intel_irq_remapping.c b/drivers/iommu/intel_irq_remapping.c
> index 81e43c1df7ec..935657b2c661 100644
> --- a/drivers/iommu/intel_irq_remapping.c
> +++ b/drivers/iommu/intel_irq_remapping.c
> @@ -113,7 +113,7 @@ static int alloc_irte(struct intel_iommu *iommu,
> 		return -1;
> 
> 	if (count > 1) {
> -		count = __roundup_pow_of_two(count);
> +		count = roundup_pow_of_two(count);
> 		mask = ilog2(count);
> 	}
> 
> diff --git a/drivers/net/ethernet/amd/xgbe/xgbe-ethtool.c b/drivers/net/ethernet/amd/xgbe/xgbe-ethtool.c
> index 6a757dadb5f1..fd5b12c23eaa 100644
> --- a/drivers/net/ethernet/amd/xgbe/xgbe-ethtool.c
> +++ b/drivers/net/ethernet/amd/xgbe/xgbe-ethtool.c
> @@ -680,13 +680,13 @@ static int xgbe_set_ringparam(struct net_device *netdev,
> 		return -EINVAL;
> 	}
> 
> -	rx = __rounddown_pow_of_two(ringparam->rx_pending);
> +	rx = rounddown_pow_of_two(ringparam->rx_pending);
> 	if (rx != ringparam->rx_pending)
> 		netdev_notice(netdev,
> 			      "rx ring parameter rounded to power of two: %u\n",
> 			      rx);
> 
> -	tx = __rounddown_pow_of_two(ringparam->tx_pending);
> +	tx = rounddown_pow_of_two(ringparam->tx_pending);
> 	if (tx != ringparam->tx_pending)
> 		netdev_notice(netdev,
> 			      "tx ring parameter rounded to power of two: %u\n",
> diff --git a/drivers/net/ethernet/marvell/sky2.c b/drivers/net/ethernet/marvell/sky2.c
> index 5f56ee83e3b1..cc3a03b4a611 100644
> --- a/drivers/net/ethernet/marvell/sky2.c
> +++ b/drivers/net/ethernet/marvell/sky2.c
> @@ -4139,7 +4139,7 @@ static int sky2_set_coalesce(struct net_device *dev,
>  */
> static unsigned long roundup_ring_size(unsigned long pending)
> {
> -	return max(128ul, roundup_pow_of_two(pending+1));
> +	return max(128ull, roundup_pow_of_two(pending+1));
> }
> 
> static void sky2_get_ringparam(struct net_device *dev,
> diff --git a/drivers/net/ethernet/rocker/rocker_hw.h b/drivers/net/ethernet/rocker/rocker_hw.h
> index 59f1f8b690d2..d8de15509e2c 100644
> --- a/drivers/net/ethernet/rocker/rocker_hw.h
> +++ b/drivers/net/ethernet/rocker/rocker_hw.h
> @@ -88,8 +88,8 @@ enum rocker_dma_type {
> };
> 
> /* Rocker DMA ring size limits and default sizes */
> -#define ROCKER_DMA_SIZE_MIN		2ul
> -#define ROCKER_DMA_SIZE_MAX		65536ul
> +#define ROCKER_DMA_SIZE_MIN		2ull
> +#define ROCKER_DMA_SIZE_MAX		65536ull
> #define ROCKER_DMA_CMD_DEFAULT_SIZE	32ul
> #define ROCKER_DMA_EVENT_DEFAULT_SIZE	32ul
> #define ROCKER_DMA_TX_DEFAULT_SIZE	64ul
> diff --git a/drivers/net/ethernet/sfc/ef10.c b/drivers/net/ethernet/sfc/ef10.c
> index 4d9bbccc6f89..4f4d9a5b3b75 100644
> --- a/drivers/net/ethernet/sfc/ef10.c
> +++ b/drivers/net/ethernet/sfc/ef10.c
> @@ -27,7 +27,7 @@ enum {
> };
> /* The maximum size of a shared RSS context */
> /* TODO: this should really be from the mcdi protocol export */
> -#define EFX_EF10_MAX_SHARED_RSS_CONTEXT_SIZE 64UL
> +#define EFX_EF10_MAX_SHARED_RSS_CONTEXT_SIZE 64ULL
> 
> /* The filter table(s) are managed by firmware and we have write-only
>  * access.  When removing filters we must identify them to the
> diff --git a/drivers/net/ethernet/sfc/efx.h b/drivers/net/ethernet/sfc/efx.h
> index 2dd8d5002315..fea2add5860e 100644
> --- a/drivers/net/ethernet/sfc/efx.h
> +++ b/drivers/net/ethernet/sfc/efx.h
> @@ -52,7 +52,7 @@ void efx_schedule_slow_fill(struct efx_rx_queue *rx_queue);
> 
> #define EFX_MAX_DMAQ_SIZE 4096UL
> #define EFX_DEFAULT_DMAQ_SIZE 1024UL
> -#define EFX_MIN_DMAQ_SIZE 512UL
> +#define EFX_MIN_DMAQ_SIZE 512ULL
> 
> #define EFX_MAX_EVQ_SIZE 16384UL
> #define EFX_MIN_EVQ_SIZE 512UL
> diff --git a/drivers/net/ethernet/sfc/falcon/efx.h b/drivers/net/ethernet/sfc/falcon/efx.h
> index d3b4646545fa..0d16257156d6 100644
> --- a/drivers/net/ethernet/sfc/falcon/efx.h
> +++ b/drivers/net/ethernet/sfc/falcon/efx.h
> @@ -55,7 +55,7 @@ void ef4_schedule_slow_fill(struct ef4_rx_queue *rx_queue);
> 
> #define EF4_MAX_DMAQ_SIZE 4096UL
> #define EF4_DEFAULT_DMAQ_SIZE 1024UL
> -#define EF4_MIN_DMAQ_SIZE 512UL
> +#define EF4_MIN_DMAQ_SIZE 512ULL
> 
> #define EF4_MAX_EVQ_SIZE 16384UL
> #define EF4_MIN_EVQ_SIZE 512UL
> diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
> index c7709e49f0e4..f0391e88bc42 100644
> --- a/drivers/pci/msi.c
> +++ b/drivers/pci/msi.c
> @@ -578,7 +578,7 @@ msi_setup_entry(struct pci_dev *dev, int nvec, struct irq_affinity *affd)
> 	entry->msi_attrib.maskbit	= !!(control & PCI_MSI_FLAGS_MASKBIT);
> 	entry->msi_attrib.default_irq	= dev->irq;	/* Save IOAPIC IRQ */
> 	entry->msi_attrib.multi_cap	= (control & PCI_MSI_FLAGS_QMASK) >> 1;
> -	entry->msi_attrib.multiple	= ilog2(__roundup_pow_of_two(nvec));
> +	entry->msi_attrib.multiple	= ilog2(roundup_pow_of_two(nvec));
> 
> 	if (control & PCI_MSI_FLAGS_64BIT)
> 		entry->mask_pos = dev->msi_cap + PCI_MSI_MASK_64;
> diff --git a/include/linux/log2.h b/include/linux/log2.h
> index 83a4a3ca3e8a..53a727303dac 100644
> --- a/include/linux/log2.h
> +++ b/include/linux/log2.h
> @@ -47,26 +47,6 @@ bool is_power_of_2(unsigned long n)
> 	return (n != 0 && ((n & (n - 1)) == 0));
> }
> 
> -/**
> - * __roundup_pow_of_two() - round up to nearest power of two
> - * @n: value to round up
> - */
> -static inline __attribute__((const))
> -unsigned long __roundup_pow_of_two(unsigned long n)
> -{
> -	return 1UL << fls_long(n - 1);
> -}
> -
> -/**
> - * __rounddown_pow_of_two() - round down to nearest power of two
> - * @n: value to round down
> - */
> -static inline __attribute__((const))
> -unsigned long __rounddown_pow_of_two(unsigned long n)
> -{
> -	return 1UL << (fls_long(n) - 1);
> -}
> -
> /**
>  * const_ilog2 - log base 2 of 32-bit or a 64-bit constant unsigned value
>  * @n: parameter
> @@ -170,14 +150,11 @@ unsigned long __rounddown_pow_of_two(unsigned long n)
>  * - the result is undefined when n == 0
>  * - this can be used to initialise global variables from constant data
>  */
> -#define roundup_pow_of_two(n)			\
> -(						\
> -	__builtin_constant_p(n) ? (		\
> -		(n == 1) ? 1 :			\
> -		(1UL << (ilog2((n) - 1) + 1))	\
> -				   ) :		\
> -	__roundup_pow_of_two(n)			\
> - )
> +#define roundup_pow_of_two(n)			  \
> +(						  \
> +	(__builtin_constant_p(n) && ((n) == 1)) ? \
> +	1 : (1ULL << (ilog2((n) - 1) + 1))        \
> +)
> 
> /**
>  * rounddown_pow_of_two - round the given value down to nearest power of two
> @@ -187,12 +164,11 @@ unsigned long __rounddown_pow_of_two(unsigned long n)
>  * - the result is undefined when n == 0
>  * - this can be used to initialise global variables from constant data
>  */
> -#define rounddown_pow_of_two(n)			\
> -(						\
> -	__builtin_constant_p(n) ? (		\
> -		(1UL << ilog2(n))) :		\
> -	__rounddown_pow_of_two(n)		\
> - )
> +#define rounddown_pow_of_two(n)			  \
> +(						  \
> +	(__builtin_constant_p(n) && ((n) == 1)) ? \
> +	1 : (1ULL << (ilog2(n)))		  \
> +)
> 
> static inline __attribute_const__
> int __order_base_2(unsigned long n)
> diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
> index 15d70a90b50d..bb9efc6944a4 100644
> --- a/kernel/kexec_core.c
> +++ b/kernel/kexec_core.c
> @@ -1094,7 +1094,8 @@ static int __init crash_notes_memory_init(void)
> 	 * crash_notes is allocated inside one physical page.
> 	 */
> 	size = sizeof(note_buf_t);
> -	align = min(roundup_pow_of_two(sizeof(note_buf_t)), PAGE_SIZE);
> +	align = min(roundup_pow_of_two(sizeof(note_buf_t)),
> +		    (unsigned long long)PAGE_SIZE);
> 
> 	/*
> 	 * Break compile if size is bigger than PAGE_SIZE since crash_notes
> diff --git a/lib/rhashtable.c b/lib/rhashtable.c
> index bdb7e4cadf05..70908678c7a8 100644
> --- a/lib/rhashtable.c
> +++ b/lib/rhashtable.c
> @@ -950,7 +950,7 @@ static size_t rounded_hashtable_size(const struct rhashtable_params *params)
> 
> 	if (params->nelem_hint)
> 		retsize = max(roundup_pow_of_two(params->nelem_hint * 4 / 3),
> -			      (unsigned long)params->min_size);
> +			      (unsigned long long)params->min_size);
> 	else
> 		retsize = max(HASH_DEFAULT_SIZE,
> 			      (unsigned long)params->min_size);
> diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
> index 77c7dd7f05e8..78fb8ccabddd 100644
> --- a/net/sunrpc/xprtrdma/verbs.c
> +++ b/net/sunrpc/xprtrdma/verbs.c
> @@ -1015,7 +1015,7 @@ struct rpcrdma_req *rpcrdma_req_create(struct rpcrdma_xprt *r_xprt, size_t size,
> 	maxhdrsize = rpcrdma_fixed_maxsz + 3 +
> 		     r_xprt->rx_ia.ri_max_segs * rpcrdma_readchunk_maxsz;
> 	maxhdrsize *= sizeof(__be32);
> -	rb = rpcrdma_regbuf_alloc(__roundup_pow_of_two(maxhdrsize),
> +	rb = rpcrdma_regbuf_alloc(roundup_pow_of_two(maxhdrsize),
> 				  DMA_TO_DEVICE, flags);
> 	if (!rb)
> 		goto out2;

For the xprtrdma chunk:

Reviewed-by: Chuck Lever <chuck.lever@oracle.com>

--
Chuck Lever




^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v4 7/8] linux/log2.h: Fix 64bit calculations in roundup/down_pow_two()
  2019-12-03 11:47 ` [PATCH v4 7/8] linux/log2.h: Fix 64bit calculations in roundup/down_pow_two() Nicolas Saenz Julienne
  2019-12-03 16:39   ` Chuck Lever
@ 2019-12-04 10:56   ` Martin Habets
  2019-12-04 14:17   ` Leon Romanovsky
                     ` (3 subsequent siblings)
  5 siblings, 0 replies; 15+ messages in thread
From: Martin Habets @ 2019-12-04 10:56 UTC (permalink / raw)
  To: Nicolas Saenz Julienne, andrew.murray, maz, linux-kernel,
	Michael Turquette, Stephen Boyd, Emilio López,
	Maxime Ripard, Chen-Yu Tsai, Mike Marciniszyn,
	Dennis Dalessandro, Yishai Hadas, Moni Shoua, David Woodhouse,
	Lu Baolu, Joerg Roedel, Tom Lendacky, Mirko Lindner,
	Stephen Hemminger, Jiri Pirko, Solarflare linux maintainers,
	Edward Cree, Bjorn Helgaas, Eric Biederman, Thomas Graf,
	Herbert Xu
  Cc: james.quinlan, mbrugger, f.fainelli, phil, wahrenst,
	jeremy.linton, linux-pci, linux-rpi-kernel, Robin Murphy,
	Doug Ledford, Jason Gunthorpe, David S. Miller, Trond Myklebust,
	Anna Schumaker, J. Bruce Fields, Chuck Lever, linux-clk,
	linux-arm-kernel, linux-rdma, iommu, netdev, kexec, linux-nfs

For the changes under drivers/net/ethernet/sfc:
Reviewed-by: Martin Habets <mhabets@solarflare.com>

On 03/12/2019 11:47, Nicolas Saenz Julienne wrote:
> Some users need to make sure their rounding function accepts and returns
> 64bit long variables regardless of the architecture. Sadly
> roundup/rounddown_pow_two() takes and returns unsigned longs. It turns
> out ilog2() already handles 32/64bit calculations properly, and being
> the building block to the round functions we can rework them as a
> wrapper around it.
> 
> Suggested-by: Robin Murphy <robin.murphy@arm.con>
> Signed-off-by: Nicolas Saenz Julienne <nsaenzjulienne@suse.de>
> ---
>  drivers/clk/clk-divider.c                    |  8 ++--
>  drivers/clk/sunxi/clk-sunxi.c                |  2 +-
>  drivers/infiniband/hw/hfi1/chip.c            |  4 +-
>  drivers/infiniband/hw/hfi1/init.c            |  4 +-
>  drivers/infiniband/hw/mlx4/srq.c             |  2 +-
>  drivers/infiniband/hw/mthca/mthca_srq.c      |  2 +-
>  drivers/infiniband/sw/rxe/rxe_qp.c           |  4 +-
>  drivers/iommu/intel-iommu.c                  |  4 +-
>  drivers/iommu/intel-svm.c                    |  4 +-
>  drivers/iommu/intel_irq_remapping.c          |  2 +-
>  drivers/net/ethernet/amd/xgbe/xgbe-ethtool.c |  4 +-
>  drivers/net/ethernet/marvell/sky2.c          |  2 +-
>  drivers/net/ethernet/rocker/rocker_hw.h      |  4 +-
>  drivers/net/ethernet/sfc/ef10.c              |  2 +-
>  drivers/net/ethernet/sfc/efx.h               |  2 +-
>  drivers/net/ethernet/sfc/falcon/efx.h        |  2 +-
>  drivers/pci/msi.c                            |  2 +-
>  include/linux/log2.h                         | 44 +++++---------------
>  kernel/kexec_core.c                          |  3 +-
>  lib/rhashtable.c                             |  2 +-
>  net/sunrpc/xprtrdma/verbs.c                  |  2 +-
>  21 files changed, 41 insertions(+), 64 deletions(-)
> 
> diff --git a/drivers/clk/clk-divider.c b/drivers/clk/clk-divider.c
> index 098b2b01f0af..ba947e4c8193 100644
> --- a/drivers/clk/clk-divider.c
> +++ b/drivers/clk/clk-divider.c
> @@ -222,7 +222,7 @@ static int _div_round_up(const struct clk_div_table *table,
>  	int div = DIV_ROUND_UP_ULL((u64)parent_rate, rate);
>  
>  	if (flags & CLK_DIVIDER_POWER_OF_TWO)
> -		div = __roundup_pow_of_two(div);
> +		div = roundup_pow_of_two(div);
>  	if (table)
>  		div = _round_up_table(table, div);
>  
> @@ -240,8 +240,8 @@ static int _div_round_closest(const struct clk_div_table *table,
>  	down = parent_rate / rate;
>  
>  	if (flags & CLK_DIVIDER_POWER_OF_TWO) {
> -		up = __roundup_pow_of_two(up);
> -		down = __rounddown_pow_of_two(down);
> +		up = roundup_pow_of_two(up);
> +		down = rounddown_pow_of_two(down);
>  	} else if (table) {
>  		up = _round_up_table(table, up);
>  		down = _round_down_table(table, down);
> @@ -278,7 +278,7 @@ static int _next_div(const struct clk_div_table *table, int div,
>  	div++;
>  
>  	if (flags & CLK_DIVIDER_POWER_OF_TWO)
> -		return __roundup_pow_of_two(div);
> +		return roundup_pow_of_two(div);
>  	if (table)
>  		return _round_up_table(table, div);
>  
> diff --git a/drivers/clk/sunxi/clk-sunxi.c b/drivers/clk/sunxi/clk-sunxi.c
> index 27201fd26e44..faec99dc09c0 100644
> --- a/drivers/clk/sunxi/clk-sunxi.c
> +++ b/drivers/clk/sunxi/clk-sunxi.c
> @@ -311,7 +311,7 @@ static void sun6i_get_ahb1_factors(struct factors_request *req)
>  
>  		calcm = DIV_ROUND_UP(div, 1 << calcp);
>  	} else {
> -		calcp = __roundup_pow_of_two(div);
> +		calcp = roundup_pow_of_two(div);
>  		calcp = calcp > 3 ? 3 : calcp;
>  	}
>  
> diff --git a/drivers/infiniband/hw/hfi1/chip.c b/drivers/infiniband/hw/hfi1/chip.c
> index 9b1fb84a3d45..96b1d343c32f 100644
> --- a/drivers/infiniband/hw/hfi1/chip.c
> +++ b/drivers/infiniband/hw/hfi1/chip.c
> @@ -14199,10 +14199,10 @@ static int qos_rmt_entries(struct hfi1_devdata *dd, unsigned int *mp,
>  			max_by_vl = krcvqs[i];
>  	if (max_by_vl > 32)
>  		goto no_qos;
> -	m = ilog2(__roundup_pow_of_two(max_by_vl));
> +	m = ilog2(roundup_pow_of_two(max_by_vl));
>  
>  	/* determine bits for vl */
> -	n = ilog2(__roundup_pow_of_two(num_vls));
> +	n = ilog2(roundup_pow_of_two(num_vls));
>  
>  	/* reject if too much is used */
>  	if ((m + n) > 7)
> diff --git a/drivers/infiniband/hw/hfi1/init.c b/drivers/infiniband/hw/hfi1/init.c
> index 26b792bb1027..838c789c7cce 100644
> --- a/drivers/infiniband/hw/hfi1/init.c
> +++ b/drivers/infiniband/hw/hfi1/init.c
> @@ -467,7 +467,7 @@ int hfi1_create_ctxtdata(struct hfi1_pportdata *ppd, int numa,
>  		 * MTU supported.
>  		 */
>  		if (rcd->egrbufs.size < hfi1_max_mtu) {
> -			rcd->egrbufs.size = __roundup_pow_of_two(hfi1_max_mtu);
> +			rcd->egrbufs.size = roundup_pow_of_two(hfi1_max_mtu);
>  			hfi1_cdbg(PROC,
>  				  "ctxt%u: eager bufs size too small. Adjusting to %u\n",
>  				    rcd->ctxt, rcd->egrbufs.size);
> @@ -1959,7 +1959,7 @@ int hfi1_setup_eagerbufs(struct hfi1_ctxtdata *rcd)
>  	 * to satisfy the "multiple of 8 RcvArray entries" requirement.
>  	 */
>  	if (rcd->egrbufs.size <= (1 << 20))
> -		rcd->egrbufs.rcvtid_size = max((unsigned long)round_mtu,
> +		rcd->egrbufs.rcvtid_size = max((unsigned long long)round_mtu,
>  			rounddown_pow_of_two(rcd->egrbufs.size / 8));
>  
>  	while (alloced_bytes < rcd->egrbufs.size &&
> diff --git a/drivers/infiniband/hw/mlx4/srq.c b/drivers/infiniband/hw/mlx4/srq.c
> index 8dcf6e3d9ae2..7e685600a7b3 100644
> --- a/drivers/infiniband/hw/mlx4/srq.c
> +++ b/drivers/infiniband/hw/mlx4/srq.c
> @@ -96,7 +96,7 @@ int mlx4_ib_create_srq(struct ib_srq *ib_srq,
>  	srq->msrq.max    = roundup_pow_of_two(init_attr->attr.max_wr + 1);
>  	srq->msrq.max_gs = init_attr->attr.max_sge;
>  
> -	desc_size = max(32UL,
> +	desc_size = max(32ULL,
>  			roundup_pow_of_two(sizeof (struct mlx4_wqe_srq_next_seg) +
>  					   srq->msrq.max_gs *
>  					   sizeof (struct mlx4_wqe_data_seg)));
> diff --git a/drivers/infiniband/hw/mthca/mthca_srq.c b/drivers/infiniband/hw/mthca/mthca_srq.c
> index a85935ccce88..0c2e14b4142a 100644
> --- a/drivers/infiniband/hw/mthca/mthca_srq.c
> +++ b/drivers/infiniband/hw/mthca/mthca_srq.c
> @@ -225,7 +225,7 @@ int mthca_alloc_srq(struct mthca_dev *dev, struct mthca_pd *pd,
>  	else
>  		srq->max = srq->max + 1;
>  
> -	ds = max(64UL,
> +	ds = max(64ULL,
>  		 roundup_pow_of_two(sizeof (struct mthca_next_seg) +
>  				    srq->max_gs * sizeof (struct mthca_data_seg)));
>  
> diff --git a/drivers/infiniband/sw/rxe/rxe_qp.c b/drivers/infiniband/sw/rxe/rxe_qp.c
> index e2c6d1cedf41..040b707b0877 100644
> --- a/drivers/infiniband/sw/rxe/rxe_qp.c
> +++ b/drivers/infiniband/sw/rxe/rxe_qp.c
> @@ -592,7 +592,7 @@ int rxe_qp_from_attr(struct rxe_qp *qp, struct ib_qp_attr *attr, int mask,
>  	int err;
>  
>  	if (mask & IB_QP_MAX_QP_RD_ATOMIC) {
> -		int max_rd_atomic = __roundup_pow_of_two(attr->max_rd_atomic);
> +		int max_rd_atomic = roundup_pow_of_two(attr->max_rd_atomic);
>  
>  		qp->attr.max_rd_atomic = max_rd_atomic;
>  		atomic_set(&qp->req.rd_atomic, max_rd_atomic);
> @@ -600,7 +600,7 @@ int rxe_qp_from_attr(struct rxe_qp *qp, struct ib_qp_attr *attr, int mask,
>  
>  	if (mask & IB_QP_MAX_DEST_RD_ATOMIC) {
>  		int max_dest_rd_atomic =
> -			__roundup_pow_of_two(attr->max_dest_rd_atomic);
> +			roundup_pow_of_two(attr->max_dest_rd_atomic);
>  
>  		qp->attr.max_dest_rd_atomic = max_dest_rd_atomic;
>  
> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> index 0c8d81f56a30..ce7c900bd666 100644
> --- a/drivers/iommu/intel-iommu.c
> +++ b/drivers/iommu/intel-iommu.c
> @@ -1488,7 +1488,7 @@ static void iommu_flush_iotlb_psi(struct intel_iommu *iommu,
>  				  unsigned long pfn, unsigned int pages,
>  				  int ih, int map)
>  {
> -	unsigned int mask = ilog2(__roundup_pow_of_two(pages));
> +	unsigned int mask = ilog2(roundup_pow_of_two(pages));
>  	uint64_t addr = (uint64_t)pfn << VTD_PAGE_SHIFT;
>  	u16 did = domain->iommu_did[iommu->seq_id];
>  
> @@ -3390,7 +3390,7 @@ static unsigned long intel_alloc_iova(struct device *dev,
>  	/* Restrict dma_mask to the width that the iommu can handle */
>  	dma_mask = min_t(uint64_t, DOMAIN_MAX_ADDR(domain->gaw), dma_mask);
>  	/* Ensure we reserve the whole size-aligned region */
> -	nrpages = __roundup_pow_of_two(nrpages);
> +	nrpages = roundup_pow_of_two(nrpages);
>  
>  	if (!dmar_forcedac && dma_mask > DMA_BIT_MASK(32)) {
>  		/*
> diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c
> index 9b159132405d..602caca3cd1a 100644
> --- a/drivers/iommu/intel-svm.c
> +++ b/drivers/iommu/intel-svm.c
> @@ -115,7 +115,7 @@ static void intel_flush_svm_range_dev (struct intel_svm *svm, struct intel_svm_d
>  			QI_EIOTLB_TYPE;
>  		desc.qw1 = 0;
>  	} else {
> -		int mask = ilog2(__roundup_pow_of_two(pages));
> +		int mask = ilog2(roundup_pow_of_two(pages));
>  
>  		desc.qw0 = QI_EIOTLB_PASID(svm->pasid) |
>  				QI_EIOTLB_DID(sdev->did) |
> @@ -142,7 +142,7 @@ static void intel_flush_svm_range_dev (struct intel_svm *svm, struct intel_svm_d
>  			 * for example, an "address" value of 0x12345f000 will
>  			 * flush from 0x123440000 to 0x12347ffff (256KiB). */
>  			unsigned long last = address + ((unsigned long)(pages - 1) << VTD_PAGE_SHIFT);
> -			unsigned long mask = __rounddown_pow_of_two(address ^ last);
> +			unsigned long mask = rounddown_pow_of_two(address ^ last);
>  
>  			desc.qw1 = QI_DEV_EIOTLB_ADDR((address & ~mask) |
>  					(mask - 1)) | QI_DEV_EIOTLB_SIZE;
> diff --git a/drivers/iommu/intel_irq_remapping.c b/drivers/iommu/intel_irq_remapping.c
> index 81e43c1df7ec..935657b2c661 100644
> --- a/drivers/iommu/intel_irq_remapping.c
> +++ b/drivers/iommu/intel_irq_remapping.c
> @@ -113,7 +113,7 @@ static int alloc_irte(struct intel_iommu *iommu,
>  		return -1;
>  
>  	if (count > 1) {
> -		count = __roundup_pow_of_two(count);
> +		count = roundup_pow_of_two(count);
>  		mask = ilog2(count);
>  	}
>  
> diff --git a/drivers/net/ethernet/amd/xgbe/xgbe-ethtool.c b/drivers/net/ethernet/amd/xgbe/xgbe-ethtool.c
> index 6a757dadb5f1..fd5b12c23eaa 100644
> --- a/drivers/net/ethernet/amd/xgbe/xgbe-ethtool.c
> +++ b/drivers/net/ethernet/amd/xgbe/xgbe-ethtool.c
> @@ -680,13 +680,13 @@ static int xgbe_set_ringparam(struct net_device *netdev,
>  		return -EINVAL;
>  	}
>  
> -	rx = __rounddown_pow_of_two(ringparam->rx_pending);
> +	rx = rounddown_pow_of_two(ringparam->rx_pending);
>  	if (rx != ringparam->rx_pending)
>  		netdev_notice(netdev,
>  			      "rx ring parameter rounded to power of two: %u\n",
>  			      rx);
>  
> -	tx = __rounddown_pow_of_two(ringparam->tx_pending);
> +	tx = rounddown_pow_of_two(ringparam->tx_pending);
>  	if (tx != ringparam->tx_pending)
>  		netdev_notice(netdev,
>  			      "tx ring parameter rounded to power of two: %u\n",
> diff --git a/drivers/net/ethernet/marvell/sky2.c b/drivers/net/ethernet/marvell/sky2.c
> index 5f56ee83e3b1..cc3a03b4a611 100644
> --- a/drivers/net/ethernet/marvell/sky2.c
> +++ b/drivers/net/ethernet/marvell/sky2.c
> @@ -4139,7 +4139,7 @@ static int sky2_set_coalesce(struct net_device *dev,
>   */
>  static unsigned long roundup_ring_size(unsigned long pending)
>  {
> -	return max(128ul, roundup_pow_of_two(pending+1));
> +	return max(128ull, roundup_pow_of_two(pending+1));
>  }
>  
>  static void sky2_get_ringparam(struct net_device *dev,
> diff --git a/drivers/net/ethernet/rocker/rocker_hw.h b/drivers/net/ethernet/rocker/rocker_hw.h
> index 59f1f8b690d2..d8de15509e2c 100644
> --- a/drivers/net/ethernet/rocker/rocker_hw.h
> +++ b/drivers/net/ethernet/rocker/rocker_hw.h
> @@ -88,8 +88,8 @@ enum rocker_dma_type {
>  };
>  
>  /* Rocker DMA ring size limits and default sizes */
> -#define ROCKER_DMA_SIZE_MIN		2ul
> -#define ROCKER_DMA_SIZE_MAX		65536ul
> +#define ROCKER_DMA_SIZE_MIN		2ull
> +#define ROCKER_DMA_SIZE_MAX		65536ull
>  #define ROCKER_DMA_CMD_DEFAULT_SIZE	32ul
>  #define ROCKER_DMA_EVENT_DEFAULT_SIZE	32ul
>  #define ROCKER_DMA_TX_DEFAULT_SIZE	64ul
> diff --git a/drivers/net/ethernet/sfc/ef10.c b/drivers/net/ethernet/sfc/ef10.c
> index 4d9bbccc6f89..4f4d9a5b3b75 100644
> --- a/drivers/net/ethernet/sfc/ef10.c
> +++ b/drivers/net/ethernet/sfc/ef10.c
> @@ -27,7 +27,7 @@ enum {
>  };
>  /* The maximum size of a shared RSS context */
>  /* TODO: this should really be from the mcdi protocol export */
> -#define EFX_EF10_MAX_SHARED_RSS_CONTEXT_SIZE 64UL
> +#define EFX_EF10_MAX_SHARED_RSS_CONTEXT_SIZE 64ULL
>  
>  /* The filter table(s) are managed by firmware and we have write-only
>   * access.  When removing filters we must identify them to the
> diff --git a/drivers/net/ethernet/sfc/efx.h b/drivers/net/ethernet/sfc/efx.h
> index 2dd8d5002315..fea2add5860e 100644
> --- a/drivers/net/ethernet/sfc/efx.h
> +++ b/drivers/net/ethernet/sfc/efx.h
> @@ -52,7 +52,7 @@ void efx_schedule_slow_fill(struct efx_rx_queue *rx_queue);
>  
>  #define EFX_MAX_DMAQ_SIZE 4096UL
>  #define EFX_DEFAULT_DMAQ_SIZE 1024UL
> -#define EFX_MIN_DMAQ_SIZE 512UL
> +#define EFX_MIN_DMAQ_SIZE 512ULL
>  
>  #define EFX_MAX_EVQ_SIZE 16384UL
>  #define EFX_MIN_EVQ_SIZE 512UL
> diff --git a/drivers/net/ethernet/sfc/falcon/efx.h b/drivers/net/ethernet/sfc/falcon/efx.h
> index d3b4646545fa..0d16257156d6 100644
> --- a/drivers/net/ethernet/sfc/falcon/efx.h
> +++ b/drivers/net/ethernet/sfc/falcon/efx.h
> @@ -55,7 +55,7 @@ void ef4_schedule_slow_fill(struct ef4_rx_queue *rx_queue);
>  
>  #define EF4_MAX_DMAQ_SIZE 4096UL
>  #define EF4_DEFAULT_DMAQ_SIZE 1024UL
> -#define EF4_MIN_DMAQ_SIZE 512UL
> +#define EF4_MIN_DMAQ_SIZE 512ULL
>  
>  #define EF4_MAX_EVQ_SIZE 16384UL
>  #define EF4_MIN_EVQ_SIZE 512UL
> diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
> index c7709e49f0e4..f0391e88bc42 100644
> --- a/drivers/pci/msi.c
> +++ b/drivers/pci/msi.c
> @@ -578,7 +578,7 @@ msi_setup_entry(struct pci_dev *dev, int nvec, struct irq_affinity *affd)
>  	entry->msi_attrib.maskbit	= !!(control & PCI_MSI_FLAGS_MASKBIT);
>  	entry->msi_attrib.default_irq	= dev->irq;	/* Save IOAPIC IRQ */
>  	entry->msi_attrib.multi_cap	= (control & PCI_MSI_FLAGS_QMASK) >> 1;
> -	entry->msi_attrib.multiple	= ilog2(__roundup_pow_of_two(nvec));
> +	entry->msi_attrib.multiple	= ilog2(roundup_pow_of_two(nvec));
>  
>  	if (control & PCI_MSI_FLAGS_64BIT)
>  		entry->mask_pos = dev->msi_cap + PCI_MSI_MASK_64;
> diff --git a/include/linux/log2.h b/include/linux/log2.h
> index 83a4a3ca3e8a..53a727303dac 100644
> --- a/include/linux/log2.h
> +++ b/include/linux/log2.h
> @@ -47,26 +47,6 @@ bool is_power_of_2(unsigned long n)
>  	return (n != 0 && ((n & (n - 1)) == 0));
>  }
>  
> -/**
> - * __roundup_pow_of_two() - round up to nearest power of two
> - * @n: value to round up
> - */
> -static inline __attribute__((const))
> -unsigned long __roundup_pow_of_two(unsigned long n)
> -{
> -	return 1UL << fls_long(n - 1);
> -}
> -
> -/**
> - * __rounddown_pow_of_two() - round down to nearest power of two
> - * @n: value to round down
> - */
> -static inline __attribute__((const))
> -unsigned long __rounddown_pow_of_two(unsigned long n)
> -{
> -	return 1UL << (fls_long(n) - 1);
> -}
> -
>  /**
>   * const_ilog2 - log base 2 of 32-bit or a 64-bit constant unsigned value
>   * @n: parameter
> @@ -170,14 +150,11 @@ unsigned long __rounddown_pow_of_two(unsigned long n)
>   * - the result is undefined when n == 0
>   * - this can be used to initialise global variables from constant data
>   */
> -#define roundup_pow_of_two(n)			\
> -(						\
> -	__builtin_constant_p(n) ? (		\
> -		(n == 1) ? 1 :			\
> -		(1UL << (ilog2((n) - 1) + 1))	\
> -				   ) :		\
> -	__roundup_pow_of_two(n)			\
> - )
> +#define roundup_pow_of_two(n)			  \
> +(						  \
> +	(__builtin_constant_p(n) && ((n) == 1)) ? \
> +	1 : (1ULL << (ilog2((n) - 1) + 1))        \
> +)
>  
>  /**
>   * rounddown_pow_of_two - round the given value down to nearest power of two
> @@ -187,12 +164,11 @@ unsigned long __rounddown_pow_of_two(unsigned long n)
>   * - the result is undefined when n == 0
>   * - this can be used to initialise global variables from constant data
>   */
> -#define rounddown_pow_of_two(n)			\
> -(						\
> -	__builtin_constant_p(n) ? (		\
> -		(1UL << ilog2(n))) :		\
> -	__rounddown_pow_of_two(n)		\
> - )
> +#define rounddown_pow_of_two(n)			  \
> +(						  \
> +	(__builtin_constant_p(n) && ((n) == 1)) ? \
> +	1 : (1ULL << (ilog2(n)))		  \
> +)
>  
>  static inline __attribute_const__
>  int __order_base_2(unsigned long n)
> diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
> index 15d70a90b50d..bb9efc6944a4 100644
> --- a/kernel/kexec_core.c
> +++ b/kernel/kexec_core.c
> @@ -1094,7 +1094,8 @@ static int __init crash_notes_memory_init(void)
>  	 * crash_notes is allocated inside one physical page.
>  	 */
>  	size = sizeof(note_buf_t);
> -	align = min(roundup_pow_of_two(sizeof(note_buf_t)), PAGE_SIZE);
> +	align = min(roundup_pow_of_two(sizeof(note_buf_t)),
> +		    (unsigned long long)PAGE_SIZE);
>  
>  	/*
>  	 * Break compile if size is bigger than PAGE_SIZE since crash_notes
> diff --git a/lib/rhashtable.c b/lib/rhashtable.c
> index bdb7e4cadf05..70908678c7a8 100644
> --- a/lib/rhashtable.c
> +++ b/lib/rhashtable.c
> @@ -950,7 +950,7 @@ static size_t rounded_hashtable_size(const struct rhashtable_params *params)
>  
>  	if (params->nelem_hint)
>  		retsize = max(roundup_pow_of_two(params->nelem_hint * 4 / 3),
> -			      (unsigned long)params->min_size);
> +			      (unsigned long long)params->min_size);
>  	else
>  		retsize = max(HASH_DEFAULT_SIZE,
>  			      (unsigned long)params->min_size);
> diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
> index 77c7dd7f05e8..78fb8ccabddd 100644
> --- a/net/sunrpc/xprtrdma/verbs.c
> +++ b/net/sunrpc/xprtrdma/verbs.c
> @@ -1015,7 +1015,7 @@ struct rpcrdma_req *rpcrdma_req_create(struct rpcrdma_xprt *r_xprt, size_t size,
>  	maxhdrsize = rpcrdma_fixed_maxsz + 3 +
>  		     r_xprt->rx_ia.ri_max_segs * rpcrdma_readchunk_maxsz;
>  	maxhdrsize *= sizeof(__be32);
> -	rb = rpcrdma_regbuf_alloc(__roundup_pow_of_two(maxhdrsize),
> +	rb = rpcrdma_regbuf_alloc(roundup_pow_of_two(maxhdrsize),
>  				  DMA_TO_DEVICE, flags);
>  	if (!rb)
>  		goto out2;
> 

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v4 7/8] linux/log2.h: Fix 64bit calculations in roundup/down_pow_two()
  2019-12-03 11:47 ` [PATCH v4 7/8] linux/log2.h: Fix 64bit calculations in roundup/down_pow_two() Nicolas Saenz Julienne
  2019-12-03 16:39   ` Chuck Lever
  2019-12-04 10:56   ` Martin Habets
@ 2019-12-04 14:17   ` Leon Romanovsky
  2019-12-05  0:39   ` Lu Baolu
                     ` (2 subsequent siblings)
  5 siblings, 0 replies; 15+ messages in thread
From: Leon Romanovsky @ 2019-12-04 14:17 UTC (permalink / raw)
  To: Nicolas Saenz Julienne
  Cc: andrew.murray, maz, linux-kernel, Michael Turquette,
	Stephen Boyd, Emilio López, Maxime Ripard, Chen-Yu Tsai,
	Mike Marciniszyn, Dennis Dalessandro, Yishai Hadas, Moni Shoua,
	David Woodhouse, Lu Baolu, Joerg Roedel, Tom Lendacky,
	Mirko Lindner, Stephen Hemminger, Jiri Pirko,
	Solarflare linux maintainers, Edward Cree, Martin Habets,
	Bjorn Helgaas, Eric Biederman, Thomas Graf, Herbert Xu,
	james.quinlan, mbrugger, f.fainelli, phil, wahrenst,
	jeremy.linton, linux-pci, linux-rpi-kernel, Robin Murphy,
	Doug Ledford, Jason Gunthorpe, David S. Miller, Trond Myklebust,
	Anna Schumaker, J. Bruce Fields, Chuck Lever, linux-clk,
	linux-arm-kernel, linux-rdma, iommu, netdev, kexec, linux-nfs

On Tue, Dec 03, 2019 at 12:47:40PM +0100, Nicolas Saenz Julienne wrote:
> Some users need to make sure their rounding function accepts and returns
> 64bit long variables regardless of the architecture. Sadly
> roundup/rounddown_pow_two() takes and returns unsigned longs. It turns
> out ilog2() already handles 32/64bit calculations properly, and being
> the building block to the round functions we can rework them as a
> wrapper around it.
>
> Suggested-by: Robin Murphy <robin.murphy@arm.con>
> Signed-off-by: Nicolas Saenz Julienne <nsaenzjulienne@suse.de>
> ---
>  drivers/clk/clk-divider.c                    |  8 ++--
>  drivers/clk/sunxi/clk-sunxi.c                |  2 +-
>  drivers/infiniband/hw/hfi1/chip.c            |  4 +-
>  drivers/infiniband/hw/hfi1/init.c            |  4 +-
>  drivers/infiniband/hw/mlx4/srq.c             |  2 +-
>  drivers/infiniband/hw/mthca/mthca_srq.c      |  2 +-
>  drivers/infiniband/sw/rxe/rxe_qp.c           |  4 +-


Thanks, for infiniband.
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v4 7/8] linux/log2.h: Fix 64bit calculations in roundup/down_pow_two()
  2019-12-03 11:47 ` [PATCH v4 7/8] linux/log2.h: Fix 64bit calculations in roundup/down_pow_two() Nicolas Saenz Julienne
                     ` (2 preceding siblings ...)
  2019-12-04 14:17   ` Leon Romanovsky
@ 2019-12-05  0:39   ` Lu Baolu
  2019-12-05 17:48   ` Robin Murphy
  2019-12-05 22:30   ` Bjorn Helgaas
  5 siblings, 0 replies; 15+ messages in thread
From: Lu Baolu @ 2019-12-05  0:39 UTC (permalink / raw)
  To: Nicolas Saenz Julienne, andrew.murray, maz, linux-kernel,
	Michael Turquette, Stephen Boyd, Emilio López,
	Maxime Ripard, Chen-Yu Tsai, Mike Marciniszyn,
	Dennis Dalessandro, Yishai Hadas, Moni Shoua, David Woodhouse,
	Joerg Roedel, Tom Lendacky, Mirko Lindner, Stephen Hemminger,
	Jiri Pirko, Solarflare linux maintainers, Edward Cree,
	Martin Habets, Bjorn Helgaas, Eric Biederman, Thomas Graf,
	Herbert Xu
  Cc: baolu.lu, james.quinlan, mbrugger, f.fainelli, phil, wahrenst,
	jeremy.linton, linux-pci, linux-rpi-kernel, Robin Murphy,
	Doug Ledford, Jason Gunthorpe, David S. Miller, Trond Myklebust,
	Anna Schumaker, J. Bruce Fields, Chuck Lever, linux-clk,
	linux-arm-kernel, linux-rdma, iommu, netdev, kexec, linux-nfs

Hi,

On 12/3/19 7:47 PM, Nicolas Saenz Julienne wrote:
> Some users need to make sure their rounding function accepts and returns
> 64bit long variables regardless of the architecture. Sadly
> roundup/rounddown_pow_two() takes and returns unsigned longs. It turns
> out ilog2() already handles 32/64bit calculations properly, and being
> the building block to the round functions we can rework them as a
> wrapper around it.
> 
> Suggested-by: Robin Murphy <robin.murphy@arm.con>
> Signed-off-by: Nicolas Saenz Julienne <nsaenzjulienne@suse.de>
> ---
>   drivers/clk/clk-divider.c                    |  8 ++--
>   drivers/clk/sunxi/clk-sunxi.c                |  2 +-
>   drivers/infiniband/hw/hfi1/chip.c            |  4 +-
>   drivers/infiniband/hw/hfi1/init.c            |  4 +-
>   drivers/infiniband/hw/mlx4/srq.c             |  2 +-
>   drivers/infiniband/hw/mthca/mthca_srq.c      |  2 +-
>   drivers/infiniband/sw/rxe/rxe_qp.c           |  4 +-
>   drivers/iommu/intel-iommu.c                  |  4 +-
>   drivers/iommu/intel-svm.c                    |  4 +-
>   drivers/iommu/intel_irq_remapping.c          |  2 +-

For changes in drivers/iommu/intel*.c,

Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>

Best regards,
baolu

>   drivers/net/ethernet/amd/xgbe/xgbe-ethtool.c |  4 +-
>   drivers/net/ethernet/marvell/sky2.c          |  2 +-
>   drivers/net/ethernet/rocker/rocker_hw.h      |  4 +-
>   drivers/net/ethernet/sfc/ef10.c              |  2 +-
>   drivers/net/ethernet/sfc/efx.h               |  2 +-
>   drivers/net/ethernet/sfc/falcon/efx.h        |  2 +-
>   drivers/pci/msi.c                            |  2 +-
>   include/linux/log2.h                         | 44 +++++---------------
>   kernel/kexec_core.c                          |  3 +-
>   lib/rhashtable.c                             |  2 +-
>   net/sunrpc/xprtrdma/verbs.c                  |  2 +-
>   21 files changed, 41 insertions(+), 64 deletions(-)
> 
> diff --git a/drivers/clk/clk-divider.c b/drivers/clk/clk-divider.c
> index 098b2b01f0af..ba947e4c8193 100644
> --- a/drivers/clk/clk-divider.c
> +++ b/drivers/clk/clk-divider.c
> @@ -222,7 +222,7 @@ static int _div_round_up(const struct clk_div_table *table,
>   	int div = DIV_ROUND_UP_ULL((u64)parent_rate, rate);
>   
>   	if (flags & CLK_DIVIDER_POWER_OF_TWO)
> -		div = __roundup_pow_of_two(div);
> +		div = roundup_pow_of_two(div);
>   	if (table)
>   		div = _round_up_table(table, div);
>   
> @@ -240,8 +240,8 @@ static int _div_round_closest(const struct clk_div_table *table,
>   	down = parent_rate / rate;
>   
>   	if (flags & CLK_DIVIDER_POWER_OF_TWO) {
> -		up = __roundup_pow_of_two(up);
> -		down = __rounddown_pow_of_two(down);
> +		up = roundup_pow_of_two(up);
> +		down = rounddown_pow_of_two(down);
>   	} else if (table) {
>   		up = _round_up_table(table, up);
>   		down = _round_down_table(table, down);
> @@ -278,7 +278,7 @@ static int _next_div(const struct clk_div_table *table, int div,
>   	div++;
>   
>   	if (flags & CLK_DIVIDER_POWER_OF_TWO)
> -		return __roundup_pow_of_two(div);
> +		return roundup_pow_of_two(div);
>   	if (table)
>   		return _round_up_table(table, div);
>   
> diff --git a/drivers/clk/sunxi/clk-sunxi.c b/drivers/clk/sunxi/clk-sunxi.c
> index 27201fd26e44..faec99dc09c0 100644
> --- a/drivers/clk/sunxi/clk-sunxi.c
> +++ b/drivers/clk/sunxi/clk-sunxi.c
> @@ -311,7 +311,7 @@ static void sun6i_get_ahb1_factors(struct factors_request *req)
>   
>   		calcm = DIV_ROUND_UP(div, 1 << calcp);
>   	} else {
> -		calcp = __roundup_pow_of_two(div);
> +		calcp = roundup_pow_of_two(div);
>   		calcp = calcp > 3 ? 3 : calcp;
>   	}
>   
> diff --git a/drivers/infiniband/hw/hfi1/chip.c b/drivers/infiniband/hw/hfi1/chip.c
> index 9b1fb84a3d45..96b1d343c32f 100644
> --- a/drivers/infiniband/hw/hfi1/chip.c
> +++ b/drivers/infiniband/hw/hfi1/chip.c
> @@ -14199,10 +14199,10 @@ static int qos_rmt_entries(struct hfi1_devdata *dd, unsigned int *mp,
>   			max_by_vl = krcvqs[i];
>   	if (max_by_vl > 32)
>   		goto no_qos;
> -	m = ilog2(__roundup_pow_of_two(max_by_vl));
> +	m = ilog2(roundup_pow_of_two(max_by_vl));
>   
>   	/* determine bits for vl */
> -	n = ilog2(__roundup_pow_of_two(num_vls));
> +	n = ilog2(roundup_pow_of_two(num_vls));
>   
>   	/* reject if too much is used */
>   	if ((m + n) > 7)
> diff --git a/drivers/infiniband/hw/hfi1/init.c b/drivers/infiniband/hw/hfi1/init.c
> index 26b792bb1027..838c789c7cce 100644
> --- a/drivers/infiniband/hw/hfi1/init.c
> +++ b/drivers/infiniband/hw/hfi1/init.c
> @@ -467,7 +467,7 @@ int hfi1_create_ctxtdata(struct hfi1_pportdata *ppd, int numa,
>   		 * MTU supported.
>   		 */
>   		if (rcd->egrbufs.size < hfi1_max_mtu) {
> -			rcd->egrbufs.size = __roundup_pow_of_two(hfi1_max_mtu);
> +			rcd->egrbufs.size = roundup_pow_of_two(hfi1_max_mtu);
>   			hfi1_cdbg(PROC,
>   				  "ctxt%u: eager bufs size too small. Adjusting to %u\n",
>   				    rcd->ctxt, rcd->egrbufs.size);
> @@ -1959,7 +1959,7 @@ int hfi1_setup_eagerbufs(struct hfi1_ctxtdata *rcd)
>   	 * to satisfy the "multiple of 8 RcvArray entries" requirement.
>   	 */
>   	if (rcd->egrbufs.size <= (1 << 20))
> -		rcd->egrbufs.rcvtid_size = max((unsigned long)round_mtu,
> +		rcd->egrbufs.rcvtid_size = max((unsigned long long)round_mtu,
>   			rounddown_pow_of_two(rcd->egrbufs.size / 8));
>   
>   	while (alloced_bytes < rcd->egrbufs.size &&
> diff --git a/drivers/infiniband/hw/mlx4/srq.c b/drivers/infiniband/hw/mlx4/srq.c
> index 8dcf6e3d9ae2..7e685600a7b3 100644
> --- a/drivers/infiniband/hw/mlx4/srq.c
> +++ b/drivers/infiniband/hw/mlx4/srq.c
> @@ -96,7 +96,7 @@ int mlx4_ib_create_srq(struct ib_srq *ib_srq,
>   	srq->msrq.max    = roundup_pow_of_two(init_attr->attr.max_wr + 1);
>   	srq->msrq.max_gs = init_attr->attr.max_sge;
>   
> -	desc_size = max(32UL,
> +	desc_size = max(32ULL,
>   			roundup_pow_of_two(sizeof (struct mlx4_wqe_srq_next_seg) +
>   					   srq->msrq.max_gs *
>   					   sizeof (struct mlx4_wqe_data_seg)));
> diff --git a/drivers/infiniband/hw/mthca/mthca_srq.c b/drivers/infiniband/hw/mthca/mthca_srq.c
> index a85935ccce88..0c2e14b4142a 100644
> --- a/drivers/infiniband/hw/mthca/mthca_srq.c
> +++ b/drivers/infiniband/hw/mthca/mthca_srq.c
> @@ -225,7 +225,7 @@ int mthca_alloc_srq(struct mthca_dev *dev, struct mthca_pd *pd,
>   	else
>   		srq->max = srq->max + 1;
>   
> -	ds = max(64UL,
> +	ds = max(64ULL,
>   		 roundup_pow_of_two(sizeof (struct mthca_next_seg) +
>   				    srq->max_gs * sizeof (struct mthca_data_seg)));
>   
> diff --git a/drivers/infiniband/sw/rxe/rxe_qp.c b/drivers/infiniband/sw/rxe/rxe_qp.c
> index e2c6d1cedf41..040b707b0877 100644
> --- a/drivers/infiniband/sw/rxe/rxe_qp.c
> +++ b/drivers/infiniband/sw/rxe/rxe_qp.c
> @@ -592,7 +592,7 @@ int rxe_qp_from_attr(struct rxe_qp *qp, struct ib_qp_attr *attr, int mask,
>   	int err;
>   
>   	if (mask & IB_QP_MAX_QP_RD_ATOMIC) {
> -		int max_rd_atomic = __roundup_pow_of_two(attr->max_rd_atomic);
> +		int max_rd_atomic = roundup_pow_of_two(attr->max_rd_atomic);
>   
>   		qp->attr.max_rd_atomic = max_rd_atomic;
>   		atomic_set(&qp->req.rd_atomic, max_rd_atomic);
> @@ -600,7 +600,7 @@ int rxe_qp_from_attr(struct rxe_qp *qp, struct ib_qp_attr *attr, int mask,
>   
>   	if (mask & IB_QP_MAX_DEST_RD_ATOMIC) {
>   		int max_dest_rd_atomic =
> -			__roundup_pow_of_two(attr->max_dest_rd_atomic);
> +			roundup_pow_of_two(attr->max_dest_rd_atomic);
>   
>   		qp->attr.max_dest_rd_atomic = max_dest_rd_atomic;
>   
> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> index 0c8d81f56a30..ce7c900bd666 100644
> --- a/drivers/iommu/intel-iommu.c
> +++ b/drivers/iommu/intel-iommu.c
> @@ -1488,7 +1488,7 @@ static void iommu_flush_iotlb_psi(struct intel_iommu *iommu,
>   				  unsigned long pfn, unsigned int pages,
>   				  int ih, int map)
>   {
> -	unsigned int mask = ilog2(__roundup_pow_of_two(pages));
> +	unsigned int mask = ilog2(roundup_pow_of_two(pages));
>   	uint64_t addr = (uint64_t)pfn << VTD_PAGE_SHIFT;
>   	u16 did = domain->iommu_did[iommu->seq_id];
>   
> @@ -3390,7 +3390,7 @@ static unsigned long intel_alloc_iova(struct device *dev,
>   	/* Restrict dma_mask to the width that the iommu can handle */
>   	dma_mask = min_t(uint64_t, DOMAIN_MAX_ADDR(domain->gaw), dma_mask);
>   	/* Ensure we reserve the whole size-aligned region */
> -	nrpages = __roundup_pow_of_two(nrpages);
> +	nrpages = roundup_pow_of_two(nrpages);
>   
>   	if (!dmar_forcedac && dma_mask > DMA_BIT_MASK(32)) {
>   		/*
> diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c
> index 9b159132405d..602caca3cd1a 100644
> --- a/drivers/iommu/intel-svm.c
> +++ b/drivers/iommu/intel-svm.c
> @@ -115,7 +115,7 @@ static void intel_flush_svm_range_dev (struct intel_svm *svm, struct intel_svm_d
>   			QI_EIOTLB_TYPE;
>   		desc.qw1 = 0;
>   	} else {
> -		int mask = ilog2(__roundup_pow_of_two(pages));
> +		int mask = ilog2(roundup_pow_of_two(pages));
>   
>   		desc.qw0 = QI_EIOTLB_PASID(svm->pasid) |
>   				QI_EIOTLB_DID(sdev->did) |
> @@ -142,7 +142,7 @@ static void intel_flush_svm_range_dev (struct intel_svm *svm, struct intel_svm_d
>   			 * for example, an "address" value of 0x12345f000 will
>   			 * flush from 0x123440000 to 0x12347ffff (256KiB). */
>   			unsigned long last = address + ((unsigned long)(pages - 1) << VTD_PAGE_SHIFT);
> -			unsigned long mask = __rounddown_pow_of_two(address ^ last);
> +			unsigned long mask = rounddown_pow_of_two(address ^ last);
>   
>   			desc.qw1 = QI_DEV_EIOTLB_ADDR((address & ~mask) |
>   					(mask - 1)) | QI_DEV_EIOTLB_SIZE;
> diff --git a/drivers/iommu/intel_irq_remapping.c b/drivers/iommu/intel_irq_remapping.c
> index 81e43c1df7ec..935657b2c661 100644
> --- a/drivers/iommu/intel_irq_remapping.c
> +++ b/drivers/iommu/intel_irq_remapping.c
> @@ -113,7 +113,7 @@ static int alloc_irte(struct intel_iommu *iommu,
>   		return -1;
>   
>   	if (count > 1) {
> -		count = __roundup_pow_of_two(count);
> +		count = roundup_pow_of_two(count);
>   		mask = ilog2(count);
>   	}
>   
> diff --git a/drivers/net/ethernet/amd/xgbe/xgbe-ethtool.c b/drivers/net/ethernet/amd/xgbe/xgbe-ethtool.c
> index 6a757dadb5f1..fd5b12c23eaa 100644
> --- a/drivers/net/ethernet/amd/xgbe/xgbe-ethtool.c
> +++ b/drivers/net/ethernet/amd/xgbe/xgbe-ethtool.c
> @@ -680,13 +680,13 @@ static int xgbe_set_ringparam(struct net_device *netdev,
>   		return -EINVAL;
>   	}
>   
> -	rx = __rounddown_pow_of_two(ringparam->rx_pending);
> +	rx = rounddown_pow_of_two(ringparam->rx_pending);
>   	if (rx != ringparam->rx_pending)
>   		netdev_notice(netdev,
>   			      "rx ring parameter rounded to power of two: %u\n",
>   			      rx);
>   
> -	tx = __rounddown_pow_of_two(ringparam->tx_pending);
> +	tx = rounddown_pow_of_two(ringparam->tx_pending);
>   	if (tx != ringparam->tx_pending)
>   		netdev_notice(netdev,
>   			      "tx ring parameter rounded to power of two: %u\n",
> diff --git a/drivers/net/ethernet/marvell/sky2.c b/drivers/net/ethernet/marvell/sky2.c
> index 5f56ee83e3b1..cc3a03b4a611 100644
> --- a/drivers/net/ethernet/marvell/sky2.c
> +++ b/drivers/net/ethernet/marvell/sky2.c
> @@ -4139,7 +4139,7 @@ static int sky2_set_coalesce(struct net_device *dev,
>    */
>   static unsigned long roundup_ring_size(unsigned long pending)
>   {
> -	return max(128ul, roundup_pow_of_two(pending+1));
> +	return max(128ull, roundup_pow_of_two(pending+1));
>   }
>   
>   static void sky2_get_ringparam(struct net_device *dev,
> diff --git a/drivers/net/ethernet/rocker/rocker_hw.h b/drivers/net/ethernet/rocker/rocker_hw.h
> index 59f1f8b690d2..d8de15509e2c 100644
> --- a/drivers/net/ethernet/rocker/rocker_hw.h
> +++ b/drivers/net/ethernet/rocker/rocker_hw.h
> @@ -88,8 +88,8 @@ enum rocker_dma_type {
>   };
>   
>   /* Rocker DMA ring size limits and default sizes */
> -#define ROCKER_DMA_SIZE_MIN		2ul
> -#define ROCKER_DMA_SIZE_MAX		65536ul
> +#define ROCKER_DMA_SIZE_MIN		2ull
> +#define ROCKER_DMA_SIZE_MAX		65536ull
>   #define ROCKER_DMA_CMD_DEFAULT_SIZE	32ul
>   #define ROCKER_DMA_EVENT_DEFAULT_SIZE	32ul
>   #define ROCKER_DMA_TX_DEFAULT_SIZE	64ul
> diff --git a/drivers/net/ethernet/sfc/ef10.c b/drivers/net/ethernet/sfc/ef10.c
> index 4d9bbccc6f89..4f4d9a5b3b75 100644
> --- a/drivers/net/ethernet/sfc/ef10.c
> +++ b/drivers/net/ethernet/sfc/ef10.c
> @@ -27,7 +27,7 @@ enum {
>   };
>   /* The maximum size of a shared RSS context */
>   /* TODO: this should really be from the mcdi protocol export */
> -#define EFX_EF10_MAX_SHARED_RSS_CONTEXT_SIZE 64UL
> +#define EFX_EF10_MAX_SHARED_RSS_CONTEXT_SIZE 64ULL
>   
>   /* The filter table(s) are managed by firmware and we have write-only
>    * access.  When removing filters we must identify them to the
> diff --git a/drivers/net/ethernet/sfc/efx.h b/drivers/net/ethernet/sfc/efx.h
> index 2dd8d5002315..fea2add5860e 100644
> --- a/drivers/net/ethernet/sfc/efx.h
> +++ b/drivers/net/ethernet/sfc/efx.h
> @@ -52,7 +52,7 @@ void efx_schedule_slow_fill(struct efx_rx_queue *rx_queue);
>   
>   #define EFX_MAX_DMAQ_SIZE 4096UL
>   #define EFX_DEFAULT_DMAQ_SIZE 1024UL
> -#define EFX_MIN_DMAQ_SIZE 512UL
> +#define EFX_MIN_DMAQ_SIZE 512ULL
>   
>   #define EFX_MAX_EVQ_SIZE 16384UL
>   #define EFX_MIN_EVQ_SIZE 512UL
> diff --git a/drivers/net/ethernet/sfc/falcon/efx.h b/drivers/net/ethernet/sfc/falcon/efx.h
> index d3b4646545fa..0d16257156d6 100644
> --- a/drivers/net/ethernet/sfc/falcon/efx.h
> +++ b/drivers/net/ethernet/sfc/falcon/efx.h
> @@ -55,7 +55,7 @@ void ef4_schedule_slow_fill(struct ef4_rx_queue *rx_queue);
>   
>   #define EF4_MAX_DMAQ_SIZE 4096UL
>   #define EF4_DEFAULT_DMAQ_SIZE 1024UL
> -#define EF4_MIN_DMAQ_SIZE 512UL
> +#define EF4_MIN_DMAQ_SIZE 512ULL
>   
>   #define EF4_MAX_EVQ_SIZE 16384UL
>   #define EF4_MIN_EVQ_SIZE 512UL
> diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
> index c7709e49f0e4..f0391e88bc42 100644
> --- a/drivers/pci/msi.c
> +++ b/drivers/pci/msi.c
> @@ -578,7 +578,7 @@ msi_setup_entry(struct pci_dev *dev, int nvec, struct irq_affinity *affd)
>   	entry->msi_attrib.maskbit	= !!(control & PCI_MSI_FLAGS_MASKBIT);
>   	entry->msi_attrib.default_irq	= dev->irq;	/* Save IOAPIC IRQ */
>   	entry->msi_attrib.multi_cap	= (control & PCI_MSI_FLAGS_QMASK) >> 1;
> -	entry->msi_attrib.multiple	= ilog2(__roundup_pow_of_two(nvec));
> +	entry->msi_attrib.multiple	= ilog2(roundup_pow_of_two(nvec));
>   
>   	if (control & PCI_MSI_FLAGS_64BIT)
>   		entry->mask_pos = dev->msi_cap + PCI_MSI_MASK_64;
> diff --git a/include/linux/log2.h b/include/linux/log2.h
> index 83a4a3ca3e8a..53a727303dac 100644
> --- a/include/linux/log2.h
> +++ b/include/linux/log2.h
> @@ -47,26 +47,6 @@ bool is_power_of_2(unsigned long n)
>   	return (n != 0 && ((n & (n - 1)) == 0));
>   }
>   
> -/**
> - * __roundup_pow_of_two() - round up to nearest power of two
> - * @n: value to round up
> - */
> -static inline __attribute__((const))
> -unsigned long __roundup_pow_of_two(unsigned long n)
> -{
> -	return 1UL << fls_long(n - 1);
> -}
> -
> -/**
> - * __rounddown_pow_of_two() - round down to nearest power of two
> - * @n: value to round down
> - */
> -static inline __attribute__((const))
> -unsigned long __rounddown_pow_of_two(unsigned long n)
> -{
> -	return 1UL << (fls_long(n) - 1);
> -}
> -
>   /**
>    * const_ilog2 - log base 2 of 32-bit or a 64-bit constant unsigned value
>    * @n: parameter
> @@ -170,14 +150,11 @@ unsigned long __rounddown_pow_of_two(unsigned long n)
>    * - the result is undefined when n == 0
>    * - this can be used to initialise global variables from constant data
>    */
> -#define roundup_pow_of_two(n)			\
> -(						\
> -	__builtin_constant_p(n) ? (		\
> -		(n == 1) ? 1 :			\
> -		(1UL << (ilog2((n) - 1) + 1))	\
> -				   ) :		\
> -	__roundup_pow_of_two(n)			\
> - )
> +#define roundup_pow_of_two(n)			  \
> +(						  \
> +	(__builtin_constant_p(n) && ((n) == 1)) ? \
> +	1 : (1ULL << (ilog2((n) - 1) + 1))        \
> +)
>   
>   /**
>    * rounddown_pow_of_two - round the given value down to nearest power of two
> @@ -187,12 +164,11 @@ unsigned long __rounddown_pow_of_two(unsigned long n)
>    * - the result is undefined when n == 0
>    * - this can be used to initialise global variables from constant data
>    */
> -#define rounddown_pow_of_two(n)			\
> -(						\
> -	__builtin_constant_p(n) ? (		\
> -		(1UL << ilog2(n))) :		\
> -	__rounddown_pow_of_two(n)		\
> - )
> +#define rounddown_pow_of_two(n)			  \
> +(						  \
> +	(__builtin_constant_p(n) && ((n) == 1)) ? \
> +	1 : (1ULL << (ilog2(n)))		  \
> +)
>   
>   static inline __attribute_const__
>   int __order_base_2(unsigned long n)
> diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
> index 15d70a90b50d..bb9efc6944a4 100644
> --- a/kernel/kexec_core.c
> +++ b/kernel/kexec_core.c
> @@ -1094,7 +1094,8 @@ static int __init crash_notes_memory_init(void)
>   	 * crash_notes is allocated inside one physical page.
>   	 */
>   	size = sizeof(note_buf_t);
> -	align = min(roundup_pow_of_two(sizeof(note_buf_t)), PAGE_SIZE);
> +	align = min(roundup_pow_of_two(sizeof(note_buf_t)),
> +		    (unsigned long long)PAGE_SIZE);
>   
>   	/*
>   	 * Break compile if size is bigger than PAGE_SIZE since crash_notes
> diff --git a/lib/rhashtable.c b/lib/rhashtable.c
> index bdb7e4cadf05..70908678c7a8 100644
> --- a/lib/rhashtable.c
> +++ b/lib/rhashtable.c
> @@ -950,7 +950,7 @@ static size_t rounded_hashtable_size(const struct rhashtable_params *params)
>   
>   	if (params->nelem_hint)
>   		retsize = max(roundup_pow_of_two(params->nelem_hint * 4 / 3),
> -			      (unsigned long)params->min_size);
> +			      (unsigned long long)params->min_size);
>   	else
>   		retsize = max(HASH_DEFAULT_SIZE,
>   			      (unsigned long)params->min_size);
> diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
> index 77c7dd7f05e8..78fb8ccabddd 100644
> --- a/net/sunrpc/xprtrdma/verbs.c
> +++ b/net/sunrpc/xprtrdma/verbs.c
> @@ -1015,7 +1015,7 @@ struct rpcrdma_req *rpcrdma_req_create(struct rpcrdma_xprt *r_xprt, size_t size,
>   	maxhdrsize = rpcrdma_fixed_maxsz + 3 +
>   		     r_xprt->rx_ia.ri_max_segs * rpcrdma_readchunk_maxsz;
>   	maxhdrsize *= sizeof(__be32);
> -	rb = rpcrdma_regbuf_alloc(__roundup_pow_of_two(maxhdrsize),
> +	rb = rpcrdma_regbuf_alloc(roundup_pow_of_two(maxhdrsize),
>   				  DMA_TO_DEVICE, flags);
>   	if (!rb)
>   		goto out2;
> 

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v4 7/8] linux/log2.h: Fix 64bit calculations in roundup/down_pow_two()
  2019-12-03 11:47 ` [PATCH v4 7/8] linux/log2.h: Fix 64bit calculations in roundup/down_pow_two() Nicolas Saenz Julienne
                     ` (3 preceding siblings ...)
  2019-12-05  0:39   ` Lu Baolu
@ 2019-12-05 17:48   ` Robin Murphy
  2019-12-12 12:31     ` Nicolas Saenz Julienne
  2019-12-05 22:30   ` Bjorn Helgaas
  5 siblings, 1 reply; 15+ messages in thread
From: Robin Murphy @ 2019-12-05 17:48 UTC (permalink / raw)
  To: Nicolas Saenz Julienne, andrew.murray, maz, linux-kernel,
	Michael Turquette, Stephen Boyd, Emilio López,
	Maxime Ripard, Chen-Yu Tsai, Mike Marciniszyn,
	Dennis Dalessandro, Yishai Hadas, Moni Shoua, David Woodhouse,
	Lu Baolu, Joerg Roedel, Tom Lendacky, Mirko Lindner,
	Stephen Hemminger, Jiri Pirko, Solarflare linux maintainers,
	Edward Cree, Martin Habets, Bjorn Helgaas, Eric Biederman,
	Thomas Graf, Herbert Xu
  Cc: james.quinlan, mbrugger, f.fainelli, phil, wahrenst,
	jeremy.linton, linux-pci, linux-rpi-kernel, Robin Murphy,
	Doug Ledford, Jason Gunthorpe, David S. Miller, Trond Myklebust,
	Anna Schumaker, J. Bruce Fields, Chuck Lever, linux-clk,
	linux-arm-kernel, linux-rdma, iommu, netdev, kexec, linux-nfs

On 03/12/2019 11:47 am, Nicolas Saenz Julienne wrote:
> Some users need to make sure their rounding function accepts and returns
> 64bit long variables regardless of the architecture. Sadly
> roundup/rounddown_pow_two() takes and returns unsigned longs. It turns
> out ilog2() already handles 32/64bit calculations properly, and being
> the building block to the round functions we can rework them as a
> wrapper around it.

Neat! Although all the additional ULL casts this introduces seem 
somewhat unwelcome - I suppose the (1ULL << (ilog2(n))) makes it 
effectively always return unsigned long long now. Might it make sense to 
cast the return value to typeof(n) to avoid this slightly non-obvious 
behaviour (and the associated churn)?

Robin.

> Suggested-by: Robin Murphy <robin.murphy@arm.con>
> Signed-off-by: Nicolas Saenz Julienne <nsaenzjulienne@suse.de>
> ---
>   drivers/clk/clk-divider.c                    |  8 ++--
>   drivers/clk/sunxi/clk-sunxi.c                |  2 +-
>   drivers/infiniband/hw/hfi1/chip.c            |  4 +-
>   drivers/infiniband/hw/hfi1/init.c            |  4 +-
>   drivers/infiniband/hw/mlx4/srq.c             |  2 +-
>   drivers/infiniband/hw/mthca/mthca_srq.c      |  2 +-
>   drivers/infiniband/sw/rxe/rxe_qp.c           |  4 +-
>   drivers/iommu/intel-iommu.c                  |  4 +-
>   drivers/iommu/intel-svm.c                    |  4 +-
>   drivers/iommu/intel_irq_remapping.c          |  2 +-
>   drivers/net/ethernet/amd/xgbe/xgbe-ethtool.c |  4 +-
>   drivers/net/ethernet/marvell/sky2.c          |  2 +-
>   drivers/net/ethernet/rocker/rocker_hw.h      |  4 +-
>   drivers/net/ethernet/sfc/ef10.c              |  2 +-
>   drivers/net/ethernet/sfc/efx.h               |  2 +-
>   drivers/net/ethernet/sfc/falcon/efx.h        |  2 +-
>   drivers/pci/msi.c                            |  2 +-
>   include/linux/log2.h                         | 44 +++++---------------
>   kernel/kexec_core.c                          |  3 +-
>   lib/rhashtable.c                             |  2 +-
>   net/sunrpc/xprtrdma/verbs.c                  |  2 +-
>   21 files changed, 41 insertions(+), 64 deletions(-)
> 
> diff --git a/drivers/clk/clk-divider.c b/drivers/clk/clk-divider.c
> index 098b2b01f0af..ba947e4c8193 100644
> --- a/drivers/clk/clk-divider.c
> +++ b/drivers/clk/clk-divider.c
> @@ -222,7 +222,7 @@ static int _div_round_up(const struct clk_div_table *table,
>   	int div = DIV_ROUND_UP_ULL((u64)parent_rate, rate);
>   
>   	if (flags & CLK_DIVIDER_POWER_OF_TWO)
> -		div = __roundup_pow_of_two(div);
> +		div = roundup_pow_of_two(div);
>   	if (table)
>   		div = _round_up_table(table, div);
>   
> @@ -240,8 +240,8 @@ static int _div_round_closest(const struct clk_div_table *table,
>   	down = parent_rate / rate;
>   
>   	if (flags & CLK_DIVIDER_POWER_OF_TWO) {
> -		up = __roundup_pow_of_two(up);
> -		down = __rounddown_pow_of_two(down);
> +		up = roundup_pow_of_two(up);
> +		down = rounddown_pow_of_two(down);
>   	} else if (table) {
>   		up = _round_up_table(table, up);
>   		down = _round_down_table(table, down);
> @@ -278,7 +278,7 @@ static int _next_div(const struct clk_div_table *table, int div,
>   	div++;
>   
>   	if (flags & CLK_DIVIDER_POWER_OF_TWO)
> -		return __roundup_pow_of_two(div);
> +		return roundup_pow_of_two(div);
>   	if (table)
>   		return _round_up_table(table, div);
>   
> diff --git a/drivers/clk/sunxi/clk-sunxi.c b/drivers/clk/sunxi/clk-sunxi.c
> index 27201fd26e44..faec99dc09c0 100644
> --- a/drivers/clk/sunxi/clk-sunxi.c
> +++ b/drivers/clk/sunxi/clk-sunxi.c
> @@ -311,7 +311,7 @@ static void sun6i_get_ahb1_factors(struct factors_request *req)
>   
>   		calcm = DIV_ROUND_UP(div, 1 << calcp);
>   	} else {
> -		calcp = __roundup_pow_of_two(div);
> +		calcp = roundup_pow_of_two(div);
>   		calcp = calcp > 3 ? 3 : calcp;
>   	}
>   
> diff --git a/drivers/infiniband/hw/hfi1/chip.c b/drivers/infiniband/hw/hfi1/chip.c
> index 9b1fb84a3d45..96b1d343c32f 100644
> --- a/drivers/infiniband/hw/hfi1/chip.c
> +++ b/drivers/infiniband/hw/hfi1/chip.c
> @@ -14199,10 +14199,10 @@ static int qos_rmt_entries(struct hfi1_devdata *dd, unsigned int *mp,
>   			max_by_vl = krcvqs[i];
>   	if (max_by_vl > 32)
>   		goto no_qos;
> -	m = ilog2(__roundup_pow_of_two(max_by_vl));
> +	m = ilog2(roundup_pow_of_two(max_by_vl));
>   
>   	/* determine bits for vl */
> -	n = ilog2(__roundup_pow_of_two(num_vls));
> +	n = ilog2(roundup_pow_of_two(num_vls));
>   
>   	/* reject if too much is used */
>   	if ((m + n) > 7)
> diff --git a/drivers/infiniband/hw/hfi1/init.c b/drivers/infiniband/hw/hfi1/init.c
> index 26b792bb1027..838c789c7cce 100644
> --- a/drivers/infiniband/hw/hfi1/init.c
> +++ b/drivers/infiniband/hw/hfi1/init.c
> @@ -467,7 +467,7 @@ int hfi1_create_ctxtdata(struct hfi1_pportdata *ppd, int numa,
>   		 * MTU supported.
>   		 */
>   		if (rcd->egrbufs.size < hfi1_max_mtu) {
> -			rcd->egrbufs.size = __roundup_pow_of_two(hfi1_max_mtu);
> +			rcd->egrbufs.size = roundup_pow_of_two(hfi1_max_mtu);
>   			hfi1_cdbg(PROC,
>   				  "ctxt%u: eager bufs size too small. Adjusting to %u\n",
>   				    rcd->ctxt, rcd->egrbufs.size);
> @@ -1959,7 +1959,7 @@ int hfi1_setup_eagerbufs(struct hfi1_ctxtdata *rcd)
>   	 * to satisfy the "multiple of 8 RcvArray entries" requirement.
>   	 */
>   	if (rcd->egrbufs.size <= (1 << 20))
> -		rcd->egrbufs.rcvtid_size = max((unsigned long)round_mtu,
> +		rcd->egrbufs.rcvtid_size = max((unsigned long long)round_mtu,
>   			rounddown_pow_of_two(rcd->egrbufs.size / 8));
>   
>   	while (alloced_bytes < rcd->egrbufs.size &&
> diff --git a/drivers/infiniband/hw/mlx4/srq.c b/drivers/infiniband/hw/mlx4/srq.c
> index 8dcf6e3d9ae2..7e685600a7b3 100644
> --- a/drivers/infiniband/hw/mlx4/srq.c
> +++ b/drivers/infiniband/hw/mlx4/srq.c
> @@ -96,7 +96,7 @@ int mlx4_ib_create_srq(struct ib_srq *ib_srq,
>   	srq->msrq.max    = roundup_pow_of_two(init_attr->attr.max_wr + 1);
>   	srq->msrq.max_gs = init_attr->attr.max_sge;
>   
> -	desc_size = max(32UL,
> +	desc_size = max(32ULL,
>   			roundup_pow_of_two(sizeof (struct mlx4_wqe_srq_next_seg) +
>   					   srq->msrq.max_gs *
>   					   sizeof (struct mlx4_wqe_data_seg)));
> diff --git a/drivers/infiniband/hw/mthca/mthca_srq.c b/drivers/infiniband/hw/mthca/mthca_srq.c
> index a85935ccce88..0c2e14b4142a 100644
> --- a/drivers/infiniband/hw/mthca/mthca_srq.c
> +++ b/drivers/infiniband/hw/mthca/mthca_srq.c
> @@ -225,7 +225,7 @@ int mthca_alloc_srq(struct mthca_dev *dev, struct mthca_pd *pd,
>   	else
>   		srq->max = srq->max + 1;
>   
> -	ds = max(64UL,
> +	ds = max(64ULL,
>   		 roundup_pow_of_two(sizeof (struct mthca_next_seg) +
>   				    srq->max_gs * sizeof (struct mthca_data_seg)));
>   
> diff --git a/drivers/infiniband/sw/rxe/rxe_qp.c b/drivers/infiniband/sw/rxe/rxe_qp.c
> index e2c6d1cedf41..040b707b0877 100644
> --- a/drivers/infiniband/sw/rxe/rxe_qp.c
> +++ b/drivers/infiniband/sw/rxe/rxe_qp.c
> @@ -592,7 +592,7 @@ int rxe_qp_from_attr(struct rxe_qp *qp, struct ib_qp_attr *attr, int mask,
>   	int err;
>   
>   	if (mask & IB_QP_MAX_QP_RD_ATOMIC) {
> -		int max_rd_atomic = __roundup_pow_of_two(attr->max_rd_atomic);
> +		int max_rd_atomic = roundup_pow_of_two(attr->max_rd_atomic);
>   
>   		qp->attr.max_rd_atomic = max_rd_atomic;
>   		atomic_set(&qp->req.rd_atomic, max_rd_atomic);
> @@ -600,7 +600,7 @@ int rxe_qp_from_attr(struct rxe_qp *qp, struct ib_qp_attr *attr, int mask,
>   
>   	if (mask & IB_QP_MAX_DEST_RD_ATOMIC) {
>   		int max_dest_rd_atomic =
> -			__roundup_pow_of_two(attr->max_dest_rd_atomic);
> +			roundup_pow_of_two(attr->max_dest_rd_atomic);
>   
>   		qp->attr.max_dest_rd_atomic = max_dest_rd_atomic;
>   
> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> index 0c8d81f56a30..ce7c900bd666 100644
> --- a/drivers/iommu/intel-iommu.c
> +++ b/drivers/iommu/intel-iommu.c
> @@ -1488,7 +1488,7 @@ static void iommu_flush_iotlb_psi(struct intel_iommu *iommu,
>   				  unsigned long pfn, unsigned int pages,
>   				  int ih, int map)
>   {
> -	unsigned int mask = ilog2(__roundup_pow_of_two(pages));
> +	unsigned int mask = ilog2(roundup_pow_of_two(pages));
>   	uint64_t addr = (uint64_t)pfn << VTD_PAGE_SHIFT;
>   	u16 did = domain->iommu_did[iommu->seq_id];
>   
> @@ -3390,7 +3390,7 @@ static unsigned long intel_alloc_iova(struct device *dev,
>   	/* Restrict dma_mask to the width that the iommu can handle */
>   	dma_mask = min_t(uint64_t, DOMAIN_MAX_ADDR(domain->gaw), dma_mask);
>   	/* Ensure we reserve the whole size-aligned region */
> -	nrpages = __roundup_pow_of_two(nrpages);
> +	nrpages = roundup_pow_of_two(nrpages);
>   
>   	if (!dmar_forcedac && dma_mask > DMA_BIT_MASK(32)) {
>   		/*
> diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c
> index 9b159132405d..602caca3cd1a 100644
> --- a/drivers/iommu/intel-svm.c
> +++ b/drivers/iommu/intel-svm.c
> @@ -115,7 +115,7 @@ static void intel_flush_svm_range_dev (struct intel_svm *svm, struct intel_svm_d
>   			QI_EIOTLB_TYPE;
>   		desc.qw1 = 0;
>   	} else {
> -		int mask = ilog2(__roundup_pow_of_two(pages));
> +		int mask = ilog2(roundup_pow_of_two(pages));
>   
>   		desc.qw0 = QI_EIOTLB_PASID(svm->pasid) |
>   				QI_EIOTLB_DID(sdev->did) |
> @@ -142,7 +142,7 @@ static void intel_flush_svm_range_dev (struct intel_svm *svm, struct intel_svm_d
>   			 * for example, an "address" value of 0x12345f000 will
>   			 * flush from 0x123440000 to 0x12347ffff (256KiB). */
>   			unsigned long last = address + ((unsigned long)(pages - 1) << VTD_PAGE_SHIFT);
> -			unsigned long mask = __rounddown_pow_of_two(address ^ last);
> +			unsigned long mask = rounddown_pow_of_two(address ^ last);
>   
>   			desc.qw1 = QI_DEV_EIOTLB_ADDR((address & ~mask) |
>   					(mask - 1)) | QI_DEV_EIOTLB_SIZE;
> diff --git a/drivers/iommu/intel_irq_remapping.c b/drivers/iommu/intel_irq_remapping.c
> index 81e43c1df7ec..935657b2c661 100644
> --- a/drivers/iommu/intel_irq_remapping.c
> +++ b/drivers/iommu/intel_irq_remapping.c
> @@ -113,7 +113,7 @@ static int alloc_irte(struct intel_iommu *iommu,
>   		return -1;
>   
>   	if (count > 1) {
> -		count = __roundup_pow_of_two(count);
> +		count = roundup_pow_of_two(count);
>   		mask = ilog2(count);
>   	}
>   
> diff --git a/drivers/net/ethernet/amd/xgbe/xgbe-ethtool.c b/drivers/net/ethernet/amd/xgbe/xgbe-ethtool.c
> index 6a757dadb5f1..fd5b12c23eaa 100644
> --- a/drivers/net/ethernet/amd/xgbe/xgbe-ethtool.c
> +++ b/drivers/net/ethernet/amd/xgbe/xgbe-ethtool.c
> @@ -680,13 +680,13 @@ static int xgbe_set_ringparam(struct net_device *netdev,
>   		return -EINVAL;
>   	}
>   
> -	rx = __rounddown_pow_of_two(ringparam->rx_pending);
> +	rx = rounddown_pow_of_two(ringparam->rx_pending);
>   	if (rx != ringparam->rx_pending)
>   		netdev_notice(netdev,
>   			      "rx ring parameter rounded to power of two: %u\n",
>   			      rx);
>   
> -	tx = __rounddown_pow_of_two(ringparam->tx_pending);
> +	tx = rounddown_pow_of_two(ringparam->tx_pending);
>   	if (tx != ringparam->tx_pending)
>   		netdev_notice(netdev,
>   			      "tx ring parameter rounded to power of two: %u\n",
> diff --git a/drivers/net/ethernet/marvell/sky2.c b/drivers/net/ethernet/marvell/sky2.c
> index 5f56ee83e3b1..cc3a03b4a611 100644
> --- a/drivers/net/ethernet/marvell/sky2.c
> +++ b/drivers/net/ethernet/marvell/sky2.c
> @@ -4139,7 +4139,7 @@ static int sky2_set_coalesce(struct net_device *dev,
>    */
>   static unsigned long roundup_ring_size(unsigned long pending)
>   {
> -	return max(128ul, roundup_pow_of_two(pending+1));
> +	return max(128ull, roundup_pow_of_two(pending+1));
>   }
>   
>   static void sky2_get_ringparam(struct net_device *dev,
> diff --git a/drivers/net/ethernet/rocker/rocker_hw.h b/drivers/net/ethernet/rocker/rocker_hw.h
> index 59f1f8b690d2..d8de15509e2c 100644
> --- a/drivers/net/ethernet/rocker/rocker_hw.h
> +++ b/drivers/net/ethernet/rocker/rocker_hw.h
> @@ -88,8 +88,8 @@ enum rocker_dma_type {
>   };
>   
>   /* Rocker DMA ring size limits and default sizes */
> -#define ROCKER_DMA_SIZE_MIN		2ul
> -#define ROCKER_DMA_SIZE_MAX		65536ul
> +#define ROCKER_DMA_SIZE_MIN		2ull
> +#define ROCKER_DMA_SIZE_MAX		65536ull
>   #define ROCKER_DMA_CMD_DEFAULT_SIZE	32ul
>   #define ROCKER_DMA_EVENT_DEFAULT_SIZE	32ul
>   #define ROCKER_DMA_TX_DEFAULT_SIZE	64ul
> diff --git a/drivers/net/ethernet/sfc/ef10.c b/drivers/net/ethernet/sfc/ef10.c
> index 4d9bbccc6f89..4f4d9a5b3b75 100644
> --- a/drivers/net/ethernet/sfc/ef10.c
> +++ b/drivers/net/ethernet/sfc/ef10.c
> @@ -27,7 +27,7 @@ enum {
>   };
>   /* The maximum size of a shared RSS context */
>   /* TODO: this should really be from the mcdi protocol export */
> -#define EFX_EF10_MAX_SHARED_RSS_CONTEXT_SIZE 64UL
> +#define EFX_EF10_MAX_SHARED_RSS_CONTEXT_SIZE 64ULL
>   
>   /* The filter table(s) are managed by firmware and we have write-only
>    * access.  When removing filters we must identify them to the
> diff --git a/drivers/net/ethernet/sfc/efx.h b/drivers/net/ethernet/sfc/efx.h
> index 2dd8d5002315..fea2add5860e 100644
> --- a/drivers/net/ethernet/sfc/efx.h
> +++ b/drivers/net/ethernet/sfc/efx.h
> @@ -52,7 +52,7 @@ void efx_schedule_slow_fill(struct efx_rx_queue *rx_queue);
>   
>   #define EFX_MAX_DMAQ_SIZE 4096UL
>   #define EFX_DEFAULT_DMAQ_SIZE 1024UL
> -#define EFX_MIN_DMAQ_SIZE 512UL
> +#define EFX_MIN_DMAQ_SIZE 512ULL
>   
>   #define EFX_MAX_EVQ_SIZE 16384UL
>   #define EFX_MIN_EVQ_SIZE 512UL
> diff --git a/drivers/net/ethernet/sfc/falcon/efx.h b/drivers/net/ethernet/sfc/falcon/efx.h
> index d3b4646545fa..0d16257156d6 100644
> --- a/drivers/net/ethernet/sfc/falcon/efx.h
> +++ b/drivers/net/ethernet/sfc/falcon/efx.h
> @@ -55,7 +55,7 @@ void ef4_schedule_slow_fill(struct ef4_rx_queue *rx_queue);
>   
>   #define EF4_MAX_DMAQ_SIZE 4096UL
>   #define EF4_DEFAULT_DMAQ_SIZE 1024UL
> -#define EF4_MIN_DMAQ_SIZE 512UL
> +#define EF4_MIN_DMAQ_SIZE 512ULL
>   
>   #define EF4_MAX_EVQ_SIZE 16384UL
>   #define EF4_MIN_EVQ_SIZE 512UL
> diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
> index c7709e49f0e4..f0391e88bc42 100644
> --- a/drivers/pci/msi.c
> +++ b/drivers/pci/msi.c
> @@ -578,7 +578,7 @@ msi_setup_entry(struct pci_dev *dev, int nvec, struct irq_affinity *affd)
>   	entry->msi_attrib.maskbit	= !!(control & PCI_MSI_FLAGS_MASKBIT);
>   	entry->msi_attrib.default_irq	= dev->irq;	/* Save IOAPIC IRQ */
>   	entry->msi_attrib.multi_cap	= (control & PCI_MSI_FLAGS_QMASK) >> 1;
> -	entry->msi_attrib.multiple	= ilog2(__roundup_pow_of_two(nvec));
> +	entry->msi_attrib.multiple	= ilog2(roundup_pow_of_two(nvec));
>   
>   	if (control & PCI_MSI_FLAGS_64BIT)
>   		entry->mask_pos = dev->msi_cap + PCI_MSI_MASK_64;
> diff --git a/include/linux/log2.h b/include/linux/log2.h
> index 83a4a3ca3e8a..53a727303dac 100644
> --- a/include/linux/log2.h
> +++ b/include/linux/log2.h
> @@ -47,26 +47,6 @@ bool is_power_of_2(unsigned long n)
>   	return (n != 0 && ((n & (n - 1)) == 0));
>   }
>   
> -/**
> - * __roundup_pow_of_two() - round up to nearest power of two
> - * @n: value to round up
> - */
> -static inline __attribute__((const))
> -unsigned long __roundup_pow_of_two(unsigned long n)
> -{
> -	return 1UL << fls_long(n - 1);
> -}
> -
> -/**
> - * __rounddown_pow_of_two() - round down to nearest power of two
> - * @n: value to round down
> - */
> -static inline __attribute__((const))
> -unsigned long __rounddown_pow_of_two(unsigned long n)
> -{
> -	return 1UL << (fls_long(n) - 1);
> -}
> -
>   /**
>    * const_ilog2 - log base 2 of 32-bit or a 64-bit constant unsigned value
>    * @n: parameter
> @@ -170,14 +150,11 @@ unsigned long __rounddown_pow_of_two(unsigned long n)
>    * - the result is undefined when n == 0
>    * - this can be used to initialise global variables from constant data
>    */
> -#define roundup_pow_of_two(n)			\
> -(						\
> -	__builtin_constant_p(n) ? (		\
> -		(n == 1) ? 1 :			\
> -		(1UL << (ilog2((n) - 1) + 1))	\
> -				   ) :		\
> -	__roundup_pow_of_two(n)			\
> - )
> +#define roundup_pow_of_two(n)			  \
> +(						  \
> +	(__builtin_constant_p(n) && ((n) == 1)) ? \
> +	1 : (1ULL << (ilog2((n) - 1) + 1))        \
> +)
>   
>   /**
>    * rounddown_pow_of_two - round the given value down to nearest power of two
> @@ -187,12 +164,11 @@ unsigned long __rounddown_pow_of_two(unsigned long n)
>    * - the result is undefined when n == 0
>    * - this can be used to initialise global variables from constant data
>    */
> -#define rounddown_pow_of_two(n)			\
> -(						\
> -	__builtin_constant_p(n) ? (		\
> -		(1UL << ilog2(n))) :		\
> -	__rounddown_pow_of_two(n)		\
> - )
> +#define rounddown_pow_of_two(n)			  \
> +(						  \
> +	(__builtin_constant_p(n) && ((n) == 1)) ? \
> +	1 : (1ULL << (ilog2(n)))		  \
> +)
>   
>   static inline __attribute_const__
>   int __order_base_2(unsigned long n)
> diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
> index 15d70a90b50d..bb9efc6944a4 100644
> --- a/kernel/kexec_core.c
> +++ b/kernel/kexec_core.c
> @@ -1094,7 +1094,8 @@ static int __init crash_notes_memory_init(void)
>   	 * crash_notes is allocated inside one physical page.
>   	 */
>   	size = sizeof(note_buf_t);
> -	align = min(roundup_pow_of_two(sizeof(note_buf_t)), PAGE_SIZE);
> +	align = min(roundup_pow_of_two(sizeof(note_buf_t)),
> +		    (unsigned long long)PAGE_SIZE);
>   
>   	/*
>   	 * Break compile if size is bigger than PAGE_SIZE since crash_notes
> diff --git a/lib/rhashtable.c b/lib/rhashtable.c
> index bdb7e4cadf05..70908678c7a8 100644
> --- a/lib/rhashtable.c
> +++ b/lib/rhashtable.c
> @@ -950,7 +950,7 @@ static size_t rounded_hashtable_size(const struct rhashtable_params *params)
>   
>   	if (params->nelem_hint)
>   		retsize = max(roundup_pow_of_two(params->nelem_hint * 4 / 3),
> -			      (unsigned long)params->min_size);
> +			      (unsigned long long)params->min_size);
>   	else
>   		retsize = max(HASH_DEFAULT_SIZE,
>   			      (unsigned long)params->min_size);
> diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
> index 77c7dd7f05e8..78fb8ccabddd 100644
> --- a/net/sunrpc/xprtrdma/verbs.c
> +++ b/net/sunrpc/xprtrdma/verbs.c
> @@ -1015,7 +1015,7 @@ struct rpcrdma_req *rpcrdma_req_create(struct rpcrdma_xprt *r_xprt, size_t size,
>   	maxhdrsize = rpcrdma_fixed_maxsz + 3 +
>   		     r_xprt->rx_ia.ri_max_segs * rpcrdma_readchunk_maxsz;
>   	maxhdrsize *= sizeof(__be32);
> -	rb = rpcrdma_regbuf_alloc(__roundup_pow_of_two(maxhdrsize),
> +	rb = rpcrdma_regbuf_alloc(roundup_pow_of_two(maxhdrsize),
>   				  DMA_TO_DEVICE, flags);
>   	if (!rb)
>   		goto out2;
> 

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v4 8/8] linux/log2.h: Use roundup/dow_pow_two() on 64bit calculations
  2019-12-03 11:47 ` [PATCH v4 8/8] linux/log2.h: Use roundup/dow_pow_two() on 64bit calculations Nicolas Saenz Julienne
  2019-12-03 15:53   ` Rob Herring
@ 2019-12-05 20:38   ` Bjorn Helgaas
  2019-12-12 13:21     ` Nicolas Saenz Julienne
  1 sibling, 1 reply; 15+ messages in thread
From: Bjorn Helgaas @ 2019-12-05 20:38 UTC (permalink / raw)
  To: Nicolas Saenz Julienne
  Cc: andrew.murray, maz, linux-kernel, Lorenzo Pieralisi, Hanjun Guo,
	Sudeep Holla, Tariq Toukan, Rob Herring, Frank Rowand,
	Florian Fainelli, bcm-kernel-feedback-list, Eric Anholt,
	Stefan Wahren, Shawn Lin, Heiko Stuebner, Christoph Hellwig,
	Marek Szyprowski, Robin Murphy, james.quinlan, mbrugger, phil,
	jeremy.linton, linux-pci, linux-rpi-kernel, Rafael J. Wysocki,
	Len Brown, David S. Miller, linux-acpi, linux-arm-kernel, netdev,
	linux-rdma, devicetree, linux-rockchip, iommu

The subject contains a couple typos: it's missing "of" and it's
missing the "n" on "down".

On Tue, Dec 03, 2019 at 12:47:41PM +0100, Nicolas Saenz Julienne wrote:
> The function now is safe to use while expecting a 64bit value. Use it
> where relevant.

Please include the function names ("roundup_pow_of_two()",
"rounddown_pow_of_two()") in the changelog so it is self-contained and
doesn't depend on the subject.

> Signed-off-by: Nicolas Saenz Julienne <nsaenzjulienne@suse.de>

With the nits above and below addressed,

Acked-by: Bjorn Helgaas <bhelgaas@google.com>	# drivers/pci

> ---
>  drivers/acpi/arm64/iort.c                        | 2 +-
>  drivers/net/ethernet/mellanox/mlx4/en_clock.c    | 3 ++-
>  drivers/of/device.c                              | 3 ++-
>  drivers/pci/controller/cadence/pcie-cadence-ep.c | 3 ++-
>  drivers/pci/controller/cadence/pcie-cadence.c    | 3 ++-
>  drivers/pci/controller/pcie-brcmstb.c            | 3 ++-
>  drivers/pci/controller/pcie-rockchip-ep.c        | 5 +++--
>  kernel/dma/direct.c                              | 2 +-
>  8 files changed, 15 insertions(+), 9 deletions(-)

> --- a/drivers/pci/controller/cadence/pcie-cadence-ep.c
> +++ b/drivers/pci/controller/cadence/pcie-cadence-ep.c
> @@ -10,6 +10,7 @@
>  #include <linux/platform_device.h>
>  #include <linux/pm_runtime.h>
>  #include <linux/sizes.h>
> +#include <linux/log2.h>
>  
>  #include "pcie-cadence.h"
>  
> @@ -65,7 +66,7 @@ static int cdns_pcie_ep_set_bar(struct pci_epc *epc, u8 fn,
>  	 * roundup_pow_of_two() returns an unsigned long, which is not suited
>  	 * for 64bit values.
>  	 */

Please remove the comment above since it no longer applies.

> -	sz = 1ULL << fls64(sz - 1);
> +	sz = roundup_pow_of_two(sz);
>  	aperture = ilog2(sz) - 7; /* 128B -> 0, 256B -> 1, 512B -> 2, ... */
>  
>  	if ((flags & PCI_BASE_ADDRESS_SPACE) == PCI_BASE_ADDRESS_SPACE_IO) {
> diff --git a/drivers/pci/controller/cadence/pcie-cadence.c b/drivers/pci/controller/cadence/pcie-cadence.c
> index cd795f6fc1e2..b1689f725b41 100644
> --- a/drivers/pci/controller/cadence/pcie-cadence.c
> +++ b/drivers/pci/controller/cadence/pcie-cadence.c
> @@ -4,6 +4,7 @@
>  // Author: Cyrille Pitchen <cyrille.pitchen@free-electrons.com>
>  
>  #include <linux/kernel.h>
> +#include <linux/log2.h>
>  
>  #include "pcie-cadence.h"
>  
> @@ -15,7 +16,7 @@ void cdns_pcie_set_outbound_region(struct cdns_pcie *pcie, u8 fn,
>  	 * roundup_pow_of_two() returns an unsigned long, which is not suited
>  	 * for 64bit values.
>  	 */

Same here.

> -	u64 sz = 1ULL << fls64(size - 1);
> +	u64 sz = roundup_pow_of_two(size);
>  	int nbits = ilog2(sz);
>  	u32 addr0, addr1, desc0, desc1;
>  
> --- a/drivers/pci/controller/pcie-rockchip-ep.c
> +++ b/drivers/pci/controller/pcie-rockchip-ep.c
> @@ -16,6 +16,7 @@
>  #include <linux/platform_device.h>
>  #include <linux/pci-epf.h>
>  #include <linux/sizes.h>
> +#include <linux/log2.h>
>  
>  #include "pcie-rockchip.h"
>  
> @@ -70,7 +71,7 @@ static void rockchip_pcie_prog_ep_ob_atu(struct rockchip_pcie *rockchip, u8 fn,
>  					 u32 r, u32 type, u64 cpu_addr,
>  					 u64 pci_addr, size_t size)
>  {
> -	u64 sz = 1ULL << fls64(size - 1);
> +	u64 sz = roundup_pow_of_two(size);
>  	int num_pass_bits = ilog2(sz);
>  	u32 addr0, addr1, desc0, desc1;
>  	bool is_nor_msg = (type == AXI_WRAPPER_NOR_MSG);
> @@ -176,7 +177,7 @@ static int rockchip_pcie_ep_set_bar(struct pci_epc *epc, u8 fn,
>  	 * roundup_pow_of_two() returns an unsigned long, which is not suited
>  	 * for 64bit values.
>  	 */

And here.

> -	sz = 1ULL << fls64(sz - 1);
> +	sz = roundup_pow_of_two(sz);
>  	aperture = ilog2(sz) - 7; /* 128B -> 0, 256B -> 1, 512B -> 2, ... */
>  
>  	if ((flags & PCI_BASE_ADDRESS_SPACE) == PCI_BASE_ADDRESS_SPACE_IO) {
> diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
> index 6af7ae83c4ad..056886c4efec 100644
> --- a/kernel/dma/direct.c
> +++ b/kernel/dma/direct.c
> @@ -53,7 +53,7 @@ u64 dma_direct_get_required_mask(struct device *dev)
>  {
>  	u64 max_dma = phys_to_dma_direct(dev, (max_pfn - 1) << PAGE_SHIFT);
>  
> -	return (1ULL << (fls64(max_dma) - 1)) * 2 - 1;
> +	return rounddown_pow_of_two(max_dma) * 2 - 1;

Personally I would probably make this one a separate patch since it's
qualitatively different than the others and it would avoid the slight
awkwardness of the non-greppable "roundup/down_pow_of_two()"
construction in the commit subject.

But it's fine either way.

>  }
>  
>  static gfp_t __dma_direct_optimal_gfp_mask(struct device *dev, u64 dma_mask,
> -- 
> 2.24.0
> 

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v4 7/8] linux/log2.h: Fix 64bit calculations in roundup/down_pow_two()
  2019-12-03 11:47 ` [PATCH v4 7/8] linux/log2.h: Fix 64bit calculations in roundup/down_pow_two() Nicolas Saenz Julienne
                     ` (4 preceding siblings ...)
  2019-12-05 17:48   ` Robin Murphy
@ 2019-12-05 22:30   ` Bjorn Helgaas
  2019-12-12 13:16     ` Nicolas Saenz Julienne
  5 siblings, 1 reply; 15+ messages in thread
From: Bjorn Helgaas @ 2019-12-05 22:30 UTC (permalink / raw)
  To: Nicolas Saenz Julienne
  Cc: andrew.murray, maz, linux-kernel, Michael Turquette,
	Stephen Boyd, Emilio López, Maxime Ripard, Chen-Yu Tsai,
	Mike Marciniszyn, Dennis Dalessandro, Yishai Hadas, Moni Shoua,
	David Woodhouse, Lu Baolu, Joerg Roedel, Tom Lendacky,
	Mirko Lindner, Stephen Hemminger, Jiri Pirko,
	Solarflare linux maintainers, Edward Cree, Martin Habets,
	Eric Biederman, Thomas Graf, Herbert Xu, james.quinlan, mbrugger,
	f.fainelli, phil, wahrenst, jeremy.linton, linux-pci,
	linux-rpi-kernel, Robin Murphy, Doug Ledford, Jason Gunthorpe,
	David S. Miller, Trond Myklebust, Anna Schumaker,
	J. Bruce Fields, Chuck Lever, linux-clk, linux-arm-kernel,
	linux-rdma, iommu, netdev, kexec, linux-nfs

You got the "n" on "down" in the subject, but still missing "of" ;)

On Tue, Dec 03, 2019 at 12:47:40PM +0100, Nicolas Saenz Julienne wrote:
> Some users need to make sure their rounding function accepts and returns
> 64bit long variables regardless of the architecture. Sadly
> roundup/rounddown_pow_two() takes and returns unsigned longs. It turns
> out ilog2() already handles 32/64bit calculations properly, and being
> the building block to the round functions we can rework them as a
> wrapper around it.

Missing "of" in the function names here.
s/a wrapper/wrappers/

IIUC the point of this is that roundup_pow_of_two() returned
"unsigned long", which can be either 32 or 64 bits (worth pointing
out, I think), and many callers need something that returns
"unsigned long long" (always 64 bits).

It's a nice simplification to remove the "__" variants.  Just as a
casual reader of this commit message, I'd like to know why we had both
the roundup and the __roundup versions in the first place, and why we
no longer need both.

> -#define roundup_pow_of_two(n)			\
> -(						\
> -	__builtin_constant_p(n) ? (		\
> -		(n == 1) ? 1 :			\
> -		(1UL << (ilog2((n) - 1) + 1))	\
> -				   ) :		\
> -	__roundup_pow_of_two(n)			\
> - )
> +#define roundup_pow_of_two(n)			  \
> +(						  \
> +	(__builtin_constant_p(n) && ((n) == 1)) ? \
> +	1 : (1ULL << (ilog2((n) - 1) + 1))        \
> +)

Should the resulting type of this expression always be a ULL, even
when n==1, i.e., should it be this?

  1ULL : (1ULL << (ilog2((n) - 1) + 1))        \

Or maybe there's no case where that makes a difference?

Bjorn

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v4 7/8] linux/log2.h: Fix 64bit calculations in roundup/down_pow_two()
  2019-12-05 17:48   ` Robin Murphy
@ 2019-12-12 12:31     ` Nicolas Saenz Julienne
  0 siblings, 0 replies; 15+ messages in thread
From: Nicolas Saenz Julienne @ 2019-12-12 12:31 UTC (permalink / raw)
  To: Robin Murphy, andrew.murray, maz, linux-kernel,
	Michael Turquette, Stephen Boyd, Emilio López,
	Maxime Ripard, Chen-Yu Tsai, Mike Marciniszyn,
	Dennis Dalessandro, Yishai Hadas, Moni Shoua, David Woodhouse,
	Lu Baolu, Joerg Roedel, Tom Lendacky, Mirko Lindner,
	Stephen Hemminger, Jiri Pirko, Solarflare linux maintainers,
	Edward Cree, Martin Habets, Bjorn Helgaas, Eric Biederman,
	Thomas Graf, Herbert Xu
  Cc: james.quinlan, mbrugger, f.fainelli, phil, wahrenst,
	jeremy.linton, linux-pci, linux-rpi-kernel, Robin Murphy,
	Doug Ledford, Jason Gunthorpe, David S. Miller, Trond Myklebust,
	Anna Schumaker, J. Bruce Fields, Chuck Lever, linux-clk,
	linux-arm-kernel, linux-rdma, iommu, netdev, kexec, linux-nfs

[-- Attachment #1: Type: text/plain, Size: 2633 bytes --]

Hi Robin,

On Thu, 2019-12-05 at 17:48 +0000, Robin Murphy wrote:
> On 03/12/2019 11:47 am, Nicolas Saenz Julienne wrote:
> > Some users need to make sure their rounding function accepts and returns
> > 64bit long variables regardless of the architecture. Sadly
> > roundup/rounddown_pow_two() takes and returns unsigned longs. It turns
> > out ilog2() already handles 32/64bit calculations properly, and being
> > the building block to the round functions we can rework them as a
> > wrapper around it.
> 
> Neat! Although all the additional ULL casts this introduces seem 
> somewhat unwelcome - I suppose the (1ULL << (ilog2(n))) makes it 
> effectively always return unsigned long long now. Might it make sense to 
> cast the return value to typeof(n) to avoid this slightly non-obvious 
> behaviour (and the associated churn)?

It might alleviate some of the churn alright but I don't think a cast is really
going to make the behaviour more obvious. Say your expression is a big mess,
you'll have to analyze it to infer the output type, keeping in mind things like
integer promotion. See this example, 'params->nelem_hint' and
'params->min_size' are u16:

	diff --git a/lib/rhashtable.c b/lib/rhashtable.c
	index bdb7e4cadf05..70908678c7a8 100644
	--- a/lib/rhashtable.c
	+++ b/lib/rhashtable.c
	@@ -950,7 +950,7 @@ static size_t rounded_hashtable_size(const struct rhashtable_params *params)

		if (params->nelem_hint)
			retsize = max(roundup_pow_of_two(params->nelem_hint * 4 / 3),
	-                             (unsigned long)params->min_size);
	+                             (unsigned long long)params->min_size);
		else
			retsize = max(HASH_DEFAULT_SIZE,
				      (unsigned long)params->min_size);

With a cast the patch will look like this:

	diff --git a/lib/rhashtable.c b/lib/rhashtable.c
	index bdb7e4cadf05..70908678c7a8 100644
	--- a/lib/rhashtable.c
	+++ b/lib/rhashtable.c
	@@ -950,7 +950,7 @@ static size_t rounded_hashtable_size(const struct rhashtable_params *params)

		if (params->nelem_hint)
			retsize = max(roundup_pow_of_two(params->nelem_hint * 4 / 3),
	-                             (unsigned long)params->min_size);
	+                             (int)params->min_size);
		else
			retsize = max(HASH_DEFAULT_SIZE,
				      (unsigned long)params->min_size);

To me it's even less obvious than with a fixed ULL.

My intuition tells me to keep it as similar as the old behaviour, at the
expense of the extra churn (which is not that different from the current status
quo anyway). That said, I'll be happy to change it.

Regards,
Nicolas


[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v4 7/8] linux/log2.h: Fix 64bit calculations in roundup/down_pow_two()
  2019-12-05 22:30   ` Bjorn Helgaas
@ 2019-12-12 13:16     ` Nicolas Saenz Julienne
  0 siblings, 0 replies; 15+ messages in thread
From: Nicolas Saenz Julienne @ 2019-12-12 13:16 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: andrew.murray, maz, linux-kernel, Michael Turquette,
	Stephen Boyd, Emilio López, Maxime Ripard, Chen-Yu Tsai,
	Mike Marciniszyn, Dennis Dalessandro, Yishai Hadas, Moni Shoua,
	David Woodhouse, Lu Baolu, Joerg Roedel, Tom Lendacky,
	Mirko Lindner, Stephen Hemminger, Jiri Pirko,
	Solarflare linux maintainers, Edward Cree, Martin Habets,
	Eric Biederman, Thomas Graf, Herbert Xu, james.quinlan, mbrugger,
	f.fainelli, phil, wahrenst, jeremy.linton, linux-pci,
	linux-rpi-kernel, Robin Murphy, Doug Ledford, Jason Gunthorpe,
	David S. Miller, Trond Myklebust, Anna Schumaker,
	J. Bruce Fields, Chuck Lever, linux-clk, linux-arm-kernel,
	linux-rdma, iommu, netdev, kexec, linux-nfs

[-- Attachment #1: Type: text/plain, Size: 2374 bytes --]

On Thu, 2019-12-05 at 16:30 -0600, Bjorn Helgaas wrote:
> You got the "n" on "down" in the subject, but still missing "of" ;)

Yes, sorry about that, I tend to re-read what I meant to say instead of what
it's actually written.

> On Tue, Dec 03, 2019 at 12:47:40PM +0100, Nicolas Saenz Julienne wrote:
> > Some users need to make sure their rounding function accepts and returns
> > 64bit long variables regardless of the architecture. Sadly
> > roundup/rounddown_pow_two() takes and returns unsigned longs. It turns
> > out ilog2() already handles 32/64bit calculations properly, and being
> > the building block to the round functions we can rework them as a
> > wrapper around it.
> 
> Missing "of" in the function names here.
> s/a wrapper/wrappers/

Noted

> IIUC the point of this is that roundup_pow_of_two() returned
> "unsigned long", which can be either 32 or 64 bits (worth pointing
> out, I think), and many callers need something that returns
> "unsigned long long" (always 64 bits).

I'll update the commit message to be a more explicit.

> It's a nice simplification to remove the "__" variants.  Just as a
> casual reader of this commit message, I'd like to know why we had both
> the roundup and the __roundup versions in the first place, and why we
> no longer need both.

So, the commit that introduced it (312a0c170945b) meant to use the '__' variant
as a helper, but, due to the fact this is a header file, some found it and made
use of it. I went over some if the commits introducing '__' usages and none of
them seem to acknowledge its use as opposed to the macro version. I think it's
fair to say it's a case of cargo-culting.

> > -#define roundup_pow_of_two(n)			\
> > -(						\
> > -	__builtin_constant_p(n) ? (		\
> > -		(n == 1) ? 1 :			\
> > -		(1UL << (ilog2((n) - 1) + 1))	\
> > -				   ) :		\
> > -	__roundup_pow_of_two(n)			\
> > - )
> > +#define roundup_pow_of_two(n)			  \
> > +(						  \
> > +	(__builtin_constant_p(n) && ((n) == 1)) ? \
> > +	1 : (1ULL << (ilog2((n) - 1) + 1))        \
> > +)
> 
> Should the resulting type of this expression always be a ULL, even
> when n==1, i.e., should it be this?
> 
>   1ULL : (1ULL << (ilog2((n) - 1) + 1))        \
> 
> Or maybe there's no case where that makes a difference?

It should be 1ULL on either case.

Regards,
Nicolas


[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v4 8/8] linux/log2.h: Use roundup/dow_pow_two() on 64bit calculations
  2019-12-05 20:38   ` Bjorn Helgaas
@ 2019-12-12 13:21     ` Nicolas Saenz Julienne
  0 siblings, 0 replies; 15+ messages in thread
From: Nicolas Saenz Julienne @ 2019-12-12 13:21 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: andrew.murray, maz, linux-kernel, Lorenzo Pieralisi, Hanjun Guo,
	Sudeep Holla, Tariq Toukan, Rob Herring, Frank Rowand,
	Florian Fainelli, bcm-kernel-feedback-list, Eric Anholt,
	Stefan Wahren, Shawn Lin, Heiko Stuebner, Christoph Hellwig,
	Marek Szyprowski, Robin Murphy, james.quinlan, mbrugger, phil,
	jeremy.linton, linux-pci, linux-rpi-kernel, Rafael J. Wysocki,
	Len Brown, David S. Miller, linux-acpi, linux-arm-kernel, netdev,
	linux-rdma, devicetree, linux-rockchip, iommu

[-- Attachment #1: Type: text/plain, Size: 2721 bytes --]

On Thu, 2019-12-05 at 14:38 -0600, Bjorn Helgaas wrote:
> The subject contains a couple typos: it's missing "of" and it's
> missing the "n" on "down".

Noted > 
> On Tue, Dec 03, 2019 at 12:47:41PM +0100, Nicolas Saenz Julienne wrote:
> > The function now is safe to use while expecting a 64bit value. Use it
> > where relevant.
> 
> Please include the function names ("roundup_pow_of_two()",
> "rounddown_pow_of_two()") in the changelog so it is self-contained and
> doesn't depend on the subject.

Noted

> > Signed-off-by: Nicolas Saenz Julienne <nsaenzjulienne@suse.de>
> 
> With the nits above and below addressed,
> 
> Acked-by: Bjorn Helgaas <bhelgaas@google.com>	# drivers/pci

Thanks!

> > ---
> >  drivers/acpi/arm64/iort.c                        | 2 +-
> >  drivers/net/ethernet/mellanox/mlx4/en_clock.c    | 3 ++-
> >  drivers/of/device.c                              | 3 ++-
> >  drivers/pci/controller/cadence/pcie-cadence-ep.c | 3 ++-
> >  drivers/pci/controller/cadence/pcie-cadence.c    | 3 ++-
> >  drivers/pci/controller/pcie-brcmstb.c            | 3 ++-
> >  drivers/pci/controller/pcie-rockchip-ep.c        | 5 +++--
> >  kernel/dma/direct.c                              | 2 +-
> >  8 files changed, 15 insertions(+), 9 deletions(-)
> > --- a/drivers/pci/controller/cadence/pcie-cadence-ep.c
> > +++ b/drivers/pci/controller/cadence/pcie-cadence-ep.c
> > @@ -10,6 +10,7 @@
> >  #include <linux/platform_device.h>
> >  #include <linux/pm_runtime.h>
> >  #include <linux/sizes.h>
> > +#include <linux/log2.h>
> >  
> >  #include "pcie-cadence.h"
> >  
> > @@ -65,7 +66,7 @@ static int cdns_pcie_ep_set_bar(struct pci_epc *epc, u8
> > fn,
> >  	 * roundup_pow_of_two() returns an unsigned long, which is not suited
> >  	 * for 64bit values.
> >  	 */
> 
> Please remove the comment above since it no longer applies.

Noted

[...]

> > diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
> > index 6af7ae83c4ad..056886c4efec 100644
> > --- a/kernel/dma/direct.c
> > +++ b/kernel/dma/direct.c
> > @@ -53,7 +53,7 @@ u64 dma_direct_get_required_mask(struct device *dev)
> >  {
> >  	u64 max_dma = phys_to_dma_direct(dev, (max_pfn - 1) << PAGE_SHIFT);
> >  
> > -	return (1ULL << (fls64(max_dma) - 1)) * 2 - 1;
> > +	return rounddown_pow_of_two(max_dma) * 2 - 1;
> 
> Personally I would probably make this one a separate patch since it's
> qualitatively different than the others and it would avoid the slight
> awkwardness of the non-greppable "roundup/down_pow_of_two()"
> construction in the commit subject.
> 
> But it's fine either way.

I'll split it into two parts, as RobH made a similar complaint.

Regards,
Nicolas


[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2019-12-12 13:21 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-12-03 11:47 [PATCH v4 0/8] Raspberry Pi 4 PCIe support Nicolas Saenz Julienne
2019-12-03 11:47 ` [PATCH v4 7/8] linux/log2.h: Fix 64bit calculations in roundup/down_pow_two() Nicolas Saenz Julienne
2019-12-03 16:39   ` Chuck Lever
2019-12-04 10:56   ` Martin Habets
2019-12-04 14:17   ` Leon Romanovsky
2019-12-05  0:39   ` Lu Baolu
2019-12-05 17:48   ` Robin Murphy
2019-12-12 12:31     ` Nicolas Saenz Julienne
2019-12-05 22:30   ` Bjorn Helgaas
2019-12-12 13:16     ` Nicolas Saenz Julienne
2019-12-03 11:47 ` [PATCH v4 8/8] linux/log2.h: Use roundup/dow_pow_two() on 64bit calculations Nicolas Saenz Julienne
2019-12-03 15:53   ` Rob Herring
2019-12-03 16:06     ` Nicolas Saenz Julienne
2019-12-05 20:38   ` Bjorn Helgaas
2019-12-12 13:21     ` Nicolas Saenz Julienne

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).