* [PATCH 2.6.30] RDMA/cxgb3: Remove modulo math. @ 2009-02-10 18:44 ` Steve Wise 0 siblings, 0 replies; 23+ messages in thread From: Steve Wise @ 2009-02-10 18:44 UTC (permalink / raw) To: rdreier; +Cc: general, randy.dunlap, linux-next, linux-kernel From: Steve Wise <swise@opengridcomputing.com> Removes the need for special u64 math on i386 systems. Fixes i386 build break in linux-next introduced by commit 1e27e8cee0698259ccb1fe6abeaf4b48969c0945. Signed-off-by: Steve Wise <swise@opengridcomputing.com> --- drivers/infiniband/hw/cxgb3/iwch_qp.c | 8 ++++---- 1 files changed, 4 insertions(+), 4 deletions(-) diff --git a/drivers/infiniband/hw/cxgb3/iwch_qp.c b/drivers/infiniband/hw/cxgb3/iwch_qp.c index 2cf6f13..5bb299a 100644 --- a/drivers/infiniband/hw/cxgb3/iwch_qp.c +++ b/drivers/infiniband/hw/cxgb3/iwch_qp.c @@ -232,8 +232,8 @@ static int iwch_sgl2pbl_map(struct iwch_dev *rhp, struct ib_sge *sg_list, return -EINVAL; } offset = sg_list[i].addr - mhp->attr.va_fbo; - offset += ((u64) mhp->attr.va_fbo) % - (1UL << (12 + mhp->attr.page_size)); + offset += mhp->attr.va_fbo & + ((1UL << (12 + mhp->attr.page_size)) - 1); pbl_addr[i] = ((mhp->attr.pbl_addr - rhp->rdev.rnic_info.pbl_base) >> 3) + (offset >> (12 + mhp->attr.page_size)); @@ -263,8 +263,8 @@ static int build_rdma_recv(struct iwch_qp *qhp, union t3_wr *wqe, wqe->recv.sgl[i].len = cpu_to_be32(wr->sg_list[i].length); /* to in the WQE == the offset into the page */ - wqe->recv.sgl[i].to = cpu_to_be64(((u32) wr->sg_list[i].addr) % - (1UL << (12 + page_size[i]))); + wqe->recv.sgl[i].to = cpu_to_be64(((u64) wr->sg_list[i].addr) & + ((1UL << (12 + page_size[i]))-1)); /* pbl_addr is the adapters address in the PBL */ wqe->recv.pbl_addr[i] = cpu_to_be32(pbl_addr[i]); ^ permalink raw reply related [flat|nested] 23+ messages in thread
* [ofa-general] [PATCH 2.6.30] RDMA/cxgb3: Remove modulo math. @ 2009-02-10 18:44 ` Steve Wise 0 siblings, 0 replies; 23+ messages in thread From: Steve Wise @ 2009-02-10 18:44 UTC (permalink / raw) To: rdreier; +Cc: randy.dunlap, linux-next, linux-kernel, general From: Steve Wise <swise@opengridcomputing.com> Removes the need for special u64 math on i386 systems. Fixes i386 build break in linux-next introduced by commit 1e27e8cee0698259ccb1fe6abeaf4b48969c0945. Signed-off-by: Steve Wise <swise@opengridcomputing.com> --- drivers/infiniband/hw/cxgb3/iwch_qp.c | 8 ++++---- 1 files changed, 4 insertions(+), 4 deletions(-) diff --git a/drivers/infiniband/hw/cxgb3/iwch_qp.c b/drivers/infiniband/hw/cxgb3/iwch_qp.c index 2cf6f13..5bb299a 100644 --- a/drivers/infiniband/hw/cxgb3/iwch_qp.c +++ b/drivers/infiniband/hw/cxgb3/iwch_qp.c @@ -232,8 +232,8 @@ static int iwch_sgl2pbl_map(struct iwch_dev *rhp, struct ib_sge *sg_list, return -EINVAL; } offset = sg_list[i].addr - mhp->attr.va_fbo; - offset += ((u64) mhp->attr.va_fbo) % - (1UL << (12 + mhp->attr.page_size)); + offset += mhp->attr.va_fbo & + ((1UL << (12 + mhp->attr.page_size)) - 1); pbl_addr[i] = ((mhp->attr.pbl_addr - rhp->rdev.rnic_info.pbl_base) >> 3) + (offset >> (12 + mhp->attr.page_size)); @@ -263,8 +263,8 @@ static int build_rdma_recv(struct iwch_qp *qhp, union t3_wr *wqe, wqe->recv.sgl[i].len = cpu_to_be32(wr->sg_list[i].length); /* to in the WQE == the offset into the page */ - wqe->recv.sgl[i].to = cpu_to_be64(((u32) wr->sg_list[i].addr) % - (1UL << (12 + page_size[i]))); + wqe->recv.sgl[i].to = cpu_to_be64(((u64) wr->sg_list[i].addr) & + ((1UL << (12 + page_size[i]))-1)); /* pbl_addr is the adapters address in the PBL */ wqe->recv.pbl_addr[i] = cpu_to_be32(pbl_addr[i]); ^ permalink raw reply related [flat|nested] 23+ messages in thread
* Re: [PATCH 2.6.30] RDMA/cxgb3: Remove modulo math. 2009-02-10 18:44 ` [ofa-general] " Steve Wise @ 2009-02-10 19:04 ` Randy Dunlap -1 siblings, 0 replies; 23+ messages in thread From: Randy Dunlap @ 2009-02-10 19:04 UTC (permalink / raw) To: Steve Wise; +Cc: rdreier, general, linux-next, linux-kernel Steve Wise wrote: > From: Steve Wise <swise@opengridcomputing.com> > > Removes the need for special u64 math on i386 systems. > > Fixes i386 build break in linux-next introduced by > commit 1e27e8cee0698259ccb1fe6abeaf4b48969c0945. > > Signed-off-by: Steve Wise <swise@opengridcomputing.com> Yes, that works, thanks. But this patch should go into 2.6.29, not just 2.6.30. > --- > > drivers/infiniband/hw/cxgb3/iwch_qp.c | 8 ++++---- > 1 files changed, 4 insertions(+), 4 deletions(-) > > diff --git a/drivers/infiniband/hw/cxgb3/iwch_qp.c b/drivers/infiniband/hw/cxgb3/iwch_qp.c > index 2cf6f13..5bb299a 100644 > --- a/drivers/infiniband/hw/cxgb3/iwch_qp.c > +++ b/drivers/infiniband/hw/cxgb3/iwch_qp.c > @@ -232,8 +232,8 @@ static int iwch_sgl2pbl_map(struct iwch_dev *rhp, struct ib_sge *sg_list, > return -EINVAL; > } > offset = sg_list[i].addr - mhp->attr.va_fbo; > - offset += ((u64) mhp->attr.va_fbo) % > - (1UL << (12 + mhp->attr.page_size)); > + offset += mhp->attr.va_fbo & > + ((1UL << (12 + mhp->attr.page_size)) - 1); > pbl_addr[i] = ((mhp->attr.pbl_addr - > rhp->rdev.rnic_info.pbl_base) >> 3) + > (offset >> (12 + mhp->attr.page_size)); > @@ -263,8 +263,8 @@ static int build_rdma_recv(struct iwch_qp *qhp, union t3_wr *wqe, > wqe->recv.sgl[i].len = cpu_to_be32(wr->sg_list[i].length); > > /* to in the WQE == the offset into the page */ > - wqe->recv.sgl[i].to = cpu_to_be64(((u32) wr->sg_list[i].addr) % > - (1UL << (12 + page_size[i]))); > + wqe->recv.sgl[i].to = cpu_to_be64(((u64) wr->sg_list[i].addr) & > + ((1UL << (12 + page_size[i]))-1)); > > /* pbl_addr is the adapters address in the PBL */ > wqe->recv.pbl_addr[i] = cpu_to_be32(pbl_addr[i]); -- ~Randy ^ permalink raw reply [flat|nested] 23+ messages in thread
* [ofa-general] Re: [PATCH 2.6.30] RDMA/cxgb3: Remove modulo math. @ 2009-02-10 19:04 ` Randy Dunlap 0 siblings, 0 replies; 23+ messages in thread From: Randy Dunlap @ 2009-02-10 19:04 UTC (permalink / raw) To: Steve Wise; +Cc: rdreier, linux-next, linux-kernel, general Steve Wise wrote: > From: Steve Wise <swise@opengridcomputing.com> > > Removes the need for special u64 math on i386 systems. > > Fixes i386 build break in linux-next introduced by > commit 1e27e8cee0698259ccb1fe6abeaf4b48969c0945. > > Signed-off-by: Steve Wise <swise@opengridcomputing.com> Yes, that works, thanks. But this patch should go into 2.6.29, not just 2.6.30. > --- > > drivers/infiniband/hw/cxgb3/iwch_qp.c | 8 ++++---- > 1 files changed, 4 insertions(+), 4 deletions(-) > > diff --git a/drivers/infiniband/hw/cxgb3/iwch_qp.c b/drivers/infiniband/hw/cxgb3/iwch_qp.c > index 2cf6f13..5bb299a 100644 > --- a/drivers/infiniband/hw/cxgb3/iwch_qp.c > +++ b/drivers/infiniband/hw/cxgb3/iwch_qp.c > @@ -232,8 +232,8 @@ static int iwch_sgl2pbl_map(struct iwch_dev *rhp, struct ib_sge *sg_list, > return -EINVAL; > } > offset = sg_list[i].addr - mhp->attr.va_fbo; > - offset += ((u64) mhp->attr.va_fbo) % > - (1UL << (12 + mhp->attr.page_size)); > + offset += mhp->attr.va_fbo & > + ((1UL << (12 + mhp->attr.page_size)) - 1); > pbl_addr[i] = ((mhp->attr.pbl_addr - > rhp->rdev.rnic_info.pbl_base) >> 3) + > (offset >> (12 + mhp->attr.page_size)); > @@ -263,8 +263,8 @@ static int build_rdma_recv(struct iwch_qp *qhp, union t3_wr *wqe, > wqe->recv.sgl[i].len = cpu_to_be32(wr->sg_list[i].length); > > /* to in the WQE == the offset into the page */ > - wqe->recv.sgl[i].to = cpu_to_be64(((u32) wr->sg_list[i].addr) % > - (1UL << (12 + page_size[i]))); > + wqe->recv.sgl[i].to = cpu_to_be64(((u64) wr->sg_list[i].addr) & > + ((1UL << (12 + page_size[i]))-1)); > > /* pbl_addr is the adapters address in the PBL */ > wqe->recv.pbl_addr[i] = cpu_to_be32(pbl_addr[i]); -- ~Randy ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 2.6.30] RDMA/cxgb3: Remove modulo math. 2009-02-10 19:04 ` [ofa-general] " Randy Dunlap @ 2009-02-10 19:10 ` Steve Wise -1 siblings, 0 replies; 23+ messages in thread From: Steve Wise @ 2009-02-10 19:10 UTC (permalink / raw) To: Randy Dunlap; +Cc: rdreier, general, linux-next, linux-kernel Randy Dunlap wrote: > Steve Wise wrote: > >> From: Steve Wise <swise@opengridcomputing.com> >> >> Removes the need for special u64 math on i386 systems. >> >> Fixes i386 build break in linux-next introduced by >> commit 1e27e8cee0698259ccb1fe6abeaf4b48969c0945. >> >> Signed-off-by: Steve Wise <swise@opengridcomputing.com> >> > > Yes, that works, thanks. But this patch should go into 2.6.29, not > just 2.6.30. > > > I thought the commit that caused this was: 1e27e8cee0698259ccb1fe6abeaf4b48969c0945 And that was going in 2.6.30. (I thought). >> --- >> >> drivers/infiniband/hw/cxgb3/iwch_qp.c | 8 ++++---- >> 1 files changed, 4 insertions(+), 4 deletions(-) >> >> diff --git a/drivers/infiniband/hw/cxgb3/iwch_qp.c b/drivers/infiniband/hw/cxgb3/iwch_qp.c >> index 2cf6f13..5bb299a 100644 >> --- a/drivers/infiniband/hw/cxgb3/iwch_qp.c >> +++ b/drivers/infiniband/hw/cxgb3/iwch_qp.c >> @@ -232,8 +232,8 @@ static int iwch_sgl2pbl_map(struct iwch_dev *rhp, struct ib_sge *sg_list, >> return -EINVAL; >> } >> offset = sg_list[i].addr - mhp->attr.va_fbo; >> - offset += ((u64) mhp->attr.va_fbo) % >> - (1UL << (12 + mhp->attr.page_size)); >> + offset += mhp->attr.va_fbo & >> + ((1UL << (12 + mhp->attr.page_size)) - 1); >> pbl_addr[i] = ((mhp->attr.pbl_addr - >> rhp->rdev.rnic_info.pbl_base) >> 3) + >> (offset >> (12 + mhp->attr.page_size)); >> @@ -263,8 +263,8 @@ static int build_rdma_recv(struct iwch_qp *qhp, union t3_wr *wqe, >> wqe->recv.sgl[i].len = cpu_to_be32(wr->sg_list[i].length); >> >> /* to in the WQE == the offset into the page */ >> - wqe->recv.sgl[i].to = cpu_to_be64(((u32) wr->sg_list[i].addr) % >> - (1UL << (12 + page_size[i]))); >> + wqe->recv.sgl[i].to = cpu_to_be64(((u64) wr->sg_list[i].addr) & >> + ((1UL << (12 + page_size[i]))-1)); >> >> /* pbl_addr is the adapters address in the PBL */ >> wqe->recv.pbl_addr[i] = cpu_to_be32(pbl_addr[i]); >> > > > ^ permalink raw reply [flat|nested] 23+ messages in thread
* [ofa-general] Re: [PATCH 2.6.30] RDMA/cxgb3: Remove modulo math. @ 2009-02-10 19:10 ` Steve Wise 0 siblings, 0 replies; 23+ messages in thread From: Steve Wise @ 2009-02-10 19:10 UTC (permalink / raw) To: Randy Dunlap; +Cc: rdreier, linux-next, linux-kernel, general Randy Dunlap wrote: > Steve Wise wrote: > >> From: Steve Wise <swise@opengridcomputing.com> >> >> Removes the need for special u64 math on i386 systems. >> >> Fixes i386 build break in linux-next introduced by >> commit 1e27e8cee0698259ccb1fe6abeaf4b48969c0945. >> >> Signed-off-by: Steve Wise <swise@opengridcomputing.com> >> > > Yes, that works, thanks. But this patch should go into 2.6.29, not > just 2.6.30. > > > I thought the commit that caused this was: 1e27e8cee0698259ccb1fe6abeaf4b48969c0945 And that was going in 2.6.30. (I thought). >> --- >> >> drivers/infiniband/hw/cxgb3/iwch_qp.c | 8 ++++---- >> 1 files changed, 4 insertions(+), 4 deletions(-) >> >> diff --git a/drivers/infiniband/hw/cxgb3/iwch_qp.c b/drivers/infiniband/hw/cxgb3/iwch_qp.c >> index 2cf6f13..5bb299a 100644 >> --- a/drivers/infiniband/hw/cxgb3/iwch_qp.c >> +++ b/drivers/infiniband/hw/cxgb3/iwch_qp.c >> @@ -232,8 +232,8 @@ static int iwch_sgl2pbl_map(struct iwch_dev *rhp, struct ib_sge *sg_list, >> return -EINVAL; >> } >> offset = sg_list[i].addr - mhp->attr.va_fbo; >> - offset += ((u64) mhp->attr.va_fbo) % >> - (1UL << (12 + mhp->attr.page_size)); >> + offset += mhp->attr.va_fbo & >> + ((1UL << (12 + mhp->attr.page_size)) - 1); >> pbl_addr[i] = ((mhp->attr.pbl_addr - >> rhp->rdev.rnic_info.pbl_base) >> 3) + >> (offset >> (12 + mhp->attr.page_size)); >> @@ -263,8 +263,8 @@ static int build_rdma_recv(struct iwch_qp *qhp, union t3_wr *wqe, >> wqe->recv.sgl[i].len = cpu_to_be32(wr->sg_list[i].length); >> >> /* to in the WQE == the offset into the page */ >> - wqe->recv.sgl[i].to = cpu_to_be64(((u32) wr->sg_list[i].addr) % >> - (1UL << (12 + page_size[i]))); >> + wqe->recv.sgl[i].to = cpu_to_be64(((u64) wr->sg_list[i].addr) & >> + ((1UL << (12 + page_size[i]))-1)); >> >> /* pbl_addr is the adapters address in the PBL */ >> wqe->recv.pbl_addr[i] = cpu_to_be32(pbl_addr[i]); >> > > > ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 2.6.30] RDMA/cxgb3: Remove modulo math. 2009-02-10 19:10 ` [ofa-general] " Steve Wise (?) @ 2009-02-10 19:12 ` Randy Dunlap -1 siblings, 0 replies; 23+ messages in thread From: Randy Dunlap @ 2009-02-10 19:12 UTC (permalink / raw) To: Steve Wise; +Cc: rdreier, general, linux-next, linux-kernel Steve Wise wrote: > > Randy Dunlap wrote: >> Steve Wise wrote: >> >>> From: Steve Wise <swise@opengridcomputing.com> >>> >>> Removes the need for special u64 math on i386 systems. >>> >>> Fixes i386 build break in linux-next introduced by commit >>> 1e27e8cee0698259ccb1fe6abeaf4b48969c0945. >>> >>> Signed-off-by: Steve Wise <swise@opengridcomputing.com> >>> >> >> Yes, that works, thanks. But this patch should go into 2.6.29, not >> just 2.6.30. >> >> >> > I thought the commit that caused this was: > > 1e27e8cee0698259ccb1fe6abeaf4b48969c0945 > > And that was going in 2.6.30. (I thought). Oh, OK. If that's the case, then you are obviously correct about [2.6.30]. Thanks. >>> --- >>> >>> drivers/infiniband/hw/cxgb3/iwch_qp.c | 8 ++++---- >>> 1 files changed, 4 insertions(+), 4 deletions(-) >>> >>> diff --git a/drivers/infiniband/hw/cxgb3/iwch_qp.c >>> b/drivers/infiniband/hw/cxgb3/iwch_qp.c >>> index 2cf6f13..5bb299a 100644 >>> --- a/drivers/infiniband/hw/cxgb3/iwch_qp.c >>> +++ b/drivers/infiniband/hw/cxgb3/iwch_qp.c >>> @@ -232,8 +232,8 @@ static int iwch_sgl2pbl_map(struct iwch_dev *rhp, >>> struct ib_sge *sg_list, >>> return -EINVAL; >>> } >>> offset = sg_list[i].addr - mhp->attr.va_fbo; >>> - offset += ((u64) mhp->attr.va_fbo) % >>> - (1UL << (12 + mhp->attr.page_size)); >>> + offset += mhp->attr.va_fbo & >>> + ((1UL << (12 + mhp->attr.page_size)) - 1); >>> pbl_addr[i] = ((mhp->attr.pbl_addr - >>> rhp->rdev.rnic_info.pbl_base) >> 3) + >>> (offset >> (12 + mhp->attr.page_size)); >>> @@ -263,8 +263,8 @@ static int build_rdma_recv(struct iwch_qp *qhp, >>> union t3_wr *wqe, >>> wqe->recv.sgl[i].len = cpu_to_be32(wr->sg_list[i].length); >>> >>> /* to in the WQE == the offset into the page */ >>> - wqe->recv.sgl[i].to = cpu_to_be64(((u32) wr->sg_list[i].addr) % >>> - (1UL << (12 + page_size[i]))); >>> + wqe->recv.sgl[i].to = cpu_to_be64(((u64) wr->sg_list[i].addr) & >>> + ((1UL << (12 + page_size[i]))-1)); >>> >>> /* pbl_addr is the adapters address in the PBL */ >>> wqe->recv.pbl_addr[i] = cpu_to_be32(pbl_addr[i]); -- ~Randy ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [ofa-general] [PATCH 2.6.30] RDMA/cxgb3: Remove modulo math. 2009-02-10 18:44 ` [ofa-general] " Steve Wise (?) (?) @ 2009-02-11 0:38 ` Roland Dreier 2009-02-11 1:03 ` Steve Wise ` (2 more replies) -1 siblings, 3 replies; 23+ messages in thread From: Roland Dreier @ 2009-02-11 0:38 UTC (permalink / raw) To: Steve Wise; +Cc: randy.dunlap, linux-next, linux-kernel, general I'll roll this into the offending patch (that is in -next). But: > - wqe->recv.sgl[i].to = cpu_to_be64(((u32) wr->sg_list[i].addr) % > - (1UL << (12 + page_size[i]))); > + wqe->recv.sgl[i].to = cpu_to_be64(((u64) wr->sg_list[i].addr) & > + ((1UL << (12 + page_size[i]))-1)); Is this required? Strength reduction optimization should do this automatically (and the code has been there for quite a while, so obviously it isn't causing problems) - R. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [ofa-general] [PATCH 2.6.30] RDMA/cxgb3: Remove modulo math. 2009-02-11 0:38 ` [ofa-general] " Roland Dreier @ 2009-02-11 1:03 ` Steve Wise 2009-02-11 1:07 ` David Miller 2009-02-11 1:03 ` Steve Wise 2009-02-11 15:44 ` Steve Wise 2 siblings, 1 reply; 23+ messages in thread From: Steve Wise @ 2009-02-11 1:03 UTC (permalink / raw) To: Roland Dreier; +Cc: randy.dunlap, linux-next, linux-kernel, general Roland Dreier wrote: > I'll roll this into the offending patch (that is in -next). > > But: > > > - wqe->recv.sgl[i].to = cpu_to_be64(((u32) wr->sg_list[i].addr) % > > - (1UL << (12 + page_size[i]))); > > + wqe->recv.sgl[i].to = cpu_to_be64(((u64) wr->sg_list[i].addr) & > > + ((1UL << (12 + page_size[i]))-1)); > > Is this required? Strength reduction optimization should do this > automatically (and the code has been there for quite a while, so > obviously it isn't causing problems) > > - R. > Ok. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [ofa-general] [PATCH 2.6.30] RDMA/cxgb3: Remove modulo math. 2009-02-11 1:03 ` Steve Wise @ 2009-02-11 1:07 ` David Miller 2009-02-11 1:18 ` Roland Dreier 0 siblings, 1 reply; 23+ messages in thread From: David Miller @ 2009-02-11 1:07 UTC (permalink / raw) To: swise; +Cc: rdreier, randy.dunlap, linux-next, linux-kernel, general From: Steve Wise <swise@opengridcomputing.com> Date: Tue, 10 Feb 2009 19:03:52 -0600 > Roland Dreier wrote: > > I'll roll this into the offending patch (that is in -next). > > > > But: > > > > > - wqe->recv.sgl[i].to = cpu_to_be64(((u32) wr->sg_list[i].addr) % > > > - (1UL << (12 + page_size[i]))); > > > + wqe->recv.sgl[i].to = cpu_to_be64(((u64) wr->sg_list[i].addr) & > > > + ((1UL << (12 + page_size[i]))-1)); > > > > Is this required? Strength reduction optimization should do this > > automatically (and the code has been there for quite a while, so > > obviously it isn't causing problems) > > > > - R. > > > Ok. GCC won't optimize that modulus the way you expect, try for yourself and look at the assembler if you don't believe me. :-) ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [ofa-general] [PATCH 2.6.30] RDMA/cxgb3: Remove modulo math. 2009-02-11 1:07 ` David Miller @ 2009-02-11 1:18 ` Roland Dreier 0 siblings, 0 replies; 23+ messages in thread From: Roland Dreier @ 2009-02-11 1:18 UTC (permalink / raw) To: David Miller; +Cc: swise, randy.dunlap, linux-next, linux-kernel, general > > Is this required? Strength reduction optimization should do this > > automatically (and the code has been there for quite a while, so > > obviously it isn't causing problems) > GCC won't optimize that modulus the way you expect, try for yourself > and look at the assembler if you don't believe me. :-) Are you thinking of the case when there are signed integers involved and so "% modulus" might produce a different result than "& (modulus - 1)" (because the compiler can't know that things are never negative)? Because in this case the compiler seems to do what I thought it would; the relevant part of the i386 assembly for wqe->recv.sgl[i].to = cpu_to_be64(((u32) wr->sg_list[i].addr) % (1UL << (12 + page_size[i]))); is movl %eax, 28(%edi,%ebx) # <variable>.length, <variable>.len movzbl 28(%esp,%esi), %ecx # page_size, tmp89 movl $1, %eax #, tmp92 addl $12, %ecx #, tmp90 sall %cl, %eax # tmp90, tmp92 movl (%esp), %ecx # wr, decl %eax # tmp93 movl 12(%ecx), %edx # <variable>.sg_list, <variable>.sg_list andl (%edx,%ebx), %eax # <variable>.addr, tmp93 ie the compiler computes the modulus, then does decl to compute modulus-1 and then &s with it. Or am I misunderstanding your point? - R. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [ofa-general] [PATCH 2.6.30] RDMA/cxgb3: Remove modulo math. @ 2009-02-11 1:18 ` Roland Dreier 0 siblings, 0 replies; 23+ messages in thread From: Roland Dreier @ 2009-02-11 1:18 UTC (permalink / raw) To: David Miller; +Cc: randy.dunlap, linux-next, general, linux-kernel > > Is this required? Strength reduction optimization should do this > > automatically (and the code has been there for quite a while, so > > obviously it isn't causing problems) > GCC won't optimize that modulus the way you expect, try for yourself > and look at the assembler if you don't believe me. :-) Are you thinking of the case when there are signed integers involved and so "% modulus" might produce a different result than "& (modulus - 1)" (because the compiler can't know that things are never negative)? Because in this case the compiler seems to do what I thought it would; the relevant part of the i386 assembly for wqe->recv.sgl[i].to = cpu_to_be64(((u32) wr->sg_list[i].addr) % (1UL << (12 + page_size[i]))); is movl %eax, 28(%edi,%ebx) # <variable>.length, <variable>.len movzbl 28(%esp,%esi), %ecx # page_size, tmp89 movl $1, %eax #, tmp92 addl $12, %ecx #, tmp90 sall %cl, %eax # tmp90, tmp92 movl (%esp), %ecx # wr, decl %eax # tmp93 movl 12(%ecx), %edx # <variable>.sg_list, <variable>.sg_list andl (%edx,%ebx), %eax # <variable>.addr, tmp93 ie the compiler computes the modulus, then does decl to compute modulus-1 and then &s with it. Or am I misunderstanding your point? - R. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [ofa-general] [PATCH 2.6.30] RDMA/cxgb3: Remove modulo math. 2009-02-11 1:18 ` Roland Dreier @ 2009-02-11 1:23 ` David Miller -1 siblings, 0 replies; 23+ messages in thread From: David Miller @ 2009-02-11 1:23 UTC (permalink / raw) To: rdreier; +Cc: swise, randy.dunlap, linux-next, linux-kernel, general From: Roland Dreier <rdreier@cisco.com> Date: Tue, 10 Feb 2009 17:18:49 -0800 > > > Is this required? Strength reduction optimization should do this > > > automatically (and the code has been there for quite a while, so > > > obviously it isn't causing problems) > > > GCC won't optimize that modulus the way you expect, try for yourself > > and look at the assembler if you don't believe me. :-) > > Are you thinking of the case when there are signed integers involved and > so "% modulus" might produce a different result than "& (modulus - 1)" > (because the compiler can't know that things are never negative)? > Because in this case the compiler seems to do what I thought it would; > the relevant part of the i386 assembly for > > wqe->recv.sgl[i].to = cpu_to_be64(((u32) wr->sg_list[i].addr) % > (1UL << (12 + page_size[i]))); > > is > > movl %eax, 28(%edi,%ebx) # <variable>.length, > <variable>.len > movzbl 28(%esp,%esi), %ecx # page_size, tmp89 > movl $1, %eax #, tmp92 > addl $12, %ecx #, tmp90 > sall %cl, %eax # tmp90, tmp92 > movl (%esp), %ecx # wr, > decl %eax # tmp93 > movl 12(%ecx), %edx # <variable>.sg_list, <variable>.sg_list > andl (%edx,%ebx), %eax # <variable>.addr, tmp93 > > ie the compiler computes the modulus, then does decl to compute > modulus-1 and then &s with it. > > Or am I misunderstanding your point? Must be compiler and platform specific because with gcc-4.1.3 on sparc with -O2, for the test program: unsigned long page_size[4]; int main(int argc) { unsigned long long x = argc; return x % (1UL << (12 + page_size[argc])); } I get a call to __umoddi3: main: save %sp, -112, %sp sethi %hi(page_size), %g1 sll %i0, 2, %g3 or %g1, %lo(page_size), %g1 mov 1, %o2 ld [%g1+%g3], %g2 add %g2, 12, %g2 sll %o2, %g2, %o2 mov %i0, %o1 mov %o2, %o3 sra %i0, 31, %o0 call __umoddi3, 0 mov 0, %o2 jmp %i7+8 restore %g0, %o1, %o0 I get the same with gcc-4.3.0 and -O2 on 32-bit x86: main: leal 4(%esp), %ecx andl $-16, %esp pushl -4(%ecx) movl $1, %eax pushl %ebp movl %esp, %ebp pushl %ecx subl $20, %esp movl (%ecx), %edx movl page_size(,%edx,4), %ecx movl $0, 12(%esp) movl %edx, (%esp) addl $12, %ecx sall %cl, %eax movl %eax, 8(%esp) movl %edx, %eax sarl $31, %eax movl %eax, 4(%esp) call __umoddi3 addl $20, %esp popl %ecx popl %ebp leal -4(%ecx), %esp ret ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [ofa-general] [PATCH 2.6.30] RDMA/cxgb3: Remove modulo math. @ 2009-02-11 1:23 ` David Miller 0 siblings, 0 replies; 23+ messages in thread From: David Miller @ 2009-02-11 1:23 UTC (permalink / raw) To: rdreier; +Cc: randy.dunlap, linux-next, general, linux-kernel From: Roland Dreier <rdreier@cisco.com> Date: Tue, 10 Feb 2009 17:18:49 -0800 > > > Is this required? Strength reduction optimization should do this > > > automatically (and the code has been there for quite a while, so > > > obviously it isn't causing problems) > > > GCC won't optimize that modulus the way you expect, try for yourself > > and look at the assembler if you don't believe me. :-) > > Are you thinking of the case when there are signed integers involved and > so "% modulus" might produce a different result than "& (modulus - 1)" > (because the compiler can't know that things are never negative)? > Because in this case the compiler seems to do what I thought it would; > the relevant part of the i386 assembly for > > wqe->recv.sgl[i].to = cpu_to_be64(((u32) wr->sg_list[i].addr) % > (1UL << (12 + page_size[i]))); > > is > > movl %eax, 28(%edi,%ebx) # <variable>.length, > <variable>.len > movzbl 28(%esp,%esi), %ecx # page_size, tmp89 > movl $1, %eax #, tmp92 > addl $12, %ecx #, tmp90 > sall %cl, %eax # tmp90, tmp92 > movl (%esp), %ecx # wr, > decl %eax # tmp93 > movl 12(%ecx), %edx # <variable>.sg_list, <variable>.sg_list > andl (%edx,%ebx), %eax # <variable>.addr, tmp93 > > ie the compiler computes the modulus, then does decl to compute > modulus-1 and then &s with it. > > Or am I misunderstanding your point? Must be compiler and platform specific because with gcc-4.1.3 on sparc with -O2, for the test program: unsigned long page_size[4]; int main(int argc) { unsigned long long x = argc; return x % (1UL << (12 + page_size[argc])); } I get a call to __umoddi3: main: save %sp, -112, %sp sethi %hi(page_size), %g1 sll %i0, 2, %g3 or %g1, %lo(page_size), %g1 mov 1, %o2 ld [%g1+%g3], %g2 add %g2, 12, %g2 sll %o2, %g2, %o2 mov %i0, %o1 mov %o2, %o3 sra %i0, 31, %o0 call __umoddi3, 0 mov 0, %o2 jmp %i7+8 restore %g0, %o1, %o0 I get the same with gcc-4.3.0 and -O2 on 32-bit x86: main: leal 4(%esp), %ecx andl $-16, %esp pushl -4(%ecx) movl $1, %eax pushl %ebp movl %esp, %ebp pushl %ecx subl $20, %esp movl (%ecx), %edx movl page_size(,%edx,4), %ecx movl $0, 12(%esp) movl %edx, (%esp) addl $12, %ecx sall %cl, %eax movl %eax, 8(%esp) movl %edx, %eax sarl $31, %eax movl %eax, 4(%esp) call __umoddi3 addl $20, %esp popl %ecx popl %ebp leal -4(%ecx), %esp ret ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [ofa-general] [PATCH 2.6.30] RDMA/cxgb3: Remove modulo math. 2009-02-11 1:23 ` David Miller @ 2009-02-11 7:20 ` Roland Dreier -1 siblings, 0 replies; 23+ messages in thread From: Roland Dreier @ 2009-02-11 7:20 UTC (permalink / raw) To: David Miller; +Cc: randy.dunlap, linux-next, general, linux-kernel > Must be compiler and platform specific because with gcc-4.1.3 on > sparc with -O2, for the test program: > > unsigned long page_size[4]; > > int main(int argc) > { > unsigned long long x = argc; > > return x % (1UL << (12 + page_size[argc])); > } > > I get a call to __umoddi3: You're not testing the same thing. The original code was: wqe->recv.sgl[i].to = cpu_to_be64(((u32) wr->sg_list[i].addr) % (1UL << (12 + page_size[i]))); and it's not that easy to see with all the parentheses, but the expression being done is (u32) % (unsigned long). So rather than unsigned long long in your program, you should have just done unsigned (u32 is unsigned int on all Linux architectures). In that case gcc does not generate a call to any library function in all the versions I have handy, although gcc 4.1 does do a div instead of an and. (And I don't think any 32-bit architectures require a library function for (unsigned) % (unsigned), so the code should be OK) Your example shows that gcc is missing a strength reduction opportunity in not handling (u64) % (unsigned long) on 32 bit architectures, but I guess it is a more difficult optimization to do, since gcc has to know that it can simply zero the top 32 bits. - R. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [ofa-general] [PATCH 2.6.30] RDMA/cxgb3: Remove modulo math. @ 2009-02-11 7:20 ` Roland Dreier 0 siblings, 0 replies; 23+ messages in thread From: Roland Dreier @ 2009-02-11 7:20 UTC (permalink / raw) To: David Miller; +Cc: randy.dunlap, linux-next, linux-kernel, general > Must be compiler and platform specific because with gcc-4.1.3 on > sparc with -O2, for the test program: > > unsigned long page_size[4]; > > int main(int argc) > { > unsigned long long x = argc; > > return x % (1UL << (12 + page_size[argc])); > } > > I get a call to __umoddi3: You're not testing the same thing. The original code was: wqe->recv.sgl[i].to = cpu_to_be64(((u32) wr->sg_list[i].addr) % (1UL << (12 + page_size[i]))); and it's not that easy to see with all the parentheses, but the expression being done is (u32) % (unsigned long). So rather than unsigned long long in your program, you should have just done unsigned (u32 is unsigned int on all Linux architectures). In that case gcc does not generate a call to any library function in all the versions I have handy, although gcc 4.1 does do a div instead of an and. (And I don't think any 32-bit architectures require a library function for (unsigned) % (unsigned), so the code should be OK) Your example shows that gcc is missing a strength reduction opportunity in not handling (u64) % (unsigned long) on 32 bit architectures, but I guess it is a more difficult optimization to do, since gcc has to know that it can simply zero the top 32 bits. - R. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [ofa-general] [PATCH 2.6.30] RDMA/cxgb3: Remove modulo math. 2009-02-11 7:20 ` Roland Dreier (?) @ 2009-02-11 8:00 ` David Miller -1 siblings, 0 replies; 23+ messages in thread From: David Miller @ 2009-02-11 8:00 UTC (permalink / raw) To: rdreier; +Cc: randy.dunlap, linux-next, general, linux-kernel From: Roland Dreier <rdreier@cisco.com> Date: Tue, 10 Feb 2009 23:20:39 -0800 > > unsigned long page_size[4]; > > > > int main(int argc) > > { > > unsigned long long x = argc; > > > > return x % (1UL << (12 + page_size[argc])); > > } > > > > I get a call to __umoddi3: > > You're not testing the same thing. The original code was: > > wqe->recv.sgl[i].to = cpu_to_be64(((u32) wr->sg_list[i].addr) % > (1UL << (12 + page_size[i]))); > > and it's not that easy to see with all the parentheses, but the > expression being done is (u32) % (unsigned long). So rather than > unsigned long long in your program, you should have just done unsigned > (u32 is unsigned int on all Linux architectures). In that case gcc does > not generate a call to any library function in all the versions I have > handy, although gcc 4.1 does do a div instead of an and. (And I don't > think any 32-bit architectures require a library function for (unsigned) > % (unsigned), so the code should be OK) > > Your example shows that gcc is missing a strength reduction opportunity > in not handling (u64) % (unsigned long) on 32 bit architectures, but I > guess it is a more difficult optimization to do, since gcc has to know > that it can simply zero the top 32 bits. Indeed, I get the divide if I use "unsigned int" for "x". I still think you should make this change, as many systems out there are getting the expensive divide. main: sethi %hi(page_size), %g1 or %g1, %lo(page_size), %g1 mov %o0, %g3 sll %o0, 2, %g4 ld [%g1+%g4], %g2 mov 1, %g1 add %g2, 12, %g2 sll %g1, %g2, %g1 wr %g0, %g0, %y nop nop nop udiv %o0, %g1, %o0 smul %o0, %g1, %o0 jmp %o7+8 sub %g3, %o0, %o0 ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [ofa-general] [PATCH 2.6.30] RDMA/cxgb3: Remove modulo math. 2009-02-11 0:38 ` [ofa-general] " Roland Dreier 2009-02-11 1:03 ` Steve Wise @ 2009-02-11 1:03 ` Steve Wise 2009-02-11 15:44 ` Steve Wise 2 siblings, 0 replies; 23+ messages in thread From: Steve Wise @ 2009-02-11 1:03 UTC (permalink / raw) To: Roland Dreier; +Cc: randy.dunlap, linux-next, linux-kernel, general Roland Dreier wrote: > I'll roll this into the offending patch (that is in -next). > > But: > > > - wqe->recv.sgl[i].to = cpu_to_be64(((u32) wr->sg_list[i].addr) % > > - (1UL << (12 + page_size[i]))); > > + wqe->recv.sgl[i].to = cpu_to_be64(((u64) wr->sg_list[i].addr) & > > + ((1UL << (12 + page_size[i]))-1)); > > Is this required? Strength reduction optimization should do this > automatically (and the code has been there for quite a while, so > obviously it isn't causing problems) > > - R. > Ok. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [ofa-general] [PATCH 2.6.30] RDMA/cxgb3: Remove modulo math. 2009-02-11 0:38 ` [ofa-general] " Roland Dreier 2009-02-11 1:03 ` Steve Wise 2009-02-11 1:03 ` Steve Wise @ 2009-02-11 15:44 ` Steve Wise 2009-02-11 18:12 ` Roland Dreier 2 siblings, 1 reply; 23+ messages in thread From: Steve Wise @ 2009-02-11 15:44 UTC (permalink / raw) To: Roland Dreier; +Cc: randy.dunlap, linux-next, linux-kernel, general Roland Dreier wrote: > I'll roll this into the offending patch (that is in -next). > > But: > > > - wqe->recv.sgl[i].to = cpu_to_be64(((u32) wr->sg_list[i].addr) % > > - (1UL << (12 + page_size[i]))); > > + wqe->recv.sgl[i].to = cpu_to_be64(((u64) wr->sg_list[i].addr) & > > + ((1UL << (12 + page_size[i]))-1)); > > Is this required? Strength reduction optimization should do this > automatically (and the code has been there for quite a while, so > obviously it isn't causing problems) > > - R. > Note that wr->sg_list[i].addr was being cast to a u32. That was wrong. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [ofa-general] [PATCH 2.6.30] RDMA/cxgb3: Remove modulo math. 2009-02-11 15:44 ` Steve Wise @ 2009-02-11 18:12 ` Roland Dreier 2009-02-11 18:32 ` Steve Wise 0 siblings, 1 reply; 23+ messages in thread From: Roland Dreier @ 2009-02-11 18:12 UTC (permalink / raw) To: Steve Wise; +Cc: randy.dunlap, linux-next, linux-kernel, general > Note that wr->sg_list[i].addr was being cast to a u32. That was wrong. Is it possible for the page to be bigger than 4GB? If so then yes you might be chopping off high-order bits or something. Anyway please send me this change as a separate patch with a changelog explaining that you're avoiding the div etc.... I don't want to roll it in with the other unrelated fix (which changes code that was never upstream anyway). ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [ofa-general] [PATCH 2.6.30] RDMA/cxgb3: Remove modulo math. 2009-02-11 18:12 ` Roland Dreier @ 2009-02-11 18:32 ` Steve Wise 2009-02-11 18:36 ` Roland Dreier 0 siblings, 1 reply; 23+ messages in thread From: Steve Wise @ 2009-02-11 18:32 UTC (permalink / raw) To: Roland Dreier; +Cc: randy.dunlap, linux-next, linux-kernel, general Roland Dreier wrote: > > Note that wr->sg_list[i].addr was being cast to a u32. That was wrong. > > Is it possible for the page to be bigger than 4GB? If so then yes you > might be chopping off high-order bits or something. > Yes it is possible. A MR can be created with an iov_base of say 0xffffffff00000000. Then any sge.addr entries would be the iob_base + any offset. > Anyway please send me this change as a separate patch with a changelog > explaining that you're avoiding the div etc.... I don't want to roll it > in with the other unrelated fix (which changes code that was never > upstream anyway). > will do. So you are handling the offset patch that will make it u64 and remove the mod usage, correct? I will post a new patch with just this send change. Steve. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [ofa-general] [PATCH 2.6.30] RDMA/cxgb3: Remove modulo math. 2009-02-11 18:32 ` Steve Wise @ 2009-02-11 18:36 ` Roland Dreier 2009-02-11 18:44 ` Steve Wise 0 siblings, 1 reply; 23+ messages in thread From: Roland Dreier @ 2009-02-11 18:36 UTC (permalink / raw) To: Steve Wise; +Cc: randy.dunlap, linux-next, linux-kernel, general > > Is it possible for the page to be bigger than 4GB? If so then yes you > > might be chopping off high-order bits or something. > Yes it is possible. > > A MR can be created with an iov_base of say 0xffffffff00000000. > > Then any sge.addr entries would be the iob_base + any offset. But the code we're talking about is: /* to in the WQE == the offset into the page */ wqe->recv.sgl[i].to = cpu_to_be64(((u32) wr->sg_list[i].addr) % (1UL << (12 + page_size[i]))); so it seems the top address bits don't matter unless page_size[i] is at least 20 -- in which case using 1UL to shift overflows on 32 bits anyway... > So you are handling the offset patch that will make it u64 and remove > the mod usage, correct? Yeah, I rolled the fix into the "offset needs to be u64" patch, it should be in linux-next by now (or at least in my for-next branch). - R. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [ofa-general] [PATCH 2.6.30] RDMA/cxgb3: Remove modulo math. 2009-02-11 18:36 ` Roland Dreier @ 2009-02-11 18:44 ` Steve Wise 0 siblings, 0 replies; 23+ messages in thread From: Steve Wise @ 2009-02-11 18:44 UTC (permalink / raw) To: Roland Dreier; +Cc: randy.dunlap, linux-next, linux-kernel, general Roland Dreier wrote: > > > Is it possible for the page to be bigger than 4GB? If so then yes you > > > might be chopping off high-order bits or something. > > > Yes it is possible. > > > > A MR can be created with an iov_base of say 0xffffffff00000000. > > > > Then any sge.addr entries would be the iob_base + any offset. > > But the code we're talking about is: > > /* to in the WQE == the offset into the page */ > wqe->recv.sgl[i].to = cpu_to_be64(((u32) wr->sg_list[i].addr) % > (1UL << (12 + page_size[i]))); > > so it seems the top address bits don't matter unless page_size[i] is at > least 20 -- in which case using 1UL to shift overflows on 32 bits anyway... > > Yes yes...you're right. This code is really just saving the offset in a page. I'll send a new patch. ^ permalink raw reply [flat|nested] 23+ messages in thread
end of thread, other threads:[~2009-02-11 18:44 UTC | newest] Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2009-02-10 18:44 [PATCH 2.6.30] RDMA/cxgb3: Remove modulo math Steve Wise 2009-02-10 18:44 ` [ofa-general] " Steve Wise 2009-02-10 19:04 ` Randy Dunlap 2009-02-10 19:04 ` [ofa-general] " Randy Dunlap 2009-02-10 19:10 ` Steve Wise 2009-02-10 19:10 ` [ofa-general] " Steve Wise 2009-02-10 19:12 ` Randy Dunlap 2009-02-11 0:38 ` [ofa-general] " Roland Dreier 2009-02-11 1:03 ` Steve Wise 2009-02-11 1:07 ` David Miller 2009-02-11 1:18 ` Roland Dreier 2009-02-11 1:18 ` Roland Dreier 2009-02-11 1:23 ` David Miller 2009-02-11 1:23 ` David Miller 2009-02-11 7:20 ` Roland Dreier 2009-02-11 7:20 ` Roland Dreier 2009-02-11 8:00 ` David Miller 2009-02-11 1:03 ` Steve Wise 2009-02-11 15:44 ` Steve Wise 2009-02-11 18:12 ` Roland Dreier 2009-02-11 18:32 ` Steve Wise 2009-02-11 18:36 ` Roland Dreier 2009-02-11 18:44 ` Steve Wise
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.