linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* RE: [openib-general] [PATCH  v4 01/13] Linux RDMA Core Changes
@ 2007-01-05 17:32 Felix Marti
  2007-01-05 18:59 ` Steve Wise
  2007-01-06 17:34 ` Michael S. Tsirkin
  0 siblings, 2 replies; 19+ messages in thread
From: Felix Marti @ 2007-01-05 17:32 UTC (permalink / raw)
  To: Steve Wise, Roland Dreier; +Cc: linux-kernel, openib-general, netdev



> -----Original Message-----
> From: openib-general-bounces@openib.org [mailto:openib-general-
> bounces@openib.org] On Behalf Of Steve Wise
> Sent: Friday, January 05, 2007 6:22 AM
> To: Roland Dreier
> Cc: linux-kernel@vger.kernel.org; openib-general@openib.org;
> netdev@vger.kernel.org
> Subject: Re: [openib-general] [PATCH v4 01/13] Linux RDMA Core Changes
> 
> On Thu, 2007-01-04 at 13:34 -0800, Roland Dreier wrote:
> > OK, I'm back from vacation today.
> >
> > Anyway I don't have a definitive statement on this right now.  I
guess
> > I agree that I don't like having an extra parameter to a function
that
> > should be pretty fast (although req notify isn't quite as hot as
> > something like posting a send request or polling a cq), given that
it
> > adds measurable overhead.  (And I am surprised that the overhead is
> > measurable, since 3 arguments still fit in registers, but OK).
> >
> > I also agree that adding an extra entry point just to pass in the
user
> > data is ugly, and also racy.
> >
> > Giving the kernel driver a pointer it can read seems OK I guess,
> > although it's a little ugly to have a backdoor channel like that.
> >
> 
> Another alternative is for the cq-index u32 memory to be allocated by
> the kernel and mapped into the user process.  So the lib can
read/write
> it, and the kernel can read it directly.  This is the fastest way
> perfwise, but I didn't want to do it because of the page granularity
of
> mapping.  IE it would require a page of address space (and backing
> memory I guess) just for 1 u32.  The CQ element array memory is
already
> allocated this way (and its DMA coherent too), but I didn't want to
> overload that memory with this extra variable either.  Mapping just
> seemed ugly and wasteful to me.
> 
> So given 3 approaches:
> 
> 1) allow user data to be passed into ib_req_notify_cq() via the
standard
> uverbs mechanisms.
> 
> 2) hide this in the chelsio driver and have the driver copyin the info
> directly.
> 
> 3) allocate the memory for this in the kernel and map it to the user
> process.
> 
> I chose 1 because it seemed the cleanest from an architecture point of
> view and I didn't think it would impact performance much.

[Felix Marti] In addition, is arming the CQ really in the performance
path? - Don't apps poll the CQ as long as there are pending CQEs and
only arm the CQ for notification once there is nothing left to do? If
this is the case, it would mean that we waste a few cycles 'idle'
cycles. Steve, next to the micro benchmark that you did, did you run any
performance benchmark that actually moves traffic? If so, did you see a
difference in performance?

> 
> 
> Steve.
> 
> 
> 
> 
> 
> _______________________________________________
> openib-general mailing list
> openib-general@openib.org
> http://openib.org/mailman/listinfo/openib-general
> 
> To unsubscribe, please visit
http://openib.org/mailman/listinfo/openib-general


^ permalink raw reply	[flat|nested] 19+ messages in thread

* RE: [openib-general] [PATCH  v4 01/13] Linux RDMA Core Changes
  2007-01-05 17:32 [openib-general] [PATCH v4 01/13] Linux RDMA Core Changes Felix Marti
@ 2007-01-05 18:59 ` Steve Wise
  2007-01-06 17:34 ` Michael S. Tsirkin
  1 sibling, 0 replies; 19+ messages in thread
From: Steve Wise @ 2007-01-05 18:59 UTC (permalink / raw)
  To: Felix Marti; +Cc: Roland Dreier, linux-kernel, openib-general, netdev

On Fri, 2007-01-05 at 09:32 -0800, Felix Marti wrote:
> 
> > -----Original Message-----
> > From: openib-general-bounces@openib.org [mailto:openib-general-
> > bounces@openib.org] On Behalf Of Steve Wise
> > Sent: Friday, January 05, 2007 6:22 AM
> > To: Roland Dreier
> > Cc: linux-kernel@vger.kernel.org; openib-general@openib.org;
> > netdev@vger.kernel.org
> > Subject: Re: [openib-general] [PATCH v4 01/13] Linux RDMA Core Changes
> > 
> > On Thu, 2007-01-04 at 13:34 -0800, Roland Dreier wrote:
> > > OK, I'm back from vacation today.
> > >
> > > Anyway I don't have a definitive statement on this right now.  I
> guess
> > > I agree that I don't like having an extra parameter to a function
> that
> > > should be pretty fast (although req notify isn't quite as hot as
> > > something like posting a send request or polling a cq), given that
> it
> > > adds measurable overhead.  (And I am surprised that the overhead is
> > > measurable, since 3 arguments still fit in registers, but OK).
> > >
> > > I also agree that adding an extra entry point just to pass in the
> user
> > > data is ugly, and also racy.
> > >
> > > Giving the kernel driver a pointer it can read seems OK I guess,
> > > although it's a little ugly to have a backdoor channel like that.
> > >
> > 
> > Another alternative is for the cq-index u32 memory to be allocated by
> > the kernel and mapped into the user process.  So the lib can
> read/write
> > it, and the kernel can read it directly.  This is the fastest way
> > perfwise, but I didn't want to do it because of the page granularity
> of
> > mapping.  IE it would require a page of address space (and backing
> > memory I guess) just for 1 u32.  The CQ element array memory is
> already
> > allocated this way (and its DMA coherent too), but I didn't want to
> > overload that memory with this extra variable either.  Mapping just
> > seemed ugly and wasteful to me.
> > 
> > So given 3 approaches:
> > 
> > 1) allow user data to be passed into ib_req_notify_cq() via the
> standard
> > uverbs mechanisms.
> > 
> > 2) hide this in the chelsio driver and have the driver copyin the info
> > directly.
> > 
> > 3) allocate the memory for this in the kernel and map it to the user
> > process.
> > 
> > I chose 1 because it seemed the cleanest from an architecture point of
> > view and I didn't think it would impact performance much.
> 
> [Felix Marti] In addition, is arming the CQ really in the performance
> path? - Don't apps poll the CQ as long as there are pending CQEs and
> only arm the CQ for notification once there is nothing left to do? If
> this is the case, it would mean that we waste a few cycles 'idle'
> cycles. 

I tend to agree.  This shouldn't be the hot perf path like
post_send/recv and poll.

> Steve, next to the micro benchmark that you did, did you run any
> performance benchmark that actually moves traffic? If so, did you see a
> difference in performance?
> 

No.  But I didn't explicitly measure with and without this one single
change.



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH  v4 01/13] Linux RDMA Core Changes
  2007-01-05 17:32 [openib-general] [PATCH v4 01/13] Linux RDMA Core Changes Felix Marti
  2007-01-05 18:59 ` Steve Wise
@ 2007-01-06 17:34 ` Michael S. Tsirkin
  1 sibling, 0 replies; 19+ messages in thread
From: Michael S. Tsirkin @ 2007-01-06 17:34 UTC (permalink / raw)
  To: Felix Marti
  Cc: Steve Wise, Roland Dreier, netdev, linux-kernel, openib-general

> [Felix Marti] In addition, is arming the CQ really in the performance
> path? - Don't apps poll the CQ as long as there are pending CQEs and
> only arm the CQ for notification once there is nothing left to do? If
> this is the case, it would mean that we waste a few cycles 'idle'
> cycles.

Applications such as IPoIB might queue up packets, then ARM the CQ,
and only then they are processed by the upper layers in the stack.
So arming the CQ is on hot datapath.

-- 
MST

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH  v4 01/13] Linux RDMA Core Changes
  2007-01-04  5:07                         ` Michael S. Tsirkin
@ 2007-01-04 14:07                           ` Steve Wise
  0 siblings, 0 replies; 19+ messages in thread
From: Steve Wise @ 2007-01-04 14:07 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: netdev, Roland Dreier, linux-kernel, openib-general

On Thu, 2007-01-04 at 07:07 +0200, Michael S. Tsirkin wrote:
> > If you think I should not add the udata parameter to the req_notify_cq()
> > provider verb, then I can rework the chelsio driver:
> > 
> > 1) at cq creation time, pass the virtual address of the u32 used by the
> > library to track the current cq index.  That way the chelsio kernel
> > driver can save the address in its kernel cq context for later use.
> > 
> > 2) change chelsio's req_notify_cq() to copy in the current cq index
> > value directly for rearming.
> > 
> > This puts all the burden on the chelsio driver, which is apparently the
> > only one that needs this functionality.  
> 
> Good thinking, I haven't thought of this approach.
> 
> This way there won't be any API/core changes and no changes to
> other low level drivers, correct? And for chelsio, there's no overhead
> as compared to code you posted.
> 
> Sounds good.
> 

I still want to hear from Roland on this before I go to the effort of
reworking all this...


Steve.


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH  v4 01/13] Linux RDMA Core Changes
  2007-01-03 21:22                       ` [openib-general] " Steve Wise
@ 2007-01-04  5:07                         ` Michael S. Tsirkin
  2007-01-04 14:07                           ` Steve Wise
  0 siblings, 1 reply; 19+ messages in thread
From: Michael S. Tsirkin @ 2007-01-04  5:07 UTC (permalink / raw)
  To: Steve Wise; +Cc: netdev, Roland Dreier, linux-kernel, openib-general

> If you think I should not add the udata parameter to the req_notify_cq()
> provider verb, then I can rework the chelsio driver:
> 
> 1) at cq creation time, pass the virtual address of the u32 used by the
> library to track the current cq index.  That way the chelsio kernel
> driver can save the address in its kernel cq context for later use.
> 
> 2) change chelsio's req_notify_cq() to copy in the current cq index
> value directly for rearming.
> 
> This puts all the burden on the chelsio driver, which is apparently the
> only one that needs this functionality.  

Good thinking, I haven't thought of this approach.

This way there won't be any API/core changes and no changes to
other low level drivers, correct? And for chelsio, there's no overhead
as compared to code you posted.

Sounds good.

-- 
MST

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH  v4 01/13] Linux RDMA Core Changes
  2007-01-03 19:33                   ` Michael S. Tsirkin
@ 2007-01-03 20:20                     ` Steve Wise
  2007-01-03 21:22                       ` [openib-general] " Steve Wise
  0 siblings, 1 reply; 19+ messages in thread
From: Steve Wise @ 2007-01-03 20:20 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: netdev, Roland Dreier, linux-kernel, openib-general

On Wed, 2007-01-03 at 21:33 +0200, Michael S. Tsirkin wrote:
> > Without extra param (1000 iterations in cycles): 
> > 	ave 101.283 min 91 max 247
> > With extra param (1000 iterations in cycles):
> > 	ave 103.311 min 91 max 221
> 
> A 2% hit then. Not huge, but 0 either.
> 
> > Convert cycles to ns (3466.727 MHz CPU):
> > 
> > Without: 101.283 / 3466.727 = .02922us == 29.22ns
> > With:    103.311 / 3466.727 = .02980us == 29.80ns
> > 
> > So I measure a .58ns average increase for passing in the additional
> > parameter.
> 
> That depends on CPU speed though. Percentage is likely to be more universal.
> 
> > Here is a snipit of the test:
> > 
> >         spin_lock_irq(&lock);
> >         do_gettimeofday(&start_tv);
> >         for (i=0; i<1000; i++) {
> >                 cycles_start[i] = get_cycles();
> >                 ib_req_notify_cq(cb->cq, IB_CQ_NEXT_COMP);
> >                 cycles_stop[i] = get_cycles();
> >         }
> >         do_gettimeofday(&stop_tv);
> >         spin_unlock_irq(&lock);
> > 
> >         if (stop_tv.tv_usec < start_tv.tv_usec) {
> >                 stop_tv.tv_usec += 1000000;
> >                 stop_tv.tv_sec  -= 1;
> >         }
> > 
> >         for (i=0; i < 1000; i++) {
> >                 cycles_t v = cycles_stop[i] - cycles_start[i];
> >                 sum += v;
> >                 if (v > max)
> >                         max = v;
> >                 if (min == 0 || v < min)
> >                         min = v;
> >         }
> > 
> >         printk(KERN_ERR PFX "FOO delta sec %lu usec %lu sum %llu min %llu max %llu\n",
> >                 stop_tv.tv_sec - start_tv.tv_sec,
> >                 stop_tv.tv_usec - start_tv.tv_usec,
> >                 (unsigned long long)sum, (unsigned long long)min,
> >                 (unsigned long long)max);
> 
> Good job, the test looks good, thanks.
> 
> So what does this tell you?
> To me it looks like there's a measurable speed difference,
> and so we should find a way (e.g. what I proposed) to enable chelsio userspace
> without adding overhead to other low level drivers or indeed chelsio kernel level code.
> 
> What do you think? Roland?
> 

I think having a 2nd function to set the udata seems onerous.



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH  v4 01/13] Linux RDMA Core Changes
  2007-01-03 19:17                 ` Steve Wise
@ 2007-01-03 19:33                   ` Michael S. Tsirkin
  2007-01-03 20:20                     ` Steve Wise
  0 siblings, 1 reply; 19+ messages in thread
From: Michael S. Tsirkin @ 2007-01-03 19:33 UTC (permalink / raw)
  To: Steve Wise; +Cc: netdev, Roland Dreier, linux-kernel, openib-general

> Without extra param (1000 iterations in cycles): 
> 	ave 101.283 min 91 max 247
> With extra param (1000 iterations in cycles):
> 	ave 103.311 min 91 max 221

A 2% hit then. Not huge, but 0 either.

> Convert cycles to ns (3466.727 MHz CPU):
> 
> Without: 101.283 / 3466.727 = .02922us == 29.22ns
> With:    103.311 / 3466.727 = .02980us == 29.80ns
> 
> So I measure a .58ns average increase for passing in the additional
> parameter.

That depends on CPU speed though. Percentage is likely to be more universal.

> Here is a snipit of the test:
> 
>         spin_lock_irq(&lock);
>         do_gettimeofday(&start_tv);
>         for (i=0; i<1000; i++) {
>                 cycles_start[i] = get_cycles();
>                 ib_req_notify_cq(cb->cq, IB_CQ_NEXT_COMP);
>                 cycles_stop[i] = get_cycles();
>         }
>         do_gettimeofday(&stop_tv);
>         spin_unlock_irq(&lock);
> 
>         if (stop_tv.tv_usec < start_tv.tv_usec) {
>                 stop_tv.tv_usec += 1000000;
>                 stop_tv.tv_sec  -= 1;
>         }
> 
>         for (i=0; i < 1000; i++) {
>                 cycles_t v = cycles_stop[i] - cycles_start[i];
>                 sum += v;
>                 if (v > max)
>                         max = v;
>                 if (min == 0 || v < min)
>                         min = v;
>         }
> 
>         printk(KERN_ERR PFX "FOO delta sec %lu usec %lu sum %llu min %llu max %llu\n",
>                 stop_tv.tv_sec - start_tv.tv_sec,
>                 stop_tv.tv_usec - start_tv.tv_usec,
>                 (unsigned long long)sum, (unsigned long long)min,
>                 (unsigned long long)max);

Good job, the test looks good, thanks.

So what does this tell you?
To me it looks like there's a measurable speed difference,
and so we should find a way (e.g. what I proposed) to enable chelsio userspace
without adding overhead to other low level drivers or indeed chelsio kernel level code.

What do you think? Roland?

-- 
MST

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH  v4 01/13] Linux RDMA Core Changes
  2007-01-03 15:18               ` Michael S. Tsirkin
@ 2007-01-03 19:17                 ` Steve Wise
  2007-01-03 19:33                   ` Michael S. Tsirkin
  0 siblings, 1 reply; 19+ messages in thread
From: Steve Wise @ 2007-01-03 19:17 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: netdev, Roland Dreier, linux-kernel, openib-general

> > > > ib_set_cq_udata() would transition into the kernel to pass in the
> > > > consumer's index.  In addition, ib_req_notify_cq would also transition
> > > > into the kernel since its not a bypass function for chelsio.
> > > 
> > > We misunderstand each other.
> > > 
> > > ib_uverbs_req_notify_cq is in drivers/infiniband/core/uverbs_cmd.c -
> > > all this code runs inside the IB_USER_VERBS_CMD_REQ_NOTIFY_CQ command,
> > > so there is a single user to kernel transition.
> > > 
> > 
> > Oh I see. 
> > 
> > This seems like a lot of extra code to avoid passing one extra arg to
> > the driver's req_notify_cq verb.  I'd appreciate other folk's input on
> > how important they think this is.  
> >
> > If you insist, then I'll run some tests specifically in kernel mode and
> > see how this affects mthca's req_notify performance.
> 
> This might be an interesting datapoint.
> 

Here's what I measured:

Without extra param (1000 iterations in cycles): 
	ave 101.283 min 91 max 247
With extra param (1000 iterations in cycles):
	ave 103.311 min 91 max 221

Convert cycles to ns (3466.727 MHz CPU):

Without: 101.283 / 3466.727 = .02922us == 29.22ns
With:    103.311 / 3466.727 = .02980us == 29.80ns

So I measure a .58ns average increase for passing in the additional
parameter.

Here is a snipit of the test:

        spin_lock_irq(&lock);
        do_gettimeofday(&start_tv);
        for (i=0; i<1000; i++) {
                cycles_start[i] = get_cycles();
                ib_req_notify_cq(cb->cq, IB_CQ_NEXT_COMP);
                cycles_stop[i] = get_cycles();
        }
        do_gettimeofday(&stop_tv);
        spin_unlock_irq(&lock);

        if (stop_tv.tv_usec < start_tv.tv_usec) {
                stop_tv.tv_usec += 1000000;
                stop_tv.tv_sec  -= 1;
        }

        for (i=0; i < 1000; i++) {
                cycles_t v = cycles_stop[i] - cycles_start[i];
                sum += v;
                if (v > max)
                        max = v;
                if (min == 0 || v < min)
                        min = v;
        }

        printk(KERN_ERR PFX "FOO delta sec %lu usec %lu sum %llu min %llu max %llu\n",
                stop_tv.tv_sec - start_tv.tv_sec,
                stop_tv.tv_usec - start_tv.tv_usec,
                (unsigned long long)sum, (unsigned long long)min,
                (unsigned long long)max);




^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH  v4 01/13] Linux RDMA Core Changes
  2007-01-03 15:07             ` Steve Wise
@ 2007-01-03 15:18               ` Michael S. Tsirkin
  2007-01-03 19:17                 ` Steve Wise
  0 siblings, 1 reply; 19+ messages in thread
From: Michael S. Tsirkin @ 2007-01-03 15:18 UTC (permalink / raw)
  To: Steve Wise; +Cc: netdev, Roland Dreier, linux-kernel, openib-general

> > > > No, it won't need 2 transitions - just an extra function call,
> > > > so it won't hurt performance - it would improve performance.
> > > > 
> > > > ib_uverbs_req_notify_cq would call
> > > > 
> > > > 	ib_uverbs_req_notify_cq()
> > > > 	{
> > > > 			ib_set_cq_udata(cq, udata)
> > > > 			ib_req_notify_cq(cq, cmd.solicited_only ?
> > > > 				IB_CQ_SOLICITED : IB_CQ_NEXT_COMP);
> > > > 	}
> > > > 
> > > 
> > > ib_set_cq_udata() would transition into the kernel to pass in the
> > > consumer's index.  In addition, ib_req_notify_cq would also transition
> > > into the kernel since its not a bypass function for chelsio.
> > 
> > We misunderstand each other.
> > 
> > ib_uverbs_req_notify_cq is in drivers/infiniband/core/uverbs_cmd.c -
> > all this code runs inside the IB_USER_VERBS_CMD_REQ_NOTIFY_CQ command,
> > so there is a single user to kernel transition.
> > 
> 
> Oh I see. 
> 
> This seems like a lot of extra code to avoid passing one extra arg to
> the driver's req_notify_cq verb.  I'd appreciate other folk's input on
> how important they think this is.  
>
> If you insist, then I'll run some tests specifically in kernel mode and
> see how this affects mthca's req_notify performance.

This might be an interesting datapoint.

-- 
MST

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH  v4 01/13] Linux RDMA Core Changes
  2007-01-03 15:06             ` Steve Wise
@ 2007-01-03 15:10               ` Michael S. Tsirkin
  0 siblings, 0 replies; 19+ messages in thread
From: Michael S. Tsirkin @ 2007-01-03 15:10 UTC (permalink / raw)
  To: Steve Wise; +Cc: Roland Dreier, netdev, linux-kernel, openib-general

> > > I've run this code with mthca and didn't notice any performance
> > > degradation, but I wasn't specifically measuring cq_poll overhead in a
> > > tight loop...
> > 
> > We were speaking about ib_req_notify_cq here, actually, not cq poll.
> > So what was tested?
> > 
> 
> Sorry, I meant req_notify.  I didn't specifically measure the cost of
> req_notify before and after this change.
> 
> I've been running the user mode perftest programs mainly.  

So, it's not really activated a lot there.
You want something like IPoIB BW test.

-- 
MST

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH  v4 01/13] Linux RDMA Core Changes
  2007-01-03 15:00           ` Michael S. Tsirkin
@ 2007-01-03 15:07             ` Steve Wise
  2007-01-03 15:18               ` Michael S. Tsirkin
  0 siblings, 1 reply; 19+ messages in thread
From: Steve Wise @ 2007-01-03 15:07 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Roland Dreier, netdev, linux-kernel, openib-general

On Wed, 2007-01-03 at 17:00 +0200, Michael S. Tsirkin wrote:
> > > 
> > > No, it won't need 2 transitions - just an extra function call,
> > > so it won't hurt performance - it would improve performance.
> > > 
> > > ib_uverbs_req_notify_cq would call
> > > 
> > > 	ib_uverbs_req_notify_cq()
> > > 	{
> > > 			ib_set_cq_udata(cq, udata)
> > > 			ib_req_notify_cq(cq, cmd.solicited_only ?
> > > 				IB_CQ_SOLICITED : IB_CQ_NEXT_COMP);
> > > 	}
> > > 
> > 
> > ib_set_cq_udata() would transition into the kernel to pass in the
> > consumer's index.  In addition, ib_req_notify_cq would also transition
> > into the kernel since its not a bypass function for chelsio.
> 
> We misunderstand each other.
> 
> ib_uverbs_req_notify_cq is in drivers/infiniband/core/uverbs_cmd.c -
> all this code runs inside the IB_USER_VERBS_CMD_REQ_NOTIFY_CQ command,
> so there is a single user to kernel transition.
> 

Oh I see. 

This seems like a lot of extra code to avoid passing one extra arg to
the driver's req_notify_cq verb.  I'd appreciate other folk's input on
how important they think this is.  

If you insist, then I'll run some tests specifically in kernel mode and
see how this affects mthca's req_notify performance.

Steve.






^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH  v4 01/13] Linux RDMA Core Changes
  2007-01-03 15:02           ` Michael S. Tsirkin
@ 2007-01-03 15:06             ` Steve Wise
  2007-01-03 15:10               ` Michael S. Tsirkin
  0 siblings, 1 reply; 19+ messages in thread
From: Steve Wise @ 2007-01-03 15:06 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Roland Dreier, netdev, linux-kernel, openib-general

On Wed, 2007-01-03 at 17:02 +0200, Michael S. Tsirkin wrote:
> > I've run this code with mthca and didn't notice any performance
> > degradation, but I wasn't specifically measuring cq_poll overhead in a
> > tight loop...
> 
> We were speaking about ib_req_notify_cq here, actually, not cq poll.
> So what was tested?
> 

Sorry, I meant req_notify.  I didn't specifically measure the cost of
req_notify before and after this change.

I've been running the user mode perftest programs mainly.  




^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH  v4 01/13] Linux RDMA Core Changes
  2007-01-03 14:56         ` Steve Wise
  2007-01-03 15:00           ` Michael S. Tsirkin
@ 2007-01-03 15:02           ` Michael S. Tsirkin
  2007-01-03 15:06             ` Steve Wise
  1 sibling, 1 reply; 19+ messages in thread
From: Michael S. Tsirkin @ 2007-01-03 15:02 UTC (permalink / raw)
  To: Steve Wise; +Cc: Roland Dreier, netdev, linux-kernel, openib-general

> I've run this code with mthca and didn't notice any performance
> degradation, but I wasn't specifically measuring cq_poll overhead in a
> tight loop...

We were speaking about ib_req_notify_cq here, actually, not cq poll.
So what was tested?

-- 
MST

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH  v4 01/13] Linux RDMA Core Changes
  2007-01-03 14:56         ` Steve Wise
@ 2007-01-03 15:00           ` Michael S. Tsirkin
  2007-01-03 15:07             ` Steve Wise
  2007-01-03 15:02           ` Michael S. Tsirkin
  1 sibling, 1 reply; 19+ messages in thread
From: Michael S. Tsirkin @ 2007-01-03 15:00 UTC (permalink / raw)
  To: Steve Wise; +Cc: Roland Dreier, netdev, linux-kernel, openib-general

> > 
> > No, it won't need 2 transitions - just an extra function call,
> > so it won't hurt performance - it would improve performance.
> > 
> > ib_uverbs_req_notify_cq would call
> > 
> > 	ib_uverbs_req_notify_cq()
> > 	{
> > 			ib_set_cq_udata(cq, udata)
> > 			ib_req_notify_cq(cq, cmd.solicited_only ?
> > 				IB_CQ_SOLICITED : IB_CQ_NEXT_COMP);
> > 	}
> > 
> 
> ib_set_cq_udata() would transition into the kernel to pass in the
> consumer's index.  In addition, ib_req_notify_cq would also transition
> into the kernel since its not a bypass function for chelsio.

We misunderstand each other.

ib_uverbs_req_notify_cq is in drivers/infiniband/core/uverbs_cmd.c -
all this code runs inside the IB_USER_VERBS_CMD_REQ_NOTIFY_CQ command,
so there is a single user to kernel transition.

-- 
MST

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH  v4 01/13] Linux RDMA Core Changes
  2007-01-03 14:41       ` Michael S. Tsirkin
@ 2007-01-03 14:56         ` Steve Wise
  2007-01-03 15:00           ` Michael S. Tsirkin
  2007-01-03 15:02           ` Michael S. Tsirkin
  0 siblings, 2 replies; 19+ messages in thread
From: Steve Wise @ 2007-01-03 14:56 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Roland Dreier, netdev, linux-kernel, openib-general

> > > 
> > > It seems all Chelsio needs is to pass in a consumer index - so, how about a new
> > > entry point? Something like void set_cq_udata(struct ib_cq *cq, struct ib_udata *udata)?
> > > 
> > 
> > Adding a new entry point would hurt chelsio's user mode performance if
> > if then requires 2 kernel transitions to rearm the cq.  
> 
> No, it won't need 2 transitions - just an extra function call,
> so it won't hurt performance - it would improve performance.
> 
> ib_uverbs_req_notify_cq would call
> 
> 	ib_uverbs_req_notify_cq()
> 	{
> 			ib_set_cq_udata(cq, udata)
> 			ib_req_notify_cq(cq, cmd.solicited_only ?
> 				IB_CQ_SOLICITED : IB_CQ_NEXT_COMP);
> 	}
> 

ib_set_cq_udata() would transition into the kernel to pass in the
consumer's index.  In addition, ib_req_notify_cq would also transition
into the kernel since its not a bypass function for chelsio.

> This way kernel consumers don't incur any overhead,
> and in userspace users extra function call is dwarfed
> by system call overhead.
> 
> > Passing in user data is sort of SOP for these sorts of verbs.  
> 
> I don't see other examples. Where we did pass extra user data
> is in non-data pass verbs such as create QP.
> 
> This is most inner tight loop in many ULPs, so we should be very careful
> about adding code there - these things do add up.
> See recent IRQ API update in kernel.

Roland, do you have any comments on this?  You previously indicated
these patches were good to go once chelsio's ethernet driver gets pulled
in. 

> > How much does passing one more param cost for kernel users?  
> 
> Donnu. I just reviewed the code.
> It really should be up to patch submitter to check the performance
> effect of his patch, if there might be any.

I've run this code with mthca and didn't notice any performance
degradation, but I wasn't specifically measuring cq_poll overhead in a
tight loop...







^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH  v4 01/13] Linux RDMA Core Changes
  2007-01-03 14:25     ` Steve Wise
@ 2007-01-03 14:41       ` Michael S. Tsirkin
  2007-01-03 14:56         ` Steve Wise
  0 siblings, 1 reply; 19+ messages in thread
From: Michael S. Tsirkin @ 2007-01-03 14:41 UTC (permalink / raw)
  To: Steve Wise; +Cc: Roland Dreier, netdev, linux-kernel, openib-general

> > > @@ -1373,7 +1374,7 @@ int ib_peek_cq(struct ib_cq *cq, int wc_
> > >  static inline int ib_req_notify_cq(struct ib_cq *cq,
> > >  				   enum ib_cq_notify cq_notify)
> > >  {
> > > -	return cq->device->req_notify_cq(cq, cq_notify);
> > > +	return cq->device->req_notify_cq(cq, cq_notify, NULL);
> > >  }
> > >  
> > >  /**
> > 
> > Can't say I like this adding overhead in data path operations (and note this
> > can't be optimized out). And kernel consumers work without passing it in, so it
> > hurts kernel code even for Chelsio. Granted, the cost is small here, but these
> > things do tend to add up.
> > 
> > It seems all Chelsio needs is to pass in a consumer index - so, how about a new
> > entry point? Something like void set_cq_udata(struct ib_cq *cq, struct ib_udata *udata)?
> > 
> 
> Adding a new entry point would hurt chelsio's user mode performance if
> if then requires 2 kernel transitions to rearm the cq.  

No, it won't need 2 transitions - just an extra function call,
so it won't hurt performance - it would improve performance.

ib_uverbs_req_notify_cq would call

	ib_uverbs_req_notify_cq()
	{
			ib_set_cq_udata(cq, udata)
			ib_req_notify_cq(cq, cmd.solicited_only ?
				IB_CQ_SOLICITED : IB_CQ_NEXT_COMP);
	}

This way kernel consumers don't incur any overhead,
and in userspace users extra function call is dwarfed
by system call overhead.

> Passing in user data is sort of SOP for these sorts of verbs.  

I don't see other examples. Where we did pass extra user data
is in non-data pass verbs such as create QP.

This is most inner tight loop in many ULPs, so we should be very careful
about adding code there - these things do add up.
See recent IRQ API update in kernel.

> How much does passing one more param cost for kernel users?  

Donnu. I just reviewed the code.
It really should be up to patch submitter to check the performance
effect of his patch, if there might be any.

-- 
MST

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH  v4 01/13] Linux RDMA Core Changes
  2006-12-24  8:49   ` Michael S. Tsirkin
@ 2007-01-03 14:25     ` Steve Wise
  2007-01-03 14:41       ` Michael S. Tsirkin
  0 siblings, 1 reply; 19+ messages in thread
From: Steve Wise @ 2007-01-03 14:25 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: rdreier, netdev, linux-kernel, openib-general

> > @@ -1373,7 +1374,7 @@ int ib_peek_cq(struct ib_cq *cq, int wc_
> >  static inline int ib_req_notify_cq(struct ib_cq *cq,
> >  				   enum ib_cq_notify cq_notify)
> >  {
> > -	return cq->device->req_notify_cq(cq, cq_notify);
> > +	return cq->device->req_notify_cq(cq, cq_notify, NULL);
> >  }
> >  
> >  /**
> 
> Can't say I like this adding overhead in data path operations (and note this
> can't be optimized out). And kernel consumers work without passing it in, so it
> hurts kernel code even for Chelsio. Granted, the cost is small here, but these
> things do tend to add up.
> 
> It seems all Chelsio needs is to pass in a consumer index - so, how about a new
> entry point? Something like void set_cq_udata(struct ib_cq *cq, struct ib_udata *udata)?
> 

Adding a new entry point would hurt chelsio's user mode performance if
if then requires 2 kernel transitions to rearm the cq.  

Passing in user data is sort of SOP for these sorts of verbs.  

How much does passing one more param cost for kernel users?  




^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH  v4 01/13] Linux RDMA Core Changes
  2006-12-14 13:53 ` [PATCH v4 01/13] Linux RDMA Core Changes Steve Wise
@ 2006-12-24  8:49   ` Michael S. Tsirkin
  2007-01-03 14:25     ` Steve Wise
  0 siblings, 1 reply; 19+ messages in thread
From: Michael S. Tsirkin @ 2006-12-24  8:49 UTC (permalink / raw)
  To: Steve Wise; +Cc: rdreier, netdev, linux-kernel, openib-general

> diff --git a/drivers/infiniband/hw/mthca/mthca_cq.c b/drivers/infiniband/hw/mthca/mthca_cq.c
> index 283d50b..15cbd49 100644
> --- a/drivers/infiniband/hw/mthca/mthca_cq.c
> +++ b/drivers/infiniband/hw/mthca/mthca_cq.c
> @@ -722,7 +722,8 @@ repoll:
>  	return err == 0 || err == -EAGAIN ? npolled : err;
>  }
>  
> -int mthca_tavor_arm_cq(struct ib_cq *cq, enum ib_cq_notify notify)
> +int mthca_tavor_arm_cq(struct ib_cq *cq, enum ib_cq_notify notify, 
> +		       struct ib_udata *udata)
>  {
>  	__be32 doorbell[2];
>  
> @@ -739,7 +740,8 @@ int mthca_tavor_arm_cq(struct ib_cq *cq,
>  	return 0;
>  }
>  
> -int mthca_arbel_arm_cq(struct ib_cq *ibcq, enum ib_cq_notify notify)
> +int mthca_arbel_arm_cq(struct ib_cq *ibcq, enum ib_cq_notify notify,
> +		       struct ib_udata *udata)
>  {
>  	struct mthca_cq *cq = to_mcq(ibcq);
>  	__be32 doorbell[2];
> diff --git a/drivers/infiniband/hw/mthca/mthca_dev.h b/drivers/infiniband/hw/mthca/mthca_dev.h
> index fe5cecf..6b9ccf6 100644
> --- a/drivers/infiniband/hw/mthca/mthca_dev.h
> +++ b/drivers/infiniband/hw/mthca/mthca_dev.h
> @@ -493,8 +493,8 @@ void mthca_unmap_eq_icm(struct mthca_dev
>  
>  int mthca_poll_cq(struct ib_cq *ibcq, int num_entries,
>  		  struct ib_wc *entry);
> -int mthca_tavor_arm_cq(struct ib_cq *cq, enum ib_cq_notify notify);
> -int mthca_arbel_arm_cq(struct ib_cq *cq, enum ib_cq_notify notify);
> +int mthca_tavor_arm_cq(struct ib_cq *cq, enum ib_cq_notify notify, struct ib_udata *udata);
> +int mthca_arbel_arm_cq(struct ib_cq *cq, enum ib_cq_notify notify, struct ib_udata *udata);
>  int mthca_init_cq(struct mthca_dev *dev, int nent,
>  		  struct mthca_ucontext *ctx, u32 pdn,
>  		  struct mthca_cq *cq);
> diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
> index 8eacc35..e3e1a2c 100644
> --- a/include/rdma/ib_verbs.h
> +++ b/include/rdma/ib_verbs.h
> @@ -941,7 +941,8 @@ struct ib_device {
>  					      struct ib_wc *wc);
>  	int                        (*peek_cq)(struct ib_cq *cq, int wc_cnt);
>  	int                        (*req_notify_cq)(struct ib_cq *cq,
> -						    enum ib_cq_notify cq_notify);
> +						    enum ib_cq_notify cq_notify,
> +						    struct ib_udata *udata);
>  	int                        (*req_ncomp_notif)(struct ib_cq *cq,
>  						      int wc_cnt);
>  	struct ib_mr *             (*get_dma_mr)(struct ib_pd *pd,
> @@ -1373,7 +1374,7 @@ int ib_peek_cq(struct ib_cq *cq, int wc_
>  static inline int ib_req_notify_cq(struct ib_cq *cq,
>  				   enum ib_cq_notify cq_notify)
>  {
> -	return cq->device->req_notify_cq(cq, cq_notify);
> +	return cq->device->req_notify_cq(cq, cq_notify, NULL);
>  }
>  
>  /**

Can't say I like this adding overhead in data path operations (and note this
can't be optimized out). And kernel consumers work without passing it in, so it
hurts kernel code even for Chelsio. Granted, the cost is small here, but these
things do tend to add up.

It seems all Chelsio needs is to pass in a consumer index - so, how about a new
entry point? Something like void set_cq_udata(struct ib_cq *cq, struct ib_udata *udata)?

-- 
MST

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH  v4 01/13] Linux RDMA Core Changes
  2006-12-14 13:52 [PATCH v4 00/13] 2.6.20 Chelsio T3 RDMA Driver Steve Wise
@ 2006-12-14 13:53 ` Steve Wise
  2006-12-24  8:49   ` Michael S. Tsirkin
  0 siblings, 1 reply; 19+ messages in thread
From: Steve Wise @ 2006-12-14 13:53 UTC (permalink / raw)
  To: rdreier; +Cc: netdev, openib-general, linux-kernel


Support provider-specific data in ib_uverbs_cmd_req_notify_cq().
The Chelsio iwarp provider library needs to pass information to the
kernel verb for re-arming the CQ.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
---

 drivers/infiniband/core/uverbs_cmd.c      |    9 +++++++--
 drivers/infiniband/hw/amso1100/c2.h       |    2 +-
 drivers/infiniband/hw/amso1100/c2_cq.c    |    3 ++-
 drivers/infiniband/hw/ehca/ehca_iverbs.h  |    3 ++-
 drivers/infiniband/hw/ehca/ehca_reqs.c    |    3 ++-
 drivers/infiniband/hw/ipath/ipath_cq.c    |    4 +++-
 drivers/infiniband/hw/ipath/ipath_verbs.h |    3 ++-
 drivers/infiniband/hw/mthca/mthca_cq.c    |    6 ++++--
 drivers/infiniband/hw/mthca/mthca_dev.h   |    4 ++--
 include/rdma/ib_verbs.h                   |    5 +++--
 10 files changed, 28 insertions(+), 14 deletions(-)

diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
index 743247e..5dd1de9 100644
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -959,6 +959,7 @@ ssize_t ib_uverbs_req_notify_cq(struct i
 				int out_len)
 {
 	struct ib_uverbs_req_notify_cq cmd;
+	struct ib_udata		      udata;
 	struct ib_cq                  *cq;
 
 	if (copy_from_user(&cmd, buf, sizeof cmd))
@@ -968,8 +969,12 @@ ssize_t ib_uverbs_req_notify_cq(struct i
 	if (!cq)
 		return -EINVAL;
 
-	ib_req_notify_cq(cq, cmd.solicited_only ?
-			 IB_CQ_SOLICITED : IB_CQ_NEXT_COMP);
+	INIT_UDATA(&udata, buf + sizeof cmd, 0,
+		   in_len - sizeof cmd, 0); 
+
+	cq->device->req_notify_cq(cq, cmd.solicited_only ?
+				  IB_CQ_SOLICITED : IB_CQ_NEXT_COMP,
+				  &udata);
 
 	put_cq_read(cq);
 
diff --git a/drivers/infiniband/hw/amso1100/c2.h b/drivers/infiniband/hw/amso1100/c2.h
index 04a9db5..9a76869 100644
--- a/drivers/infiniband/hw/amso1100/c2.h
+++ b/drivers/infiniband/hw/amso1100/c2.h
@@ -519,7 +519,7 @@ extern void c2_free_cq(struct c2_dev *c2
 extern void c2_cq_event(struct c2_dev *c2dev, u32 mq_index);
 extern void c2_cq_clean(struct c2_dev *c2dev, struct c2_qp *qp, u32 mq_index);
 extern int c2_poll_cq(struct ib_cq *ibcq, int num_entries, struct ib_wc *entry);
-extern int c2_arm_cq(struct ib_cq *ibcq, enum ib_cq_notify notify);
+extern int c2_arm_cq(struct ib_cq *ibcq, enum ib_cq_notify notify, struct ib_udata *udata);
 
 /* CM */
 extern int c2_llp_connect(struct iw_cm_id *cm_id,
diff --git a/drivers/infiniband/hw/amso1100/c2_cq.c b/drivers/infiniband/hw/amso1100/c2_cq.c
index 05c9154..7ce8bca 100644
--- a/drivers/infiniband/hw/amso1100/c2_cq.c
+++ b/drivers/infiniband/hw/amso1100/c2_cq.c
@@ -217,7 +217,8 @@ int c2_poll_cq(struct ib_cq *ibcq, int n
 	return npolled;
 }
 
-int c2_arm_cq(struct ib_cq *ibcq, enum ib_cq_notify notify)
+int c2_arm_cq(struct ib_cq *ibcq, enum ib_cq_notify notify,
+	      struct ib_udata *udata)
 {
 	struct c2_mq_shared __iomem *shared;
 	struct c2_cq *cq;
diff --git a/drivers/infiniband/hw/ehca/ehca_iverbs.h b/drivers/infiniband/hw/ehca/ehca_iverbs.h
index 3720e30..566b30c 100644
--- a/drivers/infiniband/hw/ehca/ehca_iverbs.h
+++ b/drivers/infiniband/hw/ehca/ehca_iverbs.h
@@ -135,7 +135,8 @@ int ehca_poll_cq(struct ib_cq *cq, int n
 
 int ehca_peek_cq(struct ib_cq *cq, int wc_cnt);
 
-int ehca_req_notify_cq(struct ib_cq *cq, enum ib_cq_notify cq_notify);
+int ehca_req_notify_cq(struct ib_cq *cq, enum ib_cq_notify cq_notify,
+		       struct ib_udata *udata);
 
 struct ib_qp *ehca_create_qp(struct ib_pd *pd,
 			     struct ib_qp_init_attr *init_attr,
diff --git a/drivers/infiniband/hw/ehca/ehca_reqs.c b/drivers/infiniband/hw/ehca/ehca_reqs.c
index b46bda1..3ed6992 100644
--- a/drivers/infiniband/hw/ehca/ehca_reqs.c
+++ b/drivers/infiniband/hw/ehca/ehca_reqs.c
@@ -634,7 +634,8 @@ poll_cq_exit0:
 	return ret;
 }
 
-int ehca_req_notify_cq(struct ib_cq *cq, enum ib_cq_notify cq_notify)
+int ehca_req_notify_cq(struct ib_cq *cq, enum ib_cq_notify cq_notify,
+		       struct ib_udata *udata)
 {
 	struct ehca_cq *my_cq = container_of(cq, struct ehca_cq, ib_cq);
 
diff --git a/drivers/infiniband/hw/ipath/ipath_cq.c b/drivers/infiniband/hw/ipath/ipath_cq.c
index 87462e0..27ba4db 100644
--- a/drivers/infiniband/hw/ipath/ipath_cq.c
+++ b/drivers/infiniband/hw/ipath/ipath_cq.c
@@ -307,13 +307,15 @@ int ipath_destroy_cq(struct ib_cq *ibcq)
  * ipath_req_notify_cq - change the notification type for a completion queue
  * @ibcq: the completion queue
  * @notify: the type of notification to request
+ * @udata: user data 
  *
  * Returns 0 for success.
  *
  * This may be called from interrupt context.  Also called by
  * ib_req_notify_cq() in the generic verbs code.
  */
-int ipath_req_notify_cq(struct ib_cq *ibcq, enum ib_cq_notify notify)
+int ipath_req_notify_cq(struct ib_cq *ibcq, enum ib_cq_notify notify,
+			struct ib_udata *udata)
 {
 	struct ipath_cq *cq = to_icq(ibcq);
 	unsigned long flags;
diff --git a/drivers/infiniband/hw/ipath/ipath_verbs.h b/drivers/infiniband/hw/ipath/ipath_verbs.h
index 8039f6e..0d39960 100644
--- a/drivers/infiniband/hw/ipath/ipath_verbs.h
+++ b/drivers/infiniband/hw/ipath/ipath_verbs.h
@@ -716,7 +716,8 @@ struct ib_cq *ipath_create_cq(struct ib_
 
 int ipath_destroy_cq(struct ib_cq *ibcq);
 
-int ipath_req_notify_cq(struct ib_cq *ibcq, enum ib_cq_notify notify);
+int ipath_req_notify_cq(struct ib_cq *ibcq, enum ib_cq_notify notify,
+			struct ib_udata *udata);
 
 int ipath_resize_cq(struct ib_cq *ibcq, int cqe, struct ib_udata *udata);
 
diff --git a/drivers/infiniband/hw/mthca/mthca_cq.c b/drivers/infiniband/hw/mthca/mthca_cq.c
index 283d50b..15cbd49 100644
--- a/drivers/infiniband/hw/mthca/mthca_cq.c
+++ b/drivers/infiniband/hw/mthca/mthca_cq.c
@@ -722,7 +722,8 @@ repoll:
 	return err == 0 || err == -EAGAIN ? npolled : err;
 }
 
-int mthca_tavor_arm_cq(struct ib_cq *cq, enum ib_cq_notify notify)
+int mthca_tavor_arm_cq(struct ib_cq *cq, enum ib_cq_notify notify, 
+		       struct ib_udata *udata)
 {
 	__be32 doorbell[2];
 
@@ -739,7 +740,8 @@ int mthca_tavor_arm_cq(struct ib_cq *cq,
 	return 0;
 }
 
-int mthca_arbel_arm_cq(struct ib_cq *ibcq, enum ib_cq_notify notify)
+int mthca_arbel_arm_cq(struct ib_cq *ibcq, enum ib_cq_notify notify,
+		       struct ib_udata *udata)
 {
 	struct mthca_cq *cq = to_mcq(ibcq);
 	__be32 doorbell[2];
diff --git a/drivers/infiniband/hw/mthca/mthca_dev.h b/drivers/infiniband/hw/mthca/mthca_dev.h
index fe5cecf..6b9ccf6 100644
--- a/drivers/infiniband/hw/mthca/mthca_dev.h
+++ b/drivers/infiniband/hw/mthca/mthca_dev.h
@@ -493,8 +493,8 @@ void mthca_unmap_eq_icm(struct mthca_dev
 
 int mthca_poll_cq(struct ib_cq *ibcq, int num_entries,
 		  struct ib_wc *entry);
-int mthca_tavor_arm_cq(struct ib_cq *cq, enum ib_cq_notify notify);
-int mthca_arbel_arm_cq(struct ib_cq *cq, enum ib_cq_notify notify);
+int mthca_tavor_arm_cq(struct ib_cq *cq, enum ib_cq_notify notify, struct ib_udata *udata);
+int mthca_arbel_arm_cq(struct ib_cq *cq, enum ib_cq_notify notify, struct ib_udata *udata);
 int mthca_init_cq(struct mthca_dev *dev, int nent,
 		  struct mthca_ucontext *ctx, u32 pdn,
 		  struct mthca_cq *cq);
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 8eacc35..e3e1a2c 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -941,7 +941,8 @@ struct ib_device {
 					      struct ib_wc *wc);
 	int                        (*peek_cq)(struct ib_cq *cq, int wc_cnt);
 	int                        (*req_notify_cq)(struct ib_cq *cq,
-						    enum ib_cq_notify cq_notify);
+						    enum ib_cq_notify cq_notify,
+						    struct ib_udata *udata);
 	int                        (*req_ncomp_notif)(struct ib_cq *cq,
 						      int wc_cnt);
 	struct ib_mr *             (*get_dma_mr)(struct ib_pd *pd,
@@ -1373,7 +1374,7 @@ int ib_peek_cq(struct ib_cq *cq, int wc_
 static inline int ib_req_notify_cq(struct ib_cq *cq,
 				   enum ib_cq_notify cq_notify)
 {
-	return cq->device->req_notify_cq(cq, cq_notify);
+	return cq->device->req_notify_cq(cq, cq_notify, NULL);
 }
 
 /**

^ permalink raw reply related	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2007-01-06 17:34 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2007-01-05 17:32 [openib-general] [PATCH v4 01/13] Linux RDMA Core Changes Felix Marti
2007-01-05 18:59 ` Steve Wise
2007-01-06 17:34 ` Michael S. Tsirkin
  -- strict thread matches above, loose matches on Subject: below --
2006-12-14 13:52 [PATCH v4 00/13] 2.6.20 Chelsio T3 RDMA Driver Steve Wise
2006-12-14 13:53 ` [PATCH v4 01/13] Linux RDMA Core Changes Steve Wise
2006-12-24  8:49   ` Michael S. Tsirkin
2007-01-03 14:25     ` Steve Wise
2007-01-03 14:41       ` Michael S. Tsirkin
2007-01-03 14:56         ` Steve Wise
2007-01-03 15:00           ` Michael S. Tsirkin
2007-01-03 15:07             ` Steve Wise
2007-01-03 15:18               ` Michael S. Tsirkin
2007-01-03 19:17                 ` Steve Wise
2007-01-03 19:33                   ` Michael S. Tsirkin
2007-01-03 20:20                     ` Steve Wise
2007-01-03 21:22                       ` [openib-general] " Steve Wise
2007-01-04  5:07                         ` Michael S. Tsirkin
2007-01-04 14:07                           ` Steve Wise
2007-01-03 15:02           ` Michael S. Tsirkin
2007-01-03 15:06             ` Steve Wise
2007-01-03 15:10               ` Michael S. Tsirkin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).