All of lore.kernel.org
 help / color / mirror / Atom feed
* RE: queue_transaction interface + unique_ptr + performance
@ 2015-12-03  2:13 Somnath Roy
  2015-12-03  3:50 ` James (Fei) Liu-SSI
                   ` (2 more replies)
  0 siblings, 3 replies; 32+ messages in thread
From: Somnath Roy @ 2015-12-03  2:13 UTC (permalink / raw)
  To: Somnath Roy, Sage Weil (sage@newdream.net),
	Samuel Just (sam.just@inktank.com)
  Cc: ceph-devel

*Also*, in this way we are unnecessary adding another smart pointer overhead in the Ceph IO path.
As I communicated sometimes back (probably 2 years now :-) ) in the community, profiler is showing these smart pointers (shared_ptr) as one of the hot spot. Now, I decided to actually measure this..Here is my findings from a sample application and using jemalloc.

1.  First, I measured the performance difference of just creation and deletion of various pointers..Here is the result..

##### Test conventional ptr ######
start: 1449107326 secs, 903873 usecs
end: 1449107353 secs, 578709 usecs
micros_used for conventional ptr: 26674836

##### Test Unique Smart ptr ######
start: 1449107353 secs, 578764 usecs
end: 1449107438 secs, 835114 usecs
micros_used for unique ptr: 85256350

##### Test Shared Smart ptr ######
start: 1449107438 secs, 835155 usecs
end: 1449107543 secs, 285443 usecs
micros_used for shared ptr: 104450288

So, as you can see >3x degradation with unique_ptr and ~4x degradation with shared_ptr.
My sample application is single threaded and I can see from perf top lot of other smart_ptr related functions are popping up reducing the actual % of jemalloc cpu usage (thus causing a slowdown).


2. Next, I added pointer dereferencing in the code..Here is the result..

##### Test conventional ptr ######
start: 1449107850 secs, 500595 usecs
end: 1449107876 secs, 936586 usecs
micros_used for conventional ptr: 26435991
##### Test Unique Smart ptr ######
start: 1449107876 secs, 936643 usecs
end: 1449107994 secs, 629418 usecs
micros_used for unique ptr: 117692775
##### Test Shared Smart ptr ######
start: 1449107994 secs, 629459 usecs
end: 1449108107 secs, 846052 usecs
micros_used for shared ptr: 113216593

This is interesting , not much change in case of conventional pointers but huge change for unique_ptr and some change for shared_ptr as well..So, now degradation for unique_ptr > 4X..This is probably inline with this http://stackoverflow.com/questions/8138284/about-unique-ptr-performances

3. I didn't measure the other stuff like std::move() , reference count in case of shared object etc. This will degrade the performance even more.

4. Here is the sample code in case anybody interested.

#include <iostream>
#include <memory>
#include <list>
#include <sys/time.h>
#include <stdint.h>

struct Foo { // object to manage
    Foo():xx(99),yy(999999) { /*std::cout << "Foo ctor\n";*/ }
    ~Foo() { /*std::cout << "~Foo dtor\n";*/ }
    int xx;
    long yy;
    char str[1024];
};
int main()
{
   struct timeval start, end;
   long secs_used,micros_used;
    printf("##### Test conventional ptr ######\n");
    gettimeofday(&start, NULL);
    for (uint64_t i = 0; i < 1000000000; i++) {
      Foo* f = new Foo();
      int xxx = f->xx;
      long yyy = f->yy;
      delete f;
    }
    gettimeofday(&end, NULL);

    printf("start: %d secs, %d usecs\n",start.tv_sec,start.tv_usec);
    printf("end: %d secs, %d usecs\n",end.tv_sec,end.tv_usec);

    secs_used=(end.tv_sec - start.tv_sec); //avoid overflow by subtracting first
    micros_used= ((secs_used*1000000) + end.tv_usec) - (start.tv_usec);

    printf("micros_used for conventional ptr: %d\n",micros_used);

    printf("##### Test Unique Smart ptr ######\n");

    gettimeofday(&start, NULL);
    for (uint64_t i = 0; i < 1000000000; i++) {
      std::unique_ptr<Foo> fu (new Foo());
      int xxx = fu->xx;
      long yyy = fu->yy;
    }
    gettimeofday(&end, NULL);

    printf("start: %d secs, %d usecs\n",start.tv_sec,start.tv_usec);
    printf("end: %d secs, %d usecs\n",end.tv_sec,end.tv_usec);

    secs_used=(end.tv_sec - start.tv_sec); //avoid overflow by subtracting first
    micros_used= ((secs_used*1000000) + end.tv_usec) - (start.tv_usec);

    printf("micros_used for unique ptr: %d\n",micros_used);

    printf("##### Test Shared Smart ptr ######\n");
    gettimeofday(&start, NULL);
    for (uint64_t i = 0; i < 1000000000; i++) {
      std::shared_ptr<Foo> fs (new Foo());
      int xxx = fs->xx;
      long yyy = fs->yy;
    }
    gettimeofday(&end, NULL);

    printf("start: %d secs, %d usecs\n",start.tv_sec,start.tv_usec);
    printf("end: %d secs, %d usecs\n",end.tv_sec,end.tv_usec);

    secs_used=(end.tv_sec - start.tv_sec); //avoid overflow by subtracting first
    micros_used= ((secs_used*1000000) + end.tv_usec) - (start.tv_usec);

    printf("micros_used for shared ptr: %d\n",micros_used);
    std::cout <<"Existing..\n";
    return 0;
}



So, my guess is, the heavy use of these smart pointers in the Ceph IO path is bringing iops/core down substantially.
My suggestion is *not to introduce* any smart pointers in the objectstore interface.

Thanks & Regards
Somnath

-----Original Message-----
From: ceph-devel-owner@vger.kernel.org [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Somnath Roy
Sent: Wednesday, December 02, 2015 4:18 PM
To: Sage Weil (sage@newdream.net); Samuel Just (sam.just@inktank.com)
Cc: ceph-devel@vger.kernel.org
Subject: queue_transaction interface + unique_ptr

Hi Sage/Sam,
As discussed in today's performance meeting , I am planning to change the queue_transactions() interface to the following.

  int queue_transactions(Sequencer *osr, list<TransactionRef>& tls,
                         Context *onreadable, Context *ondisk=0,
                         Context *onreadable_sync=0,
                         TrackedOpRef op = TrackedOpRef(),
                         ThreadPool::TPHandle *handle = NULL) ;

typedef unique_ptr<Transaction> TransactionRef;


IMO , there is a problem with this approach.

The interface like apply_transaction(), queue_transaction() etc. basically the interfaces taking single transaction pointer and internally forming a list to call the queue_transactions() also needs to be changed to accept TransactionRef which will be *bad*. The reason is while preparing list internally we need to move the uniqueue_ptr and callers won't be aware of that.

Also, now changing every interfaces (and callers) that is taking Transaction* will produce a very big delta (and big testing effort as well). 

So, should we *reconsider* co-existing both  queue_transactions() interfaces and call the new one from the IO path ?

Thanks & Regards
Somnath



--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 32+ messages in thread

* RE: queue_transaction interface + unique_ptr + performance
  2015-12-03  2:13 queue_transaction interface + unique_ptr + performance Somnath Roy
@ 2015-12-03  3:50 ` James (Fei) Liu-SSI
  2015-12-03  3:59   ` Somnath Roy
  2015-12-03  7:21   ` Somnath Roy
  2015-12-03  8:42 ` Piotr.Dalek
  2015-12-03 11:50 ` Sage Weil
  2 siblings, 2 replies; 32+ messages in thread
From: James (Fei) Liu-SSI @ 2015-12-03  3:50 UTC (permalink / raw)
  To: Somnath Roy, Sage Weil (sage@newdream.net),
	Samuel Just (sam.just@inktank.com)
  Cc: ceph-devel

Hi Somnath,
  Great findings. As you mentioned, unique_ptr and smart_ptr always have a well known problem about the performance comparing to conventional pointer , Got a chance to run your interesting code and close to what you find.
But I did not find any place in filestore/newstore (keyvaluestore and journal use them somewhere but not too heavily ) using smart pointer. Am I missing anything?

Thanks,
James

  ##### Test conventional ptr ######
start: %d secs, %d usecs
1449112986729634end: 1449113038 secs, 235140 usecs
micros_used for conventional ptr: 51505506
##### Test Unique Smart ptr ######
start: 1449113038 secs, 235186 usecs
end: 1449113158 secs, 825106 usecs
micros_used for unique ptr: 120589920
##### Test Shared Smart ptr ######
start: 1449113158 secs, 825130 usecs
end: 1449113322 secs, 132210 usecs
micros_used for shared ptr: 163307080
Existing..




-----Original Message-----
From: ceph-devel-owner@vger.kernel.org [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Somnath Roy
Sent: Wednesday, December 02, 2015 6:13 PM
To: Somnath Roy; Sage Weil (sage@newdream.net); Samuel Just (sam.just@inktank.com)
Cc: ceph-devel@vger.kernel.org
Subject: RE: queue_transaction interface + unique_ptr + performance

*Also*, in this way we are unnecessary adding another smart pointer overhead in the Ceph IO path.
As I communicated sometimes back (probably 2 years now :-) ) in the community, profiler is showing these smart pointers (shared_ptr) as one of the hot spot. Now, I decided to actually measure this..Here is my findings from a sample application and using jemalloc.

1.  First, I measured the performance difference of just creation and deletion of various pointers..Here is the result..

##### Test conventional ptr ######
start: 1449107326 secs, 903873 usecs
end: 1449107353 secs, 578709 usecs
micros_used for conventional ptr: 26674836

##### Test Unique Smart ptr ######
start: 1449107353 secs, 578764 usecs
end: 1449107438 secs, 835114 usecs
micros_used for unique ptr: 85256350

##### Test Shared Smart ptr ######
start: 1449107438 secs, 835155 usecs
end: 1449107543 secs, 285443 usecs
micros_used for shared ptr: 104450288

So, as you can see >3x degradation with unique_ptr and ~4x degradation with shared_ptr.
My sample application is single threaded and I can see from perf top lot of other smart_ptr related functions are popping up reducing the actual % of jemalloc cpu usage (thus causing a slowdown).


2. Next, I added pointer dereferencing in the code..Here is the result..

##### Test conventional ptr ######
start: 1449107850 secs, 500595 usecs
end: 1449107876 secs, 936586 usecs
micros_used for conventional ptr: 26435991 ##### Test Unique Smart ptr ######
start: 1449107876 secs, 936643 usecs
end: 1449107994 secs, 629418 usecs
micros_used for unique ptr: 117692775
##### Test Shared Smart ptr ######
start: 1449107994 secs, 629459 usecs
end: 1449108107 secs, 846052 usecs
micros_used for shared ptr: 113216593

This is interesting , not much change in case of conventional pointers but huge change for unique_ptr and some change for shared_ptr as well..So, now degradation for unique_ptr > 4X..This is probably inline with this http://stackoverflow.com/questions/8138284/about-unique-ptr-performances

3. I didn't measure the other stuff like std::move() , reference count in case of shared object etc. This will degrade the performance even more.

4. Here is the sample code in case anybody interested.

#include <iostream>
#include <memory>
#include <list>
#include <sys/time.h>
#include <stdint.h>

struct Foo { // object to manage
    Foo():xx(99),yy(999999) { /*std::cout << "Foo ctor\n";*/ }
    ~Foo() { /*std::cout << "~Foo dtor\n";*/ }
    int xx;
    long yy;
    char str[1024];
};
int main()
{
   struct timeval start, end;
   long secs_used,micros_used;
    printf("##### Test conventional ptr ######\n");
    gettimeofday(&start, NULL);
    for (uint64_t i = 0; i < 1000000000; i++) {
      Foo* f = new Foo();
      int xxx = f->xx;
      long yyy = f->yy;
      delete f;
    }
    gettimeofday(&end, NULL);

    printf("start: %d secs, %d usecs\n",start.tv_sec,start.tv_usec);
    printf("end: %d secs, %d usecs\n",end.tv_sec,end.tv_usec);

    secs_used=(end.tv_sec - start.tv_sec); //avoid overflow by subtracting first
    micros_used= ((secs_used*1000000) + end.tv_usec) - (start.tv_usec);

    printf("micros_used for conventional ptr: %d\n",micros_used);

    printf("##### Test Unique Smart ptr ######\n");

    gettimeofday(&start, NULL);
    for (uint64_t i = 0; i < 1000000000; i++) {
      std::unique_ptr<Foo> fu (new Foo());
      int xxx = fu->xx;
      long yyy = fu->yy;
    }
    gettimeofday(&end, NULL);

    printf("start: %d secs, %d usecs\n",start.tv_sec,start.tv_usec);
    printf("end: %d secs, %d usecs\n",end.tv_sec,end.tv_usec);

    secs_used=(end.tv_sec - start.tv_sec); //avoid overflow by subtracting first
    micros_used= ((secs_used*1000000) + end.tv_usec) - (start.tv_usec);

    printf("micros_used for unique ptr: %d\n",micros_used);

    printf("##### Test Shared Smart ptr ######\n");
    gettimeofday(&start, NULL);
    for (uint64_t i = 0; i < 1000000000; i++) {
      std::shared_ptr<Foo> fs (new Foo());
      int xxx = fs->xx;
      long yyy = fs->yy;
    }
    gettimeofday(&end, NULL);

    printf("start: %d secs, %d usecs\n",start.tv_sec,start.tv_usec);
    printf("end: %d secs, %d usecs\n",end.tv_sec,end.tv_usec);

    secs_used=(end.tv_sec - start.tv_sec); //avoid overflow by subtracting first
    micros_used= ((secs_used*1000000) + end.tv_usec) - (start.tv_usec);

    printf("micros_used for shared ptr: %d\n",micros_used);
    std::cout <<"Existing..\n";
    return 0;
}



So, my guess is, the heavy use of these smart pointers in the Ceph IO path is bringing iops/core down substantially.
My suggestion is *not to introduce* any smart pointers in the objectstore interface.

Thanks & Regards
Somnath

-----Original Message-----
From: ceph-devel-owner@vger.kernel.org [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Somnath Roy
Sent: Wednesday, December 02, 2015 4:18 PM
To: Sage Weil (sage@newdream.net); Samuel Just (sam.just@inktank.com)
Cc: ceph-devel@vger.kernel.org
Subject: queue_transaction interface + unique_ptr

Hi Sage/Sam,
As discussed in today's performance meeting , I am planning to change the queue_transactions() interface to the following.

  int queue_transactions(Sequencer *osr, list<TransactionRef>& tls,
                         Context *onreadable, Context *ondisk=0,
                         Context *onreadable_sync=0,
                         TrackedOpRef op = TrackedOpRef(),
                         ThreadPool::TPHandle *handle = NULL) ;

typedef unique_ptr<Transaction> TransactionRef;


IMO , there is a problem with this approach.

The interface like apply_transaction(), queue_transaction() etc. basically the interfaces taking single transaction pointer and internally forming a list to call the queue_transactions() also needs to be changed to accept TransactionRef which will be *bad*. The reason is while preparing list internally we need to move the uniqueue_ptr and callers won't be aware of that.

Also, now changing every interfaces (and callers) that is taking Transaction* will produce a very big delta (and big testing effort as well). 

So, should we *reconsider* co-existing both  queue_transactions() interfaces and call the new one from the IO path ?

Thanks & Regards
Somnath



--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 32+ messages in thread

* RE: queue_transaction interface + unique_ptr + performance
  2015-12-03  3:50 ` James (Fei) Liu-SSI
@ 2015-12-03  3:59   ` Somnath Roy
  2015-12-03  7:21   ` Somnath Roy
  1 sibling, 0 replies; 32+ messages in thread
From: Somnath Roy @ 2015-12-03  3:59 UTC (permalink / raw)
  To: James (Fei) Liu-SSI, Sage Weil (sage@newdream.net),
	Samuel Just (sam.just@inktank.com)
  Cc: ceph-devel

Thanks James for looking into this..
Shared_ptr used heavily in the OSD.cc/Replicated PG path..

Regards
Somnath

-----Original Message-----
From: James (Fei) Liu-SSI [mailto:james.liu@ssi.samsung.com] 
Sent: Wednesday, December 02, 2015 7:50 PM
To: Somnath Roy; Sage Weil (sage@newdream.net); Samuel Just (sam.just@inktank.com)
Cc: ceph-devel@vger.kernel.org
Subject: RE: queue_transaction interface + unique_ptr + performance

Hi Somnath,
  Great findings. As you mentioned, unique_ptr and smart_ptr always have a well known problem about the performance comparing to conventional pointer , Got a chance to run your interesting code and close to what you find.
But I did not find any place in filestore/newstore (keyvaluestore and journal use them somewhere but not too heavily ) using smart pointer. Am I missing anything?

Thanks,
James

  ##### Test conventional ptr ######
start: %d secs, %d usecs
1449112986729634end: 1449113038 secs, 235140 usecs micros_used for conventional ptr: 51505506 ##### Test Unique Smart ptr ######
start: 1449113038 secs, 235186 usecs
end: 1449113158 secs, 825106 usecs
micros_used for unique ptr: 120589920
##### Test Shared Smart ptr ######
start: 1449113158 secs, 825130 usecs
end: 1449113322 secs, 132210 usecs
micros_used for shared ptr: 163307080
Existing..




-----Original Message-----
From: ceph-devel-owner@vger.kernel.org [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Somnath Roy
Sent: Wednesday, December 02, 2015 6:13 PM
To: Somnath Roy; Sage Weil (sage@newdream.net); Samuel Just (sam.just@inktank.com)
Cc: ceph-devel@vger.kernel.org
Subject: RE: queue_transaction interface + unique_ptr + performance

*Also*, in this way we are unnecessary adding another smart pointer overhead in the Ceph IO path.
As I communicated sometimes back (probably 2 years now :-) ) in the community, profiler is showing these smart pointers (shared_ptr) as one of the hot spot. Now, I decided to actually measure this..Here is my findings from a sample application and using jemalloc.

1.  First, I measured the performance difference of just creation and deletion of various pointers..Here is the result..

##### Test conventional ptr ######
start: 1449107326 secs, 903873 usecs
end: 1449107353 secs, 578709 usecs
micros_used for conventional ptr: 26674836

##### Test Unique Smart ptr ######
start: 1449107353 secs, 578764 usecs
end: 1449107438 secs, 835114 usecs
micros_used for unique ptr: 85256350

##### Test Shared Smart ptr ######
start: 1449107438 secs, 835155 usecs
end: 1449107543 secs, 285443 usecs
micros_used for shared ptr: 104450288

So, as you can see >3x degradation with unique_ptr and ~4x degradation with shared_ptr.
My sample application is single threaded and I can see from perf top lot of other smart_ptr related functions are popping up reducing the actual % of jemalloc cpu usage (thus causing a slowdown).


2. Next, I added pointer dereferencing in the code..Here is the result..

##### Test conventional ptr ######
start: 1449107850 secs, 500595 usecs
end: 1449107876 secs, 936586 usecs
micros_used for conventional ptr: 26435991 ##### Test Unique Smart ptr ######
start: 1449107876 secs, 936643 usecs
end: 1449107994 secs, 629418 usecs
micros_used for unique ptr: 117692775
##### Test Shared Smart ptr ######
start: 1449107994 secs, 629459 usecs
end: 1449108107 secs, 846052 usecs
micros_used for shared ptr: 113216593

This is interesting , not much change in case of conventional pointers but huge change for unique_ptr and some change for shared_ptr as well..So, now degradation for unique_ptr > 4X..This is probably inline with this http://stackoverflow.com/questions/8138284/about-unique-ptr-performances

3. I didn't measure the other stuff like std::move() , reference count in case of shared object etc. This will degrade the performance even more.

4. Here is the sample code in case anybody interested.

#include <iostream>
#include <memory>
#include <list>
#include <sys/time.h>
#include <stdint.h>

struct Foo { // object to manage
    Foo():xx(99),yy(999999) { /*std::cout << "Foo ctor\n";*/ }
    ~Foo() { /*std::cout << "~Foo dtor\n";*/ }
    int xx;
    long yy;
    char str[1024];
};
int main()
{
   struct timeval start, end;
   long secs_used,micros_used;
    printf("##### Test conventional ptr ######\n");
    gettimeofday(&start, NULL);
    for (uint64_t i = 0; i < 1000000000; i++) {
      Foo* f = new Foo();
      int xxx = f->xx;
      long yyy = f->yy;
      delete f;
    }
    gettimeofday(&end, NULL);

    printf("start: %d secs, %d usecs\n",start.tv_sec,start.tv_usec);
    printf("end: %d secs, %d usecs\n",end.tv_sec,end.tv_usec);

    secs_used=(end.tv_sec - start.tv_sec); //avoid overflow by subtracting first
    micros_used= ((secs_used*1000000) + end.tv_usec) - (start.tv_usec);

    printf("micros_used for conventional ptr: %d\n",micros_used);

    printf("##### Test Unique Smart ptr ######\n");

    gettimeofday(&start, NULL);
    for (uint64_t i = 0; i < 1000000000; i++) {
      std::unique_ptr<Foo> fu (new Foo());
      int xxx = fu->xx;
      long yyy = fu->yy;
    }
    gettimeofday(&end, NULL);

    printf("start: %d secs, %d usecs\n",start.tv_sec,start.tv_usec);
    printf("end: %d secs, %d usecs\n",end.tv_sec,end.tv_usec);

    secs_used=(end.tv_sec - start.tv_sec); //avoid overflow by subtracting first
    micros_used= ((secs_used*1000000) + end.tv_usec) - (start.tv_usec);

    printf("micros_used for unique ptr: %d\n",micros_used);

    printf("##### Test Shared Smart ptr ######\n");
    gettimeofday(&start, NULL);
    for (uint64_t i = 0; i < 1000000000; i++) {
      std::shared_ptr<Foo> fs (new Foo());
      int xxx = fs->xx;
      long yyy = fs->yy;
    }
    gettimeofday(&end, NULL);

    printf("start: %d secs, %d usecs\n",start.tv_sec,start.tv_usec);
    printf("end: %d secs, %d usecs\n",end.tv_sec,end.tv_usec);

    secs_used=(end.tv_sec - start.tv_sec); //avoid overflow by subtracting first
    micros_used= ((secs_used*1000000) + end.tv_usec) - (start.tv_usec);

    printf("micros_used for shared ptr: %d\n",micros_used);
    std::cout <<"Existing..\n";
    return 0;
}



So, my guess is, the heavy use of these smart pointers in the Ceph IO path is bringing iops/core down substantially.
My suggestion is *not to introduce* any smart pointers in the objectstore interface.

Thanks & Regards
Somnath

-----Original Message-----
From: ceph-devel-owner@vger.kernel.org [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Somnath Roy
Sent: Wednesday, December 02, 2015 4:18 PM
To: Sage Weil (sage@newdream.net); Samuel Just (sam.just@inktank.com)
Cc: ceph-devel@vger.kernel.org
Subject: queue_transaction interface + unique_ptr

Hi Sage/Sam,
As discussed in today's performance meeting , I am planning to change the queue_transactions() interface to the following.

  int queue_transactions(Sequencer *osr, list<TransactionRef>& tls,
                         Context *onreadable, Context *ondisk=0,
                         Context *onreadable_sync=0,
                         TrackedOpRef op = TrackedOpRef(),
                         ThreadPool::TPHandle *handle = NULL) ;

typedef unique_ptr<Transaction> TransactionRef;


IMO , there is a problem with this approach.

The interface like apply_transaction(), queue_transaction() etc. basically the interfaces taking single transaction pointer and internally forming a list to call the queue_transactions() also needs to be changed to accept TransactionRef which will be *bad*. The reason is while preparing list internally we need to move the uniqueue_ptr and callers won't be aware of that.

Also, now changing every interfaces (and callers) that is taking Transaction* will produce a very big delta (and big testing effort as well). 

So, should we *reconsider* co-existing both  queue_transactions() interfaces and call the new one from the IO path ?

Thanks & Regards
Somnath



--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 32+ messages in thread

* RE: queue_transaction interface + unique_ptr + performance
  2015-12-03  3:50 ` James (Fei) Liu-SSI
  2015-12-03  3:59   ` Somnath Roy
@ 2015-12-03  7:21   ` Somnath Roy
  1 sibling, 0 replies; 32+ messages in thread
From: Somnath Roy @ 2015-12-03  7:21 UTC (permalink / raw)
  To: James (Fei) Liu-SSI, Sage Weil (sage@newdream.net),
	Samuel Just (sam.just@inktank.com)
  Cc: ceph-devel

 Huh..One update :
Earlier I didn't use any compiler optimization, now with O2  I got this..

##### Test conventional ptr ######
start: 1449126917 secs, 565820 usecs
end: 1449126940 secs, 227713 usecs
micros_used for conventional ptr: 22661893

##### Test Unique Smart ptr ######
start: 1449126940 secs, 227758 usecs
end: 1449126962 secs, 917930 usecs
micros_used for unique ptr: 22690172

##### Test Shared Smart ptr ######
start: 1449126962 secs, 917973 usecs
end: 1449127009 secs, 179995 usecs
micros_used for shared ptr: 46262022

So, unique_ptr is having now very little overhead and for shared_ptr overhead is >2X.

Thanks & Regards
Somnath

-----Original Message-----
From: Somnath Roy 
Sent: Wednesday, December 02, 2015 7:59 PM
To: 'James (Fei) Liu-SSI'; Sage Weil (sage@newdream.net); Samuel Just (sam.just@inktank.com)
Cc: ceph-devel@vger.kernel.org
Subject: RE: queue_transaction interface + unique_ptr + performance

Thanks James for looking into this..
Shared_ptr used heavily in the OSD.cc/Replicated PG path..

Regards
Somnath

-----Original Message-----
From: James (Fei) Liu-SSI [mailto:james.liu@ssi.samsung.com] 
Sent: Wednesday, December 02, 2015 7:50 PM
To: Somnath Roy; Sage Weil (sage@newdream.net); Samuel Just (sam.just@inktank.com)
Cc: ceph-devel@vger.kernel.org
Subject: RE: queue_transaction interface + unique_ptr + performance

Hi Somnath,
  Great findings. As you mentioned, unique_ptr and smart_ptr always have a well known problem about the performance comparing to conventional pointer , Got a chance to run your interesting code and close to what you find.
But I did not find any place in filestore/newstore (keyvaluestore and journal use them somewhere but not too heavily ) using smart pointer. Am I missing anything?

Thanks,
James

  ##### Test conventional ptr ######
start: %d secs, %d usecs
1449112986729634end: 1449113038 secs, 235140 usecs micros_used for conventional ptr: 51505506 ##### Test Unique Smart ptr ######
start: 1449113038 secs, 235186 usecs
end: 1449113158 secs, 825106 usecs
micros_used for unique ptr: 120589920
##### Test Shared Smart ptr ######
start: 1449113158 secs, 825130 usecs
end: 1449113322 secs, 132210 usecs
micros_used for shared ptr: 163307080
Existing..




-----Original Message-----
From: ceph-devel-owner@vger.kernel.org [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Somnath Roy
Sent: Wednesday, December 02, 2015 6:13 PM
To: Somnath Roy; Sage Weil (sage@newdream.net); Samuel Just (sam.just@inktank.com)
Cc: ceph-devel@vger.kernel.org
Subject: RE: queue_transaction interface + unique_ptr + performance

*Also*, in this way we are unnecessary adding another smart pointer overhead in the Ceph IO path.
As I communicated sometimes back (probably 2 years now :-) ) in the community, profiler is showing these smart pointers (shared_ptr) as one of the hot spot. Now, I decided to actually measure this..Here is my findings from a sample application and using jemalloc.

1.  First, I measured the performance difference of just creation and deletion of various pointers..Here is the result..

##### Test conventional ptr ######
start: 1449107326 secs, 903873 usecs
end: 1449107353 secs, 578709 usecs
micros_used for conventional ptr: 26674836

##### Test Unique Smart ptr ######
start: 1449107353 secs, 578764 usecs
end: 1449107438 secs, 835114 usecs
micros_used for unique ptr: 85256350

##### Test Shared Smart ptr ######
start: 1449107438 secs, 835155 usecs
end: 1449107543 secs, 285443 usecs
micros_used for shared ptr: 104450288

So, as you can see >3x degradation with unique_ptr and ~4x degradation with shared_ptr.
My sample application is single threaded and I can see from perf top lot of other smart_ptr related functions are popping up reducing the actual % of jemalloc cpu usage (thus causing a slowdown).


2. Next, I added pointer dereferencing in the code..Here is the result..

##### Test conventional ptr ######
start: 1449107850 secs, 500595 usecs
end: 1449107876 secs, 936586 usecs
micros_used for conventional ptr: 26435991 ##### Test Unique Smart ptr ######
start: 1449107876 secs, 936643 usecs
end: 1449107994 secs, 629418 usecs
micros_used for unique ptr: 117692775
##### Test Shared Smart ptr ######
start: 1449107994 secs, 629459 usecs
end: 1449108107 secs, 846052 usecs
micros_used for shared ptr: 113216593

This is interesting , not much change in case of conventional pointers but huge change for unique_ptr and some change for shared_ptr as well..So, now degradation for unique_ptr > 4X..This is probably inline with this http://stackoverflow.com/questions/8138284/about-unique-ptr-performances

3. I didn't measure the other stuff like std::move() , reference count in case of shared object etc. This will degrade the performance even more.

4. Here is the sample code in case anybody interested.

#include <iostream>
#include <memory>
#include <list>
#include <sys/time.h>
#include <stdint.h>

struct Foo { // object to manage
    Foo():xx(99),yy(999999) { /*std::cout << "Foo ctor\n";*/ }
    ~Foo() { /*std::cout << "~Foo dtor\n";*/ }
    int xx;
    long yy;
    char str[1024];
};
int main()
{
   struct timeval start, end;
   long secs_used,micros_used;
    printf("##### Test conventional ptr ######\n");
    gettimeofday(&start, NULL);
    for (uint64_t i = 0; i < 1000000000; i++) {
      Foo* f = new Foo();
      int xxx = f->xx;
      long yyy = f->yy;
      delete f;
    }
    gettimeofday(&end, NULL);

    printf("start: %d secs, %d usecs\n",start.tv_sec,start.tv_usec);
    printf("end: %d secs, %d usecs\n",end.tv_sec,end.tv_usec);

    secs_used=(end.tv_sec - start.tv_sec); //avoid overflow by subtracting first
    micros_used= ((secs_used*1000000) + end.tv_usec) - (start.tv_usec);

    printf("micros_used for conventional ptr: %d\n",micros_used);

    printf("##### Test Unique Smart ptr ######\n");

    gettimeofday(&start, NULL);
    for (uint64_t i = 0; i < 1000000000; i++) {
      std::unique_ptr<Foo> fu (new Foo());
      int xxx = fu->xx;
      long yyy = fu->yy;
    }
    gettimeofday(&end, NULL);

    printf("start: %d secs, %d usecs\n",start.tv_sec,start.tv_usec);
    printf("end: %d secs, %d usecs\n",end.tv_sec,end.tv_usec);

    secs_used=(end.tv_sec - start.tv_sec); //avoid overflow by subtracting first
    micros_used= ((secs_used*1000000) + end.tv_usec) - (start.tv_usec);

    printf("micros_used for unique ptr: %d\n",micros_used);

    printf("##### Test Shared Smart ptr ######\n");
    gettimeofday(&start, NULL);
    for (uint64_t i = 0; i < 1000000000; i++) {
      std::shared_ptr<Foo> fs (new Foo());
      int xxx = fs->xx;
      long yyy = fs->yy;
    }
    gettimeofday(&end, NULL);

    printf("start: %d secs, %d usecs\n",start.tv_sec,start.tv_usec);
    printf("end: %d secs, %d usecs\n",end.tv_sec,end.tv_usec);

    secs_used=(end.tv_sec - start.tv_sec); //avoid overflow by subtracting first
    micros_used= ((secs_used*1000000) + end.tv_usec) - (start.tv_usec);

    printf("micros_used for shared ptr: %d\n",micros_used);
    std::cout <<"Existing..\n";
    return 0;
}



So, my guess is, the heavy use of these smart pointers in the Ceph IO path is bringing iops/core down substantially.
My suggestion is *not to introduce* any smart pointers in the objectstore interface.

Thanks & Regards
Somnath

-----Original Message-----
From: ceph-devel-owner@vger.kernel.org [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Somnath Roy
Sent: Wednesday, December 02, 2015 4:18 PM
To: Sage Weil (sage@newdream.net); Samuel Just (sam.just@inktank.com)
Cc: ceph-devel@vger.kernel.org
Subject: queue_transaction interface + unique_ptr

Hi Sage/Sam,
As discussed in today's performance meeting , I am planning to change the queue_transactions() interface to the following.

  int queue_transactions(Sequencer *osr, list<TransactionRef>& tls,
                         Context *onreadable, Context *ondisk=0,
                         Context *onreadable_sync=0,
                         TrackedOpRef op = TrackedOpRef(),
                         ThreadPool::TPHandle *handle = NULL) ;

typedef unique_ptr<Transaction> TransactionRef;


IMO , there is a problem with this approach.

The interface like apply_transaction(), queue_transaction() etc. basically the interfaces taking single transaction pointer and internally forming a list to call the queue_transactions() also needs to be changed to accept TransactionRef which will be *bad*. The reason is while preparing list internally we need to move the uniqueue_ptr and callers won't be aware of that.

Also, now changing every interfaces (and callers) that is taking Transaction* will produce a very big delta (and big testing effort as well). 

So, should we *reconsider* co-existing both  queue_transactions() interfaces and call the new one from the IO path ?

Thanks & Regards
Somnath



--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 32+ messages in thread

* RE: queue_transaction interface + unique_ptr + performance
  2015-12-03  2:13 queue_transaction interface + unique_ptr + performance Somnath Roy
  2015-12-03  3:50 ` James (Fei) Liu-SSI
@ 2015-12-03  8:42 ` Piotr.Dalek
  2015-12-03 11:50 ` Sage Weil
  2 siblings, 0 replies; 32+ messages in thread
From: Piotr.Dalek @ 2015-12-03  8:42 UTC (permalink / raw)
  To: Somnath Roy, Sage Weil (sage@newdream.net),
	Samuel Just (sam.just@inktank.com)
  Cc: ceph-devel

> -----Original Message-----
> From: ceph-devel-owner@vger.kernel.org [mailto:ceph-devel-
> owner@vger.kernel.org] On Behalf Of Somnath Roy
> Sent: Thursday, December 03, 2015 3:13 AM
> 
> *Also*, in this way we are unnecessary adding another smart pointer
> overhead in the Ceph IO path.
> As I communicated sometimes back (probably 2 years now :-) ) in the
> community, profiler is showing these smart pointers (shared_ptr) as one of
> the hot spot. Now, I decided to actually measure this..Here is my findings
> from a sample application and using jemalloc.
>
> 1.  First, I measured the performance difference of just creation and deletion
> of various pointers..Here is the result..
> 
> [..]
> int main()
> {
>    struct timeval start, end;
>    long secs_used,micros_used;
>     printf("##### Test conventional ptr ######\n");
>     gettimeofday(&start, NULL);

Don't use gettimeofday, because it returns current wall time, which may be adjusted any second by ntpd. Use clock_gettime with CLOCK_MONOTONIC or CLOCK_MONOTONIC_RAW.

>     for (uint64_t i = 0; i < 1000000000; i++) {
>       Foo* f = new Foo();
>       int xxx = f->xx;
>       long yyy = f->yy;
>       delete f;
>     }
>     gettimeofday(&end, NULL);

>     printf("start: %d secs, %d usecs\n",start.tv_sec,start.tv_usec);
>     printf("end: %d secs, %d usecs\n",end.tv_sec,end.tv_usec);
>     secs_used=(end.tv_sec - start.tv_sec); //avoid overflow by subtracting
> first
>     micros_used= ((secs_used*1000000) + end.tv_usec) - (start.tv_usec);
> 
>     printf("micros_used for conventional ptr: %d\n",micros_used);

You calculate *total* time which is subject to drift due to context switches, interrupts, and other stuff getting in the way. Divide that by 1000000000 (times you ran your loop) to get average time per loop, which is way more convincing. Then, count the number of actual uses of particular thing and multiply by average times to get reliaible info whether the perf gain outweighs reasons why shared_ptrs and unique_ptrs are used in the first place.

I actually redid your testing and I can confirm that creation and freeing of shared_ptrs is >3x slower than simple or unique_ptrs on g++ -std=c++11 -O2:

##### Test conventional ptr ######         
nanoseconds used by conventional ptr: 66.33
##### Test Unique Smart ptr ######         
nanoseconds used by unique_ptr: 66.94      
##### Test Shared Smart ptr ######         
nanoseconds used by shared_ptr: 202.22     

Also, accessing data through those pointers costs basically the same.
But are we creating and destroying that many shared_ptrs for this to become actual problem? We may gain performance, but at (high) cost of fighting memory leaks and dangling pointers. I'm not sure if it's worth it in long run.


With best regards / Pozdrawiam
Piotr Dałek


--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 32+ messages in thread

* RE: queue_transaction interface + unique_ptr + performance
  2015-12-03  2:13 queue_transaction interface + unique_ptr + performance Somnath Roy
  2015-12-03  3:50 ` James (Fei) Liu-SSI
  2015-12-03  8:42 ` Piotr.Dalek
@ 2015-12-03 11:50 ` Sage Weil
  2015-12-03 15:16   ` Casey Bodley
                     ` (2 more replies)
  2 siblings, 3 replies; 32+ messages in thread
From: Sage Weil @ 2015-12-03 11:50 UTC (permalink / raw)
  To: Somnath Roy; +Cc: Samuel Just (sam.just@inktank.com), ceph-devel

1- I agree we should avoid shared_ptr whenever possible.

2- unique_ptr should not have any more overhead than a raw pointer--the 
compiler is enforcing the single-owner semantics.  See for example

	https://msdn.microsoft.com/en-us/library/hh279676.aspx

"It is exactly is efficient as a raw pointer and can be used in STL 
containers."

Unless the implementation is broken somehow?  That seems unlikely...

sage


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: queue_transaction interface + unique_ptr + performance
  2015-12-03 11:50 ` Sage Weil
@ 2015-12-03 15:16   ` Casey Bodley
  2015-12-03 16:52     ` Somnath Roy
  2015-12-03 17:02   ` Somnath Roy
  2015-12-03 19:04   ` Matt Benjamin
  2 siblings, 1 reply; 32+ messages in thread
From: Casey Bodley @ 2015-12-03 15:16 UTC (permalink / raw)
  To: Sage Weil; +Cc: Somnath Roy, Samuel Just (sam.just@inktank.com), ceph-devel

Hi,

To those writing benchmarks, please note that the standard library
provides std::make_shared() as an optimization for shared_ptr. It
creates the object and its shared storage in a single allocation,
instead of the two allocations required for
"std::shared_ptr<Foo> fs (new Foo())".

Casey

----- Original Message -----
> 1- I agree we should avoid shared_ptr whenever possible.
> 
> 2- unique_ptr should not have any more overhead than a raw pointer--the
> compiler is enforcing the single-owner semantics.  See for example
> 
> 	https://msdn.microsoft.com/en-us/library/hh279676.aspx
> 
> "It is exactly is efficient as a raw pointer and can be used in STL
> containers."
> 
> Unless the implementation is broken somehow?  That seems unlikely...
> 
> sage
> 
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

^ permalink raw reply	[flat|nested] 32+ messages in thread

* RE: queue_transaction interface + unique_ptr + performance
  2015-12-03 15:16   ` Casey Bodley
@ 2015-12-03 16:52     ` Somnath Roy
  2015-12-03 17:17       ` Adam C. Emerson
  0 siblings, 1 reply; 32+ messages in thread
From: Somnath Roy @ 2015-12-03 16:52 UTC (permalink / raw)
  To: Casey Bodley, Sage Weil; +Cc: Samuel Just (sam.just@inktank.com), ceph-devel

I don't think make_shared / make_unique is part of c++11 (and ceph is using that). It is part of c++14 I guess..

Thanks & Regards
Somnath

-----Original Message-----
From: Casey Bodley [mailto:cbodley@redhat.com] 
Sent: Thursday, December 03, 2015 7:17 AM
To: Sage Weil
Cc: Somnath Roy; Samuel Just (sam.just@inktank.com); ceph-devel@vger.kernel.org
Subject: Re: queue_transaction interface + unique_ptr + performance

Hi,

To those writing benchmarks, please note that the standard library provides std::make_shared() as an optimization for shared_ptr. It creates the object and its shared storage in a single allocation, instead of the two allocations required for "std::shared_ptr<Foo> fs (new Foo())".

Casey

----- Original Message -----
> 1- I agree we should avoid shared_ptr whenever possible.
> 
> 2- unique_ptr should not have any more overhead than a raw 
> pointer--the compiler is enforcing the single-owner semantics.  See 
> for example
> 
> 	https://msdn.microsoft.com/en-us/library/hh279676.aspx
> 
> "It is exactly is efficient as a raw pointer and can be used in STL 
> containers."
> 
> Unless the implementation is broken somehow?  That seems unlikely...
> 
> sage
> 
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" 
> in the body of a message to majordomo@vger.kernel.org More majordomo 
> info at  http://vger.kernel.org/majordomo-info.html
> 

^ permalink raw reply	[flat|nested] 32+ messages in thread

* RE: queue_transaction interface + unique_ptr + performance
  2015-12-03 11:50 ` Sage Weil
  2015-12-03 15:16   ` Casey Bodley
@ 2015-12-03 17:02   ` Somnath Roy
  2015-12-03 17:25     ` Adam C. Emerson
  2015-12-03 19:04   ` Matt Benjamin
  2 siblings, 1 reply; 32+ messages in thread
From: Somnath Roy @ 2015-12-03 17:02 UTC (permalink / raw)
  To: Sage Weil; +Cc: Samuel Just (sam.just@inktank.com), ceph-devel

Yes, I posted the new result after adding -O2 in the compiler flag and it shows almost no overhead with unique_ptr.
I will add the test of adding to list overhead and start implementing the new interface.
But, regarding my other point of changing all the objecstore interfaces (my first mail on this mail chain in case you have missed) taking Transaction, any thought of that ?
Should we reconsider having two queue_transaction interface ?

Thanks & Regards
Somnath

-----Original Message-----
From: Sage Weil [mailto:sage@newdream.net] 
Sent: Thursday, December 03, 2015 3:50 AM
To: Somnath Roy
Cc: Samuel Just (sam.just@inktank.com); ceph-devel@vger.kernel.org
Subject: RE: queue_transaction interface + unique_ptr + performance

1- I agree we should avoid shared_ptr whenever possible.

2- unique_ptr should not have any more overhead than a raw pointer--the compiler is enforcing the single-owner semantics.  See for example

	https://msdn.microsoft.com/en-us/library/hh279676.aspx

"It is exactly is efficient as a raw pointer and can be used in STL containers."

Unless the implementation is broken somehow?  That seems unlikely...

sage


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: queue_transaction interface + unique_ptr + performance
  2015-12-03 16:52     ` Somnath Roy
@ 2015-12-03 17:17       ` Adam C. Emerson
  2015-12-03 17:40         ` Somnath Roy
  0 siblings, 1 reply; 32+ messages in thread
From: Adam C. Emerson @ 2015-12-03 17:17 UTC (permalink / raw)
  To: Somnath Roy
  Cc: Casey Bodley, Sage Weil, Samuel Just (sam.just@inktank.com), ceph-devel

[-- Attachment #1: Type: text/plain, Size: 621 bytes --]

On 03/12/2015, Somnath Roy wrote:
> I don't think make_shared / make_unique is part of c++11 (and ceph is using that). It is part of c++14 I guess..

std::make_shared is in C++11, std::make_unique is in C++14. In addition to
performance, std::make_shared is also more correct, in that:

  std::shared_ptr<T>(new T)

can leak object allocated by 'new T' if the allocation for the shared_ptr
fails.

std::make_unique is nice for cleanliness and symmetry, but doesn't add anything
for performance or safety. (Also you can write your own make_unique, where
make_shared is intimately involved with the details of shared_ptr.)

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 603 bytes --]

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: queue_transaction interface + unique_ptr + performance
  2015-12-03 17:02   ` Somnath Roy
@ 2015-12-03 17:25     ` Adam C. Emerson
  2015-12-03 17:48       ` Somnath Roy
  0 siblings, 1 reply; 32+ messages in thread
From: Adam C. Emerson @ 2015-12-03 17:25 UTC (permalink / raw)
  To: Somnath Roy; +Cc: Sage Weil, Samuel Just (sam.just@inktank.com), ceph-devel

[-- Attachment #1: Type: text/plain, Size: 1106 bytes --]

On 03/12/2015, Somnath Roy wrote:
> Yes, I posted the new result after adding -O2 in the compiler flag and it shows almost no overhead with unique_ptr.
> I will add the test of adding to list overhead and start implementing the new interface.
> But, regarding my other point of changing all the objecstore interfaces (my first mail on this mail chain in case you have missed) taking Transaction, any thought of that ?
> Should we reconsider having two queue_transaction interface ?

As I understand it, the concern with switching to unique_ptr was that the callee
would move from the reference without this being known to the caller.

Would it make sense to pass as an RValue reference (i.e. TransactionRef&&)? That
way the compiler should demand that the callers explicitly use std::move on the
reference they're holding, documenting at the site of the call that they're
willing to give up ownership.


-- 
Senior Software Engineer           Red Hat Storage, Ann Arbor, MI, US
IRC: Aemerson@{RedHat, OFTC, Freenode}
0x80F7544B90EDBFB9 E707 86BA 0C1B 62CC 152C  7C12 80F7 544B 90ED BFB9

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 603 bytes --]

^ permalink raw reply	[flat|nested] 32+ messages in thread

* RE: queue_transaction interface + unique_ptr + performance
  2015-12-03 17:17       ` Adam C. Emerson
@ 2015-12-03 17:40         ` Somnath Roy
  2015-12-03 17:48           ` Samuel Just
  0 siblings, 1 reply; 32+ messages in thread
From: Somnath Roy @ 2015-12-03 17:40 UTC (permalink / raw)
  To: Adam C. Emerson
  Cc: Casey Bodley, Sage Weil, Samuel Just (sam.just@inktank.com), ceph-devel

Hmm..Thanks Adam, so, one of the cleanups on the Ceph path could be to introduce make_shared..
We will come up with a prototype of optimized io path with removing unnecessary use of shared_ptr and see how much overall gain we are getting..

Thanks & Regards
Somnath

-----Original Message-----
From: Adam C. Emerson [mailto:aemerson@redhat.com] 
Sent: Thursday, December 03, 2015 9:17 AM
To: Somnath Roy
Cc: Casey Bodley; Sage Weil; Samuel Just (sam.just@inktank.com); ceph-devel@vger.kernel.org
Subject: Re: queue_transaction interface + unique_ptr + performance

On 03/12/2015, Somnath Roy wrote:
> I don't think make_shared / make_unique is part of c++11 (and ceph is using that). It is part of c++14 I guess..

std::make_shared is in C++11, std::make_unique is in C++14. In addition to performance, std::make_shared is also more correct, in that:

  std::shared_ptr<T>(new T)

can leak object allocated by 'new T' if the allocation for the shared_ptr fails.

std::make_unique is nice for cleanliness and symmetry, but doesn't add anything for performance or safety. (Also you can write your own make_unique, where make_shared is intimately involved with the details of shared_ptr.)

^ permalink raw reply	[flat|nested] 32+ messages in thread

* RE: queue_transaction interface + unique_ptr + performance
  2015-12-03 17:25     ` Adam C. Emerson
@ 2015-12-03 17:48       ` Somnath Roy
  2015-12-03 17:50         ` Samuel Just
  0 siblings, 1 reply; 32+ messages in thread
From: Somnath Roy @ 2015-12-03 17:48 UTC (permalink / raw)
  To: Adam C. Emerson; +Cc: Sage Weil, Samuel Just (sam.just@inktank.com), ceph-devel

Yes, that we can do..But, in that case aren't we restricting user if they want to do something with this Transaction object later. I didn't go through each and every part of the code yet (which is huge) that are using these interfaces to understand if it is using Transaction object afterwards.

Thanks & Regards
Somnath
-----Original Message-----
From: Adam C. Emerson [mailto:aemerson@redhat.com] 
Sent: Thursday, December 03, 2015 9:25 AM
To: Somnath Roy
Cc: Sage Weil; Samuel Just (sam.just@inktank.com); ceph-devel@vger.kernel.org
Subject: Re: queue_transaction interface + unique_ptr + performance

On 03/12/2015, Somnath Roy wrote:
> Yes, I posted the new result after adding -O2 in the compiler flag and it shows almost no overhead with unique_ptr.
> I will add the test of adding to list overhead and start implementing the new interface.
> But, regarding my other point of changing all the objecstore interfaces (my first mail on this mail chain in case you have missed) taking Transaction, any thought of that ?
> Should we reconsider having two queue_transaction interface ?

As I understand it, the concern with switching to unique_ptr was that the callee would move from the reference without this being known to the caller.

Would it make sense to pass as an RValue reference (i.e. TransactionRef&&)? That way the compiler should demand that the callers explicitly use std::move on the reference they're holding, documenting at the site of the call that they're willing to give up ownership.


-- 
Senior Software Engineer           Red Hat Storage, Ann Arbor, MI, US
IRC: Aemerson@{RedHat, OFTC, Freenode}
0x80F7544B90EDBFB9 E707 86BA 0C1B 62CC 152C  7C12 80F7 544B 90ED BFB9

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: queue_transaction interface + unique_ptr + performance
  2015-12-03 17:40         ` Somnath Roy
@ 2015-12-03 17:48           ` Samuel Just
  0 siblings, 0 replies; 32+ messages in thread
From: Samuel Just @ 2015-12-03 17:48 UTC (permalink / raw)
  To: Somnath Roy
  Cc: Adam C. Emerson, Casey Bodley, Sage Weil,
	Samuel Just (sam.just@inktank.com),
	ceph-devel

I still want there to be a single instance.  You need to have the
interface take TransactionRef by rvalue ref to make that work
(TransactionRef &&) as Adam mentioned.  This forces the caller to
either pass an rvalue or use std::move() -- which forces it to be
explicit.  The point of unique_ptr is to enforces single ownership
(any current violation of which would be a serious bug) with minimal
overhead.  I suspect that the performance issue you saw initially was
that the compiler simply wasn't in-lining the construction?
shared_ptr is of course more expensive due to the need to maintain a
thread safe ref count.  Any user which actually needs the ref count
would probably be just as slow with a hand-rolled implementation
(assuming we use make_shared).  Any user which doesn't actually need
the ref count could be switched to use unique_ptr.
-Sam

On Thu, Dec 3, 2015 at 9:40 AM, Somnath Roy <Somnath.Roy@sandisk.com> wrote:
> Hmm..Thanks Adam, so, one of the cleanups on the Ceph path could be to introduce make_shared..
> We will come up with a prototype of optimized io path with removing unnecessary use of shared_ptr and see how much overall gain we are getting..
>
> Thanks & Regards
> Somnath
>
> -----Original Message-----
> From: Adam C. Emerson [mailto:aemerson@redhat.com]
> Sent: Thursday, December 03, 2015 9:17 AM
> To: Somnath Roy
> Cc: Casey Bodley; Sage Weil; Samuel Just (sam.just@inktank.com); ceph-devel@vger.kernel.org
> Subject: Re: queue_transaction interface + unique_ptr + performance
>
> On 03/12/2015, Somnath Roy wrote:
>> I don't think make_shared / make_unique is part of c++11 (and ceph is using that). It is part of c++14 I guess..
>
> std::make_shared is in C++11, std::make_unique is in C++14. In addition to performance, std::make_shared is also more correct, in that:
>
>   std::shared_ptr<T>(new T)
>
> can leak object allocated by 'new T' if the allocation for the shared_ptr fails.
>
> std::make_unique is nice for cleanliness and symmetry, but doesn't add anything for performance or safety. (Also you can write your own make_unique, where make_shared is intimately involved with the details of shared_ptr.)

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: queue_transaction interface + unique_ptr + performance
  2015-12-03 17:48       ` Somnath Roy
@ 2015-12-03 17:50         ` Samuel Just
  2015-12-03 17:52           ` Samuel Just
  2015-12-03 17:53           ` Somnath Roy
  0 siblings, 2 replies; 32+ messages in thread
From: Samuel Just @ 2015-12-03 17:50 UTC (permalink / raw)
  To: Somnath Roy
  Cc: Adam C. Emerson, Sage Weil, Samuel Just (sam.just@inktank.com),
	ceph-devel

As far as I know, there are no current users which want to use the
Transaction later.  You could also change apply_transaction to copy
the Transaction into a unique_ptr since I don't think it's used in any
performance sensitive code paths.
-Sam

On Thu, Dec 3, 2015 at 9:48 AM, Somnath Roy <Somnath.Roy@sandisk.com> wrote:
> Yes, that we can do..But, in that case aren't we restricting user if they want to do something with this Transaction object later. I didn't go through each and every part of the code yet (which is huge) that are using these interfaces to understand if it is using Transaction object afterwards.
>
> Thanks & Regards
> Somnath
> -----Original Message-----
> From: Adam C. Emerson [mailto:aemerson@redhat.com]
> Sent: Thursday, December 03, 2015 9:25 AM
> To: Somnath Roy
> Cc: Sage Weil; Samuel Just (sam.just@inktank.com); ceph-devel@vger.kernel.org
> Subject: Re: queue_transaction interface + unique_ptr + performance
>
> On 03/12/2015, Somnath Roy wrote:
>> Yes, I posted the new result after adding -O2 in the compiler flag and it shows almost no overhead with unique_ptr.
>> I will add the test of adding to list overhead and start implementing the new interface.
>> But, regarding my other point of changing all the objecstore interfaces (my first mail on this mail chain in case you have missed) taking Transaction, any thought of that ?
>> Should we reconsider having two queue_transaction interface ?
>
> As I understand it, the concern with switching to unique_ptr was that the callee would move from the reference without this being known to the caller.
>
> Would it make sense to pass as an RValue reference (i.e. TransactionRef&&)? That way the compiler should demand that the callers explicitly use std::move on the reference they're holding, documenting at the site of the call that they're willing to give up ownership.
>
>
> --
> Senior Software Engineer           Red Hat Storage, Ann Arbor, MI, US
> IRC: Aemerson@{RedHat, OFTC, Freenode}
> 0x80F7544B90EDBFB9 E707 86BA 0C1B 62CC 152C  7C12 80F7 544B 90ED BFB9

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: queue_transaction interface + unique_ptr + performance
  2015-12-03 17:50         ` Samuel Just
@ 2015-12-03 17:52           ` Samuel Just
  2015-12-03 17:53           ` Somnath Roy
  1 sibling, 0 replies; 32+ messages in thread
From: Samuel Just @ 2015-12-03 17:52 UTC (permalink / raw)
  To: Somnath Roy
  Cc: Adam C. Emerson, Sage Weil, Samuel Just (sam.just@inktank.com),
	ceph-devel

To be clear, I'm ok with having a queue_transaction which takes a
Transaction by value and moves it into a dynamic unique_ptr controlled
instance, but I don't want the filestore passing a naked pointer
around internally with a flag indicating whether it needs to be
deleted by the filestore.  If there is an actual user which requires
the transaction to stick around afterwards *and* cannot afford to copy
it, then we can talk about a design which accomplishes that.
-Sam

On Thu, Dec 3, 2015 at 9:50 AM, Samuel Just <sjust@redhat.com> wrote:
> As far as I know, there are no current users which want to use the
> Transaction later.  You could also change apply_transaction to copy
> the Transaction into a unique_ptr since I don't think it's used in any
> performance sensitive code paths.
> -Sam
>
> On Thu, Dec 3, 2015 at 9:48 AM, Somnath Roy <Somnath.Roy@sandisk.com> wrote:
>> Yes, that we can do..But, in that case aren't we restricting user if they want to do something with this Transaction object later. I didn't go through each and every part of the code yet (which is huge) that are using these interfaces to understand if it is using Transaction object afterwards.
>>
>> Thanks & Regards
>> Somnath
>> -----Original Message-----
>> From: Adam C. Emerson [mailto:aemerson@redhat.com]
>> Sent: Thursday, December 03, 2015 9:25 AM
>> To: Somnath Roy
>> Cc: Sage Weil; Samuel Just (sam.just@inktank.com); ceph-devel@vger.kernel.org
>> Subject: Re: queue_transaction interface + unique_ptr + performance
>>
>> On 03/12/2015, Somnath Roy wrote:
>>> Yes, I posted the new result after adding -O2 in the compiler flag and it shows almost no overhead with unique_ptr.
>>> I will add the test of adding to list overhead and start implementing the new interface.
>>> But, regarding my other point of changing all the objecstore interfaces (my first mail on this mail chain in case you have missed) taking Transaction, any thought of that ?
>>> Should we reconsider having two queue_transaction interface ?
>>
>> As I understand it, the concern with switching to unique_ptr was that the callee would move from the reference without this being known to the caller.
>>
>> Would it make sense to pass as an RValue reference (i.e. TransactionRef&&)? That way the compiler should demand that the callers explicitly use std::move on the reference they're holding, documenting at the site of the call that they're willing to give up ownership.
>>
>>
>> --
>> Senior Software Engineer           Red Hat Storage, Ann Arbor, MI, US
>> IRC: Aemerson@{RedHat, OFTC, Freenode}
>> 0x80F7544B90EDBFB9 E707 86BA 0C1B 62CC 152C  7C12 80F7 544B 90ED BFB9

^ permalink raw reply	[flat|nested] 32+ messages in thread

* RE: queue_transaction interface + unique_ptr + performance
  2015-12-03 17:50         ` Samuel Just
  2015-12-03 17:52           ` Samuel Just
@ 2015-12-03 17:53           ` Somnath Roy
  2015-12-03 19:09             ` Casey Bodley
  1 sibling, 1 reply; 32+ messages in thread
From: Somnath Roy @ 2015-12-03 17:53 UTC (permalink / raw)
  To: Samuel Just
  Cc: Adam C. Emerson, Sage Weil, Samuel Just (sam.just@inktank.com),
	ceph-devel

Yes, if we can do that, that will be far more easier..I will double check that if apply_transaction is been used from any of the performance sensitive path and do the changes accordingly..Thanks..

-----Original Message-----
From: Samuel Just [mailto:sjust@redhat.com] 
Sent: Thursday, December 03, 2015 9:51 AM
To: Somnath Roy
Cc: Adam C. Emerson; Sage Weil; Samuel Just (sam.just@inktank.com); ceph-devel@vger.kernel.org
Subject: Re: queue_transaction interface + unique_ptr + performance

As far as I know, there are no current users which want to use the Transaction later.  You could also change apply_transaction to copy the Transaction into a unique_ptr since I don't think it's used in any performance sensitive code paths.
-Sam

On Thu, Dec 3, 2015 at 9:48 AM, Somnath Roy <Somnath.Roy@sandisk.com> wrote:
> Yes, that we can do..But, in that case aren't we restricting user if they want to do something with this Transaction object later. I didn't go through each and every part of the code yet (which is huge) that are using these interfaces to understand if it is using Transaction object afterwards.
>
> Thanks & Regards
> Somnath
> -----Original Message-----
> From: Adam C. Emerson [mailto:aemerson@redhat.com]
> Sent: Thursday, December 03, 2015 9:25 AM
> To: Somnath Roy
> Cc: Sage Weil; Samuel Just (sam.just@inktank.com); 
> ceph-devel@vger.kernel.org
> Subject: Re: queue_transaction interface + unique_ptr + performance
>
> On 03/12/2015, Somnath Roy wrote:
>> Yes, I posted the new result after adding -O2 in the compiler flag and it shows almost no overhead with unique_ptr.
>> I will add the test of adding to list overhead and start implementing the new interface.
>> But, regarding my other point of changing all the objecstore interfaces (my first mail on this mail chain in case you have missed) taking Transaction, any thought of that ?
>> Should we reconsider having two queue_transaction interface ?
>
> As I understand it, the concern with switching to unique_ptr was that the callee would move from the reference without this being known to the caller.
>
> Would it make sense to pass as an RValue reference (i.e. TransactionRef&&)? That way the compiler should demand that the callers explicitly use std::move on the reference they're holding, documenting at the site of the call that they're willing to give up ownership.
>
>
> --
> Senior Software Engineer           Red Hat Storage, Ann Arbor, MI, US
> IRC: Aemerson@{RedHat, OFTC, Freenode}
> 0x80F7544B90EDBFB9 E707 86BA 0C1B 62CC 152C  7C12 80F7 544B 90ED BFB9

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: queue_transaction interface + unique_ptr + performance
  2015-12-03 11:50 ` Sage Weil
  2015-12-03 15:16   ` Casey Bodley
  2015-12-03 17:02   ` Somnath Roy
@ 2015-12-03 19:04   ` Matt Benjamin
  2 siblings, 0 replies; 32+ messages in thread
From: Matt Benjamin @ 2015-12-03 19:04 UTC (permalink / raw)
  To: Sage Weil; +Cc: Somnath Roy, Samuel Just (sam.just@inktank.com), ceph-devel

++ #1

----- Original Message -----
> From: "Sage Weil" <sage@newdream.net>
> To: "Somnath Roy" <Somnath.Roy@sandisk.com>
> Cc: "Samuel Just (sam.just@inktank.com)" <sam.just@inktank.com>, ceph-devel@vger.kernel.org
> Sent: Thursday, December 3, 2015 6:50:26 AM
> Subject: RE: queue_transaction interface + unique_ptr + performance
> 
> 1- I agree we should avoid shared_ptr whenever possible.
> 
> 
> sage

-- 
-- 
Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103

http://www.redhat.com/en/technologies/storage

tel.  734-707-0660
fax.  734-769-8938
cel.  734-216-5309

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: queue_transaction interface + unique_ptr + performance
  2015-12-03 17:53           ` Somnath Roy
@ 2015-12-03 19:09             ` Casey Bodley
  2015-12-03 19:12               ` Adam C. Emerson
  0 siblings, 1 reply; 32+ messages in thread
From: Casey Bodley @ 2015-12-03 19:09 UTC (permalink / raw)
  To: Somnath Roy
  Cc: Samuel Just, Adam C. Emerson, Sage Weil,
	Samuel Just (sam.just@inktank.com),
	ceph-devel

After more discussion with Adam, it seems that making the Transaction
object itself a movable type could alleviate all of these concerns about
heap allocations, ownership, and lifetime.

The queue_transactions() interface could take a container of Transactions,
rather than pointers to Transactions, and the ObjectStore would move them
out of the container into whatever representation it prefers.

Casey

----- Original Message -----
> Yes, if we can do that, that will be far more easier..I will double check
> that if apply_transaction is been used from any of the performance sensitive
> path and do the changes accordingly..Thanks..
> 
> -----Original Message-----
> From: Samuel Just [mailto:sjust@redhat.com]
> Sent: Thursday, December 03, 2015 9:51 AM
> To: Somnath Roy
> Cc: Adam C. Emerson; Sage Weil; Samuel Just (sam.just@inktank.com);
> ceph-devel@vger.kernel.org
> Subject: Re: queue_transaction interface + unique_ptr + performance
> 
> As far as I know, there are no current users which want to use the
> Transaction later.  You could also change apply_transaction to copy the
> Transaction into a unique_ptr since I don't think it's used in any
> performance sensitive code paths.
> -Sam
> 
> On Thu, Dec 3, 2015 at 9:48 AM, Somnath Roy <Somnath.Roy@sandisk.com> wrote:
> > Yes, that we can do..But, in that case aren't we restricting user if they
> > want to do something with this Transaction object later. I didn't go
> > through each and every part of the code yet (which is huge) that are using
> > these interfaces to understand if it is using Transaction object
> > afterwards.
> >
> > Thanks & Regards
> > Somnath
> > -----Original Message-----
> > From: Adam C. Emerson [mailto:aemerson@redhat.com]
> > Sent: Thursday, December 03, 2015 9:25 AM
> > To: Somnath Roy
> > Cc: Sage Weil; Samuel Just (sam.just@inktank.com);
> > ceph-devel@vger.kernel.org
> > Subject: Re: queue_transaction interface + unique_ptr + performance
> >
> > On 03/12/2015, Somnath Roy wrote:
> >> Yes, I posted the new result after adding -O2 in the compiler flag and it
> >> shows almost no overhead with unique_ptr.
> >> I will add the test of adding to list overhead and start implementing the
> >> new interface.
> >> But, regarding my other point of changing all the objecstore interfaces
> >> (my first mail on this mail chain in case you have missed) taking
> >> Transaction, any thought of that ?
> >> Should we reconsider having two queue_transaction interface ?
> >
> > As I understand it, the concern with switching to unique_ptr was that the
> > callee would move from the reference without this being known to the
> > caller.
> >
> > Would it make sense to pass as an RValue reference (i.e. TransactionRef&&)?
> > That way the compiler should demand that the callers explicitly use
> > std::move on the reference they're holding, documenting at the site of the
> > call that they're willing to give up ownership.
> >
> >
> > --
> > Senior Software Engineer           Red Hat Storage, Ann Arbor, MI, US
> > IRC: Aemerson@{RedHat, OFTC, Freenode}
> > 0x80F7544B90EDBFB9 E707 86BA 0C1B 62CC 152C  7C12 80F7 544B 90ED BFB9
> N�����r��y���b�X��ǧv�^�)޺{.n�+���z�]z���{ay�ʇڙ�,j��f���h���z��w������j:+v���w�j�m��������zZ+��ݢj"��
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: queue_transaction interface + unique_ptr + performance
  2015-12-03 19:09             ` Casey Bodley
@ 2015-12-03 19:12               ` Adam C. Emerson
  2015-12-03 19:14                 ` Samuel Just
  0 siblings, 1 reply; 32+ messages in thread
From: Adam C. Emerson @ 2015-12-03 19:12 UTC (permalink / raw)
  To: Casey Bodley
  Cc: Somnath Roy, Samuel Just, Sage Weil,
	Samuel Just (sam.just@inktank.com),
	ceph-devel

[-- Attachment #1: Type: text/plain, Size: 463 bytes --]

On 03/12/2015, Casey Bodley wrote:
[snip]
> The queue_transactions() interface could take a container of Transactions,
> rather than pointers to Transactions, and the ObjectStore would move them
> out of the container into whatever representation it prefers.
[snip]

Or a pointer and count (or we could steal array_view from GSL). That way we
could pass in any continguous range (std::vector or even a std::array or
regular C style array allocated on the stack.)

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 603 bytes --]

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: queue_transaction interface + unique_ptr + performance
  2015-12-03 19:12               ` Adam C. Emerson
@ 2015-12-03 19:14                 ` Samuel Just
  2015-12-03 19:29                   ` Casey Bodley
  0 siblings, 1 reply; 32+ messages in thread
From: Samuel Just @ 2015-12-03 19:14 UTC (permalink / raw)
  To: Casey Bodley, Somnath Roy, Samuel Just, Sage Weil,
	Samuel Just (sam.just@inktank.com),
	ceph-devel

Eh, Sage had a point that Transaction has a bunch of little fields
which would have to be filled in -- its move constructor would be less
trivial than unique_ptr's.
-Sam

On Thu, Dec 3, 2015 at 11:12 AM, Adam C. Emerson <aemerson@redhat.com> wrote:
> On 03/12/2015, Casey Bodley wrote:
> [snip]
>> The queue_transactions() interface could take a container of Transactions,
>> rather than pointers to Transactions, and the ObjectStore would move them
>> out of the container into whatever representation it prefers.
> [snip]
>
> Or a pointer and count (or we could steal array_view from GSL). That way we
> could pass in any continguous range (std::vector or even a std::array or
> regular C style array allocated on the stack.)

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: queue_transaction interface + unique_ptr + performance
  2015-12-03 19:14                 ` Samuel Just
@ 2015-12-03 19:29                   ` Casey Bodley
  2015-12-03 19:33                     ` Samuel Just
  0 siblings, 1 reply; 32+ messages in thread
From: Casey Bodley @ 2015-12-03 19:29 UTC (permalink / raw)
  To: Samuel Just
  Cc: Somnath Roy, Sage Weil, Samuel Just (sam.just@inktank.com), ceph-devel


----- Original Message -----
> Eh, Sage had a point that Transaction has a bunch of little fields
> which would have to be filled in -- its move constructor would be less
> trivial than unique_ptr's.
> -Sam

It's true that the move ctor has to do work. I counted 18 fields, half of
which are integers, and the rest have move ctors themselves. But the cpu
is good at integers. The win here is that you're not hitting the allocator
in the fast path.

Casey

> 
> On Thu, Dec 3, 2015 at 11:12 AM, Adam C. Emerson <aemerson@redhat.com> wrote:
> > On 03/12/2015, Casey Bodley wrote:
> > [snip]
> >> The queue_transactions() interface could take a container of Transactions,
> >> rather than pointers to Transactions, and the ObjectStore would move them
> >> out of the container into whatever representation it prefers.
> > [snip]
> >
> > Or a pointer and count (or we could steal array_view from GSL). That way we
> > could pass in any continguous range (std::vector or even a std::array or
> > regular C style array allocated on the stack.)
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: queue_transaction interface + unique_ptr + performance
  2015-12-03 19:29                   ` Casey Bodley
@ 2015-12-03 19:33                     ` Samuel Just
  2015-12-03 20:14                       ` Casey Bodley
  0 siblings, 1 reply; 32+ messages in thread
From: Samuel Just @ 2015-12-03 19:33 UTC (permalink / raw)
  To: Casey Bodley
  Cc: Somnath Roy, Sage Weil, Samuel Just (sam.just@inktank.com), ceph-devel

Well, yeah we are, it's just the actual Transaction structure which
wouldn't be dynamic -- the buffers and many other fields would still
hit the allocator.
-Sam

On Thu, Dec 3, 2015 at 11:29 AM, Casey Bodley <cbodley@redhat.com> wrote:
>
> ----- Original Message -----
>> Eh, Sage had a point that Transaction has a bunch of little fields
>> which would have to be filled in -- its move constructor would be less
>> trivial than unique_ptr's.
>> -Sam
>
> It's true that the move ctor has to do work. I counted 18 fields, half of
> which are integers, and the rest have move ctors themselves. But the cpu
> is good at integers. The win here is that you're not hitting the allocator
> in the fast path.
>
> Casey
>
>>
>> On Thu, Dec 3, 2015 at 11:12 AM, Adam C. Emerson <aemerson@redhat.com> wrote:
>> > On 03/12/2015, Casey Bodley wrote:
>> > [snip]
>> >> The queue_transactions() interface could take a container of Transactions,
>> >> rather than pointers to Transactions, and the ObjectStore would move them
>> >> out of the container into whatever representation it prefers.
>> > [snip]
>> >
>> > Or a pointer and count (or we could steal array_view from GSL). That way we
>> > could pass in any continguous range (std::vector or even a std::array or
>> > regular C style array allocated on the stack.)
>> --
>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: queue_transaction interface + unique_ptr + performance
  2015-12-03 19:33                     ` Samuel Just
@ 2015-12-03 20:14                       ` Casey Bodley
  2015-12-03 21:06                         ` Sage Weil
  0 siblings, 1 reply; 32+ messages in thread
From: Casey Bodley @ 2015-12-03 20:14 UTC (permalink / raw)
  To: Samuel Just
  Cc: Somnath Roy, Sage Weil, Samuel Just (sam.just@inktank.com), ceph-devel


----- Original Message -----
> Well, yeah we are, it's just the actual Transaction structure which
> wouldn't be dynamic -- the buffers and many other fields would still
> hit the allocator.
> -Sam

Sure. I was looking specifically at the tradeoffs between allocating
and moving the Transaction object itself.

As it currently stands, the caller of ObjectStore can choose whether to
allocate its Transactions on the heap, embed them in other objects, or
put them on the stack for use with apply_transactions(). Switching to an
interface built around unique_ptr forces all callers to use the heap. I'm
advocating for an interface that doesn't.

Casey

> 
> On Thu, Dec 3, 2015 at 11:29 AM, Casey Bodley <cbodley@redhat.com> wrote:
> >
> > ----- Original Message -----
> >> Eh, Sage had a point that Transaction has a bunch of little fields
> >> which would have to be filled in -- its move constructor would be less
> >> trivial than unique_ptr's.
> >> -Sam
> >
> > It's true that the move ctor has to do work. I counted 18 fields, half of
> > which are integers, and the rest have move ctors themselves. But the cpu
> > is good at integers. The win here is that you're not hitting the allocator
> > in the fast path.
> >
> > Casey
> >
> >>
> >> On Thu, Dec 3, 2015 at 11:12 AM, Adam C. Emerson <aemerson@redhat.com>
> >> wrote:
> >> > On 03/12/2015, Casey Bodley wrote:
> >> > [snip]
> >> >> The queue_transactions() interface could take a container of
> >> >> Transactions,
> >> >> rather than pointers to Transactions, and the ObjectStore would move
> >> >> them
> >> >> out of the container into whatever representation it prefers.
> >> > [snip]
> >> >
> >> > Or a pointer and count (or we could steal array_view from GSL). That way
> >> > we
> >> > could pass in any continguous range (std::vector or even a std::array or
> >> > regular C style array allocated on the stack.)
> >> --
> >> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> >> the body of a message to majordomo@vger.kernel.org
> >> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >>
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: queue_transaction interface + unique_ptr + performance
  2015-12-03 20:14                       ` Casey Bodley
@ 2015-12-03 21:06                         ` Sage Weil
  2015-12-03 21:23                           ` Casey Bodley
  0 siblings, 1 reply; 32+ messages in thread
From: Sage Weil @ 2015-12-03 21:06 UTC (permalink / raw)
  To: Casey Bodley
  Cc: Samuel Just, Somnath Roy, Samuel Just (sam.just@inktank.com), ceph-devel

On Thu, 3 Dec 2015, Casey Bodley wrote:
> > Well, yeah we are, it's just the actual Transaction structure which
> > wouldn't be dynamic -- the buffers and many other fields would still
> > hit the allocator.
> > -Sam
> 
> Sure. I was looking specifically at the tradeoffs between allocating
> and moving the Transaction object itself.
> 
> As it currently stands, the caller of ObjectStore can choose whether to
> allocate its Transactions on the heap, embed them in other objects, or
> put them on the stack for use with apply_transactions(). Switching to an
> interface built around unique_ptr forces all callers to use the heap. I'm
> advocating for an interface that doesn't.

That leaves us with either std::move or.. the raw Transaction* we have 
now.  Right?

> > > It's true that the move ctor has to do work. I counted 18 fields, half of
> > > which are integers, and the rest have move ctors themselves. But the cpu
> > > is good at integers. The win here is that you're not hitting the allocator
> > > in the fast path.

To be fair, many of these are also legacy that we can remove... possibly 
even now.  IIRC the only exposure to legacy encoded transactions (that use 
the tbl hackery) are journal items from an upgrade pre-hammer OSD that 
aren't flushed on upgrade.  We should have made the osd flush the journal 
before recording the 0_94_4 ondisk feature.  We could add another one to 
enforce that and rip all that code out now instead of waiting until 
after jewel... that would be satisfying (and I think an ondisk ceph-osd 
feature is enough here, then document that users should upgrade to 
hammer 0.94.6 or infernalis 9.2.1 before moving to jewel).

sage

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: queue_transaction interface + unique_ptr + performance
  2015-12-03 21:06                         ` Sage Weil
@ 2015-12-03 21:23                           ` Casey Bodley
  2015-12-03 23:23                             ` Samuel Just
  0 siblings, 1 reply; 32+ messages in thread
From: Casey Bodley @ 2015-12-03 21:23 UTC (permalink / raw)
  To: Sage Weil
  Cc: Samuel Just, Somnath Roy, Samuel Just (sam.just@inktank.com), ceph-devel


----- Original Message -----
> On Thu, 3 Dec 2015, Casey Bodley wrote:
> > > Well, yeah we are, it's just the actual Transaction structure which
> > > wouldn't be dynamic -- the buffers and many other fields would still
> > > hit the allocator.
> > > -Sam
> > 
> > Sure. I was looking specifically at the tradeoffs between allocating
> > and moving the Transaction object itself.
> > 
> > As it currently stands, the caller of ObjectStore can choose whether to
> > allocate its Transactions on the heap, embed them in other objects, or
> > put them on the stack for use with apply_transactions(). Switching to an
> > interface built around unique_ptr forces all callers to use the heap. I'm
> > advocating for an interface that doesn't.
> 
> That leaves us with either std::move or.. the raw Transaction* we have
> now.  Right?

Right. The thing I really like about the unique_ptr approach is that the
caller no longer has to care about the Transaction's lifetime, so doesn't
have to allocate the extra ObjectStore::C_DeleteTransaction for cleanup.
Movable Transactions accomplish this as well.

Casey

> 
> > > > It's true that the move ctor has to do work. I counted 18 fields, half
> > > > of
> > > > which are integers, and the rest have move ctors themselves. But the
> > > > cpu
> > > > is good at integers. The win here is that you're not hitting the
> > > > allocator
> > > > in the fast path.
> 
> To be fair, many of these are also legacy that we can remove... possibly
> even now.  IIRC the only exposure to legacy encoded transactions (that use
> the tbl hackery) are journal items from an upgrade pre-hammer OSD that
> aren't flushed on upgrade.  We should have made the osd flush the journal
> before recording the 0_94_4 ondisk feature.  We could add another one to
> enforce that and rip all that code out now instead of waiting until
> after jewel... that would be satisfying (and I think an ondisk ceph-osd
> feature is enough here, then document that users should upgrade to
> hammer 0.94.6 or infernalis 9.2.1 before moving to jewel).
> 
> sage
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: queue_transaction interface + unique_ptr + performance
  2015-12-03 21:23                           ` Casey Bodley
@ 2015-12-03 23:23                             ` Samuel Just
  2015-12-04  0:25                               ` Somnath Roy
  0 siblings, 1 reply; 32+ messages in thread
From: Samuel Just @ 2015-12-03 23:23 UTC (permalink / raw)
  To: Casey Bodley
  Cc: Sage Weil, Somnath Roy, Samuel Just (sam.just@inktank.com), ceph-devel

From a simplicity point of view, I'd rather just move a Transaction
object than use a unique_ptr.   Maybe the overhead doesn't end up
being significant?
-Sam

On Thu, Dec 3, 2015 at 1:23 PM, Casey Bodley <cbodley@redhat.com> wrote:
>
> ----- Original Message -----
>> On Thu, 3 Dec 2015, Casey Bodley wrote:
>> > > Well, yeah we are, it's just the actual Transaction structure which
>> > > wouldn't be dynamic -- the buffers and many other fields would still
>> > > hit the allocator.
>> > > -Sam
>> >
>> > Sure. I was looking specifically at the tradeoffs between allocating
>> > and moving the Transaction object itself.
>> >
>> > As it currently stands, the caller of ObjectStore can choose whether to
>> > allocate its Transactions on the heap, embed them in other objects, or
>> > put them on the stack for use with apply_transactions(). Switching to an
>> > interface built around unique_ptr forces all callers to use the heap. I'm
>> > advocating for an interface that doesn't.
>>
>> That leaves us with either std::move or.. the raw Transaction* we have
>> now.  Right?
>
> Right. The thing I really like about the unique_ptr approach is that the
> caller no longer has to care about the Transaction's lifetime, so doesn't
> have to allocate the extra ObjectStore::C_DeleteTransaction for cleanup.
> Movable Transactions accomplish this as well.
>
> Casey
>
>>
>> > > > It's true that the move ctor has to do work. I counted 18 fields, half
>> > > > of
>> > > > which are integers, and the rest have move ctors themselves. But the
>> > > > cpu
>> > > > is good at integers. The win here is that you're not hitting the
>> > > > allocator
>> > > > in the fast path.
>>
>> To be fair, many of these are also legacy that we can remove... possibly
>> even now.  IIRC the only exposure to legacy encoded transactions (that use
>> the tbl hackery) are journal items from an upgrade pre-hammer OSD that
>> aren't flushed on upgrade.  We should have made the osd flush the journal
>> before recording the 0_94_4 ondisk feature.  We could add another one to
>> enforce that and rip all that code out now instead of waiting until
>> after jewel... that would be satisfying (and I think an ondisk ceph-osd
>> feature is enough here, then document that users should upgrade to
>> hammer 0.94.6 or infernalis 9.2.1 before moving to jewel).
>>
>> sage
>> --
>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* RE: queue_transaction interface + unique_ptr + performance
  2015-12-03 23:23                             ` Samuel Just
@ 2015-12-04  0:25                               ` Somnath Roy
  2015-12-04  1:43                                 ` Adam C. Emerson
  0 siblings, 1 reply; 32+ messages in thread
From: Somnath Roy @ 2015-12-04  0:25 UTC (permalink / raw)
  To: Samuel Just, Casey Bodley
  Cc: Sage Weil, Samuel Just (sam.just@inktank.com), ceph-devel

Some more findings..

1. By using make_shared the cost of shared_ptr is reduced significantly..

##### Test conventional ptr ######
micros_used for conventional ptr: 22633702 

##### Test Unique Smart ptr ######
micros_used for unique ptr: 22633694


##### Test Shared Smart ptr ######
micros_used for shared ptr: 27105354


So, almost half now...

Will do some cleanup in the ceph io path on this..

2. I have also measured the cost of list::push_back etc. , all is expected.. if I replace list with vector again I am getting *some benefit* , which is also expected I guess.

I am not seeing any conclusion on the interface change...Here is my understanding so far..

1. For the method like apply_transaction() which is taking Transaction reference now, I will not be changing the interface , instead will do an extra copy to create unique_ptr inside.

2. For all the cases taking Transaction* , I will change that to TransactionRef&&

3. The queue_transactions() will be changed to vector< TransactionRef> instead of list< Transaction*>

Let me know if I am missing anything..

Thanks & Regards
Somnath

-----Original Message-----
From: Samuel Just [mailto:sjust@redhat.com] 
Sent: Thursday, December 03, 2015 3:24 PM
To: Casey Bodley
Cc: Sage Weil; Somnath Roy; Samuel Just (sam.just@inktank.com); ceph-devel@vger.kernel.org
Subject: Re: queue_transaction interface + unique_ptr + performance

From a simplicity point of view, I'd rather just move a Transaction
object than use a unique_ptr.   Maybe the overhead doesn't end up
being significant?
-Sam

On Thu, Dec 3, 2015 at 1:23 PM, Casey Bodley <cbodley@redhat.com> wrote:
>
> ----- Original Message -----
>> On Thu, 3 Dec 2015, Casey Bodley wrote:
>> > > Well, yeah we are, it's just the actual Transaction structure 
>> > > which wouldn't be dynamic -- the buffers and many other fields 
>> > > would still hit the allocator.
>> > > -Sam
>> >
>> > Sure. I was looking specifically at the tradeoffs between 
>> > allocating and moving the Transaction object itself.
>> >
>> > As it currently stands, the caller of ObjectStore can choose 
>> > whether to allocate its Transactions on the heap, embed them in 
>> > other objects, or put them on the stack for use with 
>> > apply_transactions(). Switching to an interface built around 
>> > unique_ptr forces all callers to use the heap. I'm advocating for an interface that doesn't.
>>
>> That leaves us with either std::move or.. the raw Transaction* we 
>> have now.  Right?
>
> Right. The thing I really like about the unique_ptr approach is that 
> the caller no longer has to care about the Transaction's lifetime, so 
> doesn't have to allocate the extra ObjectStore::C_DeleteTransaction for cleanup.
> Movable Transactions accomplish this as well.
>
> Casey
>
>>
>> > > > It's true that the move ctor has to do work. I counted 18 
>> > > > fields, half of which are integers, and the rest have move 
>> > > > ctors themselves. But the cpu is good at integers. The win here 
>> > > > is that you're not hitting the allocator in the fast path.
>>
>> To be fair, many of these are also legacy that we can remove... 
>> possibly even now.  IIRC the only exposure to legacy encoded 
>> transactions (that use the tbl hackery) are journal items from an 
>> upgrade pre-hammer OSD that aren't flushed on upgrade.  We should 
>> have made the osd flush the journal before recording the 0_94_4 
>> ondisk feature.  We could add another one to enforce that and rip all 
>> that code out now instead of waiting until after jewel... that would 
>> be satisfying (and I think an ondisk ceph-osd feature is enough here, 
>> then document that users should upgrade to hammer 0.94.6 or infernalis 9.2.1 before moving to jewel).
>>
>> sage
>> --
>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" 
>> in the body of a message to majordomo@vger.kernel.org More majordomo 
>> info at  http://vger.kernel.org/majordomo-info.html
>>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: queue_transaction interface + unique_ptr + performance
  2015-12-04  0:25                               ` Somnath Roy
@ 2015-12-04  1:43                                 ` Adam C. Emerson
  2015-12-04  6:15                                   ` Somnath Roy
                                                     ` (2 more replies)
  0 siblings, 3 replies; 32+ messages in thread
From: Adam C. Emerson @ 2015-12-04  1:43 UTC (permalink / raw)
  To: Somnath Roy
  Cc: Samuel Just, Casey Bodley, Sage Weil,
	Samuel Just (sam.just@inktank.com),
	ceph-devel

On 04/12/2015, Somnath Roy wrote:
[snip]
> ##### Test Shared Smart ptr ######
> micros_used for shared ptr: 27105354

One thing to keep in mind is that, if there are any cases where we really DO
want to use shared_ptr, if we find ourselves assigning a new pointer then
releasing an old one, it's better to std::move from one to the other, since it
doesn't have to use locking/atomics to fiddle with the refcount in that case.
But if we can avoid them, all the better.


[snip]
> 2. I have also measured the cost of list::push_back etc. , all is expected.. if I replace list with vector again I am getting *some benefit* , which is also expected I guess.

If we have some idea how many elements we expect/want to hold, calling reserve
will get all our allocations done all at once rather than dipping into the heap
multiple times.

> I am not seeing any conclusion on the interface change...Here is my understanding so far..
> 
> 1. For the method like apply_transaction() which is taking Transaction reference now, I will not be changing the interface , instead will do an extra copy to create unique_ptr inside.
> 
> 2. For all the cases taking Transaction* , I will change that to TransactionRef&&
> 
> 3. The queue_transactions() will be changed to vector< TransactionRef> instead of list< Transaction*>
> 
> Let me know if I am missing anything..

My understanding is that in some cases we might not want to use the heap. (For
example, if we're executing a transaction synchronously rather than enqueing
it.) And to make Transaction movable. Then we would not build a
std::vector<TransactionRef>, what we would build instead is
std::vector<Transaction>

In this case, we should probably have everything just take Transaction&& (so the
caller can expect to have the contents of the transaction moved out.

We might also want to consider building a transaction in place. I know I
saw one constructor that takes bufferlist. If there are other cases where
we could benefit from just passing in arguments and having the transaction
built right into the vector (with emplace_back), that would be even
better than constructing it on the stack and moving it in to the vector.

-- 
Senior Software Engineer           Red Hat Storage, Ann Arbor, MI, US
IRC: Aemerson@{RedHat, OFTC, Freenode}
0x80F7544B90EDBFB9 E707 86BA 0C1B 62CC 152C  7C12 80F7 544B 90ED BFB9

^ permalink raw reply	[flat|nested] 32+ messages in thread

* RE: queue_transaction interface + unique_ptr + performance
  2015-12-04  1:43                                 ` Adam C. Emerson
@ 2015-12-04  6:15                                   ` Somnath Roy
  2015-12-04  6:43                                   ` Somnath Roy
  2016-01-18 20:48                                   ` Somnath Roy
  2 siblings, 0 replies; 32+ messages in thread
From: Somnath Roy @ 2015-12-04  6:15 UTC (permalink / raw)
  To: Adam C. Emerson
  Cc: Samuel Just, Casey Bodley, Sage Weil,
	Samuel Just (sam.just@inktank.com),
	ceph-devel

Adam,
May be I am missing something here. 
Don't we need to do an extra Transaction copy in case of async completion and Transaction is allocated on stack for queue_transaction() taking vector<Transaction> ?

Thanks & Regards
Somnath

-----Original Message-----
From: Adam C. Emerson [mailto:aemerson@redhat.com] 
Sent: Thursday, December 03, 2015 5:44 PM
To: Somnath Roy
Cc: Samuel Just; Casey Bodley; Sage Weil; Samuel Just (sam.just@inktank.com); ceph-devel@vger.kernel.org
Subject: Re: queue_transaction interface + unique_ptr + performance

On 04/12/2015, Somnath Roy wrote:
[snip]
> ##### Test Shared Smart ptr ######
> micros_used for shared ptr: 27105354

One thing to keep in mind is that, if there are any cases where we really DO want to use shared_ptr, if we find ourselves assigning a new pointer then releasing an old one, it's better to std::move from one to the other, since it doesn't have to use locking/atomics to fiddle with the refcount in that case.
But if we can avoid them, all the better.


[snip]
> 2. I have also measured the cost of list::push_back etc. , all is expected.. if I replace list with vector again I am getting *some benefit* , which is also expected I guess.

If we have some idea how many elements we expect/want to hold, calling reserve will get all our allocations done all at once rather than dipping into the heap multiple times.

> I am not seeing any conclusion on the interface change...Here is my understanding so far..
> 
> 1. For the method like apply_transaction() which is taking Transaction reference now, I will not be changing the interface , instead will do an extra copy to create unique_ptr inside.
> 
> 2. For all the cases taking Transaction* , I will change that to 
> TransactionRef&&
> 
> 3. The queue_transactions() will be changed to vector< TransactionRef> 
> instead of list< Transaction*>
> 
> Let me know if I am missing anything..

My understanding is that in some cases we might not want to use the heap. (For example, if we're executing a transaction synchronously rather than enqueing
it.) And to make Transaction movable. Then we would not build a std::vector<TransactionRef>, what we would build instead is std::vector<Transaction>

In this case, we should probably have everything just take Transaction&& (so the caller can expect to have the contents of the transaction moved out.

We might also want to consider building a transaction in place. I know I saw one constructor that takes bufferlist. If there are other cases where we could benefit from just passing in arguments and having the transaction built right into the vector (with emplace_back), that would be even better than constructing it on the stack and moving it in to the vector.

-- 
Senior Software Engineer           Red Hat Storage, Ann Arbor, MI, US
IRC: Aemerson@{RedHat, OFTC, Freenode}
0x80F7544B90EDBFB9 E707 86BA 0C1B 62CC 152C  7C12 80F7 544B 90ED BFB9

^ permalink raw reply	[flat|nested] 32+ messages in thread

* RE: queue_transaction interface + unique_ptr + performance
  2015-12-04  1:43                                 ` Adam C. Emerson
  2015-12-04  6:15                                   ` Somnath Roy
@ 2015-12-04  6:43                                   ` Somnath Roy
  2016-01-18 20:48                                   ` Somnath Roy
  2 siblings, 0 replies; 32+ messages in thread
From: Somnath Roy @ 2015-12-04  6:43 UTC (permalink / raw)
  To: Adam C. Emerson
  Cc: Samuel Just, Casey Bodley, Sage Weil,
	Samuel Just (sam.just@inktank.com),
	ceph-devel

Sorry , I went through entire conversation and realized that we are expecting the move constructor of Transaction class won't be inducing much overhead vs unique_ptr.
Thanks I will start coding around that and will publish the final interfaces soon for the sign off.

Regards
Somnath

-----Original Message-----
From: Somnath Roy 
Sent: Thursday, December 03, 2015 10:16 PM
To: 'Adam C. Emerson'
Cc: Samuel Just; Casey Bodley; Sage Weil; Samuel Just (sam.just@inktank.com); ceph-devel@vger.kernel.org
Subject: RE: queue_transaction interface + unique_ptr + performance

Adam,
May be I am missing something here. 
Don't we need to do an extra Transaction copy in case of async completion and Transaction is allocated on stack for queue_transaction() taking vector<Transaction> ?

Thanks & Regards
Somnath

-----Original Message-----
From: Adam C. Emerson [mailto:aemerson@redhat.com]
Sent: Thursday, December 03, 2015 5:44 PM
To: Somnath Roy
Cc: Samuel Just; Casey Bodley; Sage Weil; Samuel Just (sam.just@inktank.com); ceph-devel@vger.kernel.org
Subject: Re: queue_transaction interface + unique_ptr + performance

On 04/12/2015, Somnath Roy wrote:
[snip]
> ##### Test Shared Smart ptr ######
> micros_used for shared ptr: 27105354

One thing to keep in mind is that, if there are any cases where we really DO want to use shared_ptr, if we find ourselves assigning a new pointer then releasing an old one, it's better to std::move from one to the other, since it doesn't have to use locking/atomics to fiddle with the refcount in that case.
But if we can avoid them, all the better.


[snip]
> 2. I have also measured the cost of list::push_back etc. , all is expected.. if I replace list with vector again I am getting *some benefit* , which is also expected I guess.

If we have some idea how many elements we expect/want to hold, calling reserve will get all our allocations done all at once rather than dipping into the heap multiple times.

> I am not seeing any conclusion on the interface change...Here is my understanding so far..
> 
> 1. For the method like apply_transaction() which is taking Transaction reference now, I will not be changing the interface , instead will do an extra copy to create unique_ptr inside.
> 
> 2. For all the cases taking Transaction* , I will change that to 
> TransactionRef&&
> 
> 3. The queue_transactions() will be changed to vector< TransactionRef> 
> instead of list< Transaction*>
> 
> Let me know if I am missing anything..

My understanding is that in some cases we might not want to use the heap. (For example, if we're executing a transaction synchronously rather than enqueing
it.) And to make Transaction movable. Then we would not build a std::vector<TransactionRef>, what we would build instead is std::vector<Transaction>

In this case, we should probably have everything just take Transaction&& (so the caller can expect to have the contents of the transaction moved out.

We might also want to consider building a transaction in place. I know I saw one constructor that takes bufferlist. If there are other cases where we could benefit from just passing in arguments and having the transaction built right into the vector (with emplace_back), that would be even better than constructing it on the stack and moving it in to the vector.

-- 
Senior Software Engineer           Red Hat Storage, Ann Arbor, MI, US
IRC: Aemerson@{RedHat, OFTC, Freenode}
0x80F7544B90EDBFB9 E707 86BA 0C1B 62CC 152C  7C12 80F7 544B 90ED BFB9

^ permalink raw reply	[flat|nested] 32+ messages in thread

* RE: queue_transaction interface + unique_ptr + performance
  2015-12-04  1:43                                 ` Adam C. Emerson
  2015-12-04  6:15                                   ` Somnath Roy
  2015-12-04  6:43                                   ` Somnath Roy
@ 2016-01-18 20:48                                   ` Somnath Roy
  2 siblings, 0 replies; 32+ messages in thread
From: Somnath Roy @ 2016-01-18 20:48 UTC (permalink / raw)
  To: Adam C. Emerson
  Cc: Samuel Just, Casey Bodley, Sage Weil,
	Samuel Just (sam.just@inktank.com),
	ceph-devel

Hi,
Please review the following pull request addressing this..

https://github.com/ceph/ceph/pull/7271

It is working well in my tests with 384 TB cluster, 48 OSDs , 6 OSD nodes..It is giving ~3-4% cpu usage improvement as well..
 
Thanks & Regards
Somnath

-----Original Message-----
From: Somnath Roy 
Sent: Thursday, December 03, 2015 10:44 PM
To: 'Adam C. Emerson'
Cc: 'Samuel Just'; 'Casey Bodley'; 'Sage Weil'; 'Samuel Just (sam.just@inktank.com)'; 'ceph-devel@vger.kernel.org'
Subject: RE: queue_transaction interface + unique_ptr + performance

Sorry , I went through entire conversation and realized that we are expecting the move constructor of Transaction class won't be inducing much overhead vs unique_ptr.
Thanks I will start coding around that and will publish the final interfaces soon for the sign off.

Regards
Somnath

-----Original Message-----
From: Somnath Roy
Sent: Thursday, December 03, 2015 10:16 PM
To: 'Adam C. Emerson'
Cc: Samuel Just; Casey Bodley; Sage Weil; Samuel Just (sam.just@inktank.com); ceph-devel@vger.kernel.org
Subject: RE: queue_transaction interface + unique_ptr + performance

Adam,
May be I am missing something here. 
Don't we need to do an extra Transaction copy in case of async completion and Transaction is allocated on stack for queue_transaction() taking vector<Transaction> ?

Thanks & Regards
Somnath

-----Original Message-----
From: Adam C. Emerson [mailto:aemerson@redhat.com]
Sent: Thursday, December 03, 2015 5:44 PM
To: Somnath Roy
Cc: Samuel Just; Casey Bodley; Sage Weil; Samuel Just (sam.just@inktank.com); ceph-devel@vger.kernel.org
Subject: Re: queue_transaction interface + unique_ptr + performance

On 04/12/2015, Somnath Roy wrote:
[snip]
> ##### Test Shared Smart ptr ######
> micros_used for shared ptr: 27105354

One thing to keep in mind is that, if there are any cases where we really DO want to use shared_ptr, if we find ourselves assigning a new pointer then releasing an old one, it's better to std::move from one to the other, since it doesn't have to use locking/atomics to fiddle with the refcount in that case.
But if we can avoid them, all the better.


[snip]
> 2. I have also measured the cost of list::push_back etc. , all is expected.. if I replace list with vector again I am getting *some benefit* , which is also expected I guess.

If we have some idea how many elements we expect/want to hold, calling reserve will get all our allocations done all at once rather than dipping into the heap multiple times.

> I am not seeing any conclusion on the interface change...Here is my understanding so far..
> 
> 1. For the method like apply_transaction() which is taking Transaction reference now, I will not be changing the interface , instead will do an extra copy to create unique_ptr inside.
> 
> 2. For all the cases taking Transaction* , I will change that to 
> TransactionRef&&
> 
> 3. The queue_transactions() will be changed to vector< TransactionRef> 
> instead of list< Transaction*>
> 
> Let me know if I am missing anything..

My understanding is that in some cases we might not want to use the heap. (For example, if we're executing a transaction synchronously rather than enqueing
it.) And to make Transaction movable. Then we would not build a std::vector<TransactionRef>, what we would build instead is std::vector<Transaction>

In this case, we should probably have everything just take Transaction&& (so the caller can expect to have the contents of the transaction moved out.

We might also want to consider building a transaction in place. I know I saw one constructor that takes bufferlist. If there are other cases where we could benefit from just passing in arguments and having the transaction built right into the vector (with emplace_back), that would be even better than constructing it on the stack and moving it in to the vector.

-- 
Senior Software Engineer           Red Hat Storage, Ann Arbor, MI, US
IRC: Aemerson@{RedHat, OFTC, Freenode}
0x80F7544B90EDBFB9 E707 86BA 0C1B 62CC 152C  7C12 80F7 544B 90ED BFB9

^ permalink raw reply	[flat|nested] 32+ messages in thread

end of thread, other threads:[~2016-01-18 20:48 UTC | newest]

Thread overview: 32+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-12-03  2:13 queue_transaction interface + unique_ptr + performance Somnath Roy
2015-12-03  3:50 ` James (Fei) Liu-SSI
2015-12-03  3:59   ` Somnath Roy
2015-12-03  7:21   ` Somnath Roy
2015-12-03  8:42 ` Piotr.Dalek
2015-12-03 11:50 ` Sage Weil
2015-12-03 15:16   ` Casey Bodley
2015-12-03 16:52     ` Somnath Roy
2015-12-03 17:17       ` Adam C. Emerson
2015-12-03 17:40         ` Somnath Roy
2015-12-03 17:48           ` Samuel Just
2015-12-03 17:02   ` Somnath Roy
2015-12-03 17:25     ` Adam C. Emerson
2015-12-03 17:48       ` Somnath Roy
2015-12-03 17:50         ` Samuel Just
2015-12-03 17:52           ` Samuel Just
2015-12-03 17:53           ` Somnath Roy
2015-12-03 19:09             ` Casey Bodley
2015-12-03 19:12               ` Adam C. Emerson
2015-12-03 19:14                 ` Samuel Just
2015-12-03 19:29                   ` Casey Bodley
2015-12-03 19:33                     ` Samuel Just
2015-12-03 20:14                       ` Casey Bodley
2015-12-03 21:06                         ` Sage Weil
2015-12-03 21:23                           ` Casey Bodley
2015-12-03 23:23                             ` Samuel Just
2015-12-04  0:25                               ` Somnath Roy
2015-12-04  1:43                                 ` Adam C. Emerson
2015-12-04  6:15                                   ` Somnath Roy
2015-12-04  6:43                                   ` Somnath Roy
2016-01-18 20:48                                   ` Somnath Roy
2015-12-03 19:04   ` Matt Benjamin

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.