c++ - Destructor execution when printing a stack trace and measuring function execution time

I want some simple instrumentation that prints the function call stack and measures the time spent in each function.


#include <chrono>
#include <iomanip>
#include <iostream>
#include <string>
#include <vector>

#define STACK_TRACE_ENABLED 1

// One entry per active function call.
struct trace_t {
    std::string func_name;
    std::chrono::high_resolution_clock::time_point begin;
    std::chrono::high_resolution_clock::time_point end;
};

using stack_trace_t = std::vector<trace_t>;
auto stack_trace = stack_trace_t{};

// Prints the name and elapsed time of the innermost active call.
void print_top_of_trace() {
    using namespace std::chrono;
    duration<double, std::nano> t = stack_trace.back().end - stack_trace.back().begin;
    std::cout << "##" << std::setw(50) << stack_trace.back().func_name
              << " took" << std::setw(16) << t.count()
              << " nanoseconds ##\n";
}

// RAII helper: pushes an entry on construction, prints and pops it on destruction.
struct tracer {
    tracer(std::string fn)
        : begin{std::chrono::high_resolution_clock::now()}
    {
        stack_trace.push_back(trace_t{fn, begin, end});
    }

    ~tracer() {
        stack_trace[stack_trace.size() - 1].end = std::chrono::high_resolution_clock::now();
        print_top_of_trace();
        stack_trace.pop_back();
    }

    std::chrono::high_resolution_clock::time_point begin;
    std::chrono::high_resolution_clock::time_point end;
};



And some macros to simplify usage:


#ifndef NDEBUG
#define ADD_STACK_TRACE_(func_name) tracer __ny_tracer(func_name);
#define ADD_STACK_TRACE ADD_STACK_TRACE_(__PRETTY_FUNCTION__)
#else
#define ADD_STACK_TRACE_(func_name) (void)(0);
#define ADD_STACK_TRACE ADD_STACK_TRACE_(__PRETTY_FUNCTION__)
#endif



Then I have the functions I want to measure; ADD_STACK_TRACE has to be added as the first line of each function. A complete working example is on godbolt.


constexpr auto sin = [](float x) {
    ADD_STACK_TRACE
    return x -
           ((x * x * x) / 6.0f) +
           ((x * x * x * x * x) / 120.0f) -
           ((x * x * x * x * x * x * x) / 5040.0f);
};

constexpr auto cos = [](float x) {
    ADD_STACK_TRACE
    return 1.0f -
           ((x * x) / 2.0f) +
           ((x * x * x * x) / 24.0f) -
           ((x * x * x * x * x * x) / 720.0f);
};

float sum(float i1, float i2) {
    ADD_STACK_TRACE
    return i1 + i2;
}

float tan(float f) {
    ADD_STACK_TRACE
    return sin(f) / cos(f);
}

int main() {
    ADD_STACK_TRACE
    float param = sum(44.0f, 1.0f) * PI / 180.0f;
    return tan(param);
}



Currently, I get the following output:


## float sum(float, float) took 1442 nanoseconds ##
## auto (anonymous class)::operator()(float) const took 2257 nanoseconds ##
## auto (anonymous class)::operator()(float) const took 118 nanoseconds ##
## float tan(float) took 4689 nanoseconds ##
## int main() took 44182 nanoseconds ##



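I find this a bit odd: sin and cos both use a Taylor expansion, and I can't see why their execution times should differ so much. In fact, if I remove ADD_STACK_TRACE from tan, I get the following output: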


## float sum(float, float) took 1324 nanoseconds ##
## auto (anonymous class)::operator()(float) const took 169 nanoseconds ##
## auto (anonymous class)::operator()(float) const took 124 nanoseconds ##
## int main() took 33660 nanoseconds ##



As you can see, this shows that sin and cos are very close. What is the problem here?


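The std::vector<T> documentation explains this: when push_back makes size() exceed capacity(), the vector reallocates new contiguous memory and moves/copies the existing elements into it. That reallocation runs inside the tracer constructor, after begin has already been recorded, so its cost is counted in the time of whichever function triggers it.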

https://en.cppreference.com/w/cpp/container/vector/push_back

To prevent the resizing, try the following:


constexpr auto sin = [](float x) {
    ADD_STACK_TRACE
    return x -
           ((x * x * x) / 6.0f) +
           ((x * x * x * x * x) / 120.0f) -
           ((x * x * x * x * x * x * x) / 5040.0f);
};

constexpr auto cos = [](float x) {
    ADD_STACK_TRACE
    return 1.0f -
           ((x * x) / 2.0f) +
           ((x * x * x * x) / 24.0f) -
           ((x * x * x * x * x * x) / 720.0f);
};

float sum(float i1, float i2) {
    ADD_STACK_TRACE
    return i1 + i2;
}

float tan(float f) {
    ADD_STACK_TRACE
    return sin(f) / cos(f);
}

int main() {
    stack_trace.reserve( 1000 ); // A big capacity
    ADD_STACK_TRACE
    float param = sum(44.0f, 1.0f) * PI / 180.0f;
    return tan(param);
}



...