1. **Forward-pass optimization (layer fusion)**: TensorRT uses a technique called layer fusion to optimize the forward pass. It merges adjacent layers, such as a convolution, its bias add, and the activation that follows, into a single operation, eliminating unnecessary memory accesses and data rearrangement. These fused operations then run as highly parallel CUDA kernels on the GPU, further improving computation speed (a minimal sketch of the idea appears after this list).
2. **Kernel auto-tuning**: TensorRT is an inference engine, so it accelerates the forward pass only; back-propagation happens during training, before the model ever reaches TensorRT. What TensorRT does optimize at build time is kernel selection: it benchmarks multiple candidate implementations ("tactics") of each layer on the target GPU and keeps the fastest, so the resulting engine is tuned to the specific hardware it will run on (see the second sketch after this list).
3. **Data layout optimization**: TensorRT can also optimize how tensors are laid out in memory to improve memory-access efficiency. During engine building it chooses the layout (for example NCHW, NHWC, or channel-blocked formats) that best suits the network structure and data-flow pattern, which reduces unnecessary memory accesses and data rearrangement and thus improves inference speed (third sketch below).
4. **Model pruning**: For applications that can tolerate a small loss of accuracy, such as image super-resolution or semantic segmentation, pruning can be used to reduce model complexity before the TensorRT engine is built. Different strategies, such as global pruning (one magnitude threshold across the whole network) or local pruning (a threshold per layer), remove parameters and computation, and the smaller model then builds into a faster engine (fourth sketch below).
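
First, a minimal NumPy sketch of what layer fusion buys, not TensorRT's internal implementation: a convolution, bias add, and ReLU computed as three separate passes over memory versus a single fused pass that produces the same result with no intermediate arrays. The `conv1d` helper and all sizes are illustrative assumptions.

```python
import numpy as np

def conv1d(x, w):
    # "valid" 1-D convolution (cross-correlation, as in deep learning)
    k = len(w)
    return np.array([np.dot(x[i:i + k], w) for i in range(len(x) - k + 1)])

x = np.random.randn(1024).astype(np.float32)
w = np.random.randn(3).astype(np.float32)
b = np.float32(0.5)

# Unfused: each step writes an intermediate array and reads it back.
y = conv1d(x, w)          # pass 1: convolution
y = y + b                 # pass 2: bias add
y = np.maximum(y, 0.0)    # pass 3: ReLU

# Fused: one pass computes the final value directly, no intermediates.
k = len(w)
y_fused = np.array([max(np.dot(x[i:i + k], w) + b, 0.0)
                    for i in range(len(x) - k + 1)], dtype=np.float32)

assert np.allclose(y, y_fused, atol=1e-5)
```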
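Second, a hedged sketch of where kernel auto-tuning happens in practice, assuming the TensorRT 8.x Python bindings and a hypothetical ONNX file `model.onnx`; the tactic timing itself runs inside `build_serialized_network`, not in user code.

```python
import tensorrt as trt

logger = trt.Logger(trt.Logger.INFO)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)

with open("model.onnx", "rb") as f:          # hypothetical model file
    if not parser.parse(f.read()):
        raise RuntimeError(parser.get_error(0))

config = builder.create_builder_config()
# A larger workspace lets the auto-tuner consider more candidate kernels.
config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE, 1 << 30)  # 1 GiB

# build_serialized_network times candidate kernels ("tactics") per layer
# on the installed GPU and keeps the fastest, so the engine is
# hardware-specific and should be rebuilt for a different GPU.
engine_bytes = builder.build_serialized_network(network, config)
with open("model.engine", "wb") as f:
    f.write(engine_bytes)
```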
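Third, a NumPy-only illustration of why layout matters; TensorRT's actual format selection is internal to the engine builder. Reading all channels of one pixel is a single contiguous read in NHWC but a strided gather in NCHW, so the best layout depends on the access pattern of the kernel consuming the tensor.

```python
import numpy as np

n, c, h, w = 1, 64, 56, 56
nchw = np.random.randn(n, c, h, w).astype(np.float32)

# Same data, channels-last layout (one transpose + copy).
nhwc = np.ascontiguousarray(nchw.transpose(0, 2, 3, 1))

# All 64 channel values of pixel (0, 0): contiguous in NHWC,
# but h*w*4 bytes apart in NCHW.
pixel_nchw = nchw[0, :, 0, 0]   # strided gather
pixel_nhwc = nhwc[0, 0, 0, :]   # single contiguous read

assert np.array_equal(pixel_nchw, pixel_nhwc)
print("NCHW channel stride:", nchw.strides[1], "bytes")   # 56*56*4 = 12544
print("NHWC channel stride:", nhwc.strides[3], "bytes")   # 4
```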
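Finally, a framework-agnostic sketch of magnitude pruning contrasting the global and local strategies mentioned above; in a real workflow the pruning (and any fine-tuning) would be done in the training framework, and the smaller model then exported to TensorRT. All layer names and sizes here are illustrative.

```python
import numpy as np

def local_prune(weights, sparsity=0.5):
    """Zero the smallest-magnitude weights per layer (one threshold each)."""
    pruned = {}
    for name, w in weights.items():
        thresh = np.quantile(np.abs(w), sparsity)  # per-layer threshold
        pruned[name] = np.where(np.abs(w) >= thresh, w, 0.0)
    return pruned

def global_prune(weights, sparsity=0.5):
    """Zero the smallest-magnitude weights network-wide (single threshold)."""
    all_mags = np.concatenate([np.abs(w).ravel() for w in weights.values()])
    thresh = np.quantile(all_mags, sparsity)       # global threshold
    return {name: np.where(np.abs(w) >= thresh, w, 0.0)
            for name, w in weights.items()}

# Toy two-layer model: the layers' weight scales differ, so global and
# local pruning distribute the zeros very differently.
weights = {
    "conv1": np.random.randn(64, 3, 3, 3).astype(np.float32) * 0.1,
    "conv2": np.random.randn(128, 64, 3, 3).astype(np.float32) * 1.0,
}
for strategy, prune in [("global", global_prune), ("local", local_prune)]:
    for name, w in prune(weights).items():
        print(strategy, name, "sparsity:", float((w == 0).mean()))
```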
Through the above optimizations, TensorRT can significantly speed up neural-network inference, especially on GPUs. This is what makes TensorRT a commonly used inference-acceleration tool in deep-learning applications.