Commit 3ecc137
[Caffe2] Improve AddMomentsVec and UpdateMomentsVec (pytorch#167664)
Summary: RowwiseMomentsImpl accounts for about 0.4% cpu time of AdRanker: https://fburl.com/strobelight/ywf79nw3. It primarily calls AddMomentsVec and UpdateMomentsVec. These two routines are written using Pytorch's VecLib, meaning the utilized operators translate into intrinsics. Unfortunately, the compiler makes less transformations and optimizations when intrinsics are used. Therefore, if we carefully decouple and re-order operations, the emitted instruction sequence improves. Here we can see the dissassembly for the old and new AddMomentsVec: https://godbolt.org/z/83fxYvKfv We can see a much better instruction sequence is achieved in the new implementation. Test Plan: AdRanker ServiceLab Differential Revision: D86805648 Pull Request resolved: pytorch#167664 Approved by: https://github.com/mcfi1 parent 84a7a34 commit 3ecc137
1 file changed
+16
-8
lines changed| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
46 | 46 | | |
47 | 47 | | |
48 | 48 | | |
49 | | - | |
50 | | - | |
| 49 | + | |
| 50 | + | |
| 51 | + | |
| 52 | + | |
| 53 | + | |
51 | 54 | | |
52 | 55 | | |
53 | 56 | | |
| |||
65 | 68 | | |
66 | 69 | | |
67 | 70 | | |
| 71 | + | |
68 | 72 | | |
69 | | - | |
70 | | - | |
| 73 | + | |
| 74 | + | |
| 75 | + | |
71 | 76 | | |
72 | 77 | | |
73 | 78 | | |
| |||
89 | 94 | | |
90 | 95 | | |
91 | 96 | | |
| 97 | + | |
92 | 98 | | |
93 | 99 | | |
94 | 100 | | |
95 | | - | |
96 | | - | |
97 | | - | |
98 | | - | |
| 101 | + | |
| 102 | + | |
| 103 | + | |
| 104 | + | |
| 105 | + | |
| 106 | + | |
99 | 107 | | |
100 | 108 | | |
101 | 109 | | |
| |||
0 commit comments