Improvements to the cudamatrix directory. #3221
Conversation
In src/cudamatrix/cu-device.h:

```diff
   inline cublasHandle_t GetCublasHandle() { return cublas_handle_; }
   inline cusparseHandle_t GetCusparseHandle() { return cusparse_handle_; }
   inline curandGenerator_t GetCurandHandle() { return curand_handle_; }
+  inline cusolverDnHandle_t GetCusolverDnHandle() { return cusolverdn_handle_; }
```
Is this needed for something just yet? Not that I necessarily object, just asking.
Not yet, but it will be for future commits. The example where I'm using it is a solver instead of the built-in matrix inversion. Here is a snapshot from the ivector extractor I'm working on:

```cpp
#if 0
  quadratic.Invert();
  ivector->Resize(ivector_dim_, kUndefined);
  ivector->AddSpVec(1.0, quadratic, linear, 0.0);
#else
  // x = quadratic^-1 * linear
  // ivector += x
  // Inverting the matrix is unnecessary. We are only solving a single
  // linear system, so just use Cholesky to solve for a single ivector.
  // Equation being solved: quadratic * ivector = linear
  int nrhs = 1;
  ivector->Resize(ivector_dim_, kUndefined);
  // cusolver does an in-place solve, so copy the RHS to ivector.
  ivector->CopyFromVec(linear);
  // Form a new non-SP matrix for cusolver.
  CuMatrix<float> A(quadratic);
  // This is the cusolver return code. Checking it would require
  // synchronization, so we do not check it.
  int *d_info = NULL;
  // Query the temp buffer size.
  int L_work;
  CUSOLVER_SAFE_CALL(cusolverDnSpotrf_bufferSize(
      GetCusolverDnHandle(), CUBLAS_FILL_MODE_LOWER, ivector_dim_,
      A.Data(), A.Stride(), &L_work));
  // Allocate the temp buffer.
  float *workspace =
      static_cast<float*>(CuDevice::Instantiate().Malloc(L_work));
  // Perform the factorization.
  CUSOLVER_SAFE_CALL(cusolverDnSpotrf(
      GetCusolverDnHandle(), CUBLAS_FILL_MODE_LOWER, ivector_dim_,
      A.Data(), A.Stride(), workspace, L_work, d_info));
  // Solve for the RHS.
  CUSOLVER_SAFE_CALL(cusolverDnSpotrs(
      GetCusolverDnHandle(), CUBLAS_FILL_MODE_LOWER, ivector_dim_, nrhs,
      A.Data(), A.Stride(), ivector->Data(), ivector_dim_, d_info));
  CuDevice::Instantiate().Free(workspace);
#endif
```

Note we could also integrate something similar into the built-in inversion routines, as this library supports both Cholesky and LU inversion. But in the case I'm working on, inversion is overkill.
Hm, OK, I'll merge. Can you first confirm that you ran the tests in src/ with CUDA enabled, and none failed, other than the dct thing which we are fixing?
I ran make test and nothing failed up to the dct test; it stopped there and did not continue. I did also run the tests in cudamatrix, which ran fine.
OK, please rerun after merging with master, because we fixed the DCT thing.
Force-pushed 57e7104 to c3c3d69.

Rebased onto master and am now testing at my patch~1 (so pure master). There are failures. Going to run through and see if I can figure any out, and will report any that are not simply a tolerance that is too tight. So far one matrix test failed due to a tolerance of 1e-5 instead of 1e-4.
The one failure with a tolerance in cudamatrix is the one I've already fixed in my patch set (duh!). Anyway, there are other failures. I tested a few random directories to see.

nnet2 failure:

```
LOG ([5.5.287-9b730]:UnitTestGenericComponentInternal():nnet-component-test.cc:214) DctComponent, input-dim=10, output-dim=2, dct_dim=5, dct_keep_dim=1
LOG ([5.5.287-9b730]:UnitTestGenericComponentInternal():nnet-component-test.cc:83) Comparing feature gradients 10 times.
LOG ([5.5.287-9b730]:UnitTestGenericComponentInternal():nnet-component-test.cc:116) Input gradients: comparing 1.62634e-05 and 7.41854e-05
WARNING ([5.5.287-9b730]:UnitTestGenericComponentInternal():nnet-component-test.cc:121) Bad difference!
LOG ([5.5.287-9b730]:UnitTestGenericComponentInternal():nnet-component-test.cc:116) Input gradients: comparing 1.16659e-05 and 3.0525e-05
WARNING ([5.5.287-9b730]:UnitTestGenericComponentInternal():nnet-component-test.cc:121) Bad difference!
LOG ([5.5.287-9b730]:UnitTestGenericComponentInternal():nnet-component-test.cc:116) Input gradients: comparing 2.23374e-05 and 1.37091e-05
WARNING ([5.5.287-9b730]:UnitTestGenericComponentInternal():nnet-component-test.cc:121) Bad difference!
LOG ([5.5.287-9b730]:UnitTestGenericComponentInternal():nnet-component-test.cc:116) Input gradients: comparing -9.48816e-05 and -1.90735e-06
WARNING ([5.5.287-9b730]:UnitTestGenericComponentInternal():nnet-component-test.cc:121) Bad difference!
LOG ([5.5.287-9b730]:UnitTestGenericComponentInternal():nnet-component-test.cc:116) Input gradients: comparing 7.98946e-05 and 4.42564e-06
WARNING ([5.5.287-9b730]:UnitTestGenericComponentInternal():nnet-component-test.cc:121) Bad difference!
LOG ([5.5.287-9b730]:UnitTestGenericComponentInternal():nnet-component-test.cc:116) Input gradients: comparing -4.7326e-05 and -3.51667e-05
WARNING ([5.5.287-9b730]:UnitTestGenericComponentInternal():nnet-component-test.cc:121) Bad difference!
LOG ([5.5.287-9b730]:UnitTestGenericComponentInternal():nnet-component-test.cc:116) Input gradients: comparing 0.00015349 and -4.29675e-05
WARNING ([5.5.287-9b730]:UnitTestGenericComponentInternal():nnet-component-test.cc:121) Bad difference!
LOG ([5.5.287-9b730]:UnitTestGenericComponentInternal():nnet-component-test.cc:116) Input gradients: comparing -7.85769e-05 and 1.3791e-05
WARNING ([5.5.287-9b730]:UnitTestGenericComponentInternal():nnet-component-test.cc:121) Bad difference!
LOG ([5.5.287-9b730]:UnitTestGenericComponentInternal():nnet-component-test.cc:116) Input gradients: comparing 6.35098e-06 and -1.19284e-05
WARNING ([5.5.287-9b730]:UnitTestGenericComponentInternal():nnet-component-test.cc:121) Bad difference!
LOG ([5.5.287-9b730]:UnitTestGenericComponentInternal():nnet-component-test.cc:116) Input gradients: comparing -2.54259e-05 and 1.44467e-05
WARNING ([5.5.287-9b730]:UnitTestGenericComponentInternal():nnet-component-test.cc:121) Bad difference!
LOG ([5.5.287-9b730]:UnitTestGenericComponentInternal():nnet-component-test.cc:128) Succeeded for 0 out of 10 tries.
ERROR ([5.5.287-9b730]:UnitTestGenericComponentInternal():nnet-component-test.cc:132) Feature-derivative check failed
```

nnet3 failure:

```
Input-indexes: <I1V> 8 <I1> 0 3 0 <I1> 0 4 0 <I1> 0 5 0 <I1> 0 6 0 <I1> 1 3 0 <I1> 1 4 0 <I1> 2 3 0 <I1> 2 4 0
Input-indexes-modified: <I1V> 12 <I1> 0 3 0 <I1> 0 4 0 <I1> 1 3 0 <I1> 1 4 0 <I1> 2 3 0 <I1> 2 4 0 <I1> 0 5 0 <I1> 0 6 0 <I1> 1 -2147483648 0 <I1> 1 -2147483648 0 <I1> 2 -2147483648 0 <I1> 2 -2147483648 0
Output-indexes: <I1V> 4 <I1> 0 4 0 <I1> 0 6 0 <I1> 1 4 0 <I1> 2 4 0
Output-indexes-modified: <I1V> 6 <I1> 0 4 0 <I1> 1 4 0 <I1> 2 4 0 <I1> 0 6 0 <I1> 1 -2147483648 0 <I1> 2 -2147483648 0
LOG ([5.5.287-9b730]:UnitTestTimeHeightConvolutionCompile():convolution-test.cc:396) iter = 1
WARNING ([5.5.287-9b730]:Check():convolution.cc:195) The input at the 2'th height is never used.
WARNING ([5.5.287-9b730]:GetRandomConvolutionModel():convolution-test.cc:70) Regenerating model because it didn't pass the check: num-filters-in=3, num-filters-out=7, height-in=8, height-out=1, height-subsample-out=1, {time,height}-offsets=[0,-1 0,0 0,1], required-time-offsets=[0], input-dim=24, output-dim=7
LOG ([5.5.287-9b730]:TestRunningComputation():convolution-test.cc:298) Tested convolution for model: num-filters-in=3, num-filters-out=9, height-in=4, height-out=3, height-subsample-out=1, {time,height}-offsets=[0,0 0,2 1,0 2,1 2,2], required-time-offsets=[0,2], input-dim=12, output-dim=27
LOG ([5.5.287-9b730]:TestDataBackprop():convolution-test.cc:341) Expected objf = -29.8989, observed objf = -29.8989
LOG ([5.5.287-9b730]:TestParamsBackprop():convolution-test.cc:384) Expected objf = 26.8043, observed objf = 26.8043
LOG ([5.5.287-9b730]:UnitTestTimeHeightConvolutionCompile():convolution-test.cc:432) Input-indexes: <I1V> 3 <I1> 2 3 1 <I1> 2 4 1 <I1> 2 5 1
Input-indexes-modified: <I1V> 3 <I1> 2 3 1 <I1> 2 4 1 <I1> 2 5 1
Output-indexes: <I1V> 1 <I1> 2 3 1
Output-indexes-modified: <I1V> 1 <I1> 2 3 1
LOG ([5.5.287-9b730]:UnitTestTimeHeightConvolutionCompile():convolution-test.cc:396) iter = 2
WARNING ([5.5.287-9b730]:Check():convolution.cc:195) The input at the 3'th height is never used.
WARNING ([5.5.287-9b730]:GetRandomConvolutionModel():convolution-test.cc:70) Regenerating model because it didn't pass the check: num-filters-in=8, num-filters-out=4, height-in=4, height-out=3, height-subsample-out=1, {time,height}-offsets=[0,-1 0,0 1,-2], required-time-offsets=[0,1], input-dim=32, output-dim=12
LOG ([5.5.287-9b730]:TestRunningComputation():convolution-test.cc:298) Tested convolution for model: num-filters-in=1, num-filters-out=6, height-in=4, height-out=4, height-subsample-out=1, {time,height}-offsets=[0,0], required-time-offsets=[0], input-dim=4, output-dim=24
LOG ([5.5.287-9b730]:TestDataBackprop():convolution-test.cc:341) Expected objf = -5.6619, observed objf = -5.6619
LOG ([5.5.287-9b730]:TestParamsBackprop():convolution-test.cc:384) Expected objf = 0.103592, observed objf = -4.79865
ERROR ([5.5.287-9b730]:TestParamsBackprop():convolution-test.cc:388) Difference in objf too large.

[ Stack-Trace: ]
kaldi::MessageLogger::LogMessage() const
kaldi::MessageLogger::LogAndThrow::operator=(kaldi::MessageLogger const&)
kaldi::nnet3::time_height_convolution::TestParamsBackprop(kaldi::nnet3::time_height_convolution::ConvolutionModel const&, std::vector<kaldi::nnet3::Index, std::allocator<kaldi::nnet3::Index> > const&, std::vector<kaldi::nnet3::Index, std::allocator<kaldi::nnet3::Index> > const&, kaldi::nnet3::time_height_convolution::ConvolutionComputation const&)
kaldi::nnet3::time_height_convolution::UnitTestTimeHeightConvolutionCompile()
kaldi::nnet3::time_height_convolution::UnitTestTimeHeightConvolution()
main
__libc_start_main
_start
```

Another nnet3 failure:

```
WARNING ([5.5.287-9b730]:PreconditionDirectionsCpu():natural-gradient-online-test.cc:205) Floored 5 elements of d_{t+1}.
WARNING ([5.5.287-9b730]:PreconditionDirectionsCpu():natural-gradient-online-test.cc:205) Floored 4 elements of d_{t+1}.
WARNING ([5.5.287-9b730]:PreconditionDirectionsCpu():natural-gradient-online-test.cc:205) Floored 5 elements of d_{t+1}.
WARNING ([5.5.287-9b730]:SelfTest():natural-gradient-online.cc:316) Failed to verify W_t (worst error: O[0,0] = 8.46464e+08, d_t = [ 1.33135e-09 ]
ASSERTION_FAILED ([5.5.287-9b730]:SelfTest():natural-gradient-online.cc:283) Assertion failed: (rho_t_ > 0.9 * delta_ * d_t_max)

[ Stack-Trace: ]
kaldi::MessageLogger::LogMessage() const
kaldi::KaldiAssertFailure_(char const*, char const*, int, char const*)
kaldi::nnet3::OnlineNaturalGradient::SelfTest() const
kaldi::nnet3::OnlineNaturalGradient::PreconditionDirectionsInternal(float, float, bool, kaldi::Vector<float> const&, kaldi::CuMatrixBase<float>*, kaldi::CuMatrixBase<float>*)
kaldi::nnet3::OnlineNaturalGradient::PreconditionDirections(kaldi::CuMatrixBase<float>*, float*)
kaldi::nnet3::OnlineNaturalGradient::Init(kaldi::CuMatrixBase<float> const&)
kaldi::nnet3::OnlineNaturalGradient::PreconditionDirections(kaldi::CuMatrixBase<float>*, float*)
kaldi::nnet3::UnitTestPreconditionDirectionsOnline()
main
__libc_start_main
_start
```

Again, these are all without my changes.
Were these errors while running on CPU? You can tell from seeing whether it already printed that it got a GPU. Is this setup using MKL? I suspect the change to MKL may be altering things.
This is not running with MKL; we are using ATLAS. It looks like it has gotten to the GPU tests. See the attached log for the nnet2 failure.
Looks like GitHub does not allow attachments. I am starting to test the current master on our hardware. Could it be those synchronization bugs due to newer hardware? And you are sure this is master you are testing, none of your changes? Do "make depend" and "make clean" in cudamatrix to be sure it's not a dependency-tracking issue.
I did run make depend and make clean; the only thing I didn't do is a distclean and reconfigure. It certainly could be synchronization bugs on newer hardware. Do you run any regression tests on V100?
No, I haven't tested on V100. For me, on our hardware, the tests work with current master in the nnet3 directory. Perhaps Shiyin's commit would go some way to resolving it?
Possibly. I think we should accept that commit, and I'll rebase Monday and retest.
Force-pushed c2465ee to 02beb2e.

src/cudamatrix/cu-matrix.cc (outdated):

```cpp
#if HAVE_CUDA == 1
#include <nvToolsExt.h>
```
Is this needed?
Looks like this is left over from profiling. It can be safely removed.
Changes:

cu-array-inl.h, cu-packed-matrix.cc:
    Remove unnecessary synchronization; synchronization will occur with stream semantics.
cu-device.h, cu-device.cc, cuda-common.h, cuda_64bit.mk:
    Add a handle for the cusolverDn library. Future changes will rely on this.
cu-kernels-ansi.h, cu-kernels.cu, cu-kernels.h:
    Add RowSumMat kernel support, which mirrors ColSumMat but operates on rows.
cu-matrix.cc:
    Make cudaMemset2D asynchronous. Synchronization is handled via streams.
cu-value.h:
    Added -= operator, which mirrors the += operator.
cu-vector.cc, cu-vector.h:
    Added ApplyLogSoftMax, which matches the CPU version. Removed stream synchronization on AddMatVec (handled by streams). Use a direct kernel for the row sum instead of a mat-vec; this is more efficient as it avoids an extra allocation and memset.
cu-sparse-matrix-test.cc:
    Adjusted epsilon to be more tolerant of order-of-operations floating-point error.
Force-pushed 02beb2e to b0ae28c.

@huangruizhe can you please ASAP prepare a PR that reverts just the cusolver-related parts of this PR?