Do Cholesky Decomposition With the GPU #529
Conversation
|
|
Rolling the build fix into the same pull request is fine. Afraid I don't know how to fix it, though.
… On Apr 22, 2017, at 6:00 PM, Steve Bronder ***@***.***> wrote:
Q: Should issues I have with this go under a separate issue or be listed here?
Attempting to run the tests with the current branch gives the following errors
$ ./runTests.py test/unit
------------------------------------------------------------
make -j1 test/unit/math_include_test test/unit/multiple_translation_units_test
clang++ -shared -fPIC -o test/unit/libmultiple.so test/unit/multiple_translation_units1.o test/unit/multiple_translation_units2.o
/usr/bin/ld: test/unit/multiple_translation_units1.o: relocation R_X86_64_32S against `_ZTVN4stan4math4variE' can not be used when making a shared object; recompile with -fPIC
test/unit/multiple_translation_units1.o: error adding symbols: Bad value
clang: error: linker command failed with exit code 1 (use -v to see invocation)
make: Warning: Archive 'test/libgtest.a' seems to have been created in deterministic mode. 'test/gtest.o' will always be updated. Please consider passing the U flag to ar to avoid the problem.
ar rv test/libgtest.a test/gtest.o
r - test/gtest.o
make: Warning: Archive 'test/libgtest.a' seems to have been created in deterministic mode. 'test/gtest.o' will always be updated. Please consider passing the U flag to ar to avoid the problem.
clang++ -I . -isystem lib/eigen_3.2.9 -isystem lib/boost_1.62.0 -isystem lib/viennacl_1.7.1 -isystemlib/cvodes_2.9.0/include -Wall -DBOOST_RESULT_OF_USE_TR1 -DBOOST_NO_DECLTYPE -DBOOST_DISABLE_ASSERTS -DNO_FPRINTF_OUTPUT -pipe -lOpenCL -Wno-unused-function -Wno-uninitialized -Wno-c++11-long-long -c -O3 -DGTEST_USE_OWN_TR1_TUPLE -DGTEST_HAS_PTHREAD=0 -Wno-c++11-long-long -isystem lib/gtest_1.7.0/include -isystem lib/gtest_1.7.0 test/unit/math_include_test.cpp -o test/unit/math_include_test.o
clang: warning: -lOpenCL: 'linker' input unused
clang++ -I . -isystem lib/eigen_3.2.9 -isystem lib/boost_1.62.0 -isystem lib/viennacl_1.7.1 -isystemlib/cvodes_2.9.0/include -Wall -DBOOST_RESULT_OF_USE_TR1 -DBOOST_NO_DECLTYPE -DBOOST_DISABLE_ASSERTS -DNO_FPRINTF_OUTPUT -pipe -lOpenCL -Wno-unused-function -Wno-uninitialized -Wno-c++11-long-long -O3 lib/gtest_1.7.0/src/gtest_main.cc test/unit/math_include_test.o -DGTEST_USE_OWN_TR1_TUPLE -DGTEST_HAS_PTHREAD=0 -Wno-c++11-long-long -isystem lib/gtest_1.7.0/include -isystem lib/gtest_1.7.0 -o test/unit/math_include_test test/libgtest.a lib/cvodes_2.9.0/lib/libsundials_nvecserial.a lib/cvodes_2.9.0/lib/libsundials_cvodes.a
make: *** No rule to make target 'test/unit/libmultiple.so', needed by 'test/unit/multiple_translation_units_test.cpp'. Stop.
rm test/unit/math_include_test.o
make -j1 test/unit/math_include_test test/unit/multiple_translation_units_test failed
exit now (04/22/17 17:56:19 EDT)
running clang++ -v returns
$ clang++ -v
clang version 3.8.0-2ubuntu4 (tags/RELEASE_380/final)
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/bin
Found candidate GCC installation: /usr/bin/../lib/gcc/x86_64-linux-gnu/5.4.0
Found candidate GCC installation: /usr/bin/../lib/gcc/x86_64-linux-gnu/6.0.0
Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/5.4.0
Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/6.0.0
Selected GCC installation: /usr/bin/../lib/gcc/x86_64-linux-gnu/5.4.0
Candidate multilib: .***@***.***
Selected multilib: .***@***.***
Found CUDA installation: /usr/local/cuda
I don't have experience with clang; I mostly use gcc or nvcc. What causes this part of the error for clang?
clang: error: linker command failed with exit code 1 (use -v to see invocation)
|
|
The part about 'test/libgtest.a' may be related to #528
|
For the -fPIC thing, add "-fPIC" to the end of the CFLAGS line in math/makefile. (The relocation error means objects destined for a shared library were compiled as position-dependent code; -fPIC makes them position-independent so they can be linked into a .so.) I think it's unresolved (http://discourse.mc-stan.org/t/stan-math-ubuntu-clang-needs-fpic/339/5). You'll need to do a make clean and then build again. I'm not sure what the test/libgtest.a warnings are. I see them on Ubuntu 16.04, but since nothing blows up I do not want to investigate, haha.
|
Does the test pass on the develop branch? No need to test everything; just run the failing test, e.g. ./runTests.py test/unit/multiple_translation_units_test. We may have an incompatibility with that test and gcc 5 that we didn't know about until recently.
|
Compilation Notes:
Yes this does the trick!
Running ... returns ... The second warning, as noted above, comes from something in g++ for linux and does not seem to be a huge issue.

Implementation Notes:
On line 351 of cholesky_decompose.hpp there is an if statement that checks if ... I think what we want is: if the user specifies they want to do computation on the GPU, the Cholesky gets dispatched there (a rough sketch of this kind of dispatch follows).
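A rough sketch of that dispatch, where the flag name, size cutoff, and cholesky_gpu helper are all illustrative assumptions rather than the actual contents of cholesky_decompose.hpp:

#include <Eigen/Dense>

// Hypothetical opt-in GPU dispatch, not the actual cholesky_decompose.hpp:
// ship large matrices to a GPU-backed path when the user asks for it,
// otherwise fall back to Eigen's LLT on the CPU.
inline Eigen::MatrixXd cholesky_decompose_sketch(const Eigen::MatrixXd& m) {
#ifdef STAN_MATH_USE_GPU                 // assumed user-set compile flag
  if (m.rows() >= 500)                   // assumed cutoff; GPU pays off only when large
    return cholesky_gpu(m);              // hypothetical ViennaCL-backed helper
#endif
  return m.llt().matrixL();              // CPU path: Eigen's LLT
}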
|
|
Update: Running Which is good at this point because at least it compiles. I just started my new job so am going to be a little slow in writing this (hopefully this weekend). At this point I just need to go through the unit tests with a debugger and see what is causing the failure. @rtrangucci would you have time to take a look at this? I think it is like 90% of the way there, but I'm doing something simple that is giving it the wrong output. |
|
Some bad news. Here is a link to a dropbox folder that contains that cholesky test that I originally ran. However, in this version we do not assume that the matrix is diagonal dominant. The results of the output are quoted in the reply below. Also, oddly, it's much slower even at the 3600x3600 case. Why would the diagonal dominant assumption have such a large effect? How fair is it to assume that the matrix that comes into the Cholesky will be diagonal dominant?
|
Wow, that really is dramatic. What does diagonal dominant mean?
Is that precision the precision of the matching? We can get by with less than 2e-16 precision.
Sorry if this sounds stupid, but are you sure that's just a difference of input matrices and that nothing else changed?
Also, what's with LU vs. LLT?
...
… -- Times for 3600x3600 Matrix --
Precision: 2.22045e-16
Time for Partial Pivot LU (Eigen): 110.272
----------------------------------------------
Time for LLT (Eigen): 0.55694
----------------------------------------------
Time for LU (ViennaCL): 3.04208
----------------------------------------------
Eigen LU Partial Pivot and ViennaCL Lower Diagonal Do Not Match!
----------------------------------------------
Eigen LLT and ViennaCL Lower Diagonal Do Not Match!
----------------------------------------------
Also, oddly, it's much slower even at the 3600x3600 case.
Why would the diagonal dominant assumption have such a large effect? How fair is it to assume that the matrix that comes into the Cholesky will be diagonal dominant?
|
1. Diagonal dominant means that for each column the diagonal is the largest value.
2. Yes, I need to test how far off we are given less precision. I also want to look at the Slovenian team's OpenCL code. If their cholesky works with better precision on non diagonal dominant matrices then it should not be too hard to integrate.
3. I think so, but I will double check.
4. To do cholesky in ViennaCL we have to take the L of the LU and divide each column by the diagonal element to get the L of the LLT decomposition. We do this for both the LU of Eigen and ViennaCL to make sure it works on both. My Linear Algebra is rusty, but maybe if it's not diagonal dominant we don't need to do the division? I'll play around with this, though any suggestions would be appreciated.
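For reference, the algebra behind that column scaling, assuming an unpivoted LU with unit-lower-triangular L (a sketch of the math, not a claim about ViennaCL's storage conventions): since A is symmetric positive definite, U = D L^T with D = diag(U), so

A = L U = L D L^T = (L D^{1/2}) (L D^{1/2})^T,

i.e. the Cholesky factor is L with column j scaled by sqrt(u_jj), a square-root scaling rather than a plain division, which might be part of the discrepancy.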
|
Thanks. I didn't see a comment from @betanalpha, but we talked about this today and he thought the problem would arise when LU did a lot of pivoting. He thought LLT would be a lot faster. @rtrangucci mentioned that our use case for these will often have non-diagonal dominant matrices. But if it's really just that the diagonal is the largest, then that'll certainly be satisfied by correlation matrices, which have 1 on the diagonal and values strictly between -1 and 1 elsewhere.
|
I ran across this the other day: https://en.wikipedia.org/wiki/Cholesky_decomposition#Stability_of_the_computation (which is I assume what @betanalpha was saying). Apparently Cholesky doesn't require the pivots to be numerically stable, which is why it's nice if you've got a positive definite matrix you want to factor.
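For concreteness, here is the textbook unpivoted algorithm the linked section describes; a plain CPU sketch, not the GPU kernel under discussion:

#include <Eigen/Dense>
#include <cmath>
using Eigen::MatrixXd;

// Plain Cholesky-Crout: for a positive definite input the quantity under
// the square root is provably positive, so no pivoting is needed.
MatrixXd cholesky_crout(const MatrixXd& A) {
  const int n = A.rows();
  MatrixXd L = MatrixXd::Zero(n, n);
  for (int j = 0; j < n; ++j) {
    double d = A(j, j) - L.row(j).head(j).squaredNorm();
    L(j, j) = std::sqrt(d);  // d > 0 whenever A is positive definite
    for (int i = j + 1; i < n; ++i)
      L(i, j) = (A(i, j) - L.row(i).head(j).dot(L.row(j).head(j))) / L(j, j);
  }
  return L;
}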
|
I think @betanalpha and @bbbales2 are on the money. We are using the LU in the GPU code to get to the LLT, so pivots matter. Testing the Slovenian team's code with the non diagonal dominant matrices also fails (even at ...).

So I'll do a deep dive to see if other external libraries contain a stable Cholesky for the GPU. Perhaps MAGMA ...

EDIT: Maybe we can use the ViennaCL preconditioners? Not totally sure if that's what we want, but maybe it will help.

New job is keeping me busy, but I can poke around this week and weekend.
|
I somehow missed this topic on GitHub, so I am a bit late to the party, sorry. I looked into this and I am not really sure about the input matrix you are using. If the input matrix is not positive definite, then the Cholesky should output NaN, thus the precision error.
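A quick way to see this with Eigen (a sketch; Eigen's LLT reports the failure through info(), while a hand-rolled kernel would take the square root of a negative pivot and emit NaNs):

#include <Eigen/Dense>
#include <iostream>
using Eigen::MatrixXd;

int main() {
  MatrixXd A(2, 2);
  A << 1, 2,
       2, 1;  // symmetric but indefinite: eigenvalues are 3 and -1
  Eigen::LLT<MatrixXd> llt(A);
  std::cout << (llt.info() == Eigen::NumericalIssue
                    ? "not positive definite\n"
                    : "factored fine\n");
  return 0;
}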
|
@rok-cesnovar you are correct! Apologies, I did the new tests after work one day and my brain must have been shut off. Fixing that allows for both the GPU versions to be equal to the Eigen version with precision of 2.22045e-12, though they do not match at deeper precision.

FTR: here is how I am computing the test matrix now

MatrixXd A = MatrixXd::Random(m, m);
// Make matrices symmetric positive definite
for (int i = 0; i < m; i++) {
  for (int j = i + 1; j < m; j++) {
    A(i, j) = A(j, i);
  }
}
A = A * A.transpose();

Here is a dropbox containing the new tests: https://www.dropbox.com/s/m88qnjj5orgnq5k/eigen_vs_vienna_dense_lu.tar.gz?dl=0

Below are the results on my machine (Benchmark: Eigen vs. ViennaCL performance; precision 2.22045e-12; device GeForce GTX 780 on platform NVIDIA CUDA; times in seconds; at every size both the ViennaCL and BayesCL lower diagonals match Eigen's LLT):

Size        LLT (Eigen)   LU (ViennaCL)   BayesCL
16x16       9.9e-05       0.000306        0.001507
100x100     0.003874      0.017542        0.0042
400x400     0.150595      0.064673        0.020384
900x900     1.41396       0.283503        0.069564
1600x1600   8.09797       0.661283        0.171118
2500x2500   28.8914       2.17089         0.482391

And here is a table showing the relative speedup as compared to Eigen for the larger matrices:

Size   Eigen   ViennaCL   BayesCL
400    1       2.5        7.5
900    1       5          23
1600   1       12         47
2500   1       13         60

So BayesCL gives the best results! Though with the recent update to Eigen 3.3 I should probably rewrite the tests to use the inplace LLT (a quick sketch follows).

So now the question is, supposing these tests are correct, why are the results not matching up in the unit tests?

With the obvious speedups of BayesCL I do think it would be worth using them. Since it's rather easy to use custom kernels in ViennaCL, I think we should first get the plain ViennaCL version working and then swap the ViennaCL version with the BayesCL version.

I also want to point out that the ViennaCL team would most likely be very interested in having this LLT decomp in their library. Perhaps we can start talking with them about implementing it there? @rok-cesnovar would you want to reach out to them?
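For reference, a minimal sketch of the Eigen 3.3 in-place LLT mentioned above (the function name is illustrative):

#include <Eigen/Dense>
using Eigen::MatrixXd;

// In-place LLT (Eigen 3.3+): with an Eigen::Ref the factorization writes
// into A's own storage instead of an internal copy, which matters at
// the matrix sizes benchmarked above.
MatrixXd inplace_llt(MatrixXd& A) {
  Eigen::LLT<Eigen::Ref<MatrixXd>> llt(A);  // factors directly into A
  return llt.matrixL();                     // read the factor back out
}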
|
Awesome!

We need to figure out the issue with the tests. If the results aren't valid, then the timing results don't actually hold.

Do you have a handle on the math? Looks like the Cholesky decomposition isn't unique if the input matrix is positive semi-definite. If you're testing with positive semi-definite matrices, then make sure we're recovering the original matrix. I'd also test with positive definite matrices, where there is a unique Cholesky decomposition.
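For what it's worth, a minimal sketch of the recovery check being suggested (Eigen types assumed; the tolerance is an illustrative choice, not a project standard):

#include <Eigen/Dense>
using Eigen::MatrixXd;

// Rather than comparing two Cholesky factors entry-by-entry (they can
// legitimately differ when A is only positive semi-definite), check
// that each candidate factor reproduces the input matrix.
bool recovers_input(const MatrixXd& L, const MatrixXd& A,
                    double tol = 1e-12) {
  return (L * L.transpose() - A).norm() <= tol * A.norm();
}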
|
|
Steve, check out my comment on gpu_cholesky. The gradient isn't being calculated correctly, but I wrote something I think is correct in the comment. Can I run the tests on my own?
Rob
|
|
On Sat, May 6, 2017 at 2:03 PM, Steve Bronder ***@***.***> wrote:
FTR: here is how I am computing the test matrix now (code quoted above)
That may yield a weird distribution for A. But as long as we are testing this for use in Stan, you might as well do

MatrixXd L = stan::math::lkj_corr_cholesky_rng(1.0);
MatrixXd A = stan::math::multiply_lower_tri_self_transpose(L);

and perhaps also do some tests with a diagonal matrix of very different standard deviations that pre-multiplies lkj_corr_cholesky_rng(1.0).
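A minimal sketch of that scaled test, assuming Stan Math's lkj_corr_cholesky_rng(K, eta, rng) signature and a boost RNG (the snippet above omits the dimension and RNG arguments; the helper name and scale range here are illustrative):

#include <boost/random/mersenne_twister.hpp>
#include <stan/math.hpp>
#include <Eigen/Dense>
using Eigen::MatrixXd;
using Eigen::VectorXd;

// Draw a random correlation Cholesky factor, scale its rows by wildly
// different standard deviations, and form the covariance matrix to
// hand to the Cholesky under test.
MatrixXd make_test_cov(int m, boost::mt19937& rng) {
  MatrixXd L = stan::math::lkj_corr_cholesky_rng(m, 1.0, rng);
  VectorXd sds = VectorXd::LinSpaced(m, 1e-3, 1e3);  // very different sds
  MatrixXd DL = sds.asDiagonal() * L;                // D * L stays lower triangular
  return stan::math::multiply_lower_tri_self_transpose(DL);
}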
|
|
I apologize for the long post. I had time today to go through and really think about everything.

TL;DR: All the tests for stan are passing, and I rewrote the benchmark so that it uses the suggested test matrices.

... ehhh, sort of. I may need a little hand holding from time to time. It's been a while since I have done any serious linear algebra. For instance, I was doing ...

Benchmark Results:
I remade the benchmark so that we use the suggested test matrices. Below are the measures of speedup relative to Eigen ... The custom BayesCL kernel is about twice as fast as ViennaCL.

stan Tests:
@syclik with the most recent commit the tests do pass! With the current development version we are really only testing ... One thing to note: as you can see here in ...

Some other things ...
|
|
@bgoodri would it be better to use

MatrixXd L = stan::math::lkj_corr_cholesky_rng(1.0);
MatrixXd A = stan::math::multiply_lower_tri_self_transpose(L);

Tmrw I will rewrite the benchmark so that it runs a few different matrices and collects the times and whether they all matched.
|
@rtrangucci just saw your post, will go and look at your comment now!
|
@rtrangucci for some reason I can see part of your comment from the email github sent me, but it won't take me to the full comment. Could you post the comment here?
It's easy to run, but it can take a little time setting up. If you have an NVIDIA GPU, I used these instructions to get mine all set up.
|
Another question: once we have the GPU version, how do we check it with Travis?
|
@Stevo15025 Here's the comment below: I think the gradient code should be something like this, starting on line 306:
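Rob's actual snippet isn't shown above, so for context here is a rough sketch of the standard reverse-mode Cholesky pullback (in the style of Murray's "Differentiation of the Cholesky decomposition"); the function is illustrative, not Rob's code:

#include <Eigen/Dense>
using Eigen::MatrixXd;

// Reverse-mode adjoint of the Cholesky factorization A = L * L^T:
//   Abar = L^{-T} * Phi(L^T * Lbar) * L^{-1},
// where Phi keeps the lower triangle and halves the diagonal (the same
// divide-the-diagonal-by-2 step that comes up in the next comments).
MatrixXd cholesky_pullback(const MatrixXd& L, const MatrixXd& Lbar) {
  MatrixXd P = L.transpose() * Lbar;
  MatrixXd Pl = P.triangularView<Eigen::Lower>();  // zero the strict upper part
  Pl.diagonal() *= 0.5;                            // halve the diagonal
  // X = L^{-T} * Pl via an upper-triangular solve with L^T
  MatrixXd X = L.transpose().triangularView<Eigen::Upper>().solve(Pl);
  // Abar = X * L^{-1}, computed as (L^{-T} * X^T)^T
  MatrixXd Y = L.transpose().triangularView<Eigen::Upper>().solve(X.transpose());
  return Y.transpose();  // symmetrize afterwards if A's adjoint must be symmetric
}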
|
@rtrangucci I updated the code with your code, though putting cholesky_gpu() back into things makes the tests fail again.

Running main() from gtest_main.cc
[==========] Running 6 tests from 1 test case.
[----------] Global test environment set-up.
[----------] 6 tests from AgradRevMatrix
[ RUN ] AgradRevMatrix.mat_cholesky
[ OK ] AgradRevMatrix.mat_cholesky (0 ms)
[ RUN ] AgradRevMatrix.exception_mat_cholesky
[ OK ] AgradRevMatrix.exception_mat_cholesky (0 ms)
[ RUN ] AgradRevMatrix.mat_cholesky_1st_deriv_small
[ OK ] AgradRevMatrix.mat_cholesky_1st_deriv_small (163 ms)
[ RUN ] AgradRevMatrix.check_varis_on_stack_small
[ OK ] AgradRevMatrix.check_varis_on_stack_small (0 ms)
[ RUN ] AgradRevMatrix.mat_cholesky_1st_deriv_large_gradients
test/unit/math/rev/mat/fun/cholesky_decompose_test.cpp:260: Failure
The difference between grad_fd(i) and grad_ad(i) is 14.279985728486359, which exceeds prec, where
grad_fd(i) evaluates to -1.8259633576235501,
grad_ad(i) evaluates to -16.105949086109909, and
prec evaluates to 1e-08.
test/unit/math/rev/mat/fun/cholesky_decompose_test.cpp:260: Failure
The difference between grad_fd(i) and grad_ad(i) is 1.563984294565459, which exceeds prec, where
grad_fd(i) evaluates to 0.55275936344211618,
grad_ad(i) evaluates to 2.1167436580075751, and
prec evaluates to 1e-08.
test/unit/math/rev/mat/fun/cholesky_decompose_test.cpp:260: Failure
The difference between grad_fd(i) and grad_ad(i) is 488.01477513919122, which exceeds prec, where
grad_fd(i) evaluates to -494.24178632483142,
grad_ad(i) evaluates to -982.25656146402264, and
prec evaluates to 1e-08.
[ FAILED ] AgradRevMatrix.mat_cholesky_1st_deriv_large_gradients (6691 ms)
[ RUN ] AgradRevMatrix.check_varis_on_stack_large
[ OK ] AgradRevMatrix.check_varis_on_stack_large (0 ms)
[----------] 6 tests from AgradRevMatrix (6854 ms total)
[----------] Global test environment tear-down
[==========] 6 tests from 1 test case ran. (6854 ms total)
[ PASSED ] 5 tests.
[ FAILED ] 1 test, listed below:
[ FAILED ] AgradRevMatrix.mat_cholesky_1st_deriv_large_gradients
1 FAILED TEST
test/unit/math/rev/mat/fun/cholesky_decompose_test --gtest_output="xml:test/unit/math/rev/mat/fun/cholesky_decompose_test.xml" failed
exit now (05/06/17 19:52:48 EDT)

You seem to be very close though. I'll look more at this tmrw. If you have any problems installing nvcc from the link above feel free to contact me tmrw as I will be around all day.
|
We need to divide the diagonal of the result, vcl_Lbar, by 2. That should work. My bad!
Rob
|
No worries! Added this to the latest commit here. I'm doing the division on the diagonal after we move back to Eigen. Still off on the tests though? I'm about to run out of the house, but tomorrow I will look over the paper.
|
Oh, reading the paper I see the division I am doing is in the wrong place.
(force-pushed from c104ce7 to 2bf82a7)
|
So all of the GPU tests are passing on my local machine. I've added the ... Oddly, ...

On a side note, should we think about having a separate PR for adding ViennaCL to devel? Then this pull will be more like 500 lines and not 100K.
|
Here's a link to the error. Seems to be an operator overload error.
|
Nvm. Going to run all the tests overnight and if everything passes we should be good to go!
|
Yes, having a pull request just for ViennaCL would be very helpful.
|
|
Closing this for #637 |

Submission Checklist
- Run unit tests: ./runTests.py test/unit with g++
- Run unit tests: ./runTests.py test/unit with clang++
- Run make cpplint

Summary:
Passes Cholesky Decompositions to the GPU using ViennaCL
Intended Effect:
Gives users the ability to pass computation for Cholesky Decompositions to the GPU
How to Verify:
TODO: Make and run tests
Side Effects:
TODO
Documentation:
TODO
Reviewer Suggestions:
Copyright and Licensing
Copyright holder: Steve Bronder
By submitting this pull request, the copyright holder is agreeing to license the submitted work under the following licenses:
Yes