Some rambling
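I wanted to get into 3D deep learning, but at my level, figuring it out alone would be a pipe dream. Then I came across a promo video for pytorch3d on YouTube and figured that learning from its mature examples would be a shortcut. I never expected the installation alone to make me want to throw up.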
Looking back at my history:
- 2021/05/26 13:50 Came across pytorch3d on YouTube by chance
- 2021/05/27 13:50 Finished installing, testing, and running an example
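That's exactly 24 hours of fiddling. Along the way I switched computers (the old one had an older GPU), reinstalled the OS, went from die-hard plain-Python user to reluctant Anaconda convert, and switched CUDA versions. I also hit other problems that I've since forgotten. Fortunately, my persistence paid off in the end, so I decided to write the process down as a reference for myself and anyone else who needs it.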
Installing the core libraries !Mind the version matching between every part!
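This part is black magic, and I can't really explain how the versions are supposed to be matched. The CUDA/pytorch/torchvision combinations in install.md on the pytorch3d GitHub differ from the ones on the pytorch website. I tried many combinations, and the one that finally worked was: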
python 3.8 + cuda 10.2 + pytorch 1.7.1 + torchvision 0.8.2 + cub 1.10.0
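cudnn and cudatoolkit use the same version as CUDA. Installing cub directly with conda doesn't work. Instead, download the release from GitHub and add CUB_HOME to your environment variables.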
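Before starting a long compile, it can save time to check the environment first. Here is a minimal sketch that compares the installed versions against the combination above and checks whether CUB_HOME looks right. The names (`KNOWN_GOOD`, `collect_versions`, `mismatches`, `check_cub_home`) are my own. The sketch assumes the CUB release was unpacked so that the header sits at `<CUB_HOME>\cub\cub.cuh`, which is how the CUB repository is laid out. Also note that `torch.version.cuda` is the CUDA version torch was built with. The extension itself is compiled by the toolkit's nvcc (`CUDA\v10.2\bin\nvcc` in the log further down), so `nvcc --version` should report the same release.

```python
import os
import sys
from pathlib import Path
from typing import Dict, List, Optional

# The combination that worked for me on Windows in May 2021:
# one data point, not an official compatibility table.
KNOWN_GOOD = {"python": "3.8", "cuda": "10.2", "torch": "1.7.1", "torchvision": "0.8.2"}


def collect_versions() -> Dict[str, str]:
    """Read the versions installed in the current environment."""
    found = {"python": f"{sys.version_info.major}.{sys.version_info.minor}"}
    try:
        import torch
        import torchvision
    except ImportError as exc:
        print(f"cannot import torch/torchvision: {exc}")
        return found
    # Strip local suffixes such as "+cu102" before comparing.
    found["torch"] = str(torch.__version__).split("+")[0]
    found["torchvision"] = str(torchvision.__version__).split("+")[0]
    # CUDA version torch was built against; None for CPU-only builds.
    found["cuda"] = torch.version.cuda or "none"
    return found


def mismatches(found: Dict[str, str], expected: Dict[str, str]) -> List[str]:
    """Describe every component whose version differs from the expected one."""
    return [
        f"{name}: expected {want}, found {found.get(name, 'missing')}"
        for name, want in expected.items()
        if found.get(name) != want
    ]


def check_cub_home(cub_home: Optional[str] = None) -> Path:
    """Check that CUB_HOME (or the given path) is the folder containing cub/cub.cuh."""
    value = cub_home if cub_home is not None else os.environ.get("CUB_HOME")
    if not value:
        raise RuntimeError("CUB_HOME is not set (reopen the terminal after setting it)")
    header = Path(value) / "cub" / "cub.cuh"
    if not header.is_file():
        raise RuntimeError(f"{header} not found")
    return Path(value)


if __name__ == "__main__":
    for line in mismatches(collect_versions(), KNOWN_GOOD) or ["versions match"]:
        print(line)
    print("CUB_HOME:", check_cub_home())
```

A newly set environment variable only shows up in terminals opened after it was set. Reopen PowerShell (or the Anaconda Prompt) before running this.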
In addition, install all the other dependencies in the order given in install.md. After that you're ready to take on the real pitfall.
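This is a quick way to confirm that the Python-side dependencies are importable before starting the build. The `DEPENDENCIES` list is my own assumption, roughly what install.md listed at the time (fvcore and iopath besides torch/torchvision). Adjust it to the install.md of the version you are building.

```python
import importlib.util
from typing import Iterable, List

# Assumption: import names of the dependencies; check install.md for your version.
DEPENDENCIES = ["torch", "torchvision", "fvcore", "iopath"]


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the names that cannot be found in the current environment."""
    # find_spec locates a top-level module without actually importing it.
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(DEPENDENCIES)
    print("missing:", ", ".join(missing) if missing else "none")
```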
Installing pytorch3d
Install straight from GitHub, from a local clone, or from a downloaded release?
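It makes no difference! I do recommend building locally, though. Requesting from GitHub on every attempt is slow, and it makes it harder to see where the problem is. A clone and a release also behave the same: whatever fails to compile in one fails in the other.

Online I found advice to use the release version and run x64 Native Tools Command Prompt for VS 2019 as administrator. It did absolutely nothing for me. Some people modify the source instead, and install.md itself explains that early pytorch versions needed source changes to compile. However, pytorch 1.7.1 was already the newest pytorch that pytorch3d supported, so the changes described in install.md don't apply. In the end I still solved the problem by modifying source code; see the next section for details.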
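Whichever route you take, it essentially runs the install defined in setup.py, which calls ninja to compile the sources. So the errors almost all come from the compilation step.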
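Since setup.py hands the work to ninja, which in turn calls nvcc and MSVC's cl, a hypothetical pre-flight check can confirm that those executables are reachable. It only uses `shutil.which`. As far as I know, torch invokes ninja by name, so ninja really must be on PATH. nvcc can also be found through CUDA_HOME/CUDA_PATH, and setuptools can usually locate Visual Studio on its own, so a missing `cl` here is a hint rather than a hard failure.

```python
import os
import shutil
from typing import Dict, Iterable, Optional

# Executables that appear in the build log: ninja drives nvcc, and nvcc uses MSVC (cl) as host compiler.
TOOLS = ["ninja", "nvcc", "cl"]


def locate_tools(tools: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each executable name to its full path on PATH, or None if it is absent."""
    return {tool: shutil.which(tool) for tool in tools}


if __name__ == "__main__":
    for tool, path in locate_tools(TOOLS).items():
        print(f"{tool:5} -> {path or 'not on PATH'}")
    # torch can also find the CUDA toolkit through these variables instead of PATH.
    for var in ("CUDA_HOME", "CUDA_PATH"):
        print(f"{var} = {os.environ.get(var, '(unset)')}")
```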
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.
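This was the worst pitfall of all: 12 of the 24 hours went into it. The fix you'll find most easily online is to change ['ninja','-v'] to ['ninja','--v'] or ['ninja','--version'] in torch/utils/cpp_extension.py. That only produces a new error: xxx.obj cannot be found. I'm not entirely sure of the mechanism. Presumably the steps that should compile those xxx.obj files get skipped, so the files never exist. With --version, for example, ninja just prints its version number and exits without building anything.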
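Only later did I learn that this approach doesn't work, from Zhihu posts by people installing other libraries. You have to scroll back up and find the actual error.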
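The real error can sit hundreds of lines above the traceback. It therefore helps to save the whole build output to a file and search it rather than scroll through PowerShell. This is a minimal sketch that assumes you run it from the pytorch3d source root next to setup.py. The error-matching regex is my own heuristic, not something torch or pytorch3d provides. nvcc front-end errors look like `file(123): error: ...` in the log that follows, and MSVC errors look like `file(123): error C1234: ...`.

```python
import locale
import re
import subprocess
import sys
from pathlib import Path
from typing import List, Sequence

# Lines that only report the failure after the fact; they are not the cause.
WRAPPER_MARKERS = ("CalledProcessError", "RuntimeError", "FAILED:", "build stopped")

# Heuristic for compiler diagnostics: "file(123): error: ..." (nvcc),
# "file(123): error C2059: ..." or "file(123): fatal error C1083: ..." (MSVC),
# plus lines that start with "error".
ERROR_RE = re.compile(r"\(\d+\): (fatal )?error\b|^error\b", re.IGNORECASE)


def run_and_log(cmd: Sequence[str], log_path: Path) -> int:
    """Run cmd, save its combined stdout/stderr to log_path as UTF-8, return the exit code."""
    result = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    # Compiler messages use the local code page on Windows (e.g. GBK on Chinese systems).
    text = result.stdout.decode(locale.getpreferredencoding(False), errors="replace")
    log_path.write_text(text, encoding="utf-8")
    return result.returncode


def find_errors(log_text: str) -> List[str]:
    """Return the compiler error lines, skipping the ninja/Python wrapper messages."""
    errors = []
    for line in log_text.splitlines():
        if any(marker in line for marker in WRAPPER_MARKERS):
            continue
        if ERROR_RE.search(line):
            errors.append(line.strip())
    return errors


if __name__ == "__main__":
    # Assumption: run from the pytorch3d source root, next to setup.py.
    log = Path("build.log")
    code = run_and_log([sys.executable, "setup.py", "install"], log)
    errors = find_errors(log.read_text(encoding="utf-8"))
    print(f"exit code {code}; first error: {errors[0] if errors else 'none found'}")
```

The output is captured rather than streamed, so nothing appears on screen until the build finishes. The first hit is usually the one worth searching for.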
At the time, the log in my PowerShell looked like this:
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2\bin\nvcc --use-local-env -Xcompiler /MD -Xcompiler /wd4819 -Xcompiler /wd4251 -Xcompiler /wd4244 -Xcompiler /wd4267 -Xcompiler /wd4275 -Xcompiler /wd4018 -Xcompiler /wd4190 -Xcompiler /EHsc -Xcudafe --diag_suppress=base_class_has_different_dll_interface -Xcudafe --diag_suppress=field_without_dll_interface -Xcudafe --diag_suppress=dll_interface_conflict_none_assumed -Xcudafe --diag_suppress=dll_interface_conflict_dllexport_assumed -DWITH_CUDA -DTHRUST_IGNORE_CUB_VERSION_CHECK -ID:\Codes\local_install\pytorch3d\pytorch3d\csrc -IC:\SDK\cub-1.10.0 -Ienvs\lib\site-packages\torch\include -Ienvs\lib\site-packages\torch\include\torch\csrc\api\include -Ienvs\lib\site-packages\torch\include\TH -Ienvs\lib\site-packages\torch\include\THC "-IC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2\include" -Ienvs\include -Ienvs\include "-IC:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.29.30037\ATLMFC\include" "-IC:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.29.30037\include" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.19041.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.19041.0\shared" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.19041.0\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.19041.0\winrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.19041.0\cppwinrt" -c D:\Codes\local_install\pytorch3d\pytorch3d\csrc\blending\sigmoid_alpha_blend.cu -o D:\Codes\local_install\pytorch3d\build\temp.win-amd64-3.8\Release\Codes\local_install\pytorch3d\pytorch3d\csrc\blending\sigmoid_alpha_blend.obj -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -std=c++14 -DCUDA_HAS_FP16=1 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=_C -D_GLIBCXX_USE_CXX11_ABI=0 
-gencode=arch=compute_61,code=sm_61
FAILED: D:/Codes/local_install/pytorch3d/build/temp.win-amd64-3.8/Release/Codes/local_install/pytorch3d/pytorch3d/csrc/blending/sigmoid_alpha_blend.obj
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2\bin\nvcc --use-local-env -Xcompiler /MD -Xcompiler /wd4819 -Xcompiler /wd4251 -Xcompiler /wd4244 -Xcompiler /wd4267 -Xcompiler /wd4275 -Xcompiler /wd4018 -Xcompiler /wd4190 -Xcompiler /EHsc -Xcudafe --diag_suppress=base_class_has_different_dll_interface -Xcudafe --diag_suppress=field_without_dll_interface -Xcudafe --diag_suppress=dll_interface_conflict_none_assumed -Xcudafe --diag_suppress=dll_interface_conflict_dllexport_assumed -DWITH_CUDA -DTHRUST_IGNORE_CUB_VERSION_CHECK -ID:\Codes\local_install\pytorch3d\pytorch3d\csrc -IC:\SDK\cub-1.10.0 -Ienvs\lib\site-packages\torch\include -Ienvs\lib\site-packages\torch\include\torch\csrc\api\include -Ienvs\lib\site-packages\torch\include\TH -Ienvs\lib\site-packages\torch\include\THC "-IC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2\include" -Ienvs\include -Ienvs\include "-IC:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.29.30037\ATLMFC\include" "-IC:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.29.30037\include" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.19041.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.19041.0\shared" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.19041.0\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.19041.0\winrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.19041.0\cppwinrt" -c D:\Codes\local_install\pytorch3d\pytorch3d\csrc\blending\sigmoid_alpha_blend.cu -o D:\Codes\local_install\pytorch3d\build\temp.win-amd64-3.8\Release\Codes\local_install\pytorch3d\pytorch3d\csrc\blending\sigmoid_alpha_blend.obj -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -std=c++14 -DCUDA_HAS_FP16=1 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=_C -D_GLIBCXX_USE_CXX11_ABI=0 
-gencode=arch=compute_61,code=sm_61
envs\lib\site-packages\torch\include\pybind11\detail/common.h(106): warning C4005: "HAVE_SNPRINTF": macro redefinition
envs\include\pyerrors.h(315): note: see previous definition of "HAVE_SNPRINTF"
envs/lib/site-packages/torch/include\c10/util/ThreadLocalDebugInfo.h(12): warning: modifier is ignored on an enum specifier
envs/lib/site-packages/torch/include\ATen/core/boxing/impl/boxing.h(100): warning: integer conversion resulted in a change of sign
envs/lib/site-packages/torch/include\ATen/record_function.h(13): warning: modifier is ignored on an enum specifier
envs/lib/site-packages/torch/include\ATen/core/op_registration/op_whitelist.h(39): warning: integer conversion resulted in a change of sign
envs/lib/site-packages/torch/include\torch/csrc/jit/ir/ir.h(1347): error: member "torch::jit::ProfileOptionalOp::Kind" may not be initialized
envs/lib/site-packages/torch/include\torch/csrc/autograd/profiler.h(106): warning: modifier is ignored on an enum specifier
envs/lib/site-packages/torch/include\torch/csrc/autograd/profiler.h(138): warning: modifier is ignored on an enum specifier
envs/lib/site-packages/torch/include/torch/csrc/api/include\torch/nn/modules/transformerlayer.h(73): warning: extra ";" ignored
1 error detected in the compilation of "C:/Users/49304/AppData/Local/Temp/tmpxft_0000054c_00000000-7_sigmoid_alpha_blend.cpp1.ii".
nvcc warning : The -std=c++14 flag is not supported with the configured host compiler. Flag will be ignored.
sigmoid_alpha_blend.cu
ninja: build stopped: subcommand failed.
Traceback (most recent call last):
File "envs\lib\site-packages\torch\utils\cpp_extension.py", line 1533, in _run_ninja_build
subprocess.run(
File "envs\lib\subprocess.py", line 516, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "setup.py", line 128, in <module>
setup(
File "envs\lib\site-packages\setuptools\__init__.py", line 153, in setup
return distutils.core.setup(**attrs)
File "envs\lib\distutils\core.py", line 148, in setup
dist.run_commands()
File "envs\lib\distutils\dist.py", line 966, in run_commands
self.run_command(cmd)
File "envs\lib\distutils\dist.py", line 985, in run_command
cmd_obj.run()
File "envs\lib\site-packages\setuptools\command\install.py", line 67, in run
self.do_egg_install()
File "envs\lib\site-packages\setuptools\command\install.py", line 109, in do_egg_install
self.run_command('bdist_egg')
File "envs\lib\distutils\cmd.py", line 313, in run_command
self.distribution.run_command(command)
File "envs\lib\distutils\dist.py", line 985, in run_command
cmd_obj.run()
File "envs\lib\site-packages\setuptools\command\bdist_egg.py", line 164, in run
cmd = self.call_command('install_lib', warn_dir=0)
File "envs\lib\site-packages\setuptools\command\bdist_egg.py", line 150, in call_command
self.run_command(cmdname)
File "envs\lib\distutils\cmd.py", line 313, in run_command
self.distribution.run_command(command)
File "envs\lib\distutils\dist.py", line 985, in run_command
cmd_obj.run()
File "envs\lib\site-packages\setuptools\command\install_lib.py", line 11, in run
self.build()
File "envs\lib\distutils\command\install_lib.py", line 107, in build
self.run_command('build_ext')
File "envs\lib\distutils\cmd.py", line 313, in run_command
self.distribution.run_command(command)
File "envs\lib\distutils\dist.py", line 985, in run_command
cmd_obj.run()
File "envs\lib\site-packages\setuptools\command\build_ext.py", line 79, in run
_build_ext.run(self)
File "envs\lib\distutils\command\build_ext.py", line 340, in run
self.build_extensions()
File "envs\lib\site-packages\torch\utils\cpp_extension.py", line 670, in build_extensions
build_ext.build_extensions(self)
File "envs\lib\distutils\command\build_ext.py", line 449, in build_extensions
self._build_extensions_serial()
File "envs\lib\distutils\command\build_ext.py", line 474, in _build_extensions_serial
self.build_extension(ext)
File "envs\lib\site-packages\setuptools\command\build_ext.py", line 196, in build_extension
_build_ext.build_extension(self, ext)
File "envs\lib\distutils\command\build_ext.py", line 528, in build_extension
objects = self.compiler.compile(sources,
File "envs\lib\site-packages\torch\utils\cpp_extension.py", line 643, in win_wrap_ninja_compile
_write_ninja_file_and_compile_objects(
File "envs\lib\site-packages\torch\utils\cpp_extension.py", line 1250, in _write_ninja_file_and_compile_objects
_run_ninja_build(
File "envs\lib\site-packages\torch\utils\cpp_extension.py", line 1555, in _run_ninja_build
raise RuntimeError(message) from e
RuntimeError: Error compiling objects for extension
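Reading this in PowerShell was a real headache. There was no telling where anything started, and every character was the same color. I had some basic experience compiling C++ in VS, yet I still didn't see how to dig the error out of this pile. Instead I spent ages studying the Python Traceback. Then it dawned on me that this is a compilation process, and the only thing that can interrupt it is an error: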
error: member "torch::jit::ProfileOptionalOp::Kind" may not be initialized
So I searched for it on Baidu and found this post:
error: member "torch::jit::ProfileOptionalOp::Kind" may not be initialized (xiongxyowo's blog, CSDN)
Following it solved the problem.
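The compiler message pinpoints the spot: ir.h, line 1347, inside the torch headers installed in the environment. Below is a small hypothetical helper (`show_context`, `torch_header` are my own names) that prints that line with some context. It lets you confirm you're editing the right place before applying the change from the post above, and check the result afterwards. Back up the header first. The helper assumes the headers live under `torch/include` next to torch's `__init__.py`, which matches the `-I...\torch\include` flags in the log.

```python
from pathlib import Path
from typing import List


def show_context(path: Path, line_no: int, radius: int = 2) -> List[str]:
    """Return the lines around 1-based line_no, marking the target line with '>>'."""
    lines = path.read_text(encoding="utf-8", errors="replace").splitlines()
    start = max(line_no - radius, 1)
    end = min(line_no + radius, len(lines))
    return [
        f"{'>>' if n == line_no else '  '} {n:5}: {lines[n - 1]}"
        for n in range(start, end + 1)
    ]


def torch_header(relative: str) -> Path:
    """Locate a header inside the installed torch package (imports torch lazily)."""
    import torch
    return Path(torch.__file__).parent / "include" / relative


if __name__ == "__main__":
    # File and line number come from the compiler error in the log above.
    header = torch_header("torch/csrc/jit/ir/ir.h")
    print("\n".join(show_context(header, 1347)))
```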
Installation complete
Exhausted and over it. I had planned to run every example but only got through the first one, the dolphin. I'll dig into the rest slowly later.