And so begins a journey through a festival of errors.. ✈️
Spoiler up front: the PyTorch, CUDA, and Nvidia driver versions all have to be compatible with one another
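< Current environment >
OS : Windows 11
GPU model : GeForce MX250
pip virtual environment
Summary of the fix
1. Check the PyTorch version : pip show torch
2. Check the CUDA version : nvcc --version
3. Check the Nvidia driver version : nvidia-smi
4. Look up compatible versions on the official PyTorch site - you have to install the PyTorch build that matches your CUDA version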
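The three checks above can be bundled into one small script. This is only a minimal sketch: the helper names `run_if_available` and `collect_versions` are made up for illustration, and a tool that is not on PATH is simply reported as not found.

```python
import shutil
import subprocess
import sys


def run_if_available(command):
    """Run a command and return its stdout, or None if the tool is not on PATH."""
    if shutil.which(command[0]) is None:
        return None
    result = subprocess.run(command, capture_output=True, text=True)
    return result.stdout.strip()


def collect_versions():
    """Gather the raw output of the three version checks."""
    return {
        # Use the current interpreter's pip so the active venv is inspected.
        "torch": run_if_available([sys.executable, "-m", "pip", "show", "torch"]),
        "cuda": run_if_available(["nvcc", "--version"]),
        "driver": run_if_available(["nvidia-smi"]),
    }


if __name__ == "__main__":
    for name, output in collect_versions().items():
        print(f"== {name} ==")
        print(output if output else "(not found)")
```

Run it inside the activated virtual environment so that `pip show torch` looks at the same packages the training script uses.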
Summary of the process
1. pip install torch-sparse → fails with No module named 'torch'
2. Install CUDA & the Nvidia driver
3. Reinstall PyTorch to match a compatible CUDA version
4. Solved
📎 Chapter 1. The No module named 'torch' error
I just wanted to train on a RecBole dataset.... and got this error.
$ python RecBole-GNN/run_recbole_gnn.py --model='NGCF' --dataset='Food'
C:\Users\alsrud\Downloads\Babal-Server\Babal-Server\venv\lib\site-packages\torch_geometric\typing.py:124: UserWarning: An issue occurred while importing 'torch-sparse'. Disabling its usage. Stacktrace: [WinError 127] 지정된 프로시저를 찾을 수 없습니다
warnings.warn(f"An issue occurred while importing 'torch-sparse'. "
Traceback (most recent call last):
File "C:\Users\alsrud\Downloads\Babal-Server\Babal-Server\RecBole\RecBole-GNN\run_recbole_gnn.py", line 15, in <module>
run_recbole_gnn(model=args.model, dataset=args.dataset, config_file_list=config_file_list)
File "C:\Users\alsrud\Downloads\Babal-Server\Babal-Server\RecBole\RecBole-GNN\recbole_gnn\quick_start.py", line 20, in run_recbole_gnn
config = Config(model=model, dataset=dataset, config_file_list=config_file_list, config_dict=config_dict)
File "C:\Users\alsrud\Downloads\Babal-Server\Babal-Server\RecBole\RecBole-GNN\recbole_gnn\config.py", line 22, in __init__
super(Config, self).__init__(model, dataset, config_file_list, config_dict)
File "c:\users\alsrud\downloads\babal-server\babal-server\recbole\recbole\config\configurator.py", line 88, in __init__
self.model, self.model_class, self.dataset = self._get_model_and_dataset(
File "C:\Users\alsrud\Downloads\Babal-Server\Babal-Server\RecBole\RecBole-GNN\recbole_gnn\config.py", line 50, in _get_model_and_dataset
final_model_class = get_model(final_model)
File "C:\Users\alsrud\Downloads\Babal-Server\Babal-Server\RecBole\RecBole-GNN\recbole_gnn\utils.py", line 77, in get_model
if importlib.util.find_spec(module_path, __name__):
File "C:\Users\alsrud\AppData\Local\Programs\Python\Python39\lib\importlib\util.py", line 94, in find_spec
parent = __import__(parent_name, fromlist=['__path__'])
File "C:\Users\alsrud\Downloads\Babal-Server\Babal-Server\RecBole\RecBole-GNN\recbole_gnn\model\general_recommender\__init__.py", line 1, in <module>
from recbole_gnn.model.general_recommender.lightgcn import LightGCN
File "C:\Users\alsrud\Downloads\Babal-Server\Babal-Server\RecBole\RecBole-GNN\recbole_gnn\model\general_recommender\lightgcn.py", line 23, in <module>
from recbole_gnn.model.layers import LightGCNConv
File "C:\Users\alsrud\Downloads\Babal-Server\Babal-Server\RecBole\RecBole-GNN\recbole_gnn\model\layers.py", line 5, in <module>
from torch_sparse import matmul
File "C:\Users\alsrud\Downloads\Babal-Server\Babal-Server\venv\lib\site-packages\torch_sparse\__init__.py", line 19, in <module>
torch.ops.load_library(spec.origin)
File "C:\Users\alsrud\Downloads\Babal-Server\Babal-Server\venv\lib\site-packages\torch\_ops.py", line 1295, in load_library
ctypes.CDLL(path)
File "C:\Users\alsrud\AppData\Local\Programs\Python\Python39\lib\ctypes\__init__.py", line 374, in __init__
self._handle = _dlopen(self._name, mode)
OSError: [WinError 127] 지정된 프로시저를 찾을 수 없습니다
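Reading it carefully: An issue occurred while importing 'torch-sparse'. (The Korean text after WinError 127 is Windows' "The specified procedure could not be found", which usually hints that a compiled extension does not match the installed torch build.)
So I tried to install torch-sparse....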
$ pip install torch-sparse
Collecting torch-sparse
Downloading torch_sparse-0.6.18.tar.gz (209 kB)
Installing build dependencies ... done
Getting requirements to build wheel ... error
error: subprocess-exited-with-error
× Getting requirements to build wheel did not run successfully.
│ exit code: 1
╰─> [17 lines of output]
Traceback (most recent call last):
File "C:\Users\alsrud\Downloads\Babal-Server\Babal-Server\venv\lib\site-packages\pip\_vendor\pyproject_hooks\_in_process\_in_process.py", line 353, in <module>
main()
File "C:\Users\alsrud\Downloads\Babal-Server\Babal-Server\venv\lib\site-packages\pip\_vendor\pyproject_hooks\_in_process\_in_process.py", line 335, in main
json_out['return_val'] = hook(**hook_input['kwargs'])
File "C:\Users\alsrud\Downloads\Babal-Server\Babal-Server\venv\lib\site-packages\pip\_vendor\pyproject_hooks\_in_process\_in_process.py", line 118, in get_requires_for_build_wheel
return hook(config_settings)
File "C:\Users\alsrud\AppData\Local\Temp\pip-build-env-_mc3l5b2\overlay\Lib\site-packages\setuptools\build_meta.py", line 332, in get_requires_for_build_wheel
return self._get_build_requires(config_settings, requirements=[])
File "C:\Users\alsrud\AppData\Local\Temp\pip-build-env-_mc3l5b2\overlay\Lib\site-packages\setuptools\build_meta.py", line 302, in _get_build_requires
self.run_setup()
File "C:\Users\alsrud\AppData\Local\Temp\pip-build-env-_mc3l5b2\overlay\Lib\site-packages\setuptools\build_meta.py", line 503, in run_setup
super().run_setup(setup_script=setup_script)
File "C:\Users\alsrud\AppData\Local\Temp\pip-build-env-_mc3l5b2\overlay\Lib\site-packages\setuptools\build_meta.py", line 318, in run_setup
exec(code, locals())
File "<string>", line 8, in <module>
ModuleNotFoundError: No module named 'torch'
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
error: subprocess-exited-with-error
× Getting requirements to build wheel did not run successfully.
│ exit code: 1
╰─> See above for output.
note: This error originates from a subprocess, and is likely not a problem with pip.
I do have torch installed, and yet it says ModuleNotFoundError: No module named 'torch'
(Proof: screenshot)
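One likely explanation, offered as an assumption rather than something the log states outright: pip builds source packages in an isolated temporary environment (note the pip-build-env-... paths in the traceback), and the torch sitting in the venv is not visible from there, so the torch-sparse setup script dies on its torch import. Two common workarounds are pointing pip at the prebuilt wheels on the PyG wheel index, or passing --no-build-isolation (which still needs a working C++ toolchain on Windows). The sketch below only assembles the command lines. The function names are hypothetical, and the torch and CUDA values are examples.

```python
def pyg_wheel_index(torch_version, cuda_tag):
    """Return the PyG wheel index URL for a torch version and CUDA tag.

    Assumes the URL pattern https://data.pyg.org/whl/torch-<torch>+<cuda>.html
    """
    return f"https://data.pyg.org/whl/torch-{torch_version}+{cuda_tag}.html"


def wheel_install_command(package, torch_version, cuda_tag):
    """Build a pip command that lets pip find prebuilt wheels for the package."""
    return ["pip", "install", package, "-f", pyg_wheel_index(torch_version, cuda_tag)]


def no_isolation_command(package):
    """Build a pip command that compiles against the torch already in the venv."""
    return ["pip", "install", "--no-build-isolation", package]


if __name__ == "__main__":
    # Example values only; use the versions reported by your own environment.
    print(" ".join(wheel_install_command("torch-sparse", "2.4.0", "cu121")))
    print(" ".join(no_isolation_command("torch-sparse")))
```

Either way, the torch version and CUDA tag in the URL have to match the torch that is actually installed, which is where the rest of this post ends up.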
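📎 Chapter 2. Installing CUDA & the Nvidia driver
After wading through countless..... StackOverflow threads and... GitHub comments.... with a detour through some odd Chinese site (which solved nothing, absolutely nothing)
I learned something new
What? I need CUDA and a driver?
- To install the Nvidia driver, you need to know which GPU model you have
- To install CUDA, you need to know which OS you are running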
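< Installing the Nvidia driver >
1. Check which GPU model you have
Device Manager > Display adapters
(+) Checking from the command line (needs a driver to be installed already; fgrep works in Git Bash, in cmd use findstr /C:"Product Name" instead)
>> nvidia-smi --query | fgrep 'Product Name'
2. Install the driver that matches your environment
Clicking Start Search brings up several candidates
Everyone says to match the version against the official document, but if I just install the latest one, every CUDA version should work, right? (Drivers are backward compatible with older CUDA toolkits, so the newest driver is generally a safe pick.)
I installed 565.90
3. Verify the install
>> nvidia-smi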
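If you would rather read the GPU model and driver version from a script than from the nvidia-smi table, the tool also has a CSV query mode: nvidia-smi --query-gpu=name,driver_version --format=csv,noheader. A minimal sketch of parsing that output follows. The function name is hypothetical, and the sample line is hand-written from the values in this post, not captured output.

```python
import csv
import io


def parse_gpu_csv(text):
    """Parse the output of
    `nvidia-smi --query-gpu=name,driver_version --format=csv,noheader`.
    """
    rows = csv.reader(io.StringIO(text), skipinitialspace=True)
    return [
        {"name": row[0].strip(), "driver_version": row[1].strip()}
        for row in rows
        if len(row) >= 2  # skip blank or malformed lines
    ]


if __name__ == "__main__":
    # Illustrative sample shaped like the command's output.
    sample = "GeForce MX250, 565.90\n"
    for gpu in parse_gpu_csv(sample):
        print(gpu["name"], "driver", gpu["driver_version"])
```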
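< Installing CUDA >
1. Check your operating system
System Information > System Summary
2. Install CUDA
Everyone seems to get something out of this page, but I couldn't make sense of it and just installed whatever
I installed 12.6 first, but CUDA 12.6 was not yet among the versions PyTorch supported, so I removed it and installed 12.1 instead.
3. Verify the install : nvcc --version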
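The number that matters in the nvcc --version output is the one after "release". Here is a small sketch that pulls it out and turns it into the cuXYZ tag that shows up in PyTorch wheel URLs. Both function names are made up for this example, and PyTorch only publishes wheels for some tags, so the result still has to be checked against the official install page.

```python
import re


def parse_nvcc_release(text):
    """Extract the CUDA release (for example '12.1') from `nvcc --version` output."""
    match = re.search(r"release\s+(\d+)\.(\d+)", text)
    if match is None:
        return None
    return f"{match.group(1)}.{match.group(2)}"


def wheel_tag(release):
    """Turn a release such as '12.1' into a wheel tag such as 'cu121'."""
    major, minor = release.split(".")
    return f"cu{major}{minor}"


if __name__ == "__main__":
    # Illustrative line in the shape nvcc prints; not output captured for this post.
    sample = "Cuda compilation tools, release 12.1, V12.1.105"
    release = parse_nvcc_release(sample)
    print(release, wheel_tag(release))
```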
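📎 Chapter 3. Matching the PyTorch and CUDA versions
Head over to the official PyTorch site
Find the install command that matches your PyTorch version & CUDA version
My existing torch version : 2.4.1
>> pip show torch
Shocking fact: as far as I could tell, the newest PyTorch version I can use with CUDA right now is 2.4.0.
Uninstall it immediately
>> pip uninstall torch torchvision torchaudio
Reinstall with matching versions, exactly as the official page says!!!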
>> pip install torch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0 --index-url https://download.pytorch.org/whl/cu121
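After reinstalling, it is worth confirming that the CUDA build really landed. Wheels from the cu121 index report a version like 2.4.0+cu121 in pip show torch, while a CPU-only build reports +cpu or no suffix at all (a plain pip install torch on Windows typically pulls a CPU-only build, which may be what happened with the earlier 2.4.1, though that is a guess). A minimal sketch with hypothetical helper names:

```python
def build_variant(torch_version):
    """Return the local build tag of a torch version string, or None if absent."""
    _, plus, local = torch_version.partition("+")
    return local if plus else None


def is_cuda_build(torch_version):
    """True when the version string carries a CUDA tag such as 'cu121'."""
    tag = build_variant(torch_version)
    return tag is not None and tag.startswith("cu")


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("torch is not installed in this environment")
    else:
        print("torch version :", torch.__version__)
        print("CUDA build    :", is_cuda_build(torch.__version__))
        print("built for CUDA:", torch.version.cuda)  # None on CPU-only builds
        print("GPU usable    :", torch.cuda.is_available())
```

If the last line prints False even with a CUDA build, the driver side is the next place to look.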
Cautiously, I try installing torch-scatter
Oh wow, this is unreal
torch-sparse
torch-cluster
torch-spline-conv
torch-geometric
- these were already installed
📎 Chapter 4. Trying again after the fix
>> python RecBole-GNN/run_recbole_gnn.py --model='NGCF' --dataset='Food'
$ python RecBole-GNN/run_recbole_gnn.py --model='NGCF' --dataset='Food'
A module that was compiled using NumPy 1.x cannot be run in
NumPy 2.0.2 as it may crash. To support both 1.x and 2.x
versions of NumPy, modules must be compiled with NumPy 2.0.
Some module may need to rebuild instead e.g. with 'pybind11>=2.12'.
If you are a user of the module, the easiest solution will be to
downgrade to 'numpy<2' or try to upgrade the affected module.
We expect that some modules will need time to support NumPy 2.
Traceback (most recent call last):
File "C:\Users\alsrud\Downloads\Babal-Server\Babal-Server\RecBole\RecBole-GNN\run_recbole_gnn.py", line 3, in <module>
from recbole_gnn.quick_start import run_recbole_gnn
File "C:\Users\alsrud\Downloads\Babal-Server\Babal-Server\RecBole\RecBole-GNN\recbole_gnn\quick_start.py", line 3, in <module>
from recbole.utils import init_logger, init_seed, set_color
File "c:\users\alsrud\downloads\babal-server\babal-server\recbole\recbole\utils\__init__.py", line 1, in <module>
from recbole.utils.logger import init_logger, set_color
File "c:\users\alsrud\downloads\babal-server\babal-server\recbole\recbole\utils\logger.py", line 26, in <module>
from recbole.utils.utils import get_local_time, ensure_dir
File "c:\users\alsrud\downloads\babal-server\babal-server\recbole\recbole\utils\utils.py", line 23, in <module>
import torch
File "C:\Users\alsrud\Downloads\Babal-Server\Babal-Server\venv\lib\site-packages\torch\__init__.py", line 2120, in <module>
from torch._higher_order_ops import cond
File "C:\Users\alsrud\Downloads\Babal-Server\Babal-Server\venv\lib\site-packages\torch\_higher_order_ops\__init__.py", line 1, in <module>
from .cond import cond
File "C:\Users\alsrud\Downloads\Babal-Server\Babal-Server\venv\lib\site-packages\torch\_higher_order_ops\cond.py", line 5, in <module>
import torch._subclasses.functional_tensor
File "C:\Users\alsrud\Downloads\Babal-Server\Babal-Server\venv\lib\site-packages\torch\_subclasses\functional_tensor.py", line 42, in <module>
class FunctionalTensor(torch.Tensor):
File "C:\Users\alsrud\Downloads\Babal-Server\Babal-Server\venv\lib\site-packages\torch\_subclasses\functional_tensor.py", line 258, in FunctionalTensor
cpu = _conversion_method_template(device=torch.device("cpu"))
C:\Users\alsrud\Downloads\Babal-Server\Babal-Server\venv\lib\site-packages\torch\_subclasses\functional_tensor.py:258: UserWarning: Failed to initialize NumPy:
A module that was compiled using NumPy 1.x cannot be run in
NumPy 2.0.2 as it may crash. To support both 1.x and 2.x
versions of NumPy, modules must be compiled with NumPy 2.0.
Some module may need to rebuild instead e.g. with 'pybind11>=2.12'.
If you are a user of the module, the easiest solution will be to
downgrade to 'numpy<2' or try to upgrade the affected module.
We expect that some modules will need time to support NumPy 2.
(Triggered internally at C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\torch\csrc\utils\tensor_numpy.cpp:84.)
cpu = _conversion_method_template(device=torch.device("cpu"))
Traceback (most recent call last):
File "C:\Users\alsrud\Downloads\Babal-Server\Babal-Server\RecBole\RecBole-GNN\run_recbole_gnn.py", line 15, in <module>
run_recbole_gnn(model=args.model, dataset=args.dataset, config_file_list=config_file_list)
File "C:\Users\alsrud\Downloads\Babal-Server\Babal-Server\RecBole\RecBole-GNN\recbole_gnn\quick_start.py", line 20, in run_recbole_gnn
config = Config(model=model, dataset=dataset, config_file_list=config_file_list, config_dict=config_dict)
File "C:\Users\alsrud\Downloads\Babal-Server\Babal-Server\RecBole\RecBole-GNN\recbole_gnn\config.py", line 22, in __init__
super(Config, self).__init__(model, dataset, config_file_list, config_dict)
File "c:\users\alsrud\downloads\babal-server\babal-server\recbole\recbole\config\configurator.py", line 88, in __init__
self.model, self.model_class, self.dataset = self._get_model_and_dataset(
File "C:\Users\alsrud\Downloads\Babal-Server\Babal-Server\RecBole\RecBole-GNN\recbole_gnn\config.py", line 50, in _get_model_and_dataset
final_model_class = get_model(final_model)
File "C:\Users\alsrud\Downloads\Babal-Server\Babal-Server\RecBole\RecBole-GNN\recbole_gnn\utils.py", line 82, in get_model
model_class = get_recbole_model(model_name)
File "c:\users\alsrud\downloads\babal-server\babal-server\recbole\recbole\utils\utils.py", line 76, in get_model
if importlib.util.find_spec(module_path, __name__):
File "C:\Users\alsrud\AppData\Local\Programs\Python\Python39\lib\importlib\util.py", line 94, in find_spec
parent = __import__(parent_name, fromlist=['__path__'])
File "c:\users\alsrud\downloads\babal-server\babal-server\recbole\recbole\model\exlib_recommender\__init__.py", line 1, in <module>
from recbole.model.exlib_recommender.lightgbm import LightGBM
File "c:\users\alsrud\downloads\babal-server\babal-server\recbole\recbole\model\exlib_recommender\lightgbm.py", line 11, in <module>
import lightgbm as lgb
ModuleNotFoundError: No module named 'lightgbm'
Nope, still not working~
1. Downgrade numpy (the log itself suggests 'numpy<2').
2. Install lightgbm.
Please just work this time ;
$ python RecBole-GNN/run_recbole_gnn.py --model='NGCF' --dataset='Food'
Traceback (most recent call last):
File "C:\Users\alsrud\Downloads\Babal-Server\Babal-Server\RecBole\RecBole-GNN\run_recbole_gnn.py", line 15, in <module>
run_recbole_gnn(model=args.model, dataset=args.dataset, config_file_list=config_file_list)
File "C:\Users\alsrud\Downloads\Babal-Server\Babal-Server\RecBole\RecBole-GNN\recbole_gnn\quick_start.py", line 20, in run_recbole_gnn
config = Config(model=model, dataset=dataset, config_file_list=config_file_list, config_dict=config_dict)
File "C:\Users\alsrud\Downloads\Babal-Server\Babal-Server\RecBole\RecBole-GNN\recbole_gnn\config.py", line 22, in __init__
super(Config, self).__init__(model, dataset, config_file_list, config_dict)
File "c:\users\alsrud\downloads\babal-server\babal-server\recbole\recbole\config\configurator.py", line 88, in __init__
self.model, self.model_class, self.dataset = self._get_model_and_dataset(
File "C:\Users\alsrud\Downloads\Babal-Server\Babal-Server\RecBole\RecBole-GNN\recbole_gnn\config.py", line 50, in _get_model_and_dataset
final_model_class = get_model(final_model)
File "C:\Users\alsrud\Downloads\Babal-Server\Babal-Server\RecBole\RecBole-GNN\recbole_gnn\utils.py", line 82, in get_model
model_class = get_recbole_model(model_name)
File "c:\users\alsrud\downloads\babal-server\babal-server\recbole\recbole\utils\utils.py", line 76, in get_model
if importlib.util.find_spec(module_path, __name__):
File "C:\Users\alsrud\AppData\Local\Programs\Python\Python39\lib\importlib\util.py", line 94, in find_spec
parent = __import__(parent_name, fromlist=['__path__'])
File "c:\users\alsrud\downloads\babal-server\babal-server\recbole\recbole\model\exlib_recommender\__init__.py", line 2, in <module>
from recbole.model.exlib_recommender.xgboost import XGBoost
File "c:\users\alsrud\downloads\babal-server\babal-server\recbole\recbole\model\exlib_recommender\xgboost.py", line 11, in <module>
import xgboost as xgb
ModuleNotFoundError: No module named 'xgboost'
Fine, let's see who wins
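Each run stops at the first missing import, so the errors show up one at a time (lightgbm, then xgboost). A small pre-flight check can list them all at once. This is a sketch with hypothetical function names, and the package list only covers what the tracebacks in this post complained about.

```python
import importlib.util
from importlib import metadata


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def numpy_major():
    """Return the installed NumPy major version, or None if NumPy is absent."""
    try:
        return int(metadata.version("numpy").split(".")[0])
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    missing = missing_modules(["lightgbm", "xgboost"])
    if missing:
        print("install these first:", " ".join(missing))
    major = numpy_major()
    if major is not None and major >= 2:
        print("NumPy 2.x detected; the log above suggests downgrading to numpy<2")
```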
Training
has started
..
Un-be-lievable
Solved.