
[PyTorch/CUDA] error: subprocess-exited-with-error ~ ModuleNotFoundError: No module named 'torch'

alsruds 2024. 10. 15. 17:41

 

Here begins a journey through a feast of errors.. ✈️
Spoiler up front: your PyTorch, CUDA, and NVIDIA driver versions all have to be compatible with one another.

< Current environment >
OS : Windows 11
GPU model : GeForce MX250
pip virtual environment (venv, Python 3.9)

 


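Summary of the fix

1. Check your PyTorch version : pip show torch
2. Check your CUDA version : nvcc --version
3. Check your NVIDIA driver version : nvidia-smi
4. Look up the compatible versions on the official PyTorch website: you need the PyTorch build that matches your CUDA version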
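Steps 1 to 3 can be run in one go. Below is a minimal sketch, assuming `nvcc` and `nvidia-smi` are on your PATH; the helper names `run` and `collect_versions` are hypothetical. It reads the installed torch version the same way `pip show torch` does and pulls the version numbers out of the other two commands' output.

```python
import re
import shutil
import subprocess
from importlib import metadata


def run(cmd):
    """Run a command and return its stdout, or None if the tool isn't available."""
    if shutil.which(cmd[0]) is None:
        return None
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=30)
    except (OSError, subprocess.TimeoutExpired):
        return None
    return result.stdout


def collect_versions():
    """Gather the torch, CUDA toolkit and driver versions from steps 1-3."""
    try:
        torch_version = metadata.version("torch")  # same info as `pip show torch`
    except metadata.PackageNotFoundError:
        torch_version = None

    # `nvcc --version` prints a line like "Cuda compilation tools, release X.Y, ..."
    nvcc_match = re.search(r"release (\d+\.\d+)", run(["nvcc", "--version"]) or "")
    # The nvidia-smi header contains "Driver Version: ..."
    driver_match = re.search(r"Driver Version:\s*([\d.]+)", run(["nvidia-smi"]) or "")

    return {
        "torch": torch_version,
        "cuda_toolkit": nvcc_match.group(1) if nvcc_match else None,
        "driver": driver_match.group(1) if driver_match else None,
    }


if __name__ == "__main__":
    for name, version in collect_versions().items():
        print(f"{name:>12}: {version or 'not found'}")
```

Missing tools show up as "not found" instead of crashing, which is handy on a machine where the driver or toolkit isn't installed yet.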

 

How I got there

1. pip install 'torch-sparse' → No module named 'torch' error
2. Installed CUDA & the NVIDIA driver
3. Reinstalled PyTorch built for a compatible CUDA version
4. Solved

 


 

📎 Chapter 1. The No module named 'torch' error

 

I just wanted to train a model on a RecBole dataset.... and got this error.

$ python RecBole-GNN/run_recbole_gnn.py --model='NGCF' --dataset='Food'
C:\Users\alsrud\Downloads\Babal-Server\Babal-Server\venv\lib\site-packages\torch_geometric\typing.py:124: UserWarning: An issue occurred while importing 'torch-sparse'. Disabling its usage. Stacktrace: [WinError 127] The specified procedure could not be found
  warnings.warn(f"An issue occurred while importing 'torch-sparse'. "
Traceback (most recent call last):
  File "C:\Users\alsrud\Downloads\Babal-Server\Babal-Server\RecBole\RecBole-GNN\run_recbole_gnn.py", line 15, in <module>
    run_recbole_gnn(model=args.model, dataset=args.dataset, config_file_list=config_file_list)
  File "C:\Users\alsrud\Downloads\Babal-Server\Babal-Server\RecBole\RecBole-GNN\recbole_gnn\quick_start.py", line 20, in run_recbole_gnn
    config = Config(model=model, dataset=dataset, config_file_list=config_file_list, config_dict=config_dict)
  File "C:\Users\alsrud\Downloads\Babal-Server\Babal-Server\RecBole\RecBole-GNN\recbole_gnn\config.py", line 22, in __init__
    super(Config, self).__init__(model, dataset, config_file_list, config_dict)
  File "c:\users\alsrud\downloads\babal-server\babal-server\recbole\recbole\config\configurator.py", line 88, in __init__
    self.model, self.model_class, self.dataset = self._get_model_and_dataset(
  File "C:\Users\alsrud\Downloads\Babal-Server\Babal-Server\RecBole\RecBole-GNN\recbole_gnn\config.py", line 50, in _get_model_and_dataset
    final_model_class = get_model(final_model)
  File "C:\Users\alsrud\Downloads\Babal-Server\Babal-Server\RecBole\RecBole-GNN\recbole_gnn\utils.py", line 77, in get_model
    if importlib.util.find_spec(module_path, __name__):
  File "C:\Users\alsrud\AppData\Local\Programs\Python\Python39\lib\importlib\util.py", line 94, in find_spec
    parent = __import__(parent_name, fromlist=['__path__'])
  File "C:\Users\alsrud\Downloads\Babal-Server\Babal-Server\RecBole\RecBole-GNN\recbole_gnn\model\general_recommender\__init__.py", line 1, in <module>
    from recbole_gnn.model.general_recommender.lightgcn import LightGCN
  File "C:\Users\alsrud\Downloads\Babal-Server\Babal-Server\RecBole\RecBole-GNN\recbole_gnn\model\general_recommender\lightgcn.py", line 23, in <module>
    from recbole_gnn.model.layers import LightGCNConv
  File "C:\Users\alsrud\Downloads\Babal-Server\Babal-Server\RecBole\RecBole-GNN\recbole_gnn\model\layers.py", line 5, in <module>
    from torch_sparse import matmul
  File "C:\Users\alsrud\Downloads\Babal-Server\Babal-Server\venv\lib\site-packages\torch_sparse\__init__.py", line 19, in <module>
    torch.ops.load_library(spec.origin)
  File "C:\Users\alsrud\Downloads\Babal-Server\Babal-Server\venv\lib\site-packages\torch\_ops.py", line 1295, in load_library
    ctypes.CDLL(path)
  File "C:\Users\alsrud\AppData\Local\Programs\Python\Python39\lib\ctypes\__init__.py", line 374, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: [WinError 127] The specified procedure could not be found

 

 

Reading it carefully, the key line is: An issue occurred while importing 'torch-sparse'
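The telling part is that torch_sparse is installed, but loading its compiled extension fails (WinError 127 is raised while loading a DLL via `ctypes.CDLL`). A small sketch that tells "not installed" apart from "installed but broken"; `diagnose_import` is a hypothetical helper:

```python
import importlib
import importlib.util


def diagnose_import(module_name):
    """Return 'missing', 'broken: <error>' or 'ok' for a top-level module."""
    if importlib.util.find_spec(module_name) is None:
        return "missing"
    try:
        importlib.import_module(module_name)
    except (ImportError, OSError) as exc:
        # OSError covers native library load failures such as WinError 127
        return f"broken: {type(exc).__name__}: {exc}"
    return "ok"


if __name__ == "__main__":
    for name in ["torch", "torch_geometric", "torch_sparse", "torch_scatter"]:
        print(name, "->", diagnose_import(name))
```

If torch reports `ok` while torch_sparse reports `broken`, the extension was most likely built for a different torch/CUDA combination than the one installed.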

So I went to install torch-sparse....

$ pip install torch-sparse
Collecting torch-sparse
  Downloading torch_sparse-0.6.18.tar.gz (209 kB)
  Installing build dependencies ... done
  Getting requirements to build wheel ... error
  error: subprocess-exited-with-error

  × Getting requirements to build wheel did not run successfully.
  │ exit code: 1
  ╰─> [17 lines of output]
      Traceback (most recent call last):
        File "C:\Users\alsrud\Downloads\Babal-Server\Babal-Server\venv\lib\site-packages\pip\_vendor\pyproject_hooks\_in_process\_in_process.py", line 353, in <module> 
          main()
        File "C:\Users\alsrud\Downloads\Babal-Server\Babal-Server\venv\lib\site-packages\pip\_vendor\pyproject_hooks\_in_process\_in_process.py", line 335, in main     
          json_out['return_val'] = hook(**hook_input['kwargs'])
        File "C:\Users\alsrud\Downloads\Babal-Server\Babal-Server\venv\lib\site-packages\pip\_vendor\pyproject_hooks\_in_process\_in_process.py", line 118, in get_requires_for_build_wheel
          return hook(config_settings)
        File "C:\Users\alsrud\AppData\Local\Temp\pip-build-env-_mc3l5b2\overlay\Lib\site-packages\setuptools\build_meta.py", line 332, in get_requires_for_build_wheel  
          return self._get_build_requires(config_settings, requirements=[])
        File "C:\Users\alsrud\AppData\Local\Temp\pip-build-env-_mc3l5b2\overlay\Lib\site-packages\setuptools\build_meta.py", line 302, in _get_build_requires
          self.run_setup()
        File "C:\Users\alsrud\AppData\Local\Temp\pip-build-env-_mc3l5b2\overlay\Lib\site-packages\setuptools\build_meta.py", line 503, in run_setup
          super().run_setup(setup_script=setup_script)
        File "C:\Users\alsrud\AppData\Local\Temp\pip-build-env-_mc3l5b2\overlay\Lib\site-packages\setuptools\build_meta.py", line 318, in run_setup
          exec(code, locals())
        File "<string>", line 8, in <module>
      ModuleNotFoundError: No module named 'torch'
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
error: subprocess-exited-with-error

× Getting requirements to build wheel did not run successfully.
│ exit code: 1
╰─> See above for output.

note: This error originates from a subprocess, and is likely not a problem with pip.

 

 

I have torch installed, and yet it says ModuleNotFoundError: No module named 'torch'

Proof that I have it

heh
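How can both be true? Look at the paths in the log: the failing `import torch` (`File "<string>", line 8`) runs under `...\Temp\pip-build-env-...`, a temporary build environment. By default pip builds source packages in an isolated environment like this, and it does not see the packages already installed in the venv. This is an interpretation of the log, not something verified in this post. The sketch below (hypothetical helper names) shows which interpreter you are on, whether it can see torch, and what a pip command using pip's `--no-build-isolation` flag looks like:

```python
import importlib.util
import sys


def torch_visibility():
    """Report which interpreter is running and whether it can see torch."""
    spec = importlib.util.find_spec("torch")
    return {
        "python": sys.executable,
        "torch_found": spec is not None,
        "torch_location": spec.origin if spec else None,
    }


def pip_install_cmd(package, no_build_isolation=False):
    """Build a pip command bound to this interpreter (python -m pip)."""
    cmd = [sys.executable, "-m", "pip", "install", package]
    if no_build_isolation:
        # Build against the packages already in this environment
        # instead of a fresh temporary one
        cmd.insert(4, "--no-build-isolation")
    return cmd


if __name__ == "__main__":
    print(torch_visibility())
    print(" ".join(pip_install_cmd("torch-sparse", no_build_isolation=True)))
```

Turning off build isolation only gets you past the `import torch` step. Compiling torch-sparse from source on Windows still needs a working C++ (and CUDA) toolchain, so prebuilt wheels that match your torch/CUDA pair are usually the easier route.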

 

 

📎 Chapter 2. Installing CUDA & the NVIDIA driver

 

After countless..... StackOverflow posts and... GitHub comments.... and even a detour through some sketchy Chinese site (which solved absolutely nothing, nothing at all)

 

I learned something new

Wait, what? I need CUDA and a driver installed?

 

- To install the NVIDIA driver, you need to know which GPU model you have

- To install CUDA, you need to know which OS you are running

 

< Installing the NVIDIA driver >

1. Check your GPU model

Device Manager > Display adapters

 

(+) Checking it from the command line

>> nvidia-smi --query | fgrep 'Product Name'
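If you'd rather not grep, `nvidia-smi` can print only the fields you ask for via `--query-gpu=... --format=csv,noheader`. A small sketch built around that; the function names are hypothetical:

```python
import shutil
import subprocess


def parse_gpu_csv(csv_text):
    """Parse lines like 'NVIDIA GeForce MX250, 565.90' (illustrative) into dicts."""
    gpus = []
    for line in csv_text.strip().splitlines():
        # Split on the last comma so the GPU name is never cut in half
        name, driver = (field.strip() for field in line.rsplit(",", 1))
        gpus.append({"name": name, "driver": driver})
    return gpus


def query_gpus():
    """Return [] when nvidia-smi isn't installed (e.g. no NVIDIA driver yet)."""
    if shutil.which("nvidia-smi") is None:
        return []
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,driver_version", "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_gpu_csv(out)
```

Calling `print(query_gpus())` gives you the GPU model (needed to pick a driver) and the current driver version in one go.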

 

 

2. Install the driver that matches your setup

 

 

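Clicking Start Search lists several driver options

Apparently everyone matches versions using this official document, but if I just install the newest one, it should work with any CUDA version, right?

I installed 565.90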

 

 

3. Verify the installation

>> nvidia-smi

The CUDA Version shown here is the highest CUDA version this driver supports. It is an upper limit, not a recommended version.
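Since that number is a ceiling, checking compatibility is just a comparison: the CUDA version your PyTorch build targets (what `torch.version.cuda` reports, e.g. "12.1" for a cu121 build) must not exceed it. Below is a sketch with hypothetical helper names. It parses the nvidia-smi header text:

```python
import re


def driver_max_cuda(smi_text):
    """Extract the 'CUDA Version: X.Y' field from the nvidia-smi header.

    This is the highest CUDA version the installed driver supports.
    """
    match = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", smi_text)
    return (int(match.group(1)), int(match.group(2))) if match else None


def build_fits_driver(torch_cuda, smi_text):
    """True if a torch build for CUDA `torch_cuda` (e.g. "12.1") fits under the driver's limit."""
    limit = driver_max_cuda(smi_text)
    if limit is None:
        return False  # no driver output to compare against
    major, minor = (int(part) for part in torch_cuda.split(".")[:2])
    return (major, minor) <= limit
```

Tuples compare element by element, so (12, 1) <= (12, 4) holds and (12, 6) <= (12, 4) does not.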

 

 

< Installing CUDA >

1. Check your operating system

System Information > System Summary
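You can read the same details from Python when you fill in the CUDA download selector (operating system, architecture, version). A minimal sketch using the standard `platform` module; `os_summary` is a hypothetical name:

```python
import platform


def os_summary():
    """Collect the OS details the CUDA download page asks for."""
    return {
        "system": platform.system(),         # e.g. "Windows" or "Linux"
        "release": platform.release(),
        "version": platform.version(),       # full build string
        "architecture": platform.machine(),  # e.g. "AMD64" or "x86_64"
    }


if __name__ == "__main__":
    for key, value in os_summary().items():
        print(f"{key}: {value}")
```

Depending on the Python version, Windows 11 may be reported as release "10", so treat System Information as the source of truth.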

 

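2. Install CUDA

Apparently everyone figures things out from this page, but I couldn't make sense of it, so I just picked one.

At first I installed 12.6, but no PyTorch build supported CUDA 12.6 yet, so I uninstalled it and installed 12.1 instead.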

 

3. Verify the installation : nvcc --version

(This output is from my first install, 12.6.)
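`nvcc --version` reports the toolkit version on a line of the form `Cuda compilation tools, release X.Y, ...`. The sketch below (hypothetical helpers) extracts X.Y and converts it into the `cuXY` tag that PyTorch uses in its wheel index URLs, such as the `cu121` in the install command later in this post:

```python
import re


def nvcc_release(nvcc_text):
    """Pull (major, minor) out of `nvcc --version` output, or None if absent."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_text)
    return (int(match.group(1)), int(match.group(2))) if match else None


def wheel_tag(release):
    """Turn (12, 1) into 'cu121', the tag style used in PyTorch index URLs."""
    major, minor = release
    return f"cu{major}{minor}"
```

PyTorch's own CUDA wheels ship with their CUDA runtime, so the toolkit version reported by `nvcc` matters most when something has to be compiled locally.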

 

 

📎 Chapter 3. Matching the PyTorch and CUDA versions

 

Head over to the official PyTorch website

Find the install command for your PyTorch version & CUDA version

 

My existing torch version: 2.4.1

>> pip show torch
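`pip show torch` gives you the version string. From inside Python, `torch.__version__` and `torch.version.cuda` also show which CUDA build you have (`torch.version.cuda` is `None` on CPU-only builds). A sketch with hypothetical helper names:

```python
def split_torch_version(version):
    """'2.4.0+cu121' -> ('2.4.0', 'cu121'); no local tag -> ('2.4.1', None)."""
    public, _, local = version.partition("+")
    return public, (local or None)


def torch_build_report():
    """Return build info, or None if torch can't be imported at all."""
    try:
        import torch
    except (ImportError, OSError):  # OSError: broken native libraries
        return None
    public, tag = split_torch_version(torch.__version__)
    return {
        "version": public,
        "build_tag": tag,
        "cuda": torch.version.cuda,
        "gpu_available": torch.cuda.is_available(),
    }


if __name__ == "__main__":
    print(torch_build_report())
```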

 

Shocking discovery: according to the compatibility info I was following, the newest PyTorch version I could use with CUDA at the time was 2.4.0.

 

Uninstall it. Right now.

>> pip uninstall torch torchvision torchaudio

 

Reinstall with the exact versions listed in the official docs!!!

>> pip install torch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0 --index-url https://download.pytorch.org/whl/cu121
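If you ever need to redo this for another CUDA version, the command is easy to generate. A sketch that assumes the index URL pattern from the command above (`https://download.pytorch.org/whl/<tag>`). `COMPANIONS` and `torch_install_cmd` are hypothetical, and only the 2.4.0 row comes from this post:

```python
import sys

INDEX_ROOT = "https://download.pytorch.org/whl"

# Hypothetical mapping: only the 2.4.0 row is taken from the command above
COMPANIONS = {"2.4.0": {"torchvision": "0.19.0", "torchaudio": "2.4.0"}}


def torch_install_cmd(torch_version, cuda_tag):
    """Build the pip command for a torch version and a CUDA tag such as 'cu121'."""
    extras = COMPANIONS[torch_version]
    return [
        sys.executable, "-m", "pip", "install",
        f"torch=={torch_version}",
        f"torchvision=={extras['torchvision']}",
        f"torchaudio=={extras['torchaudio']}",
        "--index-url", f"{INDEX_ROOT}/{cuda_tag}",
    ]
```

`subprocess.run(torch_install_cmd("2.4.0", "cu121"), check=True)` would then run it with the current venv's pip, so it can't land in a different Python by accident.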

 

Cautiously, I try installing torch-scatter

No way, it actually worked

 

torch-sparse

 

torch-cluster

 

torch-spline-conv

 

torch-geometric

- was already installed
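The exact install command for these extensions isn't shown here. For reference, the PyG installation docs describe installing them from a prebuilt wheel index keyed by torch version and CUDA tag, which skips the source build that failed in Chapter 1. A sketch assuming that index layout (`https://data.pyg.org/whl/torch-<version>+<tag>.html`); `pyg_extensions_cmd` is a hypothetical helper:

```python
import sys

DEFAULT_EXTENSIONS = ["torch-scatter", "torch-sparse", "torch-cluster", "torch-spline-conv"]


def pyg_extensions_cmd(torch_version, cuda_tag, packages=None):
    """pip command that pulls prebuilt PyG extension wheels via a find-links page."""
    packages = packages or DEFAULT_EXTENSIONS
    find_links = f"https://data.pyg.org/whl/torch-{torch_version}+{cuda_tag}.html"
    return [sys.executable, "-m", "pip", "install", *packages, "-f", find_links]
```

The torch version and CUDA tag in the URL have to match the torch you just installed (here 2.4.0 and cu121). Otherwise pip either finds no wheel and falls back to a source build, or installs binaries built for a different torch.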

 

 

📎 Chapter 4. Trying again after the fix

 

$ python RecBole-GNN/run_recbole_gnn.py --model='NGCF' --dataset='Food'

A module that was compiled using NumPy 1.x cannot be run in
NumPy 2.0.2 as it may crash. To support both 1.x and 2.x
versions of NumPy, modules must be compiled with NumPy 2.0.
Some module may need to rebuild instead e.g. with 'pybind11>=2.12'.

If you are a user of the module, the easiest solution will be to
downgrade to 'numpy<2' or try to upgrade the affected module.
We expect that some modules will need time to support NumPy 2.

Traceback (most recent call last):
  File "C:\Users\alsrud\Downloads\Babal-Server\Babal-Server\RecBole\RecBole-GNN\run_recbole_gnn.py", line 3, in <module>
    from recbole_gnn.quick_start import run_recbole_gnn
  File "C:\Users\alsrud\Downloads\Babal-Server\Babal-Server\RecBole\RecBole-GNN\recbole_gnn\quick_start.py", line 3, in <module>
    from recbole.utils import init_logger, init_seed, set_color
  File "c:\users\alsrud\downloads\babal-server\babal-server\recbole\recbole\utils\__init__.py", line 1, in <module>
    from recbole.utils.logger import init_logger, set_color
  File "c:\users\alsrud\downloads\babal-server\babal-server\recbole\recbole\utils\logger.py", line 26, in <module>
    from recbole.utils.utils import get_local_time, ensure_dir
  File "c:\users\alsrud\downloads\babal-server\babal-server\recbole\recbole\utils\utils.py", line 23, in <module>
    import torch
  File "C:\Users\alsrud\Downloads\Babal-Server\Babal-Server\venv\lib\site-packages\torch\__init__.py", line 2120, in <module>
    from torch._higher_order_ops import cond
  File "C:\Users\alsrud\Downloads\Babal-Server\Babal-Server\venv\lib\site-packages\torch\_higher_order_ops\__init__.py", line 1, in <module>
    from .cond import cond
  File "C:\Users\alsrud\Downloads\Babal-Server\Babal-Server\venv\lib\site-packages\torch\_higher_order_ops\cond.py", line 5, in <module>
    import torch._subclasses.functional_tensor
  File "C:\Users\alsrud\Downloads\Babal-Server\Babal-Server\venv\lib\site-packages\torch\_subclasses\functional_tensor.py", line 42, in <module>
    class FunctionalTensor(torch.Tensor):
  File "C:\Users\alsrud\Downloads\Babal-Server\Babal-Server\venv\lib\site-packages\torch\_subclasses\functional_tensor.py", line 258, in FunctionalTensor
    cpu = _conversion_method_template(device=torch.device("cpu"))
C:\Users\alsrud\Downloads\Babal-Server\Babal-Server\venv\lib\site-packages\torch\_subclasses\functional_tensor.py:258: UserWarning: Failed to initialize NumPy:
A module that was compiled using NumPy 1.x cannot be run in
NumPy 2.0.2 as it may crash. To support both 1.x and 2.x
versions of NumPy, modules must be compiled with NumPy 2.0.
Some module may need to rebuild instead e.g. with 'pybind11>=2.12'.

If you are a user of the module, the easiest solution will be to
downgrade to 'numpy<2' or try to upgrade the affected module.
We expect that some modules will need time to support NumPy 2.

 (Triggered internally at C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\torch\csrc\utils\tensor_numpy.cpp:84.)
  cpu = _conversion_method_template(device=torch.device("cpu"))
Traceback (most recent call last):
  File "C:\Users\alsrud\Downloads\Babal-Server\Babal-Server\RecBole\RecBole-GNN\run_recbole_gnn.py", line 15, in <module>
    run_recbole_gnn(model=args.model, dataset=args.dataset, config_file_list=config_file_list)
  File "C:\Users\alsrud\Downloads\Babal-Server\Babal-Server\RecBole\RecBole-GNN\recbole_gnn\quick_start.py", line 20, in run_recbole_gnn
    config = Config(model=model, dataset=dataset, config_file_list=config_file_list, config_dict=config_dict)
  File "C:\Users\alsrud\Downloads\Babal-Server\Babal-Server\RecBole\RecBole-GNN\recbole_gnn\config.py", line 22, in __init__
    super(Config, self).__init__(model, dataset, config_file_list, config_dict)
  File "c:\users\alsrud\downloads\babal-server\babal-server\recbole\recbole\config\configurator.py", line 88, in __init__
    self.model, self.model_class, self.dataset = self._get_model_and_dataset(
  File "C:\Users\alsrud\Downloads\Babal-Server\Babal-Server\RecBole\RecBole-GNN\recbole_gnn\config.py", line 50, in _get_model_and_dataset
    final_model_class = get_model(final_model)
  File "C:\Users\alsrud\Downloads\Babal-Server\Babal-Server\RecBole\RecBole-GNN\recbole_gnn\utils.py", line 82, in get_model
    model_class = get_recbole_model(model_name)
  File "c:\users\alsrud\downloads\babal-server\babal-server\recbole\recbole\utils\utils.py", line 76, in get_model
    if importlib.util.find_spec(module_path, __name__):
  File "C:\Users\alsrud\AppData\Local\Programs\Python\Python39\lib\importlib\util.py", line 94, in find_spec
    parent = __import__(parent_name, fromlist=['__path__'])
  File "c:\users\alsrud\downloads\babal-server\babal-server\recbole\recbole\model\exlib_recommender\__init__.py", line 1, in <module>
    from recbole.model.exlib_recommender.lightgbm import LightGBM
  File "c:\users\alsrud\downloads\babal-server\babal-server\recbole\recbole\model\exlib_recommender\lightgbm.py", line 11, in <module>
    import lightgbm as lgb
ModuleNotFoundError: No module named 'lightgbm'

 

Nope, still not working~

 

1. Downgrade numpy.

2. Install lightgbm.
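The NumPy message above says it outright: a module compiled against NumPy 1.x can't run on NumPy 2.0.2, and the easiest fix is `numpy<2`. Here is a sketch (hypothetical helper names) that checks the installed major version before you change anything:

```python
from importlib import metadata


def numpy_major(version):
    """'2.0.2' -> 2, '1.26.4' -> 1"""
    return int(version.split(".")[0])


def numpy_is_1x():
    """True for NumPy 1.x, False for 2.x or later, None if NumPy is not installed."""
    try:
        version = metadata.version("numpy")
    except metadata.PackageNotFoundError:
        return None
    return numpy_major(version) < 2


if __name__ == "__main__":
    if numpy_is_1x() is False:
        print('NumPy 2.x found; the log suggests: pip install "numpy<2"')
```

When you run the downgrade in a shell, quote the spec (`pip install "numpy<2"`). Otherwise the shell treats `<` as input redirection.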

 

 

 

Come on, just work this time ;

$ python RecBole-GNN/run_recbole_gnn.py --model='NGCF' --dataset='Food'
Traceback (most recent call last):
  File "C:\Users\alsrud\Downloads\Babal-Server\Babal-Server\RecBole\RecBole-GNN\run_recbole_gnn.py", line 15, in <module>
    run_recbole_gnn(model=args.model, dataset=args.dataset, config_file_list=config_file_list)
  File "C:\Users\alsrud\Downloads\Babal-Server\Babal-Server\RecBole\RecBole-GNN\recbole_gnn\quick_start.py", line 20, in run_recbole_gnn
    config = Config(model=model, dataset=dataset, config_file_list=config_file_list, config_dict=config_dict)
  File "C:\Users\alsrud\Downloads\Babal-Server\Babal-Server\RecBole\RecBole-GNN\recbole_gnn\config.py", line 22, in __init__
    super(Config, self).__init__(model, dataset, config_file_list, config_dict)     
  File "c:\users\alsrud\downloads\babal-server\babal-server\recbole\recbole\config\configurator.py", line 88, in __init__
    self.model, self.model_class, self.dataset = self._get_model_and_dataset(       
  File "C:\Users\alsrud\Downloads\Babal-Server\Babal-Server\RecBole\RecBole-GNN\recbole_gnn\config.py", line 50, in _get_model_and_dataset
    final_model_class = get_model(final_model)
  File "C:\Users\alsrud\Downloads\Babal-Server\Babal-Server\RecBole\RecBole-GNN\recbole_gnn\utils.py", line 82, in get_model
    model_class = get_recbole_model(model_name)
  File "c:\users\alsrud\downloads\babal-server\babal-server\recbole\recbole\utils\utils.py", line 76, in get_model
    if importlib.util.find_spec(module_path, __name__):
  File "C:\Users\alsrud\AppData\Local\Programs\Python\Python39\lib\importlib\util.py", line 94, in find_spec
    parent = __import__(parent_name, fromlist=['__path__'])
  File "c:\users\alsrud\downloads\babal-server\babal-server\recbole\recbole\model\exlib_recommender\__init__.py", line 2, in <module>
    from recbole.model.exlib_recommender.xgboost import XGBoost
  File "c:\users\alsrud\downloads\babal-server\babal-server\recbole\recbole\model\exlib_recommender\xgboost.py", line 11, in <module>
    import xgboost as xgb
ModuleNotFoundError: No module named 'xgboost'

 

 

Let's see who wins.
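Instead of discovering missing packages one traceback at a time (lightgbm, then xgboost), you can run a small preflight check first. The sketch below lists the modules that RecBole's `exlib_recommender` tried to import in the tracebacks above. `REQUIRED` and `missing_modules` are hypothetical names, and other models may need more:

```python
import importlib.util

# Modules RecBole's exlib_recommender imported in the tracebacks above
REQUIRED = ["lightgbm", "xgboost"]


def missing_modules(names):
    """Return the names that can't be found on this interpreter's path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    if missing:
        # For these two, the pip package name matches the module name
        print("pip install " + " ".join(missing))
    else:
        print("all present")
```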

 


 

 

 

Training

has started

..

No. Way.

 

 

Solved.