Humanoid Robots Wiki β

K-Scale Cluster

5 bytes added, 27 April
IdentityFile ~/.ssh/id_rsa
</syntaxhighlight>
 
After setting this up, you can use the command <code>ssh cluster</code> to directly connect.
=== Notes ===
* You may need to restart <code>ssh</code> for the configuration change to take effect.
* You may be sharing your part of the cluster with other users. If so, it is a good idea to avoid using all the GPUs. If you're training models in PyTorch, you can limit which GPUs your process sees by setting the <code>CUDA_VISIBLE_DEVICES</code> environment variable.
* Avoid storing data files and model checkpoints in your root directory. Instead, use the <code>/ephemeral</code> directory. Your home directory should contain a symlink to a subdirectory that you have write access to.
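
As a minimal sketch of the GPU note above, the following Python snippet sets <code>CUDA_VISIBLE_DEVICES</code> from inside a script. The helper name <code>restrict_gpus</code> and the GPU indices <code>0</code> and <code>1</code> are illustrative assumptions, not part of the cluster setup. The variable has to be set before CUDA is initialized, so the safest place is before PyTorch is imported.

```python
import os


def restrict_gpus(gpu_ids):
    """Expose only the given GPU indices to CUDA libraries loaded afterwards.

    Hypothetical helper for illustration; it only sets an environment variable.
    """
    value = ",".join(str(i) for i in gpu_ids)
    os.environ["CUDA_VISIBLE_DEVICES"] = value
    return value


# Assumption: this process should only use GPUs 0 and 1.
restrict_gpus([0, 1])

# Import PyTorch only after the variable is set, so that it sees just the
# selected devices (they are renumbered starting from 0 inside the process).
# import torch
```

The same effect can be had from the shell by prefixing the launch command with the variable, for example <code>CUDA_VISIBLE_DEVICES=0,1 python train.py</code>, where <code>train.py</code> stands in for your own training script.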