K-Scale Cluster
Revision as of 21:11, 27 April 2024
The K-Scale Labs cluster is a shared cluster for robotics research. This page contains notes on how to access the cluster.
Onboarding
To get onboarded, you should send us the public key that you want to use and maybe your preferred username.
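If you don't already have a key pair, you can generate one with ssh-keygen. A minimal sketch (the output path and comment below are just examples, not required names):

```shell
# Generate a new ed25519 key pair; the file name and comment are examples.
ssh-keygen -t ed25519 -f ./id_kscale -N "" -C "your-name"
# The contents of the .pub file is the public key to send us.
cat ./id_kscale.pub
```

Only the contents of the .pub file should be shared; the private key file stays on your machine.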
After being onboarded, you should receive the following information:
- Your user ID (for this example, we'll use stompy)
- The jumphost ID (for this example, we'll use 127.0.0.1)
- The cluster ID (for this example, we'll use 127.0.0.2)
To connect, you should be able to use the following command:
ssh -o ProxyCommand="ssh -i ~/.ssh/id_rsa -W %h:%p stompy@127.0.0.1" stompy@127.0.0.2 -i ~/.ssh/id_rsa
Note that ~/.ssh/id_rsa should point to your private key file.
Alternatively, you can add the following to your SSH config file, which will let you connect directly. Use your favorite editor to open the SSH config file (normally located at ~/.ssh/config on Ubuntu) and paste the following:
Host jumphost
    User stompy
    Hostname 127.0.0.1
    IdentityFile ~/.ssh/id_rsa

Host cluster
    User stompy
    Hostname 127.0.0.2
    ProxyJump jumphost
    IdentityFile ~/.ssh/id_rsa
You need to restart ssh to get it working.
After setting this up, you can use the command ssh cluster to connect directly.
You can also access the cluster via VS Code. A tutorial on using ssh in VS Code is available at https://code.visualstudio.com/docs/remote/ssh-tutorial.
Please inform us if you have any issues!
Notes
- You may be sharing your part of the cluster with other users. If so, it is a good idea to avoid using all the GPUs. If you're training models in PyTorch, you can do this using the CUDA_VISIBLE_DEVICES environment variable.
- You should avoid storing data files and model checkpoints in your root directory. Instead, use the /ephemeral directory. Your home directory should come with a symlink to a subdirectory which you have write access to.
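As a minimal sketch, the CUDA_VISIBLE_DEVICES variable mentioned above can be set from inside a Python script before any CUDA library is initialized (the device indices here are examples; use the GPUs assigned to you):

```python
import os

# Limit this process to GPUs 0 and 1 (example indices; use the ones
# assigned to you). This must be set before CUDA initializes, i.e.
# before `import torch` in a PyTorch training script.
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1"
```

Equivalently, set it on the command line when launching a job, e.g. CUDA_VISIBLE_DEVICES=0,1 python train.py.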