<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
	<id>http://54.204.126.50/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=Dennisc</id>
	<title>Humanoid Robots Wiki - User contributions [en]</title>
	<link rel="self" type="application/atom+xml" href="http://54.204.126.50/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=Dennisc"/>
	<link rel="alternate" type="text/html" href="http://54.204.126.50/w/Special:Contributions/Dennisc"/>
	<updated>2026-04-06T03:27:34Z</updated>
	<subtitle>User contributions</subtitle>
	<generator>MediaWiki 1.31.0</generator>
	<entry>
		<id>http://54.204.126.50/index.php?title=Dennis%27_Optimization_Notes&amp;diff=1262</id>
		<title>Dennis' Optimization Notes</title>
		<link rel="alternate" type="text/html" href="http://54.204.126.50/index.php?title=Dennis%27_Optimization_Notes&amp;diff=1262"/>
		<updated>2024-05-25T04:57:12Z</updated>

		<summary type="html">&lt;p&gt;Dennisc: Initial notes on optimization&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Notes of various riffs on Gradient Descent from a perspective of neural networks.&lt;br /&gt;
&lt;br /&gt;
[[Category: Gradient Descent]]&lt;br /&gt;
&lt;br /&gt;
&amp;lt;span id=&amp;quot;a-review-of-standard-gradient-descent&amp;quot;&amp;gt;&amp;lt;/span&amp;gt;&lt;br /&gt;
== A review of standard Gradient Descent ==&lt;br /&gt;
&lt;br /&gt;
The goal of Gradient Descent is to minimize a loss function &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;L&amp;lt;/math&amp;gt;. To be more specific, if &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;L : \mathbb R^n \to \mathbb R&amp;lt;/math&amp;gt; is a differentiable multivariate function, we want to find the vector &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;w&amp;lt;/math&amp;gt; that minimizes &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;L(w)&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
Given an initial vector &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;w_0&amp;lt;/math&amp;gt;, we want to “move” in the direction &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\Delta w&amp;lt;/math&amp;gt; for which the decrease &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;L(w_0) - L(w_0 + \Delta w)&amp;lt;/math&amp;gt; is maximized (supposing the magnitude of &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\Delta w&amp;lt;/math&amp;gt; is fixed and small). To first order, by the Cauchy–Schwarz inequality, this happens precisely when &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\Delta w&amp;lt;/math&amp;gt; points in the direction of &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;-\nabla L(w_0)&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
So given some &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;w_n&amp;lt;/math&amp;gt;, we want to take a step of &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;-\alpha \nabla L(w_n)&amp;lt;/math&amp;gt;. This motivates setting &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;w_{n+1} = w_n - \alpha \nabla L(w_n)&amp;lt;/math&amp;gt;, where &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\alpha&amp;lt;/math&amp;gt; is a scalar factor. We call &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\alpha&amp;lt;/math&amp;gt; the “learning rate” because it controls how fast the sequence &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;w_n&amp;lt;/math&amp;gt; converges to the optimum. A central practical difficulty in machine learning is tuning &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\alpha&amp;lt;/math&amp;gt; so that the iteration converges quickly without diverging, and that is one of the considerations that the remaining algorithms try to address.&lt;br /&gt;
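&lt;br /&gt;
As a rough illustration (not part of the original notes), the update rule above can be sketched in Python on a toy quadratic loss; the loss, learning rate, and step count are all illustrative choices:&lt;br /&gt;

```python
# Standard gradient descent w_{n+1} = w_n - alpha * grad L(w_n),
# demonstrated on the toy loss L(w) = w_1^2 + w_2^2 (gradient 2w).

def grad_L(w):
    # Gradient of the illustrative loss L(w) = sum(w_i^2).
    return [2.0 * wi for wi in w]

def gradient_descent(w0, alpha=0.1, steps=100):
    w = list(w0)
    for _ in range(steps):
        g = grad_L(w)
        # Step against the gradient, scaled by the learning rate.
        w = [wi - alpha * gi for wi, gi in zip(w, g)]
    return w

w_final = gradient_descent([3.0, -4.0])  # approaches the minimizer (0, 0)
```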
&lt;br /&gt;
&amp;lt;span id=&amp;quot;stochastic-gradient-descent&amp;quot;&amp;gt;&amp;lt;/span&amp;gt;&lt;br /&gt;
== Stochastic Gradient Descent ==&lt;br /&gt;
&lt;br /&gt;
In practice we don’t actually know the “true gradient”. So instead we take some datasets, say datasets &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;1&amp;lt;/math&amp;gt; through &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;n&amp;lt;/math&amp;gt;, and for dataset &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;i&amp;lt;/math&amp;gt; we derive an estimated gradient &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\nabla L_i&amp;lt;/math&amp;gt;. Then we may estimate &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\nabla L&amp;lt;/math&amp;gt; as&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;\frac{\nabla L_1 + \cdots + \nabla L_n}{n}.&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
If it is easy to compute &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\nabla L_i(w)&amp;lt;/math&amp;gt; in general then we are golden: this is the best estimate of &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\nabla L&amp;lt;/math&amp;gt; we can get. But what if the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\nabla L_i&amp;lt;/math&amp;gt; are expensive to compute? Then there is a tradeoff between variance and computational cost when evaluating our estimate of &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\nabla L&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
A very low-cost (but low-accuracy) way to estimate &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\nabla L&amp;lt;/math&amp;gt; is just via &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\nabla L_1&amp;lt;/math&amp;gt; (or any other &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\nabla L_i&amp;lt;/math&amp;gt;). But this is obviously problematic: we aren’t even using most of our data! A better balance can be struck as follows: to evaluate &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\nabla L(w_n)&amp;lt;/math&amp;gt;, select &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;k&amp;lt;/math&amp;gt; functions at random from &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\{\nabla L_1, \ldots, \nabla L_n\}&amp;lt;/math&amp;gt;. Then estimate &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\nabla L&amp;lt;/math&amp;gt; as the average of those &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;k&amp;lt;/math&amp;gt; functions ''only at that step''.&lt;br /&gt;
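&lt;br /&gt;
The minibatch idea above can be sketched as follows (an illustrative Python sketch with toy per-dataset gradients; nothing here is from the original notes):&lt;br /&gt;

```python
import random

# Estimate grad L as the average of k randomly chosen per-dataset
# gradients, re-sampled at every step. The per-dataset gradients here
# are toy linear functions grad L_i(w) = c_i * w, purely for illustration.

def make_grad(c):
    return lambda w: c * w

grads = [make_grad(c) for c in (0.5, 1.0, 1.5, 2.0)]  # grad L_1 .. grad L_4

def minibatch_grad(w, k, rng=random):
    batch = rng.sample(grads, k)           # pick k gradients at random
    return sum(g(w) for g in batch) / k    # average them at this step only
```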
&lt;br /&gt;
&amp;lt;span id=&amp;quot;riffs-on-stochastic-gradient-descent&amp;quot;&amp;gt;&amp;lt;/span&amp;gt;&lt;br /&gt;
== Riffs on stochastic gradient descent ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;span id=&amp;quot;momentum&amp;quot;&amp;gt;&amp;lt;/span&amp;gt;&lt;br /&gt;
=== Momentum ===&lt;br /&gt;
&lt;br /&gt;
See also [https://distill.pub/2017/momentum/ “Momentum” on Distill].&lt;br /&gt;
&lt;br /&gt;
In typical stochastic gradient descent, the next step we take is based solely on the gradient at the current point, completely ignoring past gradients. However, it often makes sense to take past gradients into account, weighting recent ones more: if we are at &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;w_{100}&amp;lt;/math&amp;gt;, we should weight &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\nabla L(w_{99})&amp;lt;/math&amp;gt; much more heavily than &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\nabla L(w_1)&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
The simplest scheme weights past gradients geometrically, decaying by a factor &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\beta \in (0, 1)&amp;lt;/math&amp;gt; per step. So when we iterate, instead of taking &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;w_{n+1}&amp;lt;/math&amp;gt; to satisfy&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;w_{n+1} - w_n = -\alpha \nabla L(w_n)&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
like in standard gradient descent, we instead want to take &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;w_{n+1}&amp;lt;/math&amp;gt; to satisfy&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;w_{n+1} - w_n = -\alpha \nabla L(w_n) - \beta \alpha \nabla L(w_{n-1}) - \cdots - \beta^n \alpha \nabla L(w_0).&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
But this raises a concern: are we really going to be storing all of these terms, especially as &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;n&amp;lt;/math&amp;gt; grows? Fortunately, we do not need to. For we may notice that&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;w_{n+1} - w_n = -\alpha \nabla L(w_n) - \beta (\alpha \nabla L(w_{n-1}) + \cdots + \beta^{n-1} \alpha \nabla L(w_0)) = -\alpha \nabla L(w_n) + \beta(w_n - w_{n-1}).&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
To put it another way, if we write &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;w_n - w_{n-1} = \Delta w_n&amp;lt;/math&amp;gt;, i.e. how much &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;w_n&amp;lt;/math&amp;gt; differs from &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;w_{n-1}&amp;lt;/math&amp;gt; by, we may rewrite this equation as&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;\Delta w_{n+1} = -\alpha \nabla L(w_n) + \beta \Delta w_n.&amp;lt;/math&amp;gt;&lt;br /&gt;
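&lt;br /&gt;
In code, the recurrence above might look like the following (a toy Python sketch on the one-dimensional loss &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;L(w) = w^2&amp;lt;/math&amp;gt;; the constants are illustrative, not prescribed by the notes):&lt;br /&gt;

```python
# Momentum: Delta w_{n+1} = -alpha * grad L(w_n) + beta * Delta w_n,
# so only the previous step needs to be stored, not the whole history.

def momentum_descent(w0, alpha=0.1, beta=0.9, steps=200):
    w, delta = w0, 0.0
    for _ in range(steps):
        grad = 2.0 * w                      # grad L(w) for L(w) = w^2
        delta = -alpha * grad + beta * delta
        w = w + delta                       # w_{n+1} = w_n + Delta w_{n+1}
    return w

w_final = momentum_descent(5.0)
```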
&lt;br /&gt;
Some of the benefits of using a momentum based approach:&lt;br /&gt;
&lt;br /&gt;
* most importantly, ''it can dramatically speed up convergence to a local minimum''.&lt;br /&gt;
* it makes convergence more likely in general.&lt;br /&gt;
* it can help escape local minima, saddle points, and plateaus (though how much it helps here is contested; see [https://www.reddit.com/r/MachineLearning/comments/dqbp9g/d_momentum_methods_helps_to_escape_local_minima/ this reddit thread]).&lt;br /&gt;
&lt;br /&gt;
&amp;lt;span id=&amp;quot;rmsprop&amp;quot;&amp;gt;&amp;lt;/span&amp;gt;&lt;br /&gt;
=== RMSProp ===&lt;br /&gt;
&lt;br /&gt;
Gradient descent also often suffers from diminishing effective step sizes. In order to counter this, we very broadly want to:&lt;br /&gt;
&lt;br /&gt;
* track the magnitudes of the past gradients,&lt;br /&gt;
* if they have been low, multiply &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\Delta w_{n+1}&amp;lt;/math&amp;gt; by a scalar to increase the effective learning rate,&lt;br /&gt;
* (as a side effect, if the past gradients have been large, we will temper the learning rate instead).&lt;br /&gt;
&lt;br /&gt;
While performing our gradient descent step &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;w_n \to w_{n+1}&amp;lt;/math&amp;gt;, we create and store an auxiliary parameter &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;v_{n+1}&amp;lt;/math&amp;gt; as follows:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;v_{n+1} = \beta v_n + (1 - \beta) \nabla L(w_n)^2&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
and define&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;w_{n+1} = w_n - \frac{\alpha}{\sqrt{v_{n+1}} + \epsilon} \nabla L(w_n),&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\alpha&amp;lt;/math&amp;gt; as usual is the learning rate, &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\beta&amp;lt;/math&amp;gt; is the decay rate of &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;v_n&amp;lt;/math&amp;gt;, and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\epsilon&amp;lt;/math&amp;gt; is a small constant that also needs to be fine-tuned (the square and the square root here are taken elementwise).&lt;br /&gt;
&lt;br /&gt;
We include the constant term &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\epsilon&amp;lt;/math&amp;gt; in order to ensure numerical stability and that the sequence &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;w_n&amp;lt;/math&amp;gt; actually converges. If we are near the minimum, then &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;v_n&amp;lt;/math&amp;gt; will be quite small, so without &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\epsilon&amp;lt;/math&amp;gt; the factor &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\alpha / \sqrt{v_n}&amp;lt;/math&amp;gt; could blow up (or divide by zero). With &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\epsilon&amp;lt;/math&amp;gt; present, the denominator &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\sqrt{v_n} + \epsilon&amp;lt;/math&amp;gt; essentially just becomes &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\epsilon&amp;lt;/math&amp;gt;, so the update reduces to multiplying &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\nabla L(w)&amp;lt;/math&amp;gt; by a constant (this is just standard gradient descent, after all), and we will achieve convergence when near a minimum.&lt;br /&gt;
&lt;br /&gt;
Side note: in order to get RMSProp to interoperate with stochastic gradient descent, we instead compute the sequence &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;v_n&amp;lt;/math&amp;gt; for each approximated loss function &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;L_i&amp;lt;/math&amp;gt;.&lt;br /&gt;
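&lt;br /&gt;
A minimal Python sketch of the update above, on the toy loss &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;L(w) = w^2&amp;lt;/math&amp;gt; (the hyperparameters are illustrative guesses, not values from the notes):&lt;br /&gt;

```python
# RMSProp: keep a decaying average v of squared gradients and divide
# the step by sqrt(v) + eps, so the effective step size adapts.

def rmsprop(w0, alpha=0.01, beta=0.9, eps=1e-8, steps=500):
    w, v = w0, 0.0
    for _ in range(steps):
        grad = 2.0 * w                           # grad L(w) for L(w) = w^2
        v = beta * v + (1 - beta) * grad ** 2    # decaying mean of grad^2
        w = w - alpha / (v ** 0.5 + eps) * grad  # rescaled descent step
    return w

w_final = rmsprop(3.0)
```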
&lt;br /&gt;
&amp;lt;span id=&amp;quot;adam&amp;quot;&amp;gt;&amp;lt;/span&amp;gt;&lt;br /&gt;
=== Adam ===&lt;br /&gt;
&lt;br /&gt;
Adam ('''Ada'''ptive '''M'''oment Estimation) is a gradient descent modification that combines Momentum and RMSProp. We create two auxiliary variables while iterating &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;w_n&amp;lt;/math&amp;gt; (where &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\alpha&amp;lt;/math&amp;gt; is the learning rate, &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\beta_1&amp;lt;/math&amp;gt; and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\beta_2&amp;lt;/math&amp;gt; are decay parameters that need to be fine-tuned, and &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\epsilon&amp;lt;/math&amp;gt; is a parameter serving the same purpose as in RMSProp):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;m_{n+1} = \beta_1 m_n + (1 - \beta_1) \nabla L(w_n)&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;v_{n+1} = \beta_2 v_n + (1 - \beta_2) \nabla L(w_n)^2.&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For notational convenience, we will define&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;\widehat{m}_n = \frac{m_n}{1 - \beta_1^n}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;\widehat{v}_n = \frac{v_n}{1 - \beta_2^n}.&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Then our update function to get &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;w_{n+1}&amp;lt;/math&amp;gt; is&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;w_{n+1} = w_n - \alpha \frac{\widehat{m}_n}{\sqrt{\widehat{v}_n} + \epsilon}.&amp;lt;/math&amp;gt;&lt;br /&gt;
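&lt;br /&gt;
Putting the pieces together, here is a toy Python sketch of the full Adam iteration on the illustrative loss &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;L(w) = w^2&amp;lt;/math&amp;gt;, using commonly cited default hyperparameters that are not specified in these notes:&lt;br /&gt;

```python
# Adam: momentum-style first moment m, RMSProp-style second moment v,
# both bias-corrected, driving w -= alpha * m_hat / (sqrt(v_hat) + eps).

def adam(w0, alpha=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=300):
    w, m, v = w0, 0.0, 0.0
    for n in range(1, steps + 1):
        grad = 2.0 * w                             # grad L(w) for L(w) = w^2
        m = beta1 * m + (1 - beta1) * grad         # first-moment estimate
        v = beta2 * v + (1 - beta2) * grad ** 2    # second-moment estimate
        m_hat = m / (1 - beta1 ** n)               # bias corrections
        v_hat = v / (1 - beta2 ** n)
        w = w - alpha * m_hat / (v_hat ** 0.5 + eps)
    return w

w_final = adam(3.0)
```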
&lt;br /&gt;
It is worth noting that though this formula does not explicitly include &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\nabla L(w_n)&amp;lt;/math&amp;gt;, it is accounted for in the &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;\widehat{m}_n&amp;lt;/math&amp;gt; term through &amp;lt;math display=&amp;quot;inline&amp;quot;&amp;gt;m_n&amp;lt;/math&amp;gt;.&lt;/div&gt;</summary>
		<author><name>Dennisc</name></author>
		
	</entry>
	<entry>
		<id>http://54.204.126.50/index.php?title=Getting_Started_with_Machine_Learning&amp;diff=1131</id>
		<title>Getting Started with Machine Learning</title>
		<link rel="alternate" type="text/html" href="http://54.204.126.50/index.php?title=Getting_Started_with_Machine_Learning&amp;diff=1131"/>
		<updated>2024-05-20T03:33:05Z</updated>

		<summary type="html">&lt;p&gt;Dennisc: Explain uv venv --python 3.11 flag and why it is important/why one might want to use it.&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This is [[User:Ben]]'s guide to getting started with machine learning.&lt;br /&gt;
&lt;br /&gt;
=== Dependencies ===&lt;br /&gt;
&lt;br /&gt;
Here are some useful dependencies that I use:&lt;br /&gt;
&lt;br /&gt;
* [https://astral.sh/blog/uv uv]&lt;br /&gt;
** This is similar to Pip but written in Rust and is way faster&lt;br /&gt;
** It has nice management of virtual environments&lt;br /&gt;
** Can use Conda instead but it is much slower&lt;br /&gt;
* [https://github.com/features/copilot Github Copilot]&lt;br /&gt;
* [https://github.com/kscalelabs/mlfab mlfab]&lt;br /&gt;
** This is a Python package I made to help make it easy to quickly try out machine learning ideas in PyTorch&lt;br /&gt;
* Coding tools&lt;br /&gt;
** [https://mypy-lang.org/ mypy] static analysis&lt;br /&gt;
** [https://github.com/psf/black black] code formatter&lt;br /&gt;
** [https://docs.astral.sh/ruff/ ruff] alternative to flake8&lt;br /&gt;
&lt;br /&gt;
==== uv ====&lt;br /&gt;
&lt;br /&gt;
To install &amp;lt;code&amp;gt;uv&amp;lt;/code&amp;gt; on the K-Scale clusters, run&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
curl -LsSf https://astral.sh/uv/install.sh | sh&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
To get started with &amp;lt;code&amp;gt;uv&amp;lt;/code&amp;gt;, pick a directory you want your virtual environment to live in. (&amp;lt;code&amp;gt;$HOME&amp;lt;/code&amp;gt; is not recommended.) Once you have &amp;lt;code&amp;gt;cd&amp;lt;/code&amp;gt;ed there, run&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
uv venv&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''If you are on the clusters''', you instead may want to run&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
uv venv --python 3.11&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
to ensure that the virtual environment uses Python 3.11. By default, uv uses the system's version of Python (whatever &amp;lt;code&amp;gt;which python&amp;lt;/code&amp;gt; yields), and the clusters are running Python 3.10.12. (Python 3.11 matters because various projects, including the starter project, require Python 3.11.)&lt;br /&gt;
&lt;br /&gt;
To activate your virtual environment, run&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
source .venv/bin/activate&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''while in the directory you created your &amp;lt;code&amp;gt;.venv&amp;lt;/code&amp;gt; in'''.&lt;br /&gt;
&lt;br /&gt;
=== Installing Starter Project ===&lt;br /&gt;
&lt;br /&gt;
* Go to [https://github.com/kscalelabs/getting-started this project] and install it&lt;br /&gt;
&lt;br /&gt;
==== Opening the project in VSCode ====&lt;br /&gt;
&lt;br /&gt;
* Create a VSCode config file that looks something like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;json&amp;quot;&amp;gt;&lt;br /&gt;
{&lt;br /&gt;
  &amp;quot;folders&amp;quot;: [&lt;br /&gt;
    {&lt;br /&gt;
      &amp;quot;name&amp;quot;: &amp;quot;Getting Started&amp;quot;,&lt;br /&gt;
      &amp;quot;path&amp;quot;: &amp;quot;/home/ubuntu/Github/getting_started&amp;quot;&lt;br /&gt;
    },&lt;br /&gt;
    {&lt;br /&gt;
      &amp;quot;name&amp;quot;: &amp;quot;Workspaces&amp;quot;,&lt;br /&gt;
      &amp;quot;path&amp;quot;: &amp;quot;/home/ubuntu/.code-workspaces&amp;quot;&lt;br /&gt;
    }&lt;br /&gt;
  ],&lt;br /&gt;
  &amp;quot;settings&amp;quot;: {&lt;br /&gt;
    &amp;quot;cmake.configureSettings&amp;quot;: {&lt;br /&gt;
      &amp;quot;CMAKE_CUDA_COMPILER&amp;quot;: &amp;quot;/usr/bin/nvcc&amp;quot;,&lt;br /&gt;
      &amp;quot;CMAKE_PREFIX_PATH&amp;quot;: [&lt;br /&gt;
        &amp;quot;/home/ubuntu/.virtualenvs/getting-started/lib/python3.11/site-packages/torch/share/cmake&amp;quot;&lt;br /&gt;
      ],&lt;br /&gt;
      &amp;quot;PYTHON_EXECUTABLE&amp;quot;: &amp;quot;/home/ubuntu/.virtualenvs/getting-started/bin/python&amp;quot;,&lt;br /&gt;
      &amp;quot;TORCH_CUDA_ARCH_LIST&amp;quot;: &amp;quot;'8.0'&amp;quot;&lt;br /&gt;
    },&lt;br /&gt;
    &amp;quot;python.defaultInterpreterPath&amp;quot;: &amp;quot;/home/ubuntu/.virtualenvs/getting-started/bin/python&amp;quot;,&lt;br /&gt;
    &amp;quot;ruff.path&amp;quot;: [&lt;br /&gt;
      &amp;quot;/home/ubuntu/.virtualenvs/getting-started/bin/ruff&amp;quot;&lt;br /&gt;
    ]&lt;br /&gt;
  }&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* Install the [https://code.visualstudio.com/docs/remote/ssh VSCode SSH extension]&lt;br /&gt;
* SSH into the cluster (see [[K-Scale Cluster]] for instructions)&lt;br /&gt;
* Open the workspace that you created in VSCode&lt;br /&gt;
&lt;br /&gt;
=== Useful Brain Dump Stuff ===&lt;br /&gt;
&lt;br /&gt;
* Use &amp;lt;code&amp;gt;breakpoint()&amp;lt;/code&amp;gt; to debug code&lt;br /&gt;
* Check out the [https://github.com/kscalelabs/mlfab/tree/master/examples mlfab examples directory] for some ideas&lt;br /&gt;
* It is a good idea to try to write the full training loop yourself to figure out what's going on&lt;br /&gt;
* Run &amp;lt;code&amp;gt;nvidia-smi&amp;lt;/code&amp;gt; to see the GPUs and their statuses/any active processes&lt;/div&gt;</summary>
		<author><name>Dennisc</name></author>
		
	</entry>
	<entry>
		<id>http://54.204.126.50/index.php?title=Getting_Started_with_Machine_Learning&amp;diff=1130</id>
		<title>Getting Started with Machine Learning</title>
		<link rel="alternate" type="text/html" href="http://54.204.126.50/index.php?title=Getting_Started_with_Machine_Learning&amp;diff=1130"/>
		<updated>2024-05-20T03:23:58Z</updated>

		<summary type="html">&lt;p&gt;Dennisc: How to install uv on cluster&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This is [[User:Ben]]'s guide to getting started with machine learning.&lt;br /&gt;
&lt;br /&gt;
=== Dependencies ===&lt;br /&gt;
&lt;br /&gt;
Here's some useful dependencies that I use:&lt;br /&gt;
&lt;br /&gt;
* [https://astral.sh/blog/uv uv]&lt;br /&gt;
** This is similar to Pip but written in Rust and is way faster&lt;br /&gt;
** It has nice management of virtual environments&lt;br /&gt;
** Can use Conda instead but it is much slower&lt;br /&gt;
* [https://github.com/features/copilot Github Copilot]&lt;br /&gt;
* [https://github.com/kscalelabs/mlfab mlfab]&lt;br /&gt;
** This is a Python package I made to help make it easy to quickly try out machine learning ideas in PyTorch&lt;br /&gt;
* Coding tools&lt;br /&gt;
** [https://mypy-lang.org/ mypy] static analysis&lt;br /&gt;
** [https://github.com/psf/black black] code formatter&lt;br /&gt;
** [https://docs.astral.sh/ruff/ ruff] alternative to flake8&lt;br /&gt;
&lt;br /&gt;
==== uv ====&lt;br /&gt;
&lt;br /&gt;
To install &amp;lt;code&amp;gt;uv&amp;lt;/code&amp;gt; on the K-Scale clusters, run&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
curl -LsSf https://astral.sh/uv/install.sh | sh&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
To get started with &amp;lt;code&amp;gt;uv&amp;lt;/code&amp;gt;, pick a directory you want your virtual environment to live in. (&amp;lt;code&amp;gt;$HOME&amp;lt;/code&amp;gt; is not recommended.) Once you have &amp;lt;code&amp;gt;cd&amp;lt;/code&amp;gt;ed there, run&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
uv venv&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
To activate your virtual environment, run&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
source .venv/bin/activate &lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''while in the directory you created your &amp;lt;code&amp;gt;.venv&amp;lt;/code&amp;gt; in'''.&lt;br /&gt;
&lt;br /&gt;
=== Installing Starter Project ===&lt;br /&gt;
&lt;br /&gt;
* Go to [https://github.com/kscalelabs/getting-started this project] and install it&lt;br /&gt;
&lt;br /&gt;
==== Opening the project in VSCode ====&lt;br /&gt;
&lt;br /&gt;
* Create a VSCode config file that looks something like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;json&amp;quot;&amp;gt;&lt;br /&gt;
{&lt;br /&gt;
  &amp;quot;folders&amp;quot;: [&lt;br /&gt;
    {&lt;br /&gt;
      &amp;quot;name&amp;quot;: &amp;quot;Getting Started&amp;quot;,&lt;br /&gt;
      &amp;quot;path&amp;quot;: &amp;quot;/home/ubuntu/Github/getting_started&amp;quot;&lt;br /&gt;
    },&lt;br /&gt;
    {&lt;br /&gt;
      &amp;quot;name&amp;quot;: &amp;quot;Workspaces&amp;quot;,&lt;br /&gt;
      &amp;quot;path&amp;quot;: &amp;quot;/home/ubuntu/.code-workspaces&amp;quot;&lt;br /&gt;
    }&lt;br /&gt;
  ],&lt;br /&gt;
  &amp;quot;settings&amp;quot;: {&lt;br /&gt;
    &amp;quot;cmake.configureSettings&amp;quot;: {&lt;br /&gt;
      &amp;quot;CMAKE_CUDA_COMPILER&amp;quot;: &amp;quot;/usr/bin/nvcc&amp;quot;,&lt;br /&gt;
      &amp;quot;CMAKE_PREFIX_PATH&amp;quot;: [&lt;br /&gt;
        &amp;quot;/home/ubuntu/.virtualenvs/getting-started/lib/python3.11/site-packages/torch/share/cmake&amp;quot;&lt;br /&gt;
      ],&lt;br /&gt;
      &amp;quot;PYTHON_EXECUTABLE&amp;quot;: &amp;quot;/home/ubuntu/.virtualenvs/getting-started/bin/python&amp;quot;,&lt;br /&gt;
      &amp;quot;TORCH_CUDA_ARCH_LIST&amp;quot;: &amp;quot;'8.0'&amp;quot;&lt;br /&gt;
    },&lt;br /&gt;
    &amp;quot;python.defaultInterpreterPath&amp;quot;: &amp;quot;/home/ubuntu/.virtualenvs/getting-started/bin/python&amp;quot;,&lt;br /&gt;
    &amp;quot;ruff.path&amp;quot;: [&lt;br /&gt;
      &amp;quot;/home/ubuntu/.virtualenvs/getting-started/bin/ruff&amp;quot;&lt;br /&gt;
    ]&lt;br /&gt;
  }&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* Install the [https://code.visualstudio.com/docs/remote/ssh VSCode SSH extension]&lt;br /&gt;
* SSH into the cluster (see [[K-Scale Cluster]] for instructions)&lt;br /&gt;
* Open the workspace that you created in VSCode&lt;br /&gt;
&lt;br /&gt;
=== Useful Brain Dump Stuff ===&lt;br /&gt;
&lt;br /&gt;
* Use &amp;lt;code&amp;gt;breakpoint()&amp;lt;/code&amp;gt; to debug code&lt;br /&gt;
* Check out the [https://github.com/kscalelabs/mlfab/tree/master/examples mlfab examples directory] for some ideas&lt;br /&gt;
* It is a good idea to try to write the full training loop yourself to figure out what's going on&lt;br /&gt;
* Run &amp;lt;code&amp;gt;nvidia-smi&amp;lt;/code&amp;gt; to see the GPUs and their statuses/any active processes&lt;/div&gt;</summary>
		<author><name>Dennisc</name></author>
		
	</entry>
	<entry>
		<id>http://54.204.126.50/index.php?title=Getting_Started_with_Machine_Learning&amp;diff=1129</id>
		<title>Getting Started with Machine Learning</title>
		<link rel="alternate" type="text/html" href="http://54.204.126.50/index.php?title=Getting_Started_with_Machine_Learning&amp;diff=1129"/>
		<updated>2024-05-20T03:20:08Z</updated>

		<summary type="html">&lt;p&gt;Dennisc: fix bold&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This is [[User:Ben]]'s guide to getting started with machine learning.&lt;br /&gt;
&lt;br /&gt;
=== Dependencies ===&lt;br /&gt;
&lt;br /&gt;
Here's some useful dependencies that I use:&lt;br /&gt;
&lt;br /&gt;
* [https://astral.sh/blog/uv uv]&lt;br /&gt;
** This is similar to Pip but written in Rust and is way faster&lt;br /&gt;
** It has nice management of virtual environments&lt;br /&gt;
** Can use Conda instead but it is much slower&lt;br /&gt;
* [https://github.com/features/copilot Github Copilot]&lt;br /&gt;
* [https://github.com/kscalelabs/mlfab mlfab]&lt;br /&gt;
** This is a Python package I made to help make it easy to quickly try out machine learning ideas in PyTorch&lt;br /&gt;
* Coding tools&lt;br /&gt;
** [https://mypy-lang.org/ mypy] static analysis&lt;br /&gt;
** [https://github.com/psf/black black] code formatter&lt;br /&gt;
** [https://docs.astral.sh/ruff/ ruff] alternative to flake8&lt;br /&gt;
&lt;br /&gt;
==== uv ====&lt;br /&gt;
&lt;br /&gt;
To get started with &amp;lt;code&amp;gt;uv&amp;lt;/code&amp;gt;, pick a directory you want your virtual environment to live in. (&amp;lt;code&amp;gt;$HOME&amp;lt;/code&amp;gt; is not recommended.) Once you have &amp;lt;code&amp;gt;cd&amp;lt;/code&amp;gt;ed there, run&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
uv venv&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
To activate your virtual environment, run&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
source .venv/bin/activate &lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''while in the directory you created your &amp;lt;code&amp;gt;.venv&amp;lt;/code&amp;gt; in'''.&lt;br /&gt;
&lt;br /&gt;
=== Installing Starter Project ===&lt;br /&gt;
&lt;br /&gt;
* Go to [https://github.com/kscalelabs/getting-started this project] and install it&lt;br /&gt;
&lt;br /&gt;
==== Opening the project in VSCode ====&lt;br /&gt;
&lt;br /&gt;
* Create a VSCode config file that looks something like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;json&amp;quot;&amp;gt;&lt;br /&gt;
{&lt;br /&gt;
  &amp;quot;folders&amp;quot;: [&lt;br /&gt;
    {&lt;br /&gt;
      &amp;quot;name&amp;quot;: &amp;quot;Getting Started&amp;quot;,&lt;br /&gt;
      &amp;quot;path&amp;quot;: &amp;quot;/home/ubuntu/Github/getting_started&amp;quot;&lt;br /&gt;
    },&lt;br /&gt;
    {&lt;br /&gt;
      &amp;quot;name&amp;quot;: &amp;quot;Workspaces&amp;quot;,&lt;br /&gt;
      &amp;quot;path&amp;quot;: &amp;quot;/home/ubuntu/.code-workspaces&amp;quot;&lt;br /&gt;
    }&lt;br /&gt;
  ],&lt;br /&gt;
  &amp;quot;settings&amp;quot;: {&lt;br /&gt;
    &amp;quot;cmake.configureSettings&amp;quot;: {&lt;br /&gt;
      &amp;quot;CMAKE_CUDA_COMPILER&amp;quot;: &amp;quot;/usr/bin/nvcc&amp;quot;,&lt;br /&gt;
      &amp;quot;CMAKE_PREFIX_PATH&amp;quot;: [&lt;br /&gt;
        &amp;quot;/home/ubuntu/.virtualenvs/getting-started/lib/python3.11/site-packages/torch/share/cmake&amp;quot;&lt;br /&gt;
      ],&lt;br /&gt;
      &amp;quot;PYTHON_EXECUTABLE&amp;quot;: &amp;quot;/home/ubuntu/.virtualenvs/getting-started/bin/python&amp;quot;,&lt;br /&gt;
      &amp;quot;TORCH_CUDA_ARCH_LIST&amp;quot;: &amp;quot;'8.0'&amp;quot;&lt;br /&gt;
    },&lt;br /&gt;
    &amp;quot;python.defaultInterpreterPath&amp;quot;: &amp;quot;/home/ubuntu/.virtualenvs/getting-started/bin/python&amp;quot;,&lt;br /&gt;
    &amp;quot;ruff.path&amp;quot;: [&lt;br /&gt;
      &amp;quot;/home/ubuntu/.virtualenvs/getting-started/bin/ruff&amp;quot;&lt;br /&gt;
    ]&lt;br /&gt;
  }&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* Install the [https://code.visualstudio.com/docs/remote/ssh VSCode SSH extension]&lt;br /&gt;
* SSH into the cluster (see [[K-Scale Cluster]] for instructions)&lt;br /&gt;
* Open the workspace that you created in VSCode&lt;br /&gt;
&lt;br /&gt;
=== Useful Brain Dump Stuff ===&lt;br /&gt;
&lt;br /&gt;
* Use &amp;lt;code&amp;gt;breakpoint()&amp;lt;/code&amp;gt; to debug code&lt;br /&gt;
* Check out the [https://github.com/kscalelabs/mlfab/tree/master/examples mlfab examples directory] for some ideas&lt;br /&gt;
* It is a good idea to try to write the full training loop yourself to figure out what's going on&lt;br /&gt;
* Run &amp;lt;code&amp;gt;nvidia-smi&amp;lt;/code&amp;gt; to see the GPUs and their statuses/any active processes&lt;/div&gt;</summary>
		<author><name>Dennisc</name></author>
		
	</entry>
	<entry>
		<id>http://54.204.126.50/index.php?title=Getting_Started_with_Machine_Learning&amp;diff=1128</id>
		<title>Getting Started with Machine Learning</title>
		<link rel="alternate" type="text/html" href="http://54.204.126.50/index.php?title=Getting_Started_with_Machine_Learning&amp;diff=1128"/>
		<updated>2024-05-20T03:19:45Z</updated>

		<summary type="html">&lt;p&gt;Dennisc: skeleton of uv&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This is [[User:Ben]]'s guide to getting started with machine learning.&lt;br /&gt;
&lt;br /&gt;
=== Dependencies ===&lt;br /&gt;
&lt;br /&gt;
Here's some useful dependencies that I use:&lt;br /&gt;
&lt;br /&gt;
* [https://astral.sh/blog/uv uv]&lt;br /&gt;
** This is similar to pip, but it is written in Rust and is much faster&lt;br /&gt;
** It also manages virtual environments nicely&lt;br /&gt;
** You can use Conda instead, but it is much slower&lt;br /&gt;
* [https://github.com/features/copilot GitHub Copilot]&lt;br /&gt;
* [https://github.com/kscalelabs/mlfab mlfab]&lt;br /&gt;
** This is a Python package I wrote to make it easy to quickly prototype machine learning ideas in PyTorch&lt;br /&gt;
* Coding tools&lt;br /&gt;
** [https://mypy-lang.org/ mypy] for static type checking&lt;br /&gt;
** [https://github.com/psf/black black] for code formatting&lt;br /&gt;
** [https://docs.astral.sh/ruff/ ruff], a fast alternative to flake8&lt;br /&gt;
&lt;br /&gt;
==== uv ====&lt;br /&gt;
&lt;br /&gt;
To get started with &amp;lt;code&amp;gt;uv&amp;lt;/code&amp;gt;, pick a directory you want your virtual environment to live in. (&amp;lt;code&amp;gt;$HOME&amp;lt;/code&amp;gt; is not recommended.) Once you have &amp;lt;code&amp;gt;cd&amp;lt;/code&amp;gt;ed there, run&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
uv venv&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
To activate your virtual environment, run the following from the directory where you created your &amp;lt;code&amp;gt;.venv&amp;lt;/code&amp;gt;:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
source .venv/bin/activate&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
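Once activated, a quick sanity check from Python confirms the environment is actually in use. (This is a generic CPython check, not uv-specific: in an active virtual environment &amp;lt;code&amp;gt;sys.prefix&amp;lt;/code&amp;gt; points into the &amp;lt;code&amp;gt;.venv&amp;lt;/code&amp;gt;, while &amp;lt;code&amp;gt;sys.base_prefix&amp;lt;/code&amp;gt; still points at the base interpreter.)&lt;br /&gt;
&lt;br /&gt;

```python
import sys

# A virtual environment is active when sys.prefix (the environment's
# root directory) differs from sys.base_prefix (the interpreter the
# environment was created from).
def in_virtualenv() -> bool:
    return sys.prefix != sys.base_prefix

print(in_virtualenv())
```

If this prints &amp;lt;code&amp;gt;False&amp;lt;/code&amp;gt;, the shell most likely was not activated before launching Python.&lt;br /&gt;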
&lt;br /&gt;
=== Installing Starter Project ===&lt;br /&gt;
&lt;br /&gt;
* Clone [https://github.com/kscalelabs/getting-started this starter project] and follow its installation instructions&lt;br /&gt;
&lt;br /&gt;
==== Opening the project in VSCode ====&lt;br /&gt;
&lt;br /&gt;
* Create a VSCode config file that looks something like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;json&amp;quot;&amp;gt;&lt;br /&gt;
{&lt;br /&gt;
  &amp;quot;folders&amp;quot;: [&lt;br /&gt;
    {&lt;br /&gt;
      &amp;quot;name&amp;quot;: &amp;quot;Getting Started&amp;quot;,&lt;br /&gt;
      &amp;quot;path&amp;quot;: &amp;quot;/home/ubuntu/Github/getting_started&amp;quot;&lt;br /&gt;
    },&lt;br /&gt;
    {&lt;br /&gt;
      &amp;quot;name&amp;quot;: &amp;quot;Workspaces&amp;quot;,&lt;br /&gt;
      &amp;quot;path&amp;quot;: &amp;quot;/home/ubuntu/.code-workspaces&amp;quot;&lt;br /&gt;
    }&lt;br /&gt;
  ],&lt;br /&gt;
  &amp;quot;settings&amp;quot;: {&lt;br /&gt;
    &amp;quot;cmake.configureSettings&amp;quot;: {&lt;br /&gt;
      &amp;quot;CMAKE_CUDA_COMPILER&amp;quot;: &amp;quot;/usr/bin/nvcc&amp;quot;,&lt;br /&gt;
      &amp;quot;CMAKE_PREFIX_PATH&amp;quot;: [&lt;br /&gt;
        &amp;quot;/home/ubuntu/.virtualenvs/getting-started/lib/python3.11/site-packages/torch/share/cmake&amp;quot;&lt;br /&gt;
      ],&lt;br /&gt;
      &amp;quot;PYTHON_EXECUTABLE&amp;quot;: &amp;quot;/home/ubuntu/.virtualenvs/getting-started/bin/python&amp;quot;,&lt;br /&gt;
      &amp;quot;TORCH_CUDA_ARCH_LIST&amp;quot;: &amp;quot;'8.0'&amp;quot;&lt;br /&gt;
    },&lt;br /&gt;
    &amp;quot;python.defaultInterpreterPath&amp;quot;: &amp;quot;/home/ubuntu/.virtualenvs/getting-started/bin/python&amp;quot;,&lt;br /&gt;
    &amp;quot;ruff.path&amp;quot;: [&lt;br /&gt;
      &amp;quot;/home/ubuntu/.virtualenvs/getting-started/bin/ruff&amp;quot;&lt;br /&gt;
    ]&lt;br /&gt;
  }&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* Install the [https://code.visualstudio.com/docs/remote/ssh VSCode SSH extension]&lt;br /&gt;
* SSH into the cluster (see [[K-Scale Cluster]] for instructions)&lt;br /&gt;
* Open the workspace that you created in VSCode&lt;br /&gt;
&lt;br /&gt;
=== Useful Brain Dump Stuff ===&lt;br /&gt;
&lt;br /&gt;
* Use &amp;lt;code&amp;gt;breakpoint()&amp;lt;/code&amp;gt; to debug code&lt;br /&gt;
* Check out the [https://github.com/kscalelabs/mlfab/tree/master/examples mlfab examples directory] for some ideas&lt;br /&gt;
* It is a good idea to try to write the full training loop yourself to figure out what's going on&lt;br /&gt;
* Run &amp;lt;code&amp;gt;nvidia-smi&amp;lt;/code&amp;gt; to see the GPUs and their statuses/any active processes&lt;/div&gt;</summary>
		<author><name>Dennisc</name></author>
		
	</entry>
	<entry>
		<id>http://54.204.126.50/index.php?title=Getting_Started_with_Machine_Learning&amp;diff=1127</id>
		<title>Getting Started with Machine Learning</title>
		<link rel="alternate" type="text/html" href="http://54.204.126.50/index.php?title=Getting_Started_with_Machine_Learning&amp;diff=1127"/>
		<updated>2024-05-20T03:11:30Z</updated>

		<summary type="html">&lt;p&gt;Dennisc: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This is [[User:Ben]]'s guide to getting started with machine learning.&lt;br /&gt;
&lt;br /&gt;
=== Dependencies ===&lt;br /&gt;
&lt;br /&gt;
Here are some useful dependencies that I use:&lt;br /&gt;
&lt;br /&gt;
* [https://astral.sh/blog/uv uv]&lt;br /&gt;
** This is similar to Pip but written in Rust and is way faster&lt;br /&gt;
** It has nice management of virtual environments&lt;br /&gt;
** Can use Conda instead but it is much slower&lt;br /&gt;
* [https://github.com/features/copilot GitHub Copilot]&lt;br /&gt;
* [https://github.com/kscalelabs/mlfab mlfab]&lt;br /&gt;
** This is a Python package I made to help make it easy to quickly try out machine learning ideas in PyTorch&lt;br /&gt;
* Coding tools&lt;br /&gt;
** [https://mypy-lang.org/ mypy] static analysis&lt;br /&gt;
** [https://github.com/psf/black black] code formatter&lt;br /&gt;
** [https://docs.astral.sh/ruff/ ruff] alternative to flake8&lt;br /&gt;
&lt;br /&gt;
=== Installing Starter Project ===&lt;br /&gt;
&lt;br /&gt;
* Go to [https://github.com/kscalelabs/getting-started this project] and install it&lt;br /&gt;
&lt;br /&gt;
==== Opening the project in VSCode ====&lt;br /&gt;
&lt;br /&gt;
* Create a VSCode config file that looks something like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;json&amp;quot;&amp;gt;&lt;br /&gt;
{&lt;br /&gt;
  &amp;quot;folders&amp;quot;: [&lt;br /&gt;
    {&lt;br /&gt;
      &amp;quot;name&amp;quot;: &amp;quot;Getting Started&amp;quot;,&lt;br /&gt;
      &amp;quot;path&amp;quot;: &amp;quot;/home/ubuntu/Github/getting_started&amp;quot;&lt;br /&gt;
    },&lt;br /&gt;
    {&lt;br /&gt;
      &amp;quot;name&amp;quot;: &amp;quot;Workspaces&amp;quot;,&lt;br /&gt;
      &amp;quot;path&amp;quot;: &amp;quot;/home/ubuntu/.code-workspaces&amp;quot;&lt;br /&gt;
    }&lt;br /&gt;
  ],&lt;br /&gt;
  &amp;quot;settings&amp;quot;: {&lt;br /&gt;
    &amp;quot;cmake.configureSettings&amp;quot;: {&lt;br /&gt;
      &amp;quot;CMAKE_CUDA_COMPILER&amp;quot;: &amp;quot;/usr/bin/nvcc&amp;quot;,&lt;br /&gt;
      &amp;quot;CMAKE_PREFIX_PATH&amp;quot;: [&lt;br /&gt;
        &amp;quot;/home/ubuntu/.virtualenvs/getting-started/lib/python3.11/site-packages/torch/share/cmake&amp;quot;&lt;br /&gt;
      ],&lt;br /&gt;
      &amp;quot;PYTHON_EXECUTABLE&amp;quot;: &amp;quot;/home/ubuntu/.virtualenvs/getting-started/bin/python&amp;quot;,&lt;br /&gt;
      &amp;quot;TORCH_CUDA_ARCH_LIST&amp;quot;: &amp;quot;'8.0'&amp;quot;&lt;br /&gt;
    },&lt;br /&gt;
    &amp;quot;python.defaultInterpreterPath&amp;quot;: &amp;quot;/home/ubuntu/.virtualenvs/getting-started/bin/python&amp;quot;,&lt;br /&gt;
    &amp;quot;ruff.path&amp;quot;: [&lt;br /&gt;
      &amp;quot;/home/ubuntu/.virtualenvs/getting-started/bin/ruff&amp;quot;&lt;br /&gt;
    ]&lt;br /&gt;
  }&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* Install the [https://code.visualstudio.com/docs/remote/ssh VSCode SSH extension]&lt;br /&gt;
* SSH into the cluster (see [[K-Scale Cluster]] for instructions)&lt;br /&gt;
* Open the workspace that you created in VSCode&lt;br /&gt;
&lt;br /&gt;
=== Useful Brain Dump Stuff ===&lt;br /&gt;
&lt;br /&gt;
* Use &amp;lt;code&amp;gt;breakpoint()&amp;lt;/code&amp;gt; to debug code&lt;br /&gt;
* Check out the [https://github.com/kscalelabs/mlfab/tree/master/examples mlfab examples directory] for some ideas&lt;br /&gt;
* It is a good idea to try to write the full training loop yourself to figure out what's going on&lt;br /&gt;
* Run &amp;lt;code&amp;gt;nvidia-smi&amp;lt;/code&amp;gt; to see the GPUs and their statuses/any active processes&lt;/div&gt;</summary>
		<author><name>Dennisc</name></author>
		
	</entry>
	<entry>
		<id>http://54.204.126.50/index.php?title=Getting_Started_with_Machine_Learning&amp;diff=1126</id>
		<title>Getting Started with Machine Learning</title>
		<link rel="alternate" type="text/html" href="http://54.204.126.50/index.php?title=Getting_Started_with_Machine_Learning&amp;diff=1126"/>
		<updated>2024-05-20T03:11:12Z</updated>

		<summary type="html">&lt;p&gt;Dennisc: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This is [[User:Ben]]'s guide to getting started with machine learning.&lt;br /&gt;
&lt;br /&gt;
=== Dependencies ===&lt;br /&gt;
&lt;br /&gt;
Here are some useful dependencies that I use:&lt;br /&gt;
&lt;br /&gt;
* [https://astral.sh/blog/uv uv]&lt;br /&gt;
** This is similar to Pip but written in Rust and is way faster&lt;br /&gt;
** It has nice management of virtual environments&lt;br /&gt;
** Can use Conda instead but it is much slower&lt;br /&gt;
* [https://github.com/features/copilot GitHub Copilot]&lt;br /&gt;
* [https://github.com/kscalelabs/mlfab mlfab]&lt;br /&gt;
** This is a Python package I made to help make it easy to quickly try out machine learning ideas in PyTorch&lt;br /&gt;
* Coding tools&lt;br /&gt;
** [https://mypy-lang.org/ mypy] static analysis&lt;br /&gt;
** [https://github.com/psf/black black] code formatter&lt;br /&gt;
** [https://docs.astral.sh/ruff/ ruff] alternative to flake8&lt;br /&gt;
&lt;br /&gt;
=== Installing Starter Project ===&lt;br /&gt;
&lt;br /&gt;
* Go to [https://github.com/kscalelabs/getting-started this project] and install it&lt;br /&gt;
&lt;br /&gt;
==== Opening the project in VSCode ====&lt;br /&gt;
&lt;br /&gt;
* Create a VSCode config file that looks something like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;json&amp;quot;&amp;gt;&lt;br /&gt;
{&lt;br /&gt;
  &amp;quot;folders&amp;quot;: [&lt;br /&gt;
    {&lt;br /&gt;
      &amp;quot;name&amp;quot;: &amp;quot;Getting Started&amp;quot;,&lt;br /&gt;
      &amp;quot;path&amp;quot;: &amp;quot;/home/ubuntu/Github/getting_started&amp;quot;&lt;br /&gt;
    },&lt;br /&gt;
    {&lt;br /&gt;
      &amp;quot;name&amp;quot;: &amp;quot;Workspaces&amp;quot;,&lt;br /&gt;
      &amp;quot;path&amp;quot;: &amp;quot;/home/ubuntu/.code-workspaces&amp;quot;&lt;br /&gt;
    }&lt;br /&gt;
  ],&lt;br /&gt;
  &amp;quot;settings&amp;quot;: {&lt;br /&gt;
    &amp;quot;cmake.configureSettings&amp;quot;: {&lt;br /&gt;
      &amp;quot;CMAKE_CUDA_COMPILER&amp;quot;: &amp;quot;/usr/bin/nvcc&amp;quot;,&lt;br /&gt;
      &amp;quot;CMAKE_PREFIX_PATH&amp;quot;: [&lt;br /&gt;
        &amp;quot;/home/ubuntu/.virtualenvs/getting-started/lib/python3.11/site-packages/torch/share/cmake&amp;quot;&lt;br /&gt;
      ],&lt;br /&gt;
      &amp;quot;PYTHON_EXECUTABLE&amp;quot;: &amp;quot;/home/ubuntu/.virtualenvs/getting-started/bin/python&amp;quot;,&lt;br /&gt;
      &amp;quot;TORCH_CUDA_ARCH_LIST&amp;quot;: &amp;quot;'8.0'&amp;quot;&lt;br /&gt;
    },&lt;br /&gt;
    &amp;quot;python.defaultInterpreterPath&amp;quot;: &amp;quot;/home/ubuntu/.virtualenvs/getting-started/bin/python&amp;quot;,&lt;br /&gt;
    &amp;quot;ruff.path&amp;quot;: [&lt;br /&gt;
      &amp;quot;/home/ubuntu/.virtualenvs/getting-started/bin/ruff&amp;quot;&lt;br /&gt;
    ]&lt;br /&gt;
  }&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* Install the [https://code.visualstudio.com/docs/remote/ssh VSCode SSH extension]&lt;br /&gt;
* SSH into the cluster (see [[K-Scale Cluster]] for instructions)&lt;br /&gt;
* Open the workspace that you created in VSCode&lt;br /&gt;
&lt;br /&gt;
=== Useful Brain Dump Stuff ===&lt;br /&gt;
&lt;br /&gt;
* Use &amp;lt;code&amp;gt;breakpoint()&amp;lt;/code&amp;gt; to debug code&lt;br /&gt;
* Check out the [https://github.com/kscalelabs/mlfab/tree/master/examples mlfab examples directory] for some ideas&lt;br /&gt;
* It is a good idea to try to write the full training loop yourself to figure out what's going on&lt;br /&gt;
* Run &amp;lt;code&amp;gt;nvidia-smi&amp;lt;/code&amp;gt; to see the GPUs and their statuses/any active processes&lt;/div&gt;</summary>
		<author><name>Dennisc</name></author>
		
	</entry>
	<entry>
		<id>http://54.204.126.50/index.php?title=User:Dennisc&amp;diff=992</id>
		<title>User:Dennisc</title>
		<link rel="alternate" type="text/html" href="http://54.204.126.50/index.php?title=User:Dennisc&amp;diff=992"/>
		<updated>2024-05-13T02:49:16Z</updated>

		<summary type="html">&lt;p&gt;Dennisc: Created page with &amp;quot;Dennis is a technical intern and a 2nd-year CS/Math student at CMU.  {{infobox person | name = Dennis Chen | organization = K-Scale Labs | title = Technical Intern | websi...&amp;quot;&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Dennis is a technical intern and a 2nd-year CS/Math student at CMU.&lt;br /&gt;
&lt;br /&gt;
{{infobox person&lt;br /&gt;
| name = Dennis Chen&lt;br /&gt;
| organization = [[K-Scale Labs]]&lt;br /&gt;
| title = Technical Intern&lt;br /&gt;
| website_link = https://andrew.cmu.edu/~dennisc2&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
[[Category: K-Scale Employees]]&lt;/div&gt;</summary>
		<author><name>Dennisc</name></author>
		
	</entry>
</feed>