Commit 61c76cb (parent 836f75e): Update README.md

README.md: 62 additions, 29 deletions
# Optimization-Algorithm
For learning, visualizing, and understanding optimization techniques and algorithms.

---

Visualizations and in-depth explanations of machine learning optimization algorithms are presented here, using different functions as examples and comparing the methods to understand their differences.
## Line search and direction search algorithms in Gradient Descent discussed here:
The methods covered (a minimal line-search sketch in code follows the list):
- Exact methods:
  - Using derivatives (for differentiable functions only):
    - Newton's method
    - Secant method
  - Without using derivatives (require function evaluations only):
    - Golden section
    - Fibonacci
    - Bisection
- Inexact methods (the step length found is not exactly optimal; these are parameter-specific):
  - Armijo
  - Wolfe-Powell
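
To give a concrete feel for one of these, here is a minimal sketch of the golden section search, an exact line search that uses only function evaluations; the example objective and tolerance are illustrative assumptions, not taken from the notebooks.

```python
import math

def golden_section(f, a, b, tol=1e-6):
    """Minimize a unimodal function f on [a, b] using only function evaluations."""
    inv_phi = (math.sqrt(5) - 1) / 2              # 1/phi ~ 0.618
    c, d = b - inv_phi * (b - a), a + inv_phi * (b - a)
    while b - a > tol:
        if f(c) < f(d):                           # the minimum lies in [a, d]
            b, d = d, c
            c = b - inv_phi * (b - a)
        else:                                     # the minimum lies in [c, b]
            a, c = c, d
            d = a + inv_phi * (b - a)
    return (a + b) / 2

# Example: optimal step length along a direction where phi(t) = (t - 0.3)^2
print(golden_section(lambda t: (t - 0.3) ** 2, 0.0, 1.0))   # ~0.3
```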

Here, all the optimization techniques are explained through code. You can find the code inside the .ipynb files along with some explanation. The code is easy to follow once the theory is clear to you. (In the future I will add some simple explanations and theory here.)
First go through "complete_OPTAL.ipynb", then go through "GD_SGD_MNGD_momentum.ipynb" together with the pdf. If you want to visualize the plots in another way, check out "collection_of_plots.ipynb".

### Examples of some bivariate functions :+1:
Here we go. First, some visualizations of functions will surely make you curious to learn more about optimization, so look at the function below and think about how you would find its minimum starting from an arbitrary point:

$\frac{\sin(10(x^2+y^2))}{10}$

![Image of function](Images/cool.png)
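
As a rough sketch of how such a surface can be plotted with matplotlib (the grid range and resolution are illustrative assumptions, not taken from the notebooks):

```python
import numpy as np
import matplotlib.pyplot as plt

# Sample f(x, y) = sin(10(x^2 + y^2)) / 10 on a grid
x = np.linspace(-1.0, 1.0, 300)
y = np.linspace(-1.0, 1.0, 300)
X, Y = np.meshgrid(x, y)
Z = np.sin(10 * (X ** 2 + Y ** 2)) / 10

fig = plt.figure(figsize=(7, 5))
ax = fig.add_subplot(projection="3d")
ax.plot_surface(X, Y, Z, cmap="viridis")
ax.set_xlabel("x"); ax.set_ylabel("y"); ax.set_zlabel("f(x, y)")
plt.show()
```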

For understanding, we will work with simple univariate and bivariate functions. One convex and one non-convex function are shown below.

![Image of function](Images/convex_function.png) ![Image of function](Images/non_convex.png)

You can visualize some nice simple functions and more complex ones, such as the Rosenbrock function, in the ipynb.

### GD on a univariate function:

Next, I will show some plots that should spark your interest in Gradient Descent.
First, see how zigzagging happens on the way to the minimum starting from an arbitrary point: the left panel shows the parameter value over the iterations, and the right panel shows the function values.

![Image of function](Images/download8.png)
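
For reference, a plain gradient descent loop on a univariate function looks roughly like this; the quadratic objective, learning rate, and iteration count are illustrative assumptions:

```python
def grad_descent_1d(grad, x0, eta=0.1, iters=50):
    """Plain gradient descent with a fixed step size eta."""
    x, history = x0, [x0]
    for _ in range(iters):
        x = x - eta * grad(x)       # move against the gradient
        history.append(x)
    return x, history

# Example: minimize f(x) = (x - 2)^2, whose gradient is 2(x - 2)
x_min, path = grad_descent_1d(lambda x: 2 * (x - 2), x0=-3.0)
print(x_min)   # approaches 2
```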

Here you can see how the step size gradually decreases near the optimum, and the approximation of the step length chosen by the Armijo rule is shown over the iterations.

![Image of function](Images/download.png)
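
A minimal sketch of the Armijo backtracking rule for picking the step length; the constants c and beta below are common textbook choices, not necessarily the ones used in the notebooks:

```python
def armijo_step(f, grad_x, x, direction, eta0=1.0, c=1e-4, beta=0.5):
    """Shrink the step until the Armijo sufficient-decrease condition holds."""
    eta = eta0
    fx = f(x)
    # For a descent direction d, require f(x + eta*d) <= f(x) + c * eta * grad(x) * d
    while f(x + eta * direction) > fx + c * eta * grad_x * direction:
        eta *= beta                 # backtrack: shrink the trial step
    return eta

# Example on f(x) = x^2 at x = 3, searching along d = -grad(x) = -6
f = lambda x: x ** 2
print(armijo_step(f, grad_x=6.0, x=3.0, direction=-6.0))
```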

To shorten this long zigzag path we need to adjust the parameters or apply other methods. As you can see below, after doing so the zigzagging is reduced. Again, ignore the other terms for now and just follow the curves; later you will understand why and how this happens.

![Image of function](Images/download12.png)

### GD on a bivariate function:

Now the same thing for a bivariate function is shown in a contour plot; later you can visualize it in a 3D plot as well:

![Image of function](Images/download3.png)
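
A minimal sketch of gradient descent on a bivariate quadratic with its path drawn on a contour plot; the objective, starting point, and step size are illustrative assumptions:

```python
import numpy as np
import matplotlib.pyplot as plt

f = lambda p: p[0] ** 2 + 4 * p[1] ** 2           # elongated bowl
grad = lambda p: np.array([2 * p[0], 8 * p[1]])

p = np.array([4.0, 2.0])
path = [p.copy()]
for _ in range(40):
    p = p - 0.1 * grad(p)                         # fixed step size
    path.append(p.copy())
path = np.array(path)

# Contour plot with the descent path overlaid
x, y = np.meshgrid(np.linspace(-5, 5, 200), np.linspace(-3, 3, 200))
plt.contour(x, y, x ** 2 + 4 * y ** 2, levels=25)
plt.plot(path[:, 0], path[:, 1], "o-", color="red", markersize=3)
plt.xlabel("$x_1$"); plt.ylabel("$x_2$")
plt.show()
```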

### SGD on a bivariate function (perturbed GD):

Now we come to SGD (stochastic gradient descent).
Since we are not dealing with a dataset directly, we simulate one by adding random noise drawn from a normal distribution to the gradient, and then observe how the difficulty increases as this noise is added; this makes the problem look like a **Stochastic Gradient Descent** problem. Later we will reduce this noise by averaging several random noise samples to simulate **Mini-Batch Gradient Descent**. (So you have to be comfortable with random numbers in Python; the plot below gives an idea of how you can play with them to see the differences.) Pay attention to the sampling, i.e. the sample size and its distribution: when the sample variance is large this simulates SGD, and when the variance is low it behaves like Mini-Batch GD.

![Image of function](Images/random.png)
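
A minimal sketch of this noise simulation: perturb the true gradient with Gaussian noise for an SGD-like update, and average several noise samples to mimic mini-batch GD; the noise scale, step size, and batch size are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
grad = lambda p: np.array([2 * p[0], 8 * p[1]])    # true gradient of x1^2 + 4*x2^2

def noisy_grad(p, sigma=2.0, batch=1):
    """Average `batch` noisy gradient samples; batch=1 ~ SGD, larger ~ mini-batch."""
    samples = grad(p) + rng.normal(0.0, sigma, size=(batch, 2))
    return samples.mean(axis=0)

p_sgd = p_mb = np.array([4.0, 2.0])
for t in range(200):
    p_sgd = p_sgd - 0.05 * noisy_grad(p_sgd, batch=1)    # high-variance updates
    p_mb = p_mb - 0.05 * noisy_grad(p_mb, batch=32)      # averaged, lower variance

print("SGD end point:       ", p_sgd)
print("Mini-batch end point:", p_mb)
```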

## Dynamic step size:
Having added noise to perform SGD, it now becomes hard to reach the minimum within an affordable number of iterations. So we have to control the step size wisely (in the plots below the step size is decreased polynomially; for other methods see the first part of "GD_SGD_MBGD_momentum.ipynb"). The figures below show, over the iterations, the change of the parameters, the function value, the function value near the minimum, the gradient norm, and the step size. You can also visualize how the minimum is reached in a contour plot as well as in a 3D plot.

![Image of function](Images/pr_gd.png) ![Image of function](Images/pr_gd_3d.png)
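
A minimal sketch of SGD with a polynomially decaying step size of the form $\eta_t = \eta_0 / (1 + k t)^p$; the constants are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
grad = lambda p: np.array([2 * p[0], 8 * p[1]])

p = np.array([4.0, 2.0])
eta0, k, power = 0.2, 0.1, 1.0
for t in range(500):
    eta_t = eta0 / (1.0 + k * t) ** power            # polynomially decaying step size
    noisy = grad(p) + rng.normal(0.0, 2.0, size=2)   # simulated stochastic gradient
    p = p - eta_t * noisy

print(p)   # should end near the minimum (0, 0) despite the noise
```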

## Comparison of different methods of dynamic step size:
Here a polynomially decreasing step size is used, but you can also use an exponential function to control eta, keep it constant, or decrease it step-wise. A plot comparing these methods is shown below; they behave differently on different functions, so be careful. In most cases, though, a polynomially decreasing $\eta$ gives better control.

![Image of function](Images/compare.png)
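
For reference, minimal sketches of the four schedules mentioned above (constant, step-wise, exponential, polynomial); all constants are illustrative assumptions:

```python
import math

def constant(t, eta0=0.1):
    return eta0

def step_wise(t, eta0=0.1, drop=0.5, every=100):
    return eta0 * drop ** (t // every)            # halve eta every 100 iterations

def exponential(t, eta0=0.1, decay=0.01):
    return eta0 * math.exp(-decay * t)

def polynomial(t, eta0=0.1, k=0.1, p=1.0):
    return eta0 / (1.0 + k * t) ** p

for t in (0, 100, 500):
    print(t, constant(t), step_wise(t), exponential(t), polynomial(t))
```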

## For the non-convex surface:
Finally, we will see how tough this is for a non-convex surface (the function used here, shown above, is $\large f(x_1, x_2) = x_1^2 - 2 x_2^2$):

![Image of function](Images/pr_gd_cncv.png)
![Image of function](Images/pr_gd_cncv_3d.png)

## Momentum update in GD:
And finally we will use the momentum update to handle these difficulties. Momentum keeps the point moving in a direction that is the resultant of its accumulated momentum and the current gradient, which helps it avoid getting stuck in local minima or saddle points. Here, to make the effect visible, I purposely kept the momentum parameter high, so you can see that even though the point knows in which direction the minimum lies, it still takes its momentum into account. As a result it takes a long detour, but reducing the momentum-controlling parameter helps.

![Image of function](Images/momentum.png)
![Image of function](Images/momentum_3d.png)
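
A minimal sketch of the momentum (heavy-ball) update, where beta is the momentum-controlling parameter discussed above; the objective, eta, and beta values are illustrative assumptions:

```python
import numpy as np

grad = lambda p: np.array([2 * p[0], 8 * p[1]])   # gradient of x1^2 + 4*x2^2

p = np.array([4.0, 2.0])
v = np.zeros(2)                                   # velocity (accumulated momentum)
eta, beta = 0.05, 0.9                             # step size and momentum parameter
for _ in range(100):
    v = beta * v + grad(p)                        # blend past direction with current gradient
    p = p - eta * v

print(p)   # should end near the minimum (0, 0)
```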

That's it. Go to the ipynb files now.

**Also, I think it would be great if anyone wants to help me by making the ipynb files more understandable by separating the topics.
If you find anything hard, contact me at mahendranandi.0608@gmail.com**