
Optimization with SciPy and application ideas to machine learning

We show how to perform optimization with the most popular scientific analysis package in Python – SciPy and discuss ideas related to ML.

Optimization is often the final frontier that must be conquered to deliver real value in a large variety of business and technological processes. We show how to perform optimization with SciPy, the most popular scientific analysis package in Python, and discuss unique applications in the machine learning space.

Introduction

You may remember a simple calculus problem from your high-school days: finding the minimum amount of material needed to build a box given a restriction on its volume.

Simple enough?

It is useful to ponder this problem a bit and to recognize that the same principle applied here finds widespread use in complex, large-scale business and social problems.

Look at the problem above carefully. The constraint is a fixed volume. Think of that as a business deliverable (aka commitment to the customer).

But the goal of the problem is to find the minimum material needed (in terms of the surface area). Think of that as related to the profit margin of the producer: the less material needed, the lower the production cost for the same selling price, and hence the higher the profit margin.

Mathematical optimization is at the heart of solutions to major business problems in engineering, finance, healthcare, and socioeconomic affairs. Pretty much all business problems boil down to minimizing some kind of resource cost or maximizing some kind of profit given other constraints.

An optimization process is also the soul of operations research, which is intimately related to modern data-driven business analytics. In this manner, it is also closely related to the data science pipeline employed in virtually all businesses today.

Although much has been written about the data wrangling and predictive modeling aspects of a data science project, the final frontier often involves solving an optimization problem using data-driven models, which can improve the bottom line of the business by reducing cost or enhancing productivity.


Apart from the pure business-driven motivation, the subject of optimization is worth studying on its own merit, as it lies at the heart of all machine learning (ML) algorithms, from simple linear regression all the way up to deep neural networks. Understanding the various algorithms, their limitations, and the formulation of optimization problems can produce valuable insight for solving ML problems efficiently.


Therefore, it is imperative for a data scientist to learn basic tools and frameworks to solve optimization problems to make a real-life impact.

Python and SciPy for optimization

Python has become the de-facto lingua franca of analytics, data science, and machine learning. Therefore, it makes sense to discuss optimization packages and frameworks within the Python ecosystem.


In my previous posts, I have covered linear programming and other discrete optimization methodologies using Python, and introduced powerful packages such as PuLP and CVXPY.

In this post, I will cover optimization algorithms available within the SciPy ecosystem. SciPy is the most widely used Python package for scientific and mathematical analysis, and it is no wonder that it boasts powerful yet easy-to-use optimization routines for solving complex problems.

Relevant example code can be found in the author’s GitHub repository.

Start simple – univariate scalar optimization

We start with a simple scalar function (of one variable) minimization example. Suppose we want to minimize the following function, plotted between x = -10 and x = 10. Within the function domain, it has a global minimum and a local minimum.

The code defining the function is,

import numpy as np

def scalar1(x):
    return np.sin(x)*np.exp(-0.1*(x-0.6)**2)
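For reference, a minimal sketch to reproduce such a plot (assuming Matplotlib is installed) could be,

import matplotlib.pyplot as plt

x_vals = np.linspace(-10, 10, 500)   # dense grid over the plotted range
plt.plot(x_vals, scalar1(x_vals))    # scalar1 is vectorized because it uses NumPy functions
plt.xlabel('x')
plt.ylabel('scalar1(x)')
plt.show()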

The code to determine the global minimum is extremely simple with SciPy. We can use the minimize_scalar function in this case.

from scipy import optimize

result = optimize.minimize_scalar(scalar1)

That’s it. Believe it or not, the optimization is done! We can print out the resulting object to get more useful information.

print(result)
>> fun: -0.6743051024666711
   nfev: 15
   nit: 10
   success: True
   x: -1.2214484245210282

The value at which the minimum is reached is stored in the result['x'] variable.

print("Minimum occurs at: ",result['x']) >> Minimum occurs at: -1.2214484245210282

The rest of the quantities give information about the number of function evaluations, the number of iterations, the state of the solution (success or not), and the function value at the final solution.

What if the variables are bounded?

The code above accomplishes what is called unconstrained/unbounded optimization, i.e. no restriction of any kind was imposed on the problem. However, most practical optimization problems involve complex constraints. A simple example is a bound on the independent variable (x).

As we can see, this function is characterized by two minima, so the result would be different if we only considered positive values of x. The code to search within bounds is only slightly different from the above.

result = optimize.minimize_scalar(scalar1, bounds=(0, 10), method='Bounded')

So, we have to pass the bounds argument with a suitable tuple containing the minimum and maximum bounds, and use the method='Bounded' argument.

print("When bounded between 0 and 10, minimum occurs at: ",result['x'])
>> When bounded between 0 and 10, minimum occurs at: 4.101466164987216

Introducing other functional constraints

We could have had other, more complicated constraints in the problem. Suppose we want some additional conditions to be met along with the goal of finding the global minimum.

Note that one of them is an inequality and the other an equality constraint.

Putting constraints as functions inside a dictionary

SciPy allows handling arbitrary constraints through the more generalized method optimize.minimize. The constraints have to be written as Python dictionaries following a particular syntax, with each inequality constraint expressed in the form f(x) >= 0 (SciPy treats 'ineq'-type constraint functions as required to be non-negative). The following code demonstrates the idea.
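As a minimal sketch of the syntax (the constraint functions below are placeholders chosen purely for illustration, not the ones used to generate the numbers later in this section),

# Placeholder constraints for illustration only:
# 'ineq' means the function value must be non-negative, 'eq' means it must be zero
cons = (
    {'type': 'ineq', 'fun': lambda x: 10 - x**2},          # example inequality: x**2 <= 10
    {'type': 'eq',   'fun': lambda x: np.sin(x) + 0.5*x},  # example equality constraint
)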

Choosing a suitable method

After that, we can run the optimization by choosing a suitable method which supports constraints (not all methods in the minimize function support constraints and bounds). Here we chose the SLSQP method, which stands for Sequential Least SQuares Programming.

Initial guess and the first trial run

Furthermore, to use minimize we need to pass an initial guess via the x0 argument. Suppose we pass x0=0 for a trial run.

result = optimize.minimize(scalar1, x0=0, method='SLSQP', constraints=cons, options={'maxiter': 1000})

Failure!

If we print the result, we see something different from the simple unconstrained optimization result.

fun: 0.7631695862891654
jac: array([0.59193639])
message: 'Iteration limit exceeded'
nfev: 1254
nit: 101
njev: 101
status: 9
success: False
x: array([0.8773752])

The optimization result success: False indicates that it did not succeed in reaching the global minimum. The message is 'Iteration limit exceeded', i.e. it tried 101 iterations but could not reach the minimum.

But, why?

The answer lies in the deep theory of mathematical optimization (and the associated algorithms), but it suffices to say that the initial guess played a big role. In general, a non-convex optimization problem has no mathematical guarantee of being solved successfully, and the nature of our problem here is non-convex. To learn more about the convexity of an optimization problem, see this video.

How can we improve the optimization (search)?

In general, we cannot do much. However, in this toy example we already have the plot of the function and can eyeball the optimum solution. Therefore, we can just give the algorithm a better initial guess. We give x0=-2.

result = optimize.minimize(scalar1, x0=-2, method='SLSQP', constraints=cons, options={'maxiter': 100})

Now the result is favorable!

fun: -0.2859494456768613
jac: array([-0.4675066])
message: 'Optimization terminated successfully.'
nfev: 811
nit: 71
njev: 67
status: 0
success: True
x: array([-2.37569791])

What about the number of iterations?

What if we restrict the number of iterations performed by the algorithm? For demonstration purposes only, we severely limit the number of iterations to 3.

result = optimize.minimize(scalar1, x0=-20, method='SLSQP', constraints=cons, options={'maxiter': 3})

The result is, as expected, not favorable.

fun: -0.4155114388552631
jac: array([-0.46860977])
message: 'Iteration limit exceeded'
nfev: 12
nit: 4
njev: 4
status: 9
success: False
x: array([-2.10190632])

Note that the optimization came close to the global minimum but did not quite reach it, simply because it was not allowed to iterate a sufficient number of times.

Why is this important to consider?

That is because each iteration equates to a computational (and sometimes not computational but actual physical) cost.

This is a business aspect of the optimization process. In real life, we may not be able to run the optimization for a long period of time if the individual function evaluation costs significant resources.

This kind of scenario arises when the optimization involves not a simple mathematical evaluation but a complex, time-consuming simulation, or costly and labor-intensive experimentation.

When each evaluation costs money or resources, then not only the choice of the algorithm but also the finer details become important to consider.

Going more complex – multi-variate function

Although we considered all essential aspects of solving a standard optimization problem in the preceding sections, the example consisted of a simple single-variable, analytical function.

But it does not have to be the case!

SciPy methods work with any Python function – not necessarily a closed-form, single-dimensional mathematical function.

Let us show an example with a multivariate function.

Maximization of a Gaussian mixture

Often in a chemical or manufacturing process, multiple stochastic sub-processes are combined to give rise to a Gaussian mixture. It may be desirable to maximize the final resultant process output by choosing the optimum operating points in the individual sub-processes (within certain process limits).

The trick is to use a vector as the input to the objective function and to make sure the objective function still returns a single scalar value. Also, because the optimization problem here is about maximization of the objective function, we need to change the sign and return the negative of the sum of the Gaussian functions as the result of the objective function.
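The objective-function code could be sketched minimally as follows. The Gaussian centers are chosen at -1, 0.3, and 2.1 to be consistent with the result quoted below; the widths and the initial guess are illustrative assumptions.

import numpy as np
from scipy import optimize

# Illustrative Gaussian mixture: one Gaussian per sub-process setting
centers = np.array([-1.0, 0.3, 2.1])
widths = np.array([0.5, 0.8, 0.4])     # assumed widths, not taken from the original example

def neg_gaussian_mixture(x):
    # x is a vector of operating points, one per sub-process;
    # return the NEGATIVE sum so that minimize() effectively maximizes the process output
    return -np.sum(np.exp(-((x - centers)**2) / (2 * widths**2)))

x0 = np.array([-0.5, 0.0, 1.5])        # as before, a reasonable initial guess matters
result = optimize.minimize(neg_gaussian_mixture, x0=x0, method='SLSQP')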

The same result['x'] stores the optimum settings of the individual processes as a vector. The only difference between optimizing a single-variable function and a multivariate function is that we get back a vector instead of a scalar.

x: array([-1.00017852, 0.29992313, 2.10102748])

Bounded inputs

Needless to say, we can change the bounds here to reflect practical constraints. For example, if the sub-process settings can only occupy a certain range of values (some must be positive, some must be negative, etc.), then the solution will be slightly different, and it may not be the global optimum.
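Continuing the sketch above, a bounds specification that, for example, forces the third setting to be non-positive might look like,

# One (min, max) pair per sub-process setting; the values here are illustrative only
bnds = [(-2.0, 2.0), (-2.0, 2.0), (-2.0, 0.0)]

result = optimize.minimize(neg_gaussian_mixture, x0=x0, method='SLSQP', bounds=bnds)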

Here, the solution is as follows. It dictates pushing the third sub-process setting to the maximum allowed value (zero) while adjusting the other two suitably.

x: array([-1.00000644e+00, 3.00115191e-01, -8.03574200e-17])

The constraints for multi-variate optimization are handled in a similar way as shown for the single-variable case.

More detailed documentation and examples

SLSQP is not the only algorithm in the SciPy ecosystem capable of handling complex optimization tasks. For more detailed documentation and usage examples, see the official SciPy optimization documentation and tutorial.

Linear programming with SciPy

Simple, straightforward linear programming (LP) problems can also be addressed by SciPy. Prior to 2014, it did not have a built-in LP solver, but that has changed since then.

Let’s take a practical factory production problem (borrowed from this example and slightly changed).

A factory produces four different products; the daily produced amount of the first product is x1, the amount produced of the second product is x2, and so on. The goal is to determine the profit-maximizing daily production amount for each product, subject to the following constraints,

  • The profit per unit of product is 20, 12, 30, and 15 for the first, second, third, and fourth product, respectively.
  • Due to manpower constraints, the total number of units produced per day can’t exceed fifty (50).
  • For each unit of the first product, three units of the raw material A are consumed. Each unit of the second product requires two units of raw material A and one unit of the raw material B. Each unit of the third product needs two units of A and five units of B. Finally, each unit of the fourth product requires three units of B.
  • Due to the transportation and storage constraints, the factory can consume up to one hundred units of raw material A and ninety units of B per day.

Setting up this problem is easy in SciPy.
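A minimal sketch of the setup (the variable names are mine) could look like this. Because linprog minimizes, the per-unit profits are negated, and the three "at most" constraints go into the A_ub/b_ub arrays,

from scipy.optimize import linprog

obj = [-20, -12, -30, -15]        # negate profits: maximizing profit = minimizing its negative

lhs_ineq = [[1, 1, 1, 1],         # x1 + x2 + x3 + x4 <= 50   (manpower)
            [3, 2, 2, 0],         # 3*x1 + 2*x2 + 2*x3 <= 100 (raw material A)
            [0, 1, 5, 3]]         # x2 + 5*x3 + 3*x4 <= 90    (raw material B)
rhs_ineq = [50, 100, 90]

bnd = [(0, None)] * 4             # production amounts cannot be negative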

And then solve it with one line of code,
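Something like the following (the default solver, and hence the tiny residual value reported for x2, depends on the SciPy version),

opt = linprog(c=obj, A_ub=lhs_ineq, b_ub=rhs_ineq, bounds=bnd)

print(opt.x)      # optimum daily production amounts
print(-opt.fun)   # maximum daily profit (sign flipped back)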

So, the solution says that,

  • The factory should produce 26.66 units of x1, 10 units of x3, and 13.33 units of x4 every day. The extremely small number corresponding to x2 essentially indicates that no amount of x2 should be produced.
  • The maximum profit obtainable is $1033.33 under this arrangement.

A noteworthy point is that the solution indicates fractional amounts, which may not be feasible in a practical situation. This is a limitation of the SciPy solver: it cannot solve so-called integer programming problems. Other Python packages like PuLP could be an option for such problems. See my article here.


Extending the process to the machine learning domain

To be honest, there is no limit to the level of complexity to which you can push this approach, as long as you can define a proper objective function that generates a scalar value, along with suitable bounds and constraints matching the actual problem scenario.

Error minimization in machine learning

The crux of almost all machine learning (ML) algorithms is to define a suitable error function (or loss metric), iterate over the data, and find the optimum settings of the parameters of the ML model that minimize the total error. Often, the error is a measure of some kind of distance between the model prediction and the ground truth (the given data).

Therefore, it is perfectly possible to use SciPy optimization routines to solve an ML problem.

This gives you deep insight into the actual workings of the algorithm, as you have to construct the loss metric yourself rather than depend on some ready-made, out-of-the-box function.
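As a minimal illustration (not taken from the original article), here is how a simple linear regression could be fitted by hand-rolling a squared-error loss and handing it to minimize,

import numpy as np
from scipy import optimize

# Synthetic data: y = 2.5*x - 1.0 plus noise (illustrative only)
rng = np.random.default_rng(0)
x_data = np.linspace(0, 10, 50)
y_data = 2.5 * x_data - 1.0 + rng.normal(scale=1.0, size=x_data.size)

def squared_error(params):
    # params = [slope, intercept]; the loss is the total squared distance
    # between the model prediction and the observed data
    slope, intercept = params
    prediction = slope * x_data + intercept
    return np.sum((prediction - y_data)**2)

fit = optimize.minimize(squared_error, x0=[0.0, 0.0])
print(fit.x)      # recovered slope and intercept, close to (2.5, -1.0)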

Hyperparameter optimization in ML

Tuning the parameters and hyperparameters of ML models is often a cumbersome and error-prone task. Although there are grid-search methods available for searching for the best parametric combination, some degree of automation can easily be introduced by running an optimization loop over the parameter space. The objective function in this case has to be some metric of the quality of the ML model's predictions (for example, mean-square error, a complexity measure, or an F1 score).
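For example, assuming scikit-learn is available, a bounded scalar search over the regularization strength of a ridge regression, with cross-validated mean-squared error as the quality metric, could be sketched as,

import numpy as np
from scipy import optimize
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

# An illustrative data set
X, y = make_regression(n_samples=200, n_features=10, noise=10.0, random_state=0)

def cv_error(log_alpha):
    # Objective: cross-validated mean-squared error for a given regularization strength.
    # Searching over log10(alpha) keeps the parameter positive and well-scaled.
    model = Ridge(alpha=10.0**log_alpha)
    scores = cross_val_score(model, X, y, cv=5, scoring='neg_mean_squared_error')
    return -scores.mean()       # flip the sign: cross_val_score returns negative MSE

best = optimize.minimize_scalar(cv_error, bounds=(-4, 4), method='Bounded')
print("Best alpha:", 10.0**best.x)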

Using machine learning as the function evaluator

In many situations, you cannot have a nice, closed-form analytical function to use as the objective of an optimization problem.

But who cares about being nice when we have deep learning?

Imagine the power of an optimization model which is fed (for its objective function as well as for the constraints) by a multitude of models – different in fundamental nature but standardized with respect to the output format so that they can act in unison.

You are free to choose an analytical function, a deep learning network (perhaps as a regression model), or even a complicated simulation model, and throw them all together into the pit of optimization.

The possibilities are endless!


If you have any questions or ideas to share, please contact the author at tirthajyoti[AT]gmail.com. Also, you can check the author’s GitHub repositories for other fun code snippets in Python, R, or MATLAB and machine learning resources. If you are, like me, passionate about machine learning/data science, please feel free to add me on LinkedIn or follow me on Twitter.
