Self-Study Plan for Becoming a Quantitative Trader - Part II
In the previous article on studying to become a quant trader we touched on the
importance of statistical and machine learning techniques. Many of you contacted me
regarding the "state of the art" of such machine learning methods, and how they are
applied in the quant finance world. In this article I want to outline the resources
necessary to learn machine learning techniques so that you'll be better prepared for a
role as a quant trader.
Statistical learning is extremely important in quant trading research. We can bring to
bear the entire weight of the scientific method and hypothesis testing in order to
rigorously assess the quant trading research process. For quantitative trading we are
interested in testable, repeatable results that are subject to constant scrutiny. This allows
easy replacement of trading strategies as and when performance degrades. Note that
this is in stark contrast to the approach taken in "discretionary" trading where
performance and risk are not often assessed in this manner.
Why Should We Use The Scientific Method In Quantitative Trading?
The statistical approach to quant trading is designed to eliminate issues that surround
discretionary methods. A great deal of discretionary technical trading is rife with
cognitive biases, including loss aversion, confirmation bias and the bandwagon effect.
Quant trading research uses alternative mathematical methods to mitigate such
behaviours and thus enhance trading performance.
In order to carry out such a methodical process, quant trading researchers maintain a
continually skeptical mindset, and any strategy ideas or hypotheses about market
behaviour are subject to ongoing scrutiny. A strategy idea will only be put into a
"production" environment after extensive statistical analysis, testing and refinement. This
is necessary because the market has a rather low signal-to-noise ratio. This creates
difficulties in forecasting and thus leads to a challenging trading environment.
What Modelling Problems Do We Encounter In Quantitative Finance?
The goal of quantitative trading research is to produce algorithms and technology that
can satisfy a certain investment mandate. In practice this translates into creating trading
strategies (and related infrastructure) that produce consistent returns above a certain
pre-determined benchmark, net of transaction costs, while minimising "risk". Hence there
are a few levers that can be pulled to meet these financial objectives.
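To make this concrete, here is a minimal sketch (with invented numbers and an assumed flat daily cost, purely for illustration) of computing a strategy's return net of transaction costs, its excess return over a benchmark and a simple risk-adjusted measure:

import numpy as np

# Hypothetical daily simple returns for a strategy and its benchmark (invented data)
strategy_returns = np.array([0.004, -0.002, 0.003, 0.001, -0.001])
benchmark_returns = np.array([0.002, -0.001, 0.002, 0.000, 0.001])

# Assume a flat two basis points of transaction costs per trading day (an assumption)
daily_cost = 0.0002
net_returns = strategy_returns - daily_cost

# Excess return of the strategy over the benchmark, net of costs
excess_returns = net_returns - benchmark_returns

# Annualised mean excess return and volatility (252 trading days assumed)
ann_excess = excess_returns.mean() * 252
ann_vol = excess_returns.std(ddof=1) * np.sqrt(252)
print("Annualised excess return: %.2f%%" % (100 * ann_excess))
print("Annualised volatility of excess return: %.2f%%" % (100 * ann_vol))
print("Information ratio: %.2f" % (ann_excess / ann_vol))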
A great deal of attention is often given to the signal/alpha generator, i.e. "the strategy".
The best funds and retail quants will spend a significant amount of time
modelling/reducing transaction costs, effectively managing risk and determining the
optimal portfolio. This article is primarily aimed at the alpha generator component of the
stack, but please be aware that the other components are of equal importance if
successful long-term strategies are to be carried out.
We will now investigate the problems encountered in signal generation and the methods
used to solve them. The following is a basic list of such methods (which clearly overlap):
Forecasting/Prediction - The most common technique is direct forecasting of a
financial asset's price or direction based on prior prices (or fundamental factors). This
usually involves detection of an underlying signal in the "noise" of the market that
can be predicted and thus traded upon. It might also involve regressing against
other factors (including lags of the original time series) in order to estimate the future
response from the current values of the predictors.
Clustering/Classification - Clustering or classification techniques are methods
designed to group data into certain classes. These can be binary in nature, e.g.
"up" or "down", or multiply-grouped, e.g. "weak volatility", "medium volatility",
"strong volatility" (a minimal clustering sketch follows this list).
Sentiment Analysis - More recent innovations in natural language processing and
computational speed have led to sophisticated "sentiment analysis" techniques,
which are essentially classification methods designed to group data based on
some underlying sentiment factors. These could be directional in nature, e.g.
"bullish", "bearish", "neutral", or emotional, such as "happy", "sad", "positive" or
"negative". Ultimately this will lead to a trading signal of some form.
Big Data - Alternative sources of data, such as consumer social media activity,
often lead to terabytes (or more) of data that require novel software and hardware
in order to interpret. New algorithm implementations have been created in order to
handle such "big data".
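As a rough illustration of the clustering idea above, here is a minimal sketch that groups days of a synthetic return series into volatility regimes with k-means. The data, window length and number of clusters are all invented for demonstration and do not constitute a trading signal:

import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(42)

# Synthetic daily returns with calm, turbulent and intermediate periods (invented data)
returns = np.concatenate([
    rng.normal(0.0, 0.005, 250),   # calm regime
    rng.normal(0.0, 0.02, 250),    # turbulent regime
    rng.normal(0.0, 0.01, 250),    # intermediate regime
])

# 21-day rolling standard deviation as a crude volatility estimate
window = 21
rolling_vol = np.array([returns[i - window:i].std() for i in range(window, len(returns))])

# Group days into three volatility regimes ("weak", "medium", "strong")
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(rolling_vol.reshape(-1, 1))
for k in range(3):
    print("Cluster %d: mean vol %.4f over %d days" % (k, rolling_vol[labels == k].mean(), (labels == k).sum()))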
Modelling Methodology
There are countless textbooks on statistical modelling, probability and machine learning.
It is actually quite challenging to know where to begin. I myself have had to go through
this process when transitioning from a physical modelling mindset (during my own PhD)
towards a statistical approach while in industry. I described the two books I consider the
"best" to get started in this field in the previous article, but to recap they are:
An Introduction to Statistical Learning by Gareth James et al
The Elements of Statistical Learning by Trevor Hastie et al
The first book doesn't require a great deal of mathematical sophistication. The
necessary background includes typical college linear algebra, calculus and probability
theory. The second book is more advanced and goes deeper into the theory. For that
you should have a good grounding in probability theory and prior exposure to statistical
methods and modelling.
These books will teach you about the following topics. By studying the books (and
carrying out the associated "labs" in R) you will gain a solid insight into when certain
algorithms are applicable.
Statistical Modelling and Limitations - The books will outline what statistical
learning is and isn't capable of along with the tradeoffs that are necessary when
carrying out such research. The difference between prediction and inference is
outlined as well as the difference between supervised and unsupervised learning.
The bias-variance tradeoff is also explained in detail.
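As a brief aside (my own summary, using standard notation rather than anything specific to the books), the tradeoff is usually expressed via the decomposition of expected squared prediction error at a point x: E[(y − f̂(x))²] = Bias[f̂(x)]² + Var[f̂(x)] + σ², where σ² is the irreducible noise. More flexible models tend to shrink the bias term while inflating the variance term, and vice versa.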
Linear Regression - Linear regression (LR) is one of the simplest supervised
learning techniques. It assumes a model where the predicted values are a linear
function of the predictor variable(s). While this may seem simplistic compared to
the remaining methods in this list, linear regression is still widely utilised in the
financial industry. Being aware of LR is important in order to grasp the later
methods, some of which are generalisations of LR.
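As a minimal sketch of the forecasting use of LR (the autoregressive coefficients and noise level below are invented purely for illustration), one might regress the next period's return on its own lags using scikit-learn:

import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Synthetic return series with a weak dependence on its own two lags (invented coefficients)
n = 1000
returns = np.zeros(n)
for t in range(2, n):
    returns[t] = 0.05 * returns[t - 1] - 0.03 * returns[t - 2] + rng.normal(0.0, 0.01)

# Predictors: the two previous returns; response: the current return
X = np.column_stack([returns[1:-1], returns[:-2]])
y = returns[2:]

model = LinearRegression().fit(X, y)
print("Estimated lag coefficients:", model.coef_)
print("In-sample R^2: %.4f" % model.score(X, y))

Note that the in-sample fit above says nothing about out-of-sample predictive power, which is where the resampling techniques discussed below come in.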
Supervised Classification: Logistic Regression, LDA, QDA, KNN - Supervised
classification techniques such as Logistic Regression, Linear/Quadratic
Discriminant Analysis and K-Nearest Neighbours are techniques for modelling
qualitative classification situations, such as prediction of whether a stock index will
move up or down (i.e. a binary value) in the next time period.
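A hedged sketch of how such classifiers might be compared, using an invented feature matrix and up/down labels rather than real market data:

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis, QuadraticDiscriminantAnalysis
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)

# Invented feature matrix (e.g. lagged returns) and up/down labels with a weak signal
n = 1000
X = rng.normal(0.0, 0.01, size=(n, 2))
y = (X[:, 0] - 0.5 * X[:, 1] + rng.normal(0.0, 0.01, n) > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

for name, clf in [
    ("Logistic Regression", LogisticRegression()),
    ("LDA", LinearDiscriminantAnalysis()),
    ("QDA", QuadraticDiscriminantAnalysis()),
    ("KNN (k=10)", KNeighborsClassifier(n_neighbors=10)),
]:
    clf.fit(X_train, y_train)
    print("%s test accuracy: %.3f" % (name, clf.score(X_test, y_test)))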
Resampling Techniques: Bootstrapping, Cross-Validation - Resampling
techniques are necessary in quantitative finance (and statistics in general)
because of the dangers of model-fitting. Such techniques are used to ascertain
how a model behaves over different training sets and how to minimise the problem
of "overfitting" models.
Decision Tree Methods: Bagging, Random Forests - Decision trees are a type
of graph that are often employed in classification settings. Bagging and Random
Forest techniques are ensemble methods making use of such trees to reduce
overfitting and reduce variance in individually fitted supervised learning methods.
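A minimal sketch of a random forest fitted to invented features, with the feature importances inspected afterwards:

import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(3)

# Invented features: only the first two carry any signal, the rest are pure noise
n = 1000
X = rng.normal(size=(n, 5))
y = (X[:, 0] - X[:, 1] + rng.normal(0.0, 0.5, n) > 0).astype(int)

# Averaging over many decorrelated trees reduces the variance of a single deep tree
forest = RandomForestClassifier(n_estimators=200, random_state=0)
forest.fit(X, y)

for i, importance in enumerate(forest.feature_importances_):
    print("Feature %d importance: %.3f" % (i, importance))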
Neural Networks - Artificial Neural Networks (ANN) are a machine learning
technique often employed in a supervised manner to find non-linear relationships
between predictors and responses. In the financial domain they are often used for
time series prediction and forecasting.
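As a toy illustration (a synthetic non-linear function rather than a real forecasting problem), a small multi-layer perceptron can be fitted as follows; the architecture and training settings are chosen arbitrarily:

import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(4)

# Invented non-linear relationship between a single predictor and the response
X = rng.uniform(-3.0, 3.0, size=(500, 1))
y = np.sin(X[:, 0]) + rng.normal(0.0, 0.1, 500)

# One hidden layer of 32 units captures the non-linearity a linear model would miss
net = MLPRegressor(hidden_layer_sizes=(32,), max_iter=5000, random_state=0)
net.fit(X, y)
print("In-sample R^2: %.3f" % net.score(X, y))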
Support Vector Machines - SVMs are also classification or regression tools, which
work by constructing a hyperplane in high or infinite dimensional spaces. The
kernel trick allows non-linear classification to occur by a mapping of the original
space into an inner-product space.
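A brief sketch of a kernelised SVM separating two classes that are not linearly separable; the concentric "ring" data below are invented purely to show the kernel trick at work:

import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(5)

# Invented data: class 0 near the origin, class 1 on a surrounding ring
n = 400
radius = np.concatenate([rng.uniform(0.0, 1.0, n // 2), rng.uniform(2.0, 3.0, n // 2)])
angle = rng.uniform(0.0, 2.0 * np.pi, n)
X = np.column_stack([radius * np.cos(angle), radius * np.sin(angle)])
y = (radius > 1.5).astype(int)

# The RBF kernel implicitly maps the points into a space where a separating
# hyperplane exists, which is the "kernel trick" referred to above
svm = SVC(kernel="rbf", C=1.0, gamma="scale")
svm.fit(X, y)
print("Training accuracy: %.3f" % svm.score(X, y))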
Unsupervised Methods: PCA, K-Means, Hierarchical Clustering, NNMF -
Unsupervised learning techniques are designed to find hidden structure in data
without the use of a labelled response or reward signal to "train" on. Additionally,
unsupervised techniques are often used to pre-process data.
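A minimal sketch of PCA applied to invented, correlated return series, for instance to extract a dominant common factor; the loadings and noise levels are made up for illustration:

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(6)

# Invented returns for 10 assets that all load on a single common factor plus noise
n_days, n_assets = 750, 10
market = rng.normal(0.0, 0.01, n_days)
loadings = rng.uniform(0.5, 1.5, n_assets)
returns = np.outer(market, loadings) + rng.normal(0.0, 0.005, (n_days, n_assets))

pca = PCA(n_components=3)
pca.fit(returns)

# The first component typically captures the common "market" movement in this toy setup
print("Explained variance ratios:", np.round(pca.explained_variance_ratio_, 3))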
Ensemble Methods - Ensemble methods make use of multiple separate statistical
learning models in order to achieve greater predictive capability than could be
achieved from any of the individual models.
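A short sketch of a simple ensemble that majority-votes across three different classifiers on invented data; the constituent models and parameters are arbitrary choices:

import numpy as np
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(7)

# Invented feature matrix and binary labels with a mildly non-linear signal
n = 1000
X = rng.normal(size=(n, 4))
y = ((X[:, 0] * X[:, 1] + X[:, 2]) + rng.normal(0.0, 0.5, n) > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Majority ("hard") voting over dissimilar models often smooths out individual errors
ensemble = VotingClassifier(estimators=[
    ("lr", LogisticRegression()),
    ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
    ("knn", KNeighborsClassifier(n_neighbors=10)),
], voting="hard")
ensemble.fit(X_train, y_train)
print("Ensemble test accuracy: %.3f" % ensemble.score(X_test, y_test))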
To become an adept quantitative trading researcher it is essential to be familiar with the
process of statistical modelling. An exhaustive knowledge of machine learning
techniques is of lesser importance than a deeper understanding of the modelling
process itself. Make sure to always keep in mind the core ideas of modelling
assumptions, the bias-variance tradeoff, algorithm applicability and cognitive biases
when carrying out quantitative trading research.