
MScFE 600 Financial Data GWP1 - Report

The document outlines a group project for MSF 600: Financial Data, detailing the contributions of members and a statement of integrity regarding the originality of the work. It covers key topics such as data quality, yield curve modeling using Nelson-Siegel and Cubic Spline approaches, and the ethical considerations of data smoothing. Additionally, the project includes an analysis of correlation and principal component analysis applied to financial data.


GROUP WORK PROJECT # 1 MSF 600: FINANCIAL DATA

GROUP NUMBER: 7982

FULL LEGAL NAME | LOCATION (COUNTRY) | EMAIL ADDRESS | MARK X FOR ANY NON-CONTRIBUTING MEMBER
FAGBEMI OLUSEGUN SULAIMON | SOUTH AFRICA | OLUSOULS@GMAIL.COM |
YOUSSOUF BANCE | FRANCE (PARIS) | YOUSSOUFBANCE2012@GMAIL.COM |
Taufik Abdur Rahman | | | X

Statement of integrity: By typing the names of all group members in the text boxes below, you confirm that the
assignment submitted is original work produced by the group (excluding any non-contributing members identified
with an “X” above).

Team member 1 Fagbemi Olusegun Sulaimon

Team member 2 YOUSSOUF BANCE

Team member 3

Use the box below to explain any attempts to reach out to a non-contributing member. Type (N/A) if all
members contributed.
Note: You may be required to provide proof of your outreach to non-contributing members upon
request.


Table of Contents

1.0 DATA QUALITY
1.1 Introduction
a) Example of poor-quality structured data
Table a. Example of poor-quality structured data
b) Issues of Data Quality
c) Example of poor-quality unstructured data
d) Data Quality Issues in Unstructured Data

2.0 Yield Curve
2.1 Introduction
2.2 The Analysis
a) Selection of Government Securities
b) Maturities Selection
c) Fitting the Nelson-Siegel Model
d) Fitting the Cubic Spline Model
e) Comparison of Models
f) Model Parameters
g) Ethical Considerations

3.0 Exploiting Correlation
3.1 Introduction
3.2 Simulated Data Analysis
a) Generate Uncorrelated Gaussian Random Variables
b) Principal Components Analysis (PCA)
c) Variance Comparison
d) Screeplot
3.3 Real Government Data Analysis
e) Collecting Daily Closing Yields
f) Daily Yield Changes
g) PCA Using Correlation Matrix
h) Variance Comparison
i) Screeplot (Real Data)
j) Comparison of Screeplots

Reference


1.0 DATA QUALITY


1.1 Introduction

The quality of data is vital for accurate decision-making and reliable analysis. Structured data, such as tables or
spreadsheets, frequently suffers from missing information, errors, or inconsistent values, which reduce its value.
Unstructured data, such as customer reviews, poses further challenges: informal language, mixed formats, and
unclear details. These issues make it harder to analyse the data and generate meaningful insights. With the growing
volume of available data, financial analysts now go beyond traditional market data and corporate statements, using
diverse sources such as health, satellite, and climate data, which makes data quality more critical than ever (WQU, 2024).

a) Example of poor-quality structured data:


Date | Income ($) | Customer ID | Product
2024-12-01 | -500 | ABC123 | Laptop
2024-12-02 | 0 | NULL | Tablet
2024-12-03 | 300 | ABC124 | Phone
2024-12-04 | N/A | ABC125 | NULL
Table a. Example of poor-quality structured data

b) Issues of Data Quality.

Regardless of the source, there are four features of data quality we always seek (WQU, 2024):

• Accuracy
• Completeness
• Consistency
• Timeliness

If any of these features is not met, analysis of the data may run into significant problems. Applying them to the
structured data in Table a reveals the following issues:

• Accuracy - Some income values are implausible or unusable (e.g., a negative revenue of -500, and "N/A" in a
numeric column).
• Completeness - One Customer ID and one Product name are missing, leaving incomplete records.
• Consistency - "NULL" and "N/A" are two different representations of missing data.
• Timeliness - The Income ($) for 2024-12-04 is recorded as N/A; if this reflects a delayed entry, any analysis run
before the value arrives will be incomplete.
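These checks can be automated. The following is a minimal pandas sketch, assuming the table arrives with every column stored as text (as it might from a CSV export); the `audit` helper and its output keys are hypothetical names chosen for illustration, not part of the original analysis.

```python
import numpy as np
import pandas as pd

# Rows reproduced from Table a, stored as text as they might arrive from a CSV.
raw = pd.DataFrame({
    "Date": ["2024-12-01", "2024-12-02", "2024-12-03", "2024-12-04"],
    "Income ($)": ["-500", "0", "300", "N/A"],
    "Customer ID": ["ABC123", "NULL", "ABC124", "ABC125"],
    "Product": ["Laptop", "Tablet", "Phone", "NULL"],
})

def audit(df: pd.DataFrame) -> dict:
    """Flag completeness and accuracy problems after fixing consistency."""
    # Consistency: map both missing-value sentinels to a single NaN marker.
    clean = df.replace({"NULL": np.nan, "N/A": np.nan})
    # Accuracy: income must be numeric; anything unparseable becomes NaN.
    clean["Income ($)"] = pd.to_numeric(clean["Income ($)"], errors="coerce")
    clean["Date"] = pd.to_datetime(clean["Date"])
    negative = clean.loc[clean["Income ($)"] < 0, "Date"]
    return {
        "missing_per_column": clean.isna().sum().to_dict(),   # completeness
        "negative_income_dates": negative.dt.strftime("%Y-%m-%d").tolist(),
        "clean": clean,
    }

report = audit(raw)
print(report["missing_per_column"])
print(report["negative_income_dates"])
```

Normalising the sentinels first matters: until "NULL" and "N/A" are mapped to one marker, the missing-value counts would undercount the gaps.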

c) Example of poor-quality unstructured data

• A customer review dataset with unstructured comments:


• "The product was gra8 but delivery was :( ."
• "NULL"
• "I liked it. But idk, maybe will use again later ##goodstuff."
• "Why u no fix this fasterrrr. Service!! #FAIL"

d) Data Quality Issues in Unstructured Data

Unstructured data is information that does not adhere to a predefined data model or organisational structure.
Examples include text documents, audio clips, images, videos, social media posts, review sites, and photographs
(Encord, n.d.). This type of data is used in tasks such as pricing weather derivatives (e.g., analysing satellite
photos) or monitoring companies by evaluating social media sentiment, reviews, and photos to identify trends,
assess product popularity, or gauge customer loyalty. Unlike structured data, unstructured data is more challenging
to store because of its unconventional and irregular format (WQU, 2024).
Data quality is critical to ensuring accurate and reliable decision-making in organizations. Key dimensions like
accuracy, completeness, consistency, timeliness, validity, and uniqueness significantly impact operational efficiency
and business outcomes (Sharma, 2024).
Data quality dimensions are measurable features or characteristics of data that can be used to evaluate it.
The review examples in (c) show how unstructured data may fail to meet the requirements of good-quality data:
1. Clarity - Informal language and abbreviations ("gra8", "idk") make the feedback hard to interpret
consistently.
2. Consistency - Mixed formats (e.g., hashtags, and "NULL" stored as if it were a review) produce inconsistent
data that is challenging to prepare for analysis.
3. Relevance - Entries such as "NULL", and noise such as excessive punctuation ("!!") or stretched words
("fasterrrr"), add no tangible value to understanding customer sentiment.
4. Completeness - The reviews lack sufficient context or specific details to derive actionable insights,
reducing their usefulness for decision-making.
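A minimal normalisation sketch for the four example reviews, using only Python's `re` module. The `SLANG` and `EMOTICONS` dictionaries and the `clean_review` function are hypothetical and deliberately tiny; they illustrate the kind of cleaning these dimensions call for rather than a production text pipeline.

```python
import re

# Hypothetical lookup tables for illustration; a real pipeline would need
# much larger, domain-specific dictionaries.
SLANG = {"gra8": "great", "idk": "I don't know", "u": "you"}
EMOTICONS = {":(": "bad", ":)": "good"}

def clean_review(text):
    """Normalise one free-text review; return None if it carries no content."""
    if text is None or text.strip().upper() in {"", "NULL"}:
        return None                                   # placeholder, not a review
    for emoticon, word in EMOTICONS.items():
        text = text.replace(emoticon, word)           # ':(' -> 'bad'
    text = re.sub(r"#+(\w+)", r"\1", text)           # '##goodstuff' -> 'goodstuff'
    text = re.sub(r"(\w)\1{2,}", r"\1", text)        # 'fasterrrr' -> 'faster'
    text = re.sub(r"([!?.])\1+", r"\1", text)        # 'Service!!' -> 'Service!'
    # Expand slang word by word; replacements are not re-scanned.
    text = re.sub(r"\b\w+\b",
                  lambda m: SLANG.get(m.group(0).lower(), m.group(0)), text)
    text = re.sub(r"\s+", " ", text).strip()         # collapse whitespace
    return re.sub(r"\s+([!?.,])", r"\1", text)       # 'bad .' -> 'bad.'

reviews = [
    "The product was gra8 but delivery was :( .",
    "NULL",
    "I liked it. But idk, maybe will use again later ##goodstuff.",
    "Why u no fix this fasterrrr. Service!! #FAIL",
]
cleaned = [clean_review(r) for r in reviews]
```

Dropping "NULL" entries addresses relevance, while the substitutions target clarity and consistency; completeness cannot be repaired this way, because missing context is simply absent from the text.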


2.0 YIELD CURVE


2.1 Introduction

The yield curve, a graphical representation of interest rates across different bond maturities, is a core tool of
financial analysis. It provides insights into the economic outlook, interest rate risk, and investment strategies. This
report models the yield curve for Indonesia's government securities using the Nelson-Siegel and Cubic Spline
approaches, evaluates the models' fit and interpretability, and then discusses the ethical considerations of data
smoothing.

2.2 The Analysis

a) Selection of Government Securities

The analysis focuses on Indonesia's government bonds. The dataset, taken from Investing.com, includes daily
bond yields for short-, medium-, and long-term maturities. Indonesia was chosen because the project requires a
country linked to one of the group members, and one member is an Indonesian citizen. Indonesia's dataset also
covers all key maturity types, making it a good choice for studying yield curves. The data can be found at:
https://www.investing.com/rates-bonds/indonesia-government-bonds.

The dataset includes:


• Short-term maturities: 1 month, 3 months, 6 months, 1 year.
• Medium-term maturities: 3 years, 5 years.
• Long-term maturities: 10 years, 15 years, 20 years, 30 years.
This wide range of maturities supports a good understanding of yield curve dynamics, as recommended by
Hull (2015).

b) Maturities Selection

We converted maturities into years to standardise the data:


• Short-term: 1/12, 3/12, 6/12, and 1 year.
• Medium-term: 3 years, 5 years.
• Long-term: 10 years, 15 years, 20 years, 30 years.

Including these maturities ensures that the analysis captures the short-, medium-, and long-term dynamics of the
yield curve (Diebold & Li, 2006). This allows the Nelson-Siegel model to capture the level, slope, and curvature of
the curve, while the Cubic Spline model can use its flexibility for a closer fit.
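The conversion above can be sketched as a small helper. The tenor labels (`"1M"`, `"10Y"`, and so on) and the `tenor_to_years` name are assumptions made for illustration; the column names in the downloaded data may differ.

```python
def tenor_to_years(label: str) -> float:
    """Convert a tenor label such as '3M' or '10Y' to a maturity in years."""
    value, unit = int(label[:-1]), label[-1].upper()
    if unit == "M":
        return value / 12          # months -> fraction of a year
    if unit == "Y":
        return float(value)
    raise ValueError(f"unknown tenor unit in {label!r}")

TENORS = ["1M", "3M", "6M", "1Y", "3Y", "5Y", "10Y", "15Y", "20Y", "30Y"]
maturities = [tenor_to_years(t) for t in TENORS]
```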

c) Fitting the Nelson-Siegel Model

The Nelson-Siegel model is a parametric model that is widely used for yield curve analysis because of its ability to
summarise the curve using interpretable parameters. The model equation is:

$$
y(m) = \beta_0 + \beta_1 \left( \frac{1 - e^{-m/\tau}}{m/\tau} \right) + \beta_2 \left( \frac{1 - e^{-m/\tau}}{m/\tau} - e^{-m/\tau} \right)
$$

where:
• 𝑚 : Maturity in years,
• 𝛽0 : Long-term yield,
• 𝛽1 : Short-term slope,
• 𝛽2 : Medium-term curvature,
• 𝜏 : Decay factor determining the exponential rate of change.


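Using non-linear least squares optimisation, the estimated parameters for Indonesia's government securities are:
• 𝛽0 = 7.0918 (long-term yield),
• 𝛽1 = -2.7091 (short-term slope),
• 𝛽2 = -0.0005 (minimal curvature),
• 𝜏 = 4.6961 (decay factor).
The negative 𝛽1 means that yields fall steeply toward the short end: the fitted yield at very short maturities is
𝛽0 + 𝛽1 ≈ 4.38, well below the long-term level, so the curve slopes upward. The near-zero 𝛽2 indicates an almost
flat curvature, with no pronounced medium-term hump. The long-term yield (𝛽0) aligns with historical trends in
Indonesia's bond markets; the model specification follows Nelson and Siegel (1987).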
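A sketch of the non-linear least squares fit with SciPy's `curve_fit`, under stated assumptions: maturities in years, yields in the same units as the parameters above, and a hand-picked starting point `p0` and bounds on 𝜏 (both assumptions, since the original optimiser settings are not reported). Because the Investing.com observations are not reproduced here, the example fits a synthetic curve generated from the reported estimates and checks that the fit reproduces it.

```python
import numpy as np
from scipy.optimize import curve_fit

def ns_yield(m, beta0, beta1, beta2, tau):
    """Nelson-Siegel yield at maturity m (in years); m must be > 0."""
    x = np.asarray(m, dtype=float) / tau
    slope_loading = (1.0 - np.exp(-x)) / x       # tends to 1 as m -> 0
    curve_loading = slope_loading - np.exp(-x)    # hump-shaped, 0 at both ends
    return beta0 + beta1 * slope_loading + beta2 * curve_loading

def fit_nelson_siegel(maturities, yields, p0=(7.0, -2.5, 0.0, 4.0)):
    """Non-linear least squares estimate of (beta0, beta1, beta2, tau)."""
    lower = [-np.inf, -np.inf, -np.inf, 1e-2]     # keep tau strictly positive
    upper = [np.inf, np.inf, np.inf, 50.0]
    params, _ = curve_fit(ns_yield, maturities, yields, p0=p0,
                          bounds=(lower, upper))
    return params

maturities = np.array([1/12, 3/12, 6/12, 1, 3, 5, 10, 15, 20, 30])
# Synthetic yields generated from the reported estimates, NOT market data.
synthetic = ns_yield(maturities, 7.0918, -2.7091, -0.0005, 4.6961)
params = fit_nelson_siegel(maturities, synthetic)
print(np.round(params, 4))
```

With real quotes the residuals will not vanish, and the result can be sensitive to `p0` because 𝜏 enters non-linearly; trying several starting values is a common safeguard.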

d) Fitting the Cubic Spline Model

The Cubic Spline model is a non-parametric approach that fits a smooth curve through observed yields using
piecewise cubic polynomials. It does not assume a specific functional form, which makes it highly flexible.
Using the scipy.interpolate.CubicSpline function, the model achieved a perfect fit to the observed data, as expected
for an interpolating spline, which passes through every observed point by construction.
The Cubic Spline model excels in flexibility but lacks interpretability, as it does not provide parameter estimates
that describe the level, slope, or curvature of the yield curve (De Boor, 2001).
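A minimal sketch of the spline fit with `scipy.interpolate.CubicSpline`. The yields are illustrative, namely the reported Nelson-Siegel curve plus hypothetical noise, not the observed data; the point is that the interpolant reproduces every noisy point exactly, which is both why its in-sample RMSE is zero and why it can overfit.

```python
import numpy as np
from scipy.interpolate import CubicSpline

maturities = np.array([1/12, 3/12, 6/12, 1, 3, 5, 10, 15, 20, 30])

# Illustrative yields: the reported Nelson-Siegel curve plus hypothetical
# noise (sd 0.05), NOT the Investing.com observations.
x = maturities / 4.6961
loading = (1 - np.exp(-x)) / x
smooth = 7.0918 - 2.7091 * loading - 0.0005 * (loading - np.exp(-x))
noisy = smooth + np.random.default_rng(0).normal(0.0, 0.05, maturities.size)

spline = CubicSpline(maturities, noisy)    # default 'not-a-knot' end conditions
in_sample_rmse = np.sqrt(np.mean((spline(maturities) - noisy) ** 2))
print(f"In-sample RMSE: {in_sample_rmse:.4f}")   # zero up to rounding
print(f"Interpolated 7-year yield: {float(spline(7.0)):.4f}")
```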

e) Comparison of Models
1) Fit

Nelson-Siegel
• Provides a smooth approximation of the yield curve, balancing fit and simplicity.
• May slightly deviate from individual data points but captures the overall trend effectively.

Cubic-Spline

• Fits the data exactly at the observed maturities.


• Risks overfitting to short-term anomalies or noise in the data.

The performance of the Nelson-Siegel and Cubic Spline models was evaluated in-sample, at the observed maturities,
using the following metrics:

Metric | Nelson-Siegel Model | Cubic Spline Model
RMSE | 0.1914 | 0.0000
R² | 0.9544 | 1.0000
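For reference, the two metrics in the table can be computed as below. This sketch assumes the standard definitions (RMSE as the square root of the mean squared residual, R² as 1 - SS_res/SS_tot) evaluated at the observed maturities; the function names are illustrative.

```python
import numpy as np

def rmse(observed, fitted):
    """Root mean squared error between observed and fitted yields."""
    observed, fitted = np.asarray(observed, float), np.asarray(fitted, float)
    return float(np.sqrt(np.mean((observed - fitted) ** 2)))

def r_squared(observed, fitted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    observed, fitted = np.asarray(observed, float), np.asarray(fitted, float)
    ss_res = np.sum((observed - fitted) ** 2)
    ss_tot = np.sum((observed - observed.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

print(rmse([1, 2, 3], [1, 2, 4]))       # sqrt(1/3), about 0.5774
print(r_squared([1, 2, 3], [1, 2, 4]))  # 1 - 1/2 = 0.5
```

An interpolating spline drives `ss_res` to zero at its own knots, so a perfect in-sample R² says nothing about accuracy between or beyond the observed maturities.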

2) Interpretation

Nelson-Siegel:
Provides meaningful parameters (β₀, β₁, β₂, τ) that describe the yield curve's economic structure:
• 𝛽0 : Long-term yield,
• 𝛽1 : Short-term slope,
• 𝛽2 : Medium-term curvature,
• 𝜏 : Decay factor determining the exponential rate of change.
Suitable for economic analysis and policy interpretation.
Cubic-Spline:

• Lacks interpretable parameters, focusing purely on mathematical smoothness.


• Useful for interpolation but less reliable for understanding the underlying economic factors.

The Nelson-Siegel model offers reasonable accuracy with interpretable parameters, making it ideal for analysing
the yield curve's overall structure.

The Cubic Spline model achieves a perfect fit but risks overfitting and lacks economic interpretability.

These results are consistent with the trade-offs between parametric and non-parametric yield curve models
highlighted by Diebold and Li (2006).

f) Model Parameters

The Nelson-Siegel model's parameters provide key insights:

• 𝛽0 : Reflects the long-term interest rate level.
• 𝛽1 : Captures the short-term slope; its negative value indicates that yields fall steeply toward the short end.
• 𝛽2 : Represents the curvature, which is minimal here.
• 𝜏 : Determines the exponential decay rate, controlling where the curve's hump (if any) occurs.
In contrast, the Cubic Spline model does not offer interpretable parameters, focusing solely on interpolation.

g) Ethical Considerations

It is unethical to intentionally create misleading data. Smoothing data, while useful for filtering noise in
econometrics, becomes unethical when it misrepresents reality, for example by understating volatility or inflating
Sharpe ratios. Holding back gains to offset losses, as in profit-and-loss smoothing, creates an illusion of stability.
Good ethics demands transparency, so such practices are unethical when used to mislead stakeholders about risk
or performance (WQU, 2024).
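The claim that smoothing understates volatility and inflates Sharpe ratios can be illustrated numerically. This sketch applies a simple exponential moving average (an assumed stand-in for any smoothing scheme) to synthetic daily returns and compares volatility and a per-period Sharpe ratio with a zero risk-free rate (a simplification) before and after smoothing.

```python
import numpy as np

def sharpe(returns):
    """Per-period Sharpe ratio, assuming a zero risk-free rate."""
    return returns.mean() / returns.std(ddof=1)

def exponential_smooth(returns, alpha=0.2):
    """Exponentially weighted moving average of a return series."""
    smoothed = np.empty_like(returns)
    smoothed[0] = returns[0]
    for t in range(1, len(returns)):
        smoothed[t] = alpha * returns[t] + (1 - alpha) * smoothed[t - 1]
    return smoothed

rng = np.random.default_rng(42)
raw = rng.normal(loc=0.001, scale=0.01, size=2520)  # synthetic daily returns
smoothed = exponential_smooth(raw)

print(raw.std(ddof=1), smoothed.std(ddof=1))   # reported volatility shrinks
print(sharpe(raw), sharpe(smoothed))           # Sharpe ratio rises
```

The average return is essentially unchanged; only the apparent risk falls, which is exactly why undisclosed smoothing misleads.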
Applied to this analysis, Nelson-Siegel smoothing is acceptable if it is used solely for analytical clarity. If, however,
it distorts the yield curve to mislead stakeholders about risk or performance, it becomes unethical, violating
transparency and good ethics.

The use of smoothing techniques, such as the Nelson-Siegel model, raises ethical concerns if it obscures critical
market information. Key considerations are:

• Transparency
Smoothing improves interpretability but may conceal market risks if used without disclosure.
• Over-smoothing
During periods of volatility, excessive smoothing could mislead investors by hiding significant deviations in
market conditions.
Data smoothing can improve financial modeling by filtering noise and dampening the effect of short-term volatility,
which supports accuracy and decision-making. However, there is always a risk of over-smoothing, lag, and data
distortion. Ethical modeling requires transparency, validation, and alignment with analysis goals (Tamplin, 2023).
The Nelson-Siegel model offers interpretability and robust yield curve insights, while the Cubic Spline ensures a
perfect fit but lacks economic meaning. Transparency and careful validation are essential for the ethical use of
smoothing techniques in financial modeling.


3.0 EXPLOITING CORRELATION


3.1 Introduction

This analysis explores how correlation and principal component analysis (PCA) can be applied to summarise
financial data. We analysed both simulated and real-world financial data to determine the proportion of variance
explained by each principal component.

3.2 Simulated Data Analysis

a) Generate Uncorrelated Gaussian Random Variables

To simulate yield changes, we generated five (5) uncorrelated Gaussian random variables using NumPy, each with
a mean of 0 and a standard deviation of 0.1, i.e., 𝑁(0, 0.1²). These variables were designed to represent
independent fluctuations, with no built-in correlation between them.

b) Principal Components Analysis (PCA)

We conducted PCA on the correlation matrix of the simulated data. The correlation matrix ensures that variables
are standardised, allowing for consistent analysis across the components (Jolliffe, 2002).
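The simulation and PCA steps can be sketched with NumPy alone. The sample size (1,000 draws), the seed, and the `pca_explained_variance` helper are assumptions for illustration; the original sample size is not stated. `np.linalg.eigvalsh` is used because a correlation matrix is symmetric.

```python
import numpy as np

def pca_explained_variance(data):
    """PCA on the correlation matrix; returns variance shares, largest first."""
    corr = np.corrcoef(data, rowvar=False)          # columns are variables
    eigenvalues = np.linalg.eigvalsh(corr)[::-1]    # eigvalsh sorts ascending
    eigenvalues = np.clip(eigenvalues, 0.0, None)   # drop tiny negative round-off
    return eigenvalues / eigenvalues.sum()

rng = np.random.default_rng(7)
# Five independent N(0, 0.1^2) series; 1,000 observations is an assumption.
simulated = rng.normal(loc=0.0, scale=0.1, size=(1000, 5))
ratios = pca_explained_variance(simulated)
cumulative = np.cumsum(ratios)
for k, (r, c) in enumerate(zip(ratios, cumulative), start=1):
    print(f"Component {k}: {r:.2%} (Cumulative: {c:.2%})")
```

With many observations of truly independent variables, every share stays close to 20%, since the population correlation matrix is the identity.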

c) Variance Comparison

The variance explained and cumulative variance for the simulated (uncorrelated) data are as follows:
• Component 1: 36.56% (Cumulative: 36.56%)
• Component 2: 24.62% (Cumulative: 61.18%)
• Component 3: 22.42% (Cumulative: 83.60%)
• Component 4: 16.40% (Cumulative: 100.00%)
• Component 5: 0.00% (Cumulative: 100.00%)

Key Insight

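Because the variables are independent, no single component dominates: the first two components together explain
61.18% of the variance, and the remainder is spread over components 3 and 4. A fifth component with exactly
0.00% is unexpected for five independent variables; it typically indicates that the sample has no more observations
than variables (centring reduces the rank of the data matrix to at most n - 1) or that one series is a linear
combination of the others, so the reported shares should be read with the simulated sample size in mind.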
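A zero share for the last component is exactly what PCA on a correlation matrix returns when there are no more observations than variables. The sketch below uses five hypothetical draws of five independent variables: after centring, the data matrix has rank at most n - 1 = 4, so one eigenvalue is zero up to rounding.

```python
import numpy as np

rng = np.random.default_rng(0)
few_obs = rng.normal(0.0, 0.1, size=(5, 5))    # only 5 observations of 5 variables
corr = np.corrcoef(few_obs, rowvar=False)
eigenvalues = np.sort(np.linalg.eigvalsh(corr))[::-1]
share = eigenvalues / eigenvalues.sum()
centred_rank = np.linalg.matrix_rank(few_obs - few_obs.mean(axis=0))

print(np.round(share, 4))   # the last share is zero up to rounding
print(centred_rank)         # at most n - 1 = 4
```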

d) Screeplot

The scree plot visualises the variance explained by each principal component. The gradual decline from PC1
onward, with no sharp elbow, indicates that the variability is spread across several components rather than
concentrated in the first one, as expected for uncorrelated variables (Cattell, 1966).
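A scree plot like the one described can be drawn with Matplotlib. This is a minimal sketch: the `scree_plot` helper is a hypothetical name, the off-screen `Agg` backend is an assumption so that the code runs without a display, and the plotted shares are the simulated-data percentages reported above.

```python
import matplotlib
matplotlib.use("Agg")                     # render off-screen; no display needed
import matplotlib.pyplot as plt
import numpy as np

def scree_plot(ratios, title="Scree plot"):
    """Plot explained-variance shares (largest first) against component number."""
    components = np.arange(1, len(ratios) + 1)
    fig, ax = plt.subplots()
    ax.plot(components, ratios, marker="o")
    ax.set_xticks(components)
    ax.set_xlabel("Principal component")
    ax.set_ylabel("Proportion of variance explained")
    ax.set_title(title)
    return fig, ax

# Shares reported above for the simulated data.
fig, ax = scree_plot([0.3656, 0.2462, 0.2242, 0.1640, 0.0], "Simulated data")
```

Calling `fig.savefig("scree.png")` writes the figure to disk; passing the government-data shares instead produces the plot with the sharp elbow after PC1.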


3.3 Real Government Data Analysis

e) Collecting Daily Closing Yields

Daily closing yields for five government securities, with maturities of 1M, 3M, 1Yr, 5Yr, and 10Yr, were collected
over a 6-month period. This dataset reflects real-world market behaviour in government securities, influenced by
various economic factors.

f) Daily Yield Changes


The daily yield changes were computed with the pandas .diff() function, which highlights the day-to-day variability
in bond yields. This step focuses the PCA on yield changes rather than absolute levels: yield levels are highly
persistent over time, whereas daily changes are much closer to stationary, which makes the analysis more robust
(Tsay, 2010).
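A minimal pandas sketch of this step. The three days of yields below are hypothetical values for illustration, not the collected dataset; `DataFrame.diff()` subtracts each row from the previous one, so the first row becomes NaN and is dropped.

```python
import pandas as pd

# Hypothetical closing yields (%) for illustration; not the collected dataset.
levels = pd.DataFrame(
    {"1M": [6.10, 6.12, 6.08], "10Y": [7.00, 7.05, 6.98]},
    index=pd.to_datetime(["2024-07-01", "2024-07-02", "2024-07-03"]),
)
changes = levels.diff().dropna()   # first day has no previous close, so it is NaN
print(changes)
```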
g) PCA Using Correlation Matrix

PCA was re-run on the correlation matrix of the daily yield changes. The correlation matrix was used because
changes at different maturities have different volatilities; standardising each series to unit variance prevents the
most volatile maturity from dominating the components and allows a fair comparison across maturities.
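The full pipeline, from yield levels to explained-variance shares, can be sketched as follows. The input here is synthetic: five series driven by one common daily shock plus smaller idiosyncratic noise, with volatilities chosen as assumptions to mimic correlated yields; `pca_on_changes` is a hypothetical helper name. With a strong common factor, PC1 dominates, as in the government data.

```python
import numpy as np
import pandas as pd

def pca_on_changes(levels: pd.DataFrame) -> pd.Series:
    """Variance shares from PCA on the correlation matrix of daily changes."""
    changes = levels.diff().dropna()
    corr = changes.corr().to_numpy()        # standardises each maturity
    eigenvalues = np.clip(np.linalg.eigvalsh(corr)[::-1], 0.0, None)
    labels = [f"PC{i}" for i in range(1, len(eigenvalues) + 1)]
    return pd.Series(eigenvalues / eigenvalues.sum(), index=labels)

# Synthetic levels: one common shock (sd 0.05) plus noise (sd 0.01) per tenor.
rng = np.random.default_rng(1)
n_days, tenors = 125, ["1M", "3M", "1Y", "5Y", "10Y"]
common = rng.normal(0.0, 0.05, size=(n_days, 1))
noise = rng.normal(0.0, 0.01, size=(n_days, len(tenors)))
levels = pd.DataFrame(6.5 + np.cumsum(common + noise, axis=0), columns=tenors)

shares = pca_on_changes(levels)
print(shares.round(4))
print(shares.cumsum().round(4))
```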

h) Variance Comparison

The PCA results showed that:


• Component 1: 62.89% (Cumulative: 62.89%)
• Component 2: 21.24% (Cumulative: 84.13%)
• Component 3: 14.01% (Cumulative: 98.14%)
• Component 4: 1.86% (Cumulative: 100.00%)
• Component 5: 0.00% (Cumulative: 100.00%)

Key Insight:

The first component accounts for 62.89% of the variance, which indicates strong correlation among the real-world
yield changes. By the third component, the cumulative variance exceeds 98%, so the remaining components add
little.

This dominance of PC1 indicates a strong correlation between the yields of different maturities, driven by market-
wide factors such as interest rate policy (Fabozzi et al., 2014).

i) Screeplot (Real Data)

The scree plot for the government data exhibits a sharp "elbow" after PC1, emphasising its dominance. The second
and third components contribute considerably less (21.24% and 14.01%), and the fourth and fifth almost nothing.

j) Comparison of Screeplots

The scree plot of the uncorrelated data reveals a gradual decrease in explained variance, with no component
predominant. In contrast, the scree plot for the government data shows a sharp decline after PC1, with 62.89% of
the variance captured by component 1 alone. This difference reflects the inherent correlation in government yield
data and the lack of dependency in the uncorrelated simulated data (Jolliffe & Cadima, 2016).

Similarities

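• The first three components explain the majority of the variance in both datasets (83.60% for the simulated data
and 98.14% for the government data).
• In both scree plots the explained variance falls from the first component onward, although the drop after PC1 is
far steeper for the government data.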

Differences
• In the simulated data, the variance is spread fairly evenly across the first three components; Component 1
explains only 36.56%.

• In the government data, the first component dominates with 62.89% of the variance, reflecting stronger
interdependencies in real-world financial variables.
This analysis shows how important PCA is for understanding the structure of financial data. For the simulated
uncorrelated data, the variance was distributed relatively evenly across components. For the government data,
most of the variance was captured by PC1, reflecting strong interrelationships among yields. PCA not only reduces
the number of variables but also highlights the main factors driving variability in the data, which makes it a valuable
analytical tool.


REFERENCE
Abdi, H. & Williams, L. J., 2010. Principal component analysis. Wiley Interdisciplinary Reviews: Computational
Statistics, 2(4), pp. 433-459.

Cattell, R. B., 1966. The scree test for the number of factors. Multivariate Behavioral Research, 1(2), pp. 245-276.

D'Acunto, F. & Rossi, A., 2022. Robo-advice: An Effective Tool to Reduce Inequalities? [Online]
Available at: https://www.brookings.edu/research/robo-advice-an-effective-tool-to-reduce-inequalities/

De Boor, C., 2001. A Practical Guide to Splines. Springer-Verlag.

Diebold, F. X. & Li, C., 2006. Forecasting the term structure of government bond yields. Journal of Econometrics,
130(2), pp. 337-364.

Encord, n.d. Best Practices for Handling Unstructured Data Efficiently. [Online]
Available at: https://encord.com

Fabozzi, F. J., Focardi, S. M. & Kolm, P. N., 2014. Factor Models: Theory and Practice. Wiley.

Hull, J. C., 2015. Options, Futures, and Other Derivatives. Pearson Education.

Jolliffe, I. T. & Cadima, J., 2016. Principal component analysis: a review and recent developments. Philosophical
Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences.

Jolliffe, I. T., 2002. Principal Component Analysis. Springer.

Nelson, C. R. & Siegel, A. F., 1987. Parsimonious modeling of yield curves. The Journal of Business, 60(4),
pp. 473-489.

Sharma, N., 2024. 6 Data Quality Dimensions: How to Perform & Ensure Them. [Online]
Available at: https://hevoacademy.com/data-quality/data-quality-dimensions

Tamplin, T., 2023. Data Smoothing. [Online]
Available at: https://www.financestrategists.com/wealth-management/fundamental-vs-technical-analysis/data-smoothing/?utm_source=chatgpt.com#final-thoughts

Tsay, R. S., 2010. Analysis of Financial Time Series. Wiley.

WQU, 2024. Ethical Practices with Data, Module 2 | Lesson 4. WorldQuant University.

WQU, 2024. Financial data best practices ensure data integrity and security, Module 1 | Lesson 1. WorldQuant University.
