An Extended ID3 Decision Tree Algorithm for Spatial Data

Imas Sukaesih Sitanggang #†1, Razali Yaakob #2, Norwati Mustapha #3, Ahmad Ainuddin B Nuruddin *4

# Faculty of Computer Science and Information Technology, Universiti Putra Malaysia, 43400 Serdang Selangor, Malaysia
1 imas.sitanggang@ipb.ac.id, 2 razaliy@fsktm.upm.edu.my, 3 norwati@fsktm.upm.edu.my
* Institute of Tropical Forestry and Forest Products (INTROP), Universiti Putra Malaysia, 43400 Serdang Selangor, Malaysia
4 ainuddin@forr.upm.edu.my
† Computer Science Department, Bogor Agricultural University, Bogor 16680, Indonesia
Abstract— Utilizing data mining tasks such as classification on spatial data is more complex than on non-spatial data, because spatial data mining algorithms have to consider not only the objects of interest themselves but also the neighbours of those objects in order to extract useful and interesting patterns. One classification algorithm, namely the ID3 algorithm, which was originally designed for non-spatial datasets, has been improved by other researchers in previous work to construct a spatial decision tree from a spatial dataset containing polygon features only. The objective of this paper is to propose a new spatial decision tree algorithm based on the ID3 algorithm for discrete features represented as points, lines and polygons. As in the ID3 algorithm, which uses information gain for attribute selection, the proposed algorithm uses spatial information gain to choose the best splitting layer from a set of explanatory layers. A new formula for spatial information gain is proposed using spatial measures for point, line and polygon features. Empirical results demonstrate that the proposed algorithm can be used to join two spatial objects in constructing spatial decision trees on a small spatial dataset. The proposed algorithm has also been applied to a real spatial dataset consisting of point and polygon features. The result is a spatial decision tree with 138 leaves and an accuracy of 74.72%.

Keywords— ID3 algorithm, spatial decision tree, spatial information gain, spatial relation, spatial measure

I. INTRODUCTION

Utilizing data mining tasks on a spatial dataset differs from the same tasks on a non-spatial dataset. Spatial data describe locations of features. In a non-spatial dataset, especially for classification, data are arranged in a single relation consisting of columns for attributes and rows representing tuples that have values for each attribute. In a spatial dataset, data are organized in a set of layers representing continuous or discrete features. Discrete features include points (e.g. village centres), lines (e.g. rivers) and polygons (e.g. land cover types). One layer relates to other layers to create objects in a spatial dataset by applying spatial relations such as topological relations and metric relations. In spatial data mining tasks we should consider not only the objects themselves but also their neighbours, which may belong to other layers. In addition, the attribute types in a non-spatial dataset are numerical and categorical, whereas features in layers are represented by geometric types (polygons, lines or points) that have quantitative measurements such as area and distance.

This paper proposes a spatial decision tree algorithm to construct a classification model from a spatial dataset. The dataset contains only discrete features: points, lines and polygons. The algorithm is an extension of the ID3 algorithm [1] for non-spatial datasets. As in the ID3 algorithm, the proposed algorithm uses an information gain for spatial data, namely spatial information gain, to choose a layer as a splitting layer. Instead of using the number of tuples in a partition, spatial information gain is calculated using spatial measures. We adopt the formula for spatial information gain proposed in [2] and extend the spatial measure definition to the geometry types points, lines and polygons, rather than only polygons as in [2].

The paper is organized as follows: the introduction is given in section 1. Related works in developing spatial decision tree algorithms are briefly reviewed in section 2. Section 3 discusses spatial relationships. We explain the proposed spatial decision tree algorithm in section 4. Finally, we summarize the conclusions in section 5.

II. RELATED WORKS

Work on spatial data mining algorithms, including spatial classification and spatial association rule mining, has continued to grow in recent years. Discovery processes such as classification and association rule mining for spatial data are more complex than those for non-spatial data, because spatial data mining algorithms have to consider the neighbours of objects in order to extract useful knowledge [3]. In a spatial data mining system, the attributes of the neighbours of an object may have a significant influence on the object itself.

A spatial decision tree is a model expressing classification rules induced from spatial data. The training and testing records for this task consist of not only the objects of interest themselves but also the neighbours of those objects. Spatial decision trees differ from conventional decision trees by taking into account implicit spatial relationships in addition to other object attributes [4]. Reference [3] introduced an algorithm designed for spatial databases based on the ID3 algorithm [1]. The algorithm considers not only attributes of the object to be
___________________________________
978-1-4244-8351-8/11/$26.00 ©2011 IEEE
classified but also attributes of neighbouring objects. The algorithm does not distinguish between thematic layers and it takes into account only one spatial relationship [4]. A decision tree from spatial data was also proposed in [5]. The approach for spatial classification used in [5] is based on both (1) non-spatial properties of the classified objects and (2) attributes, predicates and functions describing spatial relations between classified objects and other features located in the spatial proximity of the classified objects. Reference [6] discusses another spatial decision tree algorithm, namely SCART (Spatial Classification and Regression Trees), as an extension of the CART method. CART (Classification and Regression Trees), proposed by Breiman et al. in 1984, is one of the most commonly used systems for induction of decision trees for classification. SCART considers geographical data organized in thematic layers, together with their spatial relationships. To calculate the spatial relationship between the locations of two collections of spatial objects, SCART takes the Spatial Join Index (SJI) table [7] as one of its input parameters. The study [2] extended the ID3 algorithm [1] such that the new algorithm can create a spatial decision tree from a spatial dataset taking into account not only the spatial objects themselves but also their relationships to neighbouring objects. The algorithm generates a tree by selecting the best layer to separate the dataset into partitions that are as pure as possible, meaning that all tuples in a partition belong to the same class. As in the ID3 algorithm, the algorithm uses an information gain for spatial data, namely spatial information gain, to choose a layer as a splitting layer. Instead of using the number of tuples in a partition, spatial information gain is calculated using a spatial measure, namely area [2].

III. SPATIAL RELATIONSHIP

Determining spatial relationships between two features is a major function of a Geographical Information System (GIS). Spatial relationships include topological relations [8] such as overlap, touch, and intersect, and metric relations such as distance. For example, two different polygon features can overlap, touch, or intersect each other. Spatial relationships make spatial data mining algorithms differ from non-spatial data mining algorithms. Spatial relationships are materialized by an extension of the well-known join indices [7]. The concept of a join index between two relations was proposed in [9]. The result of a join index between two relations is a new relation consisting of index pairs, each referencing a tuple of each relation. The pairs of indices refer to objects that meet the join criterion. Reference [7] introduced the Spatial Join Index (SJI) structure as an extension of the join indices [9] in the relational database framework. Join indices can be handled in the same way as other tables and manipulated using the powerful and standardized SQL query language [7]. An SJI pre-computes the exact spatial relationships between objects from thematic layers [7]. In addition, a spatial join index has a third column that contains the spatial relationship, SpatRel, between two layers. Our study adopts the concept of SJI as in [7] to store the relations between two different layers in a spatial database. Instead of a spatial relationship that can be a numerical or Boolean value, the quantitative values in the third column of the SJI are spatial measures of features resulting from spatial relationships between two layers.

We consider as input for the algorithm a spatial database given as a set of layers L. Each layer in L is a collection of geographical objects and has only one geometric type, which can be polygons, lines or points. Assume that each object of a layer is uniquely identified. Let L be a set of layers, and Li and Lj be two distinct layers in L. A spatial relationship applied to Li and Lj is denoted SpatRel(Li, Lj) and can be a topological relation or a metric relation. For the case of a topological relation, SpatRel(Li, Lj) is a relation according to the dimension extended method proposed by [10]. For the case of a metric relation, SpatRel(oi, oj) is a distance relation proposed by [11], where oi is a spatial object in Li and oj is a spatial object in Lj.

Relations between two layers in a spatial database can result in quantitative values such as the distance between two points or the intersection area of two polygons in each layer. We denote these values as spatial measures, as in [2]; they will be used in calculating spatial information gain in the proposed algorithm. For the case of a topological relation, the spatial measure of a feature is defined as follows. Let Li and Lj be in a set of layers L, with Li ≠ Lj. For each feature ri in R = SpatRel(Li, Lj), the spatial measure of ri, denoted by SpatMes(ri), is defined as
1. the area of ri, if <Li, in, Lj> or <Li, overlap, Lj> holds for all features in Li and Lj represented as polygons;
2. the count of ri, if <Li, in, Lj> holds for all features in Li represented as points and all features in Lj represented as polygons.
For the case of a metric relation, we define a distance function from p to q as dist(p, q), the distance from a point (or line) p in Li to a point (or line) q in Lj.
The spatial measure of R is denoted by SpatMes(R) and defined as

SpatMes(R) = f(SpatMes(r1), SpatMes(r2), …, SpatMes(rn))    (1)

for ri in R, i = 1, 2, …, n, where n is the number of features in R and f is an aggregate function that can be sum, min, max or average.
A spatial relationship applied to Li and Lj in L results in a new layer R. We define a spatial join relation (SJR) for all features p in Li and q in Lj as follows:

SJR = {(p, SpatMes(r), q) | r is a feature in R associated to p and q}.    (2)

IV. EXTENDED ID3 ALGORITHM FOR SPATIAL DATA

A spatial database is composed of a set of layers in which all features in a layer have the same geometry type. This study considers only discrete features, including points, lines and polygons. For mining purposes using classification algorithms, the set of layers is divided into two groups: explanatory layers and one target layer (or reference layer), to which spatial relationships are applied to construct a set of tuples. The target layer has some attributes, including a target attribute that stores the target classes. Each explanatory layer has several attributes. One of the attributes is a predictive attribute that will classify tuples in the dataset into the target classes. In this study the target attribute and predictive attributes are categorical.
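As a concrete illustration of the spatial measure (1) and the spatial join relation (2), the sketch below builds an SJR for a metric (distance) relation with the aggregate f = min. The layers, point coordinates and function names (`dist`, `build_sjr`) are our own illustrative assumptions, not data or code from the paper.

```python
import math

def dist(p, q):
    """Euclidean distance between two points p and q."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def build_sjr(Li, Lj):
    """For each feature p in Li, pair it with its nearest feature q in Lj
    and record the distance as the spatial measure (aggregate f = min),
    yielding (p, SpatMes(r), q) triples as in equation (2)."""
    sjr = []
    for pid, p in Li.items():
        qid, q = min(Lj.items(), key=lambda item: dist(p, item[1]))
        sjr.append((pid, dist(p, q), qid))
    return sjr

# Toy layers: hotspot points (target) and river vertices (explanatory)
target = {"t1": (0.0, 0.0), "t2": (4.0, 3.0)}
river  = {"r1": (0.0, 1.0), "r2": (4.0, 0.0)}

for row in build_sjr(target, river):
    print(row)
```

In a real GIS the distance would be computed between full geometries rather than single coordinates, but the shape of the resulting relation is the same.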
Features (polygons, lines or points) in the target layer are related to features in explanatory layers to create a set of tuples in which each value in a tuple corresponds to a value of these layers. Two distinct layers are associated to produce a new layer using a spatial relationship. The relation between two layers produces a spatial measure (1) for the new layer. The spatial measure will then be used in the formula for spatial information gain.

Building a spatial decision tree follows the basic learning process of the ID3 algorithm [1]. ID3 calculates information gain to determine the best split for the dataset. In the spatial decision tree algorithm we define the spatial information gain to select the explanatory layer L that best splits the spatial dataset according to the values of the predictive attribute in the layer L. For this purpose, we adopt the formula for spatial information gain as in [2] and apply the spatial measure (1) to the formula.

Let a dataset D be a training set of class-labelled tuples. In a non-spatial decision tree algorithm we calculate the probability that an arbitrary tuple in D belongs to class Ci, estimated by |Ci,D|/|D|, where |D| is the number of tuples in D and |Ci,D| is the number of tuples of class Ci in D [12]. In this study, a dataset contains some layers, including a target layer that stores class labels. The number of tuples in the dataset is the same as the number of objects in the target layer, because each tuple is created by relating features in the target layer to features in explanatory layers. One feature in the target layer is associated with exactly one tuple in the dataset. For simplicity we will use the number of objects in the target layer instead of the number of tuples in the spatial dataset. Furthermore, in a non-spatial dataset, target classes are discrete-valued and unordered (categorical) and explanatory attributes are categorical or numerical. In a spatial dataset, features in layers are represented by geometric types (polygons, lines or points) that have quantitative measurements such as area and distance. We therefore calculate spatial measures of layers (1) to replace the number of tuples in a non-spatial data partition.

A. Entropy

Let a target attribute C in a target layer S have l distinct classes (i.e. c1, c2, …, cl). The entropy for S represents the expected information needed to determine the class of tuples in the dataset and is defined as

H(S) = − Σ_{i=1}^{l} [SpatMes(S_ci) / SpatMes(S)] log2 [SpatMes(S_ci) / SpatMes(S)]    (3)

SpatMes(S) represents the spatial measure of layer S as defined in (1).
Let an explanatory attribute V in an explanatory (non-target) layer L have q distinct values (i.e. v1, v2, …, vq). We partition the objects in the target layer S according to the layer L; then we have a set of layers L(vj, S) for each possible value vj in L. In our work, we assume that the layer L covers all areas in the layer S. The expected entropy value for splitting is given by:

H(S|L) = Σ_{j=1}^{q} [SpatMes(L(vj, S)) / SpatMes(S)] H(L(vj, S))    (4)

H(S|L) represents the amount of information needed (after the partitioning) in order to arrive at an exact classification.

B. Spatial Information Gain

The spatial information gain for the layer L is given by:

Gain(L) = H(S) − H(S|L)    (5)

Gain(L) denotes how much information would be gained by branching on the layer L. The layer L with the highest information gain, Gain(L), is chosen as the splitting layer at node N. This is equivalent to saying that we want to partition the objects according to the layer L that would do the "best classification", such that the amount of information still required to complete classifying the objects is minimal (i.e., minimum H(S|L)).

C. Spatial Decision Tree Algorithm

Fig. 1 shows our proposed algorithm to generate a spatial decision tree (SDT). The input of the algorithm is divided into two groups: 1) a set of layers containing some explanatory layers and one target layer that holds class labels for tuples in the dataset, and 2) spatial join relations (SJRs) storing spatial measures for features resulting from spatial relations between two layers. The algorithm generates a tree by selecting the best layer to separate the dataset into partitions that are as pure as possible, meaning that all tuples in a partition belong to the same class.

To illustrate how the algorithm works, consider an active fire dataset containing three explanatory layers: land cover (Lland_cover), population density (Lpopulation_density) and river (Lriver), and one target layer (Ltarget) (Fig. 2).
The land cover layer represents polygon features for land cover types. It has a predictive attribute that contains the land cover types in the study area: dryland forest, paddy field, mix garden, and shrubs (Fig. 2a).
The population layer contains polygon features for population density. The layer has a predictive attribute, population class, representing classes for population density (Fig. 2b). The classes for population density are as follows:
• Low: population_density <= 50
• Medium: 50 < population_density <= 150
• High: population_density > 150
The river layer has only two attributes: the identifier of objects and the geometry representation for lines.
The target layer represents point features for true and false alarms. True alarms (T) are active fires (hotspots) and false alarms (F) are random points generated near true alarms.
The algorithm requires spatial measures in the spatial join relation (SJR) between the target layer and an explanatory layer.
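Equations (3)-(5) can be sketched in Python, assuming count-based spatial measures (the measure used for the in relation on point features). The class counts below are taken from the worked example in section IV; the assignment of the true/false split per land-cover type is deduced from the extracted rules, and the function names are our own.

```python
import math

def entropy(measures):
    """H per eq. (3); measures maps class -> spatial measure of that class."""
    total = sum(measures.values())
    return -sum((m / total) * math.log2(m / total)
                for m in measures.values() if m > 0)

def conditional_entropy(partitions):
    """H(S|L) per eq. (4); partitions maps value v -> class measures of L(v, S)."""
    total = sum(sum(p.values()) for p in partitions.values())
    return sum(sum(p.values()) / total * entropy(p)
               for p in partitions.values())

def gain(partitions):
    """Spatial information gain per eq. (5): Gain(L) = H(S) - H(S|L)."""
    overall = {}
    for p in partitions.values():
        for c, m in p.items():
            overall[c] = overall.get(c, 0) + m
    return entropy(overall) - conditional_entropy(partitions)

# Class measures per land-cover value, from the worked example in section IV
land_cover = {
    "dryland_forest": {"T": 7, "F": 3},
    "mix_garden":     {"T": 3, "F": 9},
    "paddy_field":    {"T": 0, "F": 6},
    "shrubs":         {"T": 2, "F": 0},
}
print(round(gain(land_cover), 9))
```

The printed value reproduces the paper's Gain(Lland_cover) = 0.352675712; swapping the T/F labels within a partition leaves every entropy term, and hence the gain, unchanged.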
Algorithm: Generate_SDT (Spatial Decision Tree)
Input:
a. A spatial dataset D, which is a set of training tuples and their associated class labels. These tuples are constructed from a set of layers, P, using spatial relations.
b. A target layer S ∈ P with a target attribute C.
c. A non-empty set of explanatory layers L ⊂ P, where each L ∈ L has a predictive attribute V. P = S ∪ L.
d. The Spatial Join Relation (SJR) on the set of layers P, SJR(P), as defined in (2).
Output: A Spatial Decision Tree
Method:
1  Create a node N;
2  If only one explanatory layer in L then
3    return N as a leaf node labeled with the majority class in D; // majority voting
4  endif
5  If objects in D are all of the same class c then
6    return N as a leaf node labeled with the class c;
7  endif
8  Apply layer_selection_method(D, L, SJR(P)) to find the "best" splitting layer, L*;
9  Label node N with L*;
10 Split D according to the best splitting layer L* into {D(v1), …, D(vm)}. D(vi) is outcome i of splitting layer L* and v1, …, vm are the possible values of the predictive attribute V in L*;
11 L = L − {L*};
12 for each D(vi), i = 1, 2, …, m, do
13   let Ni = Generate_SDT(D(vi), L, SJR(P));
14   Attach node Ni to N and label the edge with the selected value of the predictive attribute V in L*.
15 endfor.

Fig. 1 Extended ID3 decision tree algorithm

[Fig. 2 A set of layers: (a) land cover, (b) population density, (c) river, (d) hotspot occurrences]

[Fig. 3 Target layer overlaid with (a) land cover and (b) population density]

Table I provides the spatial relationships and spatial measures we use to create SJRs.

TABLE I
SPATIAL RELATION AND SPATIAL MEASURE

Target layer | Spatial Relationship | Explanatory Layer   | Spatial Measure
target       | in                   | land_cover          | count
target       | in                   | population_density  | count
target       | distance             | river               | distance

The spatial relationship in and the aggregate function sum are applied to extract all objects in the target layer which are located inside land cover types (Fig. 3a) and population density classes (Fig. 3b).
The spatial relation distance and the aggregate function min are applied to calculate the distance from target objects to the nearest river. The distance from target objects to the nearest river is represented as a numerical value. We transform this minimum distance from a numerical to a categorical attribute because the algorithm requires categorical values for the target and predictive attributes. For that, the minimum distance is classified into three classes based on the following criteria:
• Low: minimum distance (km) <= 1.5
• Medium: 1.5 < minimum distance (km) <= 3
• High: minimum distance (km) > 3

Following the spatial decision tree algorithm, we start building a tree by selecting a root node for the tree. The root node is selected from the explanatory layers based on the value of the spatial information gain for each layer (i.e. land cover, population density and distance to nearest river). For instance, we calculate the spatial information gain for the land cover layer (Lland_cover); the same procedure can be applied to the other explanatory layers. The entropy of the land cover layer for each type of land cover is given, respectively, by:

H(Lland_cover(dryland_forest, C)) = −(3/10) log2 (3/10) − (7/10) log2 (7/10) = 0.8812909

H(Lland_cover(mix_garden, C)) = −(3/12) log2 (3/12) − (9/12) log2 (9/12) = 0.8112781
H(Lland_cover(paddy_field, C)) = −(6/6) log2 (6/6) − (0/6) log2 (0/6) = 0

H(Lland_cover(shrubs, C)) = −(0/2) log2 (0/2) − (2/2) log2 (2/2) = 0

From (4) we calculate the expected entropy value for splitting:

H(S | Lland_cover) = (10/30) × 0.8812909 + (12/30) × 0.8112781 + (6/30) × 0 + (2/30) × 0 = 0.618274883

The entropy for the target layer S is:

H(S) = −(12/30) log2 (12/30) − (18/30) log2 (18/30) = 0.970950594

From (5) we calculate the information gain for the land cover layer:

Gain(Lland_cover) = H(S) − H(S|Lland_cover) = 0.352675712

The spatial information gain for the other layers is as follows:

Gain(Lpopulation_density) = 0.18538127
Gain(Lriver) = 0.097717695

Lland_cover has the highest spatial information gain of the three layers. Therefore Lland_cover is selected as the root of the tree. There are four possible values for land cover types: dryland forest, mix garden, paddy field, and shrubs, which will be assigned as labels of the edges connecting the root node to internal nodes.

The Generate_SDT algorithm is then applied to a set of layers containing new explanatory layers and the target layer to construct a subtree attached to the root node. New explanatory layers are created from the existing explanatory layers, using the best layer and the value vj of the predictive attribute as a selection criterion in a query relating an explanatory layer and the best layer. The tree stops growing when it meets one of the following termination criteria:
1. Only one explanatory layer remains in L. In this situation, the algorithm returns a leaf node labeled with the majority class in the SJR for the best layer and the explanatory layer.
2. The SJR for the best layer and the explanatory layer contains only one class c. Then the algorithm returns a leaf node labeled with the class c.

The graphical depiction of the spatial decision tree generated from P = {Lland_cover, Lpopulation_density, Lriver, target (S)} is shown in Fig. 4. The final spatial decision tree contains 8 leaves and 3 internal nodes, with land cover as the first test attribute (Fig. 4).

[Fig. 4 Spatial decision tree]

Below are the rules extracted from the tree:
1. IF land cover is dryland forest AND population density is low THEN Hotspot Occurrence is True
2. IF land cover is dryland forest AND population density is medium THEN Hotspot Occurrence is True
3. IF land cover is dryland forest AND population density is high THEN Hotspot Occurrence is False
4. IF land cover is mix garden AND distance to nearest river is low THEN Hotspot Occurrence is True
5. IF land cover is mix garden AND distance to nearest river is medium THEN Hotspot Occurrence is True
6. IF land cover is mix garden AND distance to nearest river is high THEN Hotspot Occurrence is False
7. IF land cover is paddy field THEN Hotspot Occurrence is False
8. IF land cover is shrubs THEN Hotspot Occurrence is True

The decision tree has a misclassification error of 16.67% on the training set and 20% on the testing set. The accuracy of the tree on the testing set is 80%: the number of target objects in the testing set is 30 and the number of correctly classified objects is 24.

The proposed algorithm has also been applied to a real active fires dataset for the Rokan Hilir District, Riau Province, Indonesia, with a total area of 896,142.93 ha. The dataset contains five explanatory layers and one target layer. The target layer consists of active fires (hotspots) as true alarm data and non-hotspots as false alarm data randomly generated near hotspots. The explanatory layers include the distance from target objects to the nearest river (dist_river), the distance from target objects to the nearest road (dist_road), land cover, income source and population density at the village level in the Rokan Hilir District. Table II summarizes the number of features in the dataset for each layer.

TABLE II
NUMBER OF FEATURES IN THE DATASET

Layer         | Number of features
dist_river    | 744 points
dist_road     | 744 points
land_cover    | 3107 polygons
income_source | 117 polygons
population    | 117 polygons
target        | 744 points
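The recursion of Fig. 1, illustrated by the worked example above, can be sketched as follows. This is a simplified skeleton under stated assumptions: spatial measures are reduced to tuple counts, the SJR argument is omitted, and the small dataset is invented for the example rather than taken from the paper.

```python
import math

def entropy(tuples):
    """Entropy of a list of (attributes, class) tuples, count-based measure."""
    n = len(tuples)
    counts = {}
    for _, c in tuples:
        counts[c] = counts.get(c, 0) + 1
    return -sum(k / n * math.log2(k / n) for k in counts.values())

def best_layer(tuples, layers):
    """Step 8: pick the splitting layer with the highest information gain."""
    def expected(layer):
        parts = {}
        for attrs, c in tuples:
            parts.setdefault(attrs[layer], []).append((attrs, c))
        return sum(len(p) / len(tuples) * entropy(p) for p in parts.values())
    return max(layers, key=lambda l: entropy(tuples) - expected(l))

def generate_sdt(tuples, layers):
    classes = [c for _, c in tuples]
    if len(layers) == 1:                     # steps 2-4: majority voting
        return max(set(classes), key=classes.count)
    if len(set(classes)) == 1:               # steps 5-7: pure partition
        return classes[0]
    star = best_layer(tuples, layers)        # steps 8-9
    tree = {star: {}}
    for v in sorted({attrs[star] for attrs, _ in tuples}):  # steps 10-14
        subset = [(a, c) for a, c in tuples if a[star] == v]
        tree[star][v] = generate_sdt(subset, [l for l in layers if l != star])
    return tree

# Tiny invented dataset: layer values per target object vs. hotspot class
data = [({"land": "forest", "pop": "low",  "dist": "low"},  "T"),
        ({"land": "forest", "pop": "low",  "dist": "high"}, "T"),
        ({"land": "forest", "pop": "high", "dist": "low"},  "F"),
        ({"land": "paddy",  "pop": "low",  "dist": "low"},  "F"),
        ({"land": "paddy",  "pop": "high", "dist": "high"}, "F"),
        ({"land": "shrubs", "pop": "low",  "dist": "low"},  "T")]
tree = generate_sdt(data, ["land", "pop", "dist"])
print(tree)
```

On this toy data the sketch picks "land" as the root and then splits the forest branch on "pop", mirroring the two-level structure of the tree in Fig. 4.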
The decision tree generated from the proposed spatial decision tree algorithm contains 138 leaves, with the distance from target objects to the nearest river (dist_river) as the first test attribute. The accuracy of the tree on the training set is 74.72%, with 182 of 720 target objects incorrectly classified by the tree. Some preprocessing tasks will be applied to the real spatial dataset, such as smoothing to remove noise from the data, discretization and generalization, in order to obtain a spatial decision tree with higher accuracy.

V. CONCLUSIONS

This paper presents an extended ID3 algorithm that can be applied to a spatial database containing discrete features (polygons, lines and points). Spatial data are organized in a set of layers that can be grouped into two categories, i.e. explanatory layers and a target layer. Two different layers in the database are related using topological relationships or a metric relationship (distance). Quantitative measures, such as area and distance, obtained from relations between two layers are then used in calculating the spatial information gain. The algorithm selects the explanatory layer with the highest information gain as the best splitting layer. This layer separates the dataset into partitions that are as pure as possible, such that all tuples in a partition belong to the same class.

Empirical results show that the algorithm can be used to join two spatial objects in constructing spatial decision trees on a small spatial dataset. Applying the proposed algorithm to the real spatial dataset results in a spatial decision tree containing 138 leaves, and the accuracy of the tree on the training set is 74.72%.

ACKNOWLEDGMENT

The authors would like to thank the Indonesia Directorate General of Higher Education (IDGHE), Ministry of National Education, Indonesia for supporting a PhD Scholarship (Contract No. 1724.2/D4.4/2008) and the Southeast Asian Regional Centre for Graduate Study and Research in Agriculture (SEARCA) for partially supporting the research.

REFERENCES
[1] J. R. Quinlan, "Induction of Decision Trees," Machine Learning, vol. 1, Kluwer Academic Publishers, Boston, pp. 81-106, 1986.
[2] S. Rinzivillo and F. Turini, "Classification in Geographical Information Systems," Lecture Notes in Artificial Intelligence, Berlin Heidelberg: Springer-Verlag, pp. 374-385, 2004.
[3] M. Ester, H.-P. Kriegel, and J. Sander, "Spatial Data Mining: A Database Approach," in Proc. of the Fifth Int. Symposium on Large Spatial Databases, Berlin, Germany, 1997.
[4] K. Zeitouni and N. Chelghoum, "Spatial Decision Tree – Application to Traffic Risk Analysis," in ACS/IEEE International Conference, IEEE, 2001.
[5] K. Koperski, J. Han, and N. Stefanovic, "An efficient two-step method for classification of spatial data," in Symposium on Spatial Data Handling, 1998.
[6] N. Chelghoum, K. Zeitouni, and A. Boulmakoul, "A Decision Tree for Multi-Layered Spatial Data," in Symposium on Geospatial Theory, Processing and Applications, Ottawa, 2002.
[7] K. Zeitouni, L. Yeh, and M. A. Aufaure, "Join Indices as a Tool for Spatial Data Mining," in International Workshop on Temporal, Spatial and Spatio-Temporal Data Mining, 2000.
[8] M. J. Egenhofer and R. D. Franzosa, "Point-set topological spatial relations," International Journal of Geographical Information Systems, vol. 5(2), pp. 161-174, 1991.
[9] P. Valduriez, "Join indices," ACM Trans. on Database Systems, vol. 12(2), pp. 218-246, June 1987.
[10] E. Clementini, P. Di Felice, and P. van Oosterom, "A small set of formal topological relationships suitable for end-user interaction," Lecture Notes in Computer Science, New York: Springer, pp. 277-295, 1993.
[11] M. Ester, H.-P. Kriegel, and J. Sander, "Algorithms and Applications for Spatial Data Mining," in Geographic Data Mining and Knowledge Discovery, Research Monographs in GIS, Taylor and Francis, 2001.
[12] J. Han and M. Kamber, Data Mining: Concepts and Techniques, 2nd ed., San Diego, USA: Morgan Kaufmann, 2006.