
Commit b506681

Adding more datasets
1 parent 9eed746 commit b506681

30 files changed (+29464, -1 lines)

src/data/__init__.py

Lines changed: 2 additions & 1 deletion
@@ -39,7 +39,8 @@ def __init__(self, name='scale'):
         file_path = path.join(base_path, name, "{}.data".format(name))
         with open(path.abspath(file_path), 'rb') as handle:
             for line in handle:
-                data, item = line.split(','), []
+                delimiter = ',' if ',' in line else ' '
+                data, item = line.split(delimiter), []
 
                 # Generate variables if they aren't preset
                 if not self.variables:
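
The added line uses `,` as the delimiter when a line contains a comma and a single space otherwise, presumably so that whitespace-separated dataset files can be loaded through the same path. The file is opened in `'rb'` mode, so the code works as written only on Python 2, where each line is a `str`. On Python 3 each line is `bytes`, and `',' in line` raises `TypeError`.

The sketch below shows a Python 3 version of the same rule. The function name `split_record` is hypothetical. Two choices go beyond the commit and are assumptions: the whitespace branch uses `str.split()` with no argument, and every field is stripped. Splitting on `' '` yields empty fields wherever columns are padded with several spaces, and it leaves the trailing newline attached to the last field.

```python
def split_record(raw):
    """Split one line of a UCI-style .data file into string fields.

    Follows the commit's rule (comma-separated if the line contains a comma,
    otherwise whitespace-separated). Accepts bytes, as produced by
    open(..., 'rb') on Python 3, or str.
    """
    # Python 3 yields bytes in 'rb' mode; decode before testing for ','
    line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
    line = line.strip()  # drop the trailing newline and surrounding blanks
    if "," in line:
        return [field.strip() for field in line.split(",")]
    # split() with no argument treats any run of spaces/tabs as one separator
    return line.split()
```

For example, `split_record(b"1  2   3\n")` gives `['1', '2', '3']`, where `b"1  2   3\n".split(b' ')` would include empty strings.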

src/data/autos/autos.data

Lines changed: 205 additions & 0 deletions

src/data/autos/autos.names

Lines changed: 103 additions & 0 deletions
@@ -0,0 +1,103 @@
1. Title: 1985 Auto Imports Database

2. Source Information:
   -- Creator/Donor: Jeffrey C. Schlimmer (Jeffrey.Schlimmer@a.gp.cs.cmu.edu)
   -- Date: 19 May 1987
   -- Sources:
      1) 1985 Model Import Car and Truck Specifications, 1985 Ward's
         Automotive Yearbook.
      2) Personal Auto Manuals, Insurance Services Office, 160 Water
         Street, New York, NY 10038
      3) Insurance Collision Report, Insurance Institute for Highway
         Safety, Watergate 600, Washington, DC 20037

3. Past Usage:
   -- Kibler, D., Aha, D. W., & Albert, M. (1989). Instance-based prediction
      of real-valued attributes. Computational Intelligence, 5, 51-57.
   -- Predicted the price of a car using all numeric and Boolean attributes.
   -- Method: an instance-based learning (IBL) algorithm derived from a
      localized k-nearest neighbor algorithm. Compared with a
      linear regression prediction...so all instances
      with missing attribute values were discarded. This resulted in
      a training set of 159 instances, which was also used as a test
      set (minus the actual instance during testing).
   -- Results: Percent Average Deviation Error of Prediction from Actual
      -- 11.84% for the IBL algorithm
      -- 14.12% for the resulting linear regression equation
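
The evaluation described above is leave-one-out cross-validation: incomplete rows are dropped, and each remaining instance is then predicted from all of the others. The sketch below illustrates that protocol with a plain k-nearest-neighbour regressor. It is not the IBL algorithm from Kibler, Aha and Albert, because the localized variant is not specified here. Several details are assumptions: k = 3, unweighted Euclidean distance, and the reading of "Percent Average Deviation Error" as the mean of |prediction - actual| / actual, times 100. The names `knn_predict` and `loo_knn_error` are hypothetical. The sketch requires Python 3.8+ for `math.dist`.

```python
import math


def knn_predict(train_x, train_y, query, k=3):
    """Plain k-NN regression (an assumption, not the paper's IBL variant)."""
    # Rank training points by Euclidean distance to the query point
    order = sorted(range(len(train_x)),
                   key=lambda i: math.dist(train_x[i], query))
    nearest = order[:k]
    # Predict the unweighted mean target of the k nearest neighbours
    return sum(train_y[i] for i in nearest) / len(nearest)


def loo_knn_error(xs, ys, k=3):
    """Leave-one-out percent average deviation of k-NN predictions.

    Each instance is predicted from every *other* instance, matching the
    "training set ... also used as a test set (minus the actual instance
    during testing)" protocol. Per-instance error is |pred - actual| / actual
    (an assumed reading of the metric).
    """
    total = 0.0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]   # hold out instance i
        train_y = ys[:i] + ys[i + 1:]
        pred = knn_predict(train_x, train_y, xs[i], k)
        total += abs(pred - ys[i]) / abs(ys[i])
    return 100.0 * total / len(xs)
```

To use this on the autos data, the nominal attributes would first need a numeric encoding, and the continuous ones would need rescaling so that large-valued columns such as `curb-weight` do not dominate the distance. Both steps are left out of this sketch.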

4. Relevant Information:
   -- Description
      This data set consists of three types of entities: (a) the
      specification of an auto in terms of various characteristics, (b)
      its assigned insurance risk rating, and (c) its normalized losses in
      use as compared to other cars. The second of these, the risk rating,
      corresponds to the degree to which the auto is more risky than its
      price indicates. Cars are initially assigned a risk factor symbol
      associated with their price. Then, if a car is more risky (or less),
      this symbol is adjusted by moving it up (or down) the scale.
      Actuaries call this process "symboling". A value of +3 indicates
      that the auto is risky, -3 that it is probably pretty safe.

      The third factor is the relative average loss payment per insured
      vehicle year. This value is normalized for all autos within a
      particular size classification (two-door small, station wagons,
      sports/speciality, etc.), and represents the average loss per car
      per year.

   -- Note: Several of the attributes in the database could be used as a
      "class" attribute.

5. Number of Instances: 205

6. Number of Attributes: 26 total
   -- 15 continuous
   -- 1 integer
   -- 10 nominal

7. Attribute Information:
     Attribute:           Attribute Range:
     -------------------- -----------------------------------------------
  1. symboling:           -3, -2, -1, 0, 1, 2, 3.
  2. normalized-losses:   continuous from 65 to 256.
  3. make:                alfa-romero, audi, bmw, chevrolet, dodge, honda,
                          isuzu, jaguar, mazda, mercedes-benz, mercury,
                          mitsubishi, nissan, peugot, plymouth, porsche,
                          renault, saab, subaru, toyota, volkswagen, volvo.
  4. fuel-type:           diesel, gas.
  5. aspiration:          std, turbo.
  6. num-of-doors:        four, two.
  7. body-style:          hardtop, wagon, sedan, hatchback, convertible.
  8. drive-wheels:        4wd, fwd, rwd.
  9. engine-location:     front, rear.
 10. wheel-base:          continuous from 86.6 to 120.9.
 11. length:              continuous from 141.1 to 208.1.
 12. width:               continuous from 60.3 to 72.3.
 13. height:              continuous from 47.8 to 59.8.
 14. curb-weight:         continuous from 1488 to 4066.
 15. engine-type:         dohc, dohcv, l, ohc, ohcf, ohcv, rotor.
 16. num-of-cylinders:    eight, five, four, six, three, twelve, two.
 17. engine-size:         continuous from 61 to 326.
 18. fuel-system:         1bbl, 2bbl, 4bbl, idi, mfi, mpfi, spdi, spfi.
 19. bore:                continuous from 2.54 to 3.94.
 20. stroke:              continuous from 2.07 to 4.17.
 21. compression-ratio:   continuous from 7 to 23.
 22. horsepower:          continuous from 48 to 288.
 23. peak-rpm:            continuous from 4150 to 6600.
 24. city-mpg:            continuous from 13 to 49.
 25. highway-mpg:         continuous from 16 to 54.
 26. price:               continuous from 5118 to 45400.
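
The attribute table above, together with the "?" missing-value marker, is enough to turn one row of `autos.data` into typed values. The sketch below does this. The column names and their order come from the table. The type mapping is an inference: section 6 lists 15 continuous, 1 integer and 10 nominal attributes, so `symboling` is treated as the single integer attribute, the ten symbolic attributes as strings, and everything else as `float`. The names `COLUMNS`, `NOMINAL` and `parse_row` are hypothetical.

```python
COLUMNS = [
    "symboling", "normalized-losses", "make", "fuel-type", "aspiration",
    "num-of-doors", "body-style", "drive-wheels", "engine-location",
    "wheel-base", "length", "width", "height", "curb-weight", "engine-type",
    "num-of-cylinders", "engine-size", "fuel-system", "bore", "stroke",
    "compression-ratio", "horsepower", "peak-rpm", "city-mpg",
    "highway-mpg", "price",
]

# The 10 nominal attributes; symboling is assumed to be the 1 integer one
NOMINAL = {
    "make", "fuel-type", "aspiration", "num-of-doors", "body-style",
    "drive-wheels", "engine-location", "engine-type", "num-of-cylinders",
    "fuel-system",
}


def parse_row(fields):
    """Map a list of 26 string fields to a dict of typed values."""
    if len(fields) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} fields, got {len(fields)}")
    record = {}
    for name, raw in zip(COLUMNS, fields):
        if raw == "?":              # missing-value marker -> None
            record[name] = None
        elif name in NOMINAL:
            record[name] = raw
        elif name == "symboling":
            record[name] = int(raw)
        else:                       # the 15 continuous attributes
            record[name] = float(raw)
    return record
```

Using `None` for missing values keeps the record length fixed. Rows with any `None` can then be filtered out to reproduce the complete-case setup described under Past Usage.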

8. Missing Attribute Values: (denoted by "?")
   Attribute #:              Number of instances missing a value:
   2.  normalized-losses     41
   6.  num-of-doors          2
   19. bore                  4
   20. stroke                4
   22. horsepower            2
   23. peak-rpm              2
   26. price                 4
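
The missing-value table can be checked against `autos.data` by counting "?" fields in each column. The sketch below assumes each row has already been split into a list of string fields. The helper name `count_missing` is hypothetical. It reports 1-based attribute numbers to match the table. It also counts fully complete rows, which the Past Usage section puts at 159.

```python
from collections import Counter


def count_missing(rows, marker="?"):
    """Count missing values per 1-based attribute number, plus complete rows.

    `rows` is an iterable of field lists, one list per instance.
    Returns (Counter of attribute number -> missing count, complete_rows).
    """
    per_attribute = Counter()
    complete = 0
    for fields in rows:
        missing = [i for i, value in enumerate(fields, start=1) if value == marker]
        per_attribute.update(missing)   # +1 for each missing attribute number
        if not missing:
            complete += 1
    return per_attribute, complete


rows = [["3", "?", "gas"], ["1", "?", "?"], ["0", "65", "diesel"]]
missing, complete = count_missing(rows)
print(dict(missing), complete)   # {2: 2, 3: 1} 1
```

The per-attribute totals cannot simply be summed and subtracted from 205 to get the complete-row count, because one instance can be missing several attributes. Counting complete rows directly avoids that problem.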