Red Datasets provides classes for common datasets such as the Iris dataset.
You can use the datasets easily because each one can be accessed in multiple ways, such as #each and Apache Arrow Record Batch.
% gem install red-datasets

- Adult Dataset
- Aozora Bunko
- California Housing
- CIFAR-10 Dataset
- CIFAR-100 Dataset
- CLDR language plural rules
- Communities and crime
- Diamonds Dataset
- E-Stat Japan
- Fashion-MNIST
- Fuel Economy Dataset
- Geolonia Japanese Addresses
- Hepatitis
- House of Councillors of Japan
- House of Representatives of Japan
- Iris Dataset
- Libsvm
- MNIST database
- Mushroom
- Penguins
- The Penn Treebank Project
- PMJT - Pre-Modern Japanese Text dataset list
- Postal Codes in Japan
- Rdatasets
- Seaborn
- Sudachi Synonym Dictionary
- Wikipedia
- Wine Dataset
Here is an example that accesses the Iris dataset by #each, Table#to_h, or Table#fetch_values:

    require "datasets"

    iris = Datasets::Iris.new
    iris.each do |record|
      p [
        record.sepal_length,
        record.sepal_width,
        record.petal_length,
        record.petal_width,
        record.label,
      ]
    end
    # => [5.1, 3.5, 1.4, 0.2, "Iris-setosa"]
    # => [4.9, 3.0, 1.4, 0.2, "Iris-setosa"]
    #    :
    # => [7.0, 3.2, 4.7, 1.4, "Iris-versicolor"]

    iris_hash = iris.to_table.to_h
    p iris_hash[:sepal_length]
    # => [5.1, 4.9, .. , 7.0, ..
    p iris_hash[:sepal_width]
    # => [3.5, 3.0, .. , 3.2, ..
    p iris_hash[:petal_length]
    # => [1.4, 1.4, .. , 4.7, ..
    p iris_hash[:petal_width]
    # => [0.2, 0.2, .. , 1.4, ..
    p iris_hash[:label]
    # => ["Iris-setosa", "Iris-setosa", .. , "Iris-versicolor", ..

    iris_table = iris.to_table
    p iris_table.fetch_values(:sepal_length, :sepal_width, :petal_length, :petal_width).transpose
    # => [[5.1, 3.5, 1.4, 0.2],
    #     [4.9, 3.0, 1.4, 0.2],
    #     :
    #     [7.0, 3.2, 4.7, 1.4],
    #     :

    p iris_table[:label]
    # => ["Iris-setosa", "Iris-setosa", .. , "Iris-versicolor", ..

Here is an example that accesses the CIFAR-10 and CIFAR-100 datasets by #each:

CIFAR-10

    require "datasets"

    cifar = Datasets::CIFAR.new(n_classes: 10, type: :train)
    cifar.metadata
    #=> #<struct Datasets::Metadata name="CIFAR-10", url="https://www.cs.toronto.edu/~kriz/cifar.html", licenses=nil, description="CIFAR-10 is 32x32 image dataset">
    cifar.each do |record|
      p record.pixels
      # => [59, 43, 50, 68, 98, 119, 139, 145, 149, 143, .....]
      p record.label
      # => 6
    end

CIFAR-100
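In both CIFAR variants, record.pixels is a flat array rather than a nested image. The following is a minimal sketch of reading one pixel's color from it. It assumes the array keeps the layout of the upstream CIFAR binary files: 3072 values per 32x32 image, with 1024 red values first, then 1024 green, then 1024 blue, each plane in row-major order. The helper name cifar_rgb_at is hypothetical and not part of Red Datasets, and the pixels array below is synthetic so the snippet runs without downloading anything.

```ruby
CIFAR_WIDTH = 32
CIFAR_HEIGHT = 32
CIFAR_PLANE_SIZE = CIFAR_WIDTH * CIFAR_HEIGHT # 1024 values per color channel

# Hypothetical helper (not part of Red Datasets).
# Returns [red, green, blue] for the pixel at column x, row y,
# assuming the channel-planar, row-major layout described above.
def cifar_rgb_at(pixels, x, y)
  offset = y * CIFAR_WIDTH + x
  red, green, blue = pixels.each_slice(CIFAR_PLANE_SIZE).to_a
  [red[offset], green[offset], blue[offset]]
end

# Synthetic stand-in for record.pixels: a uniformly orange image.
pixels = Array.new(CIFAR_PLANE_SIZE, 255) +
         Array.new(CIFAR_PLANE_SIZE, 128) +
         Array.new(CIFAR_PLANE_SIZE, 0)

p cifar_rgb_at(pixels, 0, 0)
# => [255, 128, 0]
```

With a real record, pass record.pixels in place of the synthetic array. If the layout assumption does not hold for your version of the library, the channel values will come out mixed up, so it is worth checking one known image first.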

    require "datasets"

    cifar = Datasets::CIFAR.new(n_classes: 100, type: :test)
    cifar.metadata
    #=> #<struct Datasets::Metadata name="CIFAR-100", url="https://www.cs.toronto.edu/~kriz/cifar.html", licenses=nil, description="CIFAR-100 is 32x32 image dataset">
    cifar.each do |record|
      p record.pixels
      #=> [199, 196, 195, 195, 196, 197, 198, 198, 199, .....]
      p record.coarse_label
      #=> 10
      p record.fine_label
      #=> 49
    end

MNIST
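An MNIST record exposes its image through record.pixels as a flat array. The following is a minimal sketch of turning such an array into text rows for a quick visual check. It assumes 784 grayscale values (28x28) in row-major order, as in the upstream MNIST files. The helper name mnist_ascii_rows and its threshold parameter are hypothetical and not part of Red Datasets, and the pixels array is synthetic so the snippet runs without a download.

```ruby
MNIST_WIDTH = 28

# Hypothetical helper (not part of Red Datasets).
# Converts a flat array of grayscale values (0..255) into text rows,
# using "#" for values at or above the threshold and "." otherwise.
def mnist_ascii_rows(pixels, threshold: 128)
  pixels.each_slice(MNIST_WIDTH).map do |row|
    row.map { |value| value >= threshold ? "#" : "." }.join
  end
end

# Synthetic stand-in for record.pixels: one bright vertical bar at column 14.
pixels = Array.new(MNIST_WIDTH * MNIST_WIDTH) do |index|
  index % MNIST_WIDTH == 14 ? 255 : 0
end

rows = mnist_ascii_rows(pixels)
puts rows
```

With a real record, calling mnist_ascii_rows(record.pixels) should print a rough outline of the handwritten digit, provided the layout assumption holds.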

    require "datasets"

    mnist = Datasets::MNIST.new(type: :train)
    mnist.metadata
    #=> #<struct Datasets::Metadata name="MNIST-train", url="http://yann.lecun.com/exdb/mnist/", licenses=nil, description="a training set of 60,000 examples">
    mnist.each do |record|
      p record.pixels
      # => [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, .....]
      p record.label
      # => 5
    end

- Fork https://github.com/red-data-tools/red-datasets
- Create a feature branch from master
- Develop in the feature branch
- Open a pull request from the feature branch to https://github.com/red-data-tools/red-datasets
The MIT license. See LICENSE.txt for details.