# DynaML: ML + JVM + Scala

[Gitter](https://gitter.im/DynaML/Lobby?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge&utm_content=badge) [Build Status](https://travis-ci.org/transcendent-ai-labs/DynaML) [JitPack](https://jitpack.io/#transcendent-ai-labs/DynaML)
[Coverage](https://codecov.io/gh/transcendent-ai-labs/DynaML)
[JOSS Paper](http://joss.theoj.org/papers/a561bdd3e960c5b0718c67c3f73c6f3b)

------------------

<br/>

DynaML is a Scala & JVM machine learning toolbox for research, education & industry.

<br/>

<table>
  <tr>
    <th> <img src="images/plot3dsmall.jpeg" alt="Plot3d"> </th>
    <th> <img src="images/plots_small.png" alt="Plot2d"> </th>
  </tr>
</table>

------------------

## Motivation

 - __Interactive.__ Don't want to create Maven/sbt project skeletons
   every time you want to try out ideas? Create and execute [scala worksheets](https://github.com/transcendent-ai-labs/DynaML/blob/master/scripts/randomvariables.sc)
   in the DynaML shell. DynaML comes packaged with a customized version of the [Ammonite](http://ammonite.io) REPL,
   with *auto-complete*, file operations and scripting capabilities.

 - __End to End.__ Create complex pre-processing pipelines with the [data pipes](https://transcendent-ai-labs.github.io/DynaML/pipes/pipes/) API,
   train models ([deep nets](https://github.com/transcendent-ai-labs/DynaML/blob/master/scripts/cifar.sc), [Gaussian processes](https://transcendent-ai-labs.github.io/DynaML/core/core_gp/),
   [linear models](https://transcendent-ai-labs.github.io/DynaML/core/core_glm/) and more),
   optimize over [hyper-parameters](https://transcendent-ai-labs.github.io/DynaML/core/core_opt_global/),
   [evaluate](https://transcendent-ai-labs.github.io/DynaML/core/core_model_evaluation/) model predictions and
   [visualise](https://transcendent-ai-labs.github.io/DynaML/core/core_graphics/) results.

 - __Enterprise Friendly.__ Take advantage of the JVM and Scala ecosystem: use Apache [Spark](https://spark.apache.org)
   to write scalable data analysis jobs and [TensorFlow](http://tensorflow.org) for deep learning, all in the same toolbox.

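The data pipes API mentioned above builds pre-processing pipelines by composing small, reusable stages. As a minimal, self-contained sketch of the idea in plain Scala (illustrative only; the real `DataPipe` lives in `io.github.tailhq.dynaml.pipes` and has a richer interface):

```scala
// A toy pipe: wraps a function and composes with `>`,
// mimicking the spirit of DynaML's DataPipe API.
case class Pipe[A, B](run: A => B) {
  def >[C](next: Pipe[B, C]): Pipe[A, C] = Pipe(run andThen next.run)
  def apply(a: A): B = run(a)
}

// Three pre-processing stages: split a CSV line, parse numbers, average them.
val tokenize  = Pipe[String, Array[String]](_.trim.split(","))
val toDoubles = Pipe[Array[String], Array[Double]](_.map(_.toDouble))
val mean      = Pipe[Array[Double], Double](xs => xs.sum / xs.length)

// Compose the stages into a single pipeline and run it.
val preprocess = tokenize > toDoubles > mean
println(preprocess("1.0, 2.0, 3.0"))  // 2.0
```

Each stage stays independently testable, and the composed pipeline is itself just another pipe that can be reused or composed further.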
------------------

## Getting Started

### Platform Compatibility

Currently, only *nix and OSX platforms are supported.

DynaML is compatible with Scala `2.11`.

### Installation

The easiest way to install DynaML is to clone & compile the [GitHub](https://github.com/transcendent-ai-labs/DynaML) repository. Please take a look at
the [installation](https://transcendent-ai-labs.github.io/DynaML/installation/installation/) instructions
to make sure that you have the pre-requisites, and to configure your installation.
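Alternatively, the JitPack badge above suggests DynaML can be resolved as a library dependency. A hypothetical `build.sbt` fragment, assuming JitPack's standard GitHub-to-Maven coordinate convention (check the [JitPack page](https://jitpack.io/#transcendent-ai-labs/DynaML) for the versions actually published):

```scala
// Hypothetical build.sbt fragment — coordinates follow JitPack's
// convention for GitHub repositories; verify the published versions
// on the project's JitPack page before depending on one.
resolvers += "jitpack" at "https://jitpack.io"

libraryDependencies += "com.github.transcendent-ai-labs" % "DynaML" % "master-SNAPSHOT"
```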

------------------

## CIFAR in 100 lines

Below is a sample [script](https://github.com/transcendent-ai-labs/DynaML/blob/master/scripts/cifar.sc) in which we train a neural network of stacked
[Inception](https://arxiv.org/pdf/1409.4842.pdf) cells on the [CIFAR-10](https://en.wikipedia.org/wiki/CIFAR-10)
image classification task.
```scala
import ammonite.ops._
import io.github.tailhq.dynaml.pipes.DataPipe
import io.github.tailhq.dynaml.tensorflow.data.AbstractDataSet
import io.github.tailhq.dynaml.tensorflow.{dtflearn, dtfutils}
import io.github.tailhq.dynaml.tensorflow.implicits._
import org.platanios.tensorflow.api._
import org.platanios.tensorflow.api.learn.layers.Activation
import org.platanios.tensorflow.data.image.CIFARLoader
import java.nio.file.Paths


val tempdir = home/"tmp"

// Download & load the CIFAR-10 data set.
val dataSet = CIFARLoader.load(
  Paths.get(tempdir.toString()),
  CIFARLoader.CIFAR_10)

// Wrap the training & test splits in DynaML's data set abstraction.
val tf_dataset = AbstractDataSet(
  dataSet.trainImages, dataSet.trainLabels, dataSet.trainLabels.shape(0),
  dataSet.testImages, dataSet.testLabels, dataSet.testLabels.shape(0))

// Shuffle, batch & prefetch the training data.
val trainData =
  tf_dataset.training_data
    .repeat()
    .shuffle(10000)
    .batch(128)
    .prefetch(10)


println("Building the model.")
val input = tf.learn.Input(
  UINT8,
  Shape(
    -1,
    dataSet.trainImages.shape(1),
    dataSet.trainImages.shape(2),
    dataSet.trainImages.shape(3))
)

val trainInput = tf.learn.Input(UINT8, Shape(-1))

val relu_act = DataPipe[String, Activation](tf.learn.ReLU(_))

// Two stacked Inception cells followed by two feed-forward layers.
val architecture = tf.learn.Cast("Input/Cast", FLOAT32) >>
  dtflearn.inception_unit(
    channels = 3, Seq.fill(4)(10),
    relu_act)(layer_index = 1) >>
  dtflearn.inception_unit(
    channels = 40, Seq.fill(4)(5),
    relu_act)(layer_index = 2) >>
  tf.learn.Flatten("Layer_3/Flatten") >>
  dtflearn.feedforward(256)(id = 4) >>
  tf.learn.ReLU("Layer_4/ReLU", 0.1f) >>
  dtflearn.feedforward(10)(id = 5)

val trainingInputLayer = tf.learn.Cast("TrainInput/Cast", INT64)

// Softmax cross-entropy loss, averaged over the batch.
val loss =
  tf.learn.SparseSoftmaxCrossEntropy("Loss/CrossEntropy") >>
  tf.learn.Mean("Loss/Mean") >>
  tf.learn.ScalarSummary("Loss/Summary", "Loss")

val optimizer = tf.train.Adam(0.1)

val summariesDir = Paths.get((tempdir/"cifar_summaries").toString())

// Build & train the model for at most 500 iterations.
val (model, estimator) = dtflearn.build_tf_model(
  architecture, input, trainInput, trainingInputLayer,
  loss, optimizer, summariesDir, dtflearn.max_iter_stop(500),
  100, 100, 100)(
  trainData, true)

// Fraction of predictions whose argmax matches the label.
def accuracy(predictions: Tensor, labels: Tensor): Float =
  predictions.argmax(1)
    .cast(UINT8)
    .equal(labels)
    .cast(FLOAT32)
    .mean()
    .scalar
    .asInstanceOf[Float]

// Generate predictions on the training & test sets.
val (trainingPreds, testPreds): (Option[Tensor], Option[Tensor]) =
  dtfutils.predict_data[
    Tensor, Output, DataType, Shape, Output,
    Tensor, Output, DataType, Shape, Output,
    Tensor, Tensor](
    estimator,
    data = tf_dataset,
    pred_flags = (true, true),
    buff_size = 20000)

val (trainAccuracy, testAccuracy) = (
  accuracy(trainingPreds.get, dataSet.trainLabels),
  accuracy(testPreds.get, dataSet.testLabels))

print("Train accuracy = ")
pprint.pprintln(trainAccuracy)

print("Test accuracy = ")
pprint.pprintln(testAccuracy)
```
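The `accuracy` helper in the script takes the argmax of each row of class scores, compares it against the integer labels, and averages the matches. A dependency-free sketch of the same metric in plain Scala (illustrative only; the script's version operates on TensorFlow tensors):

```scala
// Accuracy over rows of class scores: a row is correct when the index of
// its largest score equals the corresponding integer label.
def accuracy(scores: Array[Array[Double]], labels: Array[Int]): Double = {
  val hits = scores.zip(labels).count { case (row, label) =>
    row.indexOf(row.max) == label
  }
  hits.toDouble / labels.length
}

// Three examples, three classes; the first two rows predict correctly.
val scores = Array(
  Array(0.1, 0.7, 0.2),  // predicted class 1
  Array(0.8, 0.1, 0.1),  // predicted class 0
  Array(0.3, 0.3, 0.4))  // predicted class 2
val labels = Array(1, 0, 1)

println(accuracy(scores, labels))  // 2 of 3 correct
```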

------------------

## Support & Community

 - [Gitter](https://gitter.im/DynaML/Lobby?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge&utm_content=badge)
 - [Contributing](https://github.com/transcendent-ai-labs/DynaML/blob/master/CONTRIBUTING.md)
 - [Code of Conduct](https://github.com/transcendent-ai-labs/DynaML/blob/master/CODE_OF_CONDUCT.md)