Commit c65dd6c

Create avro-parquet.scala
1 parent b7db87d commit c65dd6c

1 file changed: +13 -0 lines changed

avro-parquet.scala

Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
+import com.databricks.spark.avro._
+
+// Valid Avro compression codecs: snappy, deflate, uncompressed
+sqlContext.setConf("spark.sql.avro.compression.codec", "snappy")
+sqlContext.setConf("spark.sql.avro.deflate.level", "5") // only takes effect when the Avro codec is "deflate"
+sqlContext.setConf("spark.sql.parquet.compression.codec", "snappy")
+
+val ratings_all = sqlContext.read.format("com.databricks.spark.csv").option("header", "true").option("inferSchema", "true").load("/input/ratings-all/ratings-all.csv")
+
+// coalesce(1) merges the data into a single partition, so each output directory gets a single data file
+ratings_all.coalesce(1).write.parquet("/input/ratings_all_parquet")
+ratings_all.coalesce(1).write.format("com.databricks.spark.avro").save("/input/ratings_all_avro")
+
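
The following is a minimal sketch, not part of this commit, of how the two outputs might be read back to check the writes. It assumes a spark-shell session on Spark 1.4 or later with the spark-avro package loaded, where `sqlContext` is predefined, and it assumes both writes in the file above succeeded. The names `parquetBack` and `avroBack` are hypothetical.

```scala
// Assumption: run in spark-shell (Spark 1.4+) with the spark-avro package on the
// classpath, after both output directories from the commit have been written.
val parquetBack = sqlContext.read.parquet("/input/ratings_all_parquet")
val avroBack = sqlContext.read.format("com.databricks.spark.avro").load("/input/ratings_all_avro")

// Print both schemas so the column names and types can be compared by eye.
parquetBack.printSchema()
avroBack.printSchema()

// The row counts would be expected to match, since both copies come from the same DataFrame.
println(s"parquet rows: ${parquetBack.count()}, avro rows: ${avroBack.count()}")
```

If the counts differ, or the schemas disagree in more than type widths, one of the writes likely did not complete.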
