@@ -91,12 +91,18 @@ Configuration parameters
9191+-----------------------------------------------+--------------------------------------------------------------------------------+-------------------------+
9292| schema_samplingRatio | 1.0 | No |
9393+-----------------------------------------------+--------------------------------------------------------------------------------+-------------------------+
94- | writeConcern | mongodb.WriteConcern.ACKNOWLEDGED | No |
94+ | writeConcern | "safe" | No |
9595+-----------------------------------------------+--------------------------------------------------------------------------------+-------------------------+
9696| splitSize | 10 | No |
9797+-----------------------------------------------+--------------------------------------------------------------------------------+-------------------------+
9898| splitKey | "fieldName" | No |
9999+-----------------------------------------------+--------------------------------------------------------------------------------+-------------------------+
100+ | splitKeyType | "dataTypeName" | No |
101+ +-----------------------------------------------+--------------------------------------------------------------------------------+-------------------------+
102+ | splitKeyMin | "minvalue" | No |
103+ +-----------------------------------------------+--------------------------------------------------------------------------------+-------------------------+
104+ | splitKeyMax | "maxvalue" | No |
105+ +-----------------------------------------------+--------------------------------------------------------------------------------+-------------------------+
100106| credentials | "user,database,password;user,database,password" | No |
101107+-----------------------------------------------+--------------------------------------------------------------------------------+-------------------------+
102108| updateFields | "fieldName,fieldName" | No |
@@ -117,7 +123,7 @@ Configuration parameters
117123+-----------------------------------------------+--------------------------------------------------------------------------------+-------------------------+
118124| threadsAllowedToBlockForConnectionMultiplier | "5" | No |
119125+-----------------------------------------------+--------------------------------------------------------------------------------+-------------------------+
120- | idasobjectid | "false" | No |
126+ | idAsObjectId | "false" | No |
121127+-----------------------------------------------+--------------------------------------------------------------------------------+-------------------------+
122128| connectionsTime | "180000" | No |
123129+-----------------------------------------------+--------------------------------------------------------------------------------+-------------------------+
@@ -129,12 +135,95 @@ Configuration parameters
129135
130136**Note:** The '_id' field is autogenerated in MongoDB, so by default you can filter it as a String. If you need a custom '_id', you have to set the 'idAsObjectId' property to "false", as in the table above.
131137
138+ There are two ways to set up configuration:
139+
140+ 1. Using MongodbConfigBuilder, which should contain all the configuration values with the right types.
141+
142+ 2. Using the DataFrame API to read/write with a Map[String, String], or setting the configuration from a SQL statement (in String to String format), as in:
143+
144+ ::
145+
146+ CREATE TEMPORARY TABLE tableName USING com.stratio.datasource.mongodb
147+ OPTIONS (host 'host:port', database 'highschool', collection 'students')
148+
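To illustrate the String to String format, here is a minimal Python sketch that renders the statement above from a plain dictionary of options. The helper name ``create_temporary_table_sql`` and the table name are hypothetical and not part of the library; rejecting single quotes in values is a simplification of this sketch, not documented behaviour.

```python
def create_temporary_table_sql(table_name, options):
    """Render a CREATE TEMPORARY TABLE statement from String-to-String options."""
    rendered = []
    for key, value in options.items():
        # Values are written as single-quoted SQL strings, so this sketch
        # rejects embedded single quotes instead of guessing an escaping rule.
        if "'" in value:
            raise ValueError("option values must not contain single quotes")
        rendered.append("%s '%s'" % (key, value))
    return (
        "CREATE TEMPORARY TABLE %s USING com.stratio.datasource.mongodb\n"
        "OPTIONS (%s)" % (table_name, ", ".join(rendered))
    )


# Every key and value is a plain string, as in the table of parameters.
options = {"host": "localhost:27017", "database": "highschool", "collection": "students"}
statement = create_temporary_table_sql("students_table", options)
print(statement)
```

The resulting string can then be passed to ``sqlContext.sql(...)``.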
149+
150+ Credentials
151+ -----------
152+
153+ To connect with credentials you should specify user, database and password.
154+
155+ With MongodbConfigBuilder, you have to create a list of MongodbCredentials. Here is an example:
156+
157+ ::
158+
159+ MongodbConfigBuilder(Map(Host -> List("localhost:27017"), Database -> "highschool", Collection -> "students",
160+ Credentials -> List(com.stratio.datasource.mongodb.MongodbCredentials(user, database, password.toCharArray))
161+ )).build
162+
163+
164+ Otherwise (String format), you have to use the format shown in the table above.
165+
166+ One credential:
167+
168+ ::
169+
170+ "user,database,password"
171+
172+
173+
174+ Two credentials:
175+
176+ ::
177+
178+ "user1,database1,password1;user2,database2,password2"
179+
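As a rough sketch of the String format above, the following Python helper joins (user, database, password) triples into a single ``credentials`` value. The function name ``credentials_option`` is hypothetical. The check against ',' and ';' inside a field is an assumption of this sketch, since the format shown here has no visible escaping rule.

```python
def credentials_option(credentials):
    """Join (user, database, password) triples into one 'credentials' string."""
    parts = []
    for user, database, password in credentials:
        for field in (user, database, password):
            # Assumption: ',' and ';' are separators, so a field containing
            # either of them would make the string ambiguous.
            if "," in field or ";" in field:
                raise ValueError("credential fields must not contain ',' or ';'")
        parts.append(",".join((user, database, password)))
    # Individual credentials are separated by ';'.
    return ";".join(parts)


one = credentials_option([("user", "database", "password")])
two = credentials_option([
    ("user1", "database1", "password1"),
    ("user2", "database2", "password2"),
])
print(one)
print(two)
```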
180+
181+
182+ SplitKey parameters
183+ -------------------
184+
185+ An index is needed in the splitKey field.
186+
187+ All splitKey parameters are optional.
188+
189+ splitKey: Field to split on.
190+
191+ splitSize: Max size of each chunk in MB.
192+
193+ If you want to use explicit boundaries to choose which data to get from MongoDB, you have to set these parameters:
194+
195+ - splitKeyType: Data type of the splitKey field. The following MongoDB types are supported:
196+ - "isoDate"
197+ - "int"
198+ - "long"
199+ - "double"
200+ - "string"
201+
202+ - splitKeyMin: Min value of the split in string format.
203+
204+ - splitKeyMax: Max value of the split in string format.
205+
206+ **Note:** Only data between the boundaries will be available.
207+
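The splitKey parameters can be assembled as plain string options. The Python sketch below is illustrative only: the helper ``split_key_options`` and the field name ``age`` are hypothetical, and requiring the type and both boundaries together is an assumption of this sketch rather than documented behaviour.

```python
SUPPORTED_SPLIT_KEY_TYPES = ("isoDate", "int", "long", "double", "string")


def split_key_options(split_key, key_type, key_min, key_max, split_size=None):
    """Build the splitKey* options as a String-to-String dictionary."""
    if key_type not in SUPPORTED_SPLIT_KEY_TYPES:
        raise ValueError("unsupported splitKeyType: %r" % (key_type,))
    options = {
        "splitKey": split_key,
        "splitKeyType": key_type,
        # Boundaries are passed in string format.
        "splitKeyMin": str(key_min),
        "splitKeyMax": str(key_max),
    }
    if split_size is not None:
        # Max size of each chunk in MB.
        options["splitSize"] = str(split_size)
    return options


opts = split_key_options("age", "int", 18, 30, split_size=10)
print(opts)
```

These entries could then be merged with ``host``, ``database`` and ``collection`` and passed to ``options(**opts)`` on a DataFrameReader.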
208+
132209Examples
133210========
134211
135212Scala API
136213---------
137214
215+ Launch the spark shell:
216+ ::
217+
218+ $ bin/spark-shell --packages com.stratio.datasource:spark-mongodb_2.10:<VERSION>
219+
220+ If you are using the spark shell, a SQLContext is already created and is available as a variable: 'sqlContext'.
221+ Alternatively, you could create a SQLContext instance in your spark application code:
222+
223+ ::
224+
225+ val sqlContext = new SQLContext(sc)
226+
138227To read a DataFrame from a Mongo collection, you can use the library by loading the implicits from `com.stratio.datasource.mongodb._ `.
139228
140229To save a DataFrame in MongoDB you should use the saveToMongodb() function as follows:
@@ -172,8 +261,8 @@ In the example we can see how to use the fromMongoDB() function to read from Mon
172261 val readConfig = builder.build()
173262 val mongoRDD = sqlContext.fromMongoDB(readConfig)
174263 mongoRDD.registerTempTable("students")
175- sqlContext.sql("SELECT name, age FROM students")
176-
264+ val dataFrame = sqlContext.sql("SELECT name, age FROM students")
265+ dataFrame.show
177266
178267
179268If you want to use a SSL connection, you need to add this 'import', and add 'SSLOptions' to the MongodbConfigBuilder:
@@ -191,9 +280,9 @@ Using StructType:
191280
192281 import org.apache.spark.sql.types._
193282 val schemaMongo = StructType(StructField("name", StringType, true) :: StructField("age", IntegerType, true ) :: Nil)
194- sqlContext.createExternalTable("mongoTable", "com.stratio.datasource.mongodb", schemaMongo, Map("host" -> "localhost:27017", "database" -> "highschool", "collection" -> "students"))
283+ val df = sqlContext.read.schema(schemaMongo).format("com.stratio.datasource.mongodb").options(Map("host" -> "localhost:27017", "database" -> "highschool", "collection" -> "students")).load
284+ df.registerTempTable("mongoTable")
195285 sqlContext.sql("SELECT * FROM mongoTable WHERE name = 'Torcuato'").show()
196- sqlContext.sql("DROP TABLE mongoTable")
197286
198287
199288Using DataFrameWriter:
@@ -242,9 +331,19 @@ Then:
242331::
243332
244333 from pyspark.sql import SQLContext
245- sqlContext.sql("CREATE TEMPORARY TABLE students_table USING com.stratio.datasource.mongodb OPTIONS (host 'host:port ', database 'highschool', collection 'students')")
334+ sqlContext.sql("CREATE TEMPORARY TABLE students_table USING com.stratio.datasource.mongodb OPTIONS (host 'localhost:27017', database 'highschool', collection 'students')")
246335 sqlContext.sql("SELECT * FROM students_table").collect()
247336
337+ Using DataFrameReader and DataFrameWriter:
338+ ::
339+
340+ df = sqlContext.read.format('com.stratio.datasource.mongodb').options(host='localhost:27017', database='highschool', collection='students').load()
341+ df.select("name").collect()
342+
343+ df.select("name").write.format("com.stratio.datasource.mongodb").mode('overwrite').options(host='localhost:27017', database='highschool', collection='studentsview').save()
344+ dfView = sqlContext.read.format('com.stratio.datasource.mongodb').options(host='localhost:27017', database='highschool', collection='studentsview').load()
345+ dfView.show()
346+
248347Java API
249348--------
250349