-|[GithubDataSource](pyspark_datasources/github.py)|`github`| Read pull requests from a Github repository | None |
-|[FakeDataSource](pyspark_datasources/fake.py)|`fake`| Generate fake data using the `Faker` library |`faker`|
-|[StockDataSource](pyspark_datasources/stock.py)|`stock`| Read stock data from Alpha Vantage | None |
-|[GoogleSheetsDataSource](pyspark_datasources/googlesheets.py)|`googlesheets`| Read table from public Google Sheets | None |
-|[KaggleDataSource](pyspark_datasources/kaggle.py)|`kaggle`| Read datasets from Kaggle |`kagglehub`, `pandas`|
-|[SimpleJsonDataSource](pyspark_datasources/simplejson.py)|`simplejson`| Write JSON data to Databricks DBFS |`databricks-sdk`|
-|[OpenSkyDataSource](pyspark_datasources/opensky.py)|`opensky`| Read from OpenSky Network. | None |
-|[SalesforceDataSource](pyspark_datasources/salesforce.py)|`pyspark.datasource.salesforce`| Streaming datasource for writing data to Salesforce |`simple-salesforce`|
+| Data Source | Short Name | Type | Description | Dependencies | Example |
+|[FakeDataSource](pyspark_datasources/fake.py)|`fake`| Batch/Streaming Read | Generate fake data using the `Faker` library |`faker`|`pip install pyspark-data-sources[fake]`<br/>`spark.read.format("fake").load()` or `spark.readStream.format("fake").load()`|
+|[GithubDataSource](pyspark_datasources/github.py)|`github`| Batch Read | Read pull requests from a Github repository | None |`pip install pyspark-data-sources`<br/>`spark.read.format("github").load("apache/spark")`|
+|[GoogleSheetsDataSource](pyspark_datasources/googlesheets.py)|`googlesheets`| Batch Read | Read table from public Google Sheets | None |`pip install pyspark-data-sources`<br/>`spark.read.format("googlesheets").load("https://docs.google.com/spreadsheets/d/...")`|
+|[StockDataSource](pyspark_datasources/stock.py)|`stock`| Batch Read | Read stock data from Alpha Vantage | None |`pip install pyspark-data-sources`<br/>`spark.read.format("stock").option("symbols", "AAPL,GOOGL").option("api_key", "key").load()`|
+|**Batch Write**||||||
+|[LanceSink](pyspark_datasources/lance.py)|`lance`| Batch Write | Write data in Lance format |`lance`|`pip install pyspark-data-sources[lance]`<br/>`df.write.format("lance").mode("append").save("/tmp/lance_data")`|
+|[WeatherDataSource](pyspark_datasources/weather.py)|`weather`| Streaming Read | Fetch weather data from tomorrow.io | None |`pip install pyspark-data-sources`<br/>`spark.readStream.format("weather").option("locations", "[(37.7749, -122.4194)]").option("apikey", "key").load()`|
+|**Streaming Write**||||||
+|[SalesforceDataSource](pyspark_datasources/salesforce.py)|`pyspark.datasource.salesforce`| Streaming Write | Streaming datasource for writing data to Salesforce |`simple-salesforce`|`pip install pyspark-data-sources[salesforce]`<br/>`df.writeStream.format("pyspark.datasource.salesforce").option("username", "user").start()`|
See more here: https://allisonwang-db.github.io/pyspark-data-sources/.