Python SDK

RisingWave provides a Python SDK risingwave-py to help users develop event-driven applications. This SDK provides a simple way to perform ad-hoc queries, subscribe to changes, and define event handlers for tables and materialized views, making it easier to integrate real-time data into applications.

Use `risingwave-py` to connect to RisingWave

risingwave-py is a RisingWave Python SDK that provides the following capabilities:

Interact with RisingWave via Pandas DataFrame.
Subscribe and process changes from RisingWave tables or materialized views.
Run SQL commands supported in RisingWave.

Run RisingWave

To learn about how to run RisingWave, see Run RisingWave.

Connect to RisingWave

To connect to RisingWave via risingwave-py:

from risingwave import RisingWave, RisingWaveConnOptions  # Connect to RisingWave instance on localhost with named parameters rw = RisingWave(  RisingWaveConnOptions.from_connection_info(  host="localhost", port=4566, user="root", password="root", database="dev"  ) )  # Connect to RisingWave instance on localhost with connection string rw = RisingWave(RisingWaveConnOptions("postgresql://root:root@localhost:4566/dev"))  # You can create a new SQL connection and execute operations under the with statement.  # This is the recommended way for python sdk usage. with rw.getconn() as conn:  conn.insert(...)  conn.fetch(...)  conn.execute(...)  conn.mv(...)  conn.on_change(...)   # You can also use the existing connection created by RisingWave object to execute operations. # This will be used in the later sections for simplicity. rw.insert(...) rw.fetch(...) rw.execute(...) rw.mv(...) rw.on_change(...)  

Ingestion into RisingWave

Load a Pandas DataFrame into RisingWave:

from datetime import datetime import pandas as pd  df = pd.DataFrame(  {  "product": ["foo", "bar"],  "price": [123.4, 456.7],  "ts": [datetime.strptime("2023-10-05 14:30:00", "%Y-%m-%d %H:%M:%S"),   datetime.strptime("2023-10-05 14:31:20", "%Y-%m-%d %H:%M:%S")],  } )  # A test table will be created if not exist in risingwave with the correct schema rw.insert(table_name="test", data=df)  # You can provide an optional force_flush parameter and set it to True # if you would the inserted data to be visible in fetch query immediately. # Otherwise, data will be inserted in batches asynchronously for better performance. # rw.insert(table_name="test", data=df, force_flush = True) 

Load data into RisingWave from external systems:

# Create a table and load data from upstream kafka rw.execute("""  CREATE TABLE IF NOT EXISTS source_abc  WITH (  connector='kafka',  properties.bootstrap.server='localhost:9092',  topic='test_topic'  )   FORMAT UPSERT ENCODE AVRO (  schema.registry = 'http://127.0.0.1:8081',  schema.registry.username='your_schema_registry_username',  schema.registry.password='your_schema_registry_password' )""") 

For supported sources and the SQL syntax, see this topic.

Query from RisingWave

from risingwave import OutputFormat  result: pd.DataFrame = rw.fetch("""  SELECT window_start, window_end, product, ROUND(avg(price)) as avg_price  FROM tumble(test, ts, interval '10 seconds')   GROUP BY window_start, window_end, product""",   format=OutputFormat.DATAFRAME)  print(result) # Output: # window_start window_end product avg_price # 0 2023-10-05 14:31:20 2023-10-05 14:31:30 bar 457.0 # 1 2023-10-05 14:30:00 2023-10-05 14:30:10 foo 123.0  # You can also use OutputFormat.RAW to get back list of tuples as the query results # rw.fetch("...", format=OutputFormat.RAW) # [(datetime.datetime(2023, 10, 5, 14, 31, 20), datetime.datetime(2023, 10, 5, 14, 31, 30), 'bar', 457.0),  # (datetime.datetime(2023, 10, 5, 14, 30), datetime.datetime(2023, 10, 5, 14, 30, 10), 'foo', 123.0)] 

Event-driven processing with RisingWave

Event-driven applications depend on real-time data processing to react to events as they occur. With risingwave-py, you can define materialized views using SQL and run them in RisingWave. Behind the scenes, events are processed continuously, and the results are incrementally maintained. In the following example, test_mv is created to incrementally maintain the result of the defined SQL as events are ingested in to the test table.

mv = rw.mv(name="test_mv",  stmt="""SELECT window_start, window_end, product, ROUND(avg(price)) as avg_price  FROM tumble(test, ts, interval '10 seconds')   GROUP BY window_start, window_end, product""") 

In addition to using SQL to do ad-hoc query on tables and materialized views. With risingwave-py, You can also subscribe changes from table / materialized view and define handler of the change events from table / materialized view for you applications.

# Write your own handler # the event will contains all fields of the subscribed MV/Table plus two additional columns: # - op: varchar. The change operations. Valid values: [Insert, UpdateInsert, Delete, UpdateDelete].  # The reason why we have UpdateInsert and UpdateDelete is because RisingWave treats an UPDATE  # as a delete of the old value followed by an insert of the new value. # - rw_timestamp: bigint. The Unix timestamp in milliseconds when the data was written. def simple_event_handler(event: pd.DataFrame):  for _, row in event.iterrows():  # Trigger an action (e.g. place an order via REST API) when the avg_price exceeds 300  if (row["op"] == "UpdateInsert" or row["op"] == "Insert") and row["avg_price"] >= 300:  print(  f"{row['window_start']} - {row['window_end']}: {row['product']} avg price {row['avg_price']} exceeds 300")  # ...   import threading  # Subscribe changes of a materialized view and feed it to your own handler. # Run on_change in a separate thread without blocking the main thread. threading.Thread(  target=lambda: mv.on_change(  # Pass your handler here  handler = simple_event_handler,  # Support DATAFRAME and RAW tuples here  output_format=OutputFormat.DATAFRAME,  # If set to True, progress of the subscription will be saved and can be recovered on python application crashed  persist_progress=True,   # Maximal number of rows returned each time  max_batch_size = 10)  ).start()  # Subscribe changes of a table and print them to console. # Run on_change in a separate thread without blocking the main thread. threading.Thread(  target=lambda: rw.on_change(  subscribe_from="test",  handler = lambda data: print(data),  output_format=OutputFormat.RAW,  persist_progress=False,   max_batch_size = 5)  ).start()   # Future inserted data into the base table will be reflected in the subscriptions # df = pd.DataFrame( # { # "product": ["foo", "bar"], # "price": [1000.2, 5000.4], # "ts": [datetime.strptime("2023-10-05 17:30:00", "%Y-%m-%d %H:%M:%S"),  # datetime.strptime("2023-10-05 17:31:20", "%Y-%m-%d %H:%M:%S")], # } # ) # rw.insert(table_name="test", data=df) 

For more details, please refer to the risingwave-py GitHub repo.

SQL reference

User-defined functions

Python SDK

Use `risingwave-py` to connect to RisingWave

Run RisingWave

Connect to RisingWave

Ingestion into RisingWave

Query from RisingWave

Event-driven processing with RisingWave

SQL reference

Python SDK

User-defined functions

​Use risingwave-py to connect to RisingWave

​Run RisingWave

​Connect to RisingWave

​Ingestion into RisingWave

​Query from RisingWave

​Event-driven processing with RisingWave

Use `risingwave-py` to connect to RisingWave

Run RisingWave

Connect to RisingWave

Ingestion into RisingWave

Query from RisingWave

Event-driven processing with RisingWave