@@ -55,16 +55,16 @@ Find a way to generate some API key for your data source.
5555we'd recommend asking Google *"how to generate a personal access token for \<your data source name>"*.
5656
5757
58- It usually involves going to profile settings, creating a new access token, and copying it.
58+ Generating a token usually involves going to your profile settings, creating a new access token, and copying it.
5959
6060
61- Write down the steps you took to generate the API key somewhere , you'll need them later.
61+ Got your token? Write down the steps you took; you'll need them later.
6262
6363<br >
6464
6565---
6666
67- < br >
67+
6868
6969## Let's go
7070This guide will walk you through the process of creating a data source for the imaginary website called `Magic`.
@@ -101,7 +101,7 @@ from datetime import datetime
101101from typing import List, Dict
102102
103103from pydantic import BaseModel
104- from data_source.api.base_data_source import BaseDataSource, ConfigField, HTMLInputType
104+ from data_source.api.base_data_source import BaseDataSource, BaseDataSourceConfig, ConfigField
105105from data_source.api.basic_document import BasicDocument, DocumentType
106106from queues.index_queue import IndexQueue
107107```
@@ -110,7 +110,7 @@ from queues.index_queue import IndexQueue
110110## 3. Create a configuration class
111111Create a class that inherits from ` BaseDataSourceConfig ` for your data source configuration.
112112
113- Add the needed configuration fields.
113+ Add the configuration fields your data source needs.
114114
115115``` python
116116class MagicConfig(BaseDataSourceConfig):
@@ -121,8 +121,9 @@ class MagicConfig(BaseDataSourceConfig):
121121
122122<br >
123123
124- ## 4. Implement your data source class
124+ ## 4. Create a data source class
125125Create a new class that inherits from ` BaseDataSource ` and implement the 3 abstract methods:
126+
1261274.1. ` get_config_fields `
127128
1281294.2. ` validate_config `
@@ -163,8 +164,7 @@ class MagicDataSource(BaseDataSource):
163164```
164165
165166### 4.1. ` get_config_fields `
166- This method should return a list of ` ConfigField ` s that describe the configuration fields required for your data source:
167-
167+ Return a list of `ConfigField`s that describe the configuration fields your data source requires (the same fields as in `MagicConfig`, plus the UI attributes).
168168``` python
169169@staticmethod
170170def get_config_fields() -> List[ConfigField]:
@@ -178,10 +178,13 @@ def get_config_fields() -> List[ConfigField]:
178178 ]
179179```
180180
181- ### 5 .2. ` validate_config `
181+ ### 4.2. `validate_config`
182182This method should validate the provided configuration and raise an `InvalidDataSourceConfig` exception if it's invalid.
183183
184- it should also try to actually connect to the data source and verify that it's working:
184+ It MUST connect to the data source and verify that it's working.
185+
186+ Some data sources provide an `auth_check` method; you can use it to verify the connection.
187+ Otherwise, you can try to list something from the data source.
185188
186189``` python
187190@staticmethod
@@ -200,11 +203,70 @@ def validate_config(config: Dict) -> None:
200203 raise InvalidDataSourceConfig from e
201204```
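The `auth_check`-or-list fallback described above could be sketched roughly as follows. This is only an illustration under stated assumptions: `FakeMagicClient`, its `list_channels` method, the `url`/`token` config keys, and the locally defined `InvalidDataSourceConfig` are hypothetical stand-ins so the snippet runs on its own; your real client library and the project's own exception class will differ.

```python
from typing import Dict


class InvalidDataSourceConfig(Exception):
    """Local stand-in for the exception named in this guide (assumption)."""


class FakeMagicClient:
    """Hypothetical client used only for this sketch."""

    def __init__(self, url: str, token: str) -> None:
        self.url = url
        self.token = token

    def list_channels(self) -> list:
        # Pretend the server accepts exactly one token.
        if self.token != "good-token":
            raise PermissionError("401 Unauthorized")
        return ["general"]


def validate_config(config: Dict) -> None:
    try:
        client = FakeMagicClient(url=config["url"], token=config["token"])
        # Prefer a dedicated auth check when the client offers one...
        auth_check = getattr(client, "auth_check", None)
        if callable(auth_check):
            auth_check()
        else:
            # ...otherwise list something cheap to prove the credentials work.
            client.list_channels()
    except Exception as e:
        # Missing keys, network errors, and auth failures all mean a bad config.
        raise InvalidDataSourceConfig from e
```

Wrapping every failure in a single exception type keeps the caller simple: it only needs to know whether the configuration is usable or not.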
202205
203- ### 5 .3. _ feed_new_documents
204- This method should add new documents to the index queue. The implementation depends on the specific data source you're working with:
206+ ### 4.3. `_feed_new_documents`
207+ This method should add new documents to the index queue. The implementation depends on the specific data source you're working with.
205208
209+ The flow should look like this:
210+
211+ 1. List the spaces/channels/etc. from the data source.
212+ 2. Run tasks to fetch documents from each space/channel.
213+    * Tasks are a built-in Gerev pipeline that runs functions asynchronously on workers for maximum performance.
214+ 3. Parse each document into a `BasicDocument` object.
215+ 4. Feed the `BasicDocument` object to the index queue.
206216``` python
207217def _feed_new_documents(self) -> None:
208- # Fetch new documents from your data source, and add them to the index queue
209- # You can use the IndexQueue.get_instance().put_single(doc=doc) method to add a document to the queue
210- ```
218+     # List the channels, then schedule one task per channel
219+     channels = self._magic_client.list_channels()
220+     for channel in channels:
221+         self._add_task(self._feed_channel, channel)
222+
223+ def _feed_channel(self, channel: Channel) -> None:
224+     messages = self._magic_client.list_messages(channel)
225+     for message in messages:
226+         # Parse each message into a BasicDocument
227+         doc = BasicDocument(
228+             id=message["id"],
229+             data_source_id=self._data_source_id,
230+             type=DocumentType.MESSAGE,
231+             title=message["title"],
232+             content=message.get("description"),
233+             author=message["author"]["name"],
234+             author_image_url=message["author"]["avatar_url"],
235+             location=message["references"]["full"],
236+             url=message["web_url"],
237+             timestamp=message["created_at"],
238+         )
239+         # Feed the document to the index queue
240+         IndexQueue.get_instance().put_single(doc=doc)
238+ ```
239+ 5. Before adding a document to the queue, check whether it is newer than `self._last_index_time`; if not, skip it.
240+ ``` python
241+ last_modified = datetime.strptime(message["updated_at"], "%Y-%m-%dT%H:%M:%S.%fZ")
242+ if last_modified < self._last_index_time:
243+     logger.info(f"Message {message['id']} is too old, skipping")
244+     continue
245+ ```
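To see the freshness check in isolation, here is a small self-contained sketch. It assumes messages are plain dicts with an `updated_at` string in the format used above; `filter_new_messages` and `TIMESTAMP_FORMAT` are hypothetical names introduced only for this example, not part of Gerev.

```python
from datetime import datetime
from typing import Dict, List

# Same timestamp layout as in the snippet above (assumed for the Magic API).
TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"


def filter_new_messages(messages: List[Dict], last_index_time: datetime) -> List[Dict]:
    """Keep only the messages modified at or after the last index time."""
    fresh = []
    for message in messages:
        last_modified = datetime.strptime(message["updated_at"], TIMESTAMP_FORMAT)
        if last_modified < last_index_time:
            # Already indexed in a previous run, so skip it.
            continue
        fresh.append(message)
    return fresh


messages = [
    {"id": 1, "updated_at": "2023-01-01T00:00:00.000Z"},
    {"id": 2, "updated_at": "2023-06-01T09:30:00.000Z"},
]
new_messages = filter_new_messages(messages, datetime(2023, 3, 1))
```

In the data source itself the same comparison sits inside the message loop, right before the document is built and queued.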
246+
247+ ## 5. UI instructions
248+
249+ You should add your data source instructions to the UI.
250+
251+ ### 5.1. data-source-panel.tsx
252+ Go to `gerev/ui/src/components/data-source-panel.tsx` and add your data source's instructions to the HTML.
253+
254+ ``` typescript
255+ {
256+   this.state.selectedDataSource.value === 'Magic' && (
257+     <span className="flex flex-col leading-9 text-xl text-white">
258+       <span>1. {'Go to Magic -> top-right profile picture -> Edit profile'}</span>
259+       <span>2. {'Scroll down to API tokens -> Create token -> Name it'}</span>
260+       <span>3. {"Set 'Expiry Date' 01/01/2100, create, copy token id + token secret"}</span>
261+     </span>
262+   )
263+ }
264+ ```
265+
266+
267+ ## 6. Logo
268+
269+ Add your data source's `logo.png` to `app/static/data_sources_icons`.
270+
271+
272+ :rocket: Done!