
Commit 515fbec

CONTRIBUTING.md
1 parent 6a3a633 commit 515fbec


1 file changed: 77 additions, 15 deletions


CONTRIBUTING.md

````diff
@@ -55,16 +55,16 @@ Find a way to generate some API key for your data source.
 we'd recommend asking Google *"how to generate personal access token for \<your data source name>".*
 
 
-It usually involves going to profile settings, creating a new access token, and copying it.
+Generating a token usually involves going to your profile settings, creating a new access token, and copying it.
 
 
-Write down the steps you took to generate the API key somewhere, you'll need them later.
+Got your token? Write down the steps you took to get it; you'll need them later.
 
 <br>
 
 ---
 
-<br>
+
 
 ## Let's go
 This guide will walk you through the process of creating a data source for the imaginary website called `Magic`.
````
````diff
@@ -101,7 +101,7 @@ from datetime import datetime
 from typing import List, Dict
 
 from pydantic import BaseModel
-from data_source.api.base_data_source import BaseDataSource, ConfigField, HTMLInputType
+from data_source.api.base_data_source import BaseDataSource, BaseDataSourceConfig, ConfigField
 from data_source.api.basic_document import BasicDocument, DocumentType
 from queues.index_queue import IndexQueue
 ```
````
````diff
@@ -110,7 +110,7 @@ from queues.index_queue import IndexQueue
 ## 3. Create a configuration class
 Create a class that inherits from `BaseDataSourceConfig` for your data source configuration.
 
-Add the needed configuration fields.
+Add your data source's configuration fields.
 
 ```python
 class MagicConfig(BaseDataSourceConfig):
````
````diff
@@ -121,8 +121,9 @@ class MagicConfig(BaseDataSourceConfig):
 
 <br>
 
-## 4. Implement your data source class
+## 4. Create a data source class
 Create a new class that inherits from `BaseDataSource` and implement the 3 abstract methods:
+
 4.1. `get_config_fields`
 
 4.2. `validate_config`
````
````diff
@@ -163,8 +164,7 @@ class MagicDataSource(BaseDataSource):
 ```
 
 ### 4.1. `get_config_fields`
-This method should return a list of `ConfigField`s that describe the configuration fields required for your data source:
-
+Return a list of `ConfigField`s describing the configuration fields your data source requires (the same fields as in `MagicConfig`, but with their UI properties).
 ```python
 @staticmethod
 def get_config_fields() -> List[ConfigField]:
````
````diff
@@ -178,10 +178,13 @@ def get_config_fields() -> List[ConfigField]:
     ]
 ```
 
-### 5.2. `validate_config`
+### 4.2. `validate_config`
 This method should validate the provided configuration and raise an InvalidDataSourceConfig exception if it's invalid.
 
-it should also try to actually connect to the data source and verify that it's working:
+It MUST connect to the data source and verify that it's working.
+
+Some data sources have an `auth_check` method; you can use it to verify the connection.
+Otherwise, try to list something from the data source.
 
 ```python
 @staticmethod
````
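The `validate_config` rule in this hunk (actually connect to the data source, and report any failure as `InvalidDataSourceConfig`) can be sketched as below. This is a minimal, self-contained sketch under stated assumptions: `MagicClient`, its `auth_check` method, and the `url`/`token` config keys are hypothetical stand-ins for the imaginary `Magic` service, `InvalidDataSourceConfig` is defined locally instead of being imported from Gerev, and the method is shown as a plain function rather than a `@staticmethod`.

```python
from typing import Dict


class InvalidDataSourceConfig(Exception):
    """Local stand-in for Gerev's InvalidDataSourceConfig exception."""


class MagicClient:
    """Hypothetical API client for the imaginary Magic website."""

    def __init__(self, url: str, token: str) -> None:
        self.url = url
        self.token = token

    def auth_check(self) -> None:
        # A real client would call the service here; this stub only
        # rejects an empty token so the example runs offline.
        if not self.token:
            raise PermissionError("token rejected")


def validate_config(config: Dict) -> None:
    try:
        client = MagicClient(url=config["url"], token=config["token"])
        # Talk to the data source instead of only checking that the
        # fields are present.
        client.auth_check()
    except Exception as e:
        # Missing keys, network errors and auth failures all surface
        # as the single exception type the caller expects.
        raise InvalidDataSourceConfig from e
```

Wrapping with `raise ... from e` keeps the original error (a `KeyError` for a missing field, or whatever the client raised) available as `__cause__`, which helps when debugging a rejected config.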
````diff
@@ -200,11 +203,70 @@ def validate_config(config: Dict) -> None:
         raise InvalidDataSourceConfig from e
 ```
 
-### 5.3. _feed_new_documents
-This method should add new documents to the index queue. The implementation depends on the specific data source you're working with:
+### 4.3. `_feed_new_documents`
+This method should add new documents to the index queue. The implementation depends on the specific data source you're working with.
 
+The flow should look like this:
+
+1. List spaces/channels/etc. from the data source.
+2. Run tasks to fetch documents from each space/channel/etc.
+   * Tasks are a built-in Gerev pipeline that runs async functions with workers for maximum performance.
+3. Parse each document into a `BasicDocument` object.
+4. Feed the `BasicDocument` object to the index queue.
 ```python
 def _feed_new_documents(self) -> None:
-    # Fetch new documents from your data source, and add them to the index queue
-    # You can use the IndexQueue.get_instance().put_single(doc=doc) method to add a document to the queue
-```
+    channels = self._magic_client.list_channels()
+    for channel in channels:
+        self._add_task(self._fetch_channel, channel)
+
+def _fetch_channel(self, channel: Channel) -> None:
+    messages = self._magic_client.list_messages(channel)
+    for message in messages:
+        doc = BasicDocument(
+            id=message["id"],
+            data_source_id=self._data_source_id,
+            type=DocumentType.MESSAGE,
+            title=message["title"],
+            content=message.get("description"),
+            author=message["author"]["name"],
+            author_image_url=message["author"]["avatar_url"],
+            location=message["references"]["full"],
+            url=message["web_url"],
+            timestamp=message["created_at"],
+        )
+        IndexQueue.get_instance().put_single(doc=doc)
+```
````
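The four-step flow in this hunk (list channels, fan out one task per channel, parse each message, feed the queue) can be exercised outside Gerev with the sketch below. It is an illustration under assumptions, not Gerev's implementation: a `concurrent.futures.ThreadPoolExecutor` stands in for the built-in task pipeline behind `_add_task`, a standard-library `queue.Queue` stands in for `IndexQueue`, `Document` is a trimmed stand-in for `BasicDocument`, and `FakeMagicClient` is a hypothetical client serving canned data.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from queue import Queue
from typing import Dict, List


@dataclass
class Document:
    """Stand-in for Gerev's BasicDocument (only a few of its fields)."""
    id: str
    title: str
    content: str
    author: str


class FakeMagicClient:
    """Hypothetical client returning canned channels and messages."""

    def __init__(self, data: Dict[str, List[Dict]]) -> None:
        self._data = data

    def list_channels(self) -> List[str]:
        return list(self._data)

    def list_messages(self, channel: str) -> List[Dict]:
        return self._data[channel]


class MagicFeeder:
    def __init__(self, client: FakeMagicClient, index_queue: Queue) -> None:
        self._client = client
        self._index_queue = index_queue  # queue.Queue is thread-safe

    def _fetch_channel(self, channel: str) -> None:
        # Steps 3 and 4: parse each raw message and feed it to the queue.
        for message in self._client.list_messages(channel):
            doc = Document(
                id=message["id"],
                title=message["title"],
                content=message.get("description", ""),
                author=message["author"]["name"],
            )
            self._index_queue.put(doc)

    def feed_new_documents(self) -> None:
        # Step 1: list the containers (channels) first.
        channels = self._client.list_channels()
        # Step 2: one task per channel; the pool stands in for Gerev's
        # task workers. Leaving the `with` block waits for every task.
        with ThreadPoolExecutor(max_workers=4) as pool:
            futures = [pool.submit(self._fetch_channel, c) for c in channels]
        for future in futures:
            future.result()  # re-raise any exception from a worker
```

With one message in each of two channels, `feed_new_documents()` leaves two `Document` objects on the queue; their order is not guaranteed because the channel tasks run concurrently.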
239+
5. Before adding to queue, check whether document is newer than self._last_indexed_at, if not, skip it.
240+
```python
241+
last_modified = datetime.strptime(message["updated_at"], "%Y-%m-%dT%H:%M:%S.%fZ")
242+
if last_modified < self._last_index_time:
243+
logger.info(f"Message {message['id']} is too old, skipping")
244+
continue
245+
```
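The freshness check in step 5 can be tried in isolation. The sketch below reuses the same `strptime` format string and the same `<` comparison as the snippet; `filter_new_messages` is a hypothetical helper, and the `updated_at` values in the test are made-up examples. Note that `strptime` returns a naive `datetime`, so the last-index time it is compared against must also be naive (and in UTC), otherwise Python raises `TypeError`.

```python
from datetime import datetime
from typing import Dict, Iterable, List

# Same format as the snippet above; the trailing "Z" is matched as a
# literal character, so the parsed datetime carries no timezone.
UPDATED_AT_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"


def filter_new_messages(messages: Iterable[Dict], last_index_time: datetime) -> List[Dict]:
    """Keep only messages modified at or after last_index_time (naive UTC)."""
    fresh = []
    for message in messages:
        last_modified = datetime.strptime(message["updated_at"], UPDATED_AT_FORMAT)
        if last_modified < last_index_time:
            continue  # too old: already indexed in a previous run
        fresh.append(message)
    return fresh
```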
````diff
+
+## 5. UI instructions
+
+Add your data source's setup instructions to the UI.
+
+### 5.1. data-source-panel.tsx
+Go to `gerev/ui/src/components/data-source-panel.tsx` and add your data source's instructions to the panel's markup.
+
+```typescript
+{
+    this.state.selectedDataSource.value === 'Magic' && (
+        <span className="flex flex-col leading-9 text-xl text-white">
+            <span>1. {'Go to Magic -> top-right profile picture -> Edit profile'}</span>
+            <span>2. {'Scroll down to API tokens -> Create token -> Name it'}</span>
+            <span>3. {"Set 'Expiry Date' 01/01/2100, create, copy token id + token secret"}</span>
+        </span>
+    )
+}
+```
+
+
+## 6. Logo
+
+Add your data source's `logo.png` to `app/static/data_sources_icons`.
+
+
+:rocket: Done!
````
