I have had these notes for quite a long time; they will probably be useful for people looking for inspiration on this topic.
It’s always an attractive discussion point: what makes a good team, a mature team, or an ideal team? I’ve always had an ideal team in mind as a philosophical exercise, but precisely because it’s philosophical, bringing it into the real world is difficult.
So I’m trying to create a list of traits of what makes a good team -for me- without calling it a good or ideal team, to avoid the philosophical debate induced by the word “ideal”.
Mature Organization
One-pager doc, covering:
Explanation of the product and business
Important North Star Metrics
List of Team and Stakeholders
High Level overview of tech design
Complete and good composition of people
Software Engineer (SE), Senior SE, Principal Engineer
Right people in the right roles
Manager
1 leader: ~7 direct reports (2025 note: I’m now thinking ~15 direct reports is a good number)
Good organization ceremonies
Monthly leadership meeting
Routine sprint plan
Weekly report
Weekly product and tech huddle up
Short daily (no more than 15 minutes)
Quarterly Townhall with the product team
Technical excellence, code quality
80% unit test code coverage
80% endpoint integration test coverage
80% gRPC integration test coverage
90% P1 Android UI test coverage (2025 note: I don’t think this makes sense)
Strict Linter enabled
Monitoring and Observability
Executive dashboard: captures the important tech dashboards across the end-to-end business journey.
RED metrics (RPS, error/success rate, duration/latency): for the service itself and its upstreams (a minimal instrumentation sketch follows right after this block).
Easy way to pinpoint errors in the logs: easy-to-query logs, proper tagging, and a well-propagated log context identifier (unique session ID).
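To make the RED metrics item concrete, here is a minimal sketch (not part of the original checklist) of how a Go service could record request rate, errors, and duration with the Prometheus client library. The metric names, labels, and middleware shape are illustrative assumptions, not a standard.

package main

import (
	"net/http"
	"strconv"
	"time"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

// Illustrative RED metrics: request count (rate + errors via the code label)
// and request duration. Names and labels are assumptions.
var (
	requests = prometheus.NewCounterVec(
		prometheus.CounterOpts{Name: "http_requests_total", Help: "Total HTTP requests."},
		[]string{"path", "code"},
	)
	duration = prometheus.NewHistogramVec(
		prometheus.HistogramOpts{Name: "http_request_duration_seconds", Help: "Request latency."},
		[]string{"path"},
	)
)

// statusRecorder captures the response code written by the wrapped handler.
type statusRecorder struct {
	http.ResponseWriter
	code int
}

func (r *statusRecorder) WriteHeader(code int) {
	r.code = code
	r.ResponseWriter.WriteHeader(code)
}

// redMiddleware wraps a handler and records RED metrics for every request.
func redMiddleware(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		start := time.Now()
		rec := &statusRecorder{ResponseWriter: w, code: http.StatusOK}
		next.ServeHTTP(rec, r)
		requests.WithLabelValues(r.URL.Path, strconv.Itoa(rec.code)).Inc()
		duration.WithLabelValues(r.URL.Path).Observe(time.Since(start).Seconds())
	})
}

func main() {
	prometheus.MustRegister(requests, duration)
	http.Handle("/metrics", promhttp.Handler())
	http.Handle("/", redMiddleware(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte("ok"))
	})))
	http.ListenAndServe(":8080", nil)
}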
Technical excellence, process
Each team has a documentation repository (e.g. Confluence) linked from the GitHub README
[opt] Git squash for every PR
Diligently use a proper technical design and RFC process
Incident management
On-call schedule and escalation path ready and well set up
Quarterly review of the on-call setup
Leadership
Routine 1:1 with each team member, at least once a month
Last week we created an LLM-based chatbot with a knowledge base, which, to my surprise, many people are interested in. I thought it would be cool to continue on that topic, so now let’s make the chatbot a bit more advanced with a sprinkle of WebSocket magic. This will demonstrate how easy it is to make a usable LLM application using GChain.
The default experience people expect from a chatbot today is a streaming response à la ChatGPT; however, this is not a given. Not all off-the-shelf models support it, and so far I only know that OpenAI supports streaming responses. Delivering the response to the user/client is also more complicated, as a usual RESTful API is not sufficient, so in this case we’re going to use WebSocket to stream the response.
The plan is :
Prepare a WebSocket server and a basic handler
Make a conversation chain with a streaming model session
Get the user’s input and stream the response through WebSocket
In the end, it will look like this
Prepare the WebSocket boilerplate
I hope the term WebSocket (ws) here won’t intimidate you; it will be simple and fun :). The ws boilerplate is not much different from the usual HTTP server and handler, and github.com/gorilla/websocket will be used to make things much easier. The two big differences are: 1) the HTTP request will be upgraded to ws, and 2) as the connection is long-lived, input and output will be streamed using the WriteMessage and ReadMessage functions.
package main

import (
	"fmt"
	"log"
	"net/http"

	"github.com/gorilla/websocket"
)

// upgrader turns plain HTTP connections into WebSocket connections
// (declared here for completeness; configure CheckOrigin as needed)
var upgrader = websocket.Upgrader{}

func main() {
	// websocket route
	http.HandleFunc("/chat", wshandler)

	log.Println("http server started on :8000")
	err := http.ListenAndServe(":8000", nil)
	if err != nil {
		log.Fatal("ListenAndServe: ", err)
	}
	fmt.Println("Program exited.")
}

func wshandler(w http.ResponseWriter, r *http.Request) {
	// Upgrade initial GET request to a websocket
	ws, err := upgrader.Upgrade(w, r, nil)
	if err != nil {
		log.Fatal(err)
	}
	// Make sure we close the connection when the function returns
	defer ws.Close()
}
Make a conversation chain with a streaming model session
This part is where GChain comes in to help us. We’re creating a conversation chain within the ws session, so the conversation memory will be kept as long as the session is alive.
// all communication thru ws will be in this format
type message struct {
	Text     string `json:"text"`
	Finished bool   `json:"finished"`
}

func wshandler(w http.ResponseWriter, r *http.Request) {
	log.Println("serving ws connection")

	// new conversation memory created
	memory := []model.ChatMessage{}

	// streamingChannel to get response from model
	streamingChannel := make(chan model.ChatMessage, 100)

	// setup new conversation chain
	convoChain := conversation.NewConversationChain(chatModel, memory, callback.NewManager(),
		"You're helpful chatbot that answer human question very concisely, response with formatted html.", false)

	// append greeting to the memory
	convoChain.AppendToMemory(model.ChatMessage{Role: model.ChatMessageRoleAssistant, Content: "Hi, My name is GioAI"})

	......

	// send greetings to user
	m, err := json.Marshal(message{Text: "Hi, My name is GioAI", Finished: true})
	if err != nil {
		log.Println(err)
		return
	}
	ws.WriteMessage(websocket.TextMessage, m)
}
Handle Messages
With the WebSocket boilerplate and the conversation chain set up, we’re ready to receive and send messages to the client. All messages sent will use the message struct defined above. There are four main activities here: 1) get the user’s message from ws, 2) send the user’s message to the model, 3) get the model’s response, and 4) stream the response back to the user.
A main loop is created within wshandler. Note how the call to the model happens inside a goroutine, as we don’t want the model call to block the loop. The streamingChannel is then used to handle the model’s response.
func wshandler(w http.ResponseWriter, r *http.Request) {
	.....

	// main loop to handle user's input
	for {
		// Read in a new requestMessage as JSON and map it to a Message object
		_, requestMessage, err := ws.ReadMessage()
		if err != nil {
			log.Printf("error: %v", err)
			break
		}

		// whole model output will be kept here
		var output string

		// send request to model as a go routine, as we don't want to block here
		go func() {
			var err error
			output, err = convoChain.SimpleRun(context.Background(), string(requestMessage[:]),
				model.WithIsStreaming(true), model.WithStreamingChannel(streamingChannel))
			if err != nil {
				fmt.Println("error " + err.Error())
				return
			}
		}()

		// handle the response streaming (continued in the next snippet)
		for {
			value, ok := <-streamingChannel
			.....
		}
	}
}
For every streamed chunk from the model, we put it into the message struct, marshal it as JSON, and send it to the client. As we’re sending the response in small chunks, it’s not obvious to the client where the message ends, hence the finished field is important to have.
// the main loop
for {
	.....

	// handle the response streaming
	for {
		value, ok := <-streamingChannel
		if ok && !model.IsStreamFinished(value) {
			m, err := json.Marshal(message{Text: value.Content, Finished: false})
			if err != nil {
				log.Println(err)
				continue
			}
			ws.WriteMessage(websocket.TextMessage, []byte(m))
		} else {
			// Finished field true at the end of the message
			m, err := json.Marshal(message{Finished: true})
			if err != nil {
				log.Println(err)
				continue
			}
			ws.WriteMessage(websocket.TextMessage, m)
			break
		}
	}

	// put user message and model response to conversation memory
	convoChain.AppendToMemory(model.ChatMessage{Role: model.ChatMessageRoleUser, Content: string(requestMessage[:])})
	convoChain.AppendToMemory(model.ChatMessage{Role: model.ChatMessageRoleAssistant, Content: output})
} // the end of main loop
The complete code is just over 100 lines; you can find it in GChain’s examples, along with a simple HTML client.
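If you want to poke at the server without a browser, here is a minimal Go client sketch (not from the original post) that dials the /chat endpoint with gorilla/websocket, sends one question, and prints the streamed chunks until it sees finished: true. The question text is arbitrary; the message struct simply mirrors the one defined on the server.

package main

import (
	"encoding/json"
	"fmt"
	"log"

	"github.com/gorilla/websocket"
)

// message mirrors the server-side struct used for all ws communication.
type message struct {
	Text     string `json:"text"`
	Finished bool   `json:"finished"`
}

func main() {
	// connect to the chat server started on :8000
	conn, _, err := websocket.DefaultDialer.Dial("ws://localhost:8000/chat", nil)
	if err != nil {
		log.Fatal("dial: ", err)
	}
	defer conn.Close()

	// read the greeting sent by the server
	_, greeting, err := conn.ReadMessage()
	if err != nil {
		log.Fatal("read greeting: ", err)
	}
	fmt.Println("server:", string(greeting))

	// send one question
	if err := conn.WriteMessage(websocket.TextMessage, []byte("What is a goroutine?")); err != nil {
		log.Fatal("write: ", err)
	}

	// print streamed chunks until the server marks the message as finished
	for {
		_, data, err := conn.ReadMessage()
		if err != nil {
			log.Fatal("read: ", err)
		}
		var m message
		if err := json.Unmarshal(data, &m); err != nil {
			log.Println("unmarshal:", err)
			continue
		}
		if m.Finished {
			break
		}
		fmt.Print(m.Text)
	}
	fmt.Println()
}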
You can feel the craze around Large Language Model (LLM) based solutions today. People are racing to find a good product-market fit, or even a disruptive one. However, the scene is mainly dominated by Python-based implementations, so building a solution on top of Go is still rare today, and I hope interesting for many people.
Here we will use github.com/wejick/gchain to build the chatbot; it’s a framework heavily inspired by LangChain. It has enough tools to build a chatbot with a connection to a vector DB as the knowledge base.
What are we building today?
An indexer to put information to a vector DB
A chatbot backed by openAI Chat API
The chatbot will query the vector DB for information to answer the user’s questions
Building Indexer
We’re going to use the English Wikipedia pages for Indonesia and the History of Indonesia as the knowledge base and put them into Weaviate as our vector DB. As both pages contain a lot of text and LLMs have constraints on the context window size, chunking the text into smaller pieces will help our application.
In the indexer, there are several main steps:
Load the text
Cut the texts into chunks
Get each chunk embedding
Store the text chunk + embedding to the vector DB
Loading the text
// source pairs a filename with its loaded contents
// (assumed definition, not shown in the original snippet)
type source struct {
	filename string
	doc      string
}

sources := []source{
	{filename: "indonesia.txt"},
	{filename: "history_of_indonesia.txt"},
}

// load each file into memory
for idx, s := range sources {
	data, err := os.ReadFile(s.filename)
	if err != nil {
		log.Println(err)
		continue
	}
	sources[idx].doc = string(data)
}
Cut the texts into chunks using textsplitter package
// create text splitter for the embedding model
indexingplitter, err := textsplitter.NewTikTokenSplitter(openai.AdaEmbeddingV2.String())

// split the documents by 500 tokens
var docs []string
docs = indexingplitter.SplitText(sources[0].doc, 500, 0)
docs = append(docs, indexingplitter.SplitText(sources[1].doc, 500, 0)...)
Create weaviate and embedding instance, then Store the text chunk to DB
// create the embedding model instance
embeddingModel = _openai.NewOpenAIEmbedModel(OAIauthToken, "", openai.AdaEmbeddingV2)

// create a new weaviate client
wvClient, err = weaviateVS.NewWeaviateVectorStore(wvhost, wvscheme, wvApiKey, embeddingModel, nil)

// store the document chunks to the weaviate db
bErr, err := wvClient.AddDocuments(context.Background(), "Indonesia", docs)
// handle error here
Create the ChatBot
GChain has a premade chain named conversational_retrieval, which gives us the ability to easily create a chatbot that connects to a datastore. As we have prepared the knowledge base above, it’s just a matter of wiring everything together and creating a text interface to handle user interaction.
Setup the chain
// memory to store conversation history
memory := []model.ChatMessage{}

// Creating text splitter for GPT3Dot5Turbo0301 model
textplitter, err := textsplitter.NewTikTokenSplitter(_openai.GPT3Dot5Turbo0301)

// create the chain
chain := conversational_retrieval.NewConversationalRetrievalChain(chatModel, memory, wvClient, "Indonesia",
	textplitter, callback.NewManager(), "", 2000, false)
Create a simple text-based user interface, basically a loop that keeps reading user input until the user quits:
fmt.Println("AI : How can I help you, I know many things about indonesia")

scanner := bufio.NewScanner(os.Stdin)
for {
	fmt.Print("User : ")
	scanner.Scan()
	input := scanner.Text()

	output, err := chain.Run(context.Background(), map[string]string{"input": input})
	if err != nil {
		log.Println(err)
		return
	}
	fmt.Println("AI : " + output["output"])
}
You can find the full code in the GChain examples. It will look like this:
For the past few weeks, I’ve been thinking about having a time series DB and visualization tool that is easy to use and light on resources, so it can be used locally for development. I tried several times to find one with keywords such as “Grafana alternative” and “Prometheus alternative”, but nothing came up that was close to my liking. At one stage I almost wrote my own tool, which would be too big an undertaking for me.
So why not just use Prometheus? Well, not until you see its binary size: a whopping 103.93 MB. Not surprising for a statically linked executable, but still, it’s big. Yesterday morning I told myself, why not make it smaller? How hard can it be, right? Well, at least smaller in binary size, as I’m not yet informed enough to do a runtime utilization measurement on it.
The first thing I did after getting the code was to simply remove whatever code I deemed unnecessary. At one point I completely removed the tracing and alerting functionality, but that only reduced the size by a maximum of 2 MB, and it came with a whole bunch of modified files and broken test cases. Definitely not the right way; we need to do it smarter.
The weight
After browsing for a binary size analysis tool, I found goweight. It helps determine what’s contributing to the binary size:
No surprise to see that the main contributors are cloud provider SDKs, but it’s still disappointing. At this stage, it was just a matter of finding where those are used. A simple text search revealed that the SDKs are mainly used for Service Discovery (SD); for local development, file-based SD is more than enough.
After tinkering here and there, the very minimum change to exclude all unused SD mechanisms is to comment them out in discovery/install/install.go. With this change, all tests pass except one related to ZooKeeper initialization. The binary shrank to just 35.69 MB, only 34% of the original.
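For illustration, the change is essentially trimming the list of blank imports that register each SD mechanism. The sketch below is approximate (package paths from memory, not an exact copy of the Prometheus source), keeping only file-based SD for local development.

// Package install registers service discovery mechanisms via blank imports
// (approximate sketch of discovery/install/install.go, not the exact file).
package install

import (
	// _ "github.com/prometheus/prometheus/discovery/aws"        // commented out: cloud SD not needed locally
	// _ "github.com/prometheus/prometheus/discovery/azure"      // commented out
	// _ "github.com/prometheus/prometheus/discovery/gce"        // commented out
	// _ "github.com/prometheus/prometheus/discovery/kubernetes" // commented out
	// _ "github.com/prometheus/prometheus/discovery/zookeeper"  // commented out
	_ "github.com/prometheus/prometheus/discovery/file" // keep file-based SD for local development
)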
Wrap Up
After stripping the debugging info using -ldflags="-s -w", the size got reduced quite a bit more. In the end, we brought it down to 28.30% of the original file, just 29.42 MB.
It’s crazy when you think that more than 50% (I’m being generous) of the size is something most people will never use; maybe this is the price of modern computing, where 50 MB means nothing. To be fair, the Prometheus developers are somewhat aware of this and discuss it in the service discovery README; however, the binary size is not discussed there.
There are several other potential ways to reduce the file size, such as packing the binary with upx. However, it’s not readily available on brew, so I skipped it. Another way is to cut more functionality, like tracing, alerting, and remote storage; however, that would take more effort to maintain and to make sure all the tests still pass.
At some point, life can feel like it simply flows along. No plans, no expectations, no thought put into living it, an apt definition of the phrase “living in the present.” What happens usually falls into only two kinds: the first is the things that are already programmed, the routines, like waking up, showering, eating, paying the monthly dues and the electricity bill. The second, the unprogrammed things, are dealt with impromptu, that is, only thought about when something actually needs to be thought about.
It seems calm enough, yet there is a restlessness that wants out. A restlessness about the stagnation of mind and knowledge that longs to be sharpened, watered, trained, and cultivated, in both soul and body.
Somehow, perhaps since a few months into the pandemic, the lack of physical activity and the confinement of human interaction have weakened the urge to cultivate feeling and thought. The desire is always smaller than the pull of instant entertainment à la YouTube and Instagram. Exhausting and boring.