Mature Organization

I have had these notes for quite a long time; they will probably be useful for people looking for inspiration on this topic.

It’s always an attractive discussion point: what makes a good team, a mature team, or an ideal team? I’ve always had an ideal team in mind as a philosophical exercise, but precisely because it’s philosophical, bringing it into the real world is difficult.

So I’m trying to list the traits of a good team -for me- without calling it a good or ideal team, to avoid the philosophical debate that the word “ideal” tends to induce.

Mature Organization

  • One Pager Doc, covering:
    • Explanation of the product and business
    • Important North Star Metrics
    • List of Team and Stakeholders
    • High Level overview of tech design
  • Complete and well-balanced composition of people
    • Software Engineer (SE), Senior SE, Principal Engineer
      • The right people in the proper roles
    • Manager
      • 1 leader: ~7 directs (2025 note: I now think ~15 directs is probably a good number)
  • Good organization ceremonies
    • Monthly leadership meeting
    • Routine sprint plan
    • Weekly report
    • Weekly product and tech huddle up
    • Short daily (no more than 15 minutes)
    • Quarterly Townhall with the product team
  • Technical excellence, code quality
    • 80% unit test code coverage
    • 80% endpoint integration test coverage
    • 80% gRPC integration test coverage
    • 90% P1 Android UI test coverage (2025 note: I don’t think this makes sense)
    • Strict Linter enabled
  • Monitoring and Observability
    • Executive dashboard: captures the important tech dashboards for the end-to-end business journey.
    • RED metrics (Rate/RPS, Errors/success rate, Duration/latency): for the service itself and its upstreams (see the sketch after this list).
    • Easy way to pinpoint errors in the logs: easy-to-query logs, proper tagging, well-propagated log context identification (unique session ID).
  • Technical excellence, process
    • Each team has a documentation repository (e.g. Confluence) linked from the GitHub README
    • [opt] Git squash for every PR
    • Diligently follow a proper technical design / RFC process
  • Incident management
    • On-call schedule and escalation path ready and well set up
    • Quarterly review of the on-call setup
  • Leadership
    • Routine 1-1s with each team member, at least once a month
    • Each of them acts as a mentor
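
As an illustration of the RED metrics point above, here is a minimal sketch of how a Go HTTP service could expose Rate, Errors, and Duration using github.com/prometheus/client_golang. The metric names and the instrument helper are hypothetical, not part of the notes; treat this as a sketch of the idea rather than a prescription.

package main

import (
	"net/http"
	"strconv"
	"time"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

// RED metrics: Rate and Errors come from the counter (split by status code),
// Duration comes from the histogram.
var (
	requests = promauto.NewCounterVec(prometheus.CounterOpts{
		Name: "http_requests_total",
		Help: "Total HTTP requests by handler and status code.",
	}, []string{"handler", "code"})

	duration = promauto.NewHistogramVec(prometheus.HistogramOpts{
		Name: "http_request_duration_seconds",
		Help: "HTTP request latency by handler.",
	}, []string{"handler"})
)

// statusRecorder captures the status code written by the wrapped handler.
type statusRecorder struct {
	http.ResponseWriter
	code int
}

func (s *statusRecorder) WriteHeader(code int) {
	s.code = code
	s.ResponseWriter.WriteHeader(code)
}

// instrument wraps a handler and records RED metrics for it.
func instrument(name string, next http.HandlerFunc) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		start := time.Now()
		rec := &statusRecorder{ResponseWriter: w, code: http.StatusOK}
		next(rec, r)
		requests.WithLabelValues(name, strconv.Itoa(rec.code)).Inc()
		duration.WithLabelValues(name).Observe(time.Since(start).Seconds())
	}
}

func main() {
	http.HandleFunc("/hello", instrument("hello", func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte("hello"))
	}))
	http.Handle("/metrics", promhttp.Handler()) // scraped by Prometheus
	http.ListenAndServe(":8080", nil)
}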

Making an LLM-Based Streaming Chatbot with Go and WebSocket

Last week we created an LLM-based chatbot with a knowledge base, and to my surprise many people were interested in it. I thought it would be cool to continue on that topic, so let’s make the chatbot a bit more advanced with a sprinkle of WebSocket magic. This will demonstrate how easy it is to make a usable LLM application using GChain.

The default experience people expect from a chatbot today is a streaming response a la ChatGPT; however, this is not a given. Not all off-the-shelf models support it; so far, OpenAI is the only one I know of that supports streaming responses. Delivering the response to the user/client is also more complicated, as the usual RESTful API is not sufficient, so in this case we are going to use WebSocket to stream the response.

The plan is :

  1. Prepare a WebSocket server and a basic handler
  2. Make a conversation chain with a streaming model session
  3. Get the user’s input and stream the response through WebSocket

In the end, it will look like this:

Prepare the WebSocket boilerplate

I hope the WebSocket (ws) term here will not intimidate you; it will be simple and fun :). The ws boilerplate is not much different from the usual HTTP server and handler, and here github.com/gorilla/websocket will be used to make things much easier. The two big differences are: 1) the HTTP request will be upgraded to ws, and 2) as the connection is long-lived, input and output will be streamed using the WriteMessage and ReadMessage functions.

var upgrader = websocket.Upgrader{} // upgrades the plain HTTP connection to a WebSocket connection

func main() {
	// websocket route
	http.HandleFunc("/chat", wshandler)
	log.Println("http server started on :8000")
	err := http.ListenAndServe(":8000", nil)
	if err != nil {
		log.Fatal("ListenAndServe: ", err)
	}
	fmt.Println("Program exited.")
}

func wshandler(w http.ResponseWriter, r *http.Request) {
	// Upgrade initial GET request to a websocket
	ws, err := upgrader.Upgrade(w, r, nil)
	if err != nil {
		log.Fatal(err)
	}
	// Make sure we close the connection when the function returns
	defer ws.Close()
}

Make a conversation chain with a streaming model session

This part is where GChain comes in to help us. We create a conversation chain within the ws session, so the conversation memory is kept as long as the session is alive.

// all communication thru ws will be in this format
type message struct {
	Text     string `json:"text"`
	Finished bool   `json:"finished"`
}

func wshandler(w http.ResponseWriter, r *http.Request) {
	log.Println("serving ws connection")
	// new conversation memory created
	memory := []model.ChatMessage{}
	// streamingChannel to get response from model
	streamingChannel := make(chan model.ChatMessage, 100)
	// setup new conversation chain
	convoChain := conversation.NewConversationChain(chatModel, memory, callback.NewManager(),
		"You're helpful chatbot that answer human question very concisely, response with formatted html.", false)
	// append greeting to the memory
	convoChain.AppendToMemory(model.ChatMessage{Role: model.ChatMessageRoleAssistant, Content: "Hi, My name is GioAI"})

	......

	// send greetings to user
	m, err := json.Marshal(message{Text: "Hi, My name is GioAI", Finished: true})
	if err != nil {
		log.Println(err)
		return
	}
	ws.WriteMessage(websocket.TextMessage, m)
}

Handle Messages

With the WebSocket boilerplate and the conversation chain set up, we’re ready to receive and send messages to the client. All messages sent will use the message struct defined above. There are four main activities here: 1) get the user’s message from ws, 2) send the user’s message to the model, 3) get the model’s response, and 4) stream the response to the user.

A main loop is created within wshandler. Note how the call to the model happens inside a goroutine, as we don’t want the model call to block. The streamingChannel is then used to handle the model’s response.

func wshandler(w http.ResponseWriter, r *http.Request) {
	.....
	// main loop to handle user's input
	for {
		// Read in a new requestMessage as JSON and map it to a Message object
		_, requestMessage, err := ws.ReadMessage()
		if err != nil {
			log.Printf("error: %v", err)
			break
		}
		// whole model output will be kept here
		var output string
		// send request to model as a go routine, as we don't want to block here
		go func() {
			var err error
			output, err = convoChain.SimpleRun(context.Background(), string(requestMessage[:]),
				model.WithIsStreaming(true), model.WithStreamingChannel(streamingChannel))
			if err != nil {
				fmt.Println("error " + err.Error())
				return
			}
		}()
		// handle the response streaming
		for {
			value, ok := <-streamingChannel
		}
	}

For every streamed response from the model, we put it into the message struct, marshal it as JSON, and send it to the client. As we’re sending the response in small chunks, it’s not obvious to the client when a message ends, hence the finished field is important to have.

	// the main loop
	for {
		.....
		// handle the response streaming
		for {
			value, ok := <-streamingChannel
			if ok && !model.IsStreamFinished(value) {
				m, err := json.Marshal(message{Text: value.Content, Finished: false})
				if err != nil {
					log.Println(err)
					continue
				}
				ws.WriteMessage(websocket.TextMessage, []byte(m))
			} else {
				// Finished field true at the end of the message
				m, err := json.Marshal(message{Finished: true})
				if err != nil {
					log.Println(err)
					continue
				}
				ws.WriteMessage(websocket.TextMessage, m)
				break
			}
		}
		// put user message and model response to conversation memory
		convoChain.AppendToMemory(model.ChatMessage{Role: model.ChatMessageRoleUser, Content: string(requestMessage[:])})
		convoChain.AppendToMemory(model.ChatMessage{Role: model.ChatMessageRoleAssistant, Content: output})
	} // the end of main loop
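
To make the role of the finished field concrete, below is a minimal sketch of a Go client that consumes this stream (hypothetical; the GChain example ships a simple HTML client instead). It reads the greeting, sends one question, and prints chunks until it sees finished: true.

package main

import (
	"encoding/json"
	"fmt"
	"log"

	"github.com/gorilla/websocket"
)

// message mirrors the struct used by the server
type message struct {
	Text     string `json:"text"`
	Finished bool   `json:"finished"`
}

func main() {
	// connect to the /chat endpoint defined earlier
	conn, _, err := websocket.DefaultDialer.Dial("ws://localhost:8000/chat", nil)
	if err != nil {
		log.Fatal("dial: ", err)
	}
	defer conn.Close()

	// the server greets us first with a single message where finished is true
	readUntilFinished(conn)

	// send one question to the server
	if err := conn.WriteMessage(websocket.TextMessage, []byte("Tell me about yourself")); err != nil {
		log.Fatal("write: ", err)
	}

	// print the streamed chunks of the answer
	readUntilFinished(conn)
}

// readUntilFinished prints incoming chunks until a message with finished=true arrives.
func readUntilFinished(conn *websocket.Conn) {
	for {
		_, raw, err := conn.ReadMessage()
		if err != nil {
			log.Fatal("read: ", err)
		}
		var msg message
		if err := json.Unmarshal(raw, &msg); err != nil {
			log.Println("unmarshal: ", err)
			continue
		}
		fmt.Print(msg.Text)
		if msg.Finished {
			fmt.Println()
			return
		}
	}
}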

The complete code is just over 100 lines; you can find it in the GChain examples, along with a simple HTML client.

Building LLM Based Chatbot with a knowledge base in Go

You can feel the craze around Large Language Model (LLM) based solutions today. People are racing to find a product with good market fit, or even a disruptive one. However, the scene is dominated mainly by Python-based implementations, so building a solution on top of Go is rare today and, I hope, interesting for many people.

Here we will use github.com/wejick/gchain to build the chatbot; it’s a framework heavily inspired by LangChain, with enough tools to build a chatbot that connects to a vector DB as the knowledge base.

What are we building today?

  1. An indexer to put information into a vector DB
  2. A chatbot backed by the OpenAI Chat API
  3. The chatbot queries the vector DB to answer user questions

Building Indexer

We’re going to use the English Wikipedia pages for Indonesia and History of Indonesia as the knowledge base and put them into Weaviate as our vector DB. As both pages contain a lot of text and LLMs have constraints on context window size, chunking the text into smaller pieces will help our application.

In the indexer there are several main steps:

  1. Load the text
  2. Cut the texts into chunks
  3. Get each chunk embedding
  4. Store the text chunk + embedding to the vector DB

Loading the text

	sources := []source{
		{filename: "indonesia.txt"},
		{filename: "history_of_indonesia.txt"},
	}
	for idx, s := range sources {
		data, err := os.ReadFile(s.filename)
		if err != nil {
			log.Println(err)
			continue
		}
		sources[idx].doc = string(data)
	}

Cut the texts into chunks using the textsplitter package

	// create text splitter for the embedding model
	indexingplitter, err := textsplitter.NewTikTokenSplitter(openai.AdaEmbeddingV2.String())
	// split the documents into chunks of 500 tokens
	var docs []string
	docs = indexingplitter.SplitText(sources[0].doc, 500, 0)
	docs = append(docs, indexingplitter.SplitText(sources[1].doc, 500, 0)...)

Create the Weaviate client and embedding instance, then store the text chunks in the DB

	// create embedding model instance
	embeddingModel = _openai.NewOpenAIEmbedModel(OAIauthToken, "", openai.AdaEmbeddingV2)
	// create a new weaviate client
	wvClient, err = weaviateVS.NewWeaviateVectorStore(wvhost, wvscheme, wvApiKey, embeddingModel, nil)
	// store the documents to the weaviate db
	bErr, err := wvClient.AddDocuments(context.Background(), "Indonesia", docs)
	// handle error here

Create the ChatBot

GChain has a premade chain named conversational_retrieval, which gives us the ability to easily create a chatbot that connects to a datastore. As we have prepared the knowledge base above, it’s just a matter of wiring everything together and creating a text interface to handle user interaction.

Setup the chain

	// memory to store conversation history
	memory := []model.ChatMessage{}
	// creating text splitter for the GPT3Dot5Turbo0301 model
	textplitter, err := textsplitter.NewTikTokenSplitter(_openai.GPT3Dot5Turbo0301)
	// create the chain
	chain := conversational_retrieval.NewConversationalRetrievalChain(chatModel, memory, wvClient, "Indonesia",
		textplitter, callback.NewManager(), "", 2000, false)

Create a simple text-based user interface, basically a loop until the user wants to quit:

	fmt.Println("AI : How can I help you, I know many things about indonesia")	scanner := bufio.NewScanner(os.Stdin)	for {	fmt.Print("User : ")	scanner.Scan()	input := scanner.Text()	output, err := chain.Run(context.Background(), map[string]string{"input": input})	if err != nil {	log.Println(err)	return	}	fmt.Println("AI : " + output["output"])	} 

You can find the full code in the GChain examples. It will look like this:

Can I have a smaller Prometheus


For the past few weeks, I’ve been thinking about having a time series DB and visualization tool that is easy to use and light on resources, so it can be used locally during development. I tried several times to find one with keywords such as “Grafana alternative” and “Prometheus alternative”, but nothing came up that was close to my liking. At one stage I almost wrote my own tool, which would be too big an undertaking for me.

So why not just use Prometheus? Well, not until you see its binary size: a whopping 103.93 MB. Not surprising for a statically linked executable, but still, it’s big. Yesterday morning I told myself: why not make it smaller? How hard can it be, right? Well, at least smaller in binary size, as I’m not yet informed enough to do a runtime utilization measurement on it.

The first thing I did after getting the code was to simply remove whatever code I deemed unnecessary. At one point I completely removed the tracing and alerting functionality, but that only shaved off a maximum of 2 MB, and it came with a bunch of modified files and broken test cases. Definitely not the right way; we need to do it smarter.

The weight

After browsing for a binary size analysis tool, I found goweight. It helps determine what’s contributing to the binary size.

It was no surprise to see that the main contributors are cloud provider SDKs, but it was still disappointing. At that stage, it was just a matter of finding where they were used. A simple text search revealed that the SDKs are mainly there for Service Discovery (SD); for local development, file-based SD is more than enough.

After tinkering here and there, the minimum change needed to exclude all unused SD mechanisms is to comment them out in discovery/install/install.go. With this change, all tests pass except one related to Zookeeper initialization. The binary got reduced to just 35.69 MB, only 34% of the original.
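
For illustration, the change amounts to trimming the blank imports that register each SD mechanism in discovery/install/install.go. A rough sketch of the idea is below; the exact package list is from memory, so treat it as an approximation rather than the actual diff.

// Package install registers the built-in service discovery mechanisms
// via blank imports; commenting an import out excludes that SD mechanism
// (and its SDK dependencies) from the binary.
package install

import (
	// _ "github.com/prometheus/prometheus/discovery/aws"        // AWS SD
	// _ "github.com/prometheus/prometheus/discovery/azure"      // Azure SD
	// _ "github.com/prometheus/prometheus/discovery/consul"     // Consul SD
	// _ "github.com/prometheus/prometheus/discovery/gce"        // GCE SD
	// _ "github.com/prometheus/prometheus/discovery/kubernetes" // Kubernetes SD
	_ "github.com/prometheus/prometheus/discovery/file" // file SD is enough for local development
)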

Wrap Up

After stripping the debugging info using -ldflags="-s -w", the size got reduced quite a bit more. In the end we reduced the size to 28.30% of the original file, just 29.42 MB.

It’s crazy when you think that more than 50% (and I’m being generous) of the size is something most people will never use; maybe this is the price of modern computing, where 50 MB means nothing. To be fair, the Prometheus developers are somewhat aware of this and discuss it in the service discovery README; however, binary size is not discussed there.

There are several more ways to potentially reduce the file size, such as packing the binary with upx. However, it’s not readily available on brew, so I skipped it. Another way is to cut more functionality such as tracing, alerting, and remote storage; however, that would take more effort to maintain and to keep all the tests passing.

The related changes can be found here https://github.com/wejick/prometheus/commit/6dd776019a315764a9f70147f83f1235b9b13189

Kegelisahan (Restlessness)

At some point, life can feel like it simply flows by. No plans, no hopes, no thought given to how to live it; a fitting definition of the phrase “living in the present.” What happens usually falls into only two kinds: the first is the things that are already programmed, the routines, like waking up, showering, eating, paying the monthly dues and the electricity bill. The second, the unprogrammed things, are handled impromptu; that is, they only get thought about when something needs to be thought about.

It all seems calm, yet there is a restlessness that wants to be let out. A restlessness about the stagnation of the mind and of knowledge that longs to be sharpened, watered, trained, and worked on. Trained, worked on, and nourished in both body and soul.

For some reason, perhaps since a few months into the pandemic, the lack of physical activity and the confinement of human interaction have weakened the drive to cultivate feeling and thought. The desire is always smaller than the pull of instant entertainment a la YouTube and Instagram. Tiring and boring.