DEV Community

0xkoji

Run Gemma on Google Colab Free tier

What is Gemma?

Gemma is a family of four new LLMs by Google, based on Gemini. It comes in two sizes, 2B and 7B parameters, each with base (pretrained) and instruction-tuned versions. All the variants can be run on various types of consumer hardware, even without quantization, and have a context length of 8K tokens.

https://huggingface.co/blog/gemma

In this post, we will try to run Gemma on the Google Colab free tier. To do that, we need to load the model with quantization, since unquantized gemma-7b requires 18GB of GPU RAM.
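To get a feel for why quantization helps, here is a rough back-of-the-envelope sketch. It only counts the weights, and it assumes a nominal 7 billion parameters taken from the model name (`NOMINAL_PARAMS` and `estimate_weight_gb` are names made up for this example). Real usage is higher than these numbers, because activations and other runtime overhead also need GPU memory.

```python
def estimate_weight_gb(num_params: float, bits_per_param: int) -> float:
    """Rough size of the model weights alone, in gigabytes (10**9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


# Assumption: a nominal 7 billion parameters, taken from the model name.
NOMINAL_PARAMS = 7e9

for bits in (32, 16, 8, 4):
    size = estimate_weight_gb(NOMINAL_PARAMS, bits)
    print(f"{bits:>2}-bit weights: ~{size:.1f} GB")
```

Going from 16-bit to 4-bit weights cuts the weight memory to a quarter, which is what makes the free tier a realistic target.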

Requirements

  • Hugging Face account
  • Google account

Step 1. Get access to Gemma

We can use Gemma with Transformers 4.38, but first we need to be granted access to the model.

https://huggingface.co/google/gemma-7b

Once access has been granted, the page above shows a notice confirming it.

[Screenshot: gemma model]

Step 2. Add HF_TOKEN to Google Colab

We need to add HF_TOKEN to Google Colab to access Gemma via Transformers.

First, we need to get a token from Hugging Face.
https://huggingface.co/settings/tokens

Then click the key icon in the Google Colab sidebar and add the token as a secret named HF_TOKEN.
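If you want to confirm that the notebook can actually see the token, something like the following sketch may help. It assumes the secret is named HF_TOKEN and that notebook access is enabled for it. `get_hf_token` is a helper made up for this example, and the environment-variable fallback is only there so the snippet also runs outside Colab.

```python
import os


def get_hf_token():
    """Return the Hugging Face token from Colab secrets, or from the
    HF_TOKEN environment variable when not running inside Colab."""
    try:
        # Only importable inside a Google Colab runtime.
        from google.colab import userdata
    except ImportError:
        # Outside Colab: fall back to an environment variable of the same name.
        return os.environ.get("HF_TOKEN")
    # In Colab this raises an error if the secret is missing
    # or notebook access has not been granted.
    return userdata.get("HF_TOKEN")


token = get_hf_token()
print("HF_TOKEN found" if token else "HF_TOKEN not found")
```

The snippet only reports whether a token was found, so the token itself never ends up in the notebook output.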

Step 3. Install packages

    !pip install -U "transformers==4.38.1"
    !pip install accelerate
    !pip install -i https://pypi.org/simple/ bitsandbytes
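After installing, it can be worth checking which versions the runtime really picked up, since Colab ships with some packages preinstalled. A minimal sketch using only the standard library (`installed_version` is a helper name made up for this example):

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package: str):
    """Return the installed version string of a package, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


for name in ("transformers", "accelerate", "bitsandbytes"):
    print(f"{name}: {installed_version(name) or 'not installed'}")
```

If transformers shows a version older than 4.38, restarting the runtime after the install step is the first thing to try.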

Step 4. Write Python code to run Gemma

We can now load the instruction-tuned gemma-7b-it model via Transformers.

    import torch
    from transformers import pipeline

    model = "google/gemma-7b-it"

    # Load the model with 4-bit quantization so it fits in GPU memory
    pipe = pipeline(
        "text-generation",
        model=model,
        model_kwargs={
            "torch_dtype": torch.float16,
            "quantization_config": {"load_in_4bit": True},
        },
    )

    messages = [
        {"role": "user", "content": "Tell me about ChatGPT"},
    ]
    # Format the conversation with the model's chat template
    prompt = pipe.tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    outputs = pipe(
        prompt,
        max_new_tokens=256,
        do_sample=True,
        temperature=0.7,
        top_k=50,
        top_p=0.95,
    )
    # generated_text includes the prompt, so print only the newly generated part
    print(outputs[0]["generated_text"][len(prompt):])
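The apply_chat_template call turns the messages list into a single prompt string in the format Gemma was tuned on. The sketch below is a hand-written approximation of that format, as I understand it, just to show what the string looks like. `build_gemma_prompt` is a made-up helper, and the template shipped with the tokenizer remains the authoritative version.

```python
def build_gemma_prompt(messages, add_generation_prompt=True):
    """Hand-rolled approximation of Gemma's chat format (illustration only)."""
    prompt = "<bos>"
    for message in messages:
        # Gemma names the assistant role "model".
        role = "model" if message["role"] == "assistant" else message["role"]
        content = message["content"].strip()
        prompt += f"<start_of_turn>{role}\n{content}<end_of_turn>\n"
    if add_generation_prompt:
        # Open a model turn so that generation continues as the reply.
        prompt += "<start_of_turn>model\n"
    return prompt


messages = [{"role": "user", "content": "Tell me about ChatGPT"}]
print(build_gemma_prompt(messages))
```

Because the pipeline returns the prompt followed by the completion, slicing the result with len(prompt) leaves only the model's answer.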

Result

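The following is the result of the above code.
Unfortunately, the output is wrong: it claims that ChatGPT was developed by Google and is open source, whereas ChatGPT was developed by OpenAI and is not open source. So at the moment, Gemma is either missing recent data or is simply not a good model. 🥲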

ChatGPT is a large language model (LLM) developed by Google. It is a conversational AI model that can engage in a wide range of topics and tasks, including:

Key Features:

  • Natural Language Processing (NLP): ChatGPT is able to understand and generate human-like text, including code, scripts, poems, articles, and more.
  • Information Retrieval: It can provide information on a vast number of topics, from history to science to technology.
  • Conversation: It can engage in natural language conversation, answer questions, and provide information.
  • Code Generation: It can generate code in multiple programming languages, including Python, Java, C++, and more.
  • Task Completion: It can complete a variety of tasks, such as writing stories, summarizing text, and translating languages.

Additional Information:

  • Large Language Model: ChatGPT is a large language model, trained on a massive amount of text data, making it able to learn complex relationships and patterns.
  • Transformer-Based: ChatGPT uses a transformer-based architecture, which allows it to process language more efficiently than traditional language models.
  • Open-Source: ChatGPT is open-sourced, meaning that anyone can contribute to its development
