All Gemini 1.0 and Gemini 1.5 models are now retired.
To avoid service disruption, update to a newer model (for example, gemini-2.5-flash-lite).

Analyze documents (like PDFs) using the Gemini API

You can ask a Gemini model to analyze document files (like PDFs and plain-text files) that you provide either inline (base64-encoded) or via URL. When you use Firebase AI Logic, you can make this request directly from your app.

With this capability, you can do things like:

Analyze diagrams, charts, and tables inside documents
Extract information into structured output formats
Answer questions about visual and text contents in documents
Summarize documents
Transcribe document content (for example, into HTML), preserving layouts and formatting, for use in downstream applications (such as in RAG pipelines)


See other guides for additional options for working with documents (like PDFs): Generate structured output and Multi-turn chat.

Before you begin

The code samples on this page use the Gemini Developer API as the Gemini API provider.

If you haven't already, complete the getting started guide, which describes how to set up your Firebase project, connect your app to Firebase, add the SDK, initialize the backend service for your chosen Gemini API provider, and create a GenerativeModel instance.

For testing and iterating on your prompts, we recommend using Google AI Studio.

Need a sample PDF file?

You can use this publicly available file, which has a MIME type of application/pdf: https://storage.googleapis.com/cloud-samples-data/generative-ai/pdf/2403.05530.pdf

Generate text from PDF files (base64-encoded)

Before trying this sample, complete the Before you begin section of this guide to set up your project and app.

You can ask a Gemini model to generate text by prompting with text and PDFs. For each input file, provide its mimeType and the file itself. Find requirements and recommendations for input files later on this page.
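As a rough illustration of "the mimeType and the file itself", the sketch below builds the inline-data shape that the Web sample on this page passes to the model, starting from raw bytes that are already in memory. The helper name `bytesToInlineDataPart` is hypothetical and not part of the Firebase SDK; only the `{ inlineData: { data, mimeType } }` shape is taken from the Web sample.

```javascript
// Hypothetical helper (not part of the Firebase SDK): wraps raw file bytes
// in the inline-data shape used by the Web sample on this page.
function bytesToInlineDataPart(bytes, mimeType) {
  // Build a binary string in chunks so large files do not exceed the
  // argument limit of String.fromCharCode.
  let binary = "";
  const chunkSize = 0x8000;
  for (let i = 0; i < bytes.length; i += chunkSize) {
    binary += String.fromCharCode.apply(null, bytes.subarray(i, i + chunkSize));
  }
  // Use btoa where it exists (browsers, recent Node.js); otherwise fall back to Buffer.
  const data = typeof btoa === "function"
    ? btoa(binary)
    : Buffer.from(binary, "binary").toString("base64");
  return { inlineData: { data, mimeType } };
}

// Usage: the five bytes "%PDF-" that begin a PDF file encode to "JVBERi0=".
const examplePart = bytesToInlineDataPart(
  new Uint8Array([0x25, 0x50, 0x44, 0x46, 0x2d]),
  "application/pdf"
);
console.log(examplePart.inlineData.data);
```

The platform SDKs shown below handle the encoding for you when you pass bytes or `Data`, so a helper like this is only relevant where you assemble the part by hand.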

Swift

You can call generateContent() to generate text from multimodal input of text and PDFs.

    import FirebaseAILogic

    // Initialize the Gemini Developer API backend service
    let ai = FirebaseAI.firebaseAI(backend: .googleAI())

    // Create a `GenerativeModel` instance with a model that supports your use case
    let model = ai.generativeModel(modelName: "gemini-2.5-flash")

    // Provide the PDF as `Data` with the appropriate MIME type
    let pdf = try InlineDataPart(data: Data(contentsOf: pdfURL), mimeType: "application/pdf")

    // Provide a text prompt to include with the PDF file
    let prompt = "Summarize the important results in this report."

    // To generate text output, call `generateContent` with the PDF file and text prompt
    let response = try await model.generateContent(pdf, prompt)

    // Print the generated text, handling the case where it might be nil
    print(response.text ?? "No text in response.")

Kotlin

You can call generateContent() to generate text from multimodal input of text and PDFs.

Note: For Kotlin, the methods in this SDK are suspend functions and need to be called from a coroutine scope.

    // Initialize the Gemini Developer API backend service
    // Create a `GenerativeModel` instance with a model that supports your use case
    val model = Firebase.ai(backend = GenerativeBackend.googleAI())
        .generativeModel("gemini-2.5-flash")

    val contentResolver = applicationContext.contentResolver

    // Provide the URI for the PDF file you want to send to the model
    val inputStream = contentResolver.openInputStream(pdfUri)

    if (inputStream != null) {  // Check if the PDF file loaded successfully
        inputStream.use { stream ->
            // Provide a prompt that includes the PDF file specified above and text
            val prompt = content {
                inlineData(
                    bytes = stream.readBytes(),
                    mimeType = "application/pdf" // Specify the appropriate PDF file MIME type
                )
                text("Summarize the important results in this report.")
            }

            // To generate text output, call `generateContent` with the prompt
            val response = model.generateContent(prompt)

            // Log the generated text, handling the case where it might be null
            Log.d(TAG, response.text ?: "")
        }
    } else {
        Log.e(TAG, "Error getting input stream for file.")
        // Handle the error appropriately
    }

Java

You can call generateContent() to generate text from multimodal input of text and PDFs.

Note: For Java, the methods in this SDK return a ListenableFuture.

    // Initialize the Gemini Developer API backend service
    // Create a `GenerativeModel` instance with a model that supports your use case
    GenerativeModel ai = FirebaseAI.getInstance(GenerativeBackend.googleAI())
            .generativeModel("gemini-2.5-flash");

    // Use the GenerativeModelFutures Java compatibility layer which offers
    // support for ListenableFuture and Publisher APIs
    GenerativeModelFutures model = GenerativeModelFutures.from(ai);

    ContentResolver resolver = getApplicationContext().getContentResolver();

    // Provide the URI for the PDF file you want to send to the model.
    // The try-with-resources statement closes the stream automatically.
    try (InputStream stream = resolver.openInputStream(pdfUri)) {
        if (stream != null) {
            byte[] pdfBytes = stream.readAllBytes();

            // Provide a prompt that includes the PDF file specified above and text
            Content prompt = new Content.Builder()
                    .addInlineData(pdfBytes, "application/pdf")  // Specify the appropriate PDF file MIME type
                    .addText("Summarize the important results in this report.")
                    .build();

            // To generate text output, call `generateContent` with the prompt
            ListenableFuture<GenerateContentResponse> response = model.generateContent(prompt);
            Futures.addCallback(response, new FutureCallback<GenerateContentResponse>() {
                @Override
                public void onSuccess(GenerateContentResponse result) {
                    String text = result.getText();
                    Log.d(TAG, (text == null) ? "" : text);
                }

                @Override
                public void onFailure(Throwable t) {
                    Log.e(TAG, "Failed to generate a response", t);
                }
            }, executor);
        } else {
            Log.e(TAG, "Error getting input stream for file.");
            // Handle the error appropriately
        }
    } catch (IOException e) {
        Log.e(TAG, "Failed to read the pdf file", e);
    }

Web

You can call generateContent() to generate text from multimodal input of text and PDFs.

    import { initializeApp } from "firebase/app";
    import { getAI, getGenerativeModel, GoogleAIBackend } from "firebase/ai";

    // TODO(developer) Replace the following with your app's Firebase configuration
    // See: https://firebase.google.com/docs/web/learn-more#config-object
    const firebaseConfig = {
      // ...
    };

    // Initialize FirebaseApp
    const firebaseApp = initializeApp(firebaseConfig);

    // Initialize the Gemini Developer API backend service
    const ai = getAI(firebaseApp, { backend: new GoogleAIBackend() });

    // Create a `GenerativeModel` instance with a model that supports your use case
    const model = getGenerativeModel(ai, { model: "gemini-2.5-flash" });

    // Converts a File object to a Part object.
    async function fileToGenerativePart(file) {
      const base64EncodedDataPromise = new Promise((resolve) => {
        const reader = new FileReader();
        // `reader.result` is a data URL; keep only the base64 payload after the comma
        reader.onloadend = () => resolve(reader.result.split(',')[1]);
        reader.readAsDataURL(file);
      });
      return {
        inlineData: { data: await base64EncodedDataPromise, mimeType: file.type },
      };
    }

    async function run() {
      // Provide a text prompt to include with the PDF file
      const prompt = "Summarize the important results in this report.";

      // Prepare PDF file for input (the first file selected in the file input)
      const fileInputEl = document.querySelector("input[type=file]");
      const pdfPart = await fileToGenerativePart(fileInputEl.files[0]);

      // To generate text output, call `generateContent` with the text and PDF file
      const result = await model.generateContent([prompt, pdfPart]);

      // Log the generated text, handling the case where it might be undefined
      console.log(result.response.text() ?? "No text in response.");
    }

    run();
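The Web sample reads the file as a data URL and sends only the base64 payload that follows the comma. The sketch below isolates that step so it can be checked without a browser. The helper name `dataUrlToInlineDataPart` is hypothetical and not part of the Firebase SDK, and it assumes a data URL of the form `data:<mime>;base64,<payload>`.

```javascript
// Hypothetical helper (not part of the Firebase SDK): converts a data URL such as
// "data:application/pdf;base64,JVBERi0=" into the inline-data shape used above.
function dataUrlToInlineDataPart(dataUrl) {
  const commaIndex = dataUrl.indexOf(",");
  if (!dataUrl.startsWith("data:") || commaIndex === -1) {
    throw new Error("Expected a data URL of the form data:<mime>;base64,<payload>");
  }
  // The header sits between "data:" and the first comma, e.g. "application/pdf;base64".
  const header = dataUrl.slice("data:".length, commaIndex);
  const [mimeType, ...params] = header.split(";");
  if (!params.includes("base64")) {
    throw new Error("Expected a base64-encoded data URL");
  }
  // Everything after the first comma is the base64 payload.
  return { inlineData: { data: dataUrl.slice(commaIndex + 1), mimeType } };
}

// Usage with a tiny, made-up payload.
const part = dataUrlToInlineDataPart("data:application/pdf;base64,JVBERi0=");
console.log(part.inlineData.mimeType, part.inlineData.data);
```

Taking the MIME type from the data URL header is a design choice of this sketch. The Web sample instead uses `file.type`, which is equally valid when a File object is available.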

Dart

You can call generateContent() to generate text from multimodal input of text and PDFs.

    import 'dart:io';

    import 'package:firebase_ai/firebase_ai.dart';
    import 'package:firebase_core/firebase_core.dart';
    import 'firebase_options.dart';

    // Initialize FirebaseApp
    await Firebase.initializeApp(
      options: DefaultFirebaseOptions.currentPlatform,
    );

    // Initialize the Gemini Developer API backend service
    // Create a `GenerativeModel` instance with a model that supports your use case
    final model =
        FirebaseAI.googleAI().generativeModel(model: 'gemini-2.5-flash');

    // Provide a text prompt to include with the PDF file
    final prompt = TextPart("Summarize the important results in this report.");

    // Prepare the PDF file for input
    final doc = await File('document0.pdf').readAsBytes();

    // Provide the PDF file bytes with the appropriate PDF file MIME type
    final docPart = InlineDataPart('application/pdf', doc);

    // To generate text output, call `generateContent` with the text and PDF file
    final response = await model.generateContent([
      Content.multi([prompt, docPart])
    ]);

    // Print the generated text
    print(response.text);

Unity

You can call GenerateContentAsync() to generate text from multimodal input of text and PDFs.

    using Firebase;
    using Firebase.AI;

    // Initialize the Gemini Developer API backend service
    var ai = FirebaseAI.GetInstance(FirebaseAI.Backend.GoogleAI());

    // Create a `GenerativeModel` instance with a model that supports your use case
    var model = ai.GetGenerativeModel(modelName: "gemini-2.5-flash");

    // Provide a text prompt to include with the PDF file
    var prompt = ModelContent.Text("Summarize the important results in this report.");

    // Provide the PDF file as `data` with the appropriate PDF file MIME type
    var doc = ModelContent.InlineData("application/pdf",
        System.IO.File.ReadAllBytes(System.IO.Path.Combine(
            UnityEngine.Application.streamingAssetsPath, "document0.pdf")));

    // To generate text output, call `GenerateContentAsync` with the text and PDF file
    var response = await model.GenerateContentAsync(new [] { prompt, doc });

    // Print the generated text
    UnityEngine.Debug.Log(response.Text ?? "No text in response.");

Learn how to choose a model appropriate for your use case and app.

Stream the response

You can achieve faster interactions by handling partial results as they arrive instead of waiting for the model to finish generating the entire result. To stream the response, call generateContentStream.
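For the Web SDK, consuming a streamed response might look like the sketch below. It assumes that `model` is the `GenerativeModel` created in the Web sample, and that `generateContentStream` resolves to an object whose `stream` property is an async iterable of chunks, each exposing `text()`. The helper name `collectStreamedText` and its `onChunk` callback are hypothetical.

```javascript
// A minimal sketch, not the definitive implementation.
// Assumption: `model.generateContentStream(parts)` resolves to an object with a
// `stream` async iterable whose chunks expose `text()`.
async function collectStreamedText(model, parts, onChunk) {
  const result = await model.generateContentStream(parts);
  let fullText = "";
  for await (const chunk of result.stream) {
    const chunkText = chunk.text();
    fullText += chunkText;
    // Hand each partial result to the caller as soon as it arrives.
    if (onChunk) onChunk(chunkText);
  }
  return fullText;
}
```

With the objects from the Web sample, a call could look like `await collectStreamedText(model, [prompt, pdfPart], (text) => console.log(text))`, which logs each partial result and returns the concatenated text.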