DEV Community

Toshipy

Posted on

Simple search and library application with Next.js x OpenSearch x Lambda(Python)

Background

When I was in charge of implementing a search feature at work, I filtered and displayed data retrieved from the backend on the frontend.

However, I was concerned about performance degradation as the amount of data grew, and I also thought it would be difficult to handle complex search requirements.

I had also been meaning to learn about serverless architecture, and after some research I found an introductory example on YouTube implemented with AWS serverless architecture, so I decided to try it myself.

Output

Technology Stack

  • Frontend: TypeScript, Next.js(App router v15)
  • Backend: Python
  • Infrastructure: Lambda, API Gateway, DynamoDB, OpenSearch
  • Authentication: Auth0(Omitted in this article)
  • Containers: Docker (omitted in this article)
  • Local environment construction: Cloudflare tunnel

Architecture

Front-end layer:

  • Users access the application through a browser
  • UI components are implemented with Next.js
  • An authentication server using Auth0 handles user authentication

Back-end layer:

  • API Gateway sits in front of a FastAPI (Python-based) application
  • Lambda processes requests from the front end

Data Storage Layer:

  • Use DynamoDB as main data storage
  • Employs OpenSearch as the search engine
  • Uses DynamoDB streams to handle data synchronization and updates

Backend

Build OpenSearch in a local environment (Docker)

At first, we built OpenSearch on Docker and used Cloudflare tunnel to prepare an environment that can be accessed from Lambda, then verified that Lambda could communicate with OpenSearch in the local environment.

Cloudflare tunnel is a Cloudflare service that securely exposes a local service to the Internet through an encrypted tunnel, without requiring a public IP address or open ports.
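For example, a quick tunnel can expose the local OpenSearch port with a single command (the generated URL is random, so treat the hostname as a placeholder):

```shell
# Expose local OpenSearch (port 9200) through a Cloudflare quick tunnel;
# cloudflared prints a random https://<subdomain>.trycloudflare.com URL
# that Lambda can then use as its OpenSearch endpoint.
cloudflared tunnel --url http://localhost:9200
```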

The environment is built with the following docker-compose.yml, which runs OpenSearch and OpenSearch Dashboards and makes them reachable from Lambda via Cloudflare tunnel.

```yaml
version: '3'
services:
  opensearch-node1:
    image: opensearchproject/opensearch:2.18.0
    container_name: opensearch-node1
    environment:
      - discovery.type=single-node
      - node.name=opensearch-node1
      - plugins.security.disabled=true
      - "_JAVA_OPTIONS=-XX:UseSVE=0"
      - OPENSEARCH_INITIAL_ADMIN_PASSWORD=${OPENSEARCH_INITIAL_ADMIN_PASSWORD}
      - http.host=0.0.0.0
      - transport.host=127.0.0.1
    ulimits:
      memlock:
        soft: -1
        hard: -1
      nofile:
        soft: 65536
        hard: 65536
    volumes:
      - opensearch-data1:/usr/share/opensearch/data
    ports:
      - 9200:9200
      - 9600:9600
    networks:
      - opensearch-net
  opensearch-dashboards:
    image: opensearchproject/opensearch-dashboards:2.18.0
    container_name: opensearch-dashboards
    ports:
      - 5601:5601
    expose:
      - '5601'
    environment:
      # OPENSEARCH_HOSTS: '["https://opensearch-node1:9200", "https://opensearch-node2:9200"]'
      - OPENSEARCH_HOSTS=http://opensearch-node1:9200
      - DISABLE_SECURITY_DASHBOARDS_PLUGIN=true
    networks:
      - opensearch-net
volumes:
  opensearch-data1:
networks:
  opensearch-net:
```

Building Infrastructure with Serverless Framework

The overall infrastructure is structured such that the search and registration APIs are defined in API Gateway, processing is done in Lambda, DynamoDB is used for storage, and OpenSearch is used as the search engine.

```yaml
service: search-api

provider:
  name: aws
  runtime: python3.9
  region: ap-northeast-1
  environment: ${file(env.yml)}
  iam:
    role:
      statements:
        - Effect: Allow
          Action: dynamodb:*
          Resource: !GetAtt BooksTable.Arn
        - Effect: Allow
          Action: es:ESHttp*
          Resource: !Sub arn:aws:es:${AWS::Region}:${AWS::AccountId}:domain/${self:provider.environment.OPENSEARCH_DOMAIN_NAME}/*

package:
  individually: true
  patterns:
    - '!**'
    - 'requirements.txt'

functions:
  api:
    handler: src.main.handler
    package:
      include:
        - src/**/*.py
    events:
      - httpApi:
          path: /docs
          method: get
      - httpApi:
          path: /openapi.json
          method: get
      - httpApi:
          path: /search
          method: get
      - httpApi:
          path: /books
          method: ANY
    layers:
      - Ref: PythonRequirementsLambdaLayer
  syncToOpensearch:
    handler: src.dynamodb_stream.handler
    package:
      include:
        - src/**/*.py
    events:
      - stream:
          type: dynamodb
          arn: !GetAtt BooksTable.StreamArn
    layers:
      - Ref: PythonRequirementsLambdaLayer

resources:
  Resources:
    BooksTable:
      Type: AWS::DynamoDB::Table
      Properties:
        TableName: Books
        AttributeDefinitions:
          - AttributeName: id
            AttributeType: S
        KeySchema:
          - AttributeName: id
            KeyType: HASH
        BillingMode: PAY_PER_REQUEST
        StreamSpecification:
          StreamViewType: NEW_AND_OLD_IMAGES
    OpenSearchDomain:
      Type: AWS::OpenSearchService::Domain
      Properties:
        DomainName: ${self:provider.environment.OPENSEARCH_DOMAIN_NAME}
        ClusterConfig:
          InstanceCount: 1
          InstanceType: t3.small.search
        EBSOptions:
          EBSEnabled: true
          VolumeSize: 10
          VolumeType: gp2
        EncryptionAtRestOptions:
          Enabled: true
        NodeToNodeEncryptionOptions:
          Enabled: true
        DomainEndpointOptions:
          EnforceHTTPS: true
        AdvancedSecurityOptions:
          Enabled: true
          InternalUserDatabaseEnabled: true
          MasterUserOptions:
            MasterUserName: ${self:provider.environment.OPENSEARCH_USERNAME}
            MasterUserPassword: ${self:provider.environment.OPENSEARCH_PASSWORD}

plugins:
  - serverless-python-requirements

custom:
  pythonRequirements:
    layer: true
```

Features

  • Python 3.9 runtime
  • Dependencies managed by serverless-python-requirements plugin
  • OpenSearch in single node configuration (t3.small)
  • Uses FastAPI framework (from the presence of /docs endpoint) to provide search and CRUD operations on book data
  • Enables DynamoDB streams, so a Lambda is executed to synchronize DynamoDB changes to OpenSearch
  • syncToOpensearch: Function to synchronize data from DynamoDB to OpenSearch
  • DynamoDB table (BooksTable) to store and search book data
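The code for syncToOpensearch is not shown in this article; a minimal sketch of what such a stream handler could look like is below. The flatten_image helper is a simplified deserializer written for illustration, and the utils import path is an assumption.

```python
# Minimal sketch of a DynamoDB Streams -> OpenSearch sync handler.
# flatten_image is a simplified stand-in for boto3's TypeDeserializer,
# covering only the attribute types used by the Books table.

def flatten_image(image):
    """Convert a DynamoDB stream image ({'title': {'S': 'foo'}, ...})
    into a plain dict. Handles only S, N, and L-of-S attribute types."""
    doc = {}
    for key, value in image.items():
        type_tag, raw = next(iter(value.items()))
        if type_tag == 'S':
            doc[key] = raw
        elif type_tag == 'N':
            doc[key] = float(raw)
        elif type_tag == 'L':
            doc[key] = [item.get('S') for item in raw]
    return doc

def handler(event, context):
    # get_opensearch_client is the helper shown later in this article;
    # the module path here is assumed.
    from src.utils import get_opensearch_client
    client = get_opensearch_client()
    for record in event['Records']:
        if record['eventName'] in ('INSERT', 'MODIFY'):
            doc = flatten_image(record['dynamodb']['NewImage'])
            client.index(index='books', id=doc['id'], body=doc)
        elif record['eventName'] == 'REMOVE':
            key = flatten_image(record['dynamodb']['Keys'])
            client.delete(index='books', id=key['id'])
```

Because NEW_AND_OLD_IMAGES is enabled on the table's stream, each record carries the full item, so the handler can index documents without reading DynamoDB again.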

Search API

We wanted to test the API using SwaggerUI during development, so we used Python's FastAPI to be able to automatically generate OpenAPI (Swagger) documents.

Features

  • Both title and body text are searched
  • Enforces IAM-based authentication and HTTPS communication in the connection process with OpenSearch clients.
  • Separate use of DynamoDB and OpenSearch (separation of writes and searches). The registration API is omitted here.
  • hits → get the number of hits (result["hits"]["total"]["value"])
  • results → convert the search results list to an object of type SearchResponse.
```python
from fastapi import FastAPI, HTTPException
from mangum import Mangum

# schema and utility imports (module paths assumed)
from .schemas import SearchResponse, SearchResult
from .utils import get_dynamodb_table, get_opensearch_client, build_opensearch_query

table = get_dynamodb_table()

app = FastAPI(
    title="Search API",
    description="Search API for OpenSearch",
    version="1.0.0",
)

@app.get("/books", response_model=SearchResult,
         summary="Search for a character",
         description="Search for a character by name")
async def search(keyword: str = None):
    try:
        client = get_opensearch_client()
        query = build_opensearch_query(keyword)
        result = client.search(index="books", body=query)
        hits = result["hits"]["total"]["value"]
        results = [
            SearchResponse(
                id=str(hit['_source']['id']),
                title=hit['_source']['title'],
                story=hit['_source']['story'],
                attributes=hit['_source']['attributes'],
                created_at=hit['_source']['created_at'],
                updated_at=hit['_source']['updated_at'],
            )
            for hit in result["hits"]["hits"]
        ]
        return SearchResult(hits=hits, results=results)
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))

handler = Mangum(app)
```

The schemas and utils are as follows.

```python
from pydantic import BaseModel
from typing import List

class SearchResponse(BaseModel):
    id: str
    title: str
    story: str
    attributes: List[str]
    created_at: str
    updated_at: str

class SearchResult(BaseModel):
    hits: int
    results: List[SearchResponse]

class CreateBook(BaseModel):
    title: str

class Book(BaseModel):
    id: str
    title: str
    story: str
    attributes: List[str]
    created_at: str
    updated_at: str
```
```python
from opensearchpy import OpenSearch, RequestsHttpConnection
from requests_aws4auth import AWS4Auth
import os
import boto3

def get_opensearch_client():
    endpoint = os.environ.get('OPENSEARCH_ENDPOINT', '')
    host = endpoint.replace('https://', '').replace('http://', '')
    credentials = boto3.Session().get_credentials()
    region = os.environ.get('AWS_REGION', 'ap-northeast-1')
    auth = AWS4Auth(
        credentials.access_key,
        credentials.secret_key,
        region,
        'es',
        session_token=credentials.token
    )
    return OpenSearch(
        hosts=[{'host': host, 'port': 443}],
        http_auth=auth,
        use_ssl=True,
        verify_certs=False,
        connection_class=RequestsHttpConnection
    )

def get_dynamodb_table():
    return boto3.resource('dynamodb').Table('Books')

OPENSEARCH_INDEX = 'books'
OPENSEARCH_MAPPING = {
    "mappings": {
        "properties": {
            "id": {"type": "keyword"},
            "title": {"type": "text"},
            "story": {"type": "text"},
            "attributes": {"type": "keyword"},
            "created_at": {"type": "date"},
            "updated_at": {"type": "date"}
        }
    }
}

def build_opensearch_query(keyword):
    return {
        "query": {
            "multi_match": {
                "query": keyword,
                "fields": ["title", "story"]
            }
        }
    }
```
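The OPENSEARCH_MAPPING constant is defined in utils but its use is not shown in the article; one way it could be applied is a small helper that creates the index on first use. ensure_index is a hypothetical name, not part of the original code.

```python
# Hypothetical helper that creates the 'books' index with its mapping
# if it does not exist yet. Works with any client exposing the
# opensearch-py indices API (indices.exists / indices.create).

OPENSEARCH_INDEX = 'books'
OPENSEARCH_MAPPING = {
    "mappings": {
        "properties": {
            "id": {"type": "keyword"},
            "title": {"type": "text"},
            "story": {"type": "text"},
            "attributes": {"type": "keyword"},
            "created_at": {"type": "date"},
            "updated_at": {"type": "date"}
        }
    }
}

def ensure_index(client):
    """Create the index with its mapping on first use; no-op afterwards."""
    if not client.indices.exists(index=OPENSEARCH_INDEX):
        client.indices.create(index=OPENSEARCH_INDEX, body=OPENSEARCH_MAPPING)
        return True
    return False
```

Defining title and story as text (analyzed) and id and attributes as keyword (exact match) is what lets the multi_match query above search the full text while filters stay precise.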

Frontend

Separation of Concerns between Server Actions and Services

The following separation of concerns between Server Actions and Services makes the code easier to read, and when a request is thrown it becomes clear whether an error occurred during schema validation, during request processing, or in the response.

Server Actions (next-safe-action):

  • Schema validation of input values with zod
  • Calling interface from client
  • Type safety
  • Error handling (application-level error handling)

Service:

  • Allows you to focus on business logic implementation
  • API call implementation
  • Response handling
  • Error handling (error handling related to API communication and business logic)
```typescript
'use server'

import { actionClient } from '@/lib/actions/safe-action'
import { searchBooks } from '@/lib/services/books/search-books'
import { searchQuerySchema } from '@/lib/zod/schemas/books'

export const searchBooksAction = actionClient
  .schema(searchQuerySchema)
  .action(async input => {
    const books = await searchBooks({
      params: { keyword: input.parsedInput.keyword }
    })
    return books
  })
```
```typescript
import 'server-only'

import { path, handleFailed, handleSucceed } from '@/lib/services/utils'
import { SearchBooksRequest, SearchBooksResponse } from '@/lib/types/books'

export const searchBooks = async ({
  params: { keyword }
}: SearchBooksRequest): Promise<SearchBooksResponse> => {
  const url = path(`/books?keyword=${encodeURIComponent(keyword)}`)
  return fetch(url, {
    headers: { 'Content-Type': 'application/json' },
    method: 'GET',
    cache: 'no-store'
  })
    .then(handleSucceed)
    .catch(handleFailed)
}
```

shadcn/ui provides modern UI components.

We used the command component in this implementation.
https://ui.shadcn.com/docs/components/command

Using shadcn/ui saves implementation effort (for things like dropdown and focus management, which are complex to implement in a search UI) and lets you concentrate on the business logic of the search feature.
It is recommended, as it can significantly reduce the time required to implement the UI.

Fast Linter & Formatter by Biome

Since we are using TypeScript this time, we used Biome instead of ESLint or Prettier.

Biome has the following features.

  • Fast code linter & formatter written in Rust
  • Combines the functionality of ESLint and Prettier in one tool
  • Designed for performance and runs very fast
  • Easy to set up so it can be deployed quickly (with VSCode, all you need to do is install the Biome extension and customize biome.json)

For example, just write the following Makefile, and you can run the type check, linter, and format check with make check in the terminal.
And with make format, formatting runs quickly according to the Biome rules you customized yourself.

```makefile
biome-check:
	npx biome check ./src

biome-write:
	npx biome check --write ./src

tsc-check:
	npx tsc --noEmit

check:
	make biome-check
	make tsc-check

format:
	make biome-write
```

https://biomejs.dev/ja/formatter/

Finally

If I have a chance in the future, I would like to introduce Japanese morphological analysis tools such as `Kuromoji` and `Sudachi` to improve the accuracy of searching Japanese text. Since there is a lot of content, some parts may be hard to follow or only partially explained, but I wrote this in the hope that it will be useful to someone! 🙇
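As a rough idea of what the Kuromoji integration could look like: with the analysis-kuromoji plugin installed on the OpenSearch domain, the text fields can be given a Japanese analyzer. This mapping variant is a hypothetical sketch, not something implemented in this project.

```python
# Hypothetical mapping variant using the kuromoji tokenizer from the
# analysis-kuromoji plugin (the plugin must be available on the domain).
# "ja_analyzer" is a name chosen for this sketch.
KUROMOJI_MAPPING = {
    "settings": {
        "analysis": {
            "analyzer": {
                "ja_analyzer": {
                    "type": "custom",
                    "tokenizer": "kuromoji_tokenizer",
                    "filter": ["kuromoji_baseform", "lowercase"]
                }
            }
        }
    },
    "mappings": {
        "properties": {
            "title": {"type": "text", "analyzer": "ja_analyzer"},
            "story": {"type": "text", "analyzer": "ja_analyzer"}
        }
    }
}
```

With this in place, queries against title and story would be tokenized morphologically instead of by character n-grams, which generally improves precision for Japanese text.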
