Categories: String & binary functions (AI Functions)

AI_EXTRACT

Extracts information from an input string or file.

Syntax

Extracting information from an input string:

AI_EXTRACT( <text>, <responseFormat> ) 

AI_EXTRACT( text => <text>, responseFormat => <responseFormat> ) 

Extracting information from a file:

AI_EXTRACT( <file>, <responseFormat> ) 

AI_EXTRACT( file => <file>, responseFormat => <responseFormat> ) 

Arguments

text

An input string for extraction.

file

A FILE for extraction.

Supported file formats:

PDF
PNG
PPTX, PPT
EML
DOC, DOCX
JPEG, JPG
HTM, HTML
TEXT, TXT
TIF, TIFF
BMP, GIF, WEBP
MD

Files must be less than 100 MB in size.

responseFormat

Information to be extracted, in one of the following response formats:

Simple object schema that maps the label and information to be extracted; for example:

{'name': 'What is the last name of the employee?', 'address': 'What is the address of the employee?'} 

An array of strings that contain the information to be extracted; for example:
```
['What is the last name of the employee?', 'What is the address of the employee?'] 
```

An array of arrays that contain two strings (label and the information to be extracted); for example:

[['name', 'What is the last name of the employee?'], ['address', 'What is the address of the employee?']] 

A JSON schema that defines the structure of the extracted information. Supports entity and table extraction. For example:
```
{
  'schema': {
    'type': 'object',
    'properties': {
      'income_table': {
        'description': 'Income for FY2026Q2',
        'type': 'object',
        'column_ordering': ['month', 'income'],
        'properties': {
          'month': {
            'description': 'Month',
            'type': 'array'
          },
          'income': {
            'description': 'Income',
            'type': 'array'
          }
        }
      },
      'title': {
        'description': 'What is the title of the document?',
        'type': 'string'
      },
      'employees': {
        'description': 'What are the names of employees?',
        'type': 'array'
      }
    }
  }
}
```
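Note
- You can't combine the JSON schema format with other response formats. If responseFormat contains the schema key, you must define all questions within the JSON schema. Additional keys are not supported.
- The model accepts only certain shapes of JSON schema. The top-level type must always be an object that contains independently extracted sub-objects. A sub-object can be a table (an object whose properties are lists of strings representing columns), a list of strings, or a string.
  
  String is currently the only supported scalar type.
- The description field is optional.
  
  Use the description field to provide context to the model; for example, to help the model locate the right table in a document.
- Use the column_ordering field to specify the order of all columns in the extracted table. The column_ordering field is case-sensitive and must match the column names defined in the properties field.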
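
The schema rules above can be checked on the client before a query is sent. The following Python sketch is illustrative only: validate_response_schema is a hypothetical helper, not part of Snowflake, and it assumes the responseFormat is available as a Python dict. It also assumes column_ordering may be omitted, which this page does not state either way.

```python
def validate_response_schema(response_format):
    """Return a list of problems found in a JSON-schema style responseFormat.

    Hypothetical client-side check; it mirrors the notes in this section and
    is not an official validator.
    """
    problems = []
    # The JSON schema format cannot be mixed with other keys.
    if set(response_format) != {"schema"}:
        problems.append("responseFormat must contain only the 'schema' key")
        return problems

    schema = response_format["schema"]
    if schema.get("type") != "object":
        problems.append("top-level type must be 'object'")

    for name, spec in schema.get("properties", {}).items():
        kind = spec.get("type")
        if kind in ("string", "array"):
            continue  # single value or list of strings
        if kind != "object":
            problems.append(f"{name}: unsupported type {kind!r}")
            continue

        # An object sub-property is a table: every column must be an array.
        columns = spec.get("properties", {})
        for col, col_spec in columns.items():
            if col_spec.get("type") != "array":
                problems.append(f"{name}.{col}: table columns must have type 'array'")

        # column_ordering is case-sensitive and must name every column.
        ordering = spec.get("column_ordering")
        if ordering is not None and sorted(ordering) != sorted(columns):
            problems.append(
                f"{name}: column_ordering must list exactly the columns in properties"
            )
    return problems
```

An empty list means that none of the rules checked here were violated; the function itself remains the final authority on what it accepts.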

Returns

A JSON object that contains the extracted information.

Example of an output that includes array, table, and single-value extraction:

{  "error": null,  "response": {  "employees": [  "Smith",  "Johnson",  "Doe"  ],  "income_table": {  "income": ["$120 678","$130 123","$150 998"],  "month": ["February", "March", "April"]  },  "title": "Financial report"  } } 
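
Because a table is returned as one list per column, client code often needs to pivot it into rows. The Python sketch below shows one way to do that, assuming the result has already been fetched as a JSON string; table_to_rows is a hypothetical helper, and the sample data is copied from the output above.

```python
import json

def table_to_rows(table, column_ordering):
    """Pivot a column-oriented table into a list of row dicts.

    zip() stops at the shortest column, so ragged columns are truncated.
    """
    columns = [table[name] for name in column_ordering]
    return [dict(zip(column_ordering, values)) for values in zip(*columns)]

# Sample output from this page, assumed to arrive as a JSON string.
raw = """
{
  "error": null,
  "response": {
    "employees": ["Smith", "Johnson", "Doe"],
    "income_table": {
      "income": ["$120 678", "$130 123", "$150 998"],
      "month": ["February", "March", "April"]
    },
    "title": "Financial report"
  }
}
"""

result = json.loads(raw)
rows = []
if result["error"] is None:
    rows = table_to_rows(result["response"]["income_table"], ["month", "income"])

for row in rows:
    print(row["month"], row["income"])
```

Passing the same list that was used for column_ordering keeps the row keys in the order requested in the schema.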

Access control requirements

Users must use a role that has been granted the SNOWFLAKE.CORTEX_USER database role. For information about granting this privilege, see Cortex LLM privileges.

Usage notes

You can't use the text and file parameters together in the same function call.
You can either ask questions in natural language or describe information to be extracted (such as city, street, ZIP code); for example:
{'address': 'City, street, ZIP', 'name': 'First and last name'}
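The following languages are supported:
- Arabic
- Bengali
- Burmese
- Cebuano
- Chinese
- Czech
- Dutch
- English
- French
- German
- Hebrew
- Hindi
- Indonesian
- Italian
- Japanese
- Khmer
- Korean
- Lao
- Malay
- Persian
- Polish
- Portuguese
- Russian
- Spanish
- Tagalog
- Thai
- Turkish
- Urdu
- Vietnamese
Documents must be no more than 125 pages long.
In a single AI_EXTRACT call, you can ask a maximum of 100 questions for entity extraction and a maximum of 10 questions for table extraction.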

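One table extraction question counts as 10 entity extraction questions. For example, you can ask 4 table extraction questions and 60 entity extraction questions in a single AI_EXTRACT call.
The maximum output length for entity extraction is 512 tokens per question. For table extraction, the model returns answers of up to 4096 tokens.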
Client-side encrypted stages are not supported.
Confidence scores are not supported.
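
The question limits above can be read as a single weighted budget: each table question uses 10 of the 100 available entity-question slots. The Python sketch below encodes that reading. It is an interpretation of the limits stated on this page, not an official formula, and count_questions and fits_budget are hypothetical helper names.

```python
ENTITY_BUDGET = 100  # maximum entity-question equivalents per call
TABLE_LIMIT = 10     # maximum table questions per call
TABLE_WEIGHT = 10    # one table question counts as 10 entity questions

def count_questions(response_format):
    """Count (entity, table) questions in a JSON-schema style responseFormat."""
    entity = table = 0
    for spec in response_format["schema"]["properties"].values():
        if spec.get("type") == "object":
            table += 1   # an object sub-property is a table
        else:
            entity += 1  # a string or an array of strings
    return entity, table

def fits_budget(entity_questions, table_questions):
    """Return True if the combination stays within the assumed budget."""
    used = entity_questions + TABLE_WEIGHT * table_questions
    return table_questions <= TABLE_LIMIT and used <= ENTITY_BUDGET

# The example from the usage notes: 4 tables and 60 entities use the full budget.
print(fits_budget(60, 4))
print(fits_budget(61, 4))
```

Under this reading, 60 entity questions with 4 table questions fit exactly, and one more entity question exceeds the limit.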

Examples

Extraction from an input string

The following example extracts information from the input text:

SELECT AI_EXTRACT( text => 'John Smith lives in San Francisco and works for Snowflake', responseFormat => {'name': 'What is the first name of the employee?', 'city': 'What is the address of the employee?'} ); 

The following example extracts information from the input text, passing responseFormat as a JSON string parsed with PARSE_JSON:

SELECT AI_EXTRACT( text => 'John Smith lives in San Francisco and works for Snowflake', responseFormat => PARSE_JSON('{"name": "What is the first name of the employee?", "address": "What is the address of the employee?"}') ); 

Extraction from a file

The following example extracts information from the document.pdf file:

SELECT AI_EXTRACT( file => TO_FILE('@db.schema.files','document.pdf'), responseFormat => [['name', 'What is the first name of the employee?'], ['city', 'Where does the employee live?']] ); 

The following example extracts information from all files in a directory on a stage:

Note

Make sure that the directory table is enabled. For more information, see Manage directory tables.
```
SELECT AI_EXTRACT(
  file => TO_FILE('@db.schema.files', relative_path),
  responseFormat => [
    'What is this document?',
    'How would you classify this document?'
  ]
) FROM DIRECTORY (@db.schema.files);
```

The following example extracts the title value from the report.pdf file:

SELECT AI_EXTRACT( file => TO_FILE('@db.schema.files', 'report.pdf'), responseFormat => { 'schema': { 'type': 'object', 'properties': { 'title': { 'description': 'What is the title of document?', 'type': 'string' } } } } ); 

The following example extracts the employees array from the report.pdf file:

SELECT AI_EXTRACT( file => TO_FILE('@db.schema.files', 'report.pdf'), responseFormat => { 'schema': { 'type': 'object', 'properties': { 'employees': { 'description': 'What are the surnames of employees?', 'type': 'array' } } } } ); 

The following example extracts the income_table table from the report.pdf file:

SELECT AI_EXTRACT( file => TO_FILE('@db.schema.files', 'report.pdf'), responseFormat => { 'schema': { 'type': 'object', 'properties': { 'income_table': { 'description': 'Income for FY2026Q2', 'type': 'object', 'column_ordering': ['month', 'income'], 'properties': { 'month': { 'type': 'array' }, 'income': { 'type': 'array' } } } } } } ); 

The following example extracts a table (income_table), a single value (title), and an array (employees) from the report.pdf file:

SELECT AI_EXTRACT( file => TO_FILE('@db.schema.files', 'report.pdf'), responseFormat => { 'schema': { 'type': 'object', 'properties': { 'income_table': { 'description': 'Income for FY2026Q2', 'type': 'object', 'column_ordering': ['month', 'income'], 'properties': { 'month': { 'type': 'array' }, 'income': { 'type': 'array' } } }, 'title': { 'description': 'What is the title of document?', 'type': 'string' }, 'employees': { 'description': 'What are the surnames of employees?', 'type': 'array' } } } } ); 

Regional availability

See Regional availability.

Legal notices

Refer to Snowflake AI and ML.