Unstructured-IO/unstructured-python-client

Python SDK for the Unstructured API

This is a Python client for the Unstructured API.

SDK Installation

pip install unstructured-client

Usage

Only the files parameter is required. See the general partition page for all available parameters. 

    from unstructured_client import UnstructuredClient
    from unstructured_client.models import shared
    from unstructured_client.models.errors import SDKError

    s = UnstructuredClient(api_key_auth="YOUR_API_KEY")

    filename = "sample-docs/layout-parser-paper.pdf"

    # Read the file inside a context manager so the handle is always closed
    with open(filename, "rb") as file:
        content = file.read()

    req = shared.PartitionParameters(
        # Note that this currently only supports a single file
        files=shared.PartitionParametersFiles(
            content=content,
            files=filename,
        ),
        # Other partition params
        strategy="fast",
    )

    try:
        res = s.general.partition(req)
        print(res.elements[0])
    except SDKError as e:
        print(e)

    # {
    # 'type': 'Title',
    # 'element_id': '015301d4f56aa4b20ec10ac889d2343f',
    # 'metadata': {'filename': 'layout-parser-paper.pdf', 'filetype': 'application/pdf', 'page_number': 1},
    # 'text': 'LayoutParser: A Unified Toolkit for Deep Learning Based Document Image Analysis'
    # }
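Once elements come back, a common next step is to group their text by element type. The following is a minimal sketch, assuming each element is a dictionary shaped like the sample output above (with `type` and `text` keys). `group_text_by_type` is a hypothetical helper for illustration and is not part of the SDK.

```python
from collections import defaultdict


def group_text_by_type(elements):
    """Group element text by each element's 'type' field."""
    grouped = defaultdict(list)
    for element in elements:
        # Fall back to placeholders if a key is missing from an element.
        grouped[element.get("type", "Unknown")].append(element.get("text", ""))
    return dict(grouped)


# Sample data copied from the element printed in the usage example.
sample_elements = [
    {
        "type": "Title",
        "element_id": "015301d4f56aa4b20ec10ac889d2343f",
        "metadata": {
            "filename": "layout-parser-paper.pdf",
            "filetype": "application/pdf",
            "page_number": 1,
        },
        "text": "LayoutParser: A Unified Toolkit for Deep Learning Based Document Image Analysis",
    },
]

titles = group_text_by_type(sample_elements)["Title"]
```

In real use you would pass `res.elements` instead of `sample_elements`, provided the elements are returned as dictionaries as the sample output suggests.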

Change the base URL

If you are self hosting the API, or developing locally, you can change the server URL when setting up the client.

    import unstructured_client
    from unstructured_client.models import shared

    api_key = "YOUR_API_KEY"

    # Using a local server
    s = unstructured_client.UnstructuredClient(
        server_url="http://localhost:8000",
        security=shared.Security(
            api_key_auth=api_key,
        ),
    )

    # Using your own server
    s = unstructured_client.UnstructuredClient(
        server_url="https://your-server",
        security=shared.Security(
            api_key_auth=api_key,
        ),
    )
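When the same code runs both locally and against a hosted server, it can help to read the URL from the environment rather than hard-coding it. The sketch below is one possible approach, not something the SDK provides: the `UNSTRUCTURED_API_URL` variable name and both helper functions are assumptions made for illustration. The constructor arguments in `build_client` follow the `api_key_auth` and `server_url` parameters used in the Server Selection examples of this README.

```python
import os

# Local development server from the example above.
DEFAULT_SERVER_URL = "http://localhost:8000"


def resolve_server_url(environ=os.environ):
    """Return the URL from the (hypothetical) UNSTRUCTURED_API_URL variable."""
    url = environ.get("UNSTRUCTURED_API_URL", "").strip()
    # Drop any trailing slash so the URL has one consistent form.
    return (url or DEFAULT_SERVER_URL).rstrip("/")


def build_client(api_key):
    """Create a client for the resolved URL (requires unstructured-client)."""
    # Imported lazily so resolve_server_url works without the SDK installed.
    from unstructured_client import UnstructuredClient

    return UnstructuredClient(api_key_auth=api_key, server_url=resolve_server_url())
```

With this in place, setting `UNSTRUCTURED_API_URL=https://your-server` before launching the program switches servers without a code change.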

Error Handling

Every operation either returns a response object or raises an error. If error objects are specified in the OpenAPI spec, the SDK raises the corresponding error type.

Example

    import unstructured_client
    from unstructured_client.models import errors, shared

    s = unstructured_client.UnstructuredClient(
        api_key_auth="YOUR_API_KEY",
    )

    req = shared.PartitionParameters(
        chunking_strategy='by_title',
        combine_under_n_chars=500,
        encoding='utf-8',
        files=shared.PartitionParametersFiles(
            content='+WmI5Q)|yy'.encode(),
            files='string',
        ),
        gz_uncompressed_content_type='application/pdf',
        hi_res_model_name='yolox',
        languages=['eng'],
        new_after_n_chars=1500,
        output_format='application/json',
        skip_infer_table_types=['pdf'],
        strategy='hi_res',
    )

    res = None
    try:
        res = s.general.partition(req)
    except errors.HTTPValidationError as e:
        # handle exception
        print(e)

    # res stays None if the call raised, so check it before reading elements
    if res is not None and res.elements is not None:
        # handle response
        pass
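If several call sites need the same "return the elements or fall back to nothing" behavior, the try/except pattern can be wrapped once. This is only a sketch under stated assumptions: `elements_or_empty` is a hypothetical helper, and it assumes the response exposes an `elements` attribute as in the example above.

```python
def elements_or_empty(call, expected_errors=(Exception,)):
    """Run `call` and return its elements, or [] if an expected error is raised."""
    try:
        res = call()
    except expected_errors as exc:
        print(f"partition failed: {exc}")
        return []
    # Guard against a missing response or a missing `elements` attribute.
    elements = getattr(res, "elements", None)
    return elements if elements is not None else []
```

A call such as `elements_or_empty(lambda: s.general.partition(req), (SDKError,))` would then return a list in every case, with `SDKError` imported from `unstructured_client.models.errors` as in the Usage section. Passing a narrow tuple of error types is preferable to the broad default, which exists only to keep the sketch self-contained.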

Server Selection

Select Server by Name

You can override the default server globally by passing a server name to the server: str optional parameter when initializing the SDK client instance. The selected server will then be used as the default on the operations that use it. This table lists the names associated with the available servers:

Name     Server                         Variables
prod     https://api.unstructured.io    None
local    http://localhost:8000          None

For example:

    import unstructured_client
    from unstructured_client.models import shared

    s = unstructured_client.UnstructuredClient(
        api_key_auth="YOUR_API_KEY",
        server="local"
    )

    req = shared.PartitionParameters(
        chunking_strategy='by_title',
        combine_under_n_chars=500,
        encoding='utf-8',
        files=shared.PartitionParametersFiles(
            content='+WmI5Q)|yy'.encode(),
            files='string',
        ),
        gz_uncompressed_content_type='application/pdf',
        hi_res_model_name='yolox',
        languages=['eng'],
        new_after_n_chars=1500,
        output_format='application/json',
        skip_infer_table_types=['pdf'],
        strategy='hi_res',
    )

    res = s.general.partition(req)

    if res.elements is not None:
        # handle response
        pass
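Conceptually, selecting a server by name is a lookup from the names in the table above to their URLs. The sketch below illustrates that idea only; it is not the SDK's internal implementation, and `SERVERS` and `server_url_for` are hypothetical names.

```python
# Names and URLs taken from the server table above.
SERVERS = {
    "prod": "https://api.unstructured.io",
    "local": "http://localhost:8000",
}


def server_url_for(name):
    """Look up the URL for a server name, failing loudly on unknown names."""
    try:
        return SERVERS[name]
    except KeyError:
        known = ", ".join(sorted(SERVERS))
        raise ValueError(f"unknown server {name!r}; expected one of: {known}") from None
```

Validating the name up front like this gives a clear error for a typo such as `"locla"` before any request is attempted.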

Override Server URL Per-Client

The default server can also be overridden globally by passing a URL to the server_url: str optional parameter when initializing the SDK client instance. For example:

    import unstructured_client
    from unstructured_client.models import shared

    s = unstructured_client.UnstructuredClient(
        api_key_auth="YOUR_API_KEY",
        server_url="https://api.unstructured.io"
    )

    req = shared.PartitionParameters(
        chunking_strategy='by_title',
        combine_under_n_chars=500,
        encoding='utf-8',
        files=shared.PartitionParametersFiles(
            content='+WmI5Q)|yy'.encode(),
            files='string',
        ),
        gz_uncompressed_content_type='application/pdf',
        hi_res_model_name='yolox',
        languages=['eng'],
        new_after_n_chars=1500,
        output_format='application/json',
        skip_infer_table_types=['pdf'],
        strategy='hi_res',
    )

    res = s.general.partition(req)

    if res.elements is not None:
        # handle response
        pass

Custom HTTP Client

The Python SDK makes API calls using the requests HTTP library (https://pypi.org/project/requests/). To configure timeouts, cookies, proxies, custom headers, and other low-level settings, you can initialize the SDK client with a custom requests.Session object.

For example, you could specify a header for every request that the SDK makes as follows:

    import unstructured_client
    import requests

    # Headers set on the session are sent with every request
    http_client = requests.Session()
    http_client.headers.update({'x-custom-header': 'someValue'})

    s = unstructured_client.UnstructuredClient(client=http_client)

Maturity

This SDK is in beta, and there may be breaking changes between versions without a major version update. Therefore, we recommend pinning usage to a specific package version. This way, you can install the same version each time without breaking changes unless you are intentionally looking for the latest version.
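Beyond pinning the version at install time, an application can check at startup that the installed version is the one it was tested against. This is a minimal sketch using only the standard library; the helper names are hypothetical, and the pinned version string you compare against is whatever release you have chosen to pin.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package="unstructured-client"):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def matches_pin(installed, pinned):
    """True only when the installed version equals the pinned version exactly."""
    return installed is not None and installed == pinned
```

At startup, a check such as `matches_pin(installed_version(), PINNED_VERSION)` can raise or log a warning when the environment has drifted from the pinned release, where `PINNED_VERSION` is a constant you define.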

Contributions

While we value open-source contributions to this SDK, this library is generated programmatically. Feel free to open a PR or a GitHub issue as a proof of concept and we'll do our best to include it in a future release!

SDK Created by Speakeasy
