This is a Python client for the Unstructured API.
pip install unstructured-client
Only the `files` parameter is required. See the general partition page for all available parameters.

    from unstructured_client import UnstructuredClient
    from unstructured_client.models import shared
    from unstructured_client.models.errors import SDKError

    s = UnstructuredClient(api_key_auth="YOUR_API_KEY")

    filename = "sample-docs/layout-parser-paper.pdf"

    # Read the file inside a context manager so the handle is always closed
    with open(filename, "rb") as file:
        content = file.read()

    req = shared.PartitionParameters(
        # Note that this currently only supports a single file
        files=shared.PartitionParametersFiles(
            content=content,
            files=filename,
        ),
        # Other partition params
        strategy="fast",
    )

    try:
        res = s.general.partition(req)
        print(res.elements[0])
    except SDKError as e:
        print(e)

    # {
    # 'type': 'Title',
    # 'element_id': '015301d4f56aa4b20ec10ac889d2343f',
    # 'metadata': {'filename': 'layout-parser-paper.pdf', 'filetype': 'application/pdf', 'page_number': 1},
    # 'text': 'LayoutParser: A Unified Toolkit for Deep Learning Based Document Image Analysis'
    # }
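Once a call succeeds, the returned elements can be post-processed like any other list. The sketch below groups element text by element type. It assumes each element is a dict shaped like the sample output above (`type`, `text`, and so on); `group_text_by_type` is a hypothetical helper, not part of the SDK, and the second sample element is invented for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def group_text_by_type(elements: Iterable[dict]) -> Dict[str, List[str]]:
    """Map each element type (e.g. 'Title') to its texts, in document order."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for element in elements:
        # .get() keeps the sketch tolerant of elements that lack a key
        grouped[element.get("type", "Unknown")].append(element.get("text", ""))
    return dict(grouped)


# First element mirrors the sample output above; the second is made up.
sample_elements = [
    {
        "type": "Title",
        "text": "LayoutParser: A Unified Toolkit for Deep Learning Based Document Image Analysis",
    },
    {"type": "NarrativeText", "text": "Example paragraph text."},
]

grouped = group_text_by_type(sample_elements)
print(grouped["Title"])
```

With a real response, `res.elements` would take the place of `sample_elements`, provided the elements are plain dicts as the printed output suggests.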
If you are self-hosting the API or developing locally, you can change the server URL when setting up the client.

    import unstructured_client
    from unstructured_client.models import shared

    api_key = "YOUR_API_KEY"

    # Using a local server
    s = unstructured_client.UnstructuredClient(
        server_url="http://localhost:8000",
        security=shared.Security(
            api_key_auth=api_key,
        ),
    )

    # Using your own server
    s = unstructured_client.UnstructuredClient(
        server_url="https://your-server",
        security=shared.Security(
            api_key_auth=api_key,
        ),
    )
Error handling in this SDK should largely match your expectations. Every operation either returns a response object or raises an error. If error objects are specified in the OpenAPI spec, the SDK raises the corresponding error type.

    import unstructured_client
    from unstructured_client.models import errors, shared

    s = unstructured_client.UnstructuredClient(
        api_key_auth="YOUR_API_KEY",
    )

    req = shared.PartitionParameters(
        chunking_strategy='by_title',
        combine_under_n_chars=500,
        encoding='utf-8',
        files=shared.PartitionParametersFiles(
            content='+WmI5Q)|yy'.encode(),
            files='string',
        ),
        gz_uncompressed_content_type='application/pdf',
        hi_res_model_name='yolox',
        languages=['eng'],
        new_after_n_chars=1500,
        output_format='application/json',
        skip_infer_table_types=['pdf'],
        strategy='hi_res',
    )

    res = None
    try:
        res = s.general.partition(req)
    except errors.HTTPValidationError as e:
        print(e)  # handle exception

    # res is still None if the call raised, so check it before reading elements
    if res is not None and res.elements is not None:
        # handle response
        pass
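The return-or-raise pattern described above can be wrapped in a small helper so that callers deal with a single return value. This is only a sketch: `partition_or_none` is a hypothetical function, not part of the SDK, and it takes the operation and the error types as arguments so that it makes no assumptions about the client beyond the response having an `elements` attribute.

```python
from typing import Any, Callable, List, Optional, Tuple, Type


def partition_or_none(
    call: Callable[[Any], Any],
    request: Any,
    handled: Tuple[Type[BaseException], ...],
) -> Optional[List[Any]]:
    """Run call(request); return its .elements, or None if a handled error is raised."""
    try:
        response = call(request)
    except handled as error:
        # Report and swallow only the error types the caller opted into
        print(f"partition failed: {error}")
        return None
    return response.elements


# Stand-ins for a real client so the sketch runs without network access
class FakeResponse:
    elements = [{"type": "Title", "text": "Hello"}]


def failing_call(request: Any) -> Any:
    raise ValueError("simulated validation error")


print(partition_or_none(lambda request: FakeResponse(), None, (ValueError,)))
print(partition_or_none(failing_call, None, (ValueError,)))
```

With the real client, the call would look like `partition_or_none(s.general.partition, req, (SDKError,))`, using the `SDKError` import shown in the first example. Errors outside `handled` still propagate, which keeps unexpected failures visible.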
You can override the default server globally by passing a server name to the optional `server: str` parameter when initializing the SDK client instance. The selected server is then used as the default for the operations that use it. This table lists the names associated with the available servers:
Name | Server | Variables |
---|---|---|
prod | https://api.unstructured.io | None |
local | http://localhost:8000 | None |
For example:

    import unstructured_client
    from unstructured_client.models import shared

    s = unstructured_client.UnstructuredClient(
        api_key_auth="YOUR_API_KEY",
        server="local",
    )

    req = shared.PartitionParameters(
        chunking_strategy='by_title',
        combine_under_n_chars=500,
        encoding='utf-8',
        files=shared.PartitionParametersFiles(
            content='+WmI5Q)|yy'.encode(),
            files='string',
        ),
        gz_uncompressed_content_type='application/pdf',
        hi_res_model_name='yolox',
        languages=['eng'],
        new_after_n_chars=1500,
        output_format='application/json',
        skip_infer_table_types=['pdf'],
        strategy='hi_res',
    )

    res = s.general.partition(req)

    if res.elements is not None:
        # handle response
        pass
The default server can also be overridden globally by passing a URL to the optional `server_url: str` parameter when initializing the SDK client instance. For example:

    import unstructured_client
    from unstructured_client.models import shared

    s = unstructured_client.UnstructuredClient(
        api_key_auth="YOUR_API_KEY",
        server_url="https://api.unstructured.io",
    )

    req = shared.PartitionParameters(
        chunking_strategy='by_title',
        combine_under_n_chars=500,
        encoding='utf-8',
        files=shared.PartitionParametersFiles(
            content='+WmI5Q)|yy'.encode(),
            files='string',
        ),
        gz_uncompressed_content_type='application/pdf',
        hi_res_model_name='yolox',
        languages=['eng'],
        new_after_n_chars=1500,
        output_format='application/json',
        skip_infer_table_types=['pdf'],
        strategy='hi_res',
    )

    res = s.general.partition(req)

    if res.elements is not None:
        # handle response
        pass
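If the server differs between environments, the choice between a named server and an explicit URL can be made in one place. The sketch below is an illustration under stated assumptions: the environment variable names `UNSTRUCTURED_SERVER_URL` and `UNSTRUCTURED_SERVER` and the helper `resolve_server_url` are hypothetical, and only the name-to-URL mapping comes from the server table above.

```python
import os
from typing import Dict, Mapping

# Names and URLs copied from the server table above
SERVERS: Dict[str, str] = {
    "prod": "https://api.unstructured.io",
    "local": "http://localhost:8000",
}


def resolve_server_url(env: Mapping[str, str] = os.environ) -> str:
    """Pick a base URL: an explicit URL wins, then a named server, then prod."""
    # Both variable names are hypothetical; rename them to suit your deployment
    url = env.get("UNSTRUCTURED_SERVER_URL")
    if url:
        return url.rstrip("/")
    name = env.get("UNSTRUCTURED_SERVER", "prod")
    if name not in SERVERS:
        raise ValueError(f"unknown server name: {name!r}")
    return SERVERS[name]


print(resolve_server_url({"UNSTRUCTURED_SERVER": "local"}))
```

The returned string can then be passed as `server_url` when constructing the client, so the rest of the code does not need to know which environment it runs in.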
The Python SDK makes API calls using the [requests](https://pypi.org/project/requests/) HTTP library. To configure timeouts, cookies, proxies, custom headers, and other low-level settings, you can initialize the SDK client with a custom `requests.Session` object.
For example, you could specify a header for every request that the SDK makes as follows:

    import requests
    import unstructured_client

    http_client = requests.Session()
    http_client.headers.update({'x-custom-header': 'someValue'})

    # Pass the session as a keyword argument
    s = unstructured_client.UnstructuredClient(client=http_client)
This SDK is in beta, and there may be breaking changes between versions without a major version update. Therefore, we recommend pinning usage to a specific package version. This way, you can install the same version each time without breaking changes unless you are intentionally looking for the latest version.
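One way to make a pin visible at runtime is to compare the installed package version with the version you tested against. This is a minimal sketch using only the standard library; `PINNED_VERSION` is a placeholder value, and `matches_pin` is a hypothetical helper, not part of the SDK.

```python
from importlib import metadata
from typing import Callable

# Placeholder only: replace with the version you have actually tested against
PINNED_VERSION = "0.0.0"


def matches_pin(
    package: str,
    expected: str,
    lookup: Callable[[str], str] = metadata.version,
) -> bool:
    """Return True when the installed version of package equals expected."""
    try:
        return lookup(package) == expected
    except metadata.PackageNotFoundError:
        # A package that is not installed cannot match any pin
        return False


if not matches_pin("unstructured-client", PINNED_VERSION):
    print("unstructured-client is missing or differs from the pinned version")
```

The same pin would normally also appear in a requirements file as `unstructured-client==<version>`, so the runtime check acts as a safety net and not as the primary mechanism.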
While we value open-source contributions to this SDK, this library is generated programmatically. Feel free to open a PR or a GitHub issue as a proof of concept and we'll do our best to include it in a future release!