Skip to content

Conversation

nunnsy
Copy link
Contributor

@nunnsy nunnsy commented Nov 10, 2024

As mentioned in this issue, the inclusion of a .pyi file makes type checkers and IDEs happier to show function signatures, required types and defaults if they exist.

Unfortauntely, I tried a few pyi generation scripts/modules, and nothing worked flawlessly. Otherwise, I'd have added a script to the setup.py to auto-generate, allowing for a more streamlined process to any updates to the Cython file (ingress.pyx).

For now, this is a quick crack at manually getting the stub file complete using the existing Cython public methods, types and docstrings.

This PR also includes adjusting one of the TOML fields which showed an error as an unexpected type, as well as removing a duplicate function delcaration from the Cython file.

It also removes redundant built logic given the underlying issue has been solved, and adjusts checks for fastparquet due to no 3.13 binaries being available yet.

@CLAassistant
Copy link

CLAassistant commented Nov 10, 2024

CLA assistant check
All committers have signed the CLA.

@nunnsy nunnsy marked this pull request as draft November 10, 2024 21:08
@nunnsy nunnsy mentioned this pull request Nov 10, 2024
[tool.cibuildwheel]
# See: https://cibuildwheel.readthedocs.io/en/stable/options/#configuration-file
build-verbosity = "3"
build-verbosity = 3
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"""Maximum length of a table or column name."""
return self._max_name_len

@property
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was a duplicate of lines #792-795

@nunnsy nunnsy marked this pull request as ready for review November 12, 2024 15:51
@nunnsy
Copy link
Contributor Author

nunnsy commented Nov 12, 2024

@amunra I've realised I've conflated a few things here - please let me know if you'd like them split over different PRs.

@amunra
Copy link
Collaborator

amunra commented Nov 12, 2024

@amunra I've realised I've conflated a few things here - please let me know if you'd like them split over different PRs.

No, that's not an issue.
My main concern here is that it's painfully easy for the .pyi definitions to go out of date.

They are functionally effectively equivalent to header file in C, but have the downside (unless I'm wrong - and I hope I am) that there's nothing to ensure that they are in sync with the Cython implementation.

@nunnsy
Copy link
Contributor Author

nunnsy commented Nov 12, 2024

My main concern here is that it's painfully easy for the .pyi definitions to go out of date.

Agreed. I tried looking for solutions, as mentioned in the initial PR comment, for autocompleting a stub file. I found all solutions, as it stands, have drawbacks, which means they do not produce a reliable file (and in most cases, just error out, even with extreme modification, as all these scripts text/symbol parsing).

They are functionally effectively equivalent to header file in C, but have the downside (unless I'm wrong - and I hope I am) that there's nothing to ensure that they are in sync with the Cython implementation.

I don't think you're wrong 😔

I think the best long-term solution would be to have something like mypy build compatibility of its stubgen to read Cython files, and it does look like there are some people working on it. For the time being though, I'm not sure there's a better solution than manual checks, unless there was a desire to try and check symbols within the build script.

Also, given how little this interface is changing (at least for the time being), I wasn't sure whether it would be ok for now either way?

@nunnsy
Copy link
Contributor Author

nunnsy commented Nov 12, 2024

@amunra
Copy link
Collaborator

amunra commented Nov 12, 2024

Auto-generation packages I tried:

* https://github.com/Vizonex/CyStub * https://github.com/mike-huls/cythonbuilder (the `interface` sub-command) * https://github.com/RaubCamaioni/CythonPEG 

These are the same ones I would have experimented with.
cythonbuilder looks like it's the only maintained one.

What issues did you encounter, specifically?

@nunnsy
Copy link
Contributor Author

nunnsy commented Nov 12, 2024

Agreed, and good question.

Initially the logs output:

(questdb) daniel@boopers:~/projects/py-questdb-client/src/questdb$ cybuilder interface -v [cythonbuilder] 16:49:21 cli 148 Generating interface files.. [cythonbuilder] 16:49:21 cli 155 Found 1 to create pyi files for [cythonbuilder] these 1 pyx files? (y/n) - /home/daniel/projects/py-questdb-client/src/questdb/ingress.pyxy [cythonbuilder] 16:49:24 cli 163 Creating 1 pyi files.. [cythonbuilder] 16:49:24 cli 166 Generating pyi files.. [cythonbuilder] 16:49:24 cython_builder 161 [cy_interface] - generating interface files for 1 found pyx files [cythonbuilder] 16:49:24 cython_builder 169 Creating .pyi for /home/daniel/projects/py-questdb-client/src/questdb/ingress.pyx [cythonbuilder] 16:49:24 cli 171 build error: string index out of range 

To look deeper, I made a throwaway script:

from cythonbuilder.pyigenerator import pyx_to_pyi with open("./src/questdb/ingress.pyx", "r") as f: output = pyx_to_pyi(f) with open("./src/questdb/ingress.pyi", "w") as f: f.writelines(output)

Which resulted in:

Traceback (most recent call last): File "grr.py", line 4, in <module> output = pyx_to_pyi(f) File "/home/daniel/miniconda3/envs/questdb/lib/python3.8/site-packages/cythonbuilder/pyigenerator.py", line 194, in pyx_to_pyi all_indentations = [get_line_indentation_spacecount(line=l) for l in pyx_lines] File "/home/daniel/miniconda3/envs/questdb/lib/python3.8/site-packages/cythonbuilder/pyigenerator.py", line 194, in <listcomp> all_indentations = [get_line_indentation_spacecount(line=l) for l in pyx_lines] File "/home/daniel/miniconda3/envs/questdb/lib/python3.8/site-packages/cythonbuilder/pyigenerator.py", line 258, in get_line_indentation_spacecount if (line[letter_idx + 1] != ' '): IndexError: string index out of range 

After fixing their counter which doesn't handle a line length of 1 correctly...

def get_line_indentation_spacecount(line: str) -> int: """ How many spaces is the provided line indented? """ spaces: int = 0 # My code ------ for c in line: if c != ' ': break spaces += 1 return spaces # End my code --- if (len(line) > 0): if (line[0] == ' '): for letter_idx in range(len(line)): if (line[letter_idx + 1] != ' '): spaces = letter_idx + 1 break return spaces

Another issue crops up:

Traceback (most recent call last): File "grr.py", line 4, in <module> output = pyx_to_pyi(f) File "/home/daniel/miniconda3/envs/questdb/lib/python3.8/site-packages/cythonbuilder/pyigenerator.py", line 200, in pyx_to_pyi raise ValueError(f"Found invalid indentation: {indent} not divisible by {spaces_for_one_tab}") ValueError: Found invalid indentation: 6 not divisible by 4 

Removing that check (for the time being), and also modifying their processor to remove lines which are all spaces by adding

 pyx_lines = [l for l in pyx_lines if not l.isspace()]

We do yield a result of the interface file:

""" import cython from enum import Enum from typing import List, Tuple, Dict, Union, Any, Optional, Callable, \ import pathlib import sys import os def _has_gil(gs:pythreadstate**) -> bool:  ... def _ensure_doesnt_have_gil(gs:pythreadstate**) -> bool:  ... def _ensure_has_gil(gs:pythreadstate**) -> None:  ... class IngressErrorCode(Enum):  """Category of Error."""  CouldNotResolveAddr = line_sender_error_could_not_resolve_addr  InvalidApiCall = line_sender_error_invalid_api_call  SocketError = line_sender_error_socket_error  InvalidUtf8 = line_sender_error_invalid_utf8  InvalidName = line_sender_error_invalid_name  InvalidTimestamp = line_sender_error_invalid_timestamp  AuthError = line_sender_error_auth_error  TlsError = line_sender_error_tls_error  HttpNotSupported = line_sender_error_http_not_supported  ServerFlushError = line_sender_error_server_flush_error  ConfigError = line_sender_error_config_error  BadDataFrame = <int>line_sender_error_server_flush_error + 1  def __str__(self) -> str:  ... class IngressError(Exception):  """An error whilst using the ``Sender`` or constructing its ``Buffer``."""  def __init__(self, code, msg):  ...  @property  def code(self) -> IngressErrorCode:  ... def c_err_code_to_py(code:line_sender_error_code):  ... def c_err_to_code_and_msg(err:line_sender_error*):  ... def c_err_to_py(err:line_sender_error*):  ... def c_err_to_py_fmt(err:line_sender_error*, fmt:str):  ... def _fqn(obj:type) -> str:  ... def datetime_to_micros(dt:datetime) -> int64_t:  ... def datetime_to_nanos(dt:datetime) -> int64_t:  ... class _ServerTimestamp:  """ A placeholder value to indicate using a server-generated-timestamp. """  pass ServerTimestamp = _ServerTimestamp() class TimestampMicros:  """ A timestamp in microseconds since the UNIX epoch (UTC). You may construct a ``TimestampMicros`` from an integer or a ``datetime.datetime``, or simply call the :func:`TimestampMicros.now` method. .. code-block:: python TimestampMicros.now() TimestampMicros(time.time_ns() // 1000) TimestampMicros(1657888365426838) ``TimestampMicros`` can also be constructed from a ``datetime.datetime`` object. .. code-block:: python TimestampMicros.from_datetime( datetime.datetime.now(tz=datetime.timezone.utc)) We recommend that when using ``datetime`` objects, you explicitly pass in the timezone to use. This is because ``datetime`` objects without an associated timezone are assumed to be in the local timezone and it is easy to make mistakes (e.g. passing ``datetime.datetime.utcnow()`` is a likely bug). """  cdef int64_t _value  def __cinit__(self, value: int):  ...  @classmethod  def from_datetime(cls, dt: datetime):  ...  @classmethod  def now(cls):  ...  @property  def value(self) -> int:  ...  def __repr__(self):  ... class TimestampNanos:  """ A timestamp in nanoseconds since the UNIX epoch (UTC). You may construct a ``TimestampNanos`` from an integer or a ``datetime.datetime``, or simply call the :func:`TimestampNanos.now` method. .. code-block:: python TimestampNanos.now() TimestampNanos(time.time_ns()) TimestampNanos(1657888365426838016) ``TimestampNanos`` can also be constructed from a ``datetime`` object. .. code-block:: python TimestampNanos.from_datetime( datetime.datetime.now(tz=datetime.timezone.utc)) We recommend that when using ``datetime`` objects, you explicitly pass in the timezone to use. This is because ``datetime`` objects without an associated timezone are assumed to be in the local timezone and it is easy to make mistakes (e.g. passing ``datetime.datetime.utcnow()`` is a likely bug). """  cdef int64_t _value  def __cinit__(self, value: int):  ...  @classmethod  def from_datetime(cls, dt: datetime):  ...  @classmethod  def now(cls):  ...  @property  def value(self) -> int:  ...  def __repr__(self):  ... def -1:  ... def _is_tcp_protocol(protocol:line_sender_protocol) -> bool:  ... def _is_http_protocol(protocol:line_sender_protocol) -> bool:  ... class SenderTransaction:  """ A transaction for a specific table. Transactions are not supported with ILP/TCP, only ILP/HTTP. The sender API can only operate on one transaction at a time. To create a transaction: .. code_block:: python with sender.transaction('table_name') as txn: txn.row(..) txn.dataframe(..) """  cdef Sender _sender  cdef str _table_name  cdef bint _complete  def __cinit__(self, Sender sender, str table_name):  ...  def __enter__(self):  ...  def __exit__(self, exc_type, _exc_value, _traceback):  ...  def commit(self):  ...  def rollback(self):  ... class Buffer:  """ Construct QuestDB-flavored InfluxDB Line Protocol (ILP) messages. The :func:`Buffer.row` method is used to add a row to the buffer. You can call this many times. .. code-block:: python from questdb.ingress import Buffer from questdb.ingress import Sender, Buffer def __cinit__(self, init_buf_size: int=65536, max_name_len: int=127): ... def _cinit_impl(self, init_buf_size:size_t, max_name_len:size_t) -> inline: ... def __dealloc__(self): ... @property def init_buf_size(self) -> int: ... @property def max_name_len(self) -> int: ... def reserve(self, additional: int): ... def capacity(self) -> int: ... def clear(self): ... def __len__(self) -> int: ... def __str__(self) -> str: ... def _to_str(self): ... def -1: ... def -1: ... def _clear_marker(self) -> inline: ... def -1: ... def _cleared_b(self): ... def -1: ... def -1: ... def -1: ... def -1: ... def -1: ... def -1: ... def -1: ... import pandas as pd import questdb.ingress as qi def _timedelta_to_millis(timedelta:object) -> uint64_t: ... def auto_flush_rows_default(protocol:line_sender_protocol) -> int64_t: ... class TaggedEnum(Enum): """  Base class for tagged enums.  """ @property def tag(self): ... @property def c_value(self): ... @classmethod def parse(cls, tag): ... class Protocol(TaggedEnum): """  Protocol to use for sending data to QuestDB.  See :ref:`sender_which_protocol` for more information.  """ Tcp = ('tcp', 0) Tcps = ('tcps', 1) Http = ('http', 2) Https = ('https', 3) @property def tls_enabled(self): ... class TlsCa(TaggedEnum): """  Verification mechanism for the server's certificate.  Here ``webpki`` refers to the  `WebPKI library <https://github.com/rustls/webpki-roots>`_ and  ``os`` refers to the operating system's certificate store.  See :ref:`sender_conf_tls` for more information.  """ WebpkiRoots = ('webpki_roots', line_sender_ca_webpki_roots) OsRoots = ('os_roots', line_sender_ca_os_roots) WebpkiAndOsRoots = ('webpki_and_os_roots', line_sender_ca_webpki_and_os_roots) PemFile = ('pem_file', line_sender_ca_pem_file) def c_parse_conf_err_to_py(err:questdb_conf_str_parse_err*) -> object: ... class Sender: """  Ingest data into QuestDB.  See the :ref:`sender` documentation for more information.  """ cdef object __weakref__ cdef line_sender_protocol _c_protocol cdef line_sender_opts* _opts cdef line_sender* _impl cdef Buffer _buffer cdef auto_flush_mode_t _auto_flush_mode cdef int64_t* _last_flush_ms cdef size_t _init_buf_size cdef size_t _max_name_len cdef bint _in_txn cdef void_int _set_sender_fields( self, qdb_pystr_buf* b, object protocol, str bind_interface, str username, str password, str token, str token_x, str token_y, object auth_timeout, object tls_verify, object tls_ca, object tls_roots, object max_buf_size, object retry_timeout, object request_min_throughput, object request_timeout, object auto_flush, object auto_flush_rows, object auto_flush_bytes, object auto_flush_interval, object init_buf_size, object max_name_len) except -1: """  Set optional parameters for the sender.  """ cdef line_sender_error* err = NULL cdef str user_agent = 'questdb/python/' + VERSION cdef line_sender_utf8 c_user_agent cdef line_sender_utf8 c_bind_interface cdef line_sender_utf8 c_username cdef line_sender_utf8 c_password cdef line_sender_utf8 c_token cdef line_sender_utf8 c_token_x cdef line_sender_utf8 c_token_y cdef uint64_t c_auth_timeout cdef bint c_tls_verify cdef line_sender_ca c_tls_ca cdef line_sender_utf8 c_tls_roots cdef uint64_t c_max_buf_size cdef uint64_t c_retry_timeout cdef uint64_t c_request_min_throughput cdef uint64_t c_request_timeout self._c_protocol = protocol.c_value str_to_utf8(b, <PyObject*>user_agent, &c_user_agent) if not line_sender_opts_user_agent(self._opts, c_user_agent, &err): raise c_err_to_py(err) if bind_interface is not None: str_to_utf8(b, <PyObject*>bind_interface, &c_bind_interface) if not line_sender_opts_bind_interface(self._opts, c_bind_interface, &err): raise c_err_to_py(err) if username is not None: str_to_utf8(b, <PyObject*>username, &c_username) if not line_sender_opts_username(self._opts, c_username, &err): raise c_err_to_py(err) if password is not None: str_to_utf8(b, <PyObject*>password, &c_password) if not line_sender_opts_password(self._opts, c_password, &err): raise c_err_to_py(err) if token is not None: str_to_utf8(b, <PyObject*>token, &c_token) if not line_sender_opts_token(self._opts, c_token, &err): raise c_err_to_py(err) if token_x is not None: str_to_utf8(b, <PyObject*>token_x, &c_token_x) if not line_sender_opts_token_x(self._opts, c_token_x, &err): raise c_err_to_py(err) if token_y is not None: str_to_utf8(b, <PyObject*>token_y, &c_token_y) if not line_sender_opts_token_y(self._opts, c_token_y, &err): raise c_err_to_py(err) if auth_timeout is not None: if isinstance(auth_timeout, int): c_auth_timeout = auth_timeout elif isinstance(auth_timeout, timedelta): c_auth_timeout = _timedelta_to_millis(auth_timeout) else: raise TypeError( '"auth_timeout" must be an int or a timedelta, ' f'not {_fqn(type(auth_timeout))}') if not line_sender_opts_auth_timeout(self._opts, c_auth_timeout, &err): raise c_err_to_py(err) if tls_verify is not None: if (tls_verify is True) or (tls_verify == 'on'): c_tls_verify = True elif (tls_verify is False) or (tls_verify == 'unsafe_off'): c_tls_verify = False else: raise ValueError( '"tls_verify" must be a bool, "on" or "unsafe_off", ' f'not {tls_verify!r}') if not line_sender_opts_tls_verify(self._opts, c_tls_verify, &err): raise c_err_to_py(err) if tls_roots is not None: tls_roots = str(tls_roots) str_to_utf8(b, <PyObject*>tls_roots, &c_tls_roots) if not line_sender_opts_tls_roots(self._opts, c_tls_roots, &err): raise c_err_to_py(err) if tls_ca is not None: c_tls_ca = TlsCa.parse(tls_ca).c_value if not line_sender_opts_tls_ca(self._opts, c_tls_ca, &err): raise c_err_to_py(err) elif protocol.tls_enabled and tls_roots is None: c_tls_ca = line_sender_ca_webpki_and_os_roots if not line_sender_opts_tls_ca(self._opts, c_tls_ca, &err): raise c_err_to_py(err) if max_buf_size is not None: c_max_buf_size = max_buf_size if not line_sender_opts_max_buf_size(self._opts, c_max_buf_size, &err): raise c_err_to_py(err) if retry_timeout is not None: if isinstance(retry_timeout, int): c_retry_timeout = retry_timeout if not line_sender_opts_retry_timeout(self._opts, c_retry_timeout, &err): raise c_err_to_py(err) elif isinstance(retry_timeout, timedelta): c_retry_timeout = _timedelta_to_millis(retry_timeout) if not line_sender_opts_retry_timeout(self._opts, c_retry_timeout, &err): raise c_err_to_py(err) else: raise TypeError( '"retry_timeout" must be an int or a timedelta, ' f'not {_fqn(type(retry_timeout))}') if request_min_throughput is not None: c_request_min_throughput = request_min_throughput if not line_sender_opts_request_min_throughput(self._opts, c_request_min_throughput, &err): raise c_err_to_py(err) if request_timeout is not None: if isinstance(request_timeout, int): c_request_timeout = request_timeout if not line_sender_opts_request_timeout(self._opts, c_request_timeout, &err): raise c_err_to_py(err) elif isinstance(request_timeout, timedelta): c_request_timeout = _timedelta_to_millis(request_timeout) if not line_sender_opts_request_timeout(self._opts, c_request_timeout, &err): raise c_err_to_py(err) else: raise TypeError( '"request_timeout" must be an int or a timedelta, ' f'not {_fqn(type(request_timeout))}') _parse_auto_flush( self._c_protocol, auto_flush, auto_flush_rows, auto_flush_bytes, auto_flush_interval, &self._auto_flush_mode) self._init_buf_size = init_buf_size or 65536 self._max_name_len = max_name_len or 127 self._buffer = Buffer( init_buf_size=self._init_buf_size, max_name_len=self._max_name_len) self._last_flush_ms = <int64_t*>calloc(1, sizeof(int64_t)) def __cinit__(self): ... @staticmethod @staticmethod def new_buffer(self): ... @property def init_buf_size(self) -> int: ... @property def max_name_len(self) -> int: ... @property def auto_flush(self) -> bint: ... @property def auto_flush_rows(self) -> Optional[int]: ... @property def auto_flush_bytes(self) -> Optional[int]: ... @property def auto_flush_interval(self) -> Optional[timedelta]: ... def establish(self): ... def __enter__(self) -> Sender: ... def __str__(self) -> str: ... def __len__(self) -> int: ... def transaction(self, table_name: str): ... import pandas as pd import questdb.ingress as qi def _close(self): ... def close(self, flush:bool=True): ... def __exit__(self, exc_type, _exc_val, _exc_tb): ... def __dealloc__(self): ...

It was at this point, I figured I was facing an uphill battle with their parser, so opted to go the manual route, at least on my fork to allow for easier dev on my side.

Hope some of this info is helpful

@amunra
Copy link
Collaborator

amunra commented Nov 12, 2024

I've just tried it too. It's not going to have decent mileage if it isn't using the Cython module as parsing. Indentation will be the least of its problems.

Copy link
Collaborator

@amunra amunra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've tried on my end too, and there's really no better way of approaching this for now.

I appreciate the minor improvements too, thanks for the changes here!

@amunra amunra merged commit 4d78220 into questdb:main Nov 12, 2024
8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

3 participants