Skip to content
This repository was archived by the owner on Sep 17, 2022. It is now read-only.

Conversation

nkreeger
Copy link
Contributor

@nkreeger nkreeger commented Jul 12, 2019

This PR will re-work string usage in the tfjs-node C++ binding code. Currently, the binding reads out string values as string values (char*) and fits them into a heap-based buffer (avoid stack thrashing).

With the change in tfjs-core, the buffers are already managed by V8 so need to manage a heap buffer. However, strings tensors must be encoded using the appropriate TF APIs. This change handles writing uint8 array buffers into TF Tensors, and re-rendering them into typed arrays before surfacing to JS land.

I don't think unit tests will pass here until tfjs-union is published.


This change is Reviewable

@nkreeger nkreeger requested a review from dsmilkov July 12, 2019 16:00
Copy link
Contributor

@dsmilkov dsmilkov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 2 of 3 files at r1.
Reviewable status: :shipit: complete! 1 of 1 approvals obtained (waiting on @dsmilkov and @nkreeger)


binding/tfjs_backend.cc, line 186 at r1 (raw file):

// Creates a TFE_TensorHandle from a JS array of Uint8Array values. TFE_TensorHandle *CreateTFE_TensorHandleFromUtf8StringArray(

the underlying bytes behind the Uint8Array[] could have any encoding so the fromStringArray suffix in the method name is more correct.


binding/tfjs_backend.cc, line 214 at r1 (raw file):

 if (array_type != napi_uint8_array) { NAPI_THROW_ERROR(env, "Unsupported array type - expecting Uint8 typed array");

nit: say Uint8Array to be more precise.


binding/tfjs_backend.cc, line 247 at r1 (raw file):

 ENSURE_TF_OK_RETVAL(env, tf_status, nullptr); offsets[i] = cur_str_data - str_data_start;

it's a bummer that the TF C API doesn't provide better utilities for constructing string tensors. Looks like their internal storage details are leaking to the user.

@nkreeger nkreeger changed the title Handle upcoming strings as bytes change in tfjs-core Upgrade @tensorflow/tfjs to 1.2.3 and handle strings as bytes Jul 17, 2019
Copy link
Contributor Author

@nkreeger nkreeger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dsmilkov PTAL - updated to 1.2.3 and resolved comments.

Reviewable status: :shipit: complete! 1 of 1 approvals obtained (waiting on @dsmilkov)


binding/tfjs_backend.cc, line 247 at r1 (raw file):

Previously, dsmilkov (Daniel Smilkov) wrote…

it's a bummer that the TF C API doesn't provide better utilities for constructing string tensors. Looks like their internal storage details are leaking to the user.

Yeah a little funny to use for sure.

Copy link
Contributor

@dsmilkov dsmilkov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. CI fails due to some lint error

Reviewed 4 of 4 files at r2.
Reviewable status: :shipit: complete! 1 of 1 approvals obtained

@nkreeger nkreeger merged commit fdc3570 into master Jul 17, 2019
@nkreeger nkreeger deleted the kreeger/str-as-bytes branch July 17, 2019 20:23
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

2 participants