Skip to content

Conversation

siilike
Copy link
Contributor

@siilike siilike commented Jun 24, 2019

What's new:

  • logical types (see below);
  • immutable Buffers from pointers;
  • minor improvements to the Collections class.

From the documentation:

Types 0xee-0xef are used for tagging of values to implement logical
types.

For example, if type 0x1c did not exist, the database driver could
serialize a timestamp object (Date in JavaScript, Instant in Java, etc)
into a Unix timestamp, a 64-bit integer. Assuming the lack of schema,
upon deserialization it would not be possible to tell an integer from
a timestamp and deserialize the value accordingly.

Type tagging resolves this by attaching an integer tag to values that
can then be read when deserializing the value, e.g. that tag=1 is a
timestamp and the relevant timestamp class should be used.

The tag values are specified separately and applications can also
specify their own to have the database driver deserialize their specific
data types into the appropriate classes (including models).

Essentially this is object-relational mapping for parts of documents.

The format of the type is:

0xee TAG in 1 byte sub VPack value 

or

0xef TAG in 8 bytes sub VPack value 

Tagging is implemented transparently -- operations on a Slice are unaffected and performed on the value of the tagged type, not on the type itself.

Looking forward to having this merged quickly.

@siilike siilike mentioned this pull request Jun 27, 2019
@graetzer
Copy link
Contributor

can you give us a rational for implementing this into velocypack ? I feel this is a sbustantial change and we need to understand this better

@graetzer
Copy link
Contributor

seems to me like the application should know the schema ahead of time, no ?

@siilike
Copy link
Contributor Author

siilike commented Jun 27, 2019

The use case for this is applications where schema is not explicitly defined or is defined only partially, so the database driver will need additional information to deserialize the values properly.

The timestamp issue given above is a classical example. It can be solved either by having a dedicated type (like VelocyPack has) or by having a tag that signals that the integer that has been stored is a timestamp. The latter way also allows defining application-specific types and types that would not be very useful at a format level (like an age or a height type).

This is highly useful in simpler applications where one can get away with not explicitly specifying the schema and different abstractions, basically the MongoDB use case. But it assumes that what goes into the database comes out of there with the exact same classes, otherwise it will get very clumsy.

Regarding immutable buffers, I checked and turns out I don't have a use case for them any more, not sure what exactly they were for. I will update the pull request accordingly.

@siilike
Copy link
Contributor Author

siilike commented Jul 31, 2019

Any updates on this?

@jsteemann
Copy link
Contributor

@siilike : sorry, all blame is on me, because I forgot to have a look at this PR.
I will get back within the next 24 hours.

@siilike
Copy link
Contributor Author

siilike commented Sep 3, 2019

Great, there are some changes still missing to make it work with the latest changes in master.

@siilike
Copy link
Contributor Author

siilike commented Sep 3, 2019

Now it is all updated. The only downside of the current implementation is that Slice supports transparency for only one level of tags, otherwise every access to _start would require iteration.

return false;
}

bool Collection::containsObject(Slice const& slice, Slice const& other) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this be renamed to containsSubobject?


static bool contains(Slice const& slice, Slice const& other);

static bool containsObject(Slice const& slice, Slice const& other);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this be renamed to containsSubobject?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I checked and I actually don't have any use for this method, so I removed it for now.

I was also thinking that it is rather confusing to have array and object methods in the same class as they are not compatible, e.g. it is not possible to filter() or concat() an object, so it would make sense to either add support for objects to all methods or create an entire separate class for handling the same operations for objects.

@jsteemann
Copy link
Contributor

In general, I think the idea of tags makes sense for some applications.
However, this PR changes the behavior of several public methods in velocypack::Slice, so when existing applications start using tags, all existing calls in these applications to Slice::start(), Slice::byteSize() etc. will need to be changed.
Instead, the public methods in Slice should remain downwards-compatible. Otherwise, only the VPack data without tags will make it into client applications, and client applications have no chance of using the tags for anything.
Additionally, I now a few embedders that use the combination of start() and byteSize() and feed that into memcpy. If the behavior of start() and byteSize() however changes, then all tags will instantly be lost unless all embedder code is changed.

@jsteemann
Copy link
Contributor

And finally, we must add tests for tagged values and for the new method in Collections before we can this PR.

@jsteemann
Copy link
Contributor

if it helps, I am also happy to split up this multi-purpose PR into multiple small single-purpose PRs. That way we can already merge the following bits quickly, without having to wait for the bigger tags change:

  • the Collection-related additions (plus tests)
  • the 32bit hash function (plus tests)
  • CMakeLists.txt changes
    WDYT?
@siilike
Copy link
Contributor Author

siilike commented Sep 4, 2019

Thank you for reviewing.

The idea of making the tags transparent by default was that it makes working with the values a lot simpler. Otherwise there would need to be a method that returns the value if tagged and otherwise the object itself which would clutter the code and be an easy source of errors if the value initially isn't tagged, but becomes at some point.

But the important part is that is* and get* methods apply to the value of the tagged value, not the tagged value itself, so start(), end() and others can be reverted to apply to the whole value. This way we'll have enough compatibility with existing solutions and convenient access to values.

 Fix first issues Co-Authored-By: Jan <jsteemann@users.noreply.github.com>
@siilike
Copy link
Contributor Author

siilike commented Sep 4, 2019

I removed the exception from Slice.tagOffset, because the only time when there can be an error is when a new tagged type is added to SliceStaticData.TypeMap and that method is not updated. This removes all the headache of noexcept.

I now updated the transparency logic as follows:

  • methods that deal with the pointer and length keep the former behaviour and have a relevant value* method: start, startAs, head, begin, end, byteSize;
  • in methods that deal with the value tags are transparent and some have a relevant raw* method: type, typeName, is*, get*, copy*, has*, etc.

If that is fine I'll also update Slice.cpp and any references elsewhere.

@siilike
Copy link
Contributor Author

siilike commented Sep 10, 2019

ping

constexpr inline bool isType(ValueType t) const noexcept {
return SliceStaticData::TypeMap[*start()] == t;
constexpr inline bool isType(ValueType t) const {
return SliceStaticData::TypeMap[*valueStart()] == t;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this should include tags, then I think it should be

return SliceStaticData::TypeMap[*start()] == t; 
// get the type for the slice (excluding tags)
constexpr inline ValueType type() const noexcept {
return SliceStaticData::TypeMap[head()];
return type(valueHead());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changed implementation in the PR changes the behavior of this method to return the type after jumping over the tag. This will make it non-downwards compatible. I think it should be reverted to:

Suggested change
return type(valueHead());
return SliceStaticData::TypeMap[head()];
return SliceStaticData::TypeMap[h];

// get the type for the slice (including tags)
constexpr inline ValueType rawType() const noexcept {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should probably be renamed to valueType and return the type of the tagged value.


// check if slice is the Boolean value true
constexpr bool isTrue() const noexcept { return head() == 0x1a; }
constexpr bool isTrue() const noexcept { return valueHead() == 0x1a; }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should still operate on head() and not on valueHead().
If a client application needs to peek into the value behind the tag, I think it should explicitly resolve the tag.

// return the value for a String object
char const* getString(ValueLength& length) const {
uint8_t const h = head();
uint8_t const h = valueHead();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should still operate on head() and not on valueHead().
If a client application needs to peek into the value behind the tag, I think it should explicitly resolve the tag.

if (h == 0xbf) {
// long UTF-8 String
length = readIntegerFixed<ValueLength, 8>(_start + 1);
length = readIntegerFixed<ValueLength, 8>(valueStart() + 1);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should still operate on start() and not on valueStart().
If a client application needs to peek into the value behind the tag, I think it should explicitly resolve the tag.

length = readIntegerFixed<ValueLength, 8>(valueStart() + 1);
checkOverflow(length);
return reinterpret_cast<char const*>(_start + 1 + 8);
return reinterpret_cast<char const*>(valueStart() + 1 + 8);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should still operate on start() and not on valueStart().
If a client application needs to peek into the value behind the tag, I think it should explicitly resolve the tag.


char const* getStringUnchecked(ValueLength& length) const noexcept {
uint8_t const h = head();
uint8_t const h = valueHead();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should still operate on head() and not on valueHead().
If a client application needs to peek into the value behind the tag, I think it should explicitly resolve the tag.

// short UTF-8 String
length = h - 0x40;
return reinterpret_cast<char const*>(_start + 1);
return reinterpret_cast<char const*>(valueStart() + 1);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel I am repeating the same sentence here over and over 😄
The same argument applies for most of the following functions.

@siilike
Copy link
Contributor Author

siilike commented Sep 15, 2019

Pushed the third take on this:

  • Slice has no behavioural changes, only tags support added;
  • created new ValueSlice class where tags are transparent;
  • added support for multiple tags -- initially I skipped this as I did not want to introduce iteration everywhere, but it seems lousy not to implement it when technically it is possible to have multiple tags.

If this is fine I'll also update Builder.

@siilike
Copy link
Contributor Author

siilike commented Oct 11, 2019

Turns out this overriding idea does not work. The start() method would have to be virtual which would mean the removal of constexpr and sizeof(arangodb::velocypack::Slice) != sizeof(void*).

@coveralls
Copy link

coveralls commented Nov 10, 2019

Coverage Status

Coverage decreased (-1.2%) to 94.984% when pulling ac70e54 on siilike:siilike into 4ebe24e on arangodb:master.

@siilike
Copy link
Contributor Author

siilike commented Nov 10, 2019

OK, I got rid of ValueSlice for now as I can temporarily give up the convenience until I find the time to resolve this. So, please review the tests and merge if everything is fine.

@OmarAyo
Copy link

OmarAyo commented Nov 11, 2019

Hi @siilike

Many thanks for contributing. We are obliged by law to have a signed scanned copy of a Contributor License Agreement (CLA) to be able to accept pull requests to the official ArangoDB repository -

We use an Apache 2 CLA for ArangoDB and its companion projects, which can be found here: https://www.arangodb.com/documents/cla.pdf
Just fill in the form, sign and send a scanned copy over to cla@arangodb.com

Thank you

@OmarAyo
Copy link

OmarAyo commented Nov 15, 2019

CLA available

@siilike
Copy link
Contributor Author

siilike commented Nov 21, 2019

Ping

@siilike
Copy link
Contributor Author

siilike commented Nov 28, 2019

Still waiting.

@siilike
Copy link
Contributor Author

siilike commented Dec 9, 2019

Yet another week has gone by.

@jsteemann
Copy link
Contributor

@siilike : Hi, sorry for the delay. I am currently busy with lots of other things as we are preparing a release. I will make time to look into this on either Wed or Thu this week. Sorry again!

@siilike
Copy link
Contributor Author

siilike commented Dec 9, 2019

Great, thanks.

I made a separate pull request for the specification change, so we can continue with arangodb/java-velocypack#15 in the meantime.

@jsteemann
Copy link
Contributor

@siilike : thanks for your patience! I left a few comments, but apart from these small issues I think it is good to go now.

@siilike
Copy link
Contributor Author

siilike commented Dec 12, 2019

Great, thank you. I made the fixes.

@jsteemann jsteemann merged commit 413e695 into arangodb:master Dec 12, 2019
@jsteemann
Copy link
Contributor

Thanks a lot for the patch!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

5 participants