Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
58 changes: 55 additions & 3 deletions src/wasm-interpreter.h
Original file line number Diff line number Diff line change
Expand Up @@ -1902,7 +1902,21 @@ class ExpressionRunner : public OverriddenVisitor<SubType, Flow> {
Flow visitStringConst(StringConst* curr) {
return Literal(curr->string.toString());
}
Flow visitStringMeasure(StringMeasure* curr) { WASM_UNREACHABLE("unimp"); }
Flow visitStringMeasure(StringMeasure* curr) {
// For now we only support JS-style strings.
assert(curr->op == StringMeasureWTF16View);

Flow flow = visit(curr->ref);
if (flow.breaking()) {
return flow;
}
auto value = flow.getSingleValue();
auto data = value.getGCData();
if (!data) {
trap("null ref");
}
return Literal(int32_t(data->values.size()));
}
Flow visitStringEncode(StringEncode* curr) { WASM_UNREACHABLE("unimp"); }
Flow visitStringConcat(StringConcat* curr) { WASM_UNREACHABLE("unimp"); }
Flow visitStringEq(StringEq* curr) {
Expand Down Expand Up @@ -1971,11 +1985,49 @@ class ExpressionRunner : public OverriddenVisitor<SubType, Flow> {
}
return Literal(result);
}
Flow visitStringAs(StringAs* curr) { WASM_UNREACHABLE("unimp"); }
Flow visitStringAs(StringAs* curr) {
// For now we only support JS-style strings.
assert(curr->op == StringAsWTF16);

Flow flow = visit(curr->ref);
if (flow.breaking()) {
return flow;
}
auto value = flow.getSingleValue();
auto data = value.getGCData();
if (!data) {
trap("null ref");
}

// A JS-style string can be viewed simply as the underlying data. All we
// need to do is fix up the type.
return Literal(data, curr->type.getHeapType());
}
Flow visitStringWTF8Advance(StringWTF8Advance* curr) {
WASM_UNREACHABLE("unimp");
}
Flow visitStringWTF16Get(StringWTF16Get* curr) { WASM_UNREACHABLE("unimp"); }
Flow visitStringWTF16Get(StringWTF16Get* curr) {
NOTE_ENTER("StringEq");
Flow ref = visit(curr->ref);
if (ref.breaking()) {
return ref;
}
Flow pos = visit(curr->pos);
if (pos.breaking()) {
return pos;
}
auto refValue = ref.getSingleValue();
auto data = refValue.getGCData();
if (!data) {
trap("null ref");
}
auto& values = data->values;
Index i = pos.getSingleValue().geti32();
if (i >= values.size()) {
trap("string oob");
}
return Literal(values[i].geti32());
}
Flow visitStringIterNext(StringIterNext* curr) { WASM_UNREACHABLE("unimp"); }
Flow visitStringIterMove(StringIterMove* curr) { WASM_UNREACHABLE("unimp"); }
Flow visitStringSliceWTF(StringSliceWTF* curr) { WASM_UNREACHABLE("unimp"); }
Expand Down
3 changes: 2 additions & 1 deletion src/wasm/wasm-type.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -1163,7 +1163,8 @@ bool HeapType::isFunction() const {

bool HeapType::isData() const {
if (isBasic()) {
return id == struct_ || id == array || id == string;
return id == struct_ || id == array || id == string ||
id == stringview_wtf16;
Comment on lines +1166 to +1167
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer that we not change this method (and that we eventually remove it entirely). Code is written assuming that isData has a particular meaning, and if that meaning changes over time, those assumptions can be broken. It is much better for clients of the type API to be precise about what kinds of types they are querying.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a form of data, though, isn't it? It's a reference to data (with a notion of how to read the raw data as well).

Maybe I'm not seeing what you are proposing, though: What did you have in mind?

Note that this one-line change would become a many-line change if we need to find the many places that currently have isData() and turn them into isData() || isStringView() - I started down that path and quickly decided to change course. But maybe there is another option?

With that said, I do see your point that it's better when code locations have a precise meaning to what they use. But I don't think this changes that. I see 'isData' is meaning "is a reference to data"; concretely, all isData things are implemented by using the data field in Literal, so this is not arbitrary.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm worried about callers assuming that isData() is a convenient shorthand for isStruct() || isArray(), but lgtm for now if the fuzzer is happy. I still think it would be good to eventually do the larger NFC refactoring to eliminate isData() entirely to mitigate that kind of risk. Do you think that is a reasonable thing to do as a future cleanup?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternatively, we could find a more descriptive name for isData that gives it clear semantics beyond isStruct() || isArray(). Historically, isData had clear semantics because it corresponded to subtypes of heap type data, but when we removed data we did not remove isData().

isHeapAllocated might be a good candidate, if somewhat verbose.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do feel there is a shared concept here, though? Again, it is all the types that use Literal::gcData, all the references to data. It is useful in our codebase to have a concept of all those things, because we need to test on them in all the places that use Literal::gcData. Otherwise each of those places would have "is struct or is array or is string or is stringView" which seems worse.

Is it the name data that meant "struct or array" in the spec that feels wrong to you?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Our last comments raced, sorry mine isn't in response to yours.)

isHeapAllocated could be shortened to isHeapType 😆 But yeah, this is kind of "data that is heap allocated", but in the stronger sense of our internals. i31 is a heap type but does not store itself using Literal::gcData, so it isn't "heap allocated using Literal::gcData"... I'm not sure what the best name is here, but ignoring the history of the term, we have Literal::gcData now so something with data or gcData seems right?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i31 is never heap allocated (or at least, it shouldn't be), so it makes sense that it would be excluded from isHeapAllocated. It's much less clear that it should be excluded from isData!

isHeapData could work as well and is slightly shorter than isHeapAllocated.

I would prefer that we not name something in the wasm-type.h API based on internal details of the Literal API, though.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. Ok, let's keep thinking about this, I sort of understand where you are coming from now but I don't yet see how best to move forward with a naming change.

You're ok with landing this for now, though? (It just adds stringview alongside string, so it doesn't change the meaning of isData in a meaningful way.)

} else {
return getHeapTypeInfo(*this)->isData();
}
Expand Down
31 changes: 31 additions & 0 deletions test/lit/exec/strings.wast
Original file line number Diff line number Diff line change
Expand Up @@ -162,6 +162,29 @@
(string.const "hf")
)
)

;; CHECK: [fuzz-exec] calling get_codeunit
;; CHECK-NEXT: [fuzz-exec] note result: get_codeunit => 99
(func $get_codeunit (export "get_codeunit") (result i32)
;; Reads 'c' which is code 99.
(stringview_wtf16.get_codeunit
(string.as_wtf16
(string.const "abcdefg")
)
(i32.const 2)
)
)

;; CHECK: [fuzz-exec] calling get_length
;; CHECK-NEXT: [fuzz-exec] note result: get_length => 7
(func $get_length (export "get_length") (result i32)
;; This should return 7.
(stringview_wtf16.length
(string.as_wtf16
(string.const "1234567")
)
)
)
)
;; CHECK: [fuzz-exec] calling new_wtf16_array
;; CHECK-NEXT: [fuzz-exec] note result: new_wtf16_array => string("ello")
Expand Down Expand Up @@ -213,6 +236,12 @@

;; CHECK: [fuzz-exec] calling compare.10
;; CHECK-NEXT: [fuzz-exec] note result: compare.10 => -1

;; CHECK: [fuzz-exec] calling get_codeunit
;; CHECK-NEXT: [fuzz-exec] note result: get_codeunit => 99

;; CHECK: [fuzz-exec] calling get_length
;; CHECK-NEXT: [fuzz-exec] note result: get_length => 7
;; CHECK-NEXT: [fuzz-exec] comparing compare.1
;; CHECK-NEXT: [fuzz-exec] comparing compare.10
;; CHECK-NEXT: [fuzz-exec] comparing compare.2
Expand All @@ -229,4 +258,6 @@
;; CHECK-NEXT: [fuzz-exec] comparing eq.3
;; CHECK-NEXT: [fuzz-exec] comparing eq.4
;; CHECK-NEXT: [fuzz-exec] comparing eq.5
;; CHECK-NEXT: [fuzz-exec] comparing get_codeunit
;; CHECK-NEXT: [fuzz-exec] comparing get_length
;; CHECK-NEXT: [fuzz-exec] comparing new_wtf16_array