Skip to content
This repository was archived by the owner on Jul 24, 2024. It is now read-only.

Conversation

edsrzf
Copy link
Contributor

@edsrzf edsrzf commented Sep 13, 2022

Fixes #26.

I've tried to do as straightforward of conversion as possible so this might be non-optimal in some aspects, but I figure we can do further refactoring later. All tests pass.

One thing that's worse now is test failures. The Debug impl for Vec<u8> prints things like [50, 51, 52] so it's not very human-readable.

We could look at pulling in a dependency like bstr, or otherwise adding a custom Debug impl. Let me know what you think and I can either add it onto this PR or do it as a separate one.

@ryangjchandler
Copy link
Contributor

ryangjchandler commented Sep 13, 2022

Wow, appreciate the work on this @edsrzf.

I definitely think a custom Debug impl would be useful, for convenience.

}

pub fn tokenize(&mut self, input: &str) -> Result<Vec<Token>, LexerError> {
pub fn tokenize<B: ?Sized + AsRef<[u8]>>(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Making this generic means it can now accept strings or byte slices.

};
}

fn var<B: ?Sized + AsRef<[u8]>>(v: &B) -> TokenKind {
Copy link
Contributor Author

@edsrzf edsrzf Sep 13, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found it was actually easier to write this as a generic function than a macro, when it needs to deal with both string and byte literals. Alternatively we could always use byte literals, I guess.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is fine. The macros were just originally a quick and dirty way of removing repetition, most of them could probably be removed so that the tests are explicit anyway.

@edsrzf
Copy link
Contributor Author

edsrzf commented Sep 13, 2022

Thoughts on the bstr idea? If I start thinking about how I'd approach a custom Debug impl, I'd probably create a new byte string type with the impl and then have variants like TokenKind::Comment(ByteString) and continue deriving the impl for TokenKind. And then I've basically reimplemented a subset of bstr.

Is there a reason to prefer doing that instead of pulling in bstr?

@ryangjchandler
Copy link
Contributor

@edsrzf I'd ideally like to avoid dependencies where possible. Given that this is really just a case of wrapping a byte array / vec, I think we can just handle roll it with a ByteString structure.

@ryangjchandler
Copy link
Contributor

Great work, thanks @edsrzf!

@ryangjchandler ryangjchandler merged commit c023468 into php-rust-tools:main Sep 13, 2022
@edsrzf edsrzf deleted the byte-lexer branch September 13, 2022 10:34
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

2 participants