chore: introduce benchmarking for rust #4552
georgesittas merged 1 commit into tobymao:main from benfdking:introducing_benchmarking
benfdking commented on Dec 30, 2024
- Introduces a Criterion-based benchmark built on the long SQL.
- Introduces serde serialization and deserialization, which is not built into the main release.
- Introduces some JSON files, which are simple serializations of a set of configs taken from sqlglot.
- Introduces a GitHub Actions workflow that compares a PR's performance against main to check for regressions.
FYI, the numbers (values) in the setting maps can change, depending on the order of the relevant enums (e.g., token types) in Python. I'm wondering if it's a good idea to check the JSON files into the repo, given that we'll have to re-generate them every time we benchmark, to make sure the numbers aren't stale. Or am I missing something?
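To make the staleness concern concrete, here is a minimal sketch of how an order-derived numbering shifts when an enum gains a member. The `TokenTypeV1`/`TokenTypeV2` enums and the `index_map` helper are hypothetical stand-ins for illustration, not sqlglot's actual `TokenType` or its real mapping code.

```python
from enum import Enum, auto


# Hypothetical enum as it looked when the JSON snapshot was generated.
class TokenTypeV1(Enum):
    L_PAREN = auto()
    R_PAREN = auto()
    SELECT = auto()


# Hypothetical later revision: a member is inserted in the middle.
class TokenTypeV2(Enum):
    L_PAREN = auto()
    COMMA = auto()
    R_PAREN = auto()
    SELECT = auto()


def index_map(enum_cls):
    """Map each member name to its position in definition order."""
    return {member.name: i for i, member in enumerate(enum_cls)}


snapshot = index_map(TokenTypeV1)  # what a checked-in JSON file would hold
current = index_map(TokenTypeV2)   # what Python would produce today

# Names whose number differs between the snapshot and the current enum.
stale = {
    name: (snapshot[name], current[name])
    for name in snapshot
    if snapshot[name] != current[name]
}

# Names that exist today but are absent from the snapshot.
missing = sorted(set(current) - set(snapshot))

print(stale)
print(missing)
```

In this sketch, appending a member at the end would leave the existing numbers untouched, while inserting one in the middle renumbers everything after it. A similar comparison could be run before benchmarking to decide whether the JSON files need regenerating.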
georgesittas left a comment
Re: #4552 (comment) (sharing internal discussion).
It’s probably fine for now. I think if the TokenType enum is updated, we'd simply miss the newer tokens and use an outdated mapping, but that’s fine because we don’t use those tokens anyway and we're only using the Rust tokenizer directly. So, it’s like taking a snapshot of the tokens supported today and benchmarking the tokenizer using them.