chore: introduce benchmarking for rust #4552

Merged
georgesittas merged 1 commit into tobymao:main from benfdking:introducing_benchmarking
Jan 7, 2025

Conversation

@benfdking
Contributor

  • Introduces a Criterion-based benchmark that runs on the long SQL input.
  • Introduces serde serialization and deserialization, which is not built into the main release.
  • Introduces some JSON files, which are simple serializations of a set of configs taken from sqlglot.
  • Introduces a GitHub Actions workflow that compares a PR's performance against main to check for regressions.
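
As a rough illustration of the last bullet, a regression check between main and a PR can be reduced to comparing two mean timings against a tolerance. The sketch below is an assumption for illustration only: the function names, the 5% threshold, and the timings are made up, and it does not reproduce the workflow in this PR or Criterion's own statistical comparison.

```rust
// Hypothetical helpers: not taken from this PR or from Criterion.

/// Relative change of `candidate_ns` versus `baseline_ns`, as a fraction
/// (0.10 means the candidate is 10% slower than the baseline).
fn relative_change(baseline_ns: f64, candidate_ns: f64) -> f64 {
    (candidate_ns - baseline_ns) / baseline_ns
}

/// Flags a regression when the candidate is slower than the baseline
/// by more than `threshold` (also a fraction).
fn is_regression(baseline_ns: f64, candidate_ns: f64, threshold: f64) -> bool {
    relative_change(baseline_ns, candidate_ns) > threshold
}

fn main() {
    // Made-up mean timings in nanoseconds, for illustration only.
    let baseline = 1_000_000.0; // measured on main
    let candidate = 1_080_000.0; // measured on the PR branch

    let change = relative_change(baseline, candidate);
    println!("change: {:+.1}%", change * 100.0);
    println!(
        "regression at 5% threshold: {}",
        is_regression(baseline, candidate, 0.05)
    );
}
```

A real comparison would also need to account for measurement noise on shared CI runners, which is why a tolerance is used here instead of a strict "slower than main" test.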

@georgesittas
Collaborator

georgesittas commented Jan 7, 2025

FYI, the numbers (values) in the setting maps can change, depending on the order of the relevant enums (e.g., token types) in Python. I'm wondering if it's a good idea to check the JSON files into the repo, given that we'll have to re-generate them every time we benchmark, to make sure the numbers aren't stale. Or am I missing something?

@georgesittas
Collaborator

georgesittas left a comment

Re: #4552 (comment) (sharing internal discussion).

It’s probably fine for now. I think if the TokenType enum is updated, we'd simply miss the newer tokens and use an outdated mapping, but that’s fine because we don’t use those tokens anyway and we're only using the Rust tokenizer directly. So, it’s like taking a snapshot of the tokens supported today and benchmarking the tokenizer using them.
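
The snapshot argument can be sketched with a toy example. Everything below is hypothetical: the token names, the numeric IDs, and the whitespace-splitting `tokenize` function are stand-ins chosen for illustration, not sqlglot's real token types or its Rust tokenizer. The point is only that a tokenizer driven by a frozen mapping stays self-consistent even after the enum order changes elsewhere.

```rust
use std::collections::HashMap;

// All names and numbers here are made up; they are not sqlglot's real token IDs.

/// Builds a name -> id map from an ordered list, mimicking enum values
/// that depend on declaration order.
fn ids_from_order(names: &[&str]) -> HashMap<String, u16> {
    names
        .iter()
        .enumerate()
        .map(|(i, n)| (n.to_string(), i as u16))
        .collect()
}

/// Toy tokenizer: maps each whitespace-separated word to its id,
/// falling back to the "VAR" id for words missing from the map.
fn tokenize(sql: &str, ids: &HashMap<String, u16>) -> Vec<u16> {
    let var = ids["VAR"];
    sql.split_whitespace()
        .map(|w| *ids.get(&w.to_uppercase()).unwrap_or(&var))
        .collect()
}

fn main() {
    // Snapshot taken "today", as the checked-in JSON files would be.
    let snapshot = ids_from_order(&["VAR", "SELECT", "FROM"]);
    // A later enum with a new token inserted, which shifts the id of FROM.
    let current = ids_from_order(&["VAR", "SELECT", "QUALIFY", "FROM"]);

    let sql = "select a from t";
    println!("snapshot ids: {:?}", tokenize(sql, &snapshot));
    println!("current ids:  {:?}", tokenize(sql, &current));

    // The snapshot does not know QUALIFY, so it is treated as a plain VAR.
    println!("unknown token: {:?}", tokenize("select a qualify b", &snapshot));
}
```

The two mappings disagree on the numeric value of `FROM`, but a benchmark that only ever uses the snapshot never sees the newer mapping, so its results stay comparable from run to run.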

@georgesittas georgesittas merged commit 9921528 into tobymao:main Jan 7, 2025
7 of 8 checks passed

4 participants