Precompile: Frontend and backend for building circuits #799
Merged
dc664af to 929eddf
dreamATD added a commit that referenced this pull request on Jan 6, 2025:
Suggestions for #799. Feel free to pick and choose from the suggestions. I talk about most of them on your PR. Co-authored-by: dreamATD <tianyi.liu.08@gmail.com>
naure (Contributor) reviewed on Jan 6, 2025:
First pass on gkr_iop. It makes sense so far.
dreamATD commented on Jan 7, 2025.
16b57f3 to 29061f1
dreamATD added a commit that referenced this pull request on Jan 15, 2025, with the same "Suggestions for #799" commit message.
cffdd03 to d51562b
hero78119 (Collaborator) reviewed three times on Jan 15, 2025 and left a comment:
Awesome job!
I left a few comments in separate sections because the PR is large, so I did the review in segments.
Most of the utility work for code reuse can be done later. I think the most important point is to try one precompile (e.g. keccak-f) first and benchmark its preliminary performance. Once it meets the requirements, we can proceed to more engineering polish :)
7762988 to 87d1a30
dreamATD added a commit that referenced this pull request on Feb 19, 2025, with the same "Suggestions for #799" commit message.
87d1a30 to 88f9b00
A collaborator commented: related to #191
dreamATD added a commit that referenced this pull request on Mar 22, 2025, with the same "Suggestions for #799" commit message.
88f9b00 to eb4c9cb
- Remove buffers and replace the underlying util functions.
- Add comments and fix some tiny bugs.
- Suggestions for 'Frontend and backend for building circuits' (#801): Suggestions for #799. Feel free to pick and choose from the suggestions. I talk about most of them on your PR. Co-authored-by: dreamATD <tianyi.liu.08@gmail.com>
- Refine according to comments.
- Refine the protocol prover and verifier structs.
- Add more comments.
- Tiny fix according to the latest comments.
eb4c9cb to ba053c2
Closes issue #632. IO is named `debug_println` in the guest program's debug build, assuming there is no `println!` use case in the guest program. In the debug build, we extend the stack address slightly to cover a reserved 256K region for IO. This extra reserved space is also reflected in the linker script, so writes to this region won't trigger complaints from either the ELF or the RISC-V emulator.

This PR also fixes a previous problem where meaningful symbols in the bss/sbss sections were skipped because their values are 0. We need to reserve and pad memory to cover them, since they may be static variables that are zero-initialized or uninitialized. Without this, the emulator also complains that the regions are not writable.

- Clean up the previous IO workaround in the guest program.
- Extend the stack address for the IO consistency check during the debug build.
- Refactor the `load_elf` bss/sbss padding.
- The e2e command also shows the IO result.
- Respect the profile when compiling the guest program examples.

A guest program with IO:

```bash
cargo run --release --features sanity-check --package ceno_zkvm --bin e2e -- --platform=ceno --hints=10 --public-io=4191 examples/target/riscv32im-ceno-zkvm-elf/release/examples/ceno_rt_io
cargo run --features sanity-check --package ceno_zkvm --bin e2e -- --platform=ceno --hints=10 --public-io=4191 examples/target/riscv32im-ceno-zkvm-elf/debug/examples/ceno_rt_io
```
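The bss/sbss fix follows the standard ELF loading rule: a loadable segment occupies `p_memsz` bytes in memory, but only `p_filesz` of them come from the file, and the remainder must be zero-filled. The sketch below illustrates that rule under stated assumptions: `Segment` and `load_segment` are hypothetical names (not ceno's actual `load_elf` code), and memory is modeled as a word-addressed `BTreeMap`. The key point is that zero words are inserted rather than skipped.

```rust
use std::collections::BTreeMap;

/// Hypothetical, simplified view of one loadable ELF segment
/// (not the real types used by ceno's `load_elf`).
struct Segment {
    vaddr: u32,    // start address, assumed word-aligned
    data: Vec<u8>, // file-backed bytes (p_filesz of them)
    mem_size: u32, // p_memsz: total size in memory, >= data.len()
}

/// Write every word in [vaddr, vaddr + mem_size) into the image, including
/// the zero-filled bss tail, so zero-valued statics are still reserved.
fn load_segment(seg: &Segment, image: &mut BTreeMap<u32, u32>) {
    assert_eq!(seg.vaddr % 4, 0, "segment must be word-aligned");
    assert!(seg.mem_size as usize >= seg.data.len());
    let words = (seg.mem_size + 3) / 4; // round up to whole words
    for i in 0..words {
        let offset = (i * 4) as usize;
        let mut bytes = [0u8; 4];
        for (j, b) in bytes.iter_mut().enumerate() {
            // Bytes past the file-backed part stay zero (bss padding).
            if let Some(v) = seg.data.get(offset + j) {
                *b = *v;
            }
        }
        // Insert unconditionally: skipping words whose value is 0 is the bug.
        image.insert(seg.vaddr + i * 4, u32::from_le_bytes(bytes));
    }
}

fn main() {
    // 4 file-backed bytes followed by 8 bytes of bss.
    let seg = Segment { vaddr: 0x1000, data: vec![1, 0, 0, 0], mem_size: 12 };
    let mut image = BTreeMap::new();
    load_segment(&seg, &mut image);
    assert_eq!(image.len(), 3); // one data word plus two zero-filled words
    assert_eq!(image[&0x1000], 1);
    assert_eq!(image[&0x1008], 0);
}
```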
github-merge-queue Bot pushed a commit that referenced this pull request on May 9, 2025:
To close #936

### Design rationales

- Introduce `VirtualPolynomialsBuilder` to lift a witness of "ArcPoly" type into an expression container, so witnesses can participate in expression-domain calculations.
- Apply `VirtualPolynomialsBuilder` in the tower prover.
- Keep scalars in the base field where possible by introducing an `Either<Base, Ext>` type.
- Reserve a design for the "eq" degree -1 optimisation.

> This part hasn't been done yet and is left as future work :)

`VirtualPolynomialsBuilder` is more of a utility function for the ceno main sumcheck flow. For GKR layer circuits in gkr_iop #799, the expression system will be applied directly on the chip builder, skipping `VirtualPolynomialsBuilder`.

### benchmark

As expected, this change has no impact on the e2e benchmark.

2^20

```
fibonacci_max_steps_1048576/prove_fibonacci/fibonacci_max_steps_1048576
    time:   [2.3583 s 2.3709 s 2.3848 s]
    change: [-1.8405% -1.0740% -0.2480%] (p = 0.03 < 0.05)
    Change within noise threshold.
```

2^21

```
fibonacci_max_steps_2097152/prove_fibonacci/fibonacci_max_steps_2097152
    time:   [4.4650 s 4.4758 s 4.4867 s]
    change: [-0.6673% -0.3122% +0.0493%] (p = 0.13 > 0.05)
    No change in performance detected.
```

2^22

```
fibonacci_max_steps_4194304/prove_fibonacci/fibonacci_max_steps_4194304
    time:   [9.0115 s 9.0574 s 9.1011 s]
    change: [-1.0658% -0.3407% +0.3803%] (p = 0.40 > 0.05)
    No change in performance detected.
```
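To illustrate the idea of keeping scalars in the base field via `Either<Base, Ext>`, here is a minimal sketch with a toy base field (p = 7) and a degree-2 extension F_7[x]/(x^2 - 3); 3 is not a square mod 7, so the extension is a field. The `Scalar` enum is a hypothetical stand-in for `Either<Base, Ext>`, not ceno's actual type. Base-by-base products stay in the base field, base-by-extension products only scale coefficients, and only extension-by-extension products pay for full extension arithmetic.

```rust
/// Toy prime modulus (assumption; real provers use much larger fields).
const P: u64 = 7;
/// Non-residue W: the extension is defined by x^2 = W.
const W: u64 = 3;

/// Hypothetical stand-in for `Either<Base, Ext>`.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Scalar {
    Base(u64),
    Ext([u64; 2]), // a0 + a1 * x
}

impl Scalar {
    /// Embed any scalar into the extension field.
    fn lift(self) -> [u64; 2] {
        match self {
            Scalar::Base(a) => [a, 0],
            Scalar::Ext(e) => e,
        }
    }

    fn add(self, other: Scalar) -> Scalar {
        match (self, other) {
            // Stay in the base field when both operands are base elements.
            (Scalar::Base(a), Scalar::Base(b)) => Scalar::Base((a + b) % P),
            _ => {
                let (x, y) = (self.lift(), other.lift());
                Scalar::Ext([(x[0] + y[0]) % P, (x[1] + y[1]) % P])
            }
        }
    }

    fn mul(self, other: Scalar) -> Scalar {
        match (self, other) {
            // Cheapest path: base * base never leaves the base field.
            (Scalar::Base(a), Scalar::Base(b)) => Scalar::Base(a * b % P),
            // Base * ext only scales the two coefficients.
            (Scalar::Base(a), Scalar::Ext([b0, b1])) | (Scalar::Ext([b0, b1]), Scalar::Base(a)) => {
                Scalar::Ext([a * b0 % P, a * b1 % P])
            }
            // Full extension multiplication, using x^2 = W.
            (Scalar::Ext([a0, a1]), Scalar::Ext([b0, b1])) => Scalar::Ext([
                (a0 * b0 + W * (a1 * b1 % P)) % P,
                (a0 * b1 + a1 * b0) % P,
            ]),
        }
    }
}

fn main() {
    let two = Scalar::Base(2);
    let five = Scalar::Base(5);
    assert_eq!(two.mul(five), Scalar::Base(3)); // 10 mod 7
    let x = Scalar::Ext([0, 1]);
    assert_eq!(x.mul(x), Scalar::Ext([3, 0])); // x^2 = W
    assert_eq!(two.add(x), Scalar::Ext([2, 1]));
}
```

The design choice being illustrated: a scalar only gets promoted to the extension field when an extension operand actually appears, which is what keeps the common case cheap.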
sync up #799 with master
834e0d6 to b38ac9d
```
RUST_LOG=info JEMALLOC_SYS_WITH_MALLOC_CONF=retain:true,metadata_thp:always,thp:always,dirty_decay_ms:-1,muzzy_decay_ms:-1,abort_conf:true cargo run --features jemalloc --package gkr_iop --bin lookup_keccak
```

> This only covers the prover flow, not the verifier flow yet.

Benchmark command:

```
JEMALLOC_SYS_WITH_MALLOC_CONF=retain:true,metadata_thp:always,thp:always,dirty_decay_ms:-1,muzzy_decay_ms:-1,abort_conf:true cargo bench -p gkr_iop --features jemalloc --bench lookup_keccakf
```

Benchmark results on an AMD EPYC 32-core machine:

| Version | Throughput (keccak/s) |
|------------------------|------------------------|
| Ceno Keccak version | 4215 |
| Plonky3 + Baby Bear | 1188.47 |
| Plonky3 + Goldilocks | 683.05 |
| Ceno (textbook gkr) | 128 |

Co-authored-by: Zhang Zhuo <mycinbrin@gmail.com>
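To read the throughput table as relative speedups, the sketch below simply divides Ceno's keccak/s figure by each baseline, giving roughly 3.5x over Plonky3 + Baby Bear, 6.2x over Plonky3 + Goldilocks and 33x over the textbook GKR version. The `speedup` helper is hypothetical, and the ratios carry the same caveats as the measurement itself (prover flow only, one machine).

```rust
/// Ratio of Ceno's throughput to a baseline throughput (both in keccak/s).
fn speedup(ceno: f64, baseline: f64) -> f64 {
    ceno / baseline
}

fn main() {
    // Throughputs copied from the benchmark table (keccak/s).
    let ceno = 4215.0;
    let baselines = [
        ("Plonky3 + Baby Bear", 1188.47),
        ("Plonky3 + Goldilocks", 683.05),
        ("Ceno (textbook gkr)", 128.0),
    ];
    for (name, throughput) in baselines {
        println!("{name}: {:.2}x", speedup(ceno, throughput));
    }
}
```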
hero78119 (Collaborator) approved these changes on Jun 5, 2025:
amazing work with many inspiring new designs 👍 !!
### Change

This PR syncs with ceno master and rolls back part of the changes to ensure the ceno main-flow benchmark is not affected.

### benchmark against master

| Benchmark | Median Time (s) | Median Change (%) |
|----------------------------------|------------------|-------------------------------------|
| fibonacci_max_steps_1048576 | 2.1283 | +2.0905% (Change within noise) |
| fibonacci_max_steps_2097152 | 3.6231 | +0.9229% (No change in performance) |
| fibonacci_max_steps_4194304 | 6.4747 | -0.1104% (No change in performance) |

Co-authored-by: Zhang Zhuo <mycinbrin@gmail.com>
Co-authored-by: xkx <xiakunxian130@gmail.com>
Co-authored-by: Akase Haruka <lightsing@users.noreply.github.com>
github-merge-queue Bot pushed a commit that referenced this pull request on Jun 17, 2025:
This PR builds on top of #799 with one extra commit, 48ded1a, which introduces backend expressions and caches them in the constraint system. This aligns the design with the precompile work, so the next refactoring step can more easily introduce precompile chips into the main flow. The main sumcheck read/write lookup expressions were simplified, as the post `evaluate()` was also removed.

### Expression

Expressions are simplified into two kinds, frontend and backend:

- Frontend expression: an expression over Witin/StructuralWitin/Fixed, in recursive/nested style.
- Backend expression: an expression over Witin only, in monomial style.

After circuit setup, the contents of both expressions are fully known and frozen. At runtime, we take the backend expression, evaluate its scalars with the challenges/instance values, and then put the final expression into the sumcheck.

### benchmark

The nice thing is that there is no performance difference before/after this change.

| Benchmark | Median Time (s) | Median Change (%) |
|----------------------------------|------------------|----------------------------------------|
| fibonacci_max_steps_1048576 | 2.0641 | -0.9869% (No change in performance detected) |
| fibonacci_max_steps_2097152 | 3.5514 | -1.0748% (Change within noise threshold) |
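The frontend/backend split can be sketched as flattening a nested expression into monomials once at setup, then substituting challenge values into the scalar part of each monomial at runtime. This is a hypothetical illustration, not ceno's expression API: `Expr`, `Monomial`, `to_monomials` and `evaluate_scalars` are invented names, integers stand in for field elements, and `StructuralWitin`/`Fixed` leaves are omitted.

```rust
use std::collections::BTreeMap;

/// Hypothetical frontend expression: nested, mixing witness columns with
/// scalar leaves (constants and challenges) that are known only at runtime.
enum Expr {
    Witin(usize),
    Challenge(usize),
    Constant(i64),
    Sum(Box<Expr>, Box<Expr>),
    Product(Box<Expr>, Box<Expr>),
}

/// Hypothetical backend term: constant * prod(challenges) * prod(witins).
#[derive(Clone, Debug)]
struct Monomial {
    constant: i64,
    challenges: Vec<usize>,
    witins: Vec<usize>,
}

fn mono(constant: i64, challenges: Vec<usize>, witins: Vec<usize>) -> Monomial {
    Monomial { constant, challenges, witins }
}

fn add(a: Expr, b: Expr) -> Expr { Expr::Sum(Box::new(a), Box::new(b)) }
fn mul(a: Expr, b: Expr) -> Expr { Expr::Product(Box::new(a), Box::new(b)) }

/// Setup-time step: expand the nested expression into a flat sum of monomials.
fn to_monomials(e: &Expr) -> Vec<Monomial> {
    match e {
        Expr::Witin(i) => vec![mono(1, vec![], vec![*i])],
        Expr::Challenge(c) => vec![mono(1, vec![*c], vec![])],
        Expr::Constant(k) => vec![mono(*k, vec![], vec![])],
        Expr::Sum(a, b) => {
            let mut out = to_monomials(a);
            out.extend(to_monomials(b));
            out
        }
        Expr::Product(a, b) => {
            // Distribute: every term of `a` times every term of `b`.
            let (ma, mb) = (to_monomials(a), to_monomials(b));
            let mut out = Vec::with_capacity(ma.len() * mb.len());
            for x in &ma {
                for y in &mb {
                    let mut challenges = x.challenges.clone();
                    challenges.extend(&y.challenges);
                    let mut witins = x.witins.clone();
                    witins.extend(&y.witins);
                    witins.sort_unstable(); // canonical order: w0*w1 == w1*w0
                    out.push(mono(x.constant * y.constant, challenges, witins));
                }
            }
            out
        }
    }
}

/// Runtime step: fold challenge values into each coefficient, merging terms
/// with the same witness product, leaving `coeff * prod(witins)` terms.
fn evaluate_scalars(monos: &[Monomial], challenges: &[i64]) -> BTreeMap<Vec<usize>, i64> {
    let mut terms = BTreeMap::new();
    for m in monos {
        let coeff = m.challenges.iter().fold(m.constant, |acc, &c| acc * challenges[c]);
        *terms.entry(m.witins.clone()).or_insert(0) += coeff;
    }
    terms
}

fn main() {
    // (w0 + alpha) * (w1 + 3) = w0*w1 + 3*w0 + alpha*w1 + 3*alpha
    let e = mul(add(Expr::Witin(0), Expr::Challenge(0)), add(Expr::Witin(1), Expr::Constant(3)));
    let terms = evaluate_scalars(&to_monomials(&e), &[5]); // alpha = 5
    for (witins, coeff) in &terms {
        println!("{coeff} * prod{witins:?}");
    }
}
```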
yczhangsjtu pushed a commit that referenced this pull request on Jun 19, 2025:
This is an implementation of the expression-based, plonkish-like GKR
IOP protocol. The circuit is denoted as `Chip`, which holds all the information
needed to process the commit phases and the GKR proving phase. In the current
implementation, we assume there are two commit phases. To process the
GKR phase, we extract a `GKRCircuit` from the chip and run the GKR protocol.
In terms of implementation status, the GKR phase is ready for review, while
the commit phases have not been finalized.
Defining a GKR IOP protocol for a chip involves defining
`build_commit_phase`, `build_commit_phase2` and `build_gkr_phase`.
In particular, `build_gkr_phase` mainly builds the GKR layers in
reverse order. Besides specifying the expressions, we use an evaluation tape
to hold the evaluations and `EvalExpression` to define computations on them.
This simplifies both transferring evaluations from an input of a succeeding
layer to an output of the current layer and performing some computation on
them before they are fed to the current layer. Each layer input is assigned
a position in the evaluation tape.
`EvalExpression` is defined as follows:
```rust
#[derive(Clone, Debug)]
pub enum EvalExpression {
    Single(usize),
    Linear(usize, Constant, Constant),
    Partition(Vec<Box<EvalExpression>>, Vec<(usize, Constant)>),
}
```
Its variants describe how to compute the output evaluations. For
more details, please refer to
[gkr_iop/src/evaluation.rs](https://github.com/scroll-tech/ceno/blob/tianyi/refactor-prover/gkr_iop/src/evaluation.rs).
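A minimal sketch of evaluating `EvalExpression`-style variants against an evaluation tape, under loudly stated assumptions: `EvalExpr` and `eval` are hypothetical names, `Constant` is replaced by a toy prime-field element (p = 97), `Linear(i, a, b)` is assumed to mean a * tape[i] + b, and `Partition` is simplified to a two-way split along one variable of the evaluation point, recombined with the multilinear identity f(..., r, ...) = (1 - r) * f(..., 0, ...) + r * f(..., 1, ...). The real definitions in gkr_iop/src/evaluation.rs may differ.

```rust
/// Toy prime modulus standing in for the real field (assumption).
const P: u64 = 97;

/// Hypothetical, simplified mirror of `EvalExpression`.
enum EvalExpr {
    /// Read the evaluation stored at this tape position.
    Single(usize),
    /// Assumed meaning: scalar * tape[i] + offset.
    Linear(usize, u64, u64),
    /// Assumed meaning: two halves of a polynomial split along variable `var`.
    Partition(Box<EvalExpr>, Box<EvalExpr>, usize),
}

fn eval(e: &EvalExpr, tape: &[u64], point: &[u64]) -> u64 {
    match e {
        EvalExpr::Single(i) => tape[*i] % P,
        EvalExpr::Linear(i, scalar, offset) => (scalar * tape[*i] + offset) % P,
        EvalExpr::Partition(lo, hi, var) => {
            // Multilinear identity: f(r) = (1 - r) * f(0) + r * f(1).
            let r = point[*var] % P;
            let (v0, v1) = (eval(lo, tape, point), eval(hi, tape, point));
            ((P + 1 - r) % P * v0 + r * v1) % P
        }
    }
}

fn main() {
    // f(x) = 3 + 4x: the tape holds f(0) = 3 and f(1) = 7.
    let tape = [3, 7];
    let split = EvalExpr::Partition(Box::new(EvalExpr::Single(0)), Box::new(EvalExpr::Single(1)), 0);
    assert_eq!(eval(&split, &tape, &[10]), 43); // 3 + 4 * 10
    assert_eq!(eval(&EvalExpr::Linear(1, 2, 5), &tape, &[]), 19); // 2 * 7 + 5
}
```

Partitions can nest: an outer split along one variable whose halves are themselves splits along another variable recombines a two-variable multilinear polynomial from its four hypercube evaluations.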
Here are some subsequent tasks:
- [ ] Parallelize the vector evaluations under
`subprotocols/src/expression/`.
- [ ] Devirgo migration.
- [ ] Benchmarks.
- [ ] Keccak example and benchmarks.
Although these tasks still need to be done, I suggest starting the first
round of review now. I would like to see comments from @naure and
@hero78119 so that I can adjust the design before moving forward.
**Upd:** The design doc: https://hackmd.io/@sphere-liu/HyLR-h2L1g.
---------
Co-authored-by: Mihai <mihai.calancea@gmail.com>
Co-authored-by: mcalancea <mihai@inversed.tech>
Co-authored-by: Sphere L <sph6r6.l1u@gmail.com>
Co-authored-by: Ming <hero78119@gmail.com>
Co-authored-by: Zhang Zhuo <mycinbrin@gmail.com>
Co-authored-by: xkx <xiakunxian130@gmail.com>
Co-authored-by: Akase Haruka <lightsing@users.noreply.github.com>
yczhangsjtu pushed a commit that referenced this pull request on Jun 19, 2025, with the same commit message as the Jun 17, 2025 commit (backend expressions cached in the constraint system).