[rmkit][fb] speed up draw_rect functions#78
Conversation
e855899 to
0527b4c
Compare
|
Picking up where #73 (comment) left off
I did 3 "7x7 easy" lightup puzzles each basically, so manual interaction. I just added saves, so it might be possible to automate it by loading a save file and redrawing a couple times, or something like that. I just tried running a perf build with these changes -- I'm building with -O2, so the biggest difference I'm seeing is that do_dithering isn't inlined with this change. Honestly I'm not totally sure how to interpret these results since the draw_rect time got split up b/c of the inlining change, but it looks like performance is similar, (or maybe a little worse?) based on the % time column. without this diffwith this diff |
| update_dirty(dirty_area, o_x, o_y) | ||
| update_dirty(dirty_area, o_x+w, o_y+h) | ||
|
|
||
| _draw_rect_fast(o_x, o_y, w, h, color, fill, dither) | ||
|
|
||
| inline void _draw_rect_fast(int o_x, o_y, w, h, color, fill=true, float dither=1.0): |
There was a problem hiding this comment.
I do see a huge reduction in the number of times update_dirty is called, which seems like a net benefit in any case, so this looks like a helpful part for sure.
|
|
||
| update_dirty(dirty_area, o_x, o_y) | ||
| update_dirty(dirty_area, o_x+w, o_y+h) | ||
| if likely(fill): |
There was a problem hiding this comment.
It looks like almost every function calls this with fill=true. I think the only place where fill=false would happen is in the original draw_rect itself right?
What about making draw_rect include something like this? (I haven't tested, just a thought)
if fill:
_draw_rect_fast(x, y, w, h, ...)
else:
_draw_rect_fast(x, y, w, 1, ...) // top
_draw_rect_fast(x, y+h-1, w, 1, ...) // bottom
_draw_rect_fast(x, y, 1, h, ...) // left
_draw_rect_fast(x+w-1, y, 1, h, ...) // rightThe main difference I think is that do_dithering could still be inlined in _draw_rect_fast since it would only have one call site.
| if i+o_x >= self.width: | ||
| break |
There was a problem hiding this comment.
I wonder how expensive these comparisons are (they might not be at all) compared to:
w = min(w, self.width-x)
h = min(h, self.height-y)
for j 0 h:
for i 0 w:
do_dithering(self.fbmem, i+o_x, j+o_y, color, dither)There was a problem hiding this comment.
i had a little trouble with this just now (and had trouble the first couple times i tried it a while back)
mrichards42
left a comment
There was a problem hiding this comment.
Cool, I'm seeing a modest speedup with this now. I switched my profiling build to -O1 since -O2 was giving me gibberish, idk if it's from inlining or arm branch prediction or what.
In any case, I'm seeing that update_dirty has dropped off from about 3% total time to 0% since it's called over 2 orders of magnitude less. And I'm seeing about a 20-25% speedup in draw_rect which is pretty good!
FWIW I also tried a build where I replaced the do_dithering call with _set_pixel directly, and that saved another 50% or so. We probably aren't at a point where we can do that just yet, but since draw_rect is still by far the most expensive function, it might be worth seeing if the dithering code could be adapted to support more generic patterns/brushes. I've thought more about that and I think I have a decent plan in mind.
No description provided.