|
|
| Created: 12 years, 9 months ago by junov1 Modified: 12 years, 7 months ago CC: skia-review_googlegroups.com Base URL: http://skia.googlecode.com/svn/trunk/ Visibility: Public. | DescriptionClean-up tile grid query results using intersection testing This CL aims to reject out of bounds draws earlier in the SkPicture playback pipeline when using a tileGrid. Tile grid queries that are not exactly aligned with the tile grid can be pretty common. This CL aims to optimize that case by performing a bounds test on each element before adding it to the results. To perform the test at a very low cost, we take advantage of the fact that bounding boxes are already being computed at record time for constructing a BBoxHierarchy. SkTileGrid will now store this information so that it can be used in query execution. This saves the overhead of processing commands that will later be rejected by SkCanvas::quickReject. In some cases this will even eliminate unnecessary draws that would not have been detected by quickReject such as drawText calls. TEST=TileGrid unit test Patch Set 1 #Patch Set 2 : # Total comments: 1 Patch Set 3 : #
MessagesTotal messages: 22
PTAL Sign in to reply to this message.
Good idea. What are the RAM & CPU impacts? Sign in to reply to this message.
On 2013/02/11 15:43:58, TomH wrote: > Good idea. What are the RAM & CPU impacts? Ram impact: we used to store one void* per tile per draw command, now we store a void* and an SkIRect. CPU impact is complex I haven't measured anything yet. Affects scaled playback performance in chrome for impl-side painting. Current bench pictures tests are not well conditioned to measure the benefit because they do not simulate pinch-zoom conditions. I was at least able to verify that this change does not regress performance for grid-aligned playback (the well conditioned case). Here is the different ways this CL helps scaled playback or other types of non grid aligned playback: -Avoid overhead of decoding draw commands that will be quick rejected (quick reject itself adds significant overhead in many cases). -This rejection catches a few thing that SkCanvas quick reject does not, which is like boosting the scope of quick reject for free. Most notably: drawing of text (SkPicture only rejects based on y bounds). -In some special cases we will eliminate extraneous results from GatherPixelRefs queries Sign in to reply to this message.
A sizable RAM cost, at least proportionately. Perhaps you could publish some abs values for our .skps (w/ and w/o your change, and post the .skp's command-stream size). What is the record cost? This should be easy to measure w/ our current .skps. From a system p.o.v., we might treat this initial CL as a source of interesting data/ideas as well as a possible CL. Perhaps it points to some lighter-weight changes we could make to pictures as a whole (e.g. record bounds for text inline). Sign in to reply to this message.
So I ran some benchmark tests. Not sure where to dump the full results (no file attachments in rietveld). All experiment were on a stock z620/win7 I observed no measurable perf difference in record time. In the case where playback tiles are perfectly aligned with the tile grid (expected very little impact), I am seeing no perf impact on about a thrid of the skps, minor improvements (1-5% faster) on the rest, except for a few cases that stand out. For example desk_pokemonwiki.skp is 10% faster. In the case where playback tiles are not aligned with the tile grid I am seeing performance increase across the board (>90% of skps), mostly single digit %. For this experiment, I used a tile grid of 256x256 and tiles of 257x257. Where things get really interesting is when combining this change with the on posted here which skips quick reject testing on playback: https://codereview.appspot.com/7323046 The thinking is that the quickReject test (which is not always that quick) becomes completely redundant when the BBoxHierarchy is doing an accurate job. With these two patches working together, I am seeing very interesting playback performance increases. Examples of observed playback speedups on some important sites: 6% on desk_googleplus 9% on desk_googlespreadsheet 7% on desk_gws 12% on desk_gmail 12% on desk_facebook Extreme case: desk_googlespreadsheetdashed is 22% faster Sign in to reply to this message.
Another idea I want to try next: When using a BBoxHierarchy, we should skip the optimization In SkPictureRecord that uses DRAW_TEXT_TOP_BOTTOM. That work is redundant with what the BBoxHierarchy does and it adds recording overhead. Sign in to reply to this message.
We know the ram increase is 5x on a 32bit machine. What does that come to in real values for our .skps? https://codereview.appspot.com/7311075/diff/5/src/core/SkTileGrid.cpp File src/core/SkTileGrid.cpp (right): https://codereview.appspot.com/7311075/diff/5/src/core/SkTileGrid.cpp#newcode88 src/core/SkTileGrid.cpp:88: if (queryIsGridAligned) if (...) { Sign in to reply to this message.
I certainly like the idea of faster bounds rejects, but I want us to recognize the cost in complexity and storage for "solving" this on the outside (in only one bbox subclass) rather than finding a more centralized optimization (either in bbox proper, or better yet, in picture or canvas). Sign in to reply to this message.
On 2013/02/12 15:20:41, reed1 wrote: > I certainly like the idea of faster bounds rejects, but I want us to recognize > the cost in complexity and storage for "solving" this on the outside (in only > one bbox subclass) rather than finding a more centralized optimization (either > in bbox proper, or better yet, in picture or canvas). FWIW, A proper bounding box hierarchy look-up should not return extraneous results. Rtree is already doing this right, but tileGrid was cutting corners. In with canvas, we are already doing what we can with quickReject, and in SkPicture, BBoxHierarchies are the solution for earlier rejection based on playback bounds. Sign in to reply to this message.
To measure the RAM cost of adding 1 Rect per draw is potentially big. I will create a temp CL to count how many of these we are creating for each of the skps in our set. Sign in to reply to this message.
I instrumented bench_pictures to count the number of bounding boxes that are being recorded. I gathered two stats: rectangle count, and unique rectangle count. rectangle count takes into account the fact that one bounding box may end up being stored in more that one tile. If we added a level of indirection to this code, it would be possible to store only unique Elements (void*, SkIRect pair) in a shared array and to reference them by index. That way, the per-tile arrays would be index arrays. Analysis: Unique rectangles seldom and average discount of about 50% on average, not sure it is worth adding an indirection for that. Rectangle count is almost always < 10k, so < 160KBytes of RAM. Exception: pokemonwiki with a whopping 88k rects, so 1.4MB worth of rects. Result Dump: bench_pictures: D:\src\skia-tree\skp --mode record --bbh grid 256 256 running bench [1257 2394] desk_amazon.skp record_grid: cmsecs = 0.0000 SkRect Count : 1476 UniqueCount: 942 running bench [1257 1423] desk_baidu.skp record_grid: cmsecs = 0.0000 SkRect Count : 1790 UniqueCount: 1336 running bench [1257 11809] desk_blogger.skp record_grid: cmsecs = 0.0000 SkRect Count : 5274 UniqueCount: 2272 running bench [1257 3045] desk_booking.skp record_grid: cmsecs = 0.0000 SkRect Count : 2994 UniqueCount: 1846 running bench [1257 6612] desk_br337.skp record_grid: cmsecs = 0.0000 SkRect Count : 3956 UniqueCount: 2234 running bench [1272 907] desk_chalkboard.skp record_grid: cmsecs = 0.0000 SkRect Count : 7396 UniqueCount: 5322 running bench [1257 6763] desk_css3gradients.skp record_grid: cmsecs = 0.0000 SkRect Count : 2994 UniqueCount: 1144 running bench [1257 5744] desk_ebay.skp record_grid: cmsecs = 0.0000 SkRect Count : 3582 UniqueCount: 1434 running bench [1257 1926] desk_espn.skp record_grid: cmsecs = 0.0000 SkRect Count : 2322 UniqueCount: 1242 running bench [1257 14258] desk_facebook.skp record_grid: cmsecs = 0.0000 SkRect Count : 8412 UniqueCount: 3724 running bench [1272 908] desk_gmailthread.skp record_grid: cmsecs = 0.0000 SkRect Count : 978 UniqueCount: 530 running bench [1260 6823] desk_googleplus.skp record_grid: cmsecs = 0.0000 SkRect Count : 5148 UniqueCount: 1418 running bench [1272 907] desk_googlespreadsheet.skp record_grid: cmsecs = 0.00 00 SkRect Count : 4486 UniqueCount: 3248 running bench [1272 907] desk_googlespreadsheetdashed.skp record_grid: cmsecs = 0.0000 SkRect Count : 7900 UniqueCount: 6610 running bench [1257 1617] desk_gws.skp record_grid: cmsecs = 0.0000 SkRect Count : 1378 UniqueCount: 776 running bench [1257 2702] desk_linkedin.skp record_grid: cmsecs = 0.0000 SkRect Count : 1740 UniqueCount: 944 running bench [1257 4971] desk_pinterest.skp record_grid: cmsecs = 0.0000 SkRect Count : 3362 UniqueCount: 1914 running bench [1257 19451] desk_pokemonwiki.skp record_grid: cmsecs = 15.6001 SkRect Count : 88000 UniqueCount: 65174 running bench [1257 3976] desk_sfgate.skp record_grid: cmsecs = 0.0000 SkRect Count : 4764 UniqueCount: 2634 running bench [1257 11938] desk_techcrunch.skp record_grid: cmsecs = 0.0000 SkRect Count : 5822 UniqueCount: 2354 running bench [1257 6685] desk_twitter.skp record_grid: cmsecs = 0.0000 SkRect Count : 6140 UniqueCount: 3270 running bench [1257 2531] desk_weather.skp record_grid: cmsecs = 0.0000 SkRect Count : 1978 UniqueCount: 950 running bench [1257 16511] desk_wordpress.skp record_grid: cmsecs = 0.0000 SkRect Count : 6336 UniqueCount: 2198 running bench [1257 26497] desk_wowwiki.skp record_grid: cmsecs = 0.0000 SkRect Count : 19828 UniqueCount: 12396 running bench [1257 1966] desk_yahooanswers.skp record_grid: cmsecs = 0.0000 SkRect Count : 2526 UniqueCount: 1254 running bench [1257 15873] desk_yahoogames.skp record_grid: cmsecs = 0.0000 SkRect Count : 19618 UniqueCount: 12304 running bench [1257 3122] desk_yahoonews.skp record_grid: cmsecs = 0.0000 SkRect Count : 3052 UniqueCount: 1452 running bench [1257 4075] desk_yahoosports.skp record_grid: cmsecs = 0.0000 SkRect Count : 3744 UniqueCount: 1904 running bench [1257 2754] desk_youtube.skp record_grid: cmsecs = 0.0000 SkRect Count : 2438 UniqueCount: 1382 running bench [1257 9545] mobi_wikipedia.skp record_grid: cmsecs = 0.0000 SkRect Count : 11650 UniqueCount: 7694 running bench [1257 8775] tabl_androidpolice.skp record_grid: cmsecs = 0.0000 SkRect Count : 1628 UniqueCount: 232 running bench [1257 3576] tabl_cnet.skp record_grid: cmsecs = 0.0000 SkRect Count : 3148 UniqueCount: 1286 running bench [1257 12391] tabl_cnn.skp record_grid: cmsecs = 0.0000 SkRect Count : 7732 UniqueCount: 3964 running bench [1257 8473] tabl_culturalsolutions.skp record_grid: cmsecs = 0.0 000 SkRect Count : 2822 UniqueCount: 1066 running bench [1257 14232] tabl_cuteoverload.skp record_grid: cmsecs = 0.0000 SkRect Count : 4502 UniqueCount: 1914 running bench [1257 1924] tabl_deviantart.skp record_grid: cmsecs = 0.0000 SkRect Count : 966 UniqueCount: 476 running bench [1257 8999] tabl_digg.skp record_grid: cmsecs = 0.0000 SkRect Count : 4606 UniqueCount: 1818 running bench [1257 13040] tabl_engadget.skp record_grid: cmsecs = 0.0000 SkRect Count : 6032 UniqueCount: 2532 running bench [1272 8942] tabl_frantzen.skp record_grid: cmsecs = 0.0000 SkRect Count : 1200 UniqueCount: 330 running bench [1257 4954] tabl_gamedeksiam.skp record_grid: cmsecs = 15.6001 SkRect Count : 8320 UniqueCount: 5940 running bench [1272 907] tabl_gmail.skp record_grid: cmsecs = 0.0000 SkRect Count : 694 UniqueCount: 370 running bench [1257 21618] tabl_googleblog.skp record_grid: cmsecs = 0.0000 SkRect Count : 7396 UniqueCount: 2848 running bench [1272 907] tabl_googlecalendar.skp record_grid: cmsecs = 0.0000 SkRect Count : 1462 UniqueCount: 744 running bench [1257 2473] tabl_gspro.skp record_grid: cmsecs = 0.0000 SkRect Count : 996 UniqueCount: 276 running bench [1257 11687] tabl_hsfi.skp record_grid: cmsecs = 0.0000 SkRect Count : 6436 UniqueCount: 2978 running bench [1257 3162] tabl_mercurynews.skp record_grid: cmsecs = 0.0000 SkRect Count : 2492 UniqueCount: 698 running bench [1257 3102] tabl_mlb.skp record_grid: cmsecs = 0.0000 SkRect Count : 2916 UniqueCount: 1356 running bench [1257 103492] tabl_mozilla.skp record_grid: cmsecs = 0.0000 SkRect Count : 38854 UniqueCount: 18752 running bench [1257 6641] tabl_nofolo.skp record_grid: cmsecs = 0.0000 SkRect Count : 1748 UniqueCount: 354 running bench [1257 3930] tabl_nytimes.skp record_grid: cmsecs = 0.0000 SkRect Count : 3726 UniqueCount: 2194 running bench [1257 4613] tabl_onlinewsj.skp record_grid: cmsecs = 0.0000 SkRect Count : 4748 UniqueCount: 2434 running bench [1257 4093] tabl_pravda.skp record_grid: cmsecs = 0.0000 SkRect Count : 3826 UniqueCount: 2078 running bench [1257 2003] tabl_sahadan.skp record_grid: cmsecs = 0.0000 SkRect Count : 11918 UniqueCount: 10080 running bench [1267 1547] tabl_slashdot.skp record_grid: cmsecs = 0.0000 SkRect Count : 1096 UniqueCount: 460 running bench [1257 2308] tabl_techmeme.skp record_grid: cmsecs = 0.0000 SkRect Count : 1372 UniqueCount: 622 running bench [1257 7340] tabl_theverge.skp record_grid: cmsecs = 0.0000 SkRect Count : 6378 UniqueCount: 2470 running bench [1257 5601] tabl_transformice.skp record_grid: cmsecs = 0.0000 SkRect Count : 2192 UniqueCount: 790 running bench [1257 4164] tabl_ukwsj.skp record_grid: cmsecs = 0.0000 SkRect Count : 4328 UniqueCount: 2168 running bench [1257 4075] tabl_vnexpress.skp record_grid: cmsecs = 0.0000 SkRect Count : 3992 UniqueCount: 2340 running bench [1257 3080] tabl_worldjournal.skp record_grid: cmsecs = 0.0000 SkRect Count : 4710 UniqueCount: 2806 Press any key to continue . . . Sign in to reply to this message.
Editing error: > Unique rectangles seldom and average discount of about 50% on average I meant: Unique rectangles give an average discount of about 50%. Sign in to reply to this message.
Justin, I think the metic Mike's been asking for since the beginning is the right one: what's the relative increase in SKP size? Or, for each of those SKPs, what's the size of the raw data, what's the size of the hierarchy, and what's the size of the hierarchy + the extra bounding data? Sign in to reply to this message.
Wasn't that answered in my previous post? Extra data consumption in TileGrid structure < 160KB most of the time. In extreme cases like pokemon wiki 1.4MB of extr RAM consumption. Before this change, consumption was 5x less on 32-bit, 3x less on 64-bit Sign in to reply to this message.
Just uploaded Patch Set 3 This version of the patch reduce memory consumption by using a shared array for storing the bounding boxes. The patch is very messy because it uses ifdef to switch between shared and non-shared implementations. I don't think we would want to submit it in this state. Anyways, I toggled between the two implementations to run some benchmarking experiments. While implementing this I realize that although it adds an extra indirection for accessing the draw call data, it removes indirections in the tile merging code. What I found by benchmarking was playback time fluctuations of +/-2% (mostly noise I think). With the new algorithm (shared rect array), memory consumption drops by about 25% on average, in extreme cases, 70% savings. In the unlikely case that no objects in overlap any tiles, which is the theoretical worst case, this algorithm would actually increase memory consumption by 20%. For reference none of our skp files are penalized by this algorithm. Sign in to reply to this message.
Given these results I am inclined to clean-up the patch and just keep the new shared data implementation. Anyone think that is the wrong idea? Sign in to reply to this message.
Justin, you still haven't answered Mike's question. And certainly without that I don't think the extra complexity and indirection is well-justified. Sign in to reply to this message.
In order to provide clear metrics. I started working on another patch that will add total SkPicture memory consumption reporting to bench_pictures. I'm thinking we may even want to check memory consumption baselines on the bots. Sign in to reply to this message.
Another patch and benchmarking feels like a big step. Little steps might be: 1) hack into your code and dump it once 2) display it in the debugger 3) ... ? 4) bot benchmarking Mike threw together a document this morning analyzing all the SKPs checked into the skia repo; it got shared with skia-dev a few minutes ago. Those SKPs might be out of date, and there are a couple of caveats necessary to understand the numbers, but it's promising. Sign in to reply to this message.
Yeah I found some old code inside #ifdef SK_DEBUG_SIZE but it is severely rotten. I am putting it back in shape and adding what it takes to measure BBoxHierarchies. That is tha path of least resistance to getting stats dumped to stderr. Sign in to reply to this message.
Generated some stats : https://docs.google.com/a/google.com/spreadsheet/ccc?key=0AkZDOrGmY3OfdDgxV2h... (viewable by skia-dev) The memory consumption numbers do not include the effects of this change The "tile grid nodes" column represents the number of objects for which we'd want to store bounds information (extra 16 bytes) Sign in to reply to this message.
I added a couple of columns to that document (L, M); are those correct? If so, they suggest that the typical overhead of tracking bounding boxes is 4% of the opcode stream, or < 1% of the total size - usually 4kB or so. Worst-case is pokemonwiki, which has a huge table bloating the number of objects; that's only about 130kB. Sign in to reply to this message. | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
