New scenegraph renderer and atlas textures.

The renderer tries to batch primitives together where possible, isolate non-changing subparts of the scene from changing subparts and retain vertex data on the GPU as much as possible. Atlas textures are crucial in enabling batching. The renderer and atlas texture are described in detail in the doc page "Qt Quick Scene Graph Renderer".

Change-Id: Ia476c7f0f42e1fc57a2cef528e93ee88cf8f7055
Reviewed-by: Eskil Abrahamsen Blomfeldt <eskil.abrahamsen-blomfeldt@digia.com>
author: Gunnar Sletta <gunnar.sletta@digia.com> 2013-08-14 07:27:07 +0200
committer: The Qt Project <gerrit-noreply@qt-project.org> 2013-09-02 14:24:36 +0200
commit: b480fa83a632b2ae5606e2870b47358328b479a2 (patch)
tree: bdd3e1b68a5a15a3950e13a50db911a93cdf279a /src/quick/doc
parent: 9be35c270082d1614886874e17cc3f90a7a3f489 (diff)
3 files changed, 325 insertions, 2 deletions
diff --git a/src/quick/doc/images/visualcanvas_list.png b/src/quick/doc/images/visualcanvas_list.png
new file mode 100644
index 0000000000..37bf72572d
--- /dev/null
+++ b/src/quick/doc/images/visualcanvas_list.png
diff --git a/src/quick/doc/images/visualcanvas_overlap.png b/src/quick/doc/images/visualcanvas_overlap.png
new file mode 100644
index 0000000000..acca669016
--- /dev/null
+++ b/src/quick/doc/images/visualcanvas_overlap.png
diff --git a/src/quick/doc/src/concepts/visualcanvas/scenegraph.qdoc b/src/quick/doc/src/concepts/visualcanvas/scenegraph.qdoc
index 40d77c3d9b..4939868d1c 100644
--- a/src/quick/doc/src/concepts/visualcanvas/scenegraph.qdoc
+++ b/src/quick/doc/src/concepts/visualcanvas/scenegraph.qdoc
@@ -170,6 +170,9 @@ attach application code. This can be to add custom scene graph
 content or render raw OpenGL content. The integration points are
 defined by the render loop.
 
+For a detailed description of how the scene graph renderer works, see
+\l {Qt Quick Scene Graph Renderer}.
+
 
 \section2 Threaded Render Loop
 
@@ -278,7 +281,7 @@ content either under a Qt Quick scene or over it. The benefit of
 integrating in this manner is that no extra framebuffer nor memory is
 needed to perform the rendering. The downside is that Qt Quick decides
 when to call the signals and this is the only time the OpenGL
-application is allowed to draw. 
+application is allowed to draw.
 
 The \l {Scene Graph - OpenGL Under QML} example gives an example on
 how to use use these signals.
@@ -301,7 +304,7 @@ or stencil-buffer or similar. Doing so can result in unpredictable
 behavior.
 
 \warning The OpenGL rendering code must be thread aware, as the
-rendering might be happening outside the GUI thread. 
+rendering might be happening outside the GUI thread.
 
 
 \section2 Custom Items using QPainter
@@ -348,3 +351,323 @@ with multiple windows.
 \endlist
 
 */
+
+/*!
+  \title Qt Quick Scene Graph Renderer
+  \page qtquick-visualcanvas-scenegraph-renderer.html
+
+  This document explains how the scene graph renderer works internally
+  so that one can write code that uses it in an optimal fashion, both
+  performance-wise and feature-wise.
+
+  One does not need to understand the internals of the renderer to get
+  good performance.  However, it might help when integrating with the
+  scene graph or to figure out why it is not possible to squeeze the
+  maximum efficiency out of the graphics chip.
+
+  \note Even in the case where every frame is unique and everything is
+  uploaded from scratch, the default renderer will perform well.
+
+  The Qt Quick items in a QML scene populate a tree of QSGNode
+  instances.  Once created, this tree is a complete description of how
+  a certain frame should be rendered. It does not contain any
+  references back to the Qt Quick items at all and will on most
+  platforms be processed and rendered in a separate thread. The
+  renderer is a self-contained part of the scene graph which traverses
+  the QSGNode tree and uses geometry defined in QSGGeometryNode and
+  shader state defined in QSGMaterial to schedule OpenGL state changes
+  and draw calls.
+
+  If needed, the renderer can be completely replaced using the
+  internal scene graph back-end API. This is mostly interesting for
+  platform vendors who wish to take advantage of non-standard hardware
+  features. For the majority of use cases, the default renderer will be
+  sufficient.
+
+  The default renderer focuses on two primary strategies to optimize
+  the rendering: batching of draw calls and retention of geometry on
+  the GPU.
+
+  \section1 Batching
+
+  Where a traditional 2D API, such as QPainter, Cairo or Context2D, is
+  written to handle thousands of individual draw calls per frame,
+  OpenGL is a pure hardware API and performs best when the number of
+  draw calls is very low and state changes are kept to a
+  minimum. Consider the following use case:
+
+  \image visualcanvas_list.png
+
+  The simplest way of drawing this list is on a cell-by-cell basis. First
+  the background is drawn. This is a rectangle of a specific color. In
+  OpenGL terms this means selecting a shader program to do solid color
+  fills, setting up the fill color, setting the transformation matrix
+  containing the x and y offsets and then using for instance
+  \c glDrawArrays to draw two triangles making up the rectangle. The icon
+  is drawn next. In OpenGL terms this means selecting a shader program
+  to draw textures, selecting the active texture to use, setting the
+  transformation matrix, enabling alpha-blending and then using for
+  instance \c glDrawArrays to draw the two triangles making up the
+  bounding rectangle of the icon. The text and separator line between
+  cells follow a similar pattern. And this process is repeated for
+  every cell in the list, so for a longer list, the overhead imposed
+  by OpenGL state changes and draw calls completely outweighs the
+  benefit that using a hardware accelerated API could provide.
+
+  When each primitive is large, this overhead is negligible, but in
+  the case of a typical UI, there are many small items which add up to
+  a considerable overhead.
+
+  The default scene graph renderer works within these
+  limitations and will try to merge individual primitives together
+  into batches while preserving the exact same visual result. The
+  result is fewer OpenGL state changes and a minimal number of draw
+  calls, giving optimal performance.
+
+  \section2 Opaque Primitives
+
+  The renderer distinguishes between opaque primitives and primitives
+  which require alpha blending. By using OpenGL's Z-buffer and giving
+  each primitive a unique z position, the renderer can freely reorder
+  opaque primitives without any regard for their location on screen
+  and which other elements they overlap with. By looking at each
+  primitive's material state, the renderer will create opaque
+  batches. From the Qt Quick core item set, this includes Rectangle items
+  with opaque colors and fully opaque images, such as JPEGs or BMPs.
+
+  Another benefit of using opaque primitives is that they do not
+  require \c GL_BLEND to be enabled, which can be quite costly,
+  especially on mobile and embedded GPUs.
+
+  Opaque primitives are rendered in a front-to-back manner with
+  \c glDepthMask and \c GL_DEPTH_TEST enabled. On GPUs that internally do
+  early-z checks, this means that the fragment shader does not need to
+  run for pixels or blocks of pixels that are obscured. Beware that
+  the renderer still needs to take these nodes into account and the
+  vertex shader is still run for every vertex in these primitives, so
+  if the application knows that something is fully obscured, the best
+  thing to do is to explicitly hide it using Item::visible or
+  Item::opacity.
+
+  \note Item::z is used to control an Item's stacking order
+  relative to its siblings. It has no direct relation to the renderer and
+  OpenGL's Z-buffer.
+
+  \section2 Alpha Blended Primitives
+
+  Once opaque primitives have been drawn, the renderer will disable
+  \c glDepthMask, enable \c GL_BLEND and render all alpha blended primitives
+  in a back-to-front manner.
+
+  Batching of alpha blended primitives requires a bit more effort in
+  the renderer as elements that are overlapping need to be rendered in
+  the correct order for alpha blending to look correct. Relying on the
+  Z-buffer alone is not enough. The renderer does a pass over all
+  alpha blended primitives and will look at their bounding rect in
+  addition to their material state to figure out which elements can be
+  batched and which cannot.
+
+  \image visualcanvas_overlap.png
+
+  In the left-most case, the blue backgrounds can be drawn in one call
+  and the two text elements in another call, as the texts only overlap
+  a background which they are stacked in front of. In the right-most
+  case, the background of "Item 4" overlaps the text of "Item 3" so in
+  this case, each of the backgrounds and texts needs to be drawn using
+  separate calls.
+
+  Z-wise, the alpha primitives are interleaved with the opaque nodes
+  and may trigger early-z when available, but again, setting
+  Item::visible to false is always faster.
+
+  \section2 Mixing with 3D Primitives
+
+  The scene graph can support pseudo 3D and proper 3D primitives. For
+  instance, one can implement a "page curl" effect using a
+  ShaderEffect or implement a bumpmapped torus using QSGGeometry and a
+  custom material. While doing so, one needs to take into account that
+  the default renderer already makes use of the depth buffer.
+
+  The renderer modifies the vertex shader returned from
+  QSGMaterialShader::vertexShader() and compresses the z values of the
+  vertex after the model-view and projection matrices have been applied
+  and then adds a small translation on z to place it at the
+  correct z position.
+
+  The compression assumes that the z values are in the range of 0 to
+  1.
+
+  \section2 Texture Atlas
+
+  The active texture is a unique OpenGL state, which means that
+  multiple primitives using different OpenGL textures cannot be
+  batched. The Qt Quick scene graph for this reason allows multiple
+  QSGTexture instances to be allocated as smaller sub-regions of a
+  larger texture; a texture atlas.
+
+  The biggest benefit of texture atlases is that multiple QSGTexture
+  instances now refer to the same OpenGL texture instance. This makes
+  it possible to batch textured draw calls as well, such as Image
+  items, BorderImage items, ShaderEffect items and also C++ types such
+  as QSGSimpleTextureNode and custom QSGGeometryNodes using textures.
+
+  \note Large textures do not go into the texture atlas.
+
+  Atlas based textures are created by passing
+  QQuickWindow::TextureCanUseAtlas to
+  QQuickWindow::createTextureFromImage().
+
+  \note Atlas based textures do not have texture coordinates ranging
+  from 0 to 1. Use QSGTexture::normalizedTextureSubRect() to get the
+  atlas texture coordinates.
+
+  The scene graph uses heuristics to figure out how large the atlas
+  should be and what the size threshold for being entered into the
+  atlas is. If different values are needed, it is possible to override
+  them using the environment variables \c {QSG_ATLAS_WIDTH=[width]},
+  \c {QSG_ATLAS_HEIGHT=[height]} and \c
+  {QSG_ATLAS_SIZE_LIMIT=[size]}. Changing these values will mostly be
+  interesting for platform vendors.
+
+  \section1 Batch Roots
+
+  In addition to merging compatible primitives into batches, the
+  default renderer also tries to minimize the amount of data that
+  needs to be sent to the GPU for every frame. The default renderer
+  identifies subtrees which belong together and tries to put these
+  into separate batches. Once batches are identified, they are merged,
+  uploaded and stored in GPU memory, using Vertex Buffer Objects.
+
+  \section2 Transform Nodes
+
+  Each Qt Quick Item inserts a QSGTransformNode into the scene graph
+  tree to manage its x, y, scale or rotation. Child items will be
+  populated under this transform node.  The default renderer tracks
+  the state of transform nodes between frames, and will look at
+  subtrees to decide if a transform node is a good candidate to become
+  a root for a set of batches. A transform node which changes between
+  frames and which has a fairly complex subtree can become a batch
+  root.
+
+  QSGGeometryNodes in the subtree of a batch root are pre-transformed
+  relative to the root on the CPU. They are then uploaded and retained
+  on the GPU. When the transform changes, the renderer only needs to
+  update the matrix of the root, not each individual item, making list
+  and grid scrolling very fast. For successive frames, as long as
+  nodes are not being added or removed, rendering the list is
+  effectively free. When new content enters the subtree, the batch
+  that gets it is rebuilt, but this is still relatively fast. There are
+  usually several unchanging frames for every frame with added or
+  removed nodes when panning through a grid or list.
+
+  Another benefit of identifying transform nodes as batch roots is
+  that it allows the renderer to retain the parts of the tree that have
+  not changed. For instance, say a UI consists of a list and a button
+  row. When the list is being scrolled and delegates are being added
+  and removed, the rest of the UI, the button row, is unchanged and
+  can be drawn using the geometry already stored on the GPU.
+
+  The node and vertex threshold for a transform node to become a batch
+  root can be overridden using the environment variables \c
+  {QSG_RENDERER_BATCH_NODE_THRESHOLD=[count]} and \c
+  {QSG_RENDERER_BATCH_VERTEX_THRESHOLD=[count]}. Overriding these values
+  is mostly useful for platform vendors.
+
+  \note Beneath a batch root, one batch is created for each unique
+  set of material state and geometry type.
+
+  \section2 Clipping
+
+  Setting Item::clip to true creates a QSGClipNode with a
+  rectangle in its geometry. The default renderer will apply this clip
+  by using scissoring in OpenGL. If the item is rotated by a
+  non-90-degree angle, OpenGL's stencil buffer is used. Qt Quick
+  Item only supports setting a rectangle as clip through QML, but the
+  scene graph API and the default renderer can use any shape for
+  clipping.
+
+  When applying a clip to a subtree, that subtree needs to be rendered
+  with a unique OpenGL state. This means that when Item::clip is true,
+  batching of that item is limited to its children. When there are
+  many children, like a ListView or GridView, or complex children,
+  like a TextArea, this is fine. One should, however, use clip on
+  smaller items with caution as it prevents batching. This includes
+  button labels, text fields, list delegates and table cells.
+
+  \section2 Vertex Buffers
+
+  Each batch uses a vertex buffer object (VBO) to store its data on
+  the GPU. This vertex buffer is retained between frames and updated
+  when the part of the scene graph that it represents changes.
+
+  By default, the renderer will upload data into the VBO using
+  \c GL_STATIC_DRAW. It is possible to select a different upload strategy
+  by setting the environment variable \c
+  {QSG_RENDERER_BUFFER_STRATEGY=[strategy]}. Valid values are \c
+  stream and \c dynamic. Changing this value is mostly useful for
+  platform vendors.
+
+  \section1 Performance
+
+  As stated in the beginning, understanding the finer details of the
+  renderer is not required to get good performance. It is written to
+  optimize for common use cases and will perform quite well under
+  almost any circumstance.
+
+  \list
+
+  \li Good performance comes from effective batching, with as little
+  as possible of the geometry being uploaded again and again. By
+  setting the environment variable \c {QSG_RENDERER_DEBUG=render}, the
+  renderer will output statistics on how well the batching goes, how
+  many batches are used, which batches are retained and which are
+  opaque and which are not. When striving for optimal performance,
+  uploads should happen only when really needed, batches should be
+  fewer than 10 and at least 3-4 of them should be opaque.
+
+  \li The default renderer does not do any CPU-side viewport clipping
+  nor occlusion detection. If something is not supposed to be visible,
+  it should not be shown. Use \c {Item::visible: false} for items that
+  should not be drawn. The primary reason for not adding such logic is
+  that it adds additional cost which would also hurt applications that
+  take care to behave well.
+
+  \li Make sure the texture atlas is used. The Image and BorderImage
+  items will use it unless the image is too large. For textures
+  created in C++, pass QQuickWindow::TextureCanUseAtlas when
+  calling QQuickWindow::createTextureFromImage().
+  By setting the environment variable \c {QSG_ATLAS_OVERLAY}, all atlas
+  textures will be colorized so they are easily identifiable in the
+  application.
+
+  \li Use opaque primitives where possible. Opaque primitives are
+  faster to process in the renderer and faster to draw on the GPU. For
+  instance, PNG files will often have an alpha channel, even though
+  each pixel is fully opaque. JPG files are always opaque. When
+  providing images through a QQuickImageProvider or creating images with
+  QQuickWindow::createTextureFromImage(), let the image have
+  QImage::Format_RGB32, when possible.
+
+  \li Be aware that overlapping compound items, like in the
+  illustration above, cannot be batched.
+
+  \li Clipping breaks batching. Never use it on a per-item basis, inside
+  table cells, item delegates or similar. Instead of clipping text,
+  use eliding. Instead of clipping an image, create a
+  QQuickImageProvider that returns a cropped image.
+
+  \li Batching only works for 16-bit indices. All built-in items use
+  16-bit indices, but custom geometry is free to also use 32-bit
+  indices.
+
+  \li Some material flags prevent batching, the most limiting one
+  being QSGMaterial::RequiresFullMatrix which prevents all batching.
+
+  \endlist
+
+  If an application performs poorly, make sure that rendering is
+  actually the bottleneck. Use a profiler! The environment variable \c
+  {QSG_RENDER_TIMING=1} will output a number of timing
+  parameters which can be useful in pinpointing where a problem lies.
+
+ */
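
The overlap rule described under "Alpha Blended Primitives" in the documentation above can be illustrated with a small standalone sketch. It is not the renderer's actual implementation and uses no Qt API: the Rect and Primitive types, the integer material id and the batchAlpha() function are hypothetical names introduced only for this example. The assumption is that primitives arrive sorted back-to-front and that a later primitive may join an earlier batch only if it shares the material state and does not overlap any primitive that was skipped in between.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for the renderer's internal data; not Qt API.
struct Rect {
    float x, y, w, h;
    bool intersects(const Rect &o) const {
        return x < o.x + o.w && o.x < x + w && y < o.y + o.h && o.y < y + h;
    }
};

struct Primitive {
    int material; // equal ids are assumed to mean identical OpenGL state
    Rect bounds;  // bounding rect in scene coordinates
};

// Input must be sorted back-to-front. Returns batches of indices into prims,
// in the order the batches would be drawn.
std::vector<std::vector<std::size_t>> batchAlpha(const std::vector<Primitive> &prims)
{
    std::vector<std::vector<std::size_t>> batches;
    std::vector<bool> batched(prims.size(), false);
    std::size_t assigned = 0;

    for (std::size_t i = 0; i < prims.size(); ++i) {
        if (batched[i])
            continue;
        std::vector<std::size_t> batch{i};
        batched[i] = true;
        std::vector<Rect> skipped; // bounds of primitives left for a later batch

        for (std::size_t j = i + 1; j < prims.size(); ++j) {
            if (batched[j])
                continue;
            // Pulling j forward would draw it before every skipped primitive,
            // so it must not overlap any of them.
            bool overlapsSkipped = false;
            for (const Rect &r : skipped)
                overlapsSkipped = overlapsSkipped || r.intersects(prims[j].bounds);

            if (prims[j].material == prims[i].material && !overlapsSkipped) {
                batch.push_back(j);
                batched[j] = true;
            } else {
                skipped.push_back(prims[j].bounds);
            }
        }
        assigned += batch.size();
        batches.push_back(batch);
    }
    assert(assigned == prims.size()); // every primitive ends up in exactly one batch
    return batches;
}
```

With two non-overlapping cells, each a background followed by a text, this sketch yields two batches (backgrounds, then texts). If the second background overlaps the first text, it yields four, matching the two cases in the illustration.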