Friday, 7 October 2011

Buffer object streaming in OpenGL

This article presents an algorithm for asynchronous data upload to the GPU called buffer streaming. It is based on a discussion on the OpenGL forum, and more precisely on a suggestion by Rob Barris (from Blizzard, also a member of the ARB). The link to the discussion is given at the end of the article. The algorithm is useful for things such as efficient uniform data specification (using uniform buffer objects) or replacing the deprecated immediate mode for rendering. The demo I provide does the latter by rendering a Quake2 Md2 model using an OpenGL 3 (and above) Core profile context.

Many applications process data on the CPU before rendering it. In a key-framed animation, for example, the vertices of the mesh are interpolated (usually linearly) to smooth the animation. Since OpenGL 3 (Core profile), the data used for rendering has to be stored in buffer objects, so if you update your data before each new frame, you also have to transfer it into a buffer object. There has been a lot of debate on the OpenGL discussion boards about how to do this efficiently, one of the most interesting threads being this one (definitely worth reading for developers wanting to use buffer objects in OpenGL). Ideally, the transfer should not require synchronization between the CPU and the GPU. Fortunately, this is possible with the ARB_map_buffer_range extension, which is available on every OpenGL 3 compliant GPU.
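
To make the CPU-side work concrete, here is a minimal sketch of linear key-frame interpolation. The Vec3 type and the interpolateFrames function are hypothetical names chosen for this example; the demo's own Md2 code may be organized differently.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical vertex position type, for illustration only.
struct Vec3 { float x, y, z; };

// Linearly interpolate between two key frames with the same vertex count.
// t = 0 returns frame a, t = 1 returns frame b.
std::vector<Vec3> interpolateFrames(const std::vector<Vec3>& a,
                                    const std::vector<Vec3>& b,
                                    float t)
{
    assert(a.size() == b.size());
    std::vector<Vec3> out(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) {
        out[i].x = a[i].x + t * (b[i].x - a[i].x);
        out[i].y = a[i].y + t * (b[i].y - a[i].y);
        out[i].z = a[i].z + t * (b[i].z - a[i].z);
    }
    return out;
}
```

The result of such a step changes every frame, which is why it has to be uploaded to a buffer object again and again.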

Buffer object streaming algorithm in OpenGL
So we have the following scenario: data is written by the CPU to a buffer, which is then read by the GPU. In OpenGL, there are several ways to write to a buffer (glBufferData, glBufferSubData, glMapBuffer and glMapBufferRange), but only one way to do it asynchronously: calling glMapBufferRange with the unsynchronized flag (GL_MAP_UNSYNCHRONIZED_BIT), so this is what we'll be using. Since the whole process is asynchronous, we have to guarantee that we never write to a region of the buffer that is still in use by the GPU. The idea is to allocate a fixed amount of memory for the buffer object (using glBufferData with the data pointer set to NULL) and to initialize an offset variable to 0. The buffer should be larger than the data uploaded each frame, but not so large that allocating it becomes slow. A few megabytes is good (I use 8 MB in my demo).
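// configure buffer objects: allocate the storage without providing data
glBindBuffer(GL_ARRAY_BUFFER, buffers[BUFFER_VERTEX_MD2]);
glBufferData(GL_ARRAY_BUFFER, STREAM_BUFFER_CAPACITY, NULL, GL_STREAM_DRAW);
glBindBuffer(GL_ARRAY_BUFFER, 0);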
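When the data has been processed by the CPU, we write it to a mapped region of the buffer object that starts at the current offset. Once the upload is done, we increase the offset by the amount of data we added. We also have to watch for overflow: if the data we're uploading does not fit in the space remaining after the offset, we allocate a new memory block for the buffer (by calling glBufferData again with a NULL data pointer) and reset the offset variable. This process is called orphaning: the driver keeps the old block alive until the GPU has finished reading from it, so the new block can be written to right away.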
// stream variables
static GLuint streamOffset = 0;
static GLuint drawOffset   = 0;

// bind the buffer
glBindBuffer(GL_ARRAY_BUFFER, buffers[BUFFER_VERTEX_MD2]);
// orphan the buffer if full
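// size of the data for this frame (assumes three Md2::Vertex per triangle)
GLuint streamDataSize = fw::next_power_of_two(md2->TriangleCount()
                                              * 3 * sizeof(Md2::Vertex));
if(streamOffset + streamDataSize > STREAM_BUFFER_CAPACITY)
{
  // allocate new space and reset the vao
  glBufferData( GL_ARRAY_BUFFER,
                STREAM_BUFFER_CAPACITY,
                NULL,
                GL_STREAM_DRAW );
  glBindBuffer(GL_ARRAY_BUFFER, buffers[BUFFER_VERTEX_MD2]);
  glVertexAttribPointer( 0, 3, GL_FLOAT, 0, sizeof(Md2::Vertex),
                         FW_BUFFER_OFFSET(0) );
  // offsets of attributes 1 and 2 assume tightly packed floats in Md2::Vertex
  glVertexAttribPointer( 1, 3, GL_FLOAT, 0, sizeof(Md2::Vertex),
                         FW_BUFFER_OFFSET(3*sizeof(GLfloat)) );
  glVertexAttribPointer( 2, 2, GL_FLOAT, 0, sizeof(Md2::Vertex),
                         FW_BUFFER_OFFSET(6*sizeof(GLfloat)) );
  // reset offset
  streamOffset = 0;
}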

// get memory safely
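Md2::Vertex* vertices = (Md2::Vertex*)
    glMapBufferRange( GL_ARRAY_BUFFER, streamOffset, streamDataSize,
                      GL_MAP_WRITE_BIT | GL_MAP_UNSYNCHRONIZED_BIT );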
// make sure memory is mapped
if(NULL == vertices)
 throw std::runtime_error("Failed to map buffer.");

// set final data

// unmap buffer
glUnmapBuffer(GL_ARRAY_BUFFER);
glBindBuffer(GL_ARRAY_BUFFER, 0);

// compute draw offset, in vertices (exact only if streamOffset is a
// multiple of sizeof(Md2::Vertex))
drawOffset = streamOffset/sizeof(Md2::Vertex);

// increment offset
streamOffset += streamDataSize;
And there you have it: asynchronous data upload!
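
The offset bookkeeping can be isolated from the OpenGL calls. The following is a minimal sketch under that assumption; StreamCursor, Region and reserve are hypothetical names that do not appear in the demo, and the sketch only decides where to write and whether to orphan, leaving the glBufferData and glMapBufferRange calls to the caller.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper, not part of the demo: tracks the write offset of a
// streaming buffer and reports when the buffer has to be orphaned.
struct StreamCursor {
    std::size_t capacity;  // total size of the buffer storage, in bytes
    std::size_t offset;    // first byte that has not been handed out yet

    explicit StreamCursor(std::size_t cap) : capacity(cap), offset(0) {}

    struct Region {
        std::size_t offset;  // byte offset to pass to glMapBufferRange
        bool orphan;         // true: re-allocate with glBufferData(NULL) first
    };

    // Reserve `size` bytes. If they do not fit in the remaining space, the
    // caller has to orphan the buffer and the region starts at 0 again.
    Region reserve(std::size_t size) {
        assert(size <= capacity);
        bool orphan = offset + size > capacity;
        if (orphan)
            offset = 0;
        Region r = { offset, orphan };
        offset += size;
        return r;
    }
};
```

With a capacity of 100 bytes and three successive requests of 40 bytes, the first two regions start at 0 and 40, and the third one asks for an orphan and starts at 0 again. The overflow test is the same as in the excerpt above: a request that ends exactly at the capacity still fits.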

A few additional notes/guidelines
- Try to make your data size a power of two.
- If you are using your buffer object for rendering, you'll need to reset your vertex array objects after orphaning. Otherwise, you can set the first argument (or the baseVertex) of your drawing function to select the region of the buffer that was just written. See an excerpt of my demo's source code below (note how I evaluate the first parameter in glDrawArrays):
// draw (the vertex count assumes three vertices per triangle)
glDrawArrays( GL_TRIANGLES,
              drawOffset,
              md2->TriangleCount()*3 );
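
The value passed as first is a vertex index, so it is only meaningful when the byte offset into the buffer is a whole number of vertices. Rounding the chunk size up to a power of two does not guarantee this for an arbitrary vertex size. One possible way to keep the two consistent, sketched here with hypothetical helper names (roundUpToStride and firstVertex are not from the demo), is to round the chunk size up to a multiple of the vertex size instead:

```cpp
#include <cassert>
#include <cstddef>

// Round `bytes` up to the next multiple of `stride` (the size of one vertex).
std::size_t roundUpToStride(std::size_t bytes, std::size_t stride)
{
    assert(stride > 0);
    return (bytes + stride - 1) / stride * stride;
}

// Convert a byte offset in the buffer to the `first` argument of glDrawArrays.
// Only valid when the offset is a whole number of vertices.
std::size_t firstVertex(std::size_t byteOffset, std::size_t stride)
{
    assert(stride > 0 && byteOffset % stride == 0);
    return byteOffset / stride;
}
```

For example, with a 24-byte vertex, a 100-byte chunk is rounded up to 120 bytes, and a chunk that starts at byte 120 begins at vertex 5.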

Rendering a QuakeII Md2 model: I use the buffer streaming algorithm to upload the vertices of a mesh and render it in an OpenGL 4.2 Core Profile context. You can download the source archive here. A VS2010 project and a makefile are provided, so you should be able to compile under Windows and Linux (works for me on Win7 x64 and Ubuntu Lucid x64 with a Radeon 5770 and Catalyst 11.12). You'll need an OpenGL 4.2 compliant GPU to run the demo.

References / Valuable reads
- Rob Barris's post on the OpenGL forum:
- OpenGL wiki on buffer streaming:
- OpenGL wiki on buffer objects:
- Unofficial Quake Md2 model specification:


  1. Thanks a lot! It helped to actually see code based on what the wiki and related forum post talked about.

    1. Glad it helped you! Have you tried building the sources? I'm curious to know if everything works 'out of the box'...

    2. I just tried it. I'm having problems at line 531: glutCreateWindow("OpenGLBufferStreaming");

      Which is really strange...
      It just quits as if normal right after that line, no error messages or anything. Debugging doesn't give any useful info.

      Anyways, my computer only supports openGL 3.3 so I don't think I would be the best person to test anyways.

  2. So apparently you managed to compile the code, good!
    The demo will only run on OpenGL4 hardware, it's not surprising it crashes on your platform. Weird that GLUT doesn't complain, though. Thanks for your feedback anyway :)

  3. can you provide an example that works in older opengl?

  4. Nice article! Very helpful.

    The streamDataSize (next_power_of_two) may not be divisible with vertex size, which causes problems with glDrawArrays rendering from drawOffset. In my case I got the remainder .3333 and only every third buffer got rendered (not sure it was related to the size of the remainder). Luckily I triggered the streaming/mapping manually from the keyboard so it was pretty obvious when it rendered a blank screen. When I skipped calculating the next power of two it worked beautifully.

    Also, there is more about asynchronous streaming here: