Andrew Trice

Real-World Rich Internet Applications

Thursday December 04, 2008

Visualizing LARGE Data Sets

Recently I heard someone bashing Flex/Flash for having slow data rendering for charts, and of course I had to disagree. Not only did I fervently disagree with that point, but I decided to write some examples to prove it. The Flash player has amazing graphical capabilities, and makes visual representations of data very easy and very powerful.

The complaint (and assumption) was that the Flash player's graphics are slow because Flex charts can't handle large data sets. By "large", I mean hundreds of thousands, or even millions of data points.

Well... the first step is not to confuse the speed of a framework component with the speed of the Flash player's graphics capabilities. This is not a bashing of the Flex framework; I love Flex and what it can do.

Let's first take a look at what the Flex framework is: It is a framework built on top of Flash Player APIs that enables rapid development of Flash-based applications. Flex is a productivity tool. Adobe has done all of the hard work in creating an application framework. The framework is there to provide generic out-of-the-box components that work for the vast majority of situations and are highly customizable, which it does exceptionally well. The Flex charting components work with a variety of data types, are completely customizable, and make data visualization very easy.

Not everything that you do has to be done through existing Flex framework components. One of the beauties of developing in Flex is that you can customize everything. That could be down to the drawing API level, or (if you are brave enough) down to the pixel manipulation level.

If you are trying to visualize VERY large data sets, chances are the default Flex charting components actually won't cut it. You need to trade generic, customizable objects for specialized visualization objects. The biggest problem with large data sets is that they take a long time to render visually. This is where specialized and optimized code goes a long way.

Let's take a look at a few examples. Keep in mind, all of the data is randomly generated on the client within just a few seconds. No data serialization is taking place -- that could incur additional processing time in large data sets.

The first one is a standard Flex scatter plot chart. When you click the "Generate Data Set" button, this example will randomly generate 5,000 data points and display them within the scatter plot. While there is no "hard" upper bound on the number of data points that can be visualized within a standard Flex chart, as that number increases, so does the amount of time taken to render the chart. I've pushed this to 10,000 data points, but things definitely start to slow down.

Click the "Generate Data Set" button to view the data visualization.

please be patient if it's running slow on your machine! there's a lot going on here

Now, let's take a look at what's happening in the code. It's actually very simple... data is being generated and set as the dataProvider of the chart instance.

<mx:Script>
  <![CDATA[
    
    private function generateData() : void
    {
      var array : Array = new Array();
      var i : int = 0;
      while ( i < 5000 )
      {
        var o : Object = {};
        o.x = 50 + Math.sin(i) * Math.random() * 75;
        o.y = 50 + Math.cos(i) * Math.random() * 50;
        array.push( o );
        i++;
      }
      chart.dataProvider = array;
    }
    
  ]]>
</mx:Script>

<mx:ApplicationControlBar dock="true">
  
  <mx:Button 
    label="Generate Data Set"
    click="generateData()" />
  
</mx:ApplicationControlBar>

<mx:PlotChart 
  id="chart"
  top="10" bottom="10" left="10" right="10" >
  
  <mx:series>
    <mx:PlotSeries yField="y" xField="x"/>
  </mx:series>
  
</mx:PlotChart>


One approach to creating faster (and more specialized) data visualizations is to do them yourself using the Flash player's drawing API. The next example shows how to use the drawing API to create a customized scatter plot. A small circle is drawn for each data point in the scatter plot. This example shows 15,000 data points. With custom charts, you have complete control over how the data is displayed, and it gets easier to customize each individual data point (for example, a different color or a different shape per point).

I've pushed this one upwards of 50,000 data points before it starts to bog down. One thing to keep in mind when using the drawing API and custom visualizations is that transparency negatively impacts rendering performance. The more transparency you use, the slower it will render, and the fewer data points you will be able to show.

Click the "Generate Data Set" button to view the data visualization.

once again, please be patient if it's running slow on your machine! there's a lot going on here

Now, let's take a look at this code. Data is being generated by the generateData() function, and data is being rendered by the renderData() function.

<mx:Script>
  <![CDATA[
    import mx.events.IndexChangedEvent;
    import flash.utils.getTimer;
    
    private var datum : Array = [];
    
    private function generateData() : void
    {
      datum = [];
      var i : int = 0;
      while ( i < 15000 )
      {
        var o : Object = {};
        o.x = .5 + Math.sin(i) * Math.random() * .5;
        o.y = .5 + Math.cos(i) * Math.random() * .5;
        datum.push( o );
        i++;
      }
      renderData();
    }
    
    private function renderData() : void
    {
      var i : int = 0;
      var w : int = chart.width;
      var h : int = chart.height;
      var g:Graphics = chart.graphics;
      
      g.clear();
      
      
      //draw the background
      g.beginFill( 0xFFFFFF, 1 );
      g.drawRect( 0,0, w,h );
      g.endFill();
      
      g.lineStyle(1, 0, .5);
      
      //draw each data point
      for each (var o : Object in datum)
      {
        g.beginFill( o.x * 0xFFFFFF, 1 );
        g.drawCircle( o.x * w, h-(o.y * h), 2 );
        g.endFill();
      }
      
      //draw the overlaying grid
      g.lineStyle(1, 0);
      g.drawRect( 0,0,w,h );
      
      var segments : int = 10;
      var interval : int = w/segments;
      
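      //vertical grid lines; <= keeps the last interior line when w divides evenly
      for ( i = interval; i <= w-interval; i += interval )
      {
        g.moveTo( i, 0 );
        g.lineTo( i, h );
      }
      
      interval = h/segments;
      
      //horizontal grid lines
      for ( i = interval; i <= h-interval; i += interval )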
      {
        g.moveTo( 0, i );
        g.lineTo( w, i );
      }
    }
    
  ]]>
</mx:Script>

<mx:ApplicationControlBar
  dock="true">
  
  <mx:Button 
    label="Generate Data Set"
    click="generateData()" />
  
</mx:ApplicationControlBar>

<mx:UIComponent 
  id="chart"
  top="10" bottom="10" left="10" right="10"
  cacheAsBitmap="true"
  mask="{ maskCanvas }" 
  resize="renderData()" />

<mx:Canvas
  id="maskCanvas"
  top="10" bottom="9" left="10" right="9"
  backgroundColor="#FFFFFF" />


Now, let's get down to business... Let's say that you have a million data points. Yes, you heard me... That is a one followed by six zeros: 1,000,000. Surely that can't be rendered in a Flex UI, can it?

Actually, yes it can.

You might have some trouble using the drawing API's moveTo, lineTo, and other methods with this much data; however, you can use an Image object and set BitmapData pixels directly. This is the absolute fastest way to render large datasets. By no coincidence, this is also the hardest way and requires a lot of work (and potentially a lot of manual calculation). The drawing API makes the complex task of drawing shapes (with anti-aliasing) easy, but its methods are also processing intensive by comparison to simply setting pixel values.

The next example shows a scatter-plot visualization that displays data for 1 million data points. It generates the data, and sets individual pixel values based on the x,y position of the data points. I've actually pushed this example to render 5 million (5,000,000) data points, but there are obvious performance issues with this much data.

Click the "Generate Data Set" button to view the data visualization.

again, please be patient if it's running slow on your machine! that's a million data points!

<mx:Script>
<![CDATA[
  import flash.utils.getTimer;

  private var datum : Array = [];

  private function generateData() : void
  {
    datum = [];
    var i : int = 0;
    while ( i < 1000000 )
    {
      var o : Object = {};

      o.x = .5 + Math.sin(i) * Math.random() * .5;
      o.y = .5 + Math.cos(i) * Math.random() * .5;

      datum.push( o );
      i++;
    }
    renderData();

  }

  private function renderData() : void
  {
    var i : Number = 0;

    var w : int = chart.width;
    var h : int = chart.height;

    //create a new bitmapdata object
    var bd : BitmapData = new BitmapData( w, h, false, 0xFFFFFF );


    //render each data point by setting a pixel
    for each (var o : Object in datum)
    {
      bd.setPixel( o.x * w, h-(o.y * h), o.y * 0xFFFFFF ); 
    }


    //render the grid overlay
    var segments : int = 10;
    var interval : Number = w/segments;

    //vertical grid lines: each one runs down a column, so j covers the height
    for ( i = 0; i<=w; i += interval )
    {
      for ( var j : int = 0; j < h; j ++ )
      {
        bd.setPixel( i, j, 0 );
      }
    }

    interval = h/segments;
    for ( i = 0; i<=h; i += interval )
    {
      for ( j = 0; j < w; j ++ )
      {
        bd.setPixel( j, i, 0 );
      }
    }

    //set the image source to the manipulated bitmap
    chart.source = new Bitmap( bd );
  }

]]>
</mx:Script>

<mx:ApplicationControlBar
dock="true">

<mx:Button 
  label="Generate Data Set"
  click="generateData()" />

</mx:ApplicationControlBar>

<mx:Image 
id="chart"
top="10" bottom="10" left="10" right="10"
cacheAsBitmap="true" 
resize="renderData()" />


This is just a simple example for pixel manipulation, which is only intended to demonstrate the technique. The data that is rendered is only between the values 0 and 1, so it is easy to calculate each data point's X,Y placement. "Real" data will require additional logic to determine the value range of the scatter plot, the X,Y position of each data point, and any kind of labels or axes. I'm also not accounting for multiple points that may overlap on the same pixel, and you will also notice that the chart is not interactive.
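As a rough sketch of that additional logic (not part of the example above), the function below finds the value range in a first pass, then maps each point to a pixel in a second pass. It also darkens a pixel each time a point lands on it, which is one simple way to show overlapping points as density. The name renderScaled and the darkening step of 32 are my own assumptions for illustration, and the function assumes w and h are at least 1 and that every data object has numeric x and y properties.

```actionscript
import flash.display.BitmapData;

// Hypothetical helper, not part of the original example.
// Assumes w >= 1 and h >= 1, and that each item has numeric x and y.
private function renderScaled( data : Array, w : int, h : int ) : BitmapData
{
  var bd : BitmapData = new BitmapData( w, h, false, 0xFFFFFF );
  if ( data.length == 0 ) return bd;

  // first pass: find the value range of the data
  var minX : Number = Number.MAX_VALUE;
  var maxX : Number = -Number.MAX_VALUE;
  var minY : Number = Number.MAX_VALUE;
  var maxY : Number = -Number.MAX_VALUE;
  var o : Object;

  for each ( o in data )
  {
    if ( o.x < minX ) minX = o.x;
    if ( o.x > maxX ) maxX = o.x;
    if ( o.y < minY ) minY = o.y;
    if ( o.y > maxY ) maxY = o.y;
  }

  // guard against a zero range (all points share one value)
  var rangeX : Number = ( maxX > minX ) ? maxX - minX : 1;
  var rangeY : Number = ( maxY > minY ) ? maxY - minY : 1;

  // second pass: scale each point into pixel coordinates
  for each ( o in data )
  {
    var px : int = int( ( o.x - minX ) / rangeX * ( w - 1 ) );
    var py : int = ( h - 1 ) - int( ( o.y - minY ) / rangeY * ( h - 1 ) );

    // darken the pixel on every hit so overlapping points read as density
    var level : uint = bd.getPixel( px, py ) & 0xFF;
    level = ( level > 32 ) ? level - 32 : 0;
    bd.setPixel( px, py, ( level << 16 ) | ( level << 8 ) | level );
  }

  return bd;
}
```

Inside renderData(), this could be used as chart.source = new Bitmap( renderScaled( datum, chart.width, chart.height ) ). Reading each pixel back with getPixel() adds work per point, so expect it to be slower than the plain setPixel() loop.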

In conclusion, the Flash player can handle lots of data. Again, this is by no means a bash on the Flex charting components. Out of the box, they support complex visualization, they are extremely customizable, they are very easy to animate, they are easy to use, and they are great. However, if you are working with extremely large data sets, then you need to explore other means to visually represent your data.

Posted by andrewtrice | Dec 04 2008, 09:44:38 AM EST