d3js word cloud


To show the kinds of projects I have been involved in, in an attractive and condensed way, I decided to make a word cloud. I used the d3js library, which is well suited to interactive graphs. The community is also very active. Making a d3js visualisation requires understanding the data, the data format, and JavaScript, so each time I build one, I learn a little more about one of these.

The Word Cloud

Word clouds are definitely not an analysis tool, but they are really good communication tools! There is a word cloud layout for d3js, which I found here: Github link. The API is very versatile: there are methods for the design of the cloud (font, font size, rotation, padding) and events that report the status of the layout. I wanted a simple word cloud, regularly updated with different words. I found someone who had implemented this by loading a list of words and using a random frequency to set the size of each word.

My implementation

The base: update cloud from a list

The core of the implementation does two things: it draws the cloud, and it updates it when the list of words changes.

function makewordcloud(selector) {
    var w = 500;
    var h = 500;
    var fill = d3.scale.category20();

    // Create word cloud svg
    var svg = d3.select(selector).append("svg")
        .attr("width", w)
        .attr("height", h)
        .append("g")
        .attr("transform", "translate(250,250)");

    // Draw the word cloud
    function draw(projects) {
        // Load the text
        var cloud = svg.selectAll("g text")
                        .data(projects, function(d) {return d.text; })

        // Create a scale based on the frequency words appear (variable in the data)
        var sizeScale = d3.scale.linear()
                            .domain([0, d3.max(projects, function(d) { return d.freq} )])
                            .range([10, 95]); 

        // Enter and style each word
        cloud.enter()
            .append("text")
            .style("font-family", "Impact")
            .style("fill", function(d, i) { return fill(i); })
            .attr("text-anchor", "middle")
            .attr('font-size', 1)
            .text(function(d) { return d.text; });

        // Transitions between each drawing
        cloud.transition()
                .duration(600)
                .style("font-size", function(d) { return sizeScale(d.freq) + "px"; })
                .attr("transform", function(d) {
                    return "translate(" + [d.x, d.y] + ")rotate(" + d.rotate + ")";
                })
                .style("fill-opacity", 1);

        // Exit the words by slowly reducing
        cloud.exit()
            .transition()
                .duration(200)
                .style('fill-opacity', 1e-6)
                .attr('font-size', 1)
                .remove();
    }
    // Update the words to be shown
    return {
        update: function(frequency_list) {
           var sizeScale = d3.scale.linear()
                            .domain([0, d3.max(frequency_list, function(d) { return d.freq} )])
                            .range([10, 95]);
            // Update the title of the project
            document.getElementById('title').innerHTML = frequency_list[0].title;

            d3.layout.cloud().size([w, h])
                    .words(frequency_list)
                    .padding(5)
                    .rotate(function() { return ~~(Math.random() * 2) * 90; })
                    .font("Impact")
                    .fontSize(function(d) { return sizeScale(d.freq); })
                    .on("end",draw)
                    .start();
        }
    };
}//makewordcloud
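
To make the sizing step concrete, here is a minimal sketch of the arithmetic behind sizeScale, written in plain JavaScript without d3. The linearScale helper and the sample data are assumptions made up for illustration; they are not part of d3 or of the code above.

```javascript
// Hypothetical stand-in for d3.scale.linear().domain([d0, d1]).range([r0, r1]),
// written only to show the arithmetic.
function linearScale(domain, range) {
    var d0 = domain[0], d1 = domain[1];
    var r0 = range[0], r1 = range[1];
    return function (value) {
        // Guard against a constant domain to avoid dividing by zero
        if (d1 === d0) { return r0; }
        return r0 + (value - d0) * (r1 - r0) / (d1 - d0);
    };
}

// Sample data in the same shape as frequency_list (values are made up)
var sample = [
    { text: "data", freq: 8 },
    { text: "map", freq: 4 },
    { text: "python", freq: 1 }
];
var maxFreq = Math.max.apply(null, sample.map(function (d) { return d.freq; }));
var sizeScale = linearScale([0, maxFreq], [10, 95]);

// The most frequent word gets 95px, a word with freq 0 would get 10px
var sizes = sample.map(function (d) { return sizeScale(d.freq); });
console.log(sizes);
```

With this mapping the most frequent word is always drawn at the largest size, whatever the absolute counts are, which keeps the cloud readable for both short and long texts.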

Cleaning words

The list fed into the word cloud contained the entire text I wrote for each project. I didn’t want to pick the words by hand, and I wanted it to be somewhat dynamic, so I looked for a stopword function. I have used stopword removal with NLTK, the natural language processing library for Python, but I didn’t want to do too much pre-processing.

I found a great JavaScript function by GeekLand that did just that, and I used it to “clean” the list of projects I had. The function is applied to a string from which I had already removed the punctuation; otherwise “blah.” would have been counted as a different word from “blah”.

    var cleanwords = [];
    for(var j=0; j < projects.length; j++) {
        cleanwords[j] = removeStopWords(projects[j].replace(/[!\.,:;\?\']/g, ''));
    }

So the string is cleaned by the JavaScript function as it gets loaded in the browser. This is probably not the best option for performance when there is a lot of text, but I wanted it to be easy to update, and I didn’t want a lot of text anyway.
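
For readers who don't have the GeekLand function at hand, the cleaning step can be sketched roughly as follows. This is not the GeekLand implementation: cleanText and the very short STOP_WORDS list are hypothetical stand-ins, and a real stop-word list is much longer.

```javascript
// Hypothetical, deliberately short stop-word list (assumption for illustration)
var STOP_WORDS = ["a", "an", "and", "the", "of", "to", "in", "is", "it", "i", "for", "with"];

// Strip the same punctuation as in the snippet above, then drop stop words.
function cleanText(text) {
    return text
        .replace(/[!\.,:;\?\']/g, "")
        .split(/\s+/)
        .filter(function (word) {
            // Compare in lower case so "The" and "the" are both removed
            return word.length > 0 && STOP_WORDS.indexOf(word.toLowerCase()) === -1;
        })
        .join(" ");
}

var cleaned = cleanText("The map is a visualisation of the data, and the data is open.");
console.log(cleaned);
```

Removing the punctuation first matters: without it, "data," and "data" would be counted as two different words later on.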

Adding frequency

A word cloud doesn’t make much sense if the sizes of the words are random. I wrote a function to count the frequency of each word and to select the 20 most frequent (actually, I sorted the frequency list by decreasing frequency and took the first 20 words).

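function getwords(i) {
    var wordSize = 50;
    var list = cleanwords[i].split(' ');
    var result = {};
    var j;

    // Count how many times each word appears
    for (j = 0; j < list.length; ++j) {
        if (!result[list[j]])
            result[list[j]] = 0;
        ++result[list[j]];
    }

    // One entry per distinct word; the title is the first word of the text
    var newList = _.uniq(list);
    var frequency_list = [];
    for (j = 0; j < newList.length; j++) {
        frequency_list.push({ text: newList[j], freq: result[newList[j]], title: list[0] });
    }

    // Sort by decreasing frequency
    frequency_list.sort(function(a, b) { return b.freq - a.freq; });

    // Lower the size factor if any word would get too large
    // (wordSize is not used further in this function)
    for (j in frequency_list) {
        if (frequency_list[j].freq * wordSize > 160)
            wordSize = 3;
    }

    // Note: slice(1, 20) skips the first entry and keeps the next 19;
    // slice(0, 20) would keep the 20 most frequent words
    frequency_list = frequency_list.slice(1, 20);

    return frequency_list;
}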
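
The counting and selection can also be written without underscore's _.uniq. The sketch below is one possible variant, not the code used on the page: topWords is a hypothetical name, and it leaves out the title field and the wordSize adjustment.

```javascript
// Count occurrences of each word and return the `limit` most frequent,
// as { text, freq } objects sorted by decreasing frequency.
function topWords(text, limit) {
    // Object.create(null) avoids clashes with inherited keys such as "constructor"
    var counts = Object.create(null);
    text.split(/\s+/).forEach(function (word) {
        if (word.length === 0) { return; }
        counts[word] = (counts[word] || 0) + 1;
    });
    return Object.keys(counts)
        .map(function (word) { return { text: word, freq: counts[word] }; })
        .sort(function (a, b) { return b.freq - a.freq; })
        .slice(0, limit);
}

var top = topWords("data map data cloud map data", 2);
console.log(top);
```

The keys of the counts object are already the distinct words, so no separate de-duplication step is needed.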

Link

You can find my implementation on bl.ocks.org: