Data visualization of frequencies of state transitions (possibly in R?)

I am working with experimental data in which each data point is in one of three states: A, B or C. I observe the data at 5 time points, and I can see points move from A to B, from B to C, and so on. I see such transitions for a number of independent data points, and I have the frequencies aggregated over all of them.

For example, I have:

$$
\begin{array}{c|ccc}
\text{Period} & A & B & C \\
\hline
1 & 4 & 4 & 2 \\
2 & 1 & 2 & 7 \\
3 & 0 & 1 & 9 \\
4 & 10 & 0 & 0 \\
5 & 8 & 1 & 1
\end{array}
$$

I do know the transitions from one state to another (A->B, B->C, and so on). For example, I know that after Period 1 all the A's went to C, and that among the B's that left, one went to A and the rest went to C.

What would be the best way to visually represent these transitions over time? I was thinking there might be something better than just a transition matrix, perhaps something that looks like a Markov chain diagram but accommodates all 5 periods of transitions succinctly. I work in a statistical package called Stata, which has limited graphics capabilities. Is there something in other software packages (R, maybe?) that can help with this?

  • Sorry for the hack representation of the data matrix.

Topic stata markov-process visualization r

Category Data Science


How about a Sankey diagram, with time on the x-axis and flow width representing state transition frequency? There is a Stack Overflow discussion on implementing Sankey diagrams in R.

One possible R package is {riverplot}. Here is code showing the first transition in your data:

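library(riverplot)
# Node IDs: one per state per period (A1, B1, C1, A2, B2, C2)
nodes <- as.character(sapply(1:2, FUN = function(n) paste0(LETTERS[1:3], n)))
# Edges as a nested list: origin node -> (destination node = count)
edges <- list(A1 = list(C2 = 4), B1 = list(A2 = 1, C2 = 1, B2 = 2), C1 = list(C2 = 2))
# Place the period-1 nodes at x = 1 and the period-2 nodes at x = 2
r <- makeRiver(nodes, edges, node_xpos = c(1, 1, 1, 2, 2, 2),
               node_labels = c(A1 = "A", B1 = "B", C1 = "C", A2 = "A", B2 = "B", C2 = "C"))
plot(r)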

This produces a river plot of the flows from period 1 to period 2.
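The same idea should extend to all five periods. Here is a minimal sketch, assuming the transition counts are available as a matrix with one row per origin-destination pair and one column per starting period (the numbers are the transition counts tabulated elsewhere in this thread; the names `counts`, `from` and `to` are only illustrative). It assumes the data-frame form of the `nodes` and `edges` arguments to `makeRiver()` and drops states that are empty in a given period, so treat it as a starting point rather than a finished figure:

```r
# Transition counts: rows are origin->destination pairs, columns are the
# period in which the transition starts (values copied from this thread).
pair_names <- c("A->A", "A->B", "A->C", "B->A", "B->B", "B->C",
                "C->A", "C->B", "C->C")
counts <- matrix(
  c(0, 0, 0, 8,
    0, 0, 0, 1,
    4, 1, 0, 1,
    1, 0, 1, 0,
    2, 0, 0, 0,
    1, 1, 0, 0,
    0, 0, 9, 0,
    0, 0, 0, 0,
    2, 7, 0, 0),
  ncol = 4, byrow = TRUE,
  dimnames = list(pair_names, paste0("P", 1:4))
)

# Split "A->C" into origin "A" and destination "C".
pairs <- strsplit(pair_names, "->", fixed = TRUE)
from  <- sapply(pairs, `[`, 1)
to    <- sapply(pairs, `[`, 2)

# One edge per non-zero flow, e.g. A1 -> C2 with Value 4.
# Assumption: makeRiver() accepts edges as a data frame with N1, N2, Value.
edges <- do.call(rbind, lapply(1:4, function(p) {
  data.frame(N1 = paste0(from, p), N2 = paste0(to, p + 1),
             Value = unname(counts[, p]), stringsAsFactors = FALSE)
}))
edges <- edges[edges$Value > 0, ]

# One node per state per period, positioned by period on the x-axis.
# Assumption: makeRiver() accepts nodes as a data frame with ID, x, labels.
states <- c("A", "B", "C")
nodes <- data.frame(
  ID     = paste0(rep(states, times = 5), rep(1:5, each = 3)),
  x      = rep(1:5, each = 3),
  labels = rep(states, times = 5),
  stringsAsFactors = FALSE
)
# Keep only nodes that take part in at least one flow.
nodes <- nodes[nodes$ID %in% c(edges$N1, edges$N2), ]

# Draw only in an interactive session with riverplot installed.
if (interactive() && requireNamespace("riverplot", quietly = TRUE)) {
  r <- riverplot::makeRiver(nodes, edges)
  plot(r)
}
```

Note that the tabulated counts for the second period add up to 9 rather than 10, so one flow out of period 2 will be missing from the plot until that gap in the data is resolved.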


If you have the data in the form of a table of transition counts:

$$
\begin{array}{c|cccc}
\text{Transition} & \text{Period 1} & \text{Period 2} & \text{Period 3} & \text{Period 4} \\
\hline
A \to A & 0 & 0 & 0 & 8 \\
A \to B & 0 & 0 & 0 & 1 \\
A \to C & 4 & 1 & 0 & 1 \\
B \to A & 1 & 0 & 1 & 0 \\
B \to B & 2 & 0 & 0 & 0 \\
B \to C & 1 & 1 & 0 & 0 \\
C \to A & 0 & 0 & 9 & 0 \\
C \to B & 0 & 0 & 0 & 0 \\
C \to C & 2 & 7 & 0 & 0
\end{array}
$$

then a possible visualization is an area plot. Such a chart can be produced in Excel (use the Charts/Area button on the Insert ribbon). It captures all transitions that occurred in each period: shaded areas of different colors represent the relative frequencies of transitions by origin-destination pair.
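For anyone who would rather stay in R than Excel, here is a rough base-graphics equivalent. It is a sketch under the assumption that a 100% stacked area chart is what is wanted; the names `counts`, `shares` and `draw_area_plot` are my own, and the colors are arbitrary:

```r
# Transition counts from the table above: rows are origin->destination
# pairs, columns are the period in which the transition starts.
counts <- matrix(
  c(0, 0, 0, 8,
    0, 0, 0, 1,
    4, 1, 0, 1,
    1, 0, 1, 0,
    2, 0, 0, 0,
    1, 1, 0, 0,
    0, 0, 9, 0,
    0, 0, 0, 0,
    2, 7, 0, 0),
  ncol = 4, byrow = TRUE,
  dimnames = list(
    c("A->A", "A->B", "A->C", "B->A", "B->B", "B->C", "C->A", "C->B", "C->C"),
    paste0("P", 1:4)
  )
)

# Relative frequency of each transition type within its period
# (each column sums to 1).
shares <- prop.table(counts, margin = 2)

# Stack the bands: 'upper' is the top edge of each band, 'lower' the bottom.
upper <- apply(shares, 2, cumsum)
lower <- rbind(0, upper[-nrow(upper), , drop = FALSE])

draw_area_plot <- function() {
  cols <- rainbow(nrow(shares))
  plot(NA, xlim = c(1, 4), ylim = c(0, 1), xaxt = "n",
       xlab = "Period", ylab = "Share of transitions")
  axis(1, at = 1:4)
  # One filled polygon per transition type, traced along its upper edge
  # left to right and back along its lower edge right to left.
  for (i in seq_len(nrow(shares))) {
    polygon(c(1:4, 4:1), c(upper[i, ], rev(lower[i, ])),
            col = cols[i], border = NA)
  }
  legend("topright", legend = rownames(shares), fill = cols, cex = 0.7)
}

# Draw only when run interactively.
if (interactive()) draw_area_plot()
```

Using shares rather than raw counts keeps every period at the same total height, which matters here because the Period 2 column adds up to 9 while the others add up to 10.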


I'm not sure if this is the type of analysis you are after, but you mention that the visual side is restricted in Stata. A colleague wrote a blog post that used neo4j to read web data into a graph database and d3js to display the data graphically.

I realise you don't have web data as such, but your data could still be stored in a graph database. When I asked what types of analysis you were planning, I was asking whether you needed a qualitative or a quantitative direction, but it seems you are still working that out. The nice thing about neo4j is that you can pull the data into R and run whatever analytics you want on it.
