Are there decisive leaders in programming with tabular data?

What are the most effective bread-and-butter in-memory open source tabular data frameworks today? I have been working with tabular data for years with an in-house solution that integrates with Excel well, but falls short of many other expectations. I would like to (if possible/true) demonstrate that our solution has fallen behind the times.

In other words, assuming an SQL-like platform is responsible for persistence of a data set, but cycle-intensive calculations need to be performed on that data set (e.g. stochastic simulation processes), what is an efficient framework to program in? Advantageous features, to give an idea:

  • efficient use of memory for common operations like random access, sorting, map/reduce/filter.
  • performant when serializing and deserializing
  • offers good expressiveness or extensibility
  • isn't bound by oppressive commercial licensing agreements
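
To make the list above concrete, here is roughly what those operations look like in pandas. This is only a minimal sketch, not a benchmark: the column names (`path_id`, `scenario`, `shock`) and the random data are made up for illustration, and the standard `pickle` module stands in for whatever serialization format would actually be used.

```python
import pickle

import numpy as np
import pandas as pd

# Hypothetical data set: 100,000 simulated rows with made-up column names.
rng = np.random.default_rng(seed=0)
n = 100_000
df = pd.DataFrame({
    "path_id": np.arange(n),
    "scenario": rng.integers(0, 10, size=n),
    "shock": rng.normal(loc=0.0, scale=1.0, size=n),
})

# Random access by label: index on path_id, then look up a single row.
indexed = df.set_index("path_id")
row = indexed.loc[4242]

# Sorting by a column.
by_shock = df.sort_values("shock")

# Filter, map, reduce: keep positive shocks, transform them,
# then aggregate per scenario.
positive = df[df["shock"] > 0]
mean_growth = (
    positive.assign(growth=np.exp(positive["shock"]))
    .groupby("scenario")["growth"]
    .mean()
)

# Serialization round trip through the standard pickle module.
payload = pickle.dumps(df)
restored = pickle.loads(payload)
```

Wrapping each step in `time.perf_counter()` calls, with data shaped like the real workload, would give a rough basis for comparing this against the in-house solution.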

The pandas DataFrame is the best product I can find in the community, but it is a mixed bag as far as performance is concerned. MATLAB comes to mind, but it is so heavily commercialized that using it for most distributed applications in a homegrown cloud application becomes a nightmare.

