Big data case study or use case example

I have read a lot of blogs and articles on how different industries are using big data analytics, but most of these articles fail to mention:

  1. What kind of data these companies used, and how large the data was.
  2. What tools and technologies they used to process the data.
  3. What problem they were facing, and how the insight they got from the data helped them resolve it.
  4. How they selected the tools and technologies to suit their needs.
  5. What kinds of patterns they were looking for in the data, and what patterns they actually identified.

I wonder if someone can provide answers to all of these questions, or a link that answers at least some of them. I am looking for real-world examples.

It would be great if someone could share how the finance industry is making use of big data analytics.

Topic usecase bigdata data-mining

Category Data Science


  • Kaggle has a short summary of applications

  • Revolution Analytics published many general case studies, datasheets, and white papers

  • For applications in sciences and engineering, you can consult Nutonian case studies

  • Analyx told potential clients about applications in commerce

  • The Financial Times published a collection of stories about business applications of big data

  • McKinsey outlined applications back in 2011

Other consulting firms made similar reports.

Gartner created a Hype Cycle for Big Data:

[Image: Gartner Hype Cycle for Big Data]

Not to mention the case studies and white papers by other companies that want to promote their products.


Take a look at the free O'Reilly data reports. You can find reports on Banking and Fintech, Sports, Fashion, Music, Health, Oil and Gas, and so on.

Keep in mind that the McKinsey report mentioned earlier is a classic and a must-read.


Financial services is a big user of big data, and an innovator too. One example is mortgage bond trading. To answer your questions for it:

What kind of data did these companies use? What was the size of the data?

  • Long histories of each mortgage issued for the past many years, and payments by month against them. (Billions of rows)
  • Long credit histories. (Billions of rows)
  • Home price indices. (Not as big)

What kind of tools and technologies did they use to process the data?

It varies. Some use in-house solutions built on databases like Netezza or Teradata. Others access the data via systems provided by the data providers (CoreLogic, Experian, etc.). Some banks use columnar database technologies like KDB or 1010data.

What was the problem they were facing, and how did the insight they got from the data help them resolve it?

The key issue is determining when mortgage bonds (mortgage-backed securities) will prepay or default. This is especially important for bonds that lack the government guarantee. By digging into payment histories and credit files, and by understanding the current value of the house, it's possible to predict the likelihood of a default. Adding an interest rate model and a prepayment model also helps predict the likelihood of a prepayment.
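To make the inputs a bit more concrete, here is a minimal Python sketch of how payment history and a home price index might be turned into per-loan features for a default model. Everything in it (the `Loan` fields, the feature names, the numbers) is a hypothetical illustration, not what any particular bank actually does.

```python
from dataclasses import dataclass


@dataclass
class Loan:
    # All fields are hypothetical; real loan tapes carry far more detail.
    balance: float               # current unpaid balance
    original_home_value: float   # appraised value at origination
    hpi_at_origination: float    # home price index when the loan was issued
    hpi_now: float               # home price index today
    missed_payments_12m: int     # missed payments in the last 12 months


def current_ltv(loan: Loan) -> float:
    """Loan-to-value ratio using an index-adjusted estimate of the home value."""
    current_value = loan.original_home_value * loan.hpi_now / loan.hpi_at_origination
    return loan.balance / current_value


def default_features(loan: Loan) -> dict:
    """Features a default model could consume (illustrative only)."""
    ltv = current_ltv(loan)
    return {
        "current_ltv": ltv,
        "underwater": ltv > 1.0,  # borrower owes more than the home is worth
        "missed_payments_12m": loan.missed_payments_12m,
    }


loan = Loan(
    balance=180_000,
    original_home_value=250_000,
    hpi_at_origination=200.0,
    hpi_now=160.0,
    missed_payments_12m=2,
)
features = default_features(loan)
print(features)
```

A real model would feed features like these, along with credit file data, into a statistical or machine learning model fitted on the historical loan records.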

How did they select the tools and technologies to suit their needs?

If the project is driven by internal IT, it's usually based on a product from a large database vendor like Oracle, Teradata or Netezza. If it's driven by the quants, then they are more likely to go straight to the data vendor, or to a third-party "all in" system.

What kinds of patterns did they identify in the data, and what kinds of patterns were they looking for?

Linking the data gives great insight into who is likely to default on their loans and who is likely to prepay them. When you aggregate the loans into bonds, it can be the difference between a bond issued at $100,000,000 being worth that amount, or as little as $20,000,000.
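As a back-of-the-envelope illustration of why loan-level default estimates matter so much once loans are pooled, here is a small Python sketch. It assumes a deliberately simplified valuation (expected value = balance × (1 − default probability × loss severity), with no discounting, interest or prepayment), and all of the numbers are made up.

```python
def expected_pool_value(loans, loss_severity):
    """Expected recovered principal for a pool of loans.

    loans: iterable of (balance, default_probability) pairs.
    loss_severity: assumed fraction of the balance lost when a loan defaults.
    Simplification: ignores interest, prepayment and discounting.
    """
    return sum(
        balance * (1.0 - p_default * loss_severity)
        for balance, p_default in loans
    )


# Two hypothetical pools, each with $100,000,000 of face value.
healthy_pool = [(100_000.0, 0.01)] * 1_000
stressed_pool = [(100_000.0, 0.90)] * 1_000

healthy_value = expected_pool_value(healthy_pool, loss_severity=0.40)
stressed_value = expected_pool_value(stressed_pool, loss_severity=0.85)

print(f"healthy pool:  ${healthy_value:,.0f}")
print(f"stressed pool: ${stressed_value:,.0f}")
```

With these made-up inputs the same face value comes out at about $99.6 million for the healthy pool and about $23.5 million for the stressed one, which is the kind of spread described above.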


News outlets tend to use "Big Data" pretty loosely. Vendors usually provide case studies surrounding their specific products. There aren't a lot out there for open source implementations, but they do get mentioned. For instance, Apache isn't going to spend a lot of time building a case study on Hadoop, but vendors like Cloudera and Hortonworks probably will.

Here's an example case study from Cloudera in the finance sector.

Quoting the study:

One major global financial services conglomerate uses Cloudera and Datameer to help identify rogue trading activity. Teams within the firm’s asset management group are performing ad hoc analysis on daily feeds of price, position, and order information. Having ad hoc analysis to all of the detailed data allows the group to detect anomalies across certain asset classes and identify suspicious behavior. Users previously relied solely on desktop spreadsheet tools. Now, with Datameer and Cloudera, users have a powerful platform that allows them to sift through more data more quickly and avert potential losses before they begin.


A leading retail bank is using Cloudera and Datameer to validate data accuracy and quality as required by the Dodd-Frank Act and other regulations. Integrating loan and branch data as well as wealth management data, the bank’s data quality initiative is responsible for ensuring that every record is accurate. The process includes subjecting the data to over 50 data sanity and quality checks. The results of those checks are trended over time to ensure that the tolerances for data corruption and data domains aren’t changing adversely and that the risk profiles being reported to investors and regulatory agencies are prudent and in compliance with regulatory requirements. The results are reported through a data quality dashboard to the Chief Risk Officer and Chief Financial Officer, who are ultimately responsible for ensuring the accuracy of regulatory compliance reporting as well as earnings forecasts to investors.
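The study doesn't say what the checks look like, but the general idea of running sanity checks and trending their results can be sketched in a few lines of Python. The field names, the three checks and the sample records below are all hypothetical; this is plain in-memory Python, not Cloudera or Datameer code.

```python
# Hypothetical sample records; field names are illustrative, not from the study.
records = [
    {"loan_id": "A1", "balance": 1200.0, "branch": "N01"},
    {"loan_id": "A2", "balance": -50.0, "branch": "N01"},
    {"loan_id": None, "balance": 300.0, "branch": ""},
    {"loan_id": "A4", "balance": 75.5, "branch": "S02"},
]

# Each check returns True when a record passes.
CHECKS = {
    "loan_id_present": lambda row: bool(row.get("loan_id")),
    "balance_non_negative": lambda row: (
        isinstance(row.get("balance"), (int, float)) and row["balance"] >= 0
    ),
    "branch_present": lambda row: bool(row.get("branch")),
}


def failure_rates(rows):
    """Fraction of rows failing each check."""
    if not rows:
        return {name: 0.0 for name in CHECKS}
    return {
        name: sum(1 for row in rows if not check(row)) / len(rows)
        for name, check in CHECKS.items()
    }


def worsened(previous, current):
    """Names of checks whose failure rate rose since the previous run."""
    return sorted(
        name for name in current if current[name] > previous.get(name, 0.0)
    )


# Made-up results from an earlier run, to show the trending step.
yesterday = {"loan_id_present": 0.25, "balance_non_negative": 0.0, "branch_present": 0.25}
today = failure_rates(records)

print(today)
print(worsened(yesterday, today))
```

In a setup like the one described, the per-run failure rates would be stored and charted so that a dashboard can show whether data quality is drifting.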

I didn't see any other finance-related studies at Cloudera, but I didn't search very hard. You can have a look at their library here.

Also, Hortonworks has a case study on Trading Strategies where they saw a 20% decrease in the time it took to develop a strategy by leveraging K-means, Hadoop, and R.

[Figure: each color indicates a group of strategies with a similar probability of profit and loss]

[Figure: how the trading system was improved by using Hadoop (Hortonworks Data Platform) and the k-means algorithm]
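For anyone unfamiliar with k-means, here is a minimal pure-Python sketch of the idea of grouping strategies with similar profit and loss probabilities. The case study used Hadoop and R; this toy version runs in memory, uses made-up data, and picks the first `k` points as starting centroids purely to keep the result deterministic.

```python
import math


def kmeans(points, k, iterations=20):
    """Plain k-means on tuples of floats.

    Uses the first k points as initial centroids (an assumption made here
    for reproducibility; real implementations usually randomize this).
    """
    centroids = list(points[:k])
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append(
                    tuple(sum(dim) / len(members) for dim in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster in place
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return labels, centroids


# Hypothetical strategies as (probability of profit, probability of loss).
strategies = [
    (0.62, 0.30), (0.35, 0.60), (0.60, 0.33),
    (0.33, 0.62), (0.64, 0.28), (0.30, 0.65),
]

labels, centroids = kmeans(strategies, k=2)
print(labels)
print(centroids)
```

Each label corresponds to one of the "colors" in the figure: strategies sharing a label have similar profit and loss profiles.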

These don't answer all of your questions, but I'm pretty sure both of these studies cover most of them. I don't see anything about tool selection specifically. I imagine sales reps had a lot to do with getting the overall product in the door, but the data scientists themselves leveraged the tools they were most comfortable with. I don't have a lot of insight into that area in the big data space.
