Thursday, March 31, 2016

Presentation and Visualization Methods

Data visualization is the presentation of data in a pictorial or graphical format. For centuries, people have depended on visual representations such as charts and maps to understand information more easily and quickly.

Because of the way the human brain processes information, people grasp the meaning of many data points faster when they are displayed in charts and graphs than when they have to pore over piles of spreadsheets or read pages and pages of reports.

Data Visualization


Visualizations help people see things that were not obvious to them before. Even when data volumes are very large, patterns can be spotted quickly and easily. Visualizations convey information in a universal manner and make it simple to share ideas with others. This matters because enormous amounts of data are generated every day, and that amount is growing exponentially every year.

Data visualization can help you:

· Identify areas that need attention or improvement
· Understand what factors influence your customers’ behavior
· Know which products to place where
· Predict sales volumes

Transportation


Big data tools and web analytics techniques help the transportation industry improve operations, reduce costs, and better serve travelers by crafting meaningful depictions from the large volume of data in underlying databases.



The figure for this section is a bar graph in which each bar represents a mode of transport and the height of each bar represents a measure. It shows the cost efficiency of various modes of transportation based on different factors.
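To make the idea of a bar per mode of transport concrete, here is a minimal sketch that draws a text bar chart using only the Python standard library. The `render_bar_chart` helper and the cost figures are assumptions made up for illustration; they are not the data behind the original graph.

```python
def render_bar_chart(values, width=40):
    """Return one text line per category, with bar length proportional to value."""
    largest = max(values.values())
    label_width = max(len(label) for label in values)
    lines = []
    for label, value in values.items():
        # Scale each bar so the largest value fills the full width.
        bar_length = round(value / largest * width)
        lines.append(f"{label.ljust(label_width)} | {'#' * bar_length} {value}")
    return lines


# Hypothetical cost per passenger-mile in US cents; illustrative values only.
cost_per_mile = {"Bus": 20, "Rail": 30, "Car": 50, "Air": 40}

chart_lines = render_bar_chart(cost_per_mile, width=20)
for line in chart_lines:
    print(line)
```

Even in this crude form, the longest and shortest bars stand out immediately, which is the point of choosing a bar graph for comparing categories.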

Healthcare

Driven by industry trends, the analysis of large sets of data, such as medication usage or hospital re-admissions, has enabled health care providers and policymakers to make smarter decisions and predict future trends. Electronic medical records and decisions by governments and companies to share data have made for smarter decision-making that can save money and provide better care.



Below is a dashboard showing the various charts and visualizations that can be used to support important decisions.

Insurance


The primary value chain of an insurance company is seemingly short and simple. The core processes are to issue policies, collect premium payments, and process claims. The organization is interested in better understanding the metrics spawned by each of these events. Users want to analyze detailed transactions relating to the formulation of policies, as well as transactions generated by claims processing. They want to measure performance over time by coverage, covered item, policyholder, and sales distribution channel characteristics.
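Measuring performance "by coverage" or "by sales channel" amounts to summing a measure over one or more dimensions. The sketch below shows that aggregation in plain Python. The transaction records, field names, and the `total_by` helper are all hypothetical and stand in for whatever the insurer's systems actually record.

```python
from collections import defaultdict

# Hypothetical premium transactions; every field name and value is illustrative.
transactions = [
    {"month": "2016-01", "coverage": "Auto", "channel": "Agent", "premium": 1200.0},
    {"month": "2016-01", "coverage": "Home", "channel": "Online", "premium": 800.0},
    {"month": "2016-02", "coverage": "Auto", "channel": "Online", "premium": 950.0},
    {"month": "2016-02", "coverage": "Auto", "channel": "Agent", "premium": 1100.0},
]


def total_by(rows, dimensions, measure):
    """Sum `measure` for each distinct combination of `dimensions`."""
    totals = defaultdict(float)
    for row in rows:
        key = tuple(row[name] for name in dimensions)
        totals[key] += row[measure]
    return dict(totals)


premium_by_coverage = total_by(transactions, ["coverage"], "premium")
premium_by_month_and_channel = total_by(transactions, ["month", "channel"], "premium")
print(premium_by_coverage)
print(premium_by_month_and_channel)
```

The resulting totals are the kind of numbers a dashboard would then plot, for example one bar per coverage type or one line per channel over time.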


In conclusion, it is important to know your audience and present the information in the way they will best understand. Visualization should be intuitive, simple, appealing, and interactive.

References:

http://www.usatoday.com/story/news/nation/2013/11/24/big-data-health-care/3631211/
http://valen.com/category/predictive-analytics/



Tuesday, March 1, 2016

Big Unstructured Data vs. Structured Relational Data

Let's begin by understanding the definitions of these terms and how they relate to each other.

Unstructured data: Unstructured data (or unstructured information) refers to information that either does not have a pre-defined data model or is not organized in a pre-defined manner. Unstructured information is typically text-heavy, but may contain data such as dates, numbers, and facts as well. The resulting irregularities and ambiguities make it difficult for traditional programs to process, compared with data stored in fielded form in databases or annotated (semantically tagged) in documents.

Structured data: Structured data refers to information with a high degree of organization, such that it fits seamlessly into a relational database and is readily searchable by simple, straightforward search algorithms or other search operations. Unstructured data is essentially the opposite.
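The two definitions can be illustrated with a small sketch using only the Python standard library. The `orders` table, the sample note, and the regular expressions are assumptions chosen for illustration. The structured rows are searched with a one-line query, while the free text needs pattern matching just to recover a date and an amount.

```python
import re
import sqlite3

# Structured: rows with a pre-defined schema, searchable with a simple query.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE orders (customer TEXT, order_date TEXT, amount REAL)")
connection.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [("Alice", "2016-02-10", 120.0), ("Bob", "2016-02-11", 75.5)],
)
structured_result = connection.execute(
    "SELECT customer, amount FROM orders WHERE amount > 100"
).fetchall()

# Unstructured: free text holding the same kind of facts with no schema.
note = "Alice called on 2016-02-10 about her order; she paid $120.00 and was happy."
dates = re.findall(r"\d{4}-\d{2}-\d{2}", note)
amounts = [float(value) for value in re.findall(r"\$(\d+(?:\.\d+)?)", note)]

print(structured_result)
print(dates, amounts)
```

The patterns here work only because the note happens to use one date format and one currency symbol. Real unstructured text varies far more, which is where the irregularities and ambiguities mentioned above come from.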




The major differences between unstructured and structured data are the following:

Volume of data:
Both types of data exist in large amounts, but unstructured data is far larger in volume and is gaining importance. Over the years, unstructured data has grown much faster than structured data. The increased use of social media and rich media has given rise to a huge amount of content.

Limitations of data warehousing:
1. High cost: Implementing a new technology or platform for data warehousing is costly. In the past, data storage was the major cost; it has now been replaced by integration and maintenance costs.
2. Analysis of unstructured data: Implementing technologies such as Hadoop is expensive and complex. Data warehousing systems have to constantly compare the unstructured data with the structured relational data in order to make sense of it and establish a grain. This task is time consuming and resource intensive.
Other disadvantages are:
· Major data schema transformations are required from each data source to a single schema in the data warehouse, which can represent more than 50% of the total data warehouse effort.
· Data owners lose control over their data, raising ownership (responsibility and accountability), security, and privacy issues.
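The schema transformation effort mentioned above is easier to appreciate with a small example. This is a minimal sketch, assuming two made-up source layouts (a CRM export and a billing record) that must be mapped to one warehouse layout. All field names and the `from_crm` and `from_billing` helpers are hypothetical.

```python
from datetime import datetime

# Hypothetical source records; both layouts are invented for illustration.
crm_record = {"cust_name": "Alice Smith", "signup": "02/10/2016", "spend_usd": "120.50"}
billing_record = {
    "customer": {"first": "Bob", "last": "Jones"},
    "created": "2016-02-11",
    "total_cents": 7550,
}


def from_crm(record):
    """Map the hypothetical CRM layout to the warehouse layout."""
    signup = datetime.strptime(record["signup"], "%m/%d/%Y")
    return {
        "customer_name": record["cust_name"],
        "signup_date": signup.strftime("%Y-%m-%d"),
        "total_spend": float(record["spend_usd"]),
    }


def from_billing(record):
    """Map the hypothetical billing layout to the warehouse layout."""
    name = record["customer"]
    return {
        "customer_name": f"{name['first']} {name['last']}",
        "signup_date": record["created"],
        "total_spend": record["total_cents"] / 100,
    }


warehouse_rows = [from_crm(crm_record), from_billing(billing_record)]
for row in warehouse_rows:
    print(row)
```

Each source needs its own mapping for names, date formats, and units, and every new source adds another one to write and maintain.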

Data warehousing in the long run:

According to industry experts, traditional data warehouse ETL has become too slow, too complicated, and too expensive to address the torrent of new data sources and new analytic approaches needed for decision making. The new ETL environment is already looking drastically different.

Data analytics can now move beyond the limitations imposed by the lack of structure in unstructured data and seamlessly use all forms of data together in a single context. Such a capability holds tremendous promise for the future of analytics.

More and more firms will move to faster cloud-based databases. Multi-structured formats such as XML and JSON will be supported, and data processing will be offered in the cloud.
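As a rough illustration of handling multi-structured formats together, the sketch below reads one JSON payload and one XML payload into the same record shape using the Python standard library. The payloads, field names, and helper functions are assumptions for illustration and do not reflect any particular cloud service.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical payloads carrying the same kind of record in two formats.
json_payload = '{"customer": "Alice", "amount": 120.5}'
xml_payload = "<order><customer>Bob</customer><amount>75.5</amount></order>"


def record_from_json(text):
    """Parse a JSON order into a common record shape."""
    data = json.loads(text)
    return {"customer": data["customer"], "amount": float(data["amount"])}


def record_from_xml(text):
    """Parse an XML order into the same record shape."""
    root = ET.fromstring(text)
    return {
        "customer": root.findtext("customer"),
        "amount": float(root.findtext("amount")),
    }


records = [record_from_json(json_payload), record_from_xml(xml_payload)]
total_amount = sum(record["amount"] for record in records)
print(records, total_amount)
```

Once both formats are reduced to one shape, the same analysis code can run over all of them in a single context.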

References:

http://www.whamtech.com/adv_disadv_dw.htm
http://www.sherpasoftware.com/blog/structured-and-unstructured-data-what-is-it/
http://go.cloudera.com/the-future-of-data-warehousing
http://www.edureka.co/blog/answering-the-big-question-what-is-big-data/