Thursday, March 31, 2016

Presentation and Visualization Methods

Data visualization is the presentation of data in a pictorial or graphical format. For centuries, people have depended on visual representations such as charts and maps to understand information more easily and quickly.

Because of the way the human brain processes information, it is faster for people to grasp the meaning of many data points when they are displayed in charts and graphs rather than poring over piles of spreadsheets or reading pages and pages of reports.

Data Visualization


Visualizations help people see things that were not obvious to them before. Even when data volumes are very large, patterns can be spotted quickly and easily. Visualizations convey information in a universal manner and make it simple to share ideas with others. Enormous amounts of data are generated every day, and that volume is growing exponentially year over year.

Visualization can help you:

  • Identify areas that need attention or improvement
  • Understand what factors influence your customers’ behavior
  • Know which products to place where
  • Predict sales volumes

Transportation


Big data tools and web analytics techniques help the transportation industry improve operations, reduce costs, and better serve travelers by drawing meaningful depictions from the large volumes of data in underlying databases.



This is a bar graph in which each bar depicts a mode of transport and the height of each bar depicts a measure. The graphs show the cost efficiency of various modes of transportation based on different factors.

Healthcare

Driven by industry trends, the analysis of large sets of data, such as medication usage or hospital re-admissions, has enabled health care providers and policymakers to make smarter decisions and predict future trends. Electronic medical records and decisions by governments and companies to share data have made for smarter decision-making that can save money and provide better care.



Below is a dashboard that depicts the various charts and visualizations that can be used to make important decisions.

Insurance


The primary value chain of an insurance company is seemingly short and simple. The core processes are to issue policies, collect premium payments, and process claims. The organization is interested in better understanding the metrics spawned by each of these events. Users want to analyze detailed transactions relating to the formulation of policies, as well as transactions generated by claims processing. They want to measure performance over time by coverage, covered item, policyholder, and sales distribution channel characteristics.


In conclusion, the lesson here is that it is important to know your audience and to present information in the way they will understand best. A visualization should be intuitive, simple, appealing, and interactive.




Tuesday, March 1, 2016

Big Unstructured Data vs. Structured Relational Data

Let's begin by understanding the definitions of these terms and how they relate to each other.

Unstructured data: Unstructured data (or unstructured information) refers to information that either does not have a pre-defined data model or is not organized in a pre-defined manner. Unstructured information is typically text-heavy, but may also contain data such as dates, numbers, and facts. This results in irregularities and ambiguities that make it harder for traditional programs to interpret than data stored in fielded form in databases or annotated (semantically tagged) in documents.

Structured data: Structured data refers to information with a high degree of organization, such that it fits seamlessly into a relational database and is readily searchable with simple, straightforward search algorithms or other search operations. Unstructured data is essentially the opposite.
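
To make the contrast concrete, here is a minimal sketch in Python. The records, the free-text note, and the patterns are all made up for illustration. The structured records can be filtered directly by field name, while the same facts in free text first have to be extracted with pattern matching.

```python
import re

# Structured data: every record has the same named fields,
# so a query is a simple filter on a field. (Hypothetical records.)
orders = [
    {"order_id": 1, "customer": "Ana", "amount": 25.0, "date": "2016-02-01"},
    {"order_id": 2, "customer": "Ben", "amount": 40.0, "date": "2016-02-03"},
]
large_orders = [o for o in orders if o["amount"] > 30]

# Unstructured data: the same kind of facts buried in free text.
# There is no schema, so we have to guess at patterns to pull them out.
note = "Ben called on 2016-02-03 to say his $40.00 order arrived late."
dates = re.findall(r"\d{4}-\d{2}-\d{2}", note)
amounts = [float(a) for a in re.findall(r"\$(\d+(?:\.\d+)?)", note)]

print(large_orders)
print(dates, amounts)
```

The patterns only work for text written exactly this way. A differently formatted date or amount would be missed, which is the kind of irregularity and ambiguity described above.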




The major differences between unstructured and structured data are the following:

Volume of data:
Both types of data exist in large volumes, but unstructured data is far larger and is becoming more prevalent. Over the years, it has grown much faster than structured data. The rise of social media and rich media has produced a huge amount of content.

Limitations of data warehousing:
1. Cost is high: Implementing a new technology or platform for data warehousing is costly. In the past, there was a high cost for data storage which has now been replaced with integration and maintenance costs.
2. Analysis of unstructured data: Implementing technologies such as Hadoop is costly and complex. Data warehousing systems must constantly reconcile unstructured data with structured relational data in order to make sense of it and define a grain. This task is time-consuming and resource-intensive.
Other disadvantages are:
  • Data schemas from each of the data sources must be transformed into the single schema of the data warehouse, which can represent more than 50% of the total data warehouse effort.
  • Data owners lose control over their data, raising ownership (responsibility and accountability), security, and privacy issues.
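
The schema transformation effort mentioned above can be sketched roughly as follows. This is only an illustration: the source systems ("crm", "billing"), their column names, and the warehouse column names are hypothetical.

```python
# Hypothetical field mappings: source column name -> warehouse column name.
# Each source system names the same facts differently.
MAPPINGS = {
    "crm": {"cust_name": "customer_name", "tel": "phone"},
    "billing": {"CustomerName": "customer_name", "PhoneNumber": "phone"},
}


def to_warehouse_row(source, record):
    """Rename a source record's fields to the single warehouse schema."""
    mapping = MAPPINGS[source]
    return {target: record[src] for src, target in mapping.items()}


crm_row = to_warehouse_row("crm", {"cust_name": "Ana", "tel": "555-0100"})
billing_row = to_warehouse_row(
    "billing", {"CustomerName": "Ben", "PhoneNumber": "555-0199"}
)
print(crm_row)
print(billing_row)
```

Even in this toy case, every new source needs its own mapping. Real sources also differ in data types, units, and codes, which is why this step can take up so much of the total effort.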

Data warehousing in the long run:

As per leading experts, traditional data warehouse ETL has become too slow, too complicated, and too expensive to address the torrent of new data sources and new analytic approaches needed for decision making. The new ETL environment is already looking drastically different.

Data analytics can move beyond the limitations imposed by the lack of structure in unstructured data and can now seamlessly use all forms of data together in a single context. Such a capability holds tremendous promise for the future of analytics.

More and more firms will move to faster cloud-based databases. Multi-structured formats such as XML and JSON will be supported, and data processing will be offered in the cloud.


Monday, February 15, 2016

Dimensional Modelling helps NBTY with Performance Analysis

NBTY can trace its long history as far back as 1870 with the founding of our Holland & Barrett stores in England. Over 145 years later, the company continues to enrich the lives of consumers around the world and proudly stands as the leader in health and wellness by introducing innovative products and solutions to the marketplace. A global company committed to consistently producing the highest-quality products, we offer a wide range of brands that people love and trust across the entire value spectrum. We have a significant presence in virtually every major vitamin, mineral, herb, sports, active nutrition and supplement product category and in multiple key distribution channels. Our brands include Holland & Barrett®, Nature’s Bounty®, Sundown Naturals®, Osteo Bi-Flex®, Solgar®, MET-Rx®, Pure Protein®, Body Fortress®, Balance Bar®, Puritan’s Pride® and many others.

Today NBTY employs over 13,000 associates worldwide. In addition to our Long Island corporate headquarters, NBTY has manufacturing, packaging, warehouse, distribution and administration facilities throughout the United States and Canada. The company also maintains overseas offices in the United Kingdom, China, the Netherlands, Spain, South Africa and New Zealand. In 2010, The Carlyle Group, one of the world’s leading private investment firms, acquired the company. In September 2014, Steve Cahillane was named President and CEO. Under his leadership, the company continues to fulfill its commitment to supporting wellness by creating products that consumers want and making them easily available anywhere they shop.

Steve Cahillane, CEO of NBTY, would like to measure and analyze the company's performance so that improvements and enhancements can be made. Performance measurement for a company like NBTY could fall into any of the following domains:

  • Quality of care
  • Utilization/Cost/Efficiency
  • Satisfaction
  • Financial performance
  • Inventory
  • Retail sales

For the purpose of this blog, we will consider NBTY's retail sales process, as it is a major part of their business. Some of the facts the CEO would be interested in are sales quantity, total sales dollar amount, total discount dollar amount, and total cost dollar amount. He might want to see these metrics across stores and products. Profitability depends on the quantity sold: the higher the quantity sold, the higher the profit. The CEO might also look at the discounts given to customers to see whether sales increase during discount periods. Finally, to judge how the business is performing, one needs to analyze the total cost and total sales dollar amounts, because these two metrics determine the company's profitability. For these reasons, I feel these metrics are really important for measuring performance.

To keep track of all the metrics mentioned above, a dimensional model comes in handy. With a well-defined grain at the atomic level, these metrics can be captured accurately and then rolled up to obtain the metrics for a weekly, monthly, or yearly period.
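
The roll-up idea can be sketched in a few lines of Python. This is only an illustration under assumptions: the grain (one row per product per store per day), the field names, and the sample rows are hypothetical and not taken from NBTY's data.

```python
from collections import defaultdict
from datetime import date

# The four facts discussed above (field names are hypothetical).
MEASURES = ("quantity", "sales_amt", "discount_amt", "cost_amt")

# Hypothetical atomic-grain rows: one row per product per store per day.
SALES = [
    {"day": date(2016, 2, 1), "store": "S1", "product": "Vitamin C",
     "quantity": 3, "sales_amt": 30.0, "discount_amt": 3.0, "cost_amt": 18.0},
    {"day": date(2016, 2, 3), "store": "S1", "product": "Fish Oil",
     "quantity": 2, "sales_amt": 40.0, "discount_amt": 0.0, "cost_amt": 25.0},
    {"day": date(2016, 2, 8), "store": "S1", "product": "Vitamin C",
     "quantity": 5, "sales_amt": 50.0, "discount_amt": 5.0, "cost_amt": 30.0},
]


def roll_up_weekly(rows):
    """Aggregate atomic rows to one total per (ISO year, ISO week, store)."""
    totals = defaultdict(lambda: dict.fromkeys(MEASURES, 0.0))
    for row in rows:
        iso = row["day"].isocalendar()
        key = (iso[0], iso[1], row["store"])
        for measure in MEASURES:
            totals[key][measure] += row[measure]
    return dict(totals)


weekly = roll_up_weekly(SALES)
for key, measures in sorted(weekly.items()):
    print(key, measures)
```

Because the facts are stored at the atomic grain, the same rows could be rolled up by month or year simply by changing the key. The reverse is not possible: weekly totals cannot be broken back down into days.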

The dimensional model for NBTY should use a periodic snapshot, because it summarizes the measurements over a standard period such as a day, week, or month. I feel a periodic snapshot that captures information per week is the best option, as it would give the weekly quantity sold, discount dollar amount, total cost, and sales amount.

For NBTY, we can use this dimensional model to represent the Retail sales:



Thursday, February 4, 2016

BUSINESS INTELLIGENCE AND ANALYTICAL TOOLS

Business intelligence tools (BI tools) are a way for companies to monitor data and generate business insights – necessary components in making smarter, better decisions that drive results. But once you start researching BI, you realize there are many types, from analytics and big data statistics to reporting tools and dashboards that offer at-a-glance information across indicators.
When choosing the right business intelligence tools for your organization, consider your company, your employees, your departments and teams – and the success factors that drive your decision-making.
Select tools that allow you to visualize and analyze relevant data, combining and eliminating and customizing to generate information that helps you better understand your data. The goal: to make fact-based and insightful decisions that will improve company performance.

1.     SAP BUSINESS INTELLIGENCE
SAP BI is a full-featured tool designed to cater to a diverse set of needs, satisfying the requirements of everyone in your organization, including IT professionals, senior management, and end users. Its robust infrastructure hosts wide-ranging functionality in one integrated platform.
The platform pulls together applications and reporting to provide a detailed snapshot of your organization, and visualization makes it easy to understand your data.

2.     ORACLE BUSINESS INTELLIGENCE ENTERPRISE EDITION
For businesses with serious BI needs, OBIEE is an incredibly powerful business intelligence tool. The Enterprise Edition integrates many of Oracle’s most useful components, including BI Server, BI Answers, BI Interactive Dashboards, BI Delivers, BI Publisher, the MS Office Plug-in, Hyperion Interactive Reporting, Hyperion SQR Production Reporting, Hyperion Financial Reporting, and Hyperion Web Analysis. OBIEE provides comprehensive BI tools that inform and inspire better decisions across the organization. Interactive dashboards make reporting and data visualization simple.

3.     MICROSTRATEGY BUSINESS INTELLIGENCE
MicroStrategy Business Intelligence has one primary goal: leverage data to help organizations find timely, informed answers to any question. Powerful dashboards and data analytics transform your company’s information into easy-to-understand reports designed to improve productivity, boost cost-efficiency, optimize revenue, monitor trends, forecast new opportunities, and fortify client relationships. MicroStrategy runs against data stored in ERP systems (e.g. SAP and Oracle), operational databases, and data warehouses. This software saves data on-site or in the cloud via Amazon Web Services.

4.     QLIKVIEW
QlikView is a user-friendly platform that straddles the gap between tech-savvy BI tools and traditional productivity apps, creating a solution that’s available to all. QlikView’s primary goal is to enable business users to leverage their data to discover new solutions and opportunities, and it does so with a clean and straightforward interface. This self-service tool allows for data analytics, insights and existing data manipulation. Visually appealing dashboards present data in an easy-to-understand format.

5.     TABLEAU
Tableau’s intuitive BI software makes it easy for anyone, regardless of technological know-how, to connect with data and create visual reports. The platform is as simple to use as Excel but is very feature-rich: shareable dashboards, interactive reporting, flexible features, and scalability make for one-click access to any data you need to analyze. You can choose between the on-premises Tableau Server and the cloud-based Tableau Online. Its one-click reporting delivers answers in seconds.

For the weighted decision matrix, we will be using the following criteria:

1.     Data sourcing: The crux of any BI tool is its ability to fetch data. Data can come from text files, CSV files, databases, servers, other ERP systems, and so on. The popularity of a BI tool depends largely on the number of sources it can access and the ease of connecting those sources to the tool.
2.     Cost per user/business: Cost is an important component in selecting a business tool. Costs are of course relative in nature and vary depending on the size and complexity of the business model of the user or the company using it. Yet, we can score the tool based on the cost-to-features model that the tool proposes.
3.     Filtering and visualization: The user should be able to drill down to the lowest level of granularity of the data they are seeking. This can be achieved using selection criteria, filters, sorting, and the software's ability to interpret the requirement. After filtering and sorting, the next step is to display the data in a way the business can easily understand in order to make informed decisions. A tool that lets the user view data as graphs or charts and then drill down is intuitive.
4.     Reporting: An important requirement of any BI tool is its ability to make useful reports. Reports should be customizable based on user selected criteria and the tool should allow exporting reports in different formats. The reports should be easy to read and should provide graphical outputs along with the basic rows and columns matrix. The tool should be able to export these reports onto everyday products like Microsoft Excel, Word, PDF, etc.
5.     Deployment: The BI tool in question should be compatible with a variety of operating systems as well as enterprise level ERP systems like SAP, Oracle, PeopleSoft, etc. It should be easy for the users to integrate the tool into their current systems without having to go through a major shift in the business process.

Based on these criteria, we now use the weighted analysis model to determine the standing of each of the BI tools described above.
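
As a rough sketch of how a weighted decision matrix is computed, the Python below multiplies each criterion score by its weight and sums the results. The weights, the 1 to 5 scores, and the placeholder names "Tool A" and "Tool B" are all hypothetical. They are not the scores used in this post's matrix and say nothing about any real product.

```python
# Hypothetical weights for the five criteria above (they sum to 1).
WEIGHTS = {
    "data_sourcing": 0.25,
    "cost": 0.20,
    "filtering_visualization": 0.25,
    "reporting": 0.15,
    "deployment": 0.15,
}

# Hypothetical 1-5 scores for two placeholder tools.
SCORES = {
    "Tool A": {"data_sourcing": 4, "cost": 3, "filtering_visualization": 5,
               "reporting": 4, "deployment": 3},
    "Tool B": {"data_sourcing": 5, "cost": 2, "filtering_visualization": 3,
               "reporting": 4, "deployment": 4},
}


def weighted_score(scores, weights):
    """Sum of (criterion weight x criterion score) over all criteria."""
    return sum(weights[c] * scores[c] for c in weights)


def rank_tools(all_scores, weights):
    """Return (tool, weighted score) pairs, highest score first."""
    ranked = [(tool, round(weighted_score(s, weights), 2))
              for tool, s in all_scores.items()]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


ranking = rank_tools(SCORES, WEIGHTS)
for tool, score in ranking:
    print(tool, score)
```

The ranking is only as good as the weights. A business that cares most about cost would weight that criterion more heavily and could end up with a different winner from the same scores.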

For reference, we apply the weighted analysis model to these BI tools:

1.      SAP BI
2.      OBIEE
3.      MICROSTRATEGY
4.      QLIKVIEW
5.      TABLEAU

From the above matrix, it is evident that OBIEE ranks highest on the criteria we applied. Tableau and SAP BI follow closely; they offer similar functionality and have been used extensively in various industries across the globe.

Sources: