Table of Contents
Contents at a Glance
Contents
About the Authors
About the Technical Reviewer
Acknowledgments
Introduction
Chapter 1: Business Intelligence for Everybody
What Is Business Intelligence?
BI Evolution
From Strategic to Tactical
Big Data
Internet of Things
BI Characteristics
Hidden Relevant Information
Accuracy
Significant KPIs
On Time
Company Analytics Life Cycle and Continuous Improvement
Benefits of BI
Direct Benefits
Indirect Benefits
Objectives
Who Can Get Benefit from BI?
BI Platform Components
Source ERP
Database
ETL
Front-End Tool
Budgeting Tool
BI Platform Location
BI Concepts
Datawarehouse
DataMart
Logical Model
Relational Model
Normalized Model
Denormalized Model
Star Model
Snowflake Model
Physical Model
BI Main Objects
Project
Tables
Facts
Dimensions
Reports
BI Approaches
Query and Reporting
Information Sharing
Dashboarding
Data Import
Data Discovery
MOLAP
Data Mining
Incoming Approaches
BI Capabilities
Drilling
Pivoting
Visualizations
Map Visualizations
Sorting
Grouping
Filtering
Conditional Expressions
Subtotals
Administration
User Interface
Conclusion
Chapter 2: Agile Methodologies for BI Projects
Introduction to Agile Methodologies
Agile Approaches
Our Recommended Mix between Scrum and Kanban
Developing Projects with Scrum
Roles
Product Owner
Scrum Master
Development Team
Stakeholders
Users
Product Owner Assistant
Sprint
Specific Scrum Meetings
Release
Artifacts Used in Scrum
User Story
Developer Story
Product Backlog
Definition of Done
When to Start - Ready Definition
Product Backlog Burndown Chart
Sprint Backlog
Sprint Backlog Burndown Chart
Impediment List
Improvement List
Limiting Spent Time with Timeboxing
Spike
Maintenance with Kanban
Kanban Concepts
Task Workflow
Task Board and Visual Management
Work In Progress
Lead Time
Mix of Both Methodologies, Scrumban
Peculiarities of Scrum for BI
Sprint 0 - Longer Initial Analysis
BI Project Segmentation
Front-End vs. Back-End Activities
Role Isolation - Specific Knowledge
Developer Story Types in BI
Data Modeling Developer Stories
ETL Developer Stories
BI Model Developer Stories
Report and Dashboarding Developer Stories
MOLAP Developer Stories
Agile Management Tools
Trello
JIRA Software
Conclusion
Chapter 3: SQL Basics
What Is SQL?
The Relational Model
Databases and Database Vendors
ACID Compliance
Types of SQL Statements
SQL Datatypes
Numeric Datatypes
Text Datatypes
Date Datatypes
Other Datatypes
Retrieving Data from a Table
The select * Statement
The select column Statement
The select count (*) or count (column) Statement
The select distinct Clause
Sorting
Filtering
The Where Clause
Operators
Logical Operators
Grouping Data
Using Subqueries
Joining Tables
Types of Joins
Cartesian Join
Inner Join
Outer Join
Set Operations
Union and Union All Operators
The Intersect Operator
The Minus Operator
Working with Dates
The SYSDATE, NOW(), CURRENT_DATE and GETDATE() Functions
Adding Days to a Date
Conditional Expressions
The Case Expression
The Decode() or IF() Expressions
Conclusion
Chapter 4: Project Initialization – Database and Source ERP Installation
The Need for Data
Setting the System Up with Odoo ERP
The Bitnami Odoo Package
Downloading and Installing Odoo
Bitnami and Odoo Configuration Files
Installing psql and Checking the Database Connection
Accessing the Application
Configuring and Installing Modules
Selecting our Datawarehouse Database
Sourcing and Installing MariaDB
Windows Installation
Linux Installation
Installation in Ubuntu
Installation of MariaDB in centos
Solving Connectivity Issues
Creating Our First Database
Show Databases and Tables Commands
Creating Users for the Two New Databases
Grant Permission to Databases
Analyzing the Data Source
Inspecting our Model
Setting up a PostgreSQL Connection to the Server
Conclusion
Chapter 5: Data Modeling for BI Solutions
Naming Convention and Nomenclature
Modeling Steps
Business Model
Logical Model
Dimensional Model
Physical Model
Data Load Strategies
Defining our Model
Sales Dimension
Status Table
Currency Dimension
Customer Dimension
Employee Dimension
Product Dimension
Time Dimension
Exploring Data Modeling Possibilities
View Layer
Materialized Views
Data Split
Fact Normalization and Denormalization
Time Transformations
Fact Phases
Real vs. Potential
Historical Hierarchies
Multiple Views Using a Table, Entity Isolation
Modifying Existing Structures, Adding More Sales Document Status
Currency Transformation
Geographical Information
Data Modeling Tools
Erwin DataModeler
MySQL Workbench
Preparing the ETL
Source Systems
Source Tables
Source Fields
Conclusion
Chapter 6: ETL Basics
Why Do We Need an ETL Process?
Details of the Solution
Open Source ETL Suites
Downloading and Installing Pentaho Data Integration
Understanding ETL Concepts
Repositories and Connections
Transformations, the Core of Kettle
Jobs or How to Organize a Set of Transformations in a Workflow
Create and Share a Connection
The Global Picture
The Product, Product Category, and Product Parent Category Tables
The Customer and Customer Country Tables
The Employee and Employee Category and Employee Department Tables
The Fact Table: How to Create the Transformation for the Sales
Creating the Time Dimension
Generate as Many Days as We Need to Store
Using the Calculator to Compute Some Other Fields
Calculating More Complex Fields and Dropping Unneeded Ones
Doing Advanced Manipulations
The formula step
The Modified JavaScript Value step
Fixing a few things
The last step
Connecting All of It Up
The Time Tables
The Product Tables
The Employee and Customer Tables
The Sales Table: t_f_sales
Designing the Job
Open Source Alternatives to PDI
Conclusion
Chapter 7: Performance Improvements
Database Optimizations
Avoid Using Dates for Join Keys (and Primary Keys Too)
Analyze the Tables in Your Database
Indexing, or How to Speed Up Queries
Standard or B+ TREE Indexes
Bitmap Indexes
FULLTEXT Indexes
Partitioning
Partitioning Types
Range Partitioning
List Partitioning
Partitioning Considerations
Using the EXPLAIN Sentence
Views and Materialized Views
HINTS
Denormalization
Disabling Triggers and Constraints
ETL Optimizations
Offloading Operations to the Database
Check the Network Links
Performance Tips with PDI
Increase the Number of Rows in the Rowset
Parallelize Transformations
Bulk Loads and Batch Updates
Ordering in Merge Operations
Conclusion
Chapter 8: The BI Reporting Interface
How to Choose the BI Tool
Best Practices in Dashboarding
Starting from Top Left
Joining Related Information
Focus on Relevant Data
Formatting Recommendations
Corporative Integration
Related Data Alignment
Conditional Formatting
Intensity vs. Color
Visibility of Data
Graphic Usages
Pie Chart – Percentage Contribution for a Few Values
Stacked Bar Graph – Percentage Comparison
Horizontal Bars – Top N Elements Comparison
Vertical Lines, Bars, or Areas – Time Evolution
Vertical Bars – Histogram
Bubble Chart – Correlation
BI Tools
Microstrategy Desktop
Microsoft Power BI
Qlik Sense
Conclusion
Chapter 9: MOLAP Tools for Budgeting
Multidimensional Databases
Dimensions or Axes for Analysis
Dimension Members
Sharing Properties across Cubes
MDX Language
MOLAP Data Interfaces
Data Import
Source Data Preparation
Data Export
Calculations
PowerOLAP
Starting with PowerOLAP
Creating Dimensions in PowerOLAP
Cube Definition
Slices
Formulas
Caching
Data Import and Export
Conclusion
Chapter 10: BI Process Scheduling: How to Orchestrate and Update Running Processes
Finishing the ETL
Creating the Job in PDI
Overview at PDI Command-Line Tools
Launching Transformations from the Command Line with Pan
Launching Jobs from the Command Line with Kitchen
Scheduling Jobs in the Task Scheduler
Scheduling a PDI Job in Windows
Scheduling a PDI Job in Unix/Linux
Running Database Maintenance Tasks
Backing Up Our MySQL/Maria DB Database
Perform Checks and Optimizations in the Database
Conclusion
Chapter 11: Moving to a Production Environment
Multienvironment Scenario
Deploying a Productive Environment
Server Capacity Plan
Database server capacity planning
ETL server capacity planning
BI server capacity planning
MOLAP server capacity planning
Licensing and Support Costs
Adding Environments
Isolation of Environments
Multienvironment Recommendations
Parameterizing
Server Paths
Naming Alignment for Connectivity Objects
Maintaining Your Environment
Object Life Cycle
Transport Procedure
Transport windows
Transport Automation
Testing Procedure
Test automation
Monitoring Tools
High Availability
Using Testing as Disaster Recovery of Production
Backing Up Your Platform
Historification Process
Security
Security Management, Using Roles and Groups
User Role Definition
Security Matrix
Auditing
Security Audits
Auditing Best Practices
Conclusion
Chapter 12: Moving BI Processes to the Cloud
Deciding our Cloud Provider
Initial Considerations
Sizing of the Platform: Performance and Scalability
The Physical Design
Security
Data Location
Health Checks
SLA and Support Plans
Calculate the Running Cost of the Platform
A First Look at the Physical Design
Choosing the Right Cloud Provider
Amazon Web Services (AWS)
Implementing the dev Environment in AWS
Launching an EC2 instance
Connecting to our EC2 instance
Launching a RDS Database
Calculating Prices with the AWS Calculator
Having a Go with AWS Quicksight
Using Software from the AWS Marketplace
Microsoft Azure
Creating a Machine in Azure
Creating a Database in Azure
Google Cloud
Vendor-Based Cloud Solutions
Conclusion
Chapter 13: Conclusions and Next Steps
Documentation
BPM Software
Big Data
Index