Statistics with R and SQL

# Basics of Probability

In this post I am going to introduce some of the basic principles of probability, which I will use in other posts going forward. Many people learned these things in high school math and then forgot them; I personally needed a refresher. These concepts are very useful if we can keep them…

# Generating Frequency Table

This week’s blog post is rather simple. One of the main characteristics of a data set involving classes, or discrete variables, is frequency. The number of times each data element or class is observed is called its frequency. A table that displays the discrete variable and the number of times it occurs in the data…
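Here is a minimal sketch in R. It uses base R's `table()` and `prop.table()` with the built-in `mtcars` dataset, which is an assumed example and not the data used in the post. The code counts how often each number of cylinders (`cyl`) occurs and then turns those counts into relative frequencies. In T-SQL the counting step corresponds to a `GROUP BY` with `COUNT(*)`.

```r
# Frequency table of a discrete variable using base R.
# Assumption: the built-in mtcars data stands in for the post's dataset;
# cyl (number of cylinders) is the discrete variable being counted.
freq <- table(mtcars$cyl)
print(freq)

# Relative frequencies: each count divided by the total number of rows
rel_freq <- prop.table(freq)
print(round(rel_freq, 3))

# The same counts as a data frame, similar in shape to a GROUP BY result
freq_df <- as.data.frame(freq, stringsAsFactors = FALSE)
names(freq_df) <- c("cyl", "frequency")
print(freq_df)
```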

# The Empirical Rule

I am resuming technical blogging after a gap of nearly a month. I will continue to blog about my re-learning of statistics and basic concepts, and illustrate them as best I can using R and T-SQL where appropriate. This week I have chosen a statistical concept called the ‘Empirical Rule’. The empirical rule is…
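The empirical rule says that for approximately normally distributed data, about 68% of values lie within one standard deviation of the mean, about 95% within two, and about 99.7% within three. The sketch below is in R and uses simulated normal data as an assumption, not the post's dataset. It takes the theoretical proportions from `pnorm()` and compares them with the share of simulated values that fall within k standard deviations.

```r
# Theoretical share of a normal distribution within k standard deviations
k <- 1:3
theoretical <- pnorm(k) - pnorm(-k)
print(round(theoretical, 4))  # 0.6827 0.9545 0.9973

# Empirical check on simulated data (assumption: simulated, not real data)
set.seed(42)
x <- rnorm(10000, mean = 50, sd = 10)

# For each k, the proportion of values within k sample SDs of the sample mean
within_k <- sapply(k, function(j) mean(abs(x - mean(x)) <= j * sd(x)))
print(round(within_k, 3))
```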

# Multivariate Variable Analysis using R

So far I’ve worked on simple analytical techniques using one or two variables in a dataset. This article is a summary of the various techniques we can use for such datasets, depending on the type of variable in question. The techniques include how to get the relevant summary statistics, and…
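One way to pick summary statistics by variable type in base R is to branch on each column's type. Numeric columns get `summary()` and categorical columns get a frequency `table()`. This is only a sketch. The helper name `summarise_column` is hypothetical, and the built-in `iris` data stands in for the post's dataset.

```r
# Hypothetical helper: choose a summary based on the column's type
summarise_column <- function(col) {
  if (is.numeric(col)) {
    # Min, quartiles, median and mean for numeric variables
    summary(col)
  } else {
    # Frequency counts for categorical variables, including NA if present
    table(col, useNA = "ifany")
  }
}

# Apply to every column of the built-in iris data (assumed example)
summaries <- lapply(iris, summarise_column)
print(summaries)
```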

# Associative Analytics: Two sample T Test

In the previous post we looked at a one-sample t-test. A one-sample t-test helps us determine whether a selected sample is representative of the larger population, by testing whether its mean is consistent with the population mean. A two-sample t-test goes a step further: it helps us determine whether both samples came from the same population, or if…
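Here is a sketch of a two-sample t-test in base R, using the built-in `mtcars` data as an assumed example. It compares fuel economy (`mpg`) between automatic (`am == 0`) and manual (`am == 1`) cars. By default `t.test()` runs Welch's test, which does not assume equal variances. Passing `var.equal = TRUE` gives the classic Student's version.

```r
# Two independent samples from the built-in mtcars data (assumed example)
automatic <- mtcars$mpg[mtcars$am == 0]
manual    <- mtcars$mpg[mtcars$am == 1]

# Welch two-sample t-test (R's default; variances may differ)
welch <- t.test(automatic, manual)
print(welch)

# Student's two-sample t-test, assuming equal variances
student <- t.test(automatic, manual, var.equal = TRUE)

# Decide whether to reject H0 (equal means) at the 5% level
alpha <- 0.05
cat("Welch p-value:", signif(welch$p.value, 3), "\n")
cat("Reject equal means at alpha = 0.05:", welch$p.value < alpha, "\n")
```

For Student's test the degrees of freedom are n1 + n2 - 2, while Welch's test estimates them from the sample variances.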

# Statistics with TSQL and R: Chi Square Test

As I move on from descriptive, largely univariate (single-variable) analysis of data into multivariate data, one of the first tests that came to mind is the Chi Square Test. It is a very commonly used test for understanding relationships between two variables that are largely categorical in nature.…
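This is a minimal sketch of a chi-square test of independence in base R. The assumed data is the built-in `UCBAdmissions` table, collapsed to admission status by gender. The code also recomputes the statistic by hand as the sum of (observed - expected)^2 / expected. Each expected count is row total × column total / grand total.

```r
# 2x2 contingency table: Admit x Gender (assumed example data)
admit_gender <- margin.table(UCBAdmissions, c(1, 2))
print(admit_gender)

# Pearson's chi-square test of independence, without Yates' correction
res <- chisq.test(admit_gender, correct = FALSE)
print(res)

# Same statistic by hand from observed and expected counts
observed <- unclass(admit_gender)
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)
chi_sq_manual <- sum((observed - expected)^2 / expected)
cat("Manual chi-square:", round(chi_sq_manual, 3), "\n")
```

The test is usually considered reliable when every expected count is at least 5. You can inspect the expected counts in `res$expected`.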

# Statistics with T-SQL and R – the Pearson’s Correlation Coefficient

In this post I will explore the calculation of a very basic statistic based on the linear relationship between two variables: a number that tells you whether two numeric variables in a dataset are possibly correlated and, if so, to what degree. Pearson’s coefficient is a number that attempts to measure this…
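Pearson's coefficient is the covariance of two variables divided by the product of their standard deviations. Here is a sketch in R, using the built-in `mtcars` columns `wt` (weight) and `mpg` as the two numeric variables (an assumed example). It computes the coefficient with `cor()`, then from the formula, and then tests it with `cor.test()`.

```r
# Two numeric variables from the built-in mtcars data (assumed example)
x <- mtcars$wt
y <- mtcars$mpg

# Built-in Pearson correlation
r_builtin <- cor(x, y, method = "pearson")

# Same value from the definition:
# sum of co-deviations / sqrt(product of sums of squared deviations)
dx <- x - mean(x)
dy <- y - mean(y)
r_manual <- sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))

cat("cor():", round(r_builtin, 4), " manual:", round(r_manual, 4), "\n")

# Test H0: true correlation is zero
ct <- cor.test(x, y, method = "pearson")
print(ct)
```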

# Descriptive Statistics with SQL and R – II

In the previous post I looked at some very basic and common measures of descriptive statistics (mean, median and mode) and how to derive them using T-SQL, R, and a combination of the two in SQL Server 2016. These are also called measures of ‘Central Tendency’. In this post I am going to…
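Here is a sketch of the three measures of central tendency in base R, using the built-in `mtcars$cyl` column as assumed data. Base R provides `mean()` and `median()` but has no function for the statistical mode, because `mode()` returns an object's storage mode. The helper `stat_mode` below is therefore hypothetical. It returns every value tied for the highest count.

```r
# Hypothetical helper: statistical mode (all values tied for highest count)
stat_mode <- function(x) {
  counts <- table(x)
  modes <- names(counts)[counts == max(counts)]
  # table() names are character; convert back when the input was numeric
  if (is.numeric(x)) as.numeric(modes) else modes
}

# Assumed example data: number of cylinders in the built-in mtcars data
cyl <- mtcars$cyl

cat("Mean:  ", mean(cyl), "\n")
cat("Median:", median(cyl), "\n")
cat("Mode:  ", stat_mode(cyl), "\n")
```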

# Script to create demo database and load data for statistics and R

Make sure you have a working install of SQL Server 2016. The size of the database is only 8 MB.

```sql
USE [master]
GO
/****** Object: Database [WorldHealth] Script Date: 7/15/2016 4:44:58 PM ******/
CREATE DATABASE [WorldHealth]
 CONTAINMENT = NONE
 ON PRIMARY
( NAME = N'WorldHealth', FILENAME = N'D:\DATA\WorldHealth.mdf' , SIZE = 8192KB , MAXSIZE =…
```