Statistics with R and SQL

Box-and-whisker plot and data patterns with R and T-SQL

R is particularly good at drawing graphs from data. Some graphs are familiar to most DBAs because we have seen and used them over time: bar charts, pie charts and so on. Some are not. Understanding exploratory graphics is vitally important to the R programmer or data science newbie. This week I wanted…

Statistics with R and SQL

Confidence Intervals for a proportion – using R

What is the difference between reading numbers as they are presented and interpreting them in a mature, deeper way? One way to look at the latter is what statisticians call a ‘confidence interval’. Suppose I look at a sample of 100 Americans who are asked if they approve of the job the Supreme Court is…
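
As a rough illustration of the idea, here is a minimal sketch in base R of a 95% confidence interval for a proportion, using the normal approximation. The sample size of 100 comes from the excerpt; the count of 60 approvals is a made-up figure for illustration only, not a number from the post.

```r
n         <- 100   # sample size, as in the excerpt
approvals <- 60    # HYPOTHETICAL count of "approve" answers (assumption)

p_hat <- approvals / n                      # sample proportion
se    <- sqrt(p_hat * (1 - p_hat) / n)      # standard error of the proportion
z     <- qnorm(0.975)                       # critical value for 95% confidence

ci <- c(lower = p_hat - z * se, upper = p_hat + z * se)
print(round(ci, 3))
```

Read this way, the survey result is a range of plausible values for the population proportion rather than a single number.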

Statistics with R and SQL

Normal approximation to binomial distribution using T-SQL and R

In the previous post I demonstrated the use of the binomial formula to calculate probabilities of events occurring in certain situations. In this post I am going to explore the same situation with a bigger sample set. Let us assume, for example, that instead of 7 smokers we had 100 smokers. We want to know what are…
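
To give a feel for the technique, the sketch below compares an exact binomial probability with its normal approximation in base R. Only the 100 smokers come from the excerpt; the success probability `p` and the threshold `k` are hypothetical values chosen for illustration.

```r
n <- 100    # number of smokers, as in the excerpt
p <- 0.3    # HYPOTHETICAL probability of the event for one smoker (assumption)
k <- 35     # HYPOTHETICAL threshold: probability of at most k events (assumption)

# Exact binomial probability P(X <= k)
exact <- pbinom(k, size = n, prob = p)

# Normal approximation with the same mean and standard deviation,
# using a continuity correction of 0.5
mu     <- n * p
sigma  <- sqrt(n * p * (1 - p))
approx <- pnorm(k + 0.5, mean = mu, sd = sigma)

print(c(exact = exact, approx = approx))
```

With a sample of this size the two values should be close, which is the motivation for using the approximation when the sample set is large.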

DBA · Statistics with R and SQL

Sampling Distribution and Central Limit Theorem

In this post I am going to explain (in highly simplified terms) two very important statistical concepts: the sampling distribution and the central limit theorem. The sampling distribution of the mean is the distribution of means collected from random samples taken from a population. So, for example, suppose I have a population of life expectancies around the globe. I draw…
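
The definition of the sampling distribution can be illustrated with a small simulation. This is a minimal sketch in base R, assuming a made-up, right-skewed population (not real life-expectancy data); the sample size, number of samples and seed are arbitrary choices.

```r
set.seed(42)  # arbitrary seed so the sketch is reproducible

# HYPOTHETICAL right-skewed population; not real life-expectancy data
population <- rexp(100000, rate = 1 / 70)

# Draw many random samples and record the mean of each one
sample_size  <- 50
n_samples    <- 2000
sample_means <- replicate(n_samples, mean(sample(population, sample_size)))

# The sample means centre on the population mean ...
print(c(population_mean = mean(population),
        mean_of_sample_means = mean(sample_means)))

# ... and their spread is close to sd(population) / sqrt(sample_size)
print(c(theoretical_se = sd(population) / sqrt(sample_size),
        observed_sd = sd(sample_means)))

# hist(sample_means) would show the shape of the sampling distribution
```

Even though the population here is strongly skewed, a histogram of `sample_means` should look roughly bell-shaped, which is what the central limit theorem predicts for reasonably large samples.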