The Binomial Formula with T-SQL and R

In a previous post I explained the basics of probability. In this post I will use some of those principles to see how to solve certain problems. I will pick a very simple problem that I found in a statistics textbook. Suppose I have 7 friends who are smokers. The probability that a random smoker will develop a lung condition is 0.3. What is the probability that a maximum of 2 of them will develop a severe lung condition? To apply the binomial formula for this problem – I need the following conditions to be met:

1 The trials are independent
2 The number of trials, n is fixed.
3 Each trial outcome can be a success or a failure
4 The probability of success in each case is the same.

Applying these rules –
1 The 7 smoking friends are not related or from the same group. (This is important as one friend’s habits can influence another and that does not make for an independent trial).
2 They smoke approximately at the same rate.
3 Either they get a lung disease or they don’t. We are not considering other issues they may have because of smoking.
4 Since all these conditions are met, the probability of each of them getting a lung disease is more or less the same.

The binomial formula given that
x = total number of “successes” (pass or fail, heads or tails etc.)
P = probability of a success on an individual trial
n = number of trials
q= 1 – p – is as below:

prob

For those not math savvy the ! stands for factorial of a number. In above example n equals 7. x, The number of ‘successes’ (morbid, i know, to define a lung condition as a success but just an example) we are looking for is 2. p is given to be 0. and q is 1 – 0.3 which is 0.7.  Now, given the rules of probability – we need to add probability of 0 or none having a lung condition, 1 person having a lung condition and 2 having a lung condition – to see what is the probability of a maximum of 2 having a lung condition. Let us look at doing this with T-SQL first, then with R and then calling the R script from within T-SQL.

1 Using T-SQL:

There are a lot of different ways to write the simple code of calculating factorial. I found this one to be most handy and reused it. I created the user defined function as ‘factorial’ and used the same code below to calculate probabilities of 0.1 or 2 people getting a lung illness. If we add the 3 together we get the total probability of the maximum of 2 people getting a lung illness – which is about 0.65 or 65 %.

DECLARE @n decimal(10,2), @x decimal(10, 2), @p decimal(10, 2), @q decimal(10, 2)
DECLARE @p0 decimal(10, 2), @p1 decimal(10, 2), @p2 decimal(10, 2), @n1 decimal(10, 2), @n2 decimal(10, 2), @n3 decimal(10, 2)
SELECT @n = 7, @x = 0, @p = 0.3,@q=0.7
SELECT @x = 0
SELECT @n1 = dbo.factorial(@n) 
SELECT @n2 = dbo.factorial(@n-@x)
SELECT @n3 = 1

SELECT @p1 = ( @n1/(@n2 * @n3))*power(@p, @x)*power(@q,@n-@x)
select @p1 as 'Probability of 0 people getting lung illness'

SELECT @x = 1
SELECT @p1 = ( @n/@x)*power(@p, @x)*power(@q,@n-@x)
select @p1 as 'Probability of 1 person getting lung illness'

SELECT @x = 2
SELECT @n1 = dbo.factorial(@n) 
SELECT @n2 = dbo.factorial(@n-@x)
SELECT @n3 = dbo.factorial(@x)
SELECT @p2 = ( @n1/(@n2 * @n3))*power(@p, @x)*power(@q,@n-@x)
select @p2 as 'Probability of 2 people getting lung illness'




Results are as below:

prob1

2 Using R:

The R function for this is seriously simple, one line call as below.

dbinom(0:2, size=7, prob=0.3)

My results, almost exactly the same as what we got with T-SQL.

prob2

3 Calling R from T-SQL:

Instead of writing all that code i can simply call this function from with TSQL –

EXEC sp_execute_external_script
 @language = N'R'
 ,@script = N'x<-dbinom(0:2, size=7, prob=0.3);
 print(x);'
 ,@input_data_1 = N'SELECT 1;'

Results as below:

prob3

It is a LOT of fun to get our numbers to tie in more than one way. Thanks for reading.

 

 

4 Comments

  1. Oded Dror says:

    Mala,

    Nice article, can you please provide also the factorial that you use in your code?

    Thanks,
    Oded Dror

    Like

    1. diligentdba says:

      Hi , thank you for your response..i used the code from this link http://sql-server-2012-tutorials.blogspot.com/2013/04/factorial-function-in-sql-server.html – i just called the udf ‘factorial’.

      Like

      1. Oded Dror says:

        Mala,

        Thank you

        Oded Dror

        Like

  2. Cyrus Muchiri says:

    Thanks for the article.

    Like

Leave a Comment

This site uses Akismet to reduce spam. Learn how your comment data is processed.