I have owned a property in Baltimore for a few years and it has never given me much trouble. However, in the winter of 2015 the pipes froze and the house was without water for a couple of weeks, which caused a major headache… To find out whether that winter (Jan to Mar) was indeed significantly colder than recent winters, I used the data available through the ‘weatherData’ package for R.

Get the data using the weatherData package, limiting the range to 2010 through 2015:

<pre class="r"><code class="r"><span class="identifier">library</span><span class="paren">(</span><span class="identifier">weatherData</span><span class="paren">)</span>
<span class="identifier">library</span><span class="paren">(</span><span class="identifier">lubridate</span><span class="paren">)</span> <span class="comment"># provides month()</span>
<span class="identifier">library</span><span class="paren">(</span><span class="identifier">ggplot2</span><span class="paren">)</span> <span class="comment"># used for the plots further down</span>
<span class="identifier">d</span> <span class="operator">&lt;-</span> <span class="identifier">multiYear</span><span class="paren">(</span><span class="string">"BWI"</span>, <span class="identifier">c</span><span class="paren">(</span><span class="number">2010</span><span class="operator">:</span><span class="number">2015</span><span class="paren">)</span><span class="paren">)</span> <span class="comment">## or d &lt;- dget("bwidata.R")</span>
<span class="identifier">names</span><span class="paren">(</span><span class="identifier">d</span><span class="paren">)</span> <span class="operator">&lt;-</span> <span class="identifier">c</span><span class="paren">(</span><span class="string">"date"</span>, <span class="string">"maxT"</span>, <span class="string">"meanT"</span>, <span class="string">"minT"</span><span class="paren">)</span>
<span class="identifier">d</span><span class="operator">$</span><span class="identifier">year</span> <span class="operator">&lt;-</span> <span class="identifier">strftime</span><span class="paren">(</span><span class="identifier">d</span><span class="operator">$</span><span class="identifier">date</span>, <span class="identifier">format</span> <span class="operator">=</span> <span class="string">"%Y"</span><span class="paren">)</span>
<span class="identifier">d</span><span class="operator">$</span><span class="identifier">month</span> <span class="operator">&lt;-</span> <span class="identifier">month</span><span class="paren">(</span><span class="identifier">d</span><span class="operator">$</span><span class="identifier">date</span>, <span class="identifier">label</span> <span class="operator">=</span> <span class="literal">TRUE</span><span class="paren">)</span></code></pre>
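
The `multiYear()` helper is not shown in this post. As a rough sketch, assuming it simply fetches one year at a time and stacks the rows, it might look like the code below. The `fetch_year` argument and the `fake_fetch` function are hypothetical names introduced for illustration, and the temperatures are placeholders rather than real BWI readings. A per-year fetcher such as weatherData's `getWeatherForYear()` could presumably be passed in its place, though that is untested here.

```r
# Hypothetical stand-in for multiYear(): fetch one year at a time and stack the rows.
# `fetch_year` is an assumed argument: any function(station, year) that returns a
# data frame with date, max, mean and min temperature columns, in that order.
multiYear <- function(station, years, fetch_year) {
  yearly <- lapply(years, function(y) fetch_year(station, y))
  do.call(rbind, yearly)
}

# Offline demonstration with a fake fetcher (placeholder values, not real data)
fake_fetch <- function(station, year) {
  data.frame(
    Date = as.Date(paste0(year, c("-01-15", "-02-15"))),
    Max  = c(40, 42),
    Mean = c(33, 35),
    Min  = c(26, 28)
  )
}

demo <- multiYear("BWI", 2014:2015, fake_fetch)
names(demo) <- c("date", "maxT", "meanT", "minT")

# Base R equivalents of the year and month columns derived above
demo$year  <- format(demo$date, "%Y")
demo$month <- month.abb[as.integer(format(demo$date, "%m"))]  # locale-independent "Jan", "Feb", ...
print(demo)
```

Passing the fetcher as an argument keeps the stacking logic testable without a network connection.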

Let’s look at the winter (Jan, Feb and Mar) minimum temperatures using boxplots:

<pre class="r"><code class="r"><span class="comment"># boxplots</span>
<span class="identifier">d2</span> <span class="operator">&lt;-</span> <span class="identifier">subset</span><span class="paren">(</span><span class="identifier">d</span>, <span class="identifier">month</span> <span class="operator">==</span> <span class="string">"Jan"</span> <span class="operator">|</span> <span class="identifier">month</span> <span class="operator">==</span> <span class="string">"Feb"</span> <span class="operator">|</span> <span class="identifier">month</span> <span class="operator">==</span> <span class="string">"Mar"</span><span class="paren">)</span>
<span class="identifier">g</span> <span class="operator">&lt;-</span> <span class="identifier">ggplot</span><span class="paren">(</span><span class="identifier">data</span> <span class="operator">=</span> <span class="identifier">d2</span>, <span class="identifier">aes</span><span class="paren">(</span><span class="identifier">x</span> <span class="operator">=</span> <span class="identifier">factor</span><span class="paren">(</span><span class="identifier">year</span><span class="paren">)</span>, <span class="identifier">y</span> <span class="operator">=</span> <span class="identifier">minT</span><span class="paren">)</span><span class="paren">)</span>
<span class="identifier">g</span> <span class="operator">+</span> <span class="identifier">geom_boxplot</span><span class="paren">(</span><span class="paren">)</span> <span class="operator">+</span> <span class="identifier">facet_grid</span><span class="paren">(</span>.<span class="operator">~</span><span class="identifier">month</span><span class="paren">)</span> <span class="operator">+</span> <span class="identifier">xlab</span><span class="paren">(</span><span class="string">"Year"</span><span class="paren">)</span> <span class="operator">+</span> <span class="identifier">ylab</span><span class="paren">(</span><span class="string">"Minimum Temperature (F)"</span><span class="paren">)</span></code></pre>

[Figure: temp boxplot.png, boxplots of minimum temperature (F) by year, one panel per month]

For those of you unfamiliar with boxplots, here’s a short description from Wikipedia:

Box and whisker plots are uniform in their use of the box: the bottom and top of the box are always the first and third quartiles, and the band inside the box is always the second quartile (the median).
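
To make the quartiles concrete, here is a small sketch using base R's `quantile()`. The `temps` vector holds illustrative values only, not BWI data. By default `geom_boxplot()` also draws each whisker out to the most extreme point within 1.5 times the interquartile range of the box, so those limits are computed as well.

```r
# Illustrative values only, not taken from the BWI data
temps <- c(12, 15, 18, 21, 24, 27, 30)

# First quartile, median and third quartile: the bottom, band and top of the box
q <- quantile(temps, probs = c(0.25, 0.50, 0.75))
print(q)  # 16.5, 21.0, 25.5

# Interquartile range and the limits beyond which points are drawn as outliers
iqr <- unname(q[3] - q[1])       # 9
lower_limit <- unname(q[1]) - 1.5 * iqr  # 3
upper_limit <- unname(q[3]) + 1.5 * iqr  # 39
print(c(lower_limit, upper_limit))
```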

Alternatively, a density plot can also show how the distribution of temperatures varies across the years:


<pre class="r"><code class="r"><span class="comment"># distribution comparison</span>
<span class="identifier">g</span> <span class="operator">&lt;-</span> <span class="identifier">ggplot</span><span class="paren">(</span><span class="identifier">data</span> <span class="operator">=</span> <span class="identifier">d2</span>, <span class="identifier">aes</span><span class="paren">(</span><span class="identifier">x</span> <span class="operator">=</span> <span class="identifier">minT</span>, <span class="identifier">group</span> <span class="operator">=</span> <span class="identifier">year</span><span class="paren">)</span><span class="paren">)</span>
<span class="identifier">g</span> <span class="operator">+</span> <span class="identifier">geom_density</span><span class="paren">(</span><span class="identifier">aes</span><span class="paren">(</span><span class="identifier">color</span> <span class="operator">=</span> <span class="identifier">year</span><span class="paren">)</span>, <span class="identifier">size</span> <span class="operator">=</span> <span class="number">1.5</span><span class="paren">)</span> <span class="operator">+</span> <span class="identifier">facet_wrap</span><span class="paren">(</span><span class="operator">~</span><span class="identifier">month</span><span class="paren">)</span> <span class="operator">+</span> <span class="identifier">xlab</span><span class="paren">(</span><span class="string">"Minimum Temperature (F)"</span><span class="paren">)</span></code></pre>

[Figure: temp distribution.png, density curves of minimum temperature (F), one curve per year and one panel per month]

A density function is defined as follows:

A probability density function (PDF), or density of a continuous random variable, is a function that describes the relative likelihood for this random variable to take on a given value.
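
The curves drawn by `geom_density()` are kernel density estimates, and base R's `density()` computes the same kind of estimate. The sketch below uses simulated values (an assumption for illustration, not the BWI data) and checks that the area under the estimated curve is close to 1, as it should be for any density.

```r
# Simulated values for illustration only; not the BWI temperatures
set.seed(1)
x <- rnorm(200, mean = 25, sd = 8)

# Kernel density estimate on a regular grid (Gaussian kernel by default)
dens <- density(x)

# Trapezoid-rule approximation of the area under the estimated curve
widths  <- diff(dens$x)
heights <- (head(dens$y, -1) + tail(dens$y, -1)) / 2
area <- sum(widths * heights)
print(area)  # should be close to 1
```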

Both the boxplots and the density plots show that Feb was noticeably colder in 2015 (although Jan of 2014 was also pretty cold).

So was the winter of 2015 really colder than the winter of 2014? Let’s test the hypothesis for each winter month at a significance level of 5%:

H_0: the average minimum temperature was the same in 2014 and 2015
H_a: the average minimum temperature was lower in 2015 than in 2014

<pre class="r"><code class="r"><span class="identifier">d2014</span> <span class="operator">&lt;-</span> <span class="identifier">subset</span><span class="paren">(</span><span class="identifier">d</span>, <span class="identifier">year</span> <span class="operator">==</span> <span class="string">"2014"</span><span class="paren">)</span>
<span class="identifier">d2015</span> <span class="operator">&lt;-</span> <span class="identifier">subset</span><span class="paren">(</span><span class="identifier">d</span>, <span class="identifier">year</span> <span class="operator">==</span> <span class="string">"2015"</span><span class="paren">)</span>
<span class="identifier">m</span> <span class="operator">&lt;-</span> <span class="identifier">c</span><span class="paren">(</span><span class="string">"Jan"</span>, <span class="string">"Feb"</span>, <span class="string">"Mar"</span><span class="paren">)</span>
<span class="keyword">for</span> <span class="paren">(</span><span class="identifier">i</span> <span class="keyword">in</span> <span class="identifier">seq_along</span><span class="paren">(</span><span class="identifier">m</span><span class="paren">)</span><span class="paren">)</span> <span class="paren">{</span>
    <span class="identifier">x</span> <span class="operator">&lt;-</span> <span class="identifier">subset</span><span class="paren">(</span><span class="identifier">d2015</span>, <span class="identifier">month</span> <span class="operator">==</span> <span class="identifier">m</span><span class="paren">[</span><span class="identifier">i</span><span class="paren">]</span><span class="paren">)</span>
    <span class="identifier">y</span> <span class="operator">&lt;-</span> <span class="identifier">subset</span><span class="paren">(</span><span class="identifier">d2014</span>, <span class="identifier">month</span> <span class="operator">==</span> <span class="identifier">m</span><span class="paren">[</span><span class="identifier">i</span><span class="paren">]</span><span class="paren">)</span>
    <span class="identifier">print</span><span class="paren">(</span><span class="identifier">paste</span><span class="paren">(</span><span class="string">"For the month of "</span>, <span class="identifier">m</span><span class="paren">[</span><span class="identifier">i</span><span class="paren">]</span><span class="paren">)</span><span class="paren">)</span>
    <span class="identifier">print</span><span class="paren">(</span><span class="identifier">t.test</span><span class="paren">(</span><span class="identifier">x</span><span class="operator">$</span><span class="identifier">minT</span>, <span class="identifier">y</span><span class="operator">$</span><span class="identifier">minT</span>, <span class="identifier">alternative</span> <span class="operator">=</span> <span class="string">"less"</span>, <span class="identifier">conf.level</span> <span class="operator">=</span> <span class="number">0.95</span><span class="paren">)</span><span class="paren">)</span>
<span class="paren">}</span></code></pre>

Returned output:

<pre><code>## [1] "For the month of  Jan"
## 
##  Welch Two Sample t-test
## 
## data:  x$minT and y$minT
## t = 2.3575, df = 53.343, p-value = 0.989
## alternative hypothesis: true difference in means is less than 0
## 95 percent confidence interval:
##      -Inf 9.101768
## sample estimates:
## mean of x mean of y 
##  23.19355  17.87097 
## 
## [1] "For the month of  Feb"
## 
##  Welch Two Sample t-test
## 
## data:  x$minT and y$minT
## t = -3.9832, df = 48.907, p-value = 0.0001128
## alternative hypothesis: true difference in means is less than 0
## 95 percent confidence interval:
##       -Inf -4.922149
## sample estimates:
## mean of x mean of y 
##  15.57143  24.07143 
## 
## [1] "For the month of  Mar"
## 
##  Welch Two Sample t-test
## 
## data:  x$minT and y$minT
## t = 0.515, df = 58.037, p-value = 0.6957
## alternative hypothesis: true difference in means is less than 0
## 95 percent confidence interval:
##      -Inf 5.204443
## sample estimates:
## mean of x mean of y 
##  29.25806  28.03226</code></pre>

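Based on the p-values, Feb is the only month for which the null hypothesis is rejected (p ≈ 0.0001, well below 0.05), which makes sense: that is the month when the pipes froze, after all… In Jan the sample mean was actually higher in 2015 (23.2 F) than in 2014 (17.9 F).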
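
For readers wondering where the `t`, `df` and `p-value` figures in a "Welch Two Sample t-test" come from, the sketch below recomputes them by hand and compares them with `t.test()`. The helper name `welch_less` and the two short vectors are assumptions made for illustration; they are not the BWI data.

```r
# Hand-rolled one-sided Welch test (alternative: mean of x is less than mean of y).
# welch_less is a hypothetical helper name used only for this illustration.
welch_less <- function(x, y) {
  vx <- var(x) / length(x)  # squared standard error of mean(x)
  vy <- var(y) / length(y)  # squared standard error of mean(y)
  t_stat <- (mean(x) - mean(y)) / sqrt(vx + vy)
  # Welch-Satterthwaite approximation to the degrees of freedom
  df <- (vx + vy)^2 / (vx^2 / (length(x) - 1) + vy^2 / (length(y) - 1))
  # Lower-tail probability, because the alternative is "less"
  c(t = t_stat, df = df, p = pt(t_stat, df))
}

# Illustrative values only
x <- c(10, 14, 12, 18, 16)
y <- c(20, 25, 22, 28, 24)

manual  <- welch_less(x, y)
builtin <- t.test(x, y, alternative = "less")

print(manual)
print(c(t = unname(builtin$statistic), df = unname(builtin$parameter), p = builtin$p.value))
```

The non-integer degrees of freedom in the output above (for example 48.907 for Feb) come from this approximation, which does not assume the two years have equal variance.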
