Correlation, Simple Linear Regression

In this paper, we perform a linear regression analysis on previously collected data on the number of daily e-mails received (R) and sent (S) by a particular user (the author). The original daily e-mail data are treated as a time series, with N incremented by 1 for each day’s measurement. The coefficient r reported with each fit is the correlation coefficient, which measures the strength and direction of the linear relationship; the slope of the regression line is given separately by the coefficient of x in the regression equation.

A time series is a sequence of successive data points measured at uniform time intervals. This exercise is therefore a meaningful statistical analysis for determining whether a natural temporal ordering is inherent in the data. Recall that the data for each of R and S consist of 15 daily samples, collected during two exercises spanning 10 and 5 days respectively. This factor will be noted in the analysis to follow. Table 1 lists the observed and predicted values used in the regression analysis for e-mails received (R); Table 2 does the same for e-mails sent (S).
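The slope and intercept of a least-squares line, together with the correlation coefficient r, can be computed directly from paired observations. The following is a minimal sketch in Python and is not the Waner et al. tool used in this paper; the function name `linear_fit` and the sample inputs are illustrative assumptions, not data from the e-mail study.

```python
import math

def linear_fit(xs, ys):
    """Return (slope, intercept, r) for the least-squares line y = slope*x + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squared deviations and of cross-products about the means
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each vary for the fit and r to be defined")
    slope = sxy / sxx                    # slope of the regression line
    intercept = mean_y - slope * mean_x  # the line passes through (mean_x, mean_y)
    r = sxy / math.sqrt(sxx * syy)       # Pearson correlation coefficient
    return slope, intercept, r

# Made-up points lying exactly on y = 2x + 1 (not from the e-mail data)
slope, intercept, r = linear_fit([1, 2, 3, 4], [3, 5, 7, 9])
print(slope, intercept, r)
```

Note that the slope and r are different quantities: the slope carries the units of y per unit of x, while r is dimensionless and always lies between -1 and 1.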

This section summarizes the regression analysis for e-mails received (R).

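| Time Series (N) | E-mails Received (R) | Predicted Value |
|---|---|---|
| 1 | | 2.581 |
| 2 | | 0.147 |
| 3 | | 6.098 |
| 4 | | 2.311 |
| 5 | 97 | 6.909 |
| 6 | 72 | 10.290 |
| 7 | 81 | 9.073 |
| 8 | 77 | 9.614 |
| 9 | 87 | 8.261 |
| 10 | 93 | 7.450 |
| 11 | 56 | 12.454 |
| 12 | 67 | 10.966 |
| 13 | 70 | 10.561 |
| 14 | 61 | 11.778 |
| 15 | 63 | 11.507 |

Table 1: Predicted Values for R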

Based on the online analysis tools of Waner et al. (1999), the linear regression equation, the correlation coefficient r, and the resulting plot of the time series for R are given by:

y = f(R) = -0.135243 x + 20.0276

r = -0.843775

Here x is the number of e-mails received (R) on a given day, and y is the corresponding predicted value listed in Table 1.

Figure 1: linear regression for R, all 15 days
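As a consistency check, the predicted values in Table 1 can be reproduced by substituting each observed R into the regression equation. The sketch below does this for the rows of Table 1 that list an observed R (N = 5 to 15); the slope and intercept are taken from the equation above, while the names `predict` and `table1_rows` are illustrative assumptions.

```python
SLOPE = -0.135243      # coefficient of x in the reported regression equation
INTERCEPT = 20.0276    # constant term in the reported regression equation

def predict(r_received):
    """Evaluate y = SLOPE * x + INTERCEPT with x = e-mails received (R)."""
    return SLOPE * r_received + INTERCEPT

# (N, R, predicted value) for the rows of Table 1 that list an observed R
table1_rows = [
    (5, 97, 6.909), (6, 72, 10.290), (7, 81, 9.073), (8, 77, 9.614),
    (9, 87, 8.261), (10, 93, 7.450), (11, 56, 12.454), (12, 67, 10.966),
    (13, 70, 10.561), (14, 61, 11.778), (15, 63, 11.507),
]

for n, r_received, listed in table1_rows:
    computed = predict(r_received)
    print(f"N={n:2d}  R={r_received:3d}  listed={listed:7.3f}  computed={computed:7.3f}")

# Largest gap between listed and recomputed values (rounding only)
max_diff = max(abs(predict(r_received) - listed) for _, r_received, listed in table1_rows)
print(f"largest difference: {max_diff:.4f}")
```

The recomputed values agree with the listed ones to three decimal places, which confirms that the regression equation takes R as its input x.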

The results for R display a distinct downward trend over time, and the best-fit regression line appears to correspond visually to the distribution of the data. The correlation coefficient r = -0.843775 indicates a strong negative linear association (r squared is approximately 0.712, so the line accounts for roughly 71% of the variance). This may therefore be considered a credible interpretation of the trend of received e-mails (R) across the 15 sampled days.

This section summarizes the regression analysis for e-mails sent (S).

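| Time Series (N) | E-mails Sent (S) | Predicted Value |
|---|---|---|
| 1 | 68 | 7.771 |
| 2 | 72 | 8.280 |
| 3 | 64 | 7.263 |
| 4 | 55 | 6.118 |
| 5 | 43 | 4.592 |
| 6 | 49 | 5.355 |
| 7 | 52 | 5.737 |
| 8 | 55 | 6.118 |
| 9 | 46 | 4.974 |
| 10 | 62 | 7.008 |
| 11 | 98 | 11.586 |
| 12 | | 12.730 |
| 13 | | 11.967… |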