some issues in chapter 5 #7

fakecv · 2017-01-16T10:51:24Z

In the script "mails_by_day_of_week.r":

inbox_count <- dates_count(dates=inbox_data['date'], element='%a')
sent_count <- dates_count(dates=sent_data['date'], element='%a')

days_of_week <- c("Mon","Tue","Wed","Thu","Fri","Sat","Sun")

I think you should use %u instead of %a; otherwise the frequency table is sorted alphabetically by day name:

> test <- function(dates,element) {
+  dates <- as.Date(as.vector(as.matrix(dates)),"%Y-%m-%dT%H:%M:%S")
+ elements <- format(dates, element)
+ data.frame(table(elements))
+ }
> inbox_test <- test(dates=inbox_data['date'], element='%a')
> inbox_test
  elements Freq
1      Fri 1983
2      Mon 1568
3      Sat  142
4      Sun  360
5      Thu 1845
6      Tue 1776
7      Wed 1940

rather than in Monday-to-Sunday order, which is the sequence of the days_of_week vector.

So the surprising conclusion in the book does not hold: the email count reaches its low point on the weekend, not in the middle of the week. (The Sat/Sun counts sit in the positions of Wed/Thu.)
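A minimal sketch of the %u approach, in case it helps. The sample dates are made up for illustration and are not from the Enron data; the variable names are hypothetical too. Turning the %u values into a factor with levels "1" to "7" keeps the rows in Monday-to-Sunday order and also keeps days with zero mails.

```r
# Hypothetical sample dates (not from the Enron data): Mon, Tue, Tue, Sat, Sun.
dates <- as.Date(c("2017-01-16", "2017-01-17", "2017-01-17",
                   "2017-01-21", "2017-01-22"))

# '%u' formats the weekday as a number: "1" = Monday ... "7" = Sunday.
# Fixing the factor levels makes table() report all seven days in that order.
day_num <- factor(format(dates, "%u"), levels = as.character(1:7))

counts <- data.frame(table(day_num))

# Attach readable labels in the same order as the days_of_week vector.
counts$day <- c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")
print(counts)
```

Because %u is numeric, the result does not depend on the locale's day names, which %a does.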

The same problem occurs in "mails_by_month.r" (use %m instead of %b).
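For the months, a similar sketch under the same assumptions (made-up dates, hypothetical variable names). %m yields "01" to "12", so the counts line up with R's built-in month.abb constant:

```r
# Hypothetical sample dates (not from the Enron data).
dates <- as.Date(c("2001-01-15", "2001-04-02", "2001-04-20", "2001-12-31"))

# '%m' formats the month as a zero-padded number, "01" ... "12".
month_num <- factor(format(dates, "%m"), levels = sprintf("%02d", 1:12))

month_counts <- data.frame(table(month_num))

# month.abb is R's built-in vector "Jan", "Feb", ..., "Dec".
month_counts$month <- month.abb
print(month_counts)
```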

When reading from the .csv file, it is better to add the option quote='', because some email addresses contain a single quote (for example european.vp's(AT)enron.com and nicholas.o'day(AT)enron.com):

-inbox_data <- read.table("inbox_data_enron.csv", header=TRUE, sep=",")
+inbox_data <- read.table("inbox_data_enron.csv", header=TRUE, sep=",",comment.char='',quote='')
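A small self-contained check of the quote='' option. The CSV text and addresses below are invented for illustration; only read.table and its quote, comment.char, and text arguments are real. By default read.table treats both " and ' as quote characters, so an apostrophe inside an address can open a quoted field that runs across lines.

```r
# Invented CSV content with apostrophes in the addresses.
csv_text <- paste(
  "from,date",
  "o'brien@example.com,2001-05-01T10:00:00",
  "vp's@example.com,2001-05-02T11:00:00",
  "someone@example.com,2001-05-03T12:00:00",
  sep = "\n"
)

# quote = "" disables quote handling; comment.char = "" disables '#' comments.
fixed <- read.table(text = csv_text, header = TRUE, sep = ",",
                    quote = "", comment.char = "",
                    stringsAsFactors = FALSE)
print(fixed)
```

With these options each input line stays one row and the apostrophes are kept as ordinary characters.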

I like your book, and finding things like this makes it even more fun. Thanks.
