Mattermost Lunch Channel History¶
- Data Source: Mattermost API, CCTB instance
- Tasks:
- Part I - June 2024: retrieving chat history data through the mattermost API
- Part II - September 2024: analyzing messages in the lunch channel
- Part III - September 2024: specific tasks
- Language: python
Select one of the following tasks¶
General comment: your estimate in step 1 does not need to be perfect, settle for a heuristic that is good enough
Task A - most crowded day of the week¶
- estimate the total number of people having lunch (or coffee) at the CCTB/mensa for each day → when was the time that most people went to lunch?
- plot the number of people per day over time (also try to summarize by week/month/year)
- plot a boxplot for the number of people per day of the week → what is the most crowded day of the week?
- make the same plot as above, but separately for every month/year → is there a shift in day of the week preference?
- perform a statistical test for the hypothesis: "Mondays and Fridays are less crowded than Tuesday to Thursday"
- discuss caveats of the data and methods used
Task B - lunch time¶
- estimate the time of lunch/coffee for each day → when was the most popular time?
- try to consider proposed times ("mensa at 12?", "11:15?")
- direct calls ("mensa?", "now")
- relative times ("lunch in 5min", "mensa in half an hour?")
- plot the lunch time over the years (also try to summarize by week/month/year) → is there a trend (gradual shift or break point(s)) in lunch time?
- plot a boxplot for the lunch time per day of the week → is there a difference in lunch time per day of the week?
- make the same plot as above, but separately for every month/year → is the pattern above consistent over the year(s)?
- perform a statistical test for the hypothesis: "Lunch time is later during semester break (April,May,August,September) than during lecture period since 2022"
- discuss caveats of the data and methods used
Task C - your own idea¶
If you have other ideas, feel free to follow them, but create a plan similar to that for Task A and B above, before you start.
In [2]:
import numpy as np
In [3]:
np.random.seed(42)
" → ".join(np.random.permutation("AJ Magdalena Felix Jannik Robin".split()))
Out[3]:
'Magdalena → Robin → Felix → AJ → Jannik'
In [44]:
import pandas as pd
import matplotlib.pyplot as plt
import datetime as dt
In [45]:
messages = pd.read_csv('messages.csv')
reactions = pd.read_csv('reactions.csv')
files = pd.read_csv('files.csv')
In [48]:
date = pd.to_datetime(messages['create_at'])
date["day_of_week"] =date[1].dt.day_name()
date["hour"] =date[1].dt.hour
--------------------------------------------------------------------------- AttributeError Traceback (most recent call last) /tmp/ipykernel_669/1856756113.py in <cell line: 3>() 1 date = pd.to_datetime(messages['create_at']) 2 ----> 3 date["day_of_week"] =date[1].dt.day_name() 4 5 date["hour"] =date[1].dt.hour() AttributeError: 'Timestamp' object has no attribute 'dt'
In [31]:
date=date.datetime.weekday()
date
--------------------------------------------------------------------------- AttributeError Traceback (most recent call last) /tmp/ipykernel_669/563424764.py in <cell line: 1>() ----> 1 date=date.datetime.weekday() 2 date /usr/local/lib/python3.10/dist-packages/pandas/core/generic.py in __getattr__(self, name) 6202 ): 6203 return self[name] -> 6204 return object.__getattribute__(self, name) 6205 6206 @final AttributeError: 'Series' object has no attribute 'datetime'
In [25]:
File "/tmp/ipykernel_669/4294287132.py", line 1 date[,0] ^ SyntaxError: invalid syntax
In [60]:
import seaborn as sns
import matplotlib
messages = pd.read_csv('messages.csv')
messages
# Convert 'create_at' to datetime
messages['create_at'] = pd.to_datetime(messages['create_at'])
messages
# Extract day of the week and hour
messages['day_of_week'] = messages['create_at'].dt.day_name()
messages['hour'] = messages['create_at'].dt.hour
messages
# Group by day of the week and hour to get message density
heatmap_data = messages.groupby(['day_of_week', 'hour']).size().unstack(fill_value=0)
# Reorder days to start from Monday
day_order = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']
heatmap_data = heatmap_data.reindex(day_order)
# Plotting the heatmap
plt.figure(figsize=(10, 6))
sns.heatmap(heatmap_data, cmap='Blues', annot=False)
plt.title('Message Density by Day of Week and Hour')
plt.xlabel('Hour of Day')
plt.ylabel('Day of Week')
# Show the plot
plt.show()
Out[60]:
In [63]:
messages
# Group by day of the week and hour to get message density
user_heatmap_data = messages.groupby(['day_of_week', 'username']).size().unstack(fill_value=0)
user_heatmap_data
# Reorder days to start from Monday
day_order = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']
user_heatmap_data = user_heatmap_data .reindex(day_order)
# Plotting the heatmap
plt.figure(figsize=(10, 6))
sns.heatmap(user_heatmap_data , cmap='Blues', annot=False)
plt.title('Message Density by Day of Week and username')
plt.xlabel('username')
plt.ylabel('Day of Week')
# Show the plot
plt.show()
Out[63]:
In [0]:
messages
# Group by day of the week and hour to get message density
user_heatmap_data = messages[:.groupby(['day_of_week', 'username']).size().unstack(fill_value=0)
user_heatmap_data
# Reorder days to start from Monday
day_order = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']
user_heatmap_data = user_heatmap_data .reindex(day_order)
# Plotting the heatmap
plt.figure(figsize=(10, 6))
sns.heatmap(user_heatmap_data , cmap='Blues', annot=False)
plt.title('Message Density by Day of Week and username')
plt.xlabel('username')
plt.ylabel('Day of Week')
# Show the plot
plt.show()
In [83]:
filtered_messages = messages[messages['create_at'].dt.year>2022]
#filtered_messages['create_at_date'] = filtered_messages['create_at'].dt.date()
filtered_messages["create_at"] = fltered_messages
Out[83]:
pandas.core.series.Series
In [0]: