Mattermost Lunch Channel History¶
- Data Source: Mattermost API, CCTB instance
- Tasks:
- Part I - June 2024: retrieving chat history data through the mattermost API
- Part II - September 2024: analyzing messages in the lunch channel
- Language: python
Potential questions¶
- At what times is usually posted in the lunch channel
- Extract "the main lunch message" (MLM)
- How many reactions does MLM retrieve
- Is there a correlation by day of week
- Is there a trend over time
- Who posts MLM most often?
- Who attends most often without posting MLM?
- For an individal user get the attendance over time
- …
In [4]:
import numpy as np
In [5]:
np.random.seed(42)
" → ".join(np.random.permutation("Dominik Magdalena Felix Joel Robin".split()))
Out[5]:
'Magdalena → Robin → Felix → Dominik → Joel'
In [8]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
table = pd.read_csv('messages.csv')
sns.displot(table.create_at)
6934
In [20]:
table
Out[20]:
post_id | create_at | message | username | num_reactions | num_files | |
---|---|---|---|---|---|---|
0 | h957oi1t63rf5b6p4pqfhiis5h | 2016-05-02 12:02:13.000 | Also ich habe etwas dabei | greatfireball | 0 | 0 |
1 | rpwuii8hci8ezmn3d41u9d58fc | 2016-05-02 12:03:02.000 | ich gehe mir jetzt was holen | nterhoeven | 0 | 0 |
2 | e8bnwq8c7fdf5k7zpqyeihougr | 2016-05-02 12:03:24.000 | Alles klar | greatfireball | 0 | 0 |
3 | sku34dhpntdfdmc5d9ky9a9yso | 2016-05-02 12:20:33.000 | Hallo @joerg | greatfireball | 0 | 0 |
4 | hi38ssjpb3yqz88dk83tzjq5zr | 2016-05-02 12:20:35.000 | Essen? | greatfireball | 0 | 0 |
... | ... | ... | ... | ... | ... | ... |
6929 | bztf3jo6q3rqbkpmabmbonhfde | 2024-08-29 11:05:25.020 | when? | felixfink | 0 | 0 |
6930 | pzcs4g7f3pgi7g35d6f4hwnp7w | 2024-08-29 11:06:41.125 | 11:15? | sascha_ollertz | 2 | 0 |
6931 | d3dxbcsmyi8i7rae83hn6m6aue | 2024-08-30 11:09:52.213 | lunch 11:15? | felixfink | 2 | 0 |
6932 | gnamzgc7gprzzch6kpio9emwjy | 2024-09-02 11:01:02.743 | 11:15? | felixfink | 5 | 1 |
6933 | mputf8b3mpygu8td7hr9nmnw3r | 2024-09-03 11:00:22.944 | 11:15? | felixfink | 7 | 1 |
6934 rows × 6 columns
In [12]:
table.dtypes
time = pd.to_datetime(table.create_at)
time
Out[12]:
0 2016-05-02 12:02:13.000 1 2016-05-02 12:03:02.000 2 2016-05-02 12:03:24.000 3 2016-05-02 12:20:33.000 4 2016-05-02 12:20:35.000 ... 6929 2024-08-29 11:05:25.020 6930 2024-08-29 11:06:41.125 6931 2024-08-30 11:09:52.213 6932 2024-09-02 11:01:02.743 6933 2024-09-03 11:00:22.944 Name: create_at, Length: 6934, dtype: datetime64[ns]
In [14]:
sns.displot(data=time)
Out[14]:
<seaborn.axisgrid.FacetGrid at 0x7f664c0c8940>
Out[14]:
In [24]:
table.query("create_at>='2024-07'")
Out[24]:
post_id | create_at | message | username | num_reactions | num_files | |
---|---|---|---|---|---|---|
6880 | ywkwbkbbp3rx8prdk6gzqnm9ih | 2024-07-01 10:13:45.680 | 11:15? :eyes: | marko_korb | 3 | 0 |
6881 | wmgxbhn6p3de7mb7btkhp845xw | 2024-07-02 10:57:56.224 | 11:15? | robin | 0 | 1 |
6882 | d4kx9urq83geu89s9aq6uuyupr | 2024-07-05 11:04:10.014 | 11:15? | robin | 2 | 1 |
6883 | 6fj37shripnhtkxgf3r4bqb5my | 2024-07-08 10:13:47.192 | 11:15? :eyes: | marko_korb | 2 | 0 |
6884 | 9ky97nqcptnopf5b6ctmcxj4wr | 2024-07-12 11:29:11.707 | Now? 👀 | marko_korb | 0 | 0 |
6885 | 6oeo59xiw7gmdb141qf4scc8fy | 2024-07-15 10:57:40.233 | 11:15? | felixfink | 4 | 1 |
6886 | wdtwdze7offdjcxrmmejdksgxr | 2024-07-15 15:32:38.168 | 15:45? | felixfink | 2 | 1 |
6887 | 66er8cjy1j853dtdfywu319tge | 2024-07-16 10:49:29.438 | 11:15? | felixfink | 2 | 1 |
6888 | j48cnpyk9bynp8nuaczdtz16dh | 2024-07-17 12:11:01.277 | Now | iimog | 2 | 0 |
6889 | y1k7i6upntnupj9xptre6kd1iy | 2024-07-19 11:05:51.221 | anyone here for lunch? | felixfink | 0 | 0 |
6890 | kynf489tqpb4mrk3ao31pxpqro | 2024-07-22 10:58:42.709 | Anyone here for lunch? | sabine | 2 | 0 |
6891 | e9ugfip8gtb9uyzrhsfiuo8yre | 2024-07-22 11:01:37.332 | NaN | felixfink | 0 | 1 |
6892 | yd14wdtkojbzifqifi6daipmee | 2024-07-22 11:07:21.895 | Ah totally forgot to mention in the morning...... | chaitanya | 2 | 0 |
6893 | r7d6o8isgpy1xdw1cihh8mx4aw | 2024-07-22 11:14:27.887 | ah, btw: 11:15 as usual? | felixfink | 1 | 0 |
6894 | 4dk3qq8kubyg7dum3isioasr3y | 2024-07-23 10:48:44.299 | 11:15? | robin | 2 | 1 |
6895 | 7j4r9zhjctnp5ruibup1rmazye | 2024-07-23 11:02:46.237 | NaN | felixfink | 2 | 1 |
6896 | 4t3k8eb6kbf6i87e9q9w87kdba | 2024-07-25 11:00:19.409 | 11:15? | felixfink | 1 | 0 |
6897 | pidtbmc4qjgb9fratwco735osy | 2024-07-26 10:19:37.512 | Lunch 1130/1145? | felixfink | 1 | 0 |
6898 | qrfkbiypbfy5bynxaxhugce1sa | 2024-07-26 11:27:54.980 | I'll meet you at the Mensa | felixfink | 0 | 0 |
6899 | nhhnkzhye7n4i8zyqw3zyzfsac | 2024-07-26 11:36:24.608 | coming | david | 0 | 0 |
6900 | hmj85r9r6byh7kpze9fmq7tr9y | 2024-07-29 10:58:06.289 | 11:15? | felixfink | 2 | 1 |
6901 | 3krp8is4ob8axp5cr4115on59w | 2024-07-30 10:24:09.483 | 11:15? | david | 1 | 0 |
6902 | d6fwmmuzctgg3gx4ybw4dh4s4w | 2024-07-30 11:54:05.221 | @all We have another cooking session tomorrow,... | janablechschmidt | 0 | 0 |
6903 | pexikanbztr7ifm64eirofwh9h | 2024-07-31 13:05:09.677 | es sind 5,60€ :relaxed: paypal jblechschmidt1... | janablechschmidt | 0 | 0 |
6904 | qqzrwj8dqjf47r1ctm8u43mc3c | 2024-08-01 10:54:55.476 | Anyone here for lunch? | felixfink | 1 | 1 |
6905 | 5g7mqrhfufdy9gbrgsb95dht7a | 2024-08-01 11:16:38.914 | 11:15? | felixfink | 0 | 0 |
6906 | ub4zm7unpbgdfbro7hwzxr6xjo | 2024-08-02 10:51:12.938 | 11:15? | robin | 3 | 1 |
6907 | 36wteq5ef3fndpnxwf4re5m5jy | 2024-08-05 10:56:17.167 | 11:15? | felixfink | 1 | 1 |
6908 | qbraxfdgtjb77bf4tojqk811zw | 2024-08-05 12:07:21.863 | kitchen lunch anyone? | janablechschmidt | 0 | 0 |
6909 | znq5zkuk8bnbupc4z616daomwa | 2024-08-06 10:52:58.421 | 11:15? | felixfink | 1 | 0 |
6910 | t8mnuirj7ingbfwihfh1fpbxra | 2024-08-07 11:08:26.720 | 11:16? | iimog | 4 | 0 |
6911 | cssnpiukwifd58iyw4dzssd7qa | 2024-08-08 10:45:56.817 | 11:17? | iimog | 1 | 0 |
6912 | nkm1mg51dty55b9ghzsppjcw4e | 2024-08-12 11:03:00.166 | NaN | felixfink | 1 | 1 |
6913 | j4q6fu33ofyi3pt1y6q9q1k19o | 2024-08-13 11:30:01.410 | lunch anyone? | felixfink | 1 | 0 |
6914 | 4mtzmpezxibhmcp9eiaeisgxya | 2024-08-13 11:30:45.467 | 11:35? | felixfink | 1 | 0 |
6915 | c6zdx94jbtromkrk6y434sxyic | 2024-08-13 12:06:39.640 | kitchen lunch 12:15? | janablechschmidt | 0 | 0 |
6916 | 7fburkqmh7fa7q35iohuiz1iqe | 2024-08-14 10:42:46.110 | 11:15? | robin | 2 | 1 |
6917 | yyb41a6tpfyy3fbw8kmofnk8bw | 2024-08-14 10:53:17.055 | NaN | felixfink | 0 | 1 |
6918 | j94zhyi3zibidgxb1ne19ofxuy | 2024-08-14 10:56:28.369 | NaN | robin | 0 | 1 |
6919 | uwirpgowu7nqipfjkki64gaedy | 2024-08-19 10:49:02.058 | 11:15? | robin | 4 | 1 |
6920 | tsonz19q3jrm8pp3f8o5ccboto | 2024-08-20 10:25:06.137 | 11:15? | david | 2 | 0 |
6921 | nejc3nh1offzdp7bmgpnqe8ekr | 2024-08-21 10:50:57.235 | NaN | felixfink | 3 | 1 |
6922 | i5wt4fnqopygzjxfuum8br19wr | 2024-08-22 10:45:31.522 | 11:15? | david | 0 | 0 |
6923 | bbfpcxrnq3g7fq95ftjor4f3ge | 2024-08-22 10:47:39.847 | NaN | felixfink | 3 | 1 |
6924 | 3mkddbhfftbmjecwinq7nuy6zw | 2024-08-23 10:38:44.762 | 11:15? | robin | 2 | 1 |
6925 | k7qp798w4t8hppx8myzn3x6khc | 2024-08-26 11:00:43.699 | 11:15? | felixfink | 4 | 0 |
6926 | o84wq6g8jjfipj8aqj5b1j761o | 2024-08-27 10:37:48.163 | 11:15? | david | 4 | 0 |
6927 | yqhxnkrcit8k8dywm75d1oyjiy | 2024-08-28 10:35:52.100 | 11:15? | felixfink | 7 | 0 |
6928 | jqnbok68gpd45mg3tg87gzda3a | 2024-08-29 11:03:48.686 | lunch? | david | 0 | 0 |
6929 | bztf3jo6q3rqbkpmabmbonhfde | 2024-08-29 11:05:25.020 | when? | felixfink | 0 | 0 |
6930 | pzcs4g7f3pgi7g35d6f4hwnp7w | 2024-08-29 11:06:41.125 | 11:15? | sascha_ollertz | 2 | 0 |
6931 | d3dxbcsmyi8i7rae83hn6m6aue | 2024-08-30 11:09:52.213 | lunch 11:15? | felixfink | 2 | 0 |
6932 | gnamzgc7gprzzch6kpio9emwjy | 2024-09-02 11:01:02.743 | 11:15? | felixfink | 5 | 1 |
6933 | mputf8b3mpygu8td7hr9nmnw3r | 2024-09-03 11:00:22.944 | 11:15? | felixfink | 7 | 1 |
In [19]:
table = table.assign(create_at=time)
df1 = table.groupby(table["create_at"].dt.to_period("M")).count()
print(df1)
post_id create_at message username num_reactions num_files create_at 2016-05 196 196 196 196 196 196 2016-06 145 145 145 145 145 145 2016-07 209 209 209 209 209 209 2016-08 282 282 282 282 282 282 2016-09 245 245 245 245 245 245 ... ... ... ... ... ... ... 2024-05 16 16 16 16 16 16 2024-06 23 23 15 23 23 23 2024-07 24 24 22 24 24 24 2024-08 28 28 23 28 28 28 2024-09 2 2 2 2 2 2 [98 rows x 6 columns]
In [41]:
df2 = table['create_at'].dt.year
df2
Out[41]:
0 2016 1 2016 2 2016 3 2016 4 2016 ... 6929 2024 6930 2024 6931 2024 6932 2024 6933 2024 Name: create_at, Length: 6934, dtype: int32
In [38]:
--------------------------------------------------------------------------- KeyError Traceback (most recent call last) /tmp/ipykernel_304/212619551.py in <cell line: 1>() ----> 1 df2.groupby(df2['create_at'].dt.year).agg(['mean']) /usr/local/lib/python3.10/dist-packages/pandas/core/series.py in __getitem__(self, key) 1038 1039 elif key_is_scalar: -> 1040 return self._get_value(key) 1041 1042 # Convert generator to list before going through hashable part /usr/local/lib/python3.10/dist-packages/pandas/core/series.py in _get_value(self, label, takeable) 1154 1155 # Similar to Index.get_value, but we do not fall back to positional -> 1156 loc = self.index.get_loc(label) 1157 1158 if is_integer(loc): /usr/local/lib/python3.10/dist-packages/pandas/core/indexes/range.py in get_loc(self, key) 416 raise KeyError(key) from err 417 if isinstance(key, Hashable): --> 418 raise KeyError(key) 419 self._check_indexing_error(key) 420 raise KeyError(key) KeyError: 'create_at'
In [0]:
In [39]:
df2.columns
--------------------------------------------------------------------------- AttributeError Traceback (most recent call last) /tmp/ipykernel_304/3936327964.py in <cell line: 1>() ----> 1 df2.columns /usr/local/lib/python3.10/dist-packages/pandas/core/generic.py in __getattr__(self, name) 6202 ): 6203 return self[name] -> 6204 return object.__getattribute__(self, name) 6205 6206 @final AttributeError: 'Series' object has no attribute 'columns'
In [55]:
df1 = df1.assign(create_at = df1.index.to_timestamp())
sns.displot(data=df1, x = 'create_at')
Out[55]:
<seaborn.axisgrid.FacetGrid at 0x7f6622015240>
Out[55]:
In [43]:
df1.columns
Out[43]:
Index(['post_id', 'create_at', 'message', 'username', 'num_reactions', 'num_files'], dtype='object')
In [0]: