Mattermost Lunch Channel History¶

  • Data Source: Mattermost API, CCTB instance
  • Tasks:
    • Part I - June 2024: retrieving chat history data through the mattermost API
    • Part II - September 2024: analyzing messages in the lunch channel
  • Language: python

Potential questions¶

  • At what times is usually posted in the lunch channel
  • Extract "the main lunch message" (MLM)
  • How many reactions does MLM retrieve
    • Is there a correlation by day of week
    • Is there a trend over time
  • Who posts MLM most often?
  • Who attends most often without posting MLM?
  • For an individal user get the attendance over time
  • …
In [4]:
import numpy as np
In [5]:
np.random.seed(42)
" → ".join(np.random.permutation("Dominik Magdalena Felix Joel Robin".split()))
Out[5]:
'Magdalena → Robin → Felix → Dominik → Joel'

1. Data loading¶

Load files:

  • messages.csv
  • reactions.csv
  • files.csv
In [8]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

table = pd.read_csv('messages.csv')
sns.displot(table.create_at)
6934
In [20]:
table
Out[20]:
post_id create_at message username num_reactions num_files
0 h957oi1t63rf5b6p4pqfhiis5h 2016-05-02 12:02:13.000 Also ich habe etwas dabei greatfireball 0 0
1 rpwuii8hci8ezmn3d41u9d58fc 2016-05-02 12:03:02.000 ich gehe mir jetzt was holen nterhoeven 0 0
2 e8bnwq8c7fdf5k7zpqyeihougr 2016-05-02 12:03:24.000 Alles klar greatfireball 0 0
3 sku34dhpntdfdmc5d9ky9a9yso 2016-05-02 12:20:33.000 Hallo @joerg greatfireball 0 0
4 hi38ssjpb3yqz88dk83tzjq5zr 2016-05-02 12:20:35.000 Essen? greatfireball 0 0
... ... ... ... ... ... ...
6929 bztf3jo6q3rqbkpmabmbonhfde 2024-08-29 11:05:25.020 when? felixfink 0 0
6930 pzcs4g7f3pgi7g35d6f4hwnp7w 2024-08-29 11:06:41.125 11:15? sascha_ollertz 2 0
6931 d3dxbcsmyi8i7rae83hn6m6aue 2024-08-30 11:09:52.213 lunch 11:15? felixfink 2 0
6932 gnamzgc7gprzzch6kpio9emwjy 2024-09-02 11:01:02.743 11:15? felixfink 5 1
6933 mputf8b3mpygu8td7hr9nmnw3r 2024-09-03 11:00:22.944 11:15? felixfink 7 1

6934 rows × 6 columns

In [12]:
table.dtypes
time = pd.to_datetime(table.create_at)
time
Out[12]:
0      2016-05-02 12:02:13.000
1      2016-05-02 12:03:02.000
2      2016-05-02 12:03:24.000
3      2016-05-02 12:20:33.000
4      2016-05-02 12:20:35.000
                 ...          
6929   2024-08-29 11:05:25.020
6930   2024-08-29 11:06:41.125
6931   2024-08-30 11:09:52.213
6932   2024-09-02 11:01:02.743
6933   2024-09-03 11:00:22.944
Name: create_at, Length: 6934, dtype: datetime64[ns]
In [14]:
sns.displot(data=time)
Out[14]:
<seaborn.axisgrid.FacetGrid at 0x7f664c0c8940>
Out[14]:
No description has been provided for this image
In [24]:
table.query("create_at>='2024-07'")
Out[24]:
post_id create_at message username num_reactions num_files
6880 ywkwbkbbp3rx8prdk6gzqnm9ih 2024-07-01 10:13:45.680 11:15? :eyes: marko_korb 3 0
6881 wmgxbhn6p3de7mb7btkhp845xw 2024-07-02 10:57:56.224 11:15? robin 0 1
6882 d4kx9urq83geu89s9aq6uuyupr 2024-07-05 11:04:10.014 11:15? robin 2 1
6883 6fj37shripnhtkxgf3r4bqb5my 2024-07-08 10:13:47.192 11:15? :eyes: marko_korb 2 0
6884 9ky97nqcptnopf5b6ctmcxj4wr 2024-07-12 11:29:11.707 Now? 👀 marko_korb 0 0
6885 6oeo59xiw7gmdb141qf4scc8fy 2024-07-15 10:57:40.233 11:15? felixfink 4 1
6886 wdtwdze7offdjcxrmmejdksgxr 2024-07-15 15:32:38.168 15:45? felixfink 2 1
6887 66er8cjy1j853dtdfywu319tge 2024-07-16 10:49:29.438 11:15? felixfink 2 1
6888 j48cnpyk9bynp8nuaczdtz16dh 2024-07-17 12:11:01.277 Now iimog 2 0
6889 y1k7i6upntnupj9xptre6kd1iy 2024-07-19 11:05:51.221 anyone here for lunch? felixfink 0 0
6890 kynf489tqpb4mrk3ao31pxpqro 2024-07-22 10:58:42.709 Anyone here for lunch? sabine 2 0
6891 e9ugfip8gtb9uyzrhsfiuo8yre 2024-07-22 11:01:37.332 NaN felixfink 0 1
6892 yd14wdtkojbzifqifi6daipmee 2024-07-22 11:07:21.895 Ah totally forgot to mention in the morning...... chaitanya 2 0
6893 r7d6o8isgpy1xdw1cihh8mx4aw 2024-07-22 11:14:27.887 ah, btw: 11:15 as usual? felixfink 1 0
6894 4dk3qq8kubyg7dum3isioasr3y 2024-07-23 10:48:44.299 11:15? robin 2 1
6895 7j4r9zhjctnp5ruibup1rmazye 2024-07-23 11:02:46.237 NaN felixfink 2 1
6896 4t3k8eb6kbf6i87e9q9w87kdba 2024-07-25 11:00:19.409 11:15? felixfink 1 0
6897 pidtbmc4qjgb9fratwco735osy 2024-07-26 10:19:37.512 Lunch 1130/1145? felixfink 1 0
6898 qrfkbiypbfy5bynxaxhugce1sa 2024-07-26 11:27:54.980 I'll meet you at the Mensa felixfink 0 0
6899 nhhnkzhye7n4i8zyqw3zyzfsac 2024-07-26 11:36:24.608 coming david 0 0
6900 hmj85r9r6byh7kpze9fmq7tr9y 2024-07-29 10:58:06.289 11:15? felixfink 2 1
6901 3krp8is4ob8axp5cr4115on59w 2024-07-30 10:24:09.483 11:15? david 1 0
6902 d6fwmmuzctgg3gx4ybw4dh4s4w 2024-07-30 11:54:05.221 @all We have another cooking session tomorrow,... janablechschmidt 0 0
6903 pexikanbztr7ifm64eirofwh9h 2024-07-31 13:05:09.677 es sind 5,60€ :relaxed: paypal jblechschmidt1... janablechschmidt 0 0
6904 qqzrwj8dqjf47r1ctm8u43mc3c 2024-08-01 10:54:55.476 Anyone here for lunch? felixfink 1 1
6905 5g7mqrhfufdy9gbrgsb95dht7a 2024-08-01 11:16:38.914 11:15? felixfink 0 0
6906 ub4zm7unpbgdfbro7hwzxr6xjo 2024-08-02 10:51:12.938 11:15? robin 3 1
6907 36wteq5ef3fndpnxwf4re5m5jy 2024-08-05 10:56:17.167 11:15? felixfink 1 1
6908 qbraxfdgtjb77bf4tojqk811zw 2024-08-05 12:07:21.863 kitchen lunch anyone? janablechschmidt 0 0
6909 znq5zkuk8bnbupc4z616daomwa 2024-08-06 10:52:58.421 11:15? felixfink 1 0
6910 t8mnuirj7ingbfwihfh1fpbxra 2024-08-07 11:08:26.720 11:16? iimog 4 0
6911 cssnpiukwifd58iyw4dzssd7qa 2024-08-08 10:45:56.817 11:17? iimog 1 0
6912 nkm1mg51dty55b9ghzsppjcw4e 2024-08-12 11:03:00.166 NaN felixfink 1 1
6913 j4q6fu33ofyi3pt1y6q9q1k19o 2024-08-13 11:30:01.410 lunch anyone? felixfink 1 0
6914 4mtzmpezxibhmcp9eiaeisgxya 2024-08-13 11:30:45.467 11:35? felixfink 1 0
6915 c6zdx94jbtromkrk6y434sxyic 2024-08-13 12:06:39.640 kitchen lunch 12:15? janablechschmidt 0 0
6916 7fburkqmh7fa7q35iohuiz1iqe 2024-08-14 10:42:46.110 11:15? robin 2 1
6917 yyb41a6tpfyy3fbw8kmofnk8bw 2024-08-14 10:53:17.055 NaN felixfink 0 1
6918 j94zhyi3zibidgxb1ne19ofxuy 2024-08-14 10:56:28.369 NaN robin 0 1
6919 uwirpgowu7nqipfjkki64gaedy 2024-08-19 10:49:02.058 11:15? robin 4 1
6920 tsonz19q3jrm8pp3f8o5ccboto 2024-08-20 10:25:06.137 11:15? david 2 0
6921 nejc3nh1offzdp7bmgpnqe8ekr 2024-08-21 10:50:57.235 NaN felixfink 3 1
6922 i5wt4fnqopygzjxfuum8br19wr 2024-08-22 10:45:31.522 11:15? david 0 0
6923 bbfpcxrnq3g7fq95ftjor4f3ge 2024-08-22 10:47:39.847 NaN felixfink 3 1
6924 3mkddbhfftbmjecwinq7nuy6zw 2024-08-23 10:38:44.762 11:15? robin 2 1
6925 k7qp798w4t8hppx8myzn3x6khc 2024-08-26 11:00:43.699 11:15? felixfink 4 0
6926 o84wq6g8jjfipj8aqj5b1j761o 2024-08-27 10:37:48.163 11:15? david 4 0
6927 yqhxnkrcit8k8dywm75d1oyjiy 2024-08-28 10:35:52.100 11:15? felixfink 7 0
6928 jqnbok68gpd45mg3tg87gzda3a 2024-08-29 11:03:48.686 lunch? david 0 0
6929 bztf3jo6q3rqbkpmabmbonhfde 2024-08-29 11:05:25.020 when? felixfink 0 0
6930 pzcs4g7f3pgi7g35d6f4hwnp7w 2024-08-29 11:06:41.125 11:15? sascha_ollertz 2 0
6931 d3dxbcsmyi8i7rae83hn6m6aue 2024-08-30 11:09:52.213 lunch 11:15? felixfink 2 0
6932 gnamzgc7gprzzch6kpio9emwjy 2024-09-02 11:01:02.743 11:15? felixfink 5 1
6933 mputf8b3mpygu8td7hr9nmnw3r 2024-09-03 11:00:22.944 11:15? felixfink 7 1
In [19]:
table = table.assign(create_at=time)
df1 = table.groupby(table["create_at"].dt.to_period("M")).count()
print(df1)
           post_id  create_at  message  username  num_reactions  num_files
create_at                                                                 
2016-05        196        196      196       196            196        196
2016-06        145        145      145       145            145        145
2016-07        209        209      209       209            209        209
2016-08        282        282      282       282            282        282
2016-09        245        245      245       245            245        245
...            ...        ...      ...       ...            ...        ...
2024-05         16         16       16        16             16         16
2024-06         23         23       15        23             23         23
2024-07         24         24       22        24             24         24
2024-08         28         28       23        28             28         28
2024-09          2          2        2         2              2          2

[98 rows x 6 columns]
In [41]:
df2 = table['create_at'].dt.year
df2
Out[41]:
0       2016
1       2016
2       2016
3       2016
4       2016
        ... 
6929    2024
6930    2024
6931    2024
6932    2024
6933    2024
Name: create_at, Length: 6934, dtype: int32
In [38]:
 
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
/tmp/ipykernel_304/212619551.py in <cell line: 1>()
----> 1 df2.groupby(df2['create_at'].dt.year).agg(['mean'])

/usr/local/lib/python3.10/dist-packages/pandas/core/series.py in __getitem__(self, key)
   1038 
   1039         elif key_is_scalar:
-> 1040             return self._get_value(key)
   1041 
   1042         # Convert generator to list before going through hashable part

/usr/local/lib/python3.10/dist-packages/pandas/core/series.py in _get_value(self, label, takeable)
   1154 
   1155         # Similar to Index.get_value, but we do not fall back to positional
-> 1156         loc = self.index.get_loc(label)
   1157 
   1158         if is_integer(loc):

/usr/local/lib/python3.10/dist-packages/pandas/core/indexes/range.py in get_loc(self, key)
    416                 raise KeyError(key) from err
    417         if isinstance(key, Hashable):
--> 418             raise KeyError(key)
    419         self._check_indexing_error(key)
    420         raise KeyError(key)

KeyError: 'create_at'
In [0]:
 
In [39]:
df2.columns
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
/tmp/ipykernel_304/3936327964.py in <cell line: 1>()
----> 1 df2.columns

/usr/local/lib/python3.10/dist-packages/pandas/core/generic.py in __getattr__(self, name)
   6202         ):
   6203             return self[name]
-> 6204         return object.__getattribute__(self, name)
   6205 
   6206     @final

AttributeError: 'Series' object has no attribute 'columns'
In [55]:
df1 = df1.assign(create_at = df1.index.to_timestamp())

sns.displot(data=df1, x = 'create_at')
Out[55]:
<seaborn.axisgrid.FacetGrid at 0x7f6622015240>
Out[55]:
No description has been provided for this image
In [43]:
df1.columns
Out[43]:
Index(['post_id', 'create_at', 'message', 'username', 'num_reactions',
       'num_files'],
      dtype='object')
In [0]: