Bits and thoughts

#!/bin/bash is not rude

self-hosting

Adapt bayesian filters for spam on server

Written by ⓘⓓⓔⓝⓣⓛⓤⓓ - -

Once spam detection is enabled through spamassassin on the server, it would be great to have spamassassin learn what is spam and what is not.

One way to do that is to write a simple script that can be run manually or through a crontab entry:

#!/usr/bin/env bash
# Train spamassassin's bayesian filter from the IMAP folders of a Maildir.
basedir=Maildir
for dir in "$basedir"/.*
do
        processedPart=$(basename "$dir")
        if [ "$processedPart" != "." ] && [ "$processedPart" != ".." ]
        then
                echo -n "Processing $processedPart ... "
                if [ "$processedPart" == ".Spam" ] || [ "$processedPart" == ".Pourriels" ]
                then
                        echo as SPAM
                        # {cur,new} is not expanded inside quotes, so both directories are named explicitly
                        sa-learn --spam "$dir/cur" "$dir/new"
                else
                        echo as ham
                        sa-learn --ham "$dir/cur" "$dir/new"
                fi
        fi
done

Of course ".Spam" and ".Pourriels" are my own choices of IMAP folders to contain spam. Update this script with your own preferences.
I have placed this script in a "spam" file in ~/bin to run it manually.
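
To make the folder names easier to adapt, the spam/ham decision could be pulled out into a small function. This is only a sketch: the name classify_folder and the SPAM_FOLDERS variable are invented for illustration and are not part of spamassassin.

```shell
#!/usr/bin/env bash
# Hypothetical helper: decide whether a Maildir folder holds spam or ham.
# SPAM_FOLDERS is an assumed, space-separated list; adapt it to your setup.
SPAM_FOLDERS=".Spam .Pourriels"

classify_folder() {
        local folder
        for folder in $SPAM_FOLDERS
        do
                if [ "$1" = "$folder" ]
                then
                        echo spam
                        return
                fi
        done
        echo ham
}
```

Inside the loop it could then be used as sa-learn "--$(classify_folder "$processedPart")" "$dir/cur" "$dir/new". For the crontab route, an entry such as "0 3 * * * $HOME/bin/spam" would run the script every night at 03:00 (the schedule is an arbitrary choice).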



GNUSocial on RaspberryPi

Written by ⓘⓓⓔⓝⓣⓛⓤⓓ - -

After a few days of running GNUSocial on my RaspberryPi, I came to the conclusion that the amount of data it pushes through mysql is too much for the RaspberryPi's tiny memory.

The GNUSocial database became corrupted beyond recovery. I had to move it out of mysql's way and restart mysqld.

My previous server is still running. I simply lost 4 days of messages from my dear colleagues ...

Changing GNUSocial log files

Written by ⓘⓓⓔⓝⓣⓛⓤⓓ - -

GNUSocial uses syslog to write its logs while it runs. Since nothing more specific is configured, these logs all go to /var/log/syslog.

Mixing sub-systems

This is also where other subsystems write their own logs.
awk '{print $5}' /var/log/syslog \
| sed -e 's/[0-9].//g' \
| tr -d '[]' \
| sort -u
gives this result :
kernel:
named:
postfix/anvil:
postfix/cleanup:
postfix/local:
postfix/master:
postfix/pickup:
postfix/qmgr:
postfix/smtp:
postfix/smtpd:
rsyslogd:
statusnet:
/USR/SBIN/CRON:
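
To see how much each subsystem actually writes, the same pipeline can be extended to count lines instead of only listing names. A minimal sketch, assuming the traditional syslog layout where the fifth field is the program tag; the function name count_by_subsystem is hypothetical.

```shell
#!/usr/bin/env bash
# Count syslog lines per subsystem (field 5 of the traditional syslog format).
# count_by_subsystem is a hypothetical name; it reads log lines on stdin.
count_by_subsystem() {
        awk '{print $5}' \
        | sed -e 's/\[[0-9]*\]//' \
        | sort \
        | uniq -c \
        | sort -rn
}
```

It would be called as count_by_subsystem < /var/log/syslog, and prints the busiest subsystem first.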

Configure GNUSocial to use its own log file

I wanted to have the GNUSocial logs separated from those of the other subsystems. I followed the documentation and added two lines to config.php (I did that when it was still status.net):
#logs
$config['site']['logdebug'] = false;
$config['site']['logfile'] = '/var/log/statusnet/statusnet.log';
The first attempt to write a log failed ... nothing was logged in /var/log/statusnet/statusnet.log. It looked like a user permissions problem, so I gave the www-data group write access (www-data is the group of the www-data user who runs Apache):
chown -R root:www-data /var/log/statusnet
chmod g+w /var/log/statusnet/statusnet.log
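
A quick way to confirm that the fix took is to test the group-write bit on the log file. This is a sketch only; the function name group_can_write is made up for the example.

```shell
#!/usr/bin/env bash
# Hypothetical check: succeed when the group-write bit is set on a file.
# find prints the path only if all bits of mode 020 (group write) are set.
group_can_write() {
        [ -n "$(find "$1" -maxdepth 0 -perm -020 2>/dev/null)" ]
}
```

For instance: group_can_write /var/log/statusnet/statusnet.log && echo "group can write". Where sudo is available, sudo -u www-data test -w /var/log/statusnet/statusnet.log checks the same thing from the point of view of the Apache user.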

Limit the growth

Now that the GNUSocial logs should be on their own, how can I limit their disk usage over time? On a linux system the answer is to rotate them with logrotate. So I wrote a logrotate configuration file for these logs in /etc/logrotate.d/statusnet
/var/log/statusnet/*.log
{
        rotate 7
        daily
        missingok
        notifempty
        delaycompress
        compress
        create 660 root www-data
}
This reads as follows:

  • /var/log/statusnet/*.log : consider every file named *.log in the /var/log/statusnet directory
  • rotate 7,  daily,  missingok,  notifempty,  delaycompress,  compress : rotate the log every day and keep the seven latest rotated files. If nothing was written to the log file, do nothing (notifempty), and do not complain if there is no file at all (missingok). Rotated files are compressed so that they take even less room, except the most recent one, whose compression is delayed until the next rotation (delaycompress).
  • create 660 root www-data : every new log file is created with owner root and group www-data, so that both root and the members of the www-data group can read and write it
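
Before trusting the daily run, the configuration can be exercised by hand. logrotate's -d flag (debug) prints what would be done without touching any file. The wrapper name below is invented for the example, and the default path is the file described above.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: show what logrotate would do, without rotating anything.
# With no argument it checks the statusnet configuration file.
dry_run_rotation() {
        logrotate -d "${1:-/etc/logrotate.d/statusnet}"
}
```

Run as root, dry_run_rotation lists the log files it considered and what it would do with each. Once the output looks right, logrotate -f /etc/logrotate.d/statusnet forces a real rotation immediately.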

Strange behaviour

With the aforementioned configuration of config.php I expected not to see any more debug logs, and I expected to see every log in /var/log/statusnet/statusnet.log ... Well, that's not what happens:
  • /var/log/syslog still contains GNUSocial logs, but far fewer than before. I can read daemon messages and every message stating that a PHP call is "Including config file: /var/www/statusnet/config.php"
  • LOG_DEBUG messages are in /var/log/statusnet/statusnet.log
After discussing it with @jpope, I configured the LogFilter plugin:
addPlugin('LogFilter', array(
    'priority' => array(LOG_ERR => true,
                        LOG_INFO => true,
                        LOG_DEBUG => false),
    'regex' => array('/About to push/i' => false,
                     '/Including config file/i' => false,
                     '/Successfully handled item/i' => false)
));
But these lines are still printed out. A quick search indicates that ./lib/statusnet.php is the culprit:
# find . -name "*.php" -exec grep -H "Including config file" '{}' \;
./config.php:      '/Including config file/i' => false,
./lib/statusnet.php:    common_log(LOG_INFO, "Including config file: " . $_config_file);
My next step was to change this message's priority to LOG_DEBUG by editing ./lib/statusnet.php ... to no avail, for the message was still there ...

Occam's razor is still sharp:
replace
common_log(LOG_INFO, "Including config file: " . $_config_file);
by
//common_log(LOG_INFO, "Including config file: " . $_config_file);
in ./lib/statusnet.php

Et voilà ...
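
The same edit can be scripted, which helps if it has to be redone later. A minimal sketch with sed, keeping a backup copy of the file; the function name silence_config_log is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: comment out the "Including config file" log call.
# The original file is kept next to the edited one with a .bak suffix.
# Lines that are already commented out are left untouched.
silence_config_log() {
        sed -i.bak \
                -e 's|^\([[:space:]]*\)\(common_log(LOG_INFO, "Including config file: "\)|\1//\2|' \
                "$1"
}
```

It would be called as silence_config_log ./lib/statusnet.php from the GNUSocial directory. Since this changes a file shipped with GNUSocial, the edit probably has to be reapplied after an upgrade.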

Self-Hosting on Raspberry Pi

Written by ⓘⓓⓔⓝⓣⓛⓤⓓ - -

I recently bought a RaspberryPi v2.

My goal is to self-host some of my services:
  • email
  • GNUSocial instance
  • XMPP server
  • This blog !!

I had to migrate all these services from my previous site, hosted at OVH. I won't bother you much with the details, but it took me two and a half days to migrate it all.

I started with my email server because I like to start with difficult things. Then came my GNUSocial instance. The XMPP server was a piece of cake (provided you also bring over what was configured in /var/lib/prosody ...).

For now it runs smoothly and very, very silently.

The largest memory hog is mysql of course !

Fail2ban Analysis

Written by ⓘⓓⓔⓝⓣⓛⓤⓓ - -


Having restarted my fail2ban daemon on 2014-12-17, I was curious about how the bans are distributed over time and which countries they come from. I crafted a small bash one-liner that gives me the raw data:
for ipadd in $(zgrep Ban fail2ban.log.* | awk '{print $NF}' | sort -u); \
do \
zgrep $ipadd fail2ban.log.* | awk -F':' '{print $2 $NF}' ; \
done | awk '{print $1" " $NF}' | \
grep -v '2014-12-16\|2014-12-15\|2014-12-14\|2014-12-13\|2014-12-12\|2014-12-11\|2014-12-10\|2014-12-09\|2014-12-08\|2014-12-07' \
| sort | while read data; \
do \
ipaddr2=$(echo "$data" | awk '{print $2}'); \
country=$(whois "$ipaddr2" | grep -Ei "^country:" | awk '{print $NF}' | sort -u); \
echo $data $country; \
done
It represents a total of 270 bans in almost three weeks. The IP addresses without a country code are those for which whois returned Korean UTF-8 content that I couldn't parse. But they are all Korean (I checked them manually).

[Figure: IP address bans by day]

China comes first with 97 bans. It is followed by the United States of America (50 bans) and then Germany (30 bans).
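
Totals like these can be derived from the raw data with two short pipelines. A sketch, assuming the output of the one-liner was saved to a file and that each "date ip country" line stands for one ban; the function names and the "??" label for a missing country code are choices made for the example.

```shell
#!/usr/bin/env bash
# Both helpers read "date ip [country]" lines on stdin, the format printed
# by the one-liner. Function names and the "??" label are hypothetical.

# Bans per country, most frequent first; a missing code is shown as "??".
bans_per_country() {
        awk '{print (NF >= 3 ? $3 : "??")}' | sort | uniq -c | sort -rn
}

# Bans per day, in chronological order.
bans_per_day() {
        awk '{print $1}' | sort | uniq -c
}
```

With the raw data in a file named bans.txt (an assumed name), bans_per_country < bans.txt gives the ranking by country and bans_per_day < bans.txt gives the daily counts.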