Diaspora* Too many open files

For a little while now, I have occasionally noticed that my pod will not send emails out when it's supposed to. If I navigated to the Sidekiq monitor in the admin area and reprocessed all of the Dead jobs (or at least the email-related jobs), the emails would come through. Upon further inspection, I found that it was due to a 'Too many open files' error. This post originally started as a post on my pod, so I'll quote that first:

I kept finding ‘Too many open files’ in my Sidekiq dead jobs and in the logs. They were mostly related to getaddrinfo (169 entries) or sendmail (223 entries). Also, emails would fail to send somewhat often from my pod. Took a look at this wiki entry (and the linked article) and hoping I will stop seeing those ‘too many open files’ entries in the logs.

Side note: I didn’t find any ‘…open files’ entries in sidekiq.log itself; I ended up running grep "Too many open files" log/*.log to find the entries.
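
For reference, per-error tallies like the getaddrinfo/sendmail counts above can be pulled out with a quick grep pipeline; the log path assumes the stock diaspora* layout:

$ grep -h "Too many open files" log/*.log | grep -c "getaddrinfo"
$ grep -h "Too many open files" log/*.log | grep -c "sendmail"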

Wiki entry:

Check for out of open files errors

In the diaspora* app folder, do grep "Too many open files" log/sidekiq.log and check for results relating to the system user running the diaspora server running out of open files. If you get any errors, it is possible sidekiq can be brought down by the same problem. Follow for example this guide to increase the limit of open files available to the user running diaspora*.
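
For what it's worth, raising a user's open file limit the way that guide describes generally comes down to a couple of lines in /etc/security/limits.conf (plus a fresh login for the user). The poduser name and the 524288 value match what shows up later in this post, but treat this as a sketch rather than my exact config:

poduser    soft    nofile    524288
poduser    hard    nofile    524288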

There are a handful of comments in the post as well; read them as you wish... One thing I do want to point out, though: Jason Robinson mentioned that my pod is the second one he knows of that has had this problem, so it doesn't appear that this is currently a widespread issue.

That fix went in and, a little later, I went to bed. The next day after work, I dug into the production logs again and found more 'Too many open files' entries, roughly 18 hours after putting in the fix. I did a little checking: the diaspora user had the limits that I set, but the actual diaspora process was being limited to something much lower. Grr.

Pod Setup:

  • OS: CentOS 7
  • Init: systemd (based on this service file)
  • Backend: MySQL (hosted on a different host)
  • Branch: develop

Diagnosis:

After making the above changes, I could see that the hard and soft limits for the diaspora user were what I had set.

# su - poduser -c 'ulimit -aHS' -s '/bin/bash'
...
open files                      (-n) 524288
...

However, looking at the actual parent PID for the pod, I found that the limit was quite a bit lower than what I had set. I found the PID with:

# systemctl status diaspora.service
...
Main PID: 30264 (bash)
...

and then:

# cat /proc/30264/limits 
Limit             Soft Limit   Hard Limit   Units
...
Max open files    1024         4096         files
...
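
As a convenience, those two steps can be squashed into a single line. This leans on the MainPID= property that systemctl show prints on this CentOS 7 systemd, and on the unit being named diaspora.service:

# grep "open files" /proc/$(systemctl show -p MainPID diaspora.service | cut -d= -f2)/limits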

Hmm, it seems that the limits here are too damn low. So, off I went on a web search and came across this article. Way down in the comments, I found my answer.

Fix:

The systemd service file needed a little edit within the [Service] section:

LimitNOFILE=524288
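
One gotcha worth noting: after editing a unit file, systemd needs to re-read it before the change takes effect, so on a systemd box the restart is really the usual pair of commands:

# systemctl daemon-reload
# systemctl restart diaspora.service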

After restarting the pod and running the cat /proc/XXX/limits command again:

# cat /proc/XXX/limits 
Limit             Soft Limit   Hard Limit   Units
...
Max open files    524288       524288       files
...

Bam! Now things are looking better. I may still adjust the various limits as time goes on. Hopefully this fixes the issue for good but, at the time of this writing, it's too early to tell. I'll update this post if it doesn't. If it does help, w00t!

Now it's time to crack open a cold one. Cheers!

EDIT: I'm posting this here mostly for my own benefit. Here's a bash script that will email me the current open file count for the diaspora user: link
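
The script isn't reproduced here, but the idea is simple enough that a minimal sketch looks roughly like this; poduser, the recipient address, and the subject line are placeholders, and it assumes lsof and a working mail command on the box:

#!/bin/bash
# Count the files currently held open by the pod's system user and mail the number out.
# "poduser" and the recipient address are placeholders for this sketch.
COUNT=$(lsof -u poduser 2>/dev/null | wc -l)
echo "Open files for poduser: ${COUNT}" | mail -s "diaspora* open file count" admin@example.com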

EDIT 2015.08.02: After watching this open file issue for a bit, I found that the bulk of the "open files" were sidekiq connections hanging out in CLOSE_WAIT status. To get around this, I wrote a little bash script to restart sidekiq and a cron entry to run it daily. With the current state of Diaspora, I am only using the stop function of the bash script and letting the main D* process restart sidekiq on its own.
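
That script isn't shown in this post either, but spotting the stuck connections is roughly lsof -u poduser | grep CLOSE_WAIT, and the stop-only flavor described above would boil down to something like the sketch below. The poduser name, the pkill pattern, the script path, and the cron schedule here are all placeholders:

#!/bin/bash
# Gracefully stop the pod user's sidekiq workers; with the current state of
# diaspora*, the main process brings a fresh sidekiq back up on its own.
stop() {
    pkill -TERM -u poduser -f sidekiq
}

case "$1" in
    stop) stop ;;
    *)    echo "usage: $0 stop" ;;
esac

And a daily cron entry along these lines:

30 4 * * * /home/poduser/bin/sidekiq_restart.sh stop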


Have a response to this post? Please use this link.