bash to resolve dependencies

Ambari – user can log in but has no privileges

We ran into some interesting issue – user has all the privileges, was in correct ldap group, can log in to the ambari, but privileges weren’t effective.

I did multiple discoveries – checked log ambari-server.log and found out following

So, tried to restart ambari-server, didn’t work. Did further investigation, using ambari API

so, this worked. Notice uppercase in user_name. So I tried to fetch user info with privileges

Nice. Works too. So another check – load only certain privileges – 153.

Aha! Looks like there’s some inconsitency in the postgres database. So, check it!

Gotcha. There’s no such user named myuser in the database – and pg is case sensitive. So, rename the user

and restart ambari-server. Now we can try again the api call


Side by side images

Quick script. As I’ve got FLIR images of my house, I wanted to have them side by side – original image + FLIR. So I wrote some small (fast) script in bash to do that.

And the resulting image:


sidekiq getting stuck?

We had quite strange scenario. Two hosts with sidekiq, one working well, second one gets stuck after few seconds or minutes. Happened in workers with opening new TCP/UDP connections. So, I started to review all the files, where ‘max files’ is set.

So, all this looks OK so far. After numerous checks I’ve found, that our eye  process is running for a while, maybe since the beginning and probably it took old file-max  settings. How to check? Simply run eye info  or get PID of the sidekiq  process and run following command (assume 22613 is the PID):

Gotcha! 1024 as soft-limit, 4096 hard limit. The fix is pretty simple – just quit eye , load it’s config and restart sidekiq.

voila! We’re back on track :)

sendmail + osx 10.10.4/ios 8.4/ios 9

After upgrade I was unable se to send mails through my sendmail, from both OSX 10.10.4 and upgraded iOS9 …

this was in my sendmail log:

fix is amazingly easy and effective:

generate dh2048 certificate:

and add to

then recompile your .mc and restart sendmail.

Optimizing fluentd

We’re currently using (for one part of our infrastructure) logging into elasticsearch. We have fluentd collectors and kibana interface for viewing and searching through the logs. fluentd This is how it works. Logs are sent to fluentd forwarder and then over the network to fluentd collector, which pushes all the logs to elasticsearch. As we have plenty of logs, we need to incorporate some buffering – on both sides – using buffer_file statement in the fluentd config. Here is a part of our fluentd config from forwarder

and the same for the collector

So. For the forwarder, we’re using buffer with max 4096 8MB chunks = 32GB of buffer space. Forwarder is flushing every 10secs. For collector, we use bigger chunks, as elasticsearch is capable to handle it – but not using default 256MB chunks due to memory limitations. Flushing period is longer – and should be – recommended value is 5minutes. We can keep up to 64Gigs of buffer data.

What happens if one of the fluentd dies. Some data will be probably lost, when unsaved to buffer. But. When there’s connection lost or collector fluentd isn’t running, all logs, collected by forwarder, are stored into the buffer – and sent later. Which is great. The same when ES is down for some reason, collector node is still receiving data and is able to continue sending into ES after full recovery.

PS: don’t forget to make some tweaks to the system itself, like raise the limit for max files opened and some tcp tunning.

Securing kibana + elasticsearch

After some successful setup of Kibana + es for fluentd there’s a need to secure whole website. So I decided to use nginx and basic auth. I assume you have standard configuration – with es running on localhost:9200.

and now modify nginx config:

Duplicity – BackendException: ssh connection to server:22 failed: Unknown server

Booom! After reinstall of one of our servers I got into this. Weird error. It’s caused by paramiko. There’s no code fix available, but reason is simple – and fix too.

Connect to your box, and simply remove two files from /etc/ssh directory

so, remove these two files:

Then clean up ~/.ssh/known_hosts  file on the box your’re running backup from

connect using ssh to backup server from that host (to write id_rsa keys into known_hosts file)

and run duplicate again.

voila! :)

redis sentinel setup


  • multiple clients with redis 2.8.2+ installed

Do I need sentinel? If you want to have some kind of redis failover (there’s no cluster yet) – yes. Sentinels continuously monitor every redis instance and change configuration of given redis node(s) – if specified number of sentinels decided whether master is down, then they elect and promote new master and set other nodes as a slave of this master.

Looks interesting? Yes. It is. But. There’s a little time gap between electing and switching to the new master. You have to resolve this on application level.

Basically. Initial setup expects all nodes running as a master, with manual set slaveof ip port in redis-cli on meaned redis slaves. Then run sentinel and it does the rest.

sample redis configururation files follow:

and sentinel configuration file:

Start all of your redis nodes with redis config and choose master. Then run redis console and set all other nodes as a slave of given master, using command slaveof 6379. Then you can connect to your master and verify, if there are all of your slave nodes, connected and syncing – run info command in your master redis console. Output should show you something like this

To test, if your sentinel works, just shutdown your redis master and watch sentinel log. You should see something like this

explained line by line

master is subjectively down (maybe)

master is objectively down (oh, really), two of four sentinels have the same opinion

so we switch to another master – chosen

reconfigure as a slave of new master

sorry, former master, you have to serve as a slave now

+sdown, -odown? + means ‘is’, – means ‘is no longer’. Then “+sdown” can be translated as “is subjectively down” and “-odown” like “is no longer objectively down”. Simple, huh? :)

PS: take my configuration files as a sample. Feel free to modify to match your need and check redis/sentinel configuration docs to get deeper knowledge about configuration options.

rails + passenger + nginx maintenance mode

I need to add maintenance page to some rails app, running with passenger and nginx. Here’s some config and steps.

You just need to add static html file to app_root/public/maintenance.html – and I assume css files on /assets url.

so, here’s nginx config:

setting maintance mode is really simple – set app_root/tmp/maintenance.txt file – when escaping, just remove that file.