Philip Hutchins

Head in the cloud...

Elasticsearch Notes

Elasticsearch is a wonderful and powerful search and analytics engine. Operating an ES cluster or node is generally fairly easy and straightforward; however, there are a few situations where the resolution to seemingly common issues is not so clear. I will gather my notes and helper scripts here in an effort to help others understand and resolve these issues and configurations quickly.

Settings

Set number of replicas for all indices to 0

When you spin up a single-node cluster, the default setting for number of replicas is 1. This means that the cluster is going to try to create a second copy of each shard, which is not possible as you only have one node in the cluster. This keeps your (single node) cluster in yellow status, and it will never reach green. A node can function this way, but it is annoying not to see a green state when everything is actually healthy.

curl -XPUT 'localhost:9200/_settings' -d '{"index": { "number_of_replicas": 0 } }'
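Note that _settings only affects indices that already exist; an index created later will again default to 1 replica and turn the cluster yellow. One way around this, using the index template API from this era of Elasticsearch (the template name zero_replicas is arbitrary):

```shell
# Apply number_of_replicas: 0 as the default for all indices created later
curl -XPUT 'localhost:9200/_template/zero_replicas' -d '
{
  "template": "*",
  "settings": { "number_of_replicas": 0 }
}'
```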

Recovering

Ran out of disk

When you run out of disk, shards will not have been allocated and your cluster will likely be stuck in status RED. To recover, you need to find out which shards are unassigned and assign them manually.

Commands

Check your cluster's health and the status of unassigned shards

curl -XGET http://localhost:9200/_cluster/health?pretty=true

Display the indices' health

curl -XGET 'http://localhost:9200/_cluster/health?level=indices&pretty'

Display shards

curl -XGET 'http://localhost:9200/_cat/shards'
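To manually assign an unassigned shard, the cluster reroute API can be used. A sketch for the older (1.x/2.x-era) allocate command; the index, shard number and node name are placeholders taken from the _cat/shards output:

```shell
# Build a reroute request body for one unassigned shard.
# index, shard and node are hypothetical values from _cat/shards.
build_reroute_body() {
  local index="$1" shard="$2" node="$3"
  printf '{"commands":[{"allocate":{"index":"%s","shard":%s,"node":"%s","allow_primary":true}}]}' \
    "$index" "$shard" "$node"
}

# Example usage (values are hypothetical):
build_reroute_body logs-2016.05.03 0 node-1
# The body is then POSTed to the cluster:
# curl -XPOST 'localhost:9200/_cluster/reroute' -d "$(build_reroute_body logs-2016.05.03 0 node-1)"
```

Be careful with allow_primary: it can lose data if the primary shard held writes that were never replicated, so use it only when the data on disk is gone anyway.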


SSL Automation With LetsEncrypt in Kubernetes

Problem

When deploying services to Kubernetes, a certificate has to be injected into the container via a secret. It doesn't make sense to have each container renew its own certificates, as its state can be wiped at any given time.

Solution

Build a service within each Kubernetes namespace that handles renewing all certificates used in that namespace. This service would kick off the request to renew each cert at a predetermined interval. It would then accept all verification requests (GET requests to domain/.well-known/acme-challenge) and respond as necessary. After being issued the new certificate, it would recreate the appropriate secret containing that certificate and initiate a restart of any container or service necessary.
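A minimal sketch of one renewal iteration, assuming certbot and kubectl are available in the container; the domain, secret name, label selector and webroot path are all hypothetical:

```shell
#!/bin/sh
# Hypothetical renewal flow for one domain; all names are assumptions.
DOMAIN="example.com"
SECRET="example-com-tls"

# 1. Renew; certbot answers the HTTP-01 challenge via the nginx webroot
#    serving /.well-known/acme-challenge
certbot renew --webroot -w /usr/share/nginx/html

# 2. Recreate the Kubernetes secret from the renewed certificate
kubectl delete secret "$SECRET" --ignore-not-found
kubectl create secret tls "$SECRET" \
  --cert=/etc/letsencrypt/live/$DOMAIN/fullchain.pem \
  --key=/etc/letsencrypt/live/$DOMAIN/privkey.pem

# 3. Restart the consuming pods (deleting them lets the controller
#    recreate them with the new secret mounted)
kubectl delete pods -l app=my-web-app
```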

Spec

SSL Renewal Container

To automate the creation and renewal of certificates, we will need to create a container with LetsEncrypt (CertBot) to request creation or renewal of each certificate, Nginx to receive and confirm domain validation, and scripts to push the generated certificates to secrets in Kubernetes. This container will be deployed to Kubernetes as a DaemonSet and should run in each of your Kubernetes clusters.

Container Creation & Setup

  • Nginx
  • LetsEncrypt (CertBot)

Pushing Secrets

  • kubectl
  • Access?

Restarting Services

  • kubectl

Domain List Configuration

SSL Ingress in Kubernetes

Previously, to achieve SSL/TLS in Kubernetes, we had to set up some sort of SSL/TLS termination proxy. With the addition of a few new features in Kubernetes 1.2 Ingress, we're able to do away with the proxy and allow Kubernetes to handle this task.
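A sketch of what this looks like; the names, host and secret are hypothetical, and the manifest assumes the extensions/v1beta1 API that was current for Kubernetes 1.2:

```yaml
# Hypothetical Ingress terminating TLS with a pre-created secret
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: my-web-app
spec:
  tls:
  - hosts:
    - example.com
    secretName: example-com-tls
  rules:
  - host: example.com
    http:
      paths:
      - path: /
        backend:
          serviceName: my-web-app
          servicePort: 80
```

The referenced secret would be created beforehand, e.g. with kubectl create secret tls example-com-tls --cert=... --key=...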

Chef Provisioner SSL Errors

While setting up chef-provisioning to provision servers in Google Cloud, I ran into a pretty tricky bug which took a number of hours to troubleshoot.

Command I Was Running

chef-client -z elasticsearch-cluster.rb

The error…

Compiled Resource:
------------------
# Declared in /Users/philip/github/chef-storj/provisioners/elasticsearch-cluster.rb:21:in `from_file'

machine("elasticsearch-1") do
  action [:converge]
  retries 0
  retry_delay 2
  default_guard_interpreter :default
  chef_server {:chef_server_url=>"http://localhost:8889", :options=>{:api_version=>"0"}}
  driver "fog:Google"
  machine_options {:insert_options=>{:tags=>{:items=>["elasticsearch"]}, :disks=>[{:deviceName=>"elasticsearch-1", :autoDelete=>true, :boot=>true, :initializeParams=>{:sourceImage=>"projects/ubuntu-os-cloud/global/images/ubuntu-1404-trusty-v20150316", :diskType=>"zones/us-east1-b/diskTypes/pd-ssd", :diskSizeGb=>80}}, {:type=>"PERSISTENT", :mode=>"READ_WRITE", :zone=>"zones/us-east1-b", :source=>"zones/us-east1-b/disks/elasticsearch-1", :deviceName=>"elasticsearch-1"}]}, :key_name=>"google_default"}
  declared_type :machine
  cookbook_name "@recipe_files"
  recipe_name "/Users/philip/github/chef-storj/provisioners/elasticsearch-cluster.rb"
  run_list ["recipe[chefsj-elk::elasticsearch-1]"]
end

[2016-05-03T13:26:18-04:00] INFO: Running queued delayed notifications before re-raising exception

Running handlers:
[2016-05-03T13:26:18-04:00] ERROR: Running exception handlers
Running handlers complete
[2016-05-03T13:26:18-04:00] ERROR: Exception handlers complete
Chef Client failed. 0 resources updated in 04 seconds
[2016-05-03T13:26:18-04:00] FATAL: Stacktrace dumped to /Users/philip/.chef/local-mode-cache/cache/chef-stacktrace.out
[2016-05-03T13:26:18-04:00] FATAL: Please provide the contents of the stacktrace.out file if you file a bug report
[2016-05-03T13:26:18-04:00] ERROR: machine[elasticsearch-1] (@recipe_files::/Users/philip/github/chef-storj/provisioners/elasticsearch-cluster.rb line 21) had an error: Faraday::SSLError: SSL_connect returned=1 errno=0 state=SSLv3 read server certificate B: certificate verify failed
[2016-05-03T13:26:19-04:00] FATAL: Chef::Exceptions::ChildConvergeError: Chef run process exited unsuccessfully (exit code 1)

Testing SSL

Using the knife ssl check command, check the status of SSL between you and your Chef server.

Obtaining an Updated cert.pem

curl http://curl.haxx.se/ca/cacert.pem -o /usr/local/etc/openssl/cert.pem

The Problem

The precompiled versions of ruby from RVM are pointing at /etc/openssl/certs when looking for the CA certificate file. Newer versions of OSX have moved their certs to a different directory, possibly /usr/local/etc/openssl/certs if you've installed openssl from brew or some other source.

The Solution

  • Reinstall ruby from source: rvm reinstall 2.2.1 --disable-binary
  • Uninstall all the chef gems: gem uninstall chef chef-zero berkshelf knife-solo
  • Reinstall ChefDK


Bash Tricks and Shortcuts

Loops

Often you need to run the same task in bash against a number of different arguments. Loops in bash can make this very quick and easy.

One of the simplest ways you can do this is a one-liner, as follows:

$ for i in one two three four; do echo $i; done
one
two
three
four

You can also predefine an array to use later like this

files=( "/tmp/file_one" "/tmp/file_two" "/tmp/file_three" )
for i in "${files[@]}"
do
  echo $i
done

Or, to do this on one line

$ files=("/tmp/file_one" "/tmp/file_two" "/tmp/file_three" ); for i in "${files[@]}"; do echo $i; done
/tmp/file_one
/tmp/file_two
/tmp/file_three

You can use ranges with seq

for year in $(seq 2000 2013); do echo $year; done
2000
2001
2002
2003
2004
2005
2006
2007
2008
2009
2010
2011
2012
2013

If you need a counter you could do something like this

#!/bin/bash
## declare an array variable
declare -a array=("one" "two" "three")

# get length of an array
arraylength=${#array[@]}

# use for loop read all values and indexes
for (( i=1; i<${arraylength}+1; i++ ));
do
  echo $i " / " ${arraylength} " : " ${array[$i-1]}
done

File Permissions

There are a few shortcuts that make life easier when working with file and directory permissions. Here are a few.

When you want to recursively change permissions in a directory, you will want to change the file permissions separately from the directory permissions. You can accomplish this by using two different find commands piped to xargs as follows.

$ find . -type d -print0 | xargs -0 chmod 0755 # for directories
$ find . -type f -print0 | xargs -0 chmod 0644 # for files

or

$ find /path/to/directory -type d -exec chmod g+rsx '{}' \;
$ find /path/to/files -type f -exec chmod g+rsx '{}' \;

Three permission triads

first triad       what the owner can do
second triad      what the group members can do
third triad       what other users can do

Each triad

first character   r: readable
second character  w: writable
third character   x: executable
                  s or t: executable and setuid/setgid/sticky
                  S or T: setuid/setgid or sticky, but not executable

References, Operators and Modifiers

Above, you can see that permissions can be changed using u, g, o and a. These represent references to User, Group, Other and All.

  • (u)ser: The user is the owner of the file. The user of a file or directory can be changed with the chown command. Read, write and execute privileges are individually set for the user with 0400, 0200 and 0100 respectively. Combinations can be applied as necessary, e.g. 0700 is read, write and execute for the user.
  • (g)roup: A group is the set of people that are able to interact with that file. The group set on a file or directory can be changed with the chgrp command. Read, write and execute privileges are individually set for the group with 0040, 0020 and 0010 respectively. Combinations can be applied as necessary, e.g. 0070 is read, write and execute for the group.
  • (o)ther: Represents everyone who isn't the owner or a member of the group associated with that resource. Other is often referred to as "world", "everyone", etc. Read, write and execute privileges are individually set for other with 0004, 0002 and 0001 respectively. Combinations can be applied as necessary, e.g. 0007 is read, write and execute for other.
  • (a)ll: Represents everyone.

The operator is what is used to control adding or removing of modifiers:

  • + adds the specified file mode bits to the existing file mode bits of each file
  • - removes the specified file mode bits from the existing file mode bits of each file
  • = adds the specified bits and removes unspecified bits, except the setuid and setgid bits set for directories, unless explicitly specified

Modifiers:

  • r read
  • w write
  • x execute (or search for directories)
  • X execute/search only if the file is a directory or already has the execute bit set for some user
  • s setuid or setgid (depending on the specified references)
  • S setuid or setgid (depending on the specified references) without the executable bit (or search for directories) set
  • t restricted deletion flag or sticky bit

Octal

  • The read bit adds 4 to its total (in binary 100),
  • The write bit adds 2 to its total (in binary 010), and
  • The execute bit adds 1 to its total (in binary 001).

These values never produce ambiguous combinations; each sum represents a specific set of permissions. More technically, this is an octal representation of a bit field – each bit references a separate permission, and grouping 3 bits at a time in octal corresponds to grouping these permissions by user, group, and others.
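The octal sums can be checked quickly with stat (GNU coreutils shown; the demo file is hypothetical):

```shell
# 7 = 4+2+1 (rwx) for owner, 5 = 4+1 (r-x) for group, 4 (r--) for other
touch /tmp/perm-demo
chmod 0754 /tmp/perm-demo
stat -c '%a %A' /tmp/perm-demo   # prints: 754 -rwxr-xr--
# (on OSX, stat -f '%Lp %Sp' is roughly equivalent)
```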

SetUID, SetGID and the Sticky Bit

SUID / Set User ID : A program is executed with the file owner’s permissions (rather than with the permissions of the user who executes it).

$ chmod u+s testfile.txt
$ chmod 4750 testfile.txt

SGID / Set Group ID : Files created in the directory inherit its GID. That is, when a directory is shared between users and setgid is set on that shared directory, anything created inside it gets the same group owner as its parent directory.

$ chmod g+s [SHARED_DIRECTORY]
$ chmod 2750 [SHARED_DIRECTORY]

Sticky Bit : Mainly used on directories to prevent deletion of the directory and its contents by other users, even when they have write permissions. If the sticky bit is enabled on a directory, only the owner of the directory and the superuser (root) can delete it. This is a security measure to suppress deletion of critical directories where others would otherwise have full permissions.

$ chmod o+t /opt/ftp-data
$ chmod +t /opt/ftp-data
$ chmod 1757 /opt/ftp-data

’S’ = The directory’s setgid bit is set, but the execute bit isn’t set. ’s’ = The directory’s setgid bit is set, and the execute bit is set.

These are represented in ls -la output (list all files in long format) as follows

Permissions Meaning
--S------   SUID is set, but user (owner) execute is not set.
--s------   SUID and user execute are both set.
-----S---   SGID is set, but group execute is not set.
-----s---   SGID and group execute are both set.
--------T   Sticky bit is set, but other execute is not set.
--------t   Sticky bit and other execute are both set.

Creating a Proxy Host on Linux With No Additional Software

It is fairly easy to create a Linux proxy host that proxies traffic from other hosts that don't have direct access to the internet. This is a great, simple solution for keeping your backend workers off the public internet to avoid attacks, while at the same time allowing outbound traffic from them.

Here are the steps to configure this setup…

Worker

Configuring the worker that does not have direct access to the internet

DNS

  • Ensure that the host is using an externally resolvable DNS server (this may not be needed in most cases). Edit /etc/resolvconf/resolv.conf.d/base and add…

nameserver 8.8.8.8
nameserver 8.8.4.4

  • Reload the config files for DNS

$ sudo resolvconf -u

Networking

  • Change default gateway to IP address of proxy host
$ ip route del default
$ ip route add default via 192.168.3.1

Making the settings persist through a reboot

Default Route Changes

  • On Ubuntu you would edit your interfaces file, /etc/network/interfaces and update your private network interface block to include the following…
up ip route del default
up ip route add default via 192.168.3.1

… it would then look something like this …

auto eth2
iface eth2 inet static
  address 192.168.0.2
  netmask 255.255.255.0
  up ip route del default
  up ip route add default via [PROXY_IP_ADDRESS]

Proxy Host

Configuring the proxy host to allow the worker to proxy its traffic through it

Networking

  • add iptables rules to enable NAT and forwarding with masquerade

You can add them via the command line…

iptables -t nat -A POSTROUTING -o [PUBLIC_INTERFACE] -j MASQUERADE
iptables -A FORWARD -i [PUBLIC_INTERFACE] -o [PRIVATE_INTERFACE] -m state --state NEW,RELATED,ESTABLISHED -j ACCEPT
iptables -A FORWARD -i [PRIVATE_INTERFACE] -o [PUBLIC_INTERFACE] -j ACCEPT

The static config that this produces looks a little different from the commands used to create it. You can use $ iptables-save > iptables.rules to dump your rules to a file called iptables.rules. You can then use this file to programmatically load the rules at boot.

OR

You can create an iptables file /etc/iptables.rules to load the rules from…

*nat
-A POSTROUTING -o [PUBLIC_INTERFACE] -j MASQUERADE
*filter
-A FORWARD -i [PUBLIC_INTERFACE] -o [PRIVATE_INTERFACE] -m state --state NEW,RELATED,ESTABLISHED -j ACCEPT
-A FORWARD -i [PRIVATE_INTERFACE] -o [PUBLIC_INTERFACE] -j ACCEPT

…then use the following bash script to restore the rules (this will wipe any existing rules not present in /etc/iptables.rules):

#!/bin/sh
iptables-restore < /etc/iptables.rules
exit 0
  • Enable forwarding at the OS level persistently across reboots by editing /etc/sysctl.conf, uncommenting net.ipv4.ip_forward and setting it to 1
  • Enable forwarding at the OS level immediately by running…

  echo 1 > /proc/sys/net/ipv4/ip_forward

To set up proxying of SSH connections through the proxy host to the backend workers, do the following

  • add iptables rules to proxy the SSH traffic to the appropriate hosts (note that this goes under the nat table; do not add another *nat line if one already exists)
*nat
...
# Proxy SSH connections to backend hosts
-A PREROUTING  -p tcp -m tcp -d [PROXY_HOST_PUBLIC_IP] --dport [EXT_SSH_PORT_1] -j DNAT --to-destination [BACKEND_WORKER_HOST_1_PRIV_NET_IP]:[BACKEND_WORKER_HOST_SSH_LISTEN_PORT]
-A PREROUTING  -p tcp -m tcp -d [PROXY_HOST_PUBLIC_IP] --dport [EXT_SSH_PORT_2] -j DNAT --to-destination [BACKEND_WORKER_HOST_2_PRIV_NET_IP]:[BACKEND_WORKER_HOST_SSH_LISTEN_PORT]
-A POSTROUTING -p tcp -m tcp -s [BACKEND_WORKER_HOST_1_PRIV_NET_IP] --sport [BACKEND_WORKER_HOST_SSH_LISTEN_PORT] -j SNAT --to-source [PROXY_HOST_PUBLIC_IP]
-A POSTROUTING -p tcp -m tcp -s [BACKEND_WORKER_HOST_2_PRIV_NET_IP] --sport [BACKEND_WORKER_HOST_SSH_LISTEN_PORT] -j SNAT --to-source [PROXY_HOST_PUBLIC_IP]
...
*filter
# Proxy SSH connections to backend hosts
-A FORWARD -m state -p tcp -i [PUBLIC_INTERFACE] -o [PRIVATE_INTERFACE] --state NEW,ESTABLISHED,RELATED -j ACCEPT
-A [PUBLIC_INTERFACE_NICKNAME] -p tcp -m tcp --dport [EXT_SSH_PORT_1] -j ACCEPT
-A [PUBLIC_INTERFACE_NICKNAME] -p tcp -m tcp --dport [EXT_SSH_PORT_2] -j ACCEPT
  • [PUBLIC_INTERFACE_NICKNAME] refers to a custom chain for the interface; e.g. with -A INPUT -i eth2 -j privnet, the nickname for the interface eth2 would be privnet

Making the IPTables changes persist through reboot

On Ubuntu, add the following bash script named iptablesload to /etc/network/if-pre-up.d/ and make it executable (this will wipe any existing rules not present in /etc/iptables.rules)

#!/bin/sh
iptables-restore < /etc/iptables.rules
exit 0

Debugging SSH Connection Issues

Debugging SSH connection issues can be tricky and frustrating.

Common Issues & Causes

  • ssh_exchange_identification: Connection closed by remote host
    • SSHD host keys are corrupt
    • Connection to host does not complete due to network issue
    • The signature for the remote host in known_hosts is not correct
    • There is a problem with the SSH Daemon on the remote host

Debugging Steps

  1. Check hosts.deny and hosts.allow and ensure that you are not blocking the client (or allow the client if necessary)
  2. Check the MaxStartups value in /etc/ssh/sshd_config; the default is 10, but something like 10:30:60 is a bit safer against SSH brute-force attacks
  3. Run ssh in debug mode. This will help to expose problems with things like keys and auth types.

ssh -vvv my.host.com

  4. Watch logs on the remote server (if possible)
  5. Run sshd on a separate port with debug logging to the console. This is a very useful step: start the ssh daemon on the remote host on a different port so that you can troubleshoot while remote. The high port number also allows you to run the sshd process as a non-root user.

/usr/sbin/sshd -p 2121 -D -d -e
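The MaxStartups suggestion from step 2, as an sshd_config fragment (the values shown are the common example):

```
# /etc/ssh/sshd_config (fragment)
# start:rate:full; sshd refuses ~30% of new unauthenticated
# connections once 10 are pending, and all of them at 60
MaxStartups 10:30:60
```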

Explanation

-p    Set the listening port.
-D    Do not detach and do not become a daemon. This allows for easy monitoring.
-d    Enable debug mode.
-e    Write logs to standard error instead of the system log.

How to Easily and Quickly Clean Up Your Ubuntu /boot Partition

Cleanup /boot

After a few kernel upgrades, the /boot partition can fill up quickly if yours is 100M like mine. It's quite painful to remove package by package to free up enough space so that you can continue upgrading. Here's a helpful one-liner to clean up all unused kernel packages…

for i in `dpkg -l 'linux-*' | sed '/^ii/!d;/'"$(uname -r | sed "s/\(.*\)-\([^0-9]\+\)/\1/")"'/d;s/^[^ ]* [^ ]* \([^ ]*\).*/\1/;/[0-9]/!d'`; do sudo apt-get -y purge $i; done
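On newer Ubuntu releases, apt can usually do this for you; a simpler first attempt before reaching for the one-liner above:

```shell
# Removes old kernels (and any other packages) no longer needed
sudo apt-get autoremove --purge
```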

OSX Equivalent Linux/UNIX Commands

There are a few staple commands that we use as engineers to troubleshoot issues on Linux machines and servers. Some of these unfortunately do not translate directly to OSX's underlying UNIX-based system. Fortunately there are equivalent commands for most of them!

Networking

It is very handy to be able to determine what ports are listening on a box, or not. It’s also helpful to be able to determine which process and binary is using that port.

List open ports

Linux

$ netstat -antlp
(snippet of output)
tcp        0      0 10.210.149.179:37499    10.210.149.179:21041    TIME_WAIT   -
tcp        0      0 10.210.149.179:36687    10.210.149.179:21041    TIME_WAIT   -
tcp        0      0 10.210.149.179:37499    10.210.149.179:21041    TIME_WAIT   -
tcp6       0      0 :::14150                :::*                    LISTEN      14765/zabbix_agentd
tcp6       0      0 :::21030                :::*                    LISTEN      15751/bitcoind
tcp6       0      0 :::21031                :::*                    LISTEN      15751/bitcoind

OSX

$ lsof -i -P -n
(snippet of output)
Bitcoin-Q  5114 phutchins   57u  IPv4 0x769f1abf3f7392c3      0t0  TCP 10.0.0.13:59997->188.138.104.253:8333 (ESTABLISHED)
Bitcoin-Q  5114 phutchins   62u  IPv4 0x769f1abf24ef47a3      0t0  UDP *:*
Bitcoin-Q  5114 phutchins   63u  IPv4 0x769f1abf46995463      0t0  TCP 10.0.0.13:59354->185.53.131.187:8333 (ESTABLISHED)
node       5115 phutchins   15u  IPv4 0x769f1abf251ad853      0t0  TCP *:9000 (LISTEN)
node       5115 phutchins   16u  IPv4 0x769f1abf251ae123      0t0  TCP 127.0.0.1:52442->127.0.0.1:27017 (ESTABLISHED)
node       5115 phutchins   17u  IPv4 0x769f1abf2516f123      0t0  TCP 127.0.0.1:52443->127.0.0.1:27017 (ESTABLISHED)
node       5115 phutchins   18u  IPv4 0x769f1abf2509a2c3      0t0  TCP 127.0.0.1:52444->127.0.0.1:27017 (ESTABLISHED)

List routes

Linux

$ route -n

OSX

$ netstat -nr

Accept DNS Push on Linux From OpenVPN

DNS Push from OpenVPN

If you've set up an OpenVPN server for multiple OS tenants, you might have noticed that your OSX clients connect and receive their DNS settings from the server just fine. Your Linux clients, however, if running resolvconf or openresolv, may not work as easily. Luckily there is a simple and easy fix.

The Fix

In the client's OpenVPN config file, add the following…

script-security 2
up /etc/openvpn/update-resolv-conf
down /etc/openvpn/update-resolv-conf

You can also add this on the server side configuration in the Custom Directives section.

Note, however, that if you put this on the server side, all of your clients will get this change, and the file referenced above may not exist on their machines, which will cause problems.

Compare Commits/Tags/Branches on GitHub

GitHub previously had a feature in their GUI allowing you to compare two different commits, tags or branches. The shortcuts to this feature have been removed from the GUI for some reason, but the ability to do the comparison is still there.

This is the basic URL format for doing a compare

http://github.com/<USER>/<REPO>/compare/[<START>...]<END>
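For example, to compare two tags (placeholders as above; the tag names are hypothetical):

```
https://github.com/<USER>/<REPO>/compare/v1.0.0...v1.1.0
```

The same format works with branch names and commit SHAs in place of the tags.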

Options

Some of the available options for compare are undocumented. (append these changes to the end of the URL)

Ignore whitespace changes

?w=1