Elasticsearch Delete Issues

Problem :­
We have some types of documents which have to be deleted daily based on the date in them and if it has expired. An alert from another tool tells us that if such an old dated document appears in the search results which is filtered out by the end service.
So, for the past month, we saw that the alert was getting triggered every night between 12am and 7:15am and then it disappears (no more data from yesterday)

Remember, that these documents are deleted daily using the XDELETE query to the elasticsearch cluster.

Experiments :­
We run an hourly job to delete old data and logically, one expects the old data to disappear as soon as the first delete job is run after 12am, but we keep seeing the alert with the log showing that a result from yesterday was getting triggered.

I increased the job frequency to run every 15 minutes with a "refresh" on the index after each job and yet, the splunk alert would show up until 7:15am every day. The pattern that is seen is between 6am and 7am, the number of old documents begin to decrease and finally there are none after 7:15am. Even "optimize_expunge_deletes" does not decrease the time these documents show up in the elasticsearch results.

Possible causes :­
• The second last comment on this forum reveals that elasticsearch marks documents for deletes and by default, merges only happen on an index when there is 10% or more documents to be added or deleted. As we index daily, probably 7:15am is the time when a merge happens and the documents get deleted.

• Though, we see the docID in the logs, when searching for the documents in the elasticsearch cluster with the ID leads to no results and hence, this points to that different results shown from the head than from the client API.

Possible solution :­
Note, the index does not use TTL(time­to­live) value for the documents but as per the elasticsearch forums, people still see this issue after using TTL.
So supposedly, this is the way lucene and elasticsearch want to be efficient by limiting merges as a merge is an expensive operation from memory’s perspective.
The likely solution given by elasticsearch forum experts is to create new index and swap indices whenever possible(this means swapping indices everyday for us) as a new index has faster searches avoiding the deleted documents in the lookup (overkill !!).

Considering, most of the elasticsearch users talk about documents being present in the millions, sometimes it is not feasible to index and come up with a new index daily. So, an alternate short approach can be to update the document with a field of deleted set as true and filter out such documents.

Elasticsearch document inconsistency

Problem :­
A particular search query would show results sometimes and sometimes won't.


Root cause analysis :­
A pattern observed with the hits and misses was of 011100 occuring in a repeated way.
Elasticsearch by default delegates the queries amongst the nodes till it finds the results and hence, if you want to direct your query to a particular node, a "preference" parameter can be set to "nodeID"

The way to get a nodeID (the one that I found) is ­
curl ­XGET 'localhost:9200/_nodes'

wherein every node has a ID associated in the json header.
The nodeID can be used as ­
searchRequestBuilder.setPreference("_only_node:xyz");

Using this preference, one can narrow down the shard that was showing this inconsistent behaviour for the given document. The shard stats showed that it had the same number of "numDocs" and hence it seemed for some reason, the document was not searchable on that shard.

Found a similar issue in elasticsearch groups ­

Resolution :­
The elasticsearch logs did not show any issue on the nodes and that "refresh" and "flush" on the index did not produce any exception/error in the logs and did not resolve the issue.
As per the issue thread, people had recreated their indexes or stuck to using the primary shard(which wasnt an option as recreating the index would take a lot of time and hitting the primary shard only always gave us no hits)
The no­-downtime way was to bring the replica shards down to 0 and create them again. (which luckily fixed the issue for us)

curl ­XPUT 'localhost:9200/my_index/_settings' ­d '{
  "index" : {
    "number_of_replicas" : 0
  }
}'

Run an indexing job to fill back in the documents lost in the process.

Possible Causes :­
We had some downtimes in the cluster due to high CPU usage and maybe such outages would have caused an inconsistency in the shards. Though, we still kept getting consistent results for most of the queries we had.
So the guess is that any such issue can cause elasticsearch shards to become inconsistent and the resolution is variable/dependent on the number of documents in the index and the possible downtime that can be taken.

Sinking the Submarine Puzzle

So every now and then one comes across some puzzle which makes you delve deep into number theory and though things appear weird at the start, they end up being deterministic... or precisely certainly solvable though in infinity !!

Here is the "Sinking the Submarine" problem -
[I got this puzzle from Alex Mikhailov and a little help from Yuval]

An enemy submarine is somewhere on the number line (consider only integers for this problem). It is moving at some rate (again, integral units per minute). You know neither its position nor its velocity. You can launch a torpedo each minute at any integer on the number line. If the the submarine is there, you hit it and it sinks. You have all the time and torpedoes you want. You must sink this enemy sub - devise a strategy that is guaranteed to eventually hit the enemy sub.

Solution :-
Looking at the problem, one feels a simple strategy can help us reach the solution. But then one goes into asking random questions, like 
- Do we know the initial position?
- Should we even care about the initial position?

Remember that the submarine does not have an acceleration. So atleast, some randomness has already been constrained. So, lets go about proving that the submarine can be sinked certainly.
In this puzzle, we have combination of values for velocity and displacement of submarine on the number line based on a function between initial position and velocity. Remember, that the speed(velocity) remains constant. So, the submarine would be travelling the same distance between time t0 to t1 as it would between time t1 and t2.
The possible values of velocity are 0,1,2,..... infinity
The possible values of starting displacement are 0,1,2,...infinity.
So, we need to traverse the combinations of starting displacement and velocity in such a way that we hit the submarine somewhere before infinity. Using a simple function, such as 
currentPostion = startingDisplacement + (time * velocity) 
we can brute force each combination value and sink the submarine before infinity.
Here is one of the lame ways to iterate over all possible combinations of velocity and time


Had some free time, so decided to code it in.
New Document

1: import java.util.Scanner;
2: 
3: public class SinkTheSubmarine {
4:
5: public static void printNumShotsRequired(long startPosition, long velocity) {
6:  if (startPosition == 0 && velocity == 0) {
7:   System.out.println("Submarine hit on 1 attempt"); 
8:   return;
9:  } 
10:  long subPosition = startPosition;
11:  long time = 1;
12:  long startDisp = 0;
13:  long vel = 0;
14:  long shotPosition = 0;
15:  boolean hit = false;
16:  boolean changeStartDisp = false;
17:  boolean changeVel = false;
18:  
19:  startDisp++;
20:  changeStartDisp=true;
21:  do { 
22:   System.out.println("startDisp:"+startDisp+" vel:"+vel);
23:   subPosition = startPosition + (time * velocity);
24:   shotPosition = startDisp + (time * vel);
25:   if (shotPosition == subPosition) {
26:    System.out.println("Submarine hit on " + (time + 1) + " attempt"); 
27:    hit = true;
28:   }
29:   if (changeStartDisp && !changeVel) {
30:    if (startDisp == 0) {
31:     //Move down
32:     vel++;
33:     changeStartDisp = false;
34:     changeVel = true;
35:    } else {
36:     //Move left bottom
37:     vel++;
38:     startDisp--;
39:    }
40:   }else if(!changeStartDisp && changeVel) {
41:    if (vel == 0) {
42:     //Move right
43:     startDisp++;
44:     changeStartDisp = true;
45:     changeVel = false;
46:    } else {
47:     //Move right up
48:     startDisp++;
49:     vel--;
50:    }
51:   }  
52:   time++;
53:  } while(!hit);
54:  return;
55: }
56: 
57: public static void main(String[] args) {
58:  Scanner scanner = new Scanner(System.in);
59:  System.out.println("Enter starting location of submarine:");
60:  long startstartDisplacement = scanner.nextInt();
61:  System.out.println("Enter velocity of submarine:");
62:  long velocity = scanner.nextInt();
63:  System.out.println(startstartDisplacement + " " + velocity);
64:  printNumShotsRequired(startstartDisplacement, velocity);
65: }
66:}

Remember, that if we had acceleration in the picture, then we would have to change our function with something like Newton's second law of motion -
s = ut + (at2/2)
and will have to iterate over combinations for starting displacement, velocity and acceleration and hence, even in that case, sinking the submarine is a certainty.

PixBlog 4 - Really North NorCal (Shasta, Random Peaks and Rocks being split by water)

Burning up the rocks to find a way through [Location :- McArthur-Burney Falls]

Some science experiment. Sulphur mixing with boiled up water [Location - Lassen National Park]

What happens outside the visitor center, stays outside the visitor center [Location - Crater Lake National Park, Oregon]

Steps to the temple with Avalanche Gulch shining bright [Location - Bunny Flat trail, Mount Shasta]

For sometimes, there is no reference line to match when the expanse is so huge [Location - Mount Shasta]

 Because sometimes one is lazy enough to not go down [Location - Middle McCloud Falls]

The crater which was formed by a collapse and not a strike !! [Location - Crater Lake National Park, Oregon]

Road to Rendition [Location - North Lake Tahoe]


One below the other should be one and the same, maybe a few centuries from now ;)
[Lower & Upper Yosemite Falls]

Lower & Upper Yosemite Falls to the Left, Half Dome in the center, Vernal & Nevada Falls to the right [Location - Glacier Point, Yosemite National Park]

PixBlog 3 - Zion(Angels and subways) and Bryce(Earth's own fingers)

The bridge on the right is not photo-shopped !! [Location - Zion National Park]

Water being confused if its getting dark or if its gonna get brighter... [Location - Zion National Park]

A wedge shaped rock but worth climbing. [Location - Angel's Landing, Zion National Park]

Jeremy Clarkson style - "Some say, this is where angels used to land, or they still do!!" [Location - Angel's Landing, Zion National Park]

Valley view with a few sun rays peeping through. [Location :- Angel's Landing, Zion National Park]

"I'm Arnold. Get DOWWWNN !!" [Location - Angel's Landing, Zion National Park]

Just cant help myself from saying "Daya.. darwaza tod do" [Location :- Outskirts of Bryce Canyon National Park]

Earth showing its cocky attitude to the sky with its fingers [Location :- Bryce Canyon National Park]

 Fight between water and sandstone over time [Location :- Bryce Canyon National Park]


"I'm still Arnold. Get DOWWWNN" [Location :- Bryce Canyon National Park]

PixBlog 2 - Yosemite in Winter



Bridal's veil... [Bridalveil Falls]


Hence Proved, ice does not reflect !! [Mirror Lake]


Justin Timberlake singing Mirrors !! [Mirror Lake]


Last rites [Rays on Half Dome]

El Capitan heating up [ El Capitan to the left, Half Dome in the center, Bridalveil Falls and Cloud's Rest to the right]


Because the rays would find a way [ Foot of Vernal Falls]

The rocks pouring their heart out [ Upper Yosemite Falls]


Ice and Ice melt [ Lower Yosemite Falls]

Ice and Ice melt becoming Ice again !! [Foot of Lower Yosemite Falls]


This is what the falls view all year long [SubDome and Half dome to the left]

Even Tree

So I recently came across this coding question on hackerrank and stackoverflow called Even Tree. The question goes as follows - 

You are given a tree (a simple connected graph with no cycles). You have to remove asmany edges from the tree as possible to obtain a forest with the condition that : Each connected component of the forest should contain an even number of vertices.

To accomplish this, you will remove some edges from the tree. Find out the number of removed edges.

So the question requires us to traverse a tree and cutoff edges such that each sub-tree/forest created has even number of nodes. You can also treat this question as a directional graph question wherein you start from a node and remove any edges to come up with subgraphs that contain even number of nodes.

To make it easy to visualize, lets consider a tree -

                                    Fig 1 - Sample tree with all it edges
                            Fig 2 - Tree with edges to be removed marked in blue
Algorithm :-
  • Populate the children below each node:- 
This step relies on the way the input is given for the problem(upper nodes being nearer to the root and lower nodes being the leaves. If the input is given out of order, I recommend using a level order traversal and do a bottom-up approach for this step to calculate the children below. The function populateChildren adds the number below each to a map and as we traverse bottom to top, we keep adding the number of children of each node in a node's adjacency list to that node's number of children.

  • Traverse each edge and decide if it needs to be removed based on the number of children below each of the vertices connected through that edge:- 
This step iterates through all edges and checks if an edge needs to be removed. There are two cases when an edge can be removed 
- The first case is when the difference between the number of children below the parent node and the child node in the edge is odd AND the number of children below the child is odd. (The difference means that child node of the edge along with its children aggregate to an even number and the parent can get into a different forest of nodes. For example, consider the edge joining nodes 2 and 5. As 5 has odd number of children and difference between the children of parent and node is also odd indicating that the parent can find a different group with its children or its parent or both.)

- The second case is the case when children below both parent and child of an edge have an odd number, it is safe to remove that edge to get forests of even nodes.(For example, consider the edge joining nodes 1 and 3. For node 3, the number of children is three and for node 1, the number of children is nine. So node 1 will be able to connect to the remaining 5 nodes in such a way so as to make a forest of even nodes)

A sample input for this solution is as below :-

 30 29
 2 1
 3 2
 4 3
 5 2
 6 4
 7 4
 8 1
 9 5
 10 4
 11 4
 12 8
 13 2
 14 2
 15 8
 16 10
 17 1
 18 17
 19 18
 20 4
 21 15
 22 20
 23 2
 24 12
 25 21
 26 17
 27 5
 28 27
 29 4

 30 25



In the input, the first line indicates the number of vertices and edges respectively and the following lines indicate the edges connecting the nodes.

Pasting the solution code in Java below :- New Document
1:  import java.util.ArrayList;  
2:  import java.util.HashMap;  
3:  import java.util.Map;  
4:  import java.util.Scanner;  
5:    
6:  class EvenTree {  
7:      private int numVertices = 0;  
8:      private int edges = 0;  
9:      private Map<Integer, Integer> edgeListMap = new HashMap<Integer, Integer>();  
10:      private Map<Integer, ArrayList<Integer>> adjListMap = new HashMap<Integer, ArrayList<Integer>>();  
11:      private Map<Integer, Integer> numChildren = new HashMap<Integer, Integer>();  
12:    
13:      //Method to decide which edges to remove  
14:      private void eliminateEdges() {  
15:          int edgesRemoved = 0;  
16:          for (Map.Entry<Integer, Integer> entry : edgeListMap.entrySet()) {  
17:              if (((this.numChildren.get(entry.getValue()) - this.numChildren  
18:                      .get(entry.getKey())) % 2 != 0 && this.numChildren  
19:                      .get(entry.getKey()) % 2 != 0)  
20:                      || (this.numChildren.get(entry.getValue()) % 2 != 0 && this.numChildren  
21:                              .get(entry.getKey()) % 2 != 0)) {  
22:                  // For printing the edges removed  
23:                  //System.out.println(entry.getKey() + ":" + entry.getValue());  
24:                  ++edgesRemoved;  
25:              }  
26:          }  
27:          System.out.println(edgesRemoved);  
28:      }  
29:    
30:      //Method to populate the number of children below each node  
31:      private void populateChildren() {  
32:          for (int i = numVertices; i > 0; i--) {  
33:              int numChildren = 0;  
34:              ArrayList<Integer> adjList = this.adjListMap.get(new Integer(i));  
35:              if (adjList != null) {  
36:                  for (Integer node : adjList) {  
37:                      if (this.numChildren.containsKey(node)) {  
38:                          numChildren = numChildren + 1  
39:                                  + this.numChildren.get(node);  
40:                      } else  
41:                          ++numChildren;  
42:                  }  
43:              }  
44:              this.numChildren.put(i, numChildren);  
45:          }  
46:      }  
47:    
48:      public static void main(String[] args) {  
49:    
50:          Scanner ob = new Scanner(System.in);  
51:          EvenTree eventree = new EvenTree();  
52:          eventree.numVertices = ob.nextInt();  
53:          if (eventree.numVertices % 2 != 0) {  
54:              System.out.println("Number of vertices are not even. Exiting...");  
55:              return;  
56:          }  
57:          eventree.edges = ob.nextInt();  
58:          int vertex1, vertex2;  
59:          for (int i = 0; i < eventree.edges; i++) {  
60:              vertex1 = ob.nextInt();  
61:              vertex2 = ob.nextInt();  
62:              eventree.edgeListMap.put(vertex1, vertex2);  
63:              if (eventree.adjListMap.containsKey(vertex2)) {  
64:                  ArrayList<Integer> adjListMap = eventree.adjListMap  
65:                          .get(vertex2);  
66:                  adjListMap.add(vertex1);  
67:              } else {  
68:                  ArrayList<Integer> adjList = new ArrayList<Integer>();  
69:                  adjList.add(vertex1);  
70:                  eventree.adjListMap.put(vertex2, adjList);  
71:              }  
72:          }  
73:      //Populate the children map  
74:          eventree.populateChildren();  
75:      //Decide the edges to be removed  
76:          eventree.eliminateEdges();  
77:      }  
78:  }  
The time complexity of this solution is O(V + E).