Your Splunk Shoe Rack

When splunking with a new customer, one of the first things I review when auditing their environment is their index structure. Why? Well, there's a lot you can tell about the maturity of a Splunk deployment based on this particular configuration. The old saying that Forrest learned from his mom comes to mind.

"Momma always says there's an awful lot you could tell about a person by their shoes. I bet if I think about it real hard I could remember my first pair of shoes." - from the movie Forrest Gump

In many burgeoning installations, all data is sent to the "main" (default) index. This approach is like using one pair of shoes for everything you do, such as going to the office, hiking, dancing up a storm, or playing tennis. This may be okay when you're 5, but not at 35. The all-in-one shoe doesn't have the right attributes for each task, whereas having a pair of loafers, tennis shoes, and boots to choose from allows you to accommodate a given task. Now imagine all of these shoes neatly organized on a shoe rack. This is how you should think about your Splunk index structure.

Why does index structure matter? Isn't it okay to just dump all your organization's data into the "main" index and never have to type something like "index=tennis_shoes"? Of course not! As your installation grows, a few things will inevitably happen:

- different groups of people will need access to different sets of data.
- different types of data will have different retention requirements.
- search performance will become a bigger concern.

Here's how a well-thought-out index structure addresses these items.

In many organizations, access to data is given on a "need to know" basis. There might be sensitive information in the data sources that only certain members of the organization should be able to see. If all the data is in one index, then controlling access becomes complicated. It's possible to set search filters per role; however, this is less optimal, both because the filter configuration has to be maintained and because of the performance hit, since data has to be retrieved and then filtered. By segregating data, you can take advantage of the ability to set which indexes a Splunk user role has access to. An example of this strategy would be creating an index to receive firewall data and only allowing the "operations" group access to that index.

Organizations have different requirements for the amount of time they want to keep data. The retention policy is usually dictated by three factors, listed below in order of highest to lowest precedence:

1. There is a legal requirement to store the data for a certain amount of time, such as industry or federal compliance standards.
2. There is an organizational need to keep the data. For example, a development group might want to keep application log data for their entire three-month development cycle to monitor performance during that span.
3. There is a limit on storage. Although storage capacity is growing tremendously, it's still finite and has a real cost. By retaining data for only as long as it's required, an organization is able to best utilize its data storage resources. (Why keep around old, worn-out pairs of shoes you never wear?)

Take firewall data as an example. An organization may have a legal requirement to keep the data for one week. The IT operations group finds it useful to be able to search the data for two weeks. However, because of the volume of data generated by the firewalls, the current storage configuration only allows for data to be kept for ten days. In this case, the "firewall" index would be configured to purge data after it's ten days old, which still satisfies the one-week legal requirement.

In general, you want to think about the types of searches that will be done on the data after it's indexed. Let's say an "application" index contains data from applications A, B, and C. However, data from app C only accounts for 10% of the volume and is usually searched on its own, not in combination with data from apps A and B. In this case, it would be better to create a separate index for data from app C to improve search performance, since Splunk does not have to remove events from apps A and B after pulling them from the indexers. However, if most searches go across data from apps A, B, and C, such as when defining a transaction across multiple systems, then keeping them in the same index would make sense, as data pulled from the indexers would potentially be relevant to the search.

Hopefully this post provided some insight into how to plan your index structure. In Part 2 of this series I plan to go through a use case and how to implement the above strategy through Splunk configuration.
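For a rough idea of what the firewall example (an index that only the "operations" group can search, purged after ten days) could look like in configuration, here is a minimal sketch. The index name `firewall`, the role name `operations`, and the paths are assumptions for illustration, not settings taken from a real deployment; the setting names come from `indexes.conf` and `authorize.conf`.

```
# indexes.conf (sketch): a dedicated index for firewall data
[firewall]
# Paths are assumed to follow the usual $SPLUNK_DB layout; adjust for your storage
homePath   = $SPLUNK_DB/firewall/db
coldPath   = $SPLUNK_DB/firewall/colddb
thawedPath = $SPLUNK_DB/firewall/thaweddb
# Ten days expressed in seconds (10 * 86400). A bucket is frozen once its
# newest event is older than this; frozen data is deleted unless an archive
# destination (coldToFrozenDir or coldToFrozenScript) is configured.
frozenTimePeriodInSecs = 864000

# authorize.conf (sketch): a hypothetical "operations" role
[role_operations]
# Indexes this role may search, and which ones are searched by default
srchIndexesAllowed = firewall
srchIndexesDefault = firewall
```

Keep in mind that restricting the index to one group also depends on what the other roles are allowed to search: any role whose `srchIndexesAllowed` uses a wildcard would still be able to see the firewall data.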