Q1. Compare Splunk & Spark
Ans:
Criteria | Splunk | Spark |
Deployment area | Collecting large amounts of machine generated data | Iterative applications & in-memory processing |
Nature of tool | Proprietary | Open Source |
Working mode | Streaming mode | Both streaming and batch mode |
Q2. What is the Splunk tool?
Ans: Splunk is a powerful platform for searching, analyzing, monitoring, visualizing and reporting on your enterprise data. It collects machine data and converts it into operational intelligence, giving real-time insight into your data through alerts, dashboards, charts and so on.
Or:
What is Splunk? Why is Splunk used for analyzing machine data?
This question will most likely be the first question you will be asked in any Splunk interview. You need to start by saying that:
Q3. Explain the working of Splunk ?
Ans: Splunk works in three phases –
Q4. What are the components of Splunk?
Ans: Splunk has four important components:
Q5. What are the types of Splunk forwarder?
Ans: Splunk has two types of forwarder, which are as follows:
Q5. What are alerts in Splunk?
Ans: An alert is an action that a saved search triggers at regular intervals over a set time range, based on the results of the search. When an alert is triggered, various actions can follow. For instance, an email can be sent to a predefined list of people when the search triggers the alert.
Three types of alerts:
Q6. What are the categories of SPL commands?
Ans: SPL commands are divided into five categories:
Q7. What are common port numbers used by Splunk?
Ans: Common port numbers on which services run by default are:
Service | Port Number |
Splunk Management Port | 8089 |
Splunk Index Replication Port | 8080 |
KV store | 8191 |
Splunk Web Port | 8000 |
Splunk Indexing Port | 9997 |
Splunk network port | 514 |
Q8. What are Splunk buckets? Explain the bucket lifecycle ?
Ans: A directory that contains indexed data is known as a Splunk bucket. It contains the events of a certain period. The bucket lifecycle includes the following stages:
Q9. What commands are used to enable and disable Splunk boot-start?
Ans:
$SPLUNK_HOME/bin/splunk enable boot-start
$SPLUNK_HOME/bin/splunk disable boot-start
Q10. What is eval command?
Ans: It evaluates an expression and assigns the resulting value to a destination field. If the destination field matches an already existing field name, the existing field is overwritten with the result of the eval expression. This command evaluates Boolean, mathematical and string expressions.
Using eval command:
Convert Values
Round Values
Perform Calculations
Use conditional statements
Format Values
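The five uses listed above can be pictured with a small Python analogue. This is only a sketch of the idea, not SPL: the event, its field names (bytes, status, duration_ms) and the helper apply_eval are hypothetical and chosen purely for illustration.

```python
# Python analogue of common eval uses. All field names here are hypothetical.
event = {"bytes": "1572864", "status": "503", "duration_ms": "1234.567"}


def apply_eval(event):
    """Return a copy of the event with derived fields, loosely as eval would."""
    out = dict(event)
    # Convert values: the existing "bytes" field is overwritten with an integer.
    out["bytes"] = int(out["bytes"])
    # Perform calculations: bytes to mebibytes.
    out["size_mb"] = out["bytes"] / (1024 * 1024)
    # Round values: milliseconds to seconds, two decimal places.
    out["duration_s"] = round(float(out["duration_ms"]) / 1000, 2)
    # Use conditional statements: classify by status code.
    out["severity"] = "error" if int(out["status"]) >= 500 else "ok"
    # Format values: build a display string from other fields.
    out["label"] = f"{out['severity']}:{out['status']}"
    return out


enriched = apply_eval(event)
```

Note how writing to bytes replaces the existing value, which mirrors the overwrite behaviour described in the answer, while the other assignments create new fields.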
Q11. What is lookup command and its use case?
Ans: The lookup command enriches events with fields from a lookup table. It takes the value of a field in an event, finds the rows in the lookup table that match that value, and adds the fields from those matching rows to your event.
Example
… | lookup usertogroup user as local_user OUTPUT group as user_group
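To make the matching step concrete, here is a minimal Python sketch of what the example above does conceptually. The table contents, the sample events and the function name lookup are assumptions made for illustration; only the field names (user, local_user, group, user_group) come from the example.

```python
# Hypothetical contents of the usertogroup lookup table.
usertogroup = [
    {"user": "alice", "group": "admins"},
    {"user": "bob", "group": "analysts"},
]


def lookup(events, table, table_field, event_field, output_field, output_as):
    """Add table[output_field] to each event whose event_field matches table_field."""
    index = {row[table_field]: row for row in table}
    enriched = []
    for event in events:
        out = dict(event)
        match = index.get(event.get(event_field))
        if match is not None:
            out[output_as] = match[output_field]
        enriched.append(out)
    return enriched


events = [
    {"local_user": "alice", "action": "login"},
    {"local_user": "carol", "action": "login"},  # no matching row in the table
]
result = lookup(events, usertogroup, "user", "local_user", "group", "user_group")
```

Events without a matching row pass through unchanged in this sketch, so only the first event gains a user_group field.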
Q12. What is inputlookup command?
Ans: inputlookup command returns the whole lookup table as search results.
For example
…| inputlookup intellipaatlookup returns a search result for every row in the table intellipaatlookup which has two field values:
Q13. Explain outputlookup command?
Ans: This command outputs the current search results to a lookup table on the disk.
For example
...| outputlookup intellipaattable.csv saves all the results into intellipaattable.csv.
Q14. What commands are included in filtering results category?
Ans:
Q15. What commands are included in reporting results category?
Ans:
Q16. What commands are included in grouping results category?
Ans: transaction – Groups events that meet different constraints into transactions, where transactions are the collections of events possibly from multiple sources.
Q17. What is the use of sort command?
Ans: It sorts search results by the specified fields.
Syntax:
sort [<count>] <sort-by-clause>... [desc]
Example:
... | sort num(ip), -str(url)
It sorts results by ip value in ascending order and then by url value in descending order.
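The same two-key ordering can be sketched in Python. This is an analogue rather than SPL, and it assumes that ip holds dotted IPv4 addresses that should be compared numerically and that url is compared as a plain string; the sample rows are invented.

```python
import ipaddress


def sort_results(results):
    """Sort by ip ascending (numeric), then by url descending (string)."""
    # Python's sort is stable, so sort by the secondary key first.
    by_url_desc = sorted(results, key=lambda r: r["url"], reverse=True)
    return sorted(by_url_desc, key=lambda r: int(ipaddress.ip_address(r["ip"])))


rows = [
    {"ip": "10.0.0.2", "url": "/a"},
    {"ip": "10.0.0.1", "url": "/a"},
    {"ip": "10.0.0.1", "url": "/b"},
    {"ip": "9.0.0.10", "url": "/z"},
]
ordered = sort_results(rows)
```

Sorting numerically matters here: as plain strings, "9.0.0.10" would sort after "10.0.0.2", whereas a numeric comparison places it first.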
Q18. Explain the difference between search head pooling and search head clustering?
Ans: Search head pooling is a group of connected servers that share load, configuration and user data, whereas search head clustering is a group of Splunk Enterprise search heads that serves as a central resource for searching. Since the search head cluster supports member interchangeability, the same searches and dashboards can be run and viewed from any member of the cluster.
Q19. Explain the function of Alert Manager ?
Ans: Alert manager displays the list of most recently fired alerts, i.e. alert instances. It provides a link to view the search results from that triggered alert. It also displays the alert’s name, app, type (scheduled, real-time, or rolling window), severity and mode.
Q20. What is SOS?
Ans: SOS stands for Splunk on Splunk. It is a Splunk app that provides a graphical view of your Splunk environment's performance and issues.
It has the following purposes:
Q21. What is Splunk DB connect?
Ans: It is a general SQL database plugin that permits you to easily combine database information with Splunk queries and reports. It provides reliable, scalable and real-time integration between Splunk Enterprise and relational databases.
Q22. What is the difference between Splunk App Framework and Splunk SDKs?
Ans: Splunk App Framework resides within Splunk’s web server and permits you to customize the Splunk Web UI that comes with the product and to develop Splunk apps using the Splunk web server. It is an important part of the features and functionality of the Splunk Software, and it does not license users to modify anything in the Splunk Software.
Splunk SDKs are designed to allow you to develop applications from the ground up and not require Splunk Web or any components from the Splunk App Framework. These are separately licensed to you from the Splunk Software and do not alter the Splunk Software.
Q23. What is Splunk indexer and explain its stages?
Ans: The indexer is a Splunk Enterprise component that creates and manages indexes. The main functions of an indexer are:
Input: Splunk Enterprise acquires the raw data from various input sources, breaks it into 64K blocks and assigns them some metadata keys. These keys include the host, source and source type of the data.
Parsing: Also known as event processing. During this stage, Splunk Enterprise analyzes and transforms the data: it breaks the data into streams, identifies, parses and sets timestamps, and performs metadata annotation and transformation of the data.
Indexing: In this phase, the parsed events are written to the index on disk, including both the compressed data and the associated index files.
Searching: The ‘Search’ function plays a major role during this phase, as it handles all searching aspects (interactive and scheduled searches, reports, dashboards, alerts) on the indexed data and stores saved searches, events, field extractions and views.
Q24. What is the use of replace command?
Ans: The replace command performs a search-and-replace on specified field values with replacement values. The values in a search and replace are case sensitive.
Syntax:
replace (<wc-string> WITH <wc-string>)... [IN <field-list>]
Example:
… | replace *localhost WITH localhost IN host
Change any host value that ends with “localhost” to “localhost”.
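The wildcard, case-sensitive matching in this example can be approximated in Python with fnmatchcase. This is a rough analogue under stated assumptions: the helper replace_values and the sample hosts are hypothetical, the replacement is treated as a literal string, and fnmatchcase also gives special meaning to ? and [ ], which may not correspond to how Splunk treats those characters.

```python
from fnmatch import fnmatchcase


def replace_values(events, pattern, replacement, fields):
    """Replace whole field values that match a case-sensitive wildcard pattern."""
    out = []
    for event in events:
        updated = dict(event)
        for field in fields:
            value = updated.get(field)
            if isinstance(value, str) and fnmatchcase(value, pattern):
                updated[field] = replacement
        out.append(updated)
    return out


events = [
    {"host": "web.localhost"},
    {"host": "localhost"},
    {"host": "LOCALHOST"},  # differs in case, so it is left alone
    {"host": "db01"},
]
replaced = replace_values(events, "*localhost", "localhost", ["host"])
```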
Q25. What is file precedence in Splunk?
Ans: File precedence in Splunk is as follows:
Q26. What is the use of regex command?
Ans: It removes results that do not match the specified regular expression.
Syntax:
regex (<field>=<regex-expression> | <field>!=<regex-expression> | <regex-expression>)
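As a rough Python analogue of the three syntax forms, the sketch below keeps events whose field matches the pattern, or, with negate=True, those that do not, which corresponds to the != form. The function name regex_filter, the field name ip and the sample events are hypothetical, and Python's re module stands in for Splunk's regular expression engine.

```python
import re


def regex_filter(events, field, pattern, negate=False):
    """Keep events whose field matches pattern (or does not, when negate=True)."""
    compiled = re.compile(pattern)
    kept = []
    for event in events:
        matched = compiled.search(str(event.get(field, ""))) is not None
        if matched != negate:
            kept.append(event)
    return kept


events = [{"ip": "10.1.2.3"}, {"ip": "192.168.0.7"}, {"ip": "10.200.0.1"}]
private_10 = regex_filter(events, "ip", r"^10\.\d+\.\d+\.\d+$")
not_10 = regex_filter(events, "ip", r"^10\.", negate=True)
```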
Q27. Where is Splunk default configuration stored?
Ans: Splunk default configuration is stored at $SPLUNK_HOME/etc/system/default
Q28. How to reset Splunk admin password?
Ans: To reset the password, follow these steps:
Q29. How to list all the saved searches in Splunk?
Ans:
Using syntax:
rest /servicesNS/-/-/saved/searches splunk_server=loca
Q30. State the difference between stats and eventstats commands?
Ans: stats – This command produces summary statistics of all existing fields in your search results and stores them as values in new fields.
eventstats – It is the same as the stats command, except that the aggregation results are added inline to every event, and only if the aggregation is applicable to that event. It computes the requested statistics in the same way as stats but attaches them to the original raw events instead of replacing them.
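The difference is easiest to see side by side. The Python sketch below is an analogue, not SPL: it computes an average per group, once as a summary table in the style of stats and once attached to every original event in the style of eventstats. The function names, the avg output field and the sample events are assumptions made for illustration.

```python
from collections import defaultdict


def stats_avg(events, value_field, by_field):
    """stats-style: one summary row per group; the original events are gone."""
    groups = defaultdict(list)
    for event in events:
        groups[event[by_field]].append(event[value_field])
    return [{by_field: key, "avg": sum(vals) / len(vals)} for key, vals in groups.items()]


def eventstats_avg(events, value_field, by_field):
    """eventstats-style: every original event is kept and gains the group average."""
    averages = {row[by_field]: row["avg"] for row in stats_avg(events, value_field, by_field)}
    return [dict(event, avg=averages[event[by_field]]) for event in events]


events = [
    {"host": "a", "latency": 10},
    {"host": "a", "latency": 30},
    {"host": "b", "latency": 5},
]
summary = stats_avg(events, "latency", "host")
annotated = eventstats_avg(events, "latency", "host")
```

Here summary has two rows, one per host, while annotated still has three events, each carrying the average for its own host alongside its original fields.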
Q32. Why use only Splunk? Why can’t I go for something that is open source?
Ans: This kind of question is asked to understand the scope of your knowledge. You can answer it by saying that Splunk has a lot of competition in the market for analyzing machine logs, doing business intelligence, performing IT operations and providing security. But there is no single tool other than Splunk that can do all of these operations, and that is where Splunk stands apart. With Splunk you can easily scale up your infrastructure and get professional support from a company backing the platform. Some of its competitors are Sumo Logic in the cloud space of log management and ELK in the open source category. You can refer to the below table to understand how Splunk fares against other popular tools feature-wise.
Q33. Which Splunk Roles can share the same machine?
Ans: This is another frequently asked Splunk interview question, which tests the candidate’s hands-on knowledge. In small deployments, most of the roles can share the same machine, including the Indexer, Search Head and License Master. In larger deployments, however, the preferred practice is to host each role on a standalone host. Details about roles that can be shared even in larger deployments are mentioned below:
Q34. What are the unique benefits of getting data into a Splunk instance via Forwarders?
Ans: You can say that the benefits of getting data into Splunk via forwarders are bandwidth throttling, a TCP connection and an encrypted SSL connection for transferring data from a forwarder to an indexer. The data forwarded to the indexers is also load balanced by default, so even if one indexer is down due to a network outage or maintenance, the data can be routed to another indexer instance in a very short time. Also, the forwarder caches the events locally before forwarding them, thus creating a temporary backup of that data.
Q35. What is the use of License Master in Splunk?
Ans: License master in Splunk is responsible for making sure that the right amount of data gets indexed. Splunk license is based on the data volume that comes to the platform within a 24hr window and thus, it is important to make sure that the environment stays within the limits of the purchased volume.
Consider a scenario where you get 300 GB of data on day one, 500 GB of data the next day and 1 terabyte of data some other day and then it suddenly drops to 100 GB on some other day. Then, you should ideally have a 1 terabyte/day licensing model. The license master thus makes sure that the indexers within the Splunk deployment have sufficient capacity and are licensing the right amount of data.
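The sizing arithmetic in this scenario can be written out as a short Python sketch. The function names are hypothetical, and 1 terabyte is taken as 1024 GB for the purpose of the example.

```python
# Daily ingest from the scenario above, in GB. 1 TB is taken as 1024 GB (assumption).
daily_volume_gb = [300, 500, 1024, 100]


def required_daily_quota(volumes_gb):
    """The license must cover the busiest day, not the average day."""
    return max(volumes_gb)


def days_over_quota(volumes_gb, quota_gb):
    """Indexes of the days whose volume would exceed a given daily quota."""
    return [day for day, volume in enumerate(volumes_gb) if volume > quota_gb]


peak = required_daily_quota(daily_volume_gb)
over_500 = days_over_quota(daily_volume_gb, 500)
```

Sizing to the average, which is 481 GB/day for these numbers, would leave two of the four days over the limit, which is why the answer recommends sizing to the peak.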
Q36. What happens if the License Master is unreachable?
Ans: If the license master is unreachable, indexing is not affected: data continues to flow into your Splunk deployment and the indexers continue to index it as usual. What is affected is searching. A license slave that loses contact with the master starts a 72-hour timer, and if it still cannot reach the master when the timer expires, search is blocked on that slave until contact is restored. In the meantime, you will see a warning message on top of your search head or web UI.
Basically, the candidate is expected to answer that the indexing does not stop; only searching is halted.
Q37. Explain ‘license violation’ from Splunk perspective.
Ans: If you exceed the data limit, then you will be shown a ‘license violation’ error. The license warning that is raised will persist for 14 days. With a commercial license you can have 5 warnings within a 30-day rolling window before your indexer’s search results and reports stop triggering. With the free version, however, the limit is only 3 warnings.
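The rolling-window rule can be sketched in Python as follows. The function name and the sample dates are hypothetical, and the sketch assumes that reaching the warning limit inside the window is what counts as a violation; the exact boundary condition should be checked against the Splunk licensing documentation.

```python
from datetime import date, timedelta


def in_violation(warning_dates, today, max_warnings=5, window_days=30):
    """True when the warnings inside the rolling window reach the limit.

    Assumption: hitting max_warnings within the window counts as a violation.
    """
    window_start = today - timedelta(days=window_days)
    recent = [d for d in warning_dates if window_start < d <= today]
    return len(recent) >= max_warnings


today = date(2024, 3, 31)
warnings_issued = [today - timedelta(days=n) for n in (1, 3, 5, 8, 10)]
commercial = in_violation(warnings_issued, today)                  # limit of 5
free = in_violation(warnings_issued[:3], today, max_warnings=3)    # limit of 3
```

Warnings older than the window drop out of the count, so an environment that stays within its quota for long enough leaves the violation state again in this model.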
Q38. Give a few use cases of Knowledge objects.
Ans: Knowledge objects can be used in many domains. A few examples are:
Physical Security: If your organization deals with physical security, then you can leverage data containing information about earthquakes, volcanoes, flooding, etc. to gain valuable insights
Application Monitoring: By using knowledge objects, you can monitor your applications in real time and configure alerts which will notify you when your application crashes or any downtime occurs
Network Security: You can increase security in your systems by blacklisting certain IPs from getting into your network. This can be done by using the knowledge object called lookups
Employee Management: If you want to monitor the activity of people who are serving their notice period, then you can create a list of those people and create a rule preventing them from copying data and using it outside
Easier Searching Of Data: With knowledge objects, you can tag information, create event types and create search constraints right at the start and shorten them so that they are easy to remember, correlate and understand, rather than writing long search queries. Those shortened constraints, in which you put your search conditions, are called event types.
Q39. Explain Search Factor (SF) & Replication Factor (RF)
Ans: Questions regarding Search Factor and Replication Factor are most likely asked when you are interviewing for the role of a Splunk Architect. SF & RF are terminologies related to Clustering techniques (Search head clustering & Indexer clustering).
Q40. Which commands are included in ‘filtering results’ category?
Ans: There will be a great deal of events coming to Splunk in a short time. Thus it is a little complicated task to search and filter data. But, thankfully there are commands like ‘search’, ‘where’, ‘sort’ and ‘rex’ that come to the rescue. That is why, filtering commands are also among the most commonly asked Splunk interview questions.
Search: The ‘search’ command is used to retrieve events from indexes or filter the results of a previous search command in the pipeline. You can retrieve events from your indexes using keywords, quoted phrases, wildcards, and key/value expressions. The ‘search’ command is implied at the beginning of any and every search operation.
Where: The ‘where’ command, by contrast, uses ‘eval’ expressions to filter search results. It keeps only the results for which the evaluation was successful, which makes it useful for drilling down further into the results of a ‘search’. For example, a ‘search’ can be used to find the total number of nodes that are active, but it is the ‘where’ command which will return the active nodes that match a condition, such as running a particular application.
Sort: The ‘sort’ command is used to sort the results by specified fields. It can sort the results in a reverse order, ascending or descending order. Apart from that, the sort command also has the capability to limit the results while sorting. For example, you can execute commands which will return only the top 5 revenue generating products in your business.
Rex: The ‘rex’ command allows you to extract data or particular fields from your events. For example, if you want to identify certain fields in an email id such as abc@edureka.co, the ‘rex’ command allows you to break it down so that abc is the user id, edureka.co is the domain name and edureka is the company name. You can use rex to break down and slice your events, and parts of each event record, the way you want.
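The email breakdown described for rex comes down to a regular expression with named capture groups. The Python sketch below illustrates that idea with Python's re module rather than SPL; the pattern, the function name and the sample event text are assumptions made for illustration.

```python
import re

# Named groups play the role of extracted fields. The pattern is a simplification
# and is not meant to validate real-world email addresses.
EMAIL_PATTERN = re.compile(
    r"(?P<user>[^@\s]+)@(?P<domain>(?P<company>[^.\s]+)\.\S+)"
)


def extract_email_fields(raw_event):
    """Return user, domain and company from the first email found, else {}."""
    match = EMAIL_PATTERN.search(raw_event)
    return match.groupdict() if match else {}


fields = extract_email_fields("login by abc@edureka.co from 10.0.0.1")
```

For the sample event, fields holds abc as the user, edureka.co as the domain and edureka as the company, matching the breakdown in the answer.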
Q41. What is a lookup command? Differentiate between inputlookup & outputlookup commands.
Ans: The lookup command is a topic that most interviews dive into, with questions like: Can you enrich the data? How do you enrich the raw data with an external lookup?
You will be given a use case scenario, where you have a csv file and you are asked to do lookups for certain product catalogs and asked to compare the raw data & structured csv or json data. So you should be prepared to answer such questions confidently.
Lookup commands are used when you want to bring in some fields from an external file (such as a CSV file or any Python-based script) based on some value in an event. It helps to narrow the search results, as it lets you reference fields in an external CSV file that match fields in your event data.
An inputlookup, as the name suggests, takes an input. For example, it would take the product price and product name as input and then match them with an internal field like a product id or an item id. An outputlookup, on the other hand, is used to generate an output from an existing field list. Basically, inputlookup is used to enrich the data and outputlookup is used to build that information.
Q42. What is the difference between the ‘eval’, ‘stats’, ‘chart’ and ‘timechart’ commands?
Ans: ‘Eval’ and ‘stats’ are among the most common as well as the most important commands within the Splunk SPL language and they are used interchangeably in the same way as ‘search’ and ‘where’ commands.
Stats | Chart | Timechart |
Stats is a reporting command which is used to present data in a tabular format. | Chart displays the data in the form of a bar, line or area graph. It also gives the capability of generating a pie chart. | Timechart allows you to look at bar and line graphs. However, pie charts are not possible. |
In Stats command, you can use multiple fields to build a table. | In Chart, it takes only 2 fields, each field on X and Y axis respectively. | In Timechart, it takes only 1 field since the X-axis is fixed as the time field. |
Q43. What are the different types of Data Inputs in Splunk?
Ans: This is the kind of question which only somebody who has worked as a Splunk administrator can answer. The answer to the question is below.
The obvious and the easiest way would be by using files and directories as input
Configuring Network ports to receive inputs automatically and writing scripts such that the output of these scripts is pushed into Splunk is another common way
But a seasoned Splunk administrator would be expected to add another option called Windows inputs. These Windows inputs are of 4 types: registry monitor, printer monitor, network monitor and Active Directory monitor.
Q43. What are the default fields for every event in Splunk?
Ans: There are about 5 default fields, and they are attached to every event that comes into Splunk.
They are host, source, source type, index and timestamp.
Q44. Explain file precedence in Splunk.
Ans: File precedence is an important aspect of troubleshooting in Splunk for an administrator, developer, as well as an architect. All of Splunk’s configurations are written within plain text .conf files. There can be multiple copies present for each of these files, and thus it is important to know the role these files play when a Splunk instance is running or restarted. File precedence is an important concept to understand for a number of reasons:
To determine the priority among copies of a configuration file, Splunk software first determines the directory scheme. The directory schemes are either a) Global or b) App/user.
When the context is global (that is, where there’s no app/user context), directory priority descends in this order:
When the context is app/user, directory priority descends from user to app to system:
Q45. How can we extract fields?
Ans: You can extract fields from either event lists, sidebar or from the settings menu via the UI.
The other way is to write your own regular expressions in props.conf configuration file.
Q46. What is the difference between Search time and Index time field extractions?
Ans: As the name suggests, Search time field extraction refers to the fields extracted while performing searches, whereas fields extracted when the data comes to the indexer are referred to as Index time field extraction. You can set up the index time field extraction either at the forwarder level or at the indexer level.
Another difference is that Search time field extraction’s extracted fields are not part of the metadata, so they do not consume disk space. Whereas index time field extraction’s extracted fields are a part of metadata and hence consume disk space.
Q47. What is a summary index in Splunk?
Ans: Summary index is another important Splunk interview question from an administrative perspective. You will be asked this question to find out if you know how to store your analytical data, reports and summaries. The answer to this question is below.
The biggest advantage of having a summary index is that you can retain the analytics and reports even after your data has aged out. For example:
But the limitations with summary index are:
That is the use of Summary indexing and in an interview, you are expected to answer both these aspects of benefit and limitation.
Q48. How to exclude some events from being indexed by Splunk?
Ans: You might not want to index all your events in a Splunk instance. In that case, how will you keep those events out of Splunk?
An example of this is the debug messages in your application development cycle. You can exclude such debug messages by routing those events to the null queue. The null queue routing is defined in transforms.conf at the forwarder level itself.
A candidate who can answer this question is very likely to get hired.
Q49. What is the use of Time Zone property in Splunk? When is it required the most?
Ans: Time zone is extremely important when you are searching for events from a security or fraud perspective. If you search your events with the wrong time zone then you will end up not being able to find that particular event altogether. Splunk picks up the default time zone from your browser settings. The browser in turn picks up the current time zone from the machine you are using. Splunk picks up that timezone when the data is input, and it is required the most when you are searching and correlating data coming from different sources. For example, you can search for events that came in at 4:00 PM IST, in your London data center or Singapore data center and so on. The timezone property is thus very important to correlate such events.
Q50. What is Splunk App? What is the difference between Splunk App and Add-on?
Ans: Splunk Apps are considered to be the entire collection of reports, dashboards, alerts, field extractions and lookups.
Splunk Apps minus the visual components of a report or a dashboard are Splunk Add-ons. Lookups, field extractions, etc are examples of Splunk Add-on.
Any candidate knowing this answer will be the one questioned more about the developer aspects of Splunk.
Q51. How to assign colors in a chart based on field names in Splunk UI?
Ans: You need to assign colors to charts while creating reports and presenting results. Most of the time the colors are picked by default. But what if you want to assign your own colors? For example, if your sales numbers fall below a threshold, then you might need that chart to display the graph in red color. Then, how will you be able to change the color in a Splunk Web UI?
You will have to first edit the panels built on top of a dashboard and then modify the panel settings from the UI. You can then pick and choose the colors. You can also write commands to choose the colors from a palette by inputting hexadecimal values or by writing code. But, Splunk UI is the preferred way because you have the flexibility to assign colors easily to different values based on their types in the bar chart or line chart. You can also give different gradients and set your values into a radial gauge or water gauge.
Q52. What is sourcetype in Splunk?
Ans: Now this question may feature at the bottom of the list, but that doesn’t mean it is the least important among other Splunk interview questions.
Sourcetype is a default field which is used to identify the data structure of an incoming event. Sourcetype determines how Splunk Enterprise formats the data during the indexing process. Source type can be set at the forwarder level for indexer extraction to identify different data formats. Because the source type controls how Splunk software formats incoming data, it is important that you assign the correct source type to your data. It is important that even the indexed version of the data (the event data) also looks the way you want, with appropriate timestamps and event breaks. This facilitates easier searching of data later.
For example, the data may be coming in the form of a csv, such that the first line is a header, the second line is a blank line and the actual data starts from the next line. Another example where you need to use sourcetype is if you want to break down a date field into 3 different columns of a csv, one each for day, month and year, and then index it. Your answer to this question will be a decisive factor in you getting recruited.