CRITERIA | SPARK | STORM |
Data operation | Data at rest | Data in motion |
Parallel computation | Data parallel | Task parallel |
Latency | Few seconds | Sub-second |
Deploying the application | Using Scala, Java, Python | Using Java API |
Ans: For streaming of data flow, three components are used
Ans: Yes. Apache can also act as a proxy by using the mod_proxy module. This module implements a proxy, gateway, or cache for Apache. It implements proxying capability for AJP13 (Apache JServ Protocol version 1.3), FTP, CONNECT (for SSL), HTTP/0.9, HTTP/1.0, and (since Apache 1.3.23) HTTP/1.1. The module can be configured to connect to other proxy modules for these and other protocols.
Ans: ZeroMQ is “a library which extends the standard socket interfaces with features traditionally provided by specialized messaging middleware products”. Storm relies on ZeroMQ primarily for task-to-task communication in running Storm topologies.
Ans: There are three distinct layers to Storm’s codebase.
First: Storm was designed from the very beginning to be compatible with multiple languages. Nimbus is a Thrift service and topologies are defined as Thrift structures. The usage of Thrift allows Storm to be used from any language.
Second: all of Storm’s interfaces are specified as Java interfaces. So even though there’s a lot of Clojure in Storm’s implementation, all usage must go through the Java API. This means that every feature of Storm is always available via Java.
Third: Storm’s implementation is largely in Clojure. Line-wise, Storm is about half Java code, half Clojure code. But Clojure is much more expressive, so in reality the great majority of the implementation logic is in Clojure.
A tuple coming off a spout can trigger thousands of tuples to be created based on it. Consider, for example, the streaming word count topology:
TopologyBuilder builder = new TopologyBuilder();
builder.setSpout("sentences", new KestrelSpout("kestrel.backtype.com",
                                               22133,
                                               "sentence_queue",
                                               new StringScheme()));
builder.setBolt("split", new SplitSentence(), 10)
       .shuffleGrouping("sentences");
builder.setBolt("count", new WordCount(), 20)
       .fieldsGrouping("split", new Fields("word"));
Ans: This topology reads sentences off a Kestrel queue, splits the sentences into their constituent words, and then emits for each word the number of times it has seen that word before. A tuple coming off the spout triggers many tuples being created based on it: a tuple for each word in the sentence and a tuple for the updated count for each word.
Storm considers a tuple coming off a spout “fully processed” when the tuple tree has been exhausted and every message in the tree has been processed. A tuple is considered failed when its tree of messages fails to be fully processed within a specified timeout. This timeout can be configured on a topology-specific basis using the Config.TOPOLOGY_MESSAGE_TIMEOUT_SECS configuration and defaults to 30 seconds.
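The "fully processed" and "failed" rules above can be modeled in a few lines. This is only a simplified sketch, not Storm's actual acker implementation: the class TupleTreeTracker and all of its method names are hypothetical. It counts the outstanding tuples in each spout tuple's tree and compares the elapsed time to the timeout.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model of tuple-tree tracking; not Storm's real acker.
class TupleTreeTracker {
    // Per spout tuple: number of tuples in its tree that are still unprocessed.
    private final Map<String, Integer> pending = new HashMap<>();
    // Per spout tuple: time (in milliseconds) at which it left the spout.
    private final Map<String, Long> emittedAt = new HashMap<>();
    private final long timeoutMillis;

    TupleTreeTracker(long timeoutSecs) {
        this.timeoutMillis = timeoutSecs * 1000L;
    }

    // The spout emits a root tuple: its tree starts with one pending tuple.
    void spoutEmit(String rootId, long nowMillis) {
        pending.put(rootId, 1);
        emittedAt.put(rootId, nowMillis);
    }

    // A bolt emits `count` new tuples anchored to the given root.
    void anchor(String rootId, int count) {
        pending.merge(rootId, count, Integer::sum);
    }

    // A bolt finishes processing one tuple of the tree.
    void ack(String rootId) {
        pending.merge(rootId, -1, Integer::sum);
    }

    // Fully processed: every tuple in the tree has been acked.
    boolean isFullyProcessed(String rootId) {
        Integer remaining = pending.get(rootId);
        return remaining != null && remaining == 0;
    }

    // Failed: the tree is still incomplete once the timeout has elapsed.
    boolean isFailed(String rootId, long nowMillis) {
        Long start = emittedAt.get(rootId);
        return start != null
                && !isFullyProcessed(rootId)
                && nowMillis - start >= timeoutMillis;
    }

    public static void main(String[] args) {
        TupleTreeTracker tracker = new TupleTreeTracker(30);
        tracker.spoutEmit("sentence-1", 0L); // one sentence leaves the spout
        tracker.anchor("sentence-1", 2);     // "split" emits two word tuples
        tracker.ack("sentence-1");           // the sentence tuple is acked
        tracker.ack("sentence-1");           // first word tuple is acked
        tracker.ack("sentence-1");           // second word tuple is acked
        System.out.println(tracker.isFullyProcessed("sentence-1")); // prints true
    }
}
```

In this model, a tree that still has pending tuples 30 seconds after the spout emitted its root is reported as failed, which mirrors the default timeout mentioned above.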
Ans: The cleanup method is called when a Bolt is being shut down and should clean up any resources that were opened. There’s no guarantee that this method will be called on the cluster: for instance, if the machine the task is running on blows up, there’s no way to invoke the method.
The cleanup method is intended for when you run topologies in local mode (where a Storm cluster is simulated in process), and you want to be able to run and kill many topologies without suffering any resource leaks.
Ans: To kill a topology, simply run:
storm kill {stormname}
Give the same name to storm kill as you used when submitting the topology.
Storm won’t kill the topology immediately. Instead, it deactivates all the spouts so that they don’t emit any more tuples, and then Storm waits Config.TOPOLOGY_MESSAGE_TIMEOUT_SECS seconds before destroying all the workers. This gives the topology enough time to complete any tuples it was processing when it got killed.
Ans: A CombinerAggregator is used to combine a set of tuples into a single field. It has the following signature:
public interface CombinerAggregator<T> extends Serializable {
    T init(TridentTuple tuple);
    T combine(T val1, T val2);
    T zero();
}
Storm calls the init() method with each tuple, and then repeatedly calls the combine() method until the partition is processed. The values passed into the combine() method are partial aggregations, the result of combining the values returned by calls to init().
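To make the call order concrete, here is a minimal, self-contained sketch of a summing aggregator. It does not use the Storm API: SimpleCombiner, Sum, and aggregatePartition are hypothetical names, and a plain List<Object> stands in for TridentTuple so the example runs without Storm on the classpath. Starting the fold from zero() is an assumption made for this sketch.

```java
import java.util.Arrays;
import java.util.List;

// Self-contained sketch; SimpleCombiner and Sum are hypothetical stand-ins,
// and List<Object> plays the role of TridentTuple so no Storm jar is needed.
class CombinerSketch {
    interface SimpleCombiner<T> {
        T init(List<Object> tuple);
        T combine(T val1, T val2);
        T zero();
    }

    // Sums the first field of every tuple (assumed to hold a number).
    static class Sum implements SimpleCombiner<Long> {
        public Long init(List<Object> tuple) {
            return ((Number) tuple.get(0)).longValue();
        }
        public Long combine(Long val1, Long val2) {
            return val1 + val2;
        }
        public Long zero() {
            return 0L;
        }
    }

    // Mimics the call order described above: init() on each tuple, then
    // combine() to fold that partial result into the running aggregate.
    // Starting from zero() is a choice made for this sketch.
    static <T> T aggregatePartition(SimpleCombiner<T> agg, List<List<Object>> partition) {
        T result = agg.zero();
        for (List<Object> tuple : partition) {
            result = agg.combine(result, agg.init(tuple));
        }
        return result;
    }

    public static void main(String[] args) {
        List<List<Object>> partition = Arrays.asList(
                Arrays.<Object>asList(3),
                Arrays.<Object>asList(4),
                Arrays.<Object>asList(5));
        System.out.println(aggregatePartition(new Sum(), partition)); // prints 12
    }
}
```

Because combine() only ever sees partial aggregations, the same Sum logic works whether the partial results come from single tuples or from already-combined groups of tuples.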
Ans: There are a variety of configurations you can set per topology. A list of all the configurations you can set can be found here. The ones prefixed with “TOPOLOGY” can be overridden on a topology-specific basis (the other ones are cluster configurations and cannot be overridden). Here are some common ones that are set for a topology:
Ans: Yes, to update a running topology, the only option currently is to kill the current topology and resubmit a new one. A planned feature is to implement a Storm swap command that swaps a running topology with a new one, ensuring minimal downtime and no chance of both topologies processing tuples at the same time.
Ans: Storm UI is used to monitor topologies. It provides information about errors happening in tasks and fine-grained stats on the throughput and latency performance of each component of each running topology.
Ans: SSL (Secure Socket Layer) data transport requires encryption, and many governments have restrictions upon the import, export, and use of encryption technology. If Apache included SSL in the base package, its distribution would involve all sorts of legal and bureaucratic issues, and it would no longer be freely available. Also, some of the technology required to talk to current clients using SSL is patented by RSA Data Security, who restricts its use without a license.
Ans: Apache is a Web (HTTP) server, not an application server. The base package does not include any such functionality. The PHP project and the mod_perl project allow you to work with databases from within the Apache environment.
Ans: The first two are remnants from the NCSA times, and you should generally be fine if you delete them and stick with httpd.conf.
Ans: We can check the syntax of the httpd configuration file by using the following command:
httpd -S
This command will dump out a description of how Apache parsed the configuration file. Careful examination of the IP addresses and server names may help uncover configuration mistakes.
Ans: Fields grouping in Storm uses a mod hash function to decide which task to send a tuple to, so tuples with the same field values always reach the same task and are processed there in order. No cache is required for this, so there is no time-out or limit on known field values.
The stream is partitioned by the fields specified in the grouping. For example, if the stream is grouped by the “user-id” field, tuples with the same “user-id” will always go to the same task, but tuples with different “user-id” values may go to different tasks.
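The "mod hash" idea can be sketched as follows. This is an illustration under assumptions, not Storm's actual grouping code: FieldsGroupingSketch and chooseTask are hypothetical names, and Java's List.hashCode() stands in for whatever hash Storm applies to the grouping fields.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of "mod hash" task selection; not Storm's actual code.
class FieldsGroupingSketch {
    // Picks a task index from the values of the grouping fields only.
    static int chooseTask(List<Object> groupingValues, int numTasks) {
        // floorMod keeps the index in [0, numTasks) even for negative hashes.
        return Math.floorMod(groupingValues.hashCode(), numTasks);
    }

    public static void main(String[] args) {
        int numTasks = 20;
        int first = chooseTask(Arrays.<Object>asList("user-42"), numTasks);
        int second = chooseTask(Arrays.<Object>asList("user-42"), numTasks);
        System.out.println(first == second); // prints true: same "user-id", same task
    }
}
```

Since the task index is computed from the field values alone, nothing has to be remembered between tuples, which is why no cache, time-out, or limit on distinct values is involved.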
Ans: This module creates dynamically configured virtual hosts, by allowing the IP address and/or the Host: header of the HTTP request to be used as part of the path name to determine what files to serve. This allows for easy use of a huge number of virtual hosts with similar configurations.
Ans: No. The root process opens port 80 but never listens on it, so no user will actually enter the site with root rights. If you kill the root process, you will see the other processes disappear as well.
Ans: MultiViews search is enabled by the MultiViews option. It is the general name given to the Apache server’s ability to provide language-specific document variants in response to a request. This is documented quite thoroughly in the content negotiation description page. In addition, Apache Week carried an article on this subject. The server then chooses the best match to the client’s requirements and returns that document.
Ans: Yes, Apache contains a Search engine. You can search a report name in Apache by using the “Search title”.
Ans: To read from log files, you can configure your spout to emit one tuple per line as it reads the log. The output can then be assigned to a bolt for analysis.
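The read-and-emit loop described above might look like the sketch below. It deliberately avoids the Storm API: LogLineEmitterSketch, emitLines, and countErrors are hypothetical names, a Consumer<String> stands in for the spout's output collector, and countErrors stands in for an analyzing bolt.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch of a line-per-tuple log reader; not a real Storm spout.
class LogLineEmitterSketch {
    // Reads the log line by line and hands each line to the collector, the
    // way a spout would emit one tuple per line. Returns the lines emitted.
    static int emitLines(BufferedReader log, Consumer<String> collector) {
        int emitted = 0;
        try {
            String line;
            while ((line = log.readLine()) != null) {
                collector.accept(line);
                emitted++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return emitted;
    }

    // Stand-in for an analyzing bolt: counts the lines that contain "ERROR".
    static long countErrors(List<String> lines) {
        return lines.stream().filter(l -> l.contains("ERROR")).count();
    }

    public static void main(String[] args) {
        String sample = "INFO start\nERROR disk full\nINFO done\n";
        List<String> received = new ArrayList<>();
        emitLines(new BufferedReader(new StringReader(sample)), received::add);
        System.out.println(countErrors(received)); // prints 1
    }
}
```

In a real topology the reader would wrap the log file rather than an in-memory string, and the collector would be the spout's emit call.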
Ans: In financial services, Storm can be helpful in preventing:
Ans: The Apache Web Server package does not include ASP support.
However, a number of projects provide ASP or ASP-like functionality for Apache. Some of these are:
Ans: It is the maximum amount of time allotted to the topology to fully process a message released by a spout. If the message is not acknowledged within the given time frame, Apache Storm will fail the message on the spout.
Ans: It defines whether Apache should spawn itself as a child process (standalone) or keep everything in a single process (inetd). Keeping it inetd conserves resources.
The ServerType directive is included in Apache 1.3 for backward compatibility with older UNIX-based versions of Apache. By default, Apache is set to standalone server, which means Apache will run as a separate application on the server. The ServerType directive isn’t available in Apache 2.0.
Ans: Java applications are not hosted in Apache itself; Apache can only be connected to another server that hosts Java web applications, using the mod_jk connector. mod_jk is a replacement for the older mod_jserv. It is a completely new Tomcat-Apache plug-in that handles the communication between Tomcat and Apache.
Several reasons: