Ans: Relational Data Base Management Systems (RDBMS) are database management systems that maintain data records and indices in tables. Relationships may be created and maintained across and among the data and tables. In a relational database, relationships between data items are expressed by means of tables. Interdependencies among these tables are expressed by data values rather than by pointers. This allows a high degree of data independence. An RDBMS has the capability to recombine the data items from different files, providing powerful tools for data usage.
Ans: The data in an RDBMS is stored in database objects which are called as tables. This table is basically a collection of related data entries and it consists of numerous columns and rows.
Remember, a table is the most common and simplest form of data storage in a relational database. The following program is an example of a CUSTOMERS table −
| ID | NAME | AGE | ADDRESS | SALARY |
| 1 | Ramesh | 32 | Ahmedabad | 2000.00 |
| 2 | Khilan | 25 | Delhi | 1500.00 |
| 3 | kaushik | 23 | Kota | 2000.00 |
| 4 | Chaitali | 25 | Mumbai | 6500.00 |
| 5 | Hardik | 27 | Bhopal | 8500.00 |
| 6 | Komal | 22 | MP | 4500.00 |
| 7 | Muffy | 24 | Indore | 10000.00 |
Ans: Database normalization is a data design and organization process applied to data structures based on rules that help build relational databases. In relational database design, the process of organizing data to minimize redundancy. Normalization usually involves dividing a database into two or more tables and defining relationships between the tables. The objective is to isolate data so that additions, deletions, and modifications of a field can be made in just one table and then propagated through the rest of the database via the defined relationships.
Ans: E-R diagram also termed as Entity-Relationship diagram shows relationship between various tables in the database.
Ans: Every table is broken up into smaller entities called fields. The fields in the CUSTOMERS table consist of ID, NAME, AGE, ADDRESS and SALARY.
A field is a column in a table that is designed to maintain specific information about every record in the table.
Ans:
1NF: Eliminate Repeating Groups Make a separate table for each set of related attributes, and give each table a primary key. Each field contains at most one value from its attribute domain.
2NF: Eliminate Redundant Data If an attribute depends on only part of a multi-valued key, remove it to a separate table.
3NF: Eliminate Columns Not Dependent On Key If attributes do not contribute to a description of the key, remove them to a separate table. All attributes must be directly dependent on the primary key.
BCNF: Boyce-Codd Normal Form If there are non-trivial dependencies between candidate key attributes, separate them out into distinct tables.
4NF: Isolate Independent Multiple Relationships No table may contain two or more 1:n or n:m relationships that are not directly related.
5NF: Isolate Semantically Related Multiple Relationships There may be practical constrains on information that justify separating logically related many-to-many relationships.
ONF: Optimal Normal Form A model limited to only simple (elemental) facts, as expressed in Object Role Model notation.
DKNF: Domain-Key Normal Form A model free from all modification anomalies. Remember, these normalization guidelines are cumulative. For a database to be in 3NF, it must first fulfill all the criteria of a 2NF and 1NF database.
Ans: A record is also called as a row of data is each individual entry that exists in a table. For example, there are 7 records in the above CUSTOMERS table. Following is a single row of data or record in the CUSTOMERS table −
| 1 | Ramesh | 32 | Ahmedabad | 2000.00 |
A record is a horizontal entity in a table.
Ans: A stored procedure is a named group of SQL statements that have been previously created and stored in the server database. Stored procedures accept input parameters so that a single procedure can be used over the network by several clients using different input data. And when the procedure is modified, all clients automatically get the new version. Stored procedures reduce network traffic and improve performance. Stored procedures can be used to help ensure the integrity of the database.
e.g. sp_helpdb, sp_renamedb, sp_depends etc.
Ans: A column is a vertical entity in a table that contains all information associated with a specific field in a table.
For example, a column in the CUSTOMERS table is ADDRESS, which represents location description and would be as shown below −
| ADDRESS |
| Ahmedabad |
| Delhi |
| Kota |
| Mumbai |
| Bhopal |
| MP |
| Indore |
Ans: A trigger is a SQL procedure that initiates an action when an event (INSERT, DELETE or UPDATE) occurs. Triggers are stored in and managed by the DBMS. Triggers are used to maintain the referential integrity of data by changing the data in a systematic fashion. A trigger cannot be called or executed; the DBMS automatically fires the trigger as a result of a data modification to the associated table. Triggers can be viewed as similar to stored procedures in that both consist of procedural logic that is stored at the database level. Stored procedures, however, are not event-drive and are not attached to a specific table as triggers are. Stored procedures are explicitly executed by invoking a CALL to the procedure while triggers are implicitly executed. In addition, triggers can also execute stored procedures.
Ans: A NULL value in a table is a value in a field that appears to be blank, which means a field with a NULL value is a field with no value.
It is very important to understand that a NULL value is different than a zero value or a field that contains spaces. A field with a NULL value is the one that has been left blank during a record creation.
Ans: A simple view can be thought of as a subset of a table. It can be used for retrieving data, as well as updating or deleting rows. Rows updated or deleted in the view are updated or deleted in the table the view was created with. It should also be noted that as data in the original table changes, so does data in the view, as views are the way to look at part of the original table. The results of using a view are not permanently stored in the database. The data accessed through a view is actually constructed using standard T-SQL select command and can come from one to many different base tables or even other views.
Ans: Constraints are the rules enforced on data columns on a table. These are used to limit the type of data that can go into a table. This ensures the accuracy and reliability of the data in the database.
Constraints can either be column level or table level. Column level constraints are applied only to one column whereas, table level constraints are applied to the entire table.
Following are some of the most commonly used constraints available in SQL −
Ans: An index is a physical structure containing pointers to the data. Indices are created in an existing table to locate rows more quickly and efficiently. It is possible to create an index on one or more columns of a table, and each index is given a name. The users cannot see the indexes, they are just used to speed up queries. Effective indexes are one of the best ways to improve performance in a database application. A table scan happens when there is no index available to help a query. In a table scan SQL Server examines every row in the table to satisfy the query results. Table scans are sometimes unavoidable, but on large tables, scans have a terrific impact on performance. Clustered indexes define the physical sorting of a database table’s rows in the storage media. For this reason, each database table may have only one clustered index. Non-clustered indexes are created outside of the database table and contain a sorted list of references to the table itself.
Ans: The following categories of data integrity exist with each RDBMS −
Ans: A database is a logically coherent collection of data with some inherent meaning, representing some aspect of real world and which is designed, built and populated with data for a specific purpose.
Ans: Database normalization is the process of efficiently organizing data in a database. There are two reasons of this normalization process −
Both these reasons are worthy goals as they reduce the amount of space a database consumes and ensures that data is logically stored. Normalization consists of a series of guidelines that help guide you in creating a good database structure.
Normalization guidelines are divided into normal forms; think of a form as the format or the way a database structure is laid out. The aim of normal forms is to organize the database structure, so that it complies with the rules of first normal form, then second normal form and finally the third normal form.
It is your choice to take it further and go to the fourth normal form, fifth normal form and so on, but in general, the third normal form is more than enough.
Ans: It is a collection of programs that enables user to create and maintain a database. In other words it is general-purpose software that provides the users with the processes of defining, constructing and manipulating the database for various applications.
Ans: DBMS provides a systematic and organized way of storing, managing and retrieving from collection of logically related information. Relational Data Base Management System also provides what DBMS provides but above that it provides relationship integrity. So in short we can say
RDBMS = DBMS + REFERENTIAL INTEGRITY
These relations are defined by using “Foreign Keys” in any RDBMS. Many DBMS companies claimed there DBMS product was a RDBMS compliant, but according to industry rules and regulations if the DBMS fulfills the twelve CODD rules it’s truly a RDBMS. Almost all DBMS (SQL SERVER, ORACLE etc) fulfills all the twelve CODD rules and are considered as truly Relational Data Base Management System.
Ans: The database and DBMS software together is called as Database system.
Ans: In 1969 Dr. E. F. Codd laid down some 12 rules which a DBMS should adhere in order to get the logo of a true RDBMS.
Rule 1: Information Rule- “All information in a relational data base is represented explicitly at the logical level and in exactly one way – by values in tables.”
Rule 2: Guaranteed access Rule- “Each and every datum (atomic value) in a relational data base is guaranteed to be logically accessible by resorting to a combination of table name, primary key value and column name.”In flat files we have to parse and know exact location of field values. But if a DBMS is truly RDBMS you can access the value by specifying the table name, field name, for instance Customers.Fields [‘Customer Name’].
Rule 3: Systematic treatment of null values- “Null values (distinct from the empty character string or a string of blank characters and distinct from zero or any other number) are supported in fully relational DBMS for representing missing information and inapplicable information in a systematic way, independent of data type.”.
Rule 4: Dynamic on-line catalog based on the relational model- “The data base description is represented at the logical level in the same way as ordinary data, so that authorized users can apply the same relational language to its interrogation as they apply to the regular data.”The Data Dictionary is held within the RDBMS, thus there is no-need for off-line volumes to tell you the structure of the database.
Rule 5: Comprehensive data sub-language Rule- “A relational system may support several languages and various modes of terminal use (for example, the fill-in-the-blanks mode). However, there must be at least one language whose statements are expressible, per some well-defined syntax, as character strings and that is comprehensive in supporting all the following items
Data Definition
View Definition
Data Manipulation (Interactive and by program).
Integrity Constraints
Authorization.
Transaction boundaries ( Begin , commit and rollback)
Rule 6: View updating Rule- “All views that are theoretically updatable are also updatable by the system.”
Rule 7: High-level insert, update and delete- “The capability of handling a base relation or a derived relation as a single operand applies not only to the retrieval of data but also to the insertion, update and deletion of data.”
Rule 8: Physical data independence- “Application programs and terminal activities remain logically unimpaired whenever any changes are made in either storage representations or access methods.”
Rule 9: Logical data independence- “Application programs and terminal activities remain logically unimpaired when information-preserving changes of any kind that theoretically permit un-impairment are made to the base tables.”
Rule 10: Integrity independence- “Integrity constraints specific to a particular relational data base must be definable in the relational data sub-language and storable in the catalog, not in the application programs.”
Rule 11: Distribution independence- “A relational DBMS has distribution independence.”
Rule 12: Non-subversion Rule- “If a relational system has a low-level (single-record-at-a-time) language, that low level cannot be used to subvert or bypass the integrity Rules and constraints expressed in the higher level relational language (multiple-records-at-a-time).”
Ans: I find designers sometimes deliver over-complex, generic solutions that could (in theory) do anything, but in reality are remarkably difficult to operate, and often misunderstood. I believe this stems from a lack of understanding of the requirement – the second most important lesson.
Everyone from the senior stakeholders to architects, designers and developers need to fully understand the goal. Not the solution, but the “problem we’re trying to solve”. End users never ask for what they need (the requirement), but instead, they describe a potential solution. IT professionals are by nature delivery focused, and, get frustrated when it appears “the user changed their mind”. I find the user seldom changes their mind. In reality, the requirement was never fully understood.
To summarize. Focus on the problem not the solution. Then (once understood), suggest a dozen solutions and pick the best one. But keep it simple.
Ans: There are three major relationship models:-
Ans: The traditional RDBMS solutions from Oracle, IBM and Microsoft implement the relational model on a 1970s hardware architecture, and typically provide a good general purpose database platform which can be applied to most OLTP and Data Warehouse use cases.
However, as Dr. Michael Stonebraker indicated in this 2007 paper, The End of an Architectural Era (It’s Time for a Complete Rewrite), these are no longer fit for purpose, as both the hardware technology, and processing demands have moved on. In particular, the need for real time (millisecond) performance, greater scalability to handle web-scale applications, and the need to handle unstructured and semi-structured data.
Ans: The Key-Value NoSQL databases were developed to handle the massive transaction volume, and low latency needed to handle web commerce at Amazon and LinkedIn. Others (eg. MongoDB) where developed to handle semi-structured data, while still others (eg. Neo4J) were built to efficiently model data networks of the type found at Facebook or LinkedIn.
The common thread with NoSQL databases is they tend to use an API interface rather than industry standard SQL, although increasingly that’s changing. They do however, entirely reject the relational model and ACID compliance. They typically don’t support cross-table join operations, and are entirely focused on low latency, trading consistency for scalability.
The so-called NewSQL databases include VoltDB , MemSQL and CockroachDB are a return to the relational model, but re-architected for modern hardware and web scale use cases. Like NoSQL solutions, they tend to run on a shared nothing architecture, and scale to millions of transactions per second, but they also have full transaction support and ACID compliance that are critical for financial operations.
Ans: Denormalization is the process of putting one fact in numerous places (its vice-versa of normalization).Only one valid reason exists for denormalizing a relational design – to enhance performance.The sacrifice to performance is that you increase redundancy in database.
Ans: The shared nothing architecture is built to support horizontal scalability, and when combined with data replication, can provide high availability and good performance. If one node in the cluster fails, the system continues, as the data is available on other nodes. The NoSQL database is built upon this architecture, and to maximize throughput, ACID compliance is relaxed in favor of Eventual Consistency, and in some cases (eg. Cassandra), it supports tunable consistency, allowing the developer to trade performance for consistency, and durability.
For example, after a write operation, the change cannot be considered durable (the D in ACID) until the change is replicated to at least one, ideally two other nodes , but this would increase latency, and reduce performance. It’s possible however, to relax this constraint, and return immediately, with the risk the change may be lost if the node crashes before the data is successfully replicated. This becomes even more of a potential issue if the node is temporarily disconnected from the network, but is allowed to continue accepting transactions until the connection is restored. In practice, consistency will be eventually be achieved when the connection is reestablished – hence the term Eventual Consistency.
A NewSQL database on the other hand accepts no such compromise, and some databases (eg. VoltDB), even support full serializability, executing transactions as if they were executed serially. Impressively, they manage this impressive feat at a rate of millions of transactions per second, potentially on commodity hardware.
Ans: In fourth normal form it should not contain two or more independent multi-v about an entity and it should satisfy “Third Normal form”.
Fifth normal form deals with reconstructing information from smaller pieces of information. These smaller pieces of information can be maintained with less redundancy.
All the low-latency, high throughput of the NoSQL solutions, but with the flexibility and ease of a full relational database, and full SQL support.
In conclusion, the solution needs to abandon the traditional batch oriented solution in favour of an always-on streaming solution with all processing in memory.
Ans: If we want relational system in conjunction with time we use sixth normal form. At this moment SQL Server does not supports it directly.
Ans: First stated in the paper One Size Fits All – An Idea Whose Time Has Come And Gone Dr. Michael Stonebraker argued that the legacy RDBMS dominance was at an end, and would be replaced by specialized database technology including stream processing, OLTP and Data Warehouse solutions.
Certainly disruption in the Data Warehouse database market has been accelerated with the move towards the cloud, and as this Gigaom Report illustrates, there are at least nine major players in the market, with new specialized tools including Google Big Query, Amazon Redshift and Snowflake, and the column store (in memory or on secondary storage) dominates.
Finally, the explosion of specialized NoSQL and NewSQL databases, each with its own specialty including Key-Value, Document Stores, Text Search and Graph databases lend credence to the statement “One Size no longer fits all”.
I do think however, we’re still in a transformation stage, and the shake-out is not yet complete. I think a lot of large corporations (especially Financial Services) are wary of change, but it’s already happening.