MySQL Fundamental Questions
1. What is a database?
A database is an organized collection of data. It stores data in a way that data can be easily accessed and maintained. It has tables that contain rows and columns.
Example: School Management Database, Bank Management Database, etc.
2. What is DBMS?
DBMS stands for Database Management System. DBMS is a system software responsible for the creation, retrieval, update and management of the database. It ensures that our data is consistent, organized and is easily accessible by serving as an interface between the database and its end-users or application software.
Examples of DBMS are MySQL, SQL Server, PostgreSQL, etc.
3. What is RDBMS?
RDBMS stands for Relational Database Management System. The key difference here, compared to DBMS, is that RDBMS stores data in the form of a collection of tables and relations can be defined between the common fields of these tables. Most modern database management systems like MySQL, Microsoft SQL Server, Oracle, IBM DB2 and Amazon Redshift are based on RDBMS.
4. What is the difference between SQL and MySQL?
SQL is a standard language for retrieving and manipulating structured databases. On the contrary, MySQL is a relational database management system, like SQL Server, Oracle or IBM DB2, that is used to manage SQL databases.
5. What is SQL?
SQL stands for Structured Query Language. It is the standard language for performing CRUD operations on relational database management systems. It is especially useful in handling organized data comprised of entities (variables) and relations between different entities of the data.
6. What are Tables and Fields?
A table is a set of data that is organized in a model with Columns and Rows. Columns can be categorized as vertical, and Rows are horizontal. A table has a specified number of columns called fields but can have any number of rows which is called record. Here is an example:
Column: id, first_name,date_of_birth.
Row: 234, Peter, 1/20/1990.
7. What is Primary Key?
a PRIMARY KEY uniquely identifies a row in a table. It must contain UNIQUE values and has an implicit NOT NULL constraint.
A table in SQL is strictly restricted to have one and only one primary key, which is comprised of single or multiple fields (columns).
CREATE TABLE user ( /* Create table with a single field as primary key */ id INT NOT NULL name VARCHAR(255) PRIMARY KEY (id) ); CREATE TABLE user ( /* Create table with multiple fields as primary key */ id INT NOT NULL last_name VARCHAR(255) first_name VARCHAR(255) NOT NULL, CONSTRAINT PK_user PRIMARY KEY (id, first_name) ); ALTER TABLE user /* Set a column as primary key */ ADD PRIMARY KEY (id); ALTER TABLE user /* Set multiple columns as primary key */ ADD CONSTRAINT PK_user /*Naming a Primary Key*/ PRIMARY KEY (id, first_name);
8. What is a foreign key?
A foreign key in one table is a primary key of another table. A foreign key is used to set relationships(one to one, one to many, and many to many) between two tables. This relationship is set up as a parent and child relationship. A child table references its parent table using a foreign key which is its parent’s primary key.
In this example, Employee is the parent table and Salary is the child table. Salary has Employee primary key (EmployeeID) as a foreign key.
9. What is a unique key?
A UNIQUE constraint ensures that all values in a column are different. This provides uniqueness for the column(s) and helps identify each row uniquely. Unlike the primary key, there can be multiple unique constraints defined per table. The code syntax for UNIQUE is quite similar to that of PRIMARY KEY and can be used interchangeably.
CREATE TABLE user ( /* Create table with multiple fields as unique */ id INT NOT NULL name VARCHAR(255) NOT NULL CONSTRAINT PK_user UNIQUE (id, name) );
10. What is a join?
A JOIN is a keyword used to query data from two or more tables. Join is querying data from two or more tables.
INNER JOIN or JOIN – Retrieves records that have matching values in both tables involved in the join. This is the widely used join for queries.
SELECT * FROM employee emp JOIN salary sal ON emp.id = sal.emp_id; // OR SELECT * FROM employee emp INNER JOIN salary sal ON emp.id = sal.emp_id;
LEFT JOIN or LEFT OUTER JOIN– Retrieves all the records/rows from the left and the matched records/rows from the right table.
SELECT * FROM employee emp LEFT JOIN salary sal ON emp.id = sal.emp_id;
RIGHT JOIN or RIGHT OUTER JOIN – Retrieves all the records/rows from the right and the matched records/rows from the left table.
SELECT * FROM employee emp RIGHT JOIN salary sal ON emp.id = sal.emp_id;
FULL JOIN or FULL OUTER JOIN – Retrieves all the records where there is a match on either the left or right table.
SELECT * FROM employee emp FULL JOIN salary sal ON emp.id = sal.emp_id;
11. What is a self JOIN?
A self JOIN is a case of regular join where a table is joined to itself based on some relation between its own column(s). Self-join uses the INNER JOIN or LEFT JOIN clause and a table alias is used to assign different names to the table within the query.
SELECT empA.emp_id AS "Emp_ID",empA.emp_name AS "Employee", empB.emp_id AS "Sup_ID",empB.emp_name AS "Supervisor" FROM employee empA, employee empB WHERE empA.emp_sup = empB.emp_id; // OR SELECT empA.emp_id AS "Emp_ID",empA.emp_name AS "Employee", empB.emp_id AS "Sup_ID",empB.emp_name AS "Supervisor" FROM employee empA, JOIN employee empB ON empA.emp_sup = empB.emp_id;
12. What is a CROSS JOIN?
Cross join can be defined as a cartesian product of the two tables included in the join. The table after join contains the same number of rows as in the cross-product of number of rows in the two tables. If a WHERE clause is used in cross join then the query will work like an INNER JOIN.
SELECT emp.name, sal.amount FROM employee AS emp CROSS JOIN salary AS sal;
12. What is Normalization?
Normalization is the process of organizing your database tables in a way that your data is not redundant.
For example, a spreadsheet containing information about salespeople and customers serves several purposes:
By limiting a table to one purpose you reduce the number of duplicate data contained within your database.
There are three main reasons to normalize a database. The first is to minimize duplicate data, the second is to minimize or avoid data modification issues, and the third is to simplify queries.
13. What is Denormalization?
Denormalization is the inverse process of normalization, where the normalized schema is converted into a schema that has redundant information. The performance is improved by using redundancy and keeping redundant data consistent. The reason for performing denormalization is the overheads produced in the query processor by an over-normalized structure.
14. What are the different forms of Normalization?
1st Normal Form – This should remove all the duplicate columns from the table. Creation of tables for the related data and identification of unique columns.
2nd Normal Form – Meeting all requirements of the first normal form. Placing the subsets of data in separate tables and Creation of relationships between the tables using primary keys.
3rd Normal Form – This should meet all requirements of 2NF. Removing the columns which are not dependent on primary key constraints.
4th Normal Form – Meeting all the requirements of the third normal form and it should not have multi-valued dependencies.
15. What is a view?
A view is a virtual table that consists of a subset of data from one or more tables. Views are not virtually present, and it takes less space to store. View can have data of one or more tables combined, and it is depending on the relationship.
16. What is an Index?
An index is very similar to an index in the back of a book. For example, if you want to reference all pages in a book that discusses a certain topic, you first refer to the index, which lists all the topics alphabetically and is then referred to one or more specific page numbers.
An index helps to speed up SELECT queries and WHERE clauses, but it slows down data input, with the UPDATE and the INSERT statements. Indexes can be created or dropped with no effect on the data.
17. What are all the different types of indexes?
There are three types of indexes -.
This indexing does not allow the field to have duplicate values if the column is unique indexed. Unique index can be applied automatically when primary key is defined.
This type of index reorders the physical order of the table and search based on the key values. Each table can have only one clustered index.
NonClustered Index does not alter the physical order of the table and maintains logical order of data. Each table can have 999 nonclustered indexes.
18. What is a Cursor?
A database Cursor is a control pointer that enables traversal over the rows or records in the table. This can be viewed as a pointer to one row in a set of rows. The cursor is very much useful for traversing such as retrieval, addition, and removal of database records.
19. What is a query?
A DB query is a code written in order to put or get the information back from the database. Query can be designed in such a way that it matched our expectations of the result set. Simply, a question to the Database.
20. What is subquery?
A subquery is a query within another query. The outer query is called the main query, and the inner query is called subquery. A SubQuery is always executed first, and the result of the subquery is passed on to the main query. A sub-query is executed and its results are passed to the outer query which is then executed after.
21. What are the types of subquery?
There are two types of subquery – Correlated and Non-Correlated.
A correlated subquery is a subquery that uses fields from the outer query or the main query. It evaluates once for every row in the outer query.
A Non-Correlated sub-query is independent of the outer or main query. It does not reference any fields from the outer query.
22. What is a stored procedure?
Stored Procedure is a function consists of many SQL statements to access the database system. Several SQL statements are consolidated into a stored procedure and execute them whenever and wherever required.
Such procedures are stored in the database data dictionary. The sole disadvantage of a stored procedure is that it can be executed nowhere except in the database and occupies more memory in the database server. It also provides a sense of security and functionality as users who can’t access the data directly can be granted access via stored procedures.
23. What is a trigger?
A DB trigger is a code or program that automatically execute when some event on a table or view in a database takes place. Mainly, the trigger helps to maintain the integrity of the database.
Example: When a new student is added to the student database, new records should be created in the related tables like Exam, Score and Attendance tables.
24. What are the TRUNCATE, DELETE and DROP statements?
DELETE – delete a row in a table
DELETE FROM Candidates WHERE CandidateId > 1000;
TRUNCATE – command is used to delete all the rows from the table and free the space containing the table.
TRUNCATE TABLE Candidates;
DROP – command is used to remove an object from the database. If you drop a table, all the rows in that table are deleted and the table structure is removed from the database.
DROP TABLE user;
25. What is the difference between DROP and TRUNCATE statements?
If a table is dropped, all things associated with that table are dropped as well. This includes – the relationships defined on the table with other tables, the integrity checks and constraints, access privileges and other grants that the table has. To create and use the table again in its original form, all these relations, checks, constraints, privileges and relationships need to be redefined. However, if a table is truncated, none of the above problems exist and the table retains its original structure.
26. What is the difference between DELETE and TRUNCATE statements?
The TRUNCATE command is used to delete all the rows from the table and free the space containing the table.
The DELETE command deletes only the rows from the table based on the condition given in the where clause or deletes all the rows from the table if no condition is specified. But it does not free the space containing the table.
27. What are local and global variables and their differences?
Local variables are the variables that can be used or exist inside the function. They are not known to the other functions and those variables cannot be referred or used. Variables can be created whenever that function is called.
Global variables are the variables that can be used or exist throughout the program. The same variable declared in global cannot be used in functions. Global variables cannot be created whenever that function is called.
28. What is a constraint?
A constraint can be used to specify the limit on the data type of table. A constraint can be specified while creating or altering the table statement.
29. What is data Integrity?
Data Integrity defines the accuracy and consistency of data stored in a database. It can also define integrity constraints to enforce business rules on the data when it is entered into the application or database.
30. What is Auto Increment?
Autoincrement keyword allows the user to create a unique number to be generated when a new record is inserted into the table. AUTOINCREMENT keyword can be used in Oracle and IDENTITY keyword can be used in SQL SERVER.
Mostly this keyword can be used whenever the PRIMARY KEY is used.
31. What are Aggregate and Scalar functions?
An aggregate function performs operations on a collection of values to return a single scalar value. Aggregate functions are often used with the GROUP BY and HAVING clauses of the SELECT statement. Following are the widely used SQL aggregate functions:
AVG() – Calculates the mean of a collection of values.
COUNT() – Counts the total number of records in a specific table or view.
MIN() – Calculates the minimum of a collection of values.
MAX() – Calculates the maximum of a collection of values.
SUM() – Calculates the sum of a collection of values.
FIRST() – Fetches the first element in a collection of values.
LAST() – Fetches the last element in a collection of values.
Note: All aggregate functions described above ignore NULL values except for the COUNT function.
A scalar function returns a single value based on the input value. Following are the widely used SQL scalar functions:
LEN() – Calculates the total length of the given field (column).
UCASE() – Converts a collection of string values to uppercase characters.
LCASE() – Converts a collection of string values to lowercase characters.
MID() – Extracts substrings from a collection of string values in a table.
CONCAT() – Concatenates two or more strings.
RAND() – Generates a random collection of numbers of given length.
ROUND() – Calculates the round off integer value for a numeric field (or decimal point values).
NOW() – Returns the current data & time.
FORMAT() – Sets the format to display a collection of values.
32. What is the difference between a Cluster and Non-Cluster Index?
a clustered index is used for easy retrieval of data from the database by altering the way that the records are stored. Database sorts out rows by the column which is set to be clustered index.
A nonclustered index does not alter the way it was stored but creates a completely separate object within the table. It points back to the original table rows after searching.
33. What is Datawarehouse?
Datawarehouse is a central repository of data from multiple sources of information. Those data are consolidated, transformed and made available for the mining and online processing. Warehouse data have a subset of data called Data Marts.
34. What is OLTP?
OLTP stands for Online Transaction Processing, is a class of software applications capable of supporting transaction-oriented programs. An essential attribute of an OLTP system is its ability to maintain concurrency. To avoid single points of failure, OLTP systems are often decentralized. These systems are usually designed for a large number of users who conduct short transactions. Database queries are usually simple, require sub-second response times and return relatively few records. Here is an insight into the working of an OLTP system
35. What are the differences between OLTP and OLAP?
OLTP stands for Online Transaction Processing, which is a class of software applications capable of supporting transaction-oriented programs. An important attribute of an OLTP system is its ability to maintain concurrency. OLTP systems often follow a decentralized architecture to avoid single points of failure. These systems are generally designed for a large audience of end-users who conduct short transactions. Queries involved in such databases are generally simple, need fast response times and return relatively few records. The number of transactions per second acts as an effective measure for such systems.
OLAP stands for Online Analytical Processing, a class of software programs that are characterized by a relatively low frequency of online transactions. Queries are often too complex and involve a bunch of aggregations. For OLAP systems, the effectiveness measure relies highly on response time. Such systems are widely used for data mining or maintaining aggregated, historical data, usually in multi-dimensional schemas.
36. What is User-defined function? What are its various types?
The user-defined functions in SQL are like functions in any other programming language that accept parameters, perform complex calculations, and return a value. They are written to use the logic repetitively whenever required. There are two types of SQL user-defined functions:
Scalar Function: As explained earlier, user-defined scalar functions return a single scalar value.
Table-Valued Functions: User-defined table-valued functions return a table as output.
Inline: returns a table data type based on a single SELECT statement.
Multi-statement: returns a tabular result-set but, unlike inline, multiple SELECT statements can be used inside the function body.
37. What is collation?
Collation is defined as a set of rules that determine how character data can be sorted and compared. This can be used to compare A and, other language characters and also depends on the width of the characters.
ASCII value can be used to compare these character data.
38. What is Pattern Matching in SQL?
SQL pattern matching provides for pattern search in data if you have no clue as to what that word should be. This kind of SQL query uses wildcards to match a string pattern, rather than writing the exact word. The LIKE operator is used in conjunction with SQL Wildcards to fetch the required information.
a. Using the % wildcard to perform a simple search
The % wildcard matches zero or more characters of any type and can be used to define wildcards both before and after the pattern. Search a student in your database with first name beginning with the letter K:
SELECT * FROM students WHERE first_name LIKE 'K%'
b. Omitting the patterns using the NOT keyword
Use the NOT keyword to select records that don’t match the pattern. This query returns all students whose first name does not begin with K.
SELECT * FROM students WHERE first_name NOT LIKE 'K%'
c. Matching a pattern anywhere using the % wildcard twice
Search for a student in the database where he/she has a K in his/her first name.
SELECT * FROM students WHERE first_name LIKE '%Q%'
d. Using the _ wildcard to match pattern at a specific position
The _ wildcard matches exactly one character of any type. It can be used in conjunction with % wildcard. This query fetches all students with letter K at the third position in their first name.
SELECT * FROM students WHERE first_name LIKE '__K%'
e. Matching patterns for specific length
The _ wildcard plays an important role as a limitation when it matches exactly one character. It limits the length and position of the matched results. For example –
SELECT * /* Matches first names with three or more letters */ FROM students WHERE first_name LIKE '___%' SELECT * /* Matches first names with exactly four characters */ FROM students WHERE first_name LIKE '____'