Subscribe To Our Newsletter
You will receive our latest post and tutorial.
Thank you for subscribing!

required
required


MySQL interview – Advanced Queries

MySQL Advanced Queries

1.a Write a query to find the top 3 highest salaries in the company.

SELECT salary
FROM employee
ORDER BY salary DESC
LIMIT 3;

1.b Write a query to find the top 3 highest salary employees in the company.

SELECT *
FROM employee
ORDER BY salary DESC
LIMIT 3;

1.c Write a query to find the top 3rd highest salary employee in the company.

SELECT *
FROM employee
ORDER BY salary DESC
-- LIMIT offset, count
LIMIT 2, 1;

1.d Write an SQL query to find the maximum, minimum, and average salary of the employees.

SELECT 
    Max(salary), 
    Min(salary), 
    AVG(salary) 
FROM employee;

 

2. Write a query to find the number of employees who have been with the company for more than 5 years.

– Use (CURRENT_DATE – hired_date) / 365 to calculate 1 year

SELECT COUNT(*) as _5YearAndOlder
FROM employee
WHERE ((CURRENT_DATE - hired_date) / 365) >= 5;

3.a Write a query to find the average salary for each department in the company. Sort the average_salary from highest to lowest

SELECT department_id, AVG(salary) AS average_salary
FROM employee
GROUP BY department_id
ORDER BY average_salary DESC;

3.b Write a query to find the count of employees per deparment. 

SELECT department_id, COUNT(id) AS departmentCount
FROM employee
GROUP BY department_id
ORDER BY departmentCount DESC;

3.c Write a query to find the employees that have salary that is above average salary for each department in the company.

  • Get the average salary per department in a subquery
  • Use the average found above to filter employees by their salary
SELECT emp.id, emp.first_name, emp.last_name, emp.salary, avg_salary.average_salary
FROM employee as emp
JOIN (
    SELECT department_id, AVG(salary) AS average_salary
    FROM employee
    GROUP BY department_id
) as avg_salary ON avg_salary.department_id = emp.department_id
WHERE emp.salary >= avg_salary.average_salary;

 

4. Write a query to find the employees who are working on more than 1 project.

SELECT pe.employee_id, emp.first_name, emp.last_name, COUNT(pe.project_id) as projectCount 
FROM project_employees as pe
JOIN employee as emp ON pe.employee_id = emp.id -- join employee just to display name
GROUP BY pe.employee_id
HAVING projectCount > 1;

5. Write a query to find the employees who are not working on any projects.

SELECT id, first_name, last_name
FROM employee
WHERE id NOT IN (
  SELECT DISTINCT employee_id
  FROM project_employees
);

6.a Write a query to find the employees who have joined last year.

SELECT id, first_name, last_name, hired_date
FROM employee
WHERE YEAR(hired_date) = YEAR(CURRENT_DATE()) - 1;

6.b Write a query to find the employees who have joined in the last year.

SELECT id, first_name, last_name, hired_date
FROM employee
WHERE hired_date > CURRENT_DATE - INTERVAL 1 YEAR;

7. Write a query to find the employees who have submitted more than 1 bug reports in the last month.

SELECT br.reporter_id, emp.first_name, emp.last_name, COUNT(br.id) as bugCount
FROM employee_bug_report as br
JOIN employee as emp ON emp.id = br.reporter_id
WHERE br.reported_at > CURRENT_DATE - INTERVAL 1 MONTH
GROUP BY br.reporter_id
HAVING bugCount > 1;

8.a Write a query to find the employee who has submitted the most bug reports.

SELECT br.reporter_id, emp.first_name, emp.last_name, COUNT(br.id) as bugCount
FROM employee_bug_report as br
JOIN employee as emp ON emp.id = br.reporter_id
GROUP BY br.reporter_id
ORDER BY bugCount DESC
LIMIT 0, 1;

8.b Write a query to find the employee who has submitted the most bug reports in the last month.

SELECT br.reporter_id, emp.first_name, emp.last_name, (CURRENT_DATE - INTERVAL 1 MONTH) as from_date, COUNT(br.id) as bugCount
FROM employee_bug_report as br
JOIN employee as emp ON emp.id = br.reporter_id
WHERE br.reported_at > CURRENT_DATE - INTERVAL 1 MONTH
GROUP BY br.reporter_id
ORDER BY bugCount DESC
LIMIT 0, 1;

9.a Write a query to find the employees who have duplicated first_names

  • This question can be worded differently in many ways. It can be asked for duplicate emails.
SELECT emp.first_name, COUNT(emp.id) as nameCount
FROM employee as emp
GROUP BY emp.first_name
HAVING nameCount > 1;

9.b Write a query to find the count of employees who have duplicated first_names

SELECT COUNT(duplicatedNames.idCount) as numberOfDuplicatedFirstName
FROM (
    SELECT COUNT(emp.id) as idCount
    FROM employee as emp
    GROUP BY emp.first_name
    HAVING idCount > 1
) as duplicatedNames;

9.c Write a query to find duplicated first_names of over 2 employees or more

SELECT COUNT(duplicatedNames.idCount) as numberOfDuplicatedFirstName
FROM (
    SELECT COUNT(emp.id) as idCount
    FROM employee as emp
    GROUP BY emp.first_name
    HAVING idCount >= 2
) as duplicatedNames;

10. Write a query to find the employees that their first_name starts with Sh

— start with is LIKE ‘xxx%’

SELECT emp.*
FROM employee as emp
WHERE emp.first_name LIKE 'Sh%';

11. Write a query to find the employees that their first_name ends with n

— end with is LIKE ‘%xxx’

SELECT emp.*
FROM employee as emp
WHERE emp.first_name LIKE '%n';

12. Write a query to find the employees that their first_name has or contains an

— contain is LIKE ‘%xxx%

SELECT emp.*
FROM employee as emp
WHERE emp.first_name LIKE '%an%';

13. Write a query to find the employees that their first_name starts with t and last_name ends with n

SELECT emp.*
FROM employee as emp
WHERE emp.first_name LIKE 't%' AND emp.last_name LIKE '%n';

14. Write a query to put together employee first_name and last_name as full_name

SELECT CONCAT(emp.first_name, ' ', emp.last_name) as full_name
FROM employee as emp;

15. Write an SQL query to create a new table(employee1) with data and structure copied from another table(employee).

CREATE TABLE employee1 
SELECT * FROM employee;

16. Find the total number of bugs reported per day, ordered by chronological date

  • Use DATE to GROUP BY date out of the date field.
SELECT DATE(br.reported_at) as dateReported, COUNT(br.id) as bugCount
FROM employee_bug_report as br
GROUP BY DATE(br.reported_at)
ORDER BY DATE(br.reported_at) DESC;

Github Project

October 10, 2019

MySQL interview – Fundamentals

MySQL Fundamental Questions

1. What is a database?

A database is an organized collection of data. It stores data in a way that data can be easily accessed and maintained. It has tables that contain rows and columns.

Example: School Management Database, Bank Management Database, etc.

Image result for uml relationship diagram"

 

2. What is DBMS?

DBMS stands for Database Management System. DBMS is a system software responsible for the creation, retrieval, update and management of the database. It ensures that our data is consistent, organized and is easily accessible by serving as an interface between the database and its end-users or application software.

Examples of DBMS are MySQL, SQL Server, PostgreSQL, etc.

 

3. What is RDBMS?

RDBMS stands for Relational Database Management System. The key difference here, compared to DBMS, is that RDBMS stores data in the form of a collection of tables and relations can be defined between the common fields of these tables. Most modern database management systems like MySQL, Microsoft SQL Server, Oracle, IBM DB2 and Amazon Redshift are based on RDBMS.

4. What is the difference between SQL and MySQL?

SQL is a standard language for retrieving and manipulating structured databases. On the contrary, MySQL is a relational database management system, like SQL Server, Oracle or IBM DB2, that is used to manage SQL databases.

5. What is SQL?

SQL stands for Structured Query Language. It is the standard language for performing CRUD operations on relational database management systems. It is especially useful in handling organized data comprised of entities (variables) and relations between different entities of the data.

6. What are Tables and Fields?

A table is a set of data that is organized in a model with Columns and Rows. Columns can be categorized as vertical, and Rows are horizontal. A table has a specified number of columns called fields but can have any number of rows which is called record. Here is an example:

Table: User

Column: id, first_name,date_of_birth.

Row: 234, Peter, 1/20/1990.

7. What is Primary Key?

a PRIMARY KEY uniquely identifies a row in a table. It must contain UNIQUE values and has an implicit NOT NULL constraint.
A table in SQL is strictly restricted to have one and only one primary key, which is comprised of single or multiple fields (columns).

CREATE TABLE user ( 	 /* Create table with a single field as primary key */
    id INT NOT NULL
    name VARCHAR(255)
    PRIMARY KEY (id)
);

CREATE TABLE user ( 	 /* Create table with multiple fields as primary key */
    id INT NOT NULL
    last_name VARCHAR(255)
    first_name VARCHAR(255) NOT NULL,
    CONSTRAINT PK_user
    PRIMARY KEY (id, first_name)
);

ALTER TABLE user 	 /* Set a column as primary key */
ADD PRIMARY KEY (id);

ALTER TABLE user 	 /* Set multiple columns as primary key */
ADD CONSTRAINT PK_user 	 /*Naming a Primary Key*/
PRIMARY KEY (id, first_name);

8. What is a foreign key?

A foreign key in one table is a primary key of another table. A foreign key is used to set relationships(one to one, one to many, and many to many) between two tables. This relationship is set up as a parent and child relationship. A child table references its parent table using a foreign key which is its parent’s primary key.

In this example, Employee is the parent table and Salary is the child table. Salary has Employee primary key (EmployeeID) as a foreign key.

Image result for one to one relationship sql"

9. What is a unique key?

A UNIQUE constraint ensures that all values in a column are different. This provides uniqueness for the column(s) and helps identify each row uniquely. Unlike the primary key, there can be multiple unique constraints defined per table. The code syntax for UNIQUE is quite similar to that of PRIMARY KEY and can be used interchangeably.

CREATE TABLE user ( 	 /* Create table with multiple fields as unique */
    id INT NOT NULL
    name VARCHAR(255) NOT NULL
    CONSTRAINT PK_user
    UNIQUE (id, name)
);

10. What is a join?

A JOIN is a keyword used to query data from two or more tables. Join is querying data from two or more tables.

INNER JOIN or JOIN – Retrieves records that have matching values in both tables involved in the join. This is the widely used join for queries.

SELECT *
FROM employee emp
JOIN salary sal ON emp.id = sal.emp_id;

// OR

SELECT *
FROM employee emp
INNER JOIN salary sal ON emp.id = sal.emp_id;

LEFT JOIN or LEFT OUTER JOIN– Retrieves all the records/rows from the left and the matched records/rows from the right table.

SELECT *
FROM employee emp
LEFT JOIN salary sal
ON emp.id = sal.emp_id;

RIGHT JOIN or RIGHT OUTER JOIN – Retrieves all the records/rows from the right and the matched records/rows from the left table.

SELECT *
FROM employee emp
RIGHT JOIN salary sal
ON emp.id = sal.emp_id;

FULL JOIN or FULL OUTER JOIN – Retrieves all the records where there is a match on either the left or right table.

SELECT *
FROM employee emp
FULL JOIN salary sal
ON emp.id = sal.emp_id;

11. What is a self JOIN?

self JOIN is a case of regular join where a table is joined to itself based on some relation between its own column(s). Self-join uses the INNER JOIN or LEFT JOIN clause and a table alias is used to assign different names to the table within the query.

SELECT empA.emp_id AS "Emp_ID",empA.emp_name AS "Employee",
empB.emp_id AS "Sup_ID",empB.emp_name AS "Supervisor"
FROM employee empA, employee empB
WHERE empA.emp_sup = empB.emp_id;

// OR

SELECT empA.emp_id AS "Emp_ID",empA.emp_name AS "Employee",
empB.emp_id AS "Sup_ID",empB.emp_name AS "Supervisor"
FROM employee empA, 
JOIN employee empB ON empA.emp_sup = empB.emp_id;

 

12. What is a CROSS JOIN?

Cross join can be defined as a cartesian product of the two tables included in the join. The table after join contains the same number of rows as in the cross-product of number of rows in the two tables. If a WHERE clause is used in cross join then the query will work like an INNER JOIN.

SELECT emp.name, sal.amount 
FROM employee AS emp
CROSS JOIN salary AS sal;

12. What is Normalization?

Normalization is the process of organizing your database tables in a way that your data is not redundant.

For example, a spreadsheet containing information about salespeople and customers serves several purposes:

  • Identify salespeople in your organization
  • List all customers your company calls upon to sell products
  • Identify which salespeople call on specific customers.

By limiting a table to one purpose you reduce the number of duplicate data contained within your database.

There are three main reasons to normalize a database.  The first is to minimize duplicate data, the second is to minimize or avoid data modification issues, and the third is to simplify queries. 

13. What is Denormalization?

Denormalization is the inverse process of normalization, where the normalized schema is converted into a schema that has redundant information. The performance is improved by using redundancy and keeping redundant data consistent. The reason for performing denormalization is the overheads produced in the query processor by an over-normalized structure.

14. What are the different forms of Normalization?

1st Normal Form – This should remove all the duplicate columns from the table. Creation of tables for the related data and identification of unique columns.

2nd Normal Form – Meeting all requirements of the first normal form. Placing the subsets of data in separate tables and Creation of relationships between the tables using primary keys.

3rd Normal Form – This should meet all requirements of 2NF. Removing the columns which are not dependent on primary key constraints.

4th Normal Form – Meeting all the requirements of the third normal form and it should not have multi-valued dependencies.

15. What is a view?

A view is a virtual table that consists of a subset of data from one or more tables. Views are not virtually present, and it takes less space to store. View can have data of one or more tables combined, and it is depending on the relationship.

Image result for sql view"

16. What is an Index?

An index is very similar to an index in the back of a book. For example, if you want to reference all pages in a book that discusses a certain topic, you first refer to the index, which lists all the topics alphabetically and is then referred to one or more specific page numbers.

An index helps to speed up SELECT queries and WHERE clauses, but it slows down data input, with the UPDATE and the INSERT statements. Indexes can be created or dropped with no effect on the data.

17. What are all the different types of indexes?

There are three types of indexes -.

  • Unique Index.

This indexing does not allow the field to have duplicate values if the column is unique indexed. Unique index can be applied automatically when primary key is defined.

  • Clustered Index.

This type of index reorders the physical order of the table and search based on the key values. Each table can have only one clustered index.

  • A clustered index determines the physical order of data rows in a table. In other words, it dictates how the data is stored on disk.
  • There can be only one clustered index per table because the physical order of data can only be organized in one way.
  • The leaf nodes of a clustered index contain the actual data rows of the table, arranged in the order defined by the index key.
  • Clustered indexes are particularly useful when there is a need to frequently retrieve data based on a range of values or when the data needs to be stored in a specific order.

Example: Let’s consider a table called “Employees” with a clustered index on the “EmployeeID” column. The data within the table will be physically stored on disk in the order of the EmployeeID values. If you query the table based on the EmployeeID, the data can be quickly retrieved because it is organized according to the clustered index.

  • NonClustered Index.

NonClustered Index does not alter the physical order of the table and maintains logical order of data. Each table can have 999 nonclustered indexes.

  • A non-clustered index is a separate structure that contains a copy of selected columns from a table along with a pointer to the corresponding data rows.
  • Multiple non-clustered indexes can exist on a table, allowing different access paths for different columns.
  • The leaf nodes of a non-clustered index contain the index key values and the associated pointers to the actual data rows.
  • Non-clustered indexes are useful when there is a need to quickly retrieve data based on specific columns that are frequently queried.

Example: Consider the same “Employees” table, but this time with a non-clustered index on the “LastName” column. The index will contain the LastName values in a sorted order along with pointers to the corresponding rows in the table. If you query the table based on the LastName, the non-clustered index allows for quick lookup and retrieval of the data rows.

 

18. What is a Cursor?

A database Cursor is a control pointer that enables traversal over the rows or records in the table. This can be viewed as a pointer to one row in a set of rows. The cursor is very much useful for traversing such as retrieval, addition, and removal of database records.

19. What is a query?

A DB query is a code written in order to put or get the information back from the database. Query can be designed in such a way that it matched our expectations of the result set. Simply, a question to the Database.

20. What is subquery?

A subquery is a query within another query. The outer query is called the main query, and the inner query is called subquery. A SubQuery is always executed first, and the result of the subquery is passed on to the main query. A sub-query is executed and its results are passed to the outer query which is then executed after.

21. What are the types of subquery?

There are two types of subquery – Correlated and Non-Correlated.

A correlated subquery is a subquery that uses fields from the outer query or the main query. It evaluates once for every row in the outer query.

A Non-Correlated sub-query is independent of the outer or main query. It does not reference any fields from the outer query.

22. What is a stored procedure?

Stored Procedure is a function consists of many SQL statements to access the database system. Several SQL statements are consolidated into a stored procedure and execute them whenever and wherever required.

Such procedures are stored in the database data dictionary. The sole disadvantage of a stored procedure is that it can be executed nowhere except in the database and occupies more memory in the database server. It also provides a sense of security and functionality as users who can’t access the data directly can be granted access via stored procedures.

23. What is a trigger?

A DB trigger is a code or program that automatically execute when some event on a table or view in a database takes place. Mainly, the trigger helps to maintain the integrity of the database.

The TRIGGER statement is used to define a set of actions that are automatically executed in response to specified database events, such as INSERT, UPDATE, or DELETE operations.

Example: When a new student is added to the student database, new records should be created in the related tables like Exam, Score and Attendance tables.

24. What are the TRUNCATE, DELETE and DROP statements?

DELETE – delete a row in a table

DELETE FROM Candidates
WHERE CandidateId > 1000;

TRUNCATE – command is used to delete all the rows from the table and reset the table.

TRUNCATE TABLE Candidates;

DROP – command is used to remove an object from the database. If you drop a table, all the rows in that table are deleted and the table structure is removed from the database.

DROP TABLE user;

25. What is the difference between DROP and TRUNCATE statements?

If a table is dropped, all things associated with that table are dropped as well. This includes – the relationships defined on the table with other tables, the integrity checks and constraints, access privileges and other grants that the table has. To create and use the table again in its original form, all these relations, checks, constraints, privileges and relationships need to be redefined. However, if a table is truncated, none of the above problems exist and the table retains its original structure.

26. What is the difference between DELETE and TRUNCATE statements?

The TRUNCATE command is used to delete all the rows from the table and free the space containing the table.

The DELETE command deletes only the rows from the table based on the condition given in the where clause or deletes all the rows from the table if no condition is specified. But it does not free the space containing the table.

27. What are local and global variables and their differences?

Local variables are the variables that can be used or exist inside the function. They are not known to the other functions and those variables cannot be referred or used. Variables can be created whenever that function is called.

Global variables are the variables that can be used or exist throughout the program. The same variable declared in global cannot be used in functions. Global variables cannot be created whenever that function is called.

28. What is a constraint?

A constraint can be used to specify the limit on the data type of table. A constraint can be specified while creating or altering the table statement.

  • NOT NULL.
  • CHECK.
  • DEFAULT.
  • UNIQUE.
  • PRIMARY KEY.
  • FOREIGN KEY.

29. What is data Integrity?

Data Integrity defines the accuracy and consistency of data stored in a database. It can also define integrity constraints to enforce business rules on the data when it is entered into the application or database.

30. What is Auto Increment?

Autoincrement keyword allows the user to create a unique number to be generated when a new record is inserted into the table. AUTOINCREMENT keyword can be used in Oracle and IDENTITY keyword can be used in SQL SERVER.

Mostly this keyword can be used whenever the PRIMARY KEY is used.

31. What are Aggregate and Scalar functions?

An aggregate function performs operations on a collection of values to return a single scalar value. Aggregate functions are often used with the GROUP BY and HAVING clauses of the SELECT statement. Following are the widely used SQL aggregate functions:

AVG() – Calculates the mean of a collection of values.
COUNT() – Counts the total number of records in a specific table or view.
MIN() – Calculates the minimum of a collection of values.
MAX() – Calculates the maximum of a collection of values.
SUM() – Calculates the sum of a collection of values.
FIRST() – Fetches the first element in a collection of values.
LAST() – Fetches the last element in a collection of values.
Note: All aggregate functions described above ignore NULL values except for the COUNT function.

A scalar function returns a single value based on the input value. Following are the widely used SQL scalar functions:

LEN() – Calculates the total length of the given field (column).
UCASE() – Converts a collection of string values to uppercase characters.
LCASE() – Converts a collection of string values to lowercase characters.
MID() – Extracts substrings from a collection of string values in a table.
CONCAT() – Concatenates two or more strings.
RAND() – Generates a random collection of numbers of given length.
ROUND() – Calculates the round off integer value for a numeric field (or decimal point values).
NOW() – Returns the current data & time.
FORMAT() – Sets the format to display a collection of values.

32. What is the difference between a Cluster and Non-Cluster Index?

a clustered index is used for easy retrieval of data from the database by altering the way that the records are stored. Database sorts out rows by the column which is set to be clustered index.

A nonclustered index does not alter the way it was stored but creates a completely separate object within the table. It points back to the original table rows after searching.

33. What is Datawarehouse?

Datawarehouse is a central repository of data from multiple sources of information. Those data are consolidated, transformed and made available for the mining and online processing. Warehouse data have a subset of data called Data Marts.

34. What is OLTP?

OLTP stands for Online Transaction Processing, is a class of software applications capable of supporting transaction-oriented programs. An essential attribute of an OLTP system is its ability to maintain concurrency. To avoid single points of failure, OLTP systems are often decentralized. These systems are usually designed for a large number of users who conduct short transactions. Database queries are usually simple, require sub-second response times and return relatively few records. Here is an insight into the working of an OLTP system

35. What are the differences between OLTP and OLAP?

OLTP stands for Online Transaction Processing, which is a class of software applications capable of supporting transaction-oriented programs. An important attribute of an OLTP system is its ability to maintain concurrency. OLTP systems often follow a decentralized architecture to avoid single points of failure. These systems are generally designed for a large audience of end-users who conduct short transactions. Queries involved in such databases are generally simple, need fast response times and return relatively few records. The number of transactions per second acts as an effective measure for such systems.

OLAP stands for Online Analytical Processing, a class of software programs that are characterized by a relatively low frequency of online transactions. Queries are often too complex and involve a bunch of aggregations. For OLAP systems, the effectiveness measure relies highly on response time. Such systems are widely used for data mining or maintaining aggregated, historical data, usually in multi-dimensional schemas.

36. What is User-defined function? What are its various types?

The user-defined functions in SQL are like functions in any other programming language that accept parameters, perform complex calculations, and return a value. They are written to use the logic repetitively whenever required. There are two types of SQL user-defined functions:

Scalar Function: As explained earlier, user-defined scalar functions return a single scalar value.
Table-Valued Functions: User-defined table-valued functions return a table as output.
Inline: returns a table data type based on a single SELECT statement.
Multi-statement: returns a tabular result-set but, unlike inline, multiple SELECT statements can be used inside the function body.

37. What is collation?

Collation is defined as a set of rules that determine how character data can be sorted and compared. This can be used to compare A and, other language characters and also depends on the width of the characters.

ASCII value can be used to compare these character data.

38. What is Pattern Matching in SQL?

SQL pattern matching provides for pattern search in data if you have no clue as to what that word should be. This kind of SQL query uses wildcards to match a string pattern, rather than writing the exact word. The LIKE operator is used in conjunction with SQL Wildcards to fetch the required information.

a. Using the % wildcard to perform a simple search
The % wildcard matches zero or more characters of any type and can be used to define wildcards both before and after the pattern. Search a student in your database with first name beginning with the letter K:

SELECT *
FROM students
WHERE first_name LIKE 'K%'

b. Omitting the patterns using the NOT keyword
Use the NOT keyword to select records that don’t match the pattern. This query returns all students whose first name does not begin with K.

SELECT *
FROM students
WHERE first_name NOT LIKE 'K%'

c. Matching a pattern anywhere using the % wildcard twice
Search for a student in the database where he/she has a K in his/her first name.

SELECT *
FROM students
WHERE first_name LIKE '%Q%'

d. Using the _ wildcard to match pattern at a specific position
The _ wildcard matches exactly one character of any type. It can be used in conjunction with % wildcard. This query fetches all students with letter K at the third position in their first name.

SELECT *
FROM students
WHERE first_name LIKE '__K%'

e. Matching patterns for specific length
The _ wildcard plays an important role as a limitation when it matches exactly one character. It limits the length and position of the matched results. For example –

SELECT * 	 /* Matches first names with three or more letters */
FROM students
WHERE first_name LIKE '___%'

SELECT * 	 /* Matches first names with exactly four characters */
FROM students
WHERE first_name LIKE '____'

39. What is the difference between UNION and UNION ALL in SQL?

UNION combines the result sets of two or more SELECT statements, removing duplicates, while UNION ALL combines the result sets without removing duplicates.

  • The UNION operator is used to combine the result sets of two or more SELECT statements into a single result set.
  • The result set produced by the UNION operator eliminates duplicate rows.
  • The columns in the SELECT statements being combined must have compatible data types and be in the same order.
  • The number of columns in all the SELECT statements must be the same.
  • The column names in the result set are typically derived from the first SELECT statement, but aliases can be used to provide custom names.
  • The result set is sorted in ascending order by default, but the ORDER BY clause can be used to specify a custom sort order.
SELECT column1, column2 FROM table1
UNION
SELECT column1, column2 FROM table2;

UNION ALL:

  • The UNION ALL operator is also used to combine the result sets of two or more SELECT statements into a single result set.
  • Unlike UNION, the UNION ALL operator does not eliminate duplicate rows. It simply combines all the rows from each SELECT statement.
  • The columns in the SELECT statements being combined must have compatible data types and be in the same order.
  • The number of columns in all the SELECT statements must be the same.
  • The column names in the result set are typically derived from the first SELECT statement, but aliases can be used to provide custom names.
  • The result set is not sorted and maintains the order in which the rows are retrieved from each SELECT statement.
SELECT column1, column2 FROM table1
UNION ALL
SELECT column1, column2 FROM table2;

When deciding whether to use UNION or UNION ALL, consider the following:

  • Use UNION when you want to combine result sets and eliminate duplicate rows.
  • Use UNION ALL when you want to combine result sets and retain all rows, including duplicates.

It’s important to note that UNION and UNION ALL have performance implications. Since UNION performs a distinct operation, it requires additional processing time compared to UNION ALL, which simply combines the rows without any duplicate elimination. Therefore, if you know that your result sets do not contain duplicates, using UNION ALL can be more efficient.

40. What is an SQL injection, and how can it be prevented?

SQL injection is a type of security vulnerability in which an attacker can manipulate a SQL query to perform unauthorized operations or access sensitive data. It occurs when user input is not properly validated or sanitized before being used in constructing a SQL query, allowing malicious SQL code to be injected. It can be prevented by using prepared statements or parameterized queries and by properly sanitizing user input.

Consider a simple login form where a user enters their username and password. The server-side code might construct a SQL query like this:

String query = "SELECT * FROM users WHERE username = '" + username + "' AND password = '" + password + "'";

If the input values are not properly validated or sanitized, an attacker can manipulate the input by entering a malicious string like ' OR '1'='1' --', which can change the meaning of the query. The resulting query would be:

SELECT * FROM users WHERE username = '' OR '1'='1' --' AND password = ''

In this case, the attacker has successfully bypassed the login mechanism by making the condition '1'='1' always true, effectively retrieving all user records from the database.

To prevent SQL injection, it is crucial to use parameterized queries or prepared statements, which separate the SQL code from the user input. Here’s an example of how to solve the SQL injection vulnerability using parameterized queries:

String query = "SELECT * FROM users WHERE username = ? AND password = ?";
PreparedStatement statement = connection.prepareStatement(query);
statement.setString(1, username);
statement.setString(2, password);
ResultSet resultSet = statement.executeQuery();

Spring Data is a part of the larger Spring Framework and provides abstractions and utilities for interacting with databases. When using Spring Data’s repositories or query methods, the framework internally leverages features of the underlying persistence technology, such as Hibernate or JPA (Java Persistence API). These technologies handle parameterized queries and protect against SQL injection. By following best practices and utilizing Spring Data’s provided features, you can ensure that your queries are secure and protected from SQL injection vulnerabilities.

41. Explain the ACID properties in the context of database transactions.

ACID is an acronym that stands for Atomicity, Consistency, Isolation, and Durability. These properties are essential for ensuring the reliability, integrity, and robustness of database transactions.

a. Atomicity: Atomicity guarantees that a transaction is treated as a single indivisible unit of work. It means that either all the operations within a transaction are successfully completed, or none of them are applied. If any part of the transaction fails, the entire transaction is rolled back to its initial state.

Example: Consider a banking application where a transfer of funds from one account to another is performed. To ensure atomicity, the debit from the sender’s account and the credit to the receiver’s account must occur together as a single atomic transaction. If either operation fails, the transaction is rolled back to maintain data consistency.

b. Consistency: Consistency ensures that a transaction brings the database from one consistent state to another. It enforces integrity constraints, domain rules, and predefined data validations during the transaction, ensuring that the data is valid and consistent before and after the transaction.

Example: In a database where the age of a person must be greater than zero, if a transaction tries to update the age of a person to a negative value, the consistency property will prevent the transaction from being committed, maintaining the data’s integrity.

c. Isolation: Isolation ensures that concurrent transactions do not interfere with each other and are executed in isolation. Each transaction should appear as if it is running in isolation, even when multiple transactions are executing concurrently. This property prevents issues like dirty reads, non-repeatable reads, and phantom reads.

Example: When two transactions simultaneously try to update the same record in a database, the isolation property ensures that each transaction sees its own consistent snapshot of the data and does not interfere with the other transaction’s updates until they are committed.

d. Durability: Durability guarantees that once a transaction is committed, its changes are permanent and will survive any subsequent system failures. The changes made by a committed transaction are stored in a durable manner, typically on disk or other persistent storage, to ensure their long-term persistence.

Example: After a successful transaction commits, the changes made to the database, such as inserting, updating, or deleting records, will be durably stored, even in the event of a system crash or power failure. The changes will be available and recoverable when the system is restarted.

To use ACID transactions with Hibernate, you typically perform the following steps:

  1. Configure Hibernate’s transaction management: Set up the required configurations, such as defining the database connection, transaction manager, and transactional boundaries.

  2. Annotate transactional methods: Use Hibernate’s @Transactional annotation on the methods that should participate in transactions. This annotation ensures that the methods are executed within a transactional context, adhering to the ACID properties.

  3. Perform database operations: Use Hibernate’s session or entity manager to perform database operations within the transactional methods. Hibernate automatically manages the transaction boundaries and ensures that the ACID properties are maintained.

With Hibernate’s transparent transaction management, you can rely on its underlying support for ACID transactions, allowing you to focus on the business logic without explicitly handling transaction management code. Hibernate integrates with various transaction managers, such as Java Transaction API (JTA) or Spring’s transaction management, to provide ACID support.

Here’s how ACID transactions can be used with Spring Data:

  1. Configure Transaction Management: In your Spring configuration file, set up the necessary configurations for transaction management. This includes defining a transaction manager bean, specifying the data source, and configuring the transaction boundaries.

  2. Annotate Transactional Methods: Use Spring’s @Transactional annotation to mark the methods that should participate in transactions. This annotation can be applied at the class level or the method level, depending on your requirements.

  3. @Service
    @Transactional
    public class UserService {
        @Autowired
        private UserRepository userRepository;
    
        public User createUser(User user) {
            // Perform data manipulation operations using the UserRepository methods
            // ...
        }
    }
    

ACID transactions can be used with Java JDBC (Java Database Connectivity), which is a standard Java API for interacting with databases.

Here’s how you can work with ACID transactions using Java JDBC:

  1. Establish Database Connection: Use the JDBC API to establish a connection to the database by providing the necessary connection details, such as the database URL, username, and password.

  2. Connection connection = DriverManager.getConnection("jdbc:mysql://localhost:3306/mydatabase", "username", "password");
    

    Disable Auto-Commit: By default, JDBC auto-commits each individual SQL statement as a separate transaction. To enable ACID transactions, you need to disable the auto-commit mode on the connection.

  3. connection.setAutoCommit(false);
    

    Execute SQL Statements: Use the JDBC API to execute SQL statements within the transactional context. You can create and execute statements using the Statement or PreparedStatement interfaces.

  4. Statement statement = connection.createStatement();
    String sql = "INSERT INTO users (name, email) VALUES ('John', 'john@example.com')";
    statement.executeUpdate(sql);
    

    Commit or Rollback: After executing all the SQL statements within the transaction, you can choose to commit the transaction to make the changes permanent or roll back the transaction to discard the changes.

  5. connection.commit();
    connection.rollback();
    

    By explicitly managing the transaction boundaries using JDBC, you can ensure the ACID properties are maintained. Disabling auto-commit mode allows you to group multiple SQL statements into a single transaction, which will either be committed as a whole or rolled back if an error occurs.

    It’s important to note that working with ACID transactions using JDBC requires you to handle exception handling, connection management, and transactional logic manually. Frameworks like Hibernate and Spring Data provide higher-level abstractions that simplify transaction management and handle many of these tasks automatically.

    Additionally, you should also consider using a connection pool, such as Apache Commons DBCP or HikariCP, to improve connection management and efficiency in a production environment. Connection pools help manage and reuse database connections, reducing the overhead of establishing connections for each transaction.

42. What is the purpose of the HAVING clause in SQL?

  • The HAVING clause is used to filter the results of a GROUP BY query based on a condition applied to an aggregate function.

43. What are the differences between HAVING and WHERE?

The HAVING and WHERE clauses are both used in SQL to filter and restrict data, but they are used in different contexts and operate on different parts of a query. Here are the key differences between the HAVING and WHERE clauses:

  1. Usage:

    • WHERE: The WHERE clause is used in the SELECT, UPDATE, DELETE, and MERGE statements to filter rows from the source table(s) before the result set is returned or the operation is executed.
    • HAVING: The HAVING clause is used exclusively with the GROUP BY clause in the SELECT statement to filter rows after the grouping and aggregation operations have taken place.
  2. Application:

    • WHERE: The WHERE clause filters rows based on conditions applied to individual column values. It is used to narrow down the result set by evaluating conditions on non-aggregated columns.
    • HAVING: The HAVING clause filters rows based on conditions applied to aggregate functions. It is used to filter grouped data based on aggregated values, such as COUNT, SUM, AVG, etc.
  3. Placement:

    • WHERE: The WHERE clause appears before the GROUP BY clause (if present) and is used to filter rows before grouping and aggregation occur.
    • HAVING: The HAVING clause appears after the GROUP BY clause and is used to filter rows after the grouping and aggregation have been performed.
  4. Applicable Expressions:

    • WHERE: The WHERE clause can use any valid condition or expression involving columns of the source table(s) being queried. It can include comparisons, logical operators, and functions that operate on individual column values.
    • HAVING: The HAVING clause can use aggregate functions, column references, and logical operators. It allows conditions to be applied to aggregated values resulting from the GROUP BY operation.
  5. Grouping Requirement:

    • WHERE: The WHERE clause does not require the use of the GROUP BY clause. It can be used in any query, whether or not there is a GROUP BY clause present.
    • HAVING: The HAVING clause is used only when the GROUP BY clause is present in the query. It filters rows based on aggregated values resulting from the grouping.

In summary, the WHERE clause is used to filter rows before grouping and aggregation, while the HAVING clause is used to filter grouped and aggregated data after the grouping and aggregation operations have taken place. The WHERE clause operates on individual column values, while the HAVING clause applies conditions to aggregated values resulting from the GROUP BY operation.

October 10, 2019

Java interview – Serialization

Java Serialization Interview Questions and Answers

1. What is Serialization?

Serialization is the process of converting an object and its states to a binary stream that can then be saved into a database, cached or sent over the network to another JVM.

2. What is Deserialization?

Deserialization is the process of converting back a serialized object.

import java.io.Serializable;

import org.apache.commons.lang3.builder.ToStringBuilder;
import org.apache.commons.lang3.builder.ToStringStyle;

public class User implements Serializable {

	private static final long serialVersionUID = 3L;

	private String name;

	private int age;

	private String email;

	transient private double height;

	//setters and getters

	@Override
	public String toString() {
		return ToStringBuilder.reflectionToString(this, ToStringStyle.SHORT_PREFIX_STYLE);
	}

}
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;

public class SerializationDemo {

	public static void main(String[] args) throws IOException, ClassNotFoundException {
		String path = "serialized_user.ser";

		User serializedUser = serialize(path);

		System.out.println("serializedUser=" + serializedUser.toString());

		User deserializedUser = deserialize(path);
		
		System.out.println("deserializedUser=" + deserializedUser.toString());

	}

	private static User serialize(String path) throws IOException {

		FileOutputStream fileStream = new FileOutputStream(path);

		ObjectOutputStream objectStream = new ObjectOutputStream(fileStream);

		User user = new User();
		user.setAge(21);
		user.setName("Laulau");
		user.setEmail("folaukaveinga@gmail.com");
		user.setHeight(6.3);

		objectStream.writeObject(user);

		objectStream.close();
		fileStream.close();

		return user;
	}

	private static User deserialize(String path) throws IOException, ClassNotFoundException {
		FileInputStream fileInputStream = new FileInputStream(path);
		ObjectInputStream objectInputStream = new ObjectInputStream(fileInputStream);

		User user = (User) objectInputStream.readObject();

		objectInputStream.close();
		fileInputStream.close();

		return user;
	}

}

Result

serializedUser=User[name=Laulau,age=21,email=folaukaveinga@gmail.com]
deserializedUser=User[name=Laulau,age=21,email=folaukaveinga@gmail.com]

3. What is Externalization?

Serialization is not very efficient. If you serialize bloated objects having a lot of attributes and properties, you do not wish to serialize. This is where Externalization in Java comes into the picture. Externalization is an advanced way of serialization. It is whenever you need to customize the serialization mechanism.

The class you want to be externalized must implement the Externalizable interface, then serialization of the object will be done using the method writeExternal(). When the Externalizable object is reconstructed at the receiver’s end, an instance will be created using no-argument constructor and this method is called readExternal().

This basically serves the purpose of custom Serialization, where you can decide what to store in a stream.

When to use Externalization?

When you want or need to serialize only part of an object, Externalization is the best option.

How to use Externalization?

By implementing the Externalizable interface, you control the process of reading and writing the objects during serialization and de-serialization process, you need to have the object’s class implemented the interface java.io.Externalizable. Only then you can implement your own code to read and write the object’s states. The methods, readExternal() and writeExternal() are defined by the Externalizable interface.

import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectOutput;

import org.apache.commons.lang3.builder.ToStringBuilder;
import org.apache.commons.lang3.builder.ToStringStyle;

public class Member implements Externalizable {

	public static final long serialVersionUID = 2L;

	private String name;

	private int age;

	private String email;

	transient private double height;

	// setters and getters

	@Override
	public String toString() {
		return ToStringBuilder.reflectionToString(this, ToStringStyle.SHORT_PREFIX_STYLE);
	}

	@Override
	public void writeExternal(ObjectOutput out) throws IOException {
		System.out.println("writing object state...");
		// TODO Auto-generated method stub
		out.writeObject(name);
		out.writeObject(email);
		out.writeInt(age);
		out.writeDouble(height);
	}

	@Override
	public void readExternal(ObjectInput in) throws IOException, ClassNotFoundException {
		System.out.println("reading object state...");
		this.name = (String) in.readObject();
		this.email = (String) in.readObject();
		this.age = in.readInt();
		this.height = in.readDouble();
	}
}
public class ExternalizeDemo {

	public static void main(String[] args) throws IOException, ClassNotFoundException {
		String path = "externalized_member.ser";

		Member externalizedMember = serialize(path);

		System.out.println("externalizedMember=" + externalizedMember.toString());

		Member dexternalizedMember = deserialize(path);

		System.out.println("dexternalizedMember=" + dexternalizedMember.toString());
	}

	private static Member serialize(String path) throws IOException {

		FileOutputStream fileStream = new FileOutputStream(path);

		ObjectOutputStream objectStream = new ObjectOutputStream(fileStream);

		Member member = new Member();
		member.setAge(21);
		member.setName("Laulau");
		member.setEmail("folaukaveinga@gmail.com");
		member.setHeight(6.3);

		System.out.println(member.toString());

		objectStream.writeObject(member);

		objectStream.close();
		fileStream.close();

		return member;

	}

	private static Member deserialize(String path) throws IOException, ClassNotFoundException {
		FileInputStream fileInputStream = new FileInputStream(path);
		ObjectInputStream objectInputStream = new ObjectInputStream(fileInputStream);

		Member member = (Member) objectInputStream.readObject();

		objectInputStream.close();
		fileInputStream.close();

		return member;
	}
}

Result

Member[name=Laulau,age=21,email=folaukaveinga@gmail.com]
writing object state...
externalizedMember=Member[name=Laulau,age=21,email=folaukaveinga@gmail.com]
reading object state...
dexternalizedMember=Member[name=Laulau,age=21,email=folaukaveinga@gmail.com]

4. What are the differences between Serialization and Externalization?

Externalizable provides us with writeExternal() and readExternal() methods that give us control on what attributes and values to serialize. Correct implementation of the Externalizable interface can improve the performance of application drastically.

ExternalizationSerialization
UIDNo need to have it but for best practice you must have itNeeds a serialVersionUID
StorageStore the data per attribute.Store the whole object directly.
AccessProvides complete control to the serialization and deserialization process.No access

5. How many methods does Serializable have?

Zero. Serializable is a marker interface.

6. What is a serialVersionUID? What happens when you don’t explicitly provide?

serialVersionUID is a version control id. It is basically the hashcode of the object. So if you change the object(add or remove an attribute) then the hashcode will be different so an already serialized object will not be able to deserialize. Java serialization process relies on correct serialVersionUID for recovering the state of the serialized object and throws java.io.InvalidClassException in case of serialVersionUID mismatch

When an object is serialized, the serialVersionUID is serialized along with the other contents. Later when that is deserialized, the serialVersionUID from the deserialized object is extracted and compared with the serialVersionUID of the loaded class. If the numbers do not match then, InvalidClassException is thrown.

serialVersionUID is a 64-bit hash of the class name, interface class names, methods and fields. Serialization runtime generates a serialVersionUID if you do not add one. However, it is strongly recommended that all serializable classes explicitly declare serialVersionUID values, since the default serialVersionUID computation is highly sensitive to class details that may vary depending on compiler implementations and can produce different serialVersionUID in different environments.


If you don’t provide it, Java will provide it for you.

7. What happens if an object is serializable but it includes a reference to a non-serializable object?

Java will throw a NotSerializableException. To fix this issue you can do either one of these two things:

a. Make the non-serializable object serializable by making its class implement the Serializable interface.

b. Make the non-serializable object transient or static.

8. If a class is serializable but its superclass in not, what will be the state of the instance variables inherited from super class after deserialization?

The state will be its default value. For example:

int age; -> age = 0;

String name; -> name = null;

9. To serialize an array or a collection all the members of it must be serializable. True /False?

true

All standard implementations of collections List, Set and Map interface already implement java.io.Serializable. This means you do not really need to write anything specific to serialize collection objects. However, you should keep the following things in mind before you serialize a collection object – Make sure all the objects added in a collection are Serializable.

10. While serializing you want some of the members not to serialize, how do you achieve it?

If you don’t want any field to be part of the object’s state then declare it either static or transient based on your need and it will not be included during the serialization process.

Variables marked with transient are not included in the process of serialization and are not part of the object’s serialized state.

The static variables belong to a class are not part of the state of the object so they are not saved as part of the serialized object.

Why static variables are not serialized in Java?

The Java variables declared as static are not considered part of the state of an object since they are shared by all instances of that class. Saving static variables with each serialized object would have following problems

  • It will make redundant copy of same variable in multiple objects which makes it in-efficient.
  • The static variable can be modified by any object and a serialized copy would be stale or not in sync with current value.

11. What should you take care of when serializing an object?

You should make sure that all the included objects are also serializable. If any of the objects is not serializable then it throws a NotSerializableException.

12. Suppose a superclass of a new class implements Serializable interface, how can you avoid new class to being serialized?

Use Externalization and exclude the new class from the serialization process.

13. What are the ways to speed up Object Serialization? How to improve Serialization performance?

The default Java Serialization mechanism is really useful, however it can have a really bad performance based on your application and business requirements. The serialization process performance heavily depends on the number and size of attributes you are going to serialize for an object. Below are some tips you can use for speeding up the marshaling and un-marshaling of objects during Java serialization process.

  • Mark the unwanted or non Serializable attributes as transient. This is a straight forward benefit since your attributes for serialization are clearly marked and can be easily achieved using Serialzable interface itself.
  • Save only the state of the object, not the derived attributes. Some times we keep the derived attributes as part of the object however serializing them can be costly. Therefore consider calcualting them during de-serialization process.
  • Serialize attributes only with NON-default values. For examples, serializing a int variable with value zero is just going to take extra space however, choosing not to serialize it would save you a lot of performance. This approach can avoid some types of attributes taking unwanted space. This will require use of Externalizable interface since attribute serialization is determined at runtime based on the value of each attribute.
  • Use Externalizable interface and implement the readExternal and writeExternal methods to dynamically identify the attributes to be serialized. Some times there can be a custom logic used for serialization of various attributes.

14. Do constructors get invoked or run at deserialization?

When an instance of a serializable class is deserialized, the constructor does not run. If constructors were invoked it will give back the initial values rather than the value they had at the time of serialization.

15. What are the compatible and incompatible changes when dealing with versioned serialized objects during serialization?

Changes to a serializable class can be compatible or incompatible. Following is the list of changes which are compatible:

  • Adding fields – When the class being reconstituted has a field that does not occur in the stream, that field in the object will be initialized to the default value for its type.
  • Adding or removing classes – Comparing the class hierarchy in the stream with that of the current class can detect that a class has been deleted or added.
  • Adding or removing writeObject/readObject methods
  • Changing the access to a field – The access modifiers public, package, protected, and private have no effect on the ability of serialization to assign values to the fields.
  • Changing a field from static to nonstatic or transient to nontransient – When relying on default serialization to compute the serializable fields, this change is equivalent to adding a field to the class. The new field will be written to the stream but earlier classes will ignore the value since serialization will not assign values to static or transient fields.

List of incompatible changes:

  • Deleting fields
  • Changing a nonstatic field to static or a nontransient field to transient
  • Changing the declared type of a primitive field
  • Changing a class from Serializable to Externalizable or vice versa is an incompatible change since the stream will contain data that is incompatible with the implementation of the available class.
  • Changing a class from a non-enum type to an enum type or vice versa
  • Removing either Serializable or Externalizable is an incompatible change since when written it will no longer supply the fields needed by older versions of the class.
  • Changing the writeObject or readObject method
  • Moving classes up or down the hierarchy – This cannot be allowed since the data in the stream appears in the wrong sequence.

16. What are the guidelines for using serialVersionUID?

  • You must declare serialVersionUID because it gives us more control. For example, Default rules for generating serialVersionUID can be too strict in some cases. For example when the visibility of a field changes, the serialVersionUID changes too. or sometimes you just want to forbid deserialization of old serialized objects then you can just change serialVersionUID.
  • Include this field even in the first version of the class, as a reminder of its importance
  • You must not only declare it but also maintain it. You should change serialVersionUID when there is some change in the definition of data stored in the class. For example, data type of field is changed.

17. Does setting the serialVersionUID class field improve Java serialization performance?

Declaring an explicit serialVersionUID field in your classes saves some CPU time only the first time the JVM process serializes a given Class. However the gain is not significant, In case when you have not declared the serialVersionUID its value is computed by JVM once and subsequently kept in a soft cache for future use.

18. Suppose you have a class in which you serialized and stored in persistence and later modified that class to add a new field. What will happen if you deserialize the object already serialized?

It depends on whether a class has its own serialVersionUID or not. As we know that if we don’t provide serialVersionUID in our code java compiler will generate it and normally it’s equal to hashCode of the object. By adding any new field there is a chance that new serialVersionUID generated for that class version is not the same as the serialized object. In this case, Java Serialization API will throw java.io.InvalidClassException and this is the reason its recommended to have your own serialVersionUID in code and make sure to keep it same always for a single class.

October 10, 2019

Java interview – Exception

October 10, 2019

Java interview – I/O

October 10, 2019