MySQL Advanced Queries
1.a Write a query to find the top 3 highest salaries in the company.
SELECT salary FROM employee ORDER BY salary DESC LIMIT 3;
1.b Write a query to find the top 3 highest salary employees in the company.
SELECT * FROM employee ORDER BY salary DESC LIMIT 3;
1.c Write a query to find the top 3rd highest salary employee in the company.
SELECT * FROM employee ORDER BY salary DESC -- LIMIT offset, count LIMIT 2, 1;
1.d Write an SQL query to find the maximum, minimum, and average salary of the employees.
SELECT Max(salary), Min(salary), AVG(salary) FROM employee;
2. Write a query to find the number of employees who have been with the company for more than 5 years.
– Use (CURRENT_DATE – hired_date) / 365 to calculate 1 year
SELECT COUNT(*) as _5YearAndOlder FROM employee WHERE ((CURRENT_DATE - hired_date) / 365) >= 5;
3.a Write a query to find the average salary for each department in the company. Sort the average_salary from highest to lowest
SELECT department_id, AVG(salary) AS average_salary FROM employee GROUP BY department_id ORDER BY average_salary DESC;
3.b Write a query to find the count of employees per deparment.
SELECT department_id, COUNT(id) AS departmentCount FROM employee GROUP BY department_id ORDER BY departmentCount DESC;
3.c Write a query to find the employees that have salary that is above average salary for each department in the company.
SELECT emp.id, emp.first_name, emp.last_name, emp.salary, avg_salary.average_salary FROM employee as emp JOIN ( SELECT department_id, AVG(salary) AS average_salary FROM employee GROUP BY department_id ) as avg_salary ON avg_salary.department_id = emp.department_id WHERE emp.salary >= avg_salary.average_salary;
4. Write a query to find the employees who are working on more than 1 project.
SELECT pe.employee_id, emp.first_name, emp.last_name, COUNT(pe.project_id) as projectCount FROM project_employees as pe JOIN employee as emp ON pe.employee_id = emp.id -- join employee just to display name GROUP BY pe.employee_id HAVING projectCount > 1;
5. Write a query to find the employees who are not working on any projects.
SELECT id, first_name, last_name FROM employee WHERE id NOT IN ( SELECT DISTINCT employee_id FROM project_employees );
6.a Write a query to find the employees who have joined last year.
SELECT id, first_name, last_name, hired_date FROM employee WHERE YEAR(hired_date) = YEAR(CURRENT_DATE()) - 1;
6.b Write a query to find the employees who have joined in the last year.
SELECT id, first_name, last_name, hired_date FROM employee WHERE hired_date > CURRENT_DATE - INTERVAL 1 YEAR;
7. Write a query to find the employees who have submitted more than 1 bug reports in the last month.
SELECT br.reporter_id, emp.first_name, emp.last_name, COUNT(br.id) as bugCount FROM employee_bug_report as br JOIN employee as emp ON emp.id = br.reporter_id WHERE br.reported_at > CURRENT_DATE - INTERVAL 1 MONTH GROUP BY br.reporter_id HAVING bugCount > 1;
8.a Write a query to find the employee who has submitted the most bug reports.
SELECT br.reporter_id, emp.first_name, emp.last_name, COUNT(br.id) as bugCount FROM employee_bug_report as br JOIN employee as emp ON emp.id = br.reporter_id GROUP BY br.reporter_id ORDER BY bugCount DESC LIMIT 0, 1;
8.b Write a query to find the employee who has submitted the most bug reports in the last month.
SELECT br.reporter_id, emp.first_name, emp.last_name, (CURRENT_DATE - INTERVAL 1 MONTH) as from_date, COUNT(br.id) as bugCount FROM employee_bug_report as br JOIN employee as emp ON emp.id = br.reporter_id WHERE br.reported_at > CURRENT_DATE - INTERVAL 1 MONTH GROUP BY br.reporter_id ORDER BY bugCount DESC LIMIT 0, 1;
9.a Write a query to find the employees who have duplicated first_names
SELECT emp.first_name, COUNT(emp.id) as nameCount FROM employee as emp GROUP BY emp.first_name HAVING nameCount > 1;
9.b Write a query to find the count of employees who have duplicated first_names
SELECT COUNT(duplicatedNames.idCount) as numberOfDuplicatedFirstName FROM ( SELECT COUNT(emp.id) as idCount FROM employee as emp GROUP BY emp.first_name HAVING idCount > 1 ) as duplicatedNames;
9.c Write a query to find duplicated first_names of over 2 employees or more
SELECT COUNT(duplicatedNames.idCount) as numberOfDuplicatedFirstName FROM ( SELECT COUNT(emp.id) as idCount FROM employee as emp GROUP BY emp.first_name HAVING idCount >= 2 ) as duplicatedNames;
10. Write a query to find the employees that their first_name starts with Sh
— start with is LIKE ‘xxx%’
SELECT emp.* FROM employee as emp WHERE emp.first_name LIKE 'Sh%';
11. Write a query to find the employees that their first_name ends with n
— end with is LIKE ‘%xxx’
SELECT emp.* FROM employee as emp WHERE emp.first_name LIKE '%n';
12. Write a query to find the employees that their first_name has or contains an
— contain is LIKE ‘%xxx%
SELECT emp.* FROM employee as emp WHERE emp.first_name LIKE '%an%';
13. Write a query to find the employees that their first_name starts with t and last_name ends with n
SELECT emp.* FROM employee as emp WHERE emp.first_name LIKE 't%' AND emp.last_name LIKE '%n';
14. Write a query to put together employee first_name and last_name as full_name
SELECT CONCAT(emp.first_name, ' ', emp.last_name) as full_name FROM employee as emp;
15. Write an SQL query to create a new table(employee1) with data and structure copied from another table(employee).
CREATE TABLE employee1 SELECT * FROM employee;
16. Find the total number of bugs reported per day, ordered by chronological date
SELECT DATE(br.reported_at) as dateReported, COUNT(br.id) as bugCount FROM employee_bug_report as br GROUP BY DATE(br.reported_at) ORDER BY DATE(br.reported_at) DESC;
MySQL Fundamental Questions
1. What is a database?
A database is an organized collection of data. It stores data in a way that data can be easily accessed and maintained. It has tables that contain rows and columns.
Example: School Management Database, Bank Management Database, etc.
2. What is DBMS?
DBMS stands for Database Management System. DBMS is a system software responsible for the creation, retrieval, update and management of the database. It ensures that our data is consistent, organized and is easily accessible by serving as an interface between the database and its end-users or application software.
Examples of DBMS are MySQL, SQL Server, PostgreSQL, etc.
3. What is RDBMS?
RDBMS stands for Relational Database Management System. The key difference here, compared to DBMS, is that RDBMS stores data in the form of a collection of tables and relations can be defined between the common fields of these tables. Most modern database management systems like MySQL, Microsoft SQL Server, Oracle, IBM DB2 and Amazon Redshift are based on RDBMS.
4. What is the difference between SQL and MySQL?
SQL is a standard language for retrieving and manipulating structured databases. On the contrary, MySQL is a relational database management system, like SQL Server, Oracle or IBM DB2, that is used to manage SQL databases.
5. What is SQL?
SQL stands for Structured Query Language. It is the standard language for performing CRUD operations on relational database management systems. It is especially useful in handling organized data comprised of entities (variables) and relations between different entities of the data.
6. What are Tables and Fields?
A table is a set of data that is organized in a model with Columns and Rows. Columns can be categorized as vertical, and Rows are horizontal. A table has a specified number of columns called fields but can have any number of rows which is called record. Here is an example:
Table: User
Column: id, first_name,date_of_birth.
Row: 234, Peter, 1/20/1990.
7. What is Primary Key?
a PRIMARY KEY uniquely identifies a row in a table. It must contain UNIQUE values and has an implicit NOT NULL constraint.
A table in SQL is strictly restricted to have one and only one primary key, which is comprised of single or multiple fields (columns).
CREATE TABLE user ( /* Create table with a single field as primary key */ id INT NOT NULL name VARCHAR(255) PRIMARY KEY (id) ); CREATE TABLE user ( /* Create table with multiple fields as primary key */ id INT NOT NULL last_name VARCHAR(255) first_name VARCHAR(255) NOT NULL, CONSTRAINT PK_user PRIMARY KEY (id, first_name) ); ALTER TABLE user /* Set a column as primary key */ ADD PRIMARY KEY (id); ALTER TABLE user /* Set multiple columns as primary key */ ADD CONSTRAINT PK_user /*Naming a Primary Key*/ PRIMARY KEY (id, first_name);
8. What is a foreign key?
A foreign key in one table is a primary key of another table. A foreign key is used to set relationships(one to one, one to many, and many to many) between two tables. This relationship is set up as a parent and child relationship. A child table references its parent table using a foreign key which is its parent’s primary key.
In this example, Employee is the parent table and Salary is the child table. Salary has Employee primary key (EmployeeID) as a foreign key.
9. What is a unique key?
A UNIQUE constraint ensures that all values in a column are different. This provides uniqueness for the column(s) and helps identify each row uniquely. Unlike the primary key, there can be multiple unique constraints defined per table. The code syntax for UNIQUE is quite similar to that of PRIMARY KEY and can be used interchangeably.
CREATE TABLE user ( /* Create table with multiple fields as unique */ id INT NOT NULL name VARCHAR(255) NOT NULL CONSTRAINT PK_user UNIQUE (id, name) );
10. What is a join?
A JOIN is a keyword used to query data from two or more tables. Join is querying data from two or more tables.
INNER JOIN or JOIN – Retrieves records that have matching values in both tables involved in the join. This is the widely used join for queries.
SELECT * FROM employee emp JOIN salary sal ON emp.id = sal.emp_id; // OR SELECT * FROM employee emp INNER JOIN salary sal ON emp.id = sal.emp_id;
LEFT JOIN or LEFT OUTER JOIN– Retrieves all the records/rows from the left and the matched records/rows from the right table.
SELECT * FROM employee emp LEFT JOIN salary sal ON emp.id = sal.emp_id;
RIGHT JOIN or RIGHT OUTER JOIN – Retrieves all the records/rows from the right and the matched records/rows from the left table.
SELECT * FROM employee emp RIGHT JOIN salary sal ON emp.id = sal.emp_id;
FULL JOIN or FULL OUTER JOIN – Retrieves all the records where there is a match on either the left or right table.
SELECT * FROM employee emp FULL JOIN salary sal ON emp.id = sal.emp_id;
11. What is a self JOIN?
A self JOIN is a case of regular join where a table is joined to itself based on some relation between its own column(s). Self-join uses the INNER JOIN or LEFT JOIN clause and a table alias is used to assign different names to the table within the query.
SELECT empA.emp_id AS "Emp_ID",empA.emp_name AS "Employee", empB.emp_id AS "Sup_ID",empB.emp_name AS "Supervisor" FROM employee empA, employee empB WHERE empA.emp_sup = empB.emp_id; // OR SELECT empA.emp_id AS "Emp_ID",empA.emp_name AS "Employee", empB.emp_id AS "Sup_ID",empB.emp_name AS "Supervisor" FROM employee empA, JOIN employee empB ON empA.emp_sup = empB.emp_id;
12. What is a CROSS JOIN?
Cross join can be defined as a cartesian product of the two tables included in the join. The table after join contains the same number of rows as in the cross-product of number of rows in the two tables. If a WHERE clause is used in cross join then the query will work like an INNER JOIN.
SELECT emp.name, sal.amount FROM employee AS emp CROSS JOIN salary AS sal;
12. What is Normalization?
Normalization is the process of organizing your database tables in a way that your data is not redundant.
For example, a spreadsheet containing information about salespeople and customers serves several purposes:
By limiting a table to one purpose you reduce the number of duplicate data contained within your database.
There are three main reasons to normalize a database. The first is to minimize duplicate data, the second is to minimize or avoid data modification issues, and the third is to simplify queries.
13. What is Denormalization?
Denormalization is the inverse process of normalization, where the normalized schema is converted into a schema that has redundant information. The performance is improved by using redundancy and keeping redundant data consistent. The reason for performing denormalization is the overheads produced in the query processor by an over-normalized structure.
14. What are the different forms of Normalization?
1st Normal Form – This should remove all the duplicate columns from the table. Creation of tables for the related data and identification of unique columns.
2nd Normal Form – Meeting all requirements of the first normal form. Placing the subsets of data in separate tables and Creation of relationships between the tables using primary keys.
3rd Normal Form – This should meet all requirements of 2NF. Removing the columns which are not dependent on primary key constraints.
4th Normal Form – Meeting all the requirements of the third normal form and it should not have multi-valued dependencies.
15. What is a view?
A view is a virtual table that consists of a subset of data from one or more tables. Views are not virtually present, and it takes less space to store. View can have data of one or more tables combined, and it is depending on the relationship.
16. What is an Index?
An index is very similar to an index in the back of a book. For example, if you want to reference all pages in a book that discusses a certain topic, you first refer to the index, which lists all the topics alphabetically and is then referred to one or more specific page numbers.
An index helps to speed up SELECT queries and WHERE clauses, but it slows down data input, with the UPDATE and the INSERT statements. Indexes can be created or dropped with no effect on the data.
17. What are all the different types of indexes?
There are three types of indexes -.
This indexing does not allow the field to have duplicate values if the column is unique indexed. Unique index can be applied automatically when primary key is defined.
This type of index reorders the physical order of the table and search based on the key values. Each table can have only one clustered index.
Example: Let’s consider a table called “Employees” with a clustered index on the “EmployeeID” column. The data within the table will be physically stored on disk in the order of the EmployeeID values. If you query the table based on the EmployeeID, the data can be quickly retrieved because it is organized according to the clustered index.
NonClustered Index does not alter the physical order of the table and maintains logical order of data. Each table can have 999 nonclustered indexes.
Example: Consider the same “Employees” table, but this time with a non-clustered index on the “LastName” column. The index will contain the LastName values in a sorted order along with pointers to the corresponding rows in the table. If you query the table based on the LastName, the non-clustered index allows for quick lookup and retrieval of the data rows.
18. What is a Cursor?
A database Cursor is a control pointer that enables traversal over the rows or records in the table. This can be viewed as a pointer to one row in a set of rows. The cursor is very much useful for traversing such as retrieval, addition, and removal of database records.
19. What is a query?
A DB query is a code written in order to put or get the information back from the database. Query can be designed in such a way that it matched our expectations of the result set. Simply, a question to the Database.
20. What is subquery?
A subquery is a query within another query. The outer query is called the main query, and the inner query is called subquery. A SubQuery is always executed first, and the result of the subquery is passed on to the main query. A sub-query is executed and its results are passed to the outer query which is then executed after.
21. What are the types of subquery?
There are two types of subquery – Correlated and Non-Correlated.
A correlated subquery is a subquery that uses fields from the outer query or the main query. It evaluates once for every row in the outer query.
A Non-Correlated sub-query is independent of the outer or main query. It does not reference any fields from the outer query.
22. What is a stored procedure?
Stored Procedure is a function consists of many SQL statements to access the database system. Several SQL statements are consolidated into a stored procedure and execute them whenever and wherever required.
Such procedures are stored in the database data dictionary. The sole disadvantage of a stored procedure is that it can be executed nowhere except in the database and occupies more memory in the database server. It also provides a sense of security and functionality as users who can’t access the data directly can be granted access via stored procedures.
23. What is a trigger?
A DB trigger is a code or program that automatically execute when some event on a table or view in a database takes place. Mainly, the trigger helps to maintain the integrity of the database.
The TRIGGER statement is used to define a set of actions that are automatically executed in response to specified database events, such as INSERT, UPDATE, or DELETE operations.
Example: When a new student is added to the student database, new records should be created in the related tables like Exam, Score and Attendance tables.
24. What are the TRUNCATE, DELETE and DROP statements?
DELETE – delete a row in a table
DELETE FROM Candidates WHERE CandidateId > 1000;
TRUNCATE – command is used to delete all the rows from the table and reset the table.
TRUNCATE TABLE Candidates;
DROP – command is used to remove an object from the database. If you drop a table, all the rows in that table are deleted and the table structure is removed from the database.
DROP TABLE user;
25. What is the difference between DROP and TRUNCATE statements?
If a table is dropped, all things associated with that table are dropped as well. This includes – the relationships defined on the table with other tables, the integrity checks and constraints, access privileges and other grants that the table has. To create and use the table again in its original form, all these relations, checks, constraints, privileges and relationships need to be redefined. However, if a table is truncated, none of the above problems exist and the table retains its original structure.
26. What is the difference between DELETE and TRUNCATE statements?
The TRUNCATE command is used to delete all the rows from the table and free the space containing the table.
The DELETE command deletes only the rows from the table based on the condition given in the where clause or deletes all the rows from the table if no condition is specified. But it does not free the space containing the table.
27. What are local and global variables and their differences?
Local variables are the variables that can be used or exist inside the function. They are not known to the other functions and those variables cannot be referred or used. Variables can be created whenever that function is called.
Global variables are the variables that can be used or exist throughout the program. The same variable declared in global cannot be used in functions. Global variables cannot be created whenever that function is called.
28. What is a constraint?
A constraint can be used to specify the limit on the data type of table. A constraint can be specified while creating or altering the table statement.
29. What is data Integrity?
Data Integrity defines the accuracy and consistency of data stored in a database. It can also define integrity constraints to enforce business rules on the data when it is entered into the application or database.
30. What is Auto Increment?
Autoincrement keyword allows the user to create a unique number to be generated when a new record is inserted into the table. AUTOINCREMENT keyword can be used in Oracle and IDENTITY keyword can be used in SQL SERVER.
Mostly this keyword can be used whenever the PRIMARY KEY is used.
31. What are Aggregate and Scalar functions?
An aggregate function performs operations on a collection of values to return a single scalar value. Aggregate functions are often used with the GROUP BY and HAVING clauses of the SELECT statement. Following are the widely used SQL aggregate functions:
AVG() – Calculates the mean of a collection of values.
COUNT() – Counts the total number of records in a specific table or view.
MIN() – Calculates the minimum of a collection of values.
MAX() – Calculates the maximum of a collection of values.
SUM() – Calculates the sum of a collection of values.
FIRST() – Fetches the first element in a collection of values.
LAST() – Fetches the last element in a collection of values.
Note: All aggregate functions described above ignore NULL values except for the COUNT function.
A scalar function returns a single value based on the input value. Following are the widely used SQL scalar functions:
LEN() – Calculates the total length of the given field (column).
UCASE() – Converts a collection of string values to uppercase characters.
LCASE() – Converts a collection of string values to lowercase characters.
MID() – Extracts substrings from a collection of string values in a table.
CONCAT() – Concatenates two or more strings.
RAND() – Generates a random collection of numbers of given length.
ROUND() – Calculates the round off integer value for a numeric field (or decimal point values).
NOW() – Returns the current data & time.
FORMAT() – Sets the format to display a collection of values.
32. What is the difference between a Cluster and Non-Cluster Index?
a clustered index is used for easy retrieval of data from the database by altering the way that the records are stored. Database sorts out rows by the column which is set to be clustered index.
A nonclustered index does not alter the way it was stored but creates a completely separate object within the table. It points back to the original table rows after searching.
33. What is Datawarehouse?
Datawarehouse is a central repository of data from multiple sources of information. Those data are consolidated, transformed and made available for the mining and online processing. Warehouse data have a subset of data called Data Marts.
34. What is OLTP?
OLTP stands for Online Transaction Processing, is a class of software applications capable of supporting transaction-oriented programs. An essential attribute of an OLTP system is its ability to maintain concurrency. To avoid single points of failure, OLTP systems are often decentralized. These systems are usually designed for a large number of users who conduct short transactions. Database queries are usually simple, require sub-second response times and return relatively few records. Here is an insight into the working of an OLTP system
35. What are the differences between OLTP and OLAP?
OLTP stands for Online Transaction Processing, which is a class of software applications capable of supporting transaction-oriented programs. An important attribute of an OLTP system is its ability to maintain concurrency. OLTP systems often follow a decentralized architecture to avoid single points of failure. These systems are generally designed for a large audience of end-users who conduct short transactions. Queries involved in such databases are generally simple, need fast response times and return relatively few records. The number of transactions per second acts as an effective measure for such systems.
OLAP stands for Online Analytical Processing, a class of software programs that are characterized by a relatively low frequency of online transactions. Queries are often too complex and involve a bunch of aggregations. For OLAP systems, the effectiveness measure relies highly on response time. Such systems are widely used for data mining or maintaining aggregated, historical data, usually in multi-dimensional schemas.
36. What is User-defined function? What are its various types?
The user-defined functions in SQL are like functions in any other programming language that accept parameters, perform complex calculations, and return a value. They are written to use the logic repetitively whenever required. There are two types of SQL user-defined functions:
Scalar Function: As explained earlier, user-defined scalar functions return a single scalar value.
Table-Valued Functions: User-defined table-valued functions return a table as output.
Inline: returns a table data type based on a single SELECT statement.
Multi-statement: returns a tabular result-set but, unlike inline, multiple SELECT statements can be used inside the function body.
37. What is collation?
Collation is defined as a set of rules that determine how character data can be sorted and compared. This can be used to compare A and, other language characters and also depends on the width of the characters.
ASCII value can be used to compare these character data.
38. What is Pattern Matching in SQL?
SQL pattern matching provides for pattern search in data if you have no clue as to what that word should be. This kind of SQL query uses wildcards to match a string pattern, rather than writing the exact word. The LIKE operator is used in conjunction with SQL Wildcards to fetch the required information.
a. Using the % wildcard to perform a simple search
The % wildcard matches zero or more characters of any type and can be used to define wildcards both before and after the pattern. Search a student in your database with first name beginning with the letter K:
SELECT * FROM students WHERE first_name LIKE 'K%'
b. Omitting the patterns using the NOT keyword
Use the NOT keyword to select records that don’t match the pattern. This query returns all students whose first name does not begin with K.
SELECT * FROM students WHERE first_name NOT LIKE 'K%'
c. Matching a pattern anywhere using the % wildcard twice
Search for a student in the database where he/she has a K in his/her first name.
SELECT * FROM students WHERE first_name LIKE '%Q%'
d. Using the _ wildcard to match pattern at a specific position
The _ wildcard matches exactly one character of any type. It can be used in conjunction with % wildcard. This query fetches all students with letter K at the third position in their first name.
SELECT * FROM students WHERE first_name LIKE '__K%'
e. Matching patterns for specific length
The _ wildcard plays an important role as a limitation when it matches exactly one character. It limits the length and position of the matched results. For example –
SELECT * /* Matches first names with three or more letters */ FROM students WHERE first_name LIKE '___%' SELECT * /* Matches first names with exactly four characters */ FROM students WHERE first_name LIKE '____'
39. What is the difference between UNION and UNION ALL in SQL?
UNION combines the result sets of two or more SELECT statements, removing duplicates, while UNION ALL combines the result sets without removing duplicates.
SELECT column1, column2 FROM table1 UNION SELECT column1, column2 FROM table2;
UNION ALL:
SELECT column1, column2 FROM table1 UNION ALL SELECT column1, column2 FROM table2;
When deciding whether to use UNION or UNION ALL, consider the following:
It’s important to note that UNION and UNION ALL have performance implications. Since UNION performs a distinct operation, it requires additional processing time compared to UNION ALL, which simply combines the rows without any duplicate elimination. Therefore, if you know that your result sets do not contain duplicates, using UNION ALL can be more efficient.
40. What is an SQL injection, and how can it be prevented?
SQL injection is a type of security vulnerability in which an attacker can manipulate a SQL query to perform unauthorized operations or access sensitive data. It occurs when user input is not properly validated or sanitized before being used in constructing a SQL query, allowing malicious SQL code to be injected. It can be prevented by using prepared statements or parameterized queries and by properly sanitizing user input.
Consider a simple login form where a user enters their username and password. The server-side code might construct a SQL query like this:
String query = "SELECT * FROM users WHERE username = '" + username + "' AND password = '" + password + "'";
If the input values are not properly validated or sanitized, an attacker can manipulate the input by entering a malicious string like ' OR '1'='1' --'
, which can change the meaning of the query. The resulting query would be:
SELECT * FROM users WHERE username = '' OR '1'='1' --' AND password = ''
In this case, the attacker has successfully bypassed the login mechanism by making the condition '1'='1'
always true, effectively retrieving all user records from the database.
To prevent SQL injection, it is crucial to use parameterized queries or prepared statements, which separate the SQL code from the user input. Here’s an example of how to solve the SQL injection vulnerability using parameterized queries:
String query = "SELECT * FROM users WHERE username = ? AND password = ?"; PreparedStatement statement = connection.prepareStatement(query); statement.setString(1, username); statement.setString(2, password); ResultSet resultSet = statement.executeQuery();
Spring Data is a part of the larger Spring Framework and provides abstractions and utilities for interacting with databases. When using Spring Data’s repositories or query methods, the framework internally leverages features of the underlying persistence technology, such as Hibernate or JPA (Java Persistence API). These technologies handle parameterized queries and protect against SQL injection. By following best practices and utilizing Spring Data’s provided features, you can ensure that your queries are secure and protected from SQL injection vulnerabilities.
41. Explain the ACID properties in the context of database transactions.
ACID is an acronym that stands for Atomicity, Consistency, Isolation, and Durability. These properties are essential for ensuring the reliability, integrity, and robustness of database transactions.
a. Atomicity: Atomicity guarantees that a transaction is treated as a single indivisible unit of work. It means that either all the operations within a transaction are successfully completed, or none of them are applied. If any part of the transaction fails, the entire transaction is rolled back to its initial state.
Example: Consider a banking application where a transfer of funds from one account to another is performed. To ensure atomicity, the debit from the sender’s account and the credit to the receiver’s account must occur together as a single atomic transaction. If either operation fails, the transaction is rolled back to maintain data consistency.
b. Consistency: Consistency ensures that a transaction brings the database from one consistent state to another. It enforces integrity constraints, domain rules, and predefined data validations during the transaction, ensuring that the data is valid and consistent before and after the transaction.
Example: In a database where the age of a person must be greater than zero, if a transaction tries to update the age of a person to a negative value, the consistency property will prevent the transaction from being committed, maintaining the data’s integrity.
c. Isolation: Isolation ensures that concurrent transactions do not interfere with each other and are executed in isolation. Each transaction should appear as if it is running in isolation, even when multiple transactions are executing concurrently. This property prevents issues like dirty reads, non-repeatable reads, and phantom reads.
Example: When two transactions simultaneously try to update the same record in a database, the isolation property ensures that each transaction sees its own consistent snapshot of the data and does not interfere with the other transaction’s updates until they are committed.
d. Durability: Durability guarantees that once a transaction is committed, its changes are permanent and will survive any subsequent system failures. The changes made by a committed transaction are stored in a durable manner, typically on disk or other persistent storage, to ensure their long-term persistence.
Example: After a successful transaction commits, the changes made to the database, such as inserting, updating, or deleting records, will be durably stored, even in the event of a system crash or power failure. The changes will be available and recoverable when the system is restarted.
To use ACID transactions with Hibernate, you typically perform the following steps:
Configure Hibernate’s transaction management: Set up the required configurations, such as defining the database connection, transaction manager, and transactional boundaries.
Annotate transactional methods: Use Hibernate’s @Transactional
annotation on the methods that should participate in transactions. This annotation ensures that the methods are executed within a transactional context, adhering to the ACID properties.
Perform database operations: Use Hibernate’s session or entity manager to perform database operations within the transactional methods. Hibernate automatically manages the transaction boundaries and ensures that the ACID properties are maintained.
With Hibernate’s transparent transaction management, you can rely on its underlying support for ACID transactions, allowing you to focus on the business logic without explicitly handling transaction management code. Hibernate integrates with various transaction managers, such as Java Transaction API (JTA) or Spring’s transaction management, to provide ACID support.
Here’s how ACID transactions can be used with Spring Data:
Configure Transaction Management: In your Spring configuration file, set up the necessary configurations for transaction management. This includes defining a transaction manager bean, specifying the data source, and configuring the transaction boundaries.
Annotate Transactional Methods: Use Spring’s @Transactional
annotation to mark the methods that should participate in transactions. This annotation can be applied at the class level or the method level, depending on your requirements.
@Service @Transactional public class UserService { @Autowired private UserRepository userRepository; public User createUser(User user) { // Perform data manipulation operations using the UserRepository methods // ... } }
ACID transactions can be used with Java JDBC (Java Database Connectivity), which is a standard Java API for interacting with databases.
Here’s how you can work with ACID transactions using Java JDBC:
Establish Database Connection: Use the JDBC API to establish a connection to the database by providing the necessary connection details, such as the database URL, username, and password.
Connection connection = DriverManager.getConnection("jdbc:mysql://localhost:3306/mydatabase", "username", "password");
Disable Auto-Commit: By default, JDBC auto-commits each individual SQL statement as a separate transaction. To enable ACID transactions, you need to disable the auto-commit mode on the connection.
connection.setAutoCommit(false);
Execute SQL Statements: Use the JDBC API to execute SQL statements within the transactional context. You can create and execute statements using the Statement
or PreparedStatement
interfaces.
Statement statement = connection.createStatement(); String sql = "INSERT INTO users (name, email) VALUES ('John', 'john@example.com')"; statement.executeUpdate(sql);
Commit or Rollback: After executing all the SQL statements within the transaction, you can choose to commit the transaction to make the changes permanent or roll back the transaction to discard the changes.
connection.commit(); connection.rollback();
By explicitly managing the transaction boundaries using JDBC, you can ensure the ACID properties are maintained. Disabling auto-commit mode allows you to group multiple SQL statements into a single transaction, which will either be committed as a whole or rolled back if an error occurs.
It’s important to note that working with ACID transactions using JDBC requires you to handle exception handling, connection management, and transactional logic manually. Frameworks like Hibernate and Spring Data provide higher-level abstractions that simplify transaction management and handle many of these tasks automatically.
Additionally, you should also consider using a connection pool, such as Apache Commons DBCP or HikariCP, to improve connection management and efficiency in a production environment. Connection pools help manage and reuse database connections, reducing the overhead of establishing connections for each transaction.
42. What is the purpose of the HAVING clause in SQL?
43. What are the differences between HAVING and WHERE?
The HAVING and WHERE clauses are both used in SQL to filter and restrict data, but they are used in different contexts and operate on different parts of a query. Here are the key differences between the HAVING and WHERE clauses:
Usage:
Application:
Placement:
Applicable Expressions:
Grouping Requirement:
In summary, the WHERE clause is used to filter rows before grouping and aggregation, while the HAVING clause is used to filter grouped and aggregated data after the grouping and aggregation operations have taken place. The WHERE clause operates on individual column values, while the HAVING clause applies conditions to aggregated values resulting from the GROUP BY operation.
1. What is Serialization?
Serialization is the process of converting an object and its states to a binary stream that can then be saved into a database, cached or sent over the network to another JVM.
2. What is Deserialization?
Deserialization is the process of converting back a serialized object.
import java.io.Serializable; import org.apache.commons.lang3.builder.ToStringBuilder; import org.apache.commons.lang3.builder.ToStringStyle; public class User implements Serializable { private static final long serialVersionUID = 3L; private String name; private int age; private String email; transient private double height; //setters and getters @Override public String toString() { return ToStringBuilder.reflectionToString(this, ToStringStyle.SHORT_PREFIX_STYLE); } }
import java.io.FileInputStream; import java.io.FileOutputStream; import java.io.IOException; import java.io.ObjectInputStream; import java.io.ObjectOutputStream; public class SerializationDemo { public static void main(String[] args) throws IOException, ClassNotFoundException { String path = "serialized_user.ser"; User serializedUser = serialize(path); System.out.println("serializedUser=" + serializedUser.toString()); User deserializedUser = deserialize(path); System.out.println("deserializedUser=" + deserializedUser.toString()); } private static User serialize(String path) throws IOException { FileOutputStream fileStream = new FileOutputStream(path); ObjectOutputStream objectStream = new ObjectOutputStream(fileStream); User user = new User(); user.setAge(21); user.setName("Laulau"); user.setEmail("folaukaveinga@gmail.com"); user.setHeight(6.3); objectStream.writeObject(user); objectStream.close(); fileStream.close(); return user; } private static User deserialize(String path) throws IOException, ClassNotFoundException { FileInputStream fileInputStream = new FileInputStream(path); ObjectInputStream objectInputStream = new ObjectInputStream(fileInputStream); User user = (User) objectInputStream.readObject(); objectInputStream.close(); fileInputStream.close(); return user; } }
Result
serializedUser=User[name=Laulau,age=21,email=folaukaveinga@gmail.com] deserializedUser=User[name=Laulau,age=21,email=folaukaveinga@gmail.com]
3. What is Externalization?
Serialization is not very efficient. If you serialize bloated objects having a lot of attributes and properties, you do not wish to serialize. This is where Externalization in Java comes into the picture. Externalization is an advanced way of serialization. It is whenever you need to customize the serialization mechanism.
The class you want to be externalized must implement the Externalizable interface, then serialization of the object will be done using the method writeExternal(). When the Externalizable object is reconstructed at the receiver’s end, an instance will be created using no-argument constructor and this method is called readExternal().
This basically serves the purpose of custom Serialization, where you can decide what to store in a stream.
When to use Externalization?
When you want or need to serialize only part of an object, Externalization is the best option.
How to use Externalization?
By implementing the Externalizable interface, you control the process of reading and writing the objects during serialization and de-serialization process, you need to have the object’s class implemented the interface java.io.Externalizable. Only then you can implement your own code to read and write the object’s states. The methods, readExternal() and writeExternal() are defined by the Externalizable interface.
import java.io.Externalizable; import java.io.IOException; import java.io.ObjectInput; import java.io.ObjectOutput; import org.apache.commons.lang3.builder.ToStringBuilder; import org.apache.commons.lang3.builder.ToStringStyle; public class Member implements Externalizable { public static final long serialVersionUID = 2L; private String name; private int age; private String email; transient private double height; // setters and getters @Override public String toString() { return ToStringBuilder.reflectionToString(this, ToStringStyle.SHORT_PREFIX_STYLE); } @Override public void writeExternal(ObjectOutput out) throws IOException { System.out.println("writing object state..."); // TODO Auto-generated method stub out.writeObject(name); out.writeObject(email); out.writeInt(age); out.writeDouble(height); } @Override public void readExternal(ObjectInput in) throws IOException, ClassNotFoundException { System.out.println("reading object state..."); this.name = (String) in.readObject(); this.email = (String) in.readObject(); this.age = in.readInt(); this.height = in.readDouble(); } }
public class ExternalizeDemo { public static void main(String[] args) throws IOException, ClassNotFoundException { String path = "externalized_member.ser"; Member externalizedMember = serialize(path); System.out.println("externalizedMember=" + externalizedMember.toString()); Member dexternalizedMember = deserialize(path); System.out.println("dexternalizedMember=" + dexternalizedMember.toString()); } private static Member serialize(String path) throws IOException { FileOutputStream fileStream = new FileOutputStream(path); ObjectOutputStream objectStream = new ObjectOutputStream(fileStream); Member member = new Member(); member.setAge(21); member.setName("Laulau"); member.setEmail("folaukaveinga@gmail.com"); member.setHeight(6.3); System.out.println(member.toString()); objectStream.writeObject(member); objectStream.close(); fileStream.close(); return member; } private static Member deserialize(String path) throws IOException, ClassNotFoundException { FileInputStream fileInputStream = new FileInputStream(path); ObjectInputStream objectInputStream = new ObjectInputStream(fileInputStream); Member member = (Member) objectInputStream.readObject(); objectInputStream.close(); fileInputStream.close(); return member; } }
Result
Member[name=Laulau,age=21,email=folaukaveinga@gmail.com] writing object state... externalizedMember=Member[name=Laulau,age=21,email=folaukaveinga@gmail.com] reading object state... dexternalizedMember=Member[name=Laulau,age=21,email=folaukaveinga@gmail.com]
4. What are the differences between Serialization and Externalization?
Externalizable provides us with writeExternal()
and readExternal()
methods that give us control on what attributes and values to serialize. Correct implementation of the Externalizable interface can improve the performance of application drastically.
Externalization | Serialization | |
UID | No need to have it but for best practice you must have it | Needs a serialVersionUID |
Storage | Store the data per attribute. | Store the whole object directly. |
Access | Provides complete control to the serialization and deserialization process. | No access |
5. How many methods does Serializable have?
Zero. Serializable is a marker interface.
6. What is a serialVersionUID? What happens when you don’t explicitly provide?
serialVersionUID is a version control id. It is basically the hashcode of the object. So if you change the object(add or remove an attribute) then the hashcode will be different so an already serialized object will not be able to deserialize. Java serialization process relies on correct serialVersionUID for recovering the state of the serialized object and throws java.io.InvalidClassException in case of serialVersionUID mismatch
When an object is serialized, the serialVersionUID is serialized along with the other contents. Later when that is deserialized, the serialVersionUID from the deserialized object is extracted and compared with the serialVersionUID of the loaded class. If the numbers do not match then, InvalidClassException is thrown.
serialVersionUID is a 64-bit hash of the class name, interface class names, methods and fields. Serialization runtime generates a serialVersionUID if you do not add one. However, it is strongly recommended that all serializable classes explicitly declare serialVersionUID values, since the default serialVersionUID computation is highly sensitive to class details that may vary depending on compiler implementations and can produce different serialVersionUID in different environments.
If you don’t provide it, Java will provide it for you.
7. What happens if an object is serializable but it includes a reference to a non-serializable object?
Java will throw a NotSerializableException. To fix this issue you can do either one of these two things:
a. Make the non-serializable object serializable by making its class implement the Serializable interface.
b. Make the non-serializable object transient or static.
8. If a class is serializable but its superclass in not, what will be the state of the instance variables inherited from super class after deserialization?
The state will be its default value. For example:
int age; -> age = 0;
String name; -> name = null;
9. To serialize an array or a collection all the members of it must be serializable. True /False?
true
All standard implementations of collections List, Set and Map interface already implement java.io.Serializable. This means you do not really need to write anything specific to serialize collection objects. However, you should keep the following things in mind before you serialize a collection object – Make sure all the objects added in a collection are Serializable.
10. While serializing you want some of the members not to serialize, how do you achieve it?
If you don’t want any field to be part of the object’s state then declare it either static or transient based on your need and it will not be included during the serialization process.
Variables marked with transient are not included in the process of serialization and are not part of the object’s serialized state.
The static variables belong to a class are not part of the state of the object so they are not saved as part of the serialized object.
Why static variables are not serialized in Java?
The Java variables declared as static are not considered part of the state of an object since they are shared by all instances of that class. Saving static variables with each serialized object would have following problems
11. What should you take care of when serializing an object?
You should make sure that all the included objects are also serializable. If any of the objects is not serializable then it throws a NotSerializableException.
12. Suppose a superclass of a new class implements Serializable interface, how can you avoid new class to being serialized?
Use Externalization and exclude the new class from the serialization process.
13. What are the ways to speed up Object Serialization? How to improve Serialization performance?
The default Java Serialization mechanism is really useful, however it can have a really bad performance based on your application and business requirements. The serialization process performance heavily depends on the number and size of attributes you are going to serialize for an object. Below are some tips you can use for speeding up the marshaling and un-marshaling of objects during Java serialization process.
14. Do constructors get invoked or run at deserialization?
When an instance of a serializable class is deserialized, the constructor does not run. If constructors were invoked it will give back the initial values rather than the value they had at the time of serialization.
15. What are the compatible and incompatible changes when dealing with versioned serialized objects during serialization?
Changes to a serializable class can be compatible or incompatible. Following is the list of changes which are compatible:
List of incompatible changes:
16. What are the guidelines for using serialVersionUID?
17. Does setting the serialVersionUID class field improve Java serialization performance?
Declaring an explicit serialVersionUID field in your classes saves some CPU time only the first time the JVM process serializes a given Class. However the gain is not significant, In case when you have not declared the serialVersionUID its value is computed by JVM once and subsequently kept in a soft cache for future use.
18. Suppose you have a class in which you serialized and stored in persistence and later modified that class to add a new field. What will happen if you deserialize the object already serialized?
It depends on whether a class has its own serialVersionUID or not. As we know that if we don’t provide serialVersionUID in our code java compiler will generate it and normally it’s equal to hashCode of the object. By adding any new field there is a chance that new serialVersionUID generated for that class version is not the same as the serialized object. In this case, Java Serialization API will throw java.io.InvalidClassException
and this is the reason its recommended to have your own serialVersionUID in code and make sure to keep it same always for a single class.