Aggregate operators in dbms


What do you mean by Aggregate operators or Aggregate Functions in dbms? What is Basic Aggregation? What is Aggregation with Grouping? What is The Having Clause?How Null and Boolean Values are used in Aggregation?
Aggregate operators in dbms
Learn about Aggregate Operators
If you have any queries related to these topics, you have landed on the perfect page. In this post, I am going to make you understand what are these Aggregate Operators or Functions in DBMS. So let's jump right into it-

What are Aggregate Operators or Functions?

Aggregate operators are functions that take a collection (a set or multiset) of values as input and return a single value.

SQL offers five built-in aggregate functions:

Average: avg

Minimum: min

Maximum: max

Total: sum

Count: count

The input to sum and avg must be a collection of numbers, but the other operators can operate on collections of non-numeric data types, such as strings, as well.

What is Basic Aggregation? 

Consider the query "Find the average payment of instructors in the Computer Science department". We write this query as follows: 

select avg (salary
from instructor 
where dept_name= 'Comp.Sci.'; 

The results of this query is a relation with a single attribute, containing a single tuple with a numerical value corresponding to the average payment of instructors in the Computer Science department. 

The database system could provide an arbitrary name to the result relation attribute that is generated by aggregation; but, we can provide a meaningful name to the attribute by using the as clause as follows: 

select avg (salary) as avg_regular 
from instructor 
where dept_name= 'Comp.Sci.'; 

In the instructor relation, the salaries in the Computer Science department are $75,000, $65,000, and $92,000. 

The average balance is $232,000/3 = $77,333.33. Retaining duplicates is very important in computing an average. 

Suppose the Computer Science department adds a fourth instructor whose payment happens to be $75,000. If duplicates were eliminated, we would obtain the incorrect answer ($232,000/4 = $58.000) instead of the right answer of $76,750. 

There are cases where we must eliminate duplicates before computing an aggregate function. If we do wish to eliminate duplicates, we use the keyword distinct in the aggregate expression. 

An example arises in the query "Find the total number of instructors who teach a course in the Spring 2010 semester." In this case, an instructor counts only once, in spite of the number of course sections that the instructor teaches. 

The desired data is contained in the relation teaches, and that we write this query as follows: 

select count (distinct ID) 
from teaches 
where semester = 'Spring' and year = 2010; 

Because of the keyword distinct preceding ID, although an instructor teaches more than one course, she is counted just once in the result. 

We use the aggregate function count often to count the quantity of tuples in a relation. The notation for this function in SQL is count (*). 

Thus, to search out the quantity of tuples in the course relation, we write 

select count (*) 
from course;

SQL does not permit the use of distinct with count (*). It is legal to use distinct with max and min, although the result does not change. 

We can use the keyword all in place of distinct to specify duplicate retention, but, since all is the default, there is no need to do so. 

 What is Aggregation with Grouping? 


There are circumstances where we would wish to apply the aggregate function not only to a single set of tuples, but also to a bunch of sets of tuples; we specify this wish in SQL using the group by clause. 

The attribute or attributes given in the group by clause are used to form groups. Tuples with the same value on all attributes in the group by clause are placed in one group. 

As an illustration, consider this query "Find the average salary in every department." We write this query as follows: 

select dept_name, avg (salary) as avg_salary 
from instructor
group by dept_name
Aggregate operators in dbms
Tuples of the instructor relation, grouped by the dept_name attribute
Image above shows the tuples in the instructor relation grouped by the dept_name attribute, which is the first step in computing the query result. 

The specified aggregate is computed for every group, and the result of the query is shown in image below. In distinction, consider the query "Find the average salary of all instructors." 

Aggregate operators in dbms
The result relation for the query "Find the average salary in each department"
We write this query as follows: 

select avg (salary
from instructor;

In this case the group by clause has been omitted, therefore the entire relation is treated as a single group. 

As another example of aggregation on groups of tuples, think about the query "Find the quantity of instructors in every department who teach a course in the Spring 2010 semester." 

Information about which instructors teach which course sections in which semester is available in the teaches relation. 

However, this information has to be joined with information from the instructor relation to get the department name of each instructor. Thus, write this query as follows: 

select dept_name, count (distinct ID) as instr_count 
from instructor natural join teaches 
where semester = 'Spring' and year = 2010 
group by dept_name

When an SQL query uses grouping, it is necessary to make sure that the only attributes that appear in the select statement without  being aggregated are those that are present in the group by clause. 

In different words, any attribute that is not present in the group by clause must appear only inside an aggregate function if it appears in the select clause, otherwise the query is treated as incorrect. 

For instance, the following query is incorrect since ID does not appear in the group by clause, and yet it appears in the select clause without being aggregated:

/* incorrect query */
select dept_name, ID, avg (salary
from instructor 
group by dept_name

each instructor in a particular group (defined by dept_name) can have a distinct ID, and since only one tuple is output for each group, there is no distinctive manner of selecting which ID value to output. As a result, such cases are disallowed by SOL. 

What is The Having Clause? 


At times, it's helpful to state a condition that applies to groups rather than to tuples. As an example, we might be interested in only those departments where the average salary of the instructors is more than $42,000. 

This condition does not apply to a single tuple; rather, it applies to each group created by the group by clause. To express such a query, we use the having clause of SQL. 

SQL applies predicates in the having clause once groups have been formed, so aggregate functions may be used. We express this query in SQL as follows: 

                               select dept_nameavg (salaryas avg_salary    
from instructor
group by dept_name
having avg (salary) > 42000; 

Aggregate operators in dbms
The result relation for the query "Find the average salary of instructors in those departments where the avg salary is more than $42,000."
As was the case for the select clause, any attribute that is present in the having clause without being aggregated must appear in the group by clause, otherwise the query is treated as incorrect. 

The meaning of a query containing aggregation, group by, or having clauses is defined by the following sequence of operations: 

1. As was the case for queries without aggregation, the from clause is first evaluated to get a relation.

2. If a where clause is present, the predicate in the where clause is applied on the result relation of the from clause. 

3. Tuples satisfying the where predicate are then placed into groups by the group by clause if it is present. If the group by clause is absent, the entire set of tuples satisfying the where predicate is treated as being in one group. 

4. The having clause, if it is present, is applied to every group; the groups don't satisfy the having clause predicate are removed. 

5. The select clause uses the remaining groups to generate tuples of the result of the query, applying the aggregate functions to get a single result tuple for each group. 

To illustrate the use of both a having clause and a where clause in the same query, we consider the query "For every course section offered in 2009, find the average total credits (tot_cred) of all students enrolled in the section, if the section had at least 2 students." 

select course_id, semester, year, sec_id, avg (tot_cred
from takes natural join student 
where year = 2009 
group by course_id, semester, year, sec_id 
having count (ID) >= 2; 

Note that all the required information for the preceding query is accessible from the relations takes and student, and that although the query pertains to sections, a join with section is not required. 

What are Aggregation with Null and Boolean Values? 


Null values, when they exist, complicate the processing of aggregate operators. As an example, assume that some tuples in the instructor relation have a null value for salary. 

Consider the following query to total all salary amounts: 

select sum (salary
from instructor

The values to be summed in the preceding query includes null values, since some tuples have a null value for salary

Rather than say that the total sum is itself null, the SQL standard says that the sum operator should ignore null values in its input. 

In general, aggregate functions treat nulls  with the following rule: All aggregate functions except count (*) ignore null values in their input collection. 

As a result of null values being ignored, the collection of values may be empty. The count of an empty collection is defined to be zero, and all other aggregate operations return a value of null when applied on an empty collection. 

The effect of null values on some of the more sophisticated SQL constructs can be delicate. A Boolean data type that can take values true, false, and unknown, was introduced in SQL:1999. 

The aggregate functions some and every, which mean specifically what you would intuitively expect, can be applied on a collection of Boolean values.

I hope you got your queries solved and if you still got any, feel free to comment it down below👇💬. Share this post with your friends to help them as well😉.

Previous
Next Post »

No comments:

Post a Comment

Please do not enter any spam links in comment box.