Oracle | Analytic Functions

Oracle Analytic Functions Version 21c

General Information

Library Note

Morgan's Library Page Header

The best Oracle News for FY2026

Oracle Database 26ai will be available on generic Linux platforms in January and soon on AIX and Windows

Analytic Functions

ANY_VALUE	FEATURE_VALUE	PREDICTION
AVG	FIRST	PREDICTION_COST
BIT_AND_AGG	FIRST_VALUE	PREDICTION_DETAILS
BIT_OR_AGG	IGNORE_NULLS	PREDICTION_PROBABILITY
BIT_XOR_AGG	KURTOSIS_POP	PREDICTION_SET
CHECKSUM	KURTOSIS_SAMP	RANK
CLUSTER_DETAILS	LAG	RATIO_TO_REPORT
CLUSTER_DISTANCE	LAST	REGR_
CLUSTER_ID	LAST_VALUE	ROW_NUMBER
CLUSTER_PROBABILITY	LEAD	SKEWNESS_POP
CLUSTER_SET	LISTAGG	SKEWNESS_SAMP
CORR	MATCH_RECOGNIZE	STDDEV
COUNT	MAX	STDDEV_POP
COVAR_POP	MIN	STDDEV_SAMP
COVAR_SAMP	NTH_VALUE	SUM
CUME_DIST	NTILE	VAR_POP
DENSE_RANK	PERCENTILE_CONT	VAR_SAMP
FEATURE_DETAILS	PERCENTILE_DISC	VARIANCE
FEATURE_ID	PERCENT_RANK	Additional Demos
FEATURE_SET

ANY_VALUE (new 20c)

Returns a single non-deterministic value of expr. Can be used with both aggregate or analytic functions ANY_VALUE(<ALL | DISTINCT | UNIQUE>) <expression>) RETURN <data_type>;

SELECT c.cust_id, ANY_VALUE(cust_last_name),  SUM(amount_sold)

      FROM customers c, sales s

      WHERE s.cust_id = c.cust_id

      GROUP BY c.cust_id;

AVG

Returns a running average

Three overloads: NUMBER, DOUBLE, and FLOAT AVG([DISTINCT | ALL] <expression>) OVER (analytic clause) <additional analytic function clauses> RETURN NUMBER;

conn uwclass/uwclass@pdbdev

      

      CREATE TABLE vote_count (

      submit_date  DATE NOT NULL,

      num_votes    NUMBER NOT NULL);

      

      INSERT INTO vote_count VALUES (TRUNC(SYSDATE)-4, 100);

      INSERT INTO vote_count VALUES (TRUNC(SYSDATE)-3, 150);

      INSERT INTO vote_count VALUES (TRUNC(SYSDATE)-2, 75);

      INSERT INTO vote_count VALUES (TRUNC(SYSDATE)-3, 25);

      INSERT INTO vote_count VALUES (TRUNC(SYSDATE)-1, 50);

      COMMIT;

      

      SELECT * FROM vote_count;

      

      SELECT submit_date, num_votes, TRUNC(AVG(num_votes)

      OVER(ORDER BY submit_date  ROWS UNBOUNDED PRECEDING)) AVG_VOTE_PER_DAY

      FROM vote_count

      ORDER BY submit_date;

      

      SELECT submit_date, num_votes, TRUNC(AVG(num_votes)

      OVER(PARTITION BY submit_date
      ORDER BY submit_date  ROWS UNBOUNDED PRECEDING)) AVG_VOTE_PER_DAY

      FROM vote_count

      ORDER BY submit_date;

BIT_AND_AGG (new 20c)

Bitwise aggregation function that returns the result of a bitwise AND operation BIT_AND_AGG(<ALL | DISTINCT | UNIQUE>) <expression>) RETURN NUMBER;

conn uwclass/uwclass@pdbdev

      

      SELECT '011' num, bin_to_num(0,1,1) bits FROM dual

      UNION ALL SELECT '101' num, bin_to_num(1,0,1) bits FROM dual;

      

      NUM  BITS

      --- -----

      011     3

      101     5

      

      SELECT bit_and_agg(bits)

      FROM (

        SELECT '011' num, bin_to_num(0,1,1) bits FROM dual

        UNION ALL

        SELECT '101' num, bin_to_num(1,0,1) bits FROM dual);

      

      BIT_AND_AGG(BITS)

      -----------------

                      1

BIT_OR_AGG (new 20c)

Bitwise aggregation function that returns the result of a bitwise OR operation BIT_OR_AGG(<ALL | DISTINCT | UNIQUE>) <expression>) RETURN NUMBER; [DISTINCT | ALL] <expression>) OVER (analytic clause) <additional analytic function clauses> RETURN NUMBER;

TBD

BIT_XOR_AGG (new 20c)

Bitwise aggregation function that returns the result of a bitwise XOR operation BIT_XOR_AGG(<ALL | DISTINCT | UNIQUE>) <expression>) RETURN NUMBER; [DISTINCT | ALL] <expression>) OVER (analytic clause) <additional analytic function clauses> RETURN NUMBER;

TBD

CHECKSUM (new 20c)

Detects changes in a table. The order of the rows in the table does not affect the result. Use with DISTINCT, as part of a GROUP BY query, as a window function, or an analytical function CHECKSUM(<ALL | DISTINCT | UNIQUE>) <expression>) RETURN NUMBER;

TBD

CLUSTER_DETAILS

Predicts cluster membership for each row. It can use a pre-defined clustering model or perform dynamic clustering. Returns an XML string that describes the predicted cluster or a specified cluster. CLUSTER_DETAILS (INTO n [,<cluster_id>[,<topN>]] [<ABS | ASC | DESC>] USING <mining_attribute_clause>) OVER ([<query_partition_clause>] [<order_by_clause>])

SELECT * FROM (

        SELECT cust_id, CLUSTER_ID (INTO 4 USING *) OVER () cls,

                        CLUSTER_DETAILS(INTO 4 USING *) OVER () cls_details

        FROM mining_data_apply_v)

      WHERE cust_id <= 100003

      ORDER BY 1;

CLUSTER_DISTANCE

Predicts cluster membership for each row. It can use a pre-defined clustering model or perform dynamic clustering. Returns the raw distance between each row and the centroid of either the predicted cluster or a specified. CLUSTER_DISTANCE(INTO n [,<cluster_id>] <mining_attribute_clause>) OVER ([<query_partition_clause>] [<order_by_clause>]) RETURN BINARY_DOUBLE;

TBD

CLUSTER_ID

Returns the cluster identifier of the predicted cluster with the highest probability for the set of predictors specified in the mining_attribute_clause CLUSTER_ID(<schema.model> <mining_attribute_clause>) OVER ([<query_partition_clause>] [<order_by_clause>]) RETURN NUMBER;

TBD

CLUSTER_PROBABILITY

Returns a measure of the degree of confidence of membership of an input row in a cluster associated with the specified model CLUSTER_PROBABILITY(<schema.model>, <cluster_id> <mining_attribute_clause>) OVER ([<query_partition_clause>] [<order_by_clause>]) RETURN BINARY_DOUBLE;

TBD

CLUSTER_SET

Returns a varray of objects containing all possible clusters that a given row belongs to. Each object in the varray is a pair of scalar values containing the cluster ID and the cluster probability. The object fields are named CLUSTER_ID and PROBABILITY, and both are Oracle NUMBER CLUSTER_SET(<schema.model>, <top N>, <cutoff><mining_attribute_clause>) OVER ([<query_partition_clause>] [<order_by_clause>])

TBD

CORR

Returns the coefficient of correlation of a set of number pairs CORR(<expression1>, <expression2>) OVER (<analytic clause>)

conn sh/sh@pdbdev

      

      SELECT t.calendar_month_number,

      CORR (SUM(s.amount_sold), SUM(s.quantity_sold))

      OVER  (ORDER BY t.calendar_month_number) AS CUM_CORR

      FROM sales s, times t

      WHERE s.time_id = t.time_id AND calendar_year = 1998

      GROUP BY t.calendar_month_number;

COUNT

Returns a running count of all records or by partition COUNT(<*, [DISTINCT | ALL] <expression>>) OVER (<analytic clause>)

conn uwclass/uwclass@pdbdev

      

      SELECT submit_date, num_votes, TRUNC(COUNT(num_votes)

      OVER(ORDER BY submit_date  ROWS UNBOUNDED PRECEDING)) AS DAY_COUNT

      FROM vote_count

      ORDER BY submit_date;

      

      SELECT submit_date, COUNT(*)

      OVER(PARTITION BY submit_date ORDER BY submit_date

      ROWS UNBOUNDED PRECEDING) NUM_RECS

      FROM vote_count;

Return a frequency distribution conn uwclass/uwclass@pdbdev CREATE TABLE myprods ( prod1 NUMBER(3), prod2 NUMBER(3), prod3 NUMBER(3)); INSERT INTO myprods VALUES (34,23,45); INSERT INTO myprods VALUES (34,22,34); INSERT INTO myprods VALUES (54,44,45); INSERT INTO myprods VALUES (23,22,45); INSERT INTO myprods VALUES (45,22,34); SELECT prod1, COUNT(prod1) OVER (PARTITION BY prod1) freq1, prod2, COUNT(prod2) OVER (PARTITION BY prod2) freq2, prod3, COUNT(prod3) OVER (PARTITION BY prod3) freq3 FROM myprods;

COVAR_POP

Returns the population covariance of a set of number pairs COVAR_POP(<expression1>, <expression2>) OVER (<analytic clause>)

conn oe/oe@pdbdev

      

      SELECT job_id,

      COVAR_POP(SYSDATE-hire_date, salary) AS covar_pop,

      COVAR_SAMP(SYSDATE-hire_date, salary) AS covar_samp

      FROM employees

      WHERE department_id in (50, 80)

      GROUP BY job_id;

COVAR_SAMP

Returns the sample covariance of a set of number pairs COVAR_SAMP(<expression1>, <expression2>) OVER (<analytic clause>)

See COVAR_POP Demo above

CUME_DIST

Returns the cumulative distribution of a value in a group of values CUME_DIST(<value>) OVER (<partition_clause> <order by clause>)

conn oe/oe@pdbdev

      

      SELECT job_id, last_name, salary, CUME_DIST()

      OVER (PARTITION BY job_id  ORDER BY salary) AS cume_dist

      FROM employees

      WHERE job_id LIKE 'PU%';

DENSE_RANK

Ranks items in a group leaving no gaps in ranking sequence when there are ties DENSE_RANK() OVER (<query_partition_clause> <order_by_clause>)

conn oe/oe@pdbdev

      

      SELECT d.department_name, e.last_name, e.salary, DENSE_RANK()

       OVER (PARTITION BY e.department_id  ORDER BY e.salary) AS DENSE_RANK

      FROM employees e, departments d

      WHERE e.department_id = d.department_id

      AND d.department_id IN (30, 60);

FEATURE_DETAILS

Predicts feature matches for each row. Can use a pre-defined feature extraction model or perform dynamic feature extraction. Returns an XML string that describes the predicted feature or a specified feature. FEATURE_DETAILS(<schema.model><model>[feature_id[,topN]][<ABS | ACS |DESC>] <mining_attribute_clause> OVER ([<query_partition_clause>] [<order_by_clause>])

TBD

FEATURE_ID

Returns an Oracle NUMBER that is the identifier of the feature with the highest coefficient value FEATURE_ID(<schema.model> <mining_attribute_clause>) OVER ([<query_partition_clause>] [<order_by_clause>])

TBD

FEATURE_SET

Returns a varray of objects containing all possible features. Each object in the varray is a pair of scalar values containing the feature ID and the feature value. The object fields are named FEATURE_ID and VALUE, and both are Oracle NUMBERs. FEATURE_SET(<schema.model>, <top N>, <cutoff><mining_attribute_clause>) OVER ([<query_partition_clause>] [<order_by_clause>])

TBD

FEATURE_VALUE

Returns the value of a given feature. If the feature_id argument is omitted, then returns the highest feature value. Use this form in conjunction with the FEATURE_ID function to obtain the largest feature/value combo. FEATURE_VALUE(<schema.model>,<feature_id><mining_attribute_clause>) OVER ([<query_partition_clause>] [<order_by_clause>])

TBD

FIRST

Returns the row ranked first using DENSE_RANK SELECT <aggregate_function(column_name)> KEEP (DENSE_RANK FIRST ORDER BY <column_name> [<ASC|DESC> NULLS <FIRST|LAST>) OVER (PARTITION BY <column_name>) FROM <table_name> GROUP BY <column_name>;

conn oe/oe@pdbdev

      

      SELECT last_name, department_id, salary,

      MIN(salary) KEEP (DENSE_RANK FIRST ORDER BY commission_pct)

       OVER (PARTITION BY department_id) "Worst",

      MAX(salary) KEEP (DENSE_RANK LAST ORDER BY commission_pct)

       OVER (PARTITION BY department_id) "Best"

      FROM employees

      WHERE department_id IN (30, 60)

      ORDER BY department_id, salary;

FIRST_VALUE

Returns the first value in an ordered set of values. If the first value in the set is null, then the function returns NULL unless you specify IGNORE NULLS FIRST_VALUE(<expression> [<RESPECT | IGNORE> NULLS]) OVER (<analytic clause>)

conn oe/oe@pdbdev

      

      SELECT last_name, salary, hire_date, FIRST_VALUE(hire_date)

      OVER (ORDER BY salary ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS lv

      FROM (SELECT * FROM employees WHERE department_id = 90

      ORDER BY hire_date);

IGNORE NULLS

The following is a minor modification of a demo published in the November/December 2006 issue of Oracle Magazine by Tom Kyte (<column_name> IGNORE NULLS)

conn uwclass/uwclass@pdbdev

      

      CREATE TABLE t1 (

      row_num NUMBER(3),

      col1    VARCHAR2(15),

      col2    VARCHAR2(15));

      

      INSERT INTO t1 VALUES (6, NULL, NULL);

      INSERT INTO t1 VALUES (1, 'Category 1', 'Mango');

      INSERT INTO t1 VALUES (2, NULL, NULL);

      INSERT INTO t1 VALUES (3, NULL, NULL);

      INSERT INTO t1 VALUES (4, NULL, 'Banana');

      INSERT INTO t1 VALUES (5, NULL, NULL);

      INSERT INTO t1 VALUES (6, NULL, NULL);

      INSERT INTO t1 VALUES (7, 'Category 2', 'Vanilla');

      INSERT INTO t1 VALUES (8, NULL, NULL);

      INSERT INTO t1 VALUES (9, 'Category 3', 'Strawberry');

      COMMIT;

      

      SELECT * FROM t1;

      

      SELECT row_num,

      LAST_VALUE(col1  IGNORE NULLS) OVER (ORDER BY row_num) col1,

      LAST_VALUE(col2  IGNORE NULLS) OVER (ORDER BY row_num) col2

      FROM t1

      ORDER BY row_num;

KURTOSIS_POP (new 20c)

The population kurtosis function is primarily used to determine the characteristics of outliers in a given distribution KURTOSIS_POP(<ALL | DISTINCT | UNIQUE>) <expression>) RETURN NUMBER;

TBD

KURTOSIS_SAMP (new 20c)

The sample kurtosis function is primarily used to determine the characteristics of outliers in a given distribution KURTOSIS_SAMP(<ALL | DISTINCT | UNIQUE>) <expression>) RETURN NUMBER;

TBD

LAG

Provides access to more than one row of a table at the same time without a self-join. Given a series of rows returned from a query and a position of the cursor, LAG provides access to a row at a given physical offset prior to that position. LAG(<value expression>, <offset>, <default>) [<RESPECT | IGNORE> NULLS] OVER ([<query partition clause>] <order_by_clause>);

conn oe/oe@pdbdev

      

      SELECT last_name, hire_date, salary,

      LAG(salary, 1, 0)  OVER (ORDER BY hire_date) AS PREV_SAL

      FROM employees

      WHERE job_id = 'PU_CLERK';

LAST

Returns the row ranked last using DENSE_RANK <aggregate function> KEEP (DENSE_RANK LAST ORDER BY (<expression> <ASC | DESC> NULLS <FIRST | LAST>)

See FIRST Demo above

LAST_VALUE

Returns the last value in an ordered set of values. If the last value in the set is null, returns NULL unless IGNORE NULLS is specified. LAST_VALUE (<expression> [<RESPECT | IGNORE> NULLS]) OVER (<analytic clause>);

conn oe/oe@pdbdev

      

      SELECT last_name, salary, hire_date, FIRST_VALUE(hire_date)

      OVER (ORDER BY salary ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS lv

      FROM (SELECT * FROM employees WHERE department_id = 90

      ORDER BY hire_date);

LEAD

LEAD provides access to a row at a given physical offset beyond that position LEAD(<expression, offset, default>) [(<query_partition_clause>)] [<RESPECT | IGNORE> NULLS] OVER (<order_by_clause>);

conn uwclass/uwclass@pdbdev

      

      SELECT submit_date, num_votes,

      LEAD(num_votes, 1, 0)  OVER (ORDER BY submit_date) AS NEXT_VAL

      FROM vote_count;

LISTAGG

Returns all of the employees in Department 30 in the hr.employees table, ordered by hire date and last name LISTAGG (measure_expression [, delimiter]) WITHIN GROUP (order by clause) [OVER query_partition_clause);

conn hr/hr@pdbdev

      

      SELECT last_name

      FROM employees

      WHERE department_id = 30

      ORDER BY hire_date, last_name;

      

      col emp_list format a60

      

      SELECT LISTAGG(last_name, '; ') WITHIN GROUP (ORDER BY hire_date, last_name) "Emp_list",

      MIN(hire_date) "Earliest"

      FROM employees

      WHERE department_id = 30;

      

      SELECT department_id, WM_CONCAT(last_name)

      FROM employees

      GROUP BY department_id;

      

      SELECT department_id "Dept.",

      LISTAGG(last_name, '; ') WITHIN GROUP (ORDER BY hire_date) "Employees"

      FROM employees

      GROUP BY department_id;

      

      conn scott/tiger@pdbdev

      

      SELECT LISTAGG(deptno || ',' || dname || ',' || loc || chr(10))

      WITHIN GROUP (ORDER BY 1) cl 

      FROM dept;

MATCH_RECOGNIZE

Supports pattern matching. Use this clause to recognize patterns in a sequence of rows in table, which is called the row pattern input table. The result of a query that uses the clause is the row pattern output table

Enables the following tasks:

Logically partition and order the data with the PARTITION BY and ORDER BY clauses
Define measures, which are expressions usable in other parts of the SQL query, in the MEASURES clause
Define patterns of rows to seek using the PATTERN clause. These patterns use regular expression syntax, a powerful and expressive feature, applied to the pattern variables you define
Specify the logical conditions required to map a row to a row pattern variable in the DEFINE clause

MATCH_RECOGNIZE([PARTITION BY columns] [ORDER BY columns]

      [MEASURES row_pattern_measures_columns] [<ONE ROW | ALL ROWS> PER MATCH] [row_pattern_skip_to] PATTERN (row_pattern) [row_pattern_subset_clause]

      DEFINE (row_pattern_definition_list));

-- this example from the Oracle 12.2 docs

      <query id="q"><![CDATA[

          SELECT

              T.firstW,

              T.lastZ

          FROM

              S2

          MATCH_RECOGNIZE (

              MEASURES

                  A.stockTick as firstW,

                  last(Z) as lastZ

              PATTERN(A W+ X+ Y+ Z+)

              DEFINE 

                  W as W.price < prev(W.price),

                  X as X.price > prev(X.price),

                  Y as Y.price < prev(Y.price),

                  Z as Z.price > prev(Z.price)

          ) as T

          WHERE

              S2.symbol = "oracle"

      ]]></query>

MAX

Returns the maximum value by partition MAX (<DISTINCT | ALL> expression) OVER (<analytic clause>)

conn oe/oe@pdbdev

      

      SELECT manager_id, last_name, salary

      FROM (

        SELECT manager_id, last_name, salary,

        MAX(salary)  OVER (PARTITION BY manager_id) AS rmax_sal

        FROM employees)

      WHERE salary = rmax_sal;

MIN

Returns the minimum value by partition MIN (<DISTINCT | ALL> expression) OVER (<analytic clause>)

conn oe/oe@pdbdev

      

      SELECT manager_id, last_name, salary

      FROM (

        SELECT manager_id, last_name, salary,

        MAX(salary)  OVER (PARTITION BY manager_id) AS rmax_sal

        FROM employees)

      WHERE salary = rmax_sal;

NTH_VALUE

Returns the measure expression value of the nth row in the window defined by the analytic_clause. The returned value has the data type of the measure expression. NTH_VALUE (measure_expression [, n]) [FROM <FIRST | LAST>] [<RESPECT | IGNORE> NULLS] OVER (<analytic clause>)

conn  sh/sh@pdbdev

      

      SELECT prod_id, channel_id, MIN(amount_sold),

      NTH_VALUE(MIN(amount_sold), 2) OVER (

      PARTITION BY prod_id ORDER BY channel_id ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) NV

      FROM sales

      WHERE prod_id BETWEEN 13 AND 16

      GROUP BY prod_id, channel_id;

NTILE

Divides an ordered data set into a number of buckets indicated by expr and assigns the appropriate bucket number to each row. The buckets are numbered 1 through expr. The expr value must resolve to a positive constant for each partition. NTILE (<expression>) OVER ([query_partition_clause] <order by clause>)

conn oe/oe@pdbdev

      

      SELECT last_name, salary,

      NTILE(4)  OVER (ORDER BY salary DESC) AS quartile

      FROM employees

      WHERE department_id = 100;

OVER PARTITION BY

This demo returns employees that are making above average salary in their respective department NTILE (<expression>) OVER ([query_partition_clause] <order by clause>)

conn hr/hr@pdbdev

      

      col ename format a30

      col department_name format a20

      

      SELECT * FROM (

        SELECT e.ffirst_name || ' ' || e.last_name ENAME, d.department_name,

        e.salary, TRUNC(e.salary - avg(e.salary) OVER (

        PARTITION BY e.department_id)) sal_dif

      FROM employees e, departments d

      WHERE e.department_id=d.department_id)

      WHERE sal_dif > 0

      ORDER BY 2,4 DESC;

PERCENT_RANK

For a row r, PERCENT_RANK calculates the rank of r minus 1, divided by 1 less than the number of rows being evaluated (the entire query result set or a partition). PERCENT_RANK(<value>) OVER (<partition_clause> <order_by_clause>) RETURN NUMBER;

conn oe/oe@pdbdev

      

      SELECT department_id, last_name, salary, PERCENT_RANK()

      OVER (PARTITION BY department_id  ORDER BY salary DESC) AS pr

      FROM employees

      ORDER BY pr, salary;

PERCENTILE_CONT

Inverse distribution function that assumes a continuous distribution model. It takes a percentile value and a sort specification, and returns an interpolated value that would fall into that percentile value with respect to the sort specification. Nulls are ignored in the calculation. PERCENTILE_CONT(<value>) WITHIN GROUP (ORDER BY <expression> [ASC | DESC]) OVER (<partition_clause>)

conn oe/oe@pdbdev

      

      SELECT last_name, salary, department_id,

      PERCENTILE_CONT(0.5)  WITHIN GROUP (ORDER BY salary DESC)

      OVER  (PARTITION BY department_id) PCT_CONT, PERCENT_RANK()

      OVER  (PARTITION BY department_id  ORDER BY salary DESC) PCT_RANK

      FROM employees

      WHERE department_id IN (30, 60);

PERCENTILE_DISC

An inverse distribution function that assumes a discrete distribution model. It takes a percentile value and a sort specification and returns an element from the set. Nulls are ignored in the calculation. PERCENTILE_DISC(<expression>) WITHIN GROUP (ORDER BY <order_by_clause>)

conn oe/oe@pdbdev

      

      col cume_dist format 9.999

      

      SELECT last_name, salary, department_id,

      PERCENTILE_DISC(0.5)  WITHIN GROUP (ORDER BY salary DESC)

      OVER  (PARTITION BY department_id) PCT_DISC,

      CUME_DIST()  OVER (PARTITION BY department_id

      ORDER BY salary DESC) CUME_DIST

      FROM employees

      WHERE department_id IN (30, 60);

PREDICTION

Returns the best prediction for the model. The datatype returned depends on the target value type used during the build of the model. For regression models, this function returns the expected value. PREDICTION(<OF ANOMALY | FOR <expression>> [<cost_matrix_clause>] <mining_attribute_clause>) OVER (<mining_analytic_clause>);

SELECT cust_id, age, pred_age, age-pred_age age_diff, pred_det

      FROM (

        SELECT cust_id, age, pred_age, pred_det,

        RANK() OVER (ORDER BY ABS(age-pred_age) desc) rnk

        FROM (

          SELECT cust_id, age, PREDICTION(FOR age USING *) OVER () pred_age,

            PREDICTION_DETAILS(FOR age ABS USING *) OVER () pred_det

          FROM mining_data_apply_v)

        )

      WHERE rnk <= 3;

PREDICTION_COST

Returns a measure of cost for a given prediction as an Oracle NUMBER PREDICTION_COST(<OF ANOMALY | FOR <expression>>[,<class>] <cost_matrix_clause> <mining_attribute_clause>) OVER (<mining_analytic_clause>);

SELECT cust_id

      FROM (

        SELECT cust_id,rank()

          OVER (ORDER BY PREDICTION_COST(DT_SH_Clas_sample, 1 COST MODEL USING *) ASC, cust_id) rnk

        FROM mining_data_apply_v

        WHERE country_name = 'Italy')

      WHERE rnk <= 10

      ORDER BY rnk;

PREDICTION_DETAILS

Returns an XML string containing model-specific information related to the scoring of the input row PREDICTION_DETAILS(<schema.model> <mining_attribute_clause>)

See PREDICTION Demo Above

PREDICTION_PROBABILITY

Returns the probability for a given prediction as an Oracle NUMBER PREDICTION_COST(<OF ANOMALY | FOR <expression>>[,<class>] <mining_attribute_clause>) OVER (<mining_analytic_clause>) RETURN BINARY_DOUBLE

TBD

PREDICTION_SET

Returns a varray of objects containing all classes in a multiclass classification scenario PREDICTION_COST(<OF ANOMALY | FOR <expression>>[,<bestN>] [,<cutoff>] [<cost_matrix_clause>] <mining_attribute_clause>) OVER (<mining_analytic_clause>) RETURN VARRAY;

TBD

RANK

Calculates the rank of a value in a group of values RANK(<value>) OVER (<partition_clause> ORDER BY <order_by_clause>) RETURN NUMBER;

conn oe/oe@pdbdev

      

      SELECT department_id, last_name, salary, commission_pct,

      RANK()  OVER (PARTITION BY department_id

      ORDER BY salary DESC, commission_pct) "Rank"

      FROM employees

      WHERE department_id = 80;

      

      /* The following query finds the 5 top-selling products for each product subcategory where that product contributes more than 20% of the sales within its product category. */

      conn sh/sh@pdbdev

      

      col categ format a15

      col prod_subcategory format a20

      

      SELECT SUBSTR(prod_category,1,8) AS CATEG, prod_subcategory, prod_id, sales

      FROM (

        SELECT p.prod_category, p.prod_subcategory, p.prod_id,

        SUM(amount_sold) as SALES, SUM(SUM(amount_sold))

        OVER (PARTITION BY p.prod_category) AS CAT_SALES,

        SUM(SUM(amount_sold))

        OVER (PARTITION BY p.prod_subcategory) AS SUBCAT_SALES,

        RANK() OVER
      (PARTITION BY p.prod_subcategory

        ORDER BY SUM(amount_sold)) AS RANK_IN_LINE

        FROM sales s, customers c, countries co, products p

        WHERE s.cust_id = c.cust_id

        AND c.country_id = co.country_id

        AND s.prod_id = p.prod_id

        AND s.time_id = TO_DATE('11-OCT-2000')

        GROUP BY p.prod_category, p.prod_subcategory, p.prod_id

        ORDER BY prod_category, prod_subcategory)

      WHERE SUBCAT_SALES > 0.2 * CAT_SALES

      AND RANK_IN_LINE<=5;

RATIO_TO_REPORT

Computes the ratio of a value to the sum of a set of values. If expr evaluates to null, then the ratio-to-report value also evaluates to null. RATIO_TO_REPORT(<value>) OVER (<partition_clause>)

conn oe/oe@pdbdev

      

      SELECT last_name, salary, RATIO_TO_REPORT(salary)  OVER () AS RR

      FROM employees

      WHERE job_id = 'PU_CLERK';

REGR_ (Linear Regression) Functions

Generic Syntax FUNCTION_NAME (<expression1>,<expression2>) OVER (<analytic_clause>)

REGR_AVGX conn oe/oe@pdbdev SELECT job_id, employee_id ID, salary, REGR_SLOPE(SYSDATE-hire_date, salary) OVER (PARTITION BY job_id) slope, REGR_INTERCEPT(SYSDATE-hire_date, salary) OVER (PARTITION BY job_id) intcpt, REGR_R2(SYSDATE-hire_date, salary) OVER (PARTITION BY job_id) rsqr, REGR_COUNT(SYSDATE-hire_date, salary) OVER (PARTITION BY job_id) count, REGR_AVGX(SYSDATE-hire_date, salary) OVER (PARTITION BY job_id) avgx, REGR_AVGY(SYSDATE-hire_date, salary) OVER (PARTITION BY job_id) avgy FROM employees WHERE department_id in (50, 80) ORDER BY job_id, employee_id;

REGR_AVGY -- also see REGR_AVGX Demo conn oe/oe@pdbdev SELECT job_id, REGR_AVGY(SYSDATE - hire_date, salary) avgy, REGR_AVGX(SYSDATE - hire_date, salary) avgx FROM employees WHERE department_id in (30, 50) GROUP BY job_id;

REGR_COUNT -- also see REGR_AVGX Demo conn oe/oe@pdbdev SELECT job_id, REGR_COUNT(SYSDATE-hire_date, salary) count FROM employees WHERE department_id in (30, 50) GROUP BY job_id;

REGR_INTERCEPT -- also see REGR_AVGX Demo conn oe/oe@pdbdev SELECT job_id, REGR_SLOPE(SYSDATE - hire_date, salary) AS SLOPE, REGR_INTERCEPT(SYSDATE - hire_date, salary) intercept FROM employees WHERE department_id in (50,80) GROUP BY job_id ORDER BY job_id;

REGR_R2 -- also see REGR_AVGX Demo conn oe/oe@pdbdev SELECT job_id, REGR_R2(SYSDATE-hire_date, salary) Regr_R2 FROM employees WHERE department_id IN (50, 80) GROUP by job_id;

REGR_SLOPE See REGR_AVGX Demo See REGR_INTERCEPT Demo

REGR_SXX -- also see REGR_AVGX Demo conn oe/oe@pdbdev SELECT job_id, REGR_SXY(SYSDATE - hire_date, salary) regr_sxy, REGR_SXX(SYSDATE - hire_date, salary) regr_sxx, REGR_SYY(SYSDATE - hire_date, salary) regr_syy FROM employees WHERE department_id in (50, 80) GROUP BY job_id ORDER BY job_id;

REGR_SXY See REGR_SXX Demo

REGR_SYY See REGR_SXX Demo

ROW_NUMBER

Assigns a unique number to each row to which it is applied (either each row in the partition or each row returned by the query), in the ordered sequence of rows specified in the order by clause, beginning with 1. ROW_NUMBER(<value>) OVER (<partition_clause> ORDER BY <order_by_clause>)

conn uwclass/uwclass@pdbdev

      

      CREATE TABLE test (

      id      NUMBER(1),

      degrees NUMBER(3));

      

      INSERT INTO test VALUES (0,235);

      INSERT INTO test VALUES (0,276);

      INSERT INTO test VALUES (1,211);

      INSERT INTO test VALUES (1,250);

      INSERT INTO test VALUES (1,255);

      INSERT INTO test VALUES (2,55);

      INSERT INTO test VALUES (2,277);

      INSERT INTO test VALUES (2,69);

      INSERT INTO test VALUES (3,25);

      INSERT INTO test VALUES (3,166);

      INSERT INTO test VALUES (3,262);

      INSERT INTO test VALUES (4,47);

      INSERT INTO test VALUES (4,238);

      INSERT INTO test VALUES (4,40);

      COMMIT;

      

      SELECT * FROM test;

      

       -- choose the starting cell

      SELECT id, degrees s

      FROM (

        SELECT id, degrees, (360 - degrees) d360,

        ROW_NUMBER()  OVER(PARTITION BY id

        ORDER BY CASE

          WHEN (degrees < 360 - degrees) THEN degrees

          ELSE 360 - degrees

          END) rn

        FROM test) t

      WHERE rn = 1;

      

       -- order the rest clockwise

      SELECT *

      FROM (

        SELECT t.id, t.degrees,

        ROW_NUMBER() OVER(PARTITION BY t.id

        ORDER BY CASE

          WHEN (t.degrees < starting_cell.degrees) THEN t.degrees + 360

          ELSE t.degrees

          END) rn

        FROM test t

      JOIN (

        SELECT id, degrees, (360 - degrees) d360,

        ROW_NUMBER() OVER(PARTITION BY id

        ORDER BY CASE

          WHEN (degrees < 360 - degrees) THEN degrees

          ELSE 360 - degrees

          END) rn

        FROM test)  starting_cell

        ON t.id = starting_cell.id

        WHERE starting_cell.rn=1)t

      ORDER BY id, rn;

SKEWNESS_POP (new 20c)

An aggregate function that is primarily used to determine symmetry in a given distribution SKEWNESS_POP(<ALL | DISTINCT | UNIQUE>) <expression>) RETURN NUMBER;

TBD

SKEWNESS_SAMP (new 20c)

An aggregate function that is primarily used to determine symmetry in a given distribution SKEWNESS_SAMP(<ALL | DISTINCT | UNIQUE>) <expression>) RETURN NUMBER;

TBD

STDDEV

Returns the sample standard deviation of an expression STDDEV([DISTINCT | ALL] <expression>) OVER (<analytic_clause>)

conn oe/oe@pdbdev

      

      col stddev format 99999.999

      

      SELECT last_name, salary,

      STDDEV(salary) OVER (ORDER BY hire_date) "StdDev"

      FROM employees

      WHERE department_id = 30;

STDDEV_POP

Computes the population standard deviation and returns the square root of the population variance STDDEV_POP(<expression>) OVER (<analytic_clause>)

conn oe/oe@pdbdev

      

      SELECT department_id, last_name, salary,

      STDDEV_POP(salary)  OVER (PARTITION BY department_id) AS pop_std

      FROM employees;

STDDEV_SAMP

Computes the cumulative sample standard deviation and returns the square root of the sample variance STDDEV_SAMP(<expression>) OVER (<analytic_clause>)

conn oe/oe@pdbdev

      

      SELECT department_id, last_name, hire_date, salary,

      STDDEV_SAMP(salary)  OVER (PARTITION BY department_id

      ORDER BY hire_date

      ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS cum_sdev

      FROM employees;

SUM

Computes the cumulative sample running sum

Corrected thanks to a note from Mette Stephansen in Denmark. SUM(<expression>) OVER (<analytic_clause>)

conn oe/oe@pdbdev

      

      CREATE TABLE vote_count (

      submit_date  DATE NOT NULL,

      num_votes    NUMBER NOT NULL);

      

      INSERT INTO vote_count VALUES (TRUNC(SYSDATE)-4, 100);

      INSERT INTO vote_count VALUES (TRUNC(SYSDATE)-3, 150);

      INSERT INTO vote_count VALUES (TRUNC(SYSDATE)-2, 75);

      INSERT INTO vote_count VALUES (TRUNC(SYSDATE)-3, 25);

      INSERT INTO vote_count VALUES (TRUNC(SYSDATE)-1, 50);

      COMMIT;

      

      SELECT * FROM vote_count;

      

      SELECT submit_date, num_votes, SUM(num_votes)

      OVER(ORDER BY submit_date  ROWS UNBOUNDED PRECEDING) TOT_VOTE

      FROM vote_count

      ORDER BY submit_date;

VAR_POP

Returns the population variance of a set of numbers VAR_POP(<value>) OVER (<analytic_clause>)

conn sh/sh@pdbdev

      

      SELECT t.calendar_month_desc,
      VAR_POP(SUM(s.amount_sold))

      OVER (ORDER BY t.calendar_month_desc) "Var_Pop",

      VAR_SAMP(SUM(s.amount_sold))

      OVER (ORDER BY t.calendar_month_desc) "Var_Samp"

      FROM sales s, times t

      WHERE s.time_id = t.time_id AND t.calendar_year = 2001

      GROUP BY t.calendar_month_desc;

VAR_SAMP

Returns the sample variance of a set of numbers VAR_SAMP(<value>) OVER (<analytic_clause>)

See VAR_POP Demo above

VARIANCE

Returns the variance of an expression VARIANCE([DISTINCT | ALL] <value>) OVER (<analytic_clause>)

conn oe/oe@pdbdev

      

      SELECT last_name, salary,

      VARIANCE(salary) OVER (ORDER BY hire_date) AS VARIANCE

      FROM employees

      WHERE department_id = 30;

Additional Demos

This one written by Maxim Demenko conn uwclass/uwclass@pdbdev CREATE TABLE test(id, quality, weight) AS SELECT 1,'A',10 FROM dual UNION ALL SELECT 2,'A',11 FROM dual UNION ALL SELECT 3,'A',12 FROM dual UNION ALL SELECT 4,'B',11 FROM dual UNION ALL SELECT 5,'B',19 FROM dual UNION ALL SELECT 6,'A',9 FROM dual UNION ALL SELECT 7,'A',14 FROM dual UNION ALL SELECT 8,'C',4 FROM dual UNION ALL SELECT 9,'C',7 FROM dual; SELECT * FROM test; SELECT MAX(id) ID, MAX(quality) QUALITY, SUM(weight) WEIGHT FROM ( SELECT id, quality, weight, SUM(new_seq) OVER (ORDER BY id) new_grp FROM ( SELECT id, quality, weight, DECODE(LAG(quality) OVER (ORDER BY id), quality, 0, id) new_seq FROM test)) GROUP BY new_grp ORDER BY 1;

This one written by Rene Nyffenegger

It returns all employees that are making above average salary in their respective department. conn hr/hr@pdbdev col ename format a30 col department_name format a20 SELECT * FROM ( SELECT e.first_name || ' ' || e.last_name ENAME, d.department_name, e.salary, TRUNC(e.salary - avg(e.salary) OVER (PARTITION BY e.department_id)) sal_dif FROM employees e, departments d WHERE e.department_id = d.department_id) WHERE sal_dif > 0 ORDER BY 2,4 DESC;