Oracle Data Mining Functions
Version 21c

General Information
Library Note Morgan's Library Page Header
ACE Director Alum Daniel Morgan, founder of Morgan's Library, is scheduling complimentary technical Workshops on Database Security for the first 30 Oracle Database customers located anywhere in North America, EMEA, LATAM, or APAC that send an email to asra_us@oracle.com. Request a Workshop for your organization today.
The data mining functions operate on models that have been built using the DBMS_DATA_MINING package or the Oracle Data Mining Java API.

For a close to complete list of Oracle built-in functions and demos in the library, both stand-alone and in built-in packages: [ Click Here ].
System Privileges
ALTER ANY MINING MODEL CREATE ANY MINING MODEL DROP ANY MINING MODEL
COMMENT ANY MINING MODEL CREATE MINING MODEL SELECT ANY MINING MODEL
 
CLUSTER_DETAILS
Predicts cluster membership for each row. It can use a pre-defined clustering model or perform dynamic clustering. Returns an XML string that describes the predicted cluster or a specified cluster. CLUSTER_DETAILS ([schema.]<model> [,<cluster_id>[,<topN>]] [<ABS | ASC | DESC>] USING <mining_attribute_clause>)
SELECT s.cluster_id, probability prob,
  cluster_details
(em_sh_clus_sample, s.cluster_id, 5 using t.*) det
FROM (
  SELECT v.*, CLUSTER_SET(em_sh_clus_sample, NULL, 0.2 using *) pset
  FROM mining_data_apply_v v
  WHERE cust_id = 100955) t,
  TABLE(t.pset) s
ORDER BY 2 DESC;
 
CLUSTER_DISTANCE
Predicts cluster membership for each row. It can use a pre-defined clustering model or perform dynamic clustering. Returns the raw distance between each row and the centroid of either the predicted cluster or a specified. CLUSTER_DISTANCE([schema.]<model> [,<cluster_id>] USING <mining_attribute_clause>)
SELECT cust_id
FROM (
SELECT cust_id,
rank() OVER
(order by Ccluster_distance(km_sh_clus_sample using *) DESC) rnk
FROM mining_data_apply_v)
WHERE rnk <= 11
ORDER BY rnk;
 
CLUSTER_ID
Returns the cluster identifier of the predicted cluster with the highest probability for the set of predictors specified in the mining_attribute_clause CLUSTER_ID(<schema.model> <mining_attribute_clause>)
SELECT cluster_id(km_sh_clus_sample using *) AS clus, COUNT(*) AS cnt
FROM km_sh_sample_apply_prepared
GROUP BY cluster_id(km_sh_clus_sample using *)
ORDER BY cnt DESC;
 
CLUSTER_PROBABILITY
Returns a measure of the degree of confidence of membership of an input row in a cluster associated with the specified model CLUSTER_PROBABILITY(<schema.model>, <cluster_id> <mining_attribute_clause>)
SELECT *
FROM (
  SELECT cust_id, cluster_probability(km_sh_clus_sample, 2 using *) prob
  FROM km_sh_sample_apply_prepared
  ORDER BY prob DESC)
WHERE ROWNUM < 11;
 
CLUSTER_SET
Returns a varray of objects containing all possible clusters that a given row belongs to. Each object in the varray is a pair of scalar values containing the cluster ID and the cluster probability. The object fields are named CLUSTER_ID and PROBABILITY, and both are Oracle NUMBER CLUSTER_SET(<schema.model>, <top N>, <cutoff>
<mining_attribute_clause>);
WITH clus_tab AS (
  SELECT id, a.attribute_name aname, a.conditional_operator op,
  NVL(A.attribute_str_value,
  ROUND(DECODE(a.attribute_name, n.col,
               a.attribute_num_value * N.scale + N.shift,
               a.attribute_num_value),4)) val,
               a.attribute_support support,
               a.attribute_confidence confidence
FROM TABLE(dbms_data_mining.get_model_details_km('km_sh_clus_sample')) t,
     TABLE(t.rule.antecedent) a, km_sh_sample_norm n
WHERE a.attribute_name = n.col(+)
AND a.attribute_confidence > 0.55), clust AS (
  SELECT id, CAST(COLLECT(Cattr(aname, op, TO_CHAR(val), support, confidence)) AS cattrs) cl_attrs
  FROM clus_tab
  GROUP BY id), custclus AS (
    SELECT T.cust_id, S.cluster_id, S.probability
    FROM (
      SELECT cust_id, cluster_set(km_sh_clus_sample, NULL, 0.2 using *) pset
      FROM km_sh_sample_apply_prepared
      WHERE cust_id = 101362) t,
      TABLE(t.pset) s)
    SELECT a.probability prob, a.cluster_id cl_id, b.attr, b.op, b.val, b.supp, b.conf
    FROM custclus a, (
      SELECT t.id, c.*
      FROM clust t, TABLE(t.cl_attrs) c) b
      WHERE a.cluster_id = B.id
      ORDER BY prob DESC, cl_id ASC, conf DESC, attr ASC, val ASC;
 
CUBE_TABLE
Extracts data from a cube or dimension and returns it in the two-dimensional format of a relational table, which can be used by SQL-based applications CUBE_TABLE('<schema_name>.<cube_name> [<HIERARCHY | HRR> <DIMENSION> <HIERARCHY>');

CUBE_TABLE('<schema_name>.<dimension_name> [<HIERARCHY | HRR> [<DIMENSION>] <HIERARCHY>');
conn global/global

SELECT * FROM TABLE(cube_table('global.channel'));

-- the docs claim this works ... but from my experience it does not

SQL> SELECT * FROM TABLE(CUBE_TABLE('global.units_cube HIERARCHY customer market_rollup HIERARCHY time calendar'));
SELECT * FROM TABLE(CUBE_TABLE('global.units_cube HIERARCHY customer market_rollup HIERARCHY time ca
*
ERROR at line 1:
ORA-36842: Hierarchy GLOBAL.CUSTOMER.MARKET_ROLLUP was not found


-- the error has been reported to the tech writer
 
FEATURE_COMPARE
Uses a Feature Extraction model to compare two different documents, including short ones such as keyword phrases or two attribute lists, for similarity or dissimilarity. Can be used with Feature Extraction algorithms such as Singular Value Decomposition (SVD), Principal Component Analysis PCA), Non-Negative Matrix Factorization (NMF), and Explicit Semantic Analysis (ESA). FEATURE_COMPARE([<schema_name>.]<model><mining_attribute_clause>
AND <mining_attribute_clause>
USING '<document>'
AND USING '<document>';
SELECT 1-feature_compare(esa_wiki_mod
using 'There are several PGA tour golfers from South Africa' text
AND using 'Nick Price won the 2002 Mastercard Colonial Open' text) similarity
FROM dual;

SIMILARITY
----------
      .258
 
FEATURE_DETAILS
Predicts feature matches for each row. Can use a pre-defined feature extraction model or perform dynamic feature extraction. Returns an XML string that describes the predicted feature or a specified feature. FEATURE_DETAILS(<schema.model><model>[feature_id[,topN]][<ABS | ACS | DESC>]<mining_attribute_clause>
TBD
 
FEATURE_ID
Returns an Oracle NUMBER that is the identifier of the feature with the highest coefficient value FEATURE_ID(<schema.model> <mining_attribute_clause>)
SELECT feature_id(nmf_sh_sample using *) AS feat, COUNT(*) AS cnt
FROM nmf_sh_sample_apply_prepared
GROUP BY feature_id(nmf_sh_sample using *)
ORDER BY cnt DESC;
 
FEATURE_SET
Returns a varray of objects containing all possible features. Each object in the varray is a pair of scalar values containing the feature ID and the feature value. The object fields are named FEATURE_ID and VALUE, and both are Oracle NUMBERs. FEATURE_SET(<schema.model>, <top N>, <cutoff>
<mining_attribute_clause>)
WITH feat_tab AS (
  SELECT f.feature_id fid,
       a.attribute_name attr,
       TO_CHAR(a.attribute_value) val,
       a.coefficient coeff
  FROM TABLE(DBMS_DATA_MINING.GET_MODEL_DETAILS_NMF('nmf_sh_sample')) f,
       TABLE(F.attribute_set) a
  WHERE A.coefficient > 0.25),
     feat AS (
  SELECT fid,
       CAST(COLLECT(Featattr(attr, val, coeff)) AS FeatAttrs) f_attrs
  FROM feat_tab
  GROUP BY fid),
    cust_10_features AS (
  SELECT t.cust_id, s.feature_id, s.value
  FROM (SELECT cust_id, feature_set(nmf_sh_sample, 10 using *) pset
        FROM nmf_sh_sample_apply_prepared
        WHERE cust_id = 100002) t, TABLE(T.pset) s)
SELECT a.value, A.feature_id fid, b.attr, b.val, b.coeff
FROM cust_10_features A,
  (SELECT t.fid, f.* FROM feat t, TABLE(t.f_attrs) f) b
WHERE A.feature_id = B.fid
ORDER BY a.value DESC, a.feature_id ASC, coeff DESC, attr ASC, val ASC;
 
FEATURE_VALUE
Returns the value of a given feature. If you omit the feature_id argument, then the function returns the highest feature value. You can use this form in conjunction with the FEATURE_ID function to obtain the largest feature/value combo. FEATURE_VALUE(<schema.model>,<feature_id><mining_attribute_clause>)
SELECT *
FROM (
  SELECT cust_id, feature_value(nmf_sh_sample,3 using *) match_qual
  FROM nmf_sh_sample_apply_prepared
  ORDER BY match_quality DESC)
WHERE ROWNUM < 11;
 
ORA_DM_PARTITION_NAME
A single row function, returns the name of the partition associated with the input row. When used on a non-partitioned model, the result is NULL. ORA_DM_PARTITION_NAME(<schema.model> <mining_attribute_clause>)
USING <using_clause>
SELECT prediction(mymodel using *) pred,
ora_dm_partition_name
(mymodel using *) pname
FROM customers;
 
PREDICTION
Returns the best prediction for the model. The datatype returned depends on the target value type used during the build of the model. For regression models, this function returns the expected value. PREDICTION(<schema.model>, <cost_matrix_clause>
<mining_attribute_clause>)
SELECT cust_gender, COUNT(*) AS cnt, ROUND(AVG(age)) AS avg_age
FROM mining_data_apply_v
WHERE prediction(DT_SH_Clas_sample COST MODEL
using cust_marital_status, education, household_size) = 1
GROUP BY cust_gender
ORDER BY cust_gender;
 
PREDICTION_BOUNDS
For use only with generalized linear models. It returns an object with two NUMBER fields LOWER and UPPER PREDICTION_BOUNDS(<schema.model>, <confidence_level>, <class_value>
USING <* | schema.table.* | expression AS alias>
SELECT cust_marital_status, COUNT(cust_id) AS CUST_COUNT
FROM (
  SELECT cust_id, cust_marital_status
  FROM mining_data_apply_v
  WHERE prediction_bounds(glmr_sh_regr_sample,0.98 using *).LOWER > 24
  AND prediction_boundsS(glmr_sh_regr_sample,0.98 using *).UPPER < 46)
GROUP BY cust_marital_status;
 
PREDICTION_COST
Returns a measure of cost for a given prediction as an Oracle NUMBER PREDICTION_COST(<schema.model>, <class> <cost_matrix_clause>
<mining_attribute_clause>)
WITH cust_italy AS (
  SELECT cust_id
  FROM mining_data_apply_v
  WHERE country_name = 'Italy'
  ORDER BY prediction_cost(DT_SH_Clas_sample,1 COST MODEL using *) ASC,1)
SELECT cust_id
FROM cust_italy
WHERE rownum < 11;
 
PREDICTION_DETAILS
Returns an XML string containing model-specific information related to the scoring of the input row PREDICTION_DETAILS(<schema.model> <mining_attribute_clause>)
SELECT cust_id, education,
prediction_details(DT_SH_Clas_sample using *) treenode
FROM mining_data_apply_v
WHERE occupation = 'TechSup' AND age < 25
ORDER BY cust_id;
 
PREDICTION_PROBABILITY
Returns the probability for a given prediction as an Oracle NUMBER PREDICTION_PROBABILITY(<schema.model> <class>
<mining_attribute_clause>)
SELECT cust_id
FROM (
  SELECT cust_id
  FROM mining_data_apply_v
  WHERE country_name = 'Italy'
  ORDER BY predicton_probability(DT_SH_Clas_sample, 1 using *)
  DESC, cust_id)
WHERE rownum < 11;
 
PREDICTION_SET
Returns a varray of objects containing all classes in a multi-class classification scenario PREDICTION_SET(<schema.model>, <best N>, <cutoff>
<cost_matrix_clause> <mining_attribute_clause>)
SELECT T.cust_id, S.prediction, S.probability, S.cost
FROM (
  SELECT cust_id,
 prediction_set(dt_sh_clas_sample COST MODEL using *) pset
  FROM mining_data_apply_v
  WHERE cust_id < 100011) T,
TABLE(T.pset) S
ORDER BY cust_id, S.prediction;

Related Topics
All Functions
Analytic Functions
Cast
Character Functions
Collection Functions
Conversion Functions
Date Functions
Miscellaneous Functions
Nested Table Functions
Numeric Functions
OLAP Functions
String Functions
Timestamp Functions
XML Functions
What's New In 21c
What's New In 23c

Morgan's Library Page Footer
This site is maintained by Dan Morgan. Last Updated: This site is protected by copyright and trademark laws under U.S. and International law. © 1998-2023 Daniel A. Morgan All Rights Reserved
  DBSecWorx