Package edu.caltech.nanodb.queryeval
Class SelectivityEstimator
- java.lang.Object
-
- edu.caltech.nanodb.queryeval.SelectivityEstimator
-
public class SelectivityEstimator extends java.lang.Object
This utility class is used to estimate the selectivity of predicates that appear on Select and Theta-Join plan-nodes.
-
-
Field Summary
Fields Modifier and Type Field Description static float
DEFAULT_SELECTIVITY
This constant specifies the default selectivity assumed when a select predicate is too complicated to compute more accurate estimates.private static org.apache.logging.log4j.Logger
logger
A logging object for reporting anything interesting that happens.private static java.util.HashSet<SQLDataType>
SUPPORTED_TYPES_COMPARE_ESTIMATES
This collection specifies the data-types that support comparison selectivity estimates (not including equals or not-equals).
-
Constructor Summary
Constructors Modifier Constructor Description private
SelectivityEstimator()
This class should not be instantiated.
-
Method Summary
All Methods Static Methods Concrete Methods Modifier and Type Method Description static float
computeRatio(java.lang.Object low1, java.lang.Object high1, java.lang.Object low2, java.lang.Object high2)
This method computes the function (high1 - low1) / (high2 - low2), given Object-values that can be coerced into types that can be used for arithmetic.static float
estimateBoolOperSelectivity(BooleanOperator bool, Schema exprSchema, java.util.ArrayList<ColumnStats> stats)
This function computes a selectivity estimate for a general Boolean expression that may be comprised of one or more components.private static float
estimateCompareColumnColumn(CompareOperator.Type compType, ColumnValue columnOne, ColumnValue columnTwo, Schema exprSchema, java.util.ArrayList<ColumnStats> stats)
This helper function computes a selectivity estimate for a comparison between two columns.private static float
estimateCompareColumnValue(CompareOperator.Type compType, ColumnValue columnValue, LiteralValue literalValue, Schema exprSchema, java.util.ArrayList<ColumnStats> stats)
This helper function computes a selectivity estimate for a comparison between a column and a literal value.static float
estimateCompareSelectivity(CompareOperator comp, Schema exprSchema, java.util.ArrayList<ColumnStats> stats)
This function computes a selectivity estimate for a general comparison operation.static float
estimateSelectivity(Expression expr, Schema exprSchema, java.util.ArrayList<ColumnStats> stats)
This function computes the selectivity of a selection predicate, using table statistics and other estimates to make an educated guess.static boolean
typeSupportsCompareEstimates(SQLDataType type)
Returns true if the database supports selectivity estimates for comparisons (other than equals and not-equals) on the specified SQL data type.
-
-
-
Field Detail
-
logger
private static org.apache.logging.log4j.Logger logger
A logging object for reporting anything interesting that happens.
-
SUPPORTED_TYPES_COMPARE_ESTIMATES
private static java.util.HashSet<SQLDataType> SUPPORTED_TYPES_COMPARE_ESTIMATES
This collection specifies the data-types that support comparison selectivity estimates (not including equals or not-equals). It must be possible to use thecomputeRatio(java.lang.Object, java.lang.Object, java.lang.Object, java.lang.Object)
on the data-type so that an estimate can be made about where a value fits within the minimum and maximum values for the column.Note that we can compute selectivity for equals and not-equals simply from the number of distinct values in the column.
-
DEFAULT_SELECTIVITY
public static final float DEFAULT_SELECTIVITY
This constant specifies the default selectivity assumed when a select predicate is too complicated to compute more accurate estimates. We are assuming that generally people are going to do things that limit the results produced.- See Also:
- Constant Field Values
-
-
Method Detail
-
typeSupportsCompareEstimates
public static boolean typeSupportsCompareEstimates(SQLDataType type)
Returns true if the database supports selectivity estimates for comparisons (other than equals and not-equals) on the specified SQL data type. SQL types that support these selectivity estimates will include minimum and maximum values in their column-statistics.- Parameters:
type
- the SQL data type being considered- Returns:
- true if the database supports selectivity estimates for the type
-
estimateSelectivity
public static float estimateSelectivity(Expression expr, Schema exprSchema, java.util.ArrayList<ColumnStats> stats)
This function computes the selectivity of a selection predicate, using table statistics and other estimates to make an educated guess. The result is between 0.0 and 1.0, with 1.0 meaning that all rows will be selected by the predicate.- Parameters:
expr
- the expression whose selectivity we are estimatingexprSchema
- a schema describing the environment that the expression will be evaluated withinstats
- statistics that may be helpful in estimating the selectivity- Returns:
- the estimated selectivity as a float
-
estimateBoolOperSelectivity
public static float estimateBoolOperSelectivity(BooleanOperator bool, Schema exprSchema, java.util.ArrayList<ColumnStats> stats)
This function computes a selectivity estimate for a general Boolean expression that may be comprised of one or more components. The method treats components as independent, estimating the selectivity of each one separately, and then combines the results based on whether the Boolean operation is an AND, an OR, or a NOT operation. As one might expect, this method delegates toestimateSelectivity(edu.caltech.nanodb.expressions.Expression, edu.caltech.nanodb.relations.Schema, java.util.ArrayList<edu.caltech.nanodb.queryeval.ColumnStats>)
to compute the selectivity of individual terms.- Parameters:
bool
- the compound Boolean expressionexprSchema
- a schema specifying the environment that the expression will be evaluated withinstats
- a collection of column-statistics to use in making selectivity estimates- Returns:
- a selectivity estimate in the range [0, 1].
-
estimateCompareSelectivity
public static float estimateCompareSelectivity(CompareOperator comp, Schema exprSchema, java.util.ArrayList<ColumnStats> stats)
This function computes a selectivity estimate for a general comparison operation. The method examines the types of the arguments in the comparison and determines if it will be possible to make a reasonable guess as to the comparison's selectivity; if not then a default selectivity estimate is used.- Parameters:
comp
- the comparison expressionexprSchema
- a schema specifying the environment that the expression will be evaluated withinstats
- a collection of column-statistics to use in making selectivity estimates- Returns:
- a selectivity estimate in the range [0, 1].
-
estimateCompareColumnValue
private static float estimateCompareColumnValue(CompareOperator.Type compType, ColumnValue columnValue, LiteralValue literalValue, Schema exprSchema, java.util.ArrayList<ColumnStats> stats)
This helper function computes a selectivity estimate for a comparison between a column and a literal value. Note that the comparison is always assumed to have the column-name on the left, and the literal value on the right. Examples would be T1.A > 5, or T2.C = 15.- Parameters:
compType
- the type of the comparison, e.g. equals, not-equals, or some inequality comparisoncolumnValue
- the column that is used in the comparisonliteralValue
- the value that the column is being compared toexprSchema
- a schema specifying the environment that the expression will be evaluated withinstats
- a collection of column-statistics to use in making selectivity estimates- Returns:
- a selectivity estimate in the range [0, 1].
-
estimateCompareColumnColumn
private static float estimateCompareColumnColumn(CompareOperator.Type compType, ColumnValue columnOne, ColumnValue columnTwo, Schema exprSchema, java.util.ArrayList<ColumnStats> stats)
This helper function computes a selectivity estimate for a comparison between two columns. Examples would be T1.A = T2.A.- Parameters:
compType
- the type of the comparison, e.g. equals, not-equals, or some inequality comparisoncolumnOne
- the first column that is used in the comparisoncolumnTwo
- the second column that is used in the comparisonexprSchema
- a schema specifying the environment that the expression will be evaluated withinstats
- a collection of column-statistics to use in making selectivity estimates- Returns:
- a selectivity estimate in the range [0, 1].
-
computeRatio
public static float computeRatio(java.lang.Object low1, java.lang.Object high1, java.lang.Object low2, java.lang.Object high2)
This method computes the function (high1 - low1) / (high2 - low2), given Object-values that can be coerced into types that can be used for arithmetic. This operation is useful for estimating the selectivity of comparison operations, if we know the minimum and maximum values for a column.The result of this operation is clamped to the range [0, 1].
- Parameters:
low1
- the low value for the numeratorhigh1
- the high value for the numeratorlow2
- the low value for the denominatorhigh2
- the high value for the denominator- Returns:
- the ratio of (high1 - low1) / (high2 - low2), clamped to the range [0, 1].
-
-