Class SelectivityEstimator


  • public class SelectivityEstimator
    extends java.lang.Object
    This utility class is used to estimate the selectivity of predicates that appear on Select and Theta-Join plan-nodes.
    • Field Summary

      Fields 
      Modifier and Type Field Description
      static float DEFAULT_SELECTIVITY
      This constant specifies the default selectivity assumed when a select predicate is too complicated to compute more accurate estimates.
      private static org.apache.logging.log4j.Logger logger
      A logging object for reporting anything interesting that happens.
      private static java.util.HashSet<SQLDataType> SUPPORTED_TYPES_COMPARE_ESTIMATES
      This collection specifies the data-types that support comparison selectivity estimates (not including equals or not-equals).
    • Constructor Summary

      Constructors 
      Modifier Constructor Description
      private SelectivityEstimator()
      This class should not be instantiated.
    • Field Detail

      • logger

        private static org.apache.logging.log4j.Logger logger
        A logging object for reporting anything interesting that happens.
      • SUPPORTED_TYPES_COMPARE_ESTIMATES

        private static java.util.HashSet<SQLDataType> SUPPORTED_TYPES_COMPARE_ESTIMATES
        This collection specifies the data-types that support comparison selectivity estimates (not including equals or not-equals). It must be possible to use the computeRatio(java.lang.Object, java.lang.Object, java.lang.Object, java.lang.Object) on the data-type so that an estimate can be made about where a value fits within the minimum and maximum values for the column.

        Note that we can compute selectivity for equals and not-equals simply from the number of distinct values in the column.

      • DEFAULT_SELECTIVITY

        public static final float DEFAULT_SELECTIVITY
        This constant specifies the default selectivity assumed when a select predicate is too complicated to compute more accurate estimates. We are assuming that generally people are going to do things that limit the results produced.
        See Also:
        Constant Field Values
    • Constructor Detail

      • SelectivityEstimator

        private SelectivityEstimator()
        This class should not be instantiated.
    • Method Detail

      • typeSupportsCompareEstimates

        public static boolean typeSupportsCompareEstimates​(SQLDataType type)
        Returns true if the database supports selectivity estimates for comparisons (other than equals and not-equals) on the specified SQL data type. SQL types that support these selectivity estimates will include minimum and maximum values in their column-statistics.
        Parameters:
        type - the SQL data type being considered
        Returns:
        true if the database supports selectivity estimates for the type
      • estimateSelectivity

        public static float estimateSelectivity​(Expression expr,
                                                Schema exprSchema,
                                                java.util.ArrayList<ColumnStats> stats)
        This function computes the selectivity of a selection predicate, using table statistics and other estimates to make an educated guess. The result is between 0.0 and 1.0, with 1.0 meaning that all rows will be selected by the predicate.
        Parameters:
        expr - the expression whose selectivity we are estimating
        exprSchema - a schema describing the environment that the expression will be evaluated within
        stats - statistics that may be helpful in estimating the selectivity
        Returns:
        the estimated selectivity as a float
      • estimateBoolOperSelectivity

        public static float estimateBoolOperSelectivity​(BooleanOperator bool,
                                                        Schema exprSchema,
                                                        java.util.ArrayList<ColumnStats> stats)
        This function computes a selectivity estimate for a general Boolean expression that may be comprised of one or more components. The method treats components as independent, estimating the selectivity of each one separately, and then combines the results based on whether the Boolean operation is an AND, an OR, or a NOT operation. As one might expect, this method delegates to estimateSelectivity(edu.caltech.nanodb.expressions.Expression, edu.caltech.nanodb.relations.Schema, java.util.ArrayList<edu.caltech.nanodb.queryeval.ColumnStats>) to compute the selectivity of individual terms.
        Parameters:
        bool - the compound Boolean expression
        exprSchema - a schema specifying the environment that the expression will be evaluated within
        stats - a collection of column-statistics to use in making selectivity estimates
        Returns:
        a selectivity estimate in the range [0, 1].
      • estimateCompareSelectivity

        public static float estimateCompareSelectivity​(CompareOperator comp,
                                                       Schema exprSchema,
                                                       java.util.ArrayList<ColumnStats> stats)
        This function computes a selectivity estimate for a general comparison operation. The method examines the types of the arguments in the comparison and determines if it will be possible to make a reasonable guess as to the comparison's selectivity; if not then a default selectivity estimate is used.
        Parameters:
        comp - the comparison expression
        exprSchema - a schema specifying the environment that the expression will be evaluated within
        stats - a collection of column-statistics to use in making selectivity estimates
        Returns:
        a selectivity estimate in the range [0, 1].
      • estimateCompareColumnValue

        private static float estimateCompareColumnValue​(CompareOperator.Type compType,
                                                        ColumnValue columnValue,
                                                        LiteralValue literalValue,
                                                        Schema exprSchema,
                                                        java.util.ArrayList<ColumnStats> stats)
        This helper function computes a selectivity estimate for a comparison between a column and a literal value. Note that the comparison is always assumed to have the column-name on the left, and the literal value on the right. Examples would be T1.A > 5, or T2.C = 15.
        Parameters:
        compType - the type of the comparison, e.g. equals, not-equals, or some inequality comparison
        columnValue - the column that is used in the comparison
        literalValue - the value that the column is being compared to
        exprSchema - a schema specifying the environment that the expression will be evaluated within
        stats - a collection of column-statistics to use in making selectivity estimates
        Returns:
        a selectivity estimate in the range [0, 1].
      • estimateCompareColumnColumn

        private static float estimateCompareColumnColumn​(CompareOperator.Type compType,
                                                         ColumnValue columnOne,
                                                         ColumnValue columnTwo,
                                                         Schema exprSchema,
                                                         java.util.ArrayList<ColumnStats> stats)
        This helper function computes a selectivity estimate for a comparison between two columns. Examples would be T1.A = T2.A.
        Parameters:
        compType - the type of the comparison, e.g. equals, not-equals, or some inequality comparison
        columnOne - the first column that is used in the comparison
        columnTwo - the second column that is used in the comparison
        exprSchema - a schema specifying the environment that the expression will be evaluated within
        stats - a collection of column-statistics to use in making selectivity estimates
        Returns:
        a selectivity estimate in the range [0, 1].
      • computeRatio

        public static float computeRatio​(java.lang.Object low1,
                                         java.lang.Object high1,
                                         java.lang.Object low2,
                                         java.lang.Object high2)
        This method computes the function (high1 - low1) / (high2 - low2), given Object-values that can be coerced into types that can be used for arithmetic. This operation is useful for estimating the selectivity of comparison operations, if we know the minimum and maximum values for a column.

        The result of this operation is clamped to the range [0, 1].

        Parameters:
        low1 - the low value for the numerator
        high1 - the high value for the numerator
        low2 - the low value for the denominator
        high2 - the high value for the denominator
        Returns:
        the ratio of (high1 - low1) / (high2 - low2), clamped to the range [0, 1].