Class ColumnStatsCollector


  • public class ColumnStatsCollector
    extends java.lang.Object
    This class collects statistics for a single column of a table being analyzed by the TableManager.analyzeTable(edu.caltech.nanodb.relations.TableInfo) method. Instances of the class compute the number of distinct values, the number of NULL values, and, for appropriate data types, the minimum and maximum values for the column.

    Once the analysis is complete, the getColumnStats() method constructs a ColumnStats object from the collected results.

    Design Note:
    (Donnie) This class cannot efficiently compute the number of unique values for very large tables, because it keeps the set of values seen in memory. Supporting extremely large tables would require an external-memory approach.
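    As a rough illustration of how a collector of this kind might be driven during table analysis, the sketch below keeps one collector per column and feeds every row's values into them. SketchCollector and AnalyzeSketch are hypothetical stand-ins written for this example; they are not NanoDB classes, they work on in-memory rows rather than a real table scan, and they leave out the minimum and maximum tracking described above.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;

/** Hypothetical stand-in for a per-column collector (not the NanoDB class). */
class SketchCollector {
    // Distinct non-NULL values seen so far; grows with the column's cardinality.
    private final HashSet<Object> uniqueValues = new HashSet<>();
    private int numNullValues = 0;

    /** Records one column value; a Java null stands for SQL NULL. */
    void addValue(Object value) {
        if (value == null)
            numNullValues++;
        else
            uniqueValues.add(value);
    }

    int getNumNullValues() { return numNullValues; }
    int getNumUniqueValues() { return uniqueValues.size(); }
}

/** Hypothetical driver that mimics a table scan over in-memory rows. */
class AnalyzeSketch {
    /** Builds one collector per column and feeds every row into them. */
    static List<SketchCollector> analyze(List<Object[]> rows, int numColumns) {
        List<SketchCollector> collectors = new ArrayList<>();
        for (int i = 0; i < numColumns; i++)
            collectors.add(new SketchCollector());

        for (Object[] row : rows) {
            for (int i = 0; i < numColumns; i++)
                collectors.get(i).addValue(row[i]);
        }
        return collectors;
    }

    /** Small two-column table used for demonstration. */
    static List<Object[]> sampleRows() {
        List<Object[]> rows = new ArrayList<>();
        rows.add(new Object[] {1, "a"});
        rows.add(new Object[] {2, null});
        rows.add(new Object[] {2, "a"});
        return rows;
    }
}
```

    Calling AnalyzeSketch.analyze(AnalyzeSketch.sampleRows(), 2) returns two collectors: the first reports two unique values and no NULLs, the second one unique value and one NULL. Because each collector holds its distinct values in a HashSet, memory use grows with the number of distinct values, which is the limitation the design note describes.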
    • Field Summary

      Fields 
      Modifier and Type Field Description
      (package private) java.lang.Comparable maxValue
      The maximum value seen in the column's values, or null if the maximum is unknown or won't be computed.
      (package private) java.lang.Comparable minValue
      The minimum value seen in the column's values, or null if the minimum is unknown or won't be computed.
      private int numNullValues
      A count of the number of NULL values seen in the column-values.
      private SQLDataType sqlType
      The SQL data-type for the column that stats are being collected for.
      private java.util.HashSet<java.lang.Object> uniqueValues
      The set of all values seen in this column.
    • Constructor Summary

      Constructors 
      Constructor Description
      ColumnStatsCollector(SQLDataType sqlType)
      Initializes a new column-stats collector object for a column with the specified base SQL datatype.
    • Method Summary

      Modifier and Type Method Description
      void addValue(java.lang.Object value)
      Adds another column-value to this stats-collector object, updating the statistics for the column.
      ColumnStats getColumnStats()
      This helper method constructs and returns a new column-statistics object containing the stats collected by this object.
      java.lang.Object getMaxValue()
      Returns the maximum value seen for the column, or null if the column's type isn't supported for comparison estimates (or if there aren't any rows in the table being analyzed).
      java.lang.Object getMinValue()
      Returns the minimum value seen for the column, or null if the column's type isn't supported for comparison estimates (or if there aren't any rows in the table being analyzed).
      int getNumNullValues()
      Returns the number of NULL values seen for the column.
      int getNumUniqueValues()
      Returns the number of unique (and non-NULL) values seen for the column.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Field Detail

      • sqlType

        private SQLDataType sqlType
        The SQL data-type for the column that stats are being collected for.
      • uniqueValues

        private java.util.HashSet<java.lang.Object> uniqueValues
        The set of all values seen in this column. This set can occupy a large amount of memory for large tables.
      • numNullValues

        private int numNullValues
        A count of the number of NULL values seen in the column-values.
      • minValue

        java.lang.Comparable minValue
        The minimum value seen in the column's values, or null if the minimum is unknown or won't be computed.
      • maxValue

        java.lang.Comparable maxValue
        The maximum value seen in the column's values, or null if the maximum is unknown or won't be computed.
    • Constructor Detail

      • ColumnStatsCollector

        public ColumnStatsCollector(SQLDataType sqlType)
        Initializes a new column-stats collector object for a column with the specified base SQL datatype.
        Parameters:
        sqlType - the base SQL datatype for the column.
    • Method Detail

      • addValue

        public void addValue(java.lang.Object value)
        Adds another column-value to this stats-collector object, updating the statistics for the column.
        Parameters:
        value - the value from the column being analyzed.
        Design Note:
        (Donnie) We have to suppress "unchecked operation" warnings on this code: Comparable is a generic interface (and thus allows us to specify the type of object being compared), but we want to use it as a raw type, without specifying any type argument.
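        The unchecked comparison mentioned in the design note can be sketched as follows. This is a minimal illustration, assuming all non-NULL values in a column are mutually comparable; MinMaxTracker is a hypothetical class written for this example and is not NanoDB's implementation, which also consults the column's SQL type.

```java
/** Hypothetical min/max tracker that uses the raw Comparable type. */
@SuppressWarnings({"unchecked", "rawtypes"})
class MinMaxTracker {
    // Raw Comparable: the element type is not known at compile time.
    private Comparable minValue = null;
    private Comparable maxValue = null;

    /** Updates min/max; NULLs and non-Comparable values are skipped. */
    void addValue(Object value) {
        if (!(value instanceof Comparable))
            return;   // instanceof is false for null, so NULLs are skipped too

        Comparable c = (Comparable) value;

        // Calling compareTo on a raw Comparable is an unchecked call, hence
        // the suppressed warning. It throws ClassCastException at run time
        // if the values are not mutually comparable.
        if (minValue == null || c.compareTo(minValue) < 0)
            minValue = c;
        if (maxValue == null || c.compareTo(maxValue) > 0)
            maxValue = c;
    }

    /** Returns the smallest value seen, or null if none was recorded. */
    Object getMinValue() { return minValue; }

    /** Returns the largest value seen, or null if none was recorded. */
    Object getMaxValue() { return maxValue; }
}
```

        The raw type trades compile-time checking for flexibility: one tracker class can handle integers, strings, or dates, at the cost of a possible ClassCastException if values of different types are mixed in the same column.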
      • getNumNullValues

        public int getNumNullValues()
        Returns the number of NULL values seen for the column.
        Returns:
        the number of NULL values seen for the column
      • getNumUniqueValues

        public int getNumUniqueValues()
        Returns the number of unique (and non-NULL) values seen for the column.
        Returns:
        the number of unique (and non-NULL) values seen for the column
      • getMinValue

        public java.lang.Object getMinValue()
        Returns the minimum value seen for the column, or null if the column's type isn't supported for comparison estimates (or if there aren't any rows in the table being analyzed).
        Returns:
        the minimum value in the table for the column
      • getMaxValue

        public java.lang.Object getMaxValue()
        Returns the maximum value seen for the column, or null if the column's type isn't supported for comparison estimates (or if there aren't any rows in the table being analyzed).
        Returns:
        the maximum value in the table for the column
      • getColumnStats

        public ColumnStats getColumnStats()
        This helper method constructs and returns a new column-statistics object containing the stats collected by this object.
        Returns:
        a new column-stats object containing the stats that have been collected by this object