ABLE, Version 1.1b

com.ibm.able.beans
Class AbleDataSet

java.lang.Object
  |
  +--com.ibm.able.beans.AbleDataSet

public class AbleDataSet
extends java.lang.Object
implements java.io.Serializable

This class describes an external data set using a definition file (*.dfn), which gives each field's name and type (continuous, discrete, or categorical). Special types (ignore, predicted) are used by the auto-generation functions. Min/mean/max values are computed for continuous fields for auto-scaling.

Version:
1.0.0; change log:
 Version   Description
 --------  -----------
 1.0.0     Initial release.
 8/2/00 jpb changed to support generation of multiple output fields (predicted/class)

 
See Also:
Serialized Form

Field Summary
protected  boolean allNumericData
           
protected  int bufferRecordIndex
           
protected  int bufferSize
           
protected  boolean computeStatistics
           
protected  AbleDataSetDefinition dataSetDefinition
           
protected  boolean eof
           
protected  java.util.Vector fieldList
           
protected  int fieldsPerRec
           
protected  java.lang.String fileName
           
protected  boolean firstPass
           
protected  java.io.BufferedReader in
           
protected static int MAX_BUFFER_SIZE
           
protected  java.util.Vector numericData
           
protected  int numRecords
           
protected  int[] randomIndices
           
protected  boolean randomizeData
           
protected  boolean ready
           
protected  int recordIndex
           
protected  java.util.Vector textData
           
protected  java.util.Hashtable variableList
           
 
Constructor Summary
AbleDataSet()
          Create a data set object over a text data file. This object includes the in-memory cache or buffer for all or part of the data and handles the text file I/O using a BufferedReader.
 
Method Summary
protected  void allocateDataArrays(int numRecs)
          used only when data is read in blocks (sets size of data buffer or cache)
 void close()
          close the BufferedReader on the data file (if open)
protected  AbleField createAbleField(AbleFieldDefinition def)
          construct an AbleField of the corresponding type based on the field definition
 void displayVariables()
           
 void generateTranslateTemplate()
          Given a loaded data set, generate a translate template with one record for each field. Fields in the *.dfn file named "ignore" will have template usage = IGNORE. Fields marked as "output" will have template usage = OUTPUT, and an output template will be generated. Continuous fields are scaled to between 0.1 and 0.9 (using min/mean/max).
 int getBufferSize()
           
 java.util.Vector getFieldList()
           
 int getFieldsPerRec()
          return the number of fields in each record
 double[] getNextNumericRecord()
          return the next array of doubles (a numeric record) from the data set
 void getNextRecordBlock(int numRecs)
          Read the next numRecs records from a text file into a vector of String arrays. Assumes that the variables/field list have already been created, that the textData and numericData vectors (of arrays) have been allocated, and that the BufferedReader is initialized. Wraps at end of file.
protected  int getNextRecordIndex()
          Return the index of the next record, loading the next block of records if necessary. NOTE: both recordIndex and bufferRecordIndex should be initialized to -1.
 java.lang.String[] getNextTextRecord()
          return the next array of text from the data set
 int getNormalizedRecordSize()
          Return the size of the record after categorical fields are expanded. This assumes 1-of-N encoding.
 void getNumericData()
           
 double[] getNumericRecord(int inx)
          return an array of doubles (a numeric record) from the data set
 boolean getRandomizeData()
           
 java.lang.String[] getTextRecord(int inx)
          return the specified array of text from the data set
 void initDataFile()
          Scan the entire text data file, reading it into a fixed-size buffer of String arrays. This method is used when data is being cached (bufferSize > 0).
 boolean isAllNumericData()
          Returns true if all fields are "continuous" or "discrete"; false if any are "categorical" (i.e. symbols).
 boolean isReady()
          Says whether the data set is ready to provide data for processing, i.e. the definition file was read and the data file is open for reading.
 void loadDataFile()
          Load the entire text data file into a vector of String arrays. Assumes the data file definition (variables) has already been loaded. Text data can be accessed using getTextRecord(); numeric data can be accessed using getNumericRecord().
 void loadDataFileDefinition()
          Create the dataSetDefinition by reading the *.dfn file, then instantiate the corresponding field objects.
 void open(java.lang.String aFileName)
          Create a BufferedReader over the specified text file, read the *.dfn file, and create the variables for this data set. After an open(), the data file is ready to be read using either loadDataFile(), which loads the entire file into memory (access the data directly using getTextRecord(n), getNumericRecord(n)), or initDataFile(), which does an initial pass over the file using the buffer (access through the buffer using getNextTextRecord(), getNextNumericRecord()).
 void reopen()
           
 void setBufferSize(int size)
           
 void setRandomizeData(boolean state)
           
 java.lang.String toString()
          return a formatted string describing the state of this DataSet
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

MAX_BUFFER_SIZE

protected static final int MAX_BUFFER_SIZE

dataSetDefinition

protected AbleDataSetDefinition dataSetDefinition

fileName

protected java.lang.String fileName

in

protected transient java.io.BufferedReader in

allNumericData

protected boolean allNumericData

computeStatistics

protected boolean computeStatistics

firstPass

protected boolean firstPass

randomizeData

protected boolean randomizeData

randomIndices

protected int[] randomIndices

textData

protected java.util.Vector textData

numericData

protected java.util.Vector numericData

variableList

protected java.util.Hashtable variableList

fieldList

protected java.util.Vector fieldList

eof

protected boolean eof

bufferSize

protected int bufferSize

fieldsPerRec

protected int fieldsPerRec

numRecords

protected int numRecords

recordIndex

protected int recordIndex

bufferRecordIndex

protected int bufferRecordIndex

ready

protected boolean ready
Constructor Detail

AbleDataSet

public AbleDataSet()
Create a data set object over a text data file. This object includes the in-memory cache or buffer for all or part of the data and handles the text file I/O using a BufferedReader.
Method Detail

isReady

public boolean isReady()
Says whether the data set is ready to provide data for processing, i.e. the definition file was read and the data file is open for reading.

displayVariables

public void displayVariables()

createAbleField

protected AbleField createAbleField(AbleFieldDefinition def)
construct an AbleField of the corresponding type based on the field definition

loadDataFileDefinition

public void loadDataFileDefinition()
                            throws java.io.FileNotFoundException,
                                   java.io.IOException
Create the dataSetDefinition by reading the *.dfn file, then instantiate the corresponding field objects.

getFieldsPerRec

public int getFieldsPerRec()
return the number of fields in each record

open

public void open(java.lang.String aFileName)
          throws java.io.FileNotFoundException,
                 java.io.IOException
Create a BufferedReader over the specified text file, read the *.dfn file, and create the variables for this data set. After an open(), the data file is ready to be read using either:
  loadDataFile() -- loads the entire file into memory; access the data directly using getTextRecord(n), getNumericRecord(n)
  initDataFile() -- does an initial pass over the file using the buffer; access through the buffer using getNextTextRecord(), getNextNumericRecord()

close

public void close()
close the BufferedReader on the data file (if open)

reopen

public void reopen()
            throws java.io.FileNotFoundException,
                   java.io.IOException

loadDataFile

public void loadDataFile()
                  throws java.io.IOException
Load the entire text data file into a vector of String arrays. Assumes the data file definition (variables) has already been loaded. Text data can be accessed using getTextRecord(); numeric data can be accessed using getNumericRecord().

initDataFile

public void initDataFile()
Scan the entire text data file, reading it into a fixed-size buffer of String arrays. This method is used when data is being cached (bufferSize > 0).

allocateDataArrays

protected void allocateDataArrays(int numRecs)
used only when data is read in blocks (sets size of data buffer or cache)

getNextRecordBlock

public void getNextRecordBlock(int numRecs)
Read the next numRecs records from a text file into a vector of String arrays. Assumes that the variables/field list have already been created, that the textData and numericData vectors (of arrays) have been allocated, and that the BufferedReader is initialized. Wraps at end of file.
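
The wrap-at-end-of-file behavior can be illustrated with a small stand-alone sketch. It is not the ABLE implementation: the class WrappingBlockReader, its nextBlock method, the use of an in-memory string as the data source, and whitespace-delimited fields are all assumptions made for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in (not part of ABLE) showing block reads that wrap at EOF. */
class WrappingBlockReader {
    private final String source;   // whole text, kept so the reader can be reopened
    private BufferedReader in;

    WrappingBlockReader(String source) {
        this.source = source;
        this.in = new BufferedReader(new StringReader(source));
    }

    /** Read the next numRecs records, restarting from the first record at end of file. */
    List<String[]> nextBlock(int numRecs) throws IOException {
        List<String[]> block = new ArrayList<>();
        if (source.trim().isEmpty()) {
            return block;                       // no records at all: avoid looping forever
        }
        while (block.size() < numRecs) {
            String line = in.readLine();
            if (line == null) {                 // end of file: reopen and continue
                in.close();
                in = new BufferedReader(new StringReader(source));
                continue;
            }
            line = line.trim();
            if (line.isEmpty()) {
                continue;                       // skip blank lines
            }
            block.add(line.split("\\s+"));      // assumed whitespace-delimited fields
        }
        return block;
    }
}
```

With three records in the source and a block size of two, the second block holds the third record followed by the first record again.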

getNormalizedRecordSize

public int getNormalizedRecordSize()
Return the size of the record after categorical fields are expanded. This assumes 1-of-N encoding.
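
As a rough illustration of how 1-of-N expansion changes the record size, the sketch below counts one slot for each continuous or discrete field and one slot per distinct symbol for each categorical field. The class NormalizedSize, the FieldType enum, and the categoryCounts array are hypothetical names, not ABLE classes.

```java
/** Hypothetical sketch (not part of ABLE) of record size under 1-of-N encoding. */
class NormalizedSize {
    enum FieldType { CONTINUOUS, DISCRETE, CATEGORICAL }

    /**
     * Sum the slots each field occupies after expansion.
     * categoryCounts[i] is only consulted for categorical fields.
     */
    static int normalizedRecordSize(FieldType[] types, int[] categoryCounts) {
        int size = 0;
        for (int i = 0; i < types.length; i++) {
            size += (types[i] == FieldType.CATEGORICAL) ? categoryCounts[i] : 1;
        }
        return size;
    }

    /** Encode symbol number index (0-based) out of n symbols as a 1-of-N vector. */
    static double[] oneOfN(int index, int n) {
        double[] v = new double[n];   // all zeros by default
        v[index] = 1.0;
        return v;
    }
}
```

For example, a record with one continuous field, one categorical field with three symbols, and one discrete field expands to five values.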

getNumericData

public void getNumericData()

generateTranslateTemplate

public void generateTranslateTemplate()
                               throws java.io.IOException
Given a loaded data set, generate a translate template with one record for each field. Fields in the *.dfn file named "ignore" will have template usage = IGNORE. Fields marked as "output" will have template usage = OUTPUT, and an output template will be generated. Continuous fields are scaled to between 0.1 and 0.9 (using min/mean/max).
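
The scaling of continuous fields into the 0.1 to 0.9 range might look like the sketch below. It assumes a plain linear mapping from the field's minimum and maximum; how ABLE actually uses the mean is not stated here, so the mean is left out. The class RangeScaler and its scale method are hypothetical names.

```java
/** Hypothetical sketch (not part of ABLE): linear scaling into [0.1, 0.9]. */
class RangeScaler {
    /**
     * Map x from [min, max] onto [0.1, 0.9].
     * Assumption: a field with no spread (max == min) maps to the midpoint 0.5.
     */
    static double scale(double x, double min, double max) {
        if (max == min) {
            return 0.5;
        }
        return 0.1 + 0.8 * (x - min) / (max - min);
    }
}
```

Under this assumption, the field minimum maps to 0.1, the maximum to 0.9, and the midpoint of the range to 0.5.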

getNextRecordIndex

protected int getNextRecordIndex()
Return the index of the next record, loading the next block of records if necessary. NOTE: both recordIndex and bufferRecordIndex should be initialized to -1.
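
The reason for starting both indices at -1 is that the first increment then lands on record 0. The sketch below shows one way such bookkeeping could work. It is an assumption-laden illustration, not the ABLE code: the class RecordCursor is hypothetical, the returned value is assumed to be the position within the buffer, the first block is assumed to be in the buffer already, and a counter stands in for the actual block read.

```java
/** Hypothetical sketch (not part of ABLE) of next-record index bookkeeping. */
class RecordCursor {
    private final int numRecords;        // total records in the data file
    private final int bufferSize;        // records held in memory at once
    private int recordIndex = -1;        // position within the whole file
    private int bufferRecordIndex = -1;  // position within the current buffer
    private int blockLoads = 0;          // stand-in for real block reads

    RecordCursor(int numRecords, int bufferSize) {
        if (numRecords <= 0 || bufferSize <= 0) {
            throw new IllegalArgumentException("sizes must be positive");
        }
        this.numRecords = numRecords;
        this.bufferSize = bufferSize;
    }

    /** Advance both indices; "load" a new block when the buffer is exhausted. */
    int next() {
        recordIndex++;
        if (recordIndex >= numRecords) {
            recordIndex = 0;             // wrap at end of file
        }
        bufferRecordIndex++;
        if (bufferRecordIndex >= bufferSize) {
            blockLoads++;                // a real reader would fetch the next block here
            bufferRecordIndex = 0;
        }
        return bufferRecordIndex;
    }

    int recordIndex() { return recordIndex; }
    int blockLoads() { return blockLoads; }
}
```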

isAllNumericData

public boolean isAllNumericData()
Returns true if all fields are "continuous" or "discrete"; false if any are "categorical" (i.e. symbols).

getTextRecord

public java.lang.String[] getTextRecord(int inx)
return the specified array of text from the data set

getNextTextRecord

public java.lang.String[] getNextTextRecord()
return the next array of text from the data set

getNumericRecord

public double[] getNumericRecord(int inx)
return an array of doubles (a numeric record) from the data set

getNextNumericRecord

public double[] getNextNumericRecord()
return the next array of doubles (a numeric record) from the data set

setBufferSize

public void setBufferSize(int size)

getBufferSize

public int getBufferSize()

getFieldList

public java.util.Vector getFieldList()

setRandomizeData

public void setRandomizeData(boolean state)

getRandomizeData

public boolean getRandomizeData()

toString

public java.lang.String toString()
return a formatted string describing the state of this DataSet
Overrides:
toString in class java.lang.Object

ABLE: Produced by Joe, Don, and Jeff who say, 'Thanks for your support.'