ABLE, Version 1.1b
java.lang.Object
  |
  +--com.ibm.able.beans.AbleDataSet
This class describes an external data set using a definition file (*.dfn) that gives each field's name and type (continuous, discrete, or categorical). The special types "ignore" and "predicted" are used by the auto-generation functions. Min/mean/max values are computed for continuous fields so that they can be auto-scaled.
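The min/mean/max statistics mentioned above can be gathered in a single pass over a continuous column. The sketch below is illustrative only: `ContinuousFieldStats` is a hypothetical helper, not an ABLE class, and ABLE's own bookkeeping may differ.

```java
import java.util.List;

/**
 * Hypothetical helper (not part of ABLE): accumulates min, mean and max
 * for one continuous field in a single pass over the data.
 */
final class ContinuousFieldStats {
    private double min = Double.POSITIVE_INFINITY;
    private double max = Double.NEGATIVE_INFINITY;
    private double sum = 0.0;
    private long count = 0;

    /** Folds one value from the data file into the running statistics. */
    void add(double value) {
        min = Math.min(min, value);
        max = Math.max(max, value);
        sum += value;
        count++;
    }

    double min() { return min; }

    double max() { return max; }

    /** Arithmetic mean of the values seen so far, or NaN if none were added. */
    double mean() { return count == 0 ? Double.NaN : sum / count; }

    /** Computes the statistics for a whole column at once. */
    static ContinuousFieldStats of(List<Double> column) {
        ContinuousFieldStats stats = new ContinuousFieldStats();
        for (double v : column) {
            stats.add(v);
        }
        return stats;
    }
}
```

Accumulating a running sum rather than storing the column keeps memory constant, which matters when the data set is only partially cached.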
Version  Description
-------  -----------
1.0.0    Initial release.
8/2/00   jpb changed to support generation of multiple output fields (predicted/class).
Field Summary
protected boolean allNumericData
protected int bufferRecordIndex
protected int bufferSize
protected boolean computeStatistics
protected AbleDataSetDefinition dataSetDefinition
protected boolean eof
protected java.util.Vector fieldList
protected int fieldsPerRec
protected java.lang.String fileName
protected boolean firstPass
protected java.io.BufferedReader in
protected static int MAX_BUFFER_SIZE
protected java.util.Vector numericData
protected int numRecords
protected int[] randomIndices
protected boolean randomizeData
protected boolean ready
protected int recordIndex
protected java.util.Vector textData
protected java.util.Hashtable variableList
Constructor Summary
AbleDataSet()
Creates a data set object over a text data file. The object includes the in-memory cache (buffer) for all or part of the data and handles text file I/O using a BufferedReader.
Method Summary
protected void allocateDataArrays(int numRecs)
Used only when data is read in blocks; sets the size of the data buffer (cache).
void close()
Closes the BufferedReader on the data file, if it is open.
protected AbleField createAbleField(AbleFieldDefinition def)
Constructs an AbleField of the type given by the field definition.
void displayVariables()
void generateTranslateTemplate()
Given a loaded data set, generates a translate template with one record for each field. Fields in the *.dfn file named "ignore" get template usage IGNORE; fields marked "output" get template usage OUTPUT, and an output template is generated for them. Continuous fields are scaled to the range 0.1 to 0.9 using their min/mean/max values.
int getBufferSize()
java.util.Vector getFieldList()
int getFieldsPerRec()
Returns the number of fields in each record.
double[] getNextNumericRecord()
Returns the next array of doubles (a numeric record) from the data set.
void getNextRecordBlock(int numRecs)
Reads the next numRecs records from the text file into a vector of String arrays, wrapping at end of file. Assumes that the variables (field list) have already been created, that the textData and numericData vectors (of arrays) have been allocated, and that the BufferedReader is initialized.
protected int getNextRecordIndex()
Returns the index of the next record, loading the next block of records if necessary. NOTE: both recordIndex and bufferRecordIndex should be initialized to -1.
java.lang.String[] getNextTextRecord()
Returns the next array of text from the data set.
int getNormalizedRecordSize()
Returns the size of a record after categorical fields are expanded, assuming 1-of-N encoding.
void getNumericData()
double[] getNumericRecord(int inx)
Returns the array of doubles (the numeric record) at index inx of the data set.
boolean getRandomizeData()
java.lang.String[] getTextRecord(int inx)
Returns the array of text (the text record) at index inx of the data set.
void initDataFile()
Scans the entire data file, reading it into a fixed-size buffer of String arrays. This method is used when data is being cached (bufferSize > 0).
boolean isAllNumericData()
Returns true if all fields are "continuous" or "discrete"; false if any are "categorical" (i.e. …)
boolean isReady()
Indicates whether the data set is ready to provide data for processing (i.e. …)
void loadDataFile()
Loads the entire data file from a text file into a vector of String arrays. Assumes the data file definition (variables) has already been loaded. Text data can be accessed using getTextRecord(); numeric data can be accessed using getNumericRecord().
void loadDataFileDefinition()
Creates the dataSetDefinition by reading the *.dfn file, then instantiates the corresponding field objects.
void open(java.lang.String aFileName)
Creates a BufferedReader over the specified text file, reads the *.dfn file, and creates the variables for this data set. After open(), the data file can be read in one of two ways:
- loadDataFile() loads the entire file into memory; access the data directly using getTextRecord(n) or getNumericRecord(n).
- initDataFile() does an initial pass over the file using the buffer; access the data through the buffer using getNextTextRecord() or getNextNumericRecord().
void reopen()
void setBufferSize(int size)
void setRandomizeData(boolean state)
java.lang.String toString()
Returns a formatted string describing the state of this data set.
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
Field Detail
protected static final int MAX_BUFFER_SIZE
protected AbleDataSetDefinition dataSetDefinition
protected java.lang.String fileName
protected transient java.io.BufferedReader in
protected boolean allNumericData
protected boolean computeStatistics
protected boolean firstPass
protected boolean randomizeData
protected int[] randomIndices
protected java.util.Vector textData
protected java.util.Vector numericData
protected java.util.Hashtable variableList
protected java.util.Vector fieldList
protected boolean eof
protected int bufferSize
protected int fieldsPerRec
protected int numRecords
protected int recordIndex
protected int bufferRecordIndex
protected boolean ready
Constructor Detail
public AbleDataSet()
Method Detail
public boolean isReady()
public void displayVariables()
protected AbleField createAbleField(AbleFieldDefinition def)
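createAbleField() constructs an AbleField whose type matches the field definition. A minimal sketch of that factory pattern follows; `FieldType`, `FieldDef`, `Field` and its subclasses are hypothetical stand-ins, not the real AbleFieldDefinition/AbleField classes, whose constructors are not shown on this page.

```java
/** The three field types named in the *.dfn description. */
enum FieldType { CONTINUOUS, DISCRETE, CATEGORICAL }

/** Hypothetical stand-in for AbleFieldDefinition: a name plus a type. */
final class FieldDef {
    final String name;
    final FieldType type;

    FieldDef(String name, FieldType type) {
        this.name = name;
        this.type = type;
    }
}

/** Hypothetical stand-in for AbleField. */
abstract class Field {
    final String name;

    Field(String name) { this.name = name; }

    /** Continuous and discrete fields are numeric; categorical ones are not. */
    abstract boolean isNumeric();
}

final class ContinuousField extends Field {
    ContinuousField(String name) { super(name); }
    @Override boolean isNumeric() { return true; }
}

final class DiscreteField extends Field {
    DiscreteField(String name) { super(name); }
    @Override boolean isNumeric() { return true; }
}

final class CategoricalField extends Field {
    CategoricalField(String name) { super(name); }
    @Override boolean isNumeric() { return false; }
}

final class FieldFactory {
    private FieldFactory() {}

    /** Picks the concrete field class from the definition's declared type. */
    static Field create(FieldDef def) {
        switch (def.type) {
            case CONTINUOUS:  return new ContinuousField(def.name);
            case DISCRETE:    return new DiscreteField(def.name);
            case CATEGORICAL: return new CategoricalField(def.name);
            default: throw new IllegalArgumentException("Unknown field type: " + def.type);
        }
    }
}
```

Keeping the type switch in one factory method means loadDataFileDefinition() only has to iterate over the definitions, not know about each field subclass.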
public void loadDataFileDefinition() throws java.io.FileNotFoundException, java.io.IOException
public int getFieldsPerRec()
public void open(java.lang.String aFileName) throws java.io.FileNotFoundException, java.io.IOException
public void close()
public void reopen() throws java.io.FileNotFoundException, java.io.IOException
public void loadDataFile() throws java.io.IOException
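loadDataFile() reads the whole file into memory so that records can be fetched by index with getTextRecord() and getNumericRecord(). The sketch below shows that idea under stated assumptions: fields are whitespace-separated and all numeric (the real delimiter and the numeric handling of categorical fields are not documented here), and `InMemoryDataFile` is a hypothetical class that only mirrors the documented method names.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical in-memory loader; assumes whitespace-separated, all-numeric records. */
final class InMemoryDataFile {
    final List<String[]> textData = new ArrayList<>();
    final List<double[]> numericData = new ArrayList<>();

    /** Reads every record from source; closes the reader when done. */
    void load(Reader source, int fieldsPerRec) throws IOException {
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                line = line.trim();
                if (line.isEmpty()) {
                    continue; // skip blank lines
                }
                String[] fields = line.split("\\s+");
                if (fields.length != fieldsPerRec) {
                    throw new IOException("Expected " + fieldsPerRec
                            + " fields but found " + fields.length);
                }
                double[] values = new double[fields.length];
                for (int i = 0; i < fields.length; i++) {
                    values[i] = Double.parseDouble(fields[i]); // fails on categorical text
                }
                textData.add(fields);
                numericData.add(values);
            }
        }
    }

    String[] getTextRecord(int inx) { return textData.get(inx); }

    double[] getNumericRecord(int inx) { return numericData.get(inx); }
}
```

Keeping the text and numeric forms side by side trades memory for fast random access, which is why the buffered initDataFile() path exists for files too large to hold this way.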
public void initDataFile()
protected void allocateDataArrays(int numRecs)
public void getNextRecordBlock(int numRecs)
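getNextRecordBlock() reads the next n records and wraps at end of file. A minimal sketch of that wrap-around block read, assuming whitespace-separated fields and assuming that "wrapping" means reopening the file from the start (the `Supplier<Reader>` stands in for whatever reopen() actually does; `WrappingBlockReader` is hypothetical):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

/** Hypothetical block reader that restarts at the top of the file on EOF. */
final class WrappingBlockReader {
    private final Supplier<Reader> opener; // re-creates the underlying reader, like reopen()
    private BufferedReader in;
    private boolean recordInThisPass = false; // guards against looping forever on an empty file

    WrappingBlockReader(Supplier<Reader> opener) {
        this.opener = opener;
        this.in = new BufferedReader(opener.get());
    }

    /** Returns the next numRecs non-blank records, wrapping to the start at EOF. */
    List<String[]> nextBlock(int numRecs) throws IOException {
        List<String[]> block = new ArrayList<>(numRecs);
        while (block.size() < numRecs) {
            String line = in.readLine();
            if (line == null) { // end of file: wrap around
                if (!recordInThisPass) {
                    throw new IOException("data file contains no records");
                }
                in.close();
                in = new BufferedReader(opener.get());
                recordInThisPass = false;
                continue;
            }
            line = line.trim();
            if (line.isEmpty()) {
                continue;
            }
            block.add(line.split("\\s+"));
            recordInThisPass = true;
        }
        return block;
    }

    void close() throws IOException { in.close(); }
}
```

The sketch returns an `ArrayList` rather than filling the `Vector` fields the real class uses; the wrap-around behavior is the part being illustrated.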
public int getNormalizedRecordSize()
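getNormalizedRecordSize() assumes 1-of-N encoding: a numeric (continuous or discrete) field takes one slot, while a categorical field with N categories expands into N slots. The same field list also answers the isAllNumericData() question. A sketch with hypothetical types (`ColumnSpec`, `RecordSizing`); how ABLE counts categories, and whether ignored fields are excluded, is not stated here.

```java
import java.util.List;

/** Hypothetical field description: numeric, or categorical with N categories. */
final class ColumnSpec {
    final boolean categorical;
    final int numCategories; // number of distinct categories; unused for numeric fields

    private ColumnSpec(boolean categorical, int numCategories) {
        this.categorical = categorical;
        this.numCategories = numCategories;
    }

    static ColumnSpec numeric() { return new ColumnSpec(false, 0); }

    static ColumnSpec categorical(int numCategories) {
        if (numCategories < 1) throw new IllegalArgumentException("numCategories must be >= 1");
        return new ColumnSpec(true, numCategories);
    }
}

final class RecordSizing {
    private RecordSizing() {}

    /** One slot per numeric field, N slots per categorical field (1-of-N). */
    static int normalizedRecordSize(List<ColumnSpec> fields) {
        int size = 0;
        for (ColumnSpec f : fields) {
            size += f.categorical ? f.numCategories : 1;
        }
        return size;
    }

    /** True when no field is categorical, i.e. every field is continuous or discrete. */
    static boolean isAllNumeric(List<ColumnSpec> fields) {
        for (ColumnSpec f : fields) {
            if (f.categorical) return false;
        }
        return true;
    }

    /** 1-of-N encoding of one categorical value, given its category index. */
    static double[] oneOfN(int categoryIndex, int numCategories) {
        if (categoryIndex < 0 || categoryIndex >= numCategories) {
            throw new IndexOutOfBoundsException("category index " + categoryIndex);
        }
        double[] encoded = new double[numCategories]; // all zeros
        encoded[categoryIndex] = 1.0;
        return encoded;
    }
}
```

For example, a record with one numeric field, a three-category field, and another numeric field normalizes to 1 + 3 + 1 = 5 values.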
public void getNumericData()
public void generateTranslateTemplate() throws java.io.IOException
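generateTranslateTemplate() scales continuous fields into the range 0.1 to 0.9. The sketch below assumes plain linear min/max scaling; the page says min/mean/max are used, but not how the mean enters, so it is left out. `UnitRangeScaler` and its midpoint rule for constant fields are hypothetical choices.

```java
/**
 * Hypothetical helper: linear scaling of [min, max] onto [0.1, 0.9] and back.
 * The mean is not used; how ABLE uses it is not specified on this page.
 */
final class UnitRangeScaler {
    static final double LOW = 0.1;
    static final double HIGH = 0.9;

    private final double min;
    private final double max;

    UnitRangeScaler(double min, double max) {
        if (max < min) throw new IllegalArgumentException("max < min");
        this.min = min;
        this.max = max;
    }

    /** Maps min to 0.1 and max to 0.9; a constant field (min == max) maps to 0.5. */
    double scale(double x) {
        if (max == min) return (LOW + HIGH) / 2.0;
        return LOW + (HIGH - LOW) * (x - min) / (max - min);
    }

    /** Inverse mapping, e.g. to turn a scaled output value back into original units. */
    double unscale(double y) {
        if (max == min) return min;
        return min + (y - LOW) * (max - min) / (HIGH - LOW);
    }
}
```

Keeping values away from 0 and 1 is a common choice when the scaled data feeds sigmoid-output networks, whose outputs only approach those bounds asymptotically.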
protected int getNextRecordIndex()
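getNextRecordIndex() advances through the data and loads the next block when the buffer runs out, with recordIndex and bufferRecordIndex both starting at -1. One reading of that contract is sketched below: recordIndex walks the whole data set (wrapping), bufferRecordIndex walks the current block, and the method returns the buffer position. `RecordCursor` and its `IntConsumer` refill callback are hypothetical, and the sketch assumes the first block was already loaded by an initial pass.

```java
import java.util.function.IntConsumer;

/** Hypothetical two-level cursor over a data set held in a fixed-size buffer. */
final class RecordCursor {
    private final int numRecords;
    private final int bufferSize;
    private final IntConsumer loadNextBlock; // refills the buffer with up to n records

    private int recordIndex = -1;       // -1: no record consumed yet
    private int bufferRecordIndex = -1; // -1: buffer not yet used

    RecordCursor(int numRecords, int bufferSize, IntConsumer loadNextBlock) {
        if (numRecords < 1 || bufferSize < 1) {
            throw new IllegalArgumentException("numRecords and bufferSize must be positive");
        }
        this.numRecords = numRecords;
        this.bufferSize = bufferSize;
        this.loadNextBlock = loadNextBlock;
    }

    /** Advances to the next record and returns its position in the buffer. */
    int next() {
        recordIndex = (recordIndex + 1) % numRecords; // wrap at the end of the data set
        bufferRecordIndex++;
        if (bufferRecordIndex >= bufferSize) {        // buffer exhausted: refill it
            loadNextBlock.accept(bufferSize);
            bufferRecordIndex = 0;
        }
        return bufferRecordIndex;
    }

    int recordIndex() { return recordIndex; }
}
```

Starting both indices at -1 lets the first call pre-increment to 0 without a special case, which is presumably why the original note insists on that initialization.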
public boolean isAllNumericData()
public java.lang.String[] getTextRecord(int inx)
public java.lang.String[] getNextTextRecord()
public double[] getNumericRecord(int inx)
public double[] getNextNumericRecord()
public void setBufferSize(int size)
public int getBufferSize()
public java.util.Vector getFieldList()
public void setRandomizeData(boolean state)
public boolean getRandomizeData()
public java.lang.String toString()