Name - Shubham Kaushik
Email - [email protected]
-
Describe any design decisions you made. These may be minimal for pa1
- Use of
ConcurrentHashMap
for defining private variable inclass Catalog
to make it thread safe. - Defined variable
pages
inclass BufferPool
usingConcurrentHashMap
for thread safety. - For
Join
, a simple nested loop join is used, in which it will iterate over one child only once and compare with each tuple of another child. So, if relation R has N records and S has M records it will run do IO for (N + (N*M)). - For
HashEquiJoin
, partitions are only done for one of the child and stored in-memory. The hasNext will pull one tuple from another child and do hashing over the join value and pull that particular partition for first child. This way it will save some comparison assuming the data is uniformly distributed
- Use of
-
Discuss and justify any changes you made to the API
-
class Tuple : Added few bunch of private variable like tupleDesc, recordId, fields and implemented an Iterator for fields.
-
class TupleDesc : Added
TDItem[] tItems
array for holding TDItem objects. The typeAr should have some item otherwise constructor will throw Exception. -
class Catalog : Defined bunch of new private
ConcurrentHashMap
variables, which are thread safe as per the java documentation. -
class BufferPool : Defined variable
pages
usingConcurrentHashMap
for thread safe. ImplementedgetPage
to get the page from buffer if exists or if not read it from disk(DbFile/HeapFile
). -
class HeapPageId : Defined few private final variables and also updated
hashCode
&equals
methods. -
class RecordId : Defined few private final variabled and also updated
hashCode
&equals
methods. -
class HeapPage : Updated
getNumTuples
&getHeaderSize
with the help of given doc on Lab webpage. Also, implemented theiterator
method for returning an iterator over theTuple
objects stored on this page. -
class HeapFile : Updated the
readPage
method to read a page fromHeapFile
. I am usingRandomAccessFile
to seek over the file randomly and read theBufferPool.getPageSize
bytes data each time. ThepageNumber
is used to calculate the address of the page in the disk. I have also implementedDbFileIterator
to iterate over all of the tuples we have in thisHeapFile
. -
SeqScan : This was easy as we just have to link this with the
HeapFile
andDbFileIterator
. -
class BTreeFile :
- findLeafPage() : This method is used to pull leaf page as per the given Field key value. This is implemented using recursion. If the Field value is passed as 'null', it will pull the left most page at leaf level.
- splitLeafPage() : This method help in splitting the leaf page when we insert more values and the leaf page is full. The middle node is copied to the internal nodes after splitting and parent & sibling pointer are updated.
- splitInternalPage() : The method do the splitting part for the internal node. This also increases the height of the BTree and push the middle node towards the parent node.
- stealFromLeafPage() : The method is used for the Deletion purpose, if the tuples are less than half full in any leaf node. It will try to steal nodes from its siblings.
- stealFromLeftInternalPage() : This method is for stealing nodes from the sibling when the internal node is out of its half full capacity.
- stealFromRightInternalPage() : This is similar to the upper method just be used when stealing nodes from right internal node, instead of left.
- mergeLeafPages() : While deletion if the stealing doesn't work out than merging of two nodes happens in BTree. The mergeLeafPages merges the leaf pages whenever required.
- mergeInternalPages() : This is same as above just to merge the internal nodes.
-
class IndexPredicate : Updated to compare the field values using the given operators. The operators are defined as enums The Comparison is done based in the given '>=', '<=', '==' operators.
-
class Predicate : Defined the required class variables to support IndexPredicate class operations.
-
Filter, Join & HashEquiJoin : Added required class variables and implemented the hasNext() method call to return the results after filter, join and hashJoin. The strategy used for join is simple nested loop join and for hashEquiJoin I have used hashing and making partition for one child.
-
IntegerAggregator, StringAggregator, Aggregate : Implemented integer & string aggregate classes for aggregating on the required column based on groupby if given.
-
HeapPage, HeapFile : Implemented insert and delete tuples method to add mutability of tuples in pages or file. Also, eviction is checked for both operations.
-
Insert, Delete : Insert and delete operators are implemented for inserting the new record or deleting the existing one.
-
-
Describe any missing or incomplete elements of your code
- As per my knowledge, all the elements required for lab1 are complete in my assignment. The only part I could guess is the use of transactionId and Permissions are not in use for anything as of now.
-
Describe how long you spent on the assignment, and whether there was anything you found particularly difficult or confusing
- For this assignment, I've spent around 2-3 days to understand and start myself writing code. There were two things which were bit tricky for me, first the use of bit manipulation for the
HeapPage.isSlotUsed
method to compute if the slot is in use with the help of header array of bytes, second was the linking of Iterators fromSeqScan
toDbFileIterator
ofHeapFile
class. Things are quite clear now.
- For this assignment, I've spent around 2-3 days to understand and start myself writing code. There were two things which were bit tricky for me, first the use of bit manipulation for the
-
If you collaborate with someone else, you have to discribe how you split the workload
- I worked alone on this assignment but yes the discussion in labs, javadoc and slide for simpledb helped a lot to speed up.