Database format

Kyle R. Rose (krose nospam at theory.lcs.mit.edu)
09 Mar 1999 20:19:25 -0500

I was thinking earlier about the database format, and a solution to
the inode problem occurred to me.

Say our hash function is uniform and that the hashes are 128 bits,
which is 32 hexadecimal digits. Consider hash value XYZWabc... Then,
create directories from the first four digits. Say we create a two
level directory system (i.e., XY/ZW). This would result in 2^8 * 2^8
+ 2^8 approx= 2^16 = 65536 directories, which a typical filesystem can
easily handle.

In each directory, we would have a single text file containing all xml
entries with the prefix XYZW. Since there are only 50k entries now,
we would expect fewer than one entry per file. Given 1 million
entries, we would expect only 16 entries per file. This is easily
maintainable, given that most of the time we would be searching and
appending, and only infrequently would we be changing an existing
entry.

Finally, note that we don't actually need subdirectories ZW, since
there is only one file in the subdirectory. =) Alternatively, we
could keep the subdirectory, and use an additional index file for
faster searching into the record file.

Kyle

-- 
Kyle R. Rose                      "They can try to bind our arms,
Laboratory for Computer Science    But they cannot chain our minds
MIT NE43-309, 617-253-5883             or hearts..."
http://web.mit.edu/krr/www/                           Stratovarius
krose nospam at theory.lcs.mit.edu                              Forever Free