Say our hash function is uniform and that the hashes are 128 bits,
which is 32 hexadecimal digits. Consider hash value XYZWabc... Then,
create directories from the first four digits. Say we create a two
level directory system (i.e., XY/ZW). This would result in 2^8 * 2^8
+ 2^8 approx= 2^16 = 65536 directories, which a typical filesystem can
easily handle.
In each directory, we would have a single text file containing all xml
entries with the prefix XYZW. Since there are only 50k entries now,
we would expect fewer than one entry per file. Given 1 million
entries, we would expect only 16 entries per file. This is easily
maintainable, given that most of the time we would be searching and
appending, and only infrequently would we be changing an existing
entry.
Finally, note that we don't actually need subdirectories ZW, since
there is only one file in the subdirectory. =) Alternatively, we
could keep the subdirectory, and use an additional index file for
faster searching into the record file.
Kyle
-- Kyle R. Rose "They can try to bind our arms, Laboratory for Computer Science But they cannot chain our minds MIT NE43-309, 617-253-5883 or hearts..." http://web.mit.edu/krr/www/ Stratovarius krose nospam at theory.lcs.mit.edu Forever Free