Re: [SAGE] simple database problem
At 12:42 PM -0500 2005-01-03, Andrew Hume wrote:
> the problem is that on Linux (actually, i could just stop here,
> couldn't I?), the 'print all' operation can take 30mins or more
> on a busy machine, (busy here means lots of I/O) as opposed to
> the normal 2-3secs, apparently because of the random seeking around
> in the database file. performance is significantly helped by simply
> running 'wc db.dbm' just prior to using the database.
I must confess that I don't know a whole lot about *dbm. All I
know is that it has always seemed slow and non-scalable for the
sorts of things I've tried to do with it in the past, in comparison
to Berkeley DB.
In my own experience, with a million e-mail addresses in a *dbm
file, the system becomes dead-dog slow when you try to handle an
operational e-mail load. Substitute db, and you can't slow
the system down with 10 million e-mail addresses and a much higher
load. I even threw 100 million e-mail addresses at the problem, and
the system was not measurably degraded over 10 million. Of course,
they weren't 100 million real e-mail addresses, so db may have been
able to exploit the random methods I was using to generate the input
in order to optimize performance at those levels, but the difference
between *dbm and db on just one million real addresses was quite
extreme.
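For anyone who wants to repeat that kind of test, here is a rough
sketch of the method: generate synthetic addresses, load them, then
time random lookups. It is only an illustration. It uses Python's
standard-library dbm.dumb as a stand-in store (not *dbm or Berkeley
DB proper), and the names random_address and build_and_probe, the
example.com domain, and the sizes are all made up for the example.

```python
import dbm.dumb
import os
import random
import string
import tempfile
import time


def random_address(rng):
    """Build one synthetic address (hypothetical format, not real data)."""
    local = "".join(rng.choice(string.ascii_lowercase) for _ in range(8))
    return local + "@example.com"


def build_and_probe(path, count, probes, seed=0):
    """Store `count` synthetic addresses, then time `probes` random lookups.

    Returns (hits, elapsed_seconds). dbm.dumb is a stand-in store here.
    """
    rng = random.Random(seed)
    addresses = [random_address(rng) for _ in range(count)]

    # Load phase: one key per address, with a dummy value.
    with dbm.dumb.open(path, "n") as db:
        for addr in addresses:
            db[addr.encode()] = b"1"

    # Probe phase: look up addresses picked at random from those stored.
    with dbm.dumb.open(path, "r") as db:
        start = time.perf_counter()
        hits = sum(
            1 for _ in range(probes)
            if rng.choice(addresses).encode() in db
        )
        elapsed = time.perf_counter() - start
    return hits, elapsed


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    hits, elapsed = build_and_probe(os.path.join(workdir, "addr"), 1000, 200)
    print(f"{hits} hits in {elapsed:.4f}s")
```

As noted above, randomly generated keys may not behave like real
addresses, so numbers from a sketch like this say little about a
production load.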
Now, I'm sure that this sounds like a case of "if you only have a
hammer", but I'm curious to know why you chose to use *dbm instead
of Berkeley DB?
Among other things, I know that db will try to cache the entire
database in memory, which may or may not be a good thing, depending
on your application (although in your case, I think it would probably
be good). I also know that db gives you lots of options in terms of
storage methods used, and b-tree may be best for some applications,
while a hash may be better for others. Contrariwise, *dbm doesn't
give you any storage method choices that I know of.
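On the caching point, my reading of the 'wc db.dbm' trick you
describe is that it reads the file sequentially, so the kernel has
a chance to hold its pages in memory before the random lookups
start. A minimal sketch of the same idea in Python follows; the
name warm_cache and the 1 MB chunk size are my own choices, and
whether the pages actually stay cached is up to the operating
system and how much memory is free.

```python
def warm_cache(path, chunk_size=1 << 20):
    """Read the file front to back and discard the data.

    Returns the number of bytes read. This only asks the OS to touch
    every page once; it does not guarantee the pages stay in memory.
    """
    total = 0
    with open(path, "rb") as fh:
        while True:
            chunk = fh.read(chunk_size)
            if not chunk:  # end of file
                break
            total += len(chunk)
    return total
```

You would call warm_cache("db.dbm") just before opening the
database, in the same place you run wc now.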
Anyway, I don't think that I have any solutions to your specific
problems, but I am curious to know why *dbm was chosen over Berkeley
DB.
--
Brad Knowles, <brad@stop.mail-abuse.org>
"Those who would give up essential Liberty, to purchase a little
temporary Safety, deserve neither Liberty nor Safety."
-- Benjamin Franklin (1706-1790), reply of the Pennsylvania
Assembly to the Governor, November 11, 1755
SAGE member since 1995. See <http://www.sage.org/> for more info.