4 Commits

Author SHA1 Message Date
mwells
3374ce450a fix a couple of catdb generation bugs:
a MAX_CATIDS violation was causing corruption, and
the catdb tree was not being saved to catdb-saved.dat,
causing missing catdb recs.
2013-10-12 20:33:04 -07:00
mwells
f562e6da9a just ignore all urls with # (hashtag) in them
from the dmoz dump. we were truncating
http://twitter.com/#!/ronpaul to
http://twitter.com/, so looking up
the catids of twitter.com returned that ronpaul url.
so that's bad. people should respect the hashtag.
2013-10-03 23:33:55 -06:00
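As a rough illustration of the change described in f562e6da9a, the sketch below drops any dump URL that contains `#` instead of truncating it at the fragment. It is a hypothetical Python sketch, not the repository's code: the function name `filter_dmoz_urls` and the sample list are assumptions made for illustration only.

```python
def filter_dmoz_urls(urls):
    """Return only the URLs that contain no '#' character.

    Hypothetical helper: skipping these URLs outright avoids truncating
    http://twitter.com/#!/ronpaul to http://twitter.com/ and filing its
    categories under the wrong page.
    """
    return [url for url in urls if "#" not in url]


# Sample input (assumed, for illustration only).
sample = [
    "http://twitter.com/#!/ronpaul",
    "http://twitter.com/",
    "http://example.com/page",
]
kept = filter_dmoz_urls(sample)
print(kept)
```

Skipping the entry is the simpler choice here: a truncated URL would still be a valid key, so the mistake would go unnoticed at lookup time.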
mwells
a0c79932bb catdb is now generated successfully.
2013-10-02 23:36:49 -06:00
Matt Wells
f6e560c1f4 Initial file population.
2013-08-02 13:12:24 -07:00