Commit Graph

116 Commits

Author SHA1 Message Date
Matt
80991c943f complete merge of ia code into testing.
make indexing warcs/arcs a switch in spider controls.
2015-11-09 12:46:06 -07:00
Matt
0f453d5cdf Merge branch 'ia-zak' into testing 2015-10-07 10:02:38 -06:00
Matt
16db36252c Merge branch 'diffbot-testing' into testing 2015-10-07 10:02:06 -06:00
Matt
df1c7f6e0f update qa.cpp syntax test to do &n=100
for gbssStatusCode:0 query
2015-10-05 17:31:35 -06:00
Zak Betz
c947252fee Add gbcapturedate to individual doc's metadata when injecting warcs. 2015-10-04 01:53:54 -06:00
Matt
cb4bbe8892 Merge branch 'ia' into ia-zak 2015-09-30 07:58:31 -06:00
Matt
d4c677170f index metadata on EDOCUNCHANGED errors, and append new meta data
to XmlDoc::ptr_metadata.
2015-09-30 07:57:40 -06:00
Matt
100888d691 fix file/dir creation permissions bugs 2015-09-21 12:44:41 -06:00
Matt
74cde33a3a just use the user's umask val for all file/dir creation 2015-09-21 11:33:38 -06:00
Matt
ce7b06fc4d all files made are now group writable.
if you don't like that then you can make
a special group and set the directory just
group writable for that group using chmod g+s <dir>.
2015-09-21 11:19:34 -06:00
Matt
c803e0906e fix </script> tag detection stuff again. 2015-08-31 14:06:44 -06:00
Zak Betz
e252dfb088 Add docs per second stat.
Fix auto update on statsdb graph.
Add Stat toggles for statsdb graph.
Add a unit test for indexing an array in metadata.
2015-08-22 12:05:20 -06:00
Zak Betz
a7ae510e31 Fix string faceting display for json metadata.
Add unit test for faceted metadata.
2015-07-06 23:05:18 -06:00
Zak Betz
87fcda0f93 Fix atotime5 to parse ISO8601.
Fix qa test for warcs and arcs.
Fix inject script.
2015-07-06 00:51:18 -06:00
Zak Betz
7b507a70ef Set value length to 0 for something that does not return a string value
in Json.cpp.
Fix the '-' -> '_' when indexing generic fields.
Add a StackBuf macro which is a Safebuf initialized with a small
stack buffer for use in a local scope.
2015-06-30 14:09:57 -06:00
Zak Betz
9ca0223cf1 Translate metadata field names with dashes to _.
Add unit tests for searching for certain types of metadata.
2015-06-17 23:36:31 -06:00
Zak Betz
32987e76ee Add json metadata field to page inject.
Fix memory leak when spidering warc files.
Add script to inject warcs from internet archives search results.
2015-06-14 20:58:41 -06:00
Zak Betz
e399a8b0aa Add qa test for arc and warc files. Change XmlDoc to use timeaxis url
when creating the titlerec key instead of the firsturl.
2015-05-21 15:19:33 -06:00
Zak Betz
36037c23a1 Add a test for useTimeAxis. 2015-05-12 15:18:38 -06:00
Matt
697b8307b2 fix qa test to make it easier to see the real diffs 2015-04-30 19:38:27 -07:00
Matt
e2eba10068 qa test fix 2015-04-28 13:48:29 -07:00
Matt
f26c9d609b one more qa test fix for spider status docs 2015-04-01 12:47:32 -06:00
Matt
5e46262cb2 more fixes for qa'ing of new spider status docs 2015-04-01 12:03:17 -06:00
Matt
10a31783bb fixes to pass internal qa tests in light
of gbss (spider status doc) changes and other things.
had to make xmldoc.o -O2 instead of -O3 to fix strange bug.
2015-04-01 11:20:36 -06:00
Matt
6b293f17e6 now show "totalDocsWithField" for each facet, so we know
how many docs had that field, with any particular value,
so we can do tf/idf type things.
2015-04-01 09:16:42 -06:00
Matt
8e72d6e4cc fix a couple critical xml parsing bugs. fixes
parsing of rss feeds better and xml in general.
fixed qa tests to ignore collection list when doing diff.
2015-03-10 19:13:21 -07:00
Matt
e8e5f9e005 qa test fixes 2015-03-05 07:45:28 -08:00
Matt
856823e862 fix qa test some. 2015-02-19 20:18:30 -07:00
Matt
ef99aabf4d try to fix qainject1 core in qa.cpp 2015-02-17 20:17:59 -07:00
Matt
dce8d9f930 fix qa bug of not resetting s_i.
fix tcpserver.cpp bug of destroying a streaming
socket after what is really not the final write.
2015-02-17 20:10:13 -07:00
Matt
c0332d4381 fix qa 2015-01-31 18:42:31 -07:00
Matt
72b6546ed9 fix some smoke tests 2015-01-22 15:53:04 -07:00
Matt
faaaf3cb89 smoke test for query fix 2015-01-22 14:56:51 -07:00
Matt
e178c67f4b do not core on qa test fail 2014-12-17 16:31:37 -08:00
Matt
27db9d57a1 added undeletable posdb key test to qainject1().
caught an undeletable rec and fixed that in xmldoc.cpp.
2014-12-16 13:29:04 -08:00
Matt
578cde9d9d fix sections.cpp to not set root title section
to tagid TAG_TITLE.
2014-12-11 19:54:33 -08:00
Matt
b89f071f7c quite a few bug fixes from adding the new query
syntax qa test.
2014-12-11 18:24:28 -08:00
Matt
0460335861 more permission system updates 2014-12-08 09:49:17 -08:00
Matt
41c8817bdb fixed summary initialization error
of the flags buffer.
fixed term freq algo. use exact term freq
for qatest123. made Summary.o -O3 again.
fix gbsystem() to disable both timers.
2014-12-06 10:14:48 -07:00
Matt
5b92b5f6d5 now term freqs are almost exact for qatest123.
sometimes an off by 1 bug. we should really call
msg5 to get the list w/o thread and get a truly
exact term freq for qatest123 for consistency.
that would be in Posdb.cpp::getTermFreq()
2014-11-25 15:54:15 -07:00
Matt
266d97608a fix a few more 64-bit conversion cores 2014-11-20 16:12:18 -08:00
Matt
4a0554c76f more 64bit fixes 2014-11-14 17:30:32 -08:00
Matt
4c19453ea9 working with -m32 for basic testing.
compiles for 64-bit.
2014-11-12 11:38:37 -08:00
Matt
96b8197ad3 now it compiles with -m32 2014-11-10 14:45:11 -08:00
Matt Wells
b13f3d24d7 replaced unsigned long long with uint64_t 2014-10-30 13:30:39 -06:00
mwells
ce56fb93ab fix qa test so we can roll out proxy code. 2014-09-30 15:40:02 -07:00
mwells
a8c5d6a46e fix gbfacetstr: operator for xml docs 2014-09-28 12:09:04 -07:00
mwells
7d3bcd7672 1 spider out at a time for qa test consistency 2014-09-28 11:00:31 -07:00
mwells
7a0f9fe370 fix support for indexing xml docs.
no longer use hacks gbxmltitle and gbxmllinks.
no longer convert html entities for xml docs using hacks
since we have XmlDoc::hashXmlFields() function.
added qaxml() qa test to test xml doc indexing and searching.
ignore <?xml> tag when generating xml tag compound name.
2014-09-28 10:43:41 -07:00
mwells
0267e865b8 minor fixes 2014-09-27 17:01:16 -07:00