31 Commits

Author SHA1 Message Date
Zak Betz
ff6caf79a2 Increase time to mark item as stale in warc injector. 2015-11-01 19:45:29 -07:00
Zak Betz
aeca57e9f4 Pass in the buffer size of an injection request so that if the content
length header field is bigger than the actual buffer we won't index
random memory.  Fixes bug with truncated warc captures.
2015-10-28 00:38:08 -06:00
Zak Betz
f7bb617b85 Fixes for bad content lengths when injecting warcs. 2015-10-26 22:15:03 -06:00
Zak Betz
089b36e050 Injector fixes. 2015-10-20 17:01:05 -06:00
Zak Betz
925fea29f4 Bug fix for search with facets with s=N | N > 0
Make warc injector more resilient to advancedsearch.php failure.
2015-10-19 18:28:15 -06:00
Zak Betz
667e65ce01 Progress bar for warc injector. 2015-10-19 10:08:04 -06:00
Zak Betz
ea139a65e6 Warc stream busy loop fixes.
Load balance msg22 to the one with the least outstanding requests.
2015-10-15 22:30:07 -06:00
Zak Betz
6a40315237 Merge branch 'ia-zak' of https://github.com/gigablast/open-source-search-engine into warc-stream
Conflicts:
	XmlDoc.cpp
2015-10-12 00:32:52 -06:00
Zak Betz
45744d74f3 Merge branch 'ia-zak' of https://github.com/gigablast/open-source-search-engine into warc-stream 2015-10-07 08:46:07 -06:00
Zak Betz
49ec9c99fd Don't restart all items when forcing a list of items into injector. 2015-10-05 15:33:09 -06:00
Zak Betz
c947252fee Add gbcapturedate to individual doc's metadata when injecting warcs. 2015-10-04 01:53:54 -06:00
Zak Betz
6becb55a2b Stream warcs instead of downloading them and unzipping them on disk. 2015-09-30 22:25:59 -06:00
Zak Betz
55169be6fc Warc injector update. 2015-09-21 09:31:59 -06:00
Zak Betz
e50c57db90 Warc injector script 2015-09-12 14:08:42 -06:00
Zak Betz
a270e163de Fix coring on udp timeout when clustering search results.
Add ability to force update a list of items in warc injector.
2015-09-11 11:05:57 -06:00
Zak Betz
41268aeba7 Changes to script to copy to back twins. 2015-08-28 09:06:16 -06:00
Zak Betz
94871521e7 nothing. 2015-08-23 21:06:19 -06:00
Zak Betz
e252dfb088 Add docs per second stat.
Fix auto update on statsdb graph.
Add stat toggles for statsdb graph.
Add a unit test for indexing an array in metadata.
2015-08-22 12:05:20 -06:00
Zak Betz
36b8d384bd Fixes to injector script.
New colors and metrics on performance graph.
2015-08-13 23:29:20 -06:00
Zak Betz
dead58329e Add a script for interacting with hosts.conf files. 2015-07-21 10:17:01 -06:00
Zak Betz
15eb7f659d Fix some malformed html on hosts page.
Fix core when no collection record in injection request.
Add a script to test disk speed.
2015-07-16 12:02:14 -06:00
Zak Betz
a697b3d5a5 Fix Bad File Descriptor loop bug when downloading a static file on a
slow disk.
2015-07-14 17:00:09 -06:00
Zak Betz
6e21bc7d7c Injection script fixes.
Temporary fix for core when injecting large warc.
2015-07-08 14:03:39 -06:00
Zak Betz
87fcda0f93 Fix atotime5 to parse ISO8601.
Fix qa test for warcs and arcs.
Fix inject script.
2015-07-06 00:51:18 -06:00
Zak Betz
6de4199ee8 Fix linkdb core.
Make file and line number the label for StackBuf.
2015-07-02 12:17:10 -06:00
Zak Betz
b88079a2d4 Fix warc injector script. 2015-06-30 22:19:13 -06:00
Zak Betz
7b507a70ef Set value length to 0 for something that does not return a string value
in Json.cpp.
Fix the '-' -> '_' when indexing generic fields.
Add a StackBuf macro, which is a SafeBuf initialized with a small
stack buffer for use in a local scope.
2015-06-30 14:09:57 -06:00
Zak Betz
f490847eb2 Fix injector build script.
Add IA's lib for getting metadata.
2015-06-18 01:23:13 -06:00
Zak Betz
9f61636881 Change collection on inject script. 2015-06-18 00:24:36 -06:00
Zak Betz
9ca0223cf1 Translate metadata field names with dashes to _.
Add unit tests for searching for certain types of metadata.
2015-06-17 23:36:31 -06:00
Zak Betz
32987e76ee Add json metadata field to page inject.
Fix memory leak when spidering warc files.
Add script to inject warcs from Internet Archive's search results.
2015-06-14 20:58:41 -06:00