Dmitry Smirnov
c124eda914
cleanup: remove local zlib. All distros provide zlib1g-dev.
2021-05-05 10:36:21 +10:00
Dmitry Smirnov
3a55b74050
cleanup: removed useless local binaries (libgcc.a libc.a)
2021-05-05 10:24:45 +10:00
Dmitry Smirnov
e18d2396a6
Removed private OpenSSL [hygiene,FTBFS]. All distros provide OpenSSL.
2021-05-05 10:20:11 +10:00
Dmitry Smirnov
7a2bca9649
Compile with "-std=c++98" to fix FTBFS (Closes: #164)
~~~~
Mem.cpp:233:8: error: declaration of ‘void* operator new(size_t) throw (std::bad_alloc)’ has a different exception specifier
233 | void * operator new (size_t size) throw (std::bad_alloc) {
| ^~~~~~~~
~~~~
2021-05-05 10:07:55 +10:00
Gigablast
9146f05574
Merge pull request #157 from shijuraj/master
Merge pull request #1 from gigablast/master
2020-05-04 08:55:25 -06:00
Shijuraj J
c1e5f9fd7f
Merge pull request #1 from gigablast/master
Commits on Jun 02, 2017
2019-09-21 08:24:58 +05:30
Gigablast
4a943f1c79
Merge pull request #136 from vonbetz/master
Fix infinite loop on malformed proxy.
2017-06-02 11:32:56 -06:00
Zak Betz
f10fdada73
Fix infinite loop on malformed proxy.
2017-06-02 11:28:58 -06:00
Matt
3d248732d0
fix to shut up app checker.
2016-11-04 17:28:26 -06:00
Matt
c0b2cdb60a
hide the verify disk writes parm, seems to be causing
cores when activated. and shouldn't really need to be used.
is for debugging disk issues.
2016-11-04 17:09:15 -06:00
Matt
8891100c2a
fix add url on root page to set collnum properly.
fix Summary::getBestWindow() underrun bug.
2016-04-06 10:31:04 -06:00
Matt
70ca2fe48c
update ./gb -h desc for ./gb inject.
2016-04-05 21:06:38 -06:00
Gigablast
f5d0045b43
Merge pull request #82 from vonbetz/testing
Fix for сацминэнергорф --> сацминэнерго.рф in getDisplayUrl(...)
2016-03-29 13:12:56 -06:00
Zak Betz
3c140b87aa
Merge branch 'testing' of https://github.com/gigablast/open-source-search-engine into testing
2016-03-29 12:42:05 -06:00
Zak Betz
cf7ec13de6
Fix international domain printing bug.
2016-03-29 12:41:34 -06:00
Matt
33e76af1d1
Merge branch 'testing'
2016-03-29 04:11:30 -06:00
Matt
816d69b34c
a lot of bug fixes thanks to isj.
2016-03-29 04:08:17 -06:00
Matt
5072e851b7
fix misspelling
2016-03-28 17:26:40 -06:00
Matt
5935619eb2
hack on parentUrlDocId to the json object dump
of diffbot objects.
2016-03-28 12:39:48 -06:00
Matt
cab6d5c519
fix keysize==8 bug in keycmp
2016-03-28 09:17:01 -06:00
Matt
b65a16caee
Merge branch 'diffbot-testing' into testing
2016-03-22 16:25:21 -06:00
Matt Wells
3c743a7d0e
allow more docids to be downloaded/served in search results.
2016-03-22 15:24:33 -07:00
Matt Wells
04a8433256
show gbssParentDocId in status doc for children docs,
like diffbot object docs.
2016-03-22 09:00:10 -07:00
Matt Wells
483d69d7f7
added httprequest debug line
2016-03-21 14:46:10 -07:00
Matt Wells
136d23816c
fix hashbang properly
2016-03-21 09:29:55 -07:00
Matt
48398d0cd7
Merge branch 'diffbot-testing' into testing
2016-03-20 23:14:26 -06:00
Matt Wells
136b8842db
fix more data corruption bugs. hopefully
will dump out all the collections this time and
not leave any in the tree, otherwise, especially if there
are a lot left behind, they get corrupted.
2016-03-20 21:04:01 -07:00
Matt Wells
61ef806dea
hash bang fix.
detect more corruption.
don't dump titledb and spiderdb at same time,
seems to reduce corruption in rdbmem.
2016-03-20 12:50:43 -07:00
Matt Wells
fc495a5bf5
fix dump core when collection deleted while dumping
2016-03-18 06:46:38 -07:00
Matt
8922b8e69c
Merge branch 'diffbot-testing' into testing
2016-03-17 14:31:22 -06:00
Matt Wells
56bde4c3ef
fix the data corruption fix
2016-03-17 13:22:56 -07:00
Matt Wells
8bc653c31c
after dump completes scan tree to ensure all nodes
reference secondary mem ptr so they don't get their
data overwritten.
2016-03-17 10:09:49 -07:00
Matt Wells
0caf345850
if running ./gb start and another gb is already bound on the port
then quickly exit(0) and have the bash keep alive loop exit the loop
based on that return value. we can't use ./cleanexit file because it
doesn't get removed and will mess up the main process that is running.
2016-03-16 16:56:48 -07:00
Matt Wells
36fdbf2f5a
rename log files in the gb main.cpp code not in the
bash loop. do not rename the log file if failed to start
gb because socket was already bound. prevents us from double
starts moving the log file, which is annoying.
2016-03-16 16:08:08 -07:00
Matt Wells
a2e8a3a1fd
use ./cleanexit file to ensure gb doesn't restart
after a graceful exit in the bash keep alive loop.
2016-03-16 14:57:19 -07:00
Matt Wells
7396e57660
show docids of corrupted title recs found.
show key range of each dump to disk.
fix 'sentToDiffbot' bug for unchanged docs in status docs.
make sure firstKeyInQueue is set properly from current key,
so reset list ptr before doing that in RdbDump.cpp.
2016-03-16 13:53:08 -07:00
Matt
5e8c47adfd
Merge branch 'diffbot-testing' into testing
2016-03-16 01:14:37 -06:00
Matt Wells
1faff50f5a
if msg22a never called to get docid, then
error out.
2016-03-16 00:14:02 -07:00
Matt
c7c8c9e5ad
Merge branch 'diffbot-testing' into testing
2016-03-16 00:54:49 -06:00
Matt Wells
0b5f417349
if old title rec was corrupted we would get a random docid
when re-spidering the url causing some chaos. now things
should return to normal and we should overwrite the corrupted
titlerec on the next spidering. also, no longer do robots.txt
titlerec lookups. silly.
2016-03-15 23:26:57 -07:00
Matt Wells
58993dbbf9
do not allow crawlbot seeds to be deduped out
2016-03-15 20:42:28 -07:00
Matt Wells
bf45db6f48
Merge branch 'diffbot-testing' into testing
2016-03-15 15:55:55 -07:00
Matt Wells
8a65d21371
fix the source of lots of corruption in spiderdb and titledb.
rdbmem.cpp was storing in secondary mem which got reset when
dump completed. also do not add keys that are in collnum and
key range of list currently being dumped, return ETRYAGAIN.
added verify writes parm. clean out tree of titledb and spiderdb
corruption on startup.
2016-03-15 15:54:12 -07:00
Matt Wells
0fdbaa4196
makefile optimizations
2016-03-14 16:34:24 -07:00
Matt
0dbc304bbf
fix to allow us to gather ip-only url outlinks again
2016-03-14 10:56:33 -06:00
Matt
2c167aada7
fix redirect to self bug that requires setting cookie
2016-03-14 10:33:05 -06:00
Matt Wells
d6fe684b99
fix another core caused by deleted coll
2016-03-07 10:20:25 -08:00
Matt Wells
d4e16a4dab
pass a crawlbotnightly smoke
2016-03-04 13:14:28 -08:00
Matt Wells
e75d80abbe
ignore meta redirect tags in html comment tags.
2016-02-22 12:41:03 -08:00
Matt Wells
412b04bbd4
fix neverending crawl rounds by only trying each url
once per round. updated url filters.
2016-02-22 09:28:46 -08:00