
127 Commits

Author SHA1 Message Date
Joshua Clayton
ad9308a672 Encapsulate runtime config into a withRuntime wrapper 2016-06-09 06:44:18 -04:00
Joshua Clayton
09231cdccd Cover case within autoLowLikelihood where no matches are present
Why?
====

In cases where no matchers are present, a language config should not
auto-classify every match as low-likelihood; instead, it should return
False so subsequent checks can operate on the match itself.

This is related to 9bf9499e67f52bcde2420bfe3945f73cfdaa06d7; both are
handling cases where less configuration data than ideal is present and
the program still needs to operate correctly.
2016-06-07 05:19:40 -04:00
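Here is a minimal Haskell sketch of why an empty matcher list misbehaves. It assumes, hypothetically, that a language config's matchers are combined with `all`. `all` over an empty list is vacuously `True`, so every term would be flagged. The `Matcher` type and the function names are illustrative and are not unused's actual API.

```haskell
import Data.List (isPrefixOf, isSuffixOf)

-- Hypothetical matcher type; the real unused configuration may differ.
data Matcher
  = StartsWith String
  | EndsWith String
  | Equals String

matches :: String -> Matcher -> Bool
matches term (StartsWith prefix) = prefix `isPrefixOf` term
matches term (EndsWith suffix) = suffix `isSuffixOf` term
matches term (Equals other) = other == term

-- Naive version: `all` over an empty list is vacuously True, so a config
-- with no matchers flags every term as low-likelihood.
autoLowLikelihoodNaive :: [Matcher] -> String -> Bool
autoLowLikelihoodNaive matchers term = all (matches term) matchers

-- Guarded version: no matchers means "no opinion", so return False and let
-- subsequent checks classify the match.
autoLowLikelihood :: [Matcher] -> String -> Bool
autoLowLikelihood [] _ = False
autoLowLikelihood matchers term = all (matches term) matchers
```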
Joshua Clayton
8a294a6acc Don't cache an empty list of matches
Why?
====

If a list of matches is empty, there's no reason to cache it.
2016-06-05 08:09:52 -04:00
Joshua Clayton
9bf9499e67 Ensure results work when no config can be loaded
Why?
====

If no config can be loaded, unused should still function correctly,
albeit likely with more false positives.
2016-06-05 08:09:02 -04:00
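This is a minimal sketch of the fallback. It assumes that config loading yields an `Either` carrying an error message. `LanguageConfig`, `loadedConfigs`, and `isAllowedTerm` are hypothetical names. With no configs, nothing is auto-allowed, so terms are judged on raw counts alone.

```haskell
-- Hypothetical per-language configuration after parsing.
data LanguageConfig = LanguageConfig
  { lcName :: String
  , lcAllowedTerms :: [String]
  } deriving (Eq, Show)

-- If loading fails entirely, fall back to an empty list of configs so the
-- rest of the program keeps working (with more false positives).
loadedConfigs :: Either String [LanguageConfig] -> [LanguageConfig]
loadedConfigs = either (const []) id

-- With an empty config list, no term is ever auto-allowed.
isAllowedTerm :: [LanguageConfig] -> String -> Bool
isAllowedTerm configs term = any (elem term . lcAllowedTerms) configs
```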
Joshua Clayton
54e55c46a2 Add missing spec name to unused.cabal 2016-06-05 07:42:18 -04:00
Joshua Clayton
5c4e0c1ccd Allow allowedTerms and autoLowLikelihood to be optional configs 2016-06-04 08:11:47 -04:00
Joshua Clayton
9cc640aef7 Bump to 0.3.0.0 2016-06-04 06:53:53 -04:00
Joshua Clayton
7c26ae8e72 Move withoutCursor to wrap run
Why?
====

If parsing options fails, the program will exit; if
withoutCursor has been called prior to execParser, the program may exit
without any way to re-enable the cursor. This can cause confusion and
frustration for users.
2016-06-04 06:52:01 -04:00
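The guarantee this commit is after can be sketched with `bracket_` from base. The cursor is restored whenever the wrapped action returns or throws. This `withoutCursor` is a hypothetical stand-in that writes ANSI escape codes directly, and it is not necessarily how unused implements it.

```haskell
import Control.Exception (bracket_)
import System.IO (hFlush, stdout)

-- ANSI (DECTCEM) escape sequences to hide and show the terminal cursor.
hideCursor, showCursor :: IO ()
hideCursor = putStr "\ESC[?25l" >> hFlush stdout
showCursor = putStr "\ESC[?25h" >> hFlush stdout

-- Hypothetical withoutCursor: bracket_ runs showCursor after the action,
-- whether it returns normally or throws (including on Ctrl-C).
withoutCursor :: IO a -> IO a
withoutCursor = bracket_ hideCursor showCursor
```

With this shape, `main` would parse options first (e.g. via `execParser`) and only then call `withoutCursor (run options)`. A parse failure then exits before the cursor is ever hidden.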
Joshua Clayton
0505b4bff3 Re-enable cache by default
Why?
====

With SHA fingerprinting speeds improved drastically by f618d8a796, we can
now re-enable caching by default.

This introduces a -C flag to disable the cache for a run.

Note that the cache is always invalidated when files are modified.
2016-06-04 06:44:41 -04:00
Joshua Clayton
792d0dca05 Minor reorganization within Main.hs 2016-06-02 08:22:53 -04:00
Joshua Clayton
6ffb098b20 Initial support of aliases based on wildcard matching
Why?
====

Dynamic languages and frameworks, Rails in particular, support some fun
method creation. One common pattern within RSpec is to create matchers
dynamically based on predicate methods. Two common examples are:

* `#admin?` gets converted to the matcher `#be_admin`
* `#has_active_todos?` gets converted to the matcher `#have_active_todos`

This especially comes into play when writing page objects with predicate
methods.

This change introduces the concept of aliases, a way to describe the
before/after for these transformations. This introduces a direct swap
with a wildcard value (%s), although this may change in the future to
support other transformations for pluralization, camel-casing, etc.

Externally, aliases are not grouped together by term; however, the
underlying counts are summed together, increasing the total occurrences
and likely pushing the individual method out of "high" likelihood into
"medium" or "low" likelihood.

Closes #19.
2016-06-01 22:16:44 -04:00
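This sketch shows the `%s` swap. It assumes each alias is a pair of patterns that each contain exactly one wildcard. `Alias`, `applyAlias`, and `combinedCount` are hypothetical names, and the real matching rules may differ. The captured value must be non-empty, and the counts for a term and its aliased forms are summed.

```haskell
import Data.List (isSuffixOf, stripPrefix)
import qualified Data.Map.Strict as Map

-- Hypothetical alias: "from" and "to" patterns, each with one %s wildcard.
data Alias = Alias { aliasFrom :: String, aliasTo :: String }

-- Split a pattern around its first "%s" into (before, after).
splitWildcard :: String -> Maybe (String, String)
splitWildcard = go ""
  where
    go acc ('%' : 's' : rest) = Just (reverse acc, rest)
    go acc (c : rest) = go (c : acc) rest
    go _ [] = Nothing

-- Capture the (non-empty) wildcard value from a term matching a pattern.
capture :: String -> String -> Maybe String
capture pat term = do
  (before, after) <- splitWildcard pat
  middle <- stripPrefix before term
  if after `isSuffixOf` middle && length middle > length after
    then Just (take (length middle - length after) middle)
    else Nothing

-- Substitute a captured value into a pattern.
fill :: String -> String -> Maybe String
fill pat value = do
  (before, after) <- splitWildcard pat
  pure (before ++ value ++ after)

applyAlias :: Alias -> String -> Maybe String
applyAlias (Alias from to) term = capture from term >>= fill to

-- Sum the occurrences of a term and of every aliased form of it.
combinedCount :: [Alias] -> Map.Map String Int -> String -> Int
combinedCount aliases counts term =
  sum [Map.findWithDefault 0 t counts | t <- term : aliasedForms]
  where
    aliasedForms = [t | Just t <- map (`applyAlias` term) aliases]
```

For example, `applyAlias (Alias "%s?" "be_%s") "admin?"` yields `Just "be_admin"`, and `Alias "has_%s?" "have_%s"` turns `has_active_todos?` into `have_active_todos`.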
Joshua Clayton
0dcb06fe70 Incorporate hlint suggestions 2016-06-01 05:36:32 -04:00
Joshua Clayton
95836e536b Use isPrefixOf and isSuffixOf for simple start/end matches over regex 2016-05-29 16:33:05 -04:00
Joshua Clayton
ef0fb49841 Classify Phoenix controller actions as always low-likelihood 2016-05-27 15:01:33 -04:00
Joshua Clayton
965cc0a178 Move allowedTerms into auto low-likelihood classification
Why?
====

This ensures method/function names don't bleed between languages, which
can cause confusing miscalculations when methods/functions are reused
across different types of projects (e.g. `index` from Rails migrations
and the `index` action in Phoenix controllers).
2016-05-27 15:01:33 -04:00
Joshua Clayton
6eb2e38882 Use ReaderT for ColumnFormatter
Why?
====

When printing results, the column formatter has to be configured at the
topmost level (where it has all result data) to calculate widths
appropriately; however, it's only used several layers deeper, when
rendering the columns themselves.

This moves the formatter into a ReaderT so the configuration can be
passed around appropriately.
2016-05-27 11:17:48 -04:00
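This is a minimal sketch of the idea. It uses `Reader` from transformers, which is `ReaderT` over `Identity`. Widths are computed once from every row, and `renderRow` reads them with `asks` instead of receiving them as arguments. The fields of `ColumnFormatter` and the two-column layout are assumptions for illustration.

```haskell
import Control.Monad.Trans.Reader (Reader, asks, runReader)

-- Hypothetical formatter: column widths computed once from all results.
data ColumnFormatter = ColumnFormatter
  { cfTermWidth :: Int
  , cfCountWidth :: Int
  }

padRight :: Int -> String -> String
padRight n s = s ++ replicate (n - length s) ' '

padLeft :: Int -> String -> String
padLeft n s = replicate (n - length s) ' ' ++ s

-- Top level: sees every row, so it can compute widths.
buildFormatter :: [(String, Int)] -> ColumnFormatter
buildFormatter rows = ColumnFormatter
  { cfTermWidth = maximum (0 : map (length . fst) rows)
  , cfCountWidth = maximum (0 : map (length . show . snd) rows)
  }

-- Deep in the view code: widths come from the environment.
renderRow :: (String, Int) -> Reader ColumnFormatter String
renderRow (term, count) = do
  termWidth <- asks cfTermWidth
  countWidth <- asks cfCountWidth
  pure (padRight termWidth term ++ " " ++ padLeft countWidth (show count))

renderRows :: [(String, Int)] -> [String]
renderRows rows = runReader (mapM renderRow rows) (buildFormatter rows)
```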
Joshua Clayton
4dfd788318 Extract views to a Views module
Why?
====

View logic was scattered all over the place; this introduces a views
module to encapsulate any corresponding view work into one spot.
2016-05-27 06:11:52 -04:00
Joshua Clayton
1b945eba18 Update image to demonstrate usage 2016-05-27 06:10:50 -04:00
Joshua Clayton
2a74a807ac Bump to LTS 6.0 2016-05-26 22:24:28 -04:00
Joshua Clayton
0d2470815d Simplify parsing and caching of results
Why?
====

Parsec is overkill when all that's really needed is splitting on
semicolons and converting a string to a non-negative Int.

One side-effect of this is to convert the caching mechanism from flat
text to CSV, with cassava handling (de-)serialization.

Additional
==========

Introduce ReaderT to calculate the SHA once per cache interaction

Previously, we were calculating the fingerprint (SHA) for match results
potentially twice, once when reading from the cache, and a second time
if no cache was found. This introduces a ReaderT to manage cache
interaction with a single fingerprint calculation.

This also abstracts what's being cached to only care about the fact that
the data can be converted to/from CSV.
2016-05-26 21:37:11 -04:00
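This sketch computes the fingerprint once and shares it through `ReaderT`. It assumes an in-memory `IORef` map as a hypothetical stand-in for the CSV cache on disk. `withCache`, `readCache`, and `writeCache` are illustrative names. The fingerprint action runs once per `withCache` call, no matter which branch is taken.

```haskell
import Control.Monad.Trans.Class (lift)
import Control.Monad.Trans.Reader (ReaderT, ask, runReaderT)
import Data.IORef (IORef, modifyIORef', newIORef, readIORef)
import qualified Data.Map.Strict as Map

type Fingerprint = String

-- Hypothetical stand-in for the on-disk cache, keyed by fingerprint.
type Cache = IORef (Map.Map Fingerprint [String])

readCache :: Cache -> ReaderT Fingerprint IO (Maybe [String])
readCache ref = do
  fp <- ask
  lift (Map.lookup fp <$> readIORef ref)

writeCache :: Cache -> [String] -> ReaderT Fingerprint IO ()
writeCache ref results = do
  fp <- ask
  lift (modifyIORef' ref (Map.insert fp results))

-- Compute the fingerprint once, then share it for both read and write.
withCache :: IO Fingerprint -> Cache -> IO [String] -> IO [String]
withCache computeFingerprint ref compute = do
  fp <- computeFingerprint
  runReaderT go fp
  where
    go = do
      cached <- readCache ref
      case cached of
        Just results -> pure results
        Nothing -> do
          results <- lift compute
          writeCache ref results
          pure results
```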
Joshua Clayton
f618d8a796 Use .gitignore to determine files for fingerprinting a project
Why?
====

Because a .gitignore file captures a fair number of project-specific
directories and files to ignore, we can use this list to reduce the
number of files to look at when determining a fingerprint for a project.

Because the fingerprint should be based on files we care about changing,
the project-specific .gitignore is a great place to start.

This drastically reduces fingerprinting time. For larger projects, or
projects with a massive number of files (e.g. anything doing significant
work with NPM and a front-end framework), this will help make caching
usable. For normal projects, this cuts fingerprint calculation to
10%-20% of its previous time.

Closes #38
2016-05-26 17:19:35 -04:00
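This is a deliberately simplified sketch of using `.gitignore` to narrow the file list before fingerprinting. It only understands bare names such as `node_modules/` or `/tmp`. Globs and negation (`!`) are skipped, and patterns with inner slashes never match, so real `.gitignore` semantics are not reproduced. `ignoredNames` and `filesToFingerprint` are hypothetical names.

```haskell
import Data.List (isPrefixOf)
import System.FilePath (splitDirectories)

-- Simplified parse: keep bare names only. Comments, blank lines,
-- negations, and glob patterns are dropped.
ignoredNames :: String -> [String]
ignoredNames contents =
  [ stripSlashes line
  | line <- lines contents
  , not (null line)
  , not ("#" `isPrefixOf` line)
  , not ("!" `isPrefixOf` line)
  , not (any (`elem` "*?[") line)
  ]
  where
    stripSlashes = reverse . dropWhile (== '/') . reverse . dropWhile (== '/')

-- Keep a path only if none of its components is an ignored name.
filesToFingerprint :: [String] -> [FilePath] -> [FilePath]
filesToFingerprint ignored =
  filter (not . any (`elem` ignored) . splitDirectories)
```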
Joshua Clayton
279cdfa494 Be more flexible with progress indicator types 2016-05-26 05:56:24 -04:00
Joshua Clayton
fb1198dc2f Operate on purer data sets 2016-05-23 08:22:09 -04:00
Joshua Clayton
db10b79f93 Extract results printing to separate function 2016-05-23 08:18:43 -04:00
Joshua Clayton
3f402120d9 Begin moving away from ParseResponse 2016-05-23 06:36:06 -04:00
Joshua Clayton
e006b9c2dd Fix wording around how unused finds tokens to work with 2016-05-23 05:02:54 -04:00
Joshua Clayton
6db5cb0120 Rails allowed terms covering inflections, concerns, rendering 2016-05-22 07:51:11 -04:00
Joshua Clayton
697f8b4135 Token matches can occur at the beginning or end of a file 2016-05-22 07:03:37 -04:00
Joshua Clayton
b7aefe66d0 Bump version to 0.2.0.0 2016-05-22 06:15:00 -04:00
Joshua Clayton
43edf288e2 Attempt to find and load tags automatically
Why?
====

How often a tool gets used depends on how easy it is to use. Requiring
users to pipe in ctags files every time, without providing any guidance,
makes this program merely a toy: it's hard to get right, and harder to
explore.

This modifies the default behavior to look for a ctags file in a few
common locations, and lets the user specify a custom location if she so
chooses.

Resolves #35
2016-05-22 06:06:09 -04:00
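This sketch shows the lookup order. It assumes that a user-supplied path wins and that otherwise a few default candidates are checked in order. The candidate list (`tags`, `.git/tags`, `tmp/tags`) and the function names are hypothetical. `firstExisting` takes the existence check as a parameter, so it can be exercised without touching the filesystem.

```haskell
import System.Directory (doesFileExist)

-- Hypothetical default locations, checked in order.
defaultTagLocations :: [FilePath]
defaultTagLocations = ["tags", ".git/tags", "tmp/tags"]

-- Return the first path for which the existence check succeeds.
firstExisting :: Monad m => (FilePath -> m Bool) -> [FilePath] -> m (Maybe FilePath)
firstExisting _ [] = pure Nothing
firstExisting exists (path : rest) = do
  found <- exists path
  if found then pure (Just path) else firstExisting exists rest

-- A user-supplied path takes precedence over the defaults.
findTagsFile :: Maybe FilePath -> IO (Maybe FilePath)
findTagsFile (Just custom) = firstExisting doesFileExist [custom]
findTagsFile Nothing = firstExisting doesFileExist defaultTagLocations
```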
Joshua Clayton
c5f2a38e80 Rails config: handle i18n and migration methods 2016-05-21 06:29:28 -04:00
Joshua Clayton
307dd2030f Introduce internal yaml configuration of auto low likelihood match handling
Why?
====

Handling low likelihood configuration was previously a huge pain,
because the syntax in Haskell was fairly terse. This introduces a yaml
format internally that ships with the app covering basic cases for
Rails, Phoenix, and Haskell. I could imagine adding baselines for other
languages and frameworks (especially ones I've used and am comfortable
with).

This also paves the way for searching for user-provided additions and
loading those configurations in addition to what we have here.
2016-05-21 05:34:18 -04:00
Joshua Clayton
6c9912fa29 Switch to faster/more naive implementation of String to Int conversion
Why?
====

This library converts lots of strings to positive integers
(specifically, when it's parsing output from search results). Because
these are always positive integers, we can make more assumptions about
the data and how to parse the values.

Corresponding benchmark: https://gist.github.com/joshuaclayton/767c507edf09215d08cdd79c93a5f383
2016-05-18 11:17:13 -04:00
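This sketch shows the naive conversion. It assumes the input is a non-empty string of ASCII digits, as counts in search output are. Unlike `read`, it does no validation, so signs, whitespace, and non-digits produce garbage rather than an error. `naiveStringToInt` is a hypothetical name.

```haskell
import Data.Char (ord)
import Data.List (foldl')

-- Assumes ASCII digits only; no sign handling and no validation.
naiveStringToInt :: String -> Int
naiveStringToInt = foldl' step 0
  where
    step acc c = acc * 10 + (ord c - ord '0')
```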
Joshua Clayton
4c8e8b2d72 Switch to faster implementation of grouping a list
Why?
====

Immediately after searching the codebase for tokens, we group results
together based on match. This can be slow, especially within large
codebases. This improves on the previous O(n^2) running time.

Code reference: http://stackoverflow.com/a/15412231
Corresponding benchmark: https://gist.github.com/joshuaclayton/3dcde3b19e2c3006ee922053edebc417
2016-05-18 06:45:39 -04:00
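A common `Data.Map`-based replacement for quadratic grouping is sketched here. It is not necessarily the code from the referenced answer. Building the map is O(n log n). Singleton lists are prepended as the map is built, then reversed at the end to keep first-seen order within each group. `groupByKey` is a hypothetical name.

```haskell
import qualified Data.Map.Strict as Map

-- Group values by key; groups come out in ascending key order, and values
-- within a group keep their original order.
groupByKey :: Ord k => (a -> k) -> [a] -> [(k, [a])]
groupByKey key xs =
  Map.toList (Map.map reverse (Map.fromListWith (++) [(key x, [x]) | x <- xs]))
```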
Joshua Clayton
2e3bb0e67c Users opt into using the cache
Why?
====

Calculating the SHA of the entire tree can be expensive; this shifts
reading from/writing to the cache to be configured via a switch in the
CLI.

In the future, it might make sense to store metadata about the repo,
including historical timings for both the SHA calculation and non-cached
runs, so the program can compare them and intelligently choose which to
use.
2016-05-18 06:44:30 -04:00
Joshua Clayton
c77cd2a8f6 Handle Ctrl-C both at top thread and forked thread for progressbar
This ensures Ctrl-C interrupts the main thread as well as kills the
forked thread rendering progress.
2016-05-16 21:52:38 -04:00
Joshua Clayton
44ab0a1435 Read unchanged results from the cache
At some point, this also needs to md5 the tags list itself and factor
that in (if the tagging algorithm changes and new tokens get uncovered,
that should invalidate the cache).
2016-05-16 21:48:36 -04:00
Joshua Clayton
85df4ae01f Return ThreadId with ProgressIndicator
Why?
====

It's common to return a two-tuple of `(ThreadId, a)` when performing a
forking operation to provide a handle to the thread.
2016-05-16 06:06:51 -04:00
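This sketch shows a progress thread that follows the `(ThreadId, a)` convention, using only base. `startProgress`, `tick`, and `withProgress` are hypothetical names. The renderer is released through `bracket`, so it is killed when the work finishes. It is also killed when an exception reaches the main thread, including the `UserInterrupt` raised by Ctrl-C.

```haskell
import Control.Concurrent (ThreadId, forkIO, killThread, threadDelay)
import Control.Concurrent.MVar (MVar, modifyMVar_, newMVar, readMVar)
import Control.Exception (bracket)
import Control.Monad (forever)
import System.IO (hFlush, stdout)

-- Fork a renderer for a shared counter; return its ThreadId as a handle.
startProgress :: IO (ThreadId, MVar Int)
startProgress = do
  counter <- newMVar 0
  tid <- forkIO $ forever $ do
    n <- readMVar counter
    putStr ("\rprocessed: " ++ show n) >> hFlush stdout
    threadDelay 100000
  pure (tid, counter)

-- Workers call this after finishing a unit of work.
tick :: MVar Int -> IO ()
tick counter = modifyMVar_ counter (pure . (+ 1))

-- The renderer thread is killed however the work ends.
withProgress :: (MVar Int -> IO a) -> IO a
withProgress work = bracket startProgress (killThread . fst) (work . snd)
```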
Joshua Clayton
8c5e94c862 Extract interrupt code 2016-05-16 05:59:32 -04:00
Joshua Clayton
fde423f272 Reset color when handling interrupt 2016-05-15 16:18:24 -04:00
Joshua Clayton
2d65555b60 Wrap cursor management into one function 2016-05-15 16:14:40 -04:00
Joshua Clayton
233f83bdf8 Have withInterruptHandler be more transparent 2016-05-15 16:02:21 -04:00
Joshua Clayton
790b62c999 Render base header as soon as we have tokens 2016-05-15 08:35:27 -04:00
Joshua Clayton
e70a7e4e0b Move line handling to occur when binding to results 2016-05-15 08:05:41 -04:00
Joshua Clayton
0e966c9302 Test Util.groupBy 2016-05-15 05:53:29 -04:00
Joshua Clayton
97f083fc2c Use regex in ag for simple words
Why?
====

ag supports using regular expressions for searches; however, the -Q
flag, which was previously always used, resulted in literal search
results.

Searching for literal matches could return too many results. For
example, with a `me` method in a controller, it'd match words like
`awesome` or `method`.

This introduces a check where, if the token being searched is only
composed of word characters (`[A-Za-z0-9_]`), it'll switch over to use
regular expressions with ag and surround the token with non-word matches
on either end. The goal here is to reduce false-positives in matches.
2016-05-14 08:14:54 -04:00
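This sketch shows the argument selection and includes only the pattern-related arguments. `agArgs` and `isWordChar` are hypothetical names. The exact regex is an assumption. Here a simple word is bounded by a non-word character or a line start/end (`^`/`$`), while anything else keeps the literal `-Q` search.

```haskell
import Data.Char (isAsciiLower, isAsciiUpper, isDigit)

-- Word characters as described above: [A-Za-z0-9_].
isWordChar :: Char -> Bool
isWordChar c = isAsciiLower c || isAsciiUpper c || isDigit c || c == '_'

-- Simple words become a bounded regex (assumed pattern); other tokens
-- fall back to a literal search with -Q.
agArgs :: String -> [String]
agArgs token
  | not (null token) && all isWordChar token =
      ["(\\W|^)" ++ token ++ "(\\W|$)"]
  | otherwise = ["-Q", token]
```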
Joshua Clayton
3851c98d59 Initial Haskell support for allowed tokens
This is based on tags generated by Hasktags:

https://hackage.haskell.org/package/hasktags
2016-05-14 06:37:48 -04:00
Joshua Clayton
bcbc1b6462 Allow search result grouping
Why?
====

Grouping results can be helpful to view information differently, e.g. to
see highest-offending files or to remove grouping entirely.

This introduces a flag to allow overriding the default grouping (two
directory levels).
2016-05-14 06:36:01 -04:00
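This sketch shows the default group key. It assumes that grouping takes the first N directory components of each matching file's path. `groupKey` is a hypothetical name, and the flag that sets N is not shown.

```haskell
import System.FilePath (joinPath, splitDirectories, takeDirectory)

-- Group key: the first `depth` directory components of a file's path.
-- A depth of 2 corresponds to the default described above.
groupKey :: Int -> FilePath -> FilePath
groupKey depth = joinPath . take depth . splitDirectories . takeDirectory
```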
Joshua Clayton
a8a9d250e3 Parallelize search
Why?
====

Searching hundreds or thousands of tokens with ag can be slow; this
introduces parallel processing of search so results are returned more
quickly.
2016-05-13 14:46:23 -04:00
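This is a minimal sketch of parallel search that uses only base. It starts one thread per token, collects results in input order, and re-throws a worker's exception in the caller. `parMapIO` is a hypothetical name, and the actual change may use a library instead. For thousands of tokens you would likely want to bound the number of concurrent `ag` processes.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Exception (SomeException, throwIO, try)

-- Catch any exception so the worker always fills its MVar.
tryAny :: IO a -> IO (Either SomeException a)
tryAny = try

-- Run one action per input concurrently; results keep input order.
parMapIO :: (a -> IO b) -> [a] -> IO [b]
parMapIO f xs = do
  vars <- mapM spawn xs
  mapM wait vars
  where
    spawn x = do
      var <- newEmptyMVar
      _ <- forkIO (tryAny (f x) >>= putMVar var)
      pure var
    -- Re-throw a worker's exception in the calling thread.
    wait var = takeMVar var >>= either throwIO pure
```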
Joshua Clayton
bbb178f7d5 Extract ioOps calculation 2016-05-13 14:44:03 -04:00