Commit Graph

22 Commits

Author SHA1 Message Date
Joshua Clayton
b65de02efc Display recent git SHAs per token
This creates a new "list" output format that includes a certain number
of git SHAs per token. This allows for perusal of the most recent
changes for a given token to understand what changed.
2016-06-21 23:02:37 -04:00
Joshua Clayton
ce9b3b8a13 Store computed aliases on TermResults 2016-06-21 06:12:40 -04:00
Joshua Clayton
09231cdccd Cover case within autoLowLikelihood where no matches are present
Why?
====

In cases where no matchers are present, a language config should not
auto-classify every match as low-likelihood; instead, it should return
False so subsequent checks can operate on the match itself.

This is related to 9bf9499e67f52bcde2420bfe3945f73cfdaa06d7; both are
handling cases where less configuration data than ideal is present and
the program still needs to operate correctly.
2016-06-07 05:19:40 -04:00
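
The guard described in 09231cdccd can be sketched roughly as follows. The `Matcher` type and the function's shape are assumptions made for illustration, not the project's actual definitions. The relevant detail is that `all p []` is `True` in Haskell, so an empty matcher list needs an explicit case if it should not classify every match as low-likelihood.

```haskell
-- Hypothetical stand-in for a configured matcher: a predicate on a token.
type Matcher = String -> Bool

-- Returns False when no matchers are configured, so that later checks
-- can still inspect the match itself. Otherwise every matcher must hold.
-- (Whether the real code combines matchers with "all" is an assumption.)
autoLowLikelihood :: [Matcher] -> String -> Bool
autoLowLikelihood [] _ = False
autoLowLikelihood matchers token = all ($ token) matchers
```

With this shape, `autoLowLikelihood [] "index"` is `False`, while a non-empty list is evaluated against the token as usual.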
Joshua Clayton
9bf9499e67 Ensure results work when no config can be loaded
Why?
====

If no config can be loaded correctly, unused should still function
correctly, albeit with likely more false positives.
2016-06-05 08:09:02 -04:00
Joshua Clayton
6ffb098b20 Initial support of aliases based on wildcard matching
Why?
====

Dynamic languages, and Rails in particular, support some fun method
creation. One common pattern is, within RSpec, to create matchers
dynamically based on predicate methods. Two common examples are:

* `#admin?` gets converted to the matcher `#be_admin`
* `#has_active_todos?` gets converted to the matcher `#have_active_todos`

This especially comes into play when writing page objects with predicate
methods.

This change introduces the concept of aliases, a way to describe the
before/after for these transformations. This introduces a direct swap
with a wildcard value (%s), although this may change in the future to
support other transformations for pluralization, camel-casing, etc.

Externally, aliases are not grouped together by term; however, the
underlying counts are summed together, increasing the total occurrences
and likely pushing the individual method out of "high" likelihood into
"medium" or "low" likelihood.

Closes #19.
2016-06-01 22:16:44 -04:00
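
The wildcard swap described in 6ffb098b20 might look something like the sketch below. All function names are hypothetical, and the sketch assumes each template contains exactly one `%s`; the project's real implementation may differ.

```haskell
import Data.List (isSuffixOf, stripPrefix)

-- Split a template such as "has_%s?" into the text before and after "%s".
splitTemplate :: String -> Maybe (String, String)
splitTemplate = go ""
  where
    go acc ('%' : 's' : rest) = Just (reverse acc, rest)
    go acc (c : cs) = go (c : acc) cs
    go _ [] = Nothing

-- Extract the wildcard value from a term, if the term fits the template
-- and the captured value is non-empty.
captureWildcard :: String -> String -> Maybe String
captureWildcard template term = do
  (prefix, suffix) <- splitTemplate template
  rest <- stripPrefix prefix term
  if suffix `isSuffixOf` rest && length rest > length suffix
    then Just (take (length rest - length suffix) rest)
    else Nothing

-- Substitute a captured value into the "%s" slot of a template.
fillTemplate :: String -> String -> Maybe String
fillTemplate template value = do
  (prefix, suffix) <- splitTemplate template
  Just (prefix ++ value ++ suffix)

-- Apply a (from, to) alias pair to a term.
applyAlias :: (String, String) -> String -> Maybe String
applyAlias (from, to) term = captureWildcard from term >>= fillTemplate to
```

Under these assumptions, `applyAlias ("%s?", "be_%s") "admin?"` yields `Just "be_admin"`, and `applyAlias ("has_%s?", "have_%s") "has_active_todos?"` yields `Just "have_active_todos"`.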
Joshua Clayton
965cc0a178 Move allowedTerms into auto low-likelihood classification
Why?
====

This ensures no method/function bleed between languages, which may cause
confusing miscalculation when methods/functions are reused across
different types of projects (e.g. index from Rails migrations and index
action from Phoenix controllers).
2016-05-27 15:01:33 -04:00
Joshua Clayton
0d2470815d Simplify parsing and caching of results
Why?
====

Parsec is overkill when all that's really needed is splitting on
semicolons and converting a string to a non-negative Int.

One side-effect of this is to convert the caching mechanism from flat
text to CSV, with cassava handling (de-)serialization.

Additional
==========

Introduce ReaderT to calculate sha once per cache interaction

Previously, we were calculating the fingerprint (SHA) for match results
potentially twice, once when reading from the cache, and a second time
if no cache was found. This introduces a ReaderT to manage cache
interaction with a single fingerprint calculation.

This also abstracts what's being cached to only care about the fact that
the data can be converted to/from csv.
2016-05-26 21:37:11 -04:00
Joshua Clayton
f618d8a796 Use .gitignore to determine files for fingerprinting a project
Why?
====

Because a .gitignore file captures a fair number of project-specific
directories and files to ignore, we can use this list to reduce the
number of files to look at when determining a fingerprint for a project.

Because the fingerprint should be based on files we care about changing,
the project-specific .gitignore is a great place to start.

This drastically reduces fingerprint timing - for larger projects, or
projects with a massive number of files (e.g. anything doing anything
significant with NPM and a front-end framework), this will help make
caching usable. For normal projects, this cuts fingerprint
calculation to 10%-20% of what it was previously.

Closes #38
2016-05-26 17:19:35 -04:00
Joshua Clayton
697f8b4135 Token matches can occur at the beginning or end of a file 2016-05-22 07:03:37 -04:00
Joshua Clayton
307dd2030f Introduce internal yaml configuration of auto low likelihood match handling
Why?
====

Handling low likelihood configuration was previously a huge pain,
because the syntax in Haskell was fairly terse. This introduces a yaml
format internally that ships with the app covering basic cases for
Rails, Phoenix, and Haskell. I could imagine adding baselines here for
other languages and frameworks (especially ones I've used and am
comfortable with).

This also paves the way for searching for user-provided additions and
loading those configurations in addition to what we have here.
2016-05-21 05:34:18 -04:00
Joshua Clayton
6c9912fa29 Switch to faster/more naive implementation of String to Int conversion
Why?
====

This library converts lots of strings to positive integers
(specifically, when it's parsing output from search results). Because
these are always positive integers, we can make more assumptions about
the data and how to parse the values.

Corresponding benchmark: https://gist.github.com/joshuaclayton/767c507edf09215d08cdd79c93a5f383
2016-05-18 11:17:13 -04:00
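
As an illustration of the kind of conversion 6c9912fa29 describes, here is a minimal sketch that assumes the input is a plain run of ASCII digits. The function name and the `Maybe` return type are assumptions; the linked benchmark, not this sketch, reflects what the project actually measured.

```haskell
import Data.Char (isDigit, ord)
import Data.List (foldl')

-- Naive conversion for non-empty strings of ASCII digits.
-- No sign, whitespace, or overflow handling, by design.
readNonNegativeInt :: String -> Maybe Int
readNonNegativeInt "" = Nothing
readNonNegativeInt s
  | all isDigit s = Just (foldl' step 0 s)
  | otherwise = Nothing
  where
    -- Shift the accumulator one decimal place and add the next digit.
    step acc c = acc * 10 + (ord c - ord '0')
```

Because the input is known to contain only digits, the fold can skip the generality of `read` or a parser combinator.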
Joshua Clayton
0e966c9302 Test Util.groupBy 2016-05-15 05:53:29 -04:00
Joshua Clayton
97f083fc2c Use regex in ag for simple words
Why?
====

ag supports regular expressions for searches; however, the -Q flag,
which was previously always used, forces literal matching.

Literal matching can return too many results. For example, a search for
a `me` method in a controller would also match words like `awesome` or
`method`.

This introduces a check where, if the token being searched is only
composed of word characters (`[A-Za-z0-9_]`), it'll switch over to use
regular expressions with ag and surround the token with non-word matches
on either end. The goal here is to reduce false-positives in matches.
2016-05-14 08:14:54 -04:00
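
The check described in 97f083fc2c could be sketched as below. The helper names and the exact regular expression are assumptions; only the `-Q` flag and the `[A-Za-z0-9_]` character class come from the commit message. The `^` and `$` alternatives are included here so that a token at the very start or end of the input can still match.

```haskell
import Data.Char (isAlphaNum, isAscii)

-- True when the token is non-empty and made up only of [A-Za-z0-9_].
isWordToken :: String -> Bool
isWordToken token = not (null token) && all isWordChar token
  where
    isWordChar c = c == '_' || (isAscii c && isAlphaNum c)

-- Hypothetical helper: the pattern arguments to hand to ag for a token.
-- Word-only tokens get a regex with non-word boundaries on either side;
-- anything else falls back to a literal (-Q) search.
searchArgs :: String -> [String]
searchArgs token
  | isWordToken token = ["(\\W|^)" ++ token ++ "(\\W|$)"]
  | otherwise = ["-Q", token]
```

For a token such as `me`, this produces a single regex argument instead of a literal search, which is what keeps `awesome` and `method` out of the results.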
Joshua Clayton
3851c98d59 Initial Haskell support for allowed tokens
This is based off of tags generated by Hasktags:

https://hackage.haskell.org/package/hasktags
2016-05-14 06:37:48 -04:00
Joshua Clayton
bcbc1b6462 Allow search result grouping
Why?
====

Grouping results can be helpful to view information differently, e.g. to
see highest-offending files or to remove grouping entirely.

This introduces a flag to allow overriding the default group (two levels
of directory).
2016-05-14 06:36:01 -04:00
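
One way to picture the default grouping from bcbc1b6462 is the sketch below. It assumes "two levels of directory" means the first two directory components of a POSIX-style relative path; the function names are hypothetical, and the flag that overrides the grouping is not modelled.

```haskell
import Data.Function (on)
import Data.List (groupBy, intercalate, sortOn)

-- Split a path on '/' (assumes POSIX-style relative paths).
splitOnSlash :: FilePath -> [String]
splitOnSlash path = case break (== '/') path of
  (segment, []) -> [segment]
  (segment, _ : rest) -> segment : splitOnSlash rest

-- Grouping key: the first two directory components, file name dropped.
directoryKey :: FilePath -> String
directoryKey path = intercalate "/" (take 2 (init (splitOnSlash path)))

-- Group paths by key; sorting first makes equal keys adjacent for groupBy.
groupByDirectory :: [FilePath] -> [(String, [FilePath])]
groupByDirectory paths =
  [ (directoryKey p, grp)
  | grp@(p : _) <- groupBy ((==) `on` directoryKey) (sortOn directoryKey paths)
  ]
```

Swapping `directoryKey` for a different key function (for example, the full file path, or a constant) is how a sketch like this would model the alternative groupings the commit mentions.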
Joshua Clayton
86cdd114d5 Be more explicit about usage during likelihood 2016-05-13 12:42:05 -04:00
Joshua Clayton
7f0e701823 Extract internal parsing handling to separate module
Why?
====

Parsing lines of results was somewhat unreliable, and terms with odd
characters were causing problems. This:

* extracts parsing into an Unused.Parser.Internal module for ease of
  testing
* fixes cases where certain tokens weren't matching
2016-05-12 18:02:59 -04:00
Joshua Clayton
1457bf0100 Allow Mixfile and __using__ for Elixir 2016-05-12 18:01:16 -04:00
Joshua Clayton
2650e1f040 Improve likelihood calculation and include reasons for evaluation
Why?
====

A simple calculation ("yes, this should be removed" or "no, this is
probably fine") is frankly not enough information for someone evaluating
their codebase to understand why we made the decision.

This introduces a removal reason, so a user understands why we ranked it
the way we did, and adds additional logic around a method and its tests
to determine if a method exists and is only being used in the tests (if
so, it should probably be deleted).

This is done with an Occurrances record, which is created for total
files, test code, and non-test code. The test code logic is somewhat
naive but works in most cases. It doesn't ensure a particular directory,
in the case that tests live alongside source code (e.g. Go), and
captures RSpec cases as well.
2016-05-11 05:18:55 -04:00
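
The test/non-test split described in 2650e1f040 might be sketched as follows. Everything here is an assumption made for illustration: the `Counts` record stands in for the record the commit mentions, the path check is deliberately naive, and the reason string is invented rather than taken from the project.

```haskell
import Data.List (isInfixOf, partition)

-- Per-bucket totals (hypothetical field names).
data Counts = Counts { fileCount :: Int, occurrenceCount :: Int }
  deriving (Eq, Show)

-- Naive test-path check: looks for "test" or "spec" anywhere in the path
-- rather than requiring a particular directory.
isTestPath :: FilePath -> Bool
isTestPath path = any (`isInfixOf` path) ["test", "spec"]

-- Total files and occurrences for a list of (path, count) matches.
tally :: [(FilePath, Int)] -> Counts
tally matches = Counts (length matches) (sum (map snd matches))

-- Counts for all matches, test code, and non-test code, in that order.
summarize :: [(FilePath, Int)] -> (Counts, Counts, Counts)
summarize matches = (tally matches, tally tests, tally app)
  where
    (tests, app) = partition (isTestPath . fst) matches

-- A token that appears once outside tests (its definition) and otherwise
-- only in tests is a removal candidate; the reason text is made up here.
removalReason :: [(FilePath, Int)] -> Maybe String
removalReason matches
  | occurrenceCount app == 1 && occurrenceCount tests > 0 =
      Just "only the definition and its tests reference this token"
  | otherwise = Nothing
  where
    (_, tests, app) = summarize matches
```

Returning a reason alongside the decision mirrors the commit's point that a bare yes/no is not enough for someone evaluating their codebase.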
Joshua Clayton
3b627ee1c3 Allow multiple matches with single-occurring appropriate tokens 2016-05-08 22:25:48 -04:00
Joshua Clayton
f7a2e1a287 Add Hspec and tests around parsing 2016-05-08 06:54:34 -04:00
Joshua Clayton
8931c08f93 Initial 2016-04-28 05:37:06 -04:00