Project Configuration
=====================

Specify the Git Repository to Analyze
-------------------------------------

Your first step is to tell CodeScene where your code is. There are five different ways
of doing that:

1. Specify the URLs to Git remotes. CodeScene supports the protocols supported by ``git clone``: ssh, http, and git.
   CodeScene will clone the remotes to a local folder that you specify in the
   configuration as illustrated in :numref:`repo-config-git-remotes`. Note that CodeScene will re-use a local Git repository if
   there's an existing clone on the path you specify. Also note that
   you need to have an SSH key that lets the CodeScene (system) user access your remote repositories.

2. Clone an existing analysis configuration. CodeScene copies all your configuration options -- filters, repository paths,
   exclusions, teams, ex-developer configuration, etc. -- to a new project. From here those two projects (the original and the clone) are
   completely independent and changes to one of them do not affect the other.

.. figure:: RepoConfigGitRemotes.png
   :name: repo-config-git-remotes
   :alt: Let CodeScene clone your Git repositories through their URL.

   Let CodeScene clone your Git repositories through their URL.

Note that you cannot mix local repository paths with URLs to remote Git repositories in a single analysis project.

3. Import a project configuration that has been created by exporting the configuration of a project from this or
   another CodeScene instance.

4. Specify the path to your local, physical Git repository, which has
   to be on the same machine that CodeScene runs on. The path you
   specify has to be to the root folder of your repository (i.e. the folder
   that contains your ``.git`` folder).

5. Let CodeScene scan a folder on your file system for repositories to analyze. You'll be
   prompted with the results and are free to ignore the repositories you want to exclude. This option is
   useful in a multi-repository project.

Analyze Projects organized in Multiple Git Repositories
-------------------------------------------------------

There's a recent trend towards organizing the source code of larger
systems in multiple Git repositories. For example, you may have the code
for your user interface in one repository, the code for your service
layer in another repository and perhaps even a Git repository dedicated
to your back end mechanism. Another typical example is *Microservices*
where each service is deployed according to its own life cycle. In that
case, organizations often choose to use one Git repository per service.

CodeScene supports an analysis of multiple repositories at once.
All you have to do is to specify the paths to them:

.. figure:: MultiRepoConfigGuide.png
   :alt: Configuration of multiple repositories

   Configuration of multiple repositories.

The screenshot above shows two repositories that belong to the same
product. During an analysis, CodeScene will analyze the
evolution of the code in all those repositories *as though they were in
the same physical Git repository*.

You can specify as many repositories as you want and remove one at any
time (just erase the text in that box). However, a word of warning: do
*NOT* attempt to analyze unrelated repositories in the same
configuration. First of all it's a breach of the license agreement.
Worse, you won't get useful results since many of the basic metrics,
like Hotspots, are relative metrics.

Scan Directory
--------------

Specifying one or two repositories by hand is straightforward. However, some systems
consist of hundreds of repositories. In that case you want to use the Scan Directory feature.

The Scan Directory feature lets you specify a root path to where your repositories are located.
Here's what it looks like:

.. figure:: RepoAutoImportGuide.png
   :alt: Auto import multiple repositories

   Automate the import of multiple repositories.

CodeScene will scan the path you provide to discover any Git repositories.
The discovered Git repositories are presented in a list. Note that you can add
additional repositories manually or remove the ones you want to exclude:

.. figure:: RepoAutoImportSampleGuide.png
   :alt: Auto import of multiple repositories

   The result of scanning and importing multiple repositories.

From here you just press Continue to proceed with the configuration of your analysis. The
rest of the workflow is identical to the case where you specify repositories manually.
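
As a rough illustration of what such a scan involves, the following Python sketch walks a directory tree
and collects every folder that contains a ``.git`` entry. It is not CodeScene's implementation, and the
function name ``find_git_repositories`` is hypothetical.

.. code-block:: python

   import os


   def find_git_repositories(root):
       """Return the sorted paths of all folders below root that contain a .git entry."""
       repositories = []
       for current, dirnames, filenames in os.walk(root):
           # .git is normally a folder, but it can be a file (e.g. in a Git worktree).
           if ".git" in dirnames or ".git" in filenames:
               repositories.append(current)
               dirnames[:] = []  # do not descend into a repository that was found
       return sorted(repositories)

Each returned path is the root folder of a repository, which is the same kind of path you would enter
when specifying a local repository by hand.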

.. _temporal-coupling:

Tune the House-Keeping Options for Analysis Results
---------------------------------------------------

CodeScene is designed to run continuously to monitor your system. That also means you will
accumulate lots of historic analysis results that occupy space on your host machine.

CodeScene lets you specify a house-keeping strategy that automatically cleans out old
historic results, as illustrated in :numref:`house-keeping`.

.. figure:: house-keeping.png
   :name: house-keeping
   :alt: Configure house-keeping options.

   Specify how much history you want to keep.

Measure Change Coupling across Multiple Repositories
------------------------------------------------------

The normal change coupling metric considers two files coupled if they
tend to change in the same commits. This won't work if your codebase is
split across multiple repositories. Instead, you want to aggregate
individual commits into logical commits. CodeScene supports two
different strategies for aggregating commits:

By Author and Time
   When you specify this option,
   the tool will consider all commits by the same author on the same day
   as a single, logical commit. This option is a heuristic that works
   well in the absence of a Ticket ID in your data.
By custom Ticket ID
   This option uses an
   identifier in your commit headers. All commits that refer to the same
   identifier will be considered one logical commit.

The second option, *By custom Ticket ID*, is the preferred method.
:numref:`change-coupling-strategy` shows the options in the repository
configuration section `Change Coupling`.

.. figure:: change-coupling-strategy.png
   :name: change-coupling-strategy
   :alt: There are two available strategies for aggregating commits.

   There are two available strategies for aggregating commits.
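
To make the *By Author and Time* heuristic concrete, here is a minimal Python sketch that merges
commits by the same author on the same day into one logical commit. The commit records, the field
names, and the function ``group_by_author_and_day`` are made up for illustration; CodeScene's
internal data model is not shown here.

.. code-block:: python

   from collections import defaultdict


   def group_by_author_and_day(commits):
       """Merge the files of all commits that share author and date."""
       logical = defaultdict(set)
       for commit in commits:
           logical[(commit["author"], commit["date"])].update(commit["files"])
       return dict(logical)


   # Hypothetical commits taken from two different repositories.
   commits = [
       {"author": "ann", "date": "2024-03-01", "files": ["ui/login.js"]},
       {"author": "ann", "date": "2024-03-01", "files": ["services/auth.py"]},
       {"author": "bob", "date": "2024-03-01", "files": ["services/billing.py"]},
   ]
   logical_commits = group_by_author_and_day(commits)

In this sketch, ``ui/login.js`` and ``services/auth.py`` end up in the same logical commit even
though they live in different repositories, so they can be reported as coupled.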


To aggregate by custom Ticket ID, you need to specify a *Ticket ID Pattern* in
the *Ticket ID Mapping* section (see :numref:`ticket-id-pattern`). The pattern
is used to extract the Ticket ID from the commit message. The example pattern
in :numref:`ticket-id-pattern` will extract all identifiers that start with the
text ``ISSUE-`` followed by at least one digit.  For example, the commit
message ``ISSUE-42`` will result in ``42`` as the extracted Ticket ID.

.. figure:: ticket-id-pattern.png
   :alt: Configure a pattern to extract a Ticket ID.
   :name: ticket-id-pattern

   Configure a pattern to extract a Ticket ID.
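
The following Python sketch shows how such a pattern could be used to group commits into logical
commits. The pattern ``ISSUE-(\d+)`` is an assumption that is consistent with the ``ISSUE-42``
example above; the commit records and the function ``group_by_ticket_id`` are hypothetical.

.. code-block:: python

   import re
   from collections import defaultdict

   # Assumed pattern: the capture group holds the extracted Ticket ID.
   TICKET_ID_PATTERN = re.compile(r"ISSUE-(\d+)")


   def group_by_ticket_id(commits):
       """Merge the files of all commits whose message refers to the same Ticket ID."""
       logical = defaultdict(set)
       for commit in commits:
           match = TICKET_ID_PATTERN.search(commit["message"])
           if match:  # commits without a Ticket ID are left out of this grouping
               logical[match.group(1)].update(commit["files"])
       return dict(logical)


   commits = [
       {"message": "ISSUE-42 Add login form", "files": ["ui/login.js"]},
       {"message": "ISSUE-42 Add auth endpoint", "files": ["services/auth.py"]},
       {"message": "Fix typo", "files": ["README.md"]},
   ]
   logical_commits = group_by_ticket_id(commits)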

Note that CodeScene will still calculate normal change
coupling on a single commit basis. You want that in order to spot
unexpected dependencies between files in the same repository. The
change coupling results for the logical commits discussed above are
presented in a separate analysis view.

Change Coupling Exclusion Filters
-----------------------------------

You might have files that you expect to be temporally coupled, for example
tests and the corresponding units under test, or matching ``.c`` and ``.h`` files.
To exclude these couplings from visualization by default, go to the "Change
Coupling" section of the project configuration and add "Change Coupling
Filters" for the patterns you want to exclude, as shown in
:numref:`change-coupling-filters`.

.. figure:: project-change-coupling-filters.png
   :alt: Configure change coupling filters for expected file couplings.
   :name: change-coupling-filters

   Configure change coupling filters for expected file couplings.

Each filter has a *name*, which can be anything you like, and *patterns* for
coupled file paths. The patterns are regular expressions. When a pair of
coupled files matches the patterns, in either direction, the pair is excluded by the
filter.

All filters are tried in sequence, and if any filter
hits a coupled pair, the pair is excluded. Some useful examples of patterns
are:

+--------------------------------+--------------------------------+-------------------------------------------------------------------+
| Pattern (File 1)               | Pattern (File 2)               | Description                                                       |
+================================+================================+===================================================================+
| ``.+\.(?:c|cc|cpp|cxx)``       | ``.+\.(?:h|hh|hxx)``           | C/C++ includes, e.g. ``gc.cpp`` and ``util.h``                    |
+--------------------------------+--------------------------------+-------------------------------------------------------------------+
| ``.+\/(.+)\.java``             | ``.+\/(.+)Impl\.java``         | Java "Impl" pairs, e.g. ``Thing.java`` and ``ThingImpl.java``     |
+--------------------------------+--------------------------------+-------------------------------------------------------------------+
| ``.+\/(.+)\.cs``               | ``.+\/I(.+)\.cs``              | C# interface pairs, e.g. ``IComponent.cs`` and ``Component.cs``   |
+--------------------------------+--------------------------------+-------------------------------------------------------------------+
| ``.*\/(?:(?!test).)+\.py``     | ``.*\/test_.+\.py``            | Python files and tests, e.g. ``foo/a.py`` and ``tests/test_a.py`` |
+--------------------------------+--------------------------------+-------------------------------------------------------------------+

If any of the patterns have capturing groups, both matches must generate the
same number of captures, with equal values, to trigger the filter.  Note that
non-capturing groups and negative look-ahead in regular expressions can be
useful if you want to write advanced filters, and only trigger filters on
corresponding files in corresponding directories.
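
The matching rules can be sketched in Python as follows. This is an interpretation of the rules
described in this section, not CodeScene's actual code: the function name ``pair_is_filtered`` is
hypothetical, and the sketch assumes that a pattern has to match the whole file path.

.. code-block:: python

   import re


   def pair_is_filtered(file_a, file_b, pattern_1, pattern_2):
       """Return True if the coupled pair matches the filter in either direction."""

       def matches(first, second):
           m1 = re.fullmatch(pattern_1, first)
           m2 = re.fullmatch(pattern_2, second)
           # Both patterns must match, and their captures must be identical.
           return bool(m1 and m2 and m1.groups() == m2.groups())

       return matches(file_a, file_b) or matches(file_b, file_a)


   # The Java "Impl" filter from the table above.
   java_1 = r".+\/(.+)\.java"
   java_2 = r".+\/(.+)Impl\.java"

   same_name = pair_is_filtered("src/ThingImpl.java", "src/Thing.java", java_1, java_2)
   other_name = pair_is_filtered("src/Thing.java", "src/OtherImpl.java", java_1, java_2)

Here ``same_name`` is ``True`` because both patterns capture ``Thing``, while ``other_name`` is
``False`` because the captures (``Thing`` and ``Other``) differ.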

Linking to an External Ticket System
------------------------------------

If you have a Ticket ID Pattern configured, and a way to deep-link to tickets
by the matched identifiers, you can configure a *Ticket URI Template* to enable
links in analysis views. That way you will be able to quickly navigate from
Code Churn by Task to the external ticket system, and view more details
there.

The Ticket URI Template is based on `the URI Template format (RFC
6570) <https://tools.ietf.org/html/rfc6570>`_, with support for the single
expression ``{ticket-id}``. The matched ticket value, i.e. the captured value
of the regular expression group, is used as ``{ticket-id}`` for hyperlinks.
For example, if your Ticket ID Pattern is ``#(\d+)``, and your Ticket URI
Template is ``https://example.com/tickets/{ticket-id}``, a commit containing
the string ``#1234`` will result in a hyperlink to
``https://example.com/tickets/1234``.
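
As a minimal Python sketch of that substitution (the function name ``ticket_links`` is hypothetical,
and only the single ``{ticket-id}`` expression is handled):

.. code-block:: python

   import re


   def ticket_links(commit_message, ticket_id_pattern, uri_template):
       """Build one link per Ticket ID found in the commit message."""
       ticket_ids = [m.group(1) for m in re.finditer(ticket_id_pattern, commit_message)]
       return [uri_template.replace("{ticket-id}", ticket_id) for ticket_id in ticket_ids]


   links = ticket_links(
       "Fix crash on startup, closes #1234",
       r"#(\d+)",
       "https://example.com/tickets/{ticket-id}",
   )
   # links == ["https://example.com/tickets/1234"]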

Some useful examples of Ticket ID Patterns and Ticket URI Templates are:

* **GitHub:** ``#(\d+)`` and ``https://github.com/your-org/your-project/issues/{ticket-id}``
* **JIRA:** ``([A-Z]{2,}-\d+)`` and ``https://example.com/jira/browse/{ticket-id}``
* **Trello (Card Numbers):** ``CARD-(\d+)`` and ``https://trello.com/search?q={ticket-id}``
* **Trello (Card Short IDs):** ``CARD-(.+)`` and ``https://trello.com/c/{ticket-id}``
* **Azure DevOps:** ``#(\d+)`` and ``https://dev.azure.com/your-org/your-project/_workitems/edit/{ticket-id}``

Detect Patterns in Code Comments
--------------------------------

Excessive use of certain code comments indicates code smells. For example, a file that is filled with `TODO` comments is
usually not that reassuring. On a similar note, organizations might use static analysis tools and use code comments
to suppress the findings. By configuring a set of patterns, you can use CodeScene's virtual code reviewer to
detect such patterns as shown in :numref:`code-comments-biomarkers`.

.. figure:: code-comments-biomarkers.png
   :alt: Detect specific type of code comments
   :name: code-comments-biomarkers

   Detect specific type of code comments.

The configuration is a bit special, but read along for examples -- it's not hard:

.. figure:: code-comments-biomarkers-config.png
   :alt: Configure regular expressions to detect code comments
   :name: code-comments-biomarkers-config

   Configure regular expressions to detect code comments.

:numref:`code-comments-biomarkers-config` presents two patterns that CodeScene will match in the code comments of
your hotspots. Each pattern consists of two parts, separated by the regex inline comment syntax, `(?#comment)`:

1. A regular expression to match in the code comments.
2. A descriptive name of the content that the regular expression matches. This will be used in the virtual code reviewer.

In the first example, we match the expression `codechecker_\w+`. That is, any code comment that contains `codechecker_` followed by
one or more word characters, such as `codechecker_confirmed` or `codechecker_critical`. We then add the descriptive comment `(?#Suppress Dead Code)`. Note that only "Suppress Dead Code" makes
up the name; the `(?#...)` syntax only serves to embed the name in the regex.

The second example shows a simpler pattern where we match the literal string `TODO` in a code comment, and associate it with
the label "Detect TODOs" which will then be displayed in the virtual code review.
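
The two-part structure can be illustrated with a short Python sketch that splits a configured
pattern into its regular expression and its name, and then counts the matches in a list of
comments. The helper names ``split_named_pattern`` and ``count_matches`` are hypothetical, and the
sketch assumes that the name is always the trailing ``(?#...)`` part of the pattern.

.. code-block:: python

   import re

   # Splits "<regex>(?#<name>)" into its two parts.
   NAMED_PATTERN = re.compile(r"^(?P<regex>.*)\(\?#(?P<name>[^)]*)\)$")


   def split_named_pattern(configured):
       """Return (regex, name) for a pattern such as 'TODO(?#Detect TODOs)'."""
       match = NAMED_PATTERN.match(configured)
       if match is None:
           raise ValueError("pattern has no trailing (?#name) part")
       return match.group("regex"), match.group("name")


   def count_matches(configured, comments):
       """Return the pattern's name and its number of matches in the comments."""
       regex, name = split_named_pattern(configured)
       return name, sum(len(re.findall(regex, comment)) for comment in comments)


   name, hits = count_matches(
       r"TODO(?#Detect TODOs)",
       ["// TODO: handle errors", "// all done", "/* TODO TODO */"],
   )
   # name == "Detect TODOs", hits == 3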

Exclude Initial Commits from an Analysis
----------------------------------------

Some Git repositories start their life as an import of an existing codebase. If the previous history isn't migrated
together with the code, the author that does the initial commit of the existing codebase gets all the credit. This
leads to a bias in the social analyses.

The solution is to exclude all contributions done as part of the initial commit. You specify those commits (fetch them
from your Git log) in the project configuration as shown in :numref:`configure-exclude-commits`.

.. figure:: exclude-commits.png
   :alt: Exclude specific commits
   :name: configure-exclude-commits

   Exclude specific commits from the analysis.

Exclude Files from an Analysis
------------------------------

An analysis will include all textual content in your repository. That
means: you get an analysis of your build scripts, resource files,
configuration files, test data, etc. While it's a good practice to run
an analysis of all content every now and then, there's also the risk
that you get too much noise in the analysis results. For example, you
typically want to exclude auto-generated content.

The *Exclude Files* option lets you specify a set of file extensions
that will be excluded from your analysis:

.. figure:: RepoGuideExcludeFiles.png
   :alt: Exclude specific types of files

   Exclude specific types of files.

CodeScene comes with a set of pre-defined exclusion patterns
that should match the most common cases. You're free to extend this set
if you have additional file types that you want to exclude. Just
remember to use a semi-colon (;) to separate each file extension you
want to exclude.

.. _Exclude-Specific-Files-and-Folders-from-an-Analysis:

Exclude Specific Files and Folders from an Analysis
---------------------------------------------------

You just learned how you can exclude certain types of files, no matter
where they are located in your codebase. But sometimes you'd like to exclude a particular file or,
more often, a complete folder. For example, let's say that you check in
third-party code in your repository. You don't want that code to obscure
potential analysis findings in your own code.

There are two different ways to exclude complete folders and files:

1. Whitelist the content you want to include in the analysis. All other content will automatically be excluded.
2. "Exclude Content" that isn't of interest in the analysis, typically 3rd party code and auto-generated code.

You can specify both whitelisted and excluded content. The whitelist is applied first.

You specify a *glob pattern* to whitelist the content to include in your analysis as illustrated in :numref:`whitelistcontent`.

.. figure:: whitelistcontent.png
   :alt: Patterns to whitelist content
   :name: whitelistcontent

   Glob patterns to whitelist content.

You specify a *glob pattern* to Exclude Content from the analysis as illustrated in :numref:`excludecontent`.

.. figure:: excludecontent.png
   :alt: Patterns to exclude content
   :name: excludecontent

   Glob patterns to exclude content.

The example above will exclude all content under the external folder and
the file samples.txt from the generator folder.

*Note:* You need to specify your exclusion paths using UNIX style path
names. That is, use forward slashes as separators. Also note that the
paths have to start with the name of your repository root. That is, if
your Git repository is located in a folder named backend, as in the
example above, you have to prepend that folder name to all your
exclusion patterns. The reason is CodeScene's support for
multiple repositories, where you have to be explicit about which
repository you exclude things from.

There's one exception to the rule that patterns have to specify the repository root. That's the case when you
want a pattern to apply across all repositories. For example, let's say that you want to exclude all shell scripts
in your test folder. In that case you specify a pattern like `**/test/*.sh`. That is, your patterns are
allowed to start with a wildcard too.

A Brief Guide to Glob Patterns
------------------------------

Glob patterns let you specify path and file names with different wildcards. CodeScene supports the following wildcards:

1. `*`: A single asterisk matches any string of characters. Use it to exclude or white list particular files. For example `*.h` will exclude all files with extension `h`.
   You can also use the single asterisk to specify glob patterns that apply to *all* your repositories in a multi repository analysis project.
   For example, the glob pattern `*/version.txt` will match (and possibly exclude) the `version.txt` files at the top level of each of your repositories.

2. `**`: The double asterisk matches whole paths/directories. You use the double asterisk to exclude or white list content *independent* of
   the content's location in your codebase. For example, the pattern `myrepository/**/*.h` will match all files with extension `h` in *any*
   directory in your repository. You can also use the double asterisk to exclude or white list whole folders. Let's say we want to
   exclude all our unit tests from an analysis and that those tests are located in the repository 'coolstuff'. Here's a pattern for that: `coolstuff/test/**`.

3. `?`: The question mark matches a single character.

Please note that *all* glob patterns are specified using UNIX style path names. That is, if you're on Windows you do *not* use backslash to separate
directory names, but rather the UNIX style forward slash. That is, the directory `SomeRepo\\Test` is excluded by specifying `SomeRepo/Test/**`.
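
If you want to try out a pattern before adding it to the configuration, the following Python sketch
approximates the wildcards described above by translating a glob pattern into a regular
expression. It rests on assumptions that CodeScene's own matcher may not share: a single ``*`` and
``?`` stay within one path segment (as the ``*/version.txt`` example suggests), ``**`` spans
directories, and ``**/`` also matches zero directories. The function names are hypothetical.

.. code-block:: python

   import re


   def glob_to_regex(pattern):
       """Translate a glob pattern with *, ** and ? into a compiled regular expression."""
       parts = []
       i = 0
       while i < len(pattern):
           if pattern.startswith("**/", i):
               parts.append("(?:.*/)?")  # zero or more whole directories
               i += 3
           elif pattern.startswith("**", i):
               parts.append(".*")  # anything, including path separators
               i += 2
           elif pattern[i] == "*":
               parts.append("[^/]*")  # any characters within one path segment
               i += 1
           elif pattern[i] == "?":
               parts.append("[^/]")  # exactly one character within a segment
               i += 1
           else:
               parts.append(re.escape(pattern[i]))
               i += 1
       return re.compile("".join(parts))


   def glob_matches(pattern, path):
       """Return True if the whole UNIX style path matches the glob pattern."""
       return glob_to_regex(pattern).fullmatch(path) is not None

For example, ``glob_matches("coolstuff/test/**", "coolstuff/test/unit/a_test.py")`` returns ``True``,
whereas ``glob_matches("*/version.txt", "backend/docs/version.txt")`` returns ``False`` under the
assumptions above.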

Specify An Analysis Period
--------------------------

CodeScene lets you specify how far back in time we go to mine your repository history.
The actual analysis period you select depends on several factors:

1. *The activity in your project*: Select a short analysis period, like 6 months, in a codebase with a lot of development activity.
2. *The information you want*: If you want an overall view of potential maintenance problems, we recommend that you use a longer analysis period like a couple of years. If, on the other hand, you want to identify recent modifications to the codebase, your analysis period could be as short as just a couple of iterations, e.g. 1 month.

By default CodeScene uses three separate analysis periods depending on the type of information it analyses:

* Hotspot information uses a sliding window so that historic -- but now stable -- hotspots don't bias more recent trends.
* The team-level analyses use a separate date. Specify the date of the last organizational change here.
* Individual knowledge metrics and trends should use the full history of your repository.

The rationale is that analyses on the level of individual developers, like knowledge maps and knowledge loss, need to take the full history of
the codebase into account in order to be accurate. You can disable this behavior and use the specified date for all analyses
by unchecking the box "Use the complete Git history for knowledge metrics".

Similarly, team-level analyses like coordination needs and Conway's Law should ignore the historic activity of previous organizational structures, and
you want to measure from the date when the current team structure became operational.

Finally, please remember that selecting an analysis time span depends on the questions you have. As such your choice depends
on your context and is more of a heuristic than a science. Always start with an analysis of the full history when in doubt.

Exporting the project configuration
-----------------------------------

On the Export tab, the entire project configuration or parts of it can be exported to downloadable JSON and CSV files.
This can be used for sharing the project configuration to another CodeScene instance, or for archiving projects before deletion.

.. figure:: export-configuration.png
	    :name: export-configuration
	    :alt: Export the project configuration to file

Custom project defaults
-----------------------

When new projects are created, CodeScene sets a number of default analysis configuration settings.
In most cases, these default settings only need to be adjusted to meet the specific needs of individual projects.
However, some organizations may want to apply different defaults to all of their projects.
With CodeScene's **custom project defaults**, you can avoid the repetitive work of manually changing those values on every new project.
This is especially helpful for organizations with large numbers of projects.

How it works: the source project
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

When the feature is enabled, you choose a project that will act as the *source project* for the default configuration settings.
Changes made to these values in the source project will be applied to any new projects when they are created.

After a project has been created, changes to the source project **will not** have any effect.
Custom defaults are only applied at project creation time.

In large installations, it is recommended to create a dedicated source project that is only accessible to administrators.

Enabling custom defaults
^^^^^^^^^^^^^^^^^^^^^^^^

Custom defaults can be enabled from the global Configuration page. 

.. figure:: CustomProjectDefaultsConfig1.png
            :name: custom-project-defaults-config
            :alt: Enable the custom project defaults on the global Configuration page

Clicking on the "information" icon will display an up-to-date list of the configuration settings that can be used as defaults.


Using the defaults
^^^^^^^^^^^^^^^^^^

When creating a new project, the custom defaults will be applied
automatically.  Users who want to create their projects with the
standard CodeScene defaults can opt out by unchecking the box labeled
"Use default project settings defined by your Admin".


.. figure:: CustomProjectDefaultsProjectCreation.png
            :name: custom-project-defaults-project-creation
            :alt: At project configuration time, users can choose to use the CodeScene defaults instead


Which configuration settings are included?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Not all configuration values can become custom defaults.
Only settings that can be reliably set across all projects in a CodeScene installation can be set as custom defaults.
Many of the configuration fields in CodeScene are too closely related to a single project.
It would not make sense, for example, to reuse the same Architectural component definitions on every project. 

Only configuration items in a project's "Configuration" tab can be reused as defaults.
Thus, PM and PR integration configurations cannot be set as defaults.
Because of their security implications, status badges must always be enabled per project.

For a complete list of the settings included as custom defaults, administrators can consult the popup on the global Configuration page:

.. figure:: CustomProjectDefaultsPopup.png
            :name: custom-project-defaults-settings-list-popup
            :alt: Click on the "i" icon to see an up-to-date list of custom default settings.       



