
You have a rough project definition. Now what?

You don’t know exactly what the final product looks like, or how you are going to get there, but like every good researcher you want to follow the scientific method, accomplish research effectively, and share that work with the community.

You can help yourself, your collaborators, and other researchers achieve these goals by choosing and organizing your tools and workspace effectively.

Science as Process and Product

What makes science good? One aspect concerns characteristics of results: novelty, generalizability or extensibility, utility, etc.

Another concerns how those results were achieved: replicability, verifiability, validity, credibility, etc.

When research has a large software component, much as when it has a large empirical component (laboratory experiment or field observation), good organization contributes to both aspects. For particularly complex phenomena, a lack of organization can prevent you from getting results at all, let alone providing solid provenance for them.

In the classroom session and extended exercise, we will try to connect these concerns to how you organize your workspace and tools, so that your project work better satisfies them.

Caveat: Deal With the Fact That You Don’t Know!

Science is typically about exploration. A well-organized workspace can support that exploration, but obsessing over arrangements can distract from getting things done.

What follows are discussions about how to strike that balance.

Collaboration & Configuration

You and your collaborators (and more generally, any researchers engaging with your work) are going to be using different computers. How are you going to share work effectively?

One problem is simply making sure the whole team is working from a shared perspective. There are several approaches to data sharing. For analytical code, version control is the preferred way to manage it.

But beyond these basic concerns, the team members will likely have different setups on their computers. This is fine - different people have different resources, tasks, preferences, etc.

Many times these differences are trivial - for example, if you’re all working on lab machines with the same hardware and OS, doing research that doesn’t require any particular libraries. However, you and your collaborators may be using different operating systems, on a project that relies on a variety of different tools.

In that situation, you’ll need to think ahead a bit. What sort of problems does this create? How can you address those challenges?

Search for “python dependency management” or “r dependency management”. What do you find?
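One way to make differences between collaborators' setups visible is to record what each machine actually has installed. The following is a minimal sketch using only the Python standard library, not a replacement for a real dependency manager. The function names snapshot_environment and compare_snapshots are hypothetical, chosen for illustration.

```python
import platform
from importlib import metadata


def snapshot_environment(packages):
    """Record the Python version, OS name, and installed version of each named package."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # package is not installed on this machine
    return {
        "python": platform.python_version(),
        "os": platform.system(),
        "packages": versions,
    }


def compare_snapshots(mine, theirs):
    """Return the sorted names of packages whose recorded versions differ."""
    names = set(mine["packages"]) | set(theirs["packages"])
    return sorted(
        name
        for name in names
        if mine["packages"].get(name) != theirs["packages"].get(name)
    )
```

Each collaborator could run snapshot_environment with the project's package names (for example ["numpy", "pandas"], used here purely as an illustration) and share the result. compare_snapshots then lists the packages on which two machines disagree.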

What to Do About Data?

Version Control

Another thing to keep in mind: what goes in the repository vs. what stays on your local machine.
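As a rough aid for that decision, the sketch below splits the files in a project directory by size. It assumes, as a rule of thumb that is not part of this guidance, that small files such as code and configuration belong in the repository while large files such as raw data may be better kept elsewhere. The function name split_by_size and the 1 MB default threshold are hypothetical.

```python
from pathlib import Path


def split_by_size(root, max_bytes=1_000_000):
    """Partition files under root into (small, large) lists of paths relative to root."""
    small, large = [], []
    for path in sorted(Path(root).rglob("*")):
        rel = path.relative_to(root)
        # Skip directories and version control's own bookkeeping files.
        if not path.is_file() or ".git" in rel.parts:
            continue
        if path.stat().st_size > max_bytes:
            large.append(rel)  # candidate for staying out of the repository
        else:
            small.append(rel)  # candidate for tracking in the repository
    return small, large
```

Size is only one consideration. Sensitive data, generated outputs, and machine-specific settings may also need to stay out of the repository regardless of how small they are.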

System Tools

Unix-like systems provide a variety of command line tools to accomplish tasks in the file system. In some settings (e.g., typical supercomputers) these tools are all you have, so they are important to understand. Even where fancier tools are available, the command line tools may do parts of the job better.

Some command line cheat sheets:

Basic Layout

IDE Organization Tools

Publish!

Our goal as scientists is to create useful knowledge (where useful may be defined on a very long time scale). Knowledge doesn’t exist if people don’t have access to it, and it’s not useful if they can’t engage with it.

There are a variety of tools that support co-mingling your code with the scientific report it supports:

Finally: Think in Terms of the Product

We want “software” (e.g., the combination of scripts, analysis code, data management, reference management, external tools devoted to addressing a particular research question / area) that is:

Think in Terms of the Process

The remaining points are motivated by how a scientist works. Organization should support: