Skip to content

AI for Research

Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 International License.

Introduction

GPTs excel at scientific research, but become specialized rapidly depending upon their application. GPTs and LLMs also fit as a cog within the larger AI ecosystem of natural language processing, and machine learning.

When deployed privately into secure data enclaves, GPTs can be used with sensitive and secure data (e.g., FERPA, HIPAA, or CUI) without the risk of data breaches or interception over internet traffic.

Specific to this workshop, we focus on code interpreters and code execution using GPTs, but we will also touch upon the creation and deployment of custom AI applications and how to use commercial and open source GPTs for each.

In a future workshop we will cover the deployment of secure private GPTs and LLMs in data enclaves

Why use GPTs for research?

Advantages

  • Increased Efficiency and Productivity: perhaps the most obvious and enticing reason for using GPTs is to automate tedious and repetitive tasks, creating more time for analyses and research.

  • Accuracy & Objectivity: GPTs analyze data without human bias.

  • Pattern Recogition: GPTs may identify patterns and connections in data that a human cannot.

Disadvantages

  • Human Oversight: GPTs should not be used to replace human expertise. Researchers must always evaluate and ensure GPT output are factual and align with published research artifacts.

  • Bias: GPTs can reduce human bias, but suffer from their own training biases.

  • Potential Misuse: GPTs can be used to fabricate scientific research papers or manipulate data, undermining the integrity of science.

Literature Review and Synthesis

GPTs are excellent summarization tools. When coupled with large corpuses of published research they can be invaluable for literature review and synthesis.

Perplexity.ai has established itself as a popular GPT for search and summary of existing web-based material.

Google Deep Research is positioning itself as a platform for in depth prompts on specific topics.

Google NotebookLM allows you to personalize your research by providing your own literature or knowledge (files, images, audio).

Custom ChatGPTs for Literature Review

ScholarAI

ScholarAI is the most highly starred ⭐ ai research assistant on custom GPTs on ChatGPT for research.

ScholarGPT

ScholarGPT was one of the early custom GPTs created on ChatGPT and has many millions of resources embedded within it.

Semantic Scholar

Semantic Scholar is a free, AI-powered research tool for scientific literature, based at Ai2.

🤗 HuggingFace

HuggingFace is the dominant registry for AI models and model data.

Data Analysis

Linux Guru

ChatGPT is trained on common data science languages, like Python, Julia, and R. Use ChatGPT to help develop basic code or to explain and debug code you're trying to write.

Using ChatGPT can be a time savings, reducing the time it takes to look for the answers yourself over conventional search.

I want you to act as a humble data scientist who works a lot with Python and scientific visualization

Create a Python script which generates a visually pleasing and compelling heat map for a CSV dataset

You can also use it to summarize code or to help explain its operation

I want you to act as a humble data scientist who works a lot with Linux 

Explain to me what the following code does:

$ find /home/www \( -type d -name .git -prune \) -o -type f -print0 | xargs -0 sed -i 's/subdomainA\.example\.com/subdomainB.example.com/g'

Other valuable uses:

  • Change variable names and file names! When you have a large dataset with many files and folder names, you can ask ChatGPT to help design a schema for renaming your project's content

  • Regular Expressions, or regex is a bane of many programmers. ChatGPT can write, edit, and explain complex regex

```markdown I want you to act as a regex generator. Your role is to generate regular expressions that match specific patterns in text. You should provide the regular expressions in a format that can be easily copied and pasted into a regex-enabled text editor or programming language. Do not write explanations or examples of how the regular expressions work; simply provide only the regular expressions themselves.

remove any numbers from a string and replace them with a capital X

Hypothesis generation

Examples of roles you might ask for are: a domain science expert, an IT or DevOps engineer, software programmer, journal editor, paper reviewer, mentor, teacher, or student. You can even instruct ChatGPT to respond as though it were a Linux terminal, a web browser, a search engine, or language interpreter.

Data Scientist

Let's try an example prompt with role-playing to help write code in the R programming language.

I want you to act as a data scientist with complete knowledge of the R language, 
the TidyVerse, and RStudio. 

Write the code required to create a new R project environment,
Download and load the Palmer Penguins dataset, and plot regressions of body mass, 
bill length, and width for the species of Penguins in the dataset. 

Your response output should be in R and RMarkDown format 
with text and code delineated with ``` blocks.

At the beginning of new file make sure to install any 
RStudio system dependencies and R libraries that Palmer Penguins requires.

Example can use GPT o1 or Gemini 2.0

Talk to Dead Scientists

Try to ask a question with and without Internet access enabled:

I want you to respond as though you are the mathematician Benoit Mandelbrot

Explain the relationship of lacunarity and fractal dimension for a self-affine series

Show your results using mathematical equations in LaTeX or MathJax style format
Again, there is no guarantee that the results ChatGPT provides are factual, but it does greatly improve the odds that they are relevant to the prompt. Most importantly, these extensions provide citations for their results, allowing you to research the results yourself.

Feedback

Example 3: Programming help

Another impressive application of ChatGPT is in the field of programming. You can use it as a coding assistant, where it can help write code, debug issues, or explain complex code snippets. By asking it to convert your high-level descriptions into code, or to suggest improvements for existing code, you can significantly enhance your programming productivity.

Coding Assistant

Suppose you're working on a Python program to perform data analysis, but you're not sure how to write a function to calculate the median from a list of numbers. You might use ChatGPT like this:

Python median function
I'm trying to write a Python function that takes a list of numbers as an argument and returns the median. I'm not sure about the best way to implement this. Could you help me write the code?

ChatGPT could then provide you with a suitable Python function, demonstrating the logic to calculate the median from a list of numbers.

Debugging

Let's say you're having trouble with a piece of JavaScript code that's not behaving as expected. You could ask ChatGPT for help as follows:

Debugging JavaScript

my JavaScript code to add event listeners to buttons isn't working as expected. Here's the code:
let buttons = document.querySelectorAll('.btn');
for (let i = 0; i < buttons.length; i++) {
    buttons[i].addEventListener('click', function() {
        console.log('Button ' + i + ' clicked');
    });
}
When I click a button, it always logs 'Button 5 clicked', no matter which button I click. What's going wrong, and how can I fix it?"

ChatGPT could then explain the issue (in this case, a common pitfall with JavaScript closures) and suggest a corrected version of your code.

Limitations

Remember, while ChatGPT is knowledgeable in many programming languages and concepts, it doesn't replace a full Integrated Development Environment (IDE) or debugger and should be used as a supplementary tool for coding assistance.

  • Data Cleaning and Preprocessing: Automate the process of cleaning and preparing data for analysis, including handling missing values, data normalization, and outlier detection.
  • Code Generation: Generate code snippets for specific data analysis tasks, such as statistical tests, data visualization, and machine learning model implementation.
  • Algorithm Selection and Design: Suggest appropriate algorithms or models based on the characteristics of the data and the research question.
  • Automated Report Writing: Generate summaries of data analysis results, including key findings, visualizations, and interpretations.
  • Literature Review Assistance: Quickly find and summarize relevant research papers, identify key concepts, and extract important information.
  • Hypothesis Generation: Explore potential research questions and hypotheses based on existing data and literature.
  • Experimental Design: Assist in designing experiments, including determining sample sizes, selecting appropriate variables, and suggesting control measures.