Experiments#

For reproducing experiments from our paper, please have a look at the experiments folder.

You can find prior experiments in our old repository.

0. No fine-tuning#

The 00_no_fine_tuning folder contains the code for the experiments we conducted to investigate the performance of GPT without any fine-tuning. We find that, without fine-tuning, GPT cannot answer our questions, in part simply because it does not know the format of the prompts and of the completions we expect.
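For illustration, a minimal sketch of such a zero-shot query using the legacy openai Python SDK (openai<1.0); the prompt shown is a hypothetical stand-in, not necessarily the exact format used in our experiments:

```python
# Zero-shot query against a base model (legacy openai SDK, openai<1.0).
# The "###" prompt suffix below is a hypothetical stand-in for the actual
# prompt format used in our experiments.
import openai

prompt = "What is the transition wavelength of CN1C=NC2=C1C(=O)N(C)C(=O)N2C?###"

response = openai.Completion.create(
    model="text-davinci-003",
    prompt=prompt,
    max_tokens=10,
    temperature=0,
)

# Without fine-tuning, the model has never seen this prompt suffix or the
# expected completion format, so the output is typically unusable.
print(response["choices"][0]["text"])
```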

1. Tuning parameter influence#

The 01_tuning_parameter_influence folder contains the code for the experiments we conducted to investigate the influence of several tuning parameters on classification performance. For computational reasons, we limited these experiments to a binary classification task on the photoswitch dataset. For these experiments, we split the dataset into two balanced classes; a dummy classifier therefore has an accuracy of 50%.

Note

We recommend always also training dummy models, e.g., sklearn.dummy.DummyClassifier or sklearn.dummy.DummyRegressor, to get a sense of the performance of a random model. Only consider GPT-3’s predictions if they are better than those of the dummy model.
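A minimal sketch of such a baseline, assuming a continuous target that is binarized at its median to obtain two balanced classes (the synthetic targets below are placeholders for the real data):

```python
# Median binarization plus a dummy baseline (scikit-learn).
import numpy as np
from sklearn.dummy import DummyClassifier

rng = np.random.default_rng(42)
y_continuous = rng.normal(450, 50, size=200)  # stand-in for the real targets

# Split at the median -> two balanced classes, so chance accuracy is ~50%.
y = (y_continuous > np.median(y_continuous)).astype(int)
X = np.zeros((len(y), 1))  # dummy features; DummyClassifier ignores them

dummy = DummyClassifier(strategy="stratified", random_state=42)
dummy.fit(X, y)
print(f"dummy accuracy: {dummy.score(X, y):.2f}")  # ~0.5
```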

We investigated the following parameters (a sketch of how they map onto a fine-tuning call follows the list):

  • num_train_epochs: Number of training epochs

  • learning_rate_multiplier: Multiplier for the learning rate

  • base_model: The base model used for fine-tuning
    • GPT-3:
      • text-davinci-003

      • text-ada-001

      • text-babbage-002

      • text-curie-002

    At the beginning of Jan 2023, Codex models could not yet be fine-tuned.
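A minimal sketch of how these parameters could map onto the legacy OpenAI fine-tunes endpoint (openai<1.0); the training-file name is a placeholder, and the mapping of num_train_epochs to the API’s n_epochs argument is our reading of the endpoint, not code from the repository:

```python
# Fine-tuning with the legacy openai SDK (openai<1.0).
import openai

# "train.jsonl" is a placeholder for a JSONL file of prompt/completion pairs.
train_file = openai.File.create(file=open("train.jsonl", "rb"), purpose="fine-tune")

job = openai.FineTune.create(
    training_file=train_file["id"],
    model="ada",                    # base_model
    n_epochs=4,                     # num_train_epochs
    learning_rate_multiplier=0.05,  # learning_rate_multiplier
)
print(job["id"])
```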

To run the baselines, the following additional dependencies are needed:

3. Classification experiments#

Each folder contains a Python script to run the experiments and a notebook to analyze the results.

You will find subfolders for the following experiments:

HOMO/LUMO gaps#

See the folder bandgap

To run the baselines, the following additional dependencies are needed:

Heat capacity#

See the folder cv

To run the baseline, follow the instructions provided by Moosavi et al. For the composition-based baseline, you also need CrabNet. Follow the installation instructions on Sterling Baird’s fork of the CrabNet repository.

High entropy alloy phase#

See the folder hea_phase

High entropy alloy single vs multiphase#

See the folder hea_single_vs_multiphase

To run the baselines, the following additional dependencies are needed:

Henry coefficients#

See the folder henry.

To run the baselines, the following additional dependencies are needed:

Lipophilicity#

See the folder lipophilicity

To run the baselines, the following additional dependencies are needed:

Matbench#

See the folder matbench. It contains one folder per task.

OPV#

See the folder opv

To run the baselines, the following additional dependencies are needed:

Photoswitches#

See the folder photoswitch

To run the baselines, the following additional dependencies are needed:

Polymers#

To run the baselines, the following additional dependencies are needed:

C-N cross-coupling#

See the folder rxn_doyle.

To run the baselines, the following additional dependencies are needed:

C-C cross-coupling#

See the folder rxn_suzuki.

To run the baselines, the following additional dependencies are needed:

Solubility#

See the folder solubility

To run the baselines, the following additional dependencies are needed:

4. Regression experiments#

The regression experiments follow the same structure as the classification experiments.
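As with classification, a dummy baseline is a useful sanity check. A minimal sketch with synthetic placeholder targets:

```python
# Dummy regression baseline (scikit-learn), analogous to the classifier note above.
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(0)
y = rng.normal(1.5, 0.3, size=200)  # stand-in for the real regression targets
X = np.zeros((len(y), 1))           # dummy features; DummyRegressor ignores them

dummy = DummyRegressor(strategy="median")
dummy.fit(X, y)
print(f"dummy MAE: {mean_absolute_error(y, dummy.predict(X)):.3f}")
```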

5. Inverse design#

The evaluation of the HOMO-LUMO gap inverse design expects the scripts to be run on a server with Slurm as the job scheduler.
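For orientation, a minimal sketch of submitting such scripts from Python; the submission_scripts directory and *.sh naming are hypothetical, not the repository’s actual layout:

```python
# Submit all Slurm batch scripts in a (hypothetical) directory via sbatch.
import subprocess
from pathlib import Path

for script in sorted(Path("submission_scripts").glob("*.sh")):
    result = subprocess.run(
        ["sbatch", str(script)], capture_output=True, text=True, check=True
    )
    print(result.stdout.strip())  # e.g. "Submitted batch job 12345"
```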

Some experiments hard-code pretrained models. You won’t have access to those, as they are limited to our organization. However, you can fine-tune a model yourself and then change the model name.
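A minimal sketch of swapping in your own model with the legacy openai SDK (openai<1.0); the job ID is a placeholder for the one returned by your own fine-tuning run:

```python
# Look up your own fine-tuned model and use its name instead of ours.
import openai

job = openai.FineTune.retrieve(id="ft-abc123")  # placeholder job ID
model_name = job["fine_tuned_model"]

response = openai.Completion.create(
    model=model_name,  # replaces the hard-coded model name
    prompt="...",      # your prompt here
    max_tokens=10,
    temperature=0,
)
```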

6. Prompt structure#

In these experiments, we attempted to add some examples (with dummy data) to the prompt, hoping that this would make learning more efficient because the model might “in-context learn” the structure of the prompt. However, we found that this rather confused the model.

7. Few shot learning#

This folder contains our experiments on in-context learning. Note that these experiments also call the largest models and hence can be quite expensive to run.
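A minimal sketch of how a few-shot prompt can be assembled; the labels and molecules are hypothetical placeholders, not the exact template used in our experiments:

```python
# Assemble a few-shot (in-context learning) prompt from labeled examples.
examples = [
    ("CCO", "low"),
    ("c1ccccc1", "high"),
    ("CC(=O)O", "low"),
]
query = "CCN"

prompt = "\n".join(f"Molecule: {smiles}\nLabel: {label}" for smiles, label in examples)
prompt += f"\nMolecule: {query}\nLabel:"
print(prompt)
```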

8. Permutation test#

Our experiments on the photoswitch dataset in which we permute the labels to test whether this differs from learning on chemically meaningful data.
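A minimal sketch of the permutation control, with synthetic stand-in labels:

```python
# Label permutation: sever the link between molecules and their labels.
import numpy as np
from sklearn.utils import shuffle

rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=100)         # stand-in for the real labels
y_permuted = shuffle(y, random_state=0)  # breaks the structure-label link

# Fine-tune once on y and once on y_permuted; comparable performance would
# suggest the model is not learning from chemically meaningful signal.
```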

9. Invalid prompts#

In these experiments, we used some pre-trained models. You won’t have access to those, as they are limited to our organization. However, you can fine-tune a model yourself and then change the model name (see the sketch in the inverse design section above).

10. Functional groups#

As in the previous section, these experiments use pre-trained models that are limited to our organization. You can fine-tune a model yourself and then change the model name.