Update 2025-03-12: We have since improved STGG+ and added active learning (STGG+AL). It beats RL methods at generating molecules with complex properties, and the molecules we get are much nicer than the ones from the original paper. Molecule synthesizability can be improved simply by adding constraints such as max-ring-size ≤ 6 and removing overly large molecules (STGG+ already takes care of enforcing proper valency rules). See below for an example of a molecule made by STGG+AL.

——————————————————————————————————–
Paper / Code
Twitter (sorry, 𝕏) is obsessed with Large Language Models (LLMs) lately, so we hear very little about other cool applications of generative AI. Molecule generation is an exciting area of generative AI since it can be used to generate new drugs or materials (such as the organic compounds in OLED displays, found in your smartphone screen and even in newer TVs). In this work, we describe a powerful new method for property-conditional molecule generation with self-criticism, built on the lesser-known Spanning Tree-based Graph Generation (STGG) method. We develop several exciting techniques, such as self-filtering through property prediction and random classifier-free guidance.
The molecules we love


What we care about for real-world molecule generation 
Generating valid instead of invalid molecules: Generative models trained on standard molecule representations (SMILES, 2D graphs) often produce invalid molecules, especially large ones. SELFIES is a well-known way to prevent invalid molecules through a specific grammar, but it often leads to worse performance. STGG is a lesser-known alternative: it applies specific if/else rules during next-token sampling to mask out invalid tokens. We start from STGG as our base approach since it is one of the best-performing molecule representations (along with GEEL) for unconditional generation.
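To make the masking idea concrete, here is a minimal Python sketch of masked next-token sampling. The valence rule and the helper names (`allowed_bond_tokens`, `masked_sample`) are hypothetical simplifications for illustration; STGG's actual rules are more involved (see the paper).

```python
import math
import random

def allowed_bond_tokens(remaining_valence, bond_orders):
    """Hypothetical rule: a bond token is valid only if its order fits the
    remaining valence of the current atom (e.g. 4 for a fresh carbon)."""
    return {i for i, order in enumerate(bond_orders) if order <= remaining_valence}

def masked_sample(logits, allowed, rng=random):
    """Sample a token index, giving invalid (masked) tokens zero probability."""
    if not allowed:
        raise ValueError("no valid token to sample")
    # Masked tokens get a -inf logit, so exp(-inf) == 0 after the softmax.
    masked = [x if i in allowed else -math.inf for i, x in enumerate(logits)]
    top = max(masked)
    weights = [math.exp(x - top) for x in masked]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]

# Bond tokens "-", "=", "#" have orders 1, 2, 3. A carbon with only one free
# valence left can only take a single bond, even if the model prefers "#".
allowed = allowed_bond_tokens(1, [1, 2, 3])
token = masked_sample([0.1, 1.0, 5.0], allowed, rng=random.Random(0))
```

Because the mask is applied before sampling, no post-hoc rejection is needed: every sampled sequence respects the rules by construction.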
Property-conditional generation: Most research on molecule generation focuses on unconditional generation. In the real world, however, we want molecules with specific desired properties; unconditional generation on its own has little practical use.
Any-property conditioning: We also want to condition on any combination of desirable properties without retraining the model every time. We make this possible by masking random combinations of properties during training.
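One way to picture this is to hide a random subset of properties for each training example, so the model learns to condition on whatever it is given. The sketch below is a minimal illustration: the drop probabilities and example values are made up, `MASK` stands in for a learned "unknown property" embedding, and occasionally dropping all properties at once (an assumption here) is what would give the unconditional model used for classifier-free guidance.

```python
import random

MASK = None  # placeholder for a learned "property unknown" embedding

def mask_properties(props, p_drop_each=0.5, p_drop_all=0.1, rng=random):
    """Return a copy of `props` where a random subset is replaced by MASK.
    Probabilities are illustrative, not taken from the paper."""
    if rng.random() < p_drop_all:
        # Fully unconditional example (useful for classifier-free guidance).
        return {name: MASK for name in props}
    return {name: (MASK if rng.random() < p_drop_each else value)
            for name, value in props.items()}

example = {"logP": 2.3, "QED": 0.71, "MW": 312.4}  # made-up property values
masked = mask_properties(example, rng=random.Random(0))
```

At generation time, the same mechanism lets you pass only the properties you care about and leave the rest masked.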
A powerful, fast, and modern architecture: We improve on the base STGG Transformer using the tricks of modern LLMs: RMSNorm, projection-weight initialization, no bias terms, rotary embeddings, FlashAttention-2, SwiGLU, and updated hyperparameters.
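Two of these components are easy to show in isolation. The pure-Python sketch below implements RMSNorm and a bias-free SwiGLU feed-forward block using their standard definitions; it is an illustration, not the STGG+ code, and a real model would use a tensor library.

```python
import math

def rms_norm(x, gain, eps=1e-6):
    """RMSNorm: rescale by the root-mean-square only (no mean subtraction,
    no bias), then apply a learned per-feature gain."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [g * v / rms for g, v in zip(gain, x)]

def linear(w, x):
    """Bias-free linear layer: y = W x (W given as a list of rows)."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]

def silu(v):
    """SiLU / Swish, written to avoid overflow for large |v|."""
    if v >= 0:
        return v / (1.0 + math.exp(-v))
    e = math.exp(v)
    return v * e / (1.0 + e)

def swiglu_ffn(x, w_gate, w_up, w_down):
    """SwiGLU feed-forward: W_down (silu(W_gate x) * (W_up x)), no biases."""
    gate = [silu(v) for v in linear(w_gate, x)]
    up = linear(w_up, x)
    return linear(w_down, [g * u for g, u in zip(gate, up)])

x = [3.0, 4.0]
y = rms_norm(x, gain=[1.0, 1.0])  # mean of squares is now ~1
identity = [[1.0, 0.0], [0.0, 1.0]]
h = swiglu_ffn(x, identity, identity, identity)
```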
Self-criticism: Synthesizing a molecule can take days, weeks, or even months, so we cannot expect overworked chemists to synthesize and measure the properties of all our generated molecules! We need a way to filter the molecules before handing them to chemists. Our idea: have the generative model predict the properties of its own molecules! With this property-prediction ability, the model can self-criticize its generated molecules and automatically filter out those with incorrect properties.
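A minimal sketch of the filtering step, assuming a hypothetical `predict` function that returns the model's own property predictions for a molecule, and illustrative per-property tolerances `tol` (the molecule names and numbers are made up):

```python
def self_filter(molecules, predict, target, tol):
    """Keep only molecules whose self-predicted properties all fall within
    tol[p] of the requested value. `predict` is a hypothetical stand-in
    for the model's property head; tolerances are illustrative."""
    kept = []
    for mol in molecules:
        pred = predict(mol)
        # Only check the properties that were actually requested.
        if all(abs(pred[p] - v) <= tol[p] for p, v in target.items()):
            kept.append(mol)
    return kept

preds = {"mol_A": {"logP": 1.0}, "mol_B": {"logP": 2.9}, "mol_C": {"logP": 3.4}}
kept = self_filter(["mol_A", "mol_B", "mol_C"], preds.get,
                   target={"logP": 3.0}, tol={"logP": 0.5})
# kept == ["mol_B", "mol_C"]
```

Only the surviving molecules would then be passed on for synthesis, which is where the time savings come from.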
Classifier-free guidance: Classifier-free guidance (CFG) is a technique originally developed to improve conditional diffusion models, and it has also been shown to help language models. We found that it improves autoregressive molecule generation as well.
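For autoregressive models, a common way to apply CFG (assumed here; the exact formulation in STGG+ may differ) is to mix the conditional and unconditional next-token logits at every step. With this convention, w = 1 is plain conditional sampling, matching the w < 1 / w > 1 terminology used in this post.

```python
def cfg_logits(cond, uncond, w):
    """Classifier-free guidance on next-token logits: uncond + w * (cond - uncond).
    w = 0 gives the unconditional model, w = 1 the conditional one, and
    w > 1 pushes further in the direction of the condition."""
    return [u + w * (c - u) for c, u in zip(cond, uncond)]

cond = [2.0, 0.0, -1.0]    # logits with the desired properties given
uncond = [1.0, 1.0, -1.0]  # logits with all properties masked
print(cfg_logits(cond, uncond, 2.0))  # [3.0, -1.0, -1.0]
```

In this sketch, the validity mask would then be applied to the guided logits before sampling, so guidance never reintroduces invalid tokens.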
Out-of-distribution properties: We may want to generate novel molecules with out-of-distribution properties, never observed before, to expand the range of our molecular knowledge. Such properties generally take extreme values, and classifier-free guidance with a large guidance scale (w > 1) sometimes hurts performance on them. We propose to randomly sample a guidance scale w ∼ U(0.5, 2) for each sample, ensuring a mix of low (w < 1) and high (w > 1) guidance. Our method then selects the best-out-of-k molecule among those generated at different guidance levels, indirectly letting the model decide for itself which guidance is best for each sample.
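Random guidance combined with best-of-k selection can be sketched as follows. `generate(target, w)` and `predict(mol)` are hypothetical stand-ins for guided sampling and the model's property head, and the sum-of-absolute-errors score is an illustrative choice.

```python
import random

def best_of_k_random_guidance(generate, predict, target, k=8,
                              w_low=0.5, w_high=2.0, rng=random):
    """Generate k candidates, each with its own guidance w ~ U(w_low, w_high),
    and return (molecule, w) for the candidate whose self-predicted
    properties are closest to the target (sum of absolute errors)."""
    best = None
    for _ in range(k):
        w = rng.uniform(w_low, w_high)  # mix of low (w < 1) and high (w > 1) guidance
        mol = generate(target, w)       # hypothetical guided sampler
        pred = predict(mol)             # hypothetical property head
        err = sum(abs(pred[p] - v) for p, v in target.items())
        if best is None or err < best[0]:
            best = (err, mol, w)
    return best[1], best[2]

# Toy stand-ins: the "molecule" is just the guidance used, and its predicted
# property equals that number, so an extreme target favors the largest w drawn.
mol, w = best_of_k_random_guidance(lambda target, w: w, lambda m: {"x": m},
                                   target={"x": 10.0}, k=8, rng=random.Random(1))
```

No single guidance scale has to be tuned in advance: the selection step picks whichever draw worked best for this particular target.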
The resulting method (STGG+)
Figure 1: Our STGG+ architecture. The molecule is tokenized and embedded. Embeddings of the number of started rings and of the continuous and categorical properties are added, and the result is passed to the Transformer. The Transformer output is then split to produce 1) the predicted properties and 2) the token predictions (masked to prevent invalid tokens).
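The input side of Figure 1 amounts to summing several embeddings per position. The pure-Python sketch below assumes simple lookup tables for tokens, ring counts, and categorical properties, a learned vector scaled by the value for continuous properties, and a learned per-property vector for masked properties; all table layouts and names are hypothetical.

```python
def embed_input(token, n_open_rings, props, tables):
    """Sum token, ring-count and property embeddings into one input vector.
    `props` maps property name -> value, or None if the property is masked."""
    vec = [a + b for a, b in zip(tables["token"][token], tables["rings"][n_open_rings])]
    for name, value in props.items():
        if value is None:               # masked property: learned "unknown" vector
            e = tables["mask"][name]
        elif isinstance(value, str):    # categorical property: table lookup
            e = tables["cat"][name][value]
        else:                           # continuous property: value times a learned vector
            e = [value * c for c in tables["cont"][name]]
        vec = [a + b for a, b in zip(vec, e)]
    return vec

# Tiny 2-dimensional tables with made-up numbers, for illustration only.
tables = {
    "token": {"C": [1.0, 0.0]},
    "rings": {0: [0.0, 0.0], 1: [0.0, 1.0]},
    "cont": {"logP": [1.0, 2.0]},
    "cat": {"kind": {"drug": [1.0, 1.0]}},
    "mask": {"logP": [5.0, 5.0], "kind": [0.5, 0.5]},
}
x = embed_input("C", 1, {"logP": 2.0, "kind": "drug"}, tables)
```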

Figure 2: Generation and self-prediction using STGG+. We autoregressively generate K molecules conditional on the desired properties using classifier-free guidance. The unconditional model then predicts the properties of the K molecules, and the molecule predicted to be closest to the desired properties is returned.

Some results


Final words
Check out the paper for more details! We hope this gets you more interested in molecule generation, a field with many exciting applications, and we believe this work paves the way toward real-world use. As a next step, at Samsung, we will apply this method to search for novel materials. Stay tuned!
