Guest Column | June 10, 2025

A Guide To Designing mRNA Medicines

By Helen Gunter and Timothy Mercer, Australian Institute for Bioengineering and Nanotechnology, The University of Queensland

The cellular instructions to produce a protein are encoded within the mRNA sequence. Optimizing the mRNA primary sequence can markedly improve the magnitude and duration of protein expression and deliver better performance at a lower dosage. Conversely, a poorly designed mRNA sequence can create ongoing difficulties during downstream clinical development and manufacture.

The design of an effective mRNA medicine typically has three key aims. The first is to increase the magnitude of protein expression; the second is to improve mRNA stability and increase the duration of expression; and the third is to avoid unwanted innate immune responses. Here, we discuss the different design features and strategies employed to meet these objectives and optimize mRNA performance.

The Biology Of mRNA Sequence

An understanding of the biology of mRNA within the cell is needed to inform mRNA design. Within the cell, the mRNA sequence both encodes the protein and regulates translation. However, the primary sequence also impacts many other aspects of the mRNA life cycle, including splicing, transport, and subcellular localization. These roles are facilitated through interactions with RNA-binding protein (RBP) partners or other RNAs that bind to sequence elements and secondary structures within the mRNA. An understanding of these sequence elements can be leveraged to modulate the expression of mRNA medicines.

Sequence Design

An mRNA can encode almost any protein of almost any length and type, from short peptide sequences to long structural proteins. The amino acid sequence of the protein is encoded within an open reading frame (ORF) that begins with a methionine (AUG) start codon and ends with one or more stop codons. The amino acid sequence is reverse translated into a nucleotide sequence, using optimized codons that are commonly used in other highly expressed housekeeping mRNAs within the target organism or tissue.
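
As a minimal sketch of the reverse-translation step described above, the following Python maps each amino acid to a single preferred codon and appends a stop codon. The PREFERRED_CODON table and the reverse_translate function are illustrative assumptions rather than part of any specific tool; in practice, the preferences would be derived from codon-usage data for the target organism or tissue.

```python
# Illustrative one-codon-per-amino-acid table (an assumption for this sketch).
# A real design would derive preferences from codon-usage data for the
# target organism or tissue.
PREFERRED_CODON = {
    "A": "GCC", "R": "AGA", "N": "AAC", "D": "GAC", "C": "UGC",
    "Q": "CAG", "E": "GAG", "G": "GGC", "H": "CAC", "I": "AUC",
    "L": "CUG", "K": "AAG", "M": "AUG", "F": "UUC", "P": "CCC",
    "S": "AGC", "T": "ACC", "W": "UGG", "Y": "UAC", "V": "GUG",
    "*": "UGA",  # stop codon
}


def reverse_translate(protein: str, table: dict = PREFERRED_CODON) -> str:
    """Reverse translate a protein into an ORF using one codon per residue."""
    if not protein.startswith("M"):
        raise ValueError("The ORF must begin with methionine (AUG)")
    codons = [table[aa] for aa in protein]
    codons.append(table["*"])  # terminate the ORF with a stop codon
    return "".join(codons)


orf = reverse_translate("MAKLSGE")
print(orf)
```

Choosing a single codon per amino acid is the simplest possible strategy. It ignores the other considerations discussed in this article, such as secondary structure and uridine content, which is why dedicated design tools explore many synonymous alternatives.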

Difficult sequences, such as repetitive, low-complexity, or GC-rich regions, are also often avoided. These sequences are difficult to manufacture, prone to transcriptional and translational errors, and can fold into stable local secondary structures. Similarly, uridine-rich codons may be avoided, because uridines can be recognized by Toll-like receptors and may elicit an unwanted innate immune response.
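
The screening described above can be sketched with a few simple sequence metrics. The function names and the threshold values (max_gc, max_u, max_run) are assumptions chosen for illustration, not recommended cutoffs, and the sketch assumes a non-empty RNA sequence written with the letters A, C, G, and U.

```python
import re


def gc_fraction(seq: str) -> float:
    """Fraction of G and C nucleotides in the sequence."""
    return (seq.count("G") + seq.count("C")) / len(seq)


def uridine_fraction(seq: str) -> float:
    """Fraction of U nucleotides in the sequence."""
    return seq.count("U") / len(seq)


def longest_homopolymer(seq: str) -> int:
    """Length of the longest run of a single repeated nucleotide."""
    return max(len(m.group(0)) for m in re.finditer(r"(.)\1*", seq))


def screen(seq: str, max_gc: float = 0.65, max_u: float = 0.30,
           max_run: int = 5) -> list:
    """Return warnings for features that exceed the (assumed) thresholds."""
    warnings = []
    if gc_fraction(seq) > max_gc:
        warnings.append("GC-rich")
    if uridine_fraction(seq) > max_u:
        warnings.append("uridine-rich")
    if longest_homopolymer(seq) > max_run:
        warnings.append("homopolymer run")
    return warnings


example = "AUGGCCGGGGGGCCAUUUUA"
print(screen(example))
```

In practice, such metrics would usually be computed over sliding windows as well as the whole sequence, because a locally GC-rich or repetitive stretch can be hidden by an unremarkable global average.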

Untranslated Regions

The 5′ untranslated region (5′ UTR) is a noncoding sequence, upstream of the open reading frame, that has a primary role in determining the efficiency of translation initiation. The 5′ UTR is bound and scanned by the ribosome and contains RNA elements that modulate translation, such as the Kozak sequence, which enhances translation. Additional structures in the 5′ UTR, such as internal ribosome entry sites (IRES), can recruit ribosomes independently of the 5′ cap, while hairpins may hinder ribosome scanning and translation.
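
To illustrate how a 5′ UTR might be checked computationally, the sketch below classifies the Kozak context of a start codon and lists any upstream AUGs in the UTR. It assumes the commonly cited rule that a purine at position -3 and a G at position +4 (relative to the A of the AUG) are the most influential positions; the function names and the strong/adequate/weak labels are illustrative choices.

```python
def kozak_strength(mrna: str, start: int) -> str:
    """Classify the Kozak context of the AUG beginning at index `start`.

    Scores two positions only: a purine (A or G) at -3 and a G at +4.
    """
    if mrna[start:start + 3] != "AUG":
        raise ValueError("No AUG start codon at the given index")
    minus3 = mrna[start - 3] if start >= 3 else ""
    plus4 = mrna[start + 3] if start + 3 < len(mrna) else ""
    hits = int(minus3 in ("A", "G")) + int(plus4 == "G")
    return {2: "strong", 1: "adequate", 0: "weak"}[hits]


def upstream_augs(utr5: str) -> list:
    """Return the indices of AUG triplets in the 5' UTR (any frame)."""
    return [i for i in range(len(utr5) - 2) if utr5[i:i + 3] == "AUG"]


# Hypothetical 5' UTR and ORF fragment, used only to exercise the functions.
utr5 = "GGGAGACUGCCGCCACC"
mrna = utr5 + "AUGGCC"
print(kozak_strength(mrna, len(utr5)), upstream_augs(utr5))
```

Upstream AUGs are flagged because they can divert scanning ribosomes away from the intended start codon; whether that matters for a given construct would need to be tested experimentally.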

The 3′ UTR is typically longer than the 5′ UTR and has a greater role in modulating cell-specific mRNA expression. For example, endogenous microRNAs can bind to complementary sequences in the 3′ UTR and recruit the RNA-induced silencing complex (RISC) to repress mRNA translation and reduce stability. Similarly, AU-rich elements (AREs) within the 3′ UTR can drive mRNA degradation, while stabilizing secondary structures can protect against exonuclease activity and increase mRNA stability.
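
A candidate 3′ UTR can be scanned for these elements with simple motif searches. The sketch below assumes that a microRNA target site is approximated by the reverse complement of microRNA nucleotides 2-8 (the seed region) and that an ARE is approximated by the AUUUA pentamer; both are simplifications, and the microRNA and UTR sequences are examples used purely for illustration.

```python
COMPLEMENT = str.maketrans("AUGC", "UACG")


def seed_site(mirna: str) -> str:
    """Return the UTR motif complementary to microRNA nucleotides 2-8."""
    seed = mirna[1:8]
    return seed.translate(COMPLEMENT)[::-1]  # reverse complement


def find_all(seq: str, motif: str) -> list:
    """Return the start index of every (possibly overlapping) motif match."""
    k = len(motif)
    return [i for i in range(len(seq) - k + 1) if seq[i:i + k] == motif]


# Example sequences for illustration only.
mirna = "UGGAGUGUGACAAUGGUGUUUG"
utr3 = "GCUAAUUUAGGACACUCCAUUUAUUUA"

site = seed_site(mirna)
print(site, find_all(utr3, site), find_all(utr3, "AUUUA"))
```

Depending on the design goal, such matches may be removed to avoid repression or deliberately included to reduce expression in cell types where the corresponding microRNA is abundant.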

UTR sequences used in mRNA medicines are typically selected from housekeeping genes, which often possess short, unstructured UTRs that drive constitutive expression. Experimental techniques such as SELEX can also generate artificial UTRs through iterative rounds of competitive selection from randomized RNA libraries. In addition, high-throughput screening platforms can evaluate the impact of large numbers of UTRs on reporter gene expression and identify optimal UTRs according to tissue-specific requirements.

Poly(A) Tail

The mRNA sequence ends with a poly(A) tail, which enhances stability and protects the transcript from exonucleolytic degradation. The poly(A) tail also interacts with the translation initiation machinery to form a closed-loop structure. Longer poly(A) tails typically increase mRNA half-life; once the tail is shortened, the mRNA becomes susceptible to degradation via the 3′ exonuclease pathway. However, the inclusion of non-adenosine and modified nucleotides in the poly(A) tail can impede exonuclease digestion and improve mRNA stability.
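
As a small illustration of a tail that includes non-adenosine nucleotides, the sketch below builds a segmented poly(A) tail in which adenosine runs are separated by a short linker, and measures the uninterrupted adenosine run at the 3′ end. The segment lengths and the linker sequence are assumptions chosen for illustration, not design recommendations.

```python
def build_tail(a_runs=(30, 70), linker="GCAUAUGACU") -> str:
    """Join adenosine runs of the given lengths with a non-A-rich linker."""
    return linker.join("A" * n for n in a_runs)


def uninterrupted_3prime_a(seq: str) -> int:
    """Length of the continuous adenosine run at the 3' end of the sequence."""
    return len(seq) - len(seq.rstrip("A"))


tail = build_tail()
print(len(tail), uninterrupted_3prime_a(tail))
```

The same helper can be applied to a full-length construct to confirm that the encoded tail matches the intended design before synthesis.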

Secondary Structure

mRNA sequences can fold into complex, thermodynamically stable secondary and tertiary structures within the cell. These mRNA structures facilitate complex interactions, perform enzymatic and structural functions, and regulate many aspects of mRNA biology.

mRNA structures in 5′ UTRs can regulate translation in response to environmental cues. For example, RNA hairpins in heat-shock mRNAs melt at higher temperatures to reveal start codons, while riboswitches and related RNA switches can turn translation on or off by changing conformation in response to binding small-molecule ligands or trigger RNAs. Indeed, the engineering of RNA secondary structures to control mRNA expression has found diverse applications in synthetic biology.

The formation of long double-stranded RNA (dsRNA) can enhance the stability of mRNA. Due to hydrogen bonding and base-stacking interactions, dsRNA forms a more rigid A-helix that is less susceptible to hydrolytic attack and shearing forces than single-stranded RNA. However, extended dsRNA can be recognized by innate immune sensors (e.g., RIG-I or Toll-like receptors) and elicit unwanted immune responses.

A variety of tools for predicting mRNA structure have been developed. Thermodynamic tools such as RNAfold, mfold, and RNAstructure predict the minimum free energy RNA structure using nearest-neighbor models. These models are fast and widely used; however, their accuracy is more limited for longer and more complex mRNA sequences. The conserved signatures of mRNA structures, such as nucleotide covariation, can also be identified within sequence alignments across species using tools such as RNAalifold, R-scape, and EvoFold.
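
The dynamic-programming idea behind these folding tools can be sketched with a much simpler stand-in. The code below is not a nearest-neighbor energy model and does not reproduce RNAfold, mfold, or RNAstructure; it is a Nussinov-style recursion that maximizes the number of nested base pairs, shown only to illustrate how substructure solutions are combined. The min_loop parameter (minimum number of unpaired nucleotides in a hairpin loop) and the inclusion of G-U wobble pairs are assumptions of this sketch.

```python
# Allowed pairs: Watson-Crick plus G-U wobble (an assumption of this sketch).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def max_base_pairs(seq: str, min_loop: int = 3) -> int:
    """Maximum number of nested base pairs (Nussinov-style recursion)."""
    n = len(seq)
    if n == 0:
        return 0
    # dp[i][j] holds the best pair count for the subsequence seq[i..j].
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]  # case 1: nucleotide j is unpaired
            for k in range(i, j - min_loop):  # case 2: j pairs with k
                if (seq[k], seq[j]) in PAIRS:
                    left = dp[i][k - 1] if k > i else 0
                    best = max(best, left + 1 + dp[k + 1][j - 1])
            dp[i][j] = best
    return dp[0][n - 1]


print(max_base_pairs("GGGAAAUCCC"))
```

Thermodynamic tools follow a similar recursive decomposition but score stacking, loops, and bulges with experimentally derived free-energy parameters instead of counting pairs, which is why their predictions are considerably more realistic than this sketch.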

The prediction of mRNA structures can be validated using experimental techniques such as SHAPE and DMS probing, which chemically modify unpaired or flexible nucleotides to reveal RNA secondary structure. With increasing data on RNA structures, emerging AI methods, including SPOT-RNA, MXfold2, and RNAformer, use neural networks trained to learn folding rules and predict challenging structures such as pseudoknots and longer mRNA sequences.

Design Software

Several software tools have been developed to assist the sequence design of mRNA medicines. Algorithmic approaches, such as mRNArchitect, systematically explore the combinatorial sequence space to identify the optimal solutions that balance multiple hard and soft constraints like codon usage, GC content, and RNA secondary structures. Alternative methods, such as LinearDesign, use dynamic programming to optimize mRNA secondary structure and enhance stability.
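
The distinction between hard and soft constraints can be illustrated with a toy search. The sketch below is not the algorithm used by mRNArchitect or LinearDesign; it simply samples random synonymous codon choices, discards candidates that contain a forbidden motif (a hard constraint), and keeps the candidate whose GC content is closest to a target (a soft constraint). The abbreviated codon table, the target GC value, the forbidden motifs, and the function names are all assumptions for illustration.

```python
import random

# Abbreviated synonymous-codon table covering only the example protein.
SYNONYMS = {
    "M": ["AUG"],
    "A": ["GCU", "GCC", "GCA", "GCG"],
    "K": ["AAA", "AAG"],
    "L": ["CUG", "CUC", "CUU", "CUA", "UUG", "UUA"],
    "G": ["GGU", "GGC", "GGA", "GGG"],
    "E": ["GAA", "GAG"],
}
CODON_TO_AA = {codon: aa for aa, codons in SYNONYMS.items() for codon in codons}


def translate(seq: str) -> str:
    """Translate an in-frame RNA sequence using the abbreviated table."""
    return "".join(CODON_TO_AA[seq[i:i + 3]] for i in range(0, len(seq), 3))


def gc_fraction(seq: str) -> float:
    return (seq.count("G") + seq.count("C")) / len(seq)


def design(protein: str, n_samples: int = 500, gc_target: float = 0.55,
           forbidden=("AAAAA", "GGGGG"), seed: int = 0):
    """Return the sampled sequence that best meets the soft constraint."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    best, best_score = None, float("inf")
    for _ in range(n_samples):
        cand = "".join(rng.choice(SYNONYMS[aa]) for aa in protein)
        if any(motif in cand for motif in forbidden):
            continue  # hard constraint: reject outright
        score = abs(gc_fraction(cand) - gc_target)  # soft constraint: penalty
        if score < best_score:
            best, best_score = cand, score
    return best


candidate = design("MAKLGE")
print(candidate, round(gc_fraction(candidate), 2))
```

Random sampling scales poorly with sequence length because the number of synonymous sequences grows exponentially, which is one reason dedicated tools rely on systematic search or dynamic programming instead.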

Machine learning approaches have proven adept at resolving regulatory codes, including transcription factor binding sites, ribosome binding sites, and splicing elements, from next-generation sequencing (NGS) data sets. These models are trained on large-scale data sets, such as RNA sequencing, genomic data, ribosome profiling, and reporter assays, and can model the complex relationship between mRNA sequence and performance.

Examples such as mDD-0 and RNA Diffusion use discrete diffusion models that iteratively refine random sequences into optimized mRNA constructs by probabilistic sampling of codon patterns and motifs. GEMORNA and PARADE use deep generative models to produce novel codon-optimized and UTR sequences, while mRNA2vec and UTR-LM use language models to predict mRNA performance and identify new regulatory elements.

Conclusion

Much of the life cycle and regulation of endogenous mRNA is driven by its primary sequence. In addition to encoding proteins, mRNAs contain sequence elements that interact with the cellular machinery and regulate their translation. As our understanding of mRNA biology advances, it informs the improved design of mRNA medicines with the desired expression profile.

Advances in AI, such as the development of foundation models for mRNA design and RNA structural prediction, have the potential to model the complex relationships between mRNA sequence and expression, stability, and immunogenicity. However, these generalized models must be trained and tested on large and diverse data sets, such as RNA sequencing and high-throughput assays, and experimentally validated. Nevertheless, these advances provide further opportunities to improve the design of complex and effective mRNA medicines.

About The Authors:

Helen M. Gunter, Ph.D., is a senior research scientist at the University of Queensland's Australian Institute for Bioengineering and Nanotechnology and the BASE mRNA Facility. With over 17 years of experience in genomics research across Australia, Germany, and the UK, her work focuses on developing tools to enhance the design, quality, and efficacy of mRNA medicines.

Professor Timothy Mercer, Ph.D., is with the Australian Institute for Bioengineering and Nanotechnology (AIBN), The University of Queensland (UQ), and has expertise in RNA biology, genomics, and bioinformatics. Mercer’s research in gene expression and transcriptomics has been developed into a range of biotechnologies that have been adopted in both research and clinical use. He is founding director of the BASE mRNA Facility, which is focused on the research and manufacture of new mRNA medicines.