A structure-based mechanism for tRNA and retroviral RNA remodelling during primer annealing

To prime reverse transcription, retroviruses require annealing of a transfer RNA molecule to the U5 primer binding site (U5-PBS) region of the viral genome. The residues essential for primer annealing are initially locked in intramolecular interactions; hence, annealing requires the chaperone activity of the retroviral nucleocapsid (NC) protein to facilitate structural rearrangements. Here we show that, unlike classical chaperones, the Moloney murine leukaemia virus NC uses a unique mechanism for remodelling: it specifically targets multiple structured regions in both the U5-PBS and tRNAPro primer that otherwise sequester residues necessary for annealing. This high-specificity and high-affinity binding by NC consequently liberates these sequestered residues—which are exactly complementary—for intermolecular interactions. Furthermore, NC utilizes a step-wise, entropy-driven mechanism to trigger both residue-specific destabilization and residue-specific release. Our structures of NC bound to U5-PBS and tRNAPro reveal the structure-based mechanism for retroviral primer annealing and provide insights as to how ATP-independent chaperones can target specific RNAs amidst the cellular milieu of non-target RNAs.

To prime reverse transcription, retroviruses require annealing of a transfer RNA molecule to the U5 primer binding site (U5-PBS) region of the viral genome [1,2]. The residues essential for primer annealing are initially locked in intramolecular interactions [3-5]; hence, annealing requires the chaperone activity of the retroviral nucleocapsid (NC) protein to facilitate structural rearrangements [6]. Here we show that, unlike classical chaperones, the Moloney murine leukaemia virus NC uses a unique mechanism for remodelling: it specifically targets multiple structured regions in both the U5-PBS and tRNAPro primer that otherwise sequester residues necessary for annealing. This high-specificity and high-affinity binding by NC consequently liberates these sequestered residues, which are exactly complementary, for intermolecular interactions. Furthermore, NC utilizes a step-wise, entropy-driven mechanism to trigger both residue-specific destabilization and residue-specific release. Our structures of NC bound to U5-PBS and tRNAPro reveal the structure-based mechanism for retroviral primer annealing and provide insights as to how ATP-independent chaperones can target specific RNAs amidst the cellular milieu of non-target RNAs.
*These authors contributed equally to this work.

Figure 1 | a, b, Secondary structures of U5-PBS (a) and tRNAPro (b) with complementary PBS/anti-PBS and PAS/anti-PAS sequences shown in blue and green, respectively. The two internal loops in U5-PBS are boxed, and the first, second and third high-affinity NC binding site sequences are shown in red, orange and yellow, respectively. The tRNAPro residue numbering reflects the canonical numbering scheme for all tRNAs and, hence, the number 17 is omitted. c, NMR structure of the free U5-PBS RNA. Insets: top, zoom-in view of the sequestration of U145 and ⁺¹U146 residues by U109 and C108 of the UCUG110 NC binding site; middle, the G110 aromatic ring is turned outside of the helix in a conformation poised for NC interaction; bottom, the minor, extruded conformation of G155 is shown. PAS residues are sequestered by alternating G-C and G-U base pairs in both conformations. d, One-dimensional slice of a ¹H-¹³C spectrum depicting the NC tail-mediated increase of the minor conformation (indicated by an asterisk) of the tetraloop C128 residue. On the basis of the population distribution in the pre-bound state, we estimate the extruded conformation to have a free energy ~1.4 kcal mol⁻¹ greater than the stacked conformation. e, Structure of NC bound to UCUG110 via the zinc finger (the Trp 35 and Tyr 28 stacking interactions are in black) showing that the protein tails can extend to contact the UCAG130 tetraloop and residues in and near the (G97A, G155U) internal loop. f, Left, secondary structures of the wild-type (WT), G110U and deletion mutant (DM). Right, reduction of viral infectivity is observed in mutants (n = 6). Error bars indicate standard deviations (n = 6 for both packaging and infectivity experiments). As a negative control, heat-inactivated (HI) virions were used for infection. A packaging assay was also done to confirm that the mutations do not affect genome encapsidation (see Extended Data Fig. 4g).
Retroviruses preferentially use specific host tRNAs as primers for the first step of reverse transcription; for example, human immunodeficiency virus requires tRNALys3, while Moloney murine leukaemia virus (MLV) uses tRNAPro [1,2]. Two distinct sequences in the tRNA anneal to complementary sequences in the retroviral U5-PBS domain to form the initiation complex: the 3′ end of the tRNA acceptor stem anneals to the 18-nucleotide PBS sequence [7], while a portion of the tRNA TΨC arm base-pairs with a primer activation signal (PAS) [8,9] (Fig. 1a, b). However, for primer annealing to occur, favourable intramolecular associations involving both the PBS and PAS in U5-PBS and the complementary anti-PBS and anti-PAS sequences in tRNA must first be disrupted by NC chaperone proteins. Mechanistically, all NC proteins have thus far been thought to function as classical, ATP-independent chaperones, using both their zinc-finger domain(s) and unstructured tails for this process [10,11] (Extended Data Fig. 1a). ATP-independent chaperones are known to permit RNA molecules to access higher energy conformations and then allow refolding by rapidly dissociating from the RNA during the process [12]. These transitory, low-affinity interactions generally necessitate the coating of an RNA with many molecules of chaperone [12] (Extended Data Fig. 1b). In addition, and in contrast, to the transient interactions of NC proteins [13-15], the NC zinc fingers are also capable of sequence-specific, high-affinity binding to RNA [16]. However, this mode of interaction has, until now, been thought to be used exclusively for the recognition of the genome via interaction with the Ψ-genome packaging signal during viral assembly (Supplementary Discussion 1).
To gain insights into the mechanism of NC-mediated primer annealing in MLV, a prototypical retrovirus, we solved structures of both the genomic U5-PBS RNA and the tRNAPro primer, both in the free form and in complex with MLV NC proteins, by NMR spectroscopy.
The free U5-PBS is a largely linear molecule capped by a structured tetraloop (UCAG130) and contains one single-nucleotide bulge (A122) and two internal loops ((UCUGA111, UU146) and (GA98, GU156)) (Fig. 1a, c, Extended Data Figs 2, 3 and Extended Data Table 1). In the absence of NC, the (UCUGA111, UU146) internal loop maintains a distinct, folded configuration in which residue C108 of the NC binding site sequesters the 5′ end of the PBS sequence (⁺¹U146) via an intramolecular ribose zipper interaction [17,18] (superscript denotes the PBS position from 5′ to 3′) (Fig. 1c and Extended Data Fig. 3e-g). Residue U109 of the NC binding site also base pairs with U145, which is the first template residue read by reverse transcriptase. Continuous intramolecular base stacking interactions from U144 to ⁺²G147 further serve to tether the 5′ end of the PBS inside the internal loop. On the other side of the bulge, continuous nuclear Overhauser enhancements (NOEs) indicate the stacking of C106 with C108 and, hence, extrusion of residue U107. Importantly, residue G110 exhibits a syn glycosidic torsion angle and faces the major groove, making it poised for interaction with NC (Fig. 1c). In the NC-bound structure, the NC zinc finger binds the UCUG110 sequence of the internal loop (dissociation constant (Kd) = 33 ± 3 nM; Extended Data Fig. 4a) in a mode similar to that previously described for UCUG309 in the MLV Ψ-genome packaging signal [19-21] (see Supplementary Discussion 1 and 2). Notably, since NC-binding residues are initially involved in sequestering the first template (U145) and the first PBS (⁺¹U146) residues, the well-defined internal loop structure is mutually exclusive with NC binding. Thus, NC binding liberates U145 and ⁺¹U146 for primer annealing and reverse transcription initiation (Fig. 1e).
Whereas the NC tails are not involved in binding the Ψ-packaging signal, in U5-PBS, the NC tails specifically remodel the (GA98, GU156) internal loop and the UCAG130 capping tetraloop (Fig. 1e). In the absence of NC, the six PAS residues in the lower stem, ₊₁G99 to ₊₆U104 (subscript with plus sign denotes the PAS position from 5′ to 3′), form base pairs with PBS residues (Fig. 1c and Extended Data Fig. 2a), and hence are not available for primer annealing. The preceding (GA98, GU156) internal loop, however, exists in multiple conformations; in the major form, a continuous internal stacking of all residues leads to the formation of tandem A98-⁺¹⁰G155 and G97-⁺¹¹U156 non-canonical base pairs, while in the minor, destabilized form, the ⁺¹⁰G155 and ⁺¹¹U156 PBS residues are extra-helical (Fig. 1c and Extended Data Figs 2a, 3h-j). Similarly, the capping UCAG130 tetraloop forms a YNMG-type structure [22] (Extended Data Fig. 3a), with the C128 base either stacked on A129 or, in a small population, extruded from the structure (Fig. 1d and Extended Data Fig. 3c). Interaction with the NC tails alters the equilibria between the two conformations in favour of the minor, destabilized conformations (Fig. 1d) and hence leads to release of the 10th and 11th (⁺¹⁰G155 and ⁺¹¹U156) PBS residues and the C128 tetraloop residue. The destabilization of the latter is important for cooperative binding of a second NC to the tetraloop (Extended Data Fig. 4a; see Supplementary Discussion 3). Thus, the positively charged NC tails do not globally destabilize U5-PBS but instead specifically target residues inherently predisposed for destabilization. Furthermore, because PAS residues immediately follow the destabilized internal loop (Fig. 1a), the NC tails also specifically perturb residues ₊₁G99, ₊₂G100 and ₊₃G101 (Extended Data Fig. 3k).
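The free-energy gap of about 1.4 kcal mol⁻¹ quoted in the Fig. 1d legend for the extruded C128 conformation can be related to conformer populations through a two-state Boltzmann relation. The following is a minimal sketch and not the authors' analysis: the function names are hypothetical, and the temperature of 35 °C is an assumption taken from the NMR acquisition conditions given in the Methods.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal mol^-1 K^-1


def minor_fraction(delta_g_kcal: float, temp_c: float = 35.0) -> float:
    """Fraction of the higher-energy conformer in a two-state equilibrium.

    delta_g_kcal is G(minor) - G(major); temp_c is an assumed temperature.
    """
    rt = R_KCAL * (temp_c + 273.15)
    ratio = math.exp(-delta_g_kcal / rt)  # [minor] / [major]
    return ratio / (1.0 + ratio)


def delta_g_from_fraction(frac_minor: float, temp_c: float = 35.0) -> float:
    """Inverse relation: free-energy gap from an observed minor population."""
    rt = R_KCAL * (temp_c + 273.15)
    return -rt * math.log(frac_minor / (1.0 - frac_minor))


if __name__ == "__main__":
    frac = minor_fraction(1.4)
    print(f"minor population for a 1.4 kcal/mol gap: {frac:.1%}")
```

Under these assumptions, a 1.4 kcal mol⁻¹ gap corresponds to a minor population of roughly 9%, which is in line with the "small population" of extruded C128 described in the text.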
Interestingly, both NC tails (Ala 1-Arg 17 and Arg 4-Leu 56) remain disordered, indicating that destabilization must occur via transient interactions that are nevertheless residue-specific owing to the inherent accessibility of the particular RNA residues and the orientation constraints imposed by the zinc finger binding to UCUG110 (Fig. 1e and Extended Data Fig. 3l). In live viruses, a G110U mutant designed to liberate the U145 and ⁺¹U146 residues and a deletion mutant designed to sequester them exhibited only 56% and 7%, respectively, of wild-type MLV infectivity (Fig. 1f). While the severe infectivity defect of deletion mutant virions confirms the importance of high-affinity NC binding in releasing the 5′ end of the PBS, the partial defect observed for G110U virions indicates that tail-mediated interactions are also required for optimal function.
Figure 2 | … The zinc finger interaction with the GUUG9 binding site is shown in red. The tails are excluded for simplicity since they form random coils and do not specifically interact with the tRNA (Extended Data Fig. 9f). Top inset shows a close-up of the zinc finger interaction, with G9 inserted into the hydrophobic pocket of the NC protein due to stacking interactions of Trp 35. Bottom inset shows the interaction of the variable loop with the core of the tRNA molecule that is maintained after interaction with NC-1.

RESEARCH LETTER
In tRNAPro, NC binding occurs first at GUUG9 (site T1; 4 ± 2 nM) followed by the anticodon loop AGGG37 (site T2; 13 ± 3 nM) and then the D-stem loop UAUG23 (site T3; 834 ± 343 nM) (Extended Data Fig. 4b). Importantly, titration of MLV NC into the human immunodeficiency virus (HIV) primer, tRNALys3, did not lead to NMR chemical shift perturbations, thus confirming the specificity of MLV NC for tRNAPro (Extended Data Fig. 1f, g; for assignment strategy of tRNAPro see Extended Data Figs 5-8 and Supplementary Discussion 4). The structure of the first NC bound to tRNAPro shows the zinc finger making extensive contacts with the GUUG9 sequence that links the acceptor stem with the D-stem loop, with G9 stacking within the zinc finger pocket (Fig. 2, Extended Data Table 1 and Extended Data Fig. 9a). Importantly, before NC binding, all four GUUG9 residues are involved in intramolecular interactions: G6 and U7 are part of the acceptor stem, and U8 and G9 are involved in core tertiary interactions (Fig. 2a, b and Extended Data Figs 6, 8a). Similar to U5-PBS, because these contacts are mutually exclusive with NC interactions, NC binding leads to major RNA remodelling events. First, since residues G6 and U7 are initially base paired with anti-PBS residues ⁻¹⁰C67 and ⁻¹¹A66 (superscript with minus sign denotes the anti-PBS position from 3′ to 5′), respectively, NC binding releases these sequestered anti-PBS residues (Fig. 2c and Extended Data Fig. 9b, c). Second, because U8 and G9 are involved in core tertiary interactions via triple base formation with the D-stem loop (U8:A14-A21, G9:C12-G23) (Fig. 2a, b and Extended Data Fig. 8a), NC binding disrupts these core tertiary interactions (Extended Data Fig. 9c). As a result of this rearrangement, D-stem residues A21 and G23, which are part of the UAUG23 site T3 sequence, are made partially available for the third NC binding event.
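To put the reported dissociation constants on a common energy scale, each Kd can be converted to a standard binding free energy with ΔG° = RT ln(Kd / 1 M). The snippet below is an illustrative sketch only: the helper name is hypothetical, the 30 °C default is assumed from the ITC conditions in the Methods, and the dictionary simply restates the Kd values quoted in the text.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal mol^-1 K^-1


def binding_free_energy(kd_molar: float, temp_c: float = 30.0) -> float:
    """Standard binding free energy (kcal/mol) from a dissociation constant.

    Uses dG = RT ln(Kd / 1 M); temp_c defaults to the assumed ITC temperature.
    """
    return R_KCAL * (temp_c + 273.15) * math.log(kd_molar)


# Kd values (nM) as quoted in the text for the high-affinity NC sites.
SITES_NM = {
    "U5-PBS UCUG110": 33.0,
    "tRNAPro T1 (GUUG9)": 4.0,
    "tRNAPro T2 (AGGG37)": 13.0,
    "tRNAPro T3 (UAUG23)": 834.0,
}

if __name__ == "__main__":
    for site, kd_nm in SITES_NM.items():
        dg = binding_free_energy(kd_nm * 1e-9)
        print(f"{site}: Kd = {kd_nm:g} nM -> dG = {dg:.1f} kcal/mol")
```

On this scale, the roughly 200-fold weaker affinity of site T3 relative to T1 amounts to a difference of about 3 kcal mol⁻¹, consistent with T3 becoming accessible only after the first binding event has partly exposed it.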
Globally, while the helical arrangement between the TΨC-stem and the acceptor stem is lost, the helical stacking between the D-stem and anticodon stem is preserved (Extended Data Fig. 9d …).
In comparison with sites T1 and T3 (see later), there is a marked mechanistic difference in the remodelling activity of the NC that binds the second site, T2: this NC uses its tails to achieve residue-specific destabilization. After the first NC binding event, the second NC accesses the residual elbow structure by anchoring its zinc finger to the distal AGGG37 sequence in the anticodon loop, with the G37 base stacking inside the zinc finger (Fig. 3b and Extended Data Fig. 9g). While the elbow interaction is maintained, the NC tails specifically perturb D-loop and TΨC-loop residues G16 and A59, respectively (Fig. 3c, d), which are in close proximity to each other (Fig. 3b). Prior to NC binding, these residues are extruded out of their respective loops and are hence available for NC interaction. Thus, as in the interaction with U5-PBS, the NC tails do not cause global destabilization of tRNAPro but instead target specific, already accessible residues for remodelling.
We also structurally characterized the third NC binding event using the tRNAPro-T1M T2M construct (see Supplementary Discussion 5). Our structures show that NC zinc finger binding to UAUG23 in the D-stem disrupts the entire helix and, because the D-stem architecture is crucial for the D-loop-TΨC interaction, eliminates the residual elbow tertiary structure (Fig. 3a, e and Extended Data Fig. 9h, i). Consequently, the interactions between D-loop residues G18 and G19 and TΨC-loop anti-PAS residues ₋₁C56 and ₋₂U55 are lost, leading to the release of these sequestered anti-PAS residues (subscript with minus sign denotes the anti-PAS position from 3′ to 5′) (Fig. 3d). NC binding to site T3 thus serves to dismantle the residual tRNA tertiary structure before primer annealing. Importantly, destabilization of the D- and TΨC-loop residues by the anticodon site T2 NC tails is maintained even after the elbow contacts are dismantled by the third NC binding event (Fig. 3c), prohibiting the freed TΨC-loop from forming an intrinsically stable structure [23] (see Extended Data Fig. 6) and ensuring that the released anti-PAS residues will remain accessible during primer annealing.
Figure 3 | a, A portion of the two-dimensional nuclear Overhauser effect spectroscopy (NOESY) spectrum with H8/H1′ correlations for tRNAPro showing the perturbation of anticodon residues only after titrations above 1.0 equivalent of NC, thus confirming the sequential binding mode. b, ¹H-¹³C two-dimensional U-labelled heteronuclear multiple quantum coherence (HMQC) spectra showing that whereas the U8 resonance perturbation occurs upon the addition of one equivalent of NC, the perturbation of D-loop-TΨC signals occurs only upon the third NC binding, thus indicating that the elbow contacts are maintained during the first two binding events. c, ¹H-¹³C two-dimensional HMQC spectra showing selective perturbation of the extruded G16 residue in the D-loop, as evidenced by a marked chemical shift change (shown by asterisks). d, Regions of two-dimensional NOESY for NC complexes with tRNAPro and tRNAPro-T1M T2M. Top and middle panels show that the protected A57 in the TΨC loop is not perturbed, but the extruded A59 is affected upon the second NC binding. In the T1M T2M mutant (bottom panel), NC binding to the third site disrupts the TΨC-D-loop interaction, resulting in a chemical shift change for residue A57. Thus, the lack of A57 perturbation in the native tRNAPro also demonstrates that the elbow region is maintained after the second NC binding. e, Structure of NC bound to tRNAPro sites T1 and T2 via the zinc fingers. The structures show that the NC-2 protein tails can extend to contact the D-loop-TΨC-loop elbow region. Inset shows the proximity of the extruded G16 …
Our data demonstrate how MLV NC 'captures' specific portions of both the U5-PBS and tRNAPro through high-affinity interactions with residues that are normally engaged in intramolecular stabilizing interactions, resulting in the subsequent 'release' of these sequestered residues and thereby reducing the energetic barrier for primer-template complex formation (Fig. 4). The combinations of liberated and pre-exposed residues within tRNAPro and U5-PBS are exactly complementary and therefore poised for intermolecular base pairing. Furthermore, the complementarity of liberated sequences to regions that are already exposed in the counterpart RNA allows remodelling to occur with a limited number of NC molecules (Fig. 4). Indeed, the presence of four NC molecules is sufficient for formation of a functional U5-PBS-tRNAPro complex (Extended Data Fig. 4c, d). Importantly, because the NC binding sites are perfectly positioned in close proximity to, but not overlapping with, the RNA-annealing sequences, subsequent dissociation of NC from the annealed complex is not required [24]. In fact, the presence of NC has been shown to be important for the elongation step of reverse transcription [25,26].
In addition to defining previously undiscovered roles for high-affinity NC binding events in the retroviral lifecycle, our study has implications for the location, timing and specificity of primer annealing (see Supplementary Discussion 6). Like MLV NC, some other RNA chaperones and remodellers bind with high affinity to their substrates; however, they typically require the input of additional energy for subsequent dissociation [27]. The MLV NC-mediated capture-and-release mechanism described here is distinct from mechanisms used by other known ATP-independent RNA chaperones [28,29]: during capture-and-release RNA remodelling, NC uses high-affinity interactions to bind a limited number of sites with high specificity. Furthermore, unlike typical chaperones, which cause global destabilization to allow access to higher energy conformations, the mechanism of NC-mediated remodelling in primer annealing involves the formation of stable, lower energy complexes with RNA that cause strategic local destabilization of the regions important for annealing. Consistent with this, the thermodynamic analyses of all U5-PBS-NC and tRNAPro-NC interactions show high binding affinities with entropically driven profiles (see Supplementary Discussion 7). This entropy-driven, capture-and-release remodelling thus represents the first example, to our knowledge, of a new mechanism by which RNA chaperones can select their specific targets from a sea of cellular RNAs.
Online Content Methods, along with any additional Extended Data display items and Source Data, are available in the online version of the paper; references unique to these sections appear only in the online paper.

METHODS
Sample preparation. The MLV NC protein was prepared as described previously [21]. The HIV NC protein was made in an analogous manner to that of the MLV NC protein; the NC sequence was amplified from pNL4-3 plasmid [30], cloned into pGEX-6p1 (GE Healthcare) vector, and the purified fusion protein was cleaved from glutathione S-transferase (GST) through the use of PreScission Protease. NMR experiments were used to confirm the correct folding of the proteins. The HIV and MLV U5-PBS RNA samples were also made using methods previously described [21]. tRNAPro and tRNALys3 DNA templates were constructed through annealing and ligation of oligonucleotides designed to contain a T7 promoter and two 2′-O-methoxy-modified nucleotides at the 5′ end of the template strand to maintain 3′-end homogeneity [31].
For the U5-PBS construct used for NMR studies, a single base-pair swap (G96C:C157G) was made to ameliorate the spectral overlap that arose from five consecutive guanosine residues. This swap does not change the secondary or tertiary structure of the U5-PBS domain (Extended Data Fig. 2b). The G110U mutation was structurally characterized to ensure that the mutation does not give rise to an alternate structure that sequesters the 5′ end of the PBS (data not shown).
Isothermal titration calorimetry. Prior to all ITC experiments, NC proteins and RNAs were exchanged into ITC buffer (25 mM NaCl, 1 mM MgCl2, 0.1 mM ZnCl2, 10 mM Tris, pH 7.0). For each ITC experiment, reaction heats (µcal s⁻¹) were measured for 1.5-2 µl titrations of 70-90 µM NC into 5 µM RNA at 30 °C using an iTC200 machine (MicroCal). Titration curves were analysed using ORIGIN (OriginLab).
NMR data acquisition, resonance assignment and structure calculations. For NMR experiments, RNA samples were resuspended in NMR buffer (10 mM Tris-HCl, pH 7.5, and 10 mM NaCl). NMR data for tRNAPro were collected with and without 1 mM MgCl2 and 100-150 mM NaCl. For NC complex studies, the samples were prepared in buffers containing 1 mM MgCl2. NMR data were acquired using a Bruker 700 MHz spectrometer equipped with a cryoprobe. Spectra were recorded at 35 °C, with the exception of data for the imino region, which were also collected at 10 °C. Non-exchangeable assignments were made using two-dimensional NOESY, two-dimensional HMQC and three-dimensional HMQC-NOESY experiments using both unlabelled and nucleotide-specific labelled (G-, U-, A- and C-¹⁵N,¹³C-labelled and C/U/A-deuterated) or protonated samples. Residual dipolar coupling (RDC) data for U5-PBS were collected using Pf1 phage (12 mg ml⁻¹, ASLA Biotech) as has been described [32]. Initial structures were calculated as described [26] using manually assigned restraints in CYANA [28].
In contrast to all long-range triple bases in tRNAPro, the two base pairs between the D-loop and TΨC-loop were not observable by NMR: these hydrogen bonds were modelled on the basis of our knowledge of tRNA structures. The crystal structure of tRNAPhe was used as a guide for soft phosphate-phosphate distance restraints within a particular stem [33]. For U5-PBS-NC and tRNAPro-NC complexes, the ten best CYANA models were then used for final structure calculations in AMBER similar to what has been described [34]. Specifically, the refinement included 50,000 steps, with temperature increasing from 0 K to 500 K over the first 12,500 steps, remaining at 500 K over the next 32,500 steps, and then decreasing to 0 K over the next 5,000 steps. These calculations incorporated all upper limit restraints used in CYANA but not the angle restraints. The individual structures generated were then used for tensor fitting, and the above structure calculation process was repeated with the RDC restraints (for U5-PBS) along with a final minimization that included 8,000 steps. In the final minimization step for the U5-PBS structure, loose hydrogen bond restraints used for the ribose zippers were removed. Several rounds of in vacuo AMBER calculations were done until the distance violations were less than 0.5 Å. Molecular images were generated with PyMOL (http://www.pymol.org).
Mutagenesis of pNCS vector for MLV virus production. The pNCS plasmid encoding the MLV genome was a gift from the laboratory of S. Goff. The presence of the repeated U5 sequence at both ends of the proviral DNA required use of the NdeI restriction endonuclease to cleave the pNCS plasmid into two pieces, a 3 kb fragment and a 9 kb fragment. This allowed for selective introduction of the mutations only into the U5 sequence in the 5′ long terminal repeat (LTR).
The 9 kb fragment was circularized by self-ligation and used as a template to introduce the G110U, deletion mutation or Ψ C331G mutation using a QuickChange II XL Site-Directed Mutagenesis Kit (Agilent Technologies). Mutant plasmids were then digested with NdeI and re-ligated with the 3 kb fragment to form the complete mutant pNCS. DNA sequencing confirmed both the orientation of the insert and the presence of the desired mutation.
MLV virion infectivity and genome packaging assay. 293T and Rat2 cells were grown at 37 °C and 5% CO2 in DMEM supplemented with 10% fetal bovine serum (FBS) and 5% penicillin-streptomycin. The 293T cells were used for transfection-mediated viral production, while Rat2 cells were used for infectivity assays. Transfection of 293T cells with pNCS was performed at a cell confluency of ~80% using Fugene-6 (Roche). Virion-containing supernatant was harvested 48 h after transfection and filtered through a 0.45 µm filter. Viruses were quantified with the use of exogenous Luciferase RNA template (Promega) provided in excess during qPCR reactions. Purified, exogenous MLV reverse transcriptase (RT) (Promega) was used to generate a linear standard curve (R² > 0.98) for the calculation of RT activity. SYBR Green-based qPCR was carried out as described [35], and dissociation curve measurements were performed at the conclusion of each run to confirm primer specificity. Equal numbers of wild-type or mutant virions were used to infect 293T cells, as has been described [36,37]. As a negative control, separate aliquots of wild-type virions were incubated at 80 °C for 15 min to render the virus non-infectious. The RT-based virus quantification assay was used to quantify the viral yield resulting from infection.
For the genome packaging assay, total virion RNA was extracted from viral supernatant with the QIAamp Viral RNA Mini Kit (Qiagen) using 50-150 µl of viral supernatant per RNA extraction column. Three independent extractions were pooled and viral genomic RNA was quantified by qPCR using a TaqMan probe-based assay system as has been described [38] with an ABI 7900 sequence detection apparatus (Applied Biosystems). Negative control reactions lacking viral RNA template yielded negligible signals, and standard curves were generated using threshold cycle (Ct) values from a range of different input viral DNA (pNCS) concentrations. A two-sample, one-tailed unpaired t-test was performed to compare mean values of RT activity using the statistical package Stata (StataCorp), assuming unequal variances and using a 99% confidence interval.
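An unpaired two-sample t-test that assumes unequal variances is commonly known as Welch's t-test. The sketch below shows how its test statistic and approximate degrees of freedom are computed; it is not the Stata procedure used by the authors, the function name is hypothetical, and the numbers in the usage block are made-up placeholders rather than data from this study. A one-tailed p-value would then be read from Student's t distribution with the returned degrees of freedom, which is not computed here.

```python
import math
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return (t, df) for Welch's unequal-variance two-sample t-test."""
    na, nb = len(sample_a), len(sample_b)
    # Squared standard errors of each sample mean.
    va = variance(sample_a) / na
    vb = variance(sample_b) / nb
    t = (mean(sample_a) - mean(sample_b)) / math.sqrt(va + vb)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df


if __name__ == "__main__":
    # Placeholder readings in arbitrary units, NOT data from the paper.
    wild_type = [1.00, 0.95, 1.08, 1.02, 0.97, 1.04]
    mutant = [0.55, 0.60, 0.52, 0.58, 0.49, 0.61]
    t_stat, dof = welch_t(wild_type, mutant)
    print(f"t = {t_stat:.2f}, df = {dof:.1f}")
```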
For heat-annealing, 250 pmol of MLV U5-PBS RNA was incubated with 250 pmol tRNAPro at 50 °C for 5 min followed by 85 °C for 15 min in 50 mM Tris-HCl (pH 7.2), 50 mM KCl and 5 mM MgCl2. For NC-mediated annealing, the same amounts of RNA were incubated with five molar equivalents of NC at 37 °C for 16 h. After annealing, 100 U RNase inhibitor and 750 U MLV reverse transcriptase were added to each sample, along with Cy3-labelled dideoxy CTP (ddCTP) to a final concentration of 0.4 nM. Following ethanol precipitation, the RNA was digested by Riboshredder RNase and resolved on a denaturing gel. DNA terminated with Cy3-ddCTP was visualized with an imager.
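The temperature schedule given in the structure-calculation section of the Methods (50,000 refinement steps: heating from 0 K to 500 K over 12,500 steps, a 32,500-step plateau, then cooling to 0 K over 5,000 steps) can be written as a small function. This is an illustrative sketch only: the function name is hypothetical, and the linear shape of the heating and cooling ramps is an assumption, since the text specifies only their end points. It does not reproduce any AMBER input file.

```python
def annealing_temperature(step: int,
                          heat_steps: int = 12_500,
                          hold_steps: int = 32_500,
                          cool_steps: int = 5_000,
                          t_max: float = 500.0) -> float:
    """Target temperature (K) at a given step of the three-phase schedule.

    Linear ramps are assumed; only the phase lengths and end-point
    temperatures come from the Methods text.
    """
    total = heat_steps + hold_steps + cool_steps
    if not 0 <= step <= total:
        raise ValueError(f"step must be in 0..{total}")
    if step < heat_steps:                    # heating phase: 0 K -> t_max
        return t_max * step / heat_steps
    if step <= heat_steps + hold_steps:      # plateau at t_max
        return t_max
    remaining = total - step                 # cooling phase: t_max -> 0 K
    return t_max * remaining / cool_steps


if __name__ == "__main__":
    for s in (0, 6_250, 12_500, 45_000, 47_500, 50_000):
        print(f"step {s:>6}: {annealing_temperature(s):6.1f} K")
```

The three phase lengths sum to the stated 50,000 steps, which is a useful consistency check on the protocol as described.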
Extended Data Figure 1 | Specific, higher-affinity interactions of MLV NC with monomeric MLV U5-PBS and tRNAPro and its implications. a, MLV NC sequence depicting the zinc finger moiety with amino- and carboxy-terminal tails. b, Cartoon representation displaying the general mechanism of ATP-independent chaperones. c, Cartoon representation of the E- and B-forms of U5-PBS genomic RNA, including the secondary structure of the Ψ-genome packaging signal, with the high-affinity sites coloured in red (see Supplementary Discussion 1). d, ITC data for MLV NC binding to native U5-PBS-B (black) and site 1 mutant (V1M, red) forms. Top, raw ITC data for 1.5 µl titrations of 80 µM NC into 5 µM RNA at 30 °C. Bottom, data following peak integration, with continuous black line representing the fit for a one-site binding model. Elimination of site V1 in the U5-PBS B-form completely abolishes the binding. e, f, ITC data showing weak binding of HIV NC to HIV U5-PBS RNA and tRNALys3 primer. g, Portions of two-dimensional NOESY spectra collected in D2O displaying an overlay of free tRNALys3 primer (black) and 0.5 equivalents of MLV NC (red). Lack of any chemical shift perturbation indicates the absence of any interactions between the two molecules. h, Models for primer annealing in MLV and HIV-1: in MLV (left), high-affinity binding of the NC domain (red) of Gag to the U5-PBS region of the viral genome and to the tRNAPro primer promotes tRNA annealing in the cytosol of the host cell before virion budding. Further supporting this model is the evidence that MLV virions are only slightly enriched for the tRNAPro primer (see Supplementary Discussion 1). In HIV-1 (right), NC does not bind with high affinity to the U5-PBS domain or to the primer tRNALys3.
Furthermore, the tRNALys3 primer is highly enriched in HIV-1 virions (see Supplementary Discussion 1), which leads to primer annealing after virion budding, mediated by weak, non-specific interactions of HIV-1 NC with the viral and primer RNAs. For the sake of simplicity, only one of the two packaged retroviral genome copies is shown.
Extended Data Figure 2 | … portion of the ¹H-¹H two-dimensional NOESY spectrum showing imino-to-imino NOEs for the U5-PBS RNA. Data were collected at 10 °C in 10 mM NaCl and 10 mM Tris (pH 5.0). Imino-to-imino connectivities for the upper stem are shown in orange, while those for the lower stem are shown in blue. Inset, secondary structure of the U5-PBS construct used for structural and biochemical studies. Grey residues indicate non-native positions: the G96C:C157G base pair represents a base-pair swap alteration that was made to aid in unambiguous assignments of the otherwise five consecutive Gs and the terminal G-C base pair was added for the purposes of transcriptional efficiency and for SmaI digestion of the DNA template. Bottom, portions of the ¹H-¹⁵N two-dimensional HSQC spectra for ¹⁵N,¹³C-labelled U5-PBS. The imino resonances for U- and G-labelled samples are shown, as are amino resonances for the A-labelled sample. The A98 amino resonances had unusually downfield chemical shifts due to interaction with the G155 inside the helix. Formation of the stacking interactions of G155 is shown on the right with a zoom-in inset. The stacking is also confirmed by a G155 to U156 walk in the D2O two-dimensional NOESY spectrum (data not shown). b, Portion of the ¹H-¹H two-dimensional NOESY imino spectrum overlay for U5-PBS constructs with and without the G96C:C157G base-pair swap. No change in the imino-to-imino connectivities was observed for the RNA except at the expected mutation site (shown in boxed region).
In fact, the GA98, GU156 bulge immediately above the G-C swap has the same chemical shifts, line widths and so on in both constructs, demonstrating that they have similar secondary structures. Importantly, similar binding affinities with the NC protein were obtained for both constructs. c, U5-PBS NMR ensemble alignments: the ensemble of the lowest-energy AMBER structures is shown. They are aligned by the top portion (left) or bottom portion (right) of the molecule. NC binding sites V1 (UCUG110) and V2 (UCAG130) are shown in red and orange, respectively. Left, alignment of residues 112-125 and 132-144, yielding a root mean squared deviation (r.m.s.d.) value of 0.6 ± 0.2 Å. Right, alignment of residues 94-106 and 147-159, excluding G155, which appears from NOE data to exhibit multiple conformations. The resulting r.m.s.d. for alignment of the bottom residues is 1.2 ± 0.5 Å.
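
As an illustration of how an ensemble r.m.s.d. of the form "0.6 ± 0.2 Å" can be obtained, the following is a minimal sketch only. It assumes that the models have already been superimposed on the chosen residues and that the reported value is the mean and standard deviation of all pairwise r.m.s.d. values; the legend does not state which convention was used, and the function names `rmsd` and `ensemble_rmsd` are hypothetical.

```python
import math
import statistics
from itertools import combinations


def rmsd(a, b):
    """r.m.s.d. between two equal-length lists of (x, y, z) coordinates.

    Assumes the two coordinate sets are already superimposed.
    """
    if len(a) != len(b) or not a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    # Sum of squared distances between corresponding atoms.
    squared = sum((p - q) ** 2 for u, v in zip(a, b) for p, q in zip(u, v))
    return math.sqrt(squared / len(a))


def ensemble_rmsd(models):
    """Mean and sample standard deviation of all pairwise r.m.s.d. values.

    Needs at least three models so that the standard deviation is defined.
    """
    values = [rmsd(a, b) for a, b in combinations(models, 2)]
    return statistics.mean(values), statistics.stdev(values)
```

In practice the superposition step (for example a least-squares fit over residues 112-125 and 132-144) would precede this calculation; it is left out here to keep the sketch free of external dependencies.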

Extended Data Figure 3 | Structural characterization of U5-PBS in the free and bound form. a, Three-dimensional NOESY-HMQC strip plot of the 13C-edited 1H-1H planes for C128 and A129 aromatic and H1′ protons. NOEs from A129 H8 to the preceding C128 ribose protons are indicative of a stacking interaction between the two residues. In contrast, the minor conformation that is populated upon NC binding does not give rise to any inter-residue NOEs (data not shown). b, Structure of the UCAG tetraloop. The U-G hydrogen bonds, characteristic of UNCG tetraloops, are shown in green. c, Lack of NOEs between U121 and A122 indicates a break in regular stacking interactions between these two residues. Strip plots of 13C-edited 1H-1H planes for U121 H2′ and A122 H2 from three-dimensional NOESY-HMQC spectra are shown. The U121 H2′-C123 H6 and A122 H2-G135 H1′ long-range NOEs are indicated in red. d, The A122 bulge forms a triple base interaction. Hydrogen bonds are shown in green, and the A122 H2-G135 H1′ distance is indicated by a solid black line. e, Strip plots of 13C-edited 1H-1H planes from three-dimensional NOESY-HMQC spectra of U5-PBS, showing inter-residue NOE connectivities. Black, dashed lines show sequential inter-residue connectivities, while the orange lines represent long-range interactions. The strong G110 H8-to-H1′ intraresidue NOE is indicative of a syn glycosidic torsion angle. Residues in the PBS and NC binding site are labelled with blue and red, respectively. f, Lowest-energy AMBER structures of the (UCUGA111, UU146) internal loop, showing the flexibility of U107 due to lack of inter-residue NOEs. Alignment of residues 108-111 and 144-146 yields an r.m.s.d. of 0.8 ± 0.3 Å. g, Portions from three-dimensional NOESY-HMQC spectra collected for selectively labelled 13C,15N-U5-PBS.
As evidence for the ribose-zipper motif, intense H1′-H1′ inter-residue connectivities are observed between U146 and C108, which are located on opposite strands. h, Portions from three-dimensional NOESY-HMQC spectra collected for selectively labelled 13C,15N-U5-PBS in complex with NC. Neither the ribose nor the aromatic protons of U146 give rise to any inter-residue NOEs, indicating the lack of U146 interaction with neighbouring residues. An inset of the secondary structure of the bulge region is shown to visualize the availability of the released uracils. i, j, 1H-13C two-dimensional HMQC spectra collected with 0 (black) and 0.9 (red) equivalents of NC. Perturbations of G155 and U156 signals towards the minor conformation (indicated by an asterisk) are shown by dashed lines. Blue asterisk indicates the emergence of new freed uridine resonances after NC addition. k, Specific perturbation of G100 is seen by selective loss of the imino NOE to G101. l, A portion of two-dimensional NOESY spectra collected in D2O for a 1:1 Ψ-packaging signal-NC complex (black) and U5-PBS-NC complex (red). The complete match of the two data sets indicates that, similar to the Ψ-NC interaction 19 , the NC tails continue to exist in a random coil conformation upon complex formation.

a, Imino resonances for full-length tRNAPro indicate proper tertiary structure formation. Portions of the 1H-15N two-dimensional HSQC spectra for U- (top) and G-labelled (centre) full-length tRNAPro constructs, with imino assignments indicated. A portion of the 1H-1H two-dimensional NOESY spectrum showing imino-to-imino NOEs is shown at the bottom. b, Two-dimensional NOESY of a GH sample of the T2M construct wherein only the guanosines were protonated and the other three nucleotides were deuterated. This mutant was chosen to unambiguously assign the G resonances from the D-loop and the variable loop by reducing the spectral complexity from the G-rich anticodon loop.
c, Strips from fully protonated and G-protonated two-dimensional NOESY spectra showing direct evidence for the D-loop-TΨC interaction (see Supplementary Discussion 4). The left panel shows data collected at low temperature that allow us to confirm the long-range assignment because the A58 H1′ NOE spin diffuses to the aromatic proton of G19 via G18. d, Strips from fully protonated sample showing the NOE walk from the D-stem to the loop residue G15 that forms the critical Levitt base pair with C48. This arrangement leads to an unusual downfield shift of the C48 H1′ (see Supplementary Discussion 4).