doi:10.1016/j.jmb.2005.01.046 J. Mol. Biol. (2005) 347, 719–733
Protein and RNA Dynamics Play Key Roles in
Determining the Specific Recognition of GU-rich
Polyadenylation Regulatory Elements by Human CstF-64
Pritilekha Deka1, P. K. Rajan1, Jose Manuel Perez-Canadillas2 and
Gabriele Varani1*
1 Department of Biochemistry and Department of Chemistry, University of Washington, Seattle, WA 98195-1700, USA
2 MRC Laboratory of Molecular Biology, Hills Road, Cambridge CB2 2QH, UK

The N-terminal domain of the 64 kDa subunit of the cleavage stimulation
factor (CstF-64) recognizes GU-rich elements within the 3′-untranslated
region of eukaryotic mRNAs. This interaction is essential for mRNA 3′ end
processing and transcription termination, and its strength affects the
efficiency of utilization of different polyadenylation sites. The structure of
the RNA-binding N-terminal domain of CstF-64 showed how the
N-terminal RNA recognition motif of CstF-64 recognizes GU-rich RNAs.
However, it is still perplexing how this protein can bind selectively to
RNAs that are rich in G and U residues regardless of their detailed
sequence composition, yet discriminate effectively against non-GU-RNAs.
We investigated by NMR the dynamics of the CstF-64 RNA-binding
domain, both free and bound to two GU-rich RNA sequences that represent
polyadenylation regulatory elements. While the free protein displays the
motional properties typical of a well-folded protein domain and is
uniformly rigid, the protein–RNA interface acquires significant mobility
on the micro- to millisecond time-scale once GU-rich RNAs bind to it.
These motional features, we propose, are intrinsic to the functional
requirement to bind all GU-rich sequences and yet to discriminate against
non-GU-rich RNAs. This behavior may be a general mechanism by which
some RNA-binding proteins are able to bind to classes of sequences, as
opposed to a well-defined sequence or consensus.
© 2005 Elsevier Ltd. All rights reserved.
Keywords: RNA processing; protein dynamics; RNA-binding proteins; nuclear magnetic resonance; RNA recognition
*Corresponding author
Abbreviations used: RRM, RNA recognition motif; CstF, cleavage and stimulation factor; DSE, downstream sequence element; CPSF, cleavage and polyadenylation specificity factor; HSQC, heteronuclear single quantum coherence; NOE, nuclear Overhauser enhancement; NOESY, NOE spectroscopy; ss, single-stranded; CPMG, Carr–Purcell–Meiboom–Gill.
E-mail address of the corresponding author: [email protected]

Introduction

Approximately 30% of all human mRNAs contain alternative polyadenylation signals,1 and an increasing number of developmental and differentiation decisions are executed by alternative polyadenylation of the same mRNA.2,3 Polyadenylation is also required for transcription termination4 and therefore closely coupled with transcription itself.5,6 The efficiency of mRNA 3′ end processing is controlled by regulatory cis-acting RNA elements and their interactions with trans-acting protein factors. In the best-studied situation, poly(A) site selection is regulated by altering the intracellular levels of general processing factors, the most important of which appears to be CstF-64, a highly conserved RNA-binding protein.7–9 While several aspects of the biochemistry of polyadenylation are now understood,2,7,10 the structural knowledge of the basic components of the 3′ end processing apparatus and their interaction with each other and with transcription remains very limited.11–14

In higher eukaryotes, the core polyadenylation signal comprises two major elements in addition to
0022-2836/$ - see front matter © 2005 Elsevier Ltd. All rights reserved.
720 Protein Dynamics and RNA Recognition
the cleavage site itself (often a CpA dinucleotide) (Figure 1): the highly conserved AAUAAA hexamer 10–30 nucleotides upstream of the cleavage site and a poorly conserved yet essential GU-rich downstream sequence element (DSE).7,10 These two RNA elements are recognized by two protein complexes that assemble onto the pre-mRNA to form the pre-cleavage complex (Figure 1). The poly(A) signal (AAUAAA) is recognized by the cleavage and polyadenylation specificity factor (CPSF) through its 160 kDa component (CPSF-160). The DSE is recognized instead by the heterotrimeric cleavage and stimulation factor (CstF) through its 64 kDa subunit.15,16 Both CPSF and CstF bind RNA weakly, but form a strong cooperative complex when bound together to the same pre-mRNA. Enzymatic factors responsible for the phosphodiesterase and polymerase activities (CF Im, CF IIm, PAP, etc.) are then recruited to the CstF/CPSF/pre-mRNA complex. Because the AAUAAA sequence is very highly conserved, the relative strength of competing poly(A) sites is defined primarily by the distance between the cis-acting sequences (AAUAAA and DSE) and by the affinity of the DSE/CstF-64 interaction. Therefore, understanding how CstF-64 recognizes GU-rich regulatory elements is a key goal in understanding constitutive and regulated polyadenylation.

Figure 1. Schematic representation of the cleavage and polyadenylation complex of higher eukaryotes (adapted from Perez-Canadillas & Varani13). The AAUAAA polyadenylation signal is recognized by CPSF, while the CstF heterotrimer binds to the GU-rich downstream elements to form a strong cooperative complex with CPSF. Cleavage factors (CF Im and CF IIm) and the poly(A) polymerase (PAP) are then recruited for cleavage and polymerization of the poly(A) tail. The domain structure of CstF-64 is shown at the

CstF-64 is a multi-domain protein15,16 containing an RNA-binding region (amino acid residues 1–111) that is both necessary and sufficient for DSE recognition. We recently reported the structure of the RNA-binding domain of CstF-64 and studied how CstF-64 protein binds to G/U-rich RNAs.13 However, CstF-64 does not recognize a single RNA sequence or consensus with high affinity and selectivity. Rather, it has the ability to bind many GU-rich (U-rich in yeast) sequences regardless of their detailed composition, while discriminating effectively against other RNAs.15,16 The unusual specificity of CstF-64 is shared by other RNA-binding proteins that bind to loosely defined sequence elements and it is functionally significant. GU-rich elements are poorly conserved to allow regulation of poly(A) site selection in conjunction with the nearly universal AAUAAA signal. The relatively weak interaction of both CPSF and CstF with the regulatory signals allows for efficient co-regulation of polyadenylation. In studying the structure of the CstF-64 complex with a DSE sequence mimic,13 we observed considerable broadening of key interfacial residues upon RNA binding. We suggested that these dynamic processes could be related to the unusual specificity of CstF-64 protein itself. Do protein (and RNA) dynamics play a role in dictating the specificity profile of this protein and other proteins with diffuse specificities?

Here, we investigate how CstF-64 protein dynamics change upon binding to GU-rich sequence elements and correlate our observation of increased motion on the micro- to millisecond time-scale with the “diffuse” specificity of CstF-64. We observe unfolding of a key structural element of the protein upon RNA binding and considerable intermediate time scale dynamics in the protein, but only upon RNA binding. We propose that this dynamic behavior is a key aspect of the specificity of CstF-64 and therefore of its biological function.

Results

Spectral assignments

Typical 1H-15N HSQC spectra for free CstF-64 N-terminal domain and for its RNA complex with 5′-GUGUGUGUUG-3′ RNA are shown in Figure 2. For both the free and RNA-bound protein, backbone amide chemical shift assignments were available to us from the previous study.13 The identification of cross-peaks prior to the analysis of
the relaxation data was further validated by the analysis of 2D heteronuclear single quantum coherence (HSQC) and 3D nuclear Overhauser enhancement spectroscopy (NOESY)-HSQC spectra. Verification of the assignments of the NH and N resonances was obtained from the dαN and dNN connectivities for the β-sheet and α-helical regions of the protein, respectively, as expected from the known structure.13 The only unassigned residues belong to the linker between the protein and the N-terminal His-tag.

Figure 2. Superposition of the 1H-15N HSQC spectra of free 15N-CstF-64 (red) and the 15N-CstF-64/(GU)4UG complex (blue) at a ratio of 1:1 protein to RNA. Folded Arg side-chain resonances are in orange/green. Both samples contain approximately 1 mM CstF-64 in 50 mM potassium phosphate buffer (pH 6.0), in 95% H2O/5% 2H2O. The spectra were recorded at 25 °C on a Bruker Avance 500 MHz spectrometer.

Titration data showed that the chemical shift changes in the 15N-labeled protein saturate at an equimolar ratio of CstF-64 to RNA, confirming the 1:1 stoichiometry of the complex. The changes in 1H chemical shift upon complex formation are less than 0.3 ppm for most residues13 and the trimmed average 1H chemical shift change was ~35 Hz; as expected, the largest changes in chemical shift are seen in the RNA-binding region of the protein. The 15N chemical shift changes are less than 1.5 ppm for most residues; larger changes are only seen in the C-terminal helix (A93, K95, N97, E100) that undergoes unfolding upon RNA binding and in the β-sheet region of the protein (F45, R46, K57, N90) that is the RNA-binding region of the protein.13 The dissociation constants were found to be 7 μM and 14 μM for the protein-(GU)4UG and protein-(GU)4 complexes, respectively. Under these conditions, all residues are in the fast exchange regime for the second complex and in the fast-to-intermediate exchange regime for the first complex, as described.13 However, for all residues (in both complexes): kdiss ≫ Δδ, where Δδ is the 15N chemical shift difference (in Hertz) between the free and RNA-bound forms of CstF-64.

Some residues were excluded from the analysis of the relaxation data because the corresponding resonances were overlapped in the spectra. For the free protein, the 15N relaxation properties for 85/104 residues found within structured regions of the protein (residues 8–111) could be measured and analyzed reliably, but residues 3 through 7 were not included in the analysis. They are unfolded both in the presence and in the absence of RNA and do not contribute to binding: their dynamic properties are uncorrelated to the rest of the domain. The remaining 20 residues that could not be analyzed are the five proline residues plus D12 and R13 (overlapped with two residues in the N terminus of the protein), R16 overlapped with S17; V42 with A73, F45 with V89; F82 with D90, A93 with K96 and K98 with L100 and G107 overlapped with an unassigned N-terminal residue. Similarly, 23 residues were excluded from the analysis for the protein–RNA complex (five Pro and overlapped residues R13 and G107, plus L77 with N22, A86 with A73, F19 with E95 and K102 with R13). Residues after A93 were excluded from the analysis of the complex, because the C-terminal helix unfolds upon RNA binding.13

Figure 3. Normalized peak intensities as a function of relaxation delays for 15N-relaxation experiments carried out at 750 MHz. The plots correspond to (a) 15N T1 and (b) 15N T2 measurements for residue F36 in the free protein and in the two protein–RNA complexes: (GU)4UG and (GU)4. The results of duplicate experiments are included to provide a sense of the reproducibility of the data.

Measurement of T1, T2 and heteronuclear NOEs

15N T1, T2 and heteronuclear NOEs were recorded as described in Materials and Methods. Typical T1 and T2 relaxation decay curves for a representative residue (F36) are shown in Figure 3. In the remainder, only the data concerning the
complex of CstF-64 with the GUGUGUGUUG RNA
will be presented and analyzed; the data on the
second complex studied in this project (GUGU-
GUGU) are very similar.
The average T1 was 547 ms for the free protein at
500 MHz with an average trimmed error of 2.9%,
while the average T1 at 750 MHz was 970 ms, with
an average trimmed error of 4.6%. Trimmed errors
were calculated by excluding the top and bottom
10% of the data to eliminate outlying data from the
analysis. The average T2 for the well-folded part of
the protein was 106 ms at 500 MHz, with an average
trimmed error of 3.8%, while the average T2 at
750 MHz was 91 ms, with an average trimmed error
of 3.3%. Average trimmed errors for the hetero-
nuclear NOE measurements were 1.7% and 2.0% at
500 MHz and 750 MHz, respectively. The increased
uncertainty at 750 MHz reflects the relatively poor
sensitivity of a console that is now ten years old, but
it remains within the error range (5%) used to
validate the data analysis protocol.17,18
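
The trimming step described above (dropping the top and bottom 10% of the values before averaging) can be sketched as follows. This is a minimal illustration, not the authors' script: the function name `trimmed_mean`, the rule of truncating the number of discarded points toward zero, and the example error values are all assumptions made for the sketch.

```python
def trimmed_mean(values, fraction=0.10):
    """Mean of `values` after discarding the lowest and highest `fraction`.

    Assumption: the number of points dropped at each end is
    int(len(values) * fraction), i.e. rounded toward zero.
    """
    if not 0 <= fraction < 0.5:
        raise ValueError("fraction must be in [0, 0.5)")
    ordered = sorted(values)
    k = int(len(ordered) * fraction)          # points removed at each end
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)


# Hypothetical per-residue relative errors (%), with one exchange-broadened outlier.
errors = [2.1, 2.5, 2.8, 3.0, 3.1, 3.3, 3.4, 3.6, 4.0, 25.0]

plain = sum(errors) / len(errors)             # dominated by the outlier
trimmed = trimmed_mean(errors)                # outlier and smallest value removed
print(f"plain mean = {plain:.2f}%, trimmed mean = {trimmed:.2f}%")
```

With a single strongly broadened residue in the set, the plain mean is pulled well above the typical error, while the trimmed value stays close to it, which is the stated purpose of the trimming.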
Concerning the protein–RNA complex, the aver-
age T1 for the well-folded part of the protein was
600 ms at 500 MHz, with an average trimmed error
of 6.1%, while the average T1 at 750 MHz was
877 ms, with an average trimmed error of 7.1%. The
average T2 for the well-folded part of the protein
was 93.8 ms at 500 MHz, with an average trimmed
error of 5.2%, while the average T2 at 750 MHz was
73.7 ms, with an average error of 7.0%. The average
values for the heteronuclear NOE were 0.789 and
0.813, with average trimmed errors of 7.9% and
4.9% at 500 and 750 MHz, respectively. The
increased uncertainty for the protein–RNA complex
is largely due to the presence of resonances that
experience a significant increase in the line width
(residues R46, T53, K57, Y59, R88). The resulting
weak signals made the quantitative analysis of their
relaxation properties more uncertain. All these
residues belong to the b-sheet or to loop 3, the
RNA-binding region of the protein.
The observed relaxation rates and heteronuclear
NOEs for the free protein and protein–RNA
complex at 500 MHz and 750 MHz are shown in
Figure 4(a) and (b), respectively. In general, T1
values scale with field as expected, with a small
increase in the spin-lattice relaxation time from
500 MHz to 750 MHz. The behavior of T2 relaxation
times is also as expected, with only small changes
between 500 MHz and 750 MHz. Although the field
dependence of T1 and T2 for residues 48–68 may at
first appear anomalous, the independent analysis of
the data at 500 MHz and 750 MHz demonstrates (see below) that the data reflect the expected dependence of T1 and T2 on molecular size (free protein versus complex) and field. T1 increases with field for both the protein and the protein–RNA complex; however, the increase is slightly less for the complex, because of the variation of T1 with overall rotational correlation time. The observed decrease in T2 occurs because of the increases in field and in molecular size.

Figure 4. Relaxation parameters versus residue number for (a) free CstF-64 and (b) CstF-64/(GU)4UG complex. The secondary structure of the protein is shown at the top. Top to bottom: 15N T1; 15N T2 and heteronuclear NOE. The data from 500 MHz is shown in black; while the data from 750 MHz is shown in blue. Error bars represent uncertainties in the fit of the primary relaxation data to exponential decays, as described in Materials and Methods. Residues for which no results are shown correspond either to proline or to overlapped cross-peaks that could not be analyzed quantitatively. Residues with the largest uncertainties generally correspond to exchange-broadened residues in the protein–RNA complex.
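
The T1 and T2 values discussed here come from fitting peak intensities to exponential decays. The sketch below shows one simple way such a fit can be done, assuming a mono-exponential decay I(t) = I0·exp(−t/T) and an unweighted log-linear least-squares fit. The actual fitting procedure is the one in Materials and Methods, which is not reproduced here; the function name, delay values and noise-free synthetic intensities are illustrative assumptions.

```python
import math


def fit_exponential_decay(delays, intensities):
    """Fit I(t) = I0 * exp(-t / T) by linear regression on ln(I).

    Returns (I0, T). Assumes all intensities are positive and that a
    single exponential describes the decay (a simplification).
    """
    n = len(delays)
    logs = [math.log(i) for i in intensities]
    mean_t = sum(delays) / n
    mean_y = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in delays)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(delays, logs))
    slope = sxy / sxx                      # equals -1 / T
    intercept = mean_y - slope * mean_t    # equals ln(I0)
    return math.exp(intercept), -1.0 / slope


# Synthetic, noise-free decay built from the average free-protein T2 at 500 MHz (106 ms).
true_t2 = 0.106                                        # seconds
delays = [0.01, 0.03, 0.05, 0.09, 0.13, 0.17]          # hypothetical relaxation delays (s)
intensities = [math.exp(-t / true_t2) for t in delays]

i0, t2 = fit_exponential_decay(delays, intensities)
print(f"I0 = {i0:.3f}, T2 = {t2 * 1000:.1f} ms")
```

A log-linear fit is used only because it needs no external libraries; with noisy data it weights weak points too heavily, so a nonlinear fit with error estimation would normally be preferred.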
Qualitative analysis of 15N relaxation data and of the changes in dynamics upon RNA binding

A qualitative analysis of the heteronuclear NOE values along the sequence suffices to reveal interesting motional trends. In the free protein, residues at the very N and C termini exhibit lower NOEs and significantly increased T2 values, as expected for regions undergoing fast internal motions (D8–A10 and L104 onwards). The most obvious changes upon RNA binding occur at the C terminus of the domain (A93 onwards). Residues within helix C display sharper line-widths in the HSQC spectrum and the heteronuclear NOEs for the corresponding residues were generally less than 0.6 in the complex, suggesting significantly increased pico- to nanosecond internal motions. The reduced heteronuclear NOE values are matched by decreased T1 values at 750 MHz. The analysis of the NOE patterns of the free protein and protein–RNA complex reveals that this behavior can be attributed to the unfolding of the C-terminal helix.13 Thus, unlike U1A where RNA-binding led to a conformational change in a C-terminal helix but not to its unfolding,19,20 the C-terminal helix of CstF-64 (which is much more tightly associated with the β-sheet surface than U1A13) unfolds upon RNA binding. In other cases of RRM-RNA recognition, formation of a helix following the last β-strand occurred instead upon RNA binding.21

Other clear changes in the NMR relaxation properties of CstF-64 upon RNA binding involve the 15N T2. The T1 and T2 versus protein sequence profiles are featureless in RNA-free CstF-64 protein. In other words, the relaxation properties for the RNA-free CstF-64 protein are largely independent of the location of the amide within the structure, with the exception of the unfolded tails. However, RNA binding introduces significant sequence-specific differences in protein dynamics, particularly on T2. Several amides located within the central strands of the β-sheet protein relax quicker than the rest of the protein: R46, K57, T59, F61, C62 and E63. This is the complete opposite from what we observed with U1A protein, where sequence-dependent variations in T2 were quenched upon RNA binding.22

Shorter than normal T2 values are most clearly seen in the strands β2 and β3, but they are also observed for some residues belonging to strands β1 and β4 and to loop 3. These effects are much clearer in the data collected at 750 MHz, consistent with their origin being in exchange phenomena. Faster relaxation cannot be due to increased exchange with water, since most of these residues map to the RNA-binding surface of the protein and are therefore protected from solvent in the complex. Furthermore, intrinsic amide exchange is relatively slow at pH 6 and does not occur on the micro- to millisecond time-scale. Variations in T2 could also be due to anisotropic diffusion, although their magnitude and the field dependence of the effects argue against this explanation. The analysis of relaxation data using the model-free formalism23,24 and the relaxation dispersion data confirm that the decreased T2 values are due to genuine conformational exchange by attributing high Rex values to these same residues.

Quantitative analysis of the relaxation data

The T1, T2 and heteronuclear NOE are related to the spectral density functions J(ω) describing atomic motion within the protein by well-known expressions.25,26 Motional properties are extracted from the relaxation data either by using the spectral densities themselves27,28 or through the so-called model-free approach.23,24 In the ModelFree approximation, parameters describing the time-scale of the internal motion, the order parameters and the anisotropy of global reorientation are extracted from the relaxation data under the assumption that global reorientation of the protein and internal motions are uncoupled. We extracted these motional parameters from the primary relaxation data by using an algorithm based on Bayesian statistics17,18 that analyzes the data collected at two fields simultaneously. In addition, we also analyzed
Table 1. Rex values obtained by ModelFree analysis of the relaxation data and from relaxation dispersion experiments

Residue   Rex (s⁻¹) at 500 MHz,       Rex (s⁻¹) at 500 MHz,   Rex (s⁻¹) at 750 MHz,
          relaxation dispersion(a)    ModelFree               ModelFree
G21       2.3                         –                       –
S44       3.13                        2.36                    3.83
F45       10.99                       7.18                    15
R46       5.72                        4.35                    8.96
L47       3.43                        2.31                    2.92
D50       –                           3.52                    5.41
T53       11.34                       –                       –
K55       2.14                        2.14                    4.89
K57       –                           4.28                    10.26
Y59       5.93                        4.95                    8.43
F61       3.34                        3.38                    –
C62       3.51                        3.21                    –
E63       2.62                        –                       4.84
D90       6.02                        3.13                    6.15

(a) Rex from relaxation dispersion experiments were obtained by using the equation: $R_{ex} = R_2^{\mathrm{eff}}(1/\tau_{CP} \to 0) - R_2^{\mathrm{eff}}(1/\tau_{CP} \to \infty)$.52,53
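
One way to read the two ModelFree columns of Table 1 is to look at how Rex changes with field. For exchange that is fast on the chemical shift time-scale, Rex is expected to scale with the square of the static field, which would give a ratio of (750/500)² = 2.25 between the two spectrometers. The sketch below computes the observed ratios for the residues that have ModelFree values at both fields. The comparison itself is an illustration added here and is not an analysis reported in the text; the variable and function names are assumptions.

```python
# ModelFree Rex values (s^-1) from Table 1 as (500 MHz, 750 MHz),
# restricted to residues with an entry at both fields.
rex_modelfree = {
    "S44": (2.36, 3.83),
    "F45": (7.18, 15.0),
    "R46": (4.35, 8.96),
    "L47": (2.31, 2.92),
    "D50": (3.52, 5.41),
    "K55": (2.14, 4.89),
    "K57": (4.28, 10.26),
    "Y59": (4.95, 8.43),
    "D90": (3.13, 6.15),
}

# Expected ratio if Rex grows with the square of the field (fast-exchange limit).
FAST_EXCHANGE_RATIO = (750.0 / 500.0) ** 2


def field_ratios(table):
    """Return Rex(750 MHz) / Rex(500 MHz) for every residue in `table`."""
    return {residue: high / low for residue, (low, high) in table.items()}


ratios = field_ratios(rex_modelfree)
for residue, ratio in ratios.items():
    print(f"{residue}: observed ratio {ratio:.2f} (quadratic scaling would give {FAST_EXCHANGE_RATIO:.2f})")
```

Every listed residue shows a larger Rex at 750 MHz, in line with the statement that the exchange effects are clearer at the higher field; the individual ratios scatter around and below the quadratic limit, as would be expected given the quoted uncertainties and the fast-to-intermediate exchange regime.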
the relaxation data at the two fields separately with the algorithm used in most studies of protein dynamics (ModelFree 4.15).29 The results of the Bayesian analysis are shown in Figure 6 and the results for ModelFree are summarized in Table 1 and Table 2. We observe no systematic deviation between data analyzed with different algorithms. For example, the S2 values were found to be within 7% of each other for most residues.

The extraction of motional parameters from the NMR relaxation data requires a description of the overall tumbling of the molecule in solution, including its anisotropy.30,31 In the analysis with ModelFree, the anisotropy and an initial estimate of the correlation time τm are obtained by using routines such as R2R1_TM and Quadric Diffusion. The Bayesian analysis is on its own a probability-based method to find the overall tumbling parameters that fit the data best. Anisotropy is taken into account by calculating the expected $\tau^{(i)}_{m,\mathrm{app}}$ value as a function of six tensor parameters Diso, Raxial, Rasym, φ, θ and χ using equation (5) in Materials and Methods. The direction cosines of the 1H-15N vectors were calculated from the coordinates of the protein (PDB code 1P1T). The Euler angles φ, θ and χ give the orientation of the principal axis system (PAS) relative to the molecular frame (the coordinate system of the RCSB PDB file). The probability of any of the parameters having a certain value was then evaluated by taking the product of the marginal probability densities over all N residues evaluated at their respective expected $\tau^{(i)}_{m,\mathrm{app}}$ values using equation (6). Axial symmetry was enforced, when needed, by imposing Rasym = 1.0; when Rasym is allowed to depart from the value of 1, the asymmetric model is obtained.

In ModelFree, five models of increasing complexity were introduced in succession to extract motional parameters (see Materials and Methods for a more detailed description). Statistical tests are then used to validate the reliability of these global parameters and their uncertainty.29 Bayesian statistics provides instead an efficient way to combine the information from multiple experiments and multiple fields. In this approach, global tumbling and internal motional parameters and their uncertainties are estimated without any prior model selection of the kind performed in ModelFree. In the Bayesian approach, instead, the extended ModelFree model (S2, S2, τe, Rex) is used at all times to describe the internal dynamics. The uncertainty in the estimated ModelFree parameters is obtained by generating 20,000 Monte Carlo samples per residue distributed according to the density $P(S^2, \tau_e, R_{ex}, \tau^{(i)}_{m,\mathrm{app}} \mid R_i)$ (see equation (2) in Materials and Methods).

Examples of the Monte Carlo simulations used to estimate the anisotropic motional properties of the protein are shown in Figure 5(a) and (b). The width of the bell-shaped curve at half maximum represents the standard deviation of the distribution. The absence of Monte Carlo samples at Raxial = 1 (Figure 5(a)) indicates that an anisotropic diffusion model is required to fit the data satisfactorily. However, the degree of anisotropy was found to be low, as indicated by the value of Rasym close (but not equal) to 1 (Figure 5(b)). The expectation values and standard deviations for the anisotropic tensors extracted from the statistical analysis are: Diso = 21.9(±0.12) μs⁻¹, Raxial = 0.66(±0.04), Rasym = 0.83(±0.02), φ = 338(±21)°, θ = 33.4(±3.7)° and χ = 152(±8.6)°. These results are consistent with the known structure of the protein. As described in Materials and Methods:

$$R_{\mathrm{axial}} = \frac{2D_{xx}}{D_{yy} + D_{zz}}$$

and:

$$R_{\mathrm{asym}} = \frac{D_{yy}}{D_{zz}}$$

This leads to the following values of the anisotropic tensor: Dxx = 16.37 μs⁻¹; Dyy = 22.5 μs⁻¹ and Dzz = 27.1 μs⁻¹. For comparison, data were also subjected to a standard analysis of global diffusion. The local effective correlation times were calculated on the basis of the T1/T2 ratios using the program R2R1_TM. Using these correlation times, the global diffusion parameters were then calculated using the program Quadric_Diffusion. The axial diffusion tensors were found to fit the data with statistically significant improvement over the isotropic fit, but there was not much improvement in going from the axially symmetric case to fully anisotropic. The following results were obtained for the anisotropic tensor: Diso = 21.4 μs⁻¹; Dxx/Dyy = 0.80 and 2Dzz/(Dxx + Dyy) = 1.44. These values compare very well to those obtained by the Bayesian statistical method.

The product of $P(\tau^{(i)}_{m,\mathrm{app}} \mid R_i)$ over all the N residues was used to calculate the rotational correlation time τm for the protein,17,18 which was found to be 7.37 ns. This value is typical for a protein of this size.26,32 A comparable analysis was done on the relaxation data of the protein–RNA complex using the coordinates from the PDB file of the protein to describe anisotropic tumbling. The anisotropic tensors extracted from the analysis were: Diso = 19.08(±0.208) μs⁻¹, Raxial = 0.59(±0.10) and Rasym = 0.895(±0.034). The overall rotational correlation time was 8.7 ns. This value is significantly

Table 2. ModelFree analysis of the relaxation parameters of the protein–RNA complex at 500 MHz and 750 MHz

             S2, Bayesian    S2, ModelFree analysis    S2, ModelFree analysis
             analysis        (500 MHz)                 (750 MHz)
β-Strands    0.79±0.06       0.84±0.06                 0.91±0.06
α-Helices    0.84±0.02       0.89±0.02                 0.93±0.02
Loops        0.82±0.04       0.86±0.03                 0.93±0.04

The average values of S2 for various domains are shown. S2 values obtained from the Bayesian statistical method are also shown for comparison.
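
The principal components quoted for the free protein (Dxx, Dyy, Dzz) can be recovered from the fitted Diso, Raxial and Rasym by inverting the two ratio definitions. The sketch below does this under two assumptions that are not spelled out in this part of the text: that Diso is the mean of the three principal components, and that Rasym is the ratio Dyy/Dzz, which is the form that reproduces the reported numbers. The function name is hypothetical.

```python
def principal_components(d_iso, r_axial, r_asym):
    """Recover (Dxx, Dyy, Dzz) from Diso, Raxial and Rasym.

    Assumed definitions:
      Diso   = (Dxx + Dyy + Dzz) / 3
      Raxial = 2 * Dxx / (Dyy + Dzz)
      Rasym  = Dyy / Dzz
    Units follow d_iso (here inverse microseconds).
    """
    yz_sum = 3.0 * d_iso / (1.0 + r_axial / 2.0)   # Dyy + Dzz
    d_zz = yz_sum / (1.0 + r_asym)
    d_yy = r_asym * d_zz
    d_xx = r_axial * yz_sum / 2.0
    return d_xx, d_yy, d_zz


# Expectation values reported for the free protein.
d_xx, d_yy, d_zz = principal_components(d_iso=21.9, r_axial=0.66, r_asym=0.83)
print(f"Dxx = {d_xx:.2f}, Dyy = {d_yy:.2f}, Dzz = {d_zz:.2f} (inverse microseconds)")
```

The values obtained this way agree with the reported 16.37, 22.5 and 27.1 μs⁻¹ to within about 0.1 μs⁻¹, a difference that can be attributed to rounding of the input ratios.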
Figure 5. Representative Monte Carlo samples (20,000 simulations) generated from the conjoint analysis of the
relaxation data collected at 500 MHz and 750 MHz for the (a)–(c) free and (d) RNA-bound forms of the protein. Monte
Carlo samples were generated from the posterior probability Pasym(Diso, Raxial, Rasy
