A direct solution to the crystallography phase problem

Credit: RL Kingston, RP Millane, IUCrJ 9, 648 (2022)

For determining the structure of proteins and other biological macromolecules, x-ray diffraction is a workhorse: The method accounts for 86% of the nearly 200,000 experimental structures archived in the Protein Data Bank, a global open-access repository. With a suitable choice of solvent, many proteins, though notably not all (see the article by Mike Martynowycz and Tamir Gonen, Physics Today, June 2022, page 38), can be coerced to form crystals.

X rays scatter off a protein crystal into a diffraction pattern that’s uniquely determined by the protein’s electron distribution. But x-ray diffraction inherently suffers from what’s known as the phase problem: Detectors record only the intensity of the scattered x rays, yet both amplitude and phase information are needed to computationally reconstruct the protein structure.
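A toy calculation can make the phase problem concrete. The Python sketch below is not taken from the paper: the one-dimensional `density`, the helper names, and the choice to set the missing phases to zero are all assumptions made for illustration. It Fourier transforms a small density, keeps only the intensities a detector would record, and compares the reconstruction that uses the true phases with one that lacks them.

```python
import cmath
import math

def dft(values):
    """Discrete Fourier transform of a 1D sequence (toy stand-in for diffraction)."""
    n = len(values)
    return [sum(v * cmath.exp(-2j * cmath.pi * h * x / n) for x, v in enumerate(values))
            for h in range(n)]

def inverse_dft(coeffs):
    """Inverse transform: turns amplitudes and phases back into a density."""
    n = len(coeffs)
    return [sum(c * cmath.exp(2j * cmath.pi * h * x / n) for h, c in enumerate(coeffs)) / n
            for x in range(n)]

# Hypothetical 1D "electron density" in one unit cell.
density = [0.0, 0.0, 3.0, 1.0, 0.0, 2.0, 0.0, 0.0]

structure_factors = dft(density)
intensities = [abs(f) ** 2 for f in structure_factors]    # what a detector records
amplitudes = [math.sqrt(i) for i in intensities]          # recoverable from intensities
true_phases = [cmath.phase(f) for f in structure_factors]  # lost in the measurement

# Reconstruction with the true phases reproduces the density.
with_phases = [v.real for v in inverse_dft(
    [cmath.rect(a, p) for a, p in zip(amplitudes, true_phases)])]

# Reconstruction with every phase set to zero gives a different density.
without_phases = [v.real for v in inverse_dft([complex(a, 0.0) for a in amplitudes])]

max_error_with = max(abs(r - d) for r, d in zip(with_phases, density))
max_error_without = max(abs(w - d) for w, d in zip(without_phases, density))
print(f"max error with phases:    {max_error_with:.2e}")
print(f"max error without phases: {max_error_without:.2f}")
```

Both reconstructions share exactly the same amplitudes, so the intensities alone cannot distinguish them. Only the phases separate the true density from the incorrect one.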

The phase problem has garnered attention for over a century, and various solutions have been put forward. (See, for example, the article by Manuel Guizar-Sicairos and Pierre Thibault, Physics Today, September 2021, page 42.) Most are indirect methods that require substantial experimental effort or preexisting models and do not always succeed. Richard Kingston of the University of Auckland and Rick Millane of the University of Canterbury, both in New Zealand, have now demonstrated a general technique that can directly reconstruct phase information from only the diffraction data and an estimate of the solvent concentration in the crystal, as long as the solvent concentration is sufficiently high.

A ring of yellow ribbon-like structures with blue ribbon-structures overlaid on the lower-left corner
Credit: RL Kingston, RP Millane, IUCrJ 9, 648 (2022)

The new ab initio approach imposes two constraints on the computed protein structure: The crystal’s solvent region is essentially featureless, and the protein region has a characteristic distribution of electron densities. It also employs an iterative algorithm that can start with random phase assignments and still converge on a global solution; no initial phase estimates are needed. For computational efficiency, the researchers split the problem into two parts: first determining the approximate molecular shape (shown in the top image on the left for one test protein, with the actual shape on the right) and then determining the phases of the diffraction pattern. Looking for clusters of similar shape and phase solutions across multiple computational runs helps identify and promote the correct solutions.
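The general idea of iterating between measured amplitudes and real-space constraints can be sketched in a few lines of Python. This is not the researchers' algorithm: it swaps in the plain error-reduction scheme, assumes the protein region (`protein_mask`) is already known rather than determined in a first stage, and replaces the density-distribution constraint with simple non-negativity. The one-dimensional `true_density` and all names are hypothetical.

```python
import cmath
import random

def dft(values):
    n = len(values)
    return [sum(v * cmath.exp(-2j * cmath.pi * h * x / n) for x, v in enumerate(values))
            for h in range(n)]

def inverse_dft(coeffs):
    n = len(coeffs)
    return [sum(c * cmath.exp(2j * cmath.pi * h * x / n) for h, c in enumerate(coeffs)) / n
            for x in range(n)]

def impose_real_space(density, protein_mask):
    """Flat (zero) solvent outside the mask; non-negative density inside it."""
    return [max(d.real, 0.0) if inside else 0.0
            for d, inside in zip(density, protein_mask)]

def impose_amplitudes(density, amplitudes):
    """Keep the current phases but reset every amplitude to its measured value."""
    return inverse_dft([cmath.rect(a, cmath.phase(c))
                        for a, c in zip(amplitudes, dft(density))])

def amplitude_error(density, amplitudes):
    """Squared mismatch between computed and measured amplitudes."""
    return sum((abs(c) - a) ** 2 for c, a in zip(dft(density), amplitudes))

def retrieve(amplitudes, protein_mask, iterations=100, seed=0):
    """Start from random phases, then alternate the two constraints."""
    rng = random.Random(seed)
    start = inverse_dft([cmath.rect(a, rng.uniform(-cmath.pi, cmath.pi))
                         for a in amplitudes])
    density = impose_real_space(start, protein_mask)
    errors = [amplitude_error(density, amplitudes)]
    for _ in range(iterations):
        density = impose_real_space(impose_amplitudes(density, amplitudes), protein_mask)
        errors.append(amplitude_error(density, amplitudes))
    return density, errors

# Toy unit cell of 20 points: 6 hold "protein", so the solvent fraction is 70%.
protein_mask = [4 <= x < 10 for x in range(20)]
true_density = [0.0] * 4 + [2.0, 5.0, 3.0, 6.0, 1.0, 4.0] + [0.0] * 10
amplitudes = [abs(c) for c in dft(true_density)]  # phases deliberately discarded

density, errors = retrieve(amplitudes, protein_mask)
print(f"amplitude error: first {errors[0]:.3f}, last {errors[-1]:.3f}")
```

For this simple scheme the amplitude error cannot increase from one iteration to the next, but the iteration can stall at an incorrect density. Repeating it with different values of `seed` and comparing the outcomes is a crude analogue of looking for clusters of similar solutions across runs.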

To demonstrate the practicality of their technique, the researchers tested it on 42 proteins with known structures. The second image illustrates the result of one test: The calculated electron-density map (gold) agrees well with the known structure (ribbons). The researchers show that their approach routinely converges to the correct solution when the solvent concentration is at least 70%. About 4.5% of protein crystals are expected to fall into that range. But 19% will have a solvent fraction of 60% or more, so the researchers are optimistic that even small improvements in the method will bring many more protein structures into reach. (RL Kingston, RP Millane, IUCrJ 9, 648, 2022.)
