Commissioned Papers Synthetic Genomics Governance



Commissioned Papers Synthetic Genomics: Risks and Benefits for Science and Society

This volume of papers accompanies the report Synthetic Genomics: Options for Governance

Synthetic Genomics: Risks and Benefits for Science and Society


COMMISSIONED PAPERS

TABLE OF CONTENTS

Robert Jones: Sequence Screening

Yogesh Sanghvi: A Roadmap to the Assembly of Synthetic DNA from Raw Materials

Ralph S. Baric: Synthetic Viral Genomics

Marc S. Collett: Impact of Synthetic Genomics on the Threat of Bioterrorism with Viral Agents

Diane O. Fleming: Risk Assessment of Synthetic Genomics: A Biosafety and Biosecurity Perspective

Franco Furger: From Genetically Modified Organisms to Synthetic Biology: Legislation in the European Union, in Six Member Countries, and in Switzerland


The following papers were commissioned for the project Synthetic Genomics: Risks and Benefits for Science and Society. These papers formed the basis of many discussions at project workshops and at a large invitational meeting. The information elicited from these meetings, and from the commissioned papers themselves, formed the basis of our report Synthetic Genomics: Options for Governance (http://dspace.mit.edu/handle/1721.1/39141). The views and opinions expressed in these commissioned papers are those of the authors of the papers and not necessarily those of the authors of the report, or of the institutions at which the authors work.

Citation: Working Papers for Synthetic Genomics: Risks and Benefits for Science and Society. Garfinkel MS, Endy D, Epstein GL, Friedman RM, editors. 2007.

Please cite individual pieces as follows:

Baric RS. 2006. Synthetic Viral Genomics. In: Working Papers for Synthetic Genomics: Risks and Benefits for Science and Society, pp. 35-81. Garfinkel MS, Endy D, Epstein GL, Friedman RM, editors. 2007.

Collett MS. 2006. Impact of Synthetic Genomics on the Threat of Bioterrorism with Viral Agents. In: Working Papers for Synthetic Genomics: Risks and Benefits for Science and Society, pp. 83-103. Garfinkel MS, Endy D, Epstein GL, Friedman RM, editors. 2007.

Fleming DO. 2006. Risk Assessment of Synthetic Genomics: A Biosafety and Biosecurity Perspective. In: Working Papers for Synthetic Genomics: Risks and Benefits for Science and Society, pp. 105-164. Garfinkel MS, Endy D, Epstein GL, Friedman RM, editors. 2007.

Furger F. 2006. From Genetically Modified Organisms To Synthetic Biology: Legislation in the European Union, in Six Member Countries, and in Switzerland. In: Working Papers for Synthetic Genomics: Risks and Benefits for Science and Society, pp. 165-184. Garfinkel MS, Endy D, Epstein GL, Friedman RM, editors. 2007.

Jones R. 2005. Sequence Screening. In: Working Papers for Synthetic Genomics: Risks and Benefits for Science and Society, pp. 1-16. Garfinkel MS, Endy D, Epstein GL, Friedman RM, editors. 2007.

Sanghvi Y. 2005. A Roadmap to the Assembly of Synthetic DNA from Raw Materials. In: Working Papers for Synthetic Genomics: Risks and Benefits for Science and Society, pp. 17-33. Garfinkel MS, Endy D, Epstein GL, Friedman RM, editors. 2007.


Sequence Screening

Robert Jones
Craic Computing LLC, Seattle, WA

Introduction

The use of biological agents in acts of terrorism has received heightened interest since the mailing of anthrax spores in the United States in 2001. Many scenarios have been considered in which bacterial and viral pathogens could be produced and employed as weapons. While some may seem unlikely, the scientific community has a responsibility to assess all threats and to develop ways to monitor, and perhaps counter, any attempts to carry them out.

The focus of the current study concerns the use of DNA synthesis and genetic manipulation to create or modify pathogens. It can be argued that a terrorist group would be much more likely to use a 'conventional' pathogen, such as anthrax, than to design and engineer a modified organism. While this is convincing, there are several strong reasons why someone might wish to employ synthetic DNA.

Conventional threats require that the terrorist has access to the pathogen. Some pathogens, such as anthrax, can be isolated in the wild in certain parts of the world. It is clearly possible to culture natural isolates, but the process can be laborious and may yield a strain that is not well suited for use as a biological weapon. In most cases, the easiest sources for pure cultures of pathogens are the laboratories that work on them. In the US such labs are strictly regulated with measures such as background checks on researchers, careful inventory management and high levels of building security. These make it extremely difficult for anyone outside those laboratories to access the pathogens that they contain. Smallpox virus is an example of a pathogen that, having been eradicated in the wild, could only be obtained from a few specific laboratories, all of which operate under tight security.

The alternative approach that concerns us here is that someone could synthesize the entire genome of a dangerous pathogen, such as smallpox, from scratch. This requires no access to the secure laboratories. Potentially it requires no prior experience in working with the pathogen. Most troubling is the fact that such synthesis could be accomplished in a conventional molecular biology laboratory, without the need for specialized equipment and without attracting attention to the project from others.

The technology required to synthesize the genome of an entire viral pathogen, or genes thereof, is already available. Rapid development in the field of synthetic biology is destined to make this process easier, faster and cheaper. This evolution in technology brings with it tremendous benefits to biotechnology and medicine but its potential for abuse is a cause for concern. Being able to determine if nefarious activity is underway will become an important requirement for the regulatory authorities.

Here there is some cause for optimism. Currently the vast majority of DNA synthesis is performed by service companies or by in-house central facilities in universities and large companies. The DNA synthesis industry provides researchers with custom DNA at such low cost and with such convenience that almost all synthesis work takes place in a relatively small number of facilities. A request for DNA synthesis requires that the customer provide the sequence of the molecule. This creates the opportunity to monitor or screen input sequences for matches to a database of pathogen sequences. Finding a positive match at the time the order was received would allow the vendor to alert the relevant authority and to delay shipment of that DNA.

I have written a software package, called BlackWatch, that implements sequence screening. This paper will describe the operation of this system, its current shortcomings and ways that these might be addressed.

1. The Business of Synthetic DNA

At this point it is worth reviewing the state of the synthetic DNA industry as it stands today.
Not only does this provide the venue in which to monitor attempts at engineering

pathogens, but its particular constraints and operating procedures have a significant practical impact on the way any sequence screening strategy might be implemented. The chemical synthesis of oligonucleotides (oligos), short fragments of DNA, became widely available about 20 years ago with the manufacture of desktop DNA synthesizers. Oligos found widespread use in DNA sequencing with an equal, and perhaps greater, application in Polymerase Chain Reaction (PCR) experiments. The high demand for oligos led to the creation of companies that performed contract DNA synthesis on request. The convenience and low cost of using these vendors has driven substantial growth and competition in this industry and today hardly any research laboratories synthesize oligos themselves. Fierce competition between the synthesis companies has driven prices down to the point where profit margins are minimal. In fact certain companies appear to offer the service as a 'loss leader' in order to attract customers to their other more lucrative products. These companies try to differentiate themselves on the basis of easy ordering via the Web, fast turnaround and value added options, such as chemical modifications of the oligos. The customer can visit the web site of the vendor, create an account, enter in the sequence of the oligo they want synthesized, enter their credit card details and hit submit. A tube containing the DNA will arrive by express delivery the next day or day after. The cost for this service is remarkable. A typical oligo of perhaps 20 to 25 nucleotides in length will cost around $0.30 per nucleotide, a total of less than $10 for a completely custom organic chemical synthesis. As a result, using one of these services is the preferred option for almost all laboratories. There are probably several hundred companies or university facilities that offer oligo synthesis services around the world. The throughput of the larger companies is impressive. 
Integrated DNA Technologies (http://www.idtdna.com) of Iowa states in its press releases that it synthesizes between 15,000 and 25,000 oligos daily and has more than 60,000 customers worldwide.


With improvements in the technologies behind DNA synthesis and gene assembly, it has become feasible to synthesize entire genes from sets of oligos. Several companies provide this service, some of which derive their entire income from gene synthesis. The technology is more involved than oligo synthesis but costs can be kept low, while maintaining accuracy, through the extensive use of laboratory automation. The distinction between companies that synthesize short oligos and those that synthesize entire genes, assembling these from sets of oligos, is important in the context of sequence screening. The turnaround time for the synthesis of a gene of a few thousand nucleotides is a couple of weeks and the cost can be as low as $1.60 per nucleotide. At this price point it becomes easier to synthesize certain genes than to try to isolate them from their native genomes. There are around 25 companies in the US that offer this service with about the same number in the rest of the world, mostly in Europe. However, it would appear that most of that work is performed in a small subset of these companies.

The next step in the evolution of these technologies is the synthesis of entire genomes. Already the genomes of poliovirus (Wimmer et al. 2002. Science 297: 1016-1018) and bacteriophage phiX174 (Smith et al. 2003. Proc. Natl. Acad. Sci. USA 100: 15440-15445) have been synthesized from scratch and used to create infectious virus and phage particles, respectively. Work is underway at the company Synthetic Genomics (http://www.syntheticgenomics.com) and at the J. Craig Venter Institute to identify the minimal set of genes that are necessary to sustain the bacterium Mycoplasma genitalium. Once this minimal genome has been defined, the company intends to use it as the foundation for a range of engineered synthetic organisms that possess novel characteristics. If the history of DNA sequencing, PCR and oligo synthesis serves as a guide, we can expect the synthesis of genes and small genomes to become routine tools for molecular biology over the next decade.


One final aspect of the business of synthetic DNA is of particular importance. Confidentiality and the protection of intellectual property are extremely important to the biotechnology industry. Oligo vendors help ensure confidentiality by not asking customers about the nature of the sequences that they request or the uses to which they will be put. Indeed, most corporate customers would immediately stop using these vendors if they were required to disclose any information about the requested sequences. This intentional ignorance about the sequences on the part of the vendor could play into the hands of anyone intent on synthesizing or engineering a pathogen. Widespread use of sequence screening software has the potential to remove this vulnerability while still retaining confidentiality for the vast majority of DNA synthesis customers.

2. Sequence Screening

The basic idea behind sequence screening is straightforward. Sequences of oligos or entire genes that are to be synthesized are compared against a specific curated database of sequences from known pathogens, the 'Select Agents'. Any request that produces a significant match to a pathogen is tagged as being of interest and the site administrator is alerted.

I have implemented this approach in the BlackWatch software system. This consists of a custom sequence database, the BLAST sequence comparison software from NCBI (Altschul et al., Nucleic Acids Res. 1997, v25, pp3389-3402) and a set of Perl wrapper scripts that manage the user interface, run the BLAST searches and process the results. The system can be accessed from a web interface, the UNIX command line and from custom interfaces to relational databases. A schematic diagram of the system is shown in Figure 1.


Figure 1: Structure of the BlackWatch Software.

Input sequences are passed to the core scripts from one of the interfaces. An assessment is made on the basis of length as to whether each batch contains short oligos or longer sequences. BLAST searches are initiated against the select agent sequence database. The system currently runs a blastn search of input nucleotide sequences against the nucleotide database and a blastx search of translated nucleotide sequences against a parallel database of protein sequences from the same pathogens. tblastx searches of translated nucleotide sequences against the translated nucleotide database will be introduced in the next version of the software.

BLAST results are processed and matches are assessed based on three criteria: absolute score, statistical significance (E-value), and the coverage of the matching segment. Coverage indicates how much of the query sequence is involved in the match. For an oligonucleotide one would expect the entire query sequence to be included in the alignment, whereas perhaps only part of a larger sequence would be involved. A combination of these criteria is used to select positive matches, with different cutoffs used with oligos relative to long sequences.
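The three criteria can be combined in many ways. As a minimal sketch only, the Python fragment below applies separate cutoffs to oligos and to longer sequences. The `Hit` fields, the 100-nucleotide boundary between the two classes and every cutoff value are assumptions chosen for illustration; they are not the values used in BlackWatch.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One alignment between a query and a database sequence (hypothetical fields)."""
    query_length: int   # length of the submitted sequence
    align_length: int   # number of query residues covered by the alignment
    bit_score: float    # alignment score reported by the search
    e_value: float      # statistical significance reported by the search


# Illustrative values only: a real deployment would tune these empirically.
OLIGO_MAX_LENGTH = 100
CUTOFFS = {
    "oligo": {"min_score": 30.0, "max_e_value": 1e-3, "min_coverage": 0.9},
    "long": {"min_score": 100.0, "max_e_value": 1e-10, "min_coverage": 0.2},
}


def sequence_class(query_length):
    """Classify a query as a short oligo or a longer sequence by length."""
    return "oligo" if query_length <= OLIGO_MAX_LENGTH else "long"


def coverage(hit):
    """Fraction of the query sequence that takes part in the alignment."""
    return hit.align_length / hit.query_length


def is_positive(hit):
    """A hit is positive only if it passes all three cutoffs for its class."""
    cutoffs = CUTOFFS[sequence_class(hit.query_length)]
    return (
        hit.bit_score >= cutoffs["min_score"]
        and hit.e_value <= cutoffs["max_e_value"]
        and coverage(hit) >= cutoffs["min_coverage"]
    )
```

Requiring near-complete coverage for oligos while accepting partial coverage for long sequences mirrors the expectation stated above: an oligo should align along its whole length, whereas a gene-length query may match over only part of its length.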


Search results for sequences that do not match are discarded, along with the sequences themselves. This is an important component in protecting proprietary information from customers. Positive search results against the select agent database are then searched against the non-pathogen database to see if they also match there. This is to help resolve false positives and is discussed below. Data are archived for each positive result. These include the input sequence, the raw BLAST output and associated information such as the customer identifier, date and time. The system can be interfaced with relational databases. This will allow it to be driven by production databases at synthesis companies. In the absence of any common architecture for these databases, a custom interface script will have to be written for each company that chooses to set this up. I have successfully integrated the system with an Oracle database during beta testing at a leading oligo synthesis company. Positive matches can be reported to relevant staff by way of email alerts. These include links to the web interface that will bring up the details of the match. The search archives can be accessed by customer ID, allowing the history of sequence submissions to be reviewed. This will be important if a customer submits multiple related or overlapping sequences over a period of time. Comments can be added to each match and these are stored in the archive alongside the BLAST output. So one might record why a single match was assessed as a false positive. Later review in the context of other matches might lead you to change that assessment. Below are some screenshots from the web interface. The first shows the query sequence input screen that will load a FASTA format file or accept sequences that are cut and pasted into the form.
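The retention policy described above might be sketched as follows. This is an illustration in Python rather than the Perl of the actual system, and the class name, field names and in-memory storage are all hypothetical.

```python
from datetime import datetime, timezone


class MatchArchive:
    """Keeps records for positive matches only (hypothetical, in-memory)."""

    def __init__(self):
        self._records = []

    def record_search(self, customer_id, sequence, blast_output, is_positive):
        """Archive a positive result; retain nothing for a negative one."""
        if not is_positive:
            # Non-matching sequences and their results are discarded,
            # which protects the customer's proprietary information.
            return None
        record = {
            "customer_id": customer_id,
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "sequence": sequence,
            "blast_output": blast_output,
            "comments": [],
        }
        self._records.append(record)
        return record

    def add_comment(self, record, text):
        """Attach a reviewer's note, e.g. why a match was judged a false positive."""
        record["comments"].append(text)

    def history(self, customer_id):
        """Return all archived positive matches for one customer."""
        return [r for r in self._records if r["customer_id"] == customer_id]
```

Reviewing `history(customer_id)` over time is what would allow related or overlapping submissions from the same customer to be seen together.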


Figure 2: Sequence Input Web Page

Most searches will not produce matches and these are simply acknowledged as having been run. Positive matches are highlighted with links to the GenBank sequence that was hit, the raw BLAST output, the query sequence, etc.

Figure 3: Example of a Positive Match

The BLAST output is available for positive matches, allowing an expert to evaluate the quality of the alignment and thereby assess the likelihood of this being a true or false positive.


Figure 4: An Example of Detailed BLAST Output for a Positive Match

You can access a demonstration version of BlackWatch at http://biotech.craic.com/blackwatch.

3. The Custom Sequence Database

The database of sequences from select agents is a critical component of the BlackWatch system. Its composition directly influences the numbers of false positive and negative matches, as well as the performance of the search process. Only sequences from defined select agents are included in the database. A critical issue in sequence screening is the potential disclosure of information about customer sequences. By limiting the database to only select agent sequences, the system minimizes this risk. So oligos related to a human gene would not be expected to match anything in the database. Sequences of bacterial origin, for example, have a much higher risk of matching, especially in light of the approach to false positive control. Understanding the probability of finding such matches will be important in the development and adoption of this system.

There are two approaches to building the sequence database. The first is to limit the sequences to those of genes known to be involved in virulence, toxin synthesis, etc. This highly focused approach would produce a small database with a low probability of false


positive matches. But this approach has several problems. Firstly it requires considerable effort up front in deciding what genes should be included and in extracting only the relevant sequences from GenBank. Secondly it ignores the possibility that genes other than this subset might be employed in the modification of a pathogen.

The alternative approach, which is used in BlackWatch, is to include all sequences that have been assigned to any organism on the select agent list. It is relatively straightforward to extract sequences based on the organism tag in a GenBank record and this selection can be fully automated using simple Perl scripts. Minimal up front effort is required and the data can be made available for searching immediately. It also ensures that all the available data is used in searching, with no preconceptions about how sequences might be used.

The drawbacks of the approach include the potential for redundant data being included in the database, slowing down searches and perhaps creating ambiguity. Some basic checks for redundancy are currently used in the preparation of the database but these could be improved. Perhaps the major problem is that the approach will include sequences of housekeeping genes, such as those for ribosomal proteins, which are highly conserved between diverse species. This raises the probability of false positives significantly.

4. Composition of the Database

All sequences in the database are extracted from the public GenBank database, hosted by the NIH (http://www.ncbi.nlm.nih.gov/Genbank/). This contains sequences for most if not all of the select agents, with complete genomes available for many of the organisms. Anyone attempting to engineer a pathogen using synthetic DNA would be expected to use this same database. No classified or proprietary sequences are included. Not only are these not available to me, but their inclusion would greatly complicate the software and its intended distribution to DNA synthesis companies.
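As a rough illustration of extraction by organism tag, the Python sketch below reads GenBank flat-file text, takes the organism name from the ORGANISM line of each record, and keeps the records whose organism begins with a name on a select agent list. BlackWatch itself uses Perl scripts; the function names, the prefix-matching rule and the toy records here are assumptions made for the example.

```python
# Two abbreviated toy records in GenBank flat-file layout (not real entries).
SAMPLE = """LOCUS       TOY0001   12 bp    DNA
SOURCE      Yersinia pestis CO92
  ORGANISM  Yersinia pestis CO92
ORIGIN
        1 acgtacgtac gt
//
LOCUS       TOY0002   12 bp    DNA
SOURCE      Escherichia coli
  ORGANISM  Escherichia coli
ORIGIN
        1 ttgacgtaca ag
//
"""


def split_records(text):
    """Yield one record at a time; each record ends with a '//' line."""
    record = []
    for line in text.splitlines():
        record.append(line)
        if line.strip() == "//":
            yield "\n".join(record)
            record = []


def organism_of(record):
    """Return the organism name from the ORGANISM line, or None if absent."""
    for line in record.splitlines():
        stripped = line.strip()
        if stripped.startswith("ORGANISM"):
            return stripped[len("ORGANISM"):].strip()
    return None


def select_agent_records(text, agents):
    """Yield records whose organism name starts with a listed agent name.

    Prefix matching (an assumption) lets 'Yersinia pestis' also capture
    named strains such as 'Yersinia pestis CO92'.
    """
    wanted = [name.lower() for name in agents]
    for record in split_records(text):
        organism = organism_of(record)
        if organism and any(organism.lower().startswith(n) for n in wanted):
            yield record
```

For example, `list(select_agent_records(SAMPLE, ["Yersinia pestis"]))` keeps only the first toy record.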
The list of organisms for which all available sequences have been extracted is a composite of those included in the CDC select agent rule (42CFR73), the USDA regulations (7CFR331 and 9CFR121) and the Dept of Commerce Export Administration


"Commerce Control List" (CCL). The composite list specifies a total of 75 organisms and 22 toxins. The breakdown of these is shown in Table 1.

                      Host
Pathogen Type    Human/Animal    Animal Only    Plant    Total
Viruses               19              12           2       33
Bacteria              15               3           8       26
Fungi                  2               0           2       11
Rickettsiae            4               0           9        4
Prions                 0               1           0        1
Toxins                22               0           0       22

Table 1: Pathogens in Composite Select Agent List

The list is included as an appendix to this paper and is also available online at: http://biotech.craic.com/blackwatch/regulations/List_of_Select_Agents.pdf Toxins pose a problem for sequence screening. Protein toxins like abrin, ricin and conotoxin are gene products and so DNA and protein sequences for the toxins themselves are available. In the case of mycotoxins, such as aflatoxin, the molecule is not a protein. In these cases the sequences of genes that encode the biosynthetic pathway may be appropriate targets for sequence screening. This component of the database needs further study. It might be advisable to include antibiotic resistance genes in the database as an obvious scenario that we need to consider is that of someone introducing antibiotic resistance into an existing bacterial pathogen. Unfortunately the widespread use of these genes in conventional molecular biology would ensure a very large number of false positive matches. This issue should be revisited once progress has been made dealing with the general problem of false positives.


5. Current Implementation of the BlackWatch Software

The software is written in Perl and runs on Linux systems. Porting the scripts to other UNIX variants and Mac OS X would be trivial and a port to the Windows operating system should be straightforward. The system is in operation on my web server and has been in production use at Blue Heron Biotechnology in Bothell, WA, where it is used to screen requests for entire gene synthesis. It has also been beta tested for a limited period at a leading oligo synthesis company. They chose not to continue using the system for business reasons.

In order for the software to meet the sequence screening needs of the gene and oligo synthesis industry in general, it will require some additional development work. Performance needs to be improved to handle the throughput at large oligo synthesis sites. Integrating the system with existing relational databases that manage orders at these companies needs to be made easier. Most importantly the rate of false positive matches needs to be studied and minimized.

6. False Positives

The primary challenge facing sequence screening is to minimize the number of false positive matches. Every match reported by the system needs to be evaluated at some point by an expert. Those that are deemed to be real may trigger the involvement of the regulatory authorities. Every false positive that passes initial scrutiny will waste considerable time and devalue the importance of the approach in the eyes of those authorities.

Fine tuning the cutoff values for BLAST score, significance and coverage may help reduce false positives in general but will do nothing to address matches to housekeeping genes, etc. The approach that I am experimenting with at the moment is to use a second sequence database of non-pathogens. Any query sequence that hits the pathogen database is then searched against the non-pathogen, or 'reference', database and the corresponding matches, if any, are presented to the user alongside the pathogen hits.


Currently the reference database is limited to bacteria and contains the genome sequences for E. coli and B. subtilis. This screenshot shows the results from a search with a ribosomal protein gene from S. typhimurium.

Figure 5: Example of a False Positive Match

This conserved gene has produced a match to the equivalent genes in Y. pestis and Coxiella burnetii in the pathogen database and also to E. coli in the reference database. By comparing the relative scores and significance, a reviewer would judge the query sequence as being more similar to the non-pathogen than to either of the pathogens. Hence this is probably a false positive. The approach appears quite promising but work needs to be done in creating a comprehensive set of related non-pathogen sequences for viruses, etc., and in automating the process of calling false positives.

No approach will catch false positives with 100% accuracy and so an expert reviewer will continue to be required. Perhaps the best that can be achieved is to weight the scoring of matches according to the biological significance of the matching sequence. A very strong match to a sequence involved in anthrax toxin would be a clear positive match. A match to a less important region of the B. anthracis genome would be weighted down. This argues for a sequence database that combines the approach I currently use of capturing all sequences from the pathogens with some degree of expert curation that can define which genes are of particular concern.
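One way to picture the comparison and weighting described above is the following Python sketch. It is not the BlackWatch logic: the `weight` field stands in for a hypothetical expert-curated significance value, the decision rule is an assumption, and the scores in the usage note are invented.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """A database match (hypothetical structure)."""
    subject: str          # name of the matched database sequence
    bit_score: float      # alignment score reported by the search
    weight: float = 1.0   # curated biological significance; 1.0 means neutral


def best_pathogen_score(pathogen_hits):
    """Highest pathogen score after applying the curated weights."""
    return max((h.bit_score * h.weight for h in pathogen_hits), default=0.0)


def best_reference_score(reference_hits):
    """Highest unweighted score against the non-pathogen reference database."""
    return max((h.bit_score for h in reference_hits), default=0.0)


def assess(pathogen_hits, reference_hits):
    """Give a provisional call; an expert reviewer still makes the final one."""
    pathogen = best_pathogen_score(pathogen_hits)
    if pathogen == 0.0:
        return "no match"
    if best_reference_score(reference_hits) >= pathogen:
        # The query resembles a non-pathogen at least as closely.
        return "probable false positive"
    return "needs expert review"
```

With invented scores, a conserved ribosomal gene scoring 420 against Y. pestis but 610 against E. coli would be called a probable false positive, whereas a hit to a toxin gene carrying a curated weight above 1.0 would be pushed toward expert review.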


False positives are inevitably more likely in the case of oligo sequences because of the sequence length. Here there is the opportunity to do some simulations and real world tests to quantify the problem.

7. Future Developments

There are many scenarios whereby someone who wished to synthesize or modify a pathogen could use the services of synthesis companies and still evade detection by BlackWatch. Minor variation in sequences, such as third position variation in codons, can already be caught by the blastx searches against protein sequences. Other scenarios include sending orders for overlapping oligos to different vendors or spreading out orders over a period of time so as to avoid revealing the intent behind a project. One way to address this would be to scan the archived searches across customers, or even across synthesis companies, looking for orders that might be related. This would require that the results of screening from all vendors be submitted to a central location where these correlations could be made. I return to this idea at the end of the paper. The technical challenges of making these connections are very interesting, but they go hand in hand with a number of important business and confidentiality concerns.

The BLAST sequence comparison software is the obvious choice for comparing relatively large sequences against the database but for oligo comparisons it may be faster to use another approach such as a sequence word lookup table or a suffix tree algorithm. Computational speed could become a problem in high throughput oligo synthesis facilities. The figure of up to 25,000 oligos synthesized per day that Integrated DNA Technologies quotes is sobering. This means that a complete evaluation of each oligo must take place in less than 4 seconds. This can be achieved through a combination of adequate hardware and good software engineering but the system is not currently capable of this throughput.
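To make the word lookup idea and the time budget concrete, here is a small Python sketch. It is one possible form of a sequence word (k-mer) lookup table, not a component of BlackWatch; the function names and the choice of word length are assumptions. The time budget follows from the quoted figure: 86,400 seconds per day divided by 25,000 oligos is about 3.5 seconds per oligo on a single sequential worker.

```python
SECONDS_PER_DAY = 24 * 60 * 60
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def seconds_per_oligo(oligos_per_day):
    """Average time available per oligo if they are screened one at a time."""
    return SECONDS_PER_DAY / oligos_per_day


def reverse_complement(seq):
    """Reverse complement of a DNA sequence written with A, C, G and T."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def build_index(sequences, k):
    """Map every k-letter word to the names of the sequences that contain it."""
    index = {}
    for name, seq in sequences.items():
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            index.setdefault(seq[i:i + k], set()).add(name)
    return index


def screen_oligo(oligo, index, k):
    """Count, per database sequence, how many words of the oligo it shares.

    Both strands are checked, since an oligo may correspond to either one.
    """
    counts = {}
    for strand in (oligo.upper(), reverse_complement(oligo)):
        for i in range(len(strand) - k + 1):
            for name in index.get(strand[i:i + k], ()):
                counts[name] = counts.get(name, 0) + 1
    return counts
```

The index is built once; each oligo then costs only a handful of dictionary lookups. A lookup of this kind finds exact shared words only, so it would suit a fast first pass, with anything it flags passed to a more sensitive alignment.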


8. Practical Deployment of Sequence Screening

Beyond the purely technical challenges of the BlackWatch package, its performance and the issue of false positives, there are several broader challenges to its practical deployment in the DNA synthesis industry that need to be overcome.

We need to make it very easy for a synthesis company to obtain, install and operate the software package. The barrier to its procurement can be reduced by making the software available free of charge. Appropriate software engineering can ensure that it is easy to set up and run. External funding from NIH or another agency will be necessary to support the development and deployment of the software. It is unlikely that the synthesis companies would fund the effort themselves.

We need to minimize the cost to the synthesis companies of evaluating the reports that sequence screening will yield. This is the time and effort that staff have to devote to looking at, and acting upon, the putative positive matches. Some of these companies, most notably the oligo vendors, operate on very thin profit margins. Any added expense will be most unwelcome, especially if it requires effort on the part of skilled scientists.

But beyond these operational issues there are two major challenges that stand in the way of broad deployment: how to assess the validated, significant matches that do emerge from the screening and what action to take based on that information. Neither role belongs with the DNA synthesis companies. They require expert knowledge and access to specific staff within the regulatory authorities.


Conclusion

A significant fraction of the synthetic DNA being produced today could be monitored by sequence screening at the major oligo and gene synthesis companies. For legitimate customers this process should pose no significant threat to their intellectual property. For a group wanting to engineer a biological weapon, however, screening could serve as a serious deterrent. They would be faced with the choice of potential discovery by the screening software or having to bring the work in-house and significantly increase the level of effort and expertise needed to accomplish their goal. Sequence screening has its limitations, as do most technologies that attempt to monitor threats, but I believe it should play an important role in the development of synthetic biology.


A Roadmap to the Assembly of Synthetic DNA from Raw Materials

Yogesh S. Sanghvi
Rasayan Inc., Encinitas, California

Introduction

Until recently, the synthesis of DNA has been a tedious, time consuming, expensive and experimentally challenging task. But advances in automated instrumentation and improved chemistry have now made it possible to make any moderate-length sequence of DNA in any quantity. The ease of automated chemical synthesis of DNA has triggered a whole new industry of low-cost DNA suppliers around the globe. The convenience of ordering DNA sequences by mail has opened new avenues in research both in academia and in the healthcare products developed by pharmaceutical companies. At the same time, these advances have made it theoretically possible to synthesize DNA that could be used to do harm.

This article aims to describe the first stages of DNA synthesis, from readily available raw materials to medium-sized segments with a desired sequence (oligonucleotides), and examines whether there are points at which such activities could be, for example, monitored or controlled. Some academic and commercial applications of DNA synthesis require the construction of very small quantities of the desired sequence; others involve synthesis at the gram scale or larger. I provide comments on possible intervention points for both types of application. Terms shown in bold are defined in the glossary.

1. History and key landmarks

Our current methods of DNA synthesis have evolved over almost 150 years. In 1869, Friedrich Miescher first isolated a new substance from human pus cells, which he named nuclein. Two years later, he found that the same material could also be isolated from salmon caught in the river Rhine. Subsequently, in 1889, Richard Altmann further purified nuclein as a protein-free product that he called nucleic acid. In 1900, Albrecht Kossel

COMMISSIONED PAPERS

17

Synthetic Genomics: Risks and Benefits for Science and Society

studied the chemical composition of nucleic acids, and found that they contained adenine, cytosine, guanine and thymine bases (Figure 1). Kossel was awarded the Nobel Prize for his work. In his acceptance speech he noted that “nucleic acids possess a great biological significance”. In 1902, Emil Fischer received his Nobel Prize for the first chemical synthesis of a purine base. In 1955, the first chemical synthesis of a dimeric block of DNA was accomplished by M. Michelson and Alexander (Lord) Todd. Subsequently, this contribution was recognized by a Nobel Prize to Todd. Next, Har Gobind Khorana and his colleagues showed how a DNA sequence could be assembled via chemical means, now known as the phosphodiester method. In 1976 Khorana with his 19 co-workers reported on the synthesis of a 126-residue long DNA. This project took 8 years; today the same product can be made in one day using an automated DNA synthesizer. The pioneering work of Robert Letsinger, Kevin Ogilvie and Colin Reese using the phosphotriester method also helped to pave the road to solution-phase synthesis of DNA. In the mid-1970’s, the first solid-phase preparation of DNA was performed in the laboratories of Hubert Köster, Michael Gait and K. Itakura. Solid-phase synthesis is the dominant method used today. The specific chemistry we use today came slightly later, in 1981, when Mark Matteucci and Marv Caruthers reported an efficient automated synthesis of DNA employing the P(III) amidite chemistry. 2. Transforming raw materials into the building blocks of DNA DNA is a long chain polymer that is made up of four repeating units called nucleotides. Half of the structure is identical for all four nucleotides, and consists of the sugar and phosphate groups (red boxes, Figure 1). The other half of the structure, the base (blue boxes, Figure 1) comes in four varieties, divided into two groups. 
The pyrimidines (thymine and cytosine) each have a six-membered ring containing nitrogen, while the purines (adenine and guanine) have a double ring, a fusion of a six-membered ring with a five-membered ring. In the famous double helix of DNA, these nucleotides line up as pairs: as shown in Figure 1, adenine (A) pairs with thymine (T), and cytosine (C) pairs with guanine (G). Because a six-membered-ring base always pairs with a double-ring base, the spacing between the two strands of DNA is maintained, and the overall shape of the molecule is the same no matter what the sequence. The “backbone” of the structure is also always the same, a repeated pattern of sugar-phosphate groups. It is this uniformity of structure that makes it possible to automate the synthesis of DNA. No matter the sequence to be produced, the chemical reaction required is always the same. The problem of making the sequence of DNA needed can thus be reduced to the problem of using the right nucleotide building blocks in the right order.

[Figure 1: chemical structure drawing. Panel (a) labels: thymine base, cytosine base, adenine base, guanine base, phosphate linkage, negatively charged phosphate backbone, 2-deoxy sugar. Panel (b) labels: A-T base pair, G-C base pair.]

Figure 1: Structural elements of DNA. (a) A single strand of DNA, showing the structure of the bases and the backbone elements. Note that the negative charge on the backbone is balanced by positively charged ions such as Na+ in a solution of DNA. (b) Structure of the base pairs that form when two matched strands of DNA are allowed to pair to make a double helix.
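The pairing rule described above (A with T, C with G, always one purine with one pyrimidine) can be written down as a small lookup table. The following Python sketch is only an illustration; the function names are hypothetical and the example sequence is arbitrary.

```python
# Pairing rule from Figure 1: A pairs with T, C pairs with G.
PAIRING = {"A": "T", "T": "A", "C": "G", "G": "C"}

# Ring classes as described in the text.
PURINES = {"A", "G"}       # double-ring bases
PYRIMIDINES = {"T", "C"}   # single six-membered-ring bases


def complement(strand: str) -> str:
    """Return, base by base, the partner of each base in `strand`."""
    return "".join(PAIRING[base] for base in strand.upper())


def pairs_are_purine_pyrimidine(strand: str) -> bool:
    """Check that every base pair joins one purine with one pyrimidine."""
    return all(
        (base in PURINES) != (PAIRING[base] in PURINES)
        for base in strand.upper()
    )


if __name__ == "__main__":
    print(complement("ATGCCAA"))                   # TACGGTT
    print(pairs_are_purine_pyrimidine("ATGCCAA"))  # True
```

Because the lookup is the same for every position, the sketch mirrors the point made in the text: the sequence changes, but the rule applied at each step does not.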

2.1. Availability of bases

All four bases are available in metric ton quantities from a variety of sources. The cheapest suppliers are in China; they sell their product for under $100/kg. These products are chemically synthesized and are stable indefinitely when stored appropriately. The chemical synthesis of all four bases is straightforward and can be carried out almost anywhere with the help of easily accessible reagents in a chemical laboratory. However, easy access to the bases does not lead to easy access to the nucleotides that are essential for the assembly of DNA (see below).

2.2 Availability of nucleotides and nucleosides

Nucleotides are the key reagents used in DNA synthesis. They can readily be made from nucleosides by adding phosphate groups. As recently as six years ago, all the nucleotides needed for DNA synthesis were made from nucleosides isolated from natural sources, such as fish milt. A flow chart of this isolation process is shown in Figure 2. Several companies, including Yamasa in Japan, Reliable in the USA and ProBioSint in Italy, have used this method to produce nucleosides in metric ton quantities. It is not a rapid process (it can take 1.5 years from beginning to end) and it is very labor intensive.

Some years ago, attempts began to develop alternative sources for nucleoside production. Today, at least six Asian companies manufacture the pyrimidine nucleosides at low cost and in metric ton quantities using a completely chemical process, starting with cane sugar (Figure 3). Processing of cane sugar furnishes D-glucose, which is transformed into 2-deoxy-D-ribose in just a few steps. Next, the 2-deoxy-D-ribose is converted into a reactive α-chloro-sugar that is easily converted into the pyrimidine nucleosides (T and C). Mitsui Chemicals has developed a process for producing purine nucleosides at a very large scale, using a phosphate analog of 2-deoxy-D-ribose. The new process is patent protected and is currently practiced in Japan for the production of the purine nucleosides (A and G). All four nucleosides are now available in large quantities from chemical synthesis at a significantly lower cost than the nucleosides isolated from fish milt.

Process Steps | Timelines | Volumes
Salmon (fish) | September 2005 | 100,000 kg
Salmon milt -> cell digest/DNA solubilization -> DNA salt precipitation -> DNA salt digestion -> IE chromatography -> 2'-deoxynucleosides | September 2006 | 55 kg
Protected nucleosides -> phosphoramidites -> synthetic DNA | March 2006 | 5.5 kg

Figure 2: Raw material pipeline from fish to synthetic DNA
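The volumes in Figure 2 imply the mass ratios quoted elsewhere in the text, including the figure of roughly 1,818 kg of salmon per kilogram of nucleosides given in section 2.3. The Python sketch below simply redoes that arithmetic; it assumes the pipeline scales linearly, which the text does not state, and the function name is hypothetical.

```python
# Masses taken from Figure 2 (kilograms).
SALMON_KG = 100_000
NUCLEOSIDES_KG = 55
SYNTHETIC_DNA_KG = 5.5


def salmon_needed(target_nucleosides_kg: float) -> float:
    """Salmon mass for a target nucleoside mass, assuming linear scaling."""
    return target_nucleosides_kg * SALMON_KG / NUCLEOSIDES_KG


if __name__ == "__main__":
    # Salmon required per kilogram of isolated nucleosides.
    print(round(salmon_needed(1)))  # 1818
    # Mass fraction of the starting fish recovered as nucleosides.
    print(f"{NUCLEOSIDES_KG / SALMON_KG:.3%}")  # 0.055%
    # Ratio of final synthetic DNA to isolated nucleosides.
    print(round(SYNTHETIC_DNA_KG / NUCLEOSIDES_KG, 3))  # 0.1
```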


[Figure 3: reaction scheme. Labels: cane sugar, D-glucose, 2-deoxy-D-ribose, α-Cl sugar, glycosylation (two steps), "T" nucleoside, "C" nucleoside, "G" nucleoside, "A" nucleoside.]

Figure 3: Chemical synthesis pathway for nucleosides from cane sugar

Both methods of producing nucleosides require significant skill, especially the chemical synthesis approach. Chemical synthesis of nucleosides requires Ph.D.-level chemistry personnel, specialty chemicals and specialized equipment. The most difficult part of the synthesis is chemically connecting a base to the top face of the sugar (the β-connection). An incorrect linkage from the bottom face results in the formation of α-nucleosides, which are useless for DNA synthesis. Despite easy access to the bases from China, the synthesis of pure β-nucleosides and high-purity amidites is not an easy task for a novice in the field.

In practice, most DNA synthesis today depends on the availability of nucleotide amidites (Figure 4), since the amidite chemistry is the dominant chemistry used in automated synthesizers. Good-quality amidites are essential for successful synthesis of DNA on an automated machine. The production of good-quality amidites is also a skilled task, and large quantities of anhydrous solvents and airtight equipment are required.

[Figure 4: chemical structure drawing. Labels: 2'-deoxynucleosides (B = T, C, A or G); amidites (B = T, CBz, ABz or GIbu); synthetic DNA; DMT; solid support.]

Figure 4: Key raw materials for DNA synthesis. Several steps are required to convert nucleosides to amidites.

2.3 What would it take to make building blocks from scratch?

If a chemist were cut off from all the sources of ready-made nucleotides, nucleosides and bases described above, how would he or she approach the problem of putting together the essential ingredients for DNA synthesis? First, this person would need to be an excellent chemist and have access to a well-equipped chemistry laboratory. The most likely route for such a person would be the older method of purification from salmon milt. As noted above, isolation of nucleosides from fish is long, tedious and inefficient. To isolate 1 kg of the four nucleosides, one would need 1,818 kg of salmon.

Even if access to nucleosides is not a problem, the chemist would still need to synthesize nucleotide amidites. This requires the use of special reagents (e.g. phosphitylation reagent), solvents (e.g. anhydrous acetonitrile) and airtight equipment. Isolation, storage and handling of P(III) amidites is an art that is not easily acquired even by an experienced chemist. However, given the tools, training and chemicals, an expert in the field could produce gram quantities of amidites in about six months.

In reality, substantial supplies of nucleosides and all four amidites are already distributed across the globe, held by a large number of potential suppliers. It is highly unlikely that even the most concerted international effort would be able to restrict the raw material supplies available to the degree envisioned above.

3. Using the building blocks to make a desired sequence

The assembly of a useful (or harmful) sequence of DNA starts with the assembly of several nucleotides into a medium-length DNA strand, called an oligonucleotide. This is done using automated solid-phase synthesis; in other words, the chain is built on a solid bead, one nucleotide at a time, with washing steps in between. The solid phase is essential to allow multiple steps to be performed with reasonable efficiency. Usually, processes that require a large number of chemical steps give a poor yield; if each step of a six-step synthesis is 95% efficient, the overall yield is only about 73.5%. It would require about 80 individual steps to complete the sequential assembly of a 20-unit-long oligonucleotide. The result is that the product is mixed with unreacted starting material and the products of undesired reactions, and can be very hard to purify. The larger the number of steps, the worse the problem gets.

Solid-phase synthesis makes multi-step synthesis easier in two ways. First, it is very easy to separate the product (which is attached to the bead) from the unreacted starting material (which is in solution) by simply washing the beads extensively. Second, this ease of separation makes it possible to use large excesses of starting material to drive the reaction very close to completion. In the case of the amidite chemistry that is typically used, each reaction occurs with >99% efficiency. The only impurity left is the product of partial reactions.

Partial reactions are inevitable in any chemical process. For example, if one is trying to construct the sequence ATGCCAA, one would start with an A attached to a bead, then react it with a T. In most cases, the sequence AT will be made, but in a few cases the T will not be added. If the unreacted A is allowed to continue in the elongation reaction, the end result would be the wrong sequence, AGCCAA, which might have a completely different biological effect from the desired sequence. The same problem can occur at any step of the elongation process. Current DNA synthesis technology uses a trick to minimize the problems caused by incomplete reaction: “capping” unreacted sequences with special blocks that prevent their further elongation.
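The yield argument in this section is a simple compounding calculation: the overall yield is the per-step efficiency raised to the number of steps. The Python sketch below reproduces the six-step, 95%-per-step example from the text and then applies the same formula to the roughly 80 steps quoted for a 20-unit oligonucleotide. Treating every step as having the same efficiency is a simplifying assumption, and the function names are hypothetical.

```python
def overall_yield(step_efficiency: float, steps: int) -> float:
    """Fraction of material that survives `steps` steps of equal efficiency."""
    return step_efficiency ** steps


def steps_for_oligo(length: int, steps_per_unit: int = 4) -> int:
    """Approximate step count, using four chemical steps per added unit."""
    return length * steps_per_unit


if __name__ == "__main__":
    # The six-step example from the text.
    print(f"{overall_yield(0.95, 6):.1%}")  # 73.5%

    # About 80 steps for a 20-unit oligonucleotide, at several efficiencies.
    n_steps = steps_for_oligo(20)
    for efficiency in (0.95, 0.99, 0.999):
        print(efficiency, n_steps, f"{overall_yield(efficiency, n_steps):.1%}")
```

Running the loop shows how sharply the overall yield depends on per-step efficiency once the step count is large, which is the reason the text stresses driving each reaction close to completion.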


Each cycle of elongation takes place in four main steps, with wash steps in between: (1) deprotection, in which a group that prevents premature reaction is removed from the end of a growing nucleotide chain; (2) coupling, in which a new nucleotide is added; (3) oxidation to stabilize the newly formed linkage; and (4) capping of partial products. Finally, the completed chain must be cleaved from the bead. The reagents needed for each step are discussed below. I will focus primarily on the amidite method, because of its widespread use on automated machines that produce thousands of DNA sequences every day around the world. Several other reaction schemes are possible, although they are less efficient.

3.1. Building blocks and reagents required for solid-phase synthesis

The most important reagent required for amidite chemistry is a protected stable amidite derivative (see Figure 4), which provides extremely high (>99%) reaction efficiencies. These amidites are easily synthesized from nucleosides in just a few chemical steps, none of which would be challenging for a reasonably well-trained chemist with access to a sophisticated laboratory. Until recently the manufacture and sale of these amidites were restricted by the Köster patent. With the expiration of the patent this year, a number of low-cost Asian suppliers are now producing amidites in commercial quantities. This is one of the key reasons for the recent reduction in the cost of synthetic DNA. The four amidites of interest are wax-like, hygroscopic and easily decomposed upon heating. They must be carefully protected from air, water and heat. For most DNA synthesis applications, the amidites are sold in convenient pre-packed bottles that are simply plugged into a synthesizer without exposing them to air.

The solid support is the second most important raw material needed for DNA synthesis (Figure 5). In essence, the solid support is a small, mechanically sturdy, porous polymeric bead that is chemically inert during DNA synthesis. The bead must have a reasonable surface area so that each bead can accommodate many growing chains. The most popular solid support for small-scale synthesis is controlled pore glass (CPG), made from glass or silica (Figure 5). CPG is a special bead custom-made for the synthesis of DNA by a handful of companies. Synthesis of DNA on ordinary glass is possible but less efficient and would lead to decreased production of the desired DNA strand. Beads made of cross-linked polymers (reminiscent of nylon, but more rigid) can also be used as an alternative support.

[Figure 5: reaction scheme for one synthesis cycle. Labels: solid support; succinyl linker; amidite; Step #1, detritylation; Step #2, coupling; Step #3, oxidation; Step #4, capping of unreacted chains; "Repeat Steps 1-4"; "Cleave DNA from bead using ammonium hydroxide"; "Crude Strand of DNA + Impurities"; "Purification"; "Pure Synthetic DNA". Compounds are numbered 1 to 6.]

Figure 5: General scheme for automated synthesis of DNA using amidites
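The effect of the capping step in this cycle can be illustrated with a toy simulation of a single chain, using the ATGCCAA example from section 3. The Python sketch below is a deliberately simplified model, not a description of synthesizer software: the function name and the `failed_positions` argument are hypothetical, and the chain is written in the same left-to-right order the text uses.

```python
def synthesize(target: str, failed_positions: set[int], capping: bool = True) -> str:
    """Build `target` one unit at a time on a single simulated chain.

    `failed_positions` lists the 1-based positions whose coupling step fails
    (an illustrative input chosen by the caller). With capping enabled, a
    chain that misses a coupling is blocked from any further elongation.
    """
    chain = target[0]  # the first nucleoside is supplied pre-attached to the bead
    capped = False
    for position, base in enumerate(target[1:], start=2):
        if capped:
            break  # an acetyl cap prevents detritylation and coupling
        if position in failed_positions:
            capped = capping  # step 4 caps the chain that failed to couple
            continue
        chain += base  # steps 1-3 succeeded: deprotect, couple, oxidize
    return chain


if __name__ == "__main__":
    print(synthesize("ATGCCAA", set()))               # ATGCCAA
    print(synthesize("ATGCCAA", {2}, capping=False))  # AGCCAA
    print(synthesize("ATGCCAA", {2}, capping=True))   # A
```

Without capping, the missed coupling yields the near-full-length deletion sequence AGCCAA described in the text; with capping, the failed chain stops as a much shorter fragment.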

Generally, solid supports are sold with the first nucleoside unit already anchored to the surface of the bead via a short cleavable succinyl linker (Figure 5). The support is placed in a reactor (column) and connected to the automated synthesizer for chain extension. The first step in oligonucleotide synthesis is the coupling of the first nucleotide to the nucleoside already attached to the surface of the bead. The addition of each nucleotide unit requires four individual chemical steps and a number of reagents. These reagents include: (a) a deblocking solution that contains an acid such as dichloroacetic acid (DCA) in dichloromethane (DCM) or toluene; (b) an activator solution such as 1H-tetrazole or 4,5-dicyanoimidazole in acetonitrile; (c) an oxidation solution such as iodine in pyridine, THF and water; (d) two capping solutions, one containing N-methyl imidazole in pyridine and acetonitrile, the other containing acetic anhydride in acetonitrile. These are the only special reagents needed for the four-step repetitive synthesis cycle used in DNA construction (Figure 5). All of them are easily produced from common materials that would be next to impossible to control; alternative reagents have also been described in a variety of publications.

In the final step, ammonium hydroxide solution is used to cleave the succinyl linker arm that holds the DNA chain to the surface of the solid support and to remove the protecting groups that prevent side reactions during synthesis. Ammonium hydroxide is also a very common reagent. One of the important reagents used for DNA synthesis is the anhydrous acetonitrile required for the washing steps. Because the amidites are very sensitive to water, the grade of acetonitrile needed is higher than for most other applications.

3.2. Chemical steps during assembly

The specifics of the chemical reactions that take place in an average DNA synthesizer are shown in Figure 5. The beads that make up the solid support, with the first nucleoside residue 1 attached, are packed into a column to allow solvents to flow through them efficiently. The synthesizer is programmed to pump reagents and solvent through the column, and the order of amidite addition is determined by the sequence of DNA needed. The chemical steps shown are: (1) detritylation, consisting of the removal of an acid-labile protecting group from the 5’-hydroxyl group of the nucleoside residue at the end of the growing oligonucleotide chain; (2) coupling of an activated amidite with the 5’-hydroxyl group generated in step 1; (3) oxidation of the labile P(III) intermediate 4 to the stable P(V) product 5 (Figure 5); and (4) capping using a mixture of two solutions, cap A (N-methyl imidazole in pyridine and acetonitrile) and cap B (acetic anhydride in acetonitrile), pumped through the column at the same time. A washing cycle between each step is essential. The four-step protocol is short and very efficient, with each cycle completed in just a few minutes.

Note that the “cap” added to a chain that failed to complete the desired reaction is an acetyl (Ac) residue, which is chemically different from the dimethoxytrityl (DMT) blocking group. The DMT group can be removed by gentle acid treatment, freeing the chain to react in the next cycle. The Ac group withstands this treatment, preventing the chain from elongating. DMT serves two purposes in the cycle: it prevents the amidites from reacting with themselves, and it prevents a chain that has been successfully elongated from being capped. After the capping step is complete, the DMT can be removed to allow a new coupling reaction.

3.3. Automated synthesizers

In the early 1980s, the first commercial DNA synthesizers were built and sold by Applied Biosystems. These were the single-column 380A and 380B instruments, capable of making one DNA sequence at a time on a very small scale (0.2 – 10 μmol). Today, there are a number of instruments on the market with the ability to produce hundreds or thousands of DNA sequences in parallel, using both commercial and proprietary instrumentation. For example, Applied Biosystems 3900 DNA synthesizers use 96- or 384-well plates, making a different sequence in each well; specialized companies such as Illumina have adapted this strategy to use synthesizers with large platforms that carry many 384-well plates, again making an individual sequence in each well of each plate. Similarly, high-throughput DNA synthesis is available on the MerMade Bioautomation or the Oligator Farm. Although synthesis of DNA without an automated synthesizer is in principle possible, in practice it would be highly inconvenient.


4. Purification

Some uses for oligonucleotides require a purification step to remove the products of incomplete reactions. The most widely used purification technologies are: (1) anion exchange, in which the oligonucleotide (after being cleaved from the solid support used in synthesis) is passed over positively charged beads that retard the progress of individual oligonucleotides depending on how many negative charges are present on the molecule; and (2) reverse-phase chromatography, which separates molecules based on their degree of hydrophobicity. Both of these purification methods are very widely used for a variety of biochemical applications in academia and industry. New purification methods currently being explored include membrane-based chromatography and simulated moving bed (SMB) chromatography. SMB in particular looks promising, with the potential for >98% purity at the kilogram scale.

5. Points of intervention

The reagents required for oligonucleotide synthesis are almost all so common, or so readily produced, as to defy restriction. The possibilities for restriction differ depending on whether the application to be controlled requires small amounts of material (as is the case for most genetic engineering applications) or large amounts (as for many medical applications). In both cases I focus on the amidite chemistry, since this is by far the most efficient chemistry currently available. Other chemistries can be used, but no sensible chemist would use them unless there was no other option.

5.1 Small-scale synthesis

- Nucleosides, nucleotides and amidites. These are the key building blocks of oligonucleotide synthesis. They are used in a range of peaceful industries, including the production of important medicines such as AZT for the treatment of HIV. They are made and sold on a very large scale; for example, Proligo (recently acquired by Sigma Aldrich) produces tons of amidites per year. Denial of all ready nucleoside supplies to an oligonucleotide chemist might slow the progress of DNA synthesis by months or years. At the same time, such restrictions would destroy or severely hamper the biotechnology industry and the progress of biomedical research. Given the ready availability of amidites from many suppliers across the world, it is hard to imagine an effective program to restrict access to these chemicals.

- Solid support. A handful of companies produce the beads that permit efficient DNA synthesis on automated machines. Because of the highly specialized equipment and training required for the preparation of these beads, a skilled individual cannot make these products alone. Restricting access to beads may be worth exploring as a method of controlling DNA synthesis. Note, however, that unless one is willing to destroy the entire DNA synthesis industry, there will be a large number of companies that have legitimate uses for these beads. It would be a significant challenge to track every shipment of beads to every DNA synthesis company and ensure that all the beads are used for legitimate purposes. It is also increasingly possible to make oligonucleotides on derivatized glass slides, which are relatively easy to make by hand.

- DNA synthesizer. A number of automated DNA synthesizers are available in the marketplace. It is possible that tracking the sales of new instruments might allow identification of potential terrorists. However, a very large number of existing instruments have already been sold and would be hard to track in this way; furthermore, an experienced engineer could construct one from spare parts with little difficulty.

5.2 Large-scale synthesis

- Raw materials. The amount of raw materials needed for large-scale synthesis presents significant logistical challenges. Only a handful of companies are able to mass-produce these building blocks in the purity required for DNA synthesis. It should therefore be possible to identify and track bulk users of these chemicals. Furthermore, the capital investment in building and running a facility to produce DNA on a large scale is significant. For example, a kilo-scale plant used for the production of DNA would cost $2-5 million in capital investment alone. This does not include the cost of running the facilities. It would be relatively easy to keep track of the construction of such plants world-wide.


- Solvents and reagents. Although there are a number of choices for solvents and reagents for DNA synthesis, anhydrous acetonitrile is one solvent that is absolutely essential for the coupling step. A limited number of companies produce DNA-synthesis-grade acetonitrile. Monitoring the sales of high-purity acetonitrile might allow suspicious organizations to be identified if they are performing large-scale reactions. For small-scale oligonucleotide synthesis, however, anhydrous acetonitrile can be readily produced in the laboratory using a still.

- Plant permit. Because a majority of large-scale (e.g. >10 kg/year) plants manufacture DNA-based medicines, they are regulated by the FDA for GMP compliance. Therefore, it should be possible to monitor any suspicious or non-therapeutic activities and to require careful reagent tracking to minimize the risk that beads or solvents are diverted to other purposes. One possible hurdle for such activity would be to require a permit of some kind from an official entity before an organization is allowed to produce DNA in kilo quantities.

- Product registration. It is possible to envision a system in which anyone requesting kilo-scale custom synthesis of DNA is required to register with an organization and describe the intended use. Such a system may create a barrier to the synthesis of DNA for harmful applications.

Conclusion

It would not be an easy matter to restrict the supply of the reagents needed for DNA synthesis to such an extent as to prevent a motivated individual from making oligonucleotides at a small scale. As noted above, the least implausible option for tracking and restriction would seem to be solid support beads. However, since these are widely used by the legitimate DNA synthesis industry, the restrictions must also include protocols for monitoring reagent use within a company and reporting their disposition. Several of the companies making and using these reagents reside outside the USA, complicating the task of imposing effective tracking policies.


Glossary

AMIDITE (also known as phosphoramidite): A protected version of a nucleoside that is easily activated for the coupling reaction. The P atom, which will eventually form part of the phosphate backbone, is protected with β-cyanoethyl and diisopropylamine groups. In the first stage of the coupling reaction, a weak acid protonates the nitrogen atom of the diisopropylamine protecting group, causing it to become positively charged and making it into a good leaving group. This allows nucleophilic attack by the free 5’ hydroxyl group of the bead-attached monomer on the phosphorus atom, forming the molecule referred to as 4 in Figure 5. Different protecting groups are also attached to the amines (-NH2) that are not part of a ring in the bases A, G and C, to prevent them from becoming protonated and causing unwanted reactions. These protecting groups, and the cyanoethyl protecting group on the phosphate, remain on the growing chain until it is finally released from the bead.

ANHYDROUS: Water-free. Because the amidite chemistry depends on the hydroxyl group of the bead-attached monomer performing a nucleophilic attack on the phosphorus atom of the activated amidite, any other nucleophiles in the solution will reduce the efficiency of the coupling reaction. Water can act as a nucleophile, and must be rigorously excluded from the reaction.

BASE: The structures of the bases are shown in Figure 1. The information content of a DNA molecule consists of the linear arrangement of the bases A, T, G and C along a phosphate/sugar backbone (also shown in Figure 1). It is the pairing of the bases, A with T and G with C, that allows DNA to be copied.

2-DEOXY-D-RIBOSE: The particular form of sugar that is used in DNA synthesis. The D in DNA stands for “deoxy”; this refers to the fact that carbon number 2 in the ribose ring does not carry an oxygen in the DNA structure. In RNA, the sugar used is ribose, not deoxyribose.

NUCLEIC ACID: A polymer of nucleotide subunits.

NUCLEOTIDES: A base connected to a sugar ring (ribose or, in DNA, 2-deoxyribose) and one or more phosphate groups. In the DNA structure, the nucleotides have one phosphate group each, which forms part of the backbone of the DNA. Each phosphate group is linked to two sugar groups, through two different oxygen atoms (see Figure 1).

NUCLEOSIDE: The structures of the four nucleosides relevant to DNA synthesis are shown in Figure 3. A phosphate group must be added to nucleosides before they can be linked together to form DNA. In amidite chemistry, the phosphate group is formed by oxidation after the coupling step (see Figure 5).

OLIGONUCLEOTIDE: A short stretch of DNA (for example, 20 nucleotide subunits linked together).

PURINE: A base in which the pyrimidine ring is fused to a second ring, the imidazole ring. Imidazole is a five-membered aromatic ring with two nitrogens.

PYRIMIDINE: A base consisting of a six-membered aromatic ring with two nitrogen atoms at positions 1 and 3.


Synthetic Viral Genomics: Risks and Benefits for Science and Society

Ralph S. Baric
University of North Carolina at Chapel Hill

I. Introduction

A. Viruses and Biological Warfare

Viral disease outbreaks have long inspired fear in human populations. Highly pathogenic infectious disease has shaped world history, primarily by impacting the outcome of wars and other global conflicts and precipitating human movement. Historic accounts have documented the catastrophic consequences and human suffering associated with widespread outbreaks of viruses such as smallpox virus, yellow fever virus, measles virus, human immunodeficiency virus (HIV), the severe acute respiratory syndrome coronavirus (SARS-CoV), the 1918 influenza virus and others (51). News accounts and film have reinforced the serious threat posed by the emergence of new viral diseases as well as the catastrophic consequences of intentional release of highly pathogenic viruses in human populations. As illustrated by the SARS epidemic and the continuing evolution of the H5N1 avian influenza, global and national infectious disease outbreaks can overwhelm disaster medical response networks and medical facilities, disrupt global economies, and paralyze health and medical services by targeting health care workers and medical staff (21). This review focuses on viruses of humans, animals and plants that are viewed as potential weapons of mass disruption to human populations, critical plant and animal food sources, and national economies, and considers whether and how the availability of synthetic genomics technologies will change this landscape.

Biological warfare (BW) agents are microorganisms or toxins that are intended to kill, injure or incapacitate the enemy, elicit fear and devastate national economies. Because small amounts of microorganisms might cause high numbers of casualties, they are classified as weapons of mass destruction. A number of naturally occurring viruses have potential uses as BW agents, although the availability of these agents is oftentimes limited. This report discusses the potential use of recombinant and synthetic DNAs to resurrect recombinant BW viruses de novo and the potential for altering the pathogenic properties of viruses for nefarious purposes. Examples of weaponized viruses include Variola major (Smallpox), Venezuelan equine encephalitis virus (VEE), and the filoviruses Marburg and Ebola viruses, with the classic example being the use of smallpox virus-contaminated blankets against indigenous North American Indian populations (76). It is now clear that many viruses possess properties consistent with applications in biological warfare and bioterrorism.

B. Properties of Select BW Agents

Traditionally, biological warfare concerns have focused on a relatively limited, select group of naturally occurring pathogens viewed as having a set of desirable characteristics: 1) highly pathogenic, 2) readily available, 3) easily produced, 4) weaponizable, 5) stable, 6) infectious at a low dose, 7) easily transmissible, and 8) inspiring of fear (32). Viruses of concern range from pathogens that replicate and produce serious morbidity and mortality in humans to pathogens that target farm animals and plants of economic importance.

Historically, weaponization of agents has been constrained by availability, the biological characteristics specified within the genome of these organisms, the ability to replicate and produce large quantities of the material, and by the lack of appropriate associated technologies. Culture (growth) and containment conditions for most of the virus agents of concern have been solved and are readily available in the literature. Natural hosts and reservoirs of many viral agents have been identified, providing a means of readily acquiring these pathogens in nature, although this is not always the case. Most recently, full length genome sequences have been solved for many important human, animal and plant pathogens, providing a genetic template for understanding the molecular mechanisms of pathogenesis and replication. Structural studies have identified contact points between the virus and the host receptors needed for docking and entry, providing the means to humanize animal pathogens (42). With the advent of synthetic biology, recombinant DNA technology, reverse genetic approaches (i.e., the development of molecular clones of infectious genomes) and the identification of virulence alleles, not only are new avenues available for obtaining these pathogens, but more ominously, tools exist for simultaneously modifying the genomes for increased virulence, immunogenicity, transmissibility, host range and pathogenesis (22, 59). Moreover, these approaches can be used to molecularly resurrect extinct human and animal pathogens, like the 1918 human influenza virus (81).

National biodefense strategies are focused on threats posed by this small group of plant, animal and human pathogens that occur in nature. However, counterterrorism think-tanks anticipate that these particular threats will diminish over the next decade because of medical countermeasures (e.g., drugs, vaccines, diagnostics), coupled with the limited number of pathogens that combine all of the biological warfare characteristics. More important, the anticipated long-term problem in biological warfare is recognizing and designing countermeasures to protect against genetically modified and designer pathogens, made possible by newly emerging technologies in recombinant DNA, synthetic biology, reverse genetics and directed evolution (59). How will synthetic genomics affect future biological weapons development? What are the risks and benefits of these new technologies and how serious a threat do they pose for human health and the global economy? This paper builds upon earlier work and seeks to review the methodologies for isolating recombinant viruses in vitro and the application of these methods globally to biological warfare and biodefense (27).

II. Virus Classification and Reverse Genetic Approaches

A. Overview of Virus Classification and Reverse Genetics

From the genome, all viruses must generate a positive strand mRNA that is translated into proteins essential for genome replication and the assembly and formation of progeny virions. Depending upon the nature of the genome, all viruses can be clustered into seven fundamentally different groups, which utilize different strategies to synthesize mRNA from the input genome, a scheme called the Baltimore Classification (Figure 1).¹ Because virus infectivity is dependent upon the ability to transcribe mRNAs, reverse genetic strategies are designed to ensure expression of critical viral mRNAs that encode essential replicase proteins needed to “boot” (initiate) genome infectivity and initiate genome replication.

Figure 1. Baltimore Classification Scheme. [Diagram not reproduced. Labels: dsDNA (Group I); ssDNA (Group II); dsRNA (Group III); mRNA (Group IV); ssRNA (-) (Group V); mRNA (Group VI); DNA synthesis; transcription; reverse transcription; viral replicase; translation; protein (infectious prions).]

Group I viruses include the double-stranded DNA (dsDNA) viruses, like the Herpes viruses and Poxviruses, which replicate in the nucleus or cytoplasm, respectively. The dsDNA viruses use cellular and/or virally-encoded transcriptase components to mediate expression of viral mRNAs. Poxviruses, for instance, require one or more viral proteins to initiate mRNA transcription and boot infectivity of the viral genome. Hence, smallpox virus genomes are not infectious unless the appropriate suite of viral proteins is provided in trans (in addition to the genome itself). In contrast, the Herpes virus genome is infectious in the absence of any viral proteins, as cellular transcriptase machinery induces expression of early mRNAs and proteins that regulate expression of other viral genes and replication. Using vaccinia (poxvirus) as a model, an approach to boot (jump-start) the infectivity of poxviruses has been developed, providing a template strategy for the family (11, 24).

Group II viruses encode single stranded DNA genomes which must be used as templates for the synthesis of a dsDNA before transcription and translation of mRNAs can occur within cells. At this time, group II BW agents have not been identified.

Group III comprises the double stranded RNA viruses, like reoviruses. Reovirus genomes consist of complementary positive and negative strands of RNA that are bound by hydrogen bonding, wrapped within a multistructured icosahedral core that is essential for virus transcription. The virion structure contains the necessary proteins required for initiating mRNA synthesis. Unlike many of the single-stranded RNA viruses, the dsRNA virus genomes are not infectious in isolation and the components necessary for booting genome infectivity remain unresolved.

Group IV viruses contain a single-stranded positive polarity RNA genome and include the flaviviruses, alphaviruses, picornaviruses (including poliovirus), coronaviruses (including the SARS virus), caliciviruses and others. Upon entry into cells, positive strand RNA genomes are immediately recognized by host translational machinery and the genome is translated into a suite of viral proteins, including the replicase proteins and RNA-dependent RNA polymerase which is necessary for initiating the viral replication cycle. Consequently, genome infectivity usually does not require viral proteins or transcripts provided in trans, although some exceptions have been reported (13).

Group V viruses contain a single-stranded negative polarity RNA genome and include filoviruses (Ebola/Marburg), myxoviruses (influenza), and paramyxoviruses (Hendra). Group V genomes come in two different flavors, segmented (e.g., myxoviruses) or nonsegmented (e.g., paramyxoviruses and filoviruses). In either case, the genome is not infectious because it is complementary in sequence (anti-sense); it is the opposite of the positive strand that specifies amino acids and thus cannot be translated directly into any of the critical viral structural or replicase proteins needed for producing infectious virions. Negative strand RNA genomes are encapsidated into a complex ribonucleoprotein structure (RNP) usually composed of several virally encoded replicase proteins (e.g., polymerase complex proteins, support proteins, trans-acting proteins) that are incorporated into the virion during assembly. Together, these compose a functional replication complex. Upon entry, these RNP complexes immediately transcribe the genome negative strand RNA into mRNA that can be translated into the viral proteins. Consequently, genome infectivity requires the presence of full length RNA and a set of virally encoded replicase proteins that function as a transcriptional complex to express mRNAs. If mRNAs encoding the transcription complex are provided in trans, group V genomes become infectious and virus will be successfully recovered.

Group VI viruses, retroviruses (including HIV) and lentiviruses, encode single stranded positive polarity RNA genomes, but virions carry a reverse transcriptase enzyme to convert the mRNA genome into a complementary DNA (cDNA) which serves as template for dsDNA synthesis. Following the synthesis of dsDNA, group VI viruses use cellular transcriptional and translational machinery to express viral transcripts encoding structural and nonstructural proteins. At this time, the group VI viruses do not include any BW agents.

¹ Named for the virologist David Baltimore, who proposed the system.
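The grouping described in this section can be summarized as a small lookup keyed on genome type. The following is only an illustrative sketch: the names `GROUPS` and `classify` are hypothetical, the one-line descriptions paraphrase the text above, and only the six groups discussed here are covered (the seventh group mentioned in the overview is not described in this section and is therefore omitted).

```python
# Illustrative sketch only: GROUPS and classify are hypothetical names,
# and the descriptions paraphrase the six groups discussed in the text.

GROUPS = {
    "I": ("dsDNA", "mRNA transcribed from the dsDNA genome"),
    "II": ("ssDNA", "dsDNA synthesized first, then transcribed"),
    "III": ("dsRNA", "mRNA made by proteins carried in the virion"),
    "IV": ("ssRNA(+)", "genome translated directly by host machinery"),
    "V": ("ssRNA(-)", "mRNA transcribed by the virion RNP complex"),
    "VI": ("ssRNA(+), reverse transcribing", "genome copied to DNA first"),
}


def classify(nucleic_acid, strands, sense=None, reverse_transcribing=False):
    """Return the Baltimore group numeral for a simple genome description."""
    if nucleic_acid == "DNA" and not reverse_transcribing:
        if strands == 2:
            return "I"
        if strands == 1:
            return "II"
    elif nucleic_acid == "RNA":
        if strands == 2:
            return "III"
        if strands == 1 and sense == "+":
            return "VI" if reverse_transcribing else "IV"
        if strands == 1 and sense == "-":
            return "V"
    # Anything else (including reverse-transcribing DNA genomes) is
    # outside the six groups described in this section.
    raise ValueError("genome description not covered by this sketch")


if __name__ == "__main__":
    group = classify("RNA", 1, "-")
    print(group, GROUPS[group])
```

For instance, a single-stranded positive polarity RNA genome such as poliovirus maps to Group IV, while a single-stranded negative polarity genome such as influenza maps to Group V, matching the examples given in the text.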

B. Infectious Genomes, Molecular Clones and Reverse Genetics

The basic concepts central to understanding virus reverse genetics and molecular clones are summarized in Figures 1 and 2. The central idea is that the virion is an extracellular vehicle that transfers the viral genome (e.g., RNA or DNA genomes) between susceptible cells and protects the nucleic acid genome from degradation in the environment (Figure 2, Part A). Following entry, the viral genome is programmed to initiate a series of events that result in the production of a replicase complex that transcribes mRNA and replicates the genome. As discussed in the previous section, nucleic acid structure and organization determine the pathway of events needed to express mRNA and initiate virus gene expression and infection. Not all viruses, however, require virion attachment and entry to mediate a productive infection. In these cases, viral genomes can be isolated from virions and transfected directly into susceptible host cells. If the genome is infectious, viral RNAs and proteins will be expressed, allowing for the production and release of progeny virions (Figure 2, Part B).

Classic examples of viruses with “infectious genomes” include the herpes viruses, polioviruses, alphaviruses, polyomaviruses, and flaviviruses, which are classified among the Group I, II or IV viruses. However, not all viral genomes are infectious upon delivery into cells. Viruses with Group III or V genomes have never been demonstrated to be infectious upon genome delivery into susceptible cells. Some Group I (poxviruses) and group IV virus genomes (e.g., norovirus, a causative agent of non-bacterial gastroenteritis, or “cruise ship disease”, and the coronavirus infectious bronchitis virus) are not infectious upon delivery into susceptible cells (13). In these instances, genome infectivity requires the presence of specific cofactors to initiate viral replication. These cofactors typically represent one or more proteins that function as essential replicase components or that encapsidate the genome into an RNP structure necessary for initiating transcription of mRNA from the genome. For example, infectious bronchitis virus genome infectivity requires the nucleocapsid protein in trans, while the components needed to boot norovirus genome infectivity remain unknown (13).

Figure 2. Virus Reverse Genetic Strategies. [Diagram not reproduced. Labels: A, virus infection (infectious virion particle); B, virus replication (viral genome; infectious?; accessory factors to "boot" infectivity?); C, recombinant or synthetic cDNA cloning techniques (viral DNA; restriction digest and ligate; cDNA clone in plasmid); D, in vitro transcription (T7 start; viral genome); E, progeny virus.]

In the late 1970s, a simple observation altered the course of virology research globally. Using a small dsDNA virus genome as a model (the Group I polyomavirus SV40), researchers cloned the viral genome into a bacterial plasmid and propagated the viral genome in bacteria. Upon isolation of the plasmid DNA from bacteria, restriction enzymes were used to excise the dsDNA viral genome, which was re-ligated in vitro into a circular dsDNA, and virus was rescued following transfection of the genome into susceptible cells (Figure 2, Part C) (28). (Many advances in biotechnology have been, and continue to be, dependent upon this restrict-isolate-ligate technique, or variations of it.) Shortly thereafter, full length cDNAs of positive strand RNA genomes were isolated following reverse transcription, the cDNAs cloned and propagated in bacterial plasmids, and following introduction of full length DNA into eukaryotic cells, recombinant viruses were rescued from the transfected cultures, although very inefficiently. The major problems with this approach were the difficulty in generating the appropriate termini and an accurate genome sequence, problems in nuclear transport of the full length RNA genome, and splicing of the viral genomic RNA. To rectify the efficiency problems, bacteriophage promoters (T7, SP6, T3) were introduced upstream of the cloned viral cDNAs, allowing in vitro transcription of full length RNA copies of the viral genome using the appropriate phage RNA polymerase, nucleotide triphosphates, and other constituents (Figure 2, Part D). The full length RNAs, near exact replicas of the viral genome, were highly infectious upon transfection of susceptible host cells (Figure 2, Part E) (2, 65, 66). The ability to clone full length copies of viral genomes allowed for ease of manipulation of the genome and the introduction of specific mutations.

Recovered viruses contained the introduced mutations that were encoded within the full length cDNA clones, providing a ready means of performing detailed genetic analyses of virus replication and pathogenesis.

As noted earlier, not all viral genomes are infectious, complicating the development of full length cDNAs and the recovery of recombinant viruses. Isolated RNA genomes from Group V negative sense RNA viruses are not infectious because the genome sequence cannot be translated directly into a functional replicase complex needed to transcribe the incoming genomic RNA. As Group V virions contain a replicase protein complex essential for transcription, genome infectivity requires that cells be cotransfected with plasmids that express the genomic RNA and plasmids that express transcripts encoding the replicase protein complex (Figure 3a). For most group V viruses, both genome negative and positive sense RNA infectivity can be booted using this approach, with most investigators expressing full length plus (coding) strands from the initial transcript. The plus strands are transcribed to full length negative strands, which are used to express the appropriate set of mRNAs encoding the full complement of positive and negative strand RNAs. Using this approach, Schnell et al. successfully recovered the first recombinant negative stranded RNA virus, rabies virus, from a cloned cDNA, ushering in an era of Group V virus reverse genetics (68, 82). These findings were rapidly extended to other linear negative stranded RNAs like paramyxoviruses, then to segmented negative strand RNA viruses like influenza and other myxoviruses, and then to select bunyaviruses and arenaviruses (20). Reverse genetic strategies for group V viruses with segmented genomes are the most complex, as multiple plasmids expressing copies of each genome segment must be simultaneously delivered to a cell along with the support plasmids encoding the transcriptase complex.

Figure 3a. Category V (Linear negative sense RNA genome) Reverse Genetic Approach. [Diagram not reproduced. Labels: T7 plasmids expressing replicase N, P, L mRNAs and full length (+) antigenome; vaccinia-T7 recombinant; co-transfect cell; (+) RNP formation; RNP replication; expression of viral proteins from (-) RNP; vaccinia removal; paramyxovirus.]


Most of the RNA viruses have relatively small genomes (under approximately 20,000 bases or base-pairs). Viruses with extremely large genomes (over 100,000 base-pairs, e.g., herpes viruses and poxviruses; or ~20,000-30,000 base pairs, e.g., coronaviruses and filoviruses) have presented additional obstacles in the development of stable molecular clones. Generation of infectious clones for viruses encoding large RNA or DNA genomes is complicated by the need for sequence accuracy (e.g., incorrect sequences usually contain lethal mutations), the lack of suitable cloning vectors that stably maintain large DNA inserts, large genome size, and the fact that the genomes oftentimes encode regions that are toxic or unstable in bacteria. In poxviruses, for example, the ~200 kilobase pair (kbp) genome has covalently closed hairpin ends (structures formed by the DNA itself) that are required for genome replication, and virion encoded products are also essential for booting genome infectivity (24). Herpes virus genomes are ~150 kbp in size.

One solution was to stably clone large viral genomes in bacterial artificial chromosome (BAC) vectors. BAC vectors are based on the replication of F factor in E. coli, which is tightly controlled and allows stable maintenance of large, complex DNA fragments up to 600 kbp; both herpesvirus and poxvirus genomes can be stably maintained in BAC vectors (17, 24). For Herpes viruses, BAC shuttle vector sequences encoding a marker are inserted by homologous recombination into the genome. Circular viral DNA, which is generated during the Herpes virus replication cycle, is purified from infected cells (so-called Hirt prep) and introduced into bacterial cells, which essentially generates a large plasmid containing the Herpes virus genome (49). As herpesvirus genomes are infectious, the BAC DNA sequences are rapidly lost after delivery to a suitable host cell, along with some surrounding viral sequences, because they are dispensable for viral DNA replication (71). Using the Cre/lox system (another basic tool of molecular biology), a self-recombining full length pseudorabies virus BAC was developed in which the full length genome is automatically removed from the BAC sequences by the expression of Cre recombinase after transfection, reducing the potential for random deletions of viral sequences (72) (Figure 3b). Recombinant Herpes virus genomes that have been successfully cloned include mouse cytomegalovirus, herpes simplex virus 1, human cytomegalovirus, pseudorabies virus, and Kaposi’s Sarcoma virus (11, 24, 49).

Figure 3b. BAC Vector Based Recombinant Clones for Herpesviruses (HV). [Diagram not reproduced. Labels: BAC vector with indicator gene; co-transfect HV viral DNA and BAC vector; homologous recombination; host cell; isolate recombinant vBAC-HV viruses; circular dsDNA intermediate early in infection; transform circular viral DNA into E. coli; pBAC-HV; introduce mutations; purify pBAC-HV DNA and transfect into susceptible cell; progeny virus.]

Poxvirus genome structure and replication modes make the development of an infectious poxvirus molecular clone an order of magnitude more difficult than generation of the Herpes virus molecular clone. Poxvirus genomes replicate in the cytoplasm and require several viral proteins to mediate mRNA transcription and a unique DNA-dependent RNA polymerase that are normally contained within the virion to initiate virus infection. Consequently, purified poxvirus DNA is not infectious. In addition, the linear dsDNA genome has closed hairpins at each end of the genome that are essential for DNA replication. How were these problems solved? As described with Herpes viruses, a mini BAC encoding a marker called green fluorescent protein (GFP) was recombined into the thymidine kinase gene encoded in the vaccinia genome (a model for smallpox). Recombinant viruses harboring the BAC cassette were identified by GFP expression. However, transformation of Vaccinia BAC vectors into E. coli required conversion of the linear genome with covalently closed ends into a closed circular DNA. To accomplish this, Domi and Moss blocked late viral gene expression, knowing that this favored additional recombination events that yielded head to tail concatemers of the full length genome, from which a monomeric recombinant genome in a covalently closed circle would result, a favored genome configuration for insertion into E. coli. Transfection of VAC-BAC DNA into mammalian cells, previously infected with a helper fowl pox virus whose replication is defective in mammalian cells, allowed recovery of recombinant vaccinia virus (23, 24).

Although BACs are remarkably stable, both poxvirus and herpesvirus genomes contain repetitive sequence elements and other sequences that might be unstable with passage, as no biological selective pressure exists to maintain virus genome sequence fidelity in E. coli. Because the large genome size makes it impractical to sequence the entire genome, in vivo pathogenesis studies have been used to demonstrate equivalent levels of pathogenicity and virulence between wildtype and recombinant herpes viruses, further supporting the hypothesis that BAC recombinant genomes are highly stable in E. coli (12). The availability of large dsDNA genomes in BACs provides two major opportunities for future research: the construction of expression vectors for treatment of human diseases and the mutagenesis of the viral genome for understanding gene function, virus replication and pathogenesis.
A second solution to large genome instability was developed using coronaviruses as models. Seven contiguous cDNA clones that spanned the 31.5 kilobase (kb) coronavirus genome (e.g., mouse hepatitis virus [MHV] or SARS-CoV) were amplified, isolated and ligated into standard polymerase chain reaction (PCR) cloning vectors (PCR is one technique used to amplify sequences that are rare and/or not available in large quantities, to provide enough material for subsequent experiments). The ends of the cDNAs were engineered with unique junctions, generated by class IIS restriction endonucleases like BglI or Esp3I. These enzymes leave asymmetric ends, which are designed to seamlessly reproduce the exact virus sequence, allow directional assembly of adjacent cDNA subclones, and direct the production of an intact full length cDNA construct of ~31.5 kb in length. With enzymes like Esp3I, interconnecting restriction site junctions can be located at the ends of each cDNA and systematically removed during the assembly of the complete full-length cDNA product (Figure 4a). The availability of a contiguous set of DNAs containing unique interconnecting junctions provides for the systematic assembly of large DNA molecules greater than 1,000,000 base pairs by in vitro ligation (85). In the case of coronaviruses (Figure 4b), full length cDNAs are assembled that contain a T7 transcription site at the 5’ end of the genome. RNA transcripts driven from the full length cDNA were infectious upon delivery into susceptible cells (85, 87). Alternatively, coronavirus genomes can be stably cloned into BAC vectors.

Figure 4a. Systematic Whole Genome Assembly Techniques. [Diagram not reproduced. Panels: traditional Esp3I (BsmBI) junction between the MHV A and MHV B subclones; seamless assembly with Esp3I, yielding the intact MHV sequence with the Esp3I site lost.]

T7 or eukaryotic promoters encoded upstream of the viral sequences allow for the synthesis of full length RNA genome sequences, which are infectious upon introduction into cells (1). Seamless assembly (also called No See’m Sites (85)) cascades have been used to assemble full length cDNAs of the coronaviruses mouse hepatitis virus, transmissible gastroenteritis virus, infectious bronchitis virus and the SARS-CoV (85, 86, 87). Because certain type IIS restriction endonucleases (e.g., Esp3I, AarI, SapI) recognize asymmetric binding sites and leave asymmetric ends, these enzymes can be used to create the unique interconnecting junctions, which can be subsequently removed from the final assembly product, allowing for the seamless reconstruction of an exact sequence (Figure 4b). This approach avoids the introduction of nucleotide changes that are normally associated with building a full-length cDNA product of a viral genome. These non-palindromic restriction sites will also provide other novel recombinant DNA applications. For example, by PCR it will be possible to insert Esp3I or a related non-palindromic restriction site at any given nucleotide in a viral genome and use the variable domain for simple and rapid site-specific mutagenesis. By orientating the restriction sites as “No See’m”, the sites are removed during reassembly, leaving only the desired mutation in the final DNA product. The dual properties of strand specificity and a variable end overhang that can be tailored to match any sequence allow for Esp3I sites to be engineered as “universal connectors” that can be joined with any other four nucleotide restriction site overhangs (e.g., EcoRI, PstX1, BamHI). Alternatively, “No See’m” sites can be used to insert foreign genes into a viral, eukaryotic, or microbial genome or vector, simultaneously removing all evidence of the restriction sites that were used in the recombinant DNA manipulation. Finally, these restriction sites allow for the rapid assembly of small synthetically produced cDNAs into progressively larger cDNAs. For example, enzymes like AarI recognize a 7 nucleotide recognition sequence and leave a four nucleotide asymmetric end (usually). In a random DNA sequence, this site occurs every 8,000 base pairs or so.

Using a recursive assembly cascade, 2~256 different 8 kb cDNAs can be assembled into extremely large >1,000,000 bp DNAs designed in BACs for stable maintenance in bacteria (85-87).

At this time, well developed molecular clones have been constructed with representative viruses in most of the known virus families; specifically, the Groups I-IV genomes, thus providing a systematic approach for generating molecular clones of many Categories I, III, and IV BW agents. In addition, recent advances in synthetic biology provide promise for reconstructing microbial genomes de novo (15), as has been elegantly demonstrated with the recovery of recombinant poliovirus and ΦX174 viruses (14, 73) from synthetically derived genomes. In these instances, accurate sequences were available for de novo synthesis, as functional molecular clones had existed for both viruses for many years. Consequently, the combination of proof of principle, available templates for genome construction and sequence information makes it likely that any virus genome could be synthetically reconstructed from sequence databases, assuming that the sequence is correct (18, 36).

Figure 4b. Systematic Assembly of Coronavirus Genomes. [Diagram not reproduced. Steps shown: purify plasmid DNA containing SARS-CoV fragments A through F; digest and purify; ligate fragments; transcribe genome length RNA; transfect Vero E6 cells and isolate recombinant SARS-CoV.]
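The two numerical statements above (a 7 nucleotide site occurring roughly every 8,000 base pairs in random sequence, and 8 kb pieces building up to more than 1,000,000 bp) can be checked with simple arithmetic. The sketch below is illustrative only: the function names are hypothetical, it assumes equal base frequencies and a site that may lie on either strand, and it assumes that each round of a recursive cascade joins fragments pairwise, which the text does not specify.

```python
# Back-of-envelope arithmetic only. Function names are hypothetical, and the
# pairwise-joining model of a "recursive cascade" is an assumption.
import math


def expected_site_spacing(site_length, either_strand=True):
    """Expected spacing (bp) between occurrences of one fixed, non-palindromic
    site in random DNA with equal base frequencies."""
    per_strand = 4 ** site_length
    # A non-palindromic site can be read from either strand, which doubles
    # the number of matching positions and halves the expected spacing.
    return per_strand / 2 if either_strand else per_strand


def pieces_needed(target_bp, piece_bp):
    """Number of equal-sized pieces needed to reach at least target_bp."""
    return math.ceil(target_bp / piece_bp)


def pairwise_rounds(pieces):
    """Rounds of pairwise joining needed to merge `pieces` fragments into one."""
    return (pieces - 1).bit_length()


if __name__ == "__main__":
    print(expected_site_spacing(7))          # 8192.0
    print(pieces_needed(1_000_000, 8_000))   # 125
    print(pairwise_rounds(256))              # 8
```

Under these assumptions a 7 nucleotide site is expected about once every 4^7 / 2 = 8,192 bp, consistent with the "every 8,000 base pairs or so" estimate, and 256 pieces of 8 kb correspond to about 2,048,000 bp, comfortably above 1,000,000 bp.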

C. Review of Controlled Viruses

The United States Department of Health and Human Services (HHS), the Centers for Disease Control and Prevention (CDC), and the United States Department of Agriculture (USDA) have identified bacteria, viruses, toxins, rickettsia, and fungi that pose a potential threat to public health or welfare. Some of these organisms are considered Select Agents and High Consequence Livestock Pathogens, and all research laboratories with access to these agents must submit the names and fingerprints of all individuals listed as working with Select Agents to the Department of Justice. Every person who enters a laboratory containing registered Select Agents must have FBI security clearance or be accompanied and monitored by such a cleared person. This includes visitors and employees performing routine cleaning, maintenance, and repairs. The CDC oversees and regulates all laboratories that possess or use select agents and the transfer of select agents and toxins that may be used to threaten the overall public health and safety, as published in the Federal Register on March 18, 2005 (42 C.F.R. Part 73, 7 C.F.R. Part 331, and 9 C.F.R. Part 121) (Appendix 1). In addition, the Department of Commerce regulates the transport of many pathogenic agents deemed important for maintaining the public health or that could impact the economic vitality of the US. Many, but not all, overlap with the Select Agent List and the USDA High Consequence Livestock Pathogens. Finally, the National Institutes of Health has assembled a list of high priority agents for biodefense research, and provides special funding for basic science, vaccines and therapeutics. Select agents are typically grouped among category A agents that pose the most serious perceived risk to national security, while category B agents include many important food and waterborne agents that are easy to disseminate.
The category C agents are emerging pathogens of special concern or pathogens that could be engineered for mass dissemination. All work with microbes that might be harmful to workers or to the environment is conducted according to a variety of regulations directed to the general area of “biosafety and containment”. What is important here is that biosafety and containment are accomplished through a suite of institutional and worker actions, and these activities are referred to by the level of containment achieved. “Biosafety Level 1” (BSL-1) is the least stringent containment; BSL-4 the most stringent (used for the deadliest pathogens for which there are no treatments).

Priority viruses will be discussed according to the Baltimore Classification Scheme. The key columns in these tables are the last three: Nature, Laboratory, and Synthetic. A “yes” in Nature indicates that the virus can be found in nature (thus, all viruses on the list except smallpox, 1918 H1N1 and 1957 H2N2 influenza, and the 2002-2003 strain of SARS CoV). A “yes” under Laboratory means that the virus can be found in some kind of lab, be it a research laboratory, a reference laboratory (e.g., the American Type Culture Collection), a commercial laboratory, etc. This is virtually all viruses on the list (smallpox is closely guarded, and the recently resurrected 1918 influenza virus, at least for now, is in a limited number of known laboratories). Synthetic captures two characteristics. First, is it possible to synthetically construct a virus of a specific family? These entries are indicated in bold, and take into account both whether a synthetic DNA construct can supply the appropriate nucleic acid and whether enough is known about the other aspects of booting the system that it is imaginable that a synthetic approach would be taken. Second, for the individual viruses on the list, the range of possibility takes into account both whether it is possible to construct, and whether this would be an attractive possibility compared to finding it in nature, or trying to steal it from a laboratory (in the case of a bioterrorist).
So, for example, even though foot-and-mouth disease virus is easy to find in nature and highly contagious, it is also easy enough to synthesize that bioterrorists hoping to hide their tracks may prefer the synthetic route.

The Group I agents include the dsDNA viruses contained among the Herpes viruses, Poxviruses and Asfarviruses (Figure 5). Herpes viruses contain linear dsDNA genomes of about 150,000 base pairs and include Herpes B virus (primate) and Malignant catarrhal fever viruses (swine), both of which are readily available in nature and for which culture conditions have been detailed in the literature. Herpes virus genomes are infectious; full length molecular clones and recombinant viruses have been described for several human and animal herpes viruses (72). Although molecular clones for Herpes B virus and Malignant catarrhal fever virus have not been described, a significant body of literature provides a theoretical template and guide for the development of similar constructs with a high probability of success.

Poxvirus genomes range in size from 150,000 to 196,000 base pairs and are not infectious upon introduction into susceptible cells. However, poxvirus genome infectivity can be booted by coinfection with an avian poxvirus that has an abortive infection in mammalian cell lines but provides essential proteins for transcribing the poxvirus genome. A molecular clone has been described for vaccinia virus, providing a theoretical template for guiding similar technology with other members in the family (23, 24). Poxviruses like Variola major and Variola minor (smallpox) and monkey pox viruses are select agents. Although most poxviruses can be readily found in nature and/or are maintained in laboratory settings, Variola major and minor are notable exceptions that are thought extinct in the wild. These two viruses are maintained in high security facilities in the US and Russia, and it is very unlikely that these agents can be recovered from natural settings.

Figure 5. Category I Restricted Agents.
Category I: dsDNA genome, linear. Infectious/Boot Infectivity: Mixed/Yes.
Each entry gives genome size (bp); Infectious/Boot Infectivity; Nature; Laboratory; Synthetic. The table also carries HHS/CDC, NIH A-C, Commerce and USDA listing columns.

Herpesviruses (Infectious/Boot Infectivity: Yes/Yes; Synthetic: Yes, but difficult)
  Herpes B virus: 156,789; Nature: Yes; Laboratory: Yes; Synthetic: Unlikely
  Malignant catarrhal fever virus: 156,789; Nature: Yes; Laboratory: Yes; Synthetic: Unlikely
Poxviruses (Infectious/Boot Infectivity: No/Yes*; Synthetic: Yes, but difficult)
  Variola major: 186,103 and 185,578; No/No; Nature: No; Laboratory: No* (Limited); Synthetic: Plausible, but difficult
  Variola minor: 186,986; No/No; Nature: No; Laboratory: No* (Limited); Synthetic: Plausible, but difficult
  Monkey pox: 196,858; No/No; Nature: Yes; Laboratory: Yes; Synthetic: Unlikely
  White pox: Nature: Yes; Laboratory: Yes; Synthetic: Unlikely
  Goat pox: 149,999; No/No; Nature: Yes; Laboratory: Yes; Synthetic: Unlikely
  Sheep pox virus: 149,955; No/No; Nature: Yes; Laboratory: Yes; Synthetic: Unlikely
  Camel pox: No/No; Nature: Yes; Laboratory: Yes; Synthetic: Unlikely
  Lumpy skin disease virus: 150,773; No/No; Nature: Yes; Laboratory: Yes; Synthetic: Unlikely
Asfarvirus
  African swine fever virus: 170,101; No/No; Nature: Yes; Laboratory: Yes; Synthetic: Possible

*Variola samples are maintained in two laboratories worldwide.

Group III priority agents include the reoviruses African horse sickness and exotic bluetongue strains, which primarily infect domesticated animals (Figure 6). Reovirus genomes contain ten segments of double stranded RNA, and these genomes are not infectious in isolation. Reproducible schemes to boot reovirus genome infectivity have recently been developed by the Dermody laboratory.

Although these viruses are available in nature and in laboratory settings, the inability to initiate genome infectivity had hampered the successful development of reverse genetic approaches and molecular clones. Consequently, the use of natural or laboratory acquired strains represented the most likely approach to acquiring these agents for bioterrorism purposes, although the reovirus reverse genetic system should be an appropriate template for developing molecular clones to other reoviruses.

Figure 6. Category III Priority Viruses.
Category III: dsRNA segmented genome (10 segments), linear dsRNA. Infectious/Boot Infectivity: No, Yes*.
Each entry gives genome segment sizes (segment-length); Infectious/Boot Infectivity; Nature; Laboratory; Synthetic. The table also carries HHS/CDC, NIH A-C, Commerce and USDA listing columns.

Reovirus (Synthetic: Not possible)
  African horse sickness virus: 1-3965; 2-3203; 3-2792; 4-1978; 5-1566; 6-1566; 7-1179; 8-1166; 9-1169; 10-798; No, No; Nature: Yes; Laboratory: Yes; Synthetic: Unlikely
  Bluetongue virus (exotic): 1-3944; 2-2953; 3-2772; 4-1981; 5-1769; 6-1658; 7-1156; 8-1125; 9-1049; 10-822; No, No; Nature: Yes; Laboratory: Yes; Synthetic: Unlikely

Group IV viruses contain single stranded positive polarity RNA genomes and include agents in the calicivirus, potyvirus, picornavirus, alphavirus, flavivirus and coronavirus families (Figure 7). These viruses have dramatically different virion structures, genome organizations, and transmission modes between hosts; they target different tissues, display different virulence and pathogenic determinants, and use different replication strategies upon entry into susceptible cells. Common features, however, include an infectious positive sense RNA genome and relatively straightforward and well developed approaches for obtaining full length cDNA clones from which recombinant viruses can be easily isolated in culture. In most cases these viruses replicate efficiently in culture, and animal models of disease exist, allowing for easy cultivation, maintenance, and testing in a laboratory setting. A general rule of thumb is that the BSL2 positive single stranded RNA pathogens (e.g., human noroviruses) are more readily accessible in laboratory settings than the BSL3 pathogens (e.g., SARS-CoV, VEE, etc.). BSL4 pathogens are the least accessible. Poliovirus, which is targeted for eradication, is not included among any of the high priority pathogen lists but has been synthetically reconstructed by the Wimmer laboratory.

Wild poliovirus is eradicated from the North and South American continents and Europe, but is still prevalent in Africa and parts of Asia. The virus has been present in many laboratories throughout the world, although current efforts are aimed at limiting the availability of wildtype stocks to a few locations in the US. Should eradication efforts prove successful, poliovirus should almost certainly be listed as a high priority agent. In the future, poliovirus might represent a likely candidate for synthetic reconstruction efforts because the whole genome sequence is available, the genome is small and could be purchased for about US $10,000, and synthetic polioviruses have been reconstructed in the laboratory. This possibility, however, may be several decades away and also depends upon an end to global vaccination efforts.


The Group IV viruses are also very abundant in nature and many are present in laboratories. The main exception is the human 2002-2003 SARS-CoV epidemic strain, which is likely extinct in the wild but is present in many laboratories throughout the world. Globally, most SARS-CoV isolates were late phase epidemic strains because many early and zoonotic (animal) isolates were never successfully cultured and were not distributed outside of China (19, 41). Molecular clones have been described for prototype animal caliciviruses, picornaviruses, potyviruses, alphaviruses, flaviviruses and coronaviruses, including many, but not all, of the agents of interest in Figure 7. At this time, molecular clones for human noroviruses have not been successfully developed. Group V viruses contain a single stranded negative polarity RNA genome and include members of the bunyavirus, arenavirus, filovirus, paramyxovirus, rhabdovirus, and influenza virus families (Figure 8, below). As with the Group IV viruses, these viruses differ dramatically in virion structure, genome organization, transmission modes, human disease severity, virulence and pathogenesis.

In general, negative stranded RNA genomes are either nonsegmented and linear (e.g., paramyxovirus, filoviruses, rhabdovirus) or segmented and linear (e.g., bunyavirus, arenavirus, myxoviruses). These viruses are readily found in nature either in human and animal hosts or in vectors, all of which have been well described in the literature. Most are easily cultured in laboratory settings. Again, laboratory availability diminishes with increased BSL ratings, so that BSL3 (e.g., 1918 influenza, Rift Valley Fever) and BSL4 (e.g., Ebola, Marburg, Lassa Fever, etc.) agents are the least available. The exceptions include the 1918 Spanish influenza virus and H2N2 (1957 pandemic) Asian influenza viruses, which are likely extinct in the wild. The 1918 Spanish influenza was resurrected from a molecular clone and is only available in a few laboratories worldwide, but the H2N2 strain is more prevalent in laboratory settings (81). Both viruses are likely capable of producing pandemic disease, as the Spanish Flu H1N1 and Asian H2N2 strains have not circulated in human populations for over 90 and 50 years, respectively.

Figure 7. Category IV Priority Viruses.
Category IV: positive polarity RNA genomes, linear. Infectivity/Boot Infection: Yes/Yes.
Columns: Family; Virus; Genome; Infectivity/Boot Infection; HHS/CDC; NIH A-C; Commerce; USDA; Nature; Laboratory; Synthetic.
Families and viruses listed:
  Calicivirus: Human Norovirus; Vesicular exanthema virus; Rabbit Hemorrhagic virus
  Picornavirus: HAV; Foot & Mouth Virus; Poliovirus*; Swine vesicular disease virus
  Potyvirus: Plum Pox Virus
  Alphavirus: VEE; EEE; WEE; Chikungunya virus
  Flavivirus: Dengue; West Nile; Yellow Fever; Wesselsbron disease virus; Japanese Encephalitis Virus; Central European TB-encephalitis; Far Eastern TB encephalitis virus; Louping ill virus; Kyasanur Forest virus; Omsk HF Virus; Russian Spring/Summer Encephalitis virus; Classical swine fever virus
  Coronavirus: SARS-CoV (1)

(1) The 2002-2003 epidemic strain is likely extinct in the wild; many zoonotic forms exist. *Poliovirus is not included in any priority pathogen lists.

Reverse genetics systems for prototypic members of each virus family have been reported in the literature, although success is less common with arenaviruses and bunyaviruses. In contrast, well documented reverse genetic systems have been described for paramyxoviruses, rhabdoviruses, myxoviruses, and filoviruses, providing clear templates for reconstruction of synthetic viruses. Although many Category I-V agents are available in laboratory settings, serial passage of virus in cell culture oftentimes selects for “culture adapted” variants that display altered or reduced pathogenicity in the original host. In fact, serial passage in cell culture or in alternative animal models has been used to attenuate virus pathogenesis and was used as a method to develop live attenuated poliovirus and measles virus vaccines. Consequently, laboratory strains may not reproduce wildtype virus pathogenicity and virulence when reintroduced into the natural host and may not represent the preferred source of starting material for bioterrorism applications.

III. Barriers to Synthesizing and Resurrecting Viruses by Synthetic Biology and Reverse Genetics

Genetic engineering of viruses requires the development of infectious clones from which recombinant viruses can be isolated. Two basic strategies exist to develop and molecularly clone a viral genome: classic recombinant DNA approaches or synthetic biology. Although the basic methodology is different, the outcome is the same: a full length DNA copy of the viral genome is constructed which is infectious upon delivery to a permissive host cell. Classic recombinant DNA approaches require the availability of viral nucleic acid, which is normally isolated from infected tissues or cells and used as template for cloning and sequence analysis. For RNA viruses, the approach includes using reverse transcriptase and polymerase chain reaction to clone overlapping pieces of the viral genome, followed by whole genome assembly and sequence validation before successful recovery of recombinant viruses (10).

Virus genome availability is an important issue and, until recently, was a major bottleneck in constructing a molecular clone to any BW virus. Most, though not all, viral BW agents are not readily available except in high containment BSL3 and BSL4 laboratories throughout the world. The small number of sites and the lack of funding support historically limited access to a small number of researchers, although increased support for BW research has greatly increased the distribution and availability of these agents throughout the world (31). Most viruses are also available in zoonotic reservoirs, although successful isolation may require an outbreak or knowledgeable individuals carrying out systematic sampling of hosts in endemic areas. Containment facilities for replicating virus are then necessary. Some exceptions to this general availability of controlled viruses include early 20th century influenza viruses like the 1918 H1N1 (Spanish flu), the 1957 H2N2 (Asian Flu), smallpox viruses (extinct 1977) and perhaps the 2002-2003 epidemic SARS-CoV strains, all of which are likely extinct in the wild given the lack of recent human disease.

With the molecular resurrection of the 1918 H1N1 strain using recombinant DNA techniques (81), these viruses only exist in select laboratories distributed throughout the world.

Two general approaches exist for synthetic reconstruction of microbial genomes from published sequence databases: de novo DNA synthesis and polymerase cycling assembly (PCA). Roughly 50 commercial suppliers worldwide provide synthetic DNAs using either approach, mostly in the range of 30 kb. For example, Blue Heron’s GeneMaker™ is a proprietary, high-throughput gene synthesis platform with a ~3-4 week turnaround time and is reported to be able to synthesize any gene, DNA sequence, mutation or variant, including SNPs, insertions, deletions and domain-swaps, with perfect accuracy regardless of sequence or size (http://www.blueheronbio.com/). Most commercial suppliers, however, use polymerase cycling assembly (PCA), a variation on PCR. Using published sequence, sequential ~42 nucleotide oligomers are synthesized and oriented in both the top and bottom strand, as pioneered for ΦX174 (73) (Figure 9). Top and bottom strand oligomers overlap by ~22 bp. The PCA approach involves: 1) phosphorylation of high purity 42-mers (oligonucleotide strands of DNA) in the top and bottom strand, respectively; 2) annealing of the primers under high stringency conditions and ligation with the Taq ligase at 55°C; 3) assembly by polymerase cycling assembly (PCA) using the HF polymerase mixture from Clontech (N-terminal deletion mutant of Taq DNA polymerase lacking 5’-exonuclease activity and Deep VentR polymerase [NEB] with 3’ exonuclease proofreading activity); and 4) PCR amplification and cloning of full length amplicons (Figure 9). The key issue is to use HPLC to maximize oligomer purity and to minimize the number of prematurely truncated oligomers used in assemblages. As PCR is an error prone process, the PCA approach is also error prone and requires sequence verification to ensure accurate sequence. PCA is also limited to DNAs of 5-10 kb in length, which covers the genome sizes of many viruses, although improvements in PCR technologies could extend this limit.

Both approaches, coupled with systematic genome assembly techniques shown in Figure 4, will allow assembly of extremely large viral genomes, including poxviruses and herpes viruses. Consequently, knowledgeable experts can theoretically reconstruct full length synthetic genomes for any of the high priority virus pathogens, although technical concerns may limit the robustness of these approaches. It is conceivable that a bioterrorist could order genome portions from various synthesis facilities distributed in different countries throughout the world and then assemble an infectious genome without ever having access to the virus. To our knowledge, no international regulatory group reviews the body of synthetic DNAs ordered globally to determine if a highly pathogenic recombinant virus genome is being constructed.

Figure 9. PCA Technique. Synthetic Reconstruction of Exotic SARS-CoV Spike Glycoproteins. Synthetic S glycoproteins are synthesized and inserted into the SARS-CoV molecular clone, allowing for recovery of recombinant viruses encoding zoonotic S glycoproteins.

What, then, are the technical barriers to the reconstruction of viral genomes? Three major issues are generally recognized: sequence accuracy, genome size and stability, and expertise. They are discussed in this order below.

Sequence databases record submissions from research facilities throughout the world. However, they have limited ability to review the accuracy of the sequence submission. Consequently, these databases are littered with mistakes, with error rates ranging from 1 in 500 to 1 in 10,000 base pairs. In general, large sequencing centers are more accurate than independent research laboratories (18, 36). Accurate sequence is absolutely essential for rescuing recombinant viruses that are fully pathogenic (7, 10, 30, 85, 86), as even a single nucleotide change can result in viable viruses that are completely attenuated in vivo (74). Sequence accuracy represents a significant barrier to the synthetic reconstruction of these highly pathogenic viruses.
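To make the scale of the sequence accuracy barrier concrete, the short sketch below converts the database error rates quoted above (1 in 500 to 1 in 10,000 base pairs) into an expected number of errors for a genome of a given length. The genome lengths are round illustrative values chosen here, not figures from any table, and the probability of an error-free record assumes that errors occur independently and uniformly along the sequence (a Poisson approximation), which is a simplifying assumption and not a claim made in the text.

```python
import math


def expected_errors(genome_length_bp: int, bp_per_error: int) -> float:
    """Expected number of sequence errors at a rate of one error per `bp_per_error` bases."""
    return genome_length_bp / bp_per_error


def prob_error_free(genome_length_bp: int, bp_per_error: int) -> float:
    """Probability that a record contains no errors at all.

    Assumption (not from the source): errors are independent and uniformly
    distributed, so the error count is approximately Poisson.
    """
    return math.exp(-expected_errors(genome_length_bp, bp_per_error))


if __name__ == "__main__":
    # Hypothetical round genome lengths (bp) used only for illustration.
    for length in (7_500, 30_000, 186_000):
        for bp_per_error in (500, 10_000):
            print(
                f"{length:>7} bp at 1 error per {bp_per_error:>6} bp: "
                f"expected errors = {expected_errors(length, bp_per_error):7.1f}, "
                f"P(no errors) = {prob_error_free(length, bp_per_error):.3g}"
            )
```

Under these assumptions, the expected error count grows linearly with genome length, so a deposited sequence for a large genome is very unlikely to be entirely error free even at the best quoted accuracy.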

RNA viruses exist in heterogeneous “swarms” of “microspecies,” thus requiring the identification of a “master sequence,” i.e., the predominant sequence identified after sequencing the genome numerous times. Consequently, full length sequence information may have been reported, but the published sequence may actually not be infectious. Problems with sequence accuracy are proportional to genome size, as the reported sequence for a large viral genome will likely include more mutations than that of a small genome. In many instances, sequence errors will reside at the ends of viral genomes because the ends are oftentimes more difficult to clone and sequence. Using state of the art facilities, the smallpox genome from a Bangladesh 1975 strain was sequenced (47). However, an error rate of 1:10,000 would result in about 19-20 mistakes and 10-14 amino acid changes in the recombinant genome. Should these mistakes occur within essential viral proteins or in virulence alleles, recovery of highly pathogenic recombinant viruses might be impossible.

More recently, another genome sequence of Variola major (India 1967) has been reported in the literature (Bangladesh 75, and India 67; Accession # X69198 and L22579). These full length genomes differ in size by 525 base pairs, contain ~1500 other allelic changes scattered throughout the genomes, and also differ in size and sequence from the Variola minor genome (Figure 5). Although the sequences are roughly 99.1% identical, which of them is correct? Will pathogenic virus be recovered from a putative molecular clone of either, both or neither? If neither is infectious, which changes are responsible for the lethal phenotype? In the absence of documentation of the infectivity of a reported sequence, it becomes difficult to accurately predict the correct sequence that will allow for the recovery of infectious virus. At best, a combination of bioinformatics, evolutionary genetic and phylogenetic comparisons among family members may identify likely codon and nucleotide inconsistencies, simultaneously suggesting the appropriate nucleotide/codon at a given position. In the case of poxviruses, only two full length sequences of Variola major have been reported, hampering such sequence comparisons. Ultimately, this approach only allows informed guesses that may not result in the production of recombinant virus. Obviously, reported full length genomic sequences that have been demonstrated to generate infectious viral progeny provide an exact sequence design for synthetic resurrection of a recombinant virus, greatly increasing the probability of success. In the absence of these data, multiple full length submissions are needed to enhance the probability of success.

Other problems hampering the development of synthetic DNA genomes for genetic manipulation are genome size and sequence stability in microbial vectors.
Many viral full-length cDNAs, including coronavirus genomes and certain flavivirus genomes like yellow fever virus, are unstable in microbial vectors (10). Low copy BAC vectors and stable cloning plasmids oftentimes reduce the scope of this problem, although instability has been reported with large inserts following passage (1, 85). Plasmid instability might be caused by sequence toxicity associated with the expression of viral gene products in microbial cells, or the primary sequence might simply be unstable in microbial vectors, especially sequences that are A:T rich. To circumvent this problem, plasmid vectors have been developed that contain poly-cloning regions flanked by several transcriptional and translational stops to attenuate potential expression of toxic products (86). The development of wide host range, low copy vectors that can be used in Gram positive or lactic acid bacteria may also allow amplification of sequences that are unstable in E. coli hosts. Alternatively, theta-replicating plasmids, which are structurally more stable and accommodate larger inserts than plasmids that replicate by rolling circle models, may alleviate these concerns in the future (3, 35, 58). Poxvirus vectors also provide an alternative approach for stably incorporating large viral genome inserts, although the long-term stability of these vectors is unknown (1, 77).

Developing full length infectious cDNAs of viruses is not simple and requires a great deal of expertise and support: technically trained staff, the availability of state of the art research facilities, and funding. Theoretically, it is now possible to purchase a full length DNA of many viral biodefense pathogens, especially for those virus genomes that are less than 10 kb in length. In addition, defined infectious sequences are documented and methods have been reported in the literature. Infectious genomes of many Class IV viruses could be purchased, minimizing the need for trained staff.

Today, a picornavirus or flavivirus genome could be purchased for as little as $15,000, and a coronavirus genome for less than $40,000. It is much more difficult to reconstruct large viral genomes, meaning that trained staff and state of the art facilities become essential to the process. However, it is conceivable that technical advances over the next decade may render even large viral genomes commercially available for use by legitimate researchers, but perhaps also by bioterrorists.


IV. Risk and Benefits of Synthetic Organisms

A. Benefits to Society

The benefits of recombinant DNA have been heavily reviewed in the literature and include the development of safe and effective virus platform technologies for vaccine design and gene therapy, the production of large quantities of drugs and other human and animal medicines, and agricultural and other products key to robust national economies. Genetic engineering of bacteria and plants may allow for the production of large quantities of clean burning fuels and complex drugs, the design of highly stable biomolecules with new functions, and the development of organisms that rapidly degrade complex pollutants (52, 56, 64, 78). Comparative genomics also provides numerous insights into the biology of disease-causing agents and is allowing for the development of new diagnostic approaches, new drugs and vaccines (27). Synthetic biology enhances all of the opportunities provided by recombinant DNA research. The main advantages of synthetic genomics over classic recombinant DNA approaches are speed and a mutagenesis capacity that allow for whole genome design in a cost effective manner (6).

How will synthetic biology protect the overall public health? A major advantage is in the development of rapid response networks to prevent the spread of new emerging diseases. Platform technologies allow for rapid detection and sequencing of new emerging pathogens. The SARS-CoV was rapidly identified as a new coronavirus by gene discovery arrays and whole genome sequencing techniques within a month after spread outside of China (37, 46, 83, 84). Similar advances were also made in the identification of highly pathogenic avian H5N1 influenza strains, hendra virus and in other outbreaks. Sequence information allowed for immediate synthesis of SARS and H5N1 structural genes for vaccines and diagnosis, and for the rapid development of candidate vaccines and diagnostic tools within a few months of discovery.
Classic recombinant DNA approaches require template nucleic acid from infected cells and tissues (limited supply), followed by more tedious cloning and sequence analysis in independent labs throughout the world. As access to viral nucleic acids historically limited response efforts to only a few groups globally, research productivity was stifled. Synthetic biology results in a true paradigm shift in virus vaccine, therapeutic and diagnostic discovery, resulting in the near simultaneous engagement of multiple laboratories as genome sequence becomes available (Figure 10).

Figure 10. Synthetic DNA Rapid Response Applications. Labels in the figure: Pathogen Sequence Analysis; Synthetic DNA Applications; Vaccine Applications; Recombinant Proteins; Peptides (2-3 months).

(10 IU (Vibrio cholerae)

106 – 100 IU (Influenza A virus)

Transmission: Indirect contact (contact with contaminated surfaces, animal bedding); Direct contact (droplet, tissue, fluid, secretion contact with mucous membranes; ingestion)

Stability: Survive minutes to hours on surfaces (Measles virus); Survive days to weeks on surfaces (Hepatitis B virus)

Animal host range: Not likely to cross species barrier; Broad host range but not known to cause disease in humans

Occurrence of natural disease: Endemic; Not endemic

Probable causes of laboratory-associated infections: Absence of LAI reports; Accidents, percutaneous, ingestion, unknown

WHO Risk Group**: Risk Group 2 (moderate individual risk, low community risk); Risk Group 3 (high individual risk, low community risk)

6

HIGH

Severe disease (Cercopithecine herpes virus) Lethal disease or high infectivity