Position Summary

Known for its scientific and operational excellence, Regeneron is a leading science-based biopharmaceutical company that discovers, invents, develops, manufactures, and commercializes medicines for the treatment of serious medical conditions. Regeneron commercializes medicines for eye diseases, high LDL-cholesterol, atopic dermatitis and a rare inflammatory condition and has product candidates in development in other areas of high unmet medical need, including rheumatoid arthritis, asthma, pain, cancer and infectious diseases.

These positions are in Bioinformatics Core Services in Tarrytown, NY. There are 3 openings for Summer 2018. Each opening is described below.

Position 1: To support our drug research and development pipelines, REGENERON leverages whole-genome, exome, and transcriptome NGS technologies to inform the design of precise genome editing experiments on animal models and human cell lines. For that purpose, diverse DNA sequence data is generated in-house and mapped onto the reference genomes via the REGENERON UCSC Genome Browser. This data includes whole genome sequencing of mouse strains and human cell lines, mRNA sequencing, KO/KI allele design information, and reagents such as CRISPR/Cas9 guide RNAs, primers, and genotyping assays. Over time, this sequence data has grown in volume, diversity, and complexity, making it impossible to display all at once and cumbersome for end users to set up by themselves.

The student will participate in ongoing efforts to generate user-friendly visualizations of the data generated in-house, based on the selection of genome browser tracks and sessions. For that purpose, the student will develop a repository of genomic data that will be accessible through the web interface. The student will have the chance to learn R/Shiny and develop a web interface that communicates with the backend of the UCSC Genome Browser. Familiarity with Unix and with any flavor of genome browser is required, as is some working knowledge of bioinformatics tools and languages such as R, Perl, Python, Java, Ruby on Rails, MATLAB, SQL, and web development tools.

Position 2: Reference genomes for mouse, human, and other organisms are well-characterized, universally used resources with multitudes of feature annotations such as gene bodies, protein domains, and homology to other organisms. Genome editing relies on these existing reference sequences but largely ignores strain-specific variation. To create strain-specific genome editing reagents such as CRISPR guide RNAs or targeting oligonucleotide primers, it is not necessary to recapitulate the entire strain-specific genome, but it is important to recognize when a guide or oligo may be designed on a region of variation. Variant information specific to the strain can be used to modify and inform the sequence of the guide or oligo, with the net effect being increased specificity of the targeting method.

The student will be expected to continue ongoing research to create a suite of modular programs that take strain-specific genome variation into consideration when returning sequences for diverse genome editing reagents (CRISPR/Cas9 guide RNAs, gene knockout/knockin oligonucleotide primers, and genotyping assays, among others). Common bioinformatics tools (such as BWA and GATK) increasingly take advantage of various haplotype-aware genomes. In the case of the Regeneron mouse, the different “haplotypes” are the different mouse strains. The student will be expected to create a mouse reference genome (based on GRCm38) that is mouse-strain aware. In this project, we introduce not only the principle of being “genome-aware” but also strain or haplotype differentiation when creating gene editing designs.
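As a rough illustration of the idea only, the sketch below applies strain-specific single-nucleotide variants to a reference window and reports which variants fall inside a candidate guide. It is a minimal sketch under stated assumptions, not the project's actual tooling: the `Snv` type, the `strain_aware_sequence` function, and the toy sequence and variants are all hypothetical, and only simple substitutions on 0-based coordinates are handled (no indels, no PAM or off-target checks).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snv:
    """Hypothetical single-nucleotide variant on a 0-based reference coordinate."""
    pos: int
    ref: str
    alt: str


def strain_aware_sequence(reference, start, end, snvs):
    """Return reference[start:end] with in-window SNVs applied,
    plus the list of variants that overlapped the window."""
    bases = list(reference[start:end])
    hits = []
    for v in snvs:
        if start <= v.pos < end:
            # Guard against variants called against a different reference.
            if reference[v.pos] != v.ref:
                raise ValueError(f"reference mismatch at position {v.pos}")
            bases[v.pos - start] = v.alt
            hits.append(v)
    return "".join(bases), hits


# Toy data for illustration only (not real genomic sequence or variants).
reference = "ACGTACGTACGTACGTACGTACGG"
strain_snvs = [Snv(pos=5, ref="C", alt="T"), Snv(pos=22, ref="G", alt="A")]

# A 20-nt candidate guide window at the start of the toy reference.
guide, hits = strain_aware_sequence(reference, 0, 20, strain_snvs)
print(guide, [v.pos for v in hits])
```

A non-empty list of overlapping variants signals that a guide designed on the plain reference would mismatch the strain at those positions.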

Position 3: The utility of the large quantities of variant genes identified within the Regeneron Genetics Center (RGC) Mendelian Project is becoming increasingly apparent, and the association of these genes with diseases and their corresponding phenotypes may lead to the development of drug candidates. Bioinformatics Core Services (BiCS) is working with the RGC to develop tools that will facilitate automated ranking of variant genes as a function of (1) the phenotypes with which they are transitively associated via OMIM diseases and (2) HPO phenotype annotations extracted from clinical presentation data. These contextualized rankings will be provided to users via specialized interfaces. Some elements of the underlying algorithmic mechanisms already exist, while others require generation, structuring, and/or integration using existing information within the company.

The student will participate in these efforts with a project that entails database ETL (extract, transform, load) activities drawing on open-source databases and internally generated data. There will be learning opportunities in integrative algorithm development, semantic similarity analysis, MySQL, R, API (application programming interface) construction, and Shiny, as well as REGENERON technologies.
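To make the ranking idea concrete, here is a minimal sketch that scores each gene by the overlap between a patient's phenotype terms and the phenotype terms reached through the gene's associated diseases. It assumes a plain Jaccard overlap as a simple stand-in for semantic similarity, which is not necessarily the approach used by BiCS or the RGC; the `rank_genes` function and all gene, disease, and phenotype identifiers are hypothetical placeholders rather than real OMIM or HPO entries.

```python
def rank_genes(patient_terms, gene_to_diseases, disease_to_terms):
    """Rank genes by Jaccard overlap between the patient's phenotype terms
    and the terms reached transitively via each gene's diseases."""
    patient = set(patient_terms)
    scores = {}
    for gene, diseases in gene_to_diseases.items():
        gene_terms = set()
        for disease in diseases:
            gene_terms |= set(disease_to_terms.get(disease, ()))
        union = patient | gene_terms
        scores[gene] = len(patient & gene_terms) / len(union) if union else 0.0
    # Highest score first; ties broken alphabetically by gene name.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Toy mappings for illustration only (placeholder identifiers).
gene_to_diseases = {
    "GENE_A": ["DISEASE_1"],
    "GENE_B": ["DISEASE_2", "DISEASE_3"],
    "GENE_C": [],
}
disease_to_terms = {
    "DISEASE_1": ["P1", "P2"],
    "DISEASE_2": ["P2"],
    "DISEASE_3": ["P3", "P4"],
}
patient_terms = ["P1", "P2", "P3"]

ranking = rank_genes(patient_terms, gene_to_diseases, disease_to_terms)
for gene, score in ranking:
    print(f"{gene}\t{score:.2f}")
```

A set-overlap score treats all terms as unrelated; an ontology-aware similarity measure would also credit terms that are close in the phenotype hierarchy.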

• Must be an undergraduate student
• Must be pursuing a degree in Computational Biology or Bioinformatics
• Experience with a programming language such as Python, R, or similar
• Must have good oral and written communication skills; the student will need to write reports and present findings.

General Intern Program Information:
• Must be enrolled in, or accepted to, an academic program pursuing a Bachelor’s, Master’s, PhD or PharmD
• Summer Program is full-time (~40 hrs/wk) for at least 10 weeks and is paid
• Prefer demonstrated leadership in areas such as campus activities, clubs, sports or the community
• Minimum GPA of 3.0
• You will work with a specific manager on one or more projects
• Enjoy weekly 1-hour Intern Program events: learn about other areas of the company from VPs and other employees, soft skills workshops, networking, volunteering, and more!
• Present to your team on your summer project

Transportation and Housing:
• A free shuttle is offered from the North White Plains or Tarrytown train stations to the Tarrytown campus
• Get to know other incoming Interns on our closed LinkedIn page when you accept an offer

