A Master's Journey
Digitization and artificial intelligence (AI) have profoundly revolutionized the biotech industry, transporting scientists beyond the lab into a new era of computationally driven discovery. By automating repetitive tasks in drug research and manufacturing, enabling the analysis of massive datasets, and driving the production of highly sophisticated technologies, AI has become a cornerstone of modern biotechnology. The convergence of biology with computational science is not simply a technical implementation; it is a catalyst for monumental advancement: streamlined drug manufacturing, accelerated scientific breakthroughs, and a healthcare system capable of delivering earlier, more accurate diagnoses alongside more effective treatments. By embracing an interdisciplinary approach early in my studies at Northeastern, I positioned myself to contribute meaningfully to the next generation of biotech innovations, where the synergy of biology, programming, and AI/ML is not optional but essential.
As the biotech program does not traditionally offer courses on programming, machine learning, artificial intelligence, or bioinformatics, I chose to design my own genomic-bioinformatics specialization as a supplement to my master's. However, since I was required to complete my core curriculum before taking courses toward my specialization, I could not enroll in computer science courses at Northeastern until Spring 2025. I therefore took the initiative to pursue independent learning courses and seminars from LinkedIn, Coursera, Simplilearn, Rosalind Franklin, Oxford Academic, and others to strengthen my statistical analysis skills, learn to program, and develop an algorithmic mindset for approaching biological challenges. I also had the opportunity to learn about AI and ML through a three-seminar series offered by Northeastern's Khoury College of Computer Sciences, for which you can view my badges of accomplishment.
Some of the skills and concepts I have learned include:
Running quality-control steps on large genomic datasets using a high-performance computing (HPC) cluster (Explorer) and Slurm, including trimming (Trimmomatic), deduplication (BBMap toolkit: dedupe), decontamination (BBMap toolkit: BBDuk), and FastQC and MultiQC analysis
Programming proficiently in Python, R, Bash, and PHP, and designing multi-module applications
Building, training, and testing statistical and algorithmic models, including HMMs and the Viterbi algorithm, phylogenetic trees (neighbor joining & UPGMA), KNN & clustering, dynamic programming, Bayesian networks, Needleman-Wunsch & Smith-Waterman alignment, multiple sequence alignment, position weight matrices, maximum likelihood, De Bruijn graphs, classification models & gene prediction, forecasting, linear regression & OLS, random forests, and gradient boosting models
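As a small illustration of the alignment algorithms listed above, here is a minimal sketch of Needleman-Wunsch global alignment scoring. The function name and the match/mismatch/gap values are my own illustrative choices, not taken from any specific coursework:

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Return the optimal global alignment score for sequences a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    # dp[i][j] holds the best score for aligning a[:i] with b[:j]
    dp = [[0] * cols for _ in range(rows)]
    # First column and row: aligning a prefix against the empty string
    for i in range(1, rows):
        dp[i][0] = dp[i - 1][0] + gap
    for j in range(1, cols):
        dp[0][j] = dp[0][j - 1] + gap
    # Fill the table: each cell takes the best of diagonal (match/mismatch),
    # up (gap in b), or left (gap in a)
    for i in range(1, rows):
        for j in range(1, cols):
            diag = dp[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            dp[i][j] = max(diag, dp[i - 1][j] + gap, dp[i][j - 1] + gap)
    return dp[-1][-1]

print(needleman_wunsch("ACGT", "ACGT"))  # 4 (four matches, no gaps)
```

Smith-Waterman local alignment follows the same dynamic-programming pattern, with the key difference that cell scores are floored at zero so an alignment can restart anywhere.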
In this section, you will find a collection of programs, statistical investigations, and algorithmic modeling reports from assignments and projects I have completed. Note that while not all of this work was completed at Northeastern University, all of it was completed during my master's. Together, this collection demonstrates my proficiency in computational methods and the algorithmic mindset I have developed.
An Introduction to R: Manipulating Flight Data in a CSV File
Programming in Python
Bioinformatics: The Intersection of Genes: A Multi-Directory Program
Descriptive Statistics in Python
Phylogenetic Trees, UPGMA, Coding Classifications, HMM, ORF Generator in Python
Counting Codons & the Concentration-Volume Calculation
Flowchart Assignment with Lucid & Python
Bioinformatics for Beginners with Pavel Pevzner: Reverse Complement, Motif Finder, Pattern Matching in Python
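As a small taste of the Pevzner-style exercises above, here is a minimal sketch of the reverse complement operation; the function and variable names are my own, and the input is assumed to be an uppercase A/C/G/T string:

```python
# Watson-Crick base pairing: A<->T, C<->G
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def reverse_complement(dna):
    """Return the reverse complement of a DNA string.

    Reads the sequence back-to-front, swapping each base for its
    complement, which gives the paired strand in 5'->3' orientation.
    """
    return "".join(COMPLEMENT[base] for base in reversed(dna))

print(reverse_complement("AAAACCCGGT"))  # ACCGGGTTTT
```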
Interesting Coding Facts!
Code is read more than written, code is run more than read.
The original "bug" was a real moth, discovered in a relay of the Harvard Mark II computer in 1947!
In 2020, MIT neuroscientists found that interpreting code activates a general-purpose brain network, but not language-processing centers.
Learning to code has real cognitive benefits: creative problem-solving, critical thinking, and teamwork skills. Research dating back to at least 1991 suggests that coders develop stronger higher-order thinking skills, and some studies suggest that mentally stimulating activities like coding may even help protect against cognitive decline as we age!
In the future, it is predicted that coding and technical literacy may be as essential to daily life as being literate is!
Over SEVEN HUNDRED coding languages exist!!!
All the software that runs our computers ultimately executes as nothing but 0s and 1s: it's all binary!
Mother Nature created an even more complicated codebase with a base-4 system: A, T, C, and G! DNA is more like machine code or bytecode than human-readable code like Python or C++.
Calling DNA "code" is surprisingly controversial: "DNA is NOT code. It is NOT a language. It is NOT programmed. It is NOT software. The analogy to computer software is terribly flawed..." hmm...
Scientists were actually able to encode a program using nucleotides!
Scientists have also created malicious software by encoding commands into DNA. When a gene sequencer analyzes this DNA, the data stream executes the malware, taking control of the computer running the analysis! Scary stuff!!!
Another one: Researchers have encoded images and even MOVIES into DNA, then used LIVING cells to store and reconstruct the data with HIGH ACCURACY!!!
Digital programs and vast amounts of data can be encoded into quartz glass using lasers, creating an ultra-durable, long-term storage medium known as "5D memory crystals" that can survive extreme conditions for billions of years! Researchers have even created "Superman memory crystals" storing the human genome (as a blueprint for potential future reconstruction), the Bible, and the Magna Carta!!!
An estimated 90% of the world's money exists only as digital records: it's JUST CODE!!!
NASA still uses code from the 1970s, demonstrating its enduring, critical nature.
The first computer programmer was a woman: Ada Lovelace, in the 1840s!!!
The average smartphone contains more code than the space shuttle!
The QWERTY keyboard design was meant to slow typing down: early typewriters jammed when people typed too fast, so QWERTY spaced out commonly paired letters!