My first hackathon @ UCLA -- CaSB
Thanks to my three teammates. We worked together to tackle this open-ended question.
This repo contains all the research papers, resources, and code for the UCLA winter CaSB minihack project.
The repository is messy right now because I didn't take the time to maintain a clean branch.
The numbers in a file's title are the percentages I used to split the dataset, which turned out to have a huge effect on the model's accuracy. The title also specifies the training method and the type of model architecture.
For example, 2985 means the dataset was split into three categories, with 29% and 85% as the two cutoff lines.
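A minimal sketch of that splitting scheme, assuming the labels come from percentile cutoffs over a score column (the array name and values here are hypothetical, not from the repo):

```python
import numpy as np

# Hypothetical example scores; in the real project these would come from the dataset.
scores = np.array([0.1, 0.5, 0.9, 0.3, 0.7, 0.2, 0.95, 0.4])

# "2985" scheme: use the 29th and 85th percentiles as the two cutoff lines,
# yielding three categories: 0 = low, 1 = mid, 2 = high.
low_cut, high_cut = np.percentile(scores, [29, 85])
labels = np.digitize(scores, [low_cut, high_cut])
```

Moving the two cutoffs changes how many examples land in each class, which is one plausible reason the split percentages affect accuracy so strongly.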
Reversed means the data has been augmented by also adding each sequence's reverse, complement, and reverse-complement strands.
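The augmentation can be sketched like this (a hypothetical helper, not the repo's actual code):

```python
# Standard DNA base pairing: A<->T, C<->G.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def augment(seq: str) -> list[str]:
    """Return the original sequence plus its reverse, complement,
    and reverse-complement strands."""
    comp = seq.translate(COMPLEMENT)
    return [seq, seq[::-1], comp, comp[::-1]]

augment("ATGC")  # -> ["ATGC", "CGTA", "TACG", "GCAT"]
```

This quadruples the training set without collecting new data, at the cost of assuming the model should treat all four strand orientations as equivalent.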
The submit and submit1 folders take turns submitting so I could multitask. I was racing around the clock when the best results came out.
The copies of files folder is for showing files to GPT. This contest doesn't prohibit the use of such AI tools or the internet.
There are some Google Colab documents I used in the first few stages to get a grasp of the data's statistics and to do the initial model training. Here is the list of documents:
This tries to answer: if I found a motif in a sequence, what would the activation scores of the nucleotides around that region look like? Half-failed; the results were not promising.
https://colab.research.google.com/drive/1Nln02BndHIbK0YJUOyXGYvs8bQo46ccL?authuser=1
This is the initial data exploration:
https://colab.research.google.com/drive/1vW28Efrl5Y8vk2IFpjpCXKnP_jREZpa_?authuser=1
Forgot what this is:
https://colab.research.google.com/drive/1L8ru3tYgC6J85lo5jcmOHPPRrwYa6J90?authuser=1
Overall activation score:
https://colab.research.google.com/drive/1LjxHkTiCTtB9sS4dy7EcZ4jHtweXYlz6?authuser=1
The code here failed. The lesson: don't rely on GPT too much; it's not dependable.
https://colab.research.google.com/drive/17gf3W5s4Z4s-R7dRHRbVua4T7l3Lb-cs?authuser=1
https://colab.research.google.com/drive/1t_93Xj4iadGkOfz2ttUdmtXQgecICoyk?authuser=1