I’m trying to find particular biological patterns, so I build these models with a lot of structure and include a lot about what kinds of signals I’m expecting. I establish a scaffold, a set of parameters that will tell me what the data say, and what patterns may or may not be there. The model itself has only a certain amount of expressivity, so I’ll only be able to find certain types of patterns. From what I’ve seen, existing general models don’t do a great job of finding signals we can interpret biologically: They often just determine the biggest influencers of variance in the data, as opposed to the most biologically impactful sources of variance. The scaffold I build instead represents a very structured, very complex family of possible patterns to describe the data. The data then fill in that scaffold to tell me which parts of that structure are represented and which are not.
Barbara Engelhardt, Princeton University computer scientist