You've been giving us a lot of feedback about data generation times and features, and we've been listening. After a lot of R&D, we've now rolled out a new system for data generation in Gorilla, Codename: Data Hamster! This enables you to combine your data across experiment versions and across tasks/questionnaires within Gorilla.
In short, you can get the number of data files that you need to download down to just two!
Data is the key end result of the experiments you run on Gorilla. Just as Gorilla makes building your experiment faster and easier, generating and downloading your data should be fast and simple, giving you your data in the format you need.
Feedback from you, our users, and from the Gorilla team itself, was that we weren't hitting this requirement with data generation. As the number of active researchers, the complexity of experiments, and the number of participants being recruited went up, so did data generation times. Further, having data split between experiment tree nodes (tasks/questionnaires) and across experiment versions was proving arduous, requiring a lot of manual data merging. These, and other invaluable insights, were provided by the users who filled in our Data Collection survey early in 2023.
We knew we needed a better interface for selecting the data you want to download. To make that impactful, we first needed to make data generation faster overall. With our existing system, more filtering options would only mean longer wait times. Worse, we were starting to hit bandwidth limits within Microsoft Azure that would be expensive to overcome.
We needed a system that would
I won't go into the details of all the options we considered and their various pros and cons. That's a much longer and more technical article. What I will do is introduce you to the solution we've implemented, codename: Data Hamster.
Built for ingesting, the Hamster can process a lot of content in a small space, in a higher bandwidth environment. This allows the Hamster to process many more requests concurrently than our existing data generation system. Further, using the latest in stream feeding technology, we can run more complicated processing steps on individual lines and sections of data with only small reductions in speed.
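For readers curious what line-by-line stream processing looks like in general, here is a minimal sketch in Python. It is not Data Hamster's actual implementation, which isn't described here. The function names, the `source` column, and the sample data are all hypothetical, chosen only to illustrate how rows from several files can be combined one at a time without loading everything into memory.

```python
import csv
import io
from typing import Dict, Iterable, Iterator, List, TextIO, Tuple


def stream_rows(sources: Iterable[Tuple[str, TextIO]]) -> Iterator[Dict[str, str]]:
    """Yield rows one at a time from each source, tagged with its label."""
    for label, handle in sources:
        for row in csv.DictReader(handle):
            row["source"] = label  # hypothetical column recording where the row came from
            yield row


def combine(sources: Iterable[Tuple[str, TextIO]], out: TextIO, fieldnames: List[str]) -> int:
    """Write every row from every source to one output, returning the row count."""
    # restval fills columns that a given source does not have with an empty string.
    writer = csv.DictWriter(out, fieldnames=fieldnames, restval="", extrasaction="ignore")
    writer.writeheader()
    count = 0
    for row in stream_rows(sources):
        writer.writerow(row)  # only one row is held in memory at a time
        count += 1
    return count


# Hypothetical data: two versions of the same task with slightly different columns.
v1 = io.StringIO("participant,rt\np1,512\n")
v2 = io.StringIO("participant,rt,accuracy\np2,430,1\n")
out = io.StringIO()
n = combine([("v1", v1), ("v2", v2)], out, ["source", "participant", "rt", "accuracy"])
print(out.getvalue())
```

Because each row is read, tagged, and written before the next one is touched, memory use stays roughly constant however large the input files are, which is the general property that makes per-line processing cheap.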
In the short term, what this allows us to do is offer three new options for data generation:
You can use any of these options in combination with each other. Selecting all three would mean you'll get
That's just two files for your whole experiment!
To make Data Hamster as performant as possible with these new combining options, your data needs to be in a readily nom-able, easily processed format. Data from new participants collected on new and existing experiments will be automatically compiled into this format.
However, for existing experiments/participants, there needs to be an additional migration step, copying and converting all of your existing participant data into the new format. This migration step will happen in the background: while it's happening, there will be no disruption to your experiment or data collection. You will still be able to regenerate and download data files using the old system. Importantly, the migration involves copying the data files, so there is no risk of data loss.
All content remains within Microsoft Azure, covered by both their and our existing data security requirements and regulations. The copying of data between existing storage and Data Hamster's storage tooling is done entirely within Azure - data is not manually accessed or interacted with by Gorilla staff. All existing regulatory agreements and MSAs are preserved.
With everything we create at Gorilla, security is our first and foremost consideration. We are constantly updating and revising our existing content and new tooling to use the latest security features and recommendations.