Eye Tracking in Task Builder 2

Overview
Eye Tracking Setup
Eye Tracking Data
Eye Tracking Analysis
Eye Tracking FAQs

Overview

Welcome to the Eye Tracking in Task Builder 2 page!

Navigate through the menu to the left for information on how to understand your eye tracking data, prepare it for analysis and see answers to most commonly asked questions.

Eye Tracking Setup

Detailed instructions on how to set up the Eye Tracking (Webgazer) component in Task Builder 2, including tips for successful calibration, are available in our Task Builder 2 Components Guide.

See our Visual World Paradigm (Eye Tracking) sample to try out the component and view an example setup.

Eye Tracking Data

When you download your data, you will (as standard on Gorilla) receive one data file which will contain all of your task metrics for all of your participants. This will contain summarised eye-tracking data.

For each time the participant attempted calibration, the Response column will contain information on whether calibration failed or succeeded.

For each eye tracking data collection screen, you will receive information on the absolute and proportional time participants spent looking at each quadrant of the screen. Screen quadrants are represented in the Response column by the letters a, b, c and d (where a = top-left, b = top-right, c = bottom-left, and d = bottom-right). The time in milliseconds that the participant spent looking at each quadrant is shown in the Response Duration column, and the proportion of total screen time the participant spent looking at each quadrant is shown in the Proportion column. The screenshot below shows an example (some rows and columns have been hidden for clarity):

Screenshot of basic eyetracking data showing a, b, c and d in Response column, durations in Response Duration column and proportions in Proportion column

For many experiments, this will be the only eye tracking data you need.

By default, you will also receive the full coordinate data for the eye tracking component. You can control whether or not to collect this data using the 'Collect Raw XY Data' toggle in the Eye Tracking component settings. By default, it will be toggled on, as shown below.

Screenshot of settings of Webgazer component. Mode is set to Record and 'Collect Raw XY Data' is toggled on

You will receive the full eye-tracking data in separate files, one file per screen. When previewing a task, you can access the detailed eye-tracking files via a unique URL for each screen that will be contained in the Response column of your main data file. When running a full experiment, you will find these files in an 'uploads' folder included in your data zip file.

For guidance on interpreting and processing the data, see the Analysis section of this guide.

Eye Tracking Analysis

Once you have your detailed eye tracking data, this section describes the relevant column variables you need to look at. Any column headings not defined below are standard Gorilla data columns described in the Data Columns guide.

General Columns

'Spreadsheet Index'
- The current row number in the task spreadsheet minus 1
'Filename'
- The filename prefix for this detailed eyetracking file. This will be composed of the prefix entered in the File Prefix setting on the Eye Tracking component, followed by either -calibration or -collection.
'Type'
- Denotes what kind of data is shown in the current row. Explanations of the different values found in the Type column can be found below.

Guide to Coordinates

Within your detailed eye tracking data file, coordinates are given in two formats: raw and normalised.

Raw coordinates (Predicted Gaze X, Predicted Gaze Y) are in pixels. 0,0 is at the bottom-left of the screen. So a predicted gaze location of x = 50, y = 100 denotes a point 50 pixels from the left edge of the screen and 100 pixels from the bottom edge of the screen.

Because of variation between participants in screen and window sizes and display resolution, raw coordinates will not necessarily map onto the same visual content for different participants. Within-participant, you can still map gaze locations to AOIs by cross-referencing the predicted gaze coordinates with the Zone rows at the top of your detailed eye tracking data file. The Zone rows include the coordinates of the left and top edges of each object (or image, for images displayed within an Image object) on the screen, plus their width and height. See 'Object locations' section below for more details.

Normalised coordinates (Predicted Gaze X Normalised, Predicted Gaze Y Normalised) are normalised to a value between 0 and 1. They represent the predicted gaze location as a proportion of the Gorilla stage, the 4:3 area in which Gorilla studies are presented. 0,0 is at the bottom-left of the stage, and 1,1 is at the top-right. Normalised coordinates are comparable between participants: for example, 0.5, 0.5 always denotes the centre of the stage.

We've created a mock-up image which should make this clearer (note: image not to scale). To work out the normalised X, we need to take into account the white space on the side of the Gorilla stage.

Schematic showing visual representation of normalised x and y coordinates

Recording Files

Object locations

At the top of your detailed eye tracking data file, you will find a number of rows with 'zone' in the Type column. These rows contain the locations of each object on the task screen. You can use these to determine which objects on the screen the participant was looking at.

For object locations, the key variables for each sample/row are:

'Zone Name'
- Name of the object, as entered in the Name setting in the Task Builder.
'Zone UID'
- Unique identifier for this object. You can find this just above the Name setting for this object in the Task Builder.
'Zone X' and 'Zone Y'
- Coordinate of the left and top edges of the object in pixels. 0,0 is at the bottom-left of the screen. These will vary between participants with different window and screen sizes. For Image objects, the coordinates refer to the boundaries of the image as displayed within the object, not the boundaries of the object itself.
'Zone W' and 'Zone H'
- Width and height of the object in pixels. These will vary between participants with different window and screen sizes. For Image objects, the coordinates refer to the width and height of the image as displayed within the object, not the width and height of the object itself.
'Zone X Normalised' and 'Zone Y Normalised'
- Coordinate of the left and top edges of the object in normalised space (i.e. as a proportion of the Gorilla stage). 0,0 is at the bottom-left of the stage. Normalised coordinates are comparable between different participants: 0.5,0.5 will always be the centre of the stage, regardless of how big the stage is. For Image objects, the coordinates refer to the boundaries of the image as displayed within the object, not the boundaries of the object itself.
'Zone W Normalised' and 'Zone H Normalised'
- Width and height of the object in normalised space (i.e. as a proportion of the Gorilla stage). These are comparable between different participants. For Image objects, the coordinates refer to the width and height of the image as displayed within the object, not the width and height of the object itself.

Prediction rows

Below the 'zone' rows you will find a number of rows with 'prediction' in the Type column. Each of these rows represents Webgazer’s prediction of where the participant is looking on the screen. The eye tracking runs as fast as it can, up to the refresh rate of the monitor (normally 60Hz), so under ideal conditions you should get about 60 samples per second.

For predictions, the key variables for each sample/row are:

‘Predicted Gaze X’ & ‘Predicted Gaze Y’
- Predicted gaze location in pixels. 0,0 is at the bottom-left of the screen. These will vary between participants with different window and screen sizes.
‘Predicted Gaze X Normalised’ & ‘Predicted Gaze Y Normalised’
- These are prediction locations normalised as a proportion of the Gorilla stage. 0,0 is at the bottom-left of the stage. Normalised coordinates are comparable between different participants: 0.5,0.5 will always be the centre of the stage, regardless of how big the stage is.
'Timestamp'
- This indicates the current absolute timestamp for recording each prediction, without any adjustment for frame rendering. It represents the time when the current prediction was uploaded to the file. This is an absolute timestamp in Unix time, i.e., measured in milliseconds since midnight on 1st January 1970. You can match it up with the UTC Timestamp column in your main task data file to cross-reference gaze predictions with other events in your task (e.g. mouse clicks).
- A prediction is requested every 10ms (100Hz), but the recorded timestamp may be slightly different. WebGazer.js does not provide a consistent sampling rate, as there is a slight variable delay in generating predictions – based on the participant’s computer and browser power.
- In tools that require a fixed sampling rate, calculate an average interval and generate a dummy column incrementing each sample by this average.
'Elapsed'
- This is the difference between the Timestamp in the current row and the Timestamp in the first eye tracking row in this data file. It represents the time that has passed in milliseconds since eye tracking started on this screen.

Calibration Files

You can also choose to collect detailed eye tracking data for the Eye Tracking (Webgazer) component when used in Calibrate mode. The format of these files differs somewhat from the eyetracking recording files.

Rows containing 'calibration' in the Type column do not include gaze predictions, as these cannot be made until the eyetracker has been trained/calibrated. Rows containing 'validation' in the Type column include gaze predictions for a number of samples for each calibration point. Rows containing 'accuracy' in the Type column contain the validation information for each calibration point.

'Interval'
- The difference between the values in the current and previous row in the 'Elapsed' column. This allows you to see at a glance the interval between gaze predictions.
'Failures'
- The number of calibration points failed in the current calibration. This is reported at the end of each calibration, in the row where the Type column contains the overall status of the calibration attempt (either 'calibration succeeded', 'calibration failed - retrying', or 'calibration failed - max failures').
'Face Detection Confidence'
- The Support Vector Machine (SVM) classifier score for the face model fit. The SVM rates how strongly the image under the model resembles a face. 0 (no fit) to 1 (perfect fit). Values over 0.5 are ideal.
'X of Point' and 'Y of Point'
- The raw coordinates in pixels of the current calibration point. 0,0 is at the bottom-left of the screen. These will vary between participants with different window and screen sizes.
'X of Point Normalised' and 'Y of Point Normalised'
- The coordinates of the current calibration point as a proportion of the Gorilla stage. 0,0 is at the bottom-left of the stage. Normalised coordinates are comparable between different participants: 0.5,0.5 will always be the centre of the stage, regardless of how big the stage is.
'X of Centroid' and 'Y of Centroid'
- The raw coordinates in pixels of the average centroid based on validation predictions for the current calibration point. 0,0 is at the bottom-left of the screen. These will vary between participants with different window and screen sizes.
'X of Centroid Normalised' and 'Y of Centroid Normalised'
- The coordinates of the average centroid based on validation predictions for the current calibration point, measured as a proportion of the Gorilla stage. 0,0 is at the bottom-left of the stage. Normalised coordinates are comparable between different participants: 0.5,0.5 will always be the centre of the stage, regardless of how big the stage is.
'SD of X Centroid' and 'SD of Y Centroid'
- Standard deviation of the validation data for the current centroid.
'SD of X Centroid Normalised' and 'SD of Y Centroid Normalised'
- Standard deviation of the normalised validation data for the current centroid. Normalised data are comparable between participants.

Pointers for analysing data

Once you have downloaded your eye tracking data in CSV format, making sure the timestamps are printed out in full, you can use the 'Type', 'Trial Number', 'Screen Index', and 'Timestamp' columns to filter data into a format usable with most eyetracking analysis toolboxes.

Using your preferred data processing tool (R, Python, Matlab etc), select rows containing ‘prediction’ in the 'Type' column, and then use 'Trial Number', 'Screen Index', and 'Timestamp' to separate each trial or timepoints of data capture.

The data produced by Webgazer and Gorilla works best for Area of Interest (AOI) type data analyses. This is where we pool samples into falling into different areas on the screen, and use this as an index of attention.

Due to the predictive nature of the models used for webcam eyetracking, the estimates can jump around quite a bit – this makes the standard fixation and saccade detection a challenge in lots of datasets.

Toolboxes for data analysis

R: ‘Saccades’
- Toolbox in R for saccade, fixation and blink detection.
- https://github.com/tmalsburg/saccades.
R: ‘Gazepath’
- Toolbox for converting eyetracking data into Fixations and Saccades for analysis.
- Simply needs X & Y, estimated distance (we suggest using a dummy variable) and a trial index.
R: ‘eyetrackingR’
- Area of Interest (AOI) based tracking. Here you specify windows of interest and the toolbox analyses data based on if the gaze is placed in these AOIs or not.
- Various tools available for: Window analysis (i.e. in a certain window of time where did people look), Growth Curve Analysis (i.e. modelling timecourse of attention towards targets), Cluster analysis (identify spatio-temporal clusters of fixations in your data)
- Tutorial for reading in data
- Note: you have to use the ‘add_aoi’ function to convert X,Y data into AOI data
Python: ‘Pygaze Analyzer’
- Basic tool for visualising: Raw data, Fixation maps, Scanpaths, Heatmaps
- Pygaze Analyser

FAQ

Check out our Visual World Paradigm (Eye Tracking) sample to try the component out and see an example setup.

This is only available if 'Collect Raw XY Data' is toggled on in the Eye Tracking (Webgazer) component settings before starting to collect data - see the Data section of this guide.

When previewing a task, you can access the detailed eye-tracking files via a unique URL for each screen that will be contained in the Response column of your main data file. When running a full experiment, you will find these files in an 'uploads' folder included in your data zip file.

The Eye Tracking (Webgazer) component only allows you to calibrate the tracker once it has detected a face in the webcam. You may need to move hair off the eyes, come closer to the camera, or move around.

Eye tracking via the browser is much more sensitive to changes in environment/positioning than lab-based eye tracking. To increase the chances of successful calibration, instruct your participants to ensure they are in a well-lit environment, to ensure that just one face is present in the central box on the screen, and to avoid moving their head during the calibration. You can also adjust the calibration settings to make the calibration criteria less stringent (although note that this may result in noisier and less reliable data).

Yes and No - but mostly No. The nature of Webgazer.js means that predictions will be a function of how well the eyes are detected, and how good the calibration is. Inaccuracies in these can come from any number of sources (e.g. lighting, webcam, screen size, participant behaviour).

The poorer the predictions, the more random noise they include, and this stochasticity prevents standard approaches to detecting fixations, blinks and saccades. One option is to use spatio-temporal smoothing -- but you need to know how to implement this yourself.

In our experience less than 30% of your participants will give good enough data to detect these things.

You will get the best results by using a heatmap, or percentage occupancy of an area of interest type analysis. Open the expandable 'Are there any studies published using eye tracking in Gorilla?' below to see examples of the analyses done in previously published papers.

You can, however, there are two main issues here 1) the calibration stage requires the participant to look at a series of coloured dots, which would be a challenge with young children, and 2) getting children to keep their head still will be more difficult. If the child is old enough to follow the calibration it should work, but you will want to check your data carefully and you may want to limit the time you are using the eye tracking for.

We've created a mock-up image which should make this clearer (note: image not to scale). To work out the normalised X, we need to take into account the white space on the side of the Gorilla stage, the 4:3 area in which Gorilla studies are presented.

For object locations (rows at the top of your detailed eye tracking data file with 'zone' in the Type column), Zone X Normalised refers to the left edge of the object, and Zone Y Normalised refers to the top edge of the object. Zone W Normalised is the width of the zone in normalised space (i.e. as a proportion of the width of the stage). Zone H Normalised is the height of the zone in normalised space (i.e. as a proportion of the height of the stage).

Note that for Image objects, the coordinates, width, and height refer to the image as displayed within the object, not the object itself.

The Analysis section of this guide provides explanations of the columns included in your eye tracking data. If you have a specific question about your data, you can get in touch with our support desk, but unfortunately we’re not able to provide extensive support for eye tracking data analysis. If you want to analyse the full coordinate eye tracking data, you should ensure you have the resources to conduct your analysis before you run your full experiment.

Some examples of published studies using eye tracking in Gorilla are listed below - please let us know if you have published or are writing up a manuscript!

Lira Calabrich, S., Oppenheim, G., & Jones, M. (2021). Episodic memory cues in the acquisition of novel visual-phonological associations: a webcam-based eyetracking study. In Proceedings of the Annual Meeting of the Cognitive Science Society (Vol. 43, pp. 2719-2725). https://escholarship.org/uc/item/76b3c54t

Greenaway, A. M., Nasuto, S., Ho, A., & Hwang, F. (2021). Is home-based webcam eye-tracking with older adults living with and without Alzheimer's disease feasible? Presented at ASSETS '21: The 23rd International ACM SIGACCESS Conference on Computers and Accessibility. https://doi.org/10.1145/3441852.3476565

Prystauka, Y., Altmann, G. T. M., & Rothman, J. (2023). Online eye tracking and real-time sentence processing: On opportunities and efficacy for capturing psycholinguistic effects of different magnitudes and diversity. Behavior Research Methods. https://doi.org/10.3758/s13428-023-02176-4