Prediction of the Realisation of an Information Need: An EEG Study

One of the foundational goals of Information Retrieval (IR) is to satisfy searchers' Information Needs (IN). Understanding how INs physically manifest has long been a complex and elusive process. However, recent studies utilising Electroencephalography (EEG) data have provided real-time insights into the neural processes associated with INs. Unfortunately, they have yet to demonstrate how this insight can practically benefit the search experience. As such, within this study, we explore the ability to predict the realisation of IN within EEG data across 14 subjects whilst partaking in a Question-Answering (Q/A) task. Furthermore, we investigate the combinations of EEG features that yield optimal predictive performance, as well as identify regions within the Q/A queries where a subject's realisation of IN is more pronounced. The findings from this work demonstrate that EEG data is sufficient for the real-time prediction of the realisation of an IN across all subjects with an accuracy of 73.5% (SD 2.6%) and on a per-subject basis with an accuracy of 90.1% (SD 22.1%). This work helps to close the gap by bridging theoretical neuroscientific advancements with tangible improvements in information retrieval practices, paving the way for real-time prediction of the realisation of IN.


INTRODUCTION
The primary objective of any Information Retrieval (IR) system is to fulfil a searcher's Information Need (IN) [4,14,18]. In the realm of IR, numerous endeavours have been dedicated to unravelling and defining the intricate concept of IN. Pioneering works, such as Taylor's Question Negotiation Process [19], the Anomalous State of Knowledge Model [1], and Wilson's Information Seeking Behavior [21], have significantly contributed to this pursuit. These works explore the essence of IN by examining user behaviour through techniques like user-system interactions [2], self-reflective notes [10], and interviews/questionnaires [13]. While these methods offer valuable insights, reporting IN by subjects is often challenging due to its intricate and elusive nature, thereby constraining the efficacy of user-based studies [18].
Consequently, over the last decade, a new line of research has endeavoured to address the inherent limitations by directly examining the neurological activity in the searcher's brain through the utilisation of neuroimaging technologies [14,16,17,18]. Research conducted at the crossroads of neuroscience and information retrieval is often referred to as NeuraSearch [15]. This interdisciplinary field has yielded numerous findings focused on the tangible representation of information needs (INs) within specific brain regions. For instance, Functional Magnetic Resonance Imaging (fMRI) was employed to observe subjects' brain activity in a Q/A task [17,18]. The results revealed a distributed network of brain regions commonly associated with IN, with varying activity levels in these regions based on whether the subject knew the answer to a question or needed to search for it. Further exploration [17] utilised fMRI data from a similar Q/A task to train a support vector machine (SVM) capable of distinguishing instances when a searcher possesses an IN. These investigations have presented compelling evidence regarding the existence and expression of INs within the minds of searchers. However, employing fMRI for such analyses presents several limitations. Firstly, the physical hardware of an fMRI machine is both sizable and costly, necessitating the subject to lie supine within the central bore of the apparatus while maintaining stillness, as outlined by Moshfeghi et al. [17]. Secondly, despite its fine spatial resolution, fMRI exhibits suboptimal temporal resolution, with each measurement taking a duration of 2 seconds [17]. This limitation is further exacerbated by the Blood Oxygenation Level Dependent (BOLD) signal's inherent delays [17,18]. Despite the valuable insights offered by fMRI, the cumbersome nature and high cost of the equipment, coupled with its temporal constraints, hinder its seamless integration into current IR systems.
Acknowledging the limitations inherent within fMRI data, researchers sought alternative neuroimaging methods to better depict the dynamics of INs with higher temporal resolution. One such approach involves the utilisation of Electroencephalography (EEG) data, a cheaper and more practical method, where electrical activity from the brain is recorded at a millisecond scale through electrodes placed on the subject's scalp [20]. In the research presented by [14], EEG data is employed to observe subjects' brain activity during a Q/A session. This investigation aims to understand the temporal dynamics of IN formation, detecting the presence of INs even before searchers consciously acknowledge them. This exploration opens avenues for a proactive search process, offering insights into the early stages of information needs. Although previous works [14] provided an excellent analysis of the physical manifestations of the realisation of an IN in real time through the use of EEG data, the question "Can the realisation of an IN be predicted in real time?" is still unanswered. From this question, we formulate four research questions: RQ1: "Is it possible to predict the realisation of an IN in real-time from EEG data?", RQ2: "Can prediction of the realisation of IN be generalised across subjects, or is it subject-specific?", RQ3: "During the Q/A session, where are the strongest indicators of the realisation of an IN?", RQ4: "What combination of features is optimal for the realisation of an IN prediction?".
To address our first research question, we use EEG data gathered from 14 subjects whilst they took part in a Q/A task, in which the subjects observed queries word-by-word and determined whether they could correctly answer the question or had a need to search (IN). This data is then provided to machine learning models to predict the subject's realisation of an IN. Additionally, we investigate the inter- and intra-subject variability of EEG data by exploring how the prediction of the realisation of an IN is affected when the models are trained to generalise across subjects compared to when they are trained on a single subject at a time. Moreover, whilst subjects examine the queries from the Q/A tasks word-by-word, we determine which segments within the given sentences are the strongest indicators of the realisation of an IN. Finally, we perform an ablation analysis to discern the combination of commonly extracted EEG features that enables the models to best distinguish between different search states.

METHODOLOGY
Subjects. The subjects were recruited by the University of Strathclyde. They received no monetary payments but were eligible for academic credits. The subjects consisted of 13 females (93%) and 1 male (7%) within an age range between 18 and 39 years and a mean age of 23 years (SD 6.5).
Recording. The EEG data was captured using a 40-electrode NeuroScan Ltd. system with a 10/20 cap, sampled at a frequency of 500Hz.
Q/A Dataset. The Q/A task was made up of general knowledge questions taken from TREC-8, TREC-2001, and the B-KNorms Database. Two independent assessors separately evaluated the question difficulty (Cohen's Kappa: 0.61). A subset of 120 questions was then selected, and both annotators agreed upon their difficulties. The difficulty of the questions was equally distributed between easy and difficult for the overall dataset.
Experimental Procedure. Ethical approval to conduct the study was granted by the University's Ethics Committee, with the tasks being conducted in a laboratory setting and all subjects meeting the inclusion criteria, i.e. healthy subjects aged 18-55 years, fluent English ability, and no prior/current neurological disorders that may influence the task. Before any trials began, consent was obtained from the subjects. To ensure the subjects had a solid grasp of the procedure, before the main trial they were supplied with a practice example, which consisted of five questions not included in the main trial. For the practice session, there was no time limit, and subjects were allowed to repeat it if required, until comfortable to proceed. The following task procedure was repeated for each trial. The trial began with a fixation cross shown in the middle of the screen for a duration of 2000ms, indicating the location of the next stimulus on the screen, which served to minimise eye movements. The subjects then viewed a sequential presentation of a question randomly selected from the dataset. Each word within the question was displayed on the screen for 800ms, one at a time. Within this step, the subject processed the information as it was being presented word-by-word. Following the presentation of all the words within the question, the subjects were presented with the now fully-displayed question and three on-screen answer choices associated with the question. They were requested to select the correct answer or the option "I do not know", see Figure 1. If the subject answered the question, whether correctly or incorrectly, the answer was displayed on-screen (NoNeedToSearch), after which the trial terminated and moved on to the next question. However, if the subject selected "I do not know", they were presented with two options: whether they wanted to search (NeedToSearch) for the correct answer or not (NoNeedToSearch). For this task, there was no search process, as the overall goal was to analyse the presence of an information need based on the subject's decision to search. After selecting one of the two options, the trial would terminate, and the next question would be presented. This was repeated for all 120 questions. Upon task completion, analysis of the 14 subjects revealed that 85% of the responses were classed as NoNeedToSearch, and the remaining 15% were NeedToSearch. In order to balance the dataset, the number of NoNeedToSearch instances was made equal to the number of NeedToSearch instances. The subjects completed the task (without breaks) on average in 44 min (SD 4.62, median 43.40).
Pre-processing. During EEG recording, the individual's actions often introduce electrical activities that can affect measurements and distort results. To address this issue, it is essential to eliminate these artefacts as effectively as possible. Initially, we utilised a bandpass filter [5,9] with a range of 0.5 to 50Hz. This range is commonly used because research indicates that the brain's recorded electrical activity falls within this spectrum. Additionally, we implemented average re-referencing [11], a technique that establishes a reference point by aggregating the activity measured across all electrodes. The objective is to capture any noise or interference impacting all electrodes within this reference. Subsequently, we subtract this reference from each electrode's signal, effectively eliminating the shared noise from each electrode's signal.
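The average re-referencing step described under Pre-processing can be illustrated with a minimal sketch. It assumes the recording is held as a plain list of equal-length channels; the function name `average_rereference` and the toy values are hypothetical and are not taken from the study's pipeline. The band-pass filter is omitted, as it would normally come from a signal-processing library.

```python
def average_rereference(recording):
    """Subtract the across-electrode mean from every electrode at each sample.

    `recording` is a list of channels; each channel is a list of samples,
    and all channels are assumed to have the same length.
    """
    n_channels = len(recording)
    n_samples = len(recording[0])
    # Copy the channels so the input recording is left untouched.
    referenced = [list(channel) for channel in recording]
    for t in range(n_samples):
        # The reference at time t is the mean over all electrodes.
        reference = sum(channel[t] for channel in recording) / n_channels
        for channel in referenced:
            channel[t] -= reference
    return referenced


# Toy recording (assumed values): 3 electrodes x 4 samples sharing an offset of 10.
toy = [
    [11.0, 12.0, 11.0, 10.0],
    [10.0, 10.0, 10.0, 10.0],
    [9.0, 8.0, 9.0, 10.0],
]
cleaned = average_rereference(toy)
```

After re-referencing, the electrodes sum to zero at every sample, so any component shared by all electrodes (here, the offset of 10) is removed.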
Feature Extraction. For this study, we extract a commonly adopted core set of features [9,22] from the EEG signals to determine which combination is optimal for IN classification, with each feature being extracted from the per-electrode (channel) signals across five specific frequency bands: delta (1-4 Hz), theta (4-8 Hz), alpha (8-12 Hz), beta (12-30 Hz) and gamma (30-40 Hz). The features are extracted across every 800ms block where the question words are presented to the subject and when they respond to the question (NoNeedToSearch and NeedToSearch). The list of the features is as follows:
Mean: Calculates the average amplitude of the EEG signal within the specified frequency band over the 800ms block. It indicates the central tendency of the signal, helping to characterise the overall activity level [22].
Standard Deviation: Measures the variability or spread of the EEG signal within the frequency band [22].
Skewness: Quantifies the asymmetry of the EEG signal's distribution within the frequency band [9].
Kurtosis: Measures the "tailedness" of the EEG signal's distribution within the frequency band [9].
Curve Length: Calculates the cumulative Euclidean distance between consecutive data points in the EEG signal within the frequency band [22].
Number of Peaks: Counts the number of local maxima or peaks in the EEG signal within the frequency band [22].
Average Non-Linear Energy: Quantifies the non-linear dynamics of the EEG signal within the frequency band [22].
Experiment Conditions. As the overall goal of this study is to explore the best methods for predicting IN from searchers (RQ1), we explore three experimental parameters motivated by the research questions in Section 1: Generalised & Personalised training, Window Size, and Feature Combination.
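The seven features listed under Feature Extraction can be sketched for a single band-filtered 800ms block as follows. This is an illustration under stated assumptions rather than the study's implementation: the population (not sample) moments, the non-excess kurtosis, the curve length taken as the sum of absolute differences between consecutive samples, and the Teager-Kaiser form of the non-linear energy are all assumed conventions, and the function name `extract_features` is hypothetical.

```python
import math


def extract_features(signal):
    """Return the seven features for one band-filtered block of samples."""
    n = len(signal)
    mean = sum(signal) / n
    centred = [x - mean for x in signal]
    # Central moments (population form, assumed convention).
    m2 = sum(d ** 2 for d in centred) / n
    m3 = sum(d ** 3 for d in centred) / n
    m4 = sum(d ** 4 for d in centred) / n
    std = math.sqrt(m2)
    # A flat signal has undefined skewness/kurtosis; report 0.0 in that case.
    skewness = m3 / std ** 3 if std > 0 else 0.0
    kurtosis = m4 / m2 ** 2 if m2 > 0 else 0.0
    # Curve length: sum of absolute differences between consecutive samples.
    curve_length = sum(abs(b - a) for a, b in zip(signal, signal[1:]))
    # A peak is a sample strictly larger than both of its neighbours.
    n_peaks = sum(
        1 for i in range(1, n - 1)
        if signal[i - 1] < signal[i] > signal[i + 1]
    )
    # Teager-Kaiser non-linear energy: x[i]^2 - x[i-1] * x[i+1], then averaged.
    energies = [
        signal[i] ** 2 - signal[i - 1] * signal[i + 1]
        for i in range(1, n - 1)
    ]
    avg_nonlinear_energy = sum(energies) / len(energies) if energies else 0.0
    return {
        "mean": mean,
        "std": std,
        "skewness": skewness,
        "kurtosis": kurtosis,
        "curve_length": curve_length,
        "n_peaks": n_peaks,
        "avg_nonlinear_energy": avg_nonlinear_energy,
    }


# Toy block (assumed values), standing in for one 800ms band-filtered segment.
example = extract_features([0.0, 1.0, 0.0, -1.0, 0.0])
```

In a full pipeline, this function would be applied to every electrode and every frequency band of each block, and the resulting values concatenated into one feature vector per block.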
Each of the resulting 127 feature combinations was then input into the classifier, with the model's performance serving as the metric to assess the effectiveness of the various feature combinations.
Predictive Models. For this study, we incorporated each of the aforementioned experiment conditions into a training loop, with the Generalised and Personalised conditions run separately, each iterating through every possible Feature Combination and Window Size. The classifiers selected for this task were the Support Vector Machine (SVM) [7], Random Forest Classifier [3], and AdaBoost [6] models, as they have seen substantial success within the realm of EEG classification [8,12] and are well suited to the limited quantity of data available for this task. Prior to this investigation, several Deep and Recurrent Neural Networks were trained on the collected EEG data; however, their performance was sub-optimal as they were limited by the number of samples within the dataset. Each dataset provided to the model through each of the possible combinations of experiment conditions was evaluated using 5-fold cross-validation. Each fold returned the following metrics: Accuracy, Precision, and Recall, which were averaged across folds and reported along with their standard deviation.
Baseline. Since there are no prior works to compare to, we introduce a baseline that represents an untrained model whose predictions are all based on a random choice, i.e. its accuracy is set to 50%.
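The evaluation protocol described under Predictive Models (5-fold cross-validation reporting the mean and standard deviation of Accuracy, Precision, and Recall) can be sketched as below. All names are hypothetical, and `midpoint_rule` is a deliberately trivial stand-in classifier on a one-dimensional feature, not one of the SVM, Random Forest, or AdaBoost models used in the study. The shuffling seed and the use of the population standard deviation across folds are also assumptions.

```python
import random
import statistics


def kfold_indices(n_samples, k=5, seed=0):
    """Shuffle the sample indices and deal them into k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def score(y_true, y_pred, positive=1):
    """Return (accuracy, precision, recall) for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


def cross_validate(features, labels, fit_predict, k=5):
    """Run k-fold cross-validation and return {metric: (mean, sd)}."""
    per_fold = []
    for held_out in kfold_indices(len(labels), k):
        held = set(held_out)
        train = [i for i in range(len(labels)) if i not in held]
        y_pred = fit_predict(
            [features[i] for i in train],
            [labels[i] for i in train],
            [features[i] for i in held_out],
        )
        per_fold.append(score([labels[i] for i in held_out], y_pred))
    names = ("accuracy", "precision", "recall")
    return {
        name: (statistics.mean(values), statistics.pstdev(values))
        for name, values in zip(names, zip(*per_fold))
    }


def midpoint_rule(train_x, train_y, test_x):
    """Stand-in classifier: threshold halfway between the two class means."""
    mean0 = statistics.mean(x for x, y in zip(train_x, train_y) if y == 0)
    mean1 = statistics.mean(x for x, y in zip(train_x, train_y) if y == 1)
    threshold = (mean0 + mean1) / 2
    return [1 if x > threshold else 0 for x in test_x]


# Toy, perfectly separable data (assumed): 10 non-IN and 10 IN samples.
toy_x = [float(v) for v in range(10)] + [float(v) for v in range(20, 30)]
toy_y = [0] * 10 + [1] * 10
summary = cross_validate(toy_x, toy_y, midpoint_rule)
```

Passing the classifier in as a `fit_predict` callable keeps the loop independent of any particular model, which mirrors how the same protocol is applied to each classifier and experiment condition.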

RESULTS & CONCLUSION
The results produced for the Generalised and Personalised conditions are detailed in Tables 1 and 2 respectively. Each of these tables denotes the Model, the selected window size (W-Size), and the best-performing feature combination at the given window size with its corresponding Accuracy, Precision, and Recall scores. We also performed a paired Wilcoxon test between the predictions obtained for each model and the baseline to check the significance of the difference. All of the results obtained from our models trained on a set of features were significantly different from the baseline (p < 0.01).
We first address RQ1 by reviewing the results produced in both the Generalised and Personalised conditions. As we can see, the prediction of the realisation of IN is possible, as every model in Table 1 and Table 2 was able to achieve an accuracy score above that of random classification (50%), with the lowest reported accuracy score being the RandomForest classifier with an accuracy of 68.9% (SD 19.7%) and the highest being the AdaBoost model with a score of 90.1% (SD 22.1%), as seen in Table 2. These results demonstrate that EEG data is capable of achieving greater Generalised and Personalised accuracy for the prediction of the realisation of IN than alternative neuroimaging techniques such as fMRI [17].
By comparing the performance of the Generalised and Personalised models, it can be observed that the Personalised approach achieves the highest overall prediction accuracy, evidenced by the AdaBoost model that obtained 90.1% (SD 22.1%) in Table 2. When trained using the Personalised method, the average accuracy across window sizes of the RandomForest, SVM, and AdaBoost models increases over their Generalised counterparts by 1.4%, 4.2%, and 13% respectively. However, this increased accuracy also comes with an increased standard deviation, with the Personalised RandomForest, SVM, and AdaBoost models on average across window sizes having a higher standard deviation of 19.3%, 21.8%, and 22.4% respectively than the Generalised models. These findings help to address RQ2, as they suggest that, on average, creating a model tailored to each subject is the best approach for predicting the realisation of IN, as evidenced by the performance of the AdaBoost model in Table 2. However, the variation in standard deviation indicates that the Generalised models offer a more robust and reliable prediction accuracy. This difference follows the trend observed in prior works [17] and is likely due to the natural variability in EEG data collected across subjects. As such, for future systems aiming to predict the realisation of an IN from EEG, the best approach may be to assess the model performance on individual subjects and determine whether the trade-off between accuracy and standard deviation is acceptable, or whether a generalised model with a lower accuracy but a lower, more reliable standard deviation is more suitable for their specific research purposes.
Regarding RQ3, the results presented in Table 1 and Table 2 are in contrast to each other. In the Generalised condition (Table 1), we observe that all models achieve their peak performance when the window size is set to 2, with the RandomForest, SVM and AdaBoost models achieving an accuracy of 73.5%, 69.7%, and 70.6% respectively. As the window size is increased from 2 up to 16, the performance of the RandomForest, SVM, and AdaBoost models decreases by 2%, 0.7%, and 0.6% respectively. Conversely, in the Personalised results (Table 2), we observe that at the window size of 16, the RandomForest and AdaBoost models achieve their highest performance of 76.6% and 90.1% respectively. As the window size is increased from 2 up to 16, the accuracy of the RandomForest and AdaBoost models increases by 7.7% and 16.2% respectively. The exception is the SVM model, which follows the same trend as the Generalised models.
The results produced by the Generalised approach suggest that the distinctive EEG patterns associated with the realisation of an IN may be more strongly concentrated immediately after the subject concludes their review of the question. This might be indicative of a universal or commonly shared cognitive response that occurs promptly after the comprehension of a question, highlighting a quick and standardised recognition process for INs across subjects. In contrast to this, the performance of the Personalised models indicates that for individual subjects, the discernible EEG patterns unfold over a more extended period. This could be influenced by varying cognitive styles, attention spans, or information processing speeds unique to each subject. Subjects might take more time to process and formulate their information needs, leading to a prolonged period of activity associated with information-seeking.
Lastly, addressing RQ4, we can observe that the best-performing feature is the Mean value taken from the EEG segments, as it appeared in every single best-performing combination at each window size for both generalised and personalised training (Tables 1 and 2).
In conclusion, the findings of this study demonstrate that through the use of Electroencephalography (EEG) data, we were able to predict the realisation of IN substantially above the random baseline classification accuracy of 50%, with models achieving up to 90.1% accuracy. This work is the first to demonstrate the prediction of the realisation of IN through the use of EEG data, and at an accuracy higher than any other previously utilised neuroimaging technique, paving the way to real-time prediction of the realisation of IN. Furthermore, the encouraging results obtained from the Generalised and Personalised conditions will help to inform future research and Information Retrieval (IR) systems that seek to incorporate prediction of the realisation of IN, by taking into consideration the inter- and intra-subject variability of EEG data and examining the trade-off between a potentially more accurate subject-specific model and a more reliable generalised model. Moreover, we also highlighted optimal ranges within queries that should be examined to provide the strongest indicators of the realisation of IN, as well as the optimal combinations of features that should be considered for the prediction of the realisation of IN within subjects.
Generalised & Personalised: To address research question RQ2, two training methods were devised. For the first approach, the samples relating to IN and non-IN from every subject were combined into a single dataset that was then passed to the model. This is the generalised training strategy, used to assess how well our classifier can discern IN across all subjects, as EEG has been noted to be heavily subject-dependent. The second approach maintained the IN and non-IN EEG data at a subject level, allowing us to assess the variability of subject performance for IN prediction.
Window Size: To address RQ3, we adjusted the number of question segments (words) utilised by our classifier. This modification involved implementing an expanding window, starting from the onset of the subject's search decisions: NoNeedToSearch and NeedToSearch. On average, each question comprised seven segments, encompassing both words and responses. The minimum segment count was 4, while the maximum reached 16 segments. In this investigation, we explored the expanding window with four distinct sizes: 2, 4, 8, and 16. These sizes represented the range from the moment of question response to the full length of the question, including the response, see Figure 2. The objective was to ascertain the segments that the classifier favoured for distinguishing between IN and non-IN instances, potentially revealing where the realisation of IN was most pronounced during the question review process.
Feature Combination: In accordance with RQ4, one of the primary aims of this study is to determine the optimal combinations of features commonly employed in EEG classification for effectively predicting the realisation of IN. As elaborated in Section 2, we identified and extracted seven key features for this investigation. Generating an exhaustive list of all possible combinations from these seven features resulted in 127 combinations.
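The Window Size and Feature Combination conditions can be illustrated with a short sketch. The count of 127 follows from taking every non-empty subset of seven features (2^7 - 1 = 127). The expanding window is read here as keeping the last `window_size` segments counted back from the response; this reading, the feature identifiers, and the handling of questions shorter than the window (the whole question is kept) are assumptions, not details confirmed by the study.

```python
from itertools import combinations

# Hypothetical identifiers for the seven extracted features.
FEATURES = [
    "mean",
    "std",
    "skewness",
    "kurtosis",
    "curve_length",
    "n_peaks",
    "avg_nonlinear_energy",
]


def feature_combinations(features):
    """Yield every non-empty subset of the feature names."""
    for size in range(1, len(features) + 1):
        yield from combinations(features, size)


def expanding_window(segments, window_size):
    """Keep the last `window_size` segments, counted back from the response.

    If the question has fewer segments than the window, all segments are kept.
    """
    return segments[-window_size:]


all_combos = list(feature_combinations(FEATURES))

# Toy trial (assumed): six word segments followed by the response segment.
trial = ["w1", "w2", "w3", "w4", "w5", "w6", "response"]
smallest = expanding_window(trial, 2)
largest = expanding_window(trial, 16)
```

Under this reading, a window of 2 covers the response and the final word, while a window of 16 covers every segment of any question in the dataset.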
Beyond the Mean, a large subset of key features sees substantial use across both training conditions. For the Generalised condition, in order of occurrence, Curve Length appears in 7 optimal combinations, Average Non-Linear Energy in 5, Standard Deviation in 4, Number of Peaks in 2, and Skewness in 1. Similarly, for the Personalised condition, Average Non-Linear Energy appears in 4, Curve Length in 3, Kurtosis in 2, and Standard Deviation in 1. Our results indicate that these features are strong performers in predicting the realisation of IN.

Table 1: Prediction accuracy of our Generalised approach. The standard deviation is presented in parentheses. The best-performing model is highlighted in bold.

Table 2: Prediction accuracy of our Personalised approach. The standard deviation is presented in parentheses. The best-performing model is highlighted in bold.