Other considerations are worth noting, however. Modeling exploration is not trivial, because it requires predicting that participants make a response that counters their general propensity to exploit the option with the highest value, and therefore any model of exploration requires knowing when this will occur. Because exploited options are sampled more often, their outcome uncertainties are generally lower than those of the alternative options. Thus, when subjects exploit, they are selecting the least uncertain option, making it more difficult to estimate the positive influence of uncertainty on exploration. As noted above, this problem is exacerbated by “sticky choice,” whereby participants’ choices in a given trial are often autocorrelated with those of previous trials independent of value. Finally, studies failing to report an effect of uncertainty on exploration have all used n-armed bandit tasks with dynamic reward contingencies across trials (Daw et al., 2006; Jepma et al., 2010; Payzan-LeNestour and Bossaerts, 2011), and participants responded as if only the very last trial was informative about value (Daw et al., 2006; Jepma et al., 2010). It may be more difficult to estimate uncertainty-driven exploration in this context, given that participants would be similarly uncertain about all alternative options that had not been selected in the most recent trial. In our behavioral paradigms and model fits, we have attempted to confront these issues, allowing us to estimate uncertainty, its effects on exploration, and the neural correlates of this relationship.

First, it is helpful to note the ways that the current paradigm is atypical in comparison to more traditional n-armed bandit tasks. Initially, the task was designed not to study exploration, but rather as a means of studying incremental learning in Parkinson’s patients as a function of dopamine manipulation (Moustafa et al., 2008). However, in the Frank et al. (2009) large-sample genetics study, it was observed that trial-by-trial RT swings appeared to occur strategically, and attempts to model these swings found that they were correlated with relative uncertainty. Importantly, this is not just a recapitulation of the finding that the model fits better when relative uncertainty is incorporated (i.e., ε is nonzero); much of this improvement in fit was accounted for by directional changes in RT from one trial to the next (RT swings). This distinction is important: in principle, a fitted nonzero ε could capture an overall tendency to respond to an action that is more or less certain. For example, if a subject exploits most of the time, ε would be negative (assuming the exploitation part of the model is imperfect in capturing all exploitative choices).
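The confound described above (exploited options are sampled more, so they end up less uncertain) can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the model fit reported here: it assumes a two-option task with fixed Bernoulli reward probabilities, Beta posteriors over each option's reward rate, and a mostly greedy chooser. The names `simulate`, `beta_sd`, and `relative_uncertainty_bonus`, along with all parameter values, are hypothetical and chosen only for illustration.

```python
import math
import random


def beta_sd(a, b):
    """Standard deviation of a Beta(a, b) distribution."""
    return math.sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))


def simulate(n_trials=200, p_reward=(0.7, 0.4), explore_rate=0.1, seed=0):
    """Mostly greedy chooser on a two-option task (illustrative assumption).

    Returns how often each option was chosen and the posterior SD
    (a simple stand-in for outcome uncertainty) of each option.
    """
    rng = random.Random(seed)
    counts = [[1, 1], [1, 1]]  # Beta(1, 1) prior: [alpha, beta] per option
    n_chosen = [0, 0]
    for _ in range(n_trials):
        means = [a / (a + b) for a, b in counts]
        greedy = 0 if means[0] >= means[1] else 1
        # Occasionally pick the non-greedy option at random.
        choice = greedy if rng.random() > explore_rate else 1 - greedy
        rewarded = rng.random() < p_reward[choice]
        counts[choice][0 if rewarded else 1] += 1
        n_chosen[choice] += 1
    sds = [beta_sd(a, b) for a, b in counts]
    return n_chosen, sds


def relative_uncertainty_bonus(eps, sds, exploited):
    """Hypothetical bonus: eps times (SD of alternative minus SD of exploited)."""
    return eps * (sds[1 - exploited] - sds[exploited])


if __name__ == "__main__":
    n_chosen, sds = simulate()
    exploited = 0 if n_chosen[0] >= n_chosen[1] else 1
    print("choices per option:", n_chosen)
    print("posterior SD per option:", [round(s, 3) for s in sds])
    print("bonus with eps = +0.5:", relative_uncertainty_bonus(0.5, sds, exploited))
    print("bonus with eps = -0.5:", relative_uncertainty_bonus(-0.5, sds, exploited))
```

In this sketch the more frequently chosen option ends with the smaller posterior SD, so a chooser that mostly exploits is mostly selecting the less uncertain option. A single signed weight on relative uncertainty could therefore come out negative for such a chooser even without any uncertainty-driven behavior, which is why trial-to-trial directional changes are more diagnostic than the sign of the weight alone.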
