Image-based Vision Model:
Vision models take different forms depending on the purpose for which they were developed. For example, most existing spatial vision models take stimulus properties such as the spatial frequency, contrast, and location of well-defined spatial patterns as model inputs. These models have served their purpose very well as probes of the visual mechanisms underlying the processing of spatial patterns. In addition, there are image-based vision models (e.g., Watson and Ahumada, 2005), where the inputs are images, i.e., pixel-based distributions. Such models can be used in real industrial applications to handle targets of arbitrary shape.
         The following describes our work on an image-based vision model, which simulates biological visual image processing in the visual system.

Visual Image Processing: We developed a framework to simulate and compute human visual performance based on the ideas of implicit masking, nonlinear processing, and other well-known properties of the visual system that have been used in many models. The basic functional components of this model are a front-end low-pass filter, a retinal nonlinearity, a cortical frequency representation with a frequency-dependent nonlinear process, and finally a decision stage.

Low-pass Filtering: When the light-modulated information of an image enters the human eye, it passes through the eye's optics and is captured by photoreceptors in the retina. One function of the photoreceptors is to sample the continuous spatial variation of the image discretely. The cone signals are further processed, with some re-sampling, through horizontal cells, bipolar cells, amacrine cells, and ganglion cells. From an image-processing point of view, the combined effect of the eye's optics, sampling, and re-sampling in the retinal mosaic is low-pass filtering.

We estimate the front-end filter from psychophysical experiments. It has been shown that visual behavior at high spatial frequencies follows an exponential curve. Yang et al. (1995) extrapolated this relationship to low spatial frequencies to describe the whole front-end filter with an exponential function of spatial frequency:

LPF(f)  = Exp(-a f),                                                                             (1)

where a is a parameter specifying the rate of attenuation for a specific viewing condition. Yang and Stevenson (1997) modified the formula to account for the variation of a with the mean luminance of the image:

a = a0 + d L0^0.5,                                                                                  (2)

where a0 and d are two parameters and L0 is the mean luminance of the image.
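As a minimal numpy sketch of Eqs. 1-2 (the function name and the values of a0 and d here are illustrative assumptions, not the fitted parameters):

```python
import numpy as np

def front_end_lpf(f, L0, a0=0.05, d=0.02):
    """Front-end low-pass filter of Eq. 1, with the attenuation rate a
    depending on the mean luminance L0 as in Eq. 2.
    a0 and d are placeholder values, not fitted ones."""
    a = a0 + d * np.sqrt(L0)              # Eq. 2: a = a0 + d * L0^0.5
    return np.exp(-a * np.asarray(f))     # Eq. 1: LPF(f) = Exp(-a f)
```

The filter is 1 at zero frequency and attenuates higher spatial frequencies more steeply as the mean luminance grows.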

Retinal Compressive Nonlinearity: The retina contains several major layers of cells, starting from the photoreceptors (rods and three types of cones), through horizontal cells, bipolar cells, and amacrine cells, to the ganglion cells, where the information is transmitted out of the retina via optic nerve fibers to the central brain. Retinal processing includes light adaptation, whereby the retina becomes less sensitive if continuously exposed to bright light. The adaptation effects are spatially localized. In the current model, the adaptation pools are assumed to be constrained by ganglion cells with an aperture window:

Wg(x, y) = Exp[-(x^2 + y^2)/(2 rg^2)]/(2π rg^2),                                            (3)

where rg is the standard deviation of the aperture. The adaptation signal at the level of ganglion cells, Ig, is the convolution of the low-passed input image Ic with the window function Wg. In this algorithm, the window profile is approximated as spatially invariant by considering only foveal vision. The retinal signal IR is the output of a compressive nonlinearity. The form of this nonlinear function is assumed here to be the Naka-Rushton equation, which has been widely used in models of retinal light adaptation. One major difference here is that the adaptation signal Ig in the denominator is a pooled signal, which makes the operation similar to divisive normalization:

IR = w0 (1 + I0^n) Ic^n/(Ig^n + I0^n w0^n),                                                                                         (4)

where n and I0 are parameters representing the exponent and the semi-saturation constant of the Naka-Rushton equation, respectively, and w0 is a reference luminance value. When Ic and Ig are both equal to w0, the retinal output signal equals the input signal strength.
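A rough numpy sketch of the retinal stage, Eqs. 3-4 (my own illustration, not the authors' code; the function names, the pixel-unit rg, and the parameter values n, I0, w0 are all assumptions):

```python
import numpy as np

def adaptation_signal(Ic, rg=2.0):
    """Pooled adaptation signal Ig: circular convolution of the low-passed
    image Ic with the Gaussian aperture of Eq. 3 (standard deviation rg,
    here in pixels), computed via the frequency domain."""
    Ic = np.asarray(Ic, dtype=float)
    ny, nx = Ic.shape
    y = np.fft.fftfreq(ny) * ny            # wrapped pixel offsets
    x = np.fft.fftfreq(nx) * nx
    X, Y = np.meshgrid(x, y)
    Wg = np.exp(-(X**2 + Y**2) / (2 * rg**2))
    Wg /= Wg.sum()                         # unit sum, matching 1/(2*pi*rg^2) in Eq. 3
    return np.real(np.fft.ifft2(np.fft.fft2(Ic) * np.fft.fft2(Wg)))

def retinal_nonlinearity(Ic, Ig, n=2.0, I0=1.0, w0=1.0):
    """Compressive Naka-Rushton nonlinearity of Eq. 4, with the pooled
    adaptation signal Ig in the denominator (divisive normalization)."""
    Ic = np.asarray(Ic, dtype=float)
    return w0 * (1 + I0**n) * Ic**n / (np.asarray(Ig)**n + I0**n * w0**n)
```

For a uniform image at the reference luminance w0, the pooled signal Ig equals the input and the output IR reproduces the input strength, as stated above.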

Cortical Compressive Nonlinearity: Simple and complex cells in the visual striate cortex usually respond to stimuli within limited ranges of spatial frequency and orientation. To capture this frequency- and orientation-specific nonlinearity, one can transform the image IR from the spatial domain to a frequency-domain representation T(fx, fy) via a Fourier transform, dividing by nx ny to normalize the amplitude in the frequency domain. Here fx and fy are the spatial frequencies in the x and y directions, respectively, and nx by ny is the number of image pixels.
     These cells also exhibit nonlinear properties: their firing rate does not increase until the stimulation strength exceeds a threshold level, and it saturates when the stimulation is very strong. In the model calculation, the signal in the frequency domain passes through the same type of compressive nonlinear transform as in the retinal processing. Following the concept of frequency spread in implicit masking (see Fig 2), one major step here is to compute the frequency spreading that contributes to the masking signal in the denominator of the nonlinear formula. In this model, the signal strength in the masking pool, Tm(fx, fy), is the convolution of the absolute signal amplitude |T(fx, fy)| with an exponential window function:

Wc(fx, fy) = Exp[-(fx^2 + fy^2)^0.5/s],                                                         (5)

where s correlates with the extent of the frequency spreading and the bandwidth of frequency channels. Since the bandwidth of frequency channels increases with spatial frequency, one would expect s to increase with spatial frequency as well. To simplify the computation, however, s is approximated as a fixed value in the current algorithm. Applying the same form of compressive nonlinearity as in the retina, the cortical signal in the frequency domain is expressed as:

Tc = sign(T) w0 (1 + T0^v) |T|^v/(Tm^v + T0^v w0^v),                                        (6)

where v and T0 are parameters representing the exponent and the semi-saturation constant of the Naka-Rushton equation for the cortical nonlinear compression, respectively. The term Tm in the denominator includes the energy spread of the DC component (i.e., at 0 cpd) of the spatial pattern. Under Eq. 6, this component is processed in the same way as any other frequency maskers. Thus, the concept of implicit masking is naturally implemented in the image-processing framework. In summary, the major process in the cortex is modeled by a compressive nonlinearity applied to the spatial frequency and orientation components. The cortical image representation in the frequency domain is given by the function Tc. This function can be used to calculate visual responses and to simulate visual performance in detecting spatial patterns, as well as to estimate perceived brightness.
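The cortical stage, Eqs. 5-6, can be sketched in numpy along the same lines (again my own illustration, not the authors' implementation; the function names, the fixed spread extent s in cycles/image, the generalization of sign(T) to T/|T| for complex FFT coefficients, and the parameter values are all assumptions):

```python
import numpy as np

def masking_pool(T, s=4.0):
    """Masking-pool strength Tm(fx, fy): circular convolution of the
    amplitude |T| with the exponential spread window of Eq. 5.
    The fixed extent s (cycles/image here) is a placeholder value."""
    T = np.asarray(T)
    ny, nx = T.shape
    fy = np.fft.fftfreq(ny) * ny           # wrapped frequency coordinates
    fx = np.fft.fftfreq(nx) * nx
    FX, FY = np.meshgrid(fx, fy)
    Wc = np.exp(-np.sqrt(FX**2 + FY**2) / s)   # Eq. 5
    # convolution over the frequency plane via a further FFT pair
    return np.real(np.fft.ifft2(np.fft.fft2(np.abs(T)) * np.fft.fft2(Wc)))

def cortical_nonlinearity(T, Tm, v=2.0, T0=1.0, w0=1.0):
    """Compressive cortical nonlinearity of Eq. 6, applied pointwise.
    sign(T) is generalized to T/|T| so the phase of each complex FFT
    coefficient is preserved; parameter values are placeholders."""
    T = np.asarray(T, dtype=complex)
    mag = np.abs(T)
    sign = T / np.where(mag > 0, mag, 1.0)  # 0 where T is 0
    return sign * w0 * (1 + T0**v) * mag**v / (np.asarray(Tm)**v + T0**v * w0**v)

# Cortical representation Tc of a retinal image IR:
#   T  = np.fft.fft2(IR) / IR.size   # normalized by nx*ny
#   Tc = cortical_nonlinearity(T, masking_pool(T))
```

Because the DC coefficient enters the masking pool like every other frequency component, the implicit-masking behavior described above falls out of this computation with no special casing.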