Functional magnetic resonance imaging under the naturalistic paradigm (NfMRI) has shown a promising ability to approximate the functional activity of the brain in real life. Deep learning models such as the convolutional neural network (CNN), convolutional autoencoder (CAE) and deep belief network (DBN) have shown notable performance in identifying temporal patterns and functional brain networks (FBNs) from fMRI data, and most of these studies directly modelled the functional brain activities embedded in fMRI data. However, the hierarchical temporal and spatial organization of brain function under naturalistic conditions has rarely been investigated, and it is unknown whether hierarchical FBNs can be derived directly from volumetric fMRI data using deep learning models. In addition, because of the high dimensionality of fMRI volume images and the very large number of training parameters, manually designing the neural architecture of a deep learning model is time-consuming and suboptimal, which calls for an automatic search framework that learns the optimal network architecture. To tackle these problems, we propose a framework that combines a DBN with neural architecture search (NAS), termed Volumetric NAS-DBN, to directly model fMRI volume images under naturalistic conditions. Our results demonstrate that the DBN with the optimal architecture can effectively characterize the hierarchical organization of spatial distributions and temporal responses from volumetric fMRI data under naturalistic conditions.