In this paper we compare the use of several features in the task of content filtering for video social networks, a very challenging task, not only because the unwanted content is related to very high-level semantic concepts (e.g., pornography, violence, etc.) but also because videos from social networks are extremely assorted, limiting the use of a priori information. We propose a simple method, able to combine diverse evidence, coming from different features and various video elements (entire video, shots, frames, keyframes, etc.). We evaluate our method in two social network applications, related to the detection of unwanted content — pornographic videos and violent videos. Using challenging test databases, we show that this simple scheme is able to obtain good results, provided that adequate features are chosen. Moreover, we establish the use of spatiotemporal local descriptors as critical to the success of the method in both applications.