Violence detection in videos aims to identify whether a violent action occurred within a video stream. Effective tools for intelligent video analysis are in high demand, especially for detecting violence in video streams. Such a solution could be applied to detecting inappropriate behavior in video feeds, aiding law enforcement in forensic cases, protecting children from inappropriate online content, and helping parents make informed decisions about what their kids should watch. Prior work on violence detection, particularly recently proposed deep learning approaches, seeks to identify violence in videos as a whole, without breaking the subject down into its underlying concepts. In this paper, we explore a different methodology for violence detection, which relies on two deep neural network (DNN) frameworks to learn spatio-temporal information from video clips under two scenarios: subjective- and conceptual-based. We leverage deep feature representations for each specific concept and aggregate them by training a shallow neural network on a binary classification problem to describe violence as a whole. Finally, we show that using more specific concepts is an intuitive and effective solution, and that these concepts complement one another to form a more robust definition of violence.