Perception-based learning approaches to robotic grasping have shown significant promise, which has been further reinforced by the use of supervised deep learning on robotic arms. However, training deep networks properly and preventing overfitting requires massive datasets of labelled samples. Creating such datasets by human labelling is an exhausting task, since most objects can be grasped at multiple points and in several orientations. Accordingly, this work employs a self-supervised learning technique in which the training dataset is labelled by the robot itself. We propose a cascaded network that reduces the time of the grasping task by eliminating ungraspable samples from the inference process. In addition to the grasping task, which performs pose estimation, we extend the network to perform an auxiliary task, object classification, for which the data can easily be labelled by a human. The network is capable of estimating 18 grasping poses and classifying 4 objects simultaneously. The experimental results show that the proposed network achieves an accuracy of 94.8% in estimating the grasping pose and 100% in classifying the object category, in 0.65 seconds.
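The cascaded inference idea can be illustrated with a minimal sketch. It is not the network described in this work: the function names, the 0.5 threshold, the 8-dimensional feature vectors, and the toy linear "models" are all assumptions chosen for illustration. Only the counts of 18 grasping poses and 4 object classes come from the text. The sketch shows a first stage discarding ungraspable samples so that the pose and classification heads run only on the remainder.

```python
import numpy as np

NUM_POSES = 18    # number of grasping poses, as stated in the abstract
NUM_CLASSES = 4   # number of object categories, as stated in the abstract


def cascaded_inference(samples, graspable_fn, heads_fn, threshold=0.5):
    """Hypothetical two-stage inference.

    Stage 1 scores each sample for graspability and drops those below
    `threshold` (an assumed value). Stage 2 runs the pose and class heads
    only on the surviving samples.
    """
    scores = graspable_fn(samples)                  # shape (N,)
    keep = np.flatnonzero(scores >= threshold)      # indices that pass stage 1
    if keep.size == 0:
        empty = np.empty(0, dtype=int)
        return keep, empty, empty
    pose_logits, class_logits = heads_fn(samples[keep])
    return keep, pose_logits.argmax(axis=1), class_logits.argmax(axis=1)


# Toy stand-ins for the trained networks (assumptions, not the real models).
_rng = np.random.default_rng(0)
_W_pose = _rng.normal(size=(8, NUM_POSES))
_W_cls = _rng.normal(size=(8, NUM_CLASSES))


def toy_graspable(x):
    # Sigmoid of the feature sum, standing in for a graspability score.
    return 1.0 / (1.0 + np.exp(-x.sum(axis=1)))


def toy_heads(x):
    # Two linear heads sharing the same input features.
    return x @ _W_pose, x @ _W_cls
```

Calling `cascaded_inference(samples, toy_graspable, toy_heads)` on an array of shape `(N, 8)` returns the indices of the samples that passed the first stage together with one pose index and one class index per surviving sample. The time saving in such a scheme comes from the second stage never seeing the rejected samples.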