Medical image segmentation is pivotal for clinical diagnostics, enabling precise identification of anatomical structures and abnormalities in images from modalities such as ultrasound, CT, and X-ray. Traditional CNN-based models, while powerful, struggle to capture long-range dependencies between image regions and often miss the finer details needed for accurate segmentation. Recent Transformer-based approaches such as TransUNet address this limitation but incur higher computational costs. To balance detail accuracy with computational efficiency, this proposal aims to develop a multimodal segmentation method that uses the YOLOv8 model for efficient bounding-box generation combined with the Segment Anything Model (SAM) for refined segmentation, particularly for ambiguous and variable ROI boundaries.
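A minimal sketch of the proposed two-stage pipeline is given below, assuming the Ultralytics `ultralytics` package for YOLOv8 and Meta's `segment-anything` package for SAM. The checkpoint filenames, the input image path, and the modality-specific fine-tuned detector are placeholders for illustration, not committed deliverables of this proposal.

```python
# Sketch of the proposed YOLOv8 -> SAM pipeline.
# Assumptions: ultralytics and segment-anything are installed;
# checkpoint/image filenames below are placeholders.
import cv2
from ultralytics import YOLO
from segment_anything import sam_model_registry, SamPredictor

# Stage 1: a YOLOv8 detector (assumed fine-tuned on the target
# modality) proposes bounding boxes around candidate ROIs.
detector = YOLO("yolov8n_medical.pt")  # placeholder checkpoint

# Stage 2: SAM refines each box prompt into a pixel-level mask.
sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b_01ec64.pth")
predictor = SamPredictor(sam)

image_bgr = cv2.imread("ultrasound_frame.png")  # placeholder input
image_rgb = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2RGB)

# Detect ROIs; boxes come back as (x1, y1, x2, y2) pixel coordinates.
detections = detector(image_rgb)[0]
boxes = detections.boxes.xyxy.to(predictor.device)  # (N, 4) tensor

# Prompt SAM with all detected boxes at once.
predictor.set_image(image_rgb)
transformed = predictor.transform.apply_boxes_torch(
    boxes, image_rgb.shape[:2]
)
masks, scores, _ = predictor.predict_torch(
    point_coords=None,
    point_labels=None,
    boxes=transformed,
    multimask_output=False,  # one mask per box prompt
)
# masks: (N, 1, H, W) boolean tensor -- one refined mask per ROI.
```

In this division of labor, the lightweight detector keeps inference cheap while SAM's promptable decoder supplies the fine boundary detail, which is the efficiency/accuracy trade-off the proposal targets.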