
Google Cloud - Vision

Introduction

Google Cloud Vision API enables developers to understand the content of an image by encapsulating powerful machine learning models in an easy-to-use REST API. It quickly classifies images into thousands of categories (e.g., "sailboat", "lion", "Eiffel Tower"), detects individual objects and faces within images, and reads printed words contained within images.

Getting Started

To begin using the Vision API, you need to set up a Google Cloud project and enable the Vision API. Follow these steps:

  • Go to the Google Cloud Console.
  • Create a new project or select an existing project.
  • Enable the Vision API for your project.
  • Set up authentication by creating a service account and downloading the JSON key file.

Example command to set the environment variable for authentication:

export GOOGLE_APPLICATION_CREDENTIALS="path/to/your/credentials-file.json"
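Before making requests, it can help to sanity-check that the variable points at a readable service-account key file. A minimal Python sketch, using only the standard library (the `check_credentials` helper is illustrative, not part of any Google client library):

```python
import json
import os


def check_credentials(env_var="GOOGLE_APPLICATION_CREDENTIALS"):
    """Verify the env var points at a readable service-account key file."""
    path = os.environ.get(env_var)
    if not path:
        raise RuntimeError(f"{env_var} is not set")
    with open(path) as fh:
        info = json.load(fh)
    # Service-account key files always carry these fields.
    for field in ("type", "project_id", "private_key", "client_email"):
        if field not in info:
            raise ValueError(f"key file missing {field!r}")
    return info
```

Running this once after setup catches the two most common mistakes early: a misspelled path and a downloaded file that is not actually a service-account key.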

Label Detection

Label detection assigns descriptive labels to an image, identifying broad categories of objects, scenes, and activities. For example, it can label an image containing a dog, a cat, or a car. Here's how you can use the Vision API to detect labels:

Sample JSON request for label detection:

{
  "requests": [
    {
      "image": {
        "source": {
          "imageUri": "https://example.com/path/to/your/image.jpg"
        }
      },
      "features": [
        {
          "type": "LABEL_DETECTION"
        }
      ]
    }
  ]
}
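Request bodies like the one above can be generated programmatically rather than written by hand. A short Python sketch using only the standard library (`build_annotate_request` is an illustrative helper, not a client-library function; the same body shape works for all feature types shown on this page):

```python
import json


def build_annotate_request(image_uri, feature_type, max_results=None):
    """Build the JSON body for a Vision images:annotate call."""
    feature = {"type": feature_type}
    if max_results is not None:
        # Optional cap on how many annotations the API returns.
        feature["maxResults"] = max_results
    body = {
        "requests": [
            {
                "image": {"source": {"imageUri": image_uri}},
                "features": [feature],
            }
        ]
    }
    return json.dumps(body, indent=2)
```

Writing the output to `request.json` produces a file you can pass directly to the curl command below; swapping the feature type to `FACE_DETECTION` or `TEXT_DETECTION` reuses the same helper for the later sections.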

Send the request to the Vision API endpoint, authenticating with the service-account credentials you configured earlier:

curl -X POST \
  -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
  -H "Content-Type: application/json" \
  --data-binary @request.json \
  "https://vision.googleapis.com/v1/images:annotate"

Response example:

{
  "responses": [
    {
      "labelAnnotations": [
        {
          "mid": "/m/0k4j",
          "description": "cat",
          "score": 0.98
        },
        {
          "mid": "/m/01yrx",
          "description": "pet",
          "score": 0.96
        }
      ]
    }
  ]
}
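In application code you will usually want only the labels above some confidence threshold. A small Python sketch for parsing a response like the one above (`labels_above` is an illustrative helper, not part of any Google library):

```python
import json


def labels_above(response_json, threshold=0.9):
    """Return label descriptions whose confidence score meets the threshold."""
    payload = json.loads(response_json)
    labels = []
    # One entry in "responses" per image submitted in the request.
    for resp in payload.get("responses", []):
        for ann in resp.get("labelAnnotations", []):
            if ann.get("score", 0.0) >= threshold:
                labels.append(ann["description"])
    return labels
```

Because `labelAnnotations` is returned sorted by score, the resulting list is already ordered from most to least confident.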

Face Detection

Face detection locates multiple faces within an image and reports key facial attributes for each, such as emotional state and whether the face is wearing headwear. Here's an example of how to perform face detection:

Sample JSON request for face detection:

{
  "requests": [
    {
      "image": {
        "source": {
          "imageUri": "https://example.com/path/to/your/image.jpg"
        }
      },
      "features": [
        {
          "type": "FACE_DETECTION"
        }
      ]
    }
  ]
}

Send the request to the Vision API endpoint, authenticating with the service-account credentials you configured earlier:

curl -X POST \
  -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
  -H "Content-Type: application/json" \
  --data-binary @request.json \
  "https://vision.googleapis.com/v1/images:annotate"

Response example:

{
  "responses": [
    {
      "faceAnnotations": [
        {
          "boundingPoly": {
            "vertices": [
              {"x": 285, "y": 44},
              {"x": 382, "y": 44},
              {"x": 382, "y": 150},
              {"x": 285, "y": 150}
            ]
          },
          "fdBoundingPoly": {
            "vertices": [
              {"x": 285, "y": 44},
              {"x": 382, "y": 44},
              {"x": 382, "y": 150},
              {"x": 285, "y": 150}
            ]
          },
          "landmarks": [
            {
              "type": "LEFT_EYE",
              "position": {"x": 312, "y": 78, "z": 0}
            },
            {
              "type": "RIGHT_EYE",
              "position": {"x": 346, "y": 78, "z": 0}
            }
          ],
          "detectionConfidence": 0.99,
          "landmarkingConfidence": 0.9,
          "joyLikelihood": "VERY_LIKELY",
          "sorrowLikelihood": "VERY_UNLIKELY",
          "angerLikelihood": "VERY_UNLIKELY",
          "surpriseLikelihood": "VERY_UNLIKELY",
          "underExposedLikelihood": "VERY_UNLIKELY",
          "blurredLikelihood": "VERY_UNLIKELY",
          "headwearLikelihood": "VERY_UNLIKELY"
        }
      ]
    }
  ]
}
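A response like this is verbose, so it is common to reduce each face to the few values your application cares about, such as the bounding-box size and whether the emotion likelihoods cross a threshold. A Python sketch using only the standard library (`summarize_faces` is an illustrative helper):

```python
import json

# The API reports likelihoods as enum strings; treat these two as "yes".
LIKELY = {"LIKELY", "VERY_LIKELY"}


def summarize_faces(response_json):
    """Summarize each detected face: bounding-box size, joy, and confidence."""
    payload = json.loads(response_json)
    summaries = []
    for resp in payload.get("responses", []):
        for face in resp.get("faceAnnotations", []):
            # Vertices may omit x or y when the value is 0, so default to 0.
            xs = [v.get("x", 0) for v in face["boundingPoly"]["vertices"]]
            ys = [v.get("y", 0) for v in face["boundingPoly"]["vertices"]]
            summaries.append({
                "width": max(xs) - min(xs),
                "height": max(ys) - min(ys),
                "joyful": face.get("joyLikelihood") in LIKELY,
                "confidence": face.get("detectionConfidence", 0.0),
            })
    return summaries
```

Applied to the sample response above, this yields a 97x106 pixel face box flagged as joyful with 0.99 detection confidence.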

Text Detection

Text detection (OCR) detects and extracts text within an image with support for a broad range of languages. Here's an example of how to perform text detection:

Sample JSON request for text detection:

{
  "requests": [
    {
      "image": {
        "source": {
          "imageUri": "https://example.com/path/to/your/image.jpg"
        }
      },
      "features": [
        {
          "type": "TEXT_DETECTION"
        }
      ]
    }
  ]
}

Send the request to the Vision API endpoint, authenticating with the service-account credentials you configured earlier:

curl -X POST \
  -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
  -H "Content-Type: application/json" \
  --data-binary @request.json \
  "https://vision.googleapis.com/v1/images:annotate"

Response example:

{
  "responses": [
    {
      "textAnnotations": [
        {
          "locale": "en",
          "description": "Google Cloud Platform",
          "boundingPoly": {
            "vertices": [
              {"x": 13, "y": 8},
              {"x": 215, "y": 8},
              {"x": 215, "y": 105},
              {"x": 13, "y": 105}
            ]
          }
        }
      ]
    }
  ]
}
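For text detection, the first entry in `textAnnotations` contains the full detected text; subsequent entries break it down word by word. A Python sketch for pulling out the full text (`extract_text` is an illustrative helper, not a client-library function):

```python
import json


def extract_text(response_json):
    """Return the full detected text; the first textAnnotation holds it all."""
    payload = json.loads(response_json)
    for resp in payload.get("responses", []):
        annotations = resp.get("textAnnotations", [])
        if annotations:
            return annotations[0].get("description", "")
    return ""
```

On the sample response above, this returns the string "Google Cloud Platform".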

Conclusion

Google Cloud Vision API provides powerful image analysis capabilities that can help developers build sophisticated image recognition and analysis applications. By leveraging machine learning models, developers can quickly integrate functionalities like label detection, face detection, and text detection into their applications.

For more information, refer to the official documentation.