1-Click Models are the new collaborative project from DigitalOcean and Hugging Face that brings you an easy way to interface with some of the best open-source Large Language Models (LLMs) on the most powerful GPUs available in the cloud. Together, they let users get the most out of the best open-source models with no hassle and no code to set up.
In this tutorial, we are going to walk through the development of a voice-enabled personal assistant tool designed to run on any 1-Click Model enabled GPU Droplet. The application uses Gradio and is fully API-enabled with FastAPI. Follow along to learn more about the advantages of using 1-Click Models, learn the basics of querying a deployed 1-Click Model GPU Droplet, and see how to use the personal assistant on your own machines!
1-Click Hugging Face Models with DigitalOcean GPU Droplets
The new 1-Click models come with a wide variety of LLM options, all with different use cases. These are namely:
- meta-llama/Meta-Llama-3.1-8B-Instruct
- meta-llama/Meta-Llama-3.1-70B-Instruct
- meta-llama/Meta-Llama-3.1-405B-Instruct-FP8
- Qwen/Qwen2.5-7B-Instruct
- google/gemma-2-9b-it
- google/gemma-2-27b-it
- mistralai/Mixtral-8x7B-Instruct-v0.1
- mistralai/Mistral-7B-Instruct-v0.3
- mistralai/Mixtral-8x22B-Instruct-v0.1
- NousResearch/Hermes-3-Llama-3.1-8B
- NousResearch/Hermes-3-Llama-3.1-70B
- NousResearch/Hermes-3-Llama-3.1-405B
- NousResearch/Nous-Hermes-2-Mixtral-8x7B-DPO
Creating a new GPU Droplet with any of these models requires only the standard GPU Droplet setup process, as shown here.
Watch the following video for a full step-by-step guide to creating a 1-Click Model GPU Droplet, and check out this article for more details on launching a new instance.
Once you have set up your new machine, navigate to the next section for more detail on interacting with your 1-Click Model.
Interacting with the 1-Click Model Deployment
Connecting to the 1-Click Model deployment is simple if we want to interact with it on the same machine. “When connected to the HUGS Droplet, the initial SSH message will display a Bearer Token, which is required to send requests to the public IP of the deployed HUGS Droplet. Then you can send requests to the Messages API via either localhost if connected within the HUGS Droplet, or via its public IP.” (Source). To access the deployment from other machines, we will need that Bearer Token. Connect to your Droplet over SSH, copy the token, and save it for later. If we only want to interact with the inference endpoint from the GPU Droplet itself, things are simpler: the variable is already saved to the environment.
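On a separate machine, the copied token can be exported as an environment variable so the later requests can pick it up. A minimal sketch (the token value below is a placeholder, not a real credential):

```shell
# Save the Bearer Token shown in the SSH welcome message
# (replace the placeholder with the token copied from your Droplet)
export BEARER_TOKEN="replace-with-your-token"

# Confirm the variable is set before sending any requests
echo "Token length: ${#BEARER_TOKEN}"
```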
Once the Bearer Token variable is set on the machine we have chosen to use, we can begin running inference with the model. There are currently two routes for doing this: cURL and Python. The endpoint runs automatically on port 8080, so requests default to our own machine. If we are using a different machine, change the localhost
value below to the Droplet's public IPv4 address.
cURL
curl http://localhost:8080/v1/chat/completions -X POST -d '{"messages":[{"role":"user","content":"What is Deep Learning?"}],"temperature":0.7,"top_p":0.95,"max_tokens":128}' -H 'Content-Type: application/json' -H "Authorization: Bearer $BEARER_TOKEN"
This code will ask the model “What is Deep Learning?” and issue a response in the following format:
{"object":"chat.completion","id":"","created":1731532721,"model":"hfhugs/Meta-Llama-3.1-8B-Instruct","system_fingerprint":"2.3.1-dev0-sha-169178b","choices":[{"index":0,"message":{"role":"assistant","content":"**Deep Learning: A Subfield of Machine Learning**\n=====================================================\n\nDeep learning is a subfield of machine learning that focuses on the use of artificial neural networks to analyze and interpret data. It is inspired by the structure and function of the human brain and is particularly well-suited for tasks such as image and speech recognition, natural language processing, and data classification.\n\n**Key Characteristics of Deep Learning:**\n\n1. **Artificial Neural Networks**: Deep learning models are composed of multiple layers of interconnected nodes or \"neurons\" that process and transform inputs into outputs.\n2. **Non-Linear Transformations**: Each layer applies a non-linear"}
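The same request can be made from Python. The sketch below uses only the standard library (`urllib.request`); the `build_chat_request` and `ask` helper names are our own for illustration, and the host and token are assumed to be set up as described above:

```python
import json
import os
import urllib.request


def build_chat_request(prompt, temperature=0.7, top_p=0.95, max_tokens=128):
    """Build the JSON body for the /v1/chat/completions endpoint."""
    return {
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
        "top_p": top_p,
        "max_tokens": max_tokens,
    }


def ask(prompt, host="localhost", token=None):
    """Send a chat completion request and return the model's reply text.

    Use host="localhost" on the Droplet itself, or the Droplet's
    public IPv4 address when calling from another machine.
    """
    token = token or os.environ.get("BEARER_TOKEN", "")
    req = urllib.request.Request(
        f"http://{host}:8080/v1/chat/completions",
        data=json.dumps(build_chat_request(prompt)).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    # The reply text lives in the first choice's message content
    return body["choices"][0]["message"]["content"]


# Example (requires a running 1-Click Model Droplet):
# print(ask("What is Deep Learning?"))
```

This mirrors the cURL call exactly: same endpoint, same JSON body, same Bearer Token header, so either route can be used interchangeably.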