278: Azure is on a Bender: Bite my Shiny Metal FXv2-series VMs

Welcome to episode 278 of The Cloud Pod, where the forecast is always cloudy! When Justin’s away, the guys will… maybe get a show recorded? This week, we’re talking OpenAI, another service scheduled for the grave over at AWS, saying goodbye to pesky IPv4 fees, Azure FXv2 VMs, Valkey 8.0 and so much more! Thanks for joining us, here in the cloud! 

Titles we almost went with this week:

  • 🪦Another One Bites the Dust
  • 🪲Peak AI reached: OpenAI Now Puts Print Statements in Code to Help You Debug

A big thanks to this week’s sponsor: Archera

There are a lot of cloud cost management tools out there. But only Archera provides cloud commitment insurance. It sounds fancy but it’s really simple. Archera gives you the cost savings of a 1 or 3 year AWS Savings Plan with a commitment as short as 30 days. If you don’t use all the cloud resources you’ve committed to, they will literally put money back in your bank account to cover the difference. Other cost management tools may say they offer “commitment insurance”, but remember to ask: will you actually give me my money back? Archera will. Click this link to check them out

AI Is Going Great – Or How ML Makes All Its Money

00:59 Introducing vision to the fine-tuning API.

  • OpenAI has announced the integration of vision capabilities into its fine-tuning API, allowing developers to enhance the GPT-4o model to analyze and interpret images alongside text and audio inputs. 
  • This update broadens the scope of applications for AI, enabling more multimodal interactions.
  • The fine-tuning API now supports image inputs, which means developers can train models to understand and generate content based on visual data in conjunction with text and audio.
  • After October 31, 2024, training for fine-tuning will cost $25 per 1 million tokens, with inference priced at $3.75 per 1 million input tokens and $15 per 1 million output tokens. 
  • Images are tokenized based on size before pricing. The introduction of prompt caching and other efficiency measures could lower the operational costs for businesses deploying AI solutions.
  • The API is also being enhanced to include features like epoch-based checkpoint creation, a comparative playground for model evaluation, and integration with third-party platforms like Weights and Biases for detailed fine-tuning data management.
  • What does it mean? Admit it – you’re dying to know. 
  • Developers can now create applications that not only process text or voice but also interpret and generate responses based on visual cues, and, importantly, are fine-tuned for domain-specific applications. This update could lead to more intuitive user interfaces, where users interact with services using images as naturally as they do with text or speech, potentially expanding the user base to those less tech-savvy or in fields where visual data is crucial.
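For the curious, here's a rough sketch of what one training example might look like. The bolt-inspection prompt, image URL, and label are invented for illustration; the message structure follows the chat-format JSONL that OpenAI's fine-tuning API accepts, but check the current docs before building a training file.

```python
import json

# Hypothetical training example for vision fine-tuning: the prompt, image
# URL, and label are made up for illustration. The message structure
# follows the chat-format JSONL the fine-tuning API accepts.
example = {
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Is this bolt within spec?"},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/bolts/123.jpg"},
                },
            ],
        },
        {"role": "assistant", "content": "Defective: thread damage on the shank."},
    ]
}

# Each line of the uploaded training file is one JSON-encoded example.
jsonl_line = json.dumps(example)
print(jsonl_line)
```

A training file is just many of these lines, one labeled example per line — which is exactly how you'd encode Jonathan's good-bolt/bad-bolt quality-assurance idea below.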

03:53 📢 Jonathan – “I mean, I think it’s useful for things like quality assurance in manufacturing, for example. You know, could, you could tune it on what your nuts and bolts are supposed to look like and what a good bolt looks like and what a bad bolt looks like coming out of the factory. You just stream the video directly to, to an AI, AI like this and have it kick out all the bad ones. It’s kind of, kind of neat.”

04:41  Introducing the Realtime API

  • OpenAI has launched its Realtime API in public beta, designed to enable developers to create applications with real-time, low-latency, multimodal interactions. 
  • This API facilitates speech-to-speech conversations, making user interactions more natural and engaging.
  • The Realtime API uses WebSockets for maintaining a persistent connection, allowing for real-time input and output of both text and audio. This includes function calling capabilities, making it versatile for various applications.
  • It leverages the new GPT-4o model, which supports multimodal inputs (text, audio, and now with vision capabilities in fine-tuning).
  • Use Cases include:
    •  Interactive applications: Developers can now build apps where users can have back-and-forth voice conversations or even integrate visual data for a more comprehensive interaction.
    • Customer Service: The API can revolutionize customer service with real-time voice interactions that feel more human-like.
    • Voice Assistants: Healthify already uses the API for natural, conversational interactions with its AI coach, Ria.
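To give a sense of the protocol's shape: a client drives the Realtime API by sending JSON events over the persistent WebSocket. The event names below ("session.update", "response.create") match the public beta documentation as we understand it, but treat this as a sketch and verify against current docs.

```python
import json

# Sketch of the JSON events a client sends over the Realtime API's
# WebSocket connection. Event names and fields are from the public beta
# docs as of this episode; confirm before relying on them.
session_update = {
    "type": "session.update",
    "session": {
        "modalities": ["text", "audio"],
        "instructions": "You are a friendly voice assistant.",
        "voice": "alloy",
    },
}

# Ask the model to start generating a response once input has been sent.
response_create = {"type": "response.create"}

for event in (session_update, response_create):
    print(json.dumps(event))
```

The server streams events back over the same socket, which is what keeps the latency low enough for natural back-and-forth speech.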

5:54 📢 Matthew – “Just think about how much time you’ll have left in your life when you don’t actually have to attend the meetings. You train a model, you fine-tune it based on Ryan’s level of sassiness and how crabby he is that day. And you just put in the meeting so you can actually do work.”

09:58  Introducing Canvas 

  • OpenAI’s Canvas is an innovative interface designed to enhance collaboration with ChatGPT for writing and coding projects, moving beyond the traditional chat format to offer a more interactive and dynamic workspace – a similar idea to Anthropic Claude’s Projects and artifacts.
  • From drafting emails to writing articles, Canvas can assist in creating content, adjusting tone, length, or style, and providing real-time edits.
  • Developers can write, debug, and document code. 
  • Canvas supports creating an API web server, adding comments, explaining code sections, and reviewing code for improvements.
  • Best of all it can recommend where to place print statements for debugging!

11:18 📢 Jonathan – “I got my Pixel 9 phone, which comes with Gemini Pro for the year. And I noticed a shift in the way AI is being integrated with things. It used to be, ‘do you want me to write the message for you?’ They’ve moved away from that now; I think there’s a little pushback against that. People want to feel like they’re still authentic. So now instead, once you’ve finished writing the message, it’s like, ‘would you like us to refine this for you?’ Like, yes, please, make it sound more professional.”

AWS

13:01 AWS Announces AWS re:Post Agent, a Generative AI-powered virtual assistant

  • AWS is starting to leverage Gen AI to auto-respond to posts on re:Post.
  • Jonathan is especially looking forward to seeing the hallucinations that it posts. 

14:06 Maintain access and consider alternatives for Amazon Monitron

  • Amazon Monitron is being shut down.
  • It will no longer be available to new customers after October 31st, 2024. 
  • Existing customers will be able to purchase devices and continue utilizing the service as normal until July 2025. 
  • Customers are considered existing customers if they have commissioned an Amazon Monitron sensor through a project at any time in the 30 days prior to October 31, 2024.
  • “For existing Amazon business customers, we will allowlist your account with the existing Amazon Monitron devices. For existing Amazon.com retail customers, the Amazon Monitron team will provide specific ordering instructions according to individual request.”
  • As an alternative for your condition-monitoring needs, AWS recommends exploring solutions provided by AWS Partners: Tactical Edge, IndustrAI, and Factory AI.

15:11 📢 Jonathan – “That’s a weird one, because I think they talked about this on stage at re:Invent a few years ago. It was a whole big industrial IoT thing. We have these devices that monitor the unique vibrations from each machine, and we can tell weeks in advance if some part’s going to fail or not. So it’s kind of weird that they’re killing it, but I guess the functionality can be built with other primitives that they have, and it doesn’t need to be its own service.”

17:05 Amazon Virtual Private Cloud (VPC) now supports BYOIP and BYOASN in all AWS Local Zones

  • Now you can bring your own IP addresses (BYOIP) and ASNs (BYOASN) to Local Zones.
  • Huzzah
  • It *should* save you all the pesky IPv4 fees that you were paying. 

18:19 Amazon EC2 now supports Optimize CPUs post instance launch

  • Amazon EC2 now allows customers to modify an instance’s CPU options after launch. 
  • You can modify the number of vCPUs and/or disable the hyperthreading of a stopped EC2 instance to save on vCPU-based licensing costs. 
  • In addition, an instance’s CPU options are now maintained when changing its instance type.
  • This is beneficial to customers who have a Bring-Your-Own-license (BYOL) for commercial database workloads, like Microsoft SQL Server.
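To make the licensing angle concrete, here's the arithmetic — a sketch only: the core-count and threads-per-core fields mirror EC2's CpuOptions structure, but the instance sizing and per-vCPU license price are invented for illustration.

```python
def vcpus(core_count: int, threads_per_core: int) -> int:
    """vCPUs exposed by an instance = physical cores x threads per core."""
    return core_count * threads_per_core

# Default launch: hyperthreading on (2 threads per core).
default_vcpus = vcpus(core_count=24, threads_per_core=2)  # 48 vCPUs

# After modifying CPU options on the stopped instance: hyperthreading off.
tuned_vcpus = vcpus(core_count=24, threads_per_core=1)    # 24 vCPUs

# Hypothetical per-vCPU license cost, purely for illustration.
license_cost_per_vcpu = 100
savings = (default_vcpus - tuned_vcpus) * license_cost_per_vcpu
print(default_vcpus, tuned_vcpus, savings)
```

Since SQL Server and similar products are licensed per vCPU, halving the thread count without relaunching the instance directly halves that line item.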

18:53 📢 Ryan – “Yeah, this is one of those things where it’s a giant pain if you have to completely relaunch your instance. Or when you’re trying to upscale your instance to a new instance type to get more memory or what have you, and having that completely reset. so then not only are you trying to scale this, probably to avoid an outage, now it’s taking twice as long because you’re going to do a thing. So this is one of those really beneficial features that no one will ever mention again.”

21:36 Amazon WorkSpaces now supports file transfer between WorkSpaces sessions and local devices 

  • Amazon WorkSpaces now supports file transfers between Personal sessions and local computers. 
  • Administrators can control file upload/download permissions to safeguard data.
  • Infosec is just going to love all the data loss options. 

22:07 📢 Jonathan – “So they re-implement RDP, they take out the feature, then they add it again, and then they give you a switch, which everyone’s going to switch on to stop you from using it. That’s fantastic.”

22:17 📢 Matthew – “But they can check the box now saying it exists, which means they’ll pass some RFP. So now they’re more likely to be able to be considered.”

GCP

25:30 Introducing Valkey 8.0 on Memorystore: unmatched performance and fully open-source

  • Google Cloud has introduced Memorystore for Valkey 8.0, marking it as the first major cloud platform to offer Valkey 8.0 as a fully managed service. 
  • This launch signifies Google Cloud’s commitment to supporting open-source technologies by providing a high-performance, in-memory key-value store alternative to Redis, with enhancements in performance, reliability, and compatibility.
  • Compared to Redis, Valkey aims to maintain full compatibility while offering improvements in performance and community governance, with changes and features like:
    • Better data availability during failover events.
    • Support for vector search, which is beneficial for AI and machine learning applications requiring similarity searches.
    • Improved concurrency allows for parallel processing of commands, reducing bottlenecks.
    • and some other great performance improvements
  • Valkey 8.0 on Memorystore offers up to twice the Queries Per Second (QPS) compared to Memorystore for Redis Cluster at microsecond latency, enabling higher throughput with similarly sized clusters.

26:53 📢 Ryan – “…when you see this type of change, you know, especially right after a license kerfuffle, right? That’s what caused Valkey to come into existence. It’s kind of like, wow, the power of open source is really there. And why wasn’t this part of the Redis thing? Because people weren’t going through it, you know, when it was under that license. So it’s kind of a good thing in a lot of senses.”

29:56 Understand your Cloud Storage footprint with AI-powered queries and insights 

  • Managing millions or billions of objects across numerous projects and with hundreds of Cloud engineers is fun right?
  • Google Cloud is the first hyperscale cloud provider to generate storage insights specific to an environment by querying object metadata and using the power of large language models (LLMs). (Although AWS has had a similar feature for quite a while, it just wasn’t AI.)
  • After the initial setup, you’ll be able to access the enhanced user experience, which includes a short summary of your dataset. 
  • Bonus! Pre-curated set of prompts with validated responses. “We selected these prompts based on customers’ most common questions.”
  • To combat hallucinations there are multiple informational indicators: 
    • Every response includes the SQL query for easy validation, 
    • Curated prompts show a ‘high accuracy’ tag
    • And helpful information displays data freshness metadata.

31:42 📢 Ryan – “…it’s insights into your storage data. There’s performance tiers, the ability to migrate it to lower performance tier for cost savings. There’s the insights on the access model and insecure sort of attack vectors that you could have. Like if it’s a publicly exposed bucket and it has excessive permissions or it has sensitive content in it, it’ll sort of provide that level of insight.”

Azure

32:51 Announcing the General Availability of Azure CycleCloud Workspace for Slurm 

  • Let’s deconstruct this title:
    • Azure CycleCloud is an enterprise-friendly tool for orchestrating and managing High Performance Computing (HPC) environments on Azure.
    • Slurm is a scheduler.
  • So really, what is this? It’s the ability to buy and launch, from the marketplace, an environment for orchestrating and managing High Performance Computing (HPC) workloads that leverages Slurm as a scheduler.
  • When Matthew doesn’t know what the Azure thing is, we’re all in trouble. 
  • And yes, this is where the Futurama references originated. Are we proud of it? At the risk of sounding negative, no.

35:33 Announcing the public preview of the new Azure FXv2-series Virtual Machines

  • Shut up and take our money – new shiny machines!
  • Best-suited to provide a balanced solution for compute-intensive workloads such as databases, data analytics and EDA workloads that also require large amounts of memory and high-performance storage I/O bandwidth.
    • up to 1.5x CPU performance
    • 2x vCPUs, with 96 vCPU as the largest VM size
    • 1.5x+ Network bandwidth, and offers up to 70 Gbps
    • up to 2x local storage (Read) IOPS and offers up to 5280 GiB local SSD capacity
    • up to 2x IOPS and up to 5x throughput in remote storage performance  
    • up to 400k IOPS and up to 11 GBps throughput with Premium v2/ Ultra Disk support
    • up to 1800 GiB memory
    • FXv2-series VMs feature an all-core-turbo frequency up to 4.0 GHz
    • 21:1 memory-to-vCPU ratio with the base sizes
  • The blog states that the FXv2-series Azure Virtual Machine is best-suited to provide a balanced solution for compute-intensive workloads, but then goes on to the real answer: that it is purpose-built to address several requirements of SQL Server workloads. 

37:00 📢 Ryan – “…you can deploy these VMs where you get a 21-to-1 ratio of memory to vCPU. Yeah, it’s cool. So while they do go out and say they’re best suited for balanced and compute-intensive workloads, if you read further down the post, they get to the real answer, which is that this is purpose-built to address several requirements for Microsoft SQL Server, which totally makes sense.”

38:42 General Availability: Azure confidential VMs with NVIDIA H100 Tensor Core GPUs

  • These run on AMD EPYC CPUs with NVIDIA H100 GPUs.
  • Set up securely.
  • Ideal for inferencing, fine-tuning, or training small-to-medium sized models such as Whisper, Stable Diffusion and its variants (SDXL, SSD), and language models.

39:19 📢 Jonathan – “How weird though. The point of a confidential VM is that it has one hole that you put something in. It does some magic work on it and then spits an answer out, but you don’t get to see the sausage being made inside. The fact that they’re selling this for training or inference is really interesting.”

42:08 What’s new in FinOps toolkit 0.5 – August 2024

  • The FinOps Toolkit 0.5, released in August 2024, introduces several enhancements aimed at improving cloud financial management through Microsoft’s FinOps framework. 
  • This update focuses on simplifying the process of cost management and optimization for Azure users, with new features for reporting, data analysis, and integration with Power BI for better financial analytics.
  • Key Updates in FinOps Toolkit 0.5:
    • Users can now connect Power BI reports directly to raw cost data exports in storage without needing FinOps hubs, simplifying the setup for cost analysis.
    • The toolkit now supports the FOCUS 1.0 schema for cost and usage data, which aims to standardize FinOps data across platforms for easier analysis and comparison.
    • The update includes improvements in the Azure Optimization Engine for better custom recommendations on cost savings and performance  enhancements.
    • There are new tools and updates for reporting, including a guide on how to compare FOCUS data with actual or amortized cost data, aiding in more accurate financial reporting.
    • Expanded scenario-based documentation helps users update existing reports to use FOCUS and understand how to leverage the new data schema effectively.
  • Organizations have the choice to use the latest toolkit with existing FinOps hubs or upgrade to gain access to new features while maintaining compatibility with previous report versions.

47:11 GPT-4o-Realtime-Preview with audio and speech capabilities 

  • Woohoo! It’s now released on Azure, too.
  • The guys may have officially lost the plot at this point. 

Closing

And that is the week in the cloud! Visit our website, the home of The Cloud Pod, where you can join our newsletter, Slack team, send feedback, or ask questions at theCloudPod.net or tweet at us with the hashtag #theCloudPod
