Welcome to episode 265 of the Cloud Pod Podcast – where the forecast is always cloudy! This week, Jonathan, Ryan, and Justin are trying to keep cool in the new WorkSpaces Pools, avoiding the HeatWave with Oracle’s new LLM, taking a look at AWS Jamba (hold the straw), and chasing the ever-elusive Cloud Maturity.
All this news and more, this week on The Cloud Pod!
Titles we almost went with this week:
- 🏊The Cloud Pod takes a dip in the Workspaces Pool
- 👪The Cloud Pod Lineage view is suspect
- 🔷A Gitlab, A BitBucket, and a Blueprint build a Workspaces Pool
- 🥤AWS goes for a Jamba Juice
- 🔑Google Cloud Autokey does exactly what it sounds like
- 🥵Oracle LLM Heatwaves send us to the Amazon Workspace Pool
- 💂Jonathan is unimpressed with this week’s show
- 🛣️Highway to the DataZone
A big thanks to this week’s sponsor:
We’re sponsorless! Want to reach a dedicated audience of cloud engineers? Send us an email or hit us up on our Slack Channel and let’s chat!
General News
01:03 HashiCorp State of Cloud Strategy Survey 2024: Cloud Maturity is Elusive but Valuable
- Hashicorp just released the results of its State of Cloud Survey, and guess what? Cloud maturity is pretty elusive. Weird…
- HashiCorp finds that only 8% of organizations qualify as Highly Mature, which means the biggest benefits of cloud are going to a small group of truly mature organizations.
- Justin wonders if this is part of the cloud repatriation push?
- Are other listeners seeing some of this, especially on places like LinkedIn? We’d love to hear.
- Trailblazers are finding faster development speed, lower costs, and reduced risks, while others continue to struggle, creating haves and have-nots as enterprises get very different business outcomes.
- Hashicorp collected responses from almost 1,200 technology practitioners and decision makers at organizations with more than 1000 employees.
- 66% of respondents report that they have increased cloud spending in the last year, but 91% believe they are wasting money in the cloud, and 64% are experiencing a shortage of skilled staff.
- 45% of low maturity organizations are still waiting for their cloud strategy to pay off!
- One of the key takeaways from the survey is that the path to cloud maturity increasingly relies on platform teams to help automate and systemize cloud operations.
- However, fewer than half of respondents (42%) say they rely on centralized platform teams to standardize cloud operations throughout their organization.
- Platform teams help manage cloud, but also help address the skills shortage that has long plagued enterprise cloud adoption.
- If only they’d pay for training. 🤷
03:22📢 Jonathan – “The skill shortage thing really bugs me sometimes because there are plenty of skilled workers around and the reqs aren’t open for them. So I don’t think there aren’t qualified staff… yeah, it’s not a shortage because of the lack of people. It’s a shortage because we’re not prepared to spend the money on the people.”
08:24 Terraform AWS provider tops 3 billion downloads
- HashiCorp’s Terraform AWS provider has just celebrated its 10-year anniversary, and has now surpassed three billion downloads.
- Cool.
- If only 3 billion people were actually upskilled and knowledgeable in Terraform, we wouldn’t be having a problem with cloud maturity.
08:40 📢 Ryan – “To be fair, about a billion of those downloads are me trying to figure out which version I need. Do I switch between Brew Link or some other tool to trick my operating system into doing the right thing?”
AI Is Going Great – Or, How ML Makes All Its Money
09:13 Announcing the General Availability of Databricks Assistant and AI-Generated Comments
- Databricks is announcing the general availability of Databricks Assistant and AI-Generated Comments on all cloud platforms.
- Databricks is making these features available at no additional cost for all of their customers.
- Databricks assistant was built to be the best AI-Powered productivity tool for enterprise data.
- The Assistant is one of the fastest-growing Databricks features ever, with over 150,000 users leveraging it every month to generate code via Databricks Assistant Autocomplete.
- It can also troubleshoot errors and create visualizations and dashboards.
- A tool like an assistant needs access to high-quality metadata in order to produce the best results.
- The other feature launching in GA is AI-generated comments in Unity Catalog. This feature leverages generative AI to provide relevant table descriptions and column comments. Since launching the feature, Databricks says 80% of table metadata updates are now AI-assisted.
- This is one of the best ways to improve Assistant answers: providing descriptive comments for tables and columns greatly improves the accuracy of responses in their benchmarks (see the sketch below).
- “The introduction of the Databricks Assistant has made it easier for our user base to improve their skill set. By seamlessly integrating into our development cycles, it has significantly enhanced productivity for our expert data scientists, engineers, and analysts. Users can comprehend, generate, optimize, and troubleshoot code faster than ever before. With the continuous growth of the Databricks ecosystem and the ongoing advancements of the Databricks Assistant, next-level tools are more accessible.” — Nicholas Heier, Senior Manager, General Motors
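Since the accuracy boost comes from good metadata, here is a minimal sketch of adding table and column comments to Unity Catalog yourself, the same metadata the AI-generated comments feature can now fill in for you. The catalog, table, and column names are placeholders:

```python
# Run in a Databricks notebook Python cell; `spark` is the notebook's built-in session.
# COMMENT ON and ALTER COLUMN ... COMMENT are standard Databricks SQL.
spark.sql("""
    COMMENT ON TABLE main.sales.orders IS
    'One row per customer order; populated nightly from the POS feed.'
""")

spark.sql("""
    ALTER TABLE main.sales.orders
    ALTER COLUMN order_ts COMMENT 'Order timestamp in UTC'
""")
```

The richer these descriptions are, the better the Assistant (and anyone browsing the catalog) can reason about the data.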
AWS
12:35 Amazon WorkSpaces Pools: Cost-effective, non-persistent virtual desktops
- You can now create a pool of non-persistent virtual desktops using Amazon Workspaces and share them across a group of users.
- As the desktop administrator, you manage a portfolio of persistent and non-persistent virtual desktops using a single GUI, command line, or set of API-powered tools. Users access the desktops via a browser, client app, or thin client device.
- WorkSpaces Pools has the following features:
Each user gets the same applications and the same experience. When they log in, they always get a fresh workspace that’s based on the latest configuration for the pool, centrally managed by their administrator.
If you enable application settings persistence for the pool, users can configure certain application settings such as browser favorites, plugins, and UI customizations to persist. Users can also access persistent file or object storage external to the desktop.
- This is a great use case for remote workers, task workers, contact center workers, or students.
- The pool can be configured to accommodate the size and working hours of your user base, and you can join the pool to your organization’s domain and AD.
- Pricing for a Windows desktop:
Persistent: 2 vCPU, 8 GB of memory, $45.00 per month
Pool: $148 for 720 hours (left running around the clock)
More reasonable: 20 days a month at 8 hours a day comes to about $36.19
Make sure you scale it, and make sure you do your maths.
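A quick back-of-the-envelope sketch of that comparison. The hourly rate and monthly user fee below are our own rough assumptions reverse-engineered from the figures above, not official AWS pricing:

```python
# Rough WorkSpaces Pools cost comparison (illustrative numbers only).
# We assume roughly $0.20/hour plus a small fixed monthly user fee, which is
# approximately what the $148 always-on figure implies; check the AWS pricing
# page for real rates.
HOURLY_RATE = 0.20          # assumed pool hourly rate, USD
MONTHLY_USER_FEE = 4.00     # assumed fixed monthly fee per user, USD
PERSISTENT_MONTHLY = 45.00  # persistent 2 vCPU / 8 GB bundle, USD/month

def pool_cost(hours: float) -> float:
    """Estimated monthly cost for one pooled desktop used for `hours` hours."""
    return MONTHLY_USER_FEE + HOURLY_RATE * hours

always_on = pool_cost(720)      # ~$148: a pooled desktop left running 24/7
part_time = pool_cost(20 * 8)   # ~$36: 20 days a month, 8 hours a day
print(f"Always on: ${always_on:.2f}  Part time: ${part_time:.2f}  Persistent: ${PERSISTENT_MONTHLY:.2f}")
```

The point stands either way: a pool only beats the $45 persistent desktop if you actually scale it down when people log off.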
14:03📢 Jonathan – “There was a time when the API for workspaces was absolutely terrible or missing. So the fact that they’ve called it out in the blog post is good news.”
14:19 Amazon WorkSpaces introduces support for Red Hat Enterprise Linux
- In addition, AWS now supports Red Hat Enterprise Linux on WorkSpaces Personal. This includes built-in security features that help organizations run virtual desktops securely, while increasing agility and reducing costs.
- Interested in Pricing? You can find that info here.
17:54 Introducing end-to-end data lineage (preview) visualization in Amazon DataZone
- Amazon DataZone is a data management service that lets you catalog, discover, analyze, share, and govern data between your organization’s data producers and consumers. Engineers, data scientists, product managers, analysts, and business users can easily access data throughout your organization using a unified data portal.
- Now in preview, a new API-driven and OpenLineage-compatible data lineage capability will provide an end-to-end view of data movement over time.
- It helps users visualize and understand data provenance, trace change management, conduct root cause analysis when a data error is reported and prepare for questions on data movement from source to target.
- Reading through this press release they just keep using the same words, but we’re not sure the author fully understands what this is either.
- Except maybe Jonathan. He understands everything.
- Data lineage can reduce the time spent mapping a data asset and its relationships, troubleshooting and developing pipelines, and asserting data governance practices.
- If you’re in the DataZone, good luck to you.
19:16📢 Jonathan – “It’s just a chain of custody for data instead of a physical object. So you kind of attach metadata to the data to say where it came from. And every time you do something to it, you record in a log that you did something to it, especially in data science, where we’re enriching things or transforming things or cleaning out some types of records or whatever. It’s just a way of keeping track of who’s changed what since it came from where it came from. If stuff goes wrong in the future, you know where it went wrong.”
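If you haven’t seen one before, here is a minimal sketch of what an OpenLineage-compatible lineage event looks like, using the openlineage-python client. The job and dataset names are made up for illustration, and module paths can differ slightly between client versions:

```python
from datetime import datetime, timezone
from uuid import uuid4

from openlineage.client import OpenLineageClient
from openlineage.client.run import Dataset, Job, Run, RunEvent, RunState

# Point the client at whatever lineage backend you run (URL is a placeholder).
client = OpenLineageClient(url="http://localhost:5000")

event = RunEvent(
    eventType=RunState.COMPLETE,
    eventTime=datetime.now(timezone.utc).isoformat(),
    run=Run(runId=str(uuid4())),
    job=Job(namespace="analytics", name="orders_daily_rollup"),               # hypothetical job
    inputs=[Dataset(namespace="s3://raw-bucket", name="orders")],             # where the data came from
    outputs=[Dataset(namespace="s3://curated-bucket", name="orders_daily")],  # what it became
    producer="https://example.com/lineage-demo",
)

# Every transformation emits an event like this, building the chain of custody
# Jonathan describes: who changed what, and where it came from.
client.emit(event)
```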
- Amazon is announcing they are furthering the integration of CodeCatalyst with GitLab and Bitbucket, in addition to GitHub.
- Using the GitHub, GitLab, and Bitbucket repositories extension for CodeCatalyst simplifies managing your development workflow.
- The extension allows you to view and manage external repositories directly within CodeCatalyst.
- In addition to the added support, you can now create a CodeCatalyst project directly in GitHub, GitLab, or Bitbucket Cloud from a blueprint; adding a blueprint to an existing code base in the repository can speed up your development.
- These pre-built blueprint templates provide a source repo, sample code, CI/CD workflow and integrated issue tracking to get you started quickly.
22:16📢 Ryan – “Yeah, standardization in general is fantastic for this. And it’s so much easier to navigate lots of many, many separate little repos when you know how they’re organized, you know what’s supposed to be where. And it really removes a lot of the need for as much documentation because of that too. And what documentation you do need, you can include in the blueprint or at least the templates.”
23:22 AI21 Labs’ Jamba-Instruct model now available in Amazon Bedrock
- Jamba-Instruct, a powerful instruction-following large language model, is now available on Bedrock.
- Fine-tuned for instruction following and built for reliable commercial use, Jamba-Instruct can engage in open-ended dialogue, understand context and subtext, and complete a wide variety of tasks based on natural language instructions.
- Jamba-Instruct has a 256K context window, with the capability to ingest the equivalent of an 800-page novel or an entire company’s financial filings for a given fiscal year.
- This large context window allows Jamba-Instruct to answer questions and produce summaries that are grounded in the provided inputs, eliminating the need for manual segmentation of documents in order to fit smaller context windows.
- With strong reasoning and analysis capabilities, Jamba-Instruct can break down complex problems, gather relevant information and provide structured outputs. It is ideal for enterprise use cases such as enabling Q&A on call transcripts, summarizing key points from a document, building chatbots and more.
24:45 📢 Jonathan – “You can get Jamba for free, I think, from Hugging Face. It’s a good model though, and it’s a lot more efficient. It’s significantly more efficient than other LLMs that are around. It uses a hybrid algorithm, hybrid model. So you’ve got the kind of traditional LLM generator, and then you’ve got this new thing called Mamba, which is like a selective attention thing.”
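If you want to try it on Bedrock, here is a minimal sketch using boto3’s Converse API. The model ID is our assumption; confirm the exact identifier and region availability in the Bedrock console:

```python
import boto3

# Minimal sketch: calling Jamba-Instruct via Amazon Bedrock's Converse API.
# The model ID below is an assumption; check the Bedrock model catalog.
bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

response = bedrock.converse(
    modelId="ai21.jamba-instruct-v1:0",  # assumed Jamba-Instruct model ID
    messages=[
        {
            "role": "user",
            "content": [{"text": "Summarize the key risks in this earnings call transcript: ..."}],
        }
    ],
    inferenceConfig={"maxTokens": 512, "temperature": 0.4},
)

print(response["output"]["message"]["content"][0]["text"])
```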
27:34 Optimizing Amazon Simple Queue Service (SQS) for speed and scale
- Jeff Barr has a great write up on the recent speed and scale improvements to SQS.
- Amazon Simple Queue Service is one of AWS’s oldest services, launched in 2006, and is a fundamental building block for microservices, distributed systems, and serverless applications, processing over 100 million messages per second at peak times.
- While it’s been tremendously successful, there are always opportunities to improve, and Jeff shares some of the recent work to reduce latency, increase fleet capacity, reduce power consumption, and mitigate an approaching scalability cliff.
- Like many services, SQS is implemented using a collection of internal microservices. For this article Jeff focuses on the Customer Front-End which accepts, authenticates, and authorizes API calls such as CreateQueue and SendMessage, which are in turn routed to the storage backend.
- The backend microservice is responsible for persisting messages sent to standard (non-FIFO) queues. It uses a cell-based model: each cluster in a cell contains multiple hosts, each customer queue is assigned to one or more clusters, and each cluster is responsible for a multitude of queues.
- The original implementation used a connection per request between the front and back end. Each front end had to connect to multiple backend hosts, forcing the use of a connection pool and also risking reaching an ultimate, hard-wired limit on the number of open connections. While you could throw hardware at the problem and scale out, it’s not always the best way, it just moves the moment of truth (scalability cliff) into the future and does not make efficient use of resources.
- After considering several solutions, the SQS team invented a new, proprietary binary framing protocol between the customer front-end and storage back-end. The protocol multiplexes multiple requests and responses across a single connection, using 128-bit IDs and checksumming to prevent crosstalk.
- Server side encryption provides an additional layer of protection against unauthorized access to queue data.
- The new protocol was put into production earlier this year, and has processed 744.9 trillion requests as of the blog post. The scalability cliff has been eliminated and they’re looking to see if they can put the protocol to work in other ways.
- Overall the performance improvement has reduced dataplane latency by 11% on average, and by 17.4% at the P90 mark.
- Customers benefit from this too, with messages sent by SNS now spending 10% less time “inside” before being delivered. In addition, the current fleet can now handle 17.8% more requests than before.
31:25📢 Justin – “I was thinking about the, you know, the fact that using a 128-bit ID to checksum, to prevent the crosstalk and to basically identify the multiplexed streams going across the packet. And I was just thinking about how many combinations there are in 64-bit IP space. And then thinking about what that means at a 128-bit ID, I was like, wow, that’s some processing.”
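For a feel of what request multiplexing with 128-bit IDs and checksums looks like, here is a toy framing sketch. This is our own illustration, not AWS’s actual proprietary wire format:

```python
import struct
import uuid
import zlib

# Toy illustration of a multiplexed binary framing protocol: many requests share
# one connection, each frame tagged with a 128-bit ID and a checksum.

def encode_frame(payload: bytes) -> bytes:
    request_id = uuid.uuid4().bytes              # 128-bit ID to match responses to requests
    checksum = zlib.crc32(request_id + payload)  # guards against crosstalk/corruption
    header = struct.pack("!16sII", request_id, checksum, len(payload))
    return header + payload

def decode_frame(frame: bytes) -> tuple[bytes, bytes]:
    request_id, checksum, length = struct.unpack("!16sII", frame[:24])
    payload = frame[24 : 24 + length]
    if zlib.crc32(request_id + payload) != checksum:
        raise ValueError("checksum mismatch: frame corrupted or crossed streams")
    return request_id, payload

# Multiple logical requests can now be interleaved on a single connection.
frame = encode_frame(b'{"Action": "SendMessage", "MessageBody": "hello"}')
rid, body = decode_frame(frame)
print(rid.hex(), body)
```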
GCP
32:55 Making Vertex AI the most enterprise-ready generative AI platform
- Google says the use of AI has been impressive, and customers like Uber Eats, Ipsos, Jasper, Shutterstock, and others are accelerating their gen AI use cases into production with Google Cloud.
- Some examples of innovation Google has seen since the release of Gemini 1.5 Pro and its multimodal capabilities:
- A fast food retailer is using Gemini to analyze video footage from its stores to identify peak traffic periods and to optimize store layouts for improved customer experiences. The retailer also plans to combine this video analysis with sales data to better understand the factors that drive efficient and successful service.
- A financial institution is processing scanned images of identification with submitted data forms, leveraging Gemini’s multimodality to automatically (and quickly) process both images and text to compare information for accuracy and help customers more conveniently open and access accounts.
- A sports company is leveraging Gemini to analyze a player’s swing. By overlaying Gemini’s insights into their existing application, the AI’s analysis enhances the functionality of their swing analysis tool.
- An insurance company can now analyze dashcam footage of accidents using Gemini to better understand and describe scenarios. This analysis can help calculate risk scores and even provide personalized driving tips based on observed behaviors.
- An advertising and marketing services company is revolutionizing video description solutions by developing real-time streaming capabilities for both description and narration. This innovation streamlines video creation, increases efficiency, and allows for personalized content.
- With all of this Google is working to make Vertex the most enterprise-ready generative AI platform and they have several new capabilities they want to point out.
- Gemini 1.5 Flash combines low latency, competitive pricing and their groundbreaking 1 million-token context window, making it an excellent option for a wide variety of use cases at scale.
- Gemini 1.5 Pro now has a context window of up to 2 million tokens.
- Imagen 3: gives you faster image generation and superior prompt comprehension.
- Plus the expansion of the available models to include Claude 3.5 Sonnet.
- New context caching for Gemini 1.5 Pro and Flash is now available in public preview (see the sketch after this list).
- As context length increases, it can be expensive and slow to get responses in long-context applications, making it difficult to deploy in production. With Vertex AI context caching, customers can significantly reduce input costs (by 75%) by leveraging cached data for frequently used context.
- Google is the only provider to offer a context caching API today.
- General availability of Provisioned Throughput lets customers responsibly scale their usage of Google’s first-party models, providing assurances for capacity and price.
- Grounding with Google Search, announced at Next, is now generally available.
- Grounding with high-fidelity mode, announced in experimental preview, is purpose-built to support grounding use cases in financial services, healthcare, and insurance, where answers must be sourced only from the provided context, not the model’s world knowledge.
- Data residency requirements can now be met for 23 countries with at-rest guarantees.
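Here is a rough sketch of what context caching looks like with the Vertex AI Python SDK. The project, bucket, and file are placeholders, and the preview module paths and argument names may shift as the feature matures:

```python
import datetime

import vertexai
from vertexai.preview import caching
from vertexai.preview.generative_models import GenerativeModel, Part

# Cache a large, frequently reused context once; follow-up calls reference the
# cache instead of re-sending (and re-paying for) the same input tokens.
vertexai.init(project="my-project", location="us-central1")  # placeholders

cached = caching.CachedContent.create(
    model_name="gemini-1.5-pro-001",
    contents=[Part.from_uri("gs://my-bucket/annual-report.pdf", mime_type="application/pdf")],
    ttl=datetime.timedelta(hours=1),
)

model = GenerativeModel.from_cached_content(cached_content=cached)
print(model.generate_content("Summarize the risk factors section.").text)
```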
37:36📢 Justin – “I think about the coffee use case, right? So the retail point of sale gives you details like, hey, the coffee was sold to Ryan, but then when they gave it to him and it was made wrong and they had to then go remake it, like that’s not that the POS typically doesn’t capture where potentially the video stuff can do that. So you could, there’s all kinds of interesting use cases of how you can start to improve churn in your product margin. You can start helping for training baristas like Hey This barista made 12 coffees today that had to get remade. You know, there’s something wrong. Do we need to retrain them on something? So, it’s maybe a little nanny state -ish, but I also could see how some of these things could benefit some organizations that need to be very cost effective.”
39:58 New Cloud KMS Autokey can help encrypt your resources quickly and efficiently
- Encryption is a fundamental control for data security, sovereignty and privacy in the cloud.
- While Google Cloud provides default encryption for customer data at rest, many organizations want greater control over the encryption keys that control access to their data.
- CMEK – or Customer-Managed Encryption Keys – can help you by providing flexibility in cryptographic key creation, rotation, usage logging and storage.
- While CMEK provides the additional control that many organizations need, using it involves manual processes that take time and effort to ensure the desired controls and configurations are implemented.
- To help make CMEK configurations more efficient, Google is announcing the launch of Cloud KMS Autokey in preview. Cloud KMS Autokey automates key control operations for CMEK and incorporates recommended best practices that can significantly reduce the toil associated with managing your own encryption keys, which can help developers complete their projects faster.
- Cloud KMS Autokey eliminates the manual effort in key creation: key rings and keys are generated automatically during resource creation, and the necessary IAM roles for encryption and decryption operations are assigned at the same time (a sketch of the manual flow it replaces follows Ryan’s quote below).
- Autokey also simplifies key selection by automatically choosing the appropriate key type for each resource, reducing complexity and manual effort.
- When using Cloud KMS Autokey, you accomplish three vital goals:
- Ensuring consistent practices
- Creating granular encryption keys
- Increasing your productivity
41:16📢 Ryan – “Implementation will be key on this one, right? Cause it’s, it’s the difference between, they’ve created a wizard that creates KMS keys or they’ve really, you know, taken stuff, manual toil that, you know, this is probably not the biggest problem, right? But also it just reduces it. And, you know, and also it’s from my perspective, having worked with keys for a long time, I might be more familiar with these practices where someone who’s just learning about these, like it’s great way to kind of learn how to manage these things when it’s sort of automated as part of the backend.”
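For context on the toil being removed, this is roughly the manual CMEK flow Autokey automates, sketched with the google-cloud-kms client (project, location, and resource names are placeholders):

```python
from google.cloud import kms

# The manual CMEK flow that Autokey automates: create a key ring, create a key,
# then wire up IAM and key selection for every new resource.
client = kms.KeyManagementServiceClient()
parent = client.common_location_path("my-project", "us-central1")

key_ring = client.create_key_ring(
    request={"parent": parent, "key_ring_id": "app-keyring", "key_ring": {}}
)

key = client.create_crypto_key(
    request={
        "parent": key_ring.name,
        "crypto_key_id": "bucket-key",
        "crypto_key": {"purpose": kms.CryptoKey.CryptoKeyPurpose.ENCRYPT_DECRYPT},
    }
)
print(key.name)
# You still have to grant roles/cloudkms.cryptoKeyEncrypterDecrypter to each
# service agent and pick the right key for every resource you create. Autokey
# handles key creation, selection, and those IAM grants at resource-creation time.
```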
43:20 Google Cloud expands grounding capabilities on Vertex AI
- Vertex AI Agent Builder was introduced in April, and there are several new capabilities to help you build more capable agents and apps
- Grounding with Google Search is now Generally Available, with dynamic retrieval, a new capability to help balance quality with cost efficiency by intelligently selecting when to use Google Search results and when to use the model’s training data (see the sketch after this list).
- Grounding with High-Fidelity mode for grounded generation with reduced hallucinations.
- Grounding with third-party datasets is coming in Q3; these capabilities will help customers build AI agents and applications that offer accurate and helpful responses. Google is working with providers like Moody’s, MSCI, Thomson Reuters, and ZoomInfo.
- Vector Search, which provides embeddings-based RAG, is expanding and now offers hybrid search in public preview.
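A minimal sketch of grounding a Gemini response in Google Search results with the Vertex AI Python SDK; the project and model version are placeholders, and the dynamic-retrieval knobs are omitted since their exact preview parameters may differ:

```python
import vertexai
from vertexai.generative_models import GenerativeModel, Tool, grounding

vertexai.init(project="my-project", location="us-central1")  # placeholders

# Attach Google Search as a grounding tool so answers cite live search results.
search_tool = Tool.from_google_search_retrieval(grounding.GoogleSearchRetrieval())

model = GenerativeModel("gemini-1.5-flash-001")
response = model.generate_content(
    "What did HashiCorp's 2024 State of Cloud survey say about platform teams?",
    tools=[search_tool],
)

print(response.text)
# response.candidates[0].grounding_metadata lists the search citations backing the answer.
```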
44:12📢 Justin – “Wouldn’t it be nice to talk about like a new storage thing or anything but AI?”
45:12 Google Cloud Marketplace now lets customers buy ISV solutions from channel partners
- Google is now allowing Marketplace Channel Private Offers, which enables channel partners to maintain their relationships with customers as partner of choice.
- This comprehensive program allows customers, ISV Partners and channel partners to efficiently transact private offers via channel-partner-initiated sales of third party solutions listed on the google cloud marketplace.
- This builds on the capabilities introduced last year, allowing ISV partners on the Google Cloud Marketplace to extend discounts to select resellers.
46:07📢 Ryan – “So it’s like, I still want to pay you, but I also want that to count against my cloud commit.”
47:25 Gemma 2 is now available to researchers and developers
- Google is releasing Gemma 2 to researchers and developers globally.
- Available in both 9 billion and 27 billion parameter sizes, Gemma 2 is higher-performing and more efficient at inference than the first generation, with significant safety advancements built in.
- In fact, at 27B, it offers a competitive alternative to models more than twice its size, delivering the kind of performance that was only possible with proprietary models as recently as December. And that’s now achievable on a single NVIDIA H100 Tensor Core GPU or TPU host, significantly reducing deployment costs.
- Gemma 2 was built on a redesigned architecture, engineered for both exceptional performance and inference efficiency. Here is what makes it stand out:
- Outsized Performance: at 27B, Gemma 2 delivers the best performance for its size class, and even offers a competitive alternative to models more than twice its size.
- Unmatched efficiency and cost savings: The 27B Gemma 2 model is designed to run inference efficiently at full precision on a single Google Cloud TPU, NVIDIA A100 80GB Tensor Core GPU, or NVIDIA H100 Tensor Core GPU, significantly reducing costs while maintaining high performance. (Yeah, it runs real slow on Justin’s M1 MacBook… 🙂 )
- Blazing fast inference across hardware: Gemma 2 is optimized to run at incredible speed across a range of hardware, from gaming laptops and high-end desktops to cloud-based setups (a minimal local-inference sketch follows this list).
- Academic researchers can apply to the Gemma 2 Academic Research Program in order to receive Gemma 2 credits, and new Google Cloud users may be eligible for up to $300 in credits.
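For the local crowd, here is a minimal sketch of running the 9B instruction-tuned Gemma 2 with Hugging Face transformers; it assumes you have accepted the Gemma license on the Hub and have a GPU with enough memory (the 27B model needs considerably more):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/gemma-2-9b-it"  # instruction-tuned 9B variant
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # halves memory vs. float32
    device_map="auto",           # spreads layers across available GPUs
)

inputs = tokenizer("Explain data lineage in one paragraph.", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```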
49:39 Announcing expanded Sensitive Data Protection for Cloud Storage
- Google is announcing that the Sensitive Data Protection discovery service now supports Cloud Storage, in addition to the previously supported BigQuery, BigLake, and Cloud SQL.
- SDP discovery now covers the most common services their customers use to store data on Google Cloud.
- Sensitive data protection discovery provides continuous data monitoring to identify where sensitive data resides, in order to help manage security, privacy and compliance risks.
50:15📢 Ryan – “Wow! I just, yeah, I’ve always just assumed that this was available. In fact, like, you know, it’s recently been in talks trying to use this. And so glad they announced this. Yeah, this would have been terrible.”
- There wasn’t a whole blog post about this, just a little announcement, but essentially there’s a Capacity Planner update.
- It now displays GPU usage, and forecasts of your GPU usage.
- And yes, the purpose of this is to get people to buy reservations.
Azure
51:19 How hollow core fiber is accelerating AI
- So, let us first assure you that your RSS feeds aren’t broken – there have been no new blog posts from Azure in 3 weeks, but, in our ongoing commitment to you – our listener – we dug until we found SOMETHING to present.
- And yes… we are talking about Fiber Optics. But also somehow AI?
- Microsoft is dedicated to working with the newest and greatest technology to power scientific breakthroughs and AI solutions. One of these, highlighted at Ignite in November, is hollow core fiber (HCF), an innovative optical fiber that is set to optimize Azure’s global cloud infrastructure, offering superior network quality, improved latency, and secure data transmission.
- HCF was developed to meet the heavy demands of workloads like AI and to improve global latency and connectivity. (We’re pretty sure AI was not at the forefront, and this is just MS sprinkling everything with a little AI.)
- HCF uses a proprietary design where light propagates in an air core, which has significant advantages over fiber built with a solid core of glass. An interesting piece here is that the HCF structure has nested tubes which help reduce any unwanted light leakage and keep the light going in a straight path through the core.
- Light travels faster through air than through glass; HCF is about 47% faster than standard silica glass fiber, delivering increased overall speed and lower latency. It also has higher bandwidth per fiber. (The 47% figure checks out against the refractive index of silica; see the sketch after this list.)
- Honestly, we’re just glad to see advancements in Fiber.
- The AI tack on…ehhhhh.
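A quick sanity check on that 47% number, using the textbook refractive index of fused silica (roughly 1.47; the exact value varies slightly with wavelength):

```python
# Light in glass travels at roughly c / n; in an air core it travels at ~c.
# So a hollow core is about (n - 1) * 100 percent faster than a silica core.
n_silica = 1.468  # typical refractive index of fused silica at telecom wavelengths
speedup_pct = (n_silica - 1) * 100
print(f"Hollow core is ~{speedup_pct:.0f}% faster than solid silica fiber")  # ~47%
```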
OCI
- Oracle is announcing the GA of HeatWave GenAI, which includes the industry’s first in-database large language models (LLMs), an automated in-database vector store, scale-out vector processing, and the ability to have contextual conversations in natural language informed by unstructured content.
- These new capabilities enable customers to bring the power of generative AI to their enterprise data without requiring AI expertise or having to move data to a separate vector database.
- HeatWave GenAI is available immediately in all Oracle Cloud regions, OCI Dedicated Regions, and across clouds, at no extra cost to HeatWave customers.
- “HeatWave’s stunning pace of innovation continues with the addition of HeatWave GenAI to existing built-in HeatWave capabilities: HeatWave Lakehouse, HeatWave Autopilot, HeatWave AutoML, and HeatWave MySQL,” said Edward Screven, chief corporate architect, Oracle. “Today’s integrated and automated AI enhancements allow developers to build rich generative AI applications faster, without requiring AI expertise or moving data. Users now have an intuitive way to interact with their enterprise data and rapidly get the accurate answers they need for their businesses.”
- New features include:
- In-database LLMs
- Simplify the development of generative AI applications at a lower cost. Customers can benefit from generative AI without the complexity of external LLM selection and integration, and without worrying about the availability of LLMs in various cloud providers’ data centers.
- Automated In-database Vector Store
- Enables customers to use generative AI with their business documents without moving data to a separate vector database and without AI expertise. All the steps to create a vector store and vector embeddings are automated and executed inside the database, including discovering the documents in object storage, parsing them, generating embeddings in a highly parallel and optimized way, and inserting them into the vector store, making HeatWave Vector Store efficient and easy to use.
- Scale-out vector processing
- Delivers very fast semantic search results without any loss of accuracy. HeatWave supports a new, native VECTOR data type and an optimized implementation of the distance function, enabling customers to perform semantic queries with standard SQL (see the sketch below). The in-memory hybrid columnar representation and the scale-out architecture of HeatWave enable vector processing to execute at near-memory bandwidth and parallelize across up to 512 HeatWave nodes.
- HeatWave Chat
- A Visual Code plug-in for MySQL Shell which provides a graphical interface for HeatWave GenAI and enables developers to ask questions in natural language or SQL. The integrated Lakehouse Navigator enables users to select files from object storage and create a vector store. Users can search across the entire database or restrict the search to a folder.
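To make the “semantic queries with standard SQL” claim concrete, here is an illustrative sketch of what such a query could look like from Python. The table layout and the DISTANCE() function name are our assumptions for illustration, not verified HeatWave syntax:

```python
import mysql.connector

# Connect to a HeatWave-backed MySQL endpoint (connection details are placeholders).
conn = mysql.connector.connect(
    host="heatwave-host", user="app", password="example", database="docs"
)
cur = conn.cursor()

# Embedding of the user's question, produced elsewhere (values elided).
query_embedding = "[0.12, -0.03, 0.57]"

# Hypothetical semantic search: order document chunks by vector distance.
cur.execute(
    """
    SELECT doc_id, chunk_text
    FROM document_chunks
    ORDER BY DISTANCE(embedding, %s) ASC   -- embedding is a native VECTOR column
    LIMIT 5
    """,
    (query_embedding,),
)

for doc_id, chunk in cur.fetchall():
    print(doc_id, chunk[:80])
```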
56:43📢 Jonathan – “It’s like copilot. It’s our copilot, isn’t it?”
Closing
And that is the week in the cloud! Visit our website, the home of the Cloud Pod, where you can join our newsletter and Slack team, send feedback, or ask questions at theCloudPod.net, or tweet at us with the hashtag #thecloudpod