Posted by: drdread - 04 February 2025 20:17:06
Could we make a framework for using the MS Server 2022 AI components to run LLMs in an IIS publishing instance?
Posted by: sirclesadmin - 04 February 2025 20:18:00
Yes, it is possible to build a framework for running Large Language Models (LLMs) using Windows Server 2022 AI components within an IIS (Internet Information Services) instance. Here's a structured approach to making it work:

Framework for Running LLMs on Windows Server 2022 with IIS

1. System Requirements

Before setting up, ensure your server has:

- Windows Server 2022 with the IIS role installed
- .NET 6/7+ and/or Python 3.10+ for the inference layer
- ONNX Runtime (CPU build, or a GPU build via DirectML/CUDA if hardware allows)
- Enough RAM and disk for the model you plan to serve (a few GB for GPT-2-class models, far more for larger LLMs)
2. Setting Up IIS for AI Model Hosting

IIS can serve an AI inference API by hosting an application (ASP.NET Core, FastAPI behind a reverse proxy, or Flask with WSGI). Steps to configure IIS for LLM API hosting (a PowerShell sketch follows the list):

- Install the IIS role (Web Server) with management tools
- For ASP.NET Core apps, install the ASP.NET Core Hosting Bundle
- For Python apps, install the URL Rewrite module and Application Request Routing (ARR) so IIS can reverse-proxy to the Python process
- Create an IIS site (or application) and bind it to the desired host name and port
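A minimal sketch of the first step, assuming a fresh server (URL Rewrite and ARR are separate downloads from Microsoft, not installable via a built-in cmdlet):

```powershell
# Install the IIS web server role with management tools
Install-WindowsFeature -Name Web-Server -IncludeManagementTools

# Verify IIS responds on port 80
Invoke-WebRequest http://localhost -UseBasicParsing | Select-Object StatusCode
```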
3. Running LLM Inference on MS Server 2022 AI Components

Windows Server 2022 supports:

- ONNX Runtime for optimized CPU/GPU inference
- DirectML for hardware-accelerated inference on Windows GPUs
- ML.NET / Windows ML for integrating models into .NET applications
3.1 Convert and Optimize the LLM for ONNX
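For example, with Hugging Face Optimum (a sketch; GPT-2 is used as a small running example, and use_cache=False keeps the exported graph free of past-key-value inputs so the inference examples below stay simple):

```python
# pip install "optimum[onnxruntime]" transformers
from optimum.onnxruntime import ORTModelForCausalLM
from transformers import AutoTokenizer

model_id = "gpt2"  # small example model; swap in your own
model = ORTModelForCausalLM.from_pretrained(model_id, export=True, use_cache=False)
tokenizer = AutoTokenizer.from_pretrained(model_id)

model.save_pretrained("gpt2_onnx")      # writes model.onnx; later steps refer to it as gpt2.onnx
tokenizer.save_pretrained("gpt2_onnx")
```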
3.2 Running ONNX Inference via a .NET API

Alternatively, if using ASP.NET with ONNX, a C# web API can run AI inference, as sketched below.
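A console-style sketch of loading the model and running one forward pass (assumptions: the Microsoft.ML.OnnxRuntime NuGet package, a gpt2.onnx with inputs input_ids / attention_mask and output logits, and a hypothetical model path):

```csharp
using System;
using System.Linq;
using Microsoft.ML.OnnxRuntime;
using Microsoft.ML.OnnxRuntime.Tensors;

// Load the exported model once; the path is an assumption for this sketch.
using var session = new InferenceSession(@"C:\ai_models\gpt2.onnx");

// Token IDs would normally come from a tokenizer; hard-coded for illustration.
long[] tokens = { 15496, 11, 995 };
var shape = new[] { 1, tokens.Length };
var inputs = new[]
{
    NamedOnnxValue.CreateFromTensor("input_ids", new DenseTensor<long>(tokens, shape)),
    NamedOnnxValue.CreateFromTensor("attention_mask",
        new DenseTensor<long>(Enumerable.Repeat(1L, tokens.Length).ToArray(), shape)),
};

using var results = session.Run(inputs);
var logits = results.First().AsTensor<float>();   // shape: [1, seq_len, vocab_size]
Console.WriteLine($"Logits rank: {logits.Rank}, vocab size: {logits.Dimensions[2]}");
```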
Expose this via an ASP.NET Core API inside IIS.

4. Performance Optimization

- Load the model once at startup and reuse the inference session across requests
- Quantize the model (e.g., to INT8) to cut memory use and latency
- Use a GPU execution provider (DirectML or CUDA) where hardware allows
- Tune the IIS application pool: disable idle timeout and enable preload so the model stays resident
5. Deployment Steps

1. Convert and optimize the model to ONNX.
2. Build the inference API (ASP.NET Core or FastAPI).
3. Publish the application and configure the IIS site or reverse proxy.
4. Test the endpoint and monitor performance under load.
Conclusion

This framework allows Windows Server 2022 to host LLMs inside IIS using:

- ONNX Runtime for efficient model inference
- An ASP.NET Core or FastAPI web layer for the API
- IIS as the host and reverse proxy
This setup provides scalable, enterprise-level LLM hosting, capable of serving AI-powered applications on a Windows Server stack.
Posted by: drdread - 04 February 2025 20:18:47
So how do I deploy the AI after this?
Posted by: sirclesadmin - 04 February 2025 20:19:21
Deploying the AI Model After Running the Setup Script

Now that your Windows Server 2022 is ready with IIS, ONNX Runtime, and Python/.NET, follow these steps to deploy your Large Language Model (LLM) API.

Deployment Steps for LLM on IIS (Using Python/ONNX)

1. Choose Your AI API Framework

You can deploy the LLM as a web API using:

- FastAPI (Python)
- Flask (Python)
- ASP.NET Core (C#)
For this example, we will use FastAPI (Python) because it's lightweight, fast, and integrates well with ONNX Runtime.

2. Upload Your AI Model to the Server

If you converted your model to ONNX, upload it to a models directory on the server (the examples below assume C:\ai_service\gpt2.onnx).
3. Install Python Dependencies (If Not Already Installed)

Log in to your server, open PowerShell, and run:
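Something along these lines (package set assumed to cover ONNX inference plus the FastAPI layer):

```powershell
pip install fastapi uvicorn onnxruntime transformers numpy
```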
This ensures all necessary packages for inference and API deployment are installed.

4. Create the AI API Using FastAPI

On the server, create a new folder for the AI service:
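For example (location assumed, matching the model path above):

```powershell
mkdir C:\ai_service
cd C:\ai_service
```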
Create a Python script for the API:
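For instance, using Notepad (the file name app.py is an assumption; the service commands later use the same name):

```powershell
notepad app.py
```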
Paste the following Python code to load the LLM and serve API requests:
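A minimal sketch of such a script (assumptions: the gpt2.onnx export has inputs input_ids / attention_mask and output logits, and sits in C:\ai_service; greedy decoding with no KV cache, so it is illustrative rather than production-grade):

```python
import numpy as np
import onnxruntime as ort
from fastapi import FastAPI
from pydantic import BaseModel
from transformers import AutoTokenizer

app = FastAPI()
tokenizer = AutoTokenizer.from_pretrained("gpt2")
session = ort.InferenceSession(r"C:\ai_service\gpt2.onnx")

class Prompt(BaseModel):
    text: str
    max_new_tokens: int = 30

@app.post("/generate")
def generate(prompt: Prompt):
    ids = tokenizer(prompt.text, return_tensors="np")["input_ids"].astype(np.int64)
    for _ in range(prompt.max_new_tokens):
        mask = np.ones_like(ids)
        logits = session.run(None, {"input_ids": ids, "attention_mask": mask})[0]
        next_id = int(logits[0, -1].argmax())  # greedy: pick the most likely next token
        ids = np.concatenate([ids, np.array([[next_id]], dtype=np.int64)], axis=1)
        if next_id == tokenizer.eos_token_id:
            break
    return {"completion": tokenizer.decode(ids[0], skip_special_tokens=True)}
```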
Save the file and close Notepad.

5. Test the AI API Locally

Run the API manually first:
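Using the file name and port assumed above:

```powershell
uvicorn app:app --host 0.0.0.0 --port 8000
```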
Then, in a web browser, open:

http://localhost:8000/docs
You should see the FastAPI Swagger UI, where you can test the generate endpoint.

6. Deploy the AI API as a Windows Service (Persistent)

Instead of running it manually, let's set it up as a Windows Service so it runs automatically.
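One common approach (an assumption here) is to wrap uvicorn with NSSM (https://nssm.cc):

```powershell
# Adjust the python.exe path to your install
nssm install AIService "C:\Python311\python.exe" "-m uvicorn app:app --host 0.0.0.0 --port 8000"
nssm set AIService AppDirectory C:\ai_service
nssm start AIService
```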
Now the AI service will automatically restart if the server reboots.

7. Configure IIS as a Reverse Proxy for Public Access

Now let's make the AI API publicly accessible via IIS:

- Install the URL Rewrite module and Application Request Routing (ARR)
- In IIS Manager, enable the proxy in the ARR settings
- Add a rewrite rule that forwards incoming requests to the local service on port 8000 (see the web.config sketch below)
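A hypothetical web.config for the IIS site (assumes ARR and URL Rewrite are installed and the proxy is enabled):

```xml
<!-- Forwards all site traffic to the local uvicorn service on port 8000 -->
<configuration>
  <system.webServer>
    <rewrite>
      <rules>
        <rule name="ProxyToAI" stopProcessing="true">
          <match url="(.*)" />
          <action type="Rewrite" url="http://localhost:8000/{R:1}" />
        </rule>
      </rules>
    </rewrite>
  </system.webServer>
</configuration>
```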
Now you can access the AI API using:

http://&lt;your-server-or-domain&gt;/generate
Final Deployment Checklist

- AI Model Deployed in ONNX Format
- FastAPI Service Running Persistently as a Windows Service
- IIS Reverse Proxy Exposing the API Publicly

Now your AI is live and accessible from the web!
Posted by: drdread - 04 February 2025 20:19:36
How would I do it with .NET Core to keep everything MS compatible?
Posted by: sirclesadmin - 04 February 2025 20:20:10
Deploying LLM with .NET Core on Windows Server 2022 (IIS & ONNX)

To keep everything Microsoft-compatible, we'll use .NET 6/7+, ONNX Runtime, and IIS for hosting the AI API. This method ensures full Microsoft stack compatibility, with no separate Python process to proxy.

Steps to Deploy LLM in .NET Core on Windows Server 2022

1. Install Required Software

Ensure your Windows Server 2022 is ready:

- .NET 6/7 SDK and the ASP.NET Core Hosting Bundle (so IIS can run the app)
- IIS role enabled
- The Microsoft.ML.OnnxRuntime NuGet package (added to the project in step 3)
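A quick PowerShell check (sketch; the ASP.NET Core Hosting Bundle itself is a separate download from Microsoft's .NET site):

```powershell
dotnet --version                                                   # confirm the .NET SDK is installed
Install-WindowsFeature -Name Web-Server -IncludeManagementTools    # enable the IIS role
```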
2. Prepare AI Model

If you haven't already converted your LLM to ONNX format, do this:
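One way to do the conversion from a command line (an assumption; requires Python with Optimum installed, and --task text-generation avoids past-key-value inputs):

```powershell
pip install "optimum[exporters]"
optimum-cli export onnx --model gpt2 --task text-generation gpt2_onnx/
```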
Upload gpt2.onnx to the server (the controller sketch below assumes it sits in the project folder).

3. Create .NET Core AI Web API
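For example (the project name LlmApi is an assumption reused below):

```powershell
dotnet new webapi -n LlmApi
cd LlmApi
dotnet add package Microsoft.ML.OnnxRuntime
```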
4. Implement ONNX Model Inference

Replace the template controller with one that runs ONNX inference, as sketched below.
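A minimal sketch of an inference controller (assumptions: the LlmApi project from step 3, gpt2.onnx in the project folder with inputs input_ids / attention_mask and output logits; it accepts token IDs directly because tokenization in pure .NET would need an extra library):

```csharp
using Microsoft.AspNetCore.Mvc;
using Microsoft.ML.OnnxRuntime;
using Microsoft.ML.OnnxRuntime.Tensors;

namespace LlmApi.Controllers;

public record GenerateRequest(long[] TokenIds, int MaxNewTokens = 30);

[ApiController]
[Route("[controller]")]
public class GenerateController : ControllerBase
{
    // Load the model once and reuse it across requests.
    private static readonly InferenceSession Session = new("gpt2.onnx");

    [HttpPost]
    public IActionResult Post([FromBody] GenerateRequest request)
    {
        var ids = new List<long>(request.TokenIds);

        for (var step = 0; step < request.MaxNewTokens; step++)
        {
            var shape = new[] { 1, ids.Count };
            var inputIds = new DenseTensor<long>(ids.ToArray(), shape);
            var mask = new DenseTensor<long>(Enumerable.Repeat(1L, ids.Count).ToArray(), shape);

            using var results = Session.Run(new[]
            {
                NamedOnnxValue.CreateFromTensor("input_ids", inputIds),
                NamedOnnxValue.CreateFromTensor("attention_mask", mask),
            });

            // logits shape: [1, seq_len, vocab_size]; greedy-pick the last position's argmax.
            var logits = results.First().AsTensor<float>();
            int vocab = logits.Dimensions[2];
            int last = ids.Count - 1;
            long next = 0;
            float best = float.MinValue;
            for (var v = 0; v < vocab; v++)
            {
                var score = logits[0, last, v];
                if (score > best) { best = score; next = v; }
            }
            ids.Add(next);
        }

        return Ok(new { token_ids = ids });
    }
}
```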
5. Test the API Locally

Run the API:
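From the project folder:

```powershell
dotnet run
```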
Visit:

http://localhost:&lt;port&gt;/swagger

(the port is printed in the dotnet run output)
and test the Generate endpoint from the Swagger UI.

6. Publish & Deploy to IIS

- Publish a release build to a folder IIS can serve (command below)
- In IIS Manager, create a new site (or application) pointing at that folder
- Set the application pool's .NET CLR version to "No Managed Code" (ASP.NET Core runs via its own module)
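For example (output folder assumed):

```powershell
dotnet publish -c Release -o C:\inetpub\wwwroot\llmapi
```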
Now your AI API is live at:

http://&lt;your-server&gt;/generate
Final Deployment Summary

- Windows Server 2022 AI-ready
- LLM Exported to ONNX and Served via ONNX Runtime
- ASP.NET Core Web API Published to IIS

Now you have a Microsoft-native LLM API running on IIS with .NET Core!