InfraOps Reliability Administrator

InsideHigherEd • Tallahassee, Florida, United States

Job Title: InfraOps Reliability Administrator

Location: Hybrid

Regular/Temporary: Regular

Full/Part Time: Full-Time

Job ID: 60506

Department

This position is within FSU’s Department of Information Technology Services (ITS).

Responsibilities

The FSU College of Medicine Infrastructure and Operations team designs, builds, and manages infrastructure and servers to support other IT teams, faculty, staff, researchers, and students within the college. The team leverages the latest in automation and observability solutions to make complex work easier to accomplish. Design, build, automate, and optimize infrastructure using modern tools and site reliability engineering practices. Manage primarily Windows servers in a hybrid cloud environment, with a focus on reliability, observability, security, and continuous improvement. Collaborate across teams and leverage automation, scripting, data-informed decision-making, and self-directed professional development to deliver secure, scalable, and customer-focused solutions.

Infrastructure and configuration as code: Use tools such as Terraform, Azure DevOps, Visual Studio Code, and scripting languages like PowerShell and Bash to manage infrastructure as code (IaC) and configuration as code (CaC), ensuring consistency, repeatability, and auditability of systems. Use observability solutions, such as Elastic, to monitor deployments and support data-informed decisions and rapid experiments that drive continuous improvement. Work with CI/CD pipelines to automate deployment, validation, and testing processes, ensuring systems are secure by design, mitigate vulnerabilities, and comply with security policies and standards. Follow secure coding practices, adhere to coding standards, and leverage version control, automated testing, and test-driven development to produce high-quality, secure, and maintainable code. Use AI-assisted tools to accelerate development, validation, and troubleshooting. Participate in pair programming sessions as appropriate to write code and resolve deployment issues.

Provision and manage server infrastructure: Deploy and manage Windows and Linux servers across a hybrid environment that includes Microsoft Azure and over a dozen geographically dispersed on-premises locations. This includes ensuring that all systems are secure by design, follow zero trust principles, and are scalable, observable, and aligned with business needs. Provision infrastructure with reliability, maintainability, and consistency in mind, and implement observability prior to production to support proactive monitoring and data-informed decisions. Collaborate with cross-functional teams and stakeholders throughout the infrastructure lifecycle to ensure solutions align with customer needs. Prioritize high-value work, assess feasibility, and conduct security reviews of new systems and applications. Deliver exceptional customer service and maintain clear communication to support successful outcomes.

Automation: Automation is not just a task; it is a mindset and a strategic enabler of reliability, consistency, and scalability. Design and implement solutions that make work easier, reduce manual effort, improve system reliability, and streamline operations across provisioning, configuration, monitoring, and remediation. Use AI, scripting, workflow automation, or robotic process automation (RPA) tools to reduce operational overhead and accelerate delivery. Use observability tools to monitor automation performance, ensure reliability, and identify data-informed opportunities for continuous improvement. Collaborate with peers and stakeholders to prioritize high-value automation opportunities and ensure that solutions are effective, secure, and aligned with business needs.

Network administration: Manage and troubleshoot enterprise-grade network infrastructure, including wireless access points, switches, routers, load balancers, and next-generation firewalls. Diagnose and resolve network issues using packet captures, OS command outputs, diagnostic consoles, logs, or other tools. Leverage network observability tools to make data-informed decisions and identify opportunities for improvement. Implement and maintain security measures to protect data, systems, and network availability. Collaborate with network and security teams to validate new systems and configurations, expand observability, reduce exploitable vulnerabilities, implement security controls, and enhance system resilience and usability for customers.

Documentation and process improvement: Create and maintain clear, concise documentation for knowledge sharing, process repeatability, and operational continuity. Develop system diagrams, deployment guides, and standard operating procedures (SOPs) that support usability, compliance, and reliability. Continuously refine documentation and processes as systems evolve, incorporating feedback and lessons learned. Ensure all procedures align with FSU ITS Security Policies and Standards. Participate in peer reviews to validate documentation for accuracy, clarity, and usability.

Support and incident response: Respond to system alerts, outages, and support requests in accordance with established incident management procedures, collaborating with peers and stakeholders to ensure rapid resolution. Use observability tools to support rapid diagnosis and resolution, and create new monitoring as needed to improve visibility. Participate in post-incident reviews, highlighting key data points and observability insights to identify root causes and opportunities for system or process improvements. Implement improvements to prevent the recurrence of issues and to enhance system reliability. Participate in an on-call rotation, typically one week per month, which includes after-hours support for deployments, changes, or incidents, including on holidays and weekends. Actively work to reduce the need for after-hours assistance by leveraging automated deployment solutions, improving system reliability, and lowering the risk and complexity of changes. Assist with IT security investigations as needed. Ensure incident response processes align with the expectations of IT management, technical teams, and customers.

Professional development: Continuous learning and technical curiosity are key expectations of this role. Complete both assigned and self-directed professional development to stay current with evolving technologies, tools, and practices. Explore technical subjects that interest you, even beyond current projects. Use provided learning platforms, such as LinkedIn Learning. Participate in the ITS Professional Development Bonus Plan by completing manager-approved certifications. Pursue relevant training, certifications, and conferences aligned with team goals, subject to approval. Approved training resources will be paid for by the organization. Research and validate emerging tools, including AI, automation, observability, and other innovations, to assess their value for our organization. Apply a mindset of rapid experimentation using data to guide decisions, improvements, and the next experiment. Participation in knowledge-sharing sessions, communities of practice, and collaborative learning opportunities is encouraged.

Qualifications

Bachelor's degree in Computer Science, MIS, or other appropriate degree and two years of experience, or a high school diploma or equivalent and six years of experience. (Note: or a combination of appropriate post-high-school education and experience equal to six years.)

Preferred Qualifications

Proven ability to learn new tools and technologies quickly, with a track record of self-directed learning and adaptability in fast-paced environments.
Demonstrated commitment to continuous learning and professional development.
Proficient in scripting for infrastructure automation using PowerShell, with the ability to write, debug, and maintain scripts independently or with tools like GitHub Copilot; familiarity with Python or Bash is a plus.
Experience using infrastructure and configuration as code tools such as Terraform, Ansible, PowerShell, or similar, with version control practices using Git, and integrated development environments like Visual Studio Code.
Experience creating and troubleshooting CI/CD pipelines using tools such as Azure DevOps, GitHub Actions, or GitLab to automate infrastructure deployment and configuration.
Experience provisioning and managing infrastructure in cloud environments such as Azure, AWS, or Google Cloud, with an understanding of repeatable deployment processes, and troubleshooting network connectivity with next-generation firewalls.
Experience deploying containers and familiarity with container orchestration technologies such as Kubernetes.
Proficient using observability tools such as Elastic, Dynatrace, Prometheus, Grafana, Splunk, Datadog, or others, to ingest new types of data, build dashboards and alerts, and derive insights for performance tuning and incident response.
Experience improving infrastructure design, automation, or troubleshooting by testing ideas, learning from results, and making thoughtful adjustments over time.
Experience supporting Windows and Linux systems in an Active Directory domain, including deployment, configuration, and troubleshooting, as well as managing virtual infrastructure using platforms such as Hyper-V or VMware.
Experience leveraging AI tools to accelerate task completion and improve operational efficiency.
Demonstrated ability to write and troubleshoot firewall rules and quickly diagnose issues across firewalls, switches, and wireless access points from vendors such as Palo Alto, Juniper, Aruba, Arista, Fortinet, Extreme, Brocade, Cisco, or others, with a focus on identifying root causes across network, OS, and application layers.
Strong understanding of secure-by-design and zero trust principles, with experience applying secure configurations and patching strategies in operational environments.
Demonstrated experience in infrastructure projects by planning and executing technical tasks such as system deployments, launching new remote locations, or automating business processes. This includes prioritizing high-value work, ensuring long-term maintainability through documentation and repeatable processes, leveraging automation where appropriate, and working closely with cross-functional teams to drive project success.
Strong written and verbal communication skills, including the ability to document processes, contribute in team discussions, and explain technical concepts to various audiences.
Proficient in creating technical diagrams to communicate infrastructure design or operational workflows.

Helpful

Why is this role important to the organization and its mission?

This is a high-impact position where your work supports clinicians caring for patients today, researchers working to improve health outcomes in the future, and students preparing to serve in clinical settings after graduation. You will have the opportunity to shape infrastructure strategy, contribute to architectural decisions, and lead automation efforts that improve reliability and efficiency. Whether you are already working in an InfraOps or SRE role, or you are a systems administrator ready to take the next step, we encourage you to apply.

Who is an ideal candidate for this position?

The ideal candidate is someone who is genuinely passionate about IT, someone who has probably built a home lab (or cloud lab) just for fun, and gets excited about scripting, automation, and making systems run better. They love learning new technologies and finding better ways to get things done, whether that is writing a PowerShell script, setting up workflow automation to schedule a calendar appointment, or integrating AI into their workflow to save time and reduce toil. They are the kind of person who starts the day looking forward to pushing code to the git repo, watching it flow through the deployment pipeline, and seeing it make a real impact on clinical, research, and education efforts.

This is a role for someone who loves solving puzzles, thrives in both independent and collaborative settings, and gets excited about working with modern tools to make infrastructure smarter, faster, and more reliable. They know how to communicate clearly, whether it is writing documentation, contributing in team meetings, or breaking down complex technical ideas for someone who just needs the bottom line. They understand how to capture and interpret network traffic when things go sideways, and they know how to turn that data into action. They are curious, self-driven, and motivated by the idea that their work helps support something bigger than just infrastructure: it enables clinicians caring for patients today, researchers working to improve health outcomes in the future, and students who will one day serve in clinical settings after graduation.

What is a typical day in this position?

A typical day begins with hands-on work such as writing scripts, deploying infrastructure, or addressing tasks that have not yet been automated. You may manually configure a Windows Server 2016 system as part of legacy support, or deploy a new Windows Server 2025 instance using the latest automation tools. After completing any manual work, you will often begin designing an automated solution to replace it, write the necessary scripts, and submit a pull request to improve future efficiency for the team.

You may participate in a pair programming session focused on scripting or automation, helping to refine logic, improve readability, or troubleshoot unexpected behavior. Throughout the day, you might respond to an automated alert or assist another IT team with a technical issue. When incidents arise, you will work alongside the rest of the team to investigate and resolve the issue, often using tools like Elastic to gather context and identify root causes. Each incident triggers a post-incident review, where the team reflects on what did not go as planned and then takes action over the following days to prevent similar issues from occurring again.

Toward the end of the day, you will join a team meeting to share updates, align on progress, and discuss priorities. You will also take time to plan and prioritize your work for the next day. Once a week, you will meet one-on-one with your supervisor to check in, share feedback, and work on removing any roadblocks that may be slowing progress.

What can I expect in the first 60-90 days?

In your first few weeks, you will work through onboarding tasks, get familiar with our environment, and shadow team members to learn how we approach infrastructure, automation, and reliability. You will begin writing scripts in your first week and gradually take on more responsibility as you gain context. By the end of your first month, you will be contributing to deployments, writing automation, and participating in troubleshooting efforts using modern tools and practices.

You will be integrated into broader initiatives and ongoing projects early on, contributing where your skills align and learning from the team along the way. While our project list is always evolving, one constant is the hands-on work of provisioning and configuring new or replacement infrastructure; we always use the latest supported Windows Server or Linux systems, whether on-premises or in Azure. You can expect to be involved in meaningful, high-impact work from the start, with opportunities to shape and automate the infrastructure that powers our organization.

We value the unique experience and perspective each team member brings, and we look forward to learning from you just as much as you learn from us. Our collaborative and innovative culture encourages finding better ways to accomplish work and achieve goals. You will play a key role in helping us improve our processes and remove barriers to success. As you build confidence and context in your first few months, you will be fully supported as you prepare to join our on-call rotation, where your contributions will directly support our reliability goals. We are committed to fostering a goal-oriented, easy-going environment where you can thrive.

University Information

One of the nation's elite research universities, Florida State University preserves, expands, and disseminates knowledge in the sciences, technology, arts, humanities, and professions, while embracing a philosophy of learning strongly rooted in the traditions of the liberal arts and critical thinking. Founded in 1851, Florida State University is the oldest continuous site of higher education in Florida. FSU is a community steeped in tradition that fosters research and encourages creativity. At FSU, there’s the excitement of being part of a vibrant academic and professional community, surrounded by people whose ideas are shaping tomorrow’s news!

Learn more about our university and campuses.

FSU Total Rewards

FSU offers a robust Total Rewards package. Visit our website to learn more about our Compensation, Benefits, Wellness, Recognition, and Employee Development programs.

Use our interactive tool to calculate Total Compensation options based on potential salary, benefits and retirement contributions, earned leave, and other employment-related perks.

How To Apply

If qualified and interested in a specific job opening as advertised, apply to Florida State University at https://jobs.fsu.edu. If you are a current FSU employee, apply via myFSU > Self Service.

Applicants are required to complete the online application with all applicable information. Applications must include all work history up to ten years, and education details even if attaching a resume.

Considerations

This is an A&P position.

This position requires successful completion of a criminal history background check.

This position is open until filled.

This position has been designated as eligible for primarily remote work based on the position functions. Employees are required to live in the Tallahassee area and report to campus as needed.

Equal Employment Opportunity

FSU is an Equal Employment Opportunity Employer.
