[HN Gopher] Automate Your Network
___________________________________________________________________

Automate Your Network

Author : hjuutilainen
Score  : 104 points
Date   : 2023-07-03 15:01 UTC (7 hours ago)

(HTM) web link (github.com)
(TXT) w3m dump (github.com)

| runjake wrote:
| The author states they have evolved from Ansible to pyATS[1], but
| pyATS is a Cisco project. With Cisco's poor track record on code
| projects and open source, I'm not sure how this is much of an
| improvement, and IMHO, it's arguably worse.
|
| For possible alternatives, check out NAPALM[2] and Nornir[3].
|
| It's also worth checking out Python for Network Engineers[4].
|
| 1. https://developer.cisco.com/docs/pyats/
|
| 2. https://napalm.readthedocs.io/en/latest/
|
| 3. https://nornir.readthedocs.io/en/latest/
|
| 4. https://pyneng.readthedocs.io/en/latest/index.html
  | xnyanta wrote:
  | Had the same reaction as soon as I found out pyATS is a
  | Cisco-specific thing. I run very simple networks for events on
  | shoestring hardware/budgets and built a simple wrapper around
  | my own object model using Python, Jinja and NAPALM to deploy
  | Cisco switches via SSH. Has Terraform-like semantics
  | (plan/apply) and lets me be productive and eliminate config
  | drift. NAPALM does all of the heavy lifting, it is fantastic. I
  | will probably be integrating it with NetBox soon.
  | batch12 wrote:
  | Looks like he works for Cisco at the moment. Maybe that has
  | something to do with it.
| betaby wrote:
| ctrl+f 'yang' - nothing
|
| ctrl+f 'netconf' - nothing
| dvno42 wrote:
| Hey this is cool! Thanks for sharing your hard work.
|
| I have been living this for the past few years building an
| automation product[0] and services company to lower the barrier
| to entry and have tested many of these methodologies. We've also
| written many different runbooks/playbooks for complicated
| workflows. I'd like to share a couple of experiences/opinions:
|
| Netconf and vendor APIs are lovely when available and working
| well.
| Many devices don't support this, and falling back to SSH
| (sometimes even telnet) is a must for automation. IMO, you could
| add value to your book by touching on ktbyers' Netmiko/Paramiko[1]
| as well as their nuances (timeouts, dealing with interactive
| prompts, etc).
|
| AAA is a big component of automation too. Having something in
| place to handle authn/authz (RADIUS/TACACS) enables consistency
| for access across vendors. This also enables least-privileged
| accounts and rotation/limited lifetime of creds when used with
| something like HashiCorp Vault[2]. I think you briefly mentioned
| secrets management through Ansible Vault.
|
| Another technology that may be worth mentioning is TextFSM[3] in
| conjunction with Netmiko. When we automate workflows for clients,
| there are often times when the data we need isn't easily
| parsable. Using and expanding on TextFSM makes this doable.
|
| Lastly, much automation may only be one firmware change away from
| breaking. Even with the big vendors, bugs are common that are
| (IME) low priority to the OEM. Keep this in mind when writing
| runbooks/playbooks, and try to rely on features and output that
| are unlikely to change across versions.
|
| [0] https://realmhelm.com
| [1] https://github.com/ktbyers/netmiko
| [2] https://github.com/hashicorp/vault
| [3] https://github.com/google/textfsm
  | Cyph0n wrote:
  | +1 to TextFSM: it is an extremely powerful approach to reliably
  | parse CLI-based outputs. I used to do some IOS-XR device
  | automation when I worked at Cisco - mainly for integration
  | testing - and I (and other teams) used it heavily.
  |
  | This ties in to your point about how you often need to fall back
  | to SSH or Telnet. For example, a lot of platform-specific data
  | isn't exposed through standard interfaces, but almost
  | everything is available through a CLI. There are also times
  | when you have no choice but to use the CLI - for example, when
  | re-imaging or reloading a device.
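The idea behind the TextFSM comments above, turning free-form CLI text into structured records, can be sketched with nothing but Python's standard library. This is not TextFSM itself: it uses the `re` module as a stand-in, with one named group per column playing roughly the role of a TextFSM `Value`. The sample output, the field names, and the `parse_interface_table` helper are all made up for illustration and are not taken from the book or from any specific device.

```python
import re

# Hypothetical "show ip interface brief"-style output, invented for this sketch.
SAMPLE_OUTPUT = """\
Interface              IP-Address      OK? Method Status                Protocol
GigabitEthernet0/0     192.0.2.1       YES manual up                    up
GigabitEthernet0/1     unassigned      YES unset  administratively down down
Loopback0              198.51.100.7    YES manual up                    up
"""

# One named group per column, similar in spirit to TextFSM "Value" definitions.
# The status column is the only one allowed to contain a space.
ROW_PATTERN = re.compile(
    r"^(?P<interface>\S+)\s+"
    r"(?P<ip_address>\S+)\s+"
    r"(?P<ok>\S+)\s+"
    r"(?P<method>\S+)\s+"
    r"(?P<status>administratively down|\S+)\s+"
    r"(?P<protocol>\S+)\s*$"
)


def parse_interface_table(cli_output):
    """Return one dict per data row; skip the header and unparseable lines."""
    rows = []
    for line in cli_output.splitlines():
        match = ROW_PATTERN.match(line)
        # The header row also fits the pattern, so filter it out by name.
        if match is None or match.group("interface") == "Interface":
            continue
        rows.append(match.groupdict())
    return rows


if __name__ == "__main__":
    for row in parse_interface_table(SAMPLE_OUTPUT):
        print(row["interface"], row["ip_address"], row["status"])
```

Matching on whitespace-separated fields rather than fixed character offsets is one way to stay closer to the advice about relying on output that is unlikely to change across firmware versions, though a real template would still need testing against each platform and release.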
| nu11ptr wrote:
| I do network automation professionally. I build tools
| (technically compilers) that take a proprietary object model
| designed for our private cloud and translate that into Ansible
| (v1) or Terraform (v2) code. At our company, I actually refer to
| using these tools in isolation as doing it "manually". This is
| because the largest benefit of automation, I believe, is the
| abstraction gained from the new object model and being able to
| generate and store the inputs for Ansible/Terraform in a
| database. If you have to track and specify all the inputs into
| Ansible/Terraform and write the playbooks/HCL manually, it is my
| experience you don't actually save all that much work. However,
| when you have an object model specifically designed for your use
| case, you can deliver a new client network in literally minutes
| (essentially nothing more than the cloud model, exactly what
| AWS, Azure, etc. do for their networking). The downside is most
| enterprises don't have people like me to write the code to do
| this, and writing it for a single deployment would likely not see
| the gains that we see as a managed service provider.
  | jagged-chisel wrote:
  | Are you using an open source tool/stack to do this? Sounds
  | pretty awesome and I'd love to learn!
  | jmbwell wrote:
  | There's a push and pull; Ansible and Terraform both have some
  | facilities for doing what you describe, but of course if you're
  | using both tools, then you wind up where you are, needing yet
  | another layer of abstraction common to both.
  |
  | In the book, the author presents an approach for storing the
  | object state and organizing the repository for Ansible purposes
  | in what is at least as sensible a way as any other I've seen.
  | For installations that might not directly benefit from
  | additional layers of abstraction, managing object model state
  | using Ansible's native functionality might well be sufficient.
  |
  | This is all a legitimate challenge, in any case.
  | Network infrastructure and service instances have some
  | management issues in common, but where they differ, they can
  | differ by quite a bit, in ways that are hard to model at any
  | level of abstraction.
    | nu11ptr wrote:
    | I'm not using both. The first version of my tool used
    | Ansible. The second version used Terraform. They were written
    | 4 years apart. My users are not devops savvy. They use
    | runbook forms to call into my API, giving them a very simple
    | UI that requires almost zero input. The object model includes
    | lifecycling so certain attributes can be changed, etc., and
    | validation is done to ensure only a correct network is output.
    | This isn't required by everyone; it wasn't done out of
    | necessity because of how I'm using the tools, but to satisfy
    | the business problem I'm trying to solve (automate network
    | deployment with as few human inputs as possible over the
    | entire lifespan of a client and infrastructure).
    |
    | I wasn't critiquing the author, but networks inherently have
    | a lot of input data. Much of this is not of concern to the
    | end user, hence why public clouds require almost zero input
    | on the network side.
    |
    | I agree that my object model is purpose built for our
    | product. It would not work for someone else's network.
  | xnyanta wrote:
  | This model is probably more common than you think. I don't see
  | how anyone would be doing this any other way in a scalable
  | fashion.
  | tmerse wrote:
  | This sounds interesting, but I am not sure I fully understand.
  | Could an analogy be that the object model loosely corresponds
  | to something like the AWS CDK, with the Ansible part being the
  | derived CloudFormation? (Any other analogy should do, but those
  | are things I understand a bit more; I use quite a bit of
  | Ansible, but I am no network person.) I still don't fully
  | understand the database part. Is it a better way to manage env
  | variables, or does it allow for more flexible input?
  |
  | Thank you
    | nu11ptr wrote:
    | Essentially we have a very specific network topology we are
    | trying to build for each of our clients. The goal is to
    | auto-generate as much of the input as possible, validate that
    | which is given, and allow it to be lifecycled (attributes can
    | change, but only in certain valid ways; objects can be
    | created/changed/deleted, but only if they aren't referenced
    | by other objects, etc). Due to this, a database is needed to
    | store each "object". When the network is "pushed", the
    | database is walked and a fresh set of Ansible (or Terraform
    | for v2) is generated in seconds.
    |
    | IOW, it is a custom set of Lego bricks that can only be
    | combined in certain ways to build valid networks. It is
    | proprietary to our cloud product, which has the benefit of
    | allowing us to abstract things away that others probably
    | couldn't, but the downside of making it entirely
    | non-reusable for a different use case.
  | totallywrong wrote:
  | Isn't that a lot of words to say that you have a custom set of
  | Terraform modules for your needs? If you're describing a
  | different or better way to do it I'm missing it.
    | nu11ptr wrote:
    | No. It is a frontend application that works as a CRUD REST
    | API, validates the data, generates what it can, and stores it
    | into a database/IPAM. It can then be changed, viewed,
    | modified, deleted, etc.
    |
    | When you are ready to deploy I "compile" the object model
    | data into an IR (representing the "network topology") and
    | then make a final pass and translate into HCL for all the
    | various backends.
    |
    | I'm not saying it's "better" as it has trade-offs. I'm saying
    | for networks specifically, it is the only way I've seen in
    | the real world to give these tools lots of value. Otherwise
    | the network engineers end up spending all their time looking
    | up the input data (VLANs, subnets, IPs, etc.) which is the
    | part that is most time consuming for manual configuration as
    | well.
    | The validation and auto-generation of the input data is
    | where the value comes in.
      | totallywrong wrote:
      | Got it thanks, makes sense. The way I've frequently seen
      | this done, which is more in line with the IaC and GitOps
      | trends, is people making a PR to the config repo with the
      | required values. Then a pipeline runs and does all
      | validations, pulls data from external sources, and runs the
      | terraform plan. If everything looks good upon review, a
      | merge applies the saved plan.
  | tguvot wrote:
  | i worked on a product that did something similar for telecoms.
  | had a closed loop automation and graphical designer for object
  | model. it was 10 years ago.
  |
  | looking today at all the manual work with playbooks/etc, it's
  | astonishing. feels like things didn't move forward at all in
  | past decade
    | dopylitty wrote:
    | Even in the big public clouds the user facing networking
    | really hasn't progressed beyond a layer of lipstick on top of
    | the kludges that were created for connecting physical servers
    | 40 years ago.
    |
    | For instance in AWS you still have to care about BGP and ASNs
    | if you want to follow the most seamless approach to create a
    | multi-region mesh of VPCs. Why should I have to care about
    | that? AWS already knows where all the packets came from and
    | where they're going and should just put them in the right
    | place. I don't care how they get there and I certainly
    | shouldn't have to care about BGP attributes[1].
    |
    | 1. https://docs.aws.amazon.com/network-manager/latest/cloudwan/...
| theideaofcoffee wrote:
| I glanced through the guide and it's Windows and Cisco
| (specifically IOS) heavy: mentions of the old Cisco architecture
| via Core/Access/Distribution, where larger DC networks have
| converged onto spine/leaf setups; CDP (Cisco Discovery Protocol),
| whereas the vendor-neutral LLDP is more generic; even the
| nomenclature of 802.1q VLAN tags: access versus trunk.
| But I guess if you are starting to automate a legacy office
| network, it might be useful.
|
| More recent non-IOS network OSes that lend themselves to
| automation, especially in the datacenter, the likes of Cumulus or
| SONiC, are pure Linux with some ASIC-vendor-specific bits and
| bobs, so I'm unsure of the applicability of this guide to larger,
| more modern networks. Tools like Ansible could be a good fit
| here, but since they are 'just' Linux, might as well use a
| dedicated config management tool like Chef or Puppet.
|
| Otherwise I think it's well written for someone in a smaller shop
| wanting to get their feet wet with Ansible and other tools but
| still stuck on IOS.
  | jimmar wrote:
  | > old Cisco architecture via Core/Access/Distribution, where
  | larger DC networks have converged onto spine/leaf setups
  |
  | Please correct me if I'm wrong, but I see the "old"
  | core/access/distribution layers as still relevant. The
  | datacenter spine/leaf setup applies to networking between
  | server racks in the data center.
  |
  | > 802.1q VLAN tags: access versus trunk
  |
  | Again, are you saying that these are outdated? I'm not a
  | practicing network engineer, but I know several network
  | engineers and they've told me that understanding 802.1q VLAN
  | tags to segment network traffic has been helpful.
    | kazen44 wrote:
    | > Please correct me if I'm wrong, but I see the "old"
    | core/access/distribution layers still relevant. The
    | datacenter spine/leaf setup applies to networking between
    | server racks in the data center.
    |
    | this is correct. The place where spine-leaf really shines is
    | when used in combination with EVPN-VXLAN. You can then
    | encapsulate every tenant network inside a VXLAN domain and
    | route those between your leaf switches through your spine
    | layer.
    |
    | This is basically a Clos fabric which is non-blocking, and is
    | very easy to expand horizontally. It also gives you nice
    | features like ARP suppression[0].
    | These features are important in a DC fabric because ARP
    | flooding is traffic which is not revenue generating, and
    | should be minimized as much as possible.
    |
    | For a normal enterprise/office network, running an EVPN-VXLAN
    | fabric is usually far too complex for the benefits involved.
    |
    | [0] https://satishdotpatel.github.io/how-does-arp-suppression-wo...
    | darkr wrote:
    | > 802.1q VLAN tags: access versus trunk
    |
    | I think the parent was saying that these are Cisco-specific
    | terms; more generic terms would be "untagged" + "tagged".
      | ajsnigrutin wrote:
      | Trunk and access ports are like kleenex and bandaids. Yes,
      | technically Cisco terminology, but used everywhere.
        | iso1631 wrote:
        | Absolutely, here's a config from one of my Aristas (with
        | bits snipped):
        |
        |   interface Ethernet1
        |      switchport trunk native vlan 899
        |      switchport trunk allowed vlan 801
        |      switchport mode trunk
        |   interface Ethernet13
        |      switchport access vlan 311
        |
        | And on a Juniper:
        |
        |   set interfaces xe-0/2/1 unit 0 family ethernet-switching interface-mode trunk
        |   set interfaces xe-0/2/1 unit 0 family ethernet-switching vlan members Mgmt_B
        |   set interfaces xe-0/2/1 unit 0 family ethernet-switching vlan members Audio_2
        |   ....
        |   set interfaces ge-0/0/19 unit 0 family ethernet-switching interface-mode access
        |   set interfaces ge-0/0/19 unit 0 family ethernet-switching vlan members Audio_2
        |
        | When Cisco, Arista, and Juniper all use access vs trunk,
        | it's hardly a vendor-specific term.
| metadat wrote:
| Direct link to the PDF:
|
| https://github.com/automateyournetwork/automate_your_network...
___________________________________________________________________
(page generated 2023-07-03 23:01 UTC)