Living in the dark
Stars - Large Magellanic Cloud from wikipedia

It is a common requirement for government and enterprise customers to limit access to the internet. One common way of doing this is to use a proxy server that restricts outbound access to a predefined list of URLs.

Access to resources outside the boundaries of a VPC is necessary for installing software and for communicating with the AWS control plane. It is important, however, not to proxy every request, because communication with local resources may then stop working.

Some of the tools we are currently using in our project:

  1. AWS - All the infrastructure is hosted in AWS
  2. Jenkins - for automation, testing and deployment of both applications and infrastructure
  3. Terraform - infrastructure as code
  4. Packer - building AMIs
  5. Amazon Linux 2 - preferred base operating system
  6. OpenVPN - Inbound connectivity for developers and administration users
  7. Squid - Outbound proxy

Web proxies in Linux are mainly controlled by the following environment variables:

  • NO_PROXY - sites to access without proxy
  • HTTP_PROXY
  • HTTPS_PROXY
  • FTP_PROXY
  • ALL_PROXY - used for all protocols

Lots of software uses a shared library called libproxy. Other programs appear to have their own implementation in their standard HTTP library. Unfortunately there is no standard for these variables, and implementations do differ.

Case of variables

Most applications read the lower-case names (http_proxy, https_proxy and so on) for everything except NO_PROXY, which is usually read in upper case. If you set both cases of a variable, I would advise setting them to the same value.
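A minimal sketch of setting both cases from a single value, so the two can never disagree. The proxy host name and port are placeholders (3128 is Squid's default port), not values from our environment:

```shell
# Placeholder proxy address; replace with your own Squid host and port.
proxy_url="http://proxy.internal.test.com:3128"

# Lower-case names, read by most tools.
export http_proxy="$proxy_url"
export https_proxy="$proxy_url"

# Upper-case names, set to exactly the same value.
export HTTP_PROXY="$http_proxy"
export HTTPS_PROXY="$https_proxy"
```

Deriving the upper-case variables from the lower-case ones means a later change to the proxy address only has to be made in one place.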

The use of ALL_PROXY

ALL_PROXY seems to be far less widely supported than the others, so it is best just to use the protocol-specific variants.

No Proxy

The no proxy variable is a comma-separated list of addresses to access directly rather than through the proxy.

IP or domain names

The no proxy environment variable can contain both IP addresses and domain names. None of the programs we have tested will convert between the two. If you are trying to access a resource hosted locally on 192.168.1.15, then you will need to add its IP address to the no proxy list. If the same resource is accessed by the name service.internal.test.com, then you will need to add that name to the list instead.
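For example, if the same service is reached by IP address in some scripts and by name in others, both forms need to be listed. The address and host name are the illustrative ones used above:

```shell
# Both forms are listed because the tools we tested compare the requested
# host against these entries as written and do not resolve names to addresses.
export no_proxy="192.168.1.15,service.internal.test.com"
export NO_PROXY="$no_proxy"
```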

In a cloud environment, you probably want to add domain names rather than IP addresses wherever possible. IP addresses should not be considered static, and a resource will normally get a new IP address when it is replaced.

Wildcards

A star (*) can be used as a wildcard to match all domains. If you want to match just the subdomains of a specific domain, then you can use a leading dot:

no_proxy=.internal.test.com

CIDR blocks

If you want to whitelist all your local IP addresses, you can use a CIDR block like:

no_proxy=192.168.10.0/24

But be warned: this works in most situations, but does not appear to work with Python (2 or 3). As a result, it will not be respected by the AWS CLI, which is written in Python.

The only good news is that, in all the software I have tried, values that are not understood seem to be ignored rather than cause errors, so the rest of the list is still evaluated correctly.
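One possible workaround for tools that skip CIDR entries, sketched here rather than something we have rolled out, is to keep the CIDR block and also list every host address in the range explicitly. The loop assumes the 192.168.10.0/24 block from the example above; larger ranges would make the variable impractically long, and domain names remain the tidier option.

```shell
# Build a comma-separated list of the host addresses 192.168.10.1 to .254.
hosts=""
i=1
while [ "$i" -le 254 ]; do
  # ${hosts:+$hosts,} adds the separating comma only once the list is non-empty.
  hosts="${hosts:+$hosts,}192.168.10.$i"
  i=$((i + 1))
done

# Keep the CIDR entry for tools that understand it, then add the explicit
# addresses for tools that only match entries as written.
export no_proxy="192.168.10.0/24,$hosts"
```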

What to include

If you are hosting in AWS as we are, be sure to include the AWS metadata service (169.254.169.254), or you will stop IAM roles working as well as potentially causing other problems. Another IP address you may want to include is the loopback address.
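Put together, a starting no proxy list for an EC2 instance might look like the sketch below. Adding the localhost name alongside the loopback address is my own assumption, and the internal domain is the illustrative one used earlier:

```shell
# 169.254.169.254 is the link-local address of the EC2 instance metadata service.
# localhost and 127.0.0.1 cover the loopback interface.
# .internal.test.com is the example internal domain, matched by suffix.
export no_proxy="169.254.169.254,localhost,127.0.0.1,.internal.test.com"
export NO_PROXY="$no_proxy"
```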

How to propagate environment variables

There are several places you can set environment variables. You can set them either for a local user in their .bashrc and .bash_profile or you can set them in the global config.

The best way to set these environment variables in all situations for all users is to add them as exports in /etc/bashrc.
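As a sketch, the exports could be appended to the global file like this. The target path is a variable so the snippet can be tried against a scratch file first; set it to /etc/bashrc (and run as root) for real use. The proxy address and internal domain are placeholders.

```shell
# Scratch file by default; set target=/etc/bashrc to apply system-wide (needs root).
target="${target:-${TMPDIR:-/tmp}/proxy-exports.sh}"

# The quoted 'EOF' stops this shell expanding the variables while writing,
# so they are expanded later, when the file is read by a new shell.
cat >> "$target" <<'EOF'
export http_proxy="http://proxy.internal.test.com:3128"
export https_proxy="$http_proxy"
export no_proxy="169.254.169.254,localhost,127.0.0.1,.internal.test.com"
export HTTP_PROXY="$http_proxy"
export HTTPS_PROXY="$https_proxy"
export NO_PROXY="$no_proxy"
EOF
```

Sessions that are already open need to re-read the file, or be restarted, before they see the new values.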

If you set them only for a local user such as ec2-user, you may need to use sudo -E to preserve the environment when you download something with elevated privileges.

Jenkins

You can configure the proxy that Jenkins uses to download plugins in:

Manage Jenkins > Manage Plugins > Advanced

That sets the proxy for the Jenkins base system, but you may also need to configure it for certain jobs. This can either be done by setting job specific environment variables or you can set them in global properties to be used by all jobs.

Yum

Yum deliberately uses its own configuration. To configure a proxy, edit /etc/yum.conf to include:

[main]
...
proxy=http://<Proxy-Address>:<Proxy_Port>

proxy_username and proxy_password can be added if needed.

Conclusion

Using proxies can be necessary, but it is a pain to get everything working correctly. Hopefully this provides a starting point and I will add to it if I discover any more issues or make improvements.

It is also best to use domain names rather than IP addresses wherever possible, as they are both better supported and make it easier to replace resources.

If you have any other experience or improvements, please share!

Dylan Savage

Senior DevOps Consultant at Contino

4y

Interesting article Andrew Larssen. I had a posting where Security dictated that whitelisting was filtered on user agents rather than domains, and I feel that in the Cloud engineering space domain names would be the better approach. With the myriad of tools and wrappers we use to interact with our environments, finding out what user agent a new tool is using before testing seems like the more complex implementation. I would be interested to hear your thoughts on the matter.
