Risk management for TTN

vannut · November 10, 2016, 9:28pm

I like the scientific approach of this RA. But there is one line that keeps coming back to me:

if you chose the wrong scope you may end up with a list of weaknesses that aren’t even under our control.

So we need to say something about the things (scope & weaknesses) under our control.

With that in the back of my mind I would say the initial scope should be the gateways sending messages to the ttn-routers.

There would be a number of other scopes definable, like the ttn-backend but that is more of an abstract thing and a more complicated scope.

Starting with the gateways reporting to ttn-routers-scope could establish a methodology for a more in depth RA of the backend.

fortean · November 10, 2016, 10:49pm

It is indeed a “scientific” approach - more specifically: these are the best practices of decades, discussed at length, proven time after time, wrought into standards. So, it’s scientific pragmatism, not some theoretical model that could be used. It’s a model that SHOULD be used.

[steps off the soapbox]

Yes, choosing the ‘gateways that report to the TTN-routers’ as our scope is an option. There are some issues we should consider if we do, e.g. there is no such thing as a “standardized” gateway - there are many types and flavours.

But we might start to create a list of vulnerabilities of (TTN) gateways that apply to many, e.g.

electricity - gateways need power - power can be lost - that’s a vulnerability.
Internet connection - gateways need an Internet connection - network connection unavailable - that’s a vulnerability
antenna - … well, you catch my drift by now.

But is it the proper scope? How do you determine the proper scope?

In most cases I’d advise customers to determine their “most important process” or “their most important asset”. We can sustain our network if a gateway fails - but can we sustain our network if the backbone fails? Hence - it may well be that we need to start there.

What do “we” consider the most important part of our infrastructure?

BTW, just to give you an idea of the type of things we might need to consider, check out this list of vulnerabilities / threats http://www.hq.nasa.gov/security/it_threats_vulnerabilities.htm

fortean · November 11, 2016, 2:56pm

Oh, perhaps I haven’t been clear, but actually I had hoped for some answers from y’all here - to the question:

What do you consider to be the most important part of our infrastructure (and why)?

vannut · November 11, 2016, 3:00pm

So not the backend; but the people running it

fortean · November 11, 2016, 3:38pm

Good, so perhaps the scope of our first RA should be “people that run the TTN network”. Let’s see, we now have:

the gateways that connect to the TTN backbone
the TTN backbone itself
the people that operate TTN

… any more options? Any more things that are considered very important to run the TTN network?

TijnOnlijn · November 11, 2016, 7:57pm

users that use the network for things that they shouldn’t use it for, potentially killing old people.

edit; ok, that’s jumping to risks. But I am serious though.

fortean · November 11, 2016, 9:39pm

The question you raise here is: are nodes in scope - or not? if nodes are in scope it is implied that we can exercise some control over them. So, do we have any control - procedural, technical, whatever - over our nodes?

I believe we have: for example each node has a unique DevEUI, which is reported to our network. If a life-guarding device is produced by a somewhat bigger company, chances are they have their own OUI range, often a certain type of device will be given a DevEUI in a certain range, and so we might be able to recognise these devices - and flat-out refuse to service them. So, ironically, by refusing to service devices that fall in a given DevEUI range AT ALL TIMES we eliminate the risk of not servicing them correctly - as long as we make it VERY clear that we WILL NOT service these devices on our network, e.g. by stating that on our main page, or during registration of the device. Not saying we should, but it’s a control.
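To make that control a bit more concrete, here is a minimal sketch of a DevEUI range check. It assumes a DevEUI is a 64-bit identifier written as 16 hex digits whose leading digits identify the manufacturer's range. The prefixes in `REFUSED_PREFIXES` and the function names are made up for illustration; this is not how the TTN back-end does it.

```python
def normalize_eui(eui: str) -> str:
    """Return a DevEUI as 16 uppercase hex digits, dropping ':' and '-' separators."""
    digits = eui.replace(":", "").replace("-", "").upper()
    if len(digits) != 16 or any(c not in "0123456789ABCDEF" for c in digits):
        raise ValueError(f"not a valid 64-bit EUI: {eui!r}")
    return digits


# Hypothetical prefixes (assumption): a whole OUI and a narrower device range.
REFUSED_PREFIXES = ("AABBCC", "0011223344")


def is_refused(dev_eui: str, refused=REFUSED_PREFIXES) -> bool:
    """True if the DevEUI falls in a range the network has declared it will not service."""
    return normalize_eui(dev_eui).startswith(tuple(refused))
```

A registration step could call `is_refused()` and reject the device with a clear message, which is the "make it VERY clear" part of the control.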

So, yes, we probably could include nodes in our RA scope.

Are nodes the part of our network that you see as the most important, @TijnOnlijn? So, do you suggest that we use the scope “the nodes that connect to the TTN gateways”?

Actually, I don’t think that a failing node is a big risk to our network - but yes, it may be a big risk to a person that wears it.

Another thing: remember: R=I(t→w)P

The impact I you suggest is fierce: death of a person. That, in my book, should have the highest ranking.

What is the vulnerability (w) here - well, there is no guarantee that a distress transmission will be received by the proper application in time. Threat t could be a power outage, unplanned maintenance, ISP down, backbone down etc. etc.

Do you have any idea / gut feeling how often this might occur, e.g. once a year, once per decade etc.? What is the probability P? Say, it happens once every 5 years - would the remaining risk be acceptable to the community?

So, can we accept the risk of a dead guy every 5 years due to TTN not working? And how is that decided upon?
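As a rough illustration of R=I(t→w)P, the sketch below scores a scenario as impact times estimated yearly frequency. The 1 to 5 impact scale, the class name and the example numbers are assumptions for illustration only; multiplying an ordinal score by a frequency is a simplification, not a prescribed method.

```python
from dataclasses import dataclass


@dataclass
class RiskScenario:
    threat: str             # t: the event that occurs
    vulnerability: str      # w: the weakness the threat exploits
    impact: int             # I: ordinal score, 1 (negligible) to 5 (loss of life)
    events_per_year: float  # P: estimated yearly frequency

    def risk(self) -> float:
        """R = I * P for this threat/vulnerability pair."""
        return self.impact * self.events_per_year


def rank(scenarios):
    """Order scenarios from highest to lowest risk score."""
    return sorted(scenarios, key=lambda s: s.risk(), reverse=True)


# The example from this post: highest impact, estimated once every 5 years.
missed_distress_call = RiskScenario(
    threat="backbone down",
    vulnerability="no delivery guarantee for a distress transmission",
    impact=5,
    events_per_year=1 / 5,
)
```

The score only orders scenarios against each other. Whether the remaining risk is acceptable is still a decision somebody has to take on behalf of the community.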

Anyway, back to the scope…

kersing · November 12, 2016, 12:40pm

No, the question is: are users in scope. A node constructed for a perfectly valid use case can be (ab)used for something different resulting in a different risk profile. That means risk is not based on the node properties, but on its use case which is partially determined by the designer and partially by the user. So maybe designers and manufacturers should be in scope as well?

I would propose to limit the initial scope to the back-end and expand it to the gateways in due time. Expanding the scope later on is always possible, starting with too large a scope will almost certainly result in failure.

BTW. I think it will not hurt if we start with a list of perceived risks at this time, once the scope has been determined any entries on that list that are not in scope can be removed. It might even help us determine the scope…

fortean · November 12, 2016, 5:58pm

Well, I believe that after all the posts in here we have established at least a few things:

you need to have “management support” (or perhaps in our case: and/or community support) if you want to achieve anything. I believe we have established this, given the nod from @wienke and the various postings of various community members in this thread.
you also need some kind of BOG / workgroup / committee (RACOM) to “push” RA and decide on stuff like what RA methodology to use, what the focus of the RA will be etc. I believe we also have achieved this: the active users in this thread are IMO ipso facto the RACOM. It’s of course all quite informal, but that’s exactly what this community wants methinks. Good!
the first thing a RACOM does is establish scope. We’re not there yet. However, there are proposals: @kersing suggests starting with the TTN back-end; I have suggested the gateways, another suggestion was “the volunteers that operate TTN”. There are some golden rules if one wants to do RA on something, roughly they are “don’t bite off more than you can chew”, “ensure that you have control over the assets you put in scope” and “analyse the most important assets first”. I therefore believe that users are out (bit much to chew on for now, they are not really under our control, though they are a very important asset to us), gateways are out too (mainly too much to chew on, given the various types and various types of people that operate them), hence yes, the TTN back-end may be a good place to start. It is under our control and it is an important asset. Not sure if it is a bit too much to chew on, but we will see.

So, I second @kersing’s motion, let’s start with the TTN back-end.

Next problem: can I simply assume that we do and go ahead, or do we need some type of voting system in here?

fortean · November 13, 2016, 1:03pm

For clarification’s sake let me add that the concept of “scope” in practice corresponds to “the part of the organisation you do the implementation for”. E.g. if you would do an implementation of an information security management system (which involves doing risk analysis) for say a Big Bank, you would probably NOT do the implementation for the entire Big Bank, but for a department or division. Remember: don’t bite off more than you can chew. Choosing proper scope is important because if your first implementation fails, you will probably have created such company-wide negative feelings towards an ISMS/RA that it is irreparable.

Given that in as far as I know the TTN back-end is owned and operated by a foundation - if I’m wrong, please correct me - the more formal proposed scope would then be “the organisation that owns the knowledge and other assets that make up the back-end of TTN”.

If nobody chimes in before say next Tuesday, I will simply interpret this as the nod to go ahead, we will set the scope as I mentioned in the last paragraph and we may start discussing the method to use to

determine the method to perform risk analysis
write that down somewhere (e.g. document in wiki?)

which probably involves

create a list of assets within our scope
determine their value, specifically checking the importance of their confidentiality, integrity and availability aspects
do the RA per the defined method, starting with the most important / vulnerable item.

The most commonly used method for smaller organisations that don’t (yet) have a large experience database / incident database etc. is the qualitative risk analysis method. It will probably appeal to this community as it more or less works like @wienke proposed; we simply use our gut feeling to establish the value of our assets, though we do it in a slightly formalised way. But before I go off and spout my superior knowledge about such methodologies, first let’s agree on the scope.
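The asset valuation steps listed above could look roughly like this in a qualitative setting. This is a minimal sketch: the low/medium/high scale, the "highest of the three aspects wins" rule and the asset names are all assumptions for illustration, not an agreed method.

```python
# Ordinal scale for gut-feeling ratings (assumption: three levels).
LEVELS = {"low": 1, "medium": 2, "high": 3}
ASPECTS = ("confidentiality", "integrity", "availability")


def asset_value(ratings):
    """Value an asset by its highest-rated aspect (one common convention)."""
    return max(LEVELS[ratings[aspect]] for aspect in ASPECTS)


def rank_assets(assets):
    """Return asset names, most valuable first, so the RA can start at the top."""
    return sorted(assets, key=lambda name: asset_value(assets[name]), reverse=True)


# Hypothetical asset list; the names and ratings are made up.
example_assets = {
    "community wiki": {"confidentiality": "low", "integrity": "medium", "availability": "low"},
    "routing service": {"confidentiality": "medium", "integrity": "high", "availability": "high"},
    "build server": {"confidentiality": "low", "integrity": "low", "availability": "low"},
}
```

Calling `rank_assets(example_assets)` puts the routing service first, which is where the RA would then begin.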

So, unless people start telling me that I’ve got the wrong end of the stick here (and propose an alternate scope) the RACOM (that’s me and the rest of the posters in here) will go ahead with the scope @kersing indicated and which I refined.

ETA: and how about the laws that apply? We surely would not want to break the law, right?

In Who owns the network? the topic of ownership of the network was discussed. Apart from being of great importance to be able to answer the question “who’s in control of this asset / who’s responsible for this asset” it is also of great concern to find out which laws apply. E.g. the Dutch BBGT http://wetten.overheid.nl/BWBR0015808/2013-01-01 might well apply to the back-end that is owned and run by the Dutch TTN foundation. This actually means that the TTN foundation might be a provider of a public telecommunications network and/or a public telecommunication service (in Dutch: “aanbieder: aanbieder van een openbaar telecommunicatienetwerk of van een openbare telecommunicatiedienst; [Art. 1 BBGT]”) and if so, it needs to adhere to strict (information) security rules. The law even has an appendix that provides a number of (imho very sane and usable) controls that need to be in place, e.g. that the provider needs to have a person in charge of internal audit, that the assets that process (possibly confidential) data need to be placed in properly secured rooms / facilities, that you should use personalized authentication to get access to systems (so, no group or functional passwords, e.g. no root logins) and how to discard data etc.

fortean · November 15, 2016, 6:14pm

Scope now set. I will keep you posted about the next steps, which should involve getting a detailed design of the back-end, the list of processes and supporting assets and inputs / outputs. After having obtained that we can use the vulnerabilities, standards and some creativity / experience to start an initial RA (‘ist’).

fortean · November 17, 2016, 2:50pm

This just in: principles for securing IoT - with (as was to be expected) a reference to risk analysis (pg 9). But more rather logical / sound principles are listed. This is IMHO a useful document for RACOM, and I suggest y’all read this, I probably will refer to this every now and then.

https://www.dhs.gov/sites/default/files/publications/Strategic_Principles_for_Securing_the_Internet_of_Things-2016-1115-FINAL…pdf

fortean · November 21, 2016, 10:14am

Yesterday I was informed that the Dutch TTN team (who designed / run the back-end) is currently way too busy to properly support this initiative. Without proper support it’s impossible to do a proper RA, let alone do something with the outcome of it. Which is, IMO, the main reason to do RA.

In more formal terms: we currently don’t have management support, folks.

Also: a number of volunteers have expressed concern about this initiative. Some feel that it is against the culture here, which is a more pragmatical, technology driven culture. Others feel that my preferred methodology is simply not adequate for the phase we’re currently in. However, I’ve not seen any proposals for an alternate methodology to guarantee results. Sure, we could create a list of risks in the wiki, but apart from the questions about scope and methodology, if we don’t have support from folks to do something with the results, it’s all rather pointless.

@kersing and I have contemplated about this last night, we still see some light at the end of the tunnel (and hope it’s not the headlight of the approaching train). Our idea is to reset the scope to gateways - this is a domain @kersing is very familiar with (and I know a bit about it myself).

We will start compiling (stealing) a list of threats, set objectives, then work out possible controls. We may, in the process, decide on a methodology to weigh risks against each other, probably a semi-quantitative method. The result will be an ADVISORY list of control objectives / controls, which can be used by volunteers that run a gateway.

@kersing and I haven’t decided on much yet but will keep you posted. I welcome your input / feedback and if possible help / suggestions.

CurlyWurly · November 21, 2016, 1:58pm

Interesting points raised and IMO, this should be tracked somewhere.
I guess one way to mitigate risk right now - is for individuals to take ownership of the risks by:
1 - Owning and managing the gateways that your devices will use
2 - Being responsible for the end to end security that your solutions use

One benefit of the above approach would be extensive saturation cover, which should move the “risk” emphasis onto other areas (e.g. techniques on how to stop spoofing).
Perhaps some sort of detection system may be needed in the future, which automatically checks how many gateways cover certain areas.
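A very rough sketch of such a coverage check is below. It assumes gateway and area positions are known as (latitude, longitude) pairs and treats "covered" as "within a fixed radius", which ignores terrain, antennas and everything else that matters for real radio coverage. The 5 km radius and the minimum of 2 gateways are arbitrary assumptions.

```python
import math


def distance_km(a, b):
    """Great-circle distance between two (lat, lon) points, haversine formula."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def coverage_count(area_center, gateways, radius_km=5.0):
    """Number of gateways within radius_km of the centre of an area."""
    return sum(1 for gw in gateways if distance_km(area_center, gw) <= radius_km)


def weakly_covered(areas, gateways, minimum=2, radius_km=5.0):
    """Names of areas that have fewer than `minimum` gateways in range."""
    return [name for name, center in areas.items()
            if coverage_count(center, gateways, radius_km) < minimum]
```

Areas returned by `weakly_covered()` are the ones where a single gateway failure could cause a local outage.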

In the end, this is the beginning of an interesting IoT ride!

julian · November 30, 2016, 10:19pm

Apologies for coming into this a bit late. From the perspective of what we are doing in Manchester this is really important especially as the ownership/liability of infrastructure both physical and digital is complicated. So we have to consider everything from network security to lightning protection and RF interference. What is great about TTN is that it is giving a lot of people experience of having to learn about these issues. What I can see happening in Manchester is that a legal entity will evolve out of it which hopefully will be a cooperative of all people involved. This will help win the support of the less risk averse public bodies but it will also give us a basis to put formal systems in place.

The work that we are doing with the Fire Service is a case in point: we are developing prototype services, but there is no way that they could get beyond the prototype stage as there are all kinds of certification needed that require proper RAs, compliance and standards of service. The reason why the Fire Service was interested though was because it was a) very inexpensive to participate and b) got them into a mode of thinking that wouldn’t really be available if it started with standards first.

The compromise that is evolving though is creation of the free TTN network and in parallel maintaining a network with SLAs.

fortean · December 5, 2016, 10:20pm

@Julianlstar Thank you for the informative post.

Indeed, the importance of proper risk analysis can hardly be overestimated.

One of the newest developments is the publication of a series of documents about IoT security by the trade organisation of mobile telecommunications operators, the GSM Association (GSMA). The overarching document is the “IoT Security Guidelines Overview Document” Version 1.1, dated 07 November 2016. The GSMA also provides a Service Ecosystem Document, an Endpoint Ecosystem Document and a Network Operator Document. These documents provide - as their title suggests - a bonanza of best practices.

The GSMA also provides a self-assessment checklist, which enables various players in the IoT field to self-assess the conformance of their products, services and components to the GSMA IoT Security Guidelines. You can’t be ‘certified’ by the GSMA, but you can do a self-assessment, send it to the GSMA and they will review it (simple administrative checks). When all is found to be complete they will publish a statement on their website that you have completed the self-assessment and the name of the contact person in your organisation. As of now, there are no parties that have published a self-assessment yet - but that’s hardly surprising given the publication date of the documents (Nov 7th 2016).

The guidelines point out that “almost all IoT services are built using endpoint device and service platform components that contain similar technologies to many other communications, computing and IT solutions. In addition to this, the threats these different services face, and the potential solutions to mitigate these threats, are usually very similar, even if the attacker’s motivation and the impact of successful security breaches may vary.”

Like in ISO27001, the importance of doing proper risk analysis is pointed out early on in the overarching document. The guidelines suggest breaking down the IoT infrastructure into components, then evaluating the risks associated with each component and then determining how to compensate for them (set controls). Even how the risk analysis should be done is indicated: “each risk shall be assigned a priority, to assist the implementer in determining the cost of the attack, as well as the cost of remediation, and the cost, if any, of not addressing the risk.” The checklist explicitly mentions the use of a ‘standard’ RA methodology and suggests CERT OCTAVE.

Apart from procedural guidance, there is also some very pragmatical guidance on e.g. physical security, and corresponding questions are in the self-assessment. Just to give you an idea, this is the set on Tamper Resistant Product Casing of endpoints (e.g. TTN gateways):

7.3 Use Tamper Resistant Product Casing
7.3.1 Do your endpoints use tamper resistant casing?
7.3.1.1 Our endpoints implement tamper resistant security controls.
7.3.1.2 Our endpoints contain circuits that invalidate NVRAM when a casing is opened.
7.3.1.3 Our endpoints contain Sensors that blow security fuses when abnormal conditions (e.g. light, temperature or voltage range) are detected.
7.3.1.4 Our endpoints contain Sensors that trigger an alert when a physically static device’s location is moved.
7.3.1.5 Our endpoints uses Epoxy covering for core circuit components.
7.3.1.6 Our endpoints raise Alerts when either internal or removable components are removed from the device.

So, apart from the NIST guidelines which are quite flimsy IMO, we now have a more substantial document to aid us - and it underwrites the importance of RA.

The main difference between ISO27001 and documents like these is that ISO27001 requires a management system (the ISMS) to be in place. The ISMS is based on continuous improvement (the well known Deming cycle, PDCA). Guidelines like that of the GSMA do not require this.

I-Connect · February 12, 2017, 4:07pm

Hi, wow big topic, lot of “process” text

For me it is an important topic as I came across it while searching the term SLA as I am intending to start developing and selling solutions using the TTN as “backbone” and I want to be able to give a clear answer to my future customers on the chance of (planned) outages.

For my own devices, software, gateway availability/coverage I can/should make the assessment myself but for software within the gateway and the TTN backend I would need input from TTN as this is not within my influence.
Is there any statement on this last part? Or is it currently only based on “best endeavors” and if so is it the intention to change this in future?

What the content of my statement to my customers exactly is is not important as long as it is the truth setting the correct expectations (although it would probably hamper my sales a bit if I have to say there is no uptime guarantee at all… )

I do realize the product TTN is still very new and other areas probably have priority at the moment ( I hope so as I am waiting for my gateway ) but I think the suggestion of @Wienke to start this in a pragmatic way is very good. Is this already started? Maybe I missed this in the topic…

Regards,
Jeroen

ps
Not sure if it is the right comparison but take Linux, still open source but look at how many business critical SAP systems are running on it these days. I guess at some point in time they also started with risk management somewhere being able to state/safeguard it is a reliable product…?

fortean · February 12, 2017, 6:15pm

Hi, Jeroen, very happy to see your response.

I’m not familiar with the internal processes and procedures of SAP, but yes, at a given point somebody must have done a form of risk assessment to determine if Linux would be a sufficiently stable and maintainable product to release SAP on.

It is certain that SAP is currently using an ISMS for their business, which involves selecting a methodology for risk assessment and treatment. See this quote from the SAP website www.sap.com/corporate/en/company/quality.html:

Certificates

We are also certified according to ISO 27001

One of the main issues with TTN’s “network” is the lack of a responsible party (a “Body of Governance”) that does risk acceptance on behalf of the community / network, including prescribing mandatory controls that need to be implemented and/or adhered to by all.

I am currently writing a dissertation about this, and one of the observations I have is that TTN consists of a kazillion entities, mostly seemingly totally unaware of risk, or if at all, then very biased towards their personal interests. The interests of actual users, and especially the interests of commercial organisations that want to use TTN are NOT represented by TTN, in my not really humble opinion.

This strikes me as odd - as TTN presents itself as a catalyst to stimulate new businesses. I can’t imagine any serious business that would say “Sure, you can buy this service / product from me, but I won’t give you any guarantees about its availability, as I’m working with a network that does not either.”

I have suggested setting up a RACOM (Risk Analysis COMmittee), which could consist of (volunteer) specialists from the community, and which would have the means to enforce the controls they deem to be necessary for proper maintenance, uptime and security of the network.

I’m actually writing a dissertation about this, and one of my recommendations is indeed to set up such an entity. This should be endorsed and facilitated by the TTN foundation. This is because it TTBOMK holds the rights to the name “The Things Network” and hence can decide - as can be done in a benevolent dictatorship as @Wienke once described it to me - who is allowed to say they are part of The Things Network and who is not. This also probably requires a more formal agreement between TTN and the currently roughly 8000 entities that make up TTN.

Only if you have an entity that actively monitors the security / quality of TTN and is allowed to introduce controls to establish a proper baseline can we have a commercial grade network, that may indeed be used by businesses to build on.

I-Connect · February 12, 2017, 7:42pm

Hi Fortean,

Thx for the reply. It seems you have a lot of knowledge on the subject.

But I can imagine this/you can be a bit “threatening” for an enthusiastic (mainly technical?) group of people that are currently very busy creating their new product with all the best intentions. They probably have a pile of technical work still to do and may be worried at the moment about taking on the full-fledged risk assessment you are describing.

But anyways, with these posts the thinking process is starting, I hope we can make a pragmatic and common sense approach. I see a lot of products becoming overly expensive and complex because of all the regulations and legislation around it pushing it way beyond the 80/20 rule into a quality level that is maybe never needed.

that TTN consists of a kazillion entities

Not sure if this is correct. Isn’t The Things Network (TTN) just that group of enthusiastic people? Maybe you meant to refer to IoT or LoRa?
So maybe scope is not as big as we think and can we start with a reasonably small scope (TTN back-end and gateway software?) that can be overseen.

I am no expert on the topic but willing to give my input/help if needed/wanted.

Regards,
Jeroen

fortean · February 12, 2017, 8:38pm

Hi, Jeroen, thanks for the compliments.

I have studied Information security management for years and have advised companies about it (professionally). So, yes, I know a bit more about it than the average Joe.

I’m aware of the “threatening” aspect of my posts, but to be frank: I’m not here to win a popularity contest. Either the community consists of sufficiently professional participants to understand what I wrote, and hence understand or at least discuss the need for RA - in which case they clearly understand that I’m not a threat - or they don’t. In the latter case, I can be pleasing and comforting as much as I want, but that won’t help either.

My remark “TTN consists of a kazillion entities” refers to the parties that host all kinds of gateways and the few enabling legal entities, e.g. TTN foundation. In as far as I know no gateway hosting party has signed any formal contract with TTN and many of them run a gateway or other piece of hardware as they see fit. For example, I host a gateway that @kersing has built. Knowing him, it probably is a fine gateway, but there are no guarantees. Yesterday, the wife inadvertently triggered the test button on the RSD, causing a power-out. The gateway went down too. I’m quite sure I’m the only guy running a TTN gateway for miles, so probably the wife caused a local network outage all by herself. We do not have controls in place to prevent such outages.

No, in as far as I know the TTN foundation is NOT the leading entity. They facilitate the actual entities (legal entities, users) that make up TTN, but do NOT control them (yet). And as much as I like the idea of a decentralised network from an engineer’s point of view, from a risk manager’s position it’s a bad idea.