Risk management for TTN

kersing · November 4, 2016, 12:08pm

No, I was merely suggesting to start at the gateways because that is something you mentioned before. Also a lot of people with different levels of expertise have been placing them. More will follow and I expect there might be some low hanging fruit where a lot can be gained by just making people aware of the things to consider.

If we had a checklist for gateway owners of things to consider when placing a gateway, they could at least make a considered decision on them. I do not propose that anyone visits them to audit whether they implemented any recommendations, and excludes them from the network if not. (And without a proper risk analysis we do not know what the risk would be of, say, a hacked gateway: would the hacker be able to gain any useful information? Damage the network? Do you know?)

The Things Network: Our mission is to enable a network by the users for the users. Which (at least to me) means an inclusive network, not an exclusive network!

Well… You could start by reading/listening to the suggestions made? Wienke suggested to start by listing risks on the wiki. You insist on having a committee with exclusion powers, which does not seem to sit well with most of the responding community members. Let’s start with the list and make the first point on it ‘not having management buy-in’ and the second ‘no clear management structure’ (or the other way around if you want). Then these can be addressed in the way Wienke suggests…

Could it be that it equally irritates us when someone new to this community starts making demands without respecting the spirit of this community? I’ve personally known you for a while now (20+ years) so I know you mean well, but you might want to rethink your approach… (feel free to give me a call to discuss things)

fortean · November 4, 2016, 8:35pm

Well, I wasn’t aware of making demands, actually. But if you say that’s the impression you’ve got - then no doubt I must be doing something wrong somewhere, for which I wholeheartedly apologise.

@Jac, I will give you a call later on. I’m currently recovering from a very busy week and a cold has gotten the best of me, but in as far as I can see, I’ve had the worst so will be able to call you later on this weekend.

@John: I hear you. We disagree - I believe that governance and risk frameworks like the ones I propose are sound best practice for any organisation, big or small. But it may well be that we need to be inefficient in order to obtain our objectives - sometimes that’s the only way to convince people that something is best done like it is done everywhere else. As long as it gets done in the end, that’s fine with me.

So, yes, perhaps, like Jac suggests, we need to start at what I think is the wrong end and create a list of risks first. Which would then no doubt indeed start with stuff like “not having set the scope for the risk assessment and treatment properly”, “not having a BOG” etc. in effect taking us right back to where we were 30+ postings ago - but now with the added benefit of acceptance by the community. That is of course the most important thing: acceptance by the community.

One measures a circle starting anywhere, after all.

johnmiddleton · November 10, 2016, 1:44pm

@fortean If we crowd source a list of risks then we will be able to see how they fit into the “scope”. From that we will be able to see the relative importance of the scoped areas, based upon the number of issues in each “logical scoped area”. It’s been 6 days since your last post and I still have no wiki in which to record my view on the risk factors.

fortean · November 10, 2016, 5:23pm

@John - very happy to see that you’re eager to start.

Yes, it has been six days since my last posting. Actually, I felt that I had nothing more to add at this stage. I also wanted to talk to @kersing first, given his notion that my approach might alienate me from the community I want to work with. That’s a … er… risk I am not willing to take.

Well, I can at least provide an update about that conversation with @kersing. I called him a few days ago and we had a long, lively discussion about all this. Both Jac and I feel that RA is important and should be done, we also both agree that it is of crucial importance to choose the proper approach and make sure to get support from the community. We also discussed a number of possibilities, but I won’t go into detail yet, simply because I promised Jac to send him my notes of the conversation first. I would consider it rude to relay what we discussed without his nod.

Anyway - yes, you could start compiling a list. Or even lists. But not a list of risks - that’s quite impossible.

Let me explain: risk can be defined as the sum of impact of a threat working on a weakness times the probability of occurrence, in quasi-maths:

$R = I_{t \to w} \times P$
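To make this a bit more tangible, here is a minimal sketch of that definition in Python. The 1 to 5 scales, the example threat/weakness pairs and all the numbers are made up purely for illustration; they are assumptions, not an agreed TTN methodology.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RiskItem:
    threat: str       # t: something we cannot control
    weakness: str     # w: something we often can control
    impact: int       # I: 1 (negligible) .. 5 (severe), hypothetical scale
    probability: int  # P: 1 (rare) .. 5 (frequent), hypothetical scale

    @property
    def risk(self) -> int:
        # R = I(t -> w) * P
        return self.impact * self.probability


# Hypothetical entries, for illustration only.
items = [
    RiskItem("power outage", "gateway has no backup power", impact=2, probability=4),
    RiskItem("ISP failure", "single Internet uplink", impact=3, probability=3),
    RiskItem("lightning strike", "antenna not grounded", impact=5, probability=1),
]

# Highest risk first.
ranked = sorted(items, key=lambda item: item.risk, reverse=True)
for item in ranked:
    print(f"{item.risk:>2}  {item.threat} -> {item.weakness}")
```

Note that not a single entry can be filled in without a threat, a weakness, and some way of scoring impact and probability.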

Given the above definition of risk it should now be clear that you simply can’t compile a “list of risks” unless you have at least a notion of:

weaknesses
threats
how to determine impact
how to determine probability
the scope.

Scope matters most! Selecting the wrong scope can ruin any effort to control risk.

Why is that? Well, assuming that we aren’t capable of overseeing, let alone controlling the entire Universe, we need to limit ourselves to something we CAN oversee and control. That’s called “scope”. We might, for example, say we will limit ourselves to “the TTN network”. That at least would prevent us from having to do RA for the universe.

But perhaps even that scope is too broad to achieve meaningful results: are we really able to control the entire network? (Hence my quest for a BOG). So, we may need to limit the scope further to say “the TTN backend” or “the TTN gateways”. Or maybe even further: maybe a special type of gateway? Unless we first agree on scope, we can’t even start.

“Sure I can, I’ll show you” you may exclaim. Go ahead! But I will have to be pedantic and point out that you merely ended up choosing your own scope, your own methodology of determining risk - and unless you communicate that with us, we won’t be able to help you. Hence, it is far more logical to discuss and establish scope first, then compile lists and work on a methodology, then determine the list of risks, then see if we can find controls for them. If we simply barge in and compile a belly-based list of “risks” that would result in chaos. Being an anarchist makes me dislike chaos most, so I won’t help you there.

If you choose the wrong scope you may end up with a list of weaknesses that aren’t even under our control. Please note the subtle but important difference between weakness and threat: you can never control threats, you merely can - often - control weaknesses.

So, what CAN we do then?

determine scope (preferably something in “our” control)
determine assets that are ‘in scope’ from there
compile a list of threats
compile a list of vulnerabilities of our assets
figure out a methodology to weigh risk
and THEN we may create a meaningful list of our assets and risk…

I’m sorry, folks, but you can’t go to the moon just by wishing you were there.

But - good news! - we don’t have to start from scratch at all. A list of threats can be compiled easily, there are plenty of such lists available on-line and in various appendices of various standards and papers. You’ll find things like “storms”, “floods”, “fire”, “human error”, “illness”, “terrorist attacks” etc. on them.

Risk analysis methodologies are also broadly available and I’m very willing and able to suggest a methodology that I think can be used by a volunteer community.

So, if we want to set up a list - a list of vulnerabilities might be a good start, and actually that’s often done in my world. You may, if you like, delve into NIST SP800-30 to get a notion of the various approaches used (and it’s a free download; standards need to be purchased, alas).

But first - I’m sorry, but such is life - we need to establish scope.

Do I still make any sense to y’all folks?

vannut · November 10, 2016, 9:28pm

I like the scientific approach of this RA. But there is one line that keeps coming back to me:

If you choose the wrong scope you may end up with a list of weaknesses that aren’t even under our control.

So we need to say something about the things (scope & weaknesses) under our control.

With that in the back of my mind I would say the initial scope should be the gateways sending messages to the ttn-routers.

There would be a number of other scopes definable, like the ttn-backend but that is more of an abstract thing and a more complicated scope.

Starting with the gateways reporting to ttn-routers-scope could establish a methodology for a more in depth RA of the backend.

fortean · November 10, 2016, 10:49pm

It is indeed a “scientific” approach - more specifically: these are the best practices of decades, discussed at length, proven time after time, wrought into standards. So, it’s scientific pragmatism, not some theoretical model that could be used. It’s a model that SHOULD be used.

[steps off the soapbox]

Yes, choosing the ‘gateways that report to the TTN-routers’ as our scope is an option. There are some issues we should consider if we do, e.g. there is no such thing as a “standardized” gateway - there are many types and flavours.

But we might start to create a list of vulnerabilities of (TTN) gateways that apply to many, e.g.

electricity - gateways need power - power can be lost - that’s a vulnerability.
Internet connection - gateways need an Internet connection - network connection unavailable - that’s a vulnerability.
antenna - … well, you catch my drift by now.

But is it the proper scope? How do you determine the proper scope?

In most cases I’d advise customers to determine their “most important process” or “their most important asset”. We can sustain our network if a gateway fails - but can we sustain our network if the backbone fails? Hence - it may well be that we need to start there.

What do “we” consider the most important part of our infrastructure?

BTW, just to give you an idea of the type of things we might need to consider, check out this list of vulnerabilities / threats http://www.hq.nasa.gov/security/it_threats_vulnerabilities.htm

fortean · November 11, 2016, 2:56pm

Oh, perhaps I haven’t been clear, but actually I had hoped for some answers from y’all here - to the question:

What do you consider to be the most important part of our infrastructure (and why)?

vannut · November 11, 2016, 3:00pm

So not the backend; but the people running it

fortean · November 11, 2016, 3:38pm

Good, so perhaps the scope of our first RA should be “people that run the TTN network”. Let’s see, we now have:

the gateways that connect to the TTN backbone
the TTN backbone itself
the people that operate TTN

… any more options? Any more things that are considered very important to run the TTN network?

TijnOnlijn · November 11, 2016, 7:57pm

users that use the network for things that they shouldn’t use it for, potentially killing old people.

edit; ok, that’s jumping to risks. But I am serious though.

fortean · November 11, 2016, 9:39pm

The question you raise here is: are nodes in scope - or not? if nodes are in scope it is implied that we can exercise some control over them. So, do we have any control - procedural, technical, whatever - over our nodes?

I believe we have: for example each node has a unique DevEUI, which is reported to our network. If a life-guarding device is produced by a somewhat bigger company, chances are they have their own OUI range. Often a certain type of device will be given a DevEUI in a certain range, and so we might be able to recognise these devices - and flat-out refuse to service them. So, ironically, by refusing to service devices that fall in a given DevEUI range AT ALL TIMES we eliminate the risk of not servicing them correctly - as long as we make it VERY clear that we WILL NOT service these devices on our network, e.g. by stating that on our main page, or during registration of the device. Not saying we should, but it’s a control.
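Just to show how simple such a control could be, here is a rough sketch in Python. It assumes the DevEUI is a 64-bit identifier written as 16 hex digits whose first three bytes are the manufacturer’s OUI. The blocked prefix and the function names are hypothetical; nothing like this exists in the TTN back-end as far as I know.

```python
# Hypothetical block list: OUI prefixes (first 3 bytes of the DevEUI, in hex)
# of manufacturers whose devices the network would refuse to service.
BLOCKED_OUIS = {"AABBCC"}  # made-up value, not a real manufacturer


def normalise_dev_eui(dev_eui: str) -> str:
    """Return the DevEUI as 16 upper-case hex digits, without separators."""
    cleaned = dev_eui.replace(":", "").replace("-", "").upper()
    if len(cleaned) != 16 or any(c not in "0123456789ABCDEF" for c in cleaned):
        raise ValueError(f"not a valid 64-bit DevEUI: {dev_eui!r}")
    return cleaned


def is_refused(dev_eui: str) -> bool:
    """True if the DevEUI starts with a blocked OUI prefix."""
    return normalise_dev_eui(dev_eui)[:6] in BLOCKED_OUIS


print(is_refused("AA-BB-CC-00-00-00-00-01"))  # device in the blocked range
print(is_refused("0011223344556677"))         # any other device
```

Keep in mind this only works for devices that actually report a DevEUI from the manufacturer’s own range; a device registered with a different DevEUI would slip through.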

So, yes, we probably could include nodes in our RA scope.

Are nodes the part of our network that you see as the most important, @TijnOnlijn? So, do you suggest that we use the scope “the nodes that connect to the TTN gateways”?

Actually, I don’t think that a failing node is a big risk to our network - but yes, it may be a big risk to a person that wears it.

Another thing: remember: $R = I_{t \to w} \times P$

The impact I you suggest is fierce: death of a person. That, in my book, should have the highest ranking.

What is the vulnerability (w) here? Well, there is no guarantee that a distress transmission will be received by the proper application in time. Threat t could be a power outage, unplanned maintenance, ISP down, backbone down etc. etc.

Do you have any idea / gut feeling how often this might occur, e.g. once a year, once per decade etc.? What is the probability P? Say, it happens once every 5 years - would the remaining risk be acceptable to the community?

So, can we accept the risk of a dead guy every 5 years due to TTN not working? And how is that decided upon?
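To get a feel for what “once every 5 years” means, here is a small back-of-the-envelope sketch in Python. It assumes incidents are independent and occur at a constant average rate (a Poisson model). That is purely my assumption for the sake of the example, and the rate itself is just the gut-feeling number from above.

```python
import math

RATE_PER_YEAR = 1 / 5  # assumed: one incident every 5 years on average


def prob_at_least_one(years: float, rate: float = RATE_PER_YEAR) -> float:
    """Probability of one or more incidents in the given period (Poisson)."""
    return 1 - math.exp(-rate * years)


for years in (1, 5, 10):
    expected = RATE_PER_YEAR * years
    print(f"{years:>2} years: expected {expected:.1f} incidents, "
          f"P(at least one) = {prob_at_least_one(years):.2f}")
```

Whether the resulting numbers are acceptable is exactly the kind of decision that needs an owner.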

Anyway, back to the scope…

kersing · November 12, 2016, 12:40pm

No, the question is: are users in scope. A node constructed for a perfectly valid use case can be (ab)used for something different, resulting in a different risk profile. That means risk is not based on the node properties, but on its use case, which is partially determined by the designer and partially by the user. So maybe designers and manufacturers should be in scope as well?

I would propose to limit the initial scope to the back-end and expand it to the gateways in due time. Expanding the scope later on is always possible; starting with too large a scope will almost certainly result in failure.

BTW. I think it will not hurt if we start with a list of perceived risks at this time, once the scope has been determined any entries on that list that are not in scope can be removed. It might even help us determine the scope…
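As a rough sketch of what I mean (in Python, with made-up entries and area labels that are only assumptions for the example): collect everything first, count the entries per area to get a feeling for where the concerns cluster, then park whatever falls outside the agreed scope.

```python
from collections import Counter

# Hypothetical crowd-sourced entries: (description, area it belongs to).
perceived_risks = [
    ("gateway loses power", "gateway"),
    ("router unreachable", "back-end"),
    ("node used for life-critical alarm", "node"),
    ("database not backed up", "back-end"),
]

# How many perceived risks fall in each area? This may hint at a scope.
counts = Counter(area for _, area in perceived_risks)
print(counts.most_common())


def split_by_scope(risks, scope):
    """Split entries into those inside and those outside the agreed scope."""
    in_scope = [r for r in risks if r[1] in scope]
    parked = [r for r in risks if r[1] not in scope]
    return in_scope, parked


in_scope, parked = split_by_scope(perceived_risks, scope={"back-end"})
print("in scope:", [description for description, _ in in_scope])
print("parked:  ", [description for description, _ in parked])
```

The parked entries are not thrown away; they can be picked up again when the scope is expanded.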

fortean · November 12, 2016, 5:58pm

Well, I believe that after all the posts in here we have established at least a few things:

you need to have “management support” (or perhaps in our case: and/or community support) if you want to achieve anything. I believe we have established this, given the nod from @wienke and the various postings of various community members in this thread.
you also need some kind of BOG / workgroup / committee (RACOM) to “push” RA and decide on stuff like what RA methodology to use, what the focus of the RA will be etc. I believe we also have achieved this: the active users in this thread are IMO ipso facto the RACOM. It’s of course all quite informal, but that’s exactly what this community wants methinks. Good!
the first thing a RACOM does is establish scope. We’re not there yet. However, there are proposals: @kersing suggests starting with the TTN back-end; I have suggested the gateways, another suggestion was “the volunteers that operate TTN”. There are some golden rules if one wants to do RA on something, roughly they are “don’t bite off more than you can chew”, “ensure that you have control over the assets you put in scope” and “analyse the most important assets first”. I therefore believe that users are out (bit much to chew on for now, they are not really under our control, though they are a very important asset to us), gateways are out too (mainly too much to chew on, given the various types and various types of people that operate them), hence yes, the TTN back-end may be a good place to start. It is under our control and it is an important asset. Not sure if it is a bit too much to chew on, but we will see.

So, I second @kersing’s motion, let’s start with the TTN back-end.

Next problem: can I simply assume that we do and go ahead, or do we need some type of voting system in here?

fortean · November 13, 2016, 1:03pm

For clarification’s sake let me add that the concept of “scope” in practice corresponds to “the part of the organisation you do the implementation for”. E.g. if you would do an implementation of an information security management system (which involves doing risk analysis) for say a Big Bank, you would probably NOT do the implementation for the entire Big Bank, but for a department or division. Remember: don’t bite off more than you can chew. Choosing proper scope is important because if your first implementation fails, you will probably have created such company-wide negative feelings towards an ISMS/RA that it is irreparable.

Given that in as far as I know the TTN back-end is owned and operated by a foundation - if I’m wrong, please correct me - the more formal proposed scope would then be “the organisation that owns the knowledge and other assets that make up the back-end of TTN”.

If nobody chimes in before, say, next Tuesday, I will simply interpret this as the nod to go ahead: we will set the scope as I mentioned in the previous paragraph and we can start discussing how to

determine the method to perform risk analysis
write that down somewhere (e.g. document in wiki?)

which probably involves

create a list of assets within our scope
determine their value, specifically checking the importance of their confidentiality, integrity and availability aspects
do the RA per the defined method, starting with the most important / vulnerable item.

The most commonly used method for smaller organisations that don’t (yet) have a large experience database / incident database etc. is the qualitative risk analysis method. It will probably appeal to this community as it more or less works like @wienke proposed; we simply use our gut feeling to establish the value of our assets, though we do it in a slightly formalised way. But before I go off and spout my superior knowledge about such methodologies, first let’s agree on the scope.
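To give an impression of what “slightly formalised gut feeling” could look like, here is a minimal sketch in Python. The assets, the low/medium/high ratings and the rule of valuing an asset by its highest-rated aspect are all assumptions for the sake of the example; the actual method is still to be agreed.

```python
LEVELS = {"low": 1, "medium": 2, "high": 3}

# Hypothetical assets with gut-feeling ratings, for illustration only.
assets = {
    "routing service": {
        "confidentiality": "low", "integrity": "high", "availability": "high"},
    "device registry": {
        "confidentiality": "high", "integrity": "high", "availability": "medium"},
    "public status page": {
        "confidentiality": "low", "integrity": "medium", "availability": "low"},
}


def asset_value(ratings: dict) -> int:
    """Value an asset by its highest-rated aspect (an assumed convention)."""
    return max(LEVELS[level] for level in ratings.values())


def total_score(ratings: dict) -> int:
    """Sum of the three aspect scores, used here only as a tie-breaker."""
    return sum(LEVELS[level] for level in ratings.values())


# Most valuable asset first; the RA would start at the top of this list.
ranked = sorted(
    assets,
    key=lambda name: (asset_value(assets[name]), total_score(assets[name])),
    reverse=True,
)
for name in ranked:
    print(asset_value(assets[name]), total_score(assets[name]), name)
```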

So, unless people start telling me that I’ve got the wrong end of the stick here (and propose an alternate scope) the RACOM (that’s me and the rest of the posters in here) will go ahead with the scope @kersing indicated and which I refined.

ETA: and how about the laws that apply? We surely would not want to break the law, right?

In Who owns the network? the topic of ownership of the network was discussed. Ownership is of great importance for answering the question “who’s in control of this asset / who’s responsible for this asset”, and it also determines which laws apply. E.g. the Dutch BBGT http://wetten.overheid.nl/BWBR0015808/2013-01-01 might well apply to the back-end that is owned and run by the Dutch TTN foundation. This actually means that the TTN foundation might be a provider of a public telecommunications network and/or a public telecommunication service (in Dutch: “aanbieder: aanbieder van een openbaar telecommunicatienetwerk of van een openbare telecommunicatiedienst; [Art. 1 BBGT]”), and if so, it needs to adhere to strict (information) security rules. The law even has an appendix that provides a number of (imho very sane and usable) controls that need to be in place, e.g. that the provider needs to have a person in charge of internal audit, that the assets that process (possibly confidential) data need to be placed in properly secured rooms / facilities, that you should use personalized authentication to get access to systems (so, no group or functional passwords, e.g. no root logins), how to discard data etc.

fortean · November 15, 2016, 6:14pm

Scope now set. I will keep you posted about the next steps, which should involve getting a detailed design of the back-end, the list of processes and supporting assets and inputs / outputs. After having obtained that we can use the vulnerabilities, standards and some creativity / experience to start an initial RA (‘ist’).

fortean · November 17, 2016, 2:50pm

This just in: principles for securing IoT - with (as was to be expected) a reference to risk analysis (pg 9). But more rather logical / sound principles are listed. This is IMHO a useful document for RACOM, and I suggest y’all read this, I probably will refer to this every now and then.

https://www.dhs.gov/sites/default/files/publications/Strategic_Principles_for_Securing_the_Internet_of_Things-2016-1115-FINAL…pdf

fortean · November 21, 2016, 10:14am

Yesterday I was informed that the Dutch TTN team (who designed / run the back-end) is currently way too busy to properly support this initiative. Without proper support it’s impossible to do a proper RA, let alone do something with the outcome of it. And doing something with the outcome is, IMO, the main reason to do RA.

In more formal terms: we currently don’t have management support, folks.

Also: a number of volunteers have expressed concern about this initiative. Some feel that it is against the culture here, which is a more pragmatical, technology driven culture. Others feel that my preferred methodology is simply not adequate for the phase we’re currently in. However, I’ve not seen any proposals for an alternate methodology to guarantee results. Sure, we could create a list of risks in the wiki, but apart from the questions about scope and methodology, if we don’t have support from folks to do something with the results, it’s all rather pointless.

@kersing and I contemplated this last night; we still see some light at the end of the tunnel (and hope it’s not the headlight of the approaching train). Our idea is to reset the scope to gateways - this is a domain @kersing is very familiar with (and I know a bit about it myself).

We will start compiling (stealing) a list of threats, set objectives, then work out possible controls. We may, in the process, decide on a methodology to weigh risks against each other, probably a semi-quantitative method. The result will be an ADVISORY list of control objectives / controls, which can be used by volunteers that run a gateway.

@kersing and I haven’t decided on much yet but will keep you posted. I welcome your input / feedback and if possible help / suggestions.

CurlyWurly · November 21, 2016, 1:58pm

Interesting points raised and IMO, this should be tracked somewhere.
I guess one way to mitigate risk right now - is for individuals to take ownership of the risks by:
1 - Owning and managing the gateways that your devices will use
2 - Being responsible for the end to end security that your solutions use

One benefit of the above approach would be extensive saturation cover, which should move the “risk” emphasis onto other areas (e.g. techniques on how to stop spoofing).
Perhaps some sort of detection system may be needed in the future, which automatically checks how many gateways cover certain areas.
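A very rough sketch of such a check, in Python. The gateway names and coordinates are made up, and the fixed 2 km range is a crude assumption (real coverage depends on antenna, terrain, buildings etc.), so treat this as an illustration of the idea only.

```python
import math

EARTH_RADIUS_KM = 6371.0

# Hypothetical gateways: (name, latitude, longitude). Not real TTN data.
GATEWAYS = [
    ("gw-a", 52.3700, 4.8900),
    ("gw-b", 52.3750, 4.8950),
    ("gw-c", 52.5000, 5.2000),
]


def distance_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points (haversine formula)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def gateways_covering(lat, lon, gateways=GATEWAYS, radius_km=2.0):
    """Names of gateways within radius_km of the point (assumed fixed range)."""
    return [name for name, g_lat, g_lon in gateways
            if distance_km(lat, lon, g_lat, g_lon) <= radius_km]


def is_well_covered(lat, lon, min_gateways=2):
    """True if at least min_gateways gateways are in range of the point."""
    return len(gateways_covering(lat, lon)) >= min_gateways


print(gateways_covering(52.3720, 4.8920))
print(is_well_covered(52.3720, 4.8920))
print(is_well_covered(52.5000, 5.2000))
```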

In the end, this is the beginning of an interesting IoT ride!

julian · November 30, 2016, 10:19pm

Apologies for coming into this a bit late. From the perspective of what we are doing in Manchester this is really important especially as the ownership/liability of infrastructure both physical and digital is complicated. So we have to consider everything from network security to lightning protection and RF interference. What is great about TTN is that it is giving a lot of people experience of having to learn about these issues. What I can see happening in Manchester is that a legal entity will evolve out of it which hopefully will be a cooperative of all people involved. This will help win the support of the less risk averse public bodies but it will also give us a basis to put formal systems in place.

The work that we are doing with the Fire Service is a case in point: we are developing prototype services, but there is no way that they could get beyond the prototype stage, as all levels of certification are needed, which require proper RAs, compliance and standards of service. The reason why the Fire Service was interested though was because it was a) very inexpensive to participate and b) got them into a mode of thinking that wouldn’t really be available if it started with standards first.

The compromise that is evolving though is creation of the free TTN network and in parallel maintaining a network with SLAs.

fortean · December 5, 2016, 10:20pm

@Julianlstar Thank you for the informative post.

Indeed, the importance of proper risk analysis can hardly be overestimated.

One of the newest developments is the publication of a series of documents about IoT security by the trade organisation of mobile telecommunications operators, the GSM Association (GSMA). The overarching document is the “IoT Security Guidelines Overview Document” Version 1.1, dated 07 November 2016. The GSMA also provides a Service Ecosystem Document, an Endpoint Ecosystem Document and a Network Operator Document. These documents provide - as their title suggests - a bonanza of best practices.

The GSMA also provides a self-assessment checklist, which enables various players in the IoT field to self-assess the conformance of their products, services and components to the GSMA IoT Security Guidelines. You can’t be ‘certified’ by the GSMA, but you can do a self-assessment, send it to the GSMA and they will review it (simple administrative checks). When all is found to be complete they will publish a statement on their website that you have completed the self-assessment and the name of the contact person in your organisation. As of now, there are no parties that have published a self-assessment yet - but that’s hardly surprising given the publication date of the documents (Nov 7th 2016).

The guidelines point out that ”almost all IoT services are built using endpoint device and service platform components that contain similar technologies to many other communications, computing and IT solutions. In addition to this, the threats these different services face, and the potential solutions to mitigate these threats, are usually very similar, even if the attacker’s motivation and the impact of successful security breaches may vary.”

Like in ISO27001, the importance of doing proper risk analysis is pointed out early on in the overarching document. The guidelines suggest breaking down the IoT infrastructure into components, then evaluating the risks associated with each component and then determining how to compensate for them (set controls). Even how the risk analysis should be done is indicated: “each risk shall be assigned a priority, to assist the implementer in determining the cost of the attack, as well as the cost of remediation, and the cost, if any, of not addressing the risk.” The checklist explicitly mentions the use of a ‘standard’ RA methodology and suggests CERT OCTAVE.

Apart from procedural guidance, there is also some very pragmatical guidance on e.g. physical security, and corresponding questions are in the self-assessment. Just to give you an idea, this is the set on Tamper Resistant Product Casing of endpoints (e.g. TTN gateways):

7.3 Use Tamper Resistant Product Casing
7.3.1 Do your endpoints use tamper resistant casing?
7.3.1.1 Our endpoints implement tamper resistant security controls.
7.3.1.2 Our endpoints contain circuits that invalidate NVRAM when a casing is opened.
7.3.1.3 Our endpoints contain Sensors that blow security fuses when abnormal conditions (e.g. light, temperature or voltage range) are detected.
7.3.1.4 Our endpoints contain Sensors that trigger an alert when a physically static device’s location is moved.
7.3.1.5 Our endpoints uses Epoxy covering for core circuit components.
7.3.1.6 Our endpoints raise Alerts when either internal or removable components are removed from the device.

So, apart from the NIST guidelines which are quite flimsy IMO, we now have a more substantial document to aid us - and it underlines the importance of RA.

The main difference between ISO27001 and documents like these is that ISO27001 requires a management system (the ISMS) to be in place. The ISMS is based on continuous improvement (the well known Deming cycle, PDCA). Guidelines like that of the GSMA do not require this.