diff --git a/Articles/139-oci-key-concepts.html b/Articles/139-oci-key-concepts.html index e69de29..b1d46f1 100644 --- a/Articles/139-oci-key-concepts.html +++ b/Articles/139-oci-key-concepts.html @@ -0,0 +1,245 @@ + + +
+
+
+ +
+

Tenancies and Compartments

+ +

When you sign up for an Oracle Cloud Infrastructure + account, you’re assigned a secure and isolated partition within the cloud infrastructure called a + tenancy.

+ +

The tenancy has the same name as the cloud account that you selected during the sign-up process. If you + want to change the tenancy name, you can do so by following these instructions. +

+ +

The tenancy is a logical concept. You can think of it as a root container where you can create, organize, + and administer your cloud resources.

+ +

The second logical concept used for organizing and controlling access to cloud resources is + compartments. A compartment is a collection of related cloud resources. Your tenancy is + also your root compartment.

+ +

You can create more compartments within your tenancy (up to six levels deep) and use corresponding policies to control access to the resources in each compartment. Every time you create a cloud resource, you must specify the compartment that you want the resource to belong to.

+ +

The following figure shows a compartment called Engineering inside the root compartment. The Engineering + compartment has two sub-compartments (for Project-A and Project-B), and each one of those compartments + is further separated into multiple compartments.

+ +

This structure isolates resources between environments (development, QA, production) and different + projects. You can apply policies to and designate administrators for each compartment. Creating more + fine-grained policies ensures that users have access to the compartments that they need and that + resources can connect to each other.

+
+
+
+ + + + + +
+
+

Create compartments

+ +

Let’s create a compartment structure like the one in the previous figure.

+ +

1. Sign in to your OCI account.

+ +

2. Open the navigation menu, select Identity, and then select Compartments.

+ +

3. To create a sub-compartment in your tenancy, click Create Compartment.

+ +

4. In the Create Compartment dialog, enter the name of the first compartment (in our example, Engineering) + and a description, and then select the tenancy or root compartment from the Parent Compartment menu. Click + Create Compartment.

+
+
+ + + + + +
+
+ +

When it’s created, the compartment is assigned an Oracle Cloud Identifier (OCID), like any other + resource in OCI. You can read more about OCID and other resource identifiers in the documentation.

+ +

The OCID uses the following syntax:

+ +
+
+
Copy
+
+ +
+
+	ocid1.<RESOURCE TYPE>.<REALM>.[REGION][.FUTURE USE].<UNIQUE ID>
+	
+
+
+ +

The following example shows what the OCID of your compartment might look like:

+ +
+
+
Copy
+
+ +
+
+
+ocid1.compartment.oc1..aaaaaaaaexampleuniqueID 
+
+
+
+
+ +

You can follow the same steps to create sub-compartments for Project A and Project B inside the Engineering + compartment and the Dev, QA, and Prod compartments inside each project.

+ +
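You can also script this instead of clicking through the Console. The following sketch uses the OCI SDK for Java to create the Engineering compartment; note that the parent OCID is a placeholder and that the exact client and builder class names can differ between SDK versions, so treat this as an outline rather than the definitive API.

import com.oracle.bmc.auth.ConfigFileAuthenticationDetailsProvider;
import com.oracle.bmc.identity.IdentityClient;
import com.oracle.bmc.identity.model.CreateCompartmentDetails;
import com.oracle.bmc.identity.requests.CreateCompartmentRequest;
import com.oracle.bmc.identity.responses.CreateCompartmentResponse;

public class CreateCompartmentExample {

    public static void main(String[] args) throws Exception {
        // Uses the credentials from the default OCI config file (~/.oci/config)
        ConfigFileAuthenticationDetailsProvider provider =
                new ConfigFileAuthenticationDetailsProvider("DEFAULT");

        IdentityClient identityClient = IdentityClient.builder().build(provider);

        // Placeholder parent OCID: your tenancy (root compartment) or any existing compartment
        String parentCompartmentId = "ocid1.tenancy.oc1..exampleuniqueID";

        CreateCompartmentDetails details = CreateCompartmentDetails.builder()
                .compartmentId(parentCompartmentId)
                .name("Engineering")
                .description("Compartment for the engineering organization")
                .build();

        CreateCompartmentResponse response = identityClient.createCompartment(
                CreateCompartmentRequest.builder()
                        .createCompartmentDetails(details)
                        .build());

        // The new compartment's OCID is returned, just like in the Console flow
        System.out.println("Created compartment: " + response.getCompartment().getId());

        identityClient.close();
    }
}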

Here’s what the Project-A compartment details page looks like after its subcompartments are created: +

+
+
+ + + + +
+
+

Policy inheritance and attachment

+ +

With the compartment hierarchy set up as described, the sub-compartments inherit all access permissions from + the compartments higher up in the hierarchy. For example, the Prod compartment inherits the permissions from + Project-A and Project-A inherits the permissions from the Engineering compartment.

+ +

When you create an access policy, you need to specify which compartment to attach it to. This setting + controls who can later modify or delete the policy. For example, you could create an access policy for the + Project-A compartment and attach it to the same compartment. With this design, you can give the + administrators of the Project-A compartment access to manage their own compartment’s policies. + However, if you attach that same access policy to the Engineering compartment, only people who have access + to manage policies in the Engineering compartment can modify or delete the policy.

+ +

Moving resources between compartments

+ + +

Most of the resources created in a compartment can be moved to another compartment. However, some resources + can’t be moved, such as Container Engine for Kubernetes clusters, functions, or policies.

+ +

Additionally, some resources might have attached resource dependencies. When you move such resources, the + attached dependencies move asynchronously. So, even if the parent resource is moved and immediately visible + in the new compartment, it might take time for the attached dependencies to move and become visible in the + new compartment.

+ +

For some resources, the attached resource dependencies don’t automatically move to the new compartment. You + must move these resources independently.

+ +

Finally, be aware of how the policies work when moving resources between compartments. When you move a + resource to a new compartment, the policies that govern the new compartment are immediately applied and will + affect access to the resource.

+ +

To learn more about compartments, see the Managing compartments documentation.

+ +

Regions and Realms

+ + +

Now that you understand what tenancies and compartments are and how they work, let’s look at regions and realms. When you signed up for your account, you selected your home region, and a tenancy was created for you in that region. Your home region is the geographic location where your account and Identity and Access Management (IAM) resources are created.

+ +

You can create and update the following resources only in the home region:

+ + + + + +
+
+ + + + + + + + + + + + +
+
+
+ +
+

You can’t change your home region. However, through the Console or the Oracle Cloud Infrastructure + CLI, you can subscribe your tenancy to other regions.

+ +

When you subscribe to a region, the IAM resources in your home region are propagated to and enforced in that region. Using policies, you can define access levels for each region separately.

+ +

If we take the compartment hierarchy from the previous example and subscribe to two more regions, we can + then create individual resources in those regions as shown in the following figure. Notice that the + compartments span the regions, but the resources within those compartments are created in specific + regions.

+ +

For more information about supported regions, see the Data Regions page.

+ +

All Oracle Cloud Infrastructure resources are physically hosted in regions and in one or more availability domains. An availability domain is a data center located in a region. Availability domains (ADs) are isolated from each other; they don’t share infrastructure, such as power or networking. This isolation minimizes the possibility of simultaneous failures. By using multiple availability domains, you can create a highly available and resilient architecture.

+ +

A realm is a logical collection of regions. Realms are isolated from each other, and they don’t + share any data. A tenancy can exist in a single realm, and it has access to the regions that belong to + that realm. Currently, Oracle Cloud Infrastructure has a single commercial realm (OC1) and multiple + realms for the Government Cloud.

+
+
+
+ + + + + +
+
+
+

Additional OCI Resources

+ +

This blog post gave you a short overview of some of the key concepts and terminology you need to understand when working with Oracle Cloud Infrastructure. We briefly mentioned access policies, and we’ll explain those in more detail in the next post.

+ + +
+
+
+ + \ No newline at end of file diff --git a/Articles/161-machine-learning-and-neural-networks.html b/Articles/161-machine-learning-and-neural-networks.html index e69de29..e659dee 100644 --- a/Articles/161-machine-learning-and-neural-networks.html +++ b/Articles/161-machine-learning-and-neural-networks.html @@ -0,0 +1,596 @@ + + +
+
+

If you are a typical software developer, all the machine learning buzz might seem confusing, especially when it comes to the heavy math and statistics that you haven’t used for years or maybe never really understood.

+ +

This article aims to explain machine learning so you can have a better idea of how you can use it and benefit from it. To achieve this, we’re going to reuse your existing Java knowledge and extend it with new machine learning concepts. After we cover the basics by exploring a Java code example for spam email classification using neural networks, everything else about how machine learning works should make more sense to you.

+ +

The example explored in this article uses Deep Netts, a Java-based deep + learning development platform that provides a pure Java, open source, community edition of the Deep Netts deep learning engine. This engine is a reference + implementation for the Visual Recognition (VisRec) API developed within the JCP as JSR 381, and it is a part + of the efforts for standardizing APIs and evolving machine learning support for the Java platform.

+ +

One of the main ideas behind Deep Netts is to provide an intuitive and easy-to-use API that will enable Java + developers to apply machine learning using their existing Java knowledge; simplify integration, deployment, + and maintenance of machine learning solutions; and improve the overall developer experience.

+ +

What Is Machine Learning?

+ +

Machine learning is a type of computer algorithm that is able to adjust a set of its own internal parameters + using sample data in order to perform a specific task on similar data.

+ +

Let’s take a more detailed look at this. Machine learning is an algorithm, pretty much like a sorting + algorithm, which performs various operations with in-memory data structures. Usually this algorithm has a + number of parameters that determine its operation. What makes it special is that it is able to adjust its + own parameters using sample data, and this process is referred to as learning or training.

+ +

You can think of machine learning as a kind of self-configurable algorithm. A machine learning algorithm + trained for specific data or a specific problem is called a model. Sample data used to train the + model is called the training set.

+ +

Once a model is done with training, it can perform some specific task, for example, assign a category label + to some input (classification) or estimate some quantity (regression). Both of these tasks are sometimes + referred to as prediction. It’s also important to note that it’s expected that models will make + mistakes (errors) in these predictions, and that these tasks will be performed with a certain degree of + accuracy.

+ +

As long as appropriate data about the problem is available, machine learning can be useful for solving tasks + that are difficult or impossible to solve directly using a fixed set of rules or formulas.

+ +

Example Use Case: Spam Classification

+ +

Email spam classification is a simple example of a problem suitable for machine learning. The task is to determine whether some email is spam or is not spam. As you probably know from personal experience, there are cases when this can’t be easily decided only by keywords in the subject or message, and additional properties of the email message need to be taken into account. One way to solve this is to gather a set of example emails for spam and non-spam, and train a machine learning model.

+ +

There is a publicly available data set that we’ll use, which contains 4,000 emails labeled as spam (1) or non-spam (0). The data set is available as a CSV file, which is a commonly used format for machine learning data sets. For every email, there are 48 features that correspond to the frequency of occurrence of specific words; 6 features that correspond to the occurrence of specific characters; 3 features that correspond to the occurrence of capital letters; and the last feature, which corresponds to the spam/non-spam label. This type of task, in which you have to assign items to one of two categories, is called binary classification. Figure 1 shows a few sample lines from the CSV file.

+ +
+ +
Figure 1. Sample rows from the spam data set with info about the occurrence of words, + characters, and capital letters Sample rows from the spam data set with info about the occurrence of words, + characters, and capital letters
+ +

In order to build a binary classifier for the given CSV file, we need to perform the following steps:

+ +

1. Read data from the CSV file and create an in-memory data set.
+ 2. Configure and create a neural network for binary classification tasks.
+ 3. Train the neural network using the loaded data set.
+ 4. Test the trained model to see how well it is performing.

+ +

Let’s get to the code. Listing 1 is a Java code snippet for creating a binary classifier using a feed-forward neural network for a given CSV file in just a few lines of code. It uses the community edition of Deep Netts. Table 1 is an overview of the classes and methods used in Listing 1.

+ +

Listing 1. A few lines of code for building a binary classifier.

+ +
+
+
Copy
+
+ +
+
+
+
+// Read data from a CSV file and create a data set
+DataSet emailsDataSet = DataSets.readCsv("spam_data_preprocessed.csv", 57, 1, true);
+
+// Create and configure an instance of a feed-forward neural network using its builder
+FeedForwardNetwork neuralNet = FeedForwardNetwork.builder()
+                                   .addInputLayer(57)
+                                   .addFullyConnectedLayer(15)
+                                   .addOutputLayer(1, ActivationType.SIGMOID)
+                                   .lossFunction(LossType.CROSS_ENTROPY)
+                                   .build();
+
+
+
+
+
+
+ +

Table 1. Classes and methods used in Listing 1

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+

Code item

+
+

Description

+
+

javax.visrec.ml.data.DataSet +

+
+

Data collection for training machine learning model.

+
+

deepnetts.data.MLDataItem +

+
+

A single item in data set for training machine learning model (one input-output pair).

+
+

deepnetts.data.DataSets +

+
+

Static utility methods for working with data sets.

+
+

deepnetts.data.DataSets.readCsv +

+
+

Static utility method to read data from a CSV file and return a corresponding instance of DataSet. Accepts a CSV file name, the number of inputs and outputs in the data set, and a flag indicating whether the file contains a header (column names).

+
+

deepnetts.net.FeedForwardNetwork +

+
+

Neural network architecture that can be used for classification and regression tasks. By + convention, it provides a static builder method that returns a corresponding builder.

+
+ +

 

+ +

The feed-forward neural network used in this example is a machine learning algorithm that is + represented as a graph-like structure in Figure 2. Each node in this graph performs some calculation, which + transforms its input. Each node applies some function to all of the inputs it receives from other nodes, and + each node sends its result to the other nodes it is connected to. Nodes in this graph are organized into + groups called layers. All nodes receive inputs from nodes in the previous layer and send output to nodes in + the next layer. This results in a forward signal flow, and that’s where the name comes from.

+ +
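To make the node computation concrete, here is an illustrative and deliberately naive sketch of what a single node does: it combines its inputs using connection weights and a bias, and passes the weighted sum through an activation function (a sigmoid in this sketch). Deep Netts and similar engines implement this with optimized linear algebra rather than per-node loops.

// Illustrative only: the computation performed by one node with a sigmoid activation
static float nodeOutput(float[] inputs, float[] weights, float bias) {
    float weightedSum = bias;
    for (int i = 0; i < inputs.length; i++) {
        weightedSum += weights[i] * inputs[i]; // connection weight times incoming signal
    }
    return 1.0f / (1.0f + (float) Math.exp(-weightedSum)); // sigmoid activation
}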

There is a very rough analogy between the kind of interaction shown in the computational graph and the way the neuron cells in a brain interact, and that’s why these graphs are also referred to as a neural network. Each connection between two nodes has an internal parameter called a connection weight, and by adjusting these parameters during the training procedure, these graphs are configured to perform some desired behavior. The network learns by adjusting these parameters using a mathematical procedure to make the difference between the target outputs specified in a data set and the outputs of the network as small as possible. The difference between the target and network outputs is calculated using a so-called error function (aka a loss function).

+ +
+ +
Figure 2. Feed-forward neural network depicted as a directed graph
+ +

A feed-forward neural network has exactly one input layer, exactly one output layer, and one or more hidden layers.

+ +

The input layer accepts external input (in this example, an email feature array), and the output layer provides the end result, which in our spam example is the probability that the email is spam.

+ +

The number of nodes in the input layer corresponds to the number of input features in the data set, and the + number of nodes in the output layer corresponds to the number of outputs in the data set. The size of the + layers and the number of hidden layers are configurable parameters, and the optimal values depend on the + problem and the data. In its simplest form, these parameters are determined experimentally, but there are + also advanced algorithms to automatically search for optimal values for these parameters. In this type of + neural network, hidden layers are so-called fully connected layers, which means that each node in a layer is + connected to all the nodes in the previous layer.

+ +

In Listing 1, the feed-forward network has been configured to work as a binary classifier by setting functions that are commonly used for this type of task: ActivationType.RELU for the hidden layer, ActivationType.SIGMOID for the output layer, and LossType.CROSS_ENTROPY for the error function.

+ +

Listing 1 can be used as a template for other binary classification data sets. You just need to change the + number of nodes in the input and output layers and tweak the hidden layers, as shown in Table 2. A + feed-forward network can also be used for other types of machine learning tasks by setting the appropriate + error and activation functions.

+ +

Table 2. Feed-forward network’s main configuration parameters.

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+

Item

+
+

Description

+
+

Input layer size

+
+

Number of inputs.

+
+

Output layer size

+
+

Number of outputs.

+
+

Hidden layers

+
+

Number of nodes in the hidden layers, given as an array. Each value in the array corresponds to the number of nodes in the corresponding hidden layer.

+
+

Activation function

+
+

Type of function performed in the neural network nodes. Common choices are Linear, Relu, Tanh, Sigmoid, and Softmax.

+
+

Error function

+
+

Type of function used to calculate the network error for the entire data set.

+
+ +

 

+ +
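As an example of reusing Listing 1 as a template, here is how it might look for a hypothetical data set with 30 input features and two hidden layers; the file name and layer sizes are made up for illustration, while the sigmoid output and cross-entropy loss stay the same for any binary classification task.

// Hypothetical data set: 30 input features, 1 binary label, CSV file with a header row
DataSet myDataSet = DataSets.readCsv("my_data.csv", 30, 1, true);

FeedForwardNetwork classifier = FeedForwardNetwork.builder()
        .addInputLayer(30)                         // must match the number of input features
        .addFullyConnectedLayer(32)                // hidden layer sizes are tuned experimentally
        .addFullyConnectedLayer(16)
        .addOutputLayer(1, ActivationType.SIGMOID) // sigmoid output for binary classification
        .lossFunction(LossType.CROSS_ENTROPY)      // cross-entropy error for classification
        .build();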

How to Prepare the Data for Neural Network Training

+ +

In most cases, data from a CSV file needs to be prepared before it can be used for neural network training. In order for a neural network to be able to use the data, it needs to be transformed into numeric values and scaled to a common range (typically 0 to 1). The data transformation operation that scales data to some range is called normalization. In the Deep Netts API, this operation is provided by the MaxNormalizer class.

+ +

The trained machine learning model will be used in production with some new (hopefully similar) data that has not been used for training. To estimate the classification accuracy on unseen data, the available data set is randomly split into a training set and a test set, which are then used for training and testing, respectively. The split ratio is usually 60% for training and 40% for testing, or 70% for training and 30% for testing, depending on the amount of available data. Example code for loading, preprocessing, and splitting data into training and test sets is shown in Listing 2.

+ +

Listing 2. Normalizing and splitting data into training and test sets.

+ +
+
+
Copy
+
+ +
+
+
+// Read data from a CSV file
+DataSet emailsDataSet = DataSets.readCsv("spam_data_preprocessed.csv", 57, 1, true);
+
+// Split the available data into training and test sets at a 60:40 ratio
+DataSet[] trainAndTestSet = emailsDataSet.split(0.6, 0.4);
+
+// Normalize the data
+MaxNormalizer norm = new MaxNormalizer(trainAndTestSet[0]); // create and initialize the normalizer
+norm.normalize(trainAndTestSet[0]); // normalize the training set
+norm.normalize(trainAndTestSet[1]); // normalize the test set
+
+
+ +

How to Train a Neural Network

+ +

After you have prepared the data and created the neural network, the next step is to train the neural network. A feed-forward neural network is trained using the backpropagation algorithm, which is commonly used for various types of problems and neural networks. The Deep Netts API provides an implementation of this algorithm in the BackpropagationTrainer class. The NeuralNetwork.getTrainer() method returns an instance of the trainer that will be used to train the parent network. After setting a few parameters that control the algorithm’s behavior, the train method is invoked on the neural network with the training set as a method parameter, which starts the training procedure. This process is shown in Listing 3. Table 3 shows the backpropagation trainer configuration parameters, and Output 1 shows the total data set error and accuracy over training iterations (epochs).

+ +

Listing 3. Configuring and starting training.

+ +
+
+
Copy
+
+ +
+      // configuring trainer
+      neuralNet.getTrainer().setMaxError(0.03f)
+                              .setMaxEpochs(10000)
+                              .setLearningRate(0.001f);
+
+      // start training using specified training set
+      neuralNet.train(trainingSet);
+
+
+ +

Table 3. Backpropagation trainer configuration parameters.

+ + + + + + + + + + + + + + + + + + + + +
+

Parameter

+
+

Description

+
+

MaxError

+
+

Training will stop when the total network error gets below this value.

+
+

MaxEpochs

+
+

Training will stop after the entire data set has been processed the specified number of times (epochs).

+
+

Learning rate

+
+

Controls the size of the learning step, that is, how much of the error is used to change the internal network parameters (weights) in each training iteration.

+
+ +

 

+ +

Output 1. Total data set error and accuracy over training iterations (epochs)

+ +
+
+
Copy
+
+ +
+
+TRAINING NEURAL NETWORK
+-----------------------------------------------------------------------------------------------
+Epoch:1, Time:72ms, TrainError:0.66057104, TrainErrorChange:0.66057104, TrainAccuracy: 0.6289855
+Epoch:2, Time:18ms, TrainError:0.6435114, TrainErrorChange:-0.017059624, TrainAccuracy: 0.65072465
+Epoch:3, Time:17ms, TrainError:0.6278175, TrainErrorChange:-0.015693903, TrainAccuracy: 0.6786232
+Epoch:4, Time:14ms, TrainError:0.60796565, TrainErrorChange:-0.019851863, TrainAccuracy: 0.726087
+Epoch:5, Time:15ms, TrainError:0.58832765, TrainErrorChange:-0.019638002, TrainAccuracy: 0.74746376
+Epoch:6, Time:15ms, TrainError:0.5712807, TrainErrorChange:-0.017046928, TrainAccuracy: 0.7572464
+
+
+ +

How to Test a Classifier

+ +

In order to make sure that the trained classifier will provide an acceptable level of accuracy while classifying new emails, we’re going to perform a test procedure (aka an evaluation procedure). A test procedure for a classifier performs classification on test data that has not been used for training the classifier. It counts the correct and wrong classifications, and it calculates various additional metrics that help you understand various properties of the classifier. The Deep Netts API provides an easy way to perform a test procedure using just one call to a utility method from the Evaluators class (shown in Listing 4), which returns an object that contains a map of classification-specific metrics. Output 2 shows the most important evaluation results that explain various classifier performance metrics.

+ +

Listing 4. Calling the utility method

+ +
+
+
Copy
+
+ +
+
+EvaluationMetrics em = Evaluators.evaluateClassifier(neuralNet, testSet);
+System.out.println(em);
+
+
+
+ +

Output 2. Evaluation results that explain various classifier performance metrics

+ +
+
+
Copy
+
+ +
+
+Accuracy: 0.89402175 (How often is classifier correct in total)
+Precision: 0.8658368 (How often is classifier correct when it gives positive prediction)
+Recall: 0.8658368 (When it is actually positive class, how often does it give positive prediction)
+F1Score: 0.8658368 (Average of precision and recall)
+
+
+
+
+ +

How to Use a Trained Classifier

+ +

Now that we have a trained classifier, we can use it to classify some new emails. Listing 5 shows example + code:

+ +

Listing 5. Using the trained spam classifier

+ +
+
+
Copy
+
+ +
+
+        // create binary classifier using trained network
+        BinaryClassifier binClassifier = 
+                                                new FeedForwardNetBinaryClassifier(neuralNet);        
+        // get test email as an array of features
+        float[] testEmail = testSet.get(0).getInput().getValues();
+
+        // get probability score that email is spam
+        Float result = binClassifier.classify(testEmail);
+        System.out.println("Spam probability: "+result);  
+
+
+
+ +

The trained network is wrapped with the FeedForwardNetBinaryClassifier class, which makes it clear what the network does, how to use it, and what its inputs and outputs are. The BinaryClassifier interface from the Visual Recognition API uses generics to specify the type of inputs for a specific binary classifier, which in this case is an array of email features. Note that this can be relatively easily changed to some user-defined class.

+ +

An email to classify is provided from the test set as an array of email features (the same set of features that we used to train the classifier). This array is then used as input for the classify method of the classifier.

+ +

The classification result for the binary classifier is given as a probability score, which indicates how + likely it is that the given email is spam.

+ +
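If your application needs a hard spam/not-spam decision rather than a score, you can compare the returned probability against a threshold. The 0.5 cut-off below is just an illustrative choice; in practice you would pick it based on how costly false positives are for your use case.

Float spamProbability = binClassifier.classify(testEmail);
boolean isSpam = spamProbability >= 0.5f;   // illustrative threshold
System.out.println(isSpam ? "Spam" : "Not spam");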

Code for the Complete Example

+ +

The full Java code for this spam classifier example, which includes loading, data preprocessing, training, + and testing, is available as a Maven project on GitHub. Listing 6 shows the full workflow.

+ +

Listing 6. Full workflow for neural network based spam classifier

+ +
+
+
Copy
+
+ +
+
+        
+	    int numInputs = 57;
+        int numOutputs = 1;
+        
+        // load spam data  set from CSV file
+        DataSet emailsDataSet = 
+                                    DataSets.readCsv(csvFile, numInputs, numOutputs, true);                   
+
+        // split data set into train and test set
+        DataSet[] trainAndTestSet = emailsDataSet.split(0.6, 0.4);
+        DataSet trainingSet = trainAndTestSet[0];
+        DataSet testSet = trainAndTestSet[1];
+        
+        // scale data to [0,1] range since this is the value range at which nn operates
+        MaxNormalizer norm = new MaxNormalizer(trainingSet);
+        norm.normalize(trainingSet);
+        norm.normalize(testSet);
+        
+        // create instance of feed-forward neural network using its builder
+        FeedForwardNetwork neuralNet = FeedForwardNetwork.builder()
+                .addInputLayer(numInputs)
+                .addFullyConnectedLayer(15)
+                .addOutputLayer(numOutputs, ActivationType.SIGMOID)
+                .lossFunction(LossType.CROSS_ENTROPY)
+                .randomSeed(123)
+                .build();
+
+        // set training settings
+        neuralNet.getTrainer().setMaxError(0.03f)
+                              .setMaxEpochs(10000)
+                              .setLearningRate(0.001f);
+        
+        // start training
+        neuralNet.train(trainingSet);
+        
+        // test network and evaluate classifier
+        EvaluationMetrics em = Evaluators.evaluateClassifier(neuralNet, testSet);
+        System.out.println(em);
+        
+        // create binary classifier using trained network
+        BinaryClassifier binClassifier = new FeedForwardNetBinaryClassifier(neuralNet);
+        float[] testEmail = testSet.get(0).getInput().getValues();
+        Float result = binClassifier.classify(testEmail);
+        System.out.println("Spam probability for the given input: "+result); 
+ 
+
+
+
+ +

Learn More

+ +

To learn more about machine learning and deep learning, take a look at the series of blog posts on the Deep + Netts blog. These posts describe in more detail how these algorithms work, other types of neural networks, + and machine learning tasks.

+ + + +

About the Author

+ +

Zoran Sevarac is an associate professor at the University of Belgrade where he teaches Java + and AI. He is a Java Champion, a JCP member, one of the JSR 381 expert group leads, a Duke’s Choice + Award winner for project Neuroph, and a NetBeans contributor. His work at the moment is focused on making + Java stronger and more developer friendly for machine learning.

+
+
+ + \ No newline at end of file diff --git a/Articles/Java/110-kubernetes.html b/Articles/Java/110-kubernetes.html index e69de29..d82fc84 100644 --- a/Articles/Java/110-kubernetes.html +++ b/Articles/Java/110-kubernetes.html @@ -0,0 +1,327 @@ + + +
+
+

Setting up a Kubernetes environment can be quite challenging, especially for beginners. Rather than concerning yourself with manually installing Kubernetes on cluster environments, you could go for a managed cloud option. Oracle Cloud Infrastructure provides such a managed Kubernetes offering.

+ +

This article shows how to deploy an example Java Platform, Enterprise Edition (Java EE) application in a managed Oracle Cloud Kubernetes cluster.

+ +

Docker Containers

+ +

In order to run enterprise applications in a Kubernetes cluster, they need to be packaged as Docker + containers. We will use a Docker base image that already contains the application server, a Java + installation, and the required operating system binaries.

+ +

The following shows the Dockerfile of our hello-cloud project:

+ +
+
+
Copy
+
+ +
+
+FROM sdaschner/open-liberty:javaee8-jdk8-b2
+
+COPY target/hello-cloud.war $DEPLOYMENT_DIR
+
+
+ +

We can distribute the created Docker image via the public Docker Hub, another Docker registry cloud service, or a private Docker registry.

+ +

Kubernetes Deployments

+ +

Kubernetes runs Docker containers in the form of pods. A pod contains one or more containers and is usually + created and managed by a Kubernetes deployment. A deployment provides the ability to scale and update pods + without too much manual effort.

+ +

Our example Kubernetes deployment's YAML definition looks as follows:

+ +
+
+
Copy
+
+ +
+
+kind: Deployment
+apiVersion: apps/v1beta1
+metadata:
+  name: hello-cloud
+spec:
+  replicas: 1
+  template:
+    metadata:
+      labels:
+        app: hello-cloud
+        version: v1
+    spec:
+      containers:
+      - name: hello-cloud
+        image: docker.example.com/hello-cloud:1
+        imagePullPolicy: IfNotPresent
+        livenessProbe:
+          httpGet:
+            port: 9080
+            path: /
+        readinessProbe:
+          httpGet:
+            port: 9080
+            path: /hello-cloud/resources/health
+      imagePullSecrets:
+      - name: regsecret
+      restartPolicy: Always
+---
+
+
+ +

The liveness and readiness probe definitions tell Kubernetes when the container is up and running and when it is able to handle incoming traffic, respectively. The deployment will cause one pod to be created on a cluster node with the given specification.

+ +
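The readiness path used above (/hello-cloud/resources/health) would typically be served by a small resource in the application itself. The following is a hypothetical JAX-RS sketch of such an endpoint; it is not taken from the hello-cloud project and assumes a JAX-RS Application mapped to /resources.

import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.core.Response;

// Hypothetical readiness endpoint, reachable as /hello-cloud/resources/health
// when the JAX-RS application is mapped to /resources
@Path("health")
public class HealthResource {

    @GET
    public Response health() {
        // Respond with 200 OK once the application is ready to accept traffic
        return Response.ok().build();
    }
}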

In order to pull the image from our repository—here, docker.example.com—we usually have to provide a Kubernetes secret, which + contains the Docker credentials. The secret regsecret was created in the same + namespace for this purpose.

+ +

Services

+ +

To access the created pod from inside or outside of the cluster, we require a Kubernetes service. The service + balances the load to all instances of the running containers:

+ +
+
+
Copy
+
+ +
+
+kind: Service
+apiVersion: v1
+metadata:
+  name: hello-cloud
+  labels:
+    app: hello-cloud
+spec:
+  selector:
+    app: hello-cloud
+  ports:
+    - port: 9080
+      name: http
+---
+
+
+ +

Kubernetes connects the service to the created pods by their labels and the defined selector. The app label is a de facto standard for grouping the resources of a logical application.

+ +

Kubernetes has an internal DNS resolution that enables cluster-internal applications to access our hello-cloud application via hello-cloud:9080. This, by the way, is a big benefit, because it minimizes the URL configuration of applications that run inside of the cluster. No matter which cluster or environment runs our workload, the host name hello-cloud will be resolved to the corresponding hello-cloud service.

+ +
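To illustrate what this cluster-internal resolution means for client code, another Java application running in the same cluster could call the service by its plain service name. The resource path below assumes the hello-cloud endpoints used later in this article.

import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

public class HelloCloudClient {

    public static void main(String[] args) throws IOException {
        // "hello-cloud" resolves to the Kubernetes service from inside the cluster
        URL url = new URL("http://hello-cloud:9080/hello-cloud/resources/hello");
        HttpURLConnection connection = (HttpURLConnection) url.openConnection();
        connection.setRequestMethod("GET");
        System.out.println("Response code: " + connection.getResponseCode());
        connection.disconnect();
    }
}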

Ingress

+ +

To access applications from outside of the cluster as well, we usually use Kubernetes ingress resources. The + following creates an NGINX ingress, which automatically routes ingress traffic through the external IP + address:

+ +
+
+
Copy
+
+ +
+
+kind: Ingress
+apiVersion: extensions/v1beta1
+metadata:
+  name: hello-cloud
+  annotations:
+    kubernetes.io/ingress.class: "nginx"
+spec:
+  rules:
+    - http:
+        paths:
+        - path: /hello-cloud
+          backend:
+            serviceName: hello-cloud
+            servicePort: 9080
+---
+
+
+ +

Enter Oracle Container Engine for Kubernetes

+ +

In order to run our example application, we need a running Kubernetes cluster with an arbitrary number of + nodes. Oracle Container Engine for Kubernetes provides a managed cluster that doesn't require us to set + up the Kubernetes resources ourselves.

+ +

The documentation describes how to create a cluster with a desired network setup. We + will use the recommended default options with two load balancer subnets, three worker subnets, RBAC + authorization, and an additional NGINX ingress deployment. For more information, you can also have a look at + my GitHub OKE repository.

+ +

The following screenshots show the creation of our cluster with a default cluster node pool, which manages + the compute instances. We are creating a cluster called oke-cluster-1 with the + recommended networking options.

+ +

Figure 1. Creating a cluster

+ +

Figure 1. Creating a cluster

+ +

The node pool, node-pool-1, is created with the worker subnets and will manage two + nodes per subnet in VM.Standard.1.2 shape. In total, our cluster will contain six + nodes in three availability domains.

+ +

Figure 2. Node pool configuration

+ +

Figure 2. Node pool configuration

+ +

After that, our cluster and its nodes will be created.

+ +

Figure 3. The created node pool

+ +

Figure 3. The created node pool

+ +

The cluster detail page will guide us regarding how to connect to the newly created Kubernetes cluster. We + can prove that our nodes have been created by using the kubectl command-line + tool:

+ +
+
+
Copy
+
+ +
+
+$> kubectl get nodes
+NAME              STATUS    ROLES     AGE       VERSION
+129.146.112.217   Ready     node      2d        v1.9.7
+129.146.126.186   Ready     node      2d        v1.9.7
+129.146.133.104   Ready     node      2d        v1.9.7
+129.146.136.107   Ready     node      2d        v1.9.7
+129.146.66.171    Ready     node      2d        v1.9.7
+129.146.98.185    Ready     node      2d        v1.9.7
+
+
+ +

The cluster description page shows how to connect our local kubectl with the newly + created Oracle Cloud cluster.

+ +

Once we confirmed that the cluster has been set up successfully, we can start using the cluster by + provisioning our workload. Therefore, we send our Kubernetes YAML definitions to the cluster. In this + example, we packaged the deployment, service, and ingress definitions into a single YAML file:

+ +
+
+
Copy
+
+ +
+
+$> kubectl apply -f deployment/hello-cloud.yaml
+service "hello-cloud" created
+deployment.apps "hello-cloud" created
+ingress.extensions "hello-cloud" created
+
+
+ +

We now can check that our service, deployment, and pod have been created successfully:

+ +
+
+
Copy
+
+ +
+
+$> kubectl get services
+NAME          TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)          AGE
+hello-cloud   NodePort    10.96.27.211   <none>        9080:32133/TCP   1m
+kubernetes    ClusterIP   10.96.0.1      <none>        443/TCP          1d
+
+$> kubectl get deployments
+NAME          DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
+hello-cloud   1         1         1            1           1m
+
+$> kubectl get pods
+NAME                        READY   STATUS    RESTARTS   AGE
+hello-cloud-d6777c66-n24bw  1/1     Running   0          1m
+
+
+ +

The NGINX ingress service is exposed as a load balancer and we will use its IP address to access the cluster: +

+ +
+
+
Copy
+
+ +
+
+$> kubectl get services --namespace ingress-nginx
+NAME                     TYPE           CLUSTER-IP      EXTERNAL-IP       PORT(S)                       AGE
+default-http-backend     ClusterIP      10.96.76.149    <none>            80/TCP                        1d
+ingress-nginx            LoadBalancer   10.96.191.202   129.146.88.24     80:30979/TCP,443:32339/TCP    1d
+
+ +

Now we'll put it all together and access our hello-cloud example application via HTTPS, for example by + using curl:

+ +
+
+
Copy
+
+ +
+
+$> curl -k https://129.146.88.24/hello-cloud/resources/hello
+Hello from OKE!
+
+
+ +

This uses the NGINX ingress that is accessed by the external IP address and routes traffic to the hello-cloud service and ultimately to the container, which runs in the hello-cloud-d6777c66-n24bw pod.

+ +

See Also

+ + + +

About the Author

+

Sebastian Daschner is a self-employed Java consultant, author, and trainer who is + enthusiastic about programming and Java (EE). He is the author of the book Architecting Modern Java EE + Applications. Daschner is participating in the JCP—helping to form the future standards of + Java EE by serving in the JAX-RS, JSON-P, and Config Expert Groups—and collaborating on various open + source projects. For his contributions in the Java community and ecosystem he was recognized as a Java + Champion, Oracle Developer Champion, and double 2016 JavaOne Rock Star. Besides Java, he is also a heavy + user of Linux and container technologies such as Docker. He evangelizes computer science practices on his blog, through his newsletter, and on Twitter. When not working with Java, he + also loves to travel the world—either by plane or motorbike. +

+
+
+ + + \ No newline at end of file diff --git a/Articles/Java/111-federated-deep-learning-using-java.html b/Articles/Java/111-federated-deep-learning-using-java.html index e69de29..0be44f9 100644 --- a/Articles/Java/111-federated-deep-learning-using-java.html +++ b/Articles/Java/111-federated-deep-learning-using-java.html @@ -0,0 +1,231 @@ + + + +
+
+

Deep Learning algorithms are a subclass of general machine learning algorithms. One of the core ideas of deep learning is that it has some similarities with how the human brain works. Similar to how layers of neurons in the brain process information, deep learning software allows developers to create a network containing multiple layers of neurons that process information as well.

+ +

Especially when large amounts of data are available, Deep Learning can provide high-quality results. For example, deep learning software can be used to classify images and detect objects in those images. Before a deep learning algorithm can tell what objects are in an image, it has to be trained with lots of data. Sometimes there is a large amount of high-quality data available for training the network, but in many real-world situations this is not the case. Real-world data is often obtained via consumer devices (e.g., pictures taken on a mobile device) and cannot easily be shared or transferred, due to privacy restrictions or regulations.

+ +

In this scenario, where there is a large amount of data but it can’t be sent to a server, Federated + Deep Learning comes to the rescue.

+ +

Using Deep Learning software to make predictions

+ +

We start with the straightforward case where, based on an image, we want the Neural Network (or model) to tell us what is shown in the image. In order to do so, the image is converted into an array of numbers, where each pixel is represented by a number that corresponds to its grayscale value. Note that more complex models are possible, taking colors and other properties into account, but we want to start with a simple example. The result of this conversion, the array of numbers, is sent through the Neural Network, and when the network is well trained, it will recognize what is in the image.

+ +
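As a concrete, if simplified, sketch of that conversion, the following plain Java code turns an image into an array of grayscale values scaled to the range 0 to 1. Real preprocessing pipelines usually also resize the image and may keep the color channels; this is only an illustration of the idea.

import java.awt.image.BufferedImage;

public class ImageToInput {

    public static float[] toGrayscaleArray(BufferedImage image) {
        int width = image.getWidth();
        int height = image.getHeight();
        float[] pixels = new float[width * height];
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                int rgb = image.getRGB(x, y);
                int r = (rgb >> 16) & 0xFF;
                int g = (rgb >> 8) & 0xFF;
                int b = rgb & 0xFF;
                // average the color channels and scale the result to [0, 1]
                pixels[y * width + x] = ((r + g + b) / 3.0f) / 255.0f;
            }
        }
        return pixels;
    }
}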

Under the hood, the Neural Network consists of a number of layers: an input layer containing the original data (the array of numbers), an output layer containing the results (in this case the most likely label being a bike), and one or more hidden layers. Data propagates through the network, using network parameters like weights, biases, and activation functions, and produces the output. The following diagram, taken from https://deeplearning4j.org, illustrates the process:

+ +

A course in deep learning is out of scope for this article; the interested reader is referred to
+ https://deeplearning4j.org/docs/latest/deeplearning4j-beginners. +

+ +

Before a Neural Network can be used, it should be trained. This is done by applying lots of data to the network and telling the network when it is wrong. Based on the feedback, the deep learning algorithm modifies the parameters of the network (weights and biases) in such a way that it is more likely that the network will now make a good prediction. In many cases, labeled training data is used to train the neural network. When an image of a mountain is supplied to the Neural Network, and the network classifies it as a “bike” although the associated label was “mountain”, the deep learning software will use sophisticated techniques to modify the neural network.

+ +

Under the hood, this comes down to a new set of parameters, as shown below:

+ +

Internally, Deep Learning software typically involves high-performance linear algebra. While predictions are + relatively easy and straightforward, training is more complex, and requires more computing power. Also, + training typically requires lots of high-quality data.

+ +

There are many software libraries providing Deep Learning APIs, in different languages. One of the leading Java libraries is deeplearning4j (https://deeplearning4j.org), created and maintained by SkyMind. While the top-level API of deeplearning4j is pure Java, the implementations of the functionality offered by the APIs leverage native code, including GPU-specific optimizations.

+ +

Training a neural network on the server

+ +

In a first setup, we will have a client that sends raw data to a server containing the Neural Network and asks it for a result. Before this can be done, the server needs a trained network. We can train a neural network using the deeplearning4j APIs, for example:

+ +
+
+
Copy
+
+ +
+
+
+MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
+               .optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT)
+               .updater(new Nesterovs(learningRate, 0.9))
+               .list()
+               .layer(0, new DenseLayer.Builder()
+                       .nIn(numInputs)
+                       .nOut(numHiddenNodes)
+                       .weightInit(WeightInit.XAVIER)
+                       .activation(Activation.RELU)
+                       .build())
+               .layer(1, new OutputLayer.Builder(LossFunctions.LossFunction.NEGATIVELOGLIKELIHOOD)
+                       .nIn(numHiddenNodes)
+                       .nOut(numOutputs)
+                       .weightInit(WeightInit.XAVIER)
+                       .activation(Activation.SOFTMAX)
+                       .build())
+               .backprop(true)
+               .build();
+
+
+ 
+
+ +

In this code snippet, we create the configuration for the neural network. Again, the details about the + deeplearning4j library are out of scope for this article, and the reader is referred to https://deeplearning4j.org for more information. The configuration + contains 2 layers (the input data is often not considered to be a layer as it is not configurable). The + first layer is a hidden layer, and the second layer is the output layer. The output of the first layer is + the input of the second layer.

+ +

When the network is configured, it can be trained. The following code snippet will feed the network with + labeled data.

+ +
+
+
Copy
+
+ +
+
+
+RecordReader rrTrain = new CSVRecordReader();
+File trainsrc = new File("linear_data_train.csv");
+rrTrain.initialize(new FileSplit(trainsrc));
+DataSetIterator iterTrain = new RecordReaderDataSetIterator(rrTrain, batchSize, 0, 2);
+
+// build and initialize the network from the configuration created earlier
+MultiLayerNetwork network = new MultiLayerNetwork(conf);
+network.init();
+
+network.fit(iterTrain);
+
+
+
+ +

In this case, the training data is in a CSV file, and it contains both the input data and the expected result. The network.fit(iterTrain) call will cause the network to be trained. A number of iterations will be executed. In each iteration, a forward pass is applied and all input data is used to make a prediction. Based on the predicted result and the actual result (contained in the CSV file), the network is modified. An error score tells how “wrong” the network is, and the goal is that with each iteration, the network is less wrong than before.

+ +

Once the network is trained, we can make requests to predict results based on new data. A conventional way of doing this is to have the deep learning software with the neural network on a server, and a client sending raw image data via a REST interface to the neural network. This approach is shown in the picture below.

+ +

Typically, the neural network responds with an array containing possible labels, and their probability (e.g. + there is 90% chance this image contains a bike, 5% chance it contains a car). Additional processing can be + done on this response, before it is sent back to the client. In general, the REST interface might call + something like this:

+ +
+
+
Copy
+
+ +
+
+
+String predicted = doSomeProcessing(network.output(data));
+		
+
+
+ +

An advantage of this approach is that the Neural Network can be enhanced. In case the network returns a wrong + result (e.g. “car”), the client might send a correction to the server (“bike”). + Based on this information, the neural network can be retrained and become better, as shown in the picture + below:

+ +

There are a number of drawbacks to this approach as well. Since client-server communication is needed, there will be no real-time response. If the client needs an immediate response, the time to set up an HTTP call alone will be too much. Also, the client device might not always be connected to the Internet, while the application assumes that the network should always be able to return predictions. And, increasingly important, this approach requires sending raw image data to a server. Due to privacy restrictions and regulations, this might not always be allowed.

+ +

Predictions on the client

+ +

A number of these drawbacks can be removed by doing predictions on the client. If we run the same code on the + client as on the server, the client can use a simple API to query the neural network.

+ +

The Java code for making predictions on the client is exactly the same as the code for making predictions on + the server.

+ +

If you want to run deeplearning4j Java code on mobile clients, you can use the Gluon Mobile framework, which allows you to create Java apps that work on iOS and Android devices. An explanation of how to do this is outside the scope of this article, and the interested reader is referred to http://docs.gluonhq.com/getting-started/.

+ +

The deeplearning4j software contains import and export functionality for neural networks. Hence, it is + possible to train the network on a server, export it to a file, send that file to the client, and import the + file as a neural network on the client. In general, predictions are not very computation intensive. + Moreover, deeplearning4j leverages native implementations available on mobile devices, and the performance + can be really great.

+ +
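A minimal sketch of that export/import round trip with deeplearning4j’s ModelSerializer is shown below, assuming a trained MultiLayerNetwork named network. How the file is transferred between server and client is application-specific and omitted here.

import java.io.File;
import java.io.IOException;

import org.deeplearning4j.nn.multilayer.MultiLayerNetwork;
import org.deeplearning4j.util.ModelSerializer;

public class ModelTransfer {

    // On the server: save the trained network, including its updater state,
    // so training can continue on the client
    public static void exportModel(MultiLayerNetwork network, File modelFile) throws IOException {
        ModelSerializer.writeModel(network, modelFile, true);
    }

    // On the client: restore the network and use it for local predictions or retraining
    public static MultiLayerNetwork importModel(File modelFile) throws IOException {
        return ModelSerializer.restoreMultiLayerNetwork(modelFile);
    }
}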

This new approach now allows the client to get a result in real-time, and it also works fine when the device + is not connected to the Internet. Also, the data (e.g. a picture taken with a mobile phone) stays at the + client, so there are no privacy concerns. In order to get a meaningful result based on the data, there is no + need to send this data from the device to a server.

+ +

There is a drawback to this approach though, since the model is not improving anymore. In the previous setup, + the client could correct the neural network. This feedback loop is required in order to make neural networks + really useful in real-world applications. Deep learning works better if more data is available, and if the + neural network is constantly updated.

+ +

Fortunately, we can achieve this goal without giving up the other benefits of local predictions.

+ +

Training on the client

+ +

Since the training code on the server is written in Java, it works on the clients as well. When the neural + network predicts a wrong result, and the user wants to correct it, the network can be retrained.

+ +

Since training is more resource-intensive than simply making predictions, it is recommended to do this during low-activity time, e.g., at night, and preferably when the device is charged or charging.

+ +

This approach now has a number of benefits, including the learning capability of the neural network. In order to become really useful though, it would be interesting to have a large number of clients jointly train a neural network without sending raw data. This is done in the next setup.

+ +

Federated Learning

+ +

When a neural network on a specific client device is retrained, that client can send the new parameters of its neural network to a server. The parameters of the neural network do not contain the raw image data that was used to train the neural network. When many devices send their new parameters to a server, this server can combine them and create a new “best” neural network, which is occasionally sent back to the clients. There are a number of algorithms for doing this, and again, the details are out of scope for this article.

+ +
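The simplest combination strategy is plain parameter averaging, sketched below using deeplearning4j’s flattened parameter vectors. This is only an illustration in the spirit of federated averaging; production algorithms typically weight each client by the amount of local data, deal with stragglers, and so on.

import java.util.List;

import org.deeplearning4j.nn.multilayer.MultiLayerNetwork;
import org.nd4j.linalg.api.ndarray.INDArray;

public class FederatedAveraging {

    // clientParameters holds the flattened parameter vector reported by each client
    public static void average(MultiLayerNetwork serverNetwork, List<INDArray> clientParameters) {
        INDArray sum = clientParameters.get(0).dup();            // copy so client arrays stay untouched
        for (int i = 1; i < clientParameters.size(); i++) {
            sum.addi(clientParameters.get(i));                   // element-wise accumulation
        }
        INDArray averaged = sum.divi(clientParameters.size());   // element-wise mean
        serverNetwork.setParams(averaged);                       // install the combined parameters
    }
}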

The general idea though is shown below:

+ +

Each client retrains the Neural Network with its own data, and the modifications to the Neural Network are sent to a server. This approach allows for an enhancement of the Neural Network while respecting users’ privacy. Moreover, the computing power of a large number of client devices is used to improve the Neural Network.

+ + +

Training a neural network is not trivial. Thanks to the deeplearning4j APIs and the availability of Java on client devices, Java provides a great platform for enabling Federated Learning.

+
+
+ + \ No newline at end of file diff --git a/Articles/Java/129-bullet-time.html b/Articles/Java/129-bullet-time.html index e69de29..b53111b 100644 --- a/Articles/Java/129-bullet-time.html +++ b/Articles/Java/129-bullet-time.html @@ -0,0 +1,116 @@ + + +
+
+

Before diving into some of the technical challenges that went into building the Matrix Bullet Time Demo, it + is useful to review the general problem statement and the hardware design solution that went with it. The + goal for this demo was to take instantaneous photos of a subject from 360 degrees, and then stitch these + photos together to form a movie. The intended final effect was for it to appear as though the camera was + moving around a subject frozen in time.

+ +

To accomplish this, Jasper Potts and I needed to mount a large number of cameras in a 360-degree circle. To + add some visual interest, we wanted to design it such that the cameras were mounted in a kind of helix. Each + of these cameras needed to be focused on a single, distinct point. Each camera had to be connected with a + central server to receive the commands to take a picture at the same moment in time and to transfer images + back to the server to turn them into a movie.

+ +

Figure 1. Me writing the software beneath the assembled helix.

+

Figure 1. Me writing the software beneath the assembled helix.

+ +

We built the Matrix Bullet Time Demo from 60 individual Raspberry Pi 3 single-board computers with Raspberry + Pi cameras. There were a few interesting problems to solve in trying to mount 60 Raspberry Pi units in such + a way that we could surround the subject in 360 degrees! We needed to design a mounting system and a method + of powering that many Raspberry Pi units. We also needed to take into account how to break down and + transport the rig between locations. (During JavaOne, we had the demo in two different locations on + different days, and we also wanted to design it to be shipped internationally.) Jasper found a lighting + track system with curved tracks which, when joined together, formed a circle. By hanging this track from + adjustable stands, he was able to vary the height of the track as it circled, forming the helix shape we + were looking for.

+ +

Figure 2. Close-up of a Raspberry Pi and a camera.

+

Figure 2. Close-up of a Raspberry Pi and a camera.

+ +

One of the benefits of using a lighting track system is that it handles power distribution. You provide the 120-volt input power to the track, and it carries that power through copper wires built into the track. At any point where you want to have a light, you use a mount designed for the track system, which transfers the power through the mount to the light. What we had to do instead was route this power to a transformer for each Raspberry Pi that would step the power down to 5 volts. Jasper designed custom boards, printed them with a 3D printer, and mounted these to the light mounts. In this way, power was delivered to each of the 60 Raspberry Pi units.

+ +

Originally we tried to use Wi-Fi dongles for each Raspberry Pi for communicating with the server, but we had + a horrible time getting consistent latencies and consistent connectivity. Instead, we ran an Ethernet cable + from each Raspberry Pi along the track to switches and from there to the server. Jasper and his wife Fiona + put in all the hard work designing, printing, and assembling the hardware for this demo.

+

Figure 3. Jasper assembling the hardware.

+

Figure 3. Jasper assembling the hardware.

+ +

On the software side, we needed to run software both on the Raspberry Pi units and on a central coordinating + server. We also had a web UI for running the demo. Users entered their Twitter username so that the final + video that we uploaded to Twitter could be linked back to their own personal Twitter account. The overall + system worked like this:

+ + + +

In general, this system worked really well. The primary challenge that we encountered was getting all 60 + cameras to focus on exactly the same point in space. If the cameras were not precisely focused on the same + point, then it would seem like the "virtual" camera (the resulting movie) would jump all over the + place. One camera might be pointed a little higher, the next a little lower, the next a little left, and the + next rotated a little. This would create a disturbing "bouncy" effect in the movie.

+ +

We took two approaches to solve this. First, each Raspberry Pi camera was mounted with a series of adjustable + parts, such that we could manually visit each Raspberry Pi and adjust the yaw, pitch, and roll of each + camera. We would place a tripod with a pyramid target mounted to it in the center of the camera helix as a + focal point, and using a hand-held HDMI monitor we visited each camera to manually adjust the cameras as + best we could to line them all up on the pyramid target. Even so, this was only a rough adjustment and the + resulting videos were still very bouncy.

+ +

The next approach was a software-based approach to adjusting the translation (pitch and yaw) and rotation + (roll) of the camera images. We created a JavaFX app to help configure each camera with settings for how + much translation and rotation was necessary to perfectly line up each camera on the same exact target point. + Within the app, we would take a picture from the camera. We would then click the target location, and the + software would know how much it had to adjust the x and y axis for that point to end up in the dead center + of each image. Likewise, we would rotate the image to line it up relative to a "horizon" line that + was superimposed on the image. We had to visit each of the 60 cameras to perform both the physical and + virtual configuration.

+ +

At runtime, the server would query the cameras to get their adjustments. Then, when images were received from the cameras (see step 6 above), we used the Java 2D API to transform those images according to the translation and rotation values previously configured. We also had to crop the images, so we adjusted each Raspberry Pi camera to take the highest resolution image possible, and then we cropped it to 1920x1080 for the resulting hi-def movie.
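The transformation itself can be done with an AffineTransform. The following is a minimal, hedged sketch, not the demo's actual code: it assumes per-camera dx, dy, and roll values from the configuration step and applies the shift, the roll correction, and the crop to 1920x1080 in one drawing pass.

    import java.awt.Graphics2D;
    import java.awt.geom.AffineTransform;
    import java.awt.image.BufferedImage;

    public class FrameAdjuster {
        // Apply the configured translation and roll correction, cropping to 1920x1080.
        public static BufferedImage adjust(BufferedImage src, double dx, double dy, double angleRadians) {
            BufferedImage out = new BufferedImage(1920, 1080, BufferedImage.TYPE_INT_RGB);
            Graphics2D g = out.createGraphics();
            AffineTransform t = new AffineTransform();
            t.translate(dx, dy);                                                  // shift so the target point is centered
            t.rotate(angleRadians, src.getWidth() / 2.0, src.getHeight() / 2.0);  // roll correction around the image center
            g.drawImage(src, t, null);                                            // drawing onto the 1920x1080 canvas also crops
            g.dispose();
            return out;
        }
    }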

+ +

On each Raspberry Pi, we used a simple Python app. All communication between the Pi units and the server was + done over a multicast connection. On the server, when images were received they were held in memory and + streamed to FFMPEG, such that only the resulting movie was actually written to disk. All communication + between the Oracle JET web UI and the server was done using REST. The server itself was a simple Java 9 + application (we just used the built-in Java web server for our REST API). I would have liked to revisit this + and make use of some of the lightweight Java microservice web servers out there, because that would have + resulted in our having less code. But the end result was still rather pleasant for such a small project.
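For readers who have not used it, the JDK's built-in HTTP server mentioned above can expose a REST endpoint in a few lines. This is a hedged sketch with a made-up /status path and payload, not the demo's real API.

    import com.sun.net.httpserver.HttpServer;
    import java.io.OutputStream;
    import java.net.InetSocketAddress;
    import java.nio.charset.StandardCharsets;

    public class DemoRestServer {
        public static void main(String[] args) throws Exception {
            HttpServer server = HttpServer.create(new InetSocketAddress(8080), 0);
            // Hypothetical endpoint; the real demo exposed its own REST API for the Oracle JET UI.
            server.createContext("/status", exchange -> {
                byte[] body = "{\"state\":\"READY\"}".getBytes(StandardCharsets.UTF_8);
                exchange.getResponseHeaders().add("Content-Type", "application/json");
                exchange.sendResponseHeaders(200, body.length);
                try (OutputStream os = exchange.getResponseBody()) {
                    os.write(body);
                }
            });
            server.start();
        }
    }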

+ +

About the Author

+ +

Richard Bair is currently the cloud architect for the Oracle Internet of Things suite of products. Previously + he spent several years as the Chief Java Client Architect at Oracle. He has presented numerous times at + JavaOne over the past 12 years.

+
+
+ \ No newline at end of file diff --git a/Articles/Java/131-inside-java-mobile-app-part2.html b/Articles/Java/131-inside-java-mobile-app-part2.html index e69de29..5b7d7d9 100644 --- a/Articles/Java/131-inside-java-mobile-app-part2.html +++ b/Articles/Java/131-inside-java-mobile-app-part2.html @@ -0,0 +1,328 @@ + + +
+
+

In Part 1 of this series, I showed + how developers can use the Java APIs (for example, JavaFX APIs) to create a Java client application that + runs on desktop, mobile, and embedded devices using Gluon Mobile. I showed that with a 100 percent pure + Java codebase, applications can be uploaded to the Apple App Store and the Google Play Store.

+ +

The app we created in that article was a standalone application. When the app ran, it didn’t need any additional information apart from what was in the application bundle. Needless to say, this is not a very common situation. In reality, most apps interact with back-end applications. Typical back-end interactions include retrieving weather information from a REST endpoint or handling the financial transaction associated with buying a ticket and making a seat reservation for a concert. Because there are many Java developers working on back-end projects, these back-end systems are often familiar environments.

+ +

The example demonstrated in the previous article presented end users with a selection of coffee types from + which they could place an order. The list of possible coffee types was statically included inside the + application. In this article, we will make this a remote interaction.

+ +

Mobile Apps Need Back-End Data and Functionality

+ +

We have a back-end system that provides the possible coffee types. By doing a REST call to a specific + endpoint, the coffee types are returned in a JSON response:

+ +
+
+
Copy
+
+ +
+
+curl "http://javahub.gluonhq.com/javahub-demo-backend/modules/coffee/types"
+
+[
+
+  {
+
+    "name":"Ethiopia",
+
+    "type":"1",
+
+    "_links":{
+
+      "self":"http://javahub.gluonhq.com/javahub-demo-backend/modules/coffee/types/coffee/types/1"
+
+    }
+
+  },
+
+  {
+
+    "name":"Honduras",
+
+    "type":"2",
+
+    "_links":{
+
+      "self":"http://javahub.gluonhq.com/javahub-demo-backend/modules/coffee/types/coffee/types/2"
+
+    }
+
+  },
+
+  {
+
+    "name":"Colombia",
+
+    "type":"3",
+
+    "_links":{
+
+      "self":"http://javahub.gluonhq.com/javahub-demo-backend/modules/coffee/types/coffee/types/3"
+
+    }
+
+  }
+
+]
+
+
+
+
+ +

There are a number of ways to connect a mobile app with an enterprise back end and to handle the interactions. In theory, you can directly use the tools that you use in enterprise systems to connect to other enterprise systems. In practice, however, this is not encouraged in most cases for a number of reasons:

+ + + +

Respecting Mobile and Enterprise Requirements by Using an MBaaS

+ +

A typical solution to overcome these problems is to position a mobile back-end as a service (MBaaS) between + the mobile devices and the back-end system.

+ + + +

Using an MBaaS, mobile devices don't connect to the back end directly; instead, they connect to the + MBaaS, which may forward their request to a back-end system.

+ +

Gluon CloudLink is an MBaaS that + can be connected to by mobile apps regardless of the way in which they are written. However, Gluon CloudLink + works best when mobile apps are written with Gluon Mobile, the toolkit used to create the app in + the previous article.

+ +

Gluon CloudLink is a public cloud service that can connect to other back-end and cloud systems (see Figure + 2), but you can also install it as a service inside other clouds. In this article, I will demonstrate this + using Oracle Container Cloud Service, which is part of Oracle Cloud.

+ + +

Installing Gluon CloudLink on Oracle Cloud

+ +

If you have an Oracle Cloud (trial) account, you can install your own or third-party software in Oracle Container Cloud Service. You can configure a datasource (for example, a MySQL service or an Oracle Database service) and link it to your application(s) in Oracle Container Cloud Service. You can also use other Oracle Cloud services; for example, you can leverage the storage capabilities provided by Oracle Storage Cloud Service. A great advantage of Oracle Container Cloud Service is that it supports a microservices architecture and allows for scalable applications. Gluon CloudLink is a back-end service that internally contains a number of microservices, and each of them can be scaled individually. Using Gluon CloudLink on Oracle Cloud allows you to scale the middleware according to your own needs. You have your own instance(s) running Gluon CloudLink, which connect to your own database service (the public cloud version of Gluon CloudLink is a multitenant service, where data is isolated between applications but stored in the same data storage system).

+ +

Note: In the next article, I will explain the internals of Oracle Container Cloud Service, + and I will show how a microservices product such as Gluon CloudLink can be installed and configured on + Oracle Cloud. In this article, I will abstract away these internal details to simplify the discussion.

+ +

Applications that leverage Gluon CloudLink can be managed using Gluon Dashboard, which is a web application + that allows developers, architects, and operators to configure and monitor their mobile apps and the + connections to the enterprise back-end systems. You can open Gluon Dashboard by navigating to https://gluon.io/.

+ + + +

By default, Gluon Dashboard connects to the public Gluon CloudLink software as a service (SaaS) installation. Navigate to the Oracle Cloud–specific page at https://gluon.io/oracle_cloud to connect to your privately hosted Gluon CloudLink instance in your Oracle Cloud infrastructure. If this is the first time you are using Gluon CloudLink on your Oracle Cloud infrastructure, Gluon Dashboard will guide you through the process of installing Gluon CloudLink on Oracle Cloud. You need to provide your Oracle Cloud credentials, and then the Gluon CloudLink framework will be provisioned on your own Oracle Cloud instances. Your data is stored in an instance of Oracle MySQL Cloud Service that is private to you. If you don’t have an Oracle Cloud account, you can sign up for a trial.

+ +

After you successfully authenticate in the Gluon Dashboard, the application will search for an existing Gluon + CloudLink installation on your Oracle Cloud environment. If it doesn't find one, a dialog box will be + shown proposing to install Gluon CloudLink on your Oracle Cloud environment, as shown in Figure 5:

+ + + +

Click OK if you want to install the software. The required software will be installed on + your account, and you can follow the progress in a console. This can take some time (for example, 15 + minutes).

+ +

Once the Gluon CloudLink microservices are installed on your Oracle Cloud environment, you are redirected to + the Gluon Dashboard landing view (Figure 6), which shows a menu and an initially empty usage graph.

+ + +

Tunneling Requests Through Gluon CloudLink

+ +

There are many things you can do with Gluon Dashboard, but we will focus on getting the coffee types from our + back-end system and providing them to the mobile app. If you select API Management, you can + add a remote function. Let's create a function named otnCoffees and provide + the REST endpoint that will return the coffee types, as shown in Figure 7:

+ + + +

When you click Save, Gluon Dashboard sends the information to Gluon CloudLink. Then, when a + mobile app requests the otnCoffees function, Gluon CloudLink knows what to do: it + will invoke the remotely defined endpoint.

+ +

Now, we need to modify the code for the mobile app so that it retrieves the list of coffee types by calling + Gluon CloudLink. Currently, we have the service providing the data as follows:

+ +
+
+
Copy
+
+ +
+
+    public ReadOnlyListProperty<OTNCoffee> retrieveOTNCoffees() {
+        ObservableList<OTNCoffee> list = FXCollections.observableArrayList();
+        Platform.runLater(() -> {
+            list.add(new OTNCoffee("type1", "Moka"));
+            list.add(new OTNCoffee("type2", "Arabica"));
+        });
+        return new SimpleListProperty<>(list);
+    }
+
+ +

We will now use the Gluon CloudLink client APIs and replace that code with the following code:

+ +
+
+
Copy
+
+ +
+
+    public ReadOnlyListProperty<OTNCoffee> retrieveOTNCoffees() {
+        if (otnCoffees == null) {
+            DataClient dataClient = DataClientBuilder.create()
+                    .operationMode(OperationMode.CLOUD_FIRST)
+                    .build();
+
+            GluonObservableList<OTNCoffee> gluonCoffees = DataProvider.retrieveList(
+                    dataClient.createFunctionListDataReader("otnCoffees", OTNCoffee.class));
+            otnCoffees = new ReadOnlyListWrapper<>(gluonCoffees);
+        }
+        return otnCoffees.getReadOnlyProperty();
+    }
+
+ +

In this new code, we first construct a Gluon CloudLink DataClient, which will + manage the connection with Gluon CloudLink and beyond. The DataClient is asked to + operate in CLOUD_FIRST mode, which means that it will first ask Gluon CloudLink + for its data, and if a connection cannot be made, it will look into its local cache (on the mobile device) + to present the data that was retrieved before.

+ +

Next, the DataClient is bound to the remote function named otnCoffees and the result is expected to be a list of OTNCoffee instances.

+ +

The static method retrieveList on DataProvider is then + called, and the data is stored in a GluonObservableList, which extends the JavaFX + ObservableList by adding a few properties that contain information about the + status of the retrieval.

+ +

Note that we didn't have to provide a URL for the actual back-end service. This is all configured by + using Gluon Dashboard in Gluon CloudLink, so once the mobile application can make a secure connection to + Gluon CloudLink, we're all set.

+ +

Gluon CloudLink defines a key pair for each mobile application, and in the case of a hosted version of Gluon + CloudLink (as opposed to the public SaaS version of Gluon CloudLink), the host name needs to be provided to + the mobile application as well.

+ +

In Gluon Dashboard, the Credentials view allows you to store the required configuration file inside the + resource directory of the mobile application, where it will be picked up by the Gluon Mobile tools that + create the app.

+ + + +

As can be seen in Figure 8, the required configuration is generated by Gluon CloudLink in a JSON format.

+ +

On the client, we simply bind the call to the otnCoffees function, as defined in + Gluon CloudLink, to a JavaFX ObservableList. By moving the configuration to the + middleware, we avoid any risks related to changes in the URL structure or protocol, which would require a + rebuild, redeployment, and redistribution of the mobile app. If you don't want to stress your real + back-end system, you can also use Gluon Dashboard to send mock data back to your mobile client. The switch + between mock data and the real data can be made by a single click in Gluon Dashboard, rather than by having + to recompile and resubmit your app to the App Store and the Play Store.

+ +

Note that we didn't create a separate thread for retrieving the data from the back-end system. This is + done under the hood by the Gluon CloudLink client. The result is a JavaFX ObservableList that can immediately be used by the graphical components in the + client app—in this case, the list is used in a selection control.

+ +

In this particular case, the three coffee types are likely to arrive quickly at the mobile app. But when there is a lot of data to be retrieved from the back end, using the ObservableList allows developers to render the data that has already arrived while new data keeps coming in. Because the Gluon CloudLink client populates the ObservableList as the data is parsed, the visual control that is backed by this ObservableList is updated on the fly. This eliminates a huge amount of boilerplate code.
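As a small, hedged illustration (the coffeeService and coffeeList variables are assumptions, not classes from the demo), the list returned by retrieveOTNCoffees() can back a JavaFX control directly, because a ReadOnlyListProperty is itself an ObservableList:

    // coffeeService is a hypothetical reference to the class that defines retrieveOTNCoffees().
    ListView<OTNCoffee> coffeeList = new ListView<>();
    coffeeList.setItems(coffeeService.retrieveOTNCoffees());  // the control updates as items arrive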

+ +

Summary

+ +

In this article, we extended the OTN demo app from the previous article. Rather than showing a hardcoded list + of coffee types, we retrieved the list from a back-end service. However, we didn't access that back-end + service directly; we used middleware called Gluon CloudLink that bridges the gap between the mobile devices + and the enterprise infrastructure.

+ + +

We explored how you can provision Gluon CloudLink on your Oracle Cloud infrastructure, and how you can easily + configure Gluon CloudLink using its dashboard to map client requests onto external requests by defining + functions that can be called on the mobile client.

+ +

About the Author

+ +

Johan Vos started working with Java in 1995. He was part of the Blackdown team, porting Java + to Linux. His main focus is on end-to-end Java, combining back-end systems and mobile/embedded devices. He + received a Duke's Choice Award in 2014 for his work on JavaFX on mobile devices.

+ +

In 2015, he cofounded Gluon, which allows enterprises to create mobile Java client applications leveraging their existing back-end infrastructure. Gluon received a Duke's Choice Award in 2015. Vos is a Java Champion, a member of the BeJUG steering group and the Devoxx steering group, and he is a JCP member. He is the lead author of Pro JavaFX 8 (Apress, 2014), and he has been a speaker at numerous conferences on Java.

+
+
+ \ No newline at end of file diff --git a/Articles/Java/135-deep-learning-on-clients.html b/Articles/Java/135-deep-learning-on-clients.html index e69de29..991edfb 100644 --- a/Articles/Java/135-deep-learning-on-clients.html +++ b/Articles/Java/135-deep-learning-on-clients.html @@ -0,0 +1,357 @@ + + +
+
+

Neural networks are increasingly being used in a myriad of applications. In the typical use case, all raw + input data is sent to a server or a bunch of cloud instances, where the data is used to train and evaluate a + model. That model is then used to make predictions based on specific input, and hands over the prediction to + the interested party.

+ +

In this article, we will explore a complementary way of training the model by leveraging the power of client + devices.

+ +

The Concept
+ Neural networks are based on the idea of a model that accepts some input (for example, pictures, numbers, + characters, and so on) and produces some output that is relevant to the input (for example, "that was a + picture of a dog," "the number was 3," "the next character is most likely the letter + b," and so on), as shown in Figure 1.

+ +

Figure 1. Neural networks accept input and produce output that is relevant to the + input.

+ +

The quality of a neural network partially depends on the quality of the data that is fed to it. When a + specific input leads to a wrong output (Figure 2), the network can be retrained with the given data added to + the set of training data.

+ +

Figure 2. If the output is incorrect (for example, "rabbit"), the + neural network can be retrained (for example, to provide "bird").

+ +

Different algorithms exist for training neural networks, and lots of research is currently being performed in + this area. In this article, we will not focus on the training algorithms, but rather on what is required to + perform training.

+ +

Typically, the neural network is trained in a server environment. The resulting model (the numerical + constants that allow the algorithm to produce the "best" result based on the input) is used to + make predictions and evaluations.

+ +

In many cases, though, most of the data is not available in the server environment. Pictures are taken on + mobile phones, autonomous cars are using onboard cameras to generate media that has to be evaluated, and a + keyboard is used to type data.

+ +

While the evaluation of the model can be done relatively easily on clients, training of the data is typically + done on the server side. In order to improve the model, the training data is sent to the server. As a + consequence, the picture you take with your phone, the scenery you drive by with your car, or the words you + type on your smartphone have to be sent over the internet to the servers that are updating the neural + network.

+ +

Clearly, this can lead to a number of privacy issues. Also, it requires lots of bandwidth, because tons of + data has to be sent from clients to the server.

+ +

Distributed learning addresses a number of these issues. In this case, the local data is used to do local training on the model. As a consequence of this training, the model on the client improves, leading to different coefficients (the internal numbers that make up the model). The gradient of the model can now be sent to the server, where it can be combined with the existing model and with gradients sent by other clients.
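As a rough, hedged sketch of that server-side combination step (plain Java, deliberately simplified, and not the DL4J API), averaging the gradients received from several clients and applying them to the shared model could look like this:

    import java.util.List;

    public class GradientAverager {
        // Assumes weights and gradients are flattened into double arrays of equal length;
        // real frameworks use richer structures, learning-rate schedules, and per-client weighting.
        public static void applyClientGradients(double[] weights, List<double[]> clientGradients, double learningRate) {
            double[] avg = new double[weights.length];
            for (double[] g : clientGradients) {
                for (int i = 0; i < g.length; i++) {
                    avg[i] += g[i] / clientGradients.size();   // average the gradients from all clients
                }
            }
            for (int i = 0; i < weights.length; i++) {
                weights[i] -= learningRate * avg[i];           // gradient-descent style update of the shared model
            }
        }
    }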

+ +

As a consequence, the quality of the model improves, and the enhanced model can be sent to the client (Figure + 3).

+ +

Figure 3. The enhanced model can be sent back to the client.

+ +

There are a number of benefits to this approach:

+ + + +

What Do We Need?
+ Many real-world scenarios where neural networks are useful involve mobile or embedded devices (for example, + pictures on a phone, cameras in a car, and characters on a keyboard). Hence, it is important that the + distributed learning techniques work on those devices.

+ +

A popular and open source library providing Java APIs for neural network operations is Eclipse Deeplearning4j (DL4J), which is + supported by SkyMind.

+ +

In order to use the DL4J APIs on a client, that client should be capable of executing Java code. The DL4J + APIs depend on native code for performance reasons; hence, Java Native Interface (JNI) needs to be + supported.

+ +

The DL4J APIs work on mobile devices using the Gluon IDE plugins at https://gluonhq.com/get-started/ide-plugins/. The plugins provide an easy way to + create cross-platform user interfaces based on JavaFX and to leverage existing libraries, including DL4J. In + order to run on Android devices, the Gluon IDE plugins perform the required steps for creating an APK that + can be uploaded to the Google Play Store. No Android-specific code is required, and the JavaFX code that is + used to create a user interface on a desktop also works on Android devices.

+ +

Similarly, this code also works on iOS devices. The Gluon IDE plugins will invoke the Gluon VM tools, which + will compile the Java code ahead of time to native code, link it with other native libraries, and create an + iOS app that can be executed on an iPhone or an iPad, tested in the iOS Simulator, and uploaded to the Apple + App Store.

+ +

As a consequence, the Java code that you use to create applications that leverage the DL4J neural network + APIs is combined with a user interface created with JavaFX that runs on a desktop, an Android device, an iOS + device, and also embedded devices. The important thing for developers is that your code is 100% + cross-platform, because it is all written in Java. The hard part of translating that to the specific, native + systems is done under the hood for you.

+ +

HelloWorld (Multilayer Perceptron Classifier)
+ As an example, let's explore a simple linear classifier that is trained locally.

+ +

You can open the sample code in any IDE (NetBeans, IntelliJ, Eclipse), provided that you have the Gluon IDE + plugin installed. Follow the instructions at https://gluonhq.com/get-started/ide-plugins/ in order to do this.

+ +

Once you open the sample in your IDE, you can quickly run it, and you will see the screen shown in Figure 4: +

+ +

Figure 4. First screen displayed by the sample code.

+ +

Clicking the train network model button will start the local training, and it will result in + the screen shown in Figure 5:

+ +

Figure 5. Results of training the model.

+ +

You can also run this sample on an Android (Figure 6) or iOS device (Figure 7), if you have one connected to your system via USB. Depending on your IDE, you will see a task called androidInstall or launchIosDevice. This task will compile the required dependencies, create the mobile app, and send it to your device. If you want to create apps to upload to the App Store or the Play Store, you select the createIPA or createAPK task, respectively.

+ +

More information on this process can be found in the Gluon documentation at http://docs.gluonhq.com/charm/latest/#_getting_started.

+ +

Figure 6. Screenshot of the app on an Android device.

+ +

Figure 7. Screenshot of the app on an iOS device.

+ +

The source code contains only two Java files. The main.java file contains the code + for creating the view, and the real work is done by the code in TrainingView.java.

+ +

The TrainingView.java file contains code for training a neural network and for + showing the output in a JavaFX UI.

+ +

The relevant JavaFX code is shown in the following snippet:

+ +
+
+
Copy
+
+ +
+
+    private final Label label;
+
+    public TrainingView(String name) {
+        super(name);
+
+        label = new Label();
+
+        Button button = new Button("train network model");
+        button.setOnAction(e -> {
+            Task task = train();
+            button.disableProperty().bind(task.runningProperty());
+        });
+
+        VBox controls = new VBox(15.0, label, button);
+        controls.setAlignment(Pos.CENTER);
+
+        setCenter(controls);
+    }
+
+
+   + +

The user interface is defined by a VBox that contains a label and a button. The + content of the label will be set by the training function, and it will indicate the current status of the + training/evaluation.

+ +

Clicking the button triggers the training task. The training is performed in a JavaFX task in a dedicated + thread. The JavaFX binding API is used to make sure the button is disabled as long as the training is in + progress. Once the training task is done, the button is enabled again.
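For reference, here is a hedged sketch of what the train() method called in the snippet above could look like; the actual implementation ships in TrainingView.java, and this fragment only illustrates the Task-in-a-dedicated-thread pattern just described.

    // Sketch only; assumes import javafx.concurrent.Task;
    private Task<Void> train() {
        Task<Void> task = new Task<Void>() {
            @Override
            protected Void call() throws Exception {
                // read the training data, build the network, fit it, and evaluate it here
                // (see the DL4J snippets later in this article)
                return null;
            }
        };
        Thread thread = new Thread(task, "training-thread");
        thread.setDaemon(true);   // the background thread should not keep the app alive
        thread.start();
        return task;
    }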

+ +

This sample is inspired by the screencast of DL4J at https://www.youtube.com/watch?v=8EIBIfVlgmU. If you want to learn more about the + concepts used in this code, check out that screencast and the related screencasts.

+ +

The basic idea of this sample is to take a pair of numbers between 0 and 1 as input and return an output that + is either 0 or 1. The model is trained by using some well-known input/output pairs. The data that is used to + train and test the model can be visualized as follows: the x and y axes show the input of a sample, the + color red is applied for samples with an output of 0, and the color blue is applied for samples with an + output of 1 (Figure 8).

+ +

Figure 8. Red indicates input samples that return an output of 0; blue indicates + input samples that return an output of 1.

+ +

Now, we will create a simple neural network that will be trained using some of the samples and then evaluated + using another set of those samples.

+ +

When you click the train network model button, a number of things happen in sequence.

+ +

First, training data is read from a supplied file. Note that in real-world cases, the training data is typically obtained through interactions with the user (for example, the user enters text or takes a picture).

+ +
+
+
Copy
+
+ +
+
+     RecordReader rrTrain = new CSVRecordReader();
+     rrTrain.initialize(new InputStreamInputSplit(
+             TrainingView.class.getResourceAsStream("/linear_data_train.csv")));
+     DataSetIterator iterTrain = new RecordReaderDataSetIterator(rrTrain, batchSize, 0, 2);
+
+
+   + +

In order to evaluate the model, we need evaluation data. This is achieved using the following code:

+ +
+
+
Copy
+
+ +
+
+    RecordReader rrEval = new CSVRecordReader();
+    rrEval.initialize(new InputStreamInputSplit(
+            TrainingView.class.getResourceAsStream("/linear_data_eval.csv")));
+    DataSetIterator iterEval = new RecordReaderDataSetIterator(rrEval, batchSize, 0, 2);
+
+
+   + +

The following code snippet creates a neural network with two layers (one hidden layer). The input of the + first layer contains two numbers that are the x and y value of the dots in Figure 8. The other layer + contains two numbers as output, which contain the probabilities of the outcome either being 0 (red) or 1 + (blue).

+ +
+
+
Copy
+
+ +
+
+    int numInputs = 2;
+    int numHiddenNodes = 20;
+    int numOutputs = 2;
+
+    MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
+            .seed(seed)
+            .optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT)
+            .updater(new Nesterovs.Builder()
+                    .learningRate(learningRate)
+                    .momentum(0.9)
+                    .build())
+            .list()
+            .layer(0, new DenseLayer.Builder()
+                    .nIn(numInputs)
+                    .nOut(numHiddenNodes)
+                    .weightInit(WeightInit.XAVIER)
+                    .activation(Activation.RELU)
+                    .build())
+            .layer(1, new OutputLayer.Builder(LossFunctions.LossFunction.NEGATIVELOGLIKELIHOOD)
+                    .nIn(numHiddenNodes)
+                    .nOut(numOutputs)
+                    .weightInit(WeightInit.XAVIER)
+                    .activation(Activation.SOFTMAX)
+                    .build())
+            .pretrain(false)
+            .backprop(true)
+            .build();
+
+
+   + +

Now that the model is defined, we can train it:

+ +
+
+
Copy
+
+ +
+
+    MultiLayerNetwork network = new MultiLayerNetwork(conf);
+    network.init();
+    network.setListeners((IterationListener) (model, iteration, epoch) -> {
+        Platform.runLater(() -> label.setText("Running iteration #" + iteration));
+    });
+    Platform.runLater(() -> label.setText("training model..."));
+    for (int n = 0; n < numEpochs; n++) {
+        network.fit(iterTrain);
+    }
+
+
+   + +

We will use a number of iterations for training the model, and whenever an iteration is done, this is reported in the JavaFX label that shows the current state.

+ +

Because we run the training in a dedicated JavaFX task that runs in its own thread, we have to make sure we + update the JavaFX Scene Graph by calling Platform.runLater().

+ +

Now that the model is trained, we can evaluate it to see how well it performs. This is achieved using the + following snippet, which uses the test data created above:

+ +
+
+
Copy
+
+ +
+
+    Platform.runLater(() -> label.setText("evaluating model..."));
+    Evaluation evaluation = new Evaluation(numOutputs);
+    while (iterEval.hasNext()) {
+        DataSet dataSet = iterEval.next();
+        INDArray features = dataSet.getFeatureMatrix();
+        INDArray labels = dataSet.getLabels();
+        INDArray predicted = network.output(features, false);
+        evaluation.eval(labels, predicted);
+    }
+    Platform.runLater(() -> label.setText("model evaluation result:\n" + evaluation.stats()));
+
+
+   + +

At the end of the evaluation, we set the content of the JavaFX label to the results of the evaluation, as shown in Figure 8.

+ +

Next Steps
+ This sample shows how you can create and train neural networks on mobile devices. This is only the basics of + what you can do. In a follow-on article, I will show how you can share the model with the server, where you + send gradient updates only and the server periodically sends enhanced versions of the model.

+ +

About the Author

+ +

Johan Vos started working with Java in 1995. He was part of the Blackdown team, porting Java to Linux. His + main focus is on end-to-end Java, combining back-end systems and mobile/embedded devices. He received a + Duke's Choice Award in 2014 for his work on JavaFX on mobile devices.

+ +

In 2015, he cofounded Gluon, which allows enterprises to create mobile Java client applications leveraging + their existing back-end infrastructure. Gluon received a Duke's Choice Award in 2015. Vos is a Java + Champion, a member of the BeJUG steering group and the Devoxx steering group, and he is a JCP member. He is + the lead author of Pro JavaFX 8 (Apress, 2014), and he has been a speaker at numerous conferences on Java. +

+
+
+ + \ No newline at end of file diff --git a/Articles/Java/166-modular-reusable-javaee-architecture-with-docker.html b/Articles/Java/166-modular-reusable-javaee-architecture-with-docker.html index e69de29..b8b2149 100644 --- a/Articles/Java/166-modular-reusable-javaee-architecture-with-docker.html +++ b/Articles/Java/166-modular-reusable-javaee-architecture-with-docker.html @@ -0,0 +1,477 @@ + +
+
+

Can you easily replicate all the services your Java EE application needs to run? Every Java EE application + needs many services: databases, message queues, NoSQL services, authentication, user management, and much + more. We run some of those services inside the application server during development and small deployments. + But once we grow and scale our infrastructure, we need to separate each service in its own environment. + That's when things get complex.

+ +

Software containers can help us manage and tame this complexity. Containers make it easy to create + sophisticated architectures composed of many "moving parts." Containers also help automate this process, and + ease the creation of complete development-test-deploy pipelines. Best of all, containers allow for the reuse + of our infrastructure components. We can build infrastructures with containers from third parties, just + adding them to our deployments. Or we can build infrastructures from the containers we create ourselves.

+ +

We saw in the first article in this series how to encapsulate a Java EE application inside a container. Java + EE gives us portability among Java EE vendors. Containers expand this portability to many infrastructure and + cloud providers.

+ +

Let's see now how containers can help us define our architecture in a modular, reusable way!

+ +

Enterprise Solutions Are Composed of Many Services

+ +

When we construct applications out of several services, we have important advantages. It is much easier to + scale independent layers than it is to scale a single monolithic application. This is also true for our + pipeline when we are creating development, testing, and deployment environments. Any discussion about + "microservices" centers around those benefits. Separating large applications into modular services is + technically sound and also fashionable!

+ +

Not every application is composed of microservices, but the Java EE architecture does define and use multiple + services. Lots of Java EE functionality is deployed on external services, which makes our applications much + more useful and easier to create. And those services, although not microservices, are deployed (and scaled + and developed) independently. This is part of what allows the Java EE architecture to scale to large + environments.

+ +

Figure 1 shows a Java EE application with multiple services, such as a web server, an application server, a + message queue (MQ) server, and a database server.

+ +

Figure 1. Multiple services that support a Java EE application

+


+ +

Almost every Java EE application uses many of the services shown in Figure 1 as well as SQL databases, + message queues, NoSQL servers, user management servers, storage services, and many more. These services are + independently installed and managed, and applications depend upon them.

+ +

This usually means that we have to install, configure, and manage each one of those services. Many times, + they will need to run on their own machines. If we want to scale or guarantee high availability (HA), we + need multiple instances of each one, which adds even more configuration and installation hurdles to the + whole process. It gets to the point that it is hard to re-create all those services in our development and + testing environments. So we don't. This leads to different environments in our pipeline. Different + environments increase the risk of failure and make debugging much harder—even if we are able to automate + those disparate environments.

+ +

Encapsulating Services for Independence and Scalability

+ +

Containers are a great way to install and manage separate services. In the first article in this series, we + saw how to encapsulate an application. We can also encapsulate any of those external services. Services + inside a container are easy to install, manage, and scale. By running several containers on a single + machine, we can develop and test our architecture. Once tested, we can separate the containers onto multiple + machines and even replicate containers. With the same building blocks, we can create large, scalable, and + highly available production environments. Figure 2 shows a diagram of a Java EE architecture built using + containers.

+ +

Figure 2. Diagram of a Java EE architecture built using containers

+


+ +

Modular and Reusable Services

+ +

Creating scalable and highly available architectures is already wonderful, and it is even better if we can + reuse services that are already scalable.

+ +

By accessing container repositories, we can compose architectures out of existing components, and we can + reuse containers in multiple projects and environments.

+ +

If we are using services that exist in the open source community or are provided by vendors, composing an + architecture is even easier. There is a large pool of ready-to-use services prepackaged as containers, so we + can compose our infrastructure.

+ +

Many repositories of containers are available. Docker Hub is the leading repository, and it provides many (many!) images. But Google, Amazon, and others also provide public repositories. You can even host your own repository. You can search for an official (or not!) image of services, vendors, and products. The source code for most images is also available on GitHub.

+ +

Improving Our Original Application

+ +

Let's see how we can improve the application we created in the first article in this series. Building on top + of our simple Java EE application that has an NGINX load balancer, we'll add several services.

+ +

A typical enterprise application will need a database to store information. To talk with other systems in our + enterprise, we can use a messaging server. We can integrate our application with legacy applications or even + Internet of Things (IoT) devices. And, our application will be awesome, so it needs some big data + capabilities. For this, let's add a NoSQL database to the mix. Figure 3 shows a diagram of the final + application.

+ +

Figure 3. Diagram of our enhanced application

+


+ +

If you want to follow along, all the commands and files that are mentioned here are at this GitHub + repository: https://github.com/eldermoraes/javaoneus2016.

+ +

First, the Database

+ +

The most obvious thing for any Java EE application is the database. In this article, we are going to use + PostgreSQL, an open source, powerful database. It can run in a cluster, so if we need to scale, we can run + containers on multiple machines.

+ +

A quick search at Docker Hub shows an official PostgreSQL image. We can use the container as is, or we can + create a new image based on an existing one and add our own configurations. Let's do the latter to see how + we can customize PostgreSQL for our specific application.

+ +

We begin by building our PostgreSQL image. If you downloaded the project, change to the postgres directory + and run the following command:

+ +
+
+
Copy
+
+ +
+    
+    docker build -t postgres-javaone .
+    
+
+ +

Don't forget the period at the end. This command will build a new image based on the PostgreSQL Dockerfile, + which was provided in the downloaded project:

+ +
+
+
Copy
+
+ +
+    
+    # Using PostgreSQL official Docker image as a basis 
+    FROM postgres:9.6  
+    
+    # Copy your local file init-user-db.sh to the image,  
+    # as it will be executed when you run a new container 
+    COPY docker-entrypoint-initdb.d/init-user-db.sh /docker-entrypoint-initdb.d/init-user-db.sh 
+    RUN chmod a+x /docker-entrypoint-initdb.d/init-user-db.sh  
+    
+    # Define default language for the database running  
+    # from this image 
+    RUN localedef -i en_US -c -f UTF-8 -A /usr/share/locale/locale.alias en_US.UTF-8 
+    ENV LANG en_US.utf8
+    
+
+ +

We just built our own PostgreSQL image! We can use it in this or other projects. Let's start it and get it up + and running:

+ +
+
+
Copy
+
+ +
+    
+    docker run --name postgresdb -p 5432:5432 postgres-javaone
+    
+
+ +

We are giving it a name, so we can access it from our application container. We are also publishing the port, so we can connect to it externally.

+ +

Now, Let's Handle Messaging

+ +

With the database taken care of, the next service our application needs is a message queue server. Many Java EE applications use Java Message Service (JMS) to route messages between components. It works inside the application, and it can also reach out to the external world: to legacy systems, external services, and Internet of Things devices. A good message server can connect our application to virtually anything, and it can handle both synchronous and asynchronous communication. Messaging is a powerful way of connecting systems in a decoupled manner.

+ +

Most Java EE application servers offer an internal JMS service. This can work. But the message server is an independent service, and you might need to scale it independently. An external server can also provide more options for connectivity. In our example, we will use Apache ActiveMQ. It supports many protocols and has an active community. More important: it is open source, so you can use it in projects of any size.
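Before starting the broker, here is a hedged sketch of how the application could later put a message on a queue. It assumes the activemq-client library is on the classpath, that the application container can resolve the broker under the host name activemq on ActiveMQ's default port 61616 (for example, through a container link), and the queue name is made up.

    import javax.jms.Connection;
    import javax.jms.MessageProducer;
    import javax.jms.Queue;
    import javax.jms.Session;
    import org.apache.activemq.ActiveMQConnectionFactory;

    public class OrderNotifier {
        public static void send(String text) throws Exception {
            ActiveMQConnectionFactory factory = new ActiveMQConnectionFactory("tcp://activemq:61616");
            Connection connection = factory.createConnection();
            try {
                connection.start();
                Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
                Queue queue = session.createQueue("orders");           // hypothetical queue name
                MessageProducer producer = session.createProducer(queue);
                producer.send(session.createTextMessage(text));        // hand the message to the broker
            } finally {
                connection.close();
            }
        }
    }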

+ +

In this case, we will just run an existing image as is. A search on Docker Hub locates several images. + Although there is no official ActiveMQ image, there are several done by the community. We selected one that + looks well implemented. Now, we can just name the image in the docker run command, as shown in + the following command. The docker daemon will download the image, create a container, and run + it.

+ +
+
+
Copy
+
+ +
+    
+    docker run --name activemq webcenter/activemq:5.13.2
+    
+
+ +

How About Some Big Data?

+ +

Our infrastructure is looking good! Today, every application needs some way to handle unstructured data or + just large amounts of data. So, let's add a NoSQL server to our environment. That way, we are ready to + handle some big data. Our little example app will shine!

+ +

Again, searching on Docker Hub, we see an official image for Cassandra. We don't need any special + configuration, so, let's just use this as is.

+ +

Again, we don't need to build an image. We can just run it. As before, Docker will download the referenced + image from the registry and run it:

+ +
+
+
Copy
+
+ +
+    
+    docker run --name cassandradb cassandra:3.7
+    
+
+ +

Isn't this fun? In just a few commands, we have a complete infrastructure ready for our application! Now we can go back to the image that we created (see the previous article). That image is an appliance, which means we include in it the latest version of our application, already deployed and ready to run. Let's make a few changes to that image, so we can use the new services we just deployed.

+ +

To build the application container, we change to the following directory:

+ +
+
+
Copy
+
+ +
+    
+    javaoneus2016/tree/master/tomee-db
+    
+
+ +

And then we build the image:

+ +
+
+
Copy
+
+ +
+    
+    docker build -t tomee-db --build-arg WAR_FILE=javaonedb.war .
+    
+
+ +

Here is the Dockerfile we are using in the above command:

+ +
+
+
Copy
+
+ +
+    
+    # Using TomEE official Docker image as a basis 
+    FROM tomee:8-jre-1.7.2-webprofile  
+    
+    # Configure our server with HA settings 
+    ADD server.xml /usr/local/tomee/conf/server.xml  
+    
+    # Add some users to TomEE, so we can log in  
+    # to the admin panel later to see the results 
+    ADD tomcat-users.xml /usr/local/tomee/conf/tomcat-users.xml  
+    
+    # Add DataSource configuration 
+    ADD tomee.xml /usr/local/tomee/conf/tomee.xml  
+    
+    # Add Postgres JDBC JAR file 
+    ADD postgresql-9.4-1206.jdbc42.jar /usr/local/tomee/lib/postgresql-9.4-1206.jdbc42.jar  
+    
+    # Now we add our application.  
+    # This is the last step, so we can use Docker caching capabilities  
+    # every time we re-create the container with a new version of the application  
+    
+    ARG WAR_FILE=warfile.war 
+    ADD ${WAR_FILE} /usr/local/tomee/webapps/${WAR_FILE}
+    
+
+ +

So, now we have a new image with the latest version of the application. This version adds code to use all the + services we installed.

+ +

Note that this is a simple, standard Java EE application. There is nothing special in the application itself. + We are just using basic Java EE constructs.

+ +

In our JSP page we refer to a servlet that injects an Enterprise JavaBeans (EJB) bean to access the database. + Shown below is the relevant code for the JSP page, the servlet, and the bean.

+ +

Here's the index.jsp file:

+ +
+
+
Copy
+
+ +
+    
+    

+    The Host is <%= InetAddress.getLocalHost().getHostAddress() %>
+
+    <% if (dataList != null && !dataList.isEmpty()) { %>
+        <% for (Data data : dataList) { %>
+            <%= data.print() %>
+        <% } %>
+    <% } %>
+
+ +

Here's the DataServlet file:

+ +
+
+
Copy
+
+ +
+    
+    //Inject the EJB 
+    @EJB 
+    DataBean data; 
+    ... 
+    protected void doGet(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException {     
+         data.create();     
+         request.setAttribute("dataList", data.getData());     
+         RequestDispatcher dispatcher = request.getRequestDispatcher("index.jsp");     
+         dispatcher.forward(request, response); 
+    }
+    
+
+ +

And here's the DataBean file:

+ +
+
+
Copy
+
+ +
+    
+    public List<Data> getData() {
+        return em.createNamedQuery(Data.FIND_ALL).getResultList();
+    }
+
+    public void create() {
+        Data data = new Data();
+        data.setNameData(new Date().toString());
+        em.persist(data);
+    }
+         
+
+ +
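For completeness, here is a hedged sketch of what the Data entity referenced by the bean and the JSP page might look like. The real class ships with the GitHub project, so apart from FIND_ALL, setNameData, and print (which appear in the snippets above), the identifiers below are assumptions.

    import java.io.Serializable;
    import javax.persistence.Entity;
    import javax.persistence.GeneratedValue;
    import javax.persistence.GenerationType;
    import javax.persistence.Id;
    import javax.persistence.NamedQuery;

    @Entity
    @NamedQuery(name = Data.FIND_ALL, query = "SELECT d FROM Data d")
    public class Data implements Serializable {

        public static final String FIND_ALL = "Data.findAll";

        @Id
        @GeneratedValue(strategy = GenerationType.IDENTITY)   // assumed key strategy
        private Long id;

        private String nameData;

        public void setNameData(String nameData) {
            this.nameData = nameData;
        }

        public String print() {
            return id + " - " + nameData;
        }
    }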

As you can see, containers allow us to deploy virtually any Java EE application in this way. This is a simple + example, and it is possible to do a more-sophisticated separation of an application into containers. We + could even go all the way and create a complete modular microservices architecture. To get to microservices, + creating containers for the major services is an easy first step. It requires hardly any changes in your + application and provides immediate benefits.

+ +

Run, Java EE, Run!

+ +

There are already several containers up and running. PostgreSQL, Cassandra, and ActiveMQ are each running in + their own containers. We can now stop, scale, and manage each one.

+ +

All that is missing is the container that holds the application itself. We just built the image, but we still + need to run the container.

+ +

As we did in the previous article, from the appliance image we create the containers. We will run multiple + containers to have a load balancer. We also need to link everything together, so the application can see the + other services:

+ +
+
+
Copy
+
+ +
+    
+    docker create --name hostdb1 -p 8080:8080 --link postgresdb:postgresdb --link cassandradb:cassandradb tomee-db 
+    docker create --name hostdb2 -p 8081:8080 --link postgresdb:postgresdb --link cassandradb:cassandradb tomee-db 
+    docker create --name hostdb3 -p 8082:8080 --link postgresdb:postgresdb --link cassandradb:cassandradb tomee-db
+    
+
+ +

And to finish, we add another important service: our load balancer. For this, we are still using the + ready-to-run NGINX image by Jason Wyatt. (See the previous article for a more detailed look into the load + balancer configuration.)

+ +

docker create --name loadbalancerdb -p 80:80 --link hostdb1:hostdb1 --link hostdb2:hostdb2 --link hostdb3:hostdb3 --env-file ./env-load.list jasonwyatt/nginx-loadbalancer

+ +

Now that all the containers have been created, let's start them:

+ +
+
+
Copy
+
+ +
+    
+    docker start hostdb1 
+    docker start hostdb2 
+    docker start hostdb3 
+    docker start loadbalancerdb
+    
+
+ +

Conclusion

+ +

That is all that we need to do to create a complete Java EE application with all the services it requires. + The application can run in a single machine, as shown here. Or you can deploy it across a cluster. It can + run on premises or in the cloud. It can run on your laptop for development and be automated in your + continuous integration server.

+ +

Containers allow us to have more-consistent environments. Having similar test, quality assurance, and + production environments reduces risk and bugs. In production, we can even scale to multiple containers and + multiple machines, guaranteeing high availability.

+ +

This article provided a simplified example, but it outlines a robust process for experimenting with + containers:

+ +

1. Start from your own Java EE application.
+ 2. List all the services you are using now: application server, database, message queue, and so on.
+ 3. Docker Hub may have ready-to-use images for some or all those services. Do a search there, read about the + images; look for the ones that are more suitable for your project.
+ 4. Consider defining your own image. You might need to if there is no suitable image for a service or your + application requires extensive configuration.
+ 5. Either use the images directly (such as Cassandra and ActiveMQ in our example) or start from an existing + image and add your customizations (as we did with the PostgreSQL image).
+ 6. Start your containers to get your services up and running.
+ 7. Create the container for your application. Remember to link it to the service containers. This will allow + the containers to communicate. See the requirements of your orchestration platform.
+ 8. Have fun! Enjoy your modular architecture with Java EE and Docker!

+ +

Your infrastructure is now defined in containers. The next step is building your development pipeline. You + can use those same images to run your tests and quality assurance procedures. Although that will need to be + covered in another article, you already have everything you need to automate your whole development cycle! +

+ + +

About the Author

+ +

Bruno Souza believes software developers have a huge impact in the world, and can effectively improve the + planet. That is why he is passionate about developer communities. Souza has dedicated his life to helping + developers worldwide reach their true potential. Also known as the "Brazilian JavaMan," he is a Java + developer at Summa Technologies and a cloud specialist at ToolsCloud, where he has participated in some of + the largest Java projects in Brazil. Souza is also President of SouJava and has twice been on the Board of + Directors at the Open Source Initiative. He believes that Java and open source are the path to career + excellence and that taking responsibility for delivering software is the mark of great developers.

+ +

Elder Moraes is a software developer passionate about Java EE development and systems architecture. He has + experience in projects in many areas, from financial and legal to human resources and logistics. He is a + speaker at events such as JavaOne and The Developers Conference, where he focuses on how developers can + improve their projects through a better understanding of architecture challenges.

+ +

André Tadeu de Carvalho is a Java consultant at Summa Technologies, a DevOps architect at ToolsCloud, and a + technical writer at Jelastic. He is passionate about learning and experimenting, and has certificates and + certifications for various courses.

+ +
+
+ + + + \ No newline at end of file diff --git a/Articles/Microservices/147-jellema-microservice-design.html b/Articles/Microservices/147-jellema-microservice-design.html index e69de29..e82ec75 100644 --- a/Articles/Microservices/147-jellema-microservice-design.html +++ b/Articles/Microservices/147-jellema-microservice-design.html @@ -0,0 +1,405 @@ + + +
+
+

No one has achieved success with microservices just by talking about them. Unfortunately, many organizations spend a lot of time on exactly that, debating how to approach microservices. It is as though there is one perfect approach to designing and working with microservices that needs only to be uncovered. In actual fact, there is no such definitive solution; even if there were, it would hold true only until changes in the organization, business objectives, technology frameworks and regulations made adjustments necessary.

+ +

It is tempting—just as it was a decade ago with SOA Web Services—to spend a lot of time and + energy on identifying microservices. Creating an exhaustive overview of all microservices, defining the + exact scope and interface of each, is not feasible and is not a smart investment of time. It would be a lot + of work, and that work would never be complete. The definition of microservices is not an end in itself and + giving in to this temptation represents a serious risk. Microservices are an instrument for achieving + sustained business agility in a changing world of functional and non-functional requirements and evolving + technical, political, economic, and legal parameters. Microservices cannot be defined once and for all, and + they should not have to be. As architects and developers we are agile and flexible. We embrace change in all + aspects of our IT organizations.

+ +

Here’s another organizational risk familiar from the SOA era: starting with an exclusive focus on the + technology for implementing microservices and on the microservices platform, the underlying platform for + eventually running the microservices (that do not even exist yet and for which no requirements are yet + known). It is all too easy to spend time on this seemingly useful exercise and, after months of + investigation and selection and architecting, to end up with an impractical, oversized and over-engineered + platform – and no running microservices. Such discussions slow down the process of microservice + adoption, obstruct the view of the essential challenges, and set up an organization for disappointing + results (if not outright frustration).

+ +

A third category of risk is to just start building microservices without a clear business need for or + objective with a microservice architecture or, even worse, without really understanding what a microservices + architecture entails from an organizational perspective. The operative keyword being overlooked: DevOps.

+ +

This article provides some insights and guidelines that can help propel teams of architects beyond + discussions and into action. Perhaps it can also help establish some architecture guidelines, such as the + importance of domain design.

+ +

What do we want to achieve with microservices?

+ +

When discussing microservices, we must remember what our objectives are. Microservices are not the objective; + they are merely the means. Microservices are meant to help us with those objectives and if they do not do + so, we neither need nor want them.

+ +

Both in Development and in Operations we want to have better control and more insight so we can change + functional and non-functional aspects as dictated by business needs. And we have the impression that + microservice concepts and considerations will help us get there.

+ +

What is a microservice, anyway?

+ +

It’s a reasonable enough question to ask, but apparently it’s a tough one to answer. It seems + easier to describe what the microservices architecture is intended for than to explicitly define a + microservice itself. Perhaps we tend to stay away from a concrete definition because we find we are not so + much in agreement with our peers as we thought we were. Let me try to describe the picture I see in my mind + when talking about microservices:

+ +

A microservice is an independent business component that we can change, release, scale, relocate, make + available, replace, charge for, report on and decide about, without dependencies on other microservices + (and their associated teams, owners, users, etc.). A microservice has clear business value and + implements a specific business capability. A microservice is owned through its entire DevOps/BusOps + lifecycle by a single team of no more than 6-8 people. One team could own a few microservices; each + service will fall under precisely one team.

+ +

The word “microservice” leads many people to assume microservices are much + smaller than “regular” services. But, to me, one microservice can easily comprise several of our + typical SOA (Web) Services. In my mind:

+ + + +

Note: I find no clear connection between microservices and containers. Of course, containers are a great + vehicle for achieving standardized and automated CI/CD, release, scaling and management, and allow us to + package application and platform components together into standalone units; as such, they can be a great + help for implementing microservices. At the same time, one microservice can easily be implemented using + several containers or totally without using containers.

+ +

Comparison with departments in a company?

+ +

Microservices are designed from business functionality and responsibility (maximize cohesion vs. maximize + decoupling), based on business domain decomposition. Microservices translate to technical stuff, such as + deployment artifacts and containers and perhaps database schemas or partitions, and even more to + organizational aspects, such as ownership, planning and budgeting, coordination and team management.

+ +

Microservices can be compared with departments in a company: relatively separate units with a clear + responsibility that can be largely "encapsulated" and are interacted with in largely predefined + ways that should not be bypassed by outsiders.

+ +

The size and number of departments and their scope is not easily defined in a generic manner. Some are small, + some large. Some may consist of just a part timer (e.g., our legal department consists of a specialist hired + for 6 hours per week) and others can be fairly large (the pool of truck drivers, for example). As the + company grows or diversifies, the departments can be split into more specialized units.

+ +

At the departmental level or per department, it is decided how activities are performed, who is hired or + fired, where activities are performed and when the department is available to perform work. Too many tiny + departments can introduce a lot of overhead and very complex process chains, even though they may allow a + high degree of specialization and flexibility. Departments that are too large are hard to manage and may + adopt different ways of working. Creating two departments out of a single one or combining several + departments into one unit is a fairly common practice, although it may not always be a very smooth + transition.

+ +

Exploration of the bounded contexts that are the breeding ground for microservices takes place in a similar + way as the above search for the optimal departmental structure: as part of domain-driven design. Citing Gérald + Croës: "Bounded contexts are the Single + Responsibility Principle applied to your domain model. Each part of your system has its + intelligence, data, and vocabulary. Each part is independent of one another."

+ +

General guidelines

+ +

When defining microservices, some general guidelines override all other, more subjective considerations:

+ + + +

As Sam Newman states: "When it comes to how small is small enough, I like to think in these terms: + the smaller the service, the more you maximize the benefits and downsides of microservice architecture. + As you get smaller, the benefits around interdependence increase. But so too does some of the complexity + that emerges from having more and more moving parts. As you get better at handling this complexity, you + can strive for smaller and smaller services." (Building Microservices, by Sam Newman, O'Reilly + Media, Inc., 2016, ISBN: 1491950358)

+ +

Starting a new microservice: why and when?

+ +

Two application sections that are currently part of the same microservice (or monolith) are perhaps better + off in distinct microservices if they have stark differences in one or more characteristics, to such an + extent that it hampers us or cramps our style. Some measures and mechanisms are expensive, so they should + not be wasted on application areas that do not require them. Making functionality Highly Available or Ultra + Secure or Supremely Well Tested comes at a price. Identifying areas that need special measures and isolating + them as separate components (i.e., microservices) to ensure the elevated levels are focused and applied only + where needed is one of the considerations.

+ +

The motivation for breaking off a chunk from an existing application component should be that the chunk not being (in) a separate microservice makes it harder or more expensive to do things we want to do to only one of them, such as make functional changes, improve availability, test for regression, set up monitoring, train developers, or achieve a high level of confidentiality. Therefore, we branch off one or more microservices to remove these limitations that we experience. The effort of branching off and the overhead introduced as a result of having two or more microservices instead of a single microservice are justified by the gains from the independence we realize between the microservices. Otherwise, we don’t do it.

+ +

Characteristics that drive the decision to extract one or more microservices from an existing component (when + various areas within a component hugely differ in them):

+ +

Business

+ + + +

Organization

+ + + +

Architecture, technology, non-functionality

+ + + +

Operational

+ + + +

How to deal with change

+ +

There is no perfect microservices design and certainly not one that will remain a perfect fit forever. + Organizations are in constant flux, as is the world around them. What was a great fit yesterday in terms of + microservices may no longer match the situation of tomorrow. We might as well face that we do not design for + posterity and instead embrace the changes and refactor our microservices accordingly.

+ +

Change is the only constant, so let's embrace change

+ +

The Agile way of working and thinking mandates us to embrace change—change with regard to functionality + and priorities and also with respect to technology, process and architecture. This mentality needs to be + spread throughout the complete organization from the C-Level down to the clerks in the business departments. + This critical aspect is not often seen in organizations.

+ +

We should be prepared to modify our microservices design when the situation changes, and be happy for the + chance to improve instead of chagrined that our design apparently was not good enough. This can happen, for + example, when the decision is made to implement a SaaS service, to outsource part of the business + activities, to implement specific security considerations regarding certain types of data, or to merge the + organization and its IT assets with another company—or simply when the organization is successful and + is growing rapidly.

+ +

Refactor microservices: split and (rarely) merge

+ +

We should be prepared almost at any time to extract newly identified microservices from an existing + application component. When the cracks revealing a candidate microservice begin to show and a certain subset + of an existing application component or microservice is found to meaningfully differ from the larger context + it lives in (e.g., on one or more of the characteristics listed above), we have to split off that subset as + a new microservice.

+ +

This will happen so regularly that organizations should have no difficulty going through this process. Given + that we will start out with one or very few microservices, it is likely that we will have to split off + microservices regularly. We can facilitate that process by preparing for it; just as in the figure above, we + should define and visualize the boundaries between subdomains so there is awareness in our team about the + areas that, though part of a single entity, should be treated as at least a little separate from each other. + We should assign names to these subdomains and use those names in user stories, documentation and in the + naming conventions for our software artefacts. Interactions within the microservices and across the border + between the subdomains should be created in a considered manner: whenever possible, through the public + channel (API or, better yet, events) or at least all through a single interface. There should be no direct + references or dependencies between data models in the subdomains.

+ +

The biggest challenge with splitting microservices is data. It’s easier to split off code than it is to + split data, both the data model and the actual data records. Once data is split, the way transactions are + supported must also be reconsidered, at least if transactions were supported across the boundary where we + now have two microservices. The transaction design perhaps should be changed, or patterns like SAGA and + Event Sourcing must be considered.

+ +

Note: The reverse operation of merging two microservices into a single component should be much easier. + Formerly heavily decoupled components with disjunct ownership, backlogs and budgets are combined into a + single unit with permission—but no mandate—to interact more intimately. In reality, merging + mature microservices does not happen very often; a merge will be considered only when there is no + justification for having two separate microservices and when the short-term effort to merge is less than the + long-term overhead of continuing with the existing situation. Unfortunately, the risk and effort are + typically not justified since there is no immediately apparent gain from merging. By having and practicing a + clear procedure for merging microservices, just as we have for splitting them, we get closer to embracing + change and can get a better fit for the microservices with our organization.

+ +

Example: the subdomains in a web shop's logistics microservice

+ +

Here’s a simple example: a small organization runs a web shop. Its application landscape has been + designed based on the functionality required, the SaaS services subscribed to, the structure of the + organization and the user groups. The size of the outfit is also a factor: the IT department is currently + very small and the scale of the operation is very limited.

+ +

However, it is clear from the beginning that the logistics microservice comprises areas of functionality that are related but distinct as well. Warehousing (keeping track of the stock and ensuring that replenishment is done from vendors) and Shipping (picking orders up from the warehouse and transferring them to shipping partners for delivery to customers) are both part of Logistics. They have touchpoints and definitely deal with similar aspects of the business, yet they are different.

+ +

It is too early for this organization to break up the Logistics domain into Warehousing and Shipping (and + more?), but it makes sense to recognize the separate areas that may well evolve into distinct microservices. + We do this by associating user stories, documentation, software artefacts, etc., explicitly with their + respective sub-domains when naming and organizing them. Additionally, we try to prevent any direct + dependencies between artefacts from these two subdomains. The data models should be kept separate. + Interactions between the two subdomains should be explicitly managed, documented and implemented, ideally + through external APIs and events.

+ +

In the same organization, a similar example is provided by the Orders and Shopping Cart. Right now, they not only share one bounded context but they are also implemented in a single service. However, chances are that Ordering and Shopping Cart will evolve separately and change for different reasons. They may eventually become separate services—and perhaps distinct bounded contexts. We will then have to consider the split in data and define events so the two pieces of functionality can still interoperate, even though they are by then fully decoupled.

+ +

Fewer microservices is better

+ +

The fewer microservices we need to achieve our true objective, the better it is. More microservices mean more overhead and costs: in organizational responsibilities and work coordination; in complexity and administration effort (cross-domain transactions, more moving parts to oversee and monitor); in resource usage (each microservice requires additional infrastructure resources); and in performance (highly decoupled interactions across microservices take considerably longer, in terms of network latency, than internal calls within a microservice).

+ +

If one microservice could bring us all we need, that would be great! One component that does it all: is that + therefore a monolith? Well, literally it is. But it should not have the challenges commonly associated with + monoliths (such as hard to scale, hard to change), or it would be broken up into multiple microservices.

+ +

In reality, in most situations our application landscape will be heterogeneous along various axes and too big + to be taken care of by a single team. Because of relevant differences between various areas in our + application landscape, we might entertain the thought of splitting off some areas as microservices. + Ultimately, it is the functional design—translating to bounded contexts—that probably is the + biggest deciding factor in designing the microservices and therefore the eventual granularity.

+ +

Note: an interesting discussion can be held around the distinction between bounded context (an area of + shared, consistent business terminology and data model language along with all persistent data stores, + aligned with a business sub domain) and microservice. We could argue that several microservices (with, e.g., + different scalability requirements) can be created within one bounded context. In terms of business + responsibility and DevOps ownership, ideally the bounded context is not split up and if there are in fact + multiple microservices for pragmatic reasons, they should all be owned by the same team.

+ +

Acknowledgments

+ +

Suggestions, contributions and general feedback from the following people have made this a better article, + and I thank them for it: Jeroen Kooij, Luis Weir, Sven Bernhardt, Michiel van der Sluis, and Sander + Rosenhart.

+ +

Resources

+ +

[1] Building Microservices, by Sam Newman, O'Reilly Media, Inc., 2016, ISBN: 1491950358
[2] “Sub-domains and Bounded Contexts in Domain-Driven Design (DDD)” by Lev Gorodinski, April 2013 - http://gorodinski.com/blog/2013/04/29/sub-domains-and-bounded-contexts-in-domain-driven-design-ddd/
+ [3] “How big is a microservice?” by Ben Morris, March 2015 - http://www.ben-morris.com/how-big-is-a-microservice/
+ [4] “Thoughts on the macro architecture and the infrastructure for microservices” by Michael + Douglas, April 2018 - https://medium.freecodecamp.org/microservices-from-idea-to-starting-line-ae5317a6ff02 +

+ + +

About the Author

+ +

Lucas Jellema is a solution architect and CTO at AMIS, based in the Netherlands. He works as a consultant, architect, and instructor in such diverse areas as Database and SQL, Java/Java EE, SOA/BPM and PaaS and SaaS Cloud Solutions. The running theme through most of his activities is the transfer of knowledge and enthusiasm (and live demos). An Oracle ACE Director and Developer Champion, Lucas is a well-known speaker at Oracle Code, Oracle OpenWorld, JavaOne, Devoxx and various User Group conferences around the world. His articles have appeared on Oracle Developer Community, the AMIS Technology weblog and in magazines and websites around the world. Lucas is the author of Oracle SOA Suite 11g Handbook (2010, McGraw Hill) and Oracle SOA Suite 12c Handbook (2015, McGraw Hill).

+
+
+ + \ No newline at end of file diff --git a/Articles/containers/102-nodevops.html b/Articles/containers/102-nodevops.html index e69de29..40c1b70 100644 --- a/Articles/containers/102-nodevops.html +++ b/Articles/containers/102-nodevops.html @@ -0,0 +1,327 @@ + +
+
+

DevOps is not just the latest trend to hit the software community, nor is it just the latest buzzword. DevOps + is changing the way companies and teams communicate, how they collaborate, and how they bring products and + services to the market. That would be a lot of credit to give to a simple buzzword.

+

Yet, there are companies that consider DevOps the latest trend, and we all know how tempting it can be to secure quick benefits by adopting the latest and greatest trend. The problem begins when companies start looking to hire for DevOps roles in order to adopt this seemingly easy principle. They start looking for "experienced DevOps" or "senior DevOps" partners to join their teams in the hope that this will bring them closer to the ultimate goal, DevOps adoption.

+

If only it was that simple. You see, DevOps is not as easy as just hiring one person or a team of experts. + Having these people does not mean you've adopted DevOps. In fact, you'll be even further from DevOps + adoption if you start with disjointed, siloed teams of DevOps professionals. DevOps is inherently a + cross-team approach to managing the software development processes that relies heavily on collaboration, + automation, and sharing—something smaller startups are uniquely positioned to get right from the beginning. + It's a culture that takes dedication and patience to get right.

+

Because it's a culture, it needs to be embraced by the entire organization. Once implemented, the culture + removes silos and focuses on delivering real value. And this implementation takes time and, perhaps more + importantly, it takes making mistakes to figure out how to get it right for your organization.

+

So, how can you help your team implement DevOps and stop looking for experienced DevOps roles? I've outlined + some highlights below.

+ Don't Start with DevOps Roles or Technologies +
+

For those of us that have built a DevOps culture through countless hours of analyzing mistakes, automating + systems, building communication foundations across teams, and changing the way we approach product + development, deployment, and management, it seems obvious that you can't just hire a DevOps expert to + implement DevOps. You have to understand how DevOps functions in the context of the organization in order to + make the shift to a DevOps production culture, rather than trying to fill a role.

+

Expecting an outside "DevOps expert" to implement your DevOps environment won't work. You have to commit to + making the change from within through the alignment of DevOps practices and philosophies with your + organization's values and goals. In their book Fearless + Change, authors Linda Rising PhD and Mary Lynn Manns PhD highlight the importance of + choosing a change champion at your organization. For them, successfully implementing change was directly + tied to their chosen champion. "I believe the most important element was a successful champion who + engendered interest in process change. Our champion is a respected team member who is well known for getting + work done and for his sincere desire to help lead the organization toward practical improvements."

+

So, don't underestimate the value of your internal DevOps champion, and because that person will be working to integrate long-separated systems and teams within your organization, make sure your champion has support from all levels within your company. These necessary goals are not easy to accomplish without trust and understanding from management.

+

Just as hiring an expert won't get you DevOps, adopting DevOps technologies won't do the trick either. Continuous integration (CI) and continuous delivery (CD) are an integral part of a fully functioning DevOps system; however, you can't put the cart before the horse. Your teams have to be disciplined in the practices of DevOps before adopting the technologies. Otherwise, you'll be forcing developers and operations to work together before they are ready to share and collaborate on the whole software development pipeline.

+ Train DevOps Experts from Your Team to Oversee Adoption +
+

As I stated above, hiring DevOps experts won't help you adopt DevOps; however, you can train individuals at + your organization to become DevOps champions or experts. A DevOps expert needs to be motivated by customer + and business objectives and advocate for software—from conception to production and beyond.

+

Choose champions who display natural and strong leadership skills. They should already be interested in or + have expressed an interest in DevOps adoption at your organization. Most importantly, ensure that your + DevOps experts have excellent communication skills that they use frequently and across multiple teams.

+

Do you know the biggest reason behind change management failure? It's a lack of frequent or clear communication. In + order for a team to collaborate and automate systems, team members need to be able to communicate in a + manner that fosters deliberation, honesty, and clarity.

+

So, what do you need your DevOps experts to do? First, they should help teams understand and implement the + core DevOps values: culture, automation, measurement, + and sharing (CAMS). Second, they must work to foster an environment that encourages people to ask + questions and to cooperate with and learn from each other. Be wary of teams that don't ask questions or that + don't appear vulnerable, because they may be suffering from groupthink. Third, + DevOps experts, in order to save time and build consistency, need to champion investing in automation and + provide a clear picture of why DevOps practices are relevant at each stage. And, finally, they need to + encourage sharing tools to increase efficiency and create a higher level of engagement among employees.

+

+ Consider How the Size and Structure of Your Organization Will Affect the Transition +
+

DevOps adoption at startups usually looks different from the way a large organization with a variety of + specialized teams oversees DevOps implementation. For the former, it tends to be easier to start with DevOps + or to make the transition to it, because there are relatively few specialized teams that need to learn to + collaborate. Additionally, individuals are more often naturally part of the whole software development + process and are used to working interconnectedly throughout the process.

+

As an organization gets larger and more specialized, the need for more DevOps champions becomes greater. More-complex projects might require more experts to lead the way to a DevOps environment, especially when your team is just beginning. As more team members become used to producing in a DevOps environment, DevOps experts can transition to focusing on sophisticated techniques—for example, autoscaling, complex monitoring, and high availability—while a newly trained DevOps expert steps in to fill the previous expert's role.

+

At the organization I work with, we've spent a lot of time trying to adopt DevOps wisely and consistently + across teams while being mindful of practices or technologies we don't need. However, we've still made a lot + of mistakes, and we now feel much more comfortable speaking up about what isn't working, collaborating + across teams, and learning from others at our company that have the expertise. In the end, it's taken a + while; however, today when I look at the various teams at our office, I see many DevOps experts working in a + variety of roles.

+ Conclusion +
+

It's never worth reinventing the wheel; that's why I'm sharing some of the challenges my company has encountered and how to fix them:

+ +
+

+ About the Author +

+
+
+ Sebastian Velez is director of engineering at PSL, a Latin American agile software + development company based in Mexico and Colombia. He has been in the industry for more than a + decade. In addition to founding different IT startups, Velez has led many large-scale enterprise + projects for notable Fortune 500 companies. He has experience as a developer, software + architect, scrum master, and college professor. +
+
+
+
+
+ + + + + + +
+
+
+
+
+ +
+
+ + + + + + + + +
+
+ + + + + +
+
+
+
+ + +


+ +
+
+
+ + \ No newline at end of file diff --git a/Articles/databases/101-nashorn-javascript-part3.html b/Articles/databases/101-nashorn-javascript-part3.html index e69de29..3a5a5ef 100644 --- a/Articles/databases/101-nashorn-javascript-part3.html +++ b/Articles/databases/101-nashorn-javascript-part3.html @@ -0,0 +1,564 @@ + +
+
+

The Nashorn engine has been deprecated in JDK 11 as part of JEP 335 and has been removed from JDK 15 as part of JEP 372. To learn more, please read the Migration Guide from Nashorn to GraalVM JavaScript.

+ + +

This series of articles introduces Nashorn, a blazing fast JavaScript runtime engine that shipped with Java SE 8 to provide a lightweight environment for extending, supplementing, and often even replacing Java code. Full Java Virtual Machine (JVM) capabilities and dynamic typing make for effective tooling that will appeal to developers and admins alike.

+ + + +

The year 2017 marks the 20th anniversary of JDBC, which was introduced in JDK 1.1. The JDBC API has proven exceptionally successful and has revolutionized the way applications connect to databases—and it continues to thrive in the upcoming JDBC 4.3, which is a part of Java 9. Nashorn proves to be an excellent language for removing much of the verbosity from the JDBC API, especially when combined with the new features of Java 8.

+ +

Building on the regular driver manager in java.sql and the datasource in javax.sql, Nashorn can often position itself as a glue language between different database engines, operating systems, and applications. Database context has been made much more relevant with the announcement of Oracle Database 12c Release 2, where Nashorn plays an integral part in the dbms_java and dbms_javascript subsystems.

+ +

Oracle Universal Connection Pool is a mission-critical pool implementation that allows massive optimization of + connection traffic, and it is focused on reducing the database footprint while providing a sufficient number of + application endpoints. Nashorn can be a perfect tool for experimenting with and fine-tuning the pools, validating + their state, and imitating real application traffic for troubleshooting purposes.

+ +

Oracle SQL Developer Command Line (Oracle SQLcl) is a free command-line interface for Oracle Database and a next-generation SQL*Plus replacement implemented in Java. Besides thorough compatibility with its predecessor, the tool embeds ScriptEngine directly so that new extensions can be implemented in any Java Virtual Machine (JVM) language. Nashorn, being part of the JDK, makes an excellent choice for extending Oracle SQLcl by allowing out-of-the-box use of ScriptEngine.

+ Nashorn Database Connectivity + +

Oracle Database 12c Release 2 was the first release to ship with a JDBC 4.2–compatible driver + (ojdbc8.jar) and it included enhancements to type support, ref cursors, and large + datasets. Using Nashorn to interact with a database guarantees the same level of compatibility and functionality + as if full-fledged Java code was applied.

+ +

Oracle Database drivers can be downloaded from the Oracle Technology Network JDBC/UCP Download Page + (see Figure 1) and the MySQL Connector/J driver can be obtained from the MySQL + Developer Zone.

+ +

Figure 1. Oracle Technology Network JDBC/UCP download page.

+ +

Because JDBC is designed for compatibility, Oracle Database and its JDBC drivers are mutually compatible across Oracle Database releases 11.2, 12.1, and 12.2. It is recommended, though, to use a JDBC driver of at least the version of the database to have access to new features. Listing 1 illustrates an easy way to check the exact version of the JDBC driver.

+ +
+
+
Copy
+
+ +
+  $ java -jar ojdbc8.jar
+  Oracle 12.2.0.1.0 JDBC 4.2 compiled with javac 1.8.0_91 on Tue_Dec_13_06:08:31_PST_2016
+  #Default Connection Properties Resource
+  #Tue Sep 19 16:58:25 CEST 2017
+  
+
+ +

Listing 1. Checking the JDBC driver version.

+ +

JDBC covers all aspects of accessing a database in a generic and unified way, enduring as the primary database + API of all major database engines. The consistency and portability of this API can be often used for integrating + systems, and Nashorn proves to be a very capable player on this front. Listing 2 demonstrates a sample + implementation of a helper JDBC module that allows easy database access, support for named bind parameters, IN/OUT + parameters, and JDBC 4.2 large updates. Named binds are tokenized with a private parse + function, whereas the ndbc.run function handles routing of different types of SQL + statements. The implementation responsible for handling queries illustrates a straightforward way of implementing + streams on top of existing APIs by converting the result set iterator to a stream, using spliterator.

+ +
+
+
Copy
+
+ +
+  /* ndbc.js */
+  (function() {
+    this.config = new java.util.HashMap() {
+      setAutoCommit: false,
+      setLoginTimeout: java.time.Duration.ofSeconds(5),
+      setQueryTimeout: java.time.Duration.ofSeconds(30),
+      setFetchSize: 5000
+    };
+  
+    this.connect = function(url, user, password, driver) {
+      driver != null && java.lang.Class.forName(driver);
+      java.sql.DriverManager.setLoginTimeout(config.setLoginTimeout.getSeconds());
+      var conn = java.sql.DriverManager.getConnection(url, user, password);
+      conn.setAutoCommit(config.setAutoCommit);
+      return conn;
+    };
+  
+    var parse = function(sql, params) {
+      var token = /[:]([A-Za-z_]+)/g;
+      var matches = sql.match(token);
+      var sqla = (matches == null) ? sql : sql.replace(token, "?");
+      var binds = { in : {}, out: {} };
+      for (var i = 0; matches != null && i < matches.length; i++)
+        binds[matches[i][1] != "_" ? "in" : "out"][i + 1] = matches[i].substring(1);
+      return {sql: sqla, binds: binds };
+    };
+  
+    var prepare = function(conn, sql, params, isproc) {
+      var parsed = parse(sql, params);
+      var stmt = conn[isproc ? "prepareCall" : "prepareStatement"](parsed.sql);
+      for (var k in parsed.binds.in) {
+        stmt.setObject(parseInt(k), params[parsed.binds.in[k]]);
+      }
+      for (var k in parsed.binds.out) {
+        stmt.registerOutParameter(parseInt(k), params[parsed.binds.out[k]]);
+      }
+      return stmt;
+    };
+  
+    this.run = function(conn, sql, params) {
+      var stmt, keyword = sql.toLowerCase().split(" ").shift();
+      switch (keyword) {
+        case "select":
+          var stmt = prepare(conn, sql, params);
+          stmt.setFetchSize(config.setFetchSize);
+          stmt.setQueryTimeout(config.setQueryTimeout.getSeconds());
+          return stream(stmt).collect(java.util.stream.Collectors.toList());
+        case "insert":
+        case "update":
+        case "delete":
+          var stmt = prepare(conn, sql, params);
+          var exec = ("executeLargeUpdate" in stmt) ? "executeLargeUpdate" : "executeUpdate";
+          return stmt[exec]();
+        default:
+          var stmt = prepare(conn, sql, params, true);
+          var parsed = parse(sql, params);
+          stmt.execute();
+          var ret = new java.util.HashMap();
+          for (var k in parsed.binds.out) {
+            var result = stmt.getObject(parseInt(k));
+            if (result instanceof java.sql.ResultSet) {
+              var rows = stream(null, result).collect(java.util.stream.Collectors.toList());
+              ret[parsed.binds.out[k]] = rows;
+            } else {
+              ret[parsed.binds.out[k]] = result;
+            }
+          }
+          return ret;
+      }
+    };
+  
+    this.stream = function(stmt, rs) {
+      var rs = (rs == null) ? stmt.executeQuery() : rs;
+      var iter = new java.util.Iterator() {
+        hasNext: function() {
+          var next = rs.next();
+          if (!next) {
+            stmt != null && stmt.close();
+            rs.close();
+          }
+          return next;
+        },
+        next: function() {
+          var cols = new java.util.LinkedHashMap();
+          var meta = rs.getMetaData();
+          for (var i = 1; i <= meta.getColumnCount(); i++) {
+            // look up the type of the current column (not just column 1)
+            var type = meta.getColumnTypeName(i);
+            if (type.equals("CLOB")) {
+              var clob = rs.getClob(i);
+              var value = clob.getSubString(1, parseInt(clob.length()));
+            } else {
+              var value = rs.getObject(i);
+            }
+            cols[meta.getColumnLabel(i)] = value;
+          }
+          return cols;
+        }
+      };
+  
+      var spl = java.util.Spliterators.spliteratorUnknownSize(iter, java.util.Spliterator.ORDERED);
+      return java.util.stream.StreamSupport.stream(spl, false);
+    };
+  
+    return this;
+  })();
+  
+
+ +

Listing 2. Generic Nashorn JDBC helper module with JDBC 4.2 extensions and Java 8 streams + support.

+ +

Output PL/SQL parameters for the NDBC module must start with an underscore and also define the return JDBC type. In Listing 3, output parameter _list is mapped to CURSOR so that multiple rows can be returned from the procedure. All queries are fetched as lists of maps so that Nashorn's attribute access can be used in addition to the standard [] operator. The example in Listing 3 runs on Docker image store/oracle/enterprise:12.2.0.1 and, therefore, Docker machine IP 192.168.99.100 is provided in the Transparent Network Substrate (TNS) address. The last variable displays the newest user in a pluggable database with CON_ID=3.

+ +
+
+
Copy
+
+ +
+  $ jjs -J-Djava.class.path=ojdbc8.jar
+  
+  jjs> var ndbc = load('ndbc.js');
+  
+  jjs> var connString = 'jdbc:oracle:thin:@192.168.99.100:1521/ORCLCDB.localdomain';
+  
+  jjs> var c1 = ndbc.connect(connString, 'system', '****', 'oracle.jdbc.pool.OracleDataSource');
+  
+  jjs> var last = ndbc.run(c1, 'select max(created) created from cdb_users where con_id=:pdb', {pdb: 3});
+  
+  jjs> last
+  [{CREATED=2017-09-16 22:36:39.0}]
+  
+  jjs> var cursor = Packages.oracle.jdbc.OracleTypes.CURSOR;
+  
+  jjs> var pdbs = ndbc.run(c1, 'begin open :_list for select pdb_name from cdb_pdbs; end;', {_list: cursor});
+  
+  jjs> pdbs
+  {_list=[{PDB_NAME=PDB$SEED}, {PDB_NAME=ORCLPDB1}]}
+  
+
+ +

Listing 3. Using NDBC with named binds and OUT parameters on Oracle Database 12c + Release 2.

+ Scripting Oracle Universal Connection Pool + +

A connection pool is vital for healthy database connection management, with failing or excessive + connections often ranked as the number-one issue on the troubleshooting list. Oracle Universal Connection Pool + (Oracle UCP) provides a robust pool implementation that supports all kinds of connections, whether they are + JDBC, LDAP, or others that can be load balanced, recycled, and efficiently maintained during their + lifetime.

+ +

For diagnosing connection pools, continuously rebuilding Java programs might simply become insufficient when many features and capabilities are put into use. This in turn offers a perfect opportunity for experimenting with Nashorn to achieve the desired pool behavior and configuration. Listing 4 provides the definition of an Oracle UCP ping program called ucp.js, along with a sample configuration file called pool.properties containing a definition of the pool parameters. All requests made to the pool are measured implicitly by the Oracle UCP driver and can be queried during or after a program run. This client uses the ExecutorService framework for asynchronous calls on the pool.

+ +
+
+
Copy
+
+ +
+  
+  /* pool.properties */
+  # connection
+  URL = jdbc:oracle:thin:@192.168.99.100:1521/ORCLCDB.localdomain
+  user = system
+  password = Oradoc_db1
+  threads = 15
+  connections = 100
+  sleep = 5
+  query = select sys_context('userenv', 'instance_name') from dual
+  # ucp
+  connectionFactoryClassName = oracle.jdbc.pool.OracleDataSource
+  connectionWaitTimeout = 10
+  inactiveConnectionTimeout = 10
+  initialPoolSize = 10
+  maxConnectionReuseCount = 1000
+  maxConnectionReuseTime = 0
+  maxPoolSize = 20
+  maxStatements = 5000
+  minPoolSize = 5
+  timeoutCheckInterval = 5
+  timeToLiveConnectionTimeout = 0
+  validateConnectionOnBorrow = false
+  
+  /* ucp.js */
+  var pds = Packages.oracle.ucp.jdbc.PoolDataSourceFactory.getPoolDataSource();
+  var props = new java.util.Properties();
+  props.load(new java.io.FileInputStream(arguments[0]));
+  pds.setConnectionFactoryClassName(props.connectionFactoryClassName);
+  pds.setURL(props.URL);
+  pds.setConnectionProperties(props);
+  var format = java.lang.String.format;
+  print(format("\nPinging %s using %s:", props.URL, arguments[0]));
+  
+  var worker = function() {
+    try {
+      var conn = pds.getConnection();
+      var stmt = conn.createStatement();
+      stmt.setQueryTimeout(5);
+      stmt.setFetchSize(500);
+      var before = java.time.Instant.now();
+      var rs = stmt.executeQuery(props.query);
+      while (rs.next()) var inst = rs.getObject(1);
+      var duration = java.time.Duration.between(before, java.time.Instant.now()).toMillis();
+      java.lang.Thread.sleep(parseInt(props.sleep) * Math.random() * 1000);
+      print(format("Reply from %s time=%sms result=%s", java.lang.Thread.currentThread(), duration, inst));
+      rs.close();
+      stmt != null && stmt.close();
+      conn.close();
+    } catch (e) {
+      throw new java.lang.RuntimeException(e);
+    }
+  };
+  
+  var exec = java.util.concurrent.Executors.newFixedThreadPool(parseInt(props.threads));
+  for (var t = 0; t < parseInt(props.connections); t++) {
+    exec.submit(new java.lang.Runnable() {
+      run: worker
+    });
+  }
+  exec.shutdown();
+  exec.awaitTermination(60, java.util.concurrent.TimeUnit.SECONDS);
+  
+  var stats = pds.getStatistics();
+  print(format("\nPing statistics for %s:\n  Connections = %s, Borrowed = %s, Closed = %s", props.URL, stats.getTotalConnectionsCount(), stats.getBorrowedConnectionsCount(), stats.getConnectionsClosedCount()));
+  print(format("Connection creation wait times in milli-seconds:\n  Average = %sms, Maximum = %sms", stats.getAverageConnectionWaitTime(), stats.getPeakConnectionWaitTime()));
+  
+
+ +

Listing 4. Oracle UCP "ping" utility for diagnosing correct connection pool + behavior.

+ +

Aside from PoolDataSource and the ExecutorService framework, the code in Listing 4 also references the new Java 8 date and time framework introduced under JEP 150: Date & Time API. This new API has been redesigned from the ground up and brings seamless date/time handling to all JVM languages, Nashorn included.

+ +

The invocation and result of the sample test of the Oracle UCP pool is shown in Listing 5.

+ +
+
+
Copy
+
+ +
+  
+  $ jjs "-J-Djava.class.path=lib/ojdbc8.jar:lib/ucp.jar" ucp.js -- pool.properties
+  
+  Pinging jdbc:oracle:thin:@192.168.99.100:1521/ORCLCDB.localdomain using 
+  pool.properties:
+  Reply from Thread[pool-1-thread-3,5,main] time=85ms result=ORCLCDB
+  Reply from Thread[pool-1-thread-5,5,main] time=2ms result=ORCLCDB
+  Reply from Thread[pool-1-thread-6,5,main] time=91ms result=ORCLCDB
+  Reply from Thread[pool-1-thread-7,5,main] time=86ms result=ORCLCDB
+  Reply from Thread[pool-1-thread-8,5,main] time=22ms result=ORCLCDB
+  Reply from Thread[pool-1-thread-1,5,main] time=3ms result=ORCLCDB
+  Reply from Thread[pool-1-thread-10,5,main] time=7ms result=ORCLCDB
+  Reply from Thread[pool-1-thread-2,5,main] time=8ms result=ORCLCDB
+  Reply from Thread[pool-1-thread-9,5,main] time=90ms result=ORCLCDB
+  Reply from Thread[pool-1-thread-4,5,main] time=2ms result=ORCLCDB
+  
+  Ping statistics for jdbc:oracle:thin:@192.168.99.100:1521/ORCLCDB.localdomain:
+      Connections = 10, Borrowed = 0, Closed = 0
+  Connection creation wait times in milli-seconds:
+      Average = 606ms, Maximum = 856ms
+  
+
+ +

Listing 5. Testing Oracle UCP with Nashorn.

+ Scripting Oracle Database + +

When Oracle Database 12c Release 2 came along, Nashorn earned yet another credit of trust by becoming + the language of choice for scripting the JVM that runs inside Oracle Database. This functionality was enabled by + the introduction of Oracle's Java 8–based JVM.

+ +

Before running the code, the JVM release that's running inside Oracle Database can be confirmed with select dbms_java.get_ojvm_property(propstring=>'java.version') from dual. With + the database version 12.2.0.1 used in this article, version 1.8.0_121 is returned.

+ +

Listing 6 shows the deployment of a Java procedure responsible for invoking Nashorn inside Oracle Database with + the required privileges on the DEMO user. Aside from the comprehensive DBJAVASCRIPT role that allows execution of Nashorn code, a getClassLoader permission makes the code generic in a way that the script name can be + provided at runtime.

+ +
+
+
Copy
+
+ +
+  
+  SQL> connect sys/****@orclpdb1 as sysdba
+  
+  SQL> create user demo identified by demo;
+  
+  SQL> grant create session, resource, dbjavascript to demo;
+  
+  SQL> begin
+    2  dbms_java.grant_permission('DEMO', 'SYS:java.lang.RuntimePermission',
+    3  'getClassLoader', '');
+    4  end;
+    5  /
+  
+  -- runscript.sql
+  create or replace and compile java source named "RunScript" as
+  
+  import javax.script.*;
+  import java.net.*;
+  import java.io.*;
+  
+  public class RunScript {    
+      public static String eval(String script, String code) throws Exception {
+          String output = new String();
+          ClassLoader loader = Thread.currentThread().getContextClassLoader();	
+          try (
+              InputStream is = loader.getResourceAsStream(script);
+              Reader reader = new InputStreamReader(is)
+          ) {
+              ScriptEngineManager factory = new ScriptEngineManager();
+              ScriptEngine engine = factory.getEngineByName("javascript");
+              engine.eval(reader);
+              output = engine.eval(code).toString();
+          } catch(Exception e) {
+              output = e.getMessage();
+          }
+          return output;
+      }
+  }
+  /
+  
+  create or replace function runscript(script in varchar2, code in varchar2) 
+  return varchar2 as language java 
+  name 'RunScript.eval(java.lang.String, java.lang.String) return java.lang.String'
+  /
+  
+
+ +

Listing 6. Helper ScriptEngine objects in Java and + PL/SQL.

+ +

At this point, Nashorn scripts can be loaded into the database by using the loadjava facility; for example, loadjava -v -u demo/demo@orclpdb1 runscript.js uploads runscript.js into the DEMO schema on pluggable database ORCLPDB1. The newly created runscript SQL function takes the name of a loaded script and the code to execute as two VARCHAR2 parameters.
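
The new function can then be called from SQL. Here is a minimal sketch, assuming a script hello.js that defines a greet() function has been loaded with loadjava (both names are hypothetical placeholders):

  -- hello.js and greet() are hypothetical; substitute your own loaded script and code
  SQL> select runscript('hello.js', 'greet()') as result from dual;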

+ +

For procedures without a return value, the even simpler API of dbms_javascript.run() can be used to quickly invoke Nashorn scripts without the need for intermediary functions. This method takes the name of a loaded script as the first and only argument. To enable the output of such scripts, both server output and Java output must be enabled (that is, set serveroutput on and dbms_java.set_output() must be invoked prior to executing dbms_javascript.run scripts).
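
A minimal invocation sketch, assuming a script named myscript.js has already been loaded into the schema (the script name is a hypothetical placeholder), might look like this:

  SQL> set serveroutput on
  SQL> exec dbms_java.set_output(10000)
  SQL> exec dbms_javascript.run('myscript.js')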

+ Extending Oracle SQL Developer Command Line + +

The task of porting SQL*Plus to Java to modernize the tool and provide new capabilities brought one feature bigger than all the others combined: extensibility via ScriptEngine. Nashorn, being a part of Java SE, became a natural fit for implementing these extensions and has already received a lot of traction from both vendors and the community.

+ +

Seamless integration of Nashorn into Oracle SQLcl is based on the top-level SCRIPT + command, which loads the script under the first argument, assuming .js if only the base + file name is given. Execution of external scripts would make little sense if the current connection context was + not accessible, so Oracle SQLcl exposes the command, connection, and terminal to the script. Oracle SQLcl scripts + can be executed directly as external processors or they can implement new functionality through one of the oracle.dbtools.raptor subtypes.

+ +

Listing 7 exhibits the implementation of a new top-level Oracle SQLcl function called JSON that transforms a SQL result set into valid JSON representation using JavaScript's + built-in JSON.stringify function. Oracle SQLcl's CommandRegistry and CommandListener are used to bind the jsonCommand handler to the JSON command. To load this script in Oracle SQLcl, a simple + script json.js command will enable the extended functionality. +

+ +
+
+
Copy
+
+ +
+  
+  /* json.js */
+  var jsonCommand = function(ctx, conn, sql) {
+      print(sql);
+      var ret = util.executeReturnList(sql, null);
+      var rows = [];
+      for (var r in ret) {
+          var cols = {};
+          for (var k in ret[r]) cols[k] = ret[r][k];
+          rows.push(cols);
+      } 
+      print(JSON.stringify(rows, null, 4));
+  }
+  
+  var handle = function(conn, ctx, cmd) {
+      if (cmd.getSql().toLowerCase().startsWith('json')) {
+          jsonCommand(ctx, conn, cmd.getSql().replace(/^json\s/i, ''));
+          return true;
+      }
+      return false;
+  }
+  
+  var CommandRegistry = Java.type("oracle.dbtools.raptor.newscriptrunner.CommandRegistry");
+  var CommandListener = Java.type("oracle.dbtools.raptor.newscriptrunner.CommandListener");
+  
+  var JsonCommand = Java.extend(CommandListener, {
+      handleEvent: handle,
+      beginEvent: function() {},
+      endEvent: function() {}
+  });
+  
+  CommandRegistry.addForAllStmtsListener(JsonCommand.class);
+  
+
+ +

Listing 7. Implementing the new Oracle SQLcl command to output results as valid + JSON.
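
A brief usage sketch, assuming json.js sits in the directory from which Oracle SQLcl was started (the query is only an illustration and output is omitted):

  SQL> script json.js
  SQL> json select table_name from user_tables;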

+ +

Many more ways of using the SCRIPT command exist. Because Oracle SQLcl exposes the + Connection object to the script, it is possible to use it for comparing, migrating, + unloading, exporting, or crunching data in various ways. Because all Oracle SQLcl scripts have access to its + classpath, built-in libraries that ship with Oracle SQLcl can be used, for example: +

+ + + Summary + +

Thanks to the ubiquity and popularity of JDBC, Nashorn turns out to be an excellent language for database work. + Without the need for compile-time dependencies that require the setup of Maven or Gradle, all that is needed for + running high-performing JVM bytecode is a JAR driver and a script. Combined with the portability of code, Nashorn + scripts will work on every operating system and every JDBC-compatible database, without the need for porting or + recompiling. That agility is often sought after in today's dynamic software projects, especially when it + doesn't sacrifice the performance and reliability of the JVM.

+ +
+

About the Author

+ +
+
Przemyslaw Piotrowski is a principal software engineer with 10+ years of experience in design, development and maintenance of database systems. He is an Oracle Database 11g Certified Master, an Oracle Database 12c Certified Master, and an Oracle Database Cloud Certified Master, focusing on database engineering and infrastructure automation.
+
+
+
+
+ + \ No newline at end of file diff --git a/Articles/databases/109-learning-r-for-pl-sql-developers-part-3.html b/Articles/databases/109-learning-r-for-pl-sql-developers-part-3.html index e69de29..161fb4a 100644 --- a/Articles/databases/109-learning-r-for-pl-sql-developers-part-3.html +++ b/Articles/databases/109-learning-r-for-pl-sql-developers-part-3.html @@ -0,0 +1,3632 @@ + + +
+
+

Part 1 | Part 2 | Part 3 of a series + that presents an easier way to learn R by comparing and contrasting it to PL/SQL.

+ +

Welcome to the third installment of this series. In the previous two installments, you learned how to + use variables of different types and various kinds of loops and decision logic.

+ +

In this installment, you will learn how to define various types of collections of data, not just atomic variables. Because R is a data-oriented language, collections are vital: most operations are done on collections, not on individual pieces of data.

+ +

At the end of this article, there is a summary of what you learned followed by a + quiz you can use to test what you've learned. +

+ +

Introduction

+ +

There are five types of collections in R. In a nutshell, they are different representations of + collections of data in R.

+ + + +

We will examine each of them and how they are used in different situations. As with the previous + installments in this series, you will learn these compared to the equivalent in PL/SQL.

+ +

But first, let's explore the options available for collections in PL/SQL before exploring the + same in R.

+ +

There are three basic kinds of collections in PL/SQL:

+ + + +

Nested Tables in PL/SQL

+ + +

These are just lists of items of the same data type. The list is ordered and the elements can be addressed by their position in the array. There is no "index" to reference them by; the only index is their position. Therefore, this type of data can't be used for key-value pairs.

+ +

Let's see an example where we will store the days of the week to a variable.

+ +
+
+
Copy
+
+ +
+
+-- pl1.sql
+declare
+   type ty_tabtype is table of varchar2(30);
+   l_days_of_the_week ty_tabtype;
+begin
+   l_days_of_the_week := ty_tabtype (
+          'Sun',
+          'Mon',
+          'Tue',
+          'Wed',
+          'Thu',
+          'Fri',
+          'Sat'
+   );
+   -- let's print the values
+   for d in l_days_of_the_week.FIRST .. l_days_of_the_week.LAST loop
+          dbms_output.put_line ('Day '||d||' = '||
+                  l_days_of_the_week (d)
+           );
+   end loop;
+end;
+
+
+ +

Here's the output:

+ +
+
+
Copy
+
+ +
+
+Day 1 = Sun
+Day 2 = Mon
+Day 3 = Tue
+Day 4 = Wed
+Day 5 = Thu
+Day 6 = Fri
+Day 7 = Sat
+
+
+ +

Note that the elements are all of the same data type, which, in this case, is varchar2(30). You can't mix any other data type in. But you can keep adding values, as you need, to this list.
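
For instance, a short sketch of growing the nested table by one element inside the same block could look like this (the value 'Extra' is just an illustration):

  -- allocate one more slot at the end of the nested table and assign it a value
  l_days_of_the_week.extend;
  l_days_of_the_week(l_days_of_the_week.last) := 'Extra';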

+ +

Varrays in PL/SQL

+ + +

The second type of collection in PL/SQL is called a varray. It's similar to a nested table, except that the maximum number of elements is fixed. The actual number of elements varies, depending on the elements added or subtracted at runtime. Here is an example:

+ +
+
+
Copy
+
+ +
+
+-- p2.sql
+declare
+   type ty_weekdays_list is varray(7) of varchar2(3);
+   v_week_days ty_weekdays_list;
+   v_count number;
+begin
+   v_week_days := ty_weekdays_list('Sun','Mon','Tue','Wed','Thu','Fri','Sat');
+   v_count := v_week_days.count;
+   for i in 1..v_count loop
+          dbms_output.put_line(i||'='||v_week_days(i));
+   end loop;
+end;
+
+
+ +

Note: The values are also of the same data type.

+ +

Associative Arrays in PL/SQL

+ + +

The third type of collection is an associative array, which is also called a PL/SQL table. An + associative array is an arbitrary collection of keys and values. The important properties of + associative arrays are

+ + + +

You have to reference the elements by an index, not by position. Therefore, associative arrays are + also called unordered lists.

+ +

To demonstrate a use case, let's see an example where we need to hold some book titles and their + authors' names using a key-value pair structure. We build an associative array of varchar2(30) values (the "value" part of key-value pair) indexed + by another varchar2 (the "key" part of the key-value pair). The + book title is the key and the author's name is the value. We will initially populate this array + with four books, as shown below:

+ + + + + + + + + + + + + + + + + + + + + + + + +
Book                          Author
Pride and Prejudice           Jane Austen
1984                          George Orwell
Anna Karenina                 Leo Tolstoy
Adventures of Tom Sawyer      Mark Twain
+ +

In the programs below, we will populate the array and then select from the array.

+ +
+
+
Copy
+
+ +
+
+--pl3.sql
+declare
+   type ty_tabtype is table of varchar2(30)
+          index by varchar2(30);
+  l_books        ty_tabtype;
+  i              varchar2(30);
+begin
+   l_books ('Pride and Prejudice') := 'Jane Austen';
+   l_books ('1984') := 'George Orwell';
+   l_books ('Anna Karenina') := 'Leo Tolstoy';
+   l_books ('Adventures of Tom Sawyer') := 'Mark Twain';
+   --
+   -- now let's display the books and their authors
+   --
+   i := l_books.first;
+   while i is not null loop
+     dbms_output.put_line('Author of '||i||' = '||
+                  l_books (i));
+      i := l_books.next(i);
+   end loop;
+end;
+
+
+ +

Here is the output:

+ +
+
+
Copy
+
+ +
+
+Author of 1984 = George Orwell
+Author of Adventures of Tom Sawyer = Mark Twain
+Author of Anna Karenina = Leo Tolstoy
+Author of Pride and Prejudice = Jane Austen
+
+
+ +

Now let's see how these PL/SQL collections translate to R.

+ +

Vectors in R

+ + +

Features of many R collections overlap those provided by PL/SQL collections; so I am going to start + with an explanation of R collections and show their equivalence to PL/SQL collections.

+ +

Take the first one: nested table in PL/SQL. In the PL/SQL example, we used the days of the week. This + is a one-dimensional array. In R, this is called a vector. Actually, a vector is a lot + more, but we will start with the basic explanation.

+ +

Here is how we define the same days of the week in R.

+ +

Note the use of the function c(), which creates a vector.

+ +
+
+
Copy
+
+ +
+
+> dow <- c('Sun','Mon','Tue','Wed','Thu','Fri','Sat')
+> dow
+[1] "Sun" "Mon" "Tue" "Wed" "Thu" "Fri" "Sat"
+
+
+ +

All the elements of the vector must be of the same data type. If you give multiple data types, R will + convert all of them to the superset data type that can accommodate all the values. You can access + the elements of a vector by their position.

+ +
+
+
Copy
+
+ +
+
+> dow[1]
+[1] "Sun"
+
+
+ +

Notice that the first position is indexed as "1," just as in PL/SQL; it is not indexed as + "0," which is the case in other programming languages such as Python or C. You don't + have to remember how many elements are present in the vector. You can use the length() function.

+ +
+
+
Copy
+
+ +
+
+> length(dow)
+[1] 7
+
+
+ +

Note the c function; that's what defines a vector. Let's explore + it with a simpler example of a vector of 10 numbers from 1 to 10.

+ +
+
+
Copy
+
+ +
+
+> v1 <- c(1,2,3,4,5,6,7,8,9,10)
+
+
+ +

You can also define that vector using the range notation you learned in Part 2 of this series.

+ +
+
+
Copy
+
+ +
+
+> v1 <- 1:10
+
+
+ +

This is similar to the FOR i IN 1..10 LOOP construct in PL/SQL. If you check the data type of the variable v1, you will notice that it is labeled "numeric":

+ +
+
+
Copy
+
+ +
+
+> class(v1)
+[1] "numeric"
+
+
+ +

That's because the elements of the vector are numeric. So, how do you know if a variable is a + vector? Use a built-in function:

+ +
+
+
Copy
+
+ +
+
+> is.vector(dow)
+[1] TRUE
+
+
+ +

Vectors allow operations as a whole. Here is how you multiply all the elements of the vector by 2 at + once:

+ +
+
+
Copy
+
+ +
+
+> v2 <- v1 * 2
+> v2
+
+[1]  2  4  6  8 10 12 14 16 18 20
+
+
+ +

And here is how you subtract one vector from another:

+ +
+
+
Copy
+
+ +
+
+> v2 - v1
+[1]  1  2  3  4  5  6  7  8  9 10
+
+
+ +

You can check if a value is present in a vector, that is, whether it is one of the elements:

+ +
+
+
Copy
+
+ +
+
+> v1 <- c(1,2,3,4,5,6,7,8,9,10)
+> 4 %in% v1
+[1] TRUE
+
+
+ +

This returned TRUE because 4 is one of the elements of the vector variable + v1. Likewise, you can check whether the elements of a vector can be found + on another vector. +

+ +

Let's define another vector called v2:

+ +
+
+
Copy
+
+ +
+
+> v2 <- c(1,2)
+> v2 %in% v1
+[1] TRUE TRUE
+
+
+ +

It returned TRUE for both values because both 1 and 2 (elements of vector + v2) are found in vector v1. By the way, the + positions of the vectors are not important. +

+ +

Let's see another vector, v3, with two elements (4 and 5) in the first + and second elements. The following checks for the presence of (4,5) in + vector v1.

+ +
+
+
Copy
+
+ +
+
+> v3 <- c(4,5)
+> v3 %in% v1
+[1] TRUE TRUE
+
+
+ +

Let's extend the example by using another vector, (11,5), and checking + if the values are present in v1.

+ +
+
+
Copy
+
+ +
+
+> v3 <- c(11,5)
+> v3 %in% v1
+[1] FALSE  TRUE
+
+
+ +

The first element of v3, 11, is not present in v1; so + the return value for that is FALSE. The + second element, 5, was found in v1; so the return value is TRUE.

+ +

Vectors have elements of the same data type, not of different data types. What if you combine + multiple data types in a vector? Let's see:

+ +
+
+
Copy
+
+ +
+
+> v4 <- c(1,2,'Three')
+> v4
+[1] "1"     "2"     "Three"
+
+
+ +

Notice how R converted everything to character format, even if you input only numeric values for the + first two elements. If you check the data types, as follows:

+ +
+
+
Copy
+
+ +
+
+> class(v4)
+[1] "character"
+
+
+ +

They will show as character. If you combine logical values, they will be + converted to numbers.

+ +
+
+
Copy
+
+ +
+
+> v5 <- c(1,2,TRUE,FALSE)
+> v5
+[1] 1 2 1 0
+
+
+ +

Note how TRUE and FALSE have been converted to + 1 and 0, respectively. If you introduce a + character, all the items will be converted to characters. +

+ +
+
+
Copy
+
+ +
+
+> v5 <- c(1,2,'Three',TRUE,FALSE)
+> v5
+[1] "1"     "2"     "Three" "TRUE"  "FALSE"
+
+
+ +

Note how the logical values TRUE and FALSE were + merely converted to their character representations—not to 1 and + 0, as was done earlier. +

+ +

There are special operations on vectors whose elements are all logical values. Recall from Part 1 that "|" is the logical OR operator and "&" is the logical AND operator.

+ +
+
+
Copy
+
+ +
+
+> v1 <- c(TRUE,TRUE,FALSE,FALSE)
+> v2 <- c(TRUE,FALSE,TRUE,FALSE)
+> v1
+[1]  TRUE  TRUE FALSE FALSE
+> v2
+[1]  TRUE FALSE  TRUE FALSE
+> v1 & v2
+[1]  TRUE FALSE FALSE FALSE
+> v1 | v2
+[1]  TRUE  TRUE  TRUE FALSE
+
+
+ +

There are two more operators, "&&" and "||," which are similar to + "&" and "|," respectively; but they operate on only the first element of the + vector. Therefore, they return only one value:

+ +
+
+
Copy
+
+ +
+
+> v1 && v2
+[1] TRUE
+
+
+ +

This returns TRUE because the first elements of both vectors v1 and v2 are TRUE, and TRUE & TRUE equals TRUE. Arithmetic operators work on numeric vectors in the same element-by-element fashion, and the c() function can be used to concatenate two vectors:

+ +
+
+
Copy
+
+ +
+
+> v1 = c(1,2,3)
+> v2 = c(10,20,30)
+> v1 * v2
+[1] 10 40 90
+> v2 - v1
+[1] 9 18 27
+> c(v1,v2)
+[1] 1 2 3 10 20 30
+
+
+ +

Be careful when operating on vectors of different lengths: R recycles the elements of the shorter vector, and it issues a warning when the length of the longer vector is not a multiple of the length of the shorter one.

+ +
+
+
Copy
+
+ +
+
+> v3 = c(1,2)
+> v3 * v1
+[1] 1 4 3
+Warning message:
+In v3 * v1 :
+longer object length is not a multiple of shorter object length
+
+
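
When the length of the longer vector is an exact multiple of the length of the shorter one, R recycles the shorter vector without any warning. For example, multiplying a vector of length 2 by a vector of length 4 works quietly:

  > c(1,2) * c(10,20,30,40)
  [1] 10 40 30 80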
+ Accessing Values in R Vectors + +

You access the elements of the vector using square brackets.

+ +
+
+
Copy
+
+ +
+
+> v1 [1]
+[1] 1
+
+
+ +

The negative index has a different connotation in R than what you might be used to. It means + excluding the values in that place. For instance, -1 means remove the first element and + return the rest. It does not mean starting from the end, as seen in some languages + such as Python. +

+ +
+
+
Copy
+
+ +
+
+> v1 [-1]
+[1] 2 3
+
+
+ +
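You can also exclude more than one position by passing a vector of negative indexes (a quick sketch, assuming v1 is still c(1,2,3) as in the example above):
+
+
Copy
+
+
+> v1 [c(-1,-3)]
+[1] 2
+
+
+ +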

If an index is not valid, R returns NA, which is the equivalent of NULL in + PL/SQL.

+ +
+
+
Copy
+
+ +
+
+> v1 [9]
+[1] NA
+
+
+ +

If you want to access a contiguous set of values from a vector, use the m:n + notation:

+ +
+
+
Copy
+
+ +
+
+> v2 [2:3]
+[1] 20 30
+
+
+ +

What if you want to access a discrete set of values—not a contiguous set of values—from a + vector, for example, the first and third values? You would think you'd use something like this, + wouldn't you?

+ +
+
+
Copy
+
+ +
+
+> v1[1,3]
+
+
+ +

Unfortunately, that will throw an error:

+ +
+
+
Copy
+
+ +
+
+Error in v1[1, 3] : incorrect number of dimensions
+
+
+ +

What happened? The m,n notation is used for something else you will learn + later. To access the discrete elements, you will need to pass the indexes as a vector:

+ +
+
+
Copy
+
+ +
+
+> v2[c(1,3)]
+[1] 10 30
+
+
+ +

Interestingly, indexes can be logical (Boolean) values as well.

+ +
+
+
Copy
+
+ +
+
+> v2
+[1] 10 20 30
+> v3 = c(T,F,T)
+> v2[v3]
+[1] 10 30
+
+
+ +

In this case, the first element of v3 is T, + that is, TRUE; so it gets the first element of v2. The + second element of v3 is F, or FALSE; so the second element of v2 (which is 20) is not displayed. The third + element of v3 is T; so the third element + (30) is displayed.
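In practice, you rarely type the logical vector by hand; you usually produce it from a comparison on the vector itself. A quick sketch using v2 from above:
+
+
Copy
+
+
+> v2 > 15
+[1] FALSE  TRUE  TRUE
+> v2[v2 > 15]
+[1] 20 30
+
+
+ +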

+ Features of R Vectors That Are Similar to PL/SQL Associative Arrays + +

Remember associative arrays in PL/SQL, which allow you to assign names to values, as in a key-value pair? The example we used was book titles and author names. Life is so much simpler when you can look up an element by its key instead of trying to figure out its position and access it via the positional index, isn't it?

+ +

Vectors can do that as well. You do that by labeling the elements, or, in R terms, naming the elements. Let's see an example of sales figures for a company in different quarters.

+ +
+
+
Copy
+
+ +
+
+> sales <- c(100,200,150,75)
+
+
+ +

This doesn't tell us the whole story. Does it start with Quarter 1? Does 100 correspond to + Quarter 1 or Quarter 4? We can make this clear by naming the elements:

+ +
+
+
Copy
+
+ +
+
+> quarters <- c('Quarter 1','Quarter 2','Quarter 3','Quarter 4')
+> names(sales) <- quarters
+
+
+ +

Now, if we display the sales variable, we get the following:

+ +
+
+
Copy
+
+ +
+
+> sales
+Quarter 1 Quarter 2 Quarter 3 Quarter 4
+      100       200       150        75
+
+
+ +

Note how the values have labels now, making the meaning clearer. Another way to name the elements is + to assign the names while creating the vector itself.

+ +
+
+
Copy
+
+ +
+
+> sales <- c('Quarter 1'=100,'Quarter 2'=200,'Quarter 3'=150,'Quarter 4'=75)
+> sales
+Quarter 1 Quarter 2 Quarter 3 Quarter 4
+      100       200       150        75
+
+
+ +

When elements are labeled, you can use the labels to access the elements instead of the positional index, exactly as a key-value pair would be accessed:

+ +
+
+
Copy
+
+ +
+
+> sales['Quarter 3']
+Quarter 3
+      150
+
+
+ +

But when you write a program, you usually need just the value, not the label, which can be a source of error. To suppress the label, use the double square brackets syntax.

+ +
+
+
Copy
+
+ +
+
+> sales[['Quarter 3']]
+[1] 150
+
+
+ +

Vectors are the most used data type in R. Behind the scenes, all the basic data types are just vectors of some sort; you can think of the vector as R's atomic data structure.

+ +

Factors in R

+ + +

Vectors provide a way to create a collection of values, which are important in our data manipulation. + Sometimes these values are merely identifiers. Take for instance the following vector:

+ +
+
+
Copy
+
+ +
+
+> deptname <- c("Marketing and Public Relations", "Finance and Treasury", "Engineering and Technology", "Operations and Process Control")
+
+ +

These are pretty long values. However, the elements are merely descriptors. The names have no significance relative to one another, and you will not operate on them; yet R would otherwise have to carry each name around in its entirety. Mere pointers to these values would be sufficient.

+ +

There is another property of this type of variable. Consider a vector of department numbers:

+ +
+
+
Copy
+
+ +
+
+c(10,20,30,40,50)
+
+
+ +

These are just numbers representing the departments. There is nothing more to them. For instance, you can't say that department 30 is bigger than department 20, nor does an average of the department numbers mean anything. These numbers are merely descriptive. If R knows about this property, it can manage memory and processing better, because it stores each distinct value only once and treats the elements as references to those values rather than as ordinary numbers. In R, these are called factors. You can create a factor using the factor() function on any vector.

+ +

Here is how you can create a factor of department numbers.

+ +
+
+
Copy
+
+ +
+
+> dept <- factor(c(10,20,30,40,50))
+
+
+ +

If you check the data type of this variable, you can see it's a factor.

+ +
+
+
Copy
+
+ +
+
+> class(dept)
+[1] "factor"  
+
+
+ +

This is what you see if you examine the contents:

+ +
+
+
Copy
+
+ +
+
+> dept
+[1] 10 20 30 40 50
+Levels: 10 20 30 40 50
+
+
+ +

Note the contents. They are called levels of the factor. The levels are individual + distinct values in the factor. So, if you pass the same value multiple times, it will be in + the factor but it will not be listed multiple times in levels. Here is a factor with multiple + occurrences of the same value. +

+ +
+
+
Copy
+
+ +
+
+> dept <- factor(c(10,10,10,20,20,30,40,50,50))
+> dept
+[1] 10 10 10 20 20 30 40 50 50
+Levels: 10 20 30 40 50
+
+
+ +
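Under the covers, each element of a factor is stored as a small integer code that points at one of the levels, which is where the memory savings mentioned earlier come from. A quick sketch using the dept factor just created:
+
+
Copy
+
+
+> as.integer(dept)
+[1] 1 1 1 2 2 3 4 5 5
+
+
+ +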

You can also pull up the levels of a factor explicitly by using the levels() function:

+ +
+
+
Copy
+
+ +
+
+> levels(dept)
+[1] "10" "20" "30" "40" "50"
+
+
+ +

Numbers like those in this list probably don't make much sense. So you need to assign descriptive + levels. The levels() function accomplishes that.

+ +
+
+
Copy
+
+ +
+
+> levels(dept) <- c("Marketing","Sales","Finance","Operations","IT")
+> dept
+[1] Marketing  Marketing  Marketing  Sales      Sales      Finance    Operations IT         IT        
+Levels: Marketing Sales Finance Operations IT
+
+
+ +

See how the factor levels are different now? You can access the elements of a factor the same way as you access the elements of a vector.

+ +
+
+
Copy
+
+ +
+
+> dept[2]
+[1] Marketing
+Levels: Marketing Sales Finance Operations IT
+
+
+ +

In this example, the levels are merely descriptive; there is no comparative relationship among them. For instance, the Marketing department is not greater than or smaller than, say, the Finance department. If you force such a comparison, you will get NA along with a warning:

+ +
+
+
Copy
+
+ +
+
+> dept[2] > dept [1]
+[1] NA
+Warning message:
+In Ops.factor(dept[2], dept[1]) : '>' not meaningful for factors
+
+
+ +

But sometimes there may be a comparative relationship. Take for instance a factor containing all the + titles in a company. The title "president" is higher than "vice president," + which is higher than "director," and so on. This is called an ordered factor. To + indicate the factor is ordered, you have to include the ordered parameter + and set it to T when creating the factor, as shown below. Here, we also + added the labels.

+ +
+
+
Copy
+
+ +
+
+> title <- factor(c('associate'=1,'manager'=2,'director'=3,'vp'=4,'president'=5), ordered = T)
+> title
+associate   manager  director        vp president 
+        1         2         3         4         5 
+Levels: 1 < 2 < 3 < 4 < 5
+
+
+ +

Note how the Levels attribute is shown now. 1 is less than 2, which is + less than 3, and so on. In this case, you can compare factors. Here is a check:

+ +
+
+
Copy
+
+ +
+
+> title["vp"] > title['manager']
+[1] TRUE
+> title["vp"] > title['president']
+[1] FALSE
+
+
+ +

In large datasets, the factors will be quite large, which makes visual comparison difficult. This + approach of declaring ordered factors comes in handy at that time.

+ +

Lists in R

+ + +

While you can see the obvious advantage of vectors, there is an important limitation: all the elements must be of the same data type, which can be a significant restriction, depending on the use case. So here comes a similar but different collection—the list—which is like a vector except that the elements can be of any data type. You define a list using the function list().

+ +
+
+
Copy
+
+ +
+
+v1 <- list(1,'a',T)
+
+
+ +

Just to make sure you have created it with elements of different data types, display the v1 variable:

+ +
+
+
Copy
+
+ +
+
+> v1
+[[1]]
+[1] 1
+
+[[2]]
+[1] "a"
+
+[[3]]
+[1] TRUE
+
+
+ +

Does the output look familiar? If you notice, the double square bracket notation, [[]], came from the vector representation. The elements are all of a + different type, as you can see. But under the covers, it's just a vector of multiple vectors. + You can confirm that by checking for list and vector:

+ +
+
+
Copy
+
+ +
+
+> is.list(v1)
+[1] TRUE
+> is.vector(v1)
+[1] TRUE
+
+
+ +

The above can be written as follows:

+ +
+
+
Copy
+
+ +
+
+> v1 <- c(c(1),c('a'),c(T))
+> v1
+[1] "1"    "a"    "TRUE"
+
+
+ +

You can create two variables and compare them:

+ +
+
+
Copy
+
+ +
+
+> v1 <- list(1,'a',T)
+> v2 <- c(c(1),c('a'),c(T))
+> v1 == v2
+[1] TRUE TRUE TRUE
+
+
+ +

The comparison returns TRUE for every element; the values match, even though v1 is a list and v2 is a plain character vector. Because a list is a vector of vectors, you can also define a sort of multidimensional representation of data. I say "sort of" only because there are better ways to handle multiple dimensions. Here are three vectors, of numeric, character, and logical data types, named n1, c1, and l1, respectively.

+ +
+
+
Copy
+
+ +
+
+n1 <- c(1,2,3,4,5)
+c1 <- c("First","Second","Third")
+l1 <- c(T,F,T,T,F)
+
+
+ +

We can create a list from all these, as shown below:

+ +
+
+
Copy
+
+ +
+
+list1 <- list(n1,c1,l1)
+
+
+ +

If you display the list, you will see its contents:

+ +
+
+
Copy
+
+ +
+
+> list1
+[[1]]
+[1] 1 2 3 4 5
+
+[[2]]
+[1] "First" "Second" "Third"
+
+[[3]]
+[1] TRUE FALSE TRUE TRUE FALSE
+
+
+ +

If you want to address the first element of the list, you will need to access it by its position:

+ +
+
+
Copy
+
+ +
+
+> list1[1]
+[[1]]
+[1] 1 2 3 4 5
+
+
+ +

This will return a vector, as expected. If you want to access the first element of this vector, you + will need to access it by index.

+ +
+
+
Copy
+
+ +
+
+> list1[[1]][1]
+[1] 1
+
+
+ +

But positional indexes can be difficult to work with. It can be made easier. Remember the naming of the elements in vectors? The same can be done in lists too, using the same names() function and passing a vector of names.

+ +

Suppose you want to name them "Numbers," "Spelled," and "Booleans," + respectively. You can use the following:

+ +
+
+
Copy
+
+ +
+
+names(list1) <- c("Numbers", "Spelled", "Booleans")
+
+
+ +

Now, if you display the variable, you will see different headers for the elements.

+ +
+
+
Copy
+
+ +
+
+> list1
+$Numbers
+[1] 1 2 3 4 5
+
+$Spelled
+[1] "First"  "Second" "Third" 
+
+$Booleans
+[1]  TRUE FALSE  TRUE  TRUE FALSE
+
+
+ +

This helps in accessing individual elements of the list.

+ +
+
+
Copy
+
+ +
+
+> list1 [1]
+$Numbers
+[1] 1 2 3 4 5
+
+
+ +

To access a specific element of the list, you can use the label as well.

+ +
+
+
Copy
+
+ +
+
+> list1 ["Numbers"]
+$Numbers
+[1] 1 2 3 4 5
+
+
+ +
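Named list elements can also be reached with the $ notation, which returns the underlying vector directly, so you can index into it in the same step (a quick sketch using list1 as defined above):
+
+
Copy
+
+
+> list1$Numbers
+[1] 1 2 3 4 5
+> list1$Numbers[2]
+[1] 2
+
+
+ +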

If you ever want to change lists to vectors, simply use the unlist() + function:

+ +
+
+
Copy
+
+ +
+
+> unlist(list1)
+ Numbers1  Numbers2  Numbers3  Numbers4  Numbers5  Spelled1  Spelled2  Spelled3 
+      "1"       "2"       "3"       "4"       "5"   "First"  "Second"   "Third" 
+Booleans1 Booleans2 Booleans3 Booleans4 Booleans5 
+   "TRUE"   "FALSE"    "TRUE"    "TRUE"   "FALSE" 
+
+
+ +

Now each element can be addressed as in a vector. For example, unlist(list1)[13] references the thirteenth element. Sometimes we might need to do this "flattening" of the output, as you will learn later in the series.

+ +

Matrices in R

+ + +

A matrix is a two-dimensional representation of data elements, all of the same data type. Remember, it has to be exactly two dimensions—rows and columns—not three. You create a matrix using the matrix() function.

+ +

In the following example, we create a matrix with 20 elements (1:20, if + you remember from Part 2, creates a sequence of 20 numbers from 1 to 20).

+ +
+
+
Copy
+
+ +
+
+> m1 <- matrix(1:20)
+> m1
+[,1]
+[1,] 1
+[2,] 2
+[3,] 3
+[4,] 4
+[5,] 5
+[6,] 6
+[7,] 7
+[8,] 8
+[9,] 9
+[10,] 10
+[11,] 11
+[12,] 12
+[13,] 13
+[14,] 14
+[15,] 15
+[16,] 16
+[17,] 17
+[18,] 18
+[19,] 19
+[20,] 20
+
+
+ +

Note how it created a single column matrix, with 20 rows. If you want rows and columns, you need to + specify that. The nrow parameter specifies the number of rows of the + resultant matrix.

+ +
+
+
Copy
+
+ +
+
+> m1 <- matrix(1:20, nrow=4)
+> m1
+     [,1] [,2] [,3] [,4] [,5]
+[1,]    1    5    9   13   17
+[2,]    2    6   10   14   18
+[3,]    3    7   11   15   19
+[4,]    4    8   12   16   20
+
+
+ +

Note there are four rows because you specified that while creating the matrix. R distributed the values across them and ended up with five columns.

+ +

What if you had wanted a fixed number of columns instead? You can request that with the ncol parameter.

+ +
+
+
Copy
+
+ +
+
+> m1 <- matrix(1:20, ncol=4)
+> m1
+     [,1] [,2] [,3] [,4]
+[1,]    1    6   11   16
+[2,]    2    7   12   17
+[3,]    3    8   13   18
+[4,]    4    9   14   19
+[5,]    5   10   15   20
+
+
+ +

You can specify both ncol and nrow. If the + values are more than required, R will discard them. Note how in the following example, only 16 of + your initial 20 values made it to the matrix of four rows and four columns.

+ +
+
+
Copy
+
+ +
+
+> m1 <- matrix(1:20, nrow=4, ncol=4)
+> m1
+     [,1] [,2] [,3] [,4]
+[1,]    1    5    9   13
+[2,]    2    6   10   14
+[3,]    3    7   11   15
+[4,]    4    8   12   16
+
+
+ +

By the way, matrices can be generated with discrete values as well, not just sequential values. Here + is an example.

+ +
+
+
Copy
+
+ +
+
+> m1 <- matrix(c('a','e','i','o','u','1'), nrow=3)
+> m1
+     [,1] [,2]
+[1,]  "a"  "o"
+[2,]  "e"  "u"
+[3,]  "i"  "1"
+
+
+ +

Note how the data has been distributed first down the columns and then across. What if that's not what you want? You can tell R to fill the matrix across the rows instead. Another optional parameter, byrow, when set to TRUE, accomplishes that.

+ +
+
+
Copy
+
+ +
+
+> m1 <- matrix(c('a','e','i','o','u','1'), nrow=3, byrow=T)
+> m1
+     [,1] [,2]
+[1,]  "a"  "e"
+[2,]  "i"  "o"
+[3,]  "u"  "1"
+
+
+ +

Now that you know how a matrix is defined, you might notice some annoying things about how it is shown. The first thing you might notice is the labeling. The rows and columns do not have headers. Instead they have just a default header with some commas and numbers, which doesn't add a lot of value. To add labels, you can use another parameter called dimnames. The parameter takes a list of two vectors: one containing the row labels and another containing the column labels.

+ +
+
+
Copy
+
+ +
+
+> m1 <- matrix(c(1,2,3,4,5,6,7,8,9,10), nrow=2, byrow = T, dimnames = list(c("Row1","Row2"),c("Col1","Col2","Col3","Col4","Col5")))
+> m1
+     Col1 Col2 Col3 Col4 Col5
+Row1    1    2    3    4    5
+Row2    6    7    8    9   10
+
+
+ +

It's not just at the declaration time that you have the opportunity to mention the labels. If you + already have a matrix, you can add labels later. Or, if you want to rename the labels, you can do + that using the dimnames() function, as shown below:

+ +
+
+
Copy
+
+ +
+
+> dimnames(m1) <- list(c("R1","R2"))
+> m1
+   [,1] [,2] [,3] [,4] [,5]
+R1    1    2    3    4    5
+R2    6    7    8    9   10
+
+
+ +

Another way to change the column name is with the colnames() function:

+ +
+
+
Copy
+
+ +
+
+colnames(m1) <- c("Col1","Col2","Col3","Col4","Col5")
+
+
+ +

Now if you display m1, you'll see this:

+ +
+
+
Copy
+
+ +
+
+> m1    
+   Col1 Col2 Col3 Col4 Col5
+R1    1    2    3    4    5
+R2    6    7    8    9   10
+
+
+ +

Similarly, another way to change the row names is with the rownames() + function. Let's use that to change the names of the rows back to Row1, Row2, and so on.

+ +
+
+
Copy
+
+ +
+
+rownames(m1) <- c("Row1", "Row2")
+
+
+ Accessing a Matrix + +

Well, that's a lot about naming the rows and columns of the matrix. But a matrix is useless without some way to access the data. It's super easy. Just use the x and y coordinates. Always remember: rows and then columns, not the other way around. For instance, to select the element in the first row and second column, just use this:

+ +
+
+
Copy
+
+ +
+
+> m1[1,2]
+[1] 2
+
+
+ +

If you want to select multiple columns, just use the range notation. For instance, to select columns 2 through 3, use 2:3, as shown below:

+ +
+
+
Copy
+
+ +
+
+> m1[1,2:3]
+Col2 Col3 
+   2    3 
+
+
+ +

If you want to select multiple but discrete columns, not a range, just use the column numbers as a + vector. For instance, to select columns 2, 3, and 4, use c(2,3,4), as + shown below:

+ +
+
+
Copy
+
+ +
+
+> m1[1,c(2,3,4)]
+Col2 Col3 Col4 
+   2    3    4 
+
+
+ +

If you want to select all the columns of row 1, you can of course use c(1,2,3,4,5); but what if you don't know the number of columns? No worries; you can just omit the column reference:

+ +
+
+
Copy
+
+ +
+
+> m1[1,]
+ +
+
+Col1 Col2 Col3 Col4 Col5 
+   1    2    3    4    5 
+
+
+ +

Note there is a comma and nothing after it—m1[1,], not m1[1]—as if there were an implied second parameter. If you don't supply a value after the comma, R assumes all the columns. If you don't place a comma at all, R treats the matrix as one long vector, counting down the first column from the top left, and picks up only one element, for instance:

+ +
+
+
Copy
+
+ +
+
+> m1[2]
+[1] 6
+
+
+ +

So, be careful about the placement of values and the comma. In PL/SQL, if you have a comma, it means + you are supplying the next parameter. It can be null, but it must be supplied. If you want the next + parameter to be the default, you simply don't put the comma. If you put the comma and then + don't mention the value of the next parameter, you will get a syntax error. In R, you will not + get a syntax error and the behavior will be very different. Likewise, you can omit the first + parameter as well, which then defaults to all the rows of the column.

+ +

For instance, the following will bring up all the rows of column 1.

+ +
+
+
Copy
+
+ +
+
+> m1[,1]
+Row1 Row2 
+   1    6 
+> m1[,2]
+Row1 Row2 
+   2    7 
+
+
+ +

But note that the output is displayed as a row, even though it comes from a column. At least the labels are correct: Row1, Row2, and so on. If you select multiple columns, for example, columns 1 through 3, you will see the output as expected.

+ +
+
+
Copy
+
+ +
+
+> m1[,1:3]
+     Col1 Col2 Col3
+Row1    1    2    3
+Row2    6    7    8
+
+
+ +

But, thinking like database professionals, we probably won't like to address them as numbers. + Because we have the rows and columns named, can't we address them using the names, as you would + do in a table? The answer is, of course you can. Here is how we access row Row1 + and column Col2:

+ +
+
+
Copy
+
+ +
+
+> m1["Row1","Col2"]
+[1] 2
+
+
+ +

Another way to create a matrix is to use the rbind() function, which binds vectors together as the rows of a matrix; its counterpart, cbind(), binds them as columns.

+ +
+
+
Copy
+
+ +
+
+> row1 <- c(11,12,13,14)
+> row2 <- c(21,22,23,24)
+> row3 <- c(31,32,33,34)
+> m1 <- rbind(row1,row2,row3)
+> m1
+     [,1] [,2] [,3] [,4]
+row1   11   12   13   14
+row2   21   22   23   24
+row3   31   32   33   34
+> m2 <- cbind(row1,row2,row3)
+> m2
+     row1 row2 row3
+[1,]   11   21   31
+[2,]   12   22   32
+[3,]   13   23   33
+[4,]   14   24   34
+
+
+ +

What if you use fewer values in rbind()? Let's see:

+ +
+
+
Copy
+
+ +
+
+> row4 <- c(41,42)
+> m1 <- rbind(row1,row2,row3, row4)
+> m1
+     [,1] [,2] [,3] [,4]
+row1   11   12   13   14
+row2   21   22   23   24
+row3   31   32   33   34
+row4   41   42   41   42
+
+
+ +

Row4 didn't have enough columns. It had only two, but the total number of columns needed (four) is a multiple of two; therefore, the available values simply got repeated. However, if you supply a number of elements that doesn't divide evenly into the required length, you will get a warning:

+ +
+
+
Copy
+
+ +
+
+> row4 <- c(41,42,43)
+> m1 <- rbind(row1,row2,row3, row4)
+Warning message:
+In rbind(row1, row2, row3, row4) :
+  number of columns of result is not a multiple of vector length (arg 4)
+
+
+ Operations on a Matrix + +

But why would you even want to create values as matrices? Ease of operations, of course. R allows you + to operate on matrices as a whole. Here is an example.

+ +

Create a matrix of 20 numbers: five rows and four columns:

+ +
+
+
Copy
+
+ +
+
+> m1 <- matrix(1:20, nrow=5)
+> m2 <- matrix(1:20, nrow=5)
+
+> m2
+     [,1] [,2] [,3] [,4]
+[1,]    1    6   11   16
+[2,]    2    7   12   17
+[3,]    3    8   13   18
+[4,]    4    9   14   19
+[5,]    5   10   15   20
+
+
+ +

If you want to multiply every element of this matrix by 2, just multiply.

+ +
+
+
Copy
+
+ +
+
+> m2 * 2
+     [,1] [,2] [,3] [,4]
+[1,]    2   12   22   32
+[2,]    4   14   24   34
+[3,]    6   16   26   36
+[4,]    8   18   28   38
+[5,]   10   20   30   40
+
+
+ +

In PL/SQL and other languages such as C or Java, you would have had to write a loop visiting all the elements and multiplying. Not in R. You just add, subtract, multiply, raise to a power, and so on, exactly as you would with a single value. You can even multiply a matrix by another matrix, element by element:

+ +
+
+
Copy
+
+ +
+
+> m1
+     Col1 Col2 Col3 Col4
+Row1   11   12   13   14
+Row2   21   22   23   24
+Row3   31   32   33   34
+Row4   41   42   41   42
+> m2
+     [,1] [,2] [,3] [,4]
+[1,]    1    5    9   13
+[2,]    2    6   10   14
+[3,]    3    7   11   15
+[4,]    4    8   12   16
+> m1 * m2
+     Col1 Col2 Col3 Col4
+Row1   11   60  117  182
+Row2   42  132  230  336
+Row3   93  224  363  510
+Row4  164  336  492  672
+
+
+ +
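One caution: the * operator shown above multiplies the matrices element by element; it is not the matrix multiplication of linear algebra. For that, R provides the %*% operator. A minimal sketch with small throwaway matrices (ma and mb are names chosen just for this illustration):
+
+
Copy
+
+
+> ma <- matrix(1:4, nrow=2)
+> mb <- matrix(5:8, nrow=2)
+> ma * mb
+     [,1] [,2]
+[1,]    5   21
+[2,]   12   32
+> ma %*% mb
+     [,1] [,2]
+[1,]   23   31
+[2,]   34   46
+
+
+ +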

Let's stop the discussion on matrices for now and discuss the next important collection in R: + arrays.

+ +

Arrays in R

+ + +

Matrices can accomplish most data analysis tasks, because most of the data you will get is in two dimensions. But what if your dataset comes in more than two dimensions? An array comes to the rescue here. It is, in effect, a multidimensional matrix.

+ +

Here is how you can define a one-dimensional array.

+ +
+
+
Copy
+
+ +
+
+> v1 <- array(data=1:30)
+> v1
+ [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
+
+
+ +

As you can see, it looks just like a vector, so it's not of much value by itself. An optional parameter—dim—allows you to define multidimensional arrays:

+ +
+
+
Copy
+
+ +
+
+> v2 <- array(1:30, dim=c(5,5))
+> v2
+     [,1] [,2] [,3] [,4] [,5]
+[1,]    1    6   11   16   21
+[2,]    2    7   12   17   22
+[3,]    3    8   13   18   23
+[4,]    4    9   14   19   24
+[5,]    5   10   15   20   25
+
+
+ +

But this also looks like another familiar structure: the matrix. So it's not of much value. + Let's define a third dimension:

+ +
+
+
Copy
+
+ +
+
+> v3 <- array(1:30, dim=c(5,5,5))
+> v3
+, , 1
+
+     [,1] [,2] [,3] [,4] [,5]
+[1,]    1    6   11   16   21
+[2,]    2    7   12   17   22
+[3,]    3    8   13   18   23
+[4,]    4    9   14   19   24
+[5,]    5   10   15   20   25
+
+, , 2
+
+     [,1] [,2] [,3] [,4] [,5]
+[1,]   26    1    6   11   16
+[2,]   27    2    7   12   17
+[3,]   28    3    8   13   18
+[4,]   29    4    9   14   19
+[5,]   30    5   10   15   20
+
+, , 3
+... output truncated ...
+
+
+ +

As you can see, it created a three-dimensional data structure. We provided only 30 values (as shown + in the sequence 1:30), not enough for the 125 values needed for the array + (5 x 5 x 5 = 125); so R repeated the values as much as needed to fill the array.

+ +

How do we access the elements of the array? The exact way we addressed them in matrix: variable[rownumber, column number, third dimension]. You start with the + row, then the column, and finally all the other dimensions defined in the array.

+ +

Here's how to refer to the earlier output:

+ +
+
+
Copy
+
+ +
+
+> v3[2,2,1]
+[1] 7
+
+
+ +

Note that the third dimension is printed in the output first, as headings are when you display all + the values of a list variable. There is a matrix for each of the third dimensions. Like the matrix, + you can omit the exact position of a dimension and you will get all the values for that dimension, + for instance,

+ +
+
+
Copy
+
+ +
+
+> v3[1,1,]
+[1] 1 26 21 16 11
+
+
+ +

Here we omitted the third dimension; so we get the first row and first column of all the matrices for all values of the third dimension, that is, v3[1,1,1], v3[1,1,2], v3[1,1,3], and so on. Just as with a matrix, you can add and multiply values directly on the array.

+ +

For example, this

+ +
+
+
Copy
+
+ +
+
+v3 * 10 
+
+
+ +

will multiply 10 by all the elements of the array.

+ +

You can multiply arrays by other arrays as well. If you want the sum of the first-row, first-column elements across all the slices of the third dimension, you can use this:

+ +
+
+
Copy
+
+ +
+
+> sum(v3[1,1,])
+[1] 75
+
+
+ +

Just as with a matrix, you can name the dimensions of the array as well.

+ +
+
+
Copy
+
+ +
+
+colnames(v3) = c("Col1","Col2","Col3","Col4","Col5")
+rownames(v3) = c("Row1","Row2","Row3","Row4","Row5")
+
+
+ +

The following is a special naming for the third dimension, which you didn't see with matrix:

+ +
+
+
Copy
+
+ +
+
+dimnames(v3)[[3]] <- c("Third1","Third2","Third3","Third4","Third5")
+
+
+ +

Now if you print v3, you will see the names:

+ +
+
+
Copy
+
+ +
+
+> v3
+, , Third1
+
+     Col1 Col2 Col3 Col4 Col5
+Row1    1    6   11   16   21
+Row2    2    7   12   17   22
+Row3    3    8   13   18   23
+Row4    4    9   14   19   24
+Row5    5   10   15   20   25
+
+, , Third2
+
+     Col1 Col2 Col3 Col4 Col5
+Row1   26    1    6   11   16
+Row2   27    2    7   12   17
+Row3   28    3    8   13   18
+Row4   29    4    9   14   19
+Row5   30    5   10   15   20
+
+, , Third3
+... output truncated ...
+
+
+ +

And, just as with a matrix, you will be able to address the elements by their dimension names instead + of the positions.

+ +
+
+
Copy
+
+ +
+
+> v3["Row1","Col2","Third3"]
+[1] 26
+
+
+ +

So, as you can see, arrays are just like matrices, but with multiple dimensions that you can name and + address elements by.

+ +

Data Frames in R

+ + +

Being a database professional, you probably already know that data comes in all types, not just one. For instance, take employee data. You probably have an employee ID (EmpID), which is a number; the name (Name), which is a character string; and another column called AtHQ, which is a logical value TRUE or FALSE that shows whether the employee is located at the company headquarters. This is what you typically see in database tables, spreadsheets, and even in text files.

+ + + + + + + + + + + + + + + + + + + +
EmpID   Name         AtHQ
1       John Smith   T
2       Jane Doe     F
+ +

In PL/SQL, you would have to define a record type and create a collection on that record. In Oracle + Database, you would have to create a table with those columns and specified data types. In R, the + collection is called a data frame.

+ +

A data frame is just like a matrix, but with a very important difference: the elements can contain + different data types. Recall that all the elements in a matrix must be of the same data type. + Because a data frame removes that limitation, it can be used in many data analysis cases.

+ +

Let's create a data frame for the above information. You create a data frame via the data.frame() function.

+ +
+
+
Copy
+
+ +
+
+> df1 <- data.frame(1,"John Smith",T)
+
+
+ +

If you show the data frame, you can see that the column labels are not exactly what you intended.

+ +
+
+
Copy
+
+ +
+
+> df1
+  X1 X.John.Smith.    T
+1  1 John Smith    TRUE 
+
+
+ +

R simply puts whatever it feels is right. Let's put appropriate labels for the columns.

+ +
+
+
Copy
+
+ +
+
+> colnames(df1) <- c("EmpId","Name","AtHQ")
+> df1
+  EmpId Name       AtHQ
+1     1 John Smith TRUE
+
+
+ +

When you want to add rows to the data frame, just create a new data frame and add it to the other one + using the rbind() function.

+ +
+
+
Copy
+
+ +
+
+df2 <- data.frame(2,"Jane Doe",F)
+colnames(df2) <- c("EmpId","Name","AtHQ")
+df1 <- rbind(df1,df2)
+
+
+ +

Now if you check df1, you'll see this:

+ +
+
+
Copy
+
+ +
+
+> df1
+  EmpId Name       AtHQ
+1     1 John Smith TRUE
+2     2 Jane Doe   FALSE
+
+
+ +

Assigning row and column names makes accessing the elements of the data frame immensely easy. To show + all the values of column EmpID, here is what you need to write:

+ +
+
+
Copy
+
+ +
+
+> df1["EmpId"]
+  EmpId
+1     1
+2     2
+
+
+ +

But in a data frame, you have another method of writing the column values. It's the name of the + data frame followed by the column name separated by a "$" sign. Here is the example:

+ +
+
+
Copy
+
+ +
+
+> df1$EmpId
+[1] 1 2
+
+
+ +

Note that while this also gets the employee IDs, the values come back as a vector; in the previous approach, the value returned was another data frame. The difference may or may not matter for your work, so keep it in mind.

+ +
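You can verify the difference between the two forms with the class() function (a quick check using the df1 built above):
+
+
Copy
+
+
+> class(df1["EmpId"])
+[1] "data.frame"
+> class(df1$EmpId)
+[1] "numeric"
+
+
+ +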

Finally, you can access the rows of the data frame using the "[]" notation.

+ +
+
+
Copy
+
+ +
+
+> df1[1,]
+  EmpId       Name AtHQ
+1     1 John Smith TRUE
+
+
+ +

Here, we retrieved all the columns of row 1. If you want only a selected column, for example, EmpId, you can give it as an index:

+ +
+
+
Copy
+
+ +
+
+> df1[1,"EmpId"]
+[1] 1
+
+
+ +

What about row names? Yes, you can name rows in a data frame, using the same rownames() function you saw with matrix:

+ +
+
+
Copy
+
+ +
+
+> rownames(df1) <- c("Employee1","Employee2")
+> df1
+          EmpId       Name  AtHQ
+Employee1     1 John Smith  TRUE
+Employee2     2   Jane Doe FALSE
+
+
+ +

However, naming the rows doesn't help because rows are merely observations (which are analogous + to records in a table) of multiple variables (which are columns). It makes sense to name these + variables; but naming rows might not make sense. When you add a row, you have to remember to add the + row name for that as well.

+ Working with an Actual Data Frame + +

In the previous text, I just wanted to give you a flavor of the various collections. Let's see some operations on an actual dataset rather than the made-up values we used before.

+ +

Fortunately for us, R comes with many built-in datasets. To find these datasets, just use the data() function at the R prompt:

+ +
+
+
Copy
+
+ +
+
+> data()
+
+
+ +

This will bring up a different window with the datasets. Here is a small excerpt from that window: +

+ +
+
+
Copy
+
+ +
+
+Data sets in package 'datasets':
+
+AirPassengers Monthly Airline Passenger Numbers 1949-1960
+BJsales Sales Data with Leading Indicator
+BJsales.lead (BJsales)
+Sales Data with Leading Indicator
+... output truncated ...
+
+
+ +

We will use one dataset, airquality, which shows the daily air quality measurements in New York from May to September 1973. How do I know that? I didn't have to guess. Just use the help() function to see what it is about. Enter help(airquality) at the R command prompt and you will see a browser window pop up to explain what the dataset is. In that help window, I see that it's a data frame with the following properties:

+ +
+
+
Copy
+
+ +
+
+A data frame with 153 observations on 6 variables. 
+
+[,1]  Ozone     numeric  Ozone (ppb)
+
+[,2]  Solar.R   numeric  Solar R (lang)
+
+[,3]  Wind      numeric  Wind (mph)
+
+[,4]  Temp      numeric  Temperature (degrees F)
+
+[,5]  Month     numeric  Month (1--12)
+
+[,6]  Day       numeric  Day of month (1--31) 
+
+
+ +

Let's see some of the example data:

+ +
+
+
Copy
+
+ +
+
+> head(airquality)
+  Ozone Solar.R Wind Temp Month Day
+1    41     190  7.4   67     5   1
+2    36     118  8.0   72     5   2
+3    12     149 12.6   74     5   3
+4    18     313 11.5   62     5   4
+5    NA      NA 14.3   56     5   5
+6    28      NA 14.9   66     5   6
+
+
+ +
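You can also check the size of the data frame directly with nrow() and ncol() (nrow() is used again in the quiz answers at the end of this article):
+
+
Copy
+
+
+> nrow(airquality)
+[1] 153
+> ncol(airquality)
+[1] 6
+
+
+ +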

Let's see how we can use this data frame for some analysis. First, I am tired of typing + "airquality" every time. So I will create a nice little variable from the dataset.

+ +
+
+
Copy
+
+ +
+
+> a <- airquality
+
+
+ +

From now on, I will just use "a" for the data. Recall from the first article in this series + that you can use the str() function to find the structure of a data + frame.

+ +
+
+
Copy
+
+ +
+
+> str(a)
+'data.frame': 153 obs. of  6 variables:
+ $ Ozone  : int  41 36 12 18 NA 28 23 19 8 NA ...
+ $ Solar.R: int  190 118 149 313 NA NA 299 99 19 194 ...
+ $ Wind   : num  7.4 8 12.6 11.5 14.3 14.9 8.6 13.8 20.1 8.6 ...
+ $ Temp   : int  67 72 74 62 56 66 65 59 61 69 ...
+ $ Month  : int  5 5 5 5 5 5 5 5 5 5 ...
+ $ Day    : int  1 2 3 4 5 6 7 8 9 10 ...
+
+
+ +

The str() function is analogous to the describe command in SQL*Plus, which describes the structure of a table. Consider the above output carefully. It shows there are 153 "observations." There are six variables, analogous to columns in a table. So this is similar to a table with six columns and 153 rows. The subsequent lines of the output show the columns, their data types, and a sample of their values.

+ +

For my first analysis, I just want to look at the quick summaries of all the available attributes: +

+ +
+
+
Copy
+
+ +
+
+> summary(a)
+    Ozone        Solar.R         Wind           Temp
+Min.   :   1.00  Min.   :  7.0   Min. : 1.700   Min. :56.00
+1st Qu.:  18.00  1st Qu.:115.8   1st Qu.: 7.400 1st Qu.:72.00
+Median :  31.50  Median :205.0   Median : 9.700 Median :79.00
+Mean   :  42.13  Mean   :185.9   Mean : 9.958   Mean :77.88
+3rd Qu.:  63.25  3rd Qu.:258.8   3rd Qu.:11.500 3rd Qu.:85.00
+Max.   : 168.00  Max.   :334.0   Max. :20.700   Max. :97.00
+  NA's :37         NA's :7
+   Month          Day
+Min.   :5.000 Min.   : 1.0
+1st Qu.:6.000 1st Qu.: 8.0
+Median :7.000 Median :16.0
+Mean   :6.993 Mean   :15.8
+3rd Qu.:8.000 3rd Qu.:23.0
+Max.   :9.000 Max.   :31.0
+
+
+ +

This is a very good starting point that tells me some of the important details of the dataset: the minimum, the maximum, the median, the mean, the first quartile, and the third quartile for each attribute (or column). It also shows how many values are not available, shown as NA, which is the equivalent of NULL in Oracle Database.

+ +

From the output, we know that 37 records have NA in the Ozone column and seven records have NA in the + Solar.R column. The other columns don't have NA + counts, which means they don't have NA + values. Interestingly, it does not show a key factor in data analysis—standard + deviation—but you can get it easily if you want, as you will see later. +

+ +
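If you ever want just such a count directly, you can combine is.na() (used again later in this article) with sum(), because TRUE values count as 1. A quick sketch:
+
+
Copy
+
+
+> sum(is.na(a$Ozone))
+[1] 37
+> sum(is.na(a$Solar.R))
+[1] 7
+
+
+ +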

Let's see some questions that might be asked and how we get the answers. We need to find out the + daily temperatures in the month of June. You can reference the cells of the data frame using the + absolute values (row m and column n), or you + can use the logical values for the cell. To get the logical value, you should use the comparison + operator. Because you are looking for the month of June, that is, 6 under + the Month column, use Month==6 as the index. + And because you want just the Temp column, you should use the $Temp notation.

+ +
+
+
Copy
+
+ +
+
+> a$Temp[a$Month==6]
+ [1] 78 74 67 84 85 79 82 87 90 87 93 92 82 80 79 77 72 65 73 76
+[21] 77 76 76 76 75 78 73 80 77 83
+
+
+ +

How about we get a bit specific, for example, June 30th? The same principle applies. You just need to + extend the comparison operation to be a bit more restrictive.

+ +
+
+
Copy
+
+ +
+
+> a$Temp[a$Month==6 & a$Day==30]
+[1] 83
+
+
+ +

But mostly you will be interested in some kind of statistical inference on a group of data, for + example, the average (or mean) temperature in June, not just a single value. The mean() function accomplishes that.

+ +
+
+
Copy
+
+ +
+
+> mean(a$Temp[a$Month==6])
+[1] 79.1
+
+
+ +

Or, the maximum temperature in June:

+ +
+
+
Copy
+
+ +
+
+> max(a$Temp[a$Month==6])
+[1] 93
+
+
+ +

Remember the summary() function that gives you all the quick stats at a glance? Wouldn't you want the same here? We would love to slice the data for June and apply the summary. Let's create a new variable, called june, to hold the data for June.

+ +
+
+
Copy
+
+ +
+
+> june <- a$Temp[a$Month==6]
+> june
+[1] 78 74 67 84 85 79 82 87 90 87 93 92 82 80 79 77 72 65 73 76 77 76 76 76 75
+[26] 78 73 80 77 83
+
+
+ +

But what exactly is june?

+ +
+
+
Copy
+
+ +
+
+> class(june)
+[1] "integer"
+
+
+ +

It says it's an integer, but we wanted a data frame. Check if it is a data frame:

+ +
+
+
Copy
+
+ +
+
+> is.data.frame(june)
+[1] FALSE
+
+
+ +

So it's not a data frame. Well, what is it? Could it be a vector?

+ +
+
+
Copy
+
+ +
+
+> is.vector(june)
+[1] TRUE
+
+
+ +

It is a vector. But we want a data frame, so we will convert it to one.

+ +
+
+
Copy
+
+ +
+
+> june <- data.frame(june)
+> class(june)
+[1] "data.frame"
+
+
+ +

Now that we have our data frame, we can apply the summary() function.

+ +
+
+
Copy
+
+ +
+
+> summary(june)
+june
+Min.   :65.00
+1st Qu.:76.00
+Median :78.00
+Mean   :79.10
+3rd Qu.:82.75
+Max.   :93.00
+
+
+ +

Let's get the standard deviation of this dataset using the sd() function.

+ +
+
+
Copy
+
+ +
+
+> sd(june$june)
+[1] 6.598589
+
+
+ Subsetting + +

What if you want to get all the data, not just Temp, of the data frame for + the month of June? In SQL, you would write something like this:

+ +
+
+
Copy
+
+ +
+
+select *
+from df1
+where Month = 6;
+
+
+ +

In R, it's also trivial. There are two ways to do it. I will start with the first—and my + favorite—method, because it's just plain easy to read. It's by using the subset() function.

+ +
+
+
Copy
+
+ +
+
+> june<-subset(a, a$Month==6)
+> june
+   Ozone Solar.R Wind Temp Month Day
+32    NA     286  8.6   78     6   1
+33    NA     287  9.7   74     6   2
+34    NA     242 16.1   67     6   3
+35    NA     186  9.2   84     6   4
+36    NA     220  8.6   85     6   5
+37    NA     264 14.3   79     6   6
+38    29     127  9.7   82     6   7
+39    NA     273  6.9   87     6   8
+40    71     291 13.8   90     6   9
+41    39     323 11.5   87     6  10
+42    NA     259 10.9   93     6  11
+43    NA     250  9.2   92     6  12
+44    23     148  8.0   82     6  13
+45    NA     332 13.8   80     6  14
+46    NA     322 11.5   79     6  15
+47    21     191 14.9   77     6  16
+48    37     284 20.7   72     6  17
+49    20      37  9.2   65     6  18
+50    12     120 11.5   73     6  19
+51    13     137 10.3   76     6  20
+52    NA     150  6.3   77     6  21
+53    NA      59  1.7   76     6  22
+54    NA      91  4.6   76     6  23
+55    NA     250  6.3   76     6  24
+56    NA     135  8.0   75     6  25
+57    NA     127  8.0   78     6  26
+58    NA      47 10.3   73     6  27
+59    NA      98 11.5   80     6  28
+60    NA      31 14.9   77     6  29
+61    NA     138  8.0   83     6  30
+
+
+ +

The other approach is this:

+ +
+
+
Copy
+
+ +
+
+> a[a$Month==6,]
+
+
+ +

Note the expression carefully; there is a comma after the conditional expression. Why is that? Recall from the previous discussion that to address an element you have to use the [RowNumber, ColumnNumber] notation. The row selection is specified by a$Month==6, but for the columns, we don't want to restrict anything. We want to select all the columns; therefore, we left that position blank.

+ +

Now you can see why I like the subset() approach more. It's several + times more readable. However, subset() doesn't work when you want to + assign values based on conditions. For instance, suppose you want to assign the value 25 to Ozone when the value is NA. subset won't work; you have to leverage the subscript approach.

+ +
+
+
Copy
+
+ +
+
+> a[is.na(a$Ozone),]
+    Ozone Solar.R Wind Temp Month Day
+5      NA      NA 14.3   56     5   5
+10     NA     194  8.6   69     5  10
+25     NA      66 16.6   57     5  25
+26     NA     266 14.9   58     5  26
+27     NA      NA  8.0   57     5  27
+32     NA     286  8.6   78     6   1
+33     NA     287  9.7   74     6   2
+34     NA     242 16.1   67     6   3
+35     NA     186  9.2   84     6   4
+36     NA     220  8.6   85     6   5
+37     NA     264 14.3   79     6   6
+39     NA     273  6.9   87     6   8
+42     NA     259 10.9   93     6  11
+43     NA     250  9.2   92     6  12
+45     NA     332 13.8   80     6  14
+46     NA     322 11.5   79     6  15
+52     NA     150  6.3   77     6  21
+53     NA      59  1.7   76     6  22
+54     NA      91  4.6   76     6  23
+55     NA     250  6.3   76     6  24
+56     NA     135  8.0   75     6  25
+57     NA     127  8.0   78     6  26
+58     NA      47 10.3   73     6  27
+59     NA      98 11.5   80     6  28
+60     NA      31 14.9   77     6  29
+61     NA     138  8.0   83     6  30
+65     NA     101 10.9   84     7   4
+72     NA     139  8.6   82     7  11
+75     NA     291 14.9   91     7  14
+83     NA     258  9.7   81     7  22
+84     NA     295 11.5   82     7  23
+102    NA     222  8.6   92     8  10
+103    NA     137 11.5   86     8  11
+107    NA      64 11.5   79     8  15
+115    NA     255 12.6   75     8  23
+119    NA     153  5.7   88     8  27
+150    NA     145 13.2   77     9  27
+
+
+ +

Let's assign a value of 25 to all the cells where Ozone is NA.

+ +
+
+
Copy
+
+ +
+
+> a[is.na(a$Ozone),]$Ozone <- 25
+
+
+ +

Let's test only one row: row 5, which was NA earlier:

+ +
+
+
Copy
+
+ +
+
+> a[5,]
+  Ozone Solar.R Wind Temp Month Day
+5    25      NA 14.3   56     5   5
+
+
+ +

Now it's no longer NA; it's 25. + Let's confirm there are no NAs in the Ozone column.

+ +
+
+
Copy
+
+ +
+
+> a[is.na(a$Ozone),]
+[1] Ozone   Solar.R Wind    Temp    Month   Day    
+<0 rows> (or 0-length row.names)
+
+
+ Selecting Specific Columns from a Data Frame + +

What if you want to select only a specific column from the data frame, instead of all the columns, + based on some condition? For instance, you want to select temperatures for the month of June.

+ +

In SQL, it would be something like this:

+ +
+
+
Copy
+
+ +
+
+select temp
+from dataframe
+where Month = 6;
+
+
+ +

In R, you can build it upon the previous expression:

+ +
+
+
Copy
+
+ +
+
+> subset(a, a$Month==6)$Temp
+ [1] 78 74 67 84 85 79 82 87 90 87 93 92 82 80 79 77 72 65 73 76 77 76 76 76 75 78 73 80 77 83
+
+
+ +

Note how we used the previous expression of subset, which yields a data + frame containing all the columns from "a"—but only where Month + matches 6—and we added $Temp to it to select + only that column. As a further extension, if you want + to get the minimum temperature in June, in SQL, you would write this:

+ +
+
+
Copy
+
+ +
+
+select min(temp)
+from dataframe1
+where month = 6;
+
+
+ +

In R, you'd do this:

+ +
+
+
Copy
+
+ +
+
+> min(subset(a, a$Month==6)$Temp)
+[1] 65
+
+
+ +

By the way, the condition clause can also contain multiple conditions. For instance, if you want to + select temperature on June 14, in SQL you would write something like this:

+ +
+
+
Copy
+
+ +
+
+select Temp
+from dataframe1
+where Month = 6
+and Day = 14;
+
+
+ +

In R, you would use the "&" character in place of AND:

+ +
+
+
Copy
+
+ +
+
+> subset(a, a$Month==6 & a$Day==14)$Temp
+[1] 80
+
+
+ +

There is an even simpler way for SQL-savvy developers. A package called sqldf + makes the task as easy as writing SQL code.

+ +
+
+
Copy
+
+ +
+
+> install.packages("sqldf")
+
+
+ +

Then load the package:

+ +
+
+
Copy
+
+ +
+
+> library(sqldf)
+
+
+ +

Using the library, we can rewrite our code as follows:

+ +
+
+
Copy
+
+ +
+
+> sqldf('select Temp from a where Month=6 and Day = 14')
+  Temp
+1   80
+
+
+ +

Well, that makes it super easy for SQL-literate readers. It returns a data frame, even if it's just one value. But as you saw earlier, you can compare that with any atomic variable. Here is an example:

+ +
+
+
Copy
+
+ +
+
+> v1 <- sqldf('select Temp from a where Month=6 and Day = 14')
+> if (v1 > 75) { print ("This was a hot day")}
+[1] "This was a hot day"
+
+
+ +
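Aggregates work the same way; for example, the minimum June temperature computed earlier with min() can be written in SQL as well (a quick sketch, assuming the sqldf package is still loaded):
+
+
Copy
+
+
+> sqldf('select min(Temp) as MinTemp from a where Month = 6')
+  MinTemp
+1      65
+
+
+ +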

Let's go back to the subset approach. What if you wanted to select a few columns instead of all + the columns? For instance, because you are selecting for the month of June only, you don't want + to select the month number again. In addition, you might want to have the day in the first column. + In SQL, you would write this:

+ +
+
+
Copy
+
+ +
+
+select day, ozone, solar.r, temp
+from dataframe
+where month=6;
+
+
+ +

In R, you can use the select parameter of the subset() function.

+ +
+
+
Copy
+
+ +
+
+> subset(a, a$Month==6, select=c(Day,Ozone, Solar.R, Temp))
+   Day Ozone Solar.R Temp
+32   1    25     286   78
+33   2    25     287   74
+34   3    25     242   67
+35   4    25     186   84
+36   5    25     220   85
+37   6    25     264   79
+38   7    29     127   82
+39   8    25     273   87
+40   9    71     291   90
+41  10    39     323   87
+42  11    25     259   93
+43  12    25     250   92
+44  13    23     148   82
+45  14    25     332   80
+46  15    25     322   79
+47  16    21     191   77
+48  17    37     284   72
+49  18    20      37   65
+50  19    12     120   73
+51  20    13     137   76
+52  21    25     150   77
+53  22    25      59   76
+54  23    25      91   76
+55  24    25     250   76
+56  25    25     135   75
+57  26    25     127   78
+58  27    25      47   73
+59  28    25      98   80
+60  29    25      31   77
+61  30    25     138   83
+
+
+ +

Note that we had to pass the list of columns as a vector (recall that the c() function is used to create a vector). By the way, we just retrieved all the columns except one: Month. So instead of writing a long string of column names, we can simply ask for all columns except Month. There is no equivalent in SQL. In R, you can write this:

+ +
+
+
Copy
+
+ +
+
+> subset(a, a$Month==6, select=-Month)
+   Ozone Solar.R Wind Temp Day
+32    25     286  8.6   78   1
+33    25     287  9.7   74   2
+34    25     242 16.1   67   3
+35    25     186  9.2   84   4
+
+36    25     220  8.6   85   5
+... output truncated...
+
+
+ +
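The select parameter can exclude more than one column, too; pass the columns to drop as a vector with a minus sign (a quick sketch):
+
+
Copy
+
+
+> subset(a, a$Month==6, select=-c(Month,Day))
+   Ozone Solar.R Wind Temp
+32    25     286  8.6   78
+33    25     287  9.7   74
+... output truncated ...
+
+
+ +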

With the "[]" notation, you can achieve the same objective by writing this:

+ +
+
+
Copy
+
+ +
+
+> a[a$Month==6,c("Day","Ozone","Solar.R","Temp")]
+   Day Ozone Solar.R Temp
+32   1    25     286   78
+33   2    25     287   74
+34   3    25     242   67
+... output truncated ...
+
+
+ +

Here you learned how to manipulate a data frame using the basic R syntax. Earlier in this article, you saw briefly how the sqldf function makes queries SQL-like, which makes the learning curve for database professionals much gentler. Actually, sqldf is designed for database access from within R. We will explore sqldf in more detail later in the series, in the installment dedicated to database access.

+ +

Summary

+ + +

Here is a quick recap of what you learned in this article.

+ + + +

To check the type of collection held in a variable x, use the class() function, or the specific tests you saw in this article, such as is.vector(x), is.list(x), and is.data.frame(x).

+ + + +

All data types can be operated on as a whole, for example, to multiply 2 by all the elements of a + vector, matrix, or array, just call 2 * variable_name.

+ +

While defining the collections, you can optionally supply column headers, or labels, as shown below: +

+ +
+
+
Copy
+
+ +
+
+ list1 <- list(col1=1,col2="A",col3=T)
+vector1 <- c(col1=1,col2=2,col3=3)
+
+
+ +

You can address the individual elements of the collection by position, for example, list1[1].

+ +

Or, you can access them by the column label, if that is defined: list1["col1"].

+ +

By default, the above will also show the column label. If you want the value alone, you use double + square brackets: list1[["col1"]].

+ +

For two-dimensional objects such as matrices and data frames, you need to use the [row,column] format, for example, matrix1[1,3] + to get the first row, third column.

+ +

If you want to get the entire first row, that is, all the columns, use matrix1[1,].

+ +

For lists and data frames, you can use the $ notation if the element or column labels have been created, for instance, df1$EmpId.

+ +

To add row and column names to an existing collection, you use the rownames() and colnames() functions, respectively.

+ +
+
+
Copy
+
+ +
+
+> matrix1
+     [,1] [,2] [,3] [,4] [,5] [,6]
+[1,]    1    6   11   16   21   26
+[2,]    2    7   12   17   22   27
+[3,]    3    8   13   18   23   28
+[4,]    4    9   14   19   24   29
+[5,]    5   10   15   20   25   30
+> colnames(matrix1) <- c("col1","col2","col3","col4","col5","col6")
+> rownames(matrix1) <- c("row1","row2","row3","row4","row5")
+
+
+ +

To add rows to a matrix or a data frame, you need to use the rbind() + function.

+ +

To add columns, you need to use the cbind() function.

+ +

To selectively get the values from a collection, you use the logical operator within square brackets. + For instance, to get all values in a matrix greater than 20, you use this:

+ +
+
+
Copy
+
+ +
+
+> matrix1 > 20
+      col1  col2  col3  col4 col5 col6
+row1 FALSE FALSE FALSE FALSE TRUE TRUE
+row2 FALSE FALSE FALSE FALSE TRUE TRUE
+row3 FALSE FALSE FALSE FALSE TRUE TRUE
+row4 FALSE FALSE FALSE FALSE TRUE TRUE
+row5 FALSE FALSE FALSE FALSE TRUE TRUE
+> matrix1[matrix1 > 20]
+[1] 21 22 23 24 25 26 27 28 29 30
+
+
+ +

That's it. Now it's time to test your understanding using the quiz.

+ +

Quiz

+ + +

Below are 10 questions followed by the answers.

+ +

Question 1: What will be the result of the following code? If this will result in a + syntax error, mention that.

+ +
+
+
Copy
+
+ +
+
+v1 <- 100
+v2 <- c(100)
+
+if (v1==v2) {
+  paste("They are same")
+} else {
+  paste("They are different")
+}
+
+
+ +

Question 2: What will be the output of the following code? If you think it will + result in a syntax error, mention that. Hint: the c() function creates a + vector and a vector can contain values of only one data type.

+ +
+
+
Copy
+
+ +
+
+> v2 <- c(1,T)
+> v2[2]
+
+
+ +

Question 3: Refer to the dataset airquality. Write an R + expression that pulls five rows at random and displays all the columns.

+ +

Question 4: What will be the output of the following expression? If this will + produce an error, say so.

+ +
+
+
Copy
+
+ +
+
+v1 <- list(a=1:10, b=1:20)
+
+
+ +

Question 5: I want to create a two-column matrix with column labels "a" + and "b" and with 10 rows. Here is what I proposed. Will this accomplish what I need or + produce a syntax error?

+ +
+
+
Copy
+
+ +
+
+v1 <- matrix(c(a=1:10, b=1:10))
+
+
+ +

Question 6: You need to create a three-dimensional data type of all integers. + What's the best collection type to use to do that?

+ +

Question 7: You want to check if v1 is a vector. It has + been defined earlier (although you are not aware of that) as follows:

+ +
+
+
Copy
+
+ +
+
+> v1 <- c(1:10)
+
+
+ +

So you give the following command:

+ +
+
+
Copy
+
+ +
+
+> class(v1)
+
+
+ +

Will that give you a confirmation of how the variable was defined?

+ +

Question 8: A variable v1 has been defined as follows: +

+ +
+
+
Copy
+
+ +
+
+v1 <- 7
+
+
+ +

However, you are not aware of that. To check if it's a vector, you use the following command:

+ +
+
+
Copy
+
+ +
+
+> is.vector(v1)
+[1] TRUE
+
+
+ +

Why does it show as a vector? It's just an integer.

+ +

Question 9: Here is a data frame.

+ +
+
+
Copy
+
+ +
+
+> df1
+     col1 col2 col3 col4 col5 col6
+row1    1    6   11   16   21   26
+row2    2    7   12   17   22   27
+row3    3    8   13   18   23   28
+row4    4    9   14   19   24   29
+row5    5   10   15   20   25   30
+
+
+ +

You want to extract row3 and two columns: col1 + and col2. You gave the following expression:

+ +
+
+
Copy
+
+ +
+
+> df1[row3,("col1","col2")]
+
+
+ +

Will it produce the desired result? If it will result in a syntax error, mention that.

+ +

Question 10: You defined a factor that is supposed to represent some sort of level + of seniority among employees.

+ +
+
+
Copy
+
+ +
+
+f1 <- factor(c(1,2,3,4))
+
+
+ +

So, 1 is less in seniority than 2 and so on. However, when you try to determine the minimum + seniority, you get this:

+ +
+
+
Copy
+
+ +
+
+> min(f1)
+Error in Summary.factor(1:4, na.rm = FALSE) : 
+  'min' not meaningful for factors
+
+
+ +

What happened? How can you eliminate the error?

+ +

Answers

+ + +

Answer 1: It will not produce a syntax error. Recall that behind the scenes, all + data types are essentially vectors. So you will get the message "They are same."

+ +

Answer 2: Note it carefully. This is a vector, which is what the c() function creates. You have assigned a number and a logical value (T). Elements of multiple data types are not allowed in vectors, so you might assume that this will fail with an error. However, it does not fail. R converts all the elements into a common data type that can accommodate every element. In this case that type is numeric, because logical values are converted to 0 (FALSE) or 1 (TRUE). So this vector is stored as c(1,1), and v2[2] yields 1.

+ +

Answer 3: To sample five values from a set of known rows, you have to use the sample() function, as shown below.

+ +
+
+
Copy
+
+ +
+
+sample(total_no_of_rows, 5)
+
+
+ +

To get the total number of rows, you have to use the nrow() function. So here is the expression to get five random row numbers out of the total number of rows:

+ +
+
+
Copy
+
+ +
+
+> sample(nrow(a),5)
+[1]  60  56 101  16  70
+
+
+ +

This gives a vector containing the indexes of five rows of the data frame. So we can use that vector to pull those rows from the data frame.

+ +
+
+
Copy
+
+ +
+
+> a[sample(nrow(a),5),]
+    Ozone Solar.R Wind Temp Month Day
+24     32      92 12.0   61     5  24
+43     25     250  9.2   92     6  12
+144    13     238 12.6   64     9  21
+40     71     291 13.8   90     6   9
+12     16     256  9.7   69     5  12
+
+
+ +

Answer 4: It will not produce an error. Remember, lists can contain items of different data types and sizes. So this will be a two-element list whose elements are a 10-element vector and a 20-element vector.

+ +

Answer 5: It will not produce a syntax error, but it will not yield what you needed. The expression c(a=1:10, b=1:10) will merely create a 20-element vector. Here is a demonstration:

+ +
+
+
Copy
+
+ +
+
+> v1 <- c(a=1:10, b=1:10)
+> v1
+a1 a2 a3 a4 a5 a6 a7 a8 a9 a10 b1 b2 b3 b4 b5 b6 b7 b8 b9 b10
+ 1  2  3  4  5  6  7  8  9  10  1  2  3  4  5  6  7  8  9  10
+
+
+ +

So, the expression in the question is equivalent to this:

+ +
+
+
Copy
+
+ +
+
+matrix(1 through 10 and 1 through 10 again)
+
+
+ +

It will be a single-column matrix, because matrix() defaults to a single column when neither nrow nor ncol is specified. To create a two-column matrix, I need to use this:

+ +
+
+
Copy
+
+ +
+
+> v1 <- matrix(c(a=1:10, b=1:10), ncol=2)
+> v1
+     [,1] [,2]
+[1,]    1    1
+[2,]    2    2
+[3,]    3    3
+[4,]    4    4
+[5,]    5    5
+[6,]    6    6
+[7,]    7    7
+[8,]    8    8
+[9,]    9    9
+[10,]  10   10
+
+
+ +

And then I can name the columns using the following:

+ +
+
+
Copy
+
+ +
+
+> colnames(v1) <- c("a","b")
+
+
+ +

Answer 6: The answer is simple: it's the array collection type. A matrix allows only two dimensions.

+ +

Answer 7: No; it will not. class(v1) will show integer, the data type of the elements of the vector. To know whether it's a vector, you have to use the is.vector() function.

+ +
+
+
Copy
+
+ +
+
+> is.vector(v1)
+[1] TRUE
+
+
+ +

Answer 8: The atomic data types are also vectors under the covers. So v1 is a vector of one element.

+ +

Answer 9: It will result in a syntax error. The correct syntax is this:

+ +
+
+
Copy
+
+ +
+
+df1[c("row3"),c("col1","col2")]
+
+
+ +

Note that the row and column names are quoted and that the two columns are passed as a vector.

+ +

Answer 10: The factor has to be ordered. The default is unordered and, therefore, it + is not possible to have a comparative relationship. This should have been defined as follows:

+ +
+
+
Copy
+
+ +
+
+> f1 <- factor(c(1,2,3,4), ordered = T)
+> min(f1)
+[1] 1
+Levels: 1 < 2 < 3 < 4
+
+
+ +

About the Authors

+ +

Arup Nanda arup@proligence.com has been an Oracle DBA since 1993, handling all aspects of database administration, from performance tuning to security and disaster recovery. He was Oracle Magazine's DBA of the Year in 2003 and received an Oracle Excellence Award for Technologist of the Year in 2012.

+
+
+ + + + \ No newline at end of file diff --git a/Articles/databases/119-learning-r-for-pl-sql-developers-part-2.html b/Articles/databases/119-learning-r-for-pl-sql-developers-part-2.html index e69de29..f940be6 100644 --- a/Articles/databases/119-learning-r-for-pl-sql-developers-part-2.html +++ b/Articles/databases/119-learning-r-for-pl-sql-developers-part-2.html @@ -0,0 +1,2885 @@ + + +
+
+

Part 1 | Part 2 | Part 3 of a series that presents an + easier way to learn R by comparing and contrasting it to PL/SQL.

+ +

Welcome to the second installment of this series. In this installment, you will learn about more advanced + concepts of the R language, such as evaluating conditions, executing loops, and creating program units such + as functions.

+ + +

Reading Input

+ +

But before we start, let's explore a rather trivial activity in any interactive program: accepting + input from the user at runtime. The R function for that is called readline(). It + prompts the user for input and stores what it reads as a value. Here is an example:

+ +
+
+
Copy
+
+ +
+
+> v1 <- readline(prompt = "what's the good word> ")
+what's the good word> 
+
+
+ +

You can enter a value at the blinking cursor. Suppose you enter 1. The variable + v1 will be assigned the value of 1. If you type v1, + you will see the value. Remember from Part 1 that you can just type the variable name and the value will be displayed. There is + no need for a print() function. +

+ +
+
+
Copy
+
+ +
+
+> v1
+[1] "1"
+
+
+ +

But note the double quotes. This is a character string. You can confirm that by using the class() function, which you learned about in Part 1:

+ +
+
+
Copy
+
+ +
+
+> class(v1)
+[1] "character"
+
+
+ +

If you want to use the value as a number, then you must convert it to a number by using the as.numeric() function, which you learned about in Part 1, or the as.integer() function. Here is an example:

+ +
+
+
Copy
+
+ +
+
+> v1 <- as.numeric(v1)
+
+
+ +

Confirm that it's a numeric value now:

+ +
+
+
Copy
+
+ +
+
+> class(v1)
+[1] "numeric"
+
+
+ +

Alternatively, you can use this:

+ +
+
+
Copy
+
+ +
+
+> v1 <- as.integer(v1)
+> class(v1)
+[1] "integer"
+
+
+ + +

What IF

+ +

As in PL/SQL, the most basic conditional operation in R is the + IF statement. In PL/SQL, the general structure of the IF condition is: +

+ +
+
+
Copy
+
+ +
+
+IF conditional expression THEN
+  some statements
+ELSE 
+  some statements
+END IF;
+
+ +

Here is some sample code:

+ +
+
+
Copy
+
+ +
+
+-- pl1.sql
+
+declare
+  v1 number;
+begin
+  v1 := &inputvalue;
+  if (v1 < 101) then
+    dbms_output.put_line('v1 is less than 101');
+  end if;
+end;
+
+
+ +

Here is the output (after you input 100 at the prompt):

+ +
+
+
Copy
+
+ +
+
+Enter value for inputvalue: 100
+old 4: v1 := &inputvalue;
+new 4: v1 := 100;
+v1 is less than 101
+
+
+ +

Here is how you write the same logic in R:

+ +
+
+
Copy
+
+ +
+
+#r1.txt
+
+v1 <- as.integer(readline(prompt = "enter a number> "))
+if (v1<101)
+{
+   print ("v1 is less than 101")
+}
+
+
+ +

Here is the output:

+ +
+
+
Copy
+
+ +
+
+[1] "v1 is less than 101"
+
+
+ +

There are a few things to note here before we go further:

+ + + + +

Or, ELSE ...

+ +

When there is IF, there must be ELSE. In PL/SQL, you + would write it like this:

+ +
+
+
Copy
+
+ +
+
+--pl2.sql
+
+declare
+  v1 number;
+begin
+  v1 := &inputvalue;
+  if (v1 < 101) then
+    dbms_output.put_line('v1 is less than 101');
+  else
+    dbms_output.put_line('v1 is not less than 101');
+  end if;
+end;
+
+
+ +

The R syntax is also the same (else), but there is a catch you should be + aware of. Here is the equivalent R code:

+ +
+
+
Copy
+
+ +
+
+v1 <- 100
+if (v1 < 101)
+{
+   print ('v1 is less than 101')
+print ('Not Indented')
+} else
+{
+   print ('v1 is greater than 101')
+      print ('Way Too much Indented')
+}
+
+
+ +

I used the indentation messages to show you that indentation is not important in R, just as in PL/SQL. But I + want to show you a very important differentiator. Note the presence of a curly brace before the else in the code above. The ending curly brace of the IF + block tells R that the statement is complete and can be evaluated. If you have an ELSE, + it must come on the same line as that closing curly brace; otherwise, the R interpreter will not be able + to evaluate the else. Note what happens when you put else on the next line:

+ +
+
+
Copy
+
+ +
+
+# r2a.txt
+v1 <- 100
+if (v1<=100)
+{
+   print ('v1 is less than or equal to 100')
+print ('Not Indented')
+} 
+else 
+{
+   print ('v1 is greater than 100')
+      print ('Way Too much Indented')
+}
+
+
+ +

Output:

+ +
+
+
Copy
+
+ +
+
+[1] "v1 is less than or equal to 100"
+[1] "Not Indented"
+Error: unexpected 'else' in "else"
+Execution halted
+
+
+ +

The else was not properly handled, because it was not on the same line as the + ending curly brace of the IF. This is a very important syntax difference from + PL/SQL that you should be aware of. Developers familiar with other languages where curly + braces are used as well, such as C, often make this mistake.
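For completeness, here is a corrected version of the previous script (my own restatement, saved under the hypothetical name r2b.txt), with else moved onto the same line as the closing curly brace:

Copy

+# r2b.txt (corrected version of r2a.txt; illustrative only)
+v1 <- 100
+if (v1 <= 100)
+{
+   print ('v1 is less than or equal to 100')
+print ('Not Indented')
+} else
+{
+   print ('v1 is greater than 100')
+      print ('Way Too much Indented')
+}

Sourcing this version prints the two messages from the first branch, and no error is raised.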

+ +

What if you want to put another condition? There could be another IF after the + ELSE: +

+ +
+
+
Copy
+
+ +
+
+#r3.txt
+v1 <- 100
+if (v1 < 100)
+{
+   print ('v1 is less than 100')
+} else 
+  if (v1 == 100)
+{
+   print ('v1 is equal to 100')
+} else
+{
+   print ('v1 is greater than 100')
+}
+
+
+ +

Executing that code, we get this:

+ +
+
+
Copy
+
+ +
+
+C:\>rscript r3.txt
+[1] "v1 is equal to 100"
+
+
+ + +

Null is Nothing

+ +

When you have to put in a statement but nothing needs to be done, you usually use a NULL statement in PL/SQL.

+ +
+
+
Copy
+
+ +
+
+--pl4.sql
+declare
+  x number := 10;
+  y number := 11;
+begin
+  if (x<y) then
+    dbms_output.put_line('Yes');
+  else
+    null;
+  end if;
+end;
+/
+
+ +

The null statement in line 9 is required. You have to put at least one valid PL/SQL statement + between the ELSE and END IF keywords. Otherwise, the + code will produce an error. In R, you don't need to put anything between the curly braces. The following + is the equivalent code in R.

+ +
+
+
Copy
+
+ +
+
+# r4.txt
+
+x <- 10
+y <- 11
+if (x<y) {
+  print ("Yes")
+} else
+{
+}
+
+
+ + +

Let's Make a Case

+ +

The CASE statement of PL/SQL is pretty powerful. It allows you to define several + conditions in one expression. Here is an example.

+ +
+
+
Copy
+
+ +
+
+--pl5.sql
+
+declare
+  n1 number;
+begin
+  n1 := 5;
+  case
+    when (n1<=25) then
+      dbms_output.put_line('n1 is within 25');
+    when (n1<=50) then
+      dbms_output.put_line('n1 is within 50');
+    when (n1<=75) then
+      dbms_output.put_line('n1 is within 75');
+    else
+      dbms_output.put_line('n1 is greater than 75');
+  end case;
+end;
+/
+
+
+ +

There is an equivalent of the CASE statement in R. It's called the switch() function. But unlike CASE, the switch() function behaves differently depending on whether the first argument is an integer or + a character string. Let's first see the effect with an integer input:

+ +
+
+
Copy
+
+ +
+
+> v1 <- switch (1,'first','second','third','fourth')
+> v1
+[1] "first"
+
+
+ +

The switch() function returned "first" because + the first argument is the integer 1. Therefore, switch() picked the + first value from the list of choices. Similarly, if you pass other numbers, switch() will return the corresponding values.

+ +
+
+
Copy
+
+ +
+
+# r5.txt
+
+> v1 <- switch (2,'first','second','third','fourth')
+> v1
+[1] "second"
+> v1 <- switch (3,'first','second','third','fourth')
+> v1
+[1] "third"
+> v1 <- switch (4,'first','second','third','fourth')
+> v1
+[1] "fourth"
+
+
+ +

What if you pass a number for which there is no corresponding selection, for example, 0? Let's see:

+ +
+
+
Copy
+
+ +
+
+# r5a.txt
+
+> v1 <- switch (0,'first','second','third','fourth')
+> v1
+NULL
+> v1 <- switch (5,'first','second','third','fourth')
+> v1
+NULL
+
+
+ +

Note the output, which shows NULL. In Part 1, you + saw a value called NA, which was roughly the equivalent of NULL in PL/SQL. The R NULL has no clear PL/SQL equivalent; + it represents the absence of a value.
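To make the distinction concrete, here is a small sketch of my own; note that NULL has zero length and simply means "no value", while NA is a real value that means "missing":

Copy

+> v1 <- NULL
+> is.null(v1)
+[1] TRUE
+> length(NULL)
+[1] 0
+> length(NA)
+[1] 1
+> is.na(NA)
+[1] TRUE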

+ +

The first argument does not have to be an integer. It could also be an implied integer, for example, + a boolean value. Remember, boolean TRUE and FALSE + evaluate to 1 and 0, respectively. If the first argument is an expression that results in a boolean value, + it is converted to an integer. If the expression is 3 < 5, the result will be + TRUE, that is, 1; so switch will pick the first + value from the choices: +

+ +
+
+
Copy
+
+ +
+
+# r5b.txt
+
+> v1 <- switch (3<5,'first','second','third','fourth')
+> v1
+[1] "first"
+
+
+ +

Unfortunately, this doesn't work in all cases. What if the expression evaluates to FALSE? It will then be converted to 0; but there is no 0th choice. So it will + return NULL:

+ +
+
+
Copy
+
+ +
+
+# r5c.txt
+
+> v1 <- switch (3>5,'first','second','third','fourth')
+> v1
+NULL
+
+
+ +

The switch() function works differently when the first input is a character string. In this + format, you provide a return value for each possible input value. You can also provide a + default value for when none of the input values match. For instance, here is an R expression that returns + the position of a vowel in the vowel sequence, that is, 1 for a, 2 for e, and so on. It should return 0 if the input is not a + vowel.

+ +
+
+
Copy
+
+ +
+
+# r6.txt
+
+> v1 <- switch('a',a=1,e=2,i=3,o=4,u=5,0)
+> v1
+[1] 1
+> v1 <- switch('z',a=1,e=2,i=3,o=4,u=5,0)
+> v1
+[1] 0
+
+
+ +

This format of the switch() function is closer to the CASE statement in PL/SQL when there is a default value.

+ + +

FOR the Love of Loop

+ +

Consider the usual looping code in PL/SQL using the FOR ... LOOP ... END LOOP + construct. The general structure is this:

+ +
+
+
Copy
+
+ +
+
+FOR i IN StartingNumber .. EndingNumber LOOP
+   ...
+END LOOP;
+
+
+ +

The R equivalent also uses for; but there is no LOOP + keyword and, consequently, no END LOOP either. As in the case of the if statement, the end of the conditional expression is marked by the curly brace + start. Also, like the if statement, the block of program statements to be + repeated is identified by being enclosed inside curly braces.
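As a bare-bones illustration of the shape of the construct (a sketch of my own, using a simple vector; the seq() function is introduced next), the following loop prints 1, 2, and 3, one per line:

Copy

+for (i in 1:3) {
+  print(i)
+}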

+ +

Before we go further with loops, we need to know how to create a range of numbers with a start and + end. That is done with the seq() function.

+ +
+
+
Copy
+
+ +
+
+> seq(10,20)
+[1] 10 11 12 13 14 15 16 17 18 19 20
+
+
+ +

Let's see a very simple loop that generates the 11 values from 10 to 20 and displays them one by one:

+ +

Here's PL/SQL code:

+ +
+
+
Copy
+
+ +
+
+-- pl7.sql
+
+begin
+  for i in 10..20 loop
+    dbms_output.put_line('i= '||i);
+  end loop;
+end;
+
+
+ +

Here's the R code:

+ +
+
+
Copy
+
+ +
+
+for (i in seq(10, 20)) {
+  print(i)
+}
+
+
+ +

Here's the output:

+ +
+
+
Copy
+
+ +
+
+[1] 10
+[1] 11
+[1] 12
+[1] 13
+[1] 14
+[1] 15
+[1] 16
+[1] 17
+[1] 18
+[1] 19
+[1] 20
+
+
+ +

The seq() function is very useful in R. You will be using it a lot to create data, + especially when trying to fit models. Let's see some more parameters of this function. If you want to + skip values, you can pass the optional third parameter to specify the increment (step). Let's say we want to + go from 10 to 20 in steps of 2.

+ +
+
+
Copy
+
+ +
+
+> seq(10,20,2)
+[1] 10 12 14 16 18 20
+
+
+ +

Similarly, we can use the third parameter to count downward. To produce numbers from 20 to 10, + decrementing by 1, the third parameter should be -1:

+ +
+
+
Copy
+
+ +
+
+> seq(20,10,-1)
+[1] 20 19 18 17 16 15 14 13 12 11 10
+
+
+ + +

Give Me a Break

+ +

If you want to break out of a loop, just use the break statement, which is + the equivalent of the EXIT statement in PL/SQL. Suppose you want to read a number and + check whether any number between 10 and 20 is a multiple of it; you could use the following. In R, %% is the modulo operator, equivalent to the mod() + function in PL/SQL. The expression v1 %% v2 returns 0 if v1 is a multiple of v2. You want to iterate through the + loop from 10 to 20, but stop when you find a multiple. The break statement comes + in there.

+ +
+
+
Copy
+
+ +
+
+# r8.txt
+
+n <- as.integer(readline("Enter a number> "))
+for (i in seq(10, 20)) {
+  print(i)
+  if (i%%n == 0)
+  {
+     break
+  }  
+}
+
+
+ +

Executing it produces the following:

+ +
+
+
Copy
+
+ +
+
+> source ("r8.txt")
+
+Enter a number> 7
+[1] 10
+[1] 11
+[1] 12
+[1] 13
+[1] 14
+
+
+ + +

Looping While

+ +

The second type of loop we will cover is a variant of FOR but without a start and + end: the WHILE loop. It allows you to loop as long as a condition is met (the + condition can be set to always be true for a forever loop). Here is an example that prints the numbers 0 through 10. +

+ +

PL/SQL code:

+ +
+
+
Copy
+
+ +
+
+-- pl9.sql
+declare
+  i number := 0;
+begin
+  while (i<11) loop
+     dbms_output.put_line('i= '||i);
+     i := i+1;
+  end loop;
+end;
+/
+
+ +

The output:

+ +
+
+
Copy
+
+ +
+
+i= 0
+i= 1
+i= 2
+i= 3
+i= 4
+i= 5
+i= 6
+i= 7
+i= 8
+i= 9
+i= 10
+
+
+ +

In R, the syntax is the same: while. Like the FOR loop, + the code inside the WHILE loop is marked by the curly braces, equivalent to + the LOOP and END LOOP markers of PL/SQL. Like PL/SQL, the + indentation is merely for readability; it is not part of the syntax.

+ +
+
+
Copy
+
+ +
+
+# r9.txt
+i <- 0
+while (i<11) {
+  print(i)
+  i <- i+1
+}
+
+
+ +

The output:

+ +
+
+
Copy
+
+ +
+
+[1] 0
+[1] 1
+[1] 2
+[1] 3
+[1] 4
+[1] 5
+[1] 6
+[1] 7
+[1] 8
+[1] 9
+[1] 10
+
+
+ + +

Breaking from the While Loop

+ +

Suppose you want to put a condition in the loop that will make the program break away from the loop when the + condition is satisfied. For instance, in the previous program, you want to break from the loop when the + variable i is a multiple of 5. In PL/SQL, you can do that in two different ways, as listed below: +

1. Use an EXIT WHEN statement with the condition.
+
2. Use an IF statement that checks the condition and issues EXIT.

Functionally they are the same. In R, the keyword break stops the loop from + executing and jumps to the first line after the loop. We will examine the approaches in both + languages.

+ +

In PL/SQL using approach 1:

+ +
+
+
Copy
+
+ +
+
+--pl10a.sql
+declare
+  i number := 1;
+begin
+  while (i<11) loop
+    exit when mod (i,5) = 0;
+    dbms_output.put_line('i= '||i);
+    i := i+1;
+  end loop;
+end;
+/
+
+ +

The output:

+ +
+
+
Copy
+
+ +
+
+i= 1
+i= 2
+i= 3
+i= 4
+
+
+ +

In PL/SQL using approach 2:

+ +
+
+
Copy
+
+ +
+
+--pl10b.sql
+declare
+  i number := 1;
+begin
+  while (i<11) loop
+     dbms_output.put_line('i= '||i);
+     i := i+1;
+     if mod (i,5) = 0 then
+        exit;
+     end if;
+  end loop;
+end;
+/
+
+ +

In either approach, the output is the same here, but the approaches are different and can + behave differently. In the first approach, the condition for breaking is checked immediately at the start of + the loop. In the second approach, it's evaluated only after the work is done and the counter is incremented. So you have to be + careful when coding either approach. The difference in logic might be subtle, but it is important and can + introduce bugs in a program.
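To make the difference concrete, here is a small R sketch of my own that runs the same loop twice, once checking the break condition before the work and once after. Starting at 5, the first version prints nothing, while the second prints 5 through 9:

Copy

+# check before the work (analogous to EXIT WHEN at the top of the loop)
+i <- 5
+while (i < 11) {
+  if (i %% 5 == 0) {
+    break
+  }
+  print(i)
+  i <- i + 1
+}
+
+# do the work first, then check (analogous to IF ... EXIT at the bottom)
+i <- 5
+while (i < 11) {
+  print(i)
+  i <- i + 1
+  if (i %% 5 == 0) {
+    break
+  }
+}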

+ +

In R, there is no equivalent of the first version. The second version is what you would use in R. You already + saw an example in the FOR loop. Let's see the same for the WHILE loop. Here is the R code. Execute it yourself on the R command line to see the + results.

+ +
+
+
Copy
+
+ +
+
+# r10.txt
+i <- 1
+while (i<11) {
+  print(i)
+  if (i%%5 == 0) {
+    break
+  }
+  i <- i+1
+}
+
+
+ + +

Repeat Until I Break

+ +

You might have seen another case where you need to repeat the loop indefinitely until a break condition + comes in. In those cases, a WHILE loop with a condition that always evaluates to + TRUE will help. Here is an example in PL/SQL: +

+ +
+
+
Copy
+
+ +
+
+WHILE (TRUE) LOOP
+   ...
+   IF Condition THEN
+      EXIT;
+   END IF;
+END LOOP;
+
+
+ +

In R, you can write it the same way:

+ +
+
+
Copy
+
+ +
+
+while (TRUE)
+{
+   if (Condition)
+   {
+       break
+   }
+}
+
+
+ +

There is a simpler way in R: using the repeat clause. Here is an example:

+ +
+
+
Copy
+
+ +
+
+# r11.txt
+n <- as.integer(readline("Enter an integer> "))
+i <- 1
+repeat {
+  if (i == n) 
+  {
+     cat("It took me ", i, " iterations to find your number\n")
+     break
+  }
+  i <- i+1
+}
+
+
+ +

Executing it produces the following:

+ +
+
+
Copy
+
+ +
+
+Enter an integer> 29
+It took me 29 iterations to find your number
+
+
+ + +

Next, Please

+ +

Another element of looping is the next statement. This allows you to skip the rest of a + loop iteration based on a condition. Let's take the same example we saw for the break + statement. But let's say we don't want to count multiples of 5. In other words, we count how many + iterations we had to do to reach the number entered by the user, but we will not count iterations 5, 10, + 15, and so on.

+ +
+
+
Copy
+
+ +
+
+# r12.txt
+n <- as.integer(readline("Enter an integer> "))
+i <- 0
+j <- 1
+repeat {
+  i <- i+1
+  if (i %% 5 == 0)
+  {
+     next
+  }
+  if (i == n) 
+  {
+     cat("It took me ", j, " iterations to find your number\n")
+     break
+  }
+  j <- j+1
+}
+
+
+ +

Executing it produces this:

+ +
+
+
Copy
+
+ +
+
+> source('r12.txt')
+Enter an integer> 29
+It took me 24 iterations to find your number
+
+
+ + +

Let's Continue

+ +

Remember the PL/SQL continue statement? It is used inside a loop to instruct the + program to skip to the end of the current iteration and continue with the rest of the loop iterations as usual. It + is equivalent to the next statement in R. Let's see a small example: +

+ +

The PL/SQL code:

+ +
+
+
Copy
+
+ +
+
+-- pl13.sql
+declare
+  mynum number := 3;
+begin
+  for i in 1..10 loop
+    if mod (i,mynum) = 0 then
+      dbms_output.put_line('multiple found as '||i);
+      continue;
+      dbms_output.put_line('we are continuing');
+    end if;
+    dbms_output.put_line ('No multiple found as '||i);
+  end loop;
+end;
+/
+
+
+ +

Executing the code:

+ +
+
+
Copy
+
+ +
+
+No multiple found as 1
+No multiple found as 2
+multiple found as 3
+No multiple found as 4
+No multiple found as 5
+multiple found as 6
+No multiple found as 7
+No multiple found as 8
+multiple found as 9
+No multiple found as 10
+
+
+ +

The R code:

+ +
+
+
Copy
+
+ +
+
+# r13.txt
+mynum <- 3
+for (i in 1:10)
+{
+   if (i%%mynum == 0)
+   { 
+      cat ("Multiple found as ", i, "\n")
+      next
+   }
+   cat ("No multiple found as ", i, "\n")
+}
+
+
+ +

Executing the R code:

+ +
+
+
Copy
+
+ +
+
+> source('r13.txt')
+No multiple found as 1
+No multiple found as 2
+Multiple found as 3
+No multiple found as 4
+No multiple found as 5
+Multiple found as 6
+No multiple found as 7
+No multiple found as 8
+Multiple found as 9
+No multiple found as 10
+
+
+ + +

Functions

+ +

As is the case in most languages, R provides repeatable code segments, similar to procedures and functions in + PL/SQL. As you already know, in PL/SQL, a procedure does not return anything (although it can have an OUT parameter; but that's not a return, so it's not the same thing), and a + function returns a single value. In R, the equivalent of both PL/SQL procedures and functions is called + simply a function. An R function may or may not return anything.

+ +

In this article, we will cover how to write functions and use them in your programs. As in Part 1 of this + series, we will see how to do something in PL/SQL and then do the equivalent in R.

+ +

A function definition in PL/SQL has this general syntax format:

+ +
+
+
Copy
+
+ +
+
+function FunctionName (
+  Parameter1Name in DataType,
+  Parameter2Name in DataType,
+  ...
+)
+return ReturnDatatype
+is
+   localVariable1 datatype;
+   localVariable2 datatype;
+begin
+   ... function code ...
+   return ReturnVariable;
+end;
+
+
+ +

A procedure definition in PL/SQL has this general syntax:

+ +
+
+
Copy
+
+ +
+
+procedure ProcedureName (
+ Parameter1Name in DataType,
+ Parameter2Name in DataType,
+...
+) 
+is
+   localVariable1 datatype;
+   localVariable2 datatype;
+begin
+... procedure code ...
+end;
+
+
+ +

The PL/SQL function definition syntax is, in my opinion, somewhat convoluted; the rest of the body + is pretty standard. R follows this simpler syntax:

+ +
+
+
Copy
+
+ +
+
+FunctionName <- function (Parameter1Name, Parameter2Name, ...) {
+  ... function code ...
+  return(ReturnVariable)
+}
+
+
+ +

Note some important properties of the R function definition compared to the PL/SQL equivalent:

+ + + +

Now that you've got the basic idea about the syntax vis-à-vis PL/SQL, let's start with a very + simple procedure in PL/SQL that accepts a principal amount and interest rate, computes the interest amount + and the new principal after the interest is added, and displays the new principal.

+ +

Here is how we do it in PL/SQL. Note that I deliberately chose to use the R naming convention, for example, + pPrincipal, not a PL/SQL-style variable name such as p_principal. +

+ +

PL/SQL code:

+ +
+
+
Copy
+
+ +
+
+-- pl14.sql
+declare procedure calcInt (
+  pPrincipal number,
+  pIntRate number
+) is
+  newPrincipal number;
+begin
+  newPrincipal := pPrincipal * (1+(pIntRate/100));
+  dbms_output.put_line ('New Principal is '||newPrincipal);
+end;
+
+begin
+calcInt(100,10);
+end;
+/
+
+
+ +

Here is the output:

+ +
+
+
Copy
+
+ +
+
+New Principal is 110
+
+
+ +

R code:

+ +
+
+
Copy
+
+ +
+
+# r14.txt
+
+calcInt <- function (pPrincipal, pIntRate)
+{
+  newPrincipal <- pPrincipal * (1+(pIntRate/100))
+  paste("New Principal is ",as.character(newPrincipal))
+}
+
+
+ +

We save this as r14.txt and call it using the source() + function you learned about in Part 1 of this series.

+ +
+
+
Copy
+
+ +
+
+> source('r14.txt')
+> calcInt(100,10)
+[1] "New Principal is 110"
+
+
+ +

Default Value of Parameters

+ +

Sometimes you need to pass a default value to a parameter. This value is in effect if the user does not + explicitly pass the parameter. Building on the previous procedure, suppose we want to make the parameter + pIntRate optional, that is, make it a certain value (such as 5) when the user + does not explicitly mention it. In PL/SQL, you mention the parameter this way: +

+ +

ParameterName DataType := DefaultValue

+ +

In R, it's almost the same, but default values in a function definition are specified with the equals sign (=), not :=. Also, + remember, you don't specify the data type of the parameters. Here is the general syntax:

+ +

ParameterName = DefaultValue

+ +

You can write the PL/SQL procedure this way (the changes are in bold):

+ +
+
+
Copy
+
+ +
+
+--pl15.sql
+declare
+  procedure calcInt (
+     pPrincipal number,
+     pIntRate number := 5
+) is
+  newPrincipal number;
+begin
+  newPrincipal := pPrincipal *(1+(pIntRate/100));
+  dbms_output.put_line('New Principal is '||newPrincipal);
+end;
+
+begin
+-- don't mention the pIntRate parameter.
+-- defaults to 5
+calcInt(100);
+end;
+/
+
+ +

R code:

+ +
+
+
Copy
+
+ +
+
+# r15.txt
+calcInt <- function (pPrincipal, pIntRate = 5)
+{
+  newPrincipal <- pPrincipal * (1+(pIntRate/100))
+  paste("New Principal is ",as.character(newPrincipal))
+}
+
+
+ +

One important property of functions in R is that the default values can be variables as well. This is not + possible in PL/SQL. For instance, in PL/SQL the following will be illegal:

+ +
+
+
Copy
+
+ +
+
+-- pl16.sql
+declare
+  defIntRate number := 5;
+procedure calcInt (
+  pPrincipal number,
+  pIntRate number := defIntRate;
+) is
+ 
+...
+
+
+ +

But it's perfectly valid in R. Let's see how:

+ +
+
+
Copy
+
+ +
+
+# r16.txt
+defIntRate <- 5
+calcInt <- function (pPrincipal, pIntRate = defIntRate)
+{
+  newPrincipal <- pPrincipal * (1+(pIntRate/100))
+  paste("New Principal is ",as.character(newPrincipal))
+}
+
+
+ +

The variable defIntRate dynamically influences the operation of the function. If + you change the value of this variable, the behavior of the function changes as well. Consider the following example:

+ +
+
+
Copy
+
+ +
+
+> calcInt(100)
+[1] "New Principal is 105"
+
+
+ +

Now let's change the value of this variable to 10 and re-execute this function.

+ +
+
+
Copy
+
+ +
+
+> defIntRate <- 10
+> calcInt(100)
+[1] "New Principal is 110"
+
+
+ +

The new value of the variable took effect in the function.

+ +

Positional Parameters

+ +

You already know that in PL/SQL, you do not have to provide parameter values in the order in which the + parameters were defined in the procedure. You can pass values by specifying each parameter by name. For + instance, if a procedure F1 takes the parameters P1 + and P2, in that order, you can call the procedure this way + with the parameter values Val1 and Val2, respectively: +

+ +
+
+
Copy
+
+ +
+
+F1 (Val1, Val2);
+
+
+ +

But you can also call them with explicit parameter name assignments:

+ +
+
+
Copy
+
+ +
+
+F1 (P2 => Val2, P1 => Val1);
+
+
+ +

This explicit naming allows you to order the parameters any way you want when calling the procedure. It also + allows you to skip some non-mandatory parameters. In R, the equivalent syntax is this:

+ +
+
+
Copy
+
+ +
+
+F1 (P2=Val2, P1=Val1)
+
+
+ +

So, just the parameter association operator (=>) is changed to the equals sign (=). Let's see examples in both PL/SQL and R.

+ +

PL/SQL example:

+ +
+
+
Copy
+
+ +
+
+--pl17.sql
+declare
+  procedure calcInt (
+    pPrincipal number,
+    pIntRate number := 5
+) is
+  newPrincipal number;
+begin
+  newPrincipal := pPrincipal *(1+(pIntRate/100));
+  dbms_output.put_line('New Principal is '||newPrincipal);
+end;
+
+begin
+  calcInt(pIntRate=>10, pPrincipal=>100);
+end;
+/
+
+ +

The output is this:

+ +
+
+
Copy
+
+ +
+
+New Principal is 110
+
+
+ +

R example:

+ +
+
+
Copy
+
+ +
+
+# r17.txt
+calcInt <- function (pAccType = "Savings", pPrincipal, pIntRate = 5)
+{
+  vIntRate <- pIntRate
+  if (pAccType == "Savings")
+  {
+    # eligible for bonus int rate
+    vIntRate <- pIntRate + 5
+  }
+  newPrincipal <- pPrincipal * (1+(vIntRate/100))
+  paste("New Principal is ",as.character(newPrincipal))
+}
+
+
+ +

Executing the R code produces this:

+ +
+
+
Copy
+
+ +
+
+> calcInt(pPrincipal=100)
+[1] "New Principal is 110"
+> calcInt(pPrincipal=100, pAccType = "Savings")
+[1] "New Principal is 110"
+> calcInt(pPrincipal=100, pAccType = "Checking")
+[1] "New Principal is 105"
+
+
+ +

One of the useful cases in PL/SQL is to define a default value only when the value is not explicitly + provided. Take, for instance, the case where the user didn't specify anything for the interest rate and you want + the default value to be based on something else, for example, the account type. If the account type is + Savings (the default), the interest rate should be 10 percent; otherwise, it should be 5 percent. + Here is how you will need to write the procedure:

+ +
+
+
Copy
+
+ +
+
+-- pl18.sql
+declare
+  procedure calcInt (
+    pPrincipal number,
+    pIntRate number := null,
+    pAccType varchar2 := 'Savings'
+  ) is
+    newPrincipal number;
+    vIntRate number;
+  begin
+    if (pAccType = 'Savings') then
+      if (pIntRate is null) then
+         vIntRate := 10;
+      else
+         vIntRate := pIntRate;
+      end if;
+    else
+      if (pIntRate is null) then
+        vIntRate := 5;
+      else
+        vIntRate := pIntRate;
+      end if;
+    end if;
+    newPrincipal := pPrincipal * (1+(vIntRate/100));
+    dbms_output.put_line('New Principal is '|| newPrincipal);
+  end;
+begin
+  calcInt(100);
+  calcInt(100, pAccType => 'Checking');
+end;
+/
+
+ +

The equivalent of the following line:

+ +
+
+
Copy
+
+ +
+
+pIntRate       number := null,
+
+
+ +

in R is this:

+ +
+
+
Copy
+
+ +
+
+pIntRate = NULL
+
+
+ +

The R equivalent of IS NULL in PL/SQL is the function is.null(). Here is the complete R example (note the capitalization of + "NULL"):

+ +
+
+
Copy
+
+ +
+
+# r18.txt
+calcInt <- function (pAccType = "Savings", pPrincipal, pIntRate = NULL)
+{
+  if (is.null(pIntRate)) 
+  {
+     vIntRate <- 5
+  } else
+  {
+    vIntRate <- pIntRate
+  }
+  if (pAccType == "Savings")
+  {
+     # eligible for bonus int rate
+     vIntRate <- vIntRate + 5
+  }
+  newPrincipal <- pPrincipal * (1+(vIntRate/100))
+  paste("New Principal is ",as.character(newPrincipal))
+}
+
+
+ +

Executing the R code produces this:

+ +
+
+
Copy
+
+ +
+
+> source('r18.txt')
+> calcInt(pPrincipal=100, pAccType = "Checking")
+[1] "New Principal is 105"
+> calcInt(pPrincipal=100, pAccType = "Checking", pIntRate = 10)
+[1] "New Principal is 110"
+> calcInt(pPrincipal=100, pAccType = "Checking", pIntRate = NULL)
+[1] "New Principal is 105"
+
+
+ +

Returning Values

+ +

So far, we have talked about procedures in PL/SQL, which do not return anything. In contrast, functions in + PL/SQL return a value. Here is a simple example of a function that returns the interest rate for the account + type, which is the parameter passed to it:

+ +
+
+
Copy
+
+ +
+
+--pl19
+declare
+  function getIntRate(
+    pAccType in varchar2
+  )
+  return number
+  is
+    vRate number;
+  begin
+    case pAccType
+      when 'Savings' then vRate := 10;
+      when 'Checking' then vRate := 5;
+      when 'MoneyMarket' then vRate := 15;
+    end case;
+    return vRate;
+  end;
+begin
+  dbms_output.put_line('Int Rate = '||getIntRate('Savings'));
+  dbms_output.put_line('Int Rate = '||getIntRate('Checking'));
+  dbms_output.put_line('Int Rate = '||getIntRate('MoneyMarket'));
+end;
+/
+
+ +

Here is the output:

+ +
+
+
Copy
+
+ +
+
+Int Rate = 10
+Int Rate = 5
+Int Rate = 15
+
+
+ +

The equivalent of the following code line:

+ +
+
+
Copy
+
+ +
+
+return vRate;
+
+
+ +

in R, fortunately, is similar, but not exactly the same:

+ +
+
+
Copy
+
+ +
+
+return (vRate)
+
+
+ +

Note the parentheses. Here is the R function:

+ +
+
+
Copy
+
+ +
+
+# r19.txt
+getIntRate <- function (pAccType)
+{
+  if (pAccType == "Savings")
+  {
+    vRate <- 10
+  } else
+  if (pAccType == "Checking")
+  {
+    vRate <- 5
+  } else
+  if (pAccType == "MoneyMarket")
+  {
+    vRate <- 15
+  }
+  return (vRate)
+}
+
+
+ +

Executing the R code produces this:

+ +
+
+
Copy
+
+ +
+
+> getIntRate("Savings")
+[1] 10
+
+
+ +

You can try the other values:

+ +
+
+
Copy
+
+ +
+
+> getIntRate("Checking")
+> getIntRate("MoneyMarket")
+
+
+ +

Another way to write the same function logic is using the switch() function you + learned about earlier in this article. It's the equivalent of the CASE + statement in PL/SQL.

+ +
+
+
Copy
+
+ +
+
+# r19a.txt
+getIntRate <- function (pAccType)
+{
+  vRate <- switch (pAccType,"Savings"=10, "Checking"=5, "MoneyMarket"=15,0)
+  return (vRate)
+}
+
+
+ +

A very important concept of functions in R is that the return value can be + implicit. By default, the function returns the value of the last assigned object, even if you don't + actually have an explicit return statement. Let's look at a very + simple function that takes a number and assigns two variables inside its body. The function doesn't + explicitly return anything.

+ +
+
+
Copy
+
+ +
+
+# r20.txt
+
+f1 <- function(inVal)
+{
+   v1 <- inVal * 2
+   v2 <- v1 * 2
+}
+
+
+ +

As you can see, the function has no explicit return statement. Now let's call the function:

+ +
+
+
Copy
+
+ +
+
+> f1(2)
+
+
+ +

Nothing is displayed, because the function doesn't print anything and its return value isn't shown. Now let's capture the + return value of the function in a variable v3:

+ +
+
+
Copy
+
+ +
+
+> v3 <- f1(2)
+> v3
+[1] 8
+
+
+ +

How did the function return 8, when we didn't write a return statement? + It's because the last assigned value was that of v2, and that was implicitly returned. + By default, all functions implicitly return the last value assigned. So does that mean we never need to + write a return statement? Of course not. We need an explicit return in the function code for these two reasons:

+ + + +
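One common situation (shown here as a minimal sketch of my own; the function name safeDivide is purely illustrative) is returning from the middle of a function based on a condition, which the implicit last-value behavior cannot express:

Copy

+# illustrative example, not from the article
+safeDivide <- function(x, y)
+{
+  if (y == 0)
+  {
+    return(NA)      # leave the function early; nothing below this line runs
+  }
+  return(x / y)
+}

Calling safeDivide(10, 0) returns NA, while safeDivide(10, 2) returns 5.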

Variable Scope

+ + +

As you write multiple levels of code in R, such as subprograms calling other subprograms, you might find + the same variable names being defined inside these subprograms. In that case, + which of the values assigned to the variables will be relevant? This is where you have to understand variable scope, which is a very important concept to remember. Let's start with a simple example + function that accepts an input value, stores it in a local variable called v1, + and prints it.

+ +
+
+
Copy
+
+ +
+
+#r21.txt
+f1 <- function(inVal)
+{
+  v1 <- inVal
+  cat ("v1=", v1, "\n")
+}
+
+
+ +

Executing the code produces this:

+ +
+
+
Copy
+
+ +
+
+> source("r21.txt")
+> f1(1)
+v1= 1
+
+
+ +

What will happen if you have another variable outside the function that has the same name, "v1"? + Will the function use the value set inside the function or simply get the value from outside? Let's + change the code and see the output. We first set the value of v1 to 10 outside + the function and assign it the input value inside:

+ +
+
+
Copy
+
+ +
+
+#r22.txt
+v1 <- 10
+f1 <- function(inVal)
+{
+  v1 <- inVal
+  cat ("v1=", v1, "\n")
+}
+
+ +

When we execute the code, what should we get? Let's see:

+ +
+
+
Copy
+
+ +
+
+> source("r22.txt")
+> f1(1)
+v1= 1
+
+
+ +

The output is 1, which is the value assigned inside the function. The prior assigned value, 10, was + not considered. This is the expected behavior in pretty much any language, including PL/SQL. So it's no + surprise.

+ +

However, what happens if a variable called v1 is not even created inside the + function, as shown below:

+ +
+
+
Copy
+
+ +
+
+#r22a.txt
+v1 <- 10
+f1 <- function(inVal)
+{
+cat ("v1=", v1, "\n")
+}
+
+
+ +

Note that the variable v1 is not defined inside the function, yet it is referenced in + the cat() call. In PL/SQL, if you reference a variable not declared in the + function, you will get a compilation error. Let's see what happens in R. Executing the R code results in + this:

+ +
+
+
Copy
+
+ +
+
+> source("r22a.txt")
+> f1(1)
+v1= 10
+
+
+ +

Whoa! What happened? We did not get an error. Instead, R picked up the variable v1 defined outside the function. This is a very important property of R, which is + very unlike PL/SQL. You should pay attention to this behavior, because it can cause many bugs if it is not + understood properly. Let's recap. If a variable is referenced inside a function, R first looks to see if + that variable is defined inside the function. If so, that value is used. Otherwise, if the + variable is not defined there, R looks at the immediately enclosing level of code to see if the variable is defined + there. If it is found, that value is used. If it is not found there, the next enclosing level of code + is checked, and so on.
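Here is a small sketch of my own showing that lookup chain across more than one level; the function names are purely illustrative. The inner function finds v1 neither in itself nor in the enclosing function, so the top-level value is used and running the code prints v1= 10:

Copy

+v1 <- 10                      # defined at the top level
+outerFn <- function()
+{
+  innerFn <- function()
+  {
+    cat("v1=", v1, "\n")      # v1 is not defined in innerFn() or outerFn()
+  }
+  innerFn()
+}
+outerFn()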

+ + +

Summary

+ +

Let's recap what we explored in this article. As in the previous article, we will take each PL/SQL + element and contrast it with the equivalent R element.

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Element | PL/SQL | R
Read user input | It's actually done in SQL*Plus using PROMPT and ACCEPT. | It's done using the readline() function, which returns a string. + So, the input must be converted to whatever type you wish. For example, as.integer() converts the input to an integer.
Basic IF | if (condition) then
+     List of statements to be executed
+ end if;
+     List of statements outside of IF block
if (condition)
+ {
+     List of statements to be executed
+ }
+     List of statements outside of IF block

+
+ Note the dissimilarities:
+ 1. There is no end if.
+ 2. The code for the IF block is inside curly braces, which is + equivalent to if..end if.
+ 3. The condition must be within parentheses. +
ELSE | if (condition) then
+    List of statements to be executed
+ else
+    List of statements
+ end if;
+ List of statements outside of IF block
if (condition)
+ {
+    List of statements to be executed
+ } else
+ {
+    List of the statements in else block
+ }

+
+ There is no end if and as with IF, the + blocks of code are enclosed in curly braces. The else statement must + be on the same line as the ending curly brace. +
ELSIF | ELSIF | No equivalent. You should use else if (condition).
Do nothing where a legal, syntactically correct statement is needed | NULL; | Not needed. You can use {} to indicate a null.
FOR loop | for i in Start..End
+ loop
+     List of statements
+ end loop;
for (i in Start:End)
+ {
+     List of statements
+ }
+ Statement not inside the for loop

+
+ Note that the expression for the for loop is within parentheses, + which is mandatory. +
Skip counters in a loop, for example, counting 1, 3, 5, and so on | for i in Start..End
+ loop
+     if mod(i,2) = 0 then
+       Do something
+     end if;
+ end loop;
for (i in seq(Start, End, how much to skip)) {
+      Do something
+ }
Count backwards in a loop, for example, 3, 2, 1, and so on | for i in reverse Start..End
+ loop
+     List of statements
+ end loop;
for (i in seq(Start, End, -1))
+ {
+     List of statements
+ }

+  
WHILE loop | while (condition) loop
+     List of statements;
+ end loop;

+  
while (condition)
+ {
+     List of statements
+ }
+ Statement not inside the while loop

+
+ Note that the condition must be within parentheses. +
Break away from a loop | EXIT WHEN (Some condition);
+
+ or
+
+ IF (Some condition) THEN EXIT; +
Inside a loop:
+
+ if (Some condition) {
+     break
+ }

+
+ Note the curly braces. +
CASE statement | case
+     when Some condition then
+       List of statements;
+     when Some other condition then
+       List of statements;
+ end case;
Use the switch() function.
Continue in a loop | CONTINUE | next
Repeat a loop | No direct equivalent in PL/SQL; it can be simulated using the WHILE (TRUE) LOOP ... END LOOP; construct. | The following repeats the loop until an explicit break comes:
+
+ repeat {
+     Statements
+     if (condition) {
+       break
+     }
+ }
+
Procedures and functions | function FunctionName (
+     Parameter1Name in DataType,
+     Parameter2Name in DataType,
+     ...
+ )
+ return ReturnDatatype
+ is
+     localVariable1 datatype;
+     localVariable2 datatype; begin
+     ... function code ...
+     return ReturnVariable;
+ end;

+
+ A procedure definition in PL/SQL has this general syntax:
+
+ procedure ProcedureName(
+     Parameter1Name in DataType,
+     Parameter2Name in DataType,
+     ...
+ )
+ is
+     localVariable1 datatype;
+     localVariable2 datatype;
+ begin
+     ... procedure code ...
+ end;
+
A function definition in R follows this simple syntax:
+
+ FunctionName <- function (Parameter1Name, Parameter2Name, ...)
+ {
+       ... function code ...
+     return(SomeValue)
+ }

+
+ Unlike in PL/SQL, there is no difference between procedures and functions. Both types are called + functions. A function may or may not return anything. If it does return something, you do not + specify what data type it returns at the time of definition.
+
Function definitions start with the function name, the assignment operator, and the keyword + function, compared to the function or procedure keyword in PL/SQL.
+
+ Like PL/SQL, the parameters shown above are optional; not all functions need parameters. The + parameters do not have data types listed at definition. Therefore, you can pass any type of data + at runtime.
+
+ Unlike PL/SQL, if you don't have any parameters, you still need to have the parentheses, for + example, myFunc <- function () {
+
+ Unlike PL/SQL, there is no begin ... end block to designate the code + for a function. The curly braces determine the code of the function.
+
+ Like PL/SQL, the return statement is the last statement of the + function code to be executed.
+
+ Unlike PL/SQL, even if there is no return statement, the function + returns the last assigned value of a variable. +
Parameter default value | ParameterName DataType := DefaultValue | ParameterName = DefaultValue
+
+ Unlike PL/SQL, default values of parameters can be variables, for example:
+
+ defIntRate = 5
+ calcInt <- function (pPrincipal, pIntRate = defIntRate) {
+
Positional parameter specification in functions and procedures | If a PL/SQL procedure named F1 has two parameters, P1 and P2 (in that order), you have to pass + the parameter values at runtime in the same order, as in F1 (Val1, Val2);, where Val1 and Val2 are values for parameters + P1 and P2, respectively.
+
However, you can also use named parameters by specifying the parameter names and their + values in any order by using the => operator, as shown + below:
+
+ F1 (P2 => Val2, P1 => Val1); +
You can do the same thing in R, as follows:
+
+ f1 (p2 = val2, p1 = val1) +
Explicitly set input values of parameters to null at design time so that in the code you can + check if the parameter value was passed or not | pIntRate number := null, | pIntRate = NULL
+
+ Note that "NULL" is uppercase. +
Return values for a function | return SomeValue; | return (SomeValue)
+
+ The big difference is that you do not need to specify whether the function has to return + something, and if it does, you do not need to specify the data type of the return value at + design time.
+
+ Also, the return value has to be within parentheses. +
Global variables | The variables defined outside functions will be different from variables defined with the same + name inside functions. | Same behavior. However, two caveats are particularly important for PL/SQL developers + to remember:
+
1. Because there is no such thing as declaration of variables, variables come into existence + when they are first assigned. If they are first assigned inside a function, they are local; + otherwise, they are global. If you must be 100 percent clear about the scope, simply first + assign a value such as NULL to the variable wherever you want the scope to be.
+
2. If a variable is not assigned inside a function, but a variable of the same name exists + in the program outside the function, that variable is visible inside the function as well.
+ +

Quiz

+ + +

Let's test your understanding with these simple questions.

+ +

Questions

+ +

1. Consider the following code:

+ +
+
+
Copy
+
+ +
+
+#q1.txt
+f1 <- function (inVal)
+{
+  v1 <- inVal * 2
+  cat ("Inside f1, v1=", v1, "\n")
+}
+f2 <- function (inVal)
+{
+  v1 <- inVal * 2
+  cat ("Inside f2, v1=", v1, "\n")
+}
+f3 <- function (inVal)
+{
+  v1 <- inVal * 2
+  cat ("Inside f3, v1=", v1, "\n")
+}
+
+f3(f2(f1(2)))
+
+
+ +

Here is the output:

+ +
+
+
Copy
+
+ +
+
+> source ("q1.txt")
+Inside f1, v1= 4
+Inside f2, v1=
+Inside f3, v1=
+
+
+ +

Why don't we see the values of v1 in the other functions?

+ +

2. Consider the following function:

+ +
+
+
Copy
+
+ +
+
+# q2.txt
+
+f1 <- function (inVal)
+{
+   v1 <- inVal * 2
+}
+
+
+ +

Note that there is no return statement. So this code doesn't return anything. + However, we still call it and assign the return value to another variable v2:

+ +
+
+
Copy
+
+ +
+
+> v2 <- f1(2)
+> v2
+[1] 4
+
+
+ +

How come the function returned 4?

+ +

3. You are starting an R session from scratch. You gave the following command:

+ +
+
+
Copy
+
+ +
+
+# q3.txt
+if (x<y) {
+  print('yes')
+}
+
+
+ +

And the output was this:

+ +
+
+
Copy
+
+ +
+
+Error: object 'x' not found
+
+
+ +

Why was the error produced? Isn't it true that R defines variables when they are referenced?

+ +

4. What will be result of the following code?

+ +
+
+
Copy
+
+ +
+
+# q4.txt
+v1 <- 10
+f1 <- function (inVal)
+{
+   v1 <- 4
+   2 * v1 * f2(inVal)
+}
+f2 <- function (inVal)
+{
+   inVal * v1
+}
+f1(2)
+
+
+ +

5. What will be the output of the following?

+ +
+
+
Copy
+
+ +
+
+> v1 <- 2
+> v2 <- switch(v1,100,200,300,400)
+> v2
+
+
+ +

6. Along the same lines, here is modified code where you ask the user to input the value of v1 instead of hardcoding it.

+ +
+
+
Copy
+
+ +
+
+# q6.txt
+
+> v1 <- readline("Enter a number> ")
+Enter a number> 2
+> v2 <- switch(v1,100,200,300,400)
+
+
+ +

But it failed with the following message:

+ +
+
+
Copy
+
+ +
+
+Error: duplicate 'switch' defaults: '100' and '200'
+
+ +

What happened? The only change you made was to accept the value; and now it's producing an error.

+ +

7. You are writing a statement to check whether the number input by the user is less than 100. Here is the code you + wrote:

+ +
+
+
Copy
+
+ +
+
+# q7.txt
+
+> v1 <- as.integer(readline("Enter a number> "))
+Enter a number> 5
+> v2 <- switch((v1<100), "Yes, less than 100", "No, greater than 100")
+> v2
+[1] "Yes, less than 100"
+
+
+ +

It worked correctly. It reported that the number entered by the user (5) is less than 100. So, you + re-executed the statement with a different input:

+ +
+
+
Copy
+
+ +
+
+> v1 <- as.integer(readline("Enter a number> "))
+Enter a number> 200
+> v2 <- switch((v1<100), "Yes, less than 100", "No, greater than 100")
+> v2
+NULL
+
+
+ +

Note the output. It's NULL, not the desired output. Why?

+ +

8. What is the difference between break and next + statements?

+ +

9. You have all the R commands in a file called, say, myscript.R. How can you call + the script and not have to enter the commands one by one?

+ +

10. I am trying to write a simple function that merely prints the word "Hello." So the function + doesn't accept any parameters. Here is how I started typing, but I got an error:

+ +
+
+
Copy
+
+ +
+
+> printHello <- function
++ {
+Error: unexpected '{' in:
+"printHello <- function
+{"
+
+
+ +

Why did I get the error? I don't have any parameters, so there is nothing to pass anyway.

+ +

Answers

+ +

1. Note the function definitions. There are no return statements inside the + functions, so the value of v1 was never returned to the caller (each function actually returns the NULL result of its last statement, cat()). The correct syntax would have been + this:

+ +
+
+
Copy
+
+ +
+
+#q1a.txt
+f1 <- function (inVal)
+{
+v1 <- inVal * 2
+cat ("Inside f1, v1=", v1, "\n")
+return (v1)
+}
+f2 <- function (inVal)
+{
+v1 <- inVal * 2
+cat ("Inside f2, v1=", v1, "\n")
+return (v1)
+}
+f3 <- function (inVal)
+{
+v1 <- inVal * 2
+cat ("Inside f3, v1=", v1, "\n")
+return (v1)
+}
+
+f3(f2(f1(2)))
+
+
+ +

Here is output now:

+ +
+
+
Copy
+
+ +
+
+> source("q1a.txt")
+Inside f1, v1= 4
+Inside f2, v1= 8
+Inside f3, v1= 16
+
+
+ +

2. Even though the function might not have an explicit return statement, the last + assigned value is returned implicitly. Because v1 was assigned last, it was + returned.

+ +

3. No; R creates variables when they are assigned, not when they are referenced. In this code, you simply + referenced x and y without assigning any value to + them. So they were not created. The following would have been valid code in which the values of x and y were assigned.

+ +
+
+
Copy
+
+ +
+
+# q3a.txt
+x <- 1
+y <- 2
+if (x<y) {
+  print('yes')
+}
+
+
+ +

4. It will be 160. Here is why. Inside the f1 code, you see a reference to f2; so R will go on to evaluate f2(2). Inside the f2 code, there is a reference to variable v1. But there + is no variable v1 defined inside f2. So R will look up + the variable v1 defined at the beginning of the code (that is, 10). So f2(2) will return 2 * 10 (that is, 20). Then control will pass to function f1. However, there is a variable named v1 inside it. So, + that value (that is, 4) will be used. f1(2) will evaluate to 2 * 4 * f2(2). Because f2(2) returned 20, this expression will + be 2 * 4 * 20 = 160.

+ +

5. It will be 200. When you pass an integer as the first argument to switch(), it + uses that number to decide which position to look up. In this case, you passed 2; hence, + it looks up the second position, which has the value 200, and the switch() function returns 200.

+ +

6. The function readline() returns a value of character data type. So v1 is a character. The switch() function behaves + differently when the first argument is a character instead of a number. So, the syntax for the rest of the + arguments in the switch was wrong. The correct code is this:

+ +
+
+
Copy
+
+ +
+
+# q6a.txt
+
+> v1 <- as.integer(readline("Enter a number> "))
+Enter a number> 2
+> v2 <- switch(v1,100,200,300,400)
+> v2
+[1] 200
+
+
+ +

7. When the first argument is not a character string, the switch() function treats it as an integer, not as a logical value. In the first + case (5<100), the expression is TRUE, which becomes 1. Therefore, switch() + picks up the first value on the list: "Yes, less than 100". However, in the second case, it was + FALSE, which equates to 0, and there is no 0th option, so NULL was returned.

+ +

So, how would you write that code to test whether the input number is less than or greater than 100? One option is + to use the if ... else construct. But if you want to use switch(), you should use this:

+ +
+
+
Copy
+
+ +
+
+# q7a.txt
+
+v2 <- switch((v1<100)+1, "No, greater than 100", "Yes, less than 100")
+
+
+ +

Now the code will yield the right results.

+ +

8. The break statement stops the loop and exits it completely; execution continues with the code + after the loop. The next statement simply jumps to the end of + the current iteration, so control goes back to the beginning of the loop for the next iteration.
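A short sketch of my own makes the contrast visible. The loop below skips 3 with next and stops completely at 5 with break, so it prints only 1, 2, and 4:

Copy

+for (i in 1:10)
+{
+  if (i == 3)
+  {
+    next     # skip the rest of this iteration and go on to i = 4
+  }
+  if (i == 5)
+  {
+    break    # leave the loop entirely
+  }
+  print(i)
+}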

+ +

9. There are two options:

1. From within the R session, call source("myscript.R").
+
2. From the operating system command line, run the script with rscript myscript.R.

In both options, we assume that the script file is in the current working directory.

+ +

10. Even if you don't need parameters (or arguments) to R functions, you still need to use parentheses: +

+ +
+
+
Copy
+
+ +
+
+printHello <- function()
+
+
+ +

About the Author

+ +

Arup Nanda arup@proligence.com has been an Oracle + DBA since 1993, handling all aspects of database administration, from performance tuning to security and + disaster recovery. He was Oracle Magazine's DBA of the Year in 2003 and received an Oracle + Excellence Award for Technologist of the Year in 2012.

+
+
+ + \ No newline at end of file diff --git a/Articles/databases/123-oracle-r-advanced-analytics-for-hadoop-part-1.html b/Articles/databases/123-oracle-r-advanced-analytics-for-hadoop-part-1.html index e69de29..c674027 100644 --- a/Articles/databases/123-oracle-r-advanced-analytics-for-hadoop-part-1.html +++ b/Articles/databases/123-oracle-r-advanced-analytics-for-hadoop-part-1.html @@ -0,0 +1,542 @@ + + +
+
+ +

In this article, which is Part 1 of a series, we will look at how you can run R analytics at scale on a + Hadoop platform using Oracle R Advanced Analytics for Hadoop, a component of Oracle Big Data + Connectors that provides a set of R functions for connecting to and processing data stored in the Hadoop + Distributed File System (HDFS), using Hive transparency, as well as data in Oracle Database. Oracle R Advanced + Analytics for Hadoop provides interfaces to Oracle Database, HDFS, and Hive, and interfaces for initiating + map-reduce jobs, by providing simplified functions that abstract the underlying complexity away + from the data scientist. Oracle R Advanced Analytics for Hadoop has a number of highly scalable machine + learning algorithms and also utilizes some of the Apache Spark machine learning algorithms for greater + performance of in-memory distributed machine learning.

+ +

This article focuses on showing how you can get started with using Oracle R Advanced Analytics for Hadoop, + how to access and process data in Oracle Database, HDFS, and Hive and how to create and manage map-reduce + processes. In the second + part of this series, we will look at how you can use the various analytic features, machine + learning, and Apache Spark to process and analyze your data.

+ +

Overview of Oracle R Advanced Analytics for Hadoop

+ +

Oracle R Advanced Analytics for Hadoop is one of the components of the Oracle Big Data Connectors, and it + provides a set of R functions that allows you to connect to and manipulate data stored on HDFS using Hive + transparency. It also allows you to build map-reduce analytics and use the prepackaged algorithms exposed + through an R interface. Additionally, you can integrate with Apache Spark and other tools and languages for + greater performance for multilayer neural networks and for logistic regression. (Note: In Figure 1, + "Oracle R Advanced Analytics for Hadoop" is abbreviated as "ORAAH" and "SQL + Developer" refers to Oracle SQL Developer.)

+ +

Figure 1: Oracle R Advanced Analytics for Hadoop overview

+ +

Check Oracle + R Advanced Analytics for Hadoop 2.7.0 for a full listing of all the R functions contained + in the Oracle R Advanced Analytics for Hadoop package.

+ +

How to Get Access to an Oracle R Advanced Analytics for Hadoop Environment

+ +

When it comes to trying out Oracle R Advanced Analytics for Hadoop, you have three main options. The first is + to get it installed on some of your existing servers that have Hadoop, Hive, and access to Oracle Database. + If such an environment is not easily accessible, you can alternatively use one of the Oracle VM VirtualBox + prebuilt virtual machines. The virtual machine called Oracle Big Data Lite VM comes with a large number of + Oracle products installed on it, including Oracle R Advanced Analytics for Hadoop, Oracle Database 12c, + Hadoop, HBase, Hive, Impala, Kafka, and so on. This virtual machine is continually updated with the + latest version of this software, and it can be downloaded from the following website: https://www.oracle.com/technetwork/community/developer-vm/index.html.

+ +

As a third option, Oracle Big Data Cloud Service comes with Oracle R Advanced Analytics for Hadoop installed + along with many of the products that are on the Oracle Big Data Lite VM. Check the following web page to see + what cloud services are available: https://cloud.oracle.com/en_US/big-data.

+ +

Working with Data Across Datasources

+ +

Oracle R Advanced Analytics for Hadoop allows you to work seamlessly across many different locations for your + data including Oracle Database, Hive, and HDFS, as shown in Figure 2. In this article, I will illustrate how + you can use it to access and process data on each of these datasources.

+ +

Figure 2: Oracle R Advanced Analytics for Hadoop working with data across many datasources

+ +

The first example of using Oracle R Advanced Analytics for Hadoop is with data located in Oracle Database. + The following example code begins with loading the R package for Oracle R Advanced Analytics for Hadoop, + which is called ORCH. It then downloads a CSV file from the internet that + contains Beach Quality inspection reports for beaches located in Dublin, Ireland. This dataset will be used + in each of the examples to illustrate moving data between the data storage environments. When + this data is loaded into Oracle Database, you can use many of the R functions to inspect it. The + following code illustrates loading and using this dataset.

+ +

For our first step, we can load Oracle R Advanced Analytics for Hadoop using the ORCH package and then download the dataset from the internet.

+ +
+
+
Copy
+
+ +
+
+# load the Oracle R Advanced Analytics for Hadoop library
+library(ORCH)
+
+# download the CSV file to the local environment and examine the data
+data <- read.csv(
+    file=url("http://data.fingal.ie/datasets/csv/BathingWaterStatus.csv"),
+    head=TRUE, sep=",")
+class(data)
+head(data)
+
+
+ +

After loading the data into the local R environment, you can connect to a schema in Oracle Database:

+ +
+
+
Copy
+
+ +
+
+# connect to Oracle Database
+ore.connect(user="odmuser", sid="orcl", host="localhost", password="odmuser", port=1521, all=TRUE);
+
+
+ +

You can now save the local R data frame, containing the beach quality data, into your Oracle schema by + creating a new table. The ore.create() command creates a table in the database and + loads the data from the R data frame (called data) into this table. This new table + can now be referenced in the R session.

+ +
+
+
Copy
+
+ +
+
+# save the R data frame as a table in the database
+ore.create(data, table="DUBLIN_BEACH_QUALITY")
+# refresh the list of database objects
+ore.attach()
+# list the objects in the database. The newly created table is listed
+ore.ls()
+class(DUBLIN_BEACH_QUALITY)
+
+
+ +

After the table has been created, you can use it as a proxy data frame (also called an ORE data frame) and + run all your typical R functions on this data. But in this case, these R functions, via the ORE transparency + layer, are run on the data in the table in Oracle Database instead of in the local R environment. This + allows you to use the benefits of Oracle Database. The following code illustrates using some of the + traditional R functions to analyze the data in the Oracle table.

+ +
+
+
Copy
+
+ +
+
+# this table can be used in the same way as a tradition R data frame,
+# but the data remains in the database. Only results are returned to your R 
+# session
+summary(DUBLIN_BEACH_QUALITY)
+head(DUBLIN_BEACH_QUALITY)
+
+
+ +

The R code above illustrates how easy it is to work with data in the local R session and with data in Oracle + Database.

+ +

But what about when you want to work with data in HDFS? Oracle R Advanced Analytics for Hadoop has a number of + R functions that facilitate this. Table 1 lists the functions for working with HDFS.

+
+
+ +
+
Table 1. HDFS functions in Oracle R Advanced Analytics for Hadoop + +
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Function NameFunction Name
hdfs.attachhdfs.mv
hdfs.cachehdfs.ncol
hdfs.cdhdfs.nrow
hdfs.cleanInputhdfs.parts
hdfs.cphdfs.pull
hdfs.cwdhdfs.push
hdfs.delimhdfs.put
hdfs.describehdfs.pwd
hdfs.dimhdfs.rm
hdfs.downloadhdfs.rmdir
hdfs.existshdfs.root
hdfs.fromHivehdfs.sample
hdfs.fromRDatahdfs.setroot
hdfs.gethdfs.size
hdfs.headhdfs.sync
hdfs.idhdfs.tail
hdfs.keysephdfs.toHive
hdfs.levelshdfs.toRData
hdfs.lshdfs.toRDD
hdfs.metahdfs.upload
hdfs.mkdirhdfs.valuesep
+
+
+
+ +
+
+

The functions listed in Table 1 allow you to work with data in HDFS: you can move or save data to HDFS and read data from HDFS. This is very useful for working with data as it lands in your Hadoop platform, and for saving newly created datasets, based on your R analytics, so that they can be used by other processes.

+ +

The following example illustrates how you can work with data files in HDFS using a selection of the HDFS R functions listed in Table 1. The R code takes the Dublin Beach Quality dataset and saves it to HDFS. Then, using some of the Oracle R Advanced Analytics for Hadoop functions, you can examine this data in HDFS, as an alternative to keeping it in Oracle Database or in your local R environment, directly from your R session. These functions also make it easy to persist datasets created during your data science projects for reuse at a later time.

+ +
+
+
Copy
+
+ +
+
+# What is the current working directory on HDFS?
+# If needed the working directory can be changed using hdfs.cwd()
+hdfs.pwd()
+
+# Write the R data frame out to a file in HDFS
+hdfs.put(data, dfs.name="Dublin_Beach_Quality_Data")
+
+# Add the newly added file in HDFS to your search space 
+hdfs.attach("Dublin_Beach_Quality_Data")
+
+# List the properties of the file in HDFS, including path, class, types, 
+# names, size, number of rows, number of variables/features, and so on
+hdfs.describe("Dublin_Beach_Quality_Data")
+
+# List the number of records and the number of variables/features
+hdfs.dim("Dublin_Beach_Quality_Data")
+
+# Check to see if a file with this name exists in HDFS. This is 
+# a useful command before you try to create a file in HDFS.
+hdfs.exists("Dublin_Beach_Quality_Data")
+
+# Returns the first part of the file. 
+hdfs.head("Dublin_Beach_Quality_Data")
+# Returns the end of the file
+hdfs.tail("Dublin_Beach_Quality_Data")
+
+# Returns meta-data about the file. This includes the variable names, class 
+# type, variable types (for example, factors), number of records, 
+# if trimming is used, and so on.
+hdfs.meta("Dublin_Beach_Quality_Data")
+
+# Returns number of rows. Similar to R function
+hdfs.nrow("Dublin_Beach_Quality_Data")
+# Returns number of variables. Similar to R function
+hdfs.ncol("Dublin_Beach_Quality_Data")
+# Returns the size of the file in HDFS in bytes
+hdfs.size("Dublin_Beach_Quality_Data")
+# Lists the files stored in HDFS for the current working directory
+hdfs.ls()
+
+# Read the file from HDFS back into the local R environment
+hdfsData <- hdfs.get("Dublin_Beach_Quality_Data")
+# Check the class of the object. It will be an R data frame
+# This shows that the data in the file has been loaded into
+# the R environment as an R data frame
+class(hdfsData)
+
+# Finally, you can remove/delete the file from HDFS
+hdfs.rm("Dublin_Beach_Quality_Data", force=TRUE)
+
+
+ +

When working with Hive, you will need to create a connection to Hive. This is similar to making a connection + to Oracle Database, but for Hive it is a much simpler command:

+ +
+
+
Copy
+
+ +
+
+# set up a connection to Hive
+ore.connect(type="HIVE")
+ore.attach()
+
+# Displays the current database being used by Hive
+ore.showHiveOptions()
+
+# Take the Dublin Beach Quality dataset and push it out to Hive
+# Hive cannot process factor variables. Convert these to character strings
+f_filter <- sapply(data, is.factor)
+data[f_filter] <- data.frame(lapply(data[f_filter], as.character), 
+                             stringsAsFactors = FALSE) 
+# Move the modified data frame out to Hive
+hive_data <- ore.push(data)
+
+# This data can be processed using the typical R functions
+# while availing yourself of the scalability of Hive
+summary(hive_data)
+str(hive_data)
+head(hive_data)
+nrow(hive_data)
+dim(hive_data)
+
+# R data frames can be persisted to Hive to later use 
+ore.create(data, table = "dublin_beach_hive")
+ore.ls()
+
+# Hive data can be deleted from Oracle R Advanced Analytics for Hadoop
+ore.drop(table="dublin_beach_hive")
+ore.ls()
+
+
+ +

We have seen from the examples above how we can use Oracle R Advanced Analytics for Hadoop as one tool to + work with data in Oracle Database, in HDFS, and in Hive.

+ +

Map-Reduce Using Oracle R Advanced Analytics for Hadoop

+ +

Oracle R Advanced Analytics for Hadoop comes with a number of functions that allow you to create and manage + map-reduce jobs in Hadoop. A map-reduce process takes a dataset that has been distributed over Hadoop, + performs analysis on the distributed dataset, and finally calculates and returns the results. This allows + you to utilize many of the CRAN packages and invoke these as Hadoop jobs from R.

+ +

With the ORCH package, you can define a map-reduce job and submit it using the hadoop.exec() function. Table 2 lists the various map-reduce functions in Oracle R Advanced Analytics for Hadoop. There are three main steps to defining a map-reduce job in ORCH.

+ +

1. Define the dataset you are going to use. This dataset can exist on Hadoop, in + Hive, or as an R object.

+ +

2. Define the mapper function. This allows you to define what data you want to be + selected from the dataset and used in the later steps.

+ +

3. Specify the reducer function to apply. This is the function that is applied to the selected data; its output is the result of applying the reducer function or calculation.

+
+
+ +
+
Table 2: Oracle R Advanced Analytics for Hadoop map-reduce functions + +
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Function NameDescription
hadoop.execThis function starts the Hadoop engine to send the mapper, reducer, and combiner R functions + for execution. The data must exist in HDFS.
hadoop.jobsLists the running Hadoop jobs.
hadoop.runThis starts the Hadoop engine and sends the mapper, reducer, and combiner R functions for execution. This is very similar to hadoop.exec above, except that the data is not already in Hadoop. It will copy the data to Hadoop before commencing the map-reduce job.
orch.dryrunChanges the execution platform between the local host and the Hadoop cluster.
orch.exportMakes R objects, in the local R session, available in Hadoop so that they can be referenced in map-reduce jobs.
orch.keyvalOutputs the key-value pairs in a map-reduce job.
orch.keyvalsOutputs the sets of key-value pairs in a map-reduce job.
orch.packCompresses an R object that map-reduce will write as the values in key-value pairs.
orch.tempPathSets the path where temporary data is stored.
orch.unpackUncompresses an R object that was compressed using the orch.pack + function.
orch.create.parttabEnables partitioned Hive tables to be used with the ORCH map-reduce framework.
+
+
+
+ +
+
+

The following code illustrates the typical word count map-reduce process using Oracle R Advanced Analytics + for Hadoop.

+ +
+
+
Copy
+
+ +
+
+words_dataset <- hdfs.put(corpus)
+wordcount <- function(input, output=NULL, pattern= "  " ) {
+   result <- hadoop.exec(dfs.id=input,
+                   mapper = function (k, v) {
+                       lapply(strsplit(x=v, split=pattern) [[1]],
+                       function(w)  orch.keyval(w, 1) [[1]])
+                   },
+                   reducer = function(k, vv) {
+                       orch.keyval(k, sum(unlist(vv)))
+                   },
+                   config=new ("mapred.config",
+                   job.name = "wordcount",
+                   map.output = data.frame(key=0, val=''),
+                   reduce.output = data.frame(key='', val=0))
+               )
+   result
+}
+
+
+ +
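To round out the example, here is a hedged sketch of how the wordcount() function defined above might be invoked. The toy corpus data frame and the use of hdfs.get() to collect the results are illustrative assumptions rather than part of the original listing.

# Illustrative usage only: stage a tiny corpus in HDFS, count the words,
# and pull the resulting key-value pairs back into the local R session
library(ORCH)
corpus <- data.frame(line = c("the quick brown fox",
                              "the lazy dog jumps over the fox"),
                     stringsAsFactors = FALSE)
corpus_dfs <- hdfs.put(corpus)
counts_dfs <- wordcount(corpus_dfs, pattern = " ")
hdfs.get(counts_dfs)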

Summary

+ +

In this article, we looked at what Oracle R Advanced Analytics for Hadoop is and at some of its capabilities. + Oracle R Advanced Analytics for Hadoop provides a set of R functions that allow you to connect to and + manipulate data stored in HDFS, using Hive transparency, as well as Oracle Database. In addition, you can + build highly efficient and scalable advanced analytics using map-reduce and Spark, all through simplified R + interfaces.

+ +

In Part 2 of this series, we will + look at some of the more advanced features of Oracle R Advanced Analytics for Hadoop, including advanced + analytics, how to access and run Spark, and how to use other machine learning algorithms.

+ +

About the Author

+ +

Oracle ACE Director Brendan Tierney is an independent consultant (Oralytics) and lectures on data science, databases, and big data at the Dublin + Institute of Technology/Dublin Technological University. He has 24+ years of experience working in the areas + of data mining, data science, big data, and data warehousing. As a recognized data science and big data + expert, Tierney has worked on projects in Ireland, the UK, Belgium, Holland, Norway, Spain, Canada, and the + US. He is active in the UK Oracle User Group (UKOUG) community and one of the user group leaders in Ireland. + Tierney has also been editor of the UKOUG Oracle Scene magazine, is a regular speaker at + conferences around the world, and writes for several publications. In addition, he has published four books, + three with Oracle Press/McGraw-Hill (Predictive Analytics Using Oracle Data Miner, Oracle R + Enterprise: Harnessing the Power of R in Oracle Database, and Real World SQL and PL/SQL: Advice + from the Experts) and one with MIT Press (Essentials of Data Science).

+ +
+
+ + \ No newline at end of file diff --git a/Articles/databases/138-building-apps-odb12c-apex-part1.html b/Articles/databases/138-building-apps-odb12c-apex-part1.html index e69de29..5470d14 100644 --- a/Articles/databases/138-building-apps-odb12c-apex-part1.html +++ b/Articles/databases/138-building-apps-odb12c-apex-part1.html @@ -0,0 +1,1219 @@ + + +
+
+ +

If you're a JavaScript developer, you're likely used to a diverse and constantly changing set of + frameworks to choose from when starting a project. Trying to keep up with all of the features and + functionality of this landscape is a full-time job in and of itself. While mastering the basics of multiple + frameworks is something that you're probably good at by now, taking things to the next + level—especially when it comes to more-sophisticated requirements such as analytics and + security—is likely a constant struggle.

+ +

Fortunately, there is a better way: leverage the full power of Oracle Database, Oracle REST Data Services, and Oracle JavaScript Extension Toolkit (Oracle JET).

+ +

Oracle Database is so much more than a place to just store data. It has a robust, mature, and powerful set of + features and functions that can easily be leveraged from any JavaScript framework via RESTful calls. + Features such as built-in analytics, auditing, security, and advanced text searching capabilities are just + the beginning. And the best part is that if you leverage these features in Oracle Database, you won't + have to change anything on the back end if and when you decide to change JavaScript frameworks.

+ +

Oracle REST Data Services is a free tool from Oracle that provides a seamless interface between Oracle Database and any technology that can interact with RESTful services. As a Java application that can run anywhere, Oracle REST Data Services can run either in an application server or on its own via Jetty. With its support for OAuth2, Oracle REST Data Services can provide a low-maintenance yet secure pipeline from any JavaScript library to Oracle Database 10g Release 2 and later.

+ +

Oracle JET is a new, open source JavaScript library provided by Oracle. In addition to a number of popular + open source libraries, Oracle JET also includes custom libraries that provide beautiful, responsive + visualizations; advanced two-way binding with a common model layer; and single-page application navigation + controls. Developers can use as much or as little of Oracle JET as they need. This article will simply use + one of the visualizations, which will get its data source from Oracle REST Data Services.

+ +

This series will walk you through the steps for creating a JavaScript-based data management application that + integrates with Oracle Database via RESTful service calls. While it uses jQuery as a front end, the bulk of + the logic is set up on the back end in PL/SQL. Thus, if you prefer a different JavaScript library, it should + be fairly simple to adapt the jQuery examples to a framework of your choosing.

+ +

Preparing the Environment

+ +

In order to complete the examples described in this series, you'll need two things: a browser and access + to an instance of Oracle Database 11g Release 2 or newer. Browsers are relatively easy to come by; + chances are you're using one right now to read this. But on the other hand, not everyone has a spare + Oracle Database instance lying around. That's OK, because they are easier and cheaper to get access to + than you probably think.

+ +

Choosing a Database Environment
+ There are a number of different ways to get access to an Oracle Database instance quickly, easily, and for + free. Let's review them:

+ + + +

Downloading and Installing Oracle SQL Developer
+ Once you have chosen a database platform, the next step is to download and install Oracle SQL Developer. + Oracle SQL Developer is Oracle's free IDE designed for database developers who work in either + on-premises or cloud environments. In addition to the wide variety of tools designed for database + developers, Oracle SQL Developer also includes a DBA console that can be used to manage databases. Make sure + that the version of Oracle SQL Developer you download is at least 4.2; anything earlier will not have the + required components for managing web services.

+ +

Oracle SQL Developer can be downloaded for free here: oracle.com/sqldeveloper. Simply choose your corresponding + platform. To complete the download, you will need to log in to your Oracle Technology Network account. + Oracle Technology Network accounts are free; simply navigate to the following site and click the + Create Account button: https://login.oracle.com/mysso/signon.jsp.

+ +

Depending on your platform, you might have to also download and install a JDK. JDKs for several different + platforms can be downloaded here.

+ +

Configuring Oracle SQL Developer
+ Once Oracle SQL Developer has been installed, the next step is to create a connection to your database. + Depending on which Oracle Cloud environment you selected, the steps for connecting will differ slightly.

+ + + +

If you're using a local VM or an on-premises instance of Oracle Database, you can simply create a new + database connection using the schema name, password, and connection details as you normally would. You might + need your DBA to assist with the account creation.

+ +

Creating the EMP and DEPT Tables
+ Before any front-end development starts, it's best to look at the two tables that will be used in this + article. For those who have used Oracle Database in the past, the tables EMP and DEPT should be nothing new. + But for those who are new to Oracle Database, EMP and DEPT are the tried-and-true demonstration tables that + have been used for years. Figure 1 illustrates a data model of the two tables.

+ + +

Figure 1: A data model of the EMP and DEPT tables.

+ +

The elegance of the EMP and DEPT tables lies in their simplicity: with only a few columns and a handful of + rows, it's possible to demonstrate almost any relational database concept with these two tables.

+ +

Your schema might or might not include EMP and DEPT. If it does, no further action is required. However, if + it does not, you'll have to create them. This can be done by running the script shown in Listing 3 of Appendix A.

+ +

To create the EMP and DEPT tables, do the following:

+ +
    +
  1. Start Oracle SQL Developer.
  2. +
  3. Connect to the schema defined in the previous step.
  4. +
  5. Locate the script in Listing 3 of Appendix A and copy it.
  6. +
  7. Paste the contents of the script into the Worksheet tab of Oracle SQL Developer.
  8. +
  9. Click the Run Script icon, which is circled in red in Figure 2.
    +
    + Note: If you have an EMP and DEPT table, this script will first remove and then + re-create them.
    +   +
  10. +
+ + +

Figure 2: Results of running the script after clicking the Run Script icon.

+ +

As the script runs, a list of messages should be displayed in a new window called Script Output. Once the + script is finished running, the last line will say "Commit Complete."

+ +

Scroll through the Script Output window to ensure that there were no errors when the tables were created or + populated.

+ +

Next, let's inspect the tables in our schema. We can do that by expanding the connection node in the tree + to reveal a list of schema objects and other associated services. If you expand the Tables + node, you should see at least two entries: EMP and DEPT. Simply select + EMP and notice that a new tab called EMP was created in the main section of Oracle SQL + Developer. This tab has a number of subtabs.

+ + +

Figure 3: The Columns tab of the EMP table.

+ +

By default, the Columns tab is selected. Take a minute to explore the different properties of the EMP table by clicking each of the subtabs. Ensure that the Data tab contains the 14 records that were created as part of the script that was just run. Repeat this process for the DEPT table and ensure that four rows exist there.

+   + +
Creating the Web Services
+ +

In the following sections, we will create the needed web services.

+ +

Creating the Module and the First Web Service
+ Before a database schema can be used with a web service, REST services for that schema must be enabled. This + allows Oracle REST Data Services—the web services application server—to access data stored in + the specified schema. This step takes only a few seconds to complete and needs to be done only once.

+ +

To enable your schema to work with Oracle REST Data Services, do the following:

+ +
    +
  1. Right-click the name of your schema. Then select REST Services > Enable REST + Services.
    +   +

    Figure 4: Selecting your schema to enable REST services.

    +
  2. +
  3. A wizard will appear. For the first step, ensure that the Enable Schema checkbox is selected. It's also a good idea to create an alias for the schema, so that we're not letting a potentially malicious user know our source schema name. While this bit of information by itself is harmless, it could augment other attacks. Thus, we'll set the Schema Alias to db. Deselect the Authorization required checkbox and click Next.
    +   +

    Figure 5: Specify Details page of the wizard.

    +
  4. +
  5. Click Finish to complete the wizard.
    +   +

    Figure 6: Completing the RESTful Services wizard.

    +
  6. +
+ +

That's it! REST services are now enabled for your schema, and we can begin creating the required web + services.

+ +

The first web service will generate a JSON document based on all rows of data in the EMP table. But before we + can create a web service, we need to create a module. Think of a module as a package: a place where multiple + programs can be stored. All of the required web services will be added to a single module, making them + easier to manage over time.

+ +
    +
  1. In the hierarchy on the left side of the page, locate and expand the node for REST Data + Services. It should be about half-way down.
  2. +
  3. Right-click Modules and select New Module. A new wizard should pop up. When it does, enter demo in the Module Name and URI Prefix fields, select the Publish – Make this RESTful Service available for use checkbox, and click Next.
    +   +

    Figure 7: Specify Module screen.

    +
  4. +
  5. On the next page, enter emp in the URI Pattern field and click + Next.
    +   +

    Figure 8: Specify template screen.

    +
  6. +
  7. On the next screen, glance at the confirmation details, and then click Finish.
    +
    + A new module—demo—was created with an associated resource template—emp. As a final + step, a resource handler needs to be associated with the resource template. A resource handler + associates a resource template with some sort of data source and action.
    +   +
  8. +
  9. Expand the Modules node in the tree by clicking the small triangle icon.
  10. +
  11. Expand the demo node in the tree by doing the same.
  12. +
  13. Right-click the emp node in the tree and select Add Handler > GET. +
  14. +
  15. Set Source Type to Query, ensure that Data Format is + set to JSON, and click Apply.
    +   +

    Figure 9: Create Resource Handler screen.

    + For the next and final step, enter the SQL statement for the handler. This simple query will return a + list of employees based on their salary, from highest to lowest. +
  16. +
  17. In the Worksheet tab, enter the following SQL commands:
    +   +
    +
    + +
    + +
    +
    +SELECT 
    +  empno,
    +  dname, 
    +  ename,
    +  sal,
    +  job,
    +  sal_diff,
    +  comm,
    +  rownum rank
    +FROM 
    +  (
    +  SELECT
    +    e.empno,
    +    d.dname,
    +    e.ename,
    +    e.sal,
    +    e.job,
    +    0 sal_diff,
    +    NVL(e.comm, 0) comm
    +FROM
    +  emp e,
    +  dept d
    +WHERE
    +  e.deptno = d.deptno
    +ORDER BY
    +  e.sal DESC
    +  ) 
    +      
    +
    +
  18. +
  19. Click the floppy disk icon to save your changes. Be sure to click the single floppy disk icon in the SQL + Worksheet tab, not the double floppy disk icon in the top of the main window.
  20. +
+ +

At this point, the first RESTful module and web service have been created and are ready for use.

+ +

Testing Web Services
+ Testing web services can be very simple or somewhat involved, depending on the type of web service that is + being tested. Typically, GETs are fairly simple to test, and testing can be done with nothing more than a + browser. The others—specifically POST, PUT, and DELETE—require a tool designed to help test web + services. While there are several tools that can be used to do this, one that is easy to use, is free, and + works on all major platforms is called Postman. Postman makes it + simple to test almost any web service, even those that require authentication or header variables. We'll + use Postman in this article to verify that our web services are working properly, but any similar tool would + also work.

+ +

To test our first web service, simply start Postman and enter the following into the Enter request + URL field:

+ +
+
+
Copy
+
+ +
+
+http://servername/ords/db/demo/emp
+
+ +

Note: Be sure to replace servername with the name of your server.

+ +

If Postman successfully runs the web service, it will display the JSON document from our first web service, + as shown in Figure 10. Postman will also let us toggle between Pretty, Raw, and Preview mode to see the + document formatted in various modes.

+ + +

Figure 10: The results of testing our first web service in Postman.

+ +

Note: The first parameter of the URL might vary. Check with your DBA for the specifics.

+ +

Creating the Remaining Web Services
+ We need to create four more web services: one for the DEPT table, one for a specific record from the EMP + table, one for the source of the chart, and one to facilitate updates to the EMP table.

+ +

dept Web Service

+ +

Let's start with the dept web service, because it is nearly identical to the one we just created. This + web service will be used to populate the select list on the edit form.

+ +
    +
  1. Locate and right-click the demo node in the tree. Select Add Template. +
  2. +
  3. Enter dept in the URI Pattern field and click Next.
  4. +
  5. Click Finish.
  6. +
  7. Next, locate the dept node in the tree and right-click it. Select Add + Handler, and when that expands, select GET.
  8. +
  9. Set Source Type to Query, ensure that Data Format is + set to JSON, and click Apply.
  10. +
  11. In the resulting worksheet, enter the following SQL statement:
    +   +
    +
    + +
    + +
    +
    +SELECT 
    +  dname, 
    +  deptno 
    +FROM 
    +  dept 
    +ORDER BY 
    +  dname
    +
    +
  12. +
  13. Click the single floppy disk icon to save your changes.
  14. +
+ +

To test out the dept web service, enter the following URL into either your browser or Postman:

+ +
+
+
Copy
+
+ +
+
+
+http://servername/ords/db/demo/dept
+
+   + +

Note: Be sure to replace servername with the name of your server.

+ +

The results should contain a JSON document with four records, one for each of the departments in the table. +

+ +

emp Web Service: GET

+ +

Next, we'll create the web service that brings back a single row from the EMP table. This will be used + when selecting an employee to edit. This web service will make use of a parameter that will be passed in as + part of the requesting URI.

+ +
    +
  1. Locate and right-click the demo node in the tree. Select Add Template. +
  2. +
  3. Enter emp/{empno} in the URI Pattern field and click + Next.
  4. +
  5. Click Finish.
  6. +
  7. Next, locate the emp/{empno} node in the tree and right-click it. Select Add + Handler, and when that expands, select GET.
  8. +
  9. Set the Source Type field to Query One Row and click + Apply.
  10. +
  11. In the resulting worksheet, enter the following SQL statement:
    +   +
    +
    + +
    + +
    +
    +SELECT
    +  empno,
    +  deptno, 
    +  ename,
    +  sal,
    +  job,
    +  NVL(comm, 0) comm
    +FROM
    +  emp
    +WHERE
    +  empno = :empno
    +
    +
  12. +
  13. Click the Parameters tab.
  14. +
  15. Next, click the green plus icon.
  16. +
  17. A new row will be added, with the Name and Bind Parameter columns + blank. Click the Name column and enter empno. Repeat this for the + Bind Parameter column. Set Data Type to INTEGER. The + end result should resemble the following:
    +  
  18. +
  19. Click the SQL Worksheet tab and then click the single floppy disk icon to save your + changes.
  20. +
+ +

Testing this web service is a bit more involved, because we have to manually substitute the value of + empno when we test. Thus, a sample URL that we can try looks like this:

+ +
+
+
Copy
+
+ +
+
+
+http://servername/ords/db/demo/emp/7839
+
+   + +

Note: Be sure to replace servername with the name of your server.

+ +

This should bring back a JSON document with a single record for the employee KING.

+ +

salByJob Web Service

+ +

We also need a simple web service that will produce the data for an Oracle JET chart. This web service will + simply return a job title and the total amount of the salaries associated with each job.

+ +

Oracle JavaScript Extension Toolkit + (Oracle JET) is a free, open source JavaScript library that is maintained by Oracle. Here's a + description of it from the Oracle JET home page: "Oracle JET is targeted at intermediate to + advanced JavaScript developers working on client-side applications. It's a collection of open source + JavaScript libraries along with a set of Oracle contributed JavaScript libraries that make it as simple + and efficient as possible to build applications that consume and interact with Oracle products and + services, especially Oracle Cloud services."

+ +

One of the Oracle JET libraries that we can easily incorporate into our demonstration is the visualizations + library. Oracle JET contains a wide range of visualizations—charts, gauges, and other popular + components—that can be used either with the rest of the Oracle JET components or on their own. In this + case, we're going to use a simple pie chart visualization on the same page as our report.

+ +

For the pie chart to work, it needs a data source. Do the following to create a new web service that will act + as the source of the pie chart; it will return the aggregate amount for each job role.

+ +
    +
  1. Locate and right-click the demo node in the tree. Select Add Template. +
  2. +
  3. Enter salByJob for the URI Pattern field and click Next. +
  4. +
  5. Click Finish.
  6. +
  7. Next, locate the salByJob node in the tree and right-click it. Select Add + Handler, and when that expands, select GET.
  8. +
  9. Set the Source Type field to Query, ensure that Data + Format is set to JSON, and click Apply.
  10. +
  11. In the resulting worksheet, enter the following SQL commands:
    +   +
    +
    + +
    + +
    +
    +SELECT
    +  job,
    +  SUM(sal) sal
    +FROM
    +  emp
    +GROUP BY
    +        job
    +ORDER BY
    +        2 DESC
    +
    +
  12. +
  13. Click the single floppy disk icon to save your changes.
  14. +
+ +

Testing this web service is just as straightforward as the last few tests we did. Navigate to the following + URL:

+ +
+
+
Copy
+
+ +
+
+http://servername/ords/db/demo/salByJob
+
+ +

Note: Be sure to replace servername with the name of your server.

+ +

This should bring back a JSON document that shows the total salary for each distinct job.

+ +

emp Web Service: POST

+ +

Last, we need to create the web service that will be called when we're updating a record. This one will + be the most involved, because multiple parameters need to be mapped.

+ +
    +
  1. Locate the emp node in the tree and right-click it. Select Add + Handler, and when that expands, select POST.
  2. +
  3. Click the green plus icon, and then click the white area just below MIME Types. Enter + application/json and press the tab key. The results should look like this:
    +   +

    Figure 12. Create Resource Handler screen.

    +
  4. +
  5. Click Apply.
  6. +
  7. In the resulting worksheet, enter the following SQL statement:
    +   +
    +
    + +
    + +
    +
    +
    +BEGIN
    +UPDATE emp SET 
    +  sal = :sal,
    +  deptno = :deptno,
    +  ename = :ename,
    +  comm = :comm,
    +  job = :job
    +WHERE
    +  empno = :empno;
    +END;
    +
    +
  8. +
  9. Click the Parameters tab.
  10. +
  11. Next, click the green plus icon six times to create six rows. Add the parameters shown in the Table + 1:
    +
    + Table 1. Parameters to add + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
    NameBind ParameterAccess MethodsSource TypeData Type
    empnoempnoINHTTP HEADERINTEGER
    enameenameINHTTP HEADERSTRING
    deptnodeptnoINHTTP HEADERINTEGER
    salsalINHTTP HEADERINTEGER
    commcommINHTTP HEADERINTEGER
    jobjobINHTTP HEADERSTRING
    +
    +   +
  12. +
  13. Click the SQL Worksheet tab and then click the single floppy disk icon to save your + changes.
  14. +
+ +

Because this web service involves a POST, we will need to use Postman to help test it. The following steps + outline how to do this.

+ +
    +
  1. Change the transaction type from GET to POST.
    +   + +

    Figure 13. Changing the transaction type.

    +
  2. +
  3. Enter the following URL into the Enter Request URL field in Postman:
    +   +
    +
    + +
    + +
    +
    +
    +http://servername/ords/db/demo/emp
    +
    + Note: Be sure to replace servername with the name of your server.
    +   +
  4. +
  5. Click the Headers tab and then enter the six rows of data shown in Table 2:
    +
    + Table 2. Data to add + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
    KeyValue
    empno7839
    enameKING
    deptno10
    sal5000
    comm1000
    jobPRESIDENT
    +
    +   +
  6. +
  7. Click Send.
  8. +
+ +

There will be no feedback, but this request should change the COMM value of the employee KING to 1000. To + verify that it worked, you can simply view the data in the table via Oracle SQL Developer, or you can rerun + the web service that is built on emp.

Creating the Front End

Now that we have all the web services up and running, the next step is to create the front end. In this + example, we're going to use jQuery. There are other more-modern frameworks available, but most people + have used jQuery at some point in their career, and the provided code can be translated for a different + framework.

+ +

The functionality of the front end will be extremely simple: produce a report and a form based on the results + of the web services. For this example, we'll use two HTML pages to do this. The report page contains + some jQuery code that will call the web service, parse it into a JSON document, and then loop through the + results. The result will be a simple HTML table that contains the results of the SQL commands in the web + service. There will also be an Oracle JET chart on this page, which is fed by a separate web service.

+ +

The code snippet in Listing 1 illustrates the function used to parse and display the data in the report. + Using the getJSON function, jQuery will make an asynchronous call to the web service, which + will in turn return a JSON document. That document will then be parsed and each record will be examined. For + each record in the document, the values will be extracted and appended to a table tag with an ID of + rpt.

+ +

Any JavaScript framework can be used to do this; the only requirement is to use the same URI for the web + service. Formatting and displaying the data can be done any way you choose.

+ +
+
+
Copy
+
+ +
+
+
+<script type="text/javascript" language="javascript">
+  $(document).ready(function() {
+    $.getJSON("http://servername/ords/db/demo/emp", function(data) {
+      $(data.items).each(function (index, value) {  
+        $("#rpt").append(
+          '<tr>'
+          + '<td class="text-center"><a href="form.html#'
+          + value.empno + '">Edit</a></td>'
+          + '<td class="text-right">'  + value.empno    + '</td>'
+          + '<td class="text-center">' + value.dname    + '</td>'
+          + '<td class="text-left">'   + value.ename    + '</td>'
+          + '<td class="text-left">'   + value.job      + '</td>'
+          + '<td class="text-right">'  + value.sal      + '</td>'
+          + '<td class="text-right">'  + value.sal_diff + '</td>'
+          + '<td class="text-right">'  + value.comm     + '</td>'
+          + '<td class="text-center">' + value.rank     + '</td>'
+          + '</tr>');
+      });
+    });
+  });
+</script>
+
+
+ + +

Listing 1: The jQuery function that calls the web service and parses and prints the + data to the page.

+ +

This all happens as soon as the page is loaded and, thus, the data is displayed on the page as if it were a + static page. A couple of CSS lines from Bootstrap and the Oracle JET chart add some polish to an otherwise + drab looking report, as shown in Figure 14. Bootstrap is a free, open source library that's a + combination of JavaScript, CSS, and HTML templates designed to provide a solid, responsive, and compatible + user interface framework.

+ + +

Figure 14: The results of the HTML page that calls the web service via jQuery. +

+ +

When a user clicks an Edit link, the form page will be called. This page will call the web + service for the corresponding row that was clicked and populate the form elements. The results are also + spruced up a bit with Bootstrap, as seen in Figure 15.

+ + +

Figure 15: The form used to update data via web services.
+ When the form page is loaded, two separate web service calls are made, as shown in Listing 2.

+ +
+
+
Copy
+
+ +
+
+<script type="text/javascript" language="javascript">
+  $(document).ready(function() {
+    $.getJSON("http://servername/ords/db/demo/dept", function(data) {
+      $(data.items).each(function( i ) {
+      $('#deptno').append($('<option>', 
+         {value:data.items[i].deptno, text:data.items[i].dname}));
+      });
+    });
+    setTimeout(function(){
+      $.getJSON("http://servername/ords/db/demo/emp/" 
+        + window.location.hash.substring(1), function(data) {
+        $(data).each(function (index, value) {
+          $("#empno").val(value.empno);  
+          $("#ename").val(value.ename);  
+          $("#job").val(value.job);  
+          $("#sal").val(value.sal);
+          $("#deptno").val(value.deptno);
+          $("#comm").val(value.comm);
+        });
+      });
+    }, 100);
+  });
+</script>
+
+ + +

Listing 2: The jQuery on the form will call two web services: one for the select list + and one for the data.

+ The first web service call— dept—will be used to populate the Department select list. + The name of the department will be displayed while the department ID will be stored in the database. The second + web service— emp—will fetch a single row from EMP and populate the corresponding fields + on the form. + +

When the user clicks Save, the page is posted to the web service, sending the updated data + back to the server where it will be processed and update the corresponding record. The user will then be + returned to the report page, where the updated values can be seen.

+ +

Now that you have a basic understanding as to how things will work, the only thing left to do is to create + three files based on Listing 4, Listing 5, and Listing 6 (all are in Appendix A). Simply + copy the source code from each listing and paste it into a new text file. Save Listing + 4 as rpt.html, save Listing 5 as form.html, and + save Listing 6 as chart.js. Be sure that all files are in the same + directory on your computer and that you change any occurrences of servername to your server's name.

+ +

Once the three files are created, simply open rpt.html in any modern browser. You should see a + report with 14 rows of data and a pie chart summarizing salaries by job, similar to Figure + 14.

+ +

Appendix A

+ + +

Listing 3: Scripts to Create the EMP and DEPT Tables

+ +
+
+
Copy
+
+ +
+
+
+DROP TABLE EMP
+/  
+  
+DROP TABLE DEPT
+/ 
+
+CREATE TABLE DEPT
+  (
+  DEPTNO NUMBER(2),
+  DNAME VARCHAR2(14),
+  LOC VARCHAR2(13),
+  CONSTRAINT dept_pk PRIMARY KEY (deptno)
+  )
+/ 
+
+
+CREATE TABLE EMP
+  (
+  EMPNO NUMBER(4) NOT NULL,
+  ENAME VARCHAR2(10),
+  JOB VARCHAR2(9),
+  MGR NUMBER(4),
+  HIREDATE DATE,
+  SAL NUMBER(7, 2),
+  COMM NUMBER(7, 2),
+  DEPTNO NUMBER(2),
+  CONSTRAINT EMP_PK PRIMARY KEY (empno),
+  CONSTRAINT deptno_fk FOREIGN KEY (deptno) REFERENCES dept (deptno)
+  )
+/ 
+
+INSERT INTO DEPT VALUES (10, 'ACCOUNTING', 'NEW YORK');
+INSERT INTO DEPT VALUES (20, 'RESEARCH', 'DALLAS');
+INSERT INTO DEPT VALUES (30, 'SALES', 'CHICAGO');
+INSERT INTO DEPT VALUES (40, 'OPERATIONS', 'BOSTON');
+ 
+
+
+INSERT INTO EMP VALUES
+  (7369, 'SMITH', 'CLERK', 7902, TO_DATE('17-DEC-1980', 'DD-MON-YYYY'), 800, NULL, 20);
+INSERT INTO EMP VALUES
+  (7499, 'ALLEN', 'SALESMAN', 7698, TO_DATE('20-FEB-1981', 'DD-MON-YYYY'), 1600, 300, 30);
+INSERT INTO EMP VALUES
+  (7521, 'WARD', 'SALESMAN', 7698, TO_DATE('22-FEB-1981', 'DD-MON-YYYY'), 1250, 500, 30);
+INSERT INTO EMP VALUES
+  (7566, 'JONES', 'MANAGER', 7839, TO_DATE('2-APR-1981', 'DD-MON-YYYY'), 2975, NULL, 20);
+INSERT INTO EMP VALUES
+  (7654, 'MARTIN', 'SALESMAN', 7698, TO_DATE('28-SEP-1981', 'DD-MON-YYYY'), 1250, 1400, 30);
+INSERT INTO EMP VALUES
+  (7698, 'BLAKE', 'MANAGER', 7839, TO_DATE('1-MAY-1981', 'DD-MON-YYYY'), 2850, NULL, 30);
+INSERT INTO EMP VALUES
+  (7782, 'CLARK', 'MANAGER', 7839, TO_DATE('9-JUN-1981', 'DD-MON-YYYY'), 2450, NULL, 10);
+INSERT INTO EMP VALUES
+  (7788, 'SCOTT', 'ANALYST', 7566, TO_DATE('09-DEC-1982', 'DD-MON-YYYY'), 3000, NULL, 20);
+INSERT INTO EMP VALUES
+  (7839, 'KING', 'PRESIDENT', NULL, TO_DATE('17-NOV-1981', 'DD-MON-YYYY'), 5000, NULL, 10);
+INSERT INTO EMP VALUES
+  (7844, 'TURNER', 'SALESMAN', 7698, TO_DATE('8-SEP-1981', 'DD-MON-YYYY'), 1500, 0, 30);
+INSERT INTO EMP VALUES
+  (7876, 'ADAMS', 'CLERK', 7788, TO_DATE('12-JAN-1983', 'DD-MON-YYYY'), 1100, NULL, 20);
+INSERT INTO EMP VALUES
+  (7900, 'JAMES', 'CLERK', 7698, TO_DATE('3-DEC-1981', 'DD-MON-YYYY'), 950, NULL, 30);
+INSERT INTO EMP VALUES
+  (7902, 'FORD', 'ANALYST', 7566, TO_DATE('3-DEC-1981', 'DD-MON-YYYY'), 3000, NULL, 20);
+INSERT INTO EMP VALUES
+  (7934, 'MILLER', 'CLERK', 7782, TO_DATE('23-JAN-1982', 'DD-MON-YYYY'), 1300, NULL, 10);
+   
+
+
+COMMIT
+/
+     
+  
+    
+
+ + +

Listing 4: Sample Report HTML File

+ Save this file as rpt.html. + +

Note: Be sure to change the value of servername to your server's name. + There is one instance in this file.

+ +
+
+
Copy
+
+ +
+
+<!DOCTYPE html>
+<meta content="text/html;charset=utf-8" http-equiv="Content-Type">
+<meta content="utf-8" http-equiv="encoding">
+<head>
+<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.2.4/jquery.min.js"></script>
+<script type="text/javascript" src="https://cdnjs.cloudflare.com/ajax/libs/require.js/2.2.0/require.min.js"></script>
+<link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.7/css/bootstrap.min.css" integrity="sha384-BVYiiSIFeK1dGmJRAkycuHAHRg32OmUcww7on3RYdg4Va+PmSTsz/K68vbdEjh4u" crossorigin="anonymous">
+<link rel="stylesheet" href="http://rawgit.com/oracle/oraclejet/2.1.0/dist/css/alta/oj-alta-min.css">
+<script type="text/javascript" src="chart.js"></script>
+<style type="text/css">
+th { text-align: center !important }
+</style>
+<title>Employees</title>
+<script type="text/javascript" language="javascript">
+  $(document).ready(function() {
+    $.getJSON("http://servername/ords/db/demo/emp", function(data) {
+      $(data.items).each(function (index, value) {  
+        $("#rpt").append(
+          '<tr>'
+          + '<td class="text-center"><a href="form.html#' + value.empno    + '">Edit</a></td>'
+          + '<td class="text-right">'  + value.empno    + '</td>'
+          + '<td class="text-center">' + value.dname    + '</td>'
+          + '<td class="text-left">'   + value.ename    + '</td>'
+          + '<td class="text-left">'   + value.job    + '</td>'
+          + '<td class="text-right">$'  + value.sal      + '</td>'
+          + '<td class="text-right">$'  + value.sal_diff + '</td>'
+          + '<td class="text-right">$'  + value.comm     + '</td>'
+          + '<td class="text-center">' + value.rank     + '</td>'
+          + '</tr>');
+      });
+    });
+  });
+</script>
+</head>
+<body>
+ 
+
+
+<div class="container">
+  <div class="row">
+    <div class="col-sm-8">
+      <h2>Employees</h2>
+      <table id="rpt" class="table table-striped table-hover">
+      <tr>
+        <th>EDIT</th>
+        <th>EMPNO</th>
+        <th>DEPT</th>
+        <th>ENAME</th>
+        <th>JOB</th>
+        <th>SAL</th>
+        <th>SAL_DIFF</th>
+        <th>COMM</th>
+        <th>RANK</th>
+      </tr>
+      </table>
+    </div>
+    <div class="col-sm-4" id='chart-container'>
+      <h2>Salaries by Job</h2>
+      <div id="pieChart" data-bind="ojComponent: {
+                component: 'ojChart',
+                type: 'pie',
+                series: pieSeriesValue,
+                animationOnDisplay: 'false',
+                animationOnDataChange: 'auto',
+                hoverBehavior: 'dim'
+                }"
+                style="max-width:500px;width:100%;height:350px;">
+      </div>
+    </div>
+  </div>
+</div>
+ 
+
+
+
+
+</body>
+</html>
+
+ + +

Listing 5: Sample Form HTML File

+ +

Save this file as form.html. Be sure to save it in the same directory as rpt.html. +

+ +

Note: Be sure to change the value of servername to your server's name. + There are three instances in this file.

+ +
+
+
Copy
+
+ +
+
+
+<!DOCTYPE html>
+<meta content="text/html;charset=utf-8" http-equiv="Content-Type">
+<meta content="utf-8" http-equiv="encoding">
+<head>
+<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.2.4/jquery.min.js"></script>
+<link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.7/css/bootstrap.min.css" integrity="sha384-BVYiiSIFeK1dGmJRAkycuHAHRg32OmUcww7on3RYdg4Va+PmSTsz/K68vbdEjh4u" crossorigin="anonymous">
+<style type="text/css">
+label { padding: 10px; width:150px; text-align: right;}
+button { margin: 10px;}
+</style>
+<title>Employee Details</title>
+<script type="text/javascript" language="javascript">
+  $(document).ready(function() {
+    $.getJSON("http://servername/ords/db/demo/dept", function(data) {
+      $(data.items).each(function( i ) {
+      $('#deptno').append($('<option>', {value:data.items[i].deptno, text:data.items[i].dname}));
+      });
+    });
+    setTimeout(function(){
+      $.getJSON("http://servername/ords/db/demo/emp/" + window.location.hash.substring(1), function(data) {
+        $(data).each(function (index, value) {
+          $("#empno").val(value.empno);  
+          $("#ename").val(value.ename);  
+          $("#job").val(value.job);
+          $("#sal").val(value.sal);
+          $("#deptno").val(value.deptno);
+          $("#comm").val(value.comm);
+        });
+      });
+    }, 100);
+  });
+</script>
+</head>
+<body>
+<form action="rpt.html" method="post" name="registration" id="form">
+  <div class="container" style="outline: 1px solid black;background-color:#eee;margin-top:20px;">
+    <h2>Employee Details</h2>
+    <input type="hidden" name="empno" id="empno" />
+    <label for="ename">Employee Name</label>
+    <input type="text" name="ename" id="ename" />
+    <label for="deptno">Department</label>
+    <select name="deptno" id="deptno"></select>
+    <label for="job">Job</label>
+    <input type="text" name="job" id="job" />
+    <br />
+    <label for="sal">Salary</label>
+    <input type="text" name="sal" id="sal" />
+    <label for="comm">Commission</label>
+    <input type="text" name="comm" id="comm" />
+    <br />
+    <span class="pull-right">
+      <a href="rpt.html" style="color:black;"><input type="button" value="Cancel" /></a>
+      <button type="submit">Save</button>
+    </span>
+  </div>
+</form>
+<script>
+$( "form" ).submit(function( event ) {
+  $.post( "http://servername/ords/db/demo/emp", $( "#form" ).serialize() );
+});
+</script>
+</body>
+</html>
+
+ + +

Listing 6: chart.js File

+ +

Save this file as chart.js. Be sure to save it in the same directory as rpt.html. +

+ +

Note: Be sure to change the value of servername to your server's name. + There is one instance in this file.

+ +
+
+
Copy
+
+ +
+
+
+        window.onload=function(){
+        require(['knockout',
+        'ojs/ojcore',
+        'jquery',
+        'ojs/ojknockout',
+        'ojs/ojcore',
+        'ojs/ojbutton',
+        'ojs/ojchart'
+        ], function(ko, oj, $) {
+        'use strict';
+        function ChartModel(pieSeries) {
+        var self = this;
+        self.threeDValue = ko.observable('off');
+        self.sortingValue = ko.observable('descending');
+        self.pieSeriesValue = ko.observableArray(pieSeries);
+        /* toggle buttons */
+        self.threeDOptions = [
+        {id: '2D', label: '2D', value: 'off', icon: 'oj-icon demo-2d'},
+        {id: '3D', label: '3D', value: 'on', icon: 'oj-icon demo-3d'}
+        ];
+        self.threeDValueChange = function(event, data) {
+        self.threeDValue(data.value);
+        return true;
+        }
+        }
+         
+
+
+        $(document).ready(function(){
+        $.getJSON("http://servername/ords/db/demo/salByJob").
+        then(function(ret) {
+        var pieSeries=[];
+        $.each(ret.items, function(idx,emp) {
+        var nextEl= {"name" :emp.job,"items": [emp.sal]};
+        pieSeries.push(nextEl);
+        });
+        console.log(pieSeries)
+        ko.applyBindings(new ChartModel(pieSeries), document.getElementById('chart-container'));
+    
+        });
+        });
+        });
+        //RequireJS configs (usually these come first in main.js, but they don't have to)
+        requirejs.config({
+        // Path mappings for the logical module names
+        paths: {
+        'knockout': 'http://cdnjs.cloudflare.com/ajax/libs/knockout/3.4.0/knockout-min',
+        'jquery': 'http://cdnjs.cloudflare.com/ajax/libs/jquery/3.1.0/jquery.min',
+        "jqueryui-amd": "http://rawgit.com/jquery/jquery-ui/1.12.0/ui",
+        "promise": "http://cdnjs.cloudflare.com/ajax/libs/es6-promise/3.2.1/es6-promise.min",
+        "hammerjs": "http://cdnjs.cloudflare.com/ajax/libs/hammer.js/2.0.8/hammer.min",
+        "ojdnd": "http://rawgit.com/oracle/oraclejet/2.1.0/dist/js/libs/dnd-polyfill/dnd-polyfill-1.0.0.min",
+        "ojs": "http://rawgit.com/oracle/oraclejet/2.1.0/dist/js/libs/oj/debug",
+        "ojL10n": "http://rawgit.com/oracle/oraclejet/2.1.0/dist/js/libs/oj/ojL10n",
+        "ojtranslations": "http://rawgit.com/oracle/oraclejet/2.1.0/dist/js/libs/oj/resources",
+        "text": "http://cdnjs.cloudflare.com/ajax/libs/require-text/2.0.12/text.min",
+        "signals": "http://cdnjs.cloudflare.com/ajax/libs/js-signals/1.0.0/js-signals.min",
+        },
+        // Shim configurations for modules that do not expose AMD
+        shim: {
+        'jquery': {
+        exports: ['jQuery', '$']
+        }
+        }
+        });
+        }
+
+ +

Summary

+ +

Now that you've prepared the environment, created the web services, and created the front end, all you'll have to do in Part 2 of this series is reload the HTML pages to see changes, because the changes will all be made on the server side in the database. Keep in mind that this example is intentionally simple so that you can focus on understanding how the server-side components work. Once you understand the basics, the front end can capture and parse JSON from the web service in any number of ways and for any number of purposes.

+ +
+

About the Author

+ +
+
Scott Spendolini is president and founder of Sumner Technologies, a + world-class Oracle services, education, and solutions firm. Throughout his career, he has assisted + clients with their Oracle Application Express development and training needs. Spendolini is a + long-time, regular presenter at many Oracle-related conferences, including Oracle OpenWorld, Kscope, + and Rocky Mountain Oracle Users Group (RMOUG). He is an Oracle Ace Director, the author of Expert + Oracle Application Express Security, and a coauthor of Pro Oracle Application Express. Spendolini is + also an Oracle Certified Oracle Application Express developer. Spendolini started his career at + Oracle Corporation, where he worked with Oracle E-Business Suite for almost seven years and was a + senior product manager for Oracle Application Express for over three years. He holds a dual + bachelor's degree from Syracuse University in management information systems and + telecommunications management.
+
+
+ +
Join the Database Community Conversation + + +
+
+
\ No newline at end of file diff --git a/Articles/databases/158-oracle-r-advanced-analytics-for-hadoop-part-2.html b/Articles/databases/158-oracle-r-advanced-analytics-for-hadoop-part-2.html index e69de29..567e6f9 100644 --- a/Articles/databases/158-oracle-r-advanced-analytics-for-hadoop-part-2.html +++ b/Articles/databases/158-oracle-r-advanced-analytics-for-hadoop-part-2.html @@ -0,0 +1,355 @@ + + +
+
+ +

In this article, which is Part 2 of a series, we will look at some of the more advanced features of Oracle R + Advanced Analytics for Hadoop, including advanced analytics and machine learning, and how to use Spark. + Oracle R Advanced Analytics for Hadoop is a component of Oracle Big Data Connectors and provides a set of R + functions allowing you to connect to and process data stored in Hadoop Distributed File System (HDFS), using + Hive transparency, as well as data stored in Oracle Database.

+ +

In Part 1 + of this series, we looked at some of the more typical use cases for using Oracle R Advanced + Analytics for Hadoop, including working with Oracle Database, HDFS, and Hive and initiating map-reduce jobs. + Oracle R Advanced Analytics for Hadoop has a number of highly scalable machine learning algorithms and + utilizes some of the Apache Spark machine learning algorithms for greater performance of in-memory + distributed machine learning. This will be the focus of this article.

+ +

Analytical and Machine Learning Features in Oracle R Advanced Analytics for Hadoop

+ +

When using Oracle R Advanced Analytics for Hadoop, you also have access to the wide range of analytic + functions available from the many thousands of R packages. It would take a very long time to cover all of + those here, but when we look closer at what analytic functions are specific to Oracle R Advanced Analytics + for Hadoop, we find the functions listed in Table 1. To find this list, you can use the following R command + once the ORCH package has been loaded.

+ +
+
+
Copy
+
+ +
+
+apropos("^orch")
+
+

+ +

Table 1: Statistical and analytic functions available in Oracle R Advanced Analytics for + Hadoop

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Function Name
orch.cor
orch.cov
orch.glm
orch.glm2*
orch.glm.control
orch.kmeans
orch.lm
orch.lm2*
orch.lmf
orch.multivar
orch.neural
orch.neural2*
orch.nmf
orch.predict
orch.princomp
orch.sample
orch.scale
+ +

Note: * These are Spark-enabled versions of the functions.

+

With each release of Oracle R Advanced Analytics for Hadoop, you will find that the list of analytic and machine learning functions increases. These functions have been specifically tuned to work in big data environments with data in HDFS and Hive, which allows them to scale using map-reduce jobs and to make more efficient use of memory.

+

The following example illustrates the creation of a linear regression model with the orch.lm() function, using the on-time flight dataset.

+ +
+
+
Copy
+
+ +
+
+# Attach the dataset containing the details of flights
+# Data file is located in HDFS
+ontime_DS <- hdfs.attach("/user/oracle/ontime_s")
+
+# Create a linear regression model on this dataset to 
+# predict the possible flight delay time
+# 
+# Map-Reduce is used to scale the processing to create the model
+# using 4 mappers and 2 reducers 
+lm_model <- orch.lm(ARRDELAY ~ DISTANCE + DEPDELAY, 
+                  dfs.dat = ontime_DS,
+                  numMappers = 4, 
+                  numReducers = 2)
+
+# Display the summary details of the LM model
+summary(lm_model)
+
+
+ +

As you can see, these Oracle R Advanced Analytics for Hadoop analytic and machine learning functions are easy + to use and are highly scalable. Make sure you check the documentation + for each function to ensure that you are maximizing them fully.

+ +
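As a quick aside, the standard R help system also covers these functions once the ORCH package is loaded, so a minimal sketch for looking one up from within your R session is simply:

# Open the built-in help page for an Oracle R Advanced Analytics for Hadoop function
library(ORCH)
?orch.lm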

Spark Machine Learning Feature in Oracle R Advanced Analytics for Hadoop

+ +

Over the past few releases of Oracle R Advanced Analytics for Hadoop, Oracle has been increasing the support + for using Spark. By doing this, Oracle is making it easier to access and use the various machine learning + functions available in Spark, thereby utilizing their memory-resident efficiency. Additionally, some of the + HDFS functions have been updated to allow data to be easily transferred from Spark RDDs into HDFS. Similarly + these Spark-based functions can be run on data stored in HDFS and Hive. Table 2 lists the Spark-enabled + functions in Oracle R Advanced Analytics for Hadoop (version 2.7.1).

+ +

Table 2: Spark-enabled functions available in Oracle R Advanced Analytics for Hadoop

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Function Name
orch.lm2
orch.glm2
orch.ml.gmm
orch.ml.linear
orch.ml.lasso
orch.ml.ridge
orch.ml.logistic
orch.ml.dt
orch.ml.random.forest
orch.ml.svm
orch.ml.kmeans
orch.ml.pca

+ +

It is expected that the list of functions in Table 2 will be expanded with each subsequent release of Oracle + R Advanced Analytics for Hadoop.

+

Additionally, some Oracle R Advanced Analytics for Hadoop functions have been updated to support the use of + these algorithms in Spark. These include an updated predict() function for + scoring new datasets using the Spark-based models. The orch.save.model() function + saves the Spark-based model details to a file in HDFS. This allows the model to be saved for later use or + for sharing with other data scientists. The orch.load.model() function can then + be used to reload the Spark model back into the environment.

+

To enable access to Spark from your R and Oracle R Advanced Analytics for Hadoop environment, it is important + that you have Spark installed and the necessary environment variables enabled to make it accessible. This + can be easily configured by editing the Renviron.site file to ensure that the + SPARK_HOME and SPARK_JAVA_OPTS environment variables + are set and to ensure that the necessary Spark directories are included in the CLASSPATH. Some of this setup is dependent on your environment. The working + environment for the articles in this series is the Oracle VM VirtualBox prebuilt virtual machine called + Oracle Big Data Lite VM (see Part 1 for more information).

+

The first step is to create a Spark connection. A Spark connection can be set up to use Yarn, or it can be set + up in standalone mode. The following example illustrates the spark.connect() + function. This function has four parameters; the example below specifies three of them (the Spark master, + the memory allocation, and the HDFS NameNode):

+ + +
+
+
Copy
+
+ +
+
+# Load the ORCH R package 
+library(ORCH) 
+
+# Create the Spark connection using Yarn 
+spark.connect("yarn-client", 
+               memory="512m", 
+               dfs.namenode="bigdatalite.localdomain")
+
+
+ +

After the Spark connection is set up, you can proceed to process the data and run the Spark-enabled + algorithms that you need to use. The following example illustrates using the Spark algorithm orch.glm2() to fit a model for the kyphosis dataset that is part of the rpart R package.

+ +
+
+
Copy
+
+ +
+
+# Load the rpart package to allow access to the kyphosis dataset
+# Create a local copy of the dataset
+library(ORCH)
+library(rpart)
+k_dataset <- kyphosis
+
+# Write the dataset to HDFS. 
+# It will be this dataset that will be used with Spark
+k_hdfs <- hdfs.put(k_dataset)
+# List the contents of the default directory in HDFS 
+# and verify the file exists
+hdfs.ls()
+
+# Call the Spark-enabled GLM2 function to generate the 
+# machine learning model
+sparkModel <- orch.glm2(Kyphosis ~ Age + Number + Start, 
+                        dfs.dat = k_hdfs)
+
+ ORCH GLM: processed 1 factor variables, 0.365 sec 
+ ORCH GLM: created model matrix, 2  partitions, 0.398 sec 
+ ORCH GLM: iter  1,  deviance   1.12289843250711020E+02,  elapsed time 0.216 sec 
+ ORCH GLM: iter  2,  deviance   6.64219993846240600E+01,  elapsed time 0.304 sec 
+ ORCH GLM: iter  3,  deviance   6.18628545282569460E+01,  elapsed time 0.277 sec 
+ ORCH GLM: iter  4,  deviance   6.13897990884807400E+01,  elapsed time 0.313 sec 
+ ORCH GLM: iter  5,  deviance   6.13799331446360300E+01,  elapsed time 0.460 sec 
+ ORCH GLM: iter  6,  deviance   6.13799272764552550E+01,  elapsed time 0.214 sec
+
+
+ +

The GLM2 Spark model can be saved to HDFS using the orch.save.model function. This + function takes the name of the model as the first parameter and the name of the file in HDFS as the second + parameter.

+ +
+
+
Copy
+
+ +
+
+orch.save.model(sparkModel, "sparkmodel_hdfs", overwrite=TRUE)
+
+
+ +

When you want to reuse the saved model, you can use the orch.load.model function + to load the model details back into your R environment, for example:

+ +
+
+
Copy
+
+ +
+
+modelReloaded <- orch.load.model("sparkmodel_hdfs")
+
+
+ +

When you are finished performing your analytics and machine learning using Spark, you can close the + connection using the spark.disconnect function. This function does not delete the + current Spark context but instead marks it as inactive. Within a short time, all associated resources will be + freed by the R environment and the Java garbage collector.

+ +
+
+
Copy
+
+ +
+
+# Disconnect from Spark
+spark.disconnect()
+
+
+ +

Summary

+ +

In this article, we looked at using Oracle R Advanced Analytics for Hadoop to perform advanced analytics work, + including using some of its machine learning algorithms and using Spark to enable machine learning.

+ +

In Part 1 of this series, we looked at + how to work with data in Oracle Database, HDFS, and Hive and how to initiate map-reduce jobs. If you haven't + read it yet, make sure to check it out.

+ +

About the Author

+ +

Oracle ACE Director Brendan Tierney is an independent consultant (Oralytics) and lectures on data science, databases, and big data at the Dublin + Institute of Technology/Dublin Technological University. He has 24+ years of experience working in the areas + of data mining, data science, big data, and data warehousing. As a recognized data science and big data + expert, Tierney has worked on projects in Ireland, the UK, Belgium, Holland, Norway, Spain, Canada, and the + US. He is active in the UK Oracle User Group (UKOUG) community and one of the user group leaders in Ireland. + Tierney has also been editor of the UKOUG Oracle Scene magazine, is a regular speaker at + conferences around the world, and writes for several publications. In addition, he has published four books, + three with Oracle Press/McGraw-Hill (Predictive Analytics Using Oracle Data Miner, Oracle R + Enterprise: Harnessing the Power of R in Oracle Database, and Real World SQL and PL/SQL: Advice + from the Experts) and one with MIT Press (Essentials of Data Science).

+ +
+
+ + \ No newline at end of file diff --git a/Articles/databases/164-building-apps-odb12c-apex-part2.html b/Articles/databases/164-building-apps-odb12c-apex-part2.html index e69de29..12cdea5 100644 --- a/Articles/databases/164-building-apps-odb12c-apex-part2.html +++ b/Articles/databases/164-building-apps-odb12c-apex-part2.html @@ -0,0 +1,540 @@ + + +
+
+

Now that the web services have been created and integrated into the client side in Part 1 of this + series, we can spend some time on the database tier using database features to modify which rows get + returned. There will be no more code changes in the client-side portion of the application for the rest of + this article; all updates will simply be enabling different database features or slightly modifying the SQL + query used in the web service.

+ +

Analytic Functions

+ +

Analytic functions are a powerful no-cost feature of Oracle Database that allow developers to use + sophisticated analytics in their SQL queries and PL/SQL code. Because these calculations occur on the + database server, they are highly optimized and can slice through large volumes of data with ease.

+ +

Several different analytic functions are supported; a comprehensive list can be found here.

+ +

We can easily incorporate analytic functions into the SQL statements that our web service uses. This can be + done in such a way that only minor, if any, changes to the user interface need to be made. The heavy + lifting, so to speak, will be done at the database tier.

+ +

Our current SQL query returns all employees and orders them by salary from least to greatest. This was done + using a simple inline query with an ORDER BY clause. If we run this query, we can see that SMITH has the + smallest salary while KING enjoys the largest, as shown in Figure 1.

+ +

Figure 1: Results of SQL sorting on salary.
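The original web service query is not repeated here, but a query of roughly the following shape (an illustrative sketch, not the exact listing) produces the ordering shown in Figure 1:

SELECT e.empno, d.dname, e.ename, e.job, e.sal,
       0 sal_diff, NVL(e.comm, 0) comm
FROM   emp e, dept d
WHERE  d.deptno = e.deptno
ORDER  BY e.sal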

+ +

But what if we want to find the same information but segment it by department? We could simply run the SQL + query for each department, adding a WHERE clause to only look at a specific department, and then use UNION + to put all of those queries together. With a small dataset, this approach might work. But for a table with + many departments and hundreds, if not thousands of employees, this approach is very impractical.

+ +

A better approach is to seek the help of analytic functions and use either the RANK or DENSE_RANK function to + produce our results. Both functions will rank a dataset based on some criteria; the difference is that + DENSE_RANK will assign consecutive values in the event of two or more values being the same, whereas RANK + will not.

+ +

Table 1 illustrates the difference in how RANK and DENSE_RANK assign values to a simple dataset:

+ Table 1. Comparison of RANK and DENSE_RANK values + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Value    RANK    DENSE_RANK
10       1       1
20       2       2
20       2       2
30       4       3
+   + +
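To see how these values arise, a query of the following shape (illustrative; it assumes a table t with a single numeric column value holding 10, 20, 20, and 30) returns the three columns shown in Table 1:

SELECT value,
       RANK()       OVER (ORDER BY value) AS rank,
       DENSE_RANK() OVER (ORDER BY value) AS dense_rank
FROM   t
ORDER  BY value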

We'll apply RANK to our SQL query, and use deptno as the partitioned column. Thus, our new + query now looks like Listing 1:

+ +
+
+
Copy
+
+ +
+
+
+
+SELECT 
+  e.empno,
+  d.dname,
+  e.ename,
+  e.job,
+  e.sal,
+  0 sal_diff,
+  NVL(e.comm, 0) comm,
+  RANK() OVER (PARTITION BY e.deptno ORDER BY sal DESC) AS rank
+FROM 
+  emp e,
+  dept d
+WHERE
+  d.deptno = e.deptno
+
+ +

Listing 1: Modified query using the RANK() analytic function.

+ +

To incorporate the updated SQL query into the existing web service, do the following:

+ +
    +
  1. In Oracle SQL Developer, expand the REST Data Services node in the tree. From there, + expand Modules, demo, and emp to reveal a leaf node + called GET.
  2. +
  3. Right-click GET and select Open.
  4. +
  5. You should now see a Worksheet tab, where the SQL query from the creation of the web + service is displayed.
  6. +
  7. Replace the SQL query that is there with the SQL query shown in Listing 1 and save your changes by + clicking the single floppy disk icon.
  8. +
+ +

Figure 2: Updating the SQL query of the web service.

+ +

Now, the RANK column will contain the rank of an employee's salary within a specific + department, not overall across all departments, as shown in Figure 3.

+ +

Figure 3: Results of SQL query with RANK analytic function.

+ +

The results of the web service call and web page will also reflect the updated results.

+ +

We can also apply multiple analytic functions to a single SQL statement. Let's say we wanted to easily + see the difference in salary between the current employee and the previous employee in the same department. + We can turn to the LAG and LEAD analytic functions to help us with this. LAG will "look back" any + number of rows and return that value, whereas LEAD will "look forward" any number of rows and + return that value. These two analytic functions eliminate the need to create a complex self-join, and they + can be easily implemented in any SQL query.
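Listing 2 below uses LAG. Purely as an illustration of LEAD (this query is not used by the web service), the following sketch returns, alongside each employee, the next-lower salary in the same department:

SELECT e.ename,
       e.sal,
       LEAD(e.sal, 1, 0) OVER (PARTITION BY e.deptno ORDER BY e.sal DESC) AS next_lower_sal
FROM   emp e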

+ +

We can modify our query slightly to include the difference between a current employee's salary and the + next-highest paid employee's salary in the same department by using the SQL query in Listing 2.

+ +
+
+
Copy
+
+ +
+
+SELECT 
+  e.empno,
+  d.dname,
+  e.ename,
+  e.job,
+  e.sal,
+  e.sal - LAG(e.sal, 1, 0) OVER (PARTITION BY e.deptno ORDER BY e.sal DESC) AS sal_diff,
+  NVL(e.comm, 0) comm,
+  RANK() OVER (PARTITION BY e.deptno ORDER BY e.sal DESC) AS rank
+FROM 
+  emp e,
+  dept d
+WHERE
+  d.deptno = e.deptno
+
+ +

Listing 2: SQL query that makes use of two analytical functions: LAG and RANK.

+ +

To modify the SQL query used in the web service, do the following:

+ +
    +
  1. In Oracle SQL Developer, expand the REST Data Services node in the tree. From there, + expand Modules, demo, and emp to reveal a leaf node + called GET.
  2. +
  3. Right-click GET and select Open.
  4. +
  5. You should now see a Worksheet tab, where the SQL query from the creation of the web + service is displayed.
  6. +
  7. Replace the SQL query that is there with the SQL query shown in Listing 2 and save your changes by + clicking the single floppy disk icon.
  8. +
+ +

When the SQL query in Listing 2 is run, the SAL_DIFF column is computed using the LAG + function. This column represents the difference in salary between the current and previous row. In the case + of the highest paid employee per department, the value is the same as that employee's salary, because + there is no previous row to compare to. The results of the new query can be seen in Figure 4.

+ +

Figure 4: The results of the LAG function, illustrating the differences in + salary.

+ +

RANK, DENSE_RANK, LAG, and LEAD are just a few of the many analytic functions that you can easily incorporate + into your SQL statements—regardless of whether they are called as part of a web service. Read more + about analytic functions in the Oracle documentation or + on the Oracle-Base website.

+ +
Redaction
+ +

Another powerful database feature that can be easily implemented is Data Redaction, which is a feature of the + Oracle Advanced Security option for Oracle Database. Data Redaction can conditionally redact sensitive data + from being displayed to the user. Data can be completely or partially redacted or replaced with random data. +

+ +

Note: In order to manage redaction policies, your schema will need EXECUTE privileges on + DBMS_REDACT. Your DBA might have to assist in granting these privileges.
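If the privilege is missing, a DBA can grant it with a statement along the following lines (shown here for the SCOTT schema used in this article; the same pattern applies later to DBMS_RLS and DBMS_FGA):

GRANT EXECUTE ON SYS.DBMS_REDACT TO scott;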

+ +

We can create a simple redaction policy that will apply to our query—all without having to change a + single line of code on the client or the server. This policy will completely redact any values in the COMM + or commission column. With the policy enabled, any time the EMP table is queried, the value of the COMM + column will simply show 0.

+ +

To create a redaction policy, do the following:

+ +
    +
  1. From Oracle SQL Developer, select the SQL Worksheet that corresponds to your schema. + You can use the one that was used to create the EMP and DEPT tables in Part 1. If you don't see a + SQL Worksheet tab, create a new one by clicking the icon circled in Figure 5.
    +   +

    Figure 5. Icon for opening a new SQL worksheet

    +
  2. +
  3. Enter the following SQL query in the SQL Worksheet tab:
    +   +
    +
    + +
    + +
    +
    +BEGIN
    +DBMS_REDACT.add_policy
    +  (
    +  object_schema => 'SCOTT',
    +  object_name   => 'EMP',
    +  policy_name   => 'REDACT_COMM',
    +  expression    => '1=1',
    +  column_name   => 'COMM',
    +  function_type => DBMS_REDACT.FULL
    +  );
    +END;
    +/
    +
    +
  4. +
  5. Execute the SQL query by clicking the Run Statement icon.
    +   +

    Figure 6. Icon for executing the query.

    +
  6. +
+ +

If the query executes successfully, you will see a message that reads "PL/SQL procedure successfully + completed." If you re-query the data from either Oracle SQL Developer or the web service, all values + for COMM should read 0, as shown in Figure 7.

+ +

Figure 7: The results of the same query after a redaction policy was applied.

+ +

Let's keep the redaction policy in force as we enable additional database functionality.

+ Oracle Virtual Private Database + +

Security is always on the mind of developers today. With incidents of breaches on the rise and no end in + sight, it is critical that you ensure that access to your data is restricted to only authorized users. While + this article is not intended to serve as a checklist for securing Oracle Database, it would be remiss to not + include a portion on security, specifically one that can be easily applied to a query used with a web + service.

+ +

While Oracle Database has many ways to restrict access to data, one of the more common methods is called + Oracle Virtual Private Database—a no-cost feature of Oracle Database, Enterprise Edition that can be + easily enabled. This feature aims to hide rows of data that don't match specific criteria. For example, + we'll set up a rule that restricts any query on EMP to return only employees of a specific department. +

+ +

Oracle Virtual Private Database can work in concert with other Oracle Database security features, such as + Data Redaction. Once we have our rule in place, the same query that we have been using will return only + employees that are in department 30. The redaction rules that we set up—where commission is fully + redacted—will still be applied.

+ +

Note: To manage Oracle Virtual Private Database policies, your schema will need EXECUTE + privileges on DBMS_RLS. Your DBA might have to assist in granting these privileges.

+ +

Before an Oracle Virtual Private Database policy can be created, a policy function needs to exist. A policy + function is a standard Oracle Database function that will return the WHERE clause that Oracle Virtual + Private Database will automatically apply to the query against a specified object. So in our case, all we + need to do is return deptno = 30 to limit the query to return only employees in department 30. +

+ +

Best practices dictate that the policy function be created in a different schema than where your data lives. + This can ensure that developers will not be able to modify the function and circumvent its logic. For this + example, we'll simply create it in the same schema to keep things simple.

+ +

To create an Oracle Virtual Private Database policy function and enable it for the EMP table, do the + following:

+ +
    +
  1. From Oracle SQL Developer, select the SQL Worksheet that corresponds to your schema. If + you don't see a SQL Worksheet tab, create a new one by clicking the icon circled in + Figure 8.
    +   +

    Figure 8. Icon for opening a new SQL worksheet.

    +
  2. +
  3. Enter the following SQL query in the SQL Worksheet tab:
    +   +
    +
    + +
    + +
    +
    +CREATE OR REPLACE FUNCTION restrict_by_dept
    +  (
    +  owner     IN VARCHAR2, 
    +  objname   IN VARCHAR2
    +  ) 
    +RETURN VARCHAR2 AS
    +BEGIN
    +RETURN 'DEPTNO = 30';
    +END;
    +/
    +
    +
  4. +
  5. Execute the SQL query by clicking the Run Statement icon shown in Figure 9.
    +   +

    Figure 9. Icon for executing the query.

    +
  6. +
  7. Enter the following SQL query in the SQL Worksheet tab, replacing the previous + content:
    +   +
    +
    + +
    + +
    +
    +BEGIN
    +DBMS_RLS.ADD_POLICY
    +  (  
    +  object_schema   => 'SCOTT',
    +  object_name     => 'EMP',
    +  policy_name     => 'RESTRICT_BY_DEPT',
    +  policy_function => 'RESTRICT_BY_DEPT',
    +  statement_types => 'SELECT'
    +  );
    +END;
    +/
    +
    +
  8. +
  9. Execute the SQL query by clicking the Run Statement icon.
  10. +
+ +

If you simply reload the web page now, only employees in department 30 will be returned in both the report + and the chart, as illustrated in Figure 10.

+ +

Figure 10: The report and chart after a rule restricting data to the sales + department was applied.

+ +

Note that the commission values are still being redacted, and the employees are still sorted from most paid + to least paid, per the RANK analytic function. Differences in salaries are still being displayed, per the + LAG analytic function as well. Not a single line in the HTML file or even the JavaScript file was changed. +

+ +

Also, Oracle Virtual Private Database applies any time the database queries the EMP table. This means that + data will be restricted when we view both the form and the report. If you edit any of the employees in the + SALES department, notice the URL in your browser. It should look something like this:

+ file:///Users/scott/OTN/Files/html/form.html#7698 + +

The last parameter is the EMPNO of the user that we're editing; in this case, BLAKE. If we want to tamper + with the URL and try to see KING's record, all we need to do is change that last parameter to 7839 and + reload the page, right? Wrong.

+ +

As shown in Figure 11, if we try to view any employee who is not in the sales department, we'll + simply get a blank form. Oracle Virtual Private Database is doing its job by protecting the data, regardless + of how it is accessed.

+ +

Figure 11: The results of trying to alter the URL and view an employee who is not in the + sales department.

+ +

Auditing

+ +

As a final step, we can enable auditing to record each time the query used in the web service was accessed. + This will allow us to monitor its usage and ensure that it is being run by expected clients for authorized + purposes. In fact, auditing can be configured for almost any transaction in Oracle Database; both data + definition language (DDL), for creating and modifying objects, and data manipulation language (DML), for + querying and modifying data, are supported.

+ +

Note: In order to manage audit policies, your schema will need EXECUTE privileges on + DBMS_FGA. Your DBA might have to assist in granting these privileges.

+ +

In our example, we can use something called fine-grained auditing, which allows us to specify a condition + that will be triggered when an object is audited. We might only want to audit the EMP table if an employee + of a specific department is included in the results or if the value of a salary exceeds a specific value. +

+ +

To create an audit policy on EMP that will audit any transaction where the salary of an employee is updated + to be greater than 4000, do the following:

+ +
    +
  1. From Oracle SQL Developer, select the SQL Worksheet that corresponds to your schema. +
  2. +
  3. Enter the following SQL query in the SQL Worksheet tab:
    +   +
    +
    + +
    + +
    +
    +BEGIN
    +DBMS_FGA.add_policy
    +  (
    +  object_schema   => 'SCOTT',
    +  object_name     => 'EMP',
    +  policy_name     => 'WEBSERVICE',
    +  statement_types => 'INSERT,UPDATE',
    +  audit_condition => 'SAL > 4000',
    +  audit_column    => 'SAL'
    +  );
    +END;
    +/
    +
    +
  4. +
  5. Execute the SQL query by clicking the Run Statement icon.
  6. +
+ +

Now, edit any employee and set their salary to a value less than 4000 and save the changes. Because the new + salary is less than 4000—which is the threshold defined in the rule—nothing was written to the + audit logs. Now, edit any employee and set the salary to 4001 or greater. Because the threshold of 4000 was + exceeded, an entry was written to the audit log.

+ +

If you're using Oracle Database 12c, the audit logs are written to the new unified audit view, + UNIFIED_AUDIT_TRAIL. This new view consolidates all the older audit views into a single place. If you're + still on Oracle Database 11g, the audit logs will be written to the standard DBA_FGA_* views.

+ +

To view the audit logs in Oracle Database 12c, as the SYS or SYSTEM/PDB_ADMIN user, issue the + following query:

+ +
+
+
Copy
+
+ +
+
+SELECT * from unified_audit_trail WHERE fga_policy_name = 'WEBSERVICE' 
+  ORDER BY event_timestamp desc
+
+ +

To view the audit logs in Oracle Database 11g, as the SYS or SYSTEM user, issue the following query: +

+ +
+
+
Copy
+
+ +
+
+SELECT * FROM  dba_fga_audit_trail WHERE policy_name = 'WEBSERVICE' 
+  ORDER BY timestamp desc
+
+ +

In either case, many columns will be returned. If you look closely, you should be able to spot the SQL query + that was executed, as well as the schema and table that were impacted. 

+ +

Disabling Functionality

+ +

At the conclusion of this article, you might want to reset your schema to how it was before different + database features were enabled. To do that, simply run the corresponding SQL commands in Oracle SQL + Developer.

+ +

Redaction
+ To disable the redaction rule, execute the following:

+ +
+
+
Copy
+
+ +
+
+BEGIN
+DBMS_REDACT.drop_policy(
+  object_schema => 'SCOTT',
+  object_name   => 'EMP',
+  policy_name   => 'REDACT_COMM');
+END;
+/
+
+ +

Oracle Virtual Private Database
+ To disable Oracle Virtual Private Database, execute the following:

+ +
+
+
Copy
+
+ +
+
+BEGIN
+DBMS_RLS.DROP_POLICY(
+  object_schema   => 'SCOTT',
+  object_name     => 'EMP',
+  policy_name     => 'RESTRICT_BY_DEPT');
+END;
+/
+
+ +

Fine-Grained Auditing
+ To disable fine-grained auditing, execute the following:

+ +
+
+
Copy
+
+ +
+
+BEGIN
+DBMS_FGA.drop_policy(
+  object_schema   => 'SCOTT',
+  object_name     => 'EMP',
+  policy_name     => 'WEBSERVICE');
+END;
+/
+
+ +

Summary

+ +

No matter what front end you prefer, using Oracle Database as a back end provides a wealth of robust, mature, + and easy-to-use functionality that can be tailored to suit your individual needs. No longer do you need to + learn and incorporate complex libraries to help with analyzing and securing your data; all of this can + happen by using native functionality of Oracle Database, regardless of the front end.

+ +

About the Author

+ +

Scott Spendolini is president and founder of Sumner Technologies, a world-class Oracle + services, education, and solutions firm. Throughout his career, he has assisted clients with their Oracle + Application Express development and training needs. Spendolini is a long-time, regular presenter at many + Oracle-related conferences, including Oracle OpenWorld, Kscope, and Rocky Mountain Oracle Users Group + (RMOUG). He is an Oracle Ace Director, the author of Expert Oracle Application Express Security, and a + coauthor of Pro Oracle Application Express. Spendolini is also an Oracle Certified Oracle Application + Express developer. Spendolini started his career at Oracle Corporation, where he worked with Oracle + E-Business Suite for almost seven years and was a senior product manager for Oracle Application Express for + over three years. He holds a dual bachelor's degree from Syracuse University in management information + systems and telecommunications management.

+
+
\ No newline at end of file diff --git a/Articles/dsl/112-technote-php-instant.html b/Articles/dsl/112-technote-php-instant.html index e69de29..d718478 100644 --- a/Articles/dsl/112-technote-php-instant.html +++ b/Articles/dsl/112-technote-php-instant.html @@ -0,0 +1,408 @@ + + +
+
+
+
+
+
+ +

The easiest way to configure PHP to access a remote Oracle Database is to use Oracle Instant + Client libraries. This note describes how to install PHP with the OCI8 Extension and Oracle + Instant Client on Windows and Linux. The free book The + Underground PHP and Oracle Manual explains other installation options and contains + more detail.

+

OCI8 is the PHP extension for + connecting to Oracle Database. OCI8 is open source and included with PHP. The name is + derived from Oracle's C "call interface" API first introduced in version 8 of Oracle + Database. OCI8 links with Oracle client libraries, such as Oracle Instant Client.

+ +
+
+ +
+
+
+
+ + + + +
+
+ +

Oracle Instant Client is a free set of easily installed libraries that allow programs to connect to local or + remote Oracle Database instances. To use Instant Client, an existing database is needed; Instant Client does + not include one. Typically, the database will be on another machine. If the database is local, then Instant + Client, although convenient and still usable, is generally not needed because OCI8 can directly use the + database libraries.

+ +

When using Instant Client 11g, PHP OCI8 connects to all editions of Oracle 9.2, 10.x, and 11.x + databases.

+ +

Software Requirements

+
+
+ +
+
+
+
+ + + + + + + + + + + + + + + + + + + +
SoftwareNotes
Oracle + Instant ClientDownload the "Basic" package. On Linux, also download the "SDK" or "devel" package. If + space is at a premium, the Basic Lite package can be used instead of Basic.
Apache HTTP ServerVersion 2.2
PHPVersion 5.4
+
+
+
+
+ +
+
+ +

Enabling the PHP OCI8 Extension on Windows

+

The Instant Client binaries complement PHP's pre-built binaries for Windows.

+
    +
  1. + Install Apache by downloading httpd-2.2.22-win32-x86-no_ssl.msi from + httpd.apache.org/download.cgi +
  2. +
  3. + Double click the MSI file to start the installation wizard.

    +

    Install "for All Users, on Port 80". Do a typical install into the default destination folder: C:\Program Files\Apache Software Foundation\Apache2.2. +

  4. +
  5. + Download the FastCGI component mod_fcgid-2.3.6-win32-x86.zip from httpd.apache.org/download.cgi#mod_fcgid +
  6. +
  7. + Unzip it to the installed Apache 2.2 directory. The C:\Program Files\Apache Software Foundation\Apache2.2\modules directory should + now have mod_fcgid.so and mod_fcgid.pdb files. +
  8. +
  9. + Edit C:\Program Files\Apache Software Foundation\Apache2.2\conf\httpd.conf + and add the line: +
    LoadModule fcgid_module modules/mod_fcgid.so
    +
  10. +
  11. In httpd.conf, locate the section for htdocs and add ExecCGI to the Options: +
    <Directory "C:/Program Files/Apache Software Foundation/Apache2.2/htdocs">
    +...
    +Options Indexes FollowSymLinks ExecCGI
    +...
    +</Directory>
    +  
    +
    +
  12. +
  13. Install PHP by downloading the PHP 5.4.0 "VC9 x86 Non Thread Safe" ZIP package php-5.4.0-nts-Win32-VC9-x86.zip from windows.php.net/download.
  14. +
  15. In Windows Explorer unzip the PHP package to a directory called C:\php-5.4.0 + +
  16. +
  17. In C:\php-5.4.0 copy php.ini-development to php.ini + +
  18. +
  19. Edit php.ini to make the following changes: +
      +
    • Add a timezone line like: + date.timezone = America/Los_Angeles + + Use your local timezone name. +
    • +
    • + Add the line: + extension_dir = C:\php-5.4.0\ext +

      + This is the directory containing the PHP extensions.

      +
    • +
    • + Remove the semicolon from the beginning of the line: + extension=php_oci8_11g.dll + +
    • +
    +
  20. +
  21. Edit C:\Program Files\Apache Software Foundation\Apache2.2\conf\httpd.conf + and add the following lines. Make sure you use forward slashes '/' and not + back slashes '\': + +
    +
    
    +FcgidInitialEnv PHPRC "c:/php-5.4.0"
    +AddHandler fcgid-script .php
    +FcgidWrapper "c:/php-5.4.0/php-cgi.exe" .php
    +
    +
    +
  22. +
  23. Download the "Instant Client Package - Basic" for Windows from the OTN Instant Client page. + Because PHP is 32 bit, use the 32 bit version of Instant Client. + +

    Unzip the Instant Client files to C:\instantclient_11_2

    +
  24. +
  25. + Edit the Windows PATH environment setting and add C:\instantclient_11_2. For example, on Windows XP, follow Start -> + Control Panel -> System -> Advanced -> Environment Variables and edit PATH in the System variables list. + +

    Commonly you need to reboot Windows so the new environment is correctly set.

    + +

    Set desired Oracle globalization language environment variables such as NLS_LANG. If nothing is set, a default local environment will be assumed. + See the Globalization chapter in The + Underground PHP and Oracle Manual for more details.

    +

    Unset Oracle variables such as ORACLE_HOME and ORACLE_SID, which are unnecessary with Instant Client.

    +

    If you have other Oracle software on the computer then instead of modifying the Windows environment, + write a script that sets these values and starts Apache. Otherwise library symbol clashes are likely + because of version differences.

    +
  26. +
  27. + Restart Apache using the system tray Apache Monitor or the Start menu option. +
  28. +
+

Enabling the PHP OCI8 Extension on Linux

+

On Linux, PHP is generally compiled manually because the bundled version never seems to be up to date. + However, if you don't wish to recompile PHP, more recent, unsupported RPM packages for Oracle Linux are + available from oss.oracle.com, or via Unbreakable Linux Network updates. If a supported PHP environment is + desired, use Zend Server. + These all have the OCI8 extension pre-built.

+ +

To build PHP and OCI8 from source code:

+ +
    +
  1. + Install the Apache HTTP Server and development packages e.g. with yum install httpd httpd-devel. +
  2. +
  3. + Download the PHP 5.4 source code and install PHP + following Installation on Unix systems in + the PHP manual. +

    At this stage, don't configure the OCI8 extension.

    +
  4. +
  5. + Download the Basic and the SDK Instant Client packages from the OTN Instant + Client page. Either the zip file or RPMs can be used. +

    Install the RPMs as the root user, for example:

    +
    +
    
    +  rpm -Uvh oracle-instantclient11.2-basic-11.2.0.3.0-1.x86_64.rpm 
    +rpm -Uvh oracle-instantclient11.2-devel-11.2.0.3.0-1.x86_64.rpm 
    +
    +
    +

    The first RPM puts Oracle libraries in /usr/lib/oracle/11.2/client64/lib + and the second creates headers in /usr/include/oracle/11.2/client64.

    +

    If you are using the ZIP files, the SDK should be unzipped to the same directory as the Basic package, + and a symbolic link manually created:

    + ln -s libclntsh.so.11.1 libclntsh.so +
  6. +
  7. The OCI8 extension from PECL is always the current + production release. Although it is generally in sync with the OCI8 code in the latest PHP 5.4 source, it can + sometimes be more recent. The latest production extension can be automatically downloaded and added to PHP + using: + + pecl install oci8 + +

    This gives:

    +
    +
    
    +downloading oci8-1.4.7.tgz ...
    +Starting to download oci8-1.4.7.tgz (Unknown size)
    +.....done: 168,584 bytes
    +10 source files, building
    +running: phpize
    +Configuring for:
    +PHP Api Version:         20100412
    +Zend Module Api No:      20100525
    +Zend Extension Api No:   220100525
    +Please provide the path to the ORACLE_HOME directory.
    +Use 'instantclient,/path/to/instant/client/lib' if you're compiling
    +with Oracle Instant Client [autodetect] : 
    +
    +
    +

    + If you have the Instant Client RPMs, hit Enter and PECL will + automatically build and install an oci8.so shared library. If you have + the Instant Client zip files, or want a specific version of Instant Client used, then explicitly + give the appropriate path after "instantclient,":

    + instantclient,/usr/lib/oracle/11.2/client64/lib + +

    Use an explicit, absolute path since PECL does not expand environment variables.

    +

    If you don't have the pecl program, you can alternatively download the + OCI8 package in a browser and then install it with:

    +
    +
    
    +  tar -xzf oci8-1.4.7.tgz
    +cd oci8-1.4.7
    +phpize
    +./configure --with-oci8=instantclient,/usr/lib/oracle/11.2/client64/lib
    +make install
    +
    +
    +
  8. +
  9. Edit php.ini and enable the OCI8 extension with: + extension=oci8.so +

    + Also confirm extension_dir points to the directory the oci8.so file was installed into.

    +
  10. +
  11. Add the Instant Client directory to /etc/ld.so.conf, or manually set LD_LIBRARY_PATH to /usr/lib/oracle/11.2/client64/lib. You might also want to set + TNS_ADMIN (if you use Oracle Network configuration files such as tnsnames.ora) and Oracle + globalization environment variables such as NLS_LANG. If NLS_LANG is not set, a default local + environment will be assumed. See the Globalization chapter in The + Underground PHP and Oracle Manual for more details. A sketch of these settings is shown after these steps. +

    It is important to set all Oracle environment variables before starting Apache so that the + OCI8 process environment is correctly initialized. Setting environment variables in PHP scripts can + lead to obvious or non-obvious problems. On Oracle Linux, export environment variables in /etc/sysconfig/httpd. On Debian-based machines set them in /etc/apache2/envvars.

    + +

    Restart Apache, for example:

    +

    service httpd restart +

  12. +
+
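As mentioned in the last step, the Oracle environment should be set in the shell that starts Apache. On Oracle Linux, the entries added to /etc/sysconfig/httpd might look like the following sketch; the paths and the NLS_LANG value are examples and should be adjusted for your system:

# /etc/sysconfig/httpd -- environment for the PHP OCI8 extension (illustrative values)
export LD_LIBRARY_PATH=/usr/lib/oracle/11.2/client64/lib
export TNS_ADMIN=/etc                        # only if you use tnsnames.ora or sqlnet.ora
export NLS_LANG=AMERICAN_AMERICA.AL32UTF8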

Verifying the PHP OCI8 Extension is Installed

+

To check OCI8 configuration, create a simple PHP script phpinfo.php in the Apache + document root:

+ <?php +phpinfo(); +?> + +

+ Load the script into a browser using the appropriate URL, e.g. http://localhost/phpinfo.php. The browser page will contain an "oci8" section + saying "OCI8 Support enabled" and listing the OCI8 options that can be configured.

+

Connecting to an Oracle Database

+

To create a connection, Oracle username and password credentials are passed as the first two parameters of oci_connect(). An Oracle + Database connection identifier must be used for the third parameter, because programs linked with + Instant Client are always considered "remote" from any database server and need to be told which database + instance to connect to. The connection string is likely to be well known for established Oracle databases. + With new systems, the information is given by the Oracle installation program when the database is set up. + The installer should have configured Oracle Network and created a service name such as orcl for you.

+

There are several ways to pass the connection information to PHP. This example uses Oracle's Easy Connect + syntax to connect to the HR schema in the orcl database service running on + mymachine. No tnsnames.ora or other Oracle Network file is needed:

+ $conn = oci_connect('hr', 'hr_password', 'mymachine.mydomain/orcl'); + +

See Oracle's Using the Easy Connect Naming Method documentation for the Easy Connect syntax.

+

In new databases the demonstration schemas such as the HR user will need to be unlocked and given a password. + This may be done in SQL*Plus by connecting as the SYSTEM user and executing the statement:

+ ALTER USER username IDENTIFIED BY new_password ACCOUNT UNLOCK; + + +

Using PHP OCI8 and Oracle

+ +

Try out a simple script, testoci.php. Modify the connection credentials to suit + your database and load it in a browser. This example lists all tables owned by the user HR:

+
+

+  <?php
+
+$conn = oci_connect('hr', 'hr_password', 'mymachine.mydomain/orcl');
+
+$stid = oci_parse($conn, 'select table_name from user_tables');
+oci_execute($stid);
+
+echo "<table>\n";
+while (($row = oci_fetch_array($stid, OCI_ASSOC+OCI_RETURN_NULLS)) != false) {
+    echo "<tr>\n";
+    foreach ($row as $item) {
+        echo "  <td>".($item !== null ? htmlentities($item, ENT_QUOTES) : "&nbsp;")."</td>\n";
+    }
+    echo "</tr>\n";
+}
+echo "</table>\n";
+
+?>
+
+
+

Troubleshooting

+ +

Check the Apache error log file for startup errors.

+

Temporarily set display_errors=On in php.ini so script + errors are displayed. Switch it back off when finished, for security reasons.
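If a connection cannot be established, oci_error() called with no arguments returns an array describing the most recent connection error. A minimal check, with placeholder credentials, looks like this:

<?php
$conn = oci_connect('hr', 'hr_password', 'mymachine.mydomain/orcl');
if (!$conn) {
    $e = oci_error();  // no argument: returns the last connection error
    echo "Connection failed: " . htmlentities($e['message'], ENT_QUOTES);
}
?>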

+

Chapter 9 of The + Underground PHP and Oracle Manual contains information about common connection errors and discusses + alternative ways to set environment variables.

+

Oracle's SQL*Plus command-line tool can be downloaded from the Instant Client page to help resolve + environment and connection problems. Check that SQL*Plus can connect, and then ensure that the Environment section (not + the Apache Environment section) of phpinfo.php shows the equivalent environment + settings.

+ +

Windows Specific Help

+ +

If the phpinfo.php script does not produce an "oci8" section saying "OCI8 Support + enabled", verify that extension=php_oci8_11g.dll is uncommented in php.ini.

+

If php.ini's extension_dir directive does not contain the directory with php_oci8_11g.dll then Apache startup will give an alert: "PHP Startup: Unable to + load dynamic library php_oci8_11g.dll."

+

If PATH is set incorrectly or the Oracle libraries cannot be found at all, + starting Apache will give an alert: "The dynamic link library OCI.dll could not be found in the specified + path". The Environment section of the phpinfo() page will show the values of + PATH and the Oracle variables actually being used by PHP.

+

If there are multiple versions of Oracle libraries on the machine then version clashes are likely. For some + discussion on setting variables refer to Using PHP OCI8 with 32-bit PHP on Windows 64-bit. +

+

Linux Specific Help

+

If using Instant Client ZIP files, make sure the two packages are unzipped to the same location. Make sure a + symbolic link libclntsh.so points to libclntsh.so.11.1.

+

Set all required Oracle environment variables in the shell that starts Apache.

+

Conclusion

+

Using Oracle Instant Client and installing PHP OCI8 from PECL provide maximum flexibility, allowing + components to be easily installed and upgraded.

+

Questions and suggestions can be posted on the OTN PHP or Instant + Client forums.

+

The PHP Developer Center + contains links to useful background material.

+ +
+
+ + \ No newline at end of file diff --git a/Articles/dsl/114-odb-browser-apps-js-rest-p2.html b/Articles/dsl/114-odb-browser-apps-js-rest-p2.html index e69de29..30992a5 100644 --- a/Articles/dsl/114-odb-browser-apps-js-rest-p2.html +++ b/Articles/dsl/114-odb-browser-apps-js-rest-p2.html @@ -0,0 +1,471 @@ + +
+
+

This article is Part 2 of a two-part series that describes the steps for creating a JavaScript-based + data management application that integrates with Oracle Database via RESTful service calls. +

+

Now that the web services have been created and integrated into the client side in Part 1 of this series, we can spend some time on + the database tier using database features to modify which rows get returned. There will be no more code + changes in the client-side portion of the application for the rest of this article; all updates will simply + be enabling different database features or slightly modifying the SQL query used in the web service.

+

Analytic Functions

+

Analytic functions are a powerful no-cost feature of Oracle Database that allow developers to use + sophisticated analytics in their SQL queries and PL/SQL code. Because these calculations occur on the + database server, they are highly optimized and can slice through large volumes of data with ease.

+

Several different analytic functions are supported; a comprehensive list can be found here.

+

We can easily incorporate analytic functions into the SQL statements that our web service uses. This can be + done in such a way that only minor, if any, changes to the user interface need to be made. The heavy + lifting, so to speak, will be done at the database tier.

+

Our current SQL query returns all employees and orders them by salary from least to greatest. This was done + using a simple inline query with an ORDER BY clause. If we run this query, we can see that SMITH has the + smallest salary while KING enjoys the largest, as shown in Figure 1.

+

ODB-apps-p2-Figure1

+

Figure 1: Results of SQL sorting on salary.

+

But what if we want to find the same information but segment it by department? We could simply run the SQL + query for each department, adding a WHERE clause to only look at a specific department, and then use UNION + to put all of those queries together. With a small dataset, this approach might work. But for a table with + many departments and hundreds, if not thousands of employees, this approach is very impractical.

+

A better approach is to seek the help of analytic functions and use either the RANK or DENSE_RANK function to + produce our results. Both functions will rank a dataset based on some criteria; the difference is that + DENSE_RANK will assign consecutive values in the event of two or more values being the same, whereas RANK + will not.

+

Table 1 illustrates the difference in how RANK and DENSE_RANK assign values to a simple dataset:

+ Table 1. Comparison of RANK and DENSE_RANK values + +
+
+
+
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Value    RANK    DENSE_RANK
10       1       1
20       2       2
20       2       2
30       4       3
+ + + +
+
+
+
+ + + +

We'll apply RANK to our SQL query, and use deptno as the partitioned column. Thus, + our new query now looks like Listing 1:

+
+
+
Copy
+
+
+    SELECT   e.empno,
+       d.dname,
+       e.ename, 
+       e.job,
+       e.sal, 
+       0 sal_diff, 
+       NVL(e.comm, 0) comm, 
+       RANK() OVER (PARTITION BY e.deptno ORDER BY sal DESC) AS rank
+       FROM   emp e, 
+       dept d
+       WHERE  d.deptno = e.deptno
+
+

Listing 1: Modified query using the RANK() analytic function.

+

To incorporate the updated SQL query into the existing web service, do the following:

+
    +
  1. In Oracle SQL Developer, expand the REST Data Services node in the tree. From there, + expand Modules, demo, and emp to reveal a leaf node + called GET.
  2. +
  3. Right-click GET and select Open.
  4. +
  5. You should now see a Worksheet tab, where the SQL query from the creation of the web + service is displayed.
  6. +
  7. Replace the SQL query that is there with the SQL query shown in Listing 1 and save your changes by + clicking the single floppy disk icon.
  8. +
+ +

ODB-apps-p2-Figure2

+

Figure 2: Updating the SQL query of the web service.

+

Now, the RANK column will contain the rank of an employee's salary within a specific + department, not overall across all departments, as shown in Figure 3.

+

ODB-apps-p2-Figure3

+

Figure 3: Results of SQL query with RANK analytic function.

+

The results of the web service call and web page will also reflect the updated results.

+

We can also apply multiple analytic functions to a single SQL statement. Let's say we wanted to easily see + the difference in salary between the current employee and the previous employee in the same department. We + can turn to the LAG and LEAD analytic functions to help us with this. LAG will "look back" any number of + rows and return that value, whereas LEAD will "look forward" any number of rows and return that value. These + two analytic functions eliminate the need to create a complex self-join, and they can be easily implemented + in any SQL query.

+

We can modify our query slightly to include the difference between a current employee's salary and the + next-highest paid employee's salary in the same department by using the SQL query in Listing 2.

+
+
+
Copy
+
+
+    SELECT   e.empno, 
+     d.dname,
+     e.ename,
+     e.job,
+     e.sal,
+     e.sal - LAG(e.sal,1, 0) OVER (PARTITION BY e.deptno ORDER BY e.sal DESC) AS sal_diff, 
+     NVL(e.comm, 0) comm, 
+     RANK() OVER (PARTITION BY e.deptno ORDER BY e.sal DESC) AS rank 
+     FROM   emp e, 
+     dept d
+     WHERE  d.deptno = e.deptno 
+
+

Listing 2: SQL query that makes use of two analytical functions: LAG and RANK.

+

To modify the SQL query used in the web service, do the following:

+
    +
  1. In Oracle SQL Developer, expand the REST Data Services node in the tree. From there, + expand Modules, demo, and emp to reveal a leaf node + called GET.
  2. +
  3. Right-click GET and select Open.
  4. +
  5. You should now see a Worksheet tab, where the SQL query from the creation of the web + service is displayed.
  6. +
  7. Replace the SQL query that is there with the SQL query shown in Listing 2 and save your changes by + clicking the single floppy disk icon.
  8. +
+

When the SQL query in Listing 2 is run, the SAL_DIFF column is computed using the LAG + function. This column represents the difference in salary between the current and previous row. In the case + of the highest paid employee per department, the value is the same as that employee's salary, because there + is no previous row to compare to. The results of the new query can be seen in Figure 4.

+

ODB-apps-p2-Figure4

+

Figure 4: The results of the LAG function, illustrating the differences in salary.

+

RANK, DENSE_RANK, LAG, and LEAD are just a few of the many analytic functions that you can easily incorporate + into your SQL statements—regardless of whether they are called as part of a web service. Read more about + analytic functions in the Oracle documentation or + on the Oracle-Base website.

+

Redaction

+

Another powerful database feature that can be easily implemented is Data Redaction, which is a feature of the + Oracle Advanced Security option for Oracle Database. Data Redaction can conditionally redact sensitive data + from being displayed to the user. Data can be completely or partially redacted or replaced with random + data.

+

Note: In order to manage redaction policies, your schema will need EXECUTE privileges on + DBMS_REDACT. Your DBA might have to assist in granting these privileges.

+

We can create a simple redaction policy that will apply to our query—all without having to change a single + line of code on the client or the server. This policy will completely redact any values in the COMM or + commission column. With the policy enabled, any time the EMP table is queried, the value of the COMM column + will simply show 0.

+

To create a redaction policy, do the following:

+
    +
  1. From Oracle SQL Developer, select the SQL Worksheet that corresponds to your schema. + You can use the one that was used to create the EMP and DEPT tables in Part 1. If you don't see a SQL + Worksheet tab, create a new one by clicking the icon circled in Figure 5.

    + ODB-apps-p2-Figure5
    Figure 5. Icon for opening a new SQL + worksheet. 

  2. +
  3. Enter the following SQL query in the SQL Worksheet tab:
    +
    + +
    +
    +      BEGIN
    +    DBMS_REDACT.add_policy
    +      ( 
    +      object_schema => 'SCOTT', 
    +      object_name   => 'EMP', 
    +      policy_name   => 'REDACT_COMM', 
    +      expression    => '1=1', 
    +      column_name   => 'COMM', 
    +      function_type => DBMS_REDACT.FULL 
    +      );
    +      END;
    +      /
    +
    +
  4. +
  5. Execute the SQL query by clicking the Run Statement icon.

    ODB-apps-p2-Figure6
    Figure 6. Icon for executing the query.

    +
  6. +
+

If the query executes successfully, you will see a message that reads "PL/SQL procedure successfully + completed." If you re-query the data from either Oracle SQL Developer or the web service, all values for + COMM should read 0, as shown in Figure 7.
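You can also confirm that the policy is in place by querying the data dictionary; in Oracle Database 12c, redaction policies are listed in the REDACTION_POLICIES view (query it from a suitably privileged session):

SELECT object_owner, object_name, policy_name
FROM   redaction_policies;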

+

ODB-apps-p2-Figure7
Figure 7: The results of the same query after a redaction policy + was applied.

+

Let's keep the redaction policy in force as we enable additional database functionality.

+

Oracle Virtual Private Database

+

Security is always on the mind of developers today. With incidents of breaches on the rise and no end in + sight, it is critical that you ensure that access to your data is restricted to only authorized users. While + this article is not intended to serve as a checklist for securing Oracle Database, it would be remiss to not + include a portion on security, specifically one that can be easily applied to a query used with a web + service.

+

While Oracle Database has many ways to restrict access to data, one of the more common methods is called + Oracle Virtual Private Database—a no-cost feature of Oracle Database, Enterprise Edition that can be easily + enabled. This feature aims to hide rows of data that don't match specific criteria. For example, we'll set + up a rule that restricts any query on EMP to return only employees of a specific department.

+

Oracle Virtual Private Database can work in concert with other Oracle Database security features, such as + Data Redaction. Once we have our rule in place, the same query that we have been using will return only + employees that are in department 30. The redaction rules that we set up—where commission is fully + redacted—will still be applied.

+

Note: To manage Oracle Virtual Private Database policies, your schema will need EXECUTE + privileges on DBMS_RLS. Your DBA might have to assist in granting these privileges.

+

Before an Oracle Virtual Private Database policy can be created, a policy function needs to exist. A policy + function is a standard Oracle Database function that will return the WHERE clause that Oracle Virtual + Private Database will automatically apply to the query against a specified object. So in our case, all we + need to do is return deptno = 30 to limit the query to return only employees in + department 30.

+

Best practices dictate that the policy function be created in a different schema than where your data lives. + This can ensure that developers will not be able to modify the function and circumvent its logic. For this + example, we'll simply create it in the same schema to keep things simple.

+

To create an Oracle Virtual Private Database policy function and enable it for the EMP table, do the + following:

+
    +
  1. From Oracle SQL Developer, select the SQL Worksheet that corresponds to your schema. If + you don't see a SQL Worksheet tab, create a new one by clicking the icon circled in + Figure 8.

    ODB-apps-p2-Figure8
    Figure 8. Icon for opening a new SQL worksheet. +

    +
  2. +
  3. Enter the following SQL query in the SQL Worksheet tab:
    +
    + +
    +
    +     CREATE OR REPLACE FUNCTION restrict_by_dept
    +      ( 
    +      owner     IN VARCHAR2, 
    +      objname   IN VARCHAR2  
    +      ) 
    +      RETURN VARCHAR2 AS
    +      BEGIN
    +      RETURN 'DEPTNO = 30';
    +      END;
    +      /
    +      
    +
    +
  4. +
  5. Execute the SQL query by clicking the Run Statement icon shown in Figure 9.

    ODB-apps-p2-Figure9
    Figure 9. Icon for executing the query.

    +
  6. +
  7. Enter the following SQL query in the SQL Worksheet tab, replacing the previous content: +
  8. +
    +
    + +
    +
    +    
    +      BEGIN
    +      DBMS_RLS.ADD_POLICY 
    +      (    object_schema   => 'SCOTT',
    +      object_name     => 'EMP',
    +      policy_name     => 'RESTRICT_BY_DEPT',
    +      policy_function => 'RESTRICT_BY_DEPT', 
    +      statement_types => 'SELECT' 
    +      );
    +      END;
    +      /
    +
    +
  9. Execute the SQL query by clicking the Run Statement icon.
  10. +
+

If you simply reload the web page now, only employees in department 30 will be returned in both the report + and the chart, as illustrated in Figure 10.

+

ODB-apps-p2-Figure10
Figure 10: The report and chart after a rule restricting data to + the sales department was applied.

+

Note that the commission values are still being redacted, and the employees are still sorted from most paid + to least paid, per the RANK analytic function. Differences in salaries are still being displayed, per the + LAG analytic function as well. Not a single line in the HTML file or even the JavaScript file was changed. +

+

Also, Oracle Virtual Private Database applies any time the database queries the EMP table. This means that + data will be restricted when we view both the form and the report. If you edit any of the employees in the + SALES department, notice the URL in your browser. It should look something like this:

file:///Users/scott/OTN/Files/html/form.html#7698 +

The last parameter is the EMPNO of the user that we're editing; in this case, BLAKE. If we want to tamper + with the URL and try to see KING's record, all we need to do is change that last parameter to 7839 and + reload the page, right? Wrong.

+

As shown in Figure 11, if we try to view any employee who is not in the sales department, we'll simply + get a blank form. Oracle Virtual Private Database is doing its job by protecting the data, regardless of how it + is accessed.

+

ODB-apps-p2-Figure11
Figure 11: The results of trying to alter the URL and view an + employee who is not in the sales department.

+

Auditing

+

As a final step, we can enable auditing to record each time the query used in the web service was accessed. + This will allow us to monitor its usage and ensure that it is being run by expected clients for authorized + purposes. In fact, auditing can be configured for almost any transaction in Oracle Database; both data + definition language (DDL), for creating and modifying objects, and data manipulation language (DML), for + querying and modifying data, are supported.

+

Note: In order to manage audit policies, your schema will need EXECUTE privileges on + DBMS_FGA. Your DBA might have to assist in granting these privileges.

+

In our example, we can use something called fine-grained auditing, which allows us to specify a condition + that will be triggered when an object is audited. We might only want to audit the EMP table if an employee + of a specific department is included in the results or if the value of a salary exceeds a specific value. +

+

To create an audit policy on EMP that will audit any transaction where the salary of an employee is updated + to be greater than 4000, do the following:

+
    +
  1. From Oracle SQL Developer, select the SQL Worksheet that corresponds to your schema. +
  2. +
  3. Enter the following SQL query in the SQL Worksheet tab:
  4. +
    +
    + +
    +
    +    
    +      BEGIN
    +      DBMS_FGA.add_policy 
    +      ( 
    +      object_schema   => 'SCOTT', 
    +      object_name     => 'EMP',
    +      policy_name     => 'WEBSERVICE', 
    +      statement_types => 'INSERT,UPDATE', 
    +      audit_condition => 'SAL > 4000', 
    +      audit_column    => 'SAL' 
    +      );
    +      END;
    +      /
    +
    +
  5. Execute the SQL query by clicking the Run Statement icon.
  6. +
+

Now, edit any employee and set their salary to a value less than 4000 and save the changes. Because the new + salary is less than 4000—which is the threshold defined in the rule—nothing was written to the audit logs. + Now, edit any employee and set the salary to 4001 or greater. Because the threshold of 4000 was exceeded, an + entry was written to the audit log.

+

If you're using Oracle Database 12c, the audit logs are written to the new unified audit view, + UNIFIED_AUDIT_TRAIL. This new view consolidates all the older audit views into a single place. If you're + still on Oracle Database 11g, the audit logs will be written to the standard DBA_FGA_* views.

+

To view the audit logs in Oracle Database 12c, as the SYS or SYSTEM/PDB_ADMIN user, issue the + following query:

+
+
+
Copy
+
+
+      SELECT * from unified_audit_trail WHERE fga_policy_name = 'WEBSERVICE'   ORDER BY event_timestamp desc 
+
+

To view the audit logs in Oracle Database 11g, as the SYS or SYSTEM user, issue the following query: +

+
+
+
Copy
+
+
+    SELECT * FROM  dba_fga_audit_trail WHERE policy_name = 'WEBSERVICE'   ORDER BY timestamp desc 
+
+

In either case, many columns will be returned. If you look closely, you should be able to spot the SQL query + that was executed, as well as the schema and table that were impacted. 

+

Disabling Functionality

+

At the conclusion of this article, you might want to reset your schema to how it was before different + database features were enabled. To do that, simply run the corresponding SQL commands in Oracle SQL + Developer.

+

Redaction

+

To disable the redaction rule, execute the following:

+ +
+
+
Copy
+
+
+    
+     BEGIN
+     DBMS_REDACT.drop_policy
+     ( 
+     object_schema => 'SCOTT',
+     object_name   => 'EMP', 
+     policy_name   => 'REDACT_COMM');
+     END;
+     /
+     
+    
+
+

Oracle Virtual Private Database

+

To disable Oracle Virtual Private Database, execute the following:

+
+
+
Copy
+
+
+    BEGIN
+     DBMS_RLS.DROP_POLICY
+     (  
+     object_schema   => 'SCOTT',
+     object_name     => 'EMP', 
+     policy_name     => 'RESTRICT_BY_DEPT'
+     );
+     END;
+     /
+
+

Fine-Grained Auditing

+

To disable fine-grained auditing, execute the following:

+
+
+
Copy
+
+
+    BEGIN
+     DBMS_FGA.drop_policy
+     ( 
+     object_schema   => 'SCOTT',
+     object_name     => 'EMP', 
+     policy_name     => 'WEBSERVICE'
+     );
+     END;
+     /
+
+

Conclusion

+

No matter what front end you prefer, using Oracle Database as a back end provides a wealth of robust, mature, + and easy-to-use functionality that can be tailored to suit your individual needs. No longer do you need to + learn and incorporate complex libraries to help with analyzing and securing your data; all of this can + happen by using native functionality of Oracle Database, regardless of the front end.

+

About the Author

+

Scott Spendolini is president and founder of Sumner Technologies, a world-class Oracle services, education, + and solutions firm. Throughout his career, he has assisted clients with their Oracle Application Express + development and training needs. Spendolini is a long-time, regular presenter at many Oracle-related + conferences, including Oracle OpenWorld, Kscope, and Rocky Mountain Oracle Users Group (RMOUG). He is an + Oracle Ace Director, the author of Expert Oracle Application Express Security, and a coauthor of + Pro Oracle Application Express. Spendolini is also an Oracle Certified Oracle Application Express + developer.

+

Spendolini started his career at Oracle Corporation, where he worked with Oracle E-Business Suite for almost + seven years and was a senior product manager for Oracle Application Express for over three years. He holds a + dual bachelor's degree from Syracuse University in management information systems and telecommunications + management.

+ + + +
+
+ + \ No newline at end of file diff --git a/Articles/oci/105-compute-vm-simple-tutorial.html b/Articles/oci/105-compute-vm-simple-tutorial.html index e69de29..2120b8b 100644 --- a/Articles/oci/105-compute-vm-simple-tutorial.html +++ b/Articles/oci/105-compute-vm-simple-tutorial.html @@ -0,0 +1,332 @@ + + +
+
+ +

Create an Application on Oracle Cloud Infrastructure

+ +

This tutorial shows how straightforward it is to set up an Oracle Cloud Infrastructure Compute VM and create + a Python Flask “hello + world” application.

+ +

Here are the high-level steps:

+ +
    +
  1. Create an SSH key pair
  2. Create a Compute VM instance
  3. Open a port in your virtual cloud network (VCN)
  4. Open a port in the Linux firewall
  5. Create the Flask application
  6. Test the application
  7. Clean up your environment
+
+

Before you begin

+ +

To successfully perform this tutorial, you must have an Oracle Cloud account. If you + don’t have one, you can sign up for the Oracle Cloud + Infrastructure Free Tier.

+ +

Create an SSH key pair in Cloud Shell

+ +

1. In the Oracle Cloud Infrastructure Console, click the Cloud Shell icon in the Console header.
+ Cloud Shell opens in a "drawer" at the bottom of the Console. It provides a preconfigured VM that + you will use to access and set up your project.

+ +

2. If you don’t already have a key pair that you can use, follow these steps to create one:

+ +

A. Create the .ssh directory, if it doesn’t exist:

+ +
+
+
Copy
+
+ +
+ 
+mkdir ~/.ssh
+chmod 700 ~/.ssh
+
+
+ +

B. Create an SSH key + pair in Cloud Shell:

+ +
+
+
Copy
+
+ +
+ 
+ssh-keygen -t rsa -N "" -b 2048 -C "" -f ~/.ssh/id_rsa
+
+
+ +

3. Display your public key:

+ +
+
+
Copy
+
+ +
+ 
+cat ~/.ssh/id_rsa.pub
+
+
+ +

4. Highlight the public key and use CTRL-C to copy it. You will use it in the next section.

+ +

5. Minimize Cloud Shell.

+ +

Figure 1.

+ + + +

Create a VM instance

+ +

Perform the following steps in the Console.

+ +

1. In the Quick Actions section of the Console dashboard, click Create a VM + instance.

+ +

Figure 1.

+ + + +

2. Enter a name or keep the default.

+ + +

Figure 1.

+ +

3. Accept the default values for all the other sections.

+ +

4. Scroll to the Add SSH keys section, and select Paste SSH keys.

+ + +

Figure 1.

+ + +

5. Paste your public key from your Cloud Shell.

+ +

6. Click Create.

+ +

Open port 5000 in your virtual cloud network (VCN)

+ +
    +
  1. On the VM instance details page, click Public Subnet.

    Figure 1.

  2. Click the default security list.

    Figure 1.

  3. Click Add Ingress Rules.

    Figure 1.

  4. For Source CIDR, enter 0.0.0.0/0.
  5. For Destination Port Range, enter 5000.
  6. Click Add Ingress Rules.

    Figure 1.

  7. Return to the VM instance details page.
+ +

Use SSH to connect to your VM instance

+ +

After your instance is running, perform the following steps to access it:

+ +

1. Copy the public IP address.

+ + +

Figure 1.

+ + +

2. Maximize your Cloud Shell.

+ + +

Figure 1.

+ +

3. Use SSH to log in to the instance:

+ +

ssh opc@<yourPublicIP>

+ +

Open port 5000 in the Linux firewall

+ +

Run the following commands to open port 5000:

+ +
+
+
Copy
+
+ +
+
+
+sudo firewall-cmd --permanent --zone=public --add-port=5000/tcp
+sudo firewall-cmd --reload
+
+
+
+
+
+
+ +

Create the Flask application

+ +

1. Create a directory to work in:

+ +
+
+
Copy
+
+ +
+ 
+mkdir flaskexample
+cd flaskexample
+
+
+ +

2. Create and activate a Python virtual environment:

+ +
+
+
Copy
+
+ +
+ 
+python3 -m venv venv
+source venv/bin/activate
+
+
+ +

3. Install Flask:

+ +
+
+
Copy
+
+ +
+ 
+pip install flask
+
+
+ +

4. Use nano to create your application:

+ +
+
+
Copy
+
+ +
+ 
+nano flaskexample.py
+
+
+ +

5. Copy the following code and paste it into the nano editor (ensure that the indentation is correct):

+ +
+
+
Copy
+
+ +
+ 
+from flask import Flask
+app = Flask(__name__)
+
+@app.route("/")
+def index():
+    return "Web App with Python Flask!"
+
+if __name__ == "__main__":
+    app.run(host='0.0.0.0')
+
+
+ +

6. Exit and save the file: press Ctrl-X, type y, and then press Enter.

+ +

7. Run the application:

+ +
+
+
Copy
+
+ +
+ 
+python flaskexample.py
+
+
+
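The development server stays in the foreground and prints something like the following (the exact wording varies by Flask version); leave it running and continue with the next section:

 * Serving Flask app "flaskexample"
 * Running on http://0.0.0.0:5000/ (Press CTRL+C to quit)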
+

Test the application

+ +

Open the application in a browser by entering http://<yourPublicIP>:5000.
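You can also do a quick check from Cloud Shell (or any machine that can reach the instance) with curl, which should return the greeting string from the code above (Web App with Python Flask!):

curl http://<yourPublicIP>:5000/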

+ + +

Screenshot that shows the Flask application running in a web browser

+ +

Clean up your environment

+ +
    +
  1. On the VM instance details page, click Public Subnet.

    Screenshot that shows the Public Subnet link on the instance details page

  2. Click the default security list.

    Screenshot that shows the default security list

  3. Under Ingress Rules, select the check box for the 5000 rule.
  4. Click Remove.

    Screenshot that shows the location of the Remove button and the rule selection check box.

  5. Click Remove to confirm.
  6. Return to the VM instance details page.
  7. Click More Actions.
  8. Click Terminate.
  9. Select Permanently Delete the Attached Boot Volume.
  10. Click Terminate Instance.

    + + +
+ +

Watch the video version of this tutorial + (3:35)

+ +
+
+ + \ No newline at end of file diff --git a/Articles/opensource/148-serverless-with-fn-project.html b/Articles/opensource/148-serverless-with-fn-project.html index e69de29..a608a9b 100644 --- a/Articles/opensource/148-serverless-with-fn-project.html +++ b/Articles/opensource/148-serverless-with-fn-project.html @@ -0,0 +1,766 @@ + + +
+
+

The dominant cloud topic in 2017 was serverless architectures. At the Devoxx conference in Belgium, one of + the most prestigious developer conferences in Europe, you could attend at least seven different + presentations about "serverless."

+ +

This article introduces the Fn Project as a major new step in the serverless landscape. It’s different from + most other solutions: Fn is a cloud-agnostic, polyglot, open source framework for serverless computing with + Docker as the only dependency. It is also brand new; Fn was open sourced at the Java One 2017 conference. +

+ +

This article has several objectives. First, to lay the foundation and give you, as a developer, a quick + introduction into the serverless world by demonstrating its unique advantages and by clarifying some + not-so-well-defined terms. Second, and most importantly, to show how to quickly get started coding with the + new Fn Project. For hands-on development we cover Go and Java, monitoring, testing, local development, JSON + parameter marshalling, using Docker hub and running Fn in the cloud. Finally, the article provides an + overview of recent announcements and what to expect next. Fn is just the beginning of a journey into the + serverless world.

+ +

Introduction

+ +

Serverless is obviously not a very good name. Let’s face it: the IT industry is pretty bad at proper naming + and delivering exact definitions of new concepts. Cloud computing doesn’t happen in the sky. Data lakes + aren’t wet. Serverless indeed involves real servers. So let’s better define “serverless” and some related + concepts before we start coding.

+ +

Function as a Service (FaaS)

+ +

Function as a Service (FaaS) as a cloud service started in 2014 with AWS Lambda. The idea of FaaS is simple: + you run your source code, but don’t need to care about the underlying language runtime, container, virtual + machine, or server. In the easiest scenario, you just copy your source code, paste into a web frame of the + FaaS cloud service, and run it.

+ +

Technically, cloud-based FaaS solutions are implemented on top of containers (similar to, but not necessarily + built with, Docker). However, this container is usually not exposed to the end user. The function is run + only when it’s triggered by an event; this is why it is also called ephemeral compute. With FaaS, there is + no server constantly running for a user. There is also no runtime permanently listening on an IP address and + open port.

+ +

Events that can trigger a function depend on the cloud provider. Common examples of event sources are: a file + upload, a REST request or a message consumed from a messaging system.

+ +

What makes FaaS interesting on public clouds is that you pay only for the invocation of the function. Also, + the scaling is automated—i.e., there is no configuration for the number of function instances required.

+ +

This "never pay for idle" concept is compelling. Several use cases report cost savings of one or two orders + of magnitude when replacing a traditional, server-based application [1,2].

+ +

Let’s be fair and also have a look at the drawbacks. The main concern with today’s FaaS implementations is + vendor lock-in:

+ + + +

The concept of FaaS is still evolving, and there is a discussion amongst architects whether functions + shouldn’t better be treated as containers [3]. Currently, none of the bigger public cloud providers expose + the underlying container of a function.

+ +

Microservices vs FaaS

+ +

A microservices architecture tries to implement an application as a set of independent services. Each service + runs in its own process and owns its data; the services communicate with a light-weight protocol [4]. When + FaaS and microservices were both new, there was discussion about whether FaaS is just an implementation of + microservices.

+ +

In short: FaaS fulfils the definition of microservices. However, since a FaaS implements only a single + function, several functions must be composed into a meaningful microservice. But how?

+ +

Serverless Cloud Architectures

+ +

Now that we have explained FaaS, let’s have a look at the difference between "serverless" and FaaS.

+ +

Serverless is an architectural trend that tries to "reduce all notion of infrastructure" [5]. So FaaS is + serverless. A serverless cloud service is a PaaS service with real pay per use and automated scalability. +

+ +

For an example of a serverless cloud service let’s picture a messaging service. If you pay only for the + number of messages that you produce and consume, and if the service scales automatically, it’s fair to call + it serverless.

+ +

If servers are visible, with message brokers deployed to them, and you pay per hour provisioned if you + produce or consume messages or not, then it is not serverless.

+ +

FaaS (Frameworks)

+ +

Currently, more than a dozen FaaS frameworks or platforms are available. (For an overview of the projects see + [6].) These projects can be classified into three different categories, based on their objective and reach + (where each category typically includes the characteristics of the previous one) [7]:

+ +

1. Complexity: Reduce the complexity of a particular vendor's cloud-based FaaS implementation—e.g., the configuration of the API gateway and access management that is required for a REST-based function. A typical example for this category: AWS Chalice.
2. Portability: Provide an abstraction framework for portability and ease of use on top of the FaaS implementations of various public cloud providers. A popular example is the serverless.com framework.
3. Standards: Provide a standards-based serverless platform or framework to abstract running functions from the operation of servers. These frameworks are typically developed without a particular cloud provider in mind. When running such a framework on top of IaaS, servers are abstracted away and automated scaling is possible, but no true pay per invocation is achieved due to the IaaS pricing model. Examples for this category are OpenFaaS and Fn Project.

+ +

Fn Project

+ +

Fn Project is a serverless platform with a number of unique advantages: Fn is container-centric, polyglot, cloud agnostic and has Docker as the only dependency. At the moment, Fn Project is available as a software platform only; there is no hosted FaaS (PaaS) offering based on it yet.

+ +

It’s easiest to understand all these features when seeing Fn in action and running some functions yourself. + So let’s get started with the installation first, and then do some coding.

+ +

Fn Project Installation

+ +

Fn installs easily on Windows and Unix systems with a one-line command.

+ +
+
+
Copy
+
+ +
+
+$ curl -LSs https://raw.githubusercontent.com/fnproject/cli/master/install | sh
+
+
+ +

On Mac OS it can be installed with brew. For more details regarding the installation, see [8].

+ +

Some Fn Basics with Go

+ +

To get your head around the Fn features I explained above, let’s start with a simple Go function. We create a + new directory. In the new directory, we initialize a local Fn function using the Go language:

+ +
+
+
Copy
+
+ +
+
+# create oradev and with boilerplate for go
+
+$ fn init --runtime go oradev 
+$ cd oradev
+
+ +

Then you can immediately run the function with the run command and observe the output of our HelloWorld + application. It will look as follows:

+ +
+
+
Copy
+
+ +
+
+$ fn run
+
+Building image oradev:0.0.1 ..
+{"message":"Hello World"}
+
+ +

To see why this was possible, have a look at the generated files. A func.yaml configuration file is generated that specifies a version number and the Go runtime. A default Go function (func.go) is also generated for you, along with a test data file, test.json. Running $ tree shows the layout:

+ +
+
+
Copy
+
+ +
+
+.
+├── func.go
+├── func.yaml
+└── test.json
+
+
+ +

If you carefully check the output above when running the function, you’ll spot that a Docker image + oradev:0.0.1 was built.

+ +

The fn run command invokes the function directly. To create an endpoint for the function, we + first need to start the Fn server in another terminal.

+ +
+
+
Copy
+
+ +
+
+$ fn start
+
+ +

Once the server is running, you can deploy the function with the following command:

+ +
+
+
Copy
+
+ +
+
+$ fn deploy --app mygo --local
+
+Deploying oradev to app: mygo at path: /oradev
+Bumped to version 0.0.2
+Building image oradev:0.0.2 ..
+Updating route /oradev using image oradev:0.0.2...
+
+
+ +

The function name is taken from the folder name. Alternatively, you can specify it in the func.yaml file.
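For reference, the generated func.yaml is only a few lines; the exact fields depend on your Fn CLI version, but it looks roughly like this (values shown are illustrative):

name: oradev
version: 0.0.1
runtime: go
entrypoint: ./func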

+ +

While deploying the function, the version of the Docker image is bumped to 0.0.2. Due to the deploy command, + a new application is registered. An endpoint for the function is also created. To verify this, run the + following two Fn commands that list the applications and the new route:

+ +
+
+
Copy
+
+ +
+
+# check deployed applications
+$ fn apps list
+mygo
+
+# check existing routes
+$ fn routes list mygo
+
+path	image		endpoint
+/oradev	oradev:0.0.2	localhost:8080/r/mygo/oradev
+
+
+ +

The function is registered now with Fn server, which acts like a micro API gateway. It accepts calls to the + endpoint that we listed above and calls the deployed function. To try this yourself, run the following + command:

+ +
+
+
Copy
+
+ +
+
+$ # invoke function via fn server
+$ fn call mygo /oradev
+
+{"message":"Hello World"}
+
+
+ +

Alternatively, since Fn servers provide a URL for the function, you can also invoke it with a simple curl + command from the UNIX command line:

+ +
+
+
Copy
+
+ +
+
+$ # invoke function with UNIX curl
+$ curl localhost:8080/r/mygo/oradev
+{"message":"Hello World"}
+
+
+ +

Yet another alternative is running the Docker image directly. Let’s try and run the generated Docker image + with the following command:

+ +
+
+
Copy
+
+ +
+
+$ # run the docker image
+$ docker run oradev:0.0.2
+{"message":"Hello World"}
+
+
+ +

All three approaches yield identical results.

+ +

Container / Function Duality

+ +

Note that invoking the function via its URL endpoint or running the Docker container returns exactly the same result! However, the Docker image that contains the function was built automatically, without any additional configuration or commands. Fn Project, so to speak, gives you Docker for free.

+ +

Two of the many benefits of using Docker are:

+ + + +

We will explore both concepts below. With Fn you simply write a function without paying attention to Docker, + yet benefit from it by running your function as a container.

+ +

Fn Monitoring

+ +

Fn Project also comes with a basic monitoring tool that can be run as a Docker container with the following + command:

+ +
+
+
Copy
+
+ +
+
+$ docker run --rm -it --link fnserver:api -p 4000:4000 -e "FN_API_URL=http://api:8080" fnproject/ui
+
+> FunctionsUI@0.0.21 start /app
+> node server
+
+Using API url: api:8080
+Server running on port 4000
+
+
+ +

To access the console, open a browser and connect to port 4000. Run the Go function a couple of times more to + see a change in the graphs of the monitoring console [9].

+ +

Graph changes as viewed in monitoring console

+


+ +

Prometheus Monitoring

+ +

For a more sophisticated monitoring solution, Prometheus (a Cloud Native Computing Foundation project) together with Grafana is a good option. Fn exports metrics that allow monitoring with Prometheus without any additional configuration.

+ +

Even without installing Prometheus, you can have a look at the metrics that are exported for Prometheus with + the /metrics URL:

+ +
+
+
Copy
+
+ +
+
+$ curl localhost:8080/metrics | head
+
+# HELP fn_api_completed Completed requests by path
+# TYPE fn_api_completed counter
+fn_api_completed{app="mygo",path="/oradev"} 11
+# HELP fn_api_queued Queued requests by path
+# TYPE fn_api_queued gauge
+fn_api_queued{app="mygo",path="/oradev"} 0
+# HELP fn_api_running Running requests by path
+# TYPE fn_api_running gauge
+fn_api_running{app="mygo",path="/oradev"} 0
+# HELP fn_docker_stats_cpu_kernel docker_stats metric cpu_kernel
+...
+
+
+ +

Further details regarding Prometheus and Fn are described in [10].
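If you do install Prometheus, a minimal scrape configuration pointing at the Fn server is enough to start collecting these metrics (a sketch; adjust the target to wherever your Fn server runs, Prometheus uses the default /metrics path):

scrape_configs:
  - job_name: 'fn-server'
    scrape_interval: 15s
    static_configs:
      - targets: ['localhost:8080']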

+ +

Java HelloWorld Example

+ +

You could create a Java HelloWorld example the same way as the Go example, just by changing the runtime switch to java:

+ +
+
+
Copy
+
+ +
+
+$ cd ~ && mkdir javatest && cd javatest
+$ fn init --runtime java
+Runtime: java
+Function boilerplate generated.
+func.yaml created.
+
+
+ +

Java 9 is the default Java version. Note that for a Java project a Maven pom.xml file and a unit test + HelloFunctionTest.java are also generated.

+ +
+
+
Copy
+
+ +
+
+$ tree 
+
+.
+├── func.yaml
+├── pom.xml
+└── src
+    ├── main
+    │   └── java
+    │       └── com
+    │           └── example
+    │               └── fn
+    │                   └── HelloFunction.java
+    └── test
+        └── java
+            └── com
+                └── example
+                    └── fn
+                        └── HelloFunctionTest.java
+
+
+ +

Java JSON Parameter Marshalling and Function Logic

+ +

To show some more advanced features of Fn, we skip the Java HelloWorld example and look at a mock example for a recommendation engine instead. You can clone it from GitHub with the following command:

+ +
+
+
Copy
+
+ +
+
+$ git clone https://github.com/fmunz/fn-recommend.git
+$ cd fn-recommend
+
+
+ +

Have a look at the API of the function that simulates the recommendation logic. It uses a POJO as an input + parameter that defines the traveller’s age, destination, and the month of travel:

+ +
+
+
Copy
+
+ +
+
+# check the API of the handler function
+$ grep handle src/main/java/com/munzandmore/fn/RecommendFunction.java 
+
+    public String handleRequest(Traveller t) {
+
+# examine the Traveller POJO
+$ cat src/main/java/com/munzandmore/fn/Traveller.java 
+
+package com.munzandmore.fn;
+public class Traveller {
+    public Integer age ;
+    public String  destination ;
+    public String  month;
+}
+
+
+ +
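The recommendation logic itself is mocked. A sketch of what the body of handleRequest might look like (illustrative only; the real implementation is in the cloned repository):

public String handleRequest(Traveller t) {
    // Mocked recommendation logic (illustrative only)
    if ("Munich".equalsIgnoreCase(t.destination) && "Oct".equalsIgnoreCase(t.month)) {
        return "Visit the Octoberfest!";
    }
    return "Enjoy your trip to " + t.destination + "!";
}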

This time we also want to push the image automatically to Docker Hub (unlike with the previous Go example, which we kept local). Therefore, we set the environment variable FN_REGISTRY to our Docker ID and also log in to Docker Hub. In the example below, replace DOCKER_ID with your own Docker login.

+ +
+
+
Copy
+
+ +
+
+# set environment for Docker hub
+$ export FN_REGISTRY=YOUR_DOCKER_ID
+$ docker login
+Login with your Docker ID to push and pull images from Docker Hub. If you don't have a Docker ID, head over to https://hub.docker.com to create one.
+Username: DOCKER_ID
+Password: 
+Login Succeeded
+
+
+ +

Then deploy the function. We use the function for an Adventure Travel application, hence the name:

+ +
+
+
Copy
+
+ +
+
+$ fn deploy --app advtravel 
+Deploying fn-recommend to app: advtravel at path: /fn-recommend
+Bumped to version 0.0.2
+Building image DOCKER_ID/fn-recommend:0.0.2 
+Pushing DOCKER_ID/fn-recommend:0.0.2 to docker registry...The push refers to repository [docker.io/DOCKER_ID/fn-recommend]
+7e2c18073a13: Layer already exists 
+...
+0.0.2: digest: sha256:549e492a08d924dcfeef5f0354dc7d2df57cba820bcfa7ec550a1779a173983c size: 1997
+Updating route /fn-recommend using image DOCKER_ID/fn-recommend:0.0.2...
+
+
+ +

From the output above you can tell that a Docker image is created and pushed to the Docker hub under + DOCKER_ID/fn-recommend:0.0.2.

+ +

Again, you can check for the new application and the new route created within Fn server:

+ +
+
+
Copy
+
+ +
+
+$ fn apps list 
+advtravel
+mygo
+
+
+$ fn routes list advtravel
+path		image			endpoint
+/fn-recommend	DOCKER_ID/fn-recommend:0.0.2  localhost:8080/r/advtravel/fn-recommend
+
+
+
+ +
+
+
Copy
+
+ +
+
+# You can run the function with a POST request using the curl command, providing the
+# necessary JSON data structure for the request. By default, Fn uses the Jackson Java
+# framework to automatically marshal the JSON input parameter to the correct Java type,
+# but you can also use any other marshalling framework for JSON or other formats such as XML.
+
+$ cat testdata/muc.json
+{
+    "age": 41,
+    "destination": "Munich",
+    "month": "Oct"
+}
+# get a recommendation for Munich in October
+$ curl -X POST --data @testdata/muc.json localhost:8080/r/advtravel/fn-recommend 
+
+Visit the Octoberfest!
+# there is more test data under testdata/Casablanca.json   
+# see what is recommended for that city!
+
+
+ +

For evaluating different input parameters, a graphical tool such as Postman is more convenient. Check what + the Fn-based mock recommends for a trip to Sydney:

+ +

Figure 2. Example of Fn-based mock

+ +

The output should look as follows:

+ +

Figure 3. Example of Fn-based mock output

+ +

Fn in Public Clouds (IaaS)

+ +

A common question is how to use Fn Project, a cloud-agnostic framework, in public clouds. Similar to the + local installation that we used in the examples above, it can be installed on any public cloud IaaS. For + most IaaS clouds it is enough to pass the installation command directly to the creation of a compute + instance as so-called "user data" (commands that are acted upon when the instance is provisioned). Also, + when running Fn in a public cloud, don’t forget to enable access rules for Fn Server allowing port 8080, + either from your own IP or all public IP addresses.
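Depending on the image you choose, the operating system firewall may also need to allow the port; on an Oracle Linux instance with firewalld, for example:

sudo firewall-cmd --permanent --zone=public --add-port=8080/tcp
sudo firewall-cmd --reload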

+ +

Obviously, when running Fn Project on an IaaS you do not get the true pay per invocation benefit as you would + with a FaaS implemented by the cloud provider as PaaS. Still, functions are run serverless from a user’s + perspective in a standardized, portable, and scalable way.

+ +

Once Fn Server is running on your favourite cloud provider, you could deploy the recommender example from + above in two different ways.

+ +
+
+
Copy
+
+ +
+
+# example 1 (for demo purpose only, in production use approach below)
+# note: run these commands on the cloud instance
+$ fn apps create advtravel
+$ fn routes create advtravel /fn-recommend DOCKER_ID/fn-recommend:0.0.2
+
+# check for the created route
+$ fn routes list advtravel
+
+
+ +

Note that with the two commands above you never had to copy over the function or the container image to the + cloud instance. When the function is invoked the first time, Fn will pull the Docker container, store it + locally, and then run the function.

+ +

Another probably even more useful way to deploy the function is to set the FN_API_URL environment variable + locally, point it to the remote cloud instance, and then run the local Fn deploy command against the remote + cloud instance.

+ +
+
+
Copy
+
+ +
+
+# example 2
+# run these commands on your local machine, pointing FN_API_URL at the cloud instance
+
+
+$ export FN_API_URL=URLCloudInstance
+$ fn deploy --app advtravel 
+$ fn routes list advtravel
+
+
+ +

Once Fn is running in the cloud and your application is deployed, you can access the application from a local machine using the command line or Postman. The invocation is the same as in the local example—just replace localhost with the public IP address of your cloud instance:

+ +
+
+
Copy
+
+ +
+
+$ curl -X POST --data @testdata/syd.json PUBLIC_IP:8080/r/advtravel/fn-recommend 
+
+
+ +

A recorded live demo from the Devoxx conference about deploying a Fn-based recommendation engine mock on IaaS + can be seen at footnote [11].

+ +

JAX-RS, Spring Cloud and more

+ +

Since Fn Project has Docker as the only dependency and for Java projects a Maven pom.xml file is also + generated, your function development can be easily extended to use other Java frameworks.

+ +

Work has been done by the Fn team to support JAX-RS with Fn projects [12]. Spring also supports the + implementation of business logic as functions using their convention-over-configuration approach with Spring + Cloud Functions. Spring Cloud Functions can be used together with Fn [13, 14].

+ +

Fn LB

+ +

A separate component, Fn LB, deals with load balancing and intelligent traffic routing. If functions are deployed as hot functions, a container is kept alive for 30 seconds (and not restarted for every invocation). Fn LB will then route invocations to these hot functions to ensure optimal performance [15].

+ +

Fn Flow

+ +

At the beginning of this article we discussed the difference between microservices and FaaS and explained + that a microservice typically contains more than a single method or function. Today, graphical tools or + higher-level PaaS are often used to compose FaaS into more meaningful larger services. However, these + graphical tools often don’t provide much visibility into the details of the higher-level service. Lessons + learned from development with ESB and BPEL show that these details cannot all be displayed at the same time + and therefore they are more often than not buried under some property tab of the graphical model. Therefore, + showing "flow" in a graphical model is often limited.

+ +

Fn Flow tackles this issue for Fn Project. It follows an interesting, different, code-first approach by using the Java 8 CompletableFuture API with methods such as thenApply() or thenCompose(). No graphical tool or lengthy YAML file is required; the composition of functions is done with Java 8 constructs only and is therefore easily readable.
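To picture the style, here is plain Java 8 CompletableFuture chaining with mocked booking steps (illustrative only; Fn Flow uses the same idioms, but each stage runs as its own function on the Fn server):

import java.util.concurrent.CompletableFuture;

public class BookingFlowSketch {
    public static void main(String[] args) throws Exception {
        // Plain CompletableFuture chaining; Fn Flow applies the same idioms,
        // but each stage runs as its own function container on the Fn server.
        CompletableFuture<String> trip =
            CompletableFuture.supplyAsync(() -> "flight-LH1234")   // book a flight (mocked)
                .thenApply(flight -> flight + " + hotel-42")       // then book a hotel (mocked)
                .thenApply(itinerary -> "Booked: " + itinerary);   // then confirm the itinerary

        System.out.println(trip.get());
    }
}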

+ +

An interesting application of this concept is shown in a demonstration of using SAGAs instead of an ACID + transaction for a travel booking application based on microservices [16].

+ +

Using SAGAs for a travel booking application based on microservices

+


+ +

What looks like a regular Java 8 program at first sight behaves, during execution, more like what you might know from Apache Spark. The execution happens in parallel, function input parameters are marshalled and return values are unmarshalled. Every function is executed in its own container, with chaining, error handling and fan-in/fan-out.

+ +

Fn Flow can track the call graphs and visualize them (the screenshot below is taken from [17]).

+ +

Tracking and visualizing call graphs

+


+ +

Fn on Kubernetes

+ +

At Kubecon in December 2017, support for Fn on Kubernetes was officially announced.

+ +

Fn can now be installed from the command line using Helm, a Kubernetes package manager. Preconfigured + packages of Fn resources, including Fn service, Fn UI, Flow service and Flow UI, are provided as a Helm + chart. Using this Helm chart, Fn is deployable on any Kubernetes cluster [18]. It also enables the running + of Fn on the brand new Oracle managed Kubernetes service, Oracle Container Engine (OCE) or a local + installation of minikube on your laptop [19] [20].

+ +

If Fn LB is deployed with Kubernetes, it can be updated if Fn Server Kubernetes pods are added or removed. +

+ +

Summary

+ +

Fn Project is an interesting new approach to the serverless world. It is cloud agnostic and therefore avoids + the cloud vendor lock-in. Further, developers are not bound to certain languages when using Fn. Functions + are automatically placed into a Docker image without any additional effort from the developer, so they can + be run anywhere by just pointing Fn to the correct image on Docker Hub.

+ +

Fn ties into the world of Cloud Native Computing Foundation projects with support for Kubernetes and + Prometheus as a first start and with hopefully more to come. Last but not least, it will be interesting to + see if Oracle or any other cloud provider will offer an Fn-based FaaS service as PaaS in the near future + with pay per invocation and fully automated scaling [21].

+ +

References

+ + + +

About the Author

+ +

Dr Frank Munz is an expert in cloud computing, big data, fast data, containers, and Oracle Fusion Middleware. + He runs the boutique consulting firm munz & more and works as a software architect, cloud evangelist, and + independent Oracle Developer Champion.

+
+
\ No newline at end of file diff --git a/Articles/proximasafe/122-chapter-2.html b/Articles/proximasafe/122-chapter-2.html index e69de29..8121867 100644 --- a/Articles/proximasafe/122-chapter-2.html +++ b/Articles/proximasafe/122-chapter-2.html @@ -0,0 +1,757 @@ +
+
+ +

I set a course just east of Lyra
+ And northwest of Pegasus
+ Flew into the light of Deneb
+ Sailed across the Milky Way
+

+

—Neil Peart, (1977) +

+ + +
+
+ +
+
+ +

A quick recap

+ + + + + +

In the previous article we showed the ProximaSafe scope, the overall architecture and the components needed to achieve our goal: get the stream flow coming from a determined edge environment into OCI, perform the analysis of the stream to detect possible anomalies, and send the errors back to the edge in order to carry out corrective actions. All this with development boards that are commercially available (almost) anywhere and easy to pack and transport.

+

Now it is time to have some fun fiddling with sensors and OCI Functions, covering a number of areas such as:

+ + + + + +

That said, without further ado let's dive into some practical aspects of the matter.

+ +
+
+ + + + + +
+
+

Selecting the edge components

+ +

During the spring of 2020 (and the related lockdown) I fell - almost immediately - in love with the M5Stack development board series, based on the ESP32 microcontroller. These cute little boxes have an integrated display, which - sometimes - is useful to help with building simple and intuitive on-board GUIs (that's not my case, I'll always be an ASCII fanboy) or with debugging and showing message contents without bothering to open a serial terminal from the Arduino IDE. Furthermore, a sumptuous choice of different programming models, IDEs and languages is available:

+ + + + + +

Needless to say, I'll go for the first choice. I clearly remember the time when IDEs didn't exist (yes, I'm that old) and all you got from a compile-link-run session was a disturbing message that read "segmentation fault (core dump)". We now have modern and productive environments and, overall, choice, so pick your environment of choice and follow the rest of this article as a reference.

+ + +

dev-promima-safe-chapter2-1

+ +

In addition to the ESP32 family, we'll use an ESP8266-based smart badge that will act as a wearable + device.

+

And, of course, we can't help but use the ubiquitous Raspberry Pi - which year over year is getting specs almost on par with its bigger cousins - to act as the physical and logical link between the edge and the Cloud environments. This pocketable Linux device will be crucial in bridging the local MQTT instance to the OCI Cloud instance described and set up in the previous chapter.

+ +

dev-promima-safe-chapter2-2

+ +
+
+ + + +
+
+

The Raspberry Side: MQTT Bridging

+ +

Installing Mosquitto and the related CLI utilities on a Pi is straightforward: issue the commands sudo apt install mosquitto and sudo apt install mosquitto-clients. Once started, you can check the status with systemctl status mosquitto, which should print something like:

+ + +
+
+
Copy
+
+ +
+
+ Loaded: loaded (/lib/systemd/system/mosquitto.service; enabled; vendor
+preset: enabled)
+ Active: active (running) since Tue 2021-03-30 17:22:35 CEST; 19h ago
+ Docs: man:mosquitto.conf(5)
+ man:mosquitto(8)
+ Main PID: 635 (mosquitto)
+ Tasks: 1 (limit: 4915)
+ CGroup: /system.slice/mosquitto.service
+ └─635 /usr/sbin/mosquitto -c /etc/mosquitto/mosquitto.conf
+...
+
+
+ +

and proceed to modify the /etc/mosquitto/conf.d/mosquitto.conf file to configure the bridging mechanism. Most of the default parameters are just fine (unless you want to set up an encrypted connection between the microcontrollers and the edge instance). In our case we'll just configure the bridge, using our editor of choice, even if your favorite search engine suggests otherwise(!):

+ +

dev-promima-safe-chapter2-3

+ +

and reaching the Bridges section:

+ + +
+
+
Copy
+
+ +
+
+# =================================================================
+# Bridges
+# =================================================================
+
+# A bridge is a way of connecting multiple MQTT brokers together.
+# Create a new bridge using the "connection" option as described below. Set
+# options for the bridges using the remaining parameters. You must specify the
+# address and at least one topic to subscribe to.
+
+
+ +

we can add the following parameters:

+ + +
+
+
Copy
+
+ +
+
+connection proxima
+address [host:port]
+topic # out 0 "" edge/
+topic alarm in 0 cloud/ edge/
+
+
+
+ + + + + +

Where the host and port parameters are the public IP address and the port of the OCI instance we configured in the first episode, and the other parameters indicate that:

+ + + Sure enough, we also need to setup the certificate based SSL/TLS support, so reach for the section + regarding security and complete it with: + + +
+
+
Copy
+
+ +
+
+# -----------------------------------------------------------------
+# Certificate based SSL/TLS support
+# -----------------------------------------------------------------
+# Either bridge_cafile or bridge_capath must be defined to enable TLS support
+# for this bridge.
+# bridge_cafile defines the path to a file containing the
+# Certificate Authority certificates that have signed the remote broker
+# certificate.
+# bridge_capath defines a directory that will be searched for files containing
+# the CA certificates. For bridge_capath to work correctly, the certificate
+# files must have ".crt" as the file ending and you must run "openssl rehash
+# [path to capath]" each time you add/remove a certificate.
+# bridge_capath
+bridge_cafile /etc/mosquitto/certs/ca.crt
+
+# Path to the PEM encoded client certificate, if required by the remote broker.
+bridge_certfile /etc/mosquitto/certs/server.crt
+
+# Path to the PEM encoded client private key, if required by the remote broker.
+bridge_keyfile /etc/mosquitto/certs/server.key
+
+# When using certificate based encryption, bridge_insecure disables
+# verification of the server hostname in the server certificate. This can be
+# useful when testing initial server configurations, but makes it possible for
+# a malicious third party to impersonate your server through DNS spoofing, for
+# example. Use this option in testing only. If you need to resort to using this
+# option in a production environment, your setup is at fault and there is no
+# point using encryption.
+bridge_insecure true
+
+
+ + +

thus we create a certs directory under /etc/mosquitto and copy into it the ca.crt, server.crt and server.key files we generated during the first episode in the section Secure the MQTT Server running on OCI Compute.

+ + +

This is easy to test. Issue a listening command on the Cloud instance in a shell, as shown in the first episode:

+ + +
+
+
Copy
+
+ +
+
+mosquitto_sub -d -t '#' -h [your host] -u [username] -P [password] -p [port] --insecure --cafile certs/ca.crt --cert certs/server.crt --key certs/server.key
+
+
+ + +

and sending a message to the local Raspberry Pi

+ + +
+
+
Copy
+
+ +
+
+mosquitto_pub -h [your RPi IP address] -t testtopic -m 'Sympathetic resonance'
+
+
+ +

we should receive on the Cloud Mosquitto shell the message: +

+ +
+
+
Copy
+
+ +
+
+Client (null) received PUBLISH (d0, q0, r0, m0, 'edge/testtopic', ... (21
+bytes))
+Sympathetic resonance
+
+
+ +

showing that the two thingies are effectively talking to each other - albeit in a single direction, for now.

+

The pipelines we'll design in Stream Analytics will provide the logic to test the bidirectional dialogue. + And, + now, let's have some healthy fun with sensors!

+ + + +
+
+ + + +
+
+

Edge Programming

+

The goal is to build an edge that can easily fit into a small briefcase, and - certainly - an ESP32-based kit will help save space, time, and power. Let's consider a setup that includes some edge emitters (MQTT publishers), some receivers (MQTT subscribers) and the Gateway:

+ + +

dev-promima-safe-chapter2-4

+
+
+ +
+
+ +

Publishers

+ + + +

You can find all the sources I've used at this link (NOTE: insert the GitHub link, open in a new window/tab). +

+
+
+ +
+
+

Subscribers

+ + + + + +

Both the publisher and the subscriber will use the PubSubClient API. Specifically, the Publishers will send + messages to the local MQTT server via the publish method: +

+ +
+
+
Copy
+
+ +
+
+Result = mqttClient.publish(MACHINE_TOPIC, msg, true);
+ M5.Lcd.setCursor(10, 60);
+ if (Result)
+ M5.Lcd.println("Sent.");
+ else
+ M5.Lcd.println("Not sent.");
+
+
+ + +

while Subscribers will initialize the callback in the setup() portion of the code (executed only once at + startup): +

+ +
+
+
Copy
+
+ +
+
+configTime(gmtOffset_sec, daylightOffset_sec, ntpServer);
+ timestamp = getTime();
+ if (timestamp > 0)
+ noTime = false;
+ mqttClient.subscribe(TOPIC);
+ Serial.println("Subscribed!");
+ mqttClient.setCallback(DisplayCallback);
+ delay(100);
+
+
+ + +

and upon the reception of new messages (in our case, from OCI), processing will occur in +

+ +
+
+
Copy
+
+ +
+
+void DisplayCallback(char* topic, byte* payload, unsigned int len)
+{
+ // Process message
+ // Serial.println((String)topic);
+}
+
+
+ +

Programming these gizmos is fun and it's a very effective means of spreading the culture of programming among students of all levels (including myself). Plus, there are plenty of examples available on the Web.

+

Still, we need to design a way to return alarm messages from Stream Analytics to the edge, using the MQTT + Bridge feature we set up not too long ago.

+

As described in the previous Episode, our approach will be as the following: +

+ +

dev-promima-safe-chapter2-5

+ + +

thus we (thankfully) need to tinker with Oracle Functions.

+ +
+
+ + + +
+ +
+

Serverless Time!

+

FnProject is a cool Open Source serverless platform that can scale from microdevices to megainstallations; it was launched in 2017 and later evolved into an industrial-strength OCI service called Oracle Functions.

+ + +

Developing a function in OCI requires either:

+ + + + Either way, you'll be good to go with the function deployment in OCI. + We will use a Custom Dockerfile to build our image in Python, such as the following: + + +
+
+
Copy
+
+ +
+
+FROM fnproject/python:3.6-dev as build-stage
+WORKDIR /function
+ADD requirements.txt /function/
+RUN pip3 install --target /python/ --no-cache --no-cache-dir -r requirements.txt && rm -fr ~/.cache/pip /tmp* requirements.txt func.yaml Dockerfile .venv
+ADD . /function/
+RUN rm -fr /function/.pip_cache
+FROM fnproject/python:3.6
+WORKDIR /function
+COPY --from=build-stage /python /python
+COPY --from=build-stage /function /function
+COPY certs /function
+ENV PYTHONPATH=/function:/python
+ENTRYPOINT ["/python/bin/fdk", "/function/func.py", "handler"]
+
+
+ +

specifying Python requirements in requirements.txt file as we're going to use the Paho + Library:

+ +
+
+
Copy
+
+ +
+
+fdk
+paho-mqtt
+
+
+ +

and write some code to complete the round trip, copying the certs folder (with the files used to access the MQTT Server on OCI) into the function directory. Please find the Dockerfile and the code at this address (NOTE: insert the GitHub link, open in a new window/tab).
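As an illustration of the shape of such a function, here is a minimal sketch using the Python FDK and the Paho client; the configuration key names, certificate paths and the host/port parameters are assumptions for this sketch, and the real implementation is in the linked repository:

import io
import json

import paho.mqtt.publish as publish
from fdk import response


def handler(ctx, data: io.BytesIO = None):
    # Values set in the function Configuration (key names are illustrative)
    cfg = dict(ctx.Config())
    topic = cfg.get("topic", "edge/alarm")

    # The incoming request body is forwarded as the MQTT payload
    payload = data.getvalue() if data else b"{}"

    # Publish to the Mosquitto instance on OCI over TLS, using the certificates
    # copied into /function by the Dockerfile (paths assumed for this sketch)
    publish.single(
        topic,
        payload=payload,
        hostname=cfg["host"],
        port=int(cfg.get("port", "8883")),
        auth={"username": cfg["user"], "password": cfg["password"]},
        tls={"ca_certs": "/function/ca.crt",
             "certfile": "/function/server.crt",
             "keyfile": "/function/server.key"},
    )

    return response.Response(
        ctx,
        response_data=json.dumps({"published": True, "topic": topic}),
        headers={"Content-Type": "application/json"},
    )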

+

Oracle Functions (as Fn Project) requires the function to be installed in + an artifact called Application, a + logical grouping of functions, which can be created via the fn CLI (specifying the OCI subnets) or in the + OCI + Web console following the path Home » Developer Services » Functions: +

+ + +

dev-promima-safe-chapter2-6

+ + +

Once the application is created, we can deploy the function (this time we'll leverage the good-ole CLI) using +

+ + +
+
+
Copy
+
+ +
+
+fn build
+fn deploy --app [app name]
+
+
+ + + + + +

where you can see some familiar Docker (layer-related) output messages and the result of deployment. +

+ +
+
+
Copy
+
+ +
+
+Building image fra.ocir.io/emeaseitalyproxima/gabba-repository/mqtt_pub:0.0.2 .
+Parts: [fra.ocir.io emeaseitalyproxima gabba-repository mqtt_pub:0.0.2]
+Pushing fra.ocir.io/emeaseitalyproxima/gabba-repository/mqtt_pub:0.0.2 to
+docker registry...The push refers to repository
+[fra.ocir.io/emeaseitalyproxima/gabba-repository/mqtt_pub]
+77ff3ee9cb37: Pushed
+3353efa4559: Pushed 
+0f6cdd7e71a8: Layer already exists 
+3697bae2d860: Layer already exists 
+0b66d6c41076: Layer already exists 
+85e1ba76ed69: Layer already exists 
+6881daa7bad0: Layer already exists 
+7352730c981f: Layer already exists 
+9d95bea46bad: Layer already exists 
+b84a8d46e8fb: Layer already exists 
+f66ed577df6e: Layer already exists 
+0.0.2: digest:
+sha256:e82a0abc009c0a132fc6c3c35fc8d88f516589b35a96907c41e41a350619872d
+size: 2626
+Updating function mqtt_pub using image
+fra.ocir.io/emeaseitalyproxima/gabba-repository/mqtt_pub:0.0.2...
+
+
+
+
+ + +

The status of the function is reflected in the OCI Web Console as well as in the CLI, by issuing the command fn list functions:

+ + +
+
+
Copy
+
+ +
+
+NAME      IMAGE                                                           ID
+mqtt_pub  fra.ocir.io/emeaseitalyproxima/gabba-repository/mqtt_pub:0.0.2  ocid1.fnfunc.oc1.eu-frankfurt-1.aaaaaaaaabbknysfrffi2olayuzykycv5boop72qi75k5aqgjwjfjdlycutq
+
+
+
+
+
+ +

The function must be provided with the three input parameters that can be set on the OCI Web Console in + the Configuration submenu: +

+ + +

dev-promima-safe-chapter2-7

+ + +

specifying your Mosquitto username, password and the alarm topic edge/alarm. Note those parameters, + as we'll use them to perform some smoke test! +
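As a quick smoke test of the return path, you can bypass the function for a moment and publish an alarm directly on the OCI broker, then watch it arrive on the Raspberry Pi through the bridge (same connection parameters and certificates as in the earlier tests; the payload shown is just an example):

# on your workstation: publish to the remote broker on the edge/alarm topic
mosquitto_pub -h [your host] -p [port] -u [username] -P [password] --insecure \
  --cafile certs/ca.crt --cert certs/server.crt --key certs/server.key \
  -t edge/alarm -m '{"STATUS":"Detection Point"}'

# on the Raspberry Pi: the bridged message shows up on the cloud/alarm topic
mosquitto_sub -h localhost -t 'cloud/#' -v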

+ +
+
+ +
+
+ +

Creating the API Gateway and an API deployment

+ +

The mechanisms to expose and consume APIs in Oracle Cloud Infrastructure are accessible in the Main + menu » Developer Services » API Management section of OCI Web Console. We'll create an API + Gateway + first, and then an API deployment specifying the Oracle Function we created previously. Creating an API + Gateway involves specifying:

+ + + + +

dev-promima-safe-chapter2-8

+ +

Hitting the blue Create button starts the magic, and creation is quick. Then, we can proceed + to shape our + API deployment by clicking the link named "Deployments" in the bottom left Resources section and fire the + Create Deployment procedure, which consists of three stages: +

+ + +
+
+ + + +
+
+ +

Next Episode - Use cases and Stream Analytics + pipelines

+ +

Going further, we'll need to design some simple use cases as an example and develop some pipelines within Stream Analytics to close the loop and test our setup.

+ + +

See you in the next chapter: Sensors, Pipelines and back to the Edge.

+

Zip and Zest!

+ + +
+
+ + +
+
+ +
+ +
+ +
+ +

About the Author

+

Gabriele Provinciali works as Solution Architect in Rome, Italy - passionate about everything that is + tinkerable!

+
+ +
+
+ \ No newline at end of file diff --git a/Articles/proximasafe/125-proximasafe-part-3.html b/Articles/proximasafe/125-proximasafe-part-3.html index e69de29..20c9dd3 100644 --- a/Articles/proximasafe/125-proximasafe-part-3.html +++ b/Articles/proximasafe/125-proximasafe-part-3.html @@ -0,0 +1,649 @@ +
+
+ + + +

Wandering the face of the Earth
+ Wondering what our dreams might be worth
+ Learning that we're only immortal
+ For a limited time +

+

—Neil Peart, (1991)

+ + +
+
+ +
+
+ +

What we've done so far

+ +

Greetings, and welcome to Chapter Three, the last of this series. In the previous articles we set up all the bits and pieces we need to build a portable lab aimed at studying and developing stream analysis, such as:

+ + + + +

Now it's time to design some example use cases that imply:

+ + + + + + + + + + + + +
+
+ + + + + + + +
+
+

Use Cases

+

We can think of - at least - three scenarios that use the technologies set up during the previous chapters and effectively make the most of stream analysis:

+ + +

These simplistic use cases do not fully take into account all the possibilities offered by selecting a Geofence in Golden Gate Stream Analytics, which is a cool and useful function for scanning what is happening within a defined set of coordinates, but they could serve as a basis for more sophisticated (and real!) analysis performed on environments of any size. Nevertheless, we'll hardwire each flow to different coordinates, mocking the presence of sensors in the real world.

+ +
+
+ + + +
+ +
+

Assign the tasks to Edge components

+

Here's the map of the components we're going to use (and program): + +

+

dev-promima-safe-chapter3-1

+

Please find the code for each of these boards at this address (NOTE: insert the GitHub link, open in a new + window/tab).

+ +
+
+ +
+
+ +

1 - Arduino with DHT11 sensor (Environment)

+ +

This configuration is the 'Hello World' equivalent of Arduino programming. The board is an Arduino MKR1000, sporting a 32-bit, low-power ARM MCU provided by the Atmel ATSAMW25 SoC. Wi-Fi and MQTT features are provided by the WiFi101.h and PubSubClient.h libraries, respectively. No GUI is provided, except for the Serial Monitor available in the Arduino IDE, so we're (possibly) going to use Serial.println() extensively.

+ +

dev-promima-safe-chapter3-2

+ +

As I mentioned in one of the previous chapters, I tried to switch to a more compact sensor (a BMP280 hat connected to an M5StickC) in order to avoid a scruffy jumper-based connection between the MKR1000 and the DHT11, but the temperature and humidity measured are heavily influenced by the heat generated by the M5, which kinda misses the objective. So, we'll remain with the MKR1000 - and, believe it or not, it's definitely not a leftover from the Proxima City days, I just love the board! This simple board will send information about temperature and humidity that'll be intercepted by Stream Analytics to verify up/down trends and eventually go downstream with possible anomalies.
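The sketch for this board boils down to reading the sensor and publishing a small JSON document; here is a minimal sketch (pin number, Wi-Fi credentials, broker address, topic name and JSON fields are illustrative, the actual code is in the linked repository):

#include <WiFi101.h>
#include <PubSubClient.h>
#include <DHT.h>

#define DHTPIN 2                       // illustrative: pin wired to the DHT11 data line
#define DHTTYPE DHT11

const char* ssid   = "your-ssid";      // illustrative credentials
const char* pass   = "your-password";
const char* broker = "192.168.1.10";   // illustrative: the Raspberry Pi address

DHT dht(DHTPIN, DHTTYPE);
WiFiClient wifiClient;
PubSubClient mqttClient(wifiClient);

void setup() {
  dht.begin();
  WiFi.begin(ssid, pass);
  while (WiFi.status() != WL_CONNECTED) delay(500);
  mqttClient.setServer(broker, 1883);
}

void loop() {
  if (!mqttClient.connected()) mqttClient.connect("mkr1000-dht11");
  float t = dht.readTemperature();
  float h = dht.readHumidity();
  if (!isnan(t) && !isnan(h)) {
    char msg[128];
    snprintf(msg, sizeof(msg),
             "{\"SOURCE\":\"DHT11\",\"TEMP\":%.1f,\"HUM\":%.1f}", t, h);
    mqttClient.publish("environment", msg);   // illustrative topic name
  }
  mqttClient.loop();
  delay(5000);
}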

+ + +
+
+ +
+
+ +

2 - M5Stack Fire (Event Generator)

+

The M5Stack Fire (a.k.a. the Red One) uses the M5 packaging and is based on the ESP32 architecture. This box is the primary event generator, a role that somewhat underuses its capabilities (it even has an IMU posture sensor, which would be a nice feature to enhance the tampering use case), so it'll fire events simulating the people flowing in a corridor and events from sanitizers.

+

Fortunately, most of the M5 have a tiny cute display, so we'll be able to expose feedback when + sending/receiving messages to OCI.

+ +

dev-promima-safe-chapter3-3

+ +

The UI I've programmed is really basic (in fact, it's character based), with the addition of battery level detection (I found the API call somewhere, and it's neither accurate nor reliable, as I discovered), the number of messages sent, and results. The buttons are programmed to send a single message or a burst of messages to trigger a pattern-based analysis in Stream Analytics.

+
+
+ +
+
+ +

3 - M5Stack Core 2 (Alarm Detector)

+

The M5Stack Core 2 (a.k.a. the White One) is an evolution of the previous M5 box; it's got touch sensors and another bunch of goodies that we won't use, since its main task is to display alarms for corridors and possibly other locations where we need to show an alert - in the form of a red traffic light.

+

dev-promima-safe-chapter3-4

+ +

This box could be expanded with an external display via the Grove port or a relay as an actuator (actually, + I've put some code for the external LCD RGB display but I'm not using that). +

+ +
+
+ + +
+
+ +

4 - E-ink Smart Badge (Badgy)

+

The Badgy device is equipped with an e-ink display and the ESP8266 chip. While e-ink refresh rates aren't ideal for content that changes rapidly, it is certainly good for the Sanitizers use case: it reports to the maintenance employee which dispenser machine has run into trouble, so it can be fixed quickly even while on the move during regular maintenance routines.

+

dev-promima-safe-chapter3-5

+
+
+ + +
+
+ +

5 - M5Stick Wearable Beeper

+

As an alternative to the Smart Badge, the gray M5Stick can be worn as a watch/beeper receiving alarm messages from OCI. The M5Stick (a.k.a. the Grey One) is even smaller than the previous M5; it's still based on the ESP32 architecture and it sports Wi-Fi, Bluetooth and an IR blaster.

+ +

dev-promima-safe-chapter3-6

+ + +
+
+ + + + +
+
+ +

6 - M5Paper Billboard

+

The M5Paper is the latest addition to the M5 family: e-ink, capacitive touch screen, a beefy battery and a wide display - looks like we just found the billboard for our microlab! This device will be in charge of displaying, one page at a time, all the alarm events coming from OCI with date and time.

+

dev-promima-safe-chapter3-7

+ +

Since it's a full-blown e-ink minitablet with touch support, I suppose I should implement a nicer touch-based UI, allowing drill-down into a particular message to get more info about it. This could be an idea for further development.

+ +
+
+ + +
+
+ +

7 - Raspberry Pi 4B as the Gateway

+

We've configured, during the previous chapters, the Raspberry Pi to act as our local MQTT server (Mosquitto-Edge) bridged to the MQTT server resident on an OCI Compute instance (Mosquitto-Cloud), and now it's time - at last - to let the other boards/sensors send messages to this thingie and see the Edge » Cloud » Edge loop in action.

+ + +

dev-promima-safe-chapter3-8

+ +

Most of the M-Fivers share the same code snippets:

+ +

Getting a Wi-Fi connection

+ +
+
+
Copy
+
+ +
+
+ WiFi.begin(ssid, pass);
+ 
+ 
+ // Wait for connection
+ while (WiFi.status() != WL_CONNECTED) {
+ delay(500);
+ Serial.print(".");
+ }
+ Serial.println("");
+ Serial.print("Connected to ");
+ Serial.print(ssid);
+ Serial.println("");
+ Serial.print("IP address: ");
+ Serial.print(WiFi.localIP());
+ Serial.println("");
+
+
+ + + +

Connect to the MQTT server running in the Raspberry Pi

+ +
+
+
Copy
+
+ +
+
+ mqttClient.setServer(BROKER,MQTT_PORT);
+ mqttClient.connect(clientId);
+ 
+ Serial.println("Connecting to MQTT Broker");
+ while (!mqttClient.connected()) {
+ if (mqttClient.connect(clientId)) {
+ Serial.println("Waiting for MQTT Broker");
+ Serial.print(".");
+ delay(500);
+ }
+ }
+
+
+ +

Getting time from an NTP server and printing it in a human-readable form

+ +
+
+
Copy
+
+ +
+
+ // Get time considering Time Zone and Daylight Offset
+ configTime(gmtOffset_sec, daylightOffset_sec, ntpServer);
+ struct tm timeinfo;
+ if (!getLocalTime(&timeinfo)) {
+ Serial.println("No Time from NTP Server");
+ }
+ strftime (buf, sizeof(buf), "%B %d %Y", &timeinfo);
+
+
+
+ + +

Subscribing to a topic: in this case we'll subscribe to the cloud/alarm topic

+ + +
+
+
Copy
+
+ +
+
+ mqttClient.subscribe(TOPIC);
+ Serial.println("Subscribed to MQTT topic");
+
+
+ + +

and process the received messages with a callback, showing the alerts in different colors

+ +
+
+
Copy
+
+ +
+
+void DisplayCallback(char* topic, byte* payload, unsigned int len)
+{
+ Serial.println((String)topic);
+ Serial.println(len+1);
+ 
+ if ((String)topic == ALARM_TOPIC) {
+  // Save Message
+ char msg[300];
+ for (int i=0; i < len; i++) 
+ msg[i] = payload[i];
+ msg[len] = '\0';
+ 
+ // Issue an audio alarm
+ AudioEffect();
+ // Filter message Status
+ if ((String)msg == "{\"STATUS\":\"Detection Point\"}") { 
+ M5.Lcd.fillScreen(TFT_RED);
+ M5.Lcd.setCursor(10, 40);
+ M5.Lcd.setTextColor(TFT_BLACK,TFT_RED);
+ M5.Lcd.setTextSize(2);
+ M5.Lcd.println("Gates Alert"); 
+ } 
+ if ((String)msg == "{\"STATUS\":\"Sanitizer 2nd Floor\"}") {
+ M5.Lcd.fillScreen(TFT_ORANGE);
+ M5.Lcd.setCursor(10, 40);
+ M5.Lcd.setTextColor(TFT_BLACK,TFT_ORANGE);
+ M5.Lcd.setTextSize(2);
+ M5.Lcd.println("Sanitizer Alert");
+ } 
+ if ((String)msg == "{\"STATUS\":\"DHT11\"}") {
+ M5.Lcd.fillScreen(TFT_ORANGE);
+ M5.Lcd.setCursor(10, 40);
+ M5.Lcd.setTextColor(TFT_BLACK,TFT_ORANGE);
+ M5.Lcd.setTextSize(2);
+ M5.Lcd.println("Environment");
+ } 
+ } else if ((String)topic == ALARM_CLEAR) {
+ M5.Lcd.fillScreen(TFT_GREEN);
+ M5.Lcd.setCursor(10, 40);
+ M5.Lcd.setTextColor(TFT_BLACK,TFT_GREEN);
+ M5.Lcd.setTextSize(2);
+ M5.Lcd.println("Ready");
+ }
+ delay(100);
+}
+
+ 
+
+
+ +

Publishing a message from the Event Generator by pressing a button

+ + +
+
+
Copy
+
+ +
+
+ if (M5.BtnA.isPressed()) {
+ M5.Lcd.clear(BLACK);
+ M5.Lcd.setCursor(10, 40);
+ M5.Lcd.println("Sending Message...");
+ char* msg = formatPeopleMessage(msgbuf, "Gates", "Detection Point",
+timestamp, "People passing", fLatGate, fLonGate);
+ Result = mqttClient.publish(PEOPLE_TOPIC, msg, true);
+ M5.Lcd.setCursor(10, 60);
+ if (Result)
+ M5.Lcd.println("Sent.");
+ else
+ M5.Lcd.println("Not sent.");
+ 
+ M5.Lcd.setCursor(10, 220);
+ M5.Lcd.setTextColor(TFT_WHITE);
+ M5.Lcd.printf("Battery : %i%%", getBatteryLevel());
+ ledFeedback(PEOPLE_EVENT);
+ delay(100);
+ } 
+
+
+
+ +

Please consider this code just as a reference: most of the time I struggle to remember what I did during the previous tinkering sessions, so I'm all in for code that speaks for itself.
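In that spirit, should the alarm payloads ever grow beyond the fixed strings matched above, parsing the JSON instead of comparing whole messages keeps the callback readable. Here's a minimal sketch using the ArduinoJson library - an assumption of mine, since the original sources don't rely on it:

+
+ Copy
+
+ #include <ArduinoJson.h>
+ 
+ // Hypothetical helper: extract the STATUS field from an MQTT payload.
+ // Returns an empty String when the payload is not valid JSON.
+ String extractStatus(const byte* payload, unsigned int len) {
+   StaticJsonDocument<256> doc;  // plenty for the small alarm payloads above
+   DeserializationError err = deserializeJson(doc, (const char*)payload, len);
+   if (err) {
+     Serial.print("JSON parse failed: ");
+     Serial.println(err.c_str());
+     return String("");
+   }
+   const char* status = doc["STATUS"];  // e.g. "Detection Point", "DHT11"
+   return status ? String(status) : String("");
+ }
+

The chain of full-string comparisons in DisplayCallback would then shrink to a few checks on the value returned by extractStatus(payload, len).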

+ +
+
+ + + +
+
+ +

Pipelines setup

+

Check tampering with Sanitizers
+ Here's the first pipeline, using the interactive designer included with Stream Analytics.

+

dev-promima-safe-chapter3-9

+
+ + +

Translating the visual pipeline into words, here's what it does:

+ + + +

Any alarm results in a message sent to the OCI Function mqtt_pub, which in turn routes the message to the Mosquitto-Cloud instance; from there it is eventually sent back to the edge via MQTT bridging, on the cloud/alarm topic.
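While testing the pipeline, a quick way to watch that last hop from the edge side is to subscribe on the Raspberry Pi to the bridged topics (the host is a placeholder; the CLI tools were installed in the previous chapter):

+
+ Copy
+
+mosquitto_sub -v -h [your RPi IP address] -t 'cloud/#'
+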

+

The message will be shown on the Badgy and on the Billboard as well, since these devices are listening on + this specific topic:

+ +

dev-promima-safe-chapter3-10

+ +

The Beeper can also be triggered with an audio alert. Should you be interested in the audio file played on the alert event, I cannot publish it: it's copyrighted (Jimi Hendrix's "Purple Haze").

+ + + +
+ + +
+
+ +

Detection Points & Environment alarms

+ + +

Here's the detection points pipeline, which can be triggered by pushing the Burst button on the Event Generator (the red box):

+ +

dev-promima-safe-chapter3-11

+ +

Stunningly similar to the previous one, here are the inner workings:

+ + + + + +

In this case we should light up a red traffic light where the detection point is located, in order to discourage (or deny) access to that gate, corridor, or location. Since all of the edge components are miniaturized, we'll just display a red alert within our Alarm Detector M5 box:

+ +

dev-promima-safe-chapter3-12

+ +

The Environment alarm pipeline can be designed with the same steps, analyzing temperature and humidity changes to detect an upward trend, and clearing the alarm when environmental values are back within an acceptable range. Let's follow the pipeline construction steps in detail:

+ +

From the Catalog Menu, let's create a new Pipeline selecting our Stream.

+

dev-promima-safe-chapter3-13

+

dev-promima-safe-chapter3-14

+ + + +

A basic canvas should be displayed with the name of the Stream involved. Then, after a right click on the Stream icon, we are presented with a menu to add a Stage: selecting the Query stage, we'll make sure to process only the environment data coming from the DHT11 sensor.

+ +

dev-promima-safe-chapter3-15

+ +

This query on the data in motion will select messages coming from our environmental sensor: select the Filters tab in the upper right menu and insert a simple filter condition, such as:

+

dev-promima-safe-chapter3-16

+ +

It's a simple and intuitive mechanism that can be applied anywhere in GoldenGate Stream Analytics, and it makes building pipelines easy! Even without committing the pipeline to production mode, you can see the effects immediately, in real time: note that only the DHT11 events are displayed on the Live Output pane.

+ +

dev-promima-safe-chapter3-17

+ +

We can proceed with the pipeline build by adding an up-trend pattern, selecting temperature or humidity as the Tracking Value, and inserting a call to the OCI Function, which will return an alarm to the edge if the conditions are met.

+

dev-promima-safe-chapter3-18

+ +

Whenever we feel that the pipeline is ready (and we can test any part of it interactively while designing it), we can commit the pipeline (note that the target elements are activated only when the pipeline is in the Published state).

+
+
+ + + + +
+
+ +

Epilogue

+

Setting up a personal development laboratory with the help of a few development boards, sensors and some OCI resources has never been easier (or more fun!). Every test has been made connecting the edge elements to a 4G-enabled wireless router that I acquired some time ago (well, I don't have full 4G coverage where I live, it's a fringe signal zone), so the connection to Oracle Cloud Infrastructure is just another variable in evaluating the lab performance, which - anyway - is frankly good.

+

Once the OCI function has been warmed up (by means of test messages sent at irregular intervals), the round trip between the anomaly induced in the edge lab and the alarm returned and processed at the edge is almost instantaneous. To shed some light on what could be improved in this base configuration, let's examine the following considerations:

+ + + + +

I believe we could reuse the concept, maybe with different sensors, for a variety of use cases involving actual industrial sensors, geofences and other goodies.

+ +

Thanks for your patience, and see you on Zoom.

+ +

Zip and Zest!

+
+
+ + +
+
+ +
+ +
+ +
+ +

About the Author

+ + +

Gabriele Provinciali works as Solution Architect in Rome, Italy - passionate about everything that is + tinkerable!

+
+
+
+ \ No newline at end of file diff --git a/Articles/proximasafe/128-proximasafe-part-2.html b/Articles/proximasafe/128-proximasafe-part-2.html index e69de29..cce0095 100644 --- a/Articles/proximasafe/128-proximasafe-part-2.html +++ b/Articles/proximasafe/128-proximasafe-part-2.html @@ -0,0 +1,758 @@ +
+
+ +

I set a course just east of Lyra
+ And northwest of Pegasus
+ Flew into the light of Deneb
+ Sailed across the Milky Way
+

+

—Neil Peart, (1977) +

+ + +
+
+ +
+
+ +

A quick recap

+ + + + + +

In the previous article we showed the ProximaSafe scope, the overall architecture and the components needed to achieve our goal: get the stream flow coming from a given edge environment into OCI, analyze the stream to detect possible anomalies, and send the errors back to the edge in order to carry out corrective actions. All this with development boards commercially available (almost) anywhere and easy to pack and transport.

+

Now it is time to have fun fiddling with sensors and OCI Functions, covering a number of areas such as:

+ + + + + +

That said, without further ado let's dive into some practical aspects of the matter.

+ +
+
+ + + + + +
+
+

Selecting the edge components

+ +

During the spring of 2020 (and the related lockdown) I fell - almost immediately - in love with the M5Stack development board series, based on the ESP32 microcontroller. These cute little boxes have an integrated display, which - sometimes - is useful for building simple and intuitive on-board GUIs (not my case, I'll always be an ASCII fanboy) or for debugging and showing message contents without bothering to open a serial terminal from the Arduino IDE. Furthermore, a sumptuous choice of different programming models, IDEs and languages is available:

+ + + + + +

Needless to say, I'll go for the first choice. I clearly remember the time when IDEs didn't exist (yes, I'm that old) and all you got from a compile-link-run session was a disturbing message that read "segmentation fault (core dump)". We now have modern and productive environments, and - overall - choice, so pick your environment of choice and follow the rest of this article as a reference.

+ + +

dev-promima-safe-chapter2-1

+ +

In addition to the ESP32 family, we'll use an ESP8266-based smart badge that will act as a wearable + device.

+

And, of course, we can't help but use the ubiquitous Raspberry Pi - which year after year gets specs almost on par with its bigger cousins - to act as the physical and logical link between the edge and the Cloud environments. This pocketable Linux device will be crucial in bridging the local MQTT instance to the OCI Cloud instance described and set up in the previous chapter.

+ +

dev-promima-safe-chapter2-2

+ +
+
+ + + +
+
+

The Raspberry Side: MQTT Bridging

+ +

Installing Mosquitto and the related CLI utilities on a Pi is straightforward: issue the commands sudo apt install mosquitto and sudo apt install mosquitto-clients. Once started, you can check the status by issuing the command systemctl status mosquitto, which should print something like:

+ + +
+
+
Copy
+
+ +
+
+ Loaded: loaded (/lib/systemd/system/mosquitto.service; enabled; vendor
+preset: enabled)
+ Active: active (running) since Tue 2021-03-30 17:22:35 CEST; 19h ago
+ Docs: man:mosquitto.conf(5)
+ man:mosquitto(8)
+ Main PID: 635 (mosquitto)
+ Tasks: 1 (limit: 4915)
+ CGroup: /system.slice/mosquitto.service
+ └─635 /usr/sbin/mosquitto -c /etc/mosquitto/mosquitto.conf
+...
+
+
+ +

and proceed to modify the /etc/mosquitto/conf.d/mosquitto.conf file to configure the bridging mechanism. Most of the default parameters are just fine (unless you want to set up an encrypted connection between the microcontrollers and the edge instance). In our case we'll just configure the bridge, using our favorite editor of choice, even if your favorite search engine suggests otherwise(!):

+ +

dev-promima-safe-chapter2-3

+ +

and reaching the Bridges section:

+ + +
+
+
Copy
+
+ +
+
+# =================================================================
+# Bridges
+# =================================================================
+
+# A bridge is a way of connecting multiple MQTT brokers together.
+# Create a new bridge using the "connection" option as described below. Set
+# options for the bridges using the remaining parameters. You must specify the
+# address and at least one topic to subscribe to.
+
+
+ +

we can add the following parameters:

+ + +
+
+
Copy
+
+ +
+
+connection proxima
+address [host:port]
+topic # out 0 "" edge/
+topic alarm in 0 cloud/ edge/
+
+
+
+ + + + + +

Where host and port are the public IP address and port of the OCI instance we configured in the first episode, and the two topic lines indicate that every message published locally is forwarded out to the cloud broker under the edge/ prefix, while alarm messages arriving from the cloud broker on edge/alarm are republished locally as cloud/alarm.

+ + + Sure enough, we also need to setup the certificate based SSL/TLS support, so reach for the section + regarding security and complete it with: + + +
+
+
Copy
+
+ +
+
+# -----------------------------------------------------------------
+# Certificate based SSL/TLS support
+# -----------------------------------------------------------------
+# Either bridge_cafile or bridge_capath must be defined to enable TLS support
+# for this bridge.
+# bridge_cafile defines the path to a file containing the
+# Certificate Authority certificates that have signed the remote broker
+# certificate.
+# bridge_capath defines a directory that will be searched for files containing
+# the CA certificates. For bridge_capath to work correctly, the certificate
+# files must have ".crt" as the file ending and you must run "openssl rehash
+# [path to capath]" each time you add/remove a certificate.
+# bridge_capath
+bridge_cafile /etc/mosquitto/certs/ca.crt
+
+# Path to the PEM encoded client certificate, if required by the remote broker.
+bridge_certfile /etc/mosquitto/certs/server.crt
+
+# Path to the PEM encoded client private key, if required by the remote broker.
+bridge_keyfile /etc/mosquitto/certs/server.key
+
+# When using certificate based encryption, bridge_insecure disables
+# verification of the server hostname in the server certificate. This can be
+# useful when testing initial server configurations, but makes it possible for
+# a malicious third party to impersonate your server through DNS spoofing, for
+# example. Use this option in testing only. If you need to resort to using this
+# option in a production environment, your setup is at fault and there is no
+# point using encryption.
+bridge_insecure true
+
+
+ + +

thus we create a certs directory under /etc/mosquitto and copy there the ca.crt, server.crt and server.key files we generated during the first episode, in the section Secure the MQTT Server running on OCI Compute.

+ + +

This is easy to test. Issue a listening command on the Cloud instance in a shell, as shown in the first episode:

+ + +
+
+
Copy
+
+ +
+
+mosquitto_sub -d -t '#' -h [your host] -u [username] -P [password] -p [port] --insecure --cafile certs/ca.crt --cert certs/server.crt --key certs/server.key
+
+
+ + +

and sending a message to the local Raspberry Pi

+ + +
+
+
Copy
+
+ +
+
+mosquitto_pub -h [your RPi IP address] -t testtopic -m 'Sympathetic resonance'
+
+
+ +

we should receive on the Cloud Mosquitto shell the message: +

+ +
+
+
Copy
+
+ +
+
+Client (null) received PUBLISH (d0, q0, r0, m0, 'edge/testtopic', ... (21
+bytes))
+Sympathetic resonance
+
+
+ +

showing that the two thingies are effectively talking to each other - albeit in a single direction, for now.

+

The pipelines we'll design in Stream Analytics will provide the logic to test the bidirectional dialogue. + And, + now, let's have some healthy fun with sensors!

+ + + +
+
+ + + +
+
+

Edge Programming

+

The goal is to build an edge that can easily fit into a small briefcase, and - certainly - an ESP32-based kit will help save space, time, and power. Let's consider a setup that includes some edge emitters (MQTT publishers), some receivers (MQTT subscribers) and the Gateway:

+ + +

dev-promima-safe-chapter2-4

+
+
+ +
+
+ +

Publishers

+ + + +

You can find all the sources I've used at this link (NOTE: insert the GitHub link, open in a new window/tab). +

+
+
+ +
+
+

Subscribers

+ + + + + +

Both the publisher and the subscriber will use the PubSubClient API. Specifically, the Publishers will send + messages to the local MQTT server via the publish method: +

+ +
+
+
Copy
+
+ +
+
+Result = mqttClient.publish(MACHINE_TOPIC, msg, true);
+ M5.Lcd.setCursor(10, 60);
+ if (Result)
+ M5.Lcd.println("Sent.");
+ else
+ M5.Lcd.println("Not sent.");
+
+
+ + +

while Subscribers will initialize the callback in the setup() portion of the code (executed only once at + startup): +

+ +
+
+
Copy
+
+ +
+
+configTime(gmtOffset_sec, daylightOffset_sec, ntpServer);
+ timestamp = getTime();
+ if (timestamp > 0)
+ noTime = false;
+ mqttClient.subscribe(TOPIC);
+ Serial.println("Subscribed!");
+ mqttClient.setCallback(DisplayCallback);
+ delay(100);
+
+
+ + +

and upon reception of new messages (in our case, from OCI), processing will occur in the callback:

+ +
+
+
Copy
+
+ +
+
+void DisplayCallback(char* topic, byte* payload, unsigned int len)
+{
+ // Process message
+ // Serial.println((String)topic);
+}
+
+
+ +

Programming these gizmos is fun and it's a very effective means of spreading the culture of programming among students of all levels (including myself). Plus, there are plenty of examples available on the Web.

+

Still, we need to design a way to return alarm messages from Stream Analytics to the edge, using the MQTT + Bridge feature we set up not too long ago.

+

As described in the previous episode, our approach will be as follows:

+ +

dev-promima-safe-chapter2-5

+ + +

thus we (thankfully) need to tinker with Oracle Functions.

+ +
+
+ + + +
+ +
+

Serverless Time!

+

Fn Project is a cool open source serverless platform, launched in 2017 and able to scale from micro-devices to mega-installations, which later evolved into an industrial-strength OCI service called Oracle Functions.

+ + +

Developing a function in OCI requires either:

+ + + + Either way, you'll be good to go with the function deployment in OCI. + We will use a Custom Dockerfile to build our image in Python, such as the following: + + +
+
+
Copy
+
+ +
+
+FROM fnproject/python:3.6-dev as build-stage
+WORKDIR /function
+ADD requirements.txt /function/
+RUN pip3 install --target /python/ --no-cache --no-cache-dir -r requirements.txt && \
+    rm -fr ~/.cache/pip /tmp* requirements.txt func.yaml Dockerfile .venv
+ADD . /function/
+RUN rm -fr /function/.pip_cache
+FROM fnproject/python:3.6
+WORKDIR /function
+COPY --from=build-stage /python /python
+COPY --from=build-stage /function /function
+COPY certs /function
+ENV PYTHONPATH=/function:/python
+ENTRYPOINT ["/python/bin/fdk", "/function/func.py", "handler"]
+
+
+ +

specifying the Python requirements in the requirements.txt file, as we're going to use the Paho library:

+ +
+
+
Copy
+
+ +
+
+fdk
+paho-mqtt
+
+
+ +

and write some code to complete the round trip, copying the certs folder and the files used to access the MQTT Server on OCI into the function directory. Please find the Dockerfile and the code at this address (NOTE: insert the GitHub link, open in a new window/tab).
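Just to give an idea of its shape before you open the repository, here's a minimal sketch of such a function - not the actual code: the configuration names (MQTT_USER, MQTT_PASSWORD, MQTT_TOPIC), broker host and port are placeholders of my own - which simply forwards the request body to the cloud Mosquitto over TLS with Paho:

+
+ Copy
+
+import io
+import ssl
+
+import paho.mqtt.client as mqtt
+from fdk import response
+
+
+def handler(ctx, data: io.BytesIO = None):
+    # Values set in the function Configuration (names are assumptions)
+    cfg = dict(ctx.Config())
+    user = cfg["MQTT_USER"]
+    password = cfg["MQTT_PASSWORD"]
+    topic = cfg.get("MQTT_TOPIC", "edge/alarm")
+
+    payload = data.getvalue() if data else b"{}"
+
+    client = mqtt.Client()
+    client.username_pw_set(user, password)
+    # Certificates copied into the image by the Dockerfile (COPY certs /function)
+    client.tls_set(ca_certs="/function/ca.crt",
+                   certfile="/function/server.crt",
+                   keyfile="/function/server.key",
+                   tls_version=ssl.PROTOCOL_TLSv1_2)
+    client.tls_insecure_set(True)  # self-signed certificates, as in the bridge setup
+
+    client.connect("[Mosquitto-Cloud host]", 8883)  # placeholder host and port
+    client.loop_start()
+    client.publish(topic, payload, qos=0).wait_for_publish()
+    client.loop_stop()
+    client.disconnect()
+
+    return response.Response(ctx,
+                             response_data='{"published": true}',
+                             headers={"Content-Type": "application/json"})
+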

+

Oracle Functions (as Fn Project) requires the function to be installed in + an artifact called Application, a + logical grouping of functions, which can be created via the fn CLI (specifying the OCI subnets) or in the + OCI + Web console following the path Home » Developer Services » Functions: +

+ + +

dev-promima-safe-chapter2-6

+ + +

Once the application is created, we can deploy the function (this time we'll leverage the good-ole CLI) using +

+ + +
+
+
Copy
+
+ +
+
+fn build
+fn deploy --app [app name]
+
+
+ + + + + +

where you can see some familiar Docker (layer-related) output messages and the result of deployment. +

+ +
+
+
Copy
+
+ +
+
+Building image fra.ocir.io/emeaseitalyproxima/gabba-repository/mqtt_pub:0.0.2 .
+Parts: [fra.ocir.io emeaseitalyproxima gabba-repository mqtt_pub:0.0.2]
+Pushing fra.ocir.io/emeaseitalyproxima/gabba-repository/mqtt_pub:0.0.2 to
+docker registry...The push refers to repository
+[fra.ocir.io/emeaseitalyproxima/gabba-repository/mqtt_pub]
+77ff3ee9cb37: Pushed
+3353efa4559: Pushed 
+0f6cdd7e71a8: Layer already exists 
+3697bae2d860: Layer already exists 
+0b66d6c41076: Layer already exists 
+85e1ba76ed69: Layer already exists 
+6881daa7bad0: Layer already exists 
+7352730c981f: Layer already exists 
+9d95bea46bad: Layer already exists 
+b84a8d46e8fb: Layer already exists 
+f66ed577df6e: Layer already exists 
+0.0.2: digest:
+sha256:e82a0abc009c0a132fc6c3c35fc8d88f516589b35a96907c41e41a350619872d
+size: 2626
+Updating function mqtt_pub using image
+fra.ocir.io/emeaseitalyproxima/gabba-repository/mqtt_pub:0.0.2...
+
+
+
+
+ + +

The status of the function will be reflected in the OCI Web Console as well as in the CLI, by issuing the command fn list functions:

+ + +
+
+
Copy
+
+ +
+
+NAME       IMAGE                                                             ID
+mqtt_pub   fra.ocir.io/emeaseitalyproxima/gabba-repository/mqtt_pub:0.0.2    ocid1.fnfunc.oc1.eu-frankfurt-1.aaaaaaaaabbknysfrffi2olayuzykycv5boop72qi75k5aqgjwjfjdlycutq
+
+
+
+
+
+ +

The function must be provided with the three input parameters that can be set on the OCI Web Console in + the Configuration submenu: +

+ + +

dev-promima-safe-chapter2-7

+ + +

specifying your Mosquitto username, password and the alarm topic edge/alarm. Note those parameters, as we'll use them to perform a smoke test (see the example below)!
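A quick way to run that smoke test from the CLI, while the mosquitto_sub listener from the bridging test watches the cloud/alarm topic on the Raspberry Pi (assuming the function forwards the request body, as in the sketch above):

+
+ Copy
+
+echo -n '{"STATUS":"Detection Point"}' | fn invoke [app name] mqtt_pub
+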

+ +
+
+ +
+
+ +

Creating the API Gateway and an API deployment

+ +

The mechanisms to expose and consume APIs in Oracle Cloud Infrastructure are accessible in the Main + menu » Developer Services » API Management section of OCI Web Console. We'll create an API + Gateway + first, and then an API deployment specifying the Oracle Function we created previously. Creating an API + Gateway involves specifying:

+ + + + +

dev-promima-safe-chapter2-8

+ +

Hitting the blue Create button starts the magic, and creation is quick. Then, we can proceed + to shape our + API deployment by clicking the link named "Deployments" in the bottom left Resources section and fire the + Create Deployment procedure, which consists of three stages: +

+ + +
+
+ + + +
+
+ +
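Once the deployment is active, the route can be exercised directly over HTTPS - the hostname and paths below are placeholders from your own gateway and deployment - which is essentially what the Stream Analytics target will do in the next episode:

+
+ Copy
+
+curl -X POST -d '{"STATUS":"Detection Point"}' https://[gateway hostname]/[deployment prefix]/[route path]
+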

Next Episode - Use cases and Stream Analytics + pipelines

+ +

Going further, we'll need to design some simple example use cases and develop some pipelines within Stream Analytics to close the loop and test our setup.

+ + +

See you in the next chapter: Sensors, Pipelines and back to Edge.

+

Zip and Zest!

+ + +
+
+ + +
+
+ +
+ +
+ +
+ +

About the Author

+ + +

Gabriele Provinciali works as Solution Architect in Rome, Italy - passionate about everything that is + tinkerable!

+
+
+
+ \ No newline at end of file diff --git a/Articles/solutions/106-deploy-weblogic-cloud-app-post-reg.html b/Articles/solutions/106-deploy-weblogic-cloud-app-post-reg.html index e69de29..3bd48ea 100644 --- a/Articles/solutions/106-deploy-weblogic-cloud-app-post-reg.html +++ b/Articles/solutions/106-deploy-weblogic-cloud-app-post-reg.html @@ -0,0 +1,950 @@ +
+
+

Thank you! Follow the steps below to start the hands-on lab.

+
+ +
+
If you do not already have an Oracle Cloud account:
+ +
+ +
+ + + +

 

+

Once you open the link you will be taken to the sign-up page. After you create your account, come + back to this page to start this hands-on lab.

+ +
+ + +

Lab 1: Prepare your tenancy for the lab

+
+
+ +
+
+ +

Step 1: Create an OCI Compartment for the lab

+

When provisioning WebLogic for OCI through Marketplace, you need to specify an OCI Compartment where all + resources will be created.

+ +

Make sure you have a Compartment that you can use or create a new one.

+ +

Take note of the compartment OCID:

+ +

+ +

The Compartment name is referred to as CTDOKE in the Hands on Lab.

+ +
+
+ + + + + +
+
+

Step 2: Provision ATP Database

+

When deploying a JRF enabled WebLogic domain, a database repository is required. We will use an Autonomous Transaction Processing (ATP) database. Since in the next part we'll deploy a sample ADF application that requires a database table and some records, we should create the DB schema in advance.

+ +

(1) Go to Oracle Database > Autonomous Transaction Processing:

+ +

+ + +

(2) Choose to create a new Autonomous Database:

+ + +

+ + +

(3) Give it a meaningful name, for example WLSATPDB; Keep default workload type Transaction Processing:

+ +

+ + + +

(4) Scroll down and keep default setting for:

+ +

+ +

Deployment type: Shared Infrastructure
+ Database version: 19c
+ OCPU count: 1
+ Storage (TB): 1
+ Auto scaling: Enabled +

+ + + +

(5) Next set up a password for the ADMIN user: it must be 12 to 30 characters and contain at least one uppercase letter, one lowercase letter, and one number. The password cannot contain the double quote (") character or the username "admin".

+ +

+ +

Keep default setting to Allow secure access from everywhere; this will provision ATP database with public + endpoints (access can still be restricted by allowing incoming traffic from trusted IP addresses or + whitelisted Virtual Cloud Networks):

+ + + + + +

(6) For the last step choose License included for license type and click on Create Autonomous Database:
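For reference, the same provisioning can be scripted with the OCI CLI; a rough equivalent of the choices above might look like the following sketch (the compartment OCID and admin password are placeholders):

+
+ Copy
+
+oci db autonomous-database create \
+  --compartment-id [compartment OCID] \
+  --db-name WLSATPDB \
+  --display-name WLSATPDB \
+  --db-workload OLTP \
+  --cpu-core-count 1 \
+  --data-storage-size-in-tbs 1 \
+  --is-auto-scaling-enabled true \
+  --license-model LICENSE_INCLUDED \
+  --admin-password '[ADMIN password]'
+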

+ + +

+ + +

(7) The provisioning process will start:

+ +

+ + + + +

(8) After a few minutes the Database should be available:

+ +

+ + + + + +
+
+ + + + + + +
+
+ +

Step 3: Prepare DB Objects

+ +

Once the ATP database is available, we can use the SQL Developer Web tool to create a DB schema and the tables and records needed in the next part.

+ +

(1) Go to Service Console:

+ +

+ +

(2) From Development submenu open SQL Developer Web:

+ + +

+ + + +

(3) This will open in a new tab the SQL Developer Web Login screen. Use ADMIN and the password setup when + provisioning the ATP Database:

+ + +

+ + +

(4) Once logged in, you can follow a guided tour to discover the main user interface features:

+ +

+ + + +

(5) Once ready, copy and paste the contents of this sql file into the Worksheet window:

+ +

+ + + +

(6) Execute the script by clicking the Run script play button. All statements should execute with success: +
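The authoritative script is the one linked in the previous step; purely as an illustration of the kind of objects it creates, here is a hypothetical sketch based on the ADFAPP schema and the Welcome1234# password referenced later in this lab (the table name and columns are made up):

+
+ Copy
+
+-- Hypothetical illustration only, not the lab's actual script
+CREATE USER ADFAPP IDENTIFIED BY "Welcome1234#";
+GRANT CREATE SESSION, CREATE TABLE TO ADFAPP;
+ALTER USER ADFAPP QUOTA UNLIMITED ON DATA;
+
+CREATE TABLE ADFAPP.CALENDAR_EVENTS (
+  ID         NUMBER GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
+  TITLE      VARCHAR2(200),
+  START_DATE DATE,
+  END_DATE   DATE
+);
+
+INSERT INTO ADFAPP.CALENDAR_EVENTS (TITLE, START_DATE, END_DATE)
+  VALUES ('Sample entry', SYSDATE, SYSDATE + 1);
+COMMIT;
+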

+

+ +
+
+ + + +
+
+ +

Step 4: Create OCI Secrets

+

When you provision a WebLogic instance you need to specify the WebLogic Admin password and the Database Admin + Password. For security reasons, passing these passwords in a readable format into the instance creation + script is a bad idea. Oracle Cloud provides a solution for this problem via the OCI Vault.

+ +

A vault will contain encryption keys that are used to encrypt and decrypt secret content, and also the + secrets that contain the actual content you want to secure - in this case the admin passwords of the + database and the WebLogic instance. +

+

1. Create a Vault

+

(1) Go to Governance and Administration > Security > Vault

+ +

+ + +

(2) Create a new Shared Vault (leave the Make it a Virtual Private Vault option unchecked):

+ +

+ + + + +

(3) The new Vault should be listed as Active:

+ +

+ + + + +

(4) Take a look at the Vault Information:

+ + +

+ + +

2. Create an Encryption Key

+

(1) Go to Master Encryption Keys submenu of the Vault Information page and create an new Key:

+ +

+ + + +

(2) Give the key a Name and leave the other settings as default:

+ + +

+ + +

(3) The new key should be listed as Enabled:

+ + +

+ + +

3. Create an OCI Secret for the WebLogic Admin password

+

(1) Go to Secrets submenu of the Vault Information page and create a new Secret:

+ +

+ + + +

(2) Set up a name for the OCI Secret; choose the previously created Encryption Key (WLSKey) in the Encryption Key dropdown. If you keep the default value for Secret Type Template (Plain-Text), you have to enter the plain WebLogic Admin password in the Secret Contents area. If you switch to the Base64 template, you need to provide the password pre-encoded in base64.
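If you go for the Base64 template, the encoding can be produced in the Cloud Shell; the value below is only an example password that matches the rules listed next:

+
+ Copy
+
+echo -n 'WlsAtpDemo#2021' | base64
+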

+ +

The password must start with a letter, should be between 8 and 30 characters long, should contain at least + one number, and, optionally, any number of the special characters ($ # _).

+ +

+ + + +

(3) Shortly, the Secret should be listed as Active:

+ + +

+ + +

(4) Click on the Secret name and take note of its OCID. We need to provide this value in the WebLogic for OCI + Stack configuration form:

+ + +

+ + +

4. Create an OCI Secret for the Database Admin password

+

In the same way as in previous step, create a new OCI secret for your ATP Admin user Password. Instead of the + WebLogic Admin password, pass the ADMIN password created during ATP Instance provisioning. Give the Secret a + name, for example ATPDBSecret.

+ +

+ + +

Click on the new Secret name (ATPDBSecret) and take note of its OCID. We need to provide this value in the + WebLogic for OCI Stack configuration form.

+ +

5. Create an OCI Secret for the Sample Application Schema password

+

We need to create one more OCI secret, for the Sample Application Schema password. As we'll see in the next + part, when creating the WebLogic Stack, we have an option to create in advance an Application Datasource on + WebLogic Domain. To securely pass the Schema password, we need to create an OCI secret.

+ +

+ +

In the same way as in previous step, create a new OCI secret for the Sample Application Schema (ADFAPP). Give + the Secret a name, for example ADFAppSecret. Setup the Welcome1234# password (or a custom password if you + have changed the default password setup in the SQL script executed earlier).

+ + + + + + +

Click on the new Secret name (ADFAppSecret) and take note of its OCID. We need to provide this value in the + WebLogic for OCI Stack configuration form.

+ + + +
+
+ +
+
+ +

Step 5: Create ssh keys

+ +

You need to generate a public and private ssh key pair. During provisioning using Marketplace, you have to + specify the ssh public key that will be associated with each of the WebLogic VM nodes.

+ +

We will be using the Cloud Shell to generate the keys in this tutorial.

+ + +

(1) Open the Cloud Shell by clicking on its icon in the Cloud Console:

+ +

+ + +

(2) Create a directory to contain your keys

+ +

mkdir keys
+ cd keys

+ + +

(3) Now create your key set:

+ +

ssh-keygen -t rsa -b 4096 -f weblogic_ssh_key

+ + +

This will create the weblogic_ssh_key file containing the private key. The public key will be saved at the same location, with the .pub extension added to the filename: weblogic_ssh_key.pub.

+ +
+
+
Copy
+
+ +
+    
+    Generating public/private rsa key pair.
+    Enter passphrase (empty for no passphrase): 
+    Enter same passphrase again: 
+    Your identification has been saved in weblogic_ssh_key.
+    Your public key has been saved in weblogic_ssh_key.pub.
+    The key fingerprint is:
+    SHA256:jnmUBEH3HnwxcibOvcpPLi5/c1p55PoE7LNLHRmijRI jan_leeman@5e83aaf6d012
+    The key's randomart image is:
+    
+    +---[RSA 4096]----+
+    |     .+.. o =    |
+    |       o = * o   |
+    |        .E* o. . |
+    |       . o.o+o. o|
+    |        S..o..oo.|
+    |       = ... . *.|
+    |      o o o . * =|
+    |       .. .+oo.* |
+    |         +oo+++o.|
+    +----[SHA256]-----+
+    
+    
+
+ + +

You should be prepared now to run the Hands on Lab on your own cloud environment.

+ + + + +
+
+ + + +
+
+

Lab 2: Create WebLogic for OCI Stack

+ +

1. After logging in, go to Main Menu, Solutions and Platform -> Marketplace:

+ +

+ + +

2. You can search the Marketplace for WebLogic Server, or you can apply the filters:

+ +

Type: Stack
+ Publisher: Oracle
+ Category: Application Development +

+ +

+ + +

3. Choose WebLogic Server Enterprise Edition UCM; This brings you to the Stack Overview page:

+ +

+ + + +

4. Make sure the CTDOKE compartment is selected and accept the Oracle Terms of Use:

+ +

+ + +

5. Fill in stack information

+ +

Name: WLSNN
+ Description: Something you'll remember +

+ +

Click Next

+ +

+ + +

6. Start to fill in details:

+ +

Resource Name Prefix: WLSNN - where NN your unique suffix
+ + WebLogic Server Shape: VM.Standard2.1
+ + SSH Public Key: copy and paste the content from the provided weblogicsshkey.pub file; it + contains the public key in RSA format; be sure to include the whole content in one line, including ssh-rsa + part at the beginning. +

+ +

Note: if you have used the Cloud Shell to generate the weblogic ssh key, you can use the cat command to + display its contents:

+ +

cat weblogic_ssh_key.pub

+ +

+ + + +

7. On Windows, use Ctrl+INSERT to copy the highlighted area as in the above example:

+ +

On Mac, you can simply use command+c

+ + +

+ + +

8. Continue setting up:

+ +

WebLogic Server Node count: 2 (we will create a WebLogic cluster with two nodes / managed servers)

+ +

Admin user name: weblogic

+ +

Secrets OCID for WebLogic Server Admin Password: Enter the OCID of the WebLogic Admin Secret that was set up earlier for this. If you are using the CTD (Cloud Test Drive) environment, this OCID might be in a document provided by your instructor.

+ +

A bit of context: the WebLogic Server Admin Password is stored in an OCI Vault as an OCI Secret (encrypted + with an OCI Encryption Key); during WebLogic Domain creation, the provisioning scripts will setup the admin + password by getting it from the OCI Secret instead of having it as a Terraform variable; in a similar way - + for JRF enabled domains - the database admin password will be referred from an OCI Secret.
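Under the hood, reading a secret back boils down to a single call like the following sketch (the OCID is a placeholder); the content is returned base64-encoded, hence the final decode:

+
+ Copy
+
+oci secrets secret-bundle get --secret-id [WebLogic Admin Secret OCID] \
+  --query 'data."secret-bundle-content".content' --raw-output | base64 --decode
+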

+ + +

+ +

9. Don't change WebLogic Server Advanced Configuration:

+ +

WebLogic Server Network parameters:

+ +

* Choose Create New VCN
+ + * Choose the same CTDOKE Compartment
+ + * Give a name to the Virtual Cloud Network

+ +

+ + + +

10. For the Subnet Strategy:

+ +

* Create New Subnet
+ * Use Public Subnet
+ * Regional Subnet

+ +

+ +

Note: It is not recommended to use a public subnet strategy in production environments. In this tutorial we are using public subnets for simplicity.

+ +

11. Tick to Enable Access to Administration Console:

+ + + +

+ +

12. Tick to Provision Load Balancer:

+ +

Load Balancer Minimum and Maximum Bandwidth: keep defaults

+ + +

+ + + +

13. Leave Identity Cloud Service Integration unchecked as default (no integration):

+ +

Leave OCI Policies checked, as a Dynamic Group containing the WebLogic Compute nodes will be created + automatically alongside policies for letting them read Secrets from OCI Vault

+ + +

+ + +

14. Check Provision with JRF. In the Database section choose:

+ +

Database Strategy: Autonomous Transaction Processing Database
+ + Autonomous DB System Compartment: CTDOKE (or the compartment name where the ATP database + was provisioned)
+ Autonomous Database name:
WLSATPDB
+ Secrets OCID for Autonomous Database Admin Password: Enter the OCID of the DB Admin Secret that was set up earlier for this. If you are using the CTD (Cloud Test Drive) environment, this OCID might be in a document provided by your instructor
+ Autonomous Database Service level: low (default option) +

+ +

+ + + +

15. Check Configure Application Datasource. We have a quick option to pre-configure the WebLogic Domain with + a ready to use Application Datasource.

+ + +

+ + +

16. Set:

+ +

Application Database Strategy: Autonomous Transaction Processing Database
+ Compartment: CTDOKE (or the compartment name where the ATP database was provisioned)
+ Autonomous Database name: WLSATPDB
+ Autonomous Application Database User Name: ADFAPP
+ Secrets OCID for Autonomous Application Database User Password: Enter the OCID of the Sample Application Schema Secret that was set up earlier for this. If you are using the CTD (Cloud Test Drive) environment, this OCID might be in a document provided by your instructor
+ Autonomous Database Service level: tp +

+

+ + + + +

17. Review the Stack configuration and click Create:

+ + +

+ + +

18. A Stack Job is being kicked off and the WebLogic Domain starts to be provisioned. The Console context + moves to the Stack Details page (Solutions and Platform > Resource Manager > Stacks):

+ +

+ + + +

19. While all the resources are being created we can check the Job Logs; this helps fix potential configuration errors if the provisioning fails:

+ +

+ + + +

20. After approx. 15 minutes, the Job should complete with success:

+ + +

+ + +

21. We can check the Outputs section of Job Resources and see two important values:

+ +

Sample Application URL (we can try it at this moment, but the out of the box sample application won't load as we need to finish the SSL configuration of the Load Balancer)
+ WebLogic Server Administration Console

+ +

+ + +

22. Let's check the WLS admin console of the newly created WebLogic Server; as we have chosen a Public Subnet + for the WLS network, both Compute instances that have been created have public IPs associated. Use the + Console URL provided in the Outputs section as shown above

+ +

Login with weblogic username and the provided password:

+ + + +

+ + + +

23. We can see that our domain has one admin server and two managed servers:

+ +

+ + + +

24. We can check the Compute Instances to see what has been provisioned; From OCI menu choose Core + Infrastructure -> Compute -> Instances:

+ + +

+ + +

25. We can see two instances having our prefix setup during Stack configuration; one of them runs the + WebLogic Admin server and a Managed Server and the other runs the second Managed Server:

+ +

+ + + +

We can now check whether the out-of-the-box deployed application loads. From the Stack Job Outputs, open the Sample Application URL; it loads, but we have to bypass the browser warning, as the Public Load Balancer is configured with a self-signed certificate.

+ +

26. Click Advanced button and Proceed to … to continue:

+ +

+ + + +

27. The out of the box deployed sample application is being served through a secured SSL Load Balancer + Listener:

+ +

+ + + +

Congratulations! Your WebLogic domain is up and running!

+ +
+
+ + + + + + + +
+ +
+

Lab 3: Change Load Balancer Cookie persistence type

+

Before deploying the sample ADF application, we need to change the way Session Persistence is handled by the Public Load Balancer. By default, the Public Load Balancer comes pre-configured to use load balancer cookie persistence. But in our case - or in any ADF application case, actually - as the sample ADF application generates its own cookie (JSESSIONID), we need to instruct the Load Balancer to use application cookie persistence.

+ 1. In the OCI Console navigate to Core Infrastructure -> Networking -> Load Balancers:

+ + +

+ +

2. Identify the Load Balancer created by the WebLogic Stack and click on it (contains the Stack resource name + prefix setup during WebLogic Stack configuration):

+ + +

+

+ + +

3. In the Resources section, click on Backend Sets; click on the backend set:

+ + +

+

+ + +

4. Click on Edit:

+

+

+ + + + +

5. We see that, by default, the Backend Set has Session Persistence enabled using load balancer cookie + persistence:

+ +

+

+ + + +

6. Change to Enable application cookie persistence; Set Cookie Name to JSESSIONID; Click on Update Backend + Set:

+ +

+

+ + + +

7. A Work Request has been created and shortly the Backend Set configuration shall be updated:

+ +

+

+ + + +
+
+ + + + + +
+ +
+

Lab 4: Deploy sample ADF application

+

1. Let's go back to the WebLogic Server admin console:

+ +

+ + +

2. Before we deploy the ADF application, let's have a look at the Application Data Source that has been + created with WebLogic Server; From Domain Structure go to Services -> Data Sources:

+ +

+ + + +

3. The Data Source is named APPDBDataSource. We need to change the JNDI Name, as our sample ADF application looks up the data source via the jdbc/adfappds JNDI name to get database connections. Click on the data source:

+ +

+ + + +

4. To change the JNDI Name we need to Lock the WebLogic Console Session. Click on Lock & Edit in the upper + left corner:

+ + +

+ + +

5. Change JNDI Name to jdbc/adfappds and click Save:

+ +

+ + + +

6. Click on Activate Changes to save and close the WebLogic Console Editing Session:

+ +

+ + + +

7. The change has been recorded, but we need also to restart the Datasource. Click on View changes and + restarts (upper left corner):

+ +

+ + +

8. Switch to Restart Checklist:

+ +

+ + + +

9. Select the AppDBDataSource and click on Restart:

+ +

+ + + +

10. Click Yes:

+ +

+ + + +

11. The Datasource will be restarted shortly:

+ + +

+ + +

12. Now, to deploy the ADF application, go to Domain Structure menu, Deployments; Lock & Edit to switch to + edit mode; Click Install:

+ +

+ + +

13. Follow Upload your files link and upload provided SampleADFApplication.ear enterprise archive file:

+ +

+ + + +

14. Click Next, Next, leave the default option Install this deployment as an application, and click Next:

+ + +

+ + +

15. Choose deploying the application on WebLogic Cluster; click Next:

+ +

+ + + +

16. Leave default settings and click Next:

+ +

+ + + +

17. Choose No, I will review the configuration later and click Finish

+ +

+ + + +

18. Activate Changes:

+ +

+ + + +

19. The application is now in Prepared state; switch to Control tab:

+ + +

+ + +

20. Select the SampleADFApplication enterprise application and click Start -> Serving all requests; Click Yes + in the following screen:

+ +

+ + + +

21. The SampleADFApplication application is in the Active State now:

+ +

+ + + +

22. Now, test this application at https://< public load balancer IP >/sampleADFApplication/

+ +

+ + + +

23. As we can see, calendar entries are coming from the ATP database. Play with the ADF Calendar Component, + for example switch to List view:

+ + +

+ + +

This is just a sample ADF application, but you can deploy any other application. Congratulations!

+
+
\ No newline at end of file