New practice exercise - Perceptron #678

Closed · wants to merge 30 commits

Changes from 6 commits

Commits (30)
3c62326
New practice exercise - Perceptron
depial Oct 3, 2023
7b291e5
Merge remote-tracking branch 'origin/main' into depial-main
cmcaine Jan 30, 2024
053cb63
Update runtests.jl
depial Jan 31, 2024
c821340
Update runtests.jl
depial Jan 31, 2024
a4bc759
Update runtests.jl
depial Jan 31, 2024
185e88c
Delete exercises/practice/perceptron/testtools.jl
depial Jan 31, 2024
475efe3
Update config.json
depial Feb 4, 2024
4a24b2c
Update tests.toml
depial Feb 4, 2024
e4760e2
Update instructions.md
depial Feb 4, 2024
39ba0fa
Update instructions.md
depial Feb 16, 2024
735743f
Update instructions.md
depial Feb 16, 2024
121e5fa
Update exercises/practice/perceptron/perceptron.jl
depial Feb 16, 2024
dd96c6c
Update exercises/practice/perceptron/.meta/example.jl
depial Feb 16, 2024
f7d054f
Merge branch 'exercism:main' into main
depial Feb 17, 2024
7bcaf89
Update example.jl
depial Feb 17, 2024
4ac9805
Update config.json
depial Feb 17, 2024
c8d226b
Update example.jl
depial Feb 22, 2024
fe7eb26
Update runtests.jl
depial Feb 22, 2024
bed28be
Update tests.toml
depial Feb 22, 2024
9d9ce4b
Update config.json
depial Feb 22, 2024
76f21f2
Create introduction.md
depial Feb 22, 2024
ff9ce68
Update instructions.md
depial Feb 22, 2024
9c8193d
Update runtests.jl
depial Feb 23, 2024
5ac2eec
Update runtests.jl
depial Feb 23, 2024
7ec87e8
Update runtests.jl
depial Feb 23, 2024
0af5de7
Merge branch 'exercism:main' into main
depial Mar 12, 2024
233f618
Adding new practice exercise Binary Search Tree
depial Mar 12, 2024
f444359
config changes
depial Mar 12, 2024
d963209
Merge branch 'main' of https://github.com/depial/julia
depial Mar 12, 2024
c947630
Update config.json
depial Mar 12, 2024
15 changes: 15 additions & 0 deletions config.json
@@ -873,6 +873,21 @@
        "practices": [],
        "prerequisites": [],
        "difficulty": 2
      },
      {
        "uuid": "b43a938a-7bd2-4fe4-b16c-731e2e25e747",
        "practices": [],
        "prerequisites": [],
        "slug": "perceptron",
        "name": "Perceptron",
        "difficulty": 3,
        "topics": [
          "machine learning",
          "loops",
          "arrays",
          "logic",
          "math"
        ]
      }
    ]
  },
34 changes: 34 additions & 0 deletions exercises/practice/perceptron/.docs/instructions.md
@@ -0,0 +1,34 @@
# Instructions

### Introduction
[Perceptron](https://en.wikipedia.org/wiki/Perceptron) is one of the oldest and best-named machine learning algorithms out there. Since perceptron is also quite simple to implement, it's a favorite place to start a machine learning journey. As a linear classifier, if a linear decision boundary (e.g. a line in 2D, or a hyperplane in general) can be drawn to separate two labeled classes of objects, perceptron is guaranteed to find one. The resulting boundary can then be used to predict which class an unlabeled object likely belongs to, based on which side of the boundary it falls.

### Details
The basic idea is fairly straightforward. We cycle through the objects and check if they are on the correct side of our hyperplane. If one is not, we make a correction to the hyperplane and continue checking the objects against the new hyperplane. Eventually the hyperplane is adjusted to correctly separate all the objects and we have our decision boundary!

#### A Brief Word on Hyperplanes
How do you pick your starting hyperplane? It's up to you! Be creative! Or not... Actually, perceptron's convergence time is sensitive to conditions such as the initial hyperplane and even the order in which the objects are looped through, so you might not want to go too wild.

We will be dealing with a two-dimensional space, so our divider will be a line. The standard equation for a line is usually written as $y = ax + b$, where $a, b \in \mathbb{R}$. However, to help generalize the idea to higher dimensions, it's convenient to reformulate this equation as $w_0 + w_1 x + w_2 y = 0$. This is the form of the [hyperplane](https://en.wikipedia.org/wiki/Hyperplane) we will be using, so your output should be $[w_0, w_1, w_2]$. In machine learning, $w_0, w_1, w_2$ are usually referred to as weights.
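
For instance, the line $y = 2x + 1$ can be rewritten as $-1 - 2x + y = 0$, giving weights $[w_0, w_1, w_2] = [-1, -2, 1]$ (any nonzero scalar multiple of these weights describes the same line). A quick, purely illustrative check in Julia, where a point $[x, y]$ is prepended with a $1$ so the left-hand side becomes a single dot product:

```julia
w = [-1, -2, 1]      # weights for the line y = 2x + 1

w' * [1, 3, 7]       # 0 -> the point [3, 7] lies on the line (7 == 2*3 + 1)
w' * [1, 3, 10]      # 3 -> the point [3, 10] lies on the positive side of the line
```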

While hyperplanes are equivalent under scalar multiplication, there is a difference between $[w_0, w_1, w_2]$ and $[-w_0, -w_1, -w_2]$: their normals point in opposite directions. By convention, the perceptron's normal points towards the class defined as positive, so this property will be checked, but a flipped normal will not result in a test failure.

Contributor (cmcaine):
I think this should maybe be a test failure. It would simplify the messaging for the student in the instructions and from the tests. Do you think that would be okay?

Contributor (cmcaine):
Also, negative numbers are scalars (at least to me?), so I would probably drop the first part of the first sentence.

Contributor Author (depial):
> I think this should maybe be a test failure. It would simplify the messaging for the student in the instructions and from the tests. Do you think that would be okay?

I didn't want to strictly penalize a flipped normal since it is just a convention (logical though it may be), and it would still tell the student they have successfully found a decision boundary. In any case, I think it would be nice to leave this in, but if you think it's better to enforce the convention (which of course is good practice), we can turn this into a failed test with its own error message.

> Also, negative numbers are scalars (at least to me?), so I would probably drop the first part of the first sentence.

I've rephrased the scalar multiplication part to make it clearer what I was trying to say (that a hyperplane looks the same multiplied by any scalar, but negative scalars flip what we define to be the normal). Hopefully that's clearer :)

Contributor (cmcaine):
I think if we ask the student to define a classify(boundary, point) function then they'll have to get consistent about the boundary anyway?

See #678 (comment)

Contributor Author (depial):
Edit: moving this down here to split up topics

> I agree that testing with unseen points is a bit tricky, but it's not super hard given that we control the input data, and it's not too bad for us to either only include hand-written tests or to do some geometry to ensure our unseen points will always be classified correctly by a valid linear boundary.

I was thinking about this, and it appears it is redundant to test unseen points which are guaranteed to be on the correct side of any possible decision boundary for a population. That's because there are two hyperplanes which define the limits of a cone of possibility, and each of these limits can be defined by two points, one from each class. These two points work like support vectors, and are robust to any additional (guaranteed valid) unseen points. This means only the support vectors need to be classified correctly to verify that the hyperplane is a valid decision boundary, and this is already done under the current testing.

For this reason I think it would be best to leave out testing of unseen points, since, for students encountering this for the first time, it could give the unintended impression that Perceptron will always correctly classify unseen points.

Any thoughts?

Contributor Author (depial):
I believe I understand your intentions, but I want to check if I'm getting everything:

> 1. Confused students won't be able to encode the seen points in the perceptron and pass the tests.

I can understand this in two ways, so I'll take a stab at both:

  • Do you mean students won't have seen points to use locally to debug? If so, the first set of fixed tests shows the points and labels explicitly.
  • Or do you mean they might hard-code the answers into their function in order to pass the tests (i.e. 'cheat')? If so, the pseudorandom tests don't explicitly show the seen points, and someone would have to hard-code over 1200 points across 40+ tests to pass, which, while doable, seems unlikely (also, couldn't they just look at other students' solutions after submitting once?).

> We can have unseen points without producing ambiguous tests by picking points that are geometrically guaranteed...

This actually still leaves an ambiguity. If a student presents an incorrect decision boundary (DB), an unseen point is not guaranteed to be either correctly or incorrectly classified; it all depends on the particular DB. Furthermore, if the point happens to be classified correctly, it can give the impression that the DB is correct rather than conveying the actual information (that the DB merely hasn't been shown to be incorrect), thereby misleading the student.

Just a note: the reason I said that any test that can be done with unseen points can be done as well or better with seen points is that correct classification of at most four special points (support vectors) among the seen points is both necessary and sufficient to tell whether the DB is 100% correct. Any other test is technically deficient on its own, and deficiency introduces ambiguity; on the other hand, any other test in conjunction with classifying the support vectors is superfluous. However, finding the support vectors is a non-trivial task, so every point is exhaustively classified in testing (to make sure they are found and tested).

> Another option would be to train on and classify larger datasets and assert that the accuracy is above some threshold that it is either impossible or implausible for a correct implementation to miss.

I believe this is what the pseudorandom test set does, and the accuracy is required to be 100%. Is there something further that I'm missing?

> I think it's fine to just call both functions for every test.

I believe this adds unnecessary confusion, when a test fails, about which function is causing the failure.

> 1. the tests will make more sense to students who read them.

At this point, even if this is the case, the added possibility of confusion created through various ambiguities, and/or the difficulty of trying to mitigate them, makes including these features seem not worth it. I don't mind including ambiguity in exercises, since it provides 'real world' experience, but it depends on the audience. From what I gather from your concern on this point, you are looking out for less experienced students, so I would presume less ambiguity would be better?

> E.g. for linearly separable datasets, I think the centroid of any triangle drawn from points of the same set will be classified correctly by any correct implementation.

Thanks! I was trying to think of the simple way you had mentioned, but I had lost the trees for the forest :)

Contributor Author (depial), Feb 9, 2024:
> I don't think that it's necessary to test the training and classification separately, though we can choose to do so. I think it's fine to just call both functions for every test, as shown above.

Sorry, I just wanted to clarify: I am on board with a classification function if it's tested separately before being used in conjunction with the perceptron function.

It's really the unseen points I'm most hesitant to accept. I think they would be very appropriate in the case of SVM, but the nature of Perceptron just seems to make them more of a bane than a benefit.

Contributor (cmcaine), Feb 10, 2024:
I think your belief that unseen points will cause ambiguity (test failures for valid decision boundaries) is making you think that they're a big problem.

But, as I hope the triangle centroid thing shows, we can generate unseen points with zero chance of them causing false test failures.

If you don't believe the triangle thing, then maybe you can believe that, for linearly separable datasets, any point contained by the convex hull of the points of one label will be given that label by any linear boundary-based classifier that has zero error on the training set. For a linear boundary and a 2D space that one should be provable geometrically with a ruler and some drawings.

Or you can easily show that any point on the line segment between the support vectors for a label must be classified correctly by any valid linear decision boundary.
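
A concrete sketch of the centroid argument, assuming some correct `perceptron` implementation is in scope (the points below are made up for illustration):

```julia
# Three points sharing the label +1, plus one point labeled -1 so there is something to separate.
a, b, c = [0, 1], [1, 0], [2, 2]
points  = [a, b, c, [-1, -1]]
labels  = [1, 1, 1, -1]

w = perceptron(points, labels)     # any valid decision boundary [w0, w1, w2]

# w0 + w1*x + w2*y is affine, so its value at the centroid is the mean of its values
# at a, b and c, which all share one sign; the centroid therefore gets that same label,
# regardless of which way the normal points.
centroid = (a .+ b .+ c) ./ 3
@assert sign(w' * vcat(1, centroid)) == sign(w' * vcat(1, a))
```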

Contributor Author (depial), Feb 10, 2024:
I think I have failed to unmuddle my concern :) I'm okay with testing to demonstrate how classification of unseen points works, but testing as proposed is too easily confused with testing unseen points to verify the correctness of a decision boundary. My thoughts on why testing unseen points to verify correctness is undesirable follow:

I'm with you on the centroid idea (which I quite liked) and familiar with the other arguments you've given, so I'm not concerned about test failures for valid decision boundaries. The minor issue I have in the case of valid decision boundaries is that testing with unseen points to validate a decision boundary is simply redundant. Why it is redundant can be gleaned from your arguments on why and how we can choose correct unseen points: namely they are chosen entirely in reference to seen points.

One issue that concerns me involves invalid decision boundaries. With an invalid decision boundary, there is no guarantee how an unseen point will be classified (just as there is no guarantee how any point will be classified). There are two cases:

  1. The unseen point is classified incorrectly and there is a test failure. The student is alerted to the fact that their decision boundary is incorrect. This I'm okay with, although it is still redundant.
  2. The unseen point is classified correctly and the test passes. The student is not alerted to the fact their decision boundary is incorrect and either receives conflicting information from other tests or may indeed end up believing that their decision boundary is correct. This is the first aspect I dislike.

The other issue that concerns me is on a more psychological note: People often take partial information, over-generalize and create false impressions (e.g. stereotypes/prejudices/etc). That's how some of the more naive or inattentive students, seeing testing on unseen points in this way, could walk away with the mistaken impression that Perceptron can correctly classify any unseen point (not just carefully selected points). This is the second aspect I dislike.

So, given that the possible outcomes of testing unseen points are either redundant good info or unnecessary, potentially misleading info, and given the misconception about Perceptron that can be conveyed simply by including such testing, I don't feel that testing unseen points in the proposed manner brings enough to the table to be included.

To boil it all down: I feel that 'testing' unseen points to demonstrate how classification works has value, but I think this is too easily confused with testing to verify correctness of the decision boundary. I currently can't think of tests which can eradicate my concerns above. However, there are related ideas we could demonstrate.

For example, we could show that the introduction of a new point to the data will (in all likelihood) result in a different decision boundary. This doesn't involve classification of unseen points though. As an aside: with SVM, we could introduce an element of a demonstration of unseen point classification switching with a before and after approach, but it just doesn't work with Perceptron because of the non-uniqueness of the decision boundary (i.e. we can't be guaranteed a switch in classification).

Do you have any other ideas on ways to demonstrate classification of unseen points?

Contributor Author (depial):
The cost vs benefit analysis that is going on in my head is the following:

  1. Testing unseen points to verify correctness is redundant and carries downsides.
  2. Testing unseen points to demonstrate classification is desirable, but unnecessary to satisfy the aim of the exercise.

The exercise is first and foremost meant to have students learn about the Perceptron algorithm. Including a demonstration of how classification works would be a bonus, but I'm just not sure this type of testing environment properly supports a demonstration of this kind.


#### Updating
Checking which side of a hyperplane an object lies on might sound complicated, but it's actually quite easy: simply plug the coordinates of the object into the equation of the hyperplane and check the sign of the result. The result will be positive, negative or zero, and all of the objects from one class should produce the same sign. A result of zero means the object is on the hyperplane itself, which we don't want to allow since it's ambiguous. For example, we can look at two objects $v_1, v_2$ in relation to the hyperplane $[w_0, w_1, w_2] = [1, 1, 1]$:

$$v_1 = [x_1, y_1] = [2, 2]: \qquad w_0 + w_1 x_1 + w_2 y_1 = 1 + 1 \cdot 2 + 1 \cdot 2 = 5 > 0$$


$$v_2 = [x_2, y_2] = [-2, -2]: \qquad w_0 + w_1 x_2 + w_2 y_2 = 1 + 1 \cdot (-2) + 1 \cdot (-2) = -3 < 0$$

If $v_1$ and $v_2$ have different labels, such as $1$ and $-1$ (like we will be using), then the hyperplane $[1, 1, 1]$ is a valid decision boundary for them.
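
The same arithmetic in Julia (a sketch; prepending a $1$ to each object turns the whole check into a dot product):

```julia
w  = [1, 1, 1]       # the hyperplane [w0, w1, w2]
v1 = [2, 2]
v2 = [-2, -2]

w' * vcat(1, v1)     # 1 + 1*2 + 1*2 = 5  -> positive side
w' * vcat(1, v2)     # 1 - 2 - 2 = -3     -> negative side
```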

Now that we know how to tell which side of the hyperplane an object lies on, we can look at how perceptron updates a hyperplane. If an object is on the correct side of the hyperplane, no update is performed on the weights. However, if we find an object on the wrong side, the update rule for the weights is:

$$[w_0', w_1', w_2'] = [w_0 \pm l_{class},\ w_1 \pm x \cdot l_{class},\ w_2 \pm y \cdot l_{class}]$$

Contributor (cmcaine), Feb 4, 2024:
This equation and the next two paragraphs are not clear enough, imo.

The update rule is where the magic happens, so it would be nice to explain this better and try to give the student some insight into how the rule works.

The $\pm\mp$ stuff will also be impenetrable to most readers, I think.

Contributor Author (depial):
I got rid of the plus/minus stuff, since I agree it's confusing/distracting and it really isn't terribly important anyway.

> The update rule is where the magic happens, so it would be nice to explain this better and try to give the student some insight into how the rule works.

Were you thinking along the lines of more explanation of how the update makes the line move? Or were you thinking of more clarity on how to implement it? I had left the implementation details a little vague intentionally (stopping short of pseudocode), hoping to leave room for the student to be creative or do some research for their implementation. However, I'm aware that some students are at a learning level where this is frustrating. Let me know your thoughts :)

Contributor (cmcaine):
I'll see if I have time and energy to attempt the exercise myself during the week and maybe that will help me clarify my ideas :)

Member:
> $$[w_0', w_1', w_2'] = [w_0 \pm l_{class}, w_1 \pm x l_{class}, w_2 \pm y l_{class}]$$

We also don't support rendering equations as far as I can recall.

Contributor Author (depial):
> We also don't support rendering equations as far as I can recall.

Fixed.


Where $l_{class} = \pm 1$ according to the class of the object (i.e. its label), $x, y$ are the coordinates of the object, the $w_i$ are the weights of the hyperplane, and the $w_i'$ are the weights of the updated hyperplane. The plus or minus signs are homogeneous (either all plus or all minus) and are determined by the choice of which class you define to be on the positive side of the hyperplane. Beware that only two of the four possible combinations of positive-side class and plus/minus in the update are valid ($\pm\pm$, $\mp\mp$), with the other two ($\pm\mp$, $\mp\pm$) leading to infinite loops.

This update is repeated for each object in turn, and then the whole process is repeated until no updates are made to the hyperplane. When all objects pass without an update, they have been successfully separated and you can return your decision boundary!
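
As a rough illustration of a single update (not part of the exercise files; the point is written in augmented form $[1, x, y]$ and the all-plus variant of the rule is used):

```julia
w    = [0, 0, 0]      # some starting hyperplane [w0, w1, w2]
p, l = [1, 2, 2], 1   # augmented point [1, x, y] = [1, 2, 2] with label +1

if l * (w' * p) <= 0  # zero here: the point sits on the hyperplane, so we update
    w = w + l * p     # [w0 + l, w1 + x*l, w2 + y*l]
end
w                     # [1, 2, 2]
```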

Note: Although the perceptron algorithm is deterministic, a decision boundary depends on initialization and is not unique in general, so the tests accept any hyperplane which fully separates the objects.
20 changes: 20 additions & 0 deletions exercises/practice/perceptron/.meta/config.json
@@ -0,0 +1,20 @@
{
  "authors": [
    "depial"
  ],
  "contributors": [
    "cmcaine"
  ],
  "files": {
    "solution": [
      "perceptron.jl"
    ],
    "test": [
      "runtests.jl"
    ],
    "example": [
      ".meta/example.jl"
    ]
  },
  "blurb": "Given points and their labels, provide a hyperplane which separates them"
}
8 changes: 8 additions & 0 deletions exercises/practice/perceptron/.meta/example.jl
@@ -0,0 +1,8 @@
function perceptron(points, labels)
    θ, pnts = [0, 0, 0], vcat.(1, points)
    while true
        θ_0 = θ
        foreach(i -> labels[i]*θ'*pnts[i] ≤ 0 && (θ += labels[i]*pnts[i]), eachindex(pnts))
        θ_0 == θ && return θ
    end
end
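
A quick sanity check of this reference solution (illustrative only, assuming `perceptron` above has been included), using the data from the "Initial set" test further down:

```julia
points = [[1, 2], [3, 4], [-1, -2], [-3, -4], [2, 1], [1, 1]]
labels = [1, 1, -1, -1, 1, 1]

θ = perceptron(points, labels)
# At convergence every training point satisfies label * (θ ⋅ [1, x, y]) > 0
@assert all(labels[i] * (θ' * vcat(1, points[i])) > 0 for i in eachindex(points))
```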
25 changes: 25 additions & 0 deletions exercises/practice/perceptron/.meta/tests.toml
@@ -0,0 +1,25 @@
# This is an auto-generated file.

Contributor (cmcaine):
Is it? I don't see a perceptron exercise in problem-specifications.

Contributor Author (depial):
I'm not sure if I fixed this or not. I've deleted the auto-generated comment. I had to manually create the UUID since when I ran configlet, it kept saying everything was done even when there were no UUIDs in this file (sorry I can't remember the exact message). Let me know if I need to do something else

Contributor (cmcaine):
Sorry, I'm not very familiar with what configlet needs when we're writing a practice exercise from scratch (one where there is not an upstream definition in problem-specifications). Someone in @exercism/reviewers might know.

Contributor Author (depial):
Looking at some problem-specifications, I'm wondering if pseudorandom test generation is possible on the platform. Checking canonical-data.json for Alphametics shows each exercise with data such as a unique UUID, plus the input and expected result. If this file is necessary and needs the hardcoded input and expected output, it'll be a bit more work to produce, although possible.

Contributor (cmcaine):
canonical-data isn't required. You can see parametric tests in the rational-numbers exercise and elsewhere.

Contributor Author (depial):
Great! I'd forgotten about the testset at the end of Rational Numbers :)

#
# Regenerating this file via `configlet sync` will:
# - Recreate every `description` key/value pair
# - Recreate every `reimplements` key/value pair, where they exist in problem-specifications
# - Remove any `include = true` key/value pair (an omitted `include` key implies inclusion)
# - Preserve any other key/value pair
#
# As user-added comments (using the # character) will be removed when this file
# is regenerated, comments can be added via a `comment` key.

[728853d3-24de-4855-a452-6520b67dec23]
description = "Initial set"

[ed5bf871-3923-47ca-8346-5d640f9069a0]
description = "Initial set w/ opposite labels"

[15a9860e-f9be-46b1-86b2-989bd878c8a5]
description = "Hyperplane cannot pass through origin"

[52ba77fc-8983-4429-91dc-e64b2f625484]
description = "Hyperplane nearly parallel with y-axis"

[3e758bbd-5f72-447d-999f-cfa60b27bc26]
description = "Increasing Populations"
3 changes: 3 additions & 0 deletions exercises/practice/perceptron/perceptron.jl
@@ -0,0 +1,3 @@
function perceptron(points, labels)
    # Perceptronize!
end
86 changes: 86 additions & 0 deletions exercises/practice/perceptron/runtests.jl
@@ -0,0 +1,86 @@
using Test, Random
include("perceptron.jl")

function runtestset()

Contributor (cmcaine):
I think we should include some tests with manually specified input data of just a few points to make this more approachable and also as good practice (e.g. first few tests could be spaces with just 2-4 points placed manually).

Contributor Author (depial):
Sorry, I'm not sure if I've understood, but there are four tests, each with six manually specified points, to illustrate a couple of different possible orientations of a hyperplane (testset "Low population"). After that, the 40 pseudorandomly generated tests begin (testset "Increasing Populations"). Was this what you meant?

Contributor (cmcaine), Feb 4, 2024:
I meant we could take some of the manually specified examples out of the runtestset() function and test them without the support functions, just so that it's less mystifying to a student reading the tests.

And maybe we should have the student write their own function for finding the computed label of a point?

e.g.

# Student must implement both `perceptron(points, labels) -> boundary (a vector of 3 weights)` and `classify(boundary, point)`

@testset "Boundary is a vector of 3 weights" begin
    boundary = perceptron([[0,0], [3, 3]], [1, -1])
    @test eltype(boundary) <: Real
    @test length(boundary) == 3
end

@testset "Originally provided points should be classified correctly" begin
    boundary = perceptron([[0,0], [3, 3]], [1, -1])
    @test classify(boundary, [0, 0]) == 1
    @test classify(boundary, [3, 3]) == -1
end

@testset "Given 3 labeled points, an unseen point is classified correctly" begin
    # Adding more points constrains the location of the boundary so that we can test
    # the classification of unseen points.
    boundary = perceptron([[0,0], [1, 0], [0, 1]], [-1, 1, 1])
    @test classify(boundary, [0, 0]) == -1
    @test classify(boundary, [2, 5]) == 1
end
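
For reference, a `classify` matching the suggested API could be as small as a sign check against the augmented point (a sketch, not something proposed in this PR):

```julia
# boundary = [w0, w1, w2]; point = [x, y]; returns 1, -1, or 0 for a point exactly on the boundary
classify(boundary, point) = Int(sign(boundary' * vcat(1, point)))
```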

Contributor Author (depial):
Sure! I can throw a couple in there like your first examples, but I've got a question about `classify`:
Due to the wide range of possible decision boundaries returned by Perceptron, I'm not sure testing classification of unseen points is viable beyond carefully selected examples. Also, since the algorithm is effectively just the three parts of classify + update + repeat, couldn't requiring a separate classification function introduce an unnecessary constraint on the task?

Contributor (cmcaine):
Cool!

I don't see why it would constrain the student or the task for us to ask them to provide a classify function. What do you mean by that?

I agree that testing with unseen points is a bit tricky, but it's not super hard given that we control the input data, and it's not too bad for us to either only include hand-written tests or to do some geometry to ensure our unseen points will always be classified correctly by a valid linear boundary.

Contributor Author (depial):
> Is the learning rate trivial?

Sorry, over-generalization from me there :) I guess I'm considering the basic (pedagogical) Perceptron, i.e. one provided with small, dense, separable populations, since, under separability, the learning rate doesn't affect either the final loss or the upper bound of the number of errors the algorithm can make. That said, I was wrong to say the initial hyperplane affects these, since it doesn't either. It just seems non-trivial to me because it can be returned.

I'll try to make the necessary changes to the PR today and/or tomorrow :)

Contributor Author (depial), Feb 8, 2024:
Is there anything you would add (e.g. subtyping) to my suggestion for a stub, or do you think I could just copy and paste?

Edit: In my initial suggestion for the stub, conversion to Float64 is enforced, but the exercise (as presented) could also be handled entirely with integers. Should I drop all references to Float64? Something like:

mutable struct Perceptron
    # instantiates Perceptron with a decision boundary 
    # this struct can remain unmodified
    dbound
    Perceptron() = new([0, 0, 0])
end

function fit!(model::Perceptron, points, labels)
    # updates the field dbound of model (model.dbound) and returns it as a valid decision boundary
    # your code here
end

function predict(model::Perceptron, points)
    # returns a vector of the predicted labels of points against the model's decision boundary
    # your code here
end

It might make it appear cleaner and less intimidating for students unfamiliar with the type system.

Contributor (cmcaine):
I don't think the struct is really adding much, so I'd remove it. You can pick what you like, tho.

If I try the exercise and hate the struct then I might veto it, but not until then.

Member:
> I think we should include some tests with manually specified input data of just a few points to make this more approachable and also as good practice (e.g. first few tests could be spaces with just 2-4 points placed manually).

Agree with this

Contributor Author (depial):
> Agree with this

Could you be more specific? There are already four tests with six manually specified points which check for different possible orientations of a decision boundary.

Beyond this, we've had an extensive conversation about using "unseen points" (TL;DR: I believe testing unseen points is potentially more detrimental to understanding than beneficial).

The other ideas for tests were along the lines of checking whether the student returns the correct kind of object (a vector of three real numbers), etc. Is there something else you were thinking of?


    @testset "Low population" begin
        @testset "Initial set" begin
            points = [[1, 2], [3, 4], [-1, -2], [-3, -4], [2, 1], [1, 1]]
            labels = [1, 1, -1, -1, 1, 1]
            reference = [1, 2, 1]
            hyperplane = perceptron(points, labels)
            @test dotest(points, labels, hyperplane, reference)
        end
        @testset "Initial set w/ opposite labels" begin
            points = [[1, 2], [3, 4], [-1, -2], [-3, -4], [2, 1], [1, 1]]
            labels = [-1, -1, 1, 1, -1, -1]
            reference = [-1, -2, -1]
            hyperplane = perceptron(points, labels)
            @test dotest(points, labels, hyperplane, reference)
        end
        @testset "Hyperplane cannot pass through origin" begin
            points = [[1, 2], [3, 4], [-1, -2], [-3, -4], [2, 1], [-1, -1]]
            labels = [1, 1, -1, -1, 1, 1]
            reference = [-1, 3, 3]
            hyperplane = perceptron(points, labels)
            @test dotest(points, labels, hyperplane, reference)
        end
        @testset "Hyperplane nearly parallel with y-axis" begin
            points = [[0, 50], [0, -50], [-2, 0], [1, 50], [1, -50], [2, 0]]
            labels = [-1, -1, -1, 1, 1, 1]
            reference = [2, 0, -1]
            hyperplane = perceptron(points, labels)
            @test dotest(points, labels, hyperplane, reference)
        end
    end

    @testset "Increasing Populations" begin
        for n in 10:50
            points, labels, reference = population(n, 25)
            hyperplane = perceptron(points, labels)
            @test dotest(points, labels, hyperplane, reference)
        end
    end

end


function population(n, bound)
    # Builds a population of n points with labels {1, -1} in an area bound x bound around a reference hyperplane
    # Returns linearly separable points, labels and the reference hyperplane

    vertical = !iszero(n % 10)  # every tenth test has a vertical reference hyperplane
    x, y, b = rand(-bound:bound), rand(-bound:bound)*vertical, rand(-bound÷2:bound÷2)
    y_intercept = -b ÷ (iszero(y) ? 1 : y)
    points, labels, hyperplane = [], [], [b, x, y]
    while n > 0
        # points are centered on the y-intercept, but not the x-intercept, so distributions can be lopsided
        point = [rand(-bound:bound), y_intercept + rand(-bound:bound)]
        label = point' * [x, y] + b
        if !iszero(label)
            push!(points, point)
            push!(labels, sign(label))
            n -= 1
        end
    end

    points, labels, hyperplane
end

function dotest(points, labels, hyperplane, reference)
    points = vcat.(1, points)
    test = reduce(hcat, points)' * hyperplane .* labels
    if all(>(0), test)
        println("Reference hyperplane = $reference\nYour hyperplane = $hyperplane\nSeparated! And the normal points towards the positively labeled side\n")
        return true
    elseif all(<(0), test)
        println("Reference hyperplane = $reference\nYour hyperplane = $hyperplane\nSeparated! But the normal points towards the negatively labeled side\n")
        return true
    else
        println("Reference hyperplane = $reference\nYour hyperplane = $hyperplane\nThe sides are not properly separated...\n")
        return false
    end
end

Random.seed!(42) # set seed for deterministic test set
runtestset()